Beyond Machine Learning: Capturing Cause-and-Effect Relationships
by Irving Wladawsky-Berger, Chairman, r4 Advisory Council
Artificial intelligence is rapidly becoming one of the most important technologies of our era. Every day we can read about the latest AI advances from startups and large companies. AI technologies are approaching or surpassing human levels of performance in vision, speech recognition, language translation, and other human domains. Machine learning advances, like deep learning, have played a central role in AI’s recent achievements, giving computers the ability to be trained by ingesting and analyzing large amounts of data instead of being explicitly programmed.
Deep learning is a powerful statistical technique for classifying patterns using large training data sets and multi-layer artificial neural networks. It’s essentially a method for machines to learn from all kinds of data, whether structured or unstructured, that’s loosely modeled on the way a biological brain learns new capabilities. Each artificial neural unit is connected to many other such units, and the links can be statistically strengthened or weakened based on the data used to train the system. Each successive layer in a multi-layer network uses the output from the previous layer as input.
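To make the layered structure concrete, here is a minimal sketch in plain Python. It shows only the forward pass, in which each layer consumes the previous layer's output; it does not show training, which is the step that strengthens or weakens the link weights. The function names, layer sizes, and random weights are assumptions chosen purely for illustration, not taken from any particular deep learning library.

```python
import math
import random

def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """One layer: each unit sums its weighted inputs, adds a bias, applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation

def make_layer(n_in, n_out, rng):
    """Hypothetical helper: random link weights and zero biases for one layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)  # fixed seed so the illustration is repeatable
layers = [make_layer(3, 4, rng), make_layer(4, 2, rng)]  # 3 inputs -> 4 units -> 2 outputs
output = network_forward([0.5, -1.2, 0.3], layers)
print(output)  # two values, each between 0 and 1
```

In a real system the weights would not stay random: training would repeatedly nudge each one up or down to reduce the error on labeled examples.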
Machine learning can be applied to just about any domain of knowledge given our ability to gather valuable data in almost any area of interest. But machine learning methods are significantly narrower and more specialized than humans. There are many tasks for which they’re not effective given the current state of the art. In an article recently published in Science, professors Erik Brynjolfsson and Tom Mitchell identified the key criteria that help distinguish tasks that are particularly suitable for machine learning from those that are not. These include:
- Tasks that map well-defined inputs to well-defined outputs, e.g., labeling images of specific animals, estimating the probability of cancer from a medical record, or estimating the likelihood of default from a loan application;
- Large data sets exist or can be created containing such input-output pairs: the bigger the training data sets, the more accurate the learning;
- The capability being learned should be relatively static: if the function changes rapidly, retraining is typically required, including the acquisition of new training data; and
- No need for detailed explanation of how the decision was made: the methods behind a machine learning recommendation (subtle adjustments to the numerical weights that interconnect its huge number of artificial neurons) are difficult to explain because they’re so different from those used by humans.
Physics, biology and other natural sciences have long relied on scientific models and principles to understand and explain the cause-and-effect relationships that enable them to detect faint signals within large and/or noisy data sets, i.e., the proverbial needle in a haystack. For example, tracking potentially hazardous, fast-moving, near-Earth objects is based on the detection of very small changes in the sky. Similarly, searching for Earth-size extrasolar planets is based on detecting the faint changes in a star’s light caused by a potential planet quickly passing by. No matter how much data we might have access to, it would be nearly impossible to detect the weak, noisy signals associated with either task without the models developed over the past few hundred years indicating a potential entity of interest that should be further investigated.
Such scientific models have enabled the discovery of very short-lived elementary particles, like the Higgs boson, amidst the huge amounts of data generated by high-energy particle accelerators. “According to the casino rules of modern quantum physics, anything that can happen will happen eventually,” explained a recent article on CERN’s Large Hadron Collider. “Before a single proton is fired through the collider, computers have calculated all the possible outcomes of a collision according to known physics. Any unexpected bump in the real data at some energy could be a signal of unknown physics, a new particle. That was how the Higgs was discovered, emerging from the statistical noise in the autumn of 2011.”
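A toy illustration of the planet-search case may help show why the model matters. In the sketch below, a star's brightness is simulated as random noise plus a small periodic dip that is half the size of the noise, so no single measurement reveals it. Because a physical model says the dip should repeat with a fixed period, we can average the data at the phases where a transit would occur and recover the signal. Everything here is an assumption made for illustration: the function names, the numbers, the box-shaped dip, and the known starting phase. Real surveys use far more sophisticated methods.

```python
import random

def simulate_light_curve(n, period, duration, depth, noise_sigma, seed):
    """Brightness samples: Gaussian noise plus a small periodic dip (the 'transit')."""
    rng = random.Random(seed)
    return [
        rng.gauss(0.0, noise_sigma) - (depth if i % period < duration else 0.0)
        for i in range(n)
    ]

def folded_depth(flux, period, duration):
    """Average dimming at the phases where a planet with this period would transit.

    Assumes the first transit starts at sample 0 (a simplification).
    """
    in_transit = [f for i, f in enumerate(flux) if i % period < duration]
    out_transit = [f for i, f in enumerate(flux) if i % period >= duration]
    return sum(out_transit) / len(out_transit) - sum(in_transit) / len(in_transit)

def best_period(flux, trial_periods, duration):
    """Pick the trial period whose folded data shows the deepest average dip."""
    return max(trial_periods, key=lambda p: folded_depth(flux, p, duration))

# The dip (0.5) is half the noise level (1.0), so it is invisible sample by sample.
flux = simulate_light_curve(n=20000, period=100, duration=5,
                            depth=0.5, noise_sigma=1.0, seed=42)
found = best_period(flux, range(90, 111), duration=5)
estimate = folded_depth(flux, found, duration=5)
print(found, round(estimate, 2))
```

Averaging works here only because the model tells us which samples to group together. Folding at the wrong period mixes in-transit and out-of-transit samples, and the dip washes out.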
A few months ago, Gartner published a report on The Top 10 Strategic Technology Trends for 2019, that is, trends with the potential to impact and transform industries over the next 5 years. The report included three AI trends, noting that AI is opening up a new frontier for digital business, “because virtually every application, service and Internet of Things (IoT) object incorporates an intelligent aspect to automate or augment application processes or human activities.”
The Augmented Developer is one of Gartner’s top AI trends. “The market is rapidly shifting from one in which professional data scientists must partner with application developers to create most AI-enhanced solutions to one in which professional developers can operate alone using predefined models delivered as a service. This provides the developer with an ecosystem of AI algorithms and models, as well as development tools tailored to integrating AI capabilities and models into a solution.”
“Some AI services are complete models that a developer can simply call as a function, pass the appropriate parameters and data, and obtain a result. Others may be pretrained to a high level but require some additional data to complete the training… The advantage of these partially trained models is that they require much smaller datasets for training. Not only does the evolution of these AI platforms and suites of AI services enable a wider range of developers to deliver AI-enhanced solutions, but it also delivers much higher developer productivity.”
Let me discuss two examples of AI solutions that have been enhanced by the inclusion of pre-defined models. The first deals with predicting human behaviors, based on research in MIT’s Human Dynamics group led by Media Lab professor Alex (Sandy) Pentland, which is explained in detail in his 2014 book Social Physics: How Good Ideas Spread.
Data derived from human behavior is dynamic and ever-changing. Such messy data is difficult to analyze to answer predictive questions such as: Who are our top customers, and how do we acquire more of them? Where should we open our next store?
After years of research, Pentland’s group discovered that all event data representing human activity contain a special set of social activity patterns regardless of what the data is about. These patterns are common across all human activities and demographics, and can be used to detect emerging behavioral trends before they can be observed by any other technique. Detecting such fast-changing trends requires the ability to frequently analyze data sets collected over short periods of time, looking for deviations from the patterns predicted by human behavior models.
Why are there such universal human activity patterns? The answer likely lies in human evolution. We are a social species, with the drive to learn from others in our social group. Such social learning has helped us survive by adapting to drastically different environments, and has thus been reinforced by natural selection.
These social behavior patterns have been tested across a variety of applications involving people, including strategy formulation in business, economic activity in cities, and, working with an intelligence agency, the detection of potential terrorist activity based on Twitter data. As long as the data involves human activity, regardless of the type of data, the demographic of the users or the size of the data sets, similar behavioral dynamics apply.
The second example comes from r4 Technologies, an AI-based company whose advisory board I recently joined. Human-based organizations, like companies and industries, are generally based on common elements, processes, and relationship patterns, making it possible to develop fairly universal models of the way they function. Over the past decade, r4 has developed such generic, customizable models of business organizations and the industry sectors in which they operate, based on three key entities (people, products and places) and their various attributes and interrelationships.
The models are then customized for each specific company using its own internal data as well as a variety of external data sources, thus creating a unique digital twin simulation of the company and its market environment, which is continually updated as new data comes in. This enables the company to detect emerging business and market trends before they can be detected by statistical methods, helping the company make better, faster decisions.
Let me conclude by summarizing the key benefits of AI solutions based on augmenting statistical methods with domain-based models:
- they can be trained or customized with much smaller data sets;
- they can tolerate much more noise in the data;
- they can be continually updated with new data reflecting changing conditions;
- their decisions and recommendations are easier to explain; and
- they help capture cause-and-effect relationships.