- Machine learning faces a causality problem.
- Even after being trained on huge data sets, machine learning models may make dangerous mistakes when slight changes are applied to their environment.
When a batsman swings his bat to hit the ball, we know that the movement of his hands is causing the bat to strike the ball. As humans, we also know that the bat causes the change in the ball's direction. These inferences come to us naturally; we learn them at a very young age. But machine learning algorithms, which have managed to outperform humans at many tasks, still struggle with causality. Deep neural networks, a family of machine learning algorithms, are good at ferreting out subtle patterns from huge data sets. They can transcribe audio in real time, label thousands of images and video frames per second, and examine X-ray and MRI scans of cancer patients.
Yet these networks struggle to make the simple causal inferences we just saw in the bat-and-ball example. In a paper titled “Towards Causal Representation Learning”, researchers at the Max Planck Institute for Intelligent Systems, the Montreal Institute for Learning Algorithms, and Google Research discuss the challenges that arise from the lack of causal representations in machine learning models, and they provide directions for creating artificial intelligence systems that can learn causal representations. It is one of several initiatives that aim to explore and solve machine learning’s lack of causality, which could be the key to overcoming some of the major challenges the field faces today.
Independent and identically distributed data:
Why do machine learning models fail to generalize beyond their narrow domains and training data?
“Machine learning often disregards information that animals use heavily: interventions in the world, domain shifts, temporal structure — by and large, we consider these factors a nuisance and try to engineer them away,” write the authors of the causal representation learning paper. “In accordance with this, the majority of current successes of machine learning boil down to large scale pattern recognition on suitably collected independent and identically distributed (i.i.d.) data.”
The common machine learning term “i.i.d.” assumes that the random observations in a problem space do not depend on each other and have a constant probability of occurrence. The simplest examples of i.i.d. are flipping a coin or tossing a die: each outcome is independent of the previous ones, and the probability of each outcome remains constant. In complicated areas such as computer vision, machine learning engineers try to turn the problem into an i.i.d. domain by training the model on very large corpora of examples.
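To make the i.i.d. assumption concrete, here is a minimal Python sketch (the sample size and variable names are arbitrary) that simulates coin flips: each draw ignores all previous draws, and the probability of heads stays constant throughout.

```python
import numpy as np

rng = np.random.default_rng(42)

# 10,000 i.i.d. coin flips: 1 = heads, 0 = tails.
flips = rng.integers(0, 2, size=10_000)

# Identically distributed: the empirical frequency of heads hovers
# around the fixed probability of 0.5.
print(flips.mean())  # ~0.5

# Independent: the outcome following a head is distributed just like
# the outcome following a tail.
after_heads = flips[1:][flips[:-1] == 1]
after_tails = flips[1:][flips[:-1] == 0]
print(after_heads.mean(), after_tails.mean())  # both ~0.5
```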
The assumption is that, with enough examples, the machine learning model will be able to encode the general distribution of the problem in its parameters. But in the real world, the distribution changes in ways that cannot be anticipated and controlled in the training data. For example, convolutional neural networks that have been trained on millions of images can fail when they see objects under new lighting conditions, from slightly different angles, or against new backgrounds.
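As a hedged illustration of this failure mode (a toy sketch, not an experiment from the paper), the code below trains a linear classifier on data in which a spurious “background” feature tracks the label, then breaks that correlation at test time. Accuracy collapses even though the causal “shape” feature never changed.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, background_matches_label):
    """Toy 'images': feature 0 is the object shape (causal signal),
    feature 1 is the background (spurious signal)."""
    y = rng.integers(0, 2, n)
    shape = y + 0.1 * rng.normal(size=n)                # weak causal signal
    if background_matches_label:
        background = y + 0.01 * rng.normal(size=n)      # strong spurious signal
    else:
        background = rng.integers(0, 2, n) + 0.01 * rng.normal(size=n)
    return np.column_stack([shape, background]), y

# Training data: the background is highly correlated with the label.
X_train, y_train = make_data(5000, background_matches_label=True)
# Test data: that correlation is broken, as in a new environment.
X_test, y_test = make_data(5000, background_matches_label=False)

# Least-squares linear classifier with no causal knowledge: it latches
# onto the background feature because it predicts the label best.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def accuracy(X, y):
    return np.mean((X @ w > 0.5) == y)

print("train accuracy:", accuracy(X_train, y_train))        # close to 1.0
print("accuracy under shift:", accuracy(X_test, y_test))    # close to chance
```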
Addressing machine learning problems:
Efforts to address these problems include training the machine learning models on more examples. But as the complexity of the environment grows, it becomes impossible to cover the entire distribution by adding more training examples. This is especially true in domains where AI agents must interact with the world, such as robotics and self-driving cars. The lack of causal understanding makes it difficult to make predictions and handle novel situations. This is why self-driving cars make weird and dangerous mistakes even after they have been trained for a million miles.
“Generalizing well outside the i.i.d. setting requires learning not mere statistical associations between variables, but an underlying causal model,” the AI researchers write. Causal models allow humans to repurpose previously gained knowledge and apply it in new domains. For instance, when you learn a real-time strategy game such as Warcraft, you can easily apply your knowledge to similar games like StarCraft and Age of Empires.
Transfer learning in machine learning is limited to superficial uses, such as fine-tuning an image classifier to detect new kinds of objects. For complex tasks such as learning to play video games, machine learning models may require the equivalent of thousands of years of training, and even then they may perform poorly when there are slight changes in the environment.
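In practice, this kind of superficial transfer often looks like the hedged PyTorch sketch below: a pre-trained backbone is frozen and only a new classification head is trained for the new objects. The use of torchvision’s resnet18 and the exact `weights` argument are assumptions that depend on the installed torchvision version.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network pre-trained on ImageNet (API details vary by torchvision version).
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pattern-recognition backbone: only the new head will learn.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer for, say, 5 new object classes.
num_new_classes = 5
model.fc = nn.Linear(model.fc.in_features, num_new_classes)

# Only the new head's parameters are optimized.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One hypothetical training step on a stand-in batch of images and labels.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_new_classes, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```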
Key factors behind deep learning's success:
“When learning a causal model, one should thus require fewer examples to adapt as most knowledge, i.e., modules can be reused without further training,” the authors of the causal machine learning paper write. So why has i.i.d. remained the dominant form of machine learning despite its known weaknesses? Purely observation-based approaches are scalable: you can keep achieving incremental gains in accuracy by adding more training data, and you can speed up training by adding more computation power.
Indeed, one of the key factors behind the success of deep learning is the availability of more data and more powerful processors. Causal models, by contrast, remain robust when interventions change the statistical distribution of the problem. For instance, when you see an object for the first time, your mind subconsciously factors lighting out of its appearance. This is why we can identify the same object under different lighting conditions and in different environments.
Causal models also enable us to respond to situations we have never encountered before and to reason about counterfactuals. For instance, we don’t need to drive a car off a cliff to know what will happen. Counterfactuals play a significant role in reducing the number of training examples a machine learning model requires. In a broader sense, causality can address machine learning’s lack of generalization. “It is fair to say that much of the current practice (of solving i.i.d. benchmark problems) and most theoretical results (about generalization in i.i.d. settings) fail to tackle the hard open challenge of generalization across problems,” the researchers write.
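Counterfactual questions like the cliff example are usually formalized with the abduction-action-prediction recipe on a structural causal model. The toy SCM below is a minimal sketch with invented variables, coefficients, and observations; it answers “what would the damage have been had the speed been zero?” without generating any new training data.

```python
# A tiny structural causal model (illustrative only, not from the paper):
#   speed  := U_speed
#   damage := 3 * speed + U_damage
# "damage" is caused by "speed" plus an unobserved factor U_damage.

def damage(speed, u_damage):
    return 3 * speed + u_damage

# Step 1 (abduction): we observed speed = 2 and damage = 7,
# so the unobserved factor must have been u_damage = 7 - 3 * 2 = 1.
observed_speed, observed_damage = 2.0, 7.0
u_damage = observed_damage - 3 * observed_speed

# Step 2 (action): intervene on the cause, setting speed to 0.
counterfactual_speed = 0.0

# Step 3 (prediction): replay the mechanism with the inferred noise.
print(damage(counterfactual_speed, u_damage))  # 1.0 -- the counterfactual damage
```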
Adding causality to machine learning:
The researchers bring together several concepts and principles that are essential for creating causal machine learning models. Two of these concepts are “structural causal models” and “independent causal mechanisms.” The principles imply that, instead of looking for superficial statistical correlations, an AI model should be able to identify causal variables and disentangle their effects on the environment. This is the mechanism that enables you to recognize objects regardless of viewing angle, background, lighting, and other noise. Disentangling these causal variables makes a system more robust to unpredictable changes and interventions, and as a consequence, causal models won’t require huge training data sets.
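A minimal sketch of what “independent causal mechanisms” could look like in code is given below; the variables and mechanisms are invented for illustration and are not taken from the paper. Each structural assignment is its own function, so an intervention, here a change in the lighting distribution, replaces one mechanism while the others are reused unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy structural causal model written as independent mechanisms.
def sample_lighting(n):                  # root cause: scene lighting
    return rng.uniform(0.2, 1.0, n)

def sample_object(n):                    # root cause: object identity (0 or 1)
    return rng.integers(0, 2, n)

def render_pixels(obj, lighting):        # mechanism: causes -> observed pixels
    return obj * lighting + 0.05 * rng.normal(size=len(obj))

# Observational regime.
light = sample_lighting(5)
obj = sample_object(5)
pixels = render_pixels(obj, light)

# Intervention: replace ONLY the lighting mechanism (e.g., night-time scenes).
def sample_night_lighting(n):
    return rng.uniform(0.0, 0.2, n)

light_night = sample_night_lighting(5)
# The rendering mechanism is reused without any change.
pixels_night = render_pixels(obj, light_night)
```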
“Once a causal model is available, either by external human knowledge or a learning process, causal reasoning allows [it] to conclude the effect of interventions, counterfactuals, and potential outcomes,” the authors of the causal machine learning paper write.
The researchers also provide ideas for systems that combine machine learning mechanisms with structural causal models: “To combine structural causal modeling and representation learning, we should strive to embed an SCM into larger machine learning models whose inputs and outputs may be high-dimensional and unstructured, but whose inner workings are at least partly governed by an SCM (that can be parameterized with a neural network). The result may be a modular architecture, where the different modules can be individually fine-tuned and repurposed for new tasks.”
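Read literally, that proposal might look something like the hedged PyTorch sketch below: an encoder maps unstructured input to a few candidate causal variables, and a small per-variable module plays the role of a structural assignment. All names, dimensions, and the overall wiring are assumptions for illustration, not the paper’s architecture.

```python
import torch
import torch.nn as nn

class Mechanism(nn.Module):
    """One structural assignment: child := f(parents), parameterized by a small net."""
    def __init__(self, n_parents):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(n_parents, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, parents):
        return self.f(parents)

class NeuralSCMModel(nn.Module):
    def __init__(self, input_dim=784, n_causes=2):
        super().__init__()
        # Encoder: high-dimensional, unstructured input -> candidate causal variables.
        self.encoder = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_causes))
        # Mechanism for an effect variable, governed by the inferred causes.
        self.effect_mechanism = Mechanism(n_parents=n_causes)

    def forward(self, x):
        causes = self.encoder(x)                # low-dimensional causal variables
        effect = self.effect_mechanism(causes)  # structural assignment
        return causes, effect

model = NeuralSCMModel()
x = torch.randn(8, 784)                         # stand-in batch of raw inputs
causes, effect = model(x)

# Because each mechanism is a separate module, one of them can in principle
# be fine-tuned or swapped for a new task while the others are reused.
```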
Noteworthy ideas presented in the paper:
It is worth noting that the ideas put forward in the paper remain at a very conceptual level. The authors acknowledge that implementing these concepts means facing several challenges: “(a) in many cases, we need to infer abstract causal variables from the available low-level input features; (b) there is no consensus on which aspects of the data reveal causal relations; (c) the usual experimental protocol of training and test set may not be sufficient for inferring and evaluating causal relations on existing data sets, and we may need to create new benchmarks, for example with access to environmental information and interventions; (d) even in the limited cases we understand, we often lack scalable and numerically sound algorithms.”
The paper also contains ideas that overlap with the concept of hybrid AI models proposed by Gary Marcus, which combine the reasoning power of symbolic systems with the pattern-recognition power of neural networks. However, the paper makes no direct reference to hybrid systems, and it is not yet clear which of the several proposed approaches will help solve machine learning’s causality problem.
“At its core, i.i.d. pattern recognition is but a mathematical abstraction, and causality may be essential to most forms of animate learning,” the authors write. “Until now, machine learning has neglected a full integration of causality, and this paper argues that it would indeed benefit from integrating causal concepts.”