Counterfactuals and Causal Models: Introduction to the Special Issue
Judea Pearl won the 2010 Rumelhart Prize in computational cognitive science due to his seminal contributions to the development of Bayes nets and causal Bayes nets, frameworks that are central to multiple domains of the computational study of mind. At the heart of the causal Bayes nets formalism is the notion of a counterfactual, a representation of something false or nonexistent. Pearl refers to Bayes nets as oracles for intervention, and interventions can tell us what the effect of action will be or what the effect of counterfactual possibilities would be. Counterfactuals turn out to be necessary to understand thought, perception, and language. This selection of papers tells us why, sometimes in ways that support the Bayes net framework and sometimes in ways that challenge it.
If my words did glow with the gold of sunshine —“Ripple” by Robert Hunter
The study of counterfactuals might seem esoteric. But there is reason to believe that the ability to think counterfactually is what makes the human mind special. The term “counterfactual” could refer to many things: to a sentence form, a kind of idea, an implicit assumption required to make sense of a concept or utterance, or to a possible world. Whatever one has in mind, the reference is to something that does not actually obtain. And the cognitive system makes assumptions about what is false or nonexistent all the time, whether it be to define a term (e.g., the term “academic” can refer to a person who does not exist, like the ideal academic), to make inferences that go beyond what is observable (even unicorns have internal organs), or to understand why we feel certain emotions (because things did not turn out the way we had hoped). The cognitive system deals with propositions that are false and events that did not occur just as easily as propositions that are true and events that did occur. Indeed, as my examples show, processing one type can entail processing the other.
The topic of this special issue was chosen by its honoree, Judea Pearl, and it could not be a more perfect choice to reveal the depth and magnitude of Pearl's contribution to cognitive science. Pearl's career traces the development of tools to think about and model the mind in a new way. Pearl came at the problem as a computer scientist. His early work focused on new methods for heuristic search and he soon began to focus on probabilistic inference. Realizing the value of graphical representations as a compact way of representing probability distributions, he coined the term “Bayesian network” in 1985, a term that refers to one of the central concepts in the modern study of computation and cognitive science. His seminal 1988 book Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference introduced algorithms for learning and inference to the artificial intelligence community, where they were quickly assimilated and became a foundational part of the machine learning movement, whose core techniques now involve probabilistic and statistical inference.
But it was with his book Causality: Models, Reasoning, and Inference, published in 2000, that Pearl's ideas changed the way we think about the mind. Taking advantage of the insights of others, including Dawid (1979), Spirtes, Glymour and Scheines (1993), and Spohn (1980), Pearl introduced a way to interpret Bayes nets not just as representations of probability distributions but also as representations of causal structure. The key insight was to distinguish observation, represented by conditional probabilities, from intervention, which required a way to represent action. To represent action, Pearl introduced the do operator, a mathematical operator that captured what we do when we act as agents to intervene on the world. With the do operator in hand, Bayes nets could be used to make inferences not only about the way the world is (seeing) but also about the effects we have on the world when we act on it (doing) and—here's the kicker—the effects we would have on the world if we were to act on it, in other words, the effects of counterfactual intervention.
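The difference between seeing and doing can be made concrete with a small worked example. The sketch below is not taken from Pearl's book; it assumes a hypothetical three-variable network (Z influences both X and Y, and X influences Y) with made-up probabilities, and computes both quantities by brute-force enumeration. Conditioning on X leaves the link from Z to X intact, whereas do(X) cuts it.

```python
from itertools import product

# Hypothetical causal Bayes net: Z -> X, Z -> Y, X -> Y (all variables binary).
# Every number below is an illustrative assumption, not data from the article.
P_Z = {1: 0.5, 0: 0.5}                       # P(Z=z)
P_X1_GIVEN_Z = {1: 0.8, 0: 0.2}              # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = {(1, 1): 0.9, (1, 0): 0.5,   # P(Y=1 | X=x, Z=z)
                 (0, 1): 0.6, (0, 0): 0.2}

def joint(z, x, y, do_x=None):
    """P(Z=z, X=x, Y=y). If do_x is given, X ignores Z and is forced to do_x."""
    pz = P_Z[z]
    if do_x is None:
        px = P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]
    else:
        px = 1.0 if x == do_x else 0.0       # the arrow Z -> X is removed
    py1 = P_Y1_GIVEN_XZ[(x, z)]
    py = py1 if y == 1 else 1 - py1
    return pz * px * py

def p_y1_seeing(x):
    """P(Y=1 | X=x): ordinary conditioning on an observation."""
    num = sum(joint(z, x, 1) for z in (0, 1))
    den = sum(joint(z, x, y) for z, y in product((0, 1), repeat=2))
    return num / den

def p_y1_doing(x):
    """P(Y=1 | do(X=x)): the probability after intervening to set X."""
    return sum(joint(z, x, 1, do_x=x) for z in (0, 1))

print(round(p_y1_seeing(1), 4))  # 0.82
print(round(p_y1_doing(1), 4))   # 0.7
```

In this toy model, observing X=1 raises the probability of Y more than setting X=1 does, because the observation also carries information about Z.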
The beauty of the formalism is that its assumptions are mostly standard and commonsensical; the only element that is really new is the do operator, the means to represent action. And such an operator seems necessary in any case. Experimental scientists have known since the Enlightenment that observation affords different inferences than experiment, and an experiment is no more than a careful intervention in which one or more independent variables are set to values. So some way of representing intervention has always been needed. And everything else just falls out; no additional assumptions are needed to represent counterfactual inference. A counterfactual is merely an intervention on a model, rather than an intervention in the actual world. Pearl calls causal Bayes nets “oracles for intervention.” He could just as well have called them “oracles for imagined interventions.” These ideas have transformed the study of artificial intelligence and machine learning, computational linguistics, and cognitive science, and they have transformed how commercial enterprises go about mining, storing, and analyzing data.
The articles in this special issue have been carefully selected to show off the range and power of the causal Bayes nets framework. They include formal analyses by philosophers, computer scientists, and linguists, as well as more empirically motivated pieces by psychologists. They range from discussions of causality in thought to conditional inference to ascriptions of responsibility.
1. Actual cause and responsibility
One of the issues that causal models give purchase on is causal attribution. If an event occurs, what should we say its cause is? This question arises for legal issues (e.g., what caused the car accident—the ice, the tires, or the driver?), for moral questions (e.g., who caused the little boy to cry?), for economic questions (e.g., what is the cause of today's unemployment?), etc. What makes this issue difficult is that every event has multiple contributing causes and it is a challenge both to say which other events are “contributing causes” and then to pick out from that set a specific event that deserves the special status of “actual cause.”
Pearl's (2000) attempt to define contributing cause was most fully fleshed out by Woodward (2003). That is, a cause supports intervention: Roughly, A is a cause of B if an intervention that sufficiently changes the value of A would also change B. This definition stirs up a hornet's nest. For instance, can intervention be defined without appealing to causation? Woodward thinks not. And there are problems of overdetermination and pre-emption that must be considered. Also, one could argue that such a definition is fundamentally counterfactual, because the imagined intervention takes place in a world that is an alternative to the current one.
Halpern and Pearl (2005) developed a causal model representing structural equations to identify the actual cause of an event. Later efforts, including Halpern and Hitchcock (2011, unpublished data), showed that structural equations are not enough to identify an actual cause; a measure of normality is also needed. Events that are normal are not given causal responsibility to the same degree as events that are abnormal. Both oxygen and lightning may have been necessary for the fire, but oxygen's presence is normal whereas the lightning was unusual and hence we blame the lightning, not the oxygen. Halpern and Hitchcock (2011, unpublished data) propose that, like structural equations, normality itself can often be represented with causal structure because events are more normal if they obey causal laws. In their contribution to this volume, Halpern and Hitchcock discuss ways of taking advantage of the fact that causal models can do “double duty” by representing both structural knowledge and normality. In particular, they discuss ways of constructing compact representations that provide both benefits: They both represent the effects of interventions—that is, which counterfactuals are operative—and they represent a normality ordering. One of the virtues of the article is that it provides a painless yet sophisticated primer on causal models.
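The fire example can be sketched in a few lines. This is only a simplified illustration, loosely in the spirit of the normality idea described above and not the authors' formal definition: it assumes a single structural equation, assumed default ("typical") values for each variable, and a crude normality score that counts how many variables take their typical value. A candidate counts as a cause here only if flipping it changes the outcome and leads to a world at least as normal as the actual one.

```python
# All names and the scoring rule below are illustrative assumptions.

def fire(world):
    """Structural equation: fire requires both oxygen and lightning."""
    return world["oxygen"] and world["lightning"]

# Assumed typical values: oxygen is normally present, lightning normally absent.
TYPICAL = {"oxygen": True, "lightning": False}

def normality(world):
    """Crude normality score: number of variables at their typical value."""
    return sum(world[v] == TYPICAL[v] for v in TYPICAL)

def flip(world, var):
    """Return a copy of the world with one variable set to its other value."""
    return {**world, var: not world[var]}

def is_but_for_cause(var, world):
    """Would the outcome have differed had var taken its other value?"""
    return fire(world) != fire(flip(world, var))

def is_cause_given_normality(var, world):
    """But-for cause whose counterfactual world is at least as normal."""
    return (is_but_for_cause(var, world)
            and normality(flip(world, var)) >= normality(world))

actual = {"oxygen": True, "lightning": True}
for var in actual:
    print(var, is_but_for_cause(var, actual), is_cause_given_normality(var, actual))
# oxygen True False
# lightning True True
```

Both factors pass the structural test, but only the lightning survives once normality is taken into account, because a world without oxygen is less normal than the actual one.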
The article by Shpitser offers a more detailed formal introduction to Bayes nets. Although it does not address the issues of normality raised by Halpern and Hitchcock, it does take aim at some of the more challenging problems of actual causation. Events can have effects on other events via different pathways. For instance, radiation therapy can prolong life by ridding the body of one kind of cancer while causing death later by producing a different kind of cancer. Sometimes we want to know what the effect of an event is only along a specific causal path. Shpitser offers a way to represent such path-specific effects using causal models. He represents them through counterfactuals that span multiple hypothetical worlds.
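The radiation example gives a feel for what a path-specific effect is. The following is a minimal sketch under strong assumptions (a deterministic model with invented coefficients), not Shpitser's general method: the effect along one path is obtained by letting the treatment take one value along that path and another value along the remaining path, which is why the quantity refers to more than one hypothetical world at once.

```python
# Hypothetical deterministic model; function names and numbers are assumptions.

def cleared_first_cancer(radiation):
    """Path 1 mediator: radiation clears the original cancer."""
    return radiation

def second_cancer(radiation):
    """Path 2 mediator: radiation later produces a different cancer."""
    return radiation

def years_survived(cleared, second):
    """Outcome as a function of both mediators (illustrative coefficients)."""
    return 2 + 10 * cleared - 4 * second

def outcome(radiation_on_path1, radiation_on_path2):
    """Outcome when each path 'sees' its own value of the treatment."""
    return years_survived(cleared_first_cancer(radiation_on_path1),
                          second_cancer(radiation_on_path2))

baseline = outcome(0, 0)
total_effect = outcome(1, 1) - baseline          # both paths active
effect_via_clearing = outcome(1, 0) - baseline   # only path 1 active
effect_via_second = outcome(0, 1) - baseline     # only path 2 active

print(total_effect, effect_via_clearing, effect_via_second)  # 6 10 -4
```

In this linear toy model the two path-specific effects add up to the total effect; that need not hold once the mediators interact.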
To put some of these ideas to the test in the experimental laboratory, Lagnado, Gerstenberg, and Zultan consider the problem of how people assign individual responsibility in cases where multiple players contribute to a joint outcome. They evaluate a structural model of causal attribution proposed by Halpern and Pearl (2005) and an extension to responsibility assessments proposed by Chockler and Halpern (2004). On this account, an individual is fully responsible for an outcome if his or her action is pivotal for the outcome; that is, if the outcome counterfactually depends on the action. Lagnado et al. show, however, that an individual can be partially responsible even when an outcome is over-determined. In this case the individual's responsibility decreases with the number of changes that would be required to turn the actual world into a counterfactual world where the individual's contribution is pivotal. In general, they find support for the structural model theories, although they also find that a complete account of responsibility attribution requires some new ideas. For instance, the responsibility attributed to an individual depends not just on how pivotal he or she proved to be after the fact but also on how critical his or her role was deemed to be prior to the event.
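The structural account of responsibility can be illustrated with a simple vote. The sketch below is a simplified reading of the Chockler and Halpern (2004) measure, in which responsibility is 1/(N+1) and N is the minimal number of changes to other players' actions needed to make the individual pivotal. The majority-vote setting, the function names, and the brute-force search are assumptions made for illustration.

```python
from itertools import combinations

def outcome(votes, threshold):
    """The motion passes if at least `threshold` players vote yes (1)."""
    return sum(votes) >= threshold

def responsibility(votes, threshold, i):
    """1 / (N + 1), where N is the fewest changes to other players' votes
    that keep the outcome the same but make player i pivotal; 0 if none."""
    actual = outcome(votes, threshold)
    others = [j for j in range(len(votes)) if j != i]
    for n_changes in range(len(others) + 1):
        for changed in combinations(others, n_changes):
            world = list(votes)
            for j in changed:
                world[j] = 1 - world[j]
            if outcome(world, threshold) != actual:
                continue  # the contingency must preserve the actual outcome
            flipped = list(world)
            flipped[i] = 1 - flipped[i]
            if outcome(flipped, threshold) != actual:
                return 1 / (n_changes + 1)  # player i is pivotal here
    return 0.0

print(responsibility([1, 1, 0], 2, 0))  # 1.0  (pivotal in the actual world)
print(responsibility([1, 1, 1], 2, 0))  # 0.5  (over-determined: one change needed)
print(responsibility([1, 1, 0], 2, 2))  # 0.0  (the no-voter never becomes pivotal)
```

With three yes votes and a threshold of two, no single voter is pivotal, yet each receives partial responsibility because one change to another vote would make him or her pivotal.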
In his article, Spohn points out that conditional inference is everywhere; language is replete with conditional operators. The prototype is “if, then,” but there are others like “although” (“I'll go to New Jersey, although I'd prefer not to”) and “because” (“they stopped the search because it was too dark”). Causal model theory offers a profound new perspective on certain conditionals, those expressing probabilistic relations between variables in a causal structure like “if there were rain, then I might not have to wash my car.” Starting with the seminal work of Adams (1965), the most popular view of conditionals has become probabilistic. Due to its theoretical centrality, Edgington (1995) labeled the idea that the probability of a conditional is a conditional probability:
P(if p then q) = P(q|p)
as “The Equation.” This view dominates psychological theorizing about conditionals (Evans & Over, 2004; Oaksford & Chater, 2007). There is now in fact a lot of evidence supporting the general view that conditional inference is closely related to conditional probability. Many experiments have asked people to judge the probability that “if p then q” and have found that people answer by reporting the conditional probability P(q|p) (e.g., Over, Hadjichristidis, Evans, Handley, & Sloman, 2007).
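A small calculation shows why it matters which probability people report. The sketch below assumes a hypothetical joint distribution over the truth values of p and q and compares the conditional probability P(q|p) with the probability of the material conditional (not-p or q), the quantity a purely truth-functional reading of “if p then q” would suggest. The numbers are invented for illustration.

```python
# Hypothetical joint distribution over (p, q); the four values sum to 1.
JOINT = {
    (True, True): 0.3,
    (True, False): 0.1,
    (False, True): 0.2,
    (False, False): 0.4,
}

def conditional_probability(joint):
    """P(q | p) = P(p and q) / P(p)."""
    p_true = sum(pr for (p, _), pr in joint.items() if p)
    return joint[(True, True)] / p_true

def material_conditional_probability(joint):
    """P(not-p or q): probability that the material conditional is true."""
    return sum(pr for (p, q), pr in joint.items() if (not p) or q)

print(round(conditional_probability(JOINT), 4))           # 0.75
print(round(material_conditional_probability(JOINT), 4))  # 0.9
```

The two readings come apart whenever p is sometimes false, since every not-p case counts in favor of the material conditional but is ignored by the conditional probability.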
The view finds its motivating force in the Ramsey test, the claim that, to evaluate a conditional statement, you should introduce the antecedent of the conditional into your current stock of beliefs, adjust your degrees of belief in the most natural and conservative manner to conform to this change, and then examine the probability of the consequent in this new environment. Presumably, one would go through the same series of steps to evaluate a conditional probability, suggesting that probabilities of conditional statements and conditional probabilities are identical.
The causal model framework takes very naturally to these two turns—the move to probabilistic representation and the Ramsey test. Causal models are representations of probability. Causal models are all about making inferences about some variables or events conditional on other variables or events, and they offer a means to make such inferences by either conditioning on how the world actually is or by conditioning on how one or more alternative worlds might be. And the Ramsey test can be viewed as an imaginary intervention on a causal model, one in which the event or variable conditionally believed is assumed true and probabilities of everything else are then updated.
In his article, Spohn offers a different, nonprobabilistic theory of conditionals. Like the probabilistic position, it assumes that conditional statements are related to conditional beliefs, but he represents conditional beliefs in a different way, in terms of ranking theory. This gives him the freedom to represent a variety of conditionals that emanate from different expressive relations. Not only can he cover cases that are well described by the Ramsey test, he describes a variety of conditionals that express other logical schemes. For instance, he discusses conditionals that assert claims of relevance of the antecedent for the consequent. The theory offers a distinct alternative to current approaches and is rich in the variety of conditional statements it can represent. One of its virtues is that it is not truth-conditional and therefore does not suffer from the paradoxes of material implication, yet it does not dismiss the truth-conditional view of conditionals. Truth conditions are given a clear and precise place in Spohn's theory.
With respect to counterfactual conditionals, Pearl (2000) offered one of the first algorithmic theories of counterfactual conditional inference. The basic idea is that counterfactuals involve three steps:
- Update your model of the world by observing what is going on.
- Use the do operator to change the model to reflect the counterfactual assumption being made.
- Update the probabilities within the modified model and report the probability of the variable or event requested.
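The three steps above, often labeled abduction, action, and prediction, can be sketched on a tiny structural model. Everything in the example is an illustrative assumption: binary exogenous variables with made-up priors, the equations X = U_x and Y = (X and U_a) or U_b, and exact enumeration in place of any efficient inference algorithm.

```python
from itertools import product

# Hypothetical priors P(U=1) for the independent exogenous variables.
P_U = {"u_x": 0.5, "u_a": 0.8, "u_b": 0.2}

def solve(u, do=None):
    """Evaluate the structural equations; `do` overrides an equation."""
    do = do or {}
    v = {}
    v["x"] = do.get("x", u["u_x"])
    v["y"] = do.get("y", int((v["x"] and u["u_a"]) or u["u_b"]))
    return v

def prior(u):
    """Prior probability of one assignment to the exogenous variables."""
    p = 1.0
    for name, val in u.items():
        p *= P_U[name] if val else 1 - P_U[name]
    return p

def counterfactual(evidence, do, query):
    """P(query would hold under do(...) | evidence observed in the actual world)."""
    # Step 1 (abduction): keep the exogenous settings consistent with the evidence.
    posterior = {}
    for bits in product((0, 1), repeat=len(P_U)):
        u = dict(zip(P_U, bits))
        v = solve(u)
        if all(v[k] == val for k, val in evidence.items()):
            posterior[bits] = prior(u)
    total = sum(posterior.values())
    # Step 2 (action): re-solve each retained setting with the do operator applied.
    # Step 3 (prediction): add up the normalized weight of settings where the query holds.
    prob = 0.0
    for bits, weight in posterior.items():
        v = solve(dict(zip(P_U, bits)), do=do)
        if all(v[k] == val for k, val in query.items()):
            prob += weight / total
    return prob

# Having seen x=1 and y=1: would y still be 1 had x been 0?
print(round(counterfactual({"x": 1, "y": 1}, {"x": 0}, {"y": 1}), 4))  # 0.2381
# Plain intervention with no evidence: P(y=1 | do(x=0)).
print(round(counterfactual({}, {"x": 0}, {"y": 1}), 4))                # 0.2
```

The counterfactual answer differs from the purely interventional one because the first step carries what was observed in the actual world into the modified model.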
Sloman and Lagnado (2005) tested this model of counterfactual inference and found that it did a reasonable job for “interventional conditionals” in which the counterfactual antecedent was explicitly stated as an intervention from outside the causal system at hand. But its ability to describe other kinds of counterfactual inferences was spotty. Rips and Edwards in their study take up the challenge. They use counterfactual conditional inferences without explicit interventions to compare Pearl's theory to an alternative that they refer to as minimal network theory (Hiddleston, 2005). The idea of minimal network theory is that people do not prune their causal models using the do operator, but rather make inferences based on the causal model closest to the current situation that retains the original causal structure. They consider both deterministic and probabilistic situations, and they evaluate both inferences and people's explanations for their inferences. Their data come out in favor of minimal network theory, posing a challenge to the psychological validity of Pearl's pruning theory of counterfactual inference.
In his article, Kaufmann reveals the power of Pearl's ideas about intervention and counterfactuals. He shows how they can be incorporated into the most widely disseminated linguistic theory of counterfactuals, Kratzer-style pragmatics. Specifically, he shows how to implement Pearl's theory of counterfactuals in the framework of Premise Semantics theory (Kratzer, 1981). This article thus opens the door to building a common framework for describing counterfactuals as linguistic entities and as objects of belief.
In the final article, Chater and Oaksford argue for the breadth of the idea that causality is central to human cognition and computation more generally. They show that, like causal Bayes nets, computer programs can also be construed as oracles for intervention (Halpern and Hitchcock also point out the analogy between structural equation models and computer programs). Like Sloman (2005), they paint Pearl's (2000) picture in which mental representation is the ability to represent causal structure as implied by which counterfactuals we can and cannot imagine. Chater and Oaksford argue that perceptual processes and even aspects of language can be painted with the same strokes. They argue for a computational view of mind that distinguishes algorithm and data structure, in which programs define a set of counterfactuals supported by interventions on the data structure (not the algorithm). On their view, a program encodes a family of counterfactuals; representational claims about the mind are really a set of counterfactuals.
Pearl's idea is that we gain a lot by building models that describe the causal structure that generates events in a probabilistically coherent way. We gain in predictive power because causal models capture the properties that actually produce events. But what is unique about causal models is that they allow us to ask questions that we would not have otherwise been able to ask. These include questions about counterfactual possibilities, like whether the same number of people would have died if some obstacle had been put in the killer's path on some prior occasion. They also include questions about the degree to which one event caused another along a particular causal pathway. The answers we get to such questions are only as good as the causal structure that we have evidence for, but we cannot expect to do better than our capacity to understand the structure of the environment that we are making claims about.
The papers in this special issue are a testament to the integrative potential of Pearl's framework. The framework unites a variety of theoretical frameworks, from structural equation modeling to Bayesian updating to graphical representations of uncertainty to formal pragmatics in a simple, powerful, and rigorous way. And by virtue of its integrative power, it applies to every domain of cognitive science, from those represented in this special issue like language, causal and blame attribution, and perception to others not represented, like categorization and decision making. Indeed, several authors have noted that the ability to represent intervention sheds a new light on how to model decision making because choice itself can be construed as an intervention. Choice is the leading edge of all reasoned action. To the extent that decisions are implemented through such action, they are interventions—at least from the decision-maker's perspective (they are certainly not observations). This has not been missed by philosophers like Nozick (1993) and Joyce (1999). And the fact that people understand decision making this way has been discussed by Sloman and Hagmayer (2006) and demonstrated by Hagmayer and Sloman (2009).
An open question that we have begun to address in this special issue is whether people make inferences in the way suggested by the structural approach. Do causal Bayes nets describe human reasoning potential? The papers are not univocal on this issue. While all of them subscribe to the centrality of causal reasoning and admit the importance of intervention, we still do not have a complete theory of human counterfactual inference. Just as we have known for a long time that human judgment is not probabilistically coherent and decision making is not entirely rational, it should come as no surprise that a normative model will not have the final say in describing human counterfactual inference.