The Issue: Why Bayesian Rationality in Economics?
- Top of page
- Abstract
- The Issue: Why Bayesian Rationality in Economics?
- The Explorer: Wald's Economic Approach to Statistics
- The Ingredients: From Sequential Testing to Statistical Decision Functions
- The Legacy: Savage on Wald
- The Strategy: FS's Game Plan
- The Achievement: SEUT
- The Challenge: FS's Second Part
- The Fiasco: FS's Abrupt Ending
- The Puzzle: Why in Economics?
- The Conjectures: From Consistency to Business Schools
- Acknowledgments
- References
- Biography
Rational behavior is the cornerstone of the dominant paradigm of postwar economics, the so-called neoclassical approach. In the neoclassical vision economic agents, be they individuals, households, or firms, make optimal choices, independently solving their own decision problems under the constraints of scarce resources and potential conflicts with each other's choices. Market prices are the signals, which tell agents if and how their conflicting solutions to decision problems can be reconciled in a market equilibrium. Modern neoclassical economics may thus be defined as the study of aggregate phenomena as equilibrium outcomes of individual choices (Bisin, 2011).1
Two broad research themes engage neoclassical economists: the welfare properties of the market equilibria and the way agents make their choices. The fundamental assumption in the latter area is rationality. A rational agent, or homo economicus, is someone who consistently pursues the maximization of his own payoff measured in some utility scale. In short, a homo economicus is a consistent utility maximizer.
While this notion of rationality dates back to the late nineteenth century, it was only after WWII that neoclassical economists achieved a complete characterization of rational behavior when decisions are taken under conditions of uncertainty—that is, the situation under which most real choices are made. The postwar version of the neoclassical paradigm equates economic rationality with Bayesian rationality, as embodied for example in the core areas of game and decision theory. Hence, the homo economicus is modeled today by most mainstream economists as a Bayesian decision maker.
Nobel Prize winner Roger Myerson (1991) explains Bayesian rationality as follows: “Any rational decision-maker's behavior should be describable by a utility function, which gives a quantitative characterization of his preferences for outcomes or prizes, and a subjective probability distribution, which characterizes his beliefs about all relevant unknown factors. Furthermore, when new information becomes available to such a decision-maker, his subjective probabilities should be revised in accordance with Bayes's formula” (p. 5; original emphasis).2 This passage highlights the three main tenets of Bayesianism: (1) uncertainty is captured by probability (whenever a fact is not known, the decision maker should have probabilistic beliefs about it); (2) information is captured by conditioning probabilities (the decision maker should update her prior beliefs according to Bayes's rule as new information arrives); (3) agents follow the expected utility rule (the chosen alternative should maximize the probability-weighted average of utilities).3
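The three tenets can be illustrated with a minimal sketch. The states, signals, probabilities, and utilities below are hypothetical numbers chosen only for illustration; they do not come from Myerson or Savage.

```python
from fractions import Fraction

def bayes_update(prior, likelihood, signal):
    """Tenet 2: return P(state | signal) from P(state) and P(signal | state)."""
    joint = {s: prior[s] * likelihood[s][signal] for s in prior}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}

def best_action(beliefs, utility):
    """Tenet 3: return the action with the highest expected utility."""
    def expected_utility(action):
        return sum(beliefs[s] * utility[action][s] for s in beliefs)
    return max(utility, key=expected_utility)

# Tenet 1: uncertainty about the state is expressed as a probability (hypothetical values).
prior = {"rain": Fraction(3, 10), "dry": Fraction(7, 10)}
likelihood = {
    "rain": {"clouds": Fraction(4, 5), "clear": Fraction(1, 5)},
    "dry": {"clouds": Fraction(3, 10), "clear": Fraction(7, 10)},
}
utility = {
    "umbrella": {"rain": 5, "dry": 3},
    "no_umbrella": {"rain": 0, "dry": 6},
}

posterior = bayes_update(prior, likelihood, "clouds")
print("choice under prior:", best_action(prior, utility))
print("posterior:", {s: str(p) for s, p in posterior.items()})
print("choice under posterior:", best_action(posterior, utility))
```

With these numbers the agent leaves the umbrella at home under the prior and takes it after observing clouds, since the posterior probability of rain rises from 3/10 to 8/15.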
The modern characterization of rationality is more exacting than the traditional one in terms of the decision-maker's abilities to process the available information and to exploit it to make the “right” choice. This is no surprise, given that decisions under uncertainty are usually trickier than those made under certainty. That real economic agents may actually be able to take the former kind of decisions with the same degree of confidence and precision as the latter is not warranted. A huge experimental literature has questioned the empirical validity of the Bayesian approach to rationality, and the debate is still far from settled.4 What is therefore surprising is that, notwithstanding the obvious relevance of the issue, historians of modern economics have so far under-investigated the transition from the relatively straightforward, nineteenth-century notion of homo economicus as a self-interested utility maximizer to the more sophisticated one of a Bayesian decision maker. Indeed, we still do not know exactly how, when, and why this transition took place.
A reason for the neglect may be that the answer is trivial. Credit should be given to the theory developed by Leonard J. Savage in his celebrated 1954 volume, The Foundations of Statistics (Savage, 1972; FS henceforth). In the book, Savage successfully combined a personalistic notion of probability with Bayes's rule and the axiomatic method to develop his subjective expected utility theory (SEUT henceforth), which became the new orthodox characterization of economic rationality. SEUT and Bayesianism then played a key role in the late 1970s—early 1980s boom of game theory. In the hands of would-be Nobelist John Harsanyi and others, Savage's decision theory became the logical underpinning of the new orthodoxy in the field, Bayesian game theory, so much so that the latter may be considered the former's extension to a multiagent setting. Hence, both parametric and strategic rationality—the twin core tenets of neoclassical economics—are nowadays founded upon Savage's SEUT and share the three main tenets of Bayesianism.
The “how, when and why” question may thus look like a no-brainer—a settled issue devoid of further historical interest. The problem is that, historically speaking, the emergence of Bayesianism and SEUT as the characterization of economic rationality was hardly warranted, in view of Savage's real goal in FS—transforming traditional statistics into a behavioral discipline—and actual achievement—his conclusion that the transformation was unattainable. The aim of the present paper is to tell the story of that goal and that achievement, that is, of how a self-recognized fiasco eventually became an unintended triumph. Acknowledging this story reveals that the metamorphosis of the traditional homo economicus into a Bayesian decision maker was far from inevitable. Though the paper does not provide a complete answer to the “how, when and why” of Bayesianism in economics,5 it will hopefully set the stage for viewing that question as a historically serious one and as a potential contribution to the ongoing debate about the meaning of rationality in economics and other social sciences.
To introduce my main thesis, let us start from Jacob Marschak's 1946 review of von Neumann and Morgenstern's classic, the 1944 Theory of Games and Economic Behavior. In that review Marschak focused, like many other commentators, on John von Neumann's far-from-intuitive notion of mixed strategies and remarked that, by embodying that notion in his characterization of rational strategic behavior, von Neumann's theory ended up requiring that “to be an economic man, one must be a statistical man.” Yet, in a footnote Marschak noted that, in the same years, the renowned statistician Abraham Wald was working on a new kind of statistical decision theory, according to which “being a statistical man implies being an economic man.” While von Neumann was requiring the homo economicus to make inferences like a proven statistician, Wald was suggesting that statisticians should embrace an economic way of reasoning (Marschak, 1946, p. 109, text and fn.14).
This passage in Marschak's review captures the gist of the present paper. I will argue that Savage's 1954 project was precisely to accomplish, by way of the axiomatic method, Wald's part of Marschak's remark, that is, to teach statisticians how to behave as rational economic agents. And I will also show that, while he substantially failed in that project, he eventually, and unintentionally, ended up strengthening von Neumann's part of the remark, transforming the economic agents within neoclassical models into fully fledged Bayesian statisticians. In short, the paper reveals that the Bayesian turn of economic rationality was a by-product of an unsuccessful research program aimed at a different field. While SEUT became a core ingredient of neoclassical economics, most statisticians neglected Savage's Bayesianism, and they still do, inasmuch as traditional (frequentist) inference methods are still predominant in modern statistics.6
The latter attitude is surprisingly shared by most contemporary neoclassical economists who, in their econometric work, still apply the estimation techniques of classical (frequentist) statistics—the surprising aspect being that, in their analytical work, those same economists assume that rational agents behave differently, that is, as Bayesian statisticians. This little paradox—which only a historical assessment may reveal7—offers much to ponder about the empirical validity of a discipline whose practitioners follow in their professional routine different behavioral principles than those they impose upon the agents populating their models.
I will begin by setting out in some detail Wald's economic approach to statistics and the basic elements of his analysis in the next two sections. In the following section, I will examine Savage's explicit goal of fulfilling the project Wald had left unaccomplished. The subsequent three sections form the core of the paper and contain an elaborate analysis of Savage's research strategy in FS, from his great achievement, the derivation of SEUT, to his failure to implement Wald's project. The last two sections offer a pair of tentative answers to the “how, when and why” question, focusing on the key theoretical notion of consistency and on the new postwar requirements of MBA programs.
The Explorer: Wald's Economic Approach to Statistics
The name of Abraham Wald is commonly associated with the rise of behavioral statistics on account of his work on statistical decision theory.8 His basic intuition was that statistics is nothing but the science of decision making under uncertainty. Statistical problems should be considered as special instances of general decision problems, where decisions have to be taken in the face of uncertainty. A solution to a statistical problem should therefore instruct the statistician about what to do, that is, what particular action to take, not just what to say. This approach was dubbed by Wald as inductive behavior, following the similar expression used in 1937 by Jerzy Neyman.9
The key insight relating Wald's approach to modern economics is that the decision model he developed in several statistical works also provides a setup for analyzing individual behavior in mundane problems of everyday life. Generally speaking, a statistical decision problem (SDP henceforth) arises when a set of alternative decisions exists and the statistician's preference over them, as measured by some expected benefit function, depends on an unknown probability distribution.10 The central issues in any SDP are, first, choosing what “experiment” to perform in order to extract information from the available data and, then, choosing what action to take (say, continuing experimentation or taking a final decision) given the experiment's outcome.
Wald's decision model highlights these very issues. Its four components are: (1) the available actions; (2) the states of the world, one of which is the true, unknown one (so-called parameter space); (3) the loss function, measuring the loss to the statistician if he takes a certain action when the true state is a given one;11 (4) an experiment, whose goal is to help the statistician to reduce the loss and whose results (called observations) depend on the true state (see Ferguson, 1976). A decision function is a rule associating an action to each possible experimental outcome. The available decision functions are evaluated according to the expected loss their adoption may cause under the various possible states. The statistician's task is to choose the decision function capable of minimizing, in some specified sense, the expected loss.
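A minimal sketch of the four components and of the evaluation of decision functions by expected loss follows. The two-state acceptance problem, the loss values, and the observation probabilities are hypothetical and chosen only for illustration; they are not taken from Wald or Ferguson.

```python
from fractions import Fraction
from itertools import product

# All names and numbers below are hypothetical, chosen only for illustration.
actions = ["accept", "reject"]      # (1) available actions
states = ["good", "bad"]            # (2) states of the world (parameter space)
outcomes = ["pass", "fail"]         # possible observations from the experiment

# (3) loss function: loss[(state, action)]
loss = {
    ("good", "accept"): Fraction(0), ("good", "reject"): Fraction(1),
    ("bad", "accept"): Fraction(4), ("bad", "reject"): Fraction(0),
}

# (4) experiment: probability of each observation given the true state
prob = {
    "good": {"pass": Fraction(9, 10), "fail": Fraction(1, 10)},
    "bad": {"pass": Fraction(3, 10), "fail": Fraction(7, 10)},
}

# A decision function maps every possible observation to an action.
decision_functions = [
    dict(zip(outcomes, choice))
    for choice in product(actions, repeat=len(outcomes))
]

def risk(state, rule):
    """Expected loss of following `rule` when `state` is the true state."""
    return sum(prob[state][x] * loss[(state, rule[x])] for x in outcomes)

# Risk of every decision function under every state.
risk_table = {
    tuple(rule[x] for x in outcomes): {s: risk(s, rule) for s in states}
    for rule in decision_functions
}

for rule, risks in risk_table.items():
    print(rule, {s: str(r) for s, r in risks.items()})
```

No single row of the resulting table is lowest in both columns, which is why the text speaks of minimizing expected loss “in some specified sense”: a further criterion is needed to rank the decision functions.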
Wald developed this setup in several works, which represented the benchmark for Savage's later research. The seminal one was “Contributions to the theory of statistical estimation and testing hypotheses” (Wald, 1939), a paper that already contained most of the main ideas, such as the general decision problem and the loss function. The work also featured two principles that were later to play a crucial role in Wald's analysis, namely, the minimax principle and Bayes's rule.12 The solution of a SDP (what Wald called “the determination of the region of acceptance” of a given hypothesis) was made to depend on two circumstances: first, that not all errors (i.e., taking a certain action, like accepting a given hypothesis, when the true, yet unknown, state is a given one) are equal, and, second, that the statistician may have an a priori probability distribution over the parameter space (Wald, 1939, p. 301). The first circumstance was accounted for by introducing a weight function, which expressed the relative importance of the possible errors. Wald emphasized that the choice of the specific shape of the weight function was not a matter of either mathematics or statistics. Often the importance of errors could be expressed in monetary terms, so much so that the function measured the monetary loss of taking a certain action when a certain state was the true one.13
As to the second circumstance, Wald explicitly rejected Bayesianism as a philosophy of probability and mentioned many objections against the use of a priori probabilities. Solving a SDP had therefore to be independent of the availability of an a priori distribution. Yet, the existence of such a distribution was a very useful analytical tool: “The reason we introduce here a hypothetical probability distribution of [states of the world] is simply that it proves to be useful in deducing certain theorems and in the calculation of the best system of regions of acceptance” (Wald, p. 302). What we see here is the first instance of Wald's instrumental approach to Bayesianism, a recurring theme in his later works and a key point to understand the real goal of Savage's 1954 project.14
Having defined the risk function as the expected loss of selecting a certain “region of acceptance,” Wald proposed the minimax principle as a general solution for a SDP, that is, as a rule to choose the region of acceptance under a given weight function. He argued that, whenever we decide not to take into consideration an a priori probability distribution over the parameter space, “it seems reasonable to choose that [region of acceptance] for which [the maximum risk] becomes a minimum” (Wald, 1939, p. 305). Thus, as early as 1939, Wald advocated the minimax as a reasonable solution criterion. Both the minimax and Bayes's rule were singled out for their expediency in deriving analytical results, but it was only for the minimax that Wald suggested that an explicit justification could be found for employing it as a concrete SDP solution.
Though several elements were still missing, above all the idea that the design of the experiment also be part of a SDP,15 the 1939 paper shows Wald's awareness that his ideas might be used to build a unified general theory of statistics, as well as to solve explicit statistical problems. The stage was set for the further steps in his program, which were favored by two external events. The first was the publication in 1944 of the Theory of Games and Economic Behavior, which suggested to him the key insight of reinterpreting a SDP as a special case of von Neumann's two-person zero-sum games (2PZSGs). The second was his being presented with a specific instance of a real-world SDP, in the form of quality control of warfare supplies.
The latter event took place in early 1943, when a U.S. Navy Captain, Garret L. Schuyler, complained to economist and statistician Allen Wallis about the excessive size of the sample required for comparing percentages in ordnance testing.16 The case was that of judging alternative methods of firing naval shells. Captain Schuyler claimed that “a wise and seasoned ordnance expert […] would see after the first few thousand, or even few hundred, [rounds] that the experiment need not be completed, either because the new method is obviously inferior or because it is obviously superior.” Why, he grumbled, did statisticians demand “an experiment which is so long, which is so costly and which uses up so much of your shells that you've lost the war before you get the test over?” (see Wallis, 1980, p. 325; Klein, 2000, p. 47). Why not take a “more economic” approach to testing warfare equipment, one that could at the same time minimize experiment costs and ensure adequate sampling for proper quality control?
As we learn from the historical note in Wald (1945a), Wallis, together with future Nobelist Milton Friedman, tried to answer Captain Schuyler's challenge by conjecturing that there might exist a sequential test capable of controlling type I and type II errors17 as effectively as the ordinary most powerful tests, while requiring a smaller expected number of observations. “It was at this stage that the problem was called to the attention of the author of the present paper. […] In April 1943 the author devised such a test, called the sequential probability ratio test” (Wald, 1945a, p. 121).18
The Ingredients: From Sequential Testing to Statistical Decision Functions
A sequential test is defined by Wald as any kind of statistical procedure that gives a specific rule for taking, at any stage of the experiment, one of the following three actions: accept the hypothesis being tested, reject it, or continue experimentation by making an additional observation (Wald, 1945a, p. 118). The crucial feature of a sequential test is therefore that the number of observations is not predetermined, but is itself a random variable, given that at any stage of the experiment the decision to terminate the process depends on the result of previous observations. This was a big change from standard testing procedures, which required a fixed number of trials to be specified ex ante. Sequential procedures greatly economized on the number of observations, thereby answering Captain Schuyler's complaints.
The analytical goal of sequential testing is to minimize the number of observations required to reach a decision about acceptance or rejection of a given hypothesis under the desired test power. In the 1945 paper, Wald did not manage to build an optimal test, that is, one minimizing the expected number of observations both when the given statistical hypothesis is true and when its negation is true. Yet, he provided a substitute for the optimal test, and the proxy was, once again, offered by the minimax logic. He claimed that, when no a priori knowledge exists of how frequently the given hypothesis or its negation is true in the long run, “it is perhaps more reasonable to minimize the maximum of [expected number of observations]…” (Wald, 1945a, p. 124).
The main tool developed by Wald under this logic was the sequential probability ratio (SPR) test, a testing procedure based on an expected number of observations considerably smaller than in standard most powerful tests for any desired level of control of type I and type II errors. Crucially for our story, the SPR test was explicitly founded upon Bayes's rule: it required updating any a priori probability the experimenter might entertain about the truthfulness of a given hypothesis with the new information arising from experimental observations. At any stage of the experiment the updating warranted that one of three actions be taken, namely, either accept or reject the hypothesis or continue with one more observation.
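The three-way stopping rule can be sketched for the simplest case of a Bernoulli sample. This is a minimal sketch in the standard textbook likelihood-ratio form with Wald's approximate thresholds, not a reconstruction of the 1945 paper's derivation; the function name, the defect rates `p0` and `p1`, and the default error levels are assumptions introduced for illustration.

```python
import math

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Sequential probability ratio test of H0: p = p0 against H1: p = p1.

    alpha and beta are the target type I and type II error probabilities.
    Returns (decision, number_of_observations_used).
    """
    upper = math.log((1 - beta) / alpha)   # reaching it rejects H0
    lower = math.log(beta / (1 - alpha))   # reaching it accepts H0
    llr = 0.0                              # cumulative log-likelihood ratio
    n = 0
    for n, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
    # Data exhausted before either boundary was reached: keep sampling.
    return "continue", n

# Hypothetical inspection data: 1 marks a defective item, 0 a sound one.
print(sprt_bernoulli([1, 1, 1], p0=0.1, p1=0.3))
print(sprt_bernoulli([0] * 20, p0=0.1, p1=0.3))
print(sprt_bernoulli([0, 1, 0], p0=0.1, p1=0.3))
```

In the second call the test stops after 12 of the 20 available observations, which illustrates the point of the procedure: the sample size is an outcome of the data rather than a number fixed in advance.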
By 1943, Wald had therefore devised a brand new approach to the testing of statistical assumptions. It was based on strict economic logic in that it economized over the experiment's costs and asked the experimenter to behave as an economic agent and take at each stage an optimal action (i.e., whether to endorse a certain hypothesis or to continue experimentation). In the latter respect, sequential testing was a direct application of the behavioral logic introduced in the 1939 paper—indeed, it proved the logic might bring operational results. Yet, the new procedure went beyond the seminal paper in that it got rid of the traditional single-stage experiment constraint, explicitly allowing for multistage experimentation, where the behavioral element was even more crucial.
In the following years, Wald pursued three different research lines: (1) the construction of usable sequential testing procedures; (2) the solution of specific SDPs; (3) the development of a general theory of statistical decision. It is this third, and most important, branch that matters to us.
The final ingredient in Wald's statistical decision theory came by acknowledging the formal overlap between his SDP and von Neumann's 2PZSGs. We know that in a SDP the experimenter wishes to minimize the risk function r(F,δ), that is, the expected loss that adopting a certain decision function δ might cause when the true distribution of the parameter space is F. Risk depends on two variables, yet the experimenter can choose only one of them, δ, and not the other, the true distribution F. The latter is chosen by Nature and the choice is unknown to the experimenter. Wald realized that the situation was very similar to a 2PZSG, with Nature playing the role of the experimenter's opponent. Thus, in Wald (1945b) we find the first formulation of a SDP as a “game against Nature,” an approach that would enjoy considerable popularity in the following years and would shape much of postwar decision theory.
Like in von Neumann's games, the solution to the SDP-turned-2PZSG came from the minimax logic: “Whereas the experimenter wishes to minimize the risk r(F,δ), we can hardly say that Nature wishes to maximize r(F,δ). Nevertheless, since Nature's choice is unknown to the experimenter, it is perhaps not unreasonable for the experimenter to behave as if Nature wanted to maximize the risk” (Wald, 1950a, p. 27).19 In this framework, “a problem of statistical inference becomes identical with a zero sum two person game” (Wald, 1945b, p. 279). Yet again, Wald's commitment to minimax was instrumental. What really mattered to him was that, even without endorsing their underlying logic,20 both the theory of 2PZSG and the minimax solution were crucial for the analytics of statistical decision theory.
In a 1947 paper, Wald provided the first complete and truly general formulation of a SDP. He also demonstrated the complete class theorem, the crucial result upon which most of his later theory is founded. Having defined a statistical decision function as a rule associating each sequence of observations with a decision to accept a given hypothesis about an unknown distribution (Wald, 1947, p. 549), the theorem claims that the class of Bayesian decision functions (that is, of decision functions based on the existence of an a priori probability over the unknown distribution and on the updating of that probability according to Bayes's rule) is complete (Wald, 1947, p. 552). This means that for any non-Bayesian decision function that can be used to solve a given SDP, there always exists a Bayesian decision function which, for all possible a priori distributions, is at least as effective at minimizing the risk function, that is, for which the risk is never larger.
With all the necessary ingredients at hand, Wald was eventually able to present in a compact form the fruits of his decade-long research in the 1950 volume, Statistical Decision Functions (Wald, 1950a; SDF henceforth). The book states from the beginning the motivation behind the whole project, namely, setting statistical theory free of the restrictions which marred it “until about ten years ago” (SDF, p. v).21 These were, first, that experimentation was assumed to be carried out in a single stage and, second, that decision problems were restricted to the two special cases of hypothesis testing and point/interval estimation. Wald's theory avoided both restrictions, allowing for multistage experiments and general multidecision problems. Any instance of the latter was a problem of inductive behavior, because a statistician's choice of a specific decision function uniquely prescribes the procedure she must follow for carrying out her experiments and making a terminal decision (SDF, p. 10). More than ever, the behavioral character of statistics was at the core of Wald's full theory. His new decision-theoretic (or, as Marschak would say, “economic”22) approach represented a significant generalization of standard statistics.
The 1950 book was in many respects simply an outgrowth of Wald's previous papers, though it was said to contain “a considerable expansion and generalization of the ideas and results obtained in these papers” (SDF, p. 31). For our aims, what matters is how Wald dealt with the crucial issue of how an experimenter may judge the relative merit of any given decision function. Two features matter for this judgment: the cost of the experiment and the experimenter's relative degree of preference over the various possible decisions when the true state of the world is known (SDF, p. 8). Experiment costs depend on the chance variable selected for observation, on the actual observed values, and on the number of stages in which the experiment has been carried out. The loss suffered by making a given terminal decision d when F is the true distribution of the parameter space is captured by the weight function W(F,d).23 This function is always among the data of a SDP, but the big issue is in many cases how to attach values to it, that is, how to measure losses. The sum of the expected value of W(F,d) and the expected cost of experimentation gives the risk function r(F,δ), where δ is the specific decision function adopted by the experimenter. The merit of any given decision function for purposes of inductive behavior may thus be entirely evaluated on the basis of the risk associated with it (SDF, p. 12). The complete class theorem of Wald (1947) then allows one to conclude that, if an a priori distribution ξ exists and is known to the experimenter, a decision function for which the average risk (the average being calculated using Bayes's updating rule) takes its minimum value may be regarded as an optimum solution. In fact, a decision function δ0 that minimizes this average risk over all possible δ is called a Bayes solution relative to the a priori distribution ξ (SDF, p. 16).
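The definitions in this paragraph can be written compactly. The following is only a sketch in the paragraph's own symbols: the cost term c and the notation r*(ξ,δ) for the average risk are introduced here for illustration and are not taken from SDF, and the minimum is assumed to exist.

```latex
\begin{align*}
r(F,\delta) &= \mathbb{E}_F\big[W(F,d)\big] + \mathbb{E}_F[c]
  && \text{(risk: expected loss plus expected cost of experimentation)} \\
r^{*}(\xi,\delta) &= \int r(F,\delta)\,\mathrm{d}\xi(F)
  && \text{(average risk relative to the a priori distribution } \xi\text{)} \\
r^{*}(\xi,\delta_0) &= \min_{\delta}\, r^{*}(\xi,\delta)
  && \text{(}\delta_0 \text{ is a Bayes solution relative to } \xi\text{)}
\end{align*}
```

Here d stands for the terminal decision reached when the decision function δ is applied to the observations, so both expectations are taken over the observations generated under F.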
Once more, Wald explicitly distanced himself from “real” Bayesianism, claiming that the a priori distribution ξ may often not exist or be unknown to the experimenter.24 As an alternative, he proposed the minimax solution: a decision function is a minimax solution of the SDP if it minimizes the maximum risk with respect to the distribution F. The “intimate connection” between Bayes and minimax solutions then warrants that under fairly general conditions a minimax solution to a SDP is also a Bayes solution (SDF, pp. 89 ff.). Wald's justification for both kinds of solution was again purely instrumental, though he added once more a timid defense of the minimax, calling it “a reasonable solution of the decision problem” in those cases where ξ does not exist or is unknown (SDF, p. 18).
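The two solution concepts can be compared on a small risk table. This is a minimal sketch under strong simplifying assumptions: the table is hypothetical, there are only two states, and only non-randomized decision functions are considered, whereas Wald's theory also admits randomized ones (which could lower the maximum risk further).

```python
from fractions import Fraction

# Hypothetical risk table: risk[rule][state] is the risk of a decision
# function ("rule") when `state` is the true one.
risk = {
    "always accept": {"good": Fraction(0), "bad": Fraction(4)},
    "accept iff pass": {"good": Fraction(1, 10), "bad": Fraction(6, 5)},
    "accept iff fail": {"good": Fraction(9, 10), "bad": Fraction(14, 5)},
    "always reject": {"good": Fraction(1), "bad": Fraction(0)},
}

def minimax_rule(risk):
    """Return the rule whose worst-case (maximum) risk is smallest."""
    return min(risk, key=lambda rule: max(risk[rule].values()))

def bayes_rule(risk, prior):
    """Return the rule with the smallest average risk under `prior`."""
    def average_risk(rule):
        return sum(prior[s] * risk[rule][s] for s in prior)
    return min(risk, key=average_risk)

optimistic = {"good": Fraction(9, 10), "bad": Fraction(1, 10)}
pessimistic = {"good": Fraction(1, 10), "bad": Fraction(9, 10)}

print("minimax:", minimax_rule(risk))
print("Bayes, optimistic prior:", bayes_rule(risk, optimistic))
print("Bayes, pessimistic prior:", bayes_rule(risk, pessimistic))
```

In this toy table the minimax rule coincides with the Bayes rule under the pessimistic prior, a loose illustration of the connection between the two kinds of solution; it is not a proof of the general result cited from SDF.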
At the 1950 International Congress of Mathematicians, Wald (1950b) concluded a concise presentation of his approach by saying that “While the general decision theory has been developed to a considerable extent and many results of great generality are available, explicit solutions have been worked out so far only in a relatively small number of special cases. The mathematical difficulties in obtaining explicit solutions, particularly in the sequential case, are still great, but it is hoped that future research will lessen these difficulties and explicit solutions will be worked out in a great variety of problems” (p. 242). The airplane crash that killed Abraham Wald and his wife in December 1950 brought an abrupt end to this research. It would be up to other scholars to continue it, and to one of them to turn it in an unexpected direction.
The Legacy: Savage on Wald
Savage's statistical work began at Columbia University's wartime Statistical Research Group (SRG), where he joined a stellar team of economists and statisticians that included, among others, Friedman, Wallis, and Wald.25 The SRG had been the recipient of Captain Schuyler's complaints and was the place where Wald had developed his analysis of sequential testing. The impact of Wald's new approach on the young statistician was considerable, as is demonstrated by the two papers that Savage co-authored while at the SRG.26 Yet, his plans were more ambitious.
At the 1949 meeting of the Econometric Society, two sessions were held under the common title “Statistical inference in decision making.” The sessions featured five papers related to Wald's research, including one by Wald himself who chaired one of the sessions. Savage was among the other presenters, with a paper titled “The role of personal probability in statistics.” That work has never been published, but its Econometrica abstract shows that by that early date Savage had already identified the core of his 1954 book. The key idea was, so to speak, “let's take Wald seriously.” That is, if statistics must become a behavioral discipline, if statistical inference is a matter of decision theory, if statisticians must behave as rational economic men, then it is necessary to characterize more rigorously what rational behavior amounts to. Only a full theory of rational behavior under uncertainty would provide—as Wald's project requires—the decision-theoretic foundations for a general theory of behavioral statistics (cf. Savage, 1949).
To reach this goal, one had to go beyond “the tendency of modern statisticians to countenance no other than the frequency definition of probability.” According to Savage, the frequentist view was responsible for “insurmountable obstacles” preventing the development of behavioral statistics—obstacles that even Wald's minimax theory had been unable to overcome, but that “may be bypassed by introducing into statistical theory a probability concept, which seems to have been best expressed by [Italian probability theorist] Bruno de Finetti….” The latter had argued that “plausible assumptions about the behavior of a ‘reasonable’ individual faced with uncertainty… [imply that] …he associates numbers with the [uncertain] events, which from the purely mathematical point of view are probabilities.” Moreover, these probabilities, which Savage called “personal probabilities,” were “in principle measurable by experiments” on the individual; their interpretation offered a well-defined characterization of how the individual should act in the face of uncertainty “in view of the von Neumann – Morgenstern theory of utility.” Unfortunately, Savage noted, de Finetti's theory “compares unsatisfactorily with others (in particular Wald's theory of minimum risk),” because it neither predicted nor demanded that a crucial feature of modern statistical analysis, deliberate randomization, be undertaken by the decision maker. Therefore, “both Wald's and de Finetti's theories are incomplete descriptions of what statistical behavior is and should be,” so much so that “we may look forward to their unification into a single more satisfactory theory.”27
All the essential ingredients of FS were already there: the rejection of frequentism, though, note well, not of frequentist-based statistical techniques; the praise of Wald's minimax; the personalistic view of probability, taken as the numerical evaluation of uncertainty entertained by a “reasonable” decision maker; the idea that personal probabilities can be elicited by observing the agent's behavior under uncertainty; the idea that these probabilities fully characterize that behavior according to von Neumann's expected utility; the explicit normative penchant of the analysis; above all, the intuition that combining Wald and de Finetti may represent the most promising path toward a general theory of statistics as “acting in the face of uncertainty.” Also noteworthy is what was not in Savage's 1949 manifesto of sorts. No reference was made to Bayesianism as a general philosophy of probability,28 nor to the idea of upturning consolidated statistical techniques. Even the goal of developing a general theory of decision making under uncertainty, to be used as a guide for rational behavior beyond the boundaries of statistical work, was conspicuously absent.
Savage's manifesto was clearly aimed at pursuing Wald's path. As in Wald, he wanted to apply (what today we call) Bayesian techniques to provide traditional statistics with more solid decision-theoretic foundations. As in Wald, his aim was to offer a guide to statisticians in their daily work. Where he wished to improve upon Wald was in the characterization of what it actually meant for a statistician to behave rationally in the face of uncertainty.
The continuity is even stronger in Savage's review of Wald's SDF (Savage, 1951; “Review” henceforth). The review appeared in the Journal of the American Statistical Association only after Wald's tragic death, but had been written before it. It had been commissioned as more than a simple review: the goal assigned to Savage by the JASA Editor was to give an informal exposition of Wald's new approach to statistics (“Review,” p. 55, fn. 1). The text is made up of three distinct parts, each of great historical relevance: (1) a presentation of the decision-theoretic approach to statistics; (2) an introduction to the state-act-consequence (S-A-C) model as a method to characterize an agent's decision under uncertainty; (3) a critical exposition, plus a possible defense, of Wald's minimax rule. Much like the 1949 paper, the 1951 review is a kind of manifesto—or, more properly, a work-in-progress report of FS's theoretical edifice.
The review begins with the remark that the traditional statistical problem is to draw inferences, that is, to make assertions on the basis of incomplete information. A related problem is the design of experiments permitting the strongest inference for given expenditure. But Wald's theory is about statistical action, not inference, that is, about deciding what behavior to take under incomplete information. As prominent examples, Savage mentions quality control and experiment design, but his main point is that “all problems of statistics, including those of inference, are problems of action” (“Review,” p. 55). Thus, statistics must be reinterpreted as a discipline concerned with behavior, rather than assertions: it is about what to do, rather than what to say.29
Having affirmed the behavioral content of statistics, Savage endeavors to explain what “a course of action” or, more simply, “an act” actually is. The notion must be understood “in a flexible sense,” as even the whole design of a complicated statistical program may be regarded as a single act. More than that, “in a highly idealized sense,” even an agent's entire existence may be thought of as involving only one, single-act life decision, such as, say, “the decision to conduct himself according to some set of maxims envisaging all eventualities” (“Review,” p. 56). Of course, such an idealized view of an act “is manifestly far-fetched,” but Savage believes it may call attention to the appropriateness in the new approach of considering “very large decision problems as organic wholes.” This is a crucial passage in the paper (“Review,” p. 56) and, possibly, in Savage's overall intellectual project, as it is precisely at this stage that his analysis parts company with Wald's.
The latter's SDPs had been expressed in the technical jargon of probability distributions, set theory and parameter spaces, with no concession to the reader in terms of simplified, possibly nonstatistical examples. Having been assigned the task of offering a cut down exposition of Wald's book, Savage elects to present the basic decision problem in a more straightforward way: “Acts have consequences for the actor, and these consequences depend on facts, not all of which are generally known to him. The unknown facts will often be referred to as states of the world” (“Review,” p. 56). The rhetorical power of the S-A-C language can hardly be downplayed. It brings the reader the message that SDPs are really just like any other kind of decision problems. This is reinforced by the first example chosen by Savage to illustrate the power of the new language (“Review,” pp. 56–57), namely, the problem of deciding… whether to carry an umbrella under uncertain weather conditions!
Provided each consequence can be assigned “a numerical score such that higher scores are preferred to lower scores” (monetary income being of course the most intuitive way to measure those scores), and provided agents may assign probabilities to the various states, it is possible to calculate the expected value associated with an action. The decision maker will then follow von Neumann and Morgenstern's utility theory and choose the action maximizing this expected value (“Review,” pp. 57–58). However, if the agent does not assign probabilities to states “this trivial solution does not apply and the problem is newer.” The main theme of “modern, or un-Bayesian, statistical theory,” as Savage calls it, has been precisely to deal with uncertainty when probability does not apply to unknown states of the world (“Review,” p. 58).
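The expected-value rule described here can be sketched on the umbrella problem. This is only an illustration: the scores and the probabilities below are invented for the example and do not come from the review.

```python
# Hypothetical umbrella problem: every number below is an illustrative assumption.
states = ["rain", "shine"]
acts = ["carry umbrella", "leave umbrella"]

# income[act][state]: numerical score of the consequence (higher is preferred).
income = {
    "carry umbrella": {"rain": 5, "shine": 3},
    "leave umbrella": {"rain": 0, "shine": 6},
}

# Probabilities the agent assigns to the states of the world (they sum to 1).
prob = {"rain": 0.4, "shine": 0.6}


def expected_value(act):
    """Probability-weighted average of the scores of an act across states."""
    return sum(prob[s] * income[act][s] for s in states)


# The rule: choose the act with the highest expected value.
best_act = max(acts, key=expected_value)
print(best_act)
```

With these assumed numbers, carrying the umbrella has the higher expected value. The rule is silent as soon as the dictionary `prob` is unavailable, which is the case the minimax rule is meant to address.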
Reading the 1951 review, one may not escape a sense of discontinuity in Savage's exposition. While the paper's first page, dedicated to explaining the new behavioral approach to statistics, might have been written by Wald himself and is fully pertinent to the general SDP issue, the introduction of the S-A-C terminology and, even more, the umbrella example take the reader away from the realm of statistics and into that of economics, that is, into the world of the theory of decision under uncertainty. The jump was of course intentional, as Savage wished to promote an economic approach to statistics, that is, to bring forward the view that statisticians should behave as rational economic men. The message was: if we can devise a rule to effectively solve the umbrella dilemma, the very same rule can be applied to any kind of decision problem, including complicated statistical ones. All such problems are indeed amenable to treatment according to the S-A-C formalism and their solution always involves the selection of an act among a set of alternatives.
Having defined the notions of dominant and mixed acts, Savage proceeds to state “the general rule by which the theory of statistical decision functions tentatively proposes to solve all decision problems” (“Review,” p. 58), that is, Wald's minimax rule. Again, the rhetorical device is remarkable. Wald's minimax is presented as a way out from the stalemate caused by the unavailability of probability values to be attached to states. Such a stalemate, which in FS Savage will call “complete ignorance,” is said to always affect statistical problems within the “modern, or un-Bayesian” approach.
Let I(a,s) be the expected income if act a is chosen when the true state is s (both a and s belong to finite sets). Let loss L(a,s) be the difference between the most that can be earned by choosing any act when state s obtains and what is actually earned by choosing a when s obtains, that is to say, $L(a,s) = \max_{a'} I(a',s) - I(a,s)$. As in Wald, the loss measures “how inappropriate the action a is in the state s” (“Review,” p. 59). Wald's minimax principle then states: choose an action a such that the maximum loss which may be suffered for any possible state s is as small as possible, that is, an action that minimizes the maximum loss. This principle, which Savage credits as being “the only rule of comparable generality proposed since Bayes’ was published in 1763,”30 is said to be “central to the theory of statistical decision functions, at least today” (“Review,” p. 59; emphasis added). The emphasized words elucidate Savage's attitude with respect to Wald's minimax. In the rest of the review he will, first, offer a possible argument to justify the minimax, then, criticize the criterion, and, finally, argue that Wald's book—on account of its reliance on so arguable a criterion—is just a preliminary step toward a more complete reconstruction of statistics on a behavioral basis, an endeavor that, after Wald's death, it will be up to Savage himself to accomplish.
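A minimal sketch of the rule as just defined, restricted to pure acts (the mixed acts Savage also considers are left out). The income table is a hypothetical one, chosen only to make the arithmetic visible.

```python
# Hypothetical income table I(a, s); the numbers are illustrative assumptions.
income = {
    "a1": {"s1": 10, "s2": 0},
    "a2": {"s1": 6, "s2": 5},
    "a3": {"s1": 2, "s2": 8},
}
acts = list(income)
states = ["s1", "s2"]

# Most that can be earned in each state by choosing any act: max over a' of I(a', s).
best_in_state = {s: max(income[a][s] for a in acts) for s in states}

# Loss L(a, s) = max_{a'} I(a', s) - I(a, s).
loss = {a: {s: best_in_state[s] - income[a][s] for s in states} for a in acts}

# Worst-case loss of each act across states.
max_loss = {a: max(loss[a].values()) for a in acts}

# Minimax rule: pick the act whose worst-case loss is smallest.
minimax_act = min(acts, key=lambda a: max_loss[a])
print(max_loss, minimax_act)
```

In this table the worst-case losses are 8, 4 and 8, so the rule selects `a2`, the act that is never far from the best available choice in either state. No probabilities over the states enter the calculation.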
Savage's defense of the minimax as a rule for making statistical decisions under uncertainty is ingenious. In a clear anticipation of FS's SEUT, he argues that in the case of individual decisions, the criterion is not required because, following de Finetti's personalistic view of probability, individuals are never in a situation of “complete ignorance”: a single agent can always rank the consequences of her actions because she always holds probabilistic beliefs about the states of the world. The minimax becomes indispensable whenever decisions must be taken by a group: “If, however, the actor is a group of individuals who must act in concert with a view to augmenting the income of the group, the situation is radically different, and the problems of statistics may often, if not always, be considered of this sort.” (“Review,” p. 61; emphasis added). The reason is intuitive: whose probability beliefs about the states of the world should be given priority in deciding what action to take? Absent any reason for privileging one set of beliefs over the others, adopting the minimax criterion as if no such beliefs exist sounds appealing, because it “means to act so that the greatest violence done to anyone's opinion shall be as small as possible” (“Review,” p. 62). Therefore the minimax commends itself as “a principle of group action,” a compromise solution that may avoid causing undue losses to any of the group's members. As the previously emphasized words clarify, Savage believes that group action is ubiquitous in statistics—indeed, in the whole of science.31 Moreover, it is often the case that, “under the minimax rule the same amount will be given up by each member of the group,” again reinforcing the compromise character of the minimax choice. 
Finally, the minimax also offers a way out from another, potentially troublesome issue, namely, selecting who is entitled to be part of the decision-making group: under the minimax all “reasonable” opinions may be considered, without having to decide beforehand whose opinion is legitimate and whose is not (“Review,” p. 61).32
Despite this strong defense of the minimax criterion, Savage does not refrain from criticizing Wald's theory. The first motive of dissatisfaction is Wald's inability to offer a valid justification for a criterion that, as we know, he considered a mere analytical device. Moreover, Wald had given a questionable definition of the loss function, which exposed the minimax to the critique of being an unjustifiably ultra-pessimistic rule. Savage remarks that in Wald's definition the notion of loss L(a,s) cannot be distinguished from that of negative income −I(a,s). The two notions coincide only if zero is the maximum value of I(a,s), that is, if we assume that the most the decision maker may earn by guessing the right state and selecting the right action is the absence of any loss. In such a case L(a,s) = −I(a,s), but the assumption is truly ultra-pessimistic: “no serious justification for it has ever been suggested” and it may even lead to the absurd conclusion that, in some cases, no amount of experimentation will lead the agent to behave differently than as if he were under complete ignorance (“Review,” p. 63). The potential irrelevance of observations for the agent's choice makes Wald's negative-income version of the minimax rule untenable for statistics.33
Savage concludes the review by saying that the 1950 book—a “difficult and irritating” one to read—is at best an intermediate report of Wald's research, albeit of “great scholarly value” and with “inestimable” possible influence on future statistics. The project of instructing statisticians to behave as economic actors is still incomplete: the minimax rule is itself far from perfect and, above all, no result has been achieved about either the applicability of the rule to concrete statistical problems or the possibility of encompassing the traditional inference techniques under the minimax umbrella. The review already offers the guidelines of Savage's own attempt to fulfill Wald's project—what in the next section I will call FS's game plan. A theoretical breakthrough will nonetheless be required to accomplish the task, namely, the replacement of Wald's minimax with a new decision criterion that, in turn, will call for a novel characterization of probability as subjective belief.
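The condition under which loss and negative income coincide can be written out in the review's notation. This is a one-line check, assuming finite sets of acts and states as in the definition of L(a,s) given earlier.

```latex
\[
L(a,s) = \max_{a'} I(a',s) - I(a,s)
\quad\Longrightarrow\quad
\Bigl[\, L(a,s) = -I(a,s) \ \text{for all } a, s \,\Bigr]
\iff
\Bigl[\, \max_{a'} I(a',s) = 0 \ \text{for all } s \,\Bigr].
\]
```

Read this way, the negative-income version amounts to assuming that in every state the best attainable income is exactly zero, which is the pessimistic assumption Savage objects to.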
The Strategy: FS's Game Plan
Savage's project in FS followed Wald's steps and aimed at refounding statistics by transforming it into a fully fledged behavioral theory. Yet, differently from Wald, the discipline's new foundations were to lie in a subjectivist notion of probability (FS, pp. 4–5). Any examination of the 1954 book must therefore recognize that it targeted statistics, not economics, and that its key ingredients were behaviorism and subjectivism.
“The purpose of this book, and indeed of statistics generally, [is] to discuss the implications of reasoning for the making of decisions” (FS, p. 6). These words were placed by Savage at the beginning of FS's Chapter 2 to illustrate the first essential ingredient of his analysis, namely, the reinterpretation of statistics as a discipline concerned with the making of decisions, rather than the statement of assertions. As he put it, “[t]he name ‘statistical decision’ reflects the idea that inductive inference is not always, if ever, concerned with what to believe in the face of inconclusive evidence, but that at least sometimes it is concerned with what action to decide upon such circumstances” (FS, p. 2). The traditional verbalistic outlook of statistics, where statistical problems concern deciding what to say, had to be replaced by Wald's behavioralistic outlook, according to which the object of statistics was to recommend wise action in the face of uncertainty (FS, p. 159).
The second key ingredient in FS was the subjectivist, or personalistic, view of probability. Following de Finetti's pioneering work, probability was defined as an index of a person's opinion about an event, that is, a measure of “…the confidence that a particular individual has in the truth of a particular proposition…” (FS, p. 3). In this view, “…personal probability […] is […] the only probability concept essential to science and other activities that call upon probability.” (FS, p. 56). Savage acknowledged that the twentieth-century boom of statistical research—carried on by what he calls the British-American school of Ronald Fisher and his associates—had taken place entirely within the objectivist field. However, the frequentist view suffers from several well-known weaknesses, like the fact that objective probability only applies to the very special case of repetitive events, or the circularity of the frequentist definition, which refers to infinite sequences of independent events. Savage added to the list that objective probability could not be used as a measure of the trust to be put in a proposition: “…the existence of evidence for a proposition can never, on an objectivistic view, be expressed by saying that the proposition is true with a certain probability. […] if one must choose among several courses of action in the light of experimental evidence, it is not meaningful, in terms of objective probability, to compute which of these actions […] has the highest expected income” (FS, p. 4). Objective probability was therefore unfit as a basis for a truly behavioral approach to statistical problems. Still, no effort to rebuild the whole of statistics could overlook the achievements of Fisher and the British-American School.34
The grand goal of refounding statistics explains why FS is divided into two distinct parts. In the first one (Chapters 2–7), Savage developed a rigorous theory of rational behavior under uncertainty, which provided the required normative benchmark for behavioral statisticians. In the second (Chapters 9–17), he endeavored to reinterpret the statistical methods of the British-American School according to the new decision theory, in order to demonstrate that standard inference techniques could work even in a subjectivist/behaviorist framework. As can be read in the book's outline: “It will, I hope, be demonstrated thereby that the superficially incompatible systems of ideas associated on the one hand with a personalistic view of probability and on the other with the objectivistically inspired developments of the British-American School do in fact lend each other mutual support and clarification” (FS, p. 5). Drop the “subjectivist” component and what you have in the second part of FS is once again Wald's general project for a behavioral statistics.
Only by keeping in mind this two-part program is it possible to disentangle the complex structure of the 1954 book, which otherwise may look messy or even written by two different, sometimes opposed, authors.35 The program may be best appreciated by bringing to light the logical sequence followed by Savage to implement it, what I call FS's game plan: first, develop subjective probability theory and the new theory of rational behavior (SEUT); second, present and defend Wald's minimax rule; third, show that the minimax rule may also be given a subjectivist interpretation; fourth, apply the minimax rule to demonstrate that orthodox inference techniques may be translated into behaviorist terms; fifth, replace in the latter task the minimax rule with the new SEUT. At the end of this sequence, a brand new kind of statistics would emerge, the verbalistic-frequentist approach having been transformed into a behavioral discipline governed by a rigorous subjective decision theory. An ingenious game plan, but, alas, an unsuccessful one.
As we already know, the FS's second part did not fulfill its goal. The book's treasure resides in its first seven chapters, while the remaining ones have been commonly neglected or disparaged. Savage himself recognized the failure. By 1961, he had already realized that Wald's project was a theoretical dead end: “…the minimax theory of statistics is […] an acknowledged failure. […] …those of us who, twelve or thirteen years ago, hoped to find in this rule an almost universal answer to the dilemma posed by abstinence from Bayes’ theorem have had to accept disappointment” (Savage, 1961/1964, p. 179).36 In the preface to the 1972 edition of FS, he was even more explicit and admitted that the promised subjectivist justification of the frequentist inferential devices had not been found—quite the contrary, what he had proved in the book was that such a justification could not be found! Indeed, his late 1950s to early 1960s “Bayesian awakening” came, at least in part, because he recognized the theoretical fiasco of the FS's second half. As he put it in 1972, “Freud alone could explain how [such a] rash and unfulfilled promise […] went unamended through so many revisions of the manuscript” (FS, p. iv).
Actually, it was not a matter of psychoanalysis, but rather of the author's not-yet-mature awareness of the potential of the Bayesian approach. It is again Savage himself who tells us that in the early 1950s he was still “… too deeply in the grip of the frequentist tradition […] to do a thorough job” (Savage, 1961/1964, p. 183).37 At the time of writing FS Savage had just converted from his initial objectivist stance to the subjectivist creed,38 as is confirmed by the book's goal to establish the behaviorist foundations of orthodox inference techniques. To this aim, subjectivism—both in terms of personal probability and in terms of SEUT—was just a useful tool, not the outcome of any deep philosophical commitment. It is even legitimate to ask whether Savage was a Bayesian in 1954. Of the three main tenets of Bayesianism—assess empirical claims via subjective probabilities; use Bayes's rule to evaluate new evidence; make decisions according to the expected utility rule—he fully endorsed in FS just the last one. This is not to say that he disregarded the other two, but to underline that in the book he questioned them time and again, and even included features one would never expect in a “truly Bayesian” analysis, like the several pages dedicated to the analysis of vagueness in probability claims and to choice under complete ignorance (see section The Challenge: FS's Second Part).39
The simple truth is that in 1954 Savage aimed neither at revolutionizing statistics, nor at describing how agents (let alone, economic agents) really behave. As to the latter, his new decision theory was explicitly normative: the goal was to teach statisticians how they should behave to solve their statistical problems, not to describe how they—or any other decision maker—actually behave. As to the former, we should not be deceived by the book's foundational emphasis. Yes, Savage wanted to rebuild statistics on behavioral foundations, but this was hardly a novel project (actually, it was Wald's) and its most immediate implications were, to him, far from revolutionary. He openly admired and wanted to preserve orthodox inferential techniques—actually, he tried to strengthen them by beefing up their logical and behavioral underpinnings. The Bayesian revolution, in both statistics and economics, was conspicuously absent from Savage's 1954 radar.40 As his disciple and co-author Dennis Lindley observed, in FS Savage was an unconscious revolutionary, and the paradigm change the book eventually brought to statistics was, at least initially, an unintended one (Lindley, 1980, pp. 5–7). Yet, a revolutionary he was, and he was soon to recognize it: it suffices to compare the timid, hands-off tone with which he presented personal probability in 1954 with the self-confident, almost arrogant way he used time and again the expression “we Bayesians” (and even “we radical Bayesians”) in the 1961 paper.41
The Achievement: SEUT
Few would doubt that Savage's fame is due to the axiomatic development of SEUT in the first seven chapters of FS. In the crucial Chapter 5 he demonstrated what in decision-theoretic jargon is called a representation theorem, that is, a theorem showing that, given an evaluation criterion for determining a preference relation over a set of options and given a set of axioms that the decision-maker's preferences satisfy, the preferences about the options determined according to the evaluation criterion always coincide with the decision-maker's preferences.42 In the case of Savage's SEUT—where the options are acts, the evaluation criterion is the expected utility formula, and the axioms are those listed in FS's Chapters 2 and 3—the representation theorem shows that there exist a unique utility function (unique, that is, up to a positive linear transformation) and a unique subjective probability function such that the decision-maker's preferences always conform with—that is, are represented by—the expected utility formula. This means that an agent weakly prefers act f over act g if and only if the expected utility associated with act f and calculated using those utility and probability functions is no less than that associated with act g (cf. FS, p. 79, Theorem 1). A rational agent, that is, an agent whose preferences satisfy Savage's axioms, makes her decisions according to the expected utility formula. The latter is therefore the criterion for rational decision making under uncertainty.
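In symbols, the representation can be sketched as follows. The notation is an assumption made here for illustration: S is a finite set of states, P the subjective probability over S, u the utility over consequences, and acts f and g map states into consequences. Savage's own statement is more general than this finite-sum version.

```latex
\[
f \succsim g
\quad\iff\quad
\sum_{s \in S} P(s)\, u\bigl(f(s)\bigr) \;\ge\; \sum_{s \in S} P(s)\, u\bigl(g(s)\bigr)
\]
```

Both P and u are outputs of the theorem rather than inputs: they are derived from the preference relation once it satisfies the axioms.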
There are three distinguishing features in Savage's SEUT: subjectivism, consistency, and behaviorism. First of all, SEUT incorporates a personalistic view of probability, where uncertainty about the states of the world is captured in terms of probabilistic beliefs over those states. Second, the theory embodies a consistency view of rationality: subjective beliefs are not entirely free, but must be constrained by a set of consistency requirements (the axioms). Third, SEUT partakes of the behaviorist methodology because the unknown beliefs are elicited by observing the agent's behavior in the face of simple choices under uncertainty (more on this below).
Savage did not develop his theory in a vacuum. He availed himself of the foundations provided by John von Neumann's EUT and explicitly targeted the latter's main weakness, namely, the not-so-well-specified nature of the probability values. In their Theory of Games, von Neumann and Morgenstern (1953) had based the derivation of numerical utilities on the “…perfectly well founded interpretation of probability as frequency in long runs” (p. 19), but had left the door open to a subjectivist restatement (p. 19, fn. 2). That opportunity was seized by Savage: “A primary and elegant feature of Savage's theory is that no concept of objective probability is assumed; rather a subjective probability measure arises as a consequence of his axioms” (Luce and Raiffa, 1957, p. 304). Hence, the first feature of SEUT, the subjectivist approach to probability, finds a further—possibly, its main—rationale in the author's willingness to improve upon von Neumann's theory.
As to the notion of probability itself, Savage defined it as “the confidence that a particular individual has in the truth of a particular proposition” (FS, p. 3) and exploited de Finetti's characterization of subjective beliefs as purely formal objects, constrained by consistency requirements and elicited by observing the decision-maker's behavior under uncertainty (see de Finetti, 1937/1964). This established a strong connection between the subjectivist side of SEUT and its two other components, axiom-based consistency and behaviorism, in that it was the latter's combination which gave analytical content to the former.
The second feature of SEUT, axiom-based consistency, warrants the theory's desired normativeness (do not forget that Savage's main goal was to instruct statisticians about how to behave rationally). Inconsistent beliefs are those that violate the axioms and thus entail irrationality. Conversely, an agent may label herself rational if and only if her beliefs, as well as her preferences, obey the axioms. The role of Bayes's rule is to police the consistency of beliefs by allowing their updating as new information arrives.
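The updating step can be illustrated with a small numerical sketch. The states, the prior, and the likelihoods below are hypothetical values chosen for the example; nothing of the kind appears in FS.

```python
# Hypothetical example: all probabilities below are illustrative assumptions.
# Prior personal probabilities over the states of the world.
prior = {"rain": 0.4, "shine": 0.6}

# Probability of observing dark clouds in each state (the new information).
likelihood = {"rain": 0.9, "shine": 0.2}

# Total probability of the observation under the prior.
evidence = sum(prior[s] * likelihood[s] for s in prior)

# Bayes's rule: posterior is prior times likelihood, renormalized.
posterior = {s: prior[s] * likelihood[s] / evidence for s in prior}
print(posterior)
```

With these assumed numbers the probability of rain rises from 0.4 to about 0.75 after the observation, and the revised beliefs still sum to one.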
The characterization of rationality as consistency was the trademark of postwar decision theory, which in turn represented the culmination of an almost century-long endeavor undertaken by marginalist and neoclassical economists who wished to model rational behavior. While the economists’ efforts had focused on the notion of utility, SEUT extended the consistency requirement to a different kind of unobservable entities, the decision-maker's beliefs about “the truth of a particular proposition.” Just like Gerard Debreu did for the agent's preferences (Debreu, 1959) and Paul Samuelson for the agent's choices (Samuelson, 1948), Savage transformed those beliefs into tightly constrained theoretical objects, suitable for formal modeling.43 This was a major innovation with respect to the previous literature on expectations and conjectures, where—with the only exception of non-economists de Finetti and Frank Ramsey—economists had treated agents’ beliefs as loose introspective entities, implicitly delegating their analysis to less rigorous disciplines, such as psychology or sociology.
Savage's theory does contain two elements, which seemingly evoke introspection, namely, the property of cardinality of the EU function and, obviously, the subjective character of probability. It may therefore seem natural to interpret SEUT, and the whole Bayesian program, as a step back toward a “metaphysical” view of rationality, imbued with unobservable mental variables and a naïve psychologism.44 Nothing could be farther from Savage's project. His goal was to obtain a characterization of rational behavior devoid of any psychological contamination by extending logic “to bear more fully on uncertainty” (FS, p. 6)—the kind of logic he referred to being the axiomatic one.
Such a misunderstanding can be avoided by focusing on the key property of consistency. As we know, SEUT's main claim is that if agents’ preferences and beliefs are consistent—in the sense specified by the axioms—then those preferences may be represented by the expected utility formula. The axioms’ role is, therefore, mainly normative, that is, “to police my own decisions for consistency and, where possible, to make complicated decisions depend on simpler ones” (FS, p. 20). Axioms, and more generally logic itself, should be viewed as “…a set of criteria by which to detect, with sufficient trouble, any inconsistencies there may be among our beliefs and to derive from the beliefs we already hold such new ones as consistency demands” (FS, p. 20). Far from being a retreat to old-style introspection or psychologism, Savage's theory is first and foremost a normative guide to the formation of consistent beliefs: a rational agent—rectius, a rational statistician—is defined as someone who checks whether her beliefs satisfy the axioms and, in the negative case, revises them according to Bayes's rule.45
The requirement of consistency is also the yardstick to evaluate the admissibility of a given property to the axioms’ list. Here Savage followed again de Finetti to argue that only those properties whose violation entails an inconsistency should be legitimately listed as postulates (FS, p. 43). This is another noteworthy statement in view of the economists’ subsequent adoption of SEUT as a description of agents’ behavior. Take for instance Savage's first postulate, P1, namely, that the preference relation be complete and transitive (preference as a simple ordering among acts). Economists before and after FS would oscillate between a descriptive and a prescriptive interpretation of the postulate, but Savage was much more clear-cut: P1 is simply a normative rule of consistency (FS, pp. 18–19). Viewing that axiom as a prediction of how real agents evaluate their market alternatives is, to say the least, to betray its original purpose of a prescription to be obeyed by rational statisticians.46
The third main feature of SEUT is behaviorism.47 Savage had one more technical problem to solve: how to make the decision-maker's probabilistic beliefs operational? How to “extract” those beliefs from the agent's mind without making recourse to unacceptable introspection? He found the answer in the observable notion of choice. It is choice that reveals the agent's preference for one alternative over another and it is still choice that reveals the agent's belief on the different probability of two events. Introspection is explicitly rejected as an “…especially objectionable [solution], because I think it of great importance that preference, and indifference, between [two events] be determined, at least in principle, by decisions between acts and not by response to introspective questions” (FS, p. 17). However, even with respect to behaviorism, Savage's 1954 views were hardly extreme. Once again, one gets the impression that his commitment to this aspect of SEUT was motivated more by technical necessity than by methodological conviction.48
The general procedure by which probabilities are elicited in FS resembles Paul Samuelson's technique in revealed preference theory, that is, in the most admired example of applied behaviorism in postwar neoclassical economics (Samuelson, 1948; Wong, 1978). Those subjective probability numbers that did not feature among the initial data emerge from extending von Neumann's EU theorem to the preferences held over a properly constructed probabilistic space. With his new representation theorem Savage proved that any choice made under conditions of uncertainty could be represented in terms of the EU rule and, in particular, as if the agent's decision were guided by the attribution to the states of the world of a well-defined (i.e., numerical) subjective probability. We may therefore speak of “revealed” subjective probabilities, though this should not be taken to imply that they emerge from the observation of real choice data, rather than from the answers to a purely intellectual experiment.49
Still in Chapter 5 Savage applied his representation theorem to classes of acts of increasing generality (FS, pp. 76 ff.). In its most general version the result was extended to bounded acts, that is, acts whose utilities are bounded random variables. The axioms also allow the replacement of consequences with utility numbers, so much so that any act f may be reinterpreted as a real-valued random variable of the kind featuring in statistical problems. Savage's SEUT may thus be summarized as follows: given an event B and two bounded acts f and g, f is not preferred to g if and only if, under event B, the expected utility of f is not greater than that of g (FS, p. 82). This is the version of the representation theorem which he employed in the rest of FS as the cornerstone for the new foundations of statistics.
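A small numerical sketch may help fix ideas about this conditional comparison of acts. It is only an illustration under stated assumptions: the three states, the probability numbers, the utility values, and the function names `expected_utility` and `not_preferred` are all hypothetical, and the comparison is read as a weak inequality (f is not preferred to g exactly when its conditional expected utility does not exceed that of g).

```python
from fractions import Fraction

# Hypothetical three-state world with a personal probability for each state.
prob = {"s1": Fraction(1, 2), "s2": Fraction(3, 10), "s3": Fraction(1, 5)}

# Two bounded acts, written directly as utility numbers per state
# (consequences already replaced by utilities; values are illustrative).
f = {"s1": 0, "s2": 4, "s3": 10}
g = {"s1": 8, "s2": 5, "s3": 5}


def expected_utility(act, event):
    """Expected utility of `act` conditional on `event` (a set of states)."""
    p_event = sum(prob[s] for s in event)
    if p_event == 0:
        raise ValueError("conditioning event has zero probability")
    return sum(prob[s] * act[s] for s in event) / p_event


def not_preferred(a, b, event):
    """True when act `a` is not preferred to act `b` given `event`."""
    return expected_utility(a, event) <= expected_utility(b, event)


everything = set(prob)   # the sure event
B = {"s2", "s3"}         # a proper event

print(expected_utility(f, everything), expected_utility(g, everything))  # 16/5 13/2
print(expected_utility(f, B), expected_utility(g, B))                    # 32/5 5
print(not_preferred(f, g, everything), not_preferred(f, g, B))           # True False
```

In this toy case f is not preferred to g overall, yet it becomes strictly preferred once the comparison is restricted to event B, which shows why the statement is formulated relative to an event.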
The Challenge: FS's Second Part
Savage opened the sixth chapter of FS boasting that: “With the construction of [SEUT] the theory of decision in the face of uncertainty is, in a sense, complete” (FS, p. 105). The time was ripe to tackle his real target, SDPs. Thanks to Wald, postwar statistics already conceived of the latter in terms of the general structure we described in the section The Explorer: Wald's Economic Approach to Statistics.50 The difference among the various approaches hinged upon the alternative ways to process the additional knowledge, if any at all, which could be obtained via experiments or observations. Remarkably, in the second part of his book Savage stuck to Wald's solution to this issue, despite its being founded upon an objectivistic view of probability. As we know from the section The Strategy: FS's Game Plan, FS's game plan imposed the reconciliation between Wald's objectivist minimax rule and Savage's personalistic view of probability as an essential step toward the desired reinterpretation of orthodox inference techniques in subjectivist and behaviorist terms. This is why Savage spent so much effort to achieve such a reconciliation, albeit with meager results.
Savage's and Wald's approaches are brought together in the opening remarks of Chapter 8, at the very beginning of FS's second part.51 There the author identifies the two worst defects of existing decision theory, and thus the two main obstacles preventing the full development of behavioral statistics. These are, first, the inevitable vagueness of the decision-maker's beliefs and, second, the absence of a solution to multiperson decision problems with heterogeneous agents. It is exactly here that we find the key passage introducing the reader to the second part of Savage's project: “From the personalistic point of view, statistics proper can perhaps be defined as the art of dealing with vagueness and interpersonal difference in decision situations” (FS, p. 154). Crucially, it is claimed that the solution to both issues lies in Wald's minimax rule.
Savage had already dealt with vagueness in Chapter 4, when discussing the traditional objections against subjective probability. The standard critique was that an agent's beliefs may hardly be assigned a precise quantitative value and that such vagueness is even more serious in the case of a normative theory, like that in FS, because it makes it virtually impossible to formulate specific behavioral prescriptions (FS, p. 59). To counter the critique, Savage offered in Chapter 9 a fervent defense of… Wald's objectivist minimax rule!
He began by underlining that the key ingredients of his theory of subjective probability (that is, the notions of states, events, acts, and consequences) applied as well to Wald's theory, “from which they were in fact derived” (FS, p. 158).52 The two approaches are separated by the postulate that the decision-maker's preferences establish a simple ordering among all acts. This is an assumption that no frequentist statistician may accept because, when combined with neutral (in terms of probabilistic views) consistency requirements, it would entail, as demonstrated in Chapter 5, the existence of a subjective probability distribution. How, then, could Savage reconcile the two approaches?
Three years after the publication of FS, Duncan Luce and Howard Raiffa surveyed the different approaches to decision theory in their highly influential book, Games and Decisions (Luce & Raiffa, 1957). The survey listed the alternatives available in a period when SEUT was still far from dominant. Not surprisingly, the authors ranked Savage's approach as intermediate between the two extreme cases of risk and complete ignorance. The former refers to situations, like those in von Neumann's EUT, where a probability distribution over the set of states of nature is either known or taken by the decision maker as if it were known. The latter refers to cases where the decision maker has no information about the states of the world (pp. 277–278). Between the two extremes, Luce and Raiffa identified the vast territory of decision making under partial ignorance, that is, when an agent holds the subjective feeling that a state is more plausible than another. Here the decision maker is assumed to possess at least some vague information about the true state of the world, the real point being how such vague information can be processed (p. 299). It is in this territory that Luce and Raiffa placed SEUT.
Keeping in mind Luce and Raiffa's classification, we can better understand Savage's reconciliation of his own theory with Wald's. He argued that “…in practice the theory of personal probability is supposed to be an idealization of one's own standards of behavior”; and that “the idealization is often imperfect in such a way that an aura of vagueness is attached to many judgments of personal probability…” (FS, p. 169, emphasis added). Thus, the minimax criterion might be invoked as a rule of thumb to be applied whenever the notion of “best choice” was “impractical”: impractical for the frequentist statistician because it would require the use of subjective probabilities; impractical for the subjectivist statistician in case of overwhelming vagueness (FS, p. 206). Conversely, every time a coherent notion of “best choice” could be pursued (that is, every time a simple ordering among all available acts might be established) the statistician should follow the consistency axioms and therefore apply SEUT. In short, Savage defended the minimax as a pragmatic rule that might step in whenever we face a failure of SEUT's tight requirements, which are so tight, indeed, that Savage had to invent the notion of small worlds to warrant them a sort of “protected reserve” (see FS, p. 9 and pp. 82 ff.).53
It goes without saying that the reason mentioned by Savage for the “impracticality” of SEUT, namely, the vagueness of beliefs, would be almost anathema to real, die-hard Bayesians. Savage's acknowledgment of the possibility of a complete ignorance environment, and thus of the pragmatic validity of the minimax solution, actually went further. He admitted that, in many instances, “I do not feel I know my own mind well enough to act definitely on the idea that the expected loss for [act] g really is L; but […] of course, feel perfectly confident that [act] f cannot result in a loss greater than [minimax value] L*” (FS, p. 169). Given that almost all SDPs allow for the possibility of a (relatively) inexpensive observation, and given that there may always exist an act conditional on such observation and capable of warranting a (relatively) small maximum loss L*, it turns out that the minimax criterion represents a desirable solution in every SDP where the statistician is unable to confidently select the SEU-maximizing act (FS, p. 169). Thus, far from being incompatible with, or alternative to, SEUT, Wald's minimax is presented in FS as a welcome complement to Savage's own approach for handling those cases where vagueness prevails. This very conclusion was, as we know, a crucial element of FS's game plan.
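The logic of the passage just quoted can be illustrated with a toy decision problem. The sketch below is not taken from FS: the two states, the loss numbers, and the two candidate beliefs are hypothetical, chosen only to show how a vague belief may leave the expected-loss ranking of acts undetermined while the minimax act still carries a firm guarantee.

```python
from fractions import Fraction

# Hypothetical losses of two acts in two states of the world.
# f is a cautious act (say, one conditional on a cheap observation);
# g is excellent in state s1 and very costly in state s2.
losses = {
    "f": {"s1": Fraction(2), "s2": Fraction(2)},
    "g": {"s1": Fraction(0), "s2": Fraction(10)},
}


def max_loss(act):
    """Worst-case loss of an act over all states."""
    return max(losses[act].values())


def expected_loss(act, p_s1):
    """Expected loss of an act when state s1 has personal probability p_s1."""
    return p_s1 * losses[act]["s1"] + (1 - p_s1) * losses[act]["s2"]


def minimax_act():
    """Act whose worst-case loss is smallest."""
    return min(losses, key=max_loss)


def best_expected_act(p_s1):
    """Act with the smallest expected loss under the belief p_s1."""
    return min(losses, key=lambda act: expected_loss(act, p_s1))


# A vague belief: the agent cannot say whether P(s1) is nearer 3/5 or 9/10.
print(best_expected_act(Fraction(9, 10)))   # g
print(best_expected_act(Fraction(3, 5)))    # f
print(minimax_act(), max_loss("f"))         # f 2
```

Under the first belief g has the lower expected loss, under the second f does; whichever belief is closer to the truth, act f cannot result in a loss above 2, which plays the role of L* here.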
The second defect of decision theory highlighted by Savage is the inability to solve multiperson decision problems with heterogeneous agents (MPPs henceforth). Repeating his 1951 view, Savage argued that statistics had to do first and foremost with the behavior of groups, rather than with isolated decision makers: “Multipersonal considerations constitute much of the essence of what is ordinarily called statistics” (FS, p. 154; also see FS, p. 105). There are two kinds of MPPs: those arising out of differences in tastes and those arising out of differences in judgment (FS, pp. 155–156). The former can well be tackled by the new SEUT, which explicitly allows for heterogeneous preferences, but the latter cannot, because agents diverge in their subjective probability assessments. Savage complained that the statisticians of the frequentist school had begged the issue, on account of their denial of any role for personal judgments. Even subjectivists had only coped with it by a sort of analytical self-limitation, assuming that statistics be “largely devoted to exploiting similarities in the judgments of certain classes of people and in seeking devices, notably relevant observation, that tend to minimize their differences” (FS, p. 156; emphasis added). Why then did he put so much emphasis on such an intractable—and seemingly lateral—problem?
The first obvious answer is that, within a truly behaviorist approach, statistics must deal with all kinds of probabilistic judgments as essential ingredients of action, regardless of their possible heterogeneity. Second, the emphasis on MPPs may be viewed as another outgrowth of Savage's normative perspective; more specifically, of his desire to also prescribe rationality to groups of statisticians, be they teams of experimenters facing observation problems or members of a scientific community summoned to sanction newly published results. Finally, but crucially, MPPs have a role to play in FS's game plan, namely, that of allowing the desired reinterpretation of the main results of the British-American school of statistics in subjectivist terms.
Savage acknowledged the received view that subjective probability could at best apply to individual decision making, but never to MPPs. The scientific method itself was said to rely exclusively on objective notions, the only ones that could be effectively shared interpersonally. It followed that “[t]he theory of probability relevant to science […] ought to be a codification of universally acceptable criteria,” so much so that no room is left for “personal difference” in probability judgments (FS, p. 67). As a counterargument, Savage claimed that “the personalistic view incorporates all the universally acceptable criteria for reasonableness in judgment,” although those criteria “do not guarantee agreement on all questions among all honest and freely communicating people” (FS, p. 67). The key insight to uphold this claim was his reinterpretation of basic SDPs54 as MPPs where agents differ in their a priori probability distribution on a random variable, but share the same set of acts and the same a posteriori (i.e., postobservation) distribution (FS, p. 122). This reinterpretation provided the basis to demonstrate what is allegedly the most important result of the whole second part of FS: given a properly defined subjectivist variant of an MPP, it is possible to solve it by applying Wald's minimax rule.
To understand what this result means, let us skip to the model of group decision in Chapter 10 (FS, pp. 172 ff.). Take a group of agents, indexed by j. Assume they share the same utility function, but differ in their personal beliefs. It is this difference, explicitly denied by the British-American approach, which captures the subjectivist element of the analysis. The group is called to choose, by acting in concert, an act f out of a set F of alternatives. Such a setup, epitomized by the functioning of a jury but actually “…widespread in science and industry” (FS, p. 173), defines a group decision problem (GDP). Savage recognized that, while no decision rule may have general validity for every GDP under all circumstances,55 rules of thumb do exist, which may lead the group to reach an acceptable compromise. In particular, the possibility of using mixed acts, one of the main tools in Wald's box, may be given a natural interpretation in a GDP as a way to avoid the impasse and achieve the required compromise (FS, p. 173). Though the members of the group may differ in their personal judgments, there are plenty of instruments, such as coins, dice, and cards, allowing the group to “objectively” mix, in any desired proportion, the individually preferred acts. Hence, it may well be assumed that the set F of available alternatives in any GDP always contains all the possible mixtures of its primary elements. This in turn leads to a straightforward application of Wald's criterion, in the novel form of a group minimax rule (FS, p. 174). The rule states that, as a compromise solution, the group should adopt the act such that the largest personal loss faced by any member of the group is as small as possible. Formally, the act f′ should be chosen such that, with j ranging over the members of the group, $\max_j L(f'; j) = L^* = \min_f \max_j L(f; j)$.56
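The group minimax rule lends itself to a short numerical sketch. Everything in it is an assumption made for illustration: a group of two members, two pure acts labeled a and b, the personal loss figures, and the choice of searching over mixtures on a grid of hundredths instead of solving for the optimum analytically.

```python
from fractions import Fraction

# Hypothetical personal losses L(f; j) of each pure act f for each member j.
# Member 1 favors act a, member 2 favors act b.
pure_losses = {
    "a": {1: Fraction(0), 2: Fraction(4)},
    "b": {1: Fraction(6), 2: Fraction(0)},
}
members = (1, 2)


def mixed_loss(q, j):
    """Loss to member j of the mixed act: play a with probability q, else b."""
    return q * pure_losses["a"][j] + (1 - q) * pure_losses["b"][j]


def worst_loss(q):
    """Largest personal loss faced by any member under the mixture q."""
    return max(mixed_loss(q, j) for j in members)


def group_minimax(steps=100):
    """Search a grid of mixtures for the one minimizing the worst loss."""
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    q_star = min(grid, key=worst_loss)
    return q_star, worst_loss(q_star)


q_star, l_star = group_minimax()
print(worst_loss(Fraction(1)), worst_loss(Fraction(0)))  # 4 6
print(q_star, l_star)                                    # 3/5 12/5
```

With these numbers either pure act leaves one member with a loss of 4 or 6, whereas mixing a and b in proportions 3/5 and 2/5 caps every member's loss at 12/5. The random device used to carry out the mixture (a coin, a die, a deck of cards) is the only "objective" ingredient the group needs to share.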
Though it cannot be expected that the group minimax rule will or should be accepted by every member of the group, something may be said in its support. First, Savage noted that, as in the standard single-agent case, if the minimax loss L* is small, the group may find it reasonable to follow the rule and select act f′ because none of its members should consider this choice unacceptable nor could suggest any alternative capable of limiting every member's loss to, at most, L*. Moreover, as he had already observed in the 1951 review, a symmetry argument may be invoked: in case of a symmetric GDP, the minimax rule would impose exactly the same loss L* on every member. This outcome would warrant an element of fairness to the solution, further augmenting the rule's appeal (FS, p. 174). In short, far from being intractable, multiperson statistical problems, explicitly formulated in behaviorist terms and explicitly allowing for the heterogeneity of the members’ subjective beliefs, may find a pragmatic solution in the minimax criterion. Once again, it was Wald's rule which enabled Savage to get rid of the other major shortcoming of behavioral statistics.
The Fiasco: FS's Abrupt Ending
The continuity between the second part of FS and Savage's review of Wald (1950a) should be apparent by now. The 1954 endeavor was no more, and no less, than Savage's try at completing Wald's project, with the only, though crucial, addition of the SEUT theorem. The continuity is confirmed by Savage's critical attitude toward the minimax rule, and even more toward its “group” version—again a feature that went almost unchanged from the 1951 review to the 1954 book. However, the importance of the group minimax rule within the theoretical edifice of FS did not hinge upon its practical acceptability as a decision-making criterion to be used in real MPPs. What the rule brought home for Savage was the desired synthesis between the subjectivist view of probability—here captured by the members’ differences in their beliefs about events—and Wald's behaviorist approach—here epitomized by the asserted ubiquity of, explicitly behavioral, GDPs and MPPs in modern statistics. Hence, the achievement of FS's Chapter 10, regardless of its intrinsic merit, was pivotal in Savage's game plan.
The rest of the book validates the latter assertion. The final chapters contain Savage's attempt to reinterpret the three most important inference techniques—respectively, point estimation (Chapter 15), the testing of hypothesis (Chapter 16), and interval estimation (Chapter 17)—in behavioral terms57 by applying Wald's objectivist minimax rule. Granted such a reinterpretation, the game plan would then require its reformulation in subjectivist terms, exploiting the newly established group minimax rule. Finally, for those special decision problems where vagueness is not a problem, the minimax rule would be replaced by the SEU rule, that is, by the criterion allowing statisticians to select not just an “acceptable compromise” solution, but the “best” solution. Replacing minimax with SEUT would mean “mission accomplished” in terms of FS's game plan.
Unfortunately, Savage's ingenious strategy ended in a fiasco. While he partially succeeded in giving a behavioral reinterpretation of point estimation (see FS, pp. 229 ff.),58 doing the same for interval estimation proved impossible. In a behaviorist framework, the statistician's problem is to make her “best” choice of an unknown parameter. The orthodox doctrine of “accuracy estimation,” which only allows one to know how “good” such a “best” choice is with respect to an interval of values, is however of no avail for such a problem and “…could at most satisfy [the statistician's] idle curiosity…” (FS, pp. 257–258). In other words, when the decision maker is called to choose an act, that is, to produce a point estimate, interval estimation is simply nonsensical. Regardless of the fact that “most leaders in statistical theory have a long-standing enthusiasm for [interval estimation],” the technique may at most be used as a rough and informal device, while its usefulness for statistical decision making is nil (FS, pp. 260–261).
The impossibility of offering a behaviorist reinterpretation of so crucial a tool as interval estimation was a fatal blow to Savage's project of reconstructing mainstream statistics in behavioral and subjectivist terms. The simple truth emerging from his analysis was that behaviorism, subjectivism, and orthodox inference techniques could not be reconciled. The collapse of FS's game plan is testified by the abrupt conclusion of the book, which features no closing remarks, no summary of the main results, no recapitulation of the analytical trajectory followed thus far. It really looks as if the author had suddenly realized that his entire program was doomed and therefore decided to quit at that point. As we already know, in the preface to the second edition, Savage invoked Freud to explain how the book's project managed to survive in the face of such a dire conclusion (FS, p. iv). Fortunately, it took him little time to react against the rout and become the most ardent supporter of a wholly new approach to statistics.
In a 1961 reassessment of his book, Savage was already aware of the revolution he had helped to trigger and wholeheartedly praised it.59 Inference was now defined in strictly Bayesian terms as “changes in opinion induced by evidence on the application of Bayes's theorem” (Savage 1961/1964, p. 178). The frequentist approach was chastised as being more subjective than the personalistic one on account of the unconstrained nature of experiment design. And as an attempt to get rid of the subjective element of traditional statistics, Wald's minimax criterion was deemed a failure, despite “the strongest apology for the rule” given in FS (Savage, p. 179). Subjectiveness was simply unavoidable in statistical analysis. By omitting empty questions about what “nice properties” a certain statistical procedure should have, and by simply asking the decision maker to choose between alternative procedures subject to strict consistency conditions (Savage, pp. 179–180), the Bayesian approach ended up being more objective (that is, more scientific) than the frequentist one, because it imposed greater order on the inevitable subjective element of statistical decision making (Savage, p. 181).
What did Savage retain of his old game plan? The answer lies in his remark that: “One of the most valuable outgrowths of the frequentist movement is the behavioralistic (or one might say economic-theoretic) approach to statistical problems.” Thus, Jerzy Neyman and, above all, Abraham Wald deserved credit for having launched the “economic analysis of statistical problems,” what they called “inductive behavior” (Savage, p. 177). But if the truly valuable part of Wald's project, as well as of Savage's attempt at fulfilling it, was its behaviorist and decision-theoretic character, first and foremost its conception of the statistician as an economic man, then FS was indeed successful in this respect. The book had crowned Wald's efforts by developing a theory for decision making under uncertainty, which could be prescribed as a rational criterion for statistical problems. Yes, the other (possibly, the main) part of Wald's project, and of FS as well, namely, the reconciliation of inductive behavior with orthodox inference techniques via the minimax rule, had not been achieved. But, no, FS had not been written in vain, because it contained a new powerful tool, SEUT, for rebuilding statistics as a wholly behavioral discipline. In this sense, Wald's project had been accomplished, though by 1961 that rebuilding was still far from complete. The 1954 book might be said to have honored its title by laying solid foundations of statistics, albeit not the statistics Wald and the early Savage were thinking of, but a brand new kind of it.
The Puzzle: Why in Economics?
A nice instance of unintended consequences, the spread of Bayesian statistics helped popularize Savage's SEUT. In itself, the theory did represent a remarkable achievement, and a source of perennial glory for its author. By his masterful use of the axiomatic method, Savage elegantly succeeded in fulfilling the economists’ almost century-old pursuit of a general criterion for rational decision making. A big historiographic puzzle yet remains: how could SEUT so quickly and so wholeheartedly be endorsed by neoclassical economists? How to explain its success in economics?
To understand what a puzzle it is, consider what we have learned so far. First, Savage's book was not addressed to economists, but to statisticians. Second, the book's overall project—the reinterpretation of standard inference techniques along subjectivist and behaviorist lines—was a remarkable fiasco, because it ended up demonstrating exactly the opposite of what it intended to. Third, FS's most significant contribution, SEUT, was an explicitly normative theory aimed at guiding the statisticians’ work, not a positive one describing how real economic agents behave.
Two further circumstances should be taken into account. On the one hand, Bayesian statistics, while spreading after FS at a considerable pace in a small number of academic strongholds,60 was, and still is, a minority approach. Most of the discipline has kept using traditional inference methods, though recognizing their foundational weaknesses, while Bayesianism itself has been constantly and severely criticized ever since.61 Hence, Savage's revolution in statistics was, at most, only a partial success. In no sense could any part of it be considered mainstream by the time neoclassical economists embraced SEUT.
On the other hand, SEUT itself was hardly the only available option in mid-1950s decision theory. As the list of different criteria in Chapter 13 of the influential Luce and Raiffa (1957) makes clear, several alternatives did exist. Some were specifically addressed at encompassing situations of complete ignorance (i.e., Savage's vagueness), as a way out from one of the most vexed issues of theoretical economics, the so-called Knightian or Keynesian uncertainty.62 Moreover, all these criteria commended themselves as lying upon rigorous axiomatic foundations. Following Chernoff (1954), Luce and Raiffa's 1957 survey aimed precisely at offering a general axiomatic characterization of all available decision rules, including those valid under complete ignorance.63
In view of all these circumstances, I reiterate the claim that explaining the postwar triumph of SEUT in neoclassical economics really is a historiographic puzzle. To reword Marschak's 1946 dictum, why “to be an economic agent implies being a Bayesian statistician?” How, when, and why did the agents populating economic models come to be modeled as rigorous followers of Savage's rationality prescriptions? As I said in the Introduction, I have no complete answer to this question. The paper's goal was to show that the question itself is an interesting one for the history of postwar microeconomics. Only a couple of conjectures, whose validity awaits further corroboration, may be offered in the next section, by way of conclusion.
The Conjectures: From Consistency to Business Schools
My first tentative explanation hinges upon a history of economic analysis argument, whose catchword is “consistency.” As I have shown at length elsewhere,64 the history of the twentieth-century neoclassical notion of rational behavior may be told in terms of the progressive replacement of the traditional maximization approach, where rationality consisted of the reasoned pursuit of one's own self interest, with the consistency approach, where rationality means the satisfaction of a formal requirement of consistency and where the notion of agency becomes an all-purpose concept, valid for real individuals as well as for groups or machines. The process of distilling the formal essence of the notion of rationality in order to make it as general and rigorous as possible found its milestones in the works of authors such as Paul Samuelson, John von Neumann, Kenneth Arrow, and Gerard Debreu, and culminated with Savage's SEUT.65 Therefore, we may argue that agreeing to portray the homo economicus as a Bayesian statistician was simply the inevitable, almost automatic, conclusion of an intellectual journey neoclassical economists had begun long before.
Modern historiography has shown that many top-notch theoretical economists working in the postwar era embraced a new standard of representation of subjective variables in economic models. The standard was based upon a (flexible) combination of axiomatic and behavioral methods. At the turn of the 1960s, it had already been applied to the modeling of the agents’ preferences and, therefore, of their choices under conditions of either certainty or risk. What remained to be done was to apply the same standard to the other source of subjectivism in the case of uncertainty, the agent's beliefs. From this point of view, Savage's SEUT was just the right theory at the right time. It may thus be surmised that what made it so attractive for postwar economists was the demonstration that even the agent's beliefs could be subjected to tight rationality (i.e., consistency) constraints. If beliefs satisfied the new rigorous standard of theorizing, they could as well become proper ingredients for formalized models. In short, SEUT enabled neoclassical microeconomics to boast “game, set and match,” at least as far as parametric rationality was concerned.66
My second explanation is based on a wholly different argument, involving institutional and sociological factors. The catchword is “business schools,” that is to say, the conjecture that economists came to appreciate Bayesian decision theory via their increasing involvement in major MBA programs, where the new approach was explicitly taught as the method for making rational business choices.
In a recent paper, Marion Fourcade and Rakesh Khurana have reconstructed the twentieth-century evolution of teaching in U.S. business schools, from their being dominated by mere practitioners devoid of any academic legitimacy to their becoming the largest employers of disciplinary trained social scientists. The new era for business schools began after WWII, when the rise of modern management science met the demand for a more scientific kind of managerial capabilities embodying the postwar conception of corporate “command and control,” which in turn stemmed from the war economy experience. In the new conception, “managers were increasingly described as ‘systems designers,’ ‘information processors,’ and ‘programmers’ involved in regulating the interfaces between the organization and its environment and bringing rational analysis to bear on a firm's problems” (Fourcade & Khurana, 2011, p. 18). What Fourcade and Khurana call the “scientization” of business disciplines was triggered by a small group of scholars and academic officers who, working in newly established programs like Carnegie's Graduate School of Industrial Administration (GSIA), managed to transplant into business teaching the decision-making techniques they had crafted doing operations research on behalf of the military during the war (Fourcade & Khurana, p. 5). This paved the way to the new “scientific approach” to the managing of U.S. corporations.
The GSIA did represent a radical departure from existing practices. Business education was seen at Carnegie “as an extension of the social sciences, rooted in quantitative analysis and the behavioral disciplines” (Fourcade & Khurana, p. 15). With the help of a few funding patrons, chief among them the Ford Foundation, the landscape of U.S. business schools rapidly underwent a major transformation, whereby a generation of economists, statisticians, and other social scientists, all with formal PhD training and serious research interests, replaced the old instructors who frequently lacked academic credentials. Thus, “MBA courses were to be taught by disciplinary trained scholars steeped in the latest quantitative methods to study various phenomena of business. […] Business schools were to restructure their own doctoral programs by grounding students in the basic social science disciplines and direct their research toward more fundamental theory” (Fourcade & Khurana, p. 20).
Crucially for our story, Fourcade and Khurana argue that the fertilization was not unidirectional. While the new recruits brought the rigor of scientific methods to the teaching of would-be managers, the encounter with real world business problems affected, and sometimes redirected, their own research agenda, with deep implications for the evolution of contemporary social science and, in particular, of post-1960s economics. As the authors put it, business schools have become increasingly intertwined with the progress of economics over the second half of the twentieth century, “as both recipients and agents of scientific and intellectual change” (Fourcade & Khurana, p. 30; emphasis added).67 Of special importance in this cross-fertilization was the Chicago Graduate School of Business (GSB), whose rapid ascendance in MBA rankings marked at the same time the triumph of mainstream price theory in business teaching and a big transformation in the subject matter and analytical orientations of neoclassical economics itself. While Fourcade and Khurana mention topics such as firm theory, the principal/agent model, rational expectations theory, and financial economics, I surmise that a similar argument may be constructed for Bayesian decision theory.
The clues are indeed significant. There seems to be a substantial degree of overlap between Fourcade and Khurana's story and the spread of Bayesian thinking in the United States. All three business schools the authors single out in their narrative, Carnegie, Chicago, and Stanford, featured in the (rather short) list of early Bayesian strongholds in the U.S. statistics community.
In the case of Chicago GSB, the dean from 1956 to 1973 was the same Allen Wallis whom we already met as the first recipient of Captain Schuyler's problem of sequential testing (see section The Explorer: Wald's Economic Approach to Statistics). But first and foremost Chicago was the place where, at the recently founded Department of Statistics, “an amazing confluence of statisticians” materialized around Savage's scholarship (Feinberg, 2003, p. 16). The list included faculty members Harry Roberts and David Wallace, graduate students Morris DeGroot (who later helped establish Carnegie's Department of Statistics) and Roy Radner, visiting scholars Dennis Lindley, Frederick Mosteller (later the first chair of Harvard's Department of Statistics), and John Pratt (who also became a Harvard professor of statistics). Above all, one of the giants of American Bayesianism, Arnold Zellner, was professor of economics and statistics at GSB for 30 years (1966–1996). Beyond practically inventing Bayesian econometrics (Zellner, 1971), Zellner also made “unparalleled contributions to the infrastructure of the profession: not only through the organization of [the Seminar in Bayesian Inference in Econometrics and Statistics] and the Savage Award, but also as editor or co-editor of 10 volumes of collected papers, as a founding co-editor of the Journal of Econometrics, as founding editor of Journal of Business and Economic Statistics, as founder of the International Society for Bayesian Analysis, and as president of the American Statistical Association” (Berry, Chaloner, & Geweke, 1996, p. xxi).68
Another clue comes from the events of statistical teaching at Harvard Business School (HBS). There, independently of Wald and Savage, Howard Raiffa and Robert Schlaifer developed their own version of Bayesian analysis. In a recent interview, Raiffa has told the story of his “almost religious conversion” to Bayesianism, after learning about the sure-thing principle in Chernoff (1954) and following his understanding that “push[ing] the axiomatics […] it made sense to assign a prior probability distribution over the states of the problem and maximize expected utility” (Raiffa interview, in Feinberg, 2008, p. 141). Raiffa was at the time neither a statistician nor an economist (he knew nothing about Savage and had never taken a course in economics), but his background in game theory and operations research made it natural for him to think of statistics in terms of decision problems. Hired in 1957 jointly by the newly established Harvard Department of Statistics and by HBS, Raiffa spent most of his time working in the latter. It was at HBS that he met Schlaifer, another accidental statistician who, trained as a historian but unexpectedly assigned to teaching basic statistics to HBS students, had quickly immersed himself in the frequentist literature, only to conclude that “standard statistical pedagogy did not address the main problem of a businessman: how to make decisions under uncertainty.” Thus, “he threw away the books and invented Bayesian decision theory from scratch” (Raiffa interview, in Feinberg, 2008, pp. 143–144).69
Both Schlaifer and Raiffa believed that frequentist inference methods were simply unsuited for making real business decisions. When a manager had to make a choice, traditional techniques à la Fisher or Neyman/Pearson were of little help because she often had neither objective probabilities at her disposal nor the possibility to avail herself of a properly constructed sample. Yet, what the manager could do was to apply her experience to formulate an informed probabilistic belief and then exploit any further information to consistently revise her a priori belief. In short, Bayesian techniques looked like the most proper set of decision tools a manager could employ in her day-to-day business. The striking similarity with Wald's original intuition for tackling sequential testing problems70 should not go unnoticed.
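The manager's procedure just described, an informed prior belief consistently revised in the light of new information, may be illustrated by a minimal sketch of Bayes' rule over a finite set of states. The states, the prior, and the market-test likelihoods below are purely hypothetical numbers chosen for illustration; they are not drawn from Raiffa and Schlaifer's teaching material.

```python
def revise_belief(prior, likelihood):
    """Apply Bayes' rule over a finite set of states.

    prior:      dict mapping each state to its prior probability
    likelihood: dict mapping each state to P(observed signal | state)
    Returns a dict mapping each state to its posterior probability.
    """
    joint = {state: prior[state] * likelihood[state] for state in prior}
    evidence = sum(joint.values())  # total probability of the signal
    return {state: joint[state] / evidence for state in joint}


# Hypothetical example: the manager's experience suggests demand is
# probably high; a favorable market test is then observed.
prior = {"high_demand": 0.6, "low_demand": 0.4}
favorable_test = {"high_demand": 0.8, "low_demand": 0.3}

posterior = revise_belief(prior, favorable_test)
```

Under these assumed numbers the favorable test raises the manager's belief in high demand from 0.6 to about 0.8, since 0.48 / (0.48 + 0.12) = 0.8.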
Soon Raiffa and Schlaifer were to join forces and teach together an elective course in statistics at HBS. As a pedagogical move within this course, Raiffa transformed the game tree employed in extensive form games into his famous decision tree, where probability distributions, subjectively assessed by the decision maker, are attached to chance nodes—a landmark tool for modern decision theory. Still in support of their shared course, the two published in 1961 Applied Statistical Decision Theory (Raiffa & Schlaifer, 1961). The book, which featured Raiffa's decision tree as the prototype of SDPs, was to become the main reference for the teaching of Bayesian decision theory in U.S. business schools. Finally, in the mid-1960s Schlaifer's course became compulsory for the hundreds of HBS students in the Managerial Economics program. As Raiffa put it, that course was “like an existence theorem: it demonstrated that decision analysis was relevant and teachable to future managers” (Raiffa interview, in Feinberg, 2008, p. 148).
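A decision tree of the kind described above may be sketched as follows. This is only an illustrative reconstruction of the general technique (averaging over chance nodes with subjective probabilities and maximizing at decision nodes), not Raiffa and Schlaifer's own notation; the tuple encoding, the function name `rollback`, and all payoffs and probabilities are assumptions made for the example.

```python
def rollback(node):
    """Return (expected utility, best first action) for a decision tree.

    Hypothetical encoding of nodes:
      ("payoff", u)                        terminal utility u
      ("chance", [(p, child), ...])        subjective probabilities p
      ("decision", {action: child, ...})   choices open to the manager
    """
    kind = node[0]
    if kind == "payoff":
        return node[1], None
    if kind == "chance":
        # Expected utility: probability-weighted average over branches.
        value = sum(p * rollback(child)[0] for p, child in node[1])
        return value, None
    if kind == "decision":
        # Pick the action whose subtree has the highest expected utility.
        values = {a: rollback(child)[0] for a, child in node[1].items()}
        best = max(values, key=values.get)
        return values[best], best
    raise ValueError(f"unknown node kind: {kind!r}")


# Hypothetical business choice: launch a product or not.
tree = ("decision", {
    "launch": ("chance", [
        (0.6, ("payoff", 100.0)),   # high demand
        (0.4, ("payoff", -50.0)),   # low demand
    ]),
    "do_not_launch": ("payoff", 0.0),
})

value, action = rollback(tree)
```

With these assumed figures, launching has expected utility 0.6 × 100 + 0.4 × (−50) = 40, which exceeds the zero payoff of not launching, so the sketch selects "launch".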
Even more relevant were two further initiatives launched by the duo. First, from 1961 to 1964 they ran the HBS Decision Under Uncertainty weekly seminar, which offered a stage for the presentation of advances in decision theory. The seminar was a real turning point in that it testified that Raiffa and Schlaifer were starting to see themselves as fully fledged decision theorists, rather than “just” statisticians. This new self-characterization of their own work significantly broadened the field of Bayesian analysis. Second, in 1960–1961 Raiffa organized an 11-month program for 40 professors teaching in management schools who felt the need to learn more mathematics. A measure of the program's success is that future deans of the Stanford, Harvard, and Northwestern business schools attended it. In the program Raiffa taught subjectivist statistics with a heavy decision theory orientation, thereby ensuring that “the [Bayesian] gospel radiated outward in schools of management” (Raiffa interview, in Feinberg, 2008, p. 147). Remarkably, the program was financed by the Ford Foundation, the same patron that, by generously supporting both the GSIA and the GSB, as well as other important MBA programs, such as Stanford's,71 had favored the “scientization” revolution in U.S. business schools narrated by Fourcade and Khurana.
Summing up, we may say that Bayesianism was at least “in the air” in the Departments of Statistics of places like Carnegie, Chicago, and Stanford, which also happened to host the leading business schools of the new, “scientific” kind. HBS may also be added to the list, thanks to the teaching of scholars like Raiffa, Schlaifer and, later, John Pratt. While of course the conjecture awaits further confirmation (say, by examining MBA curricula and reading lists, or by investigating personal relations between economists and Bayesian statisticians in those very universities), it may provisionally be concluded that the dramatic change undergone by management teaching in the second half of the twentieth century probably played a major role in the spreading of Bayesianism in general, and of SEUT in particular, within contemporary economics. To reword Marschak's dictum once again, it was not that homo economicus directly became a Bayesian statistician, but possibly that, first, homo managerialis was taught to behave that way, and, then, that economic agents came to be modeled as Bayesian corporate managers.