The popularity of probabilistic (Bayesian) modeling is growing in cognitive science, as evidenced by an increase in the number of articles, conference papers, symposia, and workshops on the topic.1 The popularity of the Bayesian modeling framework can be understood as a natural outgrowth of its success in producing models that describe and predict a wide variety of cognitive phenomena in domains ranging from vision (Yuille & Kersten, 2006), categorization (Anderson, 1990; Griffiths, Sanborn, Canini, & Navarro, 2008), decision making (Sloman & Hagmayer, 2006), and language learning (Chater & Manning, 2006; Frank, Goodman, & Tenenbaum, 2009) to motor control (Körding & Wolpert, 2004, 2006) and theory of mind (Baker, Saxe, & Tenenbaum, 2009). Notwithstanding the empirical success of the Bayesian framework, models formulated within this framework often face the theoretical obstacle of computational intractability. Formally, this means that the computations postulated by many Bayesian models of cognition fall into the general class of so-called NP-hard problems. Informally, this means that the computations postulated by such models are too resource-demanding to be plausibly performed by our resource-bounded minds/brains in a realistic amount of time for all but small inputs.2

NP-hard problems are problems with the property that they can be solved only by super-polynomial time algorithms.3 Such algorithms require an amount of time that cannot be upper bounded by any polynomial function n^c (where n is a measure of the input size and c is some constant). Examples are exponential-time algorithms, which require a time that can, at best, be upper bounded by some exponential function c^n. To see that such algorithms consume an excessive amount of time, even for medium input sizes, consider that 2^25 is more than the number of seconds in a year and 2^35 is more than the number of seconds in a millennium. To the extent that the cognitive abilities that Bayesian models aim to describe operate on a time scale of seconds or minutes, computations requiring on the order of years or centuries for their completion are inevitably explanatorily unsatisfactory, no matter how well the models may fit human performance data obtained in the laboratory. Opponents of Bayesian modeling, Gigerenzer, Hoffrage, and Goldstein, have put it as follows:4

The computations postulated by a model of cognition need to be tractable in the real world in which people live, not only in the small world of an experiment with only a few cues. This eliminates NP-hard models that lead to computational explosion such as probabilistic inference using Bayesian belief networks (…) (Gigerenzer, Hoffrage, & Goldstein, 2008, p. 236)
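
Setting the quote's broader point aside, the arithmetic above is easy to verify. The following two-line Python check is purely illustrative (the implicit budget of one elementary operation per second is our own simplification):

```python
# Illustrative check of the exponential-growth arithmetic in the text,
# implicitly assuming one elementary operation per second.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365             # 31,536,000
SECONDS_PER_MILLENNIUM = 1000 * SECONDS_PER_YEAR  # 31,536,000,000

assert 2 ** 25 > SECONDS_PER_YEAR                 # 33,554,432 > 31,536,000
assert 2 ** 35 > SECONDS_PER_MILLENNIUM           # 34,359,738,368 > 31,536,000,000
```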

Bayesian modelers seem to be aware that their models often face the theoretical charge of intractability. Yet we observe that they seem eager to downplay the real challenge posed by "intractability" and are quick to claim that—despite the intractability of exact algorithms—Bayesian computations can be efficiently approximated using inexact algorithms (see, e.g., Chater et al., 2006; Sanborn et al., 2010). Although we agree that human minds/brains likely implement all kinds of inexact or quick-and-dirty algorithms, in this letter we wish to draw attention to the fact that this assumption alone is insufficient for Bayesian modelers to guarantee tractability of their models. The reason is, simply put, that intractable Bayesian computations are not generally tractably approximable. This is not to say, of course, that cognitive algorithms do not approximate Bayesian computations, but rather that approximation by itself cannot guarantee tractability.

With this letter, we wish to communicate two important points to the cognitive science community: First, current claims of tractable approximability of intractable (Bayesian) models in cognitive science are mathematically unfounded and often provably unjustified. Second, there are a variety of complexity-theoretic tools available that Bayesian modelers can use to assess the (in)tractability of their models in a mathematically sound way.

To make our points, we will use a widely adopted—see, for example, Baker et al. (2009), Chater and Manning (2006), and Yuille and Kersten (2006)—subcomputation of Bayesian cognitive models as an illustrative example: probabilistic abduction, a.k.a. most probable explanation (MPE). In brief, this computation is defined by the following input–output mapping:

Most Probable Explanation (MPE)

Input: A set of hypotheses H, a set of observations E, and a knowledge structure K encoding the probabilistic dependencies between observations, hypotheses, and possibly intermediate variables (e.g., K could be a Bayesian network).

Output: A truth assignment for each hypothesis in H with the largest possible conditional probability over all such assignments (more formally, argmax_{T(H)} Pr_K(T(H) | E), where T is a truth assignment T: H → {true, false}).
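
For concreteness, here is a brute-force Python sketch of this input–output mapping for a hypothetical two-hypothesis network H1 → E ← H2; the variable names and probabilities are our own illustration, not part of any published model:

```python
from itertools import product

# A brute-force sketch of MPE for a hypothetical network H1 -> E <- H2.
# Priors over the hypothesis variables: Pr(Hi = true).
prior = {"H1": 0.2, "H2": 0.6}

# CPT for the observation: Pr(E = true | H1, H2).
cpt_e = {(True, True): 0.95, (True, False): 0.80,
         (False, True): 0.70, (False, False): 0.05}

def joint(h1, h2, e=True):
    """Pr(H1 = h1, H2 = h2, E = e) for this toy network."""
    p = (prior["H1"] if h1 else 1 - prior["H1"]) * \
        (prior["H2"] if h2 else 1 - prior["H2"])
    return p * (cpt_e[(h1, h2)] if e else 1 - cpt_e[(h1, h2)])

def mpe(e=True):
    """argmax_{T(H)} Pr(T(H) | E = e), by exhaustive enumeration.

    Pr(T(H) | E) is proportional to the joint Pr(T(H), E), so maximizing
    the joint suffices.  The loop visits all 2^|H| truth assignments,
    which is exactly where the exponential cost of the general problem
    resides.
    """
    return max(product([True, False], repeat=2), key=lambda t: joint(*t, e))

print(mpe())  # (False, True): the most probable explanation of E = true
```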

The computational complexity of MPE has been extensively studied in the computer science literature. Not only is it known that computing MPE is NP-hard (Shimony, 1994), but it is also known that "approximating" MPE—in the sense of computing a truth assignment that has close to maximal probability—is NP-hard as well (Abdelbar & Hedetniemi, 1998). An even more sobering result is that it has been proven NP-hard to compute a truth assignment with a conditional probability of at least q, for any value 0 < q < 1 (Kwisthout, 2010). Importantly, such inapproximability results hold not only for MPE but also for many other computations postulated in Bayesian models. For instance, computations known to be NP-hard to approximate include Bayesian inference (Dagum & Luby, 1993; Sanborn et al., 2010), Bayesian decision making (Shachter, 1986, 1988; Vul, Goodman, Griffiths, & Tenenbaum, 2009), Bayesian planning (Körding & Wolpert, 2006; Littman, Goldsmith, & Mundhenk, 1998), and Bayesian learning (Chickering, 1996; Kemp & Tenenbaum, 2008). Computational complexity results such as these show that claims in the cognitive science literature about the tractable approximability of intractable Bayesian computations are not generally warranted.

We realize that our message may seem counterintuitive from the perspective of the algorithmic-level modeler who implements probabilistic or randomized algorithms for approximating Bayesian computations and who may find that such algorithms run quite fast and perform quite well. The paradox can be understood as a mismatch between the generality of the (intractable) computational-level Bayesian models and the (tractable) algorithms implemented for "approximating" the postulated input–output mappings. The algorithms will run fast and perform well only for a proper subset of input domains, viz., those domains for which the computation (exact or approximate) is tractable.

A general methodology for identifying restricted domains of inputs for which otherwise intractable computations are tractable is available (Blokpoel, Kwisthout, van der Weide, & van Rooij, 2010; van Rooij, 2008; van Rooij & Wareham, 2008; van Rooij, Evans, Müller, Gedge, & Wareham, 2008) and builds on the mathematical theory of parameterized complexity (Downey & Fellows, 1999). Parameterized complexity theory is motivated by the observation that some NP-hard problems can be computed by algorithms whose running time is polynomial in the overall input size n and nonpolynomial only in some small aspect of the input called the input parameter.5 To illustrate, consider a Bayesian network as the input structure for some Bayesian cognitive model C. Such a network has many parameters, each of which may have restricted values for the presumed domain of application. For instance, networks may have nodes with at most k1 incoming connections, or at most k2 outgoing connections, or a diameter of at most k3, or a network density of at most k4, or at most k5 independent layers, or may consist of at most k6 independent subnetworks, or have node variables with at most k7 different possible values, or have a treewidth (a measure of treelikeness) of at most k8, and so on. Observe that even if the computation C is NP-hard, it may still be computable in a time that is nonpolynomial only as a function of one or more such parameters k ∈ {k1, k2, k3, …}, and polynomial in the overall input size. In those cases, as long as k is relatively small (e.g., much smaller than the size of the entire network), the computation of C may be performed quite fast even for large Bayesian networks.
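
A back-of-the-envelope comparison makes the contrast concrete. In the sketch below we assume, purely for illustration, f(k) = 2^k and c = 1 (see footnote 5 for the general form f(k) · n^c):

```python
# Hypothetical step counts: a fixed-parameter tractable algorithm taking
# 2^k * n steps remains feasible for large n as long as the parameter k
# stays small, whereas 2^n steps are hopeless long before n = 100.
n = 10**6  # overall input size, e.g., number of nodes in the network
for k in (2, 5, 10, 20):
    print(f"k = {k:2d}: 2^k * n = {2**k * n:.1e} steps")
# By contrast, an unparameterized exponential algorithm taking 2^n steps
# would need about 10^301030 steps on the same input.
```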

Using proof techniques from parameterized complexity theory, computer scientists have already been able to show that if the knowledge structure underlying MPE is a Bayesian network with relatively small treewidth (for details on this special, constrained form of connectivity, see Bodlaender, 1997) and with relatively few possible values per variable, then computing MPE is tractable (Kwisthout, 2009, 2010; Nilsson, 1998). This means that in all cognitive domains in which such special connectivity and restricted cardinality can be assumed (e.g., on psychological and/or ecological grounds), the Bayesian computation MPE is no longer intractable, and approximation algorithms can tractably approximate that computation. Importantly, the tractability is achieved not by giving up on the "exactness" of the postulated computations, but by explicating that the modeled processes operate on domains defined by restricted parameter ranges.
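
To give a flavor of how such a fixed-parameter algorithm works, here is a minimal Python sketch (our own construction, with hypothetical probabilities) of Viterbi-style max-product dynamic programming on a chain-structured network X1 → X2 → … → Xn, the simplest case of bounded treewidth (treewidth 1). It finds the most probable joint assignment in O(n · v^2) time for v values per variable, instead of enumerating all v^n assignments; clamping observed variables to their evidence values would restrict the maximizations without changing this asymptotic cost:

```python
def chain_mpe(prior, transitions):
    """Most probable assignment for a chain X1 -> X2 -> ... -> Xn.

    prior[v]             = Pr(X1 = v)
    transitions[i][a][b] = Pr(X_{i+2} = b | X_{i+1} = a)
    Runs in O(n * v^2) time: polynomial in the chain length, with the
    exponential dependence confined to the (here trivially small)
    treewidth and variable cardinality.
    """
    values = list(prior)
    best = dict(prior)  # best[v]: prob. of best assignment to X1..Xi ending in v
    back = []           # backpointers for recovering the argmax assignment
    for trans in transitions:
        new_best, ptr = {}, {}
        for b in values:
            a_star = max(values, key=lambda a: best[a] * trans[a][b])
            new_best[b], ptr[b] = best[a_star] * trans[a_star][b], a_star
        best = new_best
        back.append(ptr)
    last = max(values, key=best.get)  # best final value
    assignment = [last]
    for ptr in reversed(back):        # trace backpointers to X1
        assignment.append(ptr[assignment[-1]])
    assignment.reverse()
    return assignment, best[last]

# Usage: a three-variable Boolean chain with hypothetical CPTs.
prior = {True: 0.3, False: 0.7}
t = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
print(chain_mpe(prior, [t, t]))  # ([False, False, False], 0.448)
```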

In closing, we wish to emphasize that our purpose is certainly not to downgrade Bayesian models of cognition or to argue that the challenge of making such models tractable cannot be met. On the contrary, we see great potential for Bayesian models to help advance the field of cognitive science and we hope to contribute to this by sharing our observations and pointing to a mathematically sound methodology for making Bayesian models tractable. Given that Bayesian modeling is now an accepted framework for approaching cognition, we think the time is ripe to start thinking seriously about the scalability of these models to real-world scenarios. In doing so, we believe that the Bayesian modeling community can (and should) display the same mathematical and scientific rigor as they have in probabilistic modeling and statistical testing. Making mathematically unfounded claims of efficient approximability of intractable Bayesian computations is not the way to move forward.

Footnotes
  • 1

     For example, a search on the keyword "Bayesian" in the proceedings of the 2000, 2004, and 2010 annual meetings of the Cognitive Science Society, for papers and posters either using or discussing Bayesian methods, shows an increase from one symposium and five papers in 2000, through 12 papers in 2004, to more than 100 papers and posters in 2010.

  • 2

     The purpose of this letter is not to argue for why NP-hard problems are indeed generally intractable—as this has been done extensively elsewhere (van Rooij, 2008)—nor to argue that intractability is really at stake for Bayesian models—as this is generally acknowledged by critics (Gigerenzer, 2008; Gigerenzer & Todd, 1999) and proponents (Chater, Tenenbaum, & Yuille, 2006; Sanborn, Griffiths, & Navarro, 2010) of Bayesian modeling alike.

  • 3

     This is true, assuming the class of problems solvable in polynomial time, P, is not equal to the class of problems whose solutions can be verified in polynomial time, NP. This famous P ≠ NP conjecture is mathematically unproven, yet widely believed by mathematicians on both theoretical and empirical grounds (Fortnow, 2009; Garey & Johnson, 1979). As do our intended interlocutors, we will assume P ≠ NP for the purposes of our discussion.

  • 4

     As an anonymous reviewer noted, intractability (e.g., NP-hardness) may be used for explaining why humans cannot perform certain tasks well or fast. Be that as it may, Bayesian models typically aim at explaining cognitive tasks that people can do well and fast. It is for this latter explanatory purpose that intractable (NP-hard) computations are problematic.

  • 5

     In these cases, the problem is said to be fixed-parameter tractable for that parameter (Downey & Fellows, 1999). More formally, a computation C is fixed-parameter tractable for a parameter k if there exists at least one algorithm that computes C in time f(k) · n^c, where f is a function depending only on the parameter k, n is a measure of the input size, and c is a constant.

References

  • Abdelbar, A. M., & Hedetniemi, S. M. (1998). Approximating MAPs for belief networks is NP-hard and other theorems. Artificial Intelligence, 102, 21–38.
  • Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Baker, C. L., Saxe, R., & Tenenbaum, J. B. (2009). Action understanding as inverse planning. Cognition, 113, 329–349.
  • Blokpoel, M., Kwisthout, J., van der Weide, T., & van Rooij, I. (2010). How action understanding can be rational, Bayesian, and tractable. In Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 50–55). Austin, TX: Cognitive Science Society.
  • Bodlaender, H. L. (1997). Treewidth: Algorithmic techniques and results. In Proceedings of the 22nd International Symposium on Mathematical Foundations of Computer Science (pp. 19–36). Berlin: Springer-Verlag.
  • Chater, N., & Manning, C. (2006). Probabilistic models of language processing and acquisition. Trends in Cognitive Sciences, 10(7), 335–344.
  • Chater, N., Tenenbaum, J. B., & Yuille, A. (2006). Probabilistic models of cognition: Conceptual foundations. Trends in Cognitive Sciences, 10(7), 287–291.
  • Chickering, D. M. (1996). Learning Bayesian networks is NP-complete. In D. Fisher & H.-J. Lenz (Eds.), Learning from data: AI and statistics V (pp. 121–130). Heidelberg: Springer-Verlag.
  • Dagum, P., & Luby, M. (1993). Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60, 141–153.
  • Downey, R., & Fellows, M. (1999). Parameterized complexity. Berlin: Springer-Verlag.
  • Fortnow, L. (2009). The status of the P versus NP problem. Communications of the ACM, 52(9), 78–86.
  • Frank, M. C., Goodman, N. D., & Tenenbaum, J. B. (2009). Using speakers’ referential intentions to model early cross-situational word learning. Psychological Science, 20, 578–585.
  • Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: A guide to the theory of NP-completeness. San Francisco, CA: W. H. Freeman.
  • Gigerenzer, G. (2008). Why heuristics work. Perspectives on Psychological Science, 3(1), 20–29.
  • Gigerenzer, G., & Todd, P. M. (1999). Simple heuristics that make us smart. Oxford, England: Oxford University Press.
  • Gigerenzer, G., Hoffrage, U., & Goldstein, D. G. (2008). Fast and frugal heuristics are plausible models of cognition: Reply to Dougherty, Franco-Watkins, and Thomas (2008). Psychological Review, 115(1), 230–239.
  • Griffiths, T. L., Sanborn, A. N., Canini, K. R., & Navarro, D. J. (2008). Categorization as nonparametric Bayesian density estimation. In M. Oaksford & N. Chater (Eds.), The probabilistic mind: Prospects for rational models of cognition (pp. 303–350). Oxford, England: Oxford University Press.
  • Kemp, C., & Tenenbaum, J. B. (2008). The discovery of structural form. Proceedings of the National Academy of Sciences, 105(31), 10687–10692.
  • Körding, K. P., & Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning. Nature, 427, 244–247.
  • Körding, K. P., & Wolpert, D. M. (2006). Bayesian decision theory in sensorimotor control. Trends in Cognitive Sciences, 10(7), 319–326.
  • Kwisthout, J. H. P. (2009). The computational complexity of probabilistic networks. Unpublished doctoral dissertation, Faculty of Science, Utrecht University, The Netherlands.
  • Kwisthout, J. H. P. (in press). Most probable explanations in Bayesian networks: Complexity and algorithms. International Journal of Approximate Reasoning.
  • Littman, M. L., Goldsmith, J., & Mundhenk, M. (1998). The computational complexity of probabilistic planning. Journal of Artificial Intelligence Research, 9, 1–36.
  • Nilsson, D. (1998). An efficient algorithm for finding the M most probable configurations in probabilistic expert systems. Statistics and Computing, 8, 159–173.
  • Sanborn, A. N., Griffiths, T. L., & Navarro, D. J. (2010). Rational approximations to rational models: Alternative algorithms for category learning. Psychological Review, 117(4), 1144–1167.
  • Shachter, R. (1986). Evaluating influence diagrams. Operations Research, 34, 871–882.
  • Shachter, R. (1988). Probabilistic inference and influence diagrams. Operations Research, 36, 589–604.
  • Shimony, S. E. (1994). Finding MAPs for belief networks is NP-hard. Artificial Intelligence, 68(2), 399–410.
  • Sloman, S. A., & Hagmayer, Y. (2006). The causal psycho-logic of choice. Trends in Cognitive Sciences, 10, 407–412.
  • van Rooij, I. (2008). The tractable cognition thesis. Cognitive Science, 32, 939–984.
  • van Rooij, I., & Wareham, T. (2008). Parameterized complexity in cognitive modeling: Foundations, applications and opportunities. Computer Journal, 51(3), 385–404.
  • van Rooij, I., Evans, P., Müller, M., Gedge, J., & Wareham, T. (2008). Identifying sources of intractability in cognitive models: An illustration using analogical structure mapping. In Proceedings of the 30th Annual Conference of the Cognitive Science Society (pp. 915–920). Austin, TX: Cognitive Science Society.
  • Vul, E., Goodman, N. D., Griffiths, T. L., & Tenenbaum, J. B. (2009). One and done? Optimal decisions from very few samples. In Proceedings of the 31st Annual Meeting of the Cognitive Science Society (pp. 148–153). Austin, TX: Cognitive Science Society.
  • Yuille, A., & Kersten, D. (2006). Vision as Bayesian inference: Analysis by synthesis? Trends in Cognitive Sciences, 10(7), 301–308.