But what are concepts save formulations and creations of thought, which, instead of giving us the true form of objects, show us rather the forms of thought itself? (Cassirer, 1946, p. 7)

The study of concepts—what they are, how they are used and how they are acquired—has provided one of the most enduring and compelling windows into the structure of the human mind. What we look for in a theory of concepts, and what kinds of concepts we look at, depends on the functions of concepts that interest us. Three intuitions weave throughout the cognitive science literature (e.g., see Fodor, 1998; Murphy, 2002):

1. *Concepts are mental representations that are used to discriminate between objects, events, relationships, or other states of affairs*. Cognitive psychologists have paid particular attention to concepts that identify kinds of things—those that classify or categorize objects—and such concepts are our focus here. It is clear how an ability to separate objects according to kind could be critical to survival. To take a classic example, a decision about whether it is appropriate to pursue or to flee an intruder depends on a judgment of kind (prey or predator), and an error in this judgment could have disastrous consequences.
2. *Concepts are learned inductively from the sparse and noisy data of an uncertain world*. Animals make some instinctive discriminations among objects on the basis of kind, but cognition in humans (and probably other species) goes beyond an innate endowment of conceptual discriminations. New kind concepts can be learned, often effortlessly despite great uncertainty. Even very sparse and noisy evidence, such as a few randomly encountered examples, can be sufficient for a young child to accurately grasp a new concept.
3. *Many concepts are formed by combining simpler concepts, and the meanings of complex concepts are derived in systematic ways from the meanings of their constituents*. Concepts are the constituents of thought, and thought is in principle unbounded, although human thinkers are clearly bounded. The “infinite use of finite means” (Humboldt, 1863) can be explained if concepts are constructed, as linguistic structures are constructed, from simpler elements: For example, morphemes are combined into words, words into phrases, phrases into more complex phrases and then sentences.

In our view, all of these intuitions about concepts are central and fundamentally correct, yet previous accounts have rarely attempted to (or been able to) do justice to all three. Early work in cognitive psychology focused on the first theme, concepts as rules for discriminating among categories of objects (Bruner, Goodnow, & Austin, 1956). Themes 2 and 3 were also present, but only in limited ways. Researchers examined the processes of learning concepts from examples, but in a deductive, puzzle-solving mode more than an inductive or statistical mode. The discrimination rules considered were constructed compositionally from simpler concepts or perceptual features. For instance, one might study how people learn a concept for picking out objects as “large and red and round.” An important goal of this research program was to characterize which kinds of concepts were harder or easier to learn in terms of syntactic measures of a concept's complexity, when that concept was expressed as a combination of simple perceptual features. This approach reached its apogee in the work of Shepard, Hovland, and Jenkins (1961) and Feldman (2000), who organized possible Boolean concepts (those that discriminate among objects representable by binary features) into syntactically equivalent families and studied how the syntax was reflected in learnability.
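The syntactic-complexity idea behind this tradition, that a concept's learnability tracks the length of the shortest propositional formula expressing it, can be conveyed with a brute-force sketch. This is our own illustration, not code from any of the cited studies; the particular representation (concepts as sets of binary feature vectors, complexity as the literal count of the shortest matching disjunctive-normal-form formula) is an expository assumption.

```python
from itertools import combinations, product

FEATURES = 3
OBJECTS = list(product([0, 1], repeat=FEATURES))

# A literal fixes one feature to one value; a conjunction is a tuple of
# literals over distinct features.
LITERALS = [(i, v) for i in range(FEATURES) for v in (0, 1)]
CONJUNCTIONS = [c for r in range(1, FEATURES + 1)
                for c in combinations(LITERALS, r)
                if len({i for i, _ in c}) == r]

def satisfies(obj, conj):
    return all(obj[i] == v for i, v in conj)

def extension(formula):
    """The set of objects picked out by a DNF formula (a tuple of conjunctions)."""
    return frozenset(o for o in OBJECTS if any(satisfies(o, c) for c in formula))

def boolean_complexity(concept, max_disjuncts=4):
    """Literal count of the shortest DNF formula whose extension is `concept`."""
    target, best = frozenset(concept), None
    for k in range(1, max_disjuncts + 1):
        for formula in combinations(CONJUNCTIONS, k):
            if extension(formula) == target:
                cost = sum(len(c) for c in formula)
                best = cost if best is None else min(best, cost)
    return best

# "large and red and round": one 3-literal conjunction, complexity 3
print(boolean_complexity([(1, 1, 1)]))                        # 3
# a single-feature rule such as "large": complexity 1
print(boolean_complexity([o for o in OBJECTS if o[0] == 1]))  # 1
```

On measures of this sort, concepts with shorter minimal formulas are predicted to be easier to learn, which is the pattern the Shepard et al. and Feldman studies examined.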

A second wave of research on concept learning, often known as the “statistical view” or “similarity-based approach,” emphasized the integration of Themes 1 and 2 in the form of inductive learning of statistical distributions or statistical discrimination functions. These accounts include prototype theories (Medin & Schaffer, 1978; Posner & Keele, 1968), exemplar theories (Kruschke, 1992; Nosofsky, 1986; Shepard & Chang, 1963), and some theories in between (Anderson, 1990; Love, Gureckis, & Medin, 2004). These theories do not rest on a compositional language for concepts and so have nothing to say about Theme 3—how simple concepts are combined to form more complex structures (Osherson & Smith, 1981).

An important recent development in the statistical tradition has been the rational analysis of concept learning in terms of Bayesian inference (Anderson, 1990; Shepard, 1987; Tenenbaum & Griffiths, 2001). These analyses show how important aspects of concept learning—such as the exponential decay gradient of generalization from exemplars (Shepard, 1987) or the transitions between exemplar and prototype behavior (Anderson, 1990)—can be explained as approximately optimal statistical inference given limited examples. However, these rational analyses have typically been limited by the need to assume a fixed hypothesis space of simple candidate concepts—such as Shepard's (1987) “consequential regions”: connected regions in a low-dimensional continuous representation of stimuli. The standard Bayesian framework shows how to do rational inductive learning given such a hypothesis space, but not where this hypothesis space comes from or how learners can go beyond the simple concepts it contains when required to do so by the complex patterns of their experience.

The last decade has also seen renewed interest in the idea that concepts are constructed by combination, as a way of explaining apparently unbounded human conceptual resources. This is especially apparent in accounts of concepts and concept learning that place compositionality at center stage (Fodor, 1998; Murphy, 2002). Logical or rule-based representations are often invoked. Most relevant to our work here is the rules-plus-exceptions (RULEX) model of Nosofsky, Palmeri, and McKinley (1994), which is based on a set of simple heuristics for constructing classification rules, in the form of a conjunction of features that identify the concept plus a conjunction of features that identify exceptions. The RULEX model has achieved strikingly good fits to classic human concept learning data, including some of the data sets that motivated statistical accounts, but it too has limitations. Fitting RULEX typically involves adjusting a number of free parameters, and the model has no clear interpretation in terms of rational statistical approaches to inductive learning. Perhaps most important, it is unclear how to extend the rule-learning heuristics of RULEX to more complex representations. Therefore, although RULEX uses a compositional representation, it is unable to fully leverage compositionality.

Our goal here is a model that integrates all three of these major themes from the literature on concepts and concept learning. Our Rational Rules model combines the inferential power of Bayesian induction with the representational power of mathematical logic and generative grammar; the former accounts for how concepts are learned under uncertainty, whereas the latter provides a compositional hypothesis space of candidate concepts. This article is only a first step toward an admittedly ambitious goal, so we restrict our attention to some of the simplest and best-studied cases of concept learning from the cognitive psychology literature. We hope readers will judge our contribution not by these limits, but by the insights we develop for how to build a theory that captures several deep functions of concepts that are rarely integrated and often held to be incompatible. We think these insights have considerable generality.

Our approach can best be understood in the tradition of rational modeling (Anderson, 1990; Oaksford & Chater, 1998), and specifically rational models of generalization (Shepard, 1987; Tenenbaum & Griffiths, 2001). The main difference from earlier work is the adoption of a qualitatively richer and more interesting form for the learner's hypothesis space. Instead of defining a hypothesis space directly as a set of subsets of objects, with no intrinsic structure or constructive relations between more complex hypotheses and simpler ones, we work with a compositional hypothesis space generated by a probabilistic grammar. The grammar yields a “concept language” that describes a range of concepts varying greatly in complexity, all generated from a small basis set of features or atomic concepts. Hypotheses range from the simplest single-feature rules to complex combinations of rules needed to describe an arbitrary discrimination boundary. The prior probability of each hypothesis is not specified directly, by hand, but rather is generated automatically by the grammar in a way that naturally favors the simpler hypotheses. By performing Bayesian inference over this concept language, a learner can make rational decisions about how to generalize a novel concept to unseen objects given only a few, potentially noisy, examples of that concept. The resulting model is successful at predicting human judgments in a range of concept learning tasks at both the group and individual levels, using a minimum of free parameters and arbitrary processing assumptions.
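A stripped-down version of this construction can be sketched in code. A toy grammar generates disjunctive-normal-form rules over binary features; a formula's prior is the product of the probabilities of the productions used to generate it, so longer formulas are automatically penalized, and the posterior combines this prior with a noisy-labeling likelihood. The grammar, production probabilities, outlier rate, and truncated enumeration are all illustrative choices of ours, not the actual specification of the Rational Rules model.

```python
from itertools import combinations

FEATURES = 3
LITERALS = [(i, v) for i in range(FEATURES) for v in (0, 1)]

# Toy probabilistic grammar over DNF formulas:
#   D -> C or D (0.3) | C (0.7)      C -> P and C (0.3) | P (0.7)
#   P -> one of the 2 * FEATURES literals, chosen uniformly

def conj_prior(conj):
    return 0.3 ** (len(conj) - 1) * 0.7 * (1 / (2 * FEATURES)) ** len(conj)

def formula_prior(formula):
    p = 0.3 ** (len(formula) - 1) * 0.7
    for conj in formula:
        p *= conj_prior(conj)
    return p

def classify(formula, obj):
    return any(all(obj[i] == v for i, v in conj) for conj in formula)

def likelihood(formula, data, outlier):
    """Each label agrees with the rule w.p. 1 - outlier, else is an outlier."""
    lik = 1.0
    for obj, label in data:
        lik *= (1 - outlier) if classify(formula, obj) == label else outlier
    return lik

def posterior(data, outlier=0.1, max_disjuncts=2, max_literals=2):
    """Posterior over a truncated enumeration of the concept language."""
    conjs = [c for r in range(1, max_literals + 1)
             for c in combinations(LITERALS, r)
             if len({i for i, _ in c}) == r]
    formulas = [f for k in range(1, max_disjuncts + 1)
                for f in combinations(conjs, k)]
    scores = [formula_prior(f) * likelihood(f, data, outlier) for f in formulas]
    z = sum(scores)
    return sorted(zip(formulas, (s / z for s in scores)), key=lambda t: -t[1])

data = [((1, 1, 0), True), ((1, 0, 1), True),
        ((0, 0, 0), False), ((0, 1, 1), False)]
best, p = posterior(data)[0]
print(best)  # (((0, 1),),): the MAP hypothesis is the rule "feature 0 == 1"
```

Because the grammar-derived prior concentrates mass on short formulas, the maximum a posteriori hypothesis here is the single-feature rule that explains the labels, rather than any of the more complex formulas that fit equally well.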

This analysis is an advance for several viewpoints on concepts and concept learning. From the vantage of rule-based models of categorization (e.g., RULEX), the Rational Rules model provides a rational inductive logic explaining why certain rules should be extracted. This naturally complements the insights of existing process-level accounts by tying the rule-learning competency to general principles of learning and representation. To rational statistical accounts of concept learning, the grammar-based approach of the Rational Rules model contributes a more satisfying account of the hypothesis space and prior: From simple components the grammar compactly specifies an infinite, flexible hypothesis space of structured rules and a prior that controls complexity. For compositionality advocates, we provide a way to use the “language of thought” (Fodor, 1975), reified here in the concept language of logical rules, to do categorization under uncertainty—that is, the grammar-based induction approach suggests a compelling way to quantitatively relate the language of thought, already an elegant and rational view of the mind, to human behavior, even at the level of predicting individual participants. Finally, by showing that the rational statistical approach to concepts is compatible with the rule-based, combinatorial approach, and that the union can accurately predict human behavior, we hope to cut one of the Gordian knots of modern psychology: the supposed dichotomy between rules and statistics.