Formal learning theory, broadly construed, is a diverse collection of approaches to the mathematical modeling of learning. From the point of view of cognitive science, formal learning theory can provide constraints on what is learnable by different types of idealized mechanism. Moreover, in some cases, the specific algorithms discussed may themselves be part of a cognitive model. Such modeling is therefore potentially relevant to human learning in two ways: by modeling the learning algorithm directly, or by deriving theoretical results that constrain what any learning system can possibly achieve given certain data. In practice, however, many theoretical discussions and computational frameworks for understanding learning in cognitive science are not connected to such theoretical findings, so that, for example, it may be difficult to assess whether a particular model can, in principle, scale up to more complex cases. Conversely, when not directly tied to cognitive scientific questions (or to practical challenges in machine learning), formal learning theory can become a rather specialized mathematical activity with no clear application. This disconnect between two disciplines that could productively inform each other deserves some remedy, and it is our hope that the papers in the present topic go some way toward fostering a closer connection.

Although cognitive science is canonically concerned with the construction of computational models of specific cognitive phenomena (including learning of all kinds, and of course language acquisition), there remain fundamental questions about the capabilities of different classes of cognitive model, and about the classes of data from which such models can successfully learn. In view of this, formal learning theory has the potential to play a role within cognitive science analogous to that played by theoretical computer science with respect to applied computing. The analysis of learning in cognitive science thus has a rich vein of theoretical work to draw upon, but too often the literature describes specific computational learning simulations that are not accompanied by, or situated within, any theoretical analysis.

Theoretical analysis of learning is frequently discussed under the heading of “inductive inference”—viewing learning as a type of inference in which the “premises” typically consist of observed data (which might be perceptual input, reward signals from the environment, linguistic material, and so on). Early formal work on inductive inference attempted to model the process by extending classical logic to an inductive logic built on probability theory (e.g., Carnap, 1950). Although this line is still pursued in some fields (Anshakov & Gergely, 2010; Milch et al., 2007; Muggleton, 1990), research has largely shifted away from the powerful but often intractable representational machinery of formal logic toward models of learning defined over much more restricted representational formalisms. Interest thus often focuses on more modest problems, such as learning categories from examples, or learning which sentences are allowed in a language from examples (or, even more abstractly, learning potentially infinite sets of *numbers* from examples, and perhaps also from nonexamples labeled as such).
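To make the “sets of numbers” setting concrete, here is a minimal sketch (our own illustration, not drawn from any of the cited works) of a learner that receives positive examples of an unknown set of multiples and conjectures the divisor as the greatest common divisor of everything seen so far; the conjecture can only become more specific as data accumulate.

```python
from math import gcd

def conjecture(examples):
    """Conjecture the target set {n : d divides n} from positive
    examples, taking d to be the gcd of all examples seen so far."""
    d = 0
    for x in examples:
        d = gcd(d, x)
    return d

# Each new datum can only refine the guess; here it stabilizes at 6.
stream = [12, 18, 30, 6, 24]
guesses = [conjecture(stream[:i + 1]) for i in range(len(stream))]
print(guesses)  # [12, 6, 6, 6, 6]
```

This illustrates the general flavor of inductive inference over restricted formalisms: the hypothesis space (here, sets of multiples) is narrow enough that a simple update rule provably converges on the target.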

Various formal learning models have gradually come to the fore, including Bayesian inference, the statistical inference model codified by Solomonoff (1964a), the Probably Approximately Correct (PAC) framework proposed by Valiant (1984), and the “identification in the limit” model of Gold (1967). The now well-known “Gold's theorem” arising from the last of these launched a vigorous debate in linguistics which is far from over. Its deceptive simplicity has led to its being misunderstood perhaps more often than correctly interpreted within the linguistics and cognitive science community, as richly documented by Johnson (2004). All of these learning models fall under the rubric of *supervised* learning, because they must use data that are in some sense labeled—either the learner assumes that all examples are positive examples of a single concept, such as a language, or a more elaborate distribution of multiple labels can be provided, as in category learning. Another area of research involves the induction of structured knowledge from unlabeled data—so-called *unsupervised* learning. This area encompasses such disparate enterprises as data clustering, independent component analysis, and self-organizing maps; measuring the success of such unsupervised techniques is generally difficult. The use of unlabeled data in conjunction with labeled data has developed into an interesting subfield, *semi-supervised* learning. We next present a brief overview of learning theory covering many subfields, attempting to touch upon all of the approaches applied in the contributed papers.
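Gold's identification-in-the-limit criterion can be illustrated with a toy learner (a sketch under our own simplifying assumptions, not Gold's original presentation): given an enumeration of candidate languages and a text of positive examples, the learner conjectures the first language in the enumeration consistent with all data seen so far, and it succeeds if its conjectures eventually converge to a correct one and never change again.

```python
def identify_in_the_limit(text, hypotheses):
    """Gold-style learner by enumeration: after each datum, output the
    index of the first hypothesis containing every example seen so far."""
    seen = set()
    conjectures = []
    for datum in text:
        seen.add(datum)
        for i, h in enumerate(hypotheses):
            if seen <= h:          # hypothesis h is consistent with the data
                conjectures.append(i)
                break
    return conjectures

# Toy hypothesis class: the finite languages {0}, {0,1}, {0,1,2}, ...
hypotheses = [set(range(k + 1)) for k in range(10)]
text = [0, 2, 1, 2, 2]  # a text (positive presentation) for {0, 1, 2}
print(identify_in_the_limit(text, hypotheses))  # [0, 2, 2, 2, 2]
```

The learner's conjectures converge to index 2 (the language {0, 1, 2}) and remain there, which is exactly what identification in the limit requires. Note that from positive data alone the learner can never be *certain* it has converged, which is the intuition behind Gold's negative results for richer hypothesis classes.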