Philosophy and Language Testing

Evaluation, Methodology, and Interdisciplinary Themes

Part 12. Interdisciplinary Themes

Glenn Fulcher

Published Online: 28 OCT 2013

DOI: 10.1002/9781118411360.wbcla032

The Companion to Language Assessment

Philosophy is concerned with “rational thinking about . . . the general nature of the world (metaphysics or theory of existence), the justification of belief (epistemology or theory of knowledge) and the conduct of life (ethics or theory of value)” (Honderich, 1995, p. 666). In education and language testing we are concerned with questions of ontology (what we believe to be true), epistemology (how we discover what is true), and the consequences of testing (the nature of ethical practice). This chapter will focus primarily on questions of ontology and epistemology, as ethics is dealt with separately in Chapter 7. Furthermore, while general agreement among language testers exists on key ethical principles to guide our practice, there are radical differences of views regarding ontological and epistemological questions.

As far as epistemology is concerned, the question usually boils down to: “Should the human sciences emulate the methods of the natural sciences or should they develop their own?” (Polkinghorne, 1983, p. 15). Realists—heirs to Hobbes, Mill, and Comte, who believe in the existence of what we observe and test independently of the observer or tester—give special place to the scientific method. Antirealists, on the other hand, usually hold that the constructs we claim to test are not independent of the language tester or the act of testing. The so-called “objects” of our observation exist only in relation to our interpretations of them as they are locally constructed. They would argue with Dilthey (1883/2008) that the richness of human experience and culture cannot be captured by methods developed for the natural sciences. Of particular importance in language testing is the “social turn,” which brings critical analysis to test use and impact. There is much room for disagreement here. Paradigm clashes are not unusual in the social sciences, but in language testing the fault lines are more pronounced because, for most of its history, it has been firmly grounded in the scientific realism of early quantitative approaches: “One of the most important objects of measurement . . . is to obtain a general knowledge of the capacities of man by sinking shafts, as it were, at a few critical points” (Cattell & Galton, 1890, p. 380). In this chapter I set out the realist and antirealist positions, realizing that there are many gradations between the two. I argue that extreme positions on the cline are untenable. I make a case for realism in the pragmatist tradition, which is not to be associated with the naive realism that is the target of constructivism. I also recognize the role for critical research, especially where language testing is misused or abused. I conclude by proposing an optimistic view of the future within an Enlightenment-inspired framework.

I begin by describing the realist position, and then move on to antirealist stances. With Bachman (2006, pp. 196–7), I distinguish two kinds of antirealist stance, the constructivist and the operationalist, although I prefer to call the latter instrumentalist for reasons that will become clear, and because Kane (2006b, p. 442) explicitly distances his approach to validation from the operationalist position. I then discuss two key issues upon which language testers are in fundamental disagreement because of their philosophical positions. I then briefly indicate the research each position generates, and outline the challenges they face. Finally, I suggest a way forward based on classical pragmatism.


Realists hold to the Enlightenment view that the scientific method is the most productive in empirical research (whether quantitative or qualitative), as expressed by Popper (1959, p. 3):

A scientist, whether theorist or experimenter, puts forward statements, or systems of statements, and tests them step by step. In the field of the empirical sciences, more particularly, he constructs hypotheses, or systems of theories, and tests them against experience by observation and experiment.

The applicability of realism to social sciences has also been championed by educationalists such as Dewey, for whom

the scientific method is simply the method of experimental enquiry combined with free and full discussion—which means, in the case of social problems, the maximum use of the capacities of citizens for proposing courses of action, for testing them, and for evaluating the results. (Putnam, 1990, p. 190)

Theories and evidence that provide the basis for decision making need to be assessed using generally accepted criteria. In language testing, four have been suggested (Fulcher & Davidson, 2007, p. 20):

  1. Testability: Theory generates predictions that can be tested, specifically to see whether scores support inferences from test-taker responses to skills, abilities, or knowledge, and to investigate if inferences are generalizable, and capable of extrapolation to the real world.
  2. Simplicity (Ockham's razor): The requirement that the theory does not use more abstract terms or constructs than are necessary to explain the evidence available.
  3. Coherence: The need to construct theories that are in keeping with what is already known, as well as for the theory itself to be internally coherent.
  4. Comprehensiveness: The requirement that our theories account for as much of the available data and facts as possible.

It is argued that these criteria are “paradigm free” and can be used in theory and model evaluation of any kind. However, the logic of the key criterion of testability assumes an evidential approach to validation, which in turn presupposes that the evidence exists. It seems reasonable that a researcher in any evidence-based discipline must subscribe to this notion, encapsulated in this summary of Hume's position: “He holds that objects that have real existence must have duration and must be independent of what we individually think about them” (Meyers, 2006, p. 63). In order to test theories we must have experiences of enduring objects, events, or states that co-occur to a degree that would minimally allow us to make statements about the likelihood of, and possible reasons for, co-occurrence.

In language testing this leads to two claims. First, that individuals have a stable language competence and capacity for use that endures for some time even though it is subject to change (through learning or attrition), and that responses to test items or tasks can be translated into numbers that are indexical of that competence. This is not to deny that communication is a social act, but recognizes that, unless an individual has an enduring performable competence, they cannot engage in anything like the “co-construction” of discourse (Fulcher, 2003, pp. 19–20). Second, that score meaning can be generalized and extrapolated to relevant domains for a reasonable period of time, and with a known degree of probability: our theory makes predictions about the likelihood of future events.

Language testing has, for the most part, relied on realist assumptions throughout its history, partly because it has been largely dependent upon the normative practices in measurement that Quetelet imported into social science research from astronomy in the creation of his “social physics” (1842/1962, p. 9); and, as Hamp-Lyons (2000, p. 582) has argued, “The early history of language testing on the American side of the Atlantic is part of the larger story of intelligence testing, which was firmly grounded in positivism.” This observation is largely correct, even if the geographical claim and the reference to positivism are not. First, there had always been an interest in measurement in the United Kingdom (Edgeworth, 1888, 1890), and in 1923 Ballard (1923, p. 29) could write

The British Press refers to mental tests as though they were new things invented by Americans. In point of fact they are neither new nor American. They have been the common property of the race since the dawn of history.

Ballard cites research by Cyril Burt, as well as the adaptation of the Binet tests. Second, the label “positivism” is now typically used pejoratively, and with less specificity than it deserves. Most researchers who hold a realist position do not hold positivist views or espouse the verifiability principle (Jordan, 2004, p. 32). Such a position is nominalist, and therefore profoundly antirealist. In arguing that only verifiable statements are meaningful, and that only words which refer to observables are capable of verification, all “theoretical” words are rendered unintelligible (Devitt & Sterelny, 1987, pp. 189–90). Without theoretical language, scientific research programs are unattainable; this is why positivism is referred to as “the linguistic turn” in philosophy.


Constructionism (or social constructionism) is a postmodern approach that does not ask about truth, but wishes to uncover the historical and cultural reasons that led to the currently dominant version of truth. This may take the form of deconstructing text where no form (particularly scientific) has any special status (Derrida), or uncovering the power structures that are claimed to marginalize people while legitimizing the power of the elite (Foucault). Constructivists hold that our tests and what they measure are contingent upon the social context in which they are designed and used.

All shades of constructionism are therefore critical, and the basic assumptions are laid out by Hacking (1999):

  (0) X is taken for granted. X appears to be inevitable.
  (1) X need not have existed. X could have been different.
  (2) X is bad.
  (3) We would be better off if X were changed, or if X did not exist.

To be a constructivist, it is necessary to subscribe to at least (0) and (1), and it is (1) that gives constructionism its edge: Our current beliefs and practices, including our theories and constructs, are contingent. If a constructivist also holds (2), she is usually committed to unmasking the evils of X in order to undermine the power or authority that is associated with it, or wishes to reform aspects of X. When a constructivist also holds (3) the attacks on X are usually strident, foregrounding injustices, marginalization, or subjugation of peoples. In applied linguistics the language becomes one of struggle and conflict, with charges of “cultural imperialism” and a determination by the powerful “centre” (Western cultures and Anglo-American norms) to keep the “periphery in a state of dependence” (Phillipson, 1988, p. 348). All groups who can be cast as minority or downtrodden are drawn into the argument, and labels such as “patriarchal,” “oppressive,” and “positivist” are attached to alternative views (Pennycook, 2001).

Social constructivist schools of thought bring the same critical approach to “knowledge,” which for them is also contingent. The concept becomes a battleground in education because constructivists claim that it is the powerful who decide what “knowledge” counts and is therefore learned and tested. Testing is seen as the mechanism through which the elite exercise power and maintain their position (Foucault, 1975, pp. 184–94). Questions of inductive inference are irrelevant, because all “knowledges” are equal in value; facts do not help to build, support, or undermine theories, for “the facts emerge only in the context of some point of view” (Fish, 1995, p. 253). The ultimate statement of this extreme position was provided by Nietzsche (1888, ¶ 604):

“Interpretation,” the introduction of meaning not “explanation” (in most cases a new interpretation over an old interpretation that has become incomprehensible, that is now itself only a sign). There are no facts, everything is in flux, incomprehensible, elusive; what is relatively most enduring is—our opinions.

This carries a number of implications. First, no utterance (consisting of conventional signs—or words) can be evaluated in terms of whether it succeeds or fails to correspond to some external reality. Rather, use of language is a moment-by-moment attempt to deal with experience, whether of other people or of our environment. Attempts to decide if conventional signs “fit the facts” or describe “the way the world is” are futile (Rorty, 1989, p. 121); we are simply negotiating our way through existence. Reference from conventional signs to the real world as described by Frege (1892) is no longer of concern. Second, dualism is abolished. What is language? Nothing but “new forms of life constantly killing off old forms—not to accomplish a higher purpose, but blindly” (Rorty, 1989, p. 120). This nominalism (which constructivism shares with positivism!) makes it equally meaningless to ask questions about psychological states, as they are transitory and ephemeral. They simply cannot be known, explained, or predicted. What we are left with is the transient social construction of meaning on an interaction-by-interaction basis.


Although I have classed instrumentalism as antirealist, it may be more appropriate to call it nonrealist, because instrumentalists hold that, if a test assists in useful decision making, that is really all that matters. For instrumentalists the issue of whether the terms of theories refer to any real entity is simply irrelevant. They accept Hume's fork, and hold that nondeductive (subjective) inference is always subject to question and error. One argument for instrumentalism is provided by Laudan (1981a) in his critique of realism, in which he uses historical evidence to undermine the premise that successful theories have terms that refer. For example, atomic theory failed to be empirically successful for hundreds of years, while the miasmatic theory of disease transmission was: it led to policies of moving people away from ports and introducing quarantine. Thus, theories are evaluated primarily on the grounds of the degree to which they enable us to predict phenomena and manipulate our environment in useful ways, as we can never be certain that our terms refer.

Each of the three positions described in the introduction has impacted upon language testing, leading to incommensurable stances that are explored in the next section.

Current Positions on Key Issues

I have selected two themes for discussion. My rationale is that these best illustrate fault lines that are directly related to philosophical beliefs.

Constructs/Theoretical Terms

Bachman (2006, pp. 182–3) writes: “When a researcher observes some phenomenon in the real world, he generally does this because he wants to describe, induce or explain something on the basis of this observation. That something is what can be called a ‘construct’.” These are nonobservable abstract nouns that are operationalized in such a way that we may make inferences about them from our observations (Fulcher & Davidson, 2007, p. 7). Realists minimally subscribe to the “reality” of these nonobservables.

This is very close to a correspondence theory of truth—the natural home of the realist. Models of communicative competence/language ability, from Oller's use of Spearman's “g” to modern componential approaches, rest on an assumption that the terms of the theory refer to real competences that are not merely useful fictions.

Some researchers explicitly work within this paradigm rather than just assume it to be the case:

We argue that the validity of any given teaching, learning, and assessment task—whether it is representative, authentic, and generalizable—is just a more complex version of the problem of determining whether a representation of a given state of affairs is true or not. We provide two logical arguments. Both of them show the construal (production and interpretation) of surface forms of discourse in order to represent faithfully (and truthfully) certain changing states of affairs in the real world is the necessary and sufficient basis for any validity to be found in any teaching, learning, and assessment tasks whatever. (Badon, Oller, Yan, & Oller, 2005, p. 2)

Badon et al. argue that the validity of a test of aviation English can be evaluated on the grounds of whether or not language used by pilots, air traffic controllers, and test takers represents a true state of affairs in the real world. The facts of real-world events must be encoded into recognized conventional signs (linguistic realizations). Based on Oller's theory of pragmatic mapping, the validity question becomes whether the construct to be measured exists, and whether variation in scores is causally linked to variations in the construct. It is therefore necessary to develop tasks which require test takers to refer to objects and events in the real world, and use language to control and change events.

The data-based approach to scale development, with its careful analysis of language use in context, but relating observable variables to constructs such as “discourse management” and “pragmatics,” would sit comfortably within this kind of interpretation (Fulcher, Davidson, & Kemp, 2011). For this reason we add the further observation that realist approaches do not abandon context. Rather,

The authenticity, representativeness, and consequent generalizability of teaching, learning, and assessment tasks depends on their incorporation of the sign systems, social actions, and realia found in actual contexts of discourse. While codes, contexts, and interactions must be distinguished in theory, in practice they interact holistically. (Badon et al., 2005, p. 1)

For realists, context is real, not constructed, and so, while it is important to maintain a connection between the world and conventional signs, realists must also take seriously implicature and illocutionary intent.

Some would go further and argue that the term “construct” needs to be distinguished from “trait,” as the former implies that the theoretical term is a construction of the researcher: It may be part of a nomological net, but does not refer. That is, construct theorists are said to really be constructivists with a scientific air about them. For example, they may admit that a number of models could fit their data, and the theoretical terms could vary by model. In contrast, Blackburn (2005, p. 118) describes a “real realist, an industrial strength, meat-eating realist” as someone who holds that (a) there are no such things as constructs, only traits, which refer to properties that exist in the real world, are discovered not created, and exist independently of the researcher or theories, and (b) the terms define the properties in ways that are not contingent. This position is best represented by Borsboom and colleagues, who argue:

Realism, in the context of measurement, simply says that a measurement instrument for an attribute has the property that it is sensitive to differences in the attribute; that is, when the attribute differs over objects then the measurement procedure gives a different outcome. (Borsboom, Cramer, Kievit, Scholten, & Franic, 2009, p. 148)

Validity in this formulation is equivalent to the existence of what the test measures, and goes back to the strongest scientific claims for testing made in the 19th and early 20th centuries. The argument is that only “if this ontological claim holds, then the measurement procedure can be used to find out about the attributes to which it refers” (Borsboom, 2005, p. 152).
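Read in an idealized, deterministic way, the sensitivity condition quoted above can be stated compactly. The notation is an assumption introduced here for illustration only and does not appear in Borsboom et al.: $O$ is the set of objects (test takers), $A(o)$ is the value of the attribute for object $o$, and $M(o)$ is the outcome of the measurement procedure for $o$.

```latex
\[
  \forall\, a, b \in O : \quad A(a) \neq A(b) \;\Longrightarrow\; M(a) \neq M(b)
\]
```

On this reading the condition cannot even be stated unless $A$ is taken to exist as a property of the objects, which is the ontological commitment at issue.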

Constructivism is incommensurable with all shades of realism. Constructivists challenge the primary claim that there are facts or traits in the real world that exist independently of the mind of the researcher or test taker. The world itself is constructed. The trail of the human serpent is everywhere.

Do language testers deal with “facts” or things that exist? McNamara argues that they do not. He represents a trend in language-testing research that focuses upon the social nature of language testing, and the dependency of all concepts and communication on locally situated interaction:

Recent work has drawn attention to the potential of poststructuralist thought in understanding how apparently neutral language proficiency constructs are inevitably socially constructed and thus embody values and ideologies (McNamara, 2001, 2006). It is worth noting here that the deconstruction of such test constructs applies no less to constructs in other fields of applied linguistics, notably second language acquisition.

There is also a growing realization that many language test constructs are explicitly political in character and hence not amenable to influences which are not political. (McNamara, 2006, pp. 37–8)

The constructs have no “existence” in the external world, and their conventional names are signs constructed for social—primarily political—purposes. More specifically, tests play a critical role in the power struggles that constitute identity-forming social life, and may be deconstructed using Foucauldian insights (Shohamy, 2001, pp. 20–4, 54–8). The proper focus of attention is the social construction of tests, their social impact, and role in policy. Construct labels no longer refer, reducing them to the embodiment of the values and ideologies at play in the power struggles of the day.

As a direct consequence, the role of cognition is downplayed in critiques of validity theories, and the link between performance (observation) and competence (construct) abolished. Using the notion of performativity from feminist poststructuralism, McNamara also suggests:

We assume in language testing the existence of prior constructs such as language proficiency or language ability. It is the task of the language tester to allow them to be expressed, to be displayed, in the test performance. But what if the direction of the action is the reverse, so that the act of testing itself constructs the notion of language proficiency? (McNamara, 2001, p. 339)

Presumably, in the process of testing, we see just another transitory interaction, or what Davidson (1980) refers to as “a passing theory,” in which identity and meaning are temporarily constructed and deconstructed:

In linguistic communication nothing corresponds to a linguistic competence as often described . . . I conclude that there is no such thing as a language, not if a language is anything like what many philosophers and linguists have supposed. There is therefore no such thing to be learned, mastered, or born with. We must give up the idea of a clearly defined shared structure which language-users acquire and then apply to cases. (Davidson, 1980, p. 265)

The instrumentalist position makes no assumption about construct reality. Nor does it admit the necessity of constructs for language testing to be a successful enterprise. Validity is an issue of whether the testing processes lead to useful outcomes. This is the primary reason for the move from talk of “validity” (Messick, 1989) to talk of “validation” (Kane, 2006a). Although Kane uses the language of constructs and traits, he argues that “The use of trait language does not necessarily buy us much, and it can be misleading. It can suggest that we have found an explanation for an observed regularity, when we have merely labelled it” (Kane, 2006a, p. 30). Such an error is defined as “reification” (Kane, 2006a, p. 59). Kane (2009, pp. 54–7) has also argued that it is possible to avoid construct language completely, scoring only relevant observable variables displayed in tasks sampled from the universe of generalization. Chapelle, Enright, and Jamieson (2010) embrace this position, arguing that the construct of academic language proficiency has proved too difficult to define and articulate as a basis for test development and validation: “Kane's organizing concept of an ‘interpretive argument,’ which does not rely on a construct, proved to be useful” (Chapelle et al., 2010, pp. 3–4). Bypassing construct labels and definitions, they move straight from observables to claims using the Toulmin model as the basis for an interpretive argument (see Figure 1).


Figure 1. Interpretive argument.

Adapted from Chapelle et al. (2010, p. 5)

The evidence leads to a score generated by scoring rules (the application of a scoring rubric), and an inference is made from the score to the claim. It is important to note that this is done without the need for a construct inference such as the student's “fluency.”
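The chain described here (evidence, scoring rule, score, claim) can be sketched as a small data structure. This is a minimal illustration, not the model of Kane or of Chapelle et al.: the class name `InterpretiveArgument`, the cut score, the ratings, and the wording of the claim, warrant, and backing are all hypothetical. The point it shows is that the inference runs from observed ratings to a claim without any construct variable such as "fluency" appearing in the chain.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass(frozen=True)
class InterpretiveArgument:
    """Toulmin-style chain from observations to a claim (illustrative only)."""

    scoring_rule: Callable[[Sequence[int]], float]  # rubric applied to observed ratings
    cut_score: float  # hypothetical threshold at which the claim is licensed
    claim: str        # the decision-relevant claim
    warrant: str      # why a score at or above the cut licenses the claim
    backing: str      # evidence offered in support of the warrant

    def score(self, observations: Sequence[int]) -> float:
        """Evidence -> score, via the scoring rule."""
        return self.scoring_rule(observations)

    def infer(self, observations: Sequence[int]) -> str:
        """Score -> claim; no construct label is consulted anywhere."""
        if self.score(observations) >= self.cut_score:
            return self.claim
        return "claim not supported"


# Hypothetical ratings (1-5) on observable task features; all values are invented.
argument = InterpretiveArgument(
    scoring_rule=mean,
    cut_score=3.0,
    claim="can take part in the sampled task type",
    warrant="scores at or above the cut have been associated with adequate task performance",
    backing="records of rater training and standard-setting procedures",
)

print(argument.infer([3, 4, 4]))
print(argument.infer([2, 2, 3]))
```

Challenging such an argument, in the sense described in the text, would mean disputing the scoring rule, the warrant, or the backing rather than asking whether an underlying trait exists.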

The procedures for constructing and evaluating interpretive arguments are generic, but are adapted to the specific claims of each assessment context (Kane, 2010, p. 79). Constructing and challenging arguments has an analogy in the courtroom, where “If the procedures have not been followed correctly or if the procedures themselves are clearly inadequate, the interpretive argument would be effectively overturned” (Kane, 2006a, p. 29). The role of the prosecution is to undermine the defence's argument with alternative explanations of the data. The argument of utility for an intended purpose is all that we are able to evaluate.

Neither the “real realists” nor the constructivists are keen on instrumentalism. For the former it does away with the all-important traits (Borsboom, 2006a, p. 431). For the latter it is too concerned with individual cognition (McNamara & Roever, 2006). But this does not matter to instrumentalists, because they accept both critiques: we need pluralism so that we have a range of approaches to solve different problems (Kane, 2006b). If it seems useful, instrumentalists go with it.

Society, Impact, and Consequences

It would appear that the realists have a problem with the impact of tests on society and individuals. Although consequences have been the focus of legal disputes for a long time (Fulcher & Bamford, 1996), the traditional position has been that there is a cause for concern only if “the adverse social consequences are empirically traceable to sources of test invalidity” (Messick, 1989, p. 88). The only exception was Cronbach (1988), who argued that any socially negative effect should be a concern for the test developer. On the other hand, the most strident realists wish to abolish social impact and consequences from validity discussions completely:

Validity is not complex, faceted, or dependent on nomological networks and social consequences of testing. It is a very basic concept and was correctly formulated, for instance, by Kelley (1927, p. 14) when he stated that a test is valid if it measures what it purports to measure. (Borsboom, Mellenbergh, & van Heerden, 2004)

However, other realists do not agree. Badon et al. (2005, pp. 9–10) argue that, if a test can be shown to measure a trait that is critical to aviation communication, and if teaching this trait reduces miscommunication and hence aviation accidents, this would (a) constitute evidence of validity, and (b) have a positive social consequence.

Clearly, this is not likely to be enough for constructivists. McNamara and Roever (2006, pp. 250–1), for example, describe Borsboom's version of realism as an attempt to “strip validity theory of its concern for values and consequences and to take the field back 80 years to the view that a test is valid if it measures what it purports to measure.” They quote Shohamy with approval:

The ease with which tests have become so accepted and admired by all those who are affected by them is remarkable. How can tests persist in being so powerful, so influential, so domineering and play such enormous roles in our society? One answer to this question is that tests have become symbols of power for both individuals and society. Based on Bourdieu's . . . notion of symbolic power, [we] will examine the symbolic power and ideology of tests and the specific mechanisms that society invited to enhance such symbolic power. (Shohamy, 2001, p. 117)

When constructivists turn to instrumentalism, they find that “there is nothing in Kane's model of an interpretative argument, or in its adoption within language testing, even when it focuses on test use, that would invite such reflection” (McNamara & Roever, 2006, p. 39). For constructivists the focus is the test taker as a “political subject in a political context,” and so research that ignores the social and ideological is suspect. Of particular concern is the topic of identity. This comes in two forms. The first is the use of tests for purposes of identifying/classifying, in contexts such as war, immigration, asylum, or citizenship, where there are possibilities of oppression or mistreatment. The second is related to the kind of identity the test taker must assume in order to pass this test, which includes using discourse that reflects the power relations of dominant institutions. In this sense all tests are claimed to be tests of identity (McNamara & Roever, 2006, pp. 196–9) and thus an exercise of power in their own right.

The instrumentalists take a middle position on social impact and consequences. They acknowledge that there are real policy and political issues, and questions of fairness for the individual. They are also happy to embed these within validity theory where Messick placed them. However, dealing with consequences is very much a technical matter: evaluating consequences that stakeholders feel are important using program evaluation as a model (Kane, 2006a, p. 56), rather than adopting a critical stance.

Current Research

Much of the research in designing assessments for specific purposes is generally realist. We have seen that this is the case with aviation English, arguably one of the highest-stakes uses of tests. It seems unlikely that stakeholders would wish to use a test that the designers claimed did not measure constructs/traits of interest because they did not exist. Similarly, the growth of interest in diagnostic testing (Jang, 2009) and the assessment of language disorders (Oller, 2012) has a strongly realist flavor. Approaches that employ factor-analytic techniques, particularly structural equation modeling, make strong realist assumptions about traits (e.g., Song, 2008). Work on the design of scoring models also assumes that performance in domains of interest can be described in terms of relevant generalizable traits. For example, Fulcher et al. (2011) arrange observable variables from the analysis of service encounters into clusters under the trait headings of “discourse competence” and “pragmatic competence.” It is assumed that these “competencies” exist, and that they are manifested through their associated observable variables. Most current test development activity also takes place within a realist framework (Mislevy & Yin, 2012).


Constructivist research takes a number of forms. One trend is the description of language use, particularly investigating locally “co-constructed” interaction between participants in speaking tests (e.g., Brooks, 2009). Another area of interest is the description and assessment of second language pragmatics (Roever, 2011). There is always a strong fairness agenda in constructivist writing, with advocacy for those who are marginalized. This can be combined with test analysis techniques such as differential item functioning to discover if tests discriminate against subgroups (McNamara & Roever, 2006). Where constructivists excel is in carrying out case studies of the social use of tests, unmasking policy agendas behind test use, and investigating the construction of identities through competing discourses (Shohamy, 2001). Constructivist research in this vein helps maintain the conscience of the field by asking difficult questions about contingent constructed ideas.

As constructivists are inherently distrustful of tests and the motivations of their developers, there is little research into “constructivist test development.” The one exception is dynamic assessment (DA). Set within a sociocultural theoretical framework, DA uses assessment to scaffold language acquisition, and so is concerned with change (Fulcher, 2010, pp. 72–7). As each use of DA is considered a unique encounter, the preferred method of research is the individual case study, which cannot be generalized to any other case (Lantolf & Poehner, 2011).


Research within this tradition is concerned with establishing and following appropriate procedures, because reports of what was done count as validity evidence (Chapelle, 2008, p. 320). While there will be variation of content according to purpose, procedures are generic. These are a useful addition to our validation tools. The second area of expansion is in the development and application of argument models to language-testing projects (Chapelle, Enright, & Jamieson, 2008; Bachman & Palmer, 2010) that expand and put into practice the work of Kane (2006a), which in turn depends upon Toulmin. The quality of argument is critical because claims are evaluated in terms of the warrants and backing brought to bear (Toulmin, 2003, pp. 15–16). Proper procedure and good argumentation are central to validation in the absence of ontological claims.


Realism needs strong testable theories, but even “real realists” generally acknowledge that such theories do not exist in psychology or language testing (Borsboom, 2006b, pp. 464–5). Closely related to this problem is the fact that “traits” in language testing are not separate from the individuals in whom we posit their existence; even if we can claim that traits like “discourse competence” or “fluency” really exist, separating out their effect on measures is simply not as easy as in the natural sciences. Perhaps the most intransigent problem in all social science research is that the researcher interacts with and changes the subjects of the research, both as a result of the research methods, and by naming traits (value labels in Messick's terms). In short, there is a genuine problem not only with reference but also with defining and operationalizing traits (Fulcher, 2010, pp. 32–4), and this may be the most significant reason why social science theories have not led to research programs that are as successful as those in the natural sciences.


The first problem is that constructivist research is ideologically driven. Those committed to a Foucauldian reading of the use of tests will see evidence of struggle and marginalization in any data they collect. In principle, there are no data that could falsify a priori beliefs. The second problem is concerned with what is constructed. Hacking (1999) argues that constructivism is useful as a tool to investigate “ideas” that are abstractions of observables and reified within a matrix of facts and relations. In language testing, such an “idea” would be “the native speaker” (Davies, 2003). Individual native speakers exist, and are not problematic. We manage to classify them accurately despite dialects and idiolects. But once we extract the idea of “the native speaker” it becomes a political, social, and problematic thing; and we know that it is used for political purposes, including in some cases weaving it into a matrix that relates it to territory and citizenship. However, critical social tools are not appropriate for the analysis of objects in the real world, theoretical terms, or “elevator words” like “knowledge” or “reality.” We do not construct people, trees, quarks, or (in the case of elevator words) everything. That would be to reduce the world to mere mental states (without individuals in which to reside).

Perhaps the most disturbing aspect of the strain of constructivism that has most influenced language testing is the deep pessimism about the world and its institutions. Everything is seen as evidence of conflict and there is no way out. Fulcher and Davidson (2008) constructed an imaginary dialogue between Mill and Foucault to tease out these problems. Mill was an optimist, so when he wrote about testing he saw it as helping to create personal development which would support the introduction of universal suffrage. For Mill we make progress through personal and social development. For Foucault there is no escape from despair, and tests will forever be instruments of oppression.

Despite the problems associated with constructivism it has served a useful purpose in drawing our attention to the very real misuse of tests. It is a legitimate enterprise to describe and critique the political contexts of test use (Fulcher, 2009), and to build explicit intended effects of tests into test development. However, the overarching ambitions of constructionism have also had a negative impact that needs to be critiqued—preferably before constructionism itself is taken for granted.


The only test of success in instrumentalism is the utility of a belief, practice, or test to improving life and furthering our projects. While engagement with data is important, it is accepted that all our theories are underdetermined, and hence no single explanation is “true.” This does not matter, however, as long as we have an assessment process that proves to be useful for making decisions with reasonable accuracy. Perhaps the major criticism to be directed at instrumentalism is its lack of ambition. It has given up on the larger questions of truth (just what is the nature and structure of language knowledge and ability for use in a specified domain?) in return for a purely epistemological solution to a practical problem.

This is not a new problem for instrumentalism, and neither is the standard response. Dewey (1912) argues that truth is wrapped up with the notion of “social credit,” or what works to improve the human condition:

I should say that as method for philosophy it indicated a more severe intellectual conscience; less free and easy use of the concept of Truth in general and more careful use of truths in particular to designate such conceptions and propositions as have emerged successfully from the test conditions that are practically appropriate. (Dewey, 1912, p. 80)

If this is accepted as a defence, then consequences become paramount. They are not optional to the development of the technical processes and argumentation, and cannot be relegated to an afterthought. However, recollecting Laudan's argument for instrumentalism over realism, we must remember that, despite the practical success of miasmatic theories of disease, they were wrong. Without the noncontingent (true) explanation, we would not have been able to develop modern vaccines.

Future Directions

Bachman (2006, p. 200) correctly suggests that many studies do not succeed in clearly combining philosophical approaches. We should add that frequently they do not articulate their own philosophical assumptions, and some are internally incoherent. Even when they do articulate assumptions, these are not always stated as clearly as they need to be. This is the case, for example, in Fulcher and Davidson (2007), where there is some sliding between classical and modern pragmatism, which has led some readers to (mistakenly) assume that the text has a postmodern agenda. Researchers also need to be aware that while some combining is possible there are areas where assumptions are incommensurable. It is a disservice to the field to paper over the fault lines, for it is only in disagreement and healthy debate that progress is made (Mill, 1859/1998, p. 25).

The first important question for the future relates to the nature of our constructs/traits. Unless there is some general consensus, it appears that the field will follow three separate agendas. I will start by making explicit what is implicit in the preceding discussion—that the constructivist position is both confused and untenable in this respect. If everything is constructed and contingent, from processes to traits, our project is lost from the start.

The rest of the problem may be tackled by recourse to classical pragmatism. Pragmatism was defined by Peirce in Baldwin's dictionary (1902/1998, p. 300) as:

The opinion that metaphysics is to be largely cleared up by the application of the following maxim for attaining clearness of apprehension: “Consider what effects, that might conceivably have practical bearings, we conceive the object of our conception to have. Then, our conception of these effects is the whole of our conception of the object.”

This could easily be misinterpreted as an instrumentalist position, and was construed as such by later pragmatists such as William James. However, Peirce applied the maxim primarily to the notion of objects and constructs. The example he provided in the original 1878 formulation of the pragmatic maxim was the construct of “hardness,” which manifested itself in the effect of the application of the construct, such as observing (and predicting) that a diamond will cut other materials, but not vice versa. This, he said, was to “insist upon the reality of the objects of general ideas in their generality” (1902/1998, p. 302). The construct of “hardness” is therefore “real” because of the practical consequences that flow from its definition and meaning.

In classical pragmatism, therefore, an abstraction is defined as a generalization of experience, labeled with an abstract noun. An example from the language-testing literature might be “fluency,” a term given to a range of linguistic and processing features that we may experience and describe (Fulcher, 1996). Peirce (1903, p. 134) asks under what circumstances such an abstraction can be real, and answers: “according to the pragmatic maxim this must depend on whether all the practical consequences of it are true.” Next, he asks what kind of thing such an abstraction is:

What kind of being has it? What does its reality consist in? Why it consists in something being true of something else that has a more primary mode of substantiality. Here we have, I believe, the materials for a good definition of abstraction. (1903, p. 134)

In the case of fluency, the abstraction consists of a set of primary “substances” (in Peirce's terms), which may include features such as speed of delivery, pausing (for content planning at syntactically appropriate slots), hesitating (causing syntactic disjunct), and so on. Peirce continues to a definition: “An abstraction is a substance whose being consists in the truth of some proposition concerning a more primary substance” (1903, p. 135). If the categories of “fluency” described in Fulcher (1996) can be observed, and if they vary in ways predicted (North, 2007, p. 657, found independently that the fluency descriptors were the only consistent set capable of acting as anchors in the construction of the CEFR), the abstraction is true, even though its name is conventional. Finally, Peirce (1903, p. 134) insists “reality can mean nothing except the truth of statements in which the real thing is asserted.” According to this treatment it is arguably the case that “fluency” is a trait that has the property of being real (although it is questionable how “real” it remains if reductionist strategies are employed for the sake of automated scoring or research, as in the case of Bernstein, Van Moere, & Cheng, 2010, p. 362), just as hardness and weight are real because of their practical consequences.

The pragmatist strategy therefore avoids the need for a strong correspondence theory of truth that is required by the “real realists” on the one hand, while incorporating the instrumentalist arguments supported by relevant empirical data on the other. It steers a course between extremes, incorporating the advantages of each, while mitigating the challenges.

Research agendas within such a framework could lead to substantive validation programs. This would have practical consequences; as Laudan (1981b, p. 145) says: “the aim of science is to secure theories with a high problem-solving effectiveness” and language testing is a problem-solving activity.

The second way forward is to re-engage with a progressive Enlightenment agenda that incorporates consideration of consequences, but without ideological baggage. All fields evolve, and for the most part advances are made through incremental theory building, empirical research, and conceptual development. Theory in natural sciences evolves as well, and each stage has allowed humans to manipulate their environment in predictable and successful ways in order to achieve more than had previously been possible. This is also true of language testing and the validation process. Karl Popper referred to this as verisimilitude, or the approximation of a theory to truth. Peirce (1877/1998, p. 155) held a similar view:

This great law is embodied in the conception of truth and reality. The opinion that is fated to be ultimately agreed to by all who investigate, is what we mean by the truth, and the object represented in this opinion is the real. That is the way I would explain reality.

Advancement requires a critical, collaborative profession, prepared to argue cases and abandon them when necessary. Peirce and Mill both knew that the cycle of progress would be endless. Scientific inquiry does not lead to the discovery of “Truth” with a capital T, but makes genuine progress by not being wrong. A better language-testing future cannot be built on a static or ideological view of society, individuals, or trait definitions. It needs an optimistic agenda of expanding our knowledge, and learning how to build better tests in the service of meritocratic and just decision making.


Suggested Readings

