What can electrophysiology tell us about the cognitive processing of scalar implicatures?

Correspondence: Stephen Politzer‐Ahles, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China. Email: stephen.politzerahles@polyu.edu.hk

Abstract: One of the most widely studied phenomena in neuropragmatics (the study of how the brain derives context‐ and speaker‐based aspects of meaning) is scalar implicature. A scalar implicature is the interpretation of a proposition like Some of the students failed as meaning that a stronger proposition (All of the students failed) is not true. While scalar implicatures have been a significant object of study for decades, in recent years there has been an explosion of experiments investigating them using neuroscientific methods, particularly electroencephalography. Much of this research aims to identify neural substrates of comprehending scalar implicatures. Here, I review the extant findings and argue that most of these studies have not directly observed neural correlates of scalar implicatures; rather, they have mostly observed downstream and/or domain‐general processes that happen to be related to implicatures, but not uniquely so. I argue that an instrumental approach to neuroscience, one that treats brain components not as objects of research in and of themselves but as tools for learning about pragmatics, would be a valuable addition to this emerging field.


| INTRODUCTION
The past decade has seen an explosion of research using neuroscience methods to investigate how people process semantic and pragmatic aspects of language. A challenge that arises in such a research enterprise is that neither the object of study (language and the psychological mechanisms that handle it) nor the tools (patterns of brain activity) are fully understood, and the link between them is even less understood. These challenges raise potential concerns about how much can be learned about the processing of language using neuroscience techniques. These concerns are not new; for example, Poeppel and Embick (2005) discuss fears that the alliance of linguistics and neuroscience could result in unproductive 'interdisciplinary cross-sterilization'. The present paper revisits this concern by adopting as a case study one particular subfield of neurolinguistics that has attracted substantial interest in the last decade: the neural processing of scalar implicature.
Event-related potentials are a powerful tool for studying language processing, and particularly the comprehension of scalar implicatures, for several reasons (for an introduction to event-related potentials in language comprehension, see Kaan, 2007). They have excellent temporal resolution, allowing researchers to distinguish between brain activity that occurs at the moment a scalar implicature is made and activity that occurs later during downstream processing. They are multidimensional, as signals may differ in terms of where on the scalp they appear, how late after a stimulus they emerge, how long they last, whether they have positive or negative voltage relative to the recording reference and how strong they are; these dimensions provide a means to distinguish between qualitatively different processes. And they can be recorded without any task that requires participants to make metalinguistic judgements about the meaning of sentences. This is especially important given that sentences with different interpretations (e.g., where some is interpreted with or without a scalar inference) may engender different downstream verification strategies during judgement tasks. For instance, determining whether some and possibly all Xs are Y requires just searching through the Xs until the comprehender finds one that meets the criterion Y, whereas determining whether some but not all Xs are Y requires searching for one that does meet the criterion Y and one that does not (see, e.g., Politzer-Ahles & Gwilliams, 2015).
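The difference between the two verification strategies described above can be made concrete with a minimal sketch. The function names and the toy data are illustrative assumptions, not part of any cited study; the point is only that the literal reading can stop at the first positive case, whereas the enriched reading needs both a positive and a negative case.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def verify_some_possibly_all(xs: Iterable[T], is_y: Callable[[T], bool]) -> bool:
    """Literal reading: stop as soon as one X that is Y has been found."""
    for x in xs:
        if is_y(x):
            return True
    return False


def verify_some_not_all(xs: Iterable[T], is_y: Callable[[T], bool]) -> bool:
    """Enriched reading: need one X that is Y AND one X that is not Y."""
    found_y = False
    found_not_y = False
    for x in xs:
        if is_y(x):
            found_y = True
        else:
            found_not_y = True
        if found_y and found_not_y:
            return True
    return False


# Hypothetical toy data: True means "this student failed".
all_failed = [True, True, True]
mixed = [True, False, True]


def failed(student: bool) -> bool:
    return student
```

On the hypothetical `all_failed` list, the literal check succeeds after inspecting one item, while the enriched check must scan the whole list before rejecting the sentence. This is the kind of task-specific difference that a judgement task could introduce independently of the inference itself.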
With the power of neuroscientific methods, however, also come limitations. The past decade or so of electroencephalography research on scalar implicatures serves as an instructive example of some of the shortcomings and pitfalls that can occur when attempting to use neuroscience methods to address linguistic or psychological questions. These shortcomings are not just abstract; they have real consequences for the potential value of neurolinguistic research on pragmatics. For example, some active researchers in the experimental pragmatics field have lamented that the copious outpouring of experimental studies on scalar implicature through the years has actually had little impact on theories of how scalar implicature works. As the popularity of using neuroscience techniques to test linguistic and psychological questions is unlikely to fade in the foreseeable future, there is a real need to find ways to make neuropragmatics experiments provide more useful and impactful observations for shaping our understanding of how pragmatics, the mind and the brain work.
This review will summarise what has been observed so far in studies using these methods, as well as caveats about what these studies cannot show. While this review focuses on scalar implicatures as a case study, the issues discussed here can apply to virtually any situation in which a linguistic or psycholinguistic phenomenon is investigated using electrophysiological methods. For anyone beginning to learn about any given topic in neurolinguistics, or preparing to do experiments themselves on such topics, it is valuable to have an awareness of the potential pitfalls of this kind of research and familiarity with other ways this kind of research can be done.

PSYCHOLINGUISTIC QUESTIONS
Before looking into the ways neuroscience methods have been used to study scalar implicatures, it will be informative to consider what some of the core questions about scalar implicatures are. Scalar implicatures have been the subject of substantial debate regarding both their linguistic properties and the psycholinguistic mechanisms that support their comprehension. For more detailed reviews, the reader is referred to Chemla and Singh (2014a,b), Katsos and Cummins (2010), Sauerland (2012) and Sauerland and Schumacher (2016), among others.
One major question is what kind of implicature a scalar implicature really is. An implicature is an interpretation of an utterance which arises not because of the actual meanings of the words in the utterance and the ways in which they are combined, but because of something else. That 'something else' may be one of many different things, and thus there are many different kinds of implicature. Consider, for instance, a vignette like the following: Carlyle asked Thisbe if she would lend him a thousand dollars. Thisbe scoffed and walked out of the room.
The second sentence not only expresses the literal meaning of the two clauses (Thisbe scoffed, Thisbe walked out of the room), but also implies that (1) Thisbe walked out of the room after scoffing, rather than before and (2) Thisbe is not willing to lend Carlyle a thousand dollars. According to the theory introduced by Grice (1989), these two implicatures arise via different mechanisms. The implicature that Thisbe scoffed before leaving the room, as opposed to after, arises from a linguistic convention, called conjunction buttressing, whereby two clauses conjoined with 'and' are often interpreted as describing events occurring one after the other (e.g., Levinson, 2000, p. 122). Grice (1989) calls this a 'generalised conversational implicature' because it is a function of the use of particular words and phrases themselves, rather than a function of a specific context, and thus can arise in most contexts. That is to say, almost any time someone says 'X scoffed and walked out of the room', it will be interpreted as meaning X scoffed first and walked out of the room later, unless something else about the context of that utterance makes this interpretation unlikely or impossible. On the other hand, the implicature that Thisbe is not willing to lend Carlyle a thousand dollars depends on this particular context, and would not have arisen if 'Thisbe scoffed and walked out of the room' had been uttered in most other contexts. Grice (1989) calls this a 'particularised conversational implicature'. Some theories consider scalar implicatures, such as the implicature that some of my students failed means not all of my students failed, to be a type of generalised conversational implicature, which would mean that they are realised via different mechanisms than many other types of pragmatic meaning. 
Other theories, however, either propose that scalar implicatures are not generalised conversational implicatures or eschew the distinction between generalised and particularised implicatures entirely. In either case, such approaches predict that scalar implicatures are derived by the same context-specific mechanisms as any other implicatures (see, e.g., Degen & Tanenhaus, 2015; Noveck & Sperber, 2007).
While the question of whether or not scalar implicatures are generalised conversational implicatures mainly concerns linguistic representation rather than psychological processing, this issue has also raised a closely related question in psycholinguistics: that of how quickly and effortfully scalar implicatures are understood in real-life comprehension. Much of this research has assumed that if scalar implicatures are a kind of generalised conversational implicature then it follows that they would be realised immediately and effortlessly by comprehenders, whereas if they are dependent on particular contexts then it will take extra time and extra processing costs for a scalar implicature to arise since the comprehender must first evaluate the context (see, e.g., Breheny et al., 2006; Katsos & Cummins, 2010). A large portion of the behavioural research on scalar implicatures has aimed to resolve this question, using a variety of psycholinguistic techniques to test whether scalar implicatures arise earlier or later than literal meanings, and whether they evoke extra processing costs. More recent formulations have complicated this question by pointing out that even if scalar implicatures are context dependent they may not necessarily always take extra time or processing resources (Degen & Tanenhaus, 2015; Politzer-Ahles & Gwilliams, 2015), and that the derivation of a scalar inference is not a monolithic process, but is a process involving multiple intermediate steps, each of which might be rapid and effortless or might be slow and costly (Chemla & Singh, 2014a,b).
Some recent research is beginning to make steps towards evaluating these more articulated kinds of scalar implicature processing models; for example, Rees and Bott (2018) use the psycholinguistic paradigm of structural priming to investigate the properties of a particular sub-component of the scalar implicature derivation process, the determination of which alternatives are relevant in the context (e.g., whether or not all is a relevant alternative to some in the discourse).
Another major question about scalar implicatures is whether they are derived by pragmatic or semantic mechanisms. All the approaches mentioned above assume that scalar implicatures are derived pragmatically, that is, by integrating information about the context with information from the linguistic expressions themselves to infer what the speaker meant. But other approaches hold that scalar implicatures are derived not by making context-based inferences about what the speaker might have meant, but from language itself. One way this might happen is through the insertion of a covert semantic operator into the sentence structure (Chierchia, Fox, & Spector, 2012). Another is by the selection of a lexicalised enriched meaning. For example, for the interpretation of some as some but not all, it could be the case that some is polysemous, with one sense equivalent to some-and-possibly-all and another sense equivalent to some-but-not-all, and a 'scalar implicature' is just the selection of the latter sense (see, e.g., Sauerland, 2012). Under such accounts, a so-called 'scalar implicature' is not an implicature at all. Another major line of research in the experimental pragmatics of scalar implicatures, then, focuses on testing whether people derive scalar inferences in certain contexts, such as embedded clauses, where pragmatic and semantic accounts make different predictions about whether or not scalar implicatures should be derived (for review, see Chemla & Singh, 2014a,b; Sauerland, 2012; among others).
These questions about the linguistic representation and psychological processing of scalar implicatures provide a backdrop for the recent explosion of neurolinguistics research about this phenomenon.

HAVE SHOWN
Since the seminal study by Noveck and Posada (2003), and particularly in the last decade, many studies have been conducted attempting to use neuroscientific data to adjudicate between competing theories of how scalar implicatures are processed and to identify the neural correlates of scalar implicature realization. Table 1 summarises all the studies, to my knowledge, that have used electrophysiology to study scalar implicature processing (while the present review focuses on electrophysiology, the points that will be raised below regarding the functional interpretation of brain data apply likewise to data from haemodynamic studies using, e.g., functional magnetic resonance imaging).

| The incremental use of scalar inferences to drive downstream predictions
Many studies of scalar implicature have focused on brain responses to words occurring later in an utterance which serve as probes of whether or not an enriched meaning (e.g., some but not all) has been realised earlier. In Table 1, I refer to this paradigm as 'Violation of SI-based prediction'. While such studies do not attempt to directly identify the brain correlate of the inference-making process itself, this sort of downstream probe of whether or not an inference was made is useful for testing theories of when and how scalar implicatures occur. For example, Noveck and Posada (2003) compared event-related potentials elicited by the final words of patently true sentences like Some people have brothers (it is true that some, but not all, people have brothers) and those of infelicitous sentences like Some staircases have steps (while the sentence is literally true, it is not true that 'not all staircases have steps'). Later studies introduced more controlled designs (e.g., Nieuwland, Ditman, & Kuperberg, 2010), with the most recent studies in this vein using picture stimuli to control the contexts rather than relying on world knowledge as a context like the above examples do (Hunt, Politzer-Ahles, Gibson, Minai, & Fiorentino, 2013; Spychalska, Kontinen, & Werning, 2016; Spychalska, Kontinen, Noveck, Reimer, & Werning, 2019). For example, Hunt et al. (2013) showed participants visual contexts in which an agent affected none of one group of referents (i.e., the brownies in the lower example), all of another group (i.e., the tomatoes in the lower example), and some-but-not-all of the third group (i.e., the steaks in the lower example); see Figure 1. Participants then read a sentence like 'The boy cut some of the steaks in this story'. The critical word could felicitously refer to a group that the boy cut some but not all of (the lower example) or infelicitously refer to a group that the boy actually cut all of (the upper example).
The logic of this and other such studies is that if a scalar implicature has been made, such that some is interpreted as not all, then the readers should be surprised to read the infelicitous critical word (e.g., steaks in the upper example in Figure 1), given that they would expect the sentence to mention the group that the boy cut some but not all of (e.g., brownies in the upper example in Figure 1). If the scalar implicature is realised as the sentence is unfolding, this surprise should elicit a different brain response as soon as the critical word is read. On the other hand, if scalar implicatures are only realised after a proposition has been fully uttered, then such an effect might not be observed when the critical word is read.
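The prediction logic just described can be sketched as a small function that lists which nouns are compatible with 'some of the ...' under each reading. This is only an illustration: the counts, the function name, and the assignment of tomatoes to the 'none' set in the upper example are assumptions made here, not details reported by Hunt et al. (2013).

```python
from typing import Dict, Set, Tuple

# Each context maps a noun to (number affected, total number).
Context = Dict[str, Tuple[int, int]]


def compatible_nouns(context: Context, reading: str) -> Set[str]:
    """Return the nouns that can complete 'The boy cut some of the ...'.

    reading == "literal":  some = at least one (possibly all)
    reading == "enriched": some = at least one but not all
    """
    if reading not in ("literal", "enriched"):
        raise ValueError(f"unknown reading: {reading!r}")
    result = set()
    for noun, (affected, total) in context.items():
        if reading == "literal" and affected >= 1:
            result.add(noun)
        elif reading == "enriched" and 1 <= affected < total:
            result.add(noun)
    return result


# Hypothetical counts for the two picture contexts.
lower_example = {"brownies": (0, 4), "tomatoes": (4, 4), "steaks": (2, 4)}
upper_example = {"tomatoes": (0, 4), "steaks": (4, 4), "brownies": (2, 4)}
```

In the hypothetical upper context, `steaks` is compatible with the literal reading but not with the enriched one. A comprehender who has already derived the inference at some would therefore predict `brownies` and be surprised by `steaks`, which is the contrast the downstream probe relies on.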
In the majority of these studies, downstream words that render a sentence infelicitous (steaks in the upper example in Figure 1) indeed elicit a different brain response than downstream words that render a sentence true (steaks in the lower example). Typically, they tend to elicit a stronger N400 response. The N400 is an event-related potential component observed around 300-500 ms after the appearance of a contentful stimulus, mostly on central and posterior parts of the scalp, and tends to be stronger when a stimulus is less expected based on the context (for review, see Lau, Phillips, & Poeppel, 2008; van Berkum, 2009; among others). Thus, these studies have shown that the inference-based interpretation of some is integrated incrementally into the online interpretation of a sentence, modulating the extent to which downstream words are expected. Words consistent with both the literal interpretation and the enriched 'not all' interpretation are more strongly expected than words consistent only with the literal, not the enriched, interpretation; the more expected words, in turn, evoke smaller N400 components than the less expected words that render a sentence under-informative. These studies have provided important evidence that scalar implicatures are realised before the end of a sentence. This finding goes against the predictions of strong pragmatic accounts which assume that implicatures are a function of a whole utterance, although it can still be accommodated in modern pragmatic accounts (Geurts, 2010). This technique is also useful as a probe for how scalar implicatures influence comprehension in different populations; for instance, Spychalska et al. (2016) found that comprehenders who tend to interpret some as not all in an explicit judgement task also show a different neural response profile in this paradigm than participants who tend not to use this interpretation in the explicit judgement task.
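As a rough illustration of how an N400 effect of this kind is commonly quantified, the sketch below averages voltage in the 300-500 ms window and compares two conditions. Everything here is an assumption for illustration: the data are simulated (a Gaussian-shaped negative deflection plus noise), the amplitudes are arbitrary, and no cited study is claimed to have used this exact procedure.

```python
import math
import random
from statistics import mean
from typing import List, Sequence


def mean_amplitude(epochs: Sequence[Sequence[float]], times: Sequence[float],
                   tmin: float = 0.3, tmax: float = 0.5) -> float:
    """Average voltage over trials and over samples within [tmin, tmax] seconds."""
    idx = [i for i, t in enumerate(times) if tmin <= t <= tmax]
    return mean(mean(trial[i] for i in idx) for trial in epochs)


def simulate_epochs(peak_uv: float, times: Sequence[float], n_trials: int = 40,
                    noise_sd: float = 1.0, seed: int = 0) -> List[List[float]]:
    """Simulated single-channel trials: a negativity peaking at 400 ms plus noise."""
    rng = random.Random(seed)
    return [
        [-peak_uv * math.exp(-((t - 0.4) ** 2) / (2 * 0.05 ** 2)) + rng.gauss(0, noise_sd)
         for t in times]
        for _ in range(n_trials)
    ]


# Time axis from -200 ms to 798 ms, sampled at 500 Hz.
times = [i * 0.002 - 0.2 for i in range(500)]

# Arbitrary simulated amplitudes: larger negativity for the less expected word.
felicitous = simulate_epochs(2.0, times, seed=1)
infelicitous = simulate_epochs(5.0, times, seed=2)

# A negative value means the infelicitous condition is more negative in the window.
n400_effect = mean_amplitude(infelicitous, times) - mean_amplitude(felicitous, times)
```

Real analyses typically involve many channels, baseline correction, artefact rejection and inferential statistics, none of which are modelled here.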
Converging evidence for the incremental use of scalar inferences to update processing predictions comes from Hartshorne, Azar, Snedeker, and Kim (2014), who adapted a paradigm widely used in reading time research (Bergen & Grodner, 2012; Breheny et al., 2006; Politzer-Ahles & Husband, 2018) to show that the derivation of scalar implicatures is sensitive to context. In this paradigm, people read a sentence in which some is embedded in a context that either does or does not support scalar implicatures, and then a later word in the sentence serves as a probe for whether the inference was realised. For example, in the sentence Addison ate some of the cookies before breakfast this morning, and the rest are on the counter, 'some' is likely to be interpreted as meaning 'not all', and thus the reader should be aware that there are some cookies that have not been eaten. The comprehension of the rest is then facilitated since the reader was already aware of this remaining set of referents. On the other hand, in the sentence If Addison ate some of the cookies before breakfast this morning, then the rest are on the counter, 'some' is less likely to be interpreted as meaning 'not all', because scalar inferences are less likely in downward entailing contexts such as the antecedent of a conditional (i.e., an 'if…' statement; see Chierchia et al., 2012; among others). Thus, in this context, comprehension of the rest is not facilitated. Indeed, the authors observed a more negative electrophysiological signal at the rest in the latter case and not in maximally similar control comparisons that do not involve scalar implicature. This is consistent with the notion that the scalar implicature was realised in the former context and facilitated later processing of the rest, whereas it was not realised in the latter case and thus the rest was less expected and more difficult to process.

| Differences between pragmatic and semantic processing
While the above-mentioned studies probed the downstream consequences of scalar implicatures rather than the neural correlates of the scalar implicatures themselves, other studies have attempted to directly measure brain responses underlying the realization of scalar implicatures. These too have the potential to test different theories of scalar implicature representation and processing: if one or more neural correlates of scalar implicature processing could be identified, then the properties of these responses could tell us something about the nature of scalar implicature processing. One such body of work has used event-related potentials to examine whether scalar implicatures elicit qualitatively different processing than semantics. Most of this work has done so by comparing scalar implicature-based violations to purely semantic violations. In Table 1, I refer to this paradigm as 'Infelicitous scalar expression'. For example, Politzer-Ahles, Fiorentino, Jiang, and Zhou (2013) examined brain responses to putatively pragmatic and semantic violations, along with matched control conditions for each of these, at the first moment the scalar implicature could have been realised, rather than at a downstream word; see Figure 2. Pragmatic violations were realised by showing the participant a context in which everybody is doing the same thing (i.e., all of the girls are sitting on blankets) and then presenting them with a sentence like 'In this picture, some of the…'. If some of the is interpreted as meaning not all of the, then this sentence should be considered infelicitous by the comprehender and should elicit a different brain response than a correct control (the same sentence if preceded by a picture in which some, but not all, of the girls are sitting on blankets). Indeed, these pragmatic violations elicited sustained negative brain responses compared to the correct controls.

FIGURE 1 Sample stimuli from Hunt et al. (2013)
Semantic violations, on the other hand, were realised by showing the participant a context in which different people are doing different things (some girls are sitting on blankets and some are not) and then presenting them with a sentence like 'In this picture, all of the…'. Unlike the pragmatic violations, these sentences should be considered incorrect under any circumstance, given that the meaning of all is not negotiable like the meaning of some is. These violations did not elicit the sustained negativity, relative to their correct controls, that the pragmatic violations elicited. Crucially, these effects were observed at the quantifier itself, where the scalar implicature might first become available, rather than downstream as in the studies summarised above.
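The way picture contexts and quantifiers combine into the conditions of this design can be summarised in a short sketch. The function name, labels and counts are hypothetical, and the treatment of a picture in which nobody performs the action is an assumption added for completeness rather than a condition described in the text.

```python
def classify_trial(quantifier: str, n_doing: int, n_total: int) -> str:
    """Label a picture-sentence pairing in a picture-verification design.

    quantifier: "some" or "all", as in 'In this picture, some/all of the ...'
    n_doing:    number of depicted individuals performing the action
    n_total:    number of depicted individuals
    """
    if not 0 <= n_doing <= n_total or n_total == 0:
        raise ValueError("need 0 <= n_doing <= n_total and n_total > 0")
    if quantifier == "all":
        # The meaning of 'all' is not negotiable: any exception makes it false.
        return "correct" if n_doing == n_total else "semantic violation"
    if quantifier == "some":
        if n_doing == 0:
            # Assumption: false even on the literal reading, so labelled semantic.
            return "semantic violation"
        # Literally true when everyone is doing it, but under-informative.
        return "pragmatic violation" if n_doing == n_total else "correct"
    raise ValueError(f"unsupported quantifier: {quantifier!r}")
```

The asymmetry is visible in the code: the 'pragmatic violation' label depends on adopting the enriched reading of some, whereas the 'semantic violation' label for all holds under any reading.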
Such findings have been taken as evidence that pragmatic and semantic processing engender qualitatively different brain responses, a claim which has also drawn support from haemodynamic brain imaging research that has found activation in different brain regions for putatively pragmatic versus semantic violations in a similar paradigm (Shetreet, Chierchia, & Gaab, 2014a,b,c; Zhan, Jiang, Politzer-Ahles, & Zhou, 2017). Such a dissociation would be theoretically important, given that it is an open question whether scalar implicatures are derived by pragmatic or semantic mechanisms, as discussed above. Isolating the neural mechanisms that are involved in scalar implicatures but not in other sorts of computations could provide a means to determine whether those mechanisms are involved in pragmatic or semantic processes, by comparing the neural correlates of scalar implicature processing to the neural correlates of other processes that are uncontroversially pragmatic or uncontroversially semantic. However, the event-related potential pattern observed in this study was not replicated in a similar study by Panizza, Onea, and Mani (2014) and was replicated only in a subset of participants in Politzer-Ahles (2013, Experiment 3). Furthermore, as discussed further below, it remains unclear whether the presence of qualitatively different processing components for 'pragmatic' and 'semantic' violations reflects qualitatively different correlates of pragmatic and semantic processing, or reflects different domain-general processes that are differentially implicated in these two types of violations.

| How scalar implicatures are actually realised
Finally, some studies have compared the brain responses elicited by a scalar expression like some in contexts that do or do not licence scalar inferences, in order to isolate components involved in the derivation of scalar inferences themselves as opposed to the deployment of these interpretations for downstream prediction or the processing of pragmatically infelicitous stimuli. While the studies described above tested scalar implicatures indirectly by using violation paradigms where the interpretation based on a scalar implicature conflicts with some other aspect of the context or world knowledge, the studies described in this section attempt to directly probe what happens when scalar implicatures actually occur. The motivation for such studies is similar to that for the studies described in the previous section: if we can identify the neural substrates of the scalar implicature derivation process itself, we could be able to use that to learn more about the nature of that process. In Table 1, I refer to this paradigm as 'Contextual support for inference'.
Various context manipulations have been used to carry out this sort of test. As described above, Hartshorne et al. (2014) examined brain responses to some in sentences like 'Addison ate some of the cookies…' and 'If Addison ate some of the cookies…'. It has been frequently observed that scalar implicatures are typically not realised in an 'if' clause, like in the latter example (Chierchia et al., 2012; among others). Therefore, if deriving a scalar implicature requires extra processing effort, some in the former context might elicit an additional neural response not elicited by some in the latter context. This is not, however, what has been observed. Hartshorne et al. (2014) found no significant difference in brain responses to some in contexts that do or do not licence scalar inferences. Politzer-Ahles and Gwilliams (2015), using a similar experimental paradigm but a different context manipulation, found a sustained magnetoencephalogram component, originating from left lateral prefrontal cortex, for some in contexts that are less likely to licence scalar inferences, compared to contexts that are more likely. This is the opposite of the prediction outlined above. They argued that this component reflects increased activation related to deriving the 'not all' interpretation of some when it has little contextual support. Further research is needed, however, both to determine the replicability of this component and to investigate whether this reflects pragmatic processing specifically or domain-general operations that simply happen to also be implicated in a scalar implicature manipulation.

FIGURE 2 Sample stimuli from Politzer-Ahles et al. (2013)

| ARE THESE 'PRAGMATIC COMPONENTS'?
A limitation of many of these studies, however, is that the brain responses observed are not uniquely attributable to pragmatics, as opposed to being domain-general processes; this is also acknowledged by Hunt et al. (2013) and Nieuwland et al. (2010). Consider, for example, studies that found modulation of the N400 on critical words downstream of a scalar expression. Hunt et al. (2013), for instance, observed that the word steaks in the sentence 'The boy cut some of the steaks in this story' elicited a larger N400 in a context where all of the steaks were cut (and thus the word steaks is not expected here) than in a context where some, but not all, of the steaks were cut (and thus the word steaks is highly expected); see Figure 1. The N400, however, is not a direct index of scalar implicature processing. Rather, it is known to be an index of the ease or difficulty of lexical activation. Thus, these experiments are not directly observing brain responses related to pragmatics. Rather, they are observing domain-general brain responses which happen to be influenced by pragmatic processes that have already happened. That is to say, the realization of a scalar inference makes a given word less expected and less congruent with the context, and therefore harder to access; it is the difficulty of access, not the scalar implicature itself, that leads to a greater N400. Fortunately, those studies were not designed to directly observe scalar implicature processing itself, and the conclusions of those studies do not depend on being able to observe scalar implicature processing itself. Readers of this literature must be careful not to misinterpret these studies as revealing neural correlates of scalar implicatures themselves.
We could think of the brain responses in such experiments as being like a high-tech thermometer. A person who is sick might show a higher reading on a thermometer than a person who is not sick. But this happens because the sickness causes the person's body temperature to rise and that temperature in turn affects the thermometer. A person observing the thermometer reading is not directly observing the illness; rather, they are only observing a downstream consequence of it. In the same way, a person observing N400 effects in an experiment is observing downstream consequences of a pragmatic computation, rather than observing the computation itself.
Another relevant comparison is eye-movement research on scalar implicatures (e.g., Grodner et al., 2010; Huang & Snedeker, 2009). In these studies, a camera monitors how participants' eyes move as they are viewing a display, under the assumption that when a comprehender realises a certain interpretation they will look towards the image depicting that interpretation. When participants move their eyes to look at a given picture in a visual-world display, this presumably reflects the consequence of a pragmatic computation: making a scalar inference allows the hearer to decide who is being referred to in the sentence, and later they look at that referent. Moving their eyes does not reflect the pragmatic computation itself. In eye-movement research, however, this is widely understood, and eye movements are generally not mistaken for representing pragmatic processing itself. In neurolinguistic research, on the other hand, perhaps because brain components are less well understood, there is a temptation to interpret brain components as directly reflecting pragmatic processing, and indeed much research aims to identify the brain regions and brain components that carry out scalar implicatures. As described above, though, brain components observed in such experiments usually cannot be uniquely attributed to pragmatic processing itself.
The problem is more serious with experiments observing new or unpredicted components. Studies examining N400 effects on downstream critical words, such as those discussed above, are designed to elicit specific downstream effects (i.e., N400s) and thus are in a good position to interpret these effects. For those studies, the fact that the brain responses do not reflect pragmatic processing is not a problem; those studies were designed to teach us something about pragmatic processing without needing to directly observe 'pragmatic' brain responses. Other studies have aimed to directly observe what brain responses are elicited by pragmatic processing. The latter two categories of studies described above are both subject to this concern. Politzer-Ahles et al. (2013), for example, compared brain responses elicited by pragmatic versus semantic violations, with no specific predictions about how they would differ. They then observed qualitatively different patterns for the two violation types and concluded that there are qualitatively different neural mechanisms for the processing of semantics and pragmatics; indeed, this study is frequently cited as offering experimental support for that view. But do these responses really show that pragmatics and semantics are processed in different ways? An alternative explanation is that these pragmatic and semantic violations elicited different domain-general processing strategies. For example, the pragmatic violation included a sentence that could be reinterpreted to fit the context: the context was a picture in which everyone is doing the same thing (e.g., there are five girls, and all of them are sitting on blankets) and the critical sentence was In the picture, some of the girls are sitting on blankets; see Figure 2. If some of is interpreted as meaning 'at least one', rather than as meaning 'some but not all', then the sentence is no longer inconsistent with the context. 
Thus, the different brain response elicited in this violation might have reflected the reinterpretation of the sentence or the inhibition of one of the interpretations. If that is the case, then this brain response does not reflect pragmatics per se; what has been attributed to 'pragmatic processing' is instead an epiphenomenon of other domain-general processes, such as revision, that happened to be possible in this kind of violation sentence and not in the semantic violation sentence. Crucially, in a different kind of manipulation it could turn out that semantic violations, and not pragmatic violations, implicate this process. 6 Similar alternative, domain-general explanations are available for (and suggested in) other electrophysiological studies that attempted to probe scalar implicature processing directly (e.g., Politzer-Ahles & Gwilliams, 2015). Such studies provide an empirical observation (e.g., that two types of violation are processed differently) but do not provide an explanation for that observation. Unfortunately, results from such studies have sometimes been claimed to reveal the neural correlates of scalar implicatures. As described above, such a conclusion is not strictly justified.
These concerns have been raised before (e.g., van Berkum, 2009, 2010) and in fact are just a special case of more general problems raised by Poeppel and Embick (2005): the granularity mismatch problem and the ontological incommensurability problem. The granularity mismatch problem refers to the limitations of searching the brain for coarse-grained concepts like 'pragmatics' and 'semantics' in a language system that works with more fine-grained, specific operations. Much of the neurolinguistics work in the past two decades has been resolving this problem by focusing on more and more specific phenomena: rather than searching for neural correlates of syntax and semantics, we now have research subfields searching for the neural correlates of gender agreement, complement coercion, scalar implicature and so on. There is, however, a second problem, the ontological incommensurability problem: the brain likely functions in terms of basic operations like linearization, concatenation, chunking and so on, rather than abstract linguistic operations like syntactic movement, implicature generation and so on. Thus, such research may be examining units that are incommensurable, looking for correlates of things like 'scalar implicature' in a brain that does not have such an operation.

| AN ALTERNATIVE APPROACH
There is another approach to neurolinguistic research on scalar implicatures, which could be more fruitful. This approach is, essentially, not to look for 'neural correlates' of scalar implicatures, but to use brain components as a thermometer: something which we recognise is not a correlate of scalar implicatures, but which has well-understood properties that allow it to be used as an independent tool for testing the predictions of theories.
To put this approach in context, a categorization of existing approaches is useful. van Berkum (2010) classifies neuropragmatics research into four rough categories. 'Neuro lite' research involves doing psycholinguistic research (with behavioural rather than neuroscientific measures) but simply phrasing the conclusions as being about the 'brain' rather than about the 'mind'. 'Instrumental' research involves using neuroscientific measures not to understand the brain per se, but as a tool to understand psychological processes. 'Modestly ontological' research involves searching for neural correlates of some process or concept that is motivated by linguistic theory. Finally, 'deeply ontological' research attempts to understand how actual brain functions support these processes. Much extant neurolinguistics research on scalar implicatures, especially the studies attempting to observe scalar implicature processing directly, could be considered 'modestly ontological', as it often takes the brain responses in and of themselves as the object of study; this type of research is subject to the limitations described above. On the other hand, an 'instrumental' approach would be valuable for integrating neuroscientific methods with pragmatic theories and psychological models of scalar implicature processing.
How does an instrumental approach to neuroscience work in practice? An excellent example in neurolinguistics comes from van Turennout, Hagoort, and Brown (1998), who used event-related potentials to detect when people accessed different components of a lexical entry. Specifically, they wanted to see whether a person comprehending a word accesses its morphosyntax first or its pronunciation first. They did so by taking advantage of the lateralised readiness potential, a brain response that emerges when a person prepares to make a muscle movement, such as moving the hand to press a button. This component emerges even if the person ultimately decides not to make the movement; thus, it can be used to reveal things that a person considers doing but ultimately does not do. In this study, Dutch native speakers saw a picture representing some word (e.g., a picture of a bear), and their task was to press a button to indicate the word's morphological gender. Unless, that is, the first sound in the word was /b/, in which case participants were supposed to do nothing. A lateralised readiness potential was observed even on trials with no button press, suggesting that participants prepared to press a button to indicate gender, even though they ultimately did not press the button. In other words, participants accessed the morphosyntactic gender of the word, and accordingly prepared to make a response, before they accessed the pronunciation of the word and stopped preparing their response. Crucially, the lateralised readiness potential itself has nothing to do with the processing of morphosyntax or phonology; it is a simple correlate of motor preparation, and its properties are well known. Nonetheless, the experimenters were able to design a paradigm in which this component could be used as a simple instrument to test much more abstract aspects of cognition.
In fact, much electrophysiological research on scalar implicatures is already using this approach. The body of research examining how the realization of a scalar inference modulates processing of words downstream (Hunt et al., 2013; Nieuwland et al., 2010; Noveck & Posada, 2003; Sikos, Tomlinson, Traut, & Grodner, 2013; Spychalska et al., 2016, 2019), summarised above, exemplifies an instrumental approach: such studies were never intended to show what is happening in the brain when a scalar implicature is actually realised, but rather were intended to use downstream brain responses as an instrument to detect whether or not an inference had been realised earlier. The interpretation of these results is, accordingly, more straightforward than the interpretation of much of the rest of the electrophysiological literature discussed above. This paradigm has convincingly demonstrated that scalar implicatures can be realised before the end of a proposition and has been fruitfully used to examine how scalar implicatures are or are not derived in different populations (e.g., Spychalska et al., 2016). However, examining downstream responses can only answer some, not all, of the important questions about scalar implicatures. By now, most theories of scalar implicature processing agree that some can be realised as not all without the comprehender needing to wait until the end of the proposition; even strongly pragmatic accounts have apparatus to account for such phenomena (e.g., Geurts, 2010). In current debates, the issue at stake regarding the speed of scalar implicatures is not so much whether they are realised at the end of a sentence or in the middle, but whether they are realised immediately or a few hundred milliseconds later (see, e.g., Huang & Snedeker, 2009).
Such a question cannot be answered by looking at brain responses a few words after a scalar implicature could have been triggered; it requires paradigms that look directly at the expression that triggers the inference. Likewise, examining downstream modulations of how strongly a word is expected might not be sufficient for determining what sort of computation a scalar implicature itself is (e.g., semantic or pragmatic). For reasons like this, it is important to explore ways to extend the instrumental approach to more questions than it has been used to test thus far.
Researchers have recently begun conducting other sorts of neurolinguistic research on scalar implicatures, using such instrumental approaches to examine what happens the moment a scalar implicature could be realised (e.g., Barbet & Thierry, 2016, 2017). Barbet and Thierry (2016), for example, took advantage of the P300 component, which is known to be increased when a person detects a rare stimulus that they have been waiting for (a 'target'). In one condition, participants were instructed to press a button every time they saw a word in which the number of uppercase letters matched the meaning of the word (e.g., they should press a button for ALL and tWO, but not for aLL or tWo) and were instructed to treat 'some' as meaning 'some but not all'. Thus, when they saw SOME, they should not press a button, as the number of uppercase letters (all of them) does not match the intended meaning of the word (some, but not all). Nonetheless, participants showed an increased P300 component in this condition, possibly suggesting that the brain initially interpreted these words as targets (i.e., initially interpreted 'some' as meaning 'at least one') despite instructions not to. Crucially, just like the example of van Turennout et al. (1998) above, this study was not intended to reveal anything new about the P300 itself or to claim that the P300 is a locus of pragmatic processing; it simply used the P300 as an instrument to test specific theories of pragmatic processing.
Such approaches offer a fruitful way to link pragmatics and electrophysiology. In contrast to the studies described above, the findings from these studies and their consequences for pragmatic theory are more straightforward to interpret, as they focus on components whose properties are well understood. When a well-understood component like a lateralised readiness potential, N400, or P300 is elicited, we generally know why it was elicited: because participants saw a word they were not expecting, saw a word that required a button press, and so on. It is then up to pragmatic theory and psycholinguistic models to explain why this happened. On the other hand, the interpretation of results from studies that are looking to observe neural correlates of pragmatic processing directly is generally problematic: if a study observes some brain component whose functional significance is unknown, then it is difficult to say what this means for theories of pragmatics, given that we do not know what that brain component means in and of itself. Even studies that observe known brain components, such as the N400 experiments reviewed above, are informative for theories of pragmatics only if those components are interpreted instrumentally, that is, as independent tools that give us insight into how much the pragmatic context caused a word to be expected, whether a scalar implicature was realised upstream and so on, rather than as correlates of pragmatic processing themselves.
While the instrumental approach is powerful, it of course has limitations. One is that many of these studies rely on unnatural tasks; for example, the studies reviewed above used complex metalinguistic tasks that are quite different from naturalistic comprehension. Another limitation is that this approach requires both good instruments and good theories. Just as we would be unable to diagnose a fever well if we used a thermometer whose properties were not understood (a thermometer for which we do not know what will cause its reading to go up and down), we likewise cannot evaluate pragmatic theories if we rely on brain components that we do not know how to modulate. Likewise, a good instrument (a brain component with clear and predictable properties) is not useful if we cannot set up clear predictions for it: if the pragmatic theories and psychological models of pragmatic processing do not make clear and falsifiable predictions about what should happen under various circumstances, then instrumental approaches cannot unambiguously support or falsify them. Both of these limitations are potentially serious, given the state of pragmatics and neurolinguistics today: current models of pragmatic processing are still mostly vague and each can generate or accommodate a wide variety of hypotheses, given a few degrees of freedom (Chemla & Singh, 2014a,b), and the exact functional interpretation of many event-related potential components is still under debate (for the N400, e.g., see Lau et al., 2008). In other words, an experiment cannot adjudicate between two theories if both theories can claim to be able to accommodate the results of the experiment. Thus, instrumental approaches to neurolinguistics research on scalar implicatures are not applicable in all situations; it is certainly not the case that all pragmatics experiments in the future must use lateralised readiness potential or P300 designs!
These instrumental approaches must be supplemented with other approaches, such as exploratory research in more natural settings, and psycholinguistic and linguistic research to help clarify the underlying models and the predictions they make. Nevertheless, instrumental approaches are an invaluable part of the neurolinguistics toolbox, and the study of scalar implicatures would benefit if they were more widely used.
Finally, another important consideration is what gains in knowledge about scalar implicature are uniquely attributable to the use of electrophysiological methods. As discussed above, if one accepts that the (or a) major goal of the neuropragmatics research enterprise is to understand the psychological mechanisms underlying pragmatics rather than to understand the brain itself, then instrumental research approaches are likely to be more productive than 'modestly ontological' approaches. Much electrophysiological research on scalar implicatures is already using instrumental approaches, as acknowledged above; this can be exemplified by studies such as the study by Hunt et al. (2013) described above, and similar studies that examine how readers react to seeing a word that is unexpected because of a previously computed scalar implicature. In these studies, however, brain activity is being used as an instrument in much the same way behavioural data could be; while these studies may take advantage of the multidimensional nature of electrophysiological data to search for different effects at different time windows or different parts of the scalp, they are not doing anything fundamentally different from what could be done by, for example, measuring how quickly people read the unexpected words. Thus, it is difficult to point to them as illustrations of how electrophysiology can teach us about scalar implicature processing. Contrast these with studies that use the brain as an instrument in a way that behavioural methods cannot, such as the above-mentioned examples from van Turennout et al. (1998) and Barbet and Thierry (2016); both of these measure brain responses in the absence of observable behaviour.
For researchers who are interested in avoiding the pitfalls of difficult-to-interpret results and futile searches for neural correlates of poorly understood psychological processes, but who still want to tap the power of electrophysiological methods and justify why such methods (as opposed to [usually] simpler behavioural methods) would be worthwhile for studying pragmatics and cognition, instrumental approaches that focus on paradigms unique to neuroscience would be fruitful.

| CONCLUSION
Recent years have seen an explosion of research using neurolinguistics methods to study the processing of scalar implicatures. The relevance of much of this work to our understanding of how scalar implicatures occur and how they are processed, however, is still limited; in most cases, it is not possible to conclude that brain responses observed in scalar implicature experiments reflect scalar implicatures at all. While it is possible to learn about how scalar implicatures work without examining the neural patterns that emerge during the realization of the scalar implicature itself (e.g., by examining the downstream consequences of the scalar implicature), it is important for readers and researchers to be aware of the difference and to not be under the illusion that using neuroscience methods automatically provides a window into how scalar implicatures emerge in the brain. The answer to the question raised in the title of this study, then, is that electrophysiology can teach us very little about the cognitive processing of scalar implicatures unless future research can move beyond, or supplement, extant approaches which mainly either focus on an arguably misguided attempt to find 'neural correlates' of pragmatic processes or which use electrophysiological methods to do the same things that behavioural methods can do. Researchers and students wishing to use electrophysiological methods and findings to shed light on how scalar implicatures work should interpret these results with caution and are advised to consider adopting a more instrumental approach which would allow neurolinguistics experiments to more directly test theories of scalar implicature processing.

ORCID
Stephen Politzer-Ahles https://orcid.org/0000-0002-5474-7930

ENDNOTES
1 As well as their magnetic equivalent, event-related fields. Event-related potentials are electric signals recorded using electroencephalography, whereas event-related fields are magnetic signals recorded using magnetoencephalography. While there are some differences in these techniques, especially in terms of what parts of the brain these techniques are most capable of measuring signals from, the techniques are conceptually very similar and have similar applications. In the following discussion, I consider both of these together, sometimes under the umbrella term electrophysiology, which typically includes both electroencephalography and magnetoencephalography, and which distinguishes these methods from haemodynamic methods (such as functional magnetic resonance imaging) which have substantially different properties.
2 For similar challenges in the application of haemodynamic neuroscientific methods to research on other kinds of inferences, see Virtue and Sundermeier (2016).
3 In this study, I will continue to refer to these enriched meanings, such as 'some but not all', as 'implicatures' or 'inferences' when discussing research that tests when and how such meanings are realised. This sort of shorthand is conventional in much of this literature.
4 Such an enterprise would, however, rely on what are known as reverse inferences (Poldrack, 2006), which are of questionable validity.
5 This is a simplification; for more nuanced review see, for example, Lau et al. (2008) and van Berkum (2009), among others. In fact, the N400 is not even specific to language; it can be elicited by non-linguistic stimuli such as pictures (Ganis & Kutas, 2003; Goto, Ando, Huang, Yee, & Lewis, 2010) and smells (Kowalewski & Murphy, 2012).
6 For example, semantic scope ambiguities (e.g., whether 'every kid saw a painting' means that different kids saw different paintings or they all saw the same painting) might involve similar processes of revision or adjudication between multiple interpretations; while it is not clear to me what the maximally similar 'pragmatic' control comparison would be, such semantic ambiguities have been found to elicit sustained negativities downstream (Dwivedi, Phillips, Einagel, & Baum, 2010).