INVITED RESEARCH ISSUES
Redressing the Balance in the Native Speaker Debate: Assessment Standards, Standard Language, and Exposing Double Standards

In his philosophical novel, Thus spoke Zarathustra, Nietzsche (1883-85) famously wrote, ‘God is dead,’ signifying that God is no longer credible as an absolute moral compass. Over a century later, Paikeday (1985) proclaimed that The native speaker is dead! in his book title, implying that the native speaker as the arbiter of what is correct in a language is obsolete. This paper discusses this complex, contentious ideological issue from language assessment and sociolinguistic standpoints against the backdrop of global Englishes. After highlighting difficulties identifying standard language norms, we discuss the practical need of having some assessment standard against which to evaluate language performance. Proposals as to what that standard should be are then critiqued in view of ways that second language proficiency has been operationalized in assessment systems. Next, we argue against vilifying those who use the term ‘native speaker’ and consider terminological problems introduced by some reconceptualizing efforts. We argue that we have a long way to go as a field before reaching a truly post-native speaker era, which would seem to be a reasonable aspiration for most, but not necessarily all contexts, and propose recommendations for addressing pressing research problems. These include standardizing terminology to incorporate semantically transparent terms, exploring assessment alternatives that focus more on language use than standard language adherence, improving scoring systems to remove nativeness from the equation when inappropriate, and acknowledging a place for accuracy-focused research within a broad tent of applied linguistics research traditions.
doi: 10.1002/tesq.3041
In his philosophical novel, Thus spoke Zarathustra, Nietzsche (1883-85) famously wrote, "God is dead," signifying that God is no longer credible as an absolute moral compass. Over a century later, the Canadian lexicographer Paikeday (1985) provocatively entitled his book The native speaker is dead!, arguing that the native speaker (NS) entity as the arbiter of what is correct or grammatical in a language is obsolete. In the format of a Socratic dialogue with linguists of the day, this work is an early example of the vociferous attack on the NS construct. Since this era, an increased awareness of, and sensitivity towards, diversities in English has challenged NS and standard language (SL) constructs (Matsuda, 2018). Debate has continued into the 21st century, ranging from calls to reconceptualize the use of the NS as a benchmark against which to measure second language (L2) performance, to moves to impose a community-wide ban on the term or chastise those who use it. But is the native speaker really dead? And is it the case that this baggage-laden, identity-shaping term is so contentious, with the potential to cause such egregious offence, that it needs to be replaced by a more inclusive, less polarizing term? This article examines this fraught topic from assessment and sociolinguistic perspectives in reference to assessment standards and SL. After discussing terminological problems introduced by re-conceptualizations, we argue against indiscriminately reprimanding those who use the term, whatever its imperfections. Finally, we propose recommendations for addressing pressing research problems. These include using semantically transparent terms for interdisciplinary research, exploring assessment alternatives that focus on language use that underpins SL, and acknowledging a place for accuracy-focused research within a broad tent of applied linguistics (AL).

THE NS CONSTRUCT: STANDARDS, SL, TERMINOLOGICAL PROBLEMS, AND IMPERFECT SOLUTIONS
There is widespread agreement on the need for some standard in assessing language ability but little consensus about what this standard should be. Davies (2013), who wrote prolifically on the topic of the NS, distinguishes between NSs (i.e. we are all NSs of some language) and the NS, designating an idealized goal. Davies opts for "native user" rather than nonnative speaker (NNS) to denote a highly proficient individual not exposed to the language from infancy but who frequently uses that language. He builds arguments from a language variation perspective to show NSs are not a monolithic comparator category, but does not challenge robust psycholinguistic evidence of cognitive differences between individuals who learn a language from birth versus later in life (e.g. Flege, Munro, & Mackay, 1995). Instead of persisting with the unproductive NS/native user dichotomy, he argues that SL is the object of institutional learning and, as such, can be accessed and appropriated by all individuals regardless of their status as NSs or native users. Therefore, SL through education could constitute a common goal and assessment standard for all.
Correspondence email: talia.isaacs@ucl.ac.uk
Within the field of Global Englishes, the notion of a globally accepted SL has been problematized. Two long upheld spoken standards or prestige varieties, General American (GA) and Received Pronunciation (RP), are neither static nor uniform, and neither pervasive in use compared to L2 varieties nor easy to pinpoint in terms of geographical locus (e.g. Van Riper, 1986). For example, the regional variety that constitutes GA is inconsistent across definitions. Is GA best represented by the English spoken in the American West (e.g. Hollywood) or parts of the Midwest? Should it include Canadian English? (Labov et al., 2006). On the other side of the Atlantic, RP is sometimes referred to as the Queen's English. Yet, acoustic analyses of Queen Elizabeth II's vowels in archived Christmas messages revealed that "the Queen no longer speaks the Queen's English of the 1950s" (Harrington, Palethorpe, & Watson, 2000, p. 927). Thus, even in the case of a single prototypical speaker, the goalpost of what constitutes an SL can and does shift. This implies that SL, which is sometimes defined by pre-specifying which stigmatized varieties it is not, is not absolute. Naturally, identifying SL can be even more complex in contexts where local and global varieties of English vie to be the standard (e.g. Rose & Galloway, 2017). Difficulties identifying SL can pose challenges for assessment, such as specifying criterial linguistic features at different ability levels in scoring systems (e.g. rating scales or parameters for acceptable responses in automated scoring; see Isaacs, 2018).
In L2 educational settings, access to or the dominance of SL has also been problematized because it may limit learners' preparedness to interact with speakers of nonstandard varieties outside the classroom (Rose & Galloway, 2019). Furthermore, it may weaken the validity of tests attempting to mirror or predict real-world, domain-relevant L2 performance and lead to negative washback effects (Lowenberg, 2002). Recent trends in assessment have moved away from inner-circle SL models by integrating a wider variety of accents into listening prompts. However, to date, this has mostly been restricted to inner-circle varieties in standardized tests (Harding & McNamara, 2018). One factor deterring assessment organizations from incorporating more varied accents in listening comprehension tests centres on concerns about items unfairly advantaging test-takers with high exposure to the accent of the speaker(s) in the listening prompt compared to test-takers with less exposure (Taylor & Geranpayeh, 2011).
One approach for dealing with use of the NS as a proxy for high L2 performance standards is to apply Cook's (2002) concept of multicompetence to L2 assessment settings (Brown, 2013). For example, the top level of L2 speaking scales could describe the performance features of multicompetent speakers with high L2 proficiency rather than monolingual NSs. This approach has been used in some rating scale development efforts (e.g. Fulcher, 1996) but has not been pervasive. Multicompetence is consistent with other aspects of operational practice in L2 assessment. For example, one criterion for becoming an accredited IELTS examiner is having "fully operational command" of English, reflecting the descriptor at the top level of the scale (IELTS Australia, n.d.). All applicants, including NSs, need to complete the English proficiency section of the form, with some additionally required to take IELTS to demonstrate they have the requisite proficiency. Thus, in some assessment settings, procedures have long been in place to permit the use of multicompetent speakers rather than NS targets as the benchmark or standard, although overtures in this direction need to continue and intensify. Cook (2002) defines "L2 users" as individuals who use a language besides their mother tongue to communicate. This does not directly align with Davies' (2013) native user definition cited above nor with Paikeday's (1985) "proficient user," which presumes a higher proficiency level than Cook's broader definition. Dewaele (2017) applauds Cook's introduction of the term "L2 user" for its inclusiveness and movement away from the linguistic deficit engendered by the term "NNS." Dewaele then advances the term "LX user" to avoid confusion about when the language was learned chronologically (L1, L2, L3, etc.). This re-conceptualization of "NNS" is perhaps unsurprising given the decades of critical literature problematizing this term (see Rose & Galloway, 2019 or Selvi, 2018 for a review). For example, the prefix "non-" perennially boxes individuals into othering associated with being defined in terms of their normative "native" peers (Holliday, 2018). It is also consistent with developments in an age of political correctness, where terms, names, and logos are sometimes changed in favour of more socially acceptable, neutral labels (Hughes, 2009). What is striking, however, is Paikeday's, Davies', Cook's, and Dewaele's convergence on the polysemous word "user" as the substitute for NS or NNS.
However, is the term "user" semantically transparent enough to traverse the multiple disciplines AL research touches upon? Secondary definitions of this polysemous noun are negative, referring to a person who "takes illegal drugs" or "exploits others" (Oxford English Living Dictionary, n.d.). In The Corpus of Contemporary American English (Davies, 2008), "drug" is the strongest collocate directly left of the node word "user," so "user" is not suitable as a neutral replacement for "speaker" in all research domains. Thus, despite terminological advances in AL, the use of the terms NS and NNS is still widespread in some research areas (Thomas & Osment, 2020).

DISCUSSING NS TERMINOLOGY WITH SENSITIVITY
Some AL researchers strenuously object to the NNS label due to the comparative fallacy and deficit model of language it reinforces (Ortega, 2018). Consequently, editors or reviewers of AL fora sometimes instruct authors to use alternative terms. Conversely, when authors avoid using NS/NNS, editors or reviewers may suggest reinstating these terms for clarity.¹ In some academic organizations, there have been moves to expunge the term NS from advertised job posts, which is now policy in several professional organizations (e.g. TESOL International, Japan Association for Language Teaching). In some cases, this has led to disparaging those who use the term. Jenkins (2014), for example, published exchanges from the British Association for Applied Linguistics (BAAL) mailing list, including her response to a job advertisement that referred to "near-native (or native) proficiency":
I thought BAAL members had agreed not to post any more job adverts asking for "native" English (whatever that is). Or is it okay if the 'n' word is in brackets, preceded by "near-native or", and followed by "proficiency"? (p. 209)
Jenkins makes reference to NS as the n-word, which is a euphemism for an incendiary, racist term that is among the most offensive in American English (Rahman, 2011). We trust that the comparison of nativespeakerism to the suffering and degradation of African Americans in the United States via reference to the n-word was unintended, but this example nonetheless reflects an extreme reaction within the TESOL community. It also highlights somewhat of a double standard, where academic camps on one hand seek an open-minded community to promote ideas and on the other hand vilify dissenting opinions, with "detractors . . . accused of misinterpreting and misunderstanding" (O'Regan, 2015, p. 129).
Aggressive positions over the term NS unfortunately detract from the work of scholars who have fought tirelessly to debate and problematize the construct that underpins the term (e.g. Cook, 2002; Davies, 2013; Holliday, 2018). Arguments against the use of NS terminology in advertisements for English language teachers have been clearly articulated (see Matsuda, 2018; Selvi, 2018). We concur that in most cases, nativeness should not be considered as an essential or even desirable attribute of the successful candidate as long as they can demonstrate a high level of proficiency (however defined). However, there may also be jobs where speakers of particular language varieties are required to the exclusion of others. For example, a researcher running a Mandarin vowel perception experiment might aim to recruit Chinese-born speakers from the Beijing area to establish a degree of uniformity in speech stimuli. This example is not intended to problematize such research from the perspective of multilingualism (see May, 2014). Nor do we intend to nullify discriminatory hiring practices that may occur on the basis of real or perceived accented speech (Moyer, 2013), including decision-making coloured by racial prejudice (e.g. influenced by an individual's phenotype, such as skin colour; Kubota & Lin, 2006). However, adverts that use the term NS in the absence of another agreed term or standard should not automatically be considered racist. They are not necessarily all one and the same. Our intention is not to defend nativespeakerism, but, rather, in highlighting deep-seated fieldwide divisions, to appeal for decent treatment of our peers, who may use constructs in different ways and for different purposes within the broad tent of AL.

FUTURE DIRECTIONS
The twilight of the NS, however labelled (and current labels are flawed), is unlikely to come any time soon. The term NNS is incendiary and problematic and will rightfully continue to be problematized. Within this context, we propose recommendations for the future of our field. First, in light of a lack of consensus on a suitable replacement term (Selvi, 2011), we suggest that the NS label may still serve a purpose if used critically. "NS" currently has the greatest semantic transparency to communicate TESOL research to a broad audience.
With observations of a research-teaching divide in TESOL and fragmentation within the wider research community (McKinley, 2019), it is important to avoid terms that educational stakeholders, policymakers, other researchers, and members of the general public might find inaccessible across the disciplines with which applied linguists work (e.g. TESOL, psychology, assessment).²
Second, we need to recognize that SL plays an important role in L2 assessment, which can feed into curricular outcomes. It is also important to untangle the terms NS and SL, which refer to different constructs. Indeed, Selvi (2018) observes that it is a myth to believe a NS standard is needed to benchmark outcomes in English language teaching. While it is unrealistic to eradicate standards, we need to address what these standards should be. In L2 pronunciation research, Jenkins (2002) advocates supplanting SL norms with a core set of English as a lingua franca features. However, a strengthened evidence base is needed to implement this proposal for assessment purposes, particularly in light of criticisms about her proposed set of features and methodological limitations of the research on which it is based (Isaacs, 2018). This notwithstanding, the practical problem Jenkins (2002) raises on how to account for global Englishes in L2 assessments warrants careful consideration. Again, an SL need not equate to a "native" English (Hu, 2012). Rather, it should be informed by research on how English is being used within and across local and global contexts. In assessment settings, the target language use domain (i.e. real-world language use settings to which the test performance needs to generalize; Bachman & Palmer, 2010) should be instructive in determining which language variety is assessed, regardless of whether or not it is considered the SL. As Matsuda (2018, p. 5) argues, "it is about being more precise about what students need to know." Proposals to integrate a global Englishes approach into operational assessments, including in defining the test construct, are gaining traction within the language assessment community (e.g. Harding & McNamara, 2018). This extends to scoring L2 performances when accounting for acceptance of nonstandard features that do not interfere with communication, especially in the less-formal spoken medium.
For example, although the NS standard casts a shadow on some Common European Framework of Reference for Languages (CEFR) scales, there has been a shift towards a more intelligibility-based approach in some revised CEFR descriptors (Council of Europe, 2018). Other speaking assessment instruments now explicitly guide evaluators that nativeness is not required for achieving high-level performance in an attempt to steer their scoring away from nativelike attainment when this is extraneous to the focal construct (e.g. Isaacs, Trofimovich, & Foote, 2018). In most language use contexts, one need not sound like a NS to perform the task at hand (e.g. fulfilling job duties, accessing social services). Bearing such real-world demands in mind, researchers should query whether there is a tacit NS standard being upheld in their study and, if so, whether this is appropriate.
A related recommendation is for greater engagement with large-scale test developers to address washback. Standardized English proficiency tests still implicitly assume that L2 speakers will only interact with either NSs or other L2 speakers who themselves adopt a NS variety (Jenkins & Leung, 2017). This calls into question whether such tests are truly international and measure the criterion of global communicative behaviour. There are signs of some positive movement. For example, some testing organizations have commissioned validation research on the use of a variety of accents better representing global Englishes (e.g. Kang, Thomson, & Moran, 2019). This work is underpinned by an acknowledgement that L2 varieties are also legitimate, with assessments needing to better reflect the varieties that target test-takers speak or are likely to encounter. In the same way that people can select which English variety their keyboard setting, sat nav, or AI virtual assistant is set to, looking into the future, international tests could develop more context-specific options reflected in different versions of a test that test-takers could select from. The accents represented in the listening component could reflect those that are most prevalent in the target language use context (e.g. desired destination for university study). These need not only be varieties from inner-circle countries that traditionally host large numbers of international students (e.g. US, UK). Versions of the test could also extend to contexts where English is used as the medium of instruction but is not the dominant language in society (e.g. Denmark, Taiwan; Macaro, 2018). There has also been a push away from monolingual NS norms in recent work on assessment, as evidenced by a growing focus on the construct of multilingualism (Schissel, Leung, & Chalhoub-Deville, 2019) and the multilingual turn (May, 2014).
Research needs to continue in this vein, especially as it has been argued that changes to assessments can positively influence the acceptance of local standards in education (see Hu, 2012).
Finally, we need to accept that NS benchmarks will likely persist in accuracy-focused research, particularly given the lack of an alternative codifiable global standard. Much of what we know about L2 acquisition is built on psycholinguistic studies investigating NS processes and/or performance benchmarks, which have been used as a foundation to branch out to other standards of comparison. We concur with Andringa (2014) that studies that recruit NSs should not treat this group as a monolith. Researchers need to adequately describe NS background characteristics and within-group performance. Furthermore, studies that draw on NS controls or norms should not do so uncritically. Authors should provide a methodological justification for this in light of the research problem, ecological validity, and so forth. That is, including NS participants should not simply be the default option. Other alternatives should be explored and matters such as the utility of eliciting ceiling performance should be considered. This could lead to a more careful operationalization of what constitutes a NS in the context of the study and to more transparent research reporting.
To conclude, SL and NS should not be dirty words in a field aiming to explore the complexities of L2 learning. Language standards can be both problematic and useful, but ideologically ignoring their presence does not help to consolidate knowledge across our field. We need a research community that does not rebuke its own members for adopting terminology that reflects different ideologies. While terms such as L2 user, LX user, or multicompetence bring clear value to our field, there may be pragmatic reasons for SL benchmarks to persist and for continued use of NS labels. In a pluralistic academic community that boasts wide-ranging research traditions, accuracy-focused research in reference to a standard (NS or otherwise) should be valued in the broader tapestry of AL research. We have a long way to go as a field before reaching a truly post-NS era, which would seem to be a reasonable aspiration for most, but not necessarily all contexts.
Heath Rose is an Associate Professor of Applied Linguistics at the Department of Education at The University of Oxford. He is the author of a number of books including Global Englishes for Language Teaching (Cambridge University Press, 2019) and Global TESOL for the 21st Century (Multilingual Matters, 2020). His research explores the implications of globalization and internationalization on English language teaching, with a focus on the higher education sector in China and Japan.