A systematic review of argumentation related to the engineering‐designed world

Across academic disciplines, researchers have found that argumentation‐based pedagogies increase learners' achievement and engagement. Engineering educational researchers and teachers of engineering may benefit from knowledge regarding how argumentation related to engineering has been practiced and studied.


| INTRODUCTION
According to the Next Generation Science Standards (NGSS Lead States, 2013), students should be able to produce evidence-based arguments about the "designed world" to meet standards related to engineering design in science classes (MS-ETS1 Engineering Design). The designed world includes human-engineered "devices, systems, or processes whose form and function achieve clients' objectives or users' needs while satisfying a specified set of constraints" (Dym, Agogino, Eris, Frey, & Leifer, 2005, p. 104). The NGSS recommendation to integrate argumentation into engineering design instruction was informed by decades of research in science education, which found that argumentation leads to many positive outcomes, including improved understanding of scientific concepts and more nuanced and epistemologically authentic views of the nature of science, for diverse students from kindergarten through their final year of secondary school (K-12; Jiménez-Aleixandre & Erduran, 2007).
The NGSS recommendation to integrate argumentation into engineering instruction also coheres with studies of engineers' workplace practices (Gainsburg, Fox, & Solan, 2016; Jarzębowicz & Wardziński, 2015; Madhavan, 2015), which indicate that argumentation is an authentic, if not essential, component of engineering that is vital to positive outcomes. Though engineers address a wide array of problems, they routinely engage in the shared practice of argumentation as they make and justify claims to a range of audiences (Latour, 1987). These claims cover a range of topics, from claims that a system is likely to fail (Gouran, 1995) to claims that a particular design element should be adopted over another possible design element (Bucciarelli, 1994; Wilson-Lopez, Minichiello, & Green, 2019). When justifying these claims, engineers weigh trade-offs related to a range of factors, such as whether the design protects human safety, meets local and federal regulations, minimizes negative environmental impacts, offers benefits to clients or financiers, maximizes efficiency, and is ethical (ABET, 2018; Dym, Little, & Orwin, 2013). In recognition of the importance of argumentation to the field of engineering education, this term has been included in the taxonomy for engineering education research (Finelli, 2013).
Despite a widespread consensus that argumentation is important to engineering and despite educational standards that recommend argumentation be taught in the context of engineering design, few syntheses have explored how engineering argumentation has been enacted and researched in educational settings, thereby limiting the potential for a coherent and empirically based vision for improvement in engineering argumentation instruction (Wilson-Lopez, Sias, Strong, et al., 2018). Therefore, the purpose of this systematic review of empirical literature was twofold: to explore how arguments and argumentation relative to the engineering-designed world were operationalized in educational contexts and to identify strengths and opportunities for improvement in practice and research in this area. Following Sampson and Clark (2008), we distinguish between arguments, or the verbal, written, visual, and/or mathematical products that include claims and justifications, and argumentation, or the processes through which arguments are constructed.

| THEORETICAL FRAMEWORK
Quality systematic reviews are not atheoretical (Gough, Oliver, & Thomas, 2017); on the contrary, authors of systematic reviews should explicate the theories that shaped their research questions and methodology (Foster & Jewell, 2017). Accordingly, in this section, we describe the sociocultural theories of knowledge production that shaped this review. These theories assert that people in different settings use socio-historically derived tools to construct and transmit knowledge (Lave & Wenger, 1991; Wertsch, 1998). In writing of the settings in which knowledge construction occurs, Bernstein (2000) argued that workplaces, such as engineering firms or labs, are primary contexts and fields of production in which new knowledge is constructed. By contrast, educational settings are secondary contexts in which knowledge is reproduced and re-contextualized.
This re-contextualization entails significant shifts in social roles, mediational tools, and goals (Bezemer & Kress, 2008), and these shifts ensure that engineering education never fully approximates workplace engineering. For example, engineers often work in teams under the guidance of engineering managers as they use advanced computer-based tools and mathematical models to develop products and systems for users. By contrast, many students engage in engineering under the guidance of educators, often in expectation of grades. The pedagogical and representational tools that support children's or youths' engineering should be developmentally appropriate and, thus, different from those used by engineers. Moreover, because engineering is not required in many schools, K-12 students may engage in engineering tasks in non-engineering settings, such as in science classes or outreach events, whose overarching goals may also influence how engineering is taught and enacted.
Despite complex differences between sites of knowledge production and re-contextualization, educational settings can scaffold epistemically authentic experiences in ways that prepare students to be powerful producers of knowledge within given disciplines (Wilson, 2011). Argumentation has been recommended as a core instructional approach for achieving this goal (Andriessen & Baker, 2014; Osborne, Simon, Christodoulou, Howell-Richardson, & Richardson, 2013). In other words, although students may not be situated within the primary context of the engineering workplace, they can still learn how to make valid claims that bear similarities to those made by engineers, using the types of evidentiary supports that bear similarities to those used by engineers. In effect, through argumentation, educators can prepare students for further participation in epistemic cultures (Knorr-Cetina, 1999), defined as "those amalgams of arrangements and mechanisms … which, in a given field, make up how we know what we know" (p. 1, emphasis in original).
The phrase "how we know what we know" consists of two elements. First, epistemic cultures delimit "what we know," or products of knowledge construction. These products are discipline-specific in the sense that each discipline values and produces distinct types of knowledge claims. The second element of epistemic cultures is "how we know," or the practices for producing and legitimizing particular claims or rejecting those that do not meet standards of evidence established by the community of practitioners. Chemists, for example, may use molecular models and mathematical equations to make claims regarding the causes of changes in macroscopic properties (Osborne, Rafanelli, & Kind, 2018), whereas historians may corroborate across primary source documents, while determining the source and affiliation of different authors, to make claims about the significance of an event (Wineburg, 1998).
These discipline-specific processes for building and communicating knowledge have been termed epistemic practices (Cunningham & Kelly, 2017; Sandoval & Reiser, 2004). Although many secondary contexts for knowledge production (Bernstein, 2000) do not have dedicated instructional blocks for the discipline of engineering, scholars (e.g., Wendell, Swenson, & Dalvi, 2019) have demonstrated that the epistemic practices of one discipline can be enacted at a task level even in instructional settings designated for other disciplines. For example, the epistemic practices of engineering can be demonstrated through individual engineering design tasks or argumentative tasks within the context of instructional blocks designated for other subjects such as science, literacy, or technology education (e.g., Wilson-Lopez, Gregory, & Larsen, 2016; Wilson-Lopez & Minichiello, 2017).
In the context of these argumentative tasks, learners have opportunities to engage in engineering-specific knowledge construction practices (argumentation) that result in the production of claims with justifications (arguments). To return to Knorr-Cetina's (1999) phrase, the term argument roughly maps onto "what we know" and how we communicate what we know, whereas the term argumentation roughly maps onto "how we know." The two terms, however, overlap in the sense that quality arguments, defined as arguments that meet standards of evidence for a discipline, reflect the process of argumentation that produced them. In engineering, this process often includes the epistemic practices of iteratively testing and improving prototypes under specified conditions; developing and/or applying mathematical models; demonstrating how design ideas meet specified criteria and constraints that are outlined in part in regulations or specifications; predicting and defending projected impacts on systems; and justifying trade-offs, all in the context of discussion with other engineers and stakeholders (Bucciarelli, 1994; Downey, 1998; Vincenti, 1990).
Although arguments and argumentation are discipline-specific, scholars commonly use Toulmin's (1958) pattern of an argument to describe and analyze arguments across disciplines (Litman & Greenleaf, 2017; Ryu & Sandoval, 2012). Toulmin's pattern includes claims, or the assertions that one wishes to prove; data, the facts or evidence used to support the claim; warrants, or the explanations of how the evidence adequately supports the claim; backing, or the support for the warrant; and rebuttals, or the stated or anticipated counterarguments that offer competing sets of data or alternative claims. In the engineering workforce, claims may include assertions regarding whether testing procedures and processes are adequate; tentative or formal assertions regarding whether a particular design decision is justifiable; and assertions regarding where, how, when, and with whom overall solutions should be adopted (Vinck, 2003; Winsor, 2003). Engineers often consider a range of data when making claims, including results from tests of CAD models and physical prototypes; visual data such as maps, diagrams, or photographs; client feedback and perspectives; budget sheets; and coherence with mathematical and scientific principles. They use these data to justify claims in the context of weighing trade-offs, foregrounding safety and ethics, and meeting regulations and client needs, among other considerations (Wilson-Lopez et al., 2019).
Given that argumentation is a core practice in the engineering workforce and that it represents a promising approach to engineering education, the first purpose of this systematic review was to categorize and describe how arguments and argumentation have been operationalized in existing research literature in order to unpack the assumptions and values that drive epistemic cultures of engineering as they are enacted across diverse educational settings. The second purpose of this systematic review was to identify areas of strengths and areas for improvement in research and practice in engineering-related argumentation, given this review of the field. Specifically, we sought to answer the following three research questions:
RQ1 How are engineering-related arguments operationalized in empirical studies conducted with learners? This question included the following sub-questions: What types of claims do educators ask learners to make, and how are these claims supported by particular types of justifications? By answering this question, we intended to illuminate how "what we know" about engineering is described in relevant empirical literature.
RQ2 How is engineering-related argumentation operationalized in empirical studies conducted with learners? This question included the following sub-question: What pedagogical practices or processes preceded or supported learners' production of arguments? By answering this question, we intended to identify epistemic practices that demonstrate the ways in which "how we know" in engineering is described in relevant empirical literature.
RQ3 What phenomena are studied in the context of engineering-related arguments and argumentation, and with which populations? By answering this question, we intended to identify gaps in the way that arguments and argumentation are studied. For example, we wondered whether argumentation had primarily been studied with White undergraduate engineering students; if so, research in this area could be improved by studying more diverse populations in respect to age and race/ethnicity.
Overall, we intended to use the answers from these research questions to identify current strengths, as well as potential areas for improvement, in how engineering-related arguments and argumentation were operationalized and researched in educational settings.

| METHOD
To answer these research questions, we conducted a systematic review. Scholars (Borrego, Foster, & Froyd, 2014) have asserted that in order to establish quality in systematic reviews, authors must first assemble a research team with appropriate interdisciplinary expertise. Accordingly, we began this review by assembling a team of researchers with advanced degrees in the following disciplines: engineering; engineering education; literacy education (including written and oral argumentation); science education; and systematic reviews and academic librarianship.

| Scoping review
The academic librarian and the first author conducted a scoping review (Petticrew & Roberts, 2006) using several education databases (Education Source, ERIC, and Academic Search Premier) and subject-specific databases (Engineering Village). This scoping review served four functions: (a) to determine whether a full systematic review was feasible; (b) to determine an appropriate time frame for inclusion in the systematic review; (c) to determine appropriate search terms; and (d) to identify additional areas of expertise needed on the research team. We report specific results from one database, Education Source on EBSCO Host, in Table 1 because it returned the most comprehensive results and the other databases indicated similar patterns. As shown in Table 1, our scoping review suggested that although hundreds of articles were in some way related to argumentation in the engineering-designed world, only three articles addressed this topic as a main subject. Because the NGSS recommended engineering in science classrooms, we wondered whether science education journals might also illuminate how argumentation has been operationalized and researched in the context of engineering instruction. Indeed, in his evaluation of the NGSS, Rodriguez (2015) asserted that "good [science] teachers and researchers/teacher educators have been promoting engineering practices in their classrooms and/or research projects all along" although they may not have "specifically called these practices engineering or sought to … make a distinction between engineering and science" (p. 1036). To identify whether science education journals might be relevant to our search, we searched for articles on argumentation in science education and found that many more articles (128) identified argument in science and education as a main subject.
In skimming through several of these articles, we noticed that research participants made arguments in relation to the designed world. For instance, in McNeill's (2011) empirical study, a fifth-grade teacher "had students debate the quality and strength of their claims, evidence, and reasoning for an argument addressing the question: How do you design a car to go the fastest?" (p. 806). One author, a registered professional engineer, determined that this type of argument directly related to engineering education even though McNeill never used the word engineering when describing students' activity. Consequently, we decided to add "science education" to our search terms for the full systematic review.
We wondered whether we might find comparable results for studies conducted under mathematics or technology education because engineering is often grouped as a STEM (science, technology, engineering, and mathematics) subject. Although these search terms yielded more articles than those for engineering education (see Table 1), we found that these articles were not relevant to this systematic review. After skimming each article in mathematics and technology education with argumentation as an identified subject, we determined that the research participants' arguments did not relate to engineering.
For instance, the search terms mathematics AND argument AND education resulted in a study by Conner and colleagues (Conner, Singletary, Smith, Wagner, & Francisco, 2014), which described a high school mathematics classroom in which students justified or refuted the claim that "the angles are never going to change in a regular polygon, even if the sides do" (p. 404). Students' knowledge of angles was not applied to or placed in context of engineered designs, and, thus, this argument was not coded as engineering-related. As a second example, the search terms technology AND argument AND education resulted in teacher research by Fink (2017) regarding how multimedia formats, including music, were used to enhance the quality of college undergraduates' persuasive arguments. The students used technology to argue whether characters from literature were examples of good parents, a topic that was not coded as engineering-related.
In all, the scoping review pointed toward search terms that were relevant for our subsequent systematic review. Informed by these search terms, the academic librarian partnered with another librarian to develop a search syntax, which was modified for different database searches (see Appendix A for final search strings) from major education, engineering, and science research databases. This scoping review also indicated that the first empirical studies with science or engineering, education, and argument as main subjects were published in 2000. Consequently, we set 2000 as the beginning date for this systematic review, which spans articles published from January 1, 2000 to July 31, 2017, including advanced online publications. Finally, given the prevalence of science education in our search, we added an expert in science education who had also conducted research in engineering practices in science classrooms to the research team.

| Systematic review
Using the search strings specified in Appendix A, we located 3,397 studies, which we uploaded to Rayyan (Ouzzani, Hammady, Fedorowicz, & Elmagarmid, 2016), a web application designed for conducting systematic reviews. This web application helped the research team to identify and eliminate 1,213 duplicates. For the remaining articles, four authors with expertise in engineering or literacy education read and discussed randomly selected articles to develop the following inclusion and exclusion criteria.
1. Study was in English. Although members of the research team spoke Spanish and Portuguese, the purpose of this criterion was to enable all researchers to read and discuss individual articles where necessary.
2. Study was peer-reviewed, including dissertations. The purpose of this criterion was to ensure that the documents met standards for quality as determined by others in the research community.
3. Study was empirical (qualitative, quantitative, or mixed methods). The purpose of this criterion was to enable an examination of studies, versus theoretical papers or practitioner-oriented articles, in accordance with our research questions. For a study to be considered empirical, it had to (a) state a research question, purpose, or hypothesis; (b) include a methods section that explicitly mentioned data sources and one or more methods for data analysis; and (c) include results or findings that stemmed from the analysis.
4. Learners generated or used justified claims. The purpose of this criterion was to enable us to answer the research questions by analyzing the types of arguments learners made (RQ1) and by identifying the epistemic practices (if any) that preceded these arguments (RQ2). We defined "learner" as any child or adult in any educational setting, including but not limited to K-12 science or engineering classes, college labs, teacher pre-service or in-service programs, and informal educational spaces such as museums. We defined "justified claims" as claims that were supported with data, evidence, reasons, or experiences, regardless of the perceived quality or adequacy of their justifications. Although Toulmin's argument model is widely used among educational researchers, we adopted this broader and more generic definition of justified claims because people often justify claims in culturally specific ways (Gee, 2015) and because Toulmin's sub-elements of arguments (e.g., warrants and backings) are often difficult to tease apart in analyses (Sampson & Clark, 2008).
5. The justified claims related to the designed world, with "designed world" defined as processes, products, systems, or devices created by humans. We adopted this phrase from the NGSS related to engineering design, which recommended that learners should be able to communicate "a convincing argument that supports or refutes claims for either explanations or solutions about the natural and designed world" (MS-ETS1-2). Although the NGSS explicitly state that this recommendation should be enacted in the context of engineering design, we identified instances in which learners made claims related to the designed world even when they were not designing something, such as when the undergraduate engineering majors in Jonassen and Cho's (2011) study argued which regulations should be adopted in a hypothetical ethics-based scenario. When operationalizing this criterion in consideration of our research questions and the dataset, we defined engineering-related arguments according to the topic of the argumentative task, and not necessarily according to the context (e.g., an engineering class) or activity (engaging in engineering design in a science class). To summarize, if learners made claims related to the designed world, the study was included.
6. Meta-analyses or systematic reviews were excluded. The purpose of this criterion was to maintain a focus on engineering-related arguments. Because we could not discern whether all articles in the meta-analyses included studies related to engineering, we excluded syntheses from this study.
After these criteria had been articulated, two authors independently read abstracts from the remaining 2,184 documents; they identified that 1,746 documents should be removed from the dataset because they clearly did not meet one or more inclusion criteria. These two coders achieved 97.08% agreement on exclusion decisions based on abstracts. In cases where they disagreed, the abstract was retained in the dataset for further consideration.
After eliminating these abstracts, two authors independently read the full text of the remaining 438 studies, and achieved 94.74% agreement regarding whether each study should be included or excluded in the final dataset. A minimum of three members of the research team read and discussed the studies for which there was disagreement until all mutually agreed that they should be included or excluded based on the criteria. Using this process, we excluded 321 articles based on their full texts, which left 117 studies that comprised the final dataset for the systematic review. These studies are indicated with an asterisk in the references section.
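The screening counts reported above can be checked with simple arithmetic. The following is a minimal sketch in Python, not part of the review's own procedure; the variable names are illustrative labels of our choosing, and only the four input counts come from the text.

```python
# Counts reported in the text; variable names are illustrative labels.
located = 3397                # records returned by the search strings
duplicates = 1213             # duplicates identified with Rayyan
excluded_on_abstract = 1746   # removed after independent abstract screening
excluded_on_full_text = 321   # removed after independent full-text reading

# Each stage subtracts the records removed at that stage.
after_deduplication = located - duplicates
after_abstract_screening = after_deduplication - excluded_on_abstract
final_dataset = after_abstract_screening - excluded_on_full_text

print(after_deduplication, after_abstract_screening, final_dataset)
# prints: 2184 438 117
```

The three derived values match the 2,184 abstracts, 438 full texts, and 117 included studies stated in the text.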

| Segmenting the data
To segment the dataset comprising all included empirical studies, two members of the research team reread each article and placed relevant information from each study into three spreadsheets, each of which corresponded with a research question. We segmented the data thematically in accordance with the research questions (Schreier, 2014); for example, one claim represented one segment, and one support for a claim represented one segment, regardless of their length. The first spreadsheet (RQ1) included the heading "Claims" and the heading "Justification." We combined Toulmin's categories of evidence, warrants, and backing under the general heading Justification because at times all three appeared in a single sentence, making their separation difficult (cf. Sampson & Clark, 2008). Moreover, as noted previously, we wanted to be inclusive of diverse methods for supporting claims even if they fell outside of Toulmin's model. The following example illustrates the segmenting process we followed for RQ1. In Maloney's (2004) study, the sentence "Jackie claims that the thin plastic cup would be best to take on a picnic" (p. 160) constituted one segment under the table heading Claim.
We followed similar thematic segmenting processes for the second and third research questions. The second spreadsheet (RQ2) included the following three headings: physical practices, oral language practices, and literacy practices. These categories emerged as core epistemic practices, or processes through which arguments were constructed, from the dataset. As an example, the middle school students in Basche, Genareo, Leshem, Kissell, and Pauley's (2016) study engaged in "watershed model building to trace pollutant movement and landscape management as an intervention" (p. 3). That phrase constituted one segment under the table heading Physical Practice in the second spreadsheet. Finally, the third spreadsheet (RQ3) included the following headings: Study Design, Phenomenon Studied, and Demographics. For example, the article by Watson, Swain, and McRobbie (2004) contained the following sentences: "Each of two teachers was observed teaching mixed ability Year 8 (age 12-13) classes in nine 50 minute lessons. Each class consisted of 30 students. Observation focused on each teacher and two groups within each class. In each class, one boys' group and one girls' group was chosen" (p. 29). These sentences constituted one segment under the heading Demographics because they provided information about the participants' age and gender.

| Coding the data
We used qualitative content analysis (Schreier, 2014) to inductively develop codes that described patterns that emerged from the data in the spreadsheets. Qualitative content analysis is used for "reducing the amount of materials" by "focus[ing] on selected aspects of meaning, namely those aspects that relate to the overall research question" (Schreier, 2014, p. 170). Following procedures for qualitative content analysis, we used data-driven processes to develop codes and definitions that adhered to the data while aligning with the research questions. The codes and definitions, organized by research question, are presented in Table 2.
Most studies did not report the relative frequency of arguments or epistemic practices, for example by stating that a specified number of research participants used a certain type of justification for their claim. To account for this limitation, our frequency count included each code at most once per study; however, each study (and each segment) could receive multiple codes from the same superordinate category. For example, the fourth-grade students in Chen, Wang, Lu, Lin, and Hong's (2016) study used multiple types of scientific principles to justify their claims regarding their egg drop device designs, but the frequency count included only one instance of "scientific principle" for this study under the superordinate category of "justifications." The fourth-graders also drew a connection when they compared air foam (used to protect expensive equipment during transportation) with elements of their egg drop devices. We coded this latter example as "analogy" and also counted it once under the superordinate category of "justifications" for this study.
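The counting rule described above (a code is counted at most once per study, while a study may receive several different codes) can be sketched as follows. This is an illustrative Python sketch rather than the procedure actually used in the review; the function name and the segment tuples are hypothetical, and "Study B" is invented solely to show counts accumulating across studies.

```python
from collections import Counter

# Hypothetical coded segments: (study, superordinate category, code).
# The Chen et al. rows mirror the example in the text; "Study B" is invented.
segments = [
    ("Chen et al. (2016)", "justifications", "scientific principle"),
    ("Chen et al. (2016)", "justifications", "scientific principle"),
    ("Chen et al. (2016)", "justifications", "analogy"),
    ("Study B (hypothetical)", "justifications", "scientific principle"),
]


def count_codes_once_per_study(coded_segments):
    """Count each (category, code) pair at most once for each study."""
    # A set collapses repeated uses of the same code within one study.
    unique = {(study, category, code) for study, category, code in coded_segments}
    return Counter((category, code) for _, category, code in unique)


frequencies = count_codes_once_per_study(segments)
print(frequencies[("justifications", "scientific principle")])  # prints: 2
print(frequencies[("justifications", "analogy")])               # prints: 1
```

Under this rule the two "scientific principle" segments from the same study contribute a single count, so the resulting frequencies describe the number of studies in which a code appeared, not the number of times learners used it.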

| Ensuring quality
We took several measures to ensure quality in this systematic review. First, we assembled a research team with diverse areas of expertise who helped to conceptualize and implement the study. Second, as described previously, we used multiple readers during each step of the process (Freeman, deMarrais, Preissle, Roulston, & St. Pierre, 2007). Third, after the codes had been developed, two authors coded the segmented data using Dedoose (2018), a qualitative analysis software package, and achieved over 80% agreement on all codes, an indication that they were reliable (Saldaña, 2016). For the codes on which we disagreed, we discussed them until we came to mutual agreement (Smagorinsky, 2008). To further verify our findings, a minimum of two authors read the codes assigned to each study and cross-checked them against the full text of the articles to ensure that each study received confirmable codes. Finally, we sought to achieve transparency through explicating our decisions at all steps of the process, including through communicating search strings, definitions of codes, and a summary of coded studies. These methods helped to ensure the trustworthiness (Lincoln & Guba, 1985) of the systematic review.
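For readers unfamiliar with agreement statistics, one common way to express intercoder agreement is simple percent agreement: the share of segments to which two coders assigned the same code. The sketch below assumes that measure; the text does not state which formula was used, and the function name and the two lists of codes are hypothetical illustrations rather than data from this review.

```python
def percent_agreement(coder_a, coder_b):
    """Return the percentage of items on which two coders gave the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must rate the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical codes assigned to the same ten segments by two coders.
coder_a = ["data", "safety", "analogy", "data", "ethics",
           "data", "economics", "safety", "authority", "data"]
coder_b = ["data", "safety", "data", "data", "ethics",
           "data", "economics", "safety", "authority", "data"]

agreement = percent_agreement(coder_a, coder_b)
print(agreement)  # prints: 90.0
```

Simple percent agreement does not correct for agreement expected by chance, which is one reason some researchers also report chance-corrected statistics.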

Category
Code Definition

RQ1: Arguments: Claims and justifications
Claim
Assertion that the learner is making in the face of other possible contradictory assertions.

Adoption
Learner argues that a design or design element, which somebody else has created, should or should not be implemented at all or in a specified context or in relation to other possible designs.

Design
Learner argues that a design or design element for a product or process, which they generated themselves, should or should not be adopted; that it performs better or worse than other designs; and/or that it meets specified criteria or constraints.

Evaluation
Learner argues that a design or design element, which somebody else has created, does or does not meet a criterion or constraint necessary for product quality.

Failure analysis
Learner argues that a process, product, or system failed due to a specified reason or series of reasons as opposed to other possible reasons.

Liability
Learner argues that a person or entity, as opposed to other possible people, bears responsibility for the oversight or consequences of a design.

Science
Learner argues that a particular scientific or mathematical concept, as opposed to another possible scientific concept, explains how or why a design works.

Testing
Learner argues that a testing event should include a specified variable or feature or that it was adequate or inadequate for its stated purpose.

Justifications
Evidence, warrants, backings, experiences, or reasons which justify why the claim should be accepted.

Analogy
Learners draw connections between previous cases or experiences to justify a claim.

Authority
Learners invoke expertise, including a text or a person, to justify a claim.

Economics
Learners use factors related to budget, economic impacts, or revenue to justify a claim.

Environment
Learners describe projected or actual environmental impacts, including damage to plants and animals and concerns with sustainability, to justify a claim.

Ethics
Learners appeal to values, moral judgments, or articulations of right or wrong to justify a claim.

Data
Learners use empirical results from tests or observations, conducted by themselves or somebody else, to justify a claim.

Human users
Learners use human preferences, aesthetics, or behaviors to justify claims.

Originality
Learners highlight creative or original features to justify a claim.

Regulations
Learners use laws, ordinances, policies, or standards to justify a claim.

Safety
Learners use factors related to human safety or health to justify claims.

Scientific principles
Learners use scientific or mathematical concepts to justify a claim.

RQ2: Argumentation: Epistemic practices

Physical practices
Practices for deriving feedback from the material world.

Experiment
Learners seek to answer a question by controlling variables and observing outcomes.

Observation
Learners observe existing phenomena without controlling or manipulating variables.

Test
Learners test a design or design element to see how it performs.

Oral language practices
Practices for structuring social interactions using oral language.

Interviews
Learners question people who are stakeholders relative to the designed world.

Role play
Learners assume the role of a stakeholder and participate in a debate or discussion.

Small-group discussion
Learners exchange or debate information or ideas in small groups.

Whole-class discussion
Learners exchange or debate information or ideas as a whole class or group.

Literacy practices
Practices surrounding the reading and writing of texts.

Audience
Learners direct their argument to a specified audience, such as a client, excluding fellow classmates and their teacher.

Differing texts
Learners read one or more texts that represent different or opposing viewpoints or insights on the problem.

Feedback
Learners receive feedback on initial graphic organizers or drafts from a peer or a teacher.

Scenario
Learners read a case study, scenario, or brief that situates the problem.

Search
Learners use search strategies, such as entering terms into internet searches, to gather information.

Writing features
Learners are explicitly taught the features of an argument through definitions of components, exemplar texts, or evaluation tools such as rubrics and checklists.

Writing scaffold
Learners complete a graphic organizer, template, or question prompts prior to writing arguments.

RQ3: Study design, phenomena studied, and demographics

Study design
The research design used to examine the phenomenon of interest.

Mixed methods
Studies describing themselves as mixed-methods, and/or studies reporting both quantitative data analysis and qualitative data analysis.

Qualitative
Studies state they use a qualitative design or analysis, and/or use inductive or deductive analysis of data to generate codes or themes presented as frequency counts, thematic trends, or narrative descriptions.

Quantitative
Studies state they use a quantitative design or analysis; and/or they use inferential statistics; and/or they use a pre-and post-test design or a control or comparison group with findings presented numerically.

Phenomenon studied
The stated focal outcome or phenomenon of interest in relation to the research purpose, question, or hypothesis.

Affect
Researchers studied participants' attitudes, motivation, efficacy, emotional states, confidence, or interest.

Arguments
Researchers studied the quality, quantity, or nature of the participants' oral or written argumentation.

Engineering
Researchers studied aspects of the participants' engineering-related thinking or activity (e.g., design processes and weighing trade-offs) excluding knowledge of science.

| Limitations
This systematic review is characterized by at least three major limitations. First, researchers do not always use consistent terms when describing similar phenomena. This review did not include studies without the terms "argument" or "argumentation,"

TABLE 2 (Continued)

Science
Researchers studied research participants' knowledge of scientific concepts or principles, scientific literacy, or epistemic understandings of the nature of science.

Pedagogy
Researchers studied participants' pedagogical knowledge or practices.

Thinking
Researchers studied participants' metacognition, reflection, reasoning, general thinking, or critical thinking skills.

Other
Researchers studied an outcome not enumerated above, or researchers' phenomenon of study was unclear. This code was only applied if the study received no other code.

Demographics
Information related to participants' race/ethnicity, gender, and language. Studies that reported demographic information of school or community (not participants) were counted as "not reported".

African American
Studies reporting a percentage or number of participants identifying as Black or African American.

Asian
Studies reporting a percentage or number of participants identifying as Asian or Asian American.

Biracial
Studies reporting a percentage or number of participants identifying with two or more races or ethnicities.

Indigenous
Studies reporting a percentage or number of participants identifying as Native American, American Indian, or Native North American.

Language
Studies reporting a percentage or number of participants whose home language or language proficiency was indicated.

Latinx
Studies reporting a percentage or number of participants identifying as Latino/a, Latin American, Latinx, or Hispanic.

Not reported
No information about gender, race, or language was provided in the study.

White, non-Hispanic
Studies reporting a percentage or number of participants identifying as White or Caucasian.

Grade band
Background information related to participants' educational stage.

Elementary
Participants were attending grades K-5 or the equivalent.

Middle school
Participants were attending grades 4-8 or the equivalent.

High school
Participants were attending grades 9-12 or the equivalent.

Undergrads
Participants were attending college (freshmen through seniors).

Graduates
Participants were attending master's or doctoral level courses.

Teachers
Participants were educators in formal or informal settings, including K-12 schools and universities.

Industry
Participants were from engineering-related industries (e.g., architecture and engineering).

even if those studies described the ways in which students use evidence to support claims in engineering. While acknowledging this limitation, we justify our choice of search strings because argumentation is a core term used across educational research to describe the types of epistemic practices we sought to study (Goldman et al., 2016). Second, this review did not attempt to remove multiple studies that may have been conducted with the same research participants. For example, Nielsen (2012a, 2012b, 2012c) published three studies in which biology students from a Danish upper secondary school discussed whether human gene therapy should be allowed. We included all three studies in the review, coded each claim as "adoption" per our coding scheme, and counted "adoption" three times (once for each study). While this method may have resulted in an inflated count, we justify this decision because we did not want to make unfounded assumptions regarding research participants, and at times different phenomena were studied even when the research participants may have been the same. Third, this systematic review is an analysis of existing studies for which we did not see the raw data. Thus, the participants in each study may have engaged in pedagogical practices or made claims that were not reported by the researchers when they wrote the study. Moreover, because most studies did not report the relative frequency of argument types or epistemic practices, this systematic review does not identify their relative prevalence. Given these limitations, we do not claim that this systematic review represents an accurate reflection of educational practices in engineering argumentation. Instead, this review illuminates how researchers describe arguments and argumentation related to engineering.
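The counting rule described above (each code is counted once per study, even when several studies may share participants) can be illustrated with a minimal Python sketch. This is not the authors' analysis procedure. The record layout, the function names `tally_codes` and `as_percentages`, and the fourth study are hypothetical and serve only as an illustration. The three Nielsen entries mirror the "adoption" coding reported in the text.

```python
from collections import Counter

# Hypothetical coded records: one entry per study, mapped to the set of claim
# codes assigned to it. Only the three Nielsen entries reflect the text above.
studies = {
    "Nielsen 2012a": {"adoption"},
    "Nielsen 2012b": {"adoption"},
    "Nielsen 2012c": {"adoption"},
    "Hypothetical study D": {"design", "testing"},
}


def tally_codes(coded_studies):
    """Count, for each code, the number of studies that received it."""
    counts = Counter()
    for codes in coded_studies.values():
        counts.update(set(codes))  # a code counts at most once per study
    return counts


def as_percentages(counts, n_studies):
    """Express each count as a percentage of all studies, to two decimals."""
    return {code: round(100 * n / n_studies, 2) for code, n in counts.items()}


counts = tally_codes(studies)
percentages = as_percentages(counts, len(studies))
print(counts["adoption"])       # 3: one count per study, not per participant pool
print(percentages["adoption"])  # 75.0
```

Because a study can receive more than one code, percentages computed this way need not sum to 100% across codes.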

| FINDINGS
This section is divided into three subsections, each corresponding to one of the three research questions. In the first subsection, we describe how "what we know" was operationalized in relevant empirical studies by identifying the types of arguments that research participants made in the included studies. In the second subsection, we describe how "how we know" was operationalized in relevant empirical studies by identifying the epistemic practices that preceded the arguments. In the third subsection, we describe the populations that were studied, as well as the researchers' focal phenomena in the studies as a whole. Appendix B provides examples of our findings for each of the studies included.

| RQ1: Arguments
This section details how different types of claims and justifications were operationalized across the dataset. Table 3 represents an overview of the findings in relation to the first research question.

| Types of claims
Most commonly, in more than half of the studies (60.68%), research participants made one or more claims that an existing technology should or should not be adopted (coded as "adoption"). Under this category, the two most prevalent examples of claims were whether a nuclear power plant should be built in the participants' region and whether specific types of genetically modified organisms should be produced or allowed. In studies that had been assigned an adoption code, learners did not tend to argue whether a specific design should be adopted (for example, by arguing that a nuclear power plant with boiling water reactors versus pressurized heavy water reactors should be built); rather, they made claims based on the ramifications of a generalized technology as a whole, for example, by arguing that a generic nuclear power plant with non-specified components should or should not be built in a particular place (e.g., Namdar & Shen, 2016; Ozturk & Yilmaz-Tuzun, 2017). Adoption was the most common type of claim that appeared across the studies, but claims regarding scientific concepts (25.64%), participant-generated designs (23.93%), and evaluations of existing designs in relation to criteria or constraints (18.80%) were also not uncommon. As examples of claims related to scientific concepts, high school students argued over how and when they felt lateral forces acting on an amusement park ride (Nielsen, Nashon, & Anderson, 2009); and when designing an electrical circuit, third-graders argued whether electricity traveled in only one direction or in multiple directions in a circle (Harlow & Otero, 2004). As examples of design claims, the middle school students in Kim and Song's (2006) study argued whether their self-designed straw flute should have a finger hole, while the elementary and middle school teachers in the study conducted by Mathis, Siverling, Glancy, and Moore (2015) developed curricula asking their students to argue on behalf of the anchor system they designed to maintain the stability of an amusement park ride. As an example of an evaluation claim, undergraduate pre-service teachers argued whether a concentrated acid solution had enough power to remove lime from a kettle (Cetin, 2014).

[Table 3: Frequency of types of claims and frequency of types of support]
Less commonly, participants made claims regarding the adequacy of testing conditions (10.26%), regarding who bore responsibility for the oversight and consequences of designs (5.13%), and regarding the causes of design failures (4.27%). At times, an educator purposefully scaffolded and facilitated discussions of these claims, whereas at other times, the learners raised concerns themselves. As an example of the latter, in Gu's (2016) dissertation, middle and high school students tested water quality in a local river. In response to the discussion question, "What are some methods you have observed … that were used by the scientists," one student noted, "we dipped the test paper, but didn't even measure how long we dipped … we could have had it more controlled" (p. 70). The student then proceeded to provide the justification behind why he perceived the water testing conditions were not controlled. In this example, a research participant produced an argument about testing conditions even though testing was not the focus of the instructional approach or of the study.

| Types of justifications
As indicated by Table 3, across the dataset as a whole, research participants considered a range of factors when justifying their engineering-related claims. Most commonly, in more than half of the studies (53.85%), research participants used their understanding of scientific or mathematical principles to justify claims (coded as "scientific principle"). For example, in the study conducted by Kind, Kind, Hofstein, and Wilson (2011), 12- and 13-year-olds produced claims regarding which color a metal tanker lorry should be in order to keep its liquid contents hot. The students justified their responses by using their understanding of heat absorption: They asserted that a black container is "in the Sun and then it absorbs the heat" (p. 2539). Secondarily, in 42.74% of the studies, research participants considered human safety or health; in 39.32% of the studies, they considered environmental impacts; and in 35.90% of the studies, they considered factors related to cost, such as revenue or budget. Justifications for claims related to data (results from tests or observations), consideration of human users, ethics, and analogy were included in 29.06% to 30.77% of the studies. By contrast, appeals to authorities (14.53%), references to regulations (11.97%), and mentions of originality (1.71%) were relatively rarely used as justifications for claims.
These percentages suggest that, across the studies as a whole, participants marshaled many different types of considerations toward supporting claims, which is consonant with K-12 and undergraduate engineering standards as well as with the work of practicing engineers. However, when we examined each study individually, we found that more than half of the studies (54.70%) did not indicate that participants used four or more different types of justifications (e.g., safety and environment) to support a single engineering-related claim, and almost none of the researchers described scaffolds that provided learners with opportunities to systematically coordinate multiple types of justifications. Dym et al. (2013), for example, recommended matrices to help undergraduates systematically identify whether and to what extent a design includes a range of required or desired characteristics, while Wilson, Smith, and Householder (2014) found that, given the complexity of engineering design, high school students were less likely to remember to incorporate several previously identified criteria and constraints into their designs when they did not list and revisit them. Thus, while nearly half of the studies (45.30%) indicated that learners considered four or more different types of justifications to support their claims, there was no indication that their learning environments included instructional supports in prioritizing and justifying differing considerations or weighing them against one another. To further support this interpretation, we conducted a word search within each study and found that only 8.55% of the studies used the word trade-off or tradeoff when describing the research participants' arguments or argumentation processes.
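The two per-study checks described above (whether a study shows four or more types of justifications, and whether it uses the word trade-off or tradeoff) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure the review actually used. The study labels, codes, and text snippets are hypothetical, as are the names `share`, `four_or_more`, and `mentions_tradeoff`. The pattern also matches plural forms, which is an assumption beyond the two spellings named in the text.

```python
import re

# Hypothetical justification codes assigned to each study (illustrative only).
justifications = {
    "Study A": {"safety", "environment", "economics", "ethics"},
    "Study B": {"scientific principles"},
    "Study C": {"data", "safety"},
}

# Hypothetical text excerpts standing in for the full text of each study.
texts = {
    "Study A": "Students weighed trade-offs between cost and safety.",
    "Study B": "Students explained heat absorption in the container.",
    "Study C": "The tradeoff between speed and accuracy was not discussed.",
}

# Matches "trade-off", "tradeoff", and their plurals, ignoring case.
TRADEOFF = re.compile(r"\btrade-?offs?\b", re.IGNORECASE)


def share(flags):
    """Percentage of True values among per-study flags, to two decimals."""
    flags = list(flags)
    return round(100 * sum(flags) / len(flags), 2)


# Share of studies coded with four or more distinct justification types.
four_or_more = share(len(codes) >= 4 for codes in justifications.values())

# Share of studies whose text contains the search term at least once.
mentions_tradeoff = share(bool(TRADEOFF.search(t)) for t in texts.values())

print(four_or_more)       # 33.33
print(mentions_tradeoff)  # 66.67
```

A plain word search of this kind cannot tell whether the term describes participants' arguments or something else in the study, so each match would still need to be read in context.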

| RQ2: Argumentation
This section describes how authors operationalized argumentation, or the ways in which they described the epistemic practices through which research participants constructed their arguments. As indicated in the codebook, three categories emerged from the content analysis: literacy practices (appearing in 82.91% of the studies), oral language practices (appearing in 75.21% of the studies), and physical practices (appearing in 35.90% of the studies). Table 4 presents an overview of the findings in relation to the second research question.

| Literacy practices
Most studies portrayed reading and writing as epistemic practices central to how we know, or the process of argumentation. In 38.46% of the studies, research participants read texts that presented different perspectives on the technology or design that was the subject of debate (coded as "differing texts"). For example, the high school students in Nielsen's (2012a) study read "written material" that took "four archetypal positions toward gene therapy based on authentic statements from participants in the public debate in the US" (p. 434) prior to constructing arguments regarding whether human gene therapy should be allowed. In other studies, participants read about technologies from different perspectives, including from economic, ecological, ethical and social perspectives (e.g., Rosborough, 2010), or from the perspectives of different stakeholders (e.g., Basche et al., 2016).
In addition to introducing students to different perspectives, reading served another important function in argumentation as well: It introduced research participants to a problem. Across many studies (37.61%), participants read scenarios that provided an overview of the issue that had been or could be addressed through engineering (coded as "scenario"). For example, undergraduate engineering students in Jonassen and Cho's (2011) study read "teaching cases," such as one in which a manager wanted to grandfather buildings in under older (and less safe) enforcement regulations, prior to placing themselves in the position of an engineer who argued which regulations should be adopted. Across studies, reading also played a third role in the research participants' argumentation processes: In 17.95% of the studies, they conducted database or Internet searches to gather more information about the problem (coded as "search"). For example, when developing an argument regarding whether a nuclear power plant should be established in Turkey, the pre-service science teachers in Tekbiyik's (2015) study were encouraged to conduct "deeper research" via the Internet (p. 242).
Given that reading appeared in many studies, we sought to identify whether the researchers described literacy-based supports designed to help students locate or interpret information. Previous research (Wertz, Purzer, Fosmire, & Cardella, 2013) has indicated that even undergraduate engineering students have difficulties with locating and evaluating information related to design problems. However, after re-reading the studies that had been assigned a "search" code, we found that only three of them mentioned explicit supports provided to students, such as discussions on how to use search terms effectively or on how to evaluate the quality of websites. For example, the instructional materials in Shoulders' (2012) dissertation stated that, prior to writing an argument evaluating the pros and cons of cultured meat produced through tissue engineering, high school agriculture students should "develop a list of criteria they used in determining whether the websites' information was credible or not" (p. 268). In all, a vast majority of studies (97.44%) did not indicate that learners received explicit supports regarding how to search for and evaluate relevant and trustworthy information.
We also wanted to identify whether the studies in our dataset indicated that educators provided supports on how to understand or interpret texts. Existing research (Wilson et al., 2014) suggests that some learners may not identify implicit criteria and constraints in engineering scenarios without instructional supports, while other research (Fang, 2005;Snow, 2010) suggests that many learners may not understand difficult information in complex scientific and technical texts without instructional supports designed to promote comprehension. Upon re-reading the studies that had been coded as search, scenario, or differing texts, we did not find explicit examples of comprehension strategy instruction, which has long been recommended by reading researchers (e.g., Block & Pressley, 2002) as a core approach for promoting comprehension. However, a handful of studies (e.g., Bligh & Coyle, 2013;Diazibarra, 2016;Falcones, Wong-Villacres, Barzola, & Garcia, 2016;Safadi, Safadi, & Meidav, 2017) mentioned activities in which students "annotated" texts; and, as described in more detail later, a majority of studies included student discussions, both of which held the potential for helping students clarify their understandings of texts through talking with others.
In addition to reading, many studies also included writing as a core method through which arguments were constructed. Osborne, Erduran, and Simon (2004) contended that to enhance the quality of argumentation in science, researchers can draw "from the literature on teaching students to write" (p. 1001). In accordance with this recommendation to draw from research on writing, many studies included writing supports that cohered with research on writing (Graham et al., 2016;Graham & Perin, 2007). For example, 30.77% of the studies mentioned writing scaffolds, such as graphic organizers or concept maps, which visually presented the relationship between different components of an argument. In 21.37% of the studies, learners explicitly learned the features of an argument, often through viewing and discussing models and/or through learning the definitions of different components of an argument, before writing their own arguments. In 17.09% of the studies, they wrote to a real or imagined audience (coded as "audience"), such as a client. Finally, in 15.38% of the studies, research participants received personalized suggestions on their writing (coded as "feedback"), either from teachers or peers, and had opportunities to reflect on or revise their written arguments.

| Oral language practices
Like reading and writing, discussions (or interactions using oral language) were usually included as a core epistemic practice through which claims were constructed and defended. Most commonly, in 58.97% of the studies, research participants engaged in discussions in small groups prior to producing an argument (coded as "small-group discussion"). While some of these small-group discussions were open-ended, allowing students to discuss the topic without specified prompts, other discussions were more directed. For example, Emig, McDonald, Zembal-Saul, and Strauss (2014) gave guides with questions to pre-service elementary teachers, who discussed these guides in small groups prior to making arguments regarding which simple machines were most analogous.
Secondarily, in 40.17% of the studies, teachers and students engaged in whole-class discussions or debates as part of the argumentation process (coded as "whole-class discussion"), such as when the pre-service science teachers in Genel and Topçu's (2016) study orally debated whether synthetic pesticides should be used on farms. In some studies (16.24%), research participants assumed a specified perspective during discussions (coded as "role playing"), such as assuming the stance of a fish farmer or a fishmonger prior to arguing whether a local transgenic salmon farm should be established (Simonneaux, 2001). Finally, a handful of studies (7.69%) mentioned interviews with stakeholders (coded as "interviews") such as clients, technicians, or engineers. Collectively, these studies suggest that research on arguments related to the designed world, as a whole, presents engineering-related claims as being constructed through a variety of different types of social interactions, including discussions and debates with others who hold diverse perspectives. This emphasis on discussion, including listening and understanding multiple stakeholders' perspectives, coheres with the work of practicing engineers (e.g., Bucciarelli, 1994).

| Physical practices
Research participants engaged in physical practices, or practices in which they gathered empirical evidence by seeking "feedback … from the material world" (Apedoe & Ford, 2010, p.166) in approximately one-third (35.90%) of the studies. Most commonly, in 16.24% of the studies, research participants conducted tests of designs or design elements (coded as "tests"). For example, fifth-grade students designed eco-columns and used evidence (e.g., pill bugs in the column died) to argue whether they had designed a stable ecosystem (McNeill, 2011).
Secondarily, in 12.82% of the studies, participants made observations of existing phenomena (coded as "observations"), such as when 11th and 12th grade students observed a nature park prior to making claims about changes that should be made to its design (Pennock, 2015). Finally, in 6.84% of the studies, participants engaged in experiments, such as when the eighth-grade students in Yang, Lin, She, and Huang's (2015) study identified changes in mass for chemical reactions prior to making arguments about methods for extinguishing fires. As a second example, a high school science teacher asked students to compare bread smeared with make-up to plain bread prior to arguing whether people should use synthetic make-up (Lin, Hung, & Hung, 2017). Collectively, these findings indicate that empirical tests or observations did not appear in a majority of studies. When they did, these tests either bore a direct relationship to claims, such as when death of pill bugs was used to support claims about the health of an ecosystem, or an indirect or potentially specious relationship to claims, such as when make-up on bread was used to support claims regarding make-up use on humans.

| PHENOMENA AND POPULATIONS STUDIED
Table 5 summarizes the demographics of the research participants as well as the focal objects of investigation. As indicated by this table, most studies (77.78%) did not include information about the race or ethnicity of the research participants, nor did they mention the participants' linguistic backgrounds, including whether the language of instruction cohered with their home languages. However, most studies did mention basic information about the participants' institutional roles or designations in relation to schools (e.g., elementary students or pre-service teachers). Table 5 indicates that most research on engineering-related argumentation (64.10%) had been conducted with K-12 students, and 36.75% with undergraduates, with a smaller handful (11.11%) focused on adults in professional settings such as teachers or those in industry.
In terms of phenomenon studied, a majority of studies (70.09%) were designed to identify features or developments in learners' arguments. For example, the tenth-grade students in Dawson and Carson's (2017) study argued about a variety of socioscientific issues, such as whether wind turbines should be built on a farmer's property. The analysis revealed that "a majority of their responses consisted of a claim and data with backings, qualifiers, and rebuttals rarely provided" (p. 1). This finding was coded as "arguments" because the researchers studied the nature of the participants' claims and evidence. Secondarily, 28.21% of the studies examined features or developments in learners' knowledge of scientific concepts or nature of science, such as when Emig (2011) identified that analogical-mapping-based comparison tasks in argumentation supported learners' "content learning about simple machines" (p. iii; coded as "science"). By contrast, studies about learners' affect (e.g., their attitudes or interests) or teachers' pedagogies remained relatively rare (15.38% and 9.40%, respectively). Finally, very few studies (5.13%) explicitly reported on features or developments in learners' engineering thinking or activity, such as systems thinking (e.g., Dori, Tal, & Tsaushu, 2003) or weighing of trade-offs (e.g., Sakschewski, Eggert, Schneider, & Bögeholz, 2014). Of the studies designed to determine changes or developments related to a specified outcome, most identified positive results for research participants when they engaged in engineering-related argumentation. For example, the pre-service first grade teachers in Cetin's (2014) quasi-experimental study performed significantly better on tests of scientific knowledge of reaction rates when they participated in engineering- and science-related argumentation activities as contrasted with those in a control group. However, a few of these studies (3.42%) indicated that there were no effects associated with engineering argumentation on some measures, such as when Callahan (2009) found that engineering-related argumentation did not produce statistically significant changes in high school biology students' nature of science understanding, reflective judgment, and argumentation skills. We did not find any studies that indicated argumentation-based instructional approaches had a negative effect on any outcome, for example, that they decreased learners' interest in science or engineering.
Collectively, these studies suggest that engineering-related argumentation is a promising instructional approach for promoting a variety of positive learning outcomes even though the majority of studies were not designed to explicitly focus on students' engineering thinking or activity. This latter assertion is consistent with the fact that we located most studies under the search term of "science education" versus "engineering education." Moreover, most studies did not specify potentially salient characteristics of the research population such as their race/ethnicity or their level of familiarity with the language of instruction.

| DISCUSSION AND IMPLICATIONS
These findings indicate both areas of strength and areas for improvement in how researchers operationalize and study arguments and argumentation related to engineering. As an area of strength, this body of research identified positive developments, such as improvement in the quality of arguments or knowledge of scientific concepts, which occur when learners make claims in relation to different technologies: either their own or others'. As another strength, across studies as a whole, learners marshaled different types of supports for their arguments, such as their knowledge of science, attention to finances, concern for safety and the environment, and ethical considerations. Finally, this body of research, as a whole, presented engineering-related claims as being constructed through reading and writing, through conversations with different stakeholders, and (to a lesser extent) through empirical tests, all of which are authentic epistemic practices in engineering workplaces (Phillips, Fosmire, Petershiem, Turner, & Lu, 2018;Tenopir & King, 2004;Vinck, 2003).
In addition to illuminating areas of strength, this study points toward areas in which research and practice on engineering-related argumentation might improve, which we elaborate in detail later. To make recommendations regarding improvements, we identified ways in which the findings from this systematic review do not cohere with K-12 national standards (e.g., NGSS Lead States, 2013) and post-secondary accreditation requirements (ABET, 2018), with philosophical and empirical literature on engineering education (e.g., Katehi, Pearson, & Feder, 2009;Koen, 2003), and with research in engineering education that foregrounds the practices of underrepresented youth (e.g., Nazar, Calabrese Barton, Morris, & Tan, 2019;Wilson-Lopez, Sias, Smithee, & Hasbún, 2018).

| Diversifying engineering-related arguments
This systematic review indicated that, across studies, learners had opportunities to justify claims for solutions about the designed world, most commonly regarding whether an existing technology should be adopted, and to consider different factors when supporting these claims. However, to be more coherent with NGSS and ABET recommendations for learners to have opportunities to engage in design themselves, this review suggests that learners can be provided with more opportunities to make claims related to their own designs (appearing in 23.93% of studies), versus making claims regarding others' designs. When educators position learners as designers (versus consumers or evaluators) of technologies and processes, this positioning can open opportunities for learners to make other types of claims as well.
For example, learners could create their own tests to study aspects of their designs, thereby resulting in opportunities to make more testing claims, and they could analyze why their designs did not perform as expected, thereby resulting in opportunities to make more failure analysis claims.
Positioning learners as designers throughout a product's life cycle (development, testing, troubleshooting, and disposal) could introduce them to a wider range of epistemic practices that are core to the work of many engineers because they would have opportunities to make a wider range of justified claims embedded within engineering activities. While several claims (e.g., failure analysis and testing) occurred to a limited extent and often by happenstance in the studies analyzed, we suggest that positioning learners as designers would provide more intentionally planned opportunities for them to engage in broader, interrelated types of knowledge construction (e.g., knowledge regarding why something failed or how to design a test), as well as other types of knowledge construction that did not appear in this dataset. Just as the terms "scientific arguments and argumentation" can be operationalized in different ways in relation to different epistemic practices of the sciences (Osborne et al., 2018), "engineering arguments and argumentation" might also be operationalized as inclusive of a wide range of claims that can be fostered through learners' active participation as designers.

6.2 | Designing learning environments that support engineering-related argumentation
In addition to pointing toward different types of arguments that learners might make in relation to the engineering-designed world, this systematic review also points toward possible facets of learning environments that may support learners in producing engineering-related arguments. In the following, we address four facets of these environments: providing language and literacy supports; providing opportunities to collect and use empirical data; scaffolding the coordination of multiple, potentially competing criteria and constraints; and foregrounding and sustaining learners' epistemic practices.

| Providing language- and literacy-based supports
As indicated by this review, oral language practices and literacy practices were the two most common methods through which engineering-related claims were constructed across studies. Though coded separately in this systematic review, reading, writing, and discussion often work synergistically to help learners construct and communicate knowledge (Bereiter & Scardamalia, 1987; Nystrand, 1986). This review indicated that scaffolds for oral language practices, including open-ended question prompts for whole-class or small-group discussions and role-plays, can support learners' argumentation in relation to the designed world. This finding echoes research on oral discourse in education (e.g., Nystrand, Wu, Gamoran, Zeiser, & Long, 2003), which suggests that rich student talk is correlated with increased student learning in multiple disciplines.
In addition to oral language scaffolds, literacy-based supports may also be a promising method for supporting engineering-related argumentation. In several studies (38.46%) included in this systematic review, learners were provided with explicit writing-based supports, such as graphic organizers or exemplar texts, before they wrote their own arguments. However, we did not find equally substantial evidence that researchers or practitioners also emphasized reading-based supports. A robust body of research (Lee, Quinn, & Valdes, 2013; Pearson, Moje, & Greenleaf, 2010) has indicated that emergent bilingual students, as well as K-16 students for whom the language of instruction matches their home language, benefit from reading comprehension instruction when they read difficult technical or scientific information, which is common to engineering. Although literacy practices were common argumentation practices in this dataset, most studies in which learners read or searched for information did not indicate that learning environments included explicit supports for comprehending difficult vocabulary or information, locating and evaluating information online, or synthesizing and applying information obtained across multiple sources. To address this absence, future research could identify promising instructional approaches that scaffold learners in locating, interpreting, evaluating, and synthesizing information as they seek to make informed claims in engineering.

| Providing opportunities to collect and use empirical data
Physical practices, including testing or observing, were not reported in about two-thirds of the studies. Moreover, when studies did describe empirical tests, the test results were not always closely related to the subsequent claims.
We recognize there are pragmatic reasons why testing in K-16 settings might not fully approximate testing in engineering workplaces: It is a safety concern to test potentially harmful substances, such as make-up, on humans, and physical prototypes may be too expensive or dangerous to build and/or implement in non-workforce settings. At the same time, learners can have opportunities to reflect on the quality and adequacy of tests and observations in relation to claims, and to iteratively improve testing conditions. Based on these findings, we suggest that educators can advance epistemic practices of engineering by providing learners with opportunities to generate and use data while they robustly consider the relationships between tests or observations and designs. This approach would cohere with a range of philosophical and empirical literature (Dym et al., 2013; Vinck, 2003) as well as K-12 and post-secondary educational standards (ABET, 2018; NGSS Lead States, 2013), which emphasize the imperative for adequate tests of prototypes or models before a device, process, or system is made available to the public.

| Coordinating and weighing multiple supports for claims
This systematic review found that many studies described research participants using fewer than three types of justifications for a single claim related to the designed world. Even in studies in which participants used more than four types of justifications, we did not find evidence that they weighed potentially competing considerations using a systematic process. Finally, only a handful of studies explicitly mentioned the term trade-off, another indication that learners did not have instructional supports that enabled them to weigh the specific benefits of a design element against its drawbacks. Therefore, this review points toward the need for research on instructional practices that scaffold learners toward the production of oral and written arguments that systematically weigh trade-offs in the context of multiple specified criteria and constraints.
These instructional practices would more closely align with complex practices found in engineering workplaces. They would also more closely align with the NGSS, which require students to compare design solutions against specified criteria and constraints in elementary school and to use systematic processes to consider how designs meet increasingly complex sets of criteria and constraints as they progress through middle and high school. In post-secondary schools, supports for claims can be expanded to include addressing the impacts of designs in intersecting global, economic, environmental, and societal contexts (ABET, 2018).

| Foregrounding learners' epistemic practices
This recommendation stems from the ethical imperative to sustain youths' cultural and linguistic practices as a core endeavor of education in a democratic society (Paris, 2012). Epistemic practices, or "how we know," constitute and are derived from epistemic cultures (Knorr-Cetina, 1999). These epistemic cultures and practices have often been defined in relation to scientific workplaces (Knorr-Cetina, 1999) and disciplines such as engineering (Kelly & Cunningham, 2019). However, other scholars, such as Bang and Medin (2010) in their work with Indigenous youth, have asserted that youth also engage in epistemic practices (derived from sociohistorical traditions and interactions with elders, family, community members, and places) that provide generative platforms for learning and doing expansive forms of science and engineering. Nazar et al. (2019), for example, described how a 12-year-old African American male critically engaged "epistemologies of place" (p. 638) through deep and sustained co-construction of knowledge with community members as he engineered an app, which used the community's geospatial knowledge of the area, to prevent bullying.
This example illustrates that youth bring critically important knowledge and knowledge construction practices, which can be engaged with engineering technologies and processes that matter to them. This systematic review indicated that youth are given some opportunities to draw from their own sense of ethics, from their knowledge stemming from their relationships and experiences with people (coded as human users), from their everyday experiences that could be applied to other domains (coded as analogy), and from their knowledge of health and respect for nature and the environment (coded as safety and environment). Studies (Calabrese Barton & Tan, 2009; Chinn, 2011) have indicated that this basis for knowledge construction can be especially salient for many youth who are underrepresented in engineering. Thus, learning environments designed to support engineering-related argumentation can foreground youths' values, experiences, and connections with people and places, in addition to engaging epistemic practices of engineering as defined in standards and theoretical and empirical literature.

| Designing empirical studies of engineering-related argumentation
Though this systematic review did not attempt to evaluate the quality of the included studies, it nonetheless points toward areas for future research. As noted in our description of the scoping review, we found that researchers who published in science education have written many studies that are germane to engineering. However, most of these studies (93.16%) were not purposefully designed to explore facets of participants' engineering design thinking or activity. Additionally, most studies (97.44%) were conducted with K-16 students or practicing teachers. Because engineered technologies and systems affect all people and because school years represent a relatively small portion of a person's lifespan, more research might be conducted on how adults, including non-engineers, construct or evaluate arguments, such as claims on social media, related to technologies. This body of research could inform approaches to promoting informed personal action or citizenship in relation to the engineering-designed world. In all, this systematic review indicates the need for more research that explicitly traces developments or identifies outcomes in pre-K to adult learners' engineering thinking and practices as they construct or evaluate claims.
Moreover, findings from this systematic review indicate that research in engineering-related argumentation should be more explicit with regard to participants' race, ethnicity, class, and language, all of which are factors that may intersect with learners' desires to pursue engineering pathways (Foor, Walden, & Trytten, 2007; Ohland et al., 2011). We affirm Pawley's (2017) assertion that "all research, not only the research focused on diversity, [should] report the gender and race of participants so that we could begin to see how many studies make claims about people 'in general' when in fact the majority of their participants are white males" (p. 532). In contrast to Pawley's recommendation, 39.32% of the studies did not mention any demographic information about the participants, while 77.78% did not mention the participants' race or ethnicity. Furthermore, despite the fact that emergent bilinguals comprise almost 10% of public school students in the United States (McFarland et al., 2018), only 4.27% of the studies mentioned language in relation to research participants. In summary, future research can explore engineering-specific outcomes in learning environments that elicit learners' own epistemic practices while scaffolding participation in the epistemic practices of engineering, with diverse learners whose demographics are specified.