Large-scale interventions in science education for diverse student groups in varied educational settings


  • Okhee Lee,

    Corresponding author
    1. Department of Teaching and Learning, Steinhardt School of Culture, Education, and Human Development, New York University, 239 Greene Street, Room 635K, New York, New York, 10003
    • Guest Editor.

  • Joseph Krajcik

    1. College of Education and College of Natural Science, Michigan State University, 237 Erickson Hall, East Lansing, Michigan 48824
    • JRST Co-Editor.


Current classroom practices in the U.S. and internationally have largely been shaped by changing student demographics and accountability policies. This special issue includes manuscripts that develop conceptual frameworks or report on empirical studies addressing large-scale interventions of educational innovations for diverse student groups in varied educational settings. Understanding issues related to large-scale interventions will be particularly important for the U.S. as the science education system embraces new science standards (National Research Council, 2011, A framework for K-12 science education: Practices, crosscutting concepts, and core ideas. Washington, DC: National Research Council). In our introduction to the special issue, we discuss critical issues in scaling up educational innovations through which large-scale interventions evolve. First, we describe the process of scaling up an educational innovation. Then, we address challenges in scaling up an innovation. Next, we discuss implications that these challenges present to implementation of an innovation and evaluation of its efficacy and effectiveness. Finally, we briefly introduce the articles that appear in this special issue. © 2012 Wiley Periodicals, Inc. J Res Sci Teach 49: 271–280, 2012

Internationally, current classroom practices have largely been shaped by changing student demographics and accountability policies. In the U.S., the school-aged population is becoming increasingly diverse, while achievement gaps across content areas persist and in some areas are widening. At the same time, more demands are being placed on all students as a result of high-stakes testing and accountability policies. Educational innovations take place against the backdrop of changing student demographics and evolving educational policies.

This special issue includes manuscripts that develop conceptual frameworks or report on empirical studies addressing large-scale interventions of educational innovations for diverse student groups in varied educational settings. These manuscripts discuss challenges and failures as well as accomplishments and promises of such efforts. Interventions are defined broadly including, but not limited to, curriculum development, teacher professional development, instructional strategies, assessment, learning technologies, school restructuring, whole-school reform, school leadership, or some combination of these. In a similar manner, student diversity in varied educational settings is defined broadly including, but not limited to, race, ethnicity, culture, language, social class, disability, and gender.

In recent years, the primary funding sources for large-scale interventions in the U.S. have included the Institute of Education Sciences (IES) at the U.S. Department of Education and the Discovery Research K-12 (DRK-12) Program at the National Science Foundation. In addition, more recent scale-up projects through the Investing in Innovation (i3) program will produce results about scale-up of educational innovations. Compared to English language arts and mathematics education, research and development in science education tend to lag. Thus, we are pleased with the contributions of this special issue that help to push our understanding of this important area forward and that address the current funding priorities in science education as well as English language arts and mathematics education.

In our introduction to the special issue, we discuss critical issues in scaling up educational innovations through which large-scale interventions evolve. First, we describe the process of scaling up an educational innovation. Then, we address challenges in scaling up an innovation. Next, we discuss implications that these challenges present to implementation of an innovation and evaluation of its efficacy and effectiveness. Finally, we briefly introduce the articles that appear in this special issue.

Scale-Up of Educational Innovations

A greater understanding of how and under what conditions scaling up of educational interventions can be successful is urgently needed for system-wide improvements (Schneider & McDonald, 2007a, 2007b). Yet, scale-up research is an emerging field in education that requires further conceptualization (Coburn, 2003; Raudenbush, 2007) and methodological rigor (McDonald, Keesler, Kauffman, & Schneider, 2006).

Conceptualization of the stages of going to scale for educational interventions is currently under development (Raudenbush, 2007). In the model that is used by the IES, the first stage involves designing an educational innovation. Once an innovation is designed, the next stage in the scale-up process is testing its efficacy, which means how well the innovation works under extremely favorable conditions. Efficacy studies involve the designers of the innovation in the implementation, for example, facilitation of professional development activities. The purpose of an efficacy study is to test causality—whether the innovation can have a significant impact on the desired outcomes (i.e., to test the theory underlying the innovation)—when implemented with fidelity. If an efficacy study demonstrates a positive impact of an intervention, the next stage involves testing its effectiveness in routine practices, which means how well the innovation works under typical resource constraints and across varied educational settings. Effectiveness studies usually do not involve the original designers of the innovation in the implementation. The purpose of an effectiveness study is to test generalizability—whether the innovation has the ability to go to scale (i.e., to test broad-based implementation of the innovation). If an effectiveness study demonstrates a positive impact, then evidence exists that the innovation is robust enough to scale broadly. In evaluating the scale-up of an innovation, a randomized control design is employed to test the impact while holding other variables constant. This multi-step approach is important to ensure that the large amount of resources needed for scaling up an innovation is used only when there is good evidence that the innovation will be successful on a large scale.

In a similar manner, the NSF's DRK-12 program (2010) applies a “cycle of research and development” for scale-up as follows: (1) synthesize and theorize; (2) explore, hypothesize, and clarify; (3) design, develop, and test; (4) implement, study efficacy, and improve; and (5) scale-up and study effectiveness. For example, this cycle of research and development is seen in the intervention designed to improve middle school students' understanding of science through curriculum materials that engage students in finding solutions to real-world questions through investigations, collaboration, and use of cognitive tools (Fishman, Marx, Blumenfeld, Krajcik, & Soloway, 2004; Geier et al., 2008; Marx et al., 2004) and in the intervention designed to promote science learning and language development of English language learners in urban elementary schools (Lee & Maerten-Rivera, in press; Lee, Maerten-Rivera, Penfield, LeRoy, & Secada, 2008). In both lines of work, researchers first worked closely with a small number of teachers and classrooms to develop an intervention. Then, they used their findings to modify and scale up the intervention to other settings. Such careful research and development can result in scaling up an intervention that can promote teacher change and student learning.

Scaling-up efforts occur within the confines of national and state-level policies, local institutional conditions, limited resources, individual teacher practices, expectations of local stakeholders, and other factors. As a result, scaling up of interventions requires compromises between conceptual rigor and fidelity of implementation as intended by the developers, on one hand, and constraints of real-world implementation in which funds for resources are limited and high-stakes testing dominates the focus of classroom instruction, on the other hand. Furthermore, evolving student demographics and growing cultural and linguistic diversity in classrooms lead to additional conflicts and inconsistencies in defining what constitutes effective and equitable educational policy and practice.

Challenges in Scale-Up of Educational Innovations

Successful interventions require more than just high quality curriculum, effective teacher professional development, or any other form of innovation. Developers and researchers also have to contend with multiple factors that lie outside of the intervention, but influence the implementation and impact of the intervention. Next, we discuss some of these factors, particularly those that impact culturally and linguistically diverse students in urban settings.

First, high rates of teacher and student mobility in urban schools threaten to compromise the fidelity of implementation of the intervention since the impact of any intervention is likely to depend, first and foremost, on exposure. Exposure to an intervention is one of the criteria to measure fidelity of implementation (see O'Donnell, 2008, for literature on curricular interventions; see Lee, Penfield, & Maerten-Rivera, 2009, on science intervention). It takes time for teachers to learn how to optimally implement new instructional strategies and for students to learn how to best learn from new strategies. Teachers or students who lack adequate opportunity to engage with the intervention are unlikely to learn from it. The challenge of high rates of mobility is more serious when an intervention involves a particular curriculum for a particular grade level(s), for example, science curriculum for grades 3 through 5. The challenge is even more serious with a multi-year intervention, for example, 3-year participation of teachers or 3-year participation of students from grades 3 through 5.

Second, for teachers working in challenging urban schools to feel comfortable implementing reform-oriented practices, they must first increase their own comfort level with science content, inquiry investigations, and/or new teaching strategies. Teacher knowledge and skills can be developed through a combination of effective professional development (Desimone, 2009; Garet, Porter, Desimone, Birman, & Yoon, 2001) and educative materials embedded in the curriculum (Davis & Krajcik, 2005; Remillard, 2005). As an added layer of complexity, in culturally and linguistically diverse school contexts, strategies for linguistic and cultural scaffolding must also be integrated into the process of teacher learning (Lee & Maerten-Rivera, in press; Moje, Collazo, Carrillo, & Marx, 2001). As the scale of the intervention increases, choices need to be made in terms of how to maximize limited resources. Choices that limit professional development or educative materials are likely to have an adverse effect on teachers' willingness and/or ability to implement reform-oriented practices with fidelity.

Third, as interventions are scaled up, they need to become school-wide initiatives; otherwise, they are not likely to scale up within the system. On the one hand, collective participation of teachers from the same school, department, or grade is a critical feature of effective professional development as teachers develop common goals, share instructional materials, and exchange ideas and experiences arising from a common context (Desimone, 2009; Garet et al., 2001). On the other hand, school-wide implementation results in the inclusion of some teachers who are open to adopting more reform-oriented practices and other teachers who are less open to change. Gamoran et al. (2003) warn that teachers who are less open to change may resist, or even work directly against, programmatic changes that are supported by other teachers in their school, thereby revealing organizational divides within the school. Unless interventions have collective teacher buy-in, they are unlikely to be scaled up within the system (Blumenfeld, Fishman, Krajcik, Marx, & Soloway, 2000).

Finally, as an intervention is scaled up, it should address the demands of high-stakes testing and accountability in its specific policy context (Marx & Harris, 2006; Settlage & Meadows, 2002; Southerland, Smith, Sowell, & Kittleson, 2007). This is because schools and districts place an increasing amount of trust in the intervention as it goes to scale. Schools take a risk in trusting that the intervention will not hinder student learning, achievement, or performance on accountability measures. This caution on the part of school and district administrators is felt more strongly in urban schools, which are more likely to be under scrutiny based on past test performance (Geier et al., 2008). Such closely monitored schools tend to have concentrations of students who have traditionally been underserved in science, and they are where sanctions against poor academic performance are most likely to be severe (Amrein & Berliner, 2002).

Together, these and other factors could present substantive challenges to an intervention and could undermine the fidelity of implementation in ways that may result in underestimates of the impact of the intervention on teacher change and student achievement. Yet, embracing these challenges is at the heart of scale-up research, in that these “real world” challenges allow the results to be more generalizable by including both volunteer and non-volunteer teachers and by considering high rates of teacher and student mobility, urban school contexts, and accountability policies, among other factors. This special issue aims to provide greater insights into factors that both promote and inhibit scaling up of innovations with diverse student groups across varied educational settings.

Implications for Scale-Up Implementation and Evaluation: The Case of Teacher and Student Mobility

The current funding priorities for educational research and development pay particular attention to challenges of scaling up educational innovations, supporting teaching and learning for growing student diversity, and addressing the demands of high-stakes testing and accountability policies. Researchers and funding agencies are seeking models of implementation and evaluation that simultaneously address these overlapping challenges.

Of the multiple factors influencing educational innovations, we choose teacher and student mobility to illustrate how a single factor influences scale-up implementation and evaluation of an innovation. We stress that other factors, including those described above, also influence scale-up implementation and evaluation of an innovation in distinct ways.

High mobility rates tend to depress the impact of an intervention on teacher and student outcomes, leading to underestimation of the “true” effect that the intervention could have if it were fully implemented with a low rate of participant mobility. Reform-based curriculum or teaching practices stress depth of knowledge, scientific reasoning, and the practice of science (Krajcik, McNeill, & Reiser, 2008). Thus, it takes time for teachers to learn how to implement reform-based curriculum or teaching practices, even when they have been exposed to effective professional development. Teachers need to learn new teaching practices and pacing of the new curriculum. Moreover, because most reform-based curriculum or teaching practices focus on depth of understanding, teachers need to develop not only deeper subject matter knowledge themselves but also pedagogical content knowledge for their students. Similarly, students using reform-based curriculum should learn how to learn using the new materials.

In practice, high mobility rates are an inescapable reality in most urban settings, which complicates the implementation, reduces exposure to the intervention, compromises fidelity of implementation, disrupts teachers' professional growth, and disrupts the progression of students' learning. Thus, interventions in urban settings need to be designed with participant mobility and its ramifications in mind. For example, an intervention could be designed to explicitly examine how varied exposure levels are related to fidelity of implementation, teacher change, and student achievement.

Consideration of teacher and student mobility in scale-up could provide important conceptual contributions to the literature. The literature on teacher mobility to date has been limited to teachers who leave the teaching profession or move from one state/district/school to another, and has addressed the impact of mobility on student achievement in broad and general ways. In contrast, participant mobility in the specific context of educational interventions allows examination of how mobility affects evaluations of intervention efficacy and effectiveness. This work introduces additional facets to how the education field should address issues of mobility. First, we should think more broadly about how we conceptualize mobility; for example, teachers who change grade levels within the same school or teaching assignments within the same grade are examples of teacher mobility in specific intervention contexts. Second, we need to consider how teacher mobility not only limits teacher change and student achievement, but also limits researchers' ability to identify interventions that are most likely to improve teacher change and student achievement. In short, how we conceptualize mobility has important implications for how we should design and analyze scale-up research and evaluation in high-mobility settings.

Consideration of mobility issues can point to more accurate evaluation of the effects of educational interventions on teacher change and student achievement. First, high mobility rates are likely to compromise the fidelity of implementation of an intervention. Exposure to an intervention is one of several criteria for measuring fidelity of implementation (see the review of literature in O'Donnell, 2008; Lee et al., 2009). Researchers contending with high mobility rates should determine how to address the complication that the reported impact of any intervention may depend on exposure. Future research will benefit from considerations of how to operationalize exposure to an intervention, how to measure exposure, and how to account for exposure in data analysis and interpretation of results.

Second, mobility issues raise an important question about evaluating the efficacy of an intervention, or how the intervention works under extremely favorable conditions (Raudenbush, 2007, p. 26). High mobility rates in urban schools call into question the very notion of efficacy under less than optimal conditions. What does it mean to conduct an efficacy study when “very favorable conditions” are not feasible? Perhaps a more valid approach to the evaluation of efficacy would be to focus only on participants who receive full exposure to the intervention.

Finally, mobility issues relate to evaluating the effectiveness of an intervention, or how well the intervention works as part of routine practice and under typical resource constraints (Raudenbush, 2007, p. 26). In urban schools with high mobility rates, a more valid approach to an effectiveness study would be to test the impact of full exposure to the intervention and then test the impact of varying degrees of partial exposure separately. Another approach, when there is substantial variability in the exposure to the intervention, would be to address the relationship between exposure levels and teacher change or between exposure levels and student achievement.
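The two analytic approaches just described can be illustrated with a minimal sketch. It assumes that exposure is operationalized as years of participation in a 3-year intervention and that the outcome is an achievement gain score; both choices, the names `records`, `mean_gain_by_exposure`, and `pearson_r`, and all of the values are hypothetical and invented purely for illustration. They are not data or methods from any study cited here.

```python
from math import sqrt
from statistics import mean

# Hypothetical records: (years of exposure, achievement gain).
# Invented values for illustration only; not data from any study.
records = [(3, 12.0), (3, 10.0), (2, 7.0), (2, 9.0),
           (1, 4.0), (1, 6.0), (0, 3.0), (0, 1.0)]


def mean_gain_by_exposure(rows):
    """Group gains by exposure level and return {exposure: mean gain}.

    This mirrors examining full exposure and each degree of
    partial exposure separately.
    """
    groups = {}
    for exposure, gain in rows:
        groups.setdefault(exposure, []).append(gain)
    return {exposure: mean(gains) for exposure, gains in sorted(groups.items())}


def pearson_r(rows):
    """Pearson correlation between exposure level and gain.

    This mirrors relating exposure levels to an outcome when
    exposure varies substantially across participants.
    """
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    print(mean_gain_by_exposure(records))
    print(round(pearson_r(records), 3))
```

A descriptive summary like this is only a starting point. An actual evaluation would need a comparison group and would have to account for the nesting of students within teachers and schools, as well as for the possibility that participants who stay differ systematically from those who leave.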

Gaining a better understanding of participant mobility, as both a conceptual and a practical issue, is critical in evaluating an intervention's impact. Educational policy makers, instructional material developers, educational researchers, and funding agencies should all consider teacher and student mobility issues in designing, implementing, and evaluating educational innovations in urban settings in order to effectively engage those students who most need the support of such interventions. In addition, educational researchers should pay particular attention to devising research approaches that have greater validity for testing efficacy and effectiveness within the context of teacher and student mobility issues.

Although we focus on mobility issues in this section, other factors also influence the scale-up implementation and evaluation of an innovation. Before curriculum materials that are highly developed and built on what we know about student learning are scaled up, they should be tested in classrooms similar to those in which they are intended to be used. Teacher professional development is another critical factor because without teachers' understanding, willingness, or ability to implement an intervention with fidelity, the intervention is likely to fail. Alignment of assessment practices to the intervention presents still another critical factor. In particular, when high-stakes tests and classroom assessment practices are not aligned, the intervention typically fails. Finally, administrative support to provide resources and infrastructure that are aligned with the intervention is a critical factor. Moreover, administrators need to provide the vision and leadership to ensure that innovations are sustained over time. An intervention that does not fit in with a new or changing policy initiative at the district level is likely to be abandoned while still being implemented, even if teacher and/or student outcomes are positive. Thus, scaling up an intervention involves multi-faceted problems, as any one factor, or several in combination, can compromise fidelity of implementation and lead to underestimation of the intervention's efficacy and effectiveness. The articles in this special issue shed light on some of these important factors.

Articles in the Special Issue

Because of the importance of understanding critical factors involved in large-scale interventions for diverse student groups in varied educational settings, JRST in November 2010 invited authors to submit manuscripts that discussed challenges and failures as well as accomplishments and promises of various large-scale interventions. Our goal was to learn more about large-scale interventions with respect to reform-based curriculum, teacher professional development, instructional strategies, assessment, learning technologies, school restructuring, or school leadership. The articles in this special issue address many of these important issues involved in large-scale interventions. Below, we briefly describe the articles published in this issue and highlight their major contributions.

Large-Scale Science Education Intervention Research We Can Use (William R. Penuel and Barry J. Fishman)

While intervention research traditionally involves testing the efficacy of innovations using scientifically based research (i.e., randomized controlled trials), Penuel and Fishman advance a different kind of research, design-based implementation research (DBIR), which is aimed at investigating and improving the effective implementation of interventions. DBIR is an emerging form of design research that supports the productive adaptation of interventions as they go to scale. In contrast to traditional intervention research that focuses on one level of educational systems, DBIR designs and tests interventions that cross multiple levels and settings. Penuel and Fishman argue that DBIR complements large-scale efficacy research, in that it seeks to support the development of usable and efficacious interventions and to support implementation of interventions found in efficacy studies for improvement of teaching and learning. The paper concludes by outlining four areas of DBIR that could help new science standards achieve their intended purpose of establishing an effective and equitable system of opportunities for science learning of diverse student groups in the U.S.

A Retrospective View of a Study of Middle School Science Curriculum Materials: Implementation, Scale-Up, and Sustainability in Changing Policy Environments (Sharon Jo Lynch, Curtis Pyke, and Bonnie Hansen Grafton)

Large-scale interventions occur at the intersections of ongoing evaluation of outcomes and evolving district and state policies, and the synergy between the two could break even when the outcomes are positive. Lynch and colleagues describe an extended, comprehensive example of how teachers, schools, districts, and external factors (e.g., parental pressure and policy mandates) shape curriculum research. Using three middle school science curriculum units that were highly rated by the Project 2061 Curriculum Analysis, the study examined features of the interventions (curriculum materials), the environment (school district, schools, teachers, and science education leaders), and complex interactions that affected the process from implementation, to scale-up, and to sustainability of each of the units. Using evidence-based decisions, two of the units were found to be effective and equitable and went to scale, but one was not effective. However, the course of scale-up was also affected by a changing policy climate and proceeded in unpredictable ways. Four years after funding ended, none of the units were sustained within the school district. The interactions between the demands of the units and the school district's policy environment suggest reasons for these results. The authors' insights about their successes and challenges would resonate with the experiences of many of us as our intervention research has been affected, either positively or negatively, by shifting policy mandates and priorities at the state, district, school, and teacher levels.

Differential Effects of Three Professional Development Models on Teacher Knowledge and Student Achievement in Elementary Science (Joan I. Heller, Kirsten R. Daehler, Nicole Wong, Mayumi Shinohara, and Luke W. Miratrix)

Regardless of whether innovations involve curriculum, instruction, or assessment at the classroom, school, or system level, teacher professional development is an essential component to bring about sustained change. Although a number of manuscripts have been published that describe professional development, few have carefully examined the outcomes using experimental methods in which specific treatments are tested. Thus, the field of science education needs more rigorous intervention research to determine the type of professional development that can lead to changes in teacher practices and student learning. Heller and her colleagues conducted such a study in which they examined three different types of professional development interventions (teaching cases, looking at student work, and metacognitive analysis) along with a control group. They used a randomized experiment implemented in six states with over 270 elementary teachers and 7,000 students. Their work shows the importance of professional development that blends both content learning and analysis of student work rather than advanced content alone. A unique aspect is the use of both teacher and student outcomes to explore differential effects of professional development models. Moreover, their work points to the importance of carefully designed, well-controlled large-scale experimental studies to make decisions on how to scale up professional development interventions. Their study can serve as a model for other researchers in the field.

Science Assessments for All: Integrating Science Simulations into Balanced State Science Assessment Systems (Edys S. Quellmalz, Michael J. Timms, and Matt D. Silberglitt)

In this era of high-stakes testing, we often hear criticism that high-stakes tests do not measure what we most treasure in student learning: the ability to reason, use evidence, solve problems, and justify. One primary cause of this lack of alignment between the goals of science education and high-stakes tests is the limited focus of the assessments. Quellmalz and colleagues explore the technical quality, feasibility, and utility of simulation-based science assessments in order to develop assessments that align more closely with the goals of science education. Using an evidence-centered design approach, they make use of the visual, dynamic, and interactive features of simulations to design formative assessments that provide insights into middle school students' learning of ecosystems and force and motion. They tested the assessments across several states and thousands of students. They found that the simulation-based assessments, which contained more visual representations and interactive features but fewer textual features, allowed a greater range of students, including English language learners and students with disabilities, to better demonstrate their science understanding in comparison to more conventional approaches to assessment. Their work provides a model of how assessments can be developed as technology becomes prevalent in classrooms. Their work is particularly critical as the science education community embraces a new science education framework (National Research Council, 2011) and the release of new science standards that will articulate students' performance expectations blending big ideas, scientific and engineering practices, and disciplinary core ideas.

Investigating the Effectiveness of Computer Simulations for Chemistry Learning (Jan L. Plass, Catherine Milne, Bruce D. Homer, Ruth Schwartz, Elizabeth O. Hayward, Trace Jordan, Jay Verkuilen, Florrie Ng, Yan Wang, and Juan Barrientos)

Advances in computer technology and computer-based educational environments have become prominent in virtually all aspects of science education. In this study, Plass and colleagues examined the effectiveness of computer simulations to support student understanding of complex concepts in high school chemistry classrooms. In two studies conducted in rural Texas and urban New York contexts, chemistry teachers implemented two versions of a curricular unit: the experimental version incorporating simulations and the control version using text-based materials covering the same content (which was not “business as usual” curriculum). Plass and colleagues tested the effectiveness of the simulations on student learning outcomes and how fidelity of implementation affected the outcomes in the rural and urban contexts. While the results supported the effectiveness of the simulations, the researchers had to contend with variations of fidelity in the two contexts in terms of school district policies, administrative support, and teachers' willingness to implement the intervention as intended. In particular, two key features had significant bearing on the implementation and impact of the intervention: technology infrastructure and student mobility. Like the Lynch et al. study described above, this study is another sobering reminder of how intervention research is affected by a multitude of factors at the state, district, school, and teacher levels.

Closing Comments

Although many individuals are concerned with improving the teaching and learning of science, we will never bring about sustained change without a careful examination of what it takes to bring innovations to scale. Ron Marx closes this special issue by offering a broad perspective of the education system within which large-scale interventions take place. He points out three challenges to science education research and practice: (1) the changed meaning of science education reform over the past two decades, (2) the importance of teacher professional development and the role that well-designed and internally valid research plays in developing knowledge in this area, and (3) the chaotic and contradictory nature of educational policy in the U.S. While recognizing a maturing research tradition that has enormous promise in science education, he argues that our field needs to align our research and development efforts with the ways in which real and impactful decisions are made within the rapidly changing policy context.

His insights are especially important because we live in an era when all individuals need to have an understanding of science for personal, societal, and global issues. Moreover, the children of today will be integral in helping to solve the problems that face our planet. Scale-up of educational innovations is intended to bring about sustained change in educational practices and policies. To this end, current funding priorities focus on large-scale innovations across a wide range of areas in educational endeavors. The manuscripts in this special issue address what is involved in implementing and evaluating large-scale interventions with diverse student groups. Understanding issues related to large-scale interventions will be important for the U.S. as the science education system embraces new science standards (National Research Council, 2011). We hope that the insights provided by the manuscripts in this special issue spur further conversation on these critical issues and that the manuscripts contribute to improving the teaching and learning of science for all students.