• accountability
• language of science and classrooms
• science education
• sociocultural issues
• validity/reliability


Education policy in the U.S. in the last two decades has emphasized large-scale assessment of students, with growing consequences for schools, teachers, and students. Given the high stakes of such tests, it is important to understand the relationships between students' answers to test items and their knowledge and skills in the tested content area. Due to persistent test score gaps, students from historically non-dominant communities, and their teachers and schools, are differentially affected by the consequences of large-scale testing. As a result, it is particularly important to understand how students from historically non-dominant communities interact with test items on large-scale tests. We report on a study in which we interviewed 36 students about their responses to six multiple-choice science test items from the Massachusetts state science assessment for fifth grade. The 36 students included 12 students from low-income households, 12 English Language Learners, and 12 middle-class native English speakers. We found that for five of the six selected test items, students' descriptions of the science content knowledge they used to answer the test items frequently did not match the content knowledge targeted by the items. In addition, students from low-income households and English Language Learners were more likely than middle-class native English speakers to answer incorrectly despite demonstrating knowledge of the targeted science content for the items. We argue that such evidence challenges the expectation that students' answers to individual test items reflect their knowledge of the targeted science content, and that evidence of this kind should be included in investigations of the validity of large-scale tests.

© 2012 Wiley Periodicals, Inc. J Res Sci Teach 49: 778–803, 2012