The TIMSS Science Framework
IEA's first science study (FISS) was conducted in 1970/1971, followed by the second (SISS) in 1984/1985 and the third in 1995. The third combined mathematics and science, becoming TIMSS, the Third International Mathematics and Science Study, and was repeated (TIMSS-R) in 1999. From 2003, TIMSS became the Trends in International Mathematics and Science Study, with surveys arranged every 4 years (2003, 2007, and 2011).
The FISS framework introduced a two-dimensional matrix, or table of specifications, based on Tyler (1950) in combination with Bloom et al.'s (1956) taxonomy. The principle was to categorize learning objectives into content, meaning the specific subject matter to be conveyed, and behavior, explained as what the student should do with the material. Presenting these in a matrix produced cells combining every behavior category with every content category, generating a “blueprint” to ensure content validity, that is, that all aspects (cells) were included in the assessment. The matrix also underlined the inseparability of content and behavior, demonstrating the impossibility of understanding content knowledge without using cognitive behaviors, and vice versa. Bloom's taxonomy added another principle by providing a hierarchical structure for cognitive behavior. This helped separate simple from advanced reasoning and could therefore be used as an expression of cognitive demand. The framework model set a standard that was used by IEA across all subjects and, as will be shown, has dominated many assessment projects since. The FISS framework included four content categories (earth sciences, biology, chemistry, and physics) and four hierarchical behavior categories (functional information, comprehension, application, and higher processes) (Comber & Keeves, 1973).
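To make the blueprint idea concrete, the following minimal sketch (a hypothetical illustration in Python, not taken from any IEA document) crosses the four FISS content categories with the four behavior categories to generate the 16 cells of a table of specifications; in practice, each cell would hold the number of assessment items planned for that combination.

```python
from itertools import product

# FISS categories as reported by Comber and Keeves (1973).
CONTENT = ["earth sciences", "biology", "chemistry", "physics"]
BEHAVIOR = ["functional information", "comprehension",
            "application", "higher processes"]

# The "blueprint": one cell per content-by-behavior combination.
# Item counts are placeholders (zero), not actual FISS specifications.
blueprint = {(c, b): 0 for c, b in product(CONTENT, BEHAVIOR)}

# Content validity check: every cell should be covered by at least one item.
uncovered = [cell for cell, n_items in blueprint.items() if n_items == 0]
print(f"{len(blueprint)} cells, {len(uncovered)} still uncovered")
```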
Bloom, Hastings, and Madaus (1971) published the Handbook on Formative and Summative Evaluation of Student Learning in the same year as FISS was carried out. This text was also based on the Tyler–Bloom behavior-by-content model, and invited authors wrote chapters presenting assessment frameworks for school subjects. The authors, curriculum specialists with training in either a content area or educational psychology, approached their tasks differently (Haertel & Calfee, 1983). Psychologists tended to describe objectives in terms of mental structures, whereas content specialists looked toward the curriculum structure. Leopold Klopfer, the science chapter author, belonged to the latter group. His chapter started with a statement about inclusion of the “full range of student behaviors which may be sought as outcomes of science instruction in elementary and secondary schools” (Klopfer, 1971: 566), including
- cognitive categories from Bloom's taxonomy,
- processes of scientific inquiry,
- skills of performing laboratory work,
- students’ attitudes toward science, and
- students’ orientations to the relationships of science to other aspects of culture and to the individual.
As a move to implement all these, Klopfer created a two-dimensional framework, just as in FISS, but included all the bullet points as categories in the behavior dimension. This caused three essential problems. First, a “dimensionality problem” arose because very different elements (Bloom's cognitive categories, laboratory skills, and attitudes) were placed in the same dimension; second, the hierarchical structure introduced with Bloom's taxonomy was disturbed, making it more difficult to express cognitive demand; and third, categories that did not fit neatly into the separation between content and behavior were brought in. Examples of the last problem are scientific inquiry and orientation (i.e., knowing about science). Klopfer's writing reveals a struggle to decide whether these categories are “behavior” or “content”; the final outcome was to place each of them in both dimensions. Despite these problems, Klopfer's framework was adopted for IEA's second science study, SISS (Rosier, 1987), with relabeling as the only significant change: the behavior dimension became the objective dimension. This reflected Klopfer's reinterpretation of the original Tyler–Bloom matrix, in which the new behavior dimension read as a list of objectives rather than as Bloom's classification of cognitive behavior. In contrast, however, categories in the behavior dimension were renamed along the lines of Bloom's terminology and made to look like cognitive behaviors. Process of scientific inquiry, for example, was renamed processes, and the application of scientific knowledge and method was renamed application.
The next two revisions of the IEA science framework appear as attempts to solve Klopfer's three problems. The third committee, for TIMSS 1995 (Robitaille et al., 1993), focused on the “dimensionality problem,” which it resolved by splitting the behavior dimension into two more coherent dimensions: performance expectations, which combined Bloom's cognitive domain and scientific inquiry processes, and perspectives, including attitudes and orientation. The solution was not ideal, producing a complicated three-dimensional matrix (performance expectations, perspectives, and content). The problem of a hierarchical performance dimension was also discussed, but dismissed at the time because the argument that science processes cannot be ordered in this way held sway. Klopfer's third problem, that scientific inquiry is both “behavior” and “content,” was left untouched.
The committee responsible for the fourth revision, the TIMSS 2003 framework (Mullis et al., 2003), attacked, and solved, all three of Klopfer's problems, but at a certain cost. The solution, first, involved moving scientific inquiry out of the matrix to become a separate “overarching” dimension, “[overlapping] all of the fields of science and [having] both content- and skills-based components” (Mullis et al., 2001, p. 69). This alternative (see below) was adapted from NAEP's 1996–2005 framework (NAGB, 2004). Second, the perspectives dimension, with attitudes and interests, was excluded from the framework entirely. Together, these two moves reestablished a two-dimensional matrix in which each dimension is more nearly unidimensional (Klopfer's first problem), reinstated a hierarchical behavior dimension (Klopfer's second problem), and removed topics belonging to both dimensions (Klopfer's third problem). The behavior dimension, labeled cognitive domain, included three categories simplifying Bloom's taxonomy:
- factual knowledge,
- conceptual understanding, and
- reasoning and analysis.
For the TIMSS 2007 study (Mullis et al., 2005), categories in the behavior dimension were relabeled to match the revised version of Bloom's taxonomy (Anderson et al., 2001) using the categories
- knowing,
- applying, and
- reasoning.
In summary, the IEA science framework has moved away from, and back to, a two-dimensional matrix based on the Tyler–Bloom model that defines behavior as cognitive demand. Klopfer “disturbed” this model by attempting to include the “full range of student behaviors.” This, however, was unsuccessful, and only by classifying scientific inquiry separately and excluding attitudes and nature of science has the framework become conceptually coherent and functional.
The NAEP Science Framework
The U.S.-based NAEP science studies began in 1969/1970 and have been repeated 10 times over the past 40 years. The first few surveys were carried out at irregular intervals, with individual states participating voluntarily. Since the 2001 reauthorization of the Elementary and Secondary Education Act, often referred to as No Child Left Behind, states have been required to participate at Grades 4 and 8, every 4 years in science and biennially in reading and mathematics. The National Assessment Governing Board (NAGB) holds overall responsibility, whereas the assessment is carried out by the National Center for Education Statistics (NCES). NAEP results are known colloquially as the Nation's Report Card.
NAEP started with an open listing of objectives, styled like curriculum guidelines. A systematic categorization developed after a few surveys, with “dimensions and categories” similar to those in IEA's frameworks (NAEP, 1979), although without combining the dimensions in a matrix. Thus, the 1976–1977 survey listed three dimensions separately:
- content (the body of science knowledge),
- process (the process by which the body of knowledge comes about), and
- science and society (the implications of the body of knowledge for mankind).
In 1981–1982, a fourth dimension, attitudes, was added (Hueftle, Rakow, & Welch, 1983). By keeping the principle open and using four separate dimensions, the framework avoided the conceptual problems described above for Klopfer's and IEA's two- and three-dimensional matrices. However, the conflict between general cognition and scientific inquiry that tainted the IEA study lay beneath the surface and became apparent in the 1990 NAEP science framework, when process was renamed thinking skills and given the following three categories:
- knowing science,
- solving problems, and
- conducting inquiries.
At this stage, NAEP also adopted a matrix structure like that of IEA, and a series of revisions was made, leading to a new framework (Figure 1) in 1996 that was kept unchanged for nearly 10 years (NAGB, 2004). This framework had a “content” dimension named fields of science and a “behavior” dimension named knowing and doing. The attitudes dimension from the previous framework was removed, and science and society became an “overarching” dimension called nature of science outside the two-dimensional matrix. Another overarching dimension called themes was also added.
The framework had commonalities with IEA developments occurring at the time, but with some key differences. First, as with Klopfer (1971) and SISS (Rosier, 1987), NAEP extended the behavior dimension by allowing it to include both (Bloomian) cognitive behavior and scientific inquiry. This caused a similar reinterpretation of behavior, from what students should do “to knowledge” toward a general statement of what they should do “in science” (i.e., making it an objectives dimension rather than a classification of cognitive demand). Whereas IEA returned to a Tyler–Bloom interpretation, however, NAEP kept the objectives interpretation in the 1996–2005 framework. This, it seems, was influenced largely by the representation of U.S. curriculum development projects in NAEP (Mullis, 1992). The framework committee “reviewed key blue-ribbon reports, examined exemplary practices, studied local and state-based innovations in science curricula, reviewed science education literature, and noted innovations emerging in other countries” (NAGB, 2004: 9). Among the projects reviewed, for example, Mullis (1992) lists Project 2061 (American Association for the Advancement of Science, 1989), by the American Association for the Advancement of Science, and Scope and Sequence (Aldridge, 1989), by the National Science Teachers Association. Both projects demanded a widening of the science curriculum beyond the traditional teaching of scientific concepts and theories. In other words, there was great pressure on NAEP to include “the full range of student behaviors” and not, like TIMSS, to place scientific inquiry and nature of science (which do not fit into the Tyler–Bloom matrix) in the background. Second, NAEP expressed awareness of Millar and Driver (1987) and others who claimed that science behavior is knowledge dependent. Hence, statements such as “control of variables” became something students should understand rather than a skill they should perform. The term knowing and doing was thus used to express that behavior means knowing and doing science (as distinct curriculum aims), and that the behavior includes knowledge. These changes had the effect of making the two NAEP matrix dimensions more independent: combining them became an ideal rather than a psychological inextricability, as it had been in the Tyler–Bloom rationale.
By abandoning the Tyler–Bloom interpretation of behavior, NAEP was left with the same problem of describing levels of achievement as IEA had experienced (i.e., Klopfer's second problem). The solution came in terms of the “Angoff principle” and took place against a background of general debate about U.S. academic achievement, which was claimed to be at unacceptably low levels, masked by norm-referenced reporting (Koretz, 2008, p. 182). Angoff (1971) suggested using panels of judges to evaluate item difficulty, coupled with cut scores marking achievement levels on the assessment scales. Subsequently, three levels were introduced across all NAEP frameworks (NAGB, 2004): basic, denoting “partial mastery of prerequisite knowledge and skills that are fundamental for proficient work” (p. 36); proficient, representing “solid academic performance” and “competency over challenging subject matter, including subject matter knowledge, application of such knowledge to real-world situations, and analytical skills appropriate to the subject matter” (p. 36); and advanced, meaning that students could “integrate, interpolate, and extrapolate information embedded in data to draw well-formulated explanations and conclusions” and “use complex reasoning skills to apply scientific knowledge to make predictions” (p. 36). Placing this principle onto the two-dimensional matrix (content and behavior) created a “third dimension” for achievement level. Interestingly, this new dimension became similar to the hierarchical ordering of the TIMSS cognitive demand dimension, that is, it still resembled Bloom's taxonomy: the basic, proficient, and advanced levels match knowing, applying, and reasoning, with many similar cognitive processes at each level. A difference, however, was that NAEP included the complexity of knowledge, and not just reasoning, in its definition of cognitive demand.
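As a minimal sketch of how an Angoff-style procedure yields a cut score (the ratings below are invented for illustration; this is not NAEP's actual implementation): each judge estimates the probability that a borderline student at a given level would answer each item correctly, and the cut score is each judge's expected total score averaged across the panel.

```python
# Illustrative Angoff-style cut-score calculation (hypothetical ratings).
# Each judge estimates the probability that a borderline "proficient"
# student answers each of four items correctly.
ratings = {
    "judge_A": [0.90, 0.70, 0.40, 0.30],
    "judge_B": [0.80, 0.60, 0.50, 0.20],
    "judge_C": [0.85, 0.65, 0.45, 0.25],
}

def angoff_cut_score(ratings: dict[str, list[float]]) -> float:
    """Average, across judges, of each judge's expected total score."""
    return sum(sum(probs) for probs in ratings.values()) / len(ratings)

# A student scoring at or above the cut score is classified "proficient."
print(round(angoff_cut_score(ratings), 2))  # 2.2 (out of 4 items)
```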
One observation from the NAEP 1996–2005 framework document is the struggle to explain the changes being made. Labeling the behavior dimension knowing and doing, for example, illustrated a fundamental problem in accounting for the knowledge dependency of the behavior dimension. The combination of content (fields of science) and behavior (knowing and doing) into the third, achievement-level dimension was also explained hesitantly.
The next, and current, version (see Figure 2), applying from 2009, corrected some of this uncertainty, making the principles from the previous version explicit and theoretically coherent. The achievement-level dimension, for example, was named performance expectations and explained as follows:
… science content statements can be combined (crossed) with science [behaviour] to generate performance expectations (i.e., descriptions of students’ expected and observable performances on the NAEP Science Assessment). Based on these performance expectations, assessment items can be developed and then inferences can be derived from student responses about what students know and can do in science. (NAGB, 2008, p. 63)
This version of the framework emerged from another comprehensive development process involving many experts from different academic areas and an extensive hearing among science educators. This time, however, NAEP moved away from the “curriculum influence,” expressing more interest in including “new advances from science and cognitive research” and a wish “to learn from experiences in TIMSS and PISA” (NAGB, 2008, p. 2). This has resulted in new principles, but also new ambiguities in the conceptualization, discussed next.
The framework has moved away from the overarching dimensions (nature of science and themes) and, by using only two dimensions, appears more similar to the traditional content-by-behavior matrix. The behavior dimension, however, has become science practices, demonstrating an interest in adapting to the “practice turn” in the learning sciences and science studies. One implication is that the nature of science dimension is embedded in behavior and linked to students' cognition. The framework document explains this using Li and Shavelson's (2001) distinction between declarative, procedural, schematic, and strategic knowledge, presented as “knowing that,” “knowing how,” “knowing why,” and “knowing when and where to apply knowledge” (NAGB, 2008, p. 65). For example, the practice “using scientific principles” is explained as drawing on schematic and declarative knowledge to predict observations and evaluate alternative explanations (p. 68). Other practices in Figure 2 are explained similarly; a schematic sketch of this mapping follows.
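One way to read this rationale (a hypothetical sketch, not a structure given in the framework document) is as a lookup from each practice to the knowledge types it draws on; only the “using scientific principles” entry below is stated in the document (p. 68), and further practices would be mapped analogously.

```python
# Li and Shavelson's (2001) knowledge types as glossed in NAGB (2008, p. 65).
KNOWLEDGE_TYPES = {
    "declarative": "knowing that",
    "procedural": "knowing how",
    "schematic": "knowing why",
    "strategic": "knowing when and where to apply knowledge",
}

# Hypothetical data model: each science practice draws on knowledge types.
# Only the first entry is taken from the framework document (p. 68).
PRACTICE_KNOWLEDGE = {
    "using scientific principles": ("schematic", "declarative"),
    # ... remaining practices would be mapped analogously
}

for practice, kinds in PRACTICE_KNOWLEDGE.items():
    gloss = "; ".join(f"{k} ({KNOWLEDGE_TYPES[k]})" for k in kinds)
    print(f"{practice} -> {gloss}")
```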
The framework, however, betrays uncertainty about what procedural, schematic, and strategic knowledge actually are. First, this knowledge is concealed in the framework rather than listed explicitly; second, Li and Shavelson (2001) link these concepts to psychology rather than to the philosophy of science, making it unclear how they can replace nature of science.
The lack of a rationale for outlining and choosing categories in the science practices dimension is also problematic. These are presented as a “natural outcome” of the fact that “science knowledge is used to reason about the natural world and improve the quality of scientific thought and action” (NAGB, 2008, p. 66), a statement that gives poor guidance on what the sufficient or necessary categories are. The practice categories actually included have many similarities to the knowing and doing categories in the previous version of the framework, suggesting that these have been influential in what is regarded as “natural.”
The overall impression is, therefore, that NAEP's attempt to be at the cutting edge of science education has produced a framework that supports current perspectives in the learning sciences and science studies, but which fails to operationalize these perspectives at a detailed level. The commitment to bring in “hundreds of individuals across the country” (NAGB, 2008, p. vi) seems, further, to have forced compromises in the labeling and organizing principles of the framework.
In summary, the NAEP science framework offers an alternative to that of the IEA. Both use two-dimensional content-by-behavior matrices, but with different dimensions and underlying principles. TIMSS retains a “cognitive matrix,” describing behaviors as what students should “do to the knowledge.” NAEP, in contrast, first established a “curriculum matrix,” treating the behavior dimension as a fuller list of “objectives” of the science curriculum. This required a third dimension to define achievement levels. The conceptualization was later modified in the current version of the framework by redefining the behavior dimension as science practices. It is, however, unclear how this should actually be interpreted, and the framework document falls short of explaining the difference between science practice and science process. NAEP's framework has been influenced by U.S. curriculum changes and by the intention to implement educational research findings, but these act as double-edged swords, causing uncertainties about the understanding of concepts and principles. Current challenges include explaining the meaning of embedding nature of science into science practices and establishing a rationale for selecting science practices.
The PISA Science Framework
The PISA project was established in 1997 and began triennial surveys in 2000. The surveys focus on literacy in reading, mathematics, and science, with the main focus alternating among the three domains. Thus, the first PISA survey with science as the main focus took place in 2006, with the next due in 2015. The OECD Secretariat is responsible for PISA survey design, and implementation is through an international consortium led by the Australian Council for Educational Research.
Starting in the late 1990s, PISA was in a different position from NAEP and TIMSS, as no previous version guided the choice of assessment framework. Hence, a new model could have been created. However, Wynne Harlen, the first framework committee chair, brought experience from two assessment projects, the Techniques for the Assessment of Practical Skills (TAPS) (Bryce, McCall, MacGregor, Robertson, & Weston, 1988) and the Assessment of Performance Unit (APU) (Johnson, 1989). Both proved influential in the PISA development process. The first PISA framework (OECD, 1999) was similar to the APU framework (Murphy & Gott, 1984), adopting a three-dimensional structure with scientific processes, scientific concepts, and situations as dimensions. As in TAPS and APU, PISA announced scientific processes as the main target. Harlen (1999) continued a debate about the knowledge dependency of scientific processes, using arguments similar to those emerging from TAPS and APU (Gott & Murphy, 1987). The framework document (OECD, 1999), for example, argued that “there is no meaning in content-free processes” (p. 60) and that scientific knowledge and process “are bound together” (p. 60). Operationalizing these arguments proved as difficult as NAEP had found. The authors resorted to phrases such as “processes, because they are scientific, involve knowledge” (p. 60) and “priority is given to processes about science compared to processes within science” (p. 61, their emphasis).
As the knowledge-dependency problem remained unresolved, the PISA framework's originality relative to TAPS and APU lay in its scientific literacy focus. This meant emphasizing the processes of evaluating scientific evidence and claims in socioscientific contexts, giving less attention to experiments and data gathering in a laboratory context. The five categories developed for the scientific process dimension were (OECD, 1999, p. 62)
- recognizing scientifically investigable questions,
- identifying evidence needed in a scientific investigation,
- drawing or evaluating conclusions,
- communicating valid conclusions, and
- demonstrating understanding of scientific concepts.
The second PISA framework retained the process-oriented focus, but rearranged the process dimension into three categories (OECD, 2003b, p. 137):
- Process 1: describing, explaining, and predicting scientific phenomena,
- Process 2: understanding scientific investigation, and
- Process 3: interpreting scientific evidence and conclusions.
As with many previous framework revisions, limited explanation for this change is given, but the new categories became similar to Klahr and Li's (2005) three main “phases of the scientific discovery process” (p. 218). A move is therefore observed away from the stepwise approach to the scientific method seen in TAPS and APU, toward describing how science works in principle. This was a small but important step toward the same “practice turn” as observed in NAEP. The conceptual problem of defining science processes as “knowledge based,” however, remained unsolved in the 2003 framework.
The scientific concepts and situations dimensions played subordinate roles in the first two PISA frameworks. The situations dimension, however, added an important difference from TIMSS and NAEP by describing characteristics of the context rather than what students should learn. This will be discussed later as an extension of the conceptualization of the science domain.
The next development, when scientific literacy became the main focus in PISA 2006, offered a new start and a new committee chaired by Rodger Bybee from the U.S. National Standards for Science Education Committee. His background, together with developments in OECD's DeSeCo project to define key competencies for the future (OECD, 2003a), laid the groundwork for a revised framework with scientific competency, instead of science processes, as the main focus, using a new organizing principle (see Figure 3).
The most notable characteristic of the new framework is the omission of the traditional “matrix model,” representing the domain instead as a concept map. Compared to the matrix model, this format allows additional dimensions to be included and makes the relationships between the dimensions more explicit. Accordingly, the PISA framework in Figure 3 suggests that students' competency to “do science” is influenced by their knowledge and attitudes. This principle resolved Harlen's earlier conceptual problem of ascribing meaning to a science process being knowledge based. It also provided an alternative to NAEP's problem, discussed earlier, of explaining nature of science as “embedded” in science behavior. In the PISA framework, knowledge about science is placed alongside knowledge of science, that is, as a similar “cause” of scientific behavior. The framework has remained unchanged since 2006 and has gradually become familiar to many science educators as PISA has become a more important source for international comparison and for defining scientific literacy (DeBoer, 2011).
To understand the changes made in the new PISA framework, it is necessary to look toward the tradition of competency modeling that developed from the 1990s (Shippman et al., 2000), stimulated in particular by Prahalad and Hamel's (1990) demand for core competencies to prepare for increased global competition. In this context, competency is first a “managerial” concept, used, for example, by assessment centers conducting “job analysis” (Anastasi & Urbina, 1997). The concept merges two aspects: the activity or task someone should be able to master, and the knowledge, attitudes, skills, and other characteristics (so-called KASOs) that a person needs to learn to solve the task successfully. Kane (1992) uses these aspects in a definition of competence:
… an individual's level of competence in an area of practice can be defined as the degree to which the individual can use the knowledge, skills, and judgment associated with the profession to perform effectively in the domain of possible encounters defining the scope of professional practice. (p. 166)
From this perspective, PISA made two major changes to framework development. First, it defined scientific behavior by turning toward “task analysis” for citizenship. The guiding question put forward was “What is it important for citizens to know, value, and be able to do in situations involving science and technology?” (OECD, 2006, p. 20, emphasis added). This is significant because behavior is defined through the situations and tasks students should be able to handle rather than through scientific principles. The PISA framework mentions briefly that competencies “rest on [their] importance for scientific investigation” (p. 29), but elaborating on this is deemed unnecessary as “tasks define behavior.” Second, PISA made it more obvious and explicit that developing a framework means modeling, and not just categorizing, the subject domain. The NAEP 2009 framework had also gone further than previous frameworks in trying to provide a rationale explaining the domain. In both frameworks, the organizing principle becomes a key to understanding the domain.
In summary, the PISA framework's initial conceptualization was similar to the UK-based APU and TAPS assessments, focusing on science processes and using a matrix as the organizing principle. Inspired by competency modeling in the managerial sector, from 2006 this changed to science competencies. PISA then replaced the matrix model with a concept map, explaining that science behavior is influenced by knowledge and attitudes. Competency modeling made the conceptualization “task oriented,” meaning that PISA has moved away from explaining science principles toward identifying tasks students should be able to handle in everyday life. PISA has become a recognized conceptualization among science educators, not least for its support of scientific literacy, but also because many accept competency as an appropriate concept for scientific behavior.