Abstract

The purpose of this experimental study was to investigate the effects of metacognitive reflective assessment instruction on student achievement in mathematics. The study compared the performance of 141 students who practiced reflective assessment strategies with students who did not. A posttest-only control group design was employed, and results were analyzed by conducting one-way analysis of variance (ANOVA) and nonparametric procedures. On both a posttest and a retention test, students who practiced reflective strategies performed significantly higher than students who did not use the strategies. A within-subjects ANOVA was also conducted six weeks following the intervention to assess how the factor of time affected retention levels. No significant difference was found between the posttest and retention test results for the experimental groups or the control group.

At least since Socrates, learners have been counseled to reflect on what they know. Socrates taught that the unexamined life is not worth living. In the Meno, there is the familiar story of Socrates teaching a slave boy the Pythagorean Theorem. The point of the story is that teaching involves to a considerable extent the teacher's willingness to give the learner an opportunity to reflect on what the learner knows. With regard to learner empowerment, Socrates is quoted in Theaetetus as saying, “… the many fine discoveries to which they cling are of their own making” (Plato, 1952, pp. 150–151). Socrates' lament was that teachers spend far too much time telling and too little time allowing students to think about what they are learning, an argument that continues to be heard to this day. Recent reforms in mathematics education have included, among other things, the recommendation that students spend more time studying problems in depth and less time covering wide ranges of topics (Kaye, 2005). Also noted in the current literature is the recommendation that students be allowed opportunities to practice formative self-assessment as a means of clarifying their thinking about what they are learning (Marzano, 2009; Shepard, 2008).

Daily reflection by students on the material they are taught represents formative assessment, primarily designed to afford students a sense of their progress. Whether systematic opportunity to practice reflective assessment of what is being taught enhances academic performance is an empirical question. The oft-absent integration of formative assessment and reflective activity into the lesson routine contrasts sharply and ironically with the current emphasis upon standardized testing. Assessment is often viewed by teachers and students alike as a process separate from teaching and learning (Gulikers, Bastiaens, Kirschner, & Kester, 2006; Herman, Aschbacher, & Winters, 1992). As a consequence, knowledge and skills are often taught as preparation for assessment but not as the corollary; that is, assessment is seldom construed as an ongoing informer of knowledge and skills (Perrone, 1994; Simmons, 1994; Wiggins, 1993; Wragg, 1997). It is hypothesized in the present study that formative assessment integrated into the lesson, rather than treated merely as a separate and often judgmental process (Chappuis, 2005; Earl & LeMahieu, 1997; McTighe & O'Connor, 2005; Stiggins, 2008; Wiggins, 1993), has a positive effect on achievement. Students are often required to think in order to solve problems, but the deeper stage of thinking about their thinking, or metacognition, is seldom solicited as part of their problem solving.

The extent to which elementary school children (in the case of the present study, children age 10–12 years) are capable of exercising the executive function needed to benefit from reflective practice is a question that invites further inquiry. Research by Michalsky, Mevarech, and Haibi (2009) with fourth-grade students who studied scientific texts and who practiced metacognitive learning strategies provides evidence that such practice is beneficial. The present study focuses on slightly older children and their ability to reflect meaningfully on concepts and skills in mathematics.

The relationship between metacognition and reflective assessment is a close one. Metacognition literally means thinking (cognition) after (meta) and in that sense represents reflection on experience. Metacognition has been defined as an awareness of one's thinking patterns, learning characteristics, and techniques (Schneider, Borkowski, Kurtz, & Kerwin, 1986) and is commonly referred to as “thinking about thinking” (Costa, 2001). The term metacognition was introduced into the literature of educational psychology by John Flavell to indicate self-knowledge of cognitive states and processes (Flavell, 1976). Beyond mere definition, Flavell offers this description:

Metacognition refers to one's knowledge concerning one's own cognitive processes or anything related to them, e.g., the learning relevant properties of information or data. For example, I am engaging in metacognition if I notice that I am having more trouble learning A than B; if it strikes me that I should double check C before accepting it as a fact (Flavell, 1976, p. 232).

Brown (1978) offers the corollary of “secondary ignorance” (p. 82) as not knowing what you know.

The literature is rich (Bandura, 1997; Chappuis, 2005; Dewey, 1933; Earl & LeMahieu, 1997; Stiggins, 2008; Tittle, 1994; Wiliam & Thompson, 2008) with philosophy and opinion regarding the value of metacognitive practice for students, but few empirical studies designed specifically to measure such effects have been published. Although a number of empirical investigations (Blank & Hewson, 2000; Black & Wiliam, 1998; Conner & Gunstone, 2004; Dignath & Büttner, 2008; Gulikers et al., 2006; Gustafson & Bennett, 2002; Hartlep & Forsyth, 2000; Naglieri & Johnson, 2000; Schneider et al., 1986; Schunk, 1983; Wang, Haertel, & Walberg, 1993; White & Frederiksen, 1998) have reported positive effects of metacognitive activities on student achievement, these findings are typically embedded as one of several components of such studies. While not all research on this topic has found significant results (Andrade, 1999; Kurtz & Borkowski, 1984), a pattern of findings appears to be developing that supports the inclusion of reflective assessment strategies in learning activities. The present study is an effort to contribute to the body of research.

Purpose and Research Questions

The purpose of this study was to investigate the effects of metacognitive reflective assessment instruction on the mathematics achievement of fifth- and sixth-grade students. The following research questions guided the investigation: (a) What are the effects of using reflective assessment strategies on the mathematics achievement of fifth- and sixth-grade students? (b) Does the use of reflective assessment strategies enhance student retention of mathematics concepts over time?


Method

This investigation of student reflective assessment and its effects upon mathematics achievement employed an experimental posttest-only control group design (Campbell & Stanley, 1963; Shadish, Cook, & Campbell, 2002). The independent variable was the use of reflective assessment strategies, which were practiced only by Experimental Group I. The dependent variable was mathematics achievement, measured for Experimental Group I, Experimental Group II, and the Control Group by a researcher-designed instrument.

A posttest-only control group design was selected for this study, rather than a pretest–posttest design, for several reasons. First, stable enrollment at the school in which the investigation was conducted indicated that participant mortality, the chief weakness of posttest-only designs (Shadish et al., 2002), would not be a serious threat to internal validity. Second, since the study was embedded within an ongoing mathematics curriculum pilot, the posttest-only design allowed the researcher to disguise the purpose of the study in order to control for teacher effects (Bingham & Felbinger, 2001). Thus, teacher and student participants were not exposed to the test content during the four weeks of the investigation, which controlled for their possible sensitivity to the purpose of the research (Chong-ho & Ohlund, 2010). In particular, not having a pretest kept the control group teachers blind to the mathematical content that comprised the dependent variable. Since the six randomly assigned teacher participants worked in close proximity in the school, omitting the pretest also minimized the risk of conversation about the study content and potential experimental treatment diffusion. Third, the effects of pretesting were a concern because of the four-week duration of the study and the repeated administration of the instrument as a retention test. In a pretest–posttest design the retention test would have been the third administration of the instrument over a 10-week period, which would have called for development of an alternate instrument.

The study was conducted in conjunction with a curriculum pilot of mathematics materials that were being considered for a school district adoption. Since the study did not interfere with the normal course of study, there was no resistance to the random assignment of students.


A sample of 141 fifth- and sixth-grade student participants from a suburban elementary school was randomly assigned to three experimental conditions (reflective assessment, nonreflective review, and control), each delivered to two reconstituted classes. Six teacher participants were randomly assigned to one of the three treatments. Each group comprised two subgroups of approximately 24 participants each, for total group sizes of 47, 48, and 46. Random assignment of participants resulted in balance among the three groups regarding gender, ability level, socioeconomic status (SES), and ethnicity. Socioeconomic status was estimated by participation in the free or reduced-price meal program (see Table 1). The sample was drawn from a predominantly White, middle-class school population (see Table 1) and was comprised of 61 males, 80 females, 61 fifth graders, and 80 sixth graders. Fifteen of the student participants, 10.6% of the sample, belonged to subgroups commonly regarded in the educational literature as at risk for lower academic performance. Seven students received special education instruction for both reading and mathematics, four were English Language Learners (ELL), and five qualified for free or reduced-price lunch. Of the four ELL students, Spanish was the first language of three, and Russian was the first language of one.

Table 1. Achievement and Demographic Percentages

                                    Study Site   Pilot Site 1   Pilot Site 2
Met standard on state test
  American Indian/Alaskan Native    0            1.7            .9
  Asian/Pacific Islander            9.4          12.2           11.8
  African American                  3.8          2.4            .7
Special programs
  Free or reduced price meals       8.3          9.4            10.7
  Special education                 14.7         14.3           11.3


The posttest questions were drawn from Connected Mathematics: Data About Us (Lappan, Fey, Fitzgerald, Friel, & Phillips, 2002b), on which the lessons for the experimental groups were based. The objectives of each of the 16 scripted lessons taught to the Experimental I and II groups were assessed with multiple-choice questions. A pilot test of 38 questions was administered in two classrooms at schools not involved in the study to determine the reliability of the test items. The pilot schools and classes were selected to closely match the achievement, diversity, mobility, and at-risk factors of the school in which the study was to be conducted (see Table 1).

Item analyses of the pilot test resulted in two questions being discarded. Cronbach's alpha and split-half coefficients of .72 and .71, respectively, were found on the remaining 36 posttest items, indicating satisfactory reliability (Vogt, 2005). In addition to the instrument reliability analyses, the face and content validity of the instrument were examined by two assessment experts, one a university professor of statistics and the other a school district assessment director. Independently, they reported that both the face validity and content validity of the instrument appeared to be high. The 36-item multiple-choice test on probability and statistics was administered following the intervention and again six weeks later as a retention test.
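Internal-consistency coefficients of this kind are straightforward to reproduce. The Python sketch below computes Cronbach's alpha from a respondents-by-items score matrix; because the pilot data are not reproduced here, the responses are synthetic stand-ins generated from a common "ability" factor.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Synthetic 0/1 responses standing in for the 36-item pilot data:
# a shared ability factor induces positive inter-item correlation
rng = np.random.default_rng(0)
ability = rng.normal(size=(60, 1))
responses = (rng.normal(size=(60, 36)) + ability < 0.5).astype(int)
print(round(cronbach_alpha(responses), 2))
```

Split-half reliability can be obtained the same way by correlating the summed odd-numbered and even-numbered item scores and applying the Spearman–Brown correction.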

A priori power analysis for a one-way analysis of variance (ANOVA; posttest-only design with approximately 40 cases per cell) was conducted using G*Power 3 (Faul, Erdfelder, Lang, & Buchner, 2007) to determine a sufficient sample size using an alpha of .05, a power of .70, a small effect size (d = .4), and one tail.
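The a priori power analysis can be reproduced without G*Power. The sketch below, in Python with SciPy, computes one-way ANOVA power from the noncentral F distribution and searches for the smallest per-group n that reaches the target power; Cohen's f = 0.2 is assumed here as the ANOVA-scale counterpart of the reported small effect size (d = .4).

```python
from scipy.stats import f as f_dist, ncf

def anova_power(f_effect: float, n_per_group: int, k_groups: int,
                alpha: float = 0.05) -> float:
    """Power of a one-way ANOVA, computed from the noncentral F distribution."""
    n_total = n_per_group * k_groups
    df1, df2 = k_groups - 1, n_total - k_groups
    nc = f_effect ** 2 * n_total              # noncentrality parameter
    f_crit = f_dist.ppf(1 - alpha, df1, df2)  # critical F at the chosen alpha
    return 1.0 - ncf.cdf(f_crit, df1, df2, nc)

# Smallest per-group n reaching power .70 at alpha = .05 with three groups
n = 2
while anova_power(0.2, n, 3) < 0.70:
    n += 1
print(n, round(anova_power(0.2, n, 3), 3))
```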


Scripted lesson plans were provided for the teachers of the experimental and control groups. All lesson scripts were derived from the Connected Mathematics Program. A probability and statistics unit was the focus of lessons for the experimental groups (Lappan et al., 2002b), while the control group was taught a unit on area and perimeter (Lappan, Fey, Fitzgerald, Friel, & Phillips, 2002a). Students in the experimental groups were taught identical statistics lessons, except for the reflective intervention. At the closing of each class session, students in the Experimental I Group practiced a reflective activity regarding what they had learned in the class session. This reflection served as the independent variable in the study. Teachers in Experimental Group II closed each class session with a five-minute review of the lesson activities and objectives in order to ensure that equal time on task existed for the two experimental groups. Control Group lessons focused on area and perimeter, which was another mathematics unit being piloted.

Two separate reflection strategies were combined to form the independent variable: a written “I Learned” statement and a verbal “Thinking Aloud” strategy. These reflective strategies are efficient ways for teachers to facilitate student reflection on what has been learned while finding out whether their lesson objectives have been attained. During the last five minutes of the lesson, students in Experimental Group I were asked to think about what they had learned during the class period and then to write a sentence that began with the phrase “I learned.” Students were then prompted by the teacher to talk about what they had written with another student, the “Thinking Aloud” strategy, and finally to edit their “I learned” statement as appropriate. The written statements were then collected by the teacher each day and submitted to the researchers.

Prior to the start of the investigation, the researchers provided training for teacher participants that emphasized the need to precisely follow the lesson scripts and prescribed time allotments. Teacher participants agreed not to discuss the investigation until its completion. The researcher closely monitored progress throughout the investigation to ensure that lesson scripts were followed, confidentiality was maintained, and disruptions were avoided.


Results

A one-way ANOVA was conducted to evaluate the effects of the reflective strategy intervention on participant achievement on the mathematics test. An alpha level of .05 was used for all statistical tests. Three participant groups (reflective assessment, nonreflective review, and control) were administered a posttest at the end of the study, and again six weeks later as a retention test (see Table 2). Significant main effects were found in both administrations of the mathematics test (see Table 3). Effect size calculations indicated a medium effect of the reflection strategy (see Table 3).

Table 2. Posttest and Retention Test Means, Medians, and Standard Deviations

                       Posttest                       Retention Test
                       M      Mdn    SD    n          M      Mdn    SD    n
Reflection group       29.40  31.00  4.33  47         29.18  30.00  3.54  44
No reflection group    26.92  29.00  5.61  48         26.77  27.00  5.54  48
Control group          22.30  22.00  4.37  46         22.42  22.50  4.45  45
Table 3. Between-Subjects Effects: Posttest and Retention Test

                                             df   Mean square   F        p      Partial η2
Posttest
  Corrected model                            2    602.538       25.962   <.01   .273
Retention test without missing values
  Corrected model                            2    524.064       24.606   <.01   .269
Retention test with missing values imputed
  Corrected model                            2    544.882       26.347   <.01   .276

Follow-up tests were conducted to evaluate pairwise differences among the means. For both the posttest and the retention test, Tukey HSD procedures indicated that the mean scores of the Experimental I and II Groups differed significantly (posttest, p = .035; retention test, p = .036). Significant differences (p < .01) were found in all pairwise comparisons with the Control Group on both administrations of the mathematics test. With regard to the first research question, these results indicate that the reflective strategy intervention led to higher achievement.
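The omnibus test and the Tukey follow-ups can be illustrated in a few lines of Python with SciPy. The scores below are synthetic draws shaped to the posttest group means and standard deviations reported in Table 2, since the raw data are not available; the exact statistics will therefore differ somewhat from those in Table 3.

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

# Synthetic scores matching the reported posttest means/SDs (Table 2)
rng = np.random.default_rng(1)
reflect = rng.normal(29.40, 4.33, 47)
review = rng.normal(26.92, 5.61, 48)
control = rng.normal(22.30, 4.37, 46)

# Omnibus one-way ANOVA across the three groups
f_stat, p_val = f_oneway(reflect, review, control)
print(f"F = {f_stat:.2f}, p = {p_val:.4g}")

# Tukey HSD pairwise comparisons; p-values come back as a 3x3 matrix
# indexed by the order the samples were passed in
hsd = tukey_hsd(reflect, review, control)
print(hsd.pvalue.round(3))
```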

On the retention test, Levene's Test of Equality of Error Variances indicated nonhomogeneity of variance among the groups (F[2, 134] = 3.28; p = .041). During the six weeks between the posttest and retention test administrations, four students withdrew from school, causing unequal sample sizes. Since the homogeneity-of-variance assumption may thus have been violated, a Dunnett's C test, which does not assume equal variances, was conducted; it also found significant mean differences between the Experimental I and II Groups (p < .05) and in comparisons with the Control Group (p < .05). In addition, leptokurtosis (Ku = 2.45) was found in the reflection group posttest scores, although the sample size likely was large enough to yield fairly accurate alpha values (Green & Salkind, 2007).

Nonparametric procedures were also calculated in response to the nonhomogeneity issue discussed above. A Kruskal–Wallis test was significant on both the posttest (χ2 [2, N = 141] = 41.95, p < .01) and the retention test (χ2 [2, N = 137] = 41.25, p < .01). Mann–Whitney U tests conducted for pairwise comparisons among the three groups indicated that the Experimental I Group (reflection) scored significantly higher than the Experimental II Group (nonreflection) on both the posttest (z = −2.37, p = .018) and the retention test (z = −2.29, p = .022). On both tests, the reflection and nonreflection groups scored significantly higher (p < .01) than the Control Group. These findings corroborated the results of the one-way ANOVA.
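The variance check and the nonparametric follow-ups are likewise available in SciPy. As above, the scores are synthetic draws shaped to the retention test means and standard deviations in Table 2, so the statistics are illustrative rather than a reproduction of the reported values.

```python
import numpy as np
from scipy.stats import levene, kruskal, mannwhitneyu

# Synthetic retention scores shaped to the Table 2 group means/SDs
rng = np.random.default_rng(2)
reflect = rng.normal(29.18, 3.54, 44)
review = rng.normal(26.77, 5.54, 48)
control = rng.normal(22.42, 4.45, 45)

# Homogeneity-of-variance check, the assumption at issue above
print(levene(reflect, review, control))

# Kruskal-Wallis omnibus test and one Mann-Whitney pairwise follow-up
h_stat, p_kw = kruskal(reflect, review, control)
u_stat, p_u = mannwhitneyu(reflect, control, alternative="two-sided")
print(f"H = {h_stat:.2f}, p = {p_kw:.4g}; U = {u_stat:.1f}, p = {p_u:.4g}")
```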

A repeated measures analysis of variance was conducted in which the within-subject factor was time (post and retention test occasions), and the between-subjects factor was experimental condition (reflective assessment, nonreflective review, and control). The dependent variable was performance on the mathematics test. This analysis was conducted first with the four missing retention test scores omitted, and then with missing values imputed (see Table 4). No significant difference between the posttest and retention test results was found in either of the repeated measures ANOVAs (see Table 5). There was also no significant interaction found between the test factor and the treatment factor (see Table 5).

Table 4. Within-Subjects ANOVA: Descriptive Statistics

                       Missing Values Omitted        Missing Values Imputed
                       M       SD     n              M       SD     n
Reflection group       29.86   3.91   44             29.18   3.42   47
No reflection group    26.77   5.54   48             26.77   5.54   48
Control group          22.27   4.41   45             22.42   4.40   46
Table 5. Repeated Measures: Missing Value Comparisons

                             N     Wilks' Λ   F      p     Partial η2
Time
  Without missing values     137   .998       .222   .64   .002
  Missing values imputed     141   .999       .110   .74   .001
Time × Treatment
  Without missing values     137   .989       .492   .69   .011
  Missing values imputed     141   .998       .106   .96   .002

Mauchly's Test of Sphericity was not significant for either analysis, indicating that the sphericity assumption was satisfied. These results indicate that reflective strategies do not necessarily result in higher retention over time, since both the Experimental I and II Groups sustained their levels of performance after six weeks. While the Experimental I (reflection) Group learned significantly more, both groups sustained what they had learned equally well. As expected, the retention test scores for the Control Group differed only slightly from the posttest results.
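A full mixed-design repeated measures ANOVA requires the raw paired scores, but the within-group time effect at its core can be sketched with a paired test. The Python fragment below simulates one group whose retention scores track its posttest scores with noise but no systematic drift, mirroring the sustained performance reported for all three groups, and applies a paired t-test; with no built-in time effect, the result will typically be nonsignificant.

```python
import numpy as np
from scipy.stats import ttest_rel

# Simulated paired scores: retention tracks posttest with noise but no
# systematic change, mirroring the sustained performance reported above
rng = np.random.default_rng(3)
post = rng.normal(29.40, 4.33, 44)
retention = post + rng.normal(0.0, 1.5, 44)

# Paired t-test of the time factor within one group
t_stat, p_val = ttest_rel(post, retention)
print(f"t = {t_stat:.2f}, p = {p_val:.3f}")
```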


Discussion

The results of this study support the theory that student reflection during learning activities enhances achievement (Bandura, 1997; Black & Wiliam, 1998; Costa, 2001; Dewey, 1933; Marzano, 2009; Stiggins, 2008; Wiliam & Thompson, 2008). Students who practiced reflective strategies performed significantly better when compared to students who did not. In addition to the statistical significance, medium to large effect sizes were found that give support for reflective strategies as a practical classroom learning tool.

These results provide an answer to the first research question in demonstrating that the inclusion of reflective strategies in mathematics lessons did indeed cause greater learning. The findings are consistent with previous empirical research (Michalsky et al., 2009; Naglieri & Johnson, 2000) that has supported reflective strategies. This is reason to advocate for increased application of reflective strategies as embedded formative assessment in daily classroom activity.

In response to the second research question, the within-subjects ANOVA results indicate that reflective assessment strategies do not lead to enhanced retention of learning over time. It had been expected that students who practiced reflective strategies would retain more of what they had learned than students who did not reflect. This did not prove to be the case. In fact, all three groups (reflective, nonreflective, and control) sustained their levels of performance on the second administration of the mathematics test. Additionally, the absence of an interaction shows that the retention test results were not significantly influenced by exposure to the instrument six weeks earlier.


As with any research conducted in a school, this study was tailored to the situation. As expected, the posttest-only control group design controlled for the anticipated threats to internal validity of pretesting and teacher effects. In addition, this design avoided the need to develop an alternate instrument, which would have been necessary had a pretest been administered in addition to the other two assessments. For this study, these strengths outweighed the risk of student attrition that is an inherent problem with posttest-only designs (Shadish et al., 2002). While the stability of student enrollment led the researcher to choose a posttest-only design, in an area of moderate or high student mobility a pretest–posttest control group design would be appropriate.

Several strengths of this investigation give confidence in the findings. These include the study's experimental design, with random assignment of student participants and of teachers to groups, which provided assurance of balanced groups and consistency of instruction (Gall, Gall, & Borg, 2007). First, as anticipated, mortality of student participants was not an issue during the four weeks of the study, which supported the choice of a posttest-only control group design. Therefore, the absence of a pretest, a design weakness according to Shadish et al. (2002), did not prove to be a problem in the major component of the investigation regarding the effects of reflective strategies on learning mathematical concepts. Teacher differences were further controlled by the provision of lesson plans that included verbatim scripts, time allotments, and materials. Adherence to the scripted lesson plans was closely monitored by the researcher throughout the investigation, and experimental treatment diffusion did not appear to be a cause for concern. Equal time on task for the experimental groups was built into the lesson plans, including an alternate closure activity for the nonreflection group. In addition, teacher and student participants were blind to the purpose of the study (Bingham & Felbinger, 2001), which was conducted during a curriculum pilot.

Providing a meaningful experience for the Control Group was an important aspect to this study. Since these students were also participants in the mathematics curriculum pilot, their experience during the four-week study was equally desirable to that of the other groups. Feedback from Control Group teachers contributed to school district decision making regarding the piloted mathematics curriculum, just as did that of Experimental Group I and II teachers.


The limitations of this study include mortality of student participants for the retention test, the use of a researcher-designed instrument, and the generalizability of the findings. First, given the stability of student enrollment and the initial efforts to ensure balance among groups, it was not expected that group composition would prove to be a limitation of this investigation. Participant attrition, in fact, was not a concern during the four weeks of the investigation and the subsequent posttest. However, mortality did prove to be a minor problem six weeks later for the retention test. While the overall attrition rate for the retention test was only 2.8% (four of the 141 participants), three of the four withdrew from Experimental Group I. While imputation of the missing values confirmed the findings related to research question 2 (see Tables 4 and 5), participant attrition limited the validity of the retention test results. For the retention test component of this study, a pretest–posttest design would have better accounted for the mortality issues.

The use of a researcher-designed instrument is another potential limiting factor for this study. Since the opportunity to implement an experimental design depended on doing so during a curriculum pilot, the choice of curricular content was beyond the researcher's control. For this reason a researcher-designed instrument was developed that aligned with the lesson objectives found in the piloted mathematics curriculum. Even though the pilot study found instrument reliability to be adequate, the use of a standardized instrument would have been preferable.

The sample for this study was representative of a suburban, middle-class population, and any generalizing of results should be done with this in mind (see Table 1). Although the posttest findings are causal, external validity must be established before they can be applied to other settings (Briggs, 2008). Caution, therefore, should be exercised when applying the findings to schools marked by high poverty, urban settings, or substantial cultural and ethnic diversity. In addition, generalizing the results to at-risk populations, such as special education or ELL students, should be carefully considered. While the results of this study offer promise that other populations of students will benefit from practicing reflective strategies, the findings should be generalized with high confidence only to schools with demographics similar to those represented in the study (see Table 1). Future research is needed to demonstrate the effectiveness of reflective strategies in diverse student populations.

Conducting the study in one school, where student and teacher participants had contact outside of the randomly assigned treatments, is another limiting factor (Shuttleworth, 2009). This was a major factor in the selection of the posttest-only control group design; however, stronger control for potential contamination of the results could have been provided had several schools been included in the study. Due to the structure of the curriculum pilot in which the study was embedded, the research was limited to one elementary school.


Conclusion

In an era of high-stakes testing and increased pressure on classroom teachers to improve student achievement, the results of this study offer evidence of the effectiveness of reflective assessment strategies in improving student learning. Mathematics teachers, especially, can draw on these findings to support the incorporation of student reflection as an integral part of lesson activity. Standing out among the findings of this study is the positive impact on student learning when reflective assessment strategies are included in daily mathematics instruction. That this innovation can be implemented easily, at low cost, and with minimal disruption to classroom instruction makes reflective assessment a highly practical innovation. At a time when public education faces the dual dilemmas of increased expectations and diminishing resources, reflective assessment is an innovation that should be broadly embraced, for it addresses both issues.

The results of this study were statistically significant and causal, which offers mathematics practitioners a strong rationale for applying the findings in the classroom. The findings also inform practice in other content areas and provide reason to delve deeper into how reflection can be harnessed in all classrooms to enhance student learning. Further research should be conducted with diverse student populations, in other subject areas, at different grade levels, and with a variety of reflective strategies. It will be especially important to conduct research in schools of high poverty, where reflective strategies could provide effective small-scale assessment tools usable by both teachers and students.

The Connected Mathematics Project (CMP), whose curriculum was used in this research, has developed a substantial evidence base regarding effective mathematics curriculum, instruction, and assessment. The results of this study contribute to the growing body of empirical evidence regarding mathematics instruction and assessment that is being developed and compiled by CMP-affiliated researchers.

In addition, more research is needed on how long-term memory is impacted by the inclusion of reflective strategies in learning experiences. While this study did not find differences in the retention levels of the three groups, it may be that six weeks was too short a time period to find retention differences among the three groups. Future studies should include several repeated assessments over longer time spans to determine how reflection impacts long-term memory.

An important outcome of this study is that it demonstrated that experimental research can be conducted in the schoolhouse without major disruption. This occurred because the study was conducted as part of a school district curriculum pilot. Collaboration with school districts on curriculum adoptions offers opportunity to conduct empirical research without interfering with the scope and sequence of an instructional program. In this study, for example, it is not likely that the school district would have allowed random assignment of students had a curriculum pilot not been in progress.

The results of this study lend support to the theoretical view that student reflection on material taught increases the probability that the student will learn the material. The results provide support for the incorporation of reflective assessment strategies into daily classroom activities. The statistically significant findings of this study contribute empirical evidence to the argument in the metacognitive literature that supports reflective strategies as an effective practice and provide reason for continued research on the topic.


References
  • Andrade, H. G. (1999). Student self-assessment: At the intersection of metacognition and authentic assessment. Paper presented at the Annual Meeting of the American Educational Research Association, Montreal, Quebec, Canada.
  • Bandura, A. (1997). Self-efficacy: The exercise of control. New York: W. H. Freeman and Company.
  • Bingham, R. D., & Felbinger, C. L. (2001). Evaluation in practice: A methodological approach (2nd ed.). New York: Chatham House Publishers/Seven Bridges Press.
  • Black, P., & Wiliam, D. (1998). Inside the black box. Phi Delta Kappan, 80(2), 139–148.
  • Blank, L. M., & Hewson, P. W. (2000). A metacognitive learning cycle: A better warranty for student understanding? Science Education, 84(4), 486–506.
  • Briggs, D. C. (2008). Comments on Slavin: Synthesizing causal inferences. Educational Researcher, 37(1), 15–22.
  • Brown, A. L. (1978). Knowing when, where, and how to remember: A problem of metacognition. In R. Glaser (Ed.), Advances in instructional psychology (Volume 1, pp. 77–165). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
  • Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Boston, MA: Houghton Mifflin Company.
  • Chappuis, J. (2005). Helping students understand assessment. Educational Leadership, 63(3), 39–43.
  • Chong-Ho, Y., & Ohlund, B. (2010). Threats to validity of research design.
  • Conner, L., & Gunstone, R. (2004). Conscious knowledge of learning: Accessing learning strategies in a final year high school biology class. International Journal of Science Education, 26(12), 1427–1443. doi: 10.1080/0950069042000177271.
  • Costa, A. L. (2001). Developing minds: A resource book for teaching thinking (3rd ed.). Alexandria, VA: Association for Supervision and Curriculum Development.
  • Dewey, J. (1933). How we think: A restatement of the relation of reflective thinking to the educative process. New York: D. C. Heath and Company.
  • Dignath, C., & Büttner, G. (2008). Components of fostering self-regulated learning among students: A meta-analysis on intervention studies at primary and secondary school level. Metacognition and Learning, 3(3), 231–264. doi: 10.1007/s11409-008-9029-x.
  • Earl, L. M., & LeMahieu, P. G. (1997). Rethinking assessment and accountability. In A. Hargreaves (Ed.), Rethinking educational change with heart and mind: 1997 ASCD yearbook (pp. 149–167). Alexandria, VA: Association for Supervision and Curriculum Development.
  • Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191.
  • Flavell, J. H. (1976). Metacognitive aspects of problem solving. In L. B. Resnick (Ed.), The nature of intelligence (pp. 231–236). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
  • Gall, M. D., Gall, J. P., & Borg, W. R. (2007). Educational research: An introduction (8th ed.). Boston, MA: Pearson/Allyn & Bacon.
  • Green, S. B., & Salkind, N. J. (2007). Using SPSS for Windows and Macintosh: Analyzing and understanding data (5th ed.). Upper Saddle River, NJ: Prentice Hall, Inc.
  • Gulikers, J., Bastiaens, T. J., Kirschner, P. A., & Kester, L. (2006). Relations between student perceptions of assessment authenticity, study approaches and learning outcome. Studies in Educational Evaluation, 32(4), 381–400. doi: 10.1016/j.stueduc.2006.10.003.
  • Gustafson, K., & Bennett, W. (2002). Issues and difficulties in promoting learner reflection: Results from a three-year study.
  • Hartlep, K. H., & Forsyth, G. A. (2000). The effect of self-reference on learning and retention. Teaching of Psychology, 27, 269–271. doi: 10.1207/S15328023TOP2704_05.
  • Herman, J. L., Aschbacher, P. R., & Winters, L. (1992). A practical guide to alternative assessment. Alexandria, VA: Association for Supervision and Curriculum Development.
  • Kaye, S. (2005). The place of problem solving in contemporary mathematics curriculum documents. Journal of Mathematical Behavior, 24(3/4), 341–350. doi: 10.1016/j.jmathb.2005.09.004.
  • Kurtz, B. E., & Borkowski, J. G. (1984). Children's metacognition: Exploring relations among knowledge, process, and motivational variables. Journal of Experimental Child Psychology, 37(2), 335–354. doi: 10.1016/0022-0965(84)90008-0.
  • Lappan, G., Fey, J., Fitzgerald, W., Friel, S., & Phillips, E. (2002a). Connected mathematics: Covering and surrounding. Upper Saddle River, NJ: Prentice Hall.
  • Lappan, G., Fey, J., Fitzgerald, W., Friel, S., & Phillips, E. (2002b). Connected mathematics: Data about us. Upper Saddle River, NJ: Prentice Hall.
  • Marzano, R. L. (2009). When students track their progress. Educational Leadership, 67(4), 86–87.
  • McTighe, J., & O'Connor, K. (2005). Seven practices for effective teaching. Educational Leadership, 63(3), 10–17.
  • Michalsky, T., Mevarech, Z., & Haibi, L. (2009). Elementary school children reading scientific texts: Effects of metacognitive instruction. The Journal of Educational Research, 102(5), 363–374. doi: 10.3200/JOER.102.5.363-376.
  • Naglieri, J. A., & Johnson, D. (2000). Effectiveness of a cognitive strategy intervention in improving arithmetic computation based on the PASS theory. Journal of Learning Disabilities, 33(6), 591–598. doi: 10.1177/002221940003300607.
  • Perrone, V. C. (1994). How to engage students in learning. Educational Leadership, 51(5), 11–13.
  • Plato (1952). Theaetetus. In B. Jowett (Ed.), Dialogues of Plato (pp. 150–151). Chicago, IL: University of Chicago Press.
  • Schneider, W., Borkowski, J. G., Kurtz, B., & Kerwin, K. (1986). Metamemory and motivation: A comparison of strategy use and performance in German and American children. Journal of Cross-Cultural Psychology, 17(3), 315–336. doi: 10.1177/0022002186017003005.
  • Schunk, D. H. (1983). Progress self-monitoring: Effects on children's self-efficacy and achievement. Journal of Experimental Education, 51(2), 89–93.
  • Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. New York: Houghton Mifflin Company.
  • Shepard, L. (2008). Formative assessment: Caveat emptor. In C. A. Dwyer (Ed.), The future of assessment: Shaping teaching and learning (pp. 279–303). New York: Lawrence Erlbaum Associates.
  • Shuttleworth, M. (2009). Pretest-posttest designs.
  • Simmons, R. (1994). The horse before the cart: Assessing for understanding. Educational Leadership, 51(5), 22–23.
  • Stiggins, R. J. (2008). Correcting errors of measurement that sabotage student learning. In C. A. Dwyer (Ed.), The future of assessment: Shaping teaching and learning (pp. 229–244). New York: Lawrence Erlbaum Associates.
  • Tittle, C. K. (1994). Toward an educational psychology of assessment for teaching and learning: Theories, contexts, and validation arguments. Educational Psychologist, 29(3), 149–162. doi: 10.1207/s15326985ep2903_4.
  • Vogt, W. P. (2005). Dictionary of statistics and methodology: A nontechnical guide for the social sciences (3rd ed.). Thousand Oaks, CA: Sage Publications.
  • Wang, M. C., Haertel, G. D., & Walberg, H. J. (1993). Toward a knowledge base for school learning. Review of Educational Research, 63(3), 249–294. doi: 10.2307/1170546.
  • White, B. C., & Frederiksen, J. (1998). Inquiry, modeling, and metacognition: Making science accessible to all students. Cognition and Instruction, 16(1), 3–118. doi: 10.1207/s1532690xci1601_2.
  • Wiggins, G. (1993). Assessing student performance: Exploring the purpose and limits of testing. San Francisco, CA: Jossey-Bass Publishers.
  • Wiliam, D., & Thompson, M. (2008). Integrating assessment with learning: What will make it work? In C. A. Dwyer (Ed.), The future of assessment: Shaping teaching and learning (pp. 53–82). New York: Lawrence Erlbaum Associates.
  • Wragg, T. (1997). Assessment & learning: Primary and secondary. New York: Routledge.