The next generation of school psychologists enters the field at a time when demands for accountability, within education and the broader workforce, have never been stronger (Kane, Taylor, Tyler, & Wooten, 2011). The intensification of demands for professional accountability comes at a time in our profession when school psychologists must demonstrate a greater breadth and depth of knowledge and skills to meet the needs of an increasingly diverse student population and provide a comprehensive array of services that have a direct, measurable impact (Ysseldyke et al., 2008). Across the nation, school psychologists face the challenge of advocating for an approach to performance evaluation that is as comprehensive and data driven as their professional practice. School psychologists need empirically based means to ensure that the services they provide are effective, efficient, and, most critically, have a positive impact on student outcomes. The selection of methods for evaluating the performance of school psychologists makes important statements about the professional competencies, characteristics, and practices that our profession values and encourages.
This article reviews the legislative context and historical trends for professional accountability for school psychologists, provides four key principles to consider in designing an accountability system for school psychologists, and outlines the advantages, disadvantages, and recommended guidelines for using case studies (i.e., single-case designs) and rubric-based approaches for evaluating school psychological services. An illustration of the use of case studies for measuring the impact of school psychological services will be presented in the context of a statewide effort for professional accountability in a field-based school psychology internship. Finally, implications for the graduate preparation of school psychologists are discussed.
The demand for accountability within education was intensified with the U.S. Department of Education's Race to the Top competition (Kane et al., 2011). Race to the Top, a $4.35-billion competitive grant program designed to reward states for implementing comprehensive educational reform, requires states to address teacher quality as a priority and adopt teacher evaluation procedures that assess student growth as a significant factor. Student growth was defined by the U.S. Department of Education in the Race to the Top application as the change in student achievement for an individual student between two or more points in time.
The pressure to develop policies and practices for evaluating teacher effects using student growth has exerted a tremendous sense of urgency for Race to the Top recipients and states seeking Race to the Top awards. Although teachers are at the center of the debate over how much weight should be given to student achievement within a performance assessment and what measures or indicators best represent student growth, school psychologists and other related services personnel recognize that they will soon be compelled to meet similar accountability standards (Skalski, 2011). The rationale for holding teachers, administrators, and related services personnel accountable for student achievement is that a shared accountability system will encourage collaboration among educators and produce better outcomes (Steele, Hamilton, & Stecher, 2010). Yet, the use of student achievement data is far more complicated for school psychologists and other related service providers who provide indirect service delivery to students (Minke, 2011; Skalski, 2011). School psychologists typically are not directly involved in instruction, their roles vary widely, and their time is often split among multiple schools. Unlike teachers, who could be evaluated using classroom-level data, or principals, who could be evaluated using school-level data, school psychologists are not directly responsible for the academic achievement of a discrete group of students. Thus, the primary challenge facing school psychologists regarding their own professional accountability involves addressing the question, “How do school psychologists demonstrate the impact of their professional practices on student growth?”
Professional Standards for Accountability
School psychologists are uniquely qualified to address questions regarding their impact on student outcomes and design their own evaluation system. Practicing school psychologists have expertise in measurement theory, including a deep understanding of the critical importance of reliability, validity, and utility, along with knowledge of a variety of methods for measuring performance (e.g., observations, rating scales, and tests; Prus & Waldron, 2008). Data-based decision-making skills critical to assessing students’ needs apply equally well to professional accountability decision making. In both arenas, school psychologists must demonstrate appreciation for the important implications of assessment results (Prus & Waldron, 2008).
According to the National Association for School Psychologists’ (NASP, 2010a) Model for Comprehensive and Integrated School Psychological Services, professional practices associated with research and program evaluation include the use of techniques for data collection, analysis, and accountability in evaluation of services at the individual, group, and system levels. School psychologists are expected to apply knowledge of evidence-based interventions and programs in designing, implementing, and evaluating the fidelity and effectiveness of school-based intervention plans (NASP, 2010a). The NASP Standards for Graduate Preparation of School Psychologists explicitly emphasize the need for professional accountability for outcomes in setting forth the expectation that school psychologists deliver a comprehensive range of professional practices that result in “direct, measurable outcomes for children, families, schools and/or other consumers” (NASP, 2010c, p. 4).
Key Principles for Evaluating the Performance of School Psychologists
From a review of the related literature on evaluating the performance of school psychologists in the field and during their graduate preparation (Prus & Waldron, 2008), four key principles emerge as critical to a credible performance evaluation system: (a) the use of multiple measures, including at least one measure of impact on student outcomes; (b) reliability and validity, with validity anchored to the 2010 NASP Practice Model; (c) utility for distinguishing different levels of proficiency; and (d) linkage to professional development and improvement (see Table 1). These four key principles align well with the policy guidelines set by the National Alliance of Pupil Services Organizations (NAPSO) to assist states in considering how to best apply student achievement outcomes to educator evaluation systems (www.napso.org). NAPSO's (2011) recommendations include a central role for school psychologists and other support personnel in the creation of evaluation systems designed to determine their competence. These evaluation systems should be research based and should consider professional preparation and practice models supported by the national organizations responsible for advancing research and practice for each distinct profession. NAPSO (2011) also advocates for the use of evaluators with expertise in the roles, responsibilities, and job functions specific to the position they are evaluating, so that they understand the unique practices and foundational knowledge of the profession and the specific demands, needs, and requirements of each position. Appropriately credentialed evaluators are also critical for providing meaningful feedback. Finally, evaluation systems must use multiple measures in evaluating professional performance (NAPSO, 2011).
Table 1. Key Principles in a Performance Evaluation System for School Psychologists
|1. The use of multiple measures, including at least one measure of impact on student outcomes.|
|2. Reliability and validity, with validity anchored to the 2010 NASP Practice Model.|
|3. Utility for distinguishing different levels of proficiency.|
|4. Linkage to professional development and improvement.|
Performance Appraisal Rubrics
The use of multiple measures in a performance evaluation system is essential as a means of capitalizing on the advantages and minimizing the disadvantages of any single method (Prus & Waldron, 2008). Traditionally, however, performance appraisal rubrics and rating scales have been used as the sole measure of a school psychologist's performance. Typically, performance appraisal rubrics are adapted from instruments used to evaluate teachers (e.g., Marzano Model, Charlotte Danielson's Framework for Teaching) or administrators, depending on whether school psychologists are under a teacher or administrative contract. Two examples of performance standards based on Charlotte Danielson's Framework for Teaching include the standards developed by the Cincinnati Public Schools (2009; see Table 1) and the framework developed by the Delaware Department of Education (2012). An example of a performance appraisal rubric based in part on the NASP Standards can be obtained from the Indiana Association of School Psychologists website (www.iasponline.org).
Far too often, a performance appraisal rubric or rating scale is the sole measure used to evaluate school psychologists’ professional competencies, and observations are conducted in one setting of one professional activity (e.g., leading a meeting among teachers and parents). Although many school districts have developed, or are in the process of developing, performance appraisal rubrics or rating scales, the evaluator is not necessarily someone with professional knowledge and a background in school psychology (i.e., a building principal, assistant principal, or district administrator without school psychology credentials or affiliations).
Performance appraisal rubrics and rating scales have several advantages: they can be aligned with professional training standards; they provide a direct measure of skills and behaviors in the settings in which those skills and behaviors are expected to be performed; they are generally accepted, and expected, in school contexts; and there is a sense of fairness in that school psychologists are evaluated using the same type of measure used to evaluate teachers or administrators (Prus & Waldron, 2008).
Performance appraisal rubrics and rating scales also have several disadvantages identified by Prus and Waldron (2008) that may place limits on their reliability, validity, and utility for use as a measure within a comprehensive performance evaluation system. First, ratings can be quite subjective, especially if provided by a single evaluator. This is a particular concern in situations in which the single evaluator does not have the background knowledge and expertise in school psychology needed to evaluate the more complex aspects of professional practice (e.g., data-based decision making, assessment). In these instances, the non-school psychologist evaluator frequently bases the appraisal on professional competencies that are on public display (e.g., conducting meetings with parents and teachers), severely limiting the comprehensiveness of the performance evaluation. Reliability and validity are sacrificed to the degree that the evaluator is not able to discern competent practice or is disinclined to report less than competent practice (Yariv, 2006). Likewise, the utility of a performance appraisal rubric is reduced if the evaluator is unable to distinguish different levels of proficiency.
A second and related limitation of performance appraisal rubrics is that actual observations of some situations (e.g., counseling, conflict resolution) may be difficult due to concerns about confidentiality, the potential impact of observers on clients, or the low frequency in which the circumstances requiring these skills may occur (Prus & Waldron, 2008). Similarly, the results for school psychologists may vary as a function of the school setting (e.g., expectations for practice) and rater (e.g., building administrator, supervising school psychologist). Individuals responsible for conducting evaluations of school psychologists point out that the time and effort required to complete performance appraisal rubrics can be overwhelming, particularly if many competencies are to be assessed (Prus & Waldron, 2008).
Performance appraisal rubrics may be incorporated as one measure in a comprehensive performance evaluation system comprising multiple measures if school districts follow the guidelines put forth by Prus and Waldron (2008). Rubrics and ratings scales must have specific, operational criteria for observing and appraising performance. Additionally, rigorous training in the use of the measure must be provided to all evaluators. Specific operational criteria and rigorous training are critically important for all evaluators, and particularly for non-school psychologists who may serve as an evaluator of school psychologists. Prus and Waldron (2008) recommend that each school psychologist is rated by more than one source (e.g., building administrator, supervising school psychologist), and the performance of a school psychologist should be assessed in multiple situations and settings over time.
Case Studies (Single-Case Designs)
Single-case designs are widely considered to be one of the best methods for evaluating intervention effectiveness and linking practitioner efforts to student growth over time. Although school psychologists are not typically involved in the direct implementation of academic and behavioral interventions, they do play an essential role in collaborative problem solving with individual teachers as a member of a problem-solving team. Assessing student outcomes in response to increasingly intensive interventions in the context of a multitiered system of support is an outcomes-based approach to evaluating a school psychologist's consultation skills and knowledge of evidence-based academic and behavioral interventions. As school psychology practitioners, we are required by NASP Standards to link our professional practices to direct, measurable outcomes, regardless of whether the practices involve direct services (e.g., behavior contingency contracts, academic tutoring, counseling) or indirect services (e.g., consultation).
The basic AB (case study) single-case design can be highly effective in documenting a student's baseline level of performance as well as academic and/or behavior changes over the course of an intervention (Bloom, Fischer, & Orme, 2005). The essential steps for gathering case study data involve (a) selecting an outcome measure, (b) collecting baseline data, (c) implementing an intervention, and (d) collecting ongoing data (Steege, Brown-Chidsey, & Mace, 2002). Baseline data should be collected for a duration sufficient to document that the behavior is stable. Case study data need to be visually displayed to discern whether there have been changes in trend, level, or variability. Other standard methods of outcome determination can be used based on data from AB designs, such as goal attainment scaling (GAS), percentage of nonoverlapping data (PND), and effect size (ES).
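The data-gathering steps above can be sketched in code. The following is a minimal illustration of organizing baseline (Phase A) and intervention (Phase B) observations for an AB design and summarizing the change in level between phases; the data, the `is_stable` helper, and its stability threshold are hypothetical examples, not part of any published protocol:

```python
from statistics import mean, stdev

# Hypothetical daily counts of a disruptive behavior for one student
baseline = [8, 9, 7, 8, 9]          # Phase A: collected until stable
intervention = [7, 6, 5, 4, 4, 3]   # Phase B: ongoing data during intervention

def is_stable(data, threshold=0.2):
    """One rough stability check (illustrative only): treat the baseline
    as stable if its standard deviation is under threshold * mean."""
    return stdev(data) < threshold * mean(data)

# Level change: difference between phase means (negative = behavior decreased)
level_change = mean(intervention) - mean(baseline)

print(f"Baseline stable: {is_stable(baseline)}")
print(f"Level change: {level_change:.2f}")
```

In practice these summaries supplement, rather than replace, the visual analysis of trend, level, and variability described above.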
Although case study designs are not adequate to establish internal validity definitively, as is the case with more rigorous single-case designs (e.g., ABA or ABAB), Kazdin (1981) has argued that the use of specific methodologies can maximize the extent to which valid inferences can be drawn from case studies, enabling case study designs (also known as accountability designs) to play an important role in the overall framework of evidence-based practices. Specific methodologies that increase the strength of case study designs to serve accountability purposes include (a) the use of direct observations of operationally defined student behaviors to yield objective data whose reliability and validity can be assessed, in contrast to anecdotal information; (b) multiple assessment occasions prior to and during the implementation of an intervention, in contrast to a single assessment before and after the intervention; and (c) repeated measures of a student's target behavior(s) to establish the range of preintervention and postintervention variability in the student's performance (Kazdin, 1981). Given these methodological features, case study designs can provide reasonable evidence that the intervention services being provided by a school psychologist are producing the desired results (Brown-Chidsey, Steege, & Mace, 2008).
Drawing a distinction between accountability for research and accountability for practice may help clarify the role of case study designs. In research, rigorous experimental designs are required to establish the internal validity of a novel intervention approach if the new intervention is to be disseminated as evidence based (Brown-Chidsey et al., 2008). By contrast, school psychologists’ practice involves the delivery of well-established, research-based intervention approaches and the documentation of their effectiveness. Indeed, federal, state, and agency regulations require the documentation of intervention effectiveness, and school psychologists have an ethical responsibility to do so (Polaha & Allen, 1999; Steege et al., 2002). A parallel can be drawn to primary care physicians who are expected to show that their recommended treatments had the desired effects over time for a variety of concerns, but they are not obligated to conduct double-blind randomly controlled trials with patients as part of their routine practice to demonstrate accountability. Thus, case study designs can and should be used as part of a comprehensive approach to demonstrating accountability in practice. The strength of the evidence is further enhanced when case study designs are incorporated into a school psychologist's routine practice and multiple replications are demonstrated over time (Steege et al., 2002).
The use of case study designs for performance evaluation shares many of the advantages and disadvantages of portfolio assessments used in graduate preparation programs. The advantages include the ability to represent multiple samples of work over time, thus reflecting a practitioner's knowledge and skill development across settings while avoiding the problems inherent with one-shot measurement occasions (Prus & Waldron, 2008). Case study designs can be used to measure the effectiveness of interventions targeting individuals or small groups, or at a classwide or systems level (Polaha & Allen, 1999; Steege et al., 2002). Case studies further allow for flexibility in assessing a variety of professional competencies (e.g., data-based decision making, problem solving, consultation, academic and behavioral intervention design, communication skills) in the natural context in which the school psychologist works, thus enabling low-inference evaluative judgments to be made regarding the practitioner's performance (Prus & Waldron, 2008). Finally, case study designs increase school psychologists’ participation in the performance evaluation process.
The primary disadvantage of case study designs is that, as a descriptive approach (also referred to as pre-experimental), AB single-case designs do not completely address all plausible rival hypotheses, nor do they control for threats to internal validity (Cook & Campbell, 1979). Consequently, the school psychologist is unable to conclude with any confidence that changes in student performance were the direct result of the intervention (Brown-Chidsey et al., 2008).
A second limitation of case study designs for performance evaluation is that cases depend largely on the opportunities school psychologists have in the settings in which they work, which may vary by unique role and context variables (Prus & Waldron, 2008). Given that potentially high-stakes performance evaluation decisions may be based on case study demonstrations, the extent to which samples represent the school psychologist's independent ability rather than the product of other collaborators may be a concern (Prus & Waldron, 2008). A final limitation is the recognition that collecting, analyzing, and aggregating case study data may involve knowledge and skills not previously mastered by the school psychologist.
Case study designs may be incorporated as a measure in a comprehensive performance evaluation system comprising multiple measures if school districts adhere to the following guidelines. First, the case study approach needs to have clear, published expectations for content and the evaluation criteria, including exemplars (Prus & Waldron, 2008). The case study process developed as part of NASP's National School Psychology Certification System for candidates from non–NASP-approved programs includes a rubric for evaluating the quality of case studies (NASP, 2010b). It is recommended that each case study or collection of case studies be rated by more than one trained professional, that inter-rater reliability be monitored, and that recalibration be completed periodically, as needed (Prus & Waldron, 2008). Practically speaking, submitted work should be limited to a volume that can be thoroughly and effectively evaluated by raters (Prus & Waldron, 2008). A cost-effective approach may involve submitting case studies electronically and having evaluators review them over the summer months. Given that submitted case studies involve actual cases, it will be critical that school psychologists remove all identifiable student and consultee (i.e., teacher/parent) information from all submitted materials (Prus & Waldron, 2008). To verify the authenticity of the case study's implementation and outcomes, however, procedures need to be established for a third-party “sign off” from an impartial administrator or supervisor familiar with the intervention and its outcomes. Finally, it should be recognized that case studies submitted as part of a performance evaluation system will likely represent a school psychologist's best work and need to be evaluated as such (Prus & Waldron, 2008).
Measuring Impact Using Case Studies: The Ohio Internship Program in School Psychology
The evaluation of the Ohio Internship Program illustrates how case studies can be used to evaluate the impact of school psychological services. The Ohio Internship Program is a collaboration among the Ohio Department of Education, Office for Exceptional Children, and Ohio's nine school psychology graduate preparation programs. Nearly 100 school psychology graduate students complete their internship each year in the state-funded Ohio Internship Program. Emphasis in accountability for school psychological services and shifts toward evidence-based intervention decisions led to the development of a model of the evaluation of the statewide internship experience with regard to outcomes for schools and students (Morrison et al., 2011; Morrison, Graden, & Barnett, 2009).
The evaluation of the Ohio Internship Program comprises three components. The first is a measure of intern competencies. To assess the development of interns’ skills and competencies during the internship, university-developed rating scales were completed by internship field supervisors at the beginning, midpoint, and end of the internship. The second is a measure of the number of students served by each intern based on the professional practice logs they were required to maintain throughout the school year. For this output measure, interns are asked to report the number of students served at each tier within a multitiered system of support: Tier 1—universal-/system-level practices, such as Positive Behavior Support planning and universal screening for instructional decision making; Tier 2—supplemental/targeted interventions; and Tier 3—intensive/individualized interventions. The third component of the evaluation of the Ohio Internship Program is a measure of the impact of intervention services using a case study approach. Interns are asked to provide outcome data for six individual, targeted, and universal interventions in which they were meaningfully involved. The interventions for which outcome data are required include three academic interventions (Tiers 1–3) and three social/behavior interventions (Tiers 1–3). The interventions for which outcome data are provided are judged by the interns to be exemplars of the support services they provided during their internship year.
Goal Attainment Scaling. GAS is the primary method used for summarizing intervention outcomes for students served by school psychology interns. As a supplement to the GAS process, two additional summary statistics are calculated, in instances where such calculations are appropriate, to measure the effects of an intervention provided by the interns: the PND and ES.
The GAS process involves the development of a 5-point scale for measuring goal attainment as outlined by Kiresuk, Smith, and Cardillo (1994). In this evaluation model, “Expected Level of Outcome” is replaced with “No Change” to better represent students’ responses to the intervention. Thus, positive ratings reflect a positive change in the target, and negative ratings reflect a change in an undesired direction for the target. The other scale anchors remained the same: “Somewhat More Than Expected,” “Somewhat Less Than Expected,” “Much More Than Expected,” and “Much Less Than Expected.”
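The modified 5-point scale described above can be sketched as a simple lookup. This is an illustrative fragment only; in practice each level of a GAS scale is operationally defined for the individual student's goal, and the rating values shown here (−2 to +2) follow the common GAS convention:

```python
# Modified GAS anchors used in this evaluation model:
# "No Change" replaces the conventional "Expected Level of Outcome" at 0.
gas_scale = {
    2: "Much More Than Expected",
    1: "Somewhat More Than Expected",
    0: "No Change",
    -1: "Somewhat Less Than Expected",
    -2: "Much Less Than Expected",
}

def gas_rating(score):
    """Return the anchor label for a GAS rating from -2 to +2."""
    return gas_scale[score]

print(gas_rating(1))
```

Positive ratings thus always indicate change in the desired direction for the target, regardless of whether the goal is to increase or decrease a behavior.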
Reviews of the reliability and validity of many applications of GAS procedures are available in Cardillo and Smith (1994) and Smith and Cardillo (1994), respectively. Studies that used a 5-point scale (similar to the approach used herein) reported interrater reliability indices between .87 and .93 (as cited in Cardillo & Smith, 1994). Test–retest reliability also was acceptable (e.g., correlation of r = .84 over a 2- to 3-week period; see studies reported in Cardillo & Smith, 1994). In school settings, the use of GAS methodology has been demonstrated to be of significant value in the evaluation of intervention-based change and is “a more accurate estimate than any other measure” (Sladeczek, Elliott, Kratochwill, Robertson-Mjaanes, & Stoiber, 2001, p. 52). GAS validity evidence includes analyses of many types of intervention outcomes, including school-based interventions (see Kratochwill, Elliott, & Busse, 1995). GAS has been found to be responsive to measuring diverse functional goals across services and sensitive to measuring intervention-induced change, making it a strong outcome measure for groups of students in which the rate of progress varies (MacKay, McCool, Cheseldine, & McCartney, 1993). A summary of the research regarding the utility and acceptability of GAS for measuring students’ progress can be found in Roach and Elliott (2005).
Percentage of Nonoverlapping Data. Calculating the PND involves counting the number of intervention data points that exceed the highest baseline point (for studies seeking to increase a target behavior) or counting the number of intervention data points lower than the lowest baseline point (for studies seeking to decrease a target behavior). The number of nonoverlapping data points is then divided by the total number of intervention points to obtain the PND. PND has been found to produce a summary statistic that is consistent with the outcomes obtained through visual analysis of individual participant graphs (Olive & Smith, 2005). PND should not be calculated when a baseline data point of zero is present in decreasing behavior studies or an extremely high baseline data point is present in increasing behavior studies (Scruggs & Mastropieri, 1998; Scruggs, Mastropieri, & Casto, 1987).
The use of PND as a summary statistic that is easy to calculate and interpret has wide support in the research literature (Mathur, Kavale, Quinn, Forness, & Rutherford, 1998). Ratings using PND are judged on the following scale: a PND greater than or equal to 90% is considered “Highly Effective,” a PND of 70% to less than 90% is judged as “Moderately Effective,” a PND of 50% to less than 70% is considered “Mildly Effective,” and a PND of less than 50% is rated as “Ineffective” (Scruggs, Mastropieri, Cook, & Escobar, 1986).
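The PND calculation and the effectiveness categories above can be expressed directly in code. The following sketch (with hypothetical data) implements the counting rule for both increasing- and decreasing-behavior goals and applies the Scruggs, Mastropieri, Cook, and Escobar (1986) rating bands:

```python
def pnd(baseline, intervention, goal="decrease"):
    """Percentage of nonoverlapping data for an AB design."""
    if goal == "decrease":
        # Count intervention points below the lowest baseline point
        nonoverlap = sum(x < min(baseline) for x in intervention)
    else:
        # Count intervention points above the highest baseline point
        nonoverlap = sum(x > max(baseline) for x in intervention)
    return 100 * nonoverlap / len(intervention)

def pnd_rating(p):
    """Effectiveness categories from Scruggs et al. (1986)."""
    if p >= 90:
        return "Highly Effective"
    if p >= 70:
        return "Moderately Effective"
    if p >= 50:
        return "Mildly Effective"
    return "Ineffective"

# Hypothetical data: intervention aimed at decreasing a behavior
baseline = [8, 9, 7, 8, 9]
intervention = [7, 6, 5, 4, 4, 3]
p = pnd(baseline, intervention, goal="decrease")
print(f"PND = {p:.1f}% ({pnd_rating(p)})")
```

Note that, per the caveat above, this calculation would not be applied when a baseline point of zero (decreasing-behavior goal) or an extreme baseline ceiling (increasing-behavior goal) makes nonoverlap impossible.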
Effect Size. There are many ES estimation methods (Busk & Serlin, 1992; Thompson, 2007). ES in this evaluation model was calculated as the change in achievement or behavior relative to the baseline (control) standard deviation (Busk & Serlin, 1992). As a general guide for outcomes without much specific prior evidence for comparisons, interventions that yield an ES greater than or equal to 0.80 are considered to have a large effect; an ES between 0.50 and 0.79 represents a moderate effect, whereas an ES between 0.20 and 0.49 reflects a small effect.
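The Busk and Serlin (1992) calculation described above divides the change in mean performance by the baseline standard deviation. A minimal sketch, with hypothetical reading-fluency data and an interpretation helper based on the general-guide benchmarks in the text (the "negligible" label for values below 0.20 is an added assumption, as the text does not name that band):

```python
from statistics import mean, stdev

def effect_size(baseline, intervention):
    """Change in achievement or behavior relative to the baseline
    (control) standard deviation (Busk & Serlin, 1992). Positive
    values indicate the intervention mean exceeds the baseline mean."""
    return (mean(intervention) - mean(baseline)) / stdev(baseline)

def es_interpretation(es):
    """General-guide benchmarks from the text, applied to |ES|.
    The 'negligible' label below 0.20 is a hypothetical addition."""
    magnitude = abs(es)
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.20:
        return "small"
    return "negligible"

# Hypothetical weekly words-read-correctly scores
baseline = [20, 22, 21, 19]
intervention = [28, 30, 31, 33]
es = effect_size(baseline, intervention)
print(f"ES = {es:.2f} ({es_interpretation(es)})")
```

Because the denominator is the baseline standard deviation, very stable baselines can produce large ES values even for modest absolute gains, which is one reason multiple summary statistics (GAS, PND, ES) are reported together in this evaluation model.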
Implications for Evaluating School Psychologists Using Ohio's Internship Model
The Ohio Internship Program in School Psychology provides future school psychologists an opportunity to model professional accountability practices that can be used throughout their careers to demonstrate impact. With its emphasis on field supervisors’ ratings of interns’ competencies over time (Fall, Winter, Spring) and direct measures of student growth using case studies, the Ohio Internship Program offers a reliable and valid approach to evaluating the effectiveness of a school psychologist that can be linked to the 2010 NASP Practice Model. Field supervisors’ ratings of intern competencies have utility for distinguishing different levels of proficiency based on their thorough knowledge of best practices in the field of school psychology. Field supervisors’ ratings and case study outcomes can be used to identify targets for further professional development.
At a state or local level, the Ohio Internship Program is a viable approach to meeting the demand for professional accountability for school psychological services. The Ohio Internship Program incorporates the four key principles of a credible performance evaluation system: (a) the use of multiple measures, including at least one measure of impact on student outcomes; (b) reliability and validity, with validity anchored to the 2010 NASP Practice Model; (c) utility for distinguishing different levels of proficiency; and (d) linkage to professional development and improvement. As demonstrated by the Ohio Internship in School Psychology model, the ratings of professional competencies, coupled with evidence of positive student academic and behavioral outcomes, provide compelling evidence of the collective impact of school psychological services.