<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><channel rdf:about="http://onlinelibrary.wiley.com/rss/journal/10.1111/(ISSN)1745-3992" xmlns="http://purl.org/rss/1.0/"><title>Educational Measurement: Issues and Practice</title><description>Wiley Online Library: Educational Measurement: Issues and Practice</description><link>http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2F%28ISSN%291745-3992</link><dc:publisher xmlns:dc="http://purl.org/dc/elements/1.1/">John Wiley &amp; Sons, Inc.</dc:publisher><dc:language xmlns:dc="http://purl.org/dc/elements/1.1/">en</dc:language><dc:rights xmlns:dc="http://purl.org/dc/elements/1.1/">© National Council on Measurement in Education</dc:rights><prism:issn xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">0731-1745</prism:issn><prism:eIssn xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">1745-3992</prism:eIssn><dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2013-03-01T00:00:00-05:00</dc:date><prism:coverDisplayDate xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">Spring 2013</prism:coverDisplayDate><prism:volume xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">32</prism:volume><prism:number xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">1</prism:number><prism:startingPage xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">1</prism:startingPage><prism:endingPage xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">55</prism:endingPage><image rdf:resource="http://onlinelibrary.wiley.com/store/10.1111/emip.2013.32.issue-1/asset/cover.gif?v=1&amp;s=218fd3aaa18ab4adbe0e4ad98cac18b59a94d411"/><items><rdf:Seq><rdf:li rdf:resource="http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12006"/><rdf:li rdf:resource="http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12000"/><rdf:li rdf:resource="http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12001"/><rdf:li 
rdf:resource="http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12002"/><rdf:li rdf:resource="http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12003"/><rdf:li rdf:resource="http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12004"/></rdf:Seq></items></channel><item rdf:about="http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12006" xmlns="http://purl.org/rss/1.0/"><title>Editorial</title><link>http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12006</link><dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Editorial</dc:title><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/"/><dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2013-03-26T00:49:31.964803-05:00</dc:date><dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">doi:10.1111/emip.12006</dc:identifier><dc:rights xmlns:dc="http://purl.org/dc/elements/1.1/"/><dc:publisher xmlns:dc="http://purl.org/dc/elements/1.1/">John Wiley &amp; Sons, Inc.</dc:publisher><prism:doi xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">10.1111/emip.12006</prism:doi><prism:url xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12006</prism:url><prism:startingPage xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">1</prism:startingPage><prism:endingPage xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">1</prism:endingPage><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[]]></content:encoded><description/></item><item rdf:about="http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12000" xmlns="http://purl.org/rss/1.0/"><title>The Philosophical Aspects of IRT Equating: Modeling Drift to Evaluate Cohort Growth in Large-Scale Assessments</title><link>http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12000</link><dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">The Philosophical Aspects of IRT Equating: 
Modeling Drift to Evaluate Cohort Growth in Large-Scale Assessments</dc:title><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Husein Taherbhai, Daeryong Seo</dc:creator><dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2013-03-26T00:49:31.964803-05:00</dc:date><dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">doi:10.1111/emip.12000</dc:identifier><dc:rights xmlns:dc="http://purl.org/dc/elements/1.1/"/><dc:publisher xmlns:dc="http://purl.org/dc/elements/1.1/">John Wiley &amp; Sons, Inc.</dc:publisher><prism:doi xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">10.1111/emip.12000</prism:doi><prism:url xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12000</prism:url><prism:startingPage xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">2</prism:startingPage><prism:endingPage xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">14</prism:endingPage><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[
<div class="para" xmlns:ol="http://www.wiley.com/namespaces/ol/xsl-lib" xmlns="http://www.w3.org/1999/xhtml"><p><em>Calibration and equating is the quintessential necessity for most large-scale educational assessments. However, there are instances when no consideration is given to the equating process in terms of context and substantive realization, and the methods used in its execution.</em></p></div>
<div class="para" xmlns="http://www.w3.org/1999/xhtml"><p><em>In the view of the authors, equating is not merely an exhibit of the statistical methodology, but it is also a reflection of the thought process undertaken in its execution. For example, there is hardly any discussion in the literature of the ideological differences in the selection of an equating method. Furthermore, there is little evidence of modeling cohort growth through an identification and use of construct-relevant linking items’ drift, using the common-item nonequivalent groups equating design. In this article, the authors philosophically justify the use of Huynh's statistical method for the identification of construct-relevant outliers in the linking pool. The article also dispels the perception of scale instability associated with the inclusion of construct-relevant outliers in the linking item pool and concludes that an appreciation of the rationale used in the selection of the equating method, together with the use of linking items in modeling cohort growth, can be beneficial to practitioners.</em></p></div>
]]></content:encoded><description>
Calibration and equating is the quintessential necessity for most large-scale educational assessments. However, there are instances when no consideration is given to the equating process in terms of context and substantive realization, and the methods used in its execution.
In the view of the authors, equating is not merely an exhibit of the statistical methodology, but it is also a reflection of the thought process undertaken in its execution. For example, there is hardly any discussion in the literature of the ideological differences in the selection of an equating method. Furthermore, there is little evidence of modeling cohort growth through an identification and use of construct-relevant linking items’ drift, using the common-item nonequivalent groups equating design. In this article, the authors philosophically justify the use of Huynh's statistical method for the identification of construct-relevant outliers in the linking pool. The article also dispels the perception of scale instability associated with the inclusion of construct-relevant outliers in the linking item pool and concludes that an appreciation of the rationale used in the selection of the equating method, together with the use of linking items in modeling cohort growth, can be beneficial to practitioners.
</description></item><item rdf:about="http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12001" xmlns="http://purl.org/rss/1.0/"><title>Assessing a Critical Aspect of Construct Continuity When Test Specifications Change or Test Forms Deviate from Specifications</title><link>http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12001</link><dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Assessing a Critical Aspect of Construct Continuity When Test Specifications Change or Test Forms Deviate from Specifications</dc:title><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Jinghua Liu, Neil J. Dorans</dc:creator><dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2013-03-26T00:49:31.964803-05:00</dc:date><dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">doi:10.1111/emip.12001</dc:identifier><dc:rights xmlns:dc="http://purl.org/dc/elements/1.1/"/><dc:publisher xmlns:dc="http://purl.org/dc/elements/1.1/">John Wiley &amp; Sons, Inc.</dc:publisher><prism:doi xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">10.1111/emip.12001</prism:doi><prism:url xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12001</prism:url><prism:startingPage xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">15</prism:startingPage><prism:endingPage xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">22</prism:endingPage><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[
<div class="para" xmlns:ol="http://www.wiley.com/namespaces/ol/xsl-lib" xmlns="http://www.w3.org/1999/xhtml"><p><em>We make a distinction between two types of test changes: inevitable deviations from specifications versus planned modifications of specifications. We describe how score equity assessment (SEA) can be used as a tool to assess a critical aspect of construct continuity, the equivalence of scores, whenever planned changes are introduced to testing programs. We also report on how SEA can be used as a quality control check to evaluate whether tests developed to a static set of specifications remain within acceptable tolerance levels with respect to equatability.</em></p></div>
]]></content:encoded><description>
We make a distinction between two types of test changes: inevitable deviations from specifications versus planned modifications of specifications. We describe how score equity assessment (SEA) can be used as a tool to assess a critical aspect of construct continuity, the equivalence of scores, whenever planned changes are introduced to testing programs. We also report on how SEA can be used as a quality control check to evaluate whether tests developed to a static set of specifications remain within acceptable tolerance levels with respect to equatability.
</description></item><item rdf:about="http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12002" xmlns="http://purl.org/rss/1.0/"><title>A Proposed Framework for Evaluating Alignment Studies</title><link>http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12002</link><dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">A Proposed Framework for Evaluating Alignment Studies</dc:title><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Susan L. Davis-Becker, Chad W. Buckendahl</dc:creator><dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2013-03-26T00:49:31.964803-05:00</dc:date><dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">doi:10.1111/emip.12002</dc:identifier><dc:rights xmlns:dc="http://purl.org/dc/elements/1.1/"/><dc:publisher xmlns:dc="http://purl.org/dc/elements/1.1/">John Wiley &amp; Sons, Inc.</dc:publisher><prism:doi xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">10.1111/emip.12002</prism:doi><prism:url xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12002</prism:url><prism:startingPage xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">23</prism:startingPage><prism:endingPage xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">33</prism:endingPage><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[
<div class="para" xmlns:ol="http://www.wiley.com/namespaces/ol/xsl-lib" xmlns="http://www.w3.org/1999/xhtml"><p><em>Evaluating the multiple characteristics of alignment has taken a prominent role in educational assessment and accountability systems given its attention in the No Child Left Behind legislation (NCLB). Leading to this rise in popularity, alignment methodologies that examined relationships among curriculum, academic content standards, instruction, and assessments were proposed as strategies to evaluate evidence of the intended uses and interpretations of test scores. In this article, we propose a framework for evaluating alignment studies based on similar concepts that have been recommended for standard setting (Kane). This framework provides guidance to practitioners about how to identify sources of validity evidence for an alignment study and make judgments about the strength of the evidence that may impact the interpretation of the results.</em></p></div>
]]></content:encoded><description>
Evaluating the multiple characteristics of alignment has taken a prominent role in educational assessment and accountability systems given its attention in the No Child Left Behind legislation (NCLB). Leading to this rise in popularity, alignment methodologies that examined relationships among curriculum, academic content standards, instruction, and assessments were proposed as strategies to evaluate evidence of the intended uses and interpretations of test scores. In this article, we propose a framework for evaluating alignment studies based on similar concepts that have been recommended for standard setting (Kane). This framework provides guidance to practitioners about how to identify sources of validity evidence for an alignment study and make judgments about the strength of the evidence that may impact the interpretation of the results.
</description></item><item rdf:about="http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12003" xmlns="http://purl.org/rss/1.0/"><title>Validating Student Score Inferences With Person-Fit Statistic and Verbal Reports: A Person-Fit Study for Cognitive Diagnostic Assessment</title><link>http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12003</link><dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Validating Student Score Inferences With Person-Fit Statistic and Verbal Reports: A Person-Fit Study for Cognitive Diagnostic Assessment</dc:title><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Ying Cui, Mary Roduta Roberts</dc:creator><dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2013-03-26T00:49:31.964803-05:00</dc:date><dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">doi:10.1111/emip.12003</dc:identifier><dc:rights xmlns:dc="http://purl.org/dc/elements/1.1/"/><dc:publisher xmlns:dc="http://purl.org/dc/elements/1.1/">John Wiley &amp; Sons, Inc.</dc:publisher><prism:doi xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">10.1111/emip.12003</prism:doi><prism:url xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12003</prism:url><prism:startingPage xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">34</prism:startingPage><prism:endingPage xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">42</prism:endingPage><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[
<div class="para" xmlns:ol="http://www.wiley.com/namespaces/ol/xsl-lib" xmlns="http://www.w3.org/1999/xhtml"><p><em>The goal of this study was to investigate the usefulness of person-fit analysis in validating student score inferences in a cognitive diagnostic assessment. In this study, a two-stage procedure was used to evaluate person fit for a diagnostic test in the domain of statistical hypothesis testing. In the first stage, the person-fit statistic, the hierarchy consistency index (HCI; </em><a href="#b2" rel="references:#b2"><em>Cui, 2007</em></a><em>; </em><a href="#b3" rel="references:#b3"><em>Cui &amp; Leighton, 2009</em></a><em>), was used to identify the misfitting student item-score vectors. In the second stage, students’ verbal reports were collected to provide additional information about students’ response processes so as to reveal the actual causes of misfits. This two-stage procedure helped to identify the misfits of item-score vectors to the cognitive model used in the design and analysis of the diagnostic test, and to discover the reasons for the misfits so that students’ problem-solving strategies were better understood and their performances were interpreted in a more meaningful way.</em></p></div>
]]></content:encoded><description>
The goal of this study was to investigate the usefulness of person-fit analysis in validating student score inferences in a cognitive diagnostic assessment. In this study, a two-stage procedure was used to evaluate person fit for a diagnostic test in the domain of statistical hypothesis testing. In the first stage, the person-fit statistic, the hierarchy consistency index (HCI; Cui, 2007; Cui &amp; Leighton, 2009), was used to identify the misfitting student item-score vectors. In the second stage, students’ verbal reports were collected to provide additional information about students’ response processes so as to reveal the actual causes of misfits. This two-stage procedure helped to identify the misfits of item-score vectors to the cognitive model used in the design and analysis of the diagnostic test, and to discover the reasons for the misfits so that students’ problem-solving strategies were better understood and their performances were interpreted in a more meaningful way.
</description></item><item rdf:about="http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12004" xmlns="http://purl.org/rss/1.0/"><title>A Synthesis of the Peer-Reviewed Differential Bundle Functioning Research</title><link>http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12004</link><dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">A Synthesis of the Peer-Reviewed Differential Bundle Functioning Research</dc:title><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Kathleen Banks</dc:creator><dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2013-03-26T00:49:31.964803-05:00</dc:date><dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">doi:10.1111/emip.12004</dc:identifier><dc:rights xmlns:dc="http://purl.org/dc/elements/1.1/"/><dc:publisher xmlns:dc="http://purl.org/dc/elements/1.1/">John Wiley &amp; Sons, Inc.</dc:publisher><prism:doi xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">10.1111/emip.12004</prism:doi><prism:url xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2Femip.12004</prism:url><prism:startingPage xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">43</prism:startingPage><prism:endingPage xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">55</prism:endingPage><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[
<div class="para" xmlns:ol="http://www.wiley.com/namespaces/ol/xsl-lib" xmlns="http://www.w3.org/1999/xhtml"><p><em>The purpose of this article was to present a synthesis of the peer-reviewed differential bundle functioning (DBF) research that has been conducted to date. A total of 16 studies were synthesized according to the following characteristics: tests used and learner groups, organizing principles used for developing bundles, DBF detection methods used, and types of bundles that indicated statistically significant DBF in the hypothesized direction on multiple occasions. The article concludes with a list of suggestions for individuals who conduct DBF research. For example, effect size guidelines should be established for interpreting the amount of DBF in bundles of items assessed with the simultaneous item bias test (SIBTEST), given that it is the most commonly used DBF procedure. This would reduce our reliance on statistical significance testing. General effect size guidelines are needed, as are guidelines for special circumstances such as small-sample cases. Other useful suggestions are offered as well.</em></p></div>
]]></content:encoded><description>
The purpose of this article was to present a synthesis of the peer-reviewed differential bundle functioning (DBF) research that has been conducted to date. A total of 16 studies were synthesized according to the following characteristics: tests used and learner groups, organizing principles used for developing bundles, DBF detection methods used, and types of bundles that indicated statistically significant DBF in the hypothesized direction on multiple occasions. The article concludes with a list of suggestions for individuals who conduct DBF research. For example, effect size guidelines should be established for interpreting the amount of DBF in bundles of items assessed with the simultaneous item bias test (SIBTEST), given that it is the most commonly used DBF procedure. This would reduce our reliance on statistical significance testing. General effect size guidelines are needed, as are guidelines for special circumstances such as small-sample cases. Other useful suggestions are offered as well.
</description></item></rdf:RDF>