Addressing Accountability Issues including Comparability in the Design and Implementation of an Innovative Assessment and Accountability System

Acknowledgements
Thanks to generous support from the Nellie Mae Education Foundation, KnowledgeWorks and the National Center for the Improvement of Educational Assessment (Center for Assessment) have partnered to help states better understand and leverage the new Innovative Assessment and Accountability Demonstration Authority authorized under the Every Student Succeeds Act (ESSA). The goal of this partnership is to help states identify and explore a set of readiness conditions that are critical to the development of a high-quality application and implementation process under this new authority. While we share a history of advocacy for next generation assessments, our organizations each bring a unique perspective to this work. KnowledgeWorks focuses on policy development, partnering with states, districts, and educators to identify and remove policy barriers that inhibit the growth of personalized learning. The Center for Assessment specializes in the design of assessment and accountability systems, helping states, districts, and other entities improve the quality of these systems and maximize student success.
Lyons, S., Marion, S.F., Pace, L., & Williams, M. (2016). Addressing Accountability Issues including Comparability in the Design and Implementation of an Innovative Assessment and Accountability System. www.knowledgeworks.org and www.nciea.org.
Ensuring and Evaluating Assessment Quality | 2

Table of Contents

Introduction
Purpose
Alignment to Theory of Action
Defining Comparability
Comparability by Design
Methods for Establishing a Strong Evidence Base to Support Claims of Comparability
State and District Roles
State Example
Summary
Additional Support
About
KnowledgeWorks
National Center for the Improvement of Educational Assessment
Nellie Mae Education Foundation

Introduction

This is the third in a series of policy and practice briefs produced by KnowledgeWorks and the National Center for the Improvement of Educational Assessment (Center for Assessment) designed to assist states in thinking through the opportunities and challenges associated with flexibility provided under the Every Student Succeeds Act (ESSA).1 These briefs help define “Readiness Conditions” for states considering applying for and successfully implementing an innovative assessment and accountability system as defined by the Demonstration Authority opportunity under ESSA. In addition to those that have already been published, the following briefs will be released over the next few months:

- Supporting Educators and Students through Implementation of an Innovative Assessment and Accountability System
- Evaluating and Continuously Improving an Innovative Assessment and Accountability System
- Establishing a Timeline and Budget for Design and Implementation of an Innovative Assessment and Accountability System
- Building Capacity and Stakeholder Support for Scaling an Innovative Assessment and Accountability System
Purpose

The Innovative Assessment and Accountability Demonstration Authority (hereafter known as the “innovative pilot” or the “Demonstration Authority”) provides states with an opportunity to collaborate with a sample of local districts to pilot a new kind of assessment and accountability system within the state. This system does not have to rely on statewide, standardized assessments as the sole indicator of student achievement, but instead may pilot different types of non-standardized assessments (e.g., instructionally embedded assessments, performance tasks) that may provide for some degree of local flexibility. Because states must incorporate assessment results from the pilot districts into the state accountability system alongside the results generated from the non-pilot districts, the assessment system must meet all of the same technical requirements as the state standardized assessments—e.g., alignment, validity, reliability, accessibility.2 Additionally, because the innovative pilot will take time to scale statewide, the state must ensure that the assessment systems are producing comparable results within pilot districts, among pilot districts, and importantly, across pilot and non-pilot districts.
The purpose of this brief is to support states in planning for a successful Demonstration Authority application by providing key conceptual and technical considerations related to promoting and evaluating comparability in an innovative assessment and accountability pilot. We begin with a discussion of alignment to the state’s theory of action so the pilot focuses on the intended goals of the system.
Next, we define comparability in the era of ESSA flexibility, and lay the groundwork for a common understanding of how evidence of comparability differs depending on the nature and use of the reported scores. We then delve deeply into how states could approach comparability from a design perspective, providing detailed examples of processes that states could use to support their intended comparability claims. We additionally provide descriptions of the state and local roles for ensuring comparability. Lastly, we provide a case study that details a key comparability practice from the innovative assessment and accountability system in New Hampshire.
For detailed information regarding the technical quality considerations of an innovative pilot, please refer to Brief #2, Ensuring and Evaluating Assessment Quality in the Design and Implementation of an Innovative Assessment and Accountability System.
Alignment to Theory of Action

As emphasized in the Project Narrative brief, the importance of a clear articulation of the state vision and the associated theory of action for attaining that vision cannot be overemphasized. Comparability is a critical goal whenever assessments are being used for accountability. This is especially true when states have incorporated some degree of local flexibility into their assessment systems. Providing for comparability within the initial design conceptualization of the system will be crucial to the success of the pilot.
In order to design and administer meaningful assessments that will change the way instruction and learning occur in the classroom, local educators will need to engage in rich discussions about what deep learning looks like for every grade level and content area. For example, defining the expectations for student performance in a competency-based education model requires that educators across the state have shared definitions of both the content standards and the required evidence for evaluating student competence relative to those standards. In this way, the beginnings of a comparability argument are baked into the learning system of the innovative pilot, and the assessment and accountability systems must capture them.
This brief provides examples of how states can achieve the goal of comparability by planning for it in the pilot design and in the processes and audits that comprise the new assessment and accountability system. Each of these design features should be borne out of an alignment with the overall theory of action for how learning is changing within the state, and how the pilot will ultimately bring about that change.
Defining Comparability

In educational measurement, comparability is usually premised on the notion of score interchangeability. If scores can be used interchangeably, that means the scores support the same interpretations about what students know and can do relative to the assessed content. Comparability is an accumulation of evidence to support claims about the meaning of test scores and whether scores from two or more tests can be used to support the same inferences and uses. While it is typical in the United States to support comparability by standardizing testing conditions (e.g., administration, scoring), we must acknowledge that score comparability is not necessarily at odds with flexibility.3 As an example, we provide accommodations for standardized assessments because we believe this type of “flexibility” actually improves our claims of score comparability by removing barriers to the assessed content. Just as changing the administration conditions for students with different abilities supports our notion of comparability, it could be argued that changing the mode of assessment (e.g., performance-based assessment) will provide better information about what students know and can do for students in different educational settings (e.g., competency-based) than we could glean from traditional standardized assessments.
Because claims of comparability are inherently tied to the interpretations and uses of the scores, comparability rests on what is being reported. This means that evidence used to support claims of comparability will differ depending on the nature (or grain-size) of the reported scores. For example, supporting claims of raw score interchangeability—the strongest form of comparability—would likely require the administration of a single assessment form with measurement properties that are the same across all respondents (i.e., measurement invariance). Any state assessment system with multiple assessment forms fails to meet this level of score interchangeability. Instead, the design of most state assessment systems aims to be comparable enough to support scale score interchangeability. This level of comparability typically requires that multiple test forms are designed to the same blueprint, administered under almost identical conditions, and scored using the same rules and procedures. Still, many states continue to struggle to meet this level of comparability (e.g., challenges with multiple modes of administration—paper-based, computer-based, and device-based). In this way, comparability is an evidence-based argument, and the strength of evidence needed will necessarily depend on the type of score being supported. As shown in Figure 1, comparability lies on a continuum that is based on both the degree of similarity in the assessed content and the granularity of the score being reported.4

Gong, B., & DePascale, C. (2013). Different but the same: Assessment “comparability” in the era of the Common Core State Standards. Washington, DC: Council of Chief State School Officers.
Winter, P. (2010). Comparability and test variations. In P. Winter (Ed.), Evaluating the comparability of scores from achievement test variations (pp. 1–11). Washington, DC: Council of Chief State School Officers.
The Demonstration Authority requires states to ensure that summative “annual determinations” (e.g., performance levels) are comparable. Comparability, therefore, must exist at the level of the annual determinations. This means that if a student is determined to be “proficient” relative to the grade-level content standards in one district in the state, then had that student been assigned to another district’s assessment system (either pilot or non-pilot), he or she could expect to be deemed proficient there as well. To support claims of comparability at the annual determination level, any pilot program will need to build in a number of processes and auditing mechanisms to create a strong evidence base for supporting the claims of comparability within each pilot district, among pilot districts, and across pilot and non-pilot districts.
Winter, P. (2010). Comparability and test variations. In P. Winter (Ed.), Evaluating the comparability of scores from achievement test variations (p. 5). Washington, DC: Council of Chief State School Officers.
Comparability by Design

The methods for gathering evidence to support a comparability claim are not merely a series of analyses; rather, they begin with the design of the innovative assessment and accountability pilot itself. In traditional standardized assessment programs, comparability is generally established by planning for it in the assessment system design (e.g., addressing the same learning targets in the same ways, embedding items), evaluating the degree of comparability achieved (e.g., analyses of differential item functioning), and then, if necessary, adjusting the measurement scales to account for differences (e.g., equating). Providing evidence of comparability for the innovative assessment system will require discussion related to each of these steps, even if the methods related to each step are necessarily different. Three key questions, shown in Figure 2 below, can guide the process of designing a pilot to produce comparable results—comparability by design.
Figure 2. Comparability by Design—Guiding Questions
The order of these guiding questions is exceedingly important. It will not be possible to evaluate the degree of comparability that these scores produce under different assessment systems if comparability has not been carefully planned for (e.g., through common items or tasks). Similarly, it will not be possible to calibrate results if the nature and magnitude of the adjustments are not known through a careful evaluation of the degree of comparability achieved across assessment systems. No amount of evaluation and calibration can fix a system that has not been carefully designed to produce scores that are likely to be comparable. Thus, garnering evidence to support comparability of the assessment system results will require thoughtful planning of the program processes that will promote comparability, and the program monitoring mechanisms that will evaluate comparability. Examples of how this could be done to support claims of comparability of results within pilot districts, among pilot districts, and across pilot and non-pilot districts are provided on the next few pages.