Dismissive Reviews in Education Policy Research
  # Author Co-author(s) Quote Type Title Source Link Funders Notes
1 John F. Pane   "Practitioners and policymakers seeking to implement personalized learning, lacking clearly defined evidence-based models to adopt, are creating custom designs for their specific contexts. Those who want to use rigorous research evidence to guide their designs will find many gaps and will be left with important unanswered questions about which practices or combinations of practices are effective. It will likely take many years of research to fill these gaps." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.1 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funded by the William and Flora Hewlett Foundation and Rand Corporation funders. (UCLA’s National Center for Research on Evaluation, Standards, and Student Testing (CRESST) is monitoring the extent to which the two consortia’s assessment development efforts are likely to produce tests that measure and support goals for deeper learning.) Pane devotes considerable text to claims that no prior research exists, except for another Rand study, and then, on p.7, admits that some relevant mastery learning studies from the 1980s do exist. He implies, however, that there were only one or a few; in fact, there were hundreds. There have also been thousands of studies of personalized instruction in conjunction with studies of special education, tutoring, teachers' aides, tracking, etc.
2 John F. Pane   "The purpose of this Perspective is to offer strategic guidance for designers of personalized learning programs to consider while the evidence base is catching up." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.1 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as row 1.
3 John F. Pane   "This guidance draws on theory, basic principles of learning science, and the limited research that does exist on personalized learning and its component parts." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.1 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as row 1.
4 John F. Pane   "Thus far, the research evidence on personalized learning as an overarching schoolwide model is sparse." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.4 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as row 1.
5 John F. Pane   "A team of RAND Corporation researchers conducted the largest and most-rigorous studies of student achievement effects to date." 1stness Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.4 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as row 1.
6 John F. Pane   "While we await the answers to those questions, substantial enthusiasm around personalized learning persists. Educators, policy makers, and advocates are moving forward without the guidance of conclusive research evidence." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.5 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as row 1.
7 John F. Pane   "In the absence of comprehensive, rigorous evidence to help select the personalized learning components most likely to succeed, what is the path forward? I suggest a few guiding principles aimed at using existing scientific knowledge and the best available resources." Denigrating Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.5 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as row 1.
8 John F. Pane   "However, more work is necessary to establish causal evidence that the concept leads to improved outcomes for students." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.9 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as row 1.
9 Lorraine M. McDonnell   "However, an essential question for those interested in the politics of education policy has not been central in past research: To what extent have recent accountability policies altered the politics of education? This article begins to address that question ..." Dismissive Educational Accountability and Policy Feedback, p.171 Educational Policy, 27(2) 170–189 https://journals.sagepub.com/doi/10.1177/0895904812465119 "The author received financial support from the William T. Grant Foundation for research presented in this article."  
10 Jinok Kim Joan L. Herman "However, the validity of existing criteria and procedures lack an empirical base; in fact, reclassification practices are formulated and implemented with little knowledge of the factors that may influence their success." Dismissive, Denigrating Understanding Patterns and Precursors of ELL Success Subsequent to Reclassification, p.1 CRESST Report 818, August, 2012 https://files.eric.ed.gov/fulltext/ED540604.pdf "The work reported herein was supported under the National Research and Development Centers, PR/Award Number R305A09058101, as administered by the U.S. Department of Education, Institute of Education Sciences."  
11 Jinok Kim Joan L. Herman "Because the research basis for making mainstreaming or reclassification decisions remains slim, it may not be surprising that criteria for reclassifying students from ELL to Reclassified as Fluent English Proficient (RFEP) status vary substantially across states, as documented by a recent report reviewing statewide practices related to ELLs." Dismissive Understanding Patterns and Precursors of ELL Success Subsequent to Reclassification, p.3 CRESST Report 818, August, 2012 https://files.eric.ed.gov/fulltext/ED540604.pdf "The work reported herein was supported under the National Research and Development Centers, PR/Award Number R305A09058101, as administered by the U.S. Department of Education, Institute of Education Sciences."  
12 Jinok Kim Joan L. Herman "Previous studies cited earlier have identified potential problems in current reclassification, qualitatively analyzed criteria, and student characteristics that may relate to high versus low redesignation rates, and examined related research questions, such as how long it takes for non native speakers to acquire ELP or be reclassified; but none of the existing literature has directly dealt with reclassification systems and their consequences, and more specifically with the consequences of various reclassification criteria." 1stness Understanding Patterns and Precursors of ELL Success Subsequent to Reclassification, p.6 CRESST Report 818, August, 2012 https://files.eric.ed.gov/fulltext/ED540604.pdf "The work reported herein was supported under the National Research and Development Centers, PR/Award Number R305A09058101, as administered by the U.S. Department of Education, Institute of Education Sciences."  
13 Laura S. Hamilton Brian M. Stecher, Kun Yuan "He also noted that 'virtually all of the arguments, both for and against standards, are based on beliefs and hypotheses rather than on direct empirical evidence' (p. 427). Although a large and growing body of research has been conducted to examine the effects of SBA, the caution Porter expressed in 1994 about the lack of empirical evidence remains relevant today." Denigrating Standards-Based Accountability in the United States: Lessons Learned and Future Directions, pp.157-158 Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 "Material in this paper has been adapted from a paper commissioned by the Center on Education Policy: Hamilton, L.S., Stecher, B.M., & Yuan, K. (2009) Standards-based Reform in the United States: History, Research, and Future Directions. Washington, DC: Center on Education Policy. Portions of this work were supported by the National Science Foundation under Grant No. REC-0228295." Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U.S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
14 Laura S. Hamilton Brian M. Stecher, Kun Yuan "High-quality research on the effects of SBA is difficult to conduct for a number of reasons ..." Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions, p.158 Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 "Material in this paper has been adapted from a paper commissioned by the Center on Education Policy: Hamilton, L.S., Stecher, B.M., & Yuan, K. (2009) Standards-based Reform in the United States: History, Research, and Future Directions. Washington, DC: Center on Education Policy. Portions of this work were supported by the National Science Foundation under Grant No. REC-0228295." Access to anonymized student data is granted all the time. Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that one cannot find one or a few districts out of the many thousands to cooperate in a study to discredit testing.
15 Laura S. Hamilton Brian M. Stecher, Kun Yuan "Even when the necessary data have been collected by states or other entities, it is often difficult for researchers to obtain these data because those responsible for the data refuse to grant access, either because of concerns about confidentiality or because they are not interested in having their programmes scrutinised by researchers. Thus, the amount of rigorous analysis is limited." Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions, p.158 Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 Funders and notes: same as row 14.
16 Laura S. Hamilton Brian M. Stecher, Kun Yuan "These evaluation findings reveal the challenges inherent in trying to judge the quality of standards. Arguably the most important test of quality is whether the standards promote high-quality instruction and improved student learning but, as we discuss later, there is very little research to address that question." Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions, p.158 Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 Funders note and relevant pre-2000 studies: same as row 13.
17 Laura S. Hamilton Brian M. Stecher, Kun Yuan "In fact, the bulk of research relevant to SBA has focused on the links between high-stakes tests and educators’ practices rather than standards and practices." Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions, p.159 Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 Funders note and relevant pre-2000 studies: same as row 13.
18 Laura S. Hamilton Brian M. Stecher, Kun Yuan "The existing evidence does not provide definitive guidance regarding the SBA system features that would be most likely to promote desirable outcomes." Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions, p.163 Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 Funders note and relevant pre-2000 studies: same as row 13.
19 Girlie C. Delacruz   "Opportunities for student use of rubrics to improve learning appears logical, although only a few studies have examined this idea directly." Dismissive Impact of Incentives on the Use of Feedback in Educational Videogames, p.3 CRESST Report 813, March 2012 https://cresst.org/wp-content/uploads/R813.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies: see row 13.
20 Jinok Kim   "Though we can find many such statistics in various reports, few have dealt with comparisons across students reclassified in various grade levels. Lack of such studies may be in part due to the difficulty in defining who are reclassified students as well as when they are reclassified."   Relationships among and between ELL status, demographic characteristics, enrollment history, and school persistence, p.6 CRESST Report 810, December 2011 https://cresst.org/wp-content/uploads/R810.pdf "The work reported herein was supported under the National Research and Development Centers, PR/Award Number R305A090581, as administered by the U.S. Department of Education, Institute of Education Sciences with funding to the National Center for Research on Evaluation, Standards, and Student Testing (CRESST)."
21 Joan Herman 4 others "While the challenge of teachers’ content-pedagogical knowledge has been documented (Heritage et al., 2009; Heritage, Jones & White, 2010; Herman et al., 2010), few studies have examined the relationship between such knowledge and teachers’ assessment practices, nor examined how teachers’ knowledge may moderate the relationship between assessment practices and student learning." Dismissive Relationships between Teacher Knowledge, Assessment Practice, and Learning-Chicken, Egg, or Omelet? CRESST Report 809, November 2011 http://cresst.org/wp-content/uploads/R809.pdf Institute of Education Sciences, US Education Department See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
22 Lorrie A. Shepard Kristen L. Davidson, Richard Bowman "Although some instruments, such as the Northwest Evaluation Association‘s (NWEA) Measures of Academic Progress (MAP®), have been around for decades, few studies have been conducted to examine the technical adequacy of interim assessments or to evaluate their effects on teaching and student learning." Dismissive How Middle-School Mathematics Teachers Use Interim and Benchmark Assessment Data, p.2 CRESST Report 807, October 2011 http://cresst.org/wp-content/uploads/R807.pdf Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive. That is not the result favored by CRESST, so they declare the studies nonexistent. See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
23 Kristen L. Davidson Greta Frohbieter "Yet, districts’ processes to this end [of adopting interim or benchmark assessments] have been largely unexamined (Bulkley et al.; Mandinach et al.; Young & Kim)." Dismissive District Adoption and Implementation of Interim and Benchmark Assessments, p.2 CRESST Report 806, September 2011 https://eric.ed.gov/?id=ED525098 Institute of Education Sciences, US Education Department Notes: same as row 22.
24 Kristen L. Davidson Greta Frohbieter "As noted above, district processes with regard to interim assessment adoption and implementation remain largely uninvestigated. A review of the few relevant studies, however, reveals..." Dismissive District Adoption and Implementation of Interim and Benchmark Assessments, p.4 CRESST Report 806, September 2011 https://eric.ed.gov/?id=ED525098 Institute of Education Sciences, US Education Department Notes: same as row 22.
25 Marguerite Clarke   “The evidence base is stronger in some areas than in others. For example, there are many professional standards for assessment quality that [can] be applied to classroom assessments, examinations, and large-scale assessments (APA, AERA, and NCME, 1999), but less professional or empirical research on enabling contexts.” Dismissive Framework for Building an Effective Student Assessment System, p. 20 World Bank, READ/SABER Working Paper, Aug. 2011 http://files.eric.ed.gov/fulltext/ED553178.pdf World Bank funders No matter that there exist hundreds of other countries, a century's worth of research prior to 2010, literally thousands of other journals that might publish such an article, and a large "grey literature" of alignment studies conducted as routine parts of test development. Virtually any standards-based, large-scale test development includes an alignment study, not to be found in a scholarly journal. Some notable alignment studies:
with NRTs: Freeman, Kuhs, Porter, Floden, Schmidt, Schwille (1983); Debra P. v. Turlington (1984); Cohen, Spillane (1993); La Marca, Redfield, Winter, Bailey, and Despriet (2000); Wainer (2011)
with Standards: Archbald (1994); Buckendahl, Plake, Impara, Irwin (2000); Bhola, Impara, Buckendahl (2003); Phelps (2005)
with RTs: Massell, Kirst, Hoppe (1997); Wiley, Hembry, Buckendahl, Forte, Towles Nebelsick-Gullett (2015)
26 Marguerite Clarke   “Data for some of these indicator areas can be found in official documents, published reports (for example, Ferrer, 2006), research articles (for example, Braun and Kanjee, 2005), and online databases. For the most part, data have not been gathered in any comprehensive or systematic fashion. Those wishing to review this type of information for a particular assessment system will most likely need to collect the data themselves.” p. 21 Denigrating Framework for Building an Effective Student Assessment System  World Bank, READ/SABER Working Paper, Aug. 2011  http://files.eric.ed.gov/fulltext/ED553178.pdf World Bank funders No matter that there exist hundreds of other countries, a century's worth of research prior to 2010, literally thousands of other journals that might publish such a article, and a large "grey literature" of alignment studies conducted as routine parts of test development. Virtually any standards-based, large-scale test development includes an alignment study, not to be found in a scholarly journal.  Some notable alignment studies:
with NRTs:  Freeman, Kuhs, Porter, Floden, Schmidt, Schwille (1983); Debra P. v. Turlington (1984); Cohen, Spillane (1993); La Marca, Redfield, Winter, Bailey, and Despriet (2000); Wainer (2011)
with Standards: Archbald (1994); Buckendahl, Plake, Impara, Irwin (2000); Bhola, Impara, Buckendahl (2003); Phelps (2005)
with RTs: Massell, Kirst, Hoppe (1997); Wiley, Hembry, Buckendahl, Forte, Towles, Nebelsick-Gullett (2015)
27 Marguerite Clarke   “This paper has extracted principles and guidelines from countries’ experiences and the current research base to outline a framework for developing a more effective student assessment system. The framework provides policy makers and others with a structure for discussion and consensus building around priorities and key inputs for their assessment system.” p. 27 1stness Framework for Building an Effective Student Assessment System  World Bank, READ/SABER Working Paper, Aug. 2011  http://files.eric.ed.gov/fulltext/ED553178.pdf World Bank funders No matter that there exist hundreds of other countries, a century's worth of research prior to 2010, literally thousands of other journals that might publish such an article, and a large "grey literature" of alignment studies conducted as routine parts of test development. Virtually any standards-based, large-scale test development includes an alignment study, not to be found in a scholarly journal.  Some notable alignment studies:
with NRTs:  Freeman, Kuhs, Porter, Floden, Schmidt, Schwille (1983); Debra P. v. Turlington (1984); Cohen, Spillane (1993); La Marca, Redfield, Winter, Bailey, and Despriet (2000); Wainer (2011)
with Standards: Archbald (1994); Buckendahl, Plake, Impara, Irwin (2000); Bhola, Impara, Buckendahl (2003); Phelps (2005)
with RTs: Massell, Kirst, Hoppe (1997); Wiley, Hembry, Buckendahl, Forte, Towles, Nebelsick-Gullett (2015)
28 Michael Hout, Stuart W. Elliott, Editors   "Unfortunately, there were no other studies available that would have allowed us to contrast the overall effect of state incentive programs predating NCLB…" p. 4-6 Dismissive Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925).   *Covers many studies; study is a research review, research synthesis, or meta-analysis.  Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
29 Michael Hout, Stuart W. Elliott, Editors   "Test-based incentive programs, as designed and implemented in the programs that have been carefully studied, have not increased student achievement enough to bring the United States close to the levels of the highest achieving countries." p. 4-26 Denigrating Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925).   *Covers many studies; study is a research review, research synthesis, or meta-analysis.  Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
30 Michael Hout, Stuart W. Elliott, Editors   "Despite using them for several decades, policymakers and educators do not yet know how to use test-based incentives to consistently generate positive effects on achievement and to improve education." p. 5-1 Dismissive Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925).   *Covers many studies; study is a research review, research synthesis, or meta-analysis.  Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
31 Michael Hout, Stuart W. Elliott, Editors   "The general lack of guidance coming from existing studies of test-based incentive programs in education…" Dismissive Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925).   *Covers many studies; study is a research review, research synthesis, or meta-analysis.  Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
32 Eva L. Baker   "At the same time that interest in alternative assessment is high, our knowledge about the design, distribution, quality and impact of such efforts is low. This is a time of tingling metaphor, cottage industry, and existence proofs rather than carefully designed research and development." p.2 Dismissive, Denigrating What Probably Works in Alternative Assessment, July 2010 CRESST Report 772   Institute of Education Sciences, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
33 Eva L. Baker   "Moreover, because psychometric methods appropriate for dealing with such new measures are not readily available, nor even a matter of common agreement, no clear templates exist to guide the technical practices of alternative assessment developers (Linn, Baker, Dunbar, 1991)." p.2 Dismissive What Probably Works in Alternative Assessment, July 2010 CRESST Report 772   Institute of Education Sciences, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
34 Eva L. Baker   "Given that the level of empirical work is so obviously low, one well might wonder what these studies are about. Some studies argue for new approaches to achievement testing." p.3 Denigrating What Probably Works in Alternative Assessment, July 2010 CRESST Report 772   Institute of Education Sciences, US Education Department She looked in two databases -- ERIC and NTIS -- and then implied she had looked everywhere.
35 Eva L. Baker   "Despite this fragile research base, alternative assessment has already taken off. What issues can we anticipate being raised by relevant communities about the value of these efforts?" p.6 Dismissive, Denigrating What Probably Works in Alternative Assessment, July 2010 CRESST Report 772   Institute of Education Sciences, US Education Department She looked in two databases -- ERIC and NTIS -- and then implied she had looked everywhere.
36 Lawrence O. Picus
Frank Adamson
William Montague
Margaret Owens
"As in the earlier studies, efforts are made to distinguish between the concept of economic or opportunity costs (i.e., the use of teacher time that is already “paid for” through the contract and used as part of the assessment process rather then for some other activity or function), and the direct expenditures made for assessment." p.1 Dismissive A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) For at least two decades, Larry Picus has elevated the trivial and elemental difference between expenditure and cost to the level of heavenly revelation. As any beginning undergraduate in economics knows, expenditures -- particularly budgetary line-item expenditures -- don't necessarily equal the cost of an item or activity. The classifications of the amounts might or might not match. Picus needled the trivial point over and over for decades. Meanwhile, my project on testing costs at the GAO (1991-1993) was a cost study in every sense that Picus identified for the term, but the word "expenditures" was in the title of the report. So, when Picus repeated and repeated that most studies on the topic prior to his were "just expenditure studies" (and not really "cost" studies), there was the GAO report, one of the few cost studies done prior to his, with the word expenditure in its title. The ploy worked, and many were convinced then, and still today, that my work at the GAO relied on budgetary line-item expenditure data (it didn't), neglected to include the cost of personnel time (it did include those costs), or was otherwise suspect, an inferior study. Picus and CRESST managed to denigrate into oblivion a taxpayer-funded study that was vastly superior to any he would ever do.
37 Lawrence O. Picus
Frank Adamson
William Montague
Margaret Owens
"Determining the resources necessary to achieve each of these goals is, at best, a complex task. Because of this difficulty, many analysts stop short of estimating the true costs of a program, and instead focus on the expenditures required for its implementation." p.7 Dismissive A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) For at least two decades, Larry Picus has elevated the trivial and elemental difference between expenditure and cost to the level of heavenly revelation. As any beginning undergraduate in economics knows, expenditures -- particularly budgetary line-item expenditures -- don't necessarily equal the cost of an item or activity. The classifications of the amounts might or might not match. Picus needled the trivial point over and over for decades. Meanwhile, my project on testing costs at the GAO (1991-1993) was a cost study in every sense that Picus identified for the term, but the word "expenditures" was in the title of the report. So, when Picus repeated and repeated that most studies on the topic prior to his were "just expenditure studies" (and not really "cost" studies), there was the GAO report, one of the few cost studies done prior to his, with the word expenditure in its title. The ploy worked, and many were convinced then, and still today, that my work at the GAO relied on budgetary line-item expenditure data (it didn't), neglected to include the cost of personnel time (it did include those costs), or was otherwise suspect, an inferior study. Picus and CRESST managed to denigrate into oblivion a taxpayer-funded study that was vastly superior to any he would ever do.
38 Lawrence O. Picus
Frank Adamson
William Montague
Margaret Owens
"The study defined purchase cost as the money spent on test-related goods and services, a category in line with what we call expenditures (U.S. GAO, 1993)." p.21 Denigrating A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) For at least two decades, Larry Picus has elevated the trivial and elemental difference between expenditure and cost to the level of heavenly revelation. As any beginning undergraduate in economics knows, expenditures -- particularly budgetary line-item expenditures -- don't necessarily equal the cost of an item or activity. The classifications of the amounts might or might not match. Picus needled the trivial point over and over for decades. Meanwhile, my project on testing costs at the GAO (1991-1993) was a cost study in every sense that Picus identified for the term, but the word "expenditures" was in the title of the report. So, when Picus repeated and repeated that most studies on the topic prior to his were "just expenditure studies" (and not really "cost" studies), there was the GAO report, one of the few cost studies done prior to his, with the word expenditure in its title. The ploy worked, and many were convinced then, and still today, that my work at the GAO relied on budgetary line-item expenditure data (it didn't), neglected to include the cost of personnel time (it did include those costs), or was otherwise suspect, an inferior study. Picus and CRESST managed to denigrate into oblivion a taxpayer-funded study that was vastly superior to any he would ever do.
39 Lawrence O. Picus
Frank Adamson
William Montague
Margaret Owens
"Unfortunately, aggregating these different types of time disguises important differences between them that, in fairness to the GAO, have emerged in the NCLB era as more important considerations than in previous decades. Specifically, test-preparation time for students has become a subject of national debate about how much class time teachers spend 'teaching to the test.'" p.21 Denigrating A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) I continued to publish articles and made presentations based on the GAO project for several years after I left the GAO. These publications reported the disaggregated costs and estimated benefits. Indeed, I published a net benefit (i.e., benefit/cost) study in the Journal of Education Finance ten years prior to this Picus article. Almost certainly he knows about it -- he has served as editor or on the editorial board for that journal for many years. In this report of his for SCOPE, my name is never mentioned nor are any of my many publications or presentations related to the costs and benefits of testing. 
40 Lawrence O. Picus
Frank Adamson
William Montague
Margaret Owens
"In its analysis, the GAO does provide aggregate time estimates. However, it does not provide disaggregated estimates of teacher time, nor estimated benefits in terms of either teacher PD or improved student learning." p.21 Denigrating A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) I continued to publish articles and made presentations based on the GAO project for several years after I left the GAO. These publications reported the disaggregated costs and estimated benefits. Indeed, I published a net benefit (i.e., benefit/cost) study in the Journal of Education Finance ten years prior to this Picus article. Almost certainly he knows about it -- he has served as editor or on the editorial board for that journal for many years. In this report of his for SCOPE, my name is never mentioned nor are any of my many publications or presentations related to the costs and benefits of testing. 
41 Lawrence O. Picus
Frank Adamson
William Montague
Margaret Owens
"The performance assessments studied by the GAO also do not demonstrate much variety. Most included only writing samples, reading comprehension and response, and math/science problem-solving items. A few districts used science lab work, group work, and skills observations, but most still relied on paper-and-pencil testing (U.S. GAO, 1993)." p.21 Denigrating A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) Picus neglects to mention that the GAO collected data from the universe of states with testing programs and a very large, representative sample (> 660) of public school districts. We collected all the data on all the systemwide testing occurring at the time. We oversampled districts in certain states, such as Maryland, the one state at the time with the most elaborate performance test types. In doing that, we did more than he ever did in his couple of state studies. Yet, as usual, he implies that the GAO study or my work must have left out something important. 
42 Lawrence O. Picus
Frank Adamson
William Montague
Margaret Owens
"In every instance, test developers crafting the performance-based tests started from scratch, writing test questions that fit the state’s curriculum or guidelines, then testing the draft on pilot groups of students and using an iterative revision process that did not involve state curriculum, which was undergoing simultaneous development (U.S. GAO, 1993)." p.22 Denigrating A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) This sentence doesn't make sense, but he doesn't include page numbers in his citations so it is not even possible to find what text he might have been misunderstanding. Within one sentence, Picus claims that test items were based on established content standards, but then not based on them, because they didn't yet exist. The latter point is certainly not true. When standards-based tests are developed, the content standards are completed first, and the test items are written directly from them. 
43 Joan L. Herman
Ellen Osmundson, David Silver "These indeed are promising developments for pushing formative assessment to fruition in classroom practice. They acknowledge and work toward remedying the need for classroom tools to assess and support student learning. Yet at the same time, recent studies reveal challenges in implementing quality formative assessment and show non-robust results with regard to effects on student learning (Herman, Osmundson, Ayala, Schneider, & Timms, 2006; Furtak, et al., 2008)." Dismissive, Denigrating Capturing Quality in Formative Assessment Practice: Measurement Challenges, p.2 CRESST Report 770, June 2010 https://eric.ed.gov/?id=ED512648 Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive, which is not the result favored by CRESST, so they declare the studies nonexistent. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
44 Joan L. Herman
Ellen Osmundson, David Silver "Just as the concept of formative assessment itself underscores the central role of evidence—learning data—in an effective teaching and learning process, so too do policymakers and practitioners need evidence on which to build effective formative practices. Toward this latter goal, this report explores ..." 1stness Capturing Quality in Formative Assessment Practice: Measurement Challenges, p.2 CRESST Report 770, June 2010 https://eric.ed.gov/?id=ED512648 Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive, which is not the result favored by CRESST, so they declare the studies nonexistent. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
45 Diana Pullin (Chair)
Joan Herman, Scott Marion, Dirk Mattson, Rebecca Maynard, Mark Wilson,  "However, there have been very few studies of how interim assessments are actually used by individual teachers in classrooms, by principals, and by districts or of their impact on student achievement." p. 6 Dismissive Best Practices for State Assessment Systems, Part I Committee on Best Practices for State Assessment Systems: Improving Assessment While Revisiting Standards; Center for Education; Division of Behavioral and Social Sciences and Education; National Research Council https://www.nap.edu/catalog/12906/best-practices-for-state-assessment-systems-part-i-summary-of "With funding from the James B. Hunt, Jr. Institute for Educational Leadership and Policy, as well as additional support from the Bill & Melinda Gates Foundation and the Stupski Foundation, the National Research Council (NRC) planned two workshops designed to explore some of the possibilities for state assessment systems."  
46 Diana Pullin (Chair)
Joan Herman, Scott Marion, Dirk Mattson, Rebecca Maynard, Mark Wilson,  "Research indicates that the result has been emphasis on lower-level knowledge and skills and very thin alignment with the standards. For example, Porter, Polikoff, and Smithson (2009) found very low to moderate alignment between state assessments and standards—meaning that large proportions of content standards are not covered on the assessments (see also Fuller et al., 2006; Ho, 2008)." p. 10
Denigrating Best Practices for State Assessment Systems, Part I Committee on Best Practices for State Assessment Systems: Improving Assessment While Revisiting Standards; Center for Education; Division of Behavioral and Social Sciences and Education; National Research Council https://www.nap.edu/catalog/12906/best-practices-for-state-assessment-systems-part-i-summary-of "With funding from the James B. Hunt, Jr. Institute for Educational Leadership and Policy, as well as additional support from the Bill & Melinda Gates Foundation and the Stupski Foundation, the National Research Council (NRC) planned two workshops designed to explore some of the possibilities for state assessment systems."  
47 Diana Pullin (Chair)
Joan Herman, Scott Marion, Dirk Mattson, Rebecca Maynard, Mark Wilson,  "Another issue is that the implications of computer-based approaches for validity and reliability have not been thoroughly evaluated." p. 40 Dismissive Best Practices for State Assessment Systems, Part I Committee on Best Practices for State Assessment Systems: Improving Assessment While Revisiting Standards; Center for Education; Division of Behavioral and Social Sciences and Education; National Research Council https://www.nap.edu/catalog/12906/best-practices-for-state-assessment-systems-part-i-summary-of "With funding from the James B. Hunt, Jr. Institute for Educational Leadership and Policy, as well as additional support from the Bill & Melinda Gates Foundation and the Stupski Foundation, the National Research Council (NRC) planned two workshops designed to explore some of the possibilities for state assessment systems."  
48 Diana Pullin (Chair)
Joan Herman, Scott Marion, Dirk Mattson, Rebecca Maynard, Mark Wilson,  "For current tests, he [Lauress Wise] observed, there is little evidence that they are good indicators of instructional effectiveness or good predictors of students’ readiness for subsequent levels of instruction." Dismissive Best Practices for State Assessment Systems, Part I Committee on Best Practices for State Assessment Systems: Improving Assessment While Revisiting Standards; Center for Education; Division of Behavioral and Social Sciences and Education; National Research Council https://www.nap.edu/catalog/12906/best-practices-for-state-assessment-systems-part-i-summary-of "With funding from the James B. Hunt, Jr. Institute for Educational Leadership and Policy, as well as additional support from the Bill & Melinda Gates Foundation and the Stupski Foundation, the National Research Council (NRC) planned two workshops designed to explore some of the possibilities for state assessment systems."  
49 Laura S. Hamilton Brian M. Stecher, Kun Yuan “A few studies have attempted to examine how the creation and publication of standards, per se, have affected practices.” p. 3 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
50 Laura S. Hamilton Brian M. Stecher, Kun Yuan “The research evidence does not provide definitive answers to these questions.” p. 6 Denigrating Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
51 Laura S. Hamilton Brian M. Stecher, Kun Yuan “He [Poynter 1994] also noted that ‘virtually all of the arguments, both for and against standards, are based on beliefs and hypotheses rather than on direct empirical evidence’ (p. 427).” pp. 34-35 Dismissive, Denigrating Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
52 Laura S. Hamilton Brian M. Stecher, Kun Yuan "Although a large and growing body of research has been conducted to examine the effects of SBR, the caution Poynter expressed in 1994 about the lack of empirical evidence remains relevant today.” pp. 34-35 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
53 Laura S. Hamilton Brian M. Stecher, Kun Yuan “Arguably the most important test of quality is whether the standards promote high-quality instruction and improved student learning, but as we discuss later, there is very little research to address that question.” p. 37 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
54 Laura S. Hamilton Brian M. Stecher, Kun Yuan “[T]here have been a few studies of SBR as a comprehensive system. . . . [T]here is some research on how the adoption of standards, per se, or the alignment of standards with curriculum influences school practices or student outcomes.” p. 38 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
55 Laura S. Hamilton Brian M. Stecher, Kun Yuan “The lack of evidence about the effects of SBR derives primarily from the fact that the vision has never been fully realized in practice.” p. 47 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
56 Laura S. Hamilton Brian M. Stecher, Kun Yuan “[A]lthough many conceptions of SBR emphasize autonomy, we currently know relatively little about the effects of granting autonomy or what the right balance is between autonomy and prescriptiveness.” p. 55 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
57 Laura S. Hamilton Brian M. Stecher, Kun Yuan “One of the primary responsibilities of the federal government should be to ensure ongoing collection of evidence demonstrating the effects of the policies, which could be used to make decisions about whether to continue on the current course or whether small adjustments or a major overhaul are needed.” p. 55 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
58 Douglas N. Harris Lori L. Taylor, Amy A. Levine, William K. Ingle, Leslie McDonald "However, previous studies under-state current costs by focusing on costs before NCLB was put in place and by excluding important cost categories." Denigrating The Resource Costs of Standards, Assessments, and Accountability report to the National Research Council   National Research Council funders No, they did not leave out important cost categories; Harris' study deliberately exaggerates costs. See pages 3-10:  https://nonpartisaneducation.org/Review/Essays/v10n1.pdf
59 Joan Herman Katherine E. Ryan, Lorrie A. Shepard, Eds. "Yet, available evidence suggests that the rhetoric surpasses the reality of formative assessment use" p.217 Denigrating Accountability and assessment: Is public interest in K-12 education being served? Chapter 11 in The Future of Test-Based Educational Accountability https://www.routledge.com/The-Future-of-Test-Based-Educational-Accountability-1st-Edition/Ryan-Shepard/p/book/9780805864700 Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive, which is not the result favored by CRESST, so they declare the studies nonexistent. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
60 Joan Herman Katherine E. Ryan, Lorrie A. Shepard, Eds. "The research base examining effects on students with disabilities and on English Language learners is scanty." p.223 Dismissive Accountability and assessment: Is public interest in K-12 education being served? Chapter 11 in The Future of Test-Based Educational Accountability https://www.routledge.com/The-Future-of-Test-Based-Educational-Accountability-1st-Edition/Ryan-Shepard/p/book/9780805864700 Institute of Education Sciences, US Education Department  
61 Joan Herman Katherine E. Ryan, Lorrie A. Shepard, Eds. "...there is no obvious accountability mechanism for the "average student" who may have made it just over the proficient level. There is little research on this issue." p.224 Dismissive Accountability and assessment: Is public interest in K-12 education being served? Chapter 11 in The Future of Test-Based Educational Accountability https://www.routledge.com/The-Future-of-Test-Based-Educational-Accountability-1st-Edition/Ryan-Shepard/p/book/9780805864700 Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Jacobson (1992); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Marshall (1987); Mangino & Babcock (1986); Michigan Department of Education (1984); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
62 Joan Herman   "The report considers how well the model fits available evidence by examining whether and how accountability assessment influences students’ learning opportunities and the relationship between accountability and learning." abstract Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department See, for example, Test Frequency, Stakes, and Feedback in Student Achievement: A Meta-Analysis   https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
63 Joan Herman   "What of the impact of accountability on other segments of the student population--traditionally higher performing students? ...The average student? ...there is no obvious accountability mechanism for the "average student." There is little research on this issue." Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
64 Joan Herman   "While a thorough treatment of the effects on teachers is also beyond the scope of this report, it is worth noting a growing literature that is cause for concern." p.17 Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
65 Joan Herman   "The research base examining effects on students with disabilities and on English language learner students is scanty." pp.16-17 Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department  
66 Eva L. Baker   "Tests only dimly reflect in their design the results of research on learning, whether of skills, subject matter, or problem solving." p.310 Denigrating The End(s) of Testing Educational Researcher, Vol. 36, No. 6, pp. 309–317   2007 Presidential Address for the American Educational Research Association  
67 Eva L. Baker   "To my mind, the evidential disconnect between test design and learning research is no small thing." p.310 Dismissive The End(s) of Testing Educational Researcher, Vol. 36, No. 6, pp. 309–317   2007 Presidential Address for the American Educational Research Association  
68 Eva L. Baker   "What if we set aside learning-based design and ask, “How well do any of our external tests work?” The answer is that we often don’t know enough to know. We have little evidence that tests are in sync with their stated or de facto purposes or that their results lead to appropriate decisions." p.310 Dismissive The End(s) of Testing Educational Researcher, Vol. 36, No. 6, pp. 309–317   2007 Presidential Address for the American Educational Research Association  
69 Laura S. Hamilton Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney "However, the paths through which SBA [standards-based accountability] changes district, school, and classroom practices and how these changes in practice influence student outcomes are largely unexplored. There is strong evidence that SBA leads to changes in teachers’ instructional practices (Hamilton, 2004; Stecher, 2002)." p.5 Dismissive Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States Rand Corporation, 2007 https://www.rand.org/pubs/monographs/MG589.html "This research was sponsored by the National Science Foundation under grant number REC-0228295." Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
70 Laura S. Hamilton Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney "Much less is known about the impact of SBA at the district and school levels and the relationships among actions at the various levels and student outcomes. This study was designed to shed light on this complex set of relationships…" p.5 Dismissive Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States Rand Corporation, 2007 https://www.rand.org/pubs/monographs/MG589.html "This research was sponsored by the National Science Foundation under grant number REC-0228295." Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
71 Eva L. Baker Joan L. Herman, Robert L. Linn "For example, performance assessment was a rage in the early 1990s because it was something new and flashy, and looked to have great promise. Before almost any research was done, a number of states dropped their multiple-choice accountability systems, replacing them with performance assessments."   Dismissive ACCELERATING FUTURE POSSIBILITIES FOR ASSESSMENT AND LEARNING, p.1 CRESST Line, Winter 2006 https://www.researchgate.net/publication/277283780_in_Educational_Researcher_called_The_Awful_Reputation_of_Education_Research Institute of Education Sciences, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
72 Eva L. Baker Joan L. Herman, Robert L. Linn "By the end of this year, nearly half of all states will have graduation exams in place (Peterson, 2005). Short institutional memory forgets that similar minimum competency tests did not lead to increased achievement some 20 years ago, but instead contributed to higher numbers of high school dropouts and inequities along racial lines (Catterall, 1989; Haertel & Herman, 2005)." Dismissive ACCELERATING FUTURE POSSIBILITIES FOR ASSESSMENT AND LEARNING, p.3 CRESST Line, Winter 2006 https://www.researchgate.net/publication/277283780_in_Educational_Researcher_called_The_Awful_Reputation_of_Education_Research Institute of Education Sciences, US Education Department Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan (1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
73 Edward Haertel Joan Herman "Passing rates on MCTs in many states rose rapidly from year to year (Popham, Cruse, Rankin, Sandifer, & Williams, 1985). Despite these gains, and positive trends on examinations like the National Assessment of Educational Progress (NAEP), there is little evidence that MCTs were the reason for improvements on other examinations." Dismissive A Historical Perspective on Validity Arguments for Accountability Testing CRESST Report 654, June 2005 https://cresst.org/wp-content/uploads/R654.pdf Institute of Education Sciences, US Education Department Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan (1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
74 Robert L. Linn   "Despite the clear appeal of assessment-based accountability and the widespread use of this approach, the development of assessments that are aligned with content standards and for which there is solid evidence of validity and reliability is a challenging endeavor." Dismissive Issues in the Design of Accountability Systems CRESST Report 650, April 2005 https://cresst.org/wp-content/uploads/R650.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
75 Robert L. Linn   "Alignment of an assessment with the content standards that it is intended to measure is critical if the assessment is to buttress rather than undermine the standards. Too little attention has been given to the evaluation of the alignment of assessments and standards." Denigrating Issues in the Design of Accountability Systems CRESST Report 650, April 2005 https://cresst.org/wp-content/uploads/R650.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
76 Lorraine M. McDonnell   "A growing body of research suggests that school and classroom practices do change in response to these assessments (Herman and Golan, 1993; Smith and Rottenberg, 1991; Madaus, 1988)" 1stness Politics, Persuasion, and Educational Testing, p.9 Harvard University Press, 2004     Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
77 Lorraine M. McDonnell   "A growing body of research suggests that school and classroom practices do change in response to these assessments (Herman and Golan, 1993; Smith and Rottenberg, 1991; Madaus, 1988)" Dismissive Politics, Persuasion, and Educational Testing, p.9 Harvard University Press, 2004     Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
78 Lorraine M. McDonnell   "Although most literature on policy instruments identifies this persuasive tool as one of the strategies available to policymakers, little theoretical or comparative empirical research has been conducted on its properties." Dismissive Politics, Persuasion, and Educational Testing, p.24 Harvard University Press, 2004
79 Lorraine M. McDonnell   "There is empirical research on policies that rely on hortatory tools, but studies of these individual policies have not examined them within a broader theoretical framework." Denigrating Politics, Persuasion, and Educational Testing, p.24 Harvard University Press, 2004      
80 Lorraine M. McDonnell   "This chapter represents an initial attempt to analyze the major characteristics of hortatory policy by taking an inductive approach and looking across several different policy areas to identify a few basic properties common to most policies of this type." 1stness Politics, Persuasion, and Educational Testing, p.24 Harvard University Press, 2004      
81 Lorraine M. McDonnell   "This chapter has begun the task of building a conceptual framework for understanding hortatory policies by identifying their underlying causal assumptions and analyzing some basic properties common to most policies that rely on information and values to motivate action." 1stness Politics, Persuasion, and Educational Testing, p.44–45 Harvard University Press, 2004
82 Lorraine M. McDonnell   "Because so little systematic research has been conducted on hortatory policy, it is possible at this point only to suggest, rather than to specify, the conditions under which its underlying assumptions will be valid and a policy likely to succeed." Dismissive Politics, Persuasion, and Educational Testing, p.45 Harvard University Press, 2004      
83 Lorraine M. McDonnell   "Additional theoretical and empirical work is needed to develop a more rigorous and nuanced understanding of hortatory policy. Nevertheless, this study starts that process by articulating the policy theory undergirding hortatory policy and by outlining its potential promise and shortcomings." Denigrating Politics, Persuasion, and Educational Testing, p.45 Harvard University Press, 2004
84 Lorraine M. McDonnell   "However, because research on the effects of high stakes testing is limited, finds mixed results, and suggests unintended consequences, the informational and persuasive dimensions of testing will continue to be critical to the success of this policy." Dismissive Politics, Persuasion, and Educational Testing, p.182–183 Harvard University Press, 2004     Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
85 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “The shortcomings of the studies make it difficult to determine the size of teacher effects, but we suspect that the magnitude of some of the effects reported in this literature are overstated.” p. xiii Denigrating Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
86 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Using VAM to estimate individual teacher effects is a recent endeavor, and many of the possible sources of error have not been thoroughly evaluated in the literature.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
87 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
88 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “This lack of attention to teachers in policy discussions may be attributed in part to another body of literature that attempted to determine the effects of specific teacher background characteristics, including credentialing status (e.g., Miller, McKenna, and McKenna, 1998; Goldhaber and Brewer, 2000) and subject matter coursework (e.g., Monk, 1994).” p. 8 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
89 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “To date, there has been little empirical exploration of the size of school effects and the sensitivity of teacher effects to modeling of school effects.” p. 78 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
90 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “There are no empirical explorations of the robustness of estimates to assumptions about prior-year schooling effects.” p. 81 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
91 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “There is currently no empirical evidence about the sensitivity of gain scores or teacher effects to such alternatives.” p. 89 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
92 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. 116 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
93 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Although we expect missing data are likely to be pervasive, there is little systematic discussion of the extent or nature of missing data in test score databases.” p. 117 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
94 Marguerite Clarke 5 co-authors “What this study adds to the body of literature in this area is a systematic look at how impact varies with the stakes attached to the test results.” p. 91 1stness Perceived Effects of State-Mandated Testing Programs on Teaching and Learning etc. (5 co-authors) National Board on Educational Testing and Public Policy monograph, January 2003 http://files.eric.ed.gov/fulltext/ED474867.pdf Ford Foundation See, for example, Test Frequency, Stakes, and Feedback in Student Achievement: A Meta-Analysis   https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
95 Marguerite Clarke 5 co-authors “Many calls for school reform assert that high-stakes testing will foster the economic competitiveness of the U.S. However, the empirical basis for this claim is weak.” p. 96, n. 1 Denigrating Perceived Effects of State-Mandated Testing Programs on Teaching and Learning etc. (5 co-authors) National Board on Educational Testing and Public Policy monograph, January 2003 http://files.eric.ed.gov/fulltext/ED474867.pdf Ford Foundation  
96 Brian M. Stecher Laura S. Hamilton "The business model of setting clear targets, attaching incentives to the attainment of those targets, and rewarding those responsible for reaching the targets has proven successful in a wide range of business enterprises. But there is no evidence that these accountability principles will work well in an educational context, and there are many reasons to doubt that the principles can be applied without significant adaptation." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. 
97 Brian M. Stecher Laura S. Hamilton "The lack of strong evidence regarding the design and effectiveness of accountability systems hampers policymaking at a critical juncture." Denigrating Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.
98 Brian M. Stecher Laura S. Hamilton "Nonetheless, the evidence has yet to justify the expectations. The initial evidence is, at best, mixed. On the plus side, students and teachers seem to respond to the incentives created by the accountability systems …" Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.
99 Brian M. Stecher Laura S. Hamilton "Proponents of accountability attribute the improved scores in these states to clearer expectations, greater motivation on the part of the students and teachers, a focused curriculum, and more-effective instruction. However, there is little or no research to substantiate these positive changes or their effects on scores." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
100 Brian M. Stecher Laura S. Hamilton "One of the earliest studies on the effects of testing (conducted in two Arizona schools in the late 1980s) showed that teachers reduced their emphasis on important, nontested material." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
101 Brian M. Stecher Laura S. Hamilton "Test-based accountability systems will work better if we acknowledge how little we know about them, if the federal government devotes appropriate resources to studying them, and if the states make ongoing efforts to improve them."  Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. 
102 Robert L. Linn Eva L. Baker “It is true that many of these accommodated test conditions are not subjected to validity studies to determine whether the construct or domain tested has been significantly altered. In part, this lack of empirical data results from restricted resources.” p. 14 Dismissive Validity Issues for Accountability Systems CSE Technical Report 585 (December 2002) http://www.cse.ucla.edu/products/reports/TR585.pdf Office of Educational Research and Improvement, US Education Department External evaluations of large-scale testing programs not only exist, but represent the norm.
103 Lauren B. Resnick Robert Rothman, Jean B. Slattery, Jennifer L. Vranek "States that have or adopt test-based accountability programs claim that their tests are aligned to their standards. But there has been, up to now, no independent methodology for checking alignment. This paper describes and illustrates such a methodology..." 1stness Benchmarking and Alignment of Standards and Testing, p.1 CSE Technical Report 566, CRESST/Achieve, May 2002 https://www.achieve.org/files/TR566.pdf Office of Educational Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Duck (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
104 Lauren B. Resnick Robert Rothman, Jean B. Slattery, Jennifer L. Vranek "Yet few, if any, states have put in place effective policies or resource systems for improving instructional quality (National Research Council, 1999)." Dismissive Benchmarking and Alignment of Standards and Testing, p.4 CSE Technical Report 566, CRESST/Achieve, May 2002 https://www.achieve.org/files/TR566.pdf Office of Educational Research and Improvement, US Education Department Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
105 Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein   "Although test-based accountability has shown some compelling results, the issues are complex, the research is new and incomplete, and many of the claims that have received the most attention have proved to be premature and superficial." Denigrating Summary, p.xiv Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
106 Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein   "The research evidence does not provide definitive information about the actual costs of testing but the information that is available suggests that expenditures for testing have grown in recent years." Dismissive Introduction, p.9 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
107 Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein   "The General Accounting Office (1993) … estimate was $516 million … The estimate does not include time for more-extensive test preparation activities." p.9 Denigrating Introduction, p.9 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation As a matter of fact the GAO report did include those costs -- all of them. The GAO surveys very explicitly instructed respondents to "include any and all costs related" to each test, including any and all test preparation time and expenses.
108 Laura S. Hamilton, Daniel M. Koretz Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "There is currently no substantial evidence on the effects of published report cards on parents’ decisionmaking or on the schools themselves." Dismissive Chapter 2: Tests and their use in test-based accountability systems, p.44 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation For decades, consulting services have existed that help parents new to a city select the right school or school district for them.
109 Vi-Nhuan Le, Stephen P. Klein Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Research on the inflation of gains remains too limited to indicate how prevalent the problem is." Dismissive Chapter 3: Technical criteria for evaluating tests, p. 68 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927); DeWeerdt (1927); French (1959); French & Dear (1959); Ortar (1960); Marron (1965); ETS (1965); Messick & Jungeblut (1981); Ellis, Konoske, Wulfeck, & Montague (1982); DerSimonian and Laird (1983); Kulik, Bangert-Drowns & Kulik (1984); Powers (1985); Jones (1986); Fraker (1986/1987); Halpin (1987); Whitla (1988); Snedecor (1989); Bond (1989); Baydar (1990); Becker (1990); Smyth (1990); Moore (1991); Alderson & Wall (1992); Powers (1993); Oren (1993); Powers & Rock (1994); Scholes, Lane (1997); Allalouf & Ben Shakhar (1998); Robb & Ercanbrack (1999); McClain (1999); Camara (1999, 2001, 2008); Stone & Lane (2000, 2003); Din & Soldan (2001); Briggs (2001); Palmer (2002); Briggs & Hansen (2004); Cankoy & Ali Tut (2005); Crocker (2005); Allensworth, Correa, & Ponisciak (2008); Domingue & Briggs (2009); Koljatic & Silva (2014); Early (2019); Herndon (2021)
110 Vi-Nhuan Le, Stephen P. Klein Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Relatively little is known about how testing accommodations affect score validity, and the few studies that have been conducted on the subject have had mixed results." Dismissive Chapter 3: Technical criteria for evaluating tests, p. 71 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation
111 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "High-stakes testing may also affect parents (e.g., their attitudes toward education, their engagement with schools, and their direct participation in their child's learning) as well as policymakers (their beliefs about system performance, their judgements about program effectiveness, and their allocation of resources). However, these issues remain largely unexamined in the literature." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 79 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
112 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "As described in chapter 2, there was little concern about the effects of testing on teaching prior to the 1970s." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 81 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation Rubbish. Entire books were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
113 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "In light of the changes that occurred in the uses of large-scale testing in the 1980s and 1990s, researchers began to investigate teachers' reactions to external assessment. The initial research on the impact of large-scale testing was conducted in the 1980s and the 1990s." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 83 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
114 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "The bulk of the research on the effects of testing has been conducted using surveys and case studies." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 83 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation This is misleading. True, many of the hundreds of studies on the effects of testing have been surveys and case studies. But, many, and more by my count, have been randomized experiments. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ;
115 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Data on the incidence of cheating [on educational tests] are scarce…" Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 96 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Actually, such data have been collected in surveys in which respondents freely admit that they cheat, and how. Moreover, news reports of cheating, by students or educators, have been voluminous. See, for example, Caveon Test Security's "Cheating in the News" section on its web site.
116 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Less is known about changes in policies at the district and school levels in response to high-stakes testing, but mixed evidence of some impact has appeared." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 96 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Relevant pre-2000 studies of the effects of testing on at-risk students, completion, dropping out, curricular offerings, attitudes, etc. include those of Schleisman (1999); the *Southern Regional Education Board (1998); Webster, Mendro, Orsak, Weerasinghe & Bembry (1997); Jones (1996); Boylan (1996); Jones, 1993; Jacobson (1992); Grisay (1991); Johnstone (1990); Task Force on Educational Assessment Programs [Florida] (1979); Wellisch, MacQueen, Carriere & Duck (1978); Enochs (1978); Pronaratna (1976); and McWilliams & Thomas (1976).
117 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Although numerous news articles have addressed the negative effects of high-stakes testing, systematic research on the subject is limited." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 98 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Relevant pre-2000 studies of the effects of testing on at-risk students, completion, dropping out, curricular offerings, attitudes, etc. include those of Schleisman (1999); the *Southern Regional Education Board (1998); Webster, Mendro, Orsak, Weerasinghe & Bembry (1997); Jones (1996); Boylan (1996); Jones, 1993; Jacobson (1992); Grisay (1991); Johnstone (1990); Task Force on Educational Assessment Programs [Florida] (1979); Wellisch, MacQueen, Carriere & Duck (1978); Enochs (1978); Pronaratna (1976); and McWilliams & Thomas (1976).
118 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Research regarding the effects of test-based accountability on equity is very limited." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 99 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation  
119 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Researchers have not documented the desirable consequences of testing … as clearly as the undesirable ones." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 99 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
120 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. " … researchers have not generally measured the extent or magnitude of the shifts in practice that they identified as a result of high-stakes testing." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, pp. 99–100 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation The 1993 GAO study did. See, also:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
121 Lorraine M. McDonnell Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "...this chapter can only describe the issues that are raised when one looks at testing from a political perspective. Because of the lack of systematic studies on the topic." Dismissive Chapter 5: Accountability as seen through a political lens, p.102 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
122 Lorraine M. McDonnell Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "...public opinion, as measured by surveys, does not always provide a clear and unambiguous measure of public sentiment." Denigrating Chapter 5: Accountability as seen through a political lens, p.108 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
123 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "So test-based accountability remains controversial because there is inadequate evidence to make clear judgments about its effectiveness in raising test scores and achieving its other goals." Dismissive Chapter 6: Improving test-based accountability, p.122 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
124 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Unfortunately, the complexity of the issues and the ambiguity of the existing research do not allow our recommendations to take the form of a practical “how-to” guide for policymakers and practitioners." Denigrating Chapter 6: Improving test-based accountability, p.123 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
125 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Additional research is needed to identify the elements of performance on tests and how these elements map onto other tests …." Denigrating Chapter 6: Improving test-based accountability, p.127 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation  
126 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Another part of the interpretive question is the need to gather information in other subject areas to portray a more complete picture of achievement. The scope of constructs that have been considered in research to date has been fairly narrow, focusing on the subjects that are part of the accountability systems that have been studied. Many legitimate instructional objectives have been ignored in the literature to date." Denigrating Chapter 6: Improving test-based accountability, p.127 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Many studies of the effects of testing predate CRESST's in the 1980s and cover all subject fields, not just reading and math. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
127 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "States should also conduct ongoing analyses of the performance of groups whose members may not be numerous enough to permit separate reporting. English-language learners and students with disabilities are increasingly being included in high-stakes testing systems, and, as discussed in Chapter Three, little is currently known about the validity of scores for these groups." Dismissive Chapter 6: Improving test-based accountability, p.131 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing. 
128 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "It would be especially helpful to know what changes in instruction are made in response to different kinds of information and incentives. In particular, we need to know how teachers interpret information from tests and how they use it to modify instruction." Dismissive Chapter 6: Improving test-based accountability, p.133 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. 
International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
129 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "It seems clear that aligning the components of the system and providing appropriate professional development should, at a minimum, increase teachers’ political support for test-based accountability policies .... Although there is no empirical evidence to suggest that this strategy will reduce inappropriate responses to high-stakes testing,... Additional research needs to be done to determine the importance of alignment for promoting positive effects of test-based accountability." Dismissive Chapter 6: Improving test-based accountability, p.135 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  
These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
130 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "… we currently do not know enough about test-based accountability to design a system that is immune from the problems we have discussed." Dismissive Chapter 6: Improving test-based accountability, p.136 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
131 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "There is some limited evidence that educators’ responses to test based accountability vary according to the characteristics of their student populations,…" Denigrating Chapter 6: Improving test-based accountability, p.138 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation There was and is far more than "limited" evidence. See, for example:  Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
132 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "... there is very limited evidence to guide thinking about political issues." Dismissive Chapter 6: Improving test-based accountability, p.139 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Parents and other adults are typically reached.through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
133 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "First, we do not have an accurate assessment of the additional costs." Denigrating Chapter 6: Improving test-based accountability, p.141 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Yes, we did and we do. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
134 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "However, many of these recommended reforms are relatively inexpensive in comparison with the total cost of education. This equation is seldom examined."  Denigrating Chapter 6: Improving test-based accountability, p.141 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Wrong. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380;  Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
135 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Part of the reason these issues are rarely considered may be that no one has produced a good estimate of the cost of an improved accountability system in comparison with its benefits." Denigrating Chapter 6: Improving test-based accountability, p.141 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
136 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Nevertheless, our knowledge of the costs of alternative accountability systems is still somewhat limited. Policymakers need to know how much it would cost to change their current systems to be responsive to criticisms such as those described in this book. These estimates need to consider all of the associated costs, including possible opportunity costs associated with increased testing time and increased test preparation time." Dismissive Chapter 6: Improving test-based accountability, p.142 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
137 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "However, there is still much about these systems that is not well understood. Lack of research-based knowledge about the quality of scores and the mechanisms through which high-stakes testing programs operate limits our ability to improve these systems. As a result, our discussions also identified unanswered questions..." Dismissive Chapter 6: Improving test-based accountability, p.143 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
138 Eva L. Baker, Robert L. Linn, Joan L. Herman, and Daniel Koretz   "Because experience with accountability systems is still developing, the standards we propose are intended to help evaluate existing systems and to guide the design of improved procedures." p.1 Dismissive Standards for Educational Accountability Systems CRESST Policy Brief 5, Winter 2002 https://www.gpo.gov/fdsys/pkg/ERIC-ED466643/pdf/ERIC-ED466643.pdf Office of Educational Research and Improvement, US Education Department See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.  Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
139 Eva L. Baker, Robert L. Linn, Joan L. Herman, and Daniel Koretz   "It is not possible at this stage in the development of accountability systems to know in advance how every element of an accountability system will actually operate in practice or what effects it will produce." p.1 Dismissive Standards for Educational Accountability Systems CRESST Policy Brief 5, Winter 2002 https://www.gpo.gov/fdsys/pkg/ERIC-ED466643/pdf/ERIC-ED466643.pdf Office of Educational Research and Improvement, US Education Department See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.  Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
140 Jay P. Heubert   "For Heubert, it is very much an open question what the effect of standards and high-stakes testing will be." p.83 Dismissive Achieving High Standards for All National Research Council   "This project was funded by grant R215U990023 from the Office of Educational Research and Improvement (OERI) of the United States Department of Education." See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
141 Ready, Timothy, Ed.; Edley, Christopher, Jr., Ed.; Snow, Catherine E., Ed.   "To be sure, there is a largely unexamined empirical assertion underlying the arguments of high-stakes proponents: attaching high-stakes consequences for the students provides an indispensable, otherwise unobtainable incentive for students, parents, and teachers to pay careful attention to learning tasks." p. 128 Dismissive Achieving High Standards for All National Research Council   "This project was funded by grant R215U990023 from the Office of Educational Research and Improvement (OERI) of the United States Department of Education." Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
142 Daniel M. Koretz Daniel F. McCaffrey, Laura S. Hamilton "Although high-stakes testing is now widespread, methods for evaluating the validity of gains obtained under high-stakes conditions are poorly developed. This report presents an approach for evaluating the validity of inferences based on score gains on high-stakes tests. It describes the inadequacy of traditional validation approaches for validating gains under high-stakes conditions and outlines an alternative validation framework for conceptualizing meaningful and inflated score gains.", p.1 Denigrating Toward a framework for validating gains under high-stakes conditions CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 https://files.eric.ed.gov/fulltext/ED462410.pdf Office of Educational Research and Improvement, US Education Department In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927); DeWeerdt (1927); French (1959); French & Dear (1959); Ortar (1960); Marron (1965); ETS (1965); Messick & Jungeblut (1981); Ellis, Konoske, Wulfeck, & Montague (1982); DerSimonian & Laird (1983); Kulik, Bangert-Drowns & Kulik (1984); Powers (1985); Jones (1986); Fraker (1986/1987); Halpin (1987); Whitla (1988); Snedecor (1989); Bond (1989); Baydar (1990); Becker (1990); Smyth (1990); Moore (1991); Alderson & Wall (1992); Powers (1993); Oren (1993); Powers & Rock (1994); Scholes, Lane (1997); Allalouf & Ben Shakhar (1998); Robb & Ercanbrack (1999); McClain (1999); Camara (1999, 2001, 2008); Stone & Lane (2000, 2003); Din & Soldan (2001); Briggs (2001); Palmer (2002); Briggs & Hansen (2004); Cankoy & Ali Tut (2005); Crocker (2005); Allensworth, Correa, & Ponisciak (2008); Domingue & Briggs (2009); Koljatic & Silva (2014); Early (2019); Herndon (2021)
143 Daniel M. Koretz Daniel F. McCaffrey, Laura S. Hamilton "Few efforts are made to evaluate directly score gains obtained under high-stakes conditions, and conventional validation tools are not fully adequate for the task.", p. 1 Dismissive Toward a framework for validating gains under high-stakes conditions CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 https://files.eric.ed.gov/fulltext/ED462410.pdf Office of Research and Improvement, US Education Department In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927)  DeWeerdt (1927)  French (1959) French & Dear (1959)  Ortar (1960)  Marron (1965)  ETS (1965). Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984)  Powers (1985)  Jones (1986). Fraker (1986/1987)  Halpin (1987)  Whitla (1988)  Snedecor (1989)  Bond (1989). Baydar (1990)  Becker (1990)  Smyth (1990)  Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Oren (1993). Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009)  Koljatic & Silva (2014)  Early (2019)
144 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Despite their importance and widespread use, little is known about the impact of these tests on states’ recent efforts to improve teaching and learning." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
145 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Little information about the technical soundness of teacher licensure tests appears in the published literature." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
146 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Little research exists on the extent to which licensure tests identify candidates with the knowledge and skills necessary to be minimally competent beginning teachers." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
147 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Information is needed about the soundness and technical quality of the tests that states use to license their teachers." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
148 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "policy and practice on teacher licensure testing in the United States are nascent and evolving" Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.17 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
149 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "The paucity of data and these methodological challenges made the committee’s examination of teacher licensure testing difficult." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.17 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
150 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "There were a number of questions the committee wanted to answer but could not, either because they were beyond the scope of this study, the evidentiary base was inconclusive, or the committee’s time and resources were insufficient." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.17 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
151 Harold F. O’Neil, Jr., University of Southern California, CRESST Jamal Abedi, UCLA/CRESST, Charlotte Lee, UCLA/CRESST, Judy Miyoshi, UCLA/CRESST, Ann Mastergeorge, UCLA/CRESST "To our knowledge, based on an extensive literature review (to be reported elsewhere), our research group is the only one conducting research of this type; i.e., meaningful monetary incentives with released items from either NAEP or TIMSS with 12th graders." p.1 Firstness Monetary Incentives for Low-Stakes Tests, March 2001 report to USED, CRESST https://nces.ed.gov/pubs2001/2001024.pdf "The work reported herein was funded at least in part with Federal funds from the U.S. Department of Education under the American Institutes for Research (AIR)/Education Statistical Services Institute (ESSI) contract number RN95127001, Task Order 1.2.93.1, as administered by the ... NCES. The work reported herein was also supported under the Educational Research and Development Centers Program, PR/Award Number R305B60002, as administered by the Office of Educational Research and Improvement (OERI), U.S. Department of Education."
152 Marguerite Clarke George Madaus “[T]here has been no analogous infrastructure for independently evaluating a testing program before or after implementation, or for monitoring test use and impact.” p. 19 Dismissive The Adverse Impact of High Stakes Testing on Minority Students: Evidence from 100 Years of Test Data In G. Orfield and M. Kornhaber (Eds.), Raising standards or raising barriers? Inequality and high stakes testing in public education. New York: The Century Foundation (2001) http://files.eric.ed.gov/fulltext/ED450183.pdf The Century Foundation External evaluations of large-scale testing programs not only exist, but represent the norm. 
153 Marguerite Clarke George Madaus “The effects of testing are now so diverse, widespread, and serious that it is necessary to establish mechanisms for catalyzing inquiry about, and systematic independent scrutiny of them.” p. 20 Dismissive The Adverse Impact of High Stakes Testing on Minority Students: Evidence from 100 Years of Test Data In G. Orfield and M. Kornhaber (Eds.), Raising standards or raising barriers? Inequality and high stakes testing in public education. New York: The Century Foundation (2001) http://files.eric.ed.gov/fulltext/ED450183.pdf The Century Foundation External evaluations of large-scale testing programs not only exist, but represent the norm. 
154 Ron Dietel   "In the late 1980s, CRESST was among the first to research the measurement of rigorous, discipline-based knowledge for purposes of large-scale assessment." Firstness Center for Research on Evaluation, Standards, and Student Testing (CRESST), clarifying the goals and activities of CRESST EducationNews.org, November 18, 2000   Office of Research and Improvement, US Education Department Nonsense. Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's work in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
155 Marguerite Clarke Madaus, Horn, and Ramos “[F]or most of this century, there has been no infrastructure for independently evaluating a testing programme before or after implementation, or for monitoring test use and impact. The commercial testing industry does not as yet have any structure in place for the regulation and monitoring of appropriate test use.” p. 177 Dismissive Retrospective on Educational Testing and Assessment in the 20th Century Curriculum Studies, 2000, vol. 32, no. 2 http://webpages.uncc.edu/~rglamber/Rsch6109%20Materials/HistoryAchTests_3958652.pdf   External evaluations of large-scale testing programs not only exist, but represent the norm. 
156 Marguerite Clarke Madaus, Horn, and Ramos “Given the paucity of evidence available on the volume of testing over time, we examined five indirect indicators of growth in testing. . . .” p. 169 Dismissive Retrospective on Educational Testing and Assessment in the 20th Century Curriculum Studies, 2000, vol. 32, no. 2 http://webpages.uncc.edu/~rglamber/Rsch6109%20Materials/HistoryAchTests_3958652.pdf   There exist many sources of such information, from the Council of Chief State School Officers (CCSSO), the US Education Department, the US General Accounting Office (GAO), for example.
157 Sheila Barron   "Although this is a topic researchers ... talk about often, very little has been written about the difficulties secondary analysts confront." p.173 Dismissive Difficulties associated with secondary analysis of NAEP data, chapter 9 Grading the Nation's Report Card, National Research Council, 2000 https://www.nap.edu/catalog/9751/grading-the-nations-report-card-research-from-the-evaluation-of National Research Council funders In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schulz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage, et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004). 
158 Sheila Barron   "...few articles have been written that specifically address the difficulties of using NAEP data." p.173 Dismissive Difficulties associated with secondary analysis of NAEP data, chapter 9 Grading the Nation's Report Card, National Research Council, 2000 https://www.nap.edu/catalog/9751/grading-the-nations-report-card-research-from-the-evaluation-of National Research Council funders In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schulz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage, et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004). 
159 Herman, Joan L.    “Testing accommodations that attempt to reduce the language load of a test or otherwise compensate for students' reduced language skills (e.g., by providing students more time) are also currently being researched, but answers that are equitable and fair for all students have not yet been found.” p. 8 Dismissive Student Assessment and Student Achievement in the California Public School System (with Brown and Baker) CSE Technical Report 519, April 2000 https://www.cse.ucla.edu/products/reports/TECH519.pdf Office of Research and Improvement, US Education Department
160 Herman, Joan L.    “Thus, the extent to which gains reflect real improvement in learning is an open question (see, e.g., Shepard, 1990).” p. 15 Dismissive Student Assessment and Student Achievement in the California Public School System (with Brown and Baker) CSE Technical Report 519, April 2000 https://www.cse.ucla.edu/products/reports/TECH519.pdf Office of Research and Improvement, US Education Department
161 R. L. Linn   "There are many reasons for the Lake Wobegon Effect, most of which are less sinister than those emphasized by Cannell." Denigrating Assessments and Accountability, p.7 Educational Researcher, March 2000, pp. 4–16. https://journals.sagepub.com/doi/abs/10.3102/0013189x029002004 Office of Research and Improvement, US Education Department No. Cannell was exactly right. The cause was corruption, lax security, and cheating. See, for example, https://nonpartisaneducation.org/Review/Articles/v6n3.htm
162 Lorrie A. Shepard   "This portrayal derives mostly from research leading to Wood and Bruner’s original conception of scaffolding, from Vygotskian theory, and from naturalistic studies of effective tutoring described next. Relatively few studies have been undertaken in which explicit feedback interventions have been tried in the context of constructivist instructional settings." Dismissive The Role of Classroom Assessment in Teaching and Learning, p.59 CSE Technical Report 517, February 2000 https://nepc.colorado.edu/sites/default/files/publications/TECH517.pdf Office of Research and Improvement, US Education Department  
163 Lorrie A. Shepard   "The NCTM and NRC visions are idealizations based on beliefs about constructivist pedagogy and reflective practice. Although both are supported by examples of individual teachers who use assessment to improve their teaching, little is known about what kinds of support would be required to help large numbers of teachers develop these strategies or to ensure that teacher education programs prepared teachers to use assessment in these ways. Research is needed to address these basic implementation questions." Dismissive The Role of Classroom Assessment in Teaching and Learning, p.64 CSE Technical Report 517, February 2000 https://nepc.colorado.edu/sites/default/files/publications/TECH517.pdf Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
164 Lorrie A. Shepard   "This social-constructivist view of classroom assessment is an idealization. The new ideas and perspectives underlying it have a basis in theory and empirical studies, but how they will work in practice and on a larger scale is not known." Dismissive The Role of Classroom Assessment in Teaching and Learning, p.67 CSE Technical Report 517, February 2000 https://nepc.colorado.edu/sites/default/files/publications/TECH517.pdf Office of Research and Improvement, US Education Department  
165 Marguerite Clarke Madaus, Pedulla, and Shore “The National Board believes that we must as a nation conduct research that helps testing contribute to student learning, classroom practice, and state and district management of school resources.” p. 2 Dismissive An Agenda for Research on Educational Testing NBETPP Statements, Vol. 1, No. 1, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456137.pdf Ford Foundation Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
166 Marguerite Clarke Madaus, Pedulla, and Shore “Validity research on teacher testing needs to address the following four issues in particular. . .” : [four bullet-point paragraphs follow] p. 3 Dismissive An Agenda for Research on Educational Testing NBETPP Statements, Vol. 1, No. 1, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456137.pdf Ford Foundation  
167 Marguerite Clarke Madaus, Pedulla, and Shore “[W]e need to understand better the relationship between testing and the diversity of the college student body.” p. 6 Dismissive An Agenda for Research on Educational Testing NBETPP Statements, Vol. 1, No. 1, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456137.pdf Ford Foundation  
168 Marguerite Clarke Haney, Madaus “We trust that further research will build on this good example and help all of us move from suggestive correlational studies towards more definitive conclusions.” p. 9 Firstness High Stakes Testing and High School Completion NBETPP Statements, Volume 1, Number 3, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456139.pdf Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan(1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
169 Jay P. Heubert Robert M. Hauser "A growing body of research suggests that tests often do in fact change school and classroom practices (Corbett & Wilson, 1991; Madaus, 1988; Herman & Golan 1993; Smith & Rottenberg, 1991)." p.29 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
170 Jay P. Heubert Robert M. Hauser "A growing body of research suggests that tests often do in fact change school and classroom practices (Corbett & Wilson, 1991; Madaus, 1988; Herman & Golan 1993; Smith & Rottenberg, 1991)." p.29 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
171 Jay P. Heubert Robert M. Hauser "Most standards-based assessments have only recently been implemented or are still being developed. Consequently, it is too early to determine whether they will produce the intended effects on classroom instruction." p.36 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
172 Jay P. Heubert Robert M. Hauser "A recent review of the available research evidence by Mehrens (1998) reaches several interim conclusions. Drawing on eight studies...." p.36 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan(1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
173 Jay P. Heubert Robert M. Hauser "Although there are no national data summarizing how local districts use standardized tests in certifying students, we do know that several of the largest school systems have begun to use test scores in determining grade-to-grade promotion (Chicago) or are considering doing so (New York City, Boston)." p.37 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan(1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
174 Jay P. Heubert Robert M. Hauser "There is very little research that specifically addresses the consequences of graduation testing." p.172 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan(1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
175 Jay P. Heubert Robert M. Hauser "Catterall adds, 'initial boasts and doubts alike regarding the effects of gatekeeping competency testing have met with a paucity of follow-up research.'" p.172 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan(1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
176 Jay P. Heubert Robert M. Hauser "in one of the few such studies on this topic (Bishop, 1997) compared the Third International Mathematics and Science Study (TIMSS) test scores of countries with and without rigorous graduation tests. He found that countries with demanding exit exams outperformed other countries at a comparable level of development. He concluded, however that such exams were probably not the most important determinant of achievement levels and that more research was needed." p.173 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
177 Jay P. Heubert Robert M. Hauser "Very little is known about the specific consequences of passing or failing a high school graduation exam." p.176 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
178 Jay P. Heubert Robert M. Hauser "American experience is limited and research is needed to explore their effectiveness. For instance, we do not know how to combine advance notice of high-stakes test requirements, remedial intervention, and opportunity to retake graduation tests." p.180 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
179 Jay P. Heubert Robert M. Hauser "Research is also needed to explore the effects of different kinds of high school credentials on employment and other post-school outcomes." p.180 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
180 Jay P. Heubert Robert M. Hauser "At the same time, solid evaluation research on the most effective remedial approaches is sparse." p.183 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Developmental (i.e., remedial) education researchers have conducted many studies to determine what works best to keep students from failing in their “courses of last resort,” after which there are no alternatives.  Researchers have included Boylan, Roueche, McCabe, Wheeler, Kulik, Bonham, Claxton, Bliss, Schonecker, Chen, Chang, and Kirk.
181 Jay P. Heubert Robert M. Hauser "There is plainly a need for good research on effective remedial education." p.183 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Developmental (i.e., remedial) education researchers have conducted many studies to determine what works best to keep students from failing in their “courses of last resort,” after which there are no alternatives.  Researchers have included Boylan, Roueche, McCabe, Wheeler, Kulik, Bonham, Claxton, Bliss, Schonecker, Chen, Chang, and Kirk.
182 Jay P. Heubert Robert M. Hauser "However, in most of the nation, much needs to be done before a world-class curriculum and world-class instruction will be in place." p.277 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
183 Jay P. Heubert Robert M. Hauser "The committee sees a strong need for better evidence on the benefits and costs of high-stakes testing." p.281 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
184 Jay P. Heubert Robert M. Hauser "Very little is known about the specific consequences of passing or failing a high school graduation exam." p.288 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above.  Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry
185 Jay P. Heubert Robert M. Hauser "At present, however, advanced skills are often not well defined and ways of assessing them are not well established." p.289 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
186 Jay P. Heubert Robert M. Hauser "...in many cases, the demands that full participation of these students [i.e., students with disabilities] place on assessment systems are greater than current assessment knowledge and technology can support." p.191 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
187 Jay P. Heubert Robert M. Hauser "...available evidence about the possible effects of graduation tests on learning and on high school dropout is inconclusive (e.g., Kreitzer et al., 1989; Reardon, 1996; Catterall, 1990; Cawthorne, 1990; Bishop, 1997)." Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above.  Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry
188 Jay P. Heubert Robert M. Hauser "We do not know how to combine advance notice of high-stakes test requirements, remedial intervention, and opportunity to retake graduation tests. Research is also needed to explore the effects of different kinds of high school credentials on employment and other post-school outcomes." p.289 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above.  Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry
189 Robert L. Linn   "Two obvious, but frequently ignored, cautions [from the TIERS experience] are these: . . . " p. 6 Denigrating Assessments and Accountability CSE Technical Report 490 (November 1998) http://www.cse.ucla.edu/products/Reports/TECH490.pdf Office of Research and Improvement, US Education Department  
190 Robert L. Linn   "Moreover, it is critical to recognize first that the choice of constructs matters, and so does the way in which measures are developed and linked to the constructs. Although these two points may be considered obvious, they are too often ignored." p. 13 Denigrating Assessments and Accountability CSE Technical Report 490 (November 1998) http://www.cse.ucla.edu/products/Reports/TECH490.pdf Office of Research and Improvement, US Education Department  
191 Robert L. Linn   “Although that claim is subject to debate, it seldom even gets considered when aggregate results are used either to monitor progress (e.g., NAEP) or for purposes of school, district, or state accountability.” p. 16 Dismissive Assessments and Accountability CSE Technical Report 490 (November 1998) http://www.cse.ucla.edu/products/Reports/TECH490.pdf Office of Research and Improvement, US Education Department  
192 Lawrence O. Picus Alisha Tralli "What is surprising is, given the tremendous emphasis placed on assessment systems to measure school accountability, the relatively minuscule portion of educational expenditures devoted to this important and highly visible component of the educational system." p.66 Dismissive Alternative assessment programs: What are the true costs?  CSE Technical Report 441, February 1998 https://cresst.org/publications/cresst-publication-2813/?_sf_s=441 Office of Research and Improvement, US Education Department The taxpayers ponied up big time to fund the GAO study, which Picus has spent his whole career misrepresenting, demeaning, or dismissing. By 1998, it is simply not believable that his continuing efforts stem from honest misunderstanding. He is deliberately misrepresenting previous research on the topic in order to advance his own work and career.
193 Lawrence O. Picus Alisha Tralli "In all of these analyses, except the GAO report, the cost estimates are based on the direct costs of the assessment program. The GAO is the only other organization we are aware of that has attempted to estimate the opportunity costs of personnel time, in attempting to determine the full costs of assessment programs. The GAO study, however, did not focus specifically on state assessment programs that included portfolios, an important factor in the higher cost estimates identified in the present study." p.64 Denigrating Alternative assessment programs: What are the true costs?  CSE Technical Report 441, February 1998 https://cresst.org/publications/cresst-publication-2813/?_sf_s=441 Office of Research and Improvement, US Education Department The previous 63 pages of the Picus and Tralli report claimed that theirs was the first study to look at opportunity costs and that all previous studies were "just expenditure studies" that ignored "true" opportunity costs. Then, here, on page 64, they finally admit something partly truthful about the earlier and vastly better GAO report, but also immediately attempt to demean it, because it did not estimate the costs of Vermont's doomed portfolio program, which did not exist when the GAO did its study.
194 Lawrence O. Picus Alisha Tralli "Costs and expenditures are not synonymous terms. Monk (1995) distinguishes between these two terms. Costs are “measures of what must be foregone to realize some benefit,” while expenditures are “measures of resource flows regardless of their consequence” (p. 365). Expenditures are generally easier to track since accounting systems typically report resource flows by object, e.g., instruction, administration, transportation. Typically, most cost analyses in education focus on these measurable expenditures and ignore the more difficult measures of opportunity. The goal of this report is to move one step beyond past work and estimate these economic costs as well." p.5 Denigrating Alternative assessment programs: What are the true costs?  CSE Technical Report 441, February 1998 https://cresst.org/publications/cresst-publication-2813/?_sf_s=441 Office of Research and Improvement, US Education Department No. Picus & Tralli neither did the first study of opportunity costs, nor the first study of opportunity costs in those two states. The 1993 GAO study did both. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
195 Lawrence O. Picus Alisha Tralli "Although several states have implemented new assessment programs, there has been little research on the costs of developing and implementing these new systems." p.4 Dismissive Alternative assessment programs: What are the true costs?  CSE Technical Report 441, February 1998 https://cresst.org/publications/cresst-publication-2813/?_sf_s=441 Office of Research and Improvement, US Education Department No. Picus & Tralli neither did the first study of opportunity costs, nor the first study of opportunity costs in those two states. The 1993 GAO study did both. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
196 Lawrence O. Picus Alisha Tralli "The purpose of this report is to provide a first detailed analysis of the “economic” or opportunity costs of the testing systems in two states, Kentucky and Vermont." p.2 1stness Alternative assessment programs: What are the true costs?  CSE Technical Report 441, February 1998 https://cresst.org/publications/cresst-publication-2813/?_sf_s=441 Office of Research and Improvement, US Education Department No. Picus & Tralli neither did the first study of opportunity costs, nor the first study of opportunity costs in those two states. The 1993 GAO study did both. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
197 Anne Lewis quoting Arnold Fege, National PTA "The national testing proposal is based on 'quantum leap' theories, not on research, contended Arnold Fege of the National PTA. 'As I listened to the presentations this morning,' he said, 'I didn't hear about any research that backs up the introduction of national testing.' In his opinion, 'no parent in the country is losing sleep because his or her child is not meeting NAEP standards,' and even though testing is pervasive in American education, it seems not to have made a big impact on change." Dismissive Assessing Student Achievement: Search for Validity and Balance CSE Technical Report 481 (1997) https://cresst.org/wp-content/uploads/TECH481.pdf Office of Research and Improvement, US Education Department In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schulz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage, et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004).
198 Eva L. Baker Zenaida Aguirre-Munoz "The extent and nature of the impact of language skills on performance assessments remains elusive due to the paucity of research in this area." Dismissive Improving the equity and validity of assessment-based information systems, p.3 CSE Technical Report 462, December 1997 https://cresst.org/wp-content/uploads/TECH462.pdf Office of Research and Improvement, US Education Department  
199 Joan L. Herman   "Although conceptual models for analyzing the cost of alternative assessment and for conducting cost-benefit analyses have been formulated (Catterall & Winters, 1994; Picus, 1994), definitive cost studies are yet to be completed (see, however, Picus & Tralli, forthcoming)." p. 30 Dismissive, Denigrating Large-Scale Assessment in Support of School Reform: Lessons in the Search for Alternative Measures CSE Technical Report 446, Oct. 1997 http://www.cse.ucla.edu/products/reports/TECH446.pdf Office of Research and Improvement, US Education Department No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
200 Robert L. Linn Eva L. Baker "Very little research has been conducted to validate performance standards, particularly those that include specification of student response attributes." pp. 26-27 Dismissive Emerging Educational Standards of Performance in the United States CSE Technical Report 437 (August 1997) http://www.cse.ucla.edu/products/reports/TECH437.pdf Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
201 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "However, as d'Ydewalle (1987) has pointed out, 'clear-cut results from neat experiments on the impact of motivation on learning [or performance] do not exist.'" Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.5 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
202 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "In the educational context, most existing studies have focused on the influence of characteristics of the classroom learning environment, such as rewards, teacher feedback, goal structures, evaluation practices, on either the antecedents or consequences of motivation." Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.5 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. 
International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
203 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "Most of the studies that have compared goal orientations have examined their effects on performance during classroom learning activities rather than at the time of test taking." Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.7 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
204 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "As yet, there appear to be no published studies that investigate the direct and indirect causal paths from motivational antecedents through use of metacognitive strategies to achievement."  Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.8 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
205 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "In general, there is a need for more studies to focus on the effects on test performance of motivational antecedents (not just anxiety) introduced at the time of test taking." Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.10 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
206 Brian M. Stecher Stephen P. Klein "In contrast, relatively little has been published on the costs of such measures [performance tests] in operational programs. An Office of Technology Assessment (1992) … (Hoover and Bray) …." Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 The January 1993 GAO report on testing costs included such information. CRESST has spent a quarter century denigrating that report.
207 Brian M. Stecher Stephen P. Klein "However, empirical and observational data suggest much more needs to be done to understand what hands-on tasks actually measure. Klein et al. (1996b) … Shavelson et al. (1992) … Hamilton (1994) …." pp.9-10 Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 Article references only works by other CRESST authors and completely ignores the career-tech education literature, where such studies are most likely to be found.
208 Brian M. Stecher Stephen P. Klein "Future research will no doubt shed more light on the validity question, but for now, it is not clear how scores on hands-on performance tasks should be interpreted." p.10 Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 Article references only works by other CRESST authors and completely ignores the career-tech education literature, where such studies are most likely to be found.
209 Brian M. Stecher Stephen P. Klein "Advocates of performance assessment believe that the use of these measures will reinforce efforts to reform curriculum and instruction. … Unfortunately, there is very little research to confirm either the existence or the size of most of these potential benefits. Those few studies ... Klein (1995) ... Javonovic, Solanno-Flores, & Shavelson, 1994; Klein et al., 1996a)." p.10 Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 Article references only works by other CRESST authors and completely ignores the career-tech education literature, where such studies are most likely to be found.
210 Mary Lee Smith 11 others "The purpose of the research described in this report is to understand what happens in the aftermath of a change in state assessment policy that is designed to improve schools and make them more accountable to a set of common standards. Although theoretical and rhetorical works about this issue are common in the literature, empirical evidence is novel and scant." Dismissive Reforming schools by reforming assessment: Consequences of the Arizona Student Assessment Program (ASAP): Equity and teacher capacity building, p.3 CSE Technical Report 425, March 1997 https://cresst.org/wp-content/uploads/TECH425.pdf Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
211 Robert L. Linn Joan L. Herman "How much do standards-led assessments cost? Dependable estimates are difficult to obtain, in part because many of the costs associated with assessment -- the time spent by teachers in preparation, administration, and scoring -- are typically absorbed by schools' normal operations and not priced in a separate budget." p.14 Denigrating A Policymaker's Guide to Standards-Led Assessment Education Commission of the States, February 1997     The January 1993 GAO report on testing costs included such information. CRESST has spent a quarter century denigrating that report. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
212 Robert L. Linn Joan L. Herman "None of the above estimates includes operational costs for schools, districts, or states." p.14 Denigrating A Policymaker's Guide to Standards-Led Assessment Education Commission of the States, February, 1997     The January 1993 GAO report on testing costs included such information. CRESST has spent a quarter century denigrating that report.
213 Eva L. Baker Robert L. Linn, Joan L. Herman "How do we assure accurate placement of students with varying abilities and language capabilities? There is little research to date to guide policy and practice (August, et al., 1994)." Dismissive CRESST: A Continuing Mission to Improve Educational Assessment, p.12 Evaluation Comment, Summer 1996   Office of Research and Improvement, US Education Department  
214 Eva L. Baker Robert L. Linn, Joan L. Herman "Alternative assessments are needed for these students (see Kentucky Portfolios for Special Education, Kentucky Department of Education, 1995). Although promising, there has been little or no research investigating the validity of inferences from these adaptations or alternatives." Dismissive CRESST: A Continuing Mission to Improve Educational Assessment, p.13 Evaluation Comment, Summer 1996   Office of Research and Improvement, US Education Department  
215 Eva L. Baker Robert L. Linn, Joan L. Herman "Similarly, research is needed to provide a basis for understanding the implications of using different summaries of student performance, such as group means or percentage of students meeting a standard, for measuring progress." p.15 Dismissive CRESST: A Continuing Mission to Improve Educational Assessment Evaluation Comment, Summer 1996   Office of Research and Improvement, US Education Department  
216 Robert L. Linn Daniel M. Koretz, Eva Baker “’Yet we do not have the necessary comprehensive dependable data. . . .’ (Tyler 1996a, p. 95)” p. 8 Dismissive Assessing the Validity of the National Assessment of Educational Progress CSE Technical Report 416 (June 1996) http://www.cse.ucla.edu/products/reports/TECH416.pdf Office of Research and Improvement, US Education Department In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schuiz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004).
217 Robert L. Linn Daniel M. Koretz, Eva Baker "There is a need for more extended discussion and reconsideration of the approach being used to measure long-term trends." p. 21 Dismissive Assessing the Validity of the National Assessment of Educational Progress CSE Technical Report 416 (June 1996) http://www.cse.ucla.edu/products/reports/TECH416.pdf Office of Research and Improvement, US Education Department There was extended discussion and consideration. Simply put, they did not get their way because others disagreed with them.
218 Robert L. Linn Daniel M. Koretz, Eva Baker "Only a small minority of the articles that discussed achievement levels made any mention of the judgmental nature of the levels, and most of those did so only briefly." p. 27 Denigrating Assessing the Validity of the National Assessment of Educational Progress CSE Technical Report 416 (June 1996) http://www.cse.ucla.edu/products/reports/TECH416.pdf Office of Research and Improvement, US Education Department All achievement levels, just like all course grades, are set subjectively. This information was never hidden.
219 Thomas Kellaghan George F. Madaus, Anastasia Raczek "The limited evidence on the effectiveness of external, or extrinsic, rewards in education is also reviewed." p.vii Dismissive The Use of External Examinations to Improve Student Motivation American Educational Research Association monograph   "Work on this monograph was supported by Grant 910-1205-1 from the Ford Foundation." See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.
220 Lawrence O. Picus Alisha Tralli, Suzanne Tacheny "Although several states have implemented new assessment programs, there has been little research on the costs of developing and implementing these new systems." p.4 Dismissive Estimating the Costs of Student Assessment in North Carolina and Kentucky: A State-Level Analysis CSE Technical Report 408 (February 1996) http://www.cse.ucla.edu/products/reports/TECH408.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
221 Lawrence O. Picus Alisha Tralli, Suzanne Tacheny "Although several states have implemented new assessment programs, there has been little research on the cost of developing and implementing these new systems." p.3 Dismissive Estimating the Costs of Student Assessment in North Carolina and Kentucky: A State-Level Analysis CSE Technical Report 408 (February 1996) http://www.cse.ucla.edu/products/reports/TECH408.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
222 Thomas Kellaghan George F. Madaus, Anastasia Raczek "At the very least, a careful analysis of relevant issues and a consideration of empirical evidence are required before reaching such a conclusion.   However, the arguments put forward by reformers are not based on such analysis or consideration. Indeed, their arguments often lack clarity, even in the terminology they use. Further, although not much research deals directly with the relationship between external examinations and motivation, ..." p.2 Dismissive, Denigrating The Use of External Examinations to Improve Student Motivation American Educational Research Association monograph   "Work on this monograph was supported by Grant 910-1205-1 from the Ford Foundation." Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
223 Thomas Kellaghan George F. Madaus, Anastasia Raczek "The final proposition in the armory of proponents of external examinations anticipates that all students at selected grades at both elementary and high school levels will take such examinations. This proposition is presumably based on the unexamined assumption that the motivational power of examinations will operate more or less the same way for students of all ages." p.10 Dismissive, Denigrating The Use of External Examinations to Improve Student Motivation American Educational Research Association monograph   "Work on this monograph was supported by Grant 910-1205-1 from the Ford Foundation." Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
224 Robert L. Linn Eva L. Baker "Although the connection between student achievement and economic competitiveness is not well established, exhortations for higher standards of student achievement nonetheless are frequently based on the assumption of a strong connection." Dismissive What Do International Assessments Imply for World-Class Standards? Educational Evaluation and Policy Analysis, Dec. 1, 1995 https://journals.sagepub.com/doi/abs/10.3102/01623737017004405 Office of Research and Improvement, US Education Department  
225 Lawrence O. Picus   "While our understanding of how each of these assessment instruments can best be used is growing, information of their costs is virtually nonexistent." p.1 Dismissive A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
226 Lawrence O. Picus   "Research at the Center for Research on Evaluation, Standards, and Student Testing (CRESST) has found that policy makers have little information about the costs of alternative assessments, and that they are concerned about the cost trade-offs involved in using alternative assessment compared to the many other activities they feel continue to be necessary." p.1 Dismissive A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
227 Lawrence O. Picus   "A number of important issues must be resolved before accurate estimates of costs can be developed. Central among those issues is the development of a clear definition of what constitutes a cost." p.1 Denigrating A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
228 Lawrence O. Picus   "Determining the resources necessary to achieve each of these goals is, at best, a difficult task. Because of this difficulty, many analysts stop short of estimating the true cost of a program, and instead focus on the expenditures required for its implementation." pp.3-4 Denigrating A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
229 Lawrence O. Picus   "… cost analysts in education have often resorted to estimating the monetary value of the resources devoted to the program being evaluated. ... However, it is important to remember the opportunity costs that result from time commitments of individuals not directly compensated through the assessment program, such as the teachers who are required to spend time on tasks that previously did not exist or were not their responsibility. Determining the value of these opportunity costs will improve the quality of educational cost analyses dramatically." p.33 Denigrating A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
230 Mary Lee Smith 5 others "This study also draws on previous research on the role of mandated testing. …The question unanswered by extant research is whether assessments that differ in form from the traditional, norm- or criterion-referenced standardized tests would produce similar reactions and effects." Dismissive What Happens When the Test Mandate Changes? Results of a Multiple Case Study CSE Technical Report 380, July 1994 https://cresst.org/wp-content/uploads/TECH380.pdf Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
231 Linn, R.L.   "Evidence is also needed that the uses and interpretations are contributing to enhanced student achievement and at the same time, not producing unintended negative outcomes." p.8   Performance Assessment: Policy promises and technical measurement standards.  Educational Researcher, 23(9), 4-14, 1994 As quoted in William A. Mehrens, Consequences of Assessment: What is the Evidence?, Education Policy Analysis Archives Volume 6 Number 13 July 14, 1998,  https://epaa.asu.edu/ojs/article/view/580/ Office of Research and Improvement, US Education Department  
232 Audrey J. Noble Mary Lee Smith "Are the behaviorist beliefs underlying measurement-driven reform warranted? A small body of evidence addresses the functions of assessments from the traditional viewpoint." Dismissive Old and New Beliefs About Measurement-Driven Reform: The More Things Change, the More They Stay the Same, p.3 CSE Technical Report 373, CRESST/Arizona State University https://cresst.org/wp-content/uploads/TECH373.pdf Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
233 Audrey J. Noble Mary Lee Smith "Few empirical studies exist of the use and effects of performance testing in high-stakes environments." Dismissive Old and New Beliefs About Measurement-Driven Reform: The More Things Change, the More They Stay the Same, p.10 CSE Technical Report 373, CRESST/Arizona State University https://cresst.org/wp-content/uploads/TECH373.pdf Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
234 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Sufficient high-quality assessments must be available before their impact on educational reform can be assessed. Although interest in performance-based assessment is high, our knowledge about its quality is low." Dismissive Policy and validity prospects for performance-based assessment, 1993, p.332 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance assessments have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
235 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Moreover, few psychometric templates exist to guide the technical practices of assessment developers." Dismissive Policy and validity prospects for performance-based assessment, 1993, p.332 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance assessments have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
236 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Most of the arguments in favor of performance-based assessment ... are based on single instances, essentially hand-crafted exercises whose virtues are assumed because they have been developed by teachers or because they are thought to model good instructional practice."  Denigrating Policy and validity prospects for performance-based assessment, 1993, p.334 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
237 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Although there is a considerable literature on the problem of unit or team assessment in the military (Swezey & Salas, 1992) and in technical fields such as antisubmarine warfare (Franken, in press), no compelling solutions have been forwarded for disaggregating group or team performance into individual records, a potential problem if assessments are to be used to allocate individual access or certification." Denigrating Policy and validity prospects for performance-based assessment, 1993, p.336 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
238 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "What is the evidence in support of performance assessment? Reviews conducted of literature in military performance assessments (Baker, O’Neil, & Linn, 1990) and of literature in education (Baker, 1990b) have reported the relatively low incidence of any empirical literature in the field; less than 5% of the literature cited empirical data." Dismissive Policy and validity prospects for performance-based assessment, 1993, pp.339-340 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
239 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "To date, there is some evidence that precollegiate performance assessments result in relatively low levels of student performance in almost every subject matter area in which they have been tried. There is also emerging data from NAEP analyses (Koretz, Lewis, Skewes-Cox, & Burstein, 1992) that students differ by ethnicity in the rate at which they attempt more open-ended types of items." Dismissive Policy and validity prospects for performance-based assessment, 1993, p.341 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
240 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Research is underway attempting to address the motivational aspects of these assessments (Gearhart, Saxe, Stipek, & Hakansson, 1992; O’Neil, Sugrue, Abedi, Baker, & Golan, 1992)." Dismissive Policy and validity prospects for performance-based assessment, 1993, p.341 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
241 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Another approach might require the reconceptualization of the unit of assessment to include both teacher and student and thereby to legitimate help of various sorts. As yet, there is little research and only occasional speculation about the degree to which new assessments will be corrupted." Dismissive Policy and validity prospects for performance-based assessment, 1993, pp.344-345 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
242 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "A better research base is needed to evaluate the degree to which newly developed assessments fulfill expectations" Denigrating Policy and validity prospects for performance-based assessment, 1993, p.346 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
243 Eva L. Baker Robert L. Linn "Because performance assessments are emerging phenomena, procedures for assessing their quality are in some disorder." Denigrating The Technical Merits of Performance Assessments, p.1 CRESST Line, Special 1993 AERA Issue   Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
244 Eva L. Baker Robert L. Linn "Second, there is relatively little analysis of the sequence of technical procedures required to render assessments sound for some uses."  Dismissive The Technical Merits of Performance Assessments, p.1 CRESST Line, Special 1993 AERA Issue   Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
245 Eva L. Baker Robert L. Linn "The problem is that we cannot learn enough from the conduct of short-term instructional studies, nor can we wait for the results of longer-term instructional programs. ...We must continue to operate on faith." Denigrating The Technical Merits of Performance Assessments, p.2 CRESST Line, Special 1993 AERA Issue   Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
246 Walter M. Haney George F. Madaus, Robert Lyons "Academics who write about educational and psychological testing similarly have given little attention to the commercial side of testing." p.9 Dismissive The Fractured Marketplace for Standardized Testing National Commission on Testing and Public Policy, Boston College, Kluwer Academic Publishers, 1993   "Finally we thank the Ford Foundation, and three present and former officials there, …"  
247 Walter M. Haney George F. Madaus, Robert Lyons "Nor is there much clear evidence on the potential distortions introduced by the Lake Wobegon phenomenon." p.231 Dismissive The Fractured Marketplace for Standardized Testing National Commission on Testing and Public Policy, Boston College, Kluwer Academic Publishers, 1993   "Finally we thank the Ford Foundation, and three present and former officials there, …" John J. Cannell's original "Lake Wobegon Effect" studies did a fine job of specifying the results, in detail.  See:  http://nonpartisaneducation.org/Review/Books/CannellBook1.htm  http://nonpartisaneducation.org/Review/Books/Cannell2.pdf
248 Robert L. Linn Vonda L. Kiplinger "Unfortunately, there have been no empirical studies to date to either support or reject the hypothesized lack of motivation generated by the NAEP testing environment, or to show whether students' performance would be improved if motivation were increased." 1stness Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department A cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
249 Robert L. Linn Vonda L. Kiplinger "Although much has been written on achievement motivation per se, there has been surprisingly little empirical research on the effects of different motivation conditions on test performance. Before examining the paucity of research on the relationship of motivation and test performance...." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department A cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
250 Robert L. Linn Vonda L. Kiplinger "Before examining the paucity of research on the relationship of motivation and test performance, we first review briefly the general literature on the relationship of motivation and achievement." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department A cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
251 Robert L. Linn Vonda L. Kiplinger "Prior to 1980, achievement motivation theory focused primarily on the need for achievement and the effects of test anxiety on test performance." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
252 Robert L. Linn Vonda L. Kiplinger "Despite continuing concern regarding the effects of motivation on student achievement and test performance in general, ...there has been very little empirical research on students' self-reported motivation levels or experimental manipulation of motivational conditions--until recently." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
253 Joan L. Herman   "Although the development of new alternatives is a popular idea, and many are engaged in the process, most developers of these new alternatives (with the exception of writing assessments) are at the design and prototyping stages, at some distance from having validated assessments." Dismissive Accountability and Alternative Assessment: Research and Development Issues, p.9 CSE Technical Report 348, August 1992 https://cresst.org/wp-content/uploads/TECH348.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
254 Joan L. Herman   "Yet what we know about alternative or performance-based measures is relatively small when compared to what we have yet to discover." Dismissive Accountability and Alternative Assessment: Research and Development Issues, p.9 CSE Technical Report 348, August 1992 https://cresst.org/wp-content/uploads/TECH348.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
255 Lorrie A. Shepard   "Proponents of measurement-driven instruction (MDI) argued, in the 1980s, that high-stakes tests would set clear targets thus assuring that teachers would focus greater attention on essential basic skills. Critics countered that measurement-driven instruction distorts the curriculum, .... Each side argued theoretically and from limited observations but without systematic proof of these assertions." Dismissive Will National Tests Improve Student Learning?, p.6 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
256 Lorrie A. Shepard   "The vision of curriculum-driven examinations offered by the National Education Goals Panel is inspired. However, we do not at present have the technical, curricular, or political know-how to install such a system at least not on so large a scale." Dismissive Will National Tests Improve Student Learning?, p.10 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department  
257 Lorrie A. Shepard   "Moreover, there is no evidence available about what would happen to the quality of instruction if all high-school teachers, not just those who volunteered, were required to teach to the AP curricula." Dismissive Will National Tests Improve Student Learning?, p.10 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department  
258 Lorrie A. Shepard   "Research evidence on the effects of traditional standardized tests when used as high-stakes accountability instruments is strikingly negative." Dismissive Will National Tests Improve Student Learning?, pp.15-16 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
259 Joan L. Herman Shari Golan "Using greater technical rigor, Linn et al. (1989) replicated Cannell's findings, but moved beyond them in identifying underlying causes for such seemingly spurious results, among them the age of norms." pp.10-11 Denigrating Effects of Standardized Testing on Teachers and Learning—Another Look CSE Report No. 334 https://eric.ed.gov/?id=ED341738 Office of Research and Improvement, US Education Department No. Cannell was exactly right. The cause was corruption, lax security, and cheating. See, for example, https://nonpartisaneducation.org/Review/Articles/v6n3.htm
260 R.J. Dietel, J.L. Herman, and R.A. Knuth   "Although there is now great excitement about performance-based assessment, we still know relatively little about methods for designing and validating such assessments. CRESST is one of many organizations and schools researching the promises and realities of such assessments." p.3 Dismissive What Does Research Say About Assessment? North Central Regional Education Laboratory, 1991 http://methodenpool.uni-koeln.de/portfolio/What%20Does%20Research%20Say%20About%20Assessment.htm Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
261 R.J. Dietel, J.L. Herman, and R.A. Knuth   "What we know about performance-based assessment is limited and there are many issues yet to be resolved." p.6 Dismissive What Does Research Say About Assessment? North Central Regional Education Laboratory, 1991 http://methodenpool.uni-koeln.de/portfolio/What%20Does%20Research%20Say%20About%20Assessment.htm Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
262 Mary Lee Smith Carole Edelsky, Kelly Draper, Claire Rottenberg, Meredith Cherland "The research literature on the effects of external testing is small but growing." p.3 Dismissive The Role of Testing in Elementary Schools CSE Technical Report 321, May 1991 https://cresst.org/wp-content/uploads/TECH334.pdf Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
263 Mary Lee Smith Carole Edelsky, Kelly Draper, Claire Rottenberg, Meredith Cherland "Past researchers have not examined the classroom directly for traces of testing effects." p.5 Dismissive The Role of Testing in Elementary Schools CSE Technical Report 321, May 1991 https://cresst.org/wp-content/uploads/TECH334.pdf Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
264 Eva L. Baker   "Knowledge Base: Paltry But Sure to Improve: At the same time that interest in alternative assessment is high, our knowledge about the design, distribution, quality and impact of such efforts is low. This is a time of tingling metaphor, cottage industry, and existence proofs rather than carefully designed research and development." Dismissive What Probably Works in Alternative Assessment, p.2 Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991)  https://files.eric.ed.gov/fulltext/ED512658.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
265 Eva L. Baker   "Moreover, because psychometric methods appropriate for dealing with such new measures are not readily available, nor even a matter of common agreement, no clear templates exist to guide the technical practices of alternative assessment developers (Linn, Baker, Dunbar, 1991)." Dismissive What Probably Works in Alternative Assessment, p.2 Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991)  https://files.eric.ed.gov/fulltext/ED512658.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
266 Eva L. Baker   "Given that the level of empirical work is so obviously low, one well might wonder what these studies are about." Denigrating What Probably Works in Alternative Assessment, p.3 Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991)  https://files.eric.ed.gov/fulltext/ED512658.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
267 Eva L. Baker   "Despite this fragile research base, alternative assessment has already taken off. What issues can we anticipate being raised by relevant communities about the value of these efforts?" Dismissive What Probably Works in Alternative Assessment, p.6 Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991)  https://files.eric.ed.gov/fulltext/ED512658.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
268 Eva L. Baker   "This phenomenon may be due to lack of coherent specifications of the performance task domain, lack of coherent instructional experience, or the inherent instability of more complex performance? Until some insight on this phenomenon can be developed, however, using a single performance assessment for individual student decisions is a scary prospect." Dismissive What Probably Works in Alternative Assessment, p.7 Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991)  https://files.eric.ed.gov/fulltext/ED512658.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
269 Lorrie A. Shepard Catherine Cutts Dougherty "Evidence to support the positive claims for measurement-driven instruction comes primarily from high-stakes tests themselves. For example, Popham, Cruse, Rankin, Sandifer, and Williams (1985) and Popham (1987) pointed to the steeply rising passing rates on minimum competency tests as demonstrations that MDI had improved student learning." p.2 Denigrating Effect of High-Stakes Testing on Instruction Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991) and the National Council on Measurement in Education (Chicago, IL, April 4-6,1991) https://files.eric.ed.gov/fulltext/ED337468.pdf Office of Research and Improvement, US Education Department The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above.  Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry
270 Lorrie A. Shepard Catherine Cutts Dougherty "Evidence documenting the negative influence on instruction is limited to a few studies. Darling-Hammond and Wise (1985) reported that teachers in their study were pressured to 'teach to the test.'" Dismissive Effect of High-Stakes Testing on Instruction Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991) and the National Council on Measurement in Education (Chicago, IL, April 4-6,1991) https://files.eric.ed.gov/fulltext/ED337468.pdf Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
271 Daniel M. Koretz Robert L. Linn, Stephen Dunbar, Lorrie A. Shepard “Evidence relevant to this debate has been limited.” p. 2 Dismissive The Effects of High-Stakes Testing On Achievement: Preliminary Findings About Generalization Across Tests  Originally presented at the annual meeting of the AERA and the NCME, Chicago, April 5, 1991 http://nepc.colorado.edu/files/HighStakesTesting.pdf Office of Research and Improvement, US Education Department See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
272 James S. Catterall   "Before proceeding, readers should note that the observations do not result from an accumulated weight of in-depth cost-benefit type studies, since no such weight has been registered." p.2 Dismissive Estimating the Costs and Benefits of Large-Scale Assessments: Lessons from Recent Research CSE Report No. 319, 1990 https://cresst.org/wp-content/uploads/TECH319.pdf Office of Research and Improvement, US Education Department  
273 James S. Catterall   "The points tend to build on the small number of interesting developments reported (particularly Shepard & Kreitzer, 1987a, 1987b; Solmon & Fagnano, in press), as well as on the author's experiences in conducting cost-benefit type analyses of educational assessment practices (Catterall, 1984, 1989). We also base inferences on the paucity of research itself." p.2 Dismissive Estimating the Costs and Benefits of Large-Scale Assessments: Lessons from Recent Research CSE Report No. 319, 1990 https://cresst.org/wp-content/uploads/TECH319.pdf Office of Research and Improvement, US Education Department  
274 Hartigan, J. A., & Wigdor, A. K.   "The empirical evidence cited for the standard deviation of worker productivity is quite slight." p.239 Dismissive Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
275 Hartigan, J. A., & Wigdor, A. K.   "Some fragmentary confirming evidence that supports this point of view can be found in Hunter et al. (1988)... We regard the Hunter and Schmidt assumption as plausible but note that there is very little evidence about the nature of the relationship of ability to output." p.243 Dismissive Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
276 Hartigan, J. A., & Wigdor, A. K.   "It is also important to remember that the most important assumptions of the Hunter-Schmidt models rest on a very slim empirical foundation .... Hunter and Schmidt's economy-wide models are based on simple assumptions for which the empirical evidence is slight." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
277 Hartigan, J. A., & Wigdor, A. K.   "It is important to remember that the most important assumptions of the Hunter-Schmidt models rest on a very slim empirical foundation." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
278 Hartigan, J. A., & Wigdor, A. K.   "Hunter and Schmidt's economy wide models are based on simple assumptions for which the empirical evidence is slight." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
279 Hartigan, J. A., & Wigdor, A. K.   "That assumption is supported by only a very few studies." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
280 Hartigan, J. A., & Wigdor, A. K.   "There is no well-developed body of evidence from which to estimate the aggregate effects of better personnel selection...we have seen no empirical evidence that any of them provide an adequate basis for estimating the aggregate economic effects of implementing the VG-GATB on a nationwide basis." p.247 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
281 Hartigan, J. A., & Wigdor, A. K.   "Furthermore, given the state of scientific knowledge, we do not believe that realistic dollar estimates of aggregate gains from improved selection are even possible." p.248 Dismissive Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
282 Hartigan, J. A., & Wigdor, A. K.   "...primitive state of knowledge..." p.248 Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
283 Joan L. Herman, Donald W. Dorr-Bremme Walter E. Hathaway, Ed. "Despite the controversy and the important issues that it raises, little information has been forthcoming on the nature of testing as it is actually used in the schools. What functions do tests serve in the classrooms? How do teachers and principals use test results? What kinds of tests do principals and teachers trust and rely on most? These and similar questions have gone largely unaddressed." p.8 Dismissive Uses of Testing in the Schools: A National Profile Testing in the Schools, New Directions for Testing and Measurement #19, Jossey-Bass, September 1983   Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
284 Joan L. Herman, Donald W. Dorr-Bremme Walter E. Hathaway, Ed. "A few studies have indicated teachers' circumspect attitudes toward and limited use of one type of achievement measure, the norm-referenced test. Beyond this, however, the landscape of test uses in American schools has remained largely unexplored." p.8 Dismissive Uses of Testing in the Schools: A National Profile Testing in the Schools, New Directions for Testing and Measurement #19, Jossey-Bass, September 1983   Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
285 Joan L. Herman, Donald W. Dorr-Bremme Walter E. Hathaway, Ed. "We know very little about the quality of teacher-developed tests." p.15 Dismissive Uses of Testing in the Schools: A National Profile Testing in the Schools, New Directions for Testing and Measurement #19, Jossey-Bass, September 1983   Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
286 Don Dorr-Bremme James Catterall "Relatively little is known about students' attitudes and feelings toward assessment in general. Even less is known regarding their feelings about different forms of assessment." p.48-1 Dismissive Costs of Testing: Test Use Project CSE Report, November 1982 https://files.eric.ed.gov/fulltext/ED224835.pdf National Institute of Education, US Education Department See https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm for a list of 19 pre-1982 qualitative studies of student attitudes toward testing.
287 Don Dorr-Bremme James Catterall "in light of these few and certainly non-definitive findings, student interviews were undertaken to explore the affective valence that different forms of achievement assessment have for students." p.48-2 Dismissive Costs of Testing: Test Use Project CSE Report, November 1982 https://files.eric.ed.gov/fulltext/ED224835.pdf National Institute of Education, US Education Department See https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm for a list of 19 pre-1982 qualitative studies of student attitudes toward testing.
288 Don Dorr-Bremme James Catterall "Because of the small sample size and the paucity of research in this topic, these findings suggest potential avenues for research as much as they provide information." p.48-26 Dismissive Costs of Testing: Test Use Project CSE Report, November 1982 https://files.eric.ed.gov/fulltext/ED224835.pdf National Institute of Education, US Education Department See https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm for a list of 19 pre-1982 qualitative studies of student attitudes toward testing.
289 Jennie P. Yeh Joan L. Herman "Testing in American schools is increasing in both scope and visibility. … What return are we getting for this quite considerable investment? Little information is available. How are tests used in schools? What functions do tests serve in classrooms?", p.1 Dismissive Teachers and testing: A survey of test use CSE Report No. 166, 1981 https://files.eric.ed.gov/fulltext/ED218336.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
290 Joan L. Herman James Burry, Don Dorr-Bremme, Charlotte M. Lazar-Morrison, James D. Lehman, Jennie P. Yeh "Despite the great controversy that surrounds testing and its potential uses and abuses, there is little empirical information available about the nature of testing as it actually occurs and is used (or not used) in schools. The Test Use Project at the Center for the Study of Evaluation seeks to fill this gap and answer basic questions about tests and schooling.", p.2 Dismissive Teaching and testing: Allies or adversaries CSE Report No. 165, 1981 https://files.eric.ed.gov/fulltext/ED218336.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
291 Joan L. Herman James Burry, Don Dorr-Bremme, Charlotte M. Lazar-Morrison, James D. Lehman, Jennie P. Yeh "Clearly the policy toward testing in this country has been one of accretion, but the full magnitude is undocumented. The CSE Test Use Project ... ", p.2 Dismissive Teaching and testing: Allies or adversaries CSE Report No. 165, 1981 https://files.eric.ed.gov/fulltext/ED218336.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
292 James Burry   "As instructional considerations have come into prominence, the dialogue over testing has become somewhat adversarial, with a great deal of the recent literature forming a series of position papers espousing the value of one kind of test over another, but offering little empirical data (Lazar-Morrison, Polin, Moy, & Burry, 1980)." p.27 Dismissive The Design of Testing Programs with Multiple and Complimentary Uses Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
293 James Burry   "This paper makes a preliminary step toward explicating school peoples' points of view about the kinds of assessment that are useful for external accountability concerns and for instructional decision making." pp.27-28 1stness The Design of Testing Programs with Multiple and Complimentary Uses Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
294 Joan L. Herman Jennie Yeh "Despite the great controversy that surrounds testing and its potential uses and abuses, there is little empirical information available about the nature of testing as it actually occurs and is used (or not used) in schools. The Test Use Project …." p.2 Dismissive Contextual Examination of Test Use: The Test, The Setting, The Cost Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
295 Joan L. Herman Jennie Yeh "Clearly the policy toward testing in this country has been one of accretion, but the full magnitude is undocumented. The CSE Test Use Project ... ", p.2 Dismissive Contextual Examination of Test Use: The Test, The Setting, The Cost Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
296 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "There is little research-based information about current testing practice." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
297 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Almost ten years ago, Kirkland (1971) reviewed the literature on test impact on students and schools and found that while much had been written about tests, few empirical studies were evident."  Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
298 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "What is significant about [Kirkland's] exclusions is the correct observation that these issues are 'implications,' often not founded on empirical research."  Denigrating A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
299 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Today, there still remains a plethora of publications on these very issues and a dearth of empirical support on actual test use practices." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
300 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Kirkland's review of the literature is concentrated mainly upon the social and psychological issues in testing, more than upon instructional issues. Also, then as now, little empirical research had accumulated on the latter." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
301 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Only recently has the testing dialogue begun to move away from social and psychological issues ...and begun to focus on the instructional issues of testing." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
302 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry " ...the testing dialogue has taken the form of a debate, with the bulk of the test literature being a series of position papers citing little empirical data. This debate is being carried on predominantly by people outside the schools." Denigrating A review of the literature on test use, p.4 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
303 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "There is little empirical research available that can answer the questions that have arisen." Dismissive A review of the literature on test use, p.5 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
304 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "... little is known about the amount of other testing that takes place."  Dismissive A review of the literature on test use, p.6 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
305 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Although much has been written about minimum competency issues, there has yet to be any report of the actual uses or extent of the use of competency-based tests." Dismissive A review of the literature on test use, p.7 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
306 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Virtually nothing is known about the amount of testing taking place using other types of assessments." Dismissive A review of the literature on test use, p.7 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
307 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The literature on curriculum-embedded tests is equally scant." Dismissive A review of the literature on test use, p.8 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
308 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The current information focuses on norm- and criterion-referenced tests with some emphasis on minimum competency testing. Since literature on the other evaluative processes is lacking, there is a great need to look at various types of assessments to determine the purposes they serve." Dismissive A review of the literature on test use, p.9 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
309 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The kinds of contextual factors which influence testing and the use of test results are just beginning to be appreciated." Dismissive A review of the literature on test use, p.9 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
310 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Concern exists about the level of teacher training in testing. ... The literature does not appear to reflect any great follow-up to such suggestions [regarding teacher competence with testing]." Dismissive A review of the literature on test use, p.9 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
311 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "All of the studies mentioned included information about standardized achievement testing. As of yet, there is no evidence about how teacher attitudes toward other types of tests affect the use of those assessments." Dismissive A review of the literature on test use, p.19 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
312 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The effect of the actual testing environment on test use is only beginning to emerge. Evidence suggests that characteristics of the test-takers and the instructional environment need to be explored." Dismissive A review of the literature on test use, p.19 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
313 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "These factors have been considered in research on teachers' instructional decision-making or in studies of the social or organizational qualities of the classroom. The investigation of these variables as factors affecting teachers' use of tests and test data is minimal." Dismissive A review of the literature on test use, p.20 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
314 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "In the community, parent involvement, accountability pressures, and news media coverage of test scores are possible influences on the nature and amount of testing, but they have yet to be researched." Dismissive A review of the literature on test use, p.20 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
315 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "We know very little about the costs of testing." Dismissive A review of the literature on test use, p.20 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
316 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Little information is available about these types of costs, and the little information that is available concerns teachers and student attitudes." Dismissive A review of the literature on test use, p.22 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
317 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The question of whether test scores affect a student's self-concept has also been raised. ... As indicated previously, information on any of the aforementioned issues is scant." Dismissive A review of the literature on test use, p.23 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
318 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Other evidence suggests that tests of many types are being administered and the results are being utilized. To what extent this is occurring is not specifically known." Dismissive A review of the literature on test use, pp.23-24 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
319 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "There are a number of areas concerning teachers and testing for which there is no information." Dismissive A review of the literature on test use, p.24 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
320 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The impact of other testing must also be considered. In-class assessments made by individual teachers have yet to be examined in depth." Dismissive A review of the literature on test use, p.24 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
321 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Teachers place greater reliance on, and have more confidence in, the results of their own judgments of students' performance, but little is known about the kinds of activities that give voice to this information." Dismissive A review of the literature on test use, p.25 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
322 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The settings and factors which affect the use of tests and their results is yet another uninformed area." Dismissive A review of the literature on test use, p.25 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
                   
  IRONIES:                
  Michael J. Feuer   "To challenge authority is to hold authority accountable. Challenging people in power requires them to show that what they are doing is legitimate; we invite them to rise to the challenge and prove their case; and they, in turn, trust that the system will treat them fairly."   Measuring Accountability When Trust Is Conditional Education Week, September 24, 2012 https://www.edweek.org/ew/articles/2012/09/24/05feuer_ep.h32.html?print=1    
  Michael J. Feuer   "No profession is granted automatic autonomy or an exemption from evaluation."   Measuring Accountability When Trust Is Conditional Education Week, September 24, 2012 https://www.edweek.org/ew/articles/2012/09/24/05feuer_ep.h32.html?print=1    
  Laura S. Hamilton Brian M. Stecher, Stephen P. Klein "Greater knowledge about testing and accountability can lead to better system design and more-effective system management." p.xiv   Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Summary, p.xiv      
  Laura S. Hamilton Brian M. Stecher "Incremental improvements to existing systems, based on current research on testing and accountability, should be combined with long-term research and development efforts that may ultimately lead to a major redesign of these systems. Success in this endeavor will require the thoughtful engagement of educators, policymakers, and researchers in discussions and debates about tests and testing policies."   Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6, Improving test-based accountability, pp.143-144      
  Brian M. Stecher Stephen P. Klein "Additional information about the impact of performance assessments on curriculum and instruction would provide policymakers with valuable data on the benefits that may accrue from this relatively expensive form of assessment." p.11   The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)      
  Eva L. Baker Robert L. Linn, Joan L. Herman "Diverse perspectives are needed to clarify real differences and to find equitable, workable balances."   CRESST: A Continuing Mission to Improve Educational Assessment, p.13 Evaluation Comment, Summer 1996      
  Eva L. Baker Robert L. Linn, Joan L. Herman "Impartiality, not advocacy, is the key to the credibility of research and development."   CRESST: A Continuing Mission to Improve Educational Assessment, p.13 Evaluation Comment, Summer 1996      
  Madaus, G.F.   "too often policy debates emphasize only one side or the other of the testing effects coin"   The effects of important tests on students: Implications for a National Examination System, 1991 Phi Delta Kappan, 73(3), 226-231. As quoted in William A. Mehrens, Consequences of Assessment: What is the Evidence?, Education Policy Analysis Archives Volume 6 Number 13 July 14, 1998,  https://epaa.asu.edu/ojs/article/view/580/    
                   
      Author cites (and accepts as fact without checking) someone else's dismissive review            
      Authors cite themselves or colleagues in their group, but dismiss or denigrate all other work            
      Falsely claim that research has only recently been done on the topic            
1) [as of July 4, 2021] SCOPE funders include: Bill & Melinda Gates Foundation; California Education Policy Fund; Carnegie Corporation of New York; Center for American Progress; Community Education Fund, Silicon Valley Community Foundation; Ford Foundation; James Irvine Foundation; Joyce Foundation; Justice Matters; Learning Forward; Metlife Foundation; National Center on Education and the Economy; National Education Association; National Public Education Support Fund; Nellie Mae Education Foundation; NoVo Foundation; Rose Foundation; S. D. Bechtel, Jr. Foundation; San Francisco Foundation; Sandler Foundation; Silver Giving Foundation; Spencer Foundation; Stanford University; Stuart Foundation; The Wallace Foundation; William and Flora Hewlett Foundation; William T. Grant Foundation