Dismissive Reviews in Education Policy Research
  #  Author  Co-author(s)  Quote  Type  Title  Source  Link  Funders  Notes
1 John F. Pane   "Practitioners and policymakers seeking to implement personalized learning, lacking clearly defined evidence-based models to adopt, are creating custom designs for their specific contexts. Those who want to use rigorous research evidence to guide their designs will find many gaps and will be left with important unanswered questions about which practices or combinations of practices are effective. It will likely take many years of research to fill these gaps." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.1 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funded by the William and Flora Hewlett Foundation and Rand Corporation funders. Pane devotes considerable text to the claim that no prior research exists, except for another Rand study, and then, on p.7, admits that some relevant mastery learning studies from the 1980s exist. He implies, however, that there were only one or a few. In fact, there were hundreds. There have also been thousands of studies of personalized instruction in conjunction with studies of special education, tutoring, teachers' aides, tracking, etc.
2 John F. Pane   "The purpose of this Perspective is to offer strategic guidance for designers of personalized learning programs to consider while the evidence base is catching up." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.1 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as entry 1.
3 John F. Pane   "This guidance draws on theory, basic principles of learning science, and the limited research that does exist on personalized learning and its component parts." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.1 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as entry 1.
4 John F. Pane   "Thus far, the research evidence on personalized learning as an overarching schoolwide model is sparse." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.4 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as entry 1.
5 John F. Pane   "A team of RAND Corporation researchers conducted the largest and most-rigorous studies of student achievement effects to date." 1stness Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.4 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as entry 1.
6 John F. Pane   "While we await the answers to those questions, substantial enthusiasm around personalized learning persists. Educators, policy makers, and advocates are moving forward without the guidance of conclusive research evidence." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.5 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as entry 1.
7 John F. Pane   "In the absence of comprehensive, rigorous evidence to help select the personalized learning components most likely to succeed, what is the path forward? I suggest a few guiding principles aimed at using existing scientific knowledge and the best available resources." Denigrating Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.5 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as entry 1.
8 John F. Pane   "However, more work is necessary to establish causal evidence that the concept leads to improved outcomes for students" Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.9 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as entry 1.
9 Lorraine M. McDonnell   "However, an essential question for those interested in the politics of education policy has not been central in past research: To what extent have recent accountability policies altered the politics of education? This article begins to address that question ..." Dismissive Educational Accountability and Policy Feedback, p.171 Educational Policy, 27(2) 170–189 https://journals.sagepub.com/doi/10.1177/0895904812465119 "The author received financial support from the William T. Grant Foundation for research presented in this article."  
10 Jinok Kim Joan L. Herman "However, the validity of existing criteria and procedures lack an empirical base; in fact, reclassification practices are formulated and implemented with little knowledge of the factors that may influence their success." Dismissive, Denigrating Understanding Patterns and Precursors of ELL Success Subsequent to Reclassification, p.1 CRESST Report 818, August, 2012 https://files.eric.ed.gov/fulltext/ED540604.pdf "The work reported herein was supported under the National Research and Development Centers, PR/Award Number R305A09058101, as administered by the U.S. Department of Education, Institute of Education Sciences."  
11 Jinok Kim Joan L. Herman "Because the research basis for making mainstreaming or reclassification decisions remains slim, it may not be surprising that criteria for reclassifying students from ELL to Reclassified as Fluent English Proficient (RFEP) status vary substantially across states, as documented by a recent report reviewing statewide practices related to ELLs." Dismissive Understanding Patterns and Precursors of ELL Success Subsequent to Reclassification, p.3 CRESST Report 818, August, 2012 https://files.eric.ed.gov/fulltext/ED540604.pdf Funders: same as entry 10.
12 Jinok Kim Joan L. Herman "Previous studies cited earlier have identified potential problems in current reclassification, qualitatively analyzed criteria, and student characteristics that may relate to high versus low redesignation rates, and examined related research questions, such as how long it takes for non native speakers to acquire ELP or be reclassified; but none of the existing literature has directly dealt with reclassification systems and their consequences, and more specifically with the consequences of various reclassification criteria." 1stness Understanding Patterns and Precursors of ELL Success Subsequent to Reclassification, p.6 CRESST Report 818, August, 2012 https://files.eric.ed.gov/fulltext/ED540604.pdf Funders: same as entry 10.
13 Girlie C. Delacruz   "Opportunities for student use of rubrics to improve learning appears logical, although only a few studies have examined this idea directly." Dismissive Impact of Incentives on the Use of Feedback in Educational Videogames CRESST Report 813, March, 2012, p.3 https://cresst.org/wp-content/uploads/R813.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
14 Jinok Kim   "Though we can find many such statistics in various reports, few have dealt with comparisons across students reclassified in various grade levels. Lack of such studies may be in part due to the difficulty in defining who are reclassified students as well as when they are reclassified." Dismissive Relationships among and between ELL status, demographic characteristics, enrollment history, and school persistence CRESST Report 810, December, 2011, p.6 https://cresst.org/wp-content/uploads/R810.pdf "The work reported herein was supported under the National Research and Development Centers, PR/Award Number R305A090581, as administered by the U.S. Department of Education, Institute of Education Sciences with funding to the National Center for Research on Evaluation, Standards, and Student Testing (CRESST)."
15 Joan Herman 4 others "While the challenge of teachers’ content-pedagogical knowledge has been documented (Heritage et al., 2009; Heritage, Jones & White, 2010; Herman et al., 2010), few studies have examined the relationship between such knowledge and teachers’ assessment practices, nor examined how teachers’ knowledge may moderate the relationship between assessment practices and student learning." Dismissive Relationships between Teacher Knowledge, Assessment Practice, and Learning: Chicken, Egg, or Omelet? CRESST Report 809, November 2011 http://cresst.org/wp-content/uploads/R809.pdf Institute of Education Sciences, US Education Department
16 Lorrie A. Shepard Kristen L. Davidson, Richard Bowman "Although some instruments, such as the Northwest Evaluation Association‘s (NWEA) Measures of Academic Progress (MAP®), have been around for decades, few studies have been conducted to examine the technical adequacy of interim assessments or to evaluate their effects on teaching and student learning."  Dismissive How Middle-School Mathematics Teachers Use Interim and Benchmark Assessment Data, p.2 CRESST Report 807, October 2011 http://cresst.org/wp-content/uploads/R807.pdf Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive, which is not the result favored by CRESST, so they declare the studies nonexistent. See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
17 Kristen L. Davidson Greta Frohbieter "Yet, districts’ processes to this end [of adopting interim or benchmark assessments] have been largely unexamined (Bulkley et al.; Mandinach et al.; Young & Kim)." Dismissive District Adoption and Implementation of Interim and Benchmark Assessments, p.2 CRESST Report 806, September 2011 https://eric.ed.gov/?id=ED525098 Institute of Education Sciences, US Education Department Notes: same as entry 16.
18 Kristen L. Davidson Greta Frohbieter "As noted above, district processes with regard to interim assessment adoption and implementation remain largely uninvestigated. A review of the few relevant studies, however, reveals..." Dismissive District Adoption and Implementation of Interim and Benchmark Assessments, p.4 CRESST Report 806, September 2011 https://eric.ed.gov/?id=ED525098 Institute of Education Sciences, US Education Department Notes: same as entry 16.
19 Marguerite Clarke   “The evidence base is stronger in some areas than in others. For example, there are many professional standards for assessment quality that can be applied to classroom assessments, examinations, and large-scale assessments (APA, AERA, and NCME, 1999), but less professional or empirical research on enabling contexts.” p. 20 Dismissive Framework for Building an Effective Student Assessment System World Bank, READ/SABER Working Paper, Aug. 2011 http://files.eric.ed.gov/fulltext/ED553178.pdf World Bank funders No matter that there exist hundreds of other countries, a century's worth of research prior to 2010, literally thousands of other journals that might publish such an article, and a large "grey literature" of alignment studies conducted as routine parts of test development. Virtually any standards-based, large-scale test development includes an alignment study, not to be found in a scholarly journal. Some notable alignment studies:
with NRTs:  Freeman, Kuhs, Porter, Floden, Schmidt, Schwille (1983); Debra P. v. Turlington (1984); Cohen, Spillane (1993); La Marca, Redfield, Winter, Bailey, and Despriet (2000); Wainer (2011)
with Standards: Archbald (1994); Buckendahl, Plake, Impara, Irwin (2000); Bhola, Impara, Buckendahl (2003); Phelps (2005)
with RTs: Massell, Kirst, Hoppe (1997); Wiley, Hembry, Buckendahl, Forte,Towles Nebelsick-Gullett (2015)
20 Marguerite Clarke   “Data for some of these indicator areas can be found in official documents, published reports (for example, Ferrer, 2006), research articles (for example, Braun and Kanjee, 2005), and online databases. For the most part, data have not been gathered in any comprehensive or systematic fashion. Those wishing to review this type of information for a particular assessment system will most likely need to collect the data themselves.” p. 21 Denigrating Framework for Building an Effective Student Assessment System World Bank, READ/SABER Working Paper, Aug. 2011 http://files.eric.ed.gov/fulltext/ED553178.pdf World Bank funders Notes: same as entry 19.
21 Marguerite Clarke   “This paper has extracted principles and guidelines from countries’ experiences and the current research base to outline a framework for developing a more effective student assessment system. The framework provides policy makers and others with a structure for discussion and consensus building around priorities and key inputs for their assessment system.” p. 27 1stness Framework for Building an Effective Student Assessment System World Bank, READ/SABER Working Paper, Aug. 2011 http://files.eric.ed.gov/fulltext/ED553178.pdf World Bank funders Notes: same as entry 19.
22 Michael Hout, Stuart W. Elliot, Editors   "Unfortunately, there were no other studies available that would have allowed us to contrast the overall effect of state incentive programs predating NCLB…" p. 4-6 Dismissive Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925). (*Covers many studies; study is a research review, research synthesis, or meta-analysis.) Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
23 Michael Hout, Stuart W. Elliot, Editors   "Test-based incentive programs, as designed and implemented in the programs that have been carefully studied have not increased student achievement enough to bring the United States close to the levels of the highest achieving countries." p. 4-26 Denigrating Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Notes: same as entry 22.
24 Michael Hout, Stuart W. Elliot, Editors   "Despite using them for several decades, policymakers and educators do not yet know how to use test-based incentives to consistently generate positive effects on achievement and to improve education." p. 5-1 Dismissive Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Notes: same as entry 22.
25 Michael Hout, Stuart W. Elliot, Editors   "The general lack of guidance coming from existing studies of test-based incentive programs in education…" Dismissive Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Notes: same as entry 22.
26 Laura S. Hamilton Brian M. Stecher, Kun Yuan “A few studies have attempted to examine how the creation and publication of standards, per se, have affected practices.” p. 3 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
27 Laura S. Hamilton Brian M. Stecher, Kun Yuan “The research evidence does not provide definitive answers to these questions.” p. 6 Denigrating Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
28 Laura S. Hamilton Brian M. Stecher, Kun Yuan “He [Porter 1994] also noted that ‘virtually all of the arguments, both for and against standards, are based on beliefs and hypotheses rather than on direct empirical evidence’ (p. 427).” pp. 34-35 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
29 Laura S. Hamilton Brian M. Stecher, Kun Yuan "Although a large and growing body of research has been conducted to examine the effects of SBR, the caution Porter expressed in 1994 about the lack of empirical evidence remains relevant today.” pp. 34-35 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
30 Laura S. Hamilton Brian M. Stecher, Kun Yuan “Arguably the most important test of quality is whether the standards promote high-quality instruction and improved student learning, but as we discuss later, there is very little research to address that question.” p. 37 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
31 Laura S. Hamilton Brian M. Stecher, Kun Yuan “[T]here have been a few studies of SBR as a comprehensive system. . . . [T]here is some research on how the adoption of standards, per se, or the alignment of standards with curriculum influences school practices or student outcomes.” p. 38 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
32 Laura S. Hamilton Brian M. Stecher, Kun Yuan “The lack of evidence about the effects of SBR derives primarily from the fact that the vision has never been fully realized in practice.” p. 47 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
33 Laura S. Hamilton Brian M. Stecher, Kun Yuan “[A]lthough many conceptions of SBR emphasize autonomy, we currently know relatively little about the effects of granting autonomy or what the right balance is between autonomy and prescriptiveness.” p. 55 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
34 Laura S. Hamilton Brian M. Stecher, Kun Yuan “One of the primary responsibilities of the federal government should be to ensure ongoing collection of evidence demonstrating the effects of the policies, which could be used to make decisions about whether to continue on the current course or whether small adjustments or a major overhaul are needed.” p. 55 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
35 Douglas N. Harris Lori L. Taylor, Amy A. Levine, William K. Ingle, Leslie McDonald "However, previous studies under-state current costs by focusing on costs before NCLB was put in place and by excluding important cost categories." Denigrating The Resource Costs of Standards, Assessments, and Accountability report to the National Research Council   National Research Council funders No, they did not leave out important cost categories; Harris' study deliberately exaggerates costs. See pages 3-10:  https://nonpartisaneducation.org/Review/Essays/v10n1.pdf
36 Joan Herman Katherine E. Ryan, Lorrie A. Shepard, Eds. "Yet, available evidence suggests that the rhetoric surpasses the reality of formative assessment use" p.217 Denigrating Accountability and assessment: Is public interest in K-12 education being served? Chapter 11 in The Future of Test-Based Educational Accountability https://www.routledge.com/The-Future-of-Test-Based-Educational-Accountability-1st-Edition/Ryan-Shepard/p/book/9780805864700 Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive, which is not the result favored by CRESST, so they declare the studies nonexistent. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
37 Joan Herman Katherine E. Ryan, Lorrie A. Shepard, Eds. "The research base examining effects on students with disabilities and on English Language learners is scanty." p.223 Dismissive Accountability and assessment: Is public interest in K-12 education being served? Chapter 11 in The Future of Test-Based Educational Accountability https://www.routledge.com/The-Future-of-Test-Based-Educational-Accountability-1st-Edition/Ryan-Shepard/p/book/9780805864700 Institute of Education Sciences, US Education Department  
38 Joan Herman Katherine E. Ryan, Lorrie A. Shepard, Eds. "...there is no obvious accountability mechanism for the 'average student' who may have made it just over the proficient level. There is little research on this issue." p.224 Dismissive Accountability and assessment: Is public interest in K-12 education being served? Chapter 11 in The Future of Test-Based Educational Accountability https://www.routledge.com/The-Future-of-Test-Based-Educational-Accountability-1st-Edition/Ryan-Shepard/p/book/9780805864700 Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Jacobson (1992); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Marshall (1987); Mangino & Babcock (1986); Michigan Department of Education (1984); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
39 Joan Herman   "The report considers how well the model fits available evidence by examining whether and how accountability assessment influences students’ learning opportunities and the relationship between accountability and learning." abstract Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department See, for example, Test Frequency, Stakes, and Feedback in Student Achievement: A Meta-Analysis   https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
40 Joan Herman   "What of the impact of accountability on other segments of the student population--traditionally higher performing students? ...The average student? ...there is no obvious accountability mechanism for the 'average student.' There is little research on this issue." Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
41 Joan Herman   "While a thorough treatment of the effects on teachers is also beyond the scope of this report, it is worth noting a growing literature that is cause for concern." p.17 Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
42 Joan Herman   "The research base examining effects on students with disabilities and on English language learner students is scanty." pp.16-17 Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department  
43 Eva L. Baker   "Tests only dimly reflect in their design the results of research on learning, whether of skills, subject matter, or problem solving." p.310 Denigrating The End(s) of Testing Educational Researcher, Vol. 36, No. 6, pp. 309–317   2007 Presidential Address for the American Educational Research Association  
44 Eva L. Baker   "To my mind, the evidential disconnect between test design and learning research is no small thing." p.310 Dismissive The End(s) of Testing Educational Researcher, Vol. 36, No. 6, pp. 309–317   2007 Presidential Address for the American Educational Research Association  
45 Eva L. Baker   "What if we set aside learning-based design and ask, “How well do any of our external tests work?” The answer is that we often don’t know enough to know. We have little evidence that tests are in sync with their stated or de facto purposes or that their results lead to appropriate decisions." p.310 Dismissive The End(s) of Testing Educational Researcher, Vol. 36, No. 6, pp. 309–317   2007 Presidential Address for the American Educational Research Association  
46 Laura S. Hamilton Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney "However, the paths through which SBA [standards-based accountability] changes district, school, and classroom practices and how these changes in practice influence student outcomes are largely unexplored. There is strong evidence that SBA leads to changes in teachers’ instructional practices (Hamilton, 2004; Stecher, 2002)." p.5 Dismissive Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States Rand Corporation, 2007 https://www.rand.org/pubs/monographs/MG589.html "This research was sponsored by the National Science Foundation under grant number REC-0228295." Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
47 Laura S. Hamilton Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney "Much less is known about the impact of SBA at the district and school levels and the relationships among actions at the various levels and student outcomes. This study was designed to shed light on this complex set of relationships…" p.5 Dismissive Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States Rand Corporation, 2007 https://www.rand.org/pubs/monographs/MG589.html "This research was sponsored by the National Science Foundation under grant number REC-0228295." Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
48 Eva L. Baker Joan L. Herman, Robert L. Linn "For example, performance assessment was a rage in the early 1990s because it was something new and flashy, and looked to have great promise. Before almost any research was done, a number of states dropped their multiple-choice accountability systems, replacing them with performance assessments." Dismissive ACCELERATING FUTURE POSSIBILITIES FOR ASSESSMENT AND LEARNING, p.1 CRESST Line, Winter 2006 https://www.researchgate.net/publication/277283780_in_Educational_Researcher_called_The_Awful_Reputation_of_Education_Research Institute of Education Sciences, US Education Department It is selected-response item formats (e.g., multiple choice) that are relatively new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
49 Eva L. Baker Joan L. Herman, Robert L. Linn "By the end of this year, nearly half of all states will have graduation exams in place (Peterson, 2005). Short institutional memory forgets that similar minimum competency tests did not lead to increased achievement some 20 years ago, but instead contributed to higher numbers of high school dropouts and inequities along racial lines (Catterall, 1989; Haertel & Herman, 2005)." Dismissive ACCELERATING FUTURE POSSIBILITIES FOR ASSESSMENT AND LEARNING, p.3 CRESST Line, Winter 2006 https://www.researchgate.net/publication/277283780_in_Educational_Researcher_called_The_Awful_Reputation_of_Education_Research Institute of Education Sciences, US Education Department Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan (1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
50 Edward Haertel Joan Herman "Passing rates on MCTs in many states rose rapidly from year to year (Popham, Cruse, Rankin, Sandifer, & Williams, 1985). Despite these gains, and positive trends on examinations like the National Assessment of Educational Progress (NAEP), there is little evidence that MCTs were the reason for improvements on other examinations." Dismissive A Historical Perspective on Validity Arguments for Accountability Testing CRESST Report 654, June 2005 https://cresst.org/wp-content/uploads/R654.pdf Institute of Education Sciences, US Education Department Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan (1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
51 Robert L. Linn   "Despite the clear appeal of assessment-based accountability and the widespread use of this approach, the development of assessments that are aligned with content standards and for which there is solid evidence of validity and reliability is a challenging endeavor." Dismissive Issues in the Design of Accountability Systems CRESST Report 650, April 2005 https://cresst.org/wp-content/uploads/R650.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
52 Robert L. Linn   "Alignment of an assessment with the content standards that it is intended to measure is critical if the assessment is to buttress rather than undermine the standards. Too little attention has been given to the evaluation of the alignment of assessments and standards." Denigrating Issues in the Design of Accountability Systems CRESST Report 650, April 2005 https://cresst.org/wp-content/uploads/R650.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
53 Lorraine M. McDonnell   "A growing body of research suggests that school and classroom practices do change in response to these assessments (Herman and Golan, 1993; Smith and Rottenberg, 1991; Madaus, 1988)" 1stness Politics, Persuasion, and Educational Testing, p.9 Harvard University Press, 2004     Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
54 Lorraine M. McDonnell   "A growing body of research suggests that school and classroom practices do change in response to these assessments (Herman and Golan, 1993; Smith and Rottenberg, 1991; Madaus, 1988)" Dismissive Politics, Persuasion, and Educational Testing, p.9 Harvard University Press, 2004     Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
55 Lorraine M. McDonnell   "Although most literature on policy instruments identifies this persuasive tool as one of the strategies available to policymakers, little theoretical or comparative empirical research has been conducted on its properties." Dismissive Politics, Persuasion, and Educational Testing, p.24 Harvard University Press, 2004      
56 Lorraine M. McDonnell   "There is empirical research on policies that rely on hortatory tools, but studies of these individual policies have not examined them within a broader theoretical framework." Denigrating Politics, Persuasion, and Educational Testing, p.24 Harvard University Press, 2004      
57 Lorraine M. McDonnell   "This chapter represents an initial attempt to analyze the major characteristics of hortatory policy by taking an inductive approach and looking across several different policy areas to identify a few basic properties common to most policies of this type." 1stness Politics, Persuasion, and Educational Testing, p.24 Harvard University Press, 2004      
58 Lorraine M. McDonnell   "This chapter has begun the task of building a conceptual framework for understanding hortatory policies by identifying their underlying causal assumptions and analyzing some basic properties common to most policies that rely on information and values to motivate action." 1stness Politics, Persuasion, and Educational Testing, p.44–45 Harvard University Press, 2004      
59 Lorraine M. McDonnell   "Because so little systematic research has been conducted on hortatory policy, it is possible at this point only to suggest, rather than to specify, the conditions under which its underlying assumptions will be valid and a policy likely to succeed." Dismissive Politics, Persuasion, and Educational Testing, p.45 Harvard University Press, 2004      
60 Lorraine M. McDonnell   "Additional theoretical and empirical work is needed to develop a more rigorous and nuanced understanding of hortatory policy. Nevertheless, this study starts that process by articulating the policy theory undergirding hortatory policy and by outlining its potential promise and shortcomings." Denigrating Politics, Persuasion, and Educational Testing, p.45 Harvard University Press, 2004      
61 Lorraine M. McDonnell   "However, because research on the effects of high stakes testing is limited, finds mixed results, and suggests unintended consequences, the informational and persuasive dimensions of testing will continue to be critical to the success of this policy." Dismissive Politics, Persuasion, and Educational Testing, p.182–183 Harvard University Press, 2004     Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
62 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “The shortcomings of the studies make it difficult to determine the size of teacher effects, but we suspect that the magnitude of some of the effects reported in this literature are overstated.” p. xiii Denigrating Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
63 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Using VAM to estimate individual teacher effects is a recent endeavor, and many of the possible sources of error have not been thoroughly evaluated in the literature.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
64 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
65 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “This lack of attention to teachers in policy discussions may be attributed in part to another body of literature that attempted to determine the effects of specific teacher background characteristics, including credentialing status (e.g., Miller, McKenna, and McKenna, 1998; Goldhaber and Brewer, 2000) and subject matter coursework (e.g., Monk, 1994).” p. 8 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
66 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “To date, there has been little empirical exploration of the size of school effects and the sensitivity of teacher effects to modeling of school effects.” p. 78 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
67 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “There are no empirical explorations of the robustness of estimates to assumptions about prior-year schooling effects.“ p. 81 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
68 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “There is currently no empirical evidence about the sensitivity of gain scores or teacher effects to such alternatives.” p. 89 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
69 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. 116 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
70 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Although we expect missing data are likely to be pervasive, there is little systematic discussion of the extent or nature of missing data in test score databases.” p. 117 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
71 Marguerite Clarke 5 co-authors “What this study adds to the body of literature in this area is a systematic look at how impact varies with the stakes attached to the test results.” p. 91 1stness Perceived Effects of State-Mandated Testing Programs on Teaching and Learning etc. (5 co-authors) National Board on Educational Testing and Public Policy monograph, January 2003 http://files.eric.ed.gov/fulltext/ED474867.pdf Ford Foundation See, for example, Test Frequency, Stakes, and Feedback in Student Achievement: A Meta-Analysis   https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
72 Marguerite Clarke 5 co-authors “Many calls for school reform assert that high-stakes testing will foster the economic competitiveness of the U.S. However, the empirical basis for this claim is weak.” p. 96, n. 1 Denigrating Perceived Effects of State-Mandated Testing Programs on Teaching and Learning etc. (5 co-authors) National Board on Educational Testing and Public Policy monograph, January 2003 http://files.eric.ed.gov/fulltext/ED474867.pdf Ford Foundation  
73 Brian M. Stecher Laura S. Hamilton "The business model of setting clear targets, attaching incentives to the attainment of those targets, and rewarding those responsible for reaching the targets has proven successful in a wide range of business enterprises. But there is no evidence that these accountability principles will work well in an educational context, and there are many reasons to doubt that the principles can be applied without significant adaptation." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. 
74 Brian M. Stecher Laura S. Hamilton " The lack of strong evidence regarding the design and effectiveness of accountability systems hampers policymaking at a critical juncture." Denigrating Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. 
75 Brian M. Stecher Laura S. Hamilton "Nonetheless, the evidence has yet to justify the expectations. The initial evidence is, at best, mixed. On the plus side, students and teachers seem to respond to the incentives created by the accountability systems ..." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. 
76 Brian M. Stecher Laura S. Hamilton "Proponents of accountability attribute the improved scores in these states to clearer expectations, greater motivation on the part of the students and teachers, a focused curriculum, and more-effective instruction. However, there is little or no research to substantiate these positive changes or their effects on scores." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
77 Brian M. Stecher Laura S. Hamilton "One of the earliest studies on the effects of testing (conducted in two Arizona schools in the late 1980s) showed that teachers reduced their emphasis on important, nontested material." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
78 Brian M. Stecher Laura S. Hamilton "Test-based accountability systems will work better if we acknowledge how little we know about them, if the federal government devotes appropriate resources to studying them, and if the states make ongoing efforts to improve them."  Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. 
79 Robert L. Linn Eva L. Baker “It is true that many of these accommodated test conditions are not subjected to validity studies to determine whether the construct or domain tested has been significantly altered. In part, this lack of empirical data results from restricted resources.” p. 14 Dismissive Validity Issues for Accountability Systems CSE Technical Report 585 (December 2002) http://www.cse.ucla.edu/products/reports/TR585.pdf Office of Research and Improvement, US Education Department External evaluations of large-scale testing programs not only exist, but represent the norm. 
80 Brian M. Stecher Laura S. Hamilton, Stephen P. Klein, Eds. "High-stakes testing may also affect parents (e.g., their attitudes toward education, their engagement with schools, and their direct participation in their child's learning) as well as policymakers (their beliefs about system performance, their judgements about program effectiveness, and their allocation of resources). However, these issues remain largely unexamined in the literature." Dismissive Consequences of large-scale, high-stakes testing on school and classroom practice Chapter 4 in Making sense of test-based accountability in education, 2002, p.79 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Office of Research and Improvement, US Education Department Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
81 Brian M. Stecher Laura S. Hamilton, Stephen P. Klein, Eds. "As described in chapter 2, there was little concern about the effects of testing on teaching prior to the 1970s." Dismissive Consequences of large-scale, high-stakes testing on school and classroom practice Chapter 4 in Making sense of test-based accountability in education, 2002, p.81 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Office of Research and Improvement, US Education Department Rubbish. Entire books were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
82 Brian M. Stecher Laura S. Hamilton, Stephen P. Klein, Eds. "In light of the changes that occurred in the uses of large-scale testing in the 1980s and 1990s, researchers began to investigate teachers' reactions to external assessment. The initial research on the impact of large-scale testing was conducted in the 1980s and the 1990s." Dismissive Consequences of large-scale, high-stakes testing on school and classroom practice Chapter 4 in Making sense of test-based accountability in education, 2002, p.83 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Office of Research and Improvement, US Education Department Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
83 Brian M. Stecher Laura S. Hamilton, Stephen P. Klein, Eds. "Researchers have not documented the desirable consequences of testing … as clearly as the undesirable ones. More importantly, researchers have not generally measured the extent or magnitude of the shifts in practice that they identified as a result of high-stakes testing." Dismissive Consequences of large-scale, high-stakes testing on school and classroom practice Chapter 4 in Making sense of test-based accountability in education, 2002, pp.99–100 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Office of Research and Improvement, US Education Department Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
84 Lauren B. Resnick Robert Rothman, Jean B. Slattery, Jennifer L. Vranek "States that have or adopt test-based accountability programs claim that their tests are aligned to their standards. But there has been, up to now, no independent methodology for checking alignment. This paper describes and illustrates such a methodology..." 1stness Benchmarking and Alignment of Standards and Testing, p.1 CSE Technical Report 566, CRESST/Achieve, May 2002 https://www.achieve.org/files/TR566.pdf Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Duck (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
85 Lauren B. Resnick Robert Rothman, Jean B. Slattery, Jennifer L. Vranek "Yet few, if any, states have put in place effective policies or resource systems for improving instructional quality (National Research Council, 1999)." Dismissive Benchmarking and Alignment of Standards and Testing, p.4 CSE Technical Report 566, CRESST/Achieve, May 2002 https://www.achieve.org/files/TR566.pdf Office of Research and Improvement, US Education Department Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
86 Laura S. Hamilton Brian M. Stecher, Stephen P. Klein "Although test-based accountability has shown some compelling results, the issues are complex, the research is new and incomplete, and many of the claims that have received the most attention have proved to be premature and superficial." Denigrating Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Summary, p.xiv https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
87 Laura S. Hamilton Brian M. Stecher, Stephen P. Klein "The research evidence does not provide definitive information about the actual costs of testing but the information that is available suggests that expenditures for testing have grown in recent years." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Introduction, p.9 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
88 Laura S. Hamilton Daniel M. Koretz "There is currently no substantial evidence on the effects of published report cards on parents’ decisionmaking or on the schools themselves." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 2: Tests and their use in test-based accountability systems, p.44 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department For decades, consulting services have existed that help parents new to a city select the right school or school district for them.
89 Brian M. Stecher   "High-stakes testing may also affect parents ... as well as policymakers .... However, these issues remain largely unexamined in the literature." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 4: Consequences of Large-Scale, High-Stakes Testing on School and Classroom Practice, p.79 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
90 Brian M. Stecher   "Data on the incidence of cheating [on educational tests] are scarce…" Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 4: Consequences of Large-Scale, High-Stakes Testing on School and Classroom Practice, p.96 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department Actually, there have been surveys in which respondents freely admit that they cheat, and how. Moreover, news reports of cheating, by students or educators, have been voluminous. See, for example, Caveon Test Security's "Cheating in the News" section on its web site.
91 Brian M. Stecher   "Less is known about changes in policies at the district and school levels in response to high-stakes testing, but mixed evidence of some impact has appeared." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 4: Consequences of Large-Scale, High-Stakes Testing on School and Classroom Practice, p.96 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of testing on at-risk students, completion, dropping out, curricular offerings, attitudes, etc. include those of Schleisman (1999); the *Southern Regional Education Board (1998); Webster, Mendro, Orsak, Weerasinghe & Bembry (1997); Jones (1996); Boylan (1996); Jones (1993); Jacobson (1992); Grisay (1991); Johnstone (1990); Task Force on Educational Assessment Programs [Florida] (1979); Wellisch, MacQueen, Carriere & Duck (1978); Enochs (1978); Pronaratna (1976); and McWilliams & Thomas (1976).
92 Brian M. Stecher   "Although numerous news articles have addressed the negative effects of high-stakes testing, systematic research on the subject is limited." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 4: Consequences of Large-Scale, High-Stakes Testing on School and Classroom Practice, p.98 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of testing on at-risk students, completion, dropping out, curricular offerings, attitudes, etc. include those of Schleisman (1999); the *Southern Regional Education Board (1998); Webster, Mendro, Orsak, Weerasinghe & Bembry (1997); Jones (1996); Boylan (1996); Jones (1993); Jacobson (1992); Grisay (1991); Johnstone (1990); Task Force on Educational Assessment Programs [Florida] (1979); Wellisch, MacQueen, Carriere & Duck (1978); Enochs (1978); Pronaratna (1976); and McWilliams & Thomas (1976).
93 Brian M. Stecher   "researchers have not generally measured the extent or magnitude of the shifts in practice that they identified as a result of high-stakes testing." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 4: Consequences of Large-Scale, High-Stakes Testing on School and Classroom Practice, pp.99–100 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of testing on at-risk students, completion, dropping out, curricular offerings, attitudes, etc. include those of Schleisman (1999); the *Southern Regional Education Board (1998); Webster, Mendro, Orsak, Weerasinghe & Bembry (1997); Jones (1996); Boylan (1996); Jones (1993); Jacobson (1992); Grisay (1991); Johnstone (1990); Task Force on Educational Assessment Programs [Florida] (1979); Wellisch, MacQueen, Carriere & Duck (1978); Enochs (1978); Pronaratna (1976); and McWilliams & Thomas (1976).
94 Lorraine M. McDonnell   "...this chapter can only describe the issues that are raised when one looks at testing from a political perspective. Because of the lack of systematic studies on the topic." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 5: Accountability as seen through a political lens, p.102 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
95 Lorraine M. McDonnell   "...public opinion, as measured by surveys, does not always provide a clear and unambiguous measure of public sentiment." Denigrating Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 5: Accountability as seen through a political lens, p.108 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
96 Laura S. Hamilton Brian M. Stecher "So test-based accountability remains controversial because there is inadequate evidence to make clear judgments about its effectiveness in raising test scores and achieving its other goals." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.122 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
97 Laura S. Hamilton Brian M. Stecher "Unfortunately, the complexity of the issues and the ambiguity of the existing research do not allow our recommendations to take the form of a practical “how-to” guide for policymakers and practitioners." Denigrating Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.123 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
98 Laura S. Hamilton Brian M. Stecher "Another part of the interpretive question is the need to gather information in other subject areas to portray a more complete picture of achievement. The scope of constructs that have been considered in research to date has been fairly narrow, focusing on the subjects that are part of the accountability systems that have been studied. Many legitimate instructional objectives have been ignored in the literature to date." Denigrating Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.127 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department Many studies of the effects of testing predate CRESST's in the 1980s and cover all subject fields, not just reading and math. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
99 Laura S. Hamilton Brian M. Stecher "States should also conduct ongoing analyses of the performance of groups whose members may not be numerous enough to permit separate reporting. English-language learners and students with disabilities are increasingly being included in high-stakes testing systems, and, as discussed in Chapter Three, little is currently known about the validity of scores for these groups." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.131 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing. 
100 Laura S. Hamilton Brian M. Stecher "It would be especially helpful to know what changes in instruction are made in response to different kinds of information and incentives. In particular, we need to know how teachers interpret information from tests and how they use it to modify instruction." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.133 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
101 Laura S. Hamilton Brian M. Stecher "It seems clear that aligning the components of the system and providing appropriate professional development should, at a minimum, increase teachers’ political support for test-based accountability policies .... Although there is no empirical evidence to suggest that this strategy will reduce inappropriate responses to high-stakes testing,... Additional research needs to be done to determine the importance of alignment for promoting positive effects of test-based accountability." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.135 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
102 Laura S. Hamilton Brian M. Stecher "… we currently do not know enough about test-based accountability to design a system that is immune from the problems we have discussed..." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.136 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
103 Laura S. Hamilton Brian M. Stecher "There is some limited evidence that educators’ responses to test based accountability vary according to the characteristics of their student populations,…" Denigrating Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.138 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department There was and is far more than "limited" evidence. Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
104 Laura S. Hamilton Brian M. Stecher "... there is very limited evidence to guide thinking about political issues." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.139 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
105 Laura S. Hamilton Brian M. Stecher "First, we do not have an accurate assessment of the additional costs." Denigrating Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.141 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department Yes, we did and we do. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W.M. Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
106 Laura S. Hamilton Brian M. Stecher "Part of the reason these issues are rarely considered may be that no one has produced a good estimate of the cost of an improved accountability system in comparison with its benefits." Denigrating Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.141 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W.M. Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
107 Laura S. Hamilton Brian M. Stecher "Nevertheless, our knowledge of the costs of alternative accountability systems is still somewhat limited. Policymakers need to know how much it would cost to change their current systems to be responsive to criticisms such as those described in this book. These estimates need to consider all of the associated costs, including possible opportunity costs associated with increased testing time and increased test preparation time." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.142 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W.M. Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
108 Laura S. Hamilton Brian M. Stecher "However, there is still much about these systems that is not well understood. Lack of research-based knowledge about the quality of scores and the mechanisms through which high-stakes testing programs operate limits our ability to improve these systems. As a result, our discussions also identified unanswered questions..." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.143 https://www.rand.org/pubs/monograph_reports/MR1554.html Office of Research and Improvement, US Education Department In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
109 Daniel M. Koretz Daniel F. McCaffrey, Laura S. Hamilton "Although high-stakes testing is now widespread, methods for evaluating the validity of gains obtained under high-stakes conditions are poorly developed. This report presents an approach for evaluating the validity of inferences based on score gains on high-stakes tests. It describes the inadequacy of traditional validation approaches for validating gains under high-stakes conditions and outlines an alternative validation framework for conceptualizing meaningful and inflated score gains.", p.1 Denigrating Toward a framework for validating gains under high-stakes conditions CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 https://files.eric.ed.gov/fulltext/ED462410.pdf Office of Research and Improvement, US Education Department In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Ortar (1960); Marron (1965); ETS (1965); Messick & Jungeblut (1981); Ellis, Konoske, Wulfeck, & Montague (1982); DerSimonian and Laird (1983); Kulik, Bangert-Drowns & Kulik (1984); Powers (1985); Jones (1986); Fraker (1986/1987); Halpin (1987); Whitla (1988); Snedecor (1989); Bond (1989); Baydar (1990); Becker (1990); Smyth (1990); Moore (1991); Alderson & Wall (1992); Powers (1993); Oren (1993); Powers & Rock (1994); Scholes, Lane (1997); Allalouf & Ben Shakhar (1998); Robb & Ercanbrack (1999); McClain (1999); Camara (1999, 2001, 2008); Stone & Lane (2000, 2003); Din & Soldan (2001); Briggs (2001); Palmer (2002); Briggs & Hansen (2004); Cankoy & Ali Tut (2005); Crocker (2005); Allensworth, Correa, & Ponisciak (2008); Domingue & Briggs (2009); Koljatic & Silva (2014); and Early (2019)
110 Daniel M. Koretz Daniel F. McCaffrey, Laura S. Hamilton "Few efforts are made to evaluate directly score gains obtained under high-stakes conditions, and conventional validation tools are not fully adequate for the task.", p.1 Dismissive Toward a framework for validating gains under high-stakes conditions CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 https://files.eric.ed.gov/fulltext/ED462410.pdf Office of Research and Improvement, US Education Department In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Ortar (1960); Marron (1965); ETS (1965); Messick & Jungeblut (1981); Ellis, Konoske, Wulfeck, & Montague (1982); DerSimonian and Laird (1983); Kulik, Bangert-Drowns & Kulik (1984); Powers (1985); Jones (1986); Fraker (1986/1987); Halpin (1987); Whitla (1988); Snedecor (1989); Bond (1989); Baydar (1990); Becker (1990); Smyth (1990); Moore (1991); Alderson & Wall (1992); Powers (1993); Oren (1993); Powers & Rock (1994); Scholes, Lane (1997); Allalouf & Ben Shakhar (1998); Robb & Ercanbrack (1999); McClain (1999); Camara (1999, 2001, 2008); Stone & Lane (2000, 2003); Din & Soldan (2001); Briggs (2001); Palmer (2002); Briggs & Hansen (2004); Cankoy & Ali Tut (2005); Crocker (2005); Allensworth, Correa, & Ponisciak (2008); Domingue & Briggs (2009); Koljatic & Silva (2014); and Early (2019)
111 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Despite their importance and widespread use, little is known about the impact of these tests on states’ recent efforts to improve teaching and learning." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
112 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Little information about the technical soundness of teacher licensure tests appears in the published literature." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
113 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Little research exists on the extent to which licensure tests identify candidates with the knowledge and skills necessary to be minimally competent beginning teachers." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
114 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Information is needed about the soundness and technical quality of the tests that states use to license their teachers." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
115 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "policy and practice on teacher licensure testing in the United States are nascent and evolving"   Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.17 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
116 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "The paucity of data and these methodological challenges made the committee’s examination of teacher licensure testing difficult."   Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.17 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
117 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "There were a number of questions the committee wanted to answer but could not, either because they were beyond the scope of this study, the evidentiary base was inconclusive, or the committee’s time and resources were insufficient."   Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.17 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
118 Marguerite Clarke George F. Madaus “[T]here has been no analogous infrastructure for independently evaluating a testing program before or after implementation, or for monitoring test use and impact.” p. 19 Dismissive The Adverse Impact of High Stakes Testing on Minority Students: Evidence from 100 Years of Test Data In G. Orfield and M. Kornhaber (Eds.), Raising standards or raising barriers? Inequality and high stakes testing in public education. New York: The Century Foundation (2001) http://files.eric.ed.gov/fulltext/ED450183.pdf The Century Foundation External evaluations of large-scale testing programs not only exist, but represent the norm. 
119 Marguerite Clarke George F. Madaus “The effects of testing are now so diverse, widespread, and serious that it is necessary to establish mechanisms for catalyzing inquiry about, and systematic independent scrutiny of them.” p. 20 Dismissive The Adverse Impact of High Stakes Testing on Minority Students: Evidence from 100 Years of Test Data In G. Orfield and M. Kornhaber (Eds.), Raising standards or raising barriers? Inequality and high stakes testing in public education. New York: The Century Foundation (2001) http://files.eric.ed.gov/fulltext/ED450183.pdf The Century Foundation External evaluations of large-scale testing programs not only exist, but represent the norm. 
120 Ronald Deitel   "In the late 1980s, CRESST was among the first to research the measurement of rigorous, discipline-based knowledge for purposes of large-scale assessment." 1stness Center for Research on Evaluation, Standards, and Student Testing (CRESST) clarify the goals and activities of CRESST EducationNews.org, November 18, 2000   Office of Research and Improvement, US Education Department Nonsense. Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
121 Marguerite Clarke Madaus, Horn, and Ramos “[F]or most of this century, there has been no infrastructure for independently evaluating a testing programme before or after implementation, or for monitoring test use and impact. The commercial testing industry does not as yet have any structure in place for the regulation and monitoring of appropriate test use.” p. 177 Dismissive Retrospective on Educational Testing and Assessment in the 20th Century Curriculum Studies, 2000, vol. 32, no. 2, http://webpages.uncc.edu/~rglamber/Rsch6109%20Materials/HistoryAchTests_3958652.pdf   External evaluations of large-scale testing programs not only exist, but represent the norm. 
122 Marguerite Clarke Madaus, Horn, and Ramos “Given the paucity of evidence available on the volume of testing over time, we examined five indirect indicators of growth in testing. . . .” p. 169 Dismissive Retrospective on Educational Testing and Assessment in the 20th Century Curriculum Studies, 2000, vol. 32, no. 2 http://webpages.uncc.edu/~rglamber/Rsch6109%20Materials/HistoryAchTests_3958652.pdf   There exist many sources of such information, from the Council of Chief State School Officers (CCSSO), the US Education Department, the US General Accounting Office (GAO), for example.
123 Sheila Barron   "Although this is a topic researchers ... talk about often, very little has been written about the difficulties secondary analysts confront." p.173 Dismissive Difficulties associated with secondary analysis of NAEP data, chapter 9 Grading the Nation's Report Card, National Research Council, 2000 https://www.nap.edu/catalog/9751/grading-the-nations-report-card-research-from-the-evaluation-of National Research Council funders In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schulz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004). 
124 Sheila Barron   "...few articles have been written that specifically address the difficulties of using NAEP data." p.173 Dismissive Difficulties associated with secondary analysis of NAEP data, chapter 9 Grading the Nation's Report Card, National Research Council, 2000 https://www.nap.edu/catalog/9751/grading-the-nations-report-card-research-from-the-evaluation-of National Research Council funders In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schulz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004). 
125 Joan L. Herman   “Testing accommodations that attempt to reduce the language load of a test or otherwise compensate for students' reduced language skills (e.g., by providing students more time) are also currently being researched, but answers that are equitable and fair for all students have not yet been found.” p. 8 Dismissive Student Assessment and Student Achievement in the California Public School System (with Brown and Baker) CSE Technical Report 519, April 2000 https://www.cse.ucla.edu/products/reports/TECH519.pdf Office of Research and Improvement, US Education Department
126 Joan L. Herman   “Thus, the extent to which gains reflect real improvement in learning is an open question (see, e.g., Shepard, 1990).” p. 15 Dismissive Student Assessment and Student Achievement in the California Public School System (with Brown and Baker) CSE Technical Report 519, April 2000 https://www.cse.ucla.edu/products/reports/TECH519.pdf Office of Research and Improvement, US Education Department
127 R. L. Linn   "There are many reasons for the Lake Wobegon Effect, most of which are less sinister than those emphasized by Cannell." Denigrating Assessments and Accountability, p.7 Educational Researcher, March 2000, pp.4–16. https://journals.sagepub.com/doi/abs/10.3102/0013189x029002004 Office of Research and Improvement, US Education Department No. Cannell was exactly right. The cause was corruption, lax security, and cheating. See, for example, https://nonpartisaneducation.org/Review/Articles/v6n3.htm
128 Lorrie A. Shepard   "This portrayal derives mostly from research leading to Wood and Bruner’s original conception of scaffolding, from Vygotskian theory, and from naturalistic studies of effective tutoring described next. Relatively few studies have been undertaken in which explicit feedback interventions have been tried in the context of constructivist instructional settings." Dismissive The Role of Classroom Assessment in Teaching and Learning, p.59 CSE Technical Report 517, February 2000 https://nepc.colorado.edu/sites/default/files/publications/TECH517.pdf Office of Research and Improvement, US Education Department  
129 Lorrie A. Shepard   "The NCTM and NRC visions are idealizations based on beliefs about constructivist pedagogy and reflective practice. Although both are supported by examples of individual teachers who use assessment to improve their teaching, little is known about what kinds of support would be required to help large numbers of teachers develop these strategies or to ensure that teacher education programs prepared teachers to use assessment in these ways. Research is needed to address these basic implementation questions." Dismissive The Role of Classroom Assessment in Teaching and Learning, p.64 CSE Technical Report 517, February 2000 https://nepc.colorado.edu/sites/default/files/publications/TECH517.pdf Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
130 Lorrie A. Shepard   "This social-constructivist view of classroom assessment is an idealization. The new ideas and perspectives underlying it have a basis in theory and empirical studies, but how they will work in practice and on a larger scale is not known." Dismissive The Role of Classroom Assessment in Teaching and Learning, p.67 CSE Technical Report 517, February 2000 https://nepc.colorado.edu/sites/default/files/publications/TECH517.pdf Office of Research and Improvement, US Education Department  
131 Marguerite Clarke Madaus, Pedulla, and Shore “The National Board believes that we must as a nation conduct research that helps testing contribute to student learning, classroom practice, and state and district management of school resources.” p. 2 Dismissive An Agenda for Research on Educational Testing NBETPP Statements, Vol. 1, No. 1, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456137.pdf Ford Foundation Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
132 Marguerite Clarke Madaus, Pedulla, and Shore “Validity research on teacher testing needs to address the following four issues in particular. . .” : [four bullet-point paragraphs follow] p. 3 Dismissive An Agenda for Research on Educational Testing NBETPP Statements, Vol. 1, No. 1, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456137.pdf Ford Foundation  
133 Marguerite Clarke Madaus, Pedulla, and Shore “[W]e need to understand better the relationship between testing and the diversity of the college student body.” p. 6 Dismissive An Agenda for Research on Educational Testing NBETPP Statements, Vol. 1, No. 1, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456137.pdf Ford Foundation  
134 Marguerite Clarke Haney, Madaus “We trust that further research will build on this good example and help all of us move from suggestive correlational studies towards more definitive conclusions.” p. 9 1stness High Stakes Testing and High School Completion NBETPP Statements, Volume 1, Number 3, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456139.pdf Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan (1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
135 Jay P. Heubert Robert M. Hauser "A growing body of research suggests that tests often do in fact change school and classroom practices (Corbett & Wilson, 1991; Madaus, 1988; Herman & Golan 1993; Smith & Rottenberg, 1991)." p.29 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
136 Jay P. Heubert Robert M. Hauser "A growing body of research suggests that tests often do in fact change school and classroom practices (Corbett & Wilson, 1991; Madaus, 1988; Herman & Golan 1993; Smith & Rottenberg, 1991)." p.29 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
137 Jay P. Heubert Robert M. Hauser "Most standards-based assessments have only recently been implemented or are still being developed. Consequently, it is too early to determine whether they will produce the intended effects on classroom instruction." p.36 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
138 Jay P. Heubert Robert M. Hauser "A recent review of the available research evidence by Mehrens (1998) reaches several interim conclusions. Drawing on eight studies...." p.36 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan (1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
139 Jay P. Heubert Robert M. Hauser "Although there are no national data summarizing how local districts use standardized tests in certifying students, we do know that several of the largest school systems have begun to use test scores in determining grade-to-grade promotion (Chicago) or are considering doing so (New York City, Boston)." p.37 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan (1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
140 Jay P. Heubert Robert M. Hauser "There is very little research that specifically addresses the consequences of graduation testing." p.172 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan (1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
141 Jay P. Heubert Robert M. Hauser "Catterall adds, 'initial boasts and doubts alike regarding the effects of gatekeeping competency testing have met with a paucity of follow-up research.'" p.172 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan (1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
142 Jay P. Heubert Robert M. Hauser "In one of the few such studies on this topic, Bishop (1997) compared the Third International Mathematics and Science Study (TIMSS) test scores of countries with and without rigorous graduation tests. He found that countries with demanding exit exams outperformed other countries at a comparable level of development. He concluded, however, that such exams were probably not the most important determinant of achievement levels and that more research was needed." p.173 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
143 Jay P. Heubert Robert M. Hauser "Very little is known about the specific consequences of passing or failing a high school graduation exam." p.176 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
144 Jay P. Heubert Robert M. Hauser "American experience is limited and research is needed to explore their effectiveness. For instance, we do not know how to combine advance notice of high-stakes test requirements, remedial intervention, and opportunity to retake graduation tests." p.180 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
145 Jay P. Heubert Robert M. Hauser "Research is also needed to explore the effects of different kinds of high school credentials on employment and other post-school outcomes." p.180 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
146 Jay P. Heubert Robert M. Hauser "At the same time, solid evaluation research on the most effective remedial approaches is sparse." p.183 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Developmental (i.e., remedial) education researchers have conducted many studies to determine what works best to keep students from failing in their “courses of last resort,” after which there are no alternatives.  Researchers have included Boylan, Roueche, McCabe, Wheeler, Kulik, Bonham, Claxton, Bliss, Schonecker, Chen, Chang, and Kirk.
147 Jay P. Heubert Robert M. Hauser "There is plainly a need for good research on effective remedial education." p.183 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Developmental (i.e., remedial) education researchers have conducted many studies to determine what works best to keep students from failing in their “courses of last resort,” after which there are no alternatives. Researchers have included Boylan, Roueche, McCabe, Wheeler, Kulik, Bonham, Claxton, Bliss, Schonecker, Chen, Chang, and Kirk.
148 Jay P. Heubert Robert M. Hauser "However, in most of the nation, much needs to be done before a world-class curriculum and world-class instruction will be in place." p.277 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
149 Jay P. Heubert Robert M. Hauser "The committee sees a strong need for better evidence on the benefits and costs of high-stakes testing." p.281 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W.M. Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
150 Jay P. Heubert Robert M. Hauser "Very little is known about the specific consequences of passing or failing a high school graduation exam." p.288 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above.  Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry
151 Jay P. Heubert Robert M. Hauser "At present, however, advanced skills are often not well defined and ways of assessing them are not well established." p.289 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
152 Jay P. Heubert Robert M. Hauser "...in many cases, the demands that full participation of these students [i.e., students with disabilities] place on assessment systems are greater than current assessment knowledge and technology can support." p.191 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
153 Jay P. Heubert Robert M. Hauser "...available evidence about the possible effects of graduation tests on learning and on high school dropout is inconclusive (e.g., Kreitzer et al., 1989; Reardon, 1996; Catterall, 1990; Cawthorne, 1990; Bishop, 1997)." Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above. Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry
154 Jay P. Heubert Robert M. Hauser "We do not know how to combine advance notice of high-stakes test requirements, remedial intervention, and opportunity to retake graduation tests. Research is also needed to explore the effects of different kinds of high school credentials on employment and other post-school outcomes." p.289 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above.  Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry
155 Robert L. Linn   "Two obvious, but frequently ignored, cautions [from the TIERS experience] are these: . . . " p. 6 Denigrating Assessments and Accountability CSE Technical Report 490 (November 1998) http://www.cse.ucla.edu/products/Reports/TECH490.pdf Office of Research and Improvement, US Education Department  
156 Robert L. Linn   "Moreover, it is critical to recognize first that the choice of constructs matters, and so does the way in which measures are developed and linked to the constructs. Although these two points may be considered obvious, they are too often ignored." p. 13 Denigrating Assessments and Accountability CSE Technical Report 490 (November 1998) http://www.cse.ucla.edu/products/Reports/TECH490.pdf Office of Research and Improvement, US Education Department  
157 Robert L. Linn   “Although that claim is subject to debate, it seldom even gets considered when aggregate results are used either to monitor progress (e.g., NAEP) or for purposes of school, district, or state accountability.” p. 16 Dismissive Assessments and Accountability CSE Technical Report 490 (November 1998) http://www.cse.ucla.edu/products/Reports/TECH490.pdf Office of Research and Improvement, US Education Department  
158 Anne Lewis quoting Arnold Fege, National PTA "The national testing proposal is based on 'quantum leap' theories, not on research, contended Arnold Fege of the National PTA. ‘As I listened to the presentations this morning,’ he said, ‘I didn't hear about any research that backs up the introduction of national testing.’ In his opinion, ‘no parent in the country is losing sleep because his or her child is not meeting NAEP standards,’ and even though testing is pervasive in American education, it seems to not have made a big impact on change." Dismissive Assessing Student Achievement: Search for Validity and Balance CSE Technical Report 481 (1997) https://cresst.org/wp-content/uploads/TECH481.pdf Office of Research and Improvement, US Education Department In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schulz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage, et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004).
159 Eva L. Baker Zenaida Aguirre-Munoz "The extent and nature of the impact of language skills on performance assessments remains elusive due to the paucity of research in this area." Dismissive Improving the equity and validity of assessment-based information systems, p.3 CSE Technical Report 462, December 1997 https://cresst.org/wp-content/uploads/TECH462.pdf Office of Research and Improvement, US Education Department  
160 Joan L. Herman   "Although conceptual models for analyzing the cost of alternative assessment and for conducting cost-benefit analyses have been formulated (Catterall & Winters, 1994; Picus, 1994), definitive cost studies are yet to be completed (see, however, Picus & Tralli, forthcoming)." p. 30 Dismissive, Denigrating Large-Scale Assessment in Support of School Reform: Lessons in the Search for Alternative Measures CSE Technical Report 446, Oct. 1997 http://www.cse.ucla.edu/products/reports/TECH446.pdf Office of Research and Improvement, US Education Department No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W.M. Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
161 Robert L. Linn Eva L. Baker “Very little research has been conducted to validate performance standards, particularly those that include specification of student response attributes.” pp. 26–27 Dismissive Emerging Educational Standards of Performance in the United States CSE Technical Report 437 (August 1997) http://www.cse.ucla.edu/products/reports/TECH437.pdf Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
162 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "However, as d'Ydewalle (1987) has pointed out, 'clear-cut results from neat experiments on the impact of motivation on learning [or performance] do not exist.'" Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.5 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
163 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "In the educational context, most existing studies have focused on the influence of characteristics of the classroom learning environment, such as rewards, teacher feedback, goal structures, evaluation practices, on either the antecedents or consequences of motivation." Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.5 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. 
International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
164 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "Most of the studies that have compared goal orientations have examined their effects on performance during classroom learning activities rather than at the time of test taking." Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.7 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
165 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "As yet, there appear to be no published studies that investigate the direct and indirect causal paths from motivational antecedents through use of metacognitive strategies to achievement."  Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.8 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
166 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "In general, there is a need for more studies to focus on the effects on test performance of motivational antecedents (not just anxiety) introduced at the time of test taking." Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.10 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
167 Brian M. Stecher Stephen P. Klein "In contrast, relatively little has been published on the costs of such measures [performance tests] in operational programs. An Office of Technology Assessment (1992) … (Hoover and Bray) …." Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 The January 1993 GAO report on testing costs included such information. CRESST has spent a quarter century denigrating that report.
168 Brian M. Stecher Stephen P. Klein "However, empirical and observational data suggest much more needs to be done to understand what hands-on tasks actually measure. Klein et al. (1996b) … Shavelson et al. (1992) … Hamilton (1994) …." pp.9-10 Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 Article references only works by other CRESST authors and completely ignores the career-tech education literature, where such studies are most likely to be found.
169 Brian M. Stecher Stephen P. Klein "Future research will no doubt shed more light on the validity question, but for now, it is not clear how scores on hands-on performance tasks should be interpreted." p.10 Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 Article references only works by other CRESST authors and completely ignores the career-tech education literature, where such studies are most likely to be found.
170 Brian M. Stecher Stephen P. Klein "Advocates of performance assessment believe that the use of these measures will reinforce efforts to reform curriculum and instruction. … Unfortunately, there is very little research to confirm either the existence or the size of most of these potential benefits. Those few studies ... Klein (1995) ... Jovanovic, Solano-Flores, & Shavelson, 1994; Klein et al., 1996a)." p.10 Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 Article references only works by other CRESST authors and completely ignores the career-tech education literature, where such studies are most likely to be found.
171 Mary Lee Smith 11 others "The purpose of the research described in this report is to understand what happens in the aftermath of a change in state assessment policy that is designed to improve schools and make them more accountable to a set of common standards. Although theoretical and rhetorical works about this issue are common in the literature, empirical evidence is novel and scant." Dismissive Reforming schools by reforming assessment: Consequences of the Arizona Student Assessment Program (ASAP): Equity and teacher capacity building, p.3 CSE Technical Report 425, March 1997 https://cresst.org/wp-content/uploads/TECH425.pdf Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
172 Robert L. Linn Joan L. Herman "How much do standards-led assessments cost? Dependable estimates are difficult to obtain, in part because many of the costs associated with assessment -- the time spent by teachers in preparation, administration, and scoring -- are typically absorbed by schools' normal operations and not priced in a separate budget." p.14 Denigrating A Policymaker's Guide to Standards-Led Assessment Education Commission of the States, February, 1997     The January 1993 GAO report on testing costs included such information. CRESST has spent a quarter century denigrating that report. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W.M. Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
173 Robert L. Linn Joan L. Herman "None of the above estimates includes operational costs for schools, districts, or states." p.14 Denigrating A Policymaker's Guide to Standards-Led Assessment Education Commission of the States, February, 1997     The January 1993 GAO report on testing costs included such information. CRESST has spent a quarter century denigrating that report.
174 Eva L. Baker Robert L. Linn, Joan L. Herman "How do we assure accurate placement of students with varying abilities and language capabilities? There is little research to date to guide policy and practice (August, et al., 1994)." Dismissive CRESST: A Continuing Mission to Improve Educational Assessment, p.12 Evaluation Comment, Summer 1996   Office of Research and Improvement, US Education Department  
175 Eva L. Baker Robert L. Linn, Joan L. Herman "Alternative assessments are needed for these students (see Kentucky Portfolios for Special Education, Kentucky Department of Education, 1995). Although promising, there has been little or no research investigating the validity of inferences from these adaptations or alternatives." Dismissive CRESST: A Continuing Mission to Improve Educational Assessment, p.13 Evaluation Comment, Summer 1996   Office of Research and Improvement, US Education Department  
176 Eva L. Baker Robert L. Linn, Joan L. Herman "Similarly, research is needed to provide a basis for understanding the implications of using different summaries of student performance, such as group means or percentage of students meeting a standard, for measuring progress." p.15 Dismissive CRESST: A Continuing Mission to Improve Educational Assessment Evaluation Comment, Summer 1996   Office of Research and Improvement, US Education Department  
177 Robert L. Linn Daniel M. Koretz, Eva Baker "'Yet we do not have the necessary comprehensive dependable data. . . .' (Tyler 1996a, p. 95)" p. 8 Dismissive Assessing the Validity of the National Assessment of Educational Progress CSE Technical Report 416 (June 1996) http://www.cse.ucla.edu/products/reports/TECH416.pdf Office of Research and Improvement, US Education Department In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schulz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage, et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004).
178 Robert L. Linn Daniel M. Koretz, Eva Baker "There is a need for more extended discussion and reconsideration of the approach being used to measure long-term trends." p. 21 Dismissive Assessing the Validity of the National Assessment of Educational Progress CSE Technical Report 416 (June 1996) http://www.cse.ucla.edu/products/reports/TECH416.pdf Office of Research and Improvement, US Education Department There was extended discussion and consideration. Simply put, they did not get their way because others disagreed with them.
179 Robert L. Linn Daniel M. Koretz, Eva Baker "Only a small minority of the articles that discussed achievement levels made any mention of the judgmental nature of the levels, and most of those did so only briefly." p. 27 Denigrating Assessing the Validity of the National Assessment of Educational Progress CSE Technical Report 416 (June 1996) http://www.cse.ucla.edu/products/reports/TECH416.pdf Office of Research and Improvement, US Education Department All achievement levels, just like all course grades, are set subjectively. This information was never hidden.
180 Thomas Kellaghan George F. Madaus, Anastasia Raczek "The limited evidence on the effectiveness of external, or extrinsic, rewards in education is also reviewed." p.vii Dismissive The Use of External Examinations to Improve Student Motivation American Educational Research Association monograph   "Work on this monograph was supported by Grant 910-1205-1 from the Ford Foundation." See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. 
181 Lawrence O. Picus   "Although several states have implemented new assessment programs, there has been little research on the cost of developing and implementing these new systems." p.3 Dismissive Estimating the Costs of Student Assessment in North Carolina and Kentucky: A State-Level Analysis CSE Technical Report 408 (February 1996) http://www.cse.ucla.edu/products/reports/TECH408.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus spent six years denigrating that report, by insinuation.
182 Thomas Kellaghan George F. Madaus, Anastasia Raczek "At the very least, a careful analysis of relevant issues and a consideration of empirical evidence are required before reaching such a conclusion. However, the arguments put forward by reformers are not based on such analysis or consideration. Indeed, their arguments often lack clarity, even in the terminology they use. Further, although not much research deals directly with the relationship between external examinations and motivation, ..." p.2 Dismissive, Denigrating The Use of External Examinations to Improve Student Motivation American Educational Research Association monograph   "Work on this monograph was supported by Grant 910-1205-1 from the Ford Foundation." Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
183 Thomas Kellaghan George F. Madaus, Anastasia Raczek "The final proposition in the armory of proponents of external examinations anticipates that all students at selected grades at both elementary and high school levels will take such examinations. This proposition is presumably based on the unexamined assumption that the motivational power of examinations will operate more or less the same way for students of all ages." p.10 Dismissive, Denigrating The Use of External Examinations to Improve Student Motivation American Educational Research Association monograph   "Work on this monograph was supported by Grant 910-1205-1 from the Ford Foundation." Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
184 Robert L. Linn Eva L. Baker "Although the connection between student achievement and economic competitiveness is not well established, exhortations for higher standards of student achievement nonetheless are frequently based on the assumption of a strong connection." Dismissive What Do International Assessments Imply for World-Class Standards? Educational Evaluation and Policy Analysis, Dec. 1, 1995 https://journals.sagepub.com/doi/abs/10.3102/01623737017004405 Office of Research and Improvement, US Education Department  
185 Lawrence O. Picus   "While our understanding of how each of these assessment instruments can best be used is growing, information of their costs is virtually nonexistent." p.1 Dismissive A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus spent six years denigrating that report, by insinuation.
186 Lawrence O. Picus   "Research at the Center for Research on Evaluation, Standards, and Student Testing (CRESST) has found that policy makers have little information about the costs of alternative assessments, and that they are concerned about the cost trade-offs involved in using alternative assessment compared to the many other activities they feel continue to be necessary." p.1 Dismissive A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus spent six years denigrating that report, by insinuation.
187 Lawrence O. Picus   "A number of important issues must be resolved before accurate estimates of costs can be developed. Central among those issues is the development of a clear definition of what constitutes a cost." p.1 Denigrating A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus spent six years denigrating that report, by insinuation.
188 Lawrence O. Picus   "Determining the resources necessary to achieve each of these goals is, at best, a difficult task. Because of this difficulty, many analysts stop short of estimating the true cost of a program, and instead focus on the expenditures required for its implementation." pp.3-4 Denigrating A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus spent six years denigrating that report, by insinuation.
189 Lawrence O. Picus   "… cost analysts in education have often resorted to estimating the monetary value of the resources devoted to the program being evaluated. ... However, it is important to remember the opportunity costs that result from time commitments of individuals not directly compensated through the assessment program, such as the teachers who are required to spend time on tasks that previously did not exist or were not their responsibility. Determining the value of these opportunity costs will improve the quality of educational cost analyses dramatically." p.33 Denigrating A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus spent six years denigrating that report, by insinuation.
190 Mary Lee Smith 5 others "This study also draws on previous research on the role of mandated testing. …The question unanswered by extant research is whether assessments that differ in form from the traditional, norm- or criterion-referenced standardized tests would produce similar reactions and effects." Dismissive What Happens When the Test Mandate Changes? Results of a Multiple Case Study CSE Technical Report 380, July 1994 https://cresst.org/wp-content/uploads/TECH380.pdf Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
191 Audrey J. Noble Mary Lee Smith "Are the behaviorist beliefs underlying measurement-driven reform warranted? A small body of evidence addresses the functions of assessments from the traditional viewpoint." Dismissive Old and New Beliefs About Measurement-Driven Reform: The More Things Change, the More They Stay the Same, p.3 CSE Technical Report 373, CRESST/Arizona State University https://cresst.org/wp-content/uploads/TECH373.pdf Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
192 Audrey J. Noble Mary Lee Smith "Few empirical studies exist of the use and effects of performance testing in high-stakes environments." Dismissive Old and New Beliefs About Measurement-Driven Reform: The More Things Change, the More They Stay the Same, p.10 CSE Technical Report 373, CRESST/Arizona State University https://cresst.org/wp-content/uploads/TECH373.pdf Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
193 Eva L. Baker Robert L. Linn "Because performance assessments are emerging phenomena, procedures for assessing their quality are in some disorder." Denigrating The Technical Merits of Performance Assessments, p.1 CRESST Line, Special 1993 AERA Issue   Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
194 Eva L. Baker Robert L. Linn "Second, there is relatively little analysis of the sequence of technical procedures required to render assessments sound for some uses."  Dismissive The Technical Merits of Performance Assessments, p.1 CRESST Line, Special 1993 AERA Issue   Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
195 Eva L. Baker Robert L. Linn "The problem is that we cannot learn enough from the conduct of short-term instructional studies, nor can we wait for the results of longer-term instructional programs. ...We must continue to operate on faith." Denigrating The Technical Merits of Performance Assessments, p.2 CRESST Line, Special 1993 AERA Issue   Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
196 Walter M. Haney George F. Madaus, Robert Lyons "Academics who write about educational and psychological testing similarly have given little attention to the commercial side of testing." p.9 Dismissive The Fractured Marketplace for Standardized Testing,  National Commission on Testing and Public Policy, Boston College, Kluwer Academic Publishers, 1993   "Finally we thank the Ford Foundation, and three present and former officials there, …"  
197 Walter M. Haney George F. Madaus, Robert Lyons "Nor is there much clear evidence on the potential distortions introduced by the Lake Wobegon phenomenon." p.231 Dismissive The Fractured Marketplace for Standardized Testing,  National Commission on Testing and Public Policy, Boston College, Kluwer Academic Publishers, 1993   "Finally we thank the Ford Foundation, and three present and former officials there, …" John J. Cannell's original "Lake Wobegon Effect" studies did a fine job of specifying the results, in detail.  See:  http://nonpartisaneducation.org/Review/Books/CannellBook1.htm  http://nonpartisaneducation.org/Review/Books/Cannell2.pdf
198 Robert L. Linn Vonda L. Kiplinger "Unfortunately, there have been no empirical studies to date to either support or reject the hypothesized lack of motivation generated by the NAEP testing environment, or to show whether students' performance would be improved if motivation were increased." 1stness Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department A cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
199 Robert L. Linn Vonda L. Kiplinger "Although much has been written on achievement motivation per se, there has been surprisingly little empirical research on the effects of different motivation conditions on test performance. Before examining the paucity of research on the relationship of motivation and test performance...." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department A cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
200 Robert L. Linn Vonda L. Kiplinger "Before examining the paucity of research on the relationship of motivation and test performance, we first review briefly the general literature on the relationship of motivation and achievement." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department A cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
201 Robert L. Linn Vonda L. Kiplinger "Prior to 1980, achievement motivation theory focused primarily on the need for achievement and the effects of test anxiety on test performance." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department A cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
202 Robert L. Linn Vonda L. Kiplinger "Despite continuing concern regarding the effects of motivation on student achievement and test performance in general, ...there has been very little empirical research on students' self-reported motivation levels or experimental manipulation of motivational conditions--until recently." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department A cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
203 Lorrie A. Shepard   "Proponents of measurement-driven instruction (MDI) argued, in the 1980s, that high-stakes tests would set clear targets thus assuring that teachers would focus greater attention on essential basic skills. Critics countered that measurement-driven instruction distorts the curriculum, .... Each side argued theoretically and from limited observations but without systematic proof of these assertions." Dismissive Will National Tests Improve Student Learning?, p.6 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
204 Lorrie A. Shepard   "The vision of curriculum-driven examinations offered by the National Education Goals Panel is inspired. However, we do not at present have the technical, curricular, or political know-how to install such a system at least not on so large a scale." Dismissive Will National Tests Improve Student Learning?, p.10 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department  
205 Lorrie A. Shepard   "Moreover, there is no evidence available about what would happen to the quality of instruction if all high-school teachers, not just those who volunteered, were required to teach to the AP curricula." Dismissive Will National Tests Improve Student Learning?, p.10 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department  
206 Lorrie A. Shepard   "Research evidence on the effects of traditional standardized tests when used as high-stakes accountability instruments is strikingly negative." Dismissive Will National Tests Improve Student Learning?, pp.15-16 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
207 Joan L. Herman Shari Golan "Using greater technical rigor, Linn et al. (1989) replicated Cannell's findings, but moved beyond them in identifying underlying causes for such seemingly spurious results, among them the age of norms." pp.10-11 Denigrating Effects of Standardized Testing on Teachers and Learning—Another Look CSE Report No. 334 https://eric.ed.gov/?id=ED341738 Office of Research and Improvement, US Education Department No. Cannell was exactly right. The cause was corruption, lax security, and cheating. See, for example, https://nonpartisaneducation.org/Review/Articles/v6n3.htm
208 R.J. Dietel, J.L. Herman, and R.A. Knuth   "Although there is now great excitement about performance-based assessment, we still know relatively little about methods for designing and validating such assessments. CRESST is one of many organizations and schools researching the promises and realities of such assessments." p.3 Dismissive What Does Research Say About Assessment? North Central Regional Education Laboratory, 1991 http://methodenpool.uni-koeln.de/portfolio/What%20Does%20Research%20Say%20About%20Assessment.htm Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
209 R.J. Dietel, J.L. Herman, and R.A. Knuth   "What we know about performance-based assessment is limited and there are many issues yet to be resolved." p.6 Dismissive What Does Research Say About Assessment? North Central Regional Education Laboratory, 1991 http://methodenpool.uni-koeln.de/portfolio/What%20Does%20Research%20Say%20About%20Assessment.htm Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
210 Mary Lee Smith Carole Edelsky, Kelly Draper, Claire Rottenberg, Meredith Cherland "The research literature on the effects of external testing is small but growing." p.3 Dismissive The Role of Testing in Elementary Schools CSE Technical Report 321, May 1991 https://cresst.org/wp-content/uploads/TECH334.pdf Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
211 Mary Lee Smith Carole Edelsky, Kelly Draper, Claire Rottenberg, Meredith Cherland "Past researchers have not examined the classroom directly for traces of testing effects." p.5 Dismissive The Role of Testing in Elementary Schools CSE Technical Report 321, May 1991 https://cresst.org/wp-content/uploads/TECH334.pdf Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
212 Lorrie A. Shepard Catherine Cutts Dougherty "Evidence to support the positive claims for measurement-driven instruction comes primarily from high-stakes tests themselves. For example, Popham, Cruse, Rankin, Sandifer, and Williams (1985) and Popham (1987) pointed to the steeply rising passing rates on minimum competency tests as demonstrations that MDI had improved student learning." p.2 Denigrating Effect of High-Stakes Testing on Instruction Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991) and the National Council on Measurement in Education (Chicago, IL, April 4-6,1991) https://files.eric.ed.gov/fulltext/ED337468.pdf Office of Research and Improvement, US Education Department The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above.  Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry
213 Lorrie A. Shepard Catherine Cutts Dougherty "Evidence documenting the negative influence on instruction is limited to a few studies. Darling-Hammond and Wise (1985) reported that teachers in their study were pressured to 'teach to the test.'" Dismissive Effect of High-Stakes Testing on Instruction Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991) and the National Council on Measurement in Education (Chicago, IL, April 4-6,1991) https://files.eric.ed.gov/fulltext/ED337468.pdf Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
214 Daniel M. Koretz Robert L. Linn, Stephen Dunbar, Lorrie A. Shepard “Evidence relevant to this debate has been limited.” p. 2 Dismissive The Effects of High-Stakes Testing On Achievement: Preliminary Findings About Generalization Across Tests  Originally presented at the annual meeting of the AERA and the NCME, Chicago, April 5, 1991 http://nepc.colorado.edu/files/HighStakesTesting.pdf Office of Research and Improvement, US Education Department See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
215 James S. Catterall   "Before proceeding, readers should note that the observations do not result from an accumulated weight of in-depth cost-benefit type studies, since no such weight has been registered." p.2 Dismissive Estimating the Costs and Benefits of Large-Scale Assessments: Lessons from Recent Research CSE Report No. 319, 1990 https://cresst.org/wp-content/uploads/TECH319.pdf Office of Research and Improvement, US Education Department  
216 James S. Catterall   "The points tend to build on the small number of interesting developments reported (particularly Shepard & Kreitzer, 1987a, 1987b; Solmon & Fagnano, in press), as well as on the author's experiences in conducting cost-benefit type analyses of educational assessment practices (Catterall, 1984, 1989). We also base inferences on the paucity of research itself." p.2 Dismissive Estimating the Costs and Benefits of Large-Scale Assessments: Lessons from Recent Research CSE Report No. 319, 1990 https://cresst.org/wp-content/uploads/TECH319.pdf Office of Research and Improvement, US Education Department  
217 Hartigan, J. A., & Wigdor, A. K.   "The empirical evidence cited for the standard deviation of worker productivity is quite slight." p.239 Dismissive Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
218 Hartigan, J. A., & Wigdor, A. K.   "Some fragmentary confirming evidence that supports this point of view can be found in Hunter et al. (1988)... We regard the Hunter and Schmidt assumption as plausible but note that there is very little evidence about the nature of the relationship of ability to output." p.243 Dismissive Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
219 Hartigan, J. A., & Wigdor, A. K.   "It is also important to remember that the most important assumptions of the Hunter-Schmidt models rest on a very slim empirical foundation .... Hunter and Schmidt's economy-wide models are based on simple assumptions for which the empirical evidence is slight." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
220 Hartigan, J. A., & Wigdor, A. K.   "It is important to remember that the most important assumptions of the Hunter-Schmidt models rest on a very slim empirical foundation." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
221 Hartigan, J. A., & Wigdor, A. K.   "Hunter and Schmidt's economy wide models are based on simple assumptions for which the empirical evidence is slight." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
222 Hartigan, J. A., & Wigdor, A. K.   "That assumption is supported by only a very few studies." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
223 Hartigan, J. A., & Wigdor, A. K.   "There is no well-developed body of evidence from which to estimate the aggregate effects of better personnel selection...we have seen no empirical evidence that any of them provide an adequate basis for estimating the aggregate economic effects of implementing the VG-GATB on a nationwide basis." p.247 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
224 Hartigan, J. A., & Wigdor, A. K.   "Furthermore, given the state of scientific knowledge, we do not believe that realistic dollar estimates of aggregate gains from improved selection are even possible." p.248 Dismissive Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
225 Hartigan, J. A., & Wigdor, A. K.   "...primitive state of knowledge..." p.248 Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
226 Joan L. Herman, Donald W. Dorr-Bremme Walter E. Hathaway, Ed. "Despite the controversy and the important issues that it raises, little information has been forthcoming on the nature of testing as it is actually used in the schools. What functions do tests serve in the classrooms? How do teachers and principals use test results? What kinds of tests do principals and teachers trust and rely on most? These and similar questions have gone largely unaddressed." p.8 Dismissive Uses of Testing in the Schools: A National Profile Testing in the Schools, New Directions for Testing and Measurement #19, Jossey-Bass, September 1983   Office of Educational Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
227 Joan L. Herman, Donald W. Dorr-Bremme Walter E. Hathaway, Ed. "A few studies have indicated teachers' circumspect attitudes toward and limited use of one type of achievement measure, the norm-referenced test. Beyond this, however, the landscape of test uses in American schools has remained largely unexplored." p.8 Dismissive Uses of Testing in the Schools: A National Profile Testing in the Schools, New Directions for Testing and Measurement #19, Jossey-Bass, September 1983   Office of Educational Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
228 Joan L. Herman, Donald W. Dorr-Bremme Walter E. Hathaway, Ed. "We know very little about the quality of teacher-developed tests." p.15 Dismissive Uses of Testing in the Schools: A National Profile Testing in the Schools, New Directions for Testing and Measurement #19, Jossey-Bass, September 1983   Office of Educational Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
229 Don Dorr-Bremme James Catterall "Relatively little is known about students' attitudes and feelings toward assessment in general. Even less is known regarding their feelings about different forms of assessment." p.48-1 Dismissive Costs of Testing: Test Use Project CSE Report, November 1982 https://files.eric.ed.gov/fulltext/ED224835.pdf National Institute of Education, US Education Department  
230 Don Dorr-Bremme James Catterall "in light of these few and certainly non-definitive findings, student interviews were undertaken to explore the affective valence that different forms of achievement assessment have for students." p.48-2 Dismissive Costs of Testing: Test Use Project CSE Report, November 1982 https://files.eric.ed.gov/fulltext/ED224835.pdf National Institute of Education, US Education Department  
231 Don Dorr-Bremme James Catterall "Because of the small sample size and the paucity of research in this topic, these findings suggests potential avenues for research as much as they provide information." p.48-26 Dismissive Costs of Testing: Test Use Project CSE Report, November 1982 https://files.eric.ed.gov/fulltext/ED224835.pdf National Institute of Education, US Education Department  
232 Jennie P. Yeh Joan L. Herman "Testing in American schools is increasing in both scope and visibility. … What return are we getting for this quite considerable investment? Little information is available. How are tests used in schools? What functions do tests serve in classrooms?", p.1 Dismissive Teachers and testing: A survey of test use CSE Report No. 166, 1981 https://files.eric.ed.gov/fulltext/ED218336.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
233 Joan L. Herman James Burry, Don Dorr-Bremme, Charlotte M. Lazar-Morrison, James D. Lehman, Jennie P. Yeh "Despite the great controversy that surrounds testing and its potential uses and abuses, there is little empirical information available about the nature of testing as it actually occurs and is used (or not used) in schools. The Test Use Project at the Center for the Study of Evaluation seeks to fill this gap and answer basic questions about tests and schooling.", p.2 Dismissive Teaching and testing: Allies or adversaries CSE Report No. 165, 1981 https://files.eric.ed.gov/fulltext/ED218336.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
234 Joan L. Herman James Burry, Don Dorr-Bremme, Charlotte M. Lazar-Morrison, James D. Lehman, Jennie P. Yeh "Clearly the policy toward testing in this country has been one of accretion, but the full magnitude is undocumented. The CSE Test Use Project ... ", p.2 Dismissive Teaching and testing: Allies or adversaries CSE Report No. 165, 1981 https://files.eric.ed.gov/fulltext/ED218336.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
235 James Burry   "As instructional considerations have come into prominence, the dialogue over testing has become somewhat adversarial, with a great deal of the recent literature forming a series of position papers espousing the value of one kind of test over another, but offering little empirical data (Lazar-Morrison, Polin, Moy, & Burry, 1980)." p.27 Dismissive The Design of Testing Programs with Multiple and Complimentary Uses Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
236 James Burry   "This paper makes a preliminary step toward explicating school peoples' points of view about the kinds of assessment that are useful for external accountability concerns and for instructional decision making." pp.27-28 1stness The Design of Testing Programs with Multiple and Complimentary Uses Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
237 Joan L. Herman Jennie Yeh "Despite the great controversy that surrounds testing and its potential uses and abuses, there is little empirical information available about the nature of testing as it actually occurs and is used (or not used) in schools. The Test Use Project …." p.2 Dismissive Contextual Examination of Test Use: The Test, The Setting, The Cost Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
238 Joan L. Herman Jennie Yeh "Clearly the policy toward testing in this country has been one of accretion, but the full magnitude is undocumented. The CSE Test Use Project ... ", p.2 Dismissive Contextual Examination of Test Use: The Test, The Setting, The Cost Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
239 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "There is little research-based information about current testing practice." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
240 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Almost ten years ago, Kirkland (1971) reviewed the literature on test impact on students and schools and found that while much had been written about tests, few empirical studies were evident."  Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
241 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "What is significant about [Kirkland's] exclusions is the correct observation that these issues are 'implications,' often not founded on empirical research."  Denigrating A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
242 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Today, there still remains a plethora of publications on these very issues and a dearth of empirical support on actual test use practices." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
243 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Kirkland's review of the literature is concentrated mainly upon the social and psychological issues in testing, more than upon instructional issues. Also, then as now, little empirical research had accumulated on the latter." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
244 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Only recently has the testing dialogue begun to move away from social and psychological issues ... and begun to focus on the instructional issues of testing." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
245 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry " ...the testing dialogue has taken the form of a debate, with the bulk of the test literature being a series of position papers citing little empirical data. This debate is being carried on predominantly by people outside the schools." Denigrating A review of the literature on test use, p.4 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
246 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "There is little empirical research available that can answer the questions that have arisen."  Dismissive A review of the literature on test use, p.5 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
247 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "... little is known about the amount of other testing that takes place."  Dismissive A review of the literature on test use, p.6 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
248 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Although much has been written about minimum competency issues, there has yet to be any report of the actual uses or extent of the use of competency-based tests." Dismissive A review of the literature on test use, p.7 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
249 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Virtually nothing is known about the amount of testing taking place using other types of assessments."  Dismissive A review of the literature on test use, p.7 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
250 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The literature on curriculum-embedded tests is equally scant." Dismissive A review of the literature on test use, p.8 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
251 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The current information focuses on norm- and criterion-referenced tests with some emphasis on minimum competency testing. Since literature on the other evaluative processes is lacking, there is a great need to look at various types of assessments to determine the purposes they serve."  Dismissive A review of the literature on test use, p.9 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
252 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The kinds of contextual factors which influence testing and the use of test results are just beginning to be appreciated." Dismissive A review of the literature on test use, p.9 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
253 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Concern exists about the level of teacher training in testing. ... The literature does not appear to reflect any great follow-up to such suggestions [regarding teacher competence with testing]." Dismissive A review of the literature on test use, p.9 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
254 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "All of the studies mentioned included information about standardized achievement testing. As of yet, there is no evidence about how teacher attitudes toward other types of tests affect the use of those assessments." Dismissive A review of the literature on test use, p.19 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
255 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The effect of the actual testing environment on test use is only beginning to emerge. Evidence suggests that characteristics of the test-takers and the instructional environment need to be explored." Dismissive A review of the literature on test use, p.19 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
256 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "These factors have been considered in research on teachers' instructional decision-making or in studies of the social or organizational qualities of the classroom. The investigation of these variables as factors affecting teachers' use of tests and test data is minimal." Dismissive A review of the literature on test use, p.20 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
257 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "In the community, parent involvement, accountability pressures, and news media coverage of test scores are possible influences on the nature and amount of testing, but they have yet to be researched."  Dismissive A review of the literature on test use, p.20 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
258 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "We know very little about the costs of testing." Dismissive A review of the literature on test use, p.20 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
259 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Little information is available about these types of costs, and the little information that is available concerns teachers and student attitudes." Dismissive A review of the literature on test use, p.22 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
260 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The question of whether test scores affect a student's self-concept has also been raised. ... As indicated previously, information on any of the aforementioned issues is scant." Dismissive A review of the literature on test use, p.23 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
261 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Other evidence suggests that tests of many types are being administered and the results are being utilized. To what extent this is occurring is not specifically known." Dismissive A review of the literature on test use, pp.23-24 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
262 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "There are a number of areas concerning teachers and testing for which there is no information." Dismissive A review of the literature on test use, p.24 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
263 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The impact of other testing must also be considered. In-class assessments made by individual teachers have yet to be examined in depth." Dismissive A review of the literature on test use, p.24 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
264 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Teachers place greater reliance on, and have more confidence in, the results of their own judgments of students' performance, but little is known about the kinds of activities that give voice to this information." Dismissive A review of the literature on test use, p.25 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
265 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The settings and factors which affect the use of tests and their results is yet another uninformed area." Dismissive A review of the literature on test use, p.25 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
                   
  IRONIES:                
  Michael J. Feuer   "To challenge authority is to hold authority accountable. Challenging people in power requires them to show that what they are doing is legitimate; we invite them to rise to the challenge and prove their case; and they, in turn, trust that the system will treat them fairly."   Measuring Accountability When Trust Is Conditional Education Week, September 24, 2012 https://www.edweek.org/ew/articles/2012/09/24/05feuer_ep.h32.html?print=1    
  Michael J. Feuer   "No profession is granted automatic autonomy or an exemption from evaluation."   Measuring Accountability When Trust Is Conditional Education Week, September 24, 2012 https://www.edweek.org/ew/articles/2012/09/24/05feuer_ep.h32.html?print=1    
  Laura S. Hamilton Brian M. Stecher, Stephen P. Klein "Greater knowledge about testing and accountability can lead to better system design and more-effective system management." p.xiv   Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Summary, p.xiv      
  Laura S. Hamilton Brian M. Stecher "Incremental improvements to existing systems, based on current research on testing and accountability, should be combined with long-term research and development efforts that may ultimately lead to a major redesign of these systems. Success in this endeavor will require the thoughtful engagement of educators, policymakers, and researchers in discussions and debates about tests and testing policies."   Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6, Improving test-based accountability, pp.143-144      
  Brian M. Stecher Stephen P. Klein "Additional information about the impact of performance assessments on curriculum and instruction would provide policymakers with valuable data on the benefits that may accrue from this relatively expensive form of assessment." p.11   The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.11 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)      
  Eva L. Baker Robert L. Linn, Joan L. Herman "Diverse perspectives are needed to clarify real differences and to find equitable, workable balances."   CRESST: A Continuing Mission to Improve Educational Assessment, p.13 Evaluation Comment, Summer 1996      
  Eva L. Baker Robert L. Linn, Joan L. Herman "Impartiality, not advocacy, is the key to the credibility of research and development."   CRESST: A Continuing Mission to Improve Educational Assessment, p.13 Evaluation Comment, Summer 1996      
                   
      Author cites (and accepts as fact without checking) someone else's dismissive review            
      Authors cite themselves or colleagues in their group, but dismiss or denigrate all other work            
      Falsely claim that research has only recently been done on the topic.