Dismissive Reviews in Education Policy Research
#  Author  Co-author(s)  Dismissive quote  Type  Title  Source  Link  Notes
1 Laura S. Hamilton   "Despite the widespread enthusiasm for assessment-based reforms, many of the current and proposed uses of large-scale assessments are based on unverified assumptions about the extent to which they will actually lead to improved teaching and learning, and insufficient attention has been paid to the characteristics of assessment programs that are likely to promote desired outcomes." Denigrating Testing What Has Been Taught, p.47 American Educator, Winter 2010-2011 https://www.aft.org/sites/default/files/periodicals/Hamilton.pdf Relevant studies of the effects of varying types of incentive or the optimal structure of testing programs include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brooke & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); Hurlock (1925); and Zeng (2001). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
2 Laura S. Hamilton   "Can assessments meaningfully be aligned to standards, … What would the key features of an assessment system designed to increase student learning and improve instruction be? While current assessment knowledge is not sufficient to fully answer these questions, in this article I offer an overview of what is known and several suggestions for improving our approach to assessment." Denigrating Testing What Has Been Taught, p.47 American Educator, Winter 2010-2011 https://www.aft.org/sites/default/files/periodicals/Hamilton.pdf Same notes as #1.
3 Laura S. Hamilton   "There is no research evidence to tell us definitively how to build an assessment system that will promote student learning and be resistant to the negative consequences that are common in high-stakes testing programs." Dismissive Testing What Has Been Taught, p.49 American Educator, Winter 2010-2011 https://www.aft.org/sites/default/files/periodicals/Hamilton.pdf Same notes as #1.
4 Laura S. Hamilton   "Research on the effects of various assessment-design features is limited, so any effort that relies heavily on assessment as a tool for school improvement should be carried out with caution." Denigrating Testing What Has Been Taught, p.50 American Educator, Winter 2010-2011 https://www.aft.org/sites/default/files/periodicals/Hamilton.pdf Same notes as #1.
5 Laura S. Hamilton Brian M. Stecher, Kun Yuan “A few studies have attempted to examine how the creation and publication of standards, per se, have affected practices.” p. 3 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).  
6 Laura S. Hamilton Brian M. Stecher, Kun Yuan “The research evidence does not provide definitive answers to these questions.” p. 6 Denigrating Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Same notes as #5.
7 Laura S. Hamilton Brian M. Stecher, Kun Yuan “He [Porter 1994] also noted that ‘virtually all of the arguments, both for and against standards, are based on beliefs and hypotheses rather than on direct empirical evidence’ (p. 427). Although a large and growing body of research has been conducted to examine the effects of SBR, the caution Porter expressed in 1994 about the lack of empirical evidence remains relevant today.” pp. 34-35 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Same notes as #5.
8 Laura S. Hamilton Brian M. Stecher, Kun Yuan “Arguably the most important test of quality is whether the standards promote high-quality instruction and improved student learning, but as we discuss later, there is very little research to address that question.” p. 37 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Same notes as #5.
9 Laura S. Hamilton Brian M. Stecher, Kun Yuan “[T]here have been a few studies of SBR as a comprehensive system. . . . [T]here is some research on how the adoption of standards, per se, or the alignment of standards with curriculum influences school practices or student outcomes.” p. 38 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Same notes as #1.
10 Laura S. Hamilton Brian M. Stecher, Kun Yuan “The lack of evidence about the effects of SBR derives primarily from the fact that the vision has never been fully realized in practice.” p. 47 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Same notes as #1.
11 Laura S. Hamilton Brian M. Stecher, Kun Yuan “[A]lthough many conceptions of SBR emphasize autonomy, we currently know relatively little about the effects of granting autonomy or what the right balance is between autonomy and prescriptiveness.” p. 55 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Same notes as #5.
12 Laura S. Hamilton Brian M. Stecher, Kun Yuan “One of the primary responsibilities of the federal government should be to ensure ongoing collection of evidence demonstrating the effects of the policies, which could be used to make decisions about whether to continue on the current course or whether small adjustments or a major overhaul are needed.” p. 55 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Same notes as #5.
13 Laura S. Hamilton Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney "However, the paths through which SBA [standards-based accountability] changes district, school, and classroom practices and how these changes in practice influence student outcomes are largely unexplored. There is strong evidence that SBA leads to changes in teachers’ instructional practices (Hamilton, 2004; Stecher, 2002)." p.5 Dismissive Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States Rand Corporation, 2007 https://www.rand.org/pubs/monographs/MG589.html Same notes as #5. "This research was sponsored by the National Science Foundation under grant number REC-0228295."
14 Laura S. Hamilton Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney "Much less is known about the impact of SBA at the district and school levels and the relationships among actions at the various levels and student outcomes. This study was designed to shed light on this complex set of relationships…" p.5 Dismissive Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States Rand Corporation, 2007 https://www.rand.org/pubs/monographs/MG589.html Same notes as #5. "This research was sponsored by the National Science Foundation under grant number REC-0228295."
15 Daniel M. Koretz & Laura S. Hamilton Robert L. Brennan, Ed. "Most of the studies of [testing's] effects on practice report average responses that mask some of these important variations and interactions." p.552 Denigrating Testing for Accountability in K-12 Chapter 15 in Educational Measurement, published by NCME and ACE, 2006   Relevant studies of the effects of varying types of incentive or the optimal structure of incentives: see the citation list under #1.
16 Daniel M. Koretz & Laura S. Hamilton Robert L. Brennan, Ed. "There is no comprehensive source of information on how much time schools devote to coaching activities such as practicing on released test forms, but some studies suggest these activities are widespread." p.552 Dismissive Testing for Accountability in K-12 Chapter 15 in Educational Measurement, published by NCME and ACE, 2006   A comprehensive study that included a nationally representative population of all systemwide tests contains exactly that information. See Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office. Also, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There is even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf
17 Daniel M. Koretz & Laura S. Hamilton Robert L. Brennan, Ed. "As with coaching, there are no comprehensive studies of the frequency of cheating across schools in the United States." p.553 Dismissive Testing for Accountability in K-12 Chapter 15 in Educational Measurement, published by NCME and ACE, 2006   Actually, there have been such studies: surveys in which respondents freely admit that they cheat, and how. Moreover, news reports of cheating, by students or educators, have been voluminous. See, for example, Caveon Test Security's "Cheating in the News" section on its web site.
18 Daniel M. Koretz & Laura S. Hamilton Robert L. Brennan, Ed. "However, in the absence of audit testing, this hypothesis [of score inflation] cannot be tested." p.553 Denigrating Testing for Accountability in K-12 Chapter 15 in Educational Measurement, published by NCME and ACE, 2006   Yes, it can be, and often has been, tested in experiments. Koretz's preferred method for "auditing" a high-stakes test is to compare its score trends to those of a parallel no-stakes test, which, presumably, will have totally reliable score trends. Yet, a cornucopia of experimental research has shown "no stakes" tests to be relatively unreliable, less reliable than high-stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015).
19 Laura S. Hamilton   "Despite their popularity, in most cases these [education] reforms are not guided by a careful investigation of the probable consequences of using tests as accountability tools." p.25 Denigrating Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 Hamilton's preferred method for "auditing" a high-stakes test is to compare its score trends to those of a parallel no-stakes test, which, presumably, will have totally reliable score trends. Yet, a cornucopia of experimental research has shown "no stakes" tests to be relatively unreliable, less reliable than high-stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015).
20 Laura S. Hamilton   "Although numerous studies have examined the effects of high-stakes testing, the majority of these investigations have failed to reach the standards of quality that would be required to make strong inferences based upon them." p.32 Denigrating Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 Nonsense: such research is straightforward and has been done for over a century. For example, 70%, or 357, of the effect sizes in a recent meta-regression were derived from randomized experiments conducted over the past century. Another 12% were derived from multiple regression studies. https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
21 Laura S. Hamilton   "In addition, it is nearly impossible for researchers to set up the kind of experimental design that is most appropriate for examining cause-and-effect relationships." p.32 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 Nonsense: such research is straightforward and has been done for over a century. For example, 70%, or 357, of the effect sizes in a recent meta-regression were derived from randomized experiments conducted over the past century. Another 12% were derived from multiple regression studies. https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
22 Laura S. Hamilton   "...even studies that are intended to be merely descriptive often suffer from poor measurement of the construct of interest, as well as biased samples that may result from nonrepresentative sampling or nonrandom refusal to participate in the research." p.32 Denigrating Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 Nonsense: such research is straightforward and has been done for over a century. For example, 70%, or 357, of the effect sizes in a recent meta-regression were derived from randomized experiments conducted over the past century. Another 12% were derived from multiple regression studies. https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
23 Laura S. Hamilton   "...when test scores are associated with consequences that are important or meaningful to teachers, it is likely that instruction will be affected. The empirical evidence, though not extensive, supports this distinction." p.33 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies; study is a research review, research synthesis, or meta-analysis.
24 Laura S. Hamilton   "...there are no comprehensive studies of the frequency of cheating." p.35 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 Actually, there have been such studies: surveys in which respondents freely admit that they cheat, and how. Moreover, news reports of cheating, by students or educators, have been voluminous. See, for example, Caveon Test Security's "Cheating in the News" section on its web site.
25 Laura S. Hamilton   "Although these studies are suggestive, they rely on teacher perceptions, and there is little direct evidence of how testing actually affects student morale." p.39 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 At least twelve pre-2004 student surveys were included here: https://www.nonpartisaneducation.org/Review/Resources/SurveyList.htm. See also: https://richardphelps.net/DemandForStandardizedTesting.pdf  
26 Laura S. Hamilton   "The overall lack of evidence regarding student morale, stress, and motivation is due in part to the difficulty that researchers have in gaining access to students and measuring their levels of these constructs (Stecher, 2002)." p.39 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 At least twelve pre-2004 student surveys were included here: https://www.nonpartisaneducation.org/Review/Resources/SurveyList.htm. See also: https://richardphelps.net/DemandForStandardizedTesting.pdf
In 150 of 241 qualitative studies, the focus of the interviews, case studies, or observations was the effect of testing stakes on students. See: https://www.tandfonline.com/doi/abs/10.1080/15305058.2011.602920
27 Laura S. Hamilton   "...though one study that examined high school exit exams and that controlled for individual student characteristics (unlike most of the research on this topic) found no such relationship." p.40 Denigrating Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 The article to which she refers ignored most previous studies, mischaracterized the ones it acknowledged, and misclassified the testing programs in several states.
28 Laura S. Hamilton   "There is much we do not know about score inflation." p.46 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 In fact, we know quite a lot about the source of higher levels of score inflation: lax test security. The many experimental studies of test coaching are consistent: coaching has some modest effect, not the volatile or very large effects that Koretz claims.
29 Laura S. Hamilton   "there is simply too much that we currently do not know about how to design testing policies that promote desirable outcomes and prevent undesirable ones." p.57 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract  
30 Laura S. Hamilton   "...the search for answers to questions about how to minimize score inflation and promote effective instruction is likely to continue for many years." p.57 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 In fact, we know quite a lot about the source of higher levels of score inflation: lax test security. The many experimental studies of test coaching are consistent: coaching has some modest effect, not the volatile or very large effects that Koretz claims.
31 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “The shortcomings of the studies make it difficult to determine the size of teacher effects, but we suspect that the magnitude of some of the effects reported in this literature are overstated.” p. xiii Denigrating Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS value-added measurement system had been running for a decade when they wrote this, and it had done much of what these authors claim had never been done.
32 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Using VAM to estimate individual teacher effects is a recent endeavor, and many of the possible sources of error have not been thoroughly evaluated in the literature.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS value-added measurement system had been running for a decade when they wrote this, and it had done much of what these authors claim had never been done.
33 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS value-added measurement system had been running for a decade when they wrote this, and it had done much of what these authors claim had never been done.
34 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “This lack of attention to teachers in policy discussions may be attributed in part to another body of literature that attempted to determine the effects of specific teacher background characteristics, including credentialing status (e.g., Miller, McKenna, and McKenna, 1998; Goldhaber and Brewer, 2000) and subject matter coursework (e.g., Monk, 1994).” p. 8 Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS value-added measurement system had been running for a decade when they wrote this, and it had done much of what these authors claim had never been done.
35 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “To date, there has been little empirical exploration of the size of school effects and the sensitivity of teacher effects to modeling of school effects.” p. 78 Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS value-added measurement system had been running for a decade when they wrote this, and it had done much of what these authors claim had never been done.
36 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “There are no empirical explorations of the robustness of estimates to assumptions about prior-year schooling effects.” p. 81 Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS value-added measurement system had been running for a decade when they wrote this, and it had done much of what these authors claim had never been done.
37 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “There is currently no empirical evidence about the sensitivity of gain scores or teacher effects to such alternatives.” p. 89 Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS value-added measurement system had been running for a decade when they wrote this, and it had done much of what these authors claim had never been done.
38 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. 116 Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS value-added measurement system had been running for a decade when they wrote this, and it had done much of what these authors claim had never been done.
39 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Although we expect missing data are likely to be pervasive, there is little systematic discussion of the extent or nature of missing data in test score databases.” p. 117 Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS value-added measurement system had been running for a decade when they wrote this, and it had done much of what these authors claim had never been done.
40 Brian M. Stecher Laura S. Hamilton "The business model of setting clear targets, attaching incentives to the attainment of those targets, and rewarding those responsible for reaching the targets has proven successful in a wide range of business enterprises. But there is no evidence that these accountability principles will work well in an educational context, and there are many reasons to doubt that the principles can be applied without significant adaptation." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.   
41 Brian M. Stecher Laura S. Hamilton "The lack of strong evidence regarding the design and effectiveness of accountability systems hampers policymaking at a critical juncture." Denigrating Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm . This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.
42 Brian M. Stecher Laura S. Hamilton "Nonetheless, the evidence has yet to justify the expectations. The initial evidence is, at best, mixed. On the plus side, students and teachers seem to respond to the incentives created by the accountability systems." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm . This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.
43 Brian M. Stecher Laura S. Hamilton "Proponents of accountability attribute the improved scores in these states to clearer expectations, greater motivation on the part of the students and teachers, a focused curriculum, and more-effective instruction. However, there is little or no research to substantiate these positive changes or their effects on scores." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract  
44 Brian M. Stecher Laura S. Hamilton "One of the earliest studies on the effects of testing (conducted in two Arizona schools in the late 1980s) showed that teachers reduced their emphasis on important, nontested material." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm  
45 Brian M. Stecher Laura S. Hamilton "Test-based accountability systems will work better if we acknowledge how little we know about them, if the federal government devotes appropriate resources to studying them, and if the states make ongoing efforts to improve them."  Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.   
46 Brian M. Stecher Laura S. Hamilton, Stephen P. Klein, Eds. "High-stakes testing may also affect parents (e.g., their attitudes toward education, their engagement with schools, and their direct participation in their child's learning) as well as policymakers (their beliefs about system performance, their judgements about program effectiveness, and their allocation of resources). However, these issues remain largely unexamined in the literature." Dismissive Consequences of large-scale, high-stakes testing on school and classroom practice Chapter 4 in Making sense of test-based accountability in education, 2002, p.79 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Parents and other adults are typically reached through public opinion polls. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm . Among the hundreds of polls conducted between 1958 and 2008, a majority included parents in particular or adults in general.
47 Brian M. Stecher Laura S. Hamilton, Stephen P. Klein, Eds. "As described in chapter 2, there was little concern about the effects of testing on teaching prior to the 1970s." Dismissive Consequences of large-scale, high-stakes testing on school and classroom practice Chapter 4 in Making sense of test-based accountability in education, 2002, p.81 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm  
48 Brian M. Stecher Laura S. Hamilton, Stephen P. Klein, Eds. "In light of the changes that occurred in the uses of large-scale testing in the 1980s and 1990s, researchers began to investigate teachers' reactions to external assessment. The initial research on the impact of large-scale testing was conducted in the 1980s and the 1990s." Dismissive Consequences of large-scale, high-stakes testing on school and classroom practice Chapter 4 in Making sense of test-based accountability in education, 2002, p.83 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm  
49 Brian M. Stecher Laura S. Hamilton, Stephen P. Klein, Eds. "Researchers have not documented the desirable consequences of testing … as clearly as the undesirable ones. More importantly, researchers have not generally measured the extent or magnitude of the shifts in practice that they identified as a result of high-stakes testing." Dismissive Consequences of large-scale, high-stakes testing on school and classroom practice Chapter 4 in Making sense of test-based accountability in education, 2002, pp.99–100 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm  
50 Laura S. Hamilton Brian M. Stecher, Stephen P. Klein "Although test-based accountability has shown some compelling results, the issues are complex, the research is new and incomplete, and many of the claims that have received the most attention have proved to be premature and superficial." Denigrating Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Summary, p.xiv https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm  
51 Laura S. Hamilton Brian M. Stecher, Stephen P. Klein "The research evidence does not provide definitive information about the actual costs of testing but the information that is available suggests that expenditures for testing have grown in recent years." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Introduction, p.9 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.  
52 Laura S. Hamilton Daniel M. Koretz "There is currently no substantial evidence on the effects of published report cards on parents’ decisionmaking or on the schools themselves." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 2: Tests and their use in test-based accountability systems, p.44 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf For decades, consulting services have existed that help parents new to a city select the right school or school district for them.
53 Laura S. Hamilton Brian M. Stecher "So test-based accountability remains controversial because there is inadequate evidence to make clear judgments about its effectiveness in raising test scores and achieving its other goals." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.122 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract  
54 Laura S. Hamilton Brian M. Stecher "Unfortunately, the complexity of the issues and the ambiguity of the existing research do not allow our recommendations to take the form of a practical “how-to” guide for policymakers and practitioners." Denigrating Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.123 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract  
55 Laura S. Hamilton Brian M. Stecher "Another part of the interpretive question is the need to gather information in other subject areas to portray a more complete picture of achievement. The scope of constructs that have been considered in research to date has been fairly narrow, focusing on the subjects that are part of the accountability systems that have been studied. Many legitimate instructional objectives have been ignored in the literature to date." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.127 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Many studies of the effects of testing predate CRESST's in the 1980s and cover all subject fields, not just reading and math. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
56 Laura S. Hamilton Brian M. Stecher "States should also conduct ongoing analyses of the performance of groups whose members may not be numerous enough to permit separate reporting. English-language learners and students with disabilities are increasingly being included in high-stakes testing systems, and, as discussed in Chapter Three, little is currently known about the validity of scores for these groups." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.131 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing.   
57 Laura S. Hamilton Brian M. Stecher "It would be especially helpful to know what changes in instruction are made in response to different kinds of information and incentives. In particular, we need to know how teachers interpret information from tests and how they use it to modify instruction." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.133 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies; study is a research review, research synthesis, or meta-analysis.
58 Laura S. Hamilton Brian M. Stecher "It seems clear that aligning the components of the system and providing appropriate professional development should, at a minimum, increase teachers’ political support for test-based accountability policies .... Although there is no empirical evidence to suggest that this strategy will reduce inappropriate responses to high-stakes testing, ... Additional research needs to be done to determine the importance of alignment for promoting positive effects of test-based accountability." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.135 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies; study is a research review, research synthesis, or meta-analysis.
59 Laura S. Hamilton Brian M. Stecher "… we currently do not know enough about test-based accountability to design a system that is immune from the problems we have discussed." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.136 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
60 Laura S. Hamilton Brian M. Stecher "There is some limited evidence that educators’ responses to test based accountability vary according to the characteristics of their student populations,…" Denigrating Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.138 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf There was and is far more than "limited" evidence; hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
61 Laura S. Hamilton Brian M. Stecher "... there is very limited evidence to guide thinking about political issues." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.139 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf    
62 Laura S. Hamilton Brian M. Stecher "First, we do not have an accurate assessment of the additional costs." Denigrating Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.141 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Yes, we did and we do. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.  
63 Laura S. Hamilton Brian M. Stecher "Part of the reason these issues are rarely considered may be that no one has produced a good estimate of the cost of an improved accountability system in comparison with its benefits." Denigrating Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.141 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.  
64 Laura S. Hamilton Brian M. Stecher "Nevertheless, our knowledge of the costs of alternative accountability systems is still somewhat limited. Policymakers need to know how much it would cost to change their current systems to be responsive to criticisms such as those described in this book. These estimates need to consider all of the associated costs, including possible opportunity costs associated with increased testing time and increased test preparation time." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.142 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.  
65 Laura S. Hamilton Brian M. Stecher "However, there is still much about these systems that is not well understood. Lack of research-based knowledge about the quality of scores and the mechanisms through which high-stakes testing programs operate limits our ability to improve these systems. As a result, our discussions also identified unanswered questions..." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6: Improving test-based accountability, p.143 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract  
66 Daniel M. Koretz Daniel F. McCaffrey, Laura S. Hamilton "Although high-stakes testing is now widespread, methods for evaluating the validity of gains obtained under high-stakes conditions are poorly developed. This report presents an approach for evaluating the validity of inferences based on score gains on high-stakes tests. It describes the inadequacy of traditional validation approaches for validating gains under high-stakes conditions and outlines an alternative validation framework for conceptualizing meaningful and inflated score gains.", p.1 Denigrating Toward a framework for validating gains under high-stakes conditions CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 https://files.eric.ed.gov/fulltext/ED462410.pdf In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There is even a What Works Clearinghouse summary of the (post-World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Ortar (1960); Marron (1965); ETS (1965); Messick & Jungeblut (1981); Ellis, Konoske, Wulfeck, & Montague (1982); DerSimonian & Laird (1983); Kulik, Bangert-Drowns & Kulik (1984); Powers (1985); Jones (1986); Fraker (1986/1987); Halpin (1987); Whitla (1988); Snedecor (1989); Bond (1989); Baydar (1990); Becker (1990); Smyth (1990); Moore (1991); Alderson & Wall (1992); Powers (1993); Oren (1993); Powers & Rock (1994); Scholes & Lane (1997); Allalouf & Ben Shakhar (1998); Robb & Ercanbrack (1999); McClain (1999); Camara (1999, 2001, 2008); Stone & Lane (2000, 2003); Din & Soldan (2001); Briggs (2001); Palmer (2002); Briggs & Hansen (2004); Cankoy & Ali Tut (2005); Crocker (2005); Allensworth, Correa, & Ponisciak (2008); Domingue & Briggs (2009); Koljatic & Silva (2014); Early (2019)
67 Daniel M. Koretz Daniel F. McCaffrey, Laura S. Hamilton "Few efforts are made to evaluate directly score gains obtained under high-stakes conditions, and conventional validation tools are not fully adequate for the task.", p.1 Dismissive Toward a framework for validating gains under high-stakes conditions CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 https://files.eric.ed.gov/fulltext/ED462410.pdf In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There is even a What Works Clearinghouse summary of the (post-World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Ortar (1960); Marron (1965); ETS (1965); Messick & Jungeblut (1981); Ellis, Konoske, Wulfeck, & Montague (1982); DerSimonian & Laird (1983); Kulik, Bangert-Drowns & Kulik (1984); Powers (1985); Jones (1986); Fraker (1986/1987); Halpin (1987); Whitla (1988); Snedecor (1989); Bond (1989); Baydar (1990); Becker (1990); Smyth (1990); Moore (1991); Alderson & Wall (1992); Powers (1993); Oren (1993); Powers & Rock (1994); Scholes & Lane (1997); Allalouf & Ben Shakhar (1998); Robb & Ercanbrack (1999); McClain (1999); Camara (1999, 2001, 2008); Stone & Lane (2000, 2003); Din & Soldan (2001); Briggs (2001); Palmer (2002); Briggs & Hansen (2004); Cankoy & Ali Tut (2005); Crocker (2005); Allensworth, Correa, & Ponisciak (2008); Domingue & Briggs (2009); Koljatic & Silva (2014); Early (2019)
68 Laura S. Hamilton   “Despite the number of studies investigating affective aspects of test taking, little is known about how students perceive the kinds of extended performance assessments currently being developed for state and local testing programs.” - Abstract Denigrating An Investigation of Students' Affective Responses to Alternative Assessment Formats Paper presented at the Annual Meeting of the National Council on Measurement in Education (New Orleans, LA, April 5-7, 1994) http://files.eric.ed.gov/fulltext/ED376203.pdf At least twelve pre-2004 student surveys were included here: https://www.nonpartisaneducation.org/Review/Resources/SurveyList.htm . See also: https://richardphelps.net/DemandForStandardizedTesting.pdf . In 150 of 241 qualitative studies, the focus of the interviews, case studies, or observations was the effect of testing stakes on students. See: https://www.tandfonline.com/doi/abs/10.1080/15305058.2011.602920
69 Laura S. Hamilton   “As stated earlier, this study was not intended to produce results that could be generalized to other tasks or to other samples of students, but to identify questions that might be addressed by future studies and to suggest possible hypotheses.” p. 23 Dismissive An Investigation of Students' Affective Responses to Alternative Assessment Formats Paper presented at the Annual Meeting of the National Council on Measurement in Education (New Orleans, LA, April 5-7, 1994) http://files.eric.ed.gov/fulltext/ED376203.pdf At least twelve pre-2004 student surveys were included here: https://www.nonpartisaneducation.org/Review/Resources/SurveyList.htm . See also: https://richardphelps.net/DemandForStandardizedTesting.pdf . In 150 of 241 qualitative studies, the focus of the interviews, case studies, or observations was the effect of testing stakes on students. See: https://www.tandfonline.com/doi/abs/10.1080/15305058.2011.602920
  IRONIES:                
  Laura S. Hamilton   "In the meantime, it will be important to continue to gather evidence, both from large-scale studies and from the individual experiences of teachers, administrators, and others affected by test-based accountability, and to make that evidence available so that it can inform efforts to improve accountability systems." p.57   Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025    
  Laura S. Hamilton Brian M. Stecher, Stephen P. Klein "Greater knowledge about testing and accountability can lead to better system design and more-effective system management." p.xiv   Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Summary, p.xiv https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf    
  Laura S. Hamilton Brian M. Stecher "Incremental improvements to existing systems, based on current research on testing and accountability, should be combined with long-term research and development efforts that may ultimately lead to a major redesign of these systems. Success in this endeavor will require the thoughtful engagement of educators, policymakers, and researchers in discussions and debates about tests and testing policies."   Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6, Improving test-based accountability, pp.143-144 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf    
  Christian Bourge, journalist Laura S. Hamilton, interviewee "The fact that there is a high correlation (between the high-stakes and low-stake tests) doesn't necessarily mean the tests are telling you the same thing."   Experts differ about high-stakes testing UPI, Feb. 13, 2003  https://www.upi.com/Top_News/2003/02/13/Experts-differ-about-high-stakes-testing/60271045180206/
  Christian Bourge, journalist Laura S. Hamilton, interviewee "She said that without examining the similarities in the design of each test, you do not know how comparable they are from district to district."   Experts differ about high-stakes testing UPI, Feb. 13, 2003  https://www.upi.com/Top_News/2003/02/13/Experts-differ-about-high-stakes-testing/60271045180206/    
      Cite selves or colleagues in the group, but dismiss or denigrate all other work.
      Falsely claim that research has only recently been done on the topic.
      Author cites (and accepts as fact without checking) someone else's dismissive review.