Dismissive Reviews in Education Policy Research
No. Author Co-author(s) Quote Type (Dismissive/Denigrating) Title Source Link Notes
1 Laura S. Hamilton Brian M. Stecher, Kun Yuan "He also noted that ‘virtually all of the arguments, both for and against standards, are based on beliefs and hypotheses rather than on direct empirical evidence’ (p. 427). Although a large and growing body of research has been conducted to examine the effects of SBA, the caution Porter expressed in 1994 about the lack of empirical evidence remains relevant today." pp. 157-158 Denigrating Standards-Based Accountability in the United States: Lessons Learned and Future Directions Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
2 Laura S. Hamilton Brian M. Stecher, Kun Yuan "High-quality research on the effects of SBA is difficult to conduct for a number of reasons…" p.158 Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 Access to anonymized student data is granted all the time. Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that one cannot find one or a few districts out of the many thousands willing to cooperate in a study to discredit testing.
3 Laura S. Hamilton Brian M. Stecher, Kun Yuan "Even when the necessary data have been collected by states or other entities, it is often difficult for researchers to obtain these data because those responsible for the data refuse to grant access, either because of concerns about confidentiality or because they are not interested in having their programmes scrutinised by researchers. Thus, the amount of rigorous analysis is limited." p.158 Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 Access to anonymized student data is granted all the time. Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that one cannot find one or a few districts out of the many thousands willing to cooperate in a study to discredit testing.
4 Laura S. Hamilton Brian M. Stecher, Kun Yuan "These evaluation findings reveal the challenges inherent in trying to judge the quality of standards. Arguably the most important test of quality is whether the standards promote high-quality instruction and improved student learning but, as we discuss later, there is very little research to address that question." p.158 Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).  
5 Laura S. Hamilton Brian M. Stecher, Kun Yuan "In fact, the bulk of research relevant to SBA has focused on the links between high-stakes tests and educators’ practices rather than standards and practices." p.159 Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).  
6 Laura S. Hamilton Brian M. Stecher, Kun Yuan "The existing evidence does not provide definitive guidance regarding the SBA system features that would be most likely to promote desirable outcomes." p.163 Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).  
7 Laura S. Hamilton   "Despite the widespread enthusiasm for assessment-based reforms, many of the current and proposed uses of large-scale assessments are based on unverified assumptions about the extent to which they will actually lead to improved teaching and learning, and insufficient attention has been paid to the characteristics of assessment programs that are likely to promote desired outcomes." p.47 Denigrating Testing What Has Been Taught American Educator, Winter 2010-2011 https://www.aft.org/sites/default/files/periodicals/Hamilton.pdf Relevant studies of the effects of varying types of incentive or the optimal structure of testing programs include those of Zeng (2001); Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
8 Laura S. Hamilton   "Can assessments meaningfully be aligned to standards, … What would the key features of an assessment system designed to increase student learning and improve instruction be? While current assessment knowledge is not sufficient to fully answer these questions, in this article I offer an overview of what is known and several suggestions for improving our approach to assessment." p.47 Denigrating Testing What Has Been Taught American Educator, Winter 2010-2011 https://www.aft.org/sites/default/files/periodicals/Hamilton.pdf Relevant studies of the effects of varying types of incentive or the optimal structure of testing programs include those of Zeng (2001); Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
9 Laura S. Hamilton   "There is no research evidence to tell us definitively how to build an assessment system that will promote student learning and be resistant to the negative consequences that are common in high-stakes testing programs." p.49 Dismissive Testing What Has Been Taught American Educator, Winter 2010-2011 https://www.aft.org/sites/default/files/periodicals/Hamilton.pdf Relevant studies of the effects of varying types of incentive or the optimal structure of testing programs include those of Zeng (2001); Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
10 Laura S. Hamilton   "Research on the effects of various assessment-design features is limited, so any effort that relies heavily on assessment as a tool for school improvement should be carried out with caution." p.50 Denigrating Testing What Has Been Taught American Educator, Winter 2010-2011 https://www.aft.org/sites/default/files/periodicals/Hamilton.pdf Relevant studies of the effects of varying types of incentive or the optimal structure of testing programs include those of Zeng (2001); Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
11 Laura S. Hamilton Brian M. Stecher, Kun Yuan “A few studies have attempted to examine how the creation and publication of standards, per se, have affected practices.” p. 3 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).  
12 Laura S. Hamilton Brian M. Stecher, Kun Yuan “The research evidence does not provide definitive answers to these questions.” p. 6 Denigrating Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).  
13 Laura S. Hamilton Brian M. Stecher, Kun Yuan “He [Porter, 1994] also noted that ‘virtually all of the arguments, both for and against standards, are based on beliefs and hypotheses rather than on direct empirical evidence’ (p. 427). Although a large and growing body of research has been conducted to examine the effects of SBR, the caution Porter expressed in 1994 about the lack of empirical evidence remains relevant today.” pp. 34-35 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
14 Laura S. Hamilton Brian M. Stecher, Kun Yuan “Arguably the most important test of quality is whether the standards promote high-quality instruction and improved student learning, but as we discuss later, there is very little research to address that question.” p. 37 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).  
15 Laura S. Hamilton Brian M. Stecher, Kun Yuan “[T]here have been a few studies of SBR as a comprehensive system. … [T]here is some research on how the adoption of standards, per se, or the alignment of standards with curriculum influences school practices or student outcomes.” p. 38 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Relevant studies of the effects of varying types of incentive or the optimal structure of testing programs include those of Zeng (2001); Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
16 Laura S. Hamilton Brian M. Stecher, Kun Yuan “The lack of evidence about the effects of SBR derives primarily from the fact that the vision has never been fully realized in practice.” p. 47 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Relevant studies of the effects of varying types of incentive or the optimal structure of testing programs include those of Zeng (2001); Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
17 Laura S. Hamilton Brian M. Stecher, Kun Yuan “[A]lthough many conceptions of SBR emphasize autonomy, we currently know relatively little about the effects of granting autonomy or what the right balance is between autonomy and prescriptiveness.” p. 55 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).  
18 Laura S. Hamilton Brian M. Stecher, Kun Yuan “One of the primary responsibilities of the federal government should be to ensure ongoing collection of evidence demonstrating the effects of the policies, which could be used to make decisions about whether to continue on the current course or whether small adjustments or a major overhaul are needed.” p. 55 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).  
19 Laura S. Hamilton Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney "However, the paths through which SBA [standards-based accountability] changes district, school, and classroom practices and how these changes in practice influence student outcomes are largely unexplored. There is strong evidence that SBA leads to changes in teachers’ instructional practices (Hamilton, 2004; Stecher, 2002)." p.5 Dismissive Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States Rand Corporation, 2007 https://www.rand.org/pubs/monographs/MG589.html Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930). "This research was sponsored by the National Science Foundation under grant number REC-0228295."
20 Laura S. Hamilton Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney "Much less is known about the impact of SBA at the district and school levels and the relationships among actions at the various levels and student outcomes. This study was designed to shed light on this complex set of relationships…" p.5 Dismissive Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States Rand Corporation, 2007 https://www.rand.org/pubs/monographs/MG589.html Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930). "This research was sponsored by the National Science Foundation under grant number REC-0228295."
21 Daniel M. Koretz & Laura S. Hamilton Robert L. Brennan, Ed. "Most of the studies of [testing's] effects on practice report average responses that mask some of these important variations and interactions." p.552 Denigrating Testing for Accountability in K-12 Chapter 15 in Educational Measurement, published by NCME and ACE, 2006   Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Zeng (2001); Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. "What about: Brooks-Cooper, C. (1993), Brown, S. M. & Walberg, H. J. (1993), Heneman, H. G., III. (1998), Hurlock, E. B. (1925), Jones, J. et al. (1996), Kazdin, A. & Bootzin, R. (1972), Kelley, C. (1999), Kirkpatrick, J. E. (1934), O’Leary, K. D. & Drabman, R. (1971), Palmer, J. S. (2002), Richards, C. E. & Shen, T. M. (1992), Rosswork, S. G. (1977), Staats, A. (1973), Tuckman, B. W. (1994), Tuckman, B. W. & Trimble, S. (1997), Webster, W. J., Mendro, R. L., Orsack, T., Weerasinghe, D. & Bembry, K. (1997, September). The Dallas Value-Added Accountability System (pp. 81–99) and Little practical difference and pie in the sky (pp. 120–131). In J. Millman (Ed.), Grading teachers, grading schools: Is student achievement a valid evaluation measure? Thousand Oaks, CA: Corwin Press."
22 Daniel M. Koretz & Laura S. Hamilton Robert L. Brennan, Ed. "There is no comprehensive source of information on how much time schools devote to coaching activities such as practicing on released test forms, but some studies suggest these activities are widespread." p.552 Dismissive Testing for Accountability in K-12 Chapter 15 in Educational Measurement, published by NCME and ACE, 2006   A comprehensive study that included a nationally representative population of all systemwide tests contains exactly that information. See Phelps, R. P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office. Also, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf
23 Daniel M. Koretz & Laura S. Hamilton Robert L. Brennan, Ed. "As with coaching, there are no comprehensive studies of the frequency of cheating across schools in the United States." p.553 Dismissive Testing for Accountability in K-12 Chapter 15 in Educational Measurement, published by NCME and ACE, 2006   Actually, there have been such studies: surveys in which respondents freely admit that they cheat, and describe how. Moreover, news reports of cheating, by students or educators, have been voluminous. See, for example, Caveon Test Security's "Cheating in the News" section on its web site.
24 Daniel M. Koretz & Laura S. Hamilton Robert L. Brennan, Ed. "However, in the absence of audit testing, this hypothesis [of score inflation] cannot be tested." p.553 Denigrating Testing for Accountability in K-12 Chapter 15 in Educational Measurement, published by NCME and ACE, 2006   Yes, it can be, and often has been, tested in experiments. Koretz's preferred method for "auditing" a high-stakes test is to compare its score trends to those of a parallel no-stakes test, which, presumably, will have totally reliable score trends. Yet, a cornucopia of experimental research has shown "no stakes" tests to be relatively unreliable, less reliable than high-stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015).
25 Laura S. Hamilton   "Despite their popularity, in most cases these [education] reforms are not guided by a careful investigation of the probable consequences of using tests as accountability tools." p.25 Denigrating Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 Hamilton's preferred method for "auditing" a high-stakes test is to compare its score trends to those of a parallel no-stakes test, which, presumably, will have totally reliable score trends. Yet, a cornucopia of experimental research has shown "no stakes" tests to be relatively unreliable, less reliable than high-stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015).
26 Laura S. Hamilton   "Although numerous studies have examined the effects of high-stakes testing, the majority of these investigations have failed to reach the standards of quality that would be required to make strong inferences based upon them." p.32 Denigrating Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 Nonsense: such research is straightforward and has been done for over a century. For example, 70%, or 357, of the effect sizes in a recent meta-regression derived from randomized experiments conducted over the past century. Another 12% derived from multiple regression studies. https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
27 Laura S. Hamilton   "In addition, it is nearly impossible for researchers to set up the kind of experimental design that is most appropriate for examining cause-and-effect relationships." p.32 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 Nonsense: such research is straightforward and has been done for over a century. For example, 70%, or 357, of the effect sizes in a recent meta-regression derived from randomized experiments conducted over the past century. Another 12% derived from multiple regression studies. https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
28 Laura S. Hamilton   "...even studies that are intended to be merely descriptive often suffer from poor measurement of the construct of interest, as well as biased samples that may result from nonrepresentative sampling or nonrandom refusal to participate in the research." p. 32 Denigrating Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 Nonsense: such research is straightforward and has been done for over a century. For example, 70%, or 357, of the effect sizes in a recent meta-regression derived from randomized experiments conducted over the past century. Another 12% derived from multiple regression studies. https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
29 Laura S. Hamilton   "...when test scores are associated with consequences that are important or meaningful to teachers, it is likely that instruction will be affected. The empirical evidence, though not extensive, supports this distinction." p.33 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies; study is a research review, research synthesis, or meta-analysis.
30 Laura S. Hamilton   "...there are no comprehensive studies of the frequency of cheating." p.35 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 Actually, there have been such studies: surveys in which respondents freely admit that they cheat, and describe how. Moreover, news reports of cheating, by students or educators, have been voluminous. See, for example, Caveon Test Security's "Cheating in the News" section on its web site.
31 Laura S. Hamilton   "Although these studies are suggestive, they rely on teacher perceptions, and there is little direct evidence of how testing actually affects student morale." p.39 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 At least twelve pre-2004 student surveys were included here: https://www.nonpartisaneducation.org/Review/Resources/SurveyList.htm. See also: https://richardphelps.net/DemandForStandardizedTesting.pdf  
32 Laura S. Hamilton   "The overall lack of evidence regarding student morale, stress, and motivation is due in part to the difficulty that researchers have in gaining access to students and measuring their levels of these constructs (Stecher, 2002)." p.39 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 At least twelve pre-2004 student surveys were included here: https://www.nonpartisaneducation.org/Review/Resources/SurveyList.htm. See also: https://richardphelps.net/DemandForStandardizedTesting.pdf
In 150 of 241 qualitative studies, the focus of the interviews, case studies, or observations was the effect of testing stakes on students. See: https://www.tandfonline.com/doi/abs/10.1080/15305058.2011.602920
33 Laura S. Hamilton   "...though one study that examined high school exit exams and that controlled for individual student characteristics (unlike most of the research on this topic) found no such relationship." p.40 Denigrating Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 The article to which she refers ignored most previous studies, mischaracterized the ones it acknowledged, and mis-classified the testing programs in several states.  
34 Laura S. Hamilton   "There is much we do not know about score inflation." p.46 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 In fact, we know quite a lot about the source of higher levels of score inflation: lax test security. The many experimental studies of test coaching are consistent: coaching has some modest effect, not the volatile or very large effects that Koretz claims.
35 Laura S. Hamilton   "there is simply too much that we currently do not know about how to design testing policies that promote desirable outcomes and prevent undesirable ones." p.57 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract  
36 Laura S. Hamilton   "...the search for answers to questions about how to minimize score inflation and promote effective instruction is likely to continue for many years." p.57 Dismissive Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 In fact, we know quite a lot about the source of higher levels of score inflation: lax test security. The many experimental studies of test coaching are consistent: coaching has some modest effect, not the volatile or very large effects that Koretz claims.
37 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “The shortcomings of the studies make it difficult to determine the size of teacher effects, but we suspect that the magnitude of some of the effects reported in this literature are overstated.” p. xiii Denigrating Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS (Tennessee Value-Added Assessment System) had been running for a decade when the authors wrote this, and it had already done much of what they claim had never been done.
38 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Using VAM to estimate individual teacher effects is a recent endeavor, and many of the possible sources of error have not been thoroughly evaluated in the literature.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS (Tennessee Value-Added Assessment System) had been running for a decade when the authors wrote this, and it had already done much of what they claim had never been done.
39 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS (Tennessee Value-Added Assessment System) had been running for a decade when the authors wrote this, and it had already done much of what they claim had never been done.
40 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “This lack of attention to teachers in policy discussions may be attributed in part to another body of literature that attempted to determine the effects of specific teacher background characteristics, including credentialing status (e.g., Miller, McKenna, and McKenna, 1998; Goldhaber and Brewer, 2000) and subject matter coursework (e.g., Monk, 1994).” p. 8 Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS (Tennessee Value-Added Assessment System) had been running for a decade when the authors wrote this, and it had already done much of what they claim had never been done.
41 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “To date, there has been little empirical exploration of the size of school effects and the sensitivity of teacher effects to modeling of school effects.” p. 78 Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS (Tennessee Value-Added Assessment System) had been running for a decade when the authors wrote this, and it had already done much of what they claim had never been done.
42 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “There are no empirical explorations of the robustness of estimates to assumptions about prior-year schooling effects.” p. 81 Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS (Tennessee Value-Added Assessment System) had been running for a decade when the authors wrote this, and it had already done much of what they claim had never been done.
43 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “There is currently no empirical evidence about the sensitivity of gain scores or teacher effects to such alternatives.” p. 89 Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS (Tennessee Value-Added Assessment System) had been running for a decade when the authors wrote this, and it had already done much of what they claim had never been done.
44 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. 116 Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS (Tennessee Value-Added Assessment System) had been running for a decade when the authors wrote this, and it had already done much of what they claim had never been done.
45 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Although we expect missing data are likely to be pervasive, there is little systematic discussion of the extent or nature of missing data in test score databases.” p. 117 Dismissive Evaluating Value-Added Models for Teacher Accountability Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Tennessee's TVAAS (Tennessee Value-Added Assessment System) had been running for a decade when the authors wrote this, and it had already done much of what they claim had never been done.
46 Brian M. Stecher Laura S. Hamilton "The business model of setting clear targets, attaching incentives to the attainment of those targets, and rewarding those responsible for reaching the targets has proven successful in a wide range of business enterprises. But there is no evidence that these accountability principles will work well in an educational context, and there are many reasons to doubt that the principles can be applied without significant adaptation." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.   
47 Brian M. Stecher Laura S. Hamilton "The lack of strong evidence regarding the design and effectiveness of accountability systems hampers policymaking at a critical juncture." Denigrating Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.
48 Brian M. Stecher Laura S. Hamilton "Nonetheless, the evidence has yet to justify the expectations. The initial evidence is, at best, mixed. On the plus side, students and teachers seem to respond to the incentives created by the accountability systems…" Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.
49 Brian M. Stecher Laura S. Hamilton "Proponents of accountability attribute the improved scores in these states to clearer expectations, greater motivation on the part of the students and teachers, a focused curriculum, and more-effective instruction. However, there is little or no research to substantiate these positive changes or their effects on scores." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract  
50 Brian M. Stecher Laura S. Hamilton "One of the earliest studies on the effects of testing (conducted in two Arizona schools in the late 1980s) showed that teachers reduced their emphasis on important, nontested material." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm  
51 Brian M. Stecher Laura S. Hamilton "Test-based accountability systems will work better if we acknowledge how little we know about them, if the federal government devotes appropriate resources to studying them, and if the states make ongoing efforts to improve them."  Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.   
52 Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein   "Although test-based accountability has shown some compelling results, the issues are complex, the research is new and incomplete, and many of the claims that have received the most attention have proved to be premature and superficial." Denigrating Summary, p.xiv Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm US National Science Foundation
53 Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein   "The research evidence does not provide definitive information about the actual costs of testing but the information that is available suggests that expenditures for testing have grown in recent years." Dismissive Introduction, p.9 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL. US National Science Foundation
54 Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein   "The General Accounting Office (1993) … estimate was $516 million … The estimate does not include time for more-extensive test preparation activities." p.9 Denigrating Introduction, p.9 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html As a matter of fact, the GAO report did include those costs -- all of them. The GAO surveys very explicitly instructed respondents to "include any and all costs related" to each test, including any and all test preparation time and expenses. US National Science Foundation
55 Laura S. Hamilton, Daniel M. Koretz Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "There is currently no substantial evidence on the effects of published report cards on parents’ decisionmaking or on the schools themselves." Dismissive Chapter 2: Tests and their use in test-based accountability systems, p.44 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html For decades, consulting services have existed that help parents new to a city select the right school or school district for them. US National Science Foundation
56 Vi-Nhuan Le, Stephen P. Klein Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Research on the inflation of gains remains too limited to indicate how prevalent the problem is." Dismissive Chapter 3: Technical criteria for evaluating tests, p. 68 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927); DeWeerdt (1927); French (1959); French & Dear (1959); Ortar (1960); Marron (1965); ETS (1965); Messick & Jungeblut (1981); Ellis, Konoske, Wulfeck, & Montague (1982); DerSimonian and Laird (1983); Kulik, Bangert-Drowns & Kulik (1984); Powers (1985); Jones (1986); Fraker (1986/1987); Halpin (1987); Whitla (1988); Snedecor (1989); Bond (1989); Baydar (1990); Becker (1990); Smyth (1990); Moore (1991); Alderson & Wall (1992); Powers (1993); Oren (1993); Powers & Rock (1994); Scholes, Lane (1997); Allalouf & Ben Shakhar (1998); Robb & Ercanbrack (1999); McClain (1999); Camara (1999, 2001, 2008); Stone & Lane (2000, 2003); Din & Soldan (2001); Briggs (2001); Palmer (2002); Briggs & Hansen (2004); Cankoy & Ali Tut (2005); Crocker (2005); Allensworth, Correa, & Ponisciak (2008); Domingue & Briggs (2009); Koljatic & Silva (2014); Early (2019); Herndon (2021) US National Science Foundation
57 Vi-Nhuan Le, Stephen P. Klein Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Relatively little is known about how testing accommodations affect score validity, and the few studies that have been conducted on the subject have had mixed results." Dismissive Chapter 3: Technical criteria for evaluating tests, p. 71 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html   US National Science Foundation
58 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "High-stakes testing may also affect parents (e.g., their attitudes toward education, their engagement with schools, and their direct participation in their child's learning) as well as policymakers (their beliefs about system performance, their judgements about program effectiveness, and their allocation of resources). However, these issues remain largely unexamined in the literature." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 79 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general. US National Science Foundation
59 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "As described in chapter 2, there was little concern about the effects of testing on teaching prior to the 1970s." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 81 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Rubbish. Entire books were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88. US National Science Foundation
60 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "In light of the changes that occurred in the uses of large-scale testing in the 1980s and 1990s, researchers began to investigate teachers' reactions to external assessment. The initial research on the impact of large-scale testing was conducted in the 1980s and the 1990s." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 83 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm US National Science Foundation
61 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "The bulk of the research on the effects of testing has been conducted using surveys and case studies." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 83 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf This is misleading. True, many of the hundreds of studies on the effects of testing have been surveys and case studies. But many, more by my count, have been randomized experiments. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm US National Science Foundation
62 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Data on the incidence of cheating [on educational tests] are scarce…" Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 96 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html Actually, there have been surveys in which respondents freely admit that they cheat, and how. Moreover, news reports of cheating, by students or educators, have been voluminous. See, for example, Caveon Test Security's "Cheating in the News" section on its web site. US National Science Foundation
63 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Less is known about changes in policies at the district and school levels in response to high-stakes testing, but mixed evidence of some impact has appeared." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 96 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html Relevant pre-2000 studies of the effects of testing on at-risk students, completion, dropping out, curricular offerings, attitudes, etc. include those of Schleisman (1999); the *Southern Regional Education Board (1998); Webster, Mendro, Orsak, Weerasinghe & Bembry (1997); Jones (1996); Boylan (1996); Jones, 1993; Jacobson (1992); Grisay (1991); Johnstone (1990); Task Force on Educational Assessment Programs [Florida] (1979); Wellisch, MacQueen, Carriere & Duck (1978); Enochs (1978); Pronaratna (1976); and McWilliams & Thomas (1976). US National Science Foundation
64 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Although numerous news articles have addressed the negative effects of high-stakes testing, systematic research on the subject is limited." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 98 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html Relevant pre-2000 studies of the effects of testing on at-risk students, completion, dropping out, curricular offerings, attitudes, etc. include those of Schleisman (1999); the *Southern Regional Education Board (1998); Webster, Mendro, Orsak, Weerasinghe & Bembry (1997); Jones (1996); Boylan (1996); Jones, 1993; Jacobson (1992); Grisay (1991); Johnstone (1990); Task Force on Educational Assessment Programs [Florida] (1979); Wellisch, MacQueen, Carriere & Duck (1978); Enochs (1978); Pronaratna (1976); and McWilliams & Thomas (1976). US National Science Foundation
65 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Research regarding the effects of test-based accountability on equity is very limited." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 99 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf   US National Science Foundation
66 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Researchers have not documented the desirable consequences of testing … as clearly as the undesirable ones." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 99 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm US National Science Foundation
67 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. " … researchers have not generally measured the extent or magnitude of the shifts in practice that they identified as a result of high-stakes testing." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, pp. 99–100 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf The 1993 GAO study did. See also:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm US National Science Foundation
68 Lorraine M. McDonnell Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "...this chapter can only describe the issues that are raised when one looks at testing from a political perspective. Because of the lack of systematic studies on the topic." Dismissive Chapter 5: Accountability as seen through a political lens, p.102 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general. US National Science Foundation
69 Lorraine M. McDonnell Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "...public opinion, as measured by surveys, does not always provide a clear and unambiguous measure of public sentiment." Denigrating Chapter 5: Accountability as seen through a political lens, p.108 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general. US National Science Foundation
70 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "So test-based accountability remains controversial because there is inadequate evidence to make clear judgments about its effectiveness in raising test scores and achieving its other goals." Dismissive Chapter 6: Improving test-based accountability, p.122 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract US National Science Foundation
71 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Unfortunately, the complexity of the issues and the ambiguity of the existing research do not allow our recommendations to take the form of a practical “how-to” guide for policymakers and practitioners." Denigrating Chapter 6: Improving test-based accountability, p.123 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract US National Science Foundation
72 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Additional research is needed to identify the elements of performance on tests and how these elements map onto other tests …." Denigrating Chapter 6: Improving test-based accountability, p.127 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html   US National Science Foundation
73 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Another part of the interpretive question is the need to gather information in other subject areas to portray a more complete picture of achievement. The scope of constructs that have been considered in research to date has been fairly narrow, focusing on the subjects that are part of the accountability systems that have been studied. Many legitimate instructional objectives have been ignored in the literature to date." Denigrating Chapter 6: Improving test-based accountability, p.127 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html Many studies of the effects of testing predate CRESST's in the 1980s and cover all subject fields, not just reading and math. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm US National Science Foundation
74 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "States should also conduct ongoing analyses of the performance of groups whose members may not be numerous enough to permit separate reporting. English-language learners and students with disabilities are increasingly being included in high-stakes testing systems, and, as discussed in Chapter Three, little is currently known about the validity of scores for these groups." Dismissive Chapter 6: Improving test-based accountability, p.131 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing.  US National Science Foundation
75 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "It would be especially helpful to know what changes in instruction are made in response to different kinds of information and incentives. In particular, we need to know how teachers interpret information from tests and how they use it to modify instruction." Dismissive Chapter 6: Improving test-based accountability, p.133 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. US National Science Foundation
76 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "It seems clear that aligning the components of the system and providing appropriate professional development should, at a minimum, increase teachers’ political support for test-based accountability policies .... Although there is no empirical evidence to suggest that this strategy will reduce inappropriate responses to high-stakes testing,... Additional research needs to be done to determine the importance of alignment for promoting positive effects of test-based accountability." Dismissive Chapter 6: Improving test-based accountability, p.135 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. US National Science Foundation
77 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "… we currently do not know enough about test-based accountability to design a system that is immune from the problems we have discussed…" Dismissive Chapter 6: Improving test-based accountability, p.136 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract US National Science Foundation
78 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "There is some limited evidence that educators’ responses to test based accountability vary according to the characteristics of their student populations,…" Denigrating Chapter 6: Improving test-based accountability, p.138 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html There was and is far more than "limited" evidence. Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm US National Science Foundation
79 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "... there is very limited evidence to guide thinking about political issues." Dismissive Chapter 6: Improving test-based accountability, p.139 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general. US National Science Foundation
80 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "First, we do not have an accurate assessment of the additional costs." Denigrating Chapter 6: Improving test-based accountability, p.141 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html Yes, we did and we do. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL. US National Science Foundation
81 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "However, many of these recommended reforms are relatively inexpensive in comparison with the total cost of education. This equation is seldom examined."  Denigrating Chapter 6: Improving test-based accountability, p.141 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html Wrong. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380;  Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL. US National Science Foundation
82 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Part of the reason these issues are rarely considered may be that no one has produced a good estimate of the cost of an improved accountability system in comparison with its benefits." Denigrating Chapter 6: Improving test-based accountability, p.141 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL. US National Science Foundation
83 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Nevertheless, our knowledge of the costs of alternative accountability systems is still somewhat limited. Policymakers need to know how much it would cost to change their current systems to be responsive to criticisms such as those described in this book. These estimates need to consider all of the associated costs, including possible opportunity costs associated with increased testing time and increased test preparation time." Dismissive Chapter 6: Improving test-based accountability, p.142 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL. US National Science Foundation
84 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "However, there is still much about these systems that is not well understood. Lack of research-based knowledge about the quality of scores and the mechanisms through which high-stakes testing programs operate limits our ability to improve these systems. As a result, our discussions also identified unanswered questions..." Dismissive Chapter 6: Improving test-based accountability, p.143 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract US National Science Foundation
85 Laura S. Hamilton Brian M. Stecher, Stephen P. Klein "Although test-based accountability has shown some compelling results, the issues are complex, the research is new and incomplete, and many of the claims that have received the most attention have proved to be premature and superficial." Denigrating Summary, p.xiv Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm  
86 Laura S. Hamilton Brian M. Stecher, Stephen P. Klein "The research evidence does not provide definitive information about the actual costs of testing but the information that is available suggests that expenditures for testing have grown in recent years." Dismissive Introduction, p.9 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.  
87 Laura S. Hamilton Brian M. Stecher, Stephen P. Klein "The General Accounting Office (1993) … estimate was $516 million … The estimate does not include time for more-extensive test preparation activities." p.9 Denigrating Introduction, p.9 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html As a matter of fact, the GAO report did include those costs -- all of them. The GAO surveys very explicitly instructed respondents to "include any and all costs related" to each test, including any and all test preparation time and expenses.
88 Laura S. Hamilton Daniel M. Koretz "There is currently no substantial evidence on the effects of published report cards on parents’ decisionmaking or on the schools themselves." Dismissive Chapter 2: Tests and their use in test-based accountability systems, p.44 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf For decades, consulting services have existed that help parents new to a city select the right school or school district for them.
89 Brian M. Stecher Laura S. Hamilton, Stephen P. Klein, Eds. "High-stakes testing may also affect parents (e.g., their attitudes toward education, their engagement with schools, and their direct participation in their child's learning) as well as policymakers (their beliefs about system performance, their judgements about program effectiveness, and their allocation of resources). However, these issues remain largely unexamined in the literature." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p.79 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
90 Brian M. Stecher Laura S. Hamilton, Stephen P. Klein, Eds. "As described in chapter 2, there was little concern about the effects of testing on teaching prior to the 1970s." Dismissive Chapter 4 Consequences of large-scale, high-stakes testing on school and classroom practice, p.81 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm  
91 Brian M. Stecher Laura S. Hamilton, Stephen P. Klein, Eds. "In light of the changes that occurred in the uses of large-scale testing in the 1980s and 1990s, researchers began to investigate teachers' reactions to external assessment. The initial research on the impact of large-scale testing was conducted in the 1980s and the 1990s." Dismissive Chapter 4 Consequences of large-scale, high-stakes testing on school and classroom practice, p.83 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm  
92 Brian M. Stecher Laura S. Hamilton, Stephen P. Klein, Eds. "The bulk of the research on the effects of testing has been conducted using surveys and case studies." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 83 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf This is misleading. True, many of the hundreds of studies on the effects of testing have been surveys and case studies. But many, more by my count, have been randomized experiments. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm
93 Brian M. Stecher Laura S. Hamilton, Stephen P. Klein, Eds. "Researchers have not documented the desirable consequences of testing … as clearly as the undesirable ones. More importantly, researchers have not generally measured the extent or magnitude of the shifts in practice that they identified as a result of high-stakes testing." Dismissive Chapter 4 Consequences of large-scale, high-stakes testing on school and classroom practice, pp.99–100 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm  
94 Laura S. Hamilton Brian M. Stecher "So test-based accountability remains controversial because there is inadequate evidence to make clear judgments about its effectiveness in raising test scores and achieving its other goals." Dismissive Chapter 6: Improving test-based accountability, p.122 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract  
95 Laura S. Hamilton Brian M. Stecher "Unfortunately, the complexity of the issues and the ambiguity of the existing research do not allow our recommendations to take the form of a practical “how-to” guide for policymakers and practitioners." Denigrating Chapter 6: Improving test-based accountability, p.123 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract  
96 Laura S. Hamilton Brian M. Stecher "Additional research is needed to identify the elements of performance on tests and how these elements map onto other tests …." Denigrating Chapter 6: Improving test-based accountability, p.127 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html    
97 Laura S. Hamilton Brian M. Stecher "Another part of the interpretive question is the need to gather information in other subject areas to portray a more complete picture of achievement. The scope of constructs that have been considered in research to date has been fairly narrow, focusing on the subjects that are part of the accountability systems that have been studied. Many legitimate instructional objectives have been ignored in the literature to date." Dismissive Chapter 6: Improving test-based accountability, p.127 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Many studies of the effects of testing predate CRESST's in the 1980s and cover all subject fields, not just reading and math. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
98 Laura S. Hamilton Brian M. Stecher "States should also conduct ongoing analyses of the performance of groups whose members may not be numerous enough to permit separate reporting. English-language learners and students with disabilities are increasingly being included in high-stakes testing systems, and, as discussed in Chapter Three, little is currently known about the validity of scores for these groups." Dismissive Chapter 6: Improving test-based accountability, p.131 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing.   
99 Laura S. Hamilton Brian M. Stecher "It would be especially helpful to know what changes in instruction are made in response to different kinds of information and incentives. In particular, we need to know how teachers interpret information from tests and how they use it to modify instruction." Dismissive Chapter 6: Improving test-based accountability, p.133 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
100 Laura S. Hamilton Brian M. Stecher "It seems clear that aligning the components of the system and providing appropriate professional development should, at a minimum, increase teachers’ political support for test-based accountability policies .... Although there is no empirical evidence to suggest that this strategy will reduce inappropriate responses to high-stakes testing,... Additional research needs to be done to determine the importance of alignment for promoting positive effects of test-based accountability." Dismissive Chapter 6: Improving test-based accountability, p.135 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
101 Laura S. Hamilton Brian M. Stecher "… we currently do not know enough about test-based accountability to design a system that is immune from the problems we have discussed…" Dismissive Chapter 6: Improving test-based accountability, p.136 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
102 Laura S. Hamilton Brian M. Stecher "There is some limited evidence that educators’ responses to test based accountability vary according to the characteristics of their student populations,…" Denigrating Chapter 6: Improving test-based accountability, p.138 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf There was and is far more than "limited" evidence. Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
103 Laura S. Hamilton Brian M. Stecher "... there is very limited evidence to guide thinking about political issues." Dismissive Chapter 6: Improving test-based accountability, p.139 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf    
104 Laura S. Hamilton Brian M. Stecher "First, we do not have an accurate assessment of the additional costs." Denigrating Chapter 6: Improving test-based accountability, p.141 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf Yes, we did and we do. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.  
105 Laura S. Hamilton Brian M. Stecher "However, many of these recommended reforms are relatively inexpensive in comparison with the total cost of education. This equation is seldom examined."  Denigrating Chapter 6: Improving test-based accountability, p.141 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html Wrong. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380;  Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.  
106 Laura S. Hamilton Brian M. Stecher "Part of the reason these issues are rarely considered may be that no one has produced a good estimate of the cost of an improved accountability system in comparison with its benefits." Denigrating Chapter 6: Improving test-based accountability, p.141 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.  
107 Laura S. Hamilton Brian M. Stecher "Nevertheless, our knowledge of the costs of alternative accountability systems is still somewhat limited. Policymakers need to know how much it would cost to change their current systems to be responsive to criticisms such as those described in this book. These estimates need to consider all of the associated costs, including possible opportunity costs associated with increased testing time and increased test preparation time." Dismissive Chapter 6: Improving test-based accountability, p.142 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf No. See, for example: Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org, Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability. In W.M. Evers & H.J. Walberg (Eds.), School Accountability. Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: U.S. General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441. Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing. Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
108 Laura S. Hamilton Brian M. Stecher "However, there is still much about these systems that is not well understood. Lack of research-based knowledge about the quality of scores and the mechanisms through which high-stakes testing programs operate limits our ability to improve these systems. As a result, our discussions also identified unanswered questions..." Dismissive Chapter 6: Improving test-based accountability, p.143 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract  
109 Daniel M. Koretz Daniel F. McCaffrey, Laura S. Hamilton "Although high-stakes testing is now widespread, methods for evaluating the validity of gains obtained under high-stakes conditions are poorly developed. This report presents an approach for evaluating the validity of inferences based on score gains on high-stakes tests. It describes the inadequacy of traditional validation approaches for validating gains under high-stakes conditions and outlines an alternative validation framework for conceptualizing meaningful and inflated score gains.", p.1 Denigrating Toward a framework for validating gains under high-stakes conditions CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 https://files.eric.ed.gov/fulltext/ED462410.pdf In fact, the test prep (or test coaching) literature is vast and dates back decades, with meta-analyses dating back at least to the 1970s. There is even a What Works Clearinghouse summary of the (post-World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf. See also: Gilmore (1927); DeWeerdt (1927); French (1959); French & Dear (1959); Ortar (1960); Marron (1965); ETS (1965); Messick & Jungeblut (1981); Ellis, Konoske, Wulfeck, & Montague (1982); DerSimonian & Laird (1983); Kulik, Bangert-Drowns, & Kulik (1984); Powers (1985); Jones (1986); Fraker (1986/1987); Halpin (1987); Whitla (1988); Snedecor (1989); Bond (1989); Baydar (1990); Becker (1990); Smyth (1990); Moore (1991); Alderson & Wall (1992); Powers (1993); Oren (1993); Powers & Rock (1994); Scholes & Lane (1997); Allalouf & Ben-Shakhar (1998); Robb & Ercanbrack (1999); McClain (1999); Camara (1999, 2001, 2008); Stone & Lane (2000, 2003); Din & Soldan (2001); Briggs (2001); Palmer (2002); Briggs & Hansen (2004); Cankoy & Ali Tut (2005); Crocker (2005); Allensworth, Correa, & Ponisciak (2008); Domingue & Briggs (2009); Koljatic & Silva (2014); Early (2019); Herndon (2021)
110 Daniel M. Koretz Daniel F. McCaffrey, Laura S. Hamilton "Few efforts are made to evaluate directly score gains obtained under high-stakes conditions, and conventional validation tools are not fully adequate for the task.", p. 1 Dismissive Toward a framework for validating gains under high-stakes conditions CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 https://files.eric.ed.gov/fulltext/ED462410.pdf In fact, the test prep (or test coaching) literature is vast and dates back decades, with meta-analyses dating back at least to the 1970s. There is even a What Works Clearinghouse summary of the (post-World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf. See also: Gilmore (1927); DeWeerdt (1927); French (1959); French & Dear (1959); Ortar (1960); Marron (1965); ETS (1965); Messick & Jungeblut (1981); Ellis, Konoske, Wulfeck, & Montague (1982); DerSimonian & Laird (1983); Kulik, Bangert-Drowns, & Kulik (1984); Powers (1985); Jones (1986); Fraker (1986/1987); Halpin (1987); Whitla (1988); Snedecor (1989); Bond (1989); Baydar (1990); Becker (1990); Smyth (1990); Moore (1991); Alderson & Wall (1992); Powers (1993); Oren (1993); Powers & Rock (1994); Scholes & Lane (1997); Allalouf & Ben-Shakhar (1998); Robb & Ercanbrack (1999); McClain (1999); Camara (1999, 2001, 2008); Stone & Lane (2000, 2003); Din & Soldan (2001); Briggs (2001); Palmer (2002); Briggs & Hansen (2004); Cankoy & Ali Tut (2005); Crocker (2005); Allensworth, Correa, & Ponisciak (2008); Domingue & Briggs (2009); Koljatic & Silva (2014); Early (2019); Herndon (2021)
111 Laura S. Hamilton   “Despite the number of studies investigating affective aspects of test taking, little is known about how students perceive the kinds of extended performance assessments currently being developed for state and local testing programs.” (Abstract) Denigrating An Investigation of Students' Affective Responses to Alternative Assessment Formats Paper presented at the Annual Meeting of the National Council on Measurement in Education (New Orleans, LA, April 5-7, 1994) http://files.eric.ed.gov/fulltext/ED376203.pdf At least twelve pre-2004 student surveys were included here: https://www.nonpartisaneducation.org/Review/Resources/SurveyList.htm. See also: https://richardphelps.net/DemandForStandardizedTesting.pdf. In 150 of 241 qualitative studies, the focus of the interviews, case studies, or observations was the effect of testing stakes on students. See: https://www.tandfonline.com/doi/abs/10.1080/15305058.2011.602920
112 Laura S. Hamilton   “As stated earlier, this study was not intended to produce results that could be generalized to other tasks or to other samples of students, but to identify questions that might be addressed by future studies and to suggest possible hypotheses.” p. 23 Dismissive An Investigation of Students' Affective Responses to Alternative Assessment Formats Paper presented at the Annual Meeting of the National Council on Measurement in Education (New Orleans, LA, April 5-7, 1994) http://files.eric.ed.gov/fulltext/ED376203.pdf At least twelve pre-2004 student surveys were included here: https://www.nonpartisaneducation.org/Review/Resources/SurveyList.htm. See also: https://richardphelps.net/DemandForStandardizedTesting.pdf. In 150 of 241 qualitative studies, the focus of the interviews, case studies, or observations was the effect of testing stakes on students. See: https://www.tandfonline.com/doi/abs/10.1080/15305058.2011.602920
  IRONIES:                
  Laura S. Hamilton   "In the meantime, it will be important to continue to gather evidence, both from large-scale studies and from the individual experiences of teachers, administrators, and others affected by test-based accountability, and to make that evidence available so that it can inform efforts to improve accountability systems." p.57   Assessment as a Policy Tool Chapter 2, in Review of Research in Education (27), 2003 https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025    
  Laura S. Hamilton Brian M. Stecher, Stephen P. Klein "Greater knowledge about testing and accountability can lead to better system design and more-effective system management." p.xiv   Summary, p.xiv Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf
  Laura S. Hamilton Brian M. Stecher "Incremental improvements to existing systems, based on current research on testing and accountability, should be combined with long-term research and development efforts that may ultimately lead to a major redesign of these systems. Success in this endeavor will require the thoughtful engagement of educators, policymakers, and researchers in discussions and debates about tests and testing policies."   Chapter 6: Improving test-based accountability, pp.143-144 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf
  Christian Bourge, journalist Laura S. Hamilton, interviewee "The fact that there is a high correlation (between the high-stakes and low-stake tests) doesn't necessarily mean the tests are telling you the same thing."   Experts differ about high-stakes testing UPI, Feb. 13, 2003 https://www.upi.com/Top_News/2003/02/13/Experts-differ-about-high-stakes-testing/60271045180206/
  Christian Bourge, journalist Laura S. Hamilton, interviewee "She said that without examining the similarities in the design of each test, you do not know how comparable they are from district to district."   Experts differ about high-stakes testing UPI, Feb. 13, 2003  https://www.upi.com/Top_News/2003/02/13/Experts-differ-about-high-stakes-testing/60271045180206/    
      Cite themselves or colleagues in the group, but dismiss or denigrate all other work.
      Falsely claim that research has only recently been done on the topic.
      Author cites (and accepts as fact, without checking) someone else's dismissive review.