Dismissive Reviews in Education Policy Research
No. | Author | Co-author(s) | Dismissive quote | Type | Title | Source | Link1 | Link2 | Notes | Notes2
1 Daniel M. Koretz   "However, our experience is still limited, and there is a serious dearth of research investigating the characteristics and effects of testing in the postsecondary sector." Dismissive Measuring Postsecondary Achievement: Lessons from Large-Scale Assessments in the K-12 Sector Higher Education Policy, April 24, 2019, Abstract https://link.springer.com/article/10.1057/s41307-019-00142-4   In fact, the research literature on testing in higher education is long and deep. Consider, for example, the work of Trudy Banta, Patricia Cross, and Thomas Angelo. See also the large number of higher education studies in this meta-analysis and in these research compilations:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
2 Matt Barnum Daniel Koretz [interviewee] Journalist: I take it it’s very hard to quantify this test prep phenomenon, though? Koretz: It is extremely hard, and there’s a big hole in the research in this area. Dismissive Why one Harvard professor calls American schools’ focus on testing a ‘charade’ Chalkbeat, January 19, 2018 https://www.chalkbeat.org/posts/us/2018/01/19/why-one-harvard-professor-calls-american-schools-focus-on-testing-a-charade/   In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
3 Matt Barnum Daniel Koretz [interviewee] "There aren’t that many studies, but they’re very consistent. The inflation that does show up is sometimes absolutely massive. Worse, there is growing evidence that that problem is more severe for disadvantaged kids, creating the illusion of improved equity." Dismissive Why one Harvard professor calls American schools’ focus on testing a ‘charade’ Chalkbeat, January 19, 2018 https://www.chalkbeat.org/posts/us/2018/01/19/why-one-harvard-professor-calls-american-schools-focus-on-testing-a-charade/   In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
4 Daniel M. Koretz   "However, this reasoning isn't just simple, it's simplistic--and the evidence is overwhelming that this approach [that testing can improve education] has failed. … these improvements are few and small. Hard evidence is limited, a consequence of our failure as a nation to evaluate these programs appropriately before imposing them on all children." Dismissive The Testing Charade: Pretending to Make Schools Better [Kindle location 142] University of Chicago Press, 2017     In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press.
5 Daniel M. Koretz   "The bottom line: the information yielded by tests, while very useful, is never by itself adequate for evaluating programs, schools, or educators. Self-evident as this should be, it has been widely ignored in recent years. Indeed, ignoring this obvious warning has been the bedrock of test-based education reform." Denigrating The Testing Charade: Pretending to Make Schools Better [Kindle location 142] University of Chicago Press, 2017     I know of no testing professional who claims that testing by itself is adequate for evaluating programs, schools, or educators. But, by the same notion, neither are other measures used alone, such as inspections or graduation rates.
6 Daniel M. Koretz   "…as of the late 1980s there was not a single study evaluating whether inflation occurred or how severe it was. With three colleagues, I set out to conduct one." 1stness The Testing Charade: Pretending to Make Schools Better [Kindle location 142] University of Chicago Press, 2017     * The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- preceded Koretz's by several years. See:  http://nonpartisaneducation.org/Review/Books/CannellBook1.htm  http://nonpartisaneducation.org/Review/Books/Cannell2.pdf
7 Daniel M. Koretz   "However, value-added estimates are rarely calculated with lower-stakes tests that are less likely to be inflated." Dismissive The Testing Charade: Pretending to Make Schools Better [Kindle location 142] University of Chicago Press, 2017     Almost all value-added measurements (VAM) are calculated on scores from tests with no stakes for the students. The state of Tennessee, which pioneered VAM and has continued to use it for two decades, uses nationally normed reference tests that have no stakes for anyone, including teachers. Moreover, research shows that low-stakes tests are more prone to score inflation than high-stakes tests.
8 Daniel M. Koretz   "One reason we know less than we should … is that most of the abundant test score data available to us are too vulnerable to score inflation to be trusted. There is a second reason for the dearth of information, the blame for which lies squarely on the shoulders of many of the reformers." Dismissive The Testing Charade: Pretending to Make Schools Better [Kindle location 142] University of Chicago Press, 2017    
9 Daniel M. Koretz   "High-quality evaluations of the test-based reforms aren't common, …" Denigrating The Testing Charade: Pretending to Make Schools Better [Kindle location 142] University of Chicago Press, 2017     Actually, high-quality evaluations of testing interventions have been numerous and common over the past century. Most of them do not produce the results that Koretz prefers, however, so he declares them nonexistent. See https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
10 Daniel M. Koretz   "The first solid study documenting score inflation was presented twenty-five years before I started writing this book." 1stness The Testing Charade: Pretending to Make Schools Better [Kindle location 142] University of Chicago Press, 2017     * The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- preceded Koretz's by several years. See:  http://nonpartisaneducation.org/Review/Books/CannellBook1.htm  http://nonpartisaneducation.org/Review/Books/Cannell2.pdf
11 Daniel M. Koretz   "The first study showing illusory improvement in achievement gaps--the largely bogus "Texas miracle"--was published only ten years after that." 1stness The Testing Charade: Pretending to Make Schools Better [Kindle location 142] University of Chicago Press, 2017     * The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- preceded Koretz's by several years. See:  http://nonpartisaneducation.org/Review/Books/CannellBook1.htm  http://nonpartisaneducation.org/Review/Books/Cannell2.pdf
12 Daniel M. Koretz Holcombe, Jennings “To date, few studies have attempted to understand the sources of variation in score inflation across testing programs.” p. 3 Dismissive The roots of score inflation, an examination of opportunities in two states’ tests  Prepublication draft “to appear in Sunderman (Ed.), Charting reform: achieving equity in a diverse nation” http://dash.harvard.edu/bitstream/handle/1/10880587/roots%20of%20score%20inflation.pdf?sequence=1   In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
13 Daniel M. Koretz Waldman, Yu, Langli, Orzech “Few studies have applied a multi-level framework to the evaluation of inflation,” p. 1 Denigrating Using the introduction of a new test to investigate the distribution of score inflation  Working paper of Education Accountability Project at the Harvard Graduate School of Education, Nov. 2014 http://projects.iq.harvard.edu/files/eap/files/ky_cot_3_2_15_working_paper.pdf   In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
14 Daniel M. Koretz   "What we don’t know: What is the net effect on student achievement?
-Weak research designs, weaker data
-Some evidence of inconsistent, modest effects in elementary math, none in reading
-Effects are likely to vary across contexts...
Reason: grossly inadequate research and evaluation"
Denigrating Using tests for monitoring and accountability Presentation at:  Agencia de Calidad de la Educación Santiago, Chile, November 3, 2014     See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
15 Daniel M. Koretz Jennifer L. Jennings “We find that research on the use of test score data is limited, and research investigating the understanding of tests and score data is meager.” p. 1 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities/ http://www.spencer.org/resources/content/3/3/8/documents/Koretz--Jennings-paper.pdf Relevant studies include: Forte Fast, E., & the Accountability Systems and Reporting State Collaborative on Assessment and Student Standards. (2002). A guide to effective accountability reporting. Washington, DC: Council of Chief State School Officers. * Goodman, D., & Hambleton, R.K. (2005). Some misconceptions about large-scale educational assessments, Chapter 4 in Richard P Phelps (Ed.) Defending Standardized Testing, Psychology Press. * Goodman, D. P., & Hambleton (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education. * Hambleton, R. K. (2002). How can we make NAEP and state test score reporting scales and reports more understandable? In R. W. Lissitz & W. D. Schafer (Eds.), Assessment in educational reform (pp. 192-205). Boston: Allyn & Bacon. * Impara, J. C., Divine, K. P., Bruce, F. A., Liverman, M. R., & Gay, A. (1991). Does interpretive test score information help teachers? Educational Measurement: Issues and Practice, 10(4), 16-18. * Wainer, H., Hambleton, R. K., & Meara, K. (1999). Alternative displays for communicating NAEP results: A redesign and validity study. Journal of Educational Measurement, 36(4), 301-335.
16 Daniel M. Koretz Jennifer L. Jennings “Because of the sparse research literature, we rely on experience and anecdote in parts of this paper, with the premise that these conclusions should be supplanted over time by findings from systematic research.” p. 1 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/sites/default/files/pdfs/Koretz-Jennings-paper.pdf Relevant studies include: Forte Fast, E., & the Accountability Systems and Reporting State Collaborative on Assessment and Student Standards. (2002). A guide to effective accountability reporting. Washington, DC: Council of Chief State School Officers. * Goodman, D., & Hambleton, R.K. (2005). Some misconceptions about large-scale educational assessments, Chapter 4 in Richard P Phelps (Ed.) Defending Standardized Testing, Psychology Press. * Goodman, D. P., & Hambleton (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education. * Hambleton, R. K. (2002). How can we make NAEP and state test score reporting scales and reports more understandable? In R. W. Lissitz & W. D. Schafer (Eds.), Assessment in educational reform (pp. 192-205). Boston: Allyn & Bacon. * Impara, J. C., Divine, K. P., Bruce, F. A., Liverman, M. R., & Gay, A. (1991). Does interpretive test score information help teachers? Educational Measurement: Issues and Practice, 10(4), 16-18. * Wainer, H., Hambleton, R. K., & Meara, K. (1999). Alternative displays for communicating NAEP results: A redesign and validity study. Journal of Educational Measurement, 36(4), 301-335.
17 Daniel M. Koretz Jennifer L. Jennings "...the relative performance of schools is difficult to interpret in the presence of score inflation. At this point, we know very little about the factors that may predict higher levels of inflation —for example, characteristics of tests, accountability systems, students, or schools." p.4 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/sites/default/files/pdfs/Koretz-Jennings-paper.pdf In fact, we know quite a lot about the source of higher levels of score inflation -- it is lax test security.
18 Daniel M. Koretz Jennifer L. Jennings "Unfortunately, it is often exceedingly difficult to obtain the permission and access needed to carry out testing-related research in the public education sector. This is particularly so if the research holds out the possibility of politically inconvenient findings, which virtually all evaluations in this area do. In our experience, very few state or district superintendents or commissioners consider it an obligation to provide the public or the field with open and impartial research."  Dismissive The Misunderstanding and Use of Data from Educational Tests, pp.4-5 Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities/ http://www.spencer.org/resources/content/3/3/8/documents/Koretz--Jennings-paper.pdf Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that Koretz cannot find one district out of the many thousands to cooperate with him to discredit testing.
19 Daniel M. Koretz Jennifer L. Jennings “We focus on three issues that are especially relevant to test-based data and about which research is currently sparse:
  How do the types of data made available for use affect policymakers’ and educators’ understanding of data?
  What are the common errors made by policymakers and educators in interpreting test score data?
  How do high-stakes testing and the availability of test-based data affect administrator and teacher practice?” (p. 5)
Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/resources/content/3/3/8/documents/Koretz-Jennings-paper.pdf Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
20 Daniel M. Koretz Jennifer L. Jennings “Systematic research exploring educators’ understanding of both the principles of testing and appropriate interpretation of test-based data is meager.”, p.5 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/resources/content/3/3/8/documents/Koretz-Jennings-paper.pdf
21 Daniel M. Koretz Jennifer L. Jennings "Although current, systematic information is lacking, our experience is that that the level of understanding of test data among both educators and education policymakers is in many cases abysmally low.", p.6 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/resources/content/3/3/8/documents/Koretz-Jennings-paper.pdf
22 Daniel M. Koretz Jennifer L. Jennings "There has been a considerably (sic) amount of research exploring problems with standards-based reporting, but less on the use and interpretation of standards-based data by important stakeholders." p.12 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/resources/content/3/3/8/documents/Koretz-Jennings-paper.pdf Relevant studies include: Forte Fast, E., & the Accountability Systems and Reporting State Collaborative on Assessment and Student Standards. (2002). A guide to effective accountability reporting. Washington, DC: Council of Chief State School Officers. * Goodman, D., & Hambleton, R.K. (2005). Some misconceptions about large-scale educational assessments, Chapter 4 in Richard P Phelps (Ed.) Defending Standardized Testing, Psychology Press. * Goodman, D. P., & Hambleton (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education. * Hambleton, R. K. (2002). How can we make NAEP and state test score reporting scales and reports more understandable? In R. W. Lissitz & W. D. Schafer (Eds.), Assessment in educational reform (pp. 192-205). Boston: Allyn & Bacon. * Impara, J. C., Divine, K. P., Bruce, F. A., Liverman, M. R., & Gay, A. (1991). Does interpretive test score information help teachers? Educational Measurement: Issues and Practice, 10(4), 16-18. * Wainer, H., Hambleton, R. K., & Meara, K. (1999). Alternative displays for communicating NAEP results: A redesign and validity study. Journal of Educational Measurement, 36(4), 301-335.
23 Daniel M. Koretz Jennifer L. Jennings "We have heard former teachers discuss this frequently, saying that new teachers in many schools are inculcated with the notion that raising scores in tested subjects is in itself the appropriate goal of instruction. However, we lack systematic data about this..." p.14 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/resources/content/3/3/8/documents/Koretz-Jennings-paper.pdf
24 Daniel M. Koretz Jennifer L. Jennings "Research on score inflation is not abundant, largely for the reason discussed above: policymakers for the most part feel no obligation to allow the relevant research, which is not in their self-interest even when it is in the interests of students in schools. However, at this time, the evidence is both abundant enough and sufficiently often discussed that that the existence of the general issue of score inflation appears to be increasingly widely recognized by the media, policymakers, and educators." p.17 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/sites/default/files/pdfs/Koretz-Jennings-paper.pdf Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that Koretz cannot find one district out of the many thousands to cooperate with him to discredit testing.
25 Daniel M. Koretz Jennifer L. Jennings "The issue of score inflation is both poorly understood and widely ignored in the research community as well." p.18 Denigrating The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/resources/content/3/3/8/documents/Koretz-Jennings-paper.pdf In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
26 Daniel M. Koretz Jennifer L. Jennings "Research on coaching is very limited." p.21 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/resources/content/3/3/8/documents/Koretz-Jennings-paper.pdf In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
27 Daniel M. Koretz Jennifer L. Jennings "How is test-based information used by educators? … The types of research done to date on this topic, while useful, are insufficient." p.26 Denigrating The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/resources/content/3/3/8/documents/Koretz-Jennings-paper.pdf
28 Daniel M. Koretz Jennifer L. Jennings "… We need to design ways of measuring coaching, which has been almost entirely unstudied." p.26 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/resources/content/3/3/8/documents/Koretz-Jennings-paper.pdf In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
29 Daniel M. Koretz Jennifer L. Jennings “We have few systematic studies of variations in educators’ responses. …” p. 26 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/resources/content/3/3/8/documents/Koretz-Jennings-paper.pdf Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
30 Daniel M. Koretz Jennifer L. Jennings "Ultimately, our concern is the impact of educators’ understanding and use of test data on student learning. However, at this point, we have very little comparative information about the validity of gains, ....  The comparative information that is beginning to emerge suggests..." p.26 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/resources/content/3/3/8/documents/Koretz-Jennings-paper.pdf
31 Daniel M. Koretz   “The field of measurement has not kept pace with this transformation of testing.” p. 3 Denigrating Implications of current policy for educational measurement  paper presented at the Exploratory Seminar: Measurement Challenges Within the Race to the Top Agenda, December 2009 http://www.k12center.org/rsc/pdf/KoretzPresenterSession3.pdf  
32 Daniel M. Koretz   “For the most part, notwithstanding Lindquist’s warning, the field of measurement has largely ignored the top levels of sampling.” p. 6 Dismissive Implications of current policy for educational measurement  paper presented at the Exploratory Seminar: Measurement Challenges Within the Race to the Top Agenda, December 2009 http://www.k12center.org/rsc/pdf/KoretzPresenterSession3.pdf   Many psychometricians work in the field of gifted testing. Indeed, some specialize in it, and have created a large, robust research literature. One can find much of it at web sites such as "Hoagie's Gifted" and those for the gifted education research centers such as: Belin-Blank (in Iowa); Josephson (in Nevada); Johns Hopkins Center for Talented Youth (in Maryland); and Duke University's Talent Identification Program.
33 Daniel M. Koretz   “Currently, research on accountability‐related topics, such as score inflation and effects on educational practice, is slowly growing but remains largely divorced from the core activities of the measurement field.” p. 15 Dismissive Implications of current policy for educational measurement  paper presented at the Exploratory Seminar: Measurement Challenges Within the Race to the Top Agenda, December 2009 http://www.k12center.org/rsc/pdf/KoretzPresenterSession3.pdf  
34 Daniel M. Koretz   “The data, however, are more limited and more complex than is often realized, and the story they properly tell is not quite so straightforward. . . . Data about student performance at the end of high school are scarce and especially hard to collect and interpret.” p. 38 Dismissive How do American students measure up? Making Sense of International Comparisons The Future of Children 19:1 Spring 2009 http://www.princeton.edu/futureofchildren/publications/docs/19_01_FullJournal.pdf   Relevant studies of the effects of testing on at-risk students, completion, dropping out, curricular offerings, attitudes, etc. include those of Schleisman (1999); the *Southern Regional Education Board (1998); Webster, Mendro, Orsak, Weerasinghe & Bembry (1997); Jones (1996); Boylan (1996); Jones (1993); Jacobson (1992); Grisay (1991); Johnstone (1990); Task Force on Educational Assessment Programs [Florida] (1979); Wellisch, MacQueen, Carriere & Duck (1978); Enochs (1978); Pronaratna (1976); and McWilliams & Thomas (1976).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
35 Daniel M. Koretz   “International comparisons clearly do not provide what many observers of education would like. . . . The findings are in some cases inconsistent from one study to another. Moreover, the data from all of these studies are poorly suited to separating the effects of schooling from the myriad other influences on student achievement.” p. 48 Dismissive How do American students measure up? Making Sense of International Comparisons The Future of Children 19:1 Spring 2009 http://www.princeton.edu/futureofchildren/publications/docs/19_01_FullJournal.pdf  
36 Daniel M. Koretz   “If truly comparable data from the end of schooling were available, they would presumably look somewhat different, though it is unlikely that they would be greatly more optimistic.” p. 49 Dismissive How do American students measure up? Making Sense of International Comparisons The Future of Children 19:1 Spring 2009 http://www.princeton.edu/futureofchildren/publications/docs/19_01_FullJournal.pdf  
37 Daniel M. Koretz   “Few detailed studies of score inflation have been carried out. ...” p. 778 Dismissive Test-based educational accountability. Research evidence and implication Zeitschrift für Pädagogik 54 (2008) 6, S. 777–790 http://www.pedocs.de/volltexte/2011/4376/pdf/ZfPaed_2008_6_Koretz_Testbased_educational_accountability_D_A.pdf   The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- preceded Koretz's by several years. See:  http://nonpartisaneducation.org/Review/Books/CannellBook1.htm  http://nonpartisaneducation.org/Review/Books/Cannell2.pdf
38 Daniel M. Koretz   “Unfortunately, while we have a lot of anecdotal evidence suggesting that this [equity as the rationale for NCLB] is the case, we have very few serious empirical studies of this.” answer to 3rd question, 1st para Denigrating What does educational testing really tell us?  Education Week [interview], 9.23.2008 http://blogs.edweek.org/edweek/eduwonkette/2008/09/what_does_educational_testing_1.html   A "rationale" is an argument, a belief, an explanation, not an empirical result. The civil rights groups that supported NCLB did so because they saw it as an equity vehicle. 
39 Daniel M. Koretz   "…we rarely know when [test] scores are inflated because we so rarely check." Dismissive Interpreting test scores: More complicated than you think [interview] Chronicle of Higher Education, August 15, 2008, p. A23     In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
40 Daniel M. Koretz   "... We know far too little about how to hold schools accountable for improving student performance.", p.9 Dismissive The pending reauthorization of NCLB: An opportunity to rethink the basic strategy Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, Gail Sunderland, Ed., 2008 Corwin Press  
41 Daniel M. Koretz   "A modest number of studies argue that high-stakes testing does or doesn't improve student performance in tested subjects.", p.10 Dismissive The pending reauthorization of NCLB: An opportunity to rethink the basic strategy Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, Gail Sunderland, Ed., 2008 Corwin Press   In fact, a very large number of studies do so. See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
42 Daniel M. Koretz   "This research tells us little. Much of it is of very low quality, and even the careful studies are hobbled by data that are inadequate for the task.", p.10 Denigrating The pending reauthorization of NCLB: An opportunity to rethink the basic strategy Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, Gail Sunderland, Ed., 2008 Corwin Press   In fact, a very large number of studies do so. See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
43 Daniel M. Koretz   "Moreover, this research asks too simple a question. Asking whether test-based accountability works is a bit like asking whether medicine works. What medicines? For what medical conditions?", p.10 Denigrating The pending reauthorization of NCLB: An opportunity to rethink the basic strategy Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, Gail Sunderland, Ed., 2008 Corwin Press   In fact, a very large number of studies do so. See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
44 Daniel M. Koretz   "We need research and evaluation to address this question, because we lack a grounded answer.", p.11 Dismissive The pending reauthorization of NCLB: An opportunity to rethink the basic strategy Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, Gail Sunderland, Ed., 2008 Corwin Press   In fact, a very large number of studies do so. See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
45 Daniel M. Koretz   " ... research does not tell us whether high-stakes testing works.", p.11 Dismissive The pending reauthorization of NCLB: An opportunity to rethink the basic strategy Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, Gail Sunderland, Ed., 2008 Corwin Press   In fact, a very large number of studies do so. See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
46 Daniel M. Koretz   "The few relevant studies [of test score inflation] are of two types: detailed evaluations of scores in specific jurisdictions, .... We have far fewer ... than we should.", pp.11-12 Denigrating The pending reauthorization of NCLB: An opportunity to rethink the basic strategy Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, Gail Sunderland, Ed., 2008 Corwin Press   In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
47 Daniel M. Koretz   "The results of the relatively few relevant studies are both striking and consistent: gains on high-stakes tests often do not generalize well to other measures, and the gap is frequently huge." p.12 Dismissive The pending reauthorization of NCLB: An opportunity to rethink the basic strategy Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, Gail Sunderland, Ed., 2008 Corwin Press   In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
48 Daniel M. Koretz   "But this remains only a hypothesis, not yet tested by much empirical evidence." p.14 Dismissive The pending reauthorization of NCLB: An opportunity to rethink the basic strategy Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, Gail Sunderland, Ed., 2008 Corwin Press   The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- is largely about cheating. See:  http://nonpartisaneducation.org/Review/Books/CannellBook1.htm  http://nonpartisaneducation.org/Review/Books/Cannell2.pdf . See also Gregory J. Cizek's Cheating on Tests: https://www.goodreads.com/book/show/5084641-cheating-on-tests ; and Caveon Test Security's resource pages: https://www.caveon.com/resources/
49 Daniel M. Koretz   "We urgently need finer grained studies of this issue.", p.14 Denigrating The pending reauthorization of NCLB: An opportunity to rethink the basic strategy Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, Gail Sunderland, Ed., 2008 Corwin Press   The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- is largely about cheating. See:  http://nonpartisaneducation.org/Review/Books/CannellBook1.htm  http://nonpartisaneducation.org/Review/Books/Cannell2.pdf . See also Gregory J. Cizek's Cheating on Tests: https://www.goodreads.com/book/show/5084641-cheating-on-tests ; and Caveon Test Security's resource pages: https://www.caveon.com/resources/
50 Daniel M. Koretz   "There are limited systematic data about cheating.", p.16 Denigrating The pending reauthorization of NCLB: An opportunity to rethink the basic strategy Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, Gail Sunderland, Ed., 2008 Corwin Press   The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- is largely about cheating. See:  http://nonpartisaneducation.org/Review/Books/CannellBook1.htm  http://nonpartisaneducation.org/Review/Books/Cannell2.pdf . See also Gregory J. Cizek's Cheating on Tests: https://www.goodreads.com/book/show/5084641-cheating-on-tests ; and Caveon Test Security's resource pages: https://www.caveon.com/resources/
51 Daniel M. Koretz   "Building those better [accountability] systems requires more systematic, empirical data, and that, in turn, requires a serious agenda of R&D.", p.26 Denigrating The pending reauthorization of NCLB: An opportunity to rethink the basic strategy Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, Gail Sunderland, Ed., 2008 Corwin Press  
52 Daniel M. Koretz   “… [T]he problem of score inflation is at best inconvenient and at worse [sic] threatening. (The latter is one reason that there are so few studies of this problem. …)” p. 11 Dismissive Measuring up: What educational testing really tells us Harvard University Press, 2008  Google Books   Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that Koretz cannot find one district out of the many thousands to cooperate with him to discredit testing.
53 Daniel M. Koretz   “The relatively few studies that have addressed this question support the skeptical interpretation: in many cases, mastery of material on the new test simply substitutes for mastery of the old.” p. 242 Dismissive Measuring up: What educational testing really tells us Harvard University Press, 2008  Google Books  
54 Daniel M. Koretz   “Because so many people consider test-based accountability to be self-evaluating … there is a disturbing lack of good evaluations of these systems. …” p. 331 Denigrating Measuring up: What educational testing really tells us Harvard University Press, 2008  Google Books   
55 Daniel M. Koretz   “Most of these few studies showed a rapid divergence of means on the two tests. …” p. 348 Dismissive Using aggregate-level linkages for estimation and validation, etc. in Linking and Aligning Scores and Scales, Springer, 2007 Google Books  
56 Daniel M. Koretz   "Research to date makes clear that score gains achieved under high-stakes conditions should not be accepted at face value. ...policymakers embarking on an effort to create a more effective system of ...accountability must face uncertainty about how well alternatives will function in practice, and should be prepared for a period of evaluation and mid-course correction." Dismissive Alignment, High Stakes, and the Inflation of Test Scores CRESST Report 655, June 2005    
57 Daniel M. Koretz   "Thus, even in a well-aligned system, policymakers still face the challenge of designing educational accountability systems that create the right mix of incentives: incentives that will maximize real gains in student performance, minimize score inflation, and generate other desirable changes in educational practice. This is a challenge in part because of a shortage of relevant experience and research..." Dismissive Alignment, High Stakes, and the Inflation of Test Scores CRESST Report 655, June 2005     Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925).   *Covers many studies; study is a research review, research synthesis, or meta-analysis.  Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
58 Daniel M. Koretz   "Research has yet to clarify how variations in the performance targets set for schools affect the incentives faced by teachers and the resulting validity of score gains." Dismissive Alignment, High Stakes, and the Inflation of Test Scores CRESST Report 655, June 2005     Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925).   *Covers many studies; study is a research review, research synthesis, or meta-analysis.  Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
59 Daniel M. Koretz   "In terms of research, the jury is still out." Dismissive Alignment, High Stakes, and the Inflation of Test Scores CRESST Report 655, June 2005     Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925).   *Covers many studies; study is a research review, research synthesis, or meta-analysis.  Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson.
60 Daniel M. Koretz   "The first study to evaluate score inflation empirically (Koretz, Linn, Dunbar, and Shepard, 1991) looked at a district-testing program in the 1980s that used commercial, off-the-shelf, multiple-choice achievement tests."  1stness Alignment, High Stakes, and the Inflation of Test Scores, p.7 CRESST Report 655, June 2005     The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- preceded Koretz's by several years. See:  http://nonpartisaneducation.org/Review/Books/CannellBook1.htm  http://nonpartisaneducation.org/Review/Books/Cannell2.pdf
61 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “The shortcomings of the studies make it difficult to determine the size of teacher effects, but we suspect that the magnitude of some of the effects reported in this literature are overstated.” p. xiii Denigrating Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf  
62 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Using VAM to estimate individual teacher effects is a recent endeavor, and many of the possible sources of error have not been thoroughly evaluated in the literature.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf  
63 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf  
64 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “This lack of attention to teachers in policy discussions may be attributed in part to another body of literature that attempted to determine the effects of specific teacher background characteristics, including credentialing status (e.g., Miller, McKenna, and McKenna, 1998; Goldhaber and Brewer, 2000) and subject matter coursework (e.g., Monk, 1994).” p. 8 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf  
65 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “To date, there has been little empirical exploration of the size of school effects and the sensitivity of teacher effects to modeling of school effects.” p. 78 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf  
66 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “There are no empirical explorations of the robustness of estimates to assumptions about prior-year schooling effects.“ p. 81 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf  
67 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “There is currently no empirical evidence about the sensitivity of gain scores or teacher effects to such alternatives.” p. 89 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf  
68 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. 116 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf  
69 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Although we expect missing data are likely to be pervasive, there is little systematic discussion of the extent or nature of missing data in test score databases.” p. 117 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf  
70 Daniel M. Koretz   "Empirical research on the validity of score gains on high-stakes tests is limited, but the studies conducted to date show…" Dismissive Using multiple measures to address perverse incentives and score inflation, p.21 Educational Measurement: Issues and Practice, Summer 2003    
71 Daniel M. Koretz   "Research on educators' responses to high-stakes testing is also limited, …" Dismissive Using multiple measures to address perverse incentives and score inflation, p.21 Educational Measurement: Issues and Practice, Summer 2003    
72 Daniel M. Koretz   "Although extant research is sufficient to document problems of score inflation and unintended incentives from test-based accountability, it provides very little guidance about how one might design an accountability system to lessen these problems."  Denigrating Using multiple measures to address perverse incentives and score inflation, p.22 Educational Measurement: Issues and Practice, Summer 2003    
73 Daniel M. Koretz   “Relatively few studies, however, provide strong empirical evidence pertaining to inflation of entire scores on tests used for accountability.” p. 759 Denigrating Limitations in the use of achievement tests as measures of educators’ productivity  The Journal of Human Resources, 37:4 (Fall 2002) http://standardizedtests.procon.org/sourcefiles/limitations-in-the-use-of-achievement-tests-as-measures-of-educators-productivity.pdf  
74 Daniel M. Koretz   “Only a few studies have directly tested the generalizability of gains in scores on accountability-oriented tests.” p. 759 Dismissive Limitations in the use of achievement tests as measures of educators’ productivity  The Journal of Human Resources, 37:4 (Fall 2002) http://standardizedtests.procon.org/sourcefiles/limitations-in-the-use-of-achievement-tests-as-measures-of-educators-productivity.pdf   "Validity" studies are common, even routine, parts of large-scale testing programs' technical reports. 
75 Daniel M. Koretz   “Moreover, while there are numerous anecdotal reports of various types of coaching, little systematic research describes the range of coaching strategies and their effects.” p. 769 Dismissive Limitations in the use of achievement tests as measures of educators’ productivity  The Journal of Human Resources, 37:4 (Fall 2002) http://standardizedtests.procon.org/sourcefiles/limitations-in-the-use-of-achievement-tests-as-measures-of-educators-productivity.pdf   In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
76 Laura S. Hamilton Daniel M. Koretz "There is currently no substantial evidence on the effects of published report cards on parents’ decisionmaking or on the schools themselves." Dismissive Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 2: Tests and their use in test-based accountability systems, p.44   For decades, consulting services have existed that help parents new to a city select the right school or school district for them.
77 Daniel M. Koretz Daniel F. McCaffrey, Laura S. Hamilton "Few efforts are made to evaluate directly score gains obtained under high-stakes conditions, and conventional validation tools are not fully adequate for the task.", p. 1 Dismissive Toward a framework for validating gains under high-stakes conditions CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001     In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
78 Daniel M. Koretz Mark Berends “[T]here has been little systematic research exploring changes in grading standards. …” p. iii Dismissive Changes in high school grading standards in mathematics, 1982–1992  Rand Education, 2001 http://www.rand.org/content/dam/rand/pubs/monograph_reports/2007/MR1445.pdf   See a review of hundreds of studies: Brookhart, S. M., Guskey, T. R., Bowers, A. J., McMillan, J. H., Smith, J. K., Smith, L. F., Stevens, M. T., & Welsh, M. E. (2016). A Century of Grading Research: Meaning and Value in the Most Common Educational Measure. Review of Educational Research, 86(4), 803-848. doi: 10.3102/0034654316672069   http://doi.org/10.3102/0034654316672069
79 Daniel M. Koretz Mark Berends “[F]ew studies have attempted to evaluate systematically changes in grading standards over time.” p. xi Dismissive Changes in high school grading standards in mathematics, 1982–1992  Rand Education, 2001 http://www.rand.org/content/dam/rand/pubs/monograph_reports/2007/MR1445.pdf   See a review of hundreds of studies: Brookhart, S. M., Guskey, T. R., Bowers, A. J., McMillan, J. H., Smith, J. K., Smith, L. F., Stevens, M. T., & Welsh, M. E. (2016). A Century of Grading Research: Meaning and Value in the Most Common Educational Measure. Review of Educational Research, 86(4), 803-848. doi: 10.3102/0034654316672069   http://doi.org/10.3102/0034654316672069
80 Daniel M. Koretz E. A. Hanushek, J. J. Heckman, and D. Neal (organizers) "Research provides sparse guidance about how to broaden the range of measured outcomes to provide a better mix of incentives and lessen score inflation.", p.27 Dismissive Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity  Devising Incentives to Promote Human Capital, National Academy of Sciences Conference, May 2000 http://www.irp.wisc.edu/newsevents/other/symposia/koretz.pdf   Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925).   *Covers many studies; study is a research review, research synthesis, or meta-analysis. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
81 Daniel M. Koretz E. A. Hanushek, J. J. Heckman, and D. Neal (organizers) "...what types of accountability systems might be more effective, and what role might achievement tests play in them? Unfortunately, there is little basis in research for answering this question. The simple test-based accountability systems that have been in vogue for the past two decades have appeared so commonsensical to some policymakers that they have had little incentive to permit the evaluation of alternatives.", p.25 Dismissive Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity  Devising Incentives to Promote Human Capital, National Academy of Sciences Conference, May 2000 http://www.irp.wisc.edu/newsevents/other/symposia/koretz.pdf   Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that Koretz cannot find one district out of the many thousands to cooperate with him to discredit testing.
82 Daniel M. Koretz E. A. Hanushek, J. J. Heckman, and D. Neal (organizers) "...while there are numerous anecdotal reports of various types of coaching, little systematic research describes the range of coaching strategies and their effects.", p.24 Denigrating Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity  Devising Incentives to Promote Human Capital, National Academy of Sciences Conference, May 2000 http://www.irp.wisc.edu/newsevents/other/symposia/koretz.pdf   In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
83 Daniel M. Koretz E. A. Hanushek, J. J. Heckman, and D. Neal (organizers) "Only a few studies have directly tested the generalizability of gains in scores on accountability-oriented tests.", p.11 Denigrating Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity  Devising Incentives to Promote Human Capital, National Academy of Sciences Conference, May 2000 http://www.irp.wisc.edu/newsevents/other/symposia/koretz.pdf   "Validity" studies are common, even routine, parts of large-scale testing programs' technical reports. 
84 Daniel M. Koretz E. A. Hanushek, J. J. Heckman, and D. Neal (organizers) "Relatively few studies, however, provide strong empirical evidence pertaining to inflation of entire scores on tests used for accountability.  Policy makers have little incentive to facilitate such studies, and they can be difficult to carry out.", p.11 Denigrating Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity  Devising Incentives to Promote Human Capital, National Academy of Sciences Conference, May 2000 http://www.irp.wisc.edu/newsevents/other/symposia/koretz.pdf   Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that Koretz cannot find one district out of the many thousands to cooperate with him to discredit testing.
85 Daniel M. Koretz Sheila I. Barron “In the absence of systematic research documenting test-based accountability systems that have avoided the problem of inflated gains …” p. xvii Dismissive The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS)  Rand Education, 1998 http://www.rand.org/content/dam/rand/pubs/monograph_reports/2009/MR1014.pdf   In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
86 Daniel M. Koretz Sheila I. Barron “This study also illustrated in numerous ways the limitations of current research on the validity of gains.” p. xviii Dismissive The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS)  Rand Education, 1998 http://www.rand.org/content/dam/rand/pubs/monograph_reports/2009/MR1014.pdf   In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
87 Daniel M. Koretz Sheila I. Barron “The field of measurement has seen many decades of intensive development of methods for evaluating scores cross-sectionally, but much less attention has been devoted to the problem of evaluating gains. . . . [T]his methodological gap is likely to become ever more important.” p. 122 Dismissive The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS)  Rand Education, 1998 http://www.rand.org/content/dam/rand/pubs/monograph_reports/2009/MR1014.pdf   In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
88 Daniel M. Koretz Sheila I. Barron “The contrast between mathematics … and reading … underlines the limits of our current knowledge of the mechanisms that underlie score inflation.” p. 122 Dismissive The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS)  Rand Education, 1998 http://www.rand.org/content/dam/rand/pubs/monograph_reports/2009/MR1014.pdf   In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984) Fraker (1986/1987) Halpin (1987) Whitla (1988)  Snedecor (1989)  Becker (1990)  Smyth (1990) Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009) 
89 Daniel M. Koretz reported by Debra Viadero “...all of the researchers interviewed agreed with FairTest’s contention that research evidence supporting the use of high-stakes tests as a means of improving schools is thin.”   Dismissive FairTest report questions reliance on high-stakes testing by states Debra Viadero, Education Week, January 28, 1998     In fact, a very large number of studies provide exactly such evidence. See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
90 Daniel M. Koretz Erik A. Hanushek, D.W. Jorgenson (Eds.) "Despite the long history of assessment-based accountability, hard evidence about its effects is surprisingly sparse, and the little evidence that is available is not encouraging. ...The large positive effects assumed by advocates...are often not substantiated by hard evidence....” Dismissive Using student assessments for educational accountability Improving America’s schools: The role of incentives. Washington, D.C.: National Academy Press, 1996     See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
91 Daniel M. Koretz Robert L. Linn, Stephen Dunbar, Lorrie A. Shepard “Evidence relevant to this debate has been limited.” p. 2 Dismissive The Effects of High-Stakes Testing On Achievement: Preliminary Findings About Generalization Across Tests  Originally presented at the annual meeting of the AERA and the NCME, Chicago, April 5, 1991 http://nepc.colorado.edu/files/HighStakesTesting.pdf   See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
                 
  IRONIES:              
  Daniel M. Koretz   "I discuss a number of important issues that have arisen in K-12 testing and explore their implications for testing in the postsecondary sector. These include ... overstating comparability ... and unwarranted causal inference."   Measuring Postsecondary Achievement: Lessons from Large-Scale Assessments in the K-12 Sector Higher Education Policy, April 24, 2019, Abstract https://link.springer.com/article/10.1057/s41307-019-00142-4  
  Daniel M. Koretz   "Although this problem has been documented for more than a quarter of a century, it is still widely ignored, and the public is fed a steady diet of seriously misleading information about improvements in schools."   The Testing Charade: Pretending to Make Schools Better [Kindle location 723] University of Chicago Press, 2017    
  Daniel M. Koretz   "It is worth considering why we are so unlikely to ever find out how common cheating has become. … the press remains gullible…"   The Testing Charade: Pretending to Make Schools Better [Kindle location 1424] University of Chicago Press, 2017    
  Daniel M. Koretz   "…putting a stop to this disdain for evidence--this arrogant assumption that we know so much that we don't have to bother evaluating our ideas before imposing them on teachers and students--is one of the most important changes we have to make."   The Testing Charade: Pretending to Make Schools Better [Kindle location 2573] University of Chicago Press, 2017    
  Daniel M. Koretz   "But the failure to evaluate the reforms also reflects a particular arrogance."   The Testing Charade: Pretending to Make Schools Better [Kindle location 3184] University of Chicago Press, 2017    
  Daniel M. Koretz   "I've several times excoriated some of the reformers for assuming that whatever they dreamed up would work well without turning to actual evidence."   The Testing Charade: Pretending to Make Schools Better [Kindle location 3229] University of Chicago Press, 2017    
  Daniel M. Koretz Jennifer L. Jennings "Data are considered proprietary—a position that the restrictions imposed by the federal Family Educational Rights and Privacy Act (FERPA) have made easier to maintain publicly. Access is usually provided only for research which is not seen as unduly threatening to the leaders’ immediate political agendas. The fact that this last consideration is often openly discussed underscores the lack of a culture of public accountability."   The Misunderstanding and Use of Data from Educational Tests, pp.4-5 Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities/ http://www.spencer.org/resources/content/3/3/8/documents/Koretz--Jennings-paper.pdf
  Daniel M. Koretz Jennifer L. Jennings "This unwillingness to countenance honest but potentially threatening research garners very little discussion, but in this respect, education is an anomaly. In many areas of public policy, such as drug safety or vehicle safety, there is an expectation that the public is owed honest and impartial evaluation and research. For example, imagine what would have happed if the CEO of Merck had responded to reports of side-effects from Vioxx by saying that allowing access to data was “not our priority at present,” which is a not infrequent response to data requests made to districts or states. In public education, there is no expectation that the public has a right to honest evaluation, and data are seen as the policymakers’ proprietary sandbox, to which they can grant access when it happens to serve their political needs."   The Misunderstanding and Use of Data from Educational Tests, p.5 Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities/ http://www.spencer.org/resources/content/3/3/8/documents/Koretz--Jennings-paper.pdf
  Daniel M. Koretz   "One sometimes disquieting consequence of the incompleteness of tests is that different tests often provide somewhat inconsistent results." (p. 10)   Measuring up: What educational testing really tells us. Harvard University Press, 2008  Google Books  
  Daniel M. Koretz   "Even a single test can provide varying results. Just as polls have a margin of error, so do achievement tests. Students who take more than one form of a test typically obtain different scores." (p. 11)   Measuring up: What educational testing really tells us. Harvard University Press, 2008  Google Books  
  Daniel M. Koretz   "Even well-designed tests will often provide substantially different views of trends because of differences in content and other aspects of the tests' design. . . . [W]e have to be careful not to place too much confidence in detailed findings, such as the precise size of changes over time or of differences between groups." (p. 92)   Measuring up: What educational testing really tells us. Harvard University Press, 2008  Google Books  
  Daniel M. Koretz   "[O]ne cannot give all the credit or blame to one factor . . . without investigating the impact of others. Many of the complex statistical models used in economics, sociology, epidemiology, and other sciences are efforts to take into account (or 'control' for') other factors that offer plausible alternative explanations of the observed data, and many apportion variation in the outcome-say, test scores-among various possible causes. …A hypothesis is only scientifically credible when the evidence gathered has ruled out plausible alternative explanations." (pp. 122-123)   Measuring up: What educational testing really tells us. Harvard University Press, 2008  Google Books  
  Daniel M. Koretz   "[A] simple correlation need not indicate that one of the factors causes the other." (p. 123)   Measuring up: What educational testing really tells us. Harvard University Press, 2008  Google Books  
  Daniel M. Koretz   "Any number of studies have shown the complexity of the non-educational factors that can affect achievement and test scores." (p. 129)   Measuring up: What educational testing really tells us. Harvard University Press, 2008  Google Books  
                 
      Cite themselves or colleagues in the group, but dismiss or denigrate all other work          
      Falsely claim that research on a topic has only recently begun          
                 
* Cannell, J.J. (1987). Nationally Normed Elementary Achievement Testing in America's Public Schools: How All Fifty States are Above the National Average, Daniels, WV: Friends for Education;  Cannell, J.J. (1989). How Public Educators Cheat on Standardized Achievement Tests: The “Lake Wobegon” Report. Albuquerque, NM: Friends for Education.