No. Author Co-author(s) Quote Type Title Source Link1 Link2
1 Daniel M. Koretz Holcombe, Jennings “To date, few studies have attempted to understand the sources of variation in score inflation across testing programs.” p. 3 Dismissive The roots of score inflation: an examination of opportunities in two states’ tests  Prepublication draft “to appear in Sunderman (Ed.), Charting reform: achieving equity in a diverse nation” http://dash.harvard.edu/bitstream/handle/1/10880587/roots%20of%20score%20inflation.pdf?sequence=1  
2 Daniel M. Koretz Waldman, Yu, Langli, Orzech “Few studies have applied a multi-level framework to the evaluation of inflation,” p. 1 Denigrating Using the introduction of a new test to investigate the distribution of score inflation  Working paper of the Education Accountability Project at the Harvard Graduate School of Education, Nov. 2014 http://projects.iq.harvard.edu/files/eap/files/ky_cot_3_2_15_working_paper.pdf  
3 Daniel M. Koretz   "What we don’t know: What is the net effect on student achievement?
- Weak research designs, weaker data
- Some evidence of inconsistent, modest effects in elementary math, none in reading
- Effects are likely to vary across contexts...
Reason: grossly inadequate research and evaluation"
Denigrating Using tests for monitoring and accountability Presentation at: Agencia de Calidad de la Educación, Santiago, Chile, November 3, 2014    
4 Daniel M. Koretz Jennifer L. Jennings “We find that research on the use of test score data is limited, and research investigating the understanding of tests and score data is meager.” p. 1 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities/ http://www.spencer.org/resources/content/3/3/8/documents/Koretz--Jennings-paper.pdf
5 Daniel M. Koretz Jennifer L. Jennings “Because of the sparse research literature, we rely on experience and anecdote in parts of this paper, with the premise that these conclusions should be supplanted over time by findings from systematic research.” p. 1 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/sites/default/files/pdfs/Koretz-Jennings-paper.pdf
6 Daniel M. Koretz Jennifer L. Jennings “We focus on three issues that are especially relevant to test-based data and about which research is currently sparse:
  How do the types of data made available for use affect policymakers’ and educators’ understanding of data?
  What are the common errors made by policymakers and educators in interpreting test score data?
  How do high-stakes testing and the availability of test-based data affect administrator and teacher practice?” (p. 5)
Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/resources/content/3/3/8/documents/Koretz-Jennings-paper.pdf
7 Daniel M. Koretz Jennifer L. Jennings “Systematic research exploring educators’ understanding of both the principles of testing and appropriate interpretation of test-based data is meager.” p. 5 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/resources/content/3/3/8/documents/Koretz-Jennings-paper.pdf
8 Daniel M. Koretz Jennifer L. Jennings "Although current, systematic information is lacking, our experience is that that [sic] the level of understanding of test data among both educators and education policymakers is in many cases abysmally low." p. 6 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/resources/content/3/3/8/documents/Koretz-Jennings-paper.pdf
9 Daniel M. Koretz Jennifer L. Jennings “We have few systematic studies of variations in educators’ responses. …” p. 26 Dismissive The Misunderstanding and Use of Data from Educational Tests  Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 http://www.spencer.org/data-use-and-educational-improvement-initiative-activities http://www.spencer.org/resources/content/3/3/8/documents/Koretz-Jennings-paper.pdf
10 Daniel M. Koretz   “The field of measurement has not kept pace with this transformation of testing.” p. 3 Denigrating Implications of current policy for educational measurement  paper presented at the Exploratory Seminar: Measurement Challenges Within the Race to the Top Agenda, December 2009 http://www.k12center.org/rsc/pdf/KoretzPresenterSession3.pdf  
11 Daniel M. Koretz   “For the most part, notwithstanding Lindquist’s warning, the field of measurement has largely ignored the top levels of sampling.” p. 6 Dismissive Implications of current policy for educational measurement  paper presented at the Exploratory Seminar: Measurement Challenges Within the Race to the Top Agenda, December 2009 http://www.k12center.org/rsc/pdf/KoretzPresenterSession3.pdf  
12 Daniel M. Koretz   “Currently, research on accountability‐related topics, such as score inflation and effects on educational practice, is slowly growing but remains largely divorced from the core activities of the measurement field.” p. 15 Dismissive Implications of current policy for educational measurement  paper presented at the Exploratory Seminar: Measurement Challenges Within the Race to the Top Agenda, December 2009 http://www.k12center.org/rsc/pdf/KoretzPresenterSession3.pdf  
13 Daniel M. Koretz   “The data, however, are more limited and more complex than is often realized, and the story they properly tell is not quite so straightforward. . . . Data about student performance at the end of high school are scarce and especially hard to collect and interpret.” p. 38 Dismissive How do American students measure up? Making Sense of International Comparisons The Future of Children 19:1 Spring 2009 http://www.princeton.edu/futureofchildren/publications/docs/19_01_FullJournal.pdf  
14 Daniel M. Koretz   “International comparisons clearly do not provide what many observers of education would like. . . . The findings are in some cases inconsistent from one study to another. Moreover, the data from all of these studies are poorly suited to separating the effects of schooling from the myriad other influences on student achievement.” p. 48 Dismissive How do American students measure up? Making Sense of International Comparisons The Future of Children 19:1 Spring 2009 http://www.princeton.edu/futureofchildren/publications/docs/19_01_FullJournal.pdf  
15 Daniel M. Koretz   “If truly comparable data from the end of schooling were available, they would presumably look somewhat different, though it is unlikely that they would be greatly more optimistic.” p. 49 Dismissive How do American students measure up? Making Sense of International Comparisons The Future of Children 19:1 Spring 2009 http://www.princeton.edu/futureofchildren/publications/docs/19_01_FullJournal.pdf  
16 Daniel M. Koretz   “Few detailed studies of score inflation have been carried out. ...” p. 778 Dismissive Test-based educational accountability: Research evidence and implications Zeitschrift für Pädagogik 54 (2008) 6, S. 777–790 http://www.pedocs.de/volltexte/2011/4376/pdf/ZfPaed_2008_6_Koretz_Testbased_educational_accountability_D_A.pdf  
17 Daniel M. Koretz   “Unfortunately, while we have a lot of anecdotal evidence suggesting that this [equity as the rationale for NCLB] is the case, we have very few serious empirical studies of this.” answer to 3rd question, 1st para Denigrating What does educational testing really tell us?  Education Week [interview ], 9.23.2008 http://blogs.edweek.org/edweek/eduwonkette/2008/09/what_does_educational_testing_1.html  
18 Daniel M. Koretz   "…we rarely know when [test] scores are inflated because we so rarely check." Dismissive Interpreting test scores: More complicated than you think [interview] Chronicle of Higher Education, August 15, 2008, p. A23    
19 Daniel M. Koretz   “… [T]he problem of score inflation is at best inconvenient and at worse [sic] threatening. (The later [sic] is one reason that there are so few studies of this problem. …)” p. 11 Dismissive Measuring up: What educational testing really tells us. Harvard University Press, 2008  Google Books  
20 Daniel M. Koretz   “The relatively few studies that have addressed this question support the skeptical interpretation: in many cases, mastery of material on the new test simply substitutes for mastery of the old.” p. 242 Dismissive Measuring up: What educational testing really tells us. Harvard University Press, 2008  Google Books  
21 Daniel M. Koretz   “Because so many people consider test-based accountability to be self-evaluating … there is a disturbing lack of good evaluations of these systems. …” p. 331 Denigrating Measuring up: What educational testing really tells us Harvard University Press, 2008  Google Books   
22 Daniel M. Koretz   “Most of these few studies showed a rapid divergence of means on the two tests. …” p. 348 Dismissive Using aggregate-level linkages for estimation and validation, etc. in Linking and Aligning Scores and Scales, Springer, 2007 Google Books  
23 Daniel M. Koretz   "Research to date makes clear that score gains achieved under high-stakes conditions should not be accepted at face value. ...policymakers embarking on an effort to create a more effective system of ...accountability must face uncertainty about how well alternatives will function in practice, and should be prepared for a period of evaluation and mid-course correction." Dismissive Alignment, High Stakes, and the Inflation of Test Scores CRESST Report 655, June 2005    
24 Daniel M. Koretz   "Thus, even in a well-aligned system, policymakers still face the challenge of designing educational accountability systems that create the right mix of incentives: incentives that will maximize real gains in student performance, minimize score inflation, and generate other desirable changes in educational practice. This is a challenge in part because of a shortage of relevant experience and research..." Dismissive Alignment, High Stakes, and the Inflation of Test Scores CRESST Report 655, June 2005    
25 Daniel M. Koretz   "Research has yet to clarify how variations in the performance targets set for schools affect the incentives faced by teachers and the resulting validity of score gains." Dismissive Alignment, High Stakes, and the Inflation of Test Scores CRESST Report 655, June 2005    
26 Daniel M. Koretz   "In terms of research, the jury is still out." Dismissive Alignment, High Stakes, and the Inflation of Test Scores CRESST Report 655, June 2005    
27 Daniel M. Koretz   "The first study to evaluate score inflation empirically (Koretz, Linn, Dunbar, and Shepard, 1991) looked at a district-testing program in the 1980s that used commercial, off-the-shelf, multiple-choice achievement tests." p. 7 1stness Alignment, High Stakes, and the Inflation of Test Scores CRESST Report 655, June 2005    
28 Laura S. Hamilton Daniel F. McCaffrey, Lockwood, Daniel M. Koretz “The shortcomings of the studies make it difficult to determine the size of teacher effects, but we suspect that the magnitude of some of the effects reported in this literature are overstated.” p. xiii Denigrating Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf
29 Laura S. Hamilton Daniel F. McCaffrey, Lockwood, Daniel M. Koretz “Using VAM to estimate individual teacher effects is a recent endeavor, and many of the possible sources of error have not been thoroughly evaluated in the literature.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf
30 Laura S. Hamilton Daniel F. McCaffrey, Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf
31 Laura S. Hamilton Daniel F. McCaffrey, Lockwood, Daniel M. Koretz “This lack of attention to teachers in policy discussions may be attributed in part to another body of literature that attempted to determine the effects of specific teacher background characteristics, including credentialing status (e.g., Miller, McKenna, and McKenna, 1998; Goldhaber and Brewer, 2000) and subject matter coursework (e.g., Monk, 1994).” p. 8 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf
32 Laura S. Hamilton Daniel F. McCaffrey, Lockwood, Daniel M. Koretz “To date, there has been little empirical exploration of the size of school effects and the sensitivity of teacher effects to modeling of school effects.” p. 78 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf
33 Laura S. Hamilton Daniel F. McCaffrey, Lockwood, Daniel M. Koretz “There are no empirical explorations of the robustness of estimates to assumptions about prior-year schooling effects.“ p. 81 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf
34 Laura S. Hamilton Daniel F. McCaffrey, Lockwood, Daniel M. Koretz “There is currently no empirical evidence about the sensitivity of gain scores or teacher effects to such alternatives.” p. 89 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf
35 Laura S. Hamilton Daniel F. McCaffrey, Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. 116 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf
36 Laura S. Hamilton Daniel F. McCaffrey, Lockwood, Daniel M. Koretz “Although we expect missing data are likely to be pervasive, there is little systematic discussion of the extent or nature of missing data in test score databases.” p. 117 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf
37 Daniel M. Koretz   "Empirical research on the validity of score gains on high-stakes tests is limited, but the studies conducted to date show…" p. 21 Dismissive Using multiple measures to address perverse incentives and score inflation Educational Measurement: Issues and Practice, Summer 2003    
38 Daniel M. Koretz   "Research on educators' responses to high-stakes testing is also limited, …" p. 21 Dismissive Using multiple measures to address perverse incentives and score inflation Educational Measurement: Issues and Practice, Summer 2003    
39 Daniel M. Koretz   "Although extant research is sufficient to document problems of score inflation and unintended incentives from test-based accountability, it provides very little guidance about how one might design an accountability system to lessen these problems." p. 22 Denigrating Using multiple measures to address perverse incentives and score inflation Educational Measurement: Issues and Practice, Summer 2003    
40 Daniel M. Koretz   “Relatively few studies, however, provide strong empirical evidence pertaining to inflation of entire scores on tests used for accountability.” p. 759 Denigrating Limitations in the use of achievement tests as measures of educators’ productivity  The Journal of Human Resources, 37:4 (Fall 2002) http://standardizedtests.procon.org/sourcefiles/limitations-in-the-use-of-achievement-tests-as-measures-of-educators-productivity.pdf  
41 Daniel M. Koretz   “Only a few studies have directly tested the generalizability of gains in scores on accountability-oriented tests.” p. 759 Dismissive Limitations in the use of achievement tests as measures of educators’ productivity  The Journal of Human Resources, 37:4 (Fall 2002) http://standardizedtests.procon.org/sourcefiles/limitations-in-the-use-of-achievement-tests-as-measures-of-educators-productivity.pdf  
42 Daniel M. Koretz   “Moreover, while there are numerous anecdotal reports of various types of coaching, little systematic research describes the range of coaching strategies and their effects.” p. 769 Dismissive Limitations in the use of achievement tests as measures of educators’ productivity  The Journal of Human Resources, 37:4 (Fall 2002) http://standardizedtests.procon.org/sourcefiles/limitations-in-the-use-of-achievement-tests-as-measures-of-educators-productivity.pdf  
43 Daniel M. Koretz Daniel F. McCaffrey, Laura S. Hamilton "Few efforts are made to evaluate directly score gains obtained under high-stakes conditions, and conventional validation tools are not fully adequate for the task.", p. 1 Dismissive Toward a framework for validating gains under high-stakes conditions CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001    
44 Daniel M. Koretz Mark Berends “[T]here has been little systematic research exploring changes in grading standards. …” p. iii Dismissive Changes in high school grading standards in mathematics, 1982–1992  Rand Education, 2001 http://www.rand.org/content/dam/rand/pubs/monograph_reports/2007/MR1445.pdf  
45 Daniel M. Koretz Mark Berends “[F]ew studies have attempted to evaluate systematically changes in grading standards over time.” p. xi Dismissive Changes in high school grading standards in mathematics, 1982–1992  Rand Education, 2001 http://www.rand.org/content/dam/rand/pubs/monograph_reports/2007/MR1445.pdf  
46 Daniel M. Koretz E. A. Hanushek, J. J. Heckman, and D. Neal (organizers) "Research provides sparse guidance about how to broaden the range of measured outcomes to provide a better mix of incentives and lessen score inflation.", p.27 Dismissive Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity  Devising Incentives to Promote Human Capital, National Academy of Sciences Conference, May 2000 http://www.irp.wisc.edu/newsevents/other/symposia/koretz.pdf  
47 Daniel M. Koretz E. A. Hanushek, J. J. Heckman, and D. Neal (organizers) "...what types of accountability systems might be more effective, and what role might achievement tests play in them? Unfortunately, there is little basis in research for answering this question. The simple test-based accountability systems that have been in vogue for the past two decades have appeared so commonsensical to some policymakers that they have had little incentive to permit the evaluation of alternatives.", p.25 Dismissive Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity  Devising Incentives to Promote Human Capital, National Academy of Sciences Conference, May 2000 http://www.irp.wisc.edu/newsevents/other/symposia/koretz.pdf  
48 Daniel M. Koretz E. A. Hanushek, J. J. Heckman, and D. Neal (organizers) "...while there are numerous anecdotal reports of various types of coaching, little systematic research describes the range of coaching strategies and their effects.", p.24 Denigrating Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity  Devising Incentives to Promote Human Capital, National Academy of Sciences Conference, May 2000 http://www.irp.wisc.edu/newsevents/other/symposia/koretz.pdf  
49 Daniel M. Koretz E. A. Hanushek, J. J. Heckman, and D. Neal (organizers) "Only a few studies have directly tested the generalizability of gains in scores on accountability-oriented tests.", p.11 Denigrating Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity  Devising Incentives to Promote Human Capital, National Academy of Sciences Conference, May 2000 http://www.irp.wisc.edu/newsevents/other/symposia/koretz.pdf  
50 Daniel M. Koretz E. A. Hanushek, J. J. Heckman, and D. Neal (organizers) "Relatively few studies, however, provide strong empirical evidence pertaining to inflation of entire scores on tests used for accountability.  Policy makers have little incentive to facilitate such studies, and they can be difficult to carry out.", p.11 Denigrating Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity  Devising Incentives to Promote Human Capital, National Academy of Sciences Conference, May 2000 http://www.irp.wisc.edu/newsevents/other/symposia/koretz.pdf  
51 Daniel M. Koretz Sheila I. Barron “In the absence of systematic research documenting test-based accountability systems that have avoided the problem of inflated gains …” p. xvii Dismissive The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS)  Rand Education, 1998 http://www.rand.org/content/dam/rand/pubs/monograph_reports/2009/MR1014.pdf  
52 Daniel M. Koretz Sheila I. Barron “This study also illustrated in numerous ways the limitations of current research on the validity of gains.” p. xviii Dismissive The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS)  Rand Education, 1998 http://www.rand.org/content/dam/rand/pubs/monograph_reports/2009/MR1014.pdf  
53 Daniel M. Koretz Sheila I. Barron “The field of measurement has seen many decades of intensive development of methods for evaluating scores cross-sectionally, but much less attention has been devoted to the problem of evaluating gains. . . . [T]his methodological gap is likely to become ever more important.” p. 122 Dismissive The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS)  Rand Education, 1998 http://www.rand.org/content/dam/rand/pubs/monograph_reports/2009/MR1014.pdf  
54 Daniel M. Koretz Sheila I. Barron “The contrast between mathematics … and reading … underlines the limits of our current knowledge of the mechanisms that underlie score inflation.” p. 122 Dismissive The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS)  Rand Education, 1998 http://www.rand.org/content/dam/rand/pubs/monograph_reports/2009/MR1014.pdf  
55 Daniel M. Koretz reported by Debra Viadero “...all of the researchers interviewed agreed with FairTest’s contention that research evidence supporting the use of high-stakes tests as a means of improving schools is thin.”   Dismissive FairTest report questions reliance on high-stakes testing by states Debra Viadero, Education Week. January 28, 1998.    
56 Daniel M. Koretz Erik A. Hanushek, D.W. Jorgenson (Eds.) "Despite the long history of assessment-based accountability, hard evidence about its effects is surprisingly sparse, and the little evidence that is available is not encouraging. ...The large positive effects assumed by advocates...are often not substantiated by hard evidence....” Dismissive Using student assessments for educational accountability Improving America’s schools: The role of incentives. Washington, D.C.: National Academy Press, 1996    
57 Daniel M. Koretz Robert L. Linn, Stephen Dunbar, Lorrie A. Shepard “Evidence relevant to this debate has been limited.” p. 2 Dismissive The Effects of High-Stakes Testing On Achievement: Preliminary Findings About Generalization Across Tests  Originally presented at the annual meeting of the AERA and the NCME, Chicago, April 5, 1991 http://nepc.colorado.edu/files/HighStakesTesting.pdf  
Cite themselves or colleagues in the group, but dismiss or denigrate all other work.
Falsely claim that research has only recently been done on the topic.