HOME: Dismissive Reviews in Education Policy Research
# | Author | Co-author(s) | Dismissive Quote | Type | Title | Source | Link | Notes | Notes2
1 | Jill Barshay | Daniel Koretz [interviewee] | "In this country, we treat education data as the private sandbox of superintendents and commissioners. This is entirely different from how we treat data in other areas of public policy, such as medicine or airline safety." | Dismissive | PROOF POINTS: 5 Questions for Daniel Koretz | Hechinger Report, July 13, 2020 | https://hechingerreport.org/proof-points-5-five-questions-for-daniel-koretz/ | There are privacy controls on student data; there should probably be more, and they should probably be enforced more vigorously. But such controls are even stronger for medical data, which Koretz implies here are weaker. Nonetheless, access to anonymized student data is granted all the time. Externally administered high-stakes testing is widely reviled among US educationists; it strains credulity that Koretz cannot find one district out of the many thousands to cooperate with him on a study to discredit testing. |
2 | Jill Barshay | Daniel Koretz [interviewee] | "And so there aren’t that many studies, but the ones we have are quite consistent." | Dismissive | PROOF POINTS: 5 Questions for Daniel Koretz | Hechinger Report, July 13, 2020 | https://hechingerreport.org/proof-points-5-five-questions-for-daniel-koretz/ | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses dating back at least to the 1970s. There is even a What Works Clearinghouse summary of the (post-World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927); DeWeerdt (1927); French (1959); French & Dear (1959); Ortar (1960); Marron (1965); ETS (1965); Messick & Jungeblut (1981); Ellis, Konoske, Wulfeck, & Montague (1982); DerSimonian & Laird (1983); Kulik, Bangert-Drowns, & Kulik (1984); Powers (1985); Samson (1985); Scruggs, White, & Bennion (1986); Jones (1986); Fraker (1986/1987); Halpin (1987); Whitla (1988); Snedecor (1989); Bond (1989); Baydar (1990); Becker (1990); Smyth (1990); Moore (1991); Alderson & Wall (1992); Powers (1993); Oren (1993); Powers & Rock (1994); Scholes & Lane (1997); Allalouf & Ben-Shakhar (1998); Robb & Ercanbrack (1999); McClain (1999); Camara (1999, 2001, 2008); Stone & Lane (2000, 2003); Din & Soldan (2001); Briggs (2001); Palmer (2002); Briggs & Hansen (2004); Cankoy & Ali Tut (2005); Crocker (2005); Allensworth, Correa, & Ponisciak (2008); Domingue & Briggs (2009); Koljatic & Silva (2014); Early (2019); Herndon (2021). The many experimental studies of test coaching are consistent: coaching has a modest effect, not the volatile or very large effects that Koretz claims. |
3 | Jill Barshay | Daniel Koretz [interviewee] | "Experts have been writing about test score inflation since at least 1951. It’s not news but people have willfully ignored it." | Denigrating | PROOF POINTS: 5 Questions for Daniel Koretz | Hechinger Report, July 13, 2020 | https://hechingerreport.org/proof-points-5-five-questions-for-daniel-koretz/ | Seems hypocritical. The most famous, and most honest, study of test score inflation--which primarily blamed cheating, corruption, and lax test security for it--was conducted by John J. Cannell in the mid-1980s. Koretz and his colleagues at CRESST have misrepresented Cannell's reports for three decades. More recently, Koretz has claimed that he conducted the first test score inflation study around 1990. |
4 | Daniel M. Koretz | "Our current system is premised on the assumption that if we hold people accountable for just a few important things — primarily scores on a few tests — the rest of what matters in schools will follow along, but experience has confirmed that this is nonsense." | Dismissive | American students aren't getting smarter — and test-based 'reform' initiatives are to blame | NBC News, Thought Experiment | https://www.nbcnews.com/think/opinion/american-students-aren-t-getting-smarter-test-based-reform-initiatives-ncna1103366 | In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract | |
5 | Daniel M. Koretz | "One of the main reasons for the failure of test-based accountability was reformers’ refusal to evaluate their innovations before imposing them wholesale on students and teachers." | Dismissive | American students aren't getting smarter — and test-based 'reform' initiatives are to blame | NBC News, Thought Experiment | https://www.nbcnews.com/think/opinion/american-students-aren-t-getting-smarter-test-based-reform-initiatives-ncna1103366 | In fact, many, if not most, large-scale testing and accountability programs in the past have been evaluated. The evaluation reports tended to end up on shelves in district and state research bureaus. Some declare there to be no research after looking only in the most easily accessible locations for the most easily retrieved evidence. | |
6 | Daniel M. Koretz | "However, our experience is still limited, and there is a serious dearth of research investigating the characteristics and effects of testing in the postsecondary sector." | Dismissive | Measuring Postsecondary Achievement: Lessons from Large-Scale Assessments in the K-12 Sector | Higher Education Policy, April 24, 2019, Abstract | https://link.springer.com/article/10.1057/s41307-019-00142-4 | In fact, the research literature on testing in higher education is long and deep. Consider, for example, the work of Trudy Banta, Patricia Cross, and Thomas Angelo. See also the large number of higher education studies in this meta-analysis: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm |
7 | Matt Barnum | Daniel Koretz [interviewee] | Journalist: I take it it’s very hard to quantify this test prep phenomenon, though? Koretz: It is extremely hard, and there’s a big hole in the research in this area. | Dismissive | Why one Harvard professor calls American schools’ focus on testing a ‘charade’ | Chalkbeat, January 19, 2018 | https://www.chalkbeat.org/posts/us/2018/01/19/why-one-harvard-professor-calls-american-schools-focus-on-testing-a-charade/ | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses since at least the 1970s; see the What Works Clearinghouse summary (https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf) and the studies listed in row 2, Gilmore (1927) through Herndon (2021). |
8 | Matt Barnum | Daniel Koretz [interviewee] | "There aren’t that many studies, but they’re very consistent. The inflation that does show up is sometimes absolutely massive. Worse, there is growing evidence that that problem is more severe for disadvantaged kids, creating the illusion of improved equity." | Dismissive | Why one Harvard professor calls American schools’ focus on testing a ‘charade’ | Chalkbeat, January 19, 2018 | https://www.chalkbeat.org/posts/us/2018/01/19/why-one-harvard-professor-calls-american-schools-focus-on-testing-a-charade/ | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses since at least the 1970s; see the What Works Clearinghouse summary (https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf) and the studies listed in row 2, Gilmore (1927) through Herndon (2021). |
9 | Daniel Koretz | "While the evidence about the past effects of TBA is generally clear, I stress in Charade that we have far less evidence to guide the development of alternatives," | Dismissive | A Realistic Perspective on High-Stakes Testing | Education Next, November 21, 2017 | ||||
10 | Daniel Koretz | "I explain that 25 years of research has shown that score inflation is common, that it is often very large, and that the limited research on its distribution suggests that both inflation and bad test preparation affect disadvantaged students more than others." | Dismissive | A Realistic Perspective on High-Stakes Testing | Education Next, November 21, 2017 | | | |
11 | Daniel M. Koretz | "However, this reasoning isn't just simple, it's simplistic--and the evidence is overwhelming that this approach [that testing can improve education] has failed. … these improvements are few and small. Hard evidence is limited, a consequence of our failure as a nation to evaluate these programs appropriately before imposing them on all children." | Dismissive | The Testing Charade: Pretending to Make Schools Better [Kindle location 142] | University of Chicago Press, 2017 | https://www.press.uchicago.edu/ucp/books/book/chicago/T/bo24695545.html | In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. | ||
12 | Daniel M. Koretz | "The bottom line: the information yielded by tests, while very useful, is never by itself adequate for evaluating programs, schools, or educators. Self-evident as this should be, it has been widely ignored in recent years. Indeed, ignoring this obvious warning has been the bedrock of test-based education reform." | Denigrating | The Testing Charade: Pretending to Make Schools Better [Kindle location 142] | University of Chicago Press, 2017 | https://www.press.uchicago.edu/ucp/books/book/chicago/T/bo24695545.html | I know of no testing professional who claims that testing by itself is adequate for evaluating programs, schools, or educators. But, by the same token, neither is any other measure used alone, such as inspections or graduation rates. |
13 | Daniel M. Koretz | "…as of the late 1980s there was not a single study evaluating whether inflation occurred or how severe it was. With three colleagues, I set out to conduct one." | 1stness | The Testing Charade: Pretending to Make Schools Better [Kindle location 142] | University of Chicago Press, 2017 | https://www.press.uchicago.edu/ucp/books/book/chicago/T/bo24695545.html | The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- preceded Koretz's by several years. See: http://nonpartisaneducation.org/Review/Books/CannellBook1.htm http://nonpartisaneducation.org/Review/Books/Cannell2.pdf |
14 | Daniel M. Koretz | "However, value-added estimates are rarely calculated with lower-stakes tests that are less likely to be inflated." | Dismissive | The Testing Charade: Pretending to Make Schools Better [Kindle location 142] | University of Chicago Press, 2017 | https://www.press.uchicago.edu/ucp/books/book/chicago/T/bo24695545.html | Almost all value-added measurements (VAM) are calculated on scores from tests with no stakes for the students. The state of Tennessee, which pioneered VAM and has continued to use it for two decades, uses nationally normed reference tests that have no stakes for anyone, including teachers. Moreover, research shows that low-stakes tests are more prone to score inflation than high-stakes tests. |
15 | Daniel M. Koretz | "One reason we know less than we should … is that most of the abundant test score data available to us are too vulnerable to score inflation to be trusted. There is a second reason for the dearth of information, the blame for which lies squarely on the shoulders of many of the reformers." | Dismissive | The Testing Charade: Pretending to Make Schools Better [Kindle location 142] | University of Chicago Press, 2017 | https://www.press.uchicago.edu/ucp/books/book/chicago/T/bo24695545.html | The vast amount of information already available just for the asking, worldwide, could help build better accountability systems, without wasting more research grant money on those who refuse to study what is already available. | ||
16 | Daniel M. Koretz | "High-quality evaluations of the test-based reforms aren't common, …" | Denigrating | The Testing Charade: Pretending to Make Schools Better [Kindle location 142] | University of Chicago Press, 2017 | https://www.press.uchicago.edu/ucp/books/book/chicago/T/bo24695545.html | Actually, high-quality evaluations of testing interventions have been numerous and common over the past century. Most of them do not produce the results that Koretz prefers, however, so he declares them nonexistent. See https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | ||
17 | Daniel M. Koretz | "The first solid study documenting score inflation was presented twenty-five years before I started writing this book." | 1stness | The Testing Charade: Pretending to Make Schools Better [Kindle location 142] | University of Chicago Press, 2017 | https://www.press.uchicago.edu/ucp/books/book/chicago/T/bo24695545.html | The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- preceded Koretz's by several years. See: http://nonpartisaneducation.org/Review/Books/CannellBook1.htm http://nonpartisaneducation.org/Review/Books/Cannell2.pdf |
18 | Daniel M. Koretz | "The first study showing illusory improvement in achievement gaps--the largely bogus "Texas miracle"--was published only ten years after that." | 1stness | The Testing Charade: Pretending to Make Schools Better [Kindle location 142] | University of Chicago Press, 2017 | https://www.press.uchicago.edu/ucp/books/book/chicago/T/bo24695545.html | The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- preceded Koretz's by several years. See: http://nonpartisaneducation.org/Review/Books/CannellBook1.htm http://nonpartisaneducation.org/Review/Books/Cannell2.pdf |
19 | Daniel Koretz | Carol Yu, Preeya P. Mbekeani, Meredith Langi, Tasmin Dhaliwal, David Braslow | "Despite these differences, research to date suggests that the two types of tests are roughly similar as predictors of performance in college. However, this research is very limited; it comprises only a few studies in a few contexts." | Dismissive | Predicting Freshman Grade Point Average From College Admissions Test Scores and State High School Test Scores, p.2 | AERA Open, October-December 2016, Vol. 2, No. 4, pp. 1–13 | | |
20 | Daniel Koretz | Carol Yu, Preeya P. Mbekeani, Meredith Langi, Tasmin Dhaliwal, David Braslow | "Most of the data are old, antedating the enactment of No Child Left Behind. They include no evidence about the predictive power of EOC tests and, with the exception of only a single weak contrast, present no evidence about summative tests that are high stakes for students. These studies do not include analysis of over- and underprediction as a function of student demographics, which is standard in validation studies of college admissions tests." | Denigrating | Predicting Freshman Grade Point Average From College Admissions Test Scores and State High School Test Scores, p.2 | AERA Open, October-December 2016, Vol. 2, No. 4, pp. 1–13 | | |
21 | Daniel M. Koretz | Jennifer L. Jennings, Hui Leng Ng, Carol Yu, David Braslow, Meredith Langi | "A number of studies have estimated smaller effects of coaching for the SAT, often in the range of 0.1–0.2 standard deviation on the mathematics test (e.g., Briggs, 2009; Domingue & Briggs, 2009; Powers & Rock, 1999). However, these studies reflect a different process than test prep in K–12 schools and are methodologically weaker; while most studies of K–12 score inflation rely on comparisons of identical or randomly equivalent groups, studies of SAT coaching rely on covariate-adjustment or propensity-score matching in an attempt to remove differences between coached and uncoached students." | Denigrating | Auditing for score inflation using self-monitoring assessments: Findings from three pilot studies | Harvard Library Office for Scholarly Communication, to be published in Educational Assessment | https://dash.harvard.edu/handle/1/28269315 | So now it seems there is, after all, other, previous research on test coaching; but Koretz et al. pick out only three of the many available studies and declare them inferior to their own work. Koretz et al.'s studies do not control for any aspect of test administration and, at best, make only meager efforts at content matching between the two tests they compare. |
22 | Daniel Koretz | "Second, there are a few studies in the literature in which motivational effects were ruled out. For example, in the first study of score inflation (Koretz, Linn, Dunbar, & Shepard, 1991), we administered a parallel form of the high-stakes test to a random subsample of classrooms instead of the target test and found no evidence of motivational effects in 3rd grade." | 1stness | Adapting Measurement to the Demands of Test-Based Accountability: Rejoinder to Commentaries, p.190 | Measurement, 13: 189–191, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- preceded Koretz's by several years. See: http://nonpartisaneducation.org/Review/Books/CannellBook1.htm http://nonpartisaneducation.org/Review/Books/Cannell2.pdf |
23 | Daniel Koretz | "'… presumably, tests that incorporate more performance assessments are less subject to TSI.' I do not believe that there is any evidence that this is the case." | | Adapting Measurement to the Demands of Test-Based Accountability: Rejoinder to Commentaries, p.193 | Measurement, 13: 189–191, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | The test prep, or test coaching, literature is vast and dates back decades, with meta-analyses since at least the 1970s; see the What Works Clearinghouse summary (https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf) and the studies listed in row 2, Gilmore (1927) through Herndon (2021). |
24 | Daniel Koretz | "To keep this discussion reasonable in length, I will not discuss the evaluation of other effects of TBA, although I agree with Haertel (2013), who asserted that more-extensive evaluation of impact is essential." | Dismissive | Adapting Educational Measurement to the Demands of Test-Based Accountability, p.2 | Measurement, 13: 1–25, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | ||
25 | Daniel Koretz | "Indeed, the first empirical study identifying score inflation (Koretz, Linn, Dunbar, & Shepard, 1991) was conducted in a district in which teachers perceived strong pressure to raise scores but did not face tangible sanctions of the sort that are common today." | 1stness | Adapting Educational Measurement to the Demands of Test-Based Accountability, p.2 | Measurement, 13: 1–25, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | The most famous, and most honest, study of test score inflation--which primarily blamed cheating, corruption, and lax test security for it--was conducted by John J. Cannell in the mid-1980s. Koretz and his colleagues at CRESST have misrepresented Cannell's reports for three decades. Now, Koretz is claiming that he conducted the first test score inflation study around 1990. See: http://nonpartisaneducation.org/Review/Books/CannellBook1.htm http://nonpartisaneducation.org/Review/Books/Cannell2.pdf The tests in Koretz's 1990 study had no stakes, hardly high-stakes by the standards of the time: in 1990, a majority of states administered tests that students had to pass to graduate or to advance to the next grade. |
26 | Daniel Koretz | "The growth of TBA has also spurred empirical research on the effects of high-stakes testing." | Dismissive | Adapting Educational Measurement to the Demands of Test-Based Accountability, p.4 | Measurement, 13: 1–25, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | Rubbish. Entire books dating back a century were written on the topic, for example: C.C. Ross, Measurement in Today’s Schools, 1942; G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927; C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88. | ||
27 | Daniel Koretz | "Research bearing on coaching is more rudimentary. Few of the relevant studies focused on the nonsubstantive details of specific tests that would provide the basis for many coaching strategies, and coaching is often subsumed under a broader category of test preparation." | Denigrating | Adapting Educational Measurement to the Demands of Test-Based Accountability, p.6 | Measurement, 13: 1–25, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses since at least the 1970s; see the What Works Clearinghouse summary (https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf) and the studies listed in row 2, Gilmore (1927) through Herndon (2021). |
28 | Daniel Koretz | "Although studies of score inflation are not numerous—relevant data are limited and policy makers usually have no incentive to make such studies feasible—it is nonetheless clear that inflation is common and is often very large." | Dismissive | Adapting Educational Measurement to the Demands of Test-Based Accountability, p.6 | Measurement, 13: 1–25, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | In fact, we know quite a lot about the source of higher levels of score inflation: it is lax test security. The many experimental studies of test coaching are consistent: coaching has a modest effect, not the volatile or very large effects that Koretz claims. |
29 | Daniel Koretz | "Most of the studies gauge inflation by evaluating the disparity in score trends between a high-stakes test and a lower stakes audit test, often the National Assessment of Educational Progress (NAEP)." | Denigrating | Adapting Educational Measurement to the Demands of Test-Based Accountability, p.6 | Measurement, 13: 1–25, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | Only most of Koretz's own studies are like that. Most relevant studies on the topic are experimental, comparing two groups -- one receiving coaching and the other not. |
30 | Daniel Koretz | "The first empirical study of inflation, a cluster-randomized experiment conducted in a context that by current standards was quite low-stakes,… Other studies of inflation have not been experiments," | 1stness | Adapting Educational Measurement to the Demands of Test-Based Accountability, p.7 | Measurement, 13: 1–25, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- preceded Koretz's by several years. See: http://nonpartisaneducation.org/Review/Books/CannellBook1.htm http://nonpartisaneducation.org/Review/Books/Cannell2.pdf. His 1991 test in fact had no stakes, in a period when over twenty states administered high-stakes graduation tests. Most relevant studies on the topic are experimental, comparing two groups -- one receiving coaching and the other not. |
31 | Daniel Koretz | "While calls for sampling from a broad range of items are not uncommon (e.g., Hanushek, 2009), I am aware of only 2 specific suggestions in the published literature for how this might be done, one calling for the use of 2 separate tests and the second suggesting embedding less predictable items in the test used for accountability. There is as yet almost no empirical work exploring the advantages and disadvantages of these (and of as yet unspecified alternative) approaches. This is an area in which there is a pressing need for research." | Dismissive | Adapting Educational Measurement to the Demands of Test-Based Accountability, p.16 | Measurement, 13: 1–25, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | |||
32 | Daniel Koretz | "The first empirical study that identified score inflation [Koretz et al., 1991] was conducted in an environment in which there were no explicit sanctions and rewards." | 1stness | Adapting Educational Measurement to the Demands of Test-Based Accountability, p.16 | Measurement, 13: 1–25, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- preceded Koretz's by several years. See: http://nonpartisaneducation.org/Review/Books/CannellBook1.htm http://nonpartisaneducation.org/Review/Books/Cannell2.pdf. His 1991 test in fact had no stakes, in a period when over twenty states administered high-stakes graduation tests. Most relevant studies on the topic are experimental, comparing two groups -- one receiving coaching and the other not. |
33 | Daniel Koretz | "I agree with Brennan (personal communication, October 11, 2007), who suggested that the data currently used in NEAT linking are not sufficient to resolve this problem. I am aware of no efforts to design and evaluate reasonable alternatives." | Dismissive | Adapting Educational Measurement to the Demands of Test-Based Accountability, p.19 | Measurement, 13: 1–25, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | |||
34 | Daniel Koretz | "Second, if items are too novel, they may be too salient and therefore too memorable to serve as uncorrupted linking items. There is as yet no research indicating whether this approach is feasible." | Dismissive | Adapting Educational Measurement to the Demands of Test-Based Accountability, p.20 | Measurement, 13: 1–25, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | |||
35 | Daniel Koretz | "Finally, we need more-frequent evaluation of the effects of testing. … We need evaluation of behavioral responses to testing, including explicit test preparation and other aspects of instruction. This is needed not only for evaluation but to inform the test-design decisions sketched above." | Dismissive | Adapting Educational Measurement to the Demands of Test-Based Accountability, p.21 | Measurement, 13: 1–25, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | Rubbish. Entire books on the topic date back a century, and empirical research on test use was abundant in the first half of the twentieth century; see the sources cited in row 26 (Ross 1942; Ruch & Stoddard 1927; Odell 1930; and others). |
36 | Daniel Koretz | "Third, independent studies augmenting conventional validation are hindered by the fact that in education, in contrast to some other areas of public policy, there is no expectation that data should be accessible for research purposes. Researchers proposing potentially threatening studies, such as evaluations of possible score inflation, are sometimes denied access to data." | Dismissive | Adapting Educational Measurement to the Demands of Test-Based Accountability, p.22 | Measurement, 13: 1–25, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | See row 1: there are privacy controls on student data (and even stronger controls on medical data, which Koretz implies are weaker), yet access to anonymized student data is granted all the time. It strains credulity that Koretz cannot find one district out of the many thousands to cooperate with him on a study to discredit testing. |
37 | Daniel Koretz | "The problem of linking under high-stakes conditions is a particularly difficult one that could benefit from research." | Dismissive | Adapting Educational Measurement to the Demands of Test-Based Accountability, p.22 | Measurement, 13: 1–25, 2015 | ISSN: 1536-6367 print / 1536-6359 online DOI: 10.1080/15366367.2015.1000712 | | |
38 | Daniel M. Koretz | Waldman, Yu, Langi, Orzech | "However, the literature investigating the distribution of score inflation remains limited. Much of it is highly aggregated, e.g., comparing inflation for subgroups at the level of states (e.g., Klein et al., 2000)." p.1 | Denigrating | Using the introduction of a new test to investigate the distribution of score inflation | Working paper of Education Accountability Project at the Harvard Graduate School of Education, Nov. 2014 | http://projects.iq.harvard.edu/files/eap/files/ky_cot_3_2_15_working_paper.pdf | The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education through Grant R305A110420 to the President and Fellows of Harvard College. |
39 | Daniel M. Koretz | Waldman, Yu, Langi, Orzech | "Using statewide data from Kentucky, this study uses a novel approach to explore the distribution of score inflation: we examine the distribution of changes in performance when [a] long-standing high-stakes test was replaced by a new high-stakes test aligned with the Common Core state standards." p.1 | 1stness | Using the introduction of a new test to investigate the distribution of score inflation | Working paper of Education Accountability Project at the Harvard Graduate School of Education, Nov. 2014 | http://projects.iq.harvard.edu/files/eap/files/ky_cot_3_2_15_working_paper.pdf | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses since at least the 1970s; see the What Works Clearinghouse summary (https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf) and the studies listed in row 2, Gilmore (1927) through Herndon (2021). | The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education through Grant R305A110420 to the President and Fellows of Harvard College.
40 | Daniel M. Koretz | Waldman, Yu, Langi, Orzech | “Few studies have applied a multi-level framework to the evaluation of inflation,” p. 1 | Denigrating | Using the introduction of a new test to investigate the distribution of score inflation | Working paper of Education Accountability Project at the Harvard Graduate School of Education, Nov. 2014 | http://projects.iq.harvard.edu/files/eap/files/ky_cot_3_2_15_working_paper.pdf | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses since at least the 1970s; see the What Works Clearinghouse summary (https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf) and the studies listed in row 2, Gilmore (1927) through Herndon (2021). | The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education through Grant R305A110420 to the President and Fellows of Harvard College.
41 | Daniel M. Koretz | Waldman, Yu, Langi, Orzech | "The problem of score inflation has been well documented over the past quarter century. … (Lindquist, 1951). However, the empirical literature evaluating inflation arose decades later in response to the increasing importance of high-stakes testing," p.2 | Dismissive | Using the introduction of a new test to investigate the distribution of score inflation | Working paper of Education Accountability Project at the Harvard Graduate School of Education, Nov. 2014 | http://projects.iq.harvard.edu/files/eap/files/ky_cot_3_2_15_working_paper.pdf | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses since at least the 1970s; see the What Works Clearinghouse summary (https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf) and the studies listed in row 2, Gilmore (1927) through Herndon (2021). | The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education through Grant R305A110420 to the President and Fellows of Harvard College.
42 | Daniel M. Koretz | Waldman, Yu, Langi, Orzech | "Most often, potential inflation has been evaluated by comparing trends in scores on a high-stakes test to trends on another, lower-stakes “audit” test measuring a similar domain." p.2 | Dismissive | Using the introduction of a new test to investigate the distribution of score inflation | Working paper of Education Accountability Project at the Harvard Graduate School of Education, Nov. 2014 | http://projects.iq.harvard.edu/files/eap/files/ky_cot_3_2_15_working_paper.pdf | No, most often test-score inflation caused by teaching to the test has been measured in experimental studies of test prep / test coaching -- a large research literature that Koretz ignores. | The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education through Grant R305A110420 to the President and Fellows of Harvard College.
43 | Daniel M. Koretz | Waldman, Yu, Langi, Orzech | "Although no studies to date directly link specific behaviors of individual educators to score inflation, …" p.3 | Dismissive | Using the introduction of a new test to investigate the distribution of score inflation | Working paper of Education Accountability Project at the Harvard Graduate School of Education, Nov. 2014 | http://projects.iq.harvard.edu/files/eap/files/ky_cot_3_2_15_working_paper.pdf | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses since at least the 1970s; see the What Works Clearinghouse summary (https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf) and the studies listed in row 2, Gilmore (1927) through Herndon (2021). | The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education through Grant R305A110420 to the President and Fellows of Harvard College.
44 | Daniel M. Koretz | Waldman, Yu, Langi, Orzech | "The limited number of findings to date suggesting greater test preparation and score inflation among disadvantaged students is not surprising given the specifics of schooling and accountability in the U.S." p.4 | Dismissive | Using the introduction of a new test to investigate the distribution of score inflation | Working paper of Education Accountability Project at the Harvard Graduate School of Education, Nov. 2014 | http://projects.iq.harvard.edu/files/eap/files/ky_cot_3_2_15_working_paper.pdf | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses since at least the 1970s; see the What Works Clearinghouse summary (https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf) and the studies listed in row 2, Gilmore (1927) through Herndon (2021). | The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education through Grant R305A110420 to the President and Fellows of Harvard College.
45 | Daniel M. Koretz | Waldman, Yu, Langi, Orzech | "This study evaluates these hypotheses by examining changes in performance when one high-stakes test is replaced by another. This approach, not previously used in this literature…." p.6 | Dismissive | Using the introduction of a new test to investigate the distribution of score inflation | Working paper of Education Accountability Project at the Harvard Graduate School of Education, Nov. 2014 | http://projects.iq.harvard.edu/files/eap/files/ky_cot_3_2_15_working_paper.pdf | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses since at least the 1970s; see the What Works Clearinghouse summary (https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf) and the studies listed in row 2, Gilmore (1927) through Herndon (2021). | The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education through Grant R305A110420 to the President and Fellows of Harvard College.
46 | Daniel M. Koretz | "What we don’t know, What is the net effect on student
achievement? - Weak research designs, weaker data - Some evidence of inconsistent, modest effects in elementary math, none in reading - Effects are likely to vary across contexts... Reason: grossly inadequate research and evaluation" |
Denigrating | Using tests for monitoring and accountability | Presentation at: Agencia de Calidad de la Educación Santiago, Chile, November 3, 2014 | See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |||
47 | Rebecca Holcombe, Jennifer L. Jennings, Daniel Koretz | "To date, few studies have attempted to understand the sources of variation in score inflation across testing programs. In particular, research has not identified the specific characteristics of tests that facilitate or impede score inflation and inappropriate test preparation, that is, test preparation that inflates scores. Without this information, it is impossible to improve existing assessments to lessen these problems." | Dismissive | THE ROOTS OF SCORE INFLATION: An Examination of Opportunities in Two States’ Tests, p.164 | Chapter 7, in Gail Sunderman, Ed., (2013) Charting Reform, Achieving Equity in a Diverse Nation, pp. 163–189 | http://dash.harvard.edu/bitstream/handle/1/10880587/roots%20of%20score%20inflation.pdf?sequence=1 | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses since at least the 1970s; see the What Works Clearinghouse summary (https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf) and the studies listed in row 2, Gilmore (1927) through Herndon (2021). |
48 | Rebecca Holcombe, Jennifer L. Jennings, Daniel Koretz | "This chapter is the first attempt in the literature to systematically investigate the opportunities for score inflation within current tests." | 1stness | THE ROOTS OF SCORE INFLATION: An Examination of Opportunities in Two States’ Tests, p.164 | Chapter 7, in Gail Sunderman, Ed., (2013) Charting Reform, Achieving Equity in a Diverse Nation, pp. 163–189 | http://dash.harvard.edu/bitstream/handle/1/10880587/roots%20of%20score%20inflation.pdf?sequence=1 | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses since at least the 1970s; see the What Works Clearinghouse summary (https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf) and the studies listed in row 2, Gilmore (1927) through Herndon (2021). |
49 | Rebecca Holcombe, Jennifer L. Jennings, Daniel Koretz | "The first empirical investigation of inflation was conducted by Koretz et al. (1991) in a district with a testing policy that was high-stakes by the standards of the day, but much lower stakes than the norm today." | 1stness | THE ROOTS OF SCORE INFLATION: An Examination of Opportunities in Two States’ Tests, p. 166 | Chapter 7, in Gail Sunderman, Ed., (2013) Charting Reform, Achieving Equity in a Diverse Nation, pp. 163–189 | http://dash.harvard.edu/bitstream/handle/1/10880587/roots%20of%20score%20inflation.pdf?sequence=1 | The most famous, and most honest, study of test score inflation--which primarily blamed cheating, corruption, and lax test security for it--was conducted by John J. Cannell in the mid-1980s. Koretz and his colleagues at CRESST have misrepresented Cannell's reports for three decades. Now, Koretz is claiming that he conducted the first test score inflation study around 1990. See: http://nonpartisaneducation.org/Review/Books/CannellBook1.htm http://nonpartisaneducation.org/Review/Books/Cannell2.pdf The tests in Koretz's 1990 study had no stakes, hardly high-stakes by the standards of the time: in 1990, a majority of states administered tests that students had to pass to graduate or to advance to the next grade. |
50 | Rebecca Holcombe, Jennifer L. Jennings, Daniel Koretz | "Research has not yet tied variations in score inflation to specific forms of test preparation." | Dismissive | THE ROOTS OF SCORE INFLATION: An Examination of Opportunities in Two States’ Tests, p. 168 | Chapter 7, in Gail Sunderman, Ed., (2013) Charting Reform, Achieving Equity in a Diverse Nation, pp. 163–189 | http://dash.harvard.edu/bitstream/handle/1/10880587/roots%20of%20score%20inflation.pdf?sequence=1 | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses since at least the 1970s; see the What Works Clearinghouse summary (https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf) and the studies listed in row 2, Gilmore (1927) through Herndon (2021). |
51 | Rebecca Holcombe, Jennifer L. Jennings, Daniel Koretz | "This study provides only a first glimpse of the opportunities for inappropriate test preparation provided by current high-stakes tests." | 1stness | THE ROOTS OF SCORE INFLATION: An Examination of Opportunities in Two States’ Tests, p. 184 | Chapter 7, in Gail Sunderman, Ed., (2013) Charting Reform, Achieving Equity in a Diverse Nation, pp. 163–189 | http://dash.harvard.edu/bitstream/handle/1/10880587/roots%20of%20score%20inflation.pdf?sequence=1 | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses since at least the 1970s; see the What Works Clearinghouse summary (https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf) and the studies listed in row 2, Gilmore (1927) through Herndon (2021). |
52 | Daniel Koretz | "Studies of score inflation are not all that numerous, but they indicate that the resulting inflation is common and sometimes very large (e.g., Jacob, 2005; Klein, Hamilton, McCaffrey, & Stecher 2000; Koretz & Barron, 1998; Koretz, Linn, Dunbar, & Shepard, 1991). Most of these studies evaluate extrapolation from the high-stakes test to a lower-stakes audit test, often the National Assessment of Educational Progress (NAEP)." | | Commentary on E. Haertel, "How Is Testing Supposed to Improve Schooling?" p.41 | Measurement, 11: 40–43, 2013 | Only most of Koretz's own studies are like that. Most relevant studies on the topic are experimental, comparing two groups -- one receiving coaching and the other not. |
53 | Daniel Koretz | "We face four fundamental barriers to improved evaluation. ... Third, in most cases, states have little incentive to foster this evaluation. Our current accountability system has no countervailing incentives." | Commentary on E. Haertel, "How Is Testing Supposed to Improve Schooling?" p.42 | Measurement, 11: 40–43, 2013 | This is a cynical swipe at state governments, implying that they do not care whether their testing programs are valid. In fact, many state governments have passed "sunshine" laws that require evaluations of new programs, or of old programs nearing their "sunset" termination. These evaluations do not ordinarily end up in academic journals, but they can be found in state offices or archives if one knows where to look and whom to ask. | | |
54 | Daniel Koretz | "Research has shown several serious problems with accountability as implemented in the U.S., including sometimes severe inflation of scores. However, the extant research is limited and leaves many essential questions about the design of accountability systems unanswered." | Dismissive | Lessons from test-based education reform in the U.S., p. 9 | Z Erziehungswiss (2011) 14:9–23 | See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |||
55 | Daniel Koretz | "The most important question is whether the accountability programs have made schools more effective and have increased student achievement. There has been a lively scholarly debate about this in the U.S., and to a lesser degree in Europe as well, but the question remains unanswered." | Dismissive | Lessons from test-based education reform in the U.S., p. 12 | Z Erziehungswiss (2011) 14:9–23 | See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |||
56 | Daniel Koretz | "Some of the relevant studies are very weakly designed, and most of the well-designed ones rely on data that are insufficient for this use." | Denigrating | Lessons from test-based education reform in the U.S., p. 12 | Z Erziehungswiss (2011) 14:9–23 | Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); Hurlock (1925); and Zeng (2001). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. | "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones." ||
57 | Daniel Koretz | "Rather than attempting to estimate the effects of undifferentiated test-based accountability systems using happenstance data, the field needs to use appropriate data to evaluate the varying effects of different approaches to accountability. As yet, we have almost no such research." | Dismissive | Lessons from test-based education reform in the U.S., p. 12 | Z Erziehungswiss (2011) 14:9–23 | See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |||
58 | Daniel Koretz | "Despite this long-standing controversy, however, we have very little research comparing the effects of various ways of setting targets." | Dismissive | Lessons from test-based education reform in the U.S., p.13 | Z Erziehungswiss (2011) 14:9–23 | Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); Hurlock (1925); and Zeng (2001). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. | "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones." ||
59 | Daniel Koretz | [re: standards-based reporting] "Yet here again, extant research is not adequate: one could devise many metrics for reporting performance, and we do not have direct comparisons of their effects." | Denigrating | Lessons from test-based education reform in the U.S. p. 13 | Z Erziehungswiss (2011) 14:9–23 | Relevant studies include: Forte Fast, E., & the Accountability Systems and Reporting State Collaborative on Assessment and Student Standards. (2002). A guide to effective accountability reporting. Washington, DC: Council of Chief State School Officers. * Goodman, D., & Hambleton, R.K. (2005). Some misconceptions about large-scale educational assessments, Chapter 4 in Richard P Phelps (Ed.) Defending Standardized Testing, Psychology Press. * Goodman, D. P., & Hambleton (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education. * Hambleton, R. K. (2002). How can we make NAEP and state test score reporting scales and reports more understandable? In R. W. Lissitz & W. D. Schafer (Eds.), Assessment in educational reform (pp. 192-205). Boston: Allyn & Bacon. * Impara, J. C., Divine, K. P., Bruce, F. A., Liverman, M. R., & Gay, A. (1991). Does interpretive test score information help teachers? Educational Measurement: Issues and Practice, 10(4), 16-18. * Wainer, H., Hambleton, R. K., & Meara, K. (1999). Alternative displays for communicating NAEP results: A redesign and validity study. Journal of Educational Measurement, 36(4), 301-335. | |||
60 | Daniel Koretz | "Evidence of the effects on practice of using growth rather than CTC measures is extremely sparse. This too is an area in which additional research and development is needed." | Dismissive | Lessons from test-based education reform in the U.S. pp. 13-14 | Z Erziehungswiss (2011) 14:9–23 | | | |
61 | Daniel Koretz | "In the U.S., commitment to including these students [with disabilities] continues to be widespread, but disagreements about the best way of doing so remain strong. Research addressing these questions has grown markedly in recent years but remains inadequate." p. 15 | Dismissive | Lessons from test-based education reform in the U.S. | Z Erziehungswiss (2011) 14:9–23 | Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing. | |||
62 | Daniel Koretz | "Because the NCLB law is very specific about how disaggregated reporting must be done, there has been little research exploring the advantages and disadvantages of different approaches to this." p. 15 | Dismissive | Lessons from test-based education reform in the U.S. | Z Erziehungswiss (2011) 14:9–23 | Relevant studies include: Forte Fast, E., & the Accountability Systems and Reporting State Collaborative on Assessment and Student Standards. (2002). A guide to effective accountability reporting. Washington, DC: Council of Chief State School Officers. * Goodman, D., & Hambleton, R.K. (2005). Some misconceptions about large-scale educational assessments, Chapter 4 in Richard P Phelps (Ed.) Defending Standardized Testing, Psychology Press. * Goodman, D. P., & Hambleton (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education. * Hambleton, R. K. (2002). How can we make NAEP and state test score reporting scales and reports more understandable? In R. W. Lissitz & W. D. Schafer (Eds.), Assessment in educational reform (pp. 192-205). Boston: Allyn & Bacon. * Impara, J. C., Divine, K. P., Bruce, F. A., Liverman, M. R., & Gay, A. (1991). Does interpretive test score information help teachers? Educational Measurement: Issues and Practice, 10(4), 16-18. * Wainer, H., Hambleton, R. K., & Meara, K. (1999). Alternative displays for communicating NAEP results: A redesign and validity study. Journal of Educational Measurement, 36(4), 301-335. | |||
63 | Daniel Koretz | "Inflation is also often highly variable among classrooms and teachers within a given system, but research exploring this variation is in its infancy." | Dismissive | Lessons from test-based education reform in the U.S., p.16 | Z Erziehungswiss (2011) 14:9–23 | In fact, we know quite a lot about the source of higher levels of score inflation -- it is lax test security. The many experimental studies of test coaching are consistent, it has some modest effect, and not the volatile or very large effects that Koretz claims. | |||
64 | Daniel Koretz | "… under this pressure, educators and students have an incentive to focus on the tested sample, at the cost of other important elements of the domain, thus undermining the test score’s representation of the domain. This problem has received relatively little focus in the measurement field." | Dismissive | Lessons from test-based education reform in the U.S., p.17 | Z Erziehungswiss (2011) 14:9–23 | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | |||
65 | Daniel Koretz | "Our studies of the validity of score gains have been few in number and unsystematic because of the persistent misconception that the programs are self-evaluating." | Denigrating | Lessons from test-based education reform in the U.S., p.20 | Z Erziehungswiss (2011) 14:9–23 | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | |||
66 | Daniel Koretz | "As noted above, there are still many large gaps in the research evidence … the development of performance accountability systems with a better mix of positive and negative effects will require a substantial amount of additional research." | Dismissive | Lessons from test-based education reform in the U.S., p.20 | Z Erziehungswiss (2011) 14:9–23 | ||||
67 | Daniel Koretz | "In principle, tests could be designed to address this issue. Such tests should make it feasible to distinguish score inflation from meaningful gains and to improve the incentives for teachers. However, no designs have yet been tried and evaluated." | Dismissive | Lessons from test-based education reform in the U.S., p.20 | Z Erziehungswiss (2011) 14:9–23 | In fact, we know quite a lot about the source of higher levels of score inflation -- it is lax test security. The many experimental studies of test coaching are consistent, it has some modest effect, and not the volatile or very large effects that Koretz claims. | |||
68 | Daniel Koretz | "Refinements in test design, however, are unlikely to be sufficient. There is a pressing need for research that investigates the relative effects of different designs for accountability systems." | Dismissive | Lessons from test-based education reform in the U.S., p.21 | Z Erziehungswiss (2011) 14:9–23 | Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); Hurlock (1925); and Zeng (2001). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. | "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones." ||
69 | Daniel Koretz | "Evidence is beginning to accumulate that these choices have consequences for educators’ behavior, but systematic comparisons remain nearly non-existent." | Dismissive | Lessons from test-based education reform in the U.S., p.21 | Z Erziehungswiss (2011) 14:9–23 | Rubbish. Entire books dating back a century were written on the topic, for example: C.C. Ross, Measurement in Today’s Schools, 1942; G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927; C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88. | |||
70 | Daniel Koretz | "It is axiomatic that evaluations of schools should be based on multiple measures, but we have very little evidence about the effects of various approaches to doing so." | Dismissive | Lessons from test-based education reform in the U.S., p.21 | Z Erziehungswiss (2011) 14:9–23 | See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |||
71 | Daniel Koretz | "Subjective measures are widely used in other fields to help offset the limitations of available objective measures, and there are many educational systems that make substantial use of them … These systems have generated some research, but comparative research exploring the effects of various ways of using judgmental measures remains limited." | Dismissive | Lessons from test-based education reform in the U.S., p.21 | Z Erziehungswiss (2011) 14:9–23 | See a review of hundreds of studies: Brookhart, S. M., Guskey, T. R., Bowers, A. J., McMillan, J. H., Smith, J. K., Smith, L. F., Stevens, M. T., & Welsh, M. E. (2016). A Century of Grading Research: Meaning and Value in the Most Common Educational Measure. Review of Educational Research, 86(4), 803-848. https://doi.org/10.3102/0034654316672069 |||
72 | Daniel M. Koretz | Jennifer L. Jennings | “We find that research on the use of test score data is limited, and research investigating the understanding of tests and score data is meager.” p. 1 | Dismissive | The Misunderstanding and Use of Data from Educational Tests | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities/ | Relevant studies include: Forte Fast, E., & the Accountability Systems and Reporting State Collaborative on Assessment and Student Standards. (2002). A guide to effective accountability reporting. Washington, DC: Council of Chief State School Officers. * Goodman, D., & Hambleton, R.K. (2005). Some misconceptions about large-scale educational assessments, Chapter 4 in Richard P Phelps (Ed.) Defending Standardized Testing, Psychology Press. * Goodman, D. P., & Hambleton (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education. * Hambleton, R. K. (2002). How can we make NAEP and state test score reporting scales and reports more understandable? In R. W. Lissitz & W. D. Schafer (Eds.), Assessment in educational reform (pp. 192-205). Boston: Allyn & Bacon. * Impara, J. C., Divine, K. P., Bruce, F. A., Liverman, M. R., & Gay, A. (1991). Does interpretive test score information help teachers? Educational Measurement: Issues and Practice, 10(4), 16-18. * Wainer, H., Hambleton, R. K., & Meara, K. (1999). Alternative displays for communicating NAEP results: A redesign and validity study. Journal of Educational Measurement, 36(4), 301-335. | |
73 | Daniel M. Koretz | Jennifer L. Jennings | “Because of the sparse research literature, we rely on experience and anecdote in parts of this paper, with the premise that these conclusions should be supplanted over time by findings from systematic research.” p. 1 | Dismissive | The Misunderstanding and Use of Data from Educational Tests | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities | Relevant studies include: Forte Fast, E., & the Accountability Systems and Reporting State Collaborative on Assessment and Student Standards. (2002). A guide to effective accountability reporting. Washington, DC: Council of Chief State School Officers. * Goodman, D., & Hambleton, R.K. (2005). Some misconceptions about large-scale educational assessments, Chapter 4 in Richard P Phelps (Ed.) Defending Standardized Testing, Psychology Press. * Goodman, D. P., & Hambleton (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education. * Hambleton, R. K. (2002). How can we make NAEP and state test score reporting scales and reports more understandable? In R. W. Lissitz & W. D. Schafer (Eds.), Assessment in educational reform (pp. 192-205). Boston: Allyn & Bacon. * Impara, J. C., Divine, K. P., Bruce, F. A., Liverman, M. R., & Gay, A. (1991). Does interpretive test score information help teachers? Educational Measurement: Issues and Practice, 10(4), 16-18. * Wainer, H., Hambleton, R. K., & Meara, K. (1999). Alternative displays for communicating NAEP results: A redesign and validity study. Journal of Educational Measurement, 36(4), 301-335. | |
74 | Daniel M. Koretz | Jennifer L. Jennings | "...the relative performance of schools is difficult to interpret in the presence of score inflation. At this point, we know very little about the factors that may predict higher levels of inflation —for example, characteristics of tests, accountability systems, students, or schools." p.4 | Dismissive | The Misunderstanding and Use of Data from Educational Tests | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities | In fact, we know quite a lot about the source of higher levels of score inflation -- it is lax test security. The many experimental studies of test coaching are consistent, it has some modest effect, and not the volatile or very large effects that Koretz claims. | |
75 | Daniel M. Koretz | Jennifer L. Jennings | "Unfortunately, it is often exceedingly difficult to obtain the permission and access needed to carry out testing-related research in the public education sector. This is particularly so if the research holds out the possibility of politically inconvenient findings, which virtually all evaluations in this area do. In our experience, very few state or district superintendents or commissioners consider it an obligation to provide the public or the field with open and impartial research." | Dismissive | The Misunderstanding and Use of Data from Educational Tests, pp.4-5 | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities/ | Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that Koretz cannot find one district out of the many thousands to cooperate with him to discredit testing. | |
76 | Daniel M. Koretz | Jennifer L. Jennings | “We focus on three issues that are especially relevant to test-based data and about which research is currently sparse: How do the types of data made available for use affect policymakers’ and educators’ understanding of data? What are the common errors made by policymakers and educators in interpreting test score data? How do high-stakes testing and the availability of test-based data affect administrator and teacher practice?” (p. 5) | Dismissive | The Misunderstanding and Use of Data from Educational Tests | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities | Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice: Goslin (1967); *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies; study is a research review, research synthesis, or meta-analysis. | Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
77 | Daniel M. Koretz | Jennifer L. Jennings | “Systematic research exploring educators’ understanding of both the principles of testing and appropriate interpretation of test-based data is meager.”, p.5 | Dismissive | The Misunderstanding and Use of Data from Educational Tests | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities | Relevant studies include: Forte Fast, E., & the Accountability Systems and Reporting State Collaborative on Assessment and Student Standards. (2002). A guide to effective accountability reporting. Washington, DC: Council of Chief State School Officers. * Goodman, D., & Hambleton, R.K. (2005). Some misconceptions about large-scale educational assessments, Chapter 4 in Richard P Phelps (Ed.) Defending Standardized Testing, Psychology Press. * Goodman, D. P., & Hambleton (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education. * Hambleton, R. K. (2002). How can we make NAEP and state test score reporting scales and reports more understandable? In R. W. Lissitz & W. D. Schafer (Eds.), Assessment in educational reform (pp. 192-205). Boston: Allyn & Bacon. * Impara, J. C., Divine, K. P., Bruce, F. A., Liverman, M. R., & Gay, A. (1991). Does interpretive test score information help teachers? Educational Measurement: Issues and Practice, 10(4), 16-18. * Wainer, H., Hambleton, R. K., & Meara, K. (1999). Alternative displays for communicating NAEP results: A redesign and validity study. Journal of Educational Measurement, 36(4), 301-335. | |
78 | Daniel M. Koretz | Jennifer L. Jennings | "Although current, systematic information is lacking, our experience is that that the level of understanding of test data among both educators and education policymakers is in many cases abysmally low.", p.6 | Dismissive | The Misunderstanding and Use of Data from Educational Tests | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities | Relevant studies include: Forte Fast, E., & the Accountability Systems and Reporting State Collaborative on Assessment and Student Standards. (2002). A guide to effective accountability reporting. Washington, DC: Council of Chief State School Officers. * Goodman, D., & Hambleton, R.K. (2005). Some misconceptions about large-scale educational assessments, Chapter 4 in Richard P Phelps (Ed.) Defending Standardized Testing, Psychology Press. * Goodman, D. P., & Hambleton (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education. * Hambleton, R. K. (2002). How can we make NAEP and state test score reporting scales and reports more understandable? In R. W. Lissitz & W. D. Schafer (Eds.), Assessment in educational reform (pp. 192-205). Boston: Allyn & Bacon. * Impara, J. C., Divine, K. P., Bruce, F. A., Liverman, M. R., & Gay, A. (1991). Does interpretive test score information help teachers? Educational Measurement: Issues and Practice, 10(4), 16-18. * Wainer, H., Hambleton, R. K., & Meara, K. (1999). Alternative displays for communicating NAEP results: A redesign and validity study. Journal of Educational Measurement, 36(4), 301-335. | |
79 | Daniel M. Koretz | Jennifer L. Jennings | "There has been a considerably (sic) amount of research exploring problems with standards-based reporting, but less on the use and interpretation of standards-based data by important stakeholders." p.12 | Dismissive | The Misunderstanding and Use of Data from Educational Tests | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities | Relevant studies include: Forte Fast, E., & the Accountability Systems and Reporting State Collaborative on Assessment and Student Standards. (2002). A guide to effective accountability reporting. Washington, DC: Council of Chief State School Officers. * Goodman, D., & Hambleton, R.K. (2005). Some misconceptions about large-scale educational assessments, Chapter 4 in Richard P Phelps (Ed.) Defending Standardized Testing, Psychology Press. * Goodman, D. P., & Hambleton (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education. * Hambleton, R. K. (2002). How can we make NAEP and state test score reporting scales and reports more understandable? In R. W. Lissitz & W. D. Schafer (Eds.), Assessment in educational reform (pp. 192-205). Boston: Allyn & Bacon. * Impara, J. C., Divine, K. P., Bruce, F. A., Liverman, M. R., & Gay, A. (1991). Does interpretive test score information help teachers? Educational Measurement: Issues and Practice, 10(4), 16-18. * Wainer, H., Hambleton, R. K., & Meara, K. (1999). Alternative displays for communicating NAEP results: A redesign and validity study. Journal of Educational Measurement, 36(4), 301-335. | |
80 | Daniel M. Koretz | Jennifer L. Jennings | "We have heard former teachers discuss this frequently, saying that new teachers in many schools are inculcated with the notion that raising scores in tested subjects is in itself the appropriate goal of instruction. However, we lack systematic data about this..." p.14 | Dismissive | The Misunderstanding and Use of Data from Educational Tests | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities | See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |
81 | Daniel M. Koretz | Jennifer L. Jennings | "Research on score inflation is not abundant, largely for the reason discussed above: policymakers for the most part feel no obligation to allow the relevant research, which is not in their self-interest even when it is in the interests of students in schools. However, at this time, the evidence is both abundant enough and sufficiently often discussed that the existence of the general issue of score inflation appears to be increasingly widely recognized by the media, policymakers, and educators." p.17 | Dismissive | The Misunderstanding and Use of Data from Educational Tests | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities | Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that Koretz cannot find one district out of the many thousands to cooperate with him to discredit testing. | |
82 | Daniel M. Koretz | Jennifer L. Jennings | "The issue of score inflation is both poorly understood and widely ignored in the research community as well." p.18 | Denigrating | The Misunderstanding and Use of Data from Educational Tests | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | |
83 | Daniel M. Koretz | Jennifer L. Jennings | "Research on coaching is very limited." p.21 | Dismissive | The Misunderstanding and Use of Data from Educational Tests | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities | ||
84 | Daniel M. Koretz | Jennifer L. Jennings | "How is test-based information used by educators? … The types of research done to date on this topic, while useful, are insufficient." p.26 | Denigrating | The Misunderstanding and Use of Data from Educational Tests | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities | Relevant studies include: Forte Fast, E., & the Accountability Systems and Reporting State Collaborative on Assessment and Student Standards. (2002). A guide to effective accountability reporting. Washington, DC: Council of Chief State School Officers. * Goodman, D., & Hambleton, R.K. (2005). Some misconceptions about large-scale educational assessments, Chapter 4 in Richard P Phelps (Ed.) Defending Standardized Testing, Psychology Press. * Goodman, D. P., & Hambleton (2004). Student test score reports and interpretive guides: Review of current practices and suggestions for future research. Applied Measurement in Education. * Hambleton, R. K. (2002). How can we make NAEP and state test score reporting scales and reports more understandable? In R. W. Lissitz & W. D. Schafer (Eds.), Assessment in educational reform (pp. 192-205). Boston: Allyn & Bacon. * Impara, J. C., Divine, K. P., Bruce, F. A., Liverman, M. R., & Gay, A. (1991). Does interpretive test score information help teachers? Educational Measurement: Issues and Practice, 10(4), 16-18. * Wainer, H., Hambleton, R. K., & Meara, K. (1999). Alternative displays for communicating NAEP results: A redesign and validity study. Journal of Educational Measurement, 36(4), 301-335. | |
85 | Daniel M. Koretz | Jennifer L. Jennings | "… We need to design ways of measuring coaching, which has been almost entirely unstudied." p.26 | Dismissive | The Misunderstanding and Use of Data from Educational Tests | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | |
86 | Daniel M. Koretz | Jennifer L. Jennings | “We have few systematic studies of variations in educators’ responses. …” p. 26 | Dismissive | The Misunderstanding and Use of Data from Educational Tests | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities | Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice: Goslin (1967); *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies; study is a research review, research synthesis, or meta-analysis. | Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
87 | Daniel M. Koretz | Jennifer L. Jennings | "Ultimately, our concern is the impact of educators’ understanding and use of test data on student learning. However, at this point, we have very little comparative information about the validity of gains, .... The comparative information that is beginning to emerge suggests..." p.26 | Dismissive | The Misunderstanding and Use of Data from Educational Tests | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | |
88 | Daniel Koretz | “[T]here is a disturbing lack of good evaluations of these systems. …”, sidebar | Denigrating | Tyler Heights is not alone | American Educator (Summer 2010) | http://www.aft.org/sites/default/files/periodicals/Perlstein.pdf | |||
89 | Daniel Koretz | Anton Béguin | "In the past, score inflation has usually been evaluated by comparing trends in scores on a high-stakes test to trends on a lower-stakes audit test.", abstract | Dismissive | Self-Monitoring Assessment for Educational Accountability Systems | Measurement: Interdisciplinary Research and Perspectives, 8(2–3), 92–109. | No, most of the research on test prep, test coaching, and score inflation has been conducted in experiments. In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) | ||
90 | Daniel Koretz | Anton Béguin | "In most of the research to date, score inflation has been evaluated by comparing trends on a high-stakes test to trends on an audit test—a low- or lower-stakes test intended to measure a reasonably similar domain of achievement." p.93 | Dismissive | Self-Monitoring Assessment for Educational Accountability Systems | Measurement: Interdisciplinary Research and Perspectives, 8(2–3), 92–109. | |||
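The audit-test method described in the two rows above reduces to a simple trend comparison. The sketch below uses invented score trends; "inflation" is read off as the gap between standardized gains on the high-stakes test and on the audit test, which is also why the surrounding notes object that uncontrolled differences in security, content, and administration can masquerade as inflation.

# Hypothetical audit-test comparison: all numbers invented for illustration.
high_stakes = {2005: 500.0, 2006: 512.0, 2007: 526.0}  # accountability-test means
audit       = {2005: 500.0, 2006: 503.0, 2007: 505.0}  # lower-stakes audit-test means
SD = 100.0  # assume both tests are scaled with a standard deviation of 100

def total_gain_in_sd(series, sd):
    years = sorted(series)
    return (series[years[-1]] - series[years[0]]) / sd

hs = total_gain_in_sd(high_stakes, SD)
au = total_gain_in_sd(audit, SD)
print(f"high-stakes gain: {hs:.2f} SD; audit gain: {au:.2f} SD")
print(f"divergence read as score inflation: {hs - au:.2f} SD")

Nothing in this arithmetic distinguishes inflation from differences in the two tests' content, security, or stakes, which is the omitted-variables objection raised in row 109 below.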
91 | Daniel M. Koretz | "There is a lack of persuasive evidence of positive effects from test-based accountability." p.1 | Dismissive | Implications of Current Policy for Educational Measurement. Policy Brief | Center for K–12 Assessment & Performance Management, Educational Testing Service | http://www.k12center.org/publications.html | See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | ||
92 | Daniel M. Koretz | "Confronting these problems requires improvements in the design of both accountability systems and the tests used in them." p.1 | Denigrating | Implications of Current Policy for Educational Measurement. Policy Brief | Center for K–12 Assessment & Performance Management, Educational Testing Service | http://www.k12center.org/publications.html | In other words, there isn't enough research ... and there never will be. | |
93 | Daniel M. Koretz | "The measurement field has not drawn from research in other fields on accountability systems. Rather, it has proceeded as if it were working in isolation. It also has not conducted sufficient research on the problems being encountered in test-based accountability." p.2 | Dismissive | Implications of Current Policy for Educational Measurement. Policy Brief | Center for K–12 Assessment & Performance Management, Educational Testing Service | http://www.k12center.org/publications.html | In other words, there isn't enough research ... and there never will be. | |
94 | Daniel M. Koretz | "It has not addressed adequately the implications of test-based accountability for the field's own activities." p.2 | Denigrating | Implications of Current Policy for Educational Measurement. Policy Brief | Center for K–12 Assessment & Performance Management, Educational Testing Service | http://www.k12center.org/publications.html | In other words, there isn't enough research ... and there never will be. | |
95 | Daniel M. Koretz | “The field of measurement has not kept pace with this transformation of testing.” p. 3 | Denigrating | Some Implications of Current Policy for Educational Measurement | paper presented at the Exploratory Seminar: Measurement Challenges Within the Race to the Top Agenda, December 2009 | http://www.k12center.org/rsc/pdf/KoretzPresenterSession3.pdf | In other words, there isn't enough research ... and there never will be. | |
96 | Daniel M. Koretz | “For the most part, notwithstanding Lindquist’s warning, the field of measurement has largely ignored the top levels of sampling.” p. 6 | Dismissive | Some Implications of Current Policy for Educational Measurement | paper presented at the Exploratory Seminar: Measurement Challenges Within the Race to the Top Agenda, December 2009 | http://www.k12center.org/rsc/pdf/KoretzPresenterSession3.pdf | Many psychometricians work in the field of gifted testing. Indeed, some specialize in it, and have created a large, robust research literature. One can find much of it at web sites such as "Hoagie's Gifted" and those for the gifted education research centers such as: Belin-Blank (in Iowa); Josephson (in Nevada); Johns Hopkins Center for Talented Youth (in Maryland); and Duke University's Talent Identification Program. | |
97 | Daniel M. Koretz | "The field of measurement has devoted a great deal of effort to respond to the demands of TBA. …but, however valuable they may be for other reasons, they are not helpful for confronting the core problem of Campbell's Law." p.14 | Dismissive | Some Implications of Current Policy for Educational Measurement | paper presented at the Exploratory Seminar: Measurement Challenges Within the Race to the Top Agenda, December 2009 | http://www.k12center.org/rsc/pdf/KoretzPresenterSession3.pdf | No. See other blurbs above and below. | |
98 | Daniel M. Koretz | “Currently, research on accountability‐related topics, such as score inflation and effects on educational practice, is slowly growing but remains largely divorced from the core activities of the measurement field.” p. 15 | Dismissive | Some Implications of Current Policy for Educational Measurement | paper presented at the Exploratory Seminar: Measurement Challenges Within the Race to the Top Agenda, December 2009 | http://www.k12center.org/rsc/pdf/KoretzPresenterSession3.pdf | No. See other blurbs above and below. | |
99 | Daniel Koretz | "Scientifically credible evidence about the effects … of test-based accountability —is in short supply." | Dismissive | Moving Past No Child Left Behind, p.804 | Science 326 (5954), 803-804, 6 November 2009 | | | |
100 | Daniel Koretz | "Studies that purport to show net effects on learning are numerous but are as a group too weak to be persuasive." | Denigrating | Moving Past No Child Left Behind, p.804 | Science 326 (5954), 803-804, 6 November 2009 | | | |
101 | Daniel Koretz | "Many of these studies use highly aggregated data—comparisons between states and entire nations—which exacerbates the problem of omitted variables." | Denigrating | Moving Past No Child Left Behind, p.804 | Science 326 (5954), 803-804, 6 November 2009 | | | |
102 | Daniel Koretz | "Numerous approaches for incorporating judgment have been tried, including inspectorates, quality reviews, peer review, and even parent surveys, but none has yet been adequately evaluated." | Dismissive | Moving Past No Child Left Behind, p.804 | Science 326 (5954), 803-804, 6 November 2009 | | | |
103 | Daniel Koretz | "Perhaps the most fundamental problem of ... the TBA programs ... is that they have not been based on rigorous research." | Denigrating | Moving Past No Child Left Behind, p.804 | Science 326 (5954), 803-804, 6 November 2009 | | | |
104 | Daniel M. Koretz | “The data, however, are more limited and more complex than is often realized, and the story they properly tell is not quite so straightforward. . . . Data about student performance at the end of high school are scarce and especially hard to collect and interpret.” p. 38 | Dismissive | How do American students measure up? Making Sense of International Comparisons | The Future of Children 19:1 Spring 2009 | http://www.princeton.edu/futureofchildren/publications/docs/19_01_FullJournal.pdf | Relevant studies of the effects of testing on at-risk students, completion, dropping out, curricular offerings, attitudes, etc. include those of Schleisman (1999); the *Southern Regional Education Board (1998); Webster, Mendro, Orsak, Weerasinghe & Bembry (1997); Jones (1996); Boylan (1996); Jones, 1993; Jacobson (1992); Grisay (1991); Johnstone (1990); Task Force on Educational Assessment Programs [Florida] (1979); Wellisch, MacQueen, Carriere & Duck (1978); Enochs (1978); Pronaratna (1976); and McWilliams & Thomas (1976). *Covers many studies; study is a research review, research synthesis, or meta-analysis. | ||
105 | Daniel M. Koretz | “International comparisons clearly do not provide what many observers of education would like. . . . The findings are in some cases inconsistent from one study to another. Moreover, the data from all of these studies are poorly suited to separating the effects of schooling from the myriad other influences on student achievement.” p. 48 | Dismissive | How do American students measure up? Making Sense of International Comparisons | The Future of Children 19:1 Spring 2009 | http://www.princeton.edu/futureofchildren/publications/docs/19_01_FullJournal.pdf | If they do not provide what "many observers" want, why are they so popular? The first international comparison study, conducted several decades ago, included fewer than ten countries. Now, several dozen participate each time, at great expense. As for the differences in results, they are to be expected. The Trends in International Mathematics and Science Study (TIMSS) is an achievement test administered in primary and middle school. PISA is quite different—more or less an aptitude test administered to fifteen-year-olds. | |
106 | Daniel M. Koretz | “If truly comparable data from the end of schooling were available, they would presumably look somewhat different, though it is unlikely that they would be greatly more optimistic.” p. 49 | Dismissive | How do American students measure up? Making Sense of International Comparisons | The Future of Children 19:1 Spring 2009 | http://www.princeton.edu/futureofchildren/publications/docs/19_01_FullJournal.pdf | If they do not provide what "many observers" want, why are they so popular? The first international comparison study, conducted several decades ago, included fewer than ten countries. Now, several dozen participate each time, at great expense. As for the differences in results, they are to be expected. The Trends in International Mathematics and Science Study (TIMSS) is an achievement test administered in primary and middle school. PISA is quite different—more or less an aptitude test administered to fifteen-year-olds. | |
107 | Daniel Koretz | "Therefore, most studies of score inflation in the U.S. have compared gains on a test used for accountability (usually called the high-stakes test) to gains on another test of the same domain (often called the audit test)." p.778 | Dismissive | Test-based educational accountability. Research evidence and implications | Zeitschrift für Pädagogik 54 (2008) 6, S. 777–790 | http://www.pedocs.de/volltexte/2011/4376/pdf/ZfPaed_2008_6_Koretz_Testbased_educational_accountability_D_A.pdf | Only Koretz's own studies take that form. Most relevant studies on the topic are experimental, comparing two groups -- one receiving coaching and the other not. | |
108 | Daniel Koretz | “Few detailed studies of score inflation have been carried out, in part because they are politically controversial.” p. 778 | Dismissive | Test-based educational accountability. Research evidence and implication | Zeitschrift für Pädagogik 54 (2008) 6, S. 777–790 | http://www.pedocs.de/volltexte/2011/4376/pdf/ZfPaed_2008_6_Koretz_Testbased_educational_accountability_D_A.pdf | Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that Koretz can not find one district out of the many thousands to cooperate with him to discredit testing. The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- preceded Koretz's by several years. See: http://nonpartisaneducation.org/Review/Books/CannellBook1.htm http://nonpartisaneducation.org/Review/Books/Cannell2.pdf | ||
109 | Daniel Koretz | "Moreover, we still know little about factors that predict which schools’ scores are most inflated. Indeed, we lack good tools for identifying variations in inflation because we rarely have a reasonable audit test that is administered regularly in all schools. The result is that many of the most important inferences based on scores can be badly biased. In the absence of confirmatory evidence, neither the large aggregate increases in scores that often accompany test-based accountability nor relative differences in gains among schools can be trusted." p.779 | Dismissive | Test-based educational accountability. Research evidence and implication | Zeitschrift für Pädagogik 54 (2008) 6, S. 777–790 | http://www.pedocs.de/volltexte/2011/4376/pdf/ZfPaed_2008_6_Koretz_Testbased_educational_accountability_D_A.pdf | Koretz's score inflation studies typically employ no controls for test administration or test content factors. One of his tests might be administered with tight security and the other with none at all. One of his tests might focus on one subject area and the other test another topic entirely. He writes as if all of his "left out" variables could not possibly matter. Moreover, he ignores completely the huge experimental literature on test prep in favor of his apples-to-oranges comparison studies. | ||
110 | Daniel Koretz | “There has been very little research on the practical effects of using VAMs.” p. 39 | Dismissive | A measured approach: value-added models are a promising improvement, but no one measure can evaluate teacher performance. | American Educator (Fall 2008) | http://www.aft.org/sites/default/files/periodicals/koretz.pdf | Tennessee's TVAAS value-added measurement system had been running for over a decade when he wrote this, and it did much of what he claims had never been done. | ||
111 | Daniel Koretz | “The movement toward VAMs only exacerbates this problem because of the remaining serious gaps in our knowledge of their workings and effects.” p. 39 | Dismissive | A measured approach: value-added models are a promising improvement, but no one measure can evaluate teacher performance. | American Educator (Fall 2008) | http://www.aft.org/sites/default/files/periodicals/koretz.pdf | Tennessee's TVAAS value-added measurement system had been running for over a decade when he wrote this, and it did much of what he claims had never been done. | ||
112 | Scott J. Cech | Daniel Koretz [interviewee] | “'If you tell people that performance on that tested sample is what matters, that’s what they worry about, so you can get inappropriate responses in the classroom and inflated test scores,' he said. Mr. Koretz pointed to research in the 1990s on the state standardized test then used in Kentucky, ...” | Dismissive | Testing Expert Sees ‘Illusions of Progress’ Under NCLB | Education Week, October 1, 2008 | Koretz's score inflation studies typically employ no controls for test administration or test content factors. One of his tests might be administered with tight security and the other with none at all. One of his tests might focus on one subject area and the other test another topic entirely. He writes as if all of his "left out" variables could not possibly matter. Moreover, he ignores completely the huge experimental literature on test prep in favor of his apples-to-oranges comparison studies. | ||
113 | Scott J. Cech | Daniel Koretz [interviewee] | "Mr. Koretz said the relative dearth to date of comparative studies on large-scale state assessments isn’t for lack of trying. He said he and other scholars have often been rebuffed after approaching officials about the possibility of studying their assessment systems." | Dismissive | Testing Expert Sees ‘Illusions of Progress’ Under NCLB | Education Week, October 1, 2008 | Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that Koretz can not find one district out of the many thousands to cooperate with him to discredit testing. | ||
114 | Scott J. Cech | Daniel Koretz [interviewee] | “There have not been a lot of studies of this,” Mr. Koretz said, “for the simple reason that it’s politically rather hard to do, to come to a state chief and say, ‘I’d like the chance to see whether your test scores are inflated.’” | Dismissive | Testing Expert Sees ‘Illusions of Progress’ Under NCLB | Education Week, October 1, 2008 | Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that Koretz can not find one district out of the many thousands to cooperate with him to discredit testing. | ||
115 | Eduwonkette | Daniel Koretz [interviewee] | “Unfortunately, while we have a lot of anecdotal evidence suggesting that this [equity as the rationale for NCLB] is the case, we have very few serious empirical studies of this.” answer to 3rd question, 1st para | Denigrating | What does educational testing really tell us? | Education Week [interview], 9.23.2008 | http://blogs.edweek.org/edweek/eduwonkette/2008/09/what_does_educational_testing_1.html | A "rationale" is an argument, a belief, an explanation, not an empirical result. The civil rights groups that supported NCLB did so because they saw it as an equity vehicle. |
116 | Daniel M. Koretz | "…we rarely know when [test] scores are inflated because we so rarely check." | Dismissive | Interpreting test scores: More complicated than you think [interview] | Chronicle of Higher Education, August 15, 2008, p. A23 | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | |||
117 | Daniel M. Koretz | Katherine E. Ryan, Lorrie A. Shepard, Eds. | "...traditional psychometrics was in two critical respects tacitly premised on low stakes. The first is that it gave relatively little attention to the consequences of testing. The second is a special case of the first: traditional psychometrics focused little on behavioral responses to testing, other than the behavior of the student while taking the test and of proctors administering it." pp.71-72 | Dismissive | Further steps toward the development of an accountability-oriented science of measurement | Chapter 4 in The Future of Test-Based Educational Accountability | Routledge | Actually, high-quality evaluations of testing interventions have been numerous and common over the past century. Most of them do not produce the results that Koretz prefers, however, so he declares them nonexistent. See https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |
118 | Daniel M. Koretz | Katherine E. Ryan, Lorrie A. Shepard, Eds. | "Nonetheless it is fair to say that most of the psychometric enterprise--what people in the field did when developing methods or operating testing programs--proceeded without much attention to these concerns." p.72 | Denigrating | Further steps toward the development of an accountability-oriented science of measurement | Chapter 4 in The Future of Test-Based Educational Accountability | Routledge | Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice: Goslin (1967), *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies; study is a research review, research synthesis, or meta-analysis. | Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones. |
119 | Daniel M. Koretz | Katherine E. Ryan, Lorrie A. Shepard, Eds. | "The past several decades have also witnessed a growth in empirical research exploring the effects of accountability-oriented testing programs. A limited amount of work has investigated the validity of gains obtained under high-stakes conditions (e.g., Jacob, 2005, 2007; Koretz, Linn, Dunbar, & Shepard, 1991; Koretz & Barron, 1998)." p.74 | Denigrating | Further steps toward the development of an accountability-oriented science of measurement | Chapter 4 in The Future of Test-Based Educational Accountability | Routledge | Koretz's score inflation studies typically employ no controls for test administration or test content factors. One of his tests might be administered with tight security and the other with none at all. One of his tests might focus on one subject area and the other test another topic entirely. He writes as if all of his "left out" variables could not possibly matter. Moreover, he ignores completely the huge experimental literature on test prep in favor of his apples-to-oranges comparison studies. | |
120 | Daniel M. Koretz | Katherine E. Ryan, Lorrie A. Shepard, Eds. | "Although it is clear that behavioral responses to high-stakes testing pose serious challenges to conventional practices in measurement, the field's responses to them have been meager. Little has been done to explore alternative practices--either in the design of tests or in the operation of testing programs." p.86 | Denigrating | Further steps toward the development of an accountability-oriented science of measurement | Chapter 4 in The Future of Test-Based Educational Accountability | Routledge | Koretz's score inflation studies typically employ no controls for test administration or test content factors. One of his tests might be administered with tight security and the other with none at all. One of his tests might focus on one subject area and the other test another topic entirely. He writes as if all of his "left out" variables could not possibly matter. Moreover, he ignores completely the huge experimental literature on test prep in favor of his apples-to-oranges comparison studies. |
121 | Daniel M. Koretz | Katherine E. Ryan, Lorrie A. Shepard, Eds. | "Perhaps most striking, the problem of score inflation gets fleeting mention, if any at all, in most evaluations or discussions of validity, whether in technical reports of testing programs, the scholarly literature, or textbooks--even though the bias introduced by score inflation can dwarf that caused by some factors that receive more attention." p.86 | Denigrating | Further steps toward the development of an accountability-oriented science of measurement | Chapter 4 in The Future of Test-Based Educational Accountability | Routledge | His theory of score inflation gets little attention within the profession because it is a red herring. Outside the domain of psychometricians, however, it gets quite a lot of attention, and is widely believed to be valid. |
122 | Daniel M. Koretz | Katherine E. Ryan, Lorrie A. Shepard, Eds. | "There are far too few studies of the validity of scores under high-stakes conditions, and we know very little about the distribution and correlates of score inflation (e.g., its variation across types of testing programs, types of schools, or types of schools [sic])." p.87 | Dismissive | Further steps toward the development of an accountability-oriented science of measurement | Chapter 4 in The Future of Test-Based Educational Accountability | Routledge | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) |
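Because DerSimonian and Laird (1983) anchors the meta-analytic tradition cited throughout these notes, a minimal sketch of their random-effects pooling step may help show what those coaching meta-analyses actually compute. All effect sizes and sampling variances below are invented for illustration.

```python
# Minimal DerSimonian-Laird random-effects pooling, the method behind many of
# the coaching meta-analyses cited above. All inputs are invented.
import math

d = [0.0, 0.5, 0.1, 0.45, 0.05]          # per-study standardized effect sizes
v = [0.02, 0.03, 0.015, 0.04, 0.025]     # per-study sampling variances

w = [1 / vi for vi in v]                  # fixed-effect (inverse-variance) weights
d_fe = sum(wi * di for wi, di in zip(w, d)) / sum(w)

# Cochran's Q and the DL moment estimate of between-study variance tau^2
Q = sum(wi * (di - d_fe) ** 2 for wi, di in zip(w, d))
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - (len(d) - 1)) / c)

# Random-effects weights incorporate tau^2, then pool again
w_re = [1 / (vi + tau2) for vi in v]
d_re = sum(wi * di for wi, di in zip(w_re, d)) / sum(w_re)
se_re = math.sqrt(1 / sum(w_re))
print(f"pooled effect: {d_re:.2f} (SE {se_re:.2f}), tau^2 = {tau2:.3f}")
```

With these invented inputs the pooled coaching effect comes out around 0.2 standard deviations, which is the order of magnitude the notes describe as "modest."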
123 | Daniel M. Koretz | Katherine E. Ryan, Lorrie A. Shepard, Eds. | "Extant research on teachers' and principals' responses to testing, although somewhat more copious, is still insufficient, providing little systematic data on the use of test-preparation materials and other forms of coaching or on the relationships between test design and instructional responses." p.87 | Dismissive | Further steps toward the development of an accountability-oriented science of measurement | Chapter 4 in The Future of Test-Based Educational Accountability | Routledge | Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice: Goslin (1967), *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies; study is a research review, research synthesis, or meta-analysis. | Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones. |
124 | Daniel M. Koretz | Gail Sunderland, Ed. | "... We know far too little about how to hold schools accountable for improving student performance.", p.9 | Dismissive | The pending reauthorization of NCLB: An opportunity to rethink the basic strategy | Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, 2008 | Corwin Press | The vast amount of information already available just for the asking, worldwide, could help build better accountability systems, without wasting more research grant money on those who refuse to study what is already available. | |
125 | Daniel M. Koretz | Gail Sunderland, Ed. | "A modest number of studies argue that high-stakes testing does or doesn't improve student performance in tested subjects.", p.10 | Dismissive | The pending reauthorization of NCLB: An opportunity to rethink the basic strategy | Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, 2008 | Corwin Press | In fact, a very large number of studies do so. See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |
126 | Daniel M. Koretz | Gail Sunderland, Ed. | "This research tells us little. Much of it is of very low quality, and even the careful studies are hobbled by data that are inadequate for the task.", p.10 | Denigrating | The pending reauthorization of NCLB: An opportunity to rethink the basic strategy | Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, 2008 | Corwin Press | In fact, a very large number of studies do so. See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |
127 | Daniel M. Koretz | Gail Sunderland, Ed. | "Moreover, this research asks too simple a question. Asking whether test-based accountability works is a bit like asking whether medicine works. What medicines? For what medical conditions?", p.10 | Denigrating | The pending reauthorization of NCLB: An opportunity to rethink the basic strategy | Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, 2008 | Corwin Press | In fact, a very large number of studies do so. See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |
128 | Daniel M. Koretz | Gail Sunderland, Ed. | "We need research and evaluation to address this question, because we lack a grounded answer.", p.11 | Dismissive | The pending reauthorization of NCLB: An opportunity to rethink the basic strategy | Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, 2008 | Corwin Press | In fact, a very large number of studies do so. See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |
129 | Daniel M. Koretz | Gail Sunderland, Ed. | " ... research does not tell us whether high-stakes testing works.", p.11 | Dismissive | The pending reauthorization of NCLB: An opportunity to rethink the basic strategy | Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, 2008 | Corwin Press | In fact, a very large number of studies do so. See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |
130 | Daniel M. Koretz | Gail Sunderland, Ed. | "The few relevant studies [of test score inflation] are of two types: detailed evaluations of scores in specific jurisdictions, .... We have far fewer ... than we should.", pp.11-12 | Denigrating | The pending reauthorization of NCLB: An opportunity to rethink the basic strategy | Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, 2008 | Corwin Press | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | |
131 | Daniel M. Koretz | Gail Sunderland, Ed. | "The results of the relatively few relevant studies are both striking and consistent: gains on high-stakes tests often do not generalize well to other measures, and the gap is frequently huge." p.12 | Dismissive | The pending reauthorization of NCLB: An opportunity to rethink the basic strategy | Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, 2008 | Corwin Press | ||
132 | Daniel M. Koretz | Gail Sunderland, Ed. | "But this remains only a hypothesis, not yet tested by much empirical evidence." p.14 | Dismissive | The pending reauthorization of NCLB: An opportunity to rethink the basic strategy | Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, 2008 | Corwin Press | The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- is largely about cheating. See: http://nonpartisaneducation.org/Review/Books/CannellBook1.htm http://nonpartisaneducation.org/Review/Books/Cannell2.pdf; See also Gregory J. Cizek's Cheating on Tests: https://www.goodreads.com/book/show/5084641-cheating-on-tests ; and Caveon Test Security's resource pages: https://www.caveon.com/resources/ |
133 | Daniel M. Koretz | Gail Sunderland, Ed. | "We urgently need finer grained studies of this issue.", p.14 | Denigrating | The pending reauthorization of NCLB: An opportunity to rethink the basic strategy | Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, 2008 | Corwin Press | The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- is largely about cheating. See: http://nonpartisaneducation.org/Review/Books/CannellBook1.htm http://nonpartisaneducation.org/Review/Books/Cannell2.pdf; See also Gregory J. Cizek's Cheating on Tests: https://www.goodreads.com/book/show/5084641-cheating-on-tests ; and Caveon Test Security's resource pages: https://www.caveon.com/resources/ |
134 | Daniel M. Koretz | Gail Sunderland, Ed. | "There are limited systematic data about cheating.", p.16 | Denigrating | The pending reauthorization of NCLB: An opportunity to rethink the basic strategy | Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, 2008 | Corwin Press | The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- is largely about cheating. See: http://nonpartisaneducation.org/Review/Books/CannellBook1.htm http://nonpartisaneducation.org/Review/Books/Cannell2.pdf; See also Gregory J. Cizek's Cheating on Tests: https://www.goodreads.com/book/show/5084641-cheating-on-tests ; and Caveon Test Security's resource pages: https://www.caveon.com/resources/ |
135 | Daniel M. Koretz | Gail Sunderland, Ed. | "Building those better [accountability] systems requires more systematic, empirical data, and that, in turn, requires a serious agenda of R&D.", p.26 | Denigrating | The pending reauthorization of NCLB: An opportunity to rethink the basic strategy | Chapter 1 in Holding NCLB accountable: Achieving accountability, equity, and school reform, 2008 | Corwin Press | The vast amount of information already available just for the asking, worldwide, could help build better accountability systems, without wasting more research grant money on those who refuse to study what is already available. | |
136 | Daniel M. Koretz | “… [T]he problem of score inflation is at best inconvenient and at worse [sic] threatening. (The latter is one reason that there are so few studies of this problem. …)” p. 11 | Dismissive | Measuring up: What educational testing really tells us | Harvard University Press, 2008 | Google Books | Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that Koretz can not find one district out of the many thousands to cooperate with him to discredit testing. | ||
137 | Daniel M. Koretz | “The relatively few studies that have addressed this question support the skeptical interpretation: in many cases, mastery of material on the new test simply substitutes for mastery of the old.” p. 242 | Dismissive | Measuring up: What educational testing really tells us | Harvard University Press, 2008 | Google Books | Koretz's preferred method for "auditing" a high-stakes test is to compare its score trends to those of a parallel no-stakes test, which, presumably, will have totally reliable score trends. Yet, a cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Acherman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). | ||
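The reliability objection in this row's note can be stated precisely with Spearman's classic attenuation formula: even if true achievement moved identically on both tests, a less reliable no-stakes audit test shrinks the observed agreement between the two measures. A minimal sketch, with invented reliabilities:

```python
# Spearman's attenuation formula, illustrating the note's point that a less
# reliable no-stakes audit test depresses observed agreement between tests
# even when true achievement is identical. All values below are invented.
r_true = 0.90           # hypothetical true-score correlation between the tests
rel_high_stakes = 0.90  # assumed reliability of the high-stakes test
rel_audit = 0.70        # assumed (lower) reliability of the no-stakes audit test

r_observed = r_true * (rel_high_stakes * rel_audit) ** 0.5
print(f"observed correlation: {r_observed:.2f}")  # about 0.71, well below 0.90
```

On these assumptions, roughly a fifth of the apparent "divergence" between the two tests would be an artifact of the audit test's unreliability rather than evidence of inflation.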
138 | Daniel M. Koretz | “Because so many people consider test-based accountability to be self-evaluating … there is a disturbing lack of good evaluations of these systems. …” p. 331 | Denigrating | Measuring up: What educational testing really tells us | Harvard University Press, 2008 | Google Books | Koretz's preferred method for "auditing" a high-stakes test is to compare its score trends to those of a parallel no-stakes test, which, presumably, will have totally reliable score trends. Yet, a cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Acherman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). | ||
139 | Daniel M. Koretz | "Although there are only a handful of good studies of possible score inflation in high-stakes contexts, most contrast trends on high-stakes test to trends on other measures designed to support similar inferences." p.348 | Dismissive | Using aggregate-level linkages for estimation and valuation, etc. | in Linking and Aligning Scores and Scales, Springer, 2007 | Google Books | Actually, there's a large experimental literature on test prep / test coaching / teaching to the test / test score inflation. Koretz's preferred method for "auditing" a high-stakes test is to compare its score trends to those of a parallel no-stakes test, which, presumably, will have totally reliable score trends. Yet, a cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Acherman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). | ||
140 | Daniel M. Koretz | “Most of these few studies showed a rapid divergence of means on the two tests. …” p. 348 | Dismissive | Using aggregate-level linkages for estimation and valuation, etc. | in Linking and Aligning Scores and Scales, Springer, 2007 | Google Books | Koretz's preferred method for "auditing" a high-stakes test is to compare its score trends to those of a parallel no-stakes test, which, presumably, will have totally reliable score trends. Yet, a cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Acherman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). | ||
141 | Daniel M. Koretz | Valerie Strauss, journalist interviewer | “The testing culture ‘has a lot more momentum than it should,’ agreed [CRESST researcher Koretz]. He said a lack of solid research on the results of the new testing regimen—or those that predated No Child Left Behind—essentially means that the country is experimenting with its young people.” | Dismissive | The rise of the testing culture, p.A09 | Strauss, V. (2006, October 10). Washington Post | In fact, a very large number of studies do so. See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | ||
142 | Daniel M. Koretz & Laura S. Hamilton | Robert L. Brennan, Ed. | "Most of the studies of [testing's] effects on practice report average responses that mask some of these important variations and interactions." p.552 | Denigrating | Testing for Accountability in K-12 | Chapter 15 in Educational Measurement, published by NCME and ACE, 2006 | Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); Hurlock (1925); and Zeng (2001). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. | "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones." |
143 | Daniel M. Koretz & Laura S. Hamilton | Robert L. Brennan, Ed. | "There is no comprehensive source of information on how much time schools devote to coaching activities such as practicing on released test forms, but some studies suggest these activities are widespread." p.552 | Dismissive | Testing for Accountability in K-12 | Chapter 15 in Educational Measurement, published by NCME and ACE, 2006 | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | ||
144 | Daniel M. Koretz & Laura S. Hamilton | Robert L. Brennan, Ed. | "As with coaching, there are no comprehensive studies of the frequency of cheating across schools in the United States." p.553 | Dismissive | Testing for Accountability in K-12 | Chapter 15 in Educational Measurement, published by NCME and ACE, 2006 | Actually, there have been such studies: surveys in which respondents freely admit that they cheat, and how. Moreover, news reports of cheating, by students or educators, have been voluminous. See, for example, Caveon Test Security's "Cheating in the News" section on its web site. | ||
145 | Daniel M. Koretz & Laura S. Hamilton | Robert L. Brennan, Ed. | "However, in the absence of audit testing, this hypothesis [of score inflation] cannot be tested." p.553 | Denigrating | Testing for Accountability in K-12 | Chapter 15 in Educational Measurement, published by NCME and ACE, 2006 | Koretz's preferred method for "auditing" a high-stakes test is to compare its score trends to those of a parallel no-stakes test, which, presumably, will have totally reliable score trends. Yet, a cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Acherman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). | ||
146 | Daniel M. Koretz | "Research to date makes clear that score gains achieved under high-stakes conditions should not be accepted at face value. ...policymakers embarking on an effort to create a more effective system of ...accountability must face uncertainty about how well alternatives will function in practice, and should be prepared for a period of evaluation and mid-course correction." | Dismissive | Alignment, High Stakes, and the Inflation of Test Scores | CRESST Report 655, June 2005 | https://cresst.org/wp-content/uploads/R655.pdf | Koretz's preferred method for "auditing" a high-stakes test is to compare its score trends to those of a parallel no-stakes test, which, presumably, will have totally reliable score trends. Yet, a cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Acherman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). | ||
147 | Daniel M. Koretz | "Thus, even in a well-aligned system, policymakers still face the challenge of designing educational accountability systems that create the right mix of incentives: incentives that will maximize real gains in student performance, minimize score inflation, and generate other desirable changes in educational practice. This is a challenge in part because of a shortage of relevant experience and research..." | Dismissive | Alignment, High Stakes, and the Inflation of Test Scores | CRESST Report 655, June 2005 | https://cresst.org/wp-content/uploads/R655.pdf | Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); Hurlock (1925); and Zeng (2001). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. | "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones." |
148 | Daniel M. Koretz | "Research has yet to clarify how variations in the performance targets set for schools affect the incentives faced by teachers and the resulting validity of score gains." | Dismissive | Alignment, High Stakes, and the Inflation of Test Scores | CRESST Report 655, June 2005 | https://cresst.org/wp-content/uploads/R655.pdf | |||
149 | Daniel M. Koretz | "In terms of research, the jury is still out." | Dismissive | Alignment, High Stakes, and the Inflation of Test Scores | CRESST Report 655, June 2005 | https://cresst.org/wp-content/uploads/R655.pdf | |||
150 | Daniel M. Koretz | "The first study to evaluate score inflation empirically (Koretz, Linn, Dunbar, and Shepard, 1991) looked at a district-testing program in the 1980s that used commercial, off-the-shelf, multiple-choice achievement tests." | 1stness | Alignment, High Stakes, and the Inflation of Test Scores, p.7 | CRESST Report 655, June 2005 | https://cresst.org/wp-content/uploads/R655.pdf | The most famous test score inflation study of all time -- John J. Cannell's "Lake Wobegon Effect" study -- preceded Koretz's by several years. See: http://nonpartisaneducation.org/Review/Books/CannellBook1.htm http://nonpartisaneducation.org/Review/Books/Cannell2.pdf |
151 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “The shortcomings of the studies make it difficult to determine the size of teacher effects, but we suspect that the magnitude of some of the effects reported in this literature are overstated.” p. xiii | Denigrating | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |
152 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “Using VAM to estimate individual teacher effects is a recent endeavor, and many of the possible sources of error have not been thoroughly evaluated in the literature.” p. xix | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |
153 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. xix | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |
154 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “This lack of attention to teachers in policy discussions may be attributed in part to another body of literature that attempted to determine the effects of specific teacher background characteristics, including credentialing status (e.g., Miller, McKenna, and McKenna, 1998; Goldhaber and Brewer, 2000) and subject matter coursework (e.g., Monk, 1994).” p. 8 | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |
155 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “To date, there has been little empirical exploration of the size of school effects and the sensitivity of teacher effects to modeling of school effects.” p. 78 | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |
156 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “There are no empirical explorations of the robustness of estimates to assumptions about prior-year schooling effects.“ p. 81 | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |
157 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “There is currently no empirical evidence about the sensitivity of gain scores or teacher effects to such alternatives.” p. 89 | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |
158 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. 116 | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |
159 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “Although we expect missing data are likely to be pervasive, there is little systematic discussion of the extent or nature of missing data in test score databases.” p. 117 | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |
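For readers unfamiliar with the value-added models (VAMs) debated in rows 151-159, the sketch below computes the simplest variant: a mean gain-score teacher effect centered on the overall average gain. It is a toy on invented data, not TVAAS, which is a far more elaborate multivariate longitudinal mixed model; all scores and teacher labels are hypothetical.

```python
# Toy value-added estimate: each teacher's average student gain, centered on
# the overall mean gain. The simplest VAM variant, not TVAAS itself.
# All records below are invented.
from collections import defaultdict

# (teacher, prior-year score, current-year score)
records = [
    ("A", 240, 252), ("A", 255, 261), ("A", 230, 246),
    ("B", 242, 247), ("B", 251, 254), ("B", 238, 241),
]

gains = defaultdict(list)
for teacher, prior, current in records:
    gains[teacher].append(current - prior)

all_gains = [g for gs in gains.values() for g in gs]
mean_gain = sum(all_gains) / len(all_gains)

for teacher, gs in sorted(gains.items()):
    effect = sum(gs) / len(gs) - mean_gain  # teacher effect relative to average
    print(f"teacher {teacher}: estimated value-added = {effect:+.1f} points")
```

Even this toy exposes the issues the RAND authors and their critics argue over: the estimate rides entirely on two test scores per student, so measurement error, missing prior scores, and nonrandom assignment of students to teachers all flow straight into the "effect."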
160 | Daniel M. Koretz | "Empirical research on the validity of score gains on high-stakes tests is limited, but the studies conducted to date show…" | Dismissive | Using multiple measures to address perverse incentives and score inflation, p.21 | Educational Measurement: Issues and Practice, Summer 2003 | https://onlinelibrary.wiley.com/doi/10.1111/j.1745-3992.2003.tb00124.x | "Validity" studies are common, even routine, parts of large-scale testing programs' technical reports. | ||
161 | Daniel M. Koretz | "Research on educators' responses to high-stakes testing is also limited, …" | Dismissive | Using multiple measures to address perverse incentives and score inflation, p.21 | Educational Measurement: Issues and Practice, Summer 2003 | https://onlinelibrary.wiley.com/doi/10.1111/j.1745-3992.2003.tb00124.x | See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | ||
162 | Daniel M. Koretz | "Although extant research is sufficient to document problems of score inflation and unintended incentives from test-based accountability, it provides very little guidance about how one might design an accountability system to lessen these problems." | Denigrating | Using multiple measures to address perverse incentives and score inflation, p.22 | Educational Measurement: Issues and Practice, Summer 2003 | https://onlinelibrary.wiley.com/doi/10.1111/j.1745-3992.2003.tb00124.x | The vast amount of information already available just for the asking, worldwide, could help build better accountability systems, without wasting more research grant money on those who refuse to study what is already available. | ||
163 | Daniel M. Koretz | “Relatively few studies, however, provide strong empirical evidence pertaining to inflation of entire scores on tests used for accountability.” p. 759 | Denigrating | Limitations in the use of achievement tests as measures of educators’ productivity | The Journal of Human Resources, 37:4 (Fall 2002) | http://standardizedtests.procon.org/sourcefiles/limitations-in-the-use-of-achievement-tests-as-measures-of-educators-productivity.pdf | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | ||
164 | Daniel M. Koretz | “Only a few studies have directly tested the generalizability of gains in scores on accountability-oriented tests.” p. 759 | Dismissive | Limitations in the use of achievement tests as measures of educators’ productivity | The Journal of Human Resources, 37:4 (Fall 2002) | http://standardizedtests.procon.org/sourcefiles/limitations-in-the-use-of-achievement-tests-as-measures-of-educators-productivity.pdf | "Validity" studies are common, even routine, parts of large-scale testing programs' technical reports. | ||
165 | Daniel M. Koretz | “Moreover, while there are numerous anecdotal reports of various types of coaching, little systematic research describes the range of coaching strategies and their effects.” p. 769 | Dismissive | Limitations in the use of achievement tests as measures of educators’ productivity | The Journal of Human Resources, 37:4 (Fall 2002) | http://standardizedtests.procon.org/sourcefiles/limitations-in-the-use-of-achievement-tests-as-measures-of-educators-productivity.pdf | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | ||
166 | Daniel M. Koretz | "Yet we have accumulating evidence that test-based accountability policies are not working as intended, and we have no adequate research-based alternative to offer to the policy community." p.774 | Dismissive | Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity | The Journal of Human Resources, 37:4 (Fall 2002) | http://standardizedtests.procon.org/sourcefiles/limitations-in-the-use-of-achievement-tests-as-measures-of-educators-productivity.pdf | Test-based accountability worked just fine before 2001, when the two now-dominant citation cartels took over all policy advising on the topic. As for alternatives to Koretz's conception of test-based accountability, two come to mind. First, there is the normal type that most of the world uses: stakes for students; no stakes for teachers; only administered every few years; administered externally and securely; full battery of subjects. Second, inspectorates (a poor substitute in my opinion) are used in other countries, and, yes, quite a lot of research has accumulated about them in the countries where they are used. | ||
167 | Laura S. Hamilton, Daniel M. Koretz | "There is currently no substantial evidence on the effects of published report cards on parents’ decisionmaking or on the schools themselves." | Dismissive | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | Chapter 2: Tests and their use in test-based accountability systems, p.44 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | For decades, consulting services have existed that help parents new to a city select the right school or school district for them. | ||
168 | Eva L. Baker, Robert L. Linn, Joan L. Herman, and Daniel Koretz | "Because experience with accountability systems is still developing, the standards we propose are intended to help evaluate existing systems and to guide the design of improved procedures." p.1 | Dismissive | Standards for Educational Accountability Systems | CRESST Policy Brief 5, Winter 2002 | https://www.gpo.gov/fdsys/pkg/ERIC-ED466643/pdf/ERIC-ED466643.pdf | See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm . This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. | Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies; study is a research review, research synthesis, or meta-analysis. |
169 | Eva L. Baker, Robert L. Linn, Joan L. Herman, and Daniel Koretz | "It is not possible at this stage in the development of accountability systems to know in advance how every element of an accountability system will actually operate in practice or what effects it will produce." p.1 | Dismissive | Standards for Educational Accountability Systems | CRESST Policy Brief 5, Winter 2002 | https://www.gpo.gov/fdsys/pkg/ERIC-ED466643/pdf/ERIC-ED466643.pdf | See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm . This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. If one includes qualitative and program evaluation studies of test-based accountability, the count of pre-2000 studies rises into the hundreds. | Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies; study is a research review, research synthesis, or meta-analysis. |
170 | Daniel M. Koretz | Michael Russell, Chingwei David Shin, Cathy Horn, Kelly Shasby | "Although hard data on affirmative action are scanty, most observers believe that selective institutions have widely employed it for several decades." p.2 | Dismissive | Testing and Diversity in Postsecondary Education: The Case of California | Education Policy Analysis Archives, 10(1), January 7, 2002 | https://epaa.asu.edu/ojs/article/view/280 | ||
171 | Daniel M. Koretz | Michael Russell, Chingwei David Shin, Cathy Horn, Kelly Shasby | "As Kane noted, 'Nearly two decades after the U.S. Supreme Court's 1978 Bakke decision, we know little about the true extent of affirmative action admissions by race or ethnicity ... Hard evidence has been difficult to obtain, primarily because many colleges guard their admissions practices closely.'" p.4 | Dismissive | Testing and Diversity in Postsecondary Education: The Case of California | Education Policy Analysis Archives, 10(1), January 7, 2002 | https://epaa.asu.edu/ojs/article/view/280 | |
172 | Daniel M. Koretz | Michael Russell, Chingwei David Shin, Cathy Horn, Kelly Shasby | "Thus research leaves unclear how substantial preferences were in the states that have been at the center of the debate about the elimination of affirmative action, such as California and Texas." p.4 | Dismissive | Testing and Diversity in Postsecondary Education: The Case of California | Education Policy Analysis Archives, 10(1), January 7, 2002 | https://epaa.asu.edu/ojs/article/view/280 | ||
173 | Daniel M. Koretz | Daniel F. McCaffrey, Laura S. Hamilton | "Although high-stakes testing is now widespread, methods for evaluating the validity of gains obtained under high-stakes conditions are poorly developed. This report presents an approach for evaluating the validity of inferences based on score gains on high-stakes tests. It describes the inadequacy of traditional validation approaches for validating gains under high-stakes conditions and outlines an alternative validation framework for conceptualizing meaningful and inflated score gains.", p.1 | Denigrating | Toward a framework for validating gains under high-stakes conditions | CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 | https://files.eric.ed.gov/fulltext/ED462410.pdf | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | |
174 | Daniel M. Koretz | Daniel F. McCaffrey, Laura S. Hamilton | "Few efforts are made to evaluate directly score gains obtained under high-stakes conditions, and conventional validation tools are not fully adequate for the task.", p. 1 | Dismissive | Toward a framework for validating gains under high-stakes conditions | CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 | https://files.eric.ed.gov/fulltext/ED462410.pdf | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | |
175 | Daniel M. Koretz | Mark Berends | “[T]here has been little systematic research exploring changes in grading standards. …” p. iii | Dismissive | Changes in high school grading standards in mathematics, 1982–1992 | Rand Education & College Board, 2001 | http://www.rand.org/content/dam/rand/pubs/monograph_reports/2007/MR1445.pdf | See a review of hundreds of studies: Brookhart, S. M., Guskey, T. R., Bowers, A. J., McMillan, J. H., Smith, J. K., Smith, L. F., Stevens, M. T., & Welsh, M. E. (2016). A Century of Grading Research: Meaning and Value in the Most Common Educational Measure. Review of Educational Research, 86(4), 803-848. http://doi.org/10.3102/0034654316672069 | |
176 | Daniel M. Koretz | Mark Berends | “[F]ew studies have attempted to evaluate systematically changes in grading standards over time.” p. xi | Dismissive | Changes in high school grading standards in mathematics, 1982–1992 | Rand Education & College Board, 2001 | http://www.rand.org/content/dam/rand/pubs/monograph_reports/2007/MR1445.pdf | See a review of hundreds of studies: Brookhart, S. M., Guskey, T. R., Bowers, A. J., McMillan, J. H., Smith, J. K., Smith, L. F., Stevens, M. T., & Welsh, M. E. (2016). A Century of Grading Research: Meaning and Value in the Most Common Educational Measure. Review of Educational Research, 86(4), 803-848. http://doi.org/10.3102/0034654316672069 | |
177 | Daniel M. Koretz | Mark Berends | "Despite these anecdotes, generalizable empirical evidence about grade inflation is surprisingly thin." p.4 | Dismissive | Changes in high school grading standards in mathematics, 1982–1992 | Rand Education & College Board, 2001 | http://www.rand.org/content/dam/rand/pubs/monograph_reports/2007/MR1445.pdf | How about... Bursuck, W., Polloway, E. A., Plante, L., Epstein, M. H., Jayanthis, M., & McConegy, J. (1996); Conley, D. (2000, April); Howley, A., Kusimo, P. S., & Parrott, L. (2000); Sawyer, R., Laing, J., & Houston, M. (1988); Turnbull, W.W. (1985); Stone, J.E. (1995); Stanley, G. & Baines, L. (2001); Healy, P. (1997); Camara, W.J. (1998); Birk, L. (2000); Brookhart, S.M., Guskey, T.R., Bowers, A.J., McMillan, J.H., Smith, J.K., Smith, L.F., Stevens, M.T., Welsh, M.E. (2016). |
178 | Daniel M. Koretz | Mark Berends | "The research evidence showing grade inflation over time in secondary schools is scarce." p.4 | Dismissive | Changes in high school grading standards in mathematics, 1982–1992 | Rand Education & College Board, 2001 | http://www.rand.org/content/dam/rand/pubs/monograph_reports/2007/MR1445.pdf | How about... Bursuck, W., Polloway, E. A., Plante, L., Epstein, M. H., Jayanthis, M., & McConegy, J. (1996); Conley, D. (2000, April); Howley, A., Kusimo, P. S., & Parrott, L. (2000); Sawyer, R., Laing, J., & Houston, M. (1988); Turnbull, W.W. (1985); Stone, J.E. (1995); Stanley, G. & Baines, L. (2001); Healy, P. (1997); Camara, W.J. (1998); Birk, L. (2000); Brookhart, S.M., Guskey, T.R., Bowers, A.J., McMillan, J.H., Smith, J.K., Smith, L.F., Stevens, M.T., Welsh, M.E. (2016). | |
179 | Lynn Olson (journalist) | Daniel M. Koretz, respondent | "For years, the research community has been walking behind an elephant with a broom. Policymakers start accountability systems and, on rare occasions, we have an opportunity to go in and look at what's going on." | Dismissive | "Reporter's Notebook" | Education Week, Sept. 27, 2000 | "Validity" studies are common, even routine, parts of large-scale testing programs' technical reports. | |
180 | Daniel M. Koretz | E. A. Hanushek, J. J. Heckman, and D. Neal (organizers) | "Research provides sparse guidance about how to broaden the range of measured outcomes to provide a better mix of incentives and lessen score inflation.", p.27 | Dismissive | Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity | Devising Incentives to Promote Human Capital, National Academy of Sciences Conference, May 2000 | http://www.irp.wisc.edu/newsevents/other/symposia/koretz.pdf | Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925). *Covers many studies; study is a research review, research synthesis, or meta-analysis. | "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones." |
181 | Daniel M. Koretz | E. A. Hanushek, J. J. Heckman, and D. Neal (organizers) | "...what types of accountability systems might be more effective, and what role might achievement tests play in them? Unfortunately, there is little basis in research for answering this question. The simple test-based accountability systems that have been in vogue for the past two decades have appeared so commonsensical to some policymakers that they have had little incentive to permit the evaluation of alternatives.", p.25 | Dismissive | Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity | Devising Incentives to Promote Human Capital, National Academy of Sciences Conference, May 2000 | http://www.irp.wisc.edu/newsevents/other/symposia/koretz.pdf | Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that Koretz cannot find one district out of the many thousands to cooperate with him in a study to discredit testing. | |
182 | Daniel M. Koretz | E. A. Hanushek, J. J. Heckman, and D. Neal (organizers) | "...while there are numerous anecdotal reports of various types of coaching, little systematic research describes the range of coaching strategies and their effects.", p.24 | Denigrating | Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity | Devising Incentives to Promote Human Capital, National Academy of Sciences Conference, May 2000 | http://www.irp.wisc.edu/newsevents/other/symposia/koretz.pdf | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | |
183 | Daniel M. Koretz | E. A. Hanushek, J. J. Heckman, and D. Neal (organizers) | "Only a few studies have directly tested the generalizability of gains in scores on accountability-oriented tests.", p.11 | Denigrating | Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity | Devising Incentives to Promote Human Capital, National Academy of Sciences Conference, May 2000 | http://www.irp.wisc.edu/newsevents/other/symposia/koretz.pdf | "Validity" studies are common, even routine, parts of large-scale testing programs' technical reports. | |
184 | Daniel M. Koretz | E. A. Hanushek, J. J. Heckman, and D. Neal (organizers) | "Relatively few studies, however, provide strong empirical evidence pertaining to inflation of entire scores on tests used for accountability. Policy makers have little incentive to facilitate such studies, and they can be difficult to carry out.", p.11 | Denigrating | Limitations in the Use of Achievement Tests as Measures of Educators’ Productivity | Devising Incentives to Promote Human Capital, National Academy of Sciences Conference, May 2000 | http://www.irp.wisc.edu/newsevents/other/symposia/koretz.pdf | Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that Koretz cannot find one district out of the many thousands to cooperate with him in a study to discredit testing. | |
185 | Daniel M. Koretz | Laura Hamilton | "Efforts to increase the participation of students with disabilities in large-scale assessments, however, are hindered by a lack of experience and systematic information (National Research Council, 1997). For example, there is little systematic information on the use or effects of special testing accommodations for elementary and secondary students with disabilities." | Dismissive | Assessing Students With Disabilities in Kentucky: The Effects of Accommodations, Format, and Subject, p.2 | CSE Technical Report 498, CRESST/Rand Education, January 1999 | https://files.eric.ed.gov/fulltext/ED440148.pdf | Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing. | |
186 | Daniel M. Koretz | Laura Hamilton | "In addition, there is little evidence about the effects of format differences on the assessment of students with disabilities." | Dismissive | Assessing Students With Disabilities in Kentucky: The Effects of Accommodations, Format, and Subject, p.2 | CSE Technical Report 498, CRESST/Rand Education, January 1999 | https://files.eric.ed.gov/fulltext/ED440148.pdf | Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing. | |
187 | Daniel M. Koretz | Laura Hamilton | "Others have argued the opposite, pointing out that open-response questions, for example, mix verbal skills with other skills to be measured and may make it more difficult to isolate and compensate for the effects of disabilities. Relevant research, however, is scarce." | Dismissive | Assessing Students With Disabilities in Kentucky: The Effects of Accommodations, Format, and Subject, p.2 | CSE Technical Report 498, CRESST/Rand Education, January 1999 | https://files.eric.ed.gov/fulltext/ED440148.pdf | Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing. | |
188 | Daniel M. Koretz | Laura Hamilton | "There is a clear need for additional descriptive studies of the performance of students with disabilities in large-scale assessments. In our earlier study, we noted that research evidence was sparse…" | Dismissive | Assessing Students With Disabilities in Kentucky: The Effects of Accommodations, Format, and Subject, p.56 | CSE Technical Report 498, CRESST/Rand Education, January 1999 | https://files.eric.ed.gov/fulltext/ED440148.pdf | Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing. | |
189 | Daniel M. Koretz | Sheila I. Barron | "In the absence of systematic research documenting test-based accountability systems that have avoided the problem of inflated gains …” p. xvii | Dismissive | The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS) | Rand Education, 1998 | http://www.rand.org/content/dam/rand/pubs/monograph_reports/2009/MR1014.pdf | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | |
190 | Daniel M. Koretz | Sheila I. Barron | "This study also illustrated in numerous ways the limitations of current research on the validity of gains.” p. xviii | Dismissive | The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS) | Rand Education, 1998 | http://www.rand.org/content/dam/rand/pubs/monograph_reports/2009/MR1014.pdf | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | |
191 | Daniel M. Koretz | Sheila I. Barron | “The field of measurement has seen many decades of intensive development of methods for evaluating scores cross-sectionally, but much less attention has been devoted to the problem of evaluating gains. . . . [T]his methodological gap is likely to become ever more important.” p. 122 | Dismissive | The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS) | Rand Education, 1998 | http://www.rand.org/content/dam/rand/pubs/monograph_reports/2009/MR1014.pdf | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | |
192 | Daniel M. Koretz | Sheila I. Barron | “The contrast between mathematics … and reading … underlines the limits of our current knowledge of the mechanisms that underlie score inflation.” p. 122 | Dismissive | The validity of gains in scores on the Kentucky Instructional Results Information System (KIRIS) | Rand Education, 1998 | http://www.rand.org/content/dam/rand/pubs/monograph_reports/2009/MR1014.pdf | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | |
193 | Daniel M. Koretz | reported by Debra Viadero | “...all of the researchers interviewed agreed with FairTest’s contention that research evidence supporting the use of high-stakes tests as a means of improving schools is thin.” | Dismissive | FairTest report questions reliance on high-stakes testing by states | Debra Viadero, Education Week, January 28, 1998. | In fact, a very large number of studies do so. See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |
194 | Robert L. Linn | Daniel M. Koretz, Eva Baker | “’Yet we do not have the necessary comprehensive dependable data. . . .’ (Tyler 1996a, p. 95)” p. 8 | Dismissive | Assessing the Validity of the National Assessment of Educational Progress | CSE Technical Report 416 (June 1996) | http://www.cse.ucla.edu/products/reports/TECH416.pdf | There was extended discussion and consideration. Simply put, they did not get their way because others disagreed with them. | |
195 | Robert L. Linn | Daniel M. Koretz, Eva Baker | “There is a need for more extended discussion and reconsideration of the approach being used to measure long-term trends.” p. 21 | Dismissive | Assessing the Validity of the National Assessment of Educational Progress | CSE Technical Report 416 (June 1996) | http://www.cse.ucla.edu/products/reports/TECH416.pdf | There was extended discussion and consideration. Simply put, they did not get their way because others disagreed with them. | |
196 | Robert L. Linn | Daniel M. Koretz, Eva Baker | “Only a small minority of the articles that discussed achievement levels made any mention of the judgmental nature of the levels, and most of those did so only briefly.” p. 27 | Denigrating | Assessing the Validity of the National Assessment of Educational Progress | CSE Technical Report 416 (June 1996) | http://www.cse.ucla.edu/products/reports/TECH416.pdf | All achievement levels, just like all course grades, are set subjectively. This information was never hidden. | |
197 | Daniel M. Koretz | Erik A. Hanushek, D.W. Jorgenson (Eds.) | "Despite the long history of assessment-based accountability, hard evidence about its effects is surprisingly sparse, and the little evidence that is available is not encouraging. ...The large positive effects assumed by advocates...are often not substantiated by hard evidence....” p.172 | Dismissive | Using student assessments for educational accountability | Improving America’s schools: The role of incentives. Washington, D.C.: National Academy Press, 1996 | https://www.nap.edu/catalog/5143/improving-americas-schools-the-role-of-incentives | In fact, a very large number of studies do so. See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |
198 | Daniel M. Koretz | Erik A. Hanushek, D.W. Jorgenson (Eds.) | "The testing of the 1980s reform movement fell into disfavor surprisingly soon. Confidence in the reforms was so high at the outset that few programs were evaluated realistically." p.173 | Dismissive | Using student assessments for educational accountability | Improving America’s schools: The role of incentives. Washington, D.C.: National Academy Press, 1996 | https://www.nap.edu/catalog/5143/improving-americas-schools-the-role-of-incentives | In fact, a very large number of studies do so. See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |
199 | Daniel M. Koretz | Erik A. Hanushek, D.W. Jorgenson (Eds.) | "Although overconfidence in the test-based reforms of the 1980s resulted in a scarcity of research on their impact, there is enough evidence to paint a discouraging picture." p.181 | Dismissive | Using student assessments for educational accountability | Improving America’s schools: The role of incentives. Washington, D.C.: National Academy Press, 1996 | https://www.nap.edu/catalog/5143/improving-americas-schools-the-role-of-incentives | In fact, a very large number of studies do so. See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |
200 | Daniel M. Koretz | Erik A. Hanushek, D.W. Jorgenson (Eds.) | "Although Cannell's report was wrong in some of the specifics, his basic conclusion that an implausible proportion of jurisdictions were above the national average was confirmed." | Denigrating | Using student assessments for educational accountability | Improving America’s schools: The role of incentives. Washington, D.C.: National Academy Press, 1996 | https://www.nap.edu/catalog/5143/improving-americas-schools-the-role-of-incentives | No. Cannell was exactly right. The cause was corruption, lax security, and cheating. See, for example, https://nonpartisaneducation.org/Review/Articles/v6n3.htm | |
201 | Daniel M. Koretz | Erik A. Hanushek, D.W. Jorgenson (Eds.) | "Nevertheless, evidence about the instructional effects of performance assessment programs remains scarce. It is not clear under what circumstances these programs are conducive to improved teaching or what the effects are on student achievement." p.188 | Dismissive | Using student assessments for educational accountability | Improving America’s schools: The role of incentives. Washington, D.C.: National Academy Press, 1996 | https://www.nap.edu/catalog/5143/improving-americas-schools-the-role-of-incentives | In fact, a very large number of studies do so. See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |
202 | Daniel M. Koretz | Erik A. Hanushek, D.W. Jorgenson (Eds.) | "The discussion above represents a fairly discouraging assessment of test-based accountability. Traditional approaches have not worked well, and the scanty available evidence does not suggest that shifting to innovative testing formats will overcome their deficiencies." p.189 | Dismissive | Using student assessments for educational accountability | Improving America’s schools: The role of incentives. Washington, D.C.: National Academy Press, 1996 | https://www.nap.edu/catalog/5143/improving-americas-schools-the-role-of-incentives | In fact, a very large number of studies do so. See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |
203 | Daniel M. Koretz | "Some observers have maintained that performance assessments used for accountability are vulnerable to the same problem [of score inflation]. Evidence at this point is scarce...." p.52 | Dismissive | Final Report: Perceived Effects of the Maryland School Performance Assessment Program | CSE Technical Report 409, CRESST/Rand Education, March 1996 | http://cresst.org/wp-content/uploads/TECH409.pdf | In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | ||
204 | Daniel M. Koretz | "Despite the intense controversy engendered by proposals for national testing, questions in the second and third sets-the essential questions about the practicality and likely effects of national testing-have been aired insufficiently in many quarters." p.31 | Dismissive | A Call for Caution: NAEP and National Testing: Issues and Implications for Educators | NASSP Bulletin, September 1992 | There was extended discussion and consideration. Simply put, they did not get their way because others disagreed with them. | |||
205 | Daniel M. Koretz | "Although these proposed new uses for NAEP relatively may seem straightforward, they actually raise a number of difficult technical issues. I will note four, none of which has received sufficient attention in the policy debate about national testing." p.34 | Dismissive | A Call for Caution: NAEP and National Testing: Issues and Implications for Educators | NASSP Bulletin, September 1992 | There was extended discussion and consideration. Simply put, they did not get their way because others disagreed with them. | |||
206 | Daniel M. Koretz | "Data [from the NAEP] about educational factors that influence achievement are sparse, …" p.37 | Dismissive | A Call for Caution: NAEP and National Testing: Issues and Implications for Educators | NASSP Bulletin, September 1992 | However, there exists an abundance of other sources of that information, which could be combined with NAEP data to paint the bigger picture. |||
207 | Daniel M. Koretz | "Moreover, even if NAEP could be strengthened to the point where it could reliably identify states with better educational programs, it would be unable, as is currently structured, to provide trustworthy information about which aspects of those programs matter, because its information on educational policies and practices is limited." pp.37–38 | Dismissive | A Call for Caution: NAEP and National Testing: Issues and Implications for Educators | NASSP Bulletin, September 1992 | However, there exists an abundance of other sources of that information, which could be combined with NAEP data to paint the bigger picture. |||
208 | Daniel M. Koretz | George F. Madaus, Edward Haertel, Albert E. Beaton | " … to the extent that the proposed assessments really are innovative, they are in many cases unfinished and untested. They are at a stage where they are ripe for a serious R&D effort, complete with rigorous evaluation, but they are not yet ready to be a linchpin of national policy." | Dismissive | Congressional Testimony: National Educational Standards and Testing: A Response to the Recommendations of the National Council on Education Standards and Testing, 1992 | RAND Institute on Education and Training | https://www.rand.org/pubs/testimonies/CT100.html | Actually, high-quality evaluations of testing interventions have been numerous and common over the past century. Most of them do not produce the results that Koretz prefers, however, so he declares them nonexistent. See https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm. Indeed, a wave of "sunshine" laws in states from the 1970s to the 1990s required program evaluations. Those evaluations may reside on shelves in state libraries rather than in academic journals, but they exist. | |
209 | Daniel M. Koretz | George F. Madaus, Edward Haertel, Albert E. Beaton | "To our knowledge, there is no evidence that performance tests are less susceptible to this problem [narrowing the curriculum] than conventional tests." p.6 | Dismissive | Congressional Testimony: National Educational Standards and Testing: A Response to the Recommendations of the National Council on Education Standards and Testing, 1992 | RAND Institute on Education and Training | https://www.rand.org/pubs/testimonies/CT100.html | |
210 | Daniel M. Koretz | George F. Madaus, Edward Haertel, Albert E. Beaton | "A test cannot be validated by asking a group of individuals to examine its content…." p.11 | Dismissive | Congressional Testimony: National Educational Standards and Testing: A Response to the Recommendations of the National Council on Education Standards and Testing, 1992 | RAND Institute on Education and Training | https://www.rand.org/pubs/testimonies/CT100.html | Not for every type of validation, but for some, yes, that's exactly how to do it -- actual classroom teachers for content validation, for example. |
211 | Daniel M. Koretz | George F. Madaus, Edward Haertel, Albert E. Beaton | "During the 1980s, very few jurisdictions using test-based accountability evaluated the effects of their programs, and some flatly refused outside evaluations" | Dismissive | Congressional Testimony: National Educational Standards and Testing: A Response to the Recommendations of the National Council on Education Standards and Testing, 1992 | RAND Institute on Education and Training | https://www.rand.org/pubs/testimonies/CT100.html | Actually, high-quality evaluations of testing interventions have been numerous and common over the past century. Most of them do not produce the results that Koretz prefers, however, so he declares them nonexistent. See https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm. Indeed, a wave of "sunshine" laws in states from the 1970s to the 1990s required program evaluations. Those evaluations may reside on shelves in state libraries rather than in academic journals, but they exist. | |
212 | Daniel Koretz | "There is no agreement, however, about the precautions that must be taken. Now that states are ranked (and evaluated) by the media on the basis of NAEP scores, will the breadth of the NAEP and the test security procedures suffice to keep the test uncorrupted? What about the NAGB proposal to permit use of NAEP at the local level? At this point, we have no firm answers, and consequences of inadequate caution could be large indeed." | Dismissive | NAEP and the Movement toward National Testing, p.10 | Paper presented at the Annual Meeting of the American Educational Research Association (San Francisco, CA, April 20-24, 1992) ||||
213 | Daniel Koretz | "Data about educational factors that influence achievement are sparse and are partially at the wrong level of aggregation; many of the important factors vary at the level of districts, schools, teachers, or even specific classes." | Dismissive | NAEP and the Movement toward National Testing, p.11 | Paper presented at the Annual Meeting of the American Educational Research Association (San Francisco, CA, April 20-24, 1992) ||||
214 | Daniel M. Koretz | Robert L. Linn, Stephen Dunbar, Lorrie A. Shepard | “Evidence relevant to this debate has been limited.” p. 2 | Dismissive | The Effects of High-Stakes Testing On Achievement: Preliminary Findings About Generalization Across Tests | Originally presented at the annual meeting of the AERA and the NCME, Chicago, April 5, 1991 | http://nepc.colorado.edu/files/HighStakesTesting.pdf | In fact, a very large number of studies do so. See, for example, https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract & https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |
215 | Daniel Koretz | "Much of the debate about reforms appears to be predicated on an implicit view--never empirically substantiated--that a relatively small number of educational factors can account for much of the decline [in test scores]." | Dismissive | Educational Practices, Trends in Achievement, and the Potential of the Reform Movement, p.351 | Educational Administration Quarterly, 24(3), August 1988, 350-359 | ||||
IRONIES: | |||||||||
Daniel M. Koretz | "I discuss a number of important issues that have arisen in K-12 testing and explore their implications for testing in the postsecondary sector. These include ... overstating comparability ... and unwarranted causal inference." | Measuring Postsecondary Achievement: Lessons from Large-Scale Assessments in the K-12 Sector | Higher Education Policy, April 24, 2019, Abstract | https://link.springer.com/article/10.1057/s41307-019-00142-4 | No one overstates comparability between tests more than Daniel Koretz. | ||||
Daniel M. Koretz | "Although this problem has been documented for more than a quarter of a century, it is still widely ignored, and the public is fed a steady diet of seriously misleading information about improvements in schools." | The Testing Charade: Pretending to Make Schools Better [Kindle location 3229] | University of Chicago Press, 2017 | They are fed a steady diet of seriously misleading information from him. | |||||
Daniel Koretz | "That is the debate I hope Charade will promote. We need to face up to the findings of three decades of research on the effects of TBA and engage in a vigorous debate about how best to move forward, including discussion about how best to use standardized testing. " | A Realistic Perspective on High-Stakes Testing | Education Next, November 21, 2017 | ||||||
Daniel Koretz | "I stress that there is ample room for debate about how to move forward and that regardless of who wins those debates, we will make mistakes." | A Realistic Perspective on High-Stakes Testing | Education Next, November 21, 2017 | ||||||
Daniel M. Koretz | "It is worth considering why we are so unlikely to ever find out how common cheating has become. … the press remains gullible…" | The Testing Charade: Pretending to Make Schools Better [Kindle location 3229] | University of Chicago Press, 2017 | We know from surveys how common cheating is. It is very common. | |||||
Daniel M. Koretz | "…putting a stop to this disdain for evidence--this arrogant assumption that we know so much that we don't have to bother evaluating our ideas before imposing them on teachers and students--is one of the most important changes we have to make." | The Testing Charade: Pretending to Make Schools Better [Kindle location 3229] | University of Chicago Press, 2017 | Koretz steadfastly avoids any debate, any evaluation or criticism of his convoluted claims. To hear him tell it, no researcher in the world disagrees with him, or can produce any evidence that counters his assertions. | |||||
Daniel M. Koretz | "But the failure to evaluate the reforms also reflects a particular arrogance." | The Testing Charade: Pretending to Make Schools Better [Kindle location 3229] | University of Chicago Press, 2017 | Koretz steadfastly avoids any debate, any evaluation or criticism of his convoluted claims. To hear him tell it, no researcher in the world disagrees with him, or can produce any evidence that counters his assertions. | |||||
Daniel M. Koretz | "I've several times excoriated some of the reformers for assuming that whatever they dreamed up would work well without turning to actual evidence." | The Testing Charade: Pretending to Make Schools Better [Kindle location 3229] | University of Chicago Press, 2017 | ||||||
Daniel M. Koretz | Jennifer L. Jennings | "Data are considered proprietary—a position that the restrictions imposed by the federal Family Educational Rights and Privacy Act (FERPA) have made easier to maintain publicly. Access is usually provided only for research which is not seen as unduly threatening to the leaders’ immediate political agendas. The fact that this last consideration is often openly discussed underscores the lack of a culture of public accountability." | The Misunderstanding and Use of Data from Educational Tests, pp.4-5 | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities/ | ||||
Daniel M. Koretz | Jennifer L. Jennings | "This unwillingness to countenance honest but potentially threatening research garners very little discussion, but in this respect, education is an anomaly. In many areas of public policy, such as drug safety or vehicle safety, there is an expectation that the public is owed honest and impartial evaluation and research. For example, imagine what would have happed if the CEO of Merck had responded to reports of side-effects from Vioxx by saying that allowing access to data was “not our priority at present,” which is a not infrequent response to data requests made to districts or states. In public education, there is no expectation that the public has a right to honest evaluation, and data are seen as the policymakers’ proprietary sandbox, to which they can grant access when it happens to serve their political needs." | The Misunderstanding and Use of Data from Educational Tests, p.5 | Prepared for Spencer Foundation meetings, Chicago, IL, February 11, 2010. Revised November 21, 2010 | http://www.spencer.org/data-use-and-educational-improvement-initiative-activities/ | Koretz steadfastly avoids any debate, any evaluation or criticism of his convoluted claims. To hear him tell it, no researcher in the world disagrees with him, or can produce any evidence that counters his assertions. | |||
Daniel Koretz | "Many of these studies use highly aggregated data—comparisons between states and entire nations—which exacerbates the problem of omitted variables." | Moving Past No Child Left Behind, p.804 | Science 326 (5954), 803-804 | ||||||
Daniel M. Koretz | "One sometimes disquieting consequence of the incompleteness of tests is that different tests often provide somewhat inconsistent results." (p. 10) | Measuring up: What educational testing really tells us. | Harvard University Press, 2008 | Google Books |||||
Daniel M. Koretz | "Even a single test can provide varying results. Just as polls have a margin of error, so do achievement tests. Students who take more than one form of a test typically obtain different scores." (p. 11) | Measuring up: What educational testing really tells us. | Harvard University Press, 2008 | Google Books | |||||
Daniel M. Koretz | "Even well-designed tests will often provide substantially different views of trends because of differences in content and other aspects of the tests' design. . . . [W]e have to be careful not to place too much confidence in detailed findings, such as the precise size of changes over time or of differences between groups." (p. 92) | Measuring up: What educational testing really tells us. | Harvard University Press, 2008 | Google Books | |||||
Daniel M. Koretz | "[O]ne cannot give all the credit or blame to one factor . . . without investigating the impact of others. Many of the complex statistical models used in economics, sociology, epidemiology, and other sciences are efforts to take into account (or 'control' for') other factors that offer plausible alternative explanations of the observed data, and many apportion variation in the outcome-say, test scores-among various possible causes. …A hypothesis is only scientifically credible when the evidence gathered has ruled out plausible alternative explanations." (pp. 122-123) | Measuring up: What educational testing really tells us. | Harvard University Press, 2008 | Google Books | Yet, in his studies test administration and security characteristics are totally left out, as if they could not matter. | ||||
Daniel M. Koretz | "[A] simple correlation need not indicate that one of the factors causes the other." (p. 123) | Measuring up: What educational testing really tells us. | Harvard University Press, 2008 | Google Books | Yet, Koretz rejects decades of experimental evidence on test coaching and, instead, relies on purely correlational, apples and oranges comparisons of unrelated tests. | ||||
Daniel M. Koretz | "Any number of studies have shown the complexity of the non-educational factors that can affect achievement and test scores." (p. 129) | Measuring up: What educational testing really tells us. | Harvard University Press, 2008 | Google Books | |||||
Daniel M. Koretz | "For a test to be even approximately parallel, it has to be so close in content that the effects of inappropriate coaching are likely to generalize to some degree to the new form." | Measuring up: What educational testing really tells us. | Harvard University Press, 2008 | Google Books | Yet, he argues that high-stakes tests should be "audited" by comparing their score trends to those of unrelated no-stakes tests. | ||||
Daniel Koretz | "Certainly, one can find many examples of publications and presentations debating critical issues of test use. Both this issue and the special issue of Applied Measurement in Education described earlier are examples. Nonetheless, I would argue that this debate is too limited, too episodic, and often too far from the field’s primary ongoing concerns to bring us to the level of agreement needed for enforceable standards." | Steps Toward More Effective Implementation of the Standards for Educational and Psychological Testing, p.49 | Educational Measurement: Issues and Practice, Fall 2006 | Yet, he ignores the vast majority of evidence and opinion on educational test use -- a century's and a world's worth of it -- claims it doesn't exist. | |||||
Daniel Koretz | "First, as a field, we can do more to encourage ongoing debate about important issues of test use. We can allocate conference slots to this purpose, sponsor more special issues of journals, and experiment with other publication outlets, e.g., on the web. It is important, however, thatwegenerate not only additional debate about test use, but more protracted debate." | Steps Toward More Effective Implementation of the Standards for Educational and Psychological Testing, p.49 | Educational Measurement: Issues and Practice, Fall 2006 | Yet, he ignores the vast majority of evidence and opinion on educational test use -- a century's and a world's worth of it -- claims it doesn't exist. | |||||
Daniel M. Koretz | George F. Madaus, Edward Haertel, Albert E. Beaton | "… even fairly minor differences between tests can produce fundamental differences in their results (e.g., Beaton and Zwick, 1990; Koretz, 1986)" | Congressional Testimony: National Educational Standards and Testing: A Response to the Recommendations of the National Council on Education Standards and Testing, 1992 | RAND Institute on Education and Training | https://www.rand.org/pubs/testimonies/CT100.html ||||
Cite selves or colleagues in the group, but dismiss or denigrate all other work. |||||||||
Falsely claim that research has only recently been done on a topic. |||||||||
Author cites (and accepts as fact without checking) someone else's dismissive review. |||||||||
* Cannell, J.J. (1987). Nationally Normed Elementary Achievement Testing in America's Public Schools: How All Fifty States are Above the National Average, Daniels, WV: Friends for Education; Cannell, J.J. (1989). How Public Educators Cheat on Standardized Achievement Tests: The “Lake Wobegon” Report. Albuquerque, NM: Friends for Education. | https://nonpartisaneducation.org/Review/Books/CannellBook1.htm |