Dismissive Reviews in Education Policy Research | |||||||||||
Author | Co-author(s) | Dismissive Quote | Type | Title | Source | Link1 | Funder | Notes | |||
1 | Jennifer L. Steele, Matthew W. Lewis, Lucrecia Santibañez, Susannah Faxon-Mills, Mollie Rudnick, Brian M. Stecher, Laura S. Hamilton | "Despite taking on considerable momentum in the field, competency-based systems have not been extensively researched." p.2 | Dismissive | Competency-Based Education in Three Pilot Programs: Examining Implementation and Outcomes | Rand Education, 2014 | https://www.rand.org/content/dam/rand/pubs/research_reports/RR700/RR732/RAND_RR732.pdf | "The research described in this report was sponsored by the Bill & Melinda Gates Foundation" | ||||
2 | Jennifer L. Steele, Matthew W. Lewis, Lucrecia Santibañez, Susannah Faxon-Mills, Mollie Rudnick, Brian M. Stecher, Laura S. Hamilton | "Recent studies have described the experiences of educators working to undertake competency-based reforms or have highlighted promising models, but these studies have not systematically examined the effects of these models on student learning or persistence." p.2 | Denigrating | Competency-Based Education in Three Pilot Programs: Examining Implementation and Outcomes | Rand Education, 2014 | https://www.rand.org/content/dam/rand/pubs/research_reports/RR700/RR732/RAND_RR732.pdf | "The research described in this report was sponsored by the Bill & Melinda Gates Foundation" | ||||
3 | Jennifer L. Steele, Matthew W. Lewis, Lucrecia Santibañez, Susannah Faxon-Mills, Mollie Rudnick, Brian M. Stecher, Laura S. Hamilton | "… there are no studies that would allow us to attribute outperformance to the competency-based education systems alone," p.2 | Dismissive | Competency-Based Education in Three Pilot Programs: Examining Implementation and Outcomes | Rand Education, 2014 | https://www.rand.org/content/dam/rand/pubs/research_reports/RR700/RR732/RAND_RR732.pdf | "The research described in this report was sponsored by the Bill & Melinda Gates Foundation" | ||||
4 | Jennifer L. Steele, Matthew W. Lewis, Lucrecia Santibañez, Susannah Faxon-Mills, Mollie Rudnick, Brian M. Stecher, Laura S. Hamilton | "Because it is one of the first studies we are aware of since the late 1980s that has attempted to estimate the impact of competency-based models on students’ academic outcomes," p.4 | 1stness | Competency-Based Education in Three Pilot Programs: Examining Implementation and Outcomes | Rand Education, 2014 | https://www.rand.org/content/dam/rand/pubs/research_reports/RR700/RR732/RAND_RR732.pdf | "The research described in this report was sponsored by the Bill & Melinda Gates Foundation" | ||||
5 | Jennifer L. Steele, Matthew W. Lewis, Lucrecia Santibañez, Susannah Faxon-Mills, Mollie Rudnick, Brian M. Stecher, Laura S. Hamilton | "In part, the lack of recent research on competency-based education may be due to variability around the concept of competency-based education itself." p.10 | Dismissive | Competency-Based Education in Three Pilot Programs: Examining Implementation and Outcomes | Rand Education, 2014 | https://www.rand.org/content/dam/rand/pubs/research_reports/RR700/RR732/RAND_RR732.pdf | "The research described in this report was sponsored by the Bill & Melinda Gates Foundation" | ||||
6 | Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher | "In particular, there is still much to learn about how changes in testing might influence the education system and how tests of deeper content and more complex skills and processes could best be used to promote the Foundation’s goals for deeper learning." p.1 | Dismissive | New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement | Rand Corporation Research Report, 2013 | "Funding to support the research was provided by the William and Flora Hewlett Foundation." "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation." | |||||
7 | Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher | "Given the gaps in evidence regarding the link between testing and student outcomes … " p.1 | Dismissive | New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement | Rand Corporation Research Report, 2013 | "Funding to support the research was provided by the William and Flora Hewlett Foundation." "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation." | |||||
8 | Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher | "The first step for each of these research areas was to identify relevant material from previous literature reviews on these topics, including those conducted by RAND researchers (e.g., Hamilton, Stecher, and Klein, 2002; Hamilton, 2003; Stecher, 2010) and by the National Research Council (e.g., Koenig, 2011)." p.5 | Dismissive | New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement | Rand Corporation Research Report, 2013 | "Funding to support the research was provided by the William and Flora Hewlett Foundation." "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation." | |||||
9 | Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher | "… we paid particular attention to sources from the past ten years, since these studies were less likely to have been included in previous literature reviews." p.5 | Dismissive | New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement | Rand Corporation Research Report, 2013 | "Funding to support the research was provided by the William and Flora Hewlett Foundation." "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation." | |||||
10 | Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher | "Time and resource constraints limited the extent of our literature reviews, but we do not think this had a serious effect on our findings. Most importantly, we included all the clearly relevant studies from major sources that were available for electronic searching. In addition, many of the studies we reviewed also included comprehensive reviews of other literature, leading to fairly wide coverage of each body of literature." p.8 | Dismissive | New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement | Rand Corporation Research Report, 2013 | "Funding to support the research was provided by the William and Flora Hewlett Foundation." "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation." | |||||
11 | Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher | "However, the amount of research on test attributes is limited, and the research has been conducted in a wide variety of contexts involving a wide variety of tests. Thus, while the findings are interesting, few have been replicated." p.22 | Dismissive | New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement | Rand Corporation Research Report, 2013 | "Funding to support the research was provided by the William and Flora Hewlett Foundation." "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation." | |||||
12 | Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher | "It is important to recognize that the literature on how school characteristics, such as urbanicity and governance, affect educators’ responses to testing is sparse." p.29 | Dismissive | New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement | Rand Corporation Research Report, 2013 | "Funding to support the research was provided by the William and Flora Hewlett Foundation." "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation." | |||||
13 | Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher | "… there is little empirical evidence that provides guidance on the amount and types of professional development that would promote constructive responses to assessment." | Dismissive | New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement | Rand Corporation Research Report, 2013 | "Funding to support the research was provided by the William and Flora Hewlett Foundation." "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation." | |||||
14 | Laura S. Hamilton | Brian M. Stecher, Kun Yuan | "He also noted that ‘virtually all of the arguments, both for and against standards, are based on beliefs and hypotheses rather than on direct empirical evidence’ (p. 427). Although a large and growing body of research has been conducted to examine the effects of SBA, the caution Porter expressed in 1994 about the lack of empirical evidence remains relevant today." pp.157-158 | Denigrating | Standards-Based Accountability in the United States: Lessons Learned and Future Directions | Education Inquiry, 3(2), June 2012, 149-170 | https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 | Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930). | |||
15 | Laura S. Hamilton | Brian M. Stecher, Kun Yuan | "High-quality research on the effects of SBA is difficult to conduct for a number of reasons,…." p.158 | Dismissive | Standards-Based Accountability in the United States: Lessons Learned and Future Directions | Education Inquiry, 3(2), June 2012, 149-170 | https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 | Access to anonymized student data is granted all the time. Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that one cannot find one or a few districts out of the many thousands to cooperate in a study to discredit testing. | |||
16 | Laura S. Hamilton | Brian M. Stecher, Kun Yuan | "Even when the necessary data have been collected by states or other entities, it is often difficult for researchers to obtain these data because those responsible for the data refuse to grant access, either because of concerns about confidentiality or because they are not interested in having their programmes scrutinised by researchers. Thus, the amount of rigorous analysis is limited." p.158 | Dismissive | Standards-Based Accountability in the United States: Lessons Learned and Future Directions | Education Inquiry, 3(2), June 2012, 149-170 | https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 | Access to anonymized student data is granted all the time. Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that one cannot find one or a few districts out of the many thousands to cooperate in a study to discredit testing. | |||
17 | Laura S. Hamilton | Brian M. Stecher, Kun Yuan | "These evaluation findings reveal the challenges inherent in trying to judge the quality of standards. Arguably the most important test of quality is whether the standards promote high-quality instruction and improved student learning but, as we discuss later, there is very little research to address that question." p.158 | Dismissive | Standards-Based Accountability in the United States: Lessons Learned and Future Directions | Education Inquiry, 3(2), June 2012, 149-170 | https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 | Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930). | |||
18 | Laura S. Hamilton | Brian M. Stecher, Kun Yuan | "In fact, the bulk of research relevant to SBA has focused on the links between high-stakes tests and educators’ practices rather than standards and practices." p.159 | Dismissive | Standards-Based Accountability in the United States: Lessons Learned and Future Directions | Education Inquiry, 3(2), June 2012, 149-170 | https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 | Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930). | |||
19 | Laura S. Hamilton | Brian M. Stecher, Kun Yuan | "The existing evidence does not provide definitive guidance regarding the SBA system features that would be most likely to promote desirable outcomes." p.163 | Dismissive | Standards-Based Accountability in the United States: Lessons Learned and Future Directions | Education Inquiry, 3(2), June 2012, 149-170 | https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 | Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930). | |||
20 | Laura S. Hamilton | "Despite the widespread enthusiasm for assessment-based reforms, many of the current and proposed uses of large-scale assessments are based on unverified assumptions about the extent to which they will actually lead to improved teaching and learning, and insufficient attention has been paid to the characteristics of assessment programs that are likely to promote desired outcomes." | Denigrating | Testing What Has Been Taught, p.47 | American Educator, Winter 2010-2011 | https://www.aft.org/sites/default/files/periodicals/Hamilton.pdf | Relevant studies of the effects of varying types of incentive or the optimal structure of testing programs include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); Hurlock (1925); and Zeng (2001). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. | "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones." |
|||
21 | Laura S. Hamilton | "Can assessments meaningfully be aligned to standards, … What would the key features of an assessment system designed to increase student learning and improve instruction be? While current assessment knowledge is not sufficient to fully answer these questions, in this article I offer an overview of what is known and several suggestions for improving our approach to assessment." | Denigrating | Testing What Has Been Taught, p.47 | American Educator, Winter 2010-2011 | https://www.aft.org/sites/default/files/periodicals/Hamilton.pdf | Relevant studies of the effects of varying types of incentive or the optimal structure of testing programs include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); Hurlock (1925); and Zeng (2001). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. | "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones." |
|||
22 | Laura S. Hamilton | "There is no research evidence to tell us definitively how to build an assessment system that will promote student learning and be resistant to the negative consequences that are common in high-stakes testing programs." | Dismissive | Testing What Has Been Taught, p.49 | American Educator, Winter 2010-2011 | https://www.aft.org/sites/default/files/periodicals/Hamilton.pdf | Relevant studies of the effects of varying types of incentive or the optimal structure of testing programs include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); Hurlock (1925); and Zeng (2001). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. | "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones." |
|||
23 | Laura S. Hamilton | "Research on the effects of various assessment-design features is limited, so any effort that relies heavily on assessment as a tool for school improvement should be carried out with caution." | Denigrating | Testing What Has Been Taught, p.50 | American Educator, Winter 2010-2011 | https://www.aft.org/sites/default/files/periodicals/Hamilton.pdf | Relevant studies of the effects of varying types of incentive or the optimal structure of testing programs include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); Hurlock (1925); and Zeng (2001). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. | "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones." |
|||
24 | Laura S. Hamilton | Brian M. Stecher, Kun Yuan | “A few studies have attempted to examine how the creation and publication of standards, per se, have affected practices.” p. 3 | Dismissive | Standards-Based Reform in the United States: History, Research, and Future Directions | Center on Education Policy, December, 2008 | http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf | "This work was supported by the National Science Foundation under Grant No. REC-0228295." | Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930). | ||
25 | Laura S. Hamilton | Brian M. Stecher, Kun Yuan | “The research evidence does not provide definitive answers to these questions.” p. 6 | Denigrating | Standards-Based Reform in the United States: History, Research, and Future Directions | Center on Education Policy, December, 2008 | http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf | "This work was supported by the National Science Foundation under Grant No. REC-0228295." | Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930). | ||
26 | Laura S. Hamilton | Brian M. Stecher, Kun Yuan | “He [Porter 1994] also noted that ‘virtually all of the arguments, both for and against standards, are based on beliefs and hypotheses rather than on direct empirical evidence’ (p. 427). Although a large and growing body of research has been conducted to examine the effects of SBR, the caution Porter expressed in 1994 about the lack of empirical evidence remains relevant today.” pp. 34-35 | Dismissive | Standards-Based Reform in the United States: History, Research, and Future Directions | Center on Education Policy, December, 2008 | http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf | "This work was supported by the National Science Foundation under Grant No. REC-0228295." | Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930). | ||
27 | Laura S. Hamilton | Brian M. Stecher, Kun Yuan | “Arguably the most important test of quality is whether the standards promote high-quality instruction and improved student learning, but as we discuss later, there is very little research to address that question.” p. 37 | Dismissive | Standards-Based Reform in the United States: History, Research, and Future Directions | Center on Education Policy, December, 2008 | http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf | "This work was supported by the National Science Foundation under Grant No. REC-0228295." | Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930). | ||
28 | Laura S. Hamilton | Brian M. Stecher, Kun Yuan | “[T]here have been a few studies of SBR as a comprehensive system. . . . [T]here is some research on how the adoption of standards, per se, or the alignment of standards with curriculum influences school practices or student outcomes.” p. 38 | Dismissive | Standards-Based Reform in the United States: History, Research, and Future Directions | Center on Education Policy, December, 2008 | http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf | "This work was supported by the National Science Foundation under Grant No. REC-0228295." | Relevant studies of the effects of varying types of incentive or the optimal structure of testing programs include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); Hurlock (1925); and Zeng (2001). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. | "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones." |
"What about: Brooks-Cooper, C. (1993), Brown, S. M. & Walberg, H. J. (1993), Heneman, H. G., III. (1998), Hurlock, E. B. (1925), Jones, J. et al. (1996), Kazdin, A. & Bootzin, R. (1972), Kelley, C. (1999), Kirkpatrick, J. E. (1934), O’Leary, K. D. & Drabman, R. (1971), Palmer, J. S. (2002), Richards, C. E. & Shen, T. M. (1992), Rosswork, S. G. (1977), Staats, A. (1973), Tuckman, B. W. (1994), Tuckman, B. W. & Trimble, S. (1997), Webster, W. J., Mendro, R. L., Orsack, T., Weerasinghe, D. & Bembry, K. (1997, September). The Dallas Value-Added Accountability System (pp. 81–99) & Little practical difference and pie in the sky (pp. 120–131). In J. Millman (Ed.), Grading teachers, grading schools: Is student achievement a valid evaluation measure? Thousand Oaks, CA: Corwin Press." |
29 | Laura S. Hamilton | Brian M. Stecher, Kun Yuan | “The lack of evidence about the effects of SBR derives primarily from the fact that the vision has never been fully realized in practice.” p. 47 | Dismissive | Standards-Based Reform in the United States: History, Research, and Future Directions | Center on Education Policy, December, 2008 | http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf | "This work was supported by the National Science Foundation under Grant No. REC-0228295." | Relevant studies of the effects of varying types of incentive or the optimal structure of testing programs include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); Hurlock (1925); and Zeng (2001). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. | "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones." |
"What about: Brooks-Cooper, C. (1993), Brown, S. M. & Walberg, H. J. (1993), Heneman, H. G., III. (1998), Hurlock, E. B. (1925), Jones, J. et al. (1996), Kazdin, A. & Bootzin, R. (1972), Kelley, C. (1999), Kirkpatrick, J. E. (1934), O’Leary, K. D. & Drabman, R. (1971), Palmer, J. S. (2002), Richards, C. E. & Shen, T. M. (1992), Rosswork, S. G. (1977), Staats, A. (1973), Tuckman, B. W. (1994), Tuckman, B. W. & Trimble, S. (1997), Webster, W. J., Mendro, R. L., Orsack, T., Weerasinghe, D. & Bembry, K. (1997, September). The Dallas Value-Added Accountability System (pp. 81–99) & Little practical difference and pie in the sky (pp. 120–131). In J. Millman (Ed.), Grading teachers, grading schools: Is student achievement a valid evaluation measure? Thousand Oaks, CA: Corwin Press." |
30 | Laura S. Hamilton | Brian M. Stecher, Kun Yuan | “[A]lthough many conceptions of SBR emphasize autonomy, we currently know relatively little about the effects of granting autonomy or what the right balance is between autonomy and prescriptiveness.” p. 55 | Dismissive | Standards-Based Reform in the United States: History, Research, and Future Directions | Center on Education Policy, December, 2008 | http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf | "This work was supported by the National Science Foundation under Grant No. REC-0228295." | Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930). | ||
31 | Laura S. Hamilton | Brian M. Stecher, Kun Yuan | “One of the primary responsibilities of the federal government should be to ensure ongoing collection of evidence demonstrating the effects of the policies, which could be used to make decisions about whether to continue on the current course or whether small adjustments or a major overhaul are needed.” p. 55 | Dismissive | Standards-Based Reform in the United States: History, Research, and Future Directions | Center on Education Policy, December, 2008 | http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf | "This work was supported by the National Science Foundation under Grant No. REC-0228295." | Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930). | ||
32 | Laura S. Hamilton | Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney | "For many educators, the utility of SBA was demonstrated in a few pioneering states in the 1990s. Two of the most prominent examples of SBA occurred in Texas and North Carolina, where scores on state accountability tests rose dramatically after the introduction of SBA systems (Grissmer and Flanagan, 1998)." p.4 | Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States | Rand Corporation, 2007 | https://www.rand.org/pubs/monographs/MG589.html | Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930). | ||||
33 | Laura S. Hamilton | Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney | "However, the paths through which SBA [standards-based accountability] changes district, school, and classroom practices and how these changes in practice influence student outcomes are largely unexplored. There is strong evidence that SBA leads to changes in teachers’ instructional practices (Hamilton, 2004; Stecher, 2002)." p.5 | Dismissive | Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States | Rand Corporation, 2007 | https://www.rand.org/pubs/monographs/MG589.html | Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930). | "This research was sponsored by the National Science Foundation under grant number REC-0228295." | ||
34 | Laura S. Hamilton | Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney | "Much less is known about the impact of SBA at the district and school levels and the relationships among actions at the various levels and student outcomes. This study was designed to shed light on this complex set of relationships…" p.5 | Dismissive | Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States | Rand Corporation, 2007 | https://www.rand.org/pubs/monographs/MG589.html | Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930). | "This research was sponsored by the National Science Foundation under grant number REC-0228295." | ||
35 | Julie A. Marsh, John F. Pane, and Laura S. Hamilton | "Unlike past studies of data use in schools, this paper brings together information systematically gathered from large, representative samples of educators at the district, school, and classroom levels in a variety of contexts." p.1 | Dismissive, Denigrating | Making Sense of Data-Driven Decision Making in Education | Rand Corporation Occasional Paper, 2006 | ||||||
36 | Julie A. Marsh, John F. Pane, and Laura S. Hamilton | "Although a few studies have tried to link DDDM to changes in school culture or performance (Chen et al., 2005; Copland, 2003; Feldman and Tung, 2001; Schmoker and Wilson, 1995; Wayman and Stringfield 2005), most of the literature focuses on implementation. In addition, previous work has tended to describe case studies of schools or has taken the form of advocacy or technical assistance (such as the “how to” implementation guides described by Feldman and Tung, 2001)." p.4 | Dismissive, Denigrating | Making Sense of Data-Driven Decision Making in Education | Rand Corporation Occasional Paper, 2006 | ||||||
37 | Daniel M. Koretz & Laura S. Hamilton | Robert L. Brennan, Ed. | "Most of the studies of [testing's] effects on practice report average responses that mask some of these important variations and interactions." p.552 | Denigrating | Testing for Accountability in K-12 | Chapter 15 in Educational Measurement, published by NCME and ACE, 2006 | Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); Hurlock (1925); and Zeng (2001). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, even prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. | "What about: Brooks-Cooper, C. (1993), Brown, S. M. & Walberg, H. J. (1993), Heneman, H. G., III. (1998), Hurlock, E. B. (1925), Jones, J. et al. (1996), Kazdin, A. & Bootzin, R. (1972), Kelley, C. (1999), Kirkpatrick, J. E. (1934), O’Leary, K. D. & Drabman, R. (1971), Palmer, J. S. (2002), Richards, C. E. & Shen, T. M. (1992), Rosswork, S. G. (1977), Staats, A. (1973), Tuckman, B. W. (1994), Tuckman, B. W. & Trimble, S. (1997), Webster, W. J., Mendro, R. L., Orsack, T., Weerasinghe, D. & Bembry, K. (1997, September). The Dallas Value-Added Accountability System (pp. 81–99) & Little practical difference and pie in the sky (pp. 120–131). In J. Millman (Ed.), Grading teachers, grading schools: Is student achievement a valid evaluation measure? Thousand Oaks, CA: Corwin Press." | |||
38 | Daniel M. Koretz & Laura S. Hamilton | Robert L. Brennan, Ed. | "There is no comprehensive source of information on how much time schools devote to coaching activities such as practicing on released test forms, but some studies suggest these activities are widespread." p.552 | Dismissive | Testing for Accountability in K-12 | Chapter 15 in Educational Measurement, published by NCME and ACE, 2006 | A comprehensive study that included a nationally representative population of all systemwide tests contains exactly that information. See Phelps, R. P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; and U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: U.S. General Accounting Office. Also, the test prep (or test coaching) literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post-World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . | ||||
39 | Daniel M. Koretz & Laura S. Hamilton | Robert L. Brennan, Ed. | "As with coaching, there are no comprehensive studies of the frequency of cheating across schools in the United States." p.553 | Dismissive | Testing for Accountability in K-12 | Chapter 15 in Educational Measurement, published by NCME and ACE, 2006 | Actually, there have been such studies: surveys in which respondents freely admit that they cheat and describe how. Moreover, news reports of cheating, by students or educators, have been voluminous. See, for example, Caveon Test Security's "Cheating in the News" section on its web site. | ||||
40 | Daniel M. Koretz & Laura S. Hamilton | Robert L. Brennan, Ed. | "However, in the absence of audit testing, this hypothesis [of score inflation] cannot be tested." p.553 | Denigrating | Testing for Accountability in K-12 | Chapter 15 in Educational Measurement, published by NCME and ACE, 2006 | Yes, it can be, and often has been, tested in experiments. Koretz's preferred method for "auditing" a high-stakes test is to compare its score trends to those of a parallel no-stakes test, which, presumably, will have totally reliable score trends. Yet, a cornucopia of experimental research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). | ||||
41 | Laura S. Hamilton | "Despite their popularity, in most cases these [education] reforms are not guided by a careful investigation of the probable consequences of using tests as accountability tools.", p.25 | Denigrating | Assessment as a Policy Tool | Chapter 2, in Review of Research in Education (27), 2003 | https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 | Hamilton's preferred method for "auditing" a high-stakes test is to compare its score trends to those of a parallel no-stakes test, which, presumably, will have totally reliable score trends. Yet, a cornucopia of experimental research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). | ||||
42 | Laura S. Hamilton | "Although numerous studies have examined the effects of high-stakes testing, the majority of these investigations have failed to reach the standards of quality that would be required to make strong inferences based upon them.", p.32 | Denigrating | Assessment as a Policy Tool | Chapter 2, in Review of Research in Education (27), 2003 | https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 | Nonsense; it is straightforward and has been done for over a century. For example, 70% (357) of the effect sizes in a recent meta-regression derived from randomized experiments conducted over the past century. Another 12% derived from multiple regression studies. https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract | ||||
43 | Laura S. Hamilton | "In addition, it is nearly impossible for researchers to set up the kind of experimental design that is most appropriate for examining cause-and-effect relationships." p.32 | Dismissive | Assessment as a Policy Tool | Chapter 2, in Review of Research in Education (27), 2003 | https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 | Nonsense; it is straightforward and has been done for over a century. For example, 70% (357) of the effect sizes in a recent meta-regression derived from randomized experiments conducted over the past century. Another 12% derived from multiple regression studies. https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract | ||||
44 | Laura S. Hamilton | "...even studies that are intended to be merely descriptive often suffer from poor measurement of the construct of interest, as well as biased samples that may result from nonrepresentative sampling or nonrandom refusal to participate in the research." p. 32 | Denigrating | Assessment as a Policy Tool | Chapter 2, in Review of Research in Education (27), 2003 | https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 | Nonsense; it is straightforward and has been done for over a century. For example, 70% (357) of the effect sizes in a recent meta-regression derived from randomized experiments conducted over the past century. Another 12% derived from multiple regression studies. https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract | ||||
45 | Laura S. Hamilton | "...when test scores are associated with consequences that are important or meaningful to teachers, it is likely that instruction will be affected. The empirical evidence, though not extensive, supports this distinction." p.33 | Dismissive | Assessment as a Policy Tool | Chapter 2, in Review of Research in Education (27), 2003 | https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 | Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies; study is a research review, research synthesis, or meta-analysis. | ||||
46 | Laura S. Hamilton | "...there are no comprehensive studies of the frequency of cheating." p.35 | Dismissive | Assessment as a Policy Tool | Chapter 2, in Review of Research in Education (27), 2003 | https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 | Actually, there have been such studies: surveys in which respondents freely admit that they cheat and describe how. Moreover, news reports of cheating, by students or educators, have been voluminous. See, for example, Caveon Test Security's "Cheating in the News" section on its web site. | ||||
47 | Laura S. Hamilton | "Although these studies are suggestive, they rely on teacher perceptions, and there is little direct evidence of how testing actually affects student morale." p.39 | Dismissive | Assessment as a Policy Tool | Chapter 2, in Review of Research in Education (27), 2003 | https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 | At least twelve pre-2004 student surveys were included here: https://www.nonpartisaneducation.org/Review/Resources/SurveyList.htm. See also: https://richardphelps.net/DemandForStandardizedTesting.pdf | ||||
48 | Laura S. Hamilton | "The overall lack of evidence regarding student morale, stress, and motivation is due in part to the difficulty that researchers have in gaining access to students and measuring their levels of these constructs (Stecher, 2002)." p.39 | Dismissive | Assessment as a Policy Tool | Chapter 2, in Review of Research in Education (27), 2003 | https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 | At least twelve pre-2004 student surveys were included here: https://www.nonpartisaneducation.org/Review/Resources/SurveyList.htm. See also: https://richardphelps.net/DemandForStandardizedTesting.pdf. In 150 of 241 qualitative studies, the focus of the interviews, case study, or observations was the effect of testing stakes on students. See: https://www.tandfonline.com/doi/abs/10.1080/15305058.2011.602920 | ||||
49 | Laura S. Hamilton | "...though one study that examined high school exit exams and that controlled for individual student characteristics (unlike most of the research on this topic) found no such relationship." p.40 | Denigrating | Assessment as a Policy Tool | Chapter 2, in Review of Research in Education (27), 2003 | https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 | The article to which she refers ignored most previous studies, mischaracterized the ones it acknowledged, and misclassified the testing programs in several states. | ||||
50 | Laura S. Hamilton | "There is much we do not know about score inflation." p.46 | Dismissive | Assessment as a Policy Tool | Chapter 2, in Review of Research in Education (27), 2003 | https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 | In fact, we know quite a lot about the source of higher levels of score inflation: it is lax test security. The many experimental studies of test coaching are consistent: coaching has a modest effect, not the volatile or very large effects that Koretz claims. | ||||
51 | Laura S. Hamilton | "there is simply too much that we currently do not know about how to design testing policies that promote desirable outcomes and prevent undesirable ones." p.57 | Dismissive | Assessment as a Policy Tool | Chapter 2, in Review of Research in Education (27), 2003 | https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 | In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract | ||||
52 | Laura S. Hamilton | "...the search for answers to questions about how to minimize score inflation and promote effective instruction is likely to continue for many years." p.57 | Dismissive | Assessment as a Policy Tool | Chapter 2, in Review of Research in Education (27), 2003 | https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 | In fact, we know quite a lot about the source of higher levels of score inflation: it is lax test security. The many experimental studies of test coaching are consistent: coaching has a modest effect, not the volatile or very large effects that Koretz claims. | ||||
53 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “The shortcomings of the studies make it difficult to determine the size of teacher effects, but we suspect that the magnitude of some of the effects reported in this literature are overstated.” p. xiii | Denigrating | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |||
54 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “Using VAM to estimate individual teacher effects is a recent endeavor, and many of the possible sources of error have not been thoroughly evaluated in the literature.” p. xix | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |||
55 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. xix | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |||
56 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “This lack of attention to teachers in policy discussions may be attributed in part to another body of literature that attempted to determine the effects of specific teacher background characteristics, including credentialing status (e.g., Miller, McKenna, and McKenna, 1998; Goldhaber and Brewer, 2000) and subject matter coursework (e.g., Monk, 1994).” p. 8 | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |||
57 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “To date, there has been little empirical exploration of the size of school effects and the sensitivity of teacher effects to modeling of school effects.” p. 78 | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |||
58 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “There are no empirical explorations of the robustness of estimates to assumptions about prior-year schooling effects.” p. 81 | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |||
59 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “There is currently no empirical evidence about the sensitivity of gain scores or teacher effects to such alternatives.” p. 89 | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |||
60 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. 116 | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |||
61 | Laura S. Hamilton | Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz | “Although we expect missing data are likely to be pervasive, there is little systematic discussion of the extent or nature of missing data in test score databases.” p. 117 | Dismissive | Evaluating Value-Added Models for Teacher Accountability | Rand Corporation, 2003 | https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf | Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done. | |||
62 | Brian M. Stecher | Laura S. Hamilton | "The business model of setting clear targets, attaching incentives to the attainment of those targets, and rewarding those responsible for reaching the targets has proven successful in a wide range of business enterprises. But there is no evidence that these accountability principles will work well in an educational context, and there are many reasons to doubt that the principles can be applied without significant adaptation." | Dismissive | Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable | Rand Review, Spring 2002 | https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html | See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm . This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. | |||
63 | Brian M. Stecher | Laura S. Hamilton | "The lack of strong evidence regarding the design and effectiveness of accountability systems hampers policymaking at a critical juncture." | Denigrating | Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable | Rand Review, Spring 2002 | https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html | See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm . This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. | |||
64 | Brian M. Stecher | Laura S. Hamilton | "Nonetheless, the evidence has yet to justify the expectations. The initial evidence is, at best, mixed. On the plus side, students and teachers seem to respond to the incentives created by the accountability systems" | Dismissive | Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable | Rand Review, Spring 2002 | https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html | See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm . This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. | |||
65 | Brian M. Stecher | Laura S. Hamilton | "Proponents of accountability attribute the improved scores in these states to clearer expectations, greater motivation on the part of the students and teachers, a focused curriculum, and more-effective instruction. However, there is little or no research to substantiate these positive changes or their effects on scores." | Dismissive | Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable | Rand Review, Spring 2002 | https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html | In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract | |||
66 | Brian M. Stecher | Laura S. Hamilton | "One of the earliest studies on the effects of testing (conducted in two Arizona schools in the late 1980s) showed that teachers reduced their emphasis on important, nontested material." | Dismissive | Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable | Rand Review, Spring 2002 | https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html | Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |||
67 | Brian M. Stecher | Laura S. Hamilton | "Test-based accountability systems will work better if we acknowledge how little we know about them, if the federal government devotes appropriate resources to studying them, and if the states make ongoing efforts to improve them." | Dismissive | Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable | Rand Review, Spring 2002 | https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html | See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm . This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. | |||
68 | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein | "Although test-based accountability has shown some compelling results, the issues are complex, the research is new and incomplete, and many of the claims that have received the most attention have proved to be premature and superficial." | Denigrating | Summary, p.xiv | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |||
69 | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein | "The research evidence does not provide definitive information about the actual costs of testing but the information that is available suggests that expenditures for testing have grown in recent years." | Dismissive | Introduction, p.9 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability. In W.M. Evers & H.J. Walberg (Eds.), School Accountability. Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441. Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing. Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL. | |||
70 | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein | "The General Accounting Office (1993) … estimate was $516 million … The estimate does not include time for more-extensive test preparation activities." p.9 | Denigrating | Introduction, p.9 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | As a matter of fact, the GAO report did include those costs -- all of them. The GAO surveys very explicitly instructed respondents to "include any and all costs related" to each test, including any and all test preparation time and expenses. | |||
71 | Laura S. Hamilton, Daniel M. Koretz | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "There is currently no substantial evidence on the effects of published report cards on parents’ decisionmaking or on the schools themselves." | Dismissive | Chapter 2: Tests and their use in test-based accountability systems, p.44 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | For decades, consulting services have existed that help parents new to a city select the right school or school district for them. | ||
72 | Vi-Nhuan Le, Stephen P. Klein | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "Research on the inflation of gains remains too limited to indicate how prevalent the problem is." | Dismissive | Chapter 3: Technical criteria for evaluating tests, p. 68 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post-World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927); DeWeerdt (1927); French (1959); French & Dear (1959); Ortar (1960); Marron (1965); ETS (1965); Messick & Jungeblut (1981); Ellis, Konoske, Wulfeck, & Montague (1982); DerSimonian & Laird (1983); Kulik, Bangert-Drowns & Kulik (1984); Powers (1985); Samson (1985); Scruggs, White, & Bennion (1986); Jones (1986); Fraker (1986/1987); Halpin (1987); Whitla (1988); Snedecor (1989); Bond (1989); Baydar (1990); Becker (1990); Smyth (1990); Moore (1991); Alderson & Wall (1992); Powers (1993); Oren (1993); Powers & Rock (1994); Scholes & Lane (1997); Allalouf & Ben Shakhar (1998); Robb & Ercanbrack (1999); McClain (1999); Camara (1999, 2001, 2008); Stone & Lane (2000, 2003); Din & Soldan (2001); Briggs (2001); Palmer (2002); Briggs & Hansen (2004); Cankoy & Ali Tut (2005); Crocker (2005); Allensworth, Correa, & Ponisciak (2008); Domingue & Briggs (2009); Koljatic & Silva (2014); Early (2019); Herndon (2021). | ||
73 | Vi-Nhuan Le, Stephen P. Klein | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "Relatively little is known about how testing accommodations affect score validity, and the few studies that have been conducted on the subject have had mixed results." | Dismissive | Chapter 3: Technical criteria for evaluating tests, p. 71 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | |||
74 | Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "High-stakes testing may also affect parents (e.g., their attitudes toward education, their engagement with schools, and their direct participation in their child's learning) as well as policymakers (their beliefs about system performance, their judgements about program effectiveness, and their allocation of resources). However, these issues remain largely unexamined in the literature." | Dismissive | Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 79 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | US National Science Foundation | Parents and other adults are typically reached through public opinion polls. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm . Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general. | ||
75 | Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "As described in chapter 2, there was little concern about the effects of testing on teaching prior to the 1970s." | Dismissive | Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 81 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | US National Science Foundation | Rubbish. Entire books were written on the topic, for example: C.C. Ross, Measurement in Today’s Schools, 1942; G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927; C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88. | ||
76 | Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "In light of the changes that occurred in the uses of large-scale testing in the 1980s and 1990s, researchers began to investigate teachers' reactions to external assessment. The initial research on the impact of large-scale testing was conducted in the 1980s and the 1990s." | Dismissive | Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 83 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | US National Science Foundation | Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | ||
77 | Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "The bulk of the research on the effects of testing has been conducted using surveys and case studies." | Dismissive | Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 83 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | US National Science Foundation | This is misleading. True, many of the hundreds of studies on the effects of testing have been surveys and case studies. But many, more by my count, have been randomized experiments. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm | ||
78 | Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "Data on the incidence of cheating [on educational tests] are scarce…" | Dismissive | Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 96 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | Actually, such data do exist: in surveys, respondents freely admit that they cheat, and how. Moreover, news reports of cheating, by students or educators, have been voluminous. See, for example, Caveon Test Security's "Cheating in the News" section on its web site. | ||
79 | Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "Less is known about changes in policies at the district and school levels in response to high-stakes testing, but mixed evidence of some impact has appeared." | Dismissive | Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 96 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | Relevant pre-2000 studies of the effects of testing on at-risk students, completion, dropping out, curricular offerings, attitudes, etc. include those of Schleisman (1999); the *Southern Regional Education Board (1998); Webster, Mendro, Orsak, Weerasinghe & Bembry (1997); Jones (1996); Boylan (1996); Jones (1993); Jacobson (1992); Grisay (1991); Johnstone (1990); Task Force on Educational Assessment Programs [Florida] (1979); Wellisch, MacQueen, Carriere & Duck (1978); Enochs (1978); Pronaratna (1976); and McWilliams & Thomas (1976). | ||
80 | Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "Although numerous news articles have addressed the negative effects of high-stakes testing, systematic research on the subject is limited." | Dismissive | Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 98 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | Relevant pre-2000 studies of the effects of testing on at-risk students, completion, dropping out, curricular offerings, attitudes, etc. include those of Schleisman (1999); the *Southern Regional Education Board (1998); Webster, Mendro, Orsak, Weerasinghe & Bembry (1997); Jones (1996); Boylan (1996); Jones (1993); Jacobson (1992); Grisay (1991); Johnstone (1990); Task Force on Educational Assessment Programs [Florida] (1979); Wellisch, MacQueen, Carriere & Duck (1978); Enochs (1978); Pronaratna (1976); and McWilliams & Thomas (1976). | ||
81 | Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "Research regarding the effects of test-based accountability on equity is very limited." | Dismissive | Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 99 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | US National Science Foundation | |||
82 | Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "Researchers have not documented the desirable consequences of testing … as clearly as the undesirable ones." | Dismissive | Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 99 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | US National Science Foundation | See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | ||
83 | Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "… researchers have not generally measured the extent or magnitude of the shifts in practice that they identified as a result of high-stakes testing." | Dismissive | Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, pp. 99–100 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | US National Science Foundation | The 1993 GAO study did. See also: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | ||
84 | Lorraine M. McDonnell | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "...this chapter can only describe the issues that are raised when one looks at testing from a political perspective. Because of the lack of systematic studies on the topic." | Dismissive | Chapter 5: Accountability as seen through a political lens, p.102 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | Parents and other adults are typically reached through public opinion polls. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm . Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general. | ||
85 | Lorraine M. McDonnell | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "...public opinion, as measured by surveys, does not always provide a clear and unambiguous measure of public sentiment." | Denigrating | Chapter 5: Accountability as seen through a political lens, p.108 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | Parents and other adults are typically reached through public opinion polls. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm . Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general. | ||
86 | Laura S. Hamilton, Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "So test-based accountability remains controversial because there is inadequate evidence to make clear judgments about its effectiveness in raising test scores and achieving its other goals." | Dismissive | Chapter 6: Improving test-based accountability, p.122 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract | ||
87 | Laura S. Hamilton, Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "Unfortunately, the complexity of the issues and the ambiguity of the existing research do not allow our recommendations to take the form of a practical “how-to” guide for policymakers and practitioners." | Denigrating | Chapter 6: Improving test-based accountability, p.123 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract | ||
88 | Laura S. Hamilton, Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "Additional research is needed to identify the elements of performance on tests and how these elements map onto other tests …." | Denigrating | Chapter 6: Improving test-based accountability, p.127 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | |||
89 | Laura S. Hamilton, Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "Another part of the interpretive question is the need to gather information in other subject areas to portray a more complete picture of achievement. The scope of constructs that have been considered in research to date has been fairly narrow, focusing on the subjects that are part of the accountability systems that have been studied. Many legitimate instructional objectives have been ignored in the literature to date." | Denigrating | Chapter 6: Improving test-based accountability, p.127 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | Many studies of the effects of testing predate CRESST's in the 1980s and cover all subject fields, not just reading and math. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | ||
90 | Laura S. Hamilton, Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "States should also conduct ongoing analyses of the performance of groups whose members may not be numerous enough to permit separate reporting. English-language learners and students with disabilities are increasingly being included in high-stakes testing systems, and, as discussed in Chapter Three, little is currently known about the validity of scores for these groups." | Dismissive | Chapter 6: Improving test-based accountability, p.131 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing. | ||
91 | Laura S. Hamilton, Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "It would be especially helpful to know what changes in instruction are made in response to different kinds of information and incentives. In particular, we need to know how teachers interpret information from tests and how they use it to modify instruction." | Dismissive | Chapter 6: Improving test-based accountability, p.133 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies: the study is a research review, research synthesis, or meta-analysis. | ||
92 | Laura S. Hamilton, Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "It seems clear that aligning the components of the system and providing appropriate professional development should, at a minimum, increase teachers’ political support for test-based accountability policies .... Although there is no empirical evidence to suggest that this strategy will reduce inappropriate responses to high-stakes testing,... Additional research needs to be done to determine the importance of alignment for promoting positive effects of test-based accountability." | Dismissive | Chapter 6: Improving test-based accountability, p.135 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies: the study is a research review, research synthesis, or meta-analysis. | ||
93 | Laura S. Hamilton, Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "… we currently do not know enough about test-based accountability to design a system that is immune from the problems we have discussed" | Dismissive | Chapter 6: Improving test-based accountability, p.136 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract | ||
94 | Laura S. Hamilton, Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "There is some limited evidence that educators’ responses to test based accountability vary according to the characteristics of their student populations,…" | Denigrating | Chapter 6: Improving test-based accountability, p.138 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | There was and is far more than "limited" evidence. Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | ||
95 | Laura S. Hamilton, Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "... there is very limited evidence to guide thinking about political issues." | Dismissive | Chapter 6: Improving test-based accountability, p.139 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | Parents and other adults are typically reached through public opinion polls. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm . Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general. | ||
96 | Laura S. Hamilton, Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "First, we do not have an accurate assessment of the additional costs." | Denigrating | Chapter 6: Improving test-based accountability, p.141 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | Yes, we did and we do. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W.M. Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing. Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL. | ||
97 | Laura S. Hamilton, Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "However, many of these recommended reforms are relatively inexpensive in comparison with the total cost of education. This equation is seldom examined." | Denigrating | Chapter 6: Improving test-based accountability, p.141 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | Wrong. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing. Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL. | ||
98 | Laura S. Hamilton, Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "Part of the reason these issues are rarely considered may be that no one has produced a good estimate of the cost of an improved accountability system in comparison with its benefits." | Denigrating | Chapter 6: Improving test-based accountability, p.141 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W.M. Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing. Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL. | ||
99 | Laura S. Hamilton, Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "Nevertheless, our knowledge of the costs of alternative accountability systems is still somewhat limited. Policymakers need to know how much it would cost to change their current systems to be responsive to criticisms such as those described in this book. These estimates need to consider all of the associated costs, including possible opportunity costs associated with increased testing time and increased test preparation time." | Dismissive | Chapter 6: Improving test-based accountability, p.142 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W.M. Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing. Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL. | ||
100 | Laura S. Hamilton, Brian M. Stecher | Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. | "However, there is still much about these systems that is not well understood. Lack of research-based knowledge about the quality of scores and the mechanisms through which high-stakes testing programs operate limits our ability to improve these systems. As a result, our discussions also identified unanswered questions..." | Dismissive | Chapter 6: Improving test-based accountability, p.143 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | US National Science Foundation | In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract | ||
101 | Laura S. Hamilton | Brian M. Stecher, Stephen P. Klein | "Although test-based accountability has shown some compelling results, the issues are complex, the research is new and incomplete, and many of the claims that have received the most attention have proved to be premature and superficial." | Denigrating | Summary, p.xiv | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |||
102 | Laura S. Hamilton | Brian M. Stecher, Stephen P. Klein | "The research evidence does not provide definitive information about the actual costs of testing but the information that is available suggests that expenditures for testing have grown in recent years." | Dismissive | Introduction, p.9 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W.M. Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing. Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL. | |||
103 | Laura S. Hamilton | Brian M. Stecher, Stephen P. Klein | "The General Accounting Office (1993) … estimate was $516 million … The estimate does not include time for more-extensive test preparation activities." p.9 | Denigrating | Introduction, p.9 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | As a matter of fact, the GAO report did include those costs, all of them. The GAO surveys explicitly instructed respondents to "include any and all costs related" to each test, including any and all test preparation time and expenses. | |||
104 | Laura S. Hamilton | Daniel M. Koretz | "There is currently no substantial evidence on the effects of published report cards on parents’ decisionmaking or on the schools themselves." | Dismissive | Chapter 2: Tests and their use in test-based accountability systems, p.44 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | For decades, consulting services have existed that help parents new to a city select the right school or school district for them. | |||
105 | Brian M. Stecher | Laura S. Hamilton, Stephen P. Klein, Eds. | "High-stakes testing may also affect parents (e.g., their attitudes toward education, their engagement with schools, and their direct participation in their child's learning) as well as policymakers (their beliefs about system performance, their judgements about program effectiveness, and their allocation of resources). However, these issues remain largely unexamined in the literature." | Dismissive | Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p.79 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | Parents and other adults are typically reached through public opinion polls. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm . Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general. | |||
106 | Brian M. Stecher | Laura S. Hamilton, Stephen P. Klein, Eds. | "As described in chapter 2, there was little concern about the effects of testing on teaching prior to the 1970s." | Dismissive | Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p.81 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |||
107 | Brian M. Stecher | Laura S. Hamilton, Stephen P. Klein, Eds. | "In light of the changes that occurred in the uses of large-scale testing in the 1980s and 1990s, researchers began to investigate teachers' reactions to external assessment. The initial research on the impact of large-scale testing was conducted in the 1980s and the 1990s." | Dismissive | Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p.83 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |||
108 | Brian M. Stecher | Laura S. Hamilton, Stephen P. Klein, Eds. | "The bulk of the research on the effects of testing has been conducted using surveys and case studies." | Dismissive | Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p.83 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | This is misleading. True, many of the hundreds of studies on the effects of testing have been surveys and case studies. But many others, by my count more, have been randomized experiments. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm | |||
109 | Brian M. Stecher | Laura S. Hamilton, Stephen P. Klein, Eds. | "Researchers have not documented the desirable consequences of testing … as clearly as the undesirable ones. More importantly, researchers have not generally measured the extent or magnitude of the shifts in practice that they identified as a result of high-stakes testing." | Dismissive | Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, pp.99–100 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |||
110 | Laura S. Hamilton | Brian M. Stecher | "So test-based accountability remains controversial because there is inadequate evidence to make clear judgments about its effectiveness in raising test scores and achieving its other goals." | Dismissive | Chapter 6: Improving test-based accountability, p.122 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract | |||
111 | Laura S. Hamilton | Brian M. Stecher | "Unfortunately, the complexity of the issues and the ambiguity of the existing research do not allow our recommendations to take the form of a practical “how-to” guide for policymakers and practitioners." | Denigrating | Chapter 6: Improving test-based accountability, p.123 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract | |||
112 | Laura S. Hamilton | Brian M. Stecher | "Additional research is needed to identify the elements of performance on tests and how these elements map onto other tests …." | Denigrating | Chapter 6: Improving test-based accountability, p.127 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | ||||
113 | Laura S. Hamilton | Brian M. Stecher | "Another part of the interpretive question is the need to gather information in other subject areas to portray a more complete picture of achievement. The scope of constructs that have been considered in research to date has been fairly narrow, focusing on the subjects that are part of the accountability systems that have been studied. Many legitimate instructional objectives have been ignored in the literature to date." | Dismissive | Chapter 6: Improving test-based accountability, p.127 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | Many studies of the effects of testing predate CRESST's in the 1980s and cover all subject fields, not just reading and math. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |||
114 | Laura S. Hamilton | Brian M. Stecher | "States should also conduct ongoing analyses of the performance of groups whose members may not be numerous enough to permit separate reporting. English-language learners and students with disabilities are increasingly being included in high-stakes testing systems, and, as discussed in Chapter Three, little is currently known about the validity of scores for these groups." | Dismissive | Chapter 6: Improving test-based accountability, p.131 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing. | |||
115 | Laura S. Hamilton | Brian M. Stecher | "It would be especially helpful to know what changes in instruction are made in response to different kinds of information and incentives. In particular, we need to know how teachers interpret information from tests and how they use it to modify instruction." | Dismissive | Chapter 6: Improving test-based accountability, p.133 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies: the study is a research review, research synthesis, or meta-analysis. | |||
116 | Laura S. Hamilton | Brian M. Stecher | "It seems clear that aligning the components of the system and providing appropriate professional development should, at a minimum, increase teachers’ political support for test-based accountability policies .... Although there is no empirical evidence to suggest that this strategy will reduce inappropriate responses to high-stakes testing,... Additional research needs to be done to determine the importance of alignment for promoting positive effects of test-based accountability." | Dismissive | Chapter 6: Improving test-based accountability, p.135 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies: the study is a research review, research synthesis, or meta-analysis. | |||
117 | Laura S. Hamilton | Brian M. Stecher | "… we currently do not know enough about test-based accountability to design a system that is immune from the problems we have discussed" | Dismissive | Chapter 6: Improving test-based accountability, p.136 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract | |||
118 | Laura S. Hamilton | Brian M. Stecher | "There is some limited evidence that educators’ responses to test based accountability vary according to the characteristics of their student populations,…" | Denigrating | Chapter 6: Improving test-based accountability, p.138 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | There was and is far more than "limited" evidence. Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm | |||
119 | Laura S. Hamilton | Brian M. Stecher | "... there is very limited evidence to guide thinking about political issues." | Dismissive | Chapter 6: Improving test-based accountability, p.139 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | ||||
120 | Laura S. Hamilton | Brian M. Stecher | "First, we do not have an accurate assessment of the additional costs." | Denigrating | Chapter 6: Improving test-based accountability, p.141 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | Yes, we did and we do. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W.M. Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing. Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL. | |||
121 | Laura S. Hamilton | Brian M. Stecher | "However, many of these recommended reforms are relatively inexpensive in comparison with the total cost of education. This equation is seldom examined." | Denigrating | Chapter 6: Improving test-based accountability, p.141 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/pubs/monograph_reports/MR1554.html | Wrong. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL. | |||
122 | Laura S. Hamilton | Brian M. Stecher | "Part of the reason these issues are rarely considered may be that no one has produced a good estimate of the cost of an improved accountability system in comparison with its benefits." | Denigrating | Chapter 6: Improving test-based accountability, p.141 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL. | |||
123 | Laura S. Hamilton | Brian M. Stecher | "Nevertheless, our knowledge of the costs of alternative accountability systems is still somewhat limited. Policymakers need to know how much it would cost to change their current systems to be responsive to criticisms such as those described in this book. These estimates need to consider all of the associated costs, including possible opportunity costs associated with increased testing time and increased test preparation time." | Dismissive | Chapter 6: Improving test-based accountability, p.142 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL. | |||
124 | Laura S. Hamilton | Brian M. Stecher | "However, there is still much about these systems that is not well understood. Lack of research-based knowledge about the quality of scores and the mechanisms through which high-stakes testing programs operate limits our ability to improve these systems. As a result, our discussions also identified unanswered questions..." | Dismissive | Chapter 6: Improving test-based accountability, p.143 | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract | |||
125 | Daniel M. Koretz | Daniel F. McCaffrey, Laura S. Hamilton | "Although high-stakes testing is now widespread, methods for evaluating the validity of gains obtained under high-stakes conditions are poorly developed. This report presents an approach for evaluating the validity of inferences based on score gains on high-stakes tests. It describes the inadequacy of traditional validation approaches for validating gains under high-stakes conditions and outlines an alternative validation framework for conceptualizing meaningful and inflated score gains.", p.1 | Denigrating | Toward a framework for validating gains under high-stakes conditions | CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 | https://files.eric.ed.gov/fulltext/ED462410.pdf | US Education Department | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | ||
126 | Daniel M. Koretz | Daniel F. McCaffrey, Laura S. Hamilton | "Few efforts are made to evaluate directly score gains obtained under high-stakes conditions, and conventional validation tools are not fully adequate for the task.", p. 1 | Dismissive | Toward a framework for validating gains under high-stakes conditions | CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 | https://files.eric.ed.gov/fulltext/ED462410.pdf | US Education Department | In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature: https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927) DeWeerdt (1927) French (1959) French & Dear (1959) Ortar (1960) Marron (1965) ETS (1965). Messick & Jungeblut (1981) Ellis, Konoske, Wulfeck, & Montague (1982) DerSimonian and Laird (1983) Kulik, Bangert-Drowns & Kulik (1984) Powers (1985) Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987) Halpin (1987) Whitla (1988) Snedecor (1989) Bond (1989). Baydar (1990) Becker (1990) Smyth (1990) Moore (1991) Alderson & Wall (1992) Powers (1993) Oren (1993). Powers & Rock (1994) Scholes, Lane (1997) Allalouf & Ben Shakhar (1998) Robb & Ercanbrack (1999) McClain (1999) Camara (1999, 2001, 2008) Stone & Lane (2000, 2003) Din & Soldan (2001) Briggs (2001) Palmer (2002) Briggs & Hansen (2004) Cankoy & Ali Tut (2005) Crocker (2005) Allensworth, Correa, & Ponisciak (2008) Domingue & Briggs (2009) Koljatic & Silva (2014) Early (2019) Herndon (2021) | ||
127 | Daniel M. Koretz | Laura Hamilton | "Efforts to increase the participation of students with disabilities in large-scale assessments, however, are hindered by a lack of experience and systematic information (National Research Council, 1997). For example, there is little systematic information on the use or effects of special testing accommodations for elementary and secondary students with disabilities." | Dismissive | Assessing Students With Disabilities in Kentucky: The Effects of Accommodations, Format, and Subject, p.2 | CSE Technical Report 498, CRESST/Rand Education, January 1999 | https://files.eric.ed.gov/fulltext/ED440148.pdf | US Education Department | Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing. | "The work reported in this publication was supported under the Educational Research and Development Center Program PR/Award Number R305B600002, as administered by the Office of Educational Research and Improvement, U.S. Department of Education." | |
128 | Daniel M. Koretz | Laura Hamilton | "In addition, there is little evidence about the effects of format differences on the assessment of students with disabilities." | Dismissive | Assessing Students With Disabilities in Kentucky: The Effects of Accommodations, Format, and Subject, p.2 | CSE Technical Report 498, CRESST/Rand Education, January 1999 | https://files.eric.ed.gov/fulltext/ED440148.pdf | US Education Department | Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing. | "The work reported in this publication was supported under the Educational Research and Development Center Program PR/Award Number R305B600002, as administered by the Office of Educational Research and Improvement, U.S. Department of Education." | |
129 | Daniel M. Koretz | Laura Hamilton | "Others have argued the opposite, pointing out that open-response questions, for example, mix verbal skills with other skills to be measured and may make it more difficult to isolate and compensate for the effects of disabilities. Relevant research, however, is scarce." | Dismissive | Assessing Students With Disabilities in Kentucky: The Effects of Accommodations, Format, and Subject, p.2 | CSE Technical Report 498, CRESST/Rand Education, January 1999 | https://files.eric.ed.gov/fulltext/ED440148.pdf | US Education Department | Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing. | "The work reported in this publication was supported under the Educational Research and Development Center Program PR/Award Number R305B600002, as administered by the Office of Educational Research and Improvement, U.S. Department of Education." | |
130 | Daniel M. Koretz | Laura Hamilton | "There is a clear need for additional descriptive studies of the performance of students with disabilities in large-scale assessments. In our earlier study, we noted that research evidence was sparse" | Dismissive | Assessing Students With Disabilities in Kentucky: The Effects of Accommodations, Format, and Subject, p.56 | CSE Technical Report 498, CRESST/Rand Education, January 1999 | https://files.eric.ed.gov/fulltext/ED440148.pdf | US Education Department | Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing. | "The work reported in this publication was supported under the Educational Research and Development Center Program PR/Award Number R305B600002, as administered by the Office of Educational Research and Improvement, U.S. Department of Education." | |
131 | Laura S. Hamilton | “Despite the number of studies investigating affective aspects of test taking, little is known about how students perceive the kinds of extended performance assessments currently being developed for state and local testing programs.” - Abstract | Denigrating | An Investigation of Students' Affective Responses to Alternative Assessment Formats | Paper presented at the Annual Meeting of the National Council on Measurement in Education (New Orleans, LA, April 5-7, 1994) | http://files.eric.ed.gov/fulltext/ED376203.pdf | US Education Department | At least twelve pre-2004 student surveys were included here: https://www.nonpartisaneducation.org/Review/Resources/SurveyList.htm. See also: https://richardphelps.net/DemandForStandardizedTesting.pdf. In 150 of 241 qualitative studies, the focus of the interviews, case study, or observations was the effect of testing stakes on students. See: https://www.tandfonline.com/doi/abs/10.1080/15305058.2011.602920 | |||
132 | Laura S. Hamilton | “As stated earlier, this study was not intended to produce results that could be generalized to other tasks or to other samples of students, but to identify questions that might be addressed by future studies and to suggest possible hypotheses.” p. 23 | Dismissive | An Investigation of Students' Affective Responses to Alternative Assessment Formats | Paper presented at the Annual Meeting of the National Council on Measurement in Education (New Orleans, LA, April 5-7, 1994) | http://files.eric.ed.gov/fulltext/ED376203.pdf | US Education Department | At least twelve pre-2004 student surveys were included here: https://www.nonpartisaneducation.org/Review/Resources/SurveyList.htm. See also: https://richardphelps.net/DemandForStandardizedTesting.pdf. In 150 of 241 qualitative studies, the focus of the interviews, case study, or observations was the effect of testing stakes on students. See: https://www.tandfonline.com/doi/abs/10.1080/15305058.2011.602920 | |||
IRONIES: | |||||||||||
Laura S. Hamilton | "In the meantime, it will be important to continue to gather evidence, both from large-scale studies and from the individual experiences of teachers, administrators, and others affected by test-based accountability, and to make that evidence available so that it can inform efforts to improve accountability systems." p.57 | Assessment as a Policy Tool | Chapter 2, in Review of Research in Education (27), 2003 | https://journals.sagepub.com/doi/pdf/10.3102/0091732X027001025 | |||||||
Laura S. Hamilton | Brian M. Stecher, Stephen P. Klein | "Greater knowledge about testing and accountability can lead to better system design and more-effective system management." p.xiv | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | Summary, p.xiv | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | ||||||
Laura S. Hamilton | Brian M. Stecher | "Incremental improvements to existing systems, based on current research on testing and accountability, should be combined with long-term research and development efforts that may ultimately lead to a major redesign of these systems. Success in this endeavor will require the thoughtful engagement of educators, policymakers, and researchers in discussions and debates about tests and testing policies." | Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 | Chapter 6, Improving test-based accountability, pp.143-144 | https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf | ||||||
Christian Bourge, journalist | Laura S. Hamilton, interviewee | "The fact that there is a high correlation (between the high-stakes and low-stake tests) doesn't necessarily mean the tests are telling you the same thing." | Experts differ about high-stakes testing | UPI, Feb. 13, 2003 | https://www.upi.com/Top_News/2003/02/13/Experts-differ-about-high-stakes-testing/60271045180206/ | ||||||
Christian Bourge, journalist | Laura S. Hamilton, interviewee | "She said that without examining the similarities in the design of each test, you do not know how comparable they are from district to district." | Experts differ about high-stakes testing | UPI, Feb. 13, 2003 | https://www.upi.com/Top_News/2003/02/13/Experts-differ-about-high-stakes-testing/60271045180206/ | ||||||
Cite selves or colleagues in the group, but dismiss or denigrate all other work | |||||||||||
Falsely claim that research has only recently been done on the topic. | |||||||||||
Author cites (and accepts as fact without checking) someone else's dismissive review