Dismissive Reviews in Education Policy Research

Each entry gives the author(s), the quote, the quote type (Dismissive, Denigrating, or 1stness), the source title and page, a link, the funders, and notes.
Entries 1–11. Author: John F. Pane. Source: Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, RAND Corporation Perspective, October 2018. https://www.rand.org/pubs/perspectives/PE314.html
Funders: the William and Flora Hewlett Foundation (which also funds UCLA's National Center for Research on Evaluation, Standards, and Student Testing (CRESST) to monitor the extent to which the two consortia's assessment development efforts are likely to produce tests that measure and support goals for "deeper learning") and RAND Corporation funders.
Notes: Pane devotes considerable text to the claim that no prior research exists apart from another RAND study, and then on p. 7 admits that some relevant mastery learning studies from the 1980s do exist, implying that there were only one or a few. In fact, there were hundreds, by researchers including Homme, Csanyi, Gonzales, Rechs, O'Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. There have also been thousands of studies of personalized instruction in conjunction with studies in special education, tutoring, teachers' aides, tracking, etc.

1. "Practitioners and policymakers seeking to implement personalized learning, lacking clearly defined evidence-based models to adopt, are creating custom designs for their specific contexts. Those who want to use rigorous research evidence to guide their designs will find many gaps and will be left with important unanswered questions about which practices or combinations of practices are effective. It will likely take many years of research to fill these gaps." (Dismissive, p. 1)
2. "The purpose of this Perspective is to offer strategic guidance for designers of personalized learning programs to consider while the evidence base is catching up." (Dismissive, p. 1)
3. "This guidance draws on theory, basic principles of learning science, and the limited research that does exist on personalized learning and its component parts." (Dismissive, p. 1)
4. "Thus far, the research evidence on personalized learning as an overarching schoolwide model is sparse." (Dismissive, p. 4)
5. "A team of RAND Corporation researchers conducted the largest and most-rigorous studies of student achievement effects to date." (1stness, p. 4)
6. "While we await the answers to those questions, substantial enthusiasm around personalized learning persists. Educators, policy makers, and advocates are moving forward without the guidance of conclusive research evidence." (Dismissive, p. 5)
7. "In the absence of comprehensive, rigorous evidence to help select the personalized learning components most likely to succeed, what is the path forward? I suggest a few guiding principles aimed at using existing scientific knowledge and the best available resources." (Denigrating, p. 5)
8. "However, more work is necessary to establish causal evidence that the concept leads to improved outcomes for students." (Dismissive, p. 9)
9. "Those who want to use rigorous research evidence to guide their designs will find many gaps and will be left with important unanswered questions about which practices or combinations of practices are effective." (Dismissive, Denigrating, p. 12)
10. "Despite the lack of evidence, there is considerable enthusiasm about personalized learning among practitioners and policymakers, and implementation is spreading." (Dismissive, p. 12)
11. "Thus, the purpose of this Perspective is to offer strategic guidance for designers of personalized learning programs to consider while the evidence base is catching up. This guidance draws on theory, basic principles from learning science, and the limited research that does exist on personalized learning and its component parts. This research was conducted in RAND Education." (Dismissive, p. 12)
Entries 12–16. Authors: Jennifer L. Steele, Matthew W. Lewis, Lucrecia Santibañez, Susannah Faxon-Mills, Mollie Rudnick, Brian M. Stecher, Laura S. Hamilton. Source: Competency-Based Education in Three Pilot Programs: Examining Implementation and Outcomes, RAND Education, 2014. https://www.rand.org/content/dam/rand/pubs/research_reports/RR700/RR732/RAND_RR732.pdf
Funders: "The research described in this report was sponsored by the Bill & Melinda Gates Foundation."
Notes: Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan (1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).

12. "Despite taking on considerable momentum in the field, competency-based systems have not been extensively researched." (Dismissive, p. 2)
13. "Recent studies have described the experiences of educators working to undertake competency-based reforms or have highlighted promising models, but these studies have not systematically examined the effects of these models on student learning or persistence." (Denigrating, p. 2)
14. "… there are no studies that would allow us to attribute outperformance to the competency-based education systems alone," (Dismissive, p. 2)
15. "Because it is one of the first studies we are aware of since the late 1980s that has attempted to estimate the impact of competency-based models on students' academic outcomes," (1stness, p. 4)
16. "In part, the lack of recent research on competency-based education may be due to variability around the concept of competency-based education itself." (Dismissive, p. 10)
Entry 17. Authors: Kun Yuan, Vi-Nhuan Le. Source: Measuring Deeper Learning Through Cognitively Demanding Test Items, RAND Corporation Research Report, 2014. https://www.rand.org/content/dam/rand/pubs/research_reports/RR400/RR483/RAND_RR483.pdf
Funders: "The research described in this report was sponsored by the William and Flora Hewlett Foundation."

17. "… there has been no systematic empirical examination of the extent to which other widely used achievement tests emphasize deeper learning." (Dismissive, p. xi)
Entries 18–20. Author: Pete Wilmoth. Source: "Cognitive Tutor: Encouraging Signs for Computers in the Classroom," The RAND Blog, November 19, 2013.
Funders: "The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305A070185 to the RAND Corporation."

18. "The increasing availability of computers and Internet access makes technology based education an enticing option, both inside and outside the classroom. However, school districts have adopted many such tools without compelling evidence that they are effective in improving student achievement." (Dismissive)
19. "To help fill this evidence gap, a RAND research team …" (Dismissive)
20. "As one of the first large-scale assessments of a blended learning approach, this study suggests promise for using technology to improve student achievement." (1stness)
Entries 21–22. Authors: John F. Pane, Beth Ann Griffin, Daniel F. McCaffrey, Rita Karam. Source: Does an Algebra Course with Tutoring Software Improve Student Learning?, RAND Corporation Brief, 2013.
Funders: "The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305A070185 to the RAND Corporation."

21. "These tools allow self-paced instruction and provide students with customized feedback. These features, it is widely held, will improve student engagement and improve proficiency. However, evidence to support these claims remains scarce." (Dismissive, p. 2)
22. "To make headway in addressing this knowledge gap, a team of RAND researchers …" (Dismissive, p. 3)
23 Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher   "In particular, there is still much to learn about how changes in testing might influence the education system and how tests of deeper content and more complex skills and processes could best be used to promote the Foundation’s goals for deeper learning." p.1 Dismissive New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement Rand Corporation Research Report, 2013   "Funding to support the research was provided by the William and Flora Hewlett Foundation."  "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation."  
24 Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher   "Given the gaps in evidence regarding the link between testing and student outcomes … " p.1 Dismissive New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement Rand Corporation Research Report, 2013   "Funding to support the research was provided by the William and Flora Hewlett Foundation."  "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation."  
25 Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher   "The first step for each of these research areas was to identify relevant material from previous literature reviews on these topics, including those conducted by RAND researchers (e.g., Hamilton, Stecher, and Klein, 2002; Hamilton, 2003; Stecher, 2010) and by the National Research Council (e.g., Koenig, 2011). p.5 Dismissive New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement Rand Corporation Research Report, 2013   "Funding to support the research was provided by the William and Flora Hewlett Foundation."  "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation."  
26 Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher   "… we paid particular attention to sources from the past ten years, since these studies were less likely to have been included in previous literature reviews." p.5 Dismissive New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement Rand Corporation Research Report, 2013   "Funding to support the research was provided by the William and Flora Hewlett Foundation."  "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation."  
27 Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher   "Time and resource constraints limited the extent of our literature reviews, but we do not think this had a serious effect on our findings. Most importantly, we included all the clearly relevant studies from major sources that were available for electronic searching. In addition, many of the studies we reviewed also included comprehensive reviews of other literature, leading to fairly wide coverage of each body of literature." p.8 Dismissive New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement Rand Corporation Research Report, 2013   "Funding to support the research was provided by the William and Flora Hewlett Foundation."  "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation."  
28 Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher   "However, the amount of research on test attributes is limited, and the research has been conducted in a wide variety of contexts involving a wide variety of tests. Thus, while the findings are interesting, few have been replicated." p.22 Dismissive New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement Rand Corporation Research Report, 2013   "Funding to support the research was provided by the William and Flora Hewlett Foundation."  "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation."  
29 Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher   "It is important to recognize that the literature on how school characteristics, such as urbanicity and governance, affect educators’ responses to testing is sparse." p.29 Dismissive New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement Rand Corporation Research Report, 2013   "Funding to support the research was provided by the William and Flora Hewlett Foundation."  "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation."  
30 Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher   "… there is little empirical evidence that provides guidance on the amount and types of professional development that would promote constructive responses to assessment." Dismissive New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement Rand Corporation Research Report, 2013   "Funding to support the research was provided by the William and Flora Hewlett Foundation."  "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation."  
31 Jinok Kim Joan L. Herman "However, the validity of existing criteria and procedures lack an empirical base; in fact, reclassification practices are formulated and implemented with little knowledge of the factors that may influence their success." Dismissive, Denigrating Understanding Patterns and Precursors of ELL Success Subsequent to Reclassification, p.1 CRESST Report 818, August, 2012 https://files.eric.ed.gov/fulltext/ED540604.pdf "The work reported herein was supported under the National Research and Development Centers, PR/Award Number R305A09058101, as administered by the U.S. Department of Education, Institute of Education Sciences."  
32 Jinok Kim Joan L. Herman "Because the research basis for making mainstreaming or reclassification decisions remains slim, it may not be surprising that criteria for reclassifying students from ELL to Reclassified as Fluent English Proficient (RFEP) status vary substantially across states, as documented by a recent report reviewing statewide practices related to ELLs." Dismissive Understanding Patterns and Precursors of ELL Success Subsequent to Reclassification, p.3 CRESST Report 818, August, 2012 https://files.eric.ed.gov/fulltext/ED540604.pdf "The work reported herein was supported under the National Research and Development Centers, PR/Award Number R305A09058101, as administered by the U.S. Department of Education, Institute of Education Sciences."  
33 Jinok Kim Joan L. Herman "Previous studies cited earlier have identified potential problems in current reclassification, qualitatively analyzed criteria, and student characteristics that may relate to high versus low redesignation rates, and examined related research questions, such as how long it takes for non native speakers to acquire ELP or be reclassified; but none of the existing literature has directly dealt with reclassification systems and their consequences, and more specifically with the consequences of various reclassification criteria." 1stness Understanding Patterns and Precursors of ELL Success Subsequent to Reclassification, p.6 CRESST Report 818, August, 2012 https://files.eric.ed.gov/fulltext/ED540604.pdf "The work reported herein was supported under the National Research and Development Centers, PR/Award Number R305A09058101, as administered by the U.S. Department of Education, Institute of Education Sciences."  
34 Lorraine M. McDonnell   "Over the past 30 years, accountability policies have become more prominent in public K-12 education and have changed how teaching and learning are organized. It is less clear the extent to which these policies have altered the politics of education." Abstract, p.170 Dismissive Educational Accountability and Policy Feedback Educational Policy 27(2) 170–189, 2012 https://journals.sagepub.com/doi/10.1177/0895904812465119 "The author received financial support from the William T. Grant Foundation for research presented in this article."  
35 Lorraine M. McDonnell   "In contrast to other policy areas such as health and social welfare where research is more developed, we know less about policy feedback in education." p.171 Dismissive Educational Accountability and Policy Feedback Educational Policy 27(2) 170–189, 2012 https://journals.sagepub.com/doi/10.1177/0895904812465119 "The author received financial support from the William T. Grant Foundation for research presented in this article."  
36 Lorraine M. McDonnell   "However, an essential question for those interested in the politics of education policy has not been central in past research: To what extent have recent accountability policies altered the politics of education? This article begins to address that question ..." p.171 Dismissive Educational Accountability and Policy Feedback Educational Policy, 27(2) 170–189, 2012 https://journals.sagepub.com/doi/10.1177/0895904812465119 "The author received financial support from the William T. Grant Foundation for research presented in this article."  
37 Laura S. Hamilton Brian M. Stecher, Kun Yuan "He also noted that 'virtually all of the arguments, both for and against standards, are based on beliefs and hypotheses rather than on direct empirical evidence' (p. 427). Although a large and growing body of research has been conducted to examine the effects of SBA, the caution Porter expressed in 1994 about the lack of empirical evidence remains relevant today." pp.157-158 Denigrating Standards-Based Accountability in the United States: Lessons Learned and Future Directions Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 "Material in this paper has been adapted from a paper commissioned by the Center on Education Policy: Hamilton, L.S., Stecher, B.M., & Yuan, K. (2009) Standards-based Reform in the United States: History, Research, and Future Directions. Washington, DC: Center on Education Policy. Portions of this work were supported by the National Science Foundation under Grant No. REC-0228295." Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. 
General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
38 Laura S. Hamilton Brian M. Stecher, Kun Yuan "High-quality research on the effects of SBA is difficult to conduct for a number of reasons ..." p.158 Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 "Material in this paper has been adapted from a paper commissioned by the Center on Education Policy: Hamilton, L.S., Stecher, B.M., & Yuan, K. (2009) Standards-based Reform in the United States: History, Research, and Future Directions. Washington, DC: Center on Education Policy. Portions of this work were supported by the National Science Foundation under Grant No. REC-0228295." Access to anonymized student data is granted all the time. Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that one cannot find one or a few districts out of the many thousands willing to cooperate in a study to discredit testing.
39 Laura S. Hamilton Brian M. Stecher, Kun Yuan "Even when the necessary data have been collected by states or other entities, it is often difficult for researchers to obtain these data because those responsible for the data refuse to grant access, either because of concerns about confidentiality or because they are not interested in having their programmes scrutinised by researchers. Thus, the amount of rigorous analysis is limited." p.158 Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 "Material in this paper has been adapted from a paper commissioned by the Center on Education Policy: Hamilton, L.S., Stecher, B.M., & Yuan, K. (2009) Standards-based Reform in the United States: History, Research, and Future Directions. Washington, DC: Center on Education Policy. Portions of this work were supported by the National Science Foundation under Grant No. REC-0228295." Access to anonymized student data is granted all the time. Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that one cannot find one or a few districts out of the many thousands willing to cooperate in a study to discredit testing.
40 Laura S. Hamilton Brian M. Stecher, Kun Yuan "These evaluation findings reveal the challenges inherent in trying to judge the quality of standards. Arguably the most important test of quality is whether the standards promote high-quality instruction and improved student learning but, as we discuss later, there is very little research to address that question." p.158 Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 "Material in this paper has been adapted from a paper commissioned by the Center on Education Policy: Hamilton, L.S., Stecher, B.M., & Yuan, K. (2009) Standards-based Reform in the United States: History, Research, and Future Directions. Washington, DC: Center on Education Policy. Portions of this work were supported by the National Science Foundation under Grant No. REC-0228295." Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. 
General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
41 Laura S. Hamilton Brian M. Stecher, Kun Yuan "In fact, the bulk of research relevant to SBA has focused on the links between high-stakes tests and educators’ practices rather than standards and practices." p.159 Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 "Material in this paper has been adapted from a paper commissioned by the Center on Education Policy: Hamilton, L.S., Stecher, B.M., & Yuan, K. (2009) Standards-based Reform in the United States: History, Research, and Future Directions. Washington, DC: Center on Education Policy. Portions of this work were supported by the National Science Foundation under Grant No. REC-0228295." Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
42 Laura S. Hamilton Brian M. Stecher, Kun Yuan "The existing evidence does not provide definitive guidance regarding the SBA system features that would be most likely to promote desirable outcomes." p.163 Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 "Material in this paper has been adapted from a paper commissioned by the Center on Education Policy: Hamilton, L.S., Stecher, B.M., & Yuan, K. (2009) Standards-based Reform in the United States: History, Research, and Future Directions. Washington, DC: Center on Education Policy. Portions of this work were supported by the National Science Foundation under Grant No. REC-0228295." Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
43 Girlie C. Delacruz   "Opportunities for student use of rubrics to improve learning appears logical, although only a few studies have examined this idea directly." Dismissive Impact of Incentives on the Use of Feedback in Educational Videogames CRESST Report 813, March, 2012, p.3 https://cresst.org/wp-content/uploads/R813.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
44 Jinok Kim   "Though we can find many such statistics in various reports, few have dealt with comparisons across students reclassified in various grade levels. Lack of such studies may be in part due to the difficulty in defining who are reclassified students as well as when they are reclassified." Dismissive Relationships among and between ELL status, demographic characteristics, enrollment history, and school persistence CRESST Report 810, December, 2011, p.6 https://cresst.org/wp-content/uploads/R810.pdf "The work reported herein was supported under the National Research and Development Centers, PR/Award Number R305A090581, as administered by the U.S. Department of Education, Institute of Education Sciences with funding to the National Center for Research on Evaluation, Standards, and Student Testing (CRESST)."  
45 Joan Herman 4 others "While the challenge of teachers’ content-pedagogical knowledge has been documented (Heritage et al., 2009; Heritage, Jones & White, 2010; Herman et al., 2010), few studies have examined the relationship between such knowledge and teachers’ assessment practices, nor examined how teachers’ knowledge may moderate the relationship between assessment practices and student learning." Dismissive Relationships between Teacher Knowledge, Assessment Practice, and Learning: Chicken, Egg, or Omelet? CRESST Report 809, November 2011 http://cresst.org/wp-content/uploads/R809.pdf Institute of Education Sciences, US Education Department See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
46 Lorrie A. Shepard Kristen L. Davidson, Richard Bowman "Although some instruments, such as the Northwest Evaluation Association’s (NWEA) Measures of Academic Progress (MAP®), have been around for decades, few studies have been conducted to examine the technical adequacy of interim assessments or to evaluate their effects on teaching and student learning."  Dismissive How Middle-School Mathematics Teachers Use Interim and Benchmark Assessment Data, p.2 CRESST Report 807, October 2011 http://cresst.org/wp-content/uploads/R807.pdf Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive, which is not the result favored by CRESST, so they declare the studies nonexistent. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
47 Kristen L. Davidson Greta Frohbieter "Yet, districts’ processes to this end [of adopting interim or benchmark assessments] have been largely unexamined (Bulkley et al.; Mandinach et al.; Young & Kim)." Dismissive District Adoption and Implementation of Interim and Benchmark Assessments, p.2 CRESST Report 806, September 2011 https://eric.ed.gov/?id=ED525098 Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive, which is not the result favored by CRESST, so they declare the studies nonexistent. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
48 Kristen L. Davidson Greta Frohbieter "As noted above, district processes with regard to interim assessment adoption and implementation remain largely uninvestigated. A review of the few relevant studies, however, reveals..." Dismissive District Adoption and Implementation of Interim and Benchmark Assessments, p.4 CRESST Report 806, September 2011 https://eric.ed.gov/?id=ED525098 Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive, which is not the result favored by CRESST, so they declare the studies nonexistent. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
49 Marguerite Clarke   “The evidence base is stronger in some areas than in others. For example, there are many professional standards for assessment quality that [can] be applied to classroom assessments, examinations, and large-scale assessments (APA, AERA, and NCME, 1999), but less professional or empirical research on enabling contexts.” p. 20 Dismissive Framework for Building an Effective Student Assessment System  World Bank, READ/SABER Working Paper, Aug. 2011  http://files.eric.ed.gov/fulltext/ED553178.pdf World Bank funders No matter that there exist hundreds of other countries, a century's worth of research prior to 2010, literally thousands of other journals that might publish such an article, and a large "grey literature" of alignment studies conducted as routine parts of test development. Virtually any standards-based, large-scale test development includes an alignment study, not to be found in a scholarly journal.  Some notable alignment studies:
with NRTs:  Freeman, Kuhs, Porter, Floden, Schmidt, Schwille (1983); Debra P. v. Turlington (1984); Cohen, Spillane (1993); La Marca, Redfield, Winter, Bailey, and Despriet (2000); Wainer (2011)
with Standards: Archbald (1994); Buckendahl, Plake, Impara, Irwin (2000); Bhola, Impara, Buckendahl (2003); Phelps (2005)
with RTs: Massell, Kirst, Hoppe (1997); Wiley, Hembry, Buckendahl, Forte, Towles Nebelsick-Gullett (2015)
50 Marguerite Clarke   “Data for some of these indicator areas can be found in official documents, published reports (for example, Ferrer, 2006), research articles (for example, Braun and Kanjee, 2005), and online databases. For the most part, data have not been gathered in any comprehensive or systematic fashion. Those wishing to review this type of information for a particular assessment system will most likely need to collect the data themselves.” p. 21 Denigrating Framework for Building an Effective Student Assessment System  World Bank, READ/SABER Working Paper, Aug. 2011  http://files.eric.ed.gov/fulltext/ED553178.pdf World Bank funders No matter that there exist hundreds of other countries, a century's worth of research prior to 2010, literally thousands of other journals that might publish such an article, and a large "grey literature" of alignment studies conducted as routine parts of test development. Virtually any standards-based, large-scale test development includes an alignment study, not to be found in a scholarly journal.  Some notable alignment studies:
with NRTs:  Freeman, Kuhs, Porter, Floden, Schmidt, Schwille (1983); Debra P. v. Turlington (1984); Cohen, Spillane (1993); La Marca, Redfield, Winter, Bailey, and Despriet (2000); Wainer (2011)
with Standards: Archbald (1994); Buckendahl, Plake, Impara, Irwin (2000); Bhola, Impara, Buckendahl (2003); Phelps (2005)
with RTs: Massell, Kirst, Hoppe (1997); Wiley, Hembry, Buckendahl, Forte, Towles Nebelsick-Gullett (2015)
51 Marguerite Clarke   “This paper has extracted principles and guidelines from countries’ experiences and the current research base to outline a framework for developing a more effective student assessment system. The framework provides policy makers and others with a structure for discussion and consensus building around priorities and key inputs for their assessment system.” p. 27 1stness Framework for Building an Effective Student Assessment System  World Bank, READ/SABER Working Paper, Aug. 2011  http://files.eric.ed.gov/fulltext/ED553178.pdf World Bank funders No matter that there exist hundreds of other countries, a century's worth of research prior to 2010, literally thousands of other journals that might publish such an article, and a large "grey literature" of alignment studies conducted as routine parts of test development. Virtually any standards-based, large-scale test development includes an alignment study, not to be found in a scholarly journal.  Some notable alignment studies:
with NRTs:  Freeman, Kuhs, Porter, Floden, Schmidt, Schwille (1983); Debra P. v. Turlington (1984); Cohen, Spillane (1993); La Marca, Redfield, Winter, Bailey, and Despriet (2000); Wainer (2011)
with Standards: Archbald (1994); Buckendahl, Plake, Impara, Irwin (2000); Bhola, Impara, Buckendahl (2003); Phelps (2005)
with RTs: Massell, Kirst, Hoppe (1997); Wiley, Hembry, Buckendahl, Forte, Towles Nebelsick-Gullett (2015)
52 Michael Hout, Stuart W. Elliot, Editors   "Unfortunately, there were no other studies available that would have allowed us to contrast the overall effect of state incentive programs predating NCLB…" p. 4-6 Dismissive Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper, 1993; Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran, 1989; *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925).   *Covers many studies; study is a research review, research synthesis, or meta-analysis.  Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
What about:  Brooks-Cooper, C. (1993), Brown, S. M. & Walberg, H. J. (1993), Heneman, H. G., III. (1998), Hurlock, E. B. (1925), Jones, J. et al. (1996), Kazdin, A. & Bootzin, R. (1972),
Kelley, C. (1999), Kirkpatrick, J. E. (1934), O’Leary, K. D. & Drabman, R. (1971), Palmer, J. S. (2002), Richards, C. E. & Shen, T. M. (1992), Rosswork, S. G. (1977), Staats, A. (1973), Tuckman, B. W. (1994), Tuckman, B. W. & Trimble, S. (1997), Webster, W. J., Mendro, R. L., Orsack, T., Weerasinghe, D. & Bembry, K. (1997)
53 Michael Hout, Stuart W. Elliot, Editors   "Test-based incentive programs, as designed and implemented in the programs that have been carefully studied have not increased student achievement enough to bring the United States close to the levels of the highest achieving countries.", p. 4-26 Denigrating Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper, 1993; Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran, 1989; *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925).   *Covers many studies; study is a research review, research synthesis, or meta-analysis.  Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
What about:  Brooks-Cooper, C. (1993), Brown, S. M. & Walberg, H. J. (1993), Heneman, H. G., III. (1998), Hurlock, E. B. (1925), Jones, J. et al. (1996), Kazdin, A. & Bootzin, R. (1972),
Kelley, C. (1999), Kirkpatrick, J. E. (1934), O’Leary, K. D. & Drabman, R. (1971), Palmer, J. S. (2002), Richards, C. E. & Shen, T. M. (1992), Rosswork, S. G. (1977), Staats, A. (1973), Tuckman, B. W. (1994), Tuckman, B. W. & Trimble, S. (1997), Webster, W. J., Mendro, R. L., Orsack, T., Weerasinghe, D. & Bembry, K. (1997)
54 Michael Hout, Stuart W. Elliot, Editors   "Despite using them for several decades, policymakers and educators do not yet know how to use test-based incentives to consistently generate positive effects on achievement and to improve education." p. 5-1 Dismissive Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper, 1993; Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran, 1989; *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925).   *Covers many studies; study is a research review, research synthesis, or meta-analysis.  Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
What about: Brooks-Cooper, C. (1993), Brown, S. M. & Walberg, H. J. (1993), Heneman, H. G., III. (1998), Hurlock, E. B. (1925), Jones, J. et al. (1996), Kazdin, A. & Bootzin, R. (1972), Kelley, C. (1999), Kirkpatrick, J. E. (1934), O’Leary, K. D. & Drabman, R. (1971), Palmer, J. S. (2002), Richards, C. E. & Shen, T. M. (1992), Rosswork, S. G. (1977), Staats, A. (1973), Tuckman, B. W. (1994), Tuckman, B. W. & Trimble, S. (1997), Webster, W. J., Mendro, R. L., Orsack, T., Weerasinghe, D. & Bembry, K. (1997)
55 Michael Hout, Stuart W. Elliott, Editors   "The general lack of guidance coming from existing studies of test-based incentive programs in education…" Dismissive Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925). *Covers many studies; study is a research review, research synthesis, or meta-analysis. Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
What about: Brooks-Cooper, C. (1993), Brown, S. M. & Walberg, H. J. (1993), Heneman, H. G., III. (1998), Hurlock, E. B. (1925), Jones, J. et al. (1996), Kazdin, A. & Bootzin, R. (1972), Kelley, C. (1999), Kirkpatrick, J. E. (1934), O’Leary, K. D. & Drabman, R. (1971), Palmer, J. S. (2002), Richards, C. E. & Shen, T. M. (1992), Rosswork, S. G. (1977), Staats, A. (1973), Tuckman, B. W. (1994), Tuckman, B. W. & Trimble, S. (1997), Webster, W. J., Mendro, R. L., Orsack, T., Weerasinghe, D. & Bembry, K. (1997)
56 Eva L. Baker   "At the same time that interest in alternative assessment is high, our knowledge about the design, distribution, quality and impact of such efforts is low. This is a time of tingling metaphor, cottage industry, and existence proofs rather than carefully designed research and development." p.2 Dismissive, Denigrating What Probably Works in Alternative Assessment, July 2010 CRESST Report 772   Institute of Education Sciences, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them. See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
57 Eva L. Baker   "Moreover, because psychometric methods appropriate for dealing with such new measures are not readily available, nor even a matter of common agreement, no clear templates exist to guide the technical practices of alternative assessment developers (Linn, Baker, Dunbar, 1991)." p.2 Dismissive What Probably Works in Alternative Assessment, July 2010 CRESST Report 772   Institute of Education Sciences, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them. See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
58 Eva L. Baker   "Given that the level of empirical work is so obviously low, one well might wonder what these studies are about. Some studies argue for new approaches to achievement testing." p.3 Denigrating What Probably Works in Alternative Assessment, July 2010 CRESST Report 772   Institute of Education Sciences, US Education Department She looked in two databases -- ERIC and NTIS -- and then implied she had looked everywhere.
59 Eva L. Baker   "Despite this fragile research base, alternative assessment has already taken off. What issues can we anticipate being raised by relevant communities about the value of these efforts?" p.6 Dismissive, Denigrating What Probably Works in Alternative Assessment, July 2010 CRESST Report 772   Institute of Education Sciences, US Education Department She looked in two databases -- ERIC and NTIS -- and then implied she had looked everywhere.
60 Lawrence O. Picus, Frank Adamson, William Montague, Margaret Owens "As in the earlier studies, efforts are made to distinguish between the concept of economic or opportunity costs (i.e., the use of teacher time that is already “paid for” through the contract and used as part of the assessment process rather then for some other activity or function), and the direct expenditures made for assessment." p.1 Dismissive A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) For at least two decades, Larry Picus has elevated the trivial and elemental difference between expenditure and cost to the level of heavenly revelation. As any beginning undergraduate in economics knows, expenditures -- particularly budgetary line-item expenditures -- don't necessarily equal the cost of an item or activity. The classifications of the amounts might or might not match. Picus needled the trivial point over and over for decades. Meanwhile, my project on testing costs at the GAO (1991-1993) was a cost study in every sense that Picus identified for the term, but the word "expenditures" was in the title of the report. So, when Picus repeated and repeated that most studies on the topic prior to his were "just expenditure studies" (and not really "cost" studies), there was the GAO report, one of the few cost studies done prior to his, with the word expenditure in its title. The ploy worked, and many were convinced then, and still today, that my work at the GAO relied on budgetary line-item expenditure data (it didn't), neglected to include the cost of personnel time (it did include those costs), or was otherwise suspect, an inferior study. Picus and CRESST managed to denigrate into oblivion a taxpayer-funded study that was vastly superior to any he would ever do.
61 Lawrence O. Picus, Frank Adamson, William Montague, Margaret Owens "Determining the resources necessary to achieve each of these goals is, at best, a complex task. Because of this difficulty, many analysts stop short of estimating the true costs of a program, and instead focus on the expenditures required for its implementation." p.7 Dismissive A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) For at least two decades, Larry Picus has elevated the trivial and elemental difference between expenditure and cost to the level of heavenly revelation. As any beginning undergraduate in economics knows, expenditures -- particularly budgetary line-item expenditures -- don't necessarily equal the cost of an item or activity. The classifications of the amounts might or might not match. Picus needled the trivial point over and over for decades. Meanwhile, my project on testing costs at the GAO (1991-1993) was a cost study in every sense that Picus identified for the term, but the word "expenditures" was in the title of the report. So, when Picus repeated and repeated that most studies on the topic prior to his were "just expenditure studies" (and not really "cost" studies), there was the GAO report, one of the few cost studies done prior to his, with the word expenditure in its title. The ploy worked, and many were convinced then, and still today, that my work at the GAO relied on budgetary line-item expenditure data (it didn't), neglected to include the cost of personnel time (it did include those costs), or was otherwise suspect, an inferior study. Picus and CRESST managed to denigrate into oblivion a taxpayer-funded study that was vastly superior to any he would ever do.
62 Lawrence O. Picus, Frank Adamson, William Montague, Margaret Owens "The study defined purchase cost as the money spent on test-related goods and services, a category in line with what we call expenditures (U.S. GAO, 1993)." p.21 Denigrating A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) For at least two decades, Larry Picus has elevated the trivial and elemental difference between expenditure and cost to the level of heavenly revelation. As any beginning undergraduate in economics knows, expenditures -- particularly budgetary line-item expenditures -- don't necessarily equal the cost of an item or activity. The classifications of the amounts might or might not match. Picus needled the trivial point over and over for decades. Meanwhile, my project on testing costs at the GAO (1991-1993) was a cost study in every sense that Picus identified for the term, but the word "expenditures" was in the title of the report. So, when Picus repeated and repeated that most studies on the topic prior to his were "just expenditure studies" (and not really "cost" studies), there was the GAO report, one of the few cost studies done prior to his, with the word expenditure in its title. The ploy worked, and many were convinced then, and still today, that my work at the GAO relied on budgetary line-item expenditure data (it didn't), neglected to include the cost of personnel time (it did include those costs), or was otherwise suspect, an inferior study. Picus and CRESST managed to denigrate into oblivion a taxpayer-funded study that was vastly superior to any he would ever do.
63 Lawrence O. Picus, Frank Adamson, William Montague, Margaret Owens "Unfortunately, aggregating these different types of time disguises important differences between them that, in fairness to the GAO, have emerged in the NCLB era as more important considerations than in previous decades. Specifically, test-preparation time for students has become a subject of national debate about how much class time teachers spend 'teaching to the test.'" p.21 Denigrating A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) I continued to publish articles and made presentations based on the GAO project for several years after I left the GAO. These publications reported the disaggregated costs and estimated benefits. Indeed, I published a net benefit (i.e., benefit/cost) study in the Journal of Education Finance ten years prior to this Picus article. Almost certainly he knows about it -- he has served as editor or on the editorial board for that journal for many years. In this report of his for SCOPE, my name is never mentioned, nor are any of my many publications or presentations related to the costs and benefits of testing.
64 Lawrence O. Picus, Frank Adamson, William Montague, Margaret Owens "In its analysis, the GAO does provide aggregate time estimates. However, it does not provide disaggregated estimates of teacher time, nor estimated benefits in terms of either teacher PD or improved student learning." p.21 Denigrating A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) I continued to publish articles and made presentations based on the GAO project for several years after I left the GAO. These publications reported the disaggregated costs and estimated benefits. Indeed, I published a net benefit (i.e., benefit/cost) study in the Journal of Education Finance ten years prior to this Picus article. Almost certainly he knows about it -- he has served as editor or on the editorial board for that journal for many years. In this report of his for SCOPE, my name is never mentioned, nor are any of my many publications or presentations related to the costs and benefits of testing.
65 Lawrence O. Picus, Frank Adamson, William Montague, Margaret Owens "The performance assessments studied by the GAO also do not demonstrate much variety. Most included only writing samples, reading comprehension and response, and math/science problem-solving items. A few districts used science lab work, group work, and skills observations, but most still relied on paper-and-pencil testing (U.S. GAO, 1993)." p.21 Denigrating A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) Picus neglects to mention that the GAO collected data from the universe of states with testing programs and a very large, representative sample (> 660) of public school districts. We collected all the data on all the systemwide testing occurring at the time. We oversampled districts in certain states, such as Maryland, the one state at the time with the most elaborate performance test types. In doing that, we did more than he ever did in his couple of state studies. Yet, as usual, he implies that the GAO study or my work must have left out something important.
66 Lawrence O. Picus, Frank Adamson, William Montague, Margaret Owens "In every instance, test developers crafting the performance-based tests started from scratch, writing test questions that fit the state’s curriculum or guidelines, then testing the draft on pilot groups of students and using an iterative revision process that did not involve state curriculum, which was undergoing simultaneous development (U.S. GAO, 1993)." p.22 Denigrating A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) This sentence doesn't make sense, but he doesn't include page numbers in his citations, so it is not even possible to find what text he might have been misunderstanding. Within one sentence, Picus claims that test items were based on established content standards, but then not based on them, because they didn't yet exist. The latter point is certainly not true. When standards-based tests are developed, the content standards are completed first, and the test items are written directly from them.
67 Joan L. Herman, Ellen Osmundson, David Silver "These indeed are promising developments for pushing formative assessment to fruition in classroom practice. They acknowledge and work toward remedying the need for classroom tools to assess and support student learning. Yet at the same time, recent studies reveal challenges in implementing quality formative assessment and show non-robust results with regard to effects on student learning (Herman, Osmundson, Ayala, Schneider, & Timms, 2006; Furtak, et al., 2008)." Dismissive, Denigrating Capturing Quality in Formative Assessment Practice: Measurement Challenges, p.2 CRESST Report 770, June 2010 https://eric.ed.gov/?id=ED512648 Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive, which is not the result favored by CRESST, so they declare the studies nonexistent. See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
68 Joan L. Herman, Ellen Osmundson, David Silver "Just as the concept of formative assessment itself underscores the central role of evidence—learning data—in an effective teaching and learning process, so too do policymakers and practitioners need evidence on which to build effective formative practices. Toward this latter goal, this report explores ..." 1stness Capturing Quality in Formative Assessment Practice: Measurement Challenges, p.2 CRESST Report 770, June 2010 https://eric.ed.gov/?id=ED512648 Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive, which is not the result favored by CRESST, so they declare the studies nonexistent. See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
69 Diana Pullin (Chair), Joan Herman, Scott Marion, Dirk Mattson, Rebecca Maynard, Mark Wilson "However, there have been very few studies of how interim assessments are actually used by individual teachers in classrooms, by principals, and by districts or of their impact on student achievement." p. 6 Dismissive Best Practices for State Assessment Systems, Part I Committee on Best Practices for State Assessment Systems: Improving Assessment While Revisiting Standards; Center for Education; Division of Behavioral and Social Sciences and Education; National Research Council https://www.nap.edu/catalog/12906/best-practices-for-state-assessment-systems-part-i-summary-of "With funding from the James B. Hunt, Jr. Institute for Educational Leadership and Policy, as well as additional support from the Bill & Melinda Gates Foundation and the Stupski Foundation, the National Research Council (NRC) planned two workshops designed to explore some of the possibilities for state assessment systems."
70 Diana Pullin (Chair), Joan Herman, Scott Marion, Dirk Mattson, Rebecca Maynard, Mark Wilson "Research indicates that the result has been emphasis on lower-level knowledge and skills and very thin alignment with the standards. For example, Porter, Polikoff, and Smithson (2009) found very low to moderate alignment between state assessments and standards—meaning that large proportions of content standards are not covered on the assessments (see also Fuller et al., 2006; Ho, 2008)." p. 10 Denigrating Best Practices for State Assessment Systems, Part I Committee on Best Practices for State Assessment Systems: Improving Assessment While Revisiting Standards; Center for Education; Division of Behavioral and Social Sciences and Education; National Research Council https://www.nap.edu/catalog/12906/best-practices-for-state-assessment-systems-part-i-summary-of "With funding from the James B. Hunt, Jr. Institute for Educational Leadership and Policy, as well as additional support from the Bill & Melinda Gates Foundation and the Stupski Foundation, the National Research Council (NRC) planned two workshops designed to explore some of the possibilities for state assessment systems."
71 Diana Pullin (Chair), Joan Herman, Scott Marion, Dirk Mattson, Rebecca Maynard, Mark Wilson "Another issue is that the implications of computer-based approaches for validity and reliability have not been thoroughly evaluated." p. 40 Dismissive Best Practices for State Assessment Systems, Part I Committee on Best Practices for State Assessment Systems: Improving Assessment While Revisiting Standards; Center for Education; Division of Behavioral and Social Sciences and Education; National Research Council https://www.nap.edu/catalog/12906/best-practices-for-state-assessment-systems-part-i-summary-of "With funding from the James B. Hunt, Jr. Institute for Educational Leadership and Policy, as well as additional support from the Bill & Melinda Gates Foundation and the Stupski Foundation, the National Research Council (NRC) planned two workshops designed to explore some of the possibilities for state assessment systems."
72 Diana Pullin (Chair), Joan Herman, Scott Marion, Dirk Mattson, Rebecca Maynard, Mark Wilson "For current tests, he [Lauress Wise] observed, there is little evidence that they are good indicators of instructional effectiveness or good predictors of students’ readiness for subsequent levels of instruction." Dismissive Best Practices for State Assessment Systems, Part I Committee on Best Practices for State Assessment Systems: Improving Assessment While Revisiting Standards; Center for Education; Division of Behavioral and Social Sciences and Education; National Research Council https://www.nap.edu/catalog/12906/best-practices-for-state-assessment-systems-part-i-summary-of "With funding from the James B. Hunt, Jr. Institute for Educational Leadership and Policy, as well as additional support from the Bill & Melinda Gates Foundation and the Stupski Foundation, the National Research Council (NRC) planned two workshops designed to explore some of the possibilities for state assessment systems."
73 Laura S. Hamilton Brian M. Stecher, Kun Yuan “A few studies have attempted to examine how the creation and publication of standards, per se, have affected practices.” p. 3 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
74 Laura S. Hamilton Brian M. Stecher, Kun Yuan “The research evidence does not provide definitive answers to these questions.” p. 6 Denigrating Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
75 Laura S. Hamilton Brian M. Stecher, Kun Yuan “He [Poynter 1994] also noted that ‘virtually all of the arguments, both for and against standards, are based on beliefs and hypotheses rather than on direct empirical evidence’ (p. 427).” pp. 34-35 Dismissive, Denigrating Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
76 Laura S. Hamilton Brian M. Stecher, Kun Yuan "Although a large and growing body of research has been conducted to examine the effects of SBR, the caution Poynter expressed in 1994 about the lack of empirical evidence remains relevant today.” pp. 34-35 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
77 Laura S. Hamilton Brian M. Stecher, Kun Yuan “Arguably the most important test of quality is whether the standards promote high-quality instruction and improved student learning, but as we discuss later, there is very little research to address that question.” p. 37 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
78 Laura S. Hamilton Brian M. Stecher, Kun Yuan “[T]here have been a few studies of SBR as a comprehensive system. . . . [T]here is some research on how the adoption of standards, per se, or the alignment of standards with curriculum influences school practices or student outcomes.” p. 38 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
79 Laura S. Hamilton Brian M. Stecher, Kun Yuan “The lack of evidence about the effects of SBR derives primarily from the fact that the vision has never been fully realized in practice.” p. 47 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
80 Laura S. Hamilton Brian M. Stecher, Kun Yuan “[A]lthough many conceptions of SBR emphasize autonomy, we currently know relatively little about the effects of granting autonomy or what the right balance is between autonomy and prescriptiveness.” p. 55 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
81 Laura S. Hamilton Brian M. Stecher, Kun Yuan “One of the primary responsibilities of the federal government should be to ensure ongoing collection of evidence demonstrating the effects of the policies, which could be used to make decisions about whether to continue on the current course or whether small adjustments or a major overhaul are needed.” p. 55 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
82 Douglas N. Harris Lori L. Taylor, Amy A. Levine, William K. Ingle, Leslie McDonald "However, previous studies under-state current costs by focusing on costs before NCLB was put in place and by excluding important cost categories." Denigrating The Resource Costs of Standards, Assessments, and Accountability report to the National Research Council   National Research Council funders No, they did not leave out important cost categories; Harris' study deliberately exaggerates costs. See pages 3-10:  https://nonpartisaneducation.org/Review/Essays/v10n1.pdf
83 Joan Herman Katherine E. Ryan, Lorrie A. Shepard, Eds. "Yet, available evidence suggests that the rhetoric surpasses the reality of formative assessment use" p.217 Denigrating Accountability and assessment: Is public interest in K-12 education being served? Chapter 11 in The Future of Test-Based Educational Accountability https://www.routledge.com/The-Future-of-Test-Based-Educational-Accountability-1st-Edition/Ryan-Shepard/p/book/9780805864700 Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive, which is not the result favored by CRESST, so they declare the studies nonexistent. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
84 Joan Herman Katherine E. Ryan, Lorrie A. Shepard, Eds. "The research base examining effects on students with disabilities and on English Language learners is scanty." p.223 Dismissive Accountability and assessment: Is public interest in K-12 education being served? Chapter 11 in The Future of Test-Based Educational Accountability https://www.routledge.com/The-Future-of-Test-Based-Educational-Accountability-1st-Edition/Ryan-Shepard/p/book/9780805864700 Institute of Education Sciences, US Education Department  
85 Joan Herman Katherine E. Ryan, Lorrie A. Shepard, Eds. "...there is no obvious accountability mechanism for the "average student" who may have made it just over the proficient level. There is little research on this issue." p.224 Dismissive Accountability and assessment: Is public interest in K-12 education being served? Chapter 11 in The Future of Test-Based Educational Accountability https://www.routledge.com/The-Future-of-Test-Based-Educational-Accountability-1st-Edition/Ryan-Shepard/p/book/9780805864700 Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Jacobson (1992); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Marshall (1987); Mangino & Babcock (1986); Michigan Department of Education (1984); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
86 Joan Herman   "The report considers how well the model fits available evidence by examining whether and how accountability assessment influences students’ learning opportunities and the relationship between accountability and learning." abstract Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department See, for example, Test Frequency, Stakes, and Feedback in Student Achievement: A Meta-Analysis   https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
87 Joan Herman   "What of the impact of accountability on other segments of the student population--traditionally higher performing students? ...The average student? ...there is no obvious accountability mechanism for the "average student." There is little research on this issue." p.17 Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
88 Joan Herman   "While a thorough treatment of the effects on teachers is also beyond the scope of this report, it is worth noting a growing literature that is cause for concern." p.17 Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
89 Joan Herman   "The research base examining effects on students with disabilities and on English language learner students is scanty." pp.16-17 Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department  
90 Eva L. Baker   "Tests only dimly reflect in their design the results of research on learning, whether of skills, subject matter, or problem solving." p.310 Denigrating The End(s) of Testing Educational Researcher, Vol. 36, No. 6, pp. 309–317   2007 Presidential Address for the American Educational Research Association  
91 Eva L. Baker   "To my mind, the evidential disconnect between test design and learning research is no small thing." p.310 Dismissive The End(s) of Testing Educational Researcher, Vol. 36, No. 6, pp. 309–317   2007 Presidential Address for the American Educational Research Association  
92 Eva L. Baker   "What if we set aside learning-based design and ask, “How well do any of our external tests work?” The answer is that we often don’t know enough to know. We have little evidence that tests are in sync with their stated or de facto purposes or that their results lead to appropriate decisions." p.310 Dismissive The End(s) of Testing Educational Researcher, Vol. 36, No. 6, pp. 309–317   2007 Presidential Address for the American Educational Research Association  
93 Laura S. Hamilton Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney "For many educators, the utility of SBA was demonstrated in a few pioneering states in the 1990s. Two of the most prominent examples of SBA occurred in Texas and North Carolina, where scores on state accountability tests rose dramatically after the introduction of SBA systems (Grissmer and Flanagan, 1998)." p.4   Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States Rand Corporation, 2007 https://www.rand.org/pubs/monographs/MG589.html "This research was sponsored by the National Science Foundation under grant number REC-0228295." Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
94 Laura S. Hamilton Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney "However, the paths through which SBA [standards-based accountability] changes district, school, and classroom practices and how these changes in practice influence student outcomes are largely unexplored. There is strong evidence that SBA leads to changes in teachers’ instructional practices (Hamilton, 2004; Stecher, 2002)." p.5 Dismissive Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States Rand Corporation, 2007 https://www.rand.org/pubs/monographs/MG589.html "This research was sponsored by the National Science Foundation under grant number REC-0228295." Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
95 Laura S. Hamilton Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney "Much less is known about the impact of SBA at the district and school levels and the relationships among actions at the various levels and student outcomes. This study was designed to shed light on this complex set of relationships…" p.5 Dismissive Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States Rand Corporation, 2007 https://www.rand.org/pubs/monographs/MG589.html "This research was sponsored by the National Science Foundation under grant number REC-0228295." Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
96 Julie A. Marsh, John F. Pane, and Laura S. Hamilton   "Unlike past studies of data use in schools, this paper brings together information systematically gathered from large, representative samples of educators at the district, school, and classroom levels in a variety of contexts." p.1 Dismissive, Denigrating Making Sense of Data-Driven Decision Making in Education Rand Corporation Occasional Paper, 2006
97 Julie A. Marsh, John F. Pane, and Laura S. Hamilton   "Although a few studies have tried to link DDDM to changes in school culture or performance (Chen et al., 2005; Copland, 2003; Feldman and Tung, 2001; Schmoker and Wilson, 1995; Wayman and Stringfield 2005), most of the literature focuses on implementation. In addition, previous work has tended to describe case studies of schools or has taken the form of advocacy or technical assistance (such as the “how to” implementation guides described by Feldman and Tung, 2001)." p.4 Dismissive, Denigrating Making Sense of Data-Driven Decision Making in Education Rand Corporation Occasional Paper, 2006
98 Eva L. Baker Joan L. Herman, Robert L. Linn "For example, performance assessment was a rage in the early 1990s because it was something new and flashy, and looked to have great promise. Before almost any research was done, a number of states dropped their multiple-choice accountability systems, replacing them with performance assessments." Dismissive ACCELERATING FUTURE POSSIBILITIES FOR ASSESSMENT AND LEARNING, p.1 CRESST Line, Winter 2006 https://www.researchgate.net/publication/277283780_in_Educational_Researcher_called_The_Awful_Reputation_of_Education_Research Institute of Education Sciences, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
99 Eva L. Baker Joan L. Herman, Robert L. Linn "By the end of this year, nearly half of all states will have graduation exams in place (Peterson, 2005). Short institutional memory forgets that similar minimum competency tests did not lead to increased achievement some 20 years ago, but instead contributed to higher numbers of high school dropouts and inequities along racial lines (Catterall, 1989; Haertel & Herman, 2005)." Dismissive ACCELERATING FUTURE POSSIBILITIES FOR ASSESSMENT AND LEARNING, p.3 CRESST Line, Winter 2006 https://www.researchgate.net/publication/277283780_in_Educational_Researcher_called_The_Awful_Reputation_of_Education_Research Institute of Education Sciences, US Education Department Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan (1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
100 Edward Haertel Joan Herman "Passing rates on MCTs in many states rose rapidly from year to year (Popham, Cruse, Rankin, Sandifer, & Williams, 1985). Despite these gains, and positive trends on examinations like the National Assessment of Educational Progress (NAEP), there is little evidence that MCTs were the reason for improvements on other examinations." Dismissive A Historical Perspective on Validity Arguments for Accountability Testing CRESST Report 654, June 2005 https://cresst.org/wp-content/uploads/R654.pdf Institute of Education Sciences, US Education Department Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan (1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
101 Robert L. Linn   "Despite the clear appeal of assessment-based accountability and the widespread use of this approach, the development of assessments that are aligned with content standards and for which there is solid evidence of validity and reliability is a challenging endeavor." Dismissive Issues in the Design of Accountability Systems CRESST Report 650, April 2005 https://cresst.org/wp-content/uploads/R650.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
102 Robert L. Linn   "Alignment of an assessment with the content standards that it is intended to measure is critical if the assessment is to buttress rather than undermine the standards. Too little attention has been given to the evaluation of the alignment of assessments and standards." Denigrating Issues in the Design of Accountability Systems CRESST Report 650, April 2005 https://cresst.org/wp-content/uploads/R650.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
103 Betheny Gross, Michael Kirst, Dana Holland, and Tom Luschei Bethany Gross & Margaret E. Goertz, Eds. "Unlike elementary and middle school leaders, for whose institutions countless reform models have been designed and subsequently employed in efforts to meet accountability demands, high school leaders have relatively few models or school designs to which they can turn for guidance." p.43 Dismissive Got You Under My Spell? How Accountability Policy Is Changing and Not Changing Decision Making in High Schools Holding High Hopes: How High Schools Respond to State Accountability Policies, CPRE Research Report Series RR-056, March 2005   US Education Department funding for the Consortium for Policy Research in Education  
104 Betheny Gross, Michael Kirst, Dana Holland, and Tom Luschei Bethany Gross & Margaret E. Goertz, Eds. "Perceptions that little information exists to be found may very well reduce the likelihood that information will be sought and that new strategies will be found." p.48 Dismissive Got You Under My Spell? How Accountability Policy Is Changing and Not Changing Decision Making in High Schools Holding High Hopes: How High Schools Respond to State Accountability Policies, CPRE Research Report Series RR-056, March 2005   US Education Department funding for the Consortium for Policy Research in Education … perceptions that they encourage
105 Elliot H. Weinbaum Bethany Gross & Margaret E. Goertz, Eds. "However, state accountability policies and the research on those policies have traditionally overlooked the role of school districts. Little research is available about the ways in which districts respond to accountability pressure or, until recently, the strategies that they might use for improvement."  Dismissive Stuck in the Middle With You: District Response to State Accountability, p.96 Holding High Hopes: How High Schools Respond to State Accountability Policies, CPRE Research Report Series RR-056, March 2005   US Education Department funding for the Consortium for Policy Research in Education  
106 Elliot H. Weinbaum Bethany Gross & Margaret E. Goertz, Eds. "Because of the limited investigation that has been done, and the urgent need for high school improvement" Dismissive Stuck in the Middle With You: District Response to State Accountability, p.96 Holding High Hopes: How High Schools Respond to State Accountability Policies, CPRE Research Report Series RR-056, March 2005   US Education Department funding for the Consortium for Policy Research in Education  
107 Elliot H. Weinbaum Bethany Gross & Margaret E. Goertz, Eds. "The research community has relatively little understanding of the ways in which state level, performance-based accountability systems and local school districts interact given various contexts." p.98 Dismissive Stuck in the Middle With You: District Response to State Accountability, p.98 Holding High Hopes: How High Schools Respond to State Accountability Policies, CPRE Research Report Series RR-056, March 2005   US Education Department funding for the Consortium for Policy Research in Education  
108 Elliot H. Weinbaum Bethany Gross & Margaret E. Goertz, Eds. "First of all, much of the research on districts has studied districts that are, for some reason, 'outliers.'" p.100 Dismissive Stuck in the Middle With You: District Response to State Accountability Holding High Hopes: How High Schools Respond to State Accountability Policies, CPRE Research Report Series RR-056, March 2005   US Education Department funding for the Consortium for Policy Research in Education  
109 Elliot H. Weinbaum Bethany Gross & Margaret E. Goertz, Eds. "This is particularly true at the high school level, where continued debates about standards, the subject-specific nature of teacher expertise, and the lack of basic research about effective practices at the high school level make effective improvement strategies complex." p.104 Dismissive Stuck in the Middle With You: District Response to State Accountability Holding High Hopes: How High Schools Respond to State Accountability Policies, CPRE Research Report Series RR-056, March 2005   US Education Department funding for the Consortium for Policy Research in Education  
110 Margaret E. Goertz and Diane Massell Bethany Gross & Margaret E. Goertz, Eds. "We know little about how high schools respond to external accountability pressures." p. 123 Dismissive Summary Holding High Hopes: How High Schools Respond to State Accountability Policies, CPRE Research Report Series RR-056, March 2005   US Education Department funding for the Consortium for Policy Research in Education  
111 Margaret E. Goertz and Diane Massell Bethany Gross & Margaret E. Goertz, Eds. "Little academic research has explored what motivates and helps district organizations intervene on behalf of state accountability goals, particularly at the high school level. Our study sheds some light on this question." p.129 Dismissive Summary Holding High Hopes: How High Schools Respond to State Accountability Policies, CPRE Research Report Series RR-056, March 2005   US Education Department funding for the Consortium for Policy Research in Education  
112 Joan L. Herman Susan H. Fuhrman & Richard F. Elmore, Eds. "Based on available research, this chapter explores how well assessments serve these functions from the perspective of elementary schools." p.141 Dismissive Redesigning Accountability Systems for Education, Chapter 7 Teachers College Press, 2004 Joint project between CRESST and CPRE. Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
"Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."                                    

"What about:  Brooks-Cooper, C. (1993), Brown, S. M. & Walberg, H. J. (1993), Heneman, H. G., III. (1998), Hurlock, E. B. (1925), Jones, J. et al. (1996), Kazdin, A. & Bootzin, R. (1972), Kelley, C. (1999), Kirkpatrick, J. E. (1934), O’Leary, K. D. & Drabman, R. (1971), Palmer, J. S. (2002), Richards, C. E. & Shen, T. M. (1992), Rosswork, S. G. (1977), Staats, A. (1973), Tuckman, B. W. (1994), Tuckman, B. W. & Trimble, S. (1997), Webster, W. J., Mendro, R. L., Orsack, T., Weerasinghe, D. & Bembry, K. (1997)"
113 Joan L. Herman
Susan H. Fuhrman & Richard F. Elmore, Eds "What is particularly new in standards-based assessment reform is being clear not only on the 'what' of what is expected (the content standards), but also on "how well" it should be accomplished (the performance standards) (Linn and Herman, 1997)." pp.141-142 Dismissive Redesigning Accountability Systems for Education, Chapter 7 Teachers College Press, 2004 Joint project between CRESST and CPRE. Institute of Education Sciences, US Education Department
114 Joan L. Herman
Susan H. Fuhrman & Richard F. Elmore, Eds "More is known currently about the variation in those elements across states and localities than about their influence on schools, teaching, and student learning." p.154 Dismissive, Denigrating Redesigning Accountability Systems for Education, Chapter 7 Teachers College Press, 2004 Joint project between CRESST and CPRE. Institute of Education Sciences, US Education Department
115 Joan L. Herman
Susan H. Fuhrman & Richard F. Elmore, Eds "There is ample evidence to suggest that state assessment systems do create pressure for teachers and principals … but little clear evidence on how various stakes have differential effects on teachers, their curriculum and instruction, or, ultimately, student learning." p.155 Dismissive, Denigrating Redesigning Accountability Systems for Education, Chapter 7 Teachers College Press, 2004 Joint project between CRESST and CPRE. Institute of Education Sciences, US Education Department
116 Joan L. Herman
Susan H. Fuhrman & Richard F. Elmore, Eds "Similarly, states and districts differ in how they respond to low-performing schools, but evidence on whether and how their various responses influence classroom teaching, test performance, and student learning is limited." p.155 Dismissive, Denigrating Redesigning Accountability Systems for Education, Chapter 7 Teachers College Press, 2004 Joint project between CRESST and CPRE. Institute of Education Sciences, US Education Department
117 Joan L. Herman
Susan H. Fuhrman & Richard F. Elmore, Eds "Further research is necessary, however, to identify optimal approaches. Needed, too, is additional research on how schools can best orchestrate their improvement efforts." p.155 Dismissive, Denigrating Redesigning Accountability Systems for Education, Chapter 7 Teachers College Press, 2004 Joint project between CRESST and CPRE. Institute of Education Sciences, US Education Department
118 Richard F. Elmore  Susan H. Fuhrman & Richard F. Elmore, Eds "Nowhere is this question of what we don't know more apparent than in the issue of stakes. State policies require proficiency levels for grade promotion and graduation for students, for example, without any empirical evidence …" p.278 Dismissive Redesigning Accountability Systems for Education, Chapter 7 Teachers College Press, 2004 Joint project between CRESST and CPRE. Institute of Education Sciences, US Education Department
119 Richard F. Elmore  Susan H. Fuhrman & Richard F. Elmore, Eds "Likewise, state policies set expected levels of improvement in schools without any evidence or theory about how schools actually respond to external pressure for student performance ..." pp.278-279 Dismissive Redesigning Accountability Systems for Education, Chapter 7 Teachers College Press, 2004 Joint project between CRESST and CPRE. Institute of Education Sciences, US Education Department
120 Lorraine M. McDonnell   "A growing body of research suggests that school and classroom practices do change in response to these assessments (Herman and Golan, 1993; Smith and Rottenberg, 1991; Madaus, 1988)" 1stness Politics, Persuasion, and Educational Testing, p.9 Harvard University Press, 2004     Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
121 Lorraine M. McDonnell   "A growing body of research suggests that school and classroom practices do change in response to these assessments (Herman and Golan, 1993; Smith and Rottenberg, 1991; Madaus, 1988)" Dismissive Politics, Persuasion, and Educational Testing, p.9 Harvard University Press, 2004     Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
122 Lorraine M. McDonnell   "Although most literature on policy instruments identifies this persuasive tool as one of the strategies available to policymakers, little theoretical or comparative empirical research has been conducted on its properties." Dismissive Politics, Persuasion, and Educational Testing, p.24 Harvard University Press, 2004      
123 Lorraine M. McDonnell   "There is empirical research on policies that rely on hortatory tools, but studies of these individual policies have not examined them within a broader theoretical framework." Denigrating Politics, Persuasion, and Educational Testing, p.24 Harvard University Press, 2004      
124 Lorraine M. McDonnell   "This chapter represents an initial attempt to analyze the major characteristics of hortatory policy by taking an inductive approach and looking across several different policy areas to identify a few basic properties common to most policies of this type." 1stness Politics, Persuasion, and Educational Testing, p.24 Harvard University Press, 2004      
125 Lorraine M. McDonnell   "This chapter has begun the task of building a conceptual framework for understanding hortatory policies by identifying their underlying causal assumptions and analyzing some basic properties common to most policies that rely on information and values to motivate action."  1stness Politics, Persuasion, and Educational Testing, p.44–45 Harvard University Press, 2004      
126 Lorraine M. McDonnell   "Because so little systematic research has been conducted on hortatory policy, it is possible at this point only to suggest, rather than to specify, the conditions under which its underlying assumptions will be valid and a policy likely to succeed." Dismissive Politics, Persuasion, and Educational Testing, p.45 Harvard University Press, 2004      
127 Lorraine M. McDonnell   "Additional theoretical and empirical work is needed to develop a more rigorous and nuanced understanding of hortatory policy. Nevertheless, this study starts that process by articulating the policy theory undergirding hortatory policy and by outlining its potential promise and shortcomings." Denigrating Politics, Persuasion, and Educational Testing, p.45 Harvard University Press, 2004      
128 Lorraine M. McDonnell   "However, because research on the effects of high stakes testing is limited, finds mixed results, and suggests unintended consequences, the informational and persuasive dimensions of testing will continue to be critical to the success of this policy." Dismissive Politics, Persuasion, and Educational Testing, p.182–183 Harvard University Press, 2004     Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
129 Deborah Loewenberg Ball Jo Boaler, Phil Daro, Andrew Porter, & 14 others "...the federal government and the nation’s school systems have made and are continuing to make significant investments toward the improvement of mathematics education. However, the knowledge base upon which these efforts are founded is generally weak." p.iii Denigrating Mathematical Proficiency for All Students Rand Corporation, 2003 https://www.rand.org/pubs/monograph_reports/MR1643.html Office of Research and Improvement, US Education Department  
130 Deborah Loewenberg Ball Jo Boaler, Phil Daro, Andrew Porter, & 14 others "New curricular materials have been developed along with training and coaching programs intended to provide teachers with the knowledge and skills needed to use those materials. However, these efforts have been supported by only a limited and uneven base of research and research-based development, which is part of the reason for the limited success of those efforts." p. xi Dismissive Mathematical Proficiency for All Students Rand Corporation, 2003 https://www.rand.org/pubs/monograph_reports/MR1643.html Office of Research and Improvement, US Education Department  
131 Deborah Loewenberg Ball Jo Boaler, Phil Daro, Andrew Porter, & 14 others "More important, the intense debates over the past decade seem to be based more often on ideology than on evidence." p.xiii Denigrating Mathematical Proficiency for All Students Rand Corporation, 2003 https://www.rand.org/pubs/monograph_reports/MR1643.html Office of Research and Improvement, US Education Department  
132 Deborah Loewenberg Ball Jo Boaler, Phil Daro, Andrew Porter, & 14 others "However, despite more than a century of efforts to improve school mathematics in the United States, investments in research and development have been virtually nonexistent." p.xiv Dismissive Mathematical Proficiency for All Students Rand Corporation, 2003 https://www.rand.org/pubs/monograph_reports/MR1643.html Office of Research and Improvement, US Education Department  
133 Deborah Loewenberg Ball Jo Boaler, Phil Daro, Andrew Porter, & 14 others "There has never been a long-range programmatic effort to fund research and development in mathematics education, nor has funding been organized to focus on knowledge that would be usable in practice." p.xiv Denigrating Mathematical Proficiency for All Students Rand Corporation, 2003 https://www.rand.org/pubs/monograph_reports/MR1643.html Office of Research and Improvement, US Education Department  
134 Deborah Loewenberg Ball Jo Boaler, Phil Daro, Andrew Porter, & 14 others "Despite the strong history of work in this area, we lack research about what is happening today in algebra classrooms; how innovations in algebra teaching and learning can be designed, implemented, and assessed; and how policy decisions shape student learning and affect equity." p.xxi Dismissive Mathematical Proficiency for All Students Rand Corporation, 2003 https://www.rand.org/pubs/monograph_reports/MR1643.html Office of Research and Improvement, US Education Department  
135 Deborah Loewenberg Ball Jo Boaler, Phil Daro, Andrew Porter, & 14 others "Because most studies have focused on algebra at the high school level, we lack knowledge about younger students’ learning of algebraic ideas and skills." p.xxi Dismissive Mathematical Proficiency for All Students Rand Corporation, 2003 https://www.rand.org/pubs/monograph_reports/MR1643.html Office of Research and Improvement, US Education Department  
136 Deborah Loewenberg Ball Jo Boaler, Phil Daro, Andrew Porter, & 14 others "Little is known about what happens when algebra is viewed as a K–12 subject, what happens when it is integrated with other subjects, or what happens when it emphasizes a wider range of concepts and processes." p.xxi Dismissive Mathematical Proficiency for All Students Rand Corporation, 2003 https://www.rand.org/pubs/monograph_reports/MR1643.html Office of Research and Improvement, US Education Department  
137 Deborah Loewenberg Ball Jo Boaler, Phil Daro, Andrew Porter, & 14 others "Research could inform the perennial debates surrounding the algebra curriculum: what to include, emphasize, reduce, or omit." p.xxi Dismissive Mathematical Proficiency for All Students Rand Corporation, 2003 https://www.rand.org/pubs/monograph_reports/MR1643.html Office of Research and Improvement, US Education Department  
138 Deborah Loewenberg Ball Jo Boaler, Phil Daro, Andrew Porter, & 14 others "For the most part, these debates are poorly informed because research evidence is lacking." p.xxiv Dismissive Mathematical Proficiency for All Students Rand Corporation, 2003 https://www.rand.org/pubs/monograph_reports/MR1643.html Office of Research and Improvement, US Education Department  
139 Deborah Loewenberg Ball Jo Boaler, Phil Daro, Andrew Porter, & 14 others "Despite more than a century of efforts to improve school mathematics in the United States, efforts that have yielded numerous research studies and development projects, investments in research and development have been inadequate." p.5 Dismissive Mathematical Proficiency for All Students Rand Corporation, 2003 https://www.rand.org/pubs/monograph_reports/MR1643.html Office of Research and Improvement, US Education Department  
140 Deborah Loewenberg Ball Jo Boaler, Phil Daro, Andrew Porter, & 14 others "Federal agencies (primarily the National Science Foundation and the U.S. Department of Education) have contributed funding for many of these efforts. But the investments have been relatively small, and the support has been fragmented and uncoordinated." p.5 Dismissive Mathematical Proficiency for All Students Rand Corporation, 2003 https://www.rand.org/pubs/monograph_reports/MR1643.html Office of Research and Improvement, US Education Department  
141 Deborah Loewenberg Ball Jo Boaler, Phil Daro, Andrew Porter, & 14 others "There has never been a long-range programmatic effort devoted solely to funding research in mathematics education, nor has research (as opposed to development) funding been organized to focus on knowledge that would be usable in practice. Consequently, major gaps exist in the knowledge base and in knowledge-based development." p.5 Dismissive Mathematical Proficiency for All Students Rand Corporation, 2003 https://www.rand.org/pubs/monograph_reports/MR1643.html Office of Research and Improvement, US Education Department  
142 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “The shortcomings of the studies make it difficult to determine the size of teacher effects, but we suspect that the magnitude of some of the effects reported in this literature are overstated.” p. xiii Denigrating Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
143 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Using VAM to estimate individual teacher effects is a recent endeavor, and many of the possible sources of error have not been thoroughly evaluated in the literature.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
144 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
145 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “This lack of attention to teachers in policy discussions may be attributed in part to another body of literature that attempted to determine the effects of specific teacher background characteristics, including credentialing status (e.g., Miller, McKenna, and McKenna, 1998; Goldhaber and Brewer, 2000) and subject matter coursework (e.g., Monk, 1994).” p. 8 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
146 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “To date, there has been little empirical exploration of the size of school effects and the sensitivity of teacher effects to modeling of school effects.” p. 78 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
147 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “There are no empirical explorations of the robustness of estimates to assumptions about prior-year schooling effects.“ p. 81 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
148 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “There is currently no empirical evidence about the sensitivity of gain scores or teacher effects to such alternatives.” p. 89 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
149 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. 116 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
150 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Although we expect missing data are likely to be pervasive, there is little systematic discussion of the extent or nature of missing data in test score databases.” p. 117 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
151 Joan L. Herman, Noreen Webb, & Stephen Zuniga   "Despite the importance of the concept, the present state of alignment is weak (Feuer, Holland, Green, Bertenthal, & Hemphill, 1999; Rothman, Slattery, Vranek, & Resnick, 2000), and sound methodologies for examining and documenting it are just recently emerging." p.2 Dismissive Alignment and College Admissions: The Match of Expectations, Assessments, and Educator Perspectives CSE Technical Report 593, April 2003   Office of Research and Improvement, US Education Department  
152 Marguerite Clarke 5 co-authors “What this study adds to the body of literature in this area is a systematic look at how impact varies with the stakes attached to the test results.” p. 91 1stness Perceived Effects of State-Mandated Testing Programs on Teaching and Learning etc. (5 co-authors) National Board on Educational Testing and Public Policy monograph, January 2003 http://files.eric.ed.gov/fulltext/ED474867.pdf Ford Foundation See, for example, Test Frequency, Stakes, and Feedback in Student Achievement: A Meta-Analysis   https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
153 Marguerite Clarke 5 co-authors “Many calls for school reform assert that high-stakes testing will foster the economic competitiveness of the U.S. However, the empirical basis for this claim is weak.” p. 96, n. 1 Denigrating Perceived Effects of State-Mandated Testing Programs on Teaching and Learning etc. (5 co-authors) National Board on Educational Testing and Public Policy monograph, January 2003 http://files.eric.ed.gov/fulltext/ED474867.pdf Ford Foundation  
154 Brian M. Stecher Laura S. Hamilton "The business model of setting clear targets, attaching incentives to the attainment of those targets, and rewarding those responsible for reaching the targets has proven successful in a wide range of business enterprises. But there is no evidence that these accountability principles will work well in an educational context, and there are many reasons to doubt that the principles can be applied without significant adaptation." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. 
155 Brian M. Stecher Laura S. Hamilton "The lack of strong evidence regarding the design and effectiveness of accountability systems hampers policymaking at a critical juncture." Denigrating Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. 
156 Brian M. Stecher Laura S. Hamilton "Nonetheless, the evidence has yet to justify the expectations. The initial evidence is, at best, mixed. On the plus side, students and teachers seem to respond to the incentives created by the accountability systems …" Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. 
157 Brian M. Stecher Laura S. Hamilton "Proponents of accountability attribute the improved scores in these states to clearer expectations, greater motivation on the part of the students and teachers, a focused curriculum, and more-effective instruction. However, there is little or no research to substantiate these positive changes or their effects on scores." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
158 Brian M. Stecher Laura S. Hamilton "One of the earliest studies on the effects of testing (conducted in two Arizona schools in the late 1980s) showed that teachers reduced their emphasis on important, nontested material." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
159 Brian M. Stecher Laura S. Hamilton "Test-based accountability systems will work better if we acknowledge how little we know about them, if the federal government devotes appropriate resources to studying them, and if the states make ongoing efforts to improve them."  Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. 
160 Robert L. Linn Eva L. Baker "It is true that many of these accommodated test conditions are not subjected to validity studies to determine whether the construct or domain tested has been significantly altered. In part, this lack of empirical data results from restricted resources." p. 14 Dismissive Validity Issues for Accountability Systems CSE Technical Report 585 (December 2002) http://www.cse.ucla.edu/products/reports/TR585.pdf Office of Research and Improvement, US Education Department External evaluations of large-scale testing programs not only exist, but represent the norm. 
161 Lauren B. Resnick Robert Rothman, Jean B. Slattery, Jennifer L. Vranek "States that have or adopt test-based accountability programs claim that their tests are aligned to their standards. But there has been, up to now, no independent methodology for checking alignment. This paper describes and illustrates such a methodology..." 1stness Benchmarking and Alignment of Standards and Testing, p.1 CSE Technical Report 566, CRESST/Achieve, May 2002 https://www.achieve.org/files/TR566.pdf Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
162 Lauren B. Resnick Robert Rothman, Jean B. Slattery, Jennifer L. Vranek "Yet  few,  if  any,  states have put in place effective policies or resource systems for improving instructional quality (National Research Council, 1999)." Dismissive Benchmarking and Alignment of Standards and Testing, p.4 CSE Technical Report 566, CRESST/Achieve, May 2002 https://www.achieve.org/files/TR566.pdf Office of Research and Improvement, US Education Department Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
163 Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein   "Although test-based accountability has shown some compelling results, the issues are complex, the research is new and incomplete, and many of the claims that have received the most attention have proved to be premature and superficial." Denigrating Summary, p.xiv Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
164 Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein   "The research evidence does not provide definitive information about the actual costs of testing but the information that is available suggests that expenditures for testing have grown in recent years." Dismissive Introduction, p.9 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
165 Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein   "The General Accounting Office (1993) … estimate was $516 million … The estimate does not include time for more-extensive test preparation activities." p.9 Denigrating Introduction, p.9 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation As a matter of fact, the GAO report did include those costs -- all of them. The GAO surveys very explicitly instructed respondents to "include any and all costs related" to each test, including any and all test preparation time and expenses.
166 Laura S. Hamilton, Daniel M. Koretz Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "There is currently no substantial evidence on the effects of published report cards on parents’ decisionmaking or on the schools themselves." Dismissive Chapter 2: Tests and their use in test-based accountability systems, p.44 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation For decades, consulting services have existed that help parents new to a city select the right school or school district for them.
167 Vi-Nhuan Le, Stephen P. Klein Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Research on the inflation of gains remains too limited to indicate how prevalent the problem is." Dismissive Chapter 3: Technical criteria for evaluating tests, p. 68 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927)  DeWeerdt (1927)  French (1959) French & Dear (1959)  Ortar (1960)  Marron (1965)  ETS (1965). Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984)  Powers (1985)  Samson (1985) Scruggs, White, & Bennion (1986) Jones (1986). Fraker (1986/1987)  Halpin (1987)  Whitla (1988)  Snedecor (1989)  Bond (1989). Baydar (1990)  Becker (1990)  Smyth (1990)  Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Oren (1993). Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009)  Koljatic & Silva (2014)  Early (2019)  Herndon (2021)
168 Vi-Nhuan Le, Stephen P. Klein Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Relatively little is known about how testing accommodations affect score validity, and the few studies that have been conducted on the subject have had mixed results." Dismissive Chapter 3: Technical criteria for evaluating tests, p. 71 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation  
169 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "High-stakes testing may also affect parents (e.g., their attitudes toward education, their engagement with schools, and their direct participation in their child's learning) as well as policymakers (their beliefs about system performance, their judgements about program effectiveness, and their allocation of resources). However, these issues remain largely unexamined in the literature." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 79 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
170 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "As described in chapter 2, there was little concern about the effects of testing on teaching prior to the 1970s." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 81 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation Rubbish. Entire books were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
171 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "In light of the changes that occurred in the uses of large-scale testing in the 1980s and 1990s, researchers began to investigate teachers' reactions to external assessment. The initial research on the impact of large-scale testing was conducted in the 1980s and the 1990s." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 83 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
172 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "The bulk of the research on the effects of testing has been conducted using surveys and case studies." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 83 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation This is misleading. True, many of the hundreds of studies on the effects of testing have been surveys and case studies. But many, and more by my count, have been randomized experiments. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm
173 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Data on the incidence of cheating [on educational tests] are scarce…" Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 96 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Actually, there have been surveys in which respondents freely admit that they cheat, and how. Moreover, news reports of cheating, by students or educators, have been voluminous. See, for example, Caveon Test Security's "Cheating in the News" section on its web site.
174 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Less is known about changes in policies at the district and school levels in response to high-stakes testing, but mixed evidence of some impact has appeared." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 96 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Relevant pre-2000 studies of the effects of testing on at-risk students, completion, dropping out, curricular offerings, attitudes, etc. include those of Schleisman (1999); the *Southern Regional Education Board (1998); Webster, Mendro, Orsak, Weerasinghe & Bembry (1997); Jones (1996); Boylan (1996); Jones, 1993; Jacobson (1992); Grisay (1991); Johnstone (1990); Task Force on Educational Assessment Programs [Florida] (1979); Wellisch, MacQueen, Carriere & Duck (1978); Enochs (1978); Pronaratna (1976); and McWilliams & Thomas (1976).
175 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Although numerous news articles have addressed the negative effects of high-stakes testing, systematic research on the subject is limited." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 98 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Relevant pre-2000 studies of the effects of testing on at-risk students, completion, dropping out, curricular offerings, attitudes, etc. include those of Schleisman (1999); the *Southern Regional Education Board (1998); Webster, Mendro, Orsak, Weerasinghe & Bembry (1997); Jones (1996); Boylan (1996); Jones, 1993; Jacobson (1992); Grisay (1991); Johnstone (1990); Task Force on Educational Assessment Programs [Florida] (1979); Wellisch, MacQueen, Carriere & Duck (1978); Enochs (1978); Pronaratna (1976); and McWilliams & Thomas (1976).
176 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Research regarding the effects of test-based accountability on equity is very limited." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 99 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation  
177 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Researchers have not documented the desirable consequences of testing … as clearly as the undesirable ones." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 99 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
178 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. " … researchers have not generally measured the extent or magnitude of the shifts in practice that they identified as a result of high-stakes testing." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, pp. 99–100 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation The 1993 GAO study did. See, also:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
179 Lorraine M. McDonnell Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "...this chapter can only describe the issues that are raised when one looks at testing from a political perspective. Because of the lack of systematic studies on the topic." Dismissive Chapter 5: Accountability as seen through a political lens, p.102 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
180 Lorraine M. McDonnell Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "...public opinion, as measured by surveys, does not always provide a clear and unambiguous measure of public sentiment." Denigrating Chapter 5: Accountability as seen through a political lens, p.108 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
181 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "So test-based accountability remains controversial because there is inadequate evidence to make clear judgments about its effectiveness in raising test scores and achieving its other goals." Dismissive Chapter 6: Improving test-based accountability, p.122 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
182 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Unfortunately, the complexity of the issues and the ambiguity of the existing research do not allow our recommendations to take the form of a practical “how-to” guide for policymakers and practitioners." Denigrating Chapter 6: Improving test-based accountability, p.123 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
183 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Additional research is needed to identify the elements of performance on tests and how these elements map onto other tests …." Denigrating Chapter 6: Improving test-based accountability, p.127 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation  
184 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Another part of the interpretive question is the need to gather information in other subject areas to portray a more complete picture of achievement. The scope of constructs that have been considered in research to date has been fairly narrow, focusing on the subjects that are part of the accountability systems that have been studied. Many legitimate instructional objectives have been ignored in the literature to date." Denigrating Chapter 6: Improving test-based accountability, p.127 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Many studies of the effects of testing predate CRESST's in the 1980s and cover all subject fields, not just reading and math. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
185 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "States should also conduct ongoing analyses of the performance of groups whose members may not be numerous enough to permit separate reporting. English-language learners and students with disabilities are increasingly being included in high-stakes testing systems, and, as discussed in Chapter Three, little is currently known about the validity of scores for these groups." Dismissive Chapter 6: Improving test-based accountability, p.131 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing. 
186 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "It would be especially helpful to know what changes in instruction are made in response to different kinds of information and incentives. In particular, we need to know how teachers interpret information from tests and how they use it to modify instruction." Dismissive Chapter 6: Improving test-based accountability, p.133 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. 
International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
187 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "It seems clear that aligning the components of the system and providing appropriate professional development should, at a minimum, increase teachers’ political support for test-based accountability policies .... Although there is no empirical evidence to suggest that this strategy will reduce inappropriate responses to high-stakes testing,... Additional research needs to be done to determine the importance of alignment for promoting positive effects of test-based accountability." Dismissive Chapter 6: Improving test-based accountability, p.135 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  
These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
188 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "… we currently do not know enough about test-based accountability to design a system that is immune from the problems we have discussed …" Dismissive Chapter 6: Improving test-based accountability, p.136 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
189 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "There is some limited evidence that educators’ responses to test based accountability vary according to the characteristics of their student populations,…" Denigrating Chapter 6: Improving test-based accountability, p.138 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation There was and is far more than "limited" evidence. Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
190 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "... there is very limited evidence to guide thinking about political issues." Dismissive Chapter 6: Improving test-based accountability, p.139 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
191 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "First, we do not have an accurate assessment of the additional costs." Denigrating Chapter 6: Improving test-based accountability, p.141 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Yes, we did and we do. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
192 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "However, many of these recommended reforms are relatively inexpensive in comparison with the total cost of education. This equation is seldom examined."  Denigrating Chapter 6: Improving test-based accountability, p.141 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Wrong. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380;  Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
193 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Part of the reason these issues are rarely considered may be that no one has produced a good estimate of the cost of an improved accountability system in comparison with its benefits." Denigrating Chapter 6: Improving test-based accountability, p.141 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
194 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Nevertheless, our knowledge of the costs of alternative accountability systems is still somewhat limited. Policymakers need to know how much it would cost to change their current systems to be responsive to criticisms such as those described in this book. These estimates need to consider all of the associated costs, including possible opportunity costs associated with increased testing time and increased test preparation time." Dismissive Chapter 6: Improving test-based accountability, p.142 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
195 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "However, there is still much about these systems that is not well understood. Lack of research-based knowledge about the quality of scores and the mechanisms through which high-stakes testing programs operate limits our ability to improve these systems. As a result, our discussions also identified unanswered questions..." Dismissive Chapter 6: Improving test-based accountability, p.143 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
196 Eva L. Baker, Robert L. Linn, Joan L. Herman, and Daniel Koretz   "Because experience with accountability systems is still developing, the standards we propose are intended to help evaluate existing systems and to guide the design of improved procedures." p.1 Dismissive Standards for Educational Accountability Systems CRESST Policy Brief 5, Winter 2002 https://www.gpo.gov/fdsys/pkg/ERIC-ED466643/pdf/ERIC-ED466643.pdf Office of Research and Improvement, US Education Department See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.  Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
197 Eva L. Baker, Robert L. Linn, Joan L. Herman, and Daniel Koretz   "It is not possible at this stage in the development of accountability systems to know in advance how every element of an accountability system will actually operate in practice or what effects it will produce." p.1 Dismissive Standards for Educational Accountability Systems CRESST Policy Brief 5, Winter 2002 https://www.gpo.gov/fdsys/pkg/ERIC-ED466643/pdf/ERIC-ED466643.pdf Office of Research and Improvement, US Education Department See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.  Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
198 Jay P. Heubert   "For Heubert, it is very much an open question what the effect of standards and high-stakes testing will be." p.83 Dismissive Achieving High Standards for All National Research Council   "This project was funded by grant R215U990023 from the Office of Educational Research and Improvement (OERI) of the United States Department of Education." See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
199 Ready, Timothy, Ed.; Edley, Christopher, Jr., Ed.; Snow, Catherine E., Ed.   "To be sure, there is a largely unexamined empirical assertion underlying the arguments of high-stakes proponents: attaching high-stakes consequences for the students provides an indispensable, otherwise unobtainable incentive for students, parents, and teachers to pay careful attention to learning tasks." p. 128 Dismissive Achieving High Standards for All National Research Council   "This project was funded by grant R215U990023 from the Office of Educational Research and Improvement (OERI) of the United States Department of Education." Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs. These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor. Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna. Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
200 Daniel M. Koretz Daniel F. McCaffrey, Laura S. Hamilton "Although high-stakes testing is now widespread, methods for evaluating the validity of gains obtained under high-stakes conditions are poorly developed. This report presents an approach for evaluating the validity of inferences based on score gains on high-stakes tests. It describes the inadequacy of traditional validation approaches for validating gains under high-stakes conditions and outlines an alternative validation framework for conceptualizing meaningful and inflated score gains.", p.1 Denigrating Toward a framework for validating gains under high-stakes conditions CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 https://files.eric.ed.gov/fulltext/ED462410.pdf Office of Research and Improvement, US Education Department In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927)  DeWeerdt (1927)  French (1959)  French & Dear (1959)  Ortar (1960)  Marron (1965)  ETS (1965)  Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984)  Powers (1985)  Samson (1985)  Scruggs, White, & Bennion (1986)  Jones (1986)  Fraker (1986/1987)  Halpin (1987)  Whitla (1988)  Snedecor (1989)  Bond (1989)  Baydar (1990)  Becker (1990)  Smyth (1990)  Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Oren (1993)  Powers & Rock (1994)  Scholes, Lane (1997)  Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008)  Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009)  Koljatic & Silva (2014)  Early (2019)  Herndon (2021)
201 Daniel M. Koretz Daniel F. McCaffrey, Laura S. Hamilton "Few efforts are made to evaluate directly score gains obtained under high-stakes conditions, and conventional validation tools are not fully adequate for the task.", p. 1 Dismissive Toward a framework for validating gains under high-stakes conditions CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 https://files.eric.ed.gov/fulltext/ED462410.pdf Office of Research and Improvement, US Education Department In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927)  DeWeerdt (1927)  French (1959)  French & Dear (1959)  Ortar (1960)  Marron (1965)  ETS (1965)  Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984)  Powers (1985)  Samson (1985)  Scruggs, White, & Bennion (1986)  Jones (1986)  Fraker (1986/1987)  Halpin (1987)  Whitla (1988)  Snedecor (1989)  Bond (1989)  Baydar (1990)  Becker (1990)  Smyth (1990)  Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Oren (1993)  Powers & Rock (1994)  Scholes, Lane (1997)  Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008)  Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009)  Koljatic & Silva (2014)  Early (2019)
202 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Despite their importance and widespread use, little is known about the impact of these tests on states’ recent efforts to improve teaching and learning." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council Every stage of test development, administration, and analysis at National Evaluation Systems—the contractors for dozens of states' teacher licensure tests—was thoroughly documented. But, instead of requesting that documentation from each state, which owned said documentation, the NRC committee insisted that NES provide it. NES refused to do so unless the NRC committee received permission from each state. The NRC committee, apparently, didn't feel like doing that much work, so declared the information nonexistent.
203 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Little information about the technical soundness of teacher licensure tests appears in the published literature." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council Every stage of test development, administration, and analysis at National Evaluation Systems—the contractors for dozens of states' teacher licensure tests—was thoroughly documented. But, instead of requesting that documentation from each state, which owned said documentation, the NRC committee insisted that NES provide it. NES refused to do so unless the NRC committee received permission from each state. The NRC committee, apparently, didn't feel like doing that much work, so declared the information nonexistent.
204 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Little research exists on the extent to which licensure tests identify candidates with the knowledge and skills necessary to be minimally competent beginning teachers." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council Every stage of test development, administration, and analysis at National Evaluation Systems—the contractors for dozens of states' teacher licensure tests—was thoroughly documented. But, instead of requesting that documentation from each state, which owned said documentation, the NRC committee insisted that NES provide it. NES refused to do so unless the NRC committee received permission from each state. The NRC committee, apparently, didn't feel like doing that much work, so declared the information nonexistent.
205 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Information is needed about the soundness and technical quality of the tests that states use to license their teachers." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council Every stage of test development, administration, and analysis at National Evaluation Systems—the contractors for dozens of states' teacher licensure tests—was thoroughly documented. But, instead of requesting that documentation from each state, which owned said documentation, the NRC committee insisted that NES provide it. NES refused to do so unless the NRC committee received permission from each state. The NRC committee, apparently, didn't feel like doing that much work, so declared the information nonexistent.
206 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "policy and practice on teacher licensure testing in the United States are nascent and evolving" Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.17 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council Every stage of test development, administration, and analysis at National Evaluation Systems—the contractors for dozens of states' teacher licensure tests—was thoroughly documented. But, instead of requesting that documentation from each state, which owned said documentation, the NRC committee insisted that NES provide it. NES refused to do so unless the NRC committee received permission from each state. The NRC committee, apparently, didn't feel like doing that much work, so declared the information nonexistent.
207 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "The paucity of data and these methodological challenges made the committee’s examination of teacher licensure testing difficult." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.17 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council Every stage of test development, administration, and analysis at National Evaluation Systems—the contractors for dozens of states' teacher licensure tests—was thoroughly documented. But, instead of requesting that documentation from each state, which owned said documentation, the NRC committee insisted that NES provide it. NES refused to do so unless the NRC committee received permission from each state. The NRC committee, apparently, didn't feel like doing that much work, so declared the information nonexistent.
208 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "There were a number of questions the committee wanted to answer but could not, either because they were beyond the scope of this study, the evidentiary base was inconclusive, or the committee’s time and resources were insufficient." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.17 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council Every stage of test development, administration, and analysis at National Evaluation Systems—the contractors for dozens of states' teacher licensure tests—was thoroughly documented. But, instead of requesting that documentation from each state, which owned said documentation, the NRC committee insisted that NES provide it. NES refused to do so unless the NRC committee received permission from each state. The NRC committee, apparently, didn't feel like doing that much work, so declared the information nonexistent.
209 Harold F. O’Neil, Jr., University of Southern California, CRESST Jamal Abedi, UCLA/CRESST, Charlotte Lee, UCLA/CRESST, Judy Miyoshi, UCLA/CRESST, Ann Mastergeorge, UCLA/CRESST "To our knowledge, based on an extensive literature review (to be reported elsewhere), our research group is the only one conducting research of this type; i.e., meaningful monetary incentives with released items from either NAEP or TIMSS with 12th graders." p.1 Firstness Monetary Incentives for Low-Stakes Tests, March 2001 report to USED, CRESST https://nces.ed.gov/pubs2001/2001024.pdf "The work reported herein was funded at least in part with Federal funds from the U.S. Department of Education under the American Institutes for Research (AIR)/Education Statistical Services Institute (ESSI) contract number RN95127001, Task Order 1.2.93.1, as administered by the ... NCES. The work reported herein was also supported under the Educational Research and Development Centers Program, PR/Award Number R305B60002, as administered by the Office of Educational Research and Improvement (OERI), U.S. Department of Education."
210 Marguerite Clarke   “[T]here has been no analogous infrastructure for independently evaluating a testing program before or after implementation, or for monitoring test use and impact.” p. 19 Dismissive The Adverse Impact of High Stakes Testing on Minority Students: Evidence from 100 Years of Test Data In G. Orfield and M. Kornhaber (Eds.), Raising standards or raising barriers? Inequality and high stakes testing in public education. New York: The Century Foundation (2001) http://files.eric.ed.gov/fulltext/ED450183.pdf The Century Foundation External evaluations of large-scale testing programs not only exist, but represent the norm.
211 Marguerite Clarke   “The effects of testing are now so diverse, widespread, and serious that it is necessary to establish mechanisms for catalyzing inquiry about, and systematic independent scrutiny of them.” p. 20 Dismissive The Adverse Impact of High Stakes Testing on Minority Students: Evidence from 100 Years of Test Data In G. Orfield and M. Kornhaber (Eds.), Raising standards or raising barriers? Inequality and high stakes testing in public education. New York: The Century Foundation (2001) http://files.eric.ed.gov/fulltext/ED450183.pdf The Century Foundation External evaluations of large-scale testing programs not only exist, but represent the norm.
212 Ronald Deitel   "In the late 1980s, CRESST was among the first to research the measurement of rigorous, discipline-based knowledge for purposes of large-scale assessment." Firstness Center for Research on Evaluation, Standards, and Student Testing (CRESST) clarify the goals and activities of CRESST EducationNews.org, November 18, 2000   Office of Research and Improvement, US Education Department Nonsense. Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
213 Marguerite Clarke   “[F]or most of this century, there has been no infrastructure for independently evaluating a testing programme before or after implementation, or for monitoring test use and impact. The commercial testing industry does not as yet have any structure in place for the regulation and monitoring of appropriate test use.” p. 177 Dismissive Retrospective on Educational Testing and Assessment in the 20th Century Curriculum Studies, 2000, vol. 32, no. 2, http://webpages.uncc.edu/~rglamber/Rsch6109%20Materials/HistoryAchTests_3958652.pdf   External evaluations of large-scale testing programs not only exist, but represent the norm.
214 Marguerite Clarke Madaus, Horn, and Ramos “Given the paucity of evidence available on the volume of testing over time, we examined five indirect indicators of growth in testing. . . .” p. 169 Dismissive Retrospective on Educational Testing and Assessment in the 20th Century Curriculum Studies, 2000, vol. 32, no. 2 http://webpages.uncc.edu/~rglamber/Rsch6109%20Materials/HistoryAchTests_3958652.pdf   There exist many sources of such information, from the Council of Chief State School Officers (CCSSO), the US Education Department, the US General Accounting Office (GAO), for example.
215 Sheila Barron   "Although this is a topic researchers ... talk about often, very little has been written about the difficulties secondary analysts confront." p.173 Dismissive Difficulties associated with secondary analysis of NAEP data, chapter 9 Grading the Nation's Report Card, National Research Council, 2000 https://www.nap.edu/catalog/9751/grading-the-nations-report-card-research-from-the-evaluation-of National Research Council funders In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schulz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004).
216 Sheila Barron   "...few articles have been written that specifically address the difficulties of using NAEP data." p.173 Dismissive Difficulties associated with secondary analysis of NAEP data, chapter 9 Grading the Nation's Report Card, National Research Council, 2000 https://www.nap.edu/catalog/9751/grading-the-nations-report-card-research-from-the-evaluation-of National Research Council funders In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schulz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004).
217 Herman, Joan L.    “Testing accommodations that attempt to reduce the language load of a test or otherwise compensate for students' reduced language skills (e.g., by providing students more time) are also currently being researched, but answers that are equitable and fair for all students have not yet been found.” p. 8 Dismissive Student Assessment and Student Achievement in the California Public School System (with Brown and Baker) CSE Technical Report 519, April 2000 https://www.cse.ucla.edu/products/reports/TECH519.pdf Office of Research and Improvement, US Education Department
218 Herman, Joan L.    “Thus, the extent to which gains reflect real improvement in learning is an open question (see, e.g., Shepard, 1990).” p. 15 Dismissive Student Assessment and Student Achievement in the California Public School System (with Brown and Baker) CSE Technical Report 519, April 2000 https://www.cse.ucla.edu/products/reports/TECH519.pdf Office of Research and Improvement, US Education Department
219 R. L. Linn   "There are many reasons for the Lake Wobegon Effect, most of which are less sinister than those emphasized by Cannell." Denigrating Assessments and Accountability, p.7 Educational Researcher, March, pp. 4–16. https://journals.sagepub.com/doi/abs/10.3102/0013189x029002004 Office of Research and Improvement, US Education Department No. Cannell was exactly right. There was corruption, lax security, and cheating. See, for example, https://nonpartisaneducation.org/Review/Articles/v6n3.htm
220 Lorrie A. Shepard   "This portrayal derives mostly from research leading to Wood and Bruner’s original conception of scaffolding, from Vygotskian theory, and from naturalistic studies of effective tutoring described next. Relatively few studies have been undertaken in which explicit feedback interventions have been tried in the context of constructivist instructional settings." Dismissive The Role of Classroom Assessment in Teaching and Learning, p.59 CSE Technical Report 517, February 2000 https://nepc.colorado.edu/sites/default/files/publications/TECH517.pdf Office of Research and Improvement, US Education Department  
221 Lorrie A. Shepard   "The NCTM and NRC visions are idealizations based on beliefs about constructivist pedagogy and reflective practice. Although both are supported by examples of individual teachers who use assessment to improve their teaching, little is known about what kinds of support would be required to help large numbers of teachers develop these strategies or to ensure that teacher education programs prepared teachers to use assessment in these ways. Research is needed to address these basic implementation questions." Dismissive The Role of Classroom Assessment in Teaching and Learning, p.64 CSE Technical Report 517, February 2000 https://nepc.colorado.edu/sites/default/files/publications/TECH517.pdf Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
222 Lorrie A. Shepard   "This social-constructivist view of classroom assessment is an idealization. The new ideas and perspectives underlying it have a basis in theory and empirical studies, but how they will work in practice and on a larger scale is not known." Dismissive The Role of Classroom Assessment in Teaching and Learning, p.67 CSE Technical Report 517, February 2000 https://nepc.colorado.edu/sites/default/files/publications/TECH517.pdf Office of Research and Improvement, US Education Department  
223 Marguerite Clarke Madaus, Pedulla, and Shore “The National Board believes that we must as a nation conduct research that helps testing contribute to student learning, classroom practice, and state and district management of school resources.” p. 2 Dismissive An Agenda for Research on Educational Testing NBETPP Statements, Vol. 1, No. 1, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456137.pdf Ford Foundation Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
224 Marguerite Clarke Madaus, Pedulla, and Shore “Validity research on teacher testing needs to address the following four issues in particular. . .” : [four bullet-point paragraphs follow] p. 3 Dismissive An Agenda for Research on Educational Testing NBETPP Statements, Vol. 1, No. 1, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456137.pdf Ford Foundation  
225 Marguerite Clarke Madaus, Pedulla, and Shore “[W]e need to understand better the relationship between testing and the diversity of the college student body.” p. 6 Dismissive An Agenda for Research on Educational Testing NBETPP Statements, Vol. 1, No. 1, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456137.pdf Ford Foundation  
226 Marguerite Clarke Haney, Madaus "We trust that further research will build on this good example and help all of us move from suggestive correlational studies towards more definitive conclusions." p. 9 1stness High Stakes Testing and High School Completion NBETPP Statements, Volume 1, Number 3, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456139.pdf Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan(1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
227 Jay P. Heubert Robert M. Hauser "A growing body of research suggests that tests often do in fact change school and classroom practices (Corbett & Wilson, 1991; Madaus, 1988; Herman & Golan 1993; Smith & Rottenberg, 1991)." p.29 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
228 Jay P. Heubert Robert M. Hauser "A growing body of research suggests that tests often do in fact change school and classroom practices (Corbett & Wilson, 1991; Madaus, 1988; Herman & Golan 1993; Smith & Rottenberg, 1991)." p.29 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
229 Jay P. Heubert Robert M. Hauser "Most standards-based assessments have only recently been implemented or are still being developed. Consequently, it is too early to determine whether they will produce the intended effects on classroom instruction." p.36 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
230 Jay P. Heubert Robert M. Hauser "A recent review of the available research evidence by Mehrens (1998) reaches several interim conclusions. Drawing on eight studies...." p.36 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan(1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
231 Jay P. Heubert Robert M. Hauser "Although there are no national data summarizing how local districts use standardized tests in certifying students, we do know that several of the largest school systems have begun to use test scores in determining grade-to-grade promotion (Chicago) or are considering doing so (New York City, Boston)." p.37 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan(1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
232 Jay P. Heubert Robert M. Hauser "There is very little research that specifically addresses the consequences of graduation testing." p.172 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan(1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
233 Jay P. Heubert Robert M. Hauser "Catterall adds, 'initial boasts and doubts alike regarding the effects of gatekeeping competency testing have met with a paucity of follow-up research.'" p.172 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan(1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
234 Jay P. Heubert Robert M. Hauser "in one of the few such studies on this topic (Bishop, 1997) compared the Third International Mathematics and Science Study (TIMSS) test scores of countries with and without rigorous graduation tests. He found that countries with demanding exit exams outperformed other countries at a comparable level of development. He concluded, however that such exams were probably not the most important determinant of achievement levels and that more research was needed." p.173 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
235 Jay P. Heubert Robert M. Hauser "Very little is known about the specific consequences of passing or failing a high school graduation exam." p.176 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
236 Jay P. Heubert Robert M. Hauser "American experience is limited and research is needed to explore their effectiveness. For instance, we do not know how to combine advance notice of high-stakes test requirements, remedial intervention, and opportunity to retake graduation tests." p.180 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
237 Jay P. Heubert Robert M. Hauser "Research is also needed to explore the effects of different kinds of high school credentials on employment and other post-school outcomes." p.180 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
238 Jay P. Heubert Robert M. Hauser "At the same time, solid evaluation research on the most effective remedial approaches is sparse." p.183 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Developmental (i.e., remedial) education researchers have conducted many studies to determine what works best to keep students from failing in their “courses of last resort,” after which there are no alternatives.  Researchers have included Boylan, Roueche, McCabe, Wheeler, Kulik, Bonham, Claxton, Bliss, Schonecker, Chen, Chang, and Kirk.
239 Jay P. Heubert Robert M. Hauser "There is plainly a need for good research on effective remedial education." p.183 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Developmental (i.e., remedial) education researchers have conducted many studies to determine what works best to keep students from failing in their “courses of last resort,” after which there are no alternatives.  Researchers have included Boylan, Roueche, McCabe, Wheeler, Kulik, Bonham, Claxton, Bliss, Schonecker, Chen, Chang, and Kirk.
240 Jay P. Heubert Robert M. Hauser "However, in most of the nation, much needs to be done before a world-class curriculum and world-class instruction will be in place." p.277 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
241 Jay P. Heubert Robert M. Hauser "The committee sees a strong need for better evidence on the benefits and costs of high-stakes testing." p.281 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W.M. Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
242 Jay P. Heubert Robert M. Hauser "Very little is known about the specific consequences of passing or failing a high school graduation exam." p.288 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above.  Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry
243 Jay P. Heubert Robert M. Hauser "At present, however, advanced skills are often not well defined and ways of assessing them are not well established." p.289 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
244 Jay P. Heubert Robert M. Hauser "...in many cases, the demands that full participation of these students [i.e., students with disabilities] place on assessment systems are greater than current assessment knowledge and technology can support." p.191 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
245 Jay P. Heubert Robert M. Hauser "...available evidence about the possible effects of graduation tests on learning and on high school dropout is inconclusive (e.g., Kreitzer et al., 1989; Reardon, 1996; Catterall, 1990; Cawthorne, 1990; Bishop, 1997)." Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above.  Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry
246 Jay P. Heubert Robert M. Hauser "We do not know how to combine advance notice of high-stakes test requirements, remedial intervention, and opportunity to retake graduation tests. Research is also needed to explore the effects of different kinds of high school credentials on employment and other post-school outcomes." p.289 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above.  Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry
247 Richard F. Elmore, Robert Rothman, Eds. Eva L. Baker, Lauren B. Resnick, Robert L. Linn, Lorraine McDonnell, Lauress L. Wise, Michael Feuer, et al. "But the practical nature of our charge and the limits of the evidence available to us have meant that we have also had to draw on the practical experience of committee members and outside experts in crafting our advice. Hence, this report relies heavily on expert advice from the field, in addition to scientific research." p. vii Dismissive Testing, Teaching, and Learning: A Guide for States and School Districts, 1999 Committee on Title I Testing and Assessment, Board on Testing and Assessment, National Research Council   Pew Charitable Trusts, Spencer Foundation, William T. Grant Foundation  
248 Richard F. Elmore, Robert Rothman, Eds. Eva L. Baker, Lauren B. Resnick, Robert L. Linn, Lorraine McDonnell, Lauress L. Wise, Michael Feuer, et al. "we reviewed available evidence from research on assessment, accountability, and standards-based reform. However, we recognized that in many areas the evidentiary base was slim." p.11 Dismissive Testing, Teaching, and Learning: A Guide for States and School Districts, 1999 Committee on Title I Testing and Assessment, Board on Testing and Assessment, National Research Council   Pew Charitable Trusts, Spencer Foundation, William T. Grant Foundation  
249 Richard F. Elmore, Robert Rothman, Eds. Eva L. Baker, Lauren B. Resnick, Robert L. Linn, Lorraine McDonnell, Lauress L. Wise, Michael Feuer, et al. "Standards-based reform is a new idea, and few places have put all the pieces in place, and even fewer have put them in place long enough to enable scholars to observe their effects." p.11 1stness Testing, Teaching, and Learning: A Guide for States and School Districts, 1999 Committee on Title I Testing and Assessment, Board on Testing and Assessment, National Research Council   Pew Charitable Trusts, Spencer Foundation, William T. Grant Foundation  
250 Richard F. Elmore, Robert Rothman, Eds. Eva L. Baker, Lauren B. Resnick, Robert L. Linn, Lorraine McDonnell, Lauress L. Wise, Michael Feuer, et al. "Yet despite the prominence of standards-based reform in the policy debate, there are few examples of districts or states that have put the entire standards-based puzzle together, much less achieved success through it. Some evidence is beginning to gather." p.16 Dismissive Testing, Teaching, and Learning: A Guide for States and School Districts, 1999 Committee on Title I Testing and Assessment, Board on Testing and Assessment, National Research Council   Pew Charitable Trusts, Spencer Foundation, William T. Grant Foundation  
251 Richard F. Elmore, Robert Rothman, Eds. Eva L. Baker, Lauren B. Resnick, Robert L. Linn, Lorraine McDonnell, Lauress L. Wise, Michael Feuer, et al. "In large part, the limited body of evidence in this country reflects the complexity of the concept." p.16 Dismissive Testing, Teaching, and Learning: A Guide for States and School Districts, 1999 Committee on Title I Testing and Assessment, Board on Testing and Assessment, National Research Council   Pew Charitable Trusts, Spencer Foundation, William T. Grant Foundation  
252 Richard F. Elmore, Robert Rothman, Eds. Eva L. Baker, Lauren B. Resnick, Robert L. Linn, Lorraine McDonnell, Lauress L. Wise, Michael Feuer, et al. "Despite the common use of such accommodations, however, there is little research on their effects on the validity of test score information, and most of the research has examined college admission tests and other postsecondary measures, not achievement tests in elementary and secondary schools (National Research Council, 1997a)." p.57 Dismissive Testing, Teaching, and Learning: A Guide for States and School Districts, 1999 Committee on Title I Testing and Assessment, Board on Testing and Assessment, National Research Council   Pew Charitable Trusts, Spencer Foundation, William T. Grant Foundation  
253 Richard F. Elmore, Robert Rothman, Eds. Eva L. Baker, Lauren B. Resnick, Robert L. Linn, Lorraine McDonnell, Lauress L. Wise, Michael Feuer, et al. "Because of the paucity of research, questions remain about whether test results from assessments using accommodations represent valid and reliable indicators of what students with disabilities know and are able to do (Koretz, 1997)." p.57 Dismissive Testing, Teaching, and Learning: A Guide for States and School Districts, 1999 Committee on Title I Testing and Assessment, Board on Testing and Assessment, National Research Council   Pew Charitable Trusts, Spencer Foundation, William T. Grant Foundation  
254 Richard F. Elmore, Robert Rothman, Eds. Eva L. Baker, Lauren B. Resnick, Robert L. Linn, Lorraine McDonnell, Lauress L. Wise, Michael Feuer, et al. "As with accommodations for students with disabilities, the research on the effects of test accommodations for English-language learners is inconclusive." p.62 Dismissive Testing, Teaching, and Learning: A Guide for States and School Districts, 1999 Committee on Title I Testing and Assessment, Board on Testing and Assessment, National Research Council   Pew Charitable Trusts, Spencer Foundation, William T. Grant Foundation  
255 Richard F. Elmore, Robert Rothman, Eds. Eva L. Baker, Lauren B. Resnick, Robert L. Linn, Lorraine McDonnell, Lauress L. Wise, Michael Feuer, et al. "The small body of research that has examined classrooms in depth suggests that such instructional practices may be rare, even among teachers who say they endorse the changes the standards are intended to foster." p.75 Dismissive Testing, Teaching, and Learning: A Guide for States and School Districts, 1999 Committee on Title I Testing and Assessment, Board on Testing and Assessment, National Research Council   Pew Charitable Trusts, Spencer Foundation, William T. Grant Foundation  
256 Richard F. Elmore, Robert Rothman, Eds. Eva L. Baker, Lauren B. Resnick, Robert L. Linn, Lorraine McDonnell, Lauress L. Wise, Michael Feuer, et al. "Districts' capacity to monitor the conditions of instruction in schools is limited, and there are few examples of districts that have been shown to be effective in analyzing such conditions and using the data to improve instruction. The research base on such efforts is slim, in large part because there are so few examples to study." p.76 Dismissive Testing, Teaching, and Learning: A Guide for States and School Districts, 1999 Committee on Title I Testing and Assessment, Board on Testing and Assessment, National Research Council   Pew Charitable Trusts, Spencer Foundation, William T. Grant Foundation  
257 Robert L. Linn   "Two obvious, but frequently ignored, cautions [from the TIERS experience] are these: . . . " p. 6 Denigrating Assessments and Accountability CSE Technical Report 490 (November 1998) http://www.cse.ucla.edu/products/Reports/TECH490.pdf Office of Educational Research and Improvement, US Education Department  
258 Robert L. Linn   "Moreover, it is critical to recognize first that the choice of constructs matters, and so does the way in which measures are developed and linked to the constructs. Although these two points may be considered obvious, they are too often ignored." p. 13 Denigrating Assessments and Accountability CSE Technical Report 490 (November 1998) http://www.cse.ucla.edu/products/Reports/TECH490.pdf Office of Educational Research and Improvement, US Education Department  
259 Robert L. Linn   “Although that claim is subject to debate, it seldom even gets considered when aggregate results are used either to monitor progress (e.g., NAEP) or for purposes of school, district, or state accountability.” p. 16 Dismissive Assessments and Accountability CSE Technical Report 490 (November 1998) http://www.cse.ucla.edu/products/Reports/TECH490.pdf Office of Educational Research and Improvement, US Education Department  
260 Lawrence O. Picus Alisha Tralli "What is surprising is, given the tremendous emphasis placed on assessment systems to measure school accountability, the relatively minuscule portion of educational expenditures devoted to this important and highly visible component of the educational system." p.66 Dismissive Alternative assessment programs: What are the true costs?  CSE Technical Report 441, February 1998 https://cresst.org/publications/cresst-publication-2813/?_sf_s=441 Office of Educational Research and Improvement, US Education Department The taxpayers ponied up big time to fund the GAO study, which Picus has spent his whole career misrepresenting, demeaning, or dismissing. By 1998, it is simply not believable that his continuing efforts stem from honest misunderstanding. He is deliberately misrepresenting previous research on the topic in order to advance his own work and career. 
261 Lawrence O. Picus Alisha Tralli "In all of these analyses, except the GAO report, the cost estimates are based on the direct costs of the assessment program. The GAO is the only other organization we are aware of that has attempted to estimate the opportunity costs of personnel time, in attempting to determine the full costs of assessment programs. The GAO study, however, did not focus specifically on state assessment programs that included portfolios, an important factor in the higher cost estimates identified in the present study." p.64 Denigrating Alternative assessment programs: What are the true costs?  CSE Technical Report 441, February 1998 https://cresst.org/publications/cresst-publication-2813/?_sf_s=441 Office of Research and Improvement, US Education Department The previous 63 pages of the Picus and Tralli report claimed that theirs was the first study to look at opportunity costs and that all previous studies were "just expenditure studies" that ignored "true" opportunity costs. Then, here, on page 64, they finally admit something a bit truthful about the earlier and vastly better GAO report, but also immediately attempt to demean it, because it did not estimate the costs of Vermont's doomed portfolio program, which did not exist when the GAO did its study.
262 Lawrence O. Picus Alisha Tralli "Costs and expenditures are not synonymous terms. Monk (1995) distinguishes between these two terms. Costs are “measures of what must be foregone to realize some benefit,” while expenditures are “measures of resource flows regardless of their consequence” (p. 365). Expenditures are generally easier to track since accounting systems typically report resource flows by object, e.g., instruction, administration, transportation. Typically, most cost analyses in education focus on these measurable expenditures and ignore the more difficult measures of opportunity. The goal of this report is to move one step beyond past work and estimate these economic costs as well." p.5 Denigrating Alternative assessment programs: What are the true costs?  CSE Technical Report 441, February 1998 https://cresst.org/publications/cresst-publication-2813/?_sf_s=441 Office of Research and Improvement, US Education Department No. Picus & Tralli neither did the first study of opportunity costs, nor the first study of opportunity costs in those two states. The 1993 GAO study did both. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
263 Lawrence O. Picus Alisha Tralli "Although several states have implemented new assessment programs, there has been little research on the costs of developing and implementing these new systems." p.4 Dismissive Alternative assessment programs: What are the true costs?  CSE Technical Report 441, February 1998 https://cresst.org/publications/cresst-publication-2813/?_sf_s=441 Office of Research and Improvement, US Education Department No. Picus & Tralli neither did the first study of opportunity costs, nor the first study of opportunity costs in those two states. The 1993 GAO study did both. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
264 Lawrence O. Picus Alisha Tralli "The purpose of this report is to provide a first detailed analysis of the “economic” or opportunity costs of the testing systems in two states, Kentucky and Vermont." p.2 1stness Alternative assessment programs: What are the true costs?  CSE Technical Report 441, February 1998 https://cresst.org/publications/cresst-publication-2813/?_sf_s=441 Office of Research and Improvement, US Education Department No. Picus & Tralli neither did the first study of opportunity costs, nor the first study of opportunity costs in those two states. The 1993 GAO study did both. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
265 Anne Lewis quoting Arnold Fege, National PTA "The national testing proposal is based on 'quantum leap' theories, not on research, contended Arnold Fege of the National PTA. 'As I listened to the presentations this morning,' he said, 'I didn't hear about any research that backs up the introduction of national testing.' In his opinion, 'no parent in the country is losing sleep because his or her child is not meeting NAEP standards,' and even though testing is pervasive in American education, it seems to not have made a big impact on change." Dismissive Assessing Student Achievement: Search for Validity and Balance CSE Technical Report 481 (1997) https://cresst.org/wp-content/uploads/TECH481.pdf Office of Research and Improvement, US Education Department In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schulz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004). 
266 Eva L. Baker Zenaida Aguirre-Munoz "The extent and nature of the impact of language skills on performance assessments remains elusive due to the paucity of research in this area." Dismissive Improving the equity and validity of assessment-based information systems, p.3 CSE Technical Report 462, December 1997 https://cresst.org/wp-content/uploads/TECH462.pdf Office of Research and Improvement, US Education Department  
267 Joan L. Herman   "Although conceptual models for analyzing the cost of alternative assessment and for conducting cost-benefit analyses have been formulated (Catterall & Winters, 1994; Picus, 1994), definitive cost studies are yet to be completed (see, however, Picus & Tralli, forthcoming)." p. 30 Dismissive, Denigrating Large-Scale Assessment in Support of School Reform: Lessons in the Search for Alternative Measures CSE Technical Report 446, Oct. 1997 http://www.cse.ucla.edu/products/reports/TECH446.pdf Office of Research and Improvement, US Education Department No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
268 Robert L. Linn Eva L. Baker "Very little research has been conducted to validate performance standards, particularly those that include specification of student response attributes." pp. 26-27 Dismissive Emerging Educational Standards of Performance in the United States CSE Technical Report 437 (August 1997) http://www.cse.ucla.edu/products/reports/TECH437.pdf Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
269 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "However, as d'Ydewalle (1987) has pointed out, 'clear-cut results from neat experiments on the impact of motivation on learning [or performance] do not exist.'" Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.5 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
270 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "In the educational context, most existing studies have focused on the influence of characteristics of the classroom learning environment, such as rewards, teacher feedback, goal structures, evaluation practices, on either the antecedents or consequences of motivation." Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.5 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. 
International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
271 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "Most of the studies that have compared goal orientations have examined their effects on performance during classroom learning activities rather than at the time of test taking." Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.7 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
272 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "As yet, there appear to be no published studies that investigate the direct and indirect causal paths from motivational antecedents through use of metacognitive strategies to achievement."  Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.8 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
273 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "In general, there is a need for more studies to focus on the effects on test performance of motivational antecedents (not just anxiety) introduced at the time of test taking." Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.10 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
274 Brian M. Stecher Stephen P. Klein "In contrast, relatively little has been published on the costs of such measures [performance tests] in operational programs. An Office of Technology Assessment (1992) … (Hoover and Bray) …." Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 The January 1993 GAO report on testing costs included such information. CRESST has spent a quarter century denigrating that report.
275 Brian M. Stecher Stephen P. Klein "However, empirical and observational data suggest much more needs to be done to understand what hands-on tasks actually measure. Klein et al. (1996b) … Shavelson et al. (1992) … Hamilton (1994) …." pp.9-10 Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 Article references only works by other CRESST authors and completely ignores the career-tech education literature, where such studies are most likely to be found.
276 Brian M. Stecher Stephen P. Klein "Future research will no doubt shed more light on the validity question, but for now, it is not clear how scores on hands-on performance tasks should be interpreted." p.10 Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 Article references only works by other CRESST authors and completely ignores the career-tech education literature, where such studies are most likely to be found.
277 Brian M. Stecher Stephen P. Klein "Advocates of performance assessment believe that the use of these measures will reinforce efforts to reform curriculum and instruction. … Unfortunately, there is very little research to confirm either the existence or the size of most of these potential benefits. Those few studies ... Klein (1995) ... Jovanovic, Solano-Flores, & Shavelson, 1994; Klein et al., 1996a)." p.10 Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 Article references only works by other CRESST authors and completely ignores the career-tech education literature, where such studies are most likely to be found.
278 Mary Lee Smith 11 others "The purpose of the research described in this report is to understand what happens in the aftermath of a change in state assessment policy that is designed to improve schools and make them more accountable to a set of common standards. Although theoretical and rhetorical works about this issue are common in the literature, empirical evidence is novel and scant." Dismissive Reforming schools by reforming assessment: Consequences of the Arizona Student Assessment Program (ASAP): Equity and teacher capacity building, p.3 CSE Technical Report 425, March 1997 https://cresst.org/wp-content/uploads/TECH425.pdf Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
279 Robert L. Linn Joan L. Herman "How much do standards-led assessments cost? Dependable estimates are difficult to obtain, in part because many of the costs associated with assessment -- the time spent by teachers in preparation, administration, and scoring -- are typically absorbed by schools' normal operations and not priced in a separate budget." p.14 Denigrating A Policymaker's Guide to Standards-Led Assessment Education Commission of the States, February, 1997     The January 1993 GAO report on testing costs included such information. CRESST has spent a quarter century denigrating that report. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
280 Robert L. Linn Joan L. Herman "None of the above estimates includes operational costs for schools, districts, or states." p.14 Denigrating A Policymaker's Guide to Standards-Led Assessment Education Commission of the States, February, 1997     The January 1993 GAO report on testing costs included such information. CRESST has spent a quarter century denigrating that report.
281 Eva L. Baker Robert L. Linn, Joan L. Herman "How do we assure accurate placement of students with varying abilities and language capabilities? There is little research to date to guide policy and practice (August, et al., 1994)." Dismissive CRESST: A Continuing Mission to Improve Educational Assessment, p.12 Evaluation Comment, Summer 1996   Office of Research and Improvement, US Education Department  
282 Eva L. Baker Robert L. Linn, Joan L. Herman "Alternative assessments are needed for these students (see Kentucky Portfolios for Special Education, Kentucky Department of Education, 1995). Although promising, there has been little or no research investigating the validity of inferences from these adaptations or alternatives." Dismissive CRESST: A Continuing Mission to Improve Educational Assessment, p.13 Evaluation Comment, Summer 1996   Office of Research and Improvement, US Education Department  
283 Eva L. Baker Robert L. Linn, Joan L. Herman "Similarly, research is needed to provide a basis for understanding the implications of using different summaries of student performance, such as group means or percentage of students meeting a standard, for measuring progress." p.15 Dismissive CRESST: A Continuing Mission to Improve Educational Assessment Evaluation Comment, Summer 1996   Office of Research and Improvement, US Education Department  
284 Eva L. Baker Harold F. O'Neil, Jr. "Few research findings exist about the performance of ethnically different groups of students on performance-based assessment in its present form." p.193 Dismissive Chapter 10 in Implementing Performance Assessment: Promises, Problems, and Challenges Lawrence Erlbaum Associates Publishers, 1996   Office of Research and Improvement, US Education Department  
285 Eva L. Baker Harold F. O'Neil, Jr. "The authors have not been able to find studies of the interaction of raters and student ethnicities in educational settings." p.193 Dismissive Chapter 10 in Implementing Performance Assessment: Promises, Problems, and Challenges Lawrence Erlbaum Associates Publishers, 1996   Office of Research and Improvement, US Education Department  
286 Robert L. Linn Daniel M. Koretz, Eva Baker "'Yet we do not have the necessary comprehensive dependable data. . . .' (Tyler 1996a, p. 95)" p. 8 Dismissive Assessing the Validity of the National Assessment of Educational Progress CSE Technical Report 416 (June 1996) http://www.cse.ucla.edu/products/reports/TECH416.pdf Office of Research and Improvement, US Education Department In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schulz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004). 
287 Robert L. Linn Daniel M. Koretz, Eva Baker "There is a need for more extended discussion and reconsideration of the approach being used to measure long-term trends." p. 21 Dismissive Assessing the Validity of the National Assessment of Educational Progress CSE Technical Report 416 (June 1996) http://www.cse.ucla.edu/products/reports/TECH416.pdf Office of Research and Improvement, US Education Department There was extended discussion and consideration. Simply put, they did not get their way because others disagreed with them.
288 Robert L. Linn Daniel M. Koretz, Eva Baker "Only a small minority of the articles that discussed achievement levels made any mention of the judgmental nature of the levels, and most of those did so only briefly." p. 27 Denigrating Assessing the Validity of the National Assessment of Educational Progress CSE Technical Report 416 (June 1996) http://www.cse.ucla.edu/products/reports/TECH416.pdf Office of Research and Improvement, US Education Department All achievement levels, just like all course grades, are set subjectively. This information was never hidden.
289 Thomas Kellaghan George F. Madaus, Anastasia Raczek "The limited evidence on the effectiveness of external, or extrinsic, rewards in education is also reviewed." p.vii Dismissive The Use of External Examinations to Improve Student Motivation American Educational Research Association monograph   "Work on this monograph was supported by Grant 910-1205-1 from the Ford Foundation." See, for example: https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm . This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. 
290 Lawrence O. Picus Alisha Tralli, Suzanne Tacheny "Although several states have implemented new assessment programs, there has been little research on the costs of developing and implementing these new systems." p.4 Dismissive Estimating the Costs of Student Assessment in North Carolina and Kentucky: A State-Level Analysis CSE Technical Report 408 (February 1996) http://www.cse.ucla.edu/products/reports/TECH408.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
291 Lawrence O. Picus Alisha Tralli, Suzanne Tacheny "Although several states have implemented new assessment programs, there has been little research on the cost of developing and implementing these new systems." p.3 Dismissive Estimating the Costs of Student Assessment in North Carolina and Kentucky: A State-Level Analysis CSE Technical Report 408 (February 1996) http://www.cse.ucla.edu/products/reports/TECH408.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
292 Thomas Kellaghan George F. Madaus, Anastasia Raczek "At the very least, a careful analysis of relevant issues and a consideration of empirical evidence are required before reaching such a conclusion. However, the arguments put forward by reformers are not based on such analysis or consideration. Indeed, their arguments often lack clarity, even in the terminology they use. Further, although not much research deals directly with the relationship between external examinations and motivation, ..." p.2 Dismissive, Denigrating The Use of External Examinations to Improve Student Motivation American Educational Research Association monograph   "Work on this monograph was supported by Grant 910-1205-1 from the Ford Foundation." Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies; study is a research review, research synthesis, or meta-analysis.
293 Thomas Kellaghan George F. Madaus, Anastasia Raczek "The final proposition in the armory of proponents of external examinations anticipates that all students at selected grades at both elementary and high school levels will take such examinations. This proposition is presumably based on the unexamined assumption that the motivational power of examinations will operate more or less the same way for students of all ages." p.10 Dismissive, Denigrating The Use of External Examinations to Improve Student Motivation American Educational Research Association monograph   "Work on this monograph was supported by Grant 910-1205-1 from the Ford Foundation." Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies; study is a research review, research synthesis, or meta-analysis.
294 Robert L. Linn Eva L. Baker "Although the connection between student achievement and economic competitiveness is not well established, exhortations for higher standards of student achievement nonetheless are frequently based on the assumption of a strong connection." Dismissive What Do International Assessments Imply for World-Class Standards? Educational Evaluation and Policy Analysis, Dec. 1, 1995 https://journals.sagepub.com/doi/abs/10.3102/01623737017004405 Office of Research and Improvement, US Education Department  
295 Robert Rothman   "Though Cannell's methods were flawed and he overstated his case, …" p.51 Dismissive, Denigrating Measuring Up: Standards, Assessment, and School Reform Jossey-Bass Publishers, 1995   "This book would not have come about without the support of two extraordinary groups of people, to whom I owe incalculable debt." CRESST, Dean Ted Mitchell, Director Eva Baker; Education Week, Editors Ron Wolk, Ginny Edwards. Also, Steve Ferrara, Chester Finn, Joan Herman, Laura Resnick  Rothman correctly claims that there were likely multiple causes for test score inflation, including outdated norms and genuine improved student achievement. Then, he suggests that Cannell had insisted that there was only one cause--cheating. That is false. Cannell specifically acknowledged other possible causes. See https://eric.ed.gov/?q=Cannell&pg=2&id=ED314454
296 Robert Rothman   "To those familiar with testing, the finding—confirmed by a federally sponsored study by leading experts—pointed up many of the problems brought on by reliance on high-stakes testing. In any event, Cannell's small, crude study helped fuel a mounting criticism of the enterprise." p.52 Dismissive, Denigrating Measuring Up: Standards, Assessment, and School Reform Jossey-Bass Publishers, 1995   "This book would not have come about without the support of two extraordinary groups of people, to whom I owe incalculable debt." CRESST, Dean Ted Mitchell, Director Eva Baker; Education Week, Editors Ron Wolk, Ginny Edwards. Also, Steve Ferrara, Chester Finn, Joan Herman, Laura Resnick  Cannell surveyed education departments in all fifty states and, in states where districts made all the testing decisions, the larger districts within each state. He was unusually successful in retrieving responses, which required many hours and persistence. It was an enormous undertaking, and very revealing. Most states and districts admitted that they were not following many professional test security standards. See https://eric.ed.gov/?q=Cannell&pg=2&id=ED314454 
297 Robert Rothman   "And as a big man with a booming baritone voice, Cannell was able to make himself heard from statehouses to the corridors of the U.S. Education Department." p.52 Denigrating Measuring Up: Standards, Assessment, and School Reform Jossey-Bass Publishers, 1995   "This book would not have come about without the support of two extraordinary groups of people, to whom I owe incalculable debt." CRESST, Dean Ted Mitchell, Director Eva Baker; Education Week, Editors Ron Wolk, Ginny Edwards. Also, Steve Ferrara, Chester Finn, Joan Herman, Laura Resnick  Cannell was exactly right. There was corruption, lax security, and cheating. See, for example, https://nonpartisaneducation.org/Review/Articles/v6n3.htm
298 Robert Rothman   "To Cannell, the high scores reflected flagrant cheating. … This charge lent an air of sensationalism to Cannell's already provocative findings and helped attract even more publicity for them. … Cannell began receiving letters from other teachers around the country confessing their own misdeeds or charging others with committing similar ones." p.56 Denigrating Measuring Up: Standards, Assessment, and School Reform Jossey-Bass Publishers, 1995   "This book would not have come about without the support of two extraordinary groups of people, to whom I owe incalculable debt." CRESST, Dean Ted Mitchell, Director Eva Baker; Education Week, Editors Ron Wolk, Ginny Edwards. Also, Steve Ferrara, Chester Finn, Joan Herman, Laura Resnick  Rothman correctly claims that there were likely multiple causes for test score inflation, including outdated norms and genuine improved student achievement. Then, he suggests that Cannell had insisted that there was only one cause--cheating. That is false. Cannell specifically acknowledged other possible causes. See https://eric.ed.gov/?q=Cannell&pg=2&id=ED314454
299 Robert Rothman   "Despite those cases, there is little evidence that cheating is epidemic in schools or that such practices are the reason test scores have risen." p.57 Dismissive Measuring Up: Standards, Assessment, and School Reform Jossey-Bass Publishers, 1995   "This book would not have come about without the support of two extraordinary groups of people, to whom I owe incalculable debt." CRESST, Dean Ted Mitchell, Director Eva Baker; Education Week, Editors Ron Wolk, Ginny Edwards. Also, Steve Ferrara, Chester Finn, Joan Herman, Laura Resnick  Rothman cites one CRESST study. Meanwhile, Cannell surveyed all 50 states on their test security practices and found most lacking.
300 Robert Rothman   "Daniel M. Koretz and his colleagues (at CRESST) found that students performed much worse on tests they had not seen before than they did on the district's tests, even though the test measured the same general content and skills." p.62 Denigrating Measuring Up: Standards, Assessment, and School Reform Jossey-Bass Publishers, 1995   "This book would not have come about without the support of two extraordinary groups of people, to whom I owe incalculable debt." CRESST, Dean Ted Mitchell, Director Eva Baker; Education Week, Editors Ron Wolk, Ginny Edwards. Also, Steve Ferrara, Chester Finn, Joan Herman, Laura Resnick  The comparison test most likely did not measure the same content and skills, as it was a "competing test" in an era when national norm-referenced tests included widely varying content and sequencing of topics. We cannot check, though, as Koretz has kept the identity of the tests and the schools secret.
301 Robert Rothman   "'Teachers have gotten the message loud and clear that they would be rated on how kids score on tests. That's all it takes. The problem is, it simply hasn't worked in raising performance. I don't know why we would want to try it again when it hasn't worked before.'" p.134 Denigrating Measuring Up: Standards, Assessment, and School Reform Jossey-Bass Publishers, 1995   "This book would not have come about without the support of two extraordinary groups of people, to whom I owe incalculable debt." CRESST, Dean Ted Mitchell, Director Eva Baker; Education Week, Editors Ron Wolk, Ginny Edwards. Also, Steve Ferrara, Chester Finn, Joan Herman, Laura Resnick  In fact, the evidence that testing can improve education is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
302 Robert Rothman   "Moreover, Madaus and Kellaghan found that almost no country tests students before age sixteen, and most use tests to select students for scarce slots in higher education and training programs." p.135 Dismissive Measuring Up: Standards, Assessment, and School Reform Jossey-Bass Publishers, 1995   "This book would not have come about without the support of two extraordinary groups of people, to whom I owe incalculable debt." CRESST, Dean Ted Mitchell, Director Eva Baker; Education Week, Editors Ron Wolk, Ginny Edwards. Also, Steve Ferrara, Chester Finn, Joan Herman, Laura Resnick  Madaus and Kellaghan did not "find" anything. They simply declared that such was a fact. It was not. See https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1745-3992.2000.tb00018.x
303 Robert Rothman   "But scholars are just beginning to learn how the new instruments can be used to measure students' abilities." p.149 Dismissive Measuring Up: Standards, Assessment, and School Reform Jossey-Bass Publishers, 1995   "This book would not have come about without the support of two extraordinary groups of people, to whom I owe incalculable debt." CRESST, Dean Ted Mitchell, Director Eva Baker; Education Week, Editors Ron Wolk, Ginny Edwards. Also, Steve Ferrara, Chester Finn, Joan Herman, Laura Resnick  It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
304 Lawrence O. Picus   "While our understanding of how each of these assessment instruments can best be used is growing, information on their costs is virtually nonexistent." p.1 Dismissive A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
305 Lawrence O. Picus   "Research at the Center for Research on Evaluation, Standards, and Student Testing (CRESST) has found that policy makers have little information about the costs of alternative assessments, and that they are concerned about the cost trade-offs involved in using alternative assessment compared to the many other activities they feel continue to be necessary." p.1 Dismissive A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
306 Lawrence O. Picus   "A number of important issues must be resolved before accurate estimates of costs can be developed. Central among those issues is the development of a clear definition of what constitutes a cost." p.1 Denigrating A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
307 Lawrence O. Picus   "Determining the resources necessary to achieve each of these goals is, at best, a difficult task. Because of this difficulty, many analysts stop short of estimating the true cost of a program, and instead focus on the expenditures required for its implementation." pp.3-4 Denigrating A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
308 Lawrence O. Picus   "… cost analysts in education have often resorted to estimating the monetary value of the resources devoted to the program being evaluated. ... However, it is important to remember the opportunity costs that result from time commitments of individuals not directly compensated through the assessment program, such as the teachers who are required to spend time on tasks that previously did not exist or were not their responsibility. Determining the value of these opportunity costs will improve the quality of educational cost analyses dramatically." p.33 Denigrating A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
309 Mary Lee Smith 5 others "This study also draws on previous research on the role of mandated testing. …The question unanswered by extant research is whether assessments that differ in form from the traditional, norm- or criterion-referenced standardized tests would produce similar reactions and effects." Dismissive What Happens When the Test Mandate Changes? Results of a Multiple Case Study CSE Technical Report 380, July 1994 https://cresst.org/wp-content/uploads/TECH380.pdf Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them. 
310 Linn, R.L.   "Evidence is also needed that the uses and interpretations are contributing to enhanced student achievement and at the same time, not producing unintended negative outcomes." p.8   Performance Assessment: Policy promises and technical measurement standards.  Educational Researcher, 23(9), 4-14, 1994 As quoted in William A. Mehrens, Consequences of Assessment: What is the Evidence?, Education Policy Analysis Archives Volume 6 Number 13 July 14, 1998,  https://epaa.asu.edu/ojs/article/view/580/ Office of Research and Improvement, US Education Department  
311 Audrey J. Noble Mary Lee Smith "Are the behaviorist beliefs underlying measurement-driven reform warranted? A small body of evidence addresses the functions of assessments from the traditional viewpoint." Dismissive Old and New Beliefs About Measurement-Driven Reform: The More Things Change, the More They Stay the Same, p.3 CSE Technical Report 373, CRESST/Arizona State University https://cresst.org/wp-content/uploads/TECH373.pdf Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
312 Audrey J. Noble Mary Lee Smith "Few empirical studies exist of the use and effects of performance testing in high-stakes environments." Dismissive Old and New Beliefs About Measurement-Driven Reform: The More Things Change, the More They Stay the Same, p.10 CSE Technical Report 373, CRESST/Arizona State University https://cresst.org/wp-content/uploads/TECH373.pdf Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
313 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Sufficient high-quality assessments must be available before their impact on educational reform can be assessed. Although interest in performance-based assessment is high, our knowledge about its quality is low." Dismissive Policy and validity prospects for performance-based assessment, 1993, p.332 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance assessments have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
314 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Moreover, few psychometric templates exist to guide the technical practices of assessment developers." Dismissive Policy and validity prospects for performance-based assessment, 1993, p.332 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance assessments have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
315 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Most of the arguments in favor of performance-based assessment ... are based on single instances, essentially hand-crafted exercises whose virtues are assumed because they have been developed by teachers or because they are thought to model good instructional practice."  Denigrating Policy and validity prospects for performance-based assessment, 1993, p.334 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
316 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Although there is a considerable literature on the problem of unit or team assessment in the military (Swezey & Salas, 1992) and in technical fields such as antisubmarine warfare (Franken, in press), no compelling solutions have been forwarded for disaggregating group or team performance into individual records, a potential problem if assessments are to be used to allocate individual access or certification." Denigrating Policy and validity prospects for performance-based assessment, 1993, p.336 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
317 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "What is the evidence in support of performance assessment? Reviews conducted of literature in military performance assessments (Baker, O’Neil, & Linn, 1990) and of literature in education (Baker, 1990b) have reported the relatively low incidence of any empirical literature in the field; less than 5% of the literature cited empirical data." Dismissive Policy and validity prospects for performance-based assessment, 1993, p.339-340 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
318 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "To date, there is some evidence that precollegiate performance assessments result in relatively low levels of student performance in almost every subject matter area in which they have been tried. There is also emerging data from NAEP analyses (Koretz, Lewis, Skewes-Cox, & Burstein, 1992) that students differ by ethnicity in the rate at which they attempt more open-ended types of items." Dismissive Policy and validity prospects for performance-based assessment, 1993, p.341 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
319 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Research is underway attempting to address the motivational aspects of these assessments (Gearhart, Saxe, Stipek, & Hakansson, 1992; O’Neil, Sugrue, Abedi, Baker, & Golan, 1992)." Dismissive Policy and validity prospects for performance-based assessment, 1993, p.341 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
320 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Another approach might require the reconceptualization of the unit of assessment to include both teacher and student and thereby to legitimate help of various sorts. As yet, there is little research and only occasional speculation about the degree to which new assessments will be corrupted." Dismissive Policy and validity prospects for performance-based assessment, 1993, p.344-345 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
321 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "A better research base is needed to evaluate the degree to which newly developed assessments fulfill expectations" Denigrating Policy and validity prospects for performance-based assessment, 1993, p.346 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
322 Eva L. Baker Robert L. Linn "Because performance assessments are emerging phenomena, procedures for assessing their quality are in some disorder." Denigrating The Technical Merits of Performance Assessments, p.1 CRESST Line, Special 1993 AERA Issue   Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
323 Eva L. Baker Robert L. Linn "Second, there is relatively little analysis of the sequence of technical procedures required to render assessments sound for some uses."  Dismissive The Technical Merits of Performance Assessments, p.1 CRESST Line, Special 1993 AERA Issue   Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
324 Eva L. Baker Robert L. Linn "The problem is that we cannot learn enough from the conduct of short-term instructional studies, nor can we wait for the results of longer-term instructional programs. ...We must continue to operate on faith." Denigrating The Technical Merits of Performance Assessments, p.2 CRESST Line, Special 1993 AERA Issue   Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And thousands of research, evaluation, and validity studies have been conducted on them.
325 Walter M. Haney George F. Madaus, Robert Lyons "Academics who write about educational and psychological testing similarly have given little attention to the commercial side of testing." p.9 Dismissive The Fractured Marketplace for Standardized Testing National Commission on Testing and Public Policy, Boston College, Kluwer Academic Publishers, 1993   "Finally we thank the Ford Foundation, and three present and former officials there, …"  
326 Walter M. Haney George F. Madaus, Robert Lyons "Nor is there much clear evidence on the potential distortions introduced by the Lake Wobegon phenomenon." p.231 Dismissive The Fractured Marketplace for Standardized Testing National Commission on Testing and Public Policy, Boston College, Kluwer Academic Publishers, 1993   "Finally we thank the Ford Foundation, and three present and former officials there, …" John J. Cannell's original "Lake Wobegon Effect" studies did a fine job of specifying the results, in detail. See: http://nonpartisaneducation.org/Review/Books/CannellBook1.htm  http://nonpartisaneducation.org/Review/Books/Cannell2.pdf
327 Robert L. Linn Vonda L. Kiplinger "Unfortunately, there have been no empirical studies to date to either support or reject the hypothesized lack of motivation generated by the NAEP testing environment, or to show whether students' performance would be improved if motivation were increased." 1stness Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department A cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934). *Covers many studies; study is a research review, research synthesis, or meta-analysis.
328 Robert L. Linn Vonda L. Kiplinger "Although much has been written on achievement motivation per se, there has been surprisingly little empirical research on the effects of different motivation conditions on test performance. Before examining the paucity of research on the relationship of motivation and test performance...." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department A cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
329 Robert L. Linn Vonda L. Kiplinger "Before examining the paucity of research on the relationship of motivation and test performance, we first review briefly the general literature on the relationship of motivation and achievement." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department A cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
330 Robert L. Linn Vonda L. Kiplinger "Prior to 1980, achievement motivation theory focused primarily on the need for achievement and the effects of test anxiety on test performance." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
331 Robert L. Linn Vonda L. Kiplinger "Despite continuing concern regarding the effects of motivation on student achievement and test performance in general, ...there has been very little empirical research on students' self-reported motivation levels or experimental manipulation of motivational conditions--until recently." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
332 Joan L. Herman   "Although the development of new alternatives is a popular idea, and many are engaged in the process, most developers of these new alternatives (with the exception of writing assessments) are at the design and prototyping stages, at some distance from having validated assessments." Dismissive Accountability and Alternative Assessment: Research and Development Issues, p.9 CSE Technical Report 348, August 1992 https://cresst.org/wp-content/uploads/TECH348.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
333 Joan L. Herman   "Yet what we know about alternative or performance-based measures is relatively small when compared to what we have yet to discover." Dismissive Accountability and Alternative Assessment: Research and Development Issues, p.9 CSE Technical Report 348, August 1992 https://cresst.org/wp-content/uploads/TECH348.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
334 Lorrie A. Shepard   "Proponents of measurement-driven instruction (MDI) argued, in the 1980s, that high-stakes tests would set clear targets thus assuring that teachers would focus greater attention on essential basic skills. Critics countered that measurement-driven instruction distorts the curriculum, .... Each side argued theoretically and from limited observations but without systematic proof of these assertions." Dismissive Will National Tests Improve Student Learning?, p.6 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
335 Lorrie A. Shepard   "The vision of curriculum-driven examinations offered by the National Education Goals Panel is inspired. However, we do not at present have the technical, curricular, or political know-how to install such a system at least not on so large a scale." Dismissive Will National Tests Improve Student Learning?, p.10 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department  
336 Lorrie A. Shepard   "Moreover, there is no evidence available about what would happen to the quality of instruction if all high-school teachers, not just those who volunteered, were required to teach to the AP curricula." Dismissive Will National Tests Improve Student Learning?, p.10 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department  
337 Lorrie A. Shepard   "Research evidence on the effects of traditional standardized tests when used as high-stakes accountability instruments is strikingly negative." Dismissive Will National Tests Improve Student Learning?, pp.15-16 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
338 Joan L. Herman Shari Golan "Using greater technical rigor, Linn et al. (1989) replicated Cannell's findings, but moved beyond them in identifying underlying causes for such seemingly spurious results, among them the age of norms." pp.10-11 Denigrating Effects of Standardized Testing on Teachers and Learning—Another Look CSE Report No. 334 https://eric.ed.gov/?id=ED341738 Office of Research and Improvement, US Education Department No. Cannell was exactly right. There was corruption, lax security, and cheating. See, for example, https://nonpartisaneducation.org/Review/Articles/v6n3.htm
339 R.J. Dietel, J.L. Herman, and R.A. Knuth   "Although there is now great excitement about performance-based assessment, we still know relatively little about methods for designing and validating such assessments. CRESST is one of many organizations and schools researching the promises and realities of such assessments." p.3 Dismissive What Does Research Say About Assessment? North Central Regional Education Laboratory, 1991 http://methodenpool.uni-koeln.de/portfolio/What%20Does%20Research%20Say%20About%20Assessment.htm Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
340 R.J. Dietel, J.L. Herman, and R.A. Knuth   "What we know about performance-based assessment is limited and there are many issues yet to be resolved." p.6 Dismissive What Does Research Say About Assessment? North Central Regional Education Laboratory, 1991 http://methodenpool.uni-koeln.de/portfolio/What%20Does%20Research%20Say%20About%20Assessment.htm Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
341 Mary Lee Smith Carole Edelsky, Kelly Draper, Claire Rottenberg, Meredith Cherland "Although schools have administered standardized tests of achievement for decades, only recently have such tests been used as instruments of social policy." p.1 Dismissive The Role of Testing in Elementary Schools CSE Technical Report 321, May 1991 https://cresst.org/publications/cresst-publication-2695/ Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
342 Mary Lee Smith Carole Edelsky, Kelly Draper, Claire Rottenberg, Meredith Cherland "The research literature on the effects of external testing is small but growing." p.3 Dismissive The Role of Testing in Elementary Schools CSE Technical Report 321, May 1991 https://cresst.org/publications/cresst-publication-2695/ Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
343 Mary Lee Smith Carole Edelsky, Kelly Draper, Claire Rottenberg, Meredith Cherland "Past researchers have not examined the classroom directly for traces of testing effects." p.5 Dismissive The Role of Testing in Elementary Schools CSE Technical Report 321, May 1991 https://cresst.org/publications/cresst-publication-2695/ Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
344 Eva L. Baker   "Knowledge Base: Paltry But Sure to Improve: At the same time that interest in alternative assessment is high, our knowledge about the design, distribution, quality and impact of such efforts is low. This is a time of tingling metaphor, cottage industry, and existence proofs rather than carefully designed research and development." Dismissive What Probably Works in Alternative Assessment, p.2 Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991)  https://files.eric.ed.gov/fulltext/ED512658.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
345 Eva L. Baker   "Moreover, because psychometric methods appropriate for dealing with such new measures are not readily available, nor even a matter of common agreement, no clear templates exist to guide the technical practices of alternative assessment developers (Linn, Baker, Dunbar, 1991)." Dismissive What Probably Works in Alternative Assessment, p.2 Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991)  https://files.eric.ed.gov/fulltext/ED512658.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
346 Eva L. Baker   "Given that the level of empirical work is so obviously low, one well might wonder what these studies are about." Denigrating What Probably Works in Alternative Assessment, p.3 Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991)  https://files.eric.ed.gov/fulltext/ED512658.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
347 Eva L. Baker   "Despite this fragile research base, alternative assessment has already taken off. What issues can we anticipate being raised by relevant communities about the value of these efforts?" Dismissive What Probably Works in Alternative Assessment, p.6 Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991)  https://files.eric.ed.gov/fulltext/ED512658.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
348 Eva L. Baker   "This phenomenon may be due to lack of coherent specifications of the performance task domain, lack of coherent instructional experience, or the inherent instability of more complex performance? Until some insight on this phenomenon can be developed, however, using a single performance assessment for individual student decisions is a scary prospect." Dismissive What Probably Works in Alternative Assessment, p.7 Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991)  https://files.eric.ed.gov/fulltext/ED512658.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
349 Lorrie A. Shepard Catherine Cutts Dougherty "Evidence to support the positive claims for measurement-driven instruction comes primarily from high-stakes tests themselves. For example, Popham, Cruse, Rankin, Sandifer, and Williams (1985) and Popham (1987) pointed to the steeply rising passing rates on minimum competency tests as demonstrations that MDI had improved student learning." p.2 Denigrating Effect of High-Stakes Testing on Instruction Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991) and the National Council on Measurement in Education (Chicago, IL, April 4-6,1991) https://files.eric.ed.gov/fulltext/ED337468.pdf Office of Research and Improvement, US Education Department The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above.  Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry.
350 Lorrie A. Shepard Catherine Cutts Dougherty "Evidence documenting the negative influence on instruction is limited to a few studies. Darling-Hammond and Wise (1985) reported that teachers in their study were pressured to 'teach to the test.'" Dismissive Effect of High-Stakes Testing on Instruction Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991) and the National Council on Measurement in Education (Chicago, IL, April 4-6,1991) https://files.eric.ed.gov/fulltext/ED337468.pdf Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
351 Daniel M. Koretz Robert L. Linn, Stephen Dunbar, Lorrie A. Shepard “Evidence relevant to this debate has been limited.” p. 2 Dismissive The Effects of High-Stakes Testing On Achievement: Preliminary Findings About Generalization Across Tests  Originally presented at the annual meeting of the AERA and the NCME, Chicago, April 5, 1991 http://nepc.colorado.edu/files/HighStakesTesting.pdf Office of Research and Improvement, US Education Department See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
352 James S. Catterall   "Before proceeding, readers should note that the observations do not result from an accumulated weight of in-depth cost-benefit type studies, since no such weight has been registered." p.2 Dismissive Estimating the Costs and Benefits of Large-Scale Assessments: Lessons from Recent Research CSE Report No. 319, 1990 https://cresst.org/wp-content/uploads/TECH319.pdf Office of Research and Improvement, US Education Department  
353 James S. Catterall   "The points tend to build on the small number of interesting developments reported (particularly Shepard & Kreitzer, 1987a, 1987b; Solmon & Fagnano, in press), as well as on the author's experiences in conducting cost-benefit type analyses of educational assessment practices (Catterall, 1984, 1989). We also base inferences on the paucity of research itself." p.2 Dismissive Estimating the Costs and Benefits of Large-Scale Assessments: Lessons from Recent Research CSE Report No. 319, 1990 https://cresst.org/wp-content/uploads/TECH319.pdf Office of Research and Improvement, US Education Department  
354 Hartigan, J. A., & Wigdor, A. K.   "The empirical evidence cited for the standard deviation of worker productivity is quite slight." p.239 Dismissive Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  https://www.apa.org/pubs/books/supplemental/correcting-fallacies-educational-psychological-testing/Phelps Web Appendix D new.doc
355 Hartigan, J. A., & Wigdor, A. K.   "Some fragmentary confirming evidence that supports this point of view can be found in Hunter et al. (1988)... We regard the Hunter and Schmidt assumption as plausible but note that there is very little evidence about the nature of the relationship of ability to output." p.243 Dismissive Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  https://www.apa.org/pubs/books/supplemental/correcting-fallacies-educational-psychological-testing/Phelps Web Appendix D new.doc
356 Hartigan, J. A., & Wigdor, A. K.   "It is also important to remember that the most important assumptions of the Hunter-Schmidt models rest on a very slim empirical foundation .... Hunter and Schmidt's economy-wide models are based on simple assumptions for which the empirical evidence is slight." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  https://www.apa.org/pubs/books/supplemental/correcting-fallacies-educational-psychological-testing/Phelps Web Appendix D new.doc
357 Hartigan, J. A., & Wigdor, A. K.   "It is important to remember that the most important assumptions of the Hunter-Schmidt models rest on a very slim empirical foundation." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  https://www.apa.org/pubs/books/supplemental/correcting-fallacies-educational-psychological-testing/Phelps Web Appendix D new.doc
358 Hartigan, J. A., & Wigdor, A. K.   "Hunter and Schmidt's economy wide models are based on simple assumptions for which the empirical evidence is slight." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  https://www.apa.org/pubs/books/supplemental/correcting-fallacies-educational-psychological-testing/Phelps Web Appendix D new.doc
359 Hartigan, J. A., & Wigdor, A. K.   "That assumption is supported by only a very few studies." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  https://www.apa.org/pubs/books/supplemental/correcting-fallacies-educational-psychological-testing/Phelps Web Appendix D new.doc
360 Hartigan, J. A., & Wigdor, A. K.   "There is no well-developed body of evidence from which to estimate the aggregate effects of better personnel selection...we have seen no empirical evidence that any of them provide an adequate basis for estimating the aggregate economic effects of implementing the VG-GATB on a nationwide basis." p.247 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  https://www.apa.org/pubs/books/supplemental/correcting-fallacies-educational-psychological-testing/Phelps Web Appendix D new.doc
361 Hartigan, J. A., & Wigdor, A. K.   "Furthermore, given the state of scientific knowledge, we do not believe that realistic dollar estimates of aggregate gains from improved selection are even possible." p.248 Dismissive Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  https://www.apa.org/pubs/books/supplemental/correcting-fallacies-educational-psychological-testing/Phelps Web Appendix D new.doc
362 Hartigan, J. A., & Wigdor, A. K.   "...primitive state of knowledge..." p.248 Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  https://www.apa.org/pubs/books/supplemental/correcting-fallacies-educational-psychological-testing/Phelps Web Appendix D new.doc
363 Joan L. Herman, Donald W. Dorr-Bremme Walter E. Hathaway, Ed. "Despite the controversy and the important issues that it raises, little information has been forthcoming on the nature of testing as it is actually used in the schools. What functions do tests serve in the classrooms? How do teachers and principals use test results? What kinds of tests do principals and teachers trust and rely on most? These and similar questions have gone largely unaddressed." p.8 Dismissive Uses of Testing in the Schools: A National Profile Testing in the Schools, New Directions for Testing and Measurement #19, Jossey-Bass, September 1983   Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
364 Joan L. Herman, Donald W. Dorr-Bremme Walter E. Hathaway, Ed. "A few studies have indicated teachers' circumspect attitudes toward and limited use of one type of achievement measure, the norm-referenced test. Beyond this, however, the landscape of test uses in American schools has remained largely unexplored." p.8 Dismissive Uses of Testing in the Schools: A National Profile Testing in the Schools, New Directions for Testing and Measurement #19, Jossey-Bass, September 1983   Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
365 Joan L. Herman, Donald W. Dorr-Bremme Walter E. Hathaway, Ed. "We know very little about the quality of teacher-developed tests." p.15 Dismissive Uses of Testing in the Schools: A National Profile Testing in the Schools, New Directions for Testing and Measurement #19, Jossey-Bass, September 1983   Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
366 Don Dorr-Bremme James Catterall "Relatively little is known about students' attitudes and feelings toward assessment in general. Even less is known regarding their feelings about different forms of assessment." p.48-1 Dismissive Costs of Testing: Test Use Project CSE Report, November 1982 https://files.eric.ed.gov/fulltext/ED224835.pdf National Institute of Education, US Education Department See https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm for a list of 19 pre-1982 qualitative studies of student attitudes toward testing
367 Don Dorr-Bremme James Catterall "in light of these few and certainly non-definitive findings, student interviews were undertaken to explore the affective valence that different forms of achievement assessment have for students." p.48-2 Dismissive Costs of Testing: Test Use Project CSE Report, November 1982 https://files.eric.ed.gov/fulltext/ED224835.pdf National Institute of Education, US Education Department See https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm for a list of 19 pre-1982 qualitative studies of student attitudes toward testing
368 Don Dorr-Bremme James Catterall "Because of the small sample size and the paucity of research in this topic, these findings suggest potential avenues for research as much as they provide information." p.48-26 Dismissive Costs of Testing: Test Use Project CSE Report, November 1982 https://files.eric.ed.gov/fulltext/ED224835.pdf National Institute of Education, US Education Department See https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm for a list of 19 pre-1982 qualitative studies of student attitudes toward testing
369 Jennie P. Yeh Joan L. Herman "Testing in American schools is increasing in both scope and visibility. … What return are we getting for this quite considerable investment? Little information is available. How are tests used in schools? What functions do tests serve in classrooms?", p.1 Dismissive Teachers and testing: A survey of test use CSE Report No. 166, 1981 https://files.eric.ed.gov/fulltext/ED218336.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
370 Joan L. Herman James Burry, Don Dorr-Bremme, Charlotte M. Lazar-Morrison, James D. Lehman, Jennie P. Yeh "Despite the great controversy that surrounds testing and its potential uses and abuses, there is little empirical information available about the nature of testing as it actually occurs and is used (or not used) in schools. The Test Use Project at the Center for the Study of Evaluation seeks to fill this gap and answer basic questions about tests and schooling.", p.2 Dismissive Teaching and testing: Allies or adversaries CSE Report No. 165, 1981 https://files.eric.ed.gov/fulltext/ED218336.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
371 Joan L. Herman James Burry, Don Dorr-Bremme, Charlotte M. Lazar-Morrison, James D. Lehman, Jennie P. Yeh "Clearly the policy toward testing in this country has been one of accretion, but the full magnitude is undocumented. The CSE Test Use Project ... ", p.2 Dismissive Teaching and testing: Allies or adversaries CSE Report No. 165, 1981 https://files.eric.ed.gov/fulltext/ED218336.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
372 James Burry   "As instructional considerations have come into prominence, the dialogue over testing has become somewhat adversarial, with a great deal of the recent literature forming a series of position papers espousing the value of one kind of test over another, but offering little empirical data (Lazar-Morrison, Polin, Moy, & Burry, 1980)." p.27 Dismissive The Design of Testing Programs with Multiple and Complimentary Uses Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
373 James Burry   "This paper makes a preliminary step toward explicating school peoples' points of view about the kinds of assessment that are useful for external accountability concerns and for instructional decision making." pp.27-28 1stness The Design of Testing Programs with Multiple and Complimentary Uses Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
374 Joan L. Herman Jennie Yeh "Despite the great controversy that surrounds testing and its potential uses and abuses, there is little empirical information available about the nature of testing as it actually occurs and is used (or not used) in schools. The Test Use Project …." p.2 Dismissive Contextual Examination of Test Use: The Test, The Setting, The Cost Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
375 Joan L. Herman Jennie Yeh "Clearly the policy toward testing in this country has been one of accretion, but the full magnitude is undocumented. The CSE Test Use Project ... ", p.2 Dismissive Contextual Examination of Test Use: The Test, The Setting, The Cost Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
376 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "There is little research-based information about current testing practice." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
377 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Almost ten years ago, Kirkland (1971) reviewed the literature on test impact on students and schools and found that while much had been written about tests, few empirical studies were evident."  Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
378 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "What is significant about [Kirkland's] exclusions is the correct observation that these issues are 'implications,' often not founded on empirical research."  Denigrating A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
379 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Today, there still remains a plethora of publications on these very issues and a dearth of empirical support on actual test use practices." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
380 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Kirkland's review of the literature is concentrated mainly upon the social and psychological issues in testing, more than upon instructional issues. Also, then as now, little empirical research had accumulated on the latter." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
381 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Only recently has the testing dialogue begun to move away from social and psychological issues ...and begun to focus on the instructional issues of testing." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
382 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry " ...the testing dialogue has taken the form of a debate, with the bulk of the test literature being a series of position papers citing little empirical data. This debate is being carried on predominantly by people outside the schools." Denigrating A review of the literature on test use, p.4 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
383 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "There is little empirical research available that can answer the questions that have arisen."  Dismissive A review of the literature on test use, p.5 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
384 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "... little is known about the amount of other testing that takes place."  Dismissive A review of the literature on test use, p.6 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
385 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Although much has been written about minimum competency issues, there has yet to be any report of the actual uses or extent of the use of competency-based tests." Dismissive A review of the literature on test use, p.7 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
386 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Virtually nothing is known about the amount of testing taking place using other types of assessments."  Dismissive A review of the literature on test use, p.7 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
387 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The literature on curriculum-embedded tests is equally scant." Dismissive A review of the literature on test use, p.8 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
388 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The current information focuses on norm- and criterion-referenced tests with some emphasis on minimum competency testing. Since literature on the other evaluative processes is lacking, there is a great need to look at various types of assessments to determine the purposes they serve." Dismissive A review of the literature on test use, p.9 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
389 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The kinds of contextual factors which influence testing and the use of test results are just beginning to be appreciated." Dismissive A review of the literature on test use, p.9 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
390 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Concern exists about the level of teacher training in testing. ... The literature does not appear to reflect any great follow-up to such suggestions [regarding teacher competence with testing]." Dismissive A review of the literature on test use, p.9 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
391 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "All of the studies mentioned included information about standardized achievement testing. As of yet, there is no evidence about how teacher attitudes toward other types of tests affect the use of those assessments." Dismissive A review of the literature on test use, p.19 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
392 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The effect of the actual testing environment on test use is only beginning to emerge. Evidence suggests that characteristics of the test-takers and the instructional environment need to be explored." Dismissive A review of the literature on test use, p.19 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
393 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "These factors have been considered in research on teachers' instructional decision-making or in studies of the social or organizational qualities of the classroom. The investigation of these variables as factors affecting teachers' use of tests and test data is minimal." Dismissive A review of the literature on test use, p.20 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
394 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "In the community, parent involvement, accountability pressures, and news media coverage of test scores are possible influences on the nature and amount of testing, but they have yet to be researched."  Dismissive A review of the literature on test use, p.20 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
395 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "We know very little about the costs of testing." Dismissive A review of the literature on test use, p.20 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
396 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Little information is available about these types of costs, and the little information that is available concerns teachers and student attitudes." Dismissive A review of the literature on test use, p.22 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
397 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The question of whether test scores affect a student's self-concept has also been raised. ... As indicated previously, information on any of the aforementioned issues is scant." Dismissive A review of the literature on test use, p.23 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
398 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Other evidence suggests that tests of many types are being administered and the results are being utilized. To what extent this is occurring is not specifically known." Dismissive A review of the literature on test use, pp.23-24 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
399 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "There are a number of areas concerning teachers and testing for which there is no information." Dismissive A review of the literature on test use, p.24 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
400 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The impact of other testing must also be considered. In-class assessments made by individual teachers have yet to be examined in depth." Dismissive A review of the literature on test use, p.24 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
401 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Teachers place greater reliance on, and have more confidence in, the results of their own judgments of students' performance, but little is known about the kinds of activities that give voice to this information." Dismissive A review of the literature on test use, p.25 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
402 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The settings and factors which affect the use of tests and their results is yet another uninformed area." Dismissive A review of the literature on test use, p.25 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
                   
  IRONIES:                
  Rand Corporation   "All RAND [monographs/occasional papers/etc.] undergo rigorous peer review to ensure that they meet high standards for research quality and objectivity."            
  Susannah Faxon-Mills, Laura S. Hamilton, Mollie Rudnick, Brian M. Stecher   "We found considerable research on the effects of testing in U.S. schools, including studies of high-stakes testing, performance assessment, and formative assessment." p. viii   New Assessments, Better Instruction? Designing Assessment Systems to Promote Instructional Improvement Rand Corporation Research Report, 2013   "Funding to support the research was provided by the William and Flora Hewlett Foundation."  "Marc Chun at the Hewlett Foundation first approached us about reviewing the literature on the impact of assessment, and he was very helpful in framing this investigation."  
  Michael J. Feuer   "To challenge authority is to hold authority accountable. Challenging people in power requires them to show that what they are doing is legitimate; we invite them to rise to the challenge and prove their case; and they, in turn, trust that the system will treat them fairly."   Measuring Accountability When Trust Is Conditional Education Week, September 24, 2012 https://www.edweek.org/ew/articles/2012/09/24/05feuer_ep.h32.html?print=1    
  Michael J. Feuer   "No profession is granted automatic autonomy or an exemption from evaluation."   Measuring Accountability When Trust Is Conditional Education Week, September 24, 2012 https://www.edweek.org/ew/articles/2012/09/24/05feuer_ep.h32.html?print=1    
  Joan L. Herman Susan H. Fuhrman & Richard F. Elmore, Eds "Granted, one would expect to see higher growth on KIRIS, which was customized to Kentucky's learning objectives, than to the more general and thereby less curricularly sensitive NAEP measure."    Redesigning Accountability Systems for Education, Chapter 7 Teachers College Press, 2004   Institute of Education Sciences, US Education Department  
  Deborah Loewenberg Ball Jo Boaler, Phil Daro, Andrew Porter, & 14 others "High-quality work depends on open debate unconstrained by orthodoxies and political agendas. It is crucial that the composition of the panels and the extended research communities be inclusive, engaging individuals with a wide range of views and skills." p.xxiii   Mathematical Proficiency for All Students Rand Corporation, 2003 https://www.rand.org/pubs/monograph_reports/MR1643.html Office of Educational Research and Improvement, US Education Department  
  Laura S. Hamilton Brian M. Stecher, Stephen P. Klein "Greater knowledge about testing and accountability can lead to better system design and more-effective system management." p.xiv   Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Summary, p.xiv      
  Laura S. Hamilton Brian M. Stecher "Incremental improvements to existing systems, based on current research on testing and accountability, should be combined with long-term research and development efforts that may ultimately lead to a major redesign of these systems. Success in this endeavor will require the thoughtful engagement of educators, policymakers, and researchers in discussions and debates about tests and testing policies."   Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6, Improving test-based accountability, pp.143-144      
  Brian M. Stecher Stephen P. Klein "Additional information about the impact of performance assessments on curriculum and instruction would provide policymakers with valuable data on the benefits that may accrue from this relatively expensive form of assessment." p.11   The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)      
  Ronald James Dietel   "comparative information from other research organizations would aid decision makers in measuring program quality;" Abstract   Evaluation of the Dissemination Program from an Education Research and Development Center Doctoral Dissertation, University of California, Los Angeles      
  Eva L. Baker Robert L. Linn, Joan L. Herman "Diverse perspectives are needed to clarify real differences and to find equitable, workable balances."   CRESST: A Continuing Mission to Improve Educational Assessment, p.13 Evaluation Comment, Summer 1996      
  Eva L. Baker Robert L. Linn, Joan L. Herman "Impartiality, not advocacy, is the key to the credibility of research and development."   CRESST: A Continuing Mission to Improve Educational Assessment, p.13 Evaluation Comment, Summer 1996      
  George F. Madaus   "too often policy debates emphasize only one side or the other of the testing effects coin"   The effects of important tests on students: Implications for a National Examination System, 1991 Phi Delta Kappan, 73(3), 226-231. As quoted in William A. Mehrens, Consequences of Assessment: What is the Evidence?, Education Policy Analysis Archives Volume 6 Number 13 July 14, 1998,  https://epaa.asu.edu/ojs/article/view/580/    
                   
      Author cites (and accepts as fact, without checking) someone else's dismissive review            
      Cite themselves or colleagues in the group, but dismiss or denigrate all other work            
      Falsely claim that research has only recently been done on a topic.            
1) [as of July 4, 2021] SCOPE funders include:  Bill & Melinda Gates Foundation; California Education Policy Fund; Carnegie Corporation of New York; Center for American Progress; Community Education Fund, Silicon Valley Community Foundation; Ford Foundation; James Irvine Foundation; Joyce Foundation; Justice Matters; Learning Forward; Metlife Foundation; National Center on Education and the Economy; National Education Association; National Public Education Support Fund; Nellie Mae Education Foundation; NoVo Foundation; Rose Foundation; S. D. Bechtel, Jr. Foundation; San Francisco Foundation; Sandler Foundation; Silver Giving Foundation; Spencer Foundation; Stanford University; Stuart Foundation; The Wallace Foundation; William and Flora Hewlett Foundation; William T. Grant Foundation