Dismissive Reviews in Education Policy Research
  # Author Co-author(s) Quote Type Title Source Link Funders Notes
1 John F. Pane   "Practitioners and policymakers seeking to implement personalized learning, lacking clearly defined evidence-based models to adopt, are creating custom designs for their specific contexts. Those who want to use rigorous research evidence to guide their designs will find many gaps and will be left with important unanswered questions about which practices or combinations of practices are effective. It will likely take many years of research to fill these gaps." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.1 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funded by the William and Flora Hewlett Foundation and Rand Corporation funders. (UCLA’s National Center for Research on Evaluation, Standards, and Student Testing (CRESST) is monitoring the extent to which the two consortia’s assessment development efforts are likely to produce tests that measure and support goals for deeper learning.) Pane devotes considerable text to claims that no prior research exists, except for another Rand study, and then, on p.7, admits that some relevant mastery learning studies from the 1980s do exist. He implies, however, that there were only one or a few; in fact, there were hundreds. There have also been thousands of studies of personalized instruction in conjunction with studies of special education, tutoring, teachers' aides, tracking, etc.
2 John F. Pane   "The purpose of this Perspective is to offer strategic guidance for designers of personalized learning programs to consider while the evidence base is catching up." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.1 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as row 1.
3 John F. Pane   "This guidance draws on theory, basic principles of learning science, and the limited research that does exist on personalized learning and its component parts." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.1 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as row 1.
4 John F. Pane   "Thus far, the research evidence on personalized learning as an overarching schoolwide model is sparse." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.4 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as row 1.
5 John F. Pane   "A team of RAND Corporation researchers conducted the largest and most-rigorous studies of student achievement effects to date." 1stness Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.4 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as row 1.
6 John F. Pane   "While we await the answers to those questions, substantial enthusiasm around personalized learning persists. Educators, policy makers, and advocates are moving forward without the guidance of conclusive research evidence." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.5 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as row 1.
7 John F. Pane   "In the absence of comprehensive, rigorous evidence to help select the personalized learning components most likely to succeed, what is the path forward? I suggest a few guiding principles aimed at using existing scientific knowledge and the best available resources." Denigrating Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.5 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as row 1.
8 John F. Pane   "However, more work is necessary to establish causal evidence that the concept leads to improved outcomes for students." Dismissive Strategies for Implementing Personalized Learning While Evidence and Resources Are Underdeveloped, p.9 Rand Corporation Perspective, October 2018 https://www.rand.org/pubs/perspectives/PE314.html Funders and notes: same as row 1.
9 Lorraine M. McDonnell   "However, an essential question for those interested in the politics of education policy has not been central in past research: To what extent have recent accountability policies altered the politics of education? This article begins to address that question ..." Dismissive Educational Accountability and Policy Feedback, p.171 Educational Policy, 27(2) 170–189 https://journals.sagepub.com/doi/10.1177/0895904812465119 "The author received financial support from the William T. Grant Foundation for research presented in this article."  
10 Jinok Kim Joan L. Herman "However, the validity of existing criteria and procedures lack an empirical base; in fact, reclassification practices are formulated and implemented with little knowledge of the factors that may influence their success." Dismissive, Denigrating Understanding Patterns and Precursors of ELL Success Subsequent to Reclassification, p.1 CRESST Report 818, August, 2012 https://files.eric.ed.gov/fulltext/ED540604.pdf "The work reported herein was supported under the National Research and Development Centers, PR/Award Number R305A09058101, as administered by the U.S. Department of Education, Institute of Education Sciences."  
11 Jinok Kim Joan L. Herman "Because the research basis for making mainstreaming or reclassification decisions remains slim, it may not be surprising that criteria for reclassifying students from ELL to Reclassified as Fluent English Proficient (RFEP) status vary substantially across states, as documented by a recent report reviewing statewide practices related to ELLs." Dismissive Understanding Patterns and Precursors of ELL Success Subsequent to Reclassification, p.3 CRESST Report 818, August, 2012 https://files.eric.ed.gov/fulltext/ED540604.pdf "The work reported herein was supported under the National Research and Development Centers, PR/Award Number R305A09058101, as administered by the U.S. Department of Education, Institute of Education Sciences."  
12 Jinok Kim Joan L. Herman "Previous studies cited earlier have identified potential problems in current reclassification, qualitatively analyzed criteria, and student characteristics that may relate to high versus low redesignation rates, and examined related research questions, such as how long it takes for non native speakers to acquire ELP or be reclassified; but none of the existing literature has directly dealt with reclassification systems and their consequences, and more specifically with the consequences of various reclassification criteria." 1stness Understanding Patterns and Precursors of ELL Success Subsequent to Reclassification, p.6 CRESST Report 818, August, 2012 https://files.eric.ed.gov/fulltext/ED540604.pdf "The work reported herein was supported under the National Research and Development Centers, PR/Award Number R305A09058101, as administered by the U.S. Department of Education, Institute of Education Sciences."  
13 Laura S. Hamilton Brian M. Stecher, Kun Yuan "He also noted that 'virtually all of the arguments, both for and against standards, are based on beliefs and hypotheses rather than on direct empirical evidence' (p. 427). Although a large and growing body of research has been conducted to examine the effects of SBA, the caution Porter expressed in 1994 about the lack of empirical evidence remains relevant today." Denigrating Standards-Based Accountability in the United States: Lessons Learned and Future Directions, pp.157-158 Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 "Material in this paper has been adapted from a paper commissioned by the Center on Education Policy: Hamilton, L.S., Stecher, B.M., & Yuan, K. (2009) Standards-based Reform in the United States: History, Research, and Future Directions. Washington, DC: Center on Education Policy. Portions of this work were supported by the National Science Foundation under Grant No. REC-0228295." Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U.S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
14 Laura S. Hamilton Brian M. Stecher, Kun Yuan "High-quality research on the effects of SBA is difficult to conduct for a number of reasons ..." Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions, p.158 Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 "Material in this paper has been adapted from a paper commissioned by the Center on Education Policy: Hamilton, L.S., Stecher, B.M., & Yuan, K. (2009) Standards-based Reform in the United States: History, Research, and Future Directions. Washington, DC: Center on Education Policy. Portions of this work were supported by the National Science Foundation under Grant No. REC-0228295." Access to anonymized student data is granted all the time. Externally administered high-stakes testing is widely reviled among US educationists. It strains credulity that one cannot find one or a few districts out of the many thousands to cooperate in a study to discredit testing.
15 Laura S. Hamilton Brian M. Stecher, Kun Yuan "Even when the necessary data have been collected by states or other entities, it is often difficult for researchers to obtain these data because those responsible for the data refuse to grant access, either because of concerns about confidentiality or because they are not interested in having their programmes scrutinised by researchers. Thus, the amount of rigorous analysis is limited." Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions, p.158 Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 Funders and notes: same as row 14.
16 Laura S. Hamilton Brian M. Stecher, Kun Yuan "These evaluation findings reveal the challenges inherent in trying to judge the quality of standards. Arguably the most important test of quality is whether the standards promote high-quality instruction and improved student learning but, as we discuss later, there is very little research to address that question." Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions, p.158 Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 Funders note and relevant pre-2000 studies: same as row 13.
17 Laura S. Hamilton Brian M. Stecher, Kun Yuan "In fact, the bulk of research relevant to SBA has focused on the links between high-stakes tests and educators’ practices rather than standards and practices." Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions, p.159 Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 Funders note and relevant pre-2000 studies: same as row 13.
18 Laura S. Hamilton Brian M. Stecher, Kun Yuan "The existing evidence does not provide definitive guidance regarding the SBA system features that would be most likely to promote desirable outcomes." Dismissive Standards-Based Accountability in the United States: Lessons Learned and Future Directions, p.163 Education Inquiry, 3(2), June 2012, 149-170 https://www.academia.edu/15201890/Standards_Based_Accountability_in_the_United_States_Lessons_Learned_and_Future_Directions_1 Funders note and relevant pre-2000 studies: same as row 13.
19 Girlie C. Delacruz   "Opportunities for student use of rubrics to improve learning appears logical, although only a few studies have examined this idea directly." Dismissive Impact of Incentives on the Use of Feedback in Educational Videogames, p.3 CRESST Report 813, March 2012 https://cresst.org/wp-content/uploads/R813.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies: see row 13.
20 Jinok Kim   "Though we can find many such statistics in various reports, few have dealt with comparisons across students reclassified in various grade levels. Lack of such studies may be in part due to the difficulty in defining who are reclassified students as well as when they are reclassified."   Relationships among and between ELL status, demographic characteristics, enrollment history, and school persistence, p.6 CRESST Report 810, December 2011 https://cresst.org/wp-content/uploads/R810.pdf "The work reported herein was supported under the National Research and Development Centers, PR/Award Number R305A090581, as administered by the U.S. Department of Education, Institute of Education Sciences with funding to the National Center for Research on Evaluation, Standards, and Student Testing (CRESST)."
21 Joan Herman 4 others "While the challenge of teachers’ content-pedagogical knowledge has been documented (Heritage et al., 2009; Heritage, Jones & White, 2010; Herman et al., 2010), few studies have examined the relationship between such knowledge and teachers’ assessment practices, nor examined how teachers’ knowledge may moderate the relationship between assessment practices and student learning." Dismissive Relationships between Teacher Knowledge, Assessment Practice, and Learning-Chicken, Egg, or Omelet? CRESST Report 809, November 2011 http://cresst.org/wp-content/uploads/R809.pdf Institute of Education Sciences, US Education Department See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
22 Lorrie A. Shepard Kristen L. Davidson, Richard Bowman "Although some instruments, such as the Northwest Evaluation Association‘s (NWEA) Measures of Academic Progress (MAP®), have been around for decades, few studies have been conducted to examine the technical adequacy of interim assessments or to evaluate their effects on teaching and student learning." Dismissive How Middle-School Mathematics Teachers Use Interim and Benchmark Assessment Data, p.2 CRESST Report 807, October 2011 http://cresst.org/wp-content/uploads/R807.pdf Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive. That is not the result favored by CRESST, so they declare the studies nonexistent. See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
23 Kristen L. Davidson Greta Frohbieter "Yet, districts’ processes to this end [of adopting interim or benchmark assessments] have been largely unexamined (Bulkley et al.; Mandinach et al.; Young & Kim)." Dismissive District Adoption and Implementation of Interim and Benchmark Assessments, p.2 CRESST Report 806, September 2011 https://eric.ed.gov/?id=ED525098 Institute of Education Sciences, US Education Department Notes: same as row 22.
24 Kristen L. Davidson Greta Frohbieter "As noted above, district processes with regard to interim assessment adoption and implementation remain largely uninvestigated. A review of the few relevant studies, however, reveals..." Dismissive District Adoption and Implementation of Interim and Benchmark Assessments, p.4 CRESST Report 806, September 2011 https://eric.ed.gov/?id=ED525098 Institute of Education Sciences, US Education Department Notes: same as row 22.
25 Marguerite Clarke   “The evidence base is stronger in some areas than in others. For example, there are many professional standards for assessment quality that [can] be applied to classroom assessments, examinations, and large-scale assessments (APA, AERA, and NCME, 1999), but less professional or empirical research on enabling contexts.” Dismissive Framework for Building an Effective Student Assessment System, p. 20 World Bank, READ/SABER Working Paper, Aug. 2011 http://files.eric.ed.gov/fulltext/ED553178.pdf World Bank funders No matter that there exist hundreds of other countries, a century's worth of research prior to 2010, literally thousands of other journals that might publish such an article, and a large "grey literature" of alignment studies conducted as routine parts of test development. Virtually any standards-based, large-scale test development includes an alignment study, not to be found in a scholarly journal. Some notable alignment studies:
with NRTs: Freeman, Kuhs, Porter, Floden, Schmidt, Schwille (1983); Debra P. v. Turlington (1984); Cohen, Spillane (1993); La Marca, Redfield, Winter, Bailey, and Despriet (2000); Wainer (2011)
with Standards: Archbald (1994); Buckendahl, Plake, Impara, Irwin (2000); Bhola, Impara, Buckendahl (2003); Phelps (2005)
with RTs: Massell, Kirst, Hoppe (1997); Wiley, Hembry, Buckendahl, Forte, Towles Nebelsick-Gullett (2015)
26 Marguerite Clarke   “Data for some of these indicator areas can be found in official documents, published reports (for example, Ferrer, 2006), research articles (for example, Braun and Kanjee, 2005), and online databases. For the most part, data have not been gathered in any comprehensive or systematic fashion. Those wishing to review this type of information for a particular assessment system will most likely need to collect the data themselves.” p. 21 Denigrating Framework for Building an Effective Student Assessment System  World Bank, READ/SABER Working Paper, Aug. 2011  http://files.eric.ed.gov/fulltext/ED553178.pdf World Bank funders No matter that there exist hundreds of other countries, a century's worth of research prior to 2010, literally thousands of other journals that might publish such a article, and a large "grey literature" of alignment studies conducted as routine parts of test development. Virtually any standards-based, large-scale test development includes an alignment study, not to be found in a scholarly journal.  Some notable alignment studies:
with NRTs:  Freeman, Kuhs, Porter, Floden, Schmidt, Schwille (1983); Debra P. v. Turlington (1984); Cohen, Spillane (1993); La Marca, Redfield, Winter, Bailey, and Despriet (2000); Wainer (2011)
with Standards: Archbald (1994); Buckendahl, Plake, Impara, Irwin (2000); Bhola, Impara, Buckendahl (2003); Phelps (2005)
with RTs: Massell, Kirst, Hoppe (1997); Wiley, Hembry, Buckendahl, Forte, Towles, Nebelsick-Gullett (2015)
27 Marguerite Clarke   “This paper has extracted principles and guidelines from countries’ experiences and the current research base to outline a framework for developing a more effective student assessment system. The framework provides policy makers and others with a structure for discussion and consensus building around priorities and key inputs for their assessment system.” p. 27 1stness Framework for Building an Effective Student Assessment System  World Bank, READ/SABER Working Paper, Aug. 2011  http://files.eric.ed.gov/fulltext/ED553178.pdf World Bank funders No matter that there exist hundreds of other countries, a century's worth of research prior to 2010, literally thousands of other journals that might publish such an article, and a large "grey literature" of alignment studies conducted as routine parts of test development. Virtually any standards-based, large-scale test development includes an alignment study, not to be found in a scholarly journal.  Some notable alignment studies:
with NRTs:  Freeman, Kuhs, Porter, Floden, Schmidt, Schwille (1983); Debra P. v. Turlington (1984); Cohen, Spillane (1993); La Marca, Redfield, Winter, Bailey, and Despriet (2000); Wainer (2011)
with Standards: Archbald (1994); Buckendahl, Plake, Impara, Irwin (2000); Bhola, Impara, Buckendahl (2003); Phelps (2005)
with RTs: Massell, Kirst, Hoppe (1997); Wiley, Hembry, Buckendahl, Forte, Towles, Nebelsick-Gullett (2015)
28 Michael Hout, Stuart W. Elliott, Editors   "Unfortunately, there were no other studies available that would have allowed us to contrast the overall effect of state incentive programs predating NCLB…" p. 4-6 Dismissive Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925).   *Covers many studies; study is a research review, research synthesis, or meta-analysis.  Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
29 Michael Hout, Stuart W. Elliott, Editors   "Test-based incentive programs, as designed and implemented in the programs that have been carefully studied, have not increased student achievement enough to bring the United States close to the levels of the highest achieving countries." p. 4-26 Denigrating Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925).   *Covers many studies; study is a research review, research synthesis, or meta-analysis.  Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
30 Michael Hout, Stuart W. Elliott, Editors   "Despite using them for several decades, policymakers and educators do not yet know how to use test-based incentives to consistently generate positive effects on achievement and to improve education." p. 5-1 Dismissive Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925).   *Covers many studies; study is a research review, research synthesis, or meta-analysis.  Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
31 Michael Hout, Stuart W. Elliott, Editors   "The general lack of guidance coming from existing studies of test-based incentive programs in education…" Dismissive Incentives and Test-Based Accountability in Education, 2011 Board on Testing and Assessment, National Research Council https://www.nap.edu/catalog/12521/incentives-and-test-based-accountability-in-education National Research Council funders Relevant studies of the effects of varying types of incentive or the optimal structure of incentives include those of Kelley (1999); the *Southern Regional Education Board (1998); Trelfa (1998); Heneman (1998); Banta, Lund, Black & Oblander (1996); Brooks-Cooper (1993); Eckstein & Noah (1993); Richards & Shen (1992); Jacobson (1992); Heyneman & Ransom (1992); *Levine & Lezotte (1990); Duran (1989); *Crooks (1988); *Kulik & Kulik (1987); Corcoran & Wilson (1986); *Guskey & Gates (1986); Brook & Oxenham (1985); Oxenham (1984); Venezky & Winfield (1979); Brookover & Lezotte (1979); McMillan (1977); Abbott (1977); *Staats (1973); *Kazdin & Bootzin (1972); *O’Leary & Drabman (1971); Cronbach (1960); and Hurlock (1925).   *Covers many studies; study is a research review, research synthesis, or meta-analysis.  Other researchers who, prior to 2000, studied test-based incentive programs include Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, Roueche, Kirk, Wheeler, Boylan, and Wilson. Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones.
32 Eva L. Baker   "At the same time that interest in alternative assessment is high, our knowledge about the design, distribution, quality and impact of such efforts is low. This is a time of tingling metaphor, cottage industry, and existence proofs rather than carefully designed research and development." p.2 Dismissive, Denigrating What Probably Works in Alternative Assessment, July 2010 CRESST Report 772   Institute of Education Sciences, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
33 Eva L. Baker   "Moreover, because psychometric methods appropriate for dealing with such new measures are not readily available, nor even a matter of common agreement, no clear templates exist to guide the technical practices of alternative assessment developers (Linn, Baker, Dunbar, 1991)." p.2 Dismissive What Probably Works in Alternative Assessment, July 2010 CRESST Report 772   Institute of Education Sciences, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
34 Eva L. Baker   "Given that the level of empirical work is so obviously low, one well might wonder what these studies are about. Some studies argue for new approaches to achievement testing." p.3 Denigrating What Probably Works in Alternative Assessment, July 2010 CRESST Report 772   Institute of Education Sciences, US Education Department She looked in two databases -- ERIC and NTIS -- and then implied she had looked everywhere.
35 Eva L. Baker   "Despite this fragile research base, alternative assessment has already taken off. What issues can we anticipate being raised by relevant communities about the value of these efforts?" p.6 Dismissive, Denigrating What Probably Works in Alternative Assessment, July 2010 CRESST Report 772   Institute of Education Sciences, US Education Department She looked in two databases -- ERIC and NTIS -- and then implied she had looked everywhere.
36 Lawrence O. Picus
Frank Adamson
William Montague
Margaret Owens
"As in the earlier studies, efforts are made to distinguish between the concept of economic or opportunity costs (i.e., the use of teacher time that is already “paid for” through the contract and used as part of the assessment process rather then for some other activity or function), and the direct expenditures made for assessment." p.1 Dismissive A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) For at least two decades, Larry Picus has elevated the trivial and elemental difference between expenditure and cost to the level of heavenly revelation. As any beginning undergraduate in economics knows, expenditures -- particularly budgetary line-item expenditures -- don't necessarily equal the cost of an item or activity. The classifications of the amounts might or might not match. Picus needled the trivial point over and over for decades. Meanwhile, my project on testing costs at the GAO (1991-1993) was a cost study in every sense that Picus identified for the term, but the word "expenditures" was in the title of the report. So, when Picus repeated and repeated that most studies on the topic prior to his were "just expenditure studies" (and not really "cost" studies), there was the GAO report, one of the few cost studies done prior to his, with the word expenditure in its title. The ploy worked, and many were convinced then, and still today, that my work at the GAO relied on budgetary line-item expenditure data (it didn't), neglected to include the cost of personnel time (it did include those costs), or was otherwise suspect, an inferior study. Picus and CRESST managed to denigrate into oblivion a taxpayer-funded study that was vastly superior to any he would ever do.
37 Lawrence O. Picus
Frank Adamson
William Montague
Margaret Owens
"Determining the resources necessary to achieve each of these goals is, at best, a complex task. Because of this difficulty, many analysts stop short of estimating the true costs of a program, and instead focus on the expenditures required for its implementation." p.7 Dismissive A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) For at least two decades, Larry Picus has elevated the trivial and elemental difference between expenditure and cost to the level of heavenly revelation. As any beginning undergraduate in economics knows, expenditures -- particularly budgetary line-item expenditures -- don't necessarily equal the cost of an item or activity. The classifications of the amounts might or might not match. Picus needled the trivial point over and over for decades. Meanwhile, my project on testing costs at the GAO (1991-1993) was a cost study in every sense that Picus identified for the term, but the word "expenditures" was in the title of the report. So, when Picus repeated and repeated that most studies on the topic prior to his were "just expenditure studies" (and not really "cost" studies), there was the GAO report, one of the few cost studies done prior to his, with the word expenditure in its title. The ploy worked, and many were convinced then, and still today, that my work at the GAO relied on budgetary line-item expenditure data (it didn't), neglected to include the cost of personnel time (it did include those costs), or was otherwise suspect, an inferior study. Picus and CRESST managed to denigrate into oblivion a taxpayer-funded study that was vastly superior to any he would ever do.
38 Lawrence O. Picus
Frank Adamson
William Montague
Margaret Owens
"The study defined purchase cost as the money spent on test-related goods and services, a category in line with what we call expenditures (U.S. GAO, 1993)." p.21 Denigrating A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) For at least two decades, Larry Picus has elevated the trivial and elemental difference between expenditure and cost to the level of heavenly revelation. As any beginning undergraduate in economics knows, expenditures -- particularly budgetary line-item expenditures -- don't necessarily equal the cost of an item or activity. The classifications of the amounts might or might not match. Picus needled the trivial point over and over for decades. Meanwhile, my project on testing costs at the GAO (1991-1993) was a cost study in every sense that Picus identified for the term, but the word "expenditures" was in the title of the report. So, when Picus repeated and repeated that most studies on the topic prior to his were "just expenditure studies" (and not really "cost" studies), there was the GAO report, one of the few cost studies done prior to his, with the word expenditure in its title. The ploy worked, and many were convinced then, and still today, that my work at the GAO relied on budgetary line-item expenditure data (it didn't), neglected to include the cost of personnel time (it did include those costs), or was otherwise suspect, an inferior study. Picus and CRESST managed to denigrate into oblivion a taxpayer-funded study that was vastly superior to any he would ever do.
39 Lawrence O. Picus
Frank Adamson
William Montague
Margaret Owens
"Unfortunately, aggregating these different types of time disguises important differences between them that, in fairness to the GAO, have emerged in the NCLB era as more important considerations than in previous decades. Specifically, test-preparation time for students has become a subject of national debate about how much class time teachers spend 'teaching to the test.'" p.21 Denigrating A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) I continued to publish articles and made presentations based on the GAO project for several years after I left the GAO. These publications reported the disaggregated costs and estimated benefits. Indeed, I published a net benefit (i.e., benefit/cost) study in the Journal of Education Finance ten years prior to this Picus article. Almost certainly he knows about it -- he has served as editor or on the editorial board for that journal for many years. In this report of his for SCOPE, my name is never mentioned nor are any of my many publications or presentations related to the costs and benefits of testing. 
40 Lawrence O. Picus
Frank Adamson
William Montague
Margaret Owens
"In its analysis, the GAO does provide aggregate time estimates. However, it does not provide disaggregated estimates of teacher time, nor estimated benefits in terms of either teacher PD or improved student learning." p.21 Denigrating A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) I continued to publish articles and made presentations based on the GAO project for several years after I left the GAO. These publications reported the disaggregated costs and estimated benefits. Indeed, I published a net benefit (i.e., benefit/cost) study in the Journal of Education Finance ten years prior to this Picus article. Almost certainly he knows about it -- he has served as editor or on the editorial board for that journal for many years. In this report of his for SCOPE, my name is never mentioned nor are any of my many publications or presentations related to the costs and benefits of testing. 
41 Lawrence O. Picus
Frank Adamson
William Montague
Margaret Owens
"The performance assessments studied by the GAO also do not demonstrate much variety. Most included only writing samples, reading comprehension and response, and math/science problem-solving items. A few districts used science lab work, group work, and skills observations, but most still relied on paper-and-pencil testing (U.S. GAO, 1993)." p.21 Denigrating A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) Picus neglects to mention that the GAO collected data from the universe of states with testing programs and a very large, representative sample (> 660) of public school districts. We collected all the data on all the systemwide testing occurring at the time. We oversampled districts in certain states, such as Maryland, the one state at the time with the most elaborate performance test types. In doing that, we did more than he ever did in his couple of state studies. Yet, as usual, he implies that the GAO study or my work must have left out something important. 
42 Lawrence O. Picus
Frank Adamson
William Montague
Margaret Owens
"In every instance, test developers crafting the performance-based tests started from scratch, writing test questions that fit the state’s curriculum or guidelines, then testing the draft on pilot groups of students and using an iterative revision process that did not involve state curriculum, which was undergoing simultaneous development (U.S. GAO, 1993)." p.22 Denigrating A New Conceptual Framework for Analyzing the Costs of Performance Assessment, 2010 The Stanford Center for Opportunity Policy in Education (SCOPE) https://edpolicy.stanford.edu/sites/default/files/publications/new-conceptual-framework-analyzing-costs-performance-assessment_0.pdf SCOPE funders (1) This sentence doesn't make sense, but he doesn't include page numbers in his citations so it is not even possible to find what text he might have been misunderstanding. Within one sentence, Picus claims that test items were based on established content standards, but then not based on them, because they didn't yet exist. The latter point is certainly not true. When standards-based tests are developed, the content standards are completed first, and the test items are written directly from them. 
43 Joan L. Herman
Ellen Osmundson, David Silver "These indeed are promising developments for pushing formative assessment to fruition in classroom practice. They acknowledge and work toward remedying the need for classroom tools to assess and support student learning. Yet at the same time, recent studies reveal challenges in implementing quality formative assessment and show non-robust results with regard to effects on student learning (Herman, Osmundson, Ayala, Schneider, & Timms, 2006; Furtak, et al., 2008)." Dismissive, Denigrating Capturing Quality in Formative Assessment Practice: Measurement Challenges, p.2 CRESST Report 770, June 2010 https://eric.ed.gov/?id=ED512648 Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive, which is not the result favored by CRESST, so they declare the studies nonexistent. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
44 Joan L. Herman
Ellen Osmundson, David Silver "Just as the concept of formative assessment itself underscores the central role of evidence—learning data—in an effective teaching and learning process, so too do policymakers and practitioners need evidence on which to build effective formative practices. Toward this latter goal, this report explores ..." 1stness Capturing Quality in Formative Assessment Practice: Measurement Challenges, p.2 CRESST Report 770, June 2010 https://eric.ed.gov/?id=ED512648 Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive, which is not the result favored by CRESST, so they declare the studies nonexistent. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
45 Diana Pullin (Chair)
Joan Herman, Scott Marion, Dirk Mattson, Rebecca Maynard, Mark Wilson,  "However, there have been very few studies of how interim assessments are actually used by individual teachers in classrooms, by principals, and by districts or of their impact on student achievement." p. 6 Dismissive Best Practices for State Assessment Systems, Part I Committee on Best Practices for State Assessment Systems: Improving Assessment While Revisiting Standards; Center for Education; Division of Behavioral and Social Sciences and Education; National Research Council https://www.nap.edu/catalog/12906/best-practices-for-state-assessment-systems-part-i-summary-of "With funding from the James B. Hunt, Jr. Institute for Educational Leadership and Policy, as well as additional support from the Bill & Melinda Gates Foundation and the Stupski Foundation, the National Research Council (NRC) planned two workshops designed to explore some of the possibilities for state assessment systems."  
46 Diana Pullin (Chair)
Joan Herman, Scott Marion, Dirk Mattson, Rebecca Maynard, Mark Wilson,  "Research indicates that the result has been emphasis on lower-level knowledge and skills and very thin alignment with the standards. For example, Porter, Polikoff, and Smithson (2009) found very low to moderate alignment between state assessments and standards—meaning that large proportions of content standards are not covered on the assessments (see also Fuller et al., 2006; Ho, 2008)." p. 10
Denigrating Best Practices for State Assessment Systems, Part I Committee on Best Practices for State Assessment Systems: Improving Assessment While Revisiting Standards; Center for Education; Division of Behavioral and Social Sciences and Education; National Research Council https://www.nap.edu/catalog/12906/best-practices-for-state-assessment-systems-part-i-summary-of "With funding from the James B. Hunt, Jr. Institute for Educational Leadership and Policy, as well as additional support from the Bill & Melinda Gates Foundation and the Stupski Foundation, the National Research Council (NRC) planned two workshops designed to explore some of the possibilities for state assessment systems."  
47 Diana Pullin (Chair)
Joan Herman, Scott Marion, Dirk Mattson, Rebecca Maynard, Mark Wilson,  "Another issue is that the implications of computer-based approaches for validity and reliability have not been thoroughly evaluated." p. 40 Dismissive Best Practices for State Assessment Systems, Part I Committee on Best Practices for State Assessment Systems: Improving Assessment While Revisiting Standards; Center for Education; Division of Behavioral and Social Sciences and Education; National Research Council https://www.nap.edu/catalog/12906/best-practices-for-state-assessment-systems-part-i-summary-of "With funding from the James B. Hunt, Jr. Institute for Educational Leadership and Policy, as well as additional support from the Bill & Melinda Gates Foundation and the Stupski Foundation, the National Research Council (NRC) planned two workshops designed to explore some of the possibilities for state assessment systems."  
48 Diana Pullin (Chair)
Joan Herman, Scott Marion, Dirk Mattson, Rebecca Maynard, Mark Wilson,  "For current tests, he [Lauress Wise] observed, there is little evidence that they are good indicators of instructional effectiveness or good predictors of students’ readiness for subsequent levels of instruction." Dismissive Best Practices for State Assessment Systems, Part I Committee on Best Practices for State Assessment Systems: Improving Assessment While Revisiting Standards; Center for Education; Division of Behavioral and Social Sciences and Education; National Research Council https://www.nap.edu/catalog/12906/best-practices-for-state-assessment-systems-part-i-summary-of "With funding from the James B. Hunt, Jr. Institute for Educational Leadership and Policy, as well as additional support from the Bill & Melinda Gates Foundation and the Stupski Foundation, the National Research Council (NRC) planned two workshops designed to explore some of the possibilities for state assessment systems."  
49 Laura S. Hamilton Brian M. Stecher, Kun Yuan “A few studies have attempted to examine how the creation and publication of standards, per se, have affected practices.” p. 3 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
50 Laura S. Hamilton Brian M. Stecher, Kun Yuan “The research evidence does not provide definitive answers to these questions.” p. 6 Denigrating Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
51 Laura S. Hamilton Brian M. Stecher, Kun Yuan “He [Poynter 1994] also noted that ‘virtually all of the arguments, both for and against standards, are based on beliefs and hypotheses rather than on direct empirical evidence’ (p. 427).” pp. 34-35 Dismissive, Denigrating Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
52 Laura S. Hamilton Brian M. Stecher, Kun Yuan "Although a large and growing body of research has been conducted to examine the effects of SBR, the caution Poynter expressed in 1994 about the lack of empirical evidence remains relevant today.” pp. 34-35 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
53 Laura S. Hamilton Brian M. Stecher, Kun Yuan “Arguably the most important test of quality is whether the standards promote high-quality instruction and improved student learning, but as we discuss later, there is very little research to address that question.” p. 37 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
54 Laura S. Hamilton Brian M. Stecher, Kun Yuan “[T]here have been a few studies of SBR as a comprehensive system. . . . [T]here is some research on how the adoption of standards, per se, or the alignment of standards with curriculum influences school practices or student outcomes.” p. 38 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
55 Laura S. Hamilton Brian M. Stecher, Kun Yuan “The lack of evidence about the effects of SBR derives primarily from the fact that the vision has never been fully realized in practice.” p. 47 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
56 Laura S. Hamilton Brian M. Stecher, Kun Yuan “[A]lthough many conceptions of SBR emphasize autonomy, we currently know relatively little about the effects of granting autonomy or what the right balance is between autonomy and prescriptiveness.” p. 55 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
57 Laura S. Hamilton Brian M. Stecher, Kun Yuan “One of the primary responsibilities of the federal government should be to ensure ongoing collection of evidence demonstrating the effects of the policies, which could be used to make decisions about whether to continue on the current course or whether small adjustments or a major overhaul are needed.” p. 55 Dismissive Standards-Based Reform in the United States: History, Research, and Future Directions Center on Education Policy, December, 2008 http://www.rand.org/content/dam/rand/pubs/reprints/2009/RAND_RP1384.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
58 Douglas N. Harris Lori L. Taylor, Amy A. Levine, William K. Ingle, Leslie McDonald "However, previous studies under-state current costs by focusing on costs before NCLB was put in place and by excluding important cost categories." Denigrating The Resource Costs of Standards, Assessments, and Accountability report to the National Research Council   National Research Council funders No, they did not leave out important cost categories; Harris' study deliberately exaggerates costs. See pages 3-10:  https://nonpartisaneducation.org/Review/Essays/v10n1.pdf
59 Joan Herman Katherine E. Ryan, Lorrie A. Shepard, Eds. "Yet, available evidence suggests that the rhetoric surpasses the reality of formative assessment use" p.217 Denigrating Accountability and assessment: Is public interest in K-12 education being served? Chapter 11 in The Future of Test-Based Educational Accountability https://www.routledge.com/The-Future-of-Test-Based-Educational-Accountability-1st-Edition/Ryan-Shepard/p/book/9780805864700 Institute of Education Sciences, US Education Department Studies of formative testing date back a century, and the evidence, on average, is strongly positive, which is not the result favored by CRESST, so they declare the studies nonexistent. See, for example,  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
60 Joan Herman Katherine E. Ryan, Lorrie A. Shepard, Eds. "The research base examining effects on students with disabilities and on English Language learners is scanty." p.223 Dismissive Accountability and assessment: Is public interest in K-12 education being served? Chapter 11 in The Future of Test-Based Educational Accountability https://www.routledge.com/The-Future-of-Test-Based-Educational-Accountability-1st-Edition/Ryan-Shepard/p/book/9780805864700 Institute of Education Sciences, US Education Department  
61 Joan Herman Katherine E. Ryan, Lorrie A. Shepard, Eds. "...there is no obvious accountability mechanism for the "average student" who may have made it just over the proficient level. There is little research on this issue." p.224 Dismissive Accountability and assessment: Is public interest in K-12 education being served? Chapter 11 in The Future of Test-Based Educational Accountability https://www.routledge.com/The-Future-of-Test-Based-Educational-Accountability-1st-Edition/Ryan-Shepard/p/book/9780805864700 Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Jacobson (1992); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Marshall (1987); Mangino & Babcock (1986); Michigan Department of Education (1984); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
62 Joan Herman   "The report considers how well the model fits available evidence by examining whether and how accountability assessment influences students’ learning opportunities and the relationship between accountability and learning." abstract Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department See, for example, Test Frequency, Stakes, and Feedback in Student Achievement: A Meta-Analysis   https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
63 Joan Herman   "What of the impact of accountability on other segments of the student population--traditionally higher performing students? ...The average student? ...there is no obvious accountability mechanism for the "average student." There is little research on this issue." Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
64 Joan Herman   "While a thorough treatment of the effects on teachers is also beyond the scope of this report, it is worth noting a growing literature that is cause for concern." p.17 Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
65 Joan Herman   "The research base examining effects on students with disabilities and on English language learner students is scanty." pp.16-17 Dismissive Accountability and assessment: Is public interest in K-12 education being served? CRESST Report 728, October 2007 https://eric.ed.gov/?id=ED499421 Institute of Education Sciences, US Education Department  
66 Eva L. Baker   "Tests only dimly reflect in their design the results of research on learning, whether of skills, subject matter, or problem solving." p.310 Denigrating The End(s) of Testing Educational Researcher, Vol. 36, No. 6, pp. 309–317   2007 Presidential Address for the American Educational Research Association  
67 Eva L. Baker   "To my mind, the evidential disconnect between test design and learning research is no small thing." p.310 Dismissive The End(s) of Testing Educational Researcher, Vol. 36, No. 6, pp. 309–317   2007 Presidential Address for the American Educational Research Association  
68 Eva L. Baker   "What if we set aside learning-based design and ask, “How well do any of our external tests work?” The answer is that we often don’t know enough to know. We have little evidence that tests are in sync with their stated or de facto purposes or that their results lead to appropriate decisions." p.310 Dismissive The End(s) of Testing Educational Researcher, Vol. 36, No. 6, pp. 309–317   2007 Presidential Address for the American Educational Research Association  
69 Laura S. Hamilton Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney "However, the paths through which SBA [standards-based accountability] changes district, school, and classroom practices and how these changes in practice influence student outcomes are largely unexplored. There is strong evidence that SBA leads to changes in teachers’ instructional practices (Hamilton, 2004; Stecher, 2002)." p.5 Dismissive Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States Rand Corporation, 2007 https://www.rand.org/pubs/monographs/MG589.html "This research was sponsored by the National Science Foundation under grant number REC-0228295." Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
70 Laura S. Hamilton Brian M. Stecher, Julie A. Marsh, Jennifer Sloan McCombs, Abby Robyn, Jennifer Lin Russell, Scott Naftel, Heather Barney "Much less is known about the impact of SBA at the district and school levels and the relationships among actions at the various levels and student outcomes. This study was designed to shed light on this complex set of relationships…" p.5 Dismissive Standards-Based Accountability Under No Child Left Behind: Experiences of Teachers and Administrators in Three States Rand Corporation, 2007 https://www.rand.org/pubs/monographs/MG589.html "This research was sponsored by the National Science Foundation under grant number REC-0228295." Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
71 Eva L. Baker Joan L. Herman, Robert L. Linn "For example, performance assessment was a rage in the early 1990s because it was something new and flashy, and looked to have great promise. Before almost any research was done, a number of states dropped their multiple-choice accountability systems, replacing them with performance assessments."   Dismissive ACCELERATING FUTURE POSSIBILITIES FOR ASSESSMENT AND LEARNING, p.1 CRESST Line, Winter 2006 https://www.researchgate.net/publication/277283780_in_Educational_Researcher_called_The_Awful_Reputation_of_Education_Research Institute of Education Sciences, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
72 Eva L. Baker Joan L. Herman, Robert L. Linn "By the end of this year, nearly half of all states will have graduation exams in place (Peterson, 2005). Short institutional memory forgets that similar minimum competency tests did not lead to increased achievement some 20 years ago, but instead contributed to higher numbers of high school dropouts and inequities along racial lines (Catterall, 1989; Haertel & Herman, 2005)." Dismissive ACCELERATING FUTURE POSSIBILITIES FOR ASSESSMENT AND LEARNING, p.3 CRESST Line, Winter 2006 https://www.researchgate.net/publication/277283780_in_Educational_Researcher_called_The_Awful_Reputation_of_Education_Research Institute of Education Sciences, US Education Department Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan (1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
73 Edward Haertel Joan Herman "Passing rates on MCTs in many states rose rapidly from year to year (Popham, Cruse, Rankin, Sandifer, & Williams, 1985). Despite these gains, and positive trends on examinations like the National Assessment of Educational Progress (NAEP), there is little evidence that MCTs were the reason for improvements on other examinations." Dismissive A Historical Perspective on Validity Arguments for Accountability Testing CRESST Report 654, June 2005 https://cresst.org/wp-content/uploads/R654.pdf Institute of Education Sciences, US Education Department Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan (1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
74 Robert L. Linn   "Despite the clear appeal of assessment-based accountability and the widespread use of this approach, the development of assessments that are aligned with content standards and for which there is solid evidence of validity and reliability is a challenging endeavor." Dismissive Issues in the Design of Accountability Systems CRESST Report 650, April 2005 https://cresst.org/wp-content/uploads/R650.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
75 Robert L. Linn   "Alignment of an assessment with the content standards that it is intended to measure is critical if the assessment is to buttress rather than undermine the standards. Too little attention has been given to the evaluation of the alignment of assessments and standards." Denigrating Issues in the Design of Accountability Systems CRESST Report 650, April 2005 https://cresst.org/wp-content/uploads/R650.pdf Institute of Education Sciences, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
76 Lorraine M. McDonnell   "A growing body of research suggests that school and classroom practices do change in response to these assessments (Herman and Golan, 1993; Smith and Rottenberg, 1991; Madaus, 1988)" 1stness Politics, Persuasion, and Educational Testing, p.9 Harvard University Press, 2004     Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
77 Lorraine M. McDonnell   "A growing body of research suggests that school and classroom practices do change in response to these assessments (Herman and Golan, 1993; Smith and Rottenberg, 1991; Madaus, 1988)" Dismissive Politics, Persuasion, and Educational Testing, p.9 Harvard University Press, 2004     Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
78 Lorraine M. McDonnell   "Although most literature on policy instruments identifies this persuasive tool as one of the strategies available to policymakers, little theoretical or comparative empirical research has been conducted on its properties." Dismissive Politics, Persuasion, and Educational Testing, p.24 Harvard University Press, 2004
79 Lorraine M. McDonnell   "There is empirical research on policies that rely on hortatory tools, but studies of these individual policies have not examined them within a broader theoretical framework." Denigrating Politics, Persuasion, and Educational Testing, p.24 Harvard University Press, 2004      
80 Lorraine M. McDonnell   "This chapter represents an initial attempt to analyze the major characteristics of hortatory policy by taking an inductive approach and looking across several different policy areas to identify a few basic properties common to most policies of this type." 1stness Politics, Persuasion, and Educational Testing, p.24 Harvard University Press, 2004      
81 Lorraine M. McDonnell   "This chapter has begun the task of building a conceptual framework for understanding hortatory policies by identifying their underlying causal assumptions and analyzing some basic properties common to most policies that rely on information and values to motivate action." 1stness Politics, Persuasion, and Educational Testing, p.44–45 Harvard University Press, 2004
82 Lorraine M. McDonnell   "Because so little systematic research has been conducted on hortatory policy, it is possible at this point only to suggest, rather than to specify, the conditions under which its underlying assumptions will be valid and a policy likely to succeed." Dismissive Politics, Persuasion, and Educational Testing, p.45 Harvard University Press, 2004      
83 Lorraine M. McDonnell   "Additional theoretical and empirical work is needed to develop a more rigorous and nuanced understanding of hortatory policy. Nevertheless, this study starts that process by articulating the policy theory undergirding hortatory policy and by outlining its potential promise and shortcomings." Denigrating Politics, Persuasion, and Educational Testing, p.45 Harvard University Press, 2004
84 Lorraine M. McDonnell   "However, because research on the effects of high stakes testing is limited, finds mixed results, and suggests unintended consequences, the informational and persuasive dimensions of testing will continue to be critical to the success of this policy." Dismissive Politics, Persuasion, and Educational Testing, p.182–183 Harvard University Press, 2004     Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
85 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “The shortcomings of the studies make it difficult to determine the size of teacher effects, but we suspect that the magnitude of some of the effects reported in this literature are overstated.” p. xiii Denigrating Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
86 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Using VAM to estimate individual teacher effects is a recent endeavor, and many of the possible sources of error have not been thoroughly evaluated in the literature.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
87 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. xix Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
88 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “This lack of attention to teachers in policy discussions may be attributed in part to another body of literature that attempted to determine the effects of specific teacher background characteristics, including credentialing status (e.g., Miller, McKenna, and McKenna, 1998; Goldhaber and Brewer, 2000) and subject matter coursework (e.g., Monk, 1994).” p. 8 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
89 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “To date, there has been little empirical exploration of the size of school effects and the sensitivity of teacher effects to modeling of school effects.” p. 78 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
90 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “There are no empirical explorations of the robustness of estimates to assumptions about prior-year schooling effects.” p. 81 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
91 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “There is currently no empirical evidence about the sensitivity of gain scores or teacher effects to such alternatives.” p. 89 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
92 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Empirical evaluations do not exist for many of the potential sources of error we have identified. Studies need to be conducted to determine how these factors contribute to estimated teacher effects and to determine the conditions that exacerbate or mitigate the impact these factors have on teacher effects.” p. 116 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
93 Laura S. Hamilton Daniel F. McCaffrey, J.R. Lockwood, Daniel M. Koretz “Although we expect missing data are likely to be pervasive, there is little systematic discussion of the extent or nature of missing data in test score databases.” p. 117 Dismissive Evaluating Value-Added Models for Teacher Accountability  Rand Corporation, 2003 https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG158.pdf Rand Corporation funders Tennessee's TVAAS value-added measurement system had been running a decade when they wrote this and did much of what these authors claim had never been done.
94 Marguerite Clarke 5 co-authors “What this study adds to the body of literature in this area is a systematic look at how impact varies with the stakes attached to the test results.” p. 91 1stness Perceived Effects of State-Mandated Testing Programs on Teaching and Learning etc. (5 co-authors) National Board on Educational Testing and Public Policy monograph, January 2003 http://files.eric.ed.gov/fulltext/ED474867.pdf Ford Foundation See, for example, Test Frequency, Stakes, and Feedback in Student Achievement: A Meta-Analysis   https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
95 Marguerite Clarke 5 co-authors “Many calls for school reform assert that high-stakes testing will foster the economic competitiveness of the U.S. However, the empirical basis for this claim is weak.” p. 96, n. 1 Denigrating Perceived Effects of State-Mandated Testing Programs on Teaching and Learning etc. (5 co-authors) National Board on Educational Testing and Public Policy monograph, January 2003 http://files.eric.ed.gov/fulltext/ED474867.pdf Ford Foundation  
96 Brian M. Stecher Laura S. Hamilton "The business model of setting clear targets, attaching incentives to the attainment of those targets, and rewarding those responsible for reaching the targets has proven successful in a wide range of business enterprises. But there is no evidence that these accountability principles will work well in an educational context, and there are many reasons to doubt that the principles can be applied without significant adaptation." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. 
97 Brian M. Stecher Laura S. Hamilton "The lack of strong evidence regarding the design and effectiveness of accountability systems hampers policymaking at a critical juncture." Denigrating Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.
98 Brian M. Stecher Laura S. Hamilton "Nonetheless, the evidence has yet to justify the expectations. The initial evidence is, at best, mixed. On the plus side, students and teachers seem to respond to the incentives created by the accountability systems …" Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.
99 Brian M. Stecher Laura S. Hamilton "Proponents of accountability attribute the improved scores in these states to clearer expectations, greater motivation on the part of the students and teachers, a focused curriculum, and more-effective instruction. However, there is little or no research to substantiate these positive changes or their effects on scores." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
100 Brian M. Stecher Laura S. Hamilton "One of the earliest studies on the effects of testing (conducted in two Arizona schools in the late 1980s) showed that teachers reduced their emphasis on important, nontested material." Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
101 Brian M. Stecher Laura S. Hamilton "Test-based accountability systems will work better if we acknowledge how little we know about them, if the federal government devotes appropriate resources to studying them, and if the states make ongoing efforts to improve them."  Dismissive Putting Theory to the Test: Systems of "Educational Accountability" Should be Held Accountable Rand Review, Spring 2002 https://www.rand.org/pubs/periodicals/rand-review/issues/rr-04-02/theory.html Rand Corporation funders See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds. 
102 Robert L. Linn Eva L. Baker “It is true that many of these accommodated test conditions are not subjected to validity studies to determine whether the construct or domain tested has been significantly altered. In part, this lack of empirical data results from restricted resources.” p. 14 Dismissive Validity Issues for Accountability Systems CSE Technical Report 585 (December 2002) http://www.cse.ucla.edu/products/reports/TR585.pdf Office of Educational Research and Improvement, US Education Department External evaluations of large-scale testing programs not only exist, but represent the norm.
103 Lauren B. Resnick Robert Rothman, Jean B. Slattery, Jennifer L. Vranek "States that have or adopt test-based accountability programs claim that their tests are aligned to their standards. But there has been, up to now, no independent methodology for checking alignment. This paper describes and illustrates such a methodology..." 1stness Benchmarking and Alignment of Standards and Testing, p.1 CSE Technical Report 566, CRESST/Achieve, May 2002 https://www.achieve.org/files/TR566.pdf Office of Educational Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Duck (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
104 Lauren B. Resnick Robert Rothman, Jean B. Slattery, Jennifer L. Vranek "Yet few, if any, states have put in place effective policies or resource systems for improving instructional quality (National Research Council, 1999)." Dismissive Benchmarking and Alignment of Standards and Testing, p.4 CSE Technical Report 566, CRESST/Achieve, May 2002 https://www.achieve.org/files/TR566.pdf Office of Educational Research and Improvement, US Education Department Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
105 Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein   "Although test-based accountability has shown some compelling results, the issues are complex, the research is new and incomplete, and many of the claims that have received the most attention have proved to be premature and superficial." Denigrating Summary, p.xiv Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
106 Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein   "The research evidence does not provide definitive information about the actual costs of testing but the information that is available suggests that expenditures for testing have grown in recent years." Dismissive Introduction, p.9 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
107 Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein   "The General Accounting Office (1993) … estimate was $516 million … The estimate does not include time for more-extensive test preparation activities." p.9 Denigrating Introduction, p.9 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation As a matter of fact the GAO report did include those costs -- all of them. The GAO surveys very explicitly instructed respondents to "include any and all costs related" to each test, including any and all test preparation time and expenses.
108 Laura S. Hamilton, Daniel M. Koretz Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "There is currently no substantial evidence on the effects of published report cards on parents’ decisionmaking or on the schools themselves." Dismissive Chapter 2: Tests and their use in test-based accountability systems, p.44 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation For decades, consulting services have existed that help parents new to a city select the right school or school district for them.
109 Vi-Nhuan Le, Stephen P. Klein Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Research on the inflation of gains remains too limited to indicate how prevalent the problem is." Dismissive Chapter 3: Technical criteria for evaluating tests, p. 68 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927); DeWeerdt (1927); French (1959); French & Dear (1959); Ortar (1960); Marron (1965); ETS (1965); Messick & Jungeblut (1981); Ellis, Konoske, Wulfeck, & Montague (1982); DerSimonian and Laird (1983); Kulik, Bangert-Drowns & Kulik (1984); Powers (1985); Jones (1986); Fraker (1986/1987); Halpin (1987); Whitla (1988); Snedecor (1989); Bond (1989); Baydar (1990); Becker (1990); Smyth (1990); Moore (1991); Alderson & Wall (1992); Powers (1993); Oren (1993); Powers & Rock (1994); Scholes, Lane (1997); Allalouf & Ben Shakhar (1998); Robb & Ercanbrack (1999); McClain (1999); Camara (1999, 2001, 2008); Stone & Lane (2000, 2003); Din & Soldan (2001); Briggs (2001); Palmer (2002); Briggs & Hansen (2004); Cankoy & Ali Tut (2005); Crocker (2005); Allensworth, Correa, & Ponisciak (2008); Domingue & Briggs (2009); Koljatic & Silva (2014); Early (2019); Herndon (2021)
110 Vi-Nhuan Le, Stephen P. Klein Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Relatively little is known about how testing accommodations affect score validity, and the few studies that have been conducted on the subject have had mixed results." Dismissive Chapter 3: Technical criteria for evaluating tests, p. 71 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation
111 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "High-stakes testing may also affect parents (e.g., their attitudes toward education, their engagement with schools, and their direct participation in their child's learning) as well as policymakers (their beliefs about system performance, their judgements about program effectiveness, and their allocation of resources). However, these issues remain largely unexamined in the literature." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 79 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
112 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "As described in chapter 2, there was little concern about the effects of testing on teaching prior to the 1970s." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 81 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation Rubbish. Entire books were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
113 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "In light of the changes that occurred in the uses of large-scale testing in the 1980s and 1990s, researchers began to investigate teachers' reactions to external assessment. The initial research on the impact of large-scale testing was conducted in the 1980s and the 1990s." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 83 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
114 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "The bulk of the research on the effects of testing has been conducted using surveys and case studies." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 83 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation This is misleading. True, many of the hundreds of studies on the effects of testing have been surveys and case studies. But, many, and more by my count, have been randomized experiments. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ;
115 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Data on the incidence of cheating [on educational tests] are scarce…" Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 96 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Actually, such data have been collected in surveys in which respondents freely admit that they cheat, and how. Moreover, news reports of cheating, by students or educators, have been voluminous. See, for example, Caveon Test Security's "Cheating in the News" section on its web site.
116 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Less is known about changes in policies at the district and school levels in response to high-stakes testing, but mixed evidence of some impact has appeared." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 96 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Relevant pre-2000 studies of the effects of testing on at-risk students, completion, dropping out, curricular offerings, attitudes, etc. include those of Schleisman (1999); the *Southern Regional Education Board (1998); Webster, Mendro, Orsak, Weerasinghe & Bembry (1997); Jones (1996); Boylan (1996); Jones, 1993; Jacobson (1992); Grisay (1991); Johnstone (1990); Task Force on Educational Assessment Programs [Florida] (1979); Wellisch, MacQueen, Carriere & Duck (1978); Enochs (1978); Pronaratna (1976); and McWilliams & Thomas (1976).
117 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Although numerous news articles have addressed the negative effects of high-stakes testing, systematic research on the subject is limited." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 98 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Relevant pre-2000 studies of the effects of testing on at-risk students, completion, dropping out, curricular offerings, attitudes, etc. include those of Schleisman (1999); the *Southern Regional Education Board (1998); Webster, Mendro, Orsak, Weerasinghe & Bembry (1997); Jones (1996); Boylan (1996); Jones, 1993; Jacobson (1992); Grisay (1991); Johnstone (1990); Task Force on Educational Assessment Programs [Florida] (1979); Wellisch, MacQueen, Carriere & Duck (1978); Enochs (1978); Pronaratna (1976); and McWilliams & Thomas (1976).
118 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Research regarding the effects of test-based accountability on equity is very limited." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 99 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation  
119 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Researchers have not documented the desirable consequences of testing … as clearly as the undesirable ones." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, p. 99 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
120 Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. " … researchers have not generally measured the extent or magnitude of the shifts in practice that they identified as a result of high-stakes testing." Dismissive Chapter 4: Consequences of large-scale, high-stakes testing on school and classroom practice, pp. 99–100 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/content/dam/rand/pubs/monograph_reports/2002/MR1554.pdf US National Science Foundation The 1993 GAO study did. See, also:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
121 Lorraine M. McDonnell Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "...this chapter can only describe the issues that are raised when one looks at testing from a political perspective. Because of the lack of systematic studies on the topic." Dismissive Chapter 5: Accountability as seen through a political lens, p.102 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
122 Lorraine M. McDonnell Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "...public opinion, as measured by surveys, does not always provide a clear and unambiguous measure of public sentiment." Denigrating Chapter 5: Accountability as seen through a political lens, p.108 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Parents and other adults are typically reached through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
123 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "So test-based accountability remains controversial because there is inadequate evidence to make clear judgments about its effectiveness in raising test scores and achieving its other goals." Dismissive Chapter 6: Improving test-based accountability, p.122 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
124 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Unfortunately, the complexity of the issues and the ambiguity of the existing research do not allow our recommendations to take the form of a practical “how-to” guide for policymakers and practitioners." Denigrating Chapter 6: Improving test-based accountability, p.123 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
125 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Additional research is needed to identify the elements of performance on tests and how these elements map onto other tests …." Denigrating Chapter 6: Improving test-based accountability, p.127 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation  
126 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Another part of the interpretive question is the need to gather information in other subject areas to portray a more complete picture of achievement. The scope of constructs that have been considered in research to date has been fairly narrow, focusing on the subjects that are part of the accountability systems that have been studied. Many legitimate instructional objectives have been ignored in the literature to date." Denigrating Chapter 6: Improving test-based accountability, p.127 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Many studies of the effects of testing predate CRESST's in the 1980s and cover all subject fields, not just reading and math. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
127 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "States should also conduct ongoing analyses of the performance of groups whose members may not be numerous enough to permit separate reporting. English-language learners and students with disabilities are increasingly being included in high-stakes testing systems, and, as discussed in Chapter Three, little is currently known about the validity of scores for these groups." Dismissive Chapter 6: Improving test-based accountability, p.131 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Difficult to believe given that the federal government has for decades generously funded research into testing students with disabilities. See, for example, https://nceo.info/ and Kurt Geisinger's and Janet Carlson's chapters in Defending Standardized Testing and Correcting Fallacies in Educational and Psychological Testing. 
128 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "It would be especially helpful to know what changes in instruction are made in response to different kinds of information and incentives. In particular, we need to know how teachers interpret information from tests and how they use it to modify instruction." Dismissive Chapter 6: Improving test-based accountability, p.133 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. 
International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
129 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "It seems clear that aligning the components of the system and providing appropriate professional development should, at a minimum, increase teachers’ political support for test-based accountability policies .... Although there is no empirical evidence to suggest that this strategy will reduce inappropriate responses to high-stakes testing,... Additional research needs to be done to determine the importance of alignment for promoting positive effects of test-based accountability." Dismissive Chapter 6: Improving test-based accountability, p.135 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  
These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
130 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "… we currently do not know enough about test-based accountability to design a system that is immune from the problems we have discussed." Dismissive Chapter 6: Improving test-based accountability, p.136 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
131 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "There is some limited evidence that educators’ responses to test based accountability vary according to the characteristics of their student populations,…" Denigrating Chapter 6: Improving test-based accountability, p.138 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation There was and is far more than "limited" evidence. See, for example:  Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
132 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "... there is very limited evidence to guide thinking about political issues." Dismissive Chapter 6: Improving test-based accountability, p.139 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Parents and other adults are typically reached.through public opinion polls. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm .  Among the hundreds of polls conducted between 1958 and 2008, a majority of them included parents in particular or adults in general.
133 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "First, we do not have an accurate assessment of the additional costs." Denigrating Chapter 6: Improving test-based accountability, p.141 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Yes, we did and we do. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
134 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "However, many of these recommended reforms are relatively inexpensive in comparison with the total cost of education. This equation is seldom examined."  Denigrating Chapter 6: Improving test-based accountability, p.141 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation Wrong. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380;  Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
135 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Part of the reason these issues are rarely considered may be that no one has produced a good estimate of the cost of an improved accountability system in comparison with its benefits." Denigrating Chapter 6: Improving test-based accountability, p.141 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
136 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "Nevertheless, our knowledge of the costs of alternative accountability systems is still somewhat limited. Policymakers need to know how much it would cost to change their current systems to be responsive to criticisms such as those described in this book. These estimates need to consider all of the associated costs, including possible opportunity costs associated with increased testing time and increased test preparation time." Dismissive Chapter 6: Improving test-based accountability, p.142 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Picus, L.O., & Tralli, A. (1998, February). Alternative assessment programs: What are the true costs? CSE Technical Report 441, Los Angeles: CRESST; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
137 Laura S. Hamilton, Brian M. Stecher Laura S. Hamilton, Brian M. Stecher, Stephen P. Klein, Eds. "However, there is still much about these systems that is not well understood. Lack of research-based knowledge about the quality of scores and the mechanisms through which high-stakes testing programs operate limits our ability to improve these systems. As a result, our discussions also identified unanswered questions..." Dismissive Chapter 6: Improving test-based accountability, p.143 Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 https://www.rand.org/pubs/monograph_reports/MR1554.html US National Science Foundation In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
138 Eva L. Baker, Robert L. Linn, Joan L. Herman, and Daniel Koretz   "Because experience with accountability systems is still developing, the standards we propose are intended to help evaluate existing systems and to guide the design of improved procedures." p.1 Dismissive Standards for Educational Accountability Systems CRESST Policy Brief 5, Winter 2002 https://www.gpo.gov/fdsys/pkg/ERIC-ED466643/pdf/ERIC-ED466643.pdf Office of Educational Research and Improvement, US Education Department See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.  Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
139 Eva L. Baker, Robert L. Linn, Joan L. Herman, and Daniel Koretz   "It is not possible at this stage in the development of accountability systems to know in advance how every element of an accountability system will actually operate in practice or what effects it will produce." p.1 Dismissive Standards for Educational Accountability Systems CRESST Policy Brief 5, Winter 2002 https://www.gpo.gov/fdsys/pkg/ERIC-ED466643/pdf/ERIC-ED466643.pdf Office of Educational Research and Improvement, US Education Department See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.  Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
140 Jay P. Heubert   "For Heubert, it is very much an open question what the effect of standards and high-stakes testing will be." p.83 Dismissive Achieving High Standards for All National Research Council   "This project was funded by grant R215U990023 from the Office of Educational Research and Improvement (OERI) of the United States Department of Education." See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
141 Ready, Timothy, Ed.; Edley, Christopher, Jr., Ed.; Snow, Catherine E., Ed.   "To be sure, there is a largely unexamined empirical assertion underlying the arguments of high-stakes proponents: attaching high-stakes consequences for the students provides an indispensable, otherwise unobtainable incentive for students, parents, and teachers to pay careful attention to learning tasks." p. 128 Dismissive Achieving High Standards for All National Research Council   "This project was funded by grant R215U990023 from the Office of Educational Research and Improvement (OERI) of the United States Department of Education." Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kaszdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heynemann, Ransom, Psacharopoulis, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
142 Daniel M. Koretz Daniel F. McCaffrey, Laura S. Hamilton "Although high-stakes testing is now widespread, methods for evaluating the validity of gains obtained under high-stakes conditions are poorly developed. This report presents an approach for evaluating the validity of inferences based on score gains on high-stakes tests. It describes the inadequacy of traditional validation approaches for validating gains under high-stakes conditions and outlines an alternative validation framework for conceptualizing meaningful and inflated score gains.", p.1 Denigrating Toward a framework for validating gains under high-stakes conditions CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 https://files.eric.ed.gov/fulltext/ED462410.pdf Office of Educational Research and Improvement, US Education Department In fact, the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927); DeWeerdt (1927); French (1959); French & Dear (1959); Ortar (1960); Marron (1965); ETS (1965); Messick & Jungeblut (1981); Ellis, Konoske, Wulfeck, & Montague (1982); DerSimonian & Laird (1983); Kulik, Bangert-Drowns & Kulik (1984); Powers (1985); Jones (1986); Fraker (1986/1987); Halpin (1987); Whitla (1988); Snedecor (1989); Bond (1989); Baydar (1990); Becker (1990); Smyth (1990); Moore (1991); Alderson & Wall (1992); Powers (1993); Oren (1993); Powers & Rock (1994); Scholes, Lane (1997); Allalouf & Ben Shakhar (1998); Robb & Ercanbrack (1999); McClain (1999); Camara (1999, 2001, 2008); Stone & Lane (2000, 2003); Din & Soldan (2001); Briggs (2001); Palmer (2002); Briggs & Hansen (2004); Cankoy & Ali Tut (2005); Crocker (2005); Allensworth, Correa, & Ponisciak (2008); Domingue & Briggs (2009); Koljatic & Silva (2014); Early (2019); Herndon (2021)
143 Daniel M. Koretz Daniel F. McCaffrey, Laura S. Hamilton "Few efforts are made to evaluate directly score gains obtained under high-stakes conditions, and conventional validation tools are not fully adequate for the task.", p. 1 Dismissive Toward a framework for validating gains under high-stakes conditions CSE Technical Report 551, CRESST/Harvard Graduate School of Education, CRESST/RAND Education, December 2001 https://files.eric.ed.gov/fulltext/ED462410.pdf Office of Research and Improvement, US Education Department In fact the test prep, or test coaching, literature is vast and dates back decades, with meta-analyses of the literature dating back at least to the 1970s. There's even a What Works Clearinghouse summary of the (post World Wide Web) college admission test prep research literature:  https://ies.ed.gov/ncee/wwc/Docs/InterventionReports/wwc_act_sat_100416.pdf . See also: Gilmore (1927)  DeWeerdt (1927)  French (1959) French & Dear (1959)  Ortar (1960)  Marron (1965)  ETS (1965). Messick & Jungeblut (1981)  Ellis, Konoske, Wulfeck, & Montague (1982)  DerSimonian and Laird (1983)  Kulik, Bangert-Drowns & Kulik (1984)  Powers (1985)  Jones (1986). Fraker (1986/1987)  Halpin (1987)  Whitla (1988)  Snedecor (1989)  Bond (1989). Baydar (1990)  Becker (1990)  Smyth (1990)  Moore (1991)  Alderson & Wall (1992)  Powers (1993)  Oren (1993). Powers & Rock (1994)  Scholes, Lane (1997)   Allalouf & Ben Shakhar (1998)  Robb & Ercanbrack (1999)  McClain (1999)  Camara (1999, 2001, 2008) Stone & Lane (2000, 2003)  Din & Soldan (2001)  Briggs (2001)  Palmer (2002)  Briggs & Hansen (2004)  Cankoy & Ali Tut (2005)  Crocker (2005)  Allensworth, Correa, & Ponisciak (2008)  Domingue & Briggs (2009)  Koljatic & Silva (2014)  Early (2019)
144 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Despite their importance and widespread use, little is known about the impact of these tests on states’ recent efforts to improve teaching and learning." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
145 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Little information about the technical soundness of teacher licensure tests appears in the published literature." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
146 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Little research exists on the extent to which licensure tests identify candidates with the knowledge and skills necessary to be minimally competent beginning teachers." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
147 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "Information is needed about the soundness and technical quality of the tests that states use to license their teachers." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.14 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
148 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "policy and practice on teacher licensure testing in the United States are nascent and evolving" Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.17 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
149 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "The paucity of data and these methodological challenges made the committee’s examination of teacher licensure testing difficult." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.17 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
150 Karen J. Mitchell, David Z. Robinson, Barbara S. Plake, & Kaeli T. Knowles (Eds.)   "There were a number of questions the committee wanted to answer but could not, either because they were beyond the scope of this study, the evidentiary base was inconclusive, or the committee’s time and resources were insufficient." Dismissive Testing Teacher Candidates: The Role of Licensure Tests in Improving Teacher Quality, 2001, p.17 Committee on Assessment and Teacher Quality   Board on Testing and Assessment, National Research Council  
151 Harold F. O’Neil, Jr., University of Southern California, CRESST Jamal Abedi, UCLA/CRESST, Charlotte Lee, UCLA/CRESST, Judy Miyoshi, UCLA/CRESST, Ann Mastergeorge, UCLA/CRESST "To our knowledge, based on an extensive literature review (to be reported elsewhere), our research group is the only one conducting research of this type; i.e., meaningful monetary incentives with released items from either NAEP or TIMSS with 12th graders." p.1 Firstness Monetary Incentives for Low-Stakes Tests, March 2001 report to USED, CRESST https://nces.ed.gov/pubs2001/2001024.pdf "The work reported herein was funded at least in part with Federal funds from the U.S. Department of Education under the American Institutes for Research (AIR)/Education Statistical Services Institute (ESSI) contract number RN95127001, Task Order 1.2.93.1, as administered by the ... NCES. The work reported herein was also supported under the Educational Research and Development Centers Program, PR/Award Number R305B60002, as administered by the Office of Educational Research and Improvement (OERI), U.S. Department of Education."
152 Marguerite Clarke George Madaus “[T]here has been no analogous infrastructure for independently evaluating a testing program before or after implementation, or for monitoring test use and impact.” p. 19 Dismissive The Adverse Impact of High Stakes Testing on Minority Students: Evidence from 100 Years of Test Data In G. Orfield and M. Kornhaber (Eds.), Raising standards or raising barriers? Inequality and high stakes testing in public education. New York: The Century Foundation (2001) http://files.eric.ed.gov/fulltext/ED450183.pdf The Century Foundation External evaluations of large-scale testing programs not only exist, but represent the norm. 
153 Marguerite Clarke George Madaus “The effects of testing are now so diverse, widespread, and serious that it is necessary to establish mechanisms for catalyzing inquiry about, and systematic independent scrutiny of them.” p. 20 Dismissive The Adverse Impact of High Stakes Testing on Minority Students: Evidence from 100 Years of Test Data In G. Orfield and M. Kornhaber (Eds.), Raising standards or raising barriers? Inequality and high stakes testing in public education. New York: The Century Foundation (2001) http://files.eric.ed.gov/fulltext/ED450183.pdf The Century Foundation External evaluations of large-scale testing programs not only exist, but represent the norm. 
154 Ron Dietel   "In the late 1980s, CRESST was among the first to research the measurement of rigorous, discipline-based knowledge for purposes of large-scale assessment." Firstness Center for Research on Evaluation, Standards, and Student Testing (CRESST), clarifying the goals and activities of CRESST EducationNews.org, November 18, 2000   Office of Research and Improvement, US Education Department Nonsense. Hundreds, perhaps thousands, of studies of the effects of testing predate CRESST's work in the 1980s. See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
155 Marguerite Clarke Madaus, Horn, and Ramos “[F]or most of this century, there has been no infrastructure for independently evaluating a testing programme before or after implementation, or for monitoring test use and impact. The commercial testing industry does not as yet have any structure in place for the regulation and monitoring of appropriate test use.” p. 177 Dismissive Retrospective on Educational Testing and Assessment in the 20th Century Curriculum Studies, 2000, vol. 32, no. 2 http://webpages.uncc.edu/~rglamber/Rsch6109%20Materials/HistoryAchTests_3958652.pdf   External evaluations of large-scale testing programs not only exist, but represent the norm. 
156 Marguerite Clarke Madaus, Horn, and Ramos “Given the paucity of evidence available on the volume of testing over time, we examined five indirect indicators of growth in testing. . . .” p. 169 Dismissive Retrospective on Educational Testing and Assessment in the 20th Century Curriculum Studies, 2000, vol. 32, no. 2 http://webpages.uncc.edu/~rglamber/Rsch6109%20Materials/HistoryAchTests_3958652.pdf   There exist many sources of such information, from the Council of Chief State School Officers (CCSSO), the US Education Department, the US General Accounting Office (GAO), for example.
157 Sheila Barron   "Although this is a topic researchers ... talk about often, very little has been written about the difficulties secondary analysts confront." p.173 Dismissive Difficulties associated with secondary analysis of NAEP data, chapter 9 Grading the Nation's Report Card, National Research Council, 2000 https://www.nap.edu/catalog/9751/grading-the-nations-report-card-research-from-the-evaluation-of National Research Council funders In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schulz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage, et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004). 
158 Sheila Barron   "...few articles have been written that specifically address the difficulties of using NAEP data." p.173 Dismissive Difficulties associated with secondary analysis of NAEP data, chapter 9 Grading the Nation's Report Card, National Research Council, 2000 https://www.nap.edu/catalog/9751/grading-the-nations-report-card-research-from-the-evaluation-of National Research Council funders In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schulz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage, et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004). 
159 Herman, Joan L.    “Testing accommodations that attempt to reduce the language load of a test or otherwise compensate for students' reduced language skills (e.g., by providing students more time) are also currently being researched, but answers that are equitable and fair for all students have not yet been found.” p. 8 Dismissive Student Assessment and Student Achievement in the California Public School System (with Brown and Baker) CSE Technical Report 519, April 2000 https://www.cse.ucla.edu/products/reports/TECH519.pdf Office of Research and Improvement, US Education Department
160 Herman, Joan L.    “Thus, the extent to which gains reflect real improvement in learning is an open question (see, e.g., Shepard, 1990).” p. 15 Dismissive Student Assessment and Student Achievement in the California Public School System (with Brown and Baker) CSE Technical Report 519, April 2000 https://www.cse.ucla.edu/products/reports/TECH519.pdf Office of Research and Improvement, US Education Department
161 R. L. Linn   "There are many reasons for the Lake Wobegon Effect, most of which are less sinister than those emphasized by Cannell." Denigrating Assessments and Accountability, p.7 Educational Researcher, March 2000, pp. 4–16. https://journals.sagepub.com/doi/abs/10.3102/0013189x029002004 Office of Research and Improvement, US Education Department No. Cannell was exactly right. The cause was corruption, lax security, and cheating. See, for example, https://nonpartisaneducation.org/Review/Articles/v6n3.htm
162 Lorrie A. Shepard   "This portrayal derives mostly from research leading to Wood and Bruner’s original conception of scaffolding, from Vygotskian theory, and from naturalistic studies of effective tutoring described next. Relatively few studies have been undertaken in which explicit feedback interventions have been tried in the context of constructivist instructional settings." Dismissive The Role of Classroom Assessment in Teaching and Learning, p.59 CSE Technical Report 517, February 2000 https://nepc.colorado.edu/sites/default/files/publications/TECH517.pdf Office of Research and Improvement, US Education Department  
163 Lorrie A. Shepard   "The NCTM and NRC visions are idealizations based on beliefs about constructivist pedagogy and reflective practice. Although both are supported by examples of individual teachers who use assessment to improve their teaching, little is known about what kinds of support would be required to help large numbers of teachers develop these strategies or to ensure that teacher education programs prepared teachers to use assessment in these ways. Research is needed to address these basic implementation questions." Dismissive The Role of Classroom Assessment in Teaching and Learning, p.64 CSE Technical Report 517, February 2000 https://nepc.colorado.edu/sites/default/files/publications/TECH517.pdf Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
164 Lorrie A. Shepard   "This social-constructivist view of classroom assessment is an idealization. The new ideas and perspectives underlying it have a basis in theory and empirical studies, but how they will work in practice and on a larger scale is not known." Dismissive The Role of Classroom Assessment in Teaching and Learning, p.67 CSE Technical Report 517, February 2000 https://nepc.colorado.edu/sites/default/files/publications/TECH517.pdf Office of Research and Improvement, US Education Department  
165 Marguerite Clarke Madaus, Pedulla, and Shore “The National Board believes that we must as a nation conduct research that helps testing contribute to student learning, classroom practice, and state and district management of school resources.” p. 2 Dismissive An Agenda for Research on Educational Testing NBETPP Statements, Vol. 1, No. 1, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456137.pdf Ford Foundation Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
166 Marguerite Clarke Madaus, Pedulla, and Shore “Validity research on teacher testing needs to address the following four issues in particular. . .” : [four bullet-point paragraphs follow] p. 3 Dismissive An Agenda for Research on Educational Testing NBETPP Statements, Vol. 1, No. 1, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456137.pdf Ford Foundation  
167 Marguerite Clarke Madaus, Pedulla, and Shore “[W]e need to understand better the relationship between testing and the diversity of the college student body.” p. 6 Dismissive An Agenda for Research on Educational Testing NBETPP Statements, Vol. 1, No. 1, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456137.pdf Ford Foundation  
168 Marguerite Clarke Haney, Madaus “We trust that further research will build on this good example and help all of us move from suggestive correlational studies towards more definitive conclusions.” p. 9 Firstness High Stakes Testing and High School Completion NBETPP Statements, Volume 1, Number 3, Jan. 2000 http://files.eric.ed.gov/fulltext/ED456139.pdf Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan(1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
169 Jay P. Heubert Robert M. Hauser "A growing body of research suggests that tests often do in fact change school and classroom practices (Corbett & Wilson, 1991; Madaus, 1988; Herman & Golan 1993; Smith & Rottenberg, 1991)." p.29 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
170 Jay P. Heubert Robert M. Hauser "A growing body of research suggests that tests often do in fact change school and classroom practices (Corbett & Wilson, 1991; Madaus, 1988; Herman & Golan 1993; Smith & Rottenberg, 1991)." p.29 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
171 Jay P. Heubert Robert M. Hauser "Most standards-based assessments have only recently been implemented or are still being developed. Consequently, it is too early to determine whether they will produce the intended effects on classroom instruction." p.36 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
172 Jay P. Heubert Robert M. Hauser "A recent review of the available research evidence by Mehrens (1998) reaches several interim conclusions. Drawing on eight studies...." p.36 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan(1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
173 Jay P. Heubert Robert M. Hauser "Although there are no national data summarizing how local districts use standardized tests in certifying students, we do know that several of the largest school systems have begun to use test scores in determining grade-to-grade promotion (Chicago) or are considering doing so (New York City, Boston)." p.37 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan(1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
174 Jay P. Heubert Robert M. Hauser "There is very little research that specifically addresses the consequences of graduation testing." p.172 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan(1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
175 Jay P. Heubert Robert M. Hauser "Catterall adds, 'initial boasts and doubts alike regarding the effects of gatekeeping competency testing have met with a paucity of follow-up research.'" p.172 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Just some of the relevant pre-2008 studies of the effects of minimum-competency or exit exams and the problems with a single passing score include those of Alvarez, Moreno, & Patrinos (2007); Grodsky & Kalogrides (2006); Audette (2005); Orlich (2003); StandardsWork (2003); Meisels, et al. (2003); Braun (2003); Rosenshine (2003); Tighe, Wang, & Foley (2002); Carnoy & Loeb (2002); Baumert & Demmrich (2001); Rosenblatt & Offer (2001); Phelps (2001); Toenjes, Dworkin, Lorence, & Hill (2000); Wenglinsky (2000); Massachusetts Finance Office (2000); DeMars (2000); Bishop (1999, 2000, 2001, & 2004); Grissmer & Flanagan(1998); Strauss, Bowes, Marks, & Plesko (1998); Frederiksen (1994); Ritchie & Thorkildsen (1994); Chao-Qun & Hui (1993); Potter & Wall (1992); Jacobson (1992); Rodgers, et al. (1991); Morris (1991); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Winfield (1987); Koffler (1987); Losack (1987); Marshall (1987); Hembree (1987); Mangino, Battaille, Washington, & Rumbaut (1986); Michigan Department of Education (1984); Ketchie (1984); Serow (1982); Indiana Education Department (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); Down(2) (1979); Wellisch (1978); and Findley (1978).
176 Jay P. Heubert Robert M. Hauser "in one of the few such studies on this topic (Bishop, 1997) compared the Third International Mathematics and Science Study (TIMSS) test scores of countries with and without rigorous graduation tests. He found that countries with demanding exit exams outperformed other countries at a comparable level of development. He concluded, however that such exams were probably not the most important determinant of achievement levels and that more research was needed." p.173 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
177 Jay P. Heubert Robert M. Hauser "Very little is known about the specific consequences of passing or failing a high school graduation exam." p.176 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
178 Jay P. Heubert Robert M. Hauser "American experience is limited and research is needed to explore their effectiveness. For instance, we do not know how to combine advance notice of high-stakes test requirements, remedial intervention, and opportunity to retake graduation tests." p.180 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Relevant pre-2000 studies of the effects of minimum-competency testing and the problems with a single passing score include those of Frederiksen (1994); Winfield (1990); Ligon, Johnstone, Brightman, Davis, et al. (1990); Losack (1987); Mangino & Babcock (1986); Serow (1982); Brunton (1982); Paramore, et al. (1980); Ogden (1979); and Findley (1978).
179 Jay P. Heubert Robert M. Hauser "Research is also needed to explore the effects of different kinds of high school credentials on employment and other post-school outcomes." p.180 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
180 Jay P. Heubert Robert M. Hauser "At the same time, solid evaluation research on the most effective remedial approaches is sparse." p.183 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Developmental (i.e., remedial) education researchers have conducted many studies to determine what works best to keep students from failing in their “courses of last resort,” after which there are no alternatives.  Researchers have included Boylan, Roueche, McCabe, Wheeler, Kulik, Bonham, Claxton, Bliss, Schonecker, Chen, Chang, and Kirk.
181 Jay P. Heubert Robert M. Hauser "There is plainly a need for good research on effective remedial education." p.183 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation Developmental (i.e., remedial) education researchers have conducted many studies to determine what works best to keep students from failing in their “courses of last resort,” after which there are no alternatives.  Researchers have included Boylan, Roueche, McCabe, Wheeler, Kulik, Bonham, Claxton, Bliss, Schonecker, Chen, Chang, and Kirk.
182 Jay P. Heubert Robert M. Hauser "However, in most of the nation, much needs to be done before a world-class curriculum and world-class instruction will be in place." p.277 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
183 Jay P. Heubert Robert M. Hauser "The committee sees a strong need for better evidence on the benefits and costs of high-stakes testing." p.281 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
184 Jay P. Heubert Robert M. Hauser "Very little is known about the specific consequences of passing or failing a high school graduation exam." p.288 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above.  Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry
185 Jay P. Heubert Robert M. Hauser "At present, however, advanced skills are often not well defined and ways of assessing them are not well established." p.289 Denigrating High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
186 Jay P. Heubert Robert M. Hauser "...in many cases, the demands that full participation of these students [i.e., students with disabilities] place on assessment systems are greater than current assessment knowledge and technology can support." p.191 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation  
187 Jay P. Heubert Robert M. Hauser "...available evidence about the possible effects of graduation tests on learning and on high school dropout is inconclusive (e.g., Kreitzer et al., 1989; Reardon, 1996; Catterall, 1990; Cawthorne, 1990; Bishop, 1997)." Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above.  Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry
188 Jay P. Heubert Robert M. Hauser "We do not know how to combine advance notice of high-stakes test requirements, remedial intervention, and opportunity to retake graduation tests. Research is also needed to explore the effects of different kinds of high school credentials on employment and other post-school outcomes." p.289 Dismissive High Stakes: Testing for Tracking, Promotion, and Graduation Board on Testing and Assessment, National Research Council, 1999 https://www.nap.edu/catalog/6336/high-stakes-testing-for-tracking-promotion-and-graduation Ford Foundation The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above.  Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry
189 Robert L. Linn   "Two obvious, but frequently ignored, cautions [from the TIERS experience] are these: . . . " p. 6 Denigrating Assessments and Accountability CSE Technical Report 490 (November 1998) http://www.cse.ucla.edu/products/Reports/TECH490.pdf Office of Research and Improvement, US Education Department  
190 Robert L. Linn   "Moreover, it is critical to recognize first that the choice of constructs matters, and so does the way in which measures are developed and linked to the constructs. Although these two points may be considered obvious, they are too often ignored." p. 13 Denigrating Assessments and Accountability CSE Technical Report 490 (November 1998) http://www.cse.ucla.edu/products/Reports/TECH490.pdf Office of Research and Improvement, US Education Department  
191 Robert L. Linn   “Although that claim is subject to debate, it seldom even gets considered when aggregate results are used either to monitor progress (e.g., NAEP) or for purposes of school, district, or state accountability.” p. 16 Dismissive Assessments and Accountability CSE Technical Report 490 (November 1998) http://www.cse.ucla.edu/products/Reports/TECH490.pdf Office of Research and Improvement, US Education Department  
192 Lawrence O. Picus Alisha Tralli "What is surprising is, given the tremendous emphasis placed on assessment systems to measure school accountability, the relatively minuscule portion of educational expenditures devoted to this important and highly visible component of the educational system." p.66 Dismissive Alternative assessment programs: What are the true costs?  CSE Technical Report 441, February 1998 https://cresst.org/publications/cresst-publication-2813/?_sf_s=441 Office of Research and Improvement, US Education Department The taxpayers ponied up big time to fund the GAO study, which Picus has spent his whole career misrepresenting, demeaning, or dismissing. By 1998, it is simply not believable that his continuing efforts stem from honest misunderstanding. He is deliberately misrepresenting previous research on the topic in order to advance his own work and career.
193 Lawrence O. Picus Alisha Tralli "In all of these analyses, except the GAO report, the cost estimates are based on the direct costs of the assessment program. The GAO is the only other organization we are aware of that has attempted to estimate the opportunity costs of personnel time, in attempting to determine the full costs of assessment programs. The GAO study, however, did not focus specifically on state assessment programs that included portfolios, an important factor in the higher cost estimates identified in the present study." p.64 Denigrating Alternative assessment programs: What are the true costs?  CSE Technical Report 441, February 1998 https://cresst.org/publications/cresst-publication-2813/?_sf_s=441 Office of Research and Improvement, US Education Department The previous 63 pages of the Picus and Tralli report claimed that theirs was the first study to look at opportunity costs and that all previous studies were "just expenditure studies" that ignored "true" opportunity costs. Then, here, on page 64, they finally admit something partly truthful about the earlier and vastly better GAO report, but also immediately attempt to demean it, because it did not estimate the costs of Vermont's doomed portfolio program, which did not exist when the GAO did its study.
194 Lawrence O. Picus Alisha Tralli "Costs and expenditures are not synonymous terms. Monk (1995) distinguishes between these two terms. Costs are “measures of what must be foregone to realize some benefit,” while expenditures are “measures of resource flows regardless of their consequence” (p. 365). Expenditures are generally easier to track since accounting systems typically report resource flows by object, e.g., instruction, administration, transportation. Typically, most cost analyses in education focus on these measurable expenditures and ignore the more difficult measures of opportunity. The goal of this report is to move one step beyond past work and estimate these economic costs as well." p.5 Denigrating Alternative assessment programs: What are the true costs?  CSE Technical Report 441, February 1998 https://cresst.org/publications/cresst-publication-2813/?_sf_s=441 Office of Research and Improvement, US Education Department No. Picus & Tralli neither did the first study of opportunity costs, nor the first study of opportunity costs in those two states. The 1993 GAO study did both. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
195 Lawrence O. Picus Alisha Tralli "Although several states have implemented new assessment programs, there has been little research on the costs of developing and implementing these new systems." p.4 Dismissive Alternative assessment programs: What are the true costs?  CSE Technical Report 441, February 1998 https://cresst.org/publications/cresst-publication-2813/?_sf_s=441 Office of Research and Improvement, US Education Department No. Picus & Tralli neither did the first study of opportunity costs, nor the first study of opportunity costs in those two states. The 1993 GAO study did both. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
196 Lawrence O. Picus Alisha Tralli "The purpose of this report is to provide a first detailed analysis of the “economic” or opportunity costs of the testing systems in two states, Kentucky and Vermont." p.2 1stness Alternative assessment programs: What are the true costs?  CSE Technical Report 441, February 1998 https://cresst.org/publications/cresst-publication-2813/?_sf_s=441 Office of Research and Improvement, US Education Department No. Picus & Tralli neither did the first study of opportunity costs, nor the first study of opportunity costs in those two states. The 1993 GAO study did both. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
197 Anne Lewis quoting Arnold Fege, National PTA "The national testing proposal is based on 'quantum leap' theories, not on research, contended Arnold Fege of the National PTA. 'As I listened to the presentations this morning,' he said, 'I didn't hear about any research that backs up the introduction of national testing.' In his opinion, 'no parent in the country is losing sleep because his or her child is not meeting NAEP standards,' and even though testing is pervasive in American education, it seems not to have made a big impact on change." Dismissive Assessing Student Achievement: Search for Validity and Balance CSE Technical Report 481 (1997) https://cresst.org/wp-content/uploads/TECH481.pdf Office of Research and Improvement, US Education Department In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schulz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage, et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004).
198 Eva L. Baker Zenaida Aguirre-Munoz "The extent and nature of the impact of language skills on performance assessments remains elusive due to the paucity of research in this area." Dismissive Improving the equity and validity of assessment-based information systems, p.3 CSE Technical Report 462, December 1997 https://cresst.org/wp-content/uploads/TECH462.pdf Office of Research and Improvement, US Education Department  
199 Joan L. Herman   "Although conceptual models for analyzing the cost of alternative assessment and for conducting cost-benefit analyses have been formulated (Catterall & Winters, 1994; Picus, 1994), definitive cost studies are yet to be completed (see, however, Picus & Tralli, forthcoming)." p. 30 Dismissive, Denigrating Large-Scale Assessment in Support of School Reform: Lessons in the Search for Alternative Measures CSE Technical Report 446, Oct. 1997 http://www.cse.ucla.edu/products/reports/TECH446.pdf Office of Research and Improvement, US Education Department No. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3) 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
200 Robert L. Linn Eva L. Baker "Very little research has been conducted to validate performance standards, particularly those that include specification of student response attributes." pp. 26-27 Dismissive Emerging Educational Standards of Performance in the United States CSE Technical Report 437 (August 1997) http://www.cse.ucla.edu/products/reports/TECH437.pdf Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
201 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "However, as d'Ydewalle (1987) has pointed out, 'clear-cut results from neat experiments on the impact of motivation on learning [or performance] do not exist.'" Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.5 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
202 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "In the educational context, most existing studies have focused on the influence of characteristics of the classroom learning environment, such as rewards, teacher feedback, goal structures, evaluation practices, on either the antecedents or consequences of motivation." Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.5 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. 
International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
203 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "Most of the studies that have compared goal orientations have examined their effects on performance during classroom learning activities rather than at the time of test taking." Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.7 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
204 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "As yet, there appear to be no published studies that investigate the direct and indirect causal paths from motivational antecedents through use of metacognitive strategies to achievement."  Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.8 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
205 Harold F. O'Neil, Jr. Brenda Sugrue, Jamal Abedi, Eva L. Baker, Shari Golan "In general, there is a need for more studies to focus on the effects on test performance of motivational antecedents (not just anxiety) introduced at the time of test taking." Dismissive Final Report of Experimental Studies on Motivation and NAEP Test Performance, p.10 CSE Technical Report 427, June 1997 https://cresst.org/wp-content/uploads/TECH427.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimuthu & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
206 Brian M. Stecher Stephen P. Klein "In contrast, relatively little has been published on the costs of such measures [performance tests] in operational programs. An Office of Technology Assessment (1992) … (Hoover and Bray) …." Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 The January 1993 GAO report on testing costs included such information. CRESST has spent a quarter century denigrating that report.
207 Brian M. Stecher Stephen P. Klein "However, empirical and observational data suggest much more needs to be done to understand what hands-on tasks actually measure. Klein et al. (1996b) … Shavelson et al. (1992) … Hamilton (1994) …." pp.9-10 Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 Article references only works by other CRESST authors and completely ignores the career-tech education literature, where such studies are most likely to be found.
208 Brian M. Stecher Stephen P. Klein "Future research will no doubt shed more light on the validity question, but for now, it is not clear how scores on hands-on performance tasks should be interpreted." p.10 Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 Article references only works by other CRESST authors and completely ignores the career-tech education literature, where such studies are most likely to be found.
209 Brian M. Stecher Stephen P. Klein "Advocates of performance assessment believe that the use of these measures will reinforce efforts to reform curriculum and instruction. … Unfortunately, there is very little research to confirm either the existence or the size of most of these potential benefits. Those few studies ... Klein (1995) ... Javonovic, Solanno-Flores, & Shavelson, 1994; Klein et al., 1996a)." p.10 Dismissive The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)   "This article is based on work supported by the National Science Foundation under Grant No. MDR-9154406." p.12 Article references only works by other CRESST authors and completely ignores the career-tech education literature, where such studies are most likely to be found.
210 Mary Lee Smith 11 others "The purpose of the research described in this report is to understand what happens in the aftermath of a change in state assessment policy that is designed to improve schools and make them more accountable to a set of common standards. Although theoretical and rhetorical works about this issue are common in the literature, empirical evidence is novel and scant." Dismissive Reforming schools by reforming assessment: Consequences of the Arizona Student Assessment Program (ASAP): Equity and teacher capacity building, p.3 CSE Technical Report 425, March 1997 https://cresst.org/wp-content/uploads/TECH425.pdf Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
211 Robert L. Linn Joan L. Herman "How much do standards-led assessments cost? Dependable estimates are difficult to obtain, in part because many of the costs associated with assessment -- the time spent by teachers in preparation, administration, and scoring -- are typically absorbed by schools' normal operations and not priced in a separate budget." p.14 Denigrating A Policymaker's Guide to Standards-Led Assessment Education Commission of the States, February 1997     The January 1993 GAO report on testing costs included such information. CRESST has spent a quarter century denigrating that report. See, for example, Phelps, R.P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380; Danitz, T. (2001, February 27). Special report: States pay $400 million for tests in 2001. Stateline.org. Pew Center for the States; Hoxby, C.M. (2002). The cost of accountability, in W. M Evers & H.J. Walberg (Eds.), School Accountability, Stanford, CA: Hoover Institution Press; U.S. GAO. (1993, January). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8. Washington, DC: US General Accounting Office; Phelps, R.P. (1998). Benefit-cost analysis of systemwide student testing, Paper presented at the annual meeting of the American Education Finance Association, Mobile, AL.
212 Robert L. Linn Joan L. Herman "None of the above estimates includes operational costs for schools, districts, or states." p.14 Denigrating A Policymaker's Guide to Standards-Led Assessment Education Commission of the States, February, 1997     The January 1993 GAO report on testing costs included such information. CRESST has spent a quarter century denigrating that report.
213 Eva L. Baker Robert L. Linn, Joan L. Herman "How do we assure accurate placement of students with varying abilities and language capabilities? There is little research to date to guide policy and practice (August, et al., 1994)." Dismissive CRESST: A Continuing Mission to Improve Educational Assessment, p.12 Evaluation Comment, Summer 1996   Office of Research and Improvement, US Education Department  
214 Eva L. Baker Robert L. Linn, Joan L. Herman "Alternative assessments are needed for these students (see Kentucky Portfolios for Special Education, Kentucky Department of Education, 1995). Although promising, there has been little or no research investigating the validity of inferences from these adaptations or alternatives." Dismissive CRESST: A Continuing Mission to Improve Educational Assessment, p.13 Evaluation Comment, Summer 1996   Office of Research and Improvement, US Education Department  
215 Eva L. Baker Robert L. Linn, Joan L. Herman "Similarly, research is needed to provide a basis for understanding the implications of using different summaries of student performance, such as group means or percentage of students meeting a standard, for measuring progress." p.15 Dismissive CRESST: A Continuing Mission to Improve Educational Assessment Evaluation Comment, Summer 1996   Office of Research and Improvement, US Education Department  
216 Robert L. Linn Daniel M. Koretz, Eva Baker “’Yet we do not have the necessary comprehensive dependable data. . . .’ (Tyler 1996a, p. 95)” p. 8 Dismissive Assessing the Validity of the National Assessment of Educational Progress CSE Technical Report 416 (June 1996) http://www.cse.ucla.edu/products/reports/TECH416.pdf Office of Research and Improvement, US Education Department In their 2009 Evaluation of NAEP for the US Education Department, Buckendahl, Davis, Plake, Sireci, Hambleton, Zenisky, & Wells (pp. 77–85) managed to find quite a lot of research on making comparisons between NAEP and state assessments: several of NAEP's own publications, Chromy (2005), Chromy, Ault, Black, & Mosquin (2007), McLaughlin (2000), Schuiz & Mitzel (2005), Sireci, Robin, Meara, Rogers, & Swaminathan (2000), Stancavage et al. (2002), Stoneberg (2007), WestEd (2002), and Wise, Le, Hoffman, & Becker (2004).
217 Robert L. Linn Daniel M. Koretz, Eva Baker "There is a need for more extended discussion and reconsideration of the approach being used to measure long-term trends." p. 21 Dismissive Assessing the Validity of the National Assessment of Educational Progress CSE Technical Report 416 (June 1996) http://www.cse.ucla.edu/products/reports/TECH416.pdf Office of Research and Improvement, US Education Department There was extended discussion and consideration. Simply put, they did not get their way because others disagreed with them.
218 Robert L. Linn Daniel M. Koretz, Eva Baker "Only a small minority of the articles that discussed achievement levels made any mention of the judgmental nature of the levels, and most of those did so only briefly." p. 27 Denigrating Assessing the Validity of the National Assessment of Educational Progress CSE Technical Report 416 (June 1996) http://www.cse.ucla.edu/products/reports/TECH416.pdf Office of Research and Improvement, US Education Department All achievement levels, just like all course grades, are set subjectively. This information was never hidden.
219 Thomas Kellaghan George F. Madaus, Anastasia Raczek "The limited evidence on the effectiveness of external, or extrinsic, rewards in education is also reviewed." p.vii Dismissive The Use of External Examinations to Improve Student Motivation American Educational Research Association monograph   "Work on this monograph was supported by Grant 910-1205-1 from the Ford Foundation." See, for example:  https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm .  This list includes 24 studies completed before 2000 whose primary focus was to measure the effect of “test-based accountability.” A few dozen more pre-2000 studies also measured the effect of test-based accountability although such was not their primary focus. Include qualitative and program evaluation studies of test-based accountability, and the count of pre-2000 studies rises into the hundreds.
220 Lawrence O. Picus Alisha Tralli, Suzanne Tacheny "Although several states have implemented new assessment programs, there has been little research on the costs of developing and implementing these new systems." p.4 Dismissive Estimating the Costs of Student Assessment in North Carolina and Kentucky: A State-Level Analysis CSE Technical Report 408 (February 1996) http://www.cse.ucla.edu/products/reports/TECH408.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
221 Lawrence O. Picus Alisha Tralli, Suzanne Tacheny "Although several states have implemented new assessment programs, there has been little research on the cost of developing and implementing these new systems." p.3 Dismissive Estimating the Costs of Student Assessment in North Carolina and Kentucky: A State-Level Analysis CSE Technical Report 408 (February 1996) http://www.cse.ucla.edu/products/reports/TECH408.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
222 Thomas Kellaghan George F. Madaus, Anastasia Raczek "At the very least, a careful analysis of relevant issues and a consideration of empirical evidence are required before reaching such a conclusion.   However, the arguments put forward by reformers are not based on such analysis or consideration. Indeed, their arguments often lack clarity, even in the terminology they use. Further, although not much research deals directly with the relationship between external examinations and motivation, ..." p.2 Dismissive, Denigrating The Use of External Examinations to Improve Student Motivation American Educational Research Association monograph   "Work on this monograph was supported by Grant 910-1205-1 from the Ford Foundation." Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
223 Thomas Kellaghan George F. Madaus, Anastasia Raczek "The final proposition in the armory of proponents of external examinations anticipates that all students at selected grades at both elementary and high school levels will take such examinations. This proposition is presumably based on the unexamined assumption that the motivational power of examinations will operate more or less the same way for students of all ages." p.10 Dismissive, Denigrating The Use of External Examinations to Improve Student Motivation American Educational Research Association monograph   "Work on this monograph was supported by Grant 910-1205-1 from the Ford Foundation." Relevant studies of the effects of tests and/or accountability program on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg, (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukjerjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
224 Robert L. Linn Eva L. Baker "Although the connection between student achievement and economic competitiveness is not well established, exhortations for higher standards of student achievement nonetheless are frequently based on the assumption of a strong connection." Dismissive What Do International Assessments Imply for World-Class Standards? Educational Evaluation and Policy Analysis, Dec. 1, 1995 https://journals.sagepub.com/doi/abs/10.3102/01623737017004405 Office of Research and Improvement, US Education Department  
225 Lawrence O. Picus   "While our understanding of how each of these assessment instruments can best be used is growing, information of their costs is virtually nonexistent." p.1 Dismissive A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
226 Lawrence O. Picus   "Research at the Center for Research on Evaluation, Standards, and Student Testing (CRESST) has found that policy makers have little information about the costs of alternative assessments, and that they are concerned about the cost trade-offs involved in using alternative assessment compared to the many other activities they feel continue to be necessary." p.1 Dismissive A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
227 Lawrence O. Picus   "A number of important issues must be resolved before accurate estimates of costs can be developed. Central among those issues is the development of a clear definition of what constitutes a cost." p.1 Denigrating A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
228 Lawrence O. Picus   "Determining the resources necessary to achieve each of these goals is, at best, a difficult task. Because of this difficulty, many analysts stop short of estimating the true cost of a program, and instead focus on the expenditures required for its implementation." pp.3-4 Denigrating A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
229 Lawrence O. Picus   "… cost analysts in education have often resorted to estimating the monetary value of the resources devoted to the program being evaluated. ... However, it is important to remember the opportunity costs that result from time commitments of individuals not directly compensated through the assessment program, such as the teachers who are required to spend time on tasks that previously did not exist or were not their responsibility. Determining the value of these opportunity costs will improve the quality of educational cost analyses dramatically." p.33 Denigrating A Conceptual Framework for Analyzing the Costs of Alternative Assessment CSE Technical Report 384 (August 1994) https://cresst.org/wp-content/uploads/TECH384.pdf Office of Research and Improvement, US Education Department The January 1993 GAO report on testing costs included such information. Picus has spent over two decades denigrating that report, both directly and by insinuation.
230 Mary Lee Smith 5 others "This study also draws on previous research on the role of mandated testing. …The question unanswered by extant research is whether assessments that differ in form from the traditional, norm- or criterion-referenced standardized tests would produce similar reactions and effects." Dismissive What Happens When the Test Mandate Changes? Results of a Multiple Case Study CSE Technical Report 380, July 1994 https://cresst.org/wp-content/uploads/TECH380.pdf Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
231 Linn, R.L.   "Evidence is also needed that the uses and interpretations are contributing to enhanced student achievement and at the same time, not producing unintended negative outcomes." p.8   Performance Assessment: Policy promises and technical measurement standards.  Educational Researcher, 23(9), 4-14, 1994 As quoted in William A. Mehrens, Consequences of Assessment: What is the Evidence?, Education Policy Analysis Archives Volume 6 Number 13 July 14, 1998,  https://epaa.asu.edu/ojs/article/view/580/ Office of Research and Improvement, US Education Department  
232 Audrey J. Noble Mary Lee Smith "Are the behaviorist beliefs underlying measurement-driven reform warranted? A small body of evidence addresses the functions of assessments from the traditional viewpoint." Dismissive Old and New Beliefs About Measurement-Driven Reform: The More Things Change, the More They Stay the Same, p.3 CSE Technical Report 373, CRESST/Arizona State University https://cresst.org/wp-content/uploads/TECH373.pdf Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
233 Audrey J. Noble Mary Lee Smith "Few empirical studies exist of the use and effects of performance testing in high-stakes environments." Dismissive Old and New Beliefs About Measurement-Driven Reform: The More Things Change, the More They Stay the Same, p.10 CSE Technical Report 373, CRESST/Arizona State University https://cresst.org/wp-content/uploads/TECH373.pdf Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
234 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Sufficient high-quality assessments must be available before their impact on educational reform can be assessed. Although interest in performance-based assessment is high, our knowledge about its quality is low." Dismissive Policy and validity prospects for performance-based assessment, 1993, p.332 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance assessments have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
235 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Moreover, few psychometric templates exist to guide the technical practices of assessment developers." Dismissive Policy and validity prospects for performance-based assessment, 1993, p.332 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance assessments have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
236 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Most of the arguments in favor of performance-based assessment ... are based on single instances, essentially hand-crafted exercises whose virtues are assumed because they have been developed by teachers or because they are thought to model good instructional practice."  Denigrating Policy and validity prospects for performance-based assessment, 1993, p.334 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
237 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Although there is a considerable literature on the problem of unit or team assessment in the military (Swezey & Salas, 1992) and in technical fields such as antisubmarine warfare (Franken, in press), no compelling solutions have been forwarded for disaggregating group or team performance into individual records, a potential problem if assessments are to be used to allocate individual access or certification." Denigrating Policy and validity prospects for performance-based assessment, 1993, p.336 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
238 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "What is the evidence in support of performance assessment? Reviews conducted of literature in military performance assessments (Baker, O’Neil, & Linn, 1990) and of literature in education (Baker, 1990b) have reported the relatively low incidence of any empirical literature in the field; less than 5% of the literature cited empirical data." Dismissive Policy and validity prospects for performance-based assessment, 1993, pp.339-340 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
239 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "To date, there is some evidence that precollegiate performance assessments result in relatively low levels of student performance in almost every subject matter area in which they have been tried. There is also emerging data from NAEP analyses (Koretz, Lewis, Skewes-Cox, & Burstein, 1992) that students differ by ethnicity in the rate at which they attempt more open-ended types of items." Dismissive Policy and validity prospects for performance-based assessment, 1993, p.341 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
240 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Research is underway attempting to address the motivational aspects of these assessments (Gearhart, Saxe, Stipek, & Hakansson, 1992; O’Neil, Sugrue, Abedi, Baker, & Golan, 1992)." Dismissive Policy and validity prospects for performance-based assessment, 1993, p.341 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
241 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "Another approach might require the reconceptualization of the unit of assessment to include both teacher and student and thereby to legitimate help of various sorts. As yet, there is little research and only occasional speculation about the degree to which new assessments will be corrupted." Dismissive Policy and validity prospects for performance-based assessment, 1993, pp.344-345 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
242 Baker, E.L. O'Neil, H.F., & Linn, R.L.  "A better research base is needed to evaluate the degree to which newly developed assessments fulfill expectations" Denigrating Policy and validity prospects for performance-based assessment, 1993, p.346 American Psychologist, 48(12), 1210-1218. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.816.7823&rep=rep1&type=pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
243 Eva L. Baker Robert L. Linn "Because performance assessments are emerging phenomena, procedures for assessing their quality are in some disorder." Denigrating The Technical Merits of Performance Assessments, p.1 CRESST Line, Special 1993 AERA Issue   Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
244 Eva L. Baker Robert L. Linn "Second, there is relatively little analysis of the sequence of technical procedures required to render assessments sound for some uses."  Dismissive The Technical Merits of Performance Assessments, p.1 CRESST Line, Special 1993 AERA Issue   Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
245 Eva L. Baker Robert L. Linn "The problem is that we cannot learn enough from the conduct of short-term instructional studies, nor can we wait for the results of longer-term instructional programs. ...We must continue to operate on faith." Denigrating The Technical Merits of Performance Assessments, p.2 CRESST Line, Special 1993 AERA Issue   Office of Research and Improvement, US Education Department Emerging? It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
246 Walter M. Haney George F. Madaus, Robert Lyons "Academics who write about educational and psychological testing similarly have given little attention to the commercial side of testing." p.9 Dismissive The Fractured Marketplace for Standardized Testing National Commission on Testing and Public Policy, Boston College, Kluwer Academic Publishers, 1993   "Finally we thank the Ford Foundation, and three present and former officials there, …"  
247 Walter M. Haney George F. Madaus, Robert Lyons "Nor is there much clear evidence on the potential distortions introduced by the Lake Wobegon phenomenon." p.231 Dismissive The Fractured Marketplace for Standardized Testing National Commission on Testing and Public Policy, Boston College, Kluwer Academic Publishers, 1993   "Finally we thank the Ford Foundation, and three present and former officials there, …" John J. Cannell's original "Lake Wobegon Effect" studies did a fine job of specifying the results, in detail.  See:  http://nonpartisaneducation.org/Review/Books/CannellBook1.htm  http://nonpartisaneducation.org/Review/Books/Cannell2.pdf
248 Robert L. Linn Vonda L. Kiplinger "Unfortunately, there have been no empirical studies to date to either support or reject the hypothesized lack of motivation generated by the NAEP testing environment, or to show whether students' performance would be improved if motivation were increased." 1stness Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department A cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
249 Robert L. Linn Vonda L. Kiplinger "Although much has been written on achievement motivation per se, there has been surprisingly little empirical research on the effects of different motivation conditions on test performance. Before examining the paucity of research on the relationship of motivation and test performance...." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department A cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
250 Robert L. Linn Vonda L. Kiplinger "Before examining the paucity of research on the relationship of motivation and test performance, we first review briefly the general literature on the relationship of motivation and achievement." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department A cornucopia of research has shown "no stakes" tests to be relatively unreliable, less reliable than high stakes tests, and to dampen student effort (see, e.g., Ackerman & Kanfer, 2009; S. M. Brown & Walberg, 1993; Cole, Bergin, & Whittaker, 2008; Eklof, 2007; Finn, 2015; Hawthorne, Bol, Pribesh, & Suh, 2015; Wise & DeMars, 2005, 2015). Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis.
251 Robert L. Linn Vonda L. Kiplinger "Prior to 1980, achievement motivation theory focused primarily on the need for achievement and the effects of test anxiety on test performance." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
252 Robert L. Linn Vonda L. Kiplinger "Despite continuing concern regarding the effects of motivation on student achievement and test performance in general, ...there has been very little empirical research on students' self-reported motivation levels or experimental manipulation of motivational conditions--until recently." Dismissive Raising the stakes of test administration: The impact on student performance on NAEP, p.3 CSE Technical Report 360, March 3, 1993 https://files.eric.ed.gov/fulltext/ED378221.pdf Office of Research and Improvement, US Education Department Relevant studies of the effects of tests and/or accountability programs on motivation and instructional practice include those of the *Southern Regional Education Board (1998); Johnson (1998); Schafer, Hultgren, Hawley, Abrams, Seubert & Mazzoni (1997); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); Tuckman & Trimble (1997); Clarke & Stephens (1996); Zigarelli (1996); Stevenson, Lee, et al. (1995); Waters, Burger & Burger (1995); Egeland (1995); Prais (1995); Tuckman (1994); Ritchie & Thorkildsen (1994); Brown & Walberg (1993); Wall & Alderson (1993); Wolf & Rapiau (1993); Eckstein & Noah (1993); Chao-Qun & Hui (1993); Plazak & Mazur (1992); Steedman (1992); Singh, Marimutha & Mukherjee (1990); *Levine & Lezotte (1990); O’Sullivan (1989); Somerset (1988); Pennycuick & Murphy (1988); Stevens (1984); Marsh (1984); Brunton (1982); Solberg (1977); Foss (1977); *Kirkland (1971); Somerset (1968); Stuit (1947); and Keys (1934).  *Covers many studies; study is a research review, research synthesis, or meta-analysis. "Others have considered the role of tests in incentive programs.  These researchers have included Homme, Csanyi, Gonzales, Rechs, O’Leary, Drabman, Kazdin, Bootzin, Staats, Cameron, Pierce, McMillan, Corcoran, and Wilson. International organizations, such as the World Bank or the Asian Development Bank, have studied the effects of testing on education programs they sponsor.  
Researchers have included Somerset, Heyneman, Ransom, Psacharopoulos, Velez, Brooke, Oxenham, Bude, Chapman, Snyder, and Pronaratna.
Moreover, the mastery learning/mastery testing experiments conducted from the 1960s through today varied incentives, frequency of tests, types of tests, and many other factors to determine the optimal structure of testing programs. Researchers included such notables as Bloom, Carroll, Keller, Block, Burns, Wentling, Anderson, Hymel, Kulik, Tierney, Cross, Okey, Guskey, Gates, and Jones."
253 Joan L. Herman   "Although the development of new alternatives is a popular idea, and many are engaged in the process, most developers of these new alternatives (with the exception of writing assessments) are at the design and prototyping stages, at some distance from having validated assessments." Dismissive Accountability and Alternative Assessment: Research and Development Issues, p.9 CSE Technical Report 348, August 1992 https://cresst.org/wp-content/uploads/TECH348.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
254 Joan L. Herman   "Yet what we know about alternative or performance-based measures is relatively small when compared to what we have yet to discover." Dismissive Accountability and Alternative Assessment: Research and Development Issues, p.9 CSE Technical Report 348, August 1992 https://cresst.org/wp-content/uploads/TECH348.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
255 Lorrie A. Shepard   "Proponents of measurement-driven instruction (MDI) argued, in the 1980s, that high-stakes tests would set clear targets thus assuring that teachers would focus greater attention on essential basic skills. Critics countered that measurement-driven instruction distorts the curriculum, .... Each side argued theoretically and from limited observations but without systematic proof of these assertions." Dismissive Will National Tests Improve Student Learning?, p.6 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department Relevant pre-2000 studies of the effects of standards, alignment, goal setting, setting reachable goals, etc. include those of Mitchell (1999); Morgan & Ramist (1998); the *Southern Regional Education Board (1998); Miles, Bishop, Collins, Fink, Gardner, Grant, Hussain, et al. (1997); the Florida Office of Program Policy Analysis (1997); Pomplun (1997); Schmoker (1996); Aguilera & Hendricks (1996); Banta, Lund, Black & Oblander (1996); Bottoms & Mikos (1995); *Bamburg & Medina (1993); Bishop (1993); the U. S. General Accounting Office (1993); Eckstein & Noah (1993); Mattsson (1993); Brown (1992); Heyneman & Ransom (1992); Whetton (1992); Anderson, Muir, Bateson, Blackmore & Rogers (1990); Csikszentmihalyi (1990); *Levine & Lezotte (1990); LaRoque & Coleman (1989); Hillocks (1987); Willingham & Morris (1986); Resnick & Resnick (1985); Ogle & Fritts (1984); *Natriello & Dornbusch (1984); Brooke & Oxenham (1984); Rentz (1979); Wellisch, MacQueen, Carriere & Dick (1978); *Rosswork (1977); Estes, Colvin & Goodwin (1976); Wood (1953); and Panlasigui & Knight (1930).
256 Lorrie A. Shepard   "The vision of curriculum-driven examinations offered by the National Education Goals Panel is inspired. However, we do not at present have the technical, curricular, or political know-how to install such a system at least not on so large a scale." Dismissive Will National Tests Improve Student Learning?, p.10 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department  
257 Lorrie A. Shepard   "Moreover, there is no evidence available about what would happen to the quality of instruction if all high-school teachers, not just those who volunteered, were required to teach to the AP curricula." Dismissive Will National Tests Improve Student Learning?, p.10 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department  
258 Lorrie A. Shepard   "Research evidence on the effects of traditional standardized tests when used as high-stakes accountability instruments is strikingly negative." Dismissive Will National Tests Improve Student Learning?, pp.15-16 CSE Technical Report 342, April 1992 https://files.eric.ed.gov/fulltext/ED348382.pdf Office of Research and Improvement, US Education Department In fact, the evidence "that testing can improve education" is voluminous. See, for example, Phelps, R. P. (2005). The rich, robust research literature on testing’s achievement benefits. In R. P. Phelps (Ed.), Defending standardized testing (pp. 55–90). Mahwah, NJ: Psychology Press. Or, see https://journals.sagepub.com/doi/abs/10.1177/0193841X19865628#abstract
259 Joan L. Herman Shari Golan "Using greater technical rigor, Linn et al. (1989) replicated Cannell's findings, but moved beyond them in identifying underlying causes for such seemingly spurious results, among them the age of norms." pp.10-11 Denigrating Effects of Standardized Testing on Teachers and Learning—Another Look CSE Report No. 334 https://eric.ed.gov/?id=ED341738 Office of Research and Improvement, US Education Department No. Cannell was exactly right. The cause was corruption, lax security, and cheating. See, for example, https://nonpartisaneducation.org/Review/Articles/v6n3.htm
260 R.J. Dietel, J.L. Herman, and R.A. Knuth   "Although there is now great excitement about performance-based assessment, we still know relatively little about methods for designing and validating such assessments. CRESST is one of many organizations and schools researching the promises and realities of such assessments." p.3 Dismissive What Does Research Say About Assessment? North Central Regional Education Laboratory, 1991 http://methodenpool.uni-koeln.de/portfolio/What%20Does%20Research%20Say%20About%20Assessment.htm Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
261 R.J. Dietel, J.L. Herman, and R.A. Knuth   "What we know about performance-based assessment is limited and there are many issues yet to be resolved." p.6 Dismissive What Does Research Say About Assessment? North Central Regional Education Laboratory, 1991 http://methodenpool.uni-koeln.de/portfolio/What%20Does%20Research%20Say%20About%20Assessment.htm Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Open-ended item formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
262 Mary Lee Smith Carole Edelsky, Kelly Draper, Claire Rottenberg, Meredith Cherland "The research literature on the effects of external testing is small but growing." p.3 Dismissive The Role of Testing in Elementary Schools CSE Technical Report 321, May 1991 https://cresst.org/wp-content/uploads/TECH334.pdf Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
263 Mary Lee Smith Carole Edelsky, Kelly Draper, Claire Rottenberg, Meredith Cherland "Past researchers have not examined the classroom directly for traces of testing effects." p.5 Dismissive The Role of Testing in Elementary Schools CSE Technical Report 321, May 1991 https://cresst.org/wp-content/uploads/TECH334.pdf Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
264 Eva L. Baker   "Knowledge Base: Paltry But Sure to Improve: At the same time that interest in alternative assessment is high, our knowledge about the design, distribution, quality and impact of such efforts is low. This is a time of tingling metaphor, cottage industry, and existence proofs rather than carefully designed research and development." Dismissive What Probably Works in Alternative Assessment, p.2 Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991)  https://files.eric.ed.gov/fulltext/ED512658.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
265 Eva L. Baker   "Moreover, because psychometric methods appropriate for dealing with such new measures are not readily available, nor even a matter of common agreement, no clear templates exist to guide the technical practices of alternative assessment developers (Linn, Baker, Dunbar, 1991)." Dismissive What Probably Works in Alternative Assessment, p.2 Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991)  https://files.eric.ed.gov/fulltext/ED512658.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
266 Eva L. Baker   "Given that the level of empirical work is so obviously low, one well might wonder what these studies are about." Denigrating What Probably Works in Alternative Assessment, p.3 Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991)  https://files.eric.ed.gov/fulltext/ED512658.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
267 Eva L. Baker   "Despite this fragile research base, alternative assessment has already taken off. What issues can we anticipate being raised by relevant communities about the value of these efforts?" Dismissive What Probably Works in Alternative Assessment, p.6 Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991)  https://files.eric.ed.gov/fulltext/ED512658.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
268 Eva L. Baker   "This phenomenon may be due to lack of coherent specifications of the performance task domain, lack of coherent instructional experience, or the inherent instability of more complex performance? Until some insight on this phenomenon can be developed, however, using a single performance assessment for individual student decisions is a scary prospect." Dismissive What Probably Works in Alternative Assessment, p.7 Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991)  https://files.eric.ed.gov/fulltext/ED512658.pdf Office of Research and Improvement, US Education Department It is selected-response item formats (e.g., multiple choice) that are new. Performance and authentic test formats have been with us for millennia. And, thousands of research, evaluation, and validity studies have been conducted on them.
269 Lorrie A. Shepard Catherine Cutts Dougherty "Evidence to support the positive claims for measurement-driven instruction comes primarily from high-stakes tests themselves. For example, Popham, Cruse, Rankin, Sandifer, and Williams (1985) and Popham (1987) pointed to the steeply rising passing rates on minimum competency tests as demonstrations that MDI had improved student learning." p.2 Denigrating Effect of High-Stakes Testing on Instruction Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991) and the National Council on Measurement in Education (Chicago, IL, April 4-6,1991) https://files.eric.ed.gov/fulltext/ED337468.pdf Office of Research and Improvement, US Education Department The many studies of district and state minimum competency or diploma testing programs popular from the 1960s through the 1980s found positive effects for students just below the cut score and mixed effects for students far below and anywhere above.  Researchers have included Fincher, Jackson, Battiste, Corcoran, Jacobsen, Tanner, Boylan, Saxon, Anderson, Muir, Bateson, Blackmore, Rogers, Zigarelli, Schafer, Hultgren, Hawley, Abrams, Seubert, Mazzoni, Brookhart, Mendro, Herrick, Webster, Orsack, Weerasinghe, and Bembry
270 Lorrie A. Shepard Catherine Cutts Dougherty "Evidence documenting the negative influence on instruction is limited to a few studies. Darling-Hammond and Wise (1985) reported that teachers in their study were pressured to 'teach to the test.'" Dismissive Effect of High-Stakes Testing on Instruction Paper presented at the Annual Meetings of the American Educational Research Association (Chicago, IL, April 3-7, 1991) and the National Council on Measurement in Education (Chicago, IL, April 4-6,1991) https://files.eric.ed.gov/fulltext/ED337468.pdf Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
271 Daniel M. Koretz Robert L. Linn, Stephen Dunbar, Lorrie A. Shepard “Evidence relevant to this debate has been limited.” p. 2 Dismissive The Effects of High-Stakes Testing On Achievement: Preliminary Findings About Generalization Across Tests  Originally presented at the annual meeting of the AERA and the NCME, Chicago, April 5, 1991 http://nepc.colorado.edu/files/HighStakesTesting.pdf Office of Research and Improvement, US Education Department See, for example, https://www.tandfonline.com/doi/full/10.1080/15305058.2011.602920 ; https://nonpartisaneducation.org/Review/Resources/QuantitativeList.htm ; https://nonpartisaneducation.org/Review/Resources/SurveyList.htm ; https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm
272 James S. Catterall   "Before proceeding, readers should note that the observations do not result from an accumulated weight of in-depth cost-benefit type studies, since no such weight has been registered." p.2 Dismissive Estimating the Costs and Benefits of Large-Scale Assessments: Lessons from Recent Research CSE Report No. 319, 1990 https://cresst.org/wp-content/uploads/TECH319.pdf Office of Research and Improvement, US Education Department  
273 James S. Catterall   "The points tend to build on the small number of interesting developments reported (particularly Shepard & Kreitzer, 1987a, 1987b; Solmon & Fagnano, in press), as well as on the author's experiences in conducting cost-benefit type analyses of educational assessment practices (Catterall, 1984, 1989). We also base inferences on the paucity of research itself." p.2 Dismissive Estimating the Costs and Benefits of Large-Scale Assessments: Lessons from Recent Research CSE Report No. 319, 1990 https://cresst.org/wp-content/uploads/TECH319.pdf Office of Research and Improvement, US Education Department  
274 Hartigan, J. A., & Wigdor, A. K.   "The empirical evidence cited for the standard deviation of worker productivity is quite slight." p.239 Dismissive Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
275 Hartigan, J. A., & Wigdor, A. K.   "Some fragmentary confirming evidence that supports this point of view can be found in Hunter et al. (1988)... We regard the Hunter and Schmidt assumption as plausible but note that there is very little evidence about the nature of the relationship of ability to output." p.243 Dismissive Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
276 Hartigan, J. A., & Wigdor, A. K.   "It is also important to remember that the most important assumptions of the Hunter-Schmidt models rest on a very slim empirical foundation .... Hunter and Schmidt's economy-wide models are based on simple assumptions for which the empirical evidence is slight." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
277 Hartigan, J. A., & Wigdor, A. K.   "It is important to remember that the most important assumptions of the Hunter-Schmidt models rest on a very slim empirical foundation." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
278 Hartigan, J. A., & Wigdor, A. K.   "Hunter and Schmidt's economy wide models are based on simple assumptions for which the empirical evidence is slight." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
279 Hartigan, J. A., & Wigdor, A. K.   "That assumption is supported by only a very few studies." p.245 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
280 Hartigan, J. A., & Wigdor, A. K.   "There is no well-developed body of evidence from which to estimate the aggregate effects of better personnel selection...we have seen no empirical evidence that any of them provide an adequate basis for estimating the aggregate economic effects of implementing the VG-GATB on a nationwide basis." p.247 Dismissive, Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
281 Hartigan, J. A., & Wigdor, A. K.   "Furthermore, given the state of scientific knowledge, we do not believe that realistic dollar estimates of aggregate gains from improved selection are even possible." p.248 Dismissive Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
282 Hartigan, J. A., & Wigdor, A. K.   "...primitive state of knowledge..." p.248 Denigrating Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery.  Washington, DC: National Academy Press, 1989 https://www.nap.edu/catalog/1338/fairness-in-employment-testing-validity-generalization-minority-issues-and-the National Research Council funders See, for example, The National Research Council’s Testing Expertise,  http://supp.apa.org/books/Correcting-Fallacies/appendix-d.pdf
283 Joan L. Herman, Donald W. Dorr-Bremme Walter E. Hathaway, Ed. "Despite the controversy and the important issues that it raises, little information has been forthcoming on the nature of testing as it is actually used in the schools. What functions do tests serve in the classrooms? How do teachers and principals use test results? What kinds of tests do principals and teachers trust and rely on most? These and similar questions have gone largely unaddressed." p.8 Dismissive Uses of Testing in the Schools: A National Profile Testing in the Schools, New Directions for Testing and Measurement #19, Jossey-Bass, September 1983   Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
284 Joan L. Herman, Donald W. Dorr-Bremme Walter E. Hathaway, Ed. "A few studies have indicated teachers' circumspect attitudes toward and limited use of one type of achievement measure, the norm-referenced test. Beyond this, however, the landscape of test uses in American schools has remained largely unexplored." p.8 Dismissive Uses of Testing in the Schools: A National Profile Testing in the Schools, New Directions for Testing and Measurement #19, Jossey-Bass, September 1983   Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
285 Joan L. Herman, Donald W. Dorr-Bremme Walter E. Hathaway, Ed. "We know very little about the quality of teacher-developed tests." p.15 Dismissive Uses of Testing in the Schools: A National Profile Testing in the Schools, New Directions for Testing and Measurement #19, Jossey-Bass, September 1983   Office of Research and Improvement, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
286 Don Dorr-Bremme James Catterall "Relatively little is known about students' attitudes and feelings toward assessment in general. Even less is known regarding their feelings about different forms of assessment." p.48-1 Dismissive Costs of Testing: Test Use Project CSE Report, November 1982 https://files.eric.ed.gov/fulltext/ED224835.pdf National Institute of Education, US Education Department See https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm for a list of 19 pre-1982 qualitative studies of student attitudes toward testing.
287 Don Dorr-Bremme James Catterall "in light of these few and certainly non-definitive findings, student interviews were undertaken to explore the affective valence that different forms of achievement assessment have for students." p.48-2 Dismissive Costs of Testing: Test Use Project CSE Report, November 1982 https://files.eric.ed.gov/fulltext/ED224835.pdf National Institute of Education, US Education Department See https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm for a list of 19 pre-1982 qualitative studies of student attitudes toward testing.
288 Don Dorr-Bremme James Catterall "Because of the small sample size and the paucity of research in this topic, these findings suggest potential avenues for research as much as they provide information." p.48-26 Dismissive Costs of Testing: Test Use Project CSE Report, November 1982 https://files.eric.ed.gov/fulltext/ED224835.pdf National Institute of Education, US Education Department See https://nonpartisaneducation.org/Review/Resources/QualitativeList.htm for a list of 19 pre-1982 qualitative studies of student attitudes toward testing.
289 Jennie P. Yeh Joan L. Herman "Testing in American schools is increasing in both scope and visibility. … What return are we getting for this quite considerable investment? Little information is available. How are tests used in schools? What functions do tests serve in classrooms?", p.1 Dismissive Teachers and testing: A survey of test use CSE Report No. 166, 1981 https://files.eric.ed.gov/fulltext/ED218336.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
290 Joan L. Herman James Burry, Don Dorr-Bremme, Charlotte M. Lazar-Morrison, James D. Lehman, Jennie P. Yeh "Despite the great controversy that surrounds testing and its potential uses and abuses, there is little empirical information available about the nature of testing as it actually occurs and is used (or not used) in schools. The Test Use Project at the Center for the Study of Evaluation seeks to fill this gap and answer basic questions about tests and schooling.", p.2 Dismissive Teaching and testing: Allies or adversaries CSE Report No. 165, 1981 https://files.eric.ed.gov/fulltext/ED218336.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
291 Joan L. Herman James Burry, Don Dorr-Bremme, Charlotte M. Lazar-Morrison, James D. Lehman, Jennie P. Yeh "Clearly the policy toward testing in this country has been one of accretion, but the full magnitude is undocumented. The CSE Test Use Project ... ", p.2 Dismissive Teaching and testing: Allies or adversaries CSE Report No. 165, 1981 https://files.eric.ed.gov/fulltext/ED218336.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
292 James Burry   "As instructional considerations have come into prominence, the dialogue over testing has become somewhat adversarial, with a great deal of the recent literature forming a series of position papers espousing the value of one kind of test over another, but offering little empirical data (Lazar-Morrison, Polin, Moy, & Burry, 1980)." p.27 Dismissive The Design of Testing Programs with Multiple and Complimentary Uses Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
293 James Burry   "This paper makes a preliminary step toward explicating school peoples' points of view about the kinds of assessment that are useful for external accountability concerns and for instructional decision making." pp.27-28 1stness The Design of Testing Programs with Multiple and Complimentary Uses Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
294 Joan L. Herman Jennie Yeh "Despite the great controversy that surrounds testing and its potential uses and abuses, there is little empirical information available about the nature of testing as it actually occurs and is used (or not used) in schools. The Test Use Project …." p.2 Dismissive Contextual Examination of Test Use: The Test, The Setting, The Cost Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
295 Joan L. Herman Jennie Yeh "Clearly the policy toward testing in this country has been one of accretion, but the full magnitude is undocumented. The CSE Test Use Project ... ", p.2 Dismissive Contextual Examination of Test Use: The Test, The Setting, The Cost Paper presented at the Annual Meeting of the National Council on Measurement in Education (Los Angeles, CA, April 1981) https://files.eric.ed.gov/fulltext/ED218337.pdf National Institute of Education, US Education Department Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
296 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "There is little research-based information about current testing practice." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
297 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Almost ten years ago, Kirkland (1971) reviewed the literature on test impact on students and schools and found that while much had been written about tests, few empirical studies were evident."  Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
298 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "What is significant about [Kirkland's] exclusions is the correct observation that these issues are 'implications,' often not founded on empirical research."  Denigrating A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
299 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Today, there still remains a plethora of publications on these very issues and a dearth of empirical support on actual test use practices." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
300 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Kirkland's review of the literature is concentrated mainly upon the social and psychological issues in testing, more than upon instructional issues. Also, then as now, little empirical research had accumulated on the latter." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
301 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Only recently has the testing dialogue begun to move away from social and psychological issues ...and begun to focus on the instructional issues of testing." Dismissive A review of the literature on test use, p.3 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
302 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry " ...the testing dialogue has taken the form of a debate, with the bulk of the test literature being a series of position papers citing little empirical data. This debate is being carried on predominantly by people outside the schools." Denigrating A review of the literature on test use, p.4 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
303 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "There is little empirical research available that can answer the questions that have arisen." Dismissive A review of the literature on test use, p.5 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
304 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "... little is known about the amount of other testing that takes place."  Dismissive A review of the literature on test use, p.6 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
305 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Although much has been written about minimum competency issues, there has yet to be any report of the actual uses or extent of the use of competency-based tests." Dismissive A review of the literature on test use, p.7 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
306 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Virtually nothing is known about the amount of testing taking place using other types of assessments." Dismissive A review of the literature on test use, p.7 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
307 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The literature on curriculum-embedded tests is equally scant." Dismissive A review of the literature on test use, p.8 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
308 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The current information focuses on norm- and criterion-referenced tests with some emphasis on minimum competency testing. Since literature on the other evaluative processes is lacking, there is a great need to look at various types of assessments to determine the purposes they serve." Dismissive A review of the literature on test use, p.9 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
309 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The kinds of contextual factors which influence testing and the use of test results are just beginning to be appreciated." Dismissive A review of the literature on test use, p.9 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
310 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Concern exists about the level of teacher training in testing. ... The literature does not appear to reflect any great follow-up to such suggestions [regarding teacher competence with testing]." Dismissive A review of the literature on test use, p.9 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
311 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "All of the studies mentioned included information about standardized achievement testing. As of yet, there is no evidence about how teacher attitudes toward other types of tests affect the use of those assessments." Dismissive A review of the literature on test use, p.19 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
312 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The effect of the actual testing environment on test use is only beginning to emerge. Evidence suggests that characteristics of the test-takers and the instructional environment need to be explored." Dismissive A review of the literature on test use, p.19 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
313 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "These factors have been considered in research on teachers' instructional decision-making or in studies of the social or organizational qualities of the classroom. The investigation of these variables as factors affecting teachers' use of tests and test data is minimal." Dismissive A review of the literature on test use, p.20 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
314 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "In the community, parent involvement, accountability pressures, and news media coverage of test scores are possible influences on the nature and amount of testing, but they have yet to be researched." Dismissive A review of the literature on test use, p.20 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
315 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "We know very little about the costs of testing." Dismissive A review of the literature on test use, p.20 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
316 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Little information is available about these types of costs, and the little information that is available concerns teachers and student attitudes." Dismissive A review of the literature on test use, p.22 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
317 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The question of whether test scores affect a student's self-concept has also been raised. ... As indicated previously, information on any of the aforementioned issues is scant." Dismissive A review of the literature on test use, p.23 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
318 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Other evidence suggests that tests of many types are being administered and the results are being utilized. To what extent this is occurring is not specifically known." Dismissive A review of the literature on test use, pp.23-24 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
319 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "There are a number of areas concerning teachers and testing for which there is no information." Dismissive A review of the literature on test use, p.24 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
320 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The impact of other testing must also be considered. In-class assessments made by individual teachers have yet to be examined in depth." Dismissive A review of the literature on test use, p.24 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
321 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "Teachers place greater reliance on, and have more confidence in, the results of their own judgments of students' performance, but little is known about the kinds of activities that give voice to this information." Dismissive A review of the literature on test use, p.25 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
322 Charlotte Lazar-Morrison Linda Polin, Raymond Moy, James Burry "The settings and factors which affect the use of tests and their results is yet another uninformed area." Dismissive A review of the literature on test use, p.25 CSE Report No. 144, August 1980 https://cresst.org/publications/cresst-publication-2531/ National Institute of Education, US Department of Health and Human Services Rubbish. Entire books dating back a century were written on the topic, for example:  C.C. Ross, Measurement in Today’s Schools, 1942;  G.M. Ruch, G.D. Stoddard, Tests and Measurements in High School Instruction, 1927;  C.W. Odell, Educational Measurement in High School, 1930. Other testimonies to the abundance of educational testing and empirical research on test use starting in the first half of the twentieth century can be found in Lincoln & Workman 1936, 4, 7; Butts 1947, 605; Monroe 1950, 1461; Holman & Docter 1972, 34; Tyack 1974, 183; and Lohman 1997, 88.
                   
  IRONIES:                
  Michael J. Feuer   "To challenge authority is to hold authority accountable. Challenging people in power requires them to show that what they are doing is legitimate; we invite them to rise to the challenge and prove their case; and they, in turn, trust that the system will treat them fairly."   Measuring Accountability When Trust Is Conditional Education Week, September 24, 2012 https://www.edweek.org/ew/articles/2012/09/24/05feuer_ep.h32.html?print=1    
  Michael J. Feuer   "No profession is granted automatic autonomy or an exemption from evaluation."   Measuring Accountability When Trust Is Conditional Education Week, September 24, 2012 https://www.edweek.org/ew/articles/2012/09/24/05feuer_ep.h32.html?print=1    
  Laura S. Hamilton Brian M. Stecher, Stephen P. Klein "Greater knowledge about testing and accountability can lead to better system design and more-effective system management." p.xiv   Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Summary, p.xiv      
  Laura S. Hamilton Brian M. Stecher "Incremental improvements to existing systems, based on current research on testing and accountability, should be combined with long-term research and development efforts that may ultimately lead to a major redesign of these systems. Success in this endeavor will require the thoughtful engagement of educators, policymakers, and researchers in discussions and debates about tests and testing policies."   Making Sense of Test-Based Accountability in Education, Rand Corporation, 2002 Chapter 6, Improving test-based accountability, pp.143-144      
  Brian M. Stecher Stephen P. Klein "Additional information about the impact of performance assessments on curriculum and instruction would provide policymakers with valuable data on the benefits that may accrue from this relatively expensive form of assessment." p.11   The Cost of Science Performance Assessments in Large-Scale Testing Programs, p.1 Educational Evaluation and Policy Analysis, Spring 1997, 19(1)      
  Eva L. Baker Robert L. Linn, Joan L. Herman "Diverse perspectives are needed to clarify real differences and to find equitable, workable balances."   CRESST: A Continuing Mission to Improve Educational Assessment, p.13 Evaluation Comment, Summer 1996      
  Eva L. Baker Robert L. Linn, Joan L. Herman "Impartiality, not advocacy, is the key to the credibility of research and development."   CRESST: A Continuing Mission to Improve Educational Assessment, p.13 Evaluation Comment, Summer 1996      
  Madaus, G.F.   "too often policy debates emphasize only one side or the other of the testing effects coin"   The effects of important tests on students: Implications for a National Examination System, 1991 Phi Delta Kappan, 73(3), 226-231. As quoted in William A. Mehrens, Consequences of Assessment: What is the Evidence?, Education Policy Analysis Archives Volume 6 Number 13 July 14, 1998,  https://epaa.asu.edu/ojs/article/view/580/    
                   
      Author cites (and accepts as fact without checking) someone else's dismissive review            
      Authors cite themselves or colleagues in their group, but dismiss or denigrate all other work            
      Falsely claim that research has only recently been done on the topic            
1) [as of July 4, 2021] SCOPE funders include: Bill & Melinda Gates Foundation; California Education Policy Fund; Carnegie Corporation of New York; Center for American Progress; Community Education Fund, Silicon Valley Community Foundation; Ford Foundation; James Irvine Foundation; Joyce Foundation; Justice Matters; Learning Forward; Metlife Foundation; National Center on Education and the Economy; National Education Association; National Public Education Support Fund; Nellie Mae Education Foundation; NoVo Foundation; Rose Foundation; S. D. Bechtel, Jr. Foundation; San Francisco Foundation; Sandler Foundation; Silver Giving Foundation; Spencer Foundation; Stanford University; Stuart Foundation; The Wallace Foundation; William and Flora Hewlett Foundation; William T. Grant Foundation