Should we switch from mandated “standardized” tests to mandated “performance” tests?

Sandra Stotsky, August 1, 2019

According to many education writers in this country, there are no tests in Finnish schools, at least no “mandated standardized tests.” That phrase was carefully hammered out by Smithsonian Magazine to exclude the many no- or low-stakes “norm-referenced” tests (like the Iowa Test of Basic Skills, or ITBS) that have been given for decades across this country, especially in the elementary grades, to help school administrators understand where their students’ achievement fell under a “normal curve” distribution of test scores. https://thefederalist.com/2014/09/24/top-ten-things-parents-hate-about-common-core/ https://www.smithsonianmag.com/innovation/why-are-finlands-schools-successful-49859555/

Yet a prominent Finnish educator tells us that Finnish teachers regularly test their upper-grade students. https://pioneerinstitute.org/news/the-serpent-in-finlands-garden-of-equityessay-review-of-finnish-lessons-what-can-the-world-learnfrom-educational-change-in-finland-by-pasi-sahlberg/ As Finnish educator Pasi Sahlberg noted (p. 25), teachers assess student achievement in the upper secondary school at the end of each six- to seven-week period, or five or six times per subject per school year. There are lots of tests in Finnish schools, it seems, but they are mainly teacher-made tests (not state-wide tests) of what those teachers have taught. There are also “matriculation” tests at the end of high school (as the Smithsonian article admits), but these are voluntary; only students who want to go on to a Finnish university take them. In short, there are plenty of tests for Finnish students, just not in the grades where American students are heavily tested (the elementary and middle grades), and not constructed by a testing company.

Why should Americans now be even more interested in the topic of testing than ever before? Mainly because a groundswell seems to be developing for “performance” tests in place of “standardized” tests. They are called “assessments,” perhaps to make parents and teachers think they are not those dreaded tests mandated by state boards of education for grades 3-8 and beyond as part of the Every Student Succeeds Act (ESSA). Who wouldn’t want a test that “accurately measures one or more specific course standards” and is also “complex, authentic, process and/or product-oriented, and open-ended”? Edutopia’s writer, Patricia Hilliard, doesn’t tell us in her 2015 blog “Performance-Based Assessment: Reviewing the Basics” whether it also brushes our hair and shines our shoes at the same time. https://www.edutopia.org/blog/performance-based-assessment-reviewing-basics-patricia-hilliard

It’s as if our problem were simply the type of test that states have been giving, not what is tested, nor the cost or the amount of time teachers and students spend on testing. It doesn’t take much browsing online to discover that two states, Vermont and Kentucky, have already found deep problems with those tests, too.

An old government publication (1993) warned readers about some of the problems with portfolios: “Users need to pay close attention to technical and equity issues to ensure that the assessments are fair to all students.” https://www2.ed.gov/pubs/OR/ConsumerGuides/admuses.html It turns out that portfolios are not good for high-stakes assessment, for a range of important reasons: in a nutshell, they are costly, time-consuming, and unreliable. One of the researchers who evaluated the Vermont initiative concluded: “The Vermont experience demonstrates the need to set realistic expectations for the short-term success of performance-assessment programs and to acknowledge the large costs of these programs.” A summary of that research notes that the evaluators “found the reliability of the scoring by teachers to be very low in both subjects… Disagreement among scorers alone accounts for much of the variance in scores and therefore invalidates any comparisons of scores.” https://www.ernweb.com/educational-research-articles/preliminary-results-of-a-large-scale-portfolio-assessment-program/ https://eric.ed.gov/?id=EJ598325

Validity and reliability are the two central qualities needed in a test. Indeed, the first two chapters of the testing industry’s “bible,” The Standards for Educational and Psychological Testing are devoted to those two topics. https://www.apa.org/science/programs/testing/standards
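Reliability can also be quantified. One common index of inter-rater agreement for categorical scores, such as portfolio ratings, is Cohen’s kappa, which corrects raw agreement for the agreement two raters would reach by chance alone. The sketch below uses hypothetical 1-4 portfolio scores (not Vermont data) to show how easily a kappa can come out too low to support high-stakes comparisons:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters' categorical scores,
    corrected for the agreement expected by chance alone."""
    n = len(rater_a)
    # Proportion of cases where the two raters agree outright
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal score frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    chance = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n ** 2
    return (observed - chance) / (1 - chance)

# Hypothetical 1-4 portfolio scores from two independent raters
rater_a = [1, 2, 2, 3, 4, 2, 3, 1, 4, 2]
rater_b = [2, 2, 3, 3, 4, 1, 2, 1, 3, 2]
print(round(cohens_kappa(rater_a, rater_b), 2))  # → 0.31
```

A kappa near 0.3, as here, signals only weak agreement; scores that depend that heavily on who did the scoring cannot fairly rank students, schools, or districts.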

We learned even more from a book chapter by education professor George K. Cunningham on the “failed accountability system” in Kentucky. http://education-consumers.org/pdf/Cunningham2.pdf One of Cunningham’s most astute observations is the following:

Historically, the purpose of instruction in this country has been increasing student academic achievement. This is not the purpose of progressive education, which prefers to be judged by standards other than student academic performance. The Kentucky reform presents a paradox, a system structured to require increasing levels of academic performance while supporting a set of instructional methods that are hostile to the idea of increased academic performance (pp. 264-65).

That is still the dilemma today—skills-oriented standards assessed by “standardized” tests that require, for the sake of a reliable assessment, some multiple-choice questions.

Cunningham also warned, in the conclusion to his long chapter on Kentucky, about using performance assessments for large-scale assessment (p. 288). “The Performance Events were expensive and presented many logistical headaches.” In addition, he noted:

The biggest problem with using performance assessments in a standards-based accountability system, other than poor reliability, is the impossibility of equating forms longitudinally from year to year or horizontally with other forms of assessment. In Kentucky, because of the amount of time required, each student participated in only one performance assessment task. As a result, items could never be reused from year to year because of the likelihood that students would remember the tasks and their responses. This made equating almost impossible.
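To see why non-reusable tasks defeat equating, consider the simplest method, linear equating, which maps a score on this year’s form onto last year’s scale by matching the means and standard deviations of the two score distributions. The sketch below uses hypothetical score distributions (not Kentucky data); the catch is that any such method presupposes a link between forms, such as common items or randomly equivalent groups, and when each form is a single task that can never be reused, no such link exists:

```python
import statistics as st

def linear_equate(score_y, form_x_scores, form_y_scores):
    """Map a raw score on Form Y onto Form X's scale by matching
    the means and standard deviations of the two distributions."""
    mx, my = st.mean(form_x_scores), st.mean(form_y_scores)
    sx, sy = st.stdev(form_x_scores), st.stdev(form_y_scores)
    return mx + (sx / sy) * (score_y - my)

# Hypothetical score distributions from two test forms
form_x = [10, 12, 14, 16, 18]   # last year's form
form_y = [5, 6, 7, 8, 9]        # this year's form
# A top Form Y score of 9 maps to roughly 18 on Form X's scale
print(linear_equate(9, form_x, form_y))
```

The arithmetic is trivial; the hard part is justifying it. Without common items or equivalent examinee groups, a score difference between forms could reflect either a real change in achievement or a change in task difficulty, and no formula can tell the two apart.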

Further details on the problems of equating Performance Events may be found in a January 1998 technical review by James Catterall and four others for the Commonwealth of Kentucky Legislative Research Commission. Also informative is a 1995 analysis of Kentucky’s tests by Ronald Hambleton et al.; it survives only as a scanned document, which can be made searchable with Adobe Acrobat Professional.

https://legislature.ky.gov/LRC/OEA/Documents/MEASUREMENT%20QUALITY%20FINAL%20REPORT%2091-94.pdf

A slightly optimistic account of what could be learned from the attempt to use writing and mathematics portfolios for assessment can be found in a recent paper by education analyst Richard Innes at Kentucky’s Bluegrass Institute. http://www.freedomkentucky.org/images/d/d4/KERAReport.pdf

For more articles on the costs and benefits of student testing, see the following:

Phelps, R. P. (2002, February). Estimating the costs and benefits of educational testing programs. Briefings on Educational Research, Education Consumers Clearinghouse, 2(2). http://www.education-consumers.com/briefs/phelps2.shtm

Phelps, R. P. (2000, Winter). Estimating the cost of systemwide student testing in the United States. Journal of Education Finance, 25(3), 343–380. https://www.jstor.org/stable/40704103

Phelps, R. P., et al. (1993). Student testing: Current extent and expenditures, with cost estimates for a national examination. GAO/PEMD-93-8, U.S. General Accounting Office, U.S. Congress.

Concluding Remarks:

Changing to highly subjective “performance-based assessments” removes any urgent need for content-based questions. That is why the agreed-upon planning documents for teacher licensure tests in Massachusetts (tests required by the Massachusetts Education Reform Act of 1993) specified formats with more multiple-choice questions on content than essay questions (all the tests included both). For the tests’ construction, revision, and approval, the documents also required content experts as well as practicing teachers holding that license, together with education school faculty who taught methods courses (pedagogy) for that license. With the help of the president of National Evaluation Systems (NES, the state’s licensure test developer) and others in the company, the state was able to get more content experts involved in the test-approval process. What Pearson, a co-owner of these tests, has done since its purchase of NES is unknown.

For example, it is known that for the Foundations of Reading (90) test, a licensure test for most prospective teachers of young children (in programs for elementary, early childhood, and special education teachers), Common Core’s beginning reading standards were added to the test description, and examples for assessing the state’s added standards were appended to the original NES practice test. It is not known whether changes were made to the licensure test itself (used by about six other states) or to other Common Core-aligned licensure tests or test-preparation materials, e.g., for mathematics. Even if Common Core’s standards are eliminated (as they were in Florida in 2019 by a governor’s executive order), their influence remains in some of the pre-Common Core licensure tests developed in the Bay State, tests that contributed to academically stronger teachers for the state.

It is time for the Bay State’s own legislature to conduct a prolonged investigation of the costs and benefits of “performance-based assessments” before agreeing to allow them in Massachusetts, and before accepting the arguments that may be made by FairTest, a Bay State-based organization, or by others eager to eliminate “standardized” testing only to implement expensive and unreliable performance tests.

This entry was posted in Common Core, Curriculum & Instruction, Reading & Writing, Sandra Stotsky, Testing/Assessment.
