One of the more notable problems with much that is written about the National Assessment of Educational Progress (NAEP) regarding relative state performances is that far too often, only overall average scores are compared. Whether the writers are college professors, state education agencies, local educators, or members of the press, important parts of the real story get ignored when the comparison stops at overall averages.
This isn’t a new problem. The National Center for Education Statistics (NCES) has for many years cautioned against overly simplistic analyses that look only at overall average scores. NCES even included special comments on the topic in the 2009 NAEP Science Report Card (http://nces.ed.gov/nationsreportcard/pdf/main2009/2011451.pdf).
Below is a partial extract from Page 32 of that report card that highlights some examples of how the picture can be VERY different once a more thorough analysis of NAEP data is conducted.
The first example used by NCES is Kentucky’s performance in the 2009 Grade 8 NAEP Science Assessment. When you only look at overall average scores, Kentucky scores statistically significantly higher than the national public school average. However, when you only consider scores for White students in each state, Kentucky’s score is statistically significantly lower than the national average. Once you learn that in this assessment Kentucky’s NAEP student sample was 85% White, the importance of this additional information becomes far more apparent.
As you can see in the next example from the Page 32 extract, things can work the opposite way, as well. When you only consider overall average scores, Florida’s 8th Graders scored statistically significantly below the national public school average in the 2009 NAEP Science Assessment. But Florida’s Hispanic 8th Graders scored statistically significantly higher than the national public school average for all Hispanics.
In both cases, the picture presented by overall average scores alone is incomplete and can be rather misleading. Very simply, good analysis with NAEP requires more. Those who fail to provide it are not presenting strong arguments for whatever case they are trying to make.
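The reversal NCES describes is an instance of an aggregation effect related to Simpson’s paradox: a state can beat the national overall average while trailing the nation within every subgroup, simply because of its demographic mix. Here is a minimal numerical sketch of the mechanism, using made-up scores and weights (not actual NAEP data):

```python
# Hypothetical (score, population share) pairs for two subgroups.
# These numbers are illustrative only, not real NAEP results.
state  = {"White": (160, 0.85), "Black": (130, 0.15)}
nation = {"White": (165, 0.55), "Black": (135, 0.45)}

def overall(groups):
    # Overall average = weighted average across subgroups.
    return sum(score * share for score, share in groups.values())

# The state's overall average beats the nation's...
print(overall(state), overall(nation))  # 155.5 vs. 151.5

# ...yet within EVERY subgroup the state trails the nation.
for group in state:
    print(group, state[group][0] < nation[group][0])  # True for both
```

Because the hypothetical state’s sample is 85% from its higher-scoring subgroup while the nation’s is only 55%, the weighting alone flips the overall comparison, which is exactly why subgroup breakouts matter.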

Another example: Mississippi’s Grade 8 Reading improvement
The first comment in this thread deals with general examples from NCES about how comparing state NAEP results by only looking at overall average student scores can provide a very incomplete picture of the true relative performances of those states. In this section, I provide data from the 2013 and 2024 NAEP Grade 8 Reading Assessments to show that claims still being made by some educators — that Mississippi has made no headway in moving its notable 4th grade performance up to the 8th grade — are no longer accurate.
It is true, as some engaged in these discussions have written, that when you look at only overall average NAEP Grade 8 Reading scores, Mississippi still ranks towards the bottom of the stack. But what such pundits aren’t telling us is that once you break the data out by race, the improvement in Mississippi’s Grade 8 NAEP Reading performance compared to other states is hard to miss.
The tables below show separate breakouts of scores for White students in the top half and Black students in the bottom half. Some states did not get score reports for Black students as the samples were inadequate in some way, most likely due to low enrollment numbers of Black students in those states. The states are ranked by their reported NAEP Scale Score for Grade 8 Reading in the two listed years.
As you can see below, back in 2013, the year Mississippi started enacting some key education reform legislation, both its White and Black students performed very poorly compared to other states. However, as shown in the right side of the graphic, both the state’s White and Black students’ rankings have moved up since 2013 and in the latest NAEP Grade 8 Reading Assessment are notably higher. But that isn’t the complete story.
One issue with NAEP rankings by scores is that this assessment only tests a sample of students in each state, so all the scores carry statistical sampling error. Because of that sampling error, it is possible that the true relative performance of two states fairly close together in the table might actually be reversed if NAEP had tested all the students in those two states. This sampling issue is why the tables below, which come from the NAEP Data Explorer web tool (https://nationsreportcard.gov/ndecore/landing), also show information about the statistical significance of the score differences. One column of particular interest shows how many states scored statistically “significantly higher” than the state listed in each row. Let’s see how this statistical analysis works out for Mississippi.
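The kind of pairwise check behind a “significantly higher” flag can be sketched as a two-sample z-test on the difference between two state means, combining each score’s standard error. This is a simplified illustration with hypothetical numbers, not NCES’s exact procedure (which involves additional adjustments):

```python
import math

def significantly_higher(mean_a, se_a, mean_b, se_b, z_crit=1.96):
    """Is state A's mean significantly higher than state B's at ~95% confidence?

    Standard errors are combined in quadrature, then the gap is compared
    to the critical z value. Hypothetical sketch, not NCES's full method.
    """
    se_diff = math.sqrt(se_a**2 + se_b**2)
    return (mean_a - mean_b) / se_diff > z_crit

# Hypothetical scale scores with standard errors:
print(significantly_higher(268, 1.1, 263, 1.2))  # True: 5-point gap dwarfs combined SE
print(significantly_higher(268, 1.1, 266, 1.2))  # False: 2-point gap is within sampling error
```

The second call shows why a small gap between neighboring states in a ranking table should not be read as a real difference: the gap is smaller than what sampling error alone could produce.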
For White students, back in 2013 those in 43 other states outscored Mississippi’s by a statistically significant amount in NAEP Grade 8 Reading. By 2024 that number had been cut remarkably: just 7 states outscored Mississippi by a statistically significant amount. Can any reasonable person think that isn’t notable progress?
For Black students, back in 2013, those in 27 other states statistically significantly outscored Mississippi’s. Flash forward to 2024 and now only 2 states can make that claim – just 2.
The message about Mississippi in the tables below also reiterates that you simply cannot get an accurate picture of what is happening in education by only looking at overall NAEP scores. As NCES points out in its 2009 NAEP Science Report Card, and as the Mississippi examples show, you have to look deeper. Those who do no more than look at overall scores are not doing adequate research and may just be cherry-picking information to try to support an incorrect case. Don’t fall for that.
