Critiquing Charles Murray on Changing IQ Gaps

In his new book and in recent interviews, Charles Murray claims that the Black-White IQ gap has shrunk to around 12 points, having fallen between the 1970s and the mid-1980s and remained roughly stagnant since. In this article, I will explain why I think he is wrong.

First, let’s look at the graph he uses to visualize this trend.

Murray (2021)

Something which immediately pops out is that the proportion of data points coming from “cognitive batteries” significantly decreased around 1990. The black dots mostly represent NAEP data and are not based on formal IQ tests. How problematic this is will depend upon how much formal IQ tests share with the NAEP.

But the truth is that IQ tests don’t even share enough with each other for this sort of aggregation to make sense. On average, roughly half of the variance in a given IQ test overlaps with the variance in ability measured in other IQ tests. The other half of variance in ability an IQ test measures usually reflects specific abilities which that test measures but which are not common across most IQ tests.

AFQT 1

(Herrnstein and Murray, 1996)

The variance which IQ tests do share is due to the g factor. After extracting a g factor, the residual correlation between mental abilities is, on average, roughly zero (Gignac, 2016). While full scale IQ scores taken from different tests only share roughly half of their variance, g factor scores taken from different tests share virtually all of their variance in common. This fact will be important later.

Citation               Test 1                       Test 2                      Correlation Between G Factors
Johnson et al. (2004)  WAIS                         HB Raven                    1.00
Johnson et al. (2004)  WAIS                         CAB                         0.99
Johnson et al. (2004)  CAB                          HB Raven                    0.99
Johnson et al. (2008)  GATB                         Royal Dutch Navy            0.96
Johnson et al. (2008)  GATB                         TIB                         1.00
Johnson et al. (2008)  GATB                         Cattell Culture Fair Test   0.77
Johnson et al. (2008)  GATB                         Groninger Intelligence Test 0.99
Johnson et al. (2008)  Groninger Intelligence Test  Royal Dutch Navy            0.98
Johnson et al. (2008)  Groninger Intelligence Test  TIB                         1.00
Johnson et al. (2008)  Groninger Intelligence Test  Cattell Culture Fair Test   0.96
Johnson et al. (2008)  Cattell Culture Fair Test    Royal Dutch Navy            0.88
Johnson et al. (2008)  Cattell Culture Fair Test    TIB                         0.79
Johnson et al. (2008)  TIB                          Royal Dutch Navy            0.95
Median                                                                          0.98
Mean                                                                            0.94
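As a quick check on the summary rows, here is a minimal Python sketch that recomputes the median and mean from the thirteen correlations in the table. The variable names are my own, and treating a squared correlation as the share of variance two scores have in common is the usual convention, assumed here rather than taken from the cited papers.

```python
from math import sqrt
from statistics import mean, median

# g factor correlations transcribed from the table above, in row order.
g_correlations = [1.00, 0.99, 0.99, 0.96, 1.00, 0.77, 0.99,
                  0.98, 1.00, 0.96, 0.88, 0.79, 0.95]

summary_median = round(median(g_correlations), 2)
summary_mean = round(mean(g_correlations), 2)

# Assumption: shared variance between two scores is the squared correlation.
shared_variance_at_median = round(summary_median ** 2, 2)

# Under the same assumption, the correlation at which two scores
# share exactly half of their variance.
r_for_half_shared_variance = sqrt(0.5)

print("median r:", summary_median)
print("mean r:", summary_mean)
print("shared variance at the median r:", shared_variance_at_median)
print("r implying 50% shared variance:", round(r_for_half_shared_variance, 2))
```

On this reading, the median g factor correlation implies that nearly all variance is shared, whereas two scores sharing only half of their variance would correlate at roughly 0.71.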

Returning to Murray’s analysis, he admits that the non-IQ test data he is using “tend to understate differences in g that would be revealed by full-scale cognitive test batteries”. Given this, such tests probably don’t even share half of their variance with IQ tests. Consequently, even though the data points are plotted on the same axis, the variable being measured changes meaningfully over time. For this reason, comparisons between earlier and later periods will be invalid.

For Murray’s estimate of the current racial IQ gap, he uses ten tests, only one of which is an actual IQ test. This is because he restricts himself to data from 2010 or later, and it just happens that not many IQ tests have been normed since then. This decision to restrict the analysis to very recent data might make sense if we thought that the gap before 2010 was significantly different from the one after it. But Murray’s own story is that the gap has been pretty stable since the mid-1980s. Given this, such a restriction makes no sense.

Murray calls his choice to use only very recent data points, almost none of which are derived from IQ tests, “conservative”. Normally, a conservative decision would be one which minimizes the risk of error, but in this case it seemingly involved introducing unnecessary and obvious error into the analysis in order to alter the result.

Turning now just to IQ tests: I said earlier that they are themselves too heterogeneous to be simply aggregated, and this is easily demonstrated.

Consider the following:

  • According to the WISC, the B/W gap stayed at 1.2 SD from 1972 to 1989 and then fell to 0.89 SD in 2002.
  • According to WJ data, the gap fell from 1.33d to 0.75d between 1976 and 1987, but then increased back to 1.02d by 1998 before falling back to 0.76d in 2012.
  • The WAIS data says the gap was only 1.00d in 1978, fell to 0.92d by 1995, but then rose to 1.06d, its all-time high, in 2006.
  • The SB shows a decline from 1.11d to 0.98d from 1986 to 2002.
  • The AFQT shows a decline from 1.35d (1972) to 1.24d (1980) and then 0.98d (1997).

Notice that these tests generally don’t show the same trends over time and, perhaps most importantly, literally none of them show the B/W IQ gap falling significantly below 1SD in the 1980s and staying put since. Murray’s narrative is an artifact of combining data that should not be combined to produce results that no coherent dataset actually shows.
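To make that check explicit, here is a small Python sketch that encodes the five series listed above and asks whether any of them has a 1980s data point below 1 SD with every later data point also below 1 SD. The function name and the 1.0 threshold are my own illustrative choices, and the WISC series is entered only at the years named in the text.

```python
# B/W gaps in SD units as (year, gap) pairs, transcribed from the list above.
gaps = {
    "WISC": [(1972, 1.20), (1989, 1.20), (2002, 0.89)],
    "WJ": [(1976, 1.33), (1987, 0.75), (1998, 1.02), (2012, 0.76)],
    "WAIS": [(1978, 1.00), (1995, 0.92), (2006, 1.06)],
    "SB": [(1986, 1.11), (2002, 0.98)],
    "AFQT": [(1972, 1.35), (1980, 1.24), (1997, 0.98)],
}


def fell_in_1980s_and_stayed(series, threshold=1.0):
    """Return True if a 1980s point is below the threshold and all later points are too."""
    for i, (year, gap) in enumerate(series):
        if 1980 <= year <= 1989 and gap < threshold:
            return all(later < threshold for _, later in series[i + 1:])
    return False


# Tests whose own series matches the "fell in the 1980s and stayed put" story.
matches = [test for test, series in gaps.items()
           if fell_in_1980s_and_stayed(series)]

for test, series in gaps.items():
    print(test, series, fell_in_1980s_and_stayed(series))
print("tests matching the narrative:", matches)
```

The WJ comes closest, dropping to 0.75d in 1987, but its 1998 value of 1.02d breaks the pattern, so the list of matching tests comes out empty.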

I think we can produce a better estimate of the current racial gap, but before turning to that I want to note some oddities in Murray’s data. First, it is interesting that in 2007 Murray reported that the WJ1, 2, and 3 samples showed B/W gaps of 1.23d, 0.90d, and 1.05d.

Murray (2007)

The reason this is interesting is that in his new dataset Murray reports the gaps from the WJ1, 2, and 3 as 1.33d, 0.75d, and 1.02d. I don’t know what explains these differences, but their effect is to increase the degree to which the WJ data shows a decline in the B/W IQ gap over time. The WJ4 figure Murray now reports is based on unpublished data, so not much can be said about it.

I also want to note that the inclusion criteria Murray used when selecting tests are non-obvious. For instance, he included a sample of the KAIT, even though previous research on trends in the B/W IQ gap has excluded this data because the test was constructed so as to minimize racial gaps (Dickens and Flynn, 2006). There are also various tests (Wonderlic, DAT, etc.) which could have been included but, for whatever reason, weren’t. Because Murray’s work is aimed at a popular audience, the lack of an explanation is not surprising, but I thought I should note that the inclusion criteria being used here are not self-evident.

Anyway, as we’ve seen, g factor scores should be virtually the same regardless of the test they happen to be extracted from. This is not true of full scale IQ scores generated from differing tests, or of comparisons between IQ scores and scores on non-IQ tests. For this reason, g factor scores are far better suited for this kind of analysis. After briefly looking for estimates of the g factor gap between Blacks and Whites, I was able to find a handful of studies, each using a large, high-quality sample, and they all showed the B/W gap still being moderately above 1.0d even after the early 80s.

Source                 Test     Gap (d)  Year
Lasker et al. (2019)   PCNB     1.05     2011
Frisby et al. (2015)   WAIS-4   1.16     2006
Hu et al. (2019)       NLSY97   1.13     1997
Lasker et al. (2021)   VET      1.4      1987
Lasker et al. (2021)   NLSY79   1.26     1979
Jensen et al. (1982)   WISC-R   1.14     1974
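The claim about the later samples can be checked directly against the table. Below is a minimal Python sketch that does so; the cutoff year of 1985 is my own stand-in for “after the early 80s”, and the tuple layout is just for illustration.

```python
from statistics import mean

# (source, test, gap in d, year), transcribed from the table above.
g_gaps = [
    ("Lasker et al. (2019)", "PCNB", 1.05, 2011),
    ("Frisby et al. (2015)", "WAIS-4", 1.16, 2006),
    ("Hu et al. (2019)", "NLSY97", 1.13, 1997),
    ("Lasker et al. (2021)", "VET", 1.4, 1987),
    ("Lasker et al. (2021)", "NLSY79", 1.26, 1979),
    ("Jensen et al. (1982)", "WISC-R", 1.14, 1974),
]

CUTOFF_YEAR = 1985  # assumption: a stand-in for "after the early 80s"

# Keep only the samples collected after the cutoff.
recent = [row for row in g_gaps if row[3] > CUTOFF_YEAR]

# Does every post-cutoff sample show a gap above 1.0d?
all_above_one = all(gap > 1.0 for _, _, gap, _ in recent)
recent_mean = mean(gap for _, _, gap, _ in recent)

print("samples after cutoff:", [(test, year) for _, test, _, year in recent])
print("all above 1.0d:", all_above_one)
print("unweighted mean gap after cutoff:", round(recent_mean, 3))
```

The mean here is unweighted across studies, so it is only a rough summary and not a sample-size-weighted estimate.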

Because it is g that drives racial disparities in IQ to begin with, as well as the role IQ plays in explaining racial inequality, I feel that this gives us a better picture of what has been happening with the racial IQ gap over the last few decades. Racial differences in non-general abilities are normally small and sometimes favor Blacks. The gap in those abilities may have changed with time, or we may increasingly be including tests that measure such abilities more heavily. But any investigation of this would require using the same specific measures repeatedly, and by its nature it is not susceptible to aggregation across tests. Moreover, it would likely not be of much more than academic interest, since it is differences in general ability that produced the gaps we all know and care about.
