Scott Kaufman has recently posted a critique at Scientific American of some things that the behavioral geneticist Robert Plomin has been saying lately about genetics and intelligence. I agree with a few of Kaufman’s points, but most of them I find to be problematic for reasons I will explain in this post.
Kaufman’s article takes the form of a series of responses to quotes from Plomin. The first quote is this: “In the Nature-Nurture War, Nature Wins. Environmental influences are important, too, but they are largely unsystematic, unstable and idiosyncratic.”
Kaufman argues that this statement is contradictory, saying “How can one say that nature wins and then in the very next sentence say that environmental influences are important too?”
The solution to this seems pretty simple to me: genetics explains most of the variation in several important psychological variables, so it “wins”, but it doesn’t explain all of the variation in such traits, and so the environment still matters. This is what I assume Plomin meant.
The Family Environment
Plomin’s second quote is this: “For most of the 20th century, environmental factors were called nurture because the family was thought to be crucial in determining environmentally who we become. Genetic research has shown that this is not the case.”
To expand on what Plomin said, in the 20th century many psychologists thought that variables like parenting style or household income explained a great deal of why some people are depressed, schizophrenic, intelligent, etc., while others are not. Today, we know that, in adulthood, individuals who are not genetically related but who nonetheless had the same home environment via adoption resemble each other roughly as much as two people selected randomly from the population do. For any variable shared by people who grow up in the same home, such as household income, this implies that sharing that variable does not make people’s adult psychological profiles any more similar. This is true for many, though not all, psychological traits.
Kaufman has two complaints about Plomin’s comment.
First, he feels that Plomin should have explained that this reasoning does not apply to traits that don’t vary between individuals. Kaufman worries, for instance, that people may read Plomin’s work and think that having a family, as opposed to not having one, doesn’t impact a person’s ability to develop language, a trait (and home environment) shared by basically everyone. Perhaps Plomin should have specified this, but I doubt that any sane person would seriously entertain the idea that children can develop language without human contact.
Kaufman’s second complaint goes as follows: “Furthermore, for traits like IQ that are only about 50% heritable, even if there are no effects of the so-called “shared environment,” parents may nonetheless be an important part of the so-called “non-shared environment,” as long as the effects they have on their children tend to make them different from each other. This may seem counter-intuitive– surely given that children in the same family have the same parents, then parents must be part of the shared environment– but that is not true because of the technical meaning of “shared environment” in the models used to estimate heritability.”
Kaufman’s point is theoretically valid but empirically disconfirmed by the fact that the correlation in IQ, in adulthood, for unrelated individuals who grew up in the same home is .02 (Hunt, 2011, p 227). If Kaufman’s hypothesis were true and sharing a home made people dissimilar, the correlation would be substantial and negative, not effectively zero and positive.
The Non-Shared Environment
Plomin’s next quote is this: “We would essentially be the same person if we had been adopted at birth and raised in a different family. Environmental influences are important, accounting for about half of the differences between us, but they are largely unsystematic, unstable and idiosyncratic: in a word, random.”
What Plomin is referring to here is the “non-shared environment”. A popular behavioral genetics textbook, which Plomin is a co-author of, defines the non-shared environment as “all non-genetic influences that are independent (or uncorrelated) for family members, including measurement error” (Knopik et al., 2017, p. 103).
Operationally, the non-shared environment is anything that causes identical twins raised in the same home to differ from one another. This includes environments not within, or correlated with, the home, random life events, and random error in the measurement of the trait in question.
Plomin here is referring to the fact that the environmental influences which have been shown to be important for explaining psychological variation in adulthood are non-shared ones. He seems to emphasize random life events and, anecdotally, I find this compelling: most people I know, myself included, would have ended up at very different places in life were it not for random experiences which could not have been predicted on the basis of our genetics or home environment.
Kaufman says the following: “What he is basically saying in these two sentences is the following: “You would be the same person if you had been adopted at birth and raised in a different family if that family somehow was in exactly the same situation and experienced exactly the same environmental forces your entire life as the family you were adopted from.” This seems pretty unlikely based on how the world works, and seems like a pointless comparison anyway.”
Here Kaufman seems to be confused about what the non-shared environment is. By definition, the non-shared environment is uncorrelated with the home environment, meaning that a person’s expected non-shared environment won’t differ depending on their home. Kaufman calls this unlikely, but it is tautological: the method by which the influence of the non-shared environment is calculated mathematically guarantees it.
Before moving on from this topic, it is worth noting that these are population-specific statistics, meaning that they make no predictions about being adopted into a home from another population.
Causes of Stability
Plomin’s next quote is the following: “The environment can alter this plan temporarily, but after these environmental bumps we bounce back to our genetic trajectory. DNA isn’t all that matters, but it matters more than everything else put together in terms of the stable psychological traits that make us who we are.”
To expand on what Plomin said, though people change over the course of their lives there is considerable stability. Your traits today resemble your traits of 10 years ago more than they resemble those of a randomly selected person from the population. What explains this stability? It depends on the trait, but for intelligence the answer is largely genes, and this is increasingly true the older someone is. For personality, the influence of genes on stability falls with age, but for both traits genes explain the majority of stability, and the environmental variables that explain stability are non-shared ones.
In their textbook, Plomin et al. repeatedly cite this sort of research to substantiate this point.
Kaufman interprets Plomin’s statement as an endorsement of genetic set point theory, or the idea that individuals are destined, due to their genes, to manifest a particular value on psychological traits, from which environments can produce only temporary deviations. Such a viewpoint seems hard to maintain given that many traits, for instance intelligence and height, have increased dramatically over the last century. Plomin’s statement is a bit ambiguous, but I am inclined to think he was referring to the sort of research I’ve just cited. In any case, people should know that psychological stability is largely due to genetics.
The next set of quotes deals with metrics called polygenic scores. A polygenic score is created based on the versions of various genes, associated with a trait, that a person happens to have. The combined information from these multiple genes is then used to predict the trait in question.
Plomin discussed a polygenic score that predicted 9% of the variance in educational achievement as measured by GCSE scores (roughly the British equivalent of the SATs). This means that predicting achievement scores for a sample of people based on the polygenic score will produce 9% less squared error than simply predicting that each person achieved the mean score on their GCSEs (the most rational prediction absent other information).
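To make the variance-explained figure concrete, here is a small simulation with entirely hypothetical numbers. It assumes a correlation of .30 between a standardized polygenic score and achievement, since .30 squared is .09:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical, simulated data: a standardized polygenic score correlated
# .30 with achievement, so it should explain about 9% of the variance.
pgs = rng.standard_normal(n)
achievement = 0.30 * pgs + np.sqrt(1 - 0.09) * rng.standard_normal(n)

# Baseline: predict that everyone scored the mean.
sse_mean = np.sum((achievement - achievement.mean()) ** 2)

# Model: best linear prediction of achievement from the polygenic score.
slope = np.cov(pgs, achievement)[0, 1] / np.var(pgs, ddof=1)
pred = achievement.mean() + slope * (pgs - pgs.mean())
sse_model = np.sum((achievement - pred) ** 2)

r_squared = 1 - sse_model / sse_mean
print(round(r_squared, 2))  # close to 0.09: squared error shrinks by ~9%
```

In other words, “explains 9% of variance” is a statement about how much the score shrinks prediction error over a whole sample, not a promise about any single individual.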
Plomin displays several graphs showing the relationship between polygenic score and achievement score, both at the level of individuals and at the level of group differences when people are binned into deciles of polygenic score.
Plomin emphasizes that individual prediction is error prone and probabilistic in nature despite the large effect suggested by the decile differences.
Kaufman has a lot to say about this, including the following: “This graph clearly shows that if a person has a genome-wide polygenic score (GPS) at the 75% percentile, they could score anywhere between the 2nd percentile or the 98th percentile of academic achievement. Not very useful!”
This is not a valid way of measuring the predictive validity of a measure. Imagine that 99% of people with a given polygenic score got an A on their GCSE but 1% of people with this polygenic score got a C. Using Kaufman’s method, we would have to conclude that such a score has no predictive validity due to the range of outcomes per score. But this score would predict achievement almost perfectly, reflecting the fact that it is the distribution of achievement scores at each polygenic score that is relevant, not the range of scores.
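A toy simulation illustrates the point. The hypothetical score below permits a huge range of outcomes at any given value, yet its variance explained remains high, because nearly all of the outcome distribution sits close to the prediction:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical score: for 99% of people the outcome tracks the score very
# closely, while 1% land far away for unrelated reasons.
score = rng.standard_normal(n)
outlier = rng.random(n) < 0.01
noise = np.where(outlier, rng.normal(0.0, 3.0, n), rng.normal(0.0, 0.1, n))
outcome = score + noise

r = np.corrcoef(score, outcome)[0, 1]
print(round(r ** 2, 2))          # variance explained stays high (around .9)
print(noise.min(), noise.max())  # yet any single score allows a wide range
```

Judging the score by the full range of possible outcomes, as Kaufman does, would condemn even this nearly perfect predictor.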
Several times, Kaufman applies subjective language like “not very useful” when describing the polygenic score’s predictive validity. This strikes me as entirely unnecessary since we have an objective way of expressing this: the score explains 9% of variation in achievement scores.
Kaufman also raises the question: “why not just use an IQ test rather than rely on a weakly predictive score as a proxy for what you really want to know?”
Here are some reasons:
- Testing new polygenic scores is necessary in the long-run process of developing a polygenic score that can account for all the (narrow) heritability of intelligence. Hopefully, polygenic scores will someday be able to explain something like 50% of IQ variation, but achieving such results requires the sort of work being done now.
- Polygenic scores can be used for things IQ cannot. For instance, a polygenic score can be used to see if people from wealthy families have high IQ scores simply due to genes without having to acquire adoption and twin samples.
- As Plomin mentions, polygenic scores can be used to predict IQ from infancy, before IQ tests can be administered.
- Polygenic scores can also be used to predict IQ before birth, allowing for embryonic selection based on genotypic intelligence.
Kaufman questions the usefulness of predicting IQ from an early age. For reasons left unexplained, he limits the potential validity of polygenic scores to 10%. This seems unrealistic given the heritability of IQ in adulthood. With a higher level of predictive validity, such scores could be meaningfully used to predict intelligence at an individual level.
Kaufman also worries about confounding variables in polygenic research. Plomin says, correctly, that backward causation is not an issue in genetics research because a trait, for instance height, cannot cause a change in a person’s genotype. Kaufman notes that population stratification is still a problem. That is, a gene might correlate with a trait simply because it is more common in a population that is more exposed to an environmental stimulus that influences the trait in question. For instance, a gene that impacts melanin might be correlated with IQ because black people are more likely than white people to be exposed to lead poisoning, which inhibits intelligence.
Kaufman says that “But the fact is that you always have the problem that you can’t necessarily control for all of the relevant historical influences. You don’t always know what they are, and as psychologists we are perpetually surprised to discover new influences we didn’t account for ahead of time. Especially with large samples, it’s hard to be sure that just because you are controlling for something relevant it means that you are controlling for all of the relevant variance, even in the very thing you’re attempting to control for!”
This seems rather misleading to me. In genetics research, population stratification is often avoided by extracting successive principal components of genetic covariation from the data (Liu et al. 2013). Essentially, this amounts to removing the genetic clusters that represent population structure from the data. This method does not require knowledge of the specific environments that differ by population in order to correct for population stratification. On top of this, these studies are normally done on monoracial samples, and some specific environmental variables are sometimes controlled for in addition to PCs.
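A minimal sketch of the idea, using a simulated (not real) dataset: two subpopulations differ in allele frequencies and in a purely environmental trait difference, so every variant spuriously correlates with the trait; residualizing on the leading principal component of the genotype matrix removes most of that spurious association without ever naming the environmental cause:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 2000, 50  # people, genetic variants

# Two simulated subpopulations with different allele frequencies and a
# purely environmental trait difference (no true genetic effect at all).
pop = rng.integers(0, 2, n)
freqs = np.where(pop == 1, 0.6, 0.4)
genotypes = rng.binomial(2, freqs[:, None], size=(n, m)).astype(float)
trait = 1.0 * pop + rng.standard_normal(n)

# Naive association: every variant correlates with the trait via stratification.
naive = np.array([np.corrcoef(genotypes[:, j], trait)[0, 1] for j in range(m)])

# Correction: residualize trait and genotypes on the leading principal
# component of the genotype matrix before testing association.
G = genotypes - genotypes.mean(axis=0)
pc1 = np.linalg.svd(G, full_matrices=False)[0][:, 0]  # unit-length PC scores

def resid(x):
    return x - pc1 * (pc1 @ x)

t_res = resid(trait - trait.mean())
adjusted = np.array([np.corrcoef(resid(G[:, j]), t_res)[0, 1] for j in range(m)])

print(round(np.abs(naive).mean(), 3), round(np.abs(adjusted).mean(), 3))
```

This is only a toy version of the method; real studies extract many components and combine them with other controls, but the logic of removing cluster structure rather than specific environments is the same.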
On IQ’s Reliability
Kaufman also makes some comments related to IQ’s reliability. To understand reliability, you have to understand random error. Random error is distinct from systematic error, such as when a test includes a bad question and so is consistently flawed. Random error isn’t persistent, and could be completely eliminated by taking a person’s average score across infinitely many administrations of the test, assuming the test taker didn’t improve due to practice effects. A metric is reliable to the extent that variation in scores is due to variation in true scores on that metric as opposed to random error.
Normally, reliability is talked about as a property of a single test. It can be measured in a variety of ways, including the correlation in scores found when people take the same test multiple times, or by arbitrarily splitting the measure up, say into even and odd items, and looking at the correlation between the halves, or by looking at the average correlation between all the items in a test.
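The split-half approach can be sketched as follows, with simulated (hypothetical) item data and the standard Spearman-Brown step-up from half-test to full-test reliability:

```python
import numpy as np

rng = np.random.default_rng(3)
n_people, n_items = 1000, 40

# Simulated test: each item score = the person's true ability plus
# independent item-specific noise.
ability = rng.standard_normal(n_people)
items = ability[:, None] + 1.5 * rng.standard_normal((n_people, n_items))

# Split-half reliability: correlate the odd-item and even-item half scores,
# then step up to full-test length with the Spearman-Brown formula.
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd_half, even_half)[0, 1]
reliability = 2 * r_half / (1 + r_half)
print(round(reliability, 2))  # around .95 for this simulated 40-item test
```

Even with fairly noisy individual items, aggregating over many items drives the reliability of the total score into the range Jensen describes below.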
Commenting on the use of such methods, Berkeley psychologist and intelligence expert Arthur Jensen wrote:
“It is a common misconception that psychological measurements of human abilities are generally more prone to error or inaccuracy than are physical measurements. In most psychological research, and especially in psychometrics, this kind of measurement error is practically negligible. If need be, and with proper care, the error variance can usually be made vanishingly small. In my laboratory, for example, we have been able to measure such variables as memory span, flicker-fusion frequency (a sensory threshold), and reaction time (RT) with reliability coefficients greater than .99 (that is, less than 1 percent of the variance in RT is due to errors of measurement). The reliability coefficients for multi-item tests of more complex mental processes, such as measured by typical IQ tests, are generally about .90 to .95. This is higher than the reliability of people’s height and weight measured in a doctor’s office! The reliability coefficients of blood pressure measurements, blood cholesterol level, and diagnosis based on chest X-rays are typically around .50.” – Jensen, 1998, p. 50
Kaufman addresses a related but distinct issue: how similar will a person’s test scores be if they take two different tests both of which are supposed to measure intelligence? He claims the following: “McGrew reviewed IQ fluctuations among today’s most frequently administered IQ tests and estimated that the full range of expected IQ differences for most of the general population is 16 to 26 points.” He interprets these facts as indicating that “Truth is, you can never actually know a person’s true IQ score … All we get from the outcome of an IQ testing session is a range of values based on how confident we want to be that the person’s true score is somewhere within that range… Forget precision. We’re dealing with complex dynamic systems such as humans.”
To understand why I find this reasoning problematic, we need to understand something about the structure of intelligence. As early as 1904, it was demonstrated that seemingly disconnected cognitive skills, for instance grades in literature, math, and music, all positively correlate with one another.
There is a statistical method called factor analysis which extracts the common factor underlying these positive correlations, a factor researchers call general intelligence. Once general intelligence is measured, we can ask how strongly each individual test correlates with it, a statistic known as a test’s “g loading”.
Once general intelligence is controlled for, the non-overlapping variability in these abilities remains; these residual abilities are referred to as narrow cognitive abilities.
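As a rough illustration, with a made-up correlation matrix and the first principal component standing in for a proper factor analysis, the g loadings can be approximated like this:

```python
import numpy as np

# Made-up correlation matrix for four subtests: all correlations are
# positive, the "positive manifold" factor analysis exploits.
R = np.array([
    [1.00, 0.60, 0.50, 0.45],
    [0.60, 1.00, 0.55, 0.40],
    [0.50, 0.55, 1.00, 0.50],
    [0.45, 0.40, 0.50, 1.00],
])

# First principal component of R as a stand-in for the general factor;
# its loadings approximate each subtest's g loading.
eigenvalues, eigenvectors = np.linalg.eigh(R)  # eigenvalues in ascending order
g_loadings = np.abs(eigenvectors[:, -1]) * np.sqrt(eigenvalues[-1])
print(np.round(g_loadings, 2))  # every subtest loads substantially on g
```

Principal components are not identical to the factor-extraction methods used in the intelligence literature, but the idea is the same: one common dimension accounts for the bulk of the shared variation among subtests.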
Importantly, the predictive validity that IQ is so well known for is mostly a function of general intelligence. Oftentimes, narrow cognitive abilities fail to predict variables like job performance once general intelligence is controlled for, and the higher a subtest’s g loading, the higher its predictive validity tends to be (Jensen, 1998).
An important hypothesis in intelligence research is called the “indifference of the indicator”. This is the claim that the same general intelligence factor should be extractable from any test whose items probe a diverse set of cognitive skills.
Thus, there should be strong consistency across tests in their g factor scores. Consistency in narrow ability scores is far less important, since narrow abilities add little to the predictive validity of a test, and because differences in the composition of test items between tests will obviously lead to differences in narrow abilities.
Kaufman’s calculations about the reliability of IQ were based on the fact that IQ tests sometimes correlate with each other at levels as low as .60. This is true, but g factors derived from separate tests have been found to correlate as highly as .99 to 1.00 (Johnson et al., 2004; Johnson et al., 2008). Using the same methodology as Kaufman’s citation, this reduces the expected difference in scores across tests from 16–26 points to roughly plus or minus 2 points. This strikes me as a very high degree of reliability, sufficiently consistent that we can know someone’s “true” g score in the same way that we can know someone’s true height or blood pressure.
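The arithmetic behind converting a between-test correlation into an expected score difference can be sketched as follows, assuming both tests are reported on the usual IQ scale (mean 100, SD 15):

```python
import math

# If two tests on the IQ scale (SD = 15) correlate at r, the difference
# between a person's two scores has standard deviation 15 * sqrt(2 * (1 - r)).
def diff_sd(r, sd=15.0):
    return sd * math.sqrt(2.0 * (1.0 - r))

print(round(diff_sd(0.60), 1))  # 13.4 points: large swings between whole tests
print(round(diff_sd(0.99), 1))  # 2.1 points: g factor scores barely move
```

At a between-test correlation of .60, swings of a dozen or more points are routine; at the .99 correlations reported for g factors, the expected discrepancy collapses to a couple of points.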
Education and IQ
The final Plomin quote I want to comment on is this: “Schools matter, but they don’t make a difference… This does not mean that the quality of teaching and support offered by schools is unimportant. It matters a lot for the quality of life for students, but it doesn’t make a difference in their educational achievement.”
Kaufman responds by saying “This is patently false. Analyses conducted on over 600,000, using natural experiments that rule out the effects of genetics, have found consistent evidence for beneficial effects of education on cognitive abilities— approximately 1 to 5 IQ points for each additional year of education. Even if we take the lower bound of this range, after 10 years of education this compounds to an increase of 2/3 of a standard deviation on IQ– far from trivial I’d say.”
Kaufman is right to say that Plomin is downplaying the effects of education on IQ. Unfortunately, Kaufman isn’t sticking to the evidence either. His writing leaves readers with the impression that experiments have shown IQ to increase in response to education and that this effect can be compounded across years.
Yet, the very paper Kaufman cites states “The finding of educational effects on intelligence raises a number of important questions that we could not fully address with our data. First, are the effects on intelligence additive across multiple years of education? We might expect the marginal cognitive benefits of education to diminish with increasing educational duration, such that the education-intelligence function eventually reaches a plateau. Unfortunately, we are not aware of any studies that have directly addressed this question using a rigorous quasi-experimental method.”
The paper also mentions that most of the experiments involved raising the minimum level of education, so the results cannot be assumed to generalize across the full range of educational attainment, and that there isn’t sufficient data to determine whether the effects of education are on general intelligence or only on narrow cognitive abilities.
Plomin has seemingly been overstating the importance of IQ, calling it an “omnipotent” metric and the like. That being said, Kaufman’s reply seems technically inaccurate or misleading in several places. At least, that’s my reading of the situation.
To conclude, here’s a statement I feel is accurate: IQ isn’t all-important, but it’s more important than most social scientists realize; nothing is 100% heritable, but everything is more heritable than most social scientists realize; and the home environment isn’t totally irrelevant, but it’s a lot less important than most social scientists realize.