A key translational issue for neuroscience is to understand how genes affect individual differences in brain function. Although it is reasonable to suppose that genetic effects on specific learning abilities, such as reading and mathematics, as well as general cognitive ability (g), will overlap very little, the counterintuitive finding emerging from multivariate genetic studies is that the same genes affect these diverse learning abilities: a Generalist Genes hypothesis. To conclusively test this hypothesis, we exploited the widespread access to inexpensive and fast Internet connections in the UK to assess 2541 pairs of 10-year-old twins for reading, mathematics and g, using a web-based test battery. Heritabilities were 0.38 for reading, 0.49 for mathematics and 0.44 for g. Multivariate genetic analysis showed substantial genetic correlations between learning abilities: 0.57 between reading and mathematics, 0.61 between reading and g, and 0.75 between mathematics and g, providing strong support for the Generalist Genes hypothesis. If genetic effects on cognition are so general, the effects of these genes on the brain are also likely to be general. In this way, generalist genes may prove invaluable in integrating top-down and bottom-up approaches to the systems biology of the brain.
Naturally occurring genetic variation is a Rosetta Stone for translating causal effects from genes to brain to cognition, especially once specific genes are identified (Plomin et al. in press). Although learning abilities and disabilities are highly heritable, specific genes responsible for their heritability have not yet been identified despite several promising candidate genes for reading disability (Fisher & Francks 2006). Nonetheless, more can be gleaned from quantitative genetic research, such as twin studies that compare identical and fraternal twins, than the mere fact that these traits are heritable. A major advance in quantitative genetics is multivariate genetic analysis, which investigates not only the variance of traits considered one at a time but also the covariance among traits. In this way, it indicates the extent to which the same or different genes affect several traits (Neale et al. 2006), using a statistic known as a genetic correlation (Plomin et al. in press). These findings can constrain explanations of brain processes that underlie the traits. For example, it is reasonable to suppose that genetic effects will be specific to the substantially different cognitive processes involved in reading and mathematics. Such genetic specificity would indicate the need to identify the genetically driven differences in brain processes that underlie these cognitive differences.
However, a very different result is emerging from multivariate genetic research on learning abilities and disabilities. Most genetic effects appear to be general in that the same genes affect different learning abilities and disabilities. A review of multivariate genetic research on learning abilities found that genetic correlations varied from 0.67 to 1.0 between reading and language (five studies), 0.47 to 0.98 between reading and mathematics (three studies) and 0.59 to 0.98 between language and mathematics (two studies) (Plomin & Kovas 2005). The average genetic correlation was about 0.70. Moreover, the general effects of genes appear to extend beyond specific learning abilities such as reading and mathematics to other cognitive abilities such as verbal abilities (e.g. vocabulary and word fluency) and non-verbal abilities (e.g. spatial and memory). The average genetic correlation between specific learning abilities and general cognitive ability (g), which encompasses these verbal and non-verbal cognitive abilities, is about 0.60 (Plomin & Kovas 2005). These findings have led to a Generalist Genes hypothesis (Plomin & Kovas 2005), which has far-reaching implications for cognitive neuroscience (Kovas & Plomin 2006). However, the Generalist Genes hypothesis has yet to be tested by direct cognitive test measures in a sample large enough to conclusively establish the magnitude of the genetic correlations between learning abilities. To address this problem, we developed an online test battery that includes measures of reading, mathematics and g and used it to assess a UK-representative population sample of 2541 pairs of 10-year-old twins: by far the largest twin sample with cognitive test data. The purpose of the present study was to exploit the potential of web-based administration to provide a powerful test of the Generalist Genes hypothesis.
The Twins Early Development Study (TEDS) recruited families of twins born in England and Wales in 1994, 1995 and 1996: three annual cohorts (Oliver & Plomin 2007; Trouton et al. 2002). The present paper describes results at 10 years, where a two-cohort subsample of TEDS (twins born in 1994 and 1995) was tested. Despite inevitable attrition (Oliver & Plomin 2007), the sample remains representative of the UK population (ascertained by comparison with census data from the Office for National Statistics). Notably, because of the widespread availability of fast Internet access, we found no evidence of sampling bias in favor of higher socioeconomic status families. Informed consent was obtained by post or online consent forms, and a test administrator was then assigned who telephoned the family, resolved any problems with the Internet or testing and generally assisted and encouraged the participating family. Ethical approval for TEDS has been provided by the Institute of Psychiatry Ethics Committee, reference number 05/Q0706/228.
We excluded from the analyses children with severe current medical problems, children who had suffered severe problems at birth and children whose mothers had suffered severe problems during pregnancy. We also excluded twins whose zygosity was unknown or uncertain or whose first language was other than English. Finally, we included only twins whose parents reported their ethnicity as ‘white’, which was 93% of this UK sample. The present analyses are based on 2541 twin pairs [919 monozygotic (MZ) pairs, 817 same-sex dizygotic (DZ) pairs and 805 opposite-sex DZ pairs].
Widespread access to inexpensive and fast Internet connections in the UK has made online testing an attractive possibility for collecting data on the substantial samples necessary for genetic research, especially for multivariate genetic research. The advantages and potential pitfalls of data collection over the Internet have been reviewed in detail elsewhere (Birnbaum 2004). For older children, most of whom are competent computer users, it is an interactive and enjoyable medium. Through adaptive branching, it allows the use of hundreds of items to test the full range of ability, while requiring individual children to complete only a relatively small number of items to ascertain their level of performance. In tests where it is appropriate, streaming voiceovers can minimize the necessary reading. In addition, the tests can be completed over a period of several weeks, allowing children to pace the activities themselves, although they are not allowed to return to items previously administered. Finally, it is possible to intersperse the activities with games. All of these factors help maintain children’s engagement with the tests.
Reading was assessed by an adaptation of the Peabody Individual Achievement Test (PIAT-Revised; Markwardt 1997) Reading Comprehension Scale, and mathematics by three subtests based on the nferNelson Math 5-14 Series (nferNelson 2001): Understanding Number, Non-Numerical Processes, and Computation and Knowledge. g was indexed by two verbal tests – WISC-III-PI multiple-choice Vocabulary and Information (Wechsler 1992) – and two non-verbal tests – WISC-III-UK Picture Completion (Wechsler 1992) and Raven’s Standard Progressive Matrices (Raven et al. 1996). We created a g score with equal weights for the four tests by summing their standardized scores. This unit-weighted score correlated 0.99 with the corresponding factor score. Further information about the measures is available elsewhere (Oliver & Plomin 2007). We have shown that the web-based tests are reliable, stable and valid (Haworth et al. 2007). As an index of reliability, Cronbach’s alpha was 0.95 for PIAT subtests, 0.78–0.93 for the math subtests and 0.74–0.91 for the g subtests (n = 2569–2924). In terms of stability and validity, scores on the online versions of our reading and mathematics tests correlate highly with the traditional versions administered in person 1–3 months later: r = 0.80 for PIAT and 0.92 for mathematics (n = 30). Download time, a proxy for computer performance, accounted for less than 2% of the variance in the PIAT and less than 0.5% of the variance in the other tests.
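The unit-weighted composite described above can be sketched as follows. This is a minimal illustration on made-up scores, not the study's data or code: the function names are hypothetical, and standardizing with the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev


def zscores(values):
    """Standardize raw scores to mean 0 and SD 1 (population SD, an assumption)."""
    mean = fmean(values)
    sd = pstdev(values)
    return [(v - mean) / sd for v in values]


def unit_weighted_composite(tests):
    """Sum standardized scores across tests, giving each test equal weight.

    `tests` maps a test name to a list of raw scores, one per child,
    with children in the same order in every list.
    """
    standardized = [zscores(scores) for scores in tests.values()]
    return [sum(child_scores) for child_scores in zip(*standardized)]


# Made-up raw scores for four children on four tests (illustrative only).
raw = {
    "vocabulary": [10, 14, 18, 22],
    "information": [5, 9, 7, 11],
    "picture_completion": [12, 12, 16, 20],
    "matrices": [30, 34, 38, 42],
}
g_scores = unit_weighted_composite(raw)
```

Because every test is standardized before summing, no single test dominates the composite through a larger raw-score scale.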
According to the quantitative genetic model (Plomin et al. in press; Rijsdijk & Sham 2002), same-sex twins reared together resemble each other because of the additive effects of shared genes (A) or shared (common) environmental factors (C). For identical or MZ twins, the correlation between their genes is 1.00, whereas for non-identical or DZ twins, the correlation is 0.50 because DZ twins on average share half of their segregating alleles. The correlation between twins for shared environment is, by definition, 1.00 for both MZ and DZ twins growing up in the same family, while non-shared environmental influences (E) are uncorrelated and contribute to differences between twins. For the twin analyses, standardized residuals correcting for age and sex were used because the age of twins is perfectly correlated across pairs, which means that, unless corrected, variation within each age group at the time of testing would contribute to the correlation between twins and be misrepresented as shared environmental influence (Eaves et al. 1989). The same applies to the sex of the twins because MZ twins are always of the same sex. Likewise, download time, a proxy for computer performance, was also regressed out of the twins’ test scores. The assumptions of the classical twin model, and their validity, have been discussed in detail elsewhere (Boomsma et al. 2002; Martin et al. 1997; Rijsdijk & Sham 2002).
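The expectations of the classical twin model stated above can be written out as a short sketch. The function name is hypothetical, and the input values are the rough mathematics estimates quoted in the Results (40%, 31% and 29%), used here only for illustration.

```python
def expected_twin_correlations(a2, c2, e2):
    """Expected MZ and DZ twin correlations under the classical ACE model.

    a2, c2 and e2 are standardized variance components that sum to 1.
    """
    if abs(a2 + c2 + e2 - 1.0) > 1e-9:
        raise ValueError("standardized variance components must sum to 1")
    r_mz = 1.0 * a2 + 1.0 * c2  # MZ twins: genetic correlation 1.00, shared environment 1.00
    r_dz = 0.5 * a2 + 1.0 * c2  # DZ twins: genetic correlation 0.50, shared environment 1.00
    return r_mz, r_dz


# Illustrative inputs: rough mathematics estimates from the twin correlations.
r_mz, r_dz = expected_twin_correlations(a2=0.40, c2=0.31, e2=0.29)
```

With these inputs the expected correlations are about 0.71 for MZ pairs and 0.51 for DZ pairs. Non-shared environment (E) does not appear in either expectation because it is uncorrelated between twins.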
As well as examining twin correlations, we used standard ACE model-fitting analysis in Mx1.7.01 (Neale et al. 2006) where ACE stands for additive genetic influences (A), shared or common environmental influences (C) and non-shared environmental influences (E), as above. Model-fitting analysis specifies a correlational structure (a model) using matrix algebra. This model is a hypothesis about the structure of the dataset and is derived from what we know about how MZ and DZ twins are related to each other (see above). By fitting the model to the data using an iteration process, we can derive its ‘goodness of fit’ and parameter estimates for the contributions of A, C and E. Before embarking on our multivariate analysis, we initially examined sex differences in the genetic and environmental parameter estimates by comparing the fit of three full univariate ACE models (one each for reading, mathematics and g) with that of various nested models, dividing the twin pairs into five groups: MZ male, MZ female, DZ male, DZ female and DZ opposite-sex pairs. These sex-limitation models (Eley 2005) allowed us to estimate qualitative and quantitative etiological differences between the sexes (Galsworthy et al. 2000).
Finally, we used multivariate genetic model fitting to investigate the genetic and environmental etiology of the covariation between learning abilities. Figure 1 shows the phenotypic Cholesky decomposition, which partitions variance into a universal factor influencing all three traits, a factor influencing reading and mathematics independent of g and a factor influencing mathematics independent of reading and g. As shown on the left of Fig. 2, the genetic Cholesky decomposition partitions this variance further into genetic, shared environmental and non-shared environmental components. As in the phenotypic Cholesky, variance attributable to genetic influences is divided into a universal genetic factor influencing g, reading and mathematics (A1); a genetic factor specific to reading and mathematics (A2) and a genetic factor unique to mathematics (A3). The shared environmental influences (C1, C2, C3) and non-shared environmental influences (E1, E2, E3) are partitioned in the same way.
The correlated factors model (on the right) can be derived from the Cholesky model. It specifies three latent genetic factors, one for each trait, and calculates the correlation between them. Once again, the same is true for the shared and non-shared environments. These statistics, the genetic, shared environmental and non-shared environmental correlations, are unique to multivariate analysis. Along with bivariate heritability and environmentality (the proportion of the phenotypic correlation accounted for by genes or the environment), they give us an essential insight into how genetic factors and environments are shared in the etiology of learning abilities. It is important to note that the genetic and environmental correlations are independent of the heritability or environmentality of the traits. For example, two traits with very little heritability can nevertheless be highly correlated genetically (i.e. share the same genetic influence), and two highly heritable traits can be genetically uncorrelated (independent genetic influences).
In Fig. 2, the squared path coefficients influencing each measured variable in the correlated factors model can be derived from the corresponding squared paths in the Cholesky model. Finally, individual Cholesky pathways were dropped one at a time, and the fit was compared with the full model. This tests the statistical significance of the influence of each latent factor on g, reading and mathematics and indicates the most parsimonious model.
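The step from the Cholesky model to the correlated factors model can be sketched in a few lines. The path coefficients below are hypothetical values chosen for easy arithmetic, not estimates from this study, and the function names are illustrative. Traits are ordered g, reading, mathematics, as in the Cholesky decomposition described above.

```python
from math import sqrt


def covariance_from_cholesky(paths):
    """Return paths x transpose(paths) for a lower-triangular path matrix."""
    n = len(paths)
    return [
        [sum(paths[i][k] * paths[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def correlations_from_covariance(cov):
    """Rescale a covariance matrix to a correlation matrix."""
    n = len(cov)
    return [
        [cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical genetic path coefficients (rows: g, reading, mathematics;
# columns: latent factors A1, A2, A3).
genetic_paths = [
    [0.6, 0.0, 0.0],  # g loads on A1 only
    [0.3, 0.4, 0.0],  # reading loads on A1 and A2
    [0.4, 0.0, 0.3],  # mathematics loads on A1, A2 (zero here) and A3
]

genetic_cov = covariance_from_cholesky(genetic_paths)
genetic_corr = correlations_from_covariance(genetic_cov)

# Heritability of each trait is the sum of its squared genetic paths,
# i.e. the diagonal of the genetic covariance matrix.
heritabilities = [genetic_cov[i][i] for i in range(3)]
```

In this made-up example the genetic correlations come out at 0.60 (g and reading), 0.80 (g and mathematics) and 0.48 (reading and mathematics), even though the heritabilities are only 0.36, 0.25 and 0.25. The same calculation applies to the shared and non-shared environmental paths.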
As shown in Table 1, the measures exhibited small mean differences for sex (boys higher for mathematics and g) and zygosity (DZs higher for reading and g); given the large sample sizes, these differences are statistically significant despite their small size. Altogether, sex and zygosity accounted for less than 1% of the variance in all measures.
Table 1. Measure means (M) and standard deviations (SDs) by sex and zygosity
Phenotypic correlations for the age-, sex- and download-time-regressed indicators were as follows (using one member of each twin pair): mathematics–reading, r = 0.51, n = 2457, P < 0.001; mathematics–g, r = 0.63, n = 2342, P < 0.001; reading–g, r = 0.54, n = 2333, P < 0.001.
Univariate genetic analyses
Intraclass correlations (Shrout & Fleiss 1979; twin similarity coefficients) are shown in Table 2 for the total group of MZ, DZ same-sex and DZ opposite-sex twins, as well as for the male and female subgroups among the same-sex twin pairs. Correlations between MZ twins were consistently higher than those between DZ twins, suggesting a genetic contribution to reading, mathematics and g. As a first estimate, doubling the difference between the MZ and same-sex DZ correlations yields moderate heritability estimates of 40% for mathematics, 40% for reading and 36% for g. Shared environmental influences are also moderate, estimated as the extent to which MZ resemblance exceeds heritability: 31% for mathematics, 24% for reading and 35% for g. The remainder of the variance is attributed to non-shared environmental influences (plus error of measurement): 29% for mathematics, 36% for reading and 29% for g.
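The rough estimates in the preceding paragraph follow from simple arithmetic on the MZ and same-sex DZ correlations in Table 2. The sketch below reproduces them; the function name is hypothetical, and rounding to two decimals is applied only for display.

```python
def falconer_estimates(r_mz, r_dz):
    """Rough ACE estimates from MZ and same-sex DZ twin correlations."""
    a2 = 2 * (r_mz - r_dz)  # heritability: double the MZ-DZ difference
    c2 = r_mz - a2          # shared environment: MZ resemblance beyond heritability
    e2 = 1 - r_mz           # non-shared environment plus measurement error
    return a2, c2, e2


# (MZ, same-sex DZ) intraclass correlations from Table 2.
table2 = {
    "mathematics": (0.71, 0.51),
    "reading": (0.64, 0.44),
    "g": (0.71, 0.53),
}

estimates = {
    trait: tuple(round(x, 2) for x in falconer_estimates(r_mz, r_dz))
    for trait, (r_mz, r_dz) in table2.items()
}
```

This gives (0.40, 0.31, 0.29) for mathematics, (0.40, 0.24, 0.36) for reading and (0.36, 0.35, 0.29) for g, matching the percentages quoted above. These are first approximations only; the model-fitting estimates in Table 3 use all of the data and provide confidence intervals.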
Table 2. Twin similarity coefficients (intraclass correlations) for mathematics, reading and g

| Measure | MZ | DZS | DZO | DZall | MZM | MZF | DZS male | DZS female |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mathematics | 0.71 (0.68–0.75), n = 863 | 0.51 (0.45–0.56), n = 762 | 0.43 (0.37–0.49), n = 751 | 0.47 (0.43–0.51), n = 1513 | 0.69 (0.63–0.74), n = 356 | 0.73 (0.68–0.77), n = 507 | 0.47 (0.38–0.55), n = 343 | 0.54 (0.47–0.60), n = 419 |
| Reading | 0.64 (0.60–0.67), n = 919 | 0.44 (0.39–0.50), n = 817 | 0.42 (0.36–0.47), n = 805 | 0.43 (0.39–0.47), n = 1622 | 0.64 (0.58–0.70), n = 383 | 0.63 (0.58–0.68), n = 536 | 0.43 (0.34–0.51), n = 373 | 0.46 (0.38–0.53), n = 444 |
| g | 0.71 (0.67–0.74), n = 833 | 0.53 (0.47–0.58), n = 728 | 0.44 (0.38–0.50), n = 709 | 0.48 (0.44–0.52), n = 1437 | 0.71 (0.65–0.76), n = 342 | 0.71 (0.66–0.75), n = 491 | 0.50 (0.42–0.58), n = 328 | 0.54 (0.47–0.61), n = 400 |

All similarity coefficients are based on age-, sex- and download-time-corrected scores.
95% confidence intervals in parentheses.
DZall, all DZ pairs; DZO, opposite-sex DZ pairs; DZS, same-sex DZ pairs; MZ, MZ pairs; MZF, MZ female pairs; MZM, MZ male pairs; n, number of twin pairs.
As shown in Table 3, ACE model-fitting results are consistent with estimates based on the twin correlations in Table 2. For mathematics, reading and g, genetic influence is moderate (49%, 38% and 44%, respectively, for the best-fitting model). Shared and non-shared environmental influences are more modest.
Table 3. Parameter estimates for mathematics, reading and g
These estimates are based on the best-fitting submodel of the full sex-limitation model, the null model, indicating no quantitative or qualitative differences in etiology between males and females.
95% confidence intervals in parentheses.
A, additive genetic influence; C, shared environmental influence; E, non-shared environmental influence.
Moreover, across zygosity, correlations within male and female pairs (Table 2) were similar, suggesting similar ACE estimates for boys and girls. In addition, correlations between same-sex DZ twins were similar to those between opposite-sex DZ twins, suggesting no qualitative sex differences. Sex-limitation model fitting (Eley 2005) confirmed these expectations, yielding no significant sex differences in ACE parameter estimates or in comparisons between same-sex and opposite-sex DZ twins. The best-fitting model from the sex-limitation analyses was the null model, which includes no quantitative or qualitative differences in etiology between males and females: likelihood ratio χ2 test with Δdf compared with full sex-limitation model for mathematics, reading and g, respectively: 4.17, 3, P = 0.24; 0.18, 3, P = 0.98; 3.79, 3, P = 0.29. For this reason, and to maximize power, our multivariate genetic analyses combined sexes.
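The P values reported for the likelihood ratio tests can be checked directly from the χ2 differences, each with 3 degrees of freedom. The sketch below uses the closed-form upper-tail probability for a χ2 distribution with 3 degrees of freedom, so only the Python standard library is needed; the function name is hypothetical.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    # Closed form for df = 3: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


# Chi-square differences between the null model and the full sex-limitation model.
lrt_statistics = {"mathematics": 4.17, "reading": 0.18, "g": 3.79}
p_values = {trait: chi2_sf_df3(stat) for trait, stat in lrt_statistics.items()}
```

The resulting probabilities are approximately 0.24, 0.98 and 0.29, in line with the values reported above. None approaches conventional significance, so dropping the sex-specific parameters does not worsen the fit.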
Multivariate genetic analysis
Cross-trait twin correlations (e.g. twin 1 reading versus twin 2 mathematics) are the essence of multivariate genetic analysis. Table 4 shows that cross-trait twin correlations are consistently greater for MZ than for DZ twins; the MZ cross-trait twin correlations are nearly as great as the phenotypic correlations for the same individual, shown in the third column of Table 4. Doubling the difference between the MZ and DZ cross-trait correlations estimates the genetic contribution to the phenotypic correlations (0.18, 0.32 and 0.32, respectively, for the three rows of Table 4), and dividing these estimates by the phenotypic correlations for the same individual (third column of Table 4) indicates the proportional contribution of genetic influences to the phenotypic correlation: the bivariate heritability (35%, 59% and 51%, respectively). As indicated in the Methods, the genetic correlation, unlike bivariate heritability, is independent of heritability. The genetic correlation can be estimated by dividing the genetic contribution to the phenotypic correlation by the product of the square roots of the heritabilities of the two traits (Plomin & DeFries 1979). These rough estimates of genetic correlations are substantial: 0.40 for reading and mathematics, 0.73 for reading and g, and 0.68 for mathematics and g.
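The bivariate arithmetic described above can be reproduced from the rounded correlations in Table 4. This is a sketch with a hypothetical function name; because it starts from rounded values, it is a check on the logic rather than a replacement for model fitting.

```python
def bivariate_heritability(r_mz_cross, r_dz_cross, r_phenotypic):
    """Genetic contribution to a phenotypic correlation and its proportion."""
    genetic_contribution = 2 * (r_mz_cross - r_dz_cross)
    proportion = genetic_contribution / r_phenotypic
    return genetic_contribution, proportion


# (MZ cross-trait, DZ cross-trait, same-individual phenotypic) from Table 4.
table4 = {
    "reading-mathematics": (0.46, 0.37, 0.51),
    "reading-g": (0.52, 0.36, 0.54),
    "mathematics-g": (0.56, 0.40, 0.63),
}

results = {
    pair: tuple(round(x, 2) for x in bivariate_heritability(*values))
    for pair, values in table4.items()
}
```

The genetic contributions come out at 0.18, 0.32 and 0.32, and the proportions at 0.35 for reading and mathematics, 0.59 for reading and g, and 0.51 for mathematics and g.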
Table 4. Cross-trait twin similarity coefficients (ICC1.1) for reading, mathematics and g

| Cross-trait pair | MZ | DZall | Same individual |
| --- | --- | --- | --- |
| Twin 1 reading–twin 2 mathematics | 0.46 (0.41–0.51), n = 875 | 0.37 (0.33–0.41), n = 1548 | 0.51 (0.48–0.54), n = 2457 |
| Twin 1 reading–twin 2 g | 0.52 (0.47–0.56), n = 840 | 0.36 (0.32–0.41), n = 1462 | 0.54 (0.51–0.56), n = 2333 |
| Twin 1 mathematics–twin 2 g | 0.56 (0.51–0.61), n = 838 | 0.40 (0.36–0.44), n = 1457 | 0.63 (0.60–0.65), n = 2342 |

All similarity coefficients are based on age-, sex- and download-time-corrected scores. Correlations for same individual are based on one random twin from each pair. Reversing the ordering of the pairs (e.g. twin 2 reading–twin 1 mathematics) produces the same results.
95% confidence intervals in parentheses.
DZall, all DZ pairs; MZ, MZ pairs; n, number of twin pairs.
The results of multivariate model-fitting analyses echo this simple analysis based on the cross-trait twin correlations. The Cholesky decomposition model is shown in Fig. 1. Tables 5 and 6 show parameter estimates and confidence intervals, with the results summarized visually in Fig. 2. As shown in Table 5 and Fig. 2 (curved arrows, top right), genetic correlations are substantial (0.57 for reading and mathematics, 0.61 for reading and g, and 0.75 for mathematics and g), providing evidence in support of the Generalist Genes hypothesis. Bivariate heritabilities (Table 5) are about 50%, indicating that about half of the phenotypic correlations across g, reading and mathematics are mediated genetically. The Cholesky analysis (Table 6 and left side of Fig. 2) also indicates that, independent of g, there is no residual genetic overlap between reading and mathematics and that there are significant genetic influences specific to reading and mathematics.
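The statement that bivariate heritabilities are about 50% can be checked from the reported heritabilities, genetic correlations and phenotypic correlations. The sketch below assumes the standard relationship in which the genetic contribution to a phenotypic correlation is the genetic correlation weighted by the square roots of the two heritabilities. The function name is hypothetical, and the inputs are the rounded values quoted in the text, so the outputs are approximate.

```python
from math import sqrt


def bivariate_heritability_from_model(h2_x, h2_y, r_genetic, r_phenotypic):
    """Proportion of a phenotypic correlation that is mediated genetically."""
    genetic_contribution = sqrt(h2_x) * r_genetic * sqrt(h2_y)
    return genetic_contribution / r_phenotypic


# Rounded values reported in the text.
heritability = {"reading": 0.38, "mathematics": 0.49, "g": 0.44}
pairs = {
    # pair: (genetic correlation, phenotypic correlation)
    ("reading", "mathematics"): (0.57, 0.51),
    ("reading", "g"): (0.61, 0.54),
    ("mathematics", "g"): (0.75, 0.63),
}

proportions = {
    pair: bivariate_heritability_from_model(
        heritability[pair[0]], heritability[pair[1]], r_a, r_p
    )
    for pair, (r_a, r_p) in pairs.items()
}
```

With these rounded inputs the proportions are roughly 0.48, 0.46 and 0.55, consistent with about half of each phenotypic correlation being genetically mediated. The exact values in Table 5 come from the fitted model and may differ slightly.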
Table 5. Reading, mathematics and g: multivariate analysis fitting a correlated factors model. Genetic and environmental correlations, bivariate heritability and environmental influence (proportion of phenotypic correlation mediated by A, C and E)
Reading and mathematics
Reading and g
Mathematics and g
rA, rC, rE = genetic, shared environmental and non-shared environmental correlations. Model fit statistics are reported in the footnote to Table 6.
95% confidence intervals in parentheses.
Mediation of rP
Table 6. Reading, mathematics and g: multivariate analysis fitting a Cholesky model. Standardized, squared path coefficients for g, reading and mathematics
Likelihood ratio χ2 test with Δdf compared with saturated phenotypic model: 22.8, 24 df, P = 0.53. Sample-size-adjusted Bayesian Information Criterion = −17658.
95% confidence intervals in parentheses.
A, additive genetic influence; C, shared environmental influence; E, non-shared environmental influence.
Shared environment also shows substantial overlap between reading, mathematics and g, contributing almost as much as genetics to their phenotypic correlations and yielding correlations of 0.89–0.94. Non-shared environment, which includes error of measurement, is the chief contributor to differences between abilities, accounting for only about 10% of the phenotypic correlations and yielding correlations of 0.15–0.22.
Using direct tests of cognitive ability, reading and mathematics in the largest representative twin sample to date, the present study provides conclusive evidence in favor of the Generalist Genes hypothesis, the first time it has been tested by direct assessment of cognitive abilities in a large sample. The key results are the genetic correlations of 0.57 between reading and mathematics, 0.61 between reading and g, and 0.75 between mathematics and g, which are similar to the average result for previous multivariate genetic studies (Markowitz et al. 2005; Plomin & Kovas 2005). Despite the large sample, 95% confidence intervals range from 0.45 to 0.86 across the three domains, indicating that the differences in genetic correlations are not significant, and only permitting the conclusion that all three genetic correlations are substantial. The genetic correlations between domains imply that genetic correlations within domains would be even higher, and this is what research suggests (Plomin & Kovas 2005). For example, genetic correlations have been reported to be about 0.90 between reading processes such as word recognition, orthographic coding and phonological decoding (Gayan & Olson 2003), about 0.90 between mathematical computation, application and comprehension (Kovas et al. 2007), and about 0.80 between verbal and spatial abilities (Petrill 2002). Likewise, the correlations between shared environmental factors are strong in this childhood sample, accounting for about 40% of the phenotypic correlation between traits. We would expect the shared environmental contribution to the etiology of g to diminish throughout childhood and into adolescence while the genetic contribution increases (Boomsma 1993). In contrast, the non-shared environment accounts for very little of the phenotypic correlation between traits, contributing largely to discrepancies in cognitive profiles.
The Generalist Genes hypothesis proposes that some, but not all, genetic effects are general – what is novel is the substantial extent to which genetic effects are general. Nonetheless, these genetic correlations are not 1.0, suggesting that there are also genetic effects specific to each domain. The Cholesky analysis (Table 6, left side of Fig. 2) indicates significant domain-specific genetic variance for both reading and mathematics.
These results predict that, when genes are identified that account for the substantial heritability of reading, mathematics and g, genes associated with one domain such as reading are highly likely to be associated as well with mathematics and g. The Generalist Genes hypothesis implies that molecular genetic attempts to identify genes will profit from targeting what is in common among cognitive domains, as well as what is specific to each (Butcher et al. 2006).
For neuroscience, the general effect of genes across such different domains warrants consideration of the possibility that these genes will be found to have similarly general effects across brain structures and functions (Kovas & Plomin 2006). For example, the basic synapse comprises 1000 proteins, 80% of which are of unknown function in the nervous system (Grant 2006; Jordan & Ziff 2006). The best-understood synaptic system, the N-methyl-D-aspartate receptor complex (NRC/MASC), includes 186 proteins that have been implicated in synaptic plasticity and a wide range of cognitive processes (Pocklington et al. 2006). Identifying generalist genes on the basis of their association with downstream cognitive processes could facilitate a systems approach to brain organization (Armstrong et al. 2006) because the genes are all anchored in these functional cognitive products of the brain.
For example, genome-wide association studies guided by quantitative genetic findings are beginning to identify sets of polymorphisms associated with cognitive abilities (Butcher et al. 2005; Meaburn et al. 2007), enabling a top-down approach complementary to bottom-up functional genomic analysis: an approach we have termed behavioral genomics (Harlaar et al. 2005; Haworth et al. 2007). This allows us to take sets of genes related to cognitive abilities and look at them from a variety of perspectives: multivariate (are the genes associated with one cognitive trait also associated with other cognitive traits?), longitudinal (do the associations reflect changes in the heritability of a trait across time?) and environmental (are the genes associated with a cognitive trait also associated with relevant environments or does the environment influence the strength of the association between the genes and the trait?). We anticipate that the patterns of association emerging from these studies will reflect both the genetic and environmental etiology of cognition derived from quantitative genetics, and the biological systems underlying those cognitive processes arrived at through functional genomics. In this sense, generalist genes could integrate top-down and bottom-up approaches to the systems biology of the brain.
We thank the parents of the twins in TEDS and the twins themselves for making this study possible. TEDS is supported by a program grant from the UK Medical Research Council (G500079), and this research is also supported by a grant from the US NICHD/OSERS (HD46167).
O.S.P.D. carried out the analyses and drafted the manuscript together with R.P., who also designed the study. Y.K., N.H., S.A.P. and P.S.D. provided intellectual input, and P.B., A.M. and J.F. coordinated the data collection. All authors contributed to the design of the web battery.