The developmental origins of genetic factors influencing language and literacy: Associations with early-childhood vocabulary.

Background: The heritability of language and literacy skills increases from early-childhood to adolescence. The underlying mechanisms are little understood and may involve (a) the amplification of genetic influences contributing to early language abilities, and/or (b) the emergence of novel genetic factors (innovation). Here, we investigate the developmental origins of genetic factors influencing mid-childhood/early-adolescent language and literacy. We evaluate evidence for the amplification of early-childhood genetic factors for vocabulary, in addition to genetic innovation processes. Methods: Expressive and receptive vocabulary scores at 38 months, thirteen language- and literacy-related abilities and nonverbal cognition (7-13 years) were assessed in unrelated children from the Avon Longitudinal Study of Parents and Children (ALSPAC, N individuals ≤ 6,092). We investigated the multivariate genetic architecture underlying early-childhood expressive and receptive vocabulary and each of 14 mid-childhood/early-adolescent language, literacy or cognitive skills, using trivariate structural equation (Cholesky) models based on genome-wide genetic relationship matrices. The individual path coefficients of the resulting structural models were finally meta-analysed to evaluate evidence for overarching patterns. Results: We observed little support for the emergence of novel genetic sources for language, literacy or cognitive abilities during mid-childhood or early adolescence. Instead, genetic factors of early-childhood vocabulary, especially those unique to receptive skills, were amplified and represented the majority of genetic variance underlying many of these later complex skills (≤ 99%).
The most predictive early genetic factor accounted for 29.4% (SE = 12.9%) to 45.1% (SE = 7.6%) of the phenotypic variation in verbal intelligence and literacy skills, but also for 25.7% (SE = 6.4%) in performance intelligence, while explaining only a fraction of the phenotypic variation in receptive vocabulary (3.9% (SE = 1.8%)). Conclusions: Genetic factors contributing to many complex skills during mid-childhood and early adolescence, including literacy, verbal cognition and nonverbal cognition, originate developmentally in early-childhood and are captured by receptive vocabulary. This suggests developmental genetic stability and overarching aetiological mechanisms.


Introduction
Individual differences in vocabulary during the preschool period are predictive of many later language- and literacy-related skills (Bleses, Makransky, Dale, Højen, & Ari, 2016; Bornstein, Hahn, Putnick, & Suwalsky, 2014; Duff, Reen, Plunkett, & Nation, 2015; Lee, 2011), which are important components of academic achievement (Durham, Farkas, Hammer, Bruce Tomblin, & Catts, 2007). For example, a latent factor consisting of expressive and receptive vocabulary size at 16-24 months predicted vocabulary size, as well as performance on tests of phonological awareness, reading accuracy and reading comprehension, in children five years later (Duff et al., 2015). Similarly, infants with a larger expressive vocabulary at 24 months showed a larger vocabulary as well as better decoding, word recognition and passage comprehension skills when assessed up to primary school (Lee, 2011).
Associations between infant vocabulary and language and literacy skills during later life may arise due to shared underlying aetiologies. According to the 'simple view of reading' theory, reading comprehension is the product of printed word recognition (decoding) and oral language comprehension (Gough & Tunmer, 1986). Early vocabulary is a central component of both these abilities (Jong & Leij, 2002). Decoding is substantially based on phonological awareness (i.e. the awareness of sound structures of speech), which develops during the preschool period and has been shown to be related to vocabulary size (Jong & Leij, 2002). Listening comprehension (i.e. the understanding of spoken language), particularly bottom-up processing, necessarily begins with vocabulary comprehension (Dickinson & Neuman, 2007). Spelling performance is also closely related to phonological awareness and other phonological abilities (Dich & Cohn, 2013). However, the biological processes that underlie these complex developmental interrelationships are only partially understood.
Variation in expressive and receptive language skills, assessed during the first four years of life, is modestly heritable, while genetic influences on language and literacy skills assessed from mid-childhood to early adolescence are moderate to strong (Harlaar, Hayiou-Thomas, Dale, & Plomin, 2008; Hayiou-Thomas, Dale, & Plomin, 2012; St Pourcain et al., 2014; Verhoef et al., 2019). Specifically, longitudinal twin studies, which estimate heritability indirectly from differences in phenotypic correlations between monozygotic and dizygotic twins (twin-h²; Plomin, DeFries, Knopik, & Neiderhiser, 2016), have reported heritability estimates of 22% to 28% for a combined language measure including expressive vocabulary at 2, 3 and 4 years of age (Hayiou-Thomas et al., 2012). A considerable part of this twin-h² can be attributed to common genetic variation, as estimated from directly assessed genotype information in population-based samples of unrelated children: single nucleotide polymorphism (SNP)-h² estimates are 13% and 14% for expressive vocabulary at 15-18 and 24-30 months of age, respectively (St Pourcain et al., 2014). In contrast, the heritability of language and literacy skills assessed from mid-childhood onwards is larger, with twin-h² estimates from 47% to 72% (Harlaar et al., 2008; Hayiou-Thomas et al., 2012) and SNP-h² estimates from 32% to 54% (Verhoef et al., 2019). Twin-based genetic correlations, reflecting the extent to which genetic variation is shared between two traits, are moderate between early-childhood and later developmental stages and imply some genetic stability (Harlaar et al., 2008; Hayiou-Thomas et al., 2012).
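The logic of estimating twin-h² from MZ and DZ correlations can be illustrated with Falconer's textbook approximation. The sketch below is only an illustration: the cited twin studies fit full structural equation models rather than this formula, the function name is hypothetical, and the twin correlations are invented values chosen to give an estimate within the 22% to 28% range mentioned above.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Falconer's approximation of variance components from twin correlations.

    Assumes MZ twins share all and DZ twins on average half of their
    segregating genetic variation, and that shared environments are equally
    similar for both twin types.
    """
    h2 = 2.0 * (r_mz - r_dz)   # additive genetic share (twin-h2)
    c2 = 2.0 * r_dz - r_mz     # shared-environment share
    e2 = 1.0 - r_mz            # non-shared environment plus measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical twin correlations, chosen for illustration only
est = falconer_estimates(r_mz=0.80, r_dz=0.68)
print({k: round(v, 2) for k, v in est.items()})  # {'h2': 0.24, 'c2': 0.56, 'e2': 0.2}
```

Because MZ twins share roughly twice as much segregating genetic variation as DZ twins, doubling the difference in their correlations isolates the additive genetic share under these assumptions.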
The increase in heritability from early-childhood to adolescence has been reported for many cognitive skills (Briley & Tucker-Drob, 2013; Haworth et al., 2010), suggesting overarching aetiological mechanisms that may involve processes of genetic innovation and amplification (Plomin & DeFries, 1985). Innovation refers to novel genetic factors emerging during development (i.e. previously unrelated genetic variation becomes associated with a trait over time). In contrast, amplification refers to genetic influences that are associated with a trait throughout development and explain increasingly more variation with age (Briley & Tucker-Drob, 2013). A meta-analysis of twin studies on cognitive abilities suggested that novel genetic influences predominate during the transition from early to middle childhood, supporting innovation (Briley & Tucker-Drob, 2013). From eight years of age onwards (mid-childhood), there was evidence for enhanced genetic stability, with amplification processes dominating (Briley & Tucker-Drob, 2013). A similar pattern of amplification and innovation processes was reported by twin studies that used latent factor models to examine genetic links between early language (including expressive vocabulary and syntax skills between 2 and 4 years of age) and both mid-childhood/adolescent language (Hayiou-Thomas et al., 2012) and reading abilities (Harlaar et al., 2008). Thus, innovation and, to a lesser degree, amplification processes may account for the observed increase in heritability of language and literacy skills during the transition from early to mid-childhood.
Beyond latent factor twin analyses (Harlaar et al., 2008; Hayiou-Thomas et al., 2012), the developmental origins of genetic variation contributing to child and adolescent language, literacy and cognition are poorly characterised. In particular, genetic relationships with early-childhood receptive vocabulary are unknown, and the spectrum of interrelated later-life skills that are genetically related to early-childhood language abilities is only partially understood. Furthermore, evidence for amplification and innovation processes has not yet been established beyond twin research. Here, we use SNP information from directly genotyped markers and structural equation models to seek evidence for innovation and/or amplification processes during language and literacy development within a sample of unrelated children from the Avon Longitudinal Study of Parents and Children (ALSPAC, N ≤ 6,092). Specifically, we study expressive and receptive vocabulary at 38 months and a wide range of mid-childhood/early-adolescent language- and literacy-related skills, including reading, spelling, phonemic awareness, listening comprehension, nonword repetition and verbal intelligence, as well as nonverbal intelligence (7-13 years).

Participants
All participants were drawn from ALSPAC, a UK population-based longitudinal pregnancy-ascertained birth cohort (estimated birth dates: 1991-1992; Appendix S1; Boyd et al., 2013; Fraser et al., 2013). The ALSPAC Ethics and Law Committee and the Local Research Ethics Committees provided ethical approval for the study. Consent for biological samples was collected in accordance with the Human Tissue Act (2004). Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time.
ALSPAC participants were genotyped using the Illumina HumanHap550 quad chip genotyping platform. Standard genomic quality control was performed using PLINK (v1.07; Purcell et al., 2007; Appendix S2). After quality control, 465,740 SNPs and ≤ 6,092 individuals with high-quality genetic and phenotypic data remained.
Early-childhood vocabulary. Expressive and receptive vocabulary at 38 months were assessed by parental report using a 123-word list from the MacArthur Communicative Development Inventory Words & Sentences (CDI; Fenson et al., 1993). Parents were asked whether their child was able to (a) say, (b) understand, or (c) both say and understand each word on the list. Expressive vocabulary size reflects the number of words a child produces, regardless of whether the child also understands them, and was defined as the sum of words the child (a) says and (c) says and understands. Receptive vocabulary size reflects the number of words a child understands, regardless of whether the child is able to produce them, and was defined as the sum of words the child (b) understands and (c) says and understands. CDI expressive vocabulary scores have high validity, correlating with direct assessments at over 0.70 (Dale, 1991; Ring & Fenson, 2000). The correlation between parental and direct assessment of receptive vocabulary is 0.55 (Ring & Fenson, 2000). In total, 6,092 children had both early vocabulary and genome-wide genetic data available (Table 1).
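The scoring rule above can be written down directly. The following Python sketch is a minimal illustration, assuming that each word's parental response is coded as "a" (says), "b" (understands), "c" (says and understands) or None (neither); this coding, the function name and the example child are hypothetical and do not reflect the ALSPAC data format.

```python
from collections import Counter

# Hypothetical response coding: a = says, b = understands, c = both, None = neither
VALID_CODES = {"a", "b", "c", None}


def cdi_vocabulary_scores(responses, n_items=123):
    """Return (expressive, receptive) vocabulary sizes from per-word codes."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    unknown = set(responses) - VALID_CODES
    if unknown:
        raise ValueError(f"unexpected response codes: {unknown}")
    counts = Counter(responses)
    expressive = counts["a"] + counts["c"]  # words the child says
    receptive = counts["b"] + counts["c"]   # words the child understands
    return expressive, receptive


# Hypothetical child: 60 words said and understood, 30 only understood,
# 5 only said and 28 neither (123 words in total).
child = ["c"] * 60 + ["b"] * 30 + ["a"] * 5 + [None] * 28
print(cdi_vocabulary_scores(child))  # (65, 90)
```

Words marked (c) count towards both scores, which is why expressive and receptive vocabulary are correlated by construction.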
Mid-childhood/early-adolescent language- and literacy-related abilities. Thirteen language- and literacy-related abilities (LRAs) capturing reading, spelling, phonemic awareness, listening comprehension, nonword repetition and verbal intelligence were assessed from mid-childhood to early adolescence (7-13 years, N ≤ 5,749) using both standardised and ALSPAC-specific instruments (Table 1, Appendix S3). Word reading accuracy and comprehension (age 7 years) were measured using the basic reading subtest of the Wechsler Objective Reading Dimensions (WORD) assessment. Word and nonword reading accuracy were assessed using an ALSPAC-specific measure (Appendix S3), and passage reading accuracy and speed with the revised Neale Analysis of Reading Ability (NARA II), all at age 9 years. Word and nonword reading speed (age 13 years) were captured with the Test of Word Reading Efficiency (TOWRE). Spelling accuracy (age 7 and 9 years) was assessed with an ALSPAC-specific measure (Appendix S3). Phonemic awareness (age 7 years) was measured with the Auditory Analysis Test (AAT). Listening comprehension, nonword repetition and verbal intelligence quotient (VIQ) scores (all age 8 years) were assessed with a subset of the Wechsler Objective Language Dimensions (WOLD) test, an adaptation of the Children's Test of Nonword Repetition (CNRep) and the Wechsler Intelligence Scale for Children (WISC-III), respectively. A detailed description of each instrument, including reliability, validity and references, is available in Table 1 and Appendix S3.
Mid-childhood performance intelligence. We studied performance intelligence quotient (PIQ) scores (age 8 years), assessed using the WISC-III (Table 1, Appendix S3), as part of sensitivity analyses.
Phenotype transformation. Early-childhood vocabulary and mid-childhood/early-adolescent LRA and PIQ scores were rank-transformed to achieve normality and to allow for comparisons of genetic effects across different psychological instruments. All measures were residualised for sex, age (unless measures were derived using age-specific norms) and the two most significant ancestry-informative principal components, calculated using EIGENSOFT (v6.1.4; Price et al., 2006). In addition, vocabulary scores were residualised for age squared, as vocabulary develops rapidly during early-childhood (Brooks & Meltzoff, 2008).
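The transformation and residualisation steps can be sketched as follows. The text does not specify the exact rank transformation, so this sketch assumes a rank-based inverse normal transformation with Blom's offset (c = 3/8) and ordinary least-squares residualisation; the function names are hypothetical and all data are simulated.

```python
import numpy as np
from statistics import NormalDist


def average_ranks(x):
    """1-based ranks; tied values receive the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(x.size)
    i = 0
    while i < x.size:
        j = i
        while j + 1 < x.size and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def rank_inverse_normal(x, c=3.0 / 8.0):
    """Rank-based inverse normal transform (Blom offset; an assumption)."""
    r = average_ranks(x)
    p = (r - c) / (r.size - 2.0 * c + 1.0)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(pi) for pi in p])


def residualise(y, covariates):
    """Residuals of an OLS regression of y on an intercept plus covariates."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


# Simulated data for illustration only
rng = np.random.default_rng(2020)
n = 500
sex = rng.integers(0, 2, n)
age = rng.normal(38.5, 0.7, n)                  # months
age_c = age - age.mean()                        # centred before squaring
pcs = rng.normal(size=(n, 2))                   # two ancestry PCs
vocab = np.clip(np.round(rng.normal(80, 25, n)), 0, 123)  # tied, bounded scores

covariates = np.column_stack([sex, age_c, age_c ** 2, pcs])
vocab_adj = residualise(rank_inverse_normal(vocab), covariates)
```

Averaging tied ranks matters for bounded count scores such as CDI vocabulary, where many children share the same score. Centring age before squaring leaves the fitted residuals unchanged but keeps the regression numerically well conditioned.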

Phenotypic correlations. Phenotypic correlations (rₚ) were calculated for untransformed and rank-transformed scores using Spearman rank-correlation and Pearson correlation coefficients, respectively. Patterns were highly similar for untransformed and transformed scores (Figure S1).
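One reason the two sets of correlations agree is that a Spearman correlation is simply a Pearson correlation computed on ranks, so it is unaffected by any monotone rescaling of the scores. The Python sketch below demonstrates this with simulated, skewed scores; the helper names are hypothetical, and the example does not reproduce the ALSPAC correlations.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; ties receive the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Simulated, right-skewed scores (illustration only)
random.seed(1)
x = [random.expovariate(1.0) for _ in range(300)]
y = [xi + random.gauss(0.0, 1.0) for xi in x]

rho_raw = spearman(x, y)
rho_monotone = spearman([math.exp(v) for v in x], y)  # any monotone rescaling
print(round(pearson(x, y), 3), round(rho_raw, 3), rho_raw == rho_monotone)
```

Since a rank-based transformation preserves the ordering of scores, Pearson correlations of transformed scores closely track Spearman correlations of the raw scores, although the two are not numerically identical.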
Genome-wide Complex Trait Analysis. SNP-h² was estimated using restricted maximum-likelihood (REML) analyses as implemented in Genome-wide Complex Trait Analysis (GCTA, v1.26.0, https://cnsgenomics.com/software/gcta/) software (Yang, Lee, Goddard, & Visscher, 2011). This method compares unrelated individuals pair by pair and relates their phenotypic similarity to their genetic similarity. Genetic relatedness between individuals is captured by a genetic relationship matrix (GRM), which is a matrix with as many rows and columns as there are individuals (Yang et al., 2011). In addition, we estimated SNP-h², genetic correlations, factorial coheritability (the proportion of total genetic variance explained by a specific genetic factor) and bivariate heritability (the contribution of genetic factors to the observed phenotypic covariance between two measures) with genetic-relationship-matrix structural equation modelling (GSEM; Appendices S5-S6).
Due to computational constraints, it was not possible to include all measures of interest in one large structural equation model. Consequently, our data analysis strategy followed a two-step procedure. First, we fitted 13 trivariate Cholesky decomposition models, each consisting of expressive and receptive vocabulary at 38 months and one of the 13 LRAs (in this order, termed 'forward' GSEM; Figures S2a and S3a). Second, we meta-analysed the absolute GSEM path coefficients of these 13 models across predefined domains, including (i) reading-related measures, (ii) spelling-related measures and (iii) all LRA outcomes (Table S1), accounting for interrelatedness between LRAs (R package metafor, R v3.2.0, http://www.metafor-project.org/doku.php; Appendix S7). As Cholesky decompositions are sensitive to the order of modelled traits, the order of the two vocabulary measures at 38 months was reversed within the 13 trivariate Cholesky decomposition models as part of sensitivity analyses (termed 'reverse' GSEM; Figures S4a and S5a).
Finally, to compare the genetic covariance patterns of LRAs with those of nonverbal cognitive abilities, we studied expressive and receptive vocabulary at 38 months together with PIQ at 8 years.
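To illustrate what a GRM contains, the following Python sketch builds a standardised GRM from simulated allele counts. It is a simplified version of the estimator described by Yang et al. (2011): GCTA applies additional corrections (for example, to the diagonal) and handles missing genotypes, both of which are omitted here. The function name and all data are hypothetical.

```python
import numpy as np


def standardised_grm(genotypes):
    """Genetic relationship matrix from an (individuals x SNPs) allele-count matrix.

    Simplified sketch: entries are averages over SNPs of products of
    standardised genotypes. GCTA's diagonal correction and missing-data
    handling are not implemented.
    """
    G = np.asarray(genotypes, dtype=float)
    p = G.mean(axis=0) / 2.0              # sample allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)          # monomorphic SNPs carry no information
    G, p = G[:, keep], p[keep]
    Z = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return Z @ Z.T / Z.shape[1]           # n x n: one row and column per individual


# Simulated unrelated individuals (illustration only)
rng = np.random.default_rng(7)
n_ind, n_snp = 50, 2000
freqs = rng.uniform(0.05, 0.5, n_snp)
genotypes = rng.binomial(2, freqs, size=(n_ind, n_snp))
A = standardised_grm(genotypes)
```

For unrelated individuals, diagonal entries are close to 1 and off-diagonal entries scatter tightly around zero; REML then asks whether these small deviations in genetic similarity predict phenotypic similarity.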

Structural equation modelling
Next, we modelled multivariate genetic variances between expressive and receptive vocabulary at 38 months and, in turn, each of the 13 mid-childhood/early-adolescent LRAs using GSEM. Within each forward GSEM model, the estimated path coefficients link to shared and unique genetic variance components through structural equations (Appendix S4). SNP-h 2 estimates were consistent between GCTA and GSEM (Table S2).
The squared path coefficient of the third genetic factor (A3) captures genetic variance unique to the studied LRA, independent of genetic factors contributing to expressive and/or receptive vocabulary at 38 months (a₃₃; Figure S2a). We found little evidence for novel genetic LRA influences arising after early-childhood (Figure 2, Figure S2).
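The role of each path coefficient in a forward model can be made explicit with a small numerical example. The following Python sketch assumes standardised phenotypes (unit phenotypic variance), so that squared paths are proportions of phenotypic variance; the path values are hypothetical and are not estimates from this study.

```python
import numpy as np

# Hypothetical standardised path coefficients for one forward model
# (trait order: expressive vocabulary, receptive vocabulary, LRA).
# These values are illustrative and are not estimates from this study.
a11 = 0.35
a21, a22 = 0.25, 0.20
a31, a32, a33 = 0.15, 0.55, 0.05

Lambda = np.array([[a11, 0.0, 0.0],
                   [a21, a22, 0.0],
                   [a31, a32, a33]])   # lower-triangular Cholesky factor

sigma_A = Lambda @ Lambda.T            # implied genetic (co)variance matrix

# Share of LRA phenotypic variance explained by each genetic factor A1, A2, A3
lra_contributions = Lambda[2] ** 2     # a31^2, a32^2, a33^2
lra_snp_h2 = lra_contributions.sum()   # equals sigma_A[2, 2]
novel_fraction = lra_contributions[2] / lra_snp_h2   # innovation: A3's share

print(np.round(lra_contributions, 4), round(lra_snp_h2, 4), round(novel_fraction, 3))
```

In this toy configuration, almost all of the LRA's SNP-h² flows through a₃₂, the pattern described as amplification of receptive-vocabulary-specific genetic influences, while a₃₃², the innovation term, is negligible.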
At the level of individual LRAs, forward GSEMs identified two closely related developmental association patterns. The first pattern, observed for VIQ only, involves shared genetic variation with both expressive (a₃₁) and receptive (a₃₂) vocabulary (Figure 2A). The second pattern primarily involves an amplification of genetic influences for receptive vocabulary (a₃₂) that relate to multiple literacy skills, including reading accuracy/comprehension at 7 years (Figure 2C), reading accuracy at 9 years (assessed with NARA II), reading speed at 9 years, word and nonword reading speed at 13 years and spelling accuracy at 7 years (Figure S2d-i).
To evaluate overarching association patterns across mid-childhood/early-adolescent language and literacy skills (Figures S1 and S6), we meta-analysed absolute path coefficients across all 13 forward GSEM models (Table S1). This meta-analysis confirmed the amplification of genetic influences that are unique to receptive vocabulary at 38 months (meta-path-coefficient a₃₂ = 0.62 (SE = 0.06), p < 1 × 10⁻¹⁰; Figure 3, Table S4). In addition, we observed nominal evidence for an amplification of genetic influences that capture all genetic variance in expressive vocabulary at 38 months (meta-path-coefficient a₃₁ = 0.20 (SE = 0.08), p = .009; Figure 3, Table S4). Consistent with individual GSEM models, there was little meta-analytic evidence for novel genetic influences arising after early-childhood (meta-path-coefficient a₃₃ = 0.34 (SE = 0.29), p = .24; Figure 3, Table S4). Meta-analyses of reading measures (N = 7) and spelling measures (N = 2) showed that the developmental genetic amplification patterns observed across all LRAs primarily, but not exclusively, involved reading-related abilities (Table S4).
© 2020 The Authors. Journal of Child Psychology and Psychiatry published by John Wiley & Sons Ltd on behalf of Association for Child and Adolescent Mental Health.
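The idea of pooling path coefficients while accounting for interrelatedness between outcomes can be sketched with a generalised-least-squares fixed-effect estimator. This is a simplified stand-in, not the metafor model used here (Appendix S7): the function name is hypothetical, and the estimates, standard errors and correlation matrix are invented. Absolute values are pooled, presumably because the sign of a Cholesky factor's loadings can be flipped without changing the implied covariance.

```python
import numpy as np
from statistics import NormalDist


def correlated_fixed_effect(estimates, ses, corr):
    """Generalised-least-squares pooled estimate for correlated estimates.

    V = D R D, with D = diag(SE) and R an assumed correlation matrix among
    the estimates (e.g. derived from phenotypic correlations among outcomes).
    """
    y = np.asarray(estimates, dtype=float)
    se = np.asarray(ses, dtype=float)
    V = np.outer(se, se) * np.asarray(corr, dtype=float)
    ones = np.ones_like(y)
    w = np.linalg.solve(V, ones)          # V^-1 1
    var_pooled = 1.0 / (ones @ w)
    pooled = var_pooled * (w @ y)
    se_pooled = np.sqrt(var_pooled)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(pooled / se_pooled)))
    return pooled, se_pooled, p


# Hypothetical absolute a32 estimates for three LRAs (illustration only)
a32_abs = [0.58, 0.66, 0.61]
a32_se = [0.12, 0.10, 0.15]
R = [[1.0, 0.5, 0.4],
     [0.5, 1.0, 0.6],
     [0.4, 0.6, 1.0]]
print(correlated_fixed_effect(a32_abs, a32_se, R))
```

With an identity correlation matrix the estimator reduces to the familiar inverse-variance weighted mean; positive correlations among outcomes mean that the pooled estimate carries less independent information than the number of LRAs would suggest.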
Cholesky decompositions are sensitive to the order of modelled traits, although SNP-h² estimates remain unchanged. As part of sensitivity analyses, we therefore fitted 13 additional GSEM models reversing the order of expressive and receptive vocabulary at 38 months (reverse GSEM models, with path coefficients as detailed in Figure S4a). Consistent with forward GSEM models, there was little evidence for novel LRA-related genetic factors emerging after early-childhood (A3; Figures S4 and S5). For reverse GSEM, the first genetic factor (A1), capturing the entire SNP-h² of receptive vocabulary, also accounted for 11.8% (SE = 5.5%) of the phenotypic variance in expressive vocabulary (Figures S4 and S5, shown for the GSEM model including VIQ). A further 5.9% (SE = 3.0%) of the phenotypic variance in expressive vocabulary was explained by a second genetic factor (A2), capturing genetic influences that are unique to expressive vocabulary and independent of receptive vocabulary. Early genetic factors accounted for phenotypic variation in VIQ, reading and spelling abilities, but also in phonemic awareness and/or nonword repetition (Figures S4 and S5).
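Why reordering changes the path coefficients but not SNP-h² can be shown numerically. The Python sketch below takes a hypothetical forward-order Cholesky factor, permutes the implied genetic covariance matrix to the reverse order (receptive, expressive, LRA) and refactorises it with `numpy.linalg.cholesky`; all values are illustrative.

```python
import numpy as np

# Hypothetical forward-order Cholesky paths (expressive, receptive, LRA)
L_fwd = np.array([[0.35, 0.00, 0.00],
                  [0.25, 0.20, 0.00],
                  [0.15, 0.55, 0.05]])
sigma_fwd = L_fwd @ L_fwd.T

# Reverse the order of the two vocabulary measures (receptive, expressive, LRA)
order = [1, 0, 2]
sigma_rev = sigma_fwd[np.ix_(order, order)]
L_rev = np.linalg.cholesky(sigma_rev)   # lower-triangular reverse-model paths

print(np.round(L_rev, 3))
# SNP-h2 of each trait (diagonal of the genetic covariance) is unchanged
print(np.allclose(np.diag(sigma_rev), np.diag(sigma_fwd)[order]))
```

Both factorisations imply the same genetic covariance matrix; only the attribution of variance to A1, A2 and A3 shifts. In the reverse model, `L_rev[1, 0] ** 2` is the share of expressive-vocabulary variance carried by the receptive-vocabulary factor A1.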
To identify the most predictive early genetic factors in either forward or reverse GSEM models, we studied factorial coheritabilities and bivariate heritabilities. The largest contribution to the genetic variance of later LRAs came from genetic influences uniquely related to receptive vocabulary (A2, forward GSEM, Figure S2a), explaining up to 95% (SE = 20%) of LRA SNP-h², especially for reading and VIQ (Table S3). In comparison, shared receptive/expressive vocabulary-related genetic influences (A1, reverse GSEM, Figure S4a) explained only up to 73% (SE = 20%) of LRA SNP-h² (Table S3), although the derived 95% confidence intervals overlap. Consistently, genetic covariance between receptive vocabulary and later LRAs accounted for the majority of their phenotypic covariance, with bivariate heritability estimates of up to 1.00 (SE = 0.22; Figure 4, Table S5). In contrast, there was little evidence that genetic factors underlying expressive vocabulary, irrespective of its variance decomposition, substantially predicted variation in LRAs (Figure 4, Table S5), except for VIQ (0.69 (SE = 0.24)). Thus, the majority of genetic variation in later LRAs can be attributed to a small proportion of genetic variance in early language that uniquely captures receptive vocabulary and that has been amplified during development.
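Both summary statistics follow directly from the fitted covariance structure. The Python sketch below computes factorial coheritabilities (each factor's share of an LRA's SNP-h²) and a bivariate heritability (genetic covariance divided by phenotypic covariance) from hypothetical genetic paths and a hypothetical residual covariance matrix scaled so that phenotypic variances equal 1. The optional truncation at one mirrors the convention described for Figure 4, since the ratio can exceed one when residual covariance is negative. The function name is illustrative.

```python
import numpy as np

# Hypothetical genetic Cholesky paths (forward order: expressive, receptive, LRA)
L_A = np.array([[0.35, 0.00, 0.00],
                [0.25, 0.20, 0.00],
                [0.15, 0.55, 0.05]])
sigma_A = L_A @ L_A.T

# Hypothetical residual (non-SNP) covariance, scaled so phenotypic variances are 1
sigma_E = np.array([[1 - sigma_A[0, 0], 0.30, 0.02],
                    [0.30, 1 - sigma_A[1, 1], 0.03],
                    [0.02, 0.03, 1 - sigma_A[2, 2]]])
sigma_P = sigma_A + sigma_E

# Factorial coheritability: share of the LRA's SNP-h2 explained by each factor
coheritability = L_A[2] ** 2 / sigma_A[2, 2]


def bivariate_h2(i, j, truncate=True):
    """Genetic share of the phenotypic covariance between traits i and j."""
    value = sigma_A[i, j] / sigma_P[i, j]
    return min(value, 1.0) if truncate else value


print(np.round(coheritability, 3))         # A1, A2, A3 shares of LRA SNP-h2
print(round(bivariate_h2(1, 2), 3))        # receptive vocabulary with LRA
```

Note the difference in denominators: coheritability partitions the LRA's genetic variance, whereas bivariate heritability partitions the phenotypic covariance between two traits. A factor can therefore dominate an LRA's SNP-h² while the trait itself has modest SNP-h².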

Discussion
Multivariate genetic variance analyses in this study showed that genetic factors contributing to mid-childhood/early-adolescent LRAs, including reading and spelling skills, but also phonological awareness, nonword repetition, and verbal and nonverbal cognitive functioning, can already be captured by early-childhood language. Early genetic influences, especially those uniquely related to receptive vocabulary, are amplified during development and fully account for genetic variation in later reading, verbal and nonverbal cognitive skills. Independent of model specification, there was little evidence for novel genetic influences emerging during mid-childhood and early adolescence that would suggest specificity in the genetic composition of LRAs. Thus, developmental processes underlying language and literacy skills may not fully adhere to a paradigm that exclusively predicts genetic innovation during the transition from early to middle childhood (Briley & Tucker-Drob, 2013; Hayiou-Thomas et al., 2012).
Figure 3 Meta-analyses of developmental structural models. Absolute path coefficients for 13 structural equation models (forward GSEM) corresponding to 13 LRAs in mid-childhood and early adolescence were meta-analysed, accounting for phenotypic interrelatedness. Detailed information, including estimates of effect heterogeneity, is shown in Table S4. # Path coefficient passing the nominal (p ≤ .05), but not the experiment-wide significance threshold (p ≤ .005). Trait abbreviations are described in Table 1.
The identification of amplification processes is consistent with twin research reporting moderate genetic correlations between latent factors for early language (including expressive vocabulary and syntax skills) and both mid-childhood and/or adolescent latent language (Hayiou-Thomas et al., 2012) and reading (Harlaar et al., 2008).
However, as genetic factors accounted for only about a third of the phenotypic correlations (Harlaar et al., 2008; Hayiou-Thomas et al., 2012), these findings have been interpreted as evidence for genetic innovation (Hayiou-Thomas et al., 2012). In the present study, early vocabulary-related genetic factors, especially those related to receptive vocabulary, explained the majority of genetic variance (≤ 99% of SNP-h²) for many later reading and cognitive skills. There are two possible explanations for this difference in results, which implicate amplification instead of innovation processes. First, previous studies focused on early-childhood expressive language skills only. In the current study, however, the largest amplification was observed for a small proportion of genetic variance that is unique to early receptive vocabulary and independent of early expressive vocabulary. Consistently, the majority of phenotypic covariance between early receptive vocabulary and later skills, especially literacy and cognition, was accounted for by shared genetic sources. In contrast, genetic influences on expressive vocabulary did not substantially contribute to the total genetic variance of later LRAs, despite some evidence for genetic interrelationships with VIQ. Thus, structural models omitting genetic factors influencing early receptive vocabulary may attribute developmental changes in the genetic architecture of mid-childhood/early-adolescent traits to genetic innovation processes. Note that VIQ findings were representative of many (less powerful) WISC-III subtests, including the WISC-III vocabulary subtest, which showed similar association patterns (data not shown). Second, this study benefits from a direct estimation of genetic relationships between individuals based on genotyping information (St Pourcain et al., 2017), enabling the detection of small changes in SNP-h², compared with the more indirect assessment based on twin correlations.
The similarity in developmental genetic changes predicting the genetic composition of mid-childhood/early-adolescent reading and cognitive skills, as observed by factorial coheritability estimates, is consistent with overarching developmental patterns. According to the 'generalist genes' hypothesis, cognitive abilities are presumed to share genetic variance components (Plomin & Kovas, 2005). Our results suggest that an early generalist genetic component may manifest with the emergence of receptive vocabulary by the age of three years. This shared genetic component may imply developmentally stable biological mechanisms, but could also reflect differential regulation of the same genes over time, although the current findings do not allow us to infer specific biological pathways. At the same time, amplification processes predominate for genetic factors underlying early receptive vocabulary compared to genetic factors contributing to early expressive vocabulary, suggesting some degree of genetic specificity.
Figure 4 Bivariate heritability estimates. Bivariate heritability estimates (forward GSEM, GSEM software) reflect the proportion of the phenotypic covariance that is accounted for by the genetic covariance. Bivariate heritability estimates were truncated at one for reading a9 (NARA II), reading s9 (NARA II), reading s13 (TOWRE) and NW reading s13 (TOWRE). PIQ at 8 years was assessed for sensitivity analyses only. Bars represent standard errors. * Estimates passing the experiment-wide significance threshold (p ≤ .005). Trait abbreviations are described in Table 1.
Hence, the underlying genetic mechanisms may only partially adhere to the concept of 'generalist genes' (Plomin & Kovas, 2005). Several lines of evidence point to a particular role of early receptive language. Early language skills at the age of three, including vocabulary, comprehension and sentence construction, have been linked to adolescent reading comprehension (Frost, Madsbjerg, Niedersøe, Olofsson, & Sørensen, 2005). Notably, broadly defined early oral language, including receptive skills (Bryant, Maclean, & Bradley, 1990), has been shown to affect word recognition (NICHD Early Child Care Research Network, 2005), while vocabulary comprehension is also a precursor of listening comprehension (Dickinson & Neuman, 2007). Thus, receptive vocabulary skills might show wide-ranging links with both key predictors of reading comprehension proposed by the 'simple view of reading', decoding and language comprehension (Gough & Tunmer, 1986). Consistently, a delay in both expressive and receptive vocabulary at the age of two is much more likely to lead to later literacy problems than a delay in expressive vocabulary alone (Psyridou, Eklund, Poikkeus, & Torppa, 2018), and expressive and receptive vocabulary may be independently related to prereading skills (Wise, Sevcik, Morris, Lovett, & Wolf, 2007). Furthermore, variation in comprehension has been associated with nonlinguistic cognitive measures, such as tool use and symbolic play, more strongly than variation in expressive vocabulary (Bates, Dale, & Thal, 2019). Consequently, genetic variation in receptive vocabulary at 38 months may share genetic foundations with several key skills that are important for future reading, language and cognitive development, detectable as genetic amplification, and may only partially overlap with cognitive mechanisms predicted by genetic factors influencing expressive vocabulary alone.
Note that literacy abilities in this study primarily assessed accuracy and speed of reading and spelling, and that our findings may, thus, only partially apply to reading comprehension.
Increased SNP-h² estimates of language/literacy skills from mid-childhood to adolescence, compared to estimates of early-childhood language, may arise from genotype-environment correlations, as children modify and select their environment in accordance with their genetic make-up (Plomin, DeFries, & Loehlin, 1977). Furthermore, environmental variance may decrease with the start of schooling (Samuelsson et al., 2008). In addition, parent-reported vocabulary measures might be associated with higher random error rates (rendering them less reliable) than direct assessments of language and literacy skills using standardised psychological instruments, which may in turn affect the reliability of heritability estimates (Walters, Churchhouse, & Hosking, 2019). Thus, our findings do not preclude the emergence of novel genetic influences from mid-childhood onwards. In ALSPAC, parent-reported vocabulary measures have sufficient power (80%) to detect SNP-h² estimates of ≥ 0.15 (Appendix S8). However, compared to large-scale genome-wide studies of educational attainment (Lee et al., 2018) or direct assessments of language and literacy measures, their predictive power is low. This highlights the need to improve instruments assessing early language skills, especially as moderate to strong correlations between parental judgements and direct assessments of a child's vocabulary suggest sufficient instrument validity (Ring & Fenson, 2000; Sachse & Von Suchodoletz, 2008). A further limitation of the current study is that the CDI Words & Sentences was developed for vocabulary assessment in children up to 30 months (Fenson et al., 1993), whereas ALSPAC children were assessed at 38 months of age, potentially leading to ceiling effects. Finally, the lack of independent cohorts with data on both early expressive and receptive vocabulary prevents a direct replication of our findings.
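The order of magnitude of this power estimate can be checked with a widely used approximation for GREML in unrelated samples (Visscher and colleagues, 2014): var(ĥ²) ≈ 2 / (N² × var(Aⱼₖ)), where var(Aⱼₖ) is the variance of off-diagonal GRM entries, roughly 2 × 10⁻⁵ for common SNP arrays, giving SE ≈ 316/N. The Python sketch below assumes this approximation and a two-sided test at α = 0.05; the function name is hypothetical, and Appendix S8 may have used a different procedure.

```python
from statistics import NormalDist


def greml_power(h2, n, alpha=0.05, var_offdiag=2e-5):
    """Approximate power to detect SNP-h2 > 0 in n unrelated individuals.

    Uses var(h2_hat) ~ 2 / (n^2 * var_offdiag), where var_offdiag is the
    variance of off-diagonal GRM entries (about 2e-5 for common SNP arrays,
    giving SE ~ 316 / n), and a two-sided Wald test.
    """
    se = (2.0 / (n ** 2 * var_offdiag)) ** 0.5
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = h2 / se                               # square root of the non-centrality
    power = nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)
    return power, se


power, se = greml_power(h2=0.15, n=6092)
print(f"SE = {se:.4f}, power = {power:.2f}")
```

Under these assumptions, N = 6,092 gives an SE of about 0.052 and a power just above 0.8 for SNP-h² = 0.15, in line with the threshold reported above. Because SE scales with 1/N, halving the sample would roughly double the detectable SNP-h².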
The strength of this work lies in the identification of amplification processes using longitudinal models, suggesting that the developmental origins of many later complex skills, especially those related to literacy and cognition, lie in early-childhood. Thus, cheaply and easily administered parent-reported CDI questionnaires, which are widely used to assess children's early language (Frank, Braginsky, Yurovsky, & Marchman, 2017), might be useful instruments to capture genetic variation in language, literacy and cognitive skills many years later in life.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article:
Appendix S1. ALSPAC description.
Appendix S2. Genetic quality control.
Appendix S3. Mid-childhood/early-adolescent ALSPAC measures.
Appendix S4. Structural equation modelling.
Appendix S5. Factorial co-heritability.
Appendix S6. Bivariate heritability.
Appendix S7. Meta-analysis across mid-childhood/early-adolescent language- and literacy-related abilities.
Appendix S8. Power analyses.
Table S1. Meta-analysis domains of mid-childhood/early-adolescent language- and literacy-related abilities.
Table S2. SNP-heritability estimates.
Table S3. Factorial co-heritabilities.
Table S4. Meta-analysis across pre-defined language- and literacy-related ability combinations.
Table S5. Bivariate heritability estimates.
Figure S1. Phenotypic correlations among early vocabulary and mid-childhood/early-adolescent abilities related to literacy, language and cognition.
Figure S2. Path models of early vocabulary and mid-childhood/early-adolescent literacy- and language-related abilities (forward GSEM).
Figure S3. Variance plots for path models of early vocabulary and mid-childhood/early-adolescent literacy- and language-related abilities (forward GSEM).
Figure S4. Path models of early vocabulary and mid-childhood/early-adolescent literacy- and language-related abilities (reverse GSEM).
Figure S5. Variance plots for path models of early vocabulary and mid-childhood/early-adolescent literacy- and language-related abilities (reverse GSEM).
Figure S6. Genetic correlations among early vocabulary and mid-childhood/early-adolescent abilities related to literacy, language and cognition.
Figure S7. Path model and variance plot for early vocabulary and mid-childhood performance intelligence.