Re-use of this article is permitted in accordance with the Terms and Conditions set out at http://www3.interscience.wiley.com/authorresources/onlineopen.html.
Article first published online: 7 NOV 2007
© 2007 Society for Research in Child Development
Monographs of the Society for Research in Child Development
Volume 72, Issue 3, pages 14–48, December 2007
How to Cite
(2007), II. METHODS. Monographs of the Society for Research in Child Development, 72: 14–48. doi: 10.1111/j.1540-5834.2007.00440.x
- Issue published online: 7 NOV 2007
In this chapter, we describe the sample, measures, and analyses used to investigate the genetic and environmental origins of learning abilities and disabilities in the early school years. We also present descriptive statistics for all of the measures at 7, 9, and 10 years.
All analyses reported in this monograph are based on data collected as part of the Twins' Early Development Study (TEDS), a longitudinal study involving a representative sample of all twins born in England and Wales in 1994, 1995, and 1996 (Oliver & Plomin, 2007; Trouton, Spinath, & Plomin, 2002). Families of twins (n=25,815) were identified by the Office for National Statistics (ONS) from their children's birth records and contacted when the children were 1 year old. Of all families (n=16,810) who responded that they were interested in participating in TEDS, 12,054 families have been involved in TEDS since its inception, at least for one assessment point. Various subsets of this foundation sample were assessed at each age, as described later.
Although cognitive and language data were obtained in TEDS at 2–4 years (e.g., Colledge et al., 2002; Dale et al., 1998; Dale, Dionne, Eley, & Plomin, 2000; Dionne, Dale, Boivin, & Plomin, 2003; Hayiou-Thomas et al., 2006; Kovas et al., 2005; Price, Dale, & Plomin, 2004; Spinath, Ronald, Harlaar, Price, & Plomin, 2003; Spinath, Harlaar, Ronald, & Plomin, 2004; Viding et al., 2003; Viding et al., 2004), the focus of this monograph is on learning abilities assessed at 7, 9, and 10 years. These ages correspond to the early school years during which important changes in academic content occur, reflected in the U.K. National Curriculum (NC) by a second key stage (see Appendices A–C). The NC across all of the key stages is based on an 8-point scale. The differences between the key stages reflect the expectation that children of a certain age should score appropriately on this scale. For example, at the end of key stage 1 most children reach level 2, and at the end of key stage 2 most children reach level 4. These level expectations are accompanied by major changes in content and difficulty. For example, for English: Speaking and Listening, at the end of key stage 1 children are expected to reach level 2, which is described as children beginning to show confidence with speaking and listening. By the end of key stage 2 children are expected to have reached level 4, where they are able to talk and listen with confidence (see Appendices A–C, which detail the attainment targets for each level).
Before analysis, the following exclusion criteria were applied: specific medical syndromes such as Down syndrome and other chromosomal anomalies, cystic fibrosis, and cerebral palsy; severe hearing loss; autism spectrum disorder; organic brain damage; extreme outliers for birth weight and gestational age; heavy maternal alcohol consumption (>13 units of alcohol per week) during pregnancy; and intensive care after birth. Although the number of children excluded varies for different analyses, in general 8% of the sample was excluded on the basis of these criteria.
Table 1 summarizes the sample sizes at each age after exclusions. Although teacher ratings at 7 years were obtained from all three cohorts, funds were available only to include the first two cohorts for the other measures and other ages.
|Age||National Curriculum (NC): English, Math, Science||Test: Reading||Test: Math||Test: “g”|
|7||Cohorts 1–3: n=11,333–11,482||Cohorts 1–2: n=9,925–9,979 (telephone testing)||—||Cohorts 1–2: n=9,940 (telephone testing)|
|9||Cohorts 1–2: n=5,319–5,421||—||—||Cohorts 1–2: n=6,259 (booklet)|
|10||Cohorts 1–2: n=5,561–5,690||Cohorts 1–2: n=5,808 (web)||Cohorts 1–2: n=5,348 (web)||Cohorts 1–2: n=5,084 (web)|
Considering the major burden imposed by the booklets on harried parents of young twins, and our decision not to pressure parents so that families would not drop out of the study, a gratifyingly large number of parents completed the time-consuming booklets, which testifies to the well-known phenomenon of excellent cooperation from parents of young twins. Each year, parents were given the opportunity to indicate by checking a box that they no longer wished to participate in the study; after 10 years, only 1,147 of the 16,810 (6.8%) families have so indicated.
TEDS families are reasonably representative as compared with U.K. census data for families with children. Table 2 indicates that mothers in the total TEDS sample are representative of the United Kingdom population for ethnicity and for the percentage who completed A-level exams, which are taken by students finishing secondary school who plan to go to university. Moreover, mothers who completed all test booklets at each age (third column in Table 2) do not differ from the total TEDS sample (second column) for ethnicity and A-level exams. The percentage of mothers who had no educational qualifications (i.e., in the U.K. system they did not pass the examinations for the General Certificate of Secondary Education or any higher examinations) was somewhat higher and the percentage of working mothers was somewhat lower in TEDS as compared with all mothers in the United Kingdom.
|Mother||U.K.||TEDS||TEDS complete data|
|School leaver (%)||19||10||7|
A parent-rated questionnaire was used to assign twin zygosity of same-sex twins when the twins were 18 months old, and again when twins were 3 and 4 years old. (Opposite-sex twins are of course always DZ.) This questionnaire includes items such as whether the twins are “as physically alike as two peas in the pod,” whether they have hair that is similar in color and texture, and whether they have the same eye color. At 18 months of age, zygosity was correctly assigned by parent ratings in 94% of cases as validated against zygosity assigned by identity of polymorphic DNA markers using DNA extracted from cheek swabs (for details see Freeman et al., 2003; Price et al., 2000).
These results validate the use of parental report questionnaire data to assign zygosity even in infancy, and concur with other studies showing that the determination of zygosity in twins based on questionnaires can be done with a high degree of accuracy (for a review, see Rietveld, van Baal, Dolan, & Boomsma, 2000). For the sample used in this monograph, we used zygosity information assessed from DNA when it was available (34% of the total sample). DNA is available for twice as many pairs in anticipation of future molecular genetic studies. However, zygosity tests are costly and were conducted only when the parents requested zygosity testing or when the twins' zygosity was doubtful. For the rest of the sample, zygosity of same-sex twins was based on parental assessments of their twins' physical similarity. As expected, roughly one-third of the twins are MZ, one-third are same-sex DZ, and one-third are opposite-sex DZ.
OVERVIEW OF MEASURES AND PROCEDURES
At ages 7, 9, and 10, data collection was based on the school year (September–August). Rating scales and questionnaires were sent to teachers in the spring term to ensure that each child had received approximately the same contact time with teachers and to allow teachers to become familiar with the children's achievement and behavior over the academic year. Both members of a twin pair were rated by a single teacher if they were in the same classroom; co-twins were rated by different teachers if they were in different classrooms. The percentages of twins rated by the same teacher were 67% at 7 years, 63% at 9 years, and 58% at 10 years. Informed consent was obtained in writing from parents at each assessment, so that they were free to withdraw from that particular part of the project as well as from the entire study. Informed consent was also obtained from teachers.
Teacher NC Assessments at 7, 9, and 10 Years
When the twins were 7, 9, and 10 years of age (corresponding to the second, fourth, and fifth years of school in the United Kingdom), their teachers assessed three broad areas of ability: English (including Speaking and Listening, Reading, and Writing), mathematics (including Using and Applying Mathematics, Numbers, and Shapes, Space, and Measures), and science (including Scientific Enquiry, Life Processes, and Physical Processes), which was assessed at 9 and 10 only. These assessments were based on the U.K. NC, the core academic curriculum developed by the Qualifications and Curriculum Authority (QCA) and the National Foundation for Educational Research (NFER) (QCA: http://www.qca.org.uk; NFER: http://www.nfer.ac.uk/index.cfm). This assessment follows from requirements of key stages for attainment in English, Mathematics, and Science. Although U.K. teachers are very familiar with these criteria, we reminded them of the criteria as part of our mailing (see Appendices A–D).
The second year of school (age 7) corresponds to NC key stage 1, and the fourth and fifth school years (ages 9 and 10) correspond to NC key stage 2 (Qualifications and Curriculum Authority, 1999; Qualifications and Curriculum Authority, 2003). For the NC Teacher Assessments, at the end of the school year, teachers summarize students' performance throughout the school year in each of these areas using a 5-point scale (see Appendices A–C for full details of the scales for each subject). This judgment was not made specifically for the present study, but rather forms the continuing assessment of each child that ultimately leads to the final NC Teacher Assessment score submitted to the QCA at the end of the school year to indicate the child's academic achievement during that year. (Other measures such as QCA-administered tests also contribute to children's grades, but we did not have access to these data.) We asked teachers to provide this rating using a similar format. In addition to analyzing the three components within each of the three broad areas of achievement, composite measures were created for each of the three broad areas at each age (English composite, Mathematics composite, and Science composite) by calculating a mean for the three scores. The use of composites to represent each area was supported by the results of factor analyses (computed using one twin from each pair), which showed high first unrotated principal component loadings for all measures at all ages (average variance explained by the first principal component=87%, range=78–93%).
There is growing evidence for the validity of teacher assessments. In TEDS, for example, a general factor for NC ratings at 7 years has been found to correlate .58 with a general factor of telephone-administered tests of verbal and nonverbal cognitive abilities (Spinath, Ronald, Harlaar, Price, & Plomin, 2003). Correlations between NC ratings and test data also support the validity of teacher assessments, as described in Chapter VI.
Telephone Testing at 7 Years
At age 7, we assessed the children's reading and general cognitive ability on the telephone. Our telephone adaptation of the tests retained the original test materials, and the administration procedure was closely aligned to the standard face-to-face procedure. Item lists were mailed to families in a sealed envelope before the test sessions. Twins in each pair were tested within the same test session and by the same tester, who was blind to zygosity. Several precautions were taken to prevent cheating. First, it was emphasized to parents that the test items were meant for a range of ages and that no 7-year-old children would be able to perform successfully on all tasks. Second, test stimuli were mailed to families in a sealed envelope before the test sessions with separate instructions that the envelope should not be opened until the time of testing. Third, parents were asked to provide a room that was free from distractions, such as other family members and operating televisions. Finally, the testing procedure provided no opportunity for parental intercession.
Telephone-administered measures have been shown to be efficient and cost-effective alternatives to in-person assessments. Recent reports have demonstrated good reliability and validity of telephone assessments. For example, in a validation study of telephone-administered cognitive measures, 52 children as young as 6 years were recruited as part of a larger volunteer family registry at Wesleyan University, U.S.A. (Petrill, Rempell, Oliver, & Plomin, 2002). These children were assessed using the telephone battery and then tested at home using the Stanford-Binet (SB) Intelligence Scale (Thorndike, Hagen, & Sattler, 1986). A general cognitive ability composite from the telephone-administered battery and the SB correlated .62. We have also shown in TEDS that a word recognition test administered by telephone correlated .70 with NC teacher assessments of reading (Dale, Harlaar, & Plomin, 2005).
Booklet Testing at 9 Years
Nine-year-old participants received a test booklet containing four cognitive tests that were administered under the supervision of the parent, who was guided by an instruction booklet. As with telephone testing, precautions were taken to prevent cheating. Correlations with telephone-administered and web-administered cognitive testing are described in Chapter V.
Web-Based Testing at 10 Years
At age 10, children participated in web-based testing. The internet is well suited to children as young as 10, most of whom are competent computer users. Web-based testing can be interactive and enjoyable; understanding of the test questions can be facilitated by including voice instructions, on-screen text, graphics, and practice items. Branching rules on some tests allowed for adaptive testing, which increases children's engagement while limiting the number of items that need to be answered (Birnbaum, 2004).
The use of web-based assessment facilitates data collection because it allows data from large widely dispersed samples to be collected quickly, cheaply, and reliably. Web-based data collection is less error prone because it does not require human transcription and data entry (Kraut et al., 2004; Naglieri et al., 2004). Another positive aspect of web testing is that the social pressure or embarrassment which might be present in face-to-face testing is reduced (Kraut et al., 2004; Birnbaum, 2004). Moreover, several recent empirical studies have found that web-based findings generalize across presentation formats, and are consistent with findings from traditional methods (e.g., Gosling, Vazire, Srivastava, & John, 2004).
In TEDS, 80% of the families have daily access to the internet (based on a pilot study with 100 randomly selected TEDS families), which is similar to the results of market surveys of U.K. families with adolescents. Most children without access to the internet at home have access in their schools and local libraries.
In designing our web-based battery, we guarded against potential problems associated with research on the internet. The web page and testing were administered by a secure server in the TEDS office (the TEDS web page can be accessed at http://www.teds.ac.uk). We used a secure site for data storage; identifying information is kept separately from the data. Safeguards were in place that prevented children from answering the same item more than once. We provided technical support and other advice to parents and children, who were advised to call our toll-free telephone number in case of any problems or questions.
Parents supervised the testing by coming online first with a user name and password for the family, examining a demonstration test, and completing a consent form. Then parents allowed each twin to complete the test in turn. Parents were urged not to assist the twins with answers and not to allow the twins to see each other's answers. We are confident, on the basis of our telephone interactions with many of the parents, most of whom have participated in the TEDS research program for a decade, that parents complied with these requirements.
When children were seven, teachers assessed academic achievement in three areas of English at key stage 1, designed for children aged 5–7 years. The QCA provides teachers with guidelines for assessments that aim to cover diverse aspects of the three areas, Writing, Reading, and Speaking/Listening. (See Appendix A for the 5-point NC criteria given by the QCA and used by teachers to indicate achievement levels in each of the three areas of English.) The same three areas were assessed when the children were 9 and 10 using key stage 2 NC criteria.
When children were 7, the Test of Word Reading Efficiency (TOWRE, Form B; Torgesen, Wagner, & Rashotte, 1999) was administered to children over the telephone. The TOWRE, a standardized measure of fluency and accuracy in word reading skills, includes two subtests, each printed on a single sheet: a list of 85 words, called Sight-word Efficiency (SWE), which assesses the ability to read aloud real words, and a list of 54 nonwords, called Phonemic Decoding Efficiency (PDE), which assesses the ability to read aloud pronounceable printed nonwords. The child is given 45 seconds to read as many words as possible. Twins were individually assessed by telephone using test stimuli that had been mailed to families in a sealed package with separate instructions that the package should not be opened until the time of testing. The same tester, who was blind to zygosity, assessed both twins in a pair within the same test session. In addition to looking at each component, a reading composite was also created, as supported by the correlation of .83 between the two subtests.
Although we are not aware of any previous studies that have administered reading tests by telephone, we recently examined reading scores for 54 twin pairs from the 1994 cohort who participated in the 7-year telephone testing and who were also tested by telephone at age 9 on Form A of the TOWRE and on the comprehension subtest of the Neale Analysis of Reading Ability (NARA-II; Neale, 1997). For the 108 children, the correlation between TOWRE Form B at 7 years and TOWRE Form A at 9 years was .83. This finding is consistent with previous research demonstrating the longitudinal stability of word level reading skills (Juel, 1988; Torgesen et al., 1999) and can be seen as a lower-limit estimate of reliability. Furthermore, the correlation between the TOWRE composite and the NARA-II comprehension test was .73, consistent with previous research demonstrating the association between word identification and later reading performance (e.g., Juel, 1988; Storch & Whitehurst, 2002). In addition, our results (e.g., standard deviations, twin correlations, heritability estimates) mirror very closely the TOWRE results from a U.S. study in which the TOWRE was administered in the standard format to twins in kindergarten (age 6) and first grade (age 7) (Byrne et al., 2005). Although the TOWRE was standardized in the United States rather than the United Kingdom, the focus of this study is not on how the children compared with norms, but rather on variance within the sample.
At age 10, participants completed a web-based adaptation of the reading comprehension subtest of the Peabody Individual Achievement Test (Markwardt, 1997) at home (hereafter referred to as PIAT). The PIAT assesses literal comprehension of sentences. Sentence items were presented visually and with oral instructions given by the computer using digitized speech. The children responded by selecting the picture described by the sentence using the mouse, moving the pointer to the desired location and clicking on it. All the children started with the same items, but an adaptive algorithm modified item order and test discontinuation depending on the performance of the participant. Children could attempt each item only once. The web-based adaptation of the PIAT contained the same practice items, test items, and instructions as the original published test. Credit (automatic score of 1) was given for all items that were skipped due to upward branching. PIAT total scores were derived by summing correct and credited scores. Test–retest reliability of the PIAT across 7 months was .66 in a subsample of the TEDS twins (n=55). The PIAT also shows good internal consistency (Cronbach's α=.95).
When children were seven, teachers assessed academic achievement in three areas of mathematics at key stage 1, designed for children aged 5–7 years. The QCA provides teachers with guidelines for assessments that aim to cover diverse aspects of the three domains: Using and Applying Mathematics, Numbers, and Shapes, Space, and Measures (see Appendix B for the 5-point NC criteria given by the QCA and used by teachers to indicate achievement levels in each of the three areas of mathematics). The same three areas were assessed when the children were 9 and 10 using key stage 2 NC criteria (see Appendix B).
We developed a web-based battery that assessed three aspects of mathematics performance (described below) when the children were 10. The items were based on the NFER 5–14 Mathematics Series, which is linked closely to curriculum requirements in the United Kingdom and the English Numeracy Strategy (nferNelson, 1994, 1999, 2001). Such curriculum-based assessment alleviates some of the potential biases associated with other achievement tests (Good & Salvia, 1988). From booklets 6–11 (referring to age of students), a total of 77 target items were chosen. The items were organized by mathematical subtest and level of difficulty. The level of difficulty was based on the NC level and the percentage correct for each item from the NC standardization sample (reported in the Group Record Sheets, nferNelson). A set of adaptive branching rules was developed separately for each of three subtests, so that all the children started with the same items, but then were branched to easier or harder items depending on their performance. The presentation of items was streamed, so that items from the three subtests were mixed to make the test more interesting, but the data recording and branching were done within each subtest. Participants could attempt each item only once.
As with many psychological tests that use branching (e.g., Wechsler Intelligence Scale for Children (WISC-III-UK, Wechsler, 1992)), the general scoring rules were as follows: 1 point was recorded for each correct response, for each unadministered item preceding the child's starting point, and for each item skipped through branching to harder items. After a certain number of failures, a discontinuation rule was applied within each area, and no points were recorded for all items after discontinuation. Thus, for each of the 77 items, a score of 1 or 0 was recorded for each child. For example, for Computation and Knowledge (total number of items=31), all children started at item 10. The following rules were then applied:
- If items 10–12 were all answered incorrectly, the child was branched to item 1, and had to continue with the test attempting all remaining items, or until the discontinuation criterion was met.
- If items 10–12 were all answered correctly, the child received credit for all preceding items (1–9), and was branched to item 24. If items 24–26 were all answered incorrectly, the child was branched back to item 13 and had to continue with the test (skipping all items administered previously), attempting all remaining items, or until the discontinuation criterion was met. If one or two of items 24–26 were answered incorrectly, the child received credit for all preceding items (13–23) and then continued with the test, attempting items 27–31, or until the discontinuation criterion was met.
- If items 10–12 were not all answered incorrectly or correctly (i.e., if some but not all were answered correctly), the child received credit for all preceding items (1–9) and then had to continue with the test, attempting all remaining items or until the discontinuation criterion was met.
- Discontinuation criterion: three incorrect answers in a row (does not apply across branching points).
As with other psychological tests with items of increasing difficulty and using similar rules, this scoring system for our branching approach is meant to mirror the traditional approach in which all children attempt all items, allowing us to calculate total number and proportion of correct responses for each child for each subtest, as well as testing the internal consistency of each subtest. Specific branching and discontinuation rules and the number of skipped (credited) items for each subtest are available from the authors.
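The branching and scoring rules listed above for Computation and Knowledge can be illustrated with a short Python sketch. It is not the software used in the study: the function name and data layout are hypothetical, and three points the text leaves open are filled in by assumption. Specifically, the sketch assumes that a child who answers items 24–26 all correctly is treated like one who misses one or two of them, that the count of consecutive errors starts at zero after each three-item probe, and that the count carries over the gap left by items already administered.

```python
def score_computation_and_knowledge(answers):
    """Return a dict of 0/1 scores for items 1-31.

    `answers` maps every item number (1-31) to True/False, i.e. what the
    child would answer if that item were administered (hypothetical layout).
    """
    scores = {item: 0 for item in range(1, 32)}

    def probe(items):
        # Administer a three-item probe set; return the number correct.
        for item in items:
            scores[item] = int(answers[item])
        return sum(scores[item] for item in items)

    def credit(items):
        # Unadministered items skipped through upward branching score 1.
        for item in items:
            scores[item] = 1

    def administer(items):
        # Administer items in order; discontinue after three errors in a row.
        # Assumption: the error run starts at zero after a probe set.
        run = 0
        for item in items:
            if answers[item]:
                scores[item] = 1
                run = 0
            else:
                run += 1
                if run == 3:
                    return  # later items keep their score of 0

    first = probe([10, 11, 12])
    if first == 0:
        # All wrong: branch down to item 1, then all remaining items.
        administer(list(range(1, 10)) + list(range(13, 32)))
    elif first == 3:
        # All right: credit items 1-9 and branch up to item 24.
        credit(range(1, 10))
        second = probe([24, 25, 26])
        if second == 0:
            # Branch back to item 13, skipping the items already given.
            administer(list(range(13, 24)) + list(range(27, 32)))
        else:
            # Assumption: three correct is handled like one or two errors.
            credit(range(13, 24))
            administer(range(27, 32))
    else:
        # Mixed result: credit items 1-9 and continue from item 13.
        credit(range(1, 10))
        administer(range(13, 32))
    return scores
```

Because every one of the 31 items ends with a score of 1 or 0, the total and the proportion correct can be computed as if all items had been attempted, which is the property the scoring system is meant to preserve.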
The items were drawn from the following three subtests:
Understanding Number (27 items) requires an understanding of the numerical and algebraic process to be applied when solving problems (such as understanding that multiplication and division are inverse operations). For example, “Look at the number 6085. Change the order of the figures around to make the biggest number possible.” Another example is: “Type the missing number in the box: 27+27+27+27+27=27 × _.”
Nonnumerical Processes (19 items) requires understanding of nonnumerical mathematical processes and concepts such as rotational or reflective symmetry and other spatial operations. The questions do not have any significant numerical content that needs to be considered by the pupils. Three examples follow: “Which is the longest drinking straw? Click on it.”“One of these shapes has corners that are the same. Click on this shape.”“Which card appears the same when turned upside down? Click on it.”
Computation and Knowledge (31 items) assesses the ability to perform straightforward computations using well-rehearsed pencil and paper techniques and the ability to recall mathematical facts and terminology. These questions are either algorithmic or rely upon memorizing mathematical facts and terminology. The operation is stated or is relatively unambiguous. Three examples follow. “Type in the answer: 76 – 39.”“All four-sided shapes are called? Click on the answer (squares rectangles parallelograms kites quadrilaterals).”“Type in the answer: 149+785=?.”
A composite score was also created using the mean of the percentage scores of the three tests. This was supported by the high correlations between the three tests; as reported in Chapter VI, the average correlation was .59.
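As a rough illustration of the composite described above, the following Python sketch converts each subtest total to a percentage and averages the three percentages. The item counts come from the subtest descriptions; the function name and input layout are hypothetical.

```python
# Number of items per subtest, as given in the subtest descriptions.
N_ITEMS = {
    "Understanding Number": 27,
    "Nonnumerical Processes": 19,
    "Computation and Knowledge": 31,
}

def math_composite(raw_scores):
    """Mean of the three percentage-correct scores for one child.

    `raw_scores` maps each subtest name to the number of correct plus
    credited items (hypothetical layout).
    """
    percentages = [100.0 * raw_scores[name] / n for name, n in N_ITEMS.items()]
    return sum(percentages) / len(percentages)
```

Averaging percentages gives each subtest equal weight in the composite, whereas summing raw scores would weight the subtests by their number of items.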
The web-administered measures yielded high Cronbach's α coefficients (Understanding Number: α=.88; Nonnumerical Processes: α=.78; Computation and Knowledge: α=.93).
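For readers unfamiliar with the statistic, Cronbach's α for a subtest with k items is k/(k−1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The Python sketch below implements that standard formula on a children-by-items matrix of 0/1 scores; the function name and data layout are hypothetical, and it is not the software used in the study.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows, one row of item scores per child."""
    k = len(item_scores[0])
    # Sample variance of each item across children.
    item_variances = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of the total score across children.
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

The sketch assumes at least two children, at least two items, and some variation in total scores; it does not guard against a zero total variance.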
Finally, in terms of validity, we compared children's overall web-based mathematics performance at 10 years with their overall classroom mathematics performance at the same age, as assessed by their teachers on the National Curriculum criteria; the correlation was .53 (p<.001, N=1,878). Only one twin from each pair was randomly selected for this analysis; a similar correlation of .50 was found for the other half of the sample.
As a direct test of the reliability and validity of the web-based measures, we conducted a test–retest study in which thirty 12-year-old children (members of 15 twin pairs) who had completed the web-based testing were administered the tests in person using the standard 12-year paper and pencil version of the test (nferNelson, 2001). Stratified sampling was used to ensure coverage of the full range of ability. The interval between test and retest was 1–3 months with an average of 2.2 months. The total math score from our web-based tests correlated .92 with the total score from the in-person testing for the total sample of 30 children; generalized estimating equations that take into account the nested covariance structure yielded a correlation of .93. For the three subtests reported in this paper the correlations between the web and the paper and pencil scores were .77, .64, and .81 for Understanding Number, Nonnumerical Processes, and Computation and Knowledge, respectively. These results demonstrate that our web-based testing is both highly reliable and valid, at least at 12 years.
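Selecting one twin per pair keeps the observations in each correlation independent, and the co-twins provide a replication half. A minimal Python sketch of that procedure follows, assuming each child is represented as a hypothetical (web score, teacher score) tuple; the function names and the seed are illustrative only.

```python
import random
from math import sqrt

def split_twin_pairs(pairs, seed=0):
    """Randomly assign one twin of each pair to each half of the sample."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    selected, others = [], []
    for twin_a, twin_b in pairs:
        if rng.random() < 0.5:
            twin_a, twin_b = twin_b, twin_a
        selected.append(twin_a)
        others.append(twin_b)
    return selected, others

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def web_teacher_correlation(children):
    """Correlation between web and teacher scores within one half."""
    return pearson([c[0] for c in children], [c[1] for c in children])
```

Calling `web_teacher_correlation` on each of the two lists returned by `split_twin_pairs` yields one correlation per half, analogous to the two values reported above.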
As for all children in U.K. schools, the twins' scientific performance was assessed throughout the fourth and the fifth years of school (corresponding to age 9 and 10) by their teachers, using criteria and tests of the NC. In the current study, the NC Teacher Assessments at key stage 2 were used, which are familiar to teachers and are designed for children age 8 through their sixth year of primary school at age 11. For key stage 2, the QCA provides teachers with NC material and assessment guidelines for three strands of science which directly map on to areas in science that are taught throughout the NC at this stage: Scientific Enquiry, Life Processes, and Physical Processes (see Appendix C for the 5-point NC criteria given by the QCA and used by teachers to indicate achievement levels in each of the three areas of science).
General Cognitive Ability
We assessed general cognitive ability (“g”) at 7, 9, and 10 using two verbal tests and two nonverbal tests but with very different procedures at each age (from telephone testing at 7 to parent administration of mailed booklets at 9 and to web-based testing at 10). At each age, we selected tests that were highly loaded on “g” and well suited to the particular format of administration.
Two verbal and two nonverbal cognitive measures designed to yield an index of “g” were administered over the telephone using the same procedure as described in the aforementioned section on reading at 7. The verbal measures were the Vocabulary (what does “strenuous” mean?) and Similarities (in what way are milk and water alike?) subtests of the Wechsler Intelligence Scale for Children (WISC-III-UK; Wechsler, 1992). The nonverbal measures were the Picture Completion subtest from the Wechsler Scale, in which a child needs to find a missing part in a picture in 20 seconds, and Conceptual Grouping from the McCarthy Scales of Children's Abilities (MSCA; McCarthy, 1972), which assesses the child's ability to deal logically with objects, to classify, and to generalize. Scores from our telephone adaptations of these standard cognitive tests have been shown to be substantially correlated with both subtest and composite scores from in-person assessments using the Stanford-Binet Intelligence Scale (Thorndike, Hagen, & Sattler, 1986) in 6- to 8-year-old children (Petrill et al., 2002).
Nine-year-old participants received a test booklet containing two nonverbal and two verbal tests that were administered under the supervision of the parent (guided by an instruction booklet). The verbal tests included two tests adapted from the WISC-III (Wechsler, 1992): Vocabulary (what does “migrate” mean?) and a General Knowledge test (in which direction does the sun set?) adapted from the Information subtest of the multiple choice version of WISC-III (Kaplan, Fein, Kramer, Delis, & Morris, 1999).
The nonverbal tests included a Puzzle test adapted from the Figure Classification subtest of the Cognitive Abilities Test 3 (CAT3) (Smith, Fernandes, & Strand, 2001). This test involves inductive reasoning and a minor element of visualization. The child is asked to identify which shape, out of five, continues a series. The second nonverbal test is a Shapes test, adapted from the CAT3 Figure Analogies subtest, that assesses inductive and deductive reasoning. The child is asked to identify the one shape, out of five, that relates to another shape in the same way as shown by an example (e.g., a rectangle and a square relate to each other like an oval and what other shape?).
Participants at age 10 were tested on a web-based adaptation of two verbal tests: WISC-III Multiple Choice Information (General Knowledge) and WISC-III Vocabulary Multiple Choice (Wechsler, 1992). Two nonverbal reasoning tests were also administered as part of the web battery: WISC-III-UK Picture Completion (Wechsler, 1992) and Raven's Standard Progressive Matrices (Raven, Court, & Raven, 1996).
In addition to examining each test separately, a composite measure was constructed at each age. A mean standardized score was calculated when data were available for all four subtests. The use of a composite was supported by the results of factor analyses (conducted on one twin from each pair), which showed high principal component loadings for all measures at all ages: the first principal component accounted for 47%, 53%, and 55% of the variance of the four measures at 7, 9, and 10 years, respectively.
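The composite rule described above (a mean of standardized subtest scores, computed only when all four subtests are present) can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function names and the raw scores are hypothetical, and standardizing with the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score the observed values; None (missing) entries stay None."""
    observed = [v for v in values if v is not None]
    m, s = mean(observed), pstdev(observed)
    return [None if v is None else (v - m) / s for v in values]


def composite(subtests):
    """Mean z-score per child across subtests; None unless every subtest is present."""
    z_columns = [standardize(column) for column in subtests.values()]
    result = []
    for row in zip(*z_columns):
        result.append(None if any(z is None for z in row) else mean(row))
    return result


# Hypothetical raw scores for five children on four subtests.
scores = {
    "vocabulary": [12, 15, 9, None, 14],
    "similarities": [8, 11, 6, 9, 10],
    "picture_completion": [14, 16, 10, 13, 15],
    "conceptual_grouping": [7, 9, 5, 8, 8],
}
g = composite(scores)
```

The fourth child has no composite because one subtest is missing, which mirrors the requirement that data be available for all four subtests.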
Although all analyses in this monograph are based on standard scores, in order to provide a general characterization of performance we report unadjusted raw score means and standard deviations for NC measures and test scores in Appendix D. Normative data are available for two of the tests. For the TOWRE administered by telephone at age 7, the mean performance of our sample on both subtests corresponds to a standard score of 105. For the web-administered PIAT Reading Comprehension, the mean performance of our sample corresponds to a standard score of 102. The agreement with norms is remarkable, given the different national context (U.K. vs. U.S.), method of administration, and twinship status of the sample, and provides further assurance of the appropriateness of the measures. Moreover, the six mean NC ratings of the TEDS sample at 7 reported in Appendix D are also very close to national norms (available from http://www.standards.dfes.gov.uk/performance for age 7), in every case deviating by less than .2 SD. These national norms are not available for ages 9 and 10 because these ages are not at the end of a key stage.
Analysis of variance (ANOVA) was performed on each variable to assess the mean effects of sex, zygosity, and their interaction. All scores were corrected for age at the time of testing by taking the standardized residuals from a regression on age, which also facilitates comparisons between groups. Tables 3–8 present means and standard deviations and the results of ANOVAs for all measures. Standardized data corrected for both age and sex are used in our genetic analyses for reasons explained later (unstandardized means and standard deviations are included in Appendix D).
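The age correction can be illustrated with a short sketch. It assumes a simple linear regression of score on age and rescales the residuals by their population standard deviation; whether additional terms were included, and the exact scaling applied by the statistical package used, are not stated in the text, so both choices here are assumptions. The ages and raw scores are invented.

```python
from statistics import mean, pstdev


def age_corrected(scores, ages):
    """Standardized residuals of scores after a simple linear regression on age."""
    mean_age, mean_score = mean(ages), mean(scores)
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (s - mean_score) for a, s in zip(ages, scores))
    slope = sxy / sxx
    intercept = mean_score - slope * mean_age
    residuals = [s - (intercept + slope * a) for a, s in zip(ages, scores)]
    sd = pstdev(residuals)  # residuals already have mean zero
    return [r / sd for r in residuals]


# Hypothetical ages (years) and raw test scores for six children.
ages = [6.9, 7.0, 7.1, 7.2, 7.4, 7.5]
raw = [40, 44, 43, 49, 50, 55]
z = age_corrected(raw, ages)
```

By construction the corrected scores have a mean of 0, a standard deviation of 1, and no linear association with age.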
|Measure at 7||Zygosity||Sex||ANOVA|
|MZ (n=4,090–4,133)||DZ (n=7,296–7,349)||Female (n=5,855–5,908)||Male (n=5,531–5,574)||Zygosity||Sex||Zygosity × Sex|
|Speaking and Listening||−.06 (1.02)||.03 (.99)||.08 (.96)||−.09 (1.03)||p<.001 η2=.002||p<.001 η2=.008||p=.037 η2<.001|
|Reading||−.06 (1.00)||.03 (1.00)||.10 (.96)||−.11 (1.03)||p<.001 η2=.002||p<.001 η2=.011||p=.459 η2<.001|
|Writing||−.04 (1.00)||.02 (1.00)||.14 (.95)||−.15 (1.03)||p<.001 η2=.001||p<.001 η2=.019||p=.768 η2<.001|
|Composite||−.06 (1.01)||.03 (.99)||.12 (.95)||−.13 (1.03)||p<.001 η2=.002||p<.001 η2=.016||p=.236 η2<.001|
|Measure at 9||Zygosity||Sex||ANOVA|
|MZ (n=1,947–1,963)||DZ (n=3,429–3,458)||Female (n=2,824–2,848)||Male (n=2,552–2,573)||Zygosity||Sex||Zygosity × Sex|
|Speaking and Listening||−.06 (1.00)||.03 (1.00)||.11 (.96)||−.12 (1.03)||p<.001 η2=.002||p<.001 η2=.013||p=.350 η2<.001|
|Reading||−.03 (1.00)||.02 (1.00)||.11 (.95)||−.12 (1.04)||p=.049 η2=.001||p<.001 η2=.014||p=.220 η2<.001|
|Writing||−.05 (1.00)||.03 (1.00)||.13 (.96)||−.14 (1.02)||p=.003 η2=.002||p<.001 η2=.018||p=.327 η2<.001|
|Composite||−.05 (1.00)||.03 (1.00)||.13 (.95)||−.14 (1.03)||p=.001 η2=.002||p<.001 η2=.019||p=.217 η2<.001|
|Measure at 10||Zygosity||Sex||ANOVA|
|MZ (n=2,006–2,033)||DZ (n=3,624–3,657)||Female (n=2,957–2,992)||Male (n=2,673–2,698)||Zygosity||Sex||Zygosity × Sex|
|Speaking and Listening||−.07 (1.01)||.04 (.99)||.11 (.95)||−.12 (1.04)||p<.001 η2=.003||p<.001 η2=.012||p=.719 η2<.001|
|Reading||−.05 (1.00)||.03 (1.00)||.09 (.96)||−.10 (1.03)||p=.002 η2=.002||p<.001 η2=.010||p=.519 η2<.001|
|Writing||−.02 (1.00)||.01 (1.00)||.14 (.95)||−.16 (1.03)||p=.048 η2=.001||p<.001 η2=.022||p=.402 η2<.001|
|Composite||−.05 (1.01)||.03 (.99)||.12 (.95)||−.14 (1.04)||p=.001 η2=.002||p<.001 η2=.017||p=.486 η2<.001|
|Measure at 7||Zygosity||Sex||ANOVA|
|MZ (n=3,582–3,602)||DZ (n=6,343–6,377)||Female (n=5,104–5,138)||Male (n=4,821–4,841)||Zygosity||Sex||Zygosity × Sex|
|TOWRE: word||−.03 (1.01)||.02 (1.00)||.09 (.97)||−.09 (1.02)||p=.005 η2=.001||p<.001 η2=.009||p=.003 η2=.001|
|TOWRE: nonword||−.04 (1.01)||.03 (.99)||−.00 (.98)||.00 (1.02)||p=.001 η2=.001||p=.582 η2<.001||p=.019 η2=.001|
|TOWRE: composite||−.04 (1.01)||.02 (1.00)||.05 (.97)||−.05 (1.03)||p=.001 η2=.001||p<.001 η2=.003||p=.004 η2=.001|
|Measure at 10||Zygosity||Sex||ANOVA|
|MZ (n=2,110)||DZ (n=3,698)||Female (n=3,162)||Male (n=2,646)||Zygosity||Sex||Zygosity × Sex|
|PIAT||−.06 (1.00)||.03 (1.00)||−.01 (.97)||.02 (1.04)||p=.001 η2=.002||p=.307 η2<.001||p=.871 η2<.001|
|Measure at 7||Zygosity||Sex||ANOVA|
|MZ (n=4,063–4,118)||DZ (n=7,270–7,337)||Female (n=5,829–5,894)||Male (n=5,504–5,561)||Zygosity||Sex||Zygosity × Sex|
|Using and Applying||−.05 (1.00)||.03 (1.00)||−.04 (.94)||.04 (1.06)||p<.001 η2=.001||p=.003 η2=.001||p=.008 η2=.001|
|Numbers and Algebra||−.05 (1.00)||.03 (1.00)||−.04 (.95)||.04 (1.05)||p<.001 η2=.001||p=.002 η2=.001||p=.028 η2<.001|
|Shapes, Space and Measures||−.06 (1.01)||.03 (.99)||−.01 (.94)||.01 (1.06)||p<.001 η2=.002||p=.658 η2<.001||p=.028 η2<.001|
|Composite||−.05 (1.00)||.03 (1.00)||−.03 (.94)||.03 (1.06)||p<.001 η2=.002||p=.026 η2<.001||p=.012 η2=.001|
|Measure at 9||Zygosity||Sex||ANOVA|
|MZ (n=1,932–1,946)||DZ (n=3,413–3,441)||Female (n=2,809–2,832)||Male (n=2,539–2,555)||Zygosity||Sex||Zygosity × Sex|
|Using and Applying||−.06 (.99)||.03 (1.01)||−.04 (.96)||.05 (1.04)||p=.002 η2=.002||p=.014 η2=.001||p=.050 η2=.001|
|Numbers and Algebra||−.05 (.99)||.03 (1.00)||−.06 (.97)||.07 (1.03)||p=.005 η2=.001||p<.001 η2=.003||p=.214 η2<.001|
|Shapes, Space and Measures||−.06 (.99)||.03 (1.00)||−.02 (.96)||.03 (1.04)||p=.001 η2=.002||p=.207 η2<.001||p=.113 η2<.001|
|Composite||−.06 (.99)||.03 (1.00)||−.04 (.96)||.05 (1.04)||p=.001 η2=.002||p=.008 η2=.001||p=.094 η2=.001|
|Measure at 10||Zygosity||Sex||ANOVA|
|MZ (n=1,995–2,021)||DZ (n=3,596–3,632)||Female (n=2,943–2,972)||Male (n=2,648–2,681)||Zygosity||Sex||Zygosity × Sex|
|Using and Applying||−.04 (1.00)||.02 (1.00)||−.06 (.95)||.06 (1.05)||p=.020 η2=.001||p<.001 η2=.003||p=.685 η2<.001|
|Numbers and Algebra||−.04 (.99)||.02 (1.01)||−.06 (.95)||.07 (1.05)||p=.027 η2=.001||p<.001 η2=.003||p=.556 η2<.001|
|Shapes, Space and Measures||−.06 (1.00)||.03 (1.00)||−.04 (.95)||.05 (1.05)||p=.001 η2=.002||p=.003 η2=.002||p=.770 η2<.001|
|Composite||−.05 (1.00)||.03 (1.00)||−.06 (.95)||.06 (1.05)||p=.006 η2=.001||p<.001 η2=.003||p=.567 η2<.001|
|MZ (n=1,941)||DZ (n=3,407)||Female (n=2,935)||Male (n=2,413)||Zygosity||Sex||Zygosity × Sex|
|Understanding Number||−.03 (1.00)||.02 (1.00)||−.08 (1.00)||.09 (.99)||p=.226 η2<.001||p<.001 η2=.007||p=.723 η2<.001|
|Nonnumerical Processes||−.04 (1.03)||.02 (.98)||−.05 (.99)||.06 (1.02)||p=.079 η2=.001||p<.001 η2=.002||p=.840 η2<.001|
|Computation and Knowledge||−.02 (1.00)||.01 (1.00)||−.05 (1.01)||.06 (.98)||p=.339 η2<.001||p<.001 η2=.003||p=.471 η2<.001|
|Math composite||−.03 (1.01)||.02 (1.00)||−.07 (1.00)||.08 (.99)||p=.158 η2<.001||p<.001 η2=.005||p=.587 η2<.001|
|Measure at 9||Zygosity||Sex||ANOVA|
|MZ (n=1,922–1,949)||DZ (n=3,397–3,445)||Female (n=2,793–2,834)||Male (n=2,526–2,560)||Zygosity||Sex||Zygosity × Sex|
|Scientific Enquiry||−.05 (.99)||.03 (1.00)||−.03 (.96)||.03 (1.04)||p=.006 η2=.001||p=.177 η2<.001||p=.019 η2=.001|
|Life Processes||−.05 (1.00)||.03 (1.00)||−.01 (.94)||.01 (1.06)||p=.002 η2=.002||p=.890 η2<.001||p=.098 η2=.001|
|Physical Processes||−.03 (.98)||.02 (1.01)||−.03 (.95)||.03 (1.05)||p=.064 η2=.001||p=.148 η2<.001||p=.022 η2=.001|
|Composite||−.05 (.99)||.03 (1.01)||−.02 (.95)||.02 (1.05)||p=.010 η2=.001||p=.340 η2<.001||p=.025 η2=.001|
|Measure at 10||Zygosity||Sex||ANOVA|
|MZ (n=1,987–2,018)||DZ (n=3,574–3,639)||Female (n=2,921–2,974)||Male (n=2,640–2,683)||Zygosity||Sex||Zygosity × Sex|
|Scientific Enquiry||−.05 (1.01)||.03 (.99)||−.01 (.95)||.01 (1.05)||p=.003 η2=.002||p=.408 η2<.001||p=.958 η2<.001|
|Life Processes||−.04 (.99)||.02 (1.00)||−.00 (.96)||.00 (1.04)||p=.017 η2=.001||p=.924 η2<.001||p=.919 η2<.001|
|Physical Processes||−.04 (1.00)||.02 (1.00)||−.02 (.96)||.02 (1.05)||p=.041 η2=.001||p=.148 η2<.001||p=.633 η2<.001|
|Composite||−.05 (1.00)||.03 (1.00)||−.01 (.96)||.01 (1.05)||p=.008 η2=.001||p=.505 η2<.001||p=.916 η2<.001|
|MZ||DZ||Female||Male||Zygosity||Sex||Zygosity × Sex|
|(“g”) at 7 Telephone assessment||−.06 (.99) n=3,590||.03 (1.00) n=6,350||.00 (.98) n=5,122||−.00 (1.02) n=4,818||p<.001 η2=.002||p=.527 η2<.001||p=.578 η2<.001|
|(“g”) at 9 Booklet assessment||−.05 (.98) n=2,320||.03 (1.01) n=3,939||−.02 (.99) n=3,348||.02 (1.01) n=2,911||p=.004 η2=.001||p=.176 η2<.001||p=.291 η2<.001|
|(“g”) at 10 Web assessment||−.05 (.99) n=1,850||.03 (1.00) n=3,234||−.06 (.98) n=2,804||.08 (1.02) n=2,280||p=.020 η2=.001||p<.001 η2=.004||p=.645 η2<.001|
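It can be seen from Tables 3–8 that sex, zygosity, and their interaction were not important factors in explaining variance in any of the measures: although many effects were statistically significant in these large samples, no effect accounted for more than about 2% of the variance (largest η2=.022). Phenotypic correlations among the measures are presented in Chapter VI.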
In our genetic analyses (described in the following section) the scores were corrected for age so that age does not contribute to twin resemblance, which is standard in analyses of twin data (McGue & Bouchard, Jr., 1984). The results could be affected even by small differences in age at the time of testing at this important stage of development, which would inflate estimates of shared environment because members of a twin pair are of exactly the same age. For the analyses of individual differences the scores were also corrected for sex differences. This was not done for the extremes analyses in order not to affect the representativeness of groups at low ability cut-offs. For the individual differences analyses (but not the extremes analyses), in order to avoid the possibility that our results were affected by very extreme scores, all pairs in which one or both twins scored 3 or more standard deviations below or above the mean were excluded from each category.
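The pairwise exclusion rule for the individual differences analyses can be expressed compactly. The sketch below assumes scores are already standardized; the function name and example pairs are hypothetical.

```python
def drop_extreme_pairs(pairs, cutoff=3.0):
    """Keep only twin pairs in which both standardized scores lie within the cutoff.

    A pair is dropped if either twin scores `cutoff` or more standard
    deviations above or below the mean.
    """
    return [(t1, t2) for t1, t2 in pairs if abs(t1) < cutoff and abs(t2) < cutoff]


# Hypothetical standardized scores for four twin pairs.
pairs = [(0.5, -0.2), (3.1, 0.0), (-1.0, -3.0), (2.9, -2.9)]
kept = drop_extreme_pairs(pairs)
```

Removing the whole pair, rather than only the extreme twin, keeps the twin covariance structure intact for the remaining sample.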
The twin method, one of the major tools of quantitative genetic research, addresses the origins of individual differences by estimating the proportion of variance that can be attributed to genetic, shared environment, and nonshared environment factors (Plomin et al., in press). In the case of complex traits that are likely to be influenced by multiple factors, the genetic component of variance refers to the influence of alleles at all gene loci that affect the trait. The similarity between twins for any particular trait can be due wholly or in part to these shared genetic effects. Twin similarity may also be due wholly or in part to shared environment, which refers to environmental influences that vary in the population but are experienced similarly by members of pairs of twins. For example, pairs of twins experience similar conditions during gestation, have the same socio-economic status, live in the same family, and usually go to the same school. These factors could reasonably be expected to increase similarity between co-twins. Nonshared environment refers to any aspect of environmental influence that is experienced differently by the two twins and contributes to phenotypic differences between them, including measurement error. Such influences involve aspects of experience that are specific to an individual, such as traumas and diseases, idiosyncratic experiences, different peers, differential treatment by the parents and teachers, and, importantly, different perceptions of such experiences, even if the events appear to be ostensibly the same for the two children.
Genetic influence can be estimated by comparing intraclass correlations for identical (monozygotic, MZ) twins, who are genetically identical, and fraternal (dizygotic, DZ) twins, whose genetic relatedness is on average .50. The phenotypic variance of a trait can be attributed to genetic variance to the extent that the MZ twin correlation exceeds the DZ twin correlation. Specifically, heritability, which is the proportion of phenotypic variance attributed to genetic variance, can be estimated as twice the difference between the MZ and DZ twin correlations. The relatedness for shared (common) environmental influences is assumed to be 1.0 for both MZ and DZ twin pairs who grow up in the same family because they experience equally similar prenatal and postnatal environments. Shared environmental influences are evidenced to the extent that the DZ twins' correlation is more than half of the MZ correlation. Limitations of the twin method can be found elsewhere (e.g., Plomin, DeFries, McClearn, & McGuffin, 2001). Twin correlations for all of the measures at all of the ages are presented in Chapter III.
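These rules of thumb can be written as simple arithmetic on the two twin correlations. The sketch below uses the doubling rule for heritability given in the text; the expressions for shared and nonshared environment are the standard algebraic complements (shared environment as the MZ correlation not explained by heritability, nonshared environment as one minus the MZ correlation). The function name and the example correlations are hypothetical.

```python
def falconer(r_mz, r_dz):
    """Rough variance components from MZ and DZ intraclass correlations."""
    h2 = 2 * (r_mz - r_dz)  # heritability: twice the MZ-DZ difference
    c2 = r_mz - h2          # shared environment, equal to 2 * r_dz - r_mz
    e2 = 1 - r_mz           # nonshared environment, including measurement error
    return h2, c2, e2


# Hypothetical twin correlations, not values from this study.
h2, c2, e2 = falconer(r_mz=0.80, r_dz=0.50)
```

With these illustrative values the DZ correlation (.50) exceeds half the MZ correlation (.40), so some shared environmental influence is implied alongside genetic influence.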
Structural equation model fitting is a comprehensive way of estimating variance components of a given trait (or, as explained below, of the covariance between traits) based on the principles described above. The fundamental quantitative genetic model is the so-called ACE model. It apportions the phenotypic variance into genetic (A), shared environmental (C), and nonshared environmental (E) components, assuming no effects of nonadditive genetics or nonrandom mating. Figure 1 illustrates the basic logic of this method. The path coefficients of latent variables A (genetic), C (shared environmental), and E (nonshared environmental, including error of measurement) are represented by the lowercase letters a, c, and e, respectively. Genetic relatedness is 1.0 for MZ twins and .5 for DZ twins. Shared environmental relatedness is assumed to be 1.0 for both MZ and DZ twins. The ACE parameters and their confidence intervals can be estimated by fitting the models to variance/covariance matrices using the model-fitting program Mx (Neale, 1997).
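The covariance matrices implied by the ACE model follow directly from the path coefficients and the relatedness values given above. The following sketch only computes these expected matrices; it does not perform the model fitting, which was carried out in Mx. The function name and the path values are hypothetical.

```python
import math


def ace_expected_cov(a, c, e, r_genetic):
    """Expected 2x2 twin covariance matrix under the ACE model.

    a, c, e are path coefficients; r_genetic is 1.0 for MZ and 0.5 for DZ
    pairs. Shared environmental relatedness is fixed at 1.0 for both.
    """
    variance = a**2 + c**2 + e**2
    covariance = r_genetic * a**2 + c**2
    return [[variance, covariance], [covariance, variance]]


# Hypothetical paths chosen so that a^2 = .6, c^2 = .2, e^2 = .2.
a, c, e = math.sqrt(0.6), math.sqrt(0.2), math.sqrt(0.2)
mz = ace_expected_cov(a, c, e, r_genetic=1.0)
dz = ace_expected_cov(a, c, e, r_genetic=0.5)
```

Model fitting then amounts to choosing a, c, and e so that these expected matrices match the observed MZ and DZ matrices as closely as possible.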
ACE model-fitting results of individual differences for the entire sample for all measures at all ages are presented in Chapter III.
As summarized in Table 9, there are three possibilities with respect to the causes of individual differences in boys and girls, regardless of mean differences between the sexes (Neale & Maes, 2003). The first possibility is that different genetic and environmental factors are responsible for individual differences in mathematics for boys and girls—these are called qualitative differences. Such sex-specific effects are not limited to genes on the X chromosome but can also involve genes on the autosomal chromosomes that affect boys and girls differently, for example, because the genes interact with sex hormones. The second possibility, not mutually exclusive with the first, is that the same etiological influences affect individual differences in boys and girls, but that they do so to a different extent—these are known as quantitative differences. The third possibility is that there are no differences in the etiology of individual differences for boys and girls; the same genes and environments operate to the same extent in both sexes, even if there are mean differences between boys and girls. That is, mean reading scores are lower for boys than girls, but the factors that make one boy different from another can be the same as those that make one girl different from another girl. It should be noted that quantitative genetics with its focus on individual differences has little to say about the origins of mean differences between boys and girls. Indeed, we frequently find no quantitative or qualitative differences in the etiology of individual differences for boys and girls despite large mean differences (Viding et al., 2004).
|Sex Differences in Etiology of Individual Differences||Explanation||Possible Contributing Factors|
|Qualitative differences||Different genetic and environmental factors are responsible for individual differences for boys and girls.||Genes on the sex chromosomes. Genes on the autosomal chromosomes affect boys and girls differently, for example, because the genes interact with sex hormones. Teachers treat boys and girls differently in terms of their expectations or requests for help.|
|Quantitative differences||The same etiological influences affect individual differences in boys and girls, but they do so to a different extent.||As above, but the differences are in quantity of effects.|
|No differences in etiology||The same genes and environments operate to the same extent in both sexes.||Boys as a group may exhibit a mean disadvantage, but the factors that make one boy different from another are the same as those that make one girl different from another girl.|
These three possibilities (qualitative differences, quantitative differences, and no differences) can be assessed using sex-limitation structural equation modeling. Each possibility is associated with a set of parameters in the sex-limitation models (see Figure 2). Qualitative differences are evidenced in the genetic relatedness (rg) between DZ opposite-sex twins. In DZ same-sex pairs, the assumption is that on average the twins share 50% of their varying DNA, and the coefficient of genetic relatedness is therefore .5. If there are qualitative differences in etiology between boys and girls (different genetic and environmental factors), the genetic relatedness in DZ opposite-sex twins will be less than .5. If there are quantitative differences (the same factors, but exerting different magnitudes of effect) rather than qualitative differences, the genetic relatedness for DZ opposite-sex pairs will still be .5, but the parameter estimates for the A, C, and E components will be significantly different for male–male pairs and female–female pairs. If there are no qualitative or quantitative differences between boys and girls, the genetic relatedness of DZ opposite-sex (DZos) pairs will be .5 and the A, C, and E estimates for male–male and female–female pairs will be the same. However, the phenotypic variance might nonetheless differ for the two sexes because mean differences are often associated with variance differences (i.e., higher means have higher variances).
Using the model-fitting program Mx (Neale, 1997) for each composite measure, we first tested the full model which allows all parameters to vary: rg in the DZ opposite-sex pairs, A, C, and E estimates, and variance estimates (see Figure 2a). This was fit to variance/covariance matrices derived from the data. A series of nested models was then tested. The first nested model (Figure 2b) is called the common effects sex-limitation model that tests for qualitative sex differences by fixing rg to .5 in the DZos, but allows different A, C, E, and variance estimates. The second nested model (Figure 2c) is called a scalar effects sex-limitation model that tests for quantitative sex differences by constraining A, C, and E parameters to be the same in boys and girls as well as constraining rg to .5 in the DZos; however, it allows differences in phenotypic variance between males and females. The third and final nested model, called the null model (also Figure 2c), tests for variance differences between boys and girls by constraining all the parameters to be equal for males and females. For each model, the ACE parameters and their confidence intervals were estimated. The overall fit of each model was evaluated using the root mean square error of approximation (RMSEA), with lower values representing better fitting models. Results of sex-limitation model fitting are presented in Chapter III.
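The role of rg in these nested models can be illustrated by the covariance expected for DZ opposite-sex pairs. The sketch below is illustrative only: it assumes the shared environmental correlation is 1.0, the function name and path values are hypothetical, and the actual models were specified and fit in Mx.

```python
def dz_opposite_sex_cov(a_m, c_m, a_f, c_f, r_g=0.5):
    """Expected covariance between a male and a female DZ co-twin.

    a_m, c_m are the male genetic and shared environmental paths;
    a_f, c_f are the female paths. r_g is the genetic relatedness,
    free in the full model and fixed to 0.5 in the nested models.
    """
    return r_g * a_m * a_f + c_m * c_f


# Hypothetical paths, equal across sexes (no quantitative differences).
no_qualitative = dz_opposite_sex_cov(0.7, 0.4, 0.7, 0.4, r_g=0.5)
# Same paths, but partly different genes operating in boys and girls.
qualitative = dz_opposite_sex_cov(0.7, 0.4, 0.7, 0.4, r_g=0.3)
```

A DZ opposite-sex covariance that falls below the value implied by rg = .5 is what signals qualitative differences in the full model.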
Teacher Heterogeneity Model
In order to test whether being in the same classroom and having the same teacher affected the results of our analyses, we analyzed each of the composite scores separately for the two groups (same vs. different teacher for the two twins in the family). After examining the pattern of twin correlations for the two groups, we performed model-fitting analyses to test whether the differences in estimates for the two groups were statistically significant. The model used for this analysis was similar to that of the sex-limitation models used to test for quantitative sex differences. The full model allowed A, C, and E parameters to vary between the groups. The null model equated the A, C, and E parameters for the two groups. The results from the teacher heterogeneity model are also described in Chapter III. Note that in the case of teacher ratings, being in the same classroom includes the effects of a shared teaching experience and a shared rater, whereas for the test scores, being in the same classroom reflects a shared teaching experience only.
The previous model-fitting sections focused on the analysis of individual differences for the entire sample; that is, ability rather than disability. An important feature of TEDS is that its large community sample makes it possible to study disability in the context of ability by selecting children at the low end of the normal distribution. In Chapter IV, we present results for all measures at 7, 9, and 10 years for children in the lowest 15% of the distribution.
For each of the measures, we defined probands as the lowest-scoring 5% and 15% of the whole sample, identifying statistically low performance on that measure. As results for both cut-offs were generally similar, we only present the results from the 15% cut-off analyses, which provided greater power. Probandwise concordances (the ratio of the number of probands in concordant pairs to the total number of probands) were calculated separately for each measure and each of the five sex-by-zygosity groups. Probandwise concordances represent the risk that a co-twin of a proband is affected. Greater MZ than DZ concordances suggest genetic influence, but unlike twin correlations, twin concordances cannot be used to estimate genetic and environmental parameters because they do not in themselves include information about the population incidence.
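The probandwise concordance defined above can be computed by counting probands rather than pairs, so that each concordant pair contributes two probands. In this sketch the function name, the cut-off value, and the scores are hypothetical.

```python
def probandwise_concordance(pairs, cutoff):
    """Probands in concordant pairs divided by all probands.

    pairs: list of (twin1, twin2) scores; a twin is a proband when the
    score is at or below the cutoff.
    """
    in_concordant = 0
    total = 0
    for t1, t2 in pairs:
        n_probands = (t1 <= cutoff) + (t2 <= cutoff)
        total += n_probands
        if n_probands == 2:
            in_concordant += 2
    return in_concordant / total if total else float("nan")


# Hypothetical standardized scores; the cutoff stands in for a low-end threshold.
pairs = [(-2.0, -2.0), (-2.0, 0.0), (0.0, 0.0), (-1.5, 1.0)]
concordance = probandwise_concordance(pairs, cutoff=-1.04)
```

Here one concordant pair (two probands) and two discordant pairs (one proband each) give a concordance of 2/4.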
DF extremes analysis assesses genetic links between disability and ability by bringing together dichotomous diagnoses of disability and quantitative traits of ability. Rather than assessing twin similarity in terms of individual differences on a quantitative trait of ability or in terms of concordance for a diagnostic cut-off, DF extremes analysis assesses twin similarity as the extent to which the mean standardized quantitative trait score of co-twins of selected extreme or diagnosed probands is below the population mean and approaches the mean standardized score of those probands (see Plomin & Kovas, 2005 for detailed explanation of DF extremes analysis and for discussion of alternative methods). This measure of twin similarity is called a group twin correlation (or transformed co-twin mean) in DF extremes analysis because it focuses on the mean quantitative trait score of co-twins rather than individual differences. Genetic influence is implied if group twin correlations are greater for MZ than for DZ twins, that is, if the mean standardized score of the co-twins is lower for MZ pairs than for DZ pairs. Doubling the difference between MZ and DZ group twin correlations estimates the genetic contribution to the average phenotypic difference between the probands and the population. The ratio between this genetic estimate and the phenotypic difference between the probands and the population is called group heritability. It should be noted that group heritability does not refer to individual differences among the probands–the question is not why one proband is slightly more disabled than another but rather why the probands as a group have lower scores than the rest of the population. Figure 3 illustrates the basic logic of the DF analysis.
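The group twin correlation and the doubling rule can be sketched as follows, assuming scores are standardized so that the population mean is 0. The function names and the scores are hypothetical, and the regression approach is the preferred way to obtain the estimate.

```python
from statistics import mean


def group_twin_correlation(proband_scores, cotwin_scores, pop_mean=0.0):
    """Co-twin mean as a proportion of the proband mean (both as deviations
    from the population mean); also called the transformed co-twin mean."""
    return (mean(cotwin_scores) - pop_mean) / (mean(proband_scores) - pop_mean)


def group_heritability(mz_group_r, dz_group_r):
    """Twice the difference between MZ and DZ group twin correlations."""
    return 2 * (mz_group_r - dz_group_r)


# Hypothetical standardized scores for probands and their co-twins.
mz_r = group_twin_correlation([-1.6, -1.4], [-1.3, -1.1])
dz_r = group_twin_correlation([-1.6, -1.4], [-0.8, -0.7])
h2g = group_heritability(mz_r, dz_r)
```

In this illustration MZ co-twins regress less far back toward the population mean than DZ co-twins, which is the pattern that implies genetic influence on the group difference.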
Although DF extremes group heritability can be estimated by doubling the difference in MZ and DZ group twin correlations (Plomin, 1991), DF extremes analysis is more properly conducted using a regression model (DeFries & Fulker, 1988). The DF extremes model fits standardized scores for MZ and DZ twins to the regression equation, C=B1P+B2R+A, where C is the predicted score for the co-twin, P is the proband score, R is the coefficient of genetic relatedness (1.0 for MZ twins and .5 for DZ twins), and A is the regression constant. B1 is the partial regression of the co-twin score on the proband, an index of average MZ and DZ twin resemblance independent of B2. The focus of DF extremes analysis is on B2. B2 is the partial regression of the co-twin score on R independent of B1. It is equivalent to twice the difference between the means for MZ and DZ co-twins adjusted for differences between MZ and DZ probands. In other words, B2 is the genetic contribution to the phenotypic mean difference between the probands and the population. Group heritability is estimated by dividing B2 by the difference between the means for probands and the population.
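The regression C=B1P+B2R+A can be fit by ordinary least squares. The sketch below solves the normal equations directly in plain Python; the function names and the data are hypothetical, the data are noise-free so that the known coefficients are recovered, and published applications typically add refinements (such as transforming scores and adjusting standard errors) that are omitted here.

```python
from statistics import mean


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [v] for row, v in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def df_extremes(proband, cotwin, relatedness):
    """Least-squares fit of C = B1*P + B2*R + A; returns (B1, B2, A)."""
    rows = [[p, r, 1.0] for p, r in zip(proband, relatedness)]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * c for row, c in zip(rows, cotwin)) for i in range(3)]
    return tuple(solve(xtx, xty))


# Hypothetical data: three MZ pairs (R = 1.0) and three DZ pairs (R = 0.5).
P = [-2.0, -1.5, -1.2, -2.0, -1.6, -1.1]
R = [1.0, 1.0, 1.0, 0.5, 0.5, 0.5]
C = [0.2 * p - 0.9 * r + 0.1 for p, r in zip(P, R)]

b1, b2, const = df_extremes(P, C, R)
# Group heritability: B2 divided by the proband mean (population mean is 0).
h2g = b2 / mean(P)
```

Because proband scores are below the population mean, both B2 and the proband mean are negative here, so their ratio yields a positive group heritability.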
Finding group heritability implies, first, that disability and ability are both heritable and, second, that there are genetic links between the disability and normal variation in the ability. That is, group heritability itself, not the comparison between group heritability and the other estimates of heritability, indicates genetic links between disability and ability. If a measure of extremes (or a diagnosis) were not linked genetically to a quantitative trait, group heritability would be zero. For example, this situation could occur if a severe form of learning disability is due to a single-gene disorder that contributes little to normal variation in learning ability. However, most researchers now believe that common disorders such as learning disabilities are caused by common genetic variants (the common disease/common variant hypothesis; Collins, Guyer, & Chakravarti, 1997) rather than by a concatenation of rare single-gene disorders. To the extent that the same genes contribute to learning disability and normal variation in learning ability, group heritability will be observed, although the magnitude of group heritability depends on the individual heritability for normal variation and the heritability of disability gleaned from concordances for disability.
The results of these DF extremes analyses are the topic of Chapter IV.
Cross-sectional designs can be used to compare genetic and environmental estimates across age but are weakened by the use of different samples at each age. One strength of a longitudinal design is that the same sample is studied at each age. However, the most important benefit of a longitudinal design is that analyses of age-to-age change and continuity are possible, as in the previous example of longitudinal DF extremes analysis. Prospective and retrospective longitudinal analyses can be performed using the multivariate twin methodology described in the following section. Longitudinal analyses are described in Chapter V. In Chapter V, we also present, for the first time, an extension of DF extremes analysis to a trait assessed at two measurement occasions, following the approach described in the following section. For longitudinal DF extremes analysis, we selected probands on the basis of reading scores at 7 years and analyzed their co-twins' quantitative reading scores, not at 7 years, but at 10 years.
The principles of the twin method can be extended to determine the etiology of the covariance between different traits, which is called multivariate genetic analysis. As mentioned in the previous section, longitudinal analysis is a special case of multivariate analysis in that it focuses on the etiology of the covariance between the same trait at different ages. In contrast to univariate quantitative genetic analysis that decomposes the variance of a single trait into genetic and environmental sources of variance, multivariate genetic analysis decomposes the covariance between traits into genetic and environmental sources of covariance (Martin & Eaves, 1977). In other words, multivariate genetic analysis assesses genetic and environmental factors responsible for the phenotypic correlation between two traits. For example, if the same genes affect different traits (called pleiotropy), a genetic correlation will be observed between the traits.
For twin studies, multivariate genetic analysis is based on cross-trait twin correlations for two or more traits. That is, rather than comparing one twin's score on variable X with the co-twin's score on the same variable X, one twin's X is correlated with the co-twin's Y. The phenotypic covariance between two traits is attributed wholly or in part to their genetic overlap to the extent that the MZ cross-trait twin correlation exceeds the DZ cross-trait twin correlation. Shared environmental influences are indicated to the extent that DZ twins' correlation is more than half of the MZ correlation. As with the univariate analyses, structural equation modeling, based on the same principles, is used as a more comprehensive way of estimating the proportion of covariance. Figure 4 illustrates a typical model (called Cholesky decomposition) that tests for common and independent genetic and environmental effects on variance in two different traits. The Cholesky procedure is similar to hierarchical regression analyses in nongenetic studies, where the independent contribution of a predictor variable is assessed after accounting for its shared variance with other predictor variables. In the bivariate case, the first factor assesses genetic and shared and nonshared environmental influences on trait 1, some of which also influence trait 2. The second factor estimates genetic and shared and nonshared environmental influences unique to trait 2. The same logic applies to more than two factors.
Another important statistic that can be derived from Cholesky analyses is bivariate heritability. This statistic indexes the extent to which the phenotypic correlation between X and Y is mediated genetically. That is, univariate heritability is the extent to which the variance of a trait can be explained by genetic variance; bivariate heritability is the extent to which the covariance between two traits (or the same trait at two ages) can be explained by genetic covariance. Bivariate heritability is the genetic correlation (see the next paragraph) weighted by the product of the square roots of the heritabilities of X and Y and divided by the phenotypic correlation between the two traits (Plomin & DeFries, 1979). The rest of the phenotypic correlation is explained by bivariate shared environment and bivariate nonshared environment.
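The definition of bivariate heritability translates directly into a one-line calculation. In this sketch the function name and the input values are hypothetical.

```python
import math


def bivariate_heritability(r_g, h2_x, h2_y, r_p):
    """Proportion of the phenotypic correlation that is mediated genetically.

    r_g: genetic correlation; h2_x, h2_y: univariate heritabilities;
    r_p: phenotypic correlation between the two traits.
    """
    genetic_part = r_g * math.sqrt(h2_x) * math.sqrt(h2_y)
    return genetic_part / r_p


# Hypothetical values, not estimates from this study.
biv_h2 = bivariate_heritability(r_g=0.5, h2_x=0.64, h2_y=0.36, r_p=0.40)
```

With these illustrative inputs the genetic contribution to the phenotypic correlation is .24 out of .40, so 60% of the correlation is genetically mediated.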
In addition, the paths from the model can be transformed to obtain the estimates of genetic, shared, and non-shared environmental correlations between each pair of factors. Genetic correlations index the extent to which genetic influences on one measure correlate with genetic influences on a second measure. In other words, genetic correlations indicate the extent to which individual differences in the two measures reflect the same genetic influences. This correlated factors model is illustrated in Figure 5—it is merely an algebraic transformation of the Cholesky model shown in Figure 4. The point is that there are two important statistics: bivariate heritability which is the genetic contribution to the phenotypic correlation between traits, and the genetic correlation which is the extent to which genetic effects on one trait are correlated with genetic effects on another trait. Multivariate genetic analyses are the topic of Chapter VI; Chapter V presents longitudinal genetic analyses based on similar models.
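The transformation from Cholesky paths to a genetic correlation can be sketched for the bivariate case. The path labels below (a11, a21, a22) are an assumed naming convention: a11 is the path from the first genetic factor to trait 1, a21 from the first genetic factor to trait 2, and a22 from the second genetic factor to trait 2. The numerical values are hypothetical.

```python
import math


def genetic_correlation(a11, a21, a22):
    """Genetic correlation implied by bivariate Cholesky genetic paths."""
    genetic_cov = a11 * a21          # genetic covariance between the traits
    genetic_var_1 = a11**2           # genetic variance of trait 1
    genetic_var_2 = a21**2 + a22**2  # genetic variance of trait 2
    return genetic_cov / math.sqrt(genetic_var_1 * genetic_var_2)


# Hypothetical path estimates.
r_g = genetic_correlation(a11=0.8, a21=0.6, a22=0.8)
```

The same transformation applied to the shared and nonshared environmental paths yields the corresponding environmental correlations.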
It is also possible to extend DF extremes analysis to address multivariate issues (Light & DeFries, 1995; Plomin & Kovas, 2005), analyzing two traits at the same measurement occasion, or the same trait at two measurement occasions. In Chapter VI, probands were selected on the basis of being in the lowest 15% of web-based reading and mathematics scores at 10 years and analyzed in comparison to their co-twins' reading and mathematics scores. In this bivariate case, group heritability indicates the extent to which genetic factors account for the difference between the mean mathematics score of probands selected on reading and the population mean for mathematics. In other words, group heritability in a multivariate extremes analysis indicates the extent to which genetic effects mediate the phenotypic covariance between reading disability and mathematics ability. The group genetic correlation indicates the extent to which the same genetic effects operate on reading disability and mathematics ability. Analysis in both directions is required to estimate a DF extremes genetic correlation; that is, probands were also selected from the lowest 15% of mathematics performance and analyzed with their co-twins' quantitative trait scores on reading. The group genetic correlation can be calculated using the following formula:
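\[
r_{g(xy)} = \sqrt{\frac{h^2_{g(x.y)} \, h^2_{g(y.x)}}{h^2_{g(x)} \, h^2_{g(y)}}}
\]
where $h^2_{g(x.y)}$ is the group heritability from reading (x) to mathematics (y), $h^2_{g(y.x)}$ is the group heritability from mathematics to reading, $h^2_{g(x)}$ is the group heritability of reading, and $h^2_{g(y)}$ is the group heritability of mathematics (see Knopik, Alarcón, & DeFries, 1997 for details).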
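As a minimal Python sketch, the group genetic correlation can be computed from the four group heritabilities. The sketch assumes the form described by Knopik et al. (1997): the square root of the product of the two cross-trait group heritabilities divided by the product of the two univariate group heritabilities. The function and argument names are hypothetical, and the input values are illustrative only.

```python
from math import sqrt


def group_genetic_correlation(h2g_xy, h2g_yx, h2g_x, h2g_y):
    """Group genetic correlation from DF extremes estimates.

    h2g_xy: group heritability from reading (x) to mathematics (y)
    h2g_yx: group heritability from mathematics to reading
    h2g_x:  group heritability of reading
    h2g_y:  group heritability of mathematics
    """
    if h2g_x <= 0 or h2g_y <= 0:
        raise ValueError("univariate group heritabilities must be positive")
    ratio = (h2g_xy * h2g_yx) / (h2g_x * h2g_y)
    if ratio < 0:
        raise ValueError("cross-trait group heritabilities differ in sign")
    return sqrt(ratio)


# Illustrative values only, not estimates from this monograph.
r_g = group_genetic_correlation(h2g_xy=0.3, h2g_yx=0.3, h2g_x=0.5, h2g_y=0.5)
print(round(r_g, 2))  # 0.6
```

Because sampling error can push the estimated ratio above 1, a value from this calculation that exceeds 1 would normally be interpreted with caution rather than as a literal correlation.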
Although this methods chapter is necessarily dense, especially for readers first exposed to these standard quantitative genetic analyses, we hope that applications of these methods and interpretations of the results of these analyses in the following chapters will clarify the concepts. The following chapter presents univariate analyses of individual differences of learning abilities for the total TEDS sample for all measures at 7, 9, and 10 years. Chapter IV focuses on extremes analyses of learning disabilities. Chapter V considers longitudinal analyses of composite measures from 7 to 10 years. Chapter VI addresses multivariate analyses between composite measures at all three ages. Chapter VII summarizes the results in relation to our three themes of the relationship between normal and abnormal, longitudinal analyses of change and continuity, and multivariate analyses of heterogeneity and homogeneity, and also considers limitations and implications of the research.