Quantitative trait loci for IQ and other complex traits: single-nucleotide polymorphism genotyping using pooled DNA and microarrays
*I. Craig, Social, Genetic and Developmental Psychiatry Centre, King's College London Institute of Psychiatry PO82, De Crespigny Park, Denmark Hill, London SE5 8AF, UK. E-mail: email@example.com
Similar to other complex traits, it is likely that many DNA polymorphisms of small effect size [quantitative trait loci (QTLs)] are responsible for the high heritability of intelligence, in addition to many rare monogenic disorders known to contribute to lowered intelligence. We review the current status of approaches to identify QTL associations for intelligence employing genome-wide strategies using pooled DNA from many individuals and evaluate the innovative approach of microarray analysis to genotype DNA pools for large numbers of single-nucleotide polymorphisms.
Although intelligence is one of the most complex and most controversial quantitative phenotypes, it is an excellent target for genetic research on complex traits in animal models (Galsworthy et al. 2002; Plomin 2001) and in the human species (Plomin 1999). Phenotypically, intelligence is one of the best-documented phenomena in the behavioral sciences (Lubinski 2004). Since Spearman (1904) first identified general cognitive ability (g) 100 years ago, hundreds of studies have shown that diverse cognitive processes correlate substantially (Carroll 1993). In scores of quantitative genetic studies, intelligence defined as g – that is, what diverse cognitive tests have in common – consistently shows substantial genetic influence, with heritability estimates of about 50% in meta-analyses and even higher in studies of adults (Plomin et al. 2001a). The most important genetic reason to study intelligence is that intelligence captures most of the genetic effects on diverse cognitive abilities. Multivariate genetic research that analyzes the covariance between traits, rather than the variance of each trait in isolation, shows that genetic overlap is largely responsible for the phenotypic overlap among cognitive abilities (Petrill 1997) and cognitive disabilities (Plomin & Kovas 2005). Genetic correlations among cognitive abilities are extremely high, about 0.80. In other words, intelligence is where the genetic action is in relation to cognitive functioning.
An article in this issue by Anthony Payton reviews candidate gene association studies of cognitive abilities (Payton 2006). In our article, we consider new approaches to identifying quantitative trait loci (QTLs) associated with intelligence and other complex phenotypes. Specifically, we consider DNA pooling, which makes it possible to screen the very large samples needed to reliably detect QTLs of small effect size; microarrays, which make it possible to genotype the very large numbers of DNA markers (single-nucleotide polymorphisms, SNPs) needed to conduct systematic whole-genome QTL searches; and the exciting new possibilities of combining both genotyping of DNA pools and microarray analysis in this endeavor.
As is the case with most complex traits, many monogenic disorders have been identified that affect intelligence. A recent review identified 282 monogenic disorders that include low intelligence among their symptoms (Inlow & Restifo 2004). Many of these single-gene disorders are severe, and most are rare, with frequencies of 0.0001 or less (Winnepenninckx et al. 2003). There is some evidence that there may be a concentration of cognition genes on the X chromosome with 37 implicated in non-syndromic mental retardation and a further 15 with syndromic XLMR (Craig et al. in press; Ropers et al. 2003; Zechner et al. 2001). Common disorders, such as mental retardation, could be a concatenation of such single-gene disorders, but it is now generally accepted that common disorders are largely caused by common DNA variants (Lohmueller et al. 2003). The QTL model, which is the molecular genetic equivalent of the model of quantitative genetics, assumes that complex traits are caused by multiple genes of varying but small effect size as well as multiple environmental factors (Plomin et al. 1994). That is, if just a few genes affect a trait, they will produce a quantitatively distributed trait, as demonstrated in Fisher's 1918 resolution to the dispute between Mendelians who focused on single-gene dichotomous (qualitative) traits and biometricians who focused on continuous (quantitative) traits (Fisher 1918). A radical implication of the QTL model is that common disorders might not be disorders at all in the sense of being etiologically distinct from normality as is the case for monogenic disorders. Instead, common disorders might represent the quantitative extreme of the same genetic and environmental factors that operate to create variation throughout the normal distribution. 
That is, the QTL model predicts that, when genes are found for common disorders such as mild mental retardation or learning disabilities, the same genes will be associated with variation throughout the normal distribution, including the high end of the distribution.
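The logic of the QTL model can be illustrated with a small simulation. The sketch below is not taken from any of the cited studies: the number of loci, the allele frequency, the environmental standard deviation and the function name `simulate_trait` are all illustrative assumptions. It sums additive allele counts across a handful of biallelic loci and adds a normally distributed environmental term, which is enough to produce a continuous, roughly bell-shaped trait from a few discrete genes.

```python
import random
from collections import Counter


def simulate_trait(n_people, n_loci=5, allele_freq=0.5, env_sd=1.0, seed=0):
    """Return (genetic_scores, trait_values) under a purely additive model.

    All parameter values are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    genetic_scores = []
    trait_values = []
    for _ in range(n_people):
        # Each locus contributes 0, 1 or 2 copies of the trait-increasing allele.
        score = sum(
            (rng.random() < allele_freq) + (rng.random() < allele_freq)
            for _ in range(n_loci)
        )
        genetic_scores.append(score)
        # Environmental influences are modeled as independent Gaussian noise.
        trait_values.append(score + rng.gauss(0.0, env_sd))
    return genetic_scores, trait_values


scores, traits = simulate_trait(10_000)
counts = Counter(scores)
# Crude text histogram of the genetic scores (one '#' per 50 individuals).
for value in sorted(counts):
    print(f"{value:2d} {'#' * (counts[value] // 50)}")
```

Under this model the extremes of the distribution carry more or fewer copies of the same alleles that vary in the middle of the distribution, which is the sense in which common disorders may be the quantitative extreme of normal variation.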
Intelligence is an exemplar of a quantitative trait that is distributed as a bell-shaped normal distribution. As we have seen, much of this normal variation is caused by DNA variation between individuals. Although the number of QTLs or their effect sizes is not known for any complex trait, including intelligence, one recommendation is to design studies that can detect QTLs that account for less than 1% of the genetic variance, because such studies will also have the power to detect QTLs of larger effect size (Plomin et al. 2003). Few studies have had the power to break this 1% QTL barrier, which requires an unselected sample of about 1000 individuals to achieve 80% power (P = 0.01) for a single test, without taking into consideration protection against multiple testing and other complications (Zondervan & Cardon 2004). Because linkage designs based on allele sharing within families have trouble breaking the 10% QTL barrier, the much more powerful association design based on population correlations between genotypes and quantitative traits is increasingly favored (Risch & Merikangas 1996). As has happened in other, newer areas of research on complex traits such as hyperactivity (Thapar 2003), QTL research on intelligence has focused on association analysis of candidate genes. Candidate gene studies of cognitive abilities, some of which have the power to break the 1% QTL barrier, have recently reported some replicated associations (Payton 2004; Plomin 2003). However, the record for replication in general for candidate gene studies is not good (Glatt & Freimer 2002; Hirschhorn et al. 2002), in part because any of the 15 000 or so genes expressed in the brain could conceivably be a candidate.
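The figure of about 1000 individuals for the 1% QTL barrier can be approximated with the standard Fisher z-transformation for a correlation. This is a rough sketch, not the calculation used by the cited authors: it assumes an unselected sample, a single test, a QTL effect expressed as a squared correlation and, by default, a one-tailed threshold. The function name `required_sample_size` is hypothetical.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def required_sample_size(variance_explained, alpha=0.01, power=0.80, two_sided=False):
    """Approximate N needed to detect a correlation of sqrt(variance_explained).

    Uses the Fisher z approximation: N = ((z_alpha + z_power) / atanh(r))^2 + 3.
    """
    r = sqrt(variance_explained)
    norm = NormalDist()
    # Critical value for the significance threshold (one- or two-tailed).
    z_alpha = norm.inv_cdf(1 - (alpha / 2 if two_sided else alpha))
    # Normal quantile corresponding to the desired power.
    z_power = norm.inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)


print(required_sample_size(0.01))                  # one-tailed, 1% QTL
print(required_sample_size(0.01, two_sided=True))  # two-tailed, 1% QTL
```

Under these assumptions the one-tailed calculation lands close to 1000, and a two-tailed test requires somewhat more. Any correction for multiple testing lowers alpha and raises the required sample size further.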
For this reason, a second recommendation is for whole genome association studies, comparable to the systematic approach of linkage studies but requiring many thousands of DNA markers rather than just a few hundred markers as employed in linkage studies. The number of markers needed for a whole genome association scan is a matter of some uncertainty (Abecasis et al. 2001; Ke et al. 2004; Kruglyak 1999; Reich et al. 2001), but it seems likely that at least 100 000 markers would be needed. The problem with genotyping such large numbers of DNA markers on large samples is of course expense. The focus of this short communication is to outline a solution to this problem in which pooled DNA for large samples is genotyped on microarrays with large numbers of DNA markers.
One way to greatly reduce the amount of genotyping for large samples is to pool DNA from many individuals (Sham et al. 2002). DNA pooling can be used to compare groups such as the bottom vs. the top of a quantitative trait distribution of intelligence or mental retardation cases vs. controls. Samples can also be stratified by sex. It is as yet unknown whether there are sex-specific QTLs for intelligence, but given the dichotomy in brain and language development between the sexes, this does not seem to be an unreasonable hypothesis. Pooled DNA samples have been successfully genotyped for microsatellite markers (Barcellos et al. 1997; Daniels et al. 1998; Pacek et al. 1993; Plomin et al. 2001b, 2002) and for SNPs (e.g. Craig & McClay 2003; Germer et al. 2000; Hoogendoorn et al. 2000; Kirov et al. 2000; Norton et al. 2002; Ross et al. 2000; Sasaki et al. 2001). Estimates of allelic frequencies from pooled DNA have consistently been shown to be reliable when compared between pools and have been shown to be valid when compared with individual genotyping (Sham et al. 2002). Nonetheless, DNA pooling is best viewed as a tool to screen large numbers of DNA markers quickly for large samples to nominate a small number of candidate markers that can then be tested for confirmation using individual genotyping and traditional statistical analyses. Technical aspects of DNA pooling are discussed elsewhere (Craig et al. 2004).
In research on intelligence, we used pooled DNA to screen 1842 simple sequence-repeat (SSR) markers for association (Plomin et al. 2001b). A three-stage design attempted to provide a balance between false-positive and false-negative results in the search for QTLs of small effect size. In the first stage, pooled DNA was used to screen the SSR markers for an original sample of cases (101 high-IQ individuals) and controls (101 average-IQ individuals). SSR markers that were nominally significant in the first stage were screened further in another study of 96 extremely high-IQ cases and 100 average-IQ controls. Two of the SSR markers passed the first two stages, but neither made it past the third stage, which was a within-family analysis consisting of 196 parent-offspring trios, in which the offspring were of high IQ. Selection for cases at the high extreme of the IQ distribution made this three-stage design more powerful than would be expected given the relatively modest sample sizes – the power of the three stages to detect a 1% QTL was 56, 98 and 97%, respectively, if the SSR marker is very close to the QTL – and the overall false-positive error rate was 0.000125, providing robust protection. It is not surprising that no significant QTL associations emerged across all three stages, because at least 50 times more SSR markers would be needed for an association scan that provided coverage of the whole genome even if the markers are very close to or identical with the QTL itself. Although use of microsatellite markers for whole genome screening offers some advantages, in that there is evidence that at least some may be functional and that they appear to be concentrated in and around coding regions (Gerome Breen, personal communication), the wealth of information available for SNPs makes these an attractive alternative (see below).
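The arithmetic behind the three-stage design can be checked in a few lines. The sketch below assumes that each stage used a nominal threshold of 0.05 and that the stages are statistically independent. Neither assumption is stated in the text, although the product of three 0.05 thresholds does match the reported overall rate. The per-stage power values are those quoted above.

```python
from math import prod

# Assumed nominal significance threshold for each stage (not stated in the text).
stage_alpha = [0.05, 0.05, 0.05]
# Power of each stage to detect a 1% QTL, as reported in the text.
stage_power = [0.56, 0.98, 0.97]


def joint_probability(per_stage):
    """Probability of passing every stage, assuming independent stages."""
    return prod(per_stage)


overall_false_positive = joint_probability(stage_alpha)
overall_power = joint_probability(stage_power)

print(f"overall false-positive rate: {overall_false_positive:.6f}")
print(f"joint power across stages:   {overall_power:.3f}")
```

On these assumptions the same multiplication that drives the false-positive rate down to 0.000125 also reduces the chance that a true 1% QTL survives all three stages to a little over one half, which illustrates the balance between false-positive and false-negative results that the design was trying to strike.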
Part of the problem for whole genome scans using indirect association (in which the marker is not likely to be functional and is thus not likely to be the QTL itself) is that power drops off rapidly as the distance between the marker and the QTL increases (Kruglyak 1999; Reich et al. 2001). This is the reason that at least 50 times more markers than employed in the preliminary screens are needed for a comprehensive whole-genome scan. An alternative is to use a direct association approach, in which the DNA marker is likely to be functional and can thus be hypothesized to be the QTL itself (Carlson et al. 2004). Although regulatory regions and many other aspects of DNA are likely to be functional, the clearest category of potentially functional markers is non-synonymous SNPs (nsSNPs), that is, SNPs in coding regions that result in an amino acid substitution. In a study of non-verbal intelligence, we used DNA pooling in a direct association strategy to screen 432 nsSNPs selected from millions of SNPs to meet criteria such as showing a polymorphism with a minor allele frequency of no less than 10% in at least 20 Caucasian individuals in proven genes that are expressed in the brain. The sample included 288 children with low non-verbal IQ who were selected from a sample of more than 14 000 children and 1025 controls representing the full range of intelligence (Butcher et al. in press). SNPs were screened on the basis of allelic frequency differences between the low-IQ group and the control group, and these SNPs were tested for QTL association within the control group of 1025 children following the QTL hypothesis that QTLs will affect intelligence throughout the normal distribution. Both stages involved pooled DNA. The first stage compared pooled DNA for the low-IQ and control groups using triplicate pools to increase reliability.
The second stage used five subpools of the controls that represented quintiles of the IQ distribution – that is, the lowest 20 (excluding cases), 21–40, 41–60, 61–80 and 81–100%. Six SNPs surviving these two hurdles were individually genotyped for 1313 children.
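A first-stage comparison of allelic frequencies between a case pool and a control pool can be sketched as a two-proportion z-test on chromosome counts. This is an illustration only and is not necessarily the statistic used in the cited study: the allele frequencies in the usage line are hypothetical, the function name `allele_frequency_z_test` is our own, and the test ignores measurement error introduced by pooling, so it will tend to overstate significance. That limitation is one reason pooling serves as a screen ahead of individual genotyping.

```python
from math import sqrt
from statistics import NormalDist


def allele_frequency_z_test(freq_case, n_case, freq_control, n_control):
    """Two-sided z-test for an allele-frequency difference between two groups.

    Each individual contributes two chromosomes, so the counts are 2 * n.
    Pooling measurement error is ignored (a simplifying assumption).
    """
    chrom_case, chrom_control = 2 * n_case, 2 * n_control
    # Frequency under the null hypothesis of no difference between groups.
    pooled = (freq_case * chrom_case + freq_control * chrom_control) / (
        chrom_case + chrom_control
    )
    se = sqrt(pooled * (1 - pooled) * (1 / chrom_case + 1 / chrom_control))
    z = (freq_case - freq_control) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Group sizes follow the text (288 cases, 1025 controls); frequencies are made up.
z, p = allele_frequency_z_test(0.35, 288, 0.30, 1025)
print(f"z = {z:.2f}, p = {p:.3f}")
```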
Using standard statistics to test QTL associations for the individual genotyping data, we found that one SNP was significantly associated with IQ for both the comparison between case (low IQ) and control groups and an association analysis within the control group. This SNP (rs1136141), which is in a heat-shock protein gene (HSPA8), shows a very small effect size – a relative risk of 1.35 in the comparison between the low-IQ and control group and a correlation of 0.07 within the control group, which accounts for about 0.5% of the total variance (about 1 IQ point) in the control group. Although the SNP was selected as an nsSNP, the SNP is now known to be in the untranslated region of the HSPA8 gene rather than in the gene-coding region. If this is a true association – and only replication by other groups will tell – it would be a lucky finding because the number of nsSNPs examined in this study represents only a small fraction of all nsSNPs. Moreover, nsSNPs probably represent only a small fraction of all functional polymorphisms. Nonetheless, the strategy of using DNA pooling to screen large samples for a large number of nsSNPs represents a step toward identifying QTLs of small effect size associated with complex traits in the postgenomic era when all functional polymorphisms will be known (Botstein & Risch 2003).
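The effect-size figures quoted for rs1136141 follow from simple conversions of the correlation. The sketch below shows one reading of them; the IQ standard deviation of 15 and the interpretation of "about 1 IQ point" as the correlation multiplied by that standard deviation are assumptions, and both function names are hypothetical.

```python
def variance_explained(correlation):
    """Proportion of trait variance accounted for by a correlation r (r squared)."""
    return correlation ** 2


def iq_points_per_sd(correlation, iq_sd=15.0):
    """Expected IQ difference per standard deviation of genotype score.

    iq_sd = 15 is the conventional IQ scaling and is an assumption here.
    """
    return correlation * iq_sd


r = 0.07  # correlation within the control group, as reported in the text
print(f"variance explained: {variance_explained(r):.2%}")
print(f"IQ points per genotype SD: {iq_points_per_sd(r):.2f}")
```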
Although DNA pooling greatly reduces the effort required to genotype large samples, even genotyping a small number of DNA pools for thousands of DNA markers is expensive in time and money, because oligonucleotide primers need to be designed and reagents purchased for each DNA marker to amplify the polymorphic DNA. Microarrays alleviate this problem by using a one-primer assay to genotype thousands of SNPs, such as the microarray developed by Affymetrix that genotypes 11 555 SNPs on a single microarray using allele-specific hybridization (Affymetrix GeneChip® Mapping 10K Array Xba 131). The median intermarker distance between SNPs on the Affymetrix 10K GeneChip® is 105 kb, and the array has been shown to be capable of an average call rate greater than 95%, with reproducibility (as compared with other GeneChip® assays) greater than 99.9% and accuracy (as compared with individual genotyping) greater than 99.5% (Matsuzaki et al. 2004).
Pooled DNA on microarrays
Although the use of microarrays reduces the costs of genotyping, the cost remains substantial for genotyping large samples, because one microarray is required for each individual and microarrays are costly. We have shown in two studies that it is possible to combine the strength of DNA pooling to genotype large samples and the strength of microarrays to genotype large numbers of SNPs by genotyping pooled DNA on microarrays. In developing this approach, our hypothesis was that, instead of reaching a qualitative decision about the presence or absence of a particular SNP allele for an individual, microarrays could be used to assess the relative quantity of the two SNP alleles in pooled DNA, similar to the way microarrays are used in expression profiling to provide a quantitative estimate of mRNA transcripts.
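The idea of reading a relative quantity rather than a genotype call from the array can be sketched as a relative allele signal: the intensity for one allele divided by the summed intensity for both. The code below is a simplified illustration and not the Affymetrix algorithm. The function names and intensity values are hypothetical, and it assumes each SNP is interrogated by several probe pairs whose signals can simply be averaged.

```python
def relative_allele_signal(intensity_a, intensity_b):
    """Fraction of the total hybridization signal attributable to allele A."""
    total = intensity_a + intensity_b
    if total <= 0:
        raise ValueError("no usable signal for this probe pair")
    return intensity_a / total


def pooled_frequency_estimate(probe_pairs):
    """Average the relative allele signal over (allele A, allele B) intensity pairs."""
    signals = [relative_allele_signal(a, b) for a, b in probe_pairs]
    return sum(signals) / len(signals)


# Hypothetical intensities for three probe pairs interrogating one SNP in a pool.
example_pairs = [(600.0, 400.0), (620.0, 380.0), (580.0, 420.0)]
print(f"estimated frequency of allele A: {pooled_frequency_estimate(example_pairs):.3f}")
```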
In our first study (Butcher et al. 2004), DNA was pooled for 105 Caucasian males and genotyped three times on the Affymetrix 10K GeneChip. Standard GeneChip protocols were used, even though the GeneChip was designed to genotype individuals, not pooled DNA. The average correlation was 0.973 between the allele frequency estimates for the three GeneChips using the same DNA pool. The correlation was 0.923 between the average of the three GeneChip estimates of allelic frequencies using pooled DNA and individual genotyping estimates from a sample of 100 Caucasian individuals, which are available from Affymetrix (NetAffx™).
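The two reliability and validity summaries reported here, the average correlation among replicate arrays and the correlation between the replicate average and a reference, can be computed as follows. The sketch uses a hand-written Pearson correlation so that it runs without external libraries; the allele-frequency values are hypothetical and the function names are our own.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(replicates):
    """Average correlation over all pairs of replicate arrays (reliability)."""
    pairs = list(combinations(replicates, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def replicate_average(replicates):
    """Per-SNP mean of the allele-frequency estimates across replicate arrays."""
    return [sum(values) / len(values) for values in zip(*replicates)]


# Hypothetical allele-frequency estimates for five SNPs on three replicate arrays.
chips = [
    [0.12, 0.35, 0.50, 0.71, 0.90],
    [0.15, 0.33, 0.52, 0.68, 0.88],
    [0.10, 0.38, 0.47, 0.73, 0.91],
]
# Hypothetical reference frequencies from individual genotyping.
reference = [0.10, 0.36, 0.49, 0.70, 0.92]

print(f"reliability: {mean_pairwise_correlation(chips):.3f}")
print(f"validity:    {pearson(replicate_average(chips), reference):.3f}")
```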
In our second study (Meaburn et al. in press), triplicates of pooled DNA for 100 individuals were each genotyped three times on 10K GeneChips and compared with our own individual genotyping results for 104 SNPs. Good signal detection of SNP allelic frequencies was obtained for 83.9% of the SNPs on average across the nine 10K GeneChips. As in our first study, genotyping results for the nine GeneChips were highly intercorrelated (average r = 0.962). The GeneChip results for pooled DNA correlated highly (average r = 0.986 with K-correction; 0.942 without K-correction) with individual genotyping for 104 SNPs. Calculations from these results indicate that the method can detect a case-control allelic frequency difference of 0.05 with 80% power and a difference of 0.10 with virtual certainty. We verified these predictions empirically in a spiking experiment in which varying amounts of DNA (15 and 20%) from one individual, whose genotypes were known by genotyping the individual on the 10K GeneChip, were added to an aliquot of the original pool.
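Two calculations implied by this paragraph can be sketched briefly. The first is a K-correction in the form commonly used for pooled genotyping, in which k is the allele A to allele B signal ratio observed in heterozygotes and is used to rescale unequal allele signals; the text does not spell out the formula, so this form is an assumption. The second is the allele frequency expected after spiking, which assumes that the stated percentage is the donor's share of the final mixture. All function names and numeric inputs are hypothetical.

```python
def k_corrected_frequency(intensity_a, intensity_b, k):
    """Frequency of allele A after correcting for unequal allele signal.

    k is the A/B signal ratio observed in heterozygotes, where the true
    ratio is 1:1; k = 1 leaves the raw relative signal unchanged.
    """
    return intensity_a / (intensity_a + k * intensity_b)


def expected_spiked_frequency(pool_freq, donor_allele_fraction, spike_fraction):
    """Expected allele frequency after adding one individual's DNA to a pool.

    donor_allele_fraction is 0.0, 0.5 or 1.0 for the three possible genotypes;
    spike_fraction is the donor's share of the final DNA mixture (assumed).
    """
    return (1 - spike_fraction) * pool_freq + spike_fraction * donor_allele_fraction


# Hypothetical example: pool frequency 0.30, donor homozygous for allele A.
for fraction in (0.15, 0.20):
    shifted = expected_spiked_frequency(0.30, 1.0, fraction)
    print(f"spike {fraction:.0%}: expected frequency {shifted:.3f}")
```

In this hypothetical case the expected shifts are on the order of 0.10 or more, the size of difference described above as detectable with virtual certainty, whereas a heterozygous donor would shift the pool frequency by less.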
In summary, genotyping pooled DNA on microarrays is a reliable and valid approach to estimating allele frequencies and can provide a systematic and powerful tool for identifying QTL associations for complex traits such as intelligence. We have conducted the first study using microarrays to screen pooled DNA for intelligence (Butcher et al. 2005; Plomin in press). Participants were drawn from a community sample of 15 000 7-year-old children (members of 7500 twin pairs) tested via telephone on a battery of four verbal and non-verbal tests. Two independent studies are being conducted using pooled DNA: a case-control comparison between 515 children with the lowest IQ scores and 1028 unselected controls and a low-high comparison between 503 children with the next lowest IQ scores and 505 children with the highest IQ scores. Each of the four groups was randomly divided into five subgroups of about 100 individuals each to assess sampling variation within each group. DNA pools were constructed for each of these 20 subgroups and are being genotyped on separate Affymetrix 10K Mapping GeneChip® microarrays. The design has 99% power to detect a QTL with 1% effect size while providing genome-wide protection against false-positive results (P < 10^-7). Thirteen SNPs that yielded nominally significant differences (P < 0.03) in both studies were individually genotyped for 2551 children in the two studies, as well as for an additional 5000 children. Four of the 13 SNPs were found to be significantly associated with intelligence in the entire sample, with population effect sizes as small as 0.2%. The effects of the four SNPs are additive and can be aggregated into a SNP set and used in behavioral genomic analyses, as discussed in the following section (Harlaar et al. 2005).
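The random division of a group into subpools can be sketched as a seeded shuffle followed by dealing individuals into subgroups in turn. This is an illustrative procedure and not necessarily the one used in the study; the function name, the seed and the integer identifiers are hypothetical, and only the group size of 515 is taken from the text.

```python
import random


def split_into_subgroups(ids, n_subgroups=5, seed=0):
    """Randomly assign individuals to subgroups whose sizes differ by at most one."""
    shuffled = list(ids)
    # A fixed seed makes the allocation reproducible for later auditing.
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled individuals into subgroups like cards.
    return [shuffled[i::n_subgroups] for i in range(n_subgroups)]


# Hypothetical identifiers for a group of 515 children.
subgroups = split_into_subgroups(range(515))
print([len(group) for group in subgroups])
```

Constructing a separate DNA pool for each subgroup allows the spread of allele-frequency estimates across subpools to serve as an empirical measure of sampling variation within a group.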
Although pooled DNA on microarrays shows considerable promise in the search for QTLs of small effect size, a limitation of these first studies using pooled DNA on the Affymetrix 10K GeneChip is that at least 10 times more SNPs are needed for even a preliminary scan of the whole genome. Fortunately, a new two-microarray GeneChip set that genotypes more than 100 000 SNPs is now available from Affymetrix, and we will now use it in our research with pooled DNA. In addition, the Sanger Centre's exon resequencing project (http://www.sanger.ac.uk/genetics/exon/), in which 48 Caucasian individuals are being resequenced for all exons, is identifying twice as many nsSNPs as previously known. These nsSNPs, which are expected to be available in 2005, will greatly increase power by permitting direct association whole-genome analyses based on the hypothesis that nsSNPs are likely to be functional. However, nsSNPs are only one possible source of functional DNA polymorphisms, and we look forward to the time when microarrays are available with all known functional polymorphisms in the genome (Botstein & Risch 2003).