Principal components methods for narrow-sense heritability in the analysis of multidimensional longitudinal cognitive phenotypes
- Conflict of interest: There are no actual or potential conflicts of interest to disclose.
Genetic association studies of longitudinal cognitive phenotypes are an alternative approach to discovering genetic risk factors for Alzheimer's disease (AD). However, the standard linear mixed model approach is limited in the face of multidimensional longitudinal data and multiple genotypes. In this setting, the principal components of heritability (PCH) approach may increase efficiency by deriving a linear combination of phenotypes to maximize the heritability attributable to a particular genetic locus. The current study investigated the performance of two PCH methods, the Principal Components of Heritability Association Test (PCHAT) and C2BAT, in detecting association of the known AD susceptibility allele APOE-ϵ4 with cognitive function at baseline and decline in cognition over time.
PCHAT, C2BAT, and standard linear mixed models were used to test for association between APOE-ϵ4 allele and performance on 19 neuropsychological tests using subjects without dementia at baseline from the Religious Orders Study (ROS) (n = 693) and Memory and Aging Project (MAP) (n = 778). Analyses were conducted across the three methods for three nested phenotype definitions (all 19 measures, executive function and episodic memory measures, and episodic memory only), and for baseline data only versus longitudinal change.
In all cases, APOE-ϵ4 was significantly associated with baseline level of and change over time in cognitive function, and PCHAT and C2BAT yielded evidence of association comparable to or stronger than conventional methods.
PCHAT, C2BAT, and other PCH methods may have utility for genetic association studies of multidimensional cognitive and other phenotypes by maximizing genetic information while limiting multiple comparisons. Copyright © 2013 Wiley Periodicals, Inc.
With heritability estimates in the range of 60–80% [Gatz et al., 2006], genetic variation plays a crucial role in risk for late-onset Alzheimer's disease (AD) [Bertram and Tanzi, 2008; Bertram et al., 2010; Hollingworth et al., 2011a]. In addition to the consistently replicated genetic risk factor apolipoprotein E ϵ4 (APOE-ϵ4), recent large-scale genome-wide association studies (GWAS) have identified several more modest AD susceptibility genes, including CR1, CLU, and PICALM, with odds ratios in the range of 0.80–1.2 [Harold et al., 2009; Lambert et al., 2009; Naj et al., 2011; Seshadri et al., 2010; Hollingworth et al., 2011b]. It is likely that additional risk factors of small effect remain undetected, even with thousands of samples, because of insufficient power [Ku et al., 2010].
While large-scale case–control studies may be the most efficient way to detect these genes, data from population-based cohorts play a key role in the confirmation of case–control findings. Population-based data minimize biases related to health status, survival, education, age, and ethnicity that may be only partially controlled analytically in the case–control setting. They also offer an opportunity to derive estimates of population-level effects that cannot be obtained from case–control data. In the setting of AD, limiting the analysis of cohort data to cases and controls is inefficient, as it fails to take advantage of potential signal in subthreshold levels of impairment and change over time. Longitudinal population-based studies document the gradual onset of cognitive decline preceding dementia [Hall et al., 2000; Amieva et al., 2008; Wilson et al., 2011], and clinical pathologic studies confirm that such changes are associated with underlying AD pathology [Galvin et al., 2005; Petersen et al., 2006; Sonnen et al., 2007; Bennett et al., 2009]. Analyses of cross-sectional cognitive function and longitudinal cognitive change in persons without dementia have been able to detect the effect of APOE-ϵ4 in a large number of studies [Henderson et al., 1995; Haan et al., 1999; Riley et al., 2000; Mayeux et al., 2001; Wilson et al., 2002; Bennett et al., 2009; Boyle et al., 2010], and have recently been used to replicate one of the genes of small effect discovered in case–control GWAS [Chibnik et al., 2011].
State-of-the-art analysis of such data relies on linear mixed models [Laird and Ware, 1982], typically applied to one or more single tests, to averaged Z scores within or across domains, or to other summary measures [Wilson et al., 2002]. These approaches have many strengths, but may limit power with multiple comparisons, or can dilute genetically informative features of the data by collapsing multiple tests and domains that are more strongly or weakly related to a particular genetic variant. One novel approach is to optimize the phenotype through transformations that exploit possible pleiotropic effects of the genetic loci to reduce dimensionality and to amplify any genetic signal [Klei et al., 2008]. The principal component of heritability (PCH) method derives a linear combination of multiple phenotypes to maximize the heritability attributable to a particular genetic locus [Yang and Wang, 2012]. This approach was initially applied to family-based association studies in the FBAT-PC method [Lange et al., 2004], and has more recently been extended to association studies of unrelated individuals in the Principal Components of Heritability Association Test (PCHAT) [Klei et al., 2008] and C2BAT [Naylor and Lange, 2012]. In both approaches, a subset of the sample is first used to estimate the coefficients that maximize the correlation between a particular genetic marker and a linear combination of phenotypes, and the rest of the sample is used to test the association between this optimized linear combination of phenotypes and the genotype. In this proof-of-concept study, we compare the performance of the two PCH methods with the more standard approach in assessing the well-replicated association of the APOE-ϵ4 allele with neuropsychological test performance in longitudinal cohort study data.
Study Design and Population
The subjects used for the current study were from two longitudinal cohort studies—the Religious Orders Study (ROS) [Bennett et al., 2006; Boyle et al., 2010] and Memory and Aging Project (MAP) [Bennett et al., 2006]. Detailed descriptions of these cohorts have been published [Bennett et al., 2012a, 2012b]. Briefly, the two studies, both based at the Rush Alzheimer's Disease Center, follow two distinct populations using a large common core of identical methods, so they can be analyzed jointly. ROS [Bennett et al., 2006; Boyle et al., 2006], begun in 1994, has 1,167 older religious clergy (nuns, priests, and brothers) who have agreed to medical and psychological evaluation each year and brain donation after death. The sample is primarily white and highly educated, although recent recruiting has focused on racial and ethnic diversity. MAP [Bennett et al., 2006], begun in 1997, has a more educationally diverse and somewhat more racially diverse population of 1,559 recruited from the local community. Up to 18 waves of annual data collection are available for ROS, and up to 15 for MAP. Both ROS and MAP were approved by the institutional review board of Rush University Medical Center. The overall follow-up rate exceeds 95% for both studies.
For the current analysis, all Caucasian subjects without dementia at baseline and with at least two visits and APOE genotype available were included: 693 from ROS and 778 from MAP. Table I displays the basic demographic data of these subjects. The populations are similar, but the MAP subjects were somewhat older at baseline, included a greater proportion of women, had fewer years of education, and had shorter follow-up time.
Table I. Baseline Demographics of the Samples from ROS, MAP, and ROS & MAP Combined*
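|Characteristic||ROS, mean (SD)||ROS, range||MAP, mean (SD)||MAP, range||ROS & MAP, mean (SD)||ROS & MAP, range|
|Age (years)||75.8 (7.2)||(61, 102)||81.0 (6.5)||(55, 100)||78.6 (7.3)||(55, 102)|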
|Education (years)||18.1 (3.3)||(6, 30)||14.7 (2.9)||(5, 28)||16.3 (3.5)||(5, 30)|
|Sex (% female)||65|| ||73|| ||69|| |
|Follow-up time (years)||9.71 (4.51)||(1, 16)||5.94 (2.61)||(1, 13)||7.72 (4.09)||(1, 16)|
Assessment of Cognitive Function
In both ROS and MAP, the cognitive assessment consists of a battery of 19 neuropsychological measures covering five domains: seven categorized as episodic memory, four as semantic memory, four as working memory, two as perceptual speed, and two as visuospatial ability [Boyle et al., 2006] (Table II). For the present analyses, we used three cognitive phenotype definitions: (i) “kitchen-sink” (KS), which included all 19 NP measures; (ii) “executive function and episodic memory” (EFEM), which focused on the 7 tests of episodic memory and the 3 working memory tests assessing executive function (digit span backward, digit ordering, and alpha span test); and (iii) “episodic memory” (EM), only the 7 episodic memory measures. The latter two are “a priori” models, in that these domains are known to show early impairment in AD and other disorders, and had previously been reported to be most associated with APOE and/or to show decline prior to the development of frank dementia [Mayeux et al., 2001; Wilson et al., 2002; Blacker et al., 2007; Dickerson et al., 2007].
Table II. Neuropsychological Measures With Their Corresponding Cognitive Domains [Boyle et al., 2006]
|Episodic memory (7)||East Boston Story—immediate and delayed recall|
| ||Story A from Logical Memory of the Wechsler Memory Scale Revised—immediate and delayed recall|
| ||Consortium to Establish a Registry for AD neuropsychological battery (CERAD)|
| ||Word List Memory|
| ||Word List Recall|
| ||Word List Recognition|
|Semantic memory (4)||Boston Naming Test (20-item)|
| ||Verbal Fluency|
| ||Extended Range Vocabulary Test (15-item)|
| ||National Adult Reading Test (20-item)|
|Working memory (4)||Digit Span Forward|
| ||Digit Span Backward|
| ||Digit Ordering|
| ||Alpha Span Test|
|Perceptual speed (2)||Symbol Digit Modalities Test (oral version)|
| ||Number Comparison Test|
|Visuospatial ability (2)||Judgment of Line Orientation|
| ||Standard Progressive Matrices|
Blood sample collection and APOE genotyping were performed as previously described [Hixson and Vernier, 1990]. The distribution of the APOE genotypes is shown in Table III. All genotypes were found to be in Hardy–Weinberg equilibrium (HWE), assessed using a standard Chi-square test.
Table III. Distributions of APOE Genotype for ROS, MAP, and ROS & MAP Samples
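The Hardy–Weinberg check mentioned above can be sketched as a Pearson chi-square comparison of observed and expected genotype counts. This is a minimal illustration, not the authors' code: the function name, the dictionary layout, and the genotype counts in the usage example are all hypothetical, and the degrees of freedom follow the usual convention (number of genotypes minus number of alleles).

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotype_counts):
    """Pearson chi-square statistic for Hardy-Weinberg equilibrium.

    genotype_counts maps unordered allele pairs, e.g. ("e3", "e4"), to counts.
    Returns (chi_square, degrees_of_freedom).
    """
    n = sum(genotype_counts.values())
    # Each genotype contributes two alleles to the allele counts.
    allele_counts = Counter()
    for (a, b), count in genotype_counts.items():
        allele_counts[a] += count
        allele_counts[b] += count
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    alleles = sorted(freqs)

    chi_square = 0.0
    for a, b in combinations_with_replacement(alleles, 2):
        # Expected count under HWE: n*p^2 for homozygotes, 2*n*p*q otherwise.
        expected = n * freqs[a] * freqs[b] * (1 if a == b else 2)
        observed = genotype_counts.get((a, b), 0)
        if a != b:
            observed += genotype_counts.get((b, a), 0)
        chi_square += (observed - expected) ** 2 / expected

    k = len(alleles)
    return chi_square, k * (k - 1) // 2


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not the Table III data.
    example = {("e3", "e3"): 600, ("e3", "e4"): 200, ("e4", "e4"): 20}
    print(hwe_chi_square(example))
```

The returned statistic would then be compared against a chi-square distribution with the returned degrees of freedom.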
The two novel approaches, PCHAT and C2BAT, were compared to standard linear regression (for baseline data only) and linear mixed model (for baseline and slope). An additive genetic model was used for the analyses, with the number of ϵ4 alleles entered as a linear covariate. Scores were adjusted for age, sex, and years of education, as described for each approach below. Each of the analyses was performed separately for ROS and MAP, and then on the combined sample, and for each of the three phenotype definitions described above. For each approach, analyses of baseline cognitive function and longitudinal decline in cognitive function were completed. P-values for the association between the phenotype and APOE-ϵ4 were compared across analytic approaches, phenotype definitions, and temporal models.
For analysis of baseline cognitive function, linear regression was used, with the mean Z score [Bennett et al., 2005, 2006; Boyle et al., 2006] of the neuropsychological measures for the appropriate phenotype definition as the dependent variable, and number of APOE-ϵ4 alleles, age, sex, and years of education as independent variables.
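As a rough illustration of the composite used as the dependent variable, the sketch below standardizes each test across subjects and averages the resulting Z scores within each subject. The function name and data layout are hypothetical, and standardizing with the sample mean and sample standard deviation of the supplied data is an assumption; the cited papers describe the exact procedure.

```python
from statistics import mean, stdev


def mean_z_scores(raw_scores):
    """Composite score per subject: the mean of per-test Z scores.

    raw_scores is a list of per-subject lists, one raw score per test.
    Assumption: each test is standardized with the sample mean and sample
    standard deviation of the subjects supplied.
    """
    n_tests = len(raw_scores[0])
    columns = [[subject[j] for subject in raw_scores] for j in range(n_tests)]
    centers = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]
    return [
        mean((subject[j] - centers[j]) / spreads[j] for j in range(n_tests))
        for subject in raw_scores
    ]
```

The composite would then be regressed on the number of APOE-ϵ4 alleles (0, 1, or 2) together with age, sex, and years of education.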
For the analysis of longitudinal cognitive function, linear mixed effects models [Laird and Ware, 1982] were used. First, linear regression controlling for age, gender, and education was performed on the mean Z score for each phenotype definition for each visit, and the residuals from these regression analyses were then used as the dependent variables for the linear mixed effects model. The independent variables were the number of APOE-ϵ4 alleles, follow-up time, and the interaction term of APOE × time. Random intercepts allowed for the detection of genetic differences in function at the time of ascertainment, essential given that some subjects entered the study with greater or lesser degrees of impairment. Random slopes allowed the detection of genetic effects on the rate of decline. All of the above analyses were conducted using SAS v.9.1 [SAS Institute Inc, 2000].
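One way to write the model just described is given below. The notation is ours rather than the authors', and the normality and covariance assumptions are the conventional ones for a Laird and Ware model, not details stated in the text.

```latex
% r_{ij}: covariate-adjusted residual of the mean Z score, subject i, visit j
% G_i:    number of APOE-\epsilon 4 alleles (0, 1, or 2)
% t_{ij}: follow-up time in years since baseline
r_{ij} = \beta_0 + \beta_1 G_i + \beta_2 t_{ij} + \beta_3 \,(G_i \times t_{ij})
         + b_{0i} + b_{1i}\, t_{ij} + \varepsilon_{ij},
\qquad
(b_{0i}, b_{1i})^{t} \sim N(0, D), \quad \varepsilon_{ij} \sim N(0, \sigma^2).
```

Under this formulation, \beta_1 captures the association of APOE-ϵ4 with baseline level and \beta_3 captures its association with the rate of change.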
Generally speaking, a dimension reduction approach derives a single or a few new phenotypes that are linear combinations of original phenotypes [Yang and Wang, 2012], such as

$$Y^{*} = \alpha_1 Y_1 + \alpha_2 Y_2 + \cdots + \alpha_K Y_K. \qquad (1)$$

The PCH approach generally involves using a subset of the sample to estimate the coefficients in (1) that maximize the correlation between $Y^{*}$ and the particular genetic marker, and using the rest of the sample to test the association between this optimized linear combination of phenotypes and the genotype. Different samples must be used for each step so that the type I error is not inflated. Technical details of the approach have been published [Klei et al., 2008; Yang and Wang, 2012]. Briefly, the total phenotypic variance (Vp) is partitioned into the variance attributable to a particular quantitative trait locus (QTL; i.e., a genetic marker underlying the risk for a particular quantitative phenotype) (Vq), and the residual variance (Vϵ) after removing the genetic effect of the particular QTL, as follows:

$$V_p = V_q + V_\epsilon, \qquad (2)$$

where Vp is the K × K total phenotype variance–covariance matrix, Vq the QTL variance matrix, and Vϵ the residual variance matrix. Let the weight vector $A = (\alpha_1, \ldots, \alpha_K)^{t}$. The variance of $Y^{*} = A^{t}Y$ explained by the genetic marker (i.e., heritability attributable to the particular QTL) is

$$h^{2}(A) = \frac{A^{t} V_q A}{A^{t} V_p A}. \qquad (3)$$

The objective of the PCH method is to derive an overall phenotype with maximal locus-specific heritability by defining a weight vector A that maximizes $h^{2}(A)$.
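The maximization can be sketched as a generalized eigenvalue problem, which is the standard result for a ratio of quadratic forms when Vp is positive definite. The symbols \lambda, \beta, and \sigma_g^2 below are introduced here for illustration, and the rank-one special case is an assumption rather than a statement from the cited papers.

```latex
% Locus-specific heritability of the combined phenotype A^t Y
h^{2}(A) = \frac{A^{t} V_q A}{A^{t} V_p A}, \qquad V_p = V_q + V_\epsilon .

% Setting the gradient of h^2(A) to zero yields a generalized eigenproblem;
% the maximizer is the eigenvector belonging to the largest eigenvalue.
V_q A = \lambda V_p A, \qquad \max_{A} h^{2}(A) = \lambda_{\max} .

% Illustrative special case: one additive locus with per-phenotype effects
% \beta and genotype variance \sigma_g^2, so that V_q has rank one.
V_q = \sigma_g^{2}\, \beta \beta^{t}
\;\Longrightarrow\;
A \propto V_p^{-1} \beta .
```

Because h^2(A) is unchanged when A is rescaled, only the direction of A matters.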
For the PCHAT method, the sample is randomly split into two disjoint subsets: N0 observations for “training” (i.e., estimation of A in (3)) and the remaining N1 observations for “testing” (the association between the optimized linear combination of phenotypes and the genotype), with N0/N1 typically being a small fraction, e.g., 0.2 [Klei et al., 2008]. Because the results depend substantially on how the sample is split into the training and testing subsets, the process is repeated multiple times with a new random split of the sample, and the overall P-value for the association test is derived from the mean of the individual test statistics from these repeated steps. Technical details of the PCHAT method have been published [Klei et al., 2008], and software to run PCHAT analyses is available at: http://wpicr.wpic.pitt.edu/WPICComp-Gen/.
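The repeated random splitting can be sketched as follows. This is only an illustration of the split step, not the PCHAT software: the function names, the seeding scheme, and the literal reading of N0/N1 = 0.2 as a ratio of subset sizes are assumptions.

```python
import random


def split_train_test(n_subjects, ratio=0.2, seed=None):
    """Randomly split subject indices so that len(train) / len(test) is
    approximately `ratio` (read here as N0 / N1)."""
    rng = random.Random(seed)
    indices = list(range(n_subjects))
    rng.shuffle(indices)
    # N0 / N1 = ratio and N0 + N1 = N give N0 = N * ratio / (1 + ratio).
    n_train = round(n_subjects * ratio / (1 + ratio))
    return indices[:n_train], indices[n_train:]


def repeated_splits(n_subjects, n_repeats, ratio=0.2, seed=0):
    """Yield independent random splits. In a PCH analysis, weights would be
    estimated on each training subset, a test statistic computed on the
    matching testing subset, and the statistics averaged afterwards."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        yield split_train_test(n_subjects, ratio, seed=rng.randrange(2**32))
```

For the combined sample of 1,471 subjects, a ratio of 0.2 corresponds to 245 training and 1,226 testing subjects per split.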
For the present analyses, phenotypes were entered into the PCHAT software according to the three phenotype definitions for three temporal models: baseline, slope only, and slope–baseline. First, for each phenotype definition, scores for each cognitive test at each visit were regressed against age at baseline, sex, and years of education. The residuals from these regression analyses were then used in the PCHAT analyses. The baseline model included only the scores from the baseline visit. The slope model included only slopes calculated in separate linear regression analyses of the residuals obtained earlier. The slope–baseline model included both baseline values and slopes. Unfortunately, the PCHAT software can only accommodate 20 phenotypes, so the KS phenotype definition could not be used in the slope–baseline model.
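The slope phenotypes and the 20-phenotype constraint can be illustrated with a short sketch. The helper names are hypothetical, and counting one baseline value plus one slope per test in the slope–baseline model is our reading of the description above.

```python
def ols_slope(times, values):
    """Least-squares slope of values on times for one subject and one test."""
    n = len(times)
    t_bar = sum(times) / n
    v_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return sxy / sxx


PCHAT_MAX_PHENOTYPES = 20  # software limit stated in the text
TESTS_PER_DEFINITION = {"KS": 19, "EFEM": 10, "EM": 7}


def n_phenotypes(definition, model):
    """Number of phenotypes entered for a definition and temporal model.

    Assumption: the slope-baseline model enters one baseline value and one
    slope per test; the other models enter one value per test.
    """
    tests = TESTS_PER_DEFINITION[definition]
    return 2 * tests if model == "slope-baseline" else tests


def fits_pchat(definition, model):
    return n_phenotypes(definition, model) <= PCHAT_MAX_PHENOTYPES
```

Under this reading, KS needs 38 phenotypes in the slope–baseline model and exceeds the limit, whereas EFEM (20) and EM (14) fit.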
The C2BAT method (developed by authors M.G.N. and C.L.) is another PCH approach with somewhat different methodology than PCHAT. Technical details of the C2BAT method have been published [Naylor and Lange, 2012]. Briefly, in the context of several phenotypes being tested simultaneously for association with one genetic marker, the total sample is split into two smaller datasets (A and B), each containing roughly half of the total subjects. Using only Dataset A, a MANOVA is performed to test whether the phenotypes differ across genotypes, and Wilks' test is used to obtain a P-value, pA. Dataset A is also used to perform a multivariate regression of the phenotypes on genotype in order to estimate the regression coefficients. The estimated coefficients from the regression analysis of dataset A are then used to calculate a linear combination of the phenotypes in dataset B, which is then regressed on the genotype, producing a second P-value, pB. The two P-values, pA and pB are then combined into an overall P-value using Fisher's method.
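The final combination step can be sketched directly, because Fisher's statistic for two independent P-values follows a chi-square distribution with 4 degrees of freedom under the null, and that distribution has a closed-form tail probability. The function name is hypothetical, and the sketch assumes pA and pB are independent, as they are when computed from disjoint datasets.

```python
import math


def fisher_combine_two(p_a, p_b):
    """Combine two independent P-values with Fisher's method.

    X = -2 * (ln p_a + ln p_b) is chi-square with 4 degrees of freedom under
    the null hypothesis, and P(chi2_4 > x) = exp(-x / 2) * (1 + x / 2).
    """
    x = -2.0 * (math.log(p_a) + math.log(p_b))
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)
```

Equivalently, the combined value is q(1 − ln q) with q = pA·pB, so two P-values of 0.05 combine to roughly 0.017.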
As with PCHAT, the resulting P-value can be highly dependent on how the data are initially split into datasets A and B, so all of the above steps are repeated multiple times, with the median of the resulting P-values noted. A valid P-value for the entire method is then obtained by permutation tests in which the link between the genotypes and the phenotypic information is randomly permuted. The permutation P-value is the proportion of permutations in which the median P-value from the permuted data is less than the median obtained from the unpermuted data.
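The permutation step can be sketched as a simple proportion, which also shows why the number of permutations bounds the smallest P-value that can be resolved. The function names are hypothetical, and the strict inequality follows the wording above; other conventions (for example, counting ties) are also in use.

```python
def permutation_p_value(observed_median, permuted_medians):
    """Proportion of permuted-data medians that are smaller than the median
    P-value obtained from the unpermuted data."""
    hits = sum(1 for m in permuted_medians if m < observed_median)
    return hits / len(permuted_medians)


def resolution_floor(n_permutations):
    """Smallest nonzero P-value that n_permutations can resolve."""
    return 1.0 / n_permutations
```

With 100,000 permutations the floor is 1/100,000, so an association for which no permuted median falls below the observed one can only be reported as being at or below that bound.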
We implemented the C2BAT method in R [R Development Core Team, 2009] for the current study, and used 100,000 permutations for the permutation tests. As described in detail for PCHAT, we used residuals of neuropsychological test scores and slopes of these residuals for each of the three phenotype definitions and three temporal models: baseline, slope only, and slope–baseline. Implementation in R allows extensive flexibility in model specification, so it would be possible in principle to run a linear mixed model on dataset B, but this would require still more processing time.
Results from the linear regression for the analysis of cognitive function at baseline for the three phenotype definitions are shown in Table IV. As expected, the coefficients are negative, indicating worse performance with each additional ϵ4 allele, and all P-values indicate highly significant association in each sample and the combined sample. Also as expected, the strongest signal is from the two a priori phenotype definitions, episodic memory alone and with executive function; the most significant P-value of 4.52 × 10⁻¹⁴ is for episodic memory in the combined sample.
Table IV. Linear Regression Results* for ROS, MAP, and ROS & MAP Combined
Results for the linear mixed models are shown in Table V. Baseline cognitive function was again significantly associated with APOE-ϵ4 with more significant results and more negative coefficients for the a priori phenotype models, especially episodic memory, but overall the results were somewhat less significant than in the linear regression approach. In addition, there were significant P-values for the interaction term, APOE-ϵ4 × follow-up time, indicating more rapid decline in APOE-ϵ4 carriers, in this case with little difference across the different phenotype definitions.
Table V. Linear Mixed Effects Model Results* for ROS, MAP, and ROS & MAP Combined
Table VI shows results of the PCHAT and C2BAT analyses. Of note, the P-values for all phenotype definitions across all temporal models were significant for both samples and the combined sample (P < 0.05), as expected—thus providing proof of concept for these approaches. Overall, as with the standard approaches, the a priori phenotype definitions, particularly episodic memory, typically showed stronger effects. Beyond this, the baseline models provided a stronger signal than slope alone, but there was often a marginal benefit to the slope–baseline model over the baseline alone.
Table VI. Results* From PCHAT and C2BAT for ROS, MAP, and ROS & MAP Combined
Comparisons across the methods were limited by logistical issues, in that PCHAT did not accommodate the full slope–baseline model for the KS phenotype definition, and C2BAT's permutation-test derived P-values reached a floor at 1 × 10⁻⁵ with 100,000 permutations, limiting the assessment of all of the combined ROS and MAP associations, and some of the stronger of the individual associations as well. Nonetheless, where comparisons could be made, the novel methods offered some advantages. For baseline performance, PCHAT had comparable or occasionally smaller P-values compared to linear regression, while C2BAT's were typically somewhat larger. For longitudinal cognitive function, the slope-only models typically performed worse than linear mixed effects models, but the “slope–baseline” models yielded more significant P-values than those for the APOE-ϵ4 × follow-up time interaction term, particularly for PCHAT. The limited comparison available also suggests that PCHAT offers advantages over C2BAT.
Genetic analysis of longitudinal cognitive phenotypes is a major focus of current efforts in many epidemiologic studies of older individuals, both individual studies [Henderson et al., 1995; Haan et al., 1999; Riley et al., 2000; Mayeux et al., 2001; Wilson et al., 2002], and larger consortia such as the Alzheimer's Disease Genetics Consortium [Naj et al., 2011] (ADGC) and the Cohorts for Heart and Aging Research in Genetic Epidemiology [Psaty et al., 2009] (CHARGE). Analytic approaches that increase the power of such analyses are of potentially great value.
Novel Versus Standard Methods
The general PCH approach of deriving a trait based on the measured phenotypes to enhance heritability dates back at least to 1988 [Klei et al., 1988], and was introduced formally in the context of pedigrees by Ott and Rabinowitz. Its first application to genetic association studies came from Lange et al. [2004] as FBAT-PC, offering a family-based association test for multiple correlated phenotypes. FBAT-PC broke new ground in applying generalized principal component analysis to both the estimated genetic variance matrix and the phenotypic variance matrix, which enables it to derive a linear combination of the phenotypes that maximizes the heritability attributable to the genetic marker of interest. This method was used to identify a confirmed locus for body mass index [Herbert et al., 2006].
FBAT-PC uses families that do not contribute to the test statistic to calculate the linear combination of phenotypes. Both PCHAT and C2BAT have extended this approach to unrelated samples by using iterative random splits of the sample to estimate the coefficients and test the association. This genetically driven data reduction method may offer key advantages over a composite phenotype like the mean Z score across tests or domains commonly used in the linear mixed model. The current study represents the first to compare and contrast these two PCH methods to one another and to standard linear approaches, which we do in the context of genetic association studies of longitudinal cognitive phenotypes.
The study worked as proof of concept for the two PCH methods in that the APOE signal was readily detectable, and we were able to see some benefits to these approaches over more standard regression models, particularly for longitudinal change. PCHAT gave strong signals, particularly when both baseline values and slopes were included. PCHAT also offers the advantage of freely available software and more precise estimation of P-values. C2BAT is limited by the requirement of time-consuming permutation tests with processing time too long for use in real world association studies at present. In principle, however, it offers extensive flexibility in the number of phenotypes that can be accommodated and how they are modeled (although we elected to estimate the slopes separately in the interest of limiting computation time for the present project), which may have advantages in certain settings. For PCHAT, we show that limited, thoughtful dimension reduction approaches can be implemented when the number of phenotypes is large without losing the advantages of the PCH approach.
Optimal Models for Longitudinal Cognitive Phenotypes
Comparing the phenotype definitions, as expected, the APOE-ϵ4 allele was more significantly associated with the a priori models, which contributes to the proof of concept for the methods. Episodic memory and executive function are affected early in the course of AD, including prior to the clinical diagnosis of dementia, and thus are most likely to give a signal for an AD-related gene among non-demented individuals in a population based sample.
Comparing the temporal models, these findings are also consistent with the current view of the development of AD. Across both standard and PCH methods, APOE-ϵ4 was more strongly associated with baseline cognitive impairment than with cognitive decline over time. This may be partially related to the statistical properties of cross-sectional compared to slope measures, but is also consistent with the current recognition of a long prodrome for AD in which substantial numbers of older individuals selected to be non-demented at baseline will nonetheless have significant cognitive impairment driven by AD-related pathology [Galvin et al., 2005; Petersen et al., 2006; Sonnen et al., 2007; Bennett et al., 2009]. However, it is noteworthy that P-values from the slope–baseline models that could be implemented in the PCH methods were generally somewhat more significant than their counterparts for the baseline models, again consistent with the current view that cross-sectional cognitive status in late life (even among the non-demented) and rate of change over time are both related to underlying AD pathology.
The study has a number of limitations that lead to more cautious conclusions. Most critically, a full-scale simulation analysis would be required to more thoroughly understand the relative performance of these measures. Nonetheless, a comparison of P-values for a well-documented association in real data is a first step in establishing the potential utility of these methods for genetic association studies.
Beyond this, our comparisons across methods are limited by the imprecision in P-value estimates for C2BAT, and the number of phenotypes that could be accommodated in PCHAT. In addition, interpretation of comparisons of the very small P-values typical of APOE and cognition is fraught, as P-values tend to be less precise in this range.
Moreover, although we tried to make fair comparisons across the various methods, it was impossible to achieve exact parity. For instance, for the standard linear approaches we used mean Z scores to limit multiple comparisons (and to be consistent with many published papers using these methods). Because the scores could be entered directly into PCHAT and C2BAT, this transformation was not needed. Also, although the slope–baseline model was intended as an analog to the APOE-ϵ4 × follow-up time interaction term of the linear mixed effects model, they are not equivalent. However, some of the difference is a real advantage of the slope–baseline model, in that it incorporates baseline differences and change over time into a comparison.
In addition, both the PCHAT and C2BAT approaches rely on finding optimal linear combinations of phenotypes with maximal heritability attributable to a genetic marker. This may not adequately capture nonlinear changes in cognitive functions over time. This limitation, of course, is shared with linear mixed models, but alternative approaches, such as change point and spline models [Kerner et al., 2009; Yu et al., 2012] might capture genetically relevant change more meaningfully. These issues become more important in studies with very long follow-up.
Also, the results may depend on the specific tests used. In particular, the executive function tests available in ROS and MAP are limited. The executive function and episodic memory phenotype definition might have given a stronger signal compared to episodic memory alone if we had had access to a larger selection of executive function measures. Further, it is hard to characterize and describe the exact phenotype resulting from the two PCH approaches. We only know that they represent the construct of interest based on the tests selected for inclusion in PCHAT or C2BAT. Because C2BAT is programmed in R, it would be possible to output the mean loadings across the iterations in order to better understand which tests are contributing the most to a given genetic signal, although this feature is not currently available.
Last, the study subjects are not representative of the population at large. In particular, we have restricted the population to Caucasian subjects, and the sample, particularly ROS, is very highly educated. However, because the focus of the current study is in investigating the utility of the PCHAT and C2BAT methods, generalizability of the findings to the population at large is less of a concern here.
In summary, this is the first study demonstrating the utility of the two PCH methods, PCHAT and C2BAT, in genetic association studies of longitudinal, quantitative cognitive phenotypes. Further investigations of their performance in genetic association studies of real and simulated data are warranted.
The authors would like to thank Dr. Matthew McQueen for assistance with the initial data analyses, and Drs. Rebecca Betensky, Olivia Okereke, and Peter Kraft for their invaluable comments on the study design and analyses, and feedback on the manuscript. This work was supported by an Alzheimer's Association grant (D.B.), a Harvard University scholarship (W.L.A.F.), National Institute of Mental Health Training Grant T32MH017119 (M.G.N.), and National Institute on Aging Grants P30AG10161, R01AG15819, and R01AG17917 (D.A.B.).