An Alternative Model for Quantitative Trait Loci (QTL) Analysis in General Pedigrees

Authors


Corresponding author: Saonli Basu, Division of Biostatistics, School of Public Health, University of Minnesota, MN 55455, USA. Tel: 612-624-2135; Fax: 612-626-0660; E-mail: saonli@umn.edu

Summary

Linkage analysis of a trait involves detecting regions of the genome that influence the trait. A wide variety of statistical models are currently employed for linkage analysis of quantitative traits. Many of these models are developed under assumptions about the trait distribution, and violation of those assumptions generally affects the type I error and power for linkage detection. In this paper, we have proposed a trait-model-free approach for linkage analysis of a quantitative trait in general pedigrees. The conditional segregation of marker alleles given the trait is modeled using a latent-variable logistic model, and a likelihood-ratio test is used to test for linkage. The main appeal of this approach is that it always provides the correct type I error no matter what the trait distribution is, and thus it can be used for nonnormal traits or for selected samples. By means of simulation studies, we have compared the power of our proposed model with existing approaches for nonnormal traits. The performance of these methods was also studied on a real dataset. We have demonstrated the usefulness of our approach in terms of power and robustness for linkage detection of quantitative traits in general pedigrees.

Introduction

Linkage analysis of a quantitative trait involves detecting regions of the genome that influence the trait. These regions of the genome are referred to as quantitative trait loci (QTL). The QTL mapping approach is a statistical technique for finding a region x (QTL) by studying the trait-dependent segregation of alleles at x from founders to nonfounders in pedigrees. Under no linkage between the trait and x, we observe Mendelian segregation at x; the amount of genetic sharing at x between family members is then determined purely as a function of family relationship and has nothing to do with the trait values. But if the location x is linked to the trait, we expect excess genetic sharing among people with concordant phenotypes and less sharing among people with discordant phenotypes in a pedigree. The genetic sharing among family members referred to above is called identity by descent (IBD), defined as the sharing of the same copy of a founder allele. People who have similar phenotypes tend to have a higher than expected level of IBD sharing near the genes that influence the trait.

A number of different statistical models (Pratt et al., 2000; Feingold, 2001, 2002) have been developed to assess linkage between specific regions of the genome and a quantitative phenotype. The power and robustness of these QTL mapping methods vary based on their assumptions about the underlying trait model (Feingold, 2001). A particularly popular approach that has been used extensively for many QTL studies is the variance component (VC) approach (Goldgar, 1990; Schork, 1993; Amos, 1994; Fulker et al., 1995; Almasy & Blangero, 1998; Pratt et al., 2000). The approach (Almasy & Blangero, 1998) assumes that the trait values for the family members follow a multivariate normal distribution in which the covariance matrix between family members depends on their IBD sharing at the location being tested for linkage, and it can be applied to any type of pedigree structure. The major issue with this approach is that it can produce elevated type I error and biased parameter estimates when the trait is nonnormally distributed or when phenotypes are sampled selectively; both situations are often encountered in real studies. The power of the VC method for linkage detection under these situations can be drastically reduced (Amos et al., 1996; Allison et al., 1999; Feingold, 2001, 2002; Tang & Siegmund, 2001). When there is nonnormality, one can perform a parametric transformation on the trait values to approximate normality in order to correct for the inflated type I error (Allison et al., 2000; Etzel et al., 2003), but it is often difficult to find a transformation that ensures correct type I error for the VC approach. In this paper, we have studied the performance of two such transformations [the Box–Cox transformation and the rank-based inverse normal transformation (INT)] on the VC approach, but the main focus of the paper is an alternative powerful approach that always produces the correct type I error irrespective of the trait distribution, thus avoiding the search for an appropriate transformation of the trait.

This issue with the VC approach has led to a number of attempts to propose different methods for situations where normality does not hold. Most of these are based on score statistics or regression-based statistics and attempt to achieve the power of the VC likelihood-based methods while retaining the robustness and computational simplicity of the original Haseman–Elston regression (Haseman & Elston, 1972). A number of such approaches were discussed and studied extensively by Bhattacharjee et al. (2008). Among them, the set of approaches that condition on the trait maintains correct type I error and has comparable or better power than the VC approach and other existing approaches (Bhattacharjee et al., 2008). This is generally accomplished by fitting a model (usually multivariate normal) for the trait values, but conditional on the trait values in the sample. A statistic of this type has the correct type I error (at least asymptotically) no matter what the underlying trait distribution or the sampling scheme, although the power still depends on the correctness of the assumed model (Feingold, 2001). Many of these QTL mapping approaches (Haseman & Elston, 1972; Alcais & Abel, 1999; Dudoit & Speed, 2000) are restricted to nuclear families. A very promising approach in this category, which is applicable to general pedigrees, is the regression approach proposed by Sham et al. (2002). An important assumption of this approach is that most of the linkage information in a pedigree can be summarized by pairwise relationships: the pairwise IBD sharing at a location x is regressed on the pairwise squared differences and squared sums of the trait. This approach produces statistics whose type I error is robust to selected sampling or nonnormality of the trait, while having power similar to the VC approach if the multivariate normal model is correct.

In this paper, we have proposed a likelihood-based approach that belongs to the above category of approaches that condition on the trait data. The model is an extension of the approach proposed by Basu et al. (2010) for linkage analysis of a discrete trait and has similarities with the approach proposed by Sinsheimer et al. (2000), which was primarily developed for testing association. Our approach specifies an explicit alternative model for the transmission of founder alleles among individuals in the pedigree conditional on the trait data, and performs a likelihood-ratio test of the null hypothesis of no linkage, under which the transmission of founder alleles is independent of the trait. Because the model conditions on the trait data and makes no assumption about the trait distribution, it produces the correct type I error irrespective of the underlying trait distribution or the sampling scheme. Moreover, a key advantage of this approach is that it models the segregation of founder alleles directly, as opposed to modeling pairwise IBD sharing as in Sham et al. (2002), which could make it more informative for extended pedigrees. The approach also deals rigorously with the fact that IBD sharing is not observed directly but must be inferred (with uncertainty) from the observed marker data. Our model is aimed primarily at extracting this multigenerational information, which current models struggle to capture, but it is applicable to pedigrees of any size.

Hence, the potential advantage of the proposed approach is that it can be used for linkage analysis of any trait without worrying about the type I error. The approach provides a computationally feasible solution even for extended pedigrees and explicitly models the transmission of alleles from founders to nonfounders in a pedigree. We compared the performance of our model with the VC approach (Almasy & Blangero, 1998) on transformed and untransformed data and with the regression-based approach (Sham et al., 2002) through simulation studies for nonnormally distributed traits. We also studied the performance of our method on the National Heart Lung and Blood Institute (NHLBI) Family Heart Study (FHS) dataset (Higgins et al., 1996). The simulation study demonstrated that our method has better power than the majority of these approaches for the nonnormal traits studied in this paper. Moreover, unlike the VC approach, our proposed approach always produced the correct type I error. The simulation study and the real data analysis demonstrated that our proposed approach is an attractive alternative for QTL analysis in general pedigrees, especially in the case of nonnormally distributed traits or selected sampling of phenotypes.

Model

Variance Component Approach

The VC method, as described by Almasy & Blangero (1998), is an extension of the model proposed by Amos (1994). The method assumes that the vector of phenotypic trait values Y in a pedigree follows a multivariate normal distribution. The likelihood for a pedigree of size k, L, can be written as:

$$ L = (2\pi)^{-k/2}\,|\Sigma|^{-1/2}\exp\left\{-\tfrac{1}{2}(\mathbf{Y}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{Y}-\boldsymbol{\mu})\right\}, \tag{1} $$

where μ is the vector of theoretical trait means (which may include effects due to covariates) and Σ is the expected variance–covariance matrix. Covariate effects are introduced by modeling the trait mean vector as μ = α + Aβ, where α is the overall trait mean, A is a k × m design matrix whose k rows contain the m covariates (assumed uncorrelated with the additive genetic effects) for each pedigree, and β is an m-vector of fixed covariate effects.

This method can be used to estimate the genetic variance attributable to a region around a genetic marker of interest. Because the VC approach allows for single-locus, polygenic, and residual components of variance, the corresponding variance components (σ²_a, σ²_g, σ²_e) are estimated. A model estimating the variance attributable to a marker locus is compared to a model in which that variance is fixed at zero; in both models, the residual genetic variance is parameterized as an additive polygenic component. The −2 log-likelihood ratio is compared to a χ² distribution with the number of degrees of freedom determined by the number of independent parameters. Generally, the only parameter of interest is the variance attributable to the marker locus. The value of this parameter under the null hypothesis (H0: σ²_a = 0) lies on the boundary of the values allowed under the alternative, so the likelihood-ratio test (LRT) statistic asymptotically follows a 50:50 mixture of χ²_0 and χ²_1 under the null hypothesis. Although the VC method has greater power than the H–E method (Haseman & Elston, 1972), it is computationally intensive and can be difficult to implement on more complex pedigrees. Additionally, the method is not robust to violations of the assumed normality of the quantitative trait.
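As a concrete illustration of how this boundary null distribution is used, a minimal R sketch for converting an observed LRT statistic into a p-value under the 50:50 mixture is given below (the statistic `lrt` is assumed to have been computed elsewhere):

```r
# p-value under a 50:50 mixture of chi-square(0) and chi-square(1):
# Pr(mixture >= t) = 0.5 * Pr(chi-square(1) >= t) for t > 0, and 1 at t = 0.
mixture_pvalue <- function(lrt) {
  if (lrt <= 0) return(1)
  0.5 * pchisq(lrt, df = 1, lower.tail = FALSE)
}
mixture_pvalue(3.84)  # about 0.025, half the usual chi-square(1) p-value
```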

The VC approach can produce elevated type I error and biased parameter estimates in the case of a nonnormal trait distribution or for selected samples. The biased parameter estimation can also cause a great reduction in power for linkage detection (Amos et al., 1996; Allison et al., 1999; Feingold, 2001, 2002; Tang & Siegmund, 2001). Several alternatives were suggested by Allison et al. (1999) to deal with nonnormality in the VC approach. One option is to resimulate data under the null hypothesis for the same trait model and estimate the correct quantile, as opposed to using the asymptotic distribution of the test statistic; we implemented this strategy in this paper and report the empirical power to compare the different approaches. Another strategy is to transform the data to approximate normality (Allison et al., 1999). There are many ways of transforming a dataset to approximate normality (Wang & Huang, 2002; Etzel et al., 2003; Beasley et al., 2009). We used a Box–Cox transformation (Etzel et al., 2003) and a rank-based INT (Beasley et al., 2009) to approximate normality and then applied the VC approach to the transformed dataset. In this paper, we have studied the performance of the VC approach on both the untransformed and the transformed data under the null hypothesis of no linkage and under the alternative hypothesis of complete linkage between a marker and the trait.

Regression-based Approach

Sham et al. (2002) proposed a regression-based approach (REGRESS) that produces correct type I error under nonnormality of the trait distribution or selected sampling of phenotypes and retains the power of the VC approach when the trait distribution is normal. The method regresses multipoint IBD sharing on the trait squared sums and squared differences among all pairs of relatives. Within a single pedigree, the estimated IBD sharing proportions are regressed on an equal number of squared sums and an equal number of squared differences. Because the full distribution of IBD sharing is uncertain under imperfect marker information, a weighted-least-squares estimation procedure is used that requires only the covariance matrix of the IBD sharing. The weighted-least-squares estimators of the regression coefficients can be written as a function of the covariance matrix of the IBD sharing proportions, the covariance matrix of the squared sums and squared differences, and the covariance matrix between the estimated IBD proportions and the squared sums and squared differences (Sham et al., 2002). The elements of the last of these matrices are proportional to the additive variance explained by a linked QTL. The solution of this multivariate regression in a single pedigree provides an estimate of the additive QTL variance, together with its sampling variance. These estimates can then be combined across all the pedigrees in a sample, weighting them by the inverses of their variances; this yields the sampling variance of the combined estimate and a χ² test for linkage.
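To make the flavor of the method concrete, the sketch below is a drastically simplified sibpair illustration in R: it regresses marker IBD sharing on the trait squared sums and squared differences by ordinary least squares, ignoring the weighted-least-squares machinery and the combination across general pedigrees that the actual method uses. All data here are simulated placeholders under the null.

```r
set.seed(1)
n     <- 200
pihat <- sample(c(0, 0.5, 1), n, replace = TRUE,
                prob = c(0.25, 0.5, 0.25))   # sibpair IBD proportions at the marker
y1 <- rnorm(n); y2 <- rnorm(n)               # standardized sib trait values (no QTL)
# Regress IBD sharing on the squared sums and squared differences of the trait;
# under linkage, these coefficients reflect the additive QTL variance.
fit <- lm(pihat ~ I((y1 + y2)^2) + I((y1 - y2)^2))
summary(fit)
```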

Yu et al. (2004) performed extensive simulations comparing the REGRESS and VC approaches and concluded that the REGRESS method is an approximation of the VC method. They also provided a theoretical justification for why the two approaches are essentially equivalent, showing that the REGRESS statistic is essentially the score statistic corresponding to the VC likelihood. The practical difference is that VC produces inflated type I error when the trait distribution deviates from normality or the phenotypes are sampled selectively, whereas REGRESS produces correct type I error under these circumstances. Sham et al. (2002) noted that the REGRESS method can nevertheless be liberal in some circumstances, particularly if a small number of families contribute to the test statistic or if some families make highly skewed contributions.

There are numerous other statistical methods for quantitative trait linkage analysis in human studies. A number of score tests have been proposed to maintain correct type I error under nonnormality of the trait (Tang & Siegmund, 2001; Putter et al., 2002; Wang & Huang, 2002). Like the REGRESS approach, such score tests have the advantage that, although they are based on a normal model, they can be made robust to departures from normality. Chen et al. (2004) described a general framework for quantitative trait linkage analysis in general pedigrees that makes use of generalized estimating equations (GEE) and of which many of the current quantitative trait linkage methods are special cases. Chen et al. (2005) proposed two novel test statistics that take account of higher moments of the phenotype distribution and investigated the performance of several such score statistics. More recently, Bhattacharjee et al. (2008) studied the performance of different score statistics in terms of power and robustness and provided general guidelines regarding the choice of score statistic in practical situations. For the comparisons in this paper, we restricted our attention to the REGRESS approach.

A Latent Variable Logistic Regression Model

Here we propose a latent variable logistic regression model for linkage detection of quantitative traits. The approach is motivated by the model proposed by Basu et al. (2010) for a binary trait and provides an alternative way of QTL mapping that does not require any distributional assumption about the trait. The proposed approach conditions on the trait data and thus provides the correct type I error irrespective of the distribution of the trait or the sampling scheme. The approach can be applied to general pedigrees, though it becomes computationally intensive when there is a large number of founders. Its particular advantage over the method proposed by Sham et al. (2002) is that it models the transmission of alleles from founders to nonfounders instead of modeling pairwise IBD sharing, and thus could have more power for linkage detection in large, complex pedigrees. Moreover, the proposed approach deals more rigorously with the uncertainty in the transmission of founder alleles through the pedigree when marker information is imperfect.

An inheritance vector ν (Lander & Green, 1987) at a location x denotes the segregation of founder alleles at location x from founders to the nonfounders. All founders are assumed to be unrelated and to have distinct founder alleles at x. Nonfounders inherit these founder alleles from founders. For each nonfounder, the inheritance vector ν at x represents the pair of founder alleles transmitted from the parents. If Y is the quantitative trait data on the individuals in n pedigrees, then under the null hypothesis (H0) of no linkage between the trait and the location x,

$$ \Pr\nolimits_0[\nu \mid \mathbf{Y}] = \left(\tfrac{1}{2}\right)^{m}, \tag{2} $$

where m is the total number of meioses in the n pedigrees. Hence, under H0, every inheritance vector is equally likely given the trait data on the individuals. In the presence of linkage (H1) between x and the trait, the conditional distribution Pr_1[ν | Y] of the inheritance vector deviates from the null distribution specified in equation (2).

We propose here a quantitative preferential transmission model (QPTM) for the dependent segregation of ν conditional on the trait in the presence of linkage. The proposed model employs a logistic regression framework for the trait-dependent segregation of a founder allele from a parent to an offspring. Our model assumes that each founder allele can be labeled as one of two types, type 1 or type 0, and that the label of each founder allele is independent of the labels of the other founder alleles. Founder alleles of type 1 are those associated with trait values above the population mean, whereas founder alleles of type 0 are associated with trait values below the population mean. We do not know a priori which founder alleles are associated with high or low values of the trait in a pedigree, so we assume that each founder allele has a 50:50 chance of being type 1 or type 0. A label allocation A represents one possible assignment of 0-1 types to all the founder alleles in a pedigree. In a pedigree with f founders, there are 2^{2f} possible allocations of labels to the founder alleles.

We then model the trait-dependent segregation of the founder alleles to the nonfounders of a pedigree for a specific allocation A. Note that as a founder allele is transmitted down through the pedigree, it takes its label with it: if a father passes a founder allele of type 1 to his offspring, then the paternal allele of the offspring is of type 1. Under our model, if the founder alleles of a parent have the same label (i.e., both 0 or both 1), or if the trait value of the offspring is unknown, each of the parent's founder alleles is transmitted with equal probability (0.5). That is, the model assumes Mendelian segregation of founder alleles for offspring with unknown phenotype; this assumption is required to implement the model when there are missing phenotype data. If the founder alleles of a parent have different labels (i.e., one labeled 0, the other labeled 1) and y is the trait value of the offspring, then according to our model the probability of transmitting the allele labeled 1 is

$$ \frac{\exp\{\beta\,(y-\mu)/\sigma\}}{1+\exp\{\beta\,(y-\mu)/\sigma\}}, \tag{3} $$

where μ is the population mean and σ is the population standard deviation of the trait. Under our model, if the observed trait value of an individual equals the estimated mean, the individual is equally likely to receive either of the two founder alleles from a parent. If the trait value is much higher or much lower than the mean, then one of the two types of founder alleles is preferentially transmitted from parent to offspring.
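In code, the transmission probability of equation (3) is a one-line logistic function. A minimal R sketch follows, with μ and σ assumed known; the function name is ours, not from any released software:

```r
# Probability that a heterotypic parent (one type-0 and one type-1 founder
# allele) transmits its type-1 allele to an offspring with trait value y.
trans_prob <- function(y, mu, sigma, beta) {
  z <- exp(beta * (y - mu) / sigma)
  z / (1 + z)
}
trans_prob(y = 0, mu = 0, sigma = 1, beta = 1)  # 0.5 at the mean, as required
trans_prob(y = 2, mu = 0, sigma = 1, beta = 1)  # ~0.88: preferential transmission
```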

Using the above transmission model, we compute the probability distribution Pr_1[ν | Y]. This requires a sum over all possible label allocations A. In other words,

$$ \Pr\nolimits_1[\nu \mid \mathbf{Y}] \;=\; \sum_{A} \Pr[A] \prod_{i=1}^{m} \Pr[a_i \mid a_{i1}, a_{i2}, Y_i], \tag{4} $$

where a_i is the label of the founder allele received by the offspring in meiosis i, (a_{i1}, a_{i2}) are the labels of the two founder alleles in the transmitting parent, and Y_i is the trait value of the offspring involved in meiosis i. The sum in equation (4) for a pedigree contains 2^{2f} terms, where f is the total number of founders in the pedigree. It can be computed or estimated under the null hypothesis (Lander & Green, 1987; Sobel & Lange, 1996; Thompson & Heath, 1999). The parameter β is chosen so that β = 0 corresponds to the null hypothesis of no linkage and β > 0 corresponds to the alternative hypothesis of excess sharing. If ν could be observed directly, the log-likelihood for n pedigrees would simply be Σ_j log Pr_1[ν_j | Y_j]. We could then find the maximum-likelihood estimate of β and use the likelihood-ratio statistic to test the null hypothesis H0: β = 0 against the alternative H1: β > 0.
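To illustrate equation (4) in the smallest nontrivial case, the sketch below enumerates all 2^{2f} = 16 label allocations for a nuclear family (f = 2 founders; alleles 1-2 belong to the father, 3-4 to the mother) and accumulates the product over meioses. It reuses trans_prob() from the previous sketch; all names and the encoding of ν are illustrative assumptions, not the authors' implementation.

```r
pr1_nu_given_Y <- function(nu, Y, mu, sigma, beta) {
  # nu: one row per child; column 1 = paternal allele received (1 or 2),
  #     column 2 = maternal allele received (3 or 4). Y: children's trait values.
  total <- 0
  for (code in 0:15) {                       # all 2^4 founder-allele labelings
    lab <- as.integer(intToBits(code))[1:4]  # 0/1 labels of founder alleles 1..4
    p <- 1
    for (k in seq_along(Y)) {
      for (par in 1:2) {                     # father, then mother
        alleles <- if (par == 1) 1:2 else 3:4
        a <- nu[k, par]                      # founder allele actually transmitted
        if (lab[alleles[1]] == lab[alleles[2]] || is.na(Y[k])) {
          p <- p * 0.5                       # homotypic parent or missing phenotype
        } else {
          p1 <- trans_prob(Y[k], mu, sigma, beta)
          p <- p * (if (lab[a] == 1) p1 else 1 - p1)
        }
      }
    }
    total <- total + p / 16                  # each allocation has prior 1/2^(2f)
  }
  total
}
nu <- rbind(c(1, 3), c(2, 3))  # two children; child 1 received alleles 1 and 3
pr1_nu_given_Y(nu, Y = c(1.2, -0.8), mu = 0, sigma = 1, beta = 0)  # (1/2)^4 at beta = 0
```

At β = 0 the function returns (1/2)^m, recovering the null distribution in equation (2).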

In linkage analysis, trait data (Y) and marker data (G) are collected on families. Consider any location x that is not necessarily a marker locus but that has one or more markers in its neighborhood. Our model focuses on the probability of the transmission of founder alleles at x, given the trait values of the individuals in a pedigree. Because of missing marker data and the lack of ordered (phased) marker genotype information, the exact segregation path of founder alleles at x (that is, ν) cannot be determined, and inference about linkage between the set of markers and the trait depends on the imputed patterns of segregation. The likelihood ratio in favor of H1 for a single pedigree can then be written as

$$ \mathrm{LR} \;=\; E\!\left[\left.\frac{\Pr_1(\nu \mid \mathbf{Y})}{\Pr_0(\nu \mid \mathbf{Y})}\,\right|\,\mathbf{G}\right], \tag{5} $$

where E(· | G) denotes expectation over the conditional distribution of ν given G under H0. The quantity Pr_1(ν | Y) in equation (5) is computed using the model specified in equation (4). The maximum-likelihood estimator of β is constrained to be nonnegative, so the asymptotic distribution of the likelihood-ratio test statistic is a 50:50 mixture of χ²_0 and χ²_1 under the null hypothesis (Self & Liang, 1987).

A test of H0 can be performed by first computing equation (5) and then computing a p-value by comparing the observed value with its (asymptotic) distribution under H0. The computation in equation (5) can be performed exactly for pedigrees of moderate size (Kruglyak & Lander, 1998). For larger pedigrees it can be approximated, for example, by using Markov chain Monte Carlo to sample from Pr_0[ν | G], as in Heath (1997) or Thompson & Heath (1999). More details about the implementation of the model in larger pedigrees are given in Basu et al. (2010). Note that whether one performs the computation exactly or by Markov chain Monte Carlo, it is necessary to assume a model for the founder alleles. The standard assumptions in this setting are Hardy–Weinberg equilibrium and linkage equilibrium in the founders, with known marker allele frequencies (Kruglyak et al., 1996; Thompson & Heath, 1999). Although these assumptions may not hold in practice, they are almost universally made in this setting, and we make them here. We have incorporated our QPTM approach into GENEHUNTER (Kruglyak et al., 1996). The maximum-likelihood estimator of β is currently computed using a grid search.
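A minimal sketch of how the grid search might be organized is given below, assuming that for each pedigree the plausible inheritance vectors and their null probabilities Pr_0[ν | G] have already been obtained (e.g., by a Lander–Green type computation or MCMC), and reusing the hypothetical pr1_nu_given_Y() and mixture_pvalue() sketches from above; this is our schematic, not the GENEHUNTER implementation.

```r
qptm_lrt <- function(nu_list, w_list, Y_list, m_vec, mu, sigma,
                     grid = seq(0, 5, by = 0.05)) {
  # Profile the log likelihood ratio of eq. (5) over a grid of beta >= 0.
  loglr <- sapply(grid, function(beta) {
    sum(mapply(function(nus, w, Y, m) {
      # E[ Pr1(nu|Y) / Pr0(nu|Y) | G ], with Pr0(nu|Y) = (1/2)^m
      ratios <- sapply(nus, pr1_nu_given_Y, Y = Y, mu = mu,
                       sigma = sigma, beta = beta) * 2^m
      log(sum(w * ratios))
    }, nu_list, w_list, Y_list, m_vec))
  })
  lrt <- 2 * max(loglr)              # loglr equals 0 at beta = 0 (the null)
  list(beta_hat = grid[which.max(loglr)], lrt = lrt, p = mixture_pvalue(lrt))
}
```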

Application: Simulation Study

We conducted a number of simulations to compare the performance of our QPTM approach with several other quantitative-trait linkage analysis approaches. We focused mainly on nonnormally distributed traits, because the main usefulness of the QPTM approach is its robustness against deviations from normality. Unlike most model-based approaches, this trait-model-free approach does not explicitly model the residual polygenic component of the trait; hence, the QPTM approach can lose power relative to trait-model-based approaches when there is high residual genetic sharing among pedigree members. We investigated the effect of an increased residual genetic component on all these approaches through simulation studies.

We considered two transformations of the phenotypic data to approximate normality: a Box–Cox transformation and a rank-based INT. For each simulated dataset, the individual phenotypes were transformed by the Box–Cox transformation. The Box–Cox transformation requires that the phenotype of the jth member of the ith family, Y_ij, be positive for all i, j, so we shifted the standardized distribution of Y by adding the absolute value of min(Y) plus a small positive constant to each value. The Box–Cox transformed values Y*_ij were computed as:

$$ Y^{*}_{ij} = \begin{cases} \dfrac{Y_{ij}^{\lambda}-1}{\lambda}, & \lambda \neq 0,\\[6pt] \log Y_{ij}, & \lambda = 0. \end{cases} \tag{6} $$

For each simulation replicate, λ was estimated by maximum likelihood as the unconditional power transformation that best approximates normality of the data. We used the 'car' package in R (Hornik, 2010) to perform the Box–Cox transformation.
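For reference, the estimation and transformation steps amount to a few lines with the 'car' package. A sketch, assuming `y` is the phenotype vector and taking 0.01 as the arbitrary small shift constant:

```r
library(car)                             # provides powerTransform() and bcPower()
y_pos  <- y - min(y) + 0.01              # shift so all values are positive
lambda <- coef(powerTransform(y_pos))    # ML estimate of the Box-Cox lambda
y_bc   <- bcPower(y_pos, lambda)         # transformed phenotype, as in eq. (6)
```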

For the rank-based INT, we ranked the trait data and applied INT on the ranks to transform the data to normal. We computed a new transformed value of the phenotype for the ith subject:

$$ Y^{*}_{i} = \Phi^{-1}\!\left(\frac{r_i - 3/8}{N + 1/4}\right), \tag{7} $$

where r_i is the ordinary rank of the ith subject among the N observations and Φ^{-1} denotes the standard normal quantile (probit) function (Beasley et al., 2009). We then implemented the VC approach on the transformed dataset. We used MERLIN (Abecasis et al., 2002) to implement the VC approach and the REGRESS approach. In all, we compared the performance of five approaches in this simulation study of nonnormal traits: the VC approach on the untransformed simulated data, the VC approach on the Box–Cox transformed data (VC–Boxcox), the VC approach on the inverse-normal rank-transformed data (VC–INT), the REGRESS approach, and the QPTM approach. We studied their type I error and power for linkage detection under a variety of trait models.
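The rank-based INT itself is a one-liner in R; a sketch with the Blom offset c = 3/8 (the specific offset is our assumption; Beasley et al. (2009) discuss the general family of offsets):

```r
# Rank-based inverse normal transformation with offset c (Blom: c = 3/8):
int_transform <- function(y, c = 3/8) {
  r <- rank(y, ties.method = "average")      # ordinary ranks r_i
  qnorm((r - c) / (length(y) - 2 * c + 1))   # eq. (7): standard normal quantiles
}
```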

Simulation 1

We simulated phenotypes for 250 sets of four siblings (sibquads). The trait data were simulated assuming a diallelic QTL accounting for 15% of the trait variance, with the remaining variance due to additive polygenic effects and random error. In each case, we set the sum of the QTL variance (σ²_a) and the polygenic variance (σ²_g) to 15%, 40%, or 70% of the trait variance. We used QTL allele frequencies of (0.2, 0.8) for alleles (A, a) and simulated data under dominant, additive, and recessive genetic models. The trait models considered had residual genetic components of 0% (Model 1), 25% (Models 2, 4, and 5), and 55% (Model 3), to demonstrate the impact of an increasing residual genetic component on these approaches. For Models 1, 2, and 3, we considered an additive genetic model with means (−1, 0, 1) for genotypes aa, Aa, and AA, respectively. Model 4 was a dominant trait model with means (−1, 1, 1) and Model 5 a recessive model with means (−1, −1, 1) for genotypes (aa, Aa, AA). To simulate a nonnormal trait we employed the method of Sham et al. (2002), dividing the normally simulated trait values for each sibship by a χ² random variable with 10 degrees of freedom; the phenotype values were then scaled to zero mean and unit variance. We simulated 1000 replicate datasets under each trait model. The analyses used a marker with five alleles, each with an allele frequency of 0.2. The estimated average skewness and kurtosis of the trait over the 1000 replicates and all trait models were 0.102 and 9.53, respectively. The program 'genedrop' in MORGAN (http://www.stat.washington.edu/thompson/Genepi/pangaea.shtml) was used to simulate the marker data.
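A rough sketch of this trait-generating scheme for a single sibquad is given below. The genotype means follow the additive model, the polygenic term is induced by a shared-plus-individual normal decomposition (giving sib correlation 0.5 in that component), and the standard deviations sd_poly and sd_err are schematic placeholders rather than the exact values used in the study.

```r
sim_sibquad <- function(p_A = 0.2, sd_poly = 0.5, sd_err = 0.9) {
  pat <- rbinom(2, 1, p_A); mat <- rbinom(2, 1, p_A)          # parental QTL alleles (1 = A)
  nA  <- replicate(4, pat[sample(2, 1)] + mat[sample(2, 1)])  # sibs' A-allele counts
  qtl <- c(-1, 0, 1)[nA + 1]                                  # additive genotype means
  poly <- rnorm(1, 0, sd_poly) + rnorm(4, 0, sd_poly)         # crude polygenic term
  y <- qtl + poly + rnorm(4, 0, sd_err)
  y / rchisq(1, df = 10)    # one chi-square(10) divisor per sibship (Sham et al., 2002)
}
```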

We first examined how the methods perform under the null hypothesis of no linkage for the different trait models specified above. Under the null hypothesis, the test statistic for each method follows a mixture of χ² distributions (Z, a 50:50 mixture of χ²_0 and χ²_1). For each method, we estimated the proportion of values out of 1000 replications that were ≥ Z_.95 (the 95th quantile of the mixture distribution) and ≥ Z_.99 (the 99th quantile). Table 1 shows that the VC method on untransformed data consistently produced inflated type I error, and the type I error rate increased with the percentage of the total phenotypic variance due to the residual genetic component; this was reflected in the estimated proportions of replicates exceeding the 0.05 and 0.01 critical values. Because the trait distribution was still leptokurtic after transformation (average kurtosis coefficient 9.53 before transformation and 3.15 after), the VC–Boxcox approach also produced inflated type I error; a similar phenomenon was reported by Etzel et al. (2003). The rank-based INT reduced the kurtosis of the observed data substantially (average kurtosis coefficient 9.53 before transformation and −0.156 after), and the VC–INT approach accordingly performed better than VC–Boxcox, with less inflated type I error. As with the VC approach on untransformed data, the type I error rates of the VC–Boxcox and VC–INT approaches rose with the residual genetic component. Hence, to interpret findings from the VC approach correctly on either untransformed or transformed data, one needs to simulate the null distribution of the test statistic to adjust for the inflated type I error; the transformations could not completely fix the inflation. Similar findings were reported by Beasley et al. (2009). In general, the type I error rates of the VC, VC–Boxcox, and VC–INT approaches were marginally lower under the dominant and recessive models than under the additive model. The REGRESS approach had slightly elevated type I error for all trait models. The QPTM approach was unaffected by the nonnormality of the trait and produced the correct type I error rate.

Table 1. Type I error of different approaches under a variety of trait models when the trait has a multivariate t distribution for a sibquad (Simulation 1). Data were simulated at a marker with five equifrequent alleles unlinked to the trait. The proportions of observations ≥ Z_.95 (empirical 95th quantile of the distribution of each test statistic) and ≥ Z_.99 (empirical 99th quantile) are listed.

Trait model   VC              VC–Boxcox       VC–INT          REGRESS         QPTM
α             0.05    0.01    0.05    0.01    0.05    0.01    0.05    0.01    0.05    0.01
1             0.158   0.085   0.141   0.065   0.087   0.025   0.055   0.012   0.045   0.009
2             0.168   0.103   0.160   0.092   0.101   0.030   0.069   0.020   0.049   0.012
3             0.174   0.088   0.165   0.110   0.105   0.035   0.060   0.017   0.045   0.010
4             0.171   0.114   0.159   0.089   0.110   0.040   0.064   0.025   0.051   0.009
5             0.171   0.101   0.162   0.094   0.102   0.036   0.068   0.021   0.045   0.011

We next explored the power of these approaches by simulating data at a marker with five equifrequent alleles completely linked to the QTL. Table 2 lists the empirical power of all the approaches for the trait models specified above. For each approach, we first estimated the 95th and 99th quantiles of the null distribution of its test statistic, and then calculated the power under each trait model using these estimated cut-off points. As shown in Table 2, the QPTM approach outperformed the VC approach on untransformed data, the VC–Boxcox approach, and the REGRESS approach. The VC–INT approach had better empirical power than the QPTM approach, but it had inflated type I error under all these trait models. The QPTM approach maintained the correct type I error and outperformed most of the existing approaches for linkage detection. Because the QPTM approach does not model the polygenic component, its power did not increase with the residual genetic component as substantially as that of the VC–INT approach (Table 2). In general, all the approaches had better power to detect linkage when the data were simulated from the additive model than from the dominant or recessive models.
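The power calculation itself is straightforward once the null-replicate statistics are in hand; a sketch, assuming `stat_null` and `stat_alt` are vectors of test statistics from the unlinked and linked simulations, respectively:

```r
cut95 <- quantile(stat_null, 0.95)    # empirical null 95th-percentile cut-off
cut99 <- quantile(stat_null, 0.99)
power_05 <- mean(stat_alt >= cut95)   # empirical power at nominal alpha = 0.05
power_01 <- mean(stat_alt >= cut99)
```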

Table 2. Empirical power of different approaches under a variety of trait models when the trait has a multivariate t distribution for a sibquad (Simulation 1). Data were simulated at a marker with five equifrequent alleles completely linked to the trait. The proportions of observations ≥ Z_.95 (empirical 95th quantile of the distribution of each test statistic) and ≥ Z_.99 (empirical 99th quantile) are listed.

Trait model   VC              VC–Boxcox       VC–INT          REGRESS         QPTM
α             0.05    0.01    0.05    0.01    0.05    0.01    0.05    0.01    0.05    0.01
1             0.245   0.061   0.254   0.101   0.341   0.184   0.264   0.112   0.345   0.150
2             0.234   0.052   0.276   0.066   0.422   0.162   0.225   0.089   0.367   0.154
3             0.309   0.083   0.346   0.133   0.528   0.280   0.284   0.116   0.396   0.210
4             0.220   0.110   0.275   0.105   0.351   0.140   0.215   0.110   0.315   0.120
5             0.215   0.037   0.248   0.052   0.399   0.096   0.243   0.054   0.346   0.136

One additional point is that the trait was simulated from a multivariate t distribution with a fairly large number of degrees of freedom; hence these datasets probably did not portray the impact of severe nonnormality on the VC and REGRESS approaches. Nonetheless, the QPTM approach outperformed the VC, VC–Boxcox, and REGRESS approaches for these nonnormal traits. The VC–INT approach had better power than the QPTM approach, but it had inflated type I error for all the trait models. Here we estimated the 0.05 and 0.01 critical values from the null distribution of each test statistic in order to correct for the inflated type I error; in analyzing a real dataset, implementing the VC, VC–Boxcox, or VC–INT approach in this way is extremely computationally intensive.

Simulation 2

In this simulation study, we simulated data from a skewed and highly leptokurtic distribution to study the impact of departures from normality on all these approaches, this time using 200 sibpairs. The residual component of the siblings (i.e., the values before adding any mean displacement due to the QTL itself) was simulated from a bivariate skewed t distribution with 4 df, each margin with mean 0 and variance 1, and with skewness parameter 5 (Azzalini & Capitanio, 2003). We considered residual sibling correlations of 0.25, 0.50, and 0.75 when simulating from the skewed t distribution. Observations were sampled from this distribution using the 'fCopulae' package in R (Hornik, 2010). The latent variables representing QTL effects were simulated by first generating two independent parental diallelic genotypes, assuming an allele frequency of 0.5 for the trait-increasing allele A. The transmissions of these alleles to two siblings were generated keeping track of parental ancestry, enabling sibling pairs to be classified as sharing 0, 1, or 2 alleles IBD. We simulated data under an additive genetic model. Under the null hypothesis of no linkage, IBD status was generated independently of the simulated phenotypes and the QTL was unlinked to the marker locus. Under the alternative hypothesis of complete linkage between the marker and the trait, the phenotype depended on IBD, assuming no recombination between the QTL and the marker locus. The genetic model for complete linkage consisted of a single major diallelic QTL simulated to explain 15% of the phenotypic variance. The trait value for each sibling was a linear combination of the residual component and the QTL component. For convenience, the distributions for both sib 1 and sib 2 were then rescaled to zero mean and unit variance. We followed a simulation strategy similar to that of Fernandez et al. (2002).
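For readers wishing to reproduce the residual draw, the Azzalini & Capitanio skewed t is available directly in the R 'sn' package (we substitute 'sn' here for brevity; the study itself used 'fCopulae'). For example, for Model 2:

```r
library(sn)                                # Azzalini's skew-normal/skew-t family
rho   <- 0.50                              # residual sib correlation (Model 2)
Omega <- matrix(c(1, rho, rho, 1), 2, 2)   # scale matrix for the sibpair
resid <- rmst(n = 200, xi = c(0, 0), Omega = Omega,
              alpha = c(5, 5), nu = 4)     # skewness parameter 5, 4 df; 200 x 2
resid <- scale(resid)                      # rescale margins to mean 0, variance 1
```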

First, we considered a marker locus unlinked to the trait and evaluated the type I error of all the approaches. Table 3 lists the type I errors of all methods for three trait models, Models 1, 2, and 3, representing residual sibling correlations of 0.25, 0.50, and 0.75, respectively. A residual correlation of 0.75 is extremely high, but we wanted to demonstrate the impact of the residual correlation on these approaches. We simulated 1000 replicates to study the type I error. The estimated average kurtosis for these models was 19.3. After the Box–Cox transformation, the average kurtosis was reduced to 1.77 and the average skewness to 0.168; after the rank-based INT, the average skewness was reduced to 0 and the average kurtosis to 2.75. The VC–INT approach was overly conservative at residual correlations of 0.25 and 0.50 (Models 1 and 2) and gave inflated type I error for Model 3. Beasley et al. (2009) also reported that the rank-based INT yields conservative or liberal results depending on the nature of the underlying model and the extent of the departure from normality. The VC–Boxcox approach also had conservative type I error for Models 1 and 2, though not as conservative as VC–INT. The VC approach on untransformed data had inflated type I error at residual correlations of 0.50 and 0.75 but was slightly conservative at 0.25; similar findings were reported by Allison et al. (1999). We again noticed that the type I error of the VC approach, on both untransformed and transformed data, increases as the residual correlation among siblings increases. The REGRESS approach performed better than the VC approach in terms of type I error for Models 1 and 2, but gave slightly inflated type I error for Model 3. The QPTM approach maintained correct type I error for all models.

Table 3. Type I error of different approaches under a variety of trait models when the trait has a bivariate skewed t distribution for a sibpair (Simulation 2). Data were simulated at a diallelic marker with equifrequent alleles unlinked to the trait. The proportions of observations ≥ Z_.95 (empirical 95th quantile of the distribution of each test statistic) and ≥ Z_.99 (empirical 99th quantile) are listed.

Trait model   VC              VC–Boxcox       VC–INT          REGRESS         QPTM
α             0.05    0.01    0.05    0.01    0.05    0.01    0.05    0.01    0.05    0.01
1             0.032   0.011   0.005   0.000   0.000   0.000   0.042   0.009   0.055   0.007
2             0.108   0.035   0.032   0.003   0.005   0.000   0.056   0.009   0.056   0.010
3             0.152   0.082   0.152   0.068   0.075   0.031   0.085   0.019   0.051   0.009

Table 4 reports the power of these approaches for Models 1, 2, and 3. This time we did not adjust the critical values using the null distribution of the test statistics, because we wanted to show the impact of conservative or inflated type I error on the power of the VC approaches. We simulated 1000 replicates under each model. For Model 1, the VC–INT and VC–Boxcox approaches had no power to detect linkage (Table 4). In general, the VC–INT approach performed better than the VC–Boxcox approach, but both suffered from inflated type I error in Model 3, and their extremely high power for that model was due to this inflation (Table 4). This simulation study again demonstrated the need to estimate critical values by simulating the null distribution of the test statistic for the VC approaches; the transformations could not completely fix the type I error problem of the VC approach under nonnormality. The REGRESS approach, in general, maintained the correct type I error, but the QPTM approach outperformed the REGRESS approach for all three models. The QPTM approach had the correct type I error and maintained very good power to detect linkage in comparison with the other approaches.

Table 4. Power of different approaches under a variety of trait models when the trait has a bivariate skewed t distribution for a sibpair (Simulation 2). Data were simulated at a diallelic marker with equifrequent alleles completely linked to the trait. The proportions of observations ≥ Z_.95 (asymptotic 95th quantile of the distribution of each test statistic) and ≥ Z_.99 (asymptotic 99th quantile) are listed.

Trait model   VC              VC–Boxcox       VC–INT          REGRESS         QPTM
α             0.05    0.01    0.05    0.01    0.05    0.01    0.05    0.01    0.05    0.01
1             0.050   0.019   0.000   0.000   0.000   0.000   0.181   0.032   0.477   0.248
2             0.423   0.206   0.376   0.141   0.446   0.142   0.343   0.110   0.601   0.360
3             0.692   0.529   0.739   0.576   0.847   0.679   0.347   0.159   0.624   0.421

Our proposed model assumes Mendelian segregation of founder alleles for offspring with unknown phenotype; this assumption is required to implement the model when there are missing phenotype data. In other words, our approach assumes that a founder allele of type 0 or type 1 has a 50:50 chance of being transmitted from a parent to an offspring with unknown phenotype, that is, that there is no preferential transmission of alleles to an offspring with unknown phenotype. If the trait is segregating in a pedigree, this could cause a loss of power. We therefore compared the loss of information for the VC, REGRESS, and QPTM approaches under different rates of missing phenotype data. For this study, we considered a three-generation pedigree with a total of 10 individuals: the two founders of the pedigree had two offspring, and each of these offspring had two offspring. We studied the impact of missing phenotypes for a normally distributed trait with 50% additive genetic variance and no residual genetic component. We simulated 1000 marker datasets at a marker with five equifrequent alleles completely linked to the trait. For each marker dataset, we ran the VC, REGRESS, and QPTM approaches with no missing phenotype data and with random deletions of 2%, 5%, 10%, and 20% of the phenotype data on the individuals in the pedigree. Figure 1 shows the ratio of the estimated mean to the estimated standard deviation of each test statistic, based on the 1000 datasets, as the percentage of missing data increases. For this three-generation pedigree, we did not see any substantial differences among the three approaches in the loss of power with increasing missing phenotype data, but more investigation is required to determine whether the QPTM approach suffers an additional loss of power due to missing phenotype data in extended pedigrees.

Figure 1.

The y-axis shows the ratio of the estimated mean to the estimated standard deviation of the test statistic, computed from 1000 simulated marker datasets at a marker completely linked to the trait. The ratio is plotted against the percentage of missing phenotype data for the same 1000 marker datasets.

Application: Real Data

The NHLBI FHS is a multicenter population-based study of genetic and nongenetic determinants of coronary heart disease (CHD), atherosclerosis, and cardiovascular risk factors. Families were enrolled at four U.S. centers (Higgins et al., 1996). We selected the continuous trait body mass index (BMI), measured at the baseline exam, to study the performance of the different QTL mapping approaches. The main reason for choosing BMI was that a variance component analysis (Province & Rao, 1995) of these pedigrees using adjusted BMI produced significant linkage signals on chromosomes 7 and 13 (Feitosa et al., 2002); a few other chromosomes also showed weak linkage signals. Because some pedigrees had substantial missing phenotype data, we considered only the pedigrees with 30% or less missing phenotype data, leaving 53 pedigrees with 505 individuals for analysis. Among these pedigrees, 83% were three-generation pedigrees and the rest were two-generation pedigrees; the average pedigree size was 9.53. There were 425 people with observed phenotypes and 80 with unknown phenotypes. The mean and standard deviation of the unadjusted BMI were 27.5 and 5.85, respectively. The unadjusted BMI showed some skewness (skewness = 1.4373, p-value = 9.609e-14 by D'Agostino's skewness test) and was leptokurtic (kurtosis = 7.2413, p-value = 2.116e-15 by the Anscombe–Glynn kurtosis test). We analyzed 396 microsatellite markers on 22 chromosomes with an average heterozygosity of 75.1%. The trait BMI was fairly heritable, with an estimated heritability of 52%.
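The two normality tests quoted above are available, for example, in the R 'moments' package (an assumption about tooling on our part; any implementation of these tests would do), with `bmi` denoting the vector of observed phenotypes:

```r
library(moments)
agostino.test(bmi)   # D'Agostino test of skewness
anscombe.test(bmi)   # Anscombe-Glynn test of kurtosis
```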

We used GENEHUNTER to implement the QPTM approach on the FHS dataset. The QPTM model gave very strong evidence of linkage on chromosome 7. We report the findings for the chromosomes where at least one of the methods produced a linkage signal with a p-value below 0.01. The findings based on the QPTM approach closely resembled those from the VC analysis using all pedigrees (Feitosa et al., 2002). We have also plotted the previously reported findings on this dataset, based on all pedigrees, in Figure 2. Because the previously reported results were based on a much larger set of pedigrees, we did notice some differences. The findings of the VC approach on this smaller set of pedigrees were very similar to those of the REGRESS approach, and both were somewhat different from the QPTM approach, especially on chromosomes 9 and 12.

Figure 2.

Figure shows the performance of the likelihood-ratio test under the QPTM approach (solid line), the VC approach (Province & Rao, 1995) (dashed line) and the REGRESS approach (Sham et al., 2002) (dotted line) for linkage detection of BMI in NHLBI FHS. We have reported the linkage signals on chromosomes 1, 2, 7, 9, 10, and 12. We have also reported the previous findings (dark line with filled circles) on these chromosomes (Feitosa et al., 2002) based on all pedigrees.

Among our findings, the region on chromosome 7 was the only one that has been reported by many other studies (Frayling et al., 2007; Laramie et al., 2009). Laramie et al. (2009) reported that several genes may be associated with BMI in the 7q31-34 region, using linkage and association scans in two independent FHS samples. The evidence of linkage around 7q32 has been widely replicated for both BMI and other obesity-related phenotypes, including fat mass, waist circumference, and waist-to-hip ratio (Clement et al., 1996; Bray et al., 1999; Arya et al., 2002; Feitosa et al., 2002), although variants in this region have not reached genome-wide significance in genome-wide association studies of BMI published to date. Because our linkage analysis used a small selected sample of FHS pedigrees, many of our suggestive signals were probably not real; the interesting observation is that the methods differed in their findings owing to the nonnormality of the trait data.

Discussion

The proposed trait-model-free approach provides a robust approach for linkage detection in general pedigrees. The model conditions on the trait data of the individuals in a pedigree; hence it does not suffer from issues such as inflated type I error and can, in principle, be used without worrying about the underlying trait distribution or the sampling scheme. The power for linkage detection, however, still depends on the choice of the model and the nature of the underlying trait. In general, we found that our model outperformed the VC and REGRESS approaches for a nonnormally distributed trait. In many real studies, investigators select samples based on their phenotypes. A trait-model-free approach is preferable in these situations because it conditions on the trait data of the individuals in a pedigree, and the proposed approach could provide a robust yet powerful alternative. Further investigation is required to assess the performance of the QPTM approach for selected samples; to implement it for selected samples, however, one would require knowledge of the population mean and standard deviation of the trait.

The potential gain in power of our approach as compared to the REGRESS approach could also be a result of our modeling the inheritance vector explicitly as opposed to summarizing it by pairwise IBD sharing. Moreover, our model handles more rigorously the fact that IBD sharing is not observed directly, but must be inferred (with uncertainty) from observed marker data.

The proposed model is motivated by the idea that some founder chromosomes in a pedigree will carry a risk allele at the trait locus, and that these chromosomes will appear to be transmitted more frequently to individuals with phenotypes substantially higher than the population average. In particular, the same founder chromosome may tend to appear together with the disease through multiple generations, and our model is aimed primarily at extracting this multigenerational information, which current models struggle to capture. The current approach does not model the residual genetic variance and hence loses power, relative to approaches that model the residual component, when there is a significant amount of residual genetic variance. Even so, the QPTM approach had good power for the nonnormal traits we studied in this paper. Another issue is that the proposed model assumes Mendelian segregation of founder alleles for offspring with unknown phenotype, which can be a problem if there are substantial missing phenotype data for the nonfounders. We studied the effect of missing phenotype data on a three-generation pedigree and did not see much difference in the performance of the QPTM approach relative to the VC or REGRESS approaches, but further investigation is required to determine the impact of this assumption in extended pedigrees. The problem could be addressed by imputing the relevant missing phenotypes based on the numbers of affected and unaffected descendants of an individual and calculating a weighted LR statistic that assigns a weight to each imputed status. We intend to investigate these issues further and to modify the model so that it can account for residual polygenic variance and missing phenotype data.

Currently, our model assumes that each founder allele has a 50:50 chance of being type 0 or type 1. A model using a similar idea has been developed for a binary trait (Basu et al., 2010). One could add another parameter p, the probability that a founder allele is of type 1, and estimate p from the available data; we investigated the power and type I error of such a model. One limitation of this model is the nonidentifiability of p under the null hypothesis, which complicates the asymptotic distribution of the likelihood-ratio statistic. Moreover, the information about p comes from the data on the founders, and because founder data are missing in most situations, this additional parameter does not influence the likelihood much. In cases where data on the founders are available, this additional parameter p will probably be more informative for linkage than fixing p at 0.5.

Our model has similarities with the gamete competition model of Sinsheimer et al. (2000), where alleles in a parent are considered to compete with one another for transmission to offspring. Their model is aimed toward detecting association of alleles with a trait, and so the probability that an allele is transmitted depends on its allelic state. In contrast, our model is aimed toward detecting linkage, and so the transmission probability depends on which founder allele it is descended from. However, our model could be extended to detection of association using pedigrees and it would be interesting to compare the performance of our approach with that of Sinsheimer et al. (2000). More details about extending our approach to association testing are discussed in Basu et al. (2010).

Conclusion

The proposed trait-model-free approach provides a robust and powerful alternative to the VC approach for linkage detection of quantitative traits. The approach can be applied to pedigrees of arbitrary size. Our model conditions on the trait data and thus does not suffer from nonnormality of the trait or selected sampling of phenotypes. It models the inheritance vector explicitly as opposed to summarizing it by pairwise IBD sharing. Moreover, this model handles more rigorously the fact that IBD sharing is not observed directly, but must be inferred (with uncertainty) from observed marker data. In our simulation studies, the QPTM approach maintained correct type I error and showed good power for linkage detection of a nonnormal trait.

Acknowledgements

The research was supported by the University of Minnesota Grant-in-Aid of Research, Artistry, and Scholarship Program. We thank Xiang Li for her help with the simulation. We also thank the anonymous reviewers for their constructive suggestions and comments.
