Keywords:

  • Extreme rank selection (ERS);
  • Linkage;
  • QTL;
  • Ranked set sampling (RSS);
  • Truncation selection (TS)

Summary

  1. Top of page
  2. Summary
  3. Introduction
  4. Background
  5. Linkage Analysis Using Extreme Rank Selections
  6. Results: Power Comparison
  7. Discussion
  8. Acknowledgments
  9. References

It is well known that linkage analysis using simple random sib-pairs has relatively low power for detecting quantitative trait loci with small genetic effects. The power can be substantially increased by using samples selected based on their trait values. Usually, samples that are obtained by truncation selection consist of random samples from a truncated trait distribution. In this article we propose an alternative method using extreme ranks for linkage analysis with selected sib-pairs. This approach approximates the truncation selection. With similar screening sizes and the same sample size of selected sib-pairs, the extreme rank selection and truncation method have similar power performance, both of which are substantially more powerful than when using random sib-pairs. Simulation results on the comparison of powers between the truncation selection and the extreme rank selection and/or random selection for linkage analysis are reported.


Introduction

In genetic linkage analysis for locating quantitative trait loci (QTL) using sib-pairs, all individuals in a simple random sample are usually genotyped (e.g., Haseman & Elston, 1972; Lange et al. 1976; Blackwelder & Elston, 1982). However, unless there is a single locus responsible for 50% or more of the genetic variation, the power of linkage analysis using random samples is relatively low. To detect a locus accounting for 10% to 20% of the genetic variation, thousands of sib-pairs would need to be genotyped. When genotyping is much more time-consuming and costly than measuring the traits, one cost-effective method to increase the power is to use sib-pairs selected on the basis of their continuous trait values. Lander & Botstein (1989) studied selective sampling methods for plant genetics. In human genetics, Carey & Williamson (1991) demonstrated that the power of linkage analysis using selected sib-pairs is substantially higher than that of the random sampling approach of Haseman & Elston (1972). The further development of statistical methods and applications in linkage analysis using selected sib-pairs has been extensively reported in the literature. For reviews and some recent research, see Forest & Feingold (2000), Freidlin et al. (2003), Szatkiewicz et al. (2003), Szatkiewicz & Feingold (2004, 2005), Wang (2005) and Chen et al. (2005).

The commonly used approach of selective sampling is screening and truncation; that is, individuals are randomly sampled and screened but only those who have trait values falling in certain extreme regions are selected for genotyping. This approach is also referred to as selective genotyping. In sib-pair analysis Carey & Williamson (1991) considered screening sib-pairs according to the trait value of only one sib (say the first one by some natural order) and selecting those sib-pairs whose first sib trait value fell in the top 5% of the population distribution. Risch & Zhang (1995) considered screening according to the trait values of both sibs in a sib-pair and selecting discordant or concordant sib-pairs. Whenever the cost of screening is much less than the cost of genotyping, the approach of selective genotyping will be useful and cost-effective.

Prior research on selective genotyping has focused exclusively on truncation selection (TS). To apply selective genotyping by TS, the cut-off points, e.g. the upper 5% point of the trait distribution, must be known a priori; otherwise they must be estimated using pilot data. For example, in Xu et al. (1999) about 40,000 individuals were first screened to estimate the percentiles of blood pressure, and then more than 160,000 additional individuals were screened to select the extreme discordant sib-pairs using the estimated percentiles.

Recently Chen et al. (2005) applied an alternative approach, referred to as extreme rank selection (ERS), to linkage disequilibrium mapping of quantitative trait loci. ERS was shown to approximate TS, as both sample individuals with extreme trait values (Chen et al. 2005). However, TS measures the trait values and uses them for selection, while ERS only ranks the trait values and uses the ranks for selection. For continuous traits that can be easily measured at relatively low cost (e.g. blood pressure) these two approaches are similar, as in each approach the trait can be measured for selecting and/or ranking individuals. On the other hand, certain traits may be more expensive to measure but have surrogate phenotypes that are easier to work with. One example, suggested by a referee, arises in genetic studies of obesity. According to the World Health Organization, being overweight is one of the top 10 risks for public health in the world and one of the top 5 in developed nations. More and more patients with asthma, diabetes, heart disease, stroke, etc. are obese. The Dual-Energy X-ray Absorptiometry (DEXA) measurement of fat is commonly used to assess overweight and obesity (Goulding et al. 1996; Taylor et al. 2002). In overweight and obesity studies the DEXA measurement of fat is more costly to obtain than the body mass index (BMI), which is a less expensive surrogate for the DEXA measurement. As ERS depends only on the ranks of the DEXA measurements of fat, rather than on the values of the measurements, the BMI can be easily calculated and used to rank and select individuals in place of the DEXA measurements. Once the samples are selected based on the ranks of BMI, the DEXA measurements of fat need to be obtained only from the selected samples. A similar example is the evaluation of the effect of selective phenotypes in response to gene and low-fat diet interaction in overweight and obesity interventions using young twins. ERS can be applied by ranking the BMI of the twins. Formal measures of obesity, e.g. the DEXA measurement of fat, need to be obtained only on the selected twins. Another example is in genetic studies of personality traits using selected samples. Personality trait values are difficult to measure. However, when applying ERS, questionnaires can be used to rank and select individuals, and the personality trait values are obtained only from the individuals selected by ERS. Compared to TS, ERS is more cost-effective for traits that are difficult and/or costly to measure.

In this article we apply ERS to linkage analysis using sib-pairs, and compare the power performance of ERS, TS and/or random samples of sib-pairs by simulation studies. For a sib-pair two selections using ERS are studied. The first selection is based on the trait of the first sib using the regression model of Carey & Williamson (1991). In the second selection, the traits of both sibs are used and the robust statistic of Szatkiewicz & Feingold (2004) is applied.

Background

Linkage Analysis Using Random Sib-Pairs

Let X be the quantitative trait of concern. Assume that a putative QTL has two alleles M and N with allele frequencies p and q = 1 − p, respectively. Let g be the genotypic value of the QTL. Under the assumption of Hardy-Weinberg equilibrium (HWE), g is a random variable taking values μ + a, μ + d and μ − a, corresponding to the genotypes MM, MN and NN, with probabilities $p^2$, $2pq$ and $q^2$, respectively. The mean and variance of g are given, respectively, by $\mu_g = \mu + a(p - q) + 2pqd$ and $\sigma_g^2 = \sigma_a^2 + \sigma_d^2$, where $\sigma_a^2 = 2pq\{a - d(p - q)\}^2$ and $\sigma_d^2 = 4p^2q^2d^2$. The genetic variance $\sigma_g^2$ is thus decomposed into $\sigma_a^2$ and $\sigma_d^2$, which are referred to as the additive and dominance components, respectively. The trait value X of an individual can be expressed as $X = g + \varepsilon$, where ε is a random variable with zero mean and variance $\sigma_\varepsilon^2$, independent of g, accounting for environmental and other unidentifiable effects.
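The closed-form mean and variance components above can be checked by direct enumeration over the three genotypes. The following is a minimal sketch; the function name `qtl_moments` and the example parameter values are illustrative assumptions, not part of the original analysis.

```python
def qtl_moments(p, a, d, mu=0.0):
    """Compare enumerated and closed-form moments of the genotypic value g."""
    q = 1.0 - p
    # Genotypic values and HWE frequencies for MM, MN, NN.
    values = [mu + a, mu + d, mu - a]
    freqs = [p * p, 2 * p * q, q * q]

    # Direct enumeration of the mean and variance of g.
    mean = sum(f * v for f, v in zip(freqs, values))
    var = sum(f * (v - mean) ** 2 for f, v in zip(freqs, values))

    # Closed forms quoted in the text.
    mean_formula = mu + a * (p - q) + 2 * p * q * d
    var_additive = 2 * p * q * (a - d * (p - q)) ** 2
    var_dominance = 4 * p**2 * q**2 * d**2
    return mean, var, mean_formula, var_additive, var_dominance


# Hypothetical parameter values chosen only for illustration.
mean, var, mean_formula, var_a, var_d = qtl_moments(p=0.3, a=1.0, d=0.5)
print(mean, mean_formula)
print(var, var_a + var_d)
```

The two printed pairs should agree up to floating-point error, which is a quick way to catch sign mistakes in the (p − q) terms.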

Based on the above genetic model, Haseman & Elston (1972) developed an important regression model for linkage analysis using sib-pairs. Let $Y = (X_1 - X_2)^2$, where X1 and X2 are the trait values of the sibs in a sib-pair. Let $\pi_m$ denote the proportion of genes identical by descent (IBD) shared by the sibs at a marker locus and $\hat{\pi}_m$ its estimate. Haseman & Elston (1972) proved that, if d = 0,

  • $E(Y \mid \hat{\pi}_m) = \alpha + \beta\hat{\pi}_m$ (1)

where α and β are parameters. When d ≠ 0 they showed that (1) holds asymptotically. The coefficient β is specifically given by $\beta = -2(1 - 2\theta)^2\sigma_g^2$, where θ is the recombination fraction between the marker and the putative QTL. Hence, to test the existence of a QTL linked to the marker (θ < 1/2), the hypotheses can be formulated as H0: β = 0 against H1: β < 0. The linkage analysis can then be carried out through a simple linear regression model based on (1).

TS Using One Sib and Carey-Williamson's Regression Models

To increase the power of linkage analysis using sib-pairs Carey & Williamson (1991) proposed a TS approach. Let t be the value such that Pr(X > t) =s, where s is pre-specified, say s= 0.05. In TS a sib-pair is selected and genotyped if the trait value of the first sib (determined by some natural order) exceeds t. The truncation point t may be known or may have to be estimated after screening a large number of individuals. The selection is independent of the trait value of the other sib, their genotypes at the marker locus, and the IBD sharing proportion πm.
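The one-sib TS screening step can be sketched as below. This is only an illustration under stated assumptions: the trait is taken to be standard normal so that the cut-off t is known exactly, the two sibs are given a fixed correlation `rho` with no QTL effect, and the name `truncation_select` is hypothetical.

```python
import random
from statistics import NormalDist


def truncation_select(n, s, rho, rng):
    """Screen random sib-pairs until n pairs have a first-sib trait above t."""
    t = NormalDist().inv_cdf(1.0 - s)  # cut-off with Pr(X > t) = s
    selected, screened = [], 0
    while len(selected) < n:
        x1 = rng.gauss(0.0, 1.0)
        # Second sib correlated with the first (illustrative model only).
        x2 = rho * x1 + (1.0 - rho**2) ** 0.5 * rng.gauss(0.0, 1.0)
        screened += 1
        if x1 > t:  # selection ignores x2 and the marker data
            selected.append((x1, x2))
    return selected, screened


rng = random.Random(1)
pairs, screened = truncation_select(n=240, s=0.05, rho=0.2, rng=rng)
print(len(pairs), screened)
```

The number of pairs screened varies from run to run, which is the design difficulty with TS that the article returns to later.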

To compare the power under TS with that under simple random sampling (SRS), Carey & Williamson (1991) considered two different regression models based on the expression of E (X2 |X1, πm) under TS and SRS, where X1 and X2 denote the trait values of the first and second sib in the same sib-pair. To distinguish between TS and SRS, we denote paired trait values under SRS by (X1, X2) and their counterpart under TS by (X*1, X*2). Carey & Williamson (1991) found that, under SRS,

  • image(2)

and, under TS,

  • image(3)

where δi and γi are defined as follows: δi = −1/2, 0, 1/2 and γi = 1/4, −1/4, 1/4 according as πmi = 0, 1/2, 1. They proved that, under model (2), the least squares estimator (LSE) of $\beta_5$ has an asymptotic expression inline image, and that, under model (3), the LSE of $\beta_3^*$ has an asymptotic expression inline image, where $f_l$ is the conditional probability, given X > t, that the genotype of the putative QTL has l N alleles and $g_l$ is the deviation of the corresponding genotypic value from $\mu_g$. Note that if the putative QTL is in fact not a QTL then inline image, since the conditional probabilities in this case are the same as the unconditional probabilities, that is, $f_0 = p^2$, $f_1 = 2pq$ and $f_2 = q^2$. Therefore, to test the existence of a QTL linked to the marker, the hypotheses can be formulated as H0: $\beta_5 = 0$ against H1: $\beta_5 > 0$ under model (2). Note that for model (3) the alternative hypothesis was not stated in Carey & Williamson (1991). From inline image it is not obvious whether a one-sided or a two-sided alternative should be used. Prior simulation studies under various practical situations indicate that $\beta_3^* > 0$ under the alternative for model (3). Thus the one-sided alternative will be used for model (3) in our simulation studies. In their simulation studies Carey & Williamson (1991) demonstrated that the power in most situations is dramatically increased by using TS rather than SRS.

TS Using Both Sibs and the Robust Statistic

To select discordant sib-pairs (DSPs) by TS, thresholds t1 and t2 are chosen such that Pr(X1 > t1) = s and Pr(X2 < t2) = s, where s is a small proportion, and a screened sib-pair (X1, X2) is selected if X1 > t1 and X2 < t2. Sib-pairs are screened until a pre-specified number of DSPs is obtained. The screening probability of a DSP is Pr(X1 > t1, X2 < t2), which is usually less than $s^2$. The greater the correlation between the two sibs' traits, the smaller the screening probability. For example, assuming (X1, X2) follows a bivariate normal distribution with correlation ρ, the screening probabilities for a moderate DSP (top/bottom 33%) are 0.098 (ρ = 0.1) and 0.0722 (ρ = 0.3), and those for an extreme DSP (top/bottom 5%) are 0.00158 (ρ = 0.1) and 0.00046 (ρ = 0.3).
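The screening probabilities quoted above can be reproduced numerically. The sketch below assumes standardized bivariate normal traits and integrates the conditional normal distribution of X2 given X1 with Simpson's rule; the function name `dsp_screening_prob` and the integration settings are illustrative assumptions, and "33%" is read here as one third.

```python
from statistics import NormalDist

STD = NormalDist()


def dsp_screening_prob(s, rho, steps=4000, upper=10.0):
    """Pr(X1 > t1, X2 < t2) for standard bivariate normal traits."""
    t1 = STD.inv_cdf(1.0 - s)  # upper cut-off for the first sib
    t2 = STD.inv_cdf(s)        # lower cut-off for the second sib
    scale = (1.0 - rho * rho) ** 0.5

    def integrand(x):
        # Density of X1 at x times Pr(X2 < t2 | X1 = x).
        return STD.pdf(x) * STD.cdf((t2 - rho * x) / scale)

    # Composite Simpson rule on [t1, upper]; steps must be even.
    h = (upper - t1) / steps
    total = integrand(t1) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(t1 + i * h)
    return total * h / 3.0


for s in (1 / 3, 0.05):
    for rho in (0.1, 0.3):
        print(round(s, 3), rho, dsp_screening_prob(s, rho))
```

With positive correlation each value falls below s², so more pairs must be screened per DSP than independence between sibs would suggest.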

Let the trait of n DSPs obtained by TS be (X1i, X2i), i= 1, … , n, and μ and σ be the mean and standard deviation of the population trait. Suppose πi is the estimated proportion of alleles shared IBD at the marker locus in the ith sib-pair. The robust statistic proposed by Szatkiewicz & Feingold (2004) can be written as

  • image(4)

which has an asymptotic standard normal distribution under the null. Linkage is detected for a strong positive value of Z. As one does not need to estimate μ and σ the test statistic (4) is easy to apply.

A Brief Review on Ranked Set Sampling

ERS is motivated by ranked set sampling (RSS), which is briefly reviewed here. A summary of research on RSS can be found in Chen et al. (2004). RSS applies if the measurement of the variable of interest is costly and/or time-consuming, but sampling units can be easily obtained and ranked at negligible cost by some means that does not require measuring the variable of interest. The procedure of RSS is as follows. To obtain the first observation on the variable of interest, k units are randomly drawn, and their latent values on the variable of interest, denoted by Y1, … , Yk, are ranked in ascending order without actual measurement as Y[1] ≤ ⋯ ≤ Y[k]. Let r be a pre-specified rank, say r = 1; then Y[1] is actually measured. To get the second observation on the variable of interest the procedure is repeated; if the pre-specified rank is r = 2 this time, Y[2] is then actually measured. The scheme continues in this way, with a rank pre-specified each time. If the ranks from r = 1 to r = k are specified an equal number of times the scheme is called a balanced RSS. Otherwise it is called an unbalanced RSS. In particular, if only the extreme rank r = k (or r = 1) is specified in the whole process the scheme is called an extreme RSS. The ranking in RSS can be done by various means depending on the situation: by visual inspection, by auxiliary information, or by measurement of easily obtainable, closely correlated concomitant variables. The rankings need not be perfect. As long as the ranking is not completely arbitrary, efficiency will be gained through the RSS procedure (see Chen et al. 2004, Chapter 2).
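The RSS procedure described above can be sketched as follows. The setup is an assumption made for illustration: the latent variable is standard normal, and ranking uses a noisy concomitant (latent value plus normal noise with standard deviation `noise_sd`), so the ranking is imperfect. The name `ranked_set_sample` is hypothetical.

```python
import random


def ranked_set_sample(k, ranks, rng, noise_sd=0.5):
    """Return one measured value per requested rank (ranks are 1-based)."""
    measured = []
    for r in ranks:
        # Each unit: (latent value Y, cheap concomitant used only for ranking).
        units = []
        for _ in range(k):
            y = rng.gauss(0.0, 1.0)
            units.append((y, y + rng.gauss(0.0, noise_sd)))
        units.sort(key=lambda u: u[1])    # ranking without measuring Y
        measured.append(units[r - 1][0])  # measure only the unit ranked r
    return measured


rng = random.Random(2024)
k = 5
# Balanced RSS: every rank 1..k is used equally often.
balanced = ranked_set_sample(k, ranks=list(range(1, k + 1)) * 400, rng=rng)
# Extreme RSS: only the top rank k is ever measured.
extreme = ranked_set_sample(k, ranks=[k] * 2000, rng=rng)

print(sum(balanced) / len(balanced), sum(extreme) / len(extreme))
```

The balanced sample mean stays near the population mean, while the extreme sample is shifted towards the upper tail even though the ranking is imperfect.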

The procedure of RSS is analogous to that of stratified sampling. Instead of pre-stratifying the population the RSS post-stratifies the population based on the random ranks of the individuals. With set size k the RSS virtually stratifies the population into k strata. In the case of extreme RSS with set size k, the distribution of the sampled extreme rank samples mimics the upper 1/k proportion of the original population. The connection between RSS and stratified sampling makes it clear that the extreme RSS can be applied to selective genotyping in addition to TS. In principle, an upper 10% TS in Carey & Williamson (1991) can be replaced by an extreme RSS with set size k= 10. The trait can be considered as a concomitant variable which is used for ranking in RSS.

Linkage Analysis Using Extreme Rank Selections

ERS for Carey-Williamson's Regression Models

Following the ERS scheme of Chen et al. (2005), the selection of sib-pairs is carried out as follows. To ascertain one sib-pair for genotyping, first k randomly chosen sib-pairs are screened by measuring their trait values (X1j, X2j), j = 1, … , k, where X1j and X2j denote the trait values of the first and second sib, respectively, in the jth sib-pair. Here the order of the sibs in a sib-pair is determined by some natural ordering independent of their trait values. The trait values of the first sibs in these k sib-pairs are ranked in ascending order as X1(1) ≤ X1(2) ≤ ⋯ ≤ X1(k). Finally, the sib-pair whose first sib has trait value X1(k) is selected for genotyping. If n sib-pairs are to be ascertained the above procedure is repeated n times. For convenience of notation we denote the trait values of the selected sib-pairs by inline image.
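The one-sib ERS step above amounts to taking, from each set of k screened pairs, the pair whose first sib ranks highest. A minimal sketch follows, assuming standard normal traits with sib correlation `rho` and no QTL effect; `draw_pair` and `ers_select_one_sib` are hypothetical helper names.

```python
import random


def draw_pair(rng, rho=0.2):
    """One random sib-pair (x1, x2) with correlation rho (illustrative model)."""
    x1 = rng.gauss(0.0, 1.0)
    x2 = rho * x1 + (1.0 - rho**2) ** 0.5 * rng.gauss(0.0, 1.0)
    return (x1, x2)


def ers_select_one_sib(n, k, rng):
    """Select n sib-pairs; each is the top-ranked first sib among k screened."""
    selected = []
    for _ in range(n):
        batch = [draw_pair(rng) for _ in range(k)]
        # Only the rank of x1 within the batch matters, not its value.
        selected.append(max(batch, key=lambda pair: pair[0]))
    return selected


rng = random.Random(7)
ers_pairs = ers_select_one_sib(n=240, k=10, rng=rng)
mean_x1 = sum(x1 for x1, _ in ers_pairs) / len(ers_pairs)
print(len(ers_pairs), mean_x1)
```

Because `max` needs only an ordering, the key could equally be a surrogate measurement or a judgment ranking, and the number of pairs screened is exactly n × k.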

The ERS described above with set size k is analogous to an upper (100/k)% TS procedure. The screening size of ERS is a fixed constant nk. The screening size of TS is random, with mean value n/Pr (X1 >tk) =nk, since tk is chosen such that Pr (X1 >tk) = 1/k. Therefore, on average these two selection procedures have similar screening sizes. We can expect these two selection procedures to have comparable powers in linkage analysis, which will be demonstrated later.
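The mean screening size quoted for TS follows from a standard negative-binomial argument, sketched here under the assumptions that sib-pairs are screened independently and that the cut-off $t_k$ is known exactly. The symbols $N_{\mathrm{TS}}$, $N_{\mathrm{ERS}}$ and $G_i$ are introduced only for this illustration; $G_i$ is the number of pairs screened to obtain the ith selected pair.

```latex
\begin{align*}
N_{\mathrm{TS}} &= \sum_{i=1}^{n} G_i, \qquad
  G_i \overset{\text{iid}}{\sim} \mathrm{Geometric}(s), \quad
  s = \Pr(X_1 > t_k) = 1/k, \\
E(N_{\mathrm{TS}}) &= \frac{n}{s} = nk, \qquad
  \mathrm{Var}(N_{\mathrm{TS}}) = \frac{n(1-s)}{s^{2}} = nk(k-1), \\
N_{\mathrm{ERS}} &= nk \quad \text{(fixed)}.
\end{align*}
```

For instance, with n = 240 and k = 10 both schemes screen 2400 sib-pairs on average, but under these assumptions the TS screening size has a standard deviation of about $\sqrt{240 \cdot 90} \approx 147$ pairs, whereas the ERS screening size is exactly 2400.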

Regression models similar to those of Carey & Williamson (1991) can be obtained for ERS. The detailed derivations of the following results are similar to those in Carey & Williamson (1991) and are omitted. As for TS, we assume the following for ERS:

  • image(5)

where inline image is the proportion of alleles IBD shared by the ith selected sib-pair, and inline image and inline image are defined through inline image the same as δi and γi through πmi. Note that model (5) is analogous to model (3) in Carey & Williamson (1991). We show that the properties of model (3) for TS hold in model (5) for ERS.

In matrix notation, (5) is written as inline image, where inline image is the design matrix, inline image, and the expectation inline image is conditional on the same variables as in (5). As n is large, the LSE of inline image can be asymptotically approximated by

  • image

Note that inline image are independent and identically distributed. Denote inline image and inline image We obtain

  • image(6)

and

  • image

Thus, the asymptotic approximation of the LSE of inline image can be written as

  • image

Denote the expected frequencies of genotypes MM, MN and NN in the ERS selected sample inline image by inline image, where l indicates the number of N alleles in the genotype, e.g., inline image. It can be shown that

  • image(7)

where gl is as defined before. Since the selection in ERS is independent of the trait of the second sib (X2), we have inline image. Further, as ERS approximates TS, inline image for l = 0, 1, 2. Thus, inline image. Hence, inline image, where inline image was obtained by Carey & Williamson (1991). The QTL effect can therefore be tested by testing inline image against inline image. The one-sided alternative is used, as in the TS approach above.

ERS for Discordant Sib-Pairs

To obtain DSPs by ERS a two-stage procedure is required. Denote the trait of a sib-pair by (X1, X2). First, identify $k^2$ sib-pairs and divide them into k groups each of size k, denoted as (X1ij, X2ij) for the jth sib-pair in the ith group, i, j = 1, … , k. In the ith group rank all the first sibs, X1i1, … , X1ik, and obtain X1i(1) ≤ ⋯ ≤ X1i(k). The corresponding second sibs are denoted by X2i[1], ⋯, X2i[k]. We retain both pairs (X1i(1), X2i[1]) and (X1i(k), X2i[k]) for i = 1, … , k. Then, in the second stage we rank all X2i[1] as X2(1)[1] ≤ ⋯ ≤ X2(k)[1], and their corresponding X1i(1), i = 1, … , k, are denoted by X1[1](1), ⋯, X1[k](1). We retain the pair (X1[k](1), X2(k)[1]), which is a DSP. For the other set of retained pairs (X1i(k), X2i[k]), i = 1, … , k, a similar procedure is applied and the pair (X1[1](k), X2(1)[k]) is selected as a DSP. Thus we retain two DSPs from $k^2$ random sib-pairs, and their genotype data are obtained. The ERS is repeated n times to obtain 2n DSPs. Then the robust statistic (4) can be applied. An example of applying ERS for DSPs is given in Table 1 for k = 3. This dataset was generated from a bivariate normal distribution with zero means, unit variances, and correlation ρ = 0.2. Applying ERS we selected the two DSPs (1.693, 0.344) and (0.070, 0.106).

Table 1. Discordant sib-pairs obtained by extreme rank selection (ERS) (k = 3). Entries are (X1, X2).

| Steps | Group 1 | Group 2 | Group 3 |
|---|---|---|---|
| Observed sib-pairs | (−1.409, −0.115) | (0.274, −1.167) | (−0.068, −1.211) |
| | (1.693, 0.344) | (1.418, 2.555) | (0.989, 1.787) |
| | (0.391, −0.704) | (0.070, 0.106) | (0.785, 0.489) |
| Rank X1 (by columns) | (−1.409, −0.115) | (0.070, 0.106) | (−0.068, −1.211) |
| | (0.391, −0.704) | (0.274, −1.167) | (0.785, 0.489) |
| | (1.693, 0.344) | (1.418, 2.555) | (0.989, 1.787) |
| Selection | (−1.409, −0.115) | (0.070, 0.106) | (−0.068, −1.211) |
| | (1.693, 0.344) | (1.418, 2.555) | (0.989, 1.787) |
| Rank X2 (by rows) | (−0.068, −1.211) | (−1.409, −0.115) | (0.070, 0.106) |
| | (1.693, 0.344) | (0.989, 1.787) | (1.418, 2.555) |
| Selection (ERS) | (1.693, 0.344) | | (0.070, 0.106) |
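The two-stage selection illustrated in Table 1 can be sketched in a few lines. The function name `ers_discordant_pairs` is a hypothetical label; the data are the nine observed sib-pairs from Table 1, grouped by column.

```python
def ers_discordant_pairs(groups):
    """Two-stage ERS on k groups of k (x1, x2) pairs; returns two DSPs."""
    # Stage 1: within each group keep the pairs with smallest and largest X1.
    lows = [min(g, key=lambda pair: pair[0]) for g in groups]
    highs = [max(g, key=lambda pair: pair[0]) for g in groups]
    # Stage 2: among low-X1 pairs take the largest X2,
    # and among high-X1 pairs take the smallest X2.
    low_high = max(lows, key=lambda pair: pair[1])
    high_low = min(highs, key=lambda pair: pair[1])
    return high_low, low_high


# Observed sib-pairs from Table 1 (k = 3), one list per group.
groups = [
    [(-1.409, -0.115), (1.693, 0.344), (0.391, -0.704)],
    [(0.274, -1.167), (1.418, 2.555), (0.070, 0.106)],
    [(-0.068, -1.211), (0.989, 1.787), (0.785, 0.489)],
]
selected = ers_discordant_pairs(groups)
print(selected)  # ((1.693, 0.344), (0.07, 0.106))
```

Only comparisons between pairs are used, so the same routine applies when X1 and X2 are replaced by a surrogate phenotype used for ranking.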

Results: Power Comparison

Simulation studies were conducted to compare the power performance of different selective genotyping schemes. The first simulation compared the powers for detecting a QTL using SRS, TS and ERS with the regression models of Carey & Williamson (1991). The second simulation compared the powers of TS and ERS using DSPs. Both simulation procedures were similar to that of Carey & Williamson (1991). For a QTL with two alleles M and N, there are nine possible ordered genotype pairs for a sib-pair (Table 2). Denote the probability of the ith genotype pair by pi, and the probability of the ith sib-pair sharing j alleles IBD at the QTL, conditional on the genotype pair, by fij, j = 0, 1, 2 and i = 1, … , 9. The expressions for pi and fij, which were given in Haseman & Elston (1972, Table III), are summarized in Table 2.

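Table 2. Probabilities of genotypes of sib-pairs and probabilities of IBD sharing at the QTL conditional on the sib-pair genotype (q = 1 − p).

| Counts | Sib-pair i | $p_i$ | $f_{i0}$ | $f_{i1}$ | $f_{i2}$ |
|---|---|---|---|---|---|
| n1 | MM, MM | $p^2(1+p)^2/4$ | $p^2/(1+p)^2$ | $2p/(1+p)^2$ | $1/(1+p)^2$ |
| n2 | MM, MN | $p^2q(1+p)/2$ | $p/(1+p)$ | $1/(1+p)$ | 0 |
| n3 | MM, NN | $p^2q^2/4$ | 1 | 0 | 0 |
| n4 | MN, MN | $pq(1+pq)$ | $pq/(1+pq)$ | $1/(2+2pq)$ | $1/(2+2pq)$ |
| n5 | MN, NN | $pq^2(1+q)/2$ | $q/(1+q)$ | $1/(1+q)$ | 0 |
| n6 | NN, NN | $q^2(1+q)^2/4$ | $q^2/(1+q)^2$ | $2q/(1+q)^2$ | $1/(1+q)^2$ |
| n7 | NN, MN | $pq^2(1+q)/2$ | $q/(1+q)$ | $1/(1+q)$ | 0 |
| n8 | NN, MM | $p^2q^2/4$ | 1 | 0 | 0 |
| n9 | MN, MM | $p^2q(1+p)/2$ | $p/(1+p)$ | $1/(1+p)$ | 0 |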

First we describe the procedure for simulating the data for an SRS of sib-pairs. We fixed the sample size n and the allele frequency p. The counts of the nine different genotype pairs among n sib-pairs were generated using the multinomial distribution (n1, … , n9) ∼ Mul(n; p1, … , p9). For each sib-pair with the ith genotype pair (i = 1, … , 9), the IBD status at the QTL was generated using Mul(1; fi0, fi1, fi2). Denote by (s1, s2) the genotypes of a sib-pair and by (X1, X2) their trait values. Let $\sigma_\varepsilon^2$ = Var(X1 | s1) = Var(X2 | s2) and ω = Corr(X1, X2 | s1, s2). Fix the broad-sense heritability $H = \sigma_g^2/(\sigma_\varepsilon^2 + \sigma_g^2) = \sigma_g^2/(1 + \sigma_g^2)$, where we set $\sigma_\varepsilon^2 = 1$. Using $\sigma_g^2 = 2pq\{a - d(p - q)\}^2 + 4p^2q^2d^2$, the values of a and d were determined for three genetic models: recessive (REC, d = −a), additive (ADD, d = 0) and dominant (DOM, d = a). Then, for each sib-pair the deviations (d1, d2) of the QTL effect were calculated from their genotypes and the values of a and d. The trait values (X1, X2) were generated using the bivariate normal distribution with mean (d1, d2) and covariance matrix given by inline image with $\sigma_\varepsilon^2 = 1$. For each sib-pair the IBD status at the marker locus (πm) was generated for a given recombination fraction θ, using the joint distributions of IBDs at the QTL and marker loci, which were given in Haseman & Elston (1972, Table V). Finally, δ and γ were computed for each sib-pair from πm.
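The first two steps of this simulation, solving for a and d from the heritability and drawing sib-pair genotypes from the Table 2 probabilities, can be sketched as follows. This is an illustration only: the function names `effect_sizes` and `sib_pair_genotype_probs` are hypothetical, the residual variance is set to 1 as in the text, and the IBD and trait generation steps are omitted.

```python
import random


def effect_sizes(H, p, model):
    """Solve for (a, d) giving heritability H when the residual variance is 1."""
    q = 1.0 - p
    sigma_g2 = H / (1.0 - H)  # from H = sigma_g2 / (1 + sigma_g2)
    c = {"REC": -1.0, "ADD": 0.0, "DOM": 1.0}[model]  # d = c * a
    # Genetic variance when a = 1 and d = c.
    unit = 2 * p * q * (1 - c * (p - q)) ** 2 + 4 * p**2 * q**2 * c**2
    a = (sigma_g2 / unit) ** 0.5
    return a, c * a


def sib_pair_genotype_probs(p):
    """Probabilities p_i of the nine ordered sib-pair genotypes (Table 2)."""
    q = 1.0 - p
    return {
        ("MM", "MM"): p**2 * (1 + p) ** 2 / 4,
        ("MM", "MN"): p**2 * q * (1 + p) / 2,
        ("MM", "NN"): p**2 * q**2 / 4,
        ("MN", "MN"): p * q * (1 + p * q),
        ("MN", "NN"): p * q**2 * (1 + q) / 2,
        ("NN", "NN"): q**2 * (1 + q) ** 2 / 4,
        ("NN", "MN"): p * q**2 * (1 + q) / 2,
        ("NN", "MM"): p**2 * q**2 / 4,
        ("MN", "MM"): p**2 * q * (1 + p) / 2,
    }


rng = random.Random(0)
probs = sib_pair_genotype_probs(0.3)
# Drawing n genotype pairs one at a time is equivalent to Mul(n; p1, ..., p9).
genotype_pairs = rng.choices(list(probs), weights=list(probs.values()), k=240)
a, d = effect_sizes(H=0.20, p=0.3, model="DOM")
print(sum(probs.values()), a, d)
```

The nine probabilities sum to one for any p, which is a useful sanity check before running the full simulation.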

To generate data with TS or ERS, SRS were generated first according to the above procedure. Then TS and ERS were applied to obtain the selected samples. The pre-specified threshold value for TS was found by using simple random sib-pairs simulated independently.

To compare the powers of the three sampling approaches for detecting linkage under various genetic models, we chose minor allele frequencies p = 0.1 and 0.3, recombination fractions θ = 0.05 and 0.10, and heritabilities H = 0.05, 0.20, 0.35 and 0.50. The sample size n = 240 and correlation ω = 0.2 were fixed in all simulation studies. The significance level used in the simulation was α = 0.05. For TS, selection was based on the top 10% of the trait distribution, while for the corresponding ERS the set size was taken as k = 10. The Carey and Williamson regression models were fitted to test the null hypothesis of no linkage. The empirical powers presented in Table 3 for the recessive (REC), additive (ADD) and dominant (DOM) models are based on 1000 replications.

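Table 3. Linkage analysis of Carey and Williamson regression models (correlation ω = 0.2, n = 240 sib-pairs, and 1000 replications)

| p | θ | H | REC SRS | REC TS | REC ERS | ADD SRS | ADD TS | ADD ERS | DOM SRS | DOM TS | DOM ERS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.05 | 0.05 | 0.06 | 0.35 | 0.27 | 0.06 | 0.17 | 0.14 | 0.09 | 0.16 | 0.15 |
| | | 0.20 | 0.15 | 0.88 | 0.83 | 0.21 | 0.89 | 0.81 | 0.26 | 0.87 | 0.76 |
| | | 0.35 | 0.22 | 0.94 | 0.92 | 0.44 | 1.00 | 1.00 | 0.46 | 1.00 | 1.00 |
| | | 0.50 | 0.22 | 0.98 | 0.96 | 0.62 | 1.00 | 1.00 | 0.67 | 1.00 | 1.00 |
| | 0.10 | 0.05 | 0.06 | 0.26 | 0.20 | 0.08 | 0.11 | 0.12 | 0.06 | 0.11 | 0.11 |
| | | 0.20 | 0.13 | 0.71 | 0.66 | 0.16 | 0.71 | 0.62 | 0.13 | 0.66 | 0.60 |
| | | 0.35 | 0.15 | 0.80 | 0.78 | 0.32 | 0.98 | 0.96 | 0.38 | 0.98 | 0.95 |
| | | 0.50 | 0.22 | 0.88 | 0.86 | 0.47 | 1.00 | 1.00 | 0.51 | 1.00 | 1.00 |
| 0.3 | 0.05 | 0.05 | 0.07 | 0.20 | 0.17 | 0.10 | 0.12 | 0.10 | 0.08 | 0.10 | 0.08 |
| | | 0.20 | 0.19 | 0.98 | 0.92 | 0.21 | 0.69 | 0.61 | 0.23 | 0.50 | 0.44 |
| | | 0.35 | 0.37 | 1.00 | 1.00 | 0.52 | 0.98 | 0.97 | 0.50 | 0.84 | 0.80 |
| | | 0.50 | 0.57 | 1.00 | 1.00 | 0.79 | 1.00 | 1.00 | 0.74 | 0.96 | 0.95 |
| | 0.10 | 0.05 | 0.07 | 0.14 | 0.14 | 0.10 | 0.10 | 0.11 | 0.08 | 0.09 | 0.09 |
| | | 0.20 | 0.13 | 0.89 | 0.77 | 0.20 | 0.53 | 0.45 | 0.16 | 0.35 | 0.31 |
| | | 0.35 | 0.30 | 1.00 | 0.99 | 0.36 | 0.90 | 0.85 | 0.43 | 0.65 | 0.63 |
| | | 0.50 | 0.39 | 1.00 | 1.00 | 0.59 | 1.00 | 0.99 | 0.62 | 0.83 | 0.81 |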

In the situations studied in the simulations, we observed that TS and ERS were substantially more powerful for testing linkage than SRS. TS and ERS had comparable powers, with TS being slightly more powerful than ERS. The slight loss of power from using ERS was as expected, since ERS only approximately stratifies the population. The difference in power between these two approaches was largest (less than 13%) when H = 0.20 for the dominant model with the rare allele frequency, or for the recessive model with the common allele frequency. It is expected that the two approaches would have more similar power performance as the selection becomes more extreme. Thus, ERS does not incur a substantial loss of power, while avoiding the potential difficulties of TS, such as estimating truncation points.

In the second simulation, we compared TS and ERS for linkage analysis using DSPs. Assume that the BMI is used as a surrogate phenotype for the DEXA measurement of fat. In practice, to apply TS the DEXA measurement of fat needs to be obtained for each sib-pair. The percentiles of the population distribution of the DEXA measurement of fat are either known or approximated using the data. Once the sib-pairs are selected based on the DEXA measurement of fat, their genotypes are also obtained. To apply ERS, on the other hand, we calculate the BMI for each sib-pair rather than the DEXA measurement of fat. The sib-pairs are then ranked based on their BMI (rather than the DEXA measurement of fat). Once a sib-pair is selected based on the ranks of BMI, their DEXA measurement of fat and genotypes are also obtained. Given the set size k, the screening size using ERS does not depend on the correlation between the two sibs, while the screening size using TS does. Thus it is difficult to compare the screening sizes of the two approaches directly. In our simulation, in each replication we first simulated n = 240 DSPs by TS as described before and recorded the screening size. The 10% and 90% cut-off points were used. Then the set size k of ERS was calculated so that both approaches had the same screening size. Finally, n = 240 DSPs using ERS were obtained. The simulation of IBD data and the choices of parameter values in the second simulation were similar to those in the first simulation. The powers of linkage analysis using TS and ERS are plotted in Figures 1-3 for the three genetic models (recessive, additive and dominant) based on 100 replications. In each figure different combinations of allele frequency p and recombination fraction θ were used. The figures show that, with the same screening size and sample size, TS is slightly more powerful than ERS in most situations. As in the first simulation study, the small loss of power was due to the fact that ERS is an approximation of TS.

Figure 1. Powers of linkage analysis using TS and ERS under the recessive model.

Figure 2. Powers of linkage analysis using TS and ERS under the additive model.

Figure 3. Powers of linkage analysis using TS and ERS under the dominant model.

Discussion

Our study indicates that the extreme rank selection (ERS) procedure for selecting sib-pairs in genetic linkage analysis can be used as an alternative approach to the truncation selection (TS) method. Both approaches can be used to select extremely discordant sib-pairs and moderately discordant sib-pairs. When the cut-off point for TS, e.g. the top 10%, is known, the TS sample consists of independent observations. When the cut-off point is estimated by a pilot study and some of the samples in the top 10% in the pilot study are also included in the final analysis, the correlation among these samples is negligible. Likewise the ERS sample consists of independently selected observations, as only a single extreme sample is selected from every k samples. For selecting discordant sib-pairs, two oppositely discordant sib-pairs are selected from every $k^2$ samples; their correlation is also negligible.

The two methods, however, differ in design and analysis. When applying TS, the cut-off points for truncation need to be estimated and the screening size is random. This may cause difficulty in assessing cost-effectiveness at the design stage. For example, in Xu et al. (1999) more than 200,000 samples were ultimately screened to obtain the discordant sib-pairs. When planning such a study and estimating its cost, the power and sample size cannot be obtained unless the percentiles of the trait distribution in the population are known or estimated using a pilot study. In contrast, with ERS the screening size is fixed and there is no need to estimate the cut-off points. Furthermore, with ERS the selection of samples is based on the ranks of the trait values. Thus, the actual trait values are not always required for all samples screened. This approach is useful when a surrogate phenotype is available and/or when traits can be ranked easily without quantification, e.g. weight, height, the size of an infant's head, etc.

Acknowledgments

The authors would like to thank two anonymous referees for their thoughtful comments and helpful suggestions that greatly improved our presentation. The research of Zehua Chen was supported by National University of Singapore grant R-155-000-043-112. The research of Zhaohai Li was partially supported by grant EY014478 of The National Eye Institute.

References
