Traditionally in genetic case-control studies controls have been screened to exclude subjects with a personal history of illness. This control group has the advantage of optimal power to detect loci involved in illness, but requires more work and may incur substantial cost in recruitment. An alternative approach to screening is to use unscreened controls sampled from the general population. Such controls are generally plentiful and inexpensive, but in general there is a risk that some may have the same disease as the cases, which will reduce power to detect associations. We have quantified the extent of this power loss, and produced mathematical formulae for the number of unscreened controls necessary to achieve the same power as a fixed sample of screened controls. The effect of using unscreened controls will also depend on the ratio of the number of screened controls to cases specified in the original study design, and this is also investigated. We have also investigated the cost-benefits of the screened and unscreened approaches, according to variation in the relative costs of sampling screened and unscreened controls, together with genotyping costs. We have, thus, identified the range of situations in which using unscreened controls is a cost-effective alternative to the screened control method and could be considered when designing a study. In many of the typical, real-world situations in complex genetics, the use of unscreened controls is potentially cost-effective and can, in general, be considered for disorders with population prevalence Kp < 0.2. With the steady reduction in genotyping costs and the availability of common sets of “population controls” this design is likely to become increasingly cost effective.
In this paper we consider issues relating to the power and design of case-control studies. An approach that is often used is to screen controls to exclude subjects with a personal history of illness (or in some designs to exclude also those with a family history of illness or other perceived diathesis to illness). The resultant supernormal control group has the advantage of optimal power to detect loci involved in illness because (compared with general population frequencies) risk alleles are expected to occur at increased frequency in the proband group and decreased frequency in the control group. The disadvantages of supernormal controls are that they do not necessarily allow straightforward epidemiological inferences and require more work in recruitment. This latter issue has important implications. For many traits screening is expensive and/or time-consuming. For example, screening in psychiatric disorders typically requires a semi-structured interview and review of medical records. This can therefore add considerably to the cost of the study, or necessitate the collection of a smaller control sample than originally planned, thereby reducing power.
An alternative approach to screening is to use unscreened controls sampled from the general population. Such controls are usually plentiful, but in general there is a risk that some may have (or be at high risk for) the same disease as the cases (and thus more likely to possess the same disease-susceptibility genes), which will also reduce power to detect associations. The extent of this power loss will depend on the prevalence of the trait in the population, being largest when the prevalence is high. In many situations the power may, however, be recovered by typing a larger number of the unscreened controls. Previously, Savitz & Pearce (1988) looked at sources of bias when case ascertainment is incomplete; furthermore Wacholder et al. (1992), in a series of papers, addressed the issue of choosing controls so as to minimise bias. However, these studies take the point of view of minimising bias but make no attempt to quantify it. In contrast, the first aim of this paper is to quantify the extent of this power loss, and to produce mathematical formulae for the number of unscreened controls necessary to achieve the same power as a fixed sample of screened controls. The effect of using unscreened controls will also depend on the ratio of the number of screened controls to cases specified in the original study design, and this is also investigated. The second aim is to investigate the cost-benefits of the screened and unscreened approaches according to variation in the relative costs of sampling screened and unscreened controls, together with genotyping costs. We thus identify the range of situations in which using unscreened controls is a cost-effective alternative to the screened control method, and could be considered when designing a study.
Allele Frequencies for Unscreened Controls
We start by considering the standard case/control study design, in which both case and control subjects are screened, i.e. personal history of illness has been established (present for cases, absent for controls). The data are represented in the following 2×2 contingency table:

            Cases   Controls
Allele 1    a       b
Allele 2    c       d
Total       2n1     2n2

where a, b are the numbers of allele 1 and c, d are the numbers of allele 2 in cases and controls, respectively. Thus n1 is the number of cases and n2 is the number of controls, each individual contributing two alleles. Denoting the relative frequencies of allele 1 in the case and control samples by

$$p_1 = \frac{a}{a+c}, \qquad p_2 = \frac{b}{b+d}, \tag{1}$$

then the odds ratio can be written

$$OR = \frac{p_1[1-p_2]}{p_2[1-p_1]}. \tag{2}$$
The sample odds ratio is limited at the lower end, since it cannot be negative, but not at the upper end, and so has a skewed distribution. The natural logarithm of the odds ratio is symmetric, running from minus infinity to plus infinity, with zero at OR= 1, and has an approximately normal distribution. The variance of the natural logarithm of the odds ratio equals (see e.g. Armitage & Colton, 1998, p. 204)

$$\sigma^2 = \frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d} = \frac{1}{2n_1}\left[\frac{1}{p_1[1-p_1]} + \frac{1}{r\,p_2[1-p_2]}\right], \tag{3}$$

where r=n2/n1, the ratio of the number of controls to the number of cases.
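As a numerical illustration of the variance of ln(OR), the sketch below compares the count form 1/a + 1/b + 1/c + 1/d with the frequency form [1/(p1[1 − p1]) + 1/(r p2[1 − p2])]/(2n1). It assumes that every individual contributes two alleles, so that a + c = 2n1 and b + d = 2n2; the allele counts and the function names are hypothetical and chosen only for illustration.

```python
def var_log_or_counts(a, b, c, d):
    """Variance of ln(OR) computed from the four allele counts."""
    return 1 / a + 1 / b + 1 / c + 1 / d


def var_log_or_freqs(p1, p2, n1, r):
    """Same variance from allele frequencies.

    Assumes 2*n1 case alleles and 2*r*n1 control alleles (two per individual).
    """
    return (1 / (p1 * (1 - p1)) + 1 / (r * p2 * (1 - p2))) / (2 * n1)


# Hypothetical data: 300 cases and 300 controls, i.e. 600 alleles per group.
a, c = 400, 200  # allele 1 / allele 2 counts in cases
b, d = 300, 300  # allele 1 / allele 2 counts in controls

p1 = a / (a + c)
p2 = b / (b + d)
odds_ratio = p1 * (1 - p2) / (p2 * (1 - p1))

v_counts = var_log_or_counts(a, b, c, d)
v_freqs = var_log_or_freqs(p1, p2, n1=300, r=1.0)
print(odds_ratio, v_counts, v_freqs)
```

The two variance expressions agree because a = 2n1·p1, c = 2n1·[1 − p1], and similarly for the controls.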
Let us now consider a case/control study where the controls are recruited from a population without screening. Thus the control sample will include affected individuals with a relative frequency Kp, where Kp is the prevalence of the disease in the population. The relative frequency pu of allele 1 in the unscreened control sample can be calculated as the sum of the frequency of allele 1 for affected individuals, weighted by the prevalence Kp of the disease in the population, and the frequency of allele 1 for healthy individuals, weighted by the probability of being healthy, 1 −Kp. Thus the relative frequency of allele 1 in the control sample will be

$$p_u = K_p\,p_1 + [1-K_p]\,p_2 = [p_1-p_2]\,K_p + p_2, \tag{4}$$

which changes the odds ratio to

$$OR_u = \frac{p_1[1-p_u]}{p_u[1-p_1]}. \tag{5}$$
The odds ratio tends towards 1 with increasing Kp and ORu= 1 if Kp= 1. Note that the screened controls study design corresponds to Kp= 0 and then the ORu is expressed by (2).
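The attenuation of the odds ratio with increasing prevalence can be sketched numerically. The snippet below mixes the case and screened-control allele frequencies with weights Kp and 1 − Kp, as described in the text, and evaluates the resulting odds ratio; the function names are illustrative, and the values p1 = 2/3, p2 = 1/2 are the ones used elsewhere in the paper.

```python
def unscreened_allele_freq(p1, p2, kp):
    """Allele 1 frequency in unscreened controls: affected with weight kp, healthy with 1 - kp."""
    return kp * p1 + (1 - kp) * p2


def odds_ratio(p_case, p_control):
    """Allelic odds ratio of cases versus controls."""
    return p_case * (1 - p_control) / (p_control * (1 - p_case))


p1, p2 = 2 / 3, 0.5  # gives OR0 = 2 against screened controls
for kp in (0.0, 0.01, 0.05, 0.2, 1.0):
    pu = unscreened_allele_freq(p1, p2, kp)
    print(f"Kp={kp:.2f}  pu={pu:.4f}  ORu={odds_ratio(p1, pu):.4f}")
```

At Kp = 0 the screened-control odds ratio is recovered, and at Kp = 1 the controls are indistinguishable from the cases.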
Screened Versus Unscreened Controls Studies
Consider two case/control study designs: one with screened controls, the other with controls recruited from a population. Let us consider the situation in which we fix relevant variables (such as the number of available cases) in the screened controls situation and evaluate how the number of required unscreened controls changes with Kp to maintain the original power at the same test size. The screened controls study corresponds to Kp= 0. In order to compare the efficiency of these two studies, let us assume that p1 and p2 are known, and that n1 has the same fixed value for both studies (as in both studies the cases have to be phenotypically evaluated). Let us also fix r0=n2/n1 for the screened controls study. For given p1, p2, n1 and r0, the power 1 −β0 of this study at the fixed significance level α0 can be calculated using a standard result (see e.g. Armitage & Colton, 1998, p. 3895)

$$1-\beta = \Phi\!\left(\frac{\ln(OR)}{\sigma} - z_{\alpha/2}\right), \tag{6}$$

where σ² is the variance of the natural logarithm of the odds ratio (3) and zα/2 is the (1 −α/2)-quantile of the standard normal distribution Φ(x). Denote the ratio of the number of unscreened controls to the number of cases by ru=nu/n1. Let us now find ru for the unscreened controls study (i.e. with Kp > 0) such that this study has the same power 1 −β0 as the screened controls study at the same significance level α0. From formula (3) we obtain:

$$r_u = \frac{1}{p_u[1-p_u]\left[2n_1\sigma_u^2 - \dfrac{1}{p_1[1-p_1]}\right]},$$

where σu= ln (ORu)/(zβ+zα/2) can be derived from (6). Here the sum of critical values zα/2+zβ is defined by the power and significance level of the screened controls study, zα/2+zβ= ln (OR0)/σ0; OR0 and σ0² are the odds ratio and the variance of the natural logarithm of the odds ratio for Kp= 0. Calculating the variance in the screened controls study, σ0², from (3) we obtain:

$$\sigma_u^2 = \sigma_0^2\left[\frac{\ln(OR_u)}{\ln(OR_0)}\right]^2 = \frac{1}{2n_1}\left[\frac{1}{p_1[1-p_1]} + \frac{1}{r_0\,p_2[1-p_2]}\right]\left[\frac{\ln(OR_u)}{\ln(OR_0)}\right]^2.$$

Thus the expression for ru in terms of r0, OR0 and ORu is the following:

$$r_u = \frac{1}{p_u[1-p_u]\left\{\left[\dfrac{1}{p_1[1-p_1]} + \dfrac{1}{r_0\,p_2[1-p_2]}\right]\left[\dfrac{\ln(OR_u)}{\ln(OR_0)}\right]^2 - \dfrac{1}{p_1[1-p_1]}\right\}}. \tag{7}$$
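Finally, the relative increase in the number of unscreened, as compared to screened, controls needed to preserve the power of the test at the given significance level is given by the ratio

$$\frac{r_u}{r_0} = \frac{1}{r_0\,p_u[1-p_u]\left\{\left[\dfrac{1}{p_1[1-p_1]} + \dfrac{1}{r_0\,p_2[1-p_2]}\right]\left[\dfrac{\ln(OR_u)}{\ln(OR_0)}\right]^2 - \dfrac{1}{p_1[1-p_1]}\right\}}. \tag{8}$$

Note that ru/r0 depends on p1, p2, r0 and Kp, and does not depend on the number of cases n1.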
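The equal-power condition can be checked numerically. The sketch below is one way to compute ru/r0 by requiring ln(ORu)/σu = ln(OR0)/σ0, with the variance of ln(OR) taken as [1/(p1[1 − p1]) + 1/(r p[1 − p])]/(2n1). It assumes two alleles per individual and p1 > p2 (so OR0 > 1). The function names and the choice n1 = 100 are illustrative only, and `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist


def odds_ratio(p_case, p_control):
    return p_case * (1 - p_control) / (p_control * (1 - p_case))


def power(p_case, p_control, n1, r, alpha=0.05):
    """Approximate power of the ln(OR) test with n1 cases and r*n1 controls (assumes OR > 1)."""
    var = (1 / (p_case * (1 - p_case)) + 1 / (r * p_control * (1 - p_control))) / (2 * n1)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(math.log(odds_ratio(p_case, p_control)) / math.sqrt(var) - z_alpha)


def control_ratio(p1, p2, r0, kp):
    """ru/r0 needed for equal power; math.inf when no finite number of controls suffices."""
    pu = kp * p1 + (1 - kp) * p2
    shrink = math.log(odds_ratio(p1, pu)) / math.log(odds_ratio(p1, p2))
    bracket = 1 / (p1 * (1 - p1)) + 1 / (r0 * p2 * (1 - p2))
    denom = bracket * shrink ** 2 - 1 / (p1 * (1 - p1))
    if denom <= 0:
        return math.inf
    return 1 / (r0 * pu * (1 - pu) * denom)


p1, p2, r0, n1, kp = 2 / 3, 0.5, 1.0, 100, 0.05
ratio = control_ratio(p1, p2, r0, kp)
pu = kp * p1 + (1 - kp) * p2
power_screened = power(p1, p2, n1, r0)
power_unscreened = power(p1, pu, n1, ratio * r0)
print(ratio, power_screened, power_unscreened)
```

The two power values coincide by construction, and changing n1 alters the power but leaves the ratio unchanged, in line with the remark that ru/r0 does not depend on the number of cases.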
We note that one can use the χ2 test for 2 × 2 contingency tables instead of the OR to derive ru/r0, equating the values of the χ2 statistic for the screened and unscreened study designs (still using (4) for the number of alleles 1 and 2 for unscreened controls). The resulting quadratic equation can be solved for ru in terms of p1, p2, r0 and Kp. The corresponding expression for ru/r0, although considerably more complex than (8), gives very similar results (equations and data not shown). We continue with the OR statistic in the following.
Table 1 presents values of ru/r0 for the situation in which OR0= 2, p1= 0.67, p2= 0.5 for different r0 as Kp increases. Figure 1 shows the behaviour of ru/r0 for various values of p1 and p2 such that OR0= 2, and r0= 1. The behaviour of ru/r0 as a function of p1 and OR0 for r0= 1 and Kp= 0.01, 0.05 and 0.2 is shown in Table 2. When Kp is small, ru/r0 is almost constant with respect to both p1 and OR0; however, when Kp is large, ru/r0 increases slightly with p1 and decreases with increasing OR0. Table 2 also shows that the ratio depends strongly on Kp but is very robust with respect to misspecification of the allele frequencies. In the table, p1 varies from 0.1 to 0.5 and p2 varies from 0.036 to 0.465, according to the OR0; however, in this range ru/r0 only changes from 1.03 to 1.04 for Kp= 0.01. For larger Kp the change is slightly more pronounced, with values of ru/r0 between 2.4 and 3.56 for Kp= 0.2. Nevertheless, it is clear that the dependence of ru/r0 on Kp by far outweighs the dependence on p1 and p2.
Table 1. The ratio ru/r0 of unscreened to screened controls for different r0 (p1= 0.667, p2= 0.5, OR0= 2)
Table 2. The ratio ru/r0 of unscreened to screened controls for different Kp, OR0 and p1 (r0= 1)
As one can see from Table 1 and Figure 1, ru/r0 (and, consequently, nu, the number of unscreened controls) tends to infinity as Kp tends to 0.2824 when p1= 0.66(6), p2= 0.5 and r0= 1. Formula (9), obtained from (8) by setting the denominator to zero and using (5) to express ORu, gives the precise expression for the value of Kp for which the ratio becomes infinite:

$$K_p^* = \frac{p_u^* - p_2}{p_1 - p_2}, \qquad p_u^* = \frac{p_1}{p_1 + [1-p_1]\,OR_0^{\gamma}}, \qquad \gamma = \sqrt{\frac{r_0\,p_2[1-p_2]}{r_0\,p_2[1-p_2] + p_1[1-p_1]}}. \tag{9}$$

This value K*p is called the position of the pole of the function ru/r0.
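The position of the pole can be reproduced numerically. Setting the denominator of the equal-power expression to zero gives ln(ORu) = γ ln(OR0) with γ² = r0 p2[1 − p2] / (r0 p2[1 − p2] + p1[1 − p1]); solving ORu = p1[1 − pu]/(pu[1 − p1]) for pu and then pu = [p1 − p2]Kp + p2 for Kp yields the pole. The sketch below follows these steps; the function name and the symbol γ are illustrative and not taken from the paper.

```python
import math


def pole_prevalence(p1, p2, r0):
    """Prevalence at which the required number of unscreened controls becomes infinite."""
    v1 = p1 * (1 - p1)
    v2 = r0 * p2 * (1 - p2)
    gamma = math.sqrt(v2 / (v2 + v1))          # ln(ORu) / ln(OR0) at the pole
    or0 = p1 * (1 - p2) / (p2 * (1 - p1))      # odds ratio against screened controls
    pu_star = p1 / (p1 + (1 - p1) * or0 ** gamma)
    return (pu_star - p2) / (p1 - p2)


kp_star = pole_prevalence(2 / 3, 0.5, 1.0)
kp_star_r2 = pole_prevalence(2 / 3, 0.5, 2.0)
print(round(kp_star, 4), round(kp_star_r2, 4))
```

For p1 = 2/3, p2 = 1/2 and r0 = 1 this gives approximately 0.2824, the value quoted in the text, and a larger r0 moves the pole to a smaller prevalence.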
In practice, values of ru/r0 larger than some sizeable number R, say R= 5, are of no practical use. The question of for which Kp the ratio reaches a given level R gives rise to a transcendental equation which cannot be solved in explicit terms; however, these values can be calculated numerically, cf. Tables 1, 2. Alternatively, formula (10) gives an upper bound for Kp depending on the value of R for given p1, p2 and r0:
This estimate has been obtained from (8) by substituting R for ru/r0 and replacing pu[1 −pu] by either p1[1 −p1] or p2[1 −p2], whichever is smaller. Table 3 shows how K*p varies with r0 and OR0 for p2 fixed at 0.1 and 0.5. If Kp is larger than K*p, it is impossible to achieve the required power with unscreened controls. Figure 2 shows the drop in the power for different initial odds ratios when Kp becomes larger than the pole. For this figure the frequency p1 was fixed at 0.5, and the frequencies p2 were calculated correspondingly from the OR0.
Table 3. Values of Kp for which the number of unscreened controls tends to infinity, for various values of OR0 and r0; p2= 0.1 or 0.5, and p1 is calculated from OR0 and p2
For p2= 0.1: OR0= 1.15, p1= 0.11; OR0= 1.5, p1= 0.14; OR0= 2, p1= 0.18; OR0= 3, p1= 0.25
In practice, if the number of available cases is insufficient to achieve the required power when r0= 1, one would consider a design with r0 > 1. Figure 3 shows the effect of varying r0 on ru/r0 when p1= 0.667, p2= 0.5. It can be seen (Table 1, Figure 3) that as r0 increases, the curve of ru/r0 against Kp becomes steeper and the pole moves to the left.
Furthermore, we remark that the population prevalence Kp is usually known, but if it is not, one could estimate it by screening a number of randomly selected people from the population. Then a certain number of controls will already be screened, and we can use the method in this paper to determine the number of additional unscreened controls required. In this case formula (4) takes the form pν=[1 −ν][p1−p2]Kp+p2, where ν is the proportion of already screened controls ns among all controls (already screened and additional unscreened nu), i.e. ns=ν·[nu+ns] (both ν and nu are unknown). The corresponding ORν can be calculated by replacing pu in (5) by pν, and the ratio rν of the overall number of controls to cases can be calculated using (7), substituting pν and ORν for pu and ORu. The resulting expression for rν as a function of Kp and ν is very bulky and is not shown here. The parameter ν can then be calculated numerically (e.g. by the bisection method) as the solution of the equation

$$\nu\, r_\nu(K_p, \nu) = \frac{n_s}{n_1}.$$

Then, the additional number of unscreened controls nu is ns/ν−ns.
Design of Experiment and Cost Functions
We will consider only the costs associated with recruiting and genotyping controls, because the costs associated with recruiting and genotyping cases are the same in both situations. Denote by Rs the cost of recruiting one screened control individual and by Ru the cost of recruiting one unscreened control individual from the population. Let G be the cost of genotyping each individual. Fixing the same number of cases n1 for both study designs, the overall cost for the screened control individuals will be C1=r0n1[G+Rs], and the overall cost for unscreened control individuals will be C2=run1[G+Ru]. The ratio of these two costs (i.e. screened : unscreened) is

$$C = \frac{C_1}{C_2} = \frac{G+R_s}{[r_u/r_0]\,[G+R_u]}, \tag{11}$$

where ru/r0 is given by formula (8). C can be interpreted as an indicator showing when screened controls are less cost-effective than unscreened controls in achieving a given power. If C < 1, then the unscreened controls experiment costs more than the screened controls experiment; if C > 1 then the unscreened controls experiment costs less; and if C= 1 the cost of the experiments is the same. Assuming that screening controls is more expensive than recruiting unscreened controls from the population, one can represent Rs=Ru+Δ, where Δ is the cost of screening. Then, formula (11) can be rewritten

$$C = \frac{1+\delta}{r_u/r_0}, \qquad \delta = \frac{\Delta}{G+R_u}, \tag{12}$$

where δ is the relative surplus cost of screening (relative to the basic cost for unscreened control individuals). The borderline between decisions in favour of (C > 1) or against (C < 1) the unscreened controls experiment is C= 1, i.e.

$$\delta = \frac{r_u}{r_0} - 1, \tag{13}$$
or in details:
This formula is derived from (8) by replacing r0 with the expression obtained from formula (3):
(as defined above). For the design of an experiment, one would fix the α and β in advance: formula (14) then gives the required ratio between cases and screened controls.
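The cost comparison can be sketched in a few lines. The code below takes the cost ratio as C = [G + Rs]/([ru/r0][G + Ru]) and the relative surplus cost of screening as δ = [Rs − Ru]/[G + Ru], which is the definition that reproduces the δ values quoted for the genotyping scenarios in this paper. The function names and the example value ru/r0 = 1.25 are hypothetical.

```python
def screening_surcharge(g, rs, ru_cost):
    """Relative surplus cost of screening, delta = (Rs - Ru) / (G + Ru)."""
    return (rs - ru_cost) / (g + ru_cost)


def cost_ratio(g, rs, ru_cost, control_ratio):
    """Screened : unscreened cost ratio C for a given ru/r0 (control_ratio)."""
    return (g + rs) / (control_ratio * (g + ru_cost))


def breakeven_control_ratio(g, rs, ru_cost):
    """Largest ru/r0 at which unscreened controls are still no more expensive (C = 1)."""
    return 1 + screening_surcharge(g, rs, ru_cost)


# Hypothetical example: G = 10, Rs = 110, Ru = 10 and ru/r0 = 1.25.
delta = screening_surcharge(10.0, 110.0, 10.0)
c = cost_ratio(10.0, 110.0, 10.0, control_ratio=1.25)
limit = breakeven_control_ratio(10.0, 110.0, 10.0)
print(delta, c, limit)
```

With these inputs C is well above 1, so the unscreened design would be cheaper; it stays cheaper for any ru/r0 below 1 + δ.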
Figure 4 shows the relationship between δ and Kp for C= 1 and p1= 2/3, p2= 1/2 (OR0= 2). In order to decide which experiment is cheaper, one needs to know Kp and calculate δ according to real prices, then plot the point on Figure 4. If the point is above the curve (a) then the unscreened controls experiment is cheaper. If the point appears below the curve (b), then it is cheaper to screen controls. Note that the position of the pole of the function δ is the same as the pole of the function ru/r0, and when Kp is larger than the position of the pole, screened controls are always preferable unless one is willing to reduce the power of the test.
Let us consider a real-world example corresponding to Bipolar Disorder (population prevalence 1%). For this case the cost of recruiting and screening a control individual is of the order of Rs=£110. We assume that the cost of recruiting an unscreened control, Ru, is £10. The relative costs of using screened controls compared to those of using unscreened controls are presented in Table 4 for Kp= 0.01 and a range of OR0. We also consider examples corresponding to more prevalent disorders that would require a similar cost for screening controls: Kp= 0.05 (e.g. Panic Disorder), and Kp= 0.1 and Kp= 0.2 (the typical range of estimates of prevalence for Major Depression; the estimates in any population depend upon the stringency of definition of phenotype). We consider four genotyping scenarios:
1. G = £1 (e.g. 10 markers at 10p per genotype); this corresponds to a small study in an academic lab. Here the relative surplus cost of screening, δ (see formula (12)), is 9.1;
2. G = £10 (e.g. 100 markers at 10p per genotype); a moderate scale study in an academic lab (δ= 5);
3. G = £100 (e.g. 1000 markers at 10p per genotype or 10000 markers at 1p per genotype); an intermediate size of study (δ= 0.91);
4. G = £1000 (e.g. 100000 markers at 1p per genotype); this corresponds to a genome-wide association study with approximately 30kb spacing done by a high-throughput genotyping facility (δ= 0.099).
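The δ values quoted for the four scenarios can be reproduced from the stated prices, taking δ = [Rs − Ru]/[G + Ru] with Rs = £110 and Ru = £10. The sketch below also evaluates the cost ratio [1 + δ]/[ru/r0] for the Bipolar Disorder case, using ru/r0 = 1.04, the upper value reported in the text for Kp = 0.01; the variable names are illustrative.

```python
RS, RU = 110.0, 10.0          # cost of a screened / unscreened control (GBP)
CONTROL_RATIO_BIPOLAR = 1.04  # ru/r0 for Kp = 0.01, upper value quoted in the text

# Genotyping cost per individual (GBP) for scenarios 1 to 4.
scenarios = {1: 1.0, 2: 10.0, 3: 100.0, 4: 1000.0}

deltas = {k: (RS - RU) / (g + RU) for k, g in scenarios.items()}
cost_ratios = {k: (1 + d) / CONTROL_RATIO_BIPOLAR for k, d in deltas.items()}

for k in scenarios:
    print(f"scenario {k}: G={scenarios[k]:>6.0f}  delta={deltas[k]:.3f}  C={cost_ratios[k]:.2f}")
```

C stays above 1 in all four scenarios, but only marginally so for the genome-wide study, where genotyping dominates the cost per control.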
Table 4 illustrates the above discussion: the unscreened study is cheaper if 1 + δ > ru/r0 (cf. formula (13) and Table 2 for values of ru/r0). In general, unscreened studies are most cost-effective when Kp is small and the cost of genotyping is small relative to the cost of screening. For Bipolar Disorder Kp= 0.01, and the unscreened study is always cheaper than the screened one for the screening costs assumed here, although for the genome-wide study the differences are small.
Table 4. Ratio of screened vs unscreened study design costs for different Kp, OR0, p1 and cost of genotyping G (r0= 1).
For many traits it is difficult and/or expensive to collect appropriately matched and screened controls. Large population based studies may potentially provide large samples of controls that can be used in studies of multiple phenotypes, but these are unlikely to have been screened for all the diseases under study. An example is the UK 1958 birth cohort sample of approximately 10,000 individuals for whom there is a range of demographic data, and on whom immortalized cell lines are being established to provide a DNA resource that can be used as population controls in molecular genetic studies (http://www.cls.ioe.ac.uk/Cohort/Ncds/mainncds.htm).
We have considered the relative efficiency of using unscreened population controls in an unrelated case-control design with a fixed number of available disease cases, as compared with the situation in which controls screened to exclude individuals with a personal history of disease are used. This paper shows that the loss in power incurred by using unscreened controls depends on the prevalence of the trait in the population (denoted by Kp). If Kp is small, this power loss is small. As Kp increases, so does the power loss. For any given combination of odds ratio and allele frequency in cases, it is possible to calculate a quantity K*p such that, if Kp > K*p, it is impossible for a sample of unscreened controls, however large, to give the same power as the sample of screened controls for which the original study power calculation was made. The quantity K*p is typically above 0.25, suggesting that it is possible to achieve the required power using unscreened controls for all but the most common traits. If Kp < 0.1, using unscreened controls will usually require less than a doubling of sample size. Thus, in many situations, using unscreened controls may be cost-effective, particularly when costs of screening controls are high relative to those of recruiting unscreened controls and of genotyping. This may be the case when the screening process involves numerous clinical tests (e.g. stroke or epilepsy where brain scans or electroencephalography may be required in addition to biochemical tests) or complex interviews and questionnaires (e.g. psychiatric disorders). We note that the recruitment cost could alternatively be reduced by using the same screened control subjects for several studies, but this means that they need to be screened for all diseases in question at once, and therefore would require a high level of coordination between studies and research groups.
In general, ru/r0, the number of unscreened controls required to give equivalent power to one screened control, increases as the odds ratio of the associated allele (as measured in cases vs screened controls) decreases. Thus, screening controls appears to be more important when studying small genetic effects (OR < 2) with a high prevalence of the disease (Kp > 0.2). Nevertheless, Table 4 shows that the effect size has a much smaller impact on the cost efficiency of unscreened controls than the prevalence of the disease and the genotyping cost. Furthermore, the results in this paper show that ru/r0 increases with r0, the ratio of the number of screened controls to cases in the original study design. Thus, the larger the ratio of screened controls to cases in the original study design, the more important screening becomes in maintaining power. This has implications for studies where the number of cases is limited and power is increased by sampling multiple controls for each case. (This is intuitively obvious in that, if achieving acceptable power is highly dependent on the controls because of a restricted number of available cases, then it is important to ensure that each control used contributes maximal power itself.)
A concern with the case-comparison design is the possibility of spurious association caused by unsuspected population stratification (and thus inadequate matching), a particular problem in highly heterogeneous populations such as the USA. Falk & Rubinstein (1987) pointed out that the non-transmitted alleles of the parents of a singly ascertained proband represent a random sample of alleles from the population from which the proband was sampled and can, therefore, be used to construct a well-matched control sample. This neatly circumvents the problem of generating spurious association due to population stratification (although an ethnically homogeneous sample is still required to minimise risk of type 2 error due to population admixture). Over recent years, such family based association methods have gained popularity because of their robustness to stratification. Many methods of analysis have been, and continue to be, developed. One of the most popular methods is the transmission disequilibrium test (TDT) (Spielman et al. 1993) which uses a McNemar statistic to test for excess transmission of a marker allele to affected individuals, over and above that expected by chance. Several refinements of the approach have been described (Schaid & Sommer, 1994), as have other methods of analysis that use data from other family members such as siblings (Spielman & Ewens, 1996).
However, for diseases with onset in adulthood, such as Alzheimer's Disease, Bipolar Disorder, Unipolar Depression or Schizophrenia, family based samples are relatively difficult to collect because parents are frequently unavailable due to death or family breakup. Further, the stratification problem is much less marked in more homogeneous populations, such as those in Western Europe, and approaches are available to test for the presence of, and apply “genomic control” for, population stratification using markers unlinked to the locus being tested in the association study (e.g. Pritchard & Rosenberg, 1999; Hoggart et al. 2003). Case-comparison designs offer advantages over the family-based design in a number of areas including:
(a) the samples are much easier and less expensive to recruit;
(d) environmental risk factor co-actions can be explored, whereas only (non-linear) interactions can be studied with family-based designs (Schaid, 1999; Gauderman, 2002b).
We have addressed the issue of screened versus unscreened controls from the viewpoints of power and cost-effectiveness. There may, of course, be other issues that affect the choice of controls in a particular study. If population level inferences are required (such as estimates of effect size and population importance) then unscreened controls will be preferred. If it is important that extensive data are available on controls for environmental variables thought to be involved in the disease of interest (to facilitate analysis of gene-environment interactions) this requirement is likely to reduce the additional cost of screening, and make the screened design relatively more attractive.
One attraction of the use of a large, common population control sample for studies of a diverse set of phenotypes is that the controls are likely to be genotyped by a range of researchers for an increasingly large set of polymorphisms, which offers the possibility of reduced genotyping costs when studying polymorphisms for which genotypes are already available on the control set – in such instances a laboratory would need only to (re-)type a relatively small subset of the control sample to check the consistency and reliability of the genotyping between the study laboratory and the previous typings.
Whilst the power of a study can usually be maintained by using an increased number of unselected controls, the resulting estimate of the odds ratio will be biased towards the null hypothesis. To obtain unbiased estimates of the odds ratio from unscreened controls, it is necessary to have an estimate of Kp. However, uncertainty in this estimate will reduce the precision of the resulting estimate of the odds ratio, compared to that obtained from screened controls. If both screened and unscreened controls are available, one could estimate the odds ratio from each in turn. Since one would expect the odds ratio for a true effect to be larger when estimated from screened controls, the reverse might be taken as evidence that the observed association was a false-positive.
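The correction for this bias can be illustrated by inverting the mixing relation pu = Kp·p1 + [1 − Kp]·p2 to recover the screened-control allele frequency p2 from unscreened controls. The sketch below is a simplification under stated assumptions: Kp is taken as known exactly, the case allele frequency p1 is taken to equal that of affected individuals in the population, and sampling error is ignored. The function names are illustrative.

```python
def odds_ratio(p_case, p_control):
    return p_case * (1 - p_control) / (p_control * (1 - p_case))


def corrected_control_freq(pu, p1, kp):
    """Allele frequency in unaffected individuals, recovered from unscreened controls."""
    return (pu - kp * p1) / (1 - kp)


# Hypothetical true values: p1 = 2/3, p2 = 1/2 (OR = 2), prevalence Kp = 0.2.
p1, p2, kp = 2 / 3, 0.5, 0.2
pu = kp * p1 + (1 - kp) * p2              # what unscreened controls would show

or_naive = odds_ratio(p1, pu)             # biased towards 1
or_corrected = odds_ratio(p1, corrected_control_freq(pu, p1, kp))
print(or_naive, or_corrected)
```

In this noise-free example the naive estimate is 1.75 while the corrected estimate recovers the odds ratio of 2; with real data, uncertainty in Kp would carry through to the corrected estimate, as noted in the text.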
In summary, we have shown that for many real-world situations in complex genetics, the use of unscreened controls is potentially cost-effective and can, in general, be considered for disorders with population prevalence Kp < 0.2. With the steady reduction in genotyping costs and the availability of common sets of “population controls”, this design is likely to become increasingly cost-effective.