
Do low control response rates always affect the findings? Assessments of smoking and obesity in two Australian case-control studies of cancer

Correspondence to:
David Whiteman, Queensland Institute of Medical Research, PO Royal Brisbane Hospital, Queensland 4029. Fax: (07) 3845 3502; e-mail: David.Whiteman@qimr.edu.au

Abstract

Objective: Participation rates have been declining in case-control studies, particularly among controls, raising concerns about possible bias. Formal assessments of the effect of low participation on odds ratios (OR) are seldom presented, however. We sought to quantify possible bias using multiple imputation techniques.

Methods: Using data from two Australian case-control studies, we estimated the relative risks of oesophageal squamous cell carcinoma (OSCC) and adenocarcinoma (OAC), and serous ovarian cancer (SOC) associated with smoking and body mass index (BMI). We compared ORs observed using self-reported data from participating controls with ORs derived using imputed exposures for non-participating controls.

Results: Participating controls were less likely than non-participants to smoke currently. Smoking remained significantly associated with oesophageal cancer even under the most extreme assumption of smoking prevalence among non-participants (OSCC: observed OR 6.54, 95% CI 4.62-9.28; imputed OR 3.94, 95% CI 2.83-5.49; OAC: observed OR 2.69, 95% CI 1.87-3.85; imputed OR 1.58, 95% CI 1.13-2.22). For SOC, however, risks associated with smoking were attenuated to null under plausible assumptions about smoking among non-participants. BMI distributions were similar among participating and non-participating controls, and risk estimates were essentially unchanged.

Conclusion and implications: Bias is not an inevitable consequence of low control participation and depends on the association examined. Sensitivity analyses can assist in interpretation of results.

Population-based case-control designs are the most effective method to study rare diseases in a population. The validity of the relative risk estimates obtained from this design, however, rests on the crucial assumption that the participants truly represent the target populations from which the cases arise. Hence, high participation fractions among selected participants are necessary to produce reliable relative risk estimates for likely causal factors. Participation in community-based surveys has been declining over recent decades, casting doubt on the veracity of research findings, particularly for case-control studies.1–5 Non-response bias is a function of both the non-response rate and the difference between respondents and non-respondents in the prevalence of salient exposures.4,6 In general, non-participation in epidemiologic studies has been reported to be associated with characteristics such as socio-economic status, education, age, sex, smoking and alcohol consumption.7

There is a considerable literature dealing with missing data in longitudinal studies using multiple imputation techniques under a ‘missing at random’ assumption.8,9 While non-participation also affects case-control studies, methods to deal with it are not well developed. Unlike longitudinal studies, additional information is seldom available for non-participants in case-control studies. Moreover, those who do not participate are unlikely to be a random subset of the eligible population, and thus ‘missing at random’ assumptions are not applicable. Nonetheless, imputation approaches can be adopted by making plausible assumptions about the distributions of exposures of interest among non-participants, and then testing the sensitivity of risk estimates under increasingly ‘biased’ levels of non-participation.
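The dependence of non-response bias on both the non-response rate and the respondent/non-respondent difference can be made concrete for a simple prevalence estimate. The sketch below is an illustration only: the function name and the example figures are hypothetical and are not taken from either study. It relies on the identity that the prevalence in the full eligible sample is a participation-weighted average of the prevalences among respondents and non-respondents.

```python
def nonresponse_bias(response_rate, prev_respondents, prev_nonrespondents):
    """Bias of the respondent-only prevalence relative to the full eligible sample.

    Full-sample prevalence = r * p_resp + (1 - r) * p_nonresp, so
    p_resp - full-sample prevalence = (1 - r) * (p_resp - p_nonresp).
    """
    if not 0.0 <= response_rate <= 1.0:
        raise ValueError("response_rate must lie between 0 and 1")
    return (1.0 - response_rate) * (prev_respondents - prev_nonrespondents)


# Hypothetical figures: 40% participation; 12% of respondents and
# 20% of non-respondents are current smokers.
bias = nonresponse_bias(0.40, 0.12, 0.20)
print(f"bias in smoking prevalence: {bias:+.3f}")
```

The bias vanishes either when everyone responds or when respondents and non-respondents have the same exposure prevalence, which is why a low response rate alone does not determine the size of the bias.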

Here, we present the findings of such an approach using data from two large-scale case-control studies that investigated the causes of three cancers: squamous cell carcinoma (OSCC) and adenocarcinoma (OAC) of the oesophagus, and serous epithelial ovarian cancer (SOC). We explored the effects of potentially biased control participation on risk estimates associated with two common exposures, namely smoking and body mass index (BMI). We assumed that the participation fraction did not play a significant role among cases, given their vested interest in the study. In addition, no information on exposure prevalence among non-participating cases is yet available for analysis. Our main concern was to understand the potential impact of the greatly reduced rate of control participation in our study.

Materials and methods

Study design

We used data from two Australian case-control studies of oesophageal and ovarian cancer that were conducted in parallel by the same team over the same time period, using essentially identical recruitment strategies for cases and a common pool of controls.

Cases were Australian residents aged 18-79 years with a histologically confirmed diagnosis of primary OAC, OSCC or SOC made between July 1, 2002 (July 1, 2001 in the state of Queensland for oesophageal cancer cases) and June 30, 2005. Potentially eligible cases were identified by trained nurses in contact with clinics, physicians and state cancer registries throughout Australia. After first obtaining the permission of the treating doctor, nurses invited patients to participate in the studies.

Potential controls were randomly selected from the Commonwealth electoral roll (registration is compulsory), frequency matched by age (five-year age groups), sex and state of residence to the case series. Electoral roll recruitment has been validated previously.10 Considerable efforts were made to locate and recruit potential controls. Invitation letters were personally addressed and individually signed by a study investigator. Where possible, letters of invitation were followed up by a telephone call, with repeated attempts (up to five) at different times. Otherwise a second letter was sent to the address. If no confirmed contact occurred after all attempts, candidates were grouped as ‘uncontactable’. If the address or the telephone number was confirmed but no response was received, candidates were classed as passive refusals and grouped with those who declined. As for cases, potential controls who were subsequently found to be outside the eligible age range or who had a previous diagnosis of cancer were considered ineligible. Potential cases and controls who were otherwise eligible at the time of selection, but did not speak English, were gravely ill, mentally incompetent, had died or were out of the country at the time of interview were excluded. Controls were classified into five categories as follows: ineligible, uncontactable, excluded, declined and participated. For analysis, non-participants were all those classed as uncontactable, excluded or declined.

Data sources

Data were collected from participants through structured, self-completed questionnaires followed by standard telephone interviews conducted by trained research nurses. Information about participants' social background, including education level, marital status and income, was collected. We calculated body mass index (BMI) by dividing weight in kilograms one year ago (or one year before diagnosis for cases) by the square of height in metres. Continuous BMI measures were then categorised into standard categories of <18.5, 18.5-24.9, 25.0-29.9, 30.0-34.9, 35.0-39.9 and 40+ kg/m2. Participants were asked whether they had ever smoked more than 100 cigarettes/cigars/pipes in their lifetime; positive responses elicited further questions about how much they usually smoked on a typical day, how many years they had smoked, the age at which they started smoking and, if they had stopped smoking permanently, the age at which they stopped. Participants were classified as never, past or current smokers based on their smoking status one year before diagnosis (for cases) or the interview (for controls).
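The BMI derivation described above can be sketched as follows. This is a minimal illustration, not the study's own code: the function names are hypothetical, and the handling of values that fall between the printed category limits (for example 24.95) is an assumption, implemented here with exclusive upper cut-points at 18.5, 25, 30, 35 and 40 kg/m2.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by the square of height in metres."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Assign a BMI value (kg/m^2) to one of the six categories used in the analysis."""
    # Upper bounds are exclusive (an assumption): 24.95 falls in "18.5-24.9"
    # and exactly 25.0 falls in "25.0-29.9".
    cut_points = [
        (18.5, "<18.5"),
        (25.0, "18.5-24.9"),
        (30.0, "25.0-29.9"),
        (35.0, "30.0-34.9"),
        (40.0, "35.0-39.9"),
    ]
    for upper, label in cut_points:
        if value < upper:
            return label
    return "40+"


# Example: a person weighing 70 kg who is 1.75 m tall.
print(bmi_category(bmi(70, 1.75)))
```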

National health survey

As a nationwide case-control study in Australia, we intended our controls to be an unbiased sample of the Australian population. To obtain ‘gold-standard’ estimates of the distribution of education, income, smoking and BMI in the Australian population, we used data from the National Health Survey (NHS) conducted by the Australian Bureau of Statistics in 2004/5,11 described elsewhere.12 Briefly, personal interviews were conducted with primary respondents identified using a stratified multistage area sample of private dwellings. A weighting scheme was used to enable the respondent data to be expanded to provide estimates relating to the whole population. In 2004/5, the NHS comprised 18,550 individuals aged 18-79 years (overall response rate 89%).12

Statistical analysis

We first compared basic demographic predictors between participants and non-participants in the control series including age (in five-year age groups), sex, state of residence and the Socio-Economic Indexes for Areas (SEIFA) advantage/disadvantage score (a measure of socio-economic status based on residential postcode).12

To evaluate the bias in our estimates, we compared the prevalence estimates of various health characteristics between NHS participants and our control series. We then assessed the potential effect of non-participation on inference by comparing risk estimates for two key exposures, namely smoking and BMI, derived from study participants with estimates derived after assigning exposure prevalence for non-participants under varying assumptions. We did these analyses firstly for OAC and OSCC, both of which have been associated previously with smoking and BMI.13–15 We then compared the results to risk estimates for SOC, which has not previously been strongly linked to either exposure.16,17

Imputation procedure

Imputation of exposure data for non-participants was based on the probability distribution of the same exposure in the reference population. Figure 1 shows the step-by-step detail of the imputation procedure. Briefly, the probability of being in the kth category (πk) of an exposure with K categories was derived for each stratum of age and sex. Then a random number from the uniform distribution U[0, 1] was drawn for each non-participant in the study. Starting with the first category, individuals with a drawn random number less than or equal to the probability of being in the first exposure category (π1) in the reference data were assigned to the first exposure category. For those not allocated to the first category, a second random number was drawn and, if the drawn number was less than or equal to the probability of being in the second category given exclusion from the first (i.e. π2/(1-π1)), the individual was assigned to the second exposure category. The loop continued in the same way, at each step using the probability of the current category conditional on not belonging to any earlier category, until K-1 categories had been considered. Individuals not assigned to a category after the (K-1)th step were automatically assigned to the Kth exposure category.

Figure 1.

Flow chart for imputing exposure data for non-participating controls.

Note: Diamond shapes represent queries that follow decisions and rectangles represent actions that were taken after the decisions were made. The loop starts with setting k=1 and ends at k=K.

For example, assume that among females aged 40-44 years in the general population, it is known that the proportion of current, former and never smokers (k: 1=current, 2=former, 3=never smokers) is 0.24, 0.29 and 0.47 respectively. For an observation identified in our study sample with a missing value for smoking from a woman aged 40-44 years, we can impute a smoking value using the following algorithm. First, we initiate the process by setting k = 1 (current smoker). Next, we assign a random number between 0 and 1 to the missing observation. If the assigned random number is smaller than or equal to 0.24 (the proportion of current smokers), then we impute the missing smoking value as ‘1’ (i.e. current smoker) and proceed to the next record. Otherwise, if the random number is greater than 0.24, then we temporarily set k=2 (former smoker) and assign another random number to that observation. If this second random number is smaller than or equal to 0.29/ (1-0.24) (the proportion of former smokers among the combined group of former and never smokers), then we impute the missing value as ‘2’ (i.e. former smoker) and proceed to the next record. Otherwise, if the second random number is greater than 0.29/ (1-0.24), then we impute the value ‘3’ (i.e. never smoker). This process will randomly allocate the value 1 to approximately 24% of missing smoking values among women aged 40-44 years, value 2 to 29% and value 3 to 47% as would be expected if the women in that age group were sampled at random from the general population. This same process is performed for each missing value using age- and sex-specific proportions.
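The sequential procedure and the worked example above can be sketched in a few lines. This is an illustrative sketch rather than the code used in the study: the function name, the dictionary keyed by sex and age group, and the random seed are assumptions introduced here.

```python
import random


def impute_category(probs, rng):
    """Draw one exposure category (1..K) by the sequential procedure.

    probs: reference proportions (pi_1, ..., pi_K) for one age-sex stratum.
    rng:   a random.Random instance supplying uniform draws on [0, 1).
    """
    remaining = 1.0  # 1 - pi_1 - ... - pi_(k-1): mass of categories not yet ruled out
    for k, pi_k in enumerate(probs[:-1], start=1):
        # Probability of category k given that categories 1..k-1 were not assigned.
        threshold = pi_k / remaining if remaining > 0 else 1.0
        if rng.random() <= threshold:  # a fresh random number at every step
            return k
        remaining -= pi_k
    return len(probs)  # not assigned after K-1 steps: category K


# Worked example from the text: women aged 40-44 years,
# 1 = current, 2 = former, 3 = never smoker.
reference = {("F", "40-44"): (0.24, 0.29, 0.47)}

rng = random.Random(2024)  # arbitrary seed so that the sketch is reproducible
draws = [impute_category(reference[("F", "40-44")], rng) for _ in range(100_000)]
shares = {k: draws.count(k) / len(draws) for k in (1, 2, 3)}
print(shares)
```

With a large number of draws the printed shares should lie close to 0.24, 0.29 and 0.47, which is the property the procedure is designed to have.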

The observed (for participants) and the imputed data (for non-participants) were then combined to form a complete dataset. Given the higher proportion of non-participants, 40 imputations under each assumption were needed to provide a stable estimate of the standard error due to imputations.9 In total, 40 complete datasets were created for analysis under each assumption.
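Because each assumption yields 40 completed datasets, the 40 sets of estimates have to be combined into a single estimate with a standard error that reflects both within- and between-imputation variability. A common way to do this is Rubin's rules; the sketch below assumes that rule and, as a simplification, a normal rather than a t reference distribution for the confidence interval. The function name and the example inputs are hypothetical.

```python
import math
from statistics import mean, variance


def pool_log_odds_ratios(log_ors, std_errors, z=1.96):
    """Combine per-imputation log odds ratios with Rubin's rules.

    log_ors:    log(OR) estimated from each completed dataset.
    std_errors: the matching standard errors of log(OR).
    Returns (pooled OR, lower confidence limit, upper confidence limit).
    """
    m = len(log_ors)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")
    q_bar = mean(log_ors)                          # pooled log odds ratio
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(log_ors)                    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between         # total variance of the pooled estimate
    half_width = z * math.sqrt(total)
    return math.exp(q_bar), math.exp(q_bar - half_width), math.exp(q_bar + half_width)


# Hypothetical estimates from three completed datasets.
example_log_ors = [math.log(x) for x in (5.4, 5.6, 5.7)]
example_ses = [0.17, 0.17, 0.17]
print(pool_log_odds_ratios(example_log_ors, example_ses))
```

Pooling is done on the log scale because the sampling distribution of log(OR) is closer to normal than that of the OR itself.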

We imputed exposures assuming firstly that the distribution of each exposure among non-participants was the same as for study participants. We then assumed increasingly more extreme distributions of exposure among non-participants to model the maximum likely impact of non-participation on risk estimates (Tables 3 and 4). Finally, the last round of imputation was performed assuming the probability distribution of exposure based on age and sex for all controls (including participants) was the same as the NHS (i.e. ignoring reported data for participants).

Table 3. Odds ratios associated with smoking for oesophageal squamous cell carcinoma (SCC), oesophageal adenocarcinoma and serous ovarian cancer under different assumptions of smoking prevalence among non-participants.

| Assumption | Smoking status | Oesophageal SCC (n = 306)^a, OR (95% CI)^b | Oesophageal adenocarcinoma (n = 357)^a, OR (95% CI)^b | Serous ovarian cancer (n = 631)^a, OR (95% CI)^b |
|---|---|---|---|---|
| Observed data from participating controls only | Never | 1.0 (ref) | 1.0 (ref) | 1.0 (ref) |
| | Former | 2.07 (1.53, 2.82) | 1.83 (1.38, 2.44) | 0.93 (0.75, 1.15) |
| | Current | 6.54 (4.62, 9.28) | 2.69 (1.87, 3.85) | 1.62 (1.21, 2.17) |
| Assumption 1: Distribution of smoking among non-participating controls is the same as participating controls | Never | 1.0 (ref) | 1.0 (ref) | 1.0 (ref) |
| | Former | 2.05 (1.52, 2.77) | 1.79 (1.36, 2.35) | 0.91 (0.74, 1.12) |
| | Current | 6.39 (4.57, 8.93) | 2.55 (1.81, 3.59) | 1.58 (1.21, 2.07) |
| Assumption 2: Distribution of smoking among non-participating controls is the same as NHS distribution | Never | 1.0 (ref) | 1.0 (ref) | 1.0 (ref) |
| | Former | 2.23 (1.66, 3.01) | 2.03 (1.54, 2.68) | 0.97 (0.79, 1.19) |
| | Current | 5.56 (3.99, 7.75) | 2.46 (1.73, 3.48) | 1.25 (0.96, 1.62) |
| Assumption 3: Non-participating controls are two times more likely to be current smokers than participant controls | Never | 1.0 (ref) | 1.0 (ref) | 1.0 (ref) |
| | Former | 2.42 (1.79, 3.26) | 1.58 (1.13, 2.22) | 1.13 (0.92, 1.40) |
| | Current | 3.94 (2.83, 5.49) | 2.20 (1.67, 2.90) | 0.97 (0.74, 1.26) |
| Assumption 4: All potential controls (including non-participants) have same distribution of smoking as NHS | Never | 1.0 (ref) | 1.0 (ref) | 1.0 (ref) |
| | Former | 2.34 (1.73, 3.16) | 2.15 (1.63, 2.85) | 1.01 (0.82, 1.24) |
| | Current | 5.17 (3.72, 7.19) | 2.40 (1.69, 3.39) | 1.09 (0.83, 1.42) |

Notes: a) Number of cases with full information for smoking. b) Odds ratio and 95% confidence interval adjusted for age and sex.

Table 4. Odds ratios associated with BMI for squamous cell carcinoma (OSCC) and adenocarcinoma (OAC) of the oesophagus and serous ovarian cancer (SOC) under different assumptions of BMI prevalence among non-participants.

| Assumption | BMI status | Oesophageal squamous cell carcinoma (n = 294)^a, OR (95% CI)^b | Oesophageal adenocarcinoma (n = 344)^a, OR (95% CI)^b | Serous ovarian cancer (n = 583)^a, OR (95% CI)^b |
|---|---|---|---|---|
| Observed data from participant controls only | <18.5 | 2.36 (1.21, 4.39) | 0.43 (0.02, 2.2) | 0.82 (0.38, 1.66) |
| | 18.5-24.9 | 1.0 (ref) | 1.0 (ref) | 1.0 (ref) |
| | 25.0-29.9 | 0.42 (0.31, 0.56) | 1.44 (1.06, 1.99) | 1.03 (0.83, 1.29) |
| | 30.0-34.9 | 0.41 (0.26, 0.61) | 2.79 (1.94, 4.01) | 0.66 (0.48, 0.90) |
| | 35.0-39.9 | 0.72 (0.38, 1.29) | 3.71 (2.08, 6.49) | 0.83 (0.53, 1.27) |
| | 40+ | 0.28 (0.05, 0.92) | 6.47 (2.99, 13.61) | 0.68 (0.35, 1.25) |
| Assumption 1: Distribution of BMI among non-participating controls is the same as participating controls | <18.5 | 2.13 (1.17, 3.89) | 0.35 (0.05, 2.64) | 0.85 (0.42, 1.71) |
| | 18.5-24.9 | 1.0 (ref) | 1.0 (ref) | 1.0 (ref) |
| | 25.0-29.9 | 0.42 (0.32, 0.56) | 1.45 (1.07, 1.98) | 1.02 (0.83, 1.26) |
| | 30.0-34.9 | 0.41 (0.27, 0.62) | 2.82 (2.0, 3.98) | 0.65 (0.48, 0.88) |
| | 35.0-39.9 | 0.73 (0.40, 1.32) | 3.80 (2.21, 6.54) | 0.83 (0.55, 1.25) |
| | 40+ | 0.28 (0.07, 1.16) | 6.01 (2.97, 12.19) | 0.68 (0.37, 1.26) |
| Assumption 2: Distribution of BMI among non-participating controls is the same as NHS distribution | <18.5 | 2.33 (1.28, 4.25) | 0.52 (0.07, 3.94) | 0.70 (0.35, 1.39) |
| | 18.5-24.9 | 1.0 (ref) | 1.0 (ref) | 1.0 (ref) |
| | 25.0-29.9 | 0.45 (0.34, 0.59) | 1.55 (1.14, 2.10) | 1.02 (0.83, 1.26) |
| | 30.0-34.9 | 0.41 (0.27, 0.63) | 2.76 (1.97, 3.88) | 0.68 (0.51, 0.92) |
| | 35.0-39.9 | 0.77 (0.43, 1.40) | 3.53 (2.07, 6.01) | 0.95 (0.62, 1.44) |
| | 40+ | 0.34 (0.08, 1.40) | 6.86 (3.43, 13.73) | 0.92 (0.50, 1.71) |
| Assumption 3: All potential controls (including non-participants) have same distribution of BMI as NHS | <18.5 | 2.37 (1.30, 4.31) | 0.62 (0.08, 4.75) | 0.65 (0.32, 1.29) |
| | 18.5-24.9 | 1.0 (ref) | 1.0 (ref) | 1.0 (ref) |
| | 25.0-29.9 | 0.47 (0.36, 0.63) | 1.65 (1.22, 2.24) | 1.04 (0.84, 1.29) |
| | 30.0-34.9 | 0.43 (0.28, 0.65) | 2.84 (2.0, 4.03) | 0.71 (0.53, 0.97) |
| | 35.0-39.9 | 0.83 (0.45, 1.51) | 3.59 (2.09, 6.16) | 1.05 (0.68, 1.63) |
| | 40+ | 0.39 (0.09, 1.65) | 7.68 (3.48, 16.97) | 1.17 (0.62, 2.23) |

Notes: a) Number of cases with full information for BMI. b) Odds ratio and 95% confidence interval adjusted for age and sex.

Logistic regression analyses adjusting for age and sex were performed firstly for the dataset restricted to actual participants (hereafter ‘observed data’), and subsequently for each of the datasets that combined observed data from participants and imputed data from non-participants (hereafter ‘imputed data’). We obtained odds ratios (OR) and 95% confidence intervals (95% CI) as estimates of the relative risks of cancer associated with smoking and BMI by summarising the odds ratios from the 40 imputed datasets, adjusting for within- and between-imputation variance estimates.8 Finally, we estimated the likely degree of bias by calculating the relative difference in excess risk estimates (where the excess risk is that over and above the reference value of 1.0) due to non-participation as:

[Observed excess OR – Imputed excess OR] / [Imputed excess OR]

Where,

   Observed excess OR= (OR estimated using observed data – 1)

   Imputed excess OR= (OR estimated using imputed data – 1)
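As a worked check of this definition, the sketch below evaluates it for current smoking and OSCC using two odds ratios from Table 3. The function name and the `relative_to` argument are hypothetical. The default follows the formula exactly as written (dividing by the imputed excess OR); the alternative, dividing by the observed excess OR, is included only because the attenuation percentages quoted in the Results appear to be closer to that version, which is an inference from the published figures rather than something the text states.

```python
def relative_difference_in_excess_risk(observed_or, imputed_or, relative_to="imputed"):
    """(observed excess OR - imputed excess OR) divided by a reference excess OR.

    relative_to="imputed" follows the formula as written in the text;
    relative_to="observed" divides by the observed excess OR instead.
    """
    observed_excess = observed_or - 1.0
    imputed_excess = imputed_or - 1.0
    if relative_to == "imputed":
        denominator = imputed_excess
    elif relative_to == "observed":
        denominator = observed_excess
    else:
        raise ValueError("relative_to must be 'imputed' or 'observed'")
    if denominator == 0:
        raise ZeroDivisionError("the reference OR is exactly 1.0, so there is no excess risk")
    return (observed_excess - imputed_excess) / denominator


# Current smoking and OSCC (Table 3): observed OR 6.54; OR 5.56 when
# non-participants are given the NHS smoking distribution.
print(round(relative_difference_in_excess_risk(6.54, 5.56), 3))              # 0.215
print(round(relative_difference_in_excess_risk(6.54, 5.56, "observed"), 3))  # 0.177
```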

Results

We collected questionnaire information for 367 people with OAC, 309 people with OSCC and 683 women with SOC. The overall participation rates among those invited were 70% for oesophageal cancer cases and 87% for ovarian cancer cases. No exposure information was available for the cases who did not participate.

Of 7,017 people randomly selected from the electoral roll, fewer than 1% (44) were ineligible and 1,256 (18%) were uncontactable. Two hundred and eighty-four (4%) potentially eligible controls who were contacted were excluded on grounds of language or ill health, and 2,784 (37%) declined the initial invitation to take part or never returned a questionnaire. In total, 2,649 questionnaires were returned (Figure 2).

Figure 2.

Flow chart depicting control recruitment for the combined studies of oesophageal and ovarian cancer.

Predictors of uncontactability among all eligible controls

Compared with potential controls who were contactable (i.e. excluded + declined + participated), those who could not be contacted were more likely to be younger and reside in localities with lower socio-economic indicators (Table 1).

Table 1. Participation status among study controls, by socio-demographic characteristics, in case-control studies of oesophageal and ovarian cancers 2002-05.

| Characteristic | No contacts N (%) | Exclusions N (%) | Refusals N (%) | Participants N (%) | Total N (%) |
|---|---|---|---|---|---|
| Gender | | | | | |
| Female | 817 (65.1) | 174 (61.3) | 1,814 (65.2) | 1,611 (60.8) | 4,441 (63.3) |
| Male | 439 (34.9) | 110 (38.7) | 969 (34.8) | 1,038 (39.2) | 2,575 (36.7) |
| P value^a | 0.98 | 0.08 | <0.001 | | |
| Age group (years) | | | | | |
| <30 | 106 (8.4) | 1 (0.4) | 57 (2.1) | 48 (1.8) | 212 (3.0) |
| 30-39 | 150 (11.9) | 5 (1.8) | 154 (5.5) | 135 (5.1) | 444 (6.3) |
| 40-49 | 225 (17.9) | 24 (8.5) | 331 (11.9) | 374 (14.1) | 955 (13.6) |
| 50-59 | 358 (28.5) | 48 (16.9) | 689 (24.8) | 745 (28.1) | 1,843 (26.3) |
| 60-69 | 259 (20.6) | 82 (28.9) | 782 (28.1) | 827 (31.2) | 1,965 (28.0) |
| 70-79 | 158 (12.6) | 124 (43.7) | 771 (27.7) | 520 (19.6) | 1,598 (22.8) |
| P value^a | <0.001 | <0.001 | <0.001 | | |
| State | | | | | |
| NSW | 412 (32.8) | 97 (34.2) | 822 (29.5) | 637 (24.1) | 1,981 (28.2) |
| QLD | 364 (29.0) | 51 (18.0) | 774 (27.8) | 887 (33.5) | 2,091 (29.8) |
| SA | 143 (11.4) | 28 (9.9) | 326 (11.7) | 293 (11.1) | 793 (11.3) |
| TAS | 8 (0.6) | 2 (0.7) | 48 (1.7) | 55 (2.1) | 117 (1.7) |
| VIC | 230 (18.3) | 84 (29.6) | 551 (19.8) | 489 (18.5) | 1,362 (19.4) |
| WA | 99 (7.9) | 22 (7.8) | 263 (9.5) | 288 (10.9) | 673 (9.6) |
| P value^a | <0.001 | <0.001 | <0.001 | | |
| SEIFA^b score, mean (SE) | 999.2 (2.1) | 981.4 (4.9) | 1,003.7 (1.3) | 1,010.4 (1.3) | 1,004.4 (0.83) |
| P value^a | <0.001 | <0.001 | <0.001 | | |

Notes: a) P values comparing each of the different classes of non-participating controls with participating controls (each one adjusted for the other variables in the table). b) SEIFA score is a relative measure of social disadvantage based on residential postal code as defined by the Australian Bureau of Statistics.31

Predictors of declined invitation among contactable controls

Among those potential controls who were contacted and not excluded, women were more likely to decline overall than men (Table 1). However, age-specific refusal rates differed between the sexes. Older women were more likely to decline than younger women, whereas younger men were more likely to decline than older men. Participation proportions varied significantly across the states of Australia, with the lowest participation in the most populous and urbanised state, New South Wales. Potential controls who declined tended to reside in localities with lower socio-economic indices than those who participated.

Comparison of exposure distribution between controls and NHS population

Distributions of socio-demographic factors and salient exposures among study control participants and the NHS population (directly age adjusted to the control distribution) are presented for males and females in Table 2. Overall, study control participants were more likely than the NHS participants to be Australian born (reflecting the electoral roll sampling frame and the language restrictions on participation), to have post-school qualifications, to be married and to be ex-smokers. Participating controls in our study were more likely to have a BMI in the normal range than NHS participants; however, the difference was small, especially for older age groups. Overall, study control participants were generally better educated and had better health status than those surveyed for the NHS.
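Direct age adjustment of the NHS figures to the control distribution amounts to re-weighting the NHS age-specific proportions by the age structure of the participating controls. The sketch below illustrates the calculation. The function name is hypothetical, the weights are the participant counts by age group from Table 1 (both sexes combined, whereas Table 2 is standardised separately for males and females), and the NHS age-specific proportions are invented purely for illustration.

```python
def directly_standardised_proportion(stratum_proportions, standard_counts):
    """Weight stratum-specific proportions by a standard population's age structure.

    stratum_proportions: {age_group: proportion with the characteristic} in the
                         survey being standardised (here, the NHS).
    standard_counts:     {age_group: number of people} in the standard population
                         (here, the participating controls).
    """
    total = sum(standard_counts.values())
    if total == 0:
        raise ValueError("standard population is empty")
    return sum(
        stratum_proportions[age] * count / total
        for age, count in standard_counts.items()
    )


# Participating controls by age group (Table 1).
control_counts = {"<30": 48, "30-39": 135, "40-49": 374,
                  "50-59": 745, "60-69": 827, "70-79": 520}
# Hypothetical NHS age-specific proportions of current smokers (not real data).
nhs_current_smokers = {"<30": 0.30, "30-39": 0.26, "40-49": 0.22,
                       "50-59": 0.18, "60-69": 0.12, "70-79": 0.08}
print(round(directly_standardised_proportion(nhs_current_smokers, control_counts), 3))
```

Standardising in this way removes differences between the two samples that arise only because the controls, being frequency matched to cancer cases, are older than the general population.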

Table 2. Demographic characteristics of control participants in case-control studies of oesophageal and ovarian cancer (ACS) and the Australian National Health Survey 2004/05 (NHS).

| Variables | ACS Male (%) | ACS Female (%) | NHS^a Male (%) | NHS^a Female (%) |
|---|---|---|---|---|
| Australian born | 74.4 | 78.4 | 64.2 | 57.0 |
| Education | | | | |
| None post school | 36.0 | 49.9 | 45.9 | 56.4 |
| Technical college/Trade certificate | 47.5 | 36.2 | 40.6 | 29.3 |
| University | 16.4 | 13.9 | 13.5 | 14.2 |
| Marital status | | | | |
| Married/de facto | 86.1 | 76.9 | 75.8 | 63.8 |
| Divorced/Separated | 5.8 | 9.5 | 10.9 | 14.8 |
| Widowed | 4.1 | 9.7 | 4.3 | 12.2 |
| Never married | 4.1 | 3.9 | 9.1 | 9.2 |
| Smoking | | | | |
| Current smokers | 13.3 | 10.7 | 18.5 | 15.8 |
| Ex-smokers | 48.7 | 28.8 | 45.7 | 25.3 |
| Never smokers | 37.9 | 60.5 | 35.9 | 56.7 |
| Age started smoking (years) | | | | |
| 17 or less | 66.2 | 52.1 | 61.2 | 46.1 |
| 18-25 | 31.3 | 42.2 | 35.3 | 42.9 |
| 25-29 | 1.0 | 2.1 | 1.4 | 3.5 |
| 30+ | 1.3 | 3.7 | 2.2 | 7.6 |
| Body mass index | | | | |
| <25 | 32.9 | 46.0 | 34.8 | 48.0 |
| 25-29.99 | 50.0 | 30.7 | 44.7 | 31.6 |
| 30-39.99 | 16.2 | 20.5 | 19.4 | 18.7 |
| 40+ | 1.0 | 2.8 | 1.0 | 1.9 |
| Had hysterectomy | | 24.2 | | 23.9 |
| Oral contraceptive^b | | 86.7 | | 76.6 |

Notes: a) Proportions of male and female NHS participants are directly age standardised to the control distribution so that the two proportions are comparable. b) Data were only available for women aged 59 years or less in the NHS.

To assess the extent to which non-participation might lead to error we investigated the changes in risk estimates associated with smoking and BMI under different assumptions of the prevalence of each exposure in non-participants (Tables 3 and 4).

Smoking

Under the first model, smoking data for non-participants were imputed assuming the same distribution as control participants. As expected, this model generated ORs almost identical to those from the analyses restricted to actual study participants (Table 3). The next model assumed non-participating controls had the same distribution for smoking as NHS participants, and this attenuated the excess risk estimates for current smoking for oesophageal and ovarian cancers (relative reductions in excess risk of 18% for OSCC, 14% for OAC and 60% for SOC). In a third model, we assumed that potential controls who did not participate had similar proportions of non-smokers but a twofold greater proportion of current smokers than the participating controls. Relative risk estimates for current smoking under this extreme scenario were further attenuated for oesophageal cancer (relative reductions in excess risk of 48% for OSCC and 66% for OAC) and the association was completely abolished for SOC (Table 3). Finally, we calculated the odds ratio assuming that our original control sample had exactly the same distribution of smoking as the NHS. Under this assumption, risks of OSCC and OAC were modestly attenuated among current smokers, but modestly increased among former smokers. For both types of oesophageal cancer, the association with smoking remained strong and highly significant; however, the association between current smoking and SOC was very weak and no longer statistically significant (Table 3).

Body mass index

Estimates of cancer risk assuming various distributions of BMI among non-participants are presented in Table 4. While some variations in the magnitude of the association between BMI and the various cancer outcomes were observed, changes in risk estimates were modest for all three cancers under all of the assumptions tested.

Discussion

Case-control studies are efficient designs for investigating rare diseases. However, concerns have been expressed at declining participation rates, particularly among controls, raising the prospect of selection bias.18–22 Although it is well-documented that low participation rates do not inevitably result in biased estimates,23,24 the extent of bias is seldom formally assessed, presumably because information on salient exposures is generally unavailable for non-participants in case-control studies. Although sensitivity analysis (that is, testing the stability of a risk estimate by examining the extent to which such estimates are affected by changes in the values of variables) has been suggested to quantify the effect of non-participation,25 such analyses are seldom undertaken, or at least, seldom reported.

Other approaches for detecting possible selection bias have been described, including comparing early versus late responders22,26 and using long versus short questionnaires.21,27 Although each of these approaches may give some insight into possible direction or magnitude of bias, they do not directly address the issue of non-response.28 Thus, the approach we have taken is to compare the prevalence of risk factors among study participants with normative data from the underlying population.29,30 In this study, we explored a direct method of estimating potential bias in the odds ratio by assigning exposure values to non-participants using a multiple imputation technique. Our analysis suggested non-participation among potential controls may have overestimated the relative risk of cancers of the oesophagus associated with current smoking, although a large and statistically significant risk remained after accounting for the higher rates of smoking among non-participating controls. Our analyses using participant-only data suggested a significant association with smoking for serous ovarian cancer, whereas the imputation analysis found no association, consistent with the results of a recent meta-analysis.16,17 For BMI on the other hand, risk estimates between observed and imputed datasets differed by less than 10% for all three cancers, except for the most extreme category of BMI (>40 kg/m2).

We have focused on the effects of reduced participation among controls because this is the most common problem when conducting population-based studies. Case recruitment rates are generally higher than for controls, and the reasons for non-participation are often related to factors such as sickness and death. Thus while the participating case group may not represent all potential cases, it is more likely that differences between participating and non-participating cases are due to disease factors than exposure factors. Consequently, low case participation may mean the results of a study will not be generalisable to more serious cases of disease, but they are unlikely to result in systematically biased risk estimates. In this study, normative data about the distribution of smoking and BMI were not available for non-participating cases, although it is difficult to conceive that they would have smoked less than participating cases. Thus, any bias introduced by non-participation among cases is likely to have led to underestimation of the effect of smoking.

We believe the systematic approach we have applied is a straightforward and comprehensive way to assess the possible bias introduced by non-participation among controls in case-control studies. As with every study, ours also had limitations. Although the NHS is the best available population-based dataset for health behaviours in the Australian population, it also suffered from incomplete participation and thus almost certainly has some degree of selection bias. In addition, the sampling method used by NHS (clustered household sampling) differed markedly from that used by ACS (electoral roll sampling matched by age, sex and state of residence to the case series). It is possible that these differences in sampling may have introduced disparities in the distributions of risk factors, although it is difficult to predict the direction or magnitude of any such bias.

Another limitation of our approach is that there is no simple way to assess the extent of any likely bias without going through some form of sensitivity analysis to test the stability of risk estimates under a range of scenarios. In longitudinal studies, missing data for participants who subsequently drop out are imputed by multiple regressions on their observed information, under the assumption that the data are missing at random. In case-control studies, no such information is available for non-participants. Moreover, data are missing at the unit level and are less likely to be missing at random; hence the same regression techniques cannot be used. We believe that using external data and probability sampling to impute the missing data is the most effective approach to investigate participation bias in this setting.

In summary, we have investigated the effects of non-participation in population-based studies of cancer, and considered the consequences for estimating measures of effect based on sub-optimal control samples. Our investigation showed that non-participation of otherwise eligible controls resulted in an overestimation of the effect of smoking for cancers of the oesophagus and falsely identified a modest yet significant association for serous ovarian cancers. Within the same studies, on the other hand, we found no evidence of bias for associations with BMI for any of the cancers.

While efforts to achieve maximum participation rates should always be made, it is unlikely that rates of participation in future epidemiological studies will be as high as those in the past. It will therefore become increasingly necessary to quantify the extent of possible bias arising from non-participation. The use of imputation methods as described here permits the results of studies with suboptimal response rates to be interpreted within the context of exposure rates prevailing in the target population.

Acknowledgements

This study was supported by grants from the US Army Medical Research and Materiel Command under DAMD17-01-1-0729, the National Health and Medical Research Council (NHMRC) of Australia (Program no. 199600), the Queensland Cancer Fund, the Cancer Council Tasmania and the Cancer Foundation of Western Australia. We gratefully acknowledge the co-operation of the New South Wales, Queensland, South Australian, Victorian and Western Australian Cancer Registries as well as all the collaborating institutions. David Whiteman and Penelope Webb are supported by Research Fellowships from the National Health and Medical Research Council of Australia. Nirmala Pandeya is supported by a PhD scholarship from the National Health and Medical Research Council of Australia. The funding bodies played no role in the design or conduct of the study; the collection, management, analysis, or interpretation of the data; or preparation, review, or approval of the manuscript. The Australian Cancer Study: Oesophageal Cancer: David Whiteman, Penelope Webb, Adèle Green, Nicholas Hayward, Peter Parsons, David Purdie (Queensland Institute of Medical Research). The Australian Ovarian Cancer Study management group comprises: David Bowtell (Peter MacCallum Cancer Centre), Georgia Chenevix-Trench, Adèle Green, Penelope Webb (Queensland Institute of Medical Research), Anna deFazio (Westmead Hospital), Dorota Gertig (University of Melbourne).
