What is wrong with non‐respondents? Alcohol‐, drug‐ and smoking‐related mortality and morbidity in a 12‐year follow‐up study of respondents and non‐respondents in the Danish Health and Morbidity Survey

Abstract Aim Response rates in health surveys have diminished over the last two decades, making it difficult to obtain reliable information on health and health‐related risk factors in different population groups. This study compared cause‐specific mortality and morbidity among survey respondents and different types of non‐respondents to estimate alcohol‐, drug‐ and smoking‐related mortality and morbidity among non‐respondents. Design Prospective follow‐up study of respondents and non‐respondents in two cross‐sectional health surveys. Setting Denmark. Participants A total sample of 39 540 Danish citizens aged 16 years or older. Measurements Register‐based information on cause‐specific mortality and morbidity at the individual level was obtained for respondents (n = 28 072) and different types of non‐respondents (refusals n = 8954; illness/disabled n = 731, uncontactable n = 1593). Cox proportional hazards models were used to examine differences in alcohol‐, drug‐ and smoking‐related mortality and morbidity, respectively, in a 12‐year follow‐up period. Findings Overall, non‐response was associated with a significantly increased hazard ratio (HR) of 1.56 [95% confidence interval (CI) = 1.36–1.78] for alcohol‐related morbidity, 1.88 (95% CI = 1.38–2.57) for alcohol‐related mortality, 1.55 (95% CI = 1.27–1.88) for drug‐related morbidity, 3.04 (95% CI = 1.57–5.89) for drug‐related mortality and 1.15 (95% CI = 1.03–1.29) for smoking‐related morbidity. The hazard ratio for smoking‐related mortality also tended to be higher among non‐respondents compared with respondents, although no significant association was evident (HR = 1.14; 95% CI = 0.95–1.36). Uncontactable and ill/disabled non‐respondents generally had a higher hazard ratio of alcohol‐, drug‐ and smoking‐related mortality and morbidity compared with refusal non‐respondents. 
Conclusion Health survey non‐respondents in Denmark have an increased hazard ratio of alcohol‐, drug‐ and smoking‐related mortality and morbidity compared with respondents, which may indicate more unfavourable health behaviours among non‐respondents.


INTRODUCTION
Reliable information on health and health-related risk factors in different population groups is required to calculate the burden of morbidity and mortality attributable to these risk factors and to formulate and evaluate policies aimed at improving population health and reducing health inequalities. Health surveys of the general population are a commonly used method to obtain such information, but their validity depends upon how well the respondents represent the population of interest. During the last couple of decades response rates in health surveys have been diminishing, which can be problematic as it means that inference is being made on a progressively limited subsample of the population [1–4]. This will, however, only affect the estimates if respondents and non-respondents differ systematically from each other in a way that is relevant to the study results. There are known differences in socio-demographic characteristics between respondents and non-respondents; e.g. non-respondents are more likely to be men, young, unmarried and less educated [2,5–7]. Further, non-respondents tend to have more unfavourable health behaviours and excess mortality than respondents [5,6,8–12]. This evidence suggests that survey-based estimates, e.g. of smoking and alcohol consumption, understate the true population levels.
Often, non-response is assessed by comparing respondents to general population register-based data sets [7,13,14], and in some cases by comparing respondents and non-respondents on known characteristics from the sampling frame [1,6,10,15]. In the present study it was possible to assess mortality and morbidity among respondents and different types of non-respondents. Previous results indicate that individuals who took the time to decline the study invitation differed from those with whom the researchers had no contact [16]. Knowledge about cause-specific mortality and morbidity among respondents and different types of non-respondents may provide additional insight into the extent of the associated bias. This has, to our knowledge, not been assessed previously. Hence, this study seeks to expand upon and improve the evidence surrounding the well-documented selection effects caused by non-response.
The aim of the present study is to estimate the magnitude of non-response bias in health surveys by comparing register-based information on alcohol-, drug- and smoking-related mortality and morbidity, respectively, for respondents and different types of non-respondents during a follow-up period of up to 12 years.

MATERIAL AND METHODS
The study is based on pooled data from the Danish Health and Morbidity surveys in 2000 and 2005. Both surveys are cross-sectional and designed to be nationally representative. Information on cause-specific mortality and morbidity at the individual level was subsequently obtained from administrative registers and linked to all individuals in the sampling frames, including non-respondents.

Survey-based data
The survey from 2000 consisted of a county-stratified random sample of 22 484 individuals. The sample was drawn from the adult (aged 16 years or older) Danish population by the Danish National Centre for Social Research (which carried out the data collection) using the Danish Civil Registration System (each citizen has a unique personal identification number) [17]. A crucial aim was to obtain at least 1000 completed interviews in each of the 15 Danish counties (except in the smallest county, where 600 completed interviews were considered sufficient). The main reason for this aim was that the survey should serve as a tool for the counties in their health-care planning. In addition, the counties had the opportunity to increase the sample size in their county if they were willing to cover the costs of such an expansion. Only one county (Frederiksborg County) decided to increase its sample size. Thus, the sample was supplemented with 612 individuals from Frederiksborg County and the total sample size was 23 096 Danish citizens. A total of 17 137 (74.2%) individuals participated in the survey. The reasons for non-response were refusal (n = 5188; 22.8%), uncontactable (address of residence was obtained but the interviewer failed to establish contact with the sampled individual) (n = 379; 1.6%), illness/disabled (n = 305; 1.3%) and other reasons (e.g. linguistic barriers) (n = 87; 0.4%).
In 2004, a new local government reform was planned and was implemented in 2007. The counties were dissolved and five regions were established. In order to provide health-
Institutionalized individuals were included in the sampling frame and respondents were interviewed at the institution. It is not possible to know the breakdown of reasons for non-response for institutionalized non-respondents from the available data material.
All selected individuals received a letter of introduction that briefly described the purpose and content of the survey, and it was emphasized that participation was voluntary. Data were collected via face-to-face interviews at the respondents' places of residence (a minimum of four contact attempts) and carried out by the professional interview staff at the Danish National Centre for Social Research. More details of the survey designs are described elsewhere [1].

Register-based data
In Denmark, nation-wide administrative registers are available for research purposes, and owing to the unique personal identification number it was possible to link information directly from several registers to each individual in the sample. The Danish Civil Registration System was used to retrieve information on sex, age, vital status and the date of any change of vital status. Information on the highest completed education at the time of data collection was extracted from Danish education registers, which are generated from the education institutions' administrative records [18]. Information on cause-specific mortality was obtained from the Danish Register of Causes of Death (DRCD) [19], which covers all deaths among citizens dying in Denmark. Information on hospitalizations was obtained from the Danish National Patient Register (DNPR), which holds administrative (e.g. hospital ward and date and time of activity) and clinical data (diagnoses and surgical procedures) on all patients in Danish hospitals [20]. Classification of cause(s) of death and diagnoses is based on ICD-10 codes. Table 1 displays the ICD-10 codes used to define alcohol-, drug- and smoking-related mortality and morbidity. In-patient admissions with one of the listed ICD-10 codes as either a primary or secondary diagnosis were defined as events, and deaths with one of the listed ICD-10 codes as either a primary or secondary cause were also defined as events. Only the first registered admission or death for the event under study was included in the analysis. Classifications are not mutually exclusive, which means that individuals can be classified, for example, as having both an alcohol-related mortality event and a smoking-related morbidity event if the individual is hospitalized with a smoking-related event in 2008 and dies of an alcohol-related event in 2010. The list of conditions was based mainly on a former Danish study assessing the impact of various risk factors on public health [21].
All-cause mortality was defined as any given event registered in DRCD including alcohol-, smoking- and drug-related deaths. The Danish Data Protection Agency approved the linking of the registers and the survey data and all local confidentiality and privacy requirements were met. No consent was needed at the individual level.
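The event definition above (a match on either a primary or a secondary code, with only the first registered occurrence counted, and categories that are not mutually exclusive) can be illustrated with a minimal Python sketch. The code prefixes below are illustrative assumptions only and are not the list in Table 1; the record layout and the function name `first_event` are likewise hypothetical.

```python
from datetime import date

# Illustrative ICD-10 prefixes only (assumption); the study's actual
# code list is given in Table 1 and is not reproduced here.
EXAMPLE_CODES = {
    "alcohol": ("F10", "K70"),  # alcohol use disorders, alcoholic liver disease
    "smoking": ("J44", "C34"),  # COPD, cancer of bronchus and lung
}

def first_event(records, category, codes=EXAMPLE_CODES):
    """Return the earliest date on which any primary or secondary code
    matches the category, or None if no record matches."""
    prefixes = codes[category]
    dates = [
        rec["date"]
        for rec in records
        if any(dx.startswith(prefixes) for dx in rec["diagnoses"])
    ]
    return min(dates) if dates else None

# Hypothetical register history for one person.
admissions = [
    {"date": date(2008, 3, 1), "diagnoses": ["J44", "I10"]},
    {"date": date(2009, 6, 5), "diagnoses": ["I10", "J44"]},  # repeat, ignored
    {"date": date(2010, 9, 9), "diagnoses": ["K70"]},
]

smoking_event = first_event(admissions, "smoking")
alcohol_event = first_event(admissions, "alcohol")
```

Because each category is evaluated separately, the same person can contribute a smoking-related event and an alcohol-related event, as in the example given in the text.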

Statistical analyses
Initial descriptive analyses provided incidence rates for alcohol-, drug- and smoking-related mortality and morbidity during follow-up and simple frequency distributions of potential confounding variables by response status. Observation intervals were calculated from the sampling date for non-respondents (i.e. 1 January 2000 and 2005, respectively) and the interview date for respondents until the first relevant event, death, emigration or end of follow-up (31 December 2011), whichever came first. Hence, individuals' survival times were censored upon experiencing a competing risk. The date 31 December 2011 was chosen as end of follow-up because the longest possible follow-up time was preferred, and 2011 was the latest year for which DNPR and DRCD were fully updated at the time of data analysis. Incidence rates were calculated as the number of events during the study period divided by the sum of the person-time of the individuals at risk. Individuals were considered at risk until the first registered admission in DNPR or death in DRCD for the event under study. The association between response status and the incidence of alcohol-, drug- and smoking-related mortality and morbidity, respectively, was analysed using the Cox proportional hazards model adjusting for potential confounding factors (survey year, sex, education). Age was applied as the underlying time in the statistical model. The validity of the proportional hazards assumption was evaluated by visual inspection of log-log plots (data not shown). The number of participants (n) in the models with and without the inclusion of education differs due to missing information on educational attainment. This information is missing for individuals educated abroad and the older generation. In the present data material, information on educational level is missing for 5.5% of individuals aged 16 years or older and for 45.1% of individuals aged 75 years or older.
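The follow-up and incidence-rate calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SAS code used in the study: the function names, the record layout, the conversion of days to years by 365.25 and the scaling per 1000 person-years are all assumptions made for the example.

```python
from datetime import date

END_OF_FOLLOW_UP = date(2011, 12, 31)

def follow_up(start, event=None, death=None, emigration=None,
              end=END_OF_FOLLOW_UP):
    """Return (person_years, had_event) for one individual.

    `start` is the interview date for respondents and the sampling date
    for non-respondents. Follow-up stops at the first of: relevant event,
    death, emigration or end of follow-up.
    """
    stop = min(d for d in (event, death, emigration, end) if d is not None)
    had_event = event is not None and event == stop
    person_years = (stop - start).days / 365.25  # assumed day-to-year conversion
    return person_years, had_event

def incidence_rate(people, per=1000):
    """Events divided by total person-time at risk, scaled by `per`."""
    total_years = 0.0
    events = 0
    for person in people:
        years, had_event = follow_up(**person)
        total_years += years
        events += had_event
    return per * events / total_years

# Hypothetical two-person cohort: one event, one censored at emigration.
cohort = [
    {"start": date(2001, 1, 1), "event": date(2002, 1, 1)},
    {"start": date(2001, 1, 1), "emigration": date(2002, 1, 1)},
]
rate = incidence_rate(cohort)
```

An event dated after another stopping date (for example after emigration) is not counted, which mirrors the censoring rule in the text.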
In all tests, P-values were two-sided and statistical significance was defined as P<0.05. All analyses were carried out using SAS version 9.3.
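For orientation, the Cox model described in this section can be written out as follows. The notation is ours and is not taken from the original analysis: a denotes age (the underlying time scale), NR_i is an indicator for non-response, and z_i collects the adjustment variables (survey year, sex, education). The confidence interval is shown in the usual Wald form, which is an assumption, as the interval type is not stated in the text.

```latex
h_i(a) = h_0(a)\,\exp\!\left(\beta_{\mathrm{NR}}\,\mathrm{NR}_i
         + \boldsymbol{\gamma}^{\top}\mathbf{z}_i\right),
\qquad
\mathrm{HR} = \exp\!\left(\beta_{\mathrm{NR}}\right),
\qquad
95\%\ \mathrm{CI} = \exp\!\left(\hat{\beta}_{\mathrm{NR}}
         \pm 1.96\,\widehat{\mathrm{SE}}\big(\hat{\beta}_{\mathrm{NR}}\big)\right)
```

Under this formulation the baseline hazard h_0(a) is left unspecified, and the hazard ratio compares non-respondents with respondents of the same age, survey year, sex and educational level.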

RESULTS
The relative distribution of basic characteristics among respondents and different types of non-respondents is summarized in Table 2. In general, respondents were younger than non-respondents and individuals with basic school education were under-represented among the respondents.
No difference was seen in the sex distribution between respondents and non-respondents. Incidence rates of alcohol-, smoking- and drug-related mortality and morbidity were lower among respondents compared to non-respondents; this also applied for all-cause mortality.
The associations between response status and cause-specific mortality and morbidity, respectively, are shown in Table 3. Non-respondents had an increased hazard ratio for alcohol-related mortality and morbidity, respectively, compared to respondents when adjusting for survey year, sex and education. The same pattern was evident for drug-related mortality and morbidity, smoking-related morbidity and all-cause mortality. Smoking-related mortality tended to be higher among non-respondents compared to respondents, but not significantly different. Table 4 shows the association between cause-specific mortality   non-respondents had an increased hazard ratio for drug-related morbidity and mortality, respectively. All types of non-respondents had an increased hazard ratio for all-cause mortality compared to respondents.

DISCUSSION
It is evident from the current study that non-respondents were more likely than respondents to suffer from alcohol-, smoking- and drug-related morbidity and mortality. Further, the analyses showed that non-response is a heterogeneous matter, i.e. different types of non-response have varied effects on the non-response bias. Refusal was the most frequent reason for non-response, but significant differences between respondents and refusing non-respondents were observed only in relation to alcohol- and drug-related mortality and morbidity. Uncontactable non-respondents constituted the second largest category, and pronounced differences between uncontactable non-respondents and respondents were seen in relation to alcohol- and smoking-related mortality and morbidity and drug-related morbidity. The ill/disabled constituted a small group among non-respondents, but had a significantly higher hazard ratio for smoking- and drug-related mortality and morbidity and alcohol-related morbidity.
Overall, the observed hazard ratio was higher among uncontactable and ill/disabled non-respondents compared to refusal non-respondents. Hence, uncontactable and ill/disabled non-respondents were the most important contributors to non-response bias.
The assumption in the present study is that differences in alcohol-, smoking- and drug-related morbidity and mortality reflect differences in alcohol consumption, smoking and drug use. However, this requires a strong association between the selected ICD-10 codes and the particular health behaviour. Smoking is a fairly specific cause of chronic obstructive pulmonary disease and cancer of the larynx, trachea, bronchus and lung, which were the diagnoses selected to represent smoking. Overall, the aetiological fraction of smoking in the development of these diseases is approximately 80-90% [22]. The aetiological fraction for each of the selected alcohol- and drug-related diagnoses is 100% by definition. The chosen alcohol- and drug-related diseases are, however, associated mainly with extensive alcohol consumption and drug use, respectively, and diseases associated with more moderate use are not included. The results, therefore, indicate more heavy alcohol and drug use and not necessarily more moderate use among non-respondents than respondents. Thus the selected diagnoses provide guideline indications of the actual consumption in the general population, and the results indicate that survey-based estimates of smoking and heavy alcohol and drug use are most probably too low because of the bias produced by non-response. This is in accordance with previous studies [6,9,10,15,23,24]. Such resultant bias is an important component that should be taken into consideration in surveys based on general populations [14].
The reason why smoking-related events show less relationship with non-response compared to alcohol- and drug-related events may be explained partly as follows: as described, the alcohol- and drug-related diagnoses are associated with extensive alcohol consumption and drug use, and individuals with these diagnoses may represent a more marginalized group with more severe health problems than heavy smokers. Further, the selected alcohol- and drug-related diagnoses have a shorter lag time than the selected smoking-related diagnoses. Hence, non-respondents may have died from other causes before developing a smoking-related diagnosis.
Some degree of health outcome-related self-selection into the study is to be expected; e.g. those who are already sick at baseline decline to respond, resulting in a higher morbidity and mortality among non-respondents during the first years of follow-up. This is supported by a previous study, showing that the excess mortality of non-respondents was higher after 4 years of follow-up in comparison to 28 years of follow-up [10]. However, a stable excess mortality and morbidity several years after baseline would indicate that respondents and non-respondents differ not only in health status at baseline. The analysis in the present study showed differences in mortality and morbidity between respondents and non-respondents even after a relatively long follow-up period, and differences in lifestyle are therefore likely. Differences in socio-demographic characteristics between respondents and non-respondents have been suggested as an explanation for this difference in outcome, but this factor did not seem to be a sufficient explanation in the current study. In most cases the adjusted HR remained significantly increased for non-respondents, indicating that these socio-demographic characteristics did not capture all the differences in health between the groups. This is in accordance with findings in other studies [10,23]. One explanation could be that sex and education captured inadequately the factors related to participation, i.e. the categories were too broad or that other unmeasured characteristics were more important. Another explanation could be that those who chose to respond were healthier than those who chose not to respond, even within the same socio-demographic group.
As mentioned in the Statistical analyses section, information on educational level is missing for individuals educated abroad and the older generation. However, as this is a problem for both respondents and non-respondents there is no reason to suspect any differential misclassification of educational level.
When using DNPR there are some aspects one needs to be aware of. Since 2000 the DNPR has formed the basis for payment to public hospitals, and the registration from these hospitals is assumed to be complete from that time. However, registration from private hospitals and clinics is known to be incomplete. In 2008, the National Board of Health estimated that 5% of all operations were missing from the DNPR. In addition, hospitals in Denmark are reimbursed according to the diagnosis-related groups system (DRG), which is a classification system that identifies the 'products' that the patient has received. Hence, the use of the DRG system for payment of hospitals may cause a diagnostic drift in the coding towards diagnoses with higher costs, which influences the validity of disease classifications in hospital systems [25]. However, any misclassification of outcome will most probably be non-differential, as it will apply to both respondents and non-respondents, and will therefore tend to lead to underestimation of the true association. Lastly, DNPR and DRCD only cover morbidity and mortality events that occur in Denmark. Hence, individuals who emigrate have unknown health outcomes and are therefore censored from the analysis on the day of emigration.
Overall, the identified non-response bias is an important component that should be taken into consideration when using estimates based on surveys of the general population. Standard approaches aim to improve overall response rates, but even if the response rate is high selective non-response may bias estimates, and tailored methods aiming to improve the response rate in different types of non-response groups (particularly hard-to-reach communities and those with morbidities) may be warranted. For example, the introduction letter could stress the importance of response despite illness/disability, and a shorter questionnaire could be offered as a second option to those indicating illness/disability. Additionally, the number of contact attempts could be increased for uncontactable non-respondents and the contact attempts could be varied by time of day and week and by contact mode, i.e. telephone, letter and visit to the address.
Post-hoc, non-response adjustment by inverse probability weighting is applied routinely to account for non-response bias caused by selective non-response. However, weights based on socio-demographic characteristics may not capture adequately differential health and health-related behaviour within categories. The information obtained from the present study may be used to improve weighting, as the success of weighting depends upon how useful the proxy variables (reason for non-response) are for the survey variables (health behaviour). This will never substitute fully for missing data from non-respondents, but knowledge concerning socio-demographic characteristics and cause-specific mortality and morbidity among respondents and different types of non-respondents may provide additional insight into the magnitude and direction of the associated bias. However, this requires that information on reason for non-response be registered in a systematic way during data collection. A promising alternative to weighting is a health outcome-informed multiple imputation approach, which is currently in development [26]. This exploits the record-linked health outcome data to form the basis of imputation of missing survey data on non-respondents. The explicit incorporation of differential distributions for respondents and different kinds of non-respondents can be factored in by implementation of a pattern mixture-based approach which allows for data which are missing not at random [27].
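To make the weighting argument concrete, the following Python sketch shows a simple cell-based form of inverse probability weighting. It is an illustration under stated assumptions rather than the procedure used in any particular survey: the response probability is estimated as the response proportion within each socio-demographic cell, and the data, field names and function names are hypothetical.

```python
from collections import Counter

def ipw_weights(sample, cell_of):
    """Return one weight per respondent (in sample order): the inverse of
    the observed response proportion in that person's cell."""
    sampled = Counter(cell_of(p) for p in sample)
    responded = Counter(cell_of(p) for p in sample if p["responded"])
    return [
        sampled[cell_of(p)] / responded[cell_of(p)]
        for p in sample
        if p["responded"]
    ]

def weighted_prevalence(values, weights):
    """Weighted mean of a 0/1 indicator."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical sample: half of the 'basic' cell responded, all of 'higher'.
sample = (
    [{"edu": "basic", "responded": True, "smoker": 1},
     {"edu": "basic", "responded": True, "smoker": 0},
     {"edu": "basic", "responded": False, "smoker": None},
     {"edu": "basic", "responded": False, "smoker": None},
     {"edu": "higher", "responded": True, "smoker": 1}]
    + [{"edu": "higher", "responded": True, "smoker": 0} for _ in range(3)]
)

weights = ipw_weights(sample, cell_of=lambda p: p["edu"])
smoking = [p["smoker"] for p in sample if p["responded"]]

unweighted = sum(smoking) / len(smoking)
weighted = weighted_prevalence(smoking, weights)
```

In this toy example the weighted estimate moves towards the under-represented cell, but the weights assume that respondents and non-respondents within a cell behave alike, which is the limitation discussed in the text.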
No further information, such as diagnosis, is available about illness or disability from the present data material. This information would have been valuable to further explore this reason for non-response, but unfortunately it was not registered during data collection.

CONCLUSION
The increased hazard ratios of alcohol-, drug- and smoking-related mortality and morbidity among non-respondents compared to respondents indicate more unfavourable health behaviours among non-respondents. Further, different types of non-response seemed to have varied effects on non-response bias. To reduce the selection bias, data collection strategies that maximize the response rate among those non-respondents who are the most important contributors to non-response bias should be used, and post-hoc methodologies such as tailored multiple imputation using information on reasons for non-response could be applied.

Ethical considerations
The Danish Data Protection Agency has approved the linking of the registers and the survey data, and all local confidentiality and privacy requirements have been met.

Declaration of interests
None.