Problem drug use prevalence estimation revisited: heterogeneity in capture–recapture and the role of external evidence

Abstract Background and Aims Capture–recapture (CRC) analysis is recommended for estimating the prevalence of problem drug use or people who inject drugs (PWID). We aim to demonstrate how naive application of CRC can lead to highly misleading results, and to suggest how the problems might be overcome. Methods We present a case study of estimating the prevalence of PWID in Bristol, UK, applying CRC to lists in contact with three services. We assess: (i) sensitivity of results to different versions of the dominant (treatment) list: specifically, to inclusion of non‐incident cases and of those who were referred directly from one of the other services; (ii) the impact of accounting for a novel covariate, housing instability; and (iii) consistency of CRC estimates with drug‐related mortality data. We then incorporate formally the drug‐related mortality data and lower bounds for prevalence alongside the CRC into a single coherent model. Results Five of 11 models fitted the full data equally well but generated widely varying prevalence estimates, from 2740 [95% confidence interval (CI) = 2670, 2840] to 6890 (95% CI = 3740, 17680). Results were highly sensitive to inclusion of non‐incident cases, demonstrating the presence of considerable heterogeneity, and were sensitive to a lesser extent to inclusion of direct referrals. A reduced data set including only incident cases and excluding referrals could be fitted by simpler models, and led to much greater consistency in estimates. Accounting for housing stability improved model fit considerably more than did the standard covariates of age and gender. External data provided validation of results and aided model selection, generating a final estimate of the number of PWID in Bristol in 2011 of 2770 [95% credible interval (Cr‐I) = 2570, 3110] or 0.9% (95% Cr‐I = 0.9, 1.0%) of the population aged 15–64 years. Conclusions Steps can be taken to reduce bias in capture–recapture analysis, including: careful consideration of data sources, reduction of lists to less heterogeneous subsamples, use of covariates and formal incorporation of external data.

The basic principle of CRC is that prevalence can be estimated based on the pattern of overlap between two, or preferably more, lists of observed individuals from the target population. The number of unobserved individuals is then estimated by fitting a statistical model, usually a loglinear regression [26], to this overlap. Added to the number observed, this gives the total population size or prevalence. These models are subject to assumptions which, if violated, can lead to invalid estimates [27][28][29][30][31][32]. Assumptions of independence between data sources (appearing in one list does not affect an individual's likelihood of appearing in another) and homogeneity of 'capture' probabilities (all individuals are equally likely to appear in a list) can be relaxed to some extent by incorporating interaction terms and covariates into the regression model [14,27]. However, two key assumptions remain: (1) all variability in capture probabilities is explained by measured covariates; and (2) in an analysis using k lists there is no k-way dependency [14]. It has been noted that overlap between data sources may be inflated by policies to refer between services (e.g. greater integration of criminal justice and drug treatment services) [20]. We have demonstrated that even relatively simple referral structures can lead to violations of this second assumption, and consequently to biased results [33]. In particular, problems arise if the 'saturated model' (which has as many parameters as there are data points) is selected according to conventional model fit criteria, because there are many alternative such models, all of which have perfect fit to the observed data but which can produce widely differing estimates of the population size [14,33].
In this paper we show, through a case study of estimating the number of PWID in Bristol, UK, how CRC estimates can be highly sensitive to multiple sources of heterogeneity. We demonstrate the need for careful selection of 'lists', accounting for routes of referral and use of covariates. In addition, we show how the formal incorporation of other sources of evidence related to PWID prevalence, such as drug-related deaths, can support the generation of valid estimates [7,19,20,34]

Data sources
Our PWID case definition was individuals resident in Bristol aged between 15 and 64 years reporting current or previous injecting and use of opiates and/or crack cocaine during 2011.
Cases were identified from a single database serving multiple service providers (http://www.theseus.org.uk/) comprising: (a) treatment in shared care with primary care or specialist drug treatment services; (b) the Bristol Drugs Project needle and syringe programme (NSP); and (c) the Criminal Justice Intervention Team (CJIT), which undertakes assessments in police custody, probation or prison. Routine linkage of individuals between these sources is undertaken by service providers based on full name (and alternative spellings and known aliases), date of birth and sex.
Individuals in drug treatment were categorized as 'incident' (a new treatment onset in 2011 with at least 21 days since last treatment discharge) or 'non-incident' (already in treatment at the beginning of 2011, with no new onset during 2011). Additionally, referrals into treatment were categorized as 'CJIT' (referred directly by the CJIT) or 'non-CJIT' [largely general practitioner (GP), non-statutory drug service or self-referred].
The following covariates that might influence the 'capture' probability (the likelihood of being observed) were available for each of the three data sources: gender, age group (15-24, 25-34 and 35-64 years) and instability of housing (see Supporting information, Appendix for coding of this variable).

External information: mortality data
We also obtained data on: (i) the number of drug-related poisoning deaths (DRPs) that occurred in Bristol during 2011, for which opiates and/or crack cocaine were mentioned on the death certificate; and (ii) rates of DRP for a large English cohort of PWID linked to the Office for National Statistics (ONS) mortality register [35,36]. More details are provided in the Supporting information, Appendix. Combined, these data can provide us with an estimate of the number of PWID independently of the CRC.

Statistical analysis
We undertook the following analytical steps: (i) Preliminary analysis (without covariates) The overlap data consist of seven observed cell counts representing the number of PWID in each combination of the three sources. We used Poisson log-linear regression models [26] fitted in Stata version 13.1 to model these observed overlaps. We fitted the eight standard models usually applied in CRC estimation exercises, compared population size estimates, and assessed model fit via the Akaike information criterion (AIC). The most complex of these models accommodates an interaction (dependency) between each pair of lists. This is a 'saturated' model, meaning that there are as many parameters as there are data points: it is impossible, therefore, to allow for a three-way dependency in addition. We compared results from three alternative saturated models in which the three-way interaction term is included at the expense of one of the two-way interactions [33].

(ii) The effect of referrals
We have shown previously that referrals between service providers (e.g. directly from criminal justice into treatment) can produce complex between-source dependencies, leading potentially to misleading estimates [33]. We therefore re-ran all 11 regression models after excluding cases referred via CJIT from the treatment list.

(iii) The effect of excluding non-incident cases
We hypothesized that incident treatment cases might have different capture probabilities (i.e. a greater likelihood of appearing in NSP or CJIT) than those treated for longer periods, as longer-term treatment might be associated with a less chaotic life-style. Failure to account for this heterogeneity could bias results. Conversely, if heterogeneity is not present, then estimates should be robust to exclusion of non-incident cases. We therefore re-ran all 11 models after restricting the treatment list to incident cases only. We performed this exercise both with and without inclusion of the CJIT referrals.
The four alternative versions of the treatment list are shown in Box.

A
Full treatment list B CJIT referrals excluded C All incident treatment cases D Incident treatment cases, excluding CJIT referrals Sensitivity of estimates to choice of treatment list was taken as evidence of heterogeneity in the full list. Because this violates the assumptions of CRC, we selected the least heterogeneous treatment list to take forward into the next stages of analysis.

(iv) Incorporation of covariates
We assumed main effects of, and first-order interactions between, all three covariates. We then assessed whether model fit improved with the addition of covariate interaction terms. We compared the three models which assumed independence between data sources: (1) no covariates, (2) all three sources depend on gender and age group and (3) all three also depend on housing instability. Thereafter, we considered extensions of the best-fitting of these to also allow for dependencies between sources.
These analyses could be performed in any standard statistical software. We used the Bayesian software WinBUGS [37], as this facilitates the extensions in the next analytical step. Bayesian analyses require the specification of 'prior' distributions for parameters, quantifying knowledge prior to examination of the data. We assumed vague normal prior distributions for each regression coefficient, representing lack of prior knowledge. The results presented are posterior medians with 95% credible intervals (Cr-Is). Model fit was compared using the Deviance information criterion (DIC) [38], a Bayesian analogue to the AIC with lower values implying better fit, and posterior mean residual deviance. Well-fitting models should have mean residual deviance approximately equal to the number of data points (in this case, 84 covariate and source combinations). WinBUGS code for the best-fitting model is shown in the Supporting information, Appendix.

(v) Incorporation of external information
Incorporation of external information alongside a CRC analysis offers potential for model validation or formal identification of inconsistencies. Full details of our incorporation of external information are provided in the Supporting information, Appendix.
Briefly, we first used DRP data in isolation to provide an estimate of the total number of PWID independently from the CRC. The approach used is similar to the 'multiplier method' often used for prevalence estimation [8], but with more appropriate accounting for uncertainties.
We then formally incorporated the mortality data alongside the CRC, allowing both sources of information to inform prevalence simultaneously [19,20], i.e. unifying the two analyses.
Models based on reduced versions of the CRC data (i.e. using treatment lists B-D rather than A) could produce estimates for some subpopulations that are less than the total observed, which would be invalid. We therefore also incorporated formally the total number of observed PWID in each age and gender group as lower bounds.
Because the incorporation of external information has the potential to change the optimal choice of CRC model, we present results from each of the main competing CRC models incorporating the external data. Our final estimates are from the model with the lowest DIC at this stage.

Characteristics of the data
The full data, stratified by all covariates and source combinations, are shown in the Supporting information, Appendix [27]. In total, 2450 PWID aged 15-64 years were identified: 5% aged 15-24, 38% aged 25-35 and 57% aged 35-64 years. 73% were men and 31% were in 'unstable' housing; 285 were in contact with the NSP; 418 were assessed by CJIT; and 2276 (93%) were in treatment during 2011 (full treatment list A).
Removing CJIT referrals reduced the treatment list to 2189 cases (treatment list B) and the full linked data set to 2418. There were 674 incident treatment cases (list C). Inclusion of only these in the treatment list reduced the full data set to 1089. Further removing incident CJIT referrals reduced the treatment list to 573 (list D) and the full linked data set to 1066.

Statistical analysis
Summary results from analytical stages (i), (ii) and (iii) are displayed in Table 1. Estimated interaction terms from each model are displayed in the Supporting information, Appendix.

(i) Preliminary analysis
Based on the full data set, there was clear evidence from the AIC that at least two interaction terms were required to explain the observed overlap. Evidence for a positive dependence between treatment and the CJIT was particularly strong. Five models had jointly lowest AIC (Table 1, models 7-11). These generated widely varying estimates of the total number of PWID: from 2740 [95% confidence interval (CI) = 2670, 2840] to 6890 (95% CI = 3740, 17680).

(ii) The effect of referrals
After removing CJIT referrals from the treatment list (list B) there remained evidence for a positive dependence between treatment and the CJIT, but the estimated size of this was reduced considerably. The true dependence structure, however, remained unclear, the same five models still jointly having the lowest AIC. Population size estimates from these models were slightly less variable, but still differed considerably (Table 1).

(iii) The effect of non-incident cases
Results were highly sensitive to inclusion of nonincident treatment cases, providing evidence of variable catchability, i.e. incident and non-incident treatment cases having different probabilities of appearing in the other lists. Inclusion of only incident cases (treatment lists C and D) reduced the complexity of dependence structures needed to explain the overlap and generated substantially smaller, and more consistent, estimates.
As there was evidence of heterogeneity in the larger treatment lists (as indicated by sensitivity of results to removal of some cases), we selected the reduced list D to take forward into the next stage. For this data set, the simplest model was preferred based on the AIC, and other models with similar AIC provided similar estimates. We note, however, that the best-fitting unadjusted (without covariates) model gives an estimate of 2350 (95% CI = 2140, 2600) PWID, which is below the number observed (2450).
(iv) Incorporation of covariates Table 2 (models 1-7) displays estimates and model fit statistics from covariate models. Corresponding gender and age group-specific estimates are shown in Table 3.
Allowing capture probabilities to vary by gender and age group led to some improvement in model fit (11-point reduction in DIC, relative to baseline model 1), but did not change estimated prevalence. Gender and age groupspecific estimates were also largely robust to incorporation of these covariates, although Cr-Is became wider.
Accounting for housing instability led to an additional 96-point reduction in the DIC, indicating substantially better fit. Incorporation of this covariate led to a slight increase in estimated prevalence for all subgroups (Table 3, model  3) and overall (2690, 95% Cr-I = 2360, 3140).
There was evidence for a positive dependence between treatment and the CJIT in younger males (model 5;   Capture-recapture subject to constraints and accounting for mortality data a 3.  six-point reduction in DIC relative to independence model) or for a negative dependence between the NSP and CJIT for men only (model 6; three-point reduction in DIC). These two models gave estimates of 2960 (95% Cr-I = 2430, 3890) and 2530 (95% Cr-I = 2210, 2980), respectively.
We also fitted a model in which both types of interaction were accommodated (model 7), but this did not improve the fit further.
(v) Incorporation of external information From the external information, we estimated the rate of DRPs among PWID in Bristol to be 5.7 (95% Cr-I = 3.9-8.3) per 1000 person-years. There were 15 such deaths among PWID in Bristol in 2011. This generates an estimate (independent of the CRC) of 2570 PWID (95% Cr-I = 1330, 4730). Full details are provided in the Supporting information, Appendix.
Models 3*-7* (extensions of models 3-7, respectively) in Tables 2, 3 show prevalence estimates based on simultaneous modelling of the CRC, mortality data and lower bounds. After incorporating the external data, the preferred model based on the DIC was model 6*. Table 3 shows that model 5 was in conflict with some of the lower bounds, particularly for older women [predicting 170 (95% Cr-I = 120, 250), whereas there were known to be at least 300]. The degree of conflict between model 6 and the lower bounds was less, resulting in better overall fit when all data were modelled together.
Our posterior estimate of the DRP rate among PWID in Bristol, after formal incorporation with the CRC data, was 5.6 (95% Cr-I = 4.1, 7.5) per 1000 person-years. This reflects some learning about mortality rates from the CRC, in addition to learning about prevalence from the mortality data.

DISCUSSION
We demonstrate potential biases in CRC estimation, but show that it may be possible to: identify sublists of individuals for whom application of CRC is more likely to be valid; incorporate novel additional covariates; and use external information to test consistency, guide model choice and strengthen the evidence for CRC estimates.
For our case study, standard analysis of the routine data usually used for CRC could not identify a single best estimate: estimates from equally well-fitting models varied 2.5-fold. Estimates were sensitive to inclusion of nonincident treatment cases and deliberate referrals between the data sources. Inclusion of such cases generates considerable heterogeneity that is not accounted for by standard methods.

Limitations
Despite the progress we have made, the estimates from our case study should be interpreted cautiously. We found evidence of heterogeneity in the full treatment list, violating a key CRC assumption. We therefore reduced this list to a more homogeneous subpopulation before applying CRC techniques. However, CRC assumes that all individuals within covariate groups (whether observed or not) have equal capture probabilities. This means that our CRC-based estimates relate technically only to those PWID who are similar to those included in the reduced data set. As our target population is, in fact, all PWID, including the longer-term treated, we then used the total number of observed cases to provide lower bounds. Ideally, we would have accounted instead for heterogeneity in the full list using observed covariates. We explored use of 'incident' as a covariate, but these analyses indicated a complex between-source dependence structure among the nonincident cases, such that it was not possible to identify a single best-fitting model. Additional factors for which we were unable to account could also influence capture probabilities [39]; for example, crack use, intensity of opiate dependence, pattern of injecting and comorbidity. These were not available from all three data sources.
We considered our target population to be people 'at risk' of injecting-encompassing people with a risk of relapse-which is a key quantity for policymakers and studies of injecting risk [2]. There is no accepted definition of 'injecting risk', with studies showing high rates of relapse after 1 year but reducing substantially after more than 5 years [40,41]. By including all individuals who reported ever injecting, it is conceivable that we have included some who have ceased injecting, which would lead to an overestimate. Ideally we would include only individuals who had injected in the last year, but this information was not recorded consistently and, even if it were, is unlikely to be reliable.
We assumed that all matches were identified correctly, with no misclassification of the target population within data sources. However, unlike many other CRC exercises, we took lists of individuals from a single database in which records are matched on an ongoing basis based on full details (including aliases and potential misspelling), which is likely to minimize misclassification errors.
There are also limitations in our external data. We assume both that the 15 opiate-and crack-related deaths were all PWID (which is never specified on the death certificate) and that the mortality rate generated from the cohort study is accurate (relying upon perfect matching to ONS mortality data). It is conceivable that incorporating other external information could reveal further biases and alter estimates.

Implications and other evidence
Covariate CRC techniques-where heterogeneity is accommodated within a single model rather than by stratifying data-have been recommended previously, but are not adopted routinely [27,42]. National exercises in England and Scotland [16,18] involve separate CRC estimation in each of many subgroups. Interaction terms are then unlikely to be statistically significant and will be overlooked. In addition, very few CRC studies in drug misuse have made use of covariates other than gender and age. In our case study, results were more sensitive to incorporation of a variable representing instability of housing (a marker of chaotic life-style). Similarly, Fisher and colleagues [43] showed that mental health problems were an important factor when estimating prevalence of homelessness. It is important, therefore, to ensure that data sources collect rich information regarding characteristics that might influence 'catchability', to aid future prevalence estimation.
The problems around non-incident treatment cases and referrals between services represent important potential biases to be recognized in future studies. It is our contention that many recent exercises of CRC (including potentially some of our own) may be biased, especially if the estimates have not been validated against other data. The data sets used for national exercises in England [18] include all treated cases (not just incident) and do not account explicitly for referrals [33]. The results are therefore subject to the problems we have identified.
Simple 'multiplier' methods for estimating prevalence, e.g. based on DRPs, are also subject to bias [4,8,44]. Techniques essentially using a series or combination of multipliers to minimize bias have been proposed [45,46], but have not quantified uncertainties or formally assessed consistency of evidence. Incorporation of multiple sources of evidence in a single Bayesian analysis facilitates this, with the potential to also allow for biased evidence [1,7,34]. Other studies have used a Bayesian approach to incorporate DRP data alongside CRC, but with weaker information on expected mortality rates, such that the impact on prevalence estimates was very low, and without careful assessment of heterogeneity in the CRC [19,20,47].

CONCLUSIONS
We would like to rescue CRC as a robust and valid method for estimating the prevalence of problem drug use to inform evidence-based medicine and policy making. Indirect estimation methods are crucial, as household surveys are not a viable alternative. However, there is a need for a step change in the analysis, presentation and justification of CRC in drug misuse. Given the emerging complexity of dependencies and heterogeneity across data sources, researchers may need to re-focus, working with local experts in order to understand and enrich the available data more effectively, rather than attempting large-scale national exercises with minimal information.

Declaration of interests
This study was supported by the Medical Research Council (MRC) Nationally Integrated Quantitative Understanding of Addiction Harms addiction research cluster (grant number G1000021) and the NIHR Health Protection Research Unit in Evaluation of Interventions. T.M. has received research funding from the UK National Treatment Agency for Substance Misuse (now Public Health England) and the Home Office. He is a member of the organizing committee for, and chairs, conferences supported by unrestricted educational grants from Reckitt Reckitt Benckiser/Indivior, Lundbeck, Martindale Pharma and Britannia Pharmaceuticals Ltd, for which he receives no personal remuneration. The views expressed in this paper are those of the authors and not necessarily those of the NHS, the NIHR, Department of Health or Public Health England.