Estimating the number of people with hepatitis C virus who have ever injected drugs and have yet to be diagnosed: an evidence synthesis approach for Scotland

Abstract Aims To estimate the number of people who have ever injected drugs (defined here as PWID) living in Scotland in 2009 who have been infected with the hepatitis C virus (HCV) and to quantify and characterize the population remaining undiagnosed. Methods Information from routine surveillance (n = 22 616) and survey data (n = 2511) was combined using a multiparameter evidence synthesis approach to estimate the size of the PWID population, HCV antibody prevalence and the proportion of HCV antibody prevalent cases who have been diagnosed, in subgroups defined by recency of injecting (in the last year or not), age (15–34 and 35–64 years), gender and region of residence (Greater Glasgow and Clyde and the rest of Scotland). Results HCV antibody prevalence among PWID in Scotland during 2009 was estimated to be 57% [95% credible interval (CI) = 52–61%], corresponding to 46 657 (95% CI = 33 812–66 803) prevalent cases. Of these, 27 434 (95% CI = 14 636–47 564) were undiagnosed, representing 59% (95% CI = 43–71%) of prevalent cases. Among the undiagnosed, 83% (95% CI = 75–89%) were PWID who had not injected in the last year and 71% (95% CI = 58–85%) were aged 35–64 years. Conclusions The number of undiagnosed hepatitis C virus-infected cases in Scotland appears to be particularly high among those who injected drugs more than 1 year ago and are aged 35 years or older.


INTRODUCTION
Hepatitis C virus (HCV) is a major cause of chronic liver disease, leading potentially to cirrhosis and hepatocellular carcinoma [1]. The greatest risk of HCV infection in resource-rich countries comes from injecting drug use [2]. With an estimated 16 million people world-wide currently injecting drugs [3], 10 million of whom have already been infected, HCV in this population represents a significant global public health challenge [2].
As spontaneous viral clearance occurs in only approximately 25% of those diagnosed HCV-antibody-positive [4], effective treatment strategies are crucial in reducing the demand on health-care systems from chronic HCV. The development of more effective antiviral therapies (with reduced toxicity, simplified oral dosing and shortened regimens) will substantially transform the treatment of HCV infection in future [5]. For these new therapies to have any great impact on the burden of HCV, particularly among people who inject drugs (PWID) [6], effective targeting of HCV screening and case-finding initiatives is essential. To achieve this, it is crucial to understand the size and characteristics of the infected populations, including not only diagnosed individuals but, importantly, those remaining undiagnosed. Reliable estimation of these quantities is not straightforward, as direct data are not readily available. Instead, we rely upon multiple sources of information, typically related only indirectly to the quantities of interest.
Scotland has an extensive national HCV surveillance programme established to inform and monitor the impact of its Government Action Plan [7]. A wealth of epidemiological data on the PWID and HCV-diagnosed populations is available, more than in most other countries, which may be exploited usefully in a multiparameter evidence synthesis (MPES) to estimate anti-HCV antibody prevalence (HCV prevalence hereafter). MPES combines direct and indirect information, accounting for uncertainty in and potentially resolving any inconsistencies between data sources [8-13].
A Bayesian approach to MPES was applied here to: (a) estimate the number of PWID living in Scotland who are HCV-prevalent in 2009, and (b) quantify and characterize the infected PWID population remaining undiagnosed. In addition, the MPES approach enabled estimation of the total number of PWID (namely, all those who have ever injected), even though no directly relevant data were available, due to the inherent difficulties of surveying this risk group.

METHODS
The analysis proceeded in two stages. In stage 1, the following estimates were obtained:
1.1 Number of HCV-diagnosed PWID, estimated from the linkage of the Scottish Drugs Misuse Database (TrtDat) [14] to the Scottish Hepatitis C Diagnosis Database (DiagDat) [15]. TrtDat records attendance at drug treatment services, whereas DiagDat records HCV diagnoses.
1.2 Number of HCV-diagnosed recently injecting PWID, using data from TrtDat to predict whether HCV-diagnosed PWID had injected recently. Note that 'recently' is defined as having injected in the last year (see Discussion for further consideration of this definition).
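In stage 2, estimates of the size of the non-recently injecting PWID population and both the total and undiagnosed HCV-infected PWID populations were derived. This involved combining information on the size of the recent PWID population, HCV prevalence and the proportion diagnosed among recent and non-recent PWID, and the stage 1 estimates of the number of HCV-diagnosed recent PWID.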

Bayesian MPES framework
Throughout we adopted a Bayesian framework for estimation [19]. This approach consists of:
(i) Defining prior distributions: before looking at the data, anything known about the basic parameters (e.g. HCV prevalence) is expressed as a probability distribution (the prior distribution). This is flat, with equal probability across all possible values, when no specific information is available, or peaked otherwise (e.g. if evidence is available from a previous study).
(ii) Relating data to parameters: the observed data are assumed to be realizations from a distribution (see Model details below) and used to construct a 'likelihood' function, which describes the relationship between the data and the basic parameters, quantifying the support that the data provide to the possible parameter values.
(iii) Obtaining posterior distributions: the prior distribution is updated with the information from the data likelihood to form a posterior distribution, combining information from both prior knowledge and data. In principle, this distribution is proportional to the product of the prior and the likelihood. For complex models, however, an analytical expression for this distribution cannot be derived easily. Instead, we simulate from the posterior distribution using a Markov chain Monte Carlo algorithm [20].
We use the posterior samples of the basic parameters to estimate the key quantities of interest. All posteriors are summarized in terms of posterior medians and 95% credible intervals (CI). A Bayesian MPES approach incorporates data from multiple sources, potentially including information known to be affected by biases, which are then modelled explicitly. The Bayesian approach was implemented in OpenBUGS [21], with posterior estimates for all parameters of interest based on 100 000 samples.
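The three steps above can be illustrated with a minimal sketch for a single prevalence parameter. It is not the OpenBUGS model used in the analysis: the data (57 antibody-positive out of 100 tested) are hypothetical, the prior is flat on (0, 1), and a simple random-walk Metropolis sampler stands in for the Markov chain Monte Carlo machinery. The function names are assumptions made for illustration only.

```python
import math
import random
import statistics


def log_posterior(pi, y, n):
    """Log of (flat prior x binomial likelihood), up to an additive constant."""
    if not 0.0 < pi < 1.0:
        return -math.inf  # outside the support of the Uniform(0, 1) prior
    return y * math.log(pi) + (n - y) * math.log(1.0 - pi)


def metropolis(y, n, n_samples=20000, burn_in=2000, step=0.05, seed=1):
    """Random-walk Metropolis sampler for a single prevalence parameter."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, y, n)
    draws = []
    for i in range(n_samples + burn_in):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, y, n)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            draws.append(current)
    return draws


def summarize(draws):
    """Posterior median and equal-tailed 95% credible interval."""
    ordered = sorted(draws)
    lower = ordered[int(0.025 * len(ordered))]
    upper = ordered[int(0.975 * len(ordered)) - 1]
    return statistics.median(ordered), lower, upper


# Hypothetical data: 57 positives among 100 tested.
median, lower, upper = summarize(metropolis(y=57, n=100))
print(f"posterior median {median:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

For this one-parameter case the posterior is available analytically as a Beta(58, 44) distribution, so simulation is unnecessary; it is used here only to mirror the workflow that becomes necessary for the full multiparameter model.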

Epidemiological model
As HCV prevalence can vary over time and depends upon demographic characteristics among PWID, we estimated the size of the HCV-infected population according to: (a) recency of injecting [recent (R) and non-recent (NR)], (b) age group (15-34 and 35-64 years), (c) gender and (d) region of residence [Greater Glasgow and Clyde (Glasgow) and the rest of Scotland]. Denoting by i the recency of injecting, i ∈ {R, NR}, and by d the demographic groups defined by age (a), gender (g) and region (r), such that d = {a, g, r}, define: ρ as the proportion of the population in risk group i; π_{i,d} as the HCV prevalence; δ_{i,d} as the proportion of HCV-prevalent cases that are diagnosed; and T as the total population size (see Fig. 3).
Among the 22 616 HCV-diagnosed individuals in DiagDat, the risk factor recorded at diagnosis was injecting drug use for 61%, other risk factor (e.g. blood transfusion) for 5% and unknown for 34%. Some diagnosed individuals with unknown risk were identified as being PWID through linkage of DiagDat with TrtDat (n = 2352), which contains data on those who had attended drug-treatment services since April 1995 [14]. Of the remainder with unknown risk, the proportion who were PWID was estimated based on the observed proportion and the model in Fig. 1. Figure 1 shows the data structure of DiagDat linked to TrtDat, where HCV-diagnosed individuals are subdivided into recent PWID, non-recent PWID and non-PWID in 2009. The parameters p_j (j = 1, …, 21) denote the probabilities of possible subdivisions at each branching. For example, p_3 represents the probability that an HCV-diagnosed individual with unknown risk group at diagnosis is a PWID; p_12 represents the probability that an HCV-diagnosed individual, with PWID risk at diagnosis and ever-injector status in TrtDat in 1995-2008, was a recent PWID in 2009; and p_17 represents the probability that an HCV-diagnosed individual, with unknown risk group at diagnosis and ever-injector status in TrtDat in 1995-2008, was a recent PWID in 2009.

HCV-diagnosed recent PWID
While the information held on DiagDat cannot distinguish between a recent and non-recent PWID, TrtDat records whether an individual injected in the last month. However, this can only be considered to reflect recent behaviour in those last registered with a drug service in 2009. For those last registered prior to 2009, a prediction of their recent/non-recent PWID status in 2009 was made based on individual characteristics relating to injecting behaviour using a regression approach (see Supporting information, Appendix S1 for details). In Fig. 1, the number of individuals at each branching, y_j (j = 1, …, 21), was assumed to be a realization from a binomial distribution with unknown probability, p_j, and denominator equal to its 'parent', n_j, such that y_j ∼ Binomial(n_j, p_j). For example, the number of PWID in the unknown risk group is assumed to be from a binomial distribution with probability p_3 and denominator equal to the number in the unknown risk group (n_3 = 7603). To identify the total number of recent and non-recent PWID, it was necessary to constrain some of the unknown probability parameters. Table 1 gives details of these constraints and the prior distributions employed in the model.
Inference about the parameters in the regression model and the p_j (j = 1, …, 21) was made simultaneously.
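The branching structure described above can be sketched as successive binomial splits of a parent count. The following is an illustrative simulation only, not the stage 1 model: it uses the unknown-risk denominator quoted in the text (n_3 = 7603), takes the reported estimate of 0.73 for p_3 as if it were known, and uses a hypothetical value (0.35) for the probability of being a recent PWID. The helper names are assumptions.

```python
import random


def draw_binomial(n, p, rng):
    """Draw y ~ Binomial(n, p) as a sum of n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def split_node(n_parent, p, rng):
    """Split a parent count into (child, complement) with y ~ Binomial(n_parent, p)."""
    y = draw_binomial(n_parent, p, rng)
    return y, n_parent - y


rng = random.Random(0)

n_unknown = 7603   # diagnosed individuals with unknown risk group (from the text)
p3 = 0.73          # probability of being a PWID given unknown risk (reported estimate)
p_recent = 0.35    # hypothetical probability that such a PWID injected recently

pwid, non_pwid = split_node(n_unknown, p3, rng)
recent, non_recent = split_node(pwid, p_recent, rng)

print(f"PWID: {pwid}, non-PWID: {non_pwid}")
print(f"recent PWID: {recent}, non-recent PWID: {non_recent}")
```

Each child count becomes the denominator of the next split, which is how the 'parent' n_j enters each binomial distribution. In the actual analysis the probabilities are unknown and are estimated from the observed counts rather than fixed as here.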

Table 1 Prior assumptions for the parameters in the stage 1 model.
Parameter | Prior assumption | Comment
p_j (for j = 1, 2, 4, …, 11) | Uniform(0, 1) | Flat prior distribution
p_3 | Uniform(0.6, p_2) | The prevalence of PWID in Scotland's HCV-diagnosed has been estimated as 83% (95% CI = 81-87%) [22], which would imply that 18 771 of the 22 616 diagnosed are PWID. 13 800 were known PWID from DiagDat, leaving 4971 unknown PWID. This gives prob(PWID | unknown risk group at diagnosis), the probability that a diagnosed individual with unknown risk was a PWID, as 0.65 (95% CI = 0.59-0.77)
… | … | The probability that a known PWID 'never injector', linked to TrtDat in 1995-2008, had recently injected was assumed to be equal to that for a known PWID 'ever injector' linked to …

Stage 2: Estimating the number of HCV-infected recent and non-recent PWID and the number undiagnosed
The following estimates for each demographic group d were combined using MPES:
2.1 Size of the recent PWID population
The capture-recapture (CR) study [16] generated estimates (Supporting information, Appendix S2) of the number of current PWID in Scotland in 2009 by age, gender and region, which provide information on the size of the recent PWID population via a prior distribution. Note that this prior is bimodal (Fig. 2 and Supporting information, Appendix S2), as the CR results were obtained by averaging estimates over different models [16].
2.2 HCV prevalence in recent and non-recent PWID (π_{R,d}, π_{NR,d}) and proportion diagnosed (δ_{R,d}, δ_{NR,d})
NSP is a voluntary anonymous survey of PWID, conducted nationally at approximately 100 selected needle exchange services [17,18]. Participants provide a blood-spot sample for HCV testing and information on any previous HCV diagnosis. From the 2008-09 survey, data on HCV prevalence in PWID (n = 2511), both recent (n = 1738) and non-recent (n = 772), and on the diagnosed proportion in these groups were available (Supporting information, Appendix S2). A recent PWID was defined in NSP by injection in the last month: a sensitivity analysis using injection in the last 6 months instead found the main results unchanged. NSP participants attend services providing injecting equipment and other harm-reduction services and so are potentially more likely to have been tested for HCV than those not attending these services, which could result in an overestimate of the proportion diagnosed. This potential bias has been modelled explicitly by including an additional unknown age-specific bias parameter, b^δ_a, representing the log odds ratio of the NSP-estimated relative to the 'true' diagnosed proportion (Supporting information, Appendix S3).

2.3 Number of HCV-diagnosed recent PWID
TrtDat records data at the first attendance in at least 6 months to a particular drug treatment service. This source is thus probably biased towards recent, rather than non-recent, PWID, generating an overestimate of the number of diagnosed recent PWID from stage 1. The stage 2 model includes an additional age-specific parameter to account for this potential bias, b^D_a, representing the ratio of the TrtDat-estimated to the 'true' number of diagnosed recent PWID (Supporting information, Appendix S3).
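To make the two bias parameters concrete, the sketch below shows how a log odds ratio and a ratio would each map an observed quantity back to its 'true' value. The exact parameterization is given in Appendix S3 and is not reproduced here, so the functional forms are assumptions that follow the verbal definitions only. The input values (an observed proportion of 0.5 and an observed count of 1000) are hypothetical, and the function names are illustrative.

```python
import math


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-x))


def debias_proportion(delta_observed, log_odds_ratio):
    """Remove a bias expressed as a log odds ratio (observed relative to true)."""
    return expit(logit(delta_observed) - log_odds_ratio)


def debias_count(count_observed, ratio):
    """Remove a bias expressed as a ratio (observed count / true count)."""
    return count_observed / ratio


# Hypothetical survey proportion diagnosed of 0.5 with an odds ratio of 1.75.
delta_true = debias_proportion(0.5, math.log(1.75))

# Hypothetical stage 1 count of 1000 with a ratio of 3.81.
count_true = debias_count(1000, 3.81)

print(f"adjusted proportion diagnosed: {delta_true:.3f}")
print(f"adjusted number diagnosed: {count_true:.0f}")
```

Working on the log odds scale keeps an adjusted proportion inside (0, 1) whatever the size of the bias, which a simple multiplicative adjustment of a proportion would not guarantee.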

Model details
All subgroups were modelled simultaneously. Estimation of unknown parameters of interest was based on the joint posterior distribution, with likelihood a product of independent binomial likelihoods for the NSP data and independent normal likelihoods for the stage 1 estimates (Supporting information, Appendix S4 and OpenBUGS model code in Supporting information, Appendix S6). Figure 3 presents schematically the relationship between the unknown parameters and the data sources. Table 2 gives details of the prior distributions and constraints that were specified in the model.
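The likelihood structure described above, a product of independent binomial and normal terms, becomes a sum on the log scale. The sketch below is a generic illustration under that description and is not the OpenBUGS code of Appendix S6; the function names and the tuple layout of the inputs are assumptions, and the numbers in the usage lines are hypothetical.

```python
import math


def binomial_loglik(y, n, p):
    """Log-likelihood of y successes in n trials with success probability p."""
    log_coef = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_coef + y * math.log(p) + (n - y) * math.log(1.0 - p)


def normal_loglik(x, mean, sd):
    """Log-likelihood of an estimate x under a normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2


def joint_loglik(survey_terms, stage1_terms):
    """Sum of independent contributions.

    survey_terms: iterable of (y, n, p) for survey counts.
    stage1_terms: iterable of (estimate, model_mean, sd) for stage 1 estimates.
    """
    total = sum(binomial_loglik(y, n, p) for y, n, p in survey_terms)
    total += sum(normal_loglik(x, m, s) for x, m, s in stage1_terms)
    return total


# Hypothetical inputs for two subgroups.
survey = [(120, 200, 0.58), (45, 100, 0.50)]
stage1 = [(950.0, 1000.0, 80.0), (400.0, 380.0, 50.0)]
print(f"joint log-likelihood: {joint_loglik(survey, stage1):.2f}")
```

In the full model the probabilities and means passed to these terms are themselves functions of the basic parameters, which is what links the different data sources together.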

RESULTS
Stage 1
Estimated number of HCV-diagnosed recent and non-recent PWID
The estimates are given in Table 3.
Seventy-three per cent (95% CI = 61-91%) of diagnosed individuals with unknown risk at diagnosis are estimated to be PWID (p 3 in Fig. 1). This increases the total number of diagnosed PWID by approximately 40% compared with ignoring this unknown risk group, from 61% to 86% of all those diagnosed.
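The percentages in the paragraph above can be checked from counts quoted elsewhere in the text: 22 616 diagnosed individuals, 13 800 known PWID and 7603 with unknown risk. The short calculation below uses the posterior median of p_3 as a point value, which is a simplification, as the analysis itself propagates the full posterior uncertainty.

```python
n_diagnosed = 22616     # all HCV-diagnosed individuals
n_known_pwid = 13800    # PWID risk recorded at diagnosis
n_unknown_risk = 7603   # risk group unknown at diagnosis
p3 = 0.73               # estimated probability that an unknown-risk case is a PWID

extra_pwid = p3 * n_unknown_risk
total_pwid = n_known_pwid + extra_pwid

share_before = n_known_pwid / n_diagnosed
share_after = total_pwid / n_diagnosed
relative_increase = extra_pwid / n_known_pwid

print(f"share ignoring unknown risk group: {share_before:.0%}")
print(f"share including unknown risk group: {share_after:.0%}")
print(f"relative increase in diagnosed PWID: {relative_increase:.0%}")
```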
The estimated proportion of diagnosed PWID who are recent PWID varies from 0.27 to 0.44 across demographic groups. Lower proportions are estimated for the older age group compared with the younger, for women compared with men and for those living in Glasgow compared with the rest of Scotland.
Figure 3 Relationship between the parameters and the data sources. Circles denote the unknown parameters (or functions of parameters) which are to be estimated. Squares denote the data sources. A link between a parameter (or function of parameters) and a data source indicates that the data source provides information on that parameter (or function of parameters). ρ: proportion of the population in risk group; π: hepatitis C virus (HCV) prevalence; δ: proportion of HCV-prevalent cases that are diagnosed; T: total population size; b^D: bias parameter for the number of diagnosed recent people who inject drugs (PWID); b^δ: bias parameter for proportion diagnosed.

Stage 2
The results presented in the following sections are from the baseline model with bias parameters. Results from other models are given in the Sensitivity analysis section.

Estimated number of recent and non-recent PWID
The estimated numbers of non-recent PWID are considerably higher than those of recent PWID, particularly in the older age group. It can be seen (Fig. 2) that the posterior distributions for recent PWID are slightly bimodal, reflecting the bimodal CR prior.

HCV prevalence estimates
Prevalence estimates vary between subgroups from 37 to 81%, with a higher prevalence in Glasgow than in the rest of Scotland, and in the older than in the younger age group (Table 4). In Glasgow, the prevalence is higher in non-recent than recent PWID in men, but the reverse is found in women. Outside Glasgow, HCV prevalences for recent and non-recent PWID are similar. The estimated total number of HCV-prevalent cases in Scotland in 2009 is 46 657 (95% CI = 33 812-66 803), comprising 7559 (95% CI = 6579-8501) recent and 39 121 (95% CI = 26 310-59 094) non-recent PWID.

HCV diagnosed and undiagnosed estimates
The estimated proportion diagnosed ranges from 30 to 56% and is generally higher in the younger than the older age groups (Table 5). The estimated total number of undiagnosed HCV-prevalent PWID in Scotland in 2009 is 27 434 (95% CI = 14 636-47 564), with more than 80% of undiagnosed cases being non-recent PWID and more than 65% in the older age group.
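The relationship between group size, prevalence and proportion diagnosed that underlies these totals can be written as a short calculation. The subgroup values below are hypothetical and the function name is an assumption; the final lines check the headline undiagnosed percentage from the two posterior medians reported in the text. Posterior medians of products do not in general equal products of posterior medians, so the analysis derives such quantities within each posterior sample rather than from summary values as here.

```python
def prevalent_and_undiagnosed(group_size, prevalence, proportion_diagnosed):
    """Return (number HCV-prevalent, number undiagnosed) for one subgroup."""
    prevalent = group_size * prevalence
    undiagnosed = prevalent * (1.0 - proportion_diagnosed)
    return prevalent, undiagnosed


# Hypothetical subgroup: 10 000 PWID, 50% prevalence, 40% of cases diagnosed.
prevalent, undiagnosed = prevalent_and_undiagnosed(10000, 0.5, 0.4)
print(f"prevalent: {prevalent:.0f}, undiagnosed: {undiagnosed:.0f}")

# Headline check using the reported posterior medians.
undiagnosed_share = 27434 / 46657
print(f"undiagnosed share of prevalent cases: {undiagnosed_share:.0%}")
```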

Bias parameter estimates
Estimates of the number of diagnosed recent PWID generated from the DiagDat/TrtDat data in stage 1 are larger than expected, based on the other data sources, by a factor of 1.30 (95% CI = 0.87-2.23) in the younger age group and 3.81 (95% CI = 2.45-4.93) in the older age group. The bias parameter estimate is clearly larger in the older age group than in the younger, suggesting that there are fewer than a third as many diagnosed recent PWID aged 35-64 years as estimated in stage 1.
The estimated odds ratio of the NSP-reported to the 'true' diagnosed proportion is 0.99 (95% CI = 0.53-2.11) in 15-34-year-olds and 1.75 (95% CI = 0.85-3.06) in 35-64-year-olds. Although the point estimates suggest an age difference in the bias parameter for the NSP diagnosed data, the wide, overlapping credible intervals provide no clear evidence of a difference.

Model fit
The overall model fit was assessed using deviance summaries. The baseline model provided a nearly exact fit, as the numbers of data points and parameters are similar (Table 6).
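The text does not spell out which deviance summaries were used. As one common possibility, the sketch below computes the posterior mean of the residual deviance for binomial data, which is expected to be close to the number of data points when a model fits well. This is an assumption for illustration, not a description of the authors' calculation, and the data and posterior draws shown are hypothetical.

```python
import math


def binomial_deviance(y, n, p):
    """Residual (saturated) deviance contribution of one binomial observation."""
    def term(observed, expected):
        # A zero count contributes nothing (limit of x * log(x) as x -> 0).
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    return 2.0 * (term(y, n * p) + term(n - y, n * (1.0 - p)))


def posterior_mean_deviance(observations, posterior_draws):
    """Average total deviance over posterior draws.

    observations: list of (y, n) pairs.
    posterior_draws: list of draws, each a list with one probability per observation.
    """
    totals = [
        sum(binomial_deviance(y, n, p) for (y, n), p in zip(observations, draw))
        for draw in posterior_draws
    ]
    return sum(totals) / len(totals)


# Hypothetical data (two observations) and three hypothetical posterior draws.
observations = [(60, 100), (45, 90)]
draws = [[0.58, 0.50], [0.61, 0.52], [0.60, 0.47]]
print(f"posterior mean deviance: {posterior_mean_deviance(observations, draws):.2f}")
```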

Sensitivity analysis
The inclusion of bias-adjustment parameters was driven by expert knowledge of the data sources and their potential biases. No direct empirical evidence was available to inform the bias parameters; hence, the expert knowledge comprised plausible upper and lower bounds for their prior distributions (Table 2). To assess sensitivity to this expert judgement, alternative models, including one in which the data sources were assumed to be unbiased, were explored:
Sensitivity 1: Model with unbounded bias parameters;
Sensitivity 2: Model without bias parameters; and
Sensitivity 3: Model without either bias parameters or CR informative prior for ρ_R.
When the bounds for the bias parameters are removed, the non-recent PWID estimates increase greatly, due to a higher upper limit for the 95% credible interval estimate of the risk group size (sensitivity 1).
Table 2 Prior distributions and constraints (see Model details). Comments: flat prior distribution; values outside the range of 0.5-5 were thought to be implausible; it was expected that any bias would be greater in the older age group (a = 2) than the younger age group (a = 1). CR = capture-recapture; PWID = people who inject drugs.
There is evidence that without bias adjustment (sensitivity 2) there is some lack of fit, with estimates of the number of recent PWID from this model being in conflict with those from the CR prior (17 811 and 15 618, respectively) (Table 7 and Fig. 2). Without the CR prior (sensitivity 3), the estimated number of recent PWID is even higher (27 977), further supporting the hypothesis that the lack of fit was due largely to this conflict between the CR study and the other data.
The main results and conclusions presented were based on the baseline model with bias adjustment parameters, as this gave the best model fit according to deviance statistics when incorporating all available relevant sources of information (Supporting information, Appendix S5).

DISCUSSION
Key findings
For the first time, using MPES, we have obtained estimates of the size of the HCV-undiagnosed populations, which are particularly valuable for the planning of future health-service demands and for identifying specific subgroups to target in prevention programmes. Other modelling has demonstrated that new HCV treatments could have a substantial impact on reducing HCV transmission among PWID [6,25]. Accurate assessment of the magnitude of that effect, as well as implementation of treatment strategies, will require reliable knowledge of the diagnosed and undiagnosed PWID populations. We estimated that of the approximately 46 700 prevalent HCV infections among PWID in Scotland in 2009, 59% were undiagnosed and 83% (95% CI = 75-89%) of the undiagnosed had not injected in the last year. While some non-recent PWID will be in contact with drug treatment services, an appreciable number may not be. Reaching this population may prove challenging, but doing so is necessary in order to implement diagnosis and treatment programmes. Our analysis has also highlighted a need to target diagnosis programmes towards older age groups. We estimated that 71% (95% CI = 58-85%) of undiagnosed PWID in 2009 were aged 35-64 years, compared with 55% of all new HCV diagnoses in Scotland in 2009-12 in the same age group [26]. Furthermore, as these older individuals are at greater risk of progressing to advanced stages of HCV disease, they have a pressing need for prompt treatment.
In Glasgow, HCV prevalence estimates are especially high in the older group due to a historical injecting epidemic which started in the early 1980s, and resulted in a rapid rise in the number of PWID before the establishment of needle/syringe exchange in the city.
Linkage of TrtDat to DiagDat enabled a better-informed estimate of the number of PWID among the HCV-diagnosed for use in the evidence synthesis than was obtainable from DiagDat alone. Through modelling the probability of linkage to TrtDat and the recent/non-recent status of PWID explicitly, estimates of the size of subgroups that were not observed directly (e.g. PWID in the unknown risk group) were obtained. This increased the estimated number of diagnosed PWID from 61 to 86% of all those diagnosed, comparable to a similar estimate from a CR study of the Scottish HCV-diagnosed population [22].

Limitations
Producing reliable estimates of the number of individuals with anti-HCV antibodies depends upon information on the size of the PWID population. The CR study provides estimates of the number of recent PWID, but no data on the size of the non-recent PWID population exist. This is, by nature, a difficult risk group to identify and survey. Recent PWID were estimated to account for only 19% (95% CI = 13-26%) of the ever-PWID population, similar to modelling projections for Scotland for 2010 of 19% [25] but smaller than estimates for England in 2005 of 40% [12].
Table 7 Posterior medians and 95% credible intervals for people who inject drugs (PWID) group size, number of hepatitis C virus (HCV) prevalent and number of HCV undiagnosed from sensitivity analyses.
The definition of non-recent PWID is variable, even among the data sources used here, and cannot be interpreted as long-term cessation of injecting. The CR study provides estimates of the number of PWID injecting during a particular year (2009), but in other data sources 'recent' was defined as injecting in the last month. However, this definition of recent does not capture all those who inject infrequently but remain at risk of continuing to inject, and nor does the CR definition of 'last year' injectors, due to the high uptake of methadone treatment in the PWID population. We were limited by the definitions in the data available; however, a challenge for the future is the collection of data using a broader definition of recent PWID, one which includes infrequent injectors and reflects that people temporarily cease injecting due to opioid substitution therapy and prison.
The sensitivity analyses highlighted a discrepancy between the number of recent PWID estimated by the CR study alone and the number suggested by the other data, which was resolved by inclusion of bias-adjustment parameters. The propensity of a recent PWID to contact drug treatment services and hence be reported in TrtDat may not be the only reason for bias; the regression analysis may not have captured fully the characteristics that distinguish recent from non-recent PWID. In the absence of data on timings or characteristics of the transition from injecting to non-injecting, a more comprehensive modelling of injecting careers, allowing prediction of the recent/non-recent status at any point in time, was impossible. Furthermore, to discriminate more clearly between the sensitivity analyses and estimate more accurately the magnitude of the biases in the data would require improved external data, ideally on the sizes of the recent and non-recent PWID populations, which are currently nonexistent due to the challenge of surveying these populations.

Findings in relation to other evidence
We have presented estimates of HCV-antibody prevalence in PWID in Scotland from the combined use of survey and surveillance data relating to PWID and HCV prevalence. The flexibility of the MPES approach allowed us to combine the information from each data source simultaneously; to account for any potential bias; and to propagate the full uncertainty of each contributing data item through to the final estimates. This approach overcomes the limitations of more traditional methods of prevalence estimation [13,27]. MPES methods have been employed successfully to estimate the prevalence and incidence of other diseases, including HIV [9], toxoplasmosis [8] and influenza [28,29], as well as for HCV prevalence estimation in other countries [12,30,31]. To our knowledge, however, this is the first synthesis that allows estimation of undiagnosed HCV prevalence. Evidence synthesis that accounts for expert knowledge of biases and other limitations of available data may be of value to other countries, particularly those with a mixed evidence base for HCV infection.

Implications
HCV testing in drug treatment services has been found recently to be effective in increasing the numbers of PWID diagnosed in Scotland [32,33]. Targeting older individuals with a history of injecting drug use through primary care can also be an effective case-finding approach [34]. However, such approaches will require fully engaged general practitioners (GPs) and community-setting practitioners in high HCV-prevalence areas, and widespread adoption, to diagnose the vast majority of PWID.
Our modelling has focused upon HCV in the PWID population. While these individuals account for the majority of the HCV burden, the contribution of other risk groups may also be important. HCV prevalence varies by ethnicity and it is thought that South Asian individuals may have an increased prevalence [12]. Future work will extend the evidence synthesis to include ethnicity, thus estimating the prevalence of undiagnosed HCV for the whole population in Scotland.

Declaration of interests
None.