Using capture‐recapture methods to estimate influenza hospitalization incidence rates

Abstract
Background: Accurate population estimates of disease incidence and burden are needed to set appropriate public health policy. The capture–recapture (C‐R) method combines data from multiple sources to provide better estimates than is possible using single sources.
Methods: Data were derived from clinical virology test results and from an influenza vaccine effectiveness study from seasons 2016–2017 to 2018–2019. The Petersen C‐R method was used to estimate the population size of influenza cases; these estimates were then used to calculate adult influenza hospitalization burden using a Centers for Disease Control and Prevention (CDC) multiplier method.
Results: Over all seasons, 343 influenza cases were reported in the clinical database, and 313 in the research database. Fifty‐nine cases (17%) reported in the clinical database were not captured in the research database, and 29 (9%) cases in the research database were not captured in the clinical database. Influenza hospitalizations were higher among the vaccinated (58%) than the unvaccinated (35%) in the current season and were similar among the unvaccinated (51%) and vaccinated (49%) in the previous year. Completeness of the influenza hospitalization capture was estimated to be 76%. The incidence rates for influenza hospitalizations varied by age and season and averaged 307–309 cases/100,000 adult population annually.
Conclusion: Using C‐R methods with more than one database, along with a multiplier method with adjustments, improves the population estimates of influenza disease burden compared with relying on a single‐data source.

The accuracy of population estimates depends largely upon the quality of sampling, which in turn depends upon many factors, including consistency of the population being sampled across all capture occasions, 1 and affected individuals having contact with the healthcare system to allow for enumeration. Cases of unreported disease are more difficult to account for.
Statistical methods to improve population disease incidence and prevalence detection include the capture-recapture (C-R) method.
The Lincoln-Petersen (Petersen) method was the earliest C-R method for estimating population size. It was developed for field studies of animals in which only a sample of the population can be caught and marked (captured). Frequently, animals are re-caught (recaptured), and this method allows for recaptured animals to improve the population estimate. 2 In health-related research, C-R uses the overlap of subjects from two or more data sources and log-linear methods to more accurately estimate true population disease burden. 3 C-R has the advantage of using both research and clinical databases to measure the same data, thereby creating a fuller perspective; however, the best method to adjust for complex denominators in urban areas with competing health systems is not straightforward within C-R.
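As an illustration, the two-source Petersen estimate can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the study's code: the function name `petersen_estimate` is hypothetical, and the counts are the pooled three-season totals reported in the abstract (343 clinical cases and 313 research cases, with the overlap derived as 343 - 59 = 284 rather than reported directly).

```python
def petersen_estimate(n_first: int, n_second: int, n_both: int) -> float:
    """Lincoln-Petersen estimate of total population size.

    n_first  -- cases found in the first source (the "capture")
    n_second -- cases found in the second source (the "recapture")
    n_both   -- cases found in both sources (the overlap)
    """
    if n_both <= 0:
        raise ValueError("the estimator is undefined without overlap between sources")
    if n_both > min(n_first, n_second):
        raise ValueError("overlap cannot exceed either source's count")
    return n_first * n_second / n_both


# Pooled counts from the abstract; the overlap is derived, not reported directly.
clinical = 343
research = 313
overlap = clinical - 59          # clinical cases that were also in the research database
assert overlap == research - 29  # the same overlap when derived from the research side

observed = clinical + research - overlap  # unique cases seen in at least one source
estimated = petersen_estimate(clinical, research, overlap)

print(f"observed unique cases: {observed}")
print(f"Petersen estimate:     {estimated:.1f}")
```

With these pooled counts the estimate comes to roughly 378 cases against 372 unique observed cases. The sketch ignores stratification by age, season, or vaccination status, which the study applies separately.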
The Centers for Disease Control and Prevention (CDC) has developed methods to estimate population influenza burden (L Kim, personal communication, 2020) that account for some of the complexities of multicenter study design and the incomplete nature of surveillance data of the Hospitalized Adult Influenza Vaccine Effectiveness Network (HAIVEN) study. The current study used available data from a single health system, HAIVEN methods and adjustments, and C-R calculations, to estimate adult influenza hospitalization burden in Allegheny County in Southwestern Pennsylvania.

| METHODS
The study was approved by the University of Pittsburgh IRB and followed a three-phase analytic plan: (1) C-R to more accurately estimate influenza cases; (2) statistical analyses for population burden based on the CDC HAIVEN network methods (Lindsay Kim, MD, personal communication); and (3) an adjustment to the resulting population burden using C-R incidence estimates. The HAIVEN population burden methods may not account for the richness of the clinical virology data available in our particular locale. Our adjustment to the HAIVEN methods was intended to capitalize on both the robust nature of the HAIVEN methods and the richness of the local virology data.

| Phase 1: Statistical analyses for C-R
Data used for this analysis were collected from two sources: (1) the local health system's clinical surveillance software system (Theradoc®), which extracts virology test results from the electronic medical record (EMR); and (2) research data from selected hospitals participating in the HAIVEN study.
An IRB-approved honest broker extracted a data list from …
The variance and 95% confidence intervals (CIs) were calculated for the C-R estimates using the formulae: …
Calculation of completeness of reporting by the two sources of the C-R method is determined by calculating the number of missing cases, X … An example of a C-R estimate is shown in Table S1.
The C-R calculations were made assuming that (1) the population is closed; that is, there was no outmigration or loss to follow-up, because the capture and recapture would usually have occurred during the same hospitalization.
Secondly, it is assumed that the populations are homogeneous; that is, each hospitalized patient has the same and constant probability …
Independence equality: Independence was tested for the 3-year total samples and the 15 subpopulations derived by stratifying on demographic factors (age, sex, and race), influenza season, vaccination status, and prior vaccination status (Table S2).
… should come from the same database.
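The variance, confidence interval, and completeness calculations described for Phase 1 can be sketched in Python. Because the formulae themselves are not shown, this sketch uses a common textbook choice rather than the authors' exact expressions: Chapman's bias-corrected form of the Petersen estimator, Seber's variance approximation, and a normal-approximation 95% CI. Completeness is assumed here to be the share of estimated cases observed in at least one source. All function names are hypothetical, and the counts are the pooled totals from the abstract, so the printed completeness is not expected to reproduce the figure reported there.

```python
import math
from typing import Tuple


def chapman_estimate(n_first: int, n_second: int, n_both: int) -> float:
    """Chapman's bias-corrected Lincoln-Petersen estimate of population size."""
    return (n_first + 1) * (n_second + 1) / (n_both + 1) - 1


def chapman_variance(n_first: int, n_second: int, n_both: int) -> float:
    """Seber's approximate variance of the Chapman estimate."""
    numerator = (n_first + 1) * (n_second + 1) * (n_first - n_both) * (n_second - n_both)
    denominator = (n_both + 1) ** 2 * (n_both + 2)
    return numerator / denominator


def normal_ci(estimate: float, variance: float, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval (z = 1.96 gives 95%)."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


def completeness(n_first: int, n_second: int, n_both: int, estimate: float) -> float:
    """Share of the estimated population seen in at least one source (assumed definition)."""
    observed = n_first + n_second - n_both
    return observed / estimate


# Pooled three-season counts from the abstract; overlap derived as 343 - 59.
clinical, research, overlap = 343, 313, 284

n_hat = chapman_estimate(clinical, research, overlap)
var_hat = chapman_variance(clinical, research, overlap)
low, high = normal_ci(n_hat, var_hat)
missing = n_hat - (clinical + research - overlap)  # estimated cases missed by both sources
share_captured = completeness(clinical, research, overlap, n_hat)

print(f"estimate {n_hat:.1f}, 95% CI {low:.1f} to {high:.1f}")
print(f"estimated missing cases {missing:.1f}")
print(f"completeness {share_captured:.1%}")
```

The same functions could be applied per stratum (age group, season, vaccination status) by passing that stratum's three counts. Strata with very small overlap give wide intervals, and the normal approximation becomes less reliable.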

| Phase 3: Combination of C-R and HAIVEN methods for adjusted population burden
To incorporate the C-R method into the HAIVEN methods to account for cases estimated by C-R but not due to the enrollment fraction, the following modification of Equation 7 was used:

| RESULTS
The viral test result analytic databases are shown in Figure 1.
Table 2. Estimated population influenza hospitalizations using the capture-recapture method
The Petersen C-R method using two sources of data in this study is a special case of the generalized C-R method for estimating burden using multiple data sources. In this study, estimates were nearly identical between observed and estimated incidence. By comparison, in a C-R study of norovirus cases, the combined databases yielded an incidence rate 2.5 times that of the highest individual database. 12 The assumptions of the Petersen estimator allow for any amount of overlap in the databases. We have shown in Table S2 that the independence assumption is met for the 3-year total and all subpopulations …

| Strengths and limitations
Some inaccuracy of burden estimates based on surveillance data would be expected given its inherent weaknesses, compared with true population-based data. Hence, adjustments were made to account for some of those weaknesses. The methods were further enhanced by using C-R to estimate hospitalized cases. This study is one of only a few C-R papers for influenza hospitalization rates among adults and the only one of which we are aware for these seasons. It was con-

| CONCLUSIONS
Influenza illness is associated with significant costs that include lost productivity due to absenteeism and presenteeism, lost wages, and costs of medical care. Understanding the burden of influenza hospitalization is important for policy makers to allocate resources for the prevention and treatment of influenza. Petersen's C-R method, in combination with the CDC HAIVEN burden method, improved population estimates.

ACKNOWLEDGMENT
The Pennsylvania Health Care Cost Containment Council (PHC4) is an independent state agency responsible for addressing the problem of escalating health costs, ensuring the quality of healthcare, and increasing access to healthcare for all citizens regardless of ability to pay.
PHC4 has provided data to this entity in an effort to further PHC4's mission of educating the public and containing healthcare costs in Pennsylvania.
PHC4, its agents, and staff have made no representation, guarantee, or warranty, express or implied, that the data (financial, patient, payor, and physician-specific information) provided to this entity are error-free, or that the use of the data will avoid differences of opinion or interpretation.
This analysis was not prepared by PHC4. This analysis was done by the University of Pittsburgh. PHC4, its agents and staff, bear no responsibility or liability for the results of the analysis, which are solely the opinion of this entity.