Record linkage study of the pathogen‐specific burden of respiratory viruses in children

Background: Reliance on hospital discharge diagnosis codes alone is likely to underestimate the burden of respiratory viruses.
Objectives: To describe the epidemiology of respiratory viruses more accurately, we used record linkage to examine data relating to all children hospitalized in Western Australia between 2000 and 2012.
Patients/Methods: We extracted hospital, infectious disease notification and laboratory data for a cohort of children born in Western Australia between 1996 and 2012. Laboratory records of respiratory specimens collected within 48 hours of admission were linked to hospitalization records. We calculated the frequency and rates of virus detection. To identify groups in which under‐ascertainment of respiratory viruses was greatest, we used logistic regression to determine factors associated with failure to test.
Results and conclusions: Nine percent of 484 992 admissions were linked to a laboratory record for respiratory virus testing. While 62% (n = 26 893) of laboratory‐confirmed admissions received respiratory infection diagnosis codes, 38% (n = 16 734) had other diagnoses, notably viral infection of unspecified site. Among those tested, incidence rates were highest for respiratory syncytial virus (247 per 100 000 child‐years), followed by parainfluenza (63 per 100 000 child‐years). Admissions of older children and those without a respiratory diagnosis were associated with failure to test for respiratory viruses. Linked data can substantially enhance diagnostic codes when estimating the true burden of disease. In contrast to the current emphasis on influenza, respiratory syncytial virus and parainfluenza were the most common viral pathogens among hospitalized children. By characterizing those who were not tested, we can begin to quantify the under‐ascertainment of respiratory viruses.


| INTRODUCTION
Acute lower respiratory infections (ALRI) include conditions such as bronchiolitis, pneumonia and influenza. The true burden of disease is difficult to calculate, and estimates are heavily influenced by identification methods. In global estimates for 2010, approximately 10 per 1000 children aged less than 5 years in developed countries were hospitalized for ALRI. 1 Using linked administrative data, we estimated that in Western Australia (WA) the true rate was significantly higher: approximately 45 per 1000 children were hospitalized for ALRI before their second birthday, with a much higher burden among Aboriginal and Torres Strait Islander children (hereafter referred to as Aboriginal). 2 Viruses associated with ALRI include influenza viruses, respiratory syncytial virus (RSV) and human metapneumovirus (hMPV). Current vaccines targeting ALRI viral pathogens are limited to influenza; however, multiple RSV vaccine candidates are in clinical trials. 3 Accurate pathogen-specific estimates are essential for planning and evaluating prevention programmes.
At present, most retrospective studies using hospital admission data to investigate the aetiology of ALRI begin by selecting children with an International Classification of Diseases (ICD) discharge code for ALRI. ICD codes alone are insufficient to provide accurate estimates of the true burden of respiratory pathogens because they have poor sensitivity 4 and pathogen-specific codes are rarely used. Furthermore, data are often from selected hospitals and may not be representative of the wider population. Record linkage offers one method for calculating the pathogen-specific burden for a whole population more accurately.
Record linkage combines data from multiple sources relating to the same person. 5 Access to large, population-level data sets at relatively low cost is one advantage of record linkage over prospective studies. 6 We have shown that record linkage is a feasible and valid method for describing the pathogen-specific aetiology of ALRI. 7,8 It is also useful for identifying conditions and populations where current databases underestimate the burden of disease.
We first sought to describe more accurately the pathogen-specific burden of respiratory viruses (specifically, age-specific rates and clinical diagnoses) identified in a cohort of WA-born children hospitalized between 2000 and 2012. Because a substantial proportion of hospitalized children are not tested for respiratory viruses, their viral burden is probably underestimated. Therefore, we also aimed to characterize those who were not tested, so that statistical modelling could be used to assign viruses to them based on demographic and admission factors.

| Setting and data sources
WA covers approximately 2.5 million square kilometres and had a population of approximately 2.5 million people in June 2015. 9

| Hospital data
As PathWest data were only available from 2000, we restricted hospital records to those with an admission date between January 2000 and December 2012. Only hospital records with an admission and discharge date during the study period were included (hereafter referred to as hospital admissions). Admissions on or after the date of death were considered post-mortem admissions and excluded.
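The admission-selection rules above (admission and discharge within the study period, post-mortem admissions excluded) can be sketched as a simple filter. This is illustrative only: the `Admission` record, its field names and the date-level comparison are hypothetical, not the study's actual data structures.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Study period used to restrict hospital records (from the text).
STUDY_START = date(2000, 1, 1)
STUDY_END = date(2012, 12, 31)


@dataclass
class Admission:
    # Hypothetical record layout, for illustration only.
    person_id: str
    admitted: date
    discharged: date
    death_date: Optional[date] = None


def in_study_period(d: date) -> bool:
    return STUDY_START <= d <= STUDY_END


def is_hospital_admission(adm: Admission) -> bool:
    """Keep admissions whose admission and discharge dates both fall in the
    study period; drop post-mortem admissions (admitted on or after death)."""
    if not (in_study_period(adm.admitted) and in_study_period(adm.discharged)):
        return False
    if adm.death_date is not None and adm.admitted >= adm.death_date:
        return False
    return True


def select_admissions(admissions: List[Admission]) -> List[Admission]:
    return [a for a in admissions if is_hospital_admission(a)]
```

An admission that straddles 1 January 2000 or 31 December 2012 is dropped entirely, which matches the requirement that both dates fall within the study period.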
Clinical diagnosis was classified using a hierarchy of ALRI, upper respiratory tract infections (URTI), or other, with priority given to ALRI.
Admissions were classified as ALRI if they had a principal or codiagnosis ICD-10-AM (ICD 10th revision, Australian Modification) code for pneumonia, bronchiolitis, influenza, unspecified ALRI, bronchitis or whooping cough (codes listed in Table S1). 10 Admissions with a principal or codiagnosis of URTI, such as otitis media and sinusitis, were classified as URTI (Table S1). 11 All other admissions were classified as "other" and further subdivided based on principal diagnosis codes (Table S1).
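The ALRI > URTI > other hierarchy can be sketched as a prefix match over all diagnosis codes on an admission. The code prefixes below are illustrative ICD-10 categories chosen as an assumption; the study's actual code lists are in Table S1 and may differ. The further subdivision of "other" admissions is not shown.

```python
from typing import Sequence

# Illustrative ICD-10 category prefixes only (an assumption); the study's
# full code lists are given in Table S1 and may differ from these.
ALRI_PREFIXES = (
    "J09", "J10", "J11",                              # influenza
    "J12", "J13", "J14", "J15", "J16", "J17", "J18",  # pneumonia
    "J20", "J21", "J22",                              # bronchitis, bronchiolitis, unspecified ALRI
    "A37",                                            # whooping cough
)
URTI_PREFIXES = (
    "J00", "J01", "J02", "J03", "J04", "J05", "J06",  # acute URTI, incl. sinusitis (J01)
    "H65", "H66",                                     # otitis media
)


def classify_admission(principal: str, codiagnoses: Sequence[str]) -> str:
    """Hierarchical classification over principal and codiagnosis codes:
    ALRI takes priority over URTI; anything else is 'other'."""
    codes = [principal, *codiagnoses]
    if any(code.startswith(ALRI_PREFIXES) for code in codes):
        return "ALRI"
    if any(code.startswith(URTI_PREFIXES) for code in codes):
        return "URTI"
    return "other"
```

Because ALRI is checked first, an admission with a principal URTI code and an ALRI codiagnosis is still classified as ALRI.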

| Laboratory data
PathWest is the sole public pathology provider in WA and services all but three hospitals admitting paediatric patients in the state. It also processes samples referred from private pathology laboratories in WA. Further details on PathWest are provided elsewhere. 7 The Western Australian Notifiable Infectious Diseases Database (WANIDD) is managed by the WA Department of Health and collects information on all notifiable infectious diseases in WA. We combined PathWest and WANIDD data on respiratory specimens collected from the birth cohort between January 2000 and December 2012.
We included laboratory data from nasal/nasopharyngeal (NP), throat, tracheal, bronchial, sputum, lung, pleural fluid, blood or serum specimens. Nasal/NP specimens included combined nose and throat, nasopharyngeal, per- and postnasal swabs and aspirates. Specimens were included if respiratory virus testing was requested and one or more detection methods were used (including antigen detection [eg, immunofluorescent antibodies], polymerase chain reaction [PCR], viral culture and serology). We focussed on respiratory viruses because parallel data on testing for bacterial pathogens were incomplete. Moreover, the sensitivity and specificity of current microbiological diagnostics for bacterial ALRI are relatively poor.
Respiratory viruses routinely tested for on respiratory specimens included RSV, influenza A and B, human adenovirus and parainfluenza types 1-3. Testing for hMPV was available from 2003 and performed routinely from 2008. Testing for other respiratory viruses, such as human picornaviruses, was performed only on request. Picornaviruses were detected by PCR and viral culture but were not routinely speciated into rhinovirus or enterovirus. A specimen was considered positive if one or more of adenovirus, influenza virus (any type), parainfluenza virus (any type), hMPV, picornavirus (unspeciated), rhinovirus or RSV was detected by antigen detection, PCR, culture or serology.
Admissions where both picornavirus and rhinovirus were detected were coded as rhinovirus, whereas if only an unspecified picornavirus was identified, it was coded as a picornavirus admission.
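The positivity definition and the picornavirus/rhinovirus coding rule above can be sketched as follows. This assumes detections from any of the four methods have already been pooled, and that influenza and parainfluenza types have been collapsed into one label each; the label strings are hypothetical.

```python
from typing import Iterable, Set

# Virus labels that count towards a positive result (from the text).
# Assumption: influenza and parainfluenza types are already collapsed to one
# label each, and detections by any method (antigen, PCR, culture, serology)
# are pooled upstream.
TARGET_VIRUSES = {
    "adenovirus", "influenza", "parainfluenza", "hMPV",
    "picornavirus", "rhinovirus", "RSV",
}


def coded_viruses(detected: Iterable[str]) -> Set[str]:
    """Viruses recorded for an admission, applying the picornavirus rule."""
    found = {v for v in detected if v in TARGET_VIRUSES}
    if "rhinovirus" in found:
        # Picornavirus plus rhinovirus is coded as rhinovirus only.
        found.discard("picornavirus")
    return found


def is_positive(detected: Iterable[str]) -> bool:
    """Positive if at least one target virus was detected."""
    return bool(coded_viruses(detected))
```

An unspeciated picornavirus on its own is kept as "picornavirus", so the two categories reported in the results stay distinct.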
Laboratory records were merged with a hospital record if the respiratory specimen was collected within 48 hours before or after the admission date. If the same person had multiple admissions for different reasons within 48 hours, laboratory records were linked to the ALRI admission.
If the same person had multiple admissions for the same reason within 48 hours, laboratory records were linked to the admission closest to the date of specimen collection.
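The 48-hour linkage rules can be sketched as a candidate search followed by two tie-breaks. The record layout and the use of timestamps rather than dates are assumptions; the text does not say how ties between URTI and "other" admissions were resolved, so this sketch falls back to the admission closest to specimen collection in that case.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Specimen must be collected within 48 h before or after admission.
WINDOW = timedelta(hours=48)


@dataclass
class Admission:
    # Hypothetical record layout, for illustration only.
    admission_id: str
    person_id: str
    admitted: datetime
    category: str  # "ALRI", "URTI" or "other"


def link_specimen(person_id: str, collected: datetime,
                  admissions: List[Admission]) -> Optional[Admission]:
    """Return the admission a specimen should be linked to, or None."""
    candidates = [
        a for a in admissions
        if a.person_id == person_id and abs(collected - a.admitted) <= WINDOW
    ]
    if not candidates:
        return None
    # Different reasons: prefer an ALRI admission if one is in the window.
    alri = [a for a in candidates if a.category == "ALRI"]
    pool = alri if alri else candidates
    # Same reason (or no ALRI candidate, an assumption): closest admission.
    return min(pool, key=lambda a: abs(collected - a.admitted))
```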

| Statistical analysis
After merging hospital and laboratory records, we compared the clinical characteristics of admissions classified as ALRI, URTI and other. We then identified those with at least one virus detected and calculated the frequency and incidence of each virus by diagnosis, age and Aboriginal status. Incidence rates were calculated using person-time-at-risk as the denominator, derived from dates of birth, death and end of the study (31 December 2012). Aboriginal children were identified using a derived variable provided by the WA Data Linkage Branch. 12 We used logistic regression to identify admission-specific factors associated with failure to have a virology test (outcome variable), clustering by person to allow for multiple admissions of the same person. Variables were selected based on clinical plausibility and/or a p-value of less than 0.05 compared with the base model. These were included in the multivariable model to calculate adjusted odds ratios (aOR) and 95% confidence intervals (CI).

FIGURE 1 Flow chart of testing for respiratory viruses and diagnosis. The total number of children admitted in each category may not equal the total number of children hospitalized, as a child may be hospitalized more than once.

Nine percent of the 484 992 hospital admissions were linked to a laboratory record (Figure 1).
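The person-time-at-risk denominator described in the statistical analysis can be sketched by summing individual follow-up. The start of follow-up at the later of birth and 1 January 2000 is an assumption (the text names only birth, death and end of study), as is the 365.25-day year.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

# Assumption: follow-up starts at the later of birth and 1 January 2000
# (start of the PathWest data) and ends at death or 31 December 2012.
STUDY_START = date(2000, 1, 1)
STUDY_END = date(2012, 12, 31)
DAYS_PER_YEAR = 365.25


def child_years(birth: date, death: Optional[date] = None) -> float:
    """Individual person-time at risk, in years."""
    start = max(birth, STUDY_START)
    end = STUDY_END if death is None else min(death, STUDY_END)
    return max((end - start).days, 0) / DAYS_PER_YEAR


def rate_per_100k(events: int,
                  cohort: Iterable[Tuple[date, Optional[date]]]) -> float:
    """Incidence per 100 000 child-years, summing individual person-time
    over (birth, death) pairs for every child in the cohort."""
    person_years = sum(child_years(birth, death) for birth, death in cohort)
    return events / person_years * 100_000
```

Summing per-child follow-up, rather than using mid-year population counts, lets children born late in the study or who die contribute only the time they were actually at risk.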

| RESULTS
Of all laboratory-confirmed admissions, 47.8% had an ALRI diagnosis (n = 20 874; Figure 1). Over 70% of laboratory-confirmed admissions with an ALRI diagnosis concerned children aged less than 2 years at admission (Table S2). The majority of these children […] (Table 1). Approximately 12.5% of these admissions related to children aged 5 years or more at admission, and 12.3% of these admissions included a stay in ICU (Table S2). Approximately 20.6% (n = 3651) of laboratory-confirmed admissions with at least one virus detected did not have an ALRI or URTI diagnosis (Table 1). RSV was the most commonly detected virus, with an incidence rate four times that of the next most common virus (95% CI = 3.8-4.1; Table 1). RSV incidence rates were highest among children under 6 months of age (Figure 2A) and among those diagnosed with bronchiolitis (Table 1).
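An incidence rate ratio such as the RSV comparison above, with its 95% CI, can be computed on the log scale. This sketch assumes Poisson-distributed counts and the usual large-sample standard error sqrt(1/a + 1/b); the text does not state which method the authors used.

```python
import math
from typing import Tuple


def rate_ratio_ci(events_a: int, pt_a: float, events_b: int, pt_b: float,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Rate ratio (a vs b) with a Wald CI on the log scale, assuming
    Poisson counts. Returns (ratio, lower, upper)."""
    rr = (events_a / pt_a) / (events_b / pt_b)
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)
```

When both viruses are counted in the same cohort, the person-time denominators are identical and cancel, so the ratio reduces to the ratio of event counts; the width of the interval then depends only on those counts.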

| Pathogen-specific burden of respiratory viruses
Unspeciated picornavirus, RSV, parainfluenza and rhinovirus had similar incidence rates among children with other diagnoses (Table 1).
Rates of unspeciated picornavirus were highest among infants aged less than 1 month (Figure 2B). On the other hand, rates of parainfluenza […]

| Exploring under-ascertainment of respiratory viruses
Univariable analyses suggested that age at admission, diagnosis, length of stay, hospital type and mechanical ventilation were strongly associated with failure to test for respiratory viruses (Table 2). These variables remained significant after adjusting for all other variables in the multivariable model.
Children over 5 years of age and those without an ALRI diagnosis had at least fourfold greater odds of not being tested (Table 2).
Furthermore, admissions to non-tertiary hospitals had at least threefold greater odds of not being tested than admissions to tertiary hospitals.
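For a single binary factor (for example, age 5 years or more vs younger), a univariable logistic regression gives the same point estimate as the crude 2 × 2 odds ratio. The sketch below computes that crude odds ratio of not being tested with a Woolf (log) CI. It is a simplified analogue only: unlike the study's models, it ignores clustering by person and adjustment for other variables, so its CI will generally differ from those in Table 2.

```python
import math
from typing import Tuple


def crude_odds_ratio(a: int, b: int, c: int, d: int,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds of not being tested, exposed vs unexposed, with a Woolf CI.

    a: exposed, not tested      b: exposed, tested
    c: unexposed, not tested    d: unexposed, tested
    Returns (odds ratio, lower, upper). Ignores clustering by person.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (odds_ratio,
            odds_ratio * math.exp(-z * se_log),
            odds_ratio * math.exp(z * se_log))
```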
When restricted to admissions with an ALRI diagnosis, results were similar for all variables except ICU admission and interhospital transfers (Table S3). Children with an ALRI diagnosis admitted to regular wards had 0.71 times the odds of not being tested for respiratory viruses compared with those admitted to ICU (Table S3). Likewise, admissions following interhospital transfers had lower odds of not receiving a test than those that were not transferred (Table S3).

TABLE 1 Frequency and incidence rates of respiratory viruses per 100 000 child-years among those who were tested

| DISCUSSION
Approximately 38% of all laboratory-confirmed admissions were not coded as ALRI or URTI, with 22% of these admissions testing positive for at least one respiratory virus. These detections do not appear to be solely attributable to asymptomatic virus identification, as many of these viruses were infrequently detected in asymptomatic children. 16 Moreover, these laboratory-confirmed admissions were mostly assigned unspecified diagnosis codes (eg, viral infection of unspecified site) or codes for non-specific clinical symptoms (eg, breathing abnormalities). This highlights the limitations of using ICD codes alone when calculating disease estimates, as documented elsewhere. 4,17,18 Previous studies also support the notion that coding algorithms, particularly for less well-defined diseases, are needed for accurate disease estimates. 19 Such estimates are crucial to policy development and are often used as baseline data when evaluating the impact of these policies. Our findings further highlight the need to look beyond ICD codes when estimating infectious disease burden and the impact of vaccination programmes.
While there were many laboratory-confirmed admissions without a respiratory code, over half of all ALRI admissions were not tested for respiratory viruses, implying that the burden of specific viruses is still underestimated. Because testing patterns cannot be directly influenced and universal testing is economically unfeasible, one way to address this issue is statistical modelling.
By characterizing those testing positive for particular viruses, we could extrapolate virus "detection" to those with similar characteristics but not tested. This method was used on similar data in England, 20 and a future study using these data is planned.
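The simplest version of this idea is stratum-based extrapolation: untested admissions are assigned the virus positivity observed among tested admissions with the same characteristics. The sketch below is only that simplified approach, not the method of the English study 20 or of the planned analysis; strata (for example, age group by diagnosis) are whatever hashable keys the caller supplies.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def extrapolate_detections(
    records: Iterable[Tuple[Hashable, bool, bool]],
) -> Dict[Hashable, float]:
    """Expected virus-positive admissions per stratum.

    records: (stratum, tested, positive) per admission; 'positive' is ignored
    for untested admissions. Each untested admission is assigned the
    positivity observed among tested admissions in the same stratum.
    """
    tested: Dict[Hashable, int] = defaultdict(int)
    positive: Dict[Hashable, int] = defaultdict(int)
    untested: Dict[Hashable, int] = defaultdict(int)
    for stratum, was_tested, was_positive in records:
        if was_tested:
            tested[stratum] += 1
            positive[stratum] += int(was_positive)
        else:
            untested[stratum] += 1

    expected: Dict[Hashable, float] = {}
    for stratum, n_tested in tested.items():
        positivity = positive[stratum] / n_tested
        # Observed positives plus extrapolated positives among the untested.
        expected[stratum] = positive[stratum] + positivity * untested[stratum]
    # Strata with no tested admissions have no positivity to borrow and are omitted.
    return expected
```

This assumes that, within a stratum, tested and untested admissions have the same probability of carrying a virus; the factors associated with failure to test are exactly what such strata would need to capture.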
We noted that admissions to private metropolitan hospitals had five times greater odds of not being tested compared with tertiary hospitals. We were unable to investigate the influence of comorbidities (eg, immunosuppression) and antiviral use on testing practices and the frequency of virus detection. We also could not examine the emerging role of rhinovirus in severe respiratory infections 21 […] to influence incidence rates of individual viruses investigated here.
Inclusion of data on bacterial pathogens is the next step in assessing the burden of all respiratory pathogens.

| CONCLUSION
Despite these limitations, this is one of the few studies, to our knowledge, to quantify the population-level burden of respiratory viruses in children, irrespective of diagnosis. It is also the first to use individual, rather than aggregated, person-time-at-risk data to more accurately estimate the rate of respiratory virus infection. This has enabled reporting of pathogen-specific incidence rates of these viruses, the majority of which are not notifiable. Access to data on all admissions for a whole population, in addition to both positive and negative test results, is a major strength of this study. Using these data, we have shown that respiratory viruses are pervasive and their prevention could reduce the burden of respiratory and non-respiratory hospitalizations. These data provide a framework for further, in-depth studies to enhance current and future preventative strategies. We plan to use these data to investigate temporal trends and risk factors for specific viruses and their combinations in the context of changes in testing patterns.