Evaluation of using ICD‐10 code data for respiratory syncytial virus surveillance

Abstract Background Respiratory syncytial virus (RSV) is the most common cause of acute lower respiratory tract infection (ALRI) in young children. ICD‐10‐based syndromic surveillance can transmit data rapidly in a standardized way. Objectives We investigated the use of RSV‐specific ICD‐10 codes for RSV surveillance. Methods We performed a retrospective descriptive data analysis based on existing ICD‐10‐based surveillance systems for ALRI in primary and secondary care and a linked virological surveillance in Germany. We described RSV epidemiology and compared the epidemiological findings based on ICD‐10 and virological data. We calculated the sensitivity and specificity of RSV‐specific ICD‐10 codes, alone and in combination with ICD‐10 codes for acute respiratory infections (ARI), for the identification of laboratory‐confirmed RSV infections. Results The epidemiology of RSV was described based on the ICD‐10 and virological data, and both data sources showed common findings. The RSV‐specific ICD‐10 codes had poor sensitivity (6%, 95%‐CI: 3%‐12%) and high specificity (99.8%, 95%‐CI: 99.6%‐99.9%). In children <5 years and in RSV seasons, the sensitivities of RSV‐specific ICD‐10 codes combined with general ALRI ICD‐10 codes J18.‐, J20.‐ and with J12.‐, J18.‐, J20.‐, J21.‐, J22 were moderate (44%, 95%‐CI: 30%‐59%). The specificities of both combinations remained high (91%, 95%‐CI: 86%‐94%; 90%, 95%‐CI: 85%‐94%). Conclusions The use of RSV‐specific ICD‐10 codes may be a useful indicator to describe RSV epidemiology. However, RSV‐specific ICD‐10 codes underestimate the number of actual RSV infections. This can be overcome by combining RSV‐specific and general ALRI ICD‐10 codes. Further investigations are required to validate this approach in other settings.


| INTRODUCTION
Respiratory syncytial virus (RSV) is a pathogen distributed worldwide that causes acute respiratory infection (ARI) in all age groups. In infants and young children, RSV is the most common cause of acute lower respiratory tract infection (ALRI) and a major cause of hospital admission for ALRI. Worldwide in 2015, 21.6-50.3 million RSV-associated ALRI episodes occurred in children younger than 5 years, with about 2.7-3.8 million hospital admissions. 1,2 Currently, only passive immunization with palivizumab against RSV is available for children at high risk. 3 In 2015, the World Health Organization

International Statistical Classification of Diseases and Related Health Problems (ICD) diagnosis codes have been used to describe the burden of respiratory diseases and the impact of vaccination. [6][7][8][9] ICD-based digital syndromic surveillance is a relatively novel surveillance practice compared to traditional surveillance. It can not only describe the epidemiology of disease, but also capture and transmit data rapidly in a standardized and sustainable way at lower costs, and provide very early warning of potential public health threats. [10][11][12]
The Robert Koch Institute (RKI) established digital syndromic surveillance systems based on the 10th revision of the ICD (ICD-10) for influenza and other ARI in primary and secondary care in Germany (Appendix S1). In primary care, general practitioners, internists, and pediatricians of sentinel practices report influenza and other ARI data voluntarily through a syndromic influenza surveillance system. This system has been linked with a virological surveillance and a sentinel electronic data collection system based on ICD-10 codes (SEED ARE). 13 SEED ARE was evaluated as a valid system for syndromic influenza surveillance. 14 In secondary care, an ICD-10 code-based surveillance system for severe acute respiratory infections (ICOSARI) has been implemented in cooperation with a private hospital network in Germany. 15
Studies estimating the validity of ICD diagnosis codes for the identification of laboratory-confirmed influenza have shown mixed results. 14,[16][17][18] So far, few studies have looked at the accuracy of RSV-specific ICD-10 diagnosis codes for the identification of true RSV infections. To our knowledge, only Pisesky et al 19 reported high sensitivity (97.9%, 95%-CI: 95.5%-99.2%) and specificity (99.6%, 95%-CI: 98.2%-99.8%) of RSV-specific ICD-10 codes for the identification of hospitalized RSV among children.
The aim of this study was to evaluate the use of RSV-specific ICD-10 diagnosis codes for RSV surveillance.

| METHODS
We performed a retrospective descriptive data analysis based on data derived from the ICD-10-based influenza and other ARI surveillance systems SEED ARE and ICOSARI, and from the virological surveillance at the RKI. The SEED ARE system has been in operation since 2007, the virological surveillance since 2010, and ICOSARI since 2015. The datasets of ICOSARI for the years 2009 to 2014 were collected retrospectively. Appendix S1 provides details on the surveillance participants, data collection methods, collected data, total number of collected data, and study period. [13][14][15] The SEED ARE system was approved by the German Federal Commissioner for Data Protection and Freedom of Information, and the ICOSARI system by the RKI and HELIOS Kliniken GmbH data protection authority. As SEED ARE and ICOSARI involved no interventions and the analysis was based on anonymized data only, no ethical clearance was required for them. 14,15 The virological surveillance activities were approved by the German Federal Commissioner for Data Protection and Freedom of Information and the Ethical Committee of the Charité, Universitätsmedizin, Berlin.
We defined an RSV-ICD-case based on SEED ARE data as a medical consultation with any of the three RSV-specific ICD-10 code diagnoses (J12.1 RSV pneumonia, J20.5 acute bronchitis due to RSV, and J21.0 acute bronchiolitis due to RSV). 6 We defined an RSV-ICD-case based on ICOSARI data as a hospitalization with any of the three RSV-specific ICD-10 code diagnoses as primary discharge diagnosis. In the virological surveillance, we defined a confirmed-RSV-case as a sample confirmed as RSV positive by real-time reverse transcriptase polymerase chain reaction (rtRT-PCR). In each data source, an RSV season was defined as the weeks when the cumulative number of RSV-ICD-cases or confirmed-RSV-cases exceeded 1.2% of total RSV-ICD-cases or confirmed-RSV-cases. One gap week below the threshold was allowed. 20,21
We estimated the number of RSV-ICD-cases and confirmed-RSV-cases by gender, age group (0-1, 2-4, 5-14, 15-34, 35-49, 50-59, ≥60 years), and calendar week for each data source.
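The season definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the function name `find_season` is hypothetical, the weekly count is compared against 1.2% of the total of the series passed in, and the season is taken as the longest run of weeks above the threshold in which single below-threshold weeks are bridged.

```python
def find_season(weekly_counts, threshold_share=0.012, max_gap=1):
    """Return (first_week_index, last_week_index) of the longest season, or None.

    Assumption: a week belongs to the season when its case count exceeds
    threshold_share (1.2%) of the total count in weekly_counts. Runs of
    at most max_gap consecutive weeks below the threshold are bridged.
    """
    total = sum(weekly_counts)
    if total == 0:
        return None
    threshold = threshold_share * total

    best = None        # longest season found so far
    start = None       # first week of the run currently being tracked
    last_above = None  # last week above the threshold in the current run
    gap = 0            # consecutive weeks below the threshold

    def keep_longer(candidate, current_best):
        if current_best is None:
            return candidate
        if candidate[1] - candidate[0] > current_best[1] - current_best[0]:
            return candidate
        return current_best

    for week, count in enumerate(weekly_counts):
        if count > threshold:
            if start is None:
                start = week
            last_above = week
            gap = 0
        elif start is not None:
            gap += 1
            if gap > max_gap:
                # Too many weeks below the threshold: close the current run.
                best = keep_longer((start, last_above), best)
                start = None
                gap = 0

    if start is not None:
        best = keep_longer((start, last_above), best)
    return best
```

Called with a list of weekly case counts ordered by calendar week, the function returns zero-based indices into that list; mapping them back to calendar weeks is left to the caller.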
We identified the sentinel practices that participated in both SEED ARE and the virological surveillance concurrently by practice-ID. We matched the medical consultations of SEED ARE with virological samples by practice-ID, age, gender, consultation date, and sampling date. Only one-to-one matches were included in the further data evaluation. We calculated the sensitivity and specificity of RSV-specific ICD-10 codes, alone and in combination with general ARI ICD-10 codes, for the identification of laboratory-confirmed RSV infections. 6 The sensitivities and specificities were calculated with 95% confidence intervals (95%-CI). Additionally, we compared RSV-ICD-cases with confirmed-RSV-cases of the identified practices by calendar week.
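As a minimal sketch of the accuracy calculation described above, the following Python code derives sensitivity and specificity from matched pairs of ICD-10 coding and laboratory result. The text does not state which confidence interval method was used; the Wilson score interval is an assumption made here for illustration, and the function names and the example counts in the usage note are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)


def sensitivity_specificity(pairs):
    """pairs: iterable of (icd_positive, lab_positive) booleans.

    The laboratory result is treated as the reference standard.
    Returns two tuples: (sensitivity, ci) and (specificity, ci).
    """
    tp = fp = fn = tn = 0
    for icd_positive, lab_positive in pairs:
        if lab_positive and icd_positive:
            tp += 1
        elif lab_positive:
            fn += 1
        elif icd_positive:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return (sens, wilson_interval(tp, tp + fn)), (spec, wilson_interval(tn, tn + fp))
```

For example, with invented counts of 12 true positives, 88 false negatives, 3 false positives, and 897 true negatives, the function returns a sensitivity of 0.12 and a specificity of about 0.997, each with its interval. An exact (Clopper-Pearson) interval would be an equally plausible choice and gives slightly wider bounds.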
We used Stata (version 15) and Microsoft Excel 2010 for the data analysis.

| Integration of RSV data of practices that participated in SEED ARE and virological surveillance
Forty-eight sentinel practices participated in both SEED ARE and the virological surveillance from week 40/2010 to week 13/2017. In total, 5589 respiratory specimens from the 48 practices were tested for RSV. Of those, 400 (7%) were RSV positive, and 2624 (47%) could be matched one to one with the medical consultations based on SEED ARE (Figure 3).
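The one-to-one matching of consultations and virological samples can be illustrated with a short Python sketch. It rests on assumptions that are not stated in the text: records are plain dictionaries with hypothetical field names, and a consultation and a sample match only when practice-ID, age, and gender agree and the consultation date equals the sampling date.

```python
from collections import defaultdict


def one_to_one_matches(consultations, samples):
    """Pair consultations with samples that share a unique matching key.

    Assumed key: (practice_id, age, gender, date), with consultation_date
    compared to sampling_date. Keys shared by more than one consultation
    or more than one sample are ambiguous and therefore dropped.
    """
    consultations_by_key = defaultdict(list)
    for c in consultations:
        key = (c["practice_id"], c["age"], c["gender"], c["consultation_date"])
        consultations_by_key[key].append(c)

    samples_by_key = defaultdict(list)
    for s in samples:
        key = (s["practice_id"], s["age"], s["gender"], s["sampling_date"])
        samples_by_key[key].append(s)

    matches = []
    for key, sample_list in samples_by_key.items():
        consultation_list = consultations_by_key.get(key, [])
        if len(sample_list) == 1 and len(consultation_list) == 1:
            matches.append((consultation_list[0], sample_list[0]))
    return matches
```

Dropping ambiguous keys rather than picking one candidate keeps each retained pair unambiguous, at the cost of excluding samples that cannot be assigned to a single consultation.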

| DISCUSSION
Using ICD-10-based surveillance, we identified age groups at high risk of RSV, and successfully described general trends and seasonality of RSV in primary and secondary care in Germany, as confirmed by data from the virological surveillance system. In primary care, RSV-specific ICD-10 codes had poor sensitivity and high specificity for the identification of laboratory-confirmed RSV infections. In young children, two combinations of RSV-specific ICD-10 codes with general ALRI ICD-10 codes increased the sensitivity without substantially decreasing the specificity.
The RSV epidemiology described on the basis of ICD-10 code data and virological data showed many common findings. In particular, a high number of RSV cases among young children, and a higher number of RSV cases among young boys than among young girls, were found in both the ICD-10 and the virological data.
In the present study, the proportion of young children among all RSV-ICD-cases was higher in secondary care based on ICOSARI than in primary care based on SEED ARE data. This is in agreement with the clinical observation that RSV infection is normally more serious in young children and is a major cause of hospital admission in this group. 1,2 Bronchiolitis is a very severe manifestation of RSV disease mainly affecting young children, whereas bronchitis is more common in older children and adults. 24,25 Of the three RSV-specific ICD-10 codes, J21.0 (acute bronchiolitis due to RSV) was most frequently diagnosed in secondary care based on ICOSARI and J20.5 (acute bronchitis due to RSV) in primary care based on SEED ARE.
Based on the three data sources, the RSV season onset ranged from mid-October to end-November, the season offset was in mid-April, and the peak of the season ranged from end-January to mid-February in Germany. The RSV season length ranged from 20 to 28 weeks. The RSV seasons captured most of the RSV cases.
We found that RSV-specific ICD-10 codes had low sensitivity but high specificity for the identification of laboratory-confirmed RSV infections in primary care. Low sensitivity of ICD-10 codes was also reported for influenza. [16][17][18] In Germany, laboratory diagnostic tests are not always performed for suspected RSV infections in primary care. Even if testing is performed, the ICD-10 code diagnosis will probably not be recoded when the laboratory findings only become available in the practice a few days after the medical consultation. Therefore, suspected and also laboratory-confirmed RSV infections may be encoded with general ARI ICD-10 codes. These could be the reasons why most of the laboratory-confirmed RSV cases were not encoded with RSV-specific ICD-10 codes in the sentinel practices that participated in both SEED ARE and the virological surveillance in the present study. In preparation for the present study, the RKI performed a survey to explore RSV coding behavior in primary care in Germany. The results of the survey are in line with the explanations above (unpublished data).
In children aged <5 years and in RSV seasons, the sensitivity of RSV-specific ICD-10 codes increased more than twofold, and the specificity remained high. Physicians were probably more likely to use RSV-specific ICD-10 codes for young children and during RSV seasons, since RSV is more common in this group and during this period. In the present study, we estimated the sensitivities and specificities of RSV-specific ICD-10 codes combined with different general ARI ICD-10 codes. RSV-specific ICD-10 codes combined with two groups of general ALRI ICD-10 codes achieved moderate sensitivities and high specificities. The high sensitivity of RSV-specific
TABLE 1 Sensitivities and specificities of RSV-specific ICD-10 code diagnosis combined with different general ARI ICD-10 codes of the practices that participated in both SEED ARE and the virological surveillance
The present study has some limitations. The sensitivity and specificity of RSV-specific ICD-10 code diagnoses in secondary care could not be evaluated on a case-by-case basis since virological data of the ICOSARI network were not available for the present study. However, in the ICOSARI network, suspected RSV cases in young children were tested by rapid antigen detection tests and rtRT-PCR, and laboratory-confirmed RSV infections were encoded with RSV-specific ICD-10 codes. Although it has not been verified that testing and coding took place in 100% of cases, both have been standard procedure in the pediatric units, and the coding quality may have improved in recent years (personal communication). In addition, high validity has been reported in the literature for RSV-specific ICD-10 codes for the identification of hospitalized RSV among children. 19 The RSV coding behavior of physicians in primary care may vary during and outside of the RSV season, depending on the use of laboratory diagnostics, the age of the patient, and the level of coding awareness. The differences in coding behavior may lead to information bias.
The number of confirmed-RSV-cases and RSV-ICD-cases increased slightly among older adults based on virological as well as ICOSARI data, and it remained at a low level based on SEED ARE. RSV infection often goes unrecognized in adults because symptoms are milder; however, RSV is a common pathogen of ARI in older adults and can lead to severe disease. 29,30 Therefore, RSV infections were probably underestimated among older adults in SEED ARE. This could be another limitation. However, evaluating the accuracy of ICD-10 codes was precisely the objective of the present study because of this potential information bias.
The present study was based on anonymized data. Using practice-ID, age, gender, consultation date, and sampling date alone, more than half of the virological samples could not be matched one to one to medical consultations and were excluded from the evaluation of sensitivity and specificity of RSV-specific ICD-10 codes, which might lead to selection bias. However, the probability of selection bias was low since no conspicuous deviations were found between the matched and the excluded virological data (data not shown).

| CONCLUSIONS
The use of RSV-specific ICD-10 code data may be a useful indicator to identify age groups under high risk of RSV, to monitor general trends, and to observe seasonality of RSV. The RSV epidemiology based on ICD-10 code data from different data sources and virological data showed similar age and sex distribution, percent positivity, and seasonality patterns. Therefore, RSV-specific ICD-10 codes are appropriate for RSV surveillance. However, in primary care, RSV-specific ICD-10 code diagnosis was less sensitive, and relying on RSV-specific ICD-10 codes alone will underes-

ACKNOWLEDGEMENTS
The authors thank Kerstin Prahm (RKI) and Sven Schröder (RKI) for their technical support in data transmission and database programming, and the German National Reference Centre for Influenza at the RKI for the virological surveillance activities.

CONFLICT OF INTEREST
None.