Nosocomial transmission of influenza: A retrospective cross‐sectional study using next generation sequencing at a hospital in England (2012‐2014)

Abstract
Background: The extent of transmission of influenza in hospital settings is poorly understood. Next generation sequencing may improve this by providing information on the genetic relatedness of viral strains.
Objectives: We aimed to apply next generation sequencing to describe transmission in hospital and compare with methods based on routinely‐collected data.
Methods: All influenza samples taken through routine care from patients at University College London Hospitals NHS Foundation Trust (September 2012 to March 2014) were included. We conducted Illumina sequencing and identified genetic clusters. We compared nosocomial transmission estimates defined using classical methods (based on time from admission to sample) and genetic clustering. We identified pairs of cases with space‐time links and assessed genetic relatedness.
Results: We sequenced influenza sampled from 214 patients. There were 180 unique genetic strains, 16 (8.8%) of which seeded a new transmission chain. Nosocomial transmission was indicated for 32 (15.0%) cases using the classical definition and 34 (15.8%) based on genetic clustering. Of the 50 patients in a genetic cluster, 11 (22.0%) had known space‐time links with other cases in the same cluster. Genetic distances between pairs of cases with space‐time links were lower than for pairs without spatial links (P < .001).
Conclusions: Genetic data confirmed that nosocomial transmission contributes significantly to the hospital burden of influenza and elucidated transmission chains. Prospective next generation sequencing could support outbreak investigations and monitor the impact of infection and control measures.


| BACKGROUND
Nosocomial influenza is associated with increased length of hospital stay, severe complications and death. 1 The extent of transmission in hospital settings is poorly understood, however, because identification of transmission events is challenging. Classical methods assume cases to be "hospital-acquired" when the time between admission and the first positive sample exceeds the incubation period of the influenza virus. This definition is not always accurate as the incubation period is variable (0.7 to 2.8 days), 2 early symptoms may not be recorded or recognised as influenza, samples may not be taken at consistent time points within an illness, and systems often fail to capture information on hospital contact prior to admission.
Next generation sequencing methods have the potential to improve the precision of these inferences by providing information on the genetic relatedness of viral strains. 3 Genetic approaches use assumptions about the rate at which the virus acquires mutations and the likely duration of an outbreak to assess whether direct links between patients are plausible. The availability of near real-time sequencing data therefore creates an opportunity for improved surveillance through earlier identification of outbreaks and a more effective response. Used retrospectively, information derived from next generation sequencing may also inform policy and practice for future outbreaks.
Previous applications of next generation sequencing of influenza have included elucidating zoonosis and describing transmission of seasonal and pandemic strains. 3-6 In the context of nosocomial transmission, several studies have used next generation sequencing to assess differences between sequences of specific influenza genome segments (HA, NA and/or PB2) or to investigate small outbreaks. 7-18 These results have highlighted the importance of multiple introductions of community strains. Whole genome sequencing has been used in other studies to demonstrate that isolates in pre-defined epidemiological clusters are more likely to be related than those outside of such clusters and to differentiate outbreaks into clusters. 19-21 However, we are unaware of studies using the greater resolution afforded by next generation sequencing of the entire genome to explore nosocomial transmission of influenza across whole seasons. Implementation of next generation sequencing has also been limited by lack of analytical capacity, absence of established quality control comparators and cost. 22 In this study, we conducted whole genome next generation sequencing on all samples of influenza taken at a large teaching hospital in London over two winter seasons. We aimed to investigate the capability of this method to enhance identification of hospital transmission of influenza compared to methods based on routinely collected data alone and to describe transmission within the hospital setting.

| Study design and setting
This was a retrospective cross-sectional study of patients at University College London Hospitals NHS Foundation Trust (UCLH).
UCLH is a major teaching and research hospital in central London, which has approximately 900 beds, sees on average more than one million outpatients, has 131 000 accident and emergency attendances and admits more than 170 000 patients each year. 23

KEYWORDS
cross infection, disease outbreaks, influenza, human, molecular epidemiology

[…] a 14-day period were assumed to be a continuation of the same illness.

| Next generation sequencing and phylogenetic analysis
RNA was extracted from residual diagnostic specimens and sequenced using Illumina MiSeq paired-end sequencing as previously described. 4 Full details of phylogenetic methods are provided in the supplementary appendix. In summary, we generated consensus sequences from short reads using an in-house de novo assembly pipeline, applying a read depth cut-off of ≥20 reads to the final sequences. Sets of segments were compiled after categorising samples by lineage (A(H1N1)pdm09, A/H3N2, B/Yamagata) and season. Maximum-likelihood phylogenetic trees were inferred for each alignment.
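The depth cut-off can be illustrated with a minimal sketch. It assumes a hypothetical per-position pileup (a `Counter` of base calls at each alignment column) and a simple majority-rule call; it is not the in-house de novo assembly pipeline used in the study, and masking under-covered positions with `N` is our assumption.

```python
from collections import Counter

MIN_DEPTH = 20  # read-depth cut-off from the text (>= 20 reads)


def call_consensus(pileup: list[Counter], min_depth: int = MIN_DEPTH) -> str:
    """Majority-rule consensus from per-position base counts.

    Positions supported by fewer than `min_depth` reads are masked with 'N'
    (an assumption); otherwise the most frequent base is called.
    """
    consensus = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # too few reads to call a base
        else:
            base, _ = counts.most_common(1)[0]
            consensus.append(base)
    return "".join(consensus)


# Hypothetical three-column pileup with depths 25, 21 and 5.
example = [Counter(A=25), Counter(C=19, T=2), Counter(G=5)]
print(call_consensus(example))  # ACN
```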
We defined genetic distance as the number of pairwise nucleotide differences between aligned sequences of the same subtype and within the same season. The maximum expected number of substitutions between pairs of samples was calculated using the upper bound of the 95% credibility interval of the rate of substitution and sequencing error rate for each season and lineage, assuming an upper limit of 20 days between transmission pairs and normalising for pairwise alignment length (see Table S2 for rates of substitution and sequencing error).
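A minimal sketch of the two quantities in this paragraph follows. The function names are hypothetical, gaps and ambiguous bases are assumed to be excluded from comparison, and the threshold formula (upper-bound substitution rate in substitutions/site/year multiplied by a 20-day window, plus a per-site sequencing error rate, scaled by the number of comparable sites) is an assumed form; the exact calculation is given in the supplementary appendix and may differ.

```python
VALID_BASES = frozenset("ACGT")
DAYS_PER_YEAR = 365.25


def pairwise_differences(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count nucleotide differences between two aligned sequences.

    Columns where either sequence has a gap or a non-ACGT symbol are skipped.
    Returns (differences, comparable_sites).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    differences = sites = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            sites += 1
            if a != b:
                differences += 1
    return differences, sites


def max_expected_substitutions(rate_upper: float, error_rate: float,
                               sites: int, max_days: float = 20.0) -> float:
    """Assumed threshold form: (rate x time + per-site error) x sites.

    rate_upper: upper 95% credibility bound, substitutions/site/year.
    error_rate: sequencing errors per site (assumed units).
    """
    return (rate_upper * max_days / DAYS_PER_YEAR + error_rate) * sites


diffs, sites = pairwise_differences("ACGT-A", "ACCTNA")
print(diffs, sites, diffs / sites)  # 1 5 0.2 (the last value is substitutions/site)
```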
We defined genetic clusters as groups of viral genomes, from samples collected within 20 days of each other, that differed by fewer than the maximum expected number of nucleotide substitutions. We calculated the number of distinct genetic strains, the proportion of strains that seeded a new transmission chain (ie formed clusters of at least two cases) and the median number of cases per cluster.
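The clustering rule can be sketched as single-linkage grouping with a union-find structure: two samples are joined when they were collected within 20 days of each other and differ by fewer substitutions than the threshold, and clusters are the resulting groups of two or more cases. Transitive (single-linkage) chaining and a single threshold shared by all pairs are simplifying assumptions; in the study the threshold varied with season, lineage and alignment length.

```python
from datetime import date
from itertools import combinations
from statistics import median


def genetic_groups(dates: dict[str, date], diffs: dict[frozenset, int],
                   threshold: float, max_days: int = 20) -> list[set[str]]:
    """Single-linkage grouping; every sample ends up in exactly one group.

    Two samples are linked if collected <= max_days apart and their pairwise
    difference count is below `threshold`. Pairs missing from `diffs` are
    treated as unlinked.
    """
    parent = {s: s for s in dates}

    def find(s: str) -> str:
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in combinations(dates, 2):
        close_in_time = abs((dates[a] - dates[b]).days) <= max_days
        close_genetically = diffs.get(frozenset((a, b)), float("inf")) < threshold
        if close_in_time and close_genetically:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for s in dates:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())


def summarise(groups: list[set[str]]) -> dict:
    clusters = [g for g in groups if len(g) >= 2]  # transmission chains
    return {
        "distinct_strains": len(groups),
        "clusters": len(clusters),
        "proportion_seeding_chain": len(clusters) / len(groups),
        "median_cluster_size": median(len(g) for g in clusters) if clusters else None,
    }


# Hypothetical samples: p4 is genetically identical to p3 but 50 days later.
dates = {"p1": date(2013, 1, 1), "p2": date(2013, 1, 5),
         "p3": date(2013, 1, 10), "p4": date(2013, 3, 1)}
diffs = {frozenset(("p1", "p2")): 1, frozenset(("p2", "p3")): 2,
         frozenset(("p1", "p3")): 5, frozenset(("p3", "p4")): 0}
print(summarise(genetic_groups(dates, diffs, threshold=3)))
```

Here p1 and p3 are not directly linked, but single linkage joins them through p2, which is the behaviour the assumption implies.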

| Identification of nosocomial transmission
We identified potential instances of nosocomial transmission using a classical method (based on routinely collected hospital data only) and a genetic method (using results from next generation sequencing). In the "classical" method, we defined cases as hospital-acquired if the positive sample was taken more than two days after admission and as community-acquired if taken within 2 days. We calculated the proportion assumed to be nosocomially acquired using the formula: number of cases with the first positive sample taken more than 2 days after admission/ total number of cases.
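The classical rule reduces to a day count. This sketch assumes admission and sample dates are recorded as calendar dates and that "more than two days" means a difference of three or more calendar days; the function names are hypothetical.

```python
from datetime import date

CUTOFF_DAYS = 2  # "more than two days after admission"


def hospital_acquired_classical(admitted: date, first_positive: date,
                                cutoff_days: int = CUTOFF_DAYS) -> bool:
    """True if the first positive sample was taken > cutoff_days after admission."""
    return (first_positive - admitted).days > cutoff_days


def classical_proportion(cases: list[tuple[date, date]]) -> float:
    """Cases flagged as hospital-acquired / total number of cases."""
    flagged = sum(hospital_acquired_classical(adm, pos) for adm, pos in cases)
    return flagged / len(cases)


cases = [
    (date(2013, 1, 1), date(2013, 1, 2)),   # day 1: community-acquired
    (date(2013, 1, 1), date(2013, 1, 3)),   # day 2: community-acquired
    (date(2013, 1, 1), date(2013, 1, 4)),   # day 3: hospital-acquired
    (date(2013, 1, 1), date(2013, 1, 20)),  # day 19: hospital-acquired
]
print(classical_proportion(cases))  # 0.5
```

In this study, 32 of 214 cases met the criterion, which corresponds to 32/214 ≈ 0.150.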
In the "genetic" method, we considered that cases within the same genetically defined cluster were linked through transmission. We calculated the proportion assumed to be nosocomially acquired using the formula: (number of cases in genetic clusters − number of unique genetic clusters) / number of cases. This assumes that each genetic cluster has one community-acquired index case.
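The genetic estimate treats every cluster as containing exactly one community-acquired index case, so each cluster of k cases contributes k − 1 nosocomial cases. A minimal sketch, with a hypothetical function name:

```python
def genetic_proportion(cases_in_clusters: int, n_clusters: int, n_cases: int) -> float:
    """(cases in genetic clusters - number of clusters) / all cases.

    Each cluster is assumed to contain exactly one community-acquired index
    case; every other clustered case is counted as nosocomial.
    """
    if n_clusters * 2 > cases_in_clusters:
        raise ValueError("every cluster must contain at least two cases")
    if cases_in_clusters > n_cases:
        raise ValueError("clustered cases cannot exceed total cases")
    return (cases_in_clusters - n_clusters) / n_cases


print(genetic_proportion(50, 16, 214))  # equals 34 / 214
```

With the counts reported in this study (50 cases in 16 clusters among 214 patients), the formula gives 34/214, matching the 34 nosocomial cases reported in the Abstract.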
We hypothesised that cases classified as hospital-acquired by the genetic definition would be more likely to be hospital- than community-acquired according to the "classical" definition. We therefore calculated the proportions in each group and compared them using Fisher's exact test.
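A two-sided Fisher's exact test for a 2 × 2 table can be written directly from the hypergeometric distribution, as sketched here. The table layout (rows: genetic classification; columns: classical classification) is an assumption, the counts are hypothetical rather than study data, and the study's analysis was run in Stata and R; in Python, `scipy.stats.fisher_exact` provides the same test.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of all tables that are
    no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    lowest, highest = max(0, row1 - (n - col1)), min(row1, col1)
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p = prob(x)
        if p <= p_observed * (1 + 1e-7):  # tolerance for floating-point ties
            p_value += p
    return min(p_value, 1.0)


# Hypothetical counts, not study data.
print(fisher_exact_two_sided(3, 1, 1, 3))  # 34/70, about 0.486
```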

| Identification of space-time links
We sought to establish the extent to which pairs of cases with space-time links based on dates and ward locations also shared genetic links. We identified space-time links between pairs of cases with the same influenza subtype based on their assumed infectious and "acquisition" periods (Figure S1). The acquisition period was the period in which they may have been infected and was derived from the incubation period (1-3 days) plus an interval (0-2 days) between onset of symptoms and sample collection. 2 Acquisition periods therefore ranged from 1 to 5 calendar days prior to the sample collection date. We considered acquisition to be possible in the hospital ward where the sample was taken and all wards where the patient was treated during the assumed acquisition period. We defined the infectious period as lasting a maximum of 14 days starting from two days before the sample date. 25 We also conducted sensitivity analyses varying the length of the infectious period (Appendix S1).
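The two periods can be expressed as date windows. This sketch assumes inclusive calendar-day windows, takes the widest acquisition range stated (1 to 5 days before sampling) and exposes the infectious-period length as a parameter to mirror the sensitivity analyses; truncation at discharge or death is not modelled, and the function names are hypothetical.

```python
from datetime import date, timedelta


def acquisition_window(sample_date: date) -> tuple[date, date]:
    """Days when infection may have occurred: 1 to 5 days before sampling
    (incubation 1-3 days plus 0-2 days from symptom onset to sample)."""
    return sample_date - timedelta(days=5), sample_date - timedelta(days=1)


def infectious_window(sample_date: date, length_days: int = 14) -> tuple[date, date]:
    """Infectious period of `length_days` inclusive calendar days, starting two
    days before the sample date; vary `length_days` for sensitivity analyses."""
    start = sample_date - timedelta(days=2)
    return start, start + timedelta(days=length_days - 1)


sample = date(2013, 1, 10)
print(acquisition_window(sample))  # 5 to 9 January 2013
print(infectious_window(sample))   # 8 to 21 January 2013
```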
Pairs of cases were classified as having space-time links if they had the same influenza subtype and overlapping infectious and acquisition periods whilst in the same hospital location. We calculated the proportion of cases in genetic clusters that had space-time links with cases in the same genetic cluster (and therefore also of the same influenza subtype). We also hypothesised that pairs of cases with space-time links would have closer genetic links than pairs of cases that were linked temporally (ie by overlap in infectious and acquisition periods) but did not have spatial co-occurrence. Time-linked cases were used for this comparison to account for the accumulation of independent genetic changes over time.
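The link criterion can be sketched as a day-by-day check: a pair is linked if, on some day where one case's infectious window overlaps the other's acquisition window, both patients were on the same ward. Ward stays are represented here as hypothetical `(ward, first_day, last_day)` tuples, the windows are passed in precomputed, and the ward names are invented; the study's own implementation may differ.

```python
from datetime import date, timedelta

# Hypothetical representations: ward stays and date windows are inclusive.
Stay = tuple[str, date, date]
Window = tuple[date, date]


def wards_on(stays: list[Stay], day: date) -> set[str]:
    return {ward for ward, first, last in stays if first <= day <= last}


def directed_link(infectious: Window, source_stays: list[Stay],
                  acquisition: Window, recipient_stays: list[Stay]) -> bool:
    """True if, on a day where the source's infectious window overlaps the
    recipient's acquisition window, both patients shared a ward."""
    day = max(infectious[0], acquisition[0])
    last = min(infectious[1], acquisition[1])
    while day <= last:
        if wards_on(source_stays, day) & wards_on(recipient_stays, day):
            return True
        day += timedelta(days=1)
    return False


def space_time_link(a: dict, b: dict) -> bool:
    """Same subtype, and a shared ward in either direction of transmission."""
    if a["subtype"] != b["subtype"]:
        return False
    return (directed_link(a["infectious"], a["stays"], b["acquisition"], b["stays"])
            or directed_link(b["infectious"], b["stays"], a["acquisition"], a["stays"]))


case_a = {"subtype": "H3N2", "infectious": (date(2013, 1, 8), date(2013, 1, 21)),
          "acquisition": (date(2013, 1, 5), date(2013, 1, 9)),
          "stays": [("Ward 1", date(2013, 1, 1), date(2013, 1, 15))]}
case_b = {"subtype": "H3N2", "infectious": (date(2013, 1, 15), date(2013, 1, 28)),
          "acquisition": (date(2013, 1, 12), date(2013, 1, 16)),
          "stays": [("Ward 1", date(2013, 1, 14), date(2013, 1, 20))]}
print(space_time_link(case_a, case_b))  # True: both on Ward 1 on 14-15 January
```

Dropping the ward check from `directed_link` would give the purely time-linked pairs used as the comparison group.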
We investigated this by comparing the genetic distances (regardless of cluster assignment) amongst these pairs of cases with the Wilcoxon rank-sum test.
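For readers reproducing this comparison outside Stata or R, the Wilcoxon rank-sum test can be sketched with the large-sample normal approximation (average ranks for ties, tie-corrected variance, no continuity correction). This is an illustrative implementation under those assumptions, not the study's code; exact p-values would differ for small samples, and `scipy.stats.mannwhitneyu` is an established alternative. The distances below are hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(x: list[float], y: list[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses average ranks for ties and the normal approximation with a
    tie-corrected variance (no continuity correction). Returns (U_x, p).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2, n = len(x), len(y), len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values tied: no evidence of a difference
    z = (u_x - mean_u) / sqrt(var_u)
    return u_x, erfc(abs(z) / sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))


# Hypothetical genetic distances (substitutions/site), not study data.
linked = [0.0007, 0.0012, 0.0018, 0.0025]
time_only = [0.0031, 0.0042, 0.0050, 0.0066, 0.0071]
print(rank_sum_test(linked, time_only))
```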
Finally, we combined epidemiological and genetic data to visualise potential transmission links. Data were managed, analysed and visualised using Stata v14 and R v3.5.0.

| RESULTS
A total of 332 PCR-positive influenza samples were identified during the study period. Full genome sequencing was possible for 242 (72.9%) samples, from 214 patients. Sequencing was probably unsuccessful for the remaining samples because of insufficient viral load. All subsequent analyses are based on the samples for which sequencing was successful. The characteristics of the patient population are shown in Table 1.

Of the 50 cases in genetic clusters, 11 (22.0%) had space-time links (based on routinely collected data) with other cases in the same genetic cluster. Genetic distances between pairs of cases that had space-time links were smaller (median 1.8 × 10⁻³ substitutions/site, interquartile range 0.7-3.1 × 10⁻³) than between pairs of cases that were

| DISCUSSION
We have used whole genome sequencing, on an established next generation sequencing platform, to investigate nosocomial spread of influenza across two winter seasons. Based on genetic data, we found that one in eleven cases of influenza introduced to a hospital […] to be in a genetic cluster leading to discordant results. The genetic method does not rely on these assumptions and can establish direct transmission links between cases; it is therefore likely to be more reliable. It is also possible that cases classified as hospital-acquired by the genetic method within two days of admission were part of community clusters, but this is unlikely given the diversity of community strains.
Pairs of cases that had space-time links (derived from dates and ward locations recorded in routine hospital data) had smaller genetic distances than pairs that were linked in time but not in space (P < .001). This indicates closer genetic relatedness and is consistent with studies in household, hospital and long-term care facility settings. 19,20,27-29 However, only 22% of cases in genetic clusters had space-time links with other cases in the same genetic cluster. This suggests that most transmission does not occur through obvious ward-based contact. Genetic clustering analysis could therefore help to distinguish genuine outbreaks from coincidental pairs of cases on wards and to direct control efforts accordingly.
This study aimed to describe how virus genomics could improve the understanding of nosocomial transmission gained from routine hospital data and clinical practice. As such, there was no enhanced sampling or epidemiological investigation to identify potential interactions between patients outside ward settings. Results are therefore based on incomplete case ascertainment, and transmission occurring in non-ward settings or from subclinically infected patients, staff members or visitors could not be detected.
This demonstrates an advantage of using sequencing data, which can group cases into genetic clusters even if some of the links in the transmission chain are missing. Enhanced sampling of patients, staff and visitors to identify all cases and prospective collection of contact data would likely be needed to establish evidence of contact between a greater proportion of genetically clustered cases than was possible using retrospective patient ward movement data.
A limitation of our analysis is that we did not have information on symptoms, co-morbidities or clinical outcomes such as length of stay. We therefore could not estimate which of these factors may have influenced transmission or severity of illness. We also did not have data on negative tests for influenza and were therefore unable to ascertain if individuals were tested before their positive sample