Sputum microbiomic clustering in asthma and chronic obstructive pulmonary disease reveals a Haemophilus‐predominant subgroup

Abstract
Background Airway ecology is altered in asthma and chronic obstructive pulmonary disease (COPD). Anti‐microbial interventions might have benefit in subgroups of airway disease. Differences in sputum microbial profiles at acute exacerbation of airways disease are reflected by the γProteobacteria:Firmicutes (γP:F) ratio. We hypothesized that sputum microbiomic clusters exist in stable airways disease, which can be differentiated by the sputum γP:F ratio.
Methods Sputum samples were collected from 63 subjects with severe asthma and 78 subjects with moderate‐to‐severe COPD in a prospective single centre trial. Microbial profiles were obtained through 16S rRNA gene sequencing. Topological data analysis was used to visualize the data set, and cluster analysis was performed at genus level. Clinical characteristics and sputum inflammatory mediators were compared across the clusters.
Results Two ecological clusters were identified across the combined airways disease population. The smaller cluster (n = 20) was predominantly COPD and was characterized by dominance of Haemophilus at genus level, a high γP:F ratio, increased H influenzae, low diversity measures and increased pro‐inflammatory mediators when compared to the larger Haemophilus‐low cluster (n = 121), in which Streptococcus demonstrated the highest relative abundance at the genus level. Similar clusters were identified within disease groups individually, and the γP:F ratio consistently differentiated between clusters.
Conclusion Cluster analysis by airway ecology of asthma and COPD in stable state identified two subgroups differentiated according to dominance of Haemophilus. The γP:F ratio was able to distinguish the Haemophilus‐high from the Haemophilus‐low subgroup. Whether the Haemophilus‐high group might benefit from treatment strategies to modulate the airway ecology warrants further investigation.


| INTRODUCTION
Asthma and chronic obstructive pulmonary disease (COPD) are increasingly prevalent airways diseases responsible for considerable morbidity and resource utilization worldwide, especially in those with the most severe disease. They are well recognized to encompass a heterogeneous pathobiology, with well-defined disease phenotypes or endotypes 1-3 identifiable at different scales, which impact on treatment strategies for individual patients.
Bacterial colonization is an important feature in both diseases.
Culture-independent techniques based on sequencing the variable regions of bacterial 16S ribosomal RNA genes have demonstrated that the healthy airway is colonized and that the microbial composition of the airway is altered in asthma and COPD. 4,5 Associations between microbial ecology, clinical characteristics and inflammatory mediator profiles have been demonstrated, suggesting that specific pathogens may be associated with particular disease phenotypes. In particular, Haemophilus spp. are often the most frequently detected potential pathogens in subgroups of asthma and COPD and are associated with airway neutrophilia 6,7 and corticosteroid resistance, 8 whilst studies in childhood asthma have associated Moraxella with an increased risk of exacerbation. 9 Establishing the microbial community profiles characteristic of these subgroups, at stable and exacerbation states, will assist in defining therapeutic targets. Anti-microbial therapies have been evaluated in both asthma and COPD, as both maintenance treatment and at acute exacerbation, and are beneficial in some cases. 10-13 However, there are public health risks associated with widespread antibiotic use. 14 Identification of key bacterial targets and biomarkers to distinguish those who will potentially derive a clinically significant benefit will help clinicians to manage this risk or develop new therapies beyond antibiotics.
Previous longitudinal assessments of microbiome dynamics in COPD by our group reported a subgroup in whom dysbiosis occurred at exacerbation 15 and proposed the use of γProteobacteria to Firmicutes (γP:F) ratio to identify this group. Furthermore, a recent analysis of biological exacerbation subgroups in asthma and COPD suggested that exacerbation aetiology varies, 16 with a bacterial-associated group demonstrating elevated Proteobacteria to Firmicutes ratios, compared to those observed in eosinophilic or viral-triggered episodes. It is unclear whether separation of these subgroups is maintained at stable state.
In this study, we hypothesized that subgroups with distinct microbial profiles exist within a stable, severe asthma and moderate-to-severe COPD population and that the γP:F ratio can be used to distinguish these groups, potentially highlighting individuals who may benefit from therapy targeting airway dysbiosis. To test this, we examined the microbiome in sputum samples from subjects with asthma and COPD and utilized a combination of cluster analysis and topological data analysis (TDA) to define groups which could then be characterized in more detail.

GRAPHICAL ABSTRACT
Sputum microbiomic cluster analysis and TDA demonstrate two subgroups of stable severe asthma and moderate-to-severe COPD differentiated according to dominance of Haemophilus. The Haemophilus-high group had no defining clinical characteristics but had lower microbial diversity and increased levels of sputum TNFα and IL1β. γProteobacteria:Firmicutes (γP:F) differentiated the clusters and was the most predictive biomarker of the Haemophilus-high group.

| Study participants
Patients with severe asthma and moderate-to-severe COPD were prospectively recruited from a single centre at Glenfield Hospital, Leicester, UK. Asthma and COPD were diagnosed according to physician assessment consistent with definitions based on Global Initiative for Asthma (GINA) or Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidance. Asthmatic subjects had participated in a published observational study (n = 131) 17 and COPD subjects had provided stable samples, at least 8 weeks post-exacerbation, whilst participating in a published longitudinal exacerbation study (n = 156). 18 Subjects with asthma were GINA step 4 or 5, with sputum samples provided at visits when clinically stable, at least 4 weeks post-exacerbation. Subjects with COPD, GOLD class I-IV, were required to have experienced at least one exacerbation in the preceding 12 months and were excluded if they were unable to produce sputum following an induced sputum procedure, or if a current or previous diagnosis of asthma was present. Subjects requiring maintenance oral corticosteroid therapy were included in both studies. All patients provided written informed consent for samples to be used in future analyses, and all subjects with sputum samples adequate for microbiome sequencing were included in the current analysis. Both studies were approved by the local Leicestershire, Northamptonshire and Rutland ethics committee.

| Measurements
Demographics, clinical characteristics and lung function data were collected. Severity of cough and dyspnoea was assessed using a visual analogue scale (VAS), which has previously been described. 16 Spontaneous or induced sputum was collected for sputum total and differential cell counts and bacteriology; 95% of samples were spontaneous. Inflammatory mediator profiling was performed on cell-free sputum supernatants using the Meso Scale Discovery Platform (MSD; Gaithersburg, MD, USA) as previously described, 19 with values below the detectable limit replaced with the corresponding lower limit of detection prior to analysis. Bacterial load was measured by quantitative polymerase chain reaction (qPCR), with DNA extraction and qPCR performed as previously described, 18 based on the abundance of 16S ribosomal subunit encoding genes (total 16S). Pathogen-specific bacterial 16S abundance was quantified by qPCR for H influenzae using the SYBR Green assay (PE Applied Biosystems). The threshold of detection for pathogen-specific bacterial 16S qPCR analysis was taken as 1 × 10⁴ genome copies/ml, reflecting previous cut-off thresholds used in this field. 20 Measurements refer to genome copies/ml of homogenized sputum compared to a standard curve. For microbiomic analyses, DNA was extracted from sputum using the Qiagen DNA Mini kit (Qiagen) as per the manufacturer's protocol, following which the V4-5 hypervariable regions of the 16S ribosomal RNA gene were PCR-amplified and pyrosequenced on the 454 Genome Sequencer FLX platform (454 Life Sciences; Roche Diagnostics) to obtain microbiome communities. Sequencing reads were processed using QIIME 21 (Quantitative Insights Into Microbial Ecology, version 1.9.1) as previously described. 22 Taxonomic classification and within-sample (alpha-diversity) and between-sample (beta-diversity) microbiome measures were performed at a normalized sequence read depth of 1666 and 97% sequence identity.
A PCR reagent control and a sample with a known microbiomic profile from previous analyses were used as negative and positive controls, respectively. To ensure the quality of sequencing data, low-quality sequence reads were trimmed, and adaptors, chimeras and potential human sequences were removed. The γP:F ratio was calculated for each individual sample from the sequencing data by dividing the proportion of the sample belonging to the class Gammaproteobacteria by the proportion belonging to the phylum Firmicutes. Sequence data are deposited at the National Center for Biotechnology Information Sequence Read Archive (SRP065072). Repeatability was assessed by the intra-class correlation coefficient (ICC). A threshold of P < .05 was applied for statistical significance.
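The γP:F calculation described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `gp_f_ratio`, the dictionary-based input and the handling of samples with no Firmicutes reads are assumptions made for illustration only.

```python
from typing import Mapping


def gp_f_ratio(class_props: Mapping[str, float],
               phylum_props: Mapping[str, float]) -> float:
    """Return the gammaProteobacteria:Firmicutes ratio for one sample.

    class_props  -- relative abundance (0-1) per bacterial class
    phylum_props -- relative abundance (0-1) per bacterial phylum
    """
    gamma = class_props.get("Gammaproteobacteria", 0.0)
    firmicutes = phylum_props.get("Firmicutes", 0.0)
    if firmicutes == 0.0:
        # Assumption: the source does not say how a zero denominator
        # was handled, so this sketch returns inf (or nan when both are 0).
        return float("inf") if gamma > 0.0 else float("nan")
    return gamma / firmicutes


# Hypothetical sample: 50% Gammaproteobacteria, 25% Firmicutes
ratio = gp_f_ratio({"Gammaproteobacteria": 0.5}, {"Firmicutes": 0.25})
```

Because the numerator is a class and the denominator a phylum, the two proportions come from different taxonomic summaries of the same sample, which is why the sketch takes two separate tables.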

| Microbiome asthma versus COPD
Clinical characteristics for subjects with severe asthma (n = 63) and moderate-to-severe COPD (n = 78) are shown in Table 1. Subjects with asthma were younger and more likely to be receiving treatment with oral steroids, with a longer duration of disease and a higher BMI. Those with COPD were more likely to be male and current or ex-smokers with a longer pack-year history, had lower lung function and were more symptomatic with cough and dyspnoea. There were no differences between disease groups in the number of exacerbations in the preceding 12 months, nor in neutrophils and eosinophils in blood or sputum.
The distributions of the major sputum bacterial phyla and genera are shown in Figure 1A. COPD subjects displayed a significantly lower richness and alpha-diversity compared to subjects with asthma (Figure S1).
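Richness and alpha-diversity comparisons of this kind rest on per-sample indices computed at a common read depth. The following Python sketch shows one way such values could be derived. It is illustrative only: the source reports a normalized depth of 1666 reads but does not name the alpha-diversity index in this section, so the Shannon index and the helper names `rarefy`, `richness` and `shannon` are assumptions.

```python
import math
import random
from typing import List, Sequence


def rarefy(counts: Sequence[int], depth: int = 1666, seed: int = 0) -> List[int]:
    """Randomly subsample reads without replacement to a fixed depth."""
    if sum(counts) < depth:
        # Assumption: samples below the target depth are rejected.
        raise ValueError("sample has fewer reads than the requested depth")
    # One list entry per read, labelled with its taxon index.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    picked = random.Random(seed).sample(reads, depth)
    out = [0] * len(counts)
    for taxon in picked:
        out[taxon] += 1
    return out


def richness(counts: Sequence[int]) -> int:
    """Number of taxa observed at least once."""
    return sum(1 for n in counts if n > 0)


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p * ln p) over observed taxa."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(-(n / total) * math.log(n / total) for n in counts if n > 0)


# Hypothetical read counts for one sample across four taxa
sample = rarefy([1200, 600, 150, 50])
sample_richness = richness(sample)
sample_shannon = shannon(sample)
```

Subsampling every sample to the same depth before computing the indices keeps differences in sequencing effort from being mistaken for differences in diversity.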

| Microbiomic clusters of stable airways disease (asthma and COPD combined)
Proportions of the two major phyla were similar between disease groups; therefore, further analysis was performed with asthma and COPD subjects combined. Topological data analysis (TDA) was utilized to visualize the data and improve clarity regarding the presence of clusters. At the phylum level, the population appeared as a single structure with areas of differentiation. In contrast, clustering according to genus demonstrated two distinct structures that were well differentiated according to the genus Haemophilus, but not differentiated by other genera (Figures 2 and S2).
Cluster 1 (Haemophilus-high) contained 20 subjects (14% of the total population) and was predominantly COPD (75%, compared to 52% COPD in the Haemophilus-low Cluster 2). This was reflected in the clinical characteristics (Tables 2 and S1), with lower FEV1 demonstrated in Cluster 1 compared to Cluster 2, although no other distinguishing clinical characteristics were noted. The predominance of subjects with COPD in Cluster 1 was also reflected in an increased sputum neutrophilia compared to Cluster 2. Cluster 1 had a higher γP:F ratio across both conditions. Despite a higher proportion of asthmatics in Cluster 2, raised levels of blood and sputum eosinophilia were not observed.
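To make the idea of genus-level clustering concrete, the sketch below groups samples by average-linkage hierarchical clustering on Bray-Curtis dissimilarity. This is an illustrative stand-in and not the method used in the study, which does not specify its clustering algorithm or distance measure in this section; the toy abundances and the function names `bray_curtis` and `average_linkage` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles."""
    denom = sum(u) + sum(v)
    if denom == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


def average_linkage(samples: Sequence[Sequence[float]],
                    n_clusters: int = 2) -> List[List[int]]:
    """Agglomerative clustering; returns lists of sample indices."""
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Mean pairwise dissimilarity between the two clusters.
            d = sum(bray_curtis(samples[i], samples[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Toy relative abundances: columns are Haemophilus, Streptococcus, other
toy = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.05, 0.60, 0.35],
    [0.10, 0.50, 0.40],
    [0.02, 0.70, 0.28],
]
groups = sorted(average_linkage(toy, n_clusters=2))
```

In this toy input the two Haemophilus-dominated profiles separate from the three Streptococcus-dominated ones, mirroring the kind of split described above.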

| Microbiome clustering within disease group
To ascertain whether the clusters observed in the combined analysis […]
There was a lack of distinguishing clinical characteristics between the microbiome clusters in both conditions. In severe asthma, subjects in Cluster 1 were older, and in COPD, subjects in Cluster 1 demonstrated an increased sputum neutrophilia. The γP:F ratio was significantly higher in Cluster 1 in both diseases, as was the sputum IL1β concentration.

| Haemophilus-high subgroup
Measurable characteristics differentiating the subgroups were reviewed as potential biomarkers. ROC analysis to determine the best biomarkers of the […]

| DISCUSSION
Here, we have applied topological data analysis in concert with cluster analysis for the first time to explore the airway microbiome in asthma and COPD. Classical clustering methods such as hierarchical or k-means clustering split a data set apart and can result in data points being artificially separated. Topological data analysis does not split a data set apart, but rather provides a two-dimensional representation of the original high-dimensional data set, retaining the essential features of the underlying geometry of the original data. The combination of topological and cluster analysis strengthens our assertion that there are two sputum microbiome clusters best described as Haemophilus-high and Haemophilus-low.
Our study has a number of potential limitations. The sample size was good for the analysis of asthma and COPD, but the combined clusters were small, as were the clusters within the individual diseases, which limits our confidence in these comparisons. The majority of sputum samples tested were spontaneous rather than induced samples, and data from both approaches were combined.
Sputum induction was undertaken only if an inadequate spontaneous sample was produced. We have previously shown that the microbiomes obtained by these approaches are comparable and therefore do not anticipate that this will have affected our interpretations, 19 although we recognize that differences in sampling technique impact microbiomic analysis and require consideration when interpreting data.
Additionally, analysis of sputum has an acquisition bias as it excludes those who fail to produce a sputum sample. This is a minor issue in COPD, where sputum production is successful in >90% of subjects, 29 but in asthma, where success rates are up to 90%, 30 it means that the clusters cannot be generalized to all asthmatics. Likewise, the number of subjects who had repeated sampling adequate for molecular microbiological analysis was small, which limits our ability to study phenotypic stability.
We did not explore the sensitivity and specificity of routine culture to identify the Haemophilus-high group in our analysis; however, it is well established that molecular techniques are more sensitive for assessment of the complete microbial profile. Additionally, our study did not include detailed radiological evaluation for bronchiectasis, which commonly co-exists with asthma and COPD; therefore, it is unclear whether co-morbid bronchiectasis is associated with one of the sputum microbiome clusters. Finally, the airway microbiome is a complex, dynamic ecological system and, whilst we suggest that more severe airflow obstruction observable in the Haemophilus-high group is driven by disease, it is not possible to exclude reverse causality, whereby the airway conditions in subjects with airways disease permit colonization by Haemophilus species. 31 The effect of environmental factors, including allergens, and competition between organisms colonizing the airway 32 on microbiome composition warrants further study.
In summary, we have identified two sputum microbiome clusters, Haemophilus-high and Haemophilus-low, in airways disease. These clusters can be distinguished by the sputum γP:F ratio or H influenzae, which present opportunities for rapid molecular biomarkers in the clinic to determine cluster membership. Whether identification of these microbiome clusters, and more specifically the Haemophilus-high cluster, is associated with a favourable response to antibiotic or non-antibiotic anti-microbial interventions in airways disease needs to be tested in future trials.
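How well a single biomarker such as the γP:F ratio separates two groups is commonly summarized by the area under the ROC curve (AUC). The Python sketch below computes the AUC as the probability that a randomly chosen member of one group has a higher value than a randomly chosen member of the other, with ties counted as one half. The function name `auc` and the example values are hypothetical and are not data from this study.

```python
from typing import Sequence


def auc(high_group: Sequence[float], low_group: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    if not high_group or not low_group:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for h in high_group:
        for l in low_group:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5  # ties contribute half a win
    return wins / (len(high_group) * len(low_group))


# Hypothetical biomarker values for illustration only
example_auc = auc([3.0, 5.0], [0.5, 1.0, 4.0])
```

An AUC of 0.5 indicates no discrimination between the groups and 1.0 indicates perfect separation.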

ACKNOWLEDGMENTS
The asthma and COPD data sets were generated as part of a sponsored research collaboration with AstraZeneca.