Keywords:

  • asthma;
  • atopic sensitization;
  • birth cohort;
  • cluster analysis;
  • machine learning

Abstract

Background

Although atopic sensitization is one of the strongest risk factors for asthma, its relationship with asthma is poorly understood. We hypothesize that ‘atopy’ encompasses multiple sub-phenotypes that relate to asthma in different ways.

Methods

In two population-based birth cohorts (Manchester and Isle of Wight – IoW), we used a machine learning approach to independently cluster children into different classes of atopic sensitization in an unsupervised manner, based on skin prick and sIgE tests taken throughout childhood and adolescence. We examined the qualitative cluster properties and their relationship to asthma and lung function.

Results

A five-class solution best described the data in both cohorts, with striking similarity between the classes across the two populations. Compared with the nonsensitized class, children in the class with sensitivity to a wide variety of allergens (~1/3 of children atopic by conventional definition) were much more likely to have asthma (aOR [95% CI]: 20.1 [10.9–40.2] in Manchester and 11.9 [7.3–19.4] in IoW). The relationship between asthma and conventional atopy was much weaker (5.5 [3.4–8.8] in Manchester and 5.8 [4.1–8.3] in IoW). In both cohorts, children in this class had significantly poorer lung function (FEV1/FVC lower by 4.4% in Manchester and 2.6% in IoW; P < 0.001), the most reactive airways, the highest eNO and the most hospital admissions for asthma (P < 0.001).

Conclusions

By adopting a machine learning approach to longitudinal data on allergic sensitization from two independent unselected birth cohorts, we identified latent classes with strikingly similar patterns of atopic response and association with clinical outcomes, suggesting the existence of multiple atopy phenotypes.

Atopic sensitization is one of the strongest risk factors for developing asthma [1-3], and yet the relationship between the two is still poorly understood [4-6]. We speculate that one reason for this stems from a coarse clinical definition of the atopic phenotype [7, 8]. Atopy is conventionally defined as a positive allergen-specific serum immunoglobulin E (sIgE) or skin prick test (SPT) to any food or inhalant allergen [9] – a definition that ignores temporal trends and patterns of sensitivity to different allergens [9]. We propose that this is an oversimplification and that atopy encompasses multiple sub-phenotypes [10], as is often seen with complex clinical conditions. We hypothesize that identifying these sub-phenotypes can help elucidate the relationship between atopy and important clinical outcomes such as asthma, as well as aid in discovering consistent causal environmental and genetic risk factors.

New approaches are required to identify sub-phenotypes within complex, high-dimensional, longitudinal data sets, where underlying data patterns may not be immediately obvious. We propose that sub-phenotypes should be inferred using a statistical approach, capable of discovering latent structure in an unsupervised manner, while incorporating prior investigator knowledge and assumptions. Latent class analysis has previously been useful in identifying distinct asthma phenotypes [5, 11-15]. In a recent study, we used a machine learning approach to redefine atopy, moving from a dichotomous (atopic/nonatopic) to a five-class phenotype [10], where each class reflected a distinct temporal and qualitative pattern of atopic sensitization. Only one of these classes was unequivocally associated with asthma; the relationship between asthma and conventionally defined atopy was much weaker.

In this work, we evaluated the validity of the sub-phenotypes found in our previous study [10]. We inferred multinomial atopy phenotypes in two population-based birth cohorts. We compared the characteristics of the obtained phenotypes in the two cohorts qualitatively in terms of typical atopic sensitization patterns for each class. We then examined the associations between the new phenotypes and clinical outcomes, including asthma and lung function, contrasting our inferred phenotypes with conventionally defined atopy.

Methods

Study design, setting and participants

Manchester Asthma and Allergy Study (MAAS) is a population-based birth cohort (ISRCTN72673620) [16-18]. Participants were recruited prenatally and followed prospectively, attending review clinics at ages 1, 3, 5, 8 and 11 years. The Isle of Wight birth cohort study (IoW) is a whole population birth cohort study established in 1989 [2, 19-21]. Participants were recruited prenatally and followed prospectively with assessments at ages 1, 2, 4, 10 and 18 years. Both studies were approved by the local research ethics committees. Parents and participants, as appropriate, gave informed consent. For detailed descriptions of both cohorts, see online supplement.

Definition of clinical outcomes

Sensitization

In MAAS, we ascertained sensitization by SPT (ages 1, 3, 5, 8 and 11) and measurement of sIgE (ages 1, 3, 5 and 8, Immuno-CAP; Thermo Fisher, Uppsala, Sweden) to a panel of inhalant and food allergens. In IoW, we carried out skin prick tests at ages 4, 10 and 18. We defined a positive SPT as a mean wheal diameter at least 3 mm greater than the negative control. We assigned three levels of allergen-specific sIgE sensitization: negative (<0.35 kU/L), positive (0.35–10 kU/L) and highly positive (≥10 kU/L). Atopy was defined according to convention as a dichotomous variable, corresponding to a child having a positive SPT to at least one allergen.
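
The thresholds above can be written down as a short Python sketch. This is an illustration only, not the study's code: the function names are hypothetical, and treating a value of exactly 10 kU/L as highly positive is an assumption based on the '≥10' cut-off.

```python
from typing import Mapping

# Thresholds taken from the definitions in the text.
SPT_WHEAL_MARGIN_MM = 3.0    # wheal must exceed the negative control by >= 3 mm
SIGE_POSITIVE_KU_L = 0.35    # lower bound of the 'positive' sIgE level
SIGE_HIGH_KU_L = 10.0        # lower bound of the 'highly positive' sIgE level


def spt_positive(wheal_mm: float, negative_control_mm: float = 0.0) -> bool:
    """Positive SPT: mean wheal diameter at least 3 mm above the negative control."""
    return wheal_mm - negative_control_mm >= SPT_WHEAL_MARGIN_MM


def sige_level(ku_per_l: float) -> str:
    """Map an allergen-specific sIgE value to one of the three levels."""
    if ku_per_l < SIGE_POSITIVE_KU_L:
        return "negative"
    if ku_per_l < SIGE_HIGH_KU_L:
        return "positive"
    return "highly positive"  # assumption: exactly 10 counts as highly positive


def conventional_atopy(wheals_mm: Mapping[str, float],
                       negative_control_mm: float = 0.0) -> bool:
    """Conventional atopy: a positive SPT to at least one allergen."""
    return any(spt_positive(w, negative_control_mm) for w in wheals_mm.values())
```

For example, under these assumptions conventional_atopy({"mite": 2.0, "cat": 5.0}, negative_control_mm=1.0) is true because the cat wheal exceeds the control by 4 mm.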

Asthma

Asthma was defined as ‘yes’ to ‘Have you ever had asthma?’ and either of ‘Have you had wheezing in the last 12 months?’ or ‘Have you had asthma treatment in the last 12 months?’ Asthma-related hospital admissions were transcribed from primary care records in MAAS (age 3–8 years). In IoW, parents were asked at the age 10 year follow-up to recall any previous asthma-related hospital admissions.
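
The questionnaire definition is a simple Boolean rule, sketched below. The function and argument names are hypothetical; only the logic comes from the definition above.

```python
def current_asthma(ever_asthma: bool,
                   wheeze_last_12m: bool,
                   treatment_last_12m: bool) -> bool:
    """Asthma: ever had asthma AND (wheeze OR asthma treatment in the last 12 months)."""
    return ever_asthma and (wheeze_last_12m or treatment_last_12m)
```

Note that recent wheeze alone, without a 'yes' to ever having had asthma, does not meet the definition.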

Lung function

Forced expiratory volume in 1 s (FEV1) was corrected for gender, age and height using the Asthma UK predictive equations [20] at 11 years in MAAS and 10 in IoW. The FEV1/forced vital capacity (FVC) ratio provided a measure of airway obstruction. Methacholine bronchial hyper-responsiveness, at 11 years in MAAS and 10 in IoW, was quantified as the normalized dose–response ratio (DRR) to incorporate participants without a 20% drop in FEV1 [21]. Exhaled nitric oxide (eNO) was used as a measure of eosinophilic airway inflammation [19] at 11 years in MAAS and 18 in IoW.

Statistical methods

Atopic sensitization class model

Our previous work [10] contains a detailed description of the model used to infer sensitization classes in the two cohorts, and we only provide a brief summary here. Sensitization of each child to each allergen over time was modelled using the hidden Markov model [22] illustrated in Fig. 1. In the model, the hidden state at each time point corresponds to ‘true’ sensitization, and the probabilities of observing positive SPT/sIgE responses can be viewed as true-/false-positive rates. The probabilities of acquiring/losing sensitization over time for all available allergens define cluster membership: they are shared between all children within a single sensitization class, but allowed to differ between classes. Cluster assignments and posterior distributions over all model parameters, latent variables and missing values were inferred using variational message passing [23], a method for approximate Bayesian inference, implemented under the Infer.NET framework [24]. To determine the number of clusters, we used a cross-validation approach, where we trained the model on 90% of the data and evaluated the log-likelihood of the remaining 10%. The removed patients were randomly selected among those with at least one positive response, and the procedure was repeated 100 times.
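
To make the structure of the model concrete, the following Python sketch computes the likelihood of one child's test series for a single allergen under a two-state hidden Markov model, and then the posterior probability of each class. It is a deliberately simplified illustration and not the implementation used in the study: it uses fixed point estimates and the standard forward algorithm rather than variational message passing in Infer.NET, it handles one allergen and one test type only, and all function names, parameter names and numerical values are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def emission_prob(obs: Optional[bool], state: int, tpr: float, fpr: float) -> float:
    """P(observed test | hidden state); a missing test (None) contributes 1."""
    if obs is None:
        return 1.0
    p_pos = tpr if state == 1 else fpr  # true-/false-positive rate
    return p_pos if obs else 1.0 - p_pos


def log_likelihood(obs_seq: Sequence[Optional[bool]], p_init: float,
                   p_acquire: float, p_lose: float,
                   tpr: float, fpr: float) -> float:
    """Forward algorithm for states 0 (not sensitized) and 1 (sensitized)."""
    trans = [[1.0 - p_acquire, p_acquire],  # from state 0
             [p_lose, 1.0 - p_lose]]        # from state 1
    alpha = [(1.0 - p_init) * emission_prob(obs_seq[0], 0, tpr, fpr),
             p_init * emission_prob(obs_seq[0], 1, tpr, fpr)]
    log_lik = 0.0
    for obs in obs_seq[1:]:
        norm = sum(alpha)                   # rescale to avoid underflow
        log_lik += math.log(norm)
        alpha = [a / norm for a in alpha]
        alpha = [sum(alpha[i] * trans[i][j] for i in range(2))
                 * emission_prob(obs, j, tpr, fpr) for j in range(2)]
    return log_lik + math.log(sum(alpha))


def class_posterior(obs_seq: Sequence[Optional[bool]],
                    class_params: Sequence[Dict[str, float]],
                    class_weights: Sequence[float]) -> List[float]:
    """Posterior over classes; each class has its own transition parameters."""
    logs = [math.log(w) + log_likelihood(obs_seq, **p)
            for w, p in zip(class_weights, class_params)]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

In this sketch a child with persistently positive tests is assigned mostly to whichever class has a high probability of acquiring and a low probability of losing sensitization, which mirrors how the shared transition probabilities define cluster membership in the full model.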

Figure 1. Hidden Markov model used to model the sensitization of a child to a particular allergen over time. Each latent variable corresponds to ‘true’ underlying sensitization to allergen a at time t. It determines the probability of observing positive skin prick test (SPT) and sIgE responses, as well as sensitization at the next time point.

The model was independently applied to cluster all MAAS SPTs and sIgEs (ages 1, 3, 5, 8, 11), and all IoW SPTs (ages 4, 10, 18). In a previous work [10], we clustered MAAS data using the same model, and this solution was refined here using additional data (age 11 SPTs) and a semi-quantitative rather than binary representation of sIgE responses.

Association analysis

We investigated the clinical characteristics of the sensitization classes by comparing each of the atopic classes with the reference nonatopic class. Categorical variables were reported as odds ratios with 95% confidence intervals; continuous variables were reported as mean differences and standard deviations, with adjustment for gender. We assessed significance levels using chi-squared, Fisher's exact or two-sample t-tests, as appropriate, using Stata, version 12.1 (StataCorp, College Station, TX, USA).
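
As a reminder of how an odds ratio and its 95% confidence interval are obtained from a 2 × 2 table, the sketch below uses the standard Wald interval on the log-odds scale. It is unadjusted, whereas the odds ratios reported in this study were gender-adjusted and computed in Stata, so it will not reproduce them. The function name and the example counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval.

    a: class members with the outcome,    b: class members without it,
    c: reference class with the outcome,  d: reference class without it.
    All four counts must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not study data).
estimate, lower, upper = odds_ratio_ci(20, 10, 10, 20)
```

An interval that excludes 1 corresponds to a statistically significant association at the chosen level.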

Results

Participant characteristics

In MAAS, after excluding children who had been randomised prenatally to an environmental intervention [25] and children with no SPT or sIgE data, 1028 children were included. In IoW, 230 of 1457 participants had no SPT data and were excluded. All remaining children with available clinical outcomes were included at each time point. Demographic characteristics of the two cohorts are summarized in Table 1. Using a conventional definition of atopy, 22.7% of IoW children were sensitized at 10 years of age, compared with 33.9% of MAAS children at 11 years. At 10 years of age, 15.1% of IoW children had asthma, compared with 12.7% of MAAS children at 11 years.

Table 1. Characteristics of the two cohorts

                        MAAS (n = 1028)              IoW (n = 1226)
Male                    568/1027 (55.3%)             492/1028 (47.9%)
Asthma (a)              Age 8:  131/866 (15.1%)
                        Age 11: 100/789 (12.7%)      Age 10: 177/1176 (15.1%)
                                                     Age 18: 202/1125 (18.0%)
Atopy (b)               Age 11: 241/710 (33.9%)      Age 10: 278/1226 (22.7%)
Rhinoconjunctivitis     Age 8:  161/890 (18.1%)
                        Age 11: 192/801 (24.0%)      Age 10: 153/1173 (13.0%)
                                                     Age 18: 290/1127 (25.7%)

MAAS, Manchester Asthma and Allergy Study; IoW, Isle of Wight.
(a) Asthma was defined as ‘yes’ to ‘Have you ever had asthma?’ and either wheezing or asthma treatment in the last 12 months.
(b) Atopy was defined as any positive skin prick test (mite, pollens, cat, dog, egg, milk or peanut).

Sensitization classes

While there was no clear best number of clusters according to the cross-validation procedure (Figure S1), we observed that solutions with more than five clusters produced empty or nearly empty clusters (<30 members) in both cohorts, and so we opted for a five-class model. For the MAAS cohort, the inferred sensitization classes were similar to the results of our previous work [10] in terms of qualitative properties and class membership (Table S1 and Figure S2).

In MAAS, the inferred classes can qualitatively be described as follows: (0) few or no positive tests (676/1028, [65.7%]), (1) early childhood sensitivity to grass pollens and transient sensitivity to eggs, but not to mite (39/1028, [3.8%]), (2) sensitivity to mite, but rarely to other allergens (83/1028, [8.1%]), (3) early sensitivity to mite and grass and tree pollens, later onset of sensitivity to pets (147/1028, [14.3%]) and (4) sensitivity to a wide variety of allergens, including mite, pollens, cat and dog, throughout childhood (83/1028, [8.1%]).

In IoW, the inferred classes of latent atopic vulnerability can be qualitatively described as follows: (0) few or no positive atopy tests (791/1226, [64.5%]), (1) sensitivity to grass pollens and late-onset sensitivity to peanut (69/1226, [5.6%]), (2) sensitivity to mite (120/1226, [9.8%]), (3) sensitivity to mite and grass pollens, late-onset sensitivity to pets (153/1226, [12.5%]) and (4) sensitivity to a wide variety of allergens, including mite, pollens, cat and dog (93/1226, [7.6%]).
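
As a quick consistency check on the class sizes reported above, the sketch below confirms that the five classes account for every included child in each cohort and that none falls below the 30-member threshold used to reject solutions with more clusters. The counts are taken from the text; the function and variable names are hypothetical.

```python
from typing import Sequence

MIN_CLUSTER_SIZE = 30  # clusters with fewer members were treated as nearly empty

# Class sizes (Classes 0-4) and cohort sizes as reported in the text.
CLASS_SIZES = {
    "MAAS": [676, 39, 83, 147, 83],
    "IoW": [791, 69, 120, 153, 93],
}
COHORT_SIZES = {"MAAS": 1028, "IoW": 1226}


def passes_size_check(sizes: Sequence[int],
                      min_size: int = MIN_CLUSTER_SIZE) -> bool:
    """True if every cluster has at least min_size members."""
    return all(size >= min_size for size in sizes)


for cohort, sizes in CLASS_SIZES.items():
    # Every included child belongs to exactly one class.
    assert sum(sizes) == COHORT_SIZES[cohort]
    print(cohort, "smallest class:", min(sizes),
          "passes size check:", passes_size_check(sizes))
```

The smallest classes are Class 1 in both cohorts (39 children in MAAS and 69 in IoW), both above the threshold.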

Tables S2–S4 summarize the SPT and sIgE tests overall, by conventionally defined atopy and by sensitization class. The patterns of sensitization (SPT) are shown for each of the five classes in Fig. 2, each bar representing a single time point.

Figure 2. Sensitization class characteristics for the Manchester Asthma and Allergy Study (MAAS) cohort (top) and IoW cohort (bottom). The figures show the number of positive skin prick tests (SPTs; black) and negative SPTs (white) for ages 1, 3, 5, 8 and 11 in MAAS and ages 4, 10 and 18 in IoW, arranged according to allergen and class. Time increases along the horizontal axis in each figure.

Association between sensitization classes, conventional atopy and clinical outcomes

Asthma

We investigated the relationship between asthma (age 11 years in MAAS and 10 years in IoW) and conventional atopy and the four atopic classes in the five-class model (using the nonatopic class as reference). The results are presented in Fig. 3 and Table S5. Children in Class 4 (early sensitization to a broad panel of allergens) were much more likely to have asthma (aOR 20.1 [95% CI 10.9–40.2] in MAAS and aOR 11.9 [95% CI 7.3–19.4] in IoW). Although conventional atopy also predicted asthma, the relationship was much weaker (aOR 5.5 [3.4–8.8] in MAAS and 5.8 [4.1–8.3] in IoW). Class 1 was not associated with asthma in either cohort at age 10/11 years, whereas Class 3 was associated with asthma in both cohorts, and Class 2 was associated with asthma in MAAS, with a trend in IoW. Hospital admissions for asthma were much more common in Class 4 (aOR 15.3 [5.3–44.1] in MAAS and aOR 2.5 [1.3–4.7] in IoW) and were also associated with Class 1 in MAAS and Class 3 in IoW (Table S5). The strength of the association with rhinoconjunctivitis also differed between the classes (Table S5).

Figure 3. Asthma in the two cohorts at ages 11 (Manchester Asthma and Allergy Study, MAAS) and 10 (IoW), by current atopy and by sensitization classes. The figure shows gender-adjusted odds ratios and 95% confidence intervals, where Class 0 was the reference class for sensitization Classes 1–4.

Lung function

Lung function was materially and significantly poorer in Class 4 in both cohorts (Fig. 4, Table S6). The most striking differences were seen for FEV1/FVC ratio, where for Class 4 at age 10/11 years, ratios were 4.4% lower in MAAS and 2.6% lower in IoW (P < 0.001). In addition, in both cohorts, children in Class 4 had the most reactive airways (P < 0.001) and significantly higher eNO (P < 0.001).

Figure 4. Top: Lung function in the Manchester Asthma and Allergy Study (MAAS) cohort, by conventional atopy at age 11 and by sensitization classes (group means and 95% confidence intervals). Bottom: Lung function in the IoW cohort, by conventional atopy at age 10 and by sensitization classes (group means and 95% confidence intervals).

Discussion

By adopting a machine learning approach to longitudinal data on allergic sensitization from two independent, unselected birth cohorts from distinct geographical areas within the UK, we identified qualitatively similar latent structure in the atopic responses of children. The similarities in the sensitization classes of the two cohorts in terms of patterns of atopic response and association with clinical outcomes suggest that they are indicative of the existence of multiple true atopy phenotypes. Our results validate the findings of our previous study [10], showing that very similar classes can be identified in an independent population, in which data had been collected to adulthood. The data within MAAS also suggest that these atopy clusters change little over time (Tables S1 and S7, Figure S2). Importantly, of the four classes of children showing evidence of allergic sensitization, one (Class 4) was associated with a much higher risk of asthma during childhood. Children in this class had materially poorer lung function, more reactive airways and much higher risk of acute exacerbation of asthma resulting in hospital admission. This class comprises only approximately one-third of children who would be considered atopic by conventional criteria. This supports our hypothesis that the clinical expression of asthma does not merely depend on the presence of specific IgE antibodies, but rather on patterns of IgE responses over time.

The sensitization classes in the two cohorts are qualitatively very similar, despite the fact that they were derived independently from cohorts with participants assessed at different ages and by a different panel of allergens. For example, the MAAS data extend from 1 to 11 years (compared with 4–18 years for the IoW cohort) and include both skin tests and IgE (compared with only skin tests in IoW). The similarities between Class 2 (sensitization to mite) and Class 4 (early sensitization to multiple allergens) in the two cohorts are striking. Classes 1 and 3 are also similar in general, although they differ somewhat in sensitization to grass pollen. We speculate that clustering more data at multiple time points in each cohort might lead to an even better correspondence. The prevalence of positive skin tests and of asthma in our two populations was generally similar to that of UK children in this age group [6, 26], confirming that our results are broadly applicable to the general population.

In a setting with high-dimensional, longitudinal data that can depend on various environmental factors, it can be advantageous to use a statistical approach to summarize different patient characteristics in place of a potentially biased, investigator-imposed, deterministic phenotype. A machine learning approach coupled with Bayesian inference allows one to incorporate prior knowledge and assumptions into the process of discovering latent structure in data, while handling missing data in a principled manner. In applying these techniques, which we acknowledge are unfamiliar to many clinicians, we are capitalizing on developments in computer science and harnessing them to help us solve clinical problems that have eluded resolution by traditional means. The phenotype replication seen between the MAAS and IoW data sets appears to validate this approach. We acknowledge that our cross-validation procedure did not identify any material difference in the likelihood between solutions with three or more clusters and therefore no clear ‘correct’ number of clusters. As the solutions with more than five clusters typically contained empty or very small clusters in both cohorts, they were rejected. We opted for a five-class model based on the results of our previous clustering and the face validity of the clusters. We emphasize that the number of clusters is not necessarily definitive and that our approach is predominantly hypothesis-generating. It is possible that with more subjects, there would be more evidence for the five-class solution, rather than an increase in the number of clusters. We also acknowledge that as the clusters serve to summarize the characteristics of patient subpopulations, it is possible that including more variables (e.g. different allergens) might produce more or a different number of clusters.

Asthma is a complex disease [5], and conventional atopy is neither necessary nor sufficient for this disease, despite being the strongest risk factor. We suggest that statistical modelling of high-dimensional atopic sensitization data can lead to a more precise understanding of the relationship between asthma and atopy. We hypothesize that atopic sub-phenotypes will facilitate a better understanding of the mechanisms underlying the development of different allergic sensitization patterns and therefore the development of associated allergic diseases. This is an important step in the development of new therapies for asthma, which remains the most common chronic disease of childhood.

Acknowledgments

The MAAS authors would like to thank the children and their parents for their continued support and enthusiasm. We would also like to acknowledge the hard work and dedication of the study team (postdoctoral scientists, research fellows, nurses, physiologists, technicians and clerical staff). We thank the North West Lung Centre Charity for continued support. The Isle of Wight authors gratefully acknowledge the cooperation of the children and parents who have participated in the 1989 cohort. We also thank Abid Raza, Martha Scott, Ramesh Kurukulaaratchy, Wilfred Karamus, Hongmei Zhang, Sharon Matthews, Jane Grundy, Paula Williams, Frances Mitchell, Bernie Clayton, Monica Fenn, Linda Terry, Stephen Potter and Rosemary Lisseter for their considerable assistance with the study.

Funding

MAAS was supported by Asthma UK Grant No. 04/014, MRC Grant G0601361 and JP Moulton Charitable Foundation. The 10- and 18-year assessments of the Isle of Wight 1989 cohort were funded by grants from the National Institutes of Health, USA (R01 HL082925), National Eczema Society/British Dermatological Nursing Group Research Awards, British Medical Association and National Asthma Campaign, UK (Grant No. 364).

Author contributions

AS and AC conceived the idea and worked with all the authors on the analysis plan. NL undertook the machine learning analysis, DB analysed the MAAS data, and GR analysed the IoW data. All authors reviewed and discussed the results, contributed to the manuscript and approved the final version.

Conflict of interest

None of the authors has any conflict of interest regarding this study.

References

Supporting Information

Filename                                                    Format          Size    Description
all12134-sup-0001-SupplementaryMaterial.docx                Word document   834K    Data S1. Methods.
all12134-sup-0002-LazicOnlineSupplementRevisedClean.docx    Word document   834K

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.