Challenges in interpreting allergen microarrays in relation to clinical symptoms: A machine learning approach

Authors

  • Mattia C. F. Prosperi,

    Corresponding author
    1. Centre for Health Informatics, Institute of Population Health, University of Manchester, Manchester, UK
    2. Centre for Respiratory Medicine and Allergy, Institute of Inflammation and Repair, University of Manchester, Manchester, UK
    • Correspondence

      Mattia C. F. Prosperi, Department of Biomedical Modelling, Centre for Health Informatics, Institute of Population Health, University of Manchester, 1st Floor, Jean McFarlane Building, Room 1.314, University Place, Oxford Road, Manchester M13 9PL, UK

      Tel.: +44 (0) 161 27 51125

      E-mail: mattia.prosperi@manchester.ac.uk

  • Danielle Belgrave,

    1. Centre for Health Informatics, Institute of Population Health, University of Manchester, Manchester, UK
    2. Centre for Respiratory Medicine and Allergy, Institute of Inflammation and Repair, University of Manchester, Manchester, UK
  • Iain Buchan,

    1. Centre for Health Informatics, Institute of Population Health, University of Manchester, Manchester, UK
  • Angela Simpson,

    1. Centre for Respiratory Medicine and Allergy, Institute of Inflammation and Repair, University of Manchester, Manchester, UK
  • Adnan Custovic

    1. Centre for Respiratory Medicine and Allergy, Institute of Inflammation and Repair, University of Manchester, Manchester, UK

Abstract

Background

Identifying different patterns of allergens and understanding their predictive ability in relation to asthma and other allergic diseases is crucial for the design of personalized diagnostic tools.

Methods

Allergen-IgE screening using the ImmunoCAP ISAC® assay was performed at age 11 yrs in children participating in a population-based birth cohort. Logistic regression (LR) and nonlinear statistical learning models, including random forests (RF) and Bayesian networks (BN), coupled with feature selection approaches, were used to identify patterns of allergen responses associated with asthma, rhino-conjunctivitis, wheeze, eczema and airway hyper-reactivity (AHR, positive methacholine challenge). Sensitivity/specificity and area under the receiver operating characteristic curve (AUROC) were used to assess model performance via repeated validation.
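The performance measures named above can be illustrated with a minimal, dependency-free sketch. The function names, toy labels and scores are hypothetical and are not taken from the study. AUROC is computed here through its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen non-case), which is one standard way to obtain it and not necessarily how the authors computed it.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = outcome present (e.g. asthma), 0 = absent.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.5]  # model-predicted probabilities
y_pred = [1 if s > 0.45 else 0 for s in scores]  # illustrative cut-off

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auroc(y_true, scores)
```

In a repeated validation scheme, these measures would be computed on each held-out split and then summarized across repetitions.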

Results

Serum samples for IgE measurement were obtained from 461 of 822 (56.1%) participants. Two hundred and thirty-eight of 461 (51.6%) children had IgE > 0 ISU to at least one of 112 allergen components. Using the binary threshold of >0.3 ISU gave lower performance than using continuous IgE values, discretized data or other data transformations, although the difference was not significant (p = 0.1). With the exception of eczema (AUROC ~0.5), LR, RF and BN achieved comparable AUROC values, ranging from 0.76 to 0.82. Dust mite, pollens and pet allergens were highly associated with asthma, whilst pollens and dust mite were associated with rhino-conjunctivitis. Egg/bovine allergens were associated with eczema.
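The contrast between a binary threshold and other encodings of a component IgE value can be sketched as follows. Only the >0.3 ISU threshold comes from the text above. The log offset, the discretization cut-offs and all function names are illustrative assumptions, since the abstract does not specify which transformations or bins were used.

```python
import math
from typing import Sequence


def binarize(ige_isu: float, threshold: float = 0.3) -> int:
    """Binary sensitization flag: 1 if IgE exceeds the threshold (in ISU)."""
    return 1 if ige_isu > threshold else 0


def log_transform(ige_isu: float, offset: float = 1.0) -> float:
    """Log10 transform; the offset (an assumption) keeps 0 ISU defined."""
    return math.log10(ige_isu + offset)


def discretize(ige_isu: float,
               cutoffs: Sequence[float] = (0.3, 1.0, 15.0)) -> int:
    """Ordinal class = number of cut-offs strictly exceeded.

    The default cut-offs are hypothetical and chosen only for illustration.
    """
    return sum(1 for c in cutoffs if ige_isu > c)


# Hypothetical IgE values (ISU) for one allergen component in five children.
values = [0.0, 0.2, 0.5, 2.0, 20.0]
binary = [binarize(v) for v in values]
logged = [log_transform(v) for v in values]
classes = [discretize(v) for v in values]
```

The binary encoding maps 0.5, 2.0 and 20.0 ISU to the same value, whereas the continuous and ordinal encodings keep them distinct; this is the kind of information loss that could plausibly explain a difference in model performance.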

Conclusions

After validation, LR, RF and BN demonstrated reasonable discrimination ability for asthma, rhino-conjunctivitis, wheeze and AHR, but not for eczema. However, further improvements in threshold ascertainment and/or value transformation for different components, and better interpretation algorithms are needed to fully capitalize on the potential of the technology.
