Keywords:

  • component-resolved diagnostics;
  • asthma;
  • wheeze;
  • rhinitis;
  • airway hyper-reactivity;
  • methacholine;
  • IgE;
  • children;
  • machine learning;
  • feature selection;
  • logistic regression;
  • random forests;
  • Bayesian networks

Abstract

  1. Top of page
  2. Abstract
  3. Methods
  4. Results
  5. Discussion
  6. Conclusions
  7. Authors' contributions
  8. Conflict of interest
  9. Funding
  10. References
  11. Supporting Information

Background

Identifying different patterns of allergens and understanding their predictive ability in relation to asthma and other allergic diseases is crucial for the design of personalized diagnostic tools.

Methods

Allergen-IgE screening using the ImmunoCAP ISAC® assay was performed at age 11 yrs in children participating in a population-based birth cohort. Logistic regression (LR) and nonlinear statistical learning models, including random forests (RF) and Bayesian networks (BN), coupled with feature selection approaches, were used to identify patterns of allergen responses associated with asthma, rhino-conjunctivitis, wheeze, eczema and airway hyper-reactivity (AHR, positive methacholine challenge). Sensitivity, specificity and area under the receiver operating characteristic curve (AUROC) were used to assess model performance via repeated validation.

Results

A serum sample for IgE measurement was obtained from 461 of 822 (56.1%) participants. Two hundred and thirty-eight of 461 (51.6%) children had IgE > 0 ISU to at least one of 112 allergen components. The binary threshold >0.3 ISU performed less well than using continuous IgE values, discretizing data or using other data transformations, but not significantly (p = 0.1). With the exception of eczema (AUROC~0.5), LR, RF and BN achieved comparable AUROC, ranging from 0.76 to 0.82. Dust mite, pollens and pet allergens were highly associated with asthma, whilst pollens and dust mite were associated with rhino-conjunctivitis. Egg/bovine allergens were associated with eczema.

Conclusions

After validation, LR, RF and BN demonstrated reasonable discrimination ability for asthma, rhino-conjunctivitis, wheeze and AHR, but not for eczema. However, further improvements in threshold ascertainment and/or value transformation for different components, and better interpretation algorithms are needed to fully capitalize on the potential of the technology.

Detection of allergen-specific IgE antibodies (sIgE) is associated with an increased risk of wheeze/asthma and, among asthmatic patients, with more severe disease and diminished lung function [1-4]. The level of sIgE to common inhalant allergens offers more valuable information than a simple detection of ‘positive sIgE’ [4, 5]. Different allergen sources (both indoor and outdoor) have been independently associated with asthma and asthma-related symptoms [6-9]. However, it remains unclear how allergen sensitizations in toto contribute towards clinical manifestations of different atopic diseases (e.g., asthma vs. rhino-conjunctivitis vs. eczema).

The increasing availability of allergen components (purified from natural source or produced as recombinant proteins) marks the shift in allergy diagnosis that may lead to a transition towards component-resolved diagnostics [10]. For example, the multiplex chip-based assay ImmunoCAP ISAC® has been validated in terms of performance and reproducibility [11], providing an opportunity to identify both allergen patterns and their interactions in relation to different clinical outcomes. Using ImmunoCAP ISAC® assay, the sIgE antibody profiles associated with asthma, exhaled nitric oxide and airway hyper-reactivity (AHR) have been investigated, with multiple sensitizations to several allergen groups increasing the risk of asthma [12].

We hypothesized that different sIgE patterns are predictive of different diseases commonly associated with atopy, and that different interpretation algorithms, component threshold ascertainment and/or value transformation may modify the association between such patterns and clinical symptoms. To address these hypotheses, we investigated the performance and the interpretability of different linear and nonlinear statistical learning models in classifying contemporaneous asthma, wheezing, AHR, rhino-conjunctivitis and eczema. Models were fitted to sIgE levels measured by the ISAC® among participants in a population-based birth cohort at age 11 yrs. We used machine learning methods such as decision trees (DTs), which divide the study population into nested subgroups based on allergen thresholds and combinations, each with a specific probability of manifesting clinical symptoms, and Bayesian networks (BN), which are graphs that represent conditional dependencies among variables. These were coupled with logistic regression (LR) to fully exploit the large amount of information provided by the microarray and to relate it to clinical symptoms.

Methods

Study population and data sources

The Manchester Asthma and Allergy Study (MAAS) is a population-based birth cohort described in detail elsewhere [13]. The study was approved by a local ethics committee; informed consent was obtained from all parents. All data in this manuscript were ascertained at age 11 yrs. We administered validated questionnaires to collect information on parentally reported symptoms and physician-diagnosed illnesses. We measured AHR using methacholine challenge [14].

Definition of outcomes

Current asthma

Positive answer to all three of the following questions: (i) ‘Has your child wheezed within the past 12 months?’, (ii) ‘Has your child received asthma medication within the past 12 months?’ and (iii) ‘Has your child ever been diagnosed with asthma?’.

Current wheeze

Positive answer to the question ‘Has your child had wheezing or whistling in the chest in the last 12 months?’.

Current eczema

Positive answer to the question ‘Has your child had eczema in the last 12 months?’.

Current rhino-conjunctivitis

Positive answer to the question ‘In the past 12 months, has your child ever had a problem with sneezing, or a runny nose, or a blocked nose when he/she did not have a cold or the flu which was accompanied by itchy-watery eyes?’.

Airway hyper-reactivity

Provocative concentration of methacholine causing a 20% decline in FEV1 (PC20) of less than 16 mg/ml.

All outcomes were encoded as binary variables. Other variables considered in the descriptive statistics were FEV1, FVC, FEV1/FVC ratio, eNO and methacholine dose-response ratio.
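
The outcome definitions above can be sketched as simple binary encodings. This is only an illustrative sketch: the function names, the record layout and the handling of a missing challenge result are assumptions, not the study's actual code.

```python
from typing import Optional

# Cut-off stated in the text: PC20 < 16 mg/ml defines positive AHR.
AHR_PC20_CUTOFF_MG_ML = 16.0


def current_asthma(wheeze_12m: bool, medication_12m: bool, ever_diagnosed: bool) -> int:
    """Return 1 only when all three questionnaire answers are positive."""
    return int(wheeze_12m and medication_12m and ever_diagnosed)


def airway_hyperreactivity(pc20_mg_ml: Optional[float]) -> Optional[int]:
    """Return 1 if PC20 is below the cut-off, 0 otherwise, None if not measured."""
    if pc20_mg_ml is None:
        return None  # assumption: a missing challenge result stays missing
    return int(pc20_mg_ml < AHR_PC20_CUTOFF_MG_ML)


# Hypothetical child record (field names are illustrative only).
child = {"wheeze_12m": True, "medication_12m": True, "ever_diagnosed": False, "pc20": 4.0}
asthma = current_asthma(child["wheeze_12m"], child["medication_12m"], child["ever_diagnosed"])
ahr = airway_hyperreactivity(child["pc20"])
print(asthma, ahr)
```

In this toy record the child wheezes and takes medication but has no diagnosis, so current asthma is encoded as 0 while AHR is encoded as 1.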

Detection of IgE antibodies

The presence of sIgE to 112 allergen components was assessed by the ImmunoCAP ISAC® (ThermoFisher Scientific, Uppsala, Sweden).

Transformation of sIgE values

In the analyses described below, we expressed sIgE values as follows: (i) binarized using the threshold of 0.3 ISU; (ii) discretized into four categories using the manufacturer's semiquantitative scale (<0.3 ISU, undetectable or very low; ≥0.3 and <1 ISU, low; ≥1 and <15 ISU, moderate to high; ≥15 ISU, very high); (iii) discretized using an automated supervised discretization approach [15]; (iv) continuous raw values; (v) square-root or hyperbolic-arcsine transformation [16]; (vi) using other normalization methods such as quantile normalization [17].
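
As a rough illustration of transformations (i), (ii) and (v), the sketch below applies them to a handful of made-up ISU values. The function names are hypothetical, and treating a value of exactly 0.3 ISU as negative in the binary coding is an assumption, since the text does not state how the boundary was handled.

```python
import math

BINARY_THRESHOLD_ISU = 0.3


def binarize(isu: float) -> int:
    """Binary coding; values strictly above 0.3 ISU count as positive (assumed)."""
    return int(isu > BINARY_THRESHOLD_ISU)


def manufacturer_category(isu: float) -> int:
    """Four-level semiquantitative scale quoted in the text.

    0: <0.3 (undetectable or very low), 1: >=0.3 and <1 (low),
    2: >=1 and <15 (moderate to high), 3: >=15 (very high).
    """
    if isu < 0.3:
        return 0
    if isu < 1.0:
        return 1
    if isu < 15.0:
        return 2
    return 3


def sqrt_transform(isu: float) -> float:
    """Square-root transformation of a non-negative ISU value."""
    return math.sqrt(isu)


def asinh_transform(isu: float) -> float:
    """Hyperbolic-arcsine transformation; behaves like a log for large values."""
    return math.asinh(isu)


# Toy values chosen to sit on and around the category boundaries.
values = [0.0, 0.3, 0.99, 1.0, 14.9, 15.0, 60.0]
binary = [binarize(v) for v in values]
categories = [manufacturer_category(v) for v in values]
print(binary)
print(categories)
print([round(asinh_transform(v), 2) for v in values])
```

Both continuous transformations leave zero at zero and compress the long right tail, which is the property that matters for skewed sIgE data.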

Statistical learning

For a detailed explanation of the methods, please see the Supporting Information. Briefly, we analysed the discriminative ability of sIgE patterns in relation to clinical outcomes by fitting a series of machine learning models. All analyses were adjusted for gender. Statistical models were run on the subset of patients with at least one sIgE > 0 ISU.

Logistic regression

We fitted main-effects LR using: (i) the sum of and the number of positive sIgE values; and (ii) sIgE to all allergen components (using different transformations). For the latter, due to the high number of variables, LR was subject to feature selection via LogitBoost [18].
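
To make model (i) concrete, the sketch below derives the two summary features from one sIgE profile and passes them through a logistic function. The coefficients are invented purely for illustration and are not estimates from the study; the positivity threshold of >0.3 ISU follows Table 1.

```python
import math


def summary_features(sige_values, positive_threshold=0.3):
    """Return (number of positive sIgE values, sum of all sIgE values)."""
    n_positive = sum(1 for v in sige_values if v > positive_threshold)
    total = sum(sige_values)
    return n_positive, total


def logistic_probability(intercept, coefficients, features):
    """Main-effects logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_score = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear_score))


# Toy profile over five components (ISU); a real profile would have 112 values.
profile = [0.0, 0.5, 2.0, 0.1, 20.0]
n_pos, total = summary_features(profile)

# Hypothetical coefficients for: number positive, sum of IgE, gender (1 = male).
p = logistic_probability(-2.0, [0.2, 0.01, 0.3], [n_pos, total, 1])
print(n_pos, round(total, 1), round(p, 3))
```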

Decision tree and random forest models

These models were fitted to investigate possible nonlinear/interaction effects [19, 20]. DTs are machine learning methods that divide the population into nested subgroups according to values of the covariates, usually those that have the highest discriminatory power with respect to the outcome. For example, our study population could be divided into two using Fel d 1 IgE below or above 0.3 ISU. Other tree-branching rules can then be inferred on the two subpopulations, and so on recursively until a stopping criterion is met (e.g., a minimum number of subjects per subgroup). This progressive data partition can be represented in the form of a tree (Fig. S1). DTs are easy to interpret, but sometimes have poor predictive power. RFs are an ensemble of several different DTs, fitted with resampling/randomization, with the aim of improving prediction performance by combining many decision pathways (e.g., averaging across many DT predictions). DT and RF have dedicated methods for measuring variable importance which can capture complex interactions, without the need to define them explicitly.
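
A single branching step of the kind described above can be sketched as follows. The example scores a candidate split on Fel d 1 at 0.3 ISU by information gain, the criterion listed for the DT in Table 2. The data are made up, and the function names are illustrative assumptions.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting at `threshold` (below vs. at or above)."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


# Toy data: Fel d 1 sIgE (ISU) and asthma status for eight hypothetical children.
fel_d_1 = [0.0, 0.1, 0.0, 0.2, 1.5, 3.0, 0.4, 12.0]
asthma = [0, 0, 0, 1, 1, 1, 0, 1]

gain = information_gain(fel_d_1, asthma, 0.3)
print(round(gain, 4))
```

A tree learner would evaluate many candidate variables and thresholds in this way, keep the split with the largest gain, and then repeat the search within each resulting subgroup.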

Bayesian networks

Bayesian networks are graphs in which each node is a covariate and a link between two nodes represents a dependency. If no link is present between two nodes, they are conditionally independent. For a comprehensive introduction to BN modelling for biomedicine, please see Millán et al. [21]. The naïve Bayes (NB) model, a simpler model which assumes conditional independence among variables given the outcome, was also fitted as a control for the BN.
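
The conditional-independence assumption behind NB can be illustrated with a small Bernoulli naïve Bayes classifier on binarized sIgE values. This is a sketch under stated assumptions: Laplace (add-one) smoothing, the toy data and all function names are choices made here for illustration and are not taken from the study.

```python
import math


def fit_naive_bayes(rows, labels):
    """Estimate class priors and per-feature P(x_j = 1 | class) with add-one smoothing."""
    model = {}
    n_features = len(rows[0])
    for c in (0, 1):
        class_rows = [r for r, y in zip(rows, labels) if y == c]
        prior = (len(class_rows) + 1) / (len(rows) + 2)
        likelihoods = [
            (sum(r[j] for r in class_rows) + 1) / (len(class_rows) + 2)
            for j in range(n_features)
        ]
        model[c] = (prior, likelihoods)
    return model


def predict_proba(model, row):
    """P(class = 1 | row), multiplying per-feature terms as if they were independent."""
    log_scores = {}
    for c, (prior, likelihoods) in model.items():
        score = math.log(prior)
        for x, p in zip(row, likelihoods):
            score += math.log(p if x == 1 else 1.0 - p)
        log_scores[c] = score
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    return exp_scores[1] / (exp_scores[0] + exp_scores[1])


# Toy data: two binarized components per child, with outcome labels.
rows = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
labels = [1, 1, 1, 0, 0, 0]

model = fit_naive_bayes(rows, labels)
p_pos = predict_proba(model, [1, 1])
p_neg = predict_proba(model, [0, 0])
print(round(p_pos, 3), round(p_neg, 3))
```

A full BN would replace the independent per-feature terms with conditional probability tables that depend on each node's parents in the graph.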

Model performance

Measures of prediction performance included accuracy (% correct), area under the receiver operating characteristic curve (AUROC), sensitivity and specificity. The ability to generalize to unseen data was assessed through repeated validation, running a randomized training/test split (80%/20%) 50 times and comparing differences between models with a corrected paired t-test. We also assessed the power of the sample in relation to the number of covariates.
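
The performance measures and the validation scheme can be sketched in a few lines. The text does not say which correction was applied to the paired t-test; the version below uses the corrected resampled t-statistic of Nadeau and Bengio, which is an assumption made here, as are all function names and the toy numbers.

```python
import math
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def random_split(n, test_fraction, rng):
    """Shuffle indices 0..n-1 and return (train_indices, test_indices)."""
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def corrected_paired_t(differences, n_train, n_test):
    """Corrected resampled t-statistic for per-run differences between two models."""
    k = len(differences)
    mean = sum(differences) / k
    variance = sum((d - mean) ** 2 for d in differences) / (k - 1)
    return mean / math.sqrt((1.0 / k + n_test / n_train) * variance)


# Toy scores for six children, thresholded at 0.5 to obtain class predictions.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
y_pred = [int(s >= 0.5) for s in scores]
sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auroc(y_true, scores)

# One 80%/20% split of 238 children; the study repeated such splits 50 times.
train_idx, test_idx = random_split(238, 0.2, random.Random(0))

# Hypothetical AUROC differences between two models over five runs.
t_stat = corrected_paired_t([0.02, 0.04, 0.03, 0.01, 0.05], len(train_idx), len(test_idx))
print(round(sens, 3), round(spec, 3), round(auc, 3), round(t_stat, 2))
```

The correction inflates the variance term because repeated random splits reuse the same children, so the per-run differences are not independent.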

Results

Participants

We reviewed 822 children. A sample for IgE measurement was obtained from 461 (56.1%); there was no difference in gender, family history of allergic diseases, position in sibship, asthma, sensitization (skin tests) or parental atopy between those with and without IgE measurements (data available on request). A total of 238 of 461 (51.6%) children tested positive (sIgE > 0.3 ISU) to at least one allergen component. Characteristics of study participants are shown in Table 1.

Table 1. Characteristics of the study population at age 11 (N = 461). Values are median (IQR) or N.

At least one IgE > 0: N1 = 238 (51.6%)

| Characteristic | Male (N = 154) | Female (N = 84) | Total (N = 238) | Missing |
| --- | --- | --- | --- | --- |
| Number of specific positive IgE (>0.3 ISU) | 7.5 (3.0–14.5) | 7.0 (2.0–12.0) | 7.0 (3.0–13.0) | N/A |
| Sum of all IgE | 34.0 (8.4–130.2) | 38.9 (1.9–132.8) | 36.4 (4.9–131.4) | N/A |
| Asthma | 33 | 15 | 48 | 4 |
| Eczema | 36 | 22 | 58 | 5 |
| Mean eNO | 14.6 (8.8–33.8) | 19.9 (9.8–41.7) | 17.6 (9.4–37.8) | 62 |
| FVC | 2.7 (2.4–3.0) | 2.5 (2.3–2.9) | 2.6 (2.4–3.0) | 3 |
| % predicted FEV1 | 99.2 (91.0–107.4) | 98.5 (92.2–103.7) | 99.0 (91.6–106.0) | 3 |
| Current wheeze | 48 | 25 | 73 | 3 |
| Airway hyper-reactivity | 62 | 31 | 93 | 59 |
| Rhino-conjunctivitis | 63 | 28 | 91 | 3 |
| Methacholine dose-response ratio | 3.20 (1.01–5.47) | 3.48 (0.95–6.11) | 3.28 (0.98–5.84) | 57 |
| FEV1/FVC ratio | 0.86 (0.81–0.90) | 0.89 (0.84–0.92) | 0.86 (0.82–0.91) | 4 |

All IgE = 0: N2 = 223 (48.4%)

| Characteristic | Male (N = 101) | Female (N = 122) | Total (N = 223) | Missing |
| --- | --- | --- | --- | --- |
| Asthma | 5 | 5 | 10 | 9 |
| Eczema | 15 | 14 | 29 | 7 |
| Mean eNO | 8.3 (6.9–10.7) | 8.1 (6.5–10.4) | 8.2 (6.6–10.6) | 46 |
| FVC | 2.7 (2.5–3.1) | 2.6 (2.3–2.9) | 2.7 (2.4–3.0) | 6 |
| % predicted FEV1 | 98.3 (93.0–106.1) | 99.0 (91.8–106.6) | 98.6 (92.5–106.5) | 5 |
| Current wheeze | 8 | 10 | 18 | 2 |
| Airway hyper-reactivity | 23 | 18 | 41 | 50 |
| Rhino-conjunctivitis | 7 | 9 | 16 | 11 |
| Methacholine dose-response ratio | 4.69 (2.88–6.48) | 5.13 (3.81–6.81) | 4.99 (3.54–6.57) | 53 |
| FEV1/FVC ratio | 0.86 (0.82–0.90) | 0.89 (0.85–0.93) | 0.88 (0.83–0.92) | 13 |

Distribution and transformation of sIgE values

The distribution of sIgE values was highly skewed; Fig. S2 shows the histograms under several input transformations. In a preliminary test of model performance (AUROC in relation to asthma, using LR and RF), binary discretization of sIgE values at the 0.3 ISU threshold performed less well than a continuous scale or a multiple categorization. The manufacturer's semiquantitative scale performed better than the binary threshold, although not significantly. The automated supervised discretization method yielded the best results. Fig. S3 shows detailed box plots of AUROC performance across all transformation methods.

Discriminative ability of sIgE patterns in relation to clinical outcomes

Robustness of model performance

Average AUROC > 0.5 was achieved for asthma, wheeze, rhino-conjunctivitis and AHR in all statistical learning models; in contrast, this was not achieved for eczema (AUROC~0.5). However, although AUROC for eczema was poor for all models, in the univariate analysis, sIgE to egg ovomucoid, ovalbumin and ovotransferrin were significantly associated with eczema (Gal d 1, p = 0.02; Gal d 2, p = 0.02; Gal d 3, p = 0.02). We also observed a strong trend for the bovine allergens Bos d 4, Bos d 5 and Bos d 6 (p = 0.06, p = 0.06 and p = 0.04, respectively).

Overall, all models showed reasonable AUROC (0.76–0.82 for RF, 0.63–0.79 for LR, 0.56–0.77 for BN, 0.64–0.76 for NB) and sensitivity (0.69–0.97 for RF, 0.54–0.95 for LR, 0.58–0.96 for BN, 0.62–0.92 for NB), but poor specificity (0.34–0.70 for RF, 0.40–0.69 for LR, 0.32–0.54 for BN, 0.38–0.57 for NB), except for the positive AHR (higher specificity, decreased AUROC/sensitivity).

Random forest outperformed other approaches in terms of AUROC in all but one outcome (rhino-conjunctivitis). LR (model ii) always ranked best in terms of specificity. LogitBoost selected a median (IQR) of 9 (7–18) covariates per model across all validation runs. In most cases, the hypothesis that there was no difference in the mean performance among the RF, LR, NB and BN could not be rejected at the 0.05 level. Performance of DT was consistently inferior to that of RF (p < 0.05), and the same held for LR model i (encoding the number of positive IgE + sum of all IgE), except for rhino-conjunctivitis. Table 2 summarizes prediction performance obtained by the repeated validation; AUROC plots are shown in Fig. 1. For this experiment, we used the square-root transformed sIgE, but similar results were obtained for the other transformation methods.

Table 2. Performance of statistical learning models by means of 50 independent validation runs, stratified by different outcomes

| Outcome | Method | Feature set | Feature/topology selection | AUROC (s.d.) | Sensitivity (s.d.) | Specificity (s.d.) |
| --- | --- | --- | --- | --- | --- | --- |
| Asthma | Majority class | N/A | N/A | 0.50 (0.00)ᵃ | **1.00 (0.00)** | 0.00 (0.00)ᵃ |
| Asthma | LR | Number of positive IgE + sum of all IgE | N/A | 0.71 (0.10)ᵃ | 0.96 (0.03) | 0.20 (0.12)ᵃ |
| Asthma | LR | 112 IgE + gender | Cross-validated LogitBoost | 0.79 (0.08) | 0.95 (0.03)ᵃ | **0.40 (0.15)** |
| Asthma | DT | 112 IgE + gender | Embedded (information gain, pruning) | 0.59 (0.10)ᵃ | 0.96 (0.06) | 0.14 (0.16) |
| Asthma | RF | 112 IgE + gender | Embedded (Gini index, random subset) | **0.82 (0.06)** | 0.97 (0.04) | 0.34 (0.13) |
| Asthma | NB | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.76 (0.08) | 0.91 (0.05)ᵃ | 0.38 (0.15) |
| Asthma | BN | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.77 (0.07) | 0.96 (0.03)ᵃ | 0.32 (0.15) |
| Wheeze | Majority class | N/A | N/A | 0.50 (0.00)ᵃ | **1.00 (0.00)** | 0.00 (0.00)ᵃ |
| Wheeze | LR | Number of positive IgE + sum of all IgE | N/A | 0.67 (0.06)ᵃ | 0.93 (0.04)ᵃ | 0.13 (0.08)ᵃ |
| Wheeze | LR | 112 IgE + gender | Cross-validated LogitBoost | 0.72 (0.07) | 0.94 (0.05) | **0.37 (0.12)** |
| Wheeze | DT | 112 IgE + gender | Embedded (information gain, pruning) | 0.61 (0.08)ᵃ | 0.90 (0.09) | 0.29 (0.20) |
| Wheeze | RF | 112 IgE + gender | Embedded (Gini index, random subset) | **0.78 (0.06)** | 0.91 (0.05)ᵃ | 0.45 (0.12) |
| Wheeze | NB | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.69 (0.06)ᵃ | 0.92 (0.06)ᵃ | 0.30 (0.12) |
| Wheeze | BN | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.65 (0.07)ᵃ | 0.93 (0.10) | 0.29 (0.12) |
| Rhino-conjunctivitis | Majority class | N/A | N/A | 0.50 (0.00)ᵃ | **1.00 (0.00)** | 0.00 (0.00)ᵃ |
| Rhino-conjunctivitis | LR | Number of positive IgE + sum of all IgE | N/A | **0.80 (0.07)** | 0.84 (0.07)ᵃ | 0.44 (0.11)ᵃ |
| Rhino-conjunctivitis | LR | 112 IgE + gender | Cross-validated LogitBoost | 0.73 (0.07)ᵃ | 0.79 (0.09)ᵃ | **0.53 (0.13)** |
| Rhino-conjunctivitis | DT | 112 IgE + gender | Embedded (information gain, pruning) | 0.66 (0.06)ᵃ | 0.81 (0.11)ᵃ | 0.47 (0.17) |
| Rhino-conjunctivitis | RF | 112 IgE + gender | Embedded (Gini index, random subset) | 0.78 (0.07) | 0.80 (0.08)ᵃ | 0.57 (0.11) |
| Rhino-conjunctivitis | NB | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.75 (0.07) | 0.88 (0.06)ᵃ | 0.41 (0.12)ᵃ |
| Rhino-conjunctivitis | BN | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.73 (0.07)ᵃ | 0.82 (0.08)ᵃ | 0.50 (0.12) |
| AHR | Majority class | N/A | N/A | 0.50 (0.00)ᵃ | **1.00 (0.00)** | 0.00 (0.00)ᵃ |
| AHR | LR | Number of positive IgE + sum of all IgE | N/A | 0.57 (0.11)ᵃ | 0.70 (0.19)ᵃ | 0.39 (0.20)ᵃ |
| AHR | LR | 112 IgE + gender | Cross-validated LogitBoost | 0.64 (0.05)ᵃ | 0.55 (0.18)ᵃ | 0.70 (0.23) |
| AHR | DT | 112 IgE + gender | Embedded (information gain, pruning) | 0.64 (0.10)ᵃ | 0.55 (0.18)ᵃ | 0.70 (0.21) |
| AHR | RF | 112 IgE + gender | Embedded (Gini index, random subset) | **0.76 (0.07)** | 0.69 (0.11)ᵃ | **0.70 (0.10)** |
| AHR | NB | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.64 (0.10)ᵃ | 0.62 (0.20)ᵃ | 0.57 (0.25) |
| AHR | BN | 112 IgE + gender | Cross-validated wrapper (best-first search, K2) | 0.56 (0.06)ᵃ | 0.58 (0.21)ᵃ | 0.54 (0.29) |

AHR, airway hyper-reactivity; AUROC, area under the receiver operating characteristic curve; BN, Bayesian networks; DT, decision tree; LR, logistic regression; NB, naïve Bayes; RF, random forests.

ᵃ The hypothesis of difference in means comparing against the best model (in bold) could not be rejected at p = 0.05.

Figure 1. Performance of statistical learning models in classifying the asthma and rhino-conjunctivitis outcomes, using the full feature set (112 IgE + gender), by means of area under the receiver operating characteristic curve, across 50 independent validation (80%/20%) runs. Results are out-of-sample predictions (i.e., on unseen data). Bars represent standard errors.

Variable importance in relation to clinical outcomes and their dependencies

Fig. 2 shows RF feature importance plots with respect to asthma and rhino-conjunctivitis across 1000 permutation runs. The importance is expressed as the rescaled decrease in accuracy when randomizing a variable of interest. For asthma, there was a broader set of top-scoring allergens belonging to different sources including dust mite, cat, dog and pollens, whilst the top-scoring allergens for rhino-conjunctivitis were all pollens followed by dust mite. The order of variables may change when randomizing permutations [22]; indeed, variable ranks and associated p-values for our data set were subject to a substantial degree of variation.
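
The idea of importance as a decrease in accuracy under randomization can be sketched as follows. To keep the example self-contained, a fixed threshold rule stands in for the fitted random forest; the rule, the data and the function names are all hypothetical, and the outcome-permutation test used for the p-values in Fig. 2 is not reproduced here.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the classifier reproduces the label."""
    return sum(1 for r, y in zip(rows, labels) if predict(r) == y) / len(labels)


def permutation_importance(predict, rows, labels, feature_index, n_repeats, rng):
    """Mean drop in accuracy after shuffling one feature column."""
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature_index] for r in rows]
        rng.shuffle(column)
        permuted = [
            r[:feature_index] + [v] + r[feature_index + 1:]
            for r, v in zip(rows, column)
        ]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


def rule(row):
    """Hypothetical stand-in classifier that only looks at feature 0."""
    return int(row[0] > 0.3)


# Toy data: two sIgE components per child; labels follow the rule exactly.
rows = [[0.0, 5.0], [0.1, 0.0], [2.0, 1.0], [4.0, 0.2],
        [0.0, 3.0], [1.0, 0.0], [0.2, 0.4], [6.0, 2.0]]
labels = [rule(r) for r in rows]

rng = random.Random(0)
importance_0 = permutation_importance(rule, rows, labels, 0, 200, rng)
importance_1 = permutation_importance(rule, rows, labels, 1, 200, rng)
print(round(importance_0, 3), importance_1)
```

Shuffling the feature the classifier relies on degrades accuracy, whereas shuffling the ignored feature changes nothing, so its importance is exactly zero.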

Figure 2. Feature importance plots for the asthma outcome (upper panel) and for the rhino-conjunctivitis outcome (lower panel) measured as mean decrease in accuracy from fitting a random forest and performing an outcome permutation test (1000 runs). Green intervals represent rescaled average (±standard deviation) decrease in accuracy, whilst box plots represent the null distribution (randomized outcomes); p-values are highlighted in red. Only the first 10 variables shown.

There was partial consistency between the variables selected by the stepwise heuristic for NB/BN and the top-scoring variables output by the RF. Similarly, runs of stepwise LR with different starting points led to different final sets, probably due to a number of correlated variables. Fig. 3 shows the mutually adjusted odds ratios from LogitBoost LR, including only variables significant (p < 0.05) in univariate analysis. Supplementary results give a more thorough explanation of the relevant variables and their association into ‘equivalency’ groups (Figs S4–S6).

Figure 3. Multivariable logistic regression for asthma and rhino-conjunctivitis outcomes (upper and lower panel, respectively), showing mutually adjusted odds ratios and associated p-values from the LogitBoost algorithm (run on the whole data set). Only variables significant in the univariate analysis were included (p < 0.05).

Fig. 4 depicts two optimized NB and BN structures for asthma and rhino-conjunctivitis. The networks shown here are representative of a single run; therefore, we cannot exclude the possibility that there are other topologies and variable sets with comparable performance.

Figure 4. Bayesian networks (BN) for the classification of asthma and rhino-conjunctivitis. Upper panel shows the naïve Bayes (NB) models (hypothesizing variable independence, it can be abstracted to a main-effect logistic model, that is, a linear score where each variable has a weight), and lower panel shows the BN that allow for more complex (direct and indirect) conditional dependencies. Given the non-superiority of the more complex BN model as compared to the NB on the current data set, one could choose the NB hypothesis and further evaluate different variable sets, as in main-effects logistic regression.

Discussion

Key findings

We investigated the ability of linear and nonlinear statistical learning models fitted to ISAC® assay data to identify asthma, wheezing, AHR, rhino-conjunctivitis and eczema. In general, all modelling techniques (excluding DTs) performed comparably. With the exception of eczema, all outcomes could be predicted with an AUROC > 0.5. Random forests (RF) outperformed other approaches in terms of AUROC, whilst LR ranked as the best in terms of specificity. We could not determine whether a main-effect model or a model that hypothesizes conditional independence among variables (LR/NB) performs as well as a model which accounts for interactions or conditional dependencies (RF/BN). Based on these results, one could argue that a simple linear score (LR) with fewer allergen components (from 7 to 18) may be as effective as a more complex model. However, we cannot rule out the possibility that interactions among allergens may have a potentially important role.

Our data suggest that sIgE discretization and transformation policy may increase the model performance compared with the single dichotomous threshold at 0.3 ISU. The number of positive sIgEs and the sum of all sIgE levels were poorer predictors of asthma, wheeze and AHR compared with the information on all components, supporting the evaluation of all specific components rather than an overall qualitative assessment.

Limitations and interpretation

From a methodological point of view, one limitation of this study lies within the procedures for feature/model selection. A challenge within BN learning is the simultaneous estimation of both node set and topology, limited here by the use of two nested heuristic searches. Neither the reliability of the associations in the networks nor the stability of the selected feature sets was assessed. The selection of main effects and interactions might also depend on the discretization policy.

The advantage of coupling machine learning methods with classical statistical models is that more complex mechanistic hypotheses can be investigated and that large, sparse, heterogeneous data sets can be analysed. Furthermore, by comparing performance of nonlinear and linear methods, one can ascertain whether the unexplained variability (e.g., poor prediction of the outcome) can be reduced by measuring other potentially important variables, permitting formulation of new hypotheses.

Our data suggest that a careful and informed sIgE discretization/transformation may increase the performance, although the differences in AUROC of various approaches could not be always confirmed at the formal 0.05 significance level. The semiquantitative coding suggested by the manufacturer performed better than using the binary threshold at 0.3 ISU (albeit not significantly). In our data set, the automated supervised discretization method yielded the best results, supporting the notion that a range of population-specific (and perhaps age-specific) expected values should be established for different populations. Given the multimodal and skewed nature of sIgE distributions and relatively modest sample size, further analyses in larger data sets are warranted. We cannot exclude the possibility that different thresholds are applicable to different components.

The heuristic algorithms for attribute selection yielded compact sets of predictors (from more than a hundred candidates to about a dozen, confirmed when applying the automated supervised discretization approach), which simplifies the interpretation of the results. Not surprisingly, we confirmed previous findings [12] that mite, pollen and pet allergens are top-scoring predictors of asthma, whilst pollens (and mite) are top-scoring predictors of rhino-conjunctivitis. However, execution of the feature selection algorithms multiple times led to variations in the sets and scores. This may be in part due to highly correlated variables, but also to other latent causes. Such variability does not permit identification of one unique model, and therefore, a unique pathway. Our results suggest that pre-clustering of allergens into relevant families may help stabilize the feature selection phases and may potentially be more useful than standard classifications by source (e.g., pollen or mite).
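
One simple way to look for groups of highly correlated components, in the spirit of the Spearman rank-correlation analysis reported in the Supporting Information, is sketched below. The component names are real ISAC allergens but the values are invented, and the 0.9 cut-off for calling two components 'equivalent' is an arbitrary assumption.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy sIgE values (ISU) for six hypothetical children.
data = {
    "Der p 1": [0.0, 0.5, 2.0, 8.0, 0.0, 1.0],
    "Der p 2": [0.0, 0.7, 3.5, 9.0, 0.1, 1.2],
    "Phl p 1": [4.0, 0.0, 0.3, 0.0, 6.0, 0.0],
}

names = list(data)
correlated_pairs = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if spearman(data[a], data[b]) >= 0.9  # arbitrary cut-off, for illustration
]
print(correlated_pairs)
```

Components that end up in the same group could then be represented by a single summary variable before feature selection.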

The prediction models for eczema had poor performance, suggesting either that disruption in skin barrier function may be more important in the pathogenesis of eczema than IgE-mediated mechanisms or that other allergens not available on the chip may be predictive of eczema (e.g., Staphylococcus aureus enterotoxins) [23].

Conclusions

Component-resolved diagnostic tests may offer a more accurate assessment of allergic diseases. However, further improvements in threshold ascertainment and/or value transformations for different allergen components, well-thought-out interpretation algorithms and selection of components are needed to fully capitalize on the potential of the technology.

Authors' contributions

MCFP machine learning modelling, manuscript writing; DB data extraction, statistics, statistical review, manuscript review; AS study cohort management, experimental set up, manuscript review; AC principal investigator, study design, manuscript review; IB statistical review, manuscript review.

Conflict of interest

The authors declare no conflict of interest in relation to this study.

Funding

MAAS is supported by grants from the J P Moulton Charitable Foundation, and MRC Grants G0601361 and MR/K002449/1. The study was partly supported by the University of Manchester's Health eResearch Centre (HeRC) funded by the Medical Research Council (MRC) Grant MR/K006665/1. The ImmunoCAP ISAC® assay was performed by ThermoFisher Scientific, Uppsala, Sweden, as an in-kind contribution; ThermoFisher Scientific had no role in study design, data collection, analysis, interpretation, writing of the report or the decision to submit the study for publication.

References

  1. Burrows B, Martinez FD, Halonen M, Barbee RA, Cline MG. Association of asthma with serum IgE levels and skin-test reactivity to allergens. N Engl J Med 1989: 320: 271–7.
  2. Beeh KM, Ksoll M, Buhl R. Elevation of total serum immunoglobulin E is associated with asthma in nonallergic individuals. Eur Respir J 2000: 16: 609–14.
  3. Simpson BM, Custovic A, Simpson A, et al. NAC Manchester Asthma and Allergy Study (NACMAAS): risk factors for asthma and allergic disorders in adults. Clin Exp Allergy 2001: 31: 391–9.
  4. Marinho S, Simpson A, Söderström L, Woodcock A, Ahlstedt S, Custovic A. Quantification of atopy and the probability of rhinitis in preschool children: a population-based birth cohort study. Allergy 2007: 62: 1379–86.
  5. Simpson A, Soderstrom L, Ahlstedt S, Murray CS, Woodcock A, Custovic A. IgE antibody quantification and the probability of wheeze in preschool children. J Allergy Clin Immunol 2005: 116: 744–9.
  6. Taylor PE, Jacobson KW, House JM, Glovsky MM. Links between pollen, atopy and the asthma epidemic. Int Arch Allergy Immunol 2007: 144: 162–70.
  7. Gent JF, Belanger K, Triche EW, Bracken MB, Beckett WS, Leaderer BP. Association of pediatric asthma severity with exposure to common household dust allergens. Environ Res 2009: 109: 768–74.
  8. Wang J, Calatroni A, Visness CM, Sampson HA. Correlation of specific IgE to shrimp with cockroach and dust mite exposure and sensitization in an inner-city population. J Allergy Clin Immunol 2011: 128: 834–7.
  9. Sordillo JE, Webb T, Kwan D, et al. Allergen exposure modifies the relation of sensitization to fraction of exhaled nitric oxide levels in children at risk for allergy and asthma. J Allergy Clin Immunol 2011: 127: 1165–72.e5.
  10. De Knop KJ, Bridts CH, Verweij MM, et al. Component-resolved allergy diagnosis by microarray: potential, pitfalls, and prospects. Adv Clin Chem 2010: 50: 87–101.
  11. Melioli G, Bonifazi F, Bonini S, et al. The ImmunoCAP ISAC molecular allergology approach in adult multi-sensitized Italian patients with respiratory symptoms. Clin Biochem 2011: 44: 1005–11.
  12. Patelis A, Gunnbjörnsdottir M, Malinovschi A, et al. Population-based study of multiplexed IgE sensitization in relation to asthma, exhaled nitric oxide, and bronchial responsiveness. J Allergy Clin Immunol 2012: 130: 397–402.e2.
  13. Custovic A, Simpson BM, Murray CS, Lowe L, Woodcock A. The National Asthma Campaign Manchester Asthma and Allergy Study. Pediatr Allergy Immunol 2002: 13 (Suppl 1): 32–7.
  14. Crapo RO, Casaburi R, Coates AL, et al. Guidelines for methacholine and exercise challenge testing-1999. This official statement of the American Thoracic Society was adopted by the ATS Board of Directors, July 1999. Am J Respir Crit Care Med 2000: 161: 10239.
  15. Fayyad UM, Irani KB. Multi-interval discretization of continuous-valued attributes for classification learning. In: Bajcsy R, ed. Proceedings of the International Joint Conference on Uncertainty in AI. San Francisco, USA: Morgan Kaufmann, 1993: 1022–7.
  16. Huber W, Von Heydebreck A, Sültmann H, Poustka A, Vingron M. Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 2002: 18 (Suppl 1): S96–104.
  17. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003: 19: 185–93.
  18. Friedman J, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors). Ann Stat 2000: 28: 337–407.
  19. Murthy SK. Automatic construction of decision trees from data: a multi-disciplinary survey. Data Min Knowl Disc 1998: 2: 345–89.
  20. Breiman L. Random forests. Mach Learn 2001: 45: 5–32.
  21. Millán E, Loboda T, Pérez-de-la-Cruz JL. Bayesian networks for student model engineering. Comput Educ 2010: 55: 1663–83.
  22. Altmann A, Toloşi L, Sander O, Lengauer T. Permutation importance: a corrected feature importance measure. Bioinformatics 2010: 26: 1340–7.
  23. Semic-Jusufagic A, Bachert C, Gevaert P, et al. Staphylococcus aureus sensitization and allergic disease in early childhood: population-based birth cohort study. J Allergy Clin Immunol 2007: 119: 930–6.

Supporting Information

Filename: pai12139-sup-0001-SupplementaryMaterial.docx (Word document, 12413K)

Figure S1. Example of a decision tree, classifying an outcome of interest (1/2) based on the values of variables x, y and z.

Figure S2. Distribution of IgE values from the ISAC chip (ISU scale) by selecting all (left panels) and strictly positive (>0, right panel) values.

Figure S3. Performance of RF and LR models by varying the input IgE transformation function, specifically (from left to right): binary 0.3 ISU threshold (blue colour), column-wise quartile discretisation, matrix-wise quartile discretisation, square-root transform, raw values, supervised discretisation, matrix-wise standardisation, quantile normalisation, and manufacturer's semi-quantitative scale.

Figure S4. Correlation plots for allergen IgEs. Spearman rank-correlation was used, and the correlation matrix has been sorted to group highly-correlated allergen groups.

Figure S5. Hierarchical clustering of allergen IgEs. Manhattan (L1 norm) distance was used, aggregating instances progressively with the complete linkage method.

Figure S6. Thresholded adjacency graph of allergen IgE, based on Spearman's rank-correlation.
