Development of childhood asthma prediction models using machine learning approaches

Abstract
Background: Respiratory symptoms are common in early life and often transient. It is difficult to identify in which children these will persist and result in asthma. Machine learning (ML) approaches have the potential for better predictive performance and generalisability over existing childhood asthma prediction models. This study applied ML approaches to predict school‐age asthma (age 10) in early life (Childhood Asthma Prediction in Early life, CAPE model) and at preschool age (Childhood Asthma Prediction at Preschool age, CAPP model).
Methods: Clinical and environmental exposure data was collected from children enrolled in the Isle of Wight Birth Cohort (N = 1368, ∼15% asthma prevalence). Recursive Feature Elimination (RFE) identified an optimal subset of features predictive of school‐age asthma for each model. Seven state‐of‐the‐art ML classification algorithms were used to develop prognostic models. Training was performed by applying fivefold cross‐validation, imputation, and resampling. Predictive performance was evaluated on the test set. Models were further externally validated in the Manchester Asthma and Allergy Study (MAAS) cohort.
Results: RFE identified eight and twelve predictors for the CAPE and CAPP models, respectively. Support Vector Machine (SVM) algorithms provided the best performance for both the CAPE (area under the receiver operating characteristic curve, AUC = 0.71) and CAPP (AUC = 0.82) models. Both models demonstrated good generalisability in MAAS (CAPE 8‐year = 0.71, 11‐year = 0.71, CAPP 8‐year = 0.83, 11‐year = 0.79) and excellent sensitivity to predict a subgroup of persistent wheezers.
Conclusion: Using ML approaches improved upon the predictive performance of existing regression‐based models, with good generalisability and ability to rule in asthma and predict persistent wheeze.


| INTRODUCTION
Childhood asthma is highly heterogeneous, with numerous factors contributing towards its development, persistence and severity. [1][2][3] Despite approximately 80% of asthmatic children developing symptoms (such as wheeze) before the age of six, these clinical symptoms are neither universally present in early life among all future asthmatics nor specific to asthma. 4 With the added difficulty of making an objective asthma diagnosis before the age of five, both under-treatment and over-treatment of wheezing disorders are common in early life. 5,6 The ability to predict the development of school-age asthma can help to identify high-risk preschool children and distinguish them from children whose symptoms are likely to be transient. 7 Furthermore, early prediction of asthma susceptibility will be critical for the successful implementation of potential primary prevention strategies to reduce the risk of developing asthma.
A recent systematic review identified twenty-one logistic regression-based models for predicting childhood asthma. 8 However, none of these models have been implemented into standard clinical practice, possibly due to relatively weak predictive power, poor generalisability and need for specialised clinical testing. The review further proposed that regression-based methods for predicting childhood asthma may have been exhausted, with the identified models offering similar predictive power to each other and being unable to be significantly improved upon. 8 Machine learning approaches have increasingly been applied to a wide range of healthcare problems due to their ability to integrate large quantities of heterogeneous data, handle complex interactions between variables and identify patterns within data. 9 Particularly for disease prediction, where interactions between biological variables are complex, machine learning approaches have the potential to identify novel predictors which may have been previously overlooked by regression-based approaches. [9][10][11] Furthermore, application of methods to reduce model overfitting may address the poor generalisability of existing prediction models in independent populations. Machine learning approaches have shown promise in predicting a variety of clinical asthma outcomes, phenotypes and decisions, [12][13][14][15][16] including the diagnostic or prognostic prediction of school-age asthma development. [17][18][19][20][21][22][23][24][25] While these studies tend to offer improved predictive performance, none of these studies support their findings with external validations of their models or explain how their "black-box" models (where relevant) arrive at their predictions. Without these two components, machine learning models will fail to obtain the trust of physicians and continue to be limited in their clinical utility, regardless of the superior prediction accuracy they may offer. 26,27
This study aimed to utilise machine learning approaches to improve upon the performance of traditional regression methods and develop explainable and independently validated prediction models for childhood asthma. Two prognostic prediction models, the Childhood Asthma Prediction in Early-life (CAPE) and Childhood Asthma Prediction at Preschool-age (CAPP) models, were developed to predict school-age asthma at 10 years, within a general population-based cohort, using information available from the first two years and first four years of life, respectively.

| Prediction outcome
School-age asthma, evaluated at age 10, was defined as "a doctor diagnosis of asthma ever and at least one episode of wheezing or use of asthma medication in the last 12 months". Only individuals with a reported asthma status at the 10-year follow-up were included in the analyses (n = 1368).
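The outcome definition can be read as a simple logical rule. The sketch below assumes the definition is parsed as "doctor diagnosis ever AND (wheeze OR asthma medication in the last 12 months)"; the record type and field names are hypothetical and do not correspond to IOWBC variable names.

```python
from dataclasses import dataclass


@dataclass
class FollowUp10y:
    """Hypothetical record of one child's answers at the 10-year follow-up."""
    doctor_diagnosis_ever: bool
    wheeze_last_12m: bool
    asthma_medication_last_12m: bool


def has_school_age_asthma(record: FollowUp10y) -> bool:
    # Assumed parse: diagnosis ever AND (current wheeze OR current medication use).
    return record.doctor_diagnosis_ever and (
        record.wheeze_last_12m or record.asthma_medication_last_12m
    )


# A diagnosed child whose symptoms are controlled by medication still counts as a case.
controlled = FollowUp10y(True, False, True)
# Wheeze without a doctor diagnosis does not meet the definition.
undiagnosed_wheezer = FollowUp10y(False, True, False)
```

Under this reading, a diagnosed child who is symptom-free because of medication is still classed as asthmatic, whereas an undiagnosed wheezer is not.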

| Candidate predictors
Fifty-four candidate predictors previously reported to be associated with childhood asthma, and for which data was available in the IOWBC, were identified (Table E1). Candidate predictors included data on subject demographics, lifestyle, clinical symptoms of allergy and asthma and environmental exposures collected across three time points: at birth (prenatal and perinatal data), early life (combined exposure at either the 1-year or the 2-year follow-ups) and at preschool age (4-year follow-up).

| Model development
All stages of model development were performed independently for the CAPE and CAPP models (Figure 1).

| Feature selection
For each model, feature selection was performed on the individuals with complete data (no missing values) for all available candidate predictors, using Recursive Feature Elimination (RFE) with a random forest algorithm and fivefold cross-validation (see supporting information S1).
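The elimination loop behind RFE can be sketched generically. This is a minimal illustration of the technique, not the study's implementation: the two callables are placeholders for "refit a random forest and read its feature importances" and "mean cross-validated balanced accuracy", and the toy feature names and scores are invented for the example.

```python
from typing import Callable, Dict, List, Tuple


def recursive_feature_elimination(
    features: List[str],
    importances: Callable[[List[str]], Dict[str, float]],
    cv_score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Drop the least important feature one at a time; keep the best-scoring subset."""
    current = list(features)
    best_subset, best_score = list(current), cv_score(current)
    while len(current) > 1:
        ranks = importances(current)      # refit on the remaining features
        weakest = min(current, key=lambda f: ranks[f])
        current.remove(weakest)
        score = cv_score(current)         # e.g. mean balanced accuracy over five folds
        if score >= best_score:           # ties favour the smaller subset
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-ins (hypothetical): two informative features and two noise features.
TRUE_SIGNAL = {"wheeze": 0.5, "cough": 0.3, "noise_a": 0.0, "noise_b": 0.0}


def toy_importances(subset: List[str]) -> Dict[str, float]:
    return {f: TRUE_SIGNAL[f] for f in subset}


def toy_cv_score(subset: List[str]) -> float:
    # Reward informative features and penalise each extra feature slightly.
    return 0.5 + sum(TRUE_SIGNAL[f] for f in subset) / 2 - 0.01 * len(subset)


selected, score = recursive_feature_elimination(
    list(TRUE_SIGNAL), toy_importances, toy_cv_score
)
```

In the toy run the two noise features are removed first, and the loop keeps the two-feature subset because dropping a further informative feature lowers the score.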

| Model construction and optimisation
To identify the best classification algorithm, seven machine learning classifiers were implemented: two support vector machines (SVM) (linear and radial basis (RBF) kernel functions), decision tree, random forest, naive Bayes, multilayer perceptron, and K-Nearest Neighbours (see supporting information S1).
FIGURE 1 Workflow for the development and validation of asthma prediction models using machine learning approaches. Model development in the Isle of Wight Birth Cohort (IOWBC) was performed independently for the construction of the CAPE and CAPP tools. (A) Feature selection was performed using only individuals with complete data for all candidate predictors. (B, C) Seven machine learning classifiers (two support vector machines with different kernel functions (linear and radial basis function), naïve Bayes classifier, decision tree, multilayer perceptron, random forest and K-nearest neighbours) were developed. Models were developed using complete data for the subset of features identified from feature selection (B), and subsequently redeveloped using optimised training datasets (C). Training dataset optimisation consisted of the step-wise application of imputation and resampling (oversampling using ADASYN and random undersampling) to the entire IOWBC dataset not allocated to the test dataset, including those with missing predictor data (CAPE: n = 1113; CAPP: n = 1185). (D) The best CAPE and CAPP models were selected based on performance in the test set. (E) Selected models were externally validated to predict school-age asthma at ages 8 and 11 years in an independent population (Manchester Asthma and Allergy Study, MAAS). † The best CAPE model was developed on the complete training dataset, undersampled to balance class proportions (n = 136). The best CAPP model was developed on the complete training dataset, with cases oversampled by 300% and controls undersampled to balance class proportions (n = 408).
Each machine learning algorithm was initially trained and evaluated on the subset of individuals who had complete data for the predictors selected through RFE. The dataset was split (ratio of 2:1, preserving class proportions) into a training and holdout test set for model development and validation, respectively (Figure 1).
Within a fivefold cross-validation, the hyperparameters for each model were tuned using a grid search, optimising for its balanced accuracy (see supporting information S1, Table E2).
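A 2:1 split that preserves class proportions can be illustrated with a short standard-library sketch. The function name, seed and rounding rule are assumptions made for the example; the counts are those reported for the CAPE model in the Results (765 individuals, 102 asthmatics).

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 1 / 3, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) with every class split in the same proportion."""
    rng = random.Random(seed)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)                          # random allocation within the class
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# CAPE complete-case counts: 102 asthmatics (1) and 663 non-asthmatics (0).
labels = [1] * 102 + [0] * 663
train_idx, test_idx = stratified_split(labels)
train_cases = sum(labels[i] for i in train_idx)
test_cases = sum(labels[i] for i in test_idx)
```

With these inputs the sketch reproduces the reported allocation of 510 training individuals (68 asthmatics) and 255 test individuals (34 asthmatics).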
The training dataset was then optimised to further improve the performance of the classification algorithms. Multiple imputation using Multivariate Imputation by Chained Equations (MICE), 29 oversampling using an adaptive synthetic sampling approach (ADASYN), 30 and random under-sampling were implemented in a stepwise approach to address the degree of missing data and class imbalance in the training set (see supporting information S1). The seven algorithms were then redeveloped, with hyperparameters retuned, on each optimised training set to identify the best asthma prediction model(s) and tested on the same holdout test set (Figure 1).
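The arithmetic of the resampling step can be checked with a small sketch. Note the substitution: ADASYN generates synthetic minority samples by interpolation, whereas the code below simply duplicates randomly chosen cases, so it only illustrates the resulting class sizes. The function name and seed are hypothetical; the counts come from the training sets described in the Results and the Figure 1 caption.

```python
import random
from typing import List, Sequence, Tuple


def resample_indices(
    case_idx: Sequence[int],
    control_idx: Sequence[int],
    oversample_pct: float = 0,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Oversample cases by `oversample_pct` percent, then undersample controls to 1:1."""
    rng = random.Random(seed)
    n_extra = round(len(case_idx) * oversample_pct / 100)
    # Plain duplication stands in for ADASYN's synthetic samples (assumption).
    cases = list(case_idx) + [rng.choice(list(case_idx)) for _ in range(n_extra)]
    n_controls = min(len(control_idx), len(cases))
    controls = rng.sample(list(control_idx), n_controls)  # random undersampling
    return cases, controls


# CAPE training set: 68 cases, 442 controls, undersampling only.
cape_cases, cape_controls = resample_indices(range(68), range(68, 510))
# CAPP training set: 51 cases, 314 controls, cases oversampled by 300%.
capp_cases, capp_controls = resample_indices(range(51), range(51, 365), oversample_pct=300)

cape_n = len(cape_cases) + len(cape_controls)
capp_n = len(capp_cases) + len(capp_controls)
```

Under these assumptions the totals match the training set sizes given in the Figure 1 caption: 68 + 68 = 136 for CAPE and 204 + 204 = 408 for CAPP.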
The best CAPE and CAPP models were selected based on their discriminative performance on the test set using the area under the receiver operating characteristics curve (AUC). Sensitivity, specificity, positive and negative predictive values (PPV and NPV), positive and negative likelihood ratios (LR+ and LR−), balanced accuracy, F1 score and Brier score were reported at the optimal threshold that maximised Youden's index, with 2000 bootstrap samples used to calculate 95% confidence intervals for the performance measures.
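The evaluation steps can be sketched with the standard library alone. The code assumes predicted probabilities and binary labels are available as plain lists; the toy data, function names and the percentile bootstrap are illustrative choices, not the study's code. Only a subset of the reported measures is shown; the likelihood ratios and F1 score follow from the same confusion-matrix counts.

```python
import random
from typing import Callable, Dict, Sequence, Tuple


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for s, t in zip(p, y) if t == 1]
    neg = [s for s, t in zip(p, y) if t == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def confusion(y: Sequence[int], p: Sequence[float], thr: float) -> Tuple[int, int, int, int]:
    tp = sum(1 for t, s in zip(y, p) if t == 1 and s >= thr)
    fn = sum(1 for t, s in zip(y, p) if t == 1 and s < thr)
    fp = sum(1 for t, s in zip(y, p) if t == 0 and s >= thr)
    tn = sum(1 for t, s in zip(y, p) if t == 0 and s < thr)
    return tp, fp, tn, fn


def youden_threshold(y: Sequence[int], p: Sequence[float]) -> float:
    """Threshold maximising J = sensitivity + specificity - 1."""
    def j(thr: float) -> float:
        tp, fp, tn, fn = confusion(y, p, thr)
        return tp / (tp + fn) + tn / (tn + fp) - 1
    return max(sorted(set(p)), key=j)


def metrics_at(y: Sequence[int], p: Sequence[float], thr: float) -> Dict[str, float]:
    tp, fp, tn, fn = confusion(y, p, thr)
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        "balanced_accuracy": (sens + spec) / 2,
        "brier": sum((s - t) ** 2 for s, t in zip(p, y)) / len(y),
    }


def bootstrap_ci(y: Sequence[int], p: Sequence[float],
                 stat: Callable[[Sequence[int], Sequence[float]], float],
                 n_boot: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Percentile 95% interval from resampling individuals with replacement."""
    rng = random.Random(seed)
    n, values = len(y), []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if 0 < sum(ys) < n:               # the statistic needs both classes present
            values.append(stat(ys, [p[i] for i in idx]))
    values.sort()
    return values[int(0.025 * n_boot)], values[int(0.975 * n_boot) - 1]


# Toy data (hypothetical): 3 cases and 5 controls.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]
thr = youden_threshold(y_true, y_prob)
report = metrics_at(y_true, y_prob, thr)
ci_low, ci_high = bootstrap_ci(y_true, y_prob, auc)
```

Choosing the threshold by Youden's index weights sensitivity and specificity equally, which is one reason a model can show high sensitivity alongside a modest PPV when the outcome is uncommon.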

| External validation
The best performing models were validated in the Manchester Asthma and Allergy Study (MAAS) cohort 31 to predict school-age asthma at ages eight and eleven (Figure 1; see supporting information S1). Data extracted from MAAS was matched as closely as possible to the predictor and outcome definitions used in the development cohort (Table E3).

| Sensitivity analyses
Sensitivity analyses were conducted to comprehensively evaluate the developed models, including evaluations of (i) their generalisability in high risk subgroups; (ii) their robustness to predict an alternative definition of school-age asthma; (iii) the resolution of the predictions to distinguish between individuals presenting with distinct wheeze phenotypes throughout childhood and adolescence; and (iv) their performance compared to similar regression-based models (see supporting information S1).

| Explaining the 'black-box' models
SHapley Additive exPlanations (SHAP) 32 were used to evaluate feature importance and provide global explanations for how predictions were made by the CAPE and CAPP models (see supporting information S1). Examples of how SHAP can be used locally to explain individual predictions were also provided.
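The idea behind a local SHAP explanation can be shown with an exact Shapley computation on a toy model. This is a simplified sketch, not the SHAP library: it compares one individual against a single reference profile rather than a background dataset, and the scoring function, feature names and coefficients are invented for illustration.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict


def exact_shapley(
    predict: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Average each feature's marginal contribution over all feature orderings."""
    names = list(x)
    contrib = {n: 0.0 for n in names}
    for order in permutations(names):
        current = dict(baseline)          # start from the reference profile
        prev = predict(current)
        for n in order:
            current[n] = x[n]             # switch one feature to the individual's value
            new = predict(current)
            contrib[n] += new - prev
            prev = new
    n_orders = factorial(len(names))
    return {n: v / n_orders for n, v in contrib.items()}


def toy_risk(f: Dict[str, float]) -> float:
    # Hypothetical score with an interaction term; not the CAPE or CAPP model.
    return 0.1 + 0.3 * f["wheeze"] + 0.2 * f["cough"] + 0.1 * f["wheeze"] * f["cough"]


child = {"wheeze": 1, "cough": 1}
reference = {"wheeze": 0, "cough": 0}
phi = exact_shapley(toy_risk, child, reference)
```

The contributions sum to the difference between the individual's prediction and the reference prediction, which is the additive property that lets each prediction be decomposed feature by feature.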

| RESULTS
In the IOWBC, 1368 enrolled participants had a defined asthma outcome at age 10, of whom 201 (14.69%) were asthmatic. Baseline characteristics of individuals with complete data were largely comparable with those of the full IOWBC dataset (Table E4).

| Childhood Asthma Prediction in Early-life (CAPE) model
Complete data on all 39 predictors collected by age two was available for 490 individuals. RFE identified a subset of eight predictors for inclusion in the CAPE model, with an average balanced accuracy of 64.49%. Figure 2A details the feature importance, direction, and magnitude of asthma risk for each selected predictor based on SHAP.
Complete data for these eight predictors was available for 765 individuals; 510 (68 asthmatics) and 255 (34 asthmatics) individuals were allocated to the initial training and test sets, respectively. An SVM classifier (RBF kernel) was the best performing classification algorithm for the CAPE model (AUC = 0.71, Brier score = 0.21) (Table 1A).

| External validation of the CAPE model
To predict the development of asthma at the 8-year and 11-year time-points in MAAS, complete data on the eight CAPE predictors was available for 322 and 299 individuals, respectively. Table E5 compares the distribution of predictors in the IOWBC and MAAS.
The CAPE model demonstrated moderate generalisability, maintaining an AUC = 0.71 at both 8 and 11 years (Table 1A; Figure 3), despite slight reductions in PPV. In the high-risk subgroups, despite a 3%-4% increase in PPV, overall predictive performance decreased (Table 1A).

| Childhood Asthma Prediction at Preschool-age (CAPP) model
For the CAPP model, 373 individuals had complete data for all 54 candidate predictors available by age four. RFE identified an optimal subset of 12 predictors for inclusion in the model, with an average balanced accuracy of 74.93% (Figure 2B). Complete data for these 12 predictors was available for 548 individuals, of whom 365 (51 asthmatics) and 183 (25 asthmatics) individuals were assigned to the initial training and test sets, respectively. The best performing classification algorithm for the CAPP model was an SVM (linear kernel) classifier (AUC = 0.82, Brier score = 0.18) (Table 1B).

| Sensitivity analysis
The CAPE and CAPP models were robust in correctly predicting nonasthmatics using the alternative asthma definition (similar NPV). However, neither model was robust in predicting asthmatics, with an increase in false positive predictions reducing the PPV by approximately 50% for both models, likely due to disagreement between the original and modified asthma definitions (Table E6; Figure E1). Furthermore, both models showed excellent power to predict a persistent wheeze phenotype, with 100% and 90% of individuals with persistent wheeze offered a positive prediction by the CAPE and CAPP models, respectively.
The CAPP model was developed using an SVM classification algorithm with a linear kernel (C = 0.33), and was trained on the training dataset consisting of individuals with complete data, with cases oversampled by 300% and controls under-sampled to obtain a 1:1 class ratio.
c High-risk, defined as a child having at least one parent with allergic disease (asthma, eczema or allergic rhinitis).
d High-risk, defined as a child with both parents with allergic disease (asthma, eczema or allergic rhinitis).

| Comparison with regression methods
Both the CAPE and CAPP models outperformed their equivalent logistic regression models (Table E7; Figure 3). There was a sub-  (Table 2) and MAAS (Table E8).

| Explaining the "black-box" models
Based on SHAP, only a subset of the predictors included in each model were shown to make a major contribution to the predictions: early life cough and wheeze for the CAPE model, and preschool cough, atopy and polysensitisation for the CAPP model (Figure E2). The contributions of these predictors were consistent with explanations of individual predictions (Figure E3). Redevelopment of the models including only these highly contributing predictors showed similar performance for the CAPP model but a 10% fall in AUC for the CAPE model (Figure E4).

| Comparisons with existing models
To date, twenty-one regression-based prediction models have been developed for childhood asthma (reviewed in Kothalawala et al. 8 ), of which only six have been externally validated (Table E9). A recent systematic review further identified 10 studies that developed prediction models for childhood asthma using machine learning approaches, but only eight specifically predicted school-age asthma (5-14 years). 26 Another study directly compared the performance of a current regression-based asthma prediction model, PARS, with a conditional inference tree-based decision rule model using the same predictors. 25 However, none of these studies externally validated the machine learning models they proposed.
Similar to the CAPE and CAPP models, most published asthma prediction models are very good at ruling out asthma rather than ruling in asthma, resulting partly from low power due to low asthma prevalence. 8 Even if existing models offer good PPV, this often degrades upon validation. 8
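The dependence of PPV on prevalence follows directly from Bayes' rule and can be illustrated numerically. The sensitivity and specificity below are hypothetical round numbers, not results from this study; the 15% prevalence approximates that of the development cohort.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical classifier with 80% sensitivity and 80% specificity.
ppv_low, npv_low = predictive_values(0.8, 0.8, prevalence=0.15)
ppv_balanced, npv_balanced = predictive_values(0.8, 0.8, prevalence=0.50)
```

With the same hypothetical classifier, PPV is about 0.41 at 15% prevalence and 0.80 at 50% prevalence, while NPV at 15% prevalence is about 0.96. This asymmetry is consistent with models that rule out asthma more reliably than they rule it in.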

| Predictor selection and availability
Both the CAPE and CAPP models include data collected across  (Table E10), the predictive benefit offered by the inclusion of sensitisation was deemed to outweigh the potential reduction in applicability.
Of the predictors selected for inclusion in the two models, some were well-established risk factors with a clear inferred direction of asthma risk (Figure 2). Others were predictors which have not previously been used in childhood asthma prediction models (maternal age at the time of the child's birth, age of solid food introduction and total breastfeeding duration) and offer a less clear direction of asthma risk. The selection of these novel predictors, over others that have a more established biological relevance in the literature (such as parental asthma, eczema or allergic rhinitis), may be met with caution by the clinical community. However, RFE identifies the subset of features that collectively offer the best predictive accuracy rather than devising a comprehensive list of childhood asthma risk factors, which may be biologically sound but lacking in predictive power. 37 In fact, the predictors of wheeze and cough were among those repeatedly included in the majority of machine learning models identified to date. 26 The predictors of atopy, polysensitisation and wheeze were also included in Owora et al.'s machine learning model; however, the predictors were taken from the PARS model rather than being identified from an independent feature selection. 25 It is also important to acknowledge the possibility that the selection of these novel predictors may stem from an inherent bias of the random forest algorithm to assign greater importance to features which are continuous or which have a large number of categories. 38 However, as the CAPE and CAPP models developed using these selected predictors demonstrated improved performance against existing prediction models, any bias stemming from the feature selection process did not appear to limit the inclusion of features that were truly predictive of school-age asthma.

| Prediction generalisability, robustness and resolution
In the unselected MAAS cohort, the CAPE and CAPP models showed moderate to good generalisability to predict asthma across school ages, despite the marginal decline in the PPV of the CAPE model. Validation in high-risk MAAS subgroups showed the PPV of both models to increase with the number of allergic parents, suggesting that confidence in ruling in asthma improves in high-risk groups; but replication in a larger study population is required.
The lack of a clear definition for asthma is an unavoidable limitation in epidemiological studies. 39 The asthma definition used in this study aimed to account for children with a clinical indication of asthma (physician diagnosed) who were actively symptomatic, but also those potentially asymptomatic at the time of assessment due to the use of symptom relieving medications. Whilst both models were robust in predicting non-asthmatics using an alternative asthma definition of wheeze and bronchial hyper-responsiveness (BHR), they had reduced power to predict true asthmatics (∼50% decline in PPV).
The latter may be explained by objective tests, such as spirometry and BHR, being influenced by treatment; potential asthmatics on controller medications, whom the models are trained to identify as asthmatic, may be considered as non-asthmatic with the alternative definition, resulting in greater false positive predictions.
As the aim of this study was to compare whether machine learning approaches could improve upon existing regression-based models that predict childhood asthma, the primary prediction outcome for this study was restricted to school-age asthma rather than predicting asthma phenotypes. However, acknowledging the importance of exploring specific sub-phenotypes of asthma, the resolution of the machine learning models to predict an individual's future wheeze trajectory was explored. Notably, both the CAPE and CAPP models showed excellent sensitivity to predict individuals with a persistent wheeze phenotype; these individuals would likely benefit from early primary or secondary asthma prevention/management.

| Strengths and limitations
This study had a number of strengths. First, each model was developed to make timely predictions to identify future asthmatics within a general population, rather than among those already considered at high risk (mainly those experiencing wheeze or with a familial history of asthma/allergy). Second, by utilising machine learning methods, novel predictor subsets for school-age asthma were identified and the developed models offered improved predictive performance over current regression-based methods. Third, to our knowledge, this is the first study to externally validate asthma prediction models developed using machine learning approaches. The models demonstrated good generalisability to predict school-age asthma across multiple time-points, without degrading the predictive power to rule in asthma (particularly with the CAPP model). Fourth, the two models displayed excellent sensitivity to predict a subgroup of individuals with persistent wheeze. Finally, this study was able to use SHAP to address one of the key issues preventing the uptake of machine learning methods in clinical practice: the inability to interpret the models and explain the individual predictions made.
However, this study was limited by both model development and validation being conducted in predominantly Caucasian populations.
Machine learning also requires large datasets; the introduction of more data would undoubtedly improve the performance of the machine learning models and offer more precise performance estimates with smaller confidence intervals. To retain a sample size appropriate for machine learning, feature selection was conducted before performing a train-test split. This decision could have resulted in information leakage, potentially biasing the performance seen in the IOWBC test sets. To mitigate any bias, external replication was used to evaluate the models; as performance in MAAS was similar to the IOWBC, data leakage was not deemed a significant problem. Finally, whilst genomic data was available in the IOWBC, only clinical and environmental predictors were considered in order to maximize the clinical applicability of the models. It is possible that the consideration of genomic predictors might significantly improve childhood asthma predictions further 22,40; however, the aim of this study was to explore whether machine learning methods could surpass the predictive ceiling that existing logistic regression methods appeared to be limited to. Hence, to provide a fair comparison with existing regression-based models, such asthma biomarkers were not incorporated into this study.

| CONCLUSION AND FUTURE WORK
Using machine learning, the CAPE and CAPP models were able to surpass the predictive performance of similar models developed using traditional logistic regression-based methods. Both models were generalisable in an independent population, with the CAPP model also demonstrating superior predictive power to rule in true asthmatics compared to its benchmark model (an advantage that was retained upon validation). Future application of these models could include the development of a personalised tool/app capable of providing explanations of which predictors contributed to an individual's predicted probability of developing asthma. Both models also demonstrated excellent sensitivity to predict a subgroup of persistent wheezers. Therefore, rather than developing an all-encompassing asthma prediction tool, further research into predicting specific 'asthmas' using machine learning approaches may offer greater predictive insight and clinical utility. Finally, continued exploration of machine learning approaches and the identification and integration of novel biomarkers is warranted to further improve the power to predict future childhood asthma.