Risk subgroups and intervention effects among infants at high risk for peanut allergy: A model for clinical decision making

The Learning Early About Peanut Allergy (LEAP) trial showed that early dietary introduction of peanut reduced the risk of developing peanut allergy by age 60 months in infants at high risk for peanut allergy. In this secondary analysis of LEAP data, we aimed to determine risk subgroups within these infants and estimate their respective intervention effects of early peanut introduction.


| INTRODUCTION
The prevalence of food allergy has increased over the past few decades, and studies indicate that hospitalizations due to food-induced anaphylaxis have also increased worldwide. 1,2 Peanut allergy often manifests in early childhood and only about 20% of patients will outgrow their peanut allergy. 3,4 In the early 2000s, delaying the oral introduction of allergens was the primary food allergy prevention strategy, but this recommendation was later withdrawn in many countries in the mid-2000s due to lack of evidence supporting the recommendation. 5,6 [8][9] Among these studies, the findings of the Learning Early About Peanut Allergy (LEAP) trial showed that the prevalence of peanut allergy could be significantly reduced in high-risk patients with or without evidence of sensitization to peanut. 7 [16][17][18][19][20] However, a clinical decision model that can be used for identifying subgroups of patients who may have varying risks for future peanut allergy and benefit differently from early introduction of peanut is lacking. Recent studies have focused on the effect of early introduction of peanut within pre-specified subgroups. 21,22 A prognostic model that can predict the probability of peanut allergy development in high-risk infants avoiding peanut consumption has not been reported. Such a model would allow examination of the heterogeneity of the effect of intervening with early peanut introduction. It could also be part of shared decision making in the approach to peanut allergy and a useful tool for the prevention of peanut allergy via early introduction.
The goal of this study was to identify risk subgroups within infants at high risk for peanut allergy according to their predicted probability of developing peanut allergy if avoiding peanut. To do this, we performed a secondary analysis of publicly available data from the LEAP trial, a randomized clinical trial with well-characterized clinical data. 7 We additionally estimated the intervention effect of early introduction of peanut for the risk subgroups that we identified.

| Study population and samples
The LEAP study was a randomized controlled trial studying whether early introduction of peanut could reduce the prevalence of peanut allergy at age 60 months among high-risk infants. The trial enrolled 640 infants from 4 to 11 months of age with severe eczema, egg allergy, or both diagnoses, and a peanut skin prick test (SPT) of ≤4 mm, and randomized them to peanut consumption (i.e. intervention) or peanut avoidance until 60 months of age. 7 In this work, the intention-to-treat population of 314 participants from the avoidance arm was used to develop prognostic models to group the infants according to their predicted risk of peanut allergy at 60 months. To calculate intervention effects, a sub-population of 307 participants from the consumption arm (after removal of 7 participants who reacted to the baseline peanut oral food challenge and never received the intervention) was used along with the participants in the avoidance arm.

Key messages
• Machine learning identified risk subgroups for peanut allergy at 60 months within LEAP participants.
• Baseline peanut and Ara h 2-specific-IgE were selected as predictors of 60-months peanut allergy.
• Infants with higher predicted probability of peanut allergy benefit more from the early introduction intervention.
The individual participant-level data of LEAP are made available through ImmPort (SDY660) and ITN TrialShare (ITN032AD, www.itntrialshare.). [24][25] More details about the study population and data preprocessing can be found in the Supplemental methods and Table S1.
Throughout the manuscript, 'baseline' refers to measurements at the start of the LEAP trial when the participants were between 4 and 11 months old. '12-month' and '60-month' refer to the visits in LEAP when the participants were 12 and 60 months old, respectively.

| Data analysis
The primary outcome is the allergy status at 60 months of age as determined by oral food challenge or diagnostic algorithm in the LEAP trial. The data analysis was carried out in two stages (Figure 1).
The first stage applied classification and regression tree (CART) modelling to derive risk subgroups of infants based on their predicted risk of peanut allergy at 60 months if avoiding peanut, using data from the 314 participants in the avoidance arm. A conditional random forest (RF) was used to select variables, based on importance scores, that were then included in the CART modelling. The variables that entered this RF variable selection step are shown in Table 1. For the main model (model 1), all baseline immunoglobulin E (IgE) data were truncated to account for the limit of detection (LOD) commonly used for IgE measurements by laboratories in clinical practice (i.e. 0.09 kU/L). This yields a model with more clinical utility than one that relies on a resolution at the lower end of the measurement range that is usually not available in clinical practice.
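The LOD adjustment described above can be illustrated with a minimal sketch. The text states only that values were truncated at 0.09 kU/L; the convention used here (values below the LOD are set equal to the LOD) is an assumption, and the function name and example values are hypothetical.

```python
# Limit of detection quoted in the text, in kU/L.
LOD_KU_PER_L = 0.09


def truncate_to_lod(values, lod=LOD_KU_PER_L):
    """Set every value below the LOD to the LOD itself.

    Assumed convention: the manuscript says the data were "truncated"
    but does not spell out the replacement value.
    """
    return [max(v, lod) for v in values]


# Hypothetical baseline sIgE measurements (kU/L), not trial data.
baseline_sige = [0.01, 0.05, 0.09, 0.22, 3.4]
truncated = truncate_to_lod(baseline_sige)
print(truncated)
```

After this step, measurements below 0.09 kU/L are indistinguishable from one another, which mirrors what a clinician would see on a routine laboratory report.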
In the second stage, using data from participants in both the avoidance and consumption arms, the intervention effect was estimated for each risk subgroup identified in the first stage by the model using baseline data with LOD adjustment (model 1).
Given the RCT design, participants randomized to receive the intervention are considered exchangeable with those randomized to the control. However, this balance may not be maintained within each of the subgroups. We therefore applied stabilized inverse probability of treatment weighting (sIPTW) and estimated the average treatment effect after ensuring covariate balance in each risk subgroup. See Appendix S1 for details of each stage of the data analysis.
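As a rough illustration of stabilized weighting, the sketch below computes stabilized weights and a weighted risk difference within one subgroup. It assumes the propensity scores (the estimated probability of being in the consumption arm given baseline covariates) have already been obtained from some model; how they were fitted in the study is described in Appendix S1 and is not reproduced here. All names and numbers are hypothetical.

```python
def stabilized_weights(treated, propensity):
    """Stabilized inverse probability of treatment weights.

    treated:    list of 0/1 arm indicators (1 = consumption, 0 = avoidance)
    propensity: estimated P(treated = 1 | covariates) for each participant
    """
    p_treat = sum(treated) / len(treated)  # marginal probability of treatment
    weights = []
    for a, e in zip(treated, propensity):
        if a == 1:
            weights.append(p_treat / e)
        else:
            weights.append((1.0 - p_treat) / (1.0 - e))
    return weights


def weighted_risk(outcome, weights, treated, arm):
    """Weighted proportion with the outcome among participants in one arm."""
    num = sum(w * y for y, w, a in zip(outcome, weights, treated) if a == arm)
    den = sum(w for w, a in zip(weights, treated) if a == arm)
    return num / den


# Hypothetical subgroup of six infants (not trial data).
treated = [1, 1, 1, 0, 0, 0]
propensity = [0.6, 0.5, 0.4, 0.6, 0.5, 0.4]
allergic = [0, 0, 1, 1, 1, 0]

w = stabilized_weights(treated, propensity)
effect = weighted_risk(allergic, w, treated, 1) - weighted_risk(allergic, w, treated, 0)
print(f"weighted risk difference: {effect:.3f}")
```

With a constant propensity equal to the marginal treatment probability, every stabilized weight equals 1 and the estimate reduces to the simple difference in proportions, which is the situation expected under perfect randomization.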

| Sensitivity analysis
We performed analyses as described in the first stage to develop two additional models: one based solely on 12-month data (model 2) and another combining both baseline and 12-month data (model 3). These models were used to assess the impact of 12-month data and of changes in covariates between baseline and 12 months of age on predictions of peanut allergy at 60 months of age.
Additionally, we examined a model using baseline data without sIgE LOD modification (model 4) to determine its influence on peanut allergy prediction at 60 months and intervention effects. Details are available in Appendix S1.

FIGURE 1 Raw data from the LEAP trial were downloaded from ITN TrialShare and preprocessed. Data from the baseline and/or 12-month (i.e. when the participants were 12 months old) visits were used to generate four different models. For each model, the first stage was the selection of important variables (see Appendix S1 for a complete list of included variables) for predicting peanut allergy status at 60 months of age using a conditional random forest on the participants from the avoidance arm. Important variables were then used to create a decision tree to group the infants in the avoidance arm based on their risk of peanut allergy at 60 months of age. In the second stage, for models including only baseline data, the intervention effects of early peanut introduction were estimated for each risk subgroup (i.e. terminal node in the decision tree).

| Study population
The baseline characteristics of the 314 participants in the avoidance arm and the 307 participants in the consumption arm of the LEAP trial are shown in Table 2 and Table S2, respectively, and are further described in the corresponding section of the Supplemental results and Figure S1.

| Grouping infants avoiding peanut by their probability of peanut allergy at 60 months via CART modelling
A model using baseline data with LOD was developed to group the infants randomized to peanut avoidance based on their estimated probability of peanut allergy at 60 months. In the RF analysis, Ara h 2 and peanut sIgE at baseline were selected as important variables in predicting peanut allergy status at 60 months and were therefore used in the CART analysis (Figure 2A). A complexity parameter of 0.00769, corresponding to two splits, was found to be optimal for the CART model (Figure S2A), resulting in a tree with three terminal nodes (Figure 2B). The CART model had an AUC of 0.76, a sensitivity of 0.72, and a specificity of 0.79 for predicting peanut allergy at 60 months (Table S3). When the peanut sIgE at baseline was ≥0. and false negative [N = 15]). Given that peanut sIgE at baseline was selected as one of the important variables, we explored longitudinal peanut sIgE levels and average trajectories for the participants in each of the four cells in the confusion matrix (Figure 3). Details on peanut sIgE trajectories are in Appendix S1.
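To illustrate how a tree-based method arrives at a threshold on a biomarker such as baseline peanut sIgE, the sketch below searches for the single split that minimizes Gini impurity. It is a simplified, unweighted stand-in: the study used weighted CART with complexity-parameter pruning, neither of which is reproduced here, and the data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Find the threshold t for the rule `value >= t` with lowest weighted Gini.

    Returns (threshold, weighted_impurity). The threshold is None when no
    split improves on the impurity of the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot split between identical values
        threshold = (lo + hi) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical baseline sIgE values (kU/L) and 60-month allergy status.
sige = [0.09, 0.09, 0.10, 0.15, 0.30, 0.50, 1.20, 4.00]
allergic = [0, 0, 0, 0, 1, 0, 1, 1]

threshold, impurity = best_split(sige, allergic)
print(threshold, impurity)
```

A full CART fit repeats this search within each resulting node and then prunes the tree, which is where a complexity parameter such as the one reported above comes into play.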

| Early peanut consumption intervention effects estimation
To estimate the intervention effect for each risk subgroup, the participants were classified using the CART model (Figure 2B). Of the 307 participants from the consumption arm, 78 were classified into the third terminal node, and 4 of those participants were observed to be peanut allergic at 60 months. For this node, the data suggested that, among the 165 participants from both arms (n = 87 avoidance arm, n = 78 consumption arm), 3.88% (95% CI: [1.02%, 11.58%]) would be allergic at 60 months had these participants received the intervention, while the estimated proportion with peanut allergy had these participants not received the intervention was 44.29% (95% CI: [33.65%, 55.45%]). This suggests that the early peanut consumption intervention among the participants with a baseline peanut sIgE ≥0.22  For the other two nodes, the standardized mean differences (SMD) for all baseline characteristics were reduced after sIPTW adjustment (Figure S3A).
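The two proportions reported for the third terminal node can be turned into an absolute and a relative effect with simple arithmetic, sketched below. The sign convention (consumption minus avoidance) is an assumption chosen to match the model 4 figures quoted in the Discussion, where an effect of −2.18% corresponds to a change from 3.49% to 1.31%; the function name is hypothetical.

```python
def intervention_effect(risk_avoid, risk_consume):
    """Return (absolute risk difference, relative risk reduction).

    Both inputs are proportions between 0 and 1. The difference is
    consumption minus avoidance, so a negative value means fewer
    allergy cases under the intervention.
    """
    difference = risk_consume - risk_avoid
    relative_reduction = (risk_avoid - risk_consume) / risk_avoid
    return difference, relative_reduction


# Third terminal node: 44.29% if avoiding, 3.88% if consuming (from the text).
diff, rrr = intervention_effect(0.4429, 0.0388)
print(f"{diff * 100:.2f} percentage points, {rrr * 100:.1f}% relative reduction")
```

These are point estimates only; the confidence intervals reported in the text come from the weighted analysis and cannot be recovered from the two proportions alone.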

| Sensitivity analysis
Model 2, which was developed using 12-month data only (Figures S4 and S5), had a higher AUC than model 1 (Figure 4, Table S3). For model 3 (Figures S6 and S7), which used both baseline and 12-month data to account for changes in biomarkers between visits, RF selected two variables reflecting changes between the two visits as important (i.e. difference in log Ara h 2 and slope of Ara h 2 sIgE), but they were not used by the final CART model (Figure S6B). Lastly, the optimal decision tree of model 4 using the same baseline data as model 1 without LOD modification closely

| DISCUSSION
We utilized publicly available data from the LEAP trial to develop a model for grouping infants into risk subgroups based on their estimated risk of peanut allergy at 60 months of age if dietary peanut is avoided, and estimated the individualized intervention effect of early introduction of peanut for each risk subgroup predicted by the model. We used a machine learning approach based on RF and CART. RF is an ensemble learning method used for both classification and regression that builds multiple decision trees during training and combines their predictions to make more accurate and robust predictions. Unlike standard regression procedures, RF allowed us to measure the importance of predictors. CART is a decision tree algorithm that has been widely used because of its high interpretability and because it requires no assumptions about the underlying data distribution. A similar machine learning approach has previously been used in a study to predict the severity of peanut challenge outcomes. 26 Our analysis provided patient-centred estimates of intervention effects accounting simultaneously for all independent variables selected by the RF importance measure and utilized in the CART analyses. In contrast, the one-variable-at-a-time approach of traditional subgroup analysis is more susceptible to false-positive and false-negative results. 27
Two recent studies have focused on the intervention effect of early introduction of peanut within prespecified subgroups using data from multiple trials. Logan et al. estimated the intervention effect among a number of different prespecified risk strata in pooled data from the LEAP and Enquiring About Tolerance (EAT) trials. 7,28 Their causal effect subgroup analyses of the per-protocol population showed graded and significant beneficial effects of early introduction of peanut across all eczema severity and baseline peanut sIgE subgroups, with the intervention being more beneficial among severe patients. 21 These findings are consistent with the results of our study in that the intervention effects were graded. In contrast to Logan et al.'s methods, we used a machine learning approach and applied a real-world LOD to determine the risk subgroups based on data from the avoidance arm, resulting in different sets of risk subgroups between the studies.

TABLE 2 Demographics and clinical characteristics of participants in the avoidance arm.

Robert et al. studied the timing of introduction of peanut for peanut allergy prevention using data from the LEAP, Peanut Allergy Sensitization (PAS), and EAT studies, and their model indicated that the amount of reduction in peanut allergy in the general population decreased with every month of delayed introduction. 22 Furthermore, their model highlighted that the younger the participants, the higher their risk of peanut allergy by 36/60 months if peanuts were not introduced, as shown in Figure E3 of their paper. 22 This aligns with our findings that interventions were especially beneficial for those with higher probability of peanut allergy.
Both studies suggest a greater intervention effect the earlier peanut is introduced into the diet. 21,22 To account for variation in age at baseline (4-11 months), age was included in the list of variables used in our analysis. However, age was not selected by the RF variable selection process. This suggests that, while increasing age of introduction was shown to be associated with decreasing preventive benefit of early peanut introduction, it may not be as important as the variables that were selected for stratifying infants within the LEAP trial into risk subgroups when avoiding peanut.

Greenhawt et al. performed a secondary analysis on the LEAP trial data using a logistic regression model with nonlinear interaction and found that the difference in the probability of peanut tolerance between the intervention and avoidance arms grew larger with increasing peanut SPT wheal size, suggesting that early peanut introduction may be more beneficial in subjects with greater peanut SPT wheal size. 29 Our findings were similar in that there was a better intervention effect for the risk subgroups with higher predicted probability of peanut allergy, characterized by higher peanut sIgE or higher Ara h 2 sIgE, if avoiding peanut. Sever et al. developed a statistical model to impute peanut allergy status using LEAP data from the time of study outcome assessment when a food challenge result was not available, differing from our approach of predicting peanut allergy at 60 months using baseline data. 30 Additionally, peanut sIgE and Ara h 2 sIgE were identified by RF and used by the CART as predictors of peanut allergy at 60 months in our main analysis. [16][17][18][19][20]
Our main model relied on baseline data in which sIgE measurements were truncated to reflect the LOD used by laboratories in clinical practice to increase clinical utility. We performed a sensitivity analysis utilizing baseline data without adjustment for real-world LOD (model 4). The resulting model was very similar to our main model, with the exception that one of the risk subgroups (peanut sIgE <0.22 kU/L, Ara h 2 sIgE <0.024 kU/L) had a statistically non-significant intervention effect of −2.18% (95% CI: [−6.69, 2.23]), implying that there might be a subgroup of infants within the LEAP participants that has a low risk of peanut allergy at 60 months of age even without intervention. However, the risk of peanut allergy for this subgroup was nevertheless reduced by 62.46% in relative terms, from 3.49% to 1.31%, by the intervention.
The majority of misclassification in our model came from false-positive predictions. Given that the model predicted the probability of peanut allergy at 60 months based on baseline data and showed that children significantly benefit from early introduction of peanut across all risk subgroups, children with a false-positive prediction would likely consume peanut as part of early introduction without adverse reaction. Conversely, a false-negative prediction may result in infrequent or unsustained peanut consumption due to caregiver perception of low risk. Additional discussion on the main model and sensitivity analysis is available in Appendix S1.
After the findings from the LEAP trial showed the significant beneficial effect of early oral introduction and sustained exposure of peanut for prevention, the complementary feeding recommendations in the United States changed to recommend earlier oral peanut introduction, particularly among those at high risk. 31,32 Despite these new recommendations, the acceptability of early peanut introduction remains unclear on the part of caregivers. 31 The EAT trial had a below 50% adherence rate to the protocol for high-dose consumption due to feeding difficulties and symptoms with food consumption. 33 A survey among caregivers found that only 31% showed willingness to introduce peanut before or at 6 months of age, with 40% of caregivers showing willingness to introduce peanut after 11 months of age. 34 Furthermore, the intervention in the LEAP trial was early and sustained introduction of peanut, emphasizing maintenance of consistent exposure, which requires commitment and persistence; a recent study implied that early introduction of peanut alone without sustained exposure might not be enough to prevent peanut allergy. 35 The models from this study may help clinicians develop personalized approaches based on individual biomarker data, educate caregivers about the benefits of early introduction, and counter any possible hesitancy.
Of note, all 6 peanut allergy cases in the study population of the intervention arm discontinued the intervention during the study period, with 5 of the 6 participants discontinuing the intervention due to peanut-induced allergic symptoms. 7 Thus, the intervention effects estimated in this study include the effect of being unable to comply with the protocol due to adverse symptoms associated with the intervention. For infants who cannot tolerate early introduction of peanut due to allergy, the focus should transition from prevention to treatment. Chua et al. suggested that peanut oral immunotherapy is an effective and safe intervention for infants who do not tolerate early introduction of peanut. 36 On the other hand, three participants who discontinued the intervention due to adverse symptoms were non-allergic at 60 months.
Another limitation is the lack of additional public datasets that would allow us to test the model's performance against an external validation dataset, which limits the model's generalizability. Additionally, the dataset in this study had a small number of cases, which resulted in even smaller sample sizes in some of the terminal nodes and large uncertainty in the intervention effect, but the effects in some risk subgroups were still significantly beneficial because of how effective the intervention was.
Lastly, the study results are limited to high-risk infants as specified by the inclusion and exclusion criteria of the LEAP trial. Specifically, the LEAP trial included children aged 4-11 months who had an egg allergy, severe eczema, or both diagnoses. The eligibility criteria were intended to identify a cohort of infants with an elevated risk for peanut allergy for enrolment into the LEAP study. As a result, in our model, the identified strata with low probability of peanut allergy do not represent a low-risk group in the general population. The LEAP trial also excluded infants with a peanut SPT wheal of greater than 4 mm as they were considered likely to have peanut allergy. 37 Our models cannot be applied to those who were excluded by the LEAP trial's criteria, including infants with a peanut SPT wheal >4 mm. Further studies are needed to determine whether the results are accurate and consistent for these excluded infants. Given that the paediatric population that our model applies to is considered high risk, it is likely that they are being evaluated by both paediatricians and allergists, who would obtain sIgE measurements, making the model applicable to clinical practice.

| CONCLUSIONS
In conclusion, we used publicly available data from the LEAP trial to develop a prediction model for infants at high risk for peanut allergy, grouping them into risk subgroups based on their estimated risk of peanut allergy at 60 months of age if they avoided peanut. The calculated individualized intervention effects were significant across all risk subgroups. These results support the messaging that early and sustained introduction of peanut be recommended to infants at high risk for peanut allergy as defined by the LEAP trial, regardless of their probability of peanut allergy development. In addition, the intervention effect results showed that infants with higher probability of peanut allergy according to our model benefited more from the early introduction of peanut. Our findings suggest that within this high-risk population, further stratification can occur to identify infants for whom the early intervention may be most important.

AUTHOR CONTRIBUTIONS
YL analysed the data, interpreted the findings and wrote the original draft of the manuscript; AD conceptualized the study, interpreted the findings, assisted with writing the manuscript, and provided funding; BH conceptualized the study, interpreted the findings, supervised the work, assisted with writing the manuscript, and provided funding; SA conceptualized the study, interpreted the findings, supervised the work, assisted with writing the manuscript and provided funding. All authors reviewed, edited, and approved the final manuscript.

FIGURE 2 Permutation importance from conditional RF variable selection for weighted CART and decision tree using baseline data with limit-of-detection adjustment. (A) Permutation importance score from the conditional RF. Important variables (peanut and Ara h 2 sIgE) were selected for the weighted CART analysis. (B) Decision tree of CART analysis and intervention effect estimation per terminal node. sIgE in kU/L. Details in Appendix S1.

FIGURE 3 Trajectories of peanut sIgE grouped by the peanut allergy status predicted by the model using baseline data with LOD (model 1) and the observed allergy status at 60 months of age. Each grey line represents one of the 314 infants from the avoidance arm used in the development of the CART. The blue line represents the average peanut sIgE.

FIGURE 4 Receiver operating characteristic curves for the 4 CART models. '12 m data' refers to data when the participants were 12 months old. AUC, area under the receiver operating characteristic curve.