Component‐specific clusters for diagnosis and prediction of allergic airway diseases

Previous studies that applied machine learning to multiplex component‐resolved diagnostics arrays identified clusters of allergen components that are biologically plausible and reflect the sources of allergenic proteins and their structural homogeneity. Sensitization to different clusters is associated with different clinical outcomes.

and rhinitis (both contemporaneously and longitudinally) that were previously unseen using binary sensitization to clusters. A more detailed description of the subjects' within-cluster c-sIgE responses in terms of the number of positive c-sIgEs and unique sensitization patterns added new information relevant to allergic diseases, both for diagnostic and prognostic purposes. For example, the increase in the number of within-cluster positive c-sIgEs at age 5 years was correlated with the increase in prevalence of asthma at ages 5 and 16 years, with the correlations being stronger in the prediction context (e.g. for the largest 'Broad' component cluster, contemporaneous: r = .28, p = .012; r = .22, p = .043; longitudinal: r = .36, p = .004; r = .27, p = .04).

| INTRODUCTION
IgE-mediated sensitization in early childhood is a risk factor for asthma and rhinitis in adolescence. 1,2 However, the data about the strength of this association are inconsistent, [3][4][5][6] and there are no reliable and reproducible sensitization parameters on which to base accurate diagnosis and disease risk prediction. 4,7,8 Furthermore, in a clinical setting, confirmation of sensitization using standard diagnostic tests does not necessarily prove that a patient's symptoms are caused by an IgE-mediated reaction. 2 What is needed are biomarkers to differentiate in individual sensitized patients whether sensitization is important for current or future symptoms, or a chance finding of little relevance to the disease. Quantification through standard diagnostic tests (i.e. using IgE titre or size of skin test response rather than commonly used cut-offs to define sensitization) can increase the specificity (both in terms of diagnostic accuracy 9,10 and the capacity to predict the persistence of symptoms 11), but the problem of false-positive test results remains. 2,12[17][18] The development of component-resolved diagnostics (CRD) multiplex arrays has enabled the simultaneous analysis of >100 component-specific IgE (c-sIgE) responses, providing detailed sensitization profiles for individual patients. 19,20 Several c-sIgEs to specific components in early life have been identified as risk molecules for predicting asthma in adolescence, 21 and c-sIgE polysensitization to house dust mite (HDM) allergenic proteins predicts allergic disease. 22 Furthermore, the time of onset of c-sIgE response is important, 23 and the time of emergence of specific c-sIgE sensitization patterns in preschool age is associated with a subsequent development of respiratory symptoms. 23,24
CRD arrays produce complex data sets which allow the application of machine learning (ML) techniques to interrogate the data. This approach has identified clusters of c-sIgEs to multiple allergenic proteins associated with asthma diagnosis 25,26 and severity 27 in cross-sectional studies, as well as asthma development in longitudinal studies. 23,24,28 One previous study has demonstrated that the key associate of childhood asthma diagnosis is not the c-sIgE response to any individual molecule but, rather, a connectivity structure among c-sIgE responses. 26 We have previously described the evolution of IgE responses to multiple allergen components throughout childhood and described the sequential development of allergen component clusters which were biologically plausible and reflected the sources of allergenic proteins (e.g. cat, grass and HDM clusters), or structural homogeneity of components within protein families (e.g. pathogenesis-related [PR]-10 and profilin clusters). 24 Of note, almost identical component clusters were uncovered using different ML methodologies in studies of paediatric and adult patients with asthma, 27 demonstrating a remarkable similarity in the structure of the CRD component sensitization patterns in the general population and among patients with asthma. 24,26,27 However, whilst it is possible to ascertain the latent structure within the CRD array data, there are challenges in extending this into clinically useful tools for the diagnosis and prediction of allergic airway diseases.
In our previous work, 24 we showed that sensitization to at least one component within an allergen component cluster (which we term 'binary sensitization') can be associated with the occurrence of current or future allergic disease. Here, we hypothesise that the number of c-sIgE responses within each component cluster, and their distinct patterns, alter the risk of disease. To address our hypothesis, we extended our previous analyses which described the sequential development of allergen component clusters derived based on c-sIgE response profiles across participants in a population-based birth cohort, 24
• Distinct within-cluster sensitization patterns differ in their association with health, asthma and rhinitis.

| Study design, setting and participants
The Manchester Asthma and Allergy Study (MAAS) is an unselected birth cohort; participants were recruited prenatally and followed prospectively 29 (details in Data S1). The study was approved by the Research Ethics Committee; parents gave written informed consent. Study subjects attended reviews at ages 1, 3, 5, 8, 11 and 16 years. Parentally-reported symptoms, physician-diagnosed diseases and treatments received were ascertained using interviewer-administered validated questionnaires. Information on demographic and clinical characteristics of the study participants was reported previously 24 and is summarized in Table E1.

| Measurement and clustering of component-specific IgE antibodies
Blood samples were collected from participants who gave assent. C-sIgEs to 112 allergen components were measured using ImmunoCAP ISAC (Thermo Fisher, Uppsala, Sweden) at each follow-up. Data were discretized and annotated using the pipeline described previously 24 (details in Data S1). In a previous analysis, 24 by clustering c-sIgEs using a Bernoulli mixture model, 30 we derived allergen component clusters at each follow-up. We labelled clusters based on the profile of c-sIgE responses; cluster names and membership were described previously 24 and are summarized in Data S1 (Figure E1, Tables E2 and E3). Briefly, the number of component clusters differed across ages (ranging from one at age 1 to six at age 16). 24 The only cluster present at all time points was the 'Broad' cluster (c-sIgE membership of this cluster differed at different ages). The HDM cluster remained unchanged from age 3 years onwards. The 'Grass' cluster emerged at age 3 years and absorbed Fel d 1 at age 5 to form the 'Grass/Cat' cluster (Figure E1), which remained unchanged at age 8. Fel d 1 formed its own singleton cluster at age 11 years. The 'PR-10/profilin' cluster formed at age 11 years and divided into two at age 16: 'PR-10' and 'Profilin' (Figure E1).

| Definition of c-sIgE sensitization structure within a component cluster
For the purposes of the current analysis, we defined each subject's within-cluster sensitization structure as follows:

| Binary
Positive c-sIgE response to at least one component within a multicomponent cluster, or a response to a single-component cluster's component.

| Response counts
Total number of allergen components within a cluster with a positive c-sIgE response. Children with a positive response to six or more components in the 'Broad' cluster were binned together for the analysis of the associations with clinical outcomes.

| Sensitization patterns
We determined intra-cluster sensitization patterns based on the observed patterns of IgE responses to individual components within each cluster.
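The three definitions above can be illustrated with a minimal Python sketch. It assumes that each subject's discretized responses are held as a mapping from component name to 0/1; the function name `within_cluster_structure`, the `cap` parameter and the toy subjects are hypothetical and are not taken from the study pipeline (which was implemented in R).

```python
from collections import Counter

# The four HDM components named in the text; used here only as an example cluster.
HDM_COMPONENTS = ("Der p 1", "Der f 1", "Der p 2", "Der f 2")


def within_cluster_structure(responses, components, cap=None):
    """Summarise one subject's discretized (0/1) c-sIgE responses for a cluster.

    responses: dict mapping component name -> 0/1 (missing names count as 0).
    cap: optional upper bin for the count (e.g. 6 to bin "six or more").
    """
    positives = tuple(c for c in components if responses.get(c, 0) == 1)
    count = len(positives)
    if cap is not None:
        count = min(count, cap)
    return {
        "binary": len(positives) > 0,  # at least one positive component
        "count": count,                # response count (optionally binned)
        "pattern": positives,          # which components were positive
    }


# Toy, made-up subjects (not study data).
subjects = [
    {"Der p 1": 1, "Der f 1": 1, "Der p 2": 1, "Der f 2": 1},
    {"Der p 2": 1, "Der f 2": 1},
    {"Der p 1": 1, "Der f 1": 1},
    {},
    {"Der p 2": 1, "Der f 2": 1},
]

summaries = [within_cluster_structure(s, HDM_COMPONENTS) for s in subjects]

# Rank the observed patterns among sensitized subjects only.
pattern_counts = Counter(s["pattern"] for s in summaries if s["binary"])
ranked_patterns = pattern_counts.most_common()
```

In this sketch a pattern is simply the ordered tuple of positive components, so labels such as 'Complete' or 'Group 2' would be assigned afterwards by matching tuples.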

| Current asthma
At least two of the three features: (1) current wheeze (positive answer to the question 'Has your child had wheezing or whistling in the chest in the last 12 months?'); (2) current use of asthma medication; (3) physician-diagnosed asthma ever. 31
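As a minimal sketch of this two-of-three rule, assuming the three features are available as booleans per child (the function and argument names are hypothetical):

```python
def has_current_asthma(current_wheeze, asthma_medication, diagnosed_ever):
    """Return True when at least two of the three features are present."""
    features = (current_wheeze, asthma_medication, diagnosed_ever)
    return sum(bool(f) for f in features) >= 2
```

How missing questionnaire answers were handled is not stated in this section, so the sketch treats every feature as known.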

| Current rhinitis
Positive answer to 'In the past 12 months, has your child had a problem with sneezing or a runny or blocked nose when he/she did not have a cold or the flu?'. 32

| Statistical analysis
Age 5 was the earliest age at which c-sIgE responses were used to analyse associations with clinical outcomes, owing to the sparsity of data at ages 1 and 3 years. 24 We therefore investigated the relationship of binary cluster sensitization and the within-cluster sensitization structure of clusters derived at ages 5 and 16 years 24 with clinical outcomes. We carried out cross-sectional analyses at ages 5 and 16, as well as longitudinal analyses, with c-sIgE data at age 5 years predicting clinical outcomes at age 16.
Associations between clinical outcomes and binary cluster sensitization and intra-cluster sensitization patterns were examined using logistic regression (multivariable and univariable), reported as odds ratios (ORs) with 95% confidence intervals (CIs), with Bonferroni correction. Children who did not respond to the relevant cluster's components were assigned to a 'non-sensitized' baseline group.
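For intuition, a univariable logistic regression with a single binary predictor (one sensitization pattern versus the 'non-sensitized' baseline) yields the same OR and Wald interval as a 2×2 table. The Python sketch below relies on that equivalence; it is an illustration under stated assumptions, not the study's R code. The function names and the counts in the usage lines are hypothetical, and the Bonferroni helper assumes that p-values are multiplied by the number of tests (the exact family of tests is not specified here).

```python
import math
from statistics import NormalDist


def odds_ratio_wald(pattern_cases, pattern_noncases,
                    baseline_cases, baseline_noncases, alpha=0.05):
    """Odds ratio and Wald confidence interval from a 2x2 table."""
    a, b, c, d = pattern_cases, pattern_noncases, baseline_cases, baseline_noncases
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of the log odds ratio
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up counts: 20 of 48 children with the pattern have the outcome,
# versus 50 of 450 children in the non-sensitized baseline group.
or_, ci_low, ci_high = odds_ratio_wald(20, 28, 50, 400)
adjusted = bonferroni([0.01, 0.04, 0.5])
```

With sparse patterns a cell can be zero, in which case the Wald interval is undefined; the sketch raises an error rather than applying a continuity correction.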
To visualize non-linear trends, a smoothed binomial generalized additive model (GAM) was fitted to the response counts of each cluster containing multiple allergen components and a given clinical outcome, using the R package mgcv with default parameters, 33,34 with credible regions showing the uncertainty in the inferred non-linear trend. To quantify linear associations, Pearson correlation coefficients were calculated for each cluster-outcome combination and were reported alongside their p-values and an FDR threshold of .05 (Benjamini-Hochberg). All analyses were conducted in R. 35
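The linear part of this analysis can be sketched in Python as follows. The sketch covers only the Pearson coefficient and the Benjamini-Hochberg step-up rule (the GAM fit is not reproduced), and the function names and p-values are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the test passes the FDR threshold q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= k * q / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Made-up p-values; the smallest one passes only because of the step-up rule.
flags = benjamini_hochberg([0.02, 0.03, 0.9], q=0.05)
```

The example shows why the procedure is applied to the whole family of cluster-outcome tests at once: 0.02 exceeds its own rank threshold (0.05/3) but is still declared significant because 0.03 meets the threshold for rank 2.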

| RESULTS
Participant flow is shown in Figure E2. 24 Demographic characteristics of the study population, number of subjects with CRD data at each follow-up and the proportion with positive responses to active components, as well as the relationship between sensitization clusters and clinical outcomes are described in detail in our previous analysis. 24 CRD data were available for ≥1 time point for 922/1184 children. 24 Clusters from ages 5 and 16 are the focus of the remaining analysis in this study. Given the non-uniform nature of the response patterns seen in 'HDM' and 'Grass' clusters (Figure 1), we assessed their intra-cluster sensitization patterns at age 5 years.

| Sensitization patterns
Table 1 shows the ranked frequencies of the intra-cluster sensitization patterns in 'HDM' and 'Grass/Cat' clusters at age 5. The most common c-sIgE response pattern in the 'Grass/Cat' cluster was c-sIgE to Fel d 1 only (n = 27/168, Table 1A), with the next most common response being sensitization to all components in the cluster. The most common pattern among children sensitized to the 'HDM' cluster at age 5 years was c-sIgE to all four components (n = 48/106, 'Complete' sensitization, Table 1B). The next two most commonly observed sensitization patterns were 'Group 1' (Der p 1, Der f 1) and 'Group 2' (Der p 2 and Der f 2; n = 25/106 and n = 20/106 respectively); only these three patterns were observed in >10% of the subjects in this cluster.

FIGURE 1
Frequencies for the total number of positive responses to components within each component cluster (response counts) for ages 1-16 years by each subject. Includes responses from participants who had a positive response to at least one active component at a given time point. This excludes those who did not respond to any of the active components at that time point (ranging from 81% of participants at age 1 to 42.7% at age 16).

| Component-specific clusters for diagnosis and prediction of allergic airway diseases
To facilitate comparison between binary cluster sensitization and within-cluster internal structure in relation to clinical outcomes, Figure E3 shows previously derived ORs and 95% CIs from multivariable logistic regression between binary sensitization of component clusters at ages 5 and 16 years in relation to asthma, wheeze and rhinitis. 24

| Within-cluster sensitization structure and clinical outcomes: Cross-sectional analysis

Response counts
Figure 2 shows the relationship between within-cluster number of positive c-sIgEs (response counts) and contemporaneous asthma, wheeze and rhinitis for ages 5 and 16 years.
The increase in the number of within-cluster positive c-sIgEs in the Broad cluster at age 5 years was correlated with an increase in the proportion of participants with asthma and wheeze at age 5 (r = .28, p = .012,

The 'Just Fel d 1' pattern was the only other pattern significantly associated with all three clinical outcomes. In contrast, the 'All bar (except) Fel d 1' and 'Just Phl p 4' patterns were associated with rhinitis, but not asthma or wheeze.

Response counts
Figure 4 shows the relationship between number of positive c-sIgEs within each cluster at age 5 years and the diagnosis of asthma, wheeze and rhinitis at age 16. The proportion of study participants with asthma and wheeze increased with the increasing number of c-sIgE responses to the Broad cluster (asthma: r = .36, p = .004, Figure 4A; wheeze: r = .27, p = .04, Figure 4B). There was also an upward trend observed in the HDM and Grass/Cat clusters for asthma and wheeze, but a test for non-zero linear correlation did not reach significance in either case (HDM cluster asthma: r = .21, p = .15; wheeze: r = .19, p = .22; Grass/Cat cluster asthma: r = .16, p = .18; wheeze: r = .07, p = .58). The proportion of adolescents with rhinitis increased significantly with increasing number of c-sIgE responses in the Grass/Cat cluster (r = .44, p = 1.77e-05, Figure 4C).

| DISCUSSION
4][25][26][27] We have shown that among sensitized individuals, a detailed description of within-cluster c-sIgE responses, both in terms of number of positive responses and distinct sensitization patterns, adds potentially important information relevant to allergic diseases. This observation was of importance for both diagnostic and prognostic purposes. Our data demonstrate that for a more precise ascertainment of the risk of current and future allergic diseases, we may need accurate information about the extent of sensitization and its specific patterns for unique sets of allergenic molecules within component clusters, as well as the precise timing of onset of sensitization. Increasing the resolution of the patterns shown here may improve understanding of the pathophysiological processes giving rise to different symptoms, and may facilitate the development of diagnostic algorithms, with potential use for the prediction of current and future risk.

Our findings in relation to component sensitization at age 5 years show that intra-cluster patterns and response counts reveal new associations with contemporaneous clinical diagnoses, previously unseen using the binary definition of sensitization to component clusters. The association we describe here between the 'Group 2' (Der p 2, Der f 2) pattern in the HDM cluster and asthma was also seen in another cohort, 36 raising the question of whether properties of Der p 2 molecules (such as, for example, lipid binding 37) may affect its potency as an allergen and its physiological role.

FIGURE 3
Odds ratios and 95% CIs from univariable logistic regression for the associations between the responses to the House Dust Mite (HDM) and Grass/Cat clusters at age 5 and current asthma, rhinitis and wheeze at age 5. Response pattern terminology is defined in Table 1.
Binary sensitization to the Grass/Cat cluster at age 5 years was associated with asthma and wheeze in our previous analysis. 24 However, when we break the within-cluster sensitization into different patterns, only two of the five patterns ('Complete' and 'Just Fel d 1') are associated with asthma. Similarly, although binary sensitization to the Grass/Cat cluster was not significantly associated with rhinitis, four of the five intra-cluster patterns were (all patterns except 'Just Phl p 1'). This supports the notion that it is not just the number of components being responded to which matters, but the specific components involved.
We gain further potentially valuable information when the number of positive components within each cluster is considered in the analysis (particularly for the Broad cluster). For example, binary sensitization to the Broad cluster was only significantly associated with

FIGURE 5
Odds ratios and 95% CIs from univariable logistic regression for the association between the responses to the House Dust Mite (HDM) and Grass/Cat clusters at age 5 and current asthma, rhinitis and wheeze at age 16. Response pattern terminology is defined in Table 1.

Similarly, the new associations between response counts for the Broad cluster (number of positive c-sIgEs within-cluster) at age 5 years and asthma at the same age also held for the risk of this outcome at age 16, with their correlations becoming stronger.
There are several limitations to our study. The main limitation of assessing CRD sensitization patterns within any given cluster is the size of a cluster and the number of components within the cluster. For example, in clusters characterized by a large number of components (e.g. Broad), the number of possible sensitization patterns may become intractably large and unlikely to be observed at a high enough frequency. The feasibility and inference that can be made are heavily reliant on sample size. Furthermore, the way cluster sensitization patterns are described here makes their analysis relatively inflexible and intractable for other clusters. Other approaches such as network analysis may provide additional valuable information. 26,27 Another limitation of assessing sensitization patterns as done here is that it does not take responses from other clusters into account. To improve predictive performance, the patterns uncovered here could be combined as features within a generalized linear model or ML algorithms. 38,39 Our analyses should be viewed predominantly as a proof-of-concept study. It is reassuring that different unbiased approaches revealed clinically and biologically meaningful patterns, in that: (1) clustering of c-sIgEs uncovered clusters which reflect the sources of allergenic proteins and/or their structural homogeneity [25][26][27]; (2) the number of c-sIgE responses within each component cluster adds further information relevant for the diagnosis and prediction of asthma and rhinitis; and (3) distinct within-cluster sensitization patterns differ in their association with health, asthma and rhinitis. This should give us confidence that measuring sensitization using CRD arrays may be more informative than standard tests in respiratory allergy.
Importantly, these findings suggest that it may be possible to develop

CONFLICT OF INTEREST STATEMENT
Dr. Custovic reports personal fees from Sanofi, personal fees from AstraZeneca, personal fees from Stallergen Greer, personal fees from GSK, outside the submitted work. AS reports lecture fees from Thermo Fisher Scientific. Other authors have no competing interests to declare.

DATA AVAILABILITY STATEMENT
The decision to supply research data to a potential new user will be compliant with the MRC independent oversight of data access and sharing policy (https://www.mrc.ac.uk/publications/browse/mrc-policy-and-guidance-on-sharing-of-research-data-from-population-and-patient-studies/) and the existing access mechanisms which comply with the governance requirement.
to investigate the internal structure of such derived clusters. Within each cluster, we considered the number of positive responses to individual components, and specific c-sIgE response patterns. We then ascertained the associations of the derived within-cluster sensitization structure with allergic diseases.
Conclusion: Among sensitized individuals, a more detailed description of within-cluster c-sIgE responses, in terms of the number of positive c-sIgE responses and distinct sensitization patterns, adds potentially important information relevant to allergic diseases.
KEYWORDS
asthma, component-resolved diagnostics, diagnosis, machine learning, prognosis, rhinitis
Key messages
• Clustering of c-sIgEs uncovers structure which reflects sources of allergenic proteins and/or their structural homogeneity;
• Number of positive c-sIgEs within each cluster adds information relevant to the diagnosis/prediction of asthma.

Figure 1 shows the observed frequencies of positive c-sIgE responses within each of the component clusters across the six time points. For some multi-component clusters, e.g. the HDM cluster, non-uniform response patterns were clear, for example, a c-sIgE
('Complete' sensitization, n = 21/168). Three further patterns were present in at least 10% of the subjects sensitized to this cluster (only Phl p 1, only Phl p 4, all bar [except] Fel d 1).

Figure 2A; r = .22, p = .043, Figure 2B). Similarly, the prevalence of asthma and wheeze at age 16 increased with the increasing number of c-sIgE responses within the Broad cluster at age 16 years (r = .36, p = .0009, Figure 2D; r = .27, p = .015, Figure 2E); similar findings were observed for the HDM cluster (r = .29, p = .019, Figure 2D; r = .28, p = .0244, Figure 2E). The proportion of children with rhinitis increased with increasing number of c-sIgE responses in the Grass cluster (r = .60, p < .00001, Figure 2F). Of note, the relationships in Figure 2 are non-linear and the correlation coefficient captures only the linear trend; however, the non-linear GAM fit also shows significant upward trends in each case.

Figure 5 shows the associations between current asthma, wheeze and rhinitis at age 16 years in relation to the intra-cluster sensitization patterns at age 5 years. For the HDM cluster, 'Complete' sensitization was associated with a higher risk of all of the outcomes at age 16 (OR [95% CI]: asthma 9.4 [4.6-19.4]; wheeze 6.8 [3.3-13.8]; rhinitis:

FIGURE 4
Cluster response counts from age 5 and their smoothed prevalence of (A) asthma, (B) wheeze and (C) rhinitis at age 16. For single-component clusters: the bars represent the proportion of responders with asthma (shown with standard error). For multiple-component clusters: smoothed fit and 95% confidence intervals are displayed. Broad cluster responses larger than 5 are binned together.
CRD-based screening biomarkers to help diagnose asthma, and prognostic biomarkers to assess disease severity among sensitized asthmatics and to ascertain future risk of asthma among sensitized children. In conclusion, among sensitized individuals, a more detailed description of within-cluster c-sIgE responses, in terms of the number of c-sIgE responses and distinct sensitization patterns, adds potentially important information relevant to allergic diseases. Incorporating this information may facilitate the development of diagnostic algorithms, with potential use for the prediction of current and future risks. A combination of network analysis and unsupervised and supervised statistical learning techniques may offer an analytical framework to capitalize on complex CRD data, and the models may learn which interactions among c-sIgEs are important to differentiate different types of sensitization (benign and pathologic 40) and their relationship to allergic diseases. Achieving this will require a large-scale international collaboration to increase the sample size and generalizability.

AUTHOR CONTRIBUTIONS
RH: Data curation; analysis; methodology; software; visualization; writing-original draft. SF: Data curation; analysis. AS and CSM: Validation; writing-review and editing; project administration. AC and MR: Conceptualization; funding acquisition; resources; supervision; writing-original draft; writing-review and editing.

FUNDING INFORMATION
Medical Research Council (MRC) grant MR/S025340/1.
TABLE 1
Frequencies for each subject's response pattern at age 5 years to the (A) 'Grass/Cat' cluster's components and (B) 'HDM' cluster's components. Response patterns above the intersecting lines were observed in at least 10% of the subjects who had a positive response to the respective cluster at age 5. 'All bar Fel d 1' refers to all components being active except for Fel d 1.