Differential associations of allergic disease genetic variants with developmental profiles of eczema, wheeze and rhinitis

Abstract Background Allergic diseases (eczema, wheeze and rhinitis) in children often present as heterogeneous phenotypes. Understanding genetic associations of specific patterns of symptoms might facilitate understanding of the underlying biological mechanisms. Objective To examine associations between allergic disease‐related variants identified in a recent genome‐wide association study and latent classes of allergic diseases (LCADs) in two population‐based birth cohorts. Methods Eight previously defined LCADs between birth and 11 years: “No disease,” “Atopic march,” “Persistent eczema and wheeze,” “Persistent eczema with later‐onset rhinitis,” “Persistent wheeze with later‐onset rhinitis,” “Transient wheeze,” “Eczema only” and “Rhinitis only” were used as the study outcome. Weighted multinomial logistic regression was used to estimate associations between 135 SNPs (and a polygenic risk score, PRS) and LCADs among 6345 individuals from The Avon Longitudinal Study of Parents and Children (ALSPAC). Heterogeneity across LCADs was assessed before and after Bonferroni correction. Results were replicated in Manchester Asthma and Allergy Study (MAAS) (n = 896) and pooled in a meta‐analysis. Results We found strong evidence for differential genetic associations across the LCADs; pooled PRS heterogeneity P‐value = 3.3 × 10−14, excluding “no disease” class. The associations between the PRS and LCADs in MAAS were remarkably similar to ALSPAC. Two SNPs (a protein‐truncating variant in FLG and a SNP within an intron of GSDMB) had evidence for differential association (pooled P‐values ≤ 0.006). The FLG locus was differentially associated across LCADs that included eczema, with stronger associations for LCADs with comorbid wheeze and rhinitis. The GSDMB locus in contrast was equally associated across LCADs that included wheeze. 
Conclusions and clinical relevance We have shown complex, but distinct patterns of genetic associations with LCADs, suggesting that heterogeneous mechanisms underlie individual disease trajectories. Establishing the combination of allergic diseases with which each genetic variant is associated may inform therapeutic development and/or predictive modelling.


| INTRODUCTION
Although asthma, eczema (atopic dermatitis) and allergic rhinitis (AR) are diagnosed clinically as distinct conditions, their co-occurrence is well acknowledged. 1 Twin heritability estimates vary widely, with reported ranges of 35% to 95% for asthma, 33% to 91% for AR and 71% to 84% for eczema, 2 and estimates of the genetic correlation between the conditions are also high: 0.55 between asthma and eczema, 0.47 between asthma and rhinitis and 0.62 between eczema and rhinitis. 3 This supports the existence of genetic factors shared between all three conditions, as well as genetic factors that are specific to each. Although we are starting to uncover both shared and specific genetic risk factors, understanding more precisely which loci predispose individuals to particular patterns of symptoms should translate into clinical benefits, such as personalized approaches to disease management and identification of novel therapeutic targets. 4 Children often present with broad and heterogeneous phenotypes of allergic diseases. 5,6 For example, some cases have mild symptoms affecting a single organ/system, but others have more severe symptoms encompassing multiple organs (eg skin, upper and lower airways).
The term "atopic march" has been proposed for a specific pattern of progression from childhood eczema to subsequent asthma and AR. 7 We have previously used Bayesian machine learning methods to model the developmental profiles of eczema, wheeze and rhinitis during childhood in two population-based birth cohorts: the Avon Longitudinal Study of Parents and Children (ALSPAC) and the Manchester Asthma and Allergy Study (MAAS). 8 Eight latent classes were identified, each characterized by unique patterns of diseases over time: "No disease" (51.3% of the children), "Atopic march" (3.1%), "Persistent eczema and wheeze" (2.7%), "Persistent eczema with later-onset rhinitis" (4.7%), "Persistent wheeze with later-onset rhinitis" (5.7%), "Transient wheeze" (7.7%), "Eczema only" (15.3%) and "Rhinitis only" (9.6%).
A recent genome wide association study (GWAS) 9 identified 136 independent risk variants (P < 3 × 10 −8 ) for asthma, eczema and AR, including 73 novel variants, and went on to link the pathophysiology of allergic diseases to 132 neighbouring genes. Although this study was designed to detect risk variants shared across allergic diseases, six SNPs in five established risk loci (rs61816761, rs921650, rs115288876, rs12470864, rs6594499 and rs61839660) were identified as having some disease-specific effects. In this study, we aimed to examine the associations between these 136 genetic variants associated with allergic diseases 9 and the developmental profiles of eczema, wheeze, and rhinitis ("Latent Classes of Allergic Diseases", LCADs) we previously described 8 to better understand the genetic heterogeneity between the allergic disease class profiles.

| Study populations
Avon Longitudinal Study of Parents and Children is a population-based birth cohort which recruited 14 541 pregnant women with expected delivery dates between 1st April 1991 and 31st December 1992. The study protocol was described previously. 10,11 Ethical approval was obtained from the ALSPAC Ethics and Law Committee and local research ethics committees. The study website contains details of all the data that are available through a fully searchable data dictionary: http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary/ .
Manchester Asthma and Allergy Study is an unselected birth cohort established in 1995 in Manchester. 12 It included 1184 children born between February 1996 and April 1998. Participants were recruited prenatally and followed prospectively. 13 The study was approved by the Local Research Ethics Committee.

| Latent classes of allergic diseases (LCADs)
Eight LCADs were previously identified using a latent disease profile model in a Bayesian machine learning modelling framework applied to longitudinal individual reports of eczema, wheezing and rhinitis collected at multiple time points between ages 1 and 11 years using joint ALSPAC and MAAS data 8,14 : "No disease", "Atopic march," "Persistent eczema and wheeze," "Persistent eczema with later-onset rhinitis", "Persistent wheeze with later-onset rhinitis," "Transient wheeze," "Eczema only," "Rhinitis only"; these are described in more detail in the Supplemental Material. Posterior probabilities of class membership (mean probabilities between 0.76 and 0.94 in joint ALSPAC and MAAS data) were used as the outcome in this study (Table S2 in Belgrave et al 8 ).

| Genotyping
In ALSPAC, we used the Illumina HumanHap550 quad chip and imputed against the Dec 2013 Version 1 Phase 3 release of 1000 Genomes reference haplotypes 15 using IMPUTE2. 16 Participants with genetic evidence of non-European ancestry were excluded before imputation. Further details describing the QC and imputation procedure of the ALSPAC genetic data can be found here. 17 In MAAS, DNA samples were genotyped using Illumina 610 quad and imputed using the 1000 Genomes Phase 3 integrated variant set reference. 18

KEYWORDS
asthma, atopic dermatitis, Avon Longitudinal Study of Parents and Children, eczema, genetics, rhinitis

| Genetic risk score
Per-allele SNP dosages were extracted in ALSPAC for the 136 SNPs identified in the previous GWAS, 9 and proxies were identified for SNPs with imputation quality < 0.80 (Table S1). All 136 relevant SNPs were available; SNP rs34290285 was excluded due to poor imputation quality and lack of an adequate proxy. All 135 SNPs included in the ALSPAC score were available in MAAS; SNP rs10305290 was monomorphic and hence excluded from the MAAS score (see Supplemental Material). A weighted polygenic risk score (PRS) including 135 SNPs with INFO ≥0.77 was derived, weighted according to the overall effect sizes observed in Ref. 9. The standardized PRS represented a per 1-standard deviation increase in the weighted risk score (details in Supplemental Material). Individual SNPs were also analysed separately, where each SNP was coded as the dosage of the risk allele.
Following investigation of individual SNPs, a modified PRS, which excluded any SNPs with evidence for differential association, was generated to test for any residual effects among the remaining SNPs.
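As an illustration only, the construction of a weighted and standardized score can be sketched as follows. This is not the authors' code: the function names, the example dosages and the use of log odds ratios as weights are assumptions made for the sketch, and the population standard deviation is used for standardization as one possible choice.

```python
from statistics import mean, pstdev

def weighted_prs(dosages, weights):
    """Weighted sum of risk-allele dosages for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(w * d for w, d in zip(weights, dosages))

def standardize(scores):
    """Rescale scores to mean 0 and SD 1, so effects are per 1-SD increase."""
    mu = mean(scores)
    sd = pstdev(scores)  # assumption: population SD
    return [(s - mu) / sd for s in scores]

# Hypothetical data: 3 SNPs and 4 individuals (the study used 135 SNPs).
# Weights are assumed to be per-allele log odds ratios from the discovery GWAS.
weights = [0.10, 0.25, 0.05]
dosage_matrix = [
    [0.0, 1.0, 2.0],
    [1.0, 1.0, 0.0],
    [2.0, 0.0, 1.0],
    [1.9, 2.0, 2.0],  # imputed dosages need not be integers
]

raw_scores = [weighted_prs(row, weights) for row in dosage_matrix]
z_scores = standardize(raw_scores)
print([round(z, 3) for z in z_scores])
```

A modified score that excludes selected SNPs would follow the same pattern, with the corresponding columns and weights dropped before summation.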

| Association analysis
For the PRS and each SNP, a multinomial logistic regression was conducted, and relative risk ratios (RRR; also known as multinomial odds ratios) with corresponding 95% confidence intervals (CIs) for associations of the PRS or SNP dosages with the LCADs were reported. All regressions were weighted by posterior membership probabilities (the probabilities that a child belongs to each class) to account for uncertainty in class membership. "No disease" class was used as the baseline group.
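One plausible reading of this weighting is that each child contributes to every class in proportion to the posterior probability of belonging to it, so that the fitted model maximizes a probability-weighted multinomial log-likelihood. The sketch below illustrates that idea for a single predictor using plain gradient ascent. It is a toy implementation under those assumptions, not the Stata procedure used in the study; all names and data are hypothetical, and no standard errors are computed.

```python
import math

def class_probs(x, alphas, betas):
    """Multinomial-logit probabilities; class 0 is the baseline (coefficients fixed at 0)."""
    etas = [0.0] + [a + b * x for a, b in zip(alphas, betas)]
    top = max(etas)  # subtract the maximum for numerical stability
    exps = [math.exp(e - top) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]

def fit_weighted_mnl(xs, posteriors, n_iter=5000, lr=0.5):
    """Maximize sum_i sum_k w_ik * log p_k(x_i) by gradient ascent."""
    n = len(xs)
    k = len(posteriors[0]) - 1  # number of non-baseline classes
    alphas = [0.0] * k
    betas = [0.0] * k
    for _ in range(n_iter):
        grad_a = [0.0] * k
        grad_b = [0.0] * k
        for x, w in zip(xs, posteriors):
            p = class_probs(x, alphas, betas)
            for j in range(k):
                resid = w[j + 1] - p[j + 1]  # posterior weight minus fitted probability
                grad_a[j] += resid
                grad_b[j] += resid * x
        alphas = [a + lr * g / n for a, g in zip(alphas, grad_a)]
        betas = [b + lr * g / n for b, g in zip(betas, grad_b)]
    return alphas, betas

# Hypothetical data: a standardized score and posterior probabilities for
# three classes, ordered as (baseline, class A, class B); each row sums to 1.
xs = [-1.0, -1.0, 0.0, 0.0, 1.0, 1.0]
posteriors = [
    [0.8, 0.1, 0.1], [0.7, 0.1, 0.2],
    [0.6, 0.2, 0.2], [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1], [0.3, 0.5, 0.2],
]

alphas, betas = fit_weighted_mnl(xs, posteriors)
rrrs = [math.exp(b) for b in betas]  # relative risk ratios versus the baseline class
print([round(r, 2) for r in rrrs])
```

Exponentiating each slope gives the RRR for a one-unit increase in the predictor relative to the baseline class, which mirrors how the estimates in this study are reported.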
Heterogeneity P-values from chi-square tests excluding (degrees of freedom = 6) and including (degrees of freedom = 7) the "No disease" class were generated (using the post-estimation test command) to test for both associations with any latent class and differential associations across LCADs. To further assess heterogeneity, pairwise tests among disease latent classes were performed when the omnibus test for differential associations provided nominal evidence for overall heterogeneity.
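The two degrees-of-freedom settings correspond to two different null hypotheses: that the seven disease-class coefficients are all equal to one another (6 df), or that they are all equal to zero, the value for the baseline class (7 df). The sketch below illustrates this distinction with a simplified statistic that treats the class-specific estimates as independent. That independence is an assumption of the sketch: a Wald test on a fitted multinomial model would use the full covariance matrix of the coefficients. The function names and input values are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: complementary error function plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = 2.0 * math.sqrt(half / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (j + 0.5)
    return total

def heterogeneity_test(log_rrrs, ses, include_baseline=False):
    """Simplified chi-square test, assuming independent class-specific estimates."""
    weights = [1.0 / se ** 2 for se in ses]
    if include_baseline:
        # H0: every coefficient equals 0, the value for the baseline class.
        stat = sum(w * b ** 2 for w, b in zip(weights, log_rrrs))
        df = len(log_rrrs)
    else:
        # H0: all disease-class coefficients share one common value.
        common = sum(w * b for w, b in zip(weights, log_rrrs)) / sum(weights)
        stat = sum(w * (b - common) ** 2 for w, b in zip(weights, log_rrrs))
        df = len(log_rrrs) - 1
    return stat, df, chi2_sf(stat, df)

# Hypothetical log RRRs and standard errors for seven disease classes.
log_rrrs = [1.10, 0.90, 0.70, 0.65, 0.30, 0.15, 0.10]
ses = [0.10] * 7

print(heterogeneity_test(log_rrrs, ses))                         # excluding baseline, df = 6
print(heterogeneity_test(log_rrrs, ses, include_baseline=True))  # including baseline, df = 7
```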
To assess the level of power in the analysis of ALSPAC data, power calculations were conducted for associations with individual phenotypes (details in the Supplemental Material and Table S2).
The associations of the nominal SNPs (P < .05) identified in ALSPAC and the genetic risk scores (original and modified versions) with LCADs were then tested in MAAS. Pooled estimates and overall test for heterogeneity between sub-groups (including and excluding "No disease" class) were derived. All statistical analyses were conducted with Stata 15.0. 19
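The pooling model is not specified in this section. As a sketch of one common option, the example below combines cohort-specific RRRs using fixed-effect inverse-variance weighting on the log scale, recovering each standard error from the reported 95% CI. The choice of a fixed-effect model, the function names and the input values are all assumptions made for illustration.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI

def se_from_ci(lower, upper):
    """Standard error of a log RRR recovered from its 95% confidence limits."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z_95)

def pool_fixed_effect(estimates):
    """Pool (rrr, ci_lower, ci_upper) tuples; returns the pooled (rrr, lower, upper)."""
    num = 0.0
    den = 0.0
    for rrr, lower, upper in estimates:
        weight = 1.0 / se_from_ci(lower, upper) ** 2  # inverse-variance weight
        num += weight * math.log(rrr)
        den += weight
    beta = num / den
    se = math.sqrt(1.0 / den)
    return (math.exp(beta),
            math.exp(beta - Z_95 * se),
            math.exp(beta + Z_95 * se))

# Hypothetical estimates for one latent class in two cohorts.
cohort_a = (1.50, 1.20, 1.875)
cohort_b = (1.40, 0.90, 2.18)

pooled = pool_fixed_effect([cohort_a, cohort_b])
print(tuple(round(v, 2) for v in pooled))
```

Under this scheme the larger cohort, having the narrower interval, dominates the pooled estimate, which is the expected behaviour when one cohort is several times the size of the other.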

| Characteristics of the study populations
A total of 6345 participants in ALSPAC and 896 in MAAS had both genetic and outcome data. Characteristics of the study populations are presented in Table 1.
Similar proportions of males (51.1%, 53.8%), lower social class (43.9%, 36.9%) and lower maternal educational level (57.5%, 55.3%) were reported in ALSPAC and MAAS, respectively (for more details see Supplemental Material). Parents of participants excluded from the study because of missing data tended to be from the lower social class (P = 3.54 × 10 −29 ) and lower maternal educational level (P = 1.31 × 10 −62 ) in ALSPAC, as seen before. 20 Marginal evidence for lower social class among the excluded individuals was observed in MAAS (P = .10), but not for lower maternal educational level (P = .39). No differences were found between proportions of LCADs in the study populations vs. the excluded samples in either cohort (Table 1).

| Standardized genetic score (original)
There was strong evidence for an association between the standardized PRS and the LCADs (het. P-value = 3.2 × 10 −30 including "No disease" class) and strong evidence for differential association with the PRS across the latent classes (het. P-value = 3.3 × 10 −13 excluding "No disease" class), with stronger associations observed for classes with more than one disease (Figure 1 and Table S3). Pairwise tests provide further evidence for differential associations between multiple-disease and single-disease classes: P-values ≤ 9.68 × 10 −10 for "atopic march" vs "transient wheeze," "eczema only" and "rhinitis only" (Table S4).

| Individual SNP associations
Two SNPs, rs61816761 (a protein-truncating variant in FLG) and rs921650 (within an intron of GSDMB), showed strong evidence for differential association between the LCADs (P-values 0.002 and 1.42 × 10 −5 , respectively, excluding "No disease" class) (Table S5 and Figures 2 and 3). Pairwise tests provide further evidence of these differential associations: P-values ≤ 8.8 × 10 −3 for "transient wheeze" vs "persistent eczema LO rhinitis", "eczema only" and "rhinitis only" in the association between rs921650 (GSDMB) and the LCADs; P-values ≤ 3.0 × 10 −3 for "atopic march" vs. "persistent wheeze LO rhinitis," "transient wheeze" and "rhinitis only" in the associations between rs61816761 (FLG) and the LCADs (Table S6).

TABLE 1 Characteristics of the study populations in ALSPAC and MAAS cohorts
A modified PRS which excluded these two SNPs still showed strong evidence for differential association (heterogeneity P-value = 9.6 × 10 −12 , Table S3; Figure 4), suggesting that there are additional differential associations among the SNPs which did not meet our significance threshold.

Four additional SNPs (rs11652139, rs479844, rs6990534 and rs5743618) showed nominal evidence of differential association across the LCADs (heterogeneity P-values < .05, Table S5). Pairwise tests provide further evidence of these differential associations. For example: P-values ≤ 6.8 × 10 −3 for "eczema only" vs "atopic march," "persistent wheeze LO rhinitis" and "transient wheeze" in the association between rs11652139 and the LCADs; P-values ≤ .007 for "atopic march" vs "transient wheeze" and "eczema only" in the association between rs6990534 and the LCADs (Table S6).

| Standardized genetic scores (original and modified)
The associations between the PRS and LCADs in MAAS were remarkably similar to ALSPAC (Table S7 and Figure 1). In the pooled analysis, there was strong evidence for differential association across LCADs (excluding the baseline class; P-value = 3.27 × 10 −14 ).

FIGURE 1 Forest plot of the associations between the standardized genetic score and allergic disease latent classes in ALSPAC and MAAS cohorts
To investigate the nature of these differences, we compared ORs  Figure 1).

| Individual SNP associations
We tested six nominally associated individual SNPs from ALSPAC (P < .05) in MAAS (Table S8). Pooled results are shown in Table 3.
We found moderate evidence for differential associations across LCADs (excluding the baseline class) for SNPs rs61816761, rs921650 and rs11652139 (pooled heterogeneity P-values ≤ .006).
To investigate the nature of these differences, we compared ORs  Figure 3). Little or no evidence of association was seen for other LCADs.

FIGURE 3 Forest plot of the associations between SNP rs921650[A] near GSDMB gene and allergic disease latent classes in ALSPAC and MAAS cohorts
The pattern of associations for SNP rs11652139 was very similar to that for SNP rs921650 (within an intron of GSDMB) (Table 3; Figure S1). For SNP rs479844, we found evidence for a moderate effect on "Persistent eczema with later-onset rhinitis" (pooled RRR 1.21, 95%

| DISCUSSION
We examined the associations between 135 independent risk variants for allergic diseases identified in a recent GWAS 9 and LCADs which we previously described in two independent birth cohorts. 8 SNP rs61816761 (a protein-truncating variant in FLG gene) and SNP rs921650 (within an intron of GSDMB) which were previously identified as having disease-specific effects 9 were differentially associated with distinct LCADs. The FLG locus was associated with all LCADs that included eczema, with stronger associations seen for those classes with comorbid wheeze and/or rhinitis. In contrast, the strength of the

| rs61816761 (FLG)
rs61816761 (also known as R501X) has long been linked with eczema. 21 In Ferreira's GWAS, 9  compared with individuals experiencing only AR. Similarly, a 1.26-fold difference was observed when comparing individuals experiencing only eczema with individuals with only asthma. This indicated that this SNP predisposes to eczema more strongly than to either of the other two conditions. In the present study, we found evidence for a large effect on each of the four latent classes including eczema, but no association with classes which did not include eczema. We observed the strongest association with "Atopic march" (pooled RRR 3.01). This is in line with a recent meta-analysis which found that the two FLG mutations combined (R501X and FLG 2282del4) were associated with Atopic march. 22 It is tempting to speculate that genotyping patients with eczema for FLG mutations could help to identify individuals who may benefit from interventions targeted at prevention of progression to the atopic march. 23 However, the difference in odds between the "Atopic march" and "Eczema only" classes is of insufficient magnitude to be of clinical predictive value. It is also important to note that whilst FLG mutations (including R501X) play a role in predisposing individuals of Caucasian ancestry to eczema, such mutations have not been seen in other ethnic groups. 24

| rs921650 (GSDMB)
In Ferreira's GWAS, 9  and thus may affect the frequency and severity of rhinovirus infection and early childhood wheezing illness, 28 that impaired anti-virus immunity is associated with early-life wheezing, 29 and that early-life antibiotic use (a proxy for impaired innate anti-virus immunity) is associated with 17q21 polymorphisms. 30

| Interpretation of the Polygenic Risk Score (PRS)
We report strong evidence of heterogeneity of associations across the LCADs (pooled het. P-value = 3.3 × 10 −14 excluding "No disease" class). The strongest association was seen for "Atopic march"

| Strengths and Limitations
The prevalence of rhinitis-related classes was higher in MAAS compared to ALSPAC. 8 However, in the original study, similar latent profiles were identified in MAAS and ALSPAC, suggesting consistent patterns across the two populations. 8 Furthermore, we have used latent classes derived from joint modelling, which accounted for these differences and increased the resolution of the identified latent classes. 8 The restricted sample (after removal of individuals who had missing genetic and/or outcome data) resulted in a loss of participants

| Conclusions
We found strong evidence for differential genetic associations across different developmental profiles of eczema, wheeze and rhinitis, which were remarkably consistent across two cohorts.
Two polymorphisms (a protein-truncating variant in FLG and a SNP within an intron of GSDMB) showed evidence for distinct patterns of association. The FLG locus was associated with all profiles that included eczema, but with stronger associations for those with comorbid wheeze and/or rhinitis. The GSDMB locus in contrast was associated with all profiles which included wheeze (including wheezing up to age 5 years with remission by age 8 years), but with no additional risk of comorbid conditions. This emphasizes the likely complex and heterogeneous mechanisms underlying within-individual disease trajectories and demonstrates the need for future studies to take account of the complex nature of these associations. Our analysis using a PRS also demonstrates that there is likely to be additional heterogeneity among other SNPs that this study did not have power to detect. This approach to disentangling the complex nature of multi-trait aetiology is promising and should be used in future larger studies.

ACKNOWLEDGEMENTS
We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. The authors would like to thank the Manchester Asthma and Allergy Study participants and their parents for their continued support and enthusiasm. We greatly appreciate the commitment they have given to the project. We would also like to acknowledge the hard work and dedication of the study teams (post-doctoral scientists, physiologists, research fellows, nurses, technicians and clerical staff).

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.