Modelling trajectories of parentally reported and physician‐confirmed atopic dermatitis in a birth cohort study


T.S. and S.H. made an equal contribution to this work as joint first authors. A.S. and A.C. made an equal contribution as joint senior authors.

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Summary
Background In a population-based birth cohort, we aimed to identify longitudinal trajectories of atopic dermatitis (AD) during childhood using data from different sources (validated questionnaires and healthcare records). We investigated the impact of different AD definitions on such trajectories and their relationships with various risk factors.
Methods Of the 1184 children born into the study, 1083 had information on current AD for at least three follow-ups from birth to age 11 years and were included in the analysis for parentally reported AD (PRAD). Data were transcribed from healthcare records for 916 of 1184 children for the analysis of doctor-diagnosed AD (DDAD). We also derived a composite definition of AD (CDAD) (at least two of the following: PRAD, DDAD, current use of AD treatment). Using latent class analysis (LCA), we determined longitudinal profiles of AD using the three definitions. Filaggrin (FLG) genotype data were available for 803 white participants.
Results For PRAD, LCA identified four AD classes ('no AD', 'persistent', 'early-onset remitting' and 'late-onset'). For DDAD and CDAD, the optimal number of phenotypes was three ('no AD', 'persistent' and 'early-onset remitting'). Although AD classes at population level appeared similar in different models, a considerable proportion of children (n = 485, 45%) moved between classes. The association with FLG genotype, atopic diseases and early-life risk factors was inconsistent across different definitions, but the association with oral food challenge-confirmed peanut allergy was similar, with a nine- to 11-fold increase among children in the persistent AD class. In a CDAD model, compared with the early-onset remitting class, those with persistent AD were significantly more likely to have (at age 3 years) moderate/severe AD, polysensitization and current wheeze, and were less likely to have been breastfed.
Conclusions Standardized composite definitions of AD may help to define AD cases with more precision and identify more consistent long-term trajectories.
What is already known about this topic?
• Atopic dermatitis (AD) is heterogeneous, but there is no general consensus on what the different subtypes are.
• Techniques such as latent class analysis (LCA) have been used to disentangle the long-term course of AD.
• AD phenotypes assigned the same name in different studies often differ in the age of onset, temporal trajectory, distributions within a population and associated risk factors, which makes comparisons difficult and clinical application uncertain.

What does this study add?
• We report that the use of different data sources and definitions of AD has a major influence on the number and type of AD phenotypes identified by longitudinal LCA, in addition to the phenotype membership among individual children.
• Although AD latent classes, at a population level, appeared similar when different definitions were used, almost half of the children changed class allocation in different models.
• The association with oral food challenge-confirmed peanut allergy across all models was similar, with a striking nine- to 11-fold increase among children in the persistent AD class.

Atopic dermatitis (AD) is the most common chronic inflammatory skin disease affecting infants and children. 1 Skin symptoms vary in severity and distribution between and within individuals, and in longitudinal trajectories of remission and relapse. [2][3][4][5] AD usually starts in early infancy, with symptoms remitting for a proportion of affected individuals in later childhood. 6 However, a substantial number of children can be affected by symptoms beyond childhood, 7 and some patients develop AD for the first time in adulthood. 8 Although heterogeneity of AD is now widely accepted, 5,9,10 there is no general consensus on what the different subtypes are. 11 Data-driven techniques such as latent class analysis (LCA) have been used to disaggregate other atopic diseases, 12 and such an approach to derive longitudinal trajectories of AD may help in understanding its heterogeneity.
To date, latent classes (often referred to as 'phenotypes') of childhood AD derived by LCA have been reported in several birth cohorts. [13][14][15][16] Similar phenotypes have been identified across all studies, [13][14][15] including persistent, early-onset and late-onset AD. 5,11 However, although the latent classes tend to share the same nomenclature, 5 phenotypes with the same name in different studies often differ in the age of onset, prevalence within a population and associated risk factors. 11 This heterogeneity may, in part, be caused by the use of different AD definitions. 17 Hanifin and Rajka proposed AD diagnostic criteria based on a physical examination, 18 but there is no uniform definition for questionnaire-based diagnoses for large cohort studies in which regular physical examination is not feasible. 19,20 Questionnaire data can provide useful information on parentally reported symptoms, but AD diagnosis via the use of questionnaires is challenging. 21 Consequently, diverse definitions have been used across different epidemiological and genetic studies. 17,22 Our previous study showed that the use of different AD definitions significantly influences the prevalence estimates and the association with risk factors. 17
To date, the trajectories of AD have been mainly derived from a parent/guardian reporting a characteristic rash on validated questionnaires. [13][14][15][16] However, questionnaire data might be unreliable, and combining the parental report with diagnosis by a family physician may offer a more precise ascertainment of AD presence. 23 Data from primary care electronic health records (EHRs) 24 and prescription data 25 have been utilized to identify patients with AD and ascertain its trajectory. As EHRs may contain information on different skin inflammatory diseases, several studies have proposed algorithms to define individuals with AD more precisely. 25,26 For example, a combination of AD diagnostic code and two codes for skin-directed therapy had > 80% positive predictive value for physician diagnosis of AD. 26 However, there are currently no uniform definitions for any of the data sources. 21,27
In this study, we aimed to elucidate the impact of using different AD definitions derived from different data sources (validated questionnaires administered in a structured research framework, and data from primary healthcare medical records) on mapping the long-term course of AD. We then ascertained the relationships of the derived AD trajectories with different risk factors [including filaggrin (FLG) loss-of-function mutations] and clinical outcomes (such as peanut allergy and asthma in later childhood). Finally, among children with early-onset AD, we investigated whether we could distinguish children whose symptoms persisted from those with remitting AD.

Study design, setting and participants
The Manchester Asthma and Allergy Study is a population-based birth cohort. 28 The study was approved by the Research Ethics Committee. Written informed consent was obtained from parents. A detailed description of the methods used in this study is provided in File S1 (see Supporting Information).

Review clinics
Information on AD and other symptoms, treatments received and environmental exposures was collected using validated interviewer-administered questionnaires completed by parents/guardians for children at ages 1, 3, 5, 8 and 11 years. Children underwent physical examination by the study physician to ascertain AD presence and severity at follow-up clinics. We assessed allergic sensitization using skin prick tests. 29 Peanut allergy was confirmed in school-age children using oral food challenges as previously described (File S1). [30][31][32]

Primary care medical records
Data from healthcare records, including AD diagnosis and prescriptions for topical treatments, were transcribed by a trained paediatrician. 33

Definition of variables
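We used the following three definitions of current AD (AD in the previous 12 months):
• Parentally reported AD (PRAD): current AD reported by parents/guardians in the validated questionnaires.
• Doctor-diagnosed AD (DDAD): AD diagnosis transcribed from primary care healthcare records.
• Composite definition of AD (CDAD): at least two of the following: PRAD, DDAD, current use of AD treatment.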
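As an illustration only, the composite rule described in the Summary (at least two of PRAD, DDAD and current use of AD treatment) can be sketched as a small Python function. The function and argument names are hypothetical, and the sketch assumes that all three indicators are observed at a given timepoint; how missing indicators were handled in the study is not stated here.

```python
def composite_ad(prad: bool, ddad: bool, on_treatment: bool) -> bool:
    """Composite definition of current AD: at least two of three indicators.

    Hypothetical helper for illustration; assumes no indicator is missing.
    """
    positives = int(prad) + int(ddad) + int(on_treatment)
    return positives >= 2


# Parent reports AD and the child uses AD treatment, but no doctor diagnosis
example_positive = composite_ad(prad=True, ddad=False, on_treatment=True)

# Only the parental report is positive
example_negative = composite_ad(prad=True, ddad=False, on_treatment=False)
```

Under this rule a single positive source is never sufficient, which is why CDAD classifies fewer children as having current AD than either source alone.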

Statistical analysis
Firstly, we compared the prevalence of current AD using different definitions (PRAD, DDAD and CDAD) and mapped its course using transition analysis. We then applied LCA for repeated measures to identify longitudinal AD trajectories using different definitions. Children with data for at least three timepoints were included in the analysis. The optimal number of classes was determined using the Bayesian information criterion and bootstrapped likelihood ratio test. To check the robustness of this analysis, we conducted LCA among children with complete datasets. To assess the impact of different definitions on the derived classes, we compared the proportion of children moving between classes, and calculated the adjusted Rand index (ARI) to compare clustering results. We then used multinomial logistic regression to explore the association between FLG genotype and early-life risk factors, with AD latent classes for each definition. Finally, for each latent class, we ascertained the risk of developing other atopic diseases in late childhood, and whether, among children with early-onset AD, we can distinguish those whose symptoms persisted from those with remitting symptoms.
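The class-selection step can be illustrated with a minimal Python sketch of the Bayesian information criterion, BIC = -2 ln L + p ln n, where lower values indicate a better balance of fit and parsimony. The sketch assumes binary indicators at the five follow-ups, so that a K-class model has (K - 1) + 5K free parameters; the log-likelihood values are invented for illustration and are not the study's results, and the bootstrapped likelihood ratio test is not shown.

```python
import math


def lca_free_parameters(n_classes: int, n_timepoints: int) -> int:
    """Free parameters of an LCA with binary indicators:
    (K - 1) class proportions plus K * T conditional probabilities."""
    return (n_classes - 1) + n_classes * n_timepoints


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion; lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


N_CHILDREN = 1083   # children in the PRAD analysis
N_TIMEPOINTS = 5    # ages 1, 3, 5, 8 and 11 years

# Hypothetical maximised log-likelihoods for 2 to 5 classes (illustration only)
hypothetical_loglik = {2: -2510.0, 3: -2440.0, 4: -2415.0, 5: -2408.0}

bic_by_k = {
    k: bic(ll, lca_free_parameters(k, N_TIMEPOINTS), N_CHILDREN)
    for k, ll in hypothetical_loglik.items()
}
best_k = min(bic_by_k, key=bic_by_k.get)

for k in sorted(bic_by_k):
    print(f"{k} classes: BIC = {bic_by_k[k]:.1f}")
print("Lowest BIC at", best_k, "classes")
```

With these invented values the penalty term outweighs the small gain in fit from a fifth class, which is the behaviour the criterion is designed to capture.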

Participant flow and characteristics of study population
Of the 1184 children born into the study, 1083 (91%) had information on current AD for at least three follow-ups. Data were transcribed from healthcare records for 916 of 1184 children. FLG genotype data were available for 803 white participants. Characteristics of the study population are shown in Table S1 (see Supporting Information); there were no differences in demographic or clinical characteristics between children with healthcare records data and those without this information.

Definitions of atopic dermatitis: impact on prevalence estimates and longitudinal trajectories
The estimated prevalence of current AD differed considerably between PRAD, DDAD and CDAD definitions (Figure S1; see Supporting Information). The overlap between definitions at each timepoint is shown in Figure S2 (see Supporting Information); the highest frequency of children with AD using all three definitions was at age 3 years (n = 213, 30%). The prevalence and severity of AD confirmed at physical examination at each follow-up are shown in Table S2 (see Supporting Information). The prevalence was highest at age 1 year (19%), after which it remained unchanged up to age 11 years (13-15%). Moderate-to-severe AD was confirmed for 5% of children at age 1 year, and approximately 2% from age 3 years onwards. Compared with DDAD, children with PRAD or CDAD were significantly more likely to have AD upon physical examination (Figure S3; see Supporting Information). Table S3 shows AD severity at each follow-up in different definitions; in general, the proportion of children with moderate/severe AD was highest in CDAD.
The highest remission rate between two adjacent timepoints was between ages 1 year and 3 years, and was highest for DDAD (File S1 and Table S4; see Supporting Information).

Latent class analysis
We included 1083 children in the LCA for PRAD, 916 children for DDAD and 1007 children for CDAD. Model fit parameters for each definition are shown in Table S5 (see Supporting Information). For PRAD (Figure 1a), the following four-class model was the optimal solution: no AD (58%), persistent AD (11%), early-onset remitting AD (21%) and late-onset AD (10%). For DDAD and CDAD (Figure 1b, c), the optimal solution was the following three latent classes: no AD (62% and 75%), persistent AD (8% and 6%) and early-onset remitting AD (29% and 19%). Sensitivity analyses for children with complete data suggested a similar solution for PRAD and CDAD, but a two-class solution for DDAD (Table S6, Figure S4; see Supporting Information).
The posterior probabilities of class membership were generally high, with only 70 (6.5%) children, 22 (2.4%) children and 43 (4.3%) children having low membership probability (< 0.60) of any class in PRAD, DDAD and CDAD models, respectively (Figure 2). However, there were large differences within classes regarding the certainty of the class assignment (Table S7; see Supporting Information). In PRAD, only 36.7% of children were assigned to late-onset AD with a probability of > 0.8.
Individual AD patterns were heterogeneous within each class and model, particularly when the posterior probability of assignment was < 0.8. There was evidence that within-class individual patterns did not follow the descriptive label of the class; e.g. 26 of 175 children in the early-onset remitting PRAD class reported symptoms after age 5 years (Figure 2). Figures S5-7 (see Supporting Information) show AD severity in different classes across the models. For all indicators, more severe AD was most frequently observed among children in the persistent AD class.
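To make the notion of posterior membership probability concrete, the following Python sketch applies Bayes' rule to one child's response pattern under the usual LCA assumption of local independence. The class proportions are the CDAD class sizes reported above; the conditional probabilities of reporting AD at each age are invented for illustration, and skipping missing follow-ups is an assumption rather than a description of the study's software.

```python
from typing import List, Optional, Sequence


def posterior_membership(
    pattern: Sequence[Optional[int]],
    class_weights: Sequence[float],
    item_probs: Sequence[Sequence[float]],
) -> List[float]:
    """Posterior P(class | observed pattern) under local independence."""
    joint = []
    for weight, probs in zip(class_weights, item_probs):
        likelihood = weight
        for observed, p in zip(pattern, probs):
            if observed is None:  # missing follow-up: skipped (an assumption)
                continue
            likelihood *= p if observed == 1 else 1.0 - p
        joint.append(likelihood)
    total = sum(joint)
    return [value / total for value in joint]


CLASSES = ["no AD", "persistent", "early-onset remitting"]
WEIGHTS = [0.75, 0.06, 0.19]  # CDAD class proportions from the text
# Hypothetical probabilities of current AD at ages 1, 3, 5, 8 and 11 years
ITEM_PROBS = [
    [0.05, 0.05, 0.03, 0.03, 0.03],
    [0.80, 0.90, 0.90, 0.85, 0.80],
    [0.70, 0.80, 0.30, 0.10, 0.05],
]

pattern = [1, 1, 0, 0, 0]  # AD at ages 1 and 3 years only
posterior = posterior_membership(pattern, WEIGHTS, ITEM_PROBS)
best = max(range(len(CLASSES)), key=lambda k: posterior[k])
low_certainty = posterior[best] < 0.60  # threshold used in the text

print(CLASSES[best], round(posterior[best], 3), "low certainty:", low_certainty)
```

A child is assigned to the class with the highest posterior probability, so two children with similar patterns can receive different labels when their highest probabilities are close to one another.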

Class membership agreement between latent classes in different models
Although AD classes appeared similar in different models, a considerable proportion of children (n = 485, 45%) moved between classes. Transition of individual children between classes is shown in Figure 3. The class membership agreement between the PRAD and DDAD models was poor (ARI = 0.17), and only 26% of children in the persistent AD class in the PRAD model remained in the same DDAD class. Agreement between CDAD and the other two models was moderate (PRAD, ARI = 0.37; DDAD, ARI = 0.40). Of children in the persistent AD class in the CDAD model, 84% remained in the same class in PRAD and 56% remained in the same class in DDAD.
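For readers unfamiliar with the adjusted Rand index, the sketch below computes it from two sets of class labels using the standard pair-counting formula, together with the simple proportion of children whose label differs between two models. It is a plain-Python illustration with invented labels for six children, not the study's code; the function names are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two labelings of the same individuals."""
    if len(a) != len(b):
        raise ValueError("labelings must have the same length")
    n = len(a)
    if n < 2:
        raise ValueError("at least two individuals are required")
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:  # degenerate case, e.g. a single class in both
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


def proportion_moved(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of individuals whose class label differs between two models."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Invented labels for six children (illustration only)
prad = ["no AD", "persistent", "early-onset remitting",
        "late-onset", "no AD", "persistent"]
ddad = ["no AD", "no AD", "early-onset remitting",
        "no AD", "no AD", "persistent"]

print("ARI:", round(adjusted_rand_index(prad, ddad), 2))
print("Proportion moved:", round(proportion_moved(prad, ddad), 2))
```

Unlike the proportion moved, the ARI does not depend on class names and is corrected for chance agreement, so it remains interpretable when two models have different numbers of classes.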

Filaggrin loss-of-function mutations and atopic dermatitis latent classes in different models
The association between FLG mutations and AD latent classes significantly differed between different models (Table S8; see Supporting Information). FLG was strongly associated with persistent AD in PRAD [relative risk ratio (RRR) 2.9, 95% confidence interval (CI) 1.6-5.2; P < 0.001] and CDAD (RRR 2.5, 95% CI 1.1-6.1; P = 0.04), but not in DDAD. There was a significant association between FLG and the early-onset class in CDAD (RRR 2.4, 95% CI 1.3-4.5; P = 0.01), but not in the other two models.

Association of atopic dermatitis latent classes with atopic diseases in school-age children
Table 1 shows the association of AD latent classes with atopic diseases at age 11 years. In all models, children in the persistent AD class were consistently at the highest risk of having peanut allergy, asthma and hay fever at age 11 years compared with other AD classes.
The association of peanut allergy with persistent AD was similar across PRAD, DDAD and CDAD [odds ratio (OR) 9.6, 95% CI 2.9-31.9; OR 11.6, 95% CI 3.2-41.9 and OR 11.2, 95% CI 3.0-41.5, respectively]. The risk of peanut allergy was substantially lower in the early-onset remitting class in all three models, with no significant association with late-onset AD in the PRAD model.
The strength of the associations between persistent AD and asthma differed in different models, ranging from OR 5.5 (95% CI 2.8-10.6) for DDAD to OR 11.7 (95% CI 5.3-26.1) for CDAD.
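The ORs above come from regression models. As a simpler illustration of how an OR and its 95% CI relate to underlying counts, the Python sketch below computes a crude (unadjusted) OR from a 2 x 2 table with a Wald-type interval on the log scale (Woolf's method). This is a different and simpler technique than the adjusted analysis, and the counts are invented; they are not taken from the study.

```python
import math
from typing import Tuple


def crude_odds_ratio(
    exposed_cases: int,
    exposed_noncases: int,
    unexposed_cases: int,
    unexposed_noncases: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Crude odds ratio with a Wald (Woolf) confidence interval.

    Returns (odds ratio, lower limit, upper limit). All cells must be > 0.
    """
    cells = (exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts: peanut allergy (cases) by persistent AD vs. no AD (illustration only)
or_value, ci_low, ci_high = crude_odds_ratio(
    exposed_cases=10, exposed_noncases=70,
    unexposed_cases=8, unexposed_noncases=600,
)
print(f"OR {or_value:.1f} (95% CI {ci_low:.1f}-{ci_high:.1f})")
```

The wide intervals reported in the text are typical when one cell of the table is small, because the standard error of the log OR is dominated by the reciprocal of the smallest count.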

Early identification of persistent atopic dermatitis
Given that the CDAD model provided the most clinically interpretable evidence, we used this definition to investigate whether, among children with early-onset AD, we can distinguish those in the persistent class from those in the early-onset remitting class. As the proportion of children reporting current AD did not differ between these classes at age 3 years (~80%, Figure 1c), we utilized predictors ascertained at this timepoint. Results of the univariate analysis are shown in Table S9 (see Supporting Information). In the multivariate logistic regression shown in Table 2 (which included sex, FLG genotype and variables that differed significantly between the classes in the univariate analysis), having moderate/severe AD (at age 3 years) increased the risk of persistent AD 11.6-fold (95% CI 1.7-80.2), as did sensitization to two or more allergens (5.2-fold, 95% CI 1.3-21.2) and current asthma (4.8-fold, 95% CI 1.4-16.6). This model had 57% sensitivity and 89% specificity for discriminating between the classes (area under receiver operating characteristic curve = 0.84) (Figure 4); 80% of children were correctly classified, with a positive predictive value of 71%.
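The performance measures quoted above are all functions of a 2 x 2 classification table, as the following Python sketch shows. The counts are hypothetical and were chosen only so that the outputs fall close to the reported values; they are not the study's counts, and 'positive' here means predicted (or observed) persistent AD.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, positive predictive value and accuracy."""
    return {
        "sensitivity": tp / (tp + fn),   # persistent AD correctly identified
        "specificity": tn / (tn + fp),   # remitting AD correctly identified
        "ppv": tp / (tp + fp),           # predicted persistent that truly persist
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts (illustration only)
metrics = classification_metrics(tp=20, fp=8, fn=15, tn=65)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A model with high specificity but modest sensitivity, as here, is better suited to confirming likely persistence than to ruling it out.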

Discussion
We have shown that the use of different data sources and definitions of AD results in different numbers and types of longitudinal AD phenotypes identified by LCA, in addition to different phenotype membership among individual children. Although AD latent classes at a population level appeared similar when different definitions were used, almost half of children changed class allocation in different models. The impact of different definitions on study findings was obvious in the inconsistent associations with FLG genotype, which varied considerably depending on the definition. There were also some similarities. For example, for all indicators and models, more severe AD was most frequently observed among children in the persistent class, and children in this class were consistently at the highest risk of having peanut allergy, asthma and hay fever at age 11 years. The association with peanut allergy across the models was similar, with a striking nine- to 11-fold increase among children in the persistent AD class and a substantially lower risk among those in the early-onset remitting class, with no significant association with late-onset AD. Compared with children in the early-onset remitting class, those with persistent AD were significantly more likely to have (at age 3 years) moderate/severe AD, polysensitization to at least two allergens, and current asthma, and were less likely to have been breastfed.

Table 1 Associates of longitudinal atopic dermatitis (AD) latent classes in parentally reported AD (PRAD), doctor-diagnosed AD (DDAD) and composite definition of current AD (CDAD) models from multivariable logistic regression analyses. The analyses were adjusted for parental history of AD, parental history of atopy, sex, ethnicity, cat ownership during pregnancy, dog ownership during pregnancy, birth season, breastfeeding, position of siblings and daycare attendance. Columns: aOR (95% CI) and P-value for each of PRAD, DDAD and CDAD. Rows include peanut allergy at age 11 years (n = 684) and hay fever at age 11 years (n = 703). aOR, adjusted odds ratio; CI, confidence interval. Reference group: no AD.

One of the main limitations of our study is the lack of a replication population. However, to our knowledge, there are very few birth cohorts that have longitudinal information on AD collected using questionnaires in a research framework and the transcribed data from healthcare records for the same children, both of which are key to interpreting our findings.
We acknowledge that physical examination offers a much more accurate way of defining AD. 37 To this end, both the UK Working Party's criteria [38][39][40] and Hanifin and Rajka's 18 criteria are excellent for case definition, but are difficult to implement fully in birth cohorts that are mostly questionnaire-based, and in which information from physical examination is usually available on few timepoints during the follow-up, usually months or years apart. Given the variability of AD symptoms, the use of this information from a 'spotlight' clinical examination would likely introduce bias towards more severe disease. Participants in our birth cohort were predominantly white (95%), and the results are not applicable to other ethnic groups.
Our follow-up was not long enough to identify adolescent AD phenotype (which is less common, comprising ≥ 10% of the overall AD burden), 9 and further follow-up to mid or late adulthood may be required to ascertain all longitudinal patterns with more precision.
At age 1 year, 36% of children had PRAD, and 47% had DDAD, and the prevalence of AD for each definition in our study was higher than that reported in most previous studies. 41,42 This difference may have arisen from differences in questionnaires 17 and the algorithms for identifying AD in healthcare records. 21 Diagnoses across clinicians may vary, as diagnostic criteria change over time and clinicians may rely upon medical history provided by parents for diagnosing and managing the condition. There may be a risk of overestimating the prevalence, as clinical presentation of AD is similar to other skin conditions (e.g. common rashes in infancy). 43,44 To date, longitudinal trajectories of AD have been mainly derived from questionnaire-based parental reports (summarized in Table S10; see Supporting Information). [13][14][15][16] In the comparable PRAD model, our study identified similar phenotypes, but the estimated prevalence of each phenotype differed from some previous studies. For example, the persistent phenotype was much more common in our study (12%) and in the Avon Longitudinal Study of Parents and Children (ALSPAC) (11%) 16 than in the Prevention and Incidence of Asthma and Mite Allergy (PIAMA) study (4.9%) 14 and Generation R (2%). 13 Similarly, FLG genotype was a significant associate of persistent phenotype in our study and in ALSPAC, but not in PIAMA and Generation R.
To our knowledge, ours is the first study to investigate within-phenotype heterogeneity. In AD phenotyping, LCA has been used to discover subgroups that were presumed to be homogeneous. However, our data suggest that this may not always be the case. A proportion of children were classified imprecisely, and the precision with which children were assigned to a class varied across phenotypes and models. It is of note that the greatest level of uncertainty in assignment was for late-onset AD in the PRAD model, with only 36.7% of children being assigned to this class with a probability of > 0.8.
Figure 4 Receiver operating characteristic (ROC) curve for a prediction model between persistent vs. remitting atopic dermatitis (AD) using the composite definition of current AD. We constructed a prediction model for AD persistence among children with early-onset AD. Results derived from logistic regression analysis including the following predictors: filaggrin loss-of-function mutations, breastfeeding, polysensitization at age 3 years, current asthma at age 3 years and AD severity at age 3 years.

We also found evidence that within-class patterns of symptoms among individual children did not follow the 'label' ascribed to the class (for example, across all models, some children assigned to early-onset remitting classes reported AD at middle-school age, children with reported AD at certain timepoints were assigned to 'no AD', and some children with identical symptom patterns were assigned to different classes). This was particularly evident when the individual's likelihood of belonging to the assigned class was < 0.80. This is consistent with the recent report of the LCA-derived wheeze phenotypes, which has shown that a substantial number of children were classified imprecisely and did not follow wheeze patterns suggested by the latent-class label. 45 Similar to our data on AD, the greatest level of uncertainty in the assignment was for the late-onset class. This within-class heterogeneity may, in part, be responsible for a lack of consistent associations of phenotypes with risk factors reported in previous studies.
It is striking that almost half of the study participants (45%) moved between classes in different models. Some marked differences between the models were notable; for example, one-third of children in the persistent AD class in PRAD were assigned to 'no AD' in the DDAD model, suggesting that parental report of AD and general practitioners' diagnosis of AD in the first few years of life may, to a certain extent, identify different conditions. Clinicians and parents may have divergent views on the presence and severity of AD. Although parentally reported AD is one of the most commonly used definitions in epidemiological studies, 17 this definition is unable to differentiate other pruritic skin diseases completely. 43 Conversely, doctor diagnosis of AD can comprise various skin inflammatory diseases, such as seborrhoeic dermatitis and impetigo. 44 This is indirectly confirmed by our data, which suggest that doctor diagnosis of AD may be particularly imprecise in early childhood.
Our findings indicate that the use of AD definition from a single data source can bias study findings, and that the use of composite definitions comprising several aspects of AD may be preferable.
In our study, among children with early-onset AD, several features ascertained in early life predicted subsequent persistence of AD. The key independent associates in the first 3 years of life, which predicted persistent AD to age 11 years, included moderate/severe AD symptoms, polysensitization and current asthma, whereas breastfeeding was protective. The association between breastfeeding and AD is controversial. 46 Our findings are consistent with results of recent studies that reported a protective effect of breastfeeding on AD in children of school age 47 and those in adolescence. 48 In contrast, a meta-analysis of nine studies demonstrated no significant relationship between never vs. ever having been breastfed and longer vs. shorter duration of breastfeeding. 49 Our data suggest that breastfeeding may not have an impact on the onset of AD, but may reduce the risk of persistence among children with early-onset AD. The protective effect that we observed should be validated in future studies, ideally including the association with breastfeeding duration.
Urinary eosinophil protein X (U-EPX) is a marker of eosinophil activation and is associated with atopic diseases. 30 In our study, compared with children in the 'no AD' class, the U-EPX/creatinine ratio at age 3 years was significantly higher in persistent AD, but not in early-onset remitting AD. However, among children with early onset of symptoms, this biomarker was not an independent associate of AD persistence. Nonetheless, as U-EPX can be accurately measured in urine (thereby making it suitable for use in young children), further investigations may be warranted in larger cohorts.
In conclusion, our results suggest that caution should be exercised when comparing results of different studies, as AD definitions based on different data sources may lead to inconsistent findings. Our data suggest that the use of a composite definition of AD may help to define cases with more precision and identify more consistent long-term trajectories. Our data-driven analyses indicated that different trajectories of AD have different associations with allergic comorbidities, including a striking nine- to 11-fold increase in the risk of peanut allergy among children in the persistent AD class. Among children with AD in the first 3 years of life, healthcare professionals should be alerted by the presence of moderate/severe symptoms, allergic polysensitization and contemporaneous asthma, which are associated with persistence of AD and considerably higher risk of allergic multimorbidity in school age.

Supporting Information
Additional Supporting Information may be found in the online version of this article at the publisher's website:
File S1 Supplementary appendix.
Figure S1 Prevalence estimates of parentally reported current atopic dermatitis, composite definition of current atopic dermatitis, and doctor-diagnosed current atopic dermatitis.
Figure S2 Overlap between definitions at each timepoint.
Figure S3 Proportion of children with atopic dermatitis (AD) at physical examination among children with different definitions of current AD.
Figure S4 Trajectories of atopic dermatitis identified using latent class analysis among children with complete dataset.
Figure S5 Percentage of children in each class with moderate/severe atopic dermatitis on physical examination.
Figure S6 Percentage of children in each class with sleep disruption owing to atopic dermatitis in the last 12 months.
Figure S7 Percentage of children in each class using regular topical ointments for atopic dermatitis in the last 12 months.
Table S1 Characteristics of the study population.
Table S2 The point prevalence and severity of atopic dermatitis confirmed by physical examination at each review.
Table S3 Atopic dermatitis (AD) severity at each follow-up in children with different definitions of current AD.
Table S8 Association of filaggrin loss-of-function mutations and longitudinal atopic dermatitis latent classes in parentally reported current atopic dermatitis, composite definition of current atopic dermatitis, and doctor-diagnosed current atopic dermatitis models.
Table S9 Characteristics of children in the two latent classes with early-onset atopic dermatitis in composite definition of current atopic dermatitis model.
Table S10 Characteristics of cohorts and identified atopic dermatitis phenotypes using latent class analysis.