Do patient characteristics matter when calculating sample size for eczema clinical trials?

Abstract
Background: The Patient-Oriented Eczema Measure (POEM) is the core outcome instrument recommended for measuring patient-reported atopic eczema symptoms in clinical trials. To ensure that the statistical significance of clinical trial results is meaningful, trials are often designed by specifying the target difference in the primary outcome as part of the sample size calculation. One method used to specify the target difference is a score that corresponds to a standardized effect size.
Objectives: To assess how the POEM score corresponding to a standardized effect size varies across age, gender, ethnicity and disease severity.
Methods: This study combined data from five UK-based randomized clinical trials of eczema treatments in order to assess differences in self-reported eczema symptoms (POEM) corresponding to a standardized effect size (0.5 SD of baseline POEM scores) across age, gender, ethnicity and disease severity.
Results: POEM scores corresponding to 0.5 SD (baseline) were remarkably consistent across participants of varying ages, gender, ethnicity and disease severity from datasets of five UK trials in children (range 2.99–3.45).
Conclusions: This study provides information that can support those designing clinical trials to determine their sample size and can aid individuals interpreting trial results. Further exploration of differences in populations beyond the United Kingdom is needed.


| INTRODUCTION
The Patient-Oriented Eczema Measure (POEM) is a 7-question patient-reported measure of atopic eczema severity assessed over the previous week and scored 0 to 28. 1 POEM is recommended for inclusion in all eczema clinical trials by the Harmonising Outcome Measures in Eczema (HOME) initiative. 2 When designing clinical trials with POEM as a primary outcome, the target difference in the primary outcome needs to be stated as part of the sample size calculation. 3 Cook et al. have identified methods that can be used to assess whether the target difference specified is important and/or realistic. 3 One approach to specifying a target difference is to use the minimally important change (MIC), defined as 'the smallest change in score in the construct to be measured which patients perceive as important'. 4 Anchor-based methods to calculate the MIC of an instrument look at how change on the instrument corresponds to change on a different instrument (the 'anchor').
Another method commonly used to specify the difference to be detected in sample size calculations is a distribution-based approach based on the standardized effect size. 3 The effect size is the change in scores between baseline and post-treatment divided by the standard deviation of baseline scores. 5,6 It has been suggested the difference to be detected in clinical trials, based on a medium effect size, could be specified as the score that corresponds to 0.5 SD (baseline). This is the method that is used within this study. Some have suggested this provides a threshold of detection, which is conceptually related to minimally detectable change. 4 Essentially, this means eczema clinical trials should not design their studies to detect a target difference smaller than the score on POEM which corresponds to 0.5 SD (baseline) as it is unlikely to be considered a meaningful difference in POEM scores.
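To illustrate how a target difference of 0.5 SD (baseline) enters a sample size calculation, the following is a minimal sketch using the standard normal-approximation formula for a two-arm, two-sided comparison of means. The function names, the 5% significance level, the 80% power and the baseline SD of 6.0 are all assumptions chosen for illustration; they are not taken from the trials analysed here, and a real trial would also need to account for attrition and any baseline adjustment.

```python
import math
from statistics import NormalDist


def target_difference(sd_baseline: float, effect_size: float = 0.5) -> float:
    """POEM points corresponding to a standardized effect size."""
    return effect_size * sd_baseline


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants per arm for a two-sided comparison of two means.

    Normal approximation, assuming equal allocation and equal SDs:
    n = 2 * (z_{1-alpha/2} + z_{power})^2 / effect_size^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


sd = 6.0  # hypothetical baseline SD, for illustration only
print(target_difference(sd))  # 3.0 POEM points
print(n_per_group(0.5))       # 63 per arm at 80% power
```

Because the sample size depends only on the standardized effect size, the baseline SD matters through the POEM score it implies: a larger SD means that 0.5 SD corresponds to a larger difference in POEM points.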
We have explored different anchor- and distribution-based methods in a previous article. 7 Recommendations for interpreting changes in POEM scores based on three published studies using both anchor- and distribution-based approaches suggest the following: ≤2, unlikely to be a change beyond measurement error; 2.1-2.9, a small change detected that is likely to be beyond measurement error but may not be clinically important; 3-3.9, probably a clinically important change; ≥4, very likely to be a clinically important change. [7][8][9] However, participant characteristics could potentially influence what difference would be important to specify in a sample size calculation, thus potentially limiting the generalizability of these recommendations. 10 The aim of this study was to assess how the POEM scores that correspond to the standardized effect size (0.5 SD) are influenced by age, gender, ethnicity and disease severity in these studies.
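The interpretation bands quoted above can be written as a simple lookup, sketched below. The function name is hypothetical, and two choices are assumptions rather than part of the published recommendations: the function works on the magnitude of the change, and values falling between the quoted bands (for example 2.05 or 2.95) are assigned to the band that starts at the next lower quoted boundary.

```python
def interpret_poem_change(change: float) -> str:
    """Map a change in POEM score to the published interpretation bands."""
    size = abs(change)  # assumption: only the magnitude of the change is used
    if size <= 2:
        return "unlikely to be a change beyond measurement error"
    if size < 3:
        return "small change, likely beyond measurement error but may not be clinically important"
    if size < 4:
        return "probably a clinically important change"
    return "very likely to be a clinically important change"


for value in (1.5, 2.5, 3.2, 4.0):
    print(value, "->", interpret_poem_change(value))
```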

| METHODS
Secondary analysis was conducted on datasets from five clinical trials including children from the United Kingdom. Datasets were chosen due to availability, so may not be inclusive of all eczema trials within the United Kingdom. As this study made secondary use of existing trial datasets, further ethics approval was not required, and this was confirmed by the University of Nottingham's Faculty of Medicine & Health Sciences Research Ethics Committee (Ref: 258-1712). The protocol was prospectively registered on the CEBD protocol registration portal: https://www.nottingham.ac.uk/research/groups/cebd/resources/protocol-registration.aspx.

WHAT IS ALREADY KNOWN ABOUT THIS TOPIC?
• Little is known about how participant characteristics might influence the difference that is important to detect when calculating sample size requirements for clinical trials.

WHAT DOES THIS STUDY ADD?
• Using the combined dataset across UK clinical trials including children with eczema, POEM scores corresponding to the standardized effect size 0.5 SD (baseline) were remarkably consistent across participants of varying ages, gender, ethnicity and disease severity.
• This provides reassurance for those designing or interpreting the results of eczema clinical trials that have used POEM as their primary outcome that the specified difference can be consistent in a variety of eczema populations and recruitment settings.
Baseline POEM scores and patient characteristic variables from each trial were combined in one dataset in Stata version 14. To explore the impact of disease severity, we calculated the overall mean, standard deviation and 0.5 SD of baseline POEM for each trial separately (as each trial recruited participants with different eczema severities). We also combined individual participant data to calculate the mean, standard deviation and 0.5 SD for the overall sample and for each age, ethnicity and gender category.
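The subgroup summaries described above can be sketched as follows. This is an illustration in Python rather than the Stata code used in the study; the records, trial labels and function name are hypothetical, and the sample standard deviation (n − 1 denominator) is assumed.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (trial, age group, baseline POEM score on the 0-28 scale)
records = [
    ("trial_A", "0-2", 12), ("trial_A", "3-7", 18), ("trial_A", "3-7", 9),
    ("trial_B", "0-2", 20), ("trial_B", "8-17", 14), ("trial_B", "3-7", 25),
]


def half_sd_by(records, key_index):
    """Mean, SD and 0.5 SD of baseline POEM for each level of one variable."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[2])
    return {
        key: {"n": len(scores), "mean": mean(scores),
              "sd": stdev(scores), "half_sd": 0.5 * stdev(scores)}
        for key, scores in groups.items()
        if len(scores) >= 2  # an SD needs at least two observations
    }


by_trial = half_sd_by(records, 0)  # one summary per trial (disease severity)
by_age = half_sd_by(records, 1)    # one summary per age category, trials pooled
for key, summary in by_trial.items():
    print(key, round(summary["half_sd"], 2))
```

Grouping by trial mirrors the per-trial analysis of disease severity, while grouping by a characteristic across all records mirrors the pooled individual participant analysis.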

| RESULTS
Data from 1426 participants across five UK trials were combined. All five trials included children in the categories 0-2 years and 3-7 years. The CLOTHES, SWET and BATHE trials included some children in the 8-17 years category; however, this was the least frequent age category in all three trials. Gender was roughly equally distributed within all trials. Most participants across all trials were white (79%) (Table 1).
POEM scores corresponding to the standardized effect size from each of the five trials of children in the United Kingdom ranged from 2.68 to 2.95 (Table 2).
When the individual participant data from these trials were combined and categorized according to age, ethnicity and gender, the POEM scores corresponding to the standardized effect size ranged from 2.99 to 3.45.

| DISCUSSION
The POEM score corresponding to the standardized effect size was remarkably consistent across participants of varying ages, gender, ethnicity and disease severity. Previous studies that have used both anchor- and distribution-based methods to estimate the MIC found that anchor-based methods suggested a larger difference would need to be detected than the standardized effect size approach. 7,8 Our findings are consistent with previous findings that a score less than 2 is unlikely to be a change beyond measurement error and support a stance that the specification of a target difference should be no lower than 2 when designing trials, as all 0.5 SD scores were above 2 points across the different participant characteristics. Previous research also suggests that changes in POEM scores less than 3 are unlikely to be clinically relevant. However, researchers and clinicians need to be cautious about relying on fixed values to interpret the importance of a change on the POEM and continue to consider the context within which they are using the POEM. A small improvement in many individuals could result in a large reduction in burden at a societal level. It has been cautioned that the SD used to calculate the standardized effect size needs to reflect the SD for the population of interest for the results to be used appropriately. 3,16 This study should give reassurance that the participant characteristics of age, ethnicity, sex and disease severity do not appear to influence the SD.
Given that datasets were chosen based on availability, they may not be fully representative of eczema trials including children within the United Kingdom. There are also limits on how confidently the results from this study can be applied to wider populations beyond children in the United Kingdom, such as those nations with larger documented ethnic and economic disparities in health outcomes. Further exploration in different populations is warranted. We can compare our results with the 0.5 SD we have calculated for the trial of methotrexate versus azathioprine for severe atopic dermatitis (MAcAD) (Dutch trial register: NTR1916). 17 MAcAD is a Dutch trial that included 43 adults with severe eczema, of whom 20 (46.5%) were female and 38 (88.4%) reported white ethnicity. The 0.5 SD from this sample was 2.32. This is slightly lower than that of the UK trials in children, but still relatively consistent with the results. 17
There are characteristics that vary amongst individuals that may influence the target difference that should be specified in a clinical trial. Some examples include body surface area of involved skin, phenotype, and whether the POEM was completed by the parent or the child themselves. We were limited to exploring variables that were collected in the trial datasets.
Note: Age, ethnicity and gender are the combined data from all five trials. a Q1 = lower quartile, Q3 = upper quartile.

The POEM score corresponding to the standardized effect size does not appear to be influenced by age, gender, ethnicity or disease severity of the population, which provides useful information for those designing or interpreting trials with POEM as the primary outcome.