Development of a core descriptor set for Crohn's anal fistula

Crohn's anal fistula (CAF) is a complex condition, with no agreement on which patient characteristics should be routinely reported in studies. The aim of this study was to develop a core descriptor set of key patient characteristics for reporting in all CAF research.

Recent publication of a core outcome set (COS) has highlighted significant challenges in CAF research methodology, emphasizing the importance of transparent, patient-relevant reporting to support comparable research [4]. Standardized reporting of outcomes is only one aspect of the three main challenges facing CAF researchers [5]. Other challenges include heterogeneous reporting of surgical interventions and their methods, and the variable descriptions of the patient population included in the research. Work to rectify the heterogeneity in surgical intervention reporting is already under way, beginning with a recently conducted analysis of the variation within reporting [6].
The third area of methodological challenge, the description of the sample population, is particularly relevant in CAF. CAF can range from a minimally symptomatic condition that is well controlled on medical therapy, to a debilitating, complex condition with a significant negative impact on quality of life. To understand why some patients respond well while others do not, the patients studied must be adequately described. Previous work has identified prognostic characteristics for CAF [7] that remain inconsistently reported. A standardized approach to reporting patient characteristics may bring several benefits: ensuring external validity, understanding disease phenotypes associated with varied outcomes, allowing deeper comparison in systematic review and ultimately enabling clinicians to advise patients of the most effective CAF treatment for their clinical situation.
The aim of this study was to define a standardized set of patient characteristics, known as a 'core descriptor set' (CDS), which defines the minimum patient characteristics to be reported in future research in CAF.

METHOD
This study took the form of a modified Delphi consensus exercise, with the addition of principal component analysis (PCA) to interpret data structure. It was developed with reference to the COS-STAD guidelines [8] and reported using COS-STAR guidelines [9].

Scope
This CDS was developed with the intention of using it in adult cohorts or randomized trials investigating medical and/or surgical treatments in patients with CAF.

Steering group
The steering group was drawn from gastroenterologists and surgeons with a research interest in inflammatory bowel disease (IBD).
The steering group was primarily drawn from the UK-based ENiGMA (CAF) research network, with international collaborators identified from the USA and Sweden. The clinical steering group included patient and public representatives.

Delphi design and participants
An overview of the modified Delphi method is shown in Figure 1.
Three rounds of an online Delphi survey were conducted, in which participants were asked to rate a longlist of descriptors by importance, on a Likert scale of one to nine. The first round was an open internet survey distributed over social media to experts in any healthcare discipline with experience of managing patients with IBD. Email invitations were sent through professional email contacts and societies linked to the steering group. Subsequent rounds were only open to participants who had completed all previous rounds.
Feedback between rounds encouraged participants to compare their responses with those from their own and other professional groups, and the overall cohort.

Longlist generation
The initial longlist for the Delphi survey was drawn from constituent studies of three systematic reviews on the topic [4,5,10] and two more recently published randomized trials [9]. Items reported in tables of baseline characteristics of CAF patients included in the studies were extracted. Where descriptors were reported with numerical cut-offs (e.g. age > 40 years, white cell count > 15 × 10⁹/mm³), the cut-offs were removed to retain just the main descriptor.
Where scoring systems were reported, these were split into their constituent components and each component was listed separately.
The longlist was reviewed by the steering group for completeness and clarity of language and meaning.
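The cut-off removal described above can be illustrated with a minimal sketch. The source does not say how this step was carried out, and it may well have been done by hand; the regular expression, the function names `strip_cutoff` and `build_longlist`, and the exact-match duplicate check are all assumptions made for illustration. In particular, duplication of concept requires judgement that a string comparison cannot capture.

```python
import re

# Assumed pattern: a comparison operator followed by a number, plus anything
# after it (units, exponents), is treated as the numerical cut-off.
_CUTOFF = re.compile(r"\s*(?:[<>]=?|[≤≥])\s*\d.*$")


def strip_cutoff(descriptor: str) -> str:
    """Return the descriptor with any trailing numerical cut-off removed."""
    return _CUTOFF.sub("", descriptor).strip()


def build_longlist(raw_items):
    """Strip cut-offs and drop case-insensitive exact duplicates.

    First-seen order is kept. This is a simplification: it does not detect
    two differently worded items that describe the same concept.
    """
    seen = set()
    longlist = []
    for item in raw_items:
        name = strip_cutoff(item)
        key = name.lower()
        if name and key not in seen:
            seen.add(key)
            longlist.append(name)
    return longlist
```

For example, `strip_cutoff("age > 40 years")` yields `"age"`, so that the descriptor is rated on its own merits rather than at one study's threshold.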

Consensus group participants
Participants who completed all three rounds of the Delphi survey were eligible to participate in the final consensus meeting to determine those descriptors to be included in the final list. Purposive sampling of interested participants ensured a global and multidisciplinary consensus group.

What does this paper add to the literature?
This study has established a consensus between gastroenterologists and colorectal surgeons on patient characteristics that should be described in future studies of Crohn's anal fistula. This may help better identification of different phenotypes and subgroups of patients.

Survey design
Round one presented all descriptors on one page in a random order.
Participants were asked to rate descriptors by importance for inclusion in future studies on a Likert scale of one to nine: 1-3 represented 'low importance', 4-6 'neutral importance' and 7-9 'high importance'. At the end of the round one survey, participants could also propose additional descriptors for assessment in round two.
These were reviewed by the steering group to ensure clarity of phrasing and avoid repetition of already assessed items.
Descriptors included in rounds two and three were presented in groups based upon PCA components from round one ratings. This approach is described below. New descriptors added to the longlist following round one were assigned to a group generated by PCA by the steering group. Random order of the descriptors in each group was used in rounds two and three of the survey.

Principal component analysis
Principal component analysis is a dimension reduction technique that gathers items into conceptual groups known as components, as identified by patterns in item ratings [7]. Items that have similar rating patterns across raters are grouped together, suggesting a relationship or common idea underpinning them. For example, a study might find that procalcitonin, C-reactive protein, white cell count and interleukin-6 were all correlated (rated consistently high or low). Using PCA, these could be grouped into a component called 'inflammation'. Likert ratings of each item were included in the analysis, which was performed using varimax rotation. Appropriateness of the data for PCA was determined using Bartlett's test of sphericity and the Kaiser-Meyer-Olkin test for sampling adequacy. As all items in round one were mandatory, there were no missing data points. Components were identified using the eigenvalue method, where the eigenvalue of the component was greater than or equal to one. PCA is a reactive statistical technique; therefore, the loading threshold which generates components was set post-analysis. Items which loaded on more than one component were allocated to the component with the greatest loading value after review by the steering group.
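The two decision rules in this paragraph (retaining components with an eigenvalue of at least one, and allocating cross-loading items to the component with the greatest loading) can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: the eigenvalues, loadings and the 0.4 loading threshold are hypothetical, the function names are invented here, and the steering group review of cross-loading items is not modelled.

```python
def retained_components(eigenvalues, minimum=1.0):
    """Indices of components whose eigenvalue is >= minimum."""
    return [i for i, ev in enumerate(eigenvalues) if ev >= minimum]


def allocate_items(loadings, kept, threshold):
    """Assign each item to the retained component with the greatest absolute loading.

    loadings: dict mapping item name -> list of rotated loadings, one per component.
    kept: indices of retained components (must not be empty).
    Items whose greatest loading is below the threshold stay unallocated (None).
    """
    allocation = {}
    for item, row in loadings.items():
        best = max(kept, key=lambda c: abs(row[c]))
        allocation[item] = best if abs(row[best]) >= threshold else None
    return allocation


# Hypothetical values, for illustration only.
eigenvalues = [3.2, 1.4, 0.7]
loadings = {
    "C-reactive protein": [0.81, 0.12, 0.05],
    "white cell count": [0.62, 0.45, 0.10],  # loads on two components
    "fistula-related pain": [0.08, 0.77, 0.20],
    "smoking status": [0.21, 0.18, 0.66],  # loads only on a dropped component
}

kept = retained_components(eigenvalues)
groups = allocate_items(loadings, kept, threshold=0.4)
```

In this toy example the third component is dropped because its eigenvalue is below one, 'white cell count' is allocated to the first component because 0.62 exceeds 0.45, and 'smoking status' remains unallocated because it does not reach the assumed threshold on any retained component.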

Inclusion criteria
Criteria for inclusion and exclusion of descriptors were defined a priori. Descriptors were included if they were unique, concerned CAF or Crohn's disease, and presumed a confirmed diagnosis of CAF.
Descriptors were excluded if they were duplicates, over-ambiguous and undefinable, or if they described characteristics related to the diagnostic process of CAF. Descriptors could be sourced from the literature or be the suggestions of the steering group and participants. New descriptors could not be included after commencement of round two of the survey.
Figure 1. Outline of method. *Threshold for shortlisting: >70% of participants voted the descriptor as 7-9 (high importance).

Shortlisted descriptors
Thresholds for inclusion were set a priori; descriptors rated as 7-9 'high importance' by over 70% of each professional group were set aside after each round and automatically shortlisted for the consensus meeting. Three rounds were planned to ensure that any items proposed in round one had two opportunities to reach consensus for inclusion or exclusion. The descriptors shortlisted over the three rounds were grouped for presentation to the Patient and Public Involvement Group and consensus meeting.
Descriptor groups were generated by the steering group, based on the grouping of concepts in the underlying data elicited by PCA. Borderline descriptors were identified in round three, defined as those descriptors rated 7-9 'high importance' by 65%-69% of each professional group.
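The shortlisting and borderline rules can be expressed as a short sketch. The function names and the data layout are hypothetical. One point is an assumption rather than something stated in the text: a descriptor is treated here as borderline when every professional group reaches at least 65% but not every group exceeds 70%, which is one possible reading of '65%-69% of each professional group'.

```python
def proportion_high(ratings):
    """Share of ratings that fall in the 7-9 'high importance' band."""
    return sum(1 for r in ratings if 7 <= r <= 9) / len(ratings)


def classify_descriptor(ratings_by_group):
    """Return 'shortlisted', 'borderline' or 'not shortlisted'.

    ratings_by_group: dict mapping professional group -> list of 1-9 ratings.
    """
    shares = [proportion_high(r) for r in ratings_by_group.values()]
    if all(s > 0.70 for s in shares):
        # Rated 7-9 by over 70% of each professional group.
        return "shortlisted"
    if all(s >= 0.65 for s in shares):
        # Assumed reading of the borderline band (see lead-in).
        return "borderline"
    return "not shortlisted"
```

Applying the rule per group, rather than to the pooled cohort, means that a descriptor valued highly by one specialty alone is not shortlisted automatically.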

Patient and public involvement
Patient representation was included in the steering group and provided feedback on the development of the CDS, and is reported in line with GRIPP-2 SF [11]. The aim of patient involvement was to inform the steering group about the burden and acceptability of recording of items in the descriptor set. Feedback was conducted via multiple virtual discussions about the shortlisted descriptors and their groups, and how they might be measured. This involved sharing the longlist prior to the meeting and then discussion of each item on the list. Patient representatives were asked about their overall impression of the shortlist, borderline descriptors and grouping of individual descriptors. The potential added patient burden of having a minimum standard for descriptors measured was also discussed.

Consensus meeting
A virtual consensus meeting was convened to discuss and vote upon each change to the proposed final set as defined by the Delphi surveys.
Changes which were voted 'yes' by 80% or more of voting participants were finalized. Planned votes included the inclusion of borderline descriptors, combination and rewording of descriptors as proposed by the steering group. Spontaneous votes could include the renaming and rearrangement of groups, and the combination and rewording of descriptors.
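As a small worked illustration of the 80% rule, the check below uses integer arithmetic to avoid rounding at the boundary. The function name is hypothetical, and it is an assumption that only votes actually cast count towards the denominator.

```python
def change_is_finalized(yes_votes, votes_cast, threshold_percent=80):
    """True when 'yes' votes make up at least threshold_percent of votes cast."""
    if votes_cast <= 0:
        raise ValueError("at least one vote must be cast")
    if not 0 <= yes_votes <= votes_cast:
        raise ValueError("yes_votes must be between 0 and votes_cast")
    return yes_votes * 100 >= threshold_percent * votes_cast
```

With 15 voting participants, for example, 12 'yes' votes (exactly 80%) would finalize a change, whereas 11 would not.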

Longlist of descriptors
Ninety-six descriptors were eligible for the longlist. Of these, 83 descriptors were longlisted (Appendix 2) and rated in round one; the most common reason for removal was duplication of concept.
An additional 14 descriptors were generated from the comments submitted in round one and rated in subsequent rounds (Figure 2).

Responses and respondents
Round one received 133 unique responses from three healthcare professional roles (gastroenterologist, colorectal surgeon, IBD nurse specialist) and 22 countries (Table 1).

Shortlisted descriptors
Most shortlisted descriptors (22 out of 40 submitted to consensus) were extracted in round one, and the fewest were extracted in round three (Table 2). Of the 40 descriptors shortlisted, 31 were included in a PCA component, and five were not included in any PCA component group as they were generated after round one. Descriptors tended to belong in the fistula complexity, immunomodulation and biological therapy, quality of life, and infection-related or smaller components.
The six proposed groups for the CDS were loosely based on these components: fistula anatomy, disease activity, risk factors, medical interventions for CAF, surgical interventions for CAF and quality of life.

Final core descriptor set
The finalized CDS for CAF contained 37 descriptors within six groups (Table 3). 'Best' methods of measurement for each descriptor cannot currently be described. Group F, 'Patient symptoms and impact on quality of life', comprises six descriptors, and the Group F methods of measurement should also explicitly assess pain, impact on sitting down and ability to defaecate.

DISCUSSION
This study has completed an international consensus process to agree key patient and disease descriptors to be reported in [...] [14]. This process can help in reduction of initial longlists of candidate instruments.
Table 1. Respondent characteristics from each round: number (percentage of that characteristic in each round). Columns: Round 1, Round 2, Round 3. Rows: Healthcare professional role.
The treatment of CAF often follows a complex pathway, with many opportunities for tailoring or personalizing care [15]. The items included in the CDS reflect a range of factors that inform clinical decision-making surrounding the range and timing of treatments offered to patients with CAF [16,17]. The CDS covers some of the key descriptors seen in current clinical guidelines [16,17] and in previous research which identifies prognostic factors [7]. Many of these items are covered in moderate detail in the CDS, including [...] [18,19].
It is notable that genetic markers have not been included in the final CDS, despite studies demonstrating their prognostic relevance [5,20]. As genetic markers are not routinely used in practice, or may be some way off full validation for a prognostic role, their presence may not directly inform management. Therefore, they may not currently be considered a useful descriptor by clinicians. This reflects the wider challenge as we move towards precision medicine in IBD: having access to prognostic markers that might predict the natural disease course, identifying the risk of specific presentations or complications of disease or responsiveness to specific treatments [21].
This study has several strengths. It followed recommendations set out for the development of COSs [8], and mirrored standards for defining disease [22], which are comparable to this study design. In addition, the methodology was validated by PCA to define the structure and theories underlying participant rating behaviour.
Participants were drawn from several countries, continents, healthcare systems and clinical specialties, which we believe will improve the external validity of this study.
While every effort was made to identify and extract patient descriptors from previous studies, it is possible that some descriptors were not identified at longlisting or at the later opportunities to add items. Some descriptors alone, for example fistula-related quality of life, might not be considered sufficiently descriptive. The integration of patients into this work highlighted the need to focus on subdomains or aspects of these descriptors, such as pain and incontinence.
Future iterations of the method should ensure patient participation during longlisting, with a focus on identifying key baseline symptoms. In this case, pain was felt to be a key baseline descriptor driving patient and clinician decisions. However, it is inconsistently recorded in the literature. There was a drop-out in participation over the study, mostly between rounds one and two. This rate and pattern of drop-out broadly matches that seen in other Delphi studies [22,23]. This does introduce the risk of attrition or selection bias into the dataset. Most participants were from Europe or the USA, with smaller numbers from other countries that may have moderate to high levels of IBD. It is not known whether clinicians from other geographical areas might have differing views on key characteristics.
It might be argued that this represents a weakness of this Delphi process.
This is one of the first attempts to develop a CDS, and the methodology is developing as the research team learns from the process.
Key considerations so far include:
- Generation of a comprehensive list of descriptors from the literature might be achieved with a 'saturation'-based approach (no new descriptors identified in five or ten papers), rather than a comprehensive systematic review. This could reduce the set-up workload.
- Engagement of interested parties from different disciplines, and from a range of countries, is needed to ensure external validity and potential wide uptake of the descriptor set.
- There should be regular reminders to participants and the steer[...]
However, clinicians may baulk at the list of items. It should be considered that the likely tools used to collect these data are already in use. For example, items in domain F might easily be captured using the CAF-QOL tool [24]. Other domains may equally be covered by commonly used tools. Likewise, research funders will undoubtedly recognize the need for inter-study comparisons and overcoming barriers to implementation of research findings. Descriptors relevant to both surgical and medical treatment feature, as a reflection of the interdisciplinary collaboration. Given that most patients are treated with a combination of medical and surgical therapy [25], these descriptors will be relevant to most studies and should be reported, regardless of which primary outcome from the agreed COS is chosen by CAF researchers [12]. Additionally, in studies with multiple follow-up stages, and in observational studies, it may be useful to re[...]
Work to improve standardized description of patients with CAF is ongoing, and recent studies have used higher-level phenotypes for classification [13]. Further work is required to define included descriptors in a way that is clinically and prognostically relevant [26].
We also recognize that the field continues to evolve, and the descriptor set will need to be reviewed in the future. Such a revision may include reassessment of clinical descriptors, refined imaging parameters and inclusion of genetic markers or other personalized treatment stratifiers [27,28]. The implementation of a CDS into routine clinical practice will take time, and require the engagement of researchers, funders and journal editors.
In conclusion, this study has achieved agreement on a 'core' list of patient descriptors to be reported in all clinical studies of CAF. Use of this in conjunction with an appropriate COS [12] might provide a strong foundation for studies. Future work might include the use of this CDS in CAF registries and adequately powered cohort studies, to evaluate current classification strategies [29] and to identify a range of key phenotypes, allowing more precise treatment strategies and predictors of success and failure of potential treatments [30].

FUNDING INFORMATION
None.

CONFLICT OF INTEREST
No conflicts of interest to declare.

This study was approved by the University of Sheffield Research Ethics Committee (Ref: 034049).

DATA AVAILABILITY STATEMENT
The data that supports the findings of this study are available in the