The modified COVID‐19 Yorkshire Rehabilitation Scale (C19‐YRSm) patient‐reported outcome measure for Long Covid or Post‐COVID‐19 syndrome

Abstract
Background: The C19‐YRS is the literature's first condition‐specific, validated scale for patient assessment and monitoring in Post‐COVID‐19 syndrome (PCS). The 22‐item scale's subscales (scores) are symptom severity (0–100), functional disability (0–50), additional symptoms (0–60), and overall health (0–10).
Objectives: This study aimed to test the scale's psychometric properties using Rasch analysis and to modify the scale based on the analysis findings, emerging information on essential PCS symptoms, and feedback from a working group of patients and professionals.
Methods: Data from 370 PCS patients were assessed using a Rasch Measurement Theory framework to test model fit, local dependency, response category functioning, differential item functioning, targeting, reliability, and unidimensionality. The working group made iterative changes to the scale based on the psychometric results and to include essential symptoms.
Results: Symptom severity and functional disability subscales showed good targeting and reliability. Post hoc rescoring suggested that a 4‐point response category structure would be more appropriate than an 11‐point response structure for both subscales. Symptoms with binary responses were placed in the other symptoms subscale. The overall health single‐item subscale remained unchanged.
Conclusion: A 17‐item C19‐YRSm was developed with subscales (scores): symptom severity (0–30), functional disability (0–15), other symptoms (0–25), and overall health (0–10).


| BACKGROUND
Long Covid (LC) is a term coined by patients. It refers to persistent symptoms 4 weeks after contracting COVID-19. 1 Ongoing symptomatic COVID-19 and post-COVID-19 syndrome (PCS) are the scientific terms for symptoms 4-12 and >12 weeks after the illness, respectively. 2 PCS affects more than two million individuals in the UK alone and more than 50 million worldwide. 3 More than 200 symptoms across 10 organ systems have been reported. The most common symptoms are breathlessness, fatigue, palpitations, dizziness, pain, brain fog (cognitive problems), anxiety, depression, posttraumatic stress, skin rash, and allergic reactions. 4 It can be a remitting and relapsing condition with a prolonged course causing significant distress and disability in some individuals. 5
A multidisciplinary team (MDT) of rehabilitation professionals working with patients recovering from COVID-19 during the first wave of the pandemic developed the original version of C19-YRS. [6][7][8] The content was based on staff experience in managing these patients, knowledge from our systematic review of previous outbreaks, and feedback on the scale from patients and healthcare professionals. [7][8][9] The content was decided using a consensus method. The scale was kept balanced, with questions spanning all aspects of the 2001 WHO International Classification of Functioning, Disability and Health (ICF) framework. 10 The scale's content validity was supported by studies 11,12 using the scale, which revealed symptoms and functional problems similar to those in other PCS studies reported in the literature. 13,14 C19-YRS was the first validated scale reported in the literature to capture PCS symptoms and grade the severity of symptoms and functional disability in PCS. The scale has been recommended in the NHS England Clinical Guidance for PCS services and NICE rapid guidelines. 2,15 The scale has been translated into numerous languages and is currently used in many PCS studies worldwide.
A digital format of the scale is also available: the patient can complete the questionnaire on a smartphone application, the clinician can access the results on a web portal, and both can use the system to monitor progress and response to ongoing treatments for PCS. 8
The original C19-YRS is a 22-item patient-reported outcome measure (PROM). Each item is rated on a 0-10 numerical rating scale, where 0 represents a symptom that is not present and 10 a symptom that is extremely severe or life disturbing. The C19-YRS has four subscales concerned with the severity of patients' key symptoms, functional limitations, overall health, and additional symptoms. The scale also captures pre-COVID scores for comparison. 8 Questions 1-10 form the symptom severity subscale (score 0-100), Questions 11-15 the functional disability subscale (0-50), Question 16 is the overall health score (0-10), and Questions 17-22 the additional symptoms subscale (0-60). 16
The classical psychometric analysis of the C19-YRS in a sample of 188 PCS patients showed good data quality, good scaling and targeting, and high internal consistency (Cronbach's α = 0.891), with good reliability of individual subscales. 16 Some items, such as swallowing, incontinence, fever, and skin rash, were identified as having poor scaling assumptions and targeting. It was determined that the contribution of these items to the overall measurement properties of the scale was limited. 16 Although the classical psychometric analysis of the C19-YRS was promising, a further analysis using modern psychometric approaches (Rasch analysis) was included as part of the C19-YRS development plan. The Rasch model 17 is a unidimensional measurement model that satisfies the assumptions of fundamental measurement, 18,19 meaning it provides a measurement template against which scales can be tested.
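The subscale structure of the original 22-item C19-YRS described above can be sketched in a few lines of Python. This is an illustrative helper only, not the authors' scoring software: the function name `score_c19_yrs`, the `SUBSCALES` mapping, and the assumption that each question is represented by a single 0-10 rating keyed by its question number are all hypothetical.

```python
# Hypothetical scoring helper for the original 22-item C19-YRS.
# Assumption: `responses` maps question number (1-22) to one 0-10 rating.

SUBSCALES = {
    "symptom_severity": range(1, 11),        # Questions 1-10, score 0-100
    "functional_disability": range(11, 16),  # Questions 11-15, score 0-50
    "overall_health": range(16, 17),         # Question 16, score 0-10
    "additional_symptoms": range(17, 23),    # Questions 17-22, score 0-60
}


def score_c19_yrs(responses):
    """Return the four subscale totals as a dict of sums."""
    # Validate that every question has a rating inside the 0-10 range.
    for question in range(1, 23):
        rating = responses.get(question)
        if rating is None:
            raise ValueError(f"missing response for question {question}")
        if not 0 <= rating <= 10:
            raise ValueError(f"question {question}: rating {rating} is outside 0-10")
    # Each subscale score is the plain sum of its item ratings.
    return {
        name: sum(responses[q] for q in questions)
        for name, questions in SUBSCALES.items()
    }


# Synthetic example: every question rated 5.
example = {q: 5 for q in range(1, 23)}
print(score_c19_yrs(example))
```

Summing raw ratings in this way is exactly the practice that the Rasch analysis is meant to scrutinize: a summed score is only interpretable if the items in a subscale work together as a single measure.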
Rasch Measurement Theory (RMT) provides a way to assess the validity of multi-item latent scales where the items (questions) are summed together to form an overall total score. RMT provides a unified framework for several aspects of internal construct validity to be assessed, where it can highlight measurement anomalies within an item set. It should be emphasized that this C19-YRS development phase was intended to identify any specific measurement issues that would inform the development of a psychometrically robust modified version of the C19-YRS.
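For readers unfamiliar with the model, its standard textbook forms are given here as background. The text does not state which parameterization was applied in the analysis, so the notation is an assumption: $\theta_n$ denotes the location of person $n$, $b_i$ the location of item $i$, and $\tau_{ik}$ the $k$-th threshold of item $i$, all on the same logit scale.

```latex
% Dichotomous Rasch model: probability that person n affirms item i
P(X_{ni} = 1) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}

% Polytomous (partial credit) form for an item scored 0, 1, ..., m_i,
% with the empty sum (j = 0 or x = 0) defined as 0
P(X_{ni} = x) =
  \frac{\exp\left( \sum_{k=1}^{x} (\theta_n - \tau_{ik}) \right)}
       {\sum_{j=0}^{m_i} \exp\left( \sum_{k=1}^{j} (\theta_n - \tau_{ik}) \right)}
```

In the polytomous form, $\tau_{ik}$ is the point on the logit scale at which categories $k-1$ and $k$ are equally likely, which is why ordered thresholds are taken as evidence that a response structure works as intended.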

| Rasch analysis
Rasch analysis was completed with RUMM2030 software, 20 and carried out separately for the C19-YRS symptom severity subscale (10 items) and the functional disability subscale (5 items). The overall health score comprises a single item, which is treated independently from the other subscales and is therefore inappropriate for Rasch analysis. The additional symptoms subscale was not assessed, as these items provide supplementary information to the clinical staff rather than contributing to the symptom severity subscale.
Several tests of fit were carried out at the scale level and the item level; these are all described in more detail elsewhere. 21 All items were assessed for individual fit to the Rasch model relative to the subscale item set; this tests whether each item contributes to the same underlying construct. Misfit was indicated where items were significant at a Bonferroni-adjusted χ² p value or where standardized (z-score) fit residuals fell outside ±2.5. Tests of local dependency (LD) were carried out to determine whether the response to any item directly impacts any other item in the subscale; LD was indicated using a residual correlation (Q3 value) criterion cut point of 0.2 above the average residual correlation. 22 Response category functioning was assessed to determine whether the response structure of the items worked as intended.
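The local dependency criterion described above (a residual correlation more than 0.2 above the average residual correlation) can be illustrated with a minimal sketch. It assumes the standardized person-by-item residuals have already been exported from the Rasch software; the function names, the data layout, and the residual values are hypothetical and synthetic.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_local_dependency(residuals, margin=0.2):
    """Flag item pairs whose residual correlation exceeds average + margin.

    residuals: dict mapping item name -> list of residuals (one per person).
    Returns (flagged_pairs, average_residual_correlation).
    """
    pairs = list(combinations(sorted(residuals), 2))
    q3 = {pair: pearson(residuals[pair[0]], residuals[pair[1]]) for pair in pairs}
    average = sum(q3.values()) / len(q3)
    flagged = [pair for pair in pairs if q3[pair] > average + margin]
    return flagged, average


# Synthetic residuals for three hypothetical items and six persons.
residuals = {
    "item_a": [1.0, -1.0, 2.0, -2.0, 0.5, -0.5],
    "item_b": [0.9, -1.1, 2.1, -1.8, 0.4, -0.6],
    "item_c": [-1.0, 1.0, 0.5, -0.5, 2.0, -2.0],
}
flagged, average = flag_local_dependency(residuals)
print(flagged, round(average, 3))
```

Using a cut point relative to the average, rather than a fixed absolute value, accounts for the fact that the average residual correlation depends on the number of items in the set.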
A functional 0-10 response category structure for each item would be indicated by sequential response thresholds (the crossover points between adjacent response categories) on the underlying logit scale. 23 Item bias was assessed through uniform and nonuniform differential item functioning (DIF) testing by sex, age group, disease duration, and hospitalization status, with significant DIF indicated at a Bonferroni-adjusted analysis of variance (ANOVA) p value. Scale targeting was assessed graphically through the relative distribution of item and person locations.
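The threshold-ordering check described above amounts to verifying that an item's estimated thresholds increase monotonically along the logit scale. The sketch below is a hypothetical helper, and the threshold values are invented for illustration rather than taken from the study.

```python
def disordered_thresholds(thresholds):
    """Return positions k where threshold k is not above threshold k - 1.

    thresholds: list of estimated threshold locations (logits) for one item,
    ordered by response category. An empty result means the thresholds are
    sequential, i.e. the response structure works as intended.
    """
    return [
        k for k in range(1, len(thresholds))
        if thresholds[k] <= thresholds[k - 1]
    ]


# Synthetic examples: a well-ordered 4-category item (3 thresholds)
# and an item whose second threshold falls below its first.
print(disordered_thresholds([-1.8, 0.1, 2.0]))   # ordered
print(disordered_thresholds([-0.4, -1.2, 0.9]))  # disordered at position 1
```

An item with 11 response categories has 10 thresholds to estimate, so there are many more opportunities for disordering than with a 4-category item.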
Unidimensionality was evaluated by a series of t-tests, 24 with multidimensionality indicated when independent subsets of items delivered significantly different person estimates, and the lower bound 95% CI percentage of significantly different t-tests was >5%.   Table 1.
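The t-test procedure for unidimensionality can be sketched as follows. This is a simplified illustration under stated assumptions, not the routine implemented in the software used for the study: each person is assumed to have two location estimates with standard errors (one per item subset), a critical value of 1.96 is used for each t-test, and the confidence interval uses a normal approximation to the binomial. All names and values are hypothetical.

```python
from math import sqrt


def unidimensionality_check(estimates, critical_t=1.96, z=1.96, cutoff=0.05):
    """Summarize per-person t-tests between two item-subset estimates.

    estimates: list of (theta_a, se_a, theta_b, se_b) tuples, one per person.
    Returns the proportion of significant t-tests, the lower bound of its
    95% CI (normal approximation), and whether that bound exceeds the cutoff.
    """
    significant = 0
    for theta_a, se_a, theta_b, se_b in estimates:
        t = (theta_a - theta_b) / sqrt(se_a ** 2 + se_b ** 2)
        if abs(t) > critical_t:
            significant += 1
    n = len(estimates)
    proportion = significant / n
    ci_lower = proportion - z * sqrt(proportion * (1 - proportion) / n)
    return {
        "proportion": proportion,
        "ci_lower": ci_lower,
        "multidimensional": ci_lower > cutoff,
    }


# Synthetic data: 'agree' persons have similar estimates from both subsets,
# 'differ' persons have clearly different estimates.
agree = (0.0, 0.5, 0.1, 0.5)
differ = (1.5, 0.4, -0.5, 0.4)
print(unidimensionality_check([agree] * 97 + [differ] * 3))
print(unidimensionality_check([agree] * 80 + [differ] * 20))
```

Testing the lower bound of the interval, rather than the raw proportion, avoids declaring multidimensionality when a proportion slightly above 5% could plausibly arise by chance.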

| Symptom scale
Initially, 12 items were entered into the Rasch analysis. The … it also means that the three separate items should not all contribute to the total score of the symptom severity scale. Therefore, the breathlessness section was reconfigured so that only the maximum score observed across the three items was used, resulting in a single maximum breathlessness item. Initial Rasch analysis of the symptom severity scale (10 items, including a single maximum breathlessness item) looked promising but revealed certain measurement issues with the item set.
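The reconfiguration of the breathlessness section, in which three separate items are replaced by their maximum, can be illustrated with a short sketch. The item keys (`breathlessness_a`, `breathlessness_b`, `breathlessness_c`, `breathlessness_max`) and the function name are hypothetical placeholders, not the labels used in the scale.

```python
# Hypothetical keys for the three separate breathlessness items.
BREATHLESSNESS_ITEMS = ("breathlessness_a", "breathlessness_b", "breathlessness_c")


def collapse_breathlessness(responses):
    """Replace the three breathlessness ratings with their maximum.

    responses: dict mapping item key -> 0-10 rating.
    Returns a new dict; the input is left unchanged.
    """
    collapsed = {
        key: value for key, value in responses.items()
        if key not in BREATHLESSNESS_ITEMS
    }
    collapsed["breathlessness_max"] = max(
        responses[key] for key in BREATHLESSNESS_ITEMS
    )
    return collapsed


# Synthetic example with one other symptom item.
patient = {
    "breathlessness_a": 2,
    "breathlessness_b": 5,
    "breathlessness_c": 7,
    "fatigue": 6,
}
print(collapse_breathlessness(patient))
```

Taking the maximum means that breathlessness contributes one item's worth of score to the subscale total, so 12 entered items reduce to the 10 analyzed.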
Overall scale fit statistics are presented in Table 2. Three items displayed misfit on the χ² statistic (fatigue, continence, anxiety), with the continence item displaying the largest degree of misfit. … items displayed reversed thresholds. It was apparent that a 0-10 response structure was inappropriate for these items, as a logical progression of ordered response thresholds was not observed for any of the items (see Figure 1). The extent of the disordering varied with the nature and content of the item, with the continence and post-traumatic stress items particularly unsuited to this response structure. (… Figure 2). These items appear more suited to dichotomous or binary (yes/no) response categories.

TABLE 1 Demographics of participants
Overall scale fit statistics following rescoring are presented in Table 2. At this point, two items still displayed misfit on the χ² statistic (continence, anxiety), with the anxiety item also showing a fit residual of −2.66. The rescoring had little effect on the pairwise dependencies, which remained present as previously reported, and the scale-sample targeting was good (see Figure 3). There was no DIF by sex, age group, or disease duration group. However, a uniform DIF by hospitalization status was observed for the PTSD item, with hospitalized patients having higher expected PTSD values than nonhospitalized patients.
Also, although it was not the intention of the study to determine this, distributional differences between certain demographic groups were observed. Significant score differences by sex (females more severely affected than males, p = 0.02), age group (people aged 50+ more severely affected than those below 50, p < 0.01), hospitalization status (hospitalized people more severely affected than those not hospitalized, p < 0.005), and BMI group (underweight group more severely affected than overweight, who are more severely affected than healthy weight, p < 0.001) were observed.
Further exploratory procedures suggested that the apparent dependency impacted the overall fit of the scale, as removal of either the depression item or the anxiety item resulted in a well-fitting, unidimensional scale (see Table 2). The dependency suggests that using a single item score for these two symptoms will work better than using separate scores from dependent items.

| Functional disability scale
Initial Rasch analysis of the functional disability scale (5 items) looked promising but revealed specific measurement issues with the item set. Overall scale fit statistics are presented in Table 2. At this point, only one item was borderline misfitting on the χ² statistic (ADL).
A pairwise dependency was observed between mobility and personal care. … Table 2.

TABLE 2 Rasch analysis summary statistics of C19-YRS subscales
At this point, one item still displayed misfit on the χ² statistic (personal care), and the previously observed pairwise dependency between mobility and personal care was still present. There was no DIF by sex, disease duration group, or hospitalization status, although the mobility item displayed slight DIF by age. The scale-sample targeting was good (Figure 4).
As with the symptom severity scale, distributional differences between demographic groups were observed, with mean score differences by sex (females more severely affected than males, p = 0.02), age group (people aged 50+ more severely affected than those below 50, p < 0.01), hospitalization status (hospitalized people more severely affected than those not hospitalized, p < 0.005), and BMI group (underweight group more severely affected than overweight, who are more severely affected than healthy weight, p < 0.05).
The working group and emerging evidence suggested that, even though these symptoms are not present in all patients, they need capturing, as they can be a cause of concern to patients and need addressing by clinicians. … patients in the UK whose symptoms and functional limitations will be captured using C19-YRSm at regular 3-monthly intervals. 26 We will also evaluate the respondent burden of completing the measure within the population. We will assess the use of digital tools, which can be challenging in certain cohorts (such as those with cognitive problems and those who do not use smartphones). The …

| Using the scale
The C19-YRSm is free to use (Supporting Information: Appendix I), and the MS Word/PDF copy of the tool is available on the University of Leeds website. The digital PROM system developed by ELAROS comprises a smartphone application for the patient and a web portal for the clinicians managing the patient's care. The digital system includes the C19-YRSm and other scales used in PCS care and is currently being used in more than 30 NHS Trusts in the UK. Any clinical service worldwide wishing to acquire the digital system can contact ELAROS, who will demonstrate the system and provide the necessary training to its users.
The University of Leeds and the authors hold the copyright for the scale. The scale will remain free to use. Any organization wishing to administer the scale to patients for a charge or add the scale to a commercial digital platform should contact the University of Leeds or the corresponding author to seek the required approvals.

AUTHOR CONTRIBUTIONS
Manoj Sivan is the project lead and conceptualized the study.