Minimal Clinically Important Differences for Burke‐Fahn‐Marsden Dystonia Rating Scale and 36‐Item Short‐Form Health Survey

Although an increasing number of trials on the treatment of generalized or segmental isolated dystonia are being reported, the minimal clinically important difference thresholds for the most frequently reported outcome measures are still undetermined.

There is increasing research into health-related quality of life (HRQoL) in dystonia. 1 Currently available data strongly suggest that patients with dystonia generally experience lower levels of HRQoL than do healthy individuals. 1 Several factors seem to determine the level of disability related to dystonia, including, but not limited to, the objective severity of dystonia and the presence of nonmotor symptoms, such as anxiety and depression, sleep disturbances, and pain. 2 Given that treatments for dystonia aim to relieve these problems in a holistic manner, measuring the changes in HRQoL can be an adequate way to evaluate the effectiveness of therapeutic interventions. The Burke-Fahn-Marsden Dystonia Rating Scale (BFMD-RS; frequently also reported as the Burke-Fahn-Marsden Motor Scale), Burke-Fahn-Marsden Dystonia Disability Scale (BFMD-DS), and 36-Item Short Form Health Survey (SF-36) are among the most frequently reported outcome measures in such trials. 3,4 The BFMD-RS is currently the only instrument that is recommended by the International Parkinson and Movement Disorder Society for evaluating the severity of generalized dystonia. 5 Other available scales, such as the Global Dystonia Rating Scale and the Unified Dystonia Rating Scale, are only suggested by the task force for such a purpose. 5 A PubMed search using two terms ("dystonia" AND "Burke-Fahn-Marsden Dystonia Rating Scale") for All Fields, conducted on November 20, 2019, found 209 publications. A considerable proportion of these publications described deep brain stimulation (DBS) interventions 6-17 and reported changes in the scores of the BFMD-RS and the BFMD-DS as their outcomes. According to the results of a recent analysis, the SF-36 is the most commonly used tool in studies reporting aspects of HRQoL in dystonia. 1
Consequently, the BFMD-RS, BFMD-DS, and SF-36 are widely used in clinical research for dystonia; however, their minimal clinically important difference (MCID) thresholds have not yet been established. Although the first randomized and controlled trials on DBS for dystonia used an arbitrary >25% improvement in BFMD-RS scores as the indicator of clinical relevance, 13,14 the accuracy and feasibility of this approach have never been tested.
In clinical practice, a discrepancy may exist between levels of improvement or worsening in the objective severity of disease and the changes perceived by the patient. For example, the level of symptomatic improvement reported by the patient may be superior to that captured by clinicians. 18,19 Therefore, the sole use of threshold values established by only objective estimations for detecting clinically relevant changes in the severity of patients' symptoms may lead to distortions during evaluation of the efficacy of a treatment. However, the use of MCID threshold values, which reflect the smallest changes in an outcome measure that are meaningful to patients, may serve as a more feasible approach for revealing clinically important changes and contribute thereby to a more reliable translation of clinical outcomes into clinical practice.
Because no clinimetrically verified threshold values for detecting minimal but clinically relevant changes are available for the BFMD-RS, BFMD-DS, and SF-36 in generalized and segmental isolated (primary) dystonia, we aimed to determine these thresholds.

Materials and Methods
The study protocol was similar to the procedure Makkos and colleagues used to establish MCID estimates for the Unified Dyskinesia Rating Scale. 20 A consecutive series of patients aged >18 years with idiopathic and inherited (torsin family 1 member A [TOR1A] positive) segmental and generalized isolated dystonia were enrolled with the ethical approval of the Regional and Institutional Ethical Committee (3617.316-24987/KK41) in the Department of Neurology, Pécs, Hungary, between 2013 and 2019. None of the included patients had structural brain abnormalities capable of producing dystonia, hallmarks of neurodegeneration (eg, brain iron accumulation) according to MRI, or other known causes of acquired dystonia, including neuroleptic use. The diagnosis of dystonia was confirmed by a neurologist specialized in the diagnosis and treatment of movement disorders.
In addition to demographic, treatment, and disease-related data, the BFMD-RS, 3 BFMD-DS, 3 and the SF-36 4 were obtained at baseline. With respect to the SF-36, the Physical Component Summary (PCS) and Mental Component Summary (MCS) scores were also calculated in addition to the scores of the eight subscales (physical functioning, social functioning, role limitations because of physical problems, role limitations because of emotional problems, mental health, energy and fatigue, bodily pain, and general health). 4 Although it is not recommended to calculate a single measure of HRQoL based on the individual SF-36 domains, the SF-36 Global Score, which has previously been called Total or Overall Score, has been increasingly reported during the past 20 years. 21 Therefore, we also generated such a global measure by the arithmetic averaging of the scores of the eight subscales. 21-26 Neurocognitive performance was also measured to detect major neurocognitive disorder (Montreal Cognitive Assessment [MoCA] score < 20.5), 27,28 which served as an exclusion criterion.
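The Global Score described above is a plain arithmetic mean of the eight subscale scores. A minimal sketch of that averaging follows; the function name, the dictionary keys, and the choice to reject incomplete records are illustrative assumptions, not part of the study protocol.

```python
from statistics import mean

# Hypothetical keys for the eight SF-36 subscales named in the text.
SF36_SUBSCALES = (
    "physical_functioning",
    "social_functioning",
    "role_physical",
    "role_emotional",
    "mental_health",
    "energy_fatigue",
    "bodily_pain",
    "general_health",
)


def sf36_global_score(subscale_scores):
    """Return the arithmetic mean of the eight SF-36 subscale scores."""
    missing = [name for name in SF36_SUBSCALES if name not in subscale_scores]
    if missing:
        # Assumption: an incomplete questionnaire is rejected, not imputed.
        raise ValueError(f"missing subscales: {missing}")
    return mean(subscale_scores[name] for name in SF36_SUBSCALES)
```

With made-up subscale scores of 60, 50, 25, 100, 70, 45, 40, and 50, the function returns 55.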
All enrolled patients were asked to return for follow-up every 12 months. At follow-up, the BFMD-RS, BFMD-DS, and SF-36 were reassessed. The magnitude of the perceived change in motor symptoms and disease-related difficulties since the last visit was measured using the Patient-rated Global Impression of Improvement (PGI-I) scale (1 = very much better; 2 = much better; 3 = a little better; 4 = no change; 5 = a little worse; 6 = much worse; and 7 = very much worse). 29 Relative changes in the BFMD-RS scores were calculated using the following formula: $(\text{score}_{\text{baseline}} - \text{score}_{\text{follow-up}}) / \text{score}_{\text{baseline}}$.
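The relative change in BFMD-RS score is the baseline score minus the follow-up score, divided by the baseline score, so a positive value means improvement (a lower follow-up score). A minimal sketch, with a hypothetical function name and an assumed guard against a zero baseline:

```python
def relative_change(score_baseline, score_follow_up):
    """Relative change: (baseline - follow-up) / baseline.

    Positive values indicate improvement (lower follow-up score),
    negative values indicate worsening.
    """
    if score_baseline == 0:
        # Assumption: the ratio is treated as undefined for a zero baseline.
        raise ValueError("relative change is undefined for a baseline of 0")
    return (score_baseline - score_follow_up) / score_baseline
```

For example, a drop from 40 to 30 points gives 0.25 (a 25% improvement), and a rise from 40 to 50 points gives -0.25.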

Statistical Analysis
The detailed methods for estimating MCID were described previously. 30 Briefly, MCID values were determined following the recommended strategy, 31 including the simultaneous use of both anchor- and distribution-based methods. Anchor-based methods estimate MCID by using an independent and clinically relevant tool that is both interpretable by itself and sufficiently strongly correlated with the evaluated instrument. 31,32 Spearman's correlation method was used to test whether correlation coefficients between the PGI-I and changes in BFMD-RS, BFMD-DS, or SF-36 reached the required minimum of 0.3. 31 Correlation coefficients were 0.443, 0.357, and 0.374, respectively. Although all correlation coefficients exceeded the required minimum, ordinal regression modeling was also performed between the PGI-I (dependent variable) and changes in scores of the BFMD-RS, BFMD-DS, and SF-36 to verify that the PGI-I is feasible to use as an anchor. 33 Subsequently, the first anchor-based method (within-patients score change method) compared changes in scores of the investigated instruments associated with the PGI-I score 4 (no change) to changes in the BFMD-RS, BFMD-DS, and SF-36 measures associated with the PGI-I score 5 (minimal worsening) and PGI-I score 3 (minimal improvement). The second anchor-based method (sensitivity- and specificity-based approach) used receiver operating characteristic (ROC) curve analysis to identify the MCID thresholds showing optimal specificity and sensitivity.
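The two anchor-based steps can be sketched in a few lines of Python. This is not the SPSS procedure used in the study. The function names are hypothetical, and selecting the ROC cut-off by maximizing Youden's J (sensitivity + specificity - 1) is an assumption, because the text does not state which optimality criterion was applied.

```python
from statistics import mean


def mean_change_by_anchor(changes, pgi_scores):
    """Within-patients score change: mean change per PGI-I category 3, 4, 5."""
    groups = {3: [], 4: [], 5: []}
    for change, pgi in zip(changes, pgi_scores):
        if pgi in groups:
            groups[pgi].append(change)
    return {pgi: mean(values) for pgi, values in groups.items() if values}


def youden_cutoff(changes, is_event):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    `changes` holds score changes oriented so that larger values favor the
    event; `is_event` flags visits with the anchor rating of interest
    (for example PGI-I = 3) against the reference (PGI-I = 4).
    """
    positives = [c for c, e in zip(changes, is_event) if e]
    negatives = [c for c, e in zip(changes, is_event) if not e]
    if not positives or not negatives:
        raise ValueError("both anchor groups must contain observations")
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(changes)):
        sensitivity = sum(c >= cutoff for c in positives) / len(positives)
        specificity = sum(c < cutoff for c in negatives) / len(negatives)
        j = sensitivity + specificity - 1
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

For worsening, the same function can be applied to sign-flipped changes, with PGI-I = 5 as the event and PGI-I = 4 as the reference.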
To ascertain the responsiveness of the PGI-I, a distribution-based approach was also used during the estimation of MCID values. Effect-size values (Cohen's d) were calculated 34 and, as has been recommended, changes in measures corresponding to a small effect size (approximately 0.2) were applied for determining the MCID cut-off values. 31,34 All statistical analyses were performed using the IBM SPSS software package (version 24.0.2; IBM Inc., Armonk, NY).
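The distribution-based step can be illustrated as follows. The sketch assumes one common formulation, in which the effect size is the mean score change divided by the standard deviation of the baseline scores, so that the change corresponding to a small effect is 0.2 times that standard deviation. The text does not spell out the exact standardizer, so this choice and the function names are assumptions.

```python
from statistics import mean, stdev


def effect_size(changes, baseline_scores):
    """Mean change standardized by the sample SD of the baseline scores."""
    return mean(changes) / stdev(baseline_scores)


def small_effect_change(baseline_scores, small_effect=0.2):
    """Score change corresponding to a small effect size (default 0.2)."""
    return small_effect * stdev(baseline_scores)
```

With illustrative baseline scores of 10, 20, 30, 40, and 50 points (sample SD about 15.8), a small effect corresponds to a change of about 3.2 points.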

Results
A total of 898 paired examinations of 198 patients were finally analyzed. The number of paired visits during which the change in the scores of the assessed scales was associated with a PGI-I score of 3, 4, or 5 was 517. The median number of follow-up visits was four, with a median intervisit interval of 12 months. Baseline characteristics of the study cohort are presented in Table 1.
Changes in treatment for dystonia during follow-up are shown in Supporting Information Table S1. A total of 136 patients (68.7%) were treated with DBS at the last follow-up.
Significant ordinal logistic regression models could be developed between the PGI-I and changes in BFMD-RS (Nagelkerke pseudo-R-square: 0.412; P < 0.01) and BFMD-DS scores (Nagelkerke pseudo-R-square: 0.389; P < 0.05). We could also build a significant ordinal logistic regression model between the PGI-I and changes in scores of the SF-36 (Nagelkerke pseudo-R-square: 0.461; P < 0.01).
Mean changes in BFMD-RS, BFMD-DS, and SF-36 scores, effect sizes, MCID values, and results of ROC curve analysis for the whole study population are shown in Table 2. Controlling for TOR1A gene testing (TOR1A positive [n = 59] vs. negative cases) did not alter the calculated thresholds considerably (Table 3).

Discussion
The concept of MCID is increasingly used in biomedical research for judging whether statistical significance implies clinical relevance. However, MCID scores for the BFMD-RS, BFMD-DS, and SF-36 had not yet been evaluated in the population with segmental and generalized isolated dystonia. Therefore, we aimed to calculate MCID thresholds for these instruments. Following the recommendations of Revicki and colleagues, 31 the cut-off values for minimal, yet clinically meaningful, improvement were 16.6% on the BFMD-RS and 0.5 points on the BFMD-DS, and those for minimal, yet clinically meaningful, worsening were 21.5% and 0.5 points, respectively. Cut-off scores for observing clinically meaningful improvement and relevant deterioration in HRQoL in dystonia could be set at 5.5 and 5.5 points for the PCS, 6.5 and 7.5 points for the MCS, and 7.5 and 8.5 points for the Global Score of the SF-36. As far as the authors are aware, the present MCID estimations for the BFMD-RS and BFMD-DS cannot be compared to those of other studies because this is the first report on such threshold values for these instruments. Previous studies investigating DBS for isolated dystonia used a threshold value of >25% improvement in the BFMD-RS for considering clinical efficacy and >50% improvement for identifying "good" responders. 9,13-17,35 Furthermore, patients having a 25% to 50% decrease in the BFMD-RS have been reported as partial responders. 36-38 Using the recommended methods for MCID estimation, we found that the threshold for minimal, yet clinically relevant, improvement in BFMD-RS scores may lie at an even lower level (16.6%). Our MCID threshold may help to explain why some patients reported perceived improvement despite <25% improvement on the BFMD-RS after DBS.
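To show how the BFMD-RS thresholds might be applied to an individual patient, the sketch below classifies a baseline and follow-up score pair. It reads 16.6% as the improvement cut-off and 21.5% as the worsening cut-off on the BFMD-RS. Treating the cut-offs as inclusive is an assumption, and the function name and labels are hypothetical.

```python
# BFMD-RS thresholds from the text, expressed as fractions of the baseline.
MCID_IMPROVEMENT = 0.166
MCID_WORSENING = 0.215


def classify_bfmd_rs_change(score_baseline, score_follow_up):
    """Label a BFMD-RS change against the MCID thresholds."""
    if score_baseline == 0:
        raise ValueError("relative change is undefined for a baseline of 0")
    change = (score_baseline - score_follow_up) / score_baseline
    if change >= MCID_IMPROVEMENT:
        return "clinically relevant improvement"
    if change <= -MCID_WORSENING:
        return "clinically relevant worsening"
    return "no clinically relevant change"
```

A patient improving from 50 to 40 points (20%) would fall short of the conventional >25% criterion yet exceed the 16.6% threshold, which is the situation the preceding paragraph describes.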
Some previous studies have already evaluated MCID thresholds for the SF-36 in patients with asthma, 39 heart disease, 39 chronic obstructive pulmonary disease, 39 rheumatoid arthritis, 40 and chronic fatigue syndrome, 41 and in patients undergoing total hip 42 or knee replacement surgeries, 43 where MCID values for SF-36 subscales varied between 0.4 and 25 points and between 8.3 and 25 points for improvement and decline, respectively. Although the SF-36 is a general health status measure, application of the aforementioned MCID estimations to dystonia patients may be misleading and inappropriate because MCID is highly dependent on characteristics of the study population. 39,44 The concept of MCID was established to overcome the issue of statistical significance not necessarily implying clinical importance. Therefore, our estimations may be useful in judging the clinical relevance of results from previous and future studies by using the BFMD-RS, BFMD-DS, and SF-36 to measure the effectiveness of treatments for isolated dystonia, especially DBS. According to our MCID calculations, the beneficial effects of neurosurgical procedures for isolated dystonia on the severity of dystonia and disability related to dystonia intensify with time, and, after reaching a plateau, they remain clinically relevant during long-term follow-up. In addition, improvements developing a long time after neurosurgical interventions exceed the MCID thresholds established in the present study in a more pronounced manner compared to those measured soon after the surgery (Supporting Information Tables S2 and S3).
The strength of the present approach lies partly in the simultaneous use of anchor- and distribution-based methods, resulting in similar MCID estimations. In addition, we provide MCID scores for scales evaluating changes in severity of dystonia both objectively and from patients' perspectives. The high number of included patients may also ensure reliability and enhance the wider applicability of the calculated thresholds. However, our estimations should be used with caution because differences in characteristics of study populations may exist. Based on disease severity, our patient population is largely representative of patients suitable for DBS therapy, but it is less likely to be representative of a focal dystonia cohort referred for botulinum neurotoxin treatment. Given that the present study cohort did not include patients with dystonia of genetic origin other than the TOR1A gene, patients with acquired dystonia, or pediatric patients, our MCID thresholds are not necessarily applicable to such populations. In addition, although we tried to investigate a relatively homogeneous population (idiopathic and inherited isolated dystonia syndromes with segmental and generalized distribution), considerable heterogeneity may exist in both motor and nonmotor symptoms, which could not be specifically addressed during the study.