Patient-reported outcome measurements (PROMS) have been proposed as sensitive outcome parameters in multiple sclerosis (MS). In this study, we assessed a German version of the Multiple Sclerosis Impact Scale (MSIS-29) and a revised version of the Hamburg Quality of Life Questionnaire in Multiple Sclerosis (HAQUAMS) in comparison with rater- and physician-based tools.
Consecutive MS patients (n = 117) of the MS outpatient unit were included. In addition to MSIS-29 and HAQUAMS, the following parameters were obtained: Expanded Disability Status Scale (EDSS) and modified Multiple Sclerosis Functional Composite (MSFC) [9-hole peg test (9HPT), 25-foot walk test and symbol digit modalities test]. We investigated validity, internal consistency and test–retest reliability as well as correlation between these measures.
Internal consistency (Cronbach's α ≤ 0.96) and test–retest coefficients (ICC ≤ 0.87) of both scales were high and satisfied psychometric standards. Convergent and discriminant validity was supported by direction, magnitude and pattern of correlation with other rater-based measures depending on the functional subdomain. Both MSIS-29 and HAQUAMS correlated with EDSS (ρ = 0.55 vs 0.62), but stronger correlation was found between MSIS-29 and HAQUAMS total score (ρ = 0.90). Both scales distinguished between patient groups of varied disease severity and cognitive impairment.
Patient-reported outcome measurements such as MSIS-29 and HAQUAMS seem to be valid instruments for detecting different impairment levels when compared with traditional rater-based instruments like EDSS or MSFC.
In the fast-growing field of immunotherapeutic approaches in multiple sclerosis (MS), there is an increased need for sensitive and clinically relevant measurements of the natural disease course and treatment effects. Psychometric limitations of traditional rating instruments like the Expanded Disability Status Scale (EDSS) have repeatedly been shown [1, 2]. As patients with a less active disease course are increasingly treated with immunotherapy and more studies compare treatment concepts with smaller differences in efficacy, responsive tools are needed. Patient-reported outcome measurements (PROMS) are gaining importance in the assessment of MS. They supplement the traditionally used outcome parameters to form a triad of measurement: rater-based tests, objective tests and patient-based self-report. PROMS can easily be implemented in daily clinical practice and have also been proposed as possibly more sensitive outcome parameters for clinical studies than rater-based scales such as the EDSS [4, 5]. For example, Hoogervorst et al. demonstrated that the United Kingdom Neurological Disability Scale (UNDS), a patient-rated impairment scale, might be even more sensitive to change than the EDSS.
The ultimate goal of any medical intervention is the improvement of quality of life (QoL). As a multidimensional construct, QoL is not only determined by disease-related factors but is also modified by individual factors like self-efficacy or impaired body image. QoL can only be measured by patient report and thus has been discussed not only as a major concept for PROMS but also as a reasonable primary endpoint for interventions. Disease-specific patient-reported impairment as measured by the Multiple Sclerosis Impact Scale (MSIS-29) and disease-specific QoL as measured by the Hamburg Quality of Life Questionnaire in Multiple Sclerosis (HAQUAMS) have shown validity in earlier studies.
These two scales are supposed to measure a similar construct but were developed differently. While the MSIS-29 was established by a deductive approach reducing an item pool generated from patient interviews, the HAQUAMS was constructed inductively to cover all important disability dimensions in MS. An updated version of the HAQUAMS was recently developed with changes mainly in the area of neuropsychiatric symptoms, as it has been shown that these symptoms are of major relevance to QoL [11, 12].
This study was aimed at validating a German version of the MSIS-29 and the updated version of the HAQUAMS as well as comparing the scales with each other. The construct validity of these scales was tested by comparing the results to established standard rater- and physician-based measures.
We expected both PROMS to show at least moderate correlations, in direction and magnitude, with the corresponding dimensions of physician- and rater-based tools. As the psychological aspect is hardly addressed in the EDSS or rater-based tests, we expected lower correlations for the psychological subscales.
The MSIS-29 was not available in German and thus underwent a forward and backward translation procedure. The original English version was first translated into German by native German health scientists, then discussed by health professionals (neurologists, health scientists and medical students) and modified accordingly. Finally, backward translation was performed by an independent native English translator and the result was once again discussed with health professionals to achieve consensus on the German version.
Questionnaires were completed by patients with MS (clinically isolated syndrome (CIS), relapsing remitting MS (RRMS), secondary progressive MS (SPMS), primary progressive MS (PPMS)) recruited consecutively from the MS outpatient clinic of the University Medical Center in Hamburg Eppendorf (UKE) between June and December 2009 (n = 117). Patients whose vision or hand function was so severely impaired that they could not fill in the questionnaires on their own were excluded. Patients were asked to complete all questionnaires on the day of their visit. For the assessment of test–retest reliability, a subgroup of patients (n = 23) was asked to complete a second set of questionnaires 1 week later and send it back in an enclosed return envelope. Non-responders were reminded after 2 weeks by phone. All patients gave written informed consent for the study.
The Multiple Sclerosis Impact Scale (MSIS-29) is a measure of the physical and psychological impact of MS from the patients' perspective and was developed as an MS-specific scale using a standardized psychometric approach of reducing an item pool generated from patient interviews. It is a 29-item questionnaire structured in two subscales – a 20-item scale for physical impairment and a 9-item scale for psychological impairment – and items are answered on a 5-point Likert scale. Subscale scores were calculated as the mean of the items of each subscale. An MSIS-29 total score was computed as the mean of the two subscale scores. Usually, MSIS-29 subscale values are normalized to a 0–100 rating scale. We used raw values to enhance dimensional comparability with HAQUAMS scores.
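For readers who want to compare the raw means reported here with studies that use the 0–100 metric, the conversion can be sketched as a linear rescaling. This is a minimal illustration that assumes the 1–5 response range described above; the function name `to_0_100` is hypothetical and not part of any published scoring manual.

```python
def to_0_100(raw_mean, low=1.0, high=5.0):
    """Linearly rescale a mean item score from the low-high response
    range (assumed here to be 1-5) to a 0-100 scale."""
    if not low <= raw_mean <= high:
        raise ValueError("score lies outside the response range")
    return (raw_mean - low) / (high - low) * 100.0
```

Under this assumption a raw mean of 1 maps to 0, a raw mean of 3 to 50 and a raw mean of 5 to 100. Because the transformation is linear, it does not change rank correlations or group comparisons.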
As a specific instrument measuring quality of life in people living with MS, we used a recently updated version of the Hamburg Quality of Life Questionnaire in MS (HAQUAMS). The new version of the HAQUAMS (10.0) consists of 44 items using a 5-point Likert scale. Twenty-eight of these items are subdivided into six subscales for ‘upper extremity’, ‘lower extremity’, ‘cognition’, ‘fatigue’, ‘mood’ and ‘communication’. The remaining items cover additional symptom domains as well as a patient-rated estimation of disease progression over time and main impairments. Mean subscale scores were calculated for the six subscales, and the HAQUAMS total score was computed as the mean of the six subscale scores.
As the HAQUAMS has been integrated into clinical routine in our centre for more than 10 years, some items have been changed and added to the originally validated version HAQUAMS 3.1, which consisted of only 38 items. In the new HAQUAMS version 10.0 (Institute of Neuroimmunology and Clinical MS Research in Hamburg, Germany), two of the original scales were modified to obtain more detailed information about neuropsychiatric symptoms. The former 4-item scale for ‘fatigue/thinking’ was split into two separate scales, a 4-item scale for ‘fatigue’ and a 4-item scale for ‘thinking’, each containing two original and two new items. In the subscale ‘mood’, 2 of 7 items were deleted, so that the scale now consists of only 5 items. Another item was added in the field of vision, as we have recently shown that visual functioning is an area of major relevance to people with MS. Furthermore, one item in the mobility scale has been reworded to obtain more precise information.
The original HAQUAMS showed good reliability and validity and was shown to be reliable in patients with cognitive deficits. Recently, sensitivity to change has been demonstrated in different clinical settings.
The EDSS was used as the gold standard instrument to assess disability.
For clinical scoring, the modified Multiple Sclerosis Functional Composite (MSFC) was used. This version of the MSFC consists of the time to walk 25 feet (T25FW), the 9HPT for evaluation of hand function and the symbol digit modalities test (SDMT) [18, 19]. The SDMT was used to estimate cognitive impairment as a substitute for the paced auditory serial addition test (PASAT). The SDMT has higher patient acceptance than the PASAT and has shown similar sensitivity to change in cognitive functions.
To determine convergent and discriminant construct validity, the MSIS-29 was correlated (Spearman) with already established scales and objective instruments. Total scores as well as corresponding functional dimensions in objective tests were correlated; the same correlations were obtained for the HAQUAMS 10.0. We considered correlations up to 0.3 as low, from 0.31 to 0.69 as moderate and from 0.7 as high. The MSIS-29 was also tested for its ability to differentiate between groups. Grouping was carried out by means of predefined cut-off values as in previously performed validation studies [9, 21]. For EDSS, the groups were defined as ≤3.0 (minor physical impairment), 3.5–6 (moderate physical impairment) and ≥6.5 (severe physical impairment). For impairment of information processing speed in the SDMT, groups were defined by standard deviation: +3 to 0 (not impaired), −0.5 to −2 (moderate impairment) and ≤−2.5 (severe impairment). One-way analysis of variance (ANOVA), the Mann–Whitney U-test and the Kruskal–Wallis H-test were applied to compare groups.
All items of MSIS-29 and HAQUAMS 10.0 use a Likert Scale from 1 to 5, high scores indicating a high negative impact on QoL. Mean scores were computed for all subscales of MSIS-29 and HAQUAMS. In case of missing data, mean substitutions were allowed for missing items. Subscale scores were not computed if more than 20% of items were missing.
Total scores were computed by averaging subscale scores; thus, subscale dimensions are weighted equally regardless of the number of items of each subscale.
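The scoring rules above can be sketched as follows. This is an illustration under two assumptions that the text does not spell out: missing items are substituted with the patient's own mean of the answered items in that subscale (which leaves the subscale mean unchanged), and no total score is computed when any subscale score is unavailable. The function names are hypothetical.

```python
def subscale_score(items, max_missing=0.20):
    """Mean of the answered items (None marks a missing item).

    Returns None when more than 20% of the items are missing.
    Substituting the person's own mean for missing items gives the
    same result as averaging the answered items (assumed variant).
    """
    if not items:
        return None
    answered = [v for v in items if v is not None]
    share_missing = (len(items) - len(answered)) / len(items)
    if share_missing > max_missing:
        return None
    return sum(answered) / len(answered)


def total_score(subscales):
    """Unweighted mean of subscale scores; `subscales` maps name -> items."""
    scores = [subscale_score(items) for items in subscales.values()]
    if any(s is None for s in scores):
        return None  # assumption: no total without all subscale scores
    return sum(scores) / len(scores)
```

The equal weighting matters for the MSIS-29: a patient answering 2 on all 20 physical items and 4 on all 9 psychological items obtains a total score of 3.0, whereas the mean over all 29 items would be about 2.6.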
For test–retest reliability, two data sets obtained at least 1 week apart were compared by intraclass correlations (ICC). We also explored potential floor and ceiling effects (defined as more than 20% of tested patients having the minimal or maximal score). Data were analysed using SPSS 15.0: 2006 (SPSS Inc., Chicago, IL, USA).
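The ICC model used in SPSS is not specified in the text. As an illustration only, the sketch below computes one common variant, the two-way random-effects, absolute-agreement, single-measurement coefficient often written ICC(2,1), from paired test and retest scores. The choice of this variant and the function name are assumptions, not the authors' documented procedure.

```python
def icc_2_1(pairs):
    """ICC(2,1) for a list of per-patient score tuples, e.g. (test, retest)."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two patients")
    k = len(pairs[0])
    if k < 2 or any(len(p) != k for p in pairs):
        raise ValueError("each patient needs the same number (>= 2) of scores")

    grand = sum(sum(p) for p in pairs) / (n * k)
    row_means = [sum(p) / k for p in pairs]  # per patient
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]  # per occasion

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for p in pairs for v in p)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator
```

Because this variant measures absolute agreement, a systematic shift between the two occasions lowers the coefficient even when patients keep exactly the same rank order.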
We included a total of 117 consecutive MS patients in the study, of whom 80 (68.4%) were female and 37 were male. Mean age was 41.4 years (SD 9.5) and mean disease duration was 8.9 years (SD 7.3); 60 (51.3%) patients had RRMS, 21 (17.9%) CIS, 23 (19.7%) SPMS and 11 (9.4%) PPMS. EDSS scores ranged from 0.0 to 8.0, with a rather low mean disability level of EDSS 3.4 (SD 1.9). Of the 117 patients, 57 (51%) were treated with immunotherapy.
Mean scores of rater-based tests were −0.6 (SD 1.7) for SDMT, 6.0 (SD 4.3) for T25FW, 20.5 (SD 7.7) for 9HPT with the right hand and 20.8 (SD 5.7) with the left hand.
Mean MSIS-29 scores (2.2, SD 0.9) were in the lower range of the scale. The physical subscale's mean score was 2.1 (SD 0.9), and the psychological subscale's mean score was 2.3 (SD 0.9). Mean scores of the HAQUAMS (2.2, SD 0.8) were similarly low, indicating a rather high QoL in the study sample. Subscale scores could not be computed because of missing data in only a small percentage of cases: 0–2.6% for HAQUAMS subscores and 0–1.7% for MSIS-29 subscores.
Validity of HAQUAMS was not substantially changed using version 10.0, and Cronbach's alpha values for new scales ‘fatigue’ and ‘thinking’ were high (0.90 and 0.93, respectively). The original scale ‘fatigue/thinking’ showed slightly lower values with 0.85 (for detailed information see Table 1).
Table 1. Validity of HAQUAMS subscale scores (n = 117); maximum range of scores 1–5
Columns: Number of items; % scoring 1 or 5; Test–retest reliability ICC (95% CI)
Rows: Mobility (upper limb); Mobility (lower limb)
ICC, intraclass correlation coefficient; CI, confidence interval; HAQUAMS, Hamburg Quality of Life Questionnaire in Multiple Sclerosis.
Data for test–retest reliability based on n = 15.
Factor analysis confirmed two underlying dimensions of the MSIS-29 (for detailed information see Data S1). Cronbach's alpha coefficients with 0.96 for the MSIS-29 and 0.94 for HAQUAMS 10.0 were high, indicating a high reliability of both scales.
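For reference, Cronbach's alpha can be computed from item-level responses as sketched below, using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). The sketch assumes complete cases, one list of item responses per patient; the function name is hypothetical and the study's values were obtained with SPSS.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete item responses.

    rows: one list of k item scores per respondent (no missing values).
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("each respondent needs the same number (>= 2) of items")

    # Sample variance of each item across respondents
    item_variances = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the per-respondent sum score
    sum_variance = variance([sum(r) for r in rows])
    if sum_variance == 0:
        raise ValueError("alpha is undefined when all sum scores are equal")

    return k / (k - 1) * (1 - sum(item_variances) / sum_variance)
```

Alpha approaches 1 when the items vary together across patients and falls towards 0 (or below) when they vary independently of each other.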
Of the subgroup selected for MSIS-29 test–retest reliability, only 17 of 23 patients sent their questionnaires back despite being reminded by phone. Two sets were incomplete and were thus excluded from analysis, so for the assessment of test–retest reliability, a sample of n = 15 with an assessment interval of at least 1 week was available. Intraclass correlation ICC = 0.87 (CI 0.66–0.91) indicated high test–retest reliability and satisfied psychometric standards. No floor or ceiling effects were found in either MSIS-29 subscale.
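The floor and ceiling criterion used in this study (more than 20% of patients at the minimal or maximal score) can be checked with a few lines. The sketch assumes subscale scores on the 1–5 range and treats only scores exactly equal to 1 or 5 as minimal or maximal; the function name and the returned keys are hypothetical.

```python
def floor_ceiling(scores, minimum=1.0, maximum=5.0, threshold=0.20):
    """Share of patients at the scale minimum and maximum, with flags
    set when a share exceeds the threshold (20% in this study)."""
    if not scores:
        raise ValueError("need at least one score")
    n = len(scores)
    floor_share = sum(1 for s in scores if s == minimum) / n
    ceiling_share = sum(1 for s in scores if s == maximum) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }
```

On these scales high scores indicate high negative impact, so a floor effect here would mean that many patients report no impact at all.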
Differentiation of groups
The MSIS-29 significantly distinguished between patient groups of varied disease severity (EDSS ≤3.0, 3.5–6 and ≥6.5). ANOVA was significant for MSIS-29 total, physical and corresponding HAQUAMS scales (P < 0.001) (see Table 2). Mann–Whitney U-test and Kruskal–Wallis H-test also revealed significant differentiation of all reported scales between EDSS groups (all P < 0.01).
Table 2. Differentiation of EDSS subgroups by MSIS-29
EDSS groups: Minor impairment (≤3.0); Moderate impairment (3.5–6); Severe impairment (≥6.5)
EDSS, Expanded Disability Status Scale; HAQUAMS, Hamburg Quality of Life Questionnaire in Multiple Sclerosis; MSIS-29, Multiple Sclerosis Impact Scale.
Data are shown as mean, with standard deviation (SD) in brackets; * indicates P ≤ 0.001 in ANOVA, Mann–Whitney U-test and Kruskal–Wallis H-test.
The MSIS-29 total score was able to discriminate between different severities of cognitive impairment as classified by SDMT. All applied tests were significant for the MSIS-29 total score (P < 0.01) and the corresponding HAQUAMS subscale (P < 0.001), while the psychological subscale of the MSIS-29 could not differentiate groups (see Table 3). However, MSIS-29 differentiated between minor and moderately affected individuals according to SDMT (all P < 0.05).
Table 3. Differentiation of SDMT subgroups by MSIS-29
SDMT groups: No impairment (3–0); Moderate impairment (−0.5 to −2); Severe impairment (>−2)
HAQUAMS, Hamburg Quality of Life Questionnaire in Multiple Sclerosis; MSIS-29, Multiple Sclerosis Impact Scale; SDMT, symbol digit modalities test.
Data are shown as mean, with standard deviation (SD) in brackets; * indicates P ≤ 0.01, ** indicates P ≤ 0.001 in ANOVA, Mann–Whitney U-test and Kruskal–Wallis H-test.
Convergent and discriminant construct validity
The MSIS-29 total score correlated significantly with rater-based instruments: EDSS, T25FW, 9HPT (right and left hand) and SDMT (see Table 4). When MSIS-29 subscales were correlated with the corresponding objective tests, the physical subscale correlated moderately with T25FW (ρ = 0.61), 9HPT right (ρ = 0.59) and left (ρ = 0.63). The psychological subscale did not correlate significantly with SDMT but correlated moderately with the EDSS functional system score for cognition (ρ = 0.46) (for detailed information see Table 4 and Fig. 1A,B).
Table 4. Correlations of MSIS-29 with EDSS and modified MSFC
Convergent and discriminant validity of MSIS-29 was supported by direction, magnitude and pattern of correlation with other rater-based measures depending on the functional subdomain.
HAQUAMS 10.0 showed convergent and discriminatory power quite close to that of the former version. HAQUAMS 10.0 subscales showed the expected correlations with the corresponding objective measurements; the highest correlations were found between the HAQUAMS subscale for mobility of the lower extremity and T25FW (ρ = 0.77) and EDSS (ρ = 0.81) (for detailed information see Data S2).
The correlation between MSIS-29 and HAQUAMS total scores was high and significant (ρ = 0.90, P ≤ 0.01, see Fig. 1E). MSIS-29 total score correlated less strongly with EDSS than HAQUAMS total score (ρ = 0.55 vs 0.62, see Fig. 1C,D). The highest correlations were found between corresponding dimensions of the scales: the MSIS-29 physical subscale showed high correlation with HAQUAMS ‘lower extremity’ (ρ = 0.83) and ‘upper extremity’ (ρ = 0.78). In addition, the MSIS-29 psychological subscale correlated highly with HAQUAMS ‘mood’ (ρ = 0.74), ‘fatigue’ (ρ = 0.73) and ‘thinking’ (ρ = 0.71) and moderately with ‘communication’ (ρ = 0.56). As a global rating for QoL, HAQUAMS item 43 correlated significantly with MSIS-29 (ρ = 0.65) and HAQUAMS (ρ = 0.63) total scores but only weakly with EDSS (ρ = 0.27) (for detailed information see Table 5).
Table 5. Correlations of MSIS-29 and HAQUAMS subscores
Rows: HAQUAMS total score; HAQUAMS ‘lower extremity’; HAQUAMS ‘upper extremity’
HAQUAMS, Hamburg Quality of Life Questionnaire in Multiple Sclerosis; MSIS-29, Multiple Sclerosis Impact Scale.
Data are expressed as Spearman's rho correlations; * indicates P ≤ 0.05, ** indicates P ≤ 0.01.
This study aimed to validate and compare a German version of the MSIS-29, a patient-based impairment scale, and an updated version of the HAQUAMS, a patient-based QoL scale. In general, both measures fulfilled psychometric validity criteria. Factor analysis of the German version of MSIS-29 confirmed the two underlying subscales of the original scale. Confirming validity, correlations of the physical subscale with EDSS and with the HAQUAMS mobility subscales for upper and lower extremity were moderate to high. As expected, the psychological subscale showed only low correlation with EDSS but high correlation with the HAQUAMS subscales for ‘mood’, ‘fatigue’ and ‘thinking’. However, the MSIS-29 psychological subscale was less able than HAQUAMS ‘thinking’ to discriminate between different levels of cognitive impairment measured by SDMT. This might be because the MSIS-29 psychological subscale combines aspects of fatigue, depression and cognitive impairment in one scale, so that further differentiation is not possible. However, we believe that it is worthwhile to try to distinguish these different qualities as far as possible, as therapeutic approaches are difficult and differ between these dimensions.
While there is ongoing discussion as to whether patient-based rating of cognitive impairment really corresponds to objective measurements, previous HAQUAMS validation studies as well as the current study show that patients suffering from cognitive impairment are able to make consistent ratings on the HAQUAMS that correlate with SDMT results, indicating high reliability. As a limitation, our study sample consists mainly of mildly to moderately impaired patients (EDSS mean 3.4). How well PROMS might differentiate levels of disability in heavily impaired patients has, to our knowledge, not been investigated [24, 25]. As data for test–retest reliability were based on only a small sample (n = 15), interpretation of these results is limited.
In the English version, physical and psychological scores of MSIS-29 showed good variability, high internal consistency and high test–retest reliability [26, 27]. MSIS-29 was developed in a highly standardized approach, heavily involving patients' opinions. However, longitudinal studies could not consistently show sensitivity to change of the MSIS-29 [26, 28, 29]. An advantage of MSIS-29 is that it is rather short, so that patient burden is low and the instrument has high acceptance. On the other hand, some relevant functional domains such as vision or bladder function are not sufficiently represented.
We calculated total scores for MSIS-29, an approach that has not been applied to this scale before. Our data indicate that a total score correlates with both subscores and thus an overall estimate of patient-based impairment through a total score might be justified. Calculated MSIS-29 total scores in our study were intermediate between the physical and psychological subscale scores. Therefore, the MSIS-29 total score seems to be a feasible estimate of disease-specific impact, even though the two subscales measure distinct constructs and combining them in a total score may mask differential effects. The fact that the MSIS-29 total score correlated highly with the validated HAQUAMS total score (ρ = 0.90) supports this approach.
In summary, PROMS such as MSIS-29 and HAQUAMS seem to be valid instruments that are able to detect different impairment levels when compared with traditional rater-based instruments like EDSS or MSFC. They might be even more sensitive than EDSS in detecting areas for medical intervention in daily routine. Whether PROMS might be superior to rater-based tools in depicting longitudinal disease evolution in MS is still an open question. A few studies indicate that patients' impression of change in disability may differ qualitatively and quantitatively from that of an examining physician. Costelloe et al. could show only weak correlations of MSIS-29 with MSFC over a 3-year follow-up but a good correlation of the physical subscore with EDSS over 4 years. A major problem of PROMS is their susceptibility to response shift phenomena, which may lead to overestimation or underestimation of real disease impact. Very few attempts have been undertaken to control for this bias. Parallel assessments of the changing value of bodily functions might possibly give some information in this area.
A multicentre study investigating this question by applying the triad of rater-based, objective and patient-based measurements over a 2-year follow-up is currently ongoing. Further research in this field is needed to justify the emerging use of PROMS as a major outcome parameter in clinical and epidemiological studies as well as in daily practice and patient management.
The work was partially supported by the Consortium Biopharma Neu2, Grant number 0315613 of the German Ministry of Education and Research as well as through the German Gemeinnützige Hertie-Stiftung, Grant number 1.01.1/08/117.
Conflict of Interest
The work was partially supported by the Consortium, Biopharma – Neu2; Grant number 0315613 of the German Ministry of Education and Research as well as through the German Gemeinnützige Hertie-Stiftung, Grant number 1.01.1/08/117.