International Development of the Patient-Reported Outcome Indices for Multiple Sclerosis (PRIMUS)


Stephen P. McKenna, Galen Research Ltd, Enterprise House, Manchester Science Park, Lloyd Street North, Manchester M15 6SE, UK. E-mail:


Background:  The Patient-Reported Indices for Multiple Sclerosis (PRIMUS) comprises a suite of three scales for assessing symptoms, activity limitations, and quality of life in multiple sclerosis (MS). It was developed in the UK and has been shown to have excellent psychometric properties. This study describes the adaptation of the PRIMUS into eight new language versions: Canadian English, Canadian French, French, German, Italian, Spanish, Swedish, and US English.

Methods:  The PRIMUS was translated using the dual-panel process. Cognitive debriefing interviews conducted with MS patients assessed face and content validity. Psychometric and scaling properties were assessed via a two-administration postal survey conducted in each country involving the PRIMUS, the Nottingham Health Profile (NHP), the Unidimensional Fatigue Impact Scale (U-FIS), and demographic questions.

Results:  Cognitive debriefing interviews demonstrated the acceptability of the new language versions. Analysis of survey data showed that the new language versions of the three PRIMUS scales were unidimensional (as indicated by fit to the Rasch model) and that they had good internal consistency and reproducibility. PRIMUS scale scores correlated as expected with those on the NHP and the U-FIS. The scales in all countries were able to discriminate between groups of patients on the basis of their self-reported MS severity, general health, and employment status.

Conclusions:  The PRIMUS was successfully adapted into eight new languages. Most of the tests showed the PRIMUS scales to be unidimensional and to have good internal consistency, reproducibility, and construct validity. The measure is now available for use in clinical studies and trials involving these countries and the UK. Further work is required to assess the measure's responsiveness.


Multiple sclerosis (MS) is a chronic, progressive, disabling autoimmune disorder of the central nervous system that causes inflammation and neurodegeneration [1]. It is one of the most common neurological diseases, affecting around 380,000 people in Europe, up to 400,000 in the United States, and an estimated 2,500,000 people worldwide [2–4]. Age of onset is commonly early to mid-adulthood [5]. In relapsing forms of MS, patients experience periods of relatively good health (remissions) alternating with debilitating relapses. However, there is no set pattern, and clinical manifestations vary both between and within patients depending on which areas of the central nervous system are affected at any given time. Symptoms can include visual or sensory disturbances, loss of strength or sensation in limbs, ambulatory problems, loss of bladder and bowel control, cognitive impairment, fatigue, spasticity, and sexual dysfunction. The disorder generally worsens over time, leading to irreversible functional disability. Consequently, the condition can have a profound impact on all aspects of the patient's life. Current treatment regimens aim to manage symptoms to maximize quality of life or to impede disease progression. However, the young age of onset means that the physical and psychological effects may be apparent for much of the patient's adult life [5].

The variable and unpredictable nature of MS presents a challenge to patient-reported outcome (PRO) assessment. Generic outcome measures will only assess some of the impacts of the illness on the patient and will include questions on irrelevant issues. Furthermore, the existing disease-specific measures fail to address the full range of potential problems that MS patients experience. The Patient-Reported Indices for Multiple Sclerosis (PRIMUS) was developed in the UK to capture the overall impact of MS from the patient's perspective [6]. This instrument consists of three distinct scales specific to MS: symptoms, activities, and quality of life (QoL). Scale content was generated directly from in-depth qualitative interviews with MS patients in the UK. Sufficient interviews were conducted to ensure that all relevant issues were identified. Patients were then involved in each stage of the scale development to ensure that items represented patients' own experience with MS.

As the PRIMUS was developed directly from patient interviews and is specific to patients' experience of MS, it ensures that all relevant issues are assessed. Furthermore, as the PRIMUS consists of unidimensional scales, it provides a holistic assessment of the impact of MS on patients. As there was a need to employ the PRIMUS in an international clinical trial, additional language versions of the measure were required. This paper reports on the adaptation and validation of versions of the PRIMUS for Canada (English and French), France, Germany, Italy, Spain, Sweden, and the United States.


Three stages were involved in the adaptations: translation, assessment of face and content validity, and formal psychometric evaluation.


Different patient samples were included at each of the three stages of the study in each country. Participants were recruited via patients' organizations or clinics. When recruited from the former source, potential participants were required to answer screening questions to ensure that they were eligible for study inclusion. Patients with significant comorbidity that might influence their responses to the questionnaire, such as diagnosed psychiatric disorders or cancer, were excluded. All patients provided informed consent for their participation, and confidentiality was maintained at each stage. Where patients were recruited through clinical centers, ethical approval was obtained from the relevant Research Ethics Board before the commencement of study activities.


The adaptation of an instrument for use in another language highlights a number of linguistic and conceptual issues. For example, language contains many nuances and phrases that, although well understood in the language in which the instrument was developed, are not always clear to nonnative speakers. Consequently, it is inappropriate to produce a new language version of a questionnaire by simply translating the content (literal translation). To produce new language versions that are comparable across languages, it is necessary for items to have conceptual equivalence. Conceptual equivalence ensures that the meaning of the original item is preserved by the translation of the concept or notion covered by that item.

The dual-panel translation methodology was used to produce the new language versions of the PRIMUS [7]. The dual-panel approach requires that the verification and evaluation of acceptability of translations rest with people who are typical of the patients who will later be asked to complete the questionnaire. This approach aims to produce translated measures that use clear, everyday language while maintaining conceptual equivalence. The translation process was managed by the UK PRIMUS developers, who provided detailed item meanings for each scale and worked closely with local investigators to ensure that the conceptual meaning of items was retained in all countries. Translations were achieved via two translation panels (a “bilingual” and a “lay” panel) conducted by the local investigators in each country. The bilingual panel included a group of individuals fluent in both the target and source language. The panel worked together as a group to agree on the most appropriate translations. The lay panel comprised a group of monolingual people of average educational attainment. The remit of this group was to consider the translations produced by the bilingual panel to ensure that the questionnaire content was expressed in natural, everyday language. As the PRIMUS was first developed in UK English, the US- and Canadian-English translations required only the lay panel. MS patients were excluded from both panels as the purpose was to determine the most appropriate wording for the PRIMUS rather than to comment on its content. All panels consisted of between four and seven participants. At both stages of the translation process, suggested changes to the translation were agreed with the UK developers before continuation to the next project stage.

Assessment of Face and Content Validity

Interviews were conducted with MS patients to test the acceptability, understanding, relevance, and comprehensiveness of the new translations in each country. Participants were asked to complete the PRIMUS in the presence of an interviewer who noted any obvious difficulties. Participants were then invited to comment on the questionnaire items, instructions, and response format. In particular, they were asked to consider the acceptability, clarity, and comprehensiveness of questionnaire content.

Scaling and Psychometric Evaluation

Psychometric surveys were conducted in each country. Participants completed the PRIMUS scales, the Unidimensional Fatigue Impact Scale (U-FIS) [8,9], demographic questions, and items about perceived MS severity and general health via postal survey on two occasions, two weeks apart. Participants in all countries other than Spain also completed the Nottingham Health Profile (NHP) [10,11].

Questionnaires.  The development of the PRIMUS has been described in detail previously [6]. It consists of three scales measuring symptoms, activities, and QoL. Both the symptoms and QoL scales contain items in the form of simple statements accompanied by dichotomous response options. For each of these scales, items are summed to yield a total score ranging from 0 to 22. High scores indicate worse symptomatic/QoL impact. The activities scale contains 15 items describing specific tasks. Individuals rate the degree to which they are able to perform the tasks on a 3-point scale. Again, items are summed to give a total score that can range from 0 to 30. High scores are indicative of greater levels of activity limitation.
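The scoring rules described above can be sketched in a few lines of Python. This is an illustration only, not the official PRIMUS scoring algorithm: the function names are hypothetical, and the item coding (0/1 for the dichotomous items, 0 to 2 for the 3-point activity items, hence 22 items in each of the symptoms and QoL scales) is an assumption inferred from the stated score ranges. The handling of missing responses is not described in the text, so the sketch simply rejects incomplete data.

```python
def score_scale(responses, n_items, max_item_score):
    """Sum item responses after checking count and range.

    Assumption: items are coded as integers 0..max_item_score, which is
    inferred from the published score ranges rather than stated in the text.
    """
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    for response in responses:
        if response not in range(max_item_score + 1):
            raise ValueError(f"response {response!r} outside 0..{max_item_score}")
    return sum(responses)


def score_symptoms_or_qol(responses):
    """Total for a dichotomous scale: 0 to 22, higher = worse impact."""
    return score_scale(responses, n_items=22, max_item_score=1)


def score_activities(responses):
    """Total for the 15-item activities scale: 0 to 30, higher = more limitation."""
    return score_scale(responses, n_items=15, max_item_score=2)
```

For example, a respondent who affirms every symptoms item scores 22, and one who reports the greatest limitation on every activity item scores 30.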

The NHP is a measure of perceived distress that consists of six sections: energy level, pain, physical mobility, sleep, social isolation, and emotional reactions. Each section is scored 0 to 100, with a high score indicating greater distress. The U-FIS is a new version of the Fatigue Impact Scale [12] that has been developed to provide an index rather than a profile of fatigue impact scores. The U-FIS contains 22 items yielding a total score that can range from 0 to 66. High scores are indicative of greater impact of fatigue.

Patients also completed a demographic and disease questionnaire that included questions on self-rated MS severity (4-point Likert scale from “mild” to “very severe”), self-perceived current general health (4-point Likert scale from “very good” to “poor”), employment status (working vs. unable to work), and MS type.

Data analyses.  Rasch analysis [13] was conducted on the data from each country separately, using the RUMM2020 program (RUMM Laboratory Pty Ltd, Perth), to determine scale unidimensionality. The Rasch model was first developed in the field of education and asserts that the easier an item is, the more likely it is to be affirmed. According to the model, the only two factors governing whether a PRO item is affirmed are the severity of the item and the person's level of the construct being measured (for example, QoL). Each item and person is placed in order of severity based on item responses. Items that do not fit the model may be those that are answered inconsistently or where the responses are influenced by factors other than that being measured by the scale. If items fit the Rasch model, then the scale as a whole can be considered to be unidimensional and also to meet the strict properties of interval-level measurement. Fit of the PRIMUS scales was evaluated through chi-square fit statistics. A significance level of 0.01 was chosen because of multiple-item testing in each scale. Nonsignificant results would indicate that the PRIMUS scales are unidimensional and that individual-item scores could be summed to derive an overall scale score. Rasch analysis was also used to determine whether external factors influenced scores via differential item functioning (DIF). DIF is an additional aspect of model fit and occurs when subgroups of individuals respond systematically differently to items, thus having the potential to produce bias within the scale [14]. Age and gender were assessed for DIF in the present analyses. Consistency in item ordering was also investigated in each country to assess cross-cultural equivalence of the scales. If item ordering is similar across the language adaptations, then this presents preliminary evidence of the cross-cultural equivalence of the scales. Bonferroni corrections were applied to both item fit and DIF analyses because of the large number of analyses performed.
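The core idea of the model can be illustrated with a minimal Python sketch of the standard dichotomous Rasch formula, in which the probability of affirming an item depends only on the difference between the person's level and the item's severity. This is not a reproduction of the RUMM estimation or of its chi-square fit statistics; the function names are hypothetical, the sketch does not cover the 3-point activity items (which would need a polytomous extension), and the numbers passed to the Bonferroni helper in the usage note are illustrative assumptions rather than values reported by the study.

```python
import math


def rasch_probability(person_level, item_severity):
    """Probability that a person affirms a dichotomous item.

    Standard dichotomous Rasch form, with both arguments on the same
    logit scale: P = exp(theta - b) / (1 + exp(theta - b)).
    """
    return 1.0 / (1.0 + math.exp(-(person_level - item_severity)))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests
```

With these definitions, `rasch_probability(0.0, 0.0)` is 0.5, a milder item (lower severity) is more likely to be affirmed than a more severe one by the same person, and `bonferroni_threshold(0.01, 22)` gives the per-item threshold for a hypothetical 22-item scale tested at an overall level of 0.01.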

Internal consistency (the degree of relatedness of items) was assessed by Cronbach's alpha coefficients for each scale. A value of 0.70 or above was taken as being indicative of adequate internal consistency [15]. Reliability of the new PRIMUS scales (an estimate of the instrument's reproducibility over time, assuming that no change in condition has taken place) was assessed using Spearman's rank correlations between the two administrations (test–retest) and intraclass correlation coefficients (ICCs). A high correlation (above 0.85) indicates that the scale produces very low levels of random measurement error [16].
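For readers unfamiliar with the statistic, Cronbach's alpha can be computed from the item variances and the variance of the total score. The following is a minimal sketch using only the Python standard library; the study itself used SPSS, the function name is hypothetical, and the four-respondent data set in the usage note is invented purely to show the arithmetic.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items table of scores.

    item_scores: list of rows, one per respondent, each a list of k item
    scores. Uses alpha = k/(k-1) * (1 - sum(item variances) / total variance),
    with sample variances throughout.
    """
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("need at least two items and equal-length rows")
    item_columns = list(zip(*item_scores))
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)
```

For the invented responses `[[1, 1, 1], [0, 0, 0], [1, 1, 0], [1, 0, 0]]`, the item variances sum to 5/6 and the total-score variance is 5/3, so alpha is 1.5 × (1 − 0.5) = 0.75, just above the 0.70 criterion used here.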

Convergent and divergent validity were evaluated by assessing the level of association (Spearman rank correlations) between scores on the PRIMUS scales and those on the U-FIS and NHP section scores. Moderate correlations were anticipated between the scales, indicating that they assess different but related constructs. Known-groups validity was assessed by examining the PRIMUS scores of respondents who differed according to their self-perceived MS severity, self-perceived current general health, and employment status. Individuals with worse self-perceived general health and greater self-perceived MS severity were expected to have significantly higher PRIMUS scores. Individuals unable to work were also expected to have significantly higher (worse) PRIMUS scores. Nonparametric tests for independent samples (Mann-Whitney U-test for two groups and Kruskal-Wallis test for three or more groups) were employed to test for differences. Psychometric testing was completed using the SPSS 15.0 statistical package (SPSS, Chicago, IL).
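The two-group known-groups comparison can be illustrated with a small, standard-library sketch of the Mann-Whitney U-test. It is not the SPSS procedure used in the study: the function name is hypothetical, the p-value comes from the tie-corrected normal approximation without a continuity correction (which may differ slightly from exact or software-specific results), and the scores in the usage note are invented.

```python
import math


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, two-sided p-value by normal approximation)."""
    n1, n2 = len(group_a), len(group_b)
    pooled = sorted(list(group_a) + list(group_b))

    # Assign average ranks to tied values and accumulate the tie term.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tied = j - i
        tie_term += tied ** 3 - tied
        i = j

    rank_sum_a = sum(rank_of[value] for value in group_a)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2

    n = n1 + n2
    mean_u = n1 * n2 / 2
    # Undefined (division by zero) if every pooled value is identical.
    sd_u = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value
```

As a usage example with invented scores, `mann_whitney_u([3, 5, 6, 7, 9], [8, 10, 11, 12, 13])` compares a hypothetical "working" group with a "not working" group; the small U for the first group reflects its consistently lower (milder) scores.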



The PRIMUS was successfully translated in all countries. The lay panels employed for the non-English translations suggested changes to some items provided by the bilingual panel that improved item clarity or immediacy.

Cognitive Debriefing Interviews

Interviewee details are shown in Table 1. All participants were able to respond to all items. Participants reported that the items were easy to understand and relevant to someone with MS—even those symptoms that did not currently apply to them. No areas were consistently reported to be missing from the questionnaire, and no items were reported to be redundant to patients. No changes in wording were required as a result of the patient interviews in any country.

Table 1.  Demographic details of participants
  1. C-E, Canada (English); C-F, Canada (French); D, Germany; E, Spain; F, France; I, Italy; Sw, Sweden; US-E, US English; MS, multiple sclerosis.

                                 C-E          C-F          F            D            I            E            Sw           US-E
Cognitive debriefing
 Female n (%)                    13 (86.7)    11 (73.3)    5 (55.6)     9 (60.0)     11 (73.3)    5 (45.5)     8 (57.1)     11 (73.3)
 Age (years) mean (SD)           42.8 (12.9)  39.2 (15.8)  39.0 (12.9)  42.9 (13.4)  51.7 (12.9)  48.9 (8.3)   47.8 (11.5)  54.3 (5.9)
 MS duration (years) mean (SD)   8.7 (6.9)    11.8 (10.0)  12.0 (9.2)   8.4 (11.6)   19.7 (12.3)  19.5 (10.4)  12.4 (6.5)   22.7 (13.7)
Postal survey
 Female n (%)                    71 (74.0)    56 (64.4)    48 (56.5)    54 (71.1)    65 (65.0)    56 (64.4)    102 (61.1)   85 (81.7)
 Age (years) mean (SD)           44.5 (9.9)   45.1 (10.5)  48.5 (10.4)  42.1 (12.3)  53.1 (12.1)  43.6 (11.0)  48.7 (12.1)  45.7 (10.8)
Type of MS n (%)
 Relapse remitting               70 (72.9)    39 (38.2)    12 (14.1)    40 (52.6)    17 (17.0)    41 (47.1)    47 (23.5)    82 (78.8)
 Primary progressive             4 (4.2)      7 (6.9)      17 (20)      7 (9.2)      29 (29.0)    6 (6.9)      14 (7.0)     4 (3.8)
 Secondary progressive           5 (5.2)      19 (18.6)    23 (27.1)    9 (11.8)     29 (29.0)    16 (18.4)    25 (12.5)    10 (9.6)
 Progressive relapsing           3 (3.1)      0            6 (7.1)      2 (2.6)      11 (11.0)    2 (2.3)      6 (3.0)      2 (1.9)
 Benign                          10 (10.4)    8 (7.8)      2 (2.4)      2 (2.6)      4 (4.0)      21 (24.1)    7 (3.5)      2 (1.9)
 Not reported                    4 (2.2)      29 (28.4)    25 (29.4)    16 (21.1)    10 (10.0)    1 (1.1)      101 (50.5)   4 (3.8)
MS duration (years) mean (SD)    11.6 (8.5)   10.4 (7.5)   15.3 (9.4)   8.8 (8.0)    17.8 (10.8)  9 (7.3)      15.4 (10.1)  10.1 (8.9)

Psychometric and Scaling Postal Surveys

Participants' details and the total number of patients included in each language adaptation are shown in Table 1. Throughout the psychometric analysis, if individuals had missing data for a given test they were not included in the analysis for that test. Many participants were unaware of their type of MS. Of those who were able to report the type, 55.5% were relapse remitting, 14% primary progressive, 21.7% secondary progressive, 5.1% progressive relapsing, and 8.9% benign.

PRIMUS scaling properties.  The fit of the PRIMUS scales to the Rasch model is shown in Table 2. For the Symptoms scale, fit was demonstrated for all language versions. The Spanish version of the activity limitations scale showed borderline misfit to the model. The overall fit statistics for the QoL scale showed that all language adaptations except the French version fit the Rasch model.

Table 2.  Overall Rasch chi-square fit statistics of the PRIMUS
  1. C-E, Canada (English); C-F, Canada (French); D, Germany; E, Spain; F, France; I, Italy; Sw, Sweden; US-E, US English; PRIMUS, Patient-Reported Indices for Multiple Sclerosis; QoL, quality of life.

                          C-E     C-F     F       D       I       E       Sw      US-E
Symptoms (n)              98      83      83      65      92      71      145     98
Activity limitations (n)  66      79      80      55      80      61      124     81
QoL (n)                   93      94      83      67      99      84      160     102

DIF was minimal for all language versions of the PRIMUS scales. After applying Bonferroni corrections to symptoms scale data, minimal DIF was observed by age for one item on the Canadian-French adaptation and by gender for two items on the Swedish version. None of the items in the QoL scale exhibited DIF.

Table 3 shows the three consistently mildest (representing the least symptomatic/activity limitation or QoL impact) and three consistently severest (representing the greatest symptomatic/activity limitation or QoL impact) items across all eight countries as identified by Rasch analysis.

Table 3.  PRIMUS item ordering
  1. PRIMUS, Patient-Reported Indices for Multiple Sclerosis; QoL, quality of life.

Symptoms
 Mildest items
  Have you experienced weakness in your arms or legs?
  Have you had problems with your balance?
  Have you been forgetting things?
 Most severe items
  Have you had difficulty swallowing?
  Have you had bowel incontinence?
  Have you experienced paralysis in any part of your body?
Activity limitations
 Mildest items
  Do heavy jobs around the house or garden
  Walk longer distances
  Carry heavy items
 Most severe items
  Have an allover wash (including bath or shower)
  Get out of bed
  Get dressed
QoL
 Mildest items
  I have to pace myself throughout the day
  I have to push myself to do things
  My self-confidence is affected
 Most severe items
  I can't think about anything but the MS
  I'm neglecting my appearance
  I feel as if I have nothing to offer anyone

Traditional psychometric properties.

Internal consistency.  Cronbach's alpha coefficients were in excess of the minimum requirement of 0.70 for all three PRIMUS scales in all eight languages (Table 4). This confirms that the items in the scales are adequately related to each other.

Table 4.  Internal consistency and test–retest reliability of the new language versions of the PRIMUS scales
  1. All correlations are significant at the 0.01 level (two-tailed).

  2. C-E, Canada (English); C-F, Canada (French); D, Germany; E, Spain; F, France; I, Italy; Sw, Sweden; US-E, US English; ICC, intraclass correlation; PRIMUS, Patient-Reported Indices for Multiple Sclerosis; QoL, quality of life.

                          C-E     C-F     F       D       I       E       Sw      US-E
Symptoms
 Cronbach's alpha         0.86    0.81    0.82    0.87    0.76    0.87    0.84    0.80
 Test–retest reliability  0.88    0.83    0.79    0.93    0.78    0.81    0.84    0.84
Activity limitations
 Cronbach's alpha         0.94    0.96    0.94    0.96    0.96    0.97    0.97    0.96
 Test–retest reliability  0.94    0.97    0.93    0.93    0.97    0.87    0.94    0.94
QoL
 Cronbach's alpha         0.90    0.94    0.96    0.89    0.90    0.88    0.89    0.90
 Test–retest reliability  0.90    0.92    0.90    0.91    0.93    0.86    0.91    0.92

Test–retest reliability.  For the activity limitations and QoL scales, all test–retest correlations and ICCs were 0.85 or above, indicating that they had excellent reproducibility and that the scales can be used on an individual basis in clinical practice and also as outcome measures in clinical studies and trials. For the symptoms scale, test–retest correlations and ICCs were slightly lower, but most reached 0.80 (Table 4).

Convergent validity. Table 5 shows the correlations between the PRIMUS scales and the U-FIS and NHP. The PRIMUS symptoms scale correlated highest with the U-FIS and the NHP energy level, pain, and physical mobility sections, reflecting the importance of these outcomes in MS symptomatology. The activity limitations scale correlated highest with the NHP physical mobility section, as expected, and moderately with the U-FIS. Again, as expected, there was a lower association between PRIMUS activities and the emotional reactions and sleep sections of the NHP. The QoL scale correlated highest with the U-FIS and the energy level, emotional reactions, and social isolation sections of the NHP. PRIMUS QoL correlated moderately highly with the other NHP sections, suggesting that it is providing an overall assessment of the impact of MS on patients.

Table 5.  Convergent validity of the new language versions of the PRIMUS scales with U-FIS and the NHP
  • All correlations are significant at the 0.05 level (2-tailed) except where marked with an asterisk (*).

  • C-E, Canada (English); C-F, Canada (French); D, Germany; E, Spain; F, France; I, Italy; Sw, Sweden; US-E, US English; NHP, Nottingham Health Profile; PRIMUS, Patient-Reported Indices for Multiple Sclerosis; QoL, quality of life; U-FIS, Unidimensional Fatigue Impact Scale.

                      C-E     C-F     F       D       I       E       Sw      US-E
Symptoms
 NHP energy level     0.73    0.63    0.40    0.74    0.47    N/A     0.63    0.53
 Pain                 0.70    0.62    0.71    0.55    0.51    N/A     0.56    0.62
 Emotional reactions  0.57    0.21    0.45    0.55    0.40    N/A     0.54    0.48
 Sleep                0.43    0.39    0.36    0.43    0.33    N/A     0.39    0.44
 Physical mobility    0.67    0.61    0.24    0.67    0.50    N/A     0.51    0.57
 Social isolation     0.44    0.26    0.23*   0.44    0.29    N/A     0.45    0.50
Activity limitations
 NHP energy level     0.72    0.50    0.33    0.58    0.27    N/A     0.53    0.59
 Pain                 0.56    0.50    0.35    0.39    0.27    N/A     0.44    0.61
 Emotional reactions  0.23    0.23    0.22*   0.15*   0.22    N/A     0.19    0.21
 Sleep                0.30    0.21    0.09*   0.22*   0.25    N/A     0.18    0.22
 Physical mobility    0.88    0.91    0.85    0.82    0.79    N/A     0.89    0.89
 Social isolation     0.44    0.33    0.31    0.36    0.24    N/A     0.44    0.41
QoL
 NHP energy level     0.73    0.68    0.26    0.79    0.45    N/A     0.73    0.63
 Pain                 0.47    0.41    0.26    0.50    0.18*   N/A     0.45    0.50
 Emotional reactions  0.63    0.58    0.64    0.61    0.71    N/A     0.68    0.58
 Sleep                0.37    0.30    0.21*   0.38    0.30    N/A     0.27    0.41
 Physical mobility    0.61    0.58    0.30    0.60    0.49    N/A     0.61    0.63
 Social isolation     0.68    0.65    0.75    0.74    0.64    N/A     0.70    0.68

Known-groups validity. Table 6 shows PRIMUS scale scores by known factors. All three PRIMUS scales were able to distinguish between groups of patients categorized by self-perceived MS severity, general health, and employment status. Differences in scale scores were statistically significant for all countries except France. Results showed that mean scores for each scale were higher (indicating poorer symptom/activity/QoL status) in the groups with worse self-perceived severity and general health. Similarly, working patients had consistently milder scores on the three scales. The French symptoms and QoL scales showed similar trends that did not reach statistical significance.

Table 6.  Known Groups validity of the new language versions of the PRIMUS scales
 Mean PRIMUS scale score
  1. C-E, Canada (English); C-F, Canada (French); D, Germany; E, Spain; F, France; I, Italy; Sw, Sweden; US-E, US English; PRIMUS, Patient-Reported Indices for Multiple Sclerosis; QoL, quality of life.

                             C-E          C-F          F            D            I            E            Sw           US-E
Symptoms
 Employment; mean (SD)
  Working                    6.6 (5.0)    5.3 (4.0)    9.0 (4.7)    5.6 (4.2)    6.5 (3.7)    5.7 (4.4)    6.6 (5.0)    9.6 (4.4)
  Not working because of MS  9.5 (4.6)    8.8 (3.5)    10.6 (4.8)   8.9 (5.0)    9.2 (4.3)    9.7 (5.1)    9.7 (4.2)    12.8 (3.3)
 Perceived MS severity; mean (SD)
  Mild                       6.1 (4.6)    4.7 (3.4)    7.5 (2.2)    3.0 (3.5)    5.5 (3.8)    4.4 (3.3)    4.9 (3.8)    8.1 (4.2)
  Moderate                   10.5 (4.5)   8.5 (3.7)    9.7 (5.0)    7.5 (4.2)    7.9 (3.5)    10.7 (4.6)   8.9 (4.3)    12.5 (3.7)
  Quite/very severe          11.8 (3.8)   10.2 (4.0)   10.8 (4.2)   11.8 (4.3)   10.0 (4.0)   7.6 (5.6)    12.0 (4.0)   13.3 (2.4)
 Perceived general health; mean (SD)
  Good/very good             6.1 (4.3)    5.8 (3.7)    7.8 (3.1)    4.2 (4.3)    6.3 (4.2)    4.5 (3.8)    5.2 (3.9)    8.8 (4.1)
  Fair/poor                  11.7 (4.3)   9.4 (4.0)    11.2 (4.6)   9.4 (4.2)    9.3 (3.7)    10.1 (4.7)   10.5 (3.9)   13.8 (3.0)
Activity limitations
 Employment; mean (SD)
  Working                    3.1 (4.4)    3.3 (4.5)    8.9 (5.9)    2.3 (3.0)    6.5 (6.3)    4.0 (7.1)    3.6 (5.6)    4.3 (5.6)
  Not working because of MS  9.4 (7.4)    11.6 (7.9)   15.5 (7.5)   12.8 (10.1)  22.3 (8.1)   15.2 (8.6)   14.3 (9.1)   13.1 (8.2)
 Perceived MS severity; mean (SD)
  Mild                       2.8 (4.0)    1.7 (2.4)    4.7 (6.0)    1.1 (1.9)    11.1 (10.5)  2.8 (6.0)    3.7 (7.2)    2.8 (4.6)
  Moderate                   9.7 (7.2)    9.7 (5.9)    10.3 (6.1)   8.3 (7.1)    15.3 (9.5)   12.0 (8.9)   8.9 (7.9)    11.2 (7.9)
  Quite/very severe          14.5 (6.4)   18.1 (7.7)   18.3 (6.2)   16.7 (9.6)   22.7 (7.4)   17.9 (8.9)   16.7 (8.8)   16.4 (7.6)
 Perceived general health; mean (SD)
  Good/very good             3.9 (5.8)    4.4 (5.9)    8.1 (7.9)    3.7 (6.2)    15.1 (10.6)  4.9 (8.9)    4.9 (7.9)    5.2 (6.2)
  Fair/poor                  9.4 (6.3)    13.6 (7.2)   15.4 (6.0)   11.2 (8.9)   19.5 (9.0)   12.2 (8.5)   11.4 (8.9)   12.9 (8.6)
QoL
 Employment; mean (SD)
  Working                    4.0 (4.0)    3.4 (3.9)    8.5 (5.8)    5.5 (5.9)    7.1 (5.4)    4.7 (5.1)    4.3 (4.5)    6.3 (5.1)
  Not working because of MS  9.0 (5.5)    8.6 (5.2)    10.7 (5.5)   7.7 (4.2)    10.7 (4.8)   11.0 (6.3)   8.4 (5.3)    11.2 (4.9)
 Perceived MS severity; mean (SD)
  Mild                       4.4 (4.5)    2.5 (2.7)    6.8 (5.8)    2.8 (3.4)    4.6 (5.6)    3.0 (3.6)    2.8 (3.3)    4.7 (3.8)
  Moderate                   8.1 (4.9)    7.8 (4.8)    8.8 (5.2)    6.2 (4.2)    8.7 (5.0)    10.9 (6.4)   7.0 (4.8)    10.5 (5.3)
  Quite/very severe          12.8 (5.4)   11.1 (5.7)   11.0 (5.4)   11.6 (3.5)   11.8 (4.8)   10.2 (6.6)   11.2 (5.1)   14.5 (4.2)
 Perceived general health; mean (SD)
  Good/very good             4.7 (4.6)    4.0 (3.9)    6.4 (4.0)    3.3 (3.2)    6.9 (4.8)    3.4 (4.4)    2.8 (3.5)    6.2 (5.0)
  Fair/poor                  9.7 (4.8)    9.8 (5.4)    11.1 (5.3)   8.9 (4.8)    10.8 (5.4)   11.2 (5.9)   9.1 (4.7)    11.8 (5.0)


The variable and unpredictable nature of MS has presented a challenge to the assessment of PROs. The PRIMUS was developed to provide an empirical assessment of patients' perceptions of the symptomatic, activity limitation, and QoL benefits of treatment. The UK version of the measure has been shown to be well accepted by patients and to have good psychometric properties. The present study describes the adaptation and validation of new language versions of the PRIMUS scales. Additional language versions of the scales were required to allow the PRIMUS to be used in international clinical trials of treatments for MS. A well-established adaptation methodology was applied to ensure that the new language versions were conceptually equivalent to the UK original scales and that they would have equally acceptable psychometric qualities. Large samples of MS patients were employed to determine the internal consistency, reproducibility, and construct validity of the new language versions.

The evidence presented supports the quality of the new language versions of the PRIMUS scales. Application of the Rasch model showed all but two of the PRIMUS scales to be unidimensional using Time 1 data. Adaptations were shown to have good item stability and minimal DIF. Confirmation of unidimensionality means the scales are capable of providing single indices of symptomatic, activity, and QoL impact that are easy to interpret.

In addition, the examination of item severity across countries indicated that item ordering at the extremes is very similar. This suggests that the relative severity of the items has been retained in each country after the translation process.

Each new PRIMUS scale demonstrated high internal consistency. All scales showed good ICC and test–retest reliability, indicating that there are low levels of random error inherent in the scales. The symptoms scale had lower ICC and test–retest values than the QoL and activity limitations scales. However, this may be expected, as symptoms are more variable on a day-to-day basis than activity limitation or QoL. Evidence of construct validity was provided by consistent correlations with comparator measures and by the ability of the scales to distinguish between groups of patients categorized by known factors.

The study has a number of limitations. Two of the twenty-four scale adaptations showed some level of misfit to the Rasch model. However, no allowance was made for multiple testing in the analyses. Further research is needed to confirm the unidimensionality of these two scales. In addition, the French symptoms and QoL scales failed to distinguish significantly between participants grouped by known factors such as self-perceived MS severity. Further evidence of the construct validity for these two scales is desirable. The design of the study did not allow for an assessment of the responsiveness of the different language versions of the PRIMUS or the meaningfulness of change scores. Additional research is required to determine these properties. Furthermore, the current study did not allow the comparison of PRIMUS scores with clinical markers of MS or the comparison of PRIMUS scores between MS types.


It is concluded that the PRIMUS scales were successfully translated into eight new languages. Evidence indicated that there were few problems with the psychometric and scaling properties of the new language versions developed. The PRIMUS validly and reliably provides an empirical assessment of patient perceptions of the symptomatic, activity limitation, and QoL benefits of treatment.


The authors are particularly grateful to the MS Society and the MS International Federation for their assistance and to the many patients who participated in the development and testing of the new language versions of the PRIMUS.

Source of financial support: The study was funded by Novartis Pharmaceuticals.