1. Top of page
  2. Introduction
  3. Materials and Methods
  4. Results
  5. Discussion

Both in clinical practice and in research on patients with osteoarthritis (OA), outcome is evaluated using many different instruments. The Outcome Measures in Rheumatology Clinical Trials (OMERACT) group defined a core set of outcome dimensions for clinical studies in hip and knee OA: pain, physical function (the performance of daily activities), and patient global assessment (1, 2). These are in line with the recommendations of several guidelines for outcome measurement in OA trials (European League Against Rheumatism [EULAR] [3], Food and Drug Administration [FDA] [1], and Slow-acting Drugs in Osteoarthritis [SADOA] [4]). However, these guidelines differ in their recommendations of specific instruments, or do not include recommendations of instruments at all (1, 5).

Nowadays, a large number of instruments are available to assess the outcome dimensions of the OMERACT. The issue arises of which instruments are most appropriate to use. The selection of an instrument should depend on the instrument's psychometric qualities (namely, reproducibility, validity, and responsiveness) and on practical considerations (for example, time to complete, ease of scoring, and mode of administration).

Because the majority of the instruments developed for patients with OA are questionnaires, our focus in this article is on questionnaires. Several reviews of OA questionnaires have been published, and recently, a special issue on outcome measurements was published by Arthritis Care & Research (1, 6–8). Sun et al concluded that both the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) and Lequesne Index are recommended as primary measures in treatment studies (7). However, none of these reviews give a complete systematic overview of all available instruments for patients with OA of the hip and/or knee. Before a specific core set of questionnaires can be recommended, a systematic comparison of the descriptive and psychometric qualities of the instruments is required. Recently, systematic reviews of measurement instruments for specific populations have been conducted (9, 10).

The objective of this article was to give an overview of published self-assessment (self-administered and interview-based) instruments on pain, physical function, and patient global assessment for patients with hip and/or knee OA. To evaluate the selected questionnaires, data on the descriptive and psychometric qualities of the instruments were systematically collected and rated using a checklist (9, 11). This overview will facilitate the choice of the most appropriate questionnaires to measure the OMERACT outcome dimensions in patients with OA of the hip and/or knee.

Materials and Methods

Study selection

An extensive search was conducted in the Medline (1966 through May 2004), CINAHL (1982 through May 2004), and Embase (1988 through May 2004) databases. The broad computerized search strategy combined 4 components: a search strategy for OA of the hip/knee; one for outcome assessment; one for the outcome dimensions pain, physical function, and patient global assessment; and one for psychometric qualities. In addition, the reference lists of the retrieved articles were screened for relevant articles.

Inclusion of articles was based on the title and abstract, and was decided by 2 independent reviewers (CV and CHME). In case of uncertainty, the full article was read by 2 independent reviewers (CV and MFP). If necessary, disagreements were resolved by a third reviewer (CHME). Inclusion criteria at the level of patients were as follows: patients with OA of the hip and/or knee; in case of surgical interventions, data were included only when collected before surgery (e.g., total knee or total hip replacement) or before other invasive interventions, because patients after surgery were considered a different patient population. At the level of instruments, inclusion criteria were as follows: self-assessment (self-reported or interview-based) questionnaires; questionnaires that contained ≥1 separate dimension of pain, physical function, or global perceived effect; both condition-specific and generic questionnaires were included; in case of different language versions of the same questionnaire, only the English version or the native version was included. Finally, at the level of the performed studies, inclusion criteria were as follows: the main focus of the article was the development, construction, or psychometric evaluation of the instrument (only psychometric evaluations using classical test theory were included; evaluations based on item response theory [IRT] were excluded because no checklist is currently available to rate them); data of patients with hip and/or knee OA were published separately in case of mixed populations (e.g., patients with rheumatoid arthritis and patients with OA); results had been published in English as a full report.

Data extraction and quality assessment.

A checklist of specific criteria for quality assessment of instruments was used, consisting of a section with descriptive aspects of the instrument and a section with specific psychometric criteria (Appendix A). The checklist was developed by Bot et al (11) based on the work by Lohr et al (12) and the checklist developed by Bombardier and Tugwell (13). This list of criteria has already been used in several systematic reviews (9, 10). All qualities were rated as either positive, doubtful, or negative. In case no or insufficient information was available on an aspect, no rating was given. The psychometric qualities of each study were independently assessed by 2 reviewers (CV and GMD). Disagreements between the reviewers were resolved by discussion. Because the present review focused on pain, physical function, and patient global assessment, only information on these outcome dimensions was rated in case other dimensions were also included in the instrument (e.g., quality of life, mental functioning). When ≥2 studies were performed on the same psychometric qualities of the same instrument involving the same population (e.g., hip/knee OA or outpatients/inpatients), the highest rating was taken.

Characteristics of the instruments.

The descriptive data provide information about the target population, scales, and format of the instruments. Extracted data included target population, domains to which the scales could be classified (pain, physical functioning, emotional functioning, social functioning, general health, and quality of life), number of scales, number of items, response options, range of score, mode of administration (self administered or interview based), ease of scoring method, and time needed to complete the questionnaire. Three types of self-reported instruments were distinguished: generic scales, which are designed for various populations of patients; condition-specific questionnaires, designed for a specific group of patients; and patient-specific questionnaires, designed for use with individual patients (14).

Psychometric qualities.

Data on the characteristics of the study group (diagnosis and clinical features) were reported to reflect for which population the psychometric qualities (validity, reproducibility, responsiveness, and interpretability) were assessed.


Validity is the degree to which an instrument measures the construct it is intended to measure (12). The content validity, internal consistency, and construct validity of the instruments were evaluated. Content validity examines the extent to which the items adequately represent all significant aspects of the construct being measured (15). The rating of content validity was positive if patient consultation was combined with either expert consultation or examination of literature during item selection in the construction phase of the instrument, and was doubtful if only patients were consulted during item selection. The internal consistency, determined by calculating Cronbach's alpha, indicates the homogeneity of the items in a (sub)scale. To determine which selected items cluster together around one aspect (and thus form a separate (sub)scale), factor analysis has to be performed. Questionnaires were rated positive if factor analysis was conducted and Cronbach's alpha for each separate dimension was >0.70 (16). Construct validity refers to the extent to which scores on a particular instrument relate to other assessment tools in a manner that is consistent with theoretically derived hypotheses (17). A positive rating was achieved when hypotheses about the magnitude and direction of relationships of the questionnaire (sub)scales with reference instruments were specified, and when >75% of these hypotheses could be confirmed. If available, descriptive data on the distribution of scores, including information about the presence of floor or ceiling effects, were extracted. Floor and ceiling effects were considered present if >15% of the respondents achieved the highest or lowest possible score (18).
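The two numeric criteria in this paragraph (Cronbach's alpha >0.70 and the 15% rule for floor and ceiling effects) can be illustrated with a minimal sketch. The function names and the demonstration data below are hypothetical and are not taken from the reviewed studies; the alpha formula is the standard one based on item variances and sum-score variance. Note that, under the checklist, a positive rating for internal consistency also required a factor analysis, which this sketch does not perform.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for one (sub)scale.

    items: 2-D array-like, one row per respondent, one column per item.
    """
    x = np.asarray(items, dtype=float)
    k = x.shape[1]                               # number of items
    item_variances = x.var(axis=0, ddof=1)       # sample variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)   # variance of the sum score
    return k / (k - 1) * (1.0 - item_variances.sum() / total_variance)


def extreme_score_effects(scores, lowest, highest, threshold=0.15):
    """Flag a scale when more than 15% of respondents obtain the lowest
    or the highest possible score (the criterion described in the text)."""
    s = np.asarray(scores, dtype=float)
    return {
        "lowest": bool(np.mean(s == lowest) > threshold),
        "highest": bool(np.mean(s == highest) > threshold),
    }


# Hypothetical item scores: 4 respondents x 3 items.
demo_items = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
alpha = cronbach_alpha(demo_items)
meets_alpha_criterion = alpha > 0.70

# Hypothetical sum scores on a scale running from 0 to 20.
demo_scores = [0, 0, 5, 10, 12, 20, 7, 9, 14, 11]
effects = extreme_score_effects(demo_scores, lowest=0, highest=20)
```

Whether the lowest possible score corresponds to a floor or a ceiling effect depends on the scoring direction of the instrument, which is why the sketch reports the two extremes by position rather than by name.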


Reproducibility is the extent to which an instrument yields stable scores over time among respondents who are assumed not to have changed on the domains being assessed. Reproducibility was assessed by rating reliability and agreement. Reliability is the degree to which an instrument is free of measurement error. The intraclass correlation coefficient (ICC) for each (sub)scale is preferred to calculate reliability. The test–retest reliability and interobserver reliability were rated positive if the ICC was >0.70 and >0.60, respectively (12).
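As an illustration of the reliability criterion, the sketch below computes an intraclass correlation coefficient from a respondents-by-occasions table and applies the thresholds named above. The text does not specify which ICC model the checklist assumes; the two-way random-effects, absolute-agreement, single-measurement form (often written ICC(2,1)) is used here as an assumption, and the function names and data are hypothetical.

```python
import numpy as np


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: 2-D array-like, one row per respondent, one column per
    measurement occasion (or observer).
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between respondents
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between occasions
    ss_error = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def reliability_positive(icc, kind="test-retest"):
    """Apply the thresholds from the text: >0.70 (test-retest), >0.60 (interobserver)."""
    threshold = {"test-retest": 0.70, "interobserver": 0.60}[kind]
    return icc > threshold


# Hypothetical test and retest scores with a constant shift of 2 points.
shifted = [[10, 12], [20, 22], [30, 32]]
icc_shifted = icc_2_1(shifted)

# Identical test and retest scores give perfect reliability.
icc_identical = icc_2_1([[10, 10], [20, 20], [30, 30]])
```

The absolute-agreement form penalizes the systematic 2-point shift, so the first ICC falls slightly below 1 even though the rank order of respondents is preserved.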

To quantify measurement error and detect systematic differences between 2 measurements, a measure of agreement is calculated. The 95% limits of agreement according to Bland and Altman (19) and the standard error of measurement (SEM) or smallest detectable (real) difference (SDD) (20) were considered to be adequate measures for agreement. Because it was not possible to define adequate cutoff points for the agreement, agreement was rated as positive if one of the adequate measures was presented.
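The agreement measures named above can be sketched for a simple test-retest design. The text gives no formulas, so the definitions below are assumptions based on common conventions: limits of agreement as the mean difference ± 1.96 SD of the differences, SEM as the SD of the differences divided by √2, and SDD as 1.96 × √2 × SEM. The function name and the data are hypothetical.

```python
import math

import numpy as np


def agreement_measures(test, retest):
    """Bland-Altman limits of agreement, SEM, and SDD for paired scores."""
    d = np.asarray(retest, dtype=float) - np.asarray(test, dtype=float)
    mean_diff = d.mean()                 # systematic difference between occasions
    sd_diff = d.std(ddof=1)              # SD of the individual differences
    sem = sd_diff / math.sqrt(2)         # assumed definition of the SEM
    return {
        "mean_diff": mean_diff,
        "limits_of_agreement": (mean_diff - 1.96 * sd_diff,
                                mean_diff + 1.96 * sd_diff),
        "sem": sem,
        "sdd": 1.96 * math.sqrt(2) * sem,  # smallest detectable difference
    }


# Hypothetical test and retest scores for 4 respondents.
result = agreement_measures([10, 20, 30, 40], [12, 19, 33, 40])
```

Under these definitions the SDD equals 1.96 times the SD of the differences, that is, the half-width of the limits of agreement.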


Responsiveness is the ability of an instrument to detect real or important change over time in the concept being measured (21). Predefined hypotheses about the relation of change in the instrument to corresponding changes in reference instruments have to be postulated. Responsiveness was rated positive if these hypotheses were presented and if >75% of these hypotheses could be confirmed.
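The hypothesis-based rule used for responsiveness (and for construct validity) can be written as a small decision function. This is a sketch only: the function name is hypothetical, and the label returned when hypotheses were stated but not more than 75% were confirmed is a placeholder, because the checklist rating for that case is not given in the text. The "doubtful" rating for studies without predefined hypotheses follows the way such studies were rated in this review.

```python
def rate_hypothesis_testing(n_hypotheses, n_confirmed):
    """Rate a psychometric quality that is judged by predefined hypotheses.

    n_hypotheses: number of hypotheses specified in advance.
    n_confirmed:  number of those hypotheses that were confirmed.
    """
    if n_hypotheses == 0:
        # No predefined hypotheses were presented.
        return "doubtful"
    if n_confirmed / n_hypotheses > 0.75:
        # More than 75% of the hypotheses confirmed.
        return "positive"
    # Placeholder label: the exact checklist rating is not stated in the text.
    return "not positive"


ratings = [
    rate_hypothesis_testing(0, 0),
    rate_hypothesis_testing(5, 4),
    rate_hypothesis_testing(4, 3),
]
```

The threshold is strict: 3 of 4 confirmed hypotheses (exactly 75%) does not meet the >75% criterion.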


Interpretability is defined as the degree to which (change) scores can be interpreted and a qualitative meaning can be assigned to quantitative scores (12). A minimum clinically important difference (MCID) should be defined to interpret change scores in the target population. Other information that improves interpretation of the scores includes, for instance, presentation of means and standard deviations of patients' scores before and after treatment, data on the distribution of scores in relevant subgroups, and relating changes in the instrument score to patients' global perceived change. A positive rating was achieved when at least 2 types of information were presented.

Overall quality.

To obtain an overall score of the instruments, we counted the number of positive ratings for each instrument.
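As a minimal sketch of this counting rule, the overall score is simply the number of checklist aspects rated positive. The ratings below are hypothetical and do not correspond to any instrument in the review; `None` stands for an aspect with no or insufficient information.

```python
def overall_score(ratings):
    """Count the positively rated aspects of one instrument.

    ratings: mapping from checklist aspect to "+", "ø", "−", or None.
    """
    return sum(1 for rating in ratings.values() if rating == "+")


# Hypothetical ratings for a single instrument.
example_ratings = {
    "time to administer": "+",
    "ease of scoring": "ø",
    "content validity": "+",
    "internal consistency": "−",
    "construct validity": "+",
    "reliability": None,
    "responsiveness": "ø",
}
score = overall_score(example_ratings)
```

Because only positive ratings are counted, an instrument that has not been studied on an aspect scores the same on that aspect as one that was rated doubtful or negative.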


Results

Selection of the studies.

The search identified a total of 1,930 publications. After screening titles and abstracts, 1,777 studies were excluded. Of the remaining 153 publications, 37 publications were included after reading the full article. Reasons for exclusion were data collection after operation or other invasive interventions (n = 54); no separate data presented for patients with hip or knee OA (n = 25); no psychometric evaluation of the instrument (n = 16); no self-assessment instrument (n = 9); no outcome dimensions of pain, physical function, or patient global assessment (n = 6); no English version of the instrument (n = 3); and no use of classical test theory (n = 3). A total of 32 questionnaires were included in the study, which were divided into 24 condition-specific instruments (22–55), 7 generic questionnaires (25, 28, 34, 42–44, 52, 55–58), and 1 patient-specific instrument (57, 58). The full names of the investigated questionnaires are presented in Table 1. The 24 condition-specific questionnaires included different versions of the same instrument. Five different versions of both the WOMAC and Lequesne Index were investigated, and 3 different versions of the Arthritis Impact Measurement Scales (AIMS) were included. Three versions of the WOMAC differed in response options (visual analog scale [VAS], Likert scale, and numeric scale), and 1 version identified the most important items specific for the individual patient (signal version). The last WOMAC version differed in number of items (modified WOMAC), which also applied to the Lequesne Index hip and knee (modified Lequesne Index) and the AIMS (AIMS2, AIMS2 Short Form [AIMS2-SF]). Finally, the Lequesne Index versions varied in the mode of administration (interview based or self reported).
Because not all descriptive information on a specific instrument was published in the article describing the psychometric study of patients with OA, original articles about the development of the instruments involving other patient populations were consulted (59–67).
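The selection flow reported above can be checked with simple arithmetic. The sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
# Counts as reported in the text.
identified = 1930
excluded_on_title_abstract = 1777
full_text_assessed = identified - excluded_on_title_abstract

# Reasons for exclusion after reading the full article.
full_text_exclusions = {
    "data collected after surgery or other invasive intervention": 54,
    "no separate data for hip or knee OA": 25,
    "no psychometric evaluation": 16,
    "no self-assessment instrument": 9,
    "no relevant outcome dimension": 6,
    "no English version": 3,
    "no classical test theory": 3,
}
excluded_on_full_text = sum(full_text_exclusions.values())
included_publications = full_text_assessed - excluded_on_full_text

# Questionnaires by type.
questionnaires = {"condition-specific": 24, "generic": 7, "patient-specific": 1}
total_questionnaires = sum(questionnaires.values())
```

The listed exclusion reasons account for all 116 publications excluded at the full-text stage, leaving the 37 included publications, and the three questionnaire types sum to the 32 questionnaires reported.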

Table 1. Full names of the questionnaires included
Abbreviation | Full name
WOMAC VA3.0 | Western Ontario and McMaster Universities Osteoarthritis Index, Visual Analog Scale
WOMAC LK | Western Ontario and McMaster Universities Osteoarthritis Index, Likert Scale
Lequesne Index | Algofunctional indices for the hip and knee or index of severity for hip/knee disease
KOOS | Knee Injury and Osteoarthritis Outcome Score
HOOS | Hip Osteoarthritis Outcome Score
SMFA | Short Musculoskeletal Function Assessment Questionnaire
J-MAP | Joint-Specific Multidimensional Assessment of Pain
SF-36 | MOS Short Form 36
HAQ | Health Assessment Questionnaire
AIMS | Arthritis Impact Measurement Scales
AIMS2-SF | Arthritis Impact Measurement Scales-Short Form
IRGL | Influence of Rheumatic Disease on General Health and Lifestyle
QR&S | Questionnaire Rising and Sitting down
ADL difficulty scale | Activities of Daily Living difficulty scale
ADL pain scale | Activities of Daily Living pain scale
VAS | Visual Analog Scale
NHP | Nottingham Health Profile
SIP | Sickness Impact Profile

Description of questionnaires.

All included questionnaires and the descriptive items that were rated are presented in Table 2. Most questionnaires were developed to assess pain (n = 22) and/or physical function (n = 26) in separate (sub)scales. Only the Lequesne Index and the Patient Based Measure combine these 2 aspects in 1 single index.

Table 2. Description of the hip/knee OA questionnaires*
* OA = osteoarthritis; P = pain subscale; S = stiffness subscale; PF = physical function subscale; Sy = symptoms subscale; A = activity limitations–daily living; SP = activity limitations–sport and recreation; Q = quality of life (hip or knee related); ? = no information found; NA = not available; see Table 1 for additional abbreviations.
† Population for which the questionnaire has been developed.
‡ Domains: pain, other symptoms, physical functioning, emotional functioning, and social functioning.
§ Scales: a subscore within a questionnaire.

Questionnaire (references) | Target population† | Domains‡ | No. of scales§ | No. of items | No. of response options | Range of scores | Time to administer, minutes | Mode of administration
Condition-specific questionnaires
WOMAC VA3.0 (22, 24, 32, 37, 42, 47) | Hip/knee OA | Pain, other symptoms, physical function | 3 | 24 | 0–100 | P: 0–500; S: 0–200; PF: 0–1,700 | <10 (paper), 10–15 (computer) | Self administered on paper and computer
WOMAC LK (28, 30, 32, 45, 52) | Hip/knee OA | Pain, other symptoms, physical function | 3 | 24 | 5 | P: 0–20; S: 0–8; PF: 0–68 | <10 | Self administered on paper and telephone
WOMAC numeric scale (25, 27, 38, 44, 46) | Hip/knee OA | Pain, other symptoms, physical function | 3 | 24 | 11 | P: 0–50; S: 0–20; PF: 0–170 | <10 (paper), 10–15 (computer) | Self administered on paper and computer
WOMAC signal (35) | Hip/knee OA | Pain, other symptoms, physical function | 3 | 3 |  |  | <10 | Self administered
WOMAC VA modified (23) | Hip/knee OA | Pain, physical function | 2 | 14 | 0–100 | P: 0–500; PF: 0–900 | 6 | Self administered
Lequesne Index knee (31, 33, 47) | Knee OA | Pain, physical function | 1 | 11 | 2–6 | 0–24 | 3–4 | Interview based
Lequesne Index hip (31, 33) | Hip OA | Pain, physical function | 1 | 11 | 2–6 | 0–24 | 3–4 | Interview based
Lequesne-knee self-reported (38) | Knee OA | Pain, physical function | 1 | 11 | 2–6 | 0–24 | 3–4 | Self administered
Lequesne hip self-reported (38) | Knee OA | Pain, physical function | 1 | 11 | 2–6 | 0–24 | 3–4 | Self administered
Lequesne modified (51) | Hip/knee OA | Pain, physical function | 1 | 10 | 2–6 | 0–23 | 3.25 | Interview based
HOOS (50, 53) | Hip OA | Pain, physical function, quality of life | 5 | 40 | 5 | P: 0–40; Sy: 0–20; A: 0–68; SP: 0–16; Q: 0–16 | 7–10 | Self administered
KOOS (39, 41, 54) | Knee OA | Pain, physical function, quality of life | 5 | 42 | 5 | P: 0–36; Sy: 0–28; A: 0–68; SP: 0–20; Q: 0–16 | 10 | Self administered
A Patient-Based Measure (40) | Knee OA | Pain, physical function | 1 | 12 | 2–6 | 0–100 | ? | Self administered
Knee Pain Scale (36) | Knee OA | Pain | 4 | 6 | 5/6 | 1–5 or 1–6 | ? | Self administered
SMFA (48) | Musculoskeletal extremity disorders | Physical function | 2 | 46 | 5 | PF: 34–170; bother: 12–60 | ? | Self administered
J-MAP (49) | Patients with pain in any joint | Pain | 2 | 9 | Varies | 0–100 | ? | Self administered
SF-36 disease specific for physical function and role limitations (55) | Every specific disease | Physical function | 2 | 14 | 2 or 3 | 0–100 | <10 | Self administered
HAQ (34, 52) | Arthritic conditions | Physical function | 8 | 20 | 2 or 4 | 0–3 | 5–8 | Self administered
AIMS (34) | Arthritic conditions | Pain, physical function, mental function, social function | 9 | 52 | Varies | 0–10 | 15–20 | Self administered
AIMS2 (29) | Arthritic conditions | Pain, physical function, mental function, social function | 12 | 78 | Varies | 0–10 | 23 | Self administered
AIMS2-SF (26) | Arthritic conditions | Pain, physical function, mental function, social function | 5 | 23 | 5 | 0–10 | ? | Self administered
IRGL (43) | Rheumatoid arthritis | Physical function, mental function, social function | 11 | 68 | 4 | Varies | 20 | Self administered
ADL difficulty scale (57) | Arthritic conditions | Physical function | 1 | 8 | 4 | 1–4 | 5 | Self administered
ADL pain scale (57) | Arthritic conditions | Pain | 1 | 8 | 4 | 1–4 | 5 | Self administered
Patient-specific questionnaires
Patient global assessment (57, 58) | General population | Physical function | NA | NA | Varies | Varies | ? | Self administered
Generic questionnaires
Single question pain (VAS) (57, 58) | General population | Pain | NA | NA | 0–100 | 0–100 | ? | Self administered
Single question pain (Likert) (58) | General population | Pain | NA | NA | 5 | 0–4 | ? | Self administered
NHP (43, 56) | General population | Physical function, mental function, social function | 6 | 38 | 2 | 0–100 | <10 | Self administered
SF-36 (25, 28, 42, 44, 52, 55) | General population | Pain, physical function, mental function, social function, general health | 8 | 36 | Varies | 0–100 | 10 | Self administered
SIP (34) | General population | Physical function, mental function, social function | 12 | 136 | 2 | 0–100 | 20–30 | Self administered
QR&S (43) | General population | Physical function | 2 | 32 | 2 | 0–10 | ? | Self administered

Some questionnaires have separate versions for hip OA and knee OA, such as the Hip Osteoarthritis Outcome Score (HOOS) and Knee Injury and Osteoarthritis Outcome Score (KOOS), and Lequesne Index Hip and Lequesne Index Knee. With the exception of the Patient Based Measure and the Knee Pain Scale, which were developed specifically for patients with knee OA, all other questionnaires can be used for both hip OA and knee OA. Of the condition-specific questionnaires, the AIMS2 had the largest number of items (n = 78), followed by the Influence of Rheumatic Disease on General Health and Lifestyle questionnaire (n = 68), AIMS (n = 52), and Short Musculoskeletal Function Assessment Questionnaire (SMFA; n = 46), whereas the Knee Pain Scale (n = 6) and Joint-Specific Multidimensional Assessment of Pain (J-MAP; n = 9) had the smallest number of items. Within the generic questionnaires, the number of items varied even more because the single pain questions (VAS and Likert) consisted of only 1 item and the Sickness Impact Profile (SIP) consisted of 136 items. The majority of questionnaires can be completed within 10 minutes.

Most studies involved patients with knee OA (n = 18) or both knee OA and hip OA (n = 16); only 3 studies included patients with only hip OA. Concerning the setting of the studies, 25 studies included patients from an outpatient setting (e.g., through general practitioner, hospital mailing), 8 studies described inpatients (mostly from a hospital or rehabilitation center), and 4 studies included both outpatients and inpatients.

Psychometric qualities.

The rating of the psychometric qualities of the hip/knee OA questionnaires is presented in Table 3, summarizing each aspect as good, doubtful, or poor quality. An empty spot indicates no or insufficient information about an aspect. Because most results of psychometric qualities are dependent on the population studied, the type of population is presented in the table (a distinction is made between outpatients and inpatients, and between hip OA and knee OA). None of the questionnaires in this review have been adequately tested on all psychometric qualities of the checklist in patients with hip and/or knee OA.

Table 3. Summary of the quality assessment of the included questionnaires*
* MCID = minimum clinically important difference; + = positive; − = negative; ø = doubtful; out = outpatients; in = inpatients; knee = only knee osteoarthritis; hip = only hip osteoarthritis; see Table 1 for additional abbreviations.
† Version on paper.
‡ Version on computer.
§ Floor effect.
¶ Assessed from WOMAC VA3.0.
# Floor effect for sports and recreation subscale.
** All scales, except reach and activities subscales (floor effect: negative).
†† Mobility subscale.
‡‡ All scales, except physical function and role limitations subscales (floor effect: negative).
§§ Ceiling effect.

Questionnaire (references) | Time to administer | Ease of scoring | Readability and comprehension | Content validity | Internal consistency | Construct validity | Floor/ceiling effect | Reliability | Agreement | Responsiveness | Interpretability | MCID | Positively rated qualities, no.
 WOMAC VA3.0  (22,24,32,37,42,47)+/−ø++ø, out+, out knee/ in knee ø, out hip/ knee hip+, out knee§ø, out/ in knee+, outø, out/in+, out+, out8
 WOMAC LK (28, 30, 32, 45, 52)++ +−, in/ø, outø, out/in+, out knee/ in kneeø, out+, out/inø, out/inø, out knee/ in knee 5
 WOMAC numeric scale  (25,27,38,44,46)+/−+ +ø, outø, out+, in§ø, out+, inø, out/in+, in+, in7
 WOMAC signal (35) ø       ø, out kneeø, out knee 0
 WOMAC VA3.0 modified (23)+ø++ +, out knee/ in knee +, out knee/ in knee    5
 Lequesne Index knee (31, 33, 47)+++  ø, out knee ø, out knee    3
 Lequesne Index hip (31, 33)+++  ø, out hip ø, out hip    3
 Lequesne–knee self-reported (38)++  ø, out kneeø, out knee ø, out knee    2
 Lequesne hip self-reported (38)++  ø, outø, out hip ø, out hip    2
 Lequesne modified (51)+++  +, out knee/ in knee +, out knee/ in knee    5
 HOOS (50, 53)+ø +ø, out hip+, in hip+, in hip+, out hip    5
 KOOS (39, 41, 54)+ø +ø, out knee+, out knee+, out knee/−, out knee#ø, out knee  +, out knee 5
 A Patient-Based Measure (40)  +, out kneeø, out knee+, out knee     2
 Knee pain scale (36) ø +, out knee+, out kneeø, out knee ø, out knee    2
 SMFA (48) ø++ ø, in knee+, out     3
 J-MAP (49)  ++, in kneeø, in knee      2
 SF-36 disease specific for physical function and role limitations (55)+   ø, out knee+, out knee      2
 HAQ (34, 52)+ø++ø,out knee/ in kneeø, out knee/ in knee+, out knee/ in knee**  ø, out knee/ in kneeø, out knee/ in knee 4
 AIMS (34)++ ø, out   ø, outø, out 2
 AIMS2 (29)+øø, out     ø, out 2
 AIMS2-SF (26)  ø+, out +   +, out 3
 IRGL (43)+       ø, out††ø, out†† 1
 ADL difficulty scale (57) ø   ø, out knee   ø, out kneeø, out knee 0
 ADL pain scale (57) ø   ø, out knee   ø, out kneeø, out knee 0
 Patient global assessment (57, 58)++   −, out knee   ø, out knee+, out knee 3
 Single question pain (VAS) (57, 58)+ø   ø, out knee   ø, out knee+, out knee 2
 Single question pain (Likert) (58)++   ø, out knee   ø, out knee+, out knee 3
 NHP (43, 56)+ +   ø, out hip ø, out††ø, out†† 2
 SF-36 (25, 28, 42, 44, 52, 55)+  ø, out knee+, out knee/ ø, out hip+, out knee‡‡/+, in§§ +, inø, out/in+, out/in+, in6
 SIP (34)ø + ø, out   ø, outø, out 1
 QR&S (43) ø +     ø, outø, out 1
Content validity.

Almost all instruments were scored positively on content validity, meaning that patients and investigators or experts were involved during the development of the questionnaire. Only one instrument, the Patient Based Measure, was scored negatively on content validity because consultation of patients in the development of the questionnaire was not reported.

Internal consistency.

For a positive rating of the internal consistency, information was needed on the construct of the questionnaire (investigated by factor analysis) and on Cronbach's α of each (sub)scale. Information on both aspects was available for 6 of the 32 questionnaires, of which only 4 had a positive rating (Patient Based Measure, Knee Pain Scale, J-MAP, and AIMS2-SF). The other 2 questionnaires, WOMAC, Likert scale (WOMAC LK) and HOOS, were rated as doubtful. For the WOMAC LK the a priori dimensions could not be confirmed by factor analysis. The dimensions of the HOOS could be supported, with the exception of the subscale “activity limitations-daily living,” which loaded as 2 factors. Besides this, the activity subscale had a Cronbach's α >0.95.

The dimensionality of 3 other questionnaires was studied by factor analysis, but no information was available on Cronbach's alpha of the subscales. The construct of the WOMAC VAS (3 subscales: pain, stiffness, and physical function; WOMAC-VA3.0) and modified WOMAC (pain and physical function subscales) could not be confirmed. A 2-factor solution for the Lequesne Index was found, while the Lequesne Index claims to measure a single construct (68).

In 9 instruments, information on internal consistency was restricted to information on Cronbach's alpha only, which ranged from 0.70 to 0.96. The exceptions were the HAQ and the subscale “role limitations” of the Short Form 36 (SF-36), which had a Cronbach's alpha <0.70.

Construct validity.

Of the 26 studies that investigated construct validity, only 7 (for knee OA: WOMAC VA3.0, modified WOMAC, modified Lequesne Index, KOOS, disease-specific SF-36, and SF-36; for hip OA: HOOS) presented hypotheses about the magnitude and direction of expected correlations with other instruments, which is a condition for a positive rating according to the criteria of the checklist. The correlations between most (subscales of) questionnaires measuring pain were moderate (r = 0.40–0.70). Concerning physical function, the Lequesne Index, the WOMAC physical function subscale, the SF-36 physical function subscale, and the SMFA had high correlations (r > 0.7) with each other. These results also apply to the HOOS, KOOS, and WOMAC because the physical function subscales of the HOOS and KOOS are equal to the WOMAC physical function subscale. Floor and/or ceiling effects were investigated for 10 questionnaires, mainly in outpatients with knee OA. No floor or ceiling effects were found, with the exception of some subscales (e.g., the sports and recreation subscale of the KOOS, which showed a floor effect for outpatients with knee OA).


Information on test–retest reliability was found for 13 questionnaires. Because of low ICCs (<0.70), low sample size (<50), or the use of other correlation measures than ICC, only 3 of the 13 questionnaires had a positive rating for reliability. The modified WOMAC and modified Lequesne Index appeared to be reliable questionnaires for patients with knee OA, whereas the HOOS was reliable for patients with hip OA (ICC between 0.78 and 0.95). Information on agreement was available for 4 instruments (WOMAC VA3.0, WOMAC LK, WOMAC numeric scale, and SF-36). Either the SEM or SDD were presented.


The responsiveness was investigated for 16 questionnaires. None of these studies presented hypotheses relating to the magnitude of change and/or relationships with change scores of other instruments. Therefore, all questionnaires were rated as doubtful on responsiveness. Responsiveness was quantified as either effect sizes, relative efficiency, or standardized mean scores. Change scores were also calculated, and correlations with change scores of other instruments were presented. Some studies compared the responsiveness of ≥2 questionnaires. In general, the WOMAC appeared to be more responsive compared with the SF-36 in both patients with hip OA and those with knee OA (25, 42, 44, 58). In patients with knee OA, the responsiveness of the SF-36 appeared to be comparable with the HAQ (52), just as the AIMS was as responsive as the SIP in patients with hip OA and knee OA (34).
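Two of the responsiveness statistics mentioned above can be sketched as follows. The included studies may have used different definitions; the sketch assumes the common conventions that the effect size is the mean change divided by the SD of the baseline scores, and that the standardized response mean is the mean change divided by the SD of the change scores. The function name and the data are hypothetical.

```python
import numpy as np


def responsiveness_statistics(baseline, followup):
    """Effect size and standardized response mean for paired scores."""
    b = np.asarray(baseline, dtype=float)
    f = np.asarray(followup, dtype=float)
    change = f - b
    return {
        # Mean change relative to the spread of baseline scores.
        "effect_size": change.mean() / b.std(ddof=1),
        # Mean change relative to the spread of the change scores.
        "standardized_response_mean": change.mean() / change.std(ddof=1),
    }


# Hypothetical pain scores (higher = worse) before and after treatment.
stats = responsiveness_statistics([50, 60, 70, 80], [40, 55, 60, 65])
```

Both statistics are negative here because the hypothetical scores decrease after treatment; neither statistic on its own satisfies the checklist, which required predefined hypotheses about expected change.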

Interpretability and MCID.

Eight questionnaires were rated positive on interpretability by presenting at least 2 of the 4 types of information. Only 1 study (that of the AIMS2-SF) intentionally paid attention to the interpretability of scores by comparing the scores on the AIMS2-SF in groups of patients that differed in duration of disease, number of comorbidities, and general health perception (26). The MCID was calculated for 2 questionnaires, the WOMAC numeric scale and the SF-36. For 13 questionnaires, means and standard deviations of baseline and followup scores or scores of relevant subgroups were presented.

Overall score.

After counting the total number of positive ratings for each instrument, the WOMAC VA3.0, WOMAC numeric scale, modified Lequesne Index, HOOS, and KOOS had the highest overall scores among the condition-specific instruments, with 8, 7, 5, 5, and 5 positive ratings, respectively. Concerning the generic questionnaires, the SF-36 obtained the highest overall score, with 6 positive ratings.


Discussion

An extensive search strategy led to the identification of 32 self-assessment questionnaires for the evaluation of pain and physical functioning and patient global assessment in patients with OA of the hip and/or knee, for which descriptive and psychometric qualities had been investigated. Most questionnaires were condition specific (n = 24); the remainder were generic (n = 7) and patient specific (n = 1). Twenty-two instruments were developed to rate pain; physical function was rated by 26 instruments. Concerning patient global assessment, only 1 instrument was found. Most studies included patients with knee OA (n = 18) or both knee and hip OA (n = 16); only 3 studies included patients with only hip OA. Many psychometric qualities were not properly tested for a large number of questionnaires, and none of the questionnaires were rated positive on all aspects of the checklist.

Overall, the condition-specific instruments (WOMAC, VAS version; Lequesne Index hip/knee; and HOOS/KOOS) had the best ratings for their descriptive and psychometric qualities for both pain and physical function. The WOMAC has been the most extensively studied instrument and received the best ratings for its descriptive and psychometric qualities. One should keep in mind that some instruments (such as the HOOS and KOOS) have not been studied extensively or have only been studied in other populations; ratings of these instruments might improve when more studies have been conducted on their psychometric qualities. Concerning generic instruments, the SF-36 has been studied most often and demonstrated, overall, the highest ratings. The psychometric qualities of the patient-specific instrument on patient global assessment have been studied to only a limited degree in patients with hip and/or knee OA. Therefore, only a small number of quality criteria could be rated. The same applied to the single pain questions (VAS and Likert scale), which were investigated in a small number of studies.

To compare the results of trials and optimize the transparency of care, a core set of questionnaires for patients with hip and/or knee OA seems to be indicated. For example, a core set of qualified questionnaires will facilitate the comparison and interpretation of the outcome of various treatment modalities in OA. At this time, guidelines for outcome dimensions in OA trials, such as the OMERACT, EULAR, FDA, and SADOA guidelines, differ in their recommendations of instruments or do not include recommendations at all (1). Our results suggest that, at this time, the most appropriate questionnaires to use in patients with hip and/or knee OA are the condition-specific questionnaire WOMAC and the generic questionnaire SF-36. Therefore, it is recommended that these questionnaires, supplemented with a patient-specific instrument on patient global assessment, be included in guidelines as a core set of instruments for patients with OA of the hip and/or knee. However, more research is needed on the psychometric qualities of patient global assessment measures before the most appropriate instrument for patients with hip and/or knee OA can be chosen.

Nonetheless, which scale is most appropriate to use always depends on the particular purpose of the assessment. For example, for discriminative purposes, the instrument should have satisfactory ratings for reproducibility and agreement. The modified WOMAC, modified Lequesne Index, and HOOS were the only questionnaires with a positive rating for test–retest reliability. Alternatively, when the purpose is to evaluate changes over time, an instrument should have positive ratings for responsiveness and no floor or ceiling effects. Currently, all questionnaires received the same rating for responsiveness, namely, doubtful. In general, the condition-specific instrument WOMAC appeared to be more responsive than the generic SF-36. Both in daily practice and in clinical trials, changes over time are frequently evaluated; therefore, responsiveness of the instrument is an important condition in selecting a questionnaire. The low ratings on responsiveness are remarkable, and more solid research on responsiveness is needed.
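As an illustration only, the two responsiveness statistics named in the checklist (effect size and standardized response mean) and the 15% floor/ceiling criterion can be sketched as follows. The function names and the sample scores are hypothetical and are not taken from any study in this review; the formulas are the conventional definitions, which we assume here.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def has_floor_or_ceiling_effect(scores, lowest, highest, threshold=0.15):
    """True if more than `threshold` of respondents score at either extreme."""
    n = len(scores)
    at_floor = sum(score == lowest for score in scores) / n
    at_ceiling = sum(score == highest for score in scores) / n
    return at_floor > threshold or at_ceiling > threshold


# Hypothetical scores for five patients before and after treatment
# (lower score = fewer symptoms).
baseline = [10, 12, 14, 16, 18]
follow_up = [8, 9, 13, 12, 15]

print(round(effect_size(baseline, follow_up), 2))
print(round(standardized_response_mean(baseline, follow_up), 2))
```

Because the two statistics divide the same mean change by different standard deviations, they can differ considerably for the same data, which is one reason that explicit cutoff points are needed before instruments can be compared on responsiveness.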

The dimensionality of only 9 questionnaires was tested using factor analysis. When the dimensionality of a questionnaire has not been analyzed, the internal consistency as reflected by Cronbach's alpha might not be interpretable (69). The theoretical dimensional structure of only 4 questionnaires could be confirmed (namely, Patient Based Measure, Knee Pain Scale, J-MAP, and AIMS2-SF). The factor analysis of the other 5 instruments (WOMAC VA3.0, WOMAC LK, modified WOMAC, Lequesne Index, and HOOS) yielded either more or fewer dimensions than stated a priori; therefore, internal consistency was rated as doubtful. It needs to be considered that only studies based on classical test theory were included in the present review. Item response theory (IRT) also provides a model to evaluate health status questionnaires. In total, 3 studies on the WOMAC were excluded from this review because Rasch analyses were performed.
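For readers unfamiliar with the statistic, a minimal sketch of Cronbach's alpha for one subscale is given below. It assumes the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score); the function name and the response data are hypothetical. The value is meaningful only if the items form a single dimension, which is why the checklist also asks for factor analysis.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one (sub)scale.

    `responses` holds one row per respondent and one column per item.
    """
    n_items = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of four respondents to a two-item subscale.
responses = [[2, 3], [4, 4], [3, 5], [5, 6]]
alpha = cronbach_alpha(responses)
print(round(alpha, 2), alpha > 0.70)
```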

Some limitations of this study have to be mentioned. First, caution is recommended when generalizing the results, because we excluded studies of patients after operations (e.g., total hip replacement and total knee replacement) or other invasive interventions. Furthermore, because it is uncertain whether psychometric qualities of translated versions can be generalized to the original version, we only included the English version (or, in the absence of an English version, the native version) of the questionnaires. In total, 20 studies (concerning the WOMAC [n = 12], KOOS [n = 2], AIMS [n = 2], Nottingham Health Profile [n = 2], Lequesne Index [n = 1], and VAS [n = 1]) were excluded because non-English versions were evaluated. The results of this review are only applicable to the included populations and questionnaires.

Second, the criteria we used to evaluate the quality of the instruments were helpful in providing information on the practical and psychometric properties to facilitate the choice between questionnaires. However, in our opinion, there is room for improvement of the checklist. For one, no instructions are given on how to determine the overall best instrument. We counted the number of positive ratings to make an overall judgement of the instruments, which implies that all qualities are equally important. In addition, the criterion that specific hypotheses be postulated for construct validity and responsiveness can be questioned. The absence of hypotheses in a publication might be due to a shortcoming of the author rather than to a lower psychometric quality of the instrument. In contrast, the need for clearly defined objective cutoff points to rate construct validity and responsiveness is high. Strikingly, the authors of all studies that investigated construct validity and responsiveness of questionnaires concluded that the questionnaires were valid and responsive instruments for patients with hip or knee OA. The present criteria on postulating hypotheses are a first step towards clearly defining these objective cutoff points. Furthermore, as suggested by Bot et al (9), authors can contribute to a good rating of questionnaires by clearly presenting the results of the studies they performed. The checklist, as used in the present review, might be a good tool for authors to check whether their results are systematically and unambiguously presented.
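The consequence of counting positive ratings can be made concrete with a small sketch. The instruments, ratings, and weights below are hypothetical and serve only to show that an unweighted count treats every quality as equally important, whereas any weighting (here an arbitrary assumption) can change the ranking.

```python
def positive_score(ratings, weights=None):
    """Sum the weights of all qualities rated "+".

    With no weights given, every quality counts 1, which reproduces a
    plain count of positive ratings.
    """
    weights = weights or {}
    return sum(
        weights.get(quality, 1.0)
        for quality, rating in ratings.items()
        if rating == "+"
    )


# Hypothetical ratings: "+" positive, "ø" doubtful, "-" negative,
# "" no information found.
instrument_a = {
    "content validity": "+",
    "internal consistency": "ø",
    "test-retest reliability": "+",
    "responsiveness": "ø",
}
instrument_b = {
    "content validity": "+",
    "internal consistency": "-",
    "test-retest reliability": "",
    "responsiveness": "+",
}

# Unweighted counts tie; an (assumed) extra weight on responsiveness does not.
evaluative_weights = {"responsiveness": 2.0}
print(positive_score(instrument_a), positive_score(instrument_b))
print(
    positive_score(instrument_a, evaluative_weights),
    positive_score(instrument_b, evaluative_weights),
)
```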

In conclusion, although the final choice of a questionnaire depends on the purpose of the assessment, the WOMAC VA3.0 and SF-36 currently demonstrate the highest ratings overall for both descriptive and psychometric qualities. Therefore, these questionnaires are recommended for evaluating pain and physical function in patients with hip and/or knee OA. Supplemented with a measure of patient global assessment, these instruments could be recommended in guidelines concerning outcome measurement in OA trials.


  • 1
    Bellamy N. Osteoarthritis clinical trials: candidate variables and clinimetric properties. J Rheumatol 1997; 24: 768–78.
  • 2
    Bellamy N, Kirwan J, Boers M, Brooks P, Strand V, Tugwell P, et al. Recommendations for a core set of outcome measures for future phase III clinical trials in knee, hip, and hand osteoarthritis: consensus development at OMERACT III. J Rheumatol 1997; 24: 799–802.
  • 3
    World Health Organization, Regional Office for Europe. Guidelines for the clinical investigation of drugs used in rheumatic diseases: European drug guidelines. Series 5. Copenhagen: European League Against Rheumatism; 1985.
  • 4
    Lequesne M, Brandt K, Bellamy N, Moskowitz R, Menkes CJ, Pelletier JP, et al. Guidelines for testing slow acting drugs in osteoarthritis [published erratum appears in J Rheumatol Suppl 1994;21:2395]. J Rheumatol Suppl 1994; 41: 65–73.
  • 5
    Bellamy N. Outcome measurement in osteoarthritis clinical trials. J Rheumatol Suppl 1995; 43: 49–51.
  • 6
    Rogers JC, Irrgang JJ. Measures of adult lower extremity function: the American Academy of Orthopedic Surgeons Lower Limb Questionnaire, the Activities of Daily Living Scale of the Knee Outcome Survey (ADLS), Foot Function Index (FFI), Functional Assessment System (FAS), Harris Hip Score (HHS), Index of Severity for Hip Osteoarthritis (ISH), Index of Severity for Knee Osteoarthritis (ISK), Knee Injury and Osteoarthritis Outcome Score (KOOS), and Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC). Arthritis Care Res 2003; 49 Suppl 5: S67–84.
  • 7
    Sun Y, Sturmer T, Gunther KP, Brenner H. Reliability and validity of clinical outcome measurements of osteoarthritis of the hip and knee: a review of the literature. Clin Rheumatol 1997; 16: 185–98.
  • 8
    Garratt AM, Brealey S, Gillespie WJ, and the DAMASK Trial Team. Patient-assessed health instruments for the knee: a structured review. Rheumatology (Oxford) 2004; 43: 1414–23.
  • 9
    Bot SD, Terwee CB, van der Windt DA, Bouter LM, Dekker J, de Vet HC. Clinimetric evaluation of shoulder disability questionnaires: a systematic review of the literature. Ann Rheum Dis 2004; 63: 335–41.
  • 10
    De Boer MR, Moll AC, de Vet HC, Terwee CB, Volker-Dieben HJ, van Rens GH. Psychometric properties of vision-related quality of life questionnaires: a systematic review. Ophthalmic Physiol Opt 2004; 24: 257–73.
  • 11
    Bot SD, Terwee CB, van der Windt DA, Bouter LM, Dekker J, de Vet HC. Psychometric evaluation of self-report questionnaires: the development of a checklist. In: Ader HJ, Mellenbergh GJ, editors. Proceedings of the second workshop on research methodology 25–27 June 2003. Amsterdam: VU University Amsterdam; 2003. p. 161–8.
  • 12
    Lohr KN, Aaronson NK, Alonso J, Burnam MA, Patrick DL, Perrin EB, et al. Evaluating quality-of-life and health status instruments: development of scientific review criteria. Clin Ther 1996; 18: 979–92.
  • 13
    Bombardier CF, Tugwell P. Methodological considerations in functional assessment. J Rheumatol Suppl 1987; 14 Suppl 15: 6–10.
  • 14
    Binkley J. Measurement of functional status, progress, and outcome in orthopaedic clinical practice. Orthopaedic Practice 1999; 11: 14–21.
  • 15
    Guyatt G, Feeny D, Patrick D. Issues in quality-of-life measurement in clinical trials. Control Clin Trials 1991; 12 Suppl: 81S–90S.
  • 16
    Nunnally J. Psychometric theory. 2nd ed. New York: McGraw-Hill; 1978.
  • 17
    Kirshner BF, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis 1985; 38: 27–36.
  • 18
    McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res 1995; 4: 293–307.
  • 19
    Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; 1: 307–10.
  • 20
    Beckerman H, Roebroeck ME, Lankhorst GJ, Becher JG, Bezemer PD, Verbeek AL. Smallest real difference, a link between reproducibility and responsiveness. Qual Life Res 2001; 10: 571–8.
  • 21
    Terwee CB, Dekker FW, Wiersinga WM, Prummel MF, Bossuyt PM. On assessing responsiveness of health-related quality of life instruments: guidelines for instrument evaluation. Qual Life Res 2003; 12: 349–62.
  • 22
    Bellamy N, Kean WF, Buchanan WW, Gerecz-Simon E, Campbell J. Double blind randomized controlled trial of sodium meclofenamate (Meclomen) and diclofenac sodium (Voltaren): post validation reapplication of the WOMAC Osteoarthritis Index. J Rheumatol 1992; 19: 153–9.
  • 23
    Faucher M, Poiraudeau S, Lefevre-Colau MM, Rannou F, Fermanian J, Revel M. Assessment of the test-retest reliability and construct validity of a modified WOMAC index in knee osteoarthritis. Joint Bone Spine 2004; 71: 121–7.
  • 24
    Feeny D, Blanchard CM, Mahon JL, Bourne R, Rorabeck C, Stitt L, et al. The stability of utility scores: test-retest reliability and the interpretation of utility scores in elective total hip arthroplasty. Qual Life Res 2004; 13: 15–22.
  • 25
    Angst F, Aeschlimann A, Stucki G. Smallest detectable and minimal clinically important differences of rehabilitation intervention with their implications for required sample sizes using WOMAC and SF-36 quality of life measurement instruments in patients with osteoarthritis of the lower extremities. Arthritis Rheum 2001; 45: 384–91.
  • 26
    Ren XS, Kazis L, Meenan RF. Short-form Arthritis Impact Measurement Scales 2: tests of reliability and validity among patients with osteoarthritis. Arthritis Care Res 1999; 12: 163–71.
  • 27
    Angst F, Aeschlimann A, Michel BA, Stucki G. Minimal clinically important rehabilitation effects in patients with osteoarthritis of the lower extremities. J Rheumatol 2002; 29: 131–8.
  • 28
    Davey RC, Edwards SM, Cochrane T. Test-retest reliability of lower extremity functional and self-reported measures in elderly with osteoarthritis. Adv Physiother 2003; 5: 155–60.
  • 29
    Meenan RF, Mason JH, Anderson JJ, Guccione AA, Kazis LE. AIMS2: the content and properties of a revised and expanded Arthritis Impact Measurement Scales Health Status Questionnaire. Arthritis Rheum 1992; 35: 1–10.
  • 30
    Kennedy D, Stratford PW, Pagura SM, Wessel J, Gollish JD, Woodhouse LJ. Exploring the factorial validity and clinical interpretability of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC). Physiother Can 2003; 55: 160–8.
  • 31
    Lequesne MG, Mery C, Samson M, Gerard P. Indexes of severity for osteoarthritis of the hip and knee: validation: value in comparison with other assessment tests [published errata appear in Scand J Rheumatol 1988;17:following 241 and Scand J Rheumatol Suppl 1988;73:1]. Scand J Rheumatol Suppl 1987; 65: 85–9.
  • 32
    Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol 1988; 15: 1833–40.
  • 33
    Lequesne M. Indices of severity and disease activity for osteoarthritis. Semin Arthritis Rheum 1991; 20 Suppl 2: 48–54.
  • 34
    Weinberger M, Samsa GP, Tierney WM, Belyea MJ, Hiner SL. Generic versus disease specific health status measures: comparing the sickness impact profile and the arthritis impact measurement scales. J Rheumatol 1992; 19: 543–6.
  • 35
    Barr S, Bellamy N, Buchanan WW, Chalmers A, Ford PM, Kean WF, et al. A comparative study of signal versus aggregate methods of outcome measurement based on the WOMAC Osteoarthritis Index. J Rheumatol 1994; 21: 2106–12.
  • 36
    Rejeski WJ, Ettinger WH Jr, Shumaker S, Heuser MD, James P, Monu J, et al. The evaluation of pain in patients with knee osteoarthritis: the knee pain scale. J Rheumatol 1995; 22: 1124–9.
  • 37
    Bellamy N, Campbell J, Stevens J, Pilch L, Stewart C, Mahmood Z. Validation study of a computerized version of the Western Ontario and McMaster Universities VA3.0 Osteoarthritis Index. J Rheumatol 1997; 24: 2413–5.
  • 38
    Stucki G, Sangha O, Stucki S, Michel BA, Tyndall A, Dick W, et al. Comparison of the WOMAC (Western Ontario and McMaster Universities) osteoarthritis index and a self-report format of the self-administered Lequesne-Algofunctional index in patients with knee and hip osteoarthritis. Osteoarthritis Cartilage 1998; 6: 79–86.
  • 39
    Roos EM, Roos HP, Lohmander LS, Ekdahl C, Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS): development of a self-administered outcome measure. J Orthop Sports Phys Ther 1998; 28: 88–96.
  • 40
    Clark JA, Spiro A 3rd, Fincke G, Miller DR, Kazis LE. Symptom severity of osteoarthritis of the knee: a patient-based measure developed in the veterans health study. J Gerontol A Biol Sci Med Sci 1998; 53: M351–60.
  • 41
    Roos EM, Roos HP, Lohmander LS, and the Western Ontario and MacMaster Universities. WOMAC Osteoarthritis Index: additional dimensions for use in subjects with post-traumatic osteoarthritis of the knee. Osteoarthritis Cartilage 1999; 7: 216–21.
  • 42
    Davies GM, Watson DJ, Bellamy N. Comparison of the responsiveness and relative effect size of the Western Ontario and McMaster Universities Osteoarthritis Index and the Short-Form Medical Outcomes Study Survey in a randomized, clinical trial of osteoarthritis patients. Arthritis Care Res 1999; 12: 172–9.
  • 43
    Steultjens MP, Roorda LD, Dekker J, Bijlsma JW. Responsiveness of observational and self-report methods for assessing disability in mobility in patients with osteoarthritis. Arthritis Rheum 2001; 45: 56–61.
  • 44
    Angst F, Aeschlimann A, Steiner W, Stucki G. Responsiveness of the WOMAC osteoarthritis index as compared with the SF-36 in patients with osteoarthritis of the legs undergoing a comprehensive rehabilitation intervention. Ann Rheum Dis 2001; 60: 834–40.
  • 45
    Bellamy N, Campbell J, Hill J, Band P. A comparative study of telephone versus onsite completion of the WOMAC 3.0 osteoarthritis index. J Rheumatol 2002; 29: 783–6.
  • 46
    Theiler R, Spielberger J, Bischoff HA, Bellamy N, Huber J, Kroesen S. Clinical evaluation of the WOMAC 3.0 OA Index in numeric rating scale format using a computerized touch screen version. Osteoarthritis Cartilage 2002; 10: 479–81.
  • 47
    Faucher M, Poiraudeau S, Lefevre-Colau MM, Rannou F, Fermanian J, Revel M. Algo-functional assessment of knee osteoarthritis: comparison of the test-retest reliability and construct validity of the WOMAC and Lequesne indexes. Osteoarthritis Cartilage 2002; 10: 602–10.
  • 48
    Kirschner S, Walther M, Bohm D, Matzer M, Heesen T, Faller H, et al. German short musculoskeletal function assessment questionnaire (SMFA-D): comparison with the SF-36 and WOMAC in a prospective evaluation in patients with primary osteoarthritis undergoing total knee arthroplasty. Rheumatol Int 2003; 23: 15–20.
  • 49
    O'Malley KJ, Suarez-Almazor M, Aniol J, Richardson P, Kuykendall DH, Moseley JB Jr, et al. Joint-specific multidimensional assessment of pain (J-MAP): factor structure, reliability, validity, and responsiveness in patients with knee osteoarthritis. J Rheumatol 2003; 30: 534–43.
  • 50
    Klassbo M, Larsson E, Mannevik E. Hip disability and osteoarthritis outcome score: an extension of the Western Ontario and McMaster Universities Osteoarthritis Index. Scand J Rheumatol 2003; 32: 46–51.
  • 51
    Faucher M, Poiraudeau S, Lefevre-Colau MM, Rannou F, Fermanian J, Revel M. Assessment of the test-retest reliability and construct validity of a modified Lequesne index in knee osteoarthritis. Joint Bone Spine 2003; 70: 521–5.
  • 52
    Brazier JE, Harper R, Munro J, Walters SJ, Snaith ML. Generic and condition-specific outcome measures for people with osteoarthritis of the knee. Rheumatology (Oxford) 1999; 38: 870–7.
  • 53
    Nilsdotter AK, Lohmander LS, Klassbo M, Roos EM. Hip disability and osteoarthritis outcome score (HOOS): validity and responsiveness in total hip replacement. BMC Musculoskelet Disord 2003; 4: 10.
  • 54
    Roos EM, Toksvig-Larsen S. Knee injury and Osteoarthritis Outcome Score (KOOS): validation and comparison to the WOMAC in total knee replacement. Health Qual Life Outcomes 2003; 1: 17.
  • 55
    Ren XS, Kazis L, Lee A, Miller DR, Clark JA, Skinner K, et al. Comparing generic and disease-specific measures of physical and role functioning: results from the Veterans Health Study. Med Care 1998; 36: 155–66.
  • 56
    Hunt SM, McKenna SP, Williams J. Reliability of a population survey tool for measuring perceived health problems: a study of patients with osteoarthrosis. J Epidemiol Community Health 1981; 35: 297–300.
  • 57
    Brooks RH, Callahan LF, Pincus T. Use of self-report activities of daily living questionnaires in osteoarthritis. Arthritis Care Res 1988; 1: 23–32.
  • 58
    Bolognese JA, Schnitzer TJ, Ehrich EW. Response relationship of VAS and Likert scales in osteoarthritis efficacy measurement. Osteoarthritis Cartilage 2003; 11: 499–507.
  • 59
    Swiontkowski MF, Engelberg R, Martin DP, Agel J. Short musculoskeletal function assessment questionnaire: validity, reliability, and responsiveness. J Bone Joint Surg Am 1999; 81: 1245–60.
  • 60
    Roorda LD, Roebroeck ME, Lankhorst GJ, van Tilburg T, Bouter LM. Measuring functional limitations in rising and sitting down: development of a questionnaire. Arch Phys Med Rehabil 1996; 77: 663–9.
  • 61
    Fries JF. The assessment of disability: from first to future principles. Br J Rheumatol 1983; 22 Suppl: 48–58.
  • 62
    Meenan RF, Gertman PM, Mason JH. Measuring health status in arthritis: the Arthritis Impact Measurement Scales. Arthritis Rheum 1980; 23: 146–52.
  • 63
    Callahan LF, Brooks RH, Summey JA, Pincus T. Quantitative pain assessment for routine care of rheumatoid arthritis patients, using a pain scale based on activities of daily living and a visual analog pain scale. Arthritis Rheum 1987; 30: 630–6.
  • 64
    Bergner M, Bobbitt RA, Carter WB, Gilson BS. The Sickness Impact Profile: development and final revision of a health status measure. Med Care 1981; 19: 787–805.
  • 65
    De Bruin AF, de Witte LP, Stevens F, Diederiks JP. Sickness Impact Profile: the state of the art of a generic functional status measure. Soc Sci Med 1992; 35: 1003–14.
  • 66
    Huiskes CJ, Kraaimaat FW, Bijlsma JW. De ontwikkeling van de IRGL: een instrument om gezondheid te meten bij patiënten met reuma. Gedrag Gezond 1990; 18: 78–89.
  • 67
    Huiskes CJ, Kraaimaat FW, Bijlsma JW. Development of a self-report questionnaire to assess the impact of rheumatic diseases on health and lifestyle. J Rehab Sci 1990; 3: 65–70.
  • 68
    Lequesne MG, Mery CF, Samson M, Marty M. Comparison between the WOMAC and the Lequesne indices in patients with knee and hip osteoarthritis [letter]. Osteoarthritis Cartilage 1998; 6: 441–2.
  • 69
    Cortina JM. What is coefficient alpha? An examination of theory and applications. J Appl Psychol 2004; 78: 98–104.


| Psychometric quality | Definition | Criteria used to rate the psychometric quality |
| --- | --- | --- |
| Time to administer | Time needed to complete the questionnaire | Rating: [+] less than 10 minutes; [−] more than 10 minutes; [ ] no information found on time to administer |
| Ease of scoring | Ease of method used to calculate the questionnaire's score | Rating: [+] easy: summing up of the items; [ø] moderate: visual analog scale (VAS) or simple formula; [−] difficult: VAS in combination with formula or complex formula; [ ] no information found on calculation of score |
| Readability and comprehension | The questionnaire is understandable for all patients | Rating: [+] readability tested; result was good; [−] inadequate readability; [ ] no information found on readability and comprehension |
| Content validity | The extent to which the domain of interest is comprehensively sampled by the items in the questionnaire | 1) Patients were involved during item selection and/or item reduction. 2) Patients were consulted for reading and comprehension. Rating: [+] patients and investigator/expert involved; [ø] patients only; [−] no patient involvement; [ ] no information found on content validity |
| Internal consistency | The extent to which items in a (sub)scale are intercorrelated; a measure of the homogeneity of a (sub)scale | 1) Factor analysis was applied in order to provide empirical support for the dimensionality of the questionnaire. 2) Cronbach's alpha >0.70 for each dimension/subscale. Rating: [+] adequate design and method; factor analysis supporting the dimension; α > 0.70; [ø] doubtful method used or no factor analysis; [−] inadequate internal consistency (α < 0.70) or dimensions not supported by factor analysis; [ ] no information found on internal consistency |
| Construct validity | The extent to which scores on the questionnaire relate to other measures in a manner that is consistent with theoretically derived hypotheses concerning the domains that are measured | 1) Hypotheses were formulated. 2) Results were acceptable in accordance with ≥75% of hypotheses. 3) An adequate measure was used. Rating: [+] adequate design, method, and result; [ø] doubtful method used; [−] adequate design and method and inadequate construct validity; [ ] no information found on construct validity |
| Floor and ceiling effects | The questionnaire fails to demonstrate a worse score in patients who clinically deteriorated and an improved score in patients who clinically improved | 1) Descriptive statistics of the distribution of scores were presented. 2) ≤15% of respondents achieved the highest or lowest possible score. Rating: [+] no floor/ceiling effects; [−] >15% in extremities; [ ] no information found on floor and ceiling effects |
| Test–retest reliability | The extent to which the same results are obtained on repeated administrations of the same questionnaire when no change in physical functioning has occurred | 1) Calculation of an intraclass correlation coefficient (ICC); ICC > 0.70. 2) Time interval and confidence intervals (or n > 50) were presented. Rating: [+] adequate design, method, and ICC > 0.70; [ø] doubtful method; [−] inadequate reliability, with adequate design and method; [ ] no information found on test–retest reliability |
| Agreement | The ability to produce exactly the same scores with repeated measurements | 1) For evaluative questionnaires, reliability agreement should be assessed. 2) Limits of agreement, Kappa, or standard error of measurement was presented. Rating: [+] adequate design, method, and result; [ø] doubtful method used; [−] inadequate agreement, with adequate design and method; [ ] no information found on agreement |
| Responsiveness | The ability to detect change over time in the concept being measured | 1) For evaluative questionnaires, responsiveness should be assessed. 2) Hypotheses were formulated and results were in agreement with ≥75% of hypotheses. 3) An adequate measure was used (effect size, standardized response mean, comparison with external standard). Rating: [+] adequate design, method, and result; [ø] doubtful method used; [−] inadequate responsiveness with adequate design, method; [ ] no information found on responsiveness |
| Interpretability | The degree to which one can assign qualitative meaning to quantitative scores | Authors provided information on the interpretation of scores: (1) presentation of means and SD of scores before and after treatment; (2) comparative data on the distribution of scores in relevant subgroups; (3) information on the relationship of scores to well-known functional measures or clinical diagnosis; (4) information on the association between changes in score and patients' global ratings of the magnitude of change they experienced. Rating: [+] ≥2 of above types of information was presented; [ø] doubtful method used or doubtful description; 1 type of information was presented; [ ] no information found on interpretability |
| Minimum clinically important difference (MCID) | The smallest difference in score in the domain of interest that patients perceive as beneficial and would mandate a change in a patient's treatment | Information is provided about what (difference in) score would be clinically meaningful. Rating: [+] MCID is presented; [ ] no information found on MCID |
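
To make the test–retest reliability and agreement criteria concrete, the sketch below computes an ICC and Bland–Altman limits of agreement (19) for two administrations of a questionnaire. The checklist does not state which form of the ICC is required; a two-way random-effects, absolute-agreement, single-measurement ICC is assumed here, and the function names and scores are hypothetical.

```python
from statistics import mean, stdev


def icc_absolute_agreement(scores):
    """ICC (two-way random effects, absolute agreement, single measurement).

    `scores` holds one row per patient and one column per administration.
    """
    n, k = len(scores), len(scores[0])
    grand = mean(value for row in scores for value in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares: patients, administrations, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((value - grand) ** 2 for row in scores for value in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def limits_of_agreement(test, retest):
    """Bland-Altman 95% limits: mean difference +/- 1.96 SD of differences."""
    differences = [b - a for a, b in zip(test, retest)]
    centre, spread = mean(differences), 1.96 * stdev(differences)
    return centre - spread, centre + spread


# Hypothetical scores of five stable patients on two occasions.
test = [10, 20, 30, 40, 50]
retest = [12, 19, 31, 42, 49]

icc = icc_absolute_agreement([list(pair) for pair in zip(test, retest)])
print(round(icc, 3), icc > 0.70)
print(tuple(round(limit, 2) for limit in limits_of_agreement(test, retest)))
```

The ICC expresses reliability relative to the variation between patients, whereas the limits of agreement are expressed in the units of the questionnaire score, which is why the checklist rates the two separately.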