Measures of adult shoulder function: Disabilities of the Arm, Shoulder, and Hand Questionnaire (DASH) and Its Short Version (QuickDASH), Shoulder Pain and Disability Index (SPADI), American Shoulder and Elbow Surgeons (ASES) Society Standardized Shoulder Assessment Form, Constant (Murley) Score (CS), Simple Shoulder Test (SST), Oxford Shoulder Score (OSS), Shoulder Disability Questionnaire (SDQ), and Western Ontario Shoulder Instability Index (WOSI)

INTRODUCTION

A large number of instruments measure symptoms and function of the shoulder. More than 30 different tools can be found by entering “shoulder” and “assessment” into PubMed and conducting a review of the ≥3,000 retrieved references. The literature for every instrument was systematically reviewed using the key words “shoulder” and the instrument's name. We selected those that are cited in at least 20 references and for which psychometric testing has been reported. For each of these 9 tools, the 10–20 most informative studies about psychometric results were selected for citation to keep the reference lists short, but the entire body of literature was reviewed.

The Disabilities of the Arm, Shoulder, and Hand questionnaire (DASH), together with its short form (QuickDASH), is the most widespread and best-tested and characterized instrument for shoulder assessment. However, it is region specific, i.e., specific to the arm, not just to the shoulder. The DASH stands out as an instrument positioned between the generic (as, for example, the Short Form 36) and the shoulder-specific measures, i.e., all other tools of the review: it forms the link between these 2 philosophies. It is a must for comprehensive assessment in conditions affecting different regions of the arm and for research studies. This review was focused only on shoulder studies of the DASH/QuickDASH.

The Shoulder Pain and Disability Index (SPADI), the Constant (Murley) Score (CS), and the American Shoulder and Elbow Surgeons (ASES) questionnaire for the shoulder are also well characterized and accepted in the scientific community. Their responsiveness is comparable. The SPADI is, together with the patient ASES, the shortest self-assessment and shows high validity. The ASES is a sophisticated measure for the patient and the examiner offering a relatively large number of items, often too long for clinicians. There are sparse data about the clinical (examiner-based) part of the ASES. The CS is the shortest self- and examiner-based tool. It combines the data of both into 1 total score. However, its intertester reliability is low and its validity is affected by the problem of different protocols on how to measure strength.

The Simple Shoulder Test (SST) is very short, very easy to understand and to score, and widely used in the US. The binary item-response options (yes/no) affect the usability of the SST as a metric score, its validity, and its comparability to other scores; the same is true for the Shoulder Disability Questionnaire (SDQ). The Oxford Shoulder Score (OSS) was developed specifically for surgical conditions and is often used in the UK. It is very short, but there is a lack of psychometric testing data. The SDQ is very short but cannot be recommended due to absence of data on or weakness of psychometric properties.

Finally, the Western Ontario Shoulder Instability Index (WOSI) was selected because, in the last few years, it has become the most often used and best psychometrically tested assessment of shoulder instability, although there is still a lack of testing data.

For a set of clinical assessment tools, we recommend the QuickDASH, the SPADI (or the patient ASES), and the CS, and the WOSI if instability is part of the condition. For a research set, the DASH, the SPADI, and, possibly, the clinical part of the ASES or the CS can be recommended in order to (also) obtain more information about examiner-based data.

DISABILITIES OF THE ARM, SHOULDER, AND HAND QUESTIONNAIRE (DASH) AND ITS SHORT VERSION (QUICKDASH)

Description

Purpose.

Self-assessment of symptoms and function of the entire upper extremity (1).

Settings.

All domains, any or multiple disorders of the upper extremity.

Versions.

Original version (30 items) and derivations of it as short versions (11 or 9 items); preliminary publication in 1996 (2), first publication of the manual in 1999, second edition in 2002, and third edition in 2011 (1); QuickDASH in 2005 (3); and QuickDASH-9 in 2009 (4).

Content and number of items.

30 items (total score): 6 items about symptoms (3 about pain, 1 for tingling/numbness, 1 for weakness, 1 for stiffness) and 24 about function (21 about physical function, 3 about social/role function). Determination of the subscores symptoms and function is possible, but this is not originally described (1, 5–9). Two optional additional modules for work (4 items) and sports/performing arts (4 items) are used less often in patient settings than for manual workers and athletes. The “classic” QuickDASH has 11 items (3 for symptoms, 8 for function) and will be referred to throughout as the “QuickDASH” (3, 10, 11). Other short versions exist, e.g., the QuickDASH-9 (1 item for pain, 8 for function), but are rarely used and not supported by the authors of the original (1, 4).

Response options/scale.

All items are scored on a scale of 5 (Likert) levels: 1 = no difficulty/symptoms, 2 = mild difficulty/symptoms, 3 = moderate difficulty/symptoms, 4 = severe difficulty/symptoms, and 5 = extreme difficulty (unable to do)/symptoms.

Recall period for items.

1 week.

Endorsements.

American Academy of Orthopaedic Surgeons and Institute for Work & Health (IWH) (1).

Examples of use.

Relevant settings (aims and analysis [references]) for the DASH are as follows:

  • Various regions of upper extremity (development of the DASH [2])

  • Various regions of upper extremity (DASH manual: third edition [1])

  • Various regions of upper extremity (population normative data [1, 12])

  • Shoulder instruments (important comparative reviews [7, 13])

  • Various regions of upper extremity (reliability, validity, responsiveness [14])

  • Various operations of upper extremity (responsiveness [15])

  • Various regions of upper extremity (validity, factor, Rasch [9])

  • Upper extremity, neck pain (validity, responsiveness [16])

  • Upper extremity, lower extremity (validity [17])

  • Rheumatoid arthritis (reliability, validity [18])

  • Multiple sclerosis (reliability, validity, Rasch [8])

  • Shoulder arthroplasty (responsiveness [19])

  • Adhesive capsulitis (validity, responsiveness [20])

  • Shoulder impingement, tendinitis (validity, responsiveness, minimum clinically important difference [MCID] [21])

  • Proximal humerus fracture (reliability, validity [22])

  • Elbow, arthroplasty (validity [23, 24])

  • Distal radius fracture (reliability, validity, responsiveness [25])

  • Hand osteoarthritis, fractures (responsiveness [26])

  • Hand, various (validity, German DASH [5])

  • Rhizarthrosis (validity [27])

Relevant settings (aims and analysis [references]) for the QuickDASH are as follows:

  • Various regions of upper extremity (development of the QuickDASH [3])

  • Various surgery of upper extremity (psychometric testing of the QuickDASH [10, 11])

  • Shoulder pain (reliability, MCID [28])

  • Various regions of upper extremity (development of the QuickDASH-9 [4])

Practical Application

How to obtain.

Property and copyright at the IWH (online at http://www.dash.iwh.on.ca/). There, further links lead to the forms for free for the DASH (http://www.dash.iwh.on.ca/assets/images/pdfs/dash_questionnaire_2010.pdf) and QuickDASH (http://www.dash.iwh.on.ca/assets/images/pdfs/quickdash_questionnaire_2010.pdf). Language versions are online at http://www.dash.iwh.on.ca/translate.htm. Free of charge for noncommercial use; license for commercial use available at the IWH. Manual (3rd edition) online and paper copy; costs not yet determined.

Method of administration.

Self-assessment.

Scoring.

The arithmetic mean of at least 27 of the 30 items (missing rule) is transformed by (mean − 1) × 25 into the scale from 0 = no symptoms/full function to 100 = maximal symptoms/no function for the DASH total score (1, 11). Five of 6 items are necessary for determination of the symptoms score and 22 of 24 items for the function score (11). Similarly, 10 of 11 items are necessary for the QuickDASH total score, 3 of 3 for symptoms, and 7 of 8 for function (3, 10, 11). Computer scoring is not necessary but easier, e.g., on Microsoft Excel or any calculation or statistics program. Scoring program is online at http://www.dash.iwh.on.ca/score.htm.
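The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the scoring program offered by the IWH: the function names `scaled_score`, `dash_total`, and `quickdash_total` are hypothetical, and unanswered items are assumed to be coded as `None`.

```python
from typing import Optional, Sequence


def scaled_score(responses: Sequence[Optional[int]], min_answered: int) -> Optional[float]:
    """Transform 1-5 item ratings into the 0-100 scale via (mean - 1) * 25.

    Returns None when fewer than `min_answered` items were completed
    (the missing rule described in the text).
    """
    answered = [r for r in responses if r is not None]
    if any(r not in (1, 2, 3, 4, 5) for r in answered):
        raise ValueError("item ratings must be integers from 1 to 5")
    if len(answered) < min_answered:
        return None
    mean = sum(answered) / len(answered)
    return (mean - 1) * 25


def dash_total(responses: Sequence[Optional[int]]) -> Optional[float]:
    """DASH total score: 30 items, at least 27 answered."""
    if len(responses) != 30:
        raise ValueError("the DASH has 30 items")
    return scaled_score(responses, min_answered=27)


def quickdash_total(responses: Sequence[Optional[int]]) -> Optional[float]:
    """QuickDASH total score: 11 items, at least 10 answered."""
    if len(responses) != 11:
        raise ValueError("the QuickDASH has 11 items")
    return scaled_score(responses, min_answered=10)
```

The same helper could serve the symptoms and function subscores by passing the corresponding items and thresholds (5 of 6 and 22 of 24 for the DASH).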

Score interpretation.

Originally, 0 = best and 100 = worst. The reverse scale from 0 = worst to 100 = best, obtained as (100 − original score), is also often used for comparison with other scores, e.g., the Short Form 36 (SF-36). Several studies showed varying distinct cutoff points to reflect severity (1). Cutoff scores: <15 = “no problem,” 16–40 = “problem, but working,” and >40 = “unable to work” (1). Normative values of 1,706 persons in the US general population, stratified by sex, age, and comorbidity, are available (US population mean ± SD 10.1 ± 14.7) (1, 6, 12).

Respondent burden.

Time to complete is 4 minutes for the DASH and 2 minutes for the QuickDASH (1, 3, 6, 7). All items are easy to comprehend and are not emotionally sensitive (with the exception of item 21; see below).

Administrative burden.

Item rating can be typed or scanned into an electronic database. Score computation is easy (see above). The head of the questionnaire contains instructions on how to complete it. Time to administer (including control of missing data): DASH, 10 minutes; QuickDASH, 8 minutes (1). Time to scan and determine the scores: 2 minutes. Little special training is necessary for these activities.

Translations/adaptations.

Available free of charge in 35 languages and dialects. Versions in 11 other languages are in progress (as of January 30, 2011).

Psychometric Information

Method of development.

Eight hundred twenty-one possible questions obtained by literature review were reduced by a consensus group to 67 (+3 new) by removing items that overlapped in content or were off target. Patient data were analyzed by different item-to-total correlation techniques, comparison to clinimetric ranking, and clinical judgment, resulting in the final 30-item version (1, 2). The newest manual contains extensive psychometric information (1). Psychometric analysis by item-response theory (using Rasch analysis) was performed later for the DASH (8, 9). The development of the QuickDASH compared 3 modern strategies: the concept-retention method, the equidiscriminative item-total correlation, and the item-response theory (Rasch modeling). The concept-retention method was most similar to the DASH and was chosen to build the QuickDASH (3).

Acceptability.

All item content is easy to read and understand. Missing data are rare. Item 21, which asks about sexual activity, is often left out by patients. For that reason, item 21 has been omitted from the QuickDASH (3, 6). Low floor and ceiling effects are reported (1, 6, 8, 11, 14, 18).

Reliability.

Internal consistency/cross-sectional reliability: Cronbach's α = 0.92–0.98 for the DASH (1, 4, 8, 9, 15) and 0.92–0.95 for the QuickDASH (3, 10).

Test–retest reliability: intraclass correlation coefficient 0.93–0.98 for the DASH (1, 14, 18, 21, 22) and 0.90–0.94 for the QuickDASH (3, 10, 28).

Validity.

Content validity.

Normally distributed scores and low floor and ceiling effects (6, 14, 18).

Criterion validity.

There is no gold standard for symptoms or function measurement of the shoulder. The obvious content validity of the used items and the numerous studies of the DASH give it a certain intrinsic validity. However, criterion validity of the DASH came into question when Rasch analysis was applied (8, 9). The corresponding results for the QuickDASH were better but also criticized (3, 9).

Construct validity.

Pearson's or Spearman's correlations of the DASH total score to other instruments are as follows:

  • SPADI: 0.79–0.93 and 0.55 (ref.6, 14, 20)

  • HAQ: 0.88 and 0.54 (ref.18, 20)

  • CS: 0.82 (ref.6)

  • ASES: 0.79 (ref.6)

  • EQ-5D: 0.75 (ref.22)

  • SF-12 PCS: 0.75 and 0.57–0.63 (ref.16, 22)

  • SF-36 PCS: 0.70 (ref.6, 18)

  • Global disability rating: 0.67–0.71 (ref.21)

  • DAS28: 0.42 (ref.18)

  • SF-36 MCS: 0.27 and 0.06 (ref.6, 18)

  • SF-12 MCS: 0.10–0.33 (ref.16)

The correlations reflected a well-fitting dose-response curve for the construct of shoulder specificity of the compared instruments (19). Extraordinarily low correlations were reported in 1 study (20).

Pearson's correlations of the QuickDASH total score to other instruments are as follows:

  • SPADI: 0.84 (ref.11)

  • SF-36 PCS: 0.68 (ref.11)

  • Global rating of change: 0.45 (ref.28)

Ability to detect change.

Minimally detectable change (MDC95%): 7.9–14.8 points for the DASH (7, 14, 21) and 13.3 for the QuickDASH (28).

MCID: 10.2 points (21). Different methods of determining the MCID of the DASH have been compared and critiqued (29). QuickDASH: 8.0 points (28). Between-group differences are reported (1, 7).

Effect sizes (ES) and standardized response means (SRMs) of the DASH total score in shoulder conditions are as follows:

  • Total shoulder arthroplasty: ES 1.19, SRM 1.22 (ref.19)

  • Neck and/or shoulder at general practitioner: ES 0.88–0.90, SRM 0.88–0.93 (ref.16)

  • Arthroscopic acromioplasty: ES 0.9, SRM 0.5 (ref.15)

  • Neck symptoms at general practitioner: ES 0.88, SRM 0.88 (ref.16)

  • Shoulder impingement, tendinitis: physiotherapy: ES 0.81, SRM 0.72 (ref.21)

  • Rotator cuff surgery, total shoulder arthroplasty: ES 0.64, SRM 0.81 (ref.14)

  • Adhesive capsulitis: steroids: ES 0.34, SRM 0.43 (ref.20)

ES and SRMs of the QuickDASH total score in shoulder conditions are as follows:

  • Total shoulder arthroplasty: ES 1.26 (ref.11)

  • Shoulder or hand: conservative treatment: SRM 0.79 (ref.3)

  • Various upper extremity surgery: ES 0.50, SRM 0.63 (ref.10)
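The text reports ES and SRM values without defining them. The sketch below assumes the commonly used definitions (ES = mean change divided by the standard deviation of the baseline scores; SRM = mean change divided by the standard deviation of the change scores); the cited studies may have used variants. The function name `effect_size_and_srm` is hypothetical, and change is taken as baseline minus follow-up so that improvement on the original DASH scale (0 = best) is positive.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def effect_size_and_srm(baseline: Sequence[float],
                        follow_up: Sequence[float]) -> Tuple[float, float]:
    """Return (ES, SRM) for paired baseline and follow-up scores.

    Assumed definitions (not stated in the text):
      ES  = mean change / SD of baseline scores
      SRM = mean change / SD of change scores
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    if len(baseline) < 2:
        raise ValueError("at least 2 patients are needed for a standard deviation")
    # Baseline minus follow-up: a falling DASH score counts as improvement.
    changes = [b - f for b, f in zip(baseline, follow_up)]
    mean_change = mean(changes)
    es = mean_change / stdev(baseline)
    srm = mean_change / stdev(changes)
    return es, srm
```

Because the two indices use different denominators, they can diverge for the same data, which is consistent with pairs such as ES 0.9 and SRM 0.5 in the list above.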

Critical Appraisal of Overall Value to the Rheumatology Community

Strengths.

The DASH is the best-tested and most often used self-assessment instrument for the shoulder and any other disorders of the upper extremity. It is particularly useful in polyarticular conditions or if measurement of symptoms and function of the entire upper extremity is wanted. Since shoulder function determines the position of the elbow and the hand, the DASH is also useful in all elbow and hand conditions. Some of the DASH items also ask about fine-motor hand functions. Empiric data can be compared to US population norms. The QuickDASH total score yields very similar values to those of the DASH and the total scores correlate highly to each other (3, 11).

Caveats and cautions.

The DASH is region specific, not joint specific. Specificity and responsiveness of the DASH are, therefore, lower than those of unique shoulder-specific tools but higher than those of generic quality of life tools (19). Compared to other instruments, the strict 90% missing rule produces a relatively high percentage of missing data. There is evidence that the DASH score is also influenced by disability of the lower extremity (17). Rasch analysis revealed problems with the unidimensionality of the DASH total score and with differentiation between “mild/moderate/severe difficulty,” which affects (criterion) validity (8, 9). Obvious misfits were items 21 (sexual activity) and 26 (tingling) (3, 8, 9). Item 26 is retained in the QuickDASH. However, this needs closer investigation as a classically developed tool is fitted into a modern measurement framework. The QuickDASH has a similar total score to the DASH but it underestimates symptoms (reports lower severity) and overestimates function (reports less disability) when compared to the DASH (11). In the case where an MDC95% is reported to be higher than the MCID, the MDC95% should be taken as the MCID.
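The last rule in the paragraph above (use the MDC95% in place of the MCID whenever it is the larger of the two) amounts to taking a maximum. A minimal sketch, with hypothetical function names and with the choice of "greater than or equal" as the comparison being an assumption:

```python
def change_threshold(mcid: float, mdc95: float) -> float:
    """Threshold for an important change: the MCID, unless the MDC95% is larger."""
    return max(mcid, mdc95)


def is_important_change(change: float, mcid: float, mdc95: float) -> bool:
    """True if the absolute score change reaches the threshold (>= is an assumption)."""
    return abs(change) >= change_threshold(mcid, mdc95)
```

With the QuickDASH values quoted earlier (MCID 8.0, MDC95% 13.3), the threshold would be 13.3 points, so a 10-point change would not count as important under this rule.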

Clinical usability.

The DASH is the best tool for comprehensive assessment of upper extremity conditions, e.g., if shoulder problems cannot be differentiated from hand problems (rheumatoid arthritis, polytrauma, multiple sclerosis). It is easy to apply, analyze, and interpret. Comparison of empirical and normative data allows valid description of the patient's upper extremity status. The QuickDASH provides the necessary short assessment for clinical visits.

Research usability.

The DASH is good for research purposes in various upper extremity conditions. It is well tested and there is a large body of data for comparison of different settings and different upper extremity instruments, especially for analysis of construct validity compared to other instruments. The concerns about validity obtained by Rasch analysis cannot be disregarded, but development of new methods to assess validity, e.g., item-response theory, is ongoing. Specificity and responsiveness in localized conditions (affecting only 1 joint) are moderate. The use of the subscales symptoms and function is recommended for the DASH but not for the QuickDASH (11). The constructs of the 2 instruments are not exactly the same.

SHOULDER PAIN AND DISABILITY INDEX (SPADI)

Description

Purpose.

Self-assessment of symptoms and function of the shoulder.

Settings.

All domains, any disorders of the shoulder joint.

Versions.

Original version published in 1991 (30). No revisions.

Content and number of items.

13 items (total score): 5 items for pain and 8 for function (subscores).

Response options/scale.

All SPADI items are originally scored on a visual analog scale (VAS) from no pain/no difficulty to worst pain imaginable/so difficult required help. The VAS line was divided into 12 equal intervals to obtain a 12-point numerical rating scale (NRS) ranging from 0 (best) to 11 (worst) (30). Later versions used the 12-point or an 11-point NRS (0–10) without a VAS line (31).

Recall period for items.

1 week.

Endorsements.

None.

Examples of use.

Relevant settings (aims and analysis [references]) for the SPADI are as follows:

  • Shoulder pain (development of the SPADI [30])

  • Shoulder instruments (important comparative reviews [7, 13])

  • Various upper extremity diagnoses (reliability, minimal detectable difference [MDD], minimum clinically important difference [MCID] [21])

  • Various shoulder diagnoses (validity [32])

  • Adhesive capsulitis (factor analysis [33])

  • Adhesive capsulitis (reliability, validity, responsiveness [20, 34])

  • Rotator cuff (reliability, validity [35])

  • Rotator cuff, local infiltration (MCID [36])

  • After shoulder arthroplasty (validity, MDC [6, 31])

  • Total shoulder arthroplasty (responsiveness [19])

  • Various shoulder surgery (reliability, responsiveness [37])

  • Orthopedic practice (validity, factor, MDC, MCID [38])

  • Orthopedic practice (Rasch, partial credit model [39])

  • Primary care (validity, responsiveness [40])

  • Outpatient physiotherapy (validity, responsiveness [41])

  • Community volunteers (factor analysis [42])

Practical Application

How to obtain.

Printed in various references (30, 31, 40–42). Free online at http://www.workcover.com/site/treat_home/outcome_measures_and_risk_screening_tools/links_to_outcome_measures_and_screening_tools.aspx?.

Method of administration.

Self-assessment.

Time to complete.

2–3 minutes (7, 37).

Scoring.

Originally, the sum of marked items/maximal possible score × 100 with at least 11 of 13 completed items necessary for the total score (30). Later and with permission of the developer K. E. Roach, the “2/3 missing rule,” as used for many instruments, was applied: at least 3 of 5 pain and 6 of 8 function items for the subscales are necessary (6, 31). The SPADI total score is the unweighted mean of the pain and function subscores (30). In fact, the (sub)scores can be determined by the arithmetic mean of the completed items by mean/11 × 100 using the 12-point NRS (or mean × 10 using the 11-point NRS). Computer scoring is not necessary but easier.
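The later scoring variant described above (subscores as rescaled means with the missing rule of 3 of 5 pain items and 6 of 8 function items, total as the unweighted mean of the two subscores) can be sketched as follows. The function names are hypothetical, unanswered items are assumed to be coded as `None`, and `scale_max` is 10 for the 11-point NRS or 11 for the original 12-point NRS.

```python
from typing import Dict, Optional, Sequence


def spadi_subscore(items: Sequence[Optional[int]], min_answered: int,
                   scale_max: int = 10) -> Optional[float]:
    """Mean of the completed items rescaled to 0-100; None if too few items."""
    answered = [i for i in items if i is not None]
    if any(not 0 <= i <= scale_max for i in answered):
        raise ValueError("item rating outside the numerical rating scale")
    if len(answered) < min_answered:
        return None
    return sum(answered) / len(answered) / scale_max * 100


def spadi_scores(pain: Sequence[Optional[int]], function: Sequence[Optional[int]],
                 scale_max: int = 10) -> Dict[str, Optional[float]]:
    """Pain and function subscores plus the total (unweighted mean of both)."""
    if len(pain) != 5 or len(function) != 8:
        raise ValueError("the SPADI has 5 pain and 8 function items")
    pain_score = spadi_subscore(pain, min_answered=3, scale_max=scale_max)
    function_score = spadi_subscore(function, min_answered=6, scale_max=scale_max)
    if pain_score is None or function_score is None:
        total = None
    else:
        total = (pain_score + function_score) / 2
    return {"pain": pain_score, "function": function_score, "total": total}
```

Averaging the two subscores gives the 5 pain items the same weight as the 8 function items, which differs from summing all 13 items.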

Score interpretation.

Originally, 0 = best and 100 = worst. A reverse scale from 0 = worst to 100 = best, obtained as (100 − original score), is also often used to compare with other scores, e.g., the Short Form 36 (SF-36). There are no distinct cutoff points to reflect severity. Empirical normative values are not determined.

Respondent burden.

All items are easy to comprehend and are not emotionally sensitive.

Administrative burden.

Score computation is easy. The head of the questionnaire contains a short explanation on how to complete it. Time to administer: 5 minutes (30). Time to scan and determine the scores: 2 minutes.

Translations/adaptations.

Published in 3 languages: Norwegian (34), German (31), and Slovene. Versions in Chinese, Hindi, Brazilian Portuguese, Japanese, Turkish, and French Canadian exist but have not been published under peer review (Roach KE: unpublished observations).

Psychometric Information

Method of development.

20 items were selected by a group of 3 rheumatologists and 1 physiotherapist and established by assessing their face validity for pain and function, their test–retest reliability, and their correlation to shoulder range of motion (30). Item-response theory was applied to the function subscale only (39).

Acceptability.

Easy to read and understand. Missing data are very rare. Low floor and ceiling effects reported (6, 31, 32, 41).

Reliability.

Internal reliability/consistency: Cronbach's α = 0.86–0.96 (30, 31, 33, 38, 40, 42).

Test–retest reliability: intraclass correlation coefficient 0.84–0.95 (7, 21, 31, 34, 37). It was exceptionally low, at 0.66, in the development study (30).

Validity.

Content validity.

The scores were normally distributed in 1 study (6) but not in 2 studies (31, 41). Low floor and ceiling effects were seen, especially for the function subscore (6, 31, 32, 41).

Criterion validity.

In the absence of a gold standard, the obvious content validity of the used items and the numerous studies examining the SPADI give it a certain intrinsic validity. Rasch and factor analysis revealed moderate overall criterion validity: items 8 (removing something from the back pocket), 7 (carrying ≥10 lbs), and 4 (closing front buttons) showed some misfit (only the function subscore was examined) (39). Very low and very high function were not precisely measured (39). The 2 subscores pain and function could not be supported by factor analysis (33, 38, 42).

Construct validity.

Pearson's or Spearman's correlations of the SPADI total score to other instruments are as follows:

  • DASH: 0.93, 0.55, and 0.88 (ref.6, 20, 31)

  • ASES: 0.81, 0.92, and 0.77 (ref.6, 31, 37)

  • OSS: 0.57 and 0.85 (ref.35, 43)

  • CS: 0.82 (ref.6)

  • SST: 0.74 and 0.80 (ref.32, 38)

  • SF-36 PCS: 0.63 and 0.67 (ref.6, 32)

  • Global disability rating: 0.64–0.69 (ref.21)

  • HAQ: 0.55 and 0.61 (ref.20, 40)

  • Sickness Impact Profile: 0.57 (ref.41)

  • Active ROM: 0.54–0.80 and 0.38 (ref.30, 34)

  • SF-36 MCS: 0.08 (ref.6)

Extraordinarily low correlations were reported in 1 study (20).

Ability to detect change.

Minimally detectable change (MDC95%) for the total score: 17.0, 13.2, 17.2, and 21.5 points, respectively, as calculated in 4 studies (21, 31, 34, 38).

MCID: 13.2, 15.4, and 23.1 points, respectively (21, 36, 38).

Effect sizes (ES) and standardized response means (SRMs) of the SPADI total score are as follows:

  • Total shoulder arthroplasty: ES 2.10, SRM 1.72 (ref.19)

  • Adhesive capsulitis: steroids: ES 1.94, SRM 1.81 (ref.34)

  • Adhesive capsulitis: steroids: ES 1.20–1.64, SRM 1.27–1.68 (ref.20)

  • Shoulder pain, physiotherapy: ES 1.26, SRM 1.38 (ref.41)

  • Rotator cuff surgery + total shoulder arthroplasty: SRM 1.23 (ref.37)

  • Various upper extremity, occupational, physiotherapy: ES 0.80, SRM 0.67 (ref.21)

  • General practice, conservative therapy: ES 0.34 (ref.40)

Critical Appraisal of Overall Value to the Rheumatology Community

Strengths.

The SPADI is the most responsive shoulder instrument and has been tested in numerous settings. It is short; it is easy to understand, complete, and analyze; and no costs are involved in obtaining it.

Caveats and cautions.

Criterion and construct validity showed some weaknesses in factor and Rasch analysis. The original 12-point NRS (where 0 = best and 11 = worst) is uncommon. Only 1 item assesses overhead work or heavy use of the shoulder, which may produce ceiling effects. In the case where an MDC95% is reported to be higher than the MCID, the MDC95% should be taken as the MCID.

Clinical usability.

Very good for short and responsive assessment in all shoulder conditions. Easy to interpret.

Research usability.

Most responsive shoulder tool (19, 37). Recommended for every set of shoulder assessments. Subscores with limited criterion validity.

AMERICAN SHOULDER AND ELBOW SURGEONS (ASES) SOCIETY STANDARDIZED SHOULDER ASSESSMENT FORM

Description

Purpose.

Developed to “represent a state-of-the-art questionnaire with three key features: 1) ease of use 2) method of assessing activities of daily living (ADL) and 3) inclusion of a patient self-evaluation section,” approved by the ASES Research Committee in 1994 (44) to be applicable to all shoulder patients regardless of diagnosis. In 1998, the original ASES was modified to the mASES by deleting 2 and adding 5 function items to make a “whole-extremity questionnaire rather than a shoulder questionnaire alone” (37). This chapter deals with the original ASES only, not with the mASES.

Content and number of items.

Patient self-assessment section (patient ASES [pASES]) and a section to be completed by the examiner (clinical ASES [cASES] or, more precisely, ASES-examiner). The pASES form is divided into 3 sections: pain (6 items), instability (2 items), and ADL (10 items for both sides each). The cASES has 4 parts (each for left and right): range of motion (5 items, each passive and active), signs (11 items), strength (5 items), and instability (8 items + 1 open question).

Response options/scale.

Binary (yes/no) answers for pain and instability, visual analog scales (VAS) for pain and instability (where 0 = best and 10 = worst), and 4-point ordinal Likert scale for function (where 0 = unable to do, 1 = very difficult, 2 = somewhat difficult, and 3 = not difficult).

Recall period for items.

1 week.

Endorsements.

ASES (44).

Examples of use.

Relevant settings (aims and analysis [references]) for the ASES are as follows:

  • No empirical field testing (development of the ASES [44])

  • Outpatients without shoulder problems (normative data [45])

  • Shoulder instruments (important comparative reviews [7, 13])

  • Various shoulder dysfunctions (reliability, validity, responsiveness [46])

  • Subacromial impingement (validity [47])

  • Calcific tendinitis (responsiveness [48])

  • Rotator cuff, tendinitis (minimum clinically important difference [MCID] [49])

  • Rotator cuff, arthritis (Italian ASES, reliability, validity [50])

  • Rotator cuff, instability, arthritis (reliability, validity, responsiveness [51])

  • Rheumatoid arthritis, osteoarthritis (German ASES, reliability, validity [52])

  • Orthopedic practice (reliability [53])

  • Osteoarthritis, hemi- or total arthroplasty (responsiveness [54])

  • Total shoulder arthroplasty (validity, responsiveness [6, 19])

Practical Application

How to obtain.

Original publication (44). Free online at http://www.shoulderandkneesurgery.com/pdf/ases_assessment_form.pdf.

Method of administration.

Self-assessment.

Time to complete.

3 minutes (pASES).

Scoring.

The pASES total score = ((10 − VAS pain) × 5) + (5/3 × sum of ADL items) (44). The instability items and the remaining 5 pain items do not contribute to the pASES total score. Determination of the cASES was not described originally; 1 solution, using 2 of 3 of the completed items to determine the scores, is given in 1 study (6).
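The pASES formula above can be written out directly. In this minimal sketch the function name `pases_total` is hypothetical, and requiring all 10 ADL items to be answered is an assumption, since no original missing rule has been published. The score is computed for one side at a time.

```python
from typing import Sequence


def pases_total(vas_pain: float, adl_items: Sequence[int]) -> float:
    """pASES total = (10 - VAS pain) * 5 + 5/3 * sum of the 10 ADL items.

    vas_pain: 0 (best) to 10 (worst).
    adl_items: 10 ratings, each 0 (unable to do) to 3 (not difficult).
    """
    if not 0 <= vas_pain <= 10:
        raise ValueError("VAS pain must lie between 0 and 10")
    if len(adl_items) != 10:
        # Assumption: complete ADL data are required (no published missing rule).
        raise ValueError("10 ADL items are required")
    if any(i not in (0, 1, 2, 3) for i in adl_items):
        raise ValueError("ADL ratings must be integers from 0 to 3")
    pain_part = (10 - vas_pain) * 5        # contributes 0-50 points
    adl_part = 5 * sum(adl_items) / 3      # contributes 0-50 points
    return pain_part + adl_part
```

Pain and function each contribute up to 50 points, so a single VAS item carries as much weight as the 10 ADL items together.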

Score interpretation.

0 = worst and 100 = best. An original missing rule and distinct cutoffs to reflect severity have not been published. Normative data are provided in graph form, stratified by 10-year age groups but not by sex (45).

Respondent burden.

Time to complete is 3 minutes for the pASES (44). All items are easy to understand and are not suggestive or emotionally sensitive.

Administrative burden.

The patient section can be administered without the clinical section. This is quick to perform and is what is done in most applications. Score computation is easy and can be implemented in any database. Time (pASES): 8 minutes (estimated). Patient examination for the cASES is time consuming.

Translations/adaptations.

German (52), Italian (50), and Portuguese.

Psychometric Information

Method of development.

Developed by the research committee of the ASES that reviewed existing instruments at that time through open discussion and without a specific methodologic approach.

Acceptability.

All item content is easy to read and understand. Missing data are very rare. Single items may show high floor and ceiling effects (52).

Reliability.

Internal reliability/consistency: Cronbach's α = 0.61–0.96 (46, 50–53).

Test–retest reliability: intraclass correlation coefficient 0.84–0.96 (45, 46, 50–52).

Validity.

Content validity.

Content validity was questioned in 1 study (13). Minimal floor and ceiling effects of the total score are described in 2 studies (50, 51), but higher ones are also described in 2 additional studies (6, 52). Normal distribution of the scores is reported (6).

Criterion validity.

In the absence of a gold standard, the obvious content validity of the used items and the numerous studies of the pASES give it a certain intrinsic validity. The ASES has not been examined by item-response theory, factor, or Rasch analysis.

Construct validity.

Pearson's or Spearman's correlations of the pASES total score to other instruments are as follows:

  • SPADI: 0.92 and 0.81 (ref.6, 52)

  • Western Ontario Rotator Cuff index: 0.81 (ref.47)

  • DASH: 0.79–0.92 (ref.6, 50, 52)

  • CS: 0.71 (ref.6)

  • Rotator Cuff QOL: 0.70 (ref.47)

  • SF-36 bodily pain: 0.60 and 0.65 (ref.50, 52)

  • SF-36 PCS: 0.48 and 0.64 (ref.6, 50, 52)

  • SF-36 physical functioning: 0.47 and 0.57 (ref.50, 52)

  • cASES: 0.48 (ref.6)

  • SF-36 MCS: 0.24 and −0.20 (ref.6, 50)

Ability to detect change.

Minimally detectable change (MDC95%): 11.2 (46).

Minimum clinically important difference (MCID): 6.4 (46) and 12.0–16.9 (49).

Effect sizes (ES) and standardized response means (SRMs) of the pASES total score are as follows:

  • Osteoarthritis: total or hemi shoulder arthroplasty: ES 3.53 (ref.54)

  • Rheumatoid, osteoarthritis: total shoulder arthroplasty: ES 2.13, SRM 1.81 (ref.19)

  • Calcific tendinitis: subacromial steroid: ES 1.65–1.84 (ref.48)

  • Various, mainly impingement: physiotherapy: ES 1.39, SRM 1.54 (ref.46)

  • Rotator cuff disease: SRM 1.42 (ref.47)

  • Rotator cuff, instability, arthritis: surgery: ES 0.93–1.16 (ref.51)

Critical Appraisal of Overall Value to the Rheumatology Community

Strengths.

Recommended by the ASES and therefore in widespread use, especially in American centers. The ASES showed good reliability, high construct validity, and high responsiveness.

Caveats and cautions.

Mix of scales (binary, Likert, VAS). Limited content and, especially, criterion validity. In the case where an MDC95% is reported to be higher than the MCID, the MDC95% should be taken as the MCID.

Clinical usability.

Helpful combination of self- and clinical assessment.

Research usability.

Good applicability for research and good responsiveness. Slightly longer than and less frequently used than the Shoulder Pain and Disability Index. Some methodologic weaknesses.

CONSTANT (MURLEY) SCORE (CS)

Description

Purpose.

“The method records individual parameters and provides an overall clinical functional assessment … applicable irrespective of details of the diagnostic or radiological abnormalities … , sufficiently sensitive to reveal even small changes in function” (55). Introduced in 1987 (55). Revision in 2008 (56).

Content and number of items.

The score consists of 4 domains: pain (1 item), activities of daily living (ADL; 3 items for activity level, i.e., work, sports, and sleep, and 1 item for hand positioning, i.e., rotation), mobility (4 items: forward and lateral abduction/elevation, external and internal rotation), and power/strength (1 item). Pain and ADL items 1–3 are obtained by patient interview (i.e., self-assessed); all other items are examiner assessed.

Response options/scale.

Pain item: originally 4 Likert levels, replaced by a visual analog scale in the revised version (55, 56), where 0 = maximal pain and 15 = no pain. ADL: Likert scales, where 0 = worst and 5 = best for each item. Mobility: active, pain-free range of elevation: +2 points per 30°, where 0 = worst and 10 = best for each item; position of hand: 0 = worst to 10 = best (55–57). Strength is measured at 90° lateral abduction by use of either an Isobex device or a defined spring balance technique: 1 point per 0.5 kg (∼1 lb), maximum 25 points (56).
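The strength rule (1 point per 0.5 kg, maximum 25 points) can be sketched as a simple conversion. The function name is hypothetical, and rounding down to whole points is an assumption, because the text does not state how fractions of 0.5 kg are handled.

```python
def cs_strength_points(force_kg):
    """Convert measured abduction force (kg) to CS strength points.

    1 point per 0.5 kg, capped at 25 points. Rounding down to whole
    points is an assumption not specified in the text.
    """
    if force_kg < 0:
        raise ValueError("force must not be negative")
    return min(25, int(force_kg // 0.5))
```

A patient who cannot reach 90° abduction would score 0 points, which corresponds to the floor effect of the strength subscore described under content validity.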

Recall period for items.

1 week.

Endorsements.

European Society for Surgery of the Shoulder and the Elbow (SECEC-ESSSE); recommended by the German Society of Shoulder and Elbow Surgeons.

Examples of use.

Relevant settings (aims and analysis [references]) for the CS are as follows:

  • No empirical field testing (development of the CS [55])

  • Referring to previous studies (revision of the CS [56])

  • Systematic literature review (psychometric properties of the CS [57])

  • No shoulder pain/disability (normative CS values [58])

  • Various shoulder dysfunctions (intra- and intertester reliability [59])

  • Various, mainly rotator cuff (validity, responsiveness [60])

  • Impingement (validity, responsiveness [61–63])

  • Degenerative, inflammatory (validity [64])

  • Rotator cuff repair (validity [65, 66])

  • Shoulder instability (validity, responsiveness [67])

  • Osteoarthritis (responsiveness [68])

  • Rheumatoid, osteoarthritis (validity, responsiveness [6, 19])

Practical Application

How to obtain.

Original publication (55) and online at http://www.secec.org/data/upload/files/Constant%20Score.pdf.

Method of administration.

Clinical examination plus patient interview (self-assessment). Retrospective data extraction from the case history is not reliable, especially not for the patient's self-assessment items.

Time to complete.

5–7 minutes (61).

Scoring.

The sum of the subscores results in the CS total score: pain (0–15) + ADL (4 × (0–5) = 0–20) + mobility (4 × (0–10) = 0–40) + strength (0–25).
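The summation above can be sketched as follows, using the subscore ranges exactly as stated in this section. The function and parameter names are hypothetical, and the range checks are an added safeguard rather than part of the published score.

```python
def constant_score(pain, adl_items, mobility_items, strength):
    """Sum the CS subscores into the total score (0 = worst, 100 = best).

    pain: 0-15; adl_items: 4 items, each 0-5;
    mobility_items: 4 items, each 0-10; strength: 0-25.
    """
    if not 0 <= pain <= 15:
        raise ValueError("pain must be 0-15")
    if len(adl_items) != 4 or any(not 0 <= x <= 5 for x in adl_items):
        raise ValueError("ADL requires 4 items scored 0-5")
    if len(mobility_items) != 4 or any(not 0 <= x <= 10 for x in mobility_items):
        raise ValueError("mobility requires 4 items scored 0-10")
    if not 0 <= strength <= 25:
        raise ValueError("strength must be 0-25")
    return pain + sum(adl_items) + sum(mobility_items) + strength


# Hypothetical patient: pain 10, ADL 14, mobility 32, strength 12.
example_total = constant_score(10, [3, 4, 2, 5], [8, 6, 10, 8], 12)
```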

Score interpretation.

0 = worst and 100 = best function. Comparison with the contralateral side is possible. Several sets of norm data are available. In the past, the relative CS, i.e., the score expressed as a percentage of age-adjusted norm data, was recommended, but it is problematic because the norm cohorts differ (58).

Respondent burden.

Minimal (see below). All items are easy to understand and not emotionally sensitive.

Administrative burden.

Moderate because the CS can be implemented in a normal clinical investigation (57). The measurement of strength demands some extra effort. Score calculation is easy and can be implemented in any calculation software.

Translations/adaptations.

The CS is used in almost every language without official translations because surgeons perceived the score as a clinical measure (57). In French, a validated translation/adaptation has been published.

Psychometric Information

Method of development.

The score was originally developed as part of a master's thesis and later published (55). The methodology of development was not reported or specified. The score was revisited by the SECEC-ESSSE members (56).

Acceptability.

High acceptance by patients because the items have a high relevance. Acceptance among surgeons is very high due to the clinical relevance.

Reliability.

Internal reliability/consistency: Cronbach's α = 0.37 and 0.60, respectively (60, 66).

Test–retest reliability: intraclass correlation coefficient 0.80–0.96 (57). Repeated strength measurements revealed high intratester but low intertester reliability (59).

Validity.

Content validity.

No floor and ceiling effects for the CS total score were shown, but the subscores, especially strength (when unable to reach 90° abduction), reached substantial floor levels (i.e., no strength) (6, 64). The CS total score was normally distributed (6).

Criterion validity.

There is no gold standard for self- and examiner-assessed shoulder function. There is an ongoing debate about the appropriate measure for abduction strength. Whereas originally an unsecured spring balance was utilized (55), the last modification of the score advocates Isobex measurement (56). However, both are strongly correlated to each other. Large variations in handling the testing protocol have been reported leading to a large interobserver variance (59). There are no data about factor, Rasch analysis, or item-response theory.

Construct validity.

Pearson's or Spearman's correlations of the CS to other instruments are as follows:

  • ASES: 0.72–0.87 (ref.6, 65, 66)

  • OSS: 0.65–0.87 (ref.61, 64)

  • DASH: 0.82, 0.76, and 0.50 (ref.6, 64)

  • SPADI: 0.53 and 0.82 (ref.6, 64)

  • WOSI: 0.58 (ref.67)

  • SST: 0.49 (ref.65)

  • SF-36 PCS: 0.45 (ref.6)

  • Rating of change (shoulder): 0.32–0.70 (ref.63)

  • SF-36 MCS: 0.02 (ref.6)

Considerably lower correlations were found in 1 study (64).

Ability to detect change.

Minimally detectable change (MDC95%) and minimum clinically important difference (MCID): no data published.

Effect sizes (ES) and standardized response means (SRMs) of the CS total score are as follows:

  • Osteoarthritis: hemi or total shoulder arthroplasty: ES 3.02 (ref.54)

  • Rheumatoid, osteoarthritis: total shoulder arthroplasty: ES 2.23, SRM 1.99 (ref.19)

  • Impingement: arthroscopic decompression: ES 0.65–1.92, SRM 0.62–2.09 (ref.63)

  • Impingement: open decompression: ES 1.60, SRM 1.39 (ref.61)

  • Impingement: acupuncture, transcutaneous electrical nerve stimulation: ES 1.29 and 0.73 (ref.62)

  • Shoulder instability: physiotherapy ± surgery: SRM 0.59 (ref.67)

  • Various, mainly rotator cuff: surgery: ES 0.58, SRM 0.57 (ref.60)

Critical Appraisal of Overall Value to the Rheumatology Community

Strengths.

The CS covers the clinically most relevant domains and shows high responsiveness. It is highly accepted throughout the clinical community in the fields of arthroplasty, rotator cuff disease, shoulder trauma, and fractures.

Caveats and cautions.

Data about reliability and validity (except construct validity) are sparse and, in some parts, absent. Intertester reliability was shown to be low. Different versions and measurement methodologies lead to problems when comparing data. How to measure strength has not been standardized yet. The relative CS (percentage of norm data) is invalid due to different norm data. Only 1 pain item and only 3 ADL items may not be sufficient to adequately assess self-rated pain and function. Due to lack of testing data (MDC95%, MCID), caution is necessary for measurement at an individual patient level.

Clinical usability.

The CS is in widespread clinical use. The CS often serves as the mandatory part of a measurement protocol, especially in Europe. It is not suitable for patients with instability conditions. Due to lack of testing data or insufficient measurement properties, caution is necessary for measurement at an individual patient level.

Research usability.

Limited due to the caveats, especially insufficient testing of validity.

SIMPLE SHOULDER TEST (SST)

Description

Purpose.

To assess functional disability of the shoulder (68).

Content and number of items.

Total score of 12 items: 2 about function related to pain, 7 about function/strength, and 3 about range of motion (32). No subscales.

Response options/scale.

Dichotomous responses: 1 = yes (function possible) and 0 = no.

Recall period for items.

Actual/at the moment of assessment.

Endorsements.

None.

Examples of use.

Relevant settings (aims and analysis [references]) for the SST are as follows:

  • Normal and affected shoulders (development of the SST [68])

  • Shoulder instruments (important comparative review [7])

  • Various shoulder problems (validity, responsiveness [32, 37])

  • Shoulder injuries (reliability, validity, responsiveness [69])

  • Shoulder joint destruction (responsiveness, minimum clinically important difference [MCID] [70])

  • Rotator cuff, conservative (MCID [49])

  • Rotator cuff repair (validity, responsiveness [71, 72])

  • Orthopedic practice (validity, factor, minimal detectable difference [38])

  • Orthopedic practice (Rasch, partial credit model [39])

Practical Application

How to obtain.

Original publication (68). Free online at http://www.orthop.washington.edu/PatientCare/OurServices/ShoulderElbow/Articles/SimpleShoulderTest.aspx.

Method of administration.

Self-assessment.

Time to complete.

2–3 minutes.

Scoring.

Original score: 0 = worst and 12 = best. Transformed by: number of “yes” items/number of completed items × 100 = % “yes” responses.
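The transformation to the percentage of "yes" responses can be sketched as follows. The function name and the input encoding (True, False, None) are hypothetical. Because no missing rule has been published for the SST, the sketch only refuses to score a form with no completed items; that check is an assumption.

```python
def sst_percent_yes(responses):
    """Return % "yes" among completed SST items (0 = worst, 100 = best).

    responses: one entry per item, True = yes, False = no,
    None = not completed.
    """
    completed = [r for r in responses if r is not None]
    if not completed:
        raise ValueError("no completed items")  # assumption: no published missing rule
    return 100.0 * sum(completed) / len(completed)


# Hypothetical form: 8 "yes", 2 "no", 2 items left blank.
example_sst = sst_percent_yes([True] * 8 + [False] * 2 + [None] * 2)
```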

Score interpretation.

0 = worst and 100 = best function. A missing rule, distinct cutoffs for severity, and normative data have not been published.

Respondent burden.

Very short; easy to understand and not emotionally sensitive.

Administrative burden.

Free online. Score computation is very easy and possible by hand. Time to administer and score: estimated 5 minutes.

Translations/adaptations.

No data published.

Psychometric Information

Method of development.

“Questions derived from Neer's evaluation, the ASES [American Shoulder and Elbow Surgeons] evaluation and the most frequent complaints of patients observed in the shoulder practice at the University of Washington” (68). Further details on how item content was selected have not been described. Item-response theory was applied later (39).

Acceptability.

All item content is easy to read and understand. Missing data are rare. Low floor and ceiling effects (32, 69).

Reliability.

Internal reliability/consistency: Cronbach's α = 0.85 (38).

Test–retest reliability: intraclass correlation coefficients 0.97 and 0.99 (37, 69).

Validity.

Content validity.

Low floor and ceiling effects (32, 69). Score distribution has not been further examined.

Criterion validity.

In the absence of a gold standard, the obvious content validity of the used items and the testing studies give a certain intrinsic validity to the SST. Factor analysis revealed a 2-factor solution, which calls the 1-factor total score into question (38). Function was not measured with equal precision across the entire continuum of shoulder functioning, and confidence intervals were very large, i.e., larger than those of the ASES and Shoulder Pain and Disability Index (SPADI) (39). In Rasch analysis, items 2 (… shoulder allows you to sleep comfortably?) and 1 (is your shoulder comfortable … at rest?) showed misfit (39).

Construct validity.

Pearson's or Spearman's correlations of the SST to other instruments are as follows:

  • SPADI: 0.74 and 0.80 (ref.32, 38)

  • ASES: 0.73 and 0.81 (ref.32, 69)

  • DASH: 0.72 (ref.71)

  • CS: 0.70 (ref.72)

  • Western Ontario Rotator Cuff index: 0.68 (ref.71)

  • SF-36 bodily pain: 0.62 (ref.32)

  • SF-36 physical functioning: 0.58 (ref.32)

  • SF-12 PCS: 0.44 (ref.69)

  • SF-36 PCS: 0.40 and 0.60 (ref.32, 71)

  • SF-36 MCS: 0.16 (ref.71)

Ability to detect change.

Minimally detectable change (MDC95%) for the range 0–100: 32.3 (38).

MCID for the range 0–12: 2.05 and 2.33 for rotator cuff disease (49); 3 points for shoulder arthroplasty (70). Corresponds to MCID 17.1–25.0 for the range 0–100.
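The conversion of the MCID values from the 0–12 range to the 0–100 range is a linear rescaling, sketched below. The function name is hypothetical.

```python
def rescale_to_100(value, scale_max=12):
    """Linearly rescale a value from the range 0-scale_max to the range 0-100."""
    return value / scale_max * 100


# MCID values reported above for the range 0-12: 2.05, 2.33 (49) and 3 (70).
mcid_0_100 = [round(rescale_to_100(v), 1) for v in (2.05, 2.33, 3)]
```

The smallest and largest rescaled values reproduce the limits of the range 17.1–25.0 given above.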

Effect sizes (ES) and standardized response means (SRMs) of the SST are as follows:

  • Osteoarthritis: shoulder arthroplasty: ES 2.17–2.87, SRM 1.43–1.94 (ref.70)

  • Rotator cuff: repair: SRM 1.09 (ref.71)

  • Injury: rotator cuff surgery: ES 1.08, SRM 1.01 (ref.69)

  • Rotator cuff surgery + total shoulder arthroplasty: SRM 0.87 (ref.37)

  • Injury: instability surgery: ES 0.61, SRM 0.63 (ref.69)

Critical Appraisal of Overall Value to the Rheumatology Community

Strengths.

Very short and easy to use. Good construct validity.

Caveats and cautions.

Substantial lack of criterion validity (testing data). Due to binary response options, questionable use of the SST score as a metric measure, especially for responsiveness (as analogously shown by versions 1 and 2 of the SF-36). In the case where an MDC95% is reported to be higher than the MCID, the MDC95% should be taken as the MCID.

Clinical usability.

Easy to use; widespread use in the US. Due to lack of testing data or insufficient measurement properties, caution is necessary for measurement at an individual patient level.

Research usability.

Limited due to lack of non-English versions and the caveats.

OXFORD SHOULDER SCORE (OSS)

Description

Purpose.

Self-assessment of pain and function of the shoulder. Settings: shoulder operations other than stabilization (73). First published in 1996 (73). “Revision” in 2009 concerns only the specifications for use, not the content (74).

Content and number of items.

12 items: 4 about pain (2 for pain, 2 for interference with pain) and 8 about daily functions.

Response options/scale.

Each item is scored into 5 Likert categories: 1 = no pain/easy to do, 2 = mild pain/little difficulty, 3 = moderate pain/moderate difficulty, 4 = severe pain/extreme difficulty, and 5 = unbearable/impossible to do. In the revision study and on the online form (see below), the item scoring is 0 (worst) to 4 (best).

Recall period for items.

4 weeks.

Endorsements.

None.

Examples of use.

Relevant settings (surgery; aims and analysis [references]) for the OSS are as follows:

  • Degenerative, inflammatory (development of the OSS, revision [73, 74])

  • Degenerative, inflammatory (validity, responsiveness [64])

  • Subacromial impingement (reliability, validity, responsiveness [43, 61, 75])

  • Rotator cuff (responsiveness [35, 76, 77])

  • Osteoarthritis (responsiveness [78])

  • Proximal humerus fracture (validity [79])

Practical Application

How to obtain.

Originally published in 1 study (73) and online at http://phi.uhce.ox.ac.uk/pdf/OxfordScores/Oxshoulderscore.pdf. Online form for automatic calculation is found at http://www.orthopaedicscore.com/scorepages/oxford_shoulder_score.html.

Method of administration.

Self-assessment.

Time to complete.

2 minutes.

Scoring.

The (total) score is the sum of the (completed) 12 items (scoring 1–5): 12 = best and 60 = worst (73). In the revision, it is 0 = worst and 48 = best (item scoring 0–4) (74). The online form (see above) also scores on 0–48. However, the online form scores missing items as 5, an error that may lead to wrong scores. How to deal with missing items has only been described for the revision: ≥10 of 12 items have to be completed (74). To compare with other instruments, we recommend total score = (m − 1) × 25, where m = mean of the completed items (originally scaled 1–5, where 5 = worst), giving 0 = best and 100 = worst; this can be transformed by (100 − total score) into 0 = worst and 100 = best, as for the Short Form 36 (SF-36). For the revised item scaling 0–4 (4 = best), the equivalent is total score = m × 25 (64).
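The recommended transformation to a 0–100 scale can be sketched as follows, returning 0 = worst and 100 = best for both item scalings. The function name and the use of None for missing items are hypothetical. Applying the rule of ≥10 completed items to the original 1–5 scaling as well is an assumption, since that rule has only been described for the revision.

```python
def oss_0_100(items, revised=False):
    """Transform 12 OSS item scores to 0 = worst, 100 = best.

    items: 12 entries, None = missing.
    revised=False: items scored 1-5 (5 = worst).
    revised=True:  items scored 0-4 (4 = best).
    """
    if len(items) != 12:
        raise ValueError("the OSS has 12 items")
    completed = [x for x in items if x is not None]
    if len(completed) < 10:
        # Rule described for the revision (74); applied to both scalings as an assumption.
        raise ValueError("at least 10 of 12 items must be completed")
    low, high = (0, 4) if revised else (1, 5)
    if any(not low <= x <= high for x in completed):
        raise ValueError("item score out of range")
    m = sum(completed) / len(completed)
    if revised:
        return m * 25
    return 100 - (m - 1) * 25
```

Scoring the mean of the completed items, rather than a sum, avoids the distortion that arises when missing items are replaced by a fixed value.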

Score interpretation.

Total score, no subscores. Originally, 12 (no disability) to 60 (maximal disability). Revised OSS and online form: 0 (maximal disability) to 48 (no disability), where 0–19 = severe arthritis, 20–29 = moderate to severe arthritis, 30–39 = mild to moderate arthritis, and 40–48 = satisfactory joint function (published on the online form; see above). Normative data have not been published.
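The severity bands published on the online form for the revised 0–48 total score can be expressed as a simple lookup. This is a sketch only: the function name is hypothetical, and whole-number total scores are assumed.

```python
def oss_severity_band(total):
    """Map a revised OSS total (0 = worst, 48 = best) to the online form's bands."""
    if not 0 <= total <= 48:
        raise ValueError("revised OSS total must be 0-48")
    if total <= 19:
        return "severe arthritis"
    if total <= 29:
        return "moderate to severe arthritis"
    if total <= 39:
        return "mild to moderate arthritis"
    return "satisfactory joint function"
```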

Respondent burden.

All items are easy to understand and to complete and are not emotionally sensitive.

Administrative burden.

Score computation is easy and needs no explanation. No training is needed to interpret the scores. Time to administer and score: ∼5 minutes.

Translations/adaptations.

Dutch, Italian, and German (75).

Psychometric Information

Method of development.

Open interviews of outpatients and review of established questionnaires created 22 items that were longitudinally tested in several steps, resulting in the 12-item version (73). Factor analysis or item-response theory was not used.

Acceptability.

All item content is short and easy to read and understand. Missing data are rare (74). Very low floor and ceiling effects were shown (64, 75).

Reliability.

Internal reliability/consistency: Cronbach's α = 0.94 (75).

Test–retest reliability: Pearson's correlation = 0.98 (75). Intraclass correlation coefficient: no published data.

Validity.

Content validity.

No published data on score distribution. Low floor and ceiling effects (64, 75).

Criterion validity.

In the absence of a gold standard, the obvious content validity of the used items and the moderate number of published studies examining the OSS result in a moderate intrinsic validity. Rasch and factor analysis data have not been published.

Construct validity.

Pearson's or Spearman's correlations of the OSS to other instruments are as follows:

  • CS: 0.65–0.87 (ref.61, 64, 75, 79)

  • SPADI: 0.74 and 0.85 (ref.43, 61)

  • DASH: 0.79 (ref.61)

  • SF-36 bodily pain: 0.64–0.76 (ref.43, 61, 75)

  • SF-36 physical functioning: 0.57–0.68 (ref.43, 61, 75)

  • SF-36 PCS: 0.37 (ref.43)

Ability to detect change.

Minimally detectable change (MDC95%) and minimum clinically important difference (MCID): no published data.

Effect sizes (ES) and standardized response means (SRMs) of the OSS are as follows:

  • Osteoarthritis and rheumatoid arthritis: hemiarthroplasty: ES 2.3 (ref.78)

  • Impingement, rotator cuff: surgery: ES 1.10–1.88, SRM 1.10–1.14 (ref.61, 73, 76)

  • Rotator cuff: decompression (± cuff repair): ES 0.97 (ref.77)

  • Impingement: no treatment described: ES 0.96 (ref.43)

  • Degenerative, inflammatory: surgery: ES 0.61 (ref.64)

Critical Appraisal of Overall Value to the Rheumatology Community

Strengths.

Very short and responsive tool, easy to complete and to score. Specially constructed for surgical interventions. Construct validity to other measures is good. No costs to obtain.

Caveats and cautions.

Data about reliability and (especially criterion) validity are rather sparse. The OSS is not often used in literature. There is only 1 important study for conservative treatment (79). Due to lack of testing data (MDC95%, MCID), caution is necessary for measurement at an individual patient level.

Clinical usability.

Short tool for assessment of shoulder surgery. Easy to interpret. Due to lack of testing data or insufficient measurement properties, caution is necessary for measurement at an individual patient level.

Research usability.

Validity and usability for research are rather weak. Further testing is needed.

SHOULDER DISABILITY QUESTIONNAIRE (SDQ)

Description

Purpose.

Self-assessment of pain-related function of the shoulder. Settings: shoulder disorders in general (mainly soft tissue). First publication of a 22-item version in the UK (SDQ-UK) in 1994, which was not frequently used thereafter (80). Further development into the original 16-item SDQ in The Netherlands (SDQ-NL) in 1998 (81). Revision in 2000 (82).

Content and number of items.

16 items describing common situations or functions that may induce symptoms (mostly pain): “My shoulder hurts when I (do)… .”

Response options/scale.

All items are scored by “yes” = 1 or “no” = 0, and “not applicable” (missing).

Recall period for items.

24 hours.

Endorsements.

None.

Examples of use.

Relevant settings (aims and analysis [references]) for SDQ are as follows:

  • General population, primary care (development of the SDQ-UK [80])

  • Primary care (development, responsiveness [81])

  • Primary care (revision, responsiveness [82])

  • Shoulder instruments (comparative review [83, 84])

  • Shoulder pain (reliability, validity [85, 86])

  • Adhesive capsulitis (responsiveness [87, 88])

  • Rotator cuff (responsiveness [89])

  • Chronic shoulder pain (responsiveness [90])

Practical Application

How to obtain.

Published in 2 studies (81, 82). Online at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1752535/pdf/v057p00082.pdf (see Appendix).

Method of administration.

Self-assessment.

Time to complete.

2 minutes.

Scoring.

The (total) score is calculated by dividing the number of positively scored items (value = 1) by the total of applicable/completed items and multiplying by 100.
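The calculation can be sketched as follows. The function name and the string encoding of the responses are hypothetical, and raising an error when no item is applicable is an assumption, because no missing rule has been published.

```python
def sdq_score(responses):
    """Return the SDQ total score (0 = no disability, 100 = maximal disability).

    responses: one entry per item, "yes", "no", or "na" (not applicable).
    """
    applicable = [r for r in responses if r != "na"]
    if any(r not in ("yes", "no") for r in applicable):
        raise ValueError('responses must be "yes", "no", or "na"')
    if not applicable:
        raise ValueError("no applicable items")  # assumption: no published missing rule
    positive = sum(1 for r in applicable if r == "yes")
    return 100.0 * positive / len(applicable)


# Hypothetical form: 6 "yes", 6 "no", 4 items not applicable.
example_sdq = sdq_score(["yes"] * 6 + ["no"] * 6 + ["na"] * 4)
```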

Score interpretation.

0 = no disability and 100 = maximal disability. A missing rule, distinct cutoffs to reflect severity, and normative data have not been published.

Respondent burden.

All items are easy to understand and not emotionally sensitive.

Administrative burden.

Score computation is easy. Time to administer and score: ∼5 minutes.

Translations/adaptations.

English (80–82), Dutch (original, not published), Spanish, and Turkish (86).

Psychometric Information

Method of development.

Questions considered relevant to the shoulder were selected from the Functional Limitations Profile and a list of activities from therapists and patients was added (80–82). Reduction from 60 and 22 to 16 items according to the “judgmental approach” (81, 82). Data about factor analysis or item-response theory have not been published.

Acceptability.

Easy to read and understand. Missing data are rare (83). A substantial ceiling effect was shown (82).

Reliability.

Internal reliability/consistency: Cronbach's α = 0.76 and 0.79, respectively (85, 86).

Test–retest reliability: Pearson's correlation = 0.88 (86). Intraclass correlation coefficient: no data published.

Validity.

Content validity.

The content validity of the SDQ-NL was rated as doubtful in a comparison of multiple shoulder tools (84). There was a substantial ceiling effect (82, 84).

Criterion validity.

There are only sparse data on criterion validity (84). Rasch and factor analysis data have not been published.

Construct validity.

Pearson's or Spearman's correlations of the SDQ to other instruments are as follows:

  • VAS for function: 0.58 (ref.85)

  • SDQ-UK: 0.55 (ref.83)

  • VAS for pain: 0.41 (ref.85)

  • SPADI: 0.33 (ref.83)

  • ROM: 0.27–0.41 (ref.85)

Ability to detect change.

Minimally detectable change (MDC95%) and minimum clinically important difference (MCID): no data published. A mean change score of 40 points was highly specific for improvement (81).

Effect sizes (ES) and standardized response means (SRMs) of the SDQ are as follows:

  • Adhesive capsulitis: mobilization: ES 5.43 and 2.81, SRM 3.88 and 3.40 (ref.87)

  • Rotator cuff tendinitis: steroids, transcutaneous electrical nerve stimulation: ES 5.19 and 5.43, SRM 5.83 and 4.08 (ref.89)

  • Primary care: soft tissue, physiotherapy: SRM 2.22 and 1.14 (Guyatt's responsiveness index [ref.81, 82])

  • Adhesive capsulitis: steroids, physiotherapy: ES 1.73 and 1.12, SRM 1.32 and 0.97 (ref.88)

  • Chronic shoulder pain: graded exercise, usual care: ES 0.94 and 0.77, SRM 0.65 and 0.71 (ref.90)

Critical Appraisal of Overall Value to the Rheumatology Community

Strengths.

Very short tool, easy to complete and to score. No costs to obtain.

Caveats and cautions.

Data about reliability and validity are sparse. Content and criterion validity have been rated to be doubtful (84). Construct validity compared to other measures is weak. Published data indicate low reliability and validity, especially for measurement at an individual patient level (no data on MDC95%, MCID). Due to binary response options, questionable use of the SDQ score as a metric measure, especially for responsiveness (as analogously shown by versions 1 and 2 of the Short Form 36). Responsiveness results are extraordinarily high. The SDQ is rarely reported in the literature. There are no data on diseases related to the shoulder joint itself.

Clinical usability.

Although the SDQ is a short and easy to interpret tool, caution is necessary for clinical usability and measurement at a group or an individual patient level due to the lack of testing data and insufficient measurement properties (see above).

Research usability.

Weak and doubtful validity and usability for research. Further psychometric testing is needed.

WESTERN ONTARIO SHOULDER INSTABILITY INDEX (WOSI)

Description

Purpose.

“To evaluate the disease-specific quality-of-life of patients with symptomatic shoulder instability” (67). Settings: shoulder instability. First published in 1998 (67). Revision of scaling in 2005 (91).

Content and number of items.

21 items in 4 domains: physical symptoms, including pain (10 items), sports/recreation/work (4 items), lifestyle (4 items), and emotions (3 items).

Response options/scale.

Each item is scored on a 100-mm visual analog scale (VAS). The use of a corresponding 11-point numerical rating scale (NRS; 0–10) to scan forms was later approved by the developer (92).

Recall period for items.

1 week.

Endorsements.

Recommended by the European Society for Surgery of the Shoulder and the Elbow (online at http://www.secec.org).

Examples of use.

Relevant settings (shoulder instability; aims and analysis [references]) for the WOSI are as follows:

  • Various reasons for instability (development, reliability, validity, responsiveness [67])

  • Randomized controlled trial: rehabilitation, surgery (revision of scaling, outcome [91])

  • Shoulder instruments (comparative review [93, 94])

  • Physiotherapy, surgery (German WOSI: reliability, validity, responsiveness [92, 95])

  • Stabilization surgery (Swedish WOSI: reliability, validity, responsiveness [96])

There are several other studies that have used the WOSI, but they only report followup without baseline scores and have been excluded from the review for this reason.

Practical Application

How to obtain.

Published in 5 studies (67, 91, 92, 95, 96). Free online at http://www.secec.org/data/upload/files/Western%20Ontario%20Shoulder%20Instability%20Index%20(WOSI).pdf.

Method of administration.

Self-assessment.

Time to complete.

No data published. Estimated: 3 minutes.

Scoring.

Sum of 21 unweighted items (0 = best and 100 = worst).

Score interpretation.

0 = best to 2,100 = worst. To be comparable to other instruments, such as the Short Form 36 (SF-36), we recommend a transformed score of 100 − (original score/21), ranging from 0 = worst to 100 = best (91). A missing rule (we recommend at least two-thirds, i.e., 14 of 21, completed items), distinct cutoffs for severity, and normative data have not been published.
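The recommended transformation and missing rule can be sketched as follows. The function name and the use of None for missing items are hypothetical. Using the mean of the completed items in place of original score/21 is an assumption for incomplete forms; it gives the same result as the formula above when all 21 items are completed.

```python
def wosi_transformed(items, min_completed=14):
    """Transform WOSI item scores to 0 = worst, 100 = best.

    items: 21 entries scored 0 (best) to 100 (worst), None = missing.
    min_completed: recommended minimum of 14 of 21 completed items.
    """
    if len(items) != 21:
        raise ValueError("the WOSI has 21 items")
    completed = [x for x in items if x is not None]
    if len(completed) < min_completed:
        raise ValueError("too few completed items")
    if any(not 0 <= x <= 100 for x in completed):
        raise ValueError("item score out of range")
    # The mean of the completed items equals original score/21 for a complete form.
    return 100 - sum(completed) / len(completed)
```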

Respondent burden.

Minimal; easy questions and use of the VAS.

Administrative burden.

Reduced with the use of the 11-point NRS. Easy computation of the total score. Estimated time to administer and score: 6 minutes.

Translations/adaptations.

Validated versions are available in Swedish (96) and in German: 2 simultaneously published versions, one is approved by the developer (92) and the other is not approved (95).

Psychometric Information

Method of development.

Item generation by review of the literature (other instruments) and interview of specialists and patients with shoulder instability (67). Item reduction based on expert group, empirical testing (patient's perception of item importance), and inter-item correlation (67).

Acceptability.

Highly accepted by patients and surgeons because of the importance of the item contents. No floor and ceiling effects (92, 96). The WOSI received the best rating of psychometric properties in a systematic review (93).

Reliability.

Internal reliability/consistency: Cronbach's α = 0.88–0.96 (92, 95, 96).

Test–retest reliability: intraclass correlation coefficient 0.87–0.98 (67, 92, 95, 96).

Validity.

Content validity.

Item content established by patients and experts. No floor or ceiling effects (92, 96). Score distribution has not been further examined.

Criterion validity.

There is no gold standard to measure shoulder instability. The obvious content validity of the items and the data from the psychometric testing studies result in a certain intrinsic validity. No data on item-response theory, factor, or Rasch analysis have been published.

Construct validity.

Pearson's or Spearman's correlations of the WOSI to other instruments are as follows:

  • VAS for function: 0.80 (ref.96)

  • DASH: 0.77 (ref.67)

  • SF-12 PCS: 0.66 (ref.67)

  • CS: 0.59 (ref.95)

  • Rowe score: 0.59 (ref.96)

  • Shoulder rating scale: 0.59 (ref.67)

  • SF-36 bodily pain: 0.56 (ref.95)

  • ASES: 0.55–0.67 (ref.67, 92)

  • SF-36 physical functioning: 0.44 (ref.95)

  • EQ-5D: 0.44 (ref.96)

  • SF-12 MCS: 0.12 (ref.67)

Ability to detect change.

Minimally detectable change (MDC95%) and minimum clinically important difference (MCID): no data published.

Effect sizes (ES) and standardized response means (SRMs) of the WOSI are as follows:

  • Stabilization surgery: ES 1.67, SRM 1.40 (ref.96)

  • Physiotherapy ± stabilization surgery: SRM 0.93 (ref.67)

Critical Appraisal of Overall Value to the Rheumatology Community

Strengths.

Relevant questions and domains, high patient acceptance, good construct validity. Psychometrically best-tested tool for shoulder instability (96).

Caveats and cautions.

Substantial lack of validity and responsiveness testing data. Due to lack of testing data (MDC95%, MCID), caution is necessary for measurement at an individual patient level. Unusual scale from 0 = best to 2,100 = worst in the original scaling.

Clinical usability.

Highly accepted by patients. Easy to use as patient self-assessment; no clinical examination is necessary (as is often required for instability). Due to lack of testing data or insufficient measurement properties, caution is necessary for measurement at an individual patient level.

Research usability.

Recommended as the best psychometrically tested tool for shoulder instability, including as part of a set of different instruments if instability is present (96). However, testing data are still lacking.

Table  . Summary Table for Adult Shoulder Function Measures*
Scale | Purpose/content | Method of administration | Respondent burden | Administrative burden | Score interpretation | Reliability evidence | Validity evidence | Ability to detect change | Strengths | Cautions
* DASH = Disabilities of the Arm, Shoulder, and Hand questionnaire; ICC = intraclass correlation coefficient; C = clinical use; + = appropriate/to be recommended; R = use for research; QuickDASH = short form of the DASH; parentheses = use with caution; SPADI = Shoulder Pain and Disability Index; ASES = American Shoulder and Elbow Surgeons questionnaire for the shoulder; pASES = patient ASES; ADL = activities of daily living; cASES = clinical ASES; CS = Constant (Murley) Score; – = not appropriate/not to be recommended; SST = Simple Shoulder Test; OSS = Oxford Shoulder Score; SDQ = Shoulder Disability Questionnaire; WOSI = Western Ontario Shoulder Instability index.

DASH
  • Purpose/content: Symptoms and function of the entire upper extremity; 30 items

  • Method of administration: Self-assessment

  • Respondent burden: 4 minutes; Easy to understand

  • Administrative burden: 10 minutes; Hand or computer score

  • Score interpretation: Total score and subscores symptoms and function; 0 = best, 100 = worst

  • Reliability evidence: Cronbach's α = 0.92–0.98; ICC 0.93–0.98

  • Validity evidence: Content and construct very high; Criterion moderate

  • Ability to detect change: Moderately responsive for shoulder conditions

  • Strengths: Best tested, most widely used; Also for multilocular conditions; C+, R+

  • Cautions: Long, strict missing rule, not shoulder specific
QuickDASH
  • Purpose/content: Extraction of the DASH: symptoms and function of the entire upper extremity; 11 items

  • Method of administration: Self-assessment

  • Respondent burden: 2 minutes; Easy to understand

  • Administrative burden: 8 minutes; Hand or computer score

  • Score interpretation: Total score; 0 = best, 100 = worst

  • Reliability evidence: Cronbach's α = 0.92–0.95; ICC 0.90–0.94

  • Validity evidence: Content and construct high; Criterion moderate to high

  • Ability to detect change: Moderately responsive for shoulder conditions

  • Strengths: Very well tested, very short; Also for multilocular conditions; C+, R(+)

  • Cautions: Total score only. Not exactly the same construct as the DASH, not shoulder specific
SPADI
  • Pain and function

  • 13 items

Self-assessment
  • 2 minutes

  • Easy to understand

  • 5 minutes

  • Hand or computer score

  • Total score and subscores pain and function

  • 0 = best, 100 = worst

  • Cronbach's α = 0.86–0.96

  • ICC 0.84–0.95

  • Content and construct high

  • Criterion moderate to high

Highly responsive
  • Very well tested, short, responsive

  • C+, R+

Few proviso regarding validity
ASES
  • pASES: pain, ADL, instability

  • 16 items

  • cASES: range of motion, signs, strength, instability

  • 34 items

Self- and examiner assessment
  • pASES: 3 minutes

  • Easy to understand

  • cASES: clinical examination is time consuming

  • 8 minutes (pASES)

  • Hand or computer score

  • Total score and subscores

  • 0 = worst, 100 = best.

  • Cronbach's α = 0.61–0.96

  • ICC 0.84–0.96

  • Content and construct moderate to high

  • Criterion: questionable but lack of data

Highly responsive
  • Moderately tested, responsive, widespread use

  • C+, R(+)

Long tool, not easy to score and interpret. Mix of different item scales. Limited criterion validity. Sparse data for the cASES
CS
  • Pain, ADL, mobility, strength

  • 10 items

Interviewer (self-) and examiner assessment combined
  • 5–7 minutes

  • Easy to understand

  • 10 minutes

  • Hand score

  • Total score

  • 0 = worst, 100 = best (subscores' maximum = best: 15, 20, 40, 25)

  • Cronbach's α = 0.37 and 0.60

  • ICC 0.80–0.96

  • Low intertester reliability

  • Content moderate

  • Criterion low

  • Construct high

Highly responsive
  • Most used (partly) clinical tool, widely accepted, responsive

  • C+, R(–)

Lack in reliability and validity (testing). Different protocols for measuring strength
SST
  • Function

  • 12 items

Self-assessment
  • 2–3 minutes

  • Very easy to understand

  • 5 minutes

  • Hand score

  • Total score

  • 0 = worst, 100 (originally 12) = best

  • Cronbach's α = 0.85

  • ICC 0.97 and 0.99

  • Content and construct moderate to high

  • Criterion low

Doubtfully/ moderately responsive
  • Widely used in US, very short and simple

  • C(+), R–

Lack in criterion validity. Questionable use as metric score
OSS
  • Pain and function

  • 12 items

Self-assessment
  • 2 minutes

  • Easy to understand

  • 5 minutes

  • Hand or computer score

  • Total score

  • 0 = worst, 48 = best

  • Cronbach's α = 0.94

  • ICC: no data

  • Content moderate

  • Construct high

  • Criterion: lack of data

Highly responsive
  • Very short, specific for surgery

  • C(+), R–

Lack in reliability and validity testing, especially in nonsurgical conditions
SDQ
  • Pain-related function

  • 16 items

Self-assessment
  • 2 minutes

  • Very easy to understand

  • 5 minutes

  • Hand score

  • Total score

  • 0 = best, 100 = worst

  • Cronbach's α = 0.76 and 0.79

  • ICC: no data

Low/questionable validity in all domainsDoubtfully responsive
  • Very short, easy to determine

  • C–, R–

Lack in reliability and validity (testing). Questionable use as metric score
WOSI
  • (In)stability

  • 21 items

Self-assessment
  • 3 minutes

  • Easy to understand

  • 6 minutes

  • Hand or computer score

  • Total score

  • 0 = best, 2,100 = worst

  • Cronbach's α = 0.88–0.96

  • ICC 0.87–0.98

  • Content and construct moderate to high

  • Criterion: sparse data

Well responsive but lack of data
  • Best-tested tool for instability, high clinical acceptance

  • C(+), R(+)

Still lack of data about validity and responsiveness
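
The instruments in the summary table differ in both the range and the direction of their total scores (for example, 0 = best for the DASH but 0 = worst for the CS). As a minimal sketch of how these conventions could be handled when several instruments are reported side by side, the following Python function linearly rescales a raw total score to a common scale of 0 (worst) to 100 (best). The names `SCORE_RANGES` and `to_common_scale` are hypothetical and are not taken from any instrument's scoring manual. The rescaling only aligns range and direction; it does not make scores from different instruments interchangeable, because they do not measure exactly the same construct.

```python
# Illustrative sketch only. The score ranges and directions come from the
# summary table; the dictionary and function names are hypothetical.

# For each instrument: (total score at best function, total score at worst function)
SCORE_RANGES = {
    "DASH": (0, 100),
    "QuickDASH": (0, 100),
    "SPADI": (0, 100),
    "ASES": (100, 0),
    "CS": (100, 0),
    "SST": (100, 0),
    "OSS": (48, 0),
    "SDQ": (0, 100),
    "WOSI": (0, 2100),
}


def to_common_scale(instrument: str, raw: float) -> float:
    """Linearly rescale a raw total score to 0 (worst) .. 100 (best)."""
    best, worst = SCORE_RANGES[instrument]
    lowest, highest = min(best, worst), max(best, worst)
    if not lowest <= raw <= highest:
        raise ValueError(
            f"{instrument} total score must lie between {lowest} and {highest}"
        )
    # Distance from the worst score, as a percentage of the full range.
    return 100.0 * (raw - worst) / (best - worst)


if __name__ == "__main__":
    for name, raw in [("DASH", 25), ("OSS", 36), ("WOSI", 525)]:
        print(name, raw, to_common_scale(name, raw))
```

Under these assumptions, a DASH score of 25, an OSS score of 36, and a WOSI score of 525 all lie three-quarters of the way from the worst to the best end of their respective ranges. The caution in the table about using the SST and SDQ as metric scores applies to any such rescaled value as well.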

AUTHOR CONTRIBUTIONS

All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published.

ACKNOWLEDGMENTS

We thank Susanne Lehmann and Franziska Kohler for their work in procuring and organizing the literature, Carol Kennedy and Dorcas E. Beaton for commenting on and editing the DASH chapter, Kathreen E. Roach for commenting on and editing the SPADI chapter, Susann Drerup for commenting on and editing the WOSI chapter, and Joy Buchanan for her English editing.
