Comprehensive assessment of clinical outcome and quality of life after total shoulder arthroplasty: Usefulness and validity of subjective outcome measures




To explore the physiometric and psychometric properties of clinical, generic, and condition-specific assessment instruments. To describe patients' outcome after total shoulder arthroplasty.


Forty-three patients were assessed in a 5–6-year cross-sectional catamnesis.


With regard to shoulder joint stability, pain, general physical health, and mental health, the patients showed scores comparable to normative scores. Significant functional limitation was evidenced by low mean scores on the specific function scales (e.g., Disability of the Arm, Shoulder and Hand questionnaire score = 64.0, normative score = 86.6). There were high correlations among the joint-specific scales (up to 0.93) and moderate correlations between these and the generic and clinical scales. Factor analysis identified 3 different assessment domains.


The patients' quality of life (QOL) was high and not affected by impairment in some specific functional abilities. Physical QOL, mental QOL, clinical assessment, condition-specific measures, and generic measures were identified as separate domains, all of which are required for a comprehensive and sophisticated assessment in practical clinical routine.


Destruction of the shoulder joint by degenerative or inflammatory disorders is a common problem and has a high public health impact. Worldwide, rheumatoid arthritis (RA) has a lifetime prevalence of ∼1%, and the shoulder joint is affected in 57–96% of these cases (1). Osteoarthritis (OA) is 1 of the 3 most disabling diseases; in the 1980s, clinically defined OA showed a point prevalence of 12% in the US population aged 25–74 years (2). As a consequence of increasing life expectancy, an increase of almost 40%, resulting in up to 18.9 × 10⁶ disability-adjusted life years lost due to OA, has been estimated to take place in developed countries over the period 1990–2020 (3, 4).

In the treatment of primary or secondary joint destruction, the development of artificial joint replacement represented a milestone in the prevention of permanent disability (5). Introduced by Neer in the early 1970s, total shoulder arthroplasty has now become a well-accepted procedure. There are multiple reports in the literature of the mid- and long-term results of shoulder arthroplasty (5–7). However, in only a few of them were standardized, valid, and comprehensive assessment instruments employed. This is also true for other shoulder conditions.

Over the last few years, the development of clinical outcome tools has shown that standardized, well-tested instruments can give a valid and reliable reflection of the patient's health status and quality of life (QOL) in different health disorders and across different settings (8, 9). This is particularly so when using self-administered questionnaires. In addition to assessing the direct impact of the disease on specific joint function, self-rating questionnaires allow the assessment of global functional capacity during the performance of everyday activities and of the patient's ability to participate in social activities, each of which plays an important part in the modern World Health Organization's International Classification of Functioning, Disability, and Health concept of health and disease (9, 10).

With the goal of developing a standardized assessment tool that would be feasible for use in the clinical environment, we tested a set of both clinical (physician-assessed) and patient self-administered health measurement instruments in a cross-sectional followup examination of patients 5–6 years after total shoulder arthroplasty. The aims of the study were to describe health status and QOL compared with population-based normative data using a holistic, comprehensive assessment approach. As a first step to finding the optimal set for future routine clinical use, we examined the validity of the instruments, the quality of the data obtained, and the feasibility of using the assessment tools in daily clinical practice.



All patients who had undergone shoulder arthroplasty at the Department of Upper Extremity and Hand Surgery, Schulthess Klinik, Zurich, Switzerland in the years 1996 and 1997 were sent a written invitation to attend a followup consultation with the study physician (GP). The patients were then contacted by telephone, which provided the opportunity to motivate them to come to the clinic, to answer any outstanding questions, and, in the case of patients who did not want to attend an assessment, to establish why the invitation was declined. Travel expenses were refunded, but otherwise no payment was made for participation.


All validated, clinically well-tested generic and specific health measurement instruments for the upper extremity were identified from a search of the literature in PubMed (11) using the key words “shoulder,” “instrument,” “assessment,” and “QOL.” Examination of the articles identified by the search revealed >2,500 potentially relevant references. In a second step, the various instruments described in these articles were qualitatively judged and rated in relation to their practical handling, suitability for use in clinical routine, and clinical–epidemiologic qualities. In particular, the items and the health dimensions that they covered were assessed for their relevance in relation to the aims of the study. This led to the selection of a set of objective clinical instruments to be used by the physician, and a series of subjective patient self-rated questionnaires. The set comprised radiographic and ultrasound imaging and the following assessment tools (described in detail below): 1) a sociodemographic questionnaire, 2) the Self-administered Comorbidity Questionnaire V (SCQ), 3) the Short Form 36 (SF-36), 4) the Disability of the Arm, Shoulder and Hand questionnaire (DASH), 5) the Shoulder Pain and Disability Index (SPADI), 6) the American Shoulder and Elbow Surgeons questionnaire for the shoulder (ASES), and 7) the Constant (Murley) Score (CS).

Of the many instruments identified by the review, a number were selected that enquired about symptoms, pain, and function in a comprehensive but also situation-specific or function-specific manner, ideally using a multilevel rating scale (e.g., from 0 = worst to 10 = best). Additionally, they had to be well validated and enjoy widespread use, as evidenced by their frequent citation in the literature. The SF-36 is the best tested and most commonly used generic self-assessment instrument (9). The DASH, the ASES, and the SPADI are also commonly used and appeared to cover all the themes and issues enquired about in other instruments, such as the Shoulder Disability Questionnaire (SDQ) (12, 13), the Shoulder Rating Questionnaire (SRQ) (14), the 12-item Shoulder Questionnaire (15), and others. The Simple Shoulder Test (SST) (16) investigated pain using 1 item only and all other items with yes/no answers rather than a graded scale (as did the SDQ). Furthermore, some of the items in the SDQ were considered difficult to understand for the patient. All these instruments were therefore excluded from our chosen set to keep it as short as possible.

Information about sociodemographics and disease-modifying parameters (e.g., participation in sports, smoking, work, and social support) was gathered using a standardized questionnaire from a previous study (17). Satisfaction with the surgical intervention was indicated on a 0–10 visual analog scale (VAS).

Comorbidity was investigated using a new, short, well-tested, standardized questionnaire, the Self-administered Comorbidity Questionnaire, enquiring about the 15 most important concomitant diseases (18): hypertension, heart disease, stroke/arteriosclerosis, affective disease, diabetes, obesity/dyslipidemia, cancer, use of alcohol or drugs, lung disease, kidney disease, liver disease, gastroenterologic disease/peptic ulcer, blood disorders/anemia, joint disease, and back problems. Each disease was self rated in terms of its presence or absence, the corresponding use of medication, and the associated limitation in activity (i.e., 3 dimensions or domains). This allowed determination of a score for each of the 15 diseases (0 = no disease to 3 = disease with medication and limitation) and hence a total comorbidity score of 0–45.
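The comorbidity scoring described above can be sketched in Python as follows. This is an illustration only: it assumes 1 point each for presence, medication, and limitation, and that medication and limitation are counted only when the disease is present. The function names are hypothetical and do not come from the published questionnaire.

```python
from typing import Iterable, Tuple

# One answer per disease: (present, medicated, limited)
Answer = Tuple[bool, bool, bool]


def scq_disease_score(present: bool, medicated: bool, limited: bool) -> int:
    """Score one disease from 0 (absent) to 3 (present, medicated, limiting)."""
    if not present:
        # Assumption: follow-up answers are ignored when the disease is absent.
        return 0
    return 1 + int(medicated) + int(limited)


def scq_total(answers: Iterable[Answer]) -> int:
    """Sum the per-disease scores; 15 diseases give a range of 0-45."""
    return sum(scq_disease_score(*answer) for answer in answers)
```

With 15 diseases that are all present, medicated, and limiting, the total reaches the maximum of 45.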

The SF-36 is a self-administered generic QOL instrument (19) that assesses physical, mental, and biopsychosocial health in a holistic manner. The SF-36 has been used all over the world in many studies and languages, and its quality (e.g., reliability, validity, responsiveness) has been proven in various settings. We used the validated German “acute” version, which enquires about symptoms and functioning in the last week (20). Previous studies have shown that the SF-36 displays excellent psychometric properties and is more responsive to change in patients with rheumatic conditions than are other, longer instruments (21). The 36 questions of the SF-36 comprise 8 subscales, each containing between 2 and 10 items, plus a single item to assess health transition. The scales cover the dimensions of physical functioning (unweighted mean of 10 items), role physical (4 items), bodily pain (2 items), general health (5 items), vitality (4 items), social functioning (2 items), role emotional (3 items), and mental health (5 items). It yields a total score ranging from 0 (maximal symptoms/maximal limitations/poor health) to 100 (no symptoms/no limitations/excellent health). The SF-36 allows the construction of 2 summary scales, the physical component summary (PCS) and the mental component summary (MCS). Both are standardized relative to the (United States) normative values in such a way that the PCS and MCS would result in a mean of 50 and a standard deviation of 10 points when determined in the US population survey (20). To determine the individual scale scores, at least 50% of the items have to be filled out (“missing rule” in the user's guide) (19). New German normative data from a population survey (n = 6,948) have been published recently and allow stratification with respect to sex, age, and comorbidity (22).

The DASH is a comprehensive self-administered questionnaire about symptoms and functioning of the entire upper extremity (23). The DASH enquires about the ability to perform simple and complex activities of daily living (ADL) that are commonly performed with either one arm or both arms, e.g., changing a light bulb overhead, washing or drying one's hair, making a bed. The DASH is widely used and has been validated in German (23, 24). It has been applied in various settings and shows excellent validity and responsiveness (25). The total DASH score is derived from the unweighted mean of 30 items, of which at least 27 items have to be answered (“missing rule” in the user's guide). We did not apply the items of the optional sports/arts module or the work module. To compare the DASH data with those of the other instruments, the original scoring (0 = best, 100 = worst) was transformed into 0 = worst, 100 = best. Normative data have been published recently by the American Academy of Orthopedic Surgeons (n = 1,656), stratified by sex, age, and comorbidity (26).
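As a rough sketch of the DASH scoring and the reversal applied here, the Python fragment below may help. It assumes the customary item coding of 1 (no difficulty) to 5 (unable), which is not stated in the text above, and uses hypothetical function names.

```python
from typing import Optional, Sequence


def dash_score(items: Sequence[Optional[int]],
               min_answered: int = 27) -> Optional[float]:
    """Original DASH score (0 = best, 100 = worst); None if too few answers."""
    if len(items) != 30:
        raise ValueError("the DASH main module has 30 items")
    answered = [v for v in items if v is not None]
    if len(answered) < min_answered:
        # Missing rule: at least 27 of the 30 items must be answered.
        return None
    mean = sum(answered) / len(answered)
    # Assumption: items are coded 1-5, so (mean - 1) * 25 maps onto 0-100.
    return (mean - 1.0) * 25.0


def dash_score_reversed(items: Sequence[Optional[int]]) -> Optional[float]:
    """Score on the scale used in this study (0 = worst, 100 = best)."""
    score = dash_score(items)
    return None if score is None else 100.0 - score
```

The reversal is a simple reflection (100 minus the original score), so differences between patients keep their size and only change sign.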

The SPADI is a short, self-administered questionnaire that uses a VAS from 0 = best to 11 = worst health for each item (27). Five items assess pain and 8 items assess disability in ADL of the arm (function); the unweighted means are used to determine the pain and function scores. The average of these 2 scores then gives the total SPADI score. Each score (pain, function, total SPADI score) was transformed into a scale from 0 = worst situation to 100 = best situation. The English version of the SPADI was translated into German according to the guidelines of the American Academy of Orthopedic Surgeons outcomes committee (28). Two native German speakers translated the questionnaire and then agreed on a final German version. This version was back-translated by 2 native English speakers to identify any inconsistencies or misunderstandings in the original translation. Consensus regarding the final version to be implemented was reached by a committee comprising an orthopedic surgeon, a physician, one of the native English translators, one of the native German translators, a methodologist, and a statistician. Previous studies have shown that the SPADI scores are reliable and valid and have a high correlation with changes in active shoulder range of motion (27) and with the DASH (25). In comparison with the SF-36 and with 4 other shoulder outcome measures (including the modified ASES), the SPADI proved to be the most responsive (29).
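The SPADI scoring described above could be expressed as in the following Python sketch. It assumes that every item is answered on a scale from 0 to `item_max` (set to 11 here, following the item range given in the text) and that the 0–100 transformation is linear. The function names and the `item_max` parameter are hypothetical.

```python
from typing import Sequence, Tuple


def spadi_subscale(items: Sequence[float], item_max: float = 11.0) -> float:
    """Unweighted item mean, transformed to 0 = worst ... 100 = best."""
    mean = sum(items) / len(items)
    # Assumption: linear reversal of the raw 0..item_max scale.
    return 100.0 * (1.0 - mean / item_max)


def spadi_scores(pain_items: Sequence[float],
                 function_items: Sequence[float],
                 item_max: float = 11.0) -> Tuple[float, float, float]:
    """Return (pain, function, total); total is the mean of both subscales."""
    if len(pain_items) != 5 or len(function_items) != 8:
        raise ValueError("expected 5 pain items and 8 function items")
    pain = spadi_subscale(pain_items, item_max)
    function = spadi_subscale(function_items, item_max)
    return pain, function, (pain + function) / 2.0
```

Because the total is the average of the 2 subscale scores rather than of all 13 items, the 5 pain items together carry the same weight as the 8 function items.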

The ASES has 2 parts: the patient self-evaluation (patient ASES [pASES]) and the physician assessment (clinical ASES [cASES]) (30). The ASES has been shown to be a valid and responsive tool, especially in its modified form, which is mainly used for the assessment of elbow pain and disability (31). The first 6 questions of the pASES enquire about the presence or absence of pain and the use of pain medication. The seventh item is a 0–10 VAS for pain, which was reversed in the present study to give 0 = maximal pain and 10 = no pain. Analogously, instability was rated by the VAS as 0 = very unstable and 10 = maximal stability. The next 10 items rate difficulty in the performance of ADL for the left arm and the right arm separately, for which the sum score (scale 0–30) was rescored to give a scale from 0 = unable to do to 100 = no difficulties. The final pASES score was given by the combination of pain and function scores as follows: (5 × pASES pain) + (pASES function ÷ 2), scaled as 0 = worst, 100 = best. The questionnaire was crossculturally adapted for the German language using the same process as described for the SPADI (28).
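The pASES combination rule can be illustrated with a short Python sketch. It assumes that each of the 10 ADL items is coded 0–3 (inferred from the stated sum range of 0–30) and that the pain VAS has already been reversed so that 10 = no pain. The function names are hypothetical.

```python
from typing import Sequence


def pases_function(adl_items: Sequence[int]) -> float:
    """Rescale the ADL sum score (0-30) to 0 = unable ... 100 = no difficulty."""
    if len(adl_items) != 10:
        raise ValueError("expected 10 ADL items")
    # Assumption: each item is coded 0-3, giving a maximum sum of 30.
    return sum(adl_items) * 100.0 / 30.0


def pases_total(pain_vas: float, adl_items: Sequence[int]) -> float:
    """Total = 5 x pain (0-50 points) + function / 2 (0-50 points)."""
    return 5.0 * pain_vas + pases_function(adl_items) / 2.0
```

Pain and function each contribute up to 50 points, so neither domain dominates the total.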

In the second part of the ASES, the physician rates motion or mobility in degrees in relation to 5 items, each active and passive for each shoulder. We adapted the scoring from the original questionnaire such that a range of motion of 0° = 0 points and maximal possible range of motion = 100 points. Maximum was defined as 180° for forward elevation, 60° for external rotation (at 0° abduction), 90° for external rotation (at 90° abduction) and for internal rotation, and 50° for cross-body adduction. An additional 11 items rate signs and symptoms (tendon and capsule tenderness, impingement, etc.), which were also rescaled to give 0 = maximally severe signs/symptoms and 100 = no signs/symptoms. Strength is rated by 5 items (0 = no contraction, 5 = maximal strength) and the average score of the items was scaled to 0–100 (100 = maximal strength). Finally, instability was scored using 8 items and their average was scaled to give 0 = maximal instability and 100 = maximal stability. The total cASES score was given as follows: [(motion active + motion passive) ÷ 2] + symptoms + strength + instability, scaled from 0 = worst to 100 = best health.
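A possible reading of the cASES motion and total scoring is sketched below in Python. Several details are assumptions and are not stated in the text: the 5 motion items are averaged with equal weight, measured values are clipped to the defined maximum, and the sum of the 4 components (each 0–100) is divided by 4 to return to a 0–100 scale. All names are hypothetical.

```python
from typing import Dict

# Maximal range of motion (degrees) as defined in the text.
MAX_DEGREES: Dict[str, float] = {
    "forward_elevation": 180.0,
    "external_rotation_0_abduction": 60.0,
    "external_rotation_90_abduction": 90.0,
    "internal_rotation": 90.0,
    "cross_body_adduction": 50.0,
}


def motion_score(degrees: Dict[str, float]) -> float:
    """Mean of the 5 motion items, each scaled so that the maximum = 100."""
    points = []
    for name, maximum in MAX_DEGREES.items():
        # Assumption: values outside 0..maximum are clipped.
        value = min(max(degrees[name], 0.0), maximum)
        points.append(100.0 * value / maximum)
    return sum(points) / len(points)


def cases_total(active: float, passive: float, symptoms: float,
                strength: float, instability: float) -> float:
    """Combine the component scores (each 0-100) into a 0-100 total."""
    motion = (active + passive) / 2.0
    # Assumption: dividing by 4 rescales the 0-400 sum to 0-100.
    return (motion + symptoms + strength + instability) / 4.0
```

Averaging active and passive motion first means that motion as a whole counts as 1 of 4 equally weighted components.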

The CS is the longest-established and possibly the most commonly used tool for assessment of the shoulder. The physician asks and documents answers to QOL questions (pain and ADL, ability to work, sleep [disturbance], sports and leisure, and maximal possible active positioning of the hand [e.g., overhead]) and certain clinical parameters (motion/mobility and abduction strength) (32). We used the German version described by C. R. Constant himself (33). The scale ranges from 0 = worst health (for all dimensions) to 15 = no pain, 20 = no limitations in ADL, 40 = free motion, 25 = 12.5 kg arm abduction strength. The sum of the 4 scales results in the CS total score from 0 = worst to 100 = best health. United Kingdom normative data are available stratified for sex and age but not comorbidity (34).
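The composition of the CS total can be sketched in Python as follows. The subscale maxima are taken from the text. The linear conversion from kilograms to strength points (2 points per kg, capped at 25) is an assumption made for illustration, and all names are hypothetical.

```python
# Subscale maxima as given in the text (sum = 100).
CS_MAX = {"pain": 15.0, "adl": 20.0, "motion": 40.0, "strength": 25.0}


def strength_points(kg: float) -> float:
    """Assumed linear mapping: 12.5 kg abduction strength = 25 points."""
    return min(max(2.0 * kg, 0.0), CS_MAX["strength"])


def constant_score(pain: float, adl: float, motion: float,
                   strength: float) -> float:
    """Sum the 4 subscales after checking each against its maximum."""
    parts = {"pain": pain, "adl": adl, "motion": motion, "strength": strength}
    for name, value in parts.items():
        if not 0.0 <= value <= CS_MAX[name]:
            raise ValueError(f"{name} must lie between 0 and {CS_MAX[name]}")
    return sum(parts.values())
```

Because the subscales have different maxima, motion (40 points) has the largest influence on the total and pain (15 points) the smallest.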


Determination of the scores followed the “missing rules” for the SF-36 and the DASH as described in the Patients and Methods section. For the other instruments, no missing rules were reported in the literature for the original questionnaires, so we decided arbitrarily to require that at least two-thirds of the items be completed to enable determination of the scale score. To compare the scores among the different instruments, all total or summary scores were adjusted to 0 = worst health, maximal pain/limitation/disability to 100 = best health, no pain/impairment/disability. For some scales of the pASES and the CS, this scoring was different from that used in previous studies, so the original scaling was left for comparison with other data, except for the data shown in Figure 1 (see Results), where all scores were scaled from 0 to 100 to ease visual comparison.
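The missing rules and the common 0–100 rescaling described above can be summarized in a small Python sketch. The thresholds are those given in the text (at least 50% of items for the SF-36, 27 of 30 for the DASH, two-thirds for the other instruments). The function names are hypothetical and the rescaling is assumed to be linear.

```python
from fractions import Fraction
from typing import Optional, Sequence

# Minimum fraction of answered items per instrument, as stated in the text.
SF36_RULE = Fraction(1, 2)
DASH_RULE = Fraction(27, 30)
DEFAULT_RULE = Fraction(2, 3)


def scale_mean(items: Sequence[Optional[float]],
               min_fraction: Fraction = DEFAULT_RULE) -> Optional[float]:
    """Mean of the answered items, or None if too few items were answered."""
    answered = [v for v in items if v is not None]
    # Exact rational comparison avoids floating-point edge cases.
    if not answered or Fraction(len(answered), len(items)) < min_fraction:
        return None
    return sum(answered) / len(answered)


def rescale(value: float, lo: float, hi: float, reverse: bool = False) -> float:
    """Map a raw score in [lo, hi] linearly onto 0 = worst ... 100 = best."""
    scaled = 100.0 * (value - lo) / (hi - lo)
    return 100.0 - scaled if reverse else scaled
```

For example, a scale with 3 items is scored under the two-thirds rule when 2 items are answered, but not when only 1 is.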

Figure 1. Comparison of the scores (0 = worst to 100 = best health) of patients after total shoulder arthroplasty (n = 50 joints). Horizontal black lines/marks: population normative values, corrected for sex, age, and comorbidity (if available). SF-36 = Short-Form 36; PCS = physical component summary; MCS = mental component summary; DASH = Disability of the Arm, Shoulder and Hand questionnaire; SPADI = Shoulder Pain and Disability Index; pASES = patient American Shoulder and Elbow Surgeons questionnaire (ASES) for the shoulder; cASES = clinical ASES; CS = Constant (Murley) Score; ADL = activities of daily living.

In the analysis of comorbidities, joint disease was excluded as a comorbid condition, since it was the main disease being examined. Normative values were calculated and stratified by sex, 5-year age groups, and comorbidity (at least 1 illness within the last month versus none) for each individual. A correction for comorbidity has previously been performed for the SF-36 (German data) and for the DASH (US data), but not for the CS (UK data) (22, 26, 34).

In the descriptive statistics of the instruments' scores, both nonparametric and parametric statistics were used, depending on the results of the test for normality of score distributions (Gaussian distribution) using the Kolmogorov-Smirnov test. Pairwise comparisons with normative values were carried out using Wilcoxon's rank sum test for nonparametric data. Spearman's rank correlation was used to assess crossvalidity between the different instruments (35, 36). A factor analysis (principal component analysis with varimax rotation) was carried out to examine commonalities among the scores' constructs and to identify whether the set could potentially be reduced in size (36). Logistic regression was used to explore predictive variables for the characterization of patients with either OA or RA, using Stata 7.0 for Windows (Stata Corp., College Station, TX). All the other analyses were performed using SPSS 11.0 for Windows (SPSS Inc., Chicago, IL). Significance was accepted at the 5% level.
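For readers unfamiliar with Spearman's rank correlation, the following self-contained Python sketch shows the idea: both score series are converted to ranks (tied values receive the average rank), and the Pearson correlation of the ranks is computed. This is an illustration only and not the SPSS procedure used in the study. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the rank-transformed series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length (n >= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Because only ranks enter the calculation, any monotonic relationship between 2 instruments yields a coefficient of 1, even if the relationship is not linear.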



Seventy-five patients who had undergone shoulder arthroplasty in 1996 and 1997 were invited to participate in the study. Eight patients had died and 9 could not be traced (e.g., had changed address, moved abroad). Fifteen patients declined the invitation, mostly because of difficulties traveling due to severe (partly preterminal) illness or due to the long distance between home and the clinic. The remaining 43 (57%) patients were examined in July 2002.

Table 1 shows the descriptive sociodemographic and disease-specific data for the 43 patients. All patients had received a cemented Aequalis (Tornier, France) shoulder implant in combination with a cemented HD-polyethylene glenoid component from the global shoulder system (DePuy, Warsaw, IN) in 1996 or 1997. Four of the 7 bilateral cases (see Table 1), who had undergone arthroplasty in 1995 or 1996 at the contralateral shoulder, had received the first joint replacement before 1996 (1 in 1990, 2 in 1994, and 1 in 1995). There were no differences between the unilaterally and bilaterally operated patients in relation to age, sex, education level, or the scores for each of the instruments. Thus the analysis was carried out for the 2 groups combined, giving 1 group of 43 patients with 50 operated joints (the ASES and the CS assess the left and the right limbs separately).

Table 1. Sociodemographics and disease-specific data*

Characteristic                     No.    %
Age, mean ± SD (range) years       65.1 ± 13.2 (31–87)
 Basic school (8–9 years)          13     30
 Vocational training               19     44
Living conditions
 With partner                      30     70
Alcohol consumption
 Several times daily                2      5
Sport, hours/week
 Rheumatoid arthritis              21     49

* Data presented as no. (%) unless otherwise noted.

Administration and practicality of the assessment instruments.

All the questionnaires could be easily understood and completed by all the patients. On average, the sociodemographic questionnaire took 5 minutes to complete; the comorbidity questionnaire, 2 minutes; the SF-36, 5 minutes; the DASH, 4 minutes; the SPADI, 2 minutes; and the pASES, 2 minutes. Thus, the whole set of self-rated questionnaires required a total of 20 minutes to complete. With the brief introduction, distribution, collection of the questionnaires, and a check for completeness of answering, approximately half an hour per patient was required. The completeness of answering was generally higher for the shorter questionnaires. The physical examination and completion of the cASES and the CS by the examining doctor took an additional 15 minutes.

Health and quality of life.

Table 2 shows each instrument's score and subscores and their comparison with normative values. The mean scores are displayed in Figure 1 using a scale of 0 = worst to 100 = best health for all instruments and subscales. The scores were high for mental health and joint stability, and relatively high for pain and the clinical scores (cASES, CS), but were low with regard to the dimensions measuring specific physical functions and abilities.

Table 2. Instrument scores for patients after shoulder arthroplasty (n = 43 patients, 50 shoulders)*

Score                        Median  Mean   SD     Norm†  P       Norm‡  P      n   Min   Max    Floor %  Ceiling %  Scale§
SF-36 physical functioning   55.0    54.9   25.3   72.2   <0.001  67.3   0.004  43  0     95     2        0          0–100
SF-36 role physical          25.0    48.2   48.3   71.5   0.003   64.8   0.004  42  0     100    44       42         0–100
SF-36 bodily pain            42.0    55.4   25.3   61.8   0.108   53.5   0.582  43  10    100    0        14         0–100
SF-36 general health         52.0    53.1   22.7   60.5   0.043   55.3   0.320  43  15    100    0        5          0–100
SF-36 vitality               53.0    52.9   21.3   57.6   0.138   53.3   0.866  42  0     100    2        2          0–100
SF-36 social functioning     88.0    81.3   20.8   83.1   0.900   80.5   0.633  43  25    100    0        37         0–100
SF-36 role emotional         100.0   80.5   40.1   85.1   0.090   80.9   0.017  41  0     100    19       77         0–100
SF-36 mental health          72.0    71.9   18.4   70.9   0.507   67.9   0.121  42  32    100    0        5          0–100
SF-36 PCS                    36.1    36.9   10.9   43.6   0.003   40.8   0.053  40  17.0  58.3   0        0          0–100
SF-36 MCS                    55.2    53.3   9.5    51.3   0.036   50.2   0.001  40  19.6  68.2   0        0          0–100
SPADI pain                   67.3    65.3   27.1   -      -       -      -      43  9     100    0        16         0–100
SPADI function               67.1    65.3   26.3   -      -       -      -      43  22    100    0        9          0–100
pASES pain                   8.5     7.7    2.7    -      -       -      -      50  1     10     0        40         0–10
pASES instability            10.0    8.3    3.1    -      -       -      -      50  0     10     8        66         0–10
pASES function               56.7    59.4   25.1   -      -       -      -      50  0     100    4        10         0–100
cASES motion active          57.4    56.1   19.0   -      -       -      -      50  14    84     0        0          0–100
cASES motion passive         68.1    64.2   18.7   -      -       -      -      50  30    91     0        0          0–100
cASES symptoms               87.5    82.5   20.7   -      -       -      -      50  17    100    0        44         0–100
cASES strength               80.0    86.2   14.1   -      -       -      -      50  60    100    0        44         0–100
cASES instability            100.0   98.9   3.6    -      -       -      -      50  83    100    0        90         0–100
CS pain                      12.5    10.8   4.7    -      -       -      -      50  0     15     2        50         0–15
CS ADL                       16.0    15.9   3.6    -      -       -      -      49  7     20     0        26         0–20
CS motion                    28.0    25.4   9.4    -      -       -      -      50  6     38     0        0          0–40
CS strength                  10.0    9.7    5.8    -      -       -      -      50  0     22     6        0          0–25

* SF-36 = Short-Form 36; PCS = physical component summary; MCS = mental component summary; DASH = Disability of the Arm, Shoulder and Hand questionnaire; SPADI = Shoulder Pain and Disability Index; pASES = patient American Shoulder and Elbow Surgeons questionnaire (ASES) for the shoulder; cASES = clinical ASES; CS = Constant (Murley) Score; ADL = activities of daily living.
† German population normative values before correction for comorbidity.
‡ German population normative values after correction for comorbidity.
§ Scales: 0–k, where 0 = worst health, maximal symptoms/limitation and k = best health, no symptoms/limitations.
¶ Normally distributed (Kolmogorov-Smirnov test).

The SF-36 pain, SF-36 general health, and all the SF-36 mental health dimensions were equivalent to or exceeded the normative values after the latter were corrected for comorbidity. The specific physical health and function scores (SF-36 physical functioning, SF-36 role physical, SF-36 PCS, DASH) remained below the normative figures. This was also the case for the CS, which can be expressed relative to normal values as adjusted CS (83%; not corrected for comorbidity due to the lack of specific data).

Overall, 20 (47%) of the patients felt that their preoperative expectations about the arthroplasty were met completely (10 on the 0–10 VAS; median 8.5); only 5 patients were somewhat dissatisfied (VAS below 5). Almost all patients (42 of 43) declared that, with their current knowledge of the outcome, they would choose total shoulder joint replacement again if they found themselves in similar circumstances to those that prevailed preoperatively. Radiographic examination showed loosening of 1 humeral component and 2 glenoid components. According to the ultrasound imaging, 25 (50%) rotator cuffs were ruptured.

Construct and concurrent validity of the instruments.

The Spearman's rank correlation coefficients shown in Table 3 reflect the degree of agreement between the instruments in measuring a given symptom or functional ability. The highest correlations were observed between the condition-specific instruments; moderate correlations were found between those and the clinical and the SF-36 scales.

Table 3. Spearman's rank correlation coefficients between the assessment instruments*

SF-36 PCS
SF-36 MCS    0.16

* For abbreviation definitions, see Table 2.
† P ≥ 0.05.
‡ P < 0.001.
§ 0.001 ≤ P < 0.05.

Factor analysis identified 3 main constructs, listed in Table 4, that explained 88.6% of the variance of the instruments' main scores. Briefly, the factor loading of a score can vary between 0, which means that the score has no influence on and no correlation with the factor, and 1, which means that the score is perfectly correlated with the factor (i.e., the factor perfectly represents the score).

Table 4. Factor loads of the instruments' main scores*

                         Factor 1        Factor 2    Factor 3
                         Physical QOL    Clinic      Mental QOL
Explained variance, %    61.2            15.6        11.9
SF-36 PCS                0.93            0.04        0.00
SF-36 MCS                0.06            0.02        0.99

* QOL = quality of life. The main loading scores within each factor are shown in bold. For other abbreviation definitions, see Table 2.


Assuming that best health is reflected by a score of 100 in all of the health dimensions, the patients examined in the current series were, in general, far from fully healthy. For some scales, the scores reached figures close to the maximum, especially for the subjective rating of shoulder stability (assessed by the pASES and cASES), cASES symptoms, cASES strength, and all the mental health scales of the SF-36. However, the specific normative values show that even an apparently healthy 65-year-old person (the mean age of our sample) generally has reduced scores for many of the instruments examined, indicating that the average 65 year old does not typically enjoy the best possible health. Thus, considering that some of the patients in the present study were also affected by systemic disease, and that the RA patients had the additional burden of a chronic, incurable condition and its associated comorbidity, both the overall health and QOL of the patients in the present study were remarkably good. This was especially so for the mental health dimensions, where the scores reached or exceeded normal values, and also for the SF-36 dimension of pain.

Nonetheless, these comorbid conditions and the presence of RA clearly have an important impact on health, and most likely explain the reduced scores of our patients when compared with those reported in other studies of outcome after shoulder arthroplasty, especially if the data in previous studies were not stratified by OA and RA. For example, 4 years after arthroplasty, patients with low comorbidity, aged 58 (n = 60–75), showed a mean SF-36 physical functioning score of 66.8 (our patients 54.9), an SF-36 pain score of 61.2 (our patients 55.4), and a CS score of 74.0 (our patients 61.7) (37). When assessing OA alone, a mean CS score of 71.0 (our data 67.6 in OA patients) was observed in 268 cases (mean age 69 years) 30 months after implantation (38). In 94 total and 32 hemiarthroplastic shoulder OA patients, aged 65 years, the pASES score was 83.9 (our data 73.4 in OA patients) at the 46-month followup, but there was low comorbidity (39). In another study of both RA and OA patients (mean age 67 years), the mean CS score at the 31-month followup was markedly lower in the RA patients (n = 39; score 57.2, 74% of the norm) than in the primary OA patients (n = 148; score 73.0, 99% of the norm) (5). These findings are similar to those of the present study (RA score 55.6, 74% of the norm; OA score 67.6, 96% of the norm; data not directly shown) and also those recently reported by Levy and Copeland (7). Another study of 51 RA patients, aged 57 years, revealed a mean ASES function score of 48.0 (our RA patients 50.6) and a CS strength score of 4.0 (our RA patients 7.9) (6). In summary, RA patients receive arthroplasty earlier in life than OA patients, which is indicative of the burden of RA. Furthermore, in comparing the results of different studies, the proportion of RA and concomitant diseases must be taken into account.

The use of the disease-specific tools revealed that our patients showed significant impairments in function. However, their self-rated pain and their mental health status (SF-36 pain, role emotional, mental health, MCS) were no different from–and for some dimensions even better than–the norm. Additionally, joint stability was maximal or almost maximal for most of the patients. Interestingly, the patients appeared to be able to function well in their daily life, and enjoyed a high quality of life, despite some limitations in the specific functions enquired about in the disease-specific instruments. It may be speculated that during the long-term course of their disease, the patients had learned to compensate for the loss of these specific functions by developing adaptive strategies. The low number of cases of endoprosthesis loosening—a complication that causes impairment due to pain—undoubtedly contributed to the good result in terms of QOL despite the high number of rotator cuff ruptures (n = 25), which were obviously not accompanied by relevant pain. All in all, most of our patients were satisfied with the result of their arthroplasty and almost all would choose this treatment option again.

The patient's interview set (30 minutes to complete) and the evaluation of the clinical parameters (part of the physical examination with 5 minutes to complete the cASES and the CS) were both feasible in daily clinical practice. The items of all the instruments were easy to understand and most of the patients had no difficulties completing the questionnaires. However, a certain psychointellectual level and language knowledge was naturally required, as with all self-rating questionnaires.

There were high floor effects (i.e., minimum scores reached) for the SF-36 role physical and role emotional, and high ceiling effects (i.e., maximum scores reached) for the SF-36 role physical, role emotional, and social functioning scales; pASES pain and instability scales; cASES symptoms, strength, and instability scales; and CS pain and ADL scores. A floor effect indicates that a scale cannot offer a valid reflection of health status for patients who experience exceptionally poor health or are greatly impaired. Likewise, with a ceiling effect, 2 patients may both score 100 and yet differ in health status. For a proportion of patients it would therefore not be possible to monitor improvement or deterioration with the given instruments because the extreme scores had already been reached. The subscores of the SPADI, the pASES function, both cASES motion scales, CS motion, CS strength, and all the total and summary scores did not demonstrate floor and ceiling problems. The scales with low floor and low ceiling effects were also those for which the scores were normally distributed (see Table 2). Both these properties confer upon the questionnaire good psychometric and statistical qualities.
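The floor and ceiling percentages discussed above are simply the proportions of patients who reach the lowest or the highest possible score of a scale. A minimal Python sketch, with a hypothetical function name and made-up example values, is:

```python
from typing import Sequence, Tuple


def floor_ceiling(scores: Sequence[float], scale_min: float,
                  scale_max: float) -> Tuple[float, float]:
    """Return (floor %, ceiling %) for one scale."""
    if not scores:
        raise ValueError("no scores given")
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == scale_min) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == scale_max) / n
    return floor_pct, ceiling_pct


# Hypothetical example on a 0-10 scale: 1 of 4 at the floor, 2 of 4 at the ceiling.
example = floor_ceiling([0, 10, 10, 5], scale_min=0, scale_max=10)
```

A scale whose observed maximum lies below the scale maximum necessarily has a ceiling effect of 0%, and the same holds for the floor when the observed minimum lies above the scale minimum.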

The scales of the upper limb- and shoulder-specific questionnaires (the DASH, SPADI, pASES, and CS) showed a high agreement in measuring specific symptoms and disabilities, which indicates similar and valid constructs for these instruments. This also suggests that one instrument could be replaced by another, which would allow shortening and optimization of the final questionnaire set to be used in future clinical practice. The highest correlation was 0.93 between the DASH and the SPADI, which was even higher than the correlations reported for shoulder patients in the validation study of the DASH (0.79 for SPADI function and 0.85 for SPADI pain) (25).

The generic SF-36 physical scales (together represented by the PCS) correlated less well with the scales of the specific instruments, suggesting that the SF-36 measures additional aspects of physical health. This implies that the SF-36 yields more comprehensive information than the specific instruments do, and that specific instruments cannot be cross-validated using the SF-36. This is supported by the results of the factor analysis (see Table 4). The degree of correlation between the scores of the SF-36 and those of the disease-specific questionnaires in the present study was comparable to that reported by others using other specific instruments (29, 31, 37). Mental health and physical health seem to be independent constructs, as measured by the SF-36. This highlights the need for a specific instrument (or a specific part of an instrument) addressing mental health, if a comprehensive assessment of health and QOL is required.

The difference in the constructs depicted as physical and mental QOL was highlighted by the results of the factor analysis, which clearly separated these 2 health domains. Although the mental QOL was represented by only 1 score (the SF-36 MCS), physical QOL incorporated SF-36 PCS, DASH, SPADI, and pASES assessing pain, range of motion, stability, and function. Whether just 1 of these 4 instruments could represent this domain, to enable a shortening of the instrument set, will be examined in a further study of longitudinal data, focusing on sensitivity to change/responsiveness of the different questionnaires. The SF-36 PCS loaded most heavily on the physical QOL factor and was independent of the 2 other factors (mental and clinical). In accordance with our clinical experience, there is a third dimension representing the results of the clinical assessment (incorporating the cASES and the CS), but this emerged only when the eigenvalue threshold for factor retention was lowered to 0.5 (instead of 1.0, which is the Kaiser criterion) (36). The condition-specific scores of the physical QOL (DASH, SPADI, pASES) also loaded moderately on this clinical factor, reflecting the different but similar (to the physical QOL) nature of the clinical assessment and its importance in completing the picture.
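The effect of the retention threshold on the number of factors can be illustrated with a short sketch. Under the Kaiser criterion, only factors with an eigenvalue above 1.0 are retained; lowering the threshold to 0.5 admits further factors. The eigenvalues below are hypothetical and chosen only to mirror the pattern described (2 factors at 1.0, a third at 0.5); they are not the eigenvalues of the present analysis.

```python
def n_factors_retained(eigenvalues, threshold=1.0):
    """Count factors whose eigenvalue exceeds the retention threshold."""
    return sum(1 for ev in eigenvalues if ev > threshold)


# Hypothetical eigenvalues (illustration only, not study results)
hypothetical_eigenvalues = [5.2, 1.3, 0.7, 0.4, 0.2, 0.2]

kaiser = n_factors_retained(hypothetical_eigenvalues, threshold=1.0)
relaxed = n_factors_retained(hypothetical_eigenvalues, threshold=0.5)
print(f"Kaiser criterion (1.0): {kaiser} factors; threshold 0.5: {relaxed} factors")
```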

Finally, the results of the clinical assessment by the physician (cASES) were rather weakly correlated with the scores of the patient self-rated assessments. The clinical scores tended to be higher than the patients' self-rated scores (see Figure 1: cASES scales, CS pain, and CS ADL), indicating that the physician's rating of the patient's health is generally more optimistic than the patient's own assessment. This should be borne in mind when judging objective clinical data.

Logistic regression revealed that the 21 RA patients were significantly younger, had more comorbidities, and were more functionally impaired than the 22 OA patients (data not shown in detail). In this analysis of predictors, the SF-36 MCS, the SPADI, the pASES, the cASES, and the CS were able to distinguish between the 2 conditions (OA and RA) with a very high sensitivity and specificity, as shown by an area under the receiver operating characteristic (ROC) curve of 0.90 (40). Briefly, an area under the ROC curve of 1.0 represents perfect differentiation (of OA and RA) by the model (or the test) with 100% sensitivity and 100% specificity; an area of 0.5 means no ability of the model to differentiate, i.e., no better than chance; and an area of 0.8 is usually considered high and indicates that the model fits and performs well.
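The interpretation of the area under the ROC curve can be made concrete with a small sketch. The area equals the probability that a randomly chosen case from one group receives a higher model score than a randomly chosen case from the other group, with ties counted as one half. The function name and the predicted scores below are hypothetical and serve only as an illustration; they are not output of the regression model used in this study.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities (illustration only, not study data)
perfect = roc_auc([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])  # complete separation
chance = roc_auc([0.5, 0.5], [0.5, 0.5])             # no discrimination
print(perfect, chance)
```

Complete separation of the 2 groups yields an area of 1.0, whereas identical score distributions yield 0.5.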

Correction of the normative scores for comorbidity (at least one comorbid condition versus none) had only a minimal effect on the DASH but a large effect on the SF-36 scores. In the SF-36, the norm score is between 5% and 14% lower (worse) in a person with comorbidity compared with a person without comorbidity (22). Because the proportion of patients with comorbidity in the present study was high (77%), the SF-36 norms decreased (worsened) after correction for comorbidity. In contrast, the DASH normative figures increased (improved) after correction, because in the US general population sample there were more persons affected by at least 1 comorbid condition (87.7%) than in our sample (77%). As expected, the generic SF-36 reflects the effects of concomitant diseases much better than does the upper limb-specific DASH, and is therefore influenced to a greater extent by comorbid conditions. Assuming that OA and especially systemic RA result in specific functional impairments, this would explain why our patients showed significantly lower scores than normal only in dimensions that measure physical function (i.e., the SF-36 physical functioning, the SF-36 role physical, the DASH, and the CS) but not in SF-36 pain, general health, and all the mental health scales.
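The direction of these corrections can be reproduced with a simple weighted mean. The sketch below assumes that a comorbidity-corrected norm is the average of the norm for persons with and without comorbidity, weighted by the proportion with comorbidity; this formula is an assumption made for illustration and is not necessarily the exact procedure used here. The 2 stratum norms are hypothetical, whereas the proportions 87.7% and 77% are those reported above.

```python
def adjusted_norm(norm_without, norm_with, p_comorbid):
    """Weighted mean of stratum norms; p_comorbid is the share with comorbidity."""
    if not 0.0 <= p_comorbid <= 1.0:
        raise ValueError("p_comorbid must lie between 0 and 1")
    return p_comorbid * norm_with + (1.0 - p_comorbid) * norm_without


# Hypothetical stratum norms (higher = better); with comorbidity 10% lower
norm_without, norm_with = 80.0, 72.0

reference_population = adjusted_norm(norm_without, norm_with, 0.877)
study_sample_mix = adjusted_norm(norm_without, norm_with, 0.77)
print(reference_population, study_sample_mix)
```

Because the share of persons with comorbidity is lower in the study sample than in the reference population, the corrected norm in this illustration is higher than the uncorrected one, which is the direction described for the DASH.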

The results of the present study, which used an extensive set of instruments and clinical examinations, provide a comprehensive and detailed view of health and QOL in OA and RA patients 5–6 years after total shoulder arthroplasty. Comparison of objective clinical assessments with subjective patient self-rated assessments allowed examination of the relative value of different parameters, instruments, and concepts of assessment. Examination of the data in relation to normative values corrected for comorbidity (for the SF-36 and the DASH) represents an important improvement in the characterization of health status in patients with specific joint problems and additional comorbidity and is, to our knowledge, unique in the literature.

The study did, however, have a number of limitations that require discussion. Naturally, because it was an uncontrolled, observational cross-sectional study of patients who had undergone a specific intervention, it was not possible to assess the questionnaires' sensitivity to change or ability to distinguish between different diagnostic or treatment groups. However, this was also not the primary aim of this initial study, and these issues will be addressed in our future work. Secondly, it must be acknowledged that the quality of the data collected by self-administered instruments is dependent on the psychointellectual status of the subjects: a lack of understanding regarding the issues being enquired about may lead to the collection of inaccurate data in some patients. Furthermore, the prerequisite of a minimal level of intelligence in completing the questionnaires implies that the instruments cannot necessarily be implemented for all patients with the specific joint disease under investigation. And finally, although the long followup is a positive attribute in relation to the study's methodologic quality, it also introduces the potential problems of recall bias (in the case of questions for which the patient is required to reflect on the situation as it was 5 years ago) and of selection bias associated with low participation rates (in this study, only 43 of 75 possible patients could be assessed). The outcomes must therefore be interpreted with caution and it must be borne in mind that the sample consisted of patients who had a sufficiently good level of health to complete the questionnaires and be able to visit the clinic. On the other hand, although the results may represent a slightly overoptimistic picture of the long-term outcome after shoulder surgery, it is this very same type of patient that will be subject to routine assessment with the instrument set in future clinical use. Most of the 15 patients who were not able to visit the clinic suffered from severe, sometimes terminal (cancer) illness, with a high level of disability and need for care. Thus, these patients would typically not even need to perform many of the everyday tasks enquired about in the instrument set, as they would be carried out (for other reasons) by a caregiver.

Using an extensive, comprehensive set of instruments it was shown that the general well-being, QOL, and satisfaction with treatment of patients who had undergone uni- or bilateral shoulder arthroplasty 5–6 years earlier were, on average, good. Some specific functions remained significantly impaired, but appeared not to play a decisive role in the performance of tasks of daily living.

As was to be expected, the individual instruments did not always deliver unique information, and the set of tools can most certainly be reduced and optimized before being implemented in clinical practice. Identification of the instruments that deliver the most relevant information, in the most succinct fashion, requires a longitudinal study focusing on responsiveness, i.e., the sensitivity to change of the instruments after specific interventions. The aim of such a study will be to find the 3 best tools to cover the clinical, generic, and disease-specific domains to provide for a comprehensive assessment of the patient according to the World Health Organization's International Classification of Functioning, Disability, and Health concept. The final set of instruments should allow for a valid, sensitive, and patient-orientated assessment within the normal clinical routine, and should also provide the opportunity to compare the results among different conditions, diseases, and interventions, and with those of the general population.


We thank Mrs. Roberta Schefer for the management of the patients, the questionnaires, and the database.