Comprehensive assessment of clinical outcome and quality of life after total elbow arthroplasty




To assess quantitatively the outcome and to explore the physiometric and psychometric properties of clinical, generic, and condition-specific instruments after total elbow arthroplasty.


Seventy-nine patients were assessed in a 6–19-year cross-sectional catamnesis by means of 6 widely used questionnaires, clinical examinations, and radiographic examinations.


With regard to pain, general physical health, and all the mental health dimensions of the Short Form 36 (SF-36), the patients showed scores comparable to normative values. Elbow joint stability and satisfaction were both good. Significant functional limitation was evidenced by the low mean scores of the SF-36 physical functioning measure (48.7, normative 69.9) and the Disabilities of the Arm, Shoulder and Hand questionnaire (DASH) function measure (51.1, normative 89.3). The SF-36 physical component summary and the DASH correlated highly (r = 0.76) and, in factor analysis, loaded on the factor “physical unspecific.” The patient and clinical modified American Shoulder and Elbow Surgeons questionnaire (mASES) correlated with the Patient Related Elbow Evaluation form (r = 0.92 with the patient mASES) and loaded on “physical specific.” The SF-36 mental component summary loaded on “mental quality of life.”


The patients' self-rated health, quality of life, and clinical outcome were good and were not affected by impairment in some specific functional abilities. A questionnaire set comprising the SF-36 and the patient and clinical mASES is proposed for the comprehensive and specific assessment of outcome after elbow arthroplasty.


Destruction of the elbow joint occurs most commonly in connection with rheumatoid arthritis (RA) and, occasionally, following intraarticular fracture. It can greatly affect an individual's health and quality of life (QOL). Worldwide, RA has a lifetime prevalence of 1–2%, and the elbow joint is affected in 20–50% of these cases (1). Of all elbow prostheses, 80% are implanted due to RA (2, 3). Loss of elbow motion is considered more disabling than loss of shoulder or wrist function, since normal elbow function is required for positioning the hand in space, which is crucial in the performance of activities of daily living (ADL), e.g., reaching the hand to the mouth (1).

In the treatment of primary or secondary elbow joint destruction, the development of artificial joint replacement has become the most important therapy option in the prevention of permanent disability (1, 4, 5). Especially in RA, total elbow arthroplasty has become a well-accepted and safe procedure. With the introduction of the semiconstrained principle, complication rates fell from 45% in the late 1970s to 11–20% in the 1990s (1, 3, 5, 6). There are multiple reports in the literature of the mid- and long-term results after elbow arthroplasty. However, most of these are based on examiner-dependent outcome measures; only a few studies have employed standardized, valid, and comprehensive assessment instruments (7–11). Furthermore, no studies have examined the comparability of the various instruments used, in relation to their relative validity, clinical utility, or specificity. This is also true in relation to other elbow conditions.

The importance of valid clinical outcome and QOL assessments has been discussed in the authors' previous cross-sectional study of patients after shoulder arthroplasty (12). Analogous to that study, and with the goal of developing a standardized assessment tool that would be feasible for use in the clinical environment, we tested a set of both clinical (physician-assessed) and patient self-administered health measurement instruments in a cross-sectional followup examination of patients 6–19 years after total elbow arthroplasty. The first aim of the study was to describe the health status and QOL of these patients compared with population-based normative data, using a holistic, comprehensive, quantitative assessment approach. The second aim was to assess the validity of the assessment tools, the quality of the data obtained, and the feasibility of the instruments' use in the daily clinical routine to arrive at recommendations for an optimal set of instruments. It was hoped that the examination of a broad spectrum of parameters would contribute to the development of a standardized assessment tool that is applicable in the clinical environment across different patient settings.



All patients who had undergone cemented GSB III (Gschwend-Scheier-Bähler) elbow arthroplasty at the Department of Upper Extremity and Hand Surgery, Schulthess Klinik, Zurich, Switzerland, in the years 1984–1996 were sent a written invitation to attend a followup consultation with the study physician (MJ). The indications for arthroplasty were RA and posttraumatic destruction of the elbow. The patients were then contacted by telephone, which provided the opportunity to motivate them to come to the clinic, to answer any outstanding questions, and, in the case of patients who did not want to attend an assessment, to establish why the invitation was declined. Travel expenses were refunded, but otherwise no payment was made for participation. All the patients were fully informed about the risks and benefits of the examination, including the radiograph, and gave their informed consent to participate.


The assessment instruments to be used were selected using the same criteria as described in our previous shoulder study (12). All validated, clinically well-tested health measurement instruments for the upper extremity were identified from a search of the literature in PubMed and were qualitatively judged and rated in relation to their practical handling, suitability for use in clinical routine, and clinical–epidemiologic qualities. The resulting set comprised radiographic and ultrasound imaging and the following assessment tools (see Appendix A): 1) a sociodemographic questionnaire (13), 2) the Self-Administered Comorbidity Questionnaire (SCQ) (14), 3) the Short-Form 36 (SF-36) (15–18), 4) the Disabilities of the Arm, Shoulder and Hand questionnaire (DASH) (19–22), 5) the Patient Related Elbow Evaluation form (PREE) (23), and 6) the modified American Shoulder and Elbow Surgeons questionnaire for the elbow (mASES) (24). Descriptions of the sociodemographic questionnaire, the SCQ, the SF-36, and the DASH can be found in our previously published article (12).

As a global assessment of the result of the surgery, patients were asked to rate their change in general health as a result of the arthroplasty and their overall satisfaction with the outcome (visual analog scale [VAS] 0–10). In addition, they were asked whether they would choose the operation again if they found themselves in similar circumstances to those which prevailed preoperatively.

The PREE is a short, self-administered questionnaire that uses a VAS from 0 = best to 10 = worst health for each item (23). Four items assess pain intensity under different conditions; 1 item assesses pain frequency; and 15 items assess disability during ADL that are dependent on elbow function in 2 sections, “specific” and “usual” (work, etc.) activities. The unweighted means of the items are used to determine the pain and function scores. The unweighted average of these 2 scores gives the total PREE score. Each score (pain, function, total) was transformed into a scale from 0 = worst to 100 = best. The English version of the PREE was translated into German according to the guidelines of the American Academy of Orthopaedic Surgeons Outcomes Committee (25) and showed good reliability and validity (26). Earlier studies using the English version of the PREE revealed it to be reliable and valid, and to have a high correlation with the DASH (Pearson's r = 0.65–0.89), the mASES patient self evaluation (pmASES; r = 0.68–0.82), and the SF-36 physical component summary (PCS; r = 0.49–0.63) (23).
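The PREE scoring described above can be illustrated with a short Python sketch. It is not the authors' scoring program: the function name `pree_scores`, the pooling of the 4 pain intensity items and the pain frequency item into a single 5-item pain mean, and the linear inversion 100 − 10 × mean used to reach the 0 = worst to 100 = best scale are assumptions made for illustration.

```python
from statistics import mean


def to_best_100(raw_mean):
    """Invert a 0 (best) to 10 (worst) mean onto 0 (worst) to 100 (best).

    Assumption: a simple linear inversion; the article does not give the formula.
    """
    return 100 - 10 * raw_mean


def pree_scores(pain_items, function_items):
    """Return PREE pain, function, and total scores (0 = worst, 100 = best).

    pain_items: 5 ratings (4 intensity + 1 frequency), each 0 = best to 10 = worst.
    function_items: 15 disability ratings, each 0 = best to 10 = worst.
    """
    if len(pain_items) != 5 or len(function_items) != 15:
        raise ValueError("expected 5 pain items and 15 function items")
    if any(not 0 <= v <= 10 for v in list(pain_items) + list(function_items)):
        raise ValueError("ratings must lie between 0 and 10")
    pain_raw = mean(pain_items)          # unweighted mean of the pain items
    function_raw = mean(function_items)  # unweighted mean of the function items
    total_raw = mean([pain_raw, function_raw])  # unweighted average of both
    return {
        "pain": to_best_100(pain_raw),
        "function": to_best_100(function_raw),
        "total": to_best_100(total_raw),
    }


# Hypothetical patient: mild pain, moderate disability.
print(pree_scores([2, 3, 1, 4, 2], [3] * 10 + [5] * 5))
```

Because the total is the average of the 2 subscores rather than of all 20 items, pain and function contribute equally even though function has three times as many items.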

The mASES comprises 2 parts, the patient self evaluation (pmASES) and the physician assessment (clinical mASES [cmASES]) (24). The first 6 questions of the pmASES enquire about the presence or absence of pain (item 1) and the severity of pain in different situations on a VAS from 0 to 10 (items 2–6). The pmASES pain score comprises the arithmetic mean of the 5 VAS items; this was transformed in the present study to 0 = maximal pain and 100 = no pain. The next 12 items concern functional ability in ADL for the left and right arms separately; the sum score (scale 0–36) was rescored in the present study to give an mASES function score ranging from 0 = maximal difficulties/unable to do to 100 = no difficulties. Item 19 concerns satisfaction with the arthroplasty and uses a VAS from 0 to 10. The final pmASES total score was given by the mean of the pain and function scores, analogous to the total pASES score for the shoulder (scaled as 0 = worst, 100 = best) (27).

In the second part of the mASES, the cmASES, the physician rates motion, stability, muscle strength, grip strength, and signs or symptoms of the left and the right elbow separately. Active motion was measured in degrees. We adapted the scoring from the original questionnaire such that a range of motion of 0° = 0 points and maximal possible range of motion = 100 points. Maximum was defined as 150° for flexion, 10° for extension (i.e., 160° for the flexion–extension arc), and 90° each for pronation and supination (i.e., 180° for the pronation–supination arc). The mean of the 6 mobility items gave the cmASES motion score (0 = no mobility, 100 = full motion). An additional 3 items rated valgus, varus, and posterolateral instability (each 0 = no instability/complete stability to 3 = severe instability/no stability); the mean of these 3 formed the cmASES stability score (transformed to 0 = severe instability, 100 = complete stability). Muscle strength was assessed in flexion, extension, pronation, and supination (0 = no contraction, 5 = full strength); the transformed mean of these 4 items gives the cmASES strength score (0 = no contraction, 100 = full power). Item 15 is grip strength in kg, which was scored from 0 = 0 kg to 100 = 60 kg for the determination of the total cmASES score. An additional 18 items concern symptoms, such as pain in association with the application of pressure, pain during different movements, impingement signs, crepitus, Tinel's sign of the ulnar nerve, and the stretch test of the cubital tunnel. The mean of these items gives the cmASES symptoms score, with 0 = maximal symptoms to 100 = no symptoms. Finally, the mean of the cmASES motion, stability, strength, grip strength, and symptoms scores gives the total cmASES score, ranging from 0 = worst to 100 = best.
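The rescaling of the cmASES stability, strength, and grip strength items, and the averaging into the total score, can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's scoring code: all function names are hypothetical, the transformations are assumed to be linear, and capping grip strength above 60 kg at 100 points is an assumption. The motion and symptoms subscores are passed in already scored, because their item-level scoring is not specified in enough detail here to reproduce.

```python
from statistics import mean


def rescale(value, worst, best):
    """Map value linearly so that `worst` gives 0 and `best` gives 100."""
    return 100 * (value - worst) / (best - worst)


def cmases_stability(items):
    """3 ratings (valgus, varus, posterolateral): 0 = stable to 3 = severe instability."""
    return rescale(mean(items), worst=3, best=0)


def cmases_strength(items):
    """4 ratings (flexion, extension, pronation, supination): 0 to 5 = full strength."""
    return rescale(mean(items), worst=0, best=5)


def cmases_grip(kg):
    """Grip strength: 0 kg = 0 points, 60 kg = 100 points (capped, by assumption)."""
    return min(rescale(kg, worst=0, best=60), 100)


def cmases_total(motion, stability, strength, grip, symptoms):
    """Unweighted mean of the 5 subscores, each already on the 0-100 scale."""
    return mean([motion, stability, strength, grip, symptoms])


# Hypothetical joint: slight instability in 2 directions, near-full strength.
stability = cmases_stability([0, 1, 1])
strength = cmases_strength([5, 5, 4, 4.5])
grip = cmases_grip(10.5)
print(round(stability, 1), strength, grip)
print(round(cmases_total(62.2, stability, strength, grip, 87.6), 1))
```

In this scheme a modest grip strength pulls the total down considerably, since grip carries the same weight as each of the other 4 subscores.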

The mASES questionnaire was cross-culturally adapted for the German language using the same process as described for the PREE (25), and showed good validity and reliability (28). Previous studies have shown that the English version of the mASES is reliable and valid and correlates highly with the DASH (Pearson's r = 0.65–0.81), the PREE (r = 0.61–0.96), and the SF-36 PCS (r = 0.33–0.63) (23, 29).

In the present study, the 2 dimensions of the DASH—symptoms and function—were analyzed separately. This is not explicitly described in the DASH manual (19), but was done here because the data of the PREE and the pmASES showed differences in these domains. The DASH symptom score was determined by the unweighted mean of items 24, 25, and 29 (which enquire about pain), 26 (tingling), 27 (weakness), and 28 (stiffness). The DASH function score was given by the unweighted mean of items 1–23 and 30.
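The split of the DASH into a symptom and a function subscore can be sketched as below. The item numbers are those given in the text; the 1–5 response coding (1 = no difficulty or symptom, 5 = maximal) and the conversion (mean − 1) × 25 follow the usual DASH scoring convention and are assumptions here, as is the final inversion to the 0 = worst to 100 = best scale used in this study. The names `SYMPTOM_ITEMS`, `FUNCTION_ITEMS`, and `dash_subscore` are hypothetical, and handling of missing items is left out.

```python
from statistics import mean

# Item numbers as listed in the text.
SYMPTOM_ITEMS = (24, 25, 26, 27, 28, 29)          # pain, tingling, weakness, stiffness
FUNCTION_ITEMS = tuple(range(1, 24)) + (30,)      # items 1-23 and 30


def dash_subscore(responses, items):
    """Unweighted mean of the chosen items, rescaled to 0 = worst, 100 = best.

    responses: dict mapping item number (1-30) to a rating from 1 to 5
    (assumed coding: 1 = no difficulty/symptom, 5 = maximal).
    """
    raw = mean([responses[i] for i in items])
    disability = (raw - 1) * 25      # conventional DASH direction: 0 = none, 100 = maximal
    return 100 - disability          # inverted so that higher means better health


# Hypothetical patient: mild difficulty overall, marked pain on items 24, 25, 29.
answers = {i: 2 for i in range(1, 31)}
answers.update({24: 4, 25: 4, 29: 4})
print(dash_subscore(answers, SYMPTOM_ITEMS))   # symptom subscore
print(dash_subscore(answers, FUNCTION_ITEMS))  # function subscore
```

With this split, 3 of the 6 symptom items concern pain, which is why the symptom subscore is dominated by the pain ratings.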

The Mayo Elbow Performance Index (MEPI) (30), also a validated and widely used instrument for the assessment of the elbow, was not included in the present instrument set for the following reasons. First, all items of the MEPI are covered by the DASH or the PREE. Second, although in the DASH and the PREE the response items are graduated scales, the MEPI has just “present/absent” options, and is therefore less able to discriminate between different grades of dysfunction. Third, a prospective, comparative study of scoring systems found that the DASH and the mASES “performed a better assessment” of pain and function than did the MEPI (29).

Anteroposterior and lateral radiographs were taken to investigate loosening or breakage of the endoprosthesis. However, the radiographic results will be discussed only briefly here, as they will be the subject of a future clinical report.


Descriptive statistics for all instrument scores (except grip strength) were given on a scale ranging from 0 = worst to 100 = best health, and these were compared with population normative data (where available) using Wilcoxon's nonparametric test. Normal distribution of the scores was examined by the Kolmogorov-Smirnov test, and floor and ceiling effects were reported. Relationships between the scores from different scales (construct validity of the instruments) were examined using Spearman's rank correlation, because most of the data were not normally distributed and nonparametric analyses were required. In contrast to Pearson's product-moment correlation, Spearman's rank correlation does not depend on parametric (statistical) distributions of the scores and gives more conservative results that are less likely to overestimate effects. Factor analysis (principal component analysis with varimax rotation) was used to identify the main domains represented by the different instruments. Logistic regression was used to examine the scores' ability to distinguish between different conditions (RA versus posttraumatic). Throughout the analysis, the examination unit was the patient for the SF-36, the DASH, the PREE, and the pmASES pain and satisfaction scores. For the other scores it was the operated joint.
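The difference between the two correlation coefficients can be made concrete with a small Python sketch in which Spearman's coefficient is computed as Pearson's coefficient applied to ranks (tied values share the mean of their ranks). The function names and the example data are hypothetical; statistical packages provide equivalent routines, and this sketch is meant only to show the principle.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores: perfectly monotonic, but with one extreme value.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(round(pearson(x, y), 2), round(spearman(x, y), 2))
```

Because only the ordering of the scores enters the calculation, a single extreme score cannot dominate Spearman's coefficient in the way it can dominate Pearson's.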


Patient demographics.

Patients who had undergone total elbow arthroplasty between 1984 and 1996 (n = 192) were identified as potential study participants. Of those, 71 (37%) had died and 11 (6%) could not be traced, even with the help of the general practitioner and the corresponding residents' registration office (e.g., they had changed address, moved abroad). Thirty-one (16%) patients declined participation in the study: 4 due to severe illness or handicap, 11 because of the long distance between home and the clinic (>1,000 km), and 12 refused participation in the study. Four patients declined to visit the clinic but agreed to fill out the mailed questionnaire set; however, they did not return it. Of the remaining 79 (41%) patients (96 arthroplasties; 17 had had a bilateral endoprosthesis), 68 (86%) (82 arthroplasty joints) were examined in the clinic between March and October 2003. During the clinic visit, the self-assessment questionnaires were completed. Eleven (14%) patients (14 arthroplasty joints) were not able to visit the clinic due to severe illness or the long distance between home and the clinic, but agreed to complete the self-evaluation questionnaires sent by post.

Table 1 shows the descriptive sociodemographic and disease-specific data. All 79 patients (as well as all nonrespondents) had received a cemented GSB III (Gschwend-Scheier-Bähler) elbow endoprosthesis (Sulzer-Zimmer, Switzerland). The 62 (78%) unilaterally and 17 (22%) bilaterally operated patients differed significantly (Wilcoxon's test P < 0.05) in sex (unilateral: 65% women, bilateral: 94% women), mean SF-36 PCS (unilateral: 39.5, bilateral: 28.2), and mean DASH score (unilateral: 59.0, bilateral: 41.3). There were no significant differences between them in age, education level, or number of comorbidities. The SF-36 mental component summary (MCS), PREE, pmASES, and cmASES scores were slightly but not significantly lower in the bilaterally operated patients. This must be borne in mind when interpreting the scores of the whole combined group of 79 patients with 96 operated joints (the latter for the pmASES function and total score and the cmASES scores, which assess the left and right limbs separately).

Table 1. Sociodemographic and disease-related data*

* Data presented as no. (%) unless otherwise noted.

Age, mean ± SD (median; range) years                      64.1 ± 13.3 (66; 24.5–92.3)
Age of the arthroplasty, mean ± SD (median; range) years  11.2 ± 3.0 (10.9; 6.5–19.0)
 Male                                                     23 (29)
 Female                                                   56 (71)
White race                                                79 (100)
 Basic school (8–9 years)                                 29 (37)
 Vocational training                                      36 (45)
 College/university                                       14 (18)
Living conditions
 Urban                                                    34 (43)
 Rural                                                    45 (57)
 Alone                                                    19 (24)
 With partner                                             60 (76)
 No                                                       71 (90)
 Yes                                                      8 (10)
Alcohol consumption
 None                                                     25 (32)
 Occasional                                               41 (52)
 Daily                                                    12 (15)
 Several times daily                                      1 (1)
Sport, hours/week
 0                                                        40 (51)
 0–<1                                                     11 (14)
 1–2                                                      14 (18)
 >2                                                       14 (18)
Comorbidities (excluding joint disease)
 None                                                     23 (29)
 1                                                        17 (22)
 2                                                        16 (20)
 3                                                        11 (14)
 4 or more                                                12 (15)
 Rheumatoid arthritis                                     59 (75)
 Posttraumatic                                            20 (25)
 Unilateral                                               62 (78)
 Bilateral                                                17 (22)
 Left                                                     32 (33)
 Right                                                    64 (67)

Administration and practicability of the assessment instruments.

All questionnaires were easily understood and completed by all the patients. On average, the sociodemographic questionnaire took 5 minutes to complete, the SCQ took 2 minutes, the SF-36 took 5 minutes, the DASH took 4 minutes, the PREE took 3 minutes, and the pmASES took 3 minutes. Thus, the whole set of self-rated questionnaires required, on average, 22 minutes to complete (timed with a stopwatch in 20% of the patients; the patients were not observed during completion of the questionnaires). With the brief introduction, distribution, and collection of the questionnaires and the check for completeness of the answers, 30–35 minutes per patient were needed. The physical examination, which included assessment and documentation of the specific items enquired about in the cmASES, took an additional 10–15 minutes.

Health and quality of life.

Table 2 and Figure 1 show the results for each instrument's score and subscores, and their comparison with normative values (where available), using a scale of 0 = worst to 100 = best health for all scales except for the cmASES grip strength (kg). The pmASES function, the pmASES total score, all cmASES scores, and the radiographs were evaluated separately for each operated limb (per definition of the instruments) giving 2 results for each of the bilateral arthroplasty patients. All other scores related to the whole patient.

Table 2. Instrument scores for patients after elbow arthroplasty (n = 79 patients, 96 joints)*

* Examination unit was the patient for the SF-36, the DASH, the PREE, the pmASES pain and satisfaction. For the other scores it was the operated joint. SF-36 = Short Form 36; PCS = physical component summary; MCS = mental component summary; DASH = Disability of the Arm, Shoulder and Hand questionnaire; PREE = Patient Related Elbow Evaluation form; pmASES = patient modified American Shoulder and Elbow Surgeons questionnaire (mASES) for the elbow; cmASES = clinical mASES.
  • Norm: German population normative values (corrected for sex, age, and comorbidity).
  • No.: Number of assessed patients or arthroplasty joints.
  • § Scales: 0 = worst health, maximal symptoms/limitation; 100 = best health, no symptoms/limitations. Exception: cmASES grip strength, kg.
  • Normally distributed (Kolmogorov-Smirnov test).

Score                       Median  Mean   SD    Norm  P       No.  Minimum  Maximum§  Floor, %  Ceiling, %
SF-36 physical functioning  44.0    48.7   28.4  69.9  <0.001  79   0.0      100.0     1         4
SF-36 role physical         25.0    45.1   44.7  66.5  <0.001  76   0.0      100.0     39        33
SF-36 bodily pain           50.0    59.1   27.5  54.6  0.103   79   0.0      100.0     6         20
SF-36 general health        56.0    56.0   25.7  56.3  0.968   78   6.0      100.0     0         6
SF-36 vitality              50.0    48.4   22.4  54.6  0.034   78   0.0      90.0      4         0
SF-36 social functioning    88.0    80.7   22.8  81.2  0.455   79   13.0     100.0     0         43
SF-36 role emotional        100.0   74.8   41.9  82.1  0.533   72   0.0      100.0     20        65
SF-36 mental health         76.0    71.4   20.6  68.4  0.072   78   0.0      100.0     1         4
SF-36 PCS                   34.6    37.
SF-36 MCS                   54.0    52.3   11.5  50.4  0.092   69   25.1     68.6      0         0
DASH symptoms               70.0    66.1   22.8  87.6  <0.001  79   12.5     100.0     0         6
DASH function               45.7    51.1   25.2  89.3  <0.001  77   4.3      100.0     0         3
DASH                        50.0    55.3   23.2  88.8  <0.001  77   15.0     100.0     0         1
PREE pain                   76.0    71.2   26.6  -     -       76   2.0      100.0     0         15
PREE function               64.6    62.4   26.2  -     -       75   1.3      100.0     0         4
pmASES pain                 80.0    69.6   27.0  -     -       77   2.0      100.0     0         14
pmASES satisfaction         90.0    81.0   26.6  -     -       71   0.0      100.0     3         43
pmASES function             53.1    57.4   25.6  -     -       90   6.1      100.0     0         4
cmASES motion               63.0    62.2   10.0  -     -       82   29.8     80.2      0         0
cmASES stability            77.8    81.2   17.0  -     -       82   33.3     100.0     0         30
cmASES strength             92.5    89.8   10.5  -     -       82   50.0     100.0     0         28
cmASES grip strength, kg    10.5    12.5   9.5   -     -       82   0.0      38.0      1         0
cmASES symptoms             92.6    87.6   15.7  -     -       82   0.0      100.0     1         24

Figure 1.

Comparison of the scores of patients after total elbow arthroplasty (n = 79 patients, 96 joints). Scaling: 0 = worst, 100 = best. Horizontal black lines = German population normative values, corrected for sex, age, and comorbidity. One color for all subscores per instrument, horizontal stripes for the function subscores, and checkered for the pain/symptoms subscores. SF-36 = Short Form 36; PCS = physical component summary; MCS = mental component summary; DASH = Disability of the Arm, Shoulder and Hand questionnaire; PREE = Patient Related Elbow Evaluation form; pmASES = patient modified American Shoulder and Elbow Surgeons questionnaire (mASES) for the elbow; cmASES = clinical mASES.

The mean pain scores were high (i.e., the patients had less pain than the norm) for the SF-36 (exceeding the norm in trend, P = 0.103), the DASH symptoms (enquires about pain in 50% of the items), the PREE, the pmASES, and the cmASES symptoms (enquires about pain in 67% of the items). The scores for the pmASES satisfaction, cmASES motion, stability, and strength were high, but it is difficult to interpret these further because no normative data are available for these instruments. All mental health scores and the general health score of the SF-36 were close to or slightly higher than the norm. In contrast, the scores for function were low, as determined by all the instruments: SF-36 physical functioning attained 70%, role physical 68%, and the DASH function 57% of the norm. In the PREE, function was the lowest subscore.
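The "percent of the norm" figures quoted above follow directly from the mean scores and normative values in Table 2. The short sketch below reproduces the arithmetic; the function name `percent_of_norm` is hypothetical, and the input values are taken from Table 2.

```python
def percent_of_norm(mean_score, norm):
    """Express a mean score as a percentage of its normative value."""
    return 100 * mean_score / norm


# (mean score, normative value) pairs as listed in Table 2.
function_scores = {
    "SF-36 physical functioning": (48.7, 69.9),
    "SF-36 role physical": (45.1, 66.5),
    "DASH function": (51.1, 89.3),
}

for name, (score, norm) in function_scores.items():
    print(f"{name}: {percent_of_norm(score, norm):.0f}% of the norm")
# SF-36 physical functioning: 70% of the norm
# SF-36 role physical: 68% of the norm
# DASH function: 57% of the norm
```

The same calculation applied to the SF-36 bodily pain score (59.1 against a norm of 54.6) gives 108%, i.e., less pain than the norm.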

Overall, 35 (44%) of the patients felt that their preoperative expectations about the arthroplasty were met completely (score = 10 on the VAS 0–10; median 9.0); only 6 (8%) patients were somewhat dissatisfied (score below 5). Sixty-five (82%) patients felt themselves better at the time of the assessment than before the arthroplasty, 4 (5%) unchanged, and 10 (13%) worse. Most of the patients (69 of 79; 87%) declared that, with their current knowledge of the outcome, they would choose total elbow joint replacement again, if they found themselves in similar circumstances to those that prevailed preoperatively.

The mean arc in flexion–extension was 107° (from mean extension –32° [deficit] to mean flexion 139°; SD = 22°), and the mean pronation–supination arc was 128° (from mean pronation 80° to mean supination 49°; SD = 32°). Radiologic examination showed loosening of the humeral component in 6 (6%) of the 96 operated joints, of the ulnar component in 1 (1%), and of both components in 3 (3%).

Measurement properties, construct validity, and concurrent validity of the instruments.

There were almost no floor effects for the DASH, the PREE, the pmASES, or the cmASES. The PREE pain, pmASES pain, cmASES stability, strength, and symptoms showed moderate to high ceiling effects. A normal distribution of the scores was observed for both of the subscores of the DASH, the pmASES function, the cmASES motion, and the total cmASES score.
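Floor and ceiling effects of this kind can be computed as the percentage of respondents at the lowest and the highest possible score, respectively. The sketch below assumes this common definition and the 0–100 scaling used throughout; the function name `floor_ceiling` and the example scores are hypothetical.

```python
def floor_ceiling(scores, lowest=0, highest=100):
    """Return (floor %, ceiling %): share of scores at the scale's extremes."""
    n = len(scores)
    if n == 0:
        raise ValueError("at least one score is required")
    floor_pct = 100 * sum(1 for s in scores if s == lowest) / n
    ceiling_pct = 100 * sum(1 for s in scores if s == highest) / n
    return floor_pct, ceiling_pct


# Hypothetical stability scores for 10 joints: 3 of them at the maximum.
example = [33.3, 55.6, 66.7, 77.8, 77.8, 88.9, 88.9, 100, 100, 100]
print(floor_ceiling(example))  # (0.0, 30.0)
```

A high ceiling percentage means that many patients share the best possible score and can no longer be distinguished from one another by that scale.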

Spearman's rank correlation coefficients shown in Table 3 indicate the degree of agreement between the instruments in measuring a given symptom or functional ability. The highest correlations were observed between the condition-specific instruments (PREE with pmASES: 0.92); moderate correlations were recorded between these and the SF-36 scales. The DASH correlated slightly better with the SF-36 PCS (r = 0.76) than with the condition-specific PREE (r = 0.68), the pmASES (r = 0.73), and the cmASES (r = 0.44). This finding was supported by the factor analysis as illustrated in Table 4.

Table 3. Spearman's rank correlation coefficients between instruments*

* For abbreviations, see Table 2.
  • P ≥ 0.05.
  • P < 0.001.

SF-36 PCS
SF-36 MCS    −0.10

Table 4. Factor loads of the instruments' main scores*

* A factor load of 0.0 indicates no agreement; a load of 1.0 indicates perfect agreement of the scale with the factor. QOL = quality of life. For additional abbreviations, see Table 2.

                       Factor 1             Factor 2           Factor 3
                       Physical unspecific  Physical specific  Mental QOL
Explained variance, %  60.7                 17.3               11.3
SF-36 PCS              0.93                 0.22               −0.05
SF-36 MCS              −

Factor analysis identified 3 main constructs, listed in Table 4, which explained 89.2% of the variance of the instruments' main scores: the factor “physical unspecific” was mainly composed of the SF-36 PCS and the DASH; the “physical specific” was composed of the PREE, the pmASES, and the cmASES; and the “mental QOL” was composed of the SF-36 MCS. Briefly, the factor loading of a score can vary between 0, which means no influence on and no correlation with the factor, and 1, which means that the score is perfectly correlated with the factor, i.e., the factor perfectly represents the score.

In the logistic regression analysis, carried out to examine which factors best distinguished between the 59 RA and the 20 posttraumatic patients, the DASH (P = 0.044) was a significant predictor and the SF-36 PCS (P = 0.077) was a predictor by trend (data not shown in detail); the SF-36 MCS, the PREE, and the pmASES were weaker, nonsignificant predictors. There were no differences in sex, age, or comorbidity between the RA and posttraumatic patients. The assessment tools were able to distinguish between the 2 conditions with a high sensitivity and specificity, as shown by an area under the receiver operating characteristic curve of 0.86 after correction for sex, age, and comorbidity.


Perfect health, reflected by a score of 100, cannot be realistically expected for a 64-year-old person with RA (75% of our patients had RA) and with a high prevalence of comorbid conditions (71% of the patients had ≥1 comorbid condition) 6–19 years after total elbow arthroplasty. This is discussed in detail in our previous study, with respect to total shoulder arthroplasty patients (12).

As observed in the shoulder patients, some functional limitation was also present in the elbow patients, evidenced by the low scores on the function scales and subscales (see Table 2 and Figure 1). However, this did not substantially affect overall health perception and QOL: the elbow patients rated their general health perception (SF-36) and mental health (5 dimensions of the SF-36) as high, and within the range of normal population values. To adequately perform ADL, certain functional abilities are required. Elbow flexion was 120° or more in 92 of the 96 (96%) arthroplasty joints, a necessary condition to reach the mouth using the hand. The elbow patients were highly satisfied with the result of their arthroplasty (mean pmASES satisfaction score 81.0; 92% with VAS score between 6 and 10 where 0 = worst and 10 = best), and 87% would choose the operation again if necessary. Because pain is the most important factor affecting health perception and QOL (31), this result is not surprising; the patients in the present study reported low pain levels in the pmASES, the PREE, the DASH symptoms score (in which 50% of the items enquire about pain) (75% of the norm), and the SF-36 (108% of the norm). The apparent discrepancy that 18% of the patients felt unchanged or worse postoperatively, but only 13% would not choose the operation again, most likely reflects a certain proportion of patients who expected that they would have worsened had they not undergone arthroplasty. Patients with bilateral total elbow arthroplasty had poorer health than those with unilateral arthroplasty. Thus, the relative proportion of bilateral and unilateral arthroplasties in any given cohort is important to know because it can confound the overall result for the given patient sample.

Compared with our shoulder arthroplasty patients (mean age 65.1 years, 77% women, 77% with ≥1 comorbidity, 49% with RA) (12), the elbow arthroplasty patients (mean age 64.1 years, 71% women, 71% with ≥1 comorbidity, 75% with RA) were more functionally impaired. They displayed lower scores for SF-36 physical functioning (elbow: 48.7 versus shoulder: 54.9), DASH (55.3 versus 64.0), and SF-36 role emotional (74.8 versus 80.5), and slightly lower scores for SF-36 vitality (48.4 versus 52.9). The elbow patients showed better health in the SF-36 bodily pain (elbow: 59.1 versus shoulder: 55.4) and in the SF-36 general health (56.0 versus 53.1), such that overall, the physical and mental summary scores were not different between the 2 groups.

In relation to range of motion and patient satisfaction, the results for our elbow arthroplasty patients were comparable with those reported in other studies (10, 32, 33). In a cross-sectional 50-month followup study of 18 RA patients (21 joints with Coonrad-Morrey endoprostheses), Hildebrand et al showed a mean DASH score of 53 (our patients 55.3), an SF-36 PCS of 27 (ours 37.2), and an SF-36 MCS of 56 (ours 52.3) (10). In the same study, a group of patients with posttraumatic destruction of the elbow (18 patients and elbows) showed SF-36 PCS and the DASH scores that were about 20% higher than those of the RA patients. Consistent with these findings, Garcia et al reported a high mean DASH score of 77 in 16 posttraumatic patients, 3–5.5 years after Coonrad-Morrey arthroplasty; 15 of the 16 patients were satisfied with the result (33).

The patient's questionnaire set (containing the sociodemographic questionnaire, the SCQ, the SF-36, the DASH, the PREE, and the pmASES) needed on average no more than 30 minutes to complete. The evaluation of the clinical parameters was part of the physical examination, which lasted 10–15 minutes, including 3 minutes to complete the cmASES.

Normal distribution and low floor and ceiling effects are positive properties of a scale, as discussed in detail in our previous article (12). Normally distributed scores allow the use of sensitive parametric significance tests. Low floor and ceiling effects make it possible to differentiate between patients by the score. In contrast, 2 patients with a score of 100 cannot be differentiated by this score, although their outcome may be different. According to these measurement properties, the SF-36 physical functioning, general health, mental health, and both summary scores; all DASH subscores; the PREE pain and function scores; the pmASES function and total score; and the cmASES motion and total scores best reflected the outcome of the present patient sample.

Surprisingly, and in contrast to the shoulder arthroplasty study (12), in factor analysis the DASH (total score) loaded on the same factor (Table 4) and correlated (Table 3) slightly better (r = 0.76) with the generic SF-36 PCS than with the elbow-specific PREE (r = 0.68) and pmASES (r = 0.73). Because the DASH is considered to be condition specific, a higher correlation with the other condition-specific instruments than with the generic SF-36 would have been expected. The 24 function items of the DASH thus seemed not to be particularly sensitive to elbow-specific disabilities. However, the DASH symptom items may be more sensitive to elbow problems than the SF-36, as the DASH score was far lower in relation to the normative value than was the SF-36 (e.g., bodily pain). According to this analysis, all the information yielded by the DASH is covered by the SF-36 and the condition-specific PREE or pmASES. The SF-36 is unique in capturing mental health and psychosocial dimensions. Also in contrast to the shoulder study findings, the clinical assessment (cmASES) was highly correlated with the condition-specific self rating of the pmASES and the PREE. The physician thus seems to assess elbow-specific (but not shoulder-specific) function and symptoms in a similar manner to the patient. However, it is well known that clinical measures of elbow function do not necessarily reflect patient wellbeing, performance levels in ADL and QOL, and vice versa (see reference 29). Thus, clinical and self-rating assessments should both be used simultaneously.

To shorten the set to be used in future clinical practice, just one of either the PREE or the pmASES is recommended for inclusion, since the correlation (r = 0.92) and factor analysis indicated similar and valid constructs for these instruments. This is supported by a construct validity study of the English versions of the PREE and pmASES: the PREE had a Pearson's correlation coefficient of 0.93–0.96 with pmASES pain and 0.61–0.73 with pmASES function (23).
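For readers who wish to reproduce this kind of comparison between two instruments, Pearson's correlation coefficient can be computed as in the sketch below. The function name and the paired scores are hypothetical illustrations, not data from this study or from reference 23.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical paired total scores of two questionnaires in the same patients.
questionnaire_a = [82, 45, 67, 90, 30, 58]
questionnaire_b = [78, 50, 70, 88, 35, 52]
print(f"r = {pearson_r(questionnaire_a, questionnaire_b):.2f}")
```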

The ability to discriminate between 2 conditions of expected differing severity (in this case, posttraumatic versus RA) is another positive property of an instrument. An instrument that does this well helps to characterize 2 different diseases by the levels of the scores. In the logistic regression analysis, the SF-36 PCS and the DASH were the best instruments for discriminating between the 2 conditions examined. Overall, the set containing all instruments showed high sensitivity and specificity in distinguishing RA from the posttraumatic condition, as shown by the high area under the receiver operating characteristic curve of 0.86 (an area of 1.00 would indicate perfect discrimination with 100% sensitivity and 100% specificity) (34).
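The area under the receiver operating characteristic curve can be read as the probability that a randomly chosen patient from one group receives a higher predicted value than a randomly chosen patient from the other group, with ties counted as one half. The sketch below computes the area in this pairwise manner; the function name and the predicted probabilities are hypothetical and do not reproduce the logistic regression model of the study.

```python
def roc_auc(scores_positive, scores_negative):
    """Area under the ROC curve from pairwise comparisons (ties count one half)."""
    if not scores_positive or not scores_negative:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0   # correctly ordered pair
            elif p == n:
                wins += 0.5   # tied pair
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical predicted probabilities of RA from a fitted model, not study data.
ra_patients = [0.9, 0.8, 0.6, 0.4]
posttraumatic_patients = [0.7, 0.3, 0.2]
print(f"AUC = {roc_auc(ra_patients, posttraumatic_patients):.2f}")
```

An area of 0.50 corresponds to discrimination no better than chance, and 1.00 to perfect separation of the 2 groups.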

It is important to correct the normative values of an instrument for the presence and absence of comorbid conditions, as well as for sex and age (12), if they are to be used for comparative purposes in clinical studies. In the present study, this was possible for both the SF-36 and the DASH, but not for the other instruments.

The strengths of the present study include the comprehensive nature of the assessment (which covered general health and QOL, subjective patient self ratings, and objective clinical findings) and the use of normative data corrected for age, sex, and comorbidity in interpreting the patients' scores. The limitations included the study's cross-sectional uncontrolled design, the inherent problems of self assessment (requiring a certain level of psychointellectual ability and compliance) and the low response rate (79 of 192 possible patients, 41%) due to the relatively long followup time leading to a high rate of deaths (37%). Furthermore, the health status of the nonresponders may have been worse than that of the patients who took part, leading to an overestimation of the outcome. However, this is an inherent problem of any such retrospective study (participation bias, a form of selection bias).

The outcome data are presently only applicable to severely affected RA and posttraumatic patients needing arthroplasty. However, the instruments' psychometric properties can be expected to be consistent in different patient settings, as almost all score levels were represented by the patients examined. Therefore, we expect that the set can be successfully used in other clinical settings, e.g., primary care, although this would of course require further verification.

Using an extensive, comprehensive set of instruments, we showed that in patients who had undergone total elbow arthroplasty 6–19 years previously, general wellbeing, QOL, and satisfaction with treatment were good on average. Some specific functions remained significantly impaired, but appeared not to play a decisive role in the performance of tasks of daily living and perception of QOL in general. This is important, because it suggests that a study relying only on functional measures would overlook the high self-perceived QOL and satisfaction of the patients, which may be decisive in determining future utilization of health care resources.

According to the present cross-sectional analysis, an instrument set consisting of the SF-36 combined with the patient and clinical mASES (or the PREE, if a shorter set is required), together with sociodemographic and comorbidity assessments, is proposed for future clinical use. The DASH is not necessary because it is no more specific than the SF-36 and can be replaced by the PREE or the patient mASES, which has the added advantage of assessing the left and the right limbs separately. However, before the instrument set can be implemented in clinical practice, the responsiveness of the various questionnaires should be examined. This will require a longitudinal study to examine the sensitivity to change of the instruments after specific interventions. In accordance with the World Health Organization's International Classification of Functioning, Disability and Health concept, the final set of instruments should allow a valid, sensitive, patient-orientated, and clinically relevant assessment, within the normal clinical routine (31, 35). It should provide the opportunity to compare results among different conditions, diseases, and interventions, and with those of the general population.


We thank Roberta Schefer and Susann Drerup for the management of the patients, the questionnaires, and the database.


Table  . Overview of the Subscores and Scaling for the Instruments Used

| Instrument | Subscores | No. of items | Original scoring | Scoring in the study |
|---|---|---|---|---|
| Short Form 36 (SF-36) | Physical functioning | 10 | 0 = worst, 100 = best | 0 = worst, 100 = best |
| | Role physical | 4 | 0 = worst, 100 = best | 0 = worst, 100 = best |
| | Bodily pain | 2 | 0 = worst, 100 = best | 0 = worst, 100 = best |
| | General health | 6 | 0 = worst, 100 = best | 0 = worst, 100 = best |
| | Vitality | 4 | 0 = worst, 100 = best | 0 = worst, 100 = best |
| | Social functioning | 2 | 0 = worst, 100 = best | 0 = worst, 100 = best |
| | Role emotional | 3 | 0 = worst, 100 = best | 0 = worst, 100 = best |
| | Mental health | 5 | 0 = worst, 100 = best | 0 = worst, 100 = best |
| | Physical component summary | 22 | 0 = worst, 100 = best | 0 = worst, 100 = best |
| | Mental component summary | 14 | 0 = worst, 100 = best | 0 = worst, 100 = best |
| Disability of the Arm, Shoulder and Hand questionnaire (DASH) | Symptoms | 6 | 100 = worst, 0 = best | 0 = worst, 100 = best |
| | Function | 24 | 100 = worst, 0 = best | 0 = worst, 100 = best |
| | DASH (total score) | 30 | 100 = worst, 0 = best | 0 = worst, 100 = best |
| Patient Related Elbow Evaluation form (PREE) | Pain | 5 | 50 = worst, 0 = best | 0 = worst, 100 = best |
| | Function | 15 | 100 = worst, 0 = best | 0 = worst, 100 = best |
| | PREE (total score) | 20 | 100 = worst, 0 = best | 0 = worst, 100 = best |
| Modified American Shoulder and Elbow Surgeons questionnaire for the elbow: patient part (pmASES) | Pain | 5 | 50 = worst, 0 = best | 0 = worst, 100 = best |
| | Satisfaction | 1 | 0 = worst, 10 = best | 0 = worst, 100 = best |
| | Function | 2 × 12 | 0 = worst, 36 = best | 0 = worst, 100 = best |
| | pmASES (total score) | 2 × 18 | 0 = worst, 100 = best | 0 = worst, 100 = best |
| Modified American Shoulder and Elbow Surgeons questionnaire for the elbow: clinical part (cmASES) | Motion | 2 × 6 | Not described | 0 = worst, 100 = best |
| | Stability | 2 × 3 | | 0 = worst, 100 = best |
| | Strength | 2 × 4 | | 0 = worst, 100 = best |
| | Grip strength | 2 × 1 | | kg, 60 kg = 100 points |
| | Symptoms | 2 × 18 | | 0 = worst, 100 = best |
| | cmASES (total score) | 2 × 32 | | 0 = worst, 100 = best |
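The rescaling summarized in the table above can be sketched as a linear transformation that maps the worst original score to 0 and the best to 100. The table gives only the end points of each scale, so the linearity of the transformation, the capping of grip strength at 100 points, and the function names below are assumptions made for illustration.

```python
def rescale_to_study_score(raw, worst, best):
    """Linearly map a raw score so that `worst` becomes 0 and `best` becomes 100."""
    if worst == best:
        raise ValueError("worst and best must differ")
    return 100.0 * (raw - worst) / (best - worst)


def grip_strength_points(kg, full_score_kg=60.0):
    """Convert grip strength in kg to points, with 60 kg = 100 points.

    Assumption: the conversion is linear and values above 60 kg are capped at 100.
    """
    return min(100.0, 100.0 * kg / full_score_kg)


# Hypothetical raw scores, not study data.
print(rescale_to_study_score(25, worst=100, best=0))  # DASH, where 100 = worst
print(rescale_to_study_score(10, worst=50, best=0))   # PREE pain, where 50 = worst
print(rescale_to_study_score(27, worst=0, best=36))   # pmASES function, where 36 = best
print(grip_strength_points(30))                       # grip strength of 30 kg
```

With all subscores expressed on the same 0 = worst, 100 = best scale, scores of different instruments can be compared directly with each other and with normative values.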