Predicting poor physical performance after total knee arthroplasty

  • Conflicts of interest: none to declare.

Abstract

The purpose of this study was to develop a preliminary decision algorithm predicting functional performance outcomes to aid in the decision of when to undergo total knee arthroplasty (TKA). One hundred and nineteen patients undergoing primary unilateral TKA were evaluated before and 6 months after TKA. A regression tree analysis using a recursive partitioning function was performed with the Timed Up and Go (TUG) time, Six-Minute Walk (6MW) distance, and Stair Climbing Test (SCT) time as measured 6 months after TKA as the primary outcomes. Preoperative measures of functional performance, joint performance, anthropometrics, demographics, and self-reported status were evaluated as predictors of the primary outcomes 6 months after surgery. Individuals taking ≥10.1 s on the TUG and aged 72 years or older before surgery had the poorest performance on the TUG 6 months after surgery. Individuals walking <314 meters on the 6MW before surgery had the poorest performance on the 6MW test 6 months after surgery. Individuals taking ≥17 s to complete the SCT and scoring <40 on the SF-36 mental component score before surgery had the poorest performance on the SCT 6 months after surgery. Poorer preoperative performance on the 6MW, SCT, and TUG was related to poorer performance on the same measure after TKA. Age and decreased mental health were secondary predictors of poorer performance at 6 months on the TUG and SCT, respectively. These measures may help further develop models predicting thresholds for poor outcomes after TKA. © 2012 Orthopaedic Research Society. Published by Wiley Periodicals, Inc. J Orthop Res 30:1805–1810, 2012

Over 647,000 total knee arthroplasties (TKAs) are performed each year in the United States with a future projection of more than 3.48 million TKAs annually by the year 2030.1, 2 Surgical indications for TKA with consensus among orthopedists are: pain not responsive to drug therapy, persistent weight bearing pain, pain at night, severe pain daily, resting pain several days per week, and substantial narrowing of the joint space on radiograph.3–5 Therefore, after radiographic findings of osteoarthritis (OA) have been confirmed, the decision of when to perform TKA is largely reliant on subjective complaints of pain. Once conservative options for management of knee OA are exhausted, and TKA is indicated, patients are often counseled to delay knee replacement until their pain is no longer bearable.6 This recommendation is based on decreasing the need for subsequent revision surgeries as prosthesis life is limited, in addition to potential healthcare cost savings through patient attrition.6 With this advice, some patients modify their activity levels to manage pain, but in the process, lose muscle mass, strength, and functional ability secondary to disuse and the degenerative nature of knee OA.7, 8 Thus, some patients may tolerate their pain and delay joint replacement, but in so doing, their functional status deteriorates. This poses a problem for these patients because functional performance prior to surgery is predictive of functional performance after surgery; therefore, delaying surgery may predispose them to a poorer functional outcome following TKA.6, 9–15 Some patients may be waiting too long to make the decision to undergo TKA as their functional performance deteriorates. However, no quantifiable measures are available to the patient or surgeon to indicate that a patient is at risk for a poorer functional outcome. 
The inability to walk >3 blocks and any difficulty climbing stairs may be functional limitations warranting TKA, in addition to pain and radiographic findings, but these functional guidelines are based only on expert opinion.3 Therefore, we developed a preliminary decision algorithm predicting functional performance at 6 months after surgery using preoperative functional performance measures, joint performance measures, anthropometric measures, demographic measures, and self-report measures. This algorithm utilizes simple, quantifiable clinical measures that could be used to guide patients and surgeons in the process of deciding when TKA should be performed.

METHODS

Study Design

This was a secondary analysis of data from 119 patients who were recruited as part of a clinical trial (Clinical Trial Registry: NCT00224913). Two time points from among the six time points collected for the study were used: 2 weeks before and 6 months after surgery. The 6-month time point was chosen because patients recovering from TKA typically plateau in strength and functional gains by 6 months.12, 16–18 In fact, 1 year after TKA, patients begin to decline in function, and thus our goal was to capture patients at the peak of their functional recovery.19

Setting and Subjects

All of the patients from the clinical trial who were evaluated at the preoperative and 6-month time points were utilized. The preoperative time point was added midway through the clinical trial, so not all participants from the original clinical trial were utilized for this analysis. Patients were considered eligible for the clinical trial if they had Kellgren–Lawrence grade 3–4 OA in at least one tibiofemoral compartment and had been scheduled for TKA by one of three local orthopedic surgeons. Potential subjects were excluded if maximal pain in the non-operative knee was >4 out of 10 or if they had a diagnosis of symptomatic arthritis in any other lower extremity joint, cardiovascular impairments, neurological impairments, uncontrolled hypertension, uncontrolled diabetes, a body mass index (BMI) >40 kg/m2, or lived >20 miles from the designated outpatient physical therapy clinic. All subjects underwent unilateral TKA in the same hospital with posterior cruciate ligament-sacrificing condylar implants with patellar resurfacing. Baseline characteristics of the patient population are detailed in Table 1. Following surgery, patients participated in 3–5 days of inpatient postoperative care, followed by homecare physical therapy for 2–3 weeks. Patients began an outpatient physical therapy program 3–4 weeks after TKA at one designated outpatient physical therapy clinic that consisted of progressive strengthening exercises, functional retraining, manual therapy to improve range of motion (ROM), and appropriate modalities to reduce pain and inflammation. Subjects participated in the intervention 2–3 times/week for 6–8 weeks, with an average of 17 total treatment sessions. Details of the treatment protocol were described elsewhere.20

Table 1. Baseline Regressors of Patients before Surgery

Variable                                      Mean ± SD
Sex (% male)                                  45.4
Age (years)                                   64.8 ± 9.2
Body mass index (kg/m2)                       30.4 ± 4.4
Six-Minute Walk (m)                           467 ± 119
Stair Climbing Test (s)                       19.9 ± 9.0
Timed Up and Go (s)                           10.0 ± 2.7
Knee flexion (°)                              118.0 ± 13.9
Knee extension (°)                            4.0 ± 5.0
Quadriceps strength, surgical (Nm2/kg)        19.3 ± 7.4
Quadriceps strength, non-surgical (Nm2/kg)    23.6 ± 9.1
KOS-ADLS (%)                                  50.7 ± 16.7
SF-36 MCS                                     55.9 ± 8.9
SF-36 PCS                                     31.9 ± 8.1
Global rating of knee function                54.1 ± 20.9

KOS-ADLS, Knee Outcome Survey-Activities of Daily Living Scale; SF-36 MCS, Short Form-36 Mental Component Score; SF-36 PCS, Short Form-36 Physical Component Score.

Performance-Based Testing Procedures

For this analysis, three performance-based measures were selected for possible inclusion in the model based on clinical importance. Functional performance-based measures included the Timed Up and Go (TUG), Six-Minute Walk (6MW), and the Stair Climbing Test (SCT). The TUG is a measure of the time it takes a subject to rise from a chair, walk 3 m, turn around, and return to a seated position in the chair.21 The patient began seated with his or her feet on the floor and began the test upon the investigator's command. Subjects were permitted to use the arms of the chair for support during rising and sitting if needed. The average of two trials was taken, and the time was recorded to the nearest 0.01 s. The 6MW is a measure of the distance a person can walk during 6 min.22 The subject is asked to walk as quickly and safely as possible; assistive devices can be used if needed. Only one trial of the 6MW was performed per testing session. The TUG and 6MW have excellent test–retest reliability and are commonly used to measure functional ability in older adults.22 The SCT evaluates an individual's ability to ascend and descend a set of 12 steps. The steps were 18 cm high with a depth of 28 cm. Subjects began at the floor below the first step and were asked to ascend, turn around, and then descend the steps as quickly as possible in a safe manner. Bilateral handrails were available for use if needed, and the subjects began on the investigator's command. The measure used was the average of two trials recorded on a stopwatch to the nearest 0.01 s. A similar SCT was shown to have excellent test–retest reliability after TKA.18, 20, 23

Self-Reported Function Questionnaires

Three self-reported function questionnaires were selected for possible inclusion in our model based on clinical importance: the Knee Outcome Survey-Activities of Daily Living Scale (KOS-ADLS), Short Form-36 (SF-36), and a global rating of knee function score. The KOS-ADLS is a set of 14 questions that assess a patient's functional ability and symptoms.24 Scores are percentage-based with 100 being the optimal response to all questions. The KOS-ADLS has excellent test–retest reliability and is sensitive to change in patients with knee pathology.24, 25 The SF-36 is a standardized, generic, self-report health questionnaire in which a score of 50 represents the average score for the U.S. population. Two subscales of the SF-36 were calculated: the physical component score (PCS) and mental component score (MCS). The PCS assesses an individual's physical function, and the MCS assesses an individual's mental health. This easily administered test has been used to examine perception of function in patients with knee pathology.16, 26, 27 The global rating of knee function score is a single number given in response to the question “How would you rate your overall functional ability from 0 to 100, where 0 represents completely disabled and 100 represents normal function?” This answer correlates with the KOS-ADLS and Lysholm Knee Score.24

Clinical and Anthropometric Measures

Knee flexion and extension ROM, quadriceps strength, BMI, age, and gender were also recorded. Active knee flexion and extension ROM were measured with the subject in the supine position. To measure knee extension, a pad was placed under the patient's heel to permit full extension. During knee flexion, patients were asked to pull their heel towards their buttocks. Knee ROM was measured with a standard long-arm goniometer with the axis of the goniometer placed over the lateral epicondyle of the femur, the proximal arm aligned with the greater trochanter of the femur and the distal arm aligned with the lateral malleolus of the ankle. Quadriceps strength was operationally defined as the peak isometric force produced during a volitional contraction on a Kin–Com dynamometer. The knee was positioned at 75° of flexion, and patients were asked to extend their knee (“kick as hard as possible”) for 3 s. The peak of three trials was recorded, and the force was normalized to BMI. The quadriceps strengths of both the surgical and non-surgical limbs were measured. Body weight and height were measured on the same clinical scale to determine BMI at baseline and follow-up.
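As a check on units, the normalization described above can be written out. This is a sketch under the assumption that peak force was recorded in newtons and BMI in kg/m2; the symbol F_peak is introduced here for illustration and does not appear in the original text.

```latex
\text{normalized strength} = \frac{F_{\text{peak}}}{\text{BMI}},
\qquad
\frac{[\mathrm{N}]}{[\mathrm{kg/m^{2}}]} = \mathrm{N\,m^{2}/kg}
```

Under that assumption, dividing a force by BMI yields the Nm2/kg unit reported for quadriceps strength in Table 1.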

Statistical Methods

We developed a decision algorithm using Classification and Regression Trees (CART), described by Breiman et al.28 and implemented by a recursive partitioning function in R29 with TUG time, 6MW distance, and SCT time 6 months after TKA as the primary outcomes. Preoperative TUG time, 6MW distance, SCT time, sex, age, BMI, knee flexion, knee extension, quadriceps strength in the surgical and non-surgical limbs, KOS-ADLS score, SF-36 MCS and PCS scores, and the global rating of knee function score were evaluated as predictors of the primary outcomes 6 months after surgery. The routine begins by splitting the sample space into two regions and then models the response variable, Y, in each region. The variable and split point are selected to achieve the best fit. One or both regions are then split into two more regions, and this process continues until a stopping rule is applied. Observations for which the predictor is missing at a given split are passed left or right using a surrogate split based on a non-missing feature that correlates with the predictor. The splitting method we specified was ANOVA. A node was eligible for splitting only if it contained ≥20 cases, and splitting stopped when no candidate split improved R2 by at least 5%. We estimated 95% confidence intervals for the mean of the outcome measure at all nodes.
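The split search described above can be sketched in a few lines. This is a minimal illustration in Python, not the R routine used in the study: the function names `sse` and `best_split`, the parameters `min_node`, `min_gain`, and `total_sse`, and the toy data are all assumptions made for the example. It handles a single predictor and omits surrogate splits.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (the ANOVA splitting criterion)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(
    x: List[float],
    y: List[float],
    min_node: int = 20,      # assumed reading of the ">=20 cases in a node" rule
    min_gain: float = 0.05,  # assumed reading of the "5% improvement in R2" rule
    total_sse: Optional[float] = None,
) -> Optional[Tuple[float, float]]:
    """Return (cut point, R2 gain) for the best binary split on x, or None.

    R2 gain is the reduction in squared error divided by the squared error
    of the whole sample (total_sse); it defaults to this node's own error.
    """
    if len(y) < min_node:
        return None  # node too small to be split
    parent = sse(y)
    root = parent if total_sse is None else total_sse
    if root == 0:
        return None  # outcome is constant, nothing to explain
    pairs = sorted(zip(x, y))
    best: Optional[Tuple[float, float]] = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot cut between identical predictor values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        gain = (parent - sse(left) - sse(right)) / root
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or gain > best[1]:
            best = (cut, gain)
    if best is None or best[1] < min_gain:
        return None  # no split improves the fit enough
    return best


# Hypothetical toy data: a preoperative time (s) and a 6-month time (s).
preop = [6.0, 7.0, 8.0, 9.0, 11.0, 12.0, 13.0, 14.0]
postop = [5.0, 6.0, 5.0, 6.0, 10.0, 11.0, 10.0, 11.0]
print(best_split(preop, postop, min_node=4))
```

A full tree would apply `best_split` to every candidate predictor at each node, keep the predictor with the largest gain, and recurse on the two resulting groups until the function returns `None`.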

RESULTS

Mean ± SD performance on the TUG for patients (N = 118) at 6 months was 7.9 ± 1.8 s. The best predictor of TUG performance at 6 months was TUG at baseline. Individuals taking ≥10.1 s on the TUG and aged ≥72 years before surgery demonstrated the poorest performance on the TUG 6 months after surgery (Fig. 1). The mean performance of this group at 6 months was 10.8 s (95% CI: 10.0, 11.6). Individuals faster than 7.6 s on the TUG before surgery had the best outcome on the TUG 6 months after surgery. The mean performance of this group at 6 months was 5.8 s (95% CI: 5.4, 6.2).

Figure 1.

At each node, the mean TUG time at 6 months is listed as well as the number of patients at that split and the 95% confidence interval for the mean. Squares indicate terminal nodes. The cut point for each split is displayed on the vertical line stemming from the split. N, number of patients at each split; Pre-op, preoperative; TUG, Timed Up and Go test.

Mean performance on the 6MW for patients (N = 102) at 6 months was 543 ± 118 m. The best predictor of 6MW performance at 6 months was the 6MW at baseline. Individuals walking <314 m before surgery had the poorest outcome on this test 6 months after surgery (Fig. 2). The mean performance of this group at 6 months was 363 m (95% CI: 297, 428). Individuals walking ≥668 m before surgery had the best outcome 6 months after surgery. The mean performance of this group at 6 months was 785 m (95% CI: 723, 848).

Figure 2.

At each node, the mean 6MW distance at 6 months is listed as well as the number of patients at that split and the 95% confidence interval for the mean. Squares indicate terminal nodes. The cut point for each split is displayed on the vertical line stemming from the split. N, number of patients at each split; Pre-op, preoperative; 6MW, six-minute walk test; NMVC Non, normalized maximum voluntary contraction of quadriceps in non-surgical limb.

Mean performance on the SCT for patients (N = 119) at 6 months was 12.9 ± 5.0 s. The best predictor of SCT performance at 6 months was SCT at baseline. Individuals taking ≥17 s to complete the SCT and who scored <40 on the SF-36 MCS before surgery demonstrated the poorest stair climbing performance 6 months after surgery (Fig. 3). The mean performance of this group at 6 months was 22.4 s (95% CI: 12.7, 32.1). Individuals performing faster than 17 s on the SCT before surgery had the fastest outcomes on this test at 6 months. The mean performance of this group at 6 months was 10.3 s (95% CI: 9.5, 11.1).

Figure 3.

At each node, the mean SCT time at 6 months is listed as well as the number of patients at that split and the 95% confidence interval for the mean. Squares indicate terminal nodes. The cut point for each split is displayed on the vertical line stemming from the split. N, number of patients at each split; TUG, timed up-and-go; SCT, stair climbing test; SF-36 MCS, Short Form-36 Mental Component Score.

DISCUSSION

The purpose of this study was to develop a preliminary decision algorithm predicting functional performance 6 months after TKA using preoperative functional measures, joint performance measures, anthropometric measures, demographic measures, and self-report measures. The salient feature of the resulting decision algorithm tree is a sequence of simple yes/no questions that, when answered, will allow the clinician to predict a new patient's 6-month post-surgery performance with a known level of certainty. Once validated in a larger sample of a representative population, this algorithm could be utilized with the patient's reported pain level to help guide the patient and surgeon in deciding whether and when to perform TKA. A health care provider could compare the patient's age, SF-36 MCS score, and functional performance on the TUG, SCT, and 6MW to thresholds indicated by the model and alert the patient of increasing risk for a poorer outcome. The patient and surgeon could then factor this result into their decision process of whether or not TKA is indicated. Patients who are at risk of a poorer outcome could be counseled to undergo surgery as soon as possible to prevent further decline.
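The yes/no structure described above can be illustrated with a short sketch. It encodes only the thresholds reported for the poorest-performing group on each test, not the full trees, and it inherits the preliminary, unvalidated status of those cut points. The class `PreopMeasures` and the function `poorest_outcome_flags` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreopMeasures:
    """Preoperative values for one patient (hypothetical container)."""
    tug_s: float      # Timed Up and Go time, seconds
    six_mw_m: float   # Six-Minute Walk distance, meters
    sct_s: float      # Stair Climbing Test time, seconds
    age_years: float
    sf36_mcs: float   # SF-36 mental component score


def poorest_outcome_flags(p: PreopMeasures) -> List[str]:
    """List the tests on which the patient falls in the poorest-outcome group.

    Thresholds are the preliminary cut points reported in this study.
    """
    flags: List[str] = []
    if p.tug_s >= 10.1 and p.age_years >= 72:
        flags.append("TUG")
    if p.six_mw_m < 314:
        flags.append("6MW")
    if p.sct_s >= 17 and p.sf36_mcs < 40:
        flags.append("SCT")
    return flags


# Hypothetical patient, not drawn from the study data.
patient = PreopMeasures(tug_s=11.0, six_mw_m=300.0, sct_s=18.0,
                        age_years=75, sf36_mcs=38.0)
print(poorest_outcome_flags(patient))
```

An empty list means only that the patient does not fall into any of the three poorest-outcome groups; the remaining branches of each tree are not represented here.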

Performance on the SCT, TUG, and 6MW at 6 months after surgery was chosen because patients recovering from TKA are known to have functional limitations on these tasks both before and after surgery compared to healthy adults.18, 30–32 Prior to surgery, patients awaiting TKA have demonstrated 75% slower TUG times, 160% longer SCT times, and have walked 31% less distance on the 6MW test compared to healthy, age-matched adults.18 These deficits persist following surgery, with patients demonstrating 63% longer TUG times, 105% longer SCT times, and 28% less distance walked on the 6MW test.18 Results from this study were consistent with past research demonstrating that performance on these measures after surgery is strongly correlated with performance on these measures before surgery, in addition to age and mental health status. This model, while preliminary, presents a potentially useful screening tool in trying to quantify thresholds below which patients who have experienced a decline in functional ability are at increased risk for relatively inferior outcomes in terms of functional recovery if they delay surgery. This would suggest that patients should be counseled to have surgery sooner rather than later.

The TUG test has been correlated with functional mobility and fall risk.33 The regression tree developed from our secondary analysis indicated that individuals taking ≥10.1 s on the TUG and aged ≥72 years before surgery had the poorest performance on the TUG 6 months after surgery. Mean performance on the TUG for this group was 10.8 s. Estimates of TUG times in community dwelling, older adults of similar age have ranged from 5.6 to 8 s.18, 22 Thus, the group mean is on average slower than normative estimates for this population, indicating decreased gait speed, ability to transfer from sitting to standing, turning, and potentially an elevated risk of falling.

Individuals walking <314 m on the 6MW before surgery had the poorest performance on the 6MW test 6 months after surgery. Mean performance for this group was 363 m. Estimates of average 6MW distances in community dwelling, older adults of similar age have ranged from 538 to 600 m.18, 22 This indicates that the group preliminarily identified by the algorithm has substantially decreased endurance as well as gait speed compared to their peers.

Individuals taking ≥17 s on the SCT and scoring <40 on the SF-36 MCS before surgery had the poorest performance on the SCT 6 months after surgery. Average SCT time on a similar set of stairs was found to be 8.9 s for similar aged, healthy adults without knee pain.18 This would indicate that all four groups in the decision algorithm had slower stair climbing times than normal, but in particular, those taking ≥17 s preoperatively had the poorest outcome. Since 75% of individuals report difficulty with climbing stairs following TKA, this finding is not surprising.34

Notably, the SF-36 PCS and KOS-ADLS did not emerge as predictors of 6-month performance on the SCT, 6MW, or TUG. Recent evidence suggests that self-reported outcome measures such as the SF-36, KOS-ADLS, and the WOMAC may not reflect true functional performance and are highly correlated with pain ratings.35–37 Thus, both functional outcome measures and self-reported outcome measures are needed to accurately capture a given individual's outcome.35 However, the SF-36 MCS score at baseline was related to poorer performance on the SCT at 6 months in this cohort. Previous studies identified poorer preoperative MCS scores to be related to poorer postoperative self-reported function.13, 38 In this study, individuals scoring <40 at baseline had the poorest performance on this test. Based upon normative scoring for the SF-36, a score of <40 indicates that this group is at least 1 SD below normal mental health.39 A score of 42 or less on the MCS has a specificity of 81% and sensitivity of 74% for detecting depressive disorders.39 Thus, depression may play a role in poor SCT performance.
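The "1 SD below normal" statement follows from norm-based scoring. As a worked step, assuming the usual norm-based standard deviation of 10 points (the text states only the population mean of 50), the MCS cut point of 40 corresponds to:

```latex
z = \frac{x - \mu}{\sigma} = \frac{40 - 50}{10} = -1
```

Any score below 40 therefore lies more than one standard deviation below the population mean under that assumption.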

There are two main limitations to our study. The primary limitation is sample size. A larger sample of individuals would allow for more precise identification of thresholds of measures that predict greater risk for a poorer outcome. A larger sample would also allow more robust model validation procedures. Further research is needed to validate and expand the proposed model. A second limitation is generalizability. The inclusion criteria for the original study were fairly strict, and this limits generalizability to the larger population of individuals undergoing TKA. Since this study examined outcomes only at the 6-month time point, the results cannot be generalized to predict function beyond this time point. Moreover, evidence exists that the rehabilitation intervention provided to this group of patients may be superior to the standard of care in the community.20 Thus, the functional performance results of this population may represent a population of subjects with higher levels of functioning than typically seen after TKA. Additionally, while the predicted performance measures are separated into several ordered categories, there is no implication of clinically meaningful differences between groups. The outcome measures are shown with confidence intervals, while the cut-off values, which were the parameters of interest, are not. The cut-off values will likely change with a larger sample.

Acknowledgements

This research was funded by the National Institutes of Health (R01 HD041055, K23 AG029978, and T32 AG00279) and the Foundation for Physical Therapy (Promotion of Doctoral Studies I Scholarship). The sponsors had no role in the investigation.
