Concordance of four methods of disability assessment using performance in the home as the criterion method




To determine the concordance of 4 methods of disability assessment with the criterion method. Performance testing in the home was selected as the criterion.


The task performance of 57 community-dwelling older women (≥70 years) with knee osteoarthritis was examined through self report, proxy report, clinical judgment based on impairment measures, performance testing in an occupational therapy clinic, and performance testing in participants' homes. The 26 tasks represented 4 domains of daily living activities: 5 functional mobility, 3 personal care, 14 cognitively oriented instrumental activities of daily living (IADL), and 4 physically oriented IADL.


In general, self reports and proxy reports had the highest concordance with in-home performance test results. Nonetheless, even for these methods, depending on task domain, the rate of discordance ranged from 31% to 54%, being least in personal care and greatest in the physically oriented IADL.


Disability estimates based on self reports, proxy reports, clinical judgments, and hospital performance-based assessments are not interchangeable with in-home task performance.


Assessing functional status is an integral component of arthritis care. The cumulative effects of disease-associated and age-related impairments are manifested in daily life. The capacity to perform basic activities of daily living (BADL), such as dressing and bathing, and instrumental activities of daily living (IADL), such as shopping for food and managing finances, is a salient marker of overall health, a critical determinant of the ability to live independently, and a major indicator of the help needed when functional status is impaired. The American College of Physicians (1), Royal College of Physicians of London (2), and the World Health Organization (3) all have endorsed the inclusion of functional assessment in effective patient management.

Despite the importance of assessing function in BADL and IADL in the care of patients with arthritis, little is known about the relative concordance of the various methods used to obtain this information. Self report questionnaires, such as the Western Ontario and McMaster Universities Osteoarthritis Index (4), the Health Assessment Questionnaire (5), and the Arthritis Impact Measurement Scales (6), support a client-centered, subjective approach. In contrast, the Functional Independence Measure (7) and the Klein-Bell Activities of Daily Living Scale (8), in which patients are actually observed carrying out their daily activities in either a simulated (e.g., occupational therapy clinic) or naturalistic (e.g., home or nursing home) setting, reflect an objective, performance-based measurement approach. Compared with objective methods, subjective methods are easier to learn, require less skill and time to administer, and are less costly. If the yield from subjective and objective methods is comparable, disability would be defined similarly by either option, and subjective methods would be preferred because of their cost effectiveness.

Studies comparing patient and proxy reports of disability to objective measures of disability have yielded conflicting results, with some researchers concluding that self and proxy reports are interchangeable with performance-based ratings (9–12) and others drawing the opposite conclusion (13–22). These studies were conducted in settings including acute medical care (12, 14, 19, 21), psychiatric care (16), rehabilitation (9, 12, 15, 17, 20), outpatient (11, 12), day care/hospital (13, 21), and community residences (10, 18, 22); they included disease-specific (10–12, 20) as well as mixed (9, 13–17, 19, 21) samples. Conflicting results were achieved regardless of whether the objective criterion was the simulated, hospital setting (9, 11–17, 19, 21) or the naturalistic, home setting (10, 18, 20, 22). Comparisons of subjective methods have typically found that self reports evidence less disability than proxy reports (23, 24); comparisons of subjective and objective methods have generally found that performance-based methods evidence less disability than self reports (10, 17, 18, 20–22), but Edwards (14) found that testing elicited more disability. With 1 exception (18), research has focused almost exclusively on mobility skills and BADL in contrast to IADL, although some studies (10, 11, 14, 20) included several IADL items. Consequently, knowledge about the effects of assessment method on identifying disability in IADL, which stand at the interface between independent and dependent living, is severely limited.

The purpose of this study was to evaluate the concordance of disability status based on self ratings, proxy ratings, therapist ratings based on impairment data, and performance-based measures derived from testing in the occupational therapy clinic in relation to performance-based testing conducted in participants' homes. In-home performance was selected as the criterion method because 1) the home is where people usually perform their routine daily living tasks; 2) the home includes familiar task environments and objects; 3) the home is the physical and social environment wherein people currently live and wish to remain; and 4) the home is the environment that most supports and challenges everyday task performance. Our study differs from prior studies in the examination of a broad spectrum of IADL and the simultaneous study of 3 subjective and 2 objective assessment methods.



Because osteoarthritis (OA) limits the BADL and IADL participation of more than 5 million Americans, we chose to limit our focus to 1 homogeneous subsample of this population. Participants had to have a primary diagnosis of OA of the knee with greater than grade 2 radiologic severity (25), with or without OA of the upper extremities; have no other disabling pathology; and report OA-related disability in at least 1 functional mobility (FM), personal care (PC), or IADL task. They also had to be female, at least 70 years of age, community-dwelling, and medically stable; have a history of successfully performing the targeted daily living tasks; and have no significant hearing or visual impairment that remained uncorrected with the use of a hearing aid or glasses. The sample was restricted to women because in the present generation of older adults, most IADL tasks are performed by women. A history of successful task performance was required to exclude task disability attributable to a lack of learning or skill. Each participant identified a proxy informant, who was defined as the adult most familiar with the participant's ability to care for herself and her residence.

Disability measures.

For each method, 26 daily living tasks—5 FM tasks, 3 PC tasks, 14 cognitively oriented IADL (e.g., stovetop cooking), and 4 physically oriented IADL (e.g., sweeping the floor)—were assessed. All methods rated independence of task performance on a 4-point ordinal scale that ranged from a low of 0 (unable) to a high of 3 (total independence).

Performance-based methods: clinic and home (criterion).

The Performance Assessment of Self-Care Skills (PASS), clinic and home versions (ref. 26 and Rogers JC, Holm MB: unpublished assessment tool), were used to measure disability in the occupational therapy clinic of the outpatient geriatric center and in the participants' homes, respectively. The PASS is a 26-item criterion-referenced instrument that has been described in detail previously (27) (see Table 1). By specifying task conditions, critical subtasks, instructions, and materials for each test item, the PASS allows task performance to be observed in a controlled and standardized situation. Although the subtasks critical for task completion are identical in the clinic and home versions, the task materials differ. For example, participants use their own task materials (e.g., clothing, medications) when tested in the home, but are provided these materials by the assessor when tested in the clinic. For rating task independence, each item is divided into several subtasks, and each subtask rating reflects the kind and frequency of assistance provided by the examiner for task initiation, continuation, and completion. The independence rating for each item is the mean of its subtask ratings.
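A minimal sketch of the item-level scoring rule just described, assuming each subtask has already been rated on the 0 to 3 scale (the PASS rules for converting the kind and frequency of assistance into a subtask rating are not reproduced here). The function name and the example ratings are hypothetical.

```python
def item_independence(subtask_ratings):
    """Hypothetical helper: item independence score as the mean of the item's
    subtask ratings, each on the 0 (unable) to 3 (total independence) scale."""
    if not subtask_ratings:
        raise ValueError("an item needs at least one rated subtask")
    if any(r not in (0, 1, 2, 3) for r in subtask_ratings):
        raise ValueError("subtask ratings must be 0, 1, 2, or 3")
    return sum(subtask_ratings) / len(subtask_ratings)


# Hypothetical bed-transfer item with 3 subtasks: move onto bed, turn over,
# move off bed. Needing some assistance to move off the bed lowers the score.
score = item_independence([3, 3, 2])
print(round(score, 2))  # 2.67
```

Averaging can yield non-integer item scores (2.67 here); the text does not state how such scores were mapped onto the 0 to 3 scale for the agreement analyses.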

Table 1. Domains, test items, and item descriptions for the Performance Assessment of Self-Care Skills*
Domain/test item | Item description
Functional mobility
  Bed transfers | Move onto bed, turn over, move off bed
  Indoor walking | Negotiate around obstacles
  Toilet transfers | Move onto commode, obtain and dispose of toilet paper, move off commode
  Tub and shower transfers | Move into and out of bathtub, move into shower, turn around as in rinsing body, move out of bathtub
  Stair use | Ascend and descend stairs
Personal care
  Oral hygiene | Clean teeth or dentures
  Trim toenails | Trim 2 toenails on each foot
  Dress | Don and doff cardigan garment and pants, including fasteners
Cognitive IADL
  Shop | Select grocery items from an array and pay the bill for them
  Pay bills | Pay 2 utility bills by check
  Balance checkbook | Enter 2 withdrawals and 1 deposit into checkbook and balance the account
  Mail bills | Prepare 2 utility bills and checks for mailing
  Use telephone | Ascertain the time a local pharmacy opens and closes
  Manage medications | Prepare medications for the next 2 days; indicate time for next dose
  Obtain auditory information | Report critical information obtained through hearing (radio)
  Obtain visual information | Report critical information obtained through vision (newspaper)
  Repair a flashlight | Change battery or bulb to make a flashlight light
  Manage home safety | Identify and correct 3 home safety problems
  Prepare a light meal
    Use oven | Bake muffins
    Use stovetop | Heat soup
    Use sharp utensils | Cut fruit
  Play bingo | Play a game of bingo
Physical IADL
  Sweep | Sweep cereal from floor
  Dispose of garbage | Remove garbage from residence
  Change bed linens | Remove and replace bed linens
  Clean up after meal | Wash cookware and dinnerware; clean work area

* IADL = instrumental activities of daily living.

Because the PASS is a criterion-referenced instrument, observers' decision consistency is the clinically relevant issue and is reported 3 ways: number of agreements/number of possible agreements, mean percent agreement, and mean kappa coefficients (28). Each participant was rated by 2 observers, with observer pairs drawn from different combinations of the 5 project staff. On the home version, raters agreed on 588 of 595 observations for the 5 FM items (percent agreement 99%; average kappa 0.74, range 0.56–0.91). Agreement among raters for the 3 PC items was 484 of 497 observations (percent agreement 98%; average kappa 0.83, range 0.74–0.97). Agreement among raters for the 14 cognitive IADL items was 1,868 of 1,990 observations (percent agreement 94%; average kappa 0.32, range 0.02–1.00) and for the 4 physical IADL items was 469 of 482 observations (percent agreement 97%; average kappa 0.19, range 0.01–0.72). On the clinic version, raters agreed on 507 of 525 observations for the 5 FM items (percent agreement 97%; average kappa 0.43, range 0.01–0.82). Agreement among raters for the 3 PC items was 439 of 480 observations (percent agreement 91%; average kappa 0.38, range 0.16–0.58). Agreement among raters for the 14 cognitive IADL items was 1,682 of 1,805 observations (percent agreement 93%; average kappa 0.29, range 0.02–1.00) and for the 4 physical IADL items was 436 of 462 observations (percent agreement 94%; average kappa 0.42, range 0.03–0.65). Thus, decision consistency was excellent by percent agreement, but interobserver reliability was only fair by the chance-corrected kappa coefficients; this pattern is not unusual for criterion-referenced instruments such as the PASS (27–29), for which a ceiling effect is expected in a nonimpaired population.
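The gap between high percent agreement and low kappa can be illustrated with Cohen's kappa, which corrects the observed agreement p_o for the agreement p_e expected by chance from each rater's marginal rating frequencies: κ = (p_o − p_e)/(1 − p_e). The text does not specify whether weighted or unweighted kappa was used or exactly how kappas were averaged across items; the sketch below assumes unweighted Cohen's kappa and uses hypothetical ratings clustered at the ceiling.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same observations."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be paired and non-empty")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement implied by each rater's marginal category frequencies.
    p_chance = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n**2
    if p_chance == 1.0:
        return float("nan")  # undefined when both raters use a single category
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical ratings piled up at the ceiling (3 = total independence):
# 95 joint 3s, 1 joint 2, and 4 one-step disagreements split evenly.
rater_a = [3] * 95 + [2] + [3, 3] + [2, 2]
rater_b = [3] * 95 + [2] + [2, 2] + [3, 3]
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"percent agreement = {agreement:.0%}")           # 96%
print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")  # 0.31
```

Here 96% of ratings match, yet kappa is only about 0.31, because roughly 94% agreement would be expected by chance alone from the raters' marginal frequencies. This is the ceiling effect described above.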

Self report and proxy report.

PASS items were converted to interview items to provide self report and proxy report measures that could be readily compared with the PASS items. Participants and proxies were interviewed about the participant's capability, that is, what she “could do” rather than what she “routinely did.” Participants were interviewed in person at the outpatient center; proxies were interviewed by telephone. Prior to the interview, proxies received sample questions and the 4 response options, ranging from a low of 0 (unable) to a high of 3 (total independence). Test-retest percent agreement, with a 1-week interval, for the total self report instrument was 93.1%; for the 5 FM and 3 PC items, it ranged from 90.0% to 100%; for the 14 cognitive IADL, it ranged from 85.0% to 100%; and for the 4 physical IADL, it ranged from 80.0% to 100%.

Clinical judgment.

Sensory, cognitive, motor, and affective impairment data were collected using standardized or clinical assessments. Sensory measures were the portable vision screener, Sent-Ident (functional hearing) (30), Performance-Oriented Assessment of Balance (31), and functional reach (balance) (32). Cognitive measures were the Modified Mini-Mental State (3MS) (33) and Trail Making parts A and B (34). Motor assessments were grip (dynamometer) and pinch (pinch meter) strength, the Jebsen-Taylor Hand Function (dexterity) (35), the Keitel Functional Test (joint mobility) (36), forced expiratory volume in 1 second (FEV1), forced expiratory volume, and maximum voluntary ventilation (portable spirometer). The affective measure was the Geriatric Depression Scale (37). Three occupational therapists, with a mean of 25 years of experience, reviewed the raw impairment data (e.g., grip strength, dexterity, balance, FEV1, 3MS), compared the results with normative values when available, and then made a clinical judgment of participants' level of independence in each of the 26 tasks. Interrater reliability for the clinical judgments was established at r = 0.92.


Following approval by the University of Pittsburgh institutional review board, potential participants were recruited from the outpatient geriatric and rheumatology services of what is now the University of Pittsburgh Medical Center-Health System. Patients potentially meeting the study criteria were referred to the study by their physicians, with the patients' approval. The study requirements were explained, and informed consent was obtained from those desiring to participate. The medical records of potential participants were reviewed by the project rheumatologist or geriatrician to confirm the diagnostic criteria, and a physician assistant conducted a health interview using the Cumulative Illness Rating Scale for Geriatrics (38) to assess medical burden. After the eligibility criteria were verified, all assessments were scheduled within 5 days. Self report interviews occurred on day 1 in the outpatient clinic and were followed by administration of the impairment measures (clinical judgment data), the results of which were forwarded to professional therapists for use in formulating clinical judgments of disability. Within 2 days, the proxy informant was interviewed by telephone. On day 2, participants returned to the outpatient clinic for performance-based observation (PBO) in the occupational therapy clinic (PBO–clinic). Finally, on day 3, PBO was done in participants' homes (PBO–home). A fixed, rather than a random, order of assessment methods was followed because perceptions (self report) of task performance are more likely to be influenced by demonstrations (performance based) of task performance than vice versa. The performance test items (home and clinic versions) were designed to replicate tasks routinely carried out in daily life to maintain independent living; they were not novel items. Furthermore, all tasks were well practiced by our subjects, who reported a successful history of task performance.
Because of the routine nature of the tasks and subject familiarity with them, the effect of prior clinic performance on home performance was regarded as negligible, and we elected to follow usual practice in having clinic assessment precede home assessment.

Data analytic strategy.

Concordance was computed as a simple percent agreement statistic. Within each domain of functioning (e.g., FM), the number of items for which the particular method of assessment (e.g., self report) produced the same rating on the 0 to 3 scale as the PBO–home criterion assessment was divided by the total number of items in the domain. For example, if a participant's self report resulted in the same rating as the PBO–home assessment on 3 of the 5 FM tasks, percent agreement would be 60 for that person. The average or mean percent agreement was computed for each method within each domain for use as the dependent variable in analysis of variance (ANOVA) procedures (described below). For descriptive purposes, this mean is equivalent to the overall percent agreement across all items (5 for FM) and all subjects (n = 57)—a possible 285 total FM items (items with missing ratings are excluded from analysis).
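A minimal sketch of the agreement computation described above, using hypothetical ratings and function names of our own. It computes each participant's percent agreement (the ANOVA outcome), their mean, and the pooled percentage over all scored items. With complete data, every participant contributes the same number of items and the two summaries coincide, as stated above; when some items are excluded as missing, participants contribute different denominators and the two summaries can differ slightly, as the second example shows.

```python
def scored_pairs(method, criterion):
    """Pair each item's method rating with its PBO-home rating, skipping items
    whose rating is missing (None) in either source."""
    return [(m, c) for m, c in zip(method, criterion)
            if m is not None and c is not None]


def percent_agreement(method, criterion):
    """Percent of scored items on which the method's 0-3 rating equals the criterion's."""
    pairs = scored_pairs(method, criterion)
    if not pairs:
        raise ValueError("no scored items")
    return 100 * sum(m == c for m, c in pairs) / len(pairs)


def domain_agreement(participants):
    """participants: one (method_ratings, criterion_ratings) pair per person.

    Returns (mean of per-person percents, pooled percent over all scored items)."""
    per_person = [percent_agreement(m, c) for m, c in participants]
    pooled_pairs = [p for m, c in participants for p in scored_pairs(m, c)]
    pooled = 100 * sum(m == c for m, c in pooled_pairs) / len(pooled_pairs)
    return sum(per_person) / len(per_person), pooled


# Hypothetical FM ratings (5 items): self report vs. PBO-home.
complete = [
    ([3, 3, 2, 3, 1], [3, 2, 2, 3, 3]),  # 3 of 5 match -> 60%
    ([3, 3, 3, 3, 3], [3, 3, 3, 3, 2]),  # 4 of 5 match -> 80%
]
print(domain_agreement(complete))  # (70.0, 70.0)

# A third participant with one missing self report item: 4 of 4 match -> 100%.
with_missing = complete + [([3, None, 3, 3, 3], [3, 3, 3, 3, 3])]
print(domain_agreement(with_missing))  # mean 80.0, pooled about 78.57
```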

Bias, or direction of disagreement, was derived by computing the percentage of items for each method within each domain that resulted in 1) higher ratings than the PBO–home criterion (i.e., overestimations of ability), and 2) lower ratings than the criterion (i.e., underestimations of ability). Bias was computed as the percentage of higher than criterion disagreements minus percentage of lower than criterion disagreements. Average (mean) bias was computed for each method within each domain, and also used as an outcome in ANOVA procedures.
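The bias score can also be sketched in a few lines. For a single participant and domain, the percentages of items rated equal to, above, and below the criterion sum to 100, and bias is the percentage above minus the percentage below. Because averaging is linear, the Table 3 means follow the same identity (e.g., self report, FM: 5.3 − 35.8 = −30.5). The function name and ratings are hypothetical; items with a missing rating (None) in either source are skipped, as the text specifies for agreement.

```python
def agreement_profile(method, criterion):
    """Return (% equal, % above, % below, bias) over items scored in both sources.

    'Above' means the method rated higher than PBO-home (overestimation of
    ability); bias = % above minus % below."""
    pairs = [(m, c) for m, c in zip(method, criterion)
             if m is not None and c is not None]
    if not pairs:
        raise ValueError("no scored items")
    n = len(pairs)
    equal = 100 * sum(m == c for m, c in pairs) / n
    above = 100 * sum(m > c for m, c in pairs) / n
    below = 100 * sum(m < c for m, c in pairs) / n
    return equal, above, below, above - below


# Hypothetical participant: 2 items match, 1 overestimated, 2 underestimated.
print(agreement_profile([3, 2, 3, 1, 3], [2, 3, 3, 3, 3]))
# (40.0, 20.0, 40.0, -20.0)
```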

Within each of the 4 domains, one-way repeated measures ANOVA procedures were conducted to test for differences in percent agreement and bias, with method of assessment (self report, proxy report, clinical judgment, PBO–clinic) as the independent variable. Followup contrasts comparing each method with all the others were also performed to pinpoint significant differences. Additional perspective is gained by examining the rank ordering of domains in terms of percent agreement and bias for each assessment method. This analysis addresses 2 questions: Within which domain(s) is a given assessment method most (and least) accurate? For which domain(s) does the method tend to overestimate, as opposed to underestimate, home performance?
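The text does not name the specific repeated-measures procedure, but the degrees of freedom reported in the Results, F[3,54], are what the multivariate (Hotelling's T²) form of the one-way repeated-measures test gives for k = 4 methods and n = 57 participants: df = (k − 1, n − k + 1) = (3, 54). The univariate form would give (k − 1, (n − 1)(k − 1)) = (3, 168). Assuming the multivariate form, the sketch below computes T² = n d̄ᵀS⁻¹d̄ from k − 1 difference scores and converts it to F = (n − k + 1)/((n − 1)(k − 1)) · T². It uses simulated scores, not study data, and does not reproduce the followup contrasts; the function names are our own.

```python
import random


def _solve(a, b):
    """Solve a @ x = b (small dense system) by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rm_hotelling_f(scores):
    """One-way repeated-measures test via Hotelling's T^2.

    scores: one row per participant, each holding k scores (e.g. percent
    agreement for the 4 assessment methods). Returns (F, df1, df2)."""
    n, k = len(scores), len(scores[0])
    if n < k:
        raise ValueError("need at least as many participants as conditions")
    # k-1 difference scores against the first condition; T^2 is the same
    # for any full-rank set of contrasts.
    d = [[row[j] - row[0] for j in range(1, k)] for row in scores]
    p = k - 1
    mean = [sum(r[j] for r in d) / n for j in range(p)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in d) / (n - 1)
            for j in range(p)] for i in range(p)]
    t2 = n * sum(mi * xi for mi, xi in zip(mean, _solve(cov, mean)))
    f = (n - k + 1) / ((n - 1) * (k - 1)) * t2
    return f, k - 1, n - k + 1


# Simulated (not study) data: 57 participants x 4 methods.
rng = random.Random(0)
centers = [57, 61, 55, 73]
scores = [[rng.gauss(mu, 20) for mu in centers] for _ in range(57)]
f, df1, df2 = rm_hotelling_f(scores)
print(f"F[{df1},{df2}] = {f:.2f}")
```

With k = 2 the same function reduces to the squared paired t statistic, which is a convenient sanity check.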



Table 2 reports demographic, impairment, and disability data for the study participants. Participants had a mean age of 81 years and were primarily white and fairly well educated; two-thirds lived alone. They were neither depressed nor cognitively impaired. On the performance test, they exhibited the most disability in personal care and the least in the cognitive IADL.

Table 2. Study participant descriptive statistics (n = 57)
Variable (score range) | Mean | SD
Age, years | 81.00 | 5.01
White, % | 91.2
High school graduate, % | 80.7
Married, % | 19.3
Living alone, % | 66.7
Income < $10,000/year, % | 31.9
Keitel Functional Test (4–100) | 28.86 | 10.98
Geriatric Depression Scale (0–15) | 2.93 | 2.48
Modified Mini-Mental State (0–100) | 91.39 | 6.20
Functional mobility (FM)* (0–3) | 2.75 | 0.38
Personal care (PC)* (0–3) | 2.48 | 0.62
Cognitive IADL* (0–3) | 2.89 | 0.15
Physical IADL* (0–3) | 2.76 | 0.42

* Performance-based observation (PBO) of FM, PC, and instrumental activities of daily living (IADL).


The most common proxies were daughters (49.1%), friends (15.8%), and husbands (10.5%); 54.4% of proxies usually provided some help to the participant. Proxies' average age was 61.3 years (SD ± 16.4), and proxies spent an average of 39.9 hours per week (SD ± 13.49) with the participant.

Assessment method differences.

Table 3 presents mean percent agreement, mean percent overestimation, mean percent underestimation, and mean bias for each of the 4 assessment methods within each domain. The table also shows the results of the followup contrast analyses.

Table 3. Percent agreement and bias of self report, proxy report, clinical judgment, and performance-based observation (PBO) in a clinic with PBO in the home (criterion method), for functional mobility, personal care, cognitive IADL, and physical IADL domains*
Domain/method | = Home | > Home | < Home | Bias
Functional mobility
  Self report | 57.5a | 5.3 | 35.8 | −30.5a
  Proxy report | 61.4a | 4.6 | 24.2 | −19.6b
  Clinical judgment | 55.1a | 5.3 | 39.3 | −34.0a
Cognitive IADL
  Self report | 68.1a | 22.3 | 8.4 | +13.9a
  Proxy report | 64.5b | 22.3 | 6.8 | +15.5a
  Clinical judgment | 50.0c | 8.0 | 41.4 | −33.3b
Personal care
  Self report | 69.0a | 17.0 | 13.5 | +3.5a
  Proxy report | 69.0a | 14.0 | 6.4 | +7.6a
  Clinical judgment | 55.0b | 15.8 | 29.2 | −13.5b
Physical IADL
  Self report | 46.1a | 11.8 | 41.7 | −29.8a
  Proxy report | 54.4ab | 9.6 | 26.3 | −16.7b
  Clinical judgment | 43.9a | 8.8 | 46.9 | −38.2a

* = Home = the percent agreement with the criterion (performance in the home); > Home = percent of ratings higher than the criterion (overestimation of performance); < Home = percent of ratings lower than the criterion (underestimation of performance); Bias = direction and magnitude of the rating bias compared with the criterion measure (computed as > Home − < Home). Items in each column within a domain (e.g., FM, PC) that share a letter are not significantly different. FM = functional mobility; PC = personal care; IADL = instrumental activities of daily living.

Functional mobility.

There were 5 items that assessed FM (see Table 1).

Concordance: The PBO–clinic method was most concordant with the PBO–home criterion when assessing FM, with a 73.3% agreement rate. Clinical judgment was least concordant (55.1% agreement). The ANOVA for percent agreement for FM items resulted in a significant main effect for Method (F[3,54] = 9.13, P < 0.001). Followup contrasts showed that the PBO–clinic method was significantly more concordant than the other 3 methods, which did not differ from one another. (F statistics for individual contrasts are not reported in the interests of conserving space. These are available from the authors upon request.)

Bias: All 4 methods tended to produce more underestimations than overestimations of FM when compared with performance-based testing conducted in the home (i.e., they exhibited a negative bias). The clinical judgment and self report methods tended to produce the largest underestimations. The ANOVA for bias resulted in a significant main effect for Method (F[3,54] = 9.81, P < 0.001). Followup contrasts showed that the clinical judgment and self report methods produced significantly more negative bias than proxy reports or PBO–clinic assessments.

Personal care.

There were 3 items that assessed PC (see Table 1).

Concordance: In this domain, self reports and proxy reports were more concordant (both 69% agreement) than either clinical judgments (55% agreement) or the PBO–clinic assessments (57.3%). The ANOVA for percent agreement for PC items resulted in a significant main effect for Method (F[3,54] = 3.70, P < 0.05), although the differences were not as large as those for FM. Followup contrasts showed that the self and proxy report methods (which did not differ from one another) were significantly more concordant than the clinical judgment and PBO–clinic methods, which also did not differ.

Bias: Although the self report and proxy report methods tended to produce slight overestimations of PC abilities relative to the home performance assessments, the clinical judgment and PBO–clinic assessments tended to underestimate these abilities. The ANOVA for bias for PC items resulted in a significant main effect for Method (F[3,54] = 17.67, P < 0.001). Followup contrasts showed that the PBO–clinic and clinical judgment methods produced significantly more negative bias than self or proxy report methods.

Cognitive IADL.

There were 14 items that assessed abilities to perform cognitive-based IADL (see Table 1).

Concordance: In this domain, similar to the PC domain, self reports (68.1% agreement) and proxy reports (64.5%) were more concordant with PBO–home than either clinical judgments (50%) or the PBO–clinic assessments (55.4%). The ANOVA for percent agreement for cognitive IADL items resulted in a significant main effect for Method (F[3,54] = 10.93, P < 0.001). Followup contrasts showed that the self reports were significantly more concordant than proxy reports, which were in turn more concordant than the PBO–clinic and clinical judgment methods, which did not differ.

Bias: The self report and proxy report methods tended to produce overestimation of cognitive IADL abilities relative to the PBO–home assessments. In contrast, the clinical judgment and PBO–clinic assessments tended to underestimate these abilities. The ANOVA for bias for cognitive IADL items resulted in a significant main effect for Method (F[3,54] = 123.59, P < 0.001). Followup contrasts showed that the PBO–clinic and clinical judgment methods (which did not differ from one another) produced significantly more negative bias than self or proxy report methods, which did not differ.

Physical IADL.

There were 4 items that assessed abilities to perform physical-based IADL (see Table 1).

Concordance: Concordance with PBO–home assessments was generally the lowest in this domain. Proxy reports (54.4% agreement) and PBO–clinic (52.0%) methods were slightly more concordant than self reports (46.1%) and clinical judgments (43.9%). The ANOVA for percent agreement for physical IADL items resulted in a significant main effect for Method (F[3,54] = 3.88, P < 0.05). Followup contrasts showed that the PBO–clinic method was significantly more concordant than self reports or clinical judgments. None of the other contrasts was significant.

Bias: All 4 assessment methods tended to produce underestimation of physical IADL abilities relative to the home performance assessments. This was most evident in the clinical judgment method. Proxy reports were the least prone to underestimation. The ANOVA for bias for physical IADL items resulted in a significant main effect for Method (F[3,54] = 5.68, P < 0.01). Followup contrasts showed that the proxy report method produced significantly less underestimation of physical IADL abilities relative to the criterion (PBO–home) than the other three methods, which did not differ.

Rank ordering of domain concordance and bias.

The results just presented show that there are assessment method differences in concordance with home-based performance assessments, and that these differ across assessment domains. The same is true of tendencies to overestimate or underestimate performance relative to the criterion. That is, the methods show different patterns of bias when compared with home, the current and preferred living environment, depending on whether functional mobility, personal care, cognitive IADL, or physical IADL tasks are being assessed. To gain additional perspective on these findings, Table 4 shows the rank ordering of the 4 domains in terms of both concordance and bias for each of the 4 assessment methods. The table lends insight into the conditions under which each method is most (and least) accurate, as well as which domains of function tend to be overestimated versus underestimated for each method.

Table 4. Rank ordering of domains in terms of percent agreement and bias (most positive to most negative) within assessment method*
Rank | Self report | Proxy report | Clinical judgment | PBO–clinic
Percent agreement with PBO–home
  1 | Personal care (69.0) | Personal care (69.0) | Functional mobility (55.1) | Functional mobility (73.3)
  2 | Cognitive IADL (68.1) | Cognitive IADL (64.5) | Personal care (55.0) | Personal care (57.3)
  3 | Functional mobility (57.5) | Functional mobility (61.4) | Cognitive IADL (50.0) | Cognitive IADL (55.4)
  4 | Physical IADL (46.1) | Physical IADL (54.4) | Physical IADL (43.9) | Physical IADL (52.0)
Bias based on PBO–home
  1 | Cognitive IADL (+13.9) | Cognitive IADL (+15.5) | Personal care (−13.5) | Functional mobility (−15.1)
  2 | Personal care (+3.5) | Personal care (+7.6) | Cognitive IADL (−33.3) | Personal care (−21.6)
  3 | Physical IADL (−29.8) | Physical IADL (−16.7) | Functional mobility (−34.0) | Physical IADL (−27.6)
  4 | Functional mobility (−30.5) | Functional mobility (−19.6) | Physical IADL (−38.2) | Cognitive IADL (−32.2)

* PBO–clinic = performance-based observation in the clinic; PBO–home = performance-based observation in the home; IADL = instrumental activities of daily living.
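Table 4 is a per-method sort of the domain summaries. As a check on the ordering, the sketch below re-sorts the agreement and bias values printed in Table 4 (the PBO–clinic values appear only there and in the text). The dictionary layout and short domain labels are our own.

```python
# (percent agreement with PBO-home, bias) per method and domain, from Table 4.
RESULTS = {
    "Self report": {"FM": (57.5, -30.5), "PC": (69.0, 3.5),
                    "Cognitive IADL": (68.1, 13.9), "Physical IADL": (46.1, -29.8)},
    "Proxy report": {"FM": (61.4, -19.6), "PC": (69.0, 7.6),
                     "Cognitive IADL": (64.5, 15.5), "Physical IADL": (54.4, -16.7)},
    "Clinical judgment": {"FM": (55.1, -34.0), "PC": (55.0, -13.5),
                          "Cognitive IADL": (50.0, -33.3), "Physical IADL": (43.9, -38.2)},
    "PBO-clinic": {"FM": (73.3, -15.1), "PC": (57.3, -21.6),
                   "Cognitive IADL": (55.4, -32.2), "Physical IADL": (52.0, -27.6)},
}


def rank_domains(results):
    """For each method, order domains from highest to lowest agreement and
    from most positive to most negative bias."""
    ranking = {}
    for method, domains in results.items():
        ranking[method] = {
            "agreement": sorted(domains, key=lambda d: domains[d][0], reverse=True),
            "bias": sorted(domains, key=lambda d: domains[d][1], reverse=True),
        }
    return ranking


for method, order in rank_domains(RESULTS).items():
    print(f"{method}: agreement {order['agreement']}; bias {order['bias']}")
```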

To summarize, the self-report and proxy report methods show similar patterns for both concordance and bias. These methods were most concordant with home-based performance assessments for PC and cognitive IADL tasks, and least concordant for physical IADL tasks. In addition, in terms of bias, both self and proxy reports tended to produce overestimations of cognitive IADL, and to a lesser extent, PC abilities, while underestimating physical IADL and FM. The self-report underestimations of physical IADL and FM were somewhat more pronounced than those for proxy reports. The clinical judgment and PBO–clinic methods had similar concordance patterns across domains. Both methods were most concordant with PBO–home assessments in the FM domain (although concordance was much higher for PBO–clinic than for clinical judgment), and least concordant in the physical IADL domain. Finally, both clinical judgments and clinic-based performance methods consistently underestimated abilities relative to the home-based criterion, although the domain for which this was most pronounced differed (physical IADL for clinical judgment, cognitive IADL for PBO–clinic).


The primary objective of this study was to ascertain the relative concordance of several subjective and objective disability assessment methods with a criterion method: the observation of task performance in participants' homes. Our most important finding is the extent of discordance between in-home performance and self reports, proxy reports, clinical judgments, and clinic performance. Our data show that, with some exceptions, the ratings of patients and their proxies are more concordant with the criterion than either the impairment-based clinical judgments of therapists or the observation of task performance in a hospital setting. The concordance of patient and proxy reports was greatest for PC (69%) and the cognitive IADL (≥64%), followed by FM (>57%), and lastly physical IADL (≥46%). Thus, from a task perspective, older women with knee OA and their proxies had the most difficulty judging disability accurately in tasks involving gross body movement (i.e., FM, physical IADL), which are the tasks most likely to be affected by knee OA. Nonetheless, from a methodologic perspective, even for the most concordant combination of method and domain (self and proxy report of personal care, 69% agreement), the discrepancy rate was 31%; at worst (self report of physical IADL, 46% agreement), it was 54%.

Comparisons between patient and proxy reports of disability have generally suggested that patients perceive less disability than proxies, whether the proxies are family members or professional caregivers (23, 24, 39). Our results, however, indicated that the concordance of patient and proxy ratings with in-home performance-based ratings was comparable. The only significant difference between the concordance of patient and proxy ratings occurred in reference to the cognitive IADL domain, with self concordance being better than proxy concordance. Our findings are consistent with those of Shinar et al (12) and support the interchangeability of patient and proxy disability ratings among all domains for this population. Hence, when patients are unable or unwilling to provide disability data, proxies, who are knowledgeable about their daily living habits, may be substituted. Our findings do, however, contrast with those of Kuriansky et al (16), who ascertained that proxy reports were more concordant with performance-based ratings than self ratings.

When there was substantive discordance (bias) of patient and proxy reports with in-home performance, the direction of discordance was the same for patients and their proxies, but it varied by functional domain. Participants and their proxies overestimated functional independence for PC and the cognitive IADL and underestimated it for FM and physical IADL. Thus, these elderly women appeared to have a tendency to deny cognitive difficulties and accentuate physical dependencies and were supported in these perceptions by their proxies. Elam et al (15) and Pinholt et al (40) reported that professionals appear to have these same biases.

Perhaps our most intriguing findings concerned the comparison of the 2 performance-based measures. With the exception of FM, where PBO–clinic had the highest concordance with PBO–home, its concordance was uniformly low for the other 3 domains, although still greater than chance. When clinic ratings differed from home ratings, they consistently underestimated home performance, suggesting greater disability. At first glance, this finding seems particularly unexpected because the same instrument, the PASS, was used to structure observations in both performance situations: the occupational therapy clinic and the home. The primary difference in the testing procedure was that the occupational therapy clinic was an unfamiliar, supportive environment, whereas the home was a familiar, naturalistic one. Thus, the low concordance between the clinic and home test results is likely due to environmental factors that produced actual performance differences. For some tasks, the supportive environment of the occupational therapy clinic, which contained features to compensate for OA-related physical impairments (e.g., a safety frame on the commode, a bathtub bench in the tub), enabled independence. For other tasks, using familiar equipment in the familiar home setting enabled independence.
The task performance differences between the 2 testing situations highlight the vital role that the environment plays in creating or alleviating disability, where disability is defined as the interaction between a person's abilities and environmental features (3). Prior research on persons with disabilities has documented both better performance in the home than in the clinic (41) and better performance in the clinic than in the home (42–44), depending on the nature of the disability and the supportiveness of the environment.

Like performance-based observation in the clinic, the clinical judgment method did not perform well. In fact, it was the least concordant with the criterion, although not always significantly different from the other methods. Clinical judgment was included in our study because health care professionals tend to infer disability from impairment. Rehabilitation therapists, for example, may conclude that knee OA patients will exhibit disability in tasks generally requiring hip flexion (e.g., bathtub transfers, picking up a garbage sack) because of a joint restriction of 30° or more at the hip, without actually assessing these tasks. Several reasons may account for the ineffectiveness of the clinical judgment method in our study. Because our therapists were provided with substantially more impairment data than would typically be available in a clinical setting, they may have had difficulty selecting the salient clinical cues from such a large quantity of data (45). Our rationale for providing impairment data beyond those common to OA was that in older adults, disability often has multiple etiologies; joint limitations, for example, may be compounded by depression or dementia. This multifactorial etiology needs to be taken into account in planning rehabilitation. The tendency of clinical judgment to overestimate disability (negative bias) may stem from therapists' comparing our participants with the caseloads typically seen in rehabilitation, which are substantially more disabled. Finally, the poor concordance between clinical judgment and in-home performance may reflect actual discordance between impairment and disability. Correlational studies of a wide range of physical, cognitive, and affective impairments typically report positive but only moderate relationships with task disability, suggesting that impairment does not necessarily lead to disability (46).

In conclusion, health care professionals can rely on cognitively intact women seen as outpatients to provide a reasonable indication of the functional tasks they can do at home. At the same time, professionals should be aware that physical impairment appears to widen the discrepancy between what patients say they can do at home and what they actually can do, with the widest gap occurring for FM and the physically oriented IADL. Although patient and proxy reports of functional status are comparable, whenever possible patients should be allowed to speak for themselves to reinforce their autonomy. If proxies are used as informants, they should be familiar with the patient's routine task performance; those who lack this familiarity should not be asked to provide information. Self reports of functional status can be readily incorporated into usual health care practice. We recommend questions, such as those used in this study, that ask about the ability to perform discrete tasks (e.g., cook using the stovetop) rather than task categories (e.g., cook). Perceived changes in these abilities may be used to trigger more in-depth performance-based assessment. Performance-based assessment itself should be done in patients' homes rather than in clinical settings, which may yield a different level of task independence. Home-based assessment enables patients to perform tasks using their usual procedures, equipment, and adaptive techniques. Hence, it yields an accurate evaluation of their ability to live independently and of the disabilities that need to be targeted for rehabilitation. This methodologic research needs to be replicated across a wider range of rheumatic conditions and in people with more severe disability to assess the generalizability of the findings.