Measuring the consequences of osteoarthritis and joint pain in population-based studies: Can existing health measurement instruments capture levels of participation?




To identify health measurement instruments to investigate levels of participation associated with joint pain in a population survey questionnaire.


A comprehensive electronic search of the published literature was performed to identify potential instruments that could measure participation. All items from identified instruments were assessed for the ability to measure participation by 2 experienced and 2 inexperienced assessors. Agreement was determined in terms of actual agreement (%) and agreement beyond chance (κ).


Twenty-seven instruments (912 items) were identified. Agreement between the experienced assessors occurred in 86% of items (κ = 0.70, 95% confidence interval [95% CI] 0.65–0.75) and between the inexperienced assessors in 72% (κ = 0.40, 95% CI 0.34–0.46). The greatest proportion of participation items in one instrument was 82%.


None of the identified instruments consisted entirely of participation items. The concept of participation and its translation into measurement for use in the general population is likely to need further development.


Osteoarthritis (OA) affects an estimated 25% of the general population aged 55 years and older (1). It is one of the most common chronic health conditions in this age group and, in light of projected demographic trends, the total number of people with the disease is likely to increase (2). An important question is how to measure the impact of OA on individuals and society.

Population-based epidemiologic studies of OA have consistently highlighted discrepancies between radiographic evidence of joint pathology on the one hand and patient reports of pain severity and levels of disability on the other (3–5). Such discordance has been reported in other conditions as well (6–8). It suggests that the level of suffering and the impact on everyday life among individuals with joint pain and OA may be highly variable, even when the severity of the underlying disease pathology is comparable.

The recently revised International Classification of Impairments, Disabilities, and Handicaps (ICIDH), now referred to as the International Classification of Functioning, Disability and Health (ICF) (9), offers a useful framework with which to describe the various consequences of OA. The ICF classifies functioning at 3 distinct levels: anatomic/physiologic (body function), individual person (activities), and societal (participation). Abnormal functioning at the 3 levels is described, respectively, as impairment (e.g., pain, stiffness), activity limitation (e.g., difficulty walking 100 yards), and participation restriction (e.g., difficulty looking after others). These broadly correspond to the older typology of impairment, disability, and handicap. A range of contextual factors in an individual's life, both personal and environmental, may act upon function at each level and modify relationships between these different levels (see the Appendix for definitions of terms).

Previous epidemiologic studies of OA have investigated the prevalence of impairment (e.g., radiographic joint space narrowing and osteophytes). However, there is increasing interest in the societal and personal consequences of OA (10), accompanied by a recognition that these cannot be inferred from levels of impairment or activity limitation alone (11). For example, Heberden's nodes may not affect functional abilities of the hand, but may result in social isolation due to feelings of inadequacy resulting from observable deformity. Hence, directly measuring levels of participation and participation restriction is an important step for epidemiologic research. However, the extent to which this has been achieved in previous studies is unclear. The ICF framework is relatively new and postdates many of the epidemiologic studies in this field. Measurement of disability has been a feature of some studies (12–16), but coverage of participation restriction by the instruments used is uncertain.

As a preliminary step to investigating the prevalence, patterns, and determinants of participation restriction in population-based studies of joint pain and OA, we undertook a review of current self-report health instruments potentially relevant to the measurement of participation restriction. However, we wanted to identify instruments that could be completed by all older adults, not only those with joint pain. Our aims therefore were to 1) identify health measurement instruments that might include some measurement of participation (or participation restriction) and could be applied in a population survey; 2) reliably determine the proportion of items within these health measurement instruments that captured some aspect of participation (or participation restriction) to indicate the ability of instruments and their scores to exclusively represent participation; and 3) ascertain whether different observers with varying levels of survey experience and knowledge of the ICF framework could consistently identify participation items in these instruments.


The study was conducted in 4 phases, and a standardized definition of participation, based on World Health Organization (WHO) literature, was presented to the assessors at the start. The first phase was a comprehensive search of the published literature to identify instruments that might contain items measuring participation. In the second phase, criteria were applied to the instruments identified in phase 1 to exclude those that could not be applied in a self-completion population questionnaire survey or did not contain any items that measured participation. In the third phase, all items in the remaining instruments were independently assessed for their ability to measure participation by 2 assessors who were used to working with the ICF concepts (experienced) and 2 assessors who were new to the ideas (inexperienced). Agreement within the experienced assessor group, within the inexperienced assessor group, and between the groups was analyzed in phase 4.

Standardization of participation definition.

Before the study was undertaken, assessors read specific WHO literature (Appendix A; Taxonomic and terminological issues from the ICIDH-2 pre-final draft: definitions of activities and participation) to establish a consistent approach to and knowledge of the participation concept. They were presented with the WHO definition of participation, which is “an individual's involvement in life situations in relation to health conditions, body functions and structure, activities and contextual factors” (9). An item was considered as having the ability to measure participation if it met 2 criteria: 1) it must obtain information relating to an action, task, or life situation; and 2) the action, task, or life situation must occur in relation to contextual factors.

The assessors were also provided with examples of participation items from the ICF domains for the measurement of activities and participation. Of the 9 domains described in the ICF, 3 (learning and applying knowledge, general tasks and demands, and communication) were deemed to capture activities and not participation. In 2 domains (mobility and self care), some subdomains were specified as capturing activities and some as capturing participation. There were 4 domains (domestic life; interpersonal interactions and relationships; major life areas, e.g., work; and community, social, and civic life) deemed to measure participation (Table 1). This structuring matched 1 of 4 methods, proposed by the WHO, for applying the domains to capture participation (9).

Table 1. Defined activities and participation domains, with partial overlap*

Columns 2 and 3 list the domains and subdomains designated by a single observer to capture activities or participation.

| ICF domains for the measurement of activities and participation | Activity domains | Participation domains |
| --- | --- | --- |
| Activity domains: all items capture activities | 1. Learning and applying knowledge | |
| | 2. General tasks and demands | |
| | 3. Communication | |
| Overlapping domains: items within domains are specified as measuring activities or participation | 4. Mobility | 4. Mobility |
| | 4.10–4.19 Changing and maintaining body position | 4.2 Transferring oneself |
| | 4.30–4.49 Carrying, moving, and handling objects | 4.55–4.69 Moving around |
| | 4.50 Walking | 4.70–4.89 Moving around using equipment |
| | 5. Self-care | 5. Self-care |
| | 5.50 Eating | 5.10 Washing oneself |
| | 5.60 Drinking | 5.20 Caring for body parts |
| | | 5.30 Toileting |
| | | 5.40 Dressing |
| | | 5.70 Looking after one's health |
| Participation: all items capture participation | | 6. Domestic life |
| | | 7. Interpersonal interactions |
| | | 8. Major life areas |
| | | 9. Community, social, and civic life |

* Please note these domains have been constructed and presented by the World Health Organization (9). This interpretation and structure of domains has been determined by a single observer and not the World Health Organization. ICF = International Classification of Functioning, Disability and Health.

Phase 1: Search strategy and identification of instruments.

A comprehensive search strategy was conducted by a single observer (RW). Search criteria were devised to obtain instruments containing items that might measure levels of participation in people with joint pain. The first search was restricted to reviews of health measurement instruments. The review articles obtained did not contain instruments developed after 1997. Therefore, a second search was undertaken of individual peer-reviewed articles to find instruments developed after 1997, together with a hand search of references in all the obtained literature for additional instruments.

Both searches were performed with the following key words (MeSH headings and text words): joint pain, outcome measures, disability, handicap, participation, quality of life, musculoskeletal, arthritis, work related pain, and epidemiology (plus review in the first search) applied individually and in combination. Publications were sought in the English language, and retrieved by an online electronic search of 12 databases (BIDS, Zetoc, Index to thesis, CHID Online, Cochrane database, WebMedlit, CINAHL, Medline, Embase, Bandolier, Institute for Scientific Information Proceedings, and PsycINFO).

Instruments were identified from the searches if their reported design was 1) a generic health or quality of life measure; or 2) to measure the societal consequences of health conditions (whether generic or condition specific), such as handicap instruments; or 3) to measure the consequences of musculoskeletal conditions, excluding those relating to specific anatomic sites (e.g., low back, knee).

Phase 2: Selection of instruments.

A single observer (RW) reviewed all identified instruments and excluded from further evaluation those that 1) were not designed to be self completed by the study participants; or 2) consisted entirely of items that could not capture participation (e.g., “How many relatives do you have?” “Have you had knee pain in the last month?”).

Phase 3: Rating of all items by experienced and inexperienced assessors.

To determine the proportion of items within the selected health measurement instruments that might measure participation, 2 assessors (A and B) independently rated, separately and blind to the title of the instrument, each item from the selected instruments for its ability to capture participation. These assessors were deemed to be experienced in that they had been working on the development of a population survey based on the ICF framework and had gained an in-depth knowledge of the participation concept.

To ascertain whether different observers with varying levels of knowledge of the ICF framework and the participation concept could consistently identify measures of participation, an additional 2 assessors (C and D) rated each of the items in the selected questionnaires for their ability to measure participation. They again worked separately and blind to the title of the instrument. Assessor C was a health survey researcher with a nominal amount of knowledge of the ICF who had never applied the concept in studies. Assessor D was a clinical trials coordinator who had no previous knowledge of the ICF.

When rating each item, assessors used a 3-category scale: Yes, item captures participation information; unsure if it does; and no, it does not.

Phase 4: Statistical analysis of agreement between assessors.

Agreement in rating items was determined separately for experienced (A and B) and inexperienced (C and D) assessors. Agreement was assessed for each item individually and for each instrument as a whole, by actual agreement (%; proportion of all items assessed where there was agreement). Agreement for all items was assessed by actual agreement and by agreement beyond chance (κ). Levels of agreement were compared between experienced and inexperienced pairs using actual agreement (%) and by comparing their respective kappas using a chi-square test (17).
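The two agreement statistics described above can be sketched as follows. This is a minimal illustration that assumes the standard unweighted Cohen's κ for 2 raters; the function names and the 10 toy ratings are hypothetical and are not taken from the study data.

```python
from collections import Counter


def actual_agreement(ratings_1, ratings_2):
    """Proportion of items on which the two assessors gave the same rating."""
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(ratings_1, ratings_2)) / len(ratings_1)


def cohen_kappa(ratings_1, ratings_2):
    """Unweighted Cohen's kappa: agreement beyond that expected by chance."""
    n = len(ratings_1)
    p_obs = actual_agreement(ratings_1, ratings_2)
    counts_1, counts_2 = Counter(ratings_1), Counter(ratings_2)
    categories = set(counts_1) | set(counts_2)
    # Chance agreement: product of the two assessors' marginal proportions.
    p_exp = sum(counts_1[c] * counts_2[c] for c in categories) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical ratings of 10 items on the 3-category scale (illustration only).
rater_1 = ["yes", "yes", "no", "no", "no", "unsure", "no", "yes", "no", "no"]
rater_2 = ["yes", "no", "no", "no", "no", "no", "no", "yes", "no", "yes"]

print(f"actual agreement = {actual_agreement(rater_1, rater_2):.0%}")
print(f"kappa = {cohen_kappa(rater_1, rater_2):.2f}")
```

Because most items in a health instrument are likely to be rated "no" by both assessors, actual agreement can be high by chance alone, which is why κ is reported alongside it.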


Potential instruments.

The search for reviews produced a total of 3 books, 3 book chapters, and 16 peer-reviewed articles, highlighting 72 instruments. The search for instruments published after 1997 and a hand search of obtained literature highlighted an additional 3, resulting in a total of 75 instruments that might contain items measuring participation.

Selection of instruments.

Forty-eight instruments were removed because they had not been designed to be self administered or did not contain any items relating to actions, tasks, or life situations. Of the remaining 27 instruments, 17 (15, 18–33) were designed as generic health or quality-of-life measures (criterion 1), 6 (34–39) were designed to measure societal consequences of health conditions (criterion 2), and 4 (13, 40–42) were designed to measure the consequences of musculoskeletal conditions (criterion 3; Table 2). These 27 instruments contained a varying number of items (range 5–136), providing a total of 912 items for rating.

Table 2. Agreement on individual items within the 2 pairs of observers (experienced and inexperienced)*

| Health measurement instrument (reference) | Inclusion criteria | Total number of items | Agreed participation, A vs B no. (%) | Agreed participation, C vs D no. (%) | Agreed not participation, A vs B no. (%) | Agreed not participation, C vs D no. (%) | Disagreed, A vs B no. (%) | Disagreed, C vs D no. (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 30-item screening scale (18) | a | 30 | 3 (10) | 1 (3) | 23 (72) | 19 (63) | 4 (13) | 10 (33) |
| World Health Organization Quality of Life-100 (19) | a | 100 | 18 (18) | 5 (5) | 72 (72) | 75 (75) | 10 (10) | 20 (20) |
| Functional Limitations Profile (20) | a | 136 | 35 (26) | 10 (7) | 80 (59) | 72 (53) | 21 (15) | 54 (40) |
| Self Care Assessment Schedule (21) | a | 10 | 3 (30) | 0 (0) | 3 (30) | 5 (50) | 4 (40) | 5 (50) |
| Functional Status Questionnaire (22) | a | 34 | 15 (44) | 11 (32) | 12 (35) | 15 (44) | 7 (21) | 8 (24) |
| Functional Assessment Screening Questionnaire (23) | a | 15 | 6 (40) | 2 (13) | 7 (47) | 8 (53) | 2 (13) | 5 (33) |
| Self Evaluation of Life Function (24) | a | 54 | 6 (11) | 6 (11) | 40 (74) | 37 (69) | 8 (15) | 11 (20) |
| The Lambeth Disability Screening Questionnaire (25) | a | 22 | 3 (14) | 14 (64) | 11 (50) | 0 (0) | 8 (36) | 8 (36) |
| European Quality of Life Scale (26) | a | 5 | 1 (20) | 1 (20) | 3 (60) | 2 (40) | 1 (20) | 2 (40) |
| Office of Population Census and Surveys Postal Screening Questionnaire—1985 (27) | a | 37 | 1 (3) | 0 (0) | 31 (84) | 29 (78) | 5 (14) | 8 (22) |
| Dartmouth COOP Charts (28) | a | 9 | 2 (22) | 0 (0) | 7 (78) | 6 (67) | 0 (0) | 3 (33) |
| Duke Health Profile (29) | a | 17 | 2 (12) | 2 (12) | 12 (71) | 14 (82) | 3 (18) | 1 (6) |
| Nottingham Health Profile (30) | a | 45 | 7 (16) | 8 (18) | 37 (82) | 33 (73) | 1 (2) | 4 (9) |
| The McMaster Health Index (31) | a | 74 | 9 (12) | 4 (5) | 52 (70) | 50 (68) | 13 (18) | 20 (27) |
| The Quality of Life Index (32) | a | 5 | 2 (40) | 1 (20) | 2 (40) | 2 (40) | 1 (20) | 2 (40) |
| Life Satisfaction Rating Scales (33) | a | 32 | 0 (0) | 0 (0) | 30 (94) | 31 (97) | 2 (6) | 1 (3) |
| Medical Outcomes Study Short-Form 36 (15) | a | 36 | 10 (28) | 2 (6) | 24 (67) | 24 (67) | 2 (6) | 10 (28) |
| Disease Repercussions Profile (34) | b | 6 | 1 (17) | 0 (0) | 2 (33) | 2 (33) | 3 (50) | 4 (67) |
| Craig Handicap Assessment and Reporting Technique (35) | b | 27 | 14 (52) | 6 (22) | 4 (15) | 3 (11) | 9 (33) | 18 (67) |
| New Handicap Scale (36) | b | 11 | 6 (55) | 6 (55) | 4 (36) | 0 (0) | 1 (9) | 5 (45) |
| Reintegration to Normal Living (37) | b | 11 | 9 (82) | 8 (73) | 0 (0) | 2 (18) | 2 (18) | 1 (9) |
| Impact on Participation and Autonomy (38) | b | 23 | 15 (65) | 17 (74) | 0 (0) | 2 (9) | 8 (35) | 4 (17) |
| London Handicap Scale (39) | b | 6 | 3 (50) | 3 (50) | 1 (17) | 1 (17) | 2 (33) | 2 (33) |
| Health Assessment Questionnaire (13) | c | 20 | 2 (10) | 1 (5) | 18 (90) | 13 (65) | 0 (0) | 6 (30) |
| Arthritis Impact and Measurement Scale (40) | c | 100 | 25 (25) | 22 (22) | 69 (69) | 56 (56) | 6 (6) | 22 (22) |
| McMaster-Toronto Arthritis and Rheumatism Questionnaire (41) | c | 32 | 3 (9) | 2 (6) | 25 (78) | 0 (0) | 4 (13) | 30 (94) |
| Arthritis Helplessness Index (42) | c | 15 | 0 (0) | 0 (0) | 15 (100) | 15 (100) | 0 (0) | 0 (0) |
| Totals | | 912 | 201 (22) | 132 (14) | 584 (64) | 516 (57) | 127 (14) | 264 (29) |

* vs = versus; a = the instrument was designed as a generic health or quality of life measure; b = the instrument was designed to capture the societal consequences of health conditions; c = the instrument was designed to capture the consequences of musculoskeletal conditions; COOP = Cooperative Information Project.

Agreement by 2 experienced assessors.

Of the 912 items rated, the number of items considered by each assessor (A and B) to be measuring participation was 286 (31%) and 215 (24%), respectively (Table 3). Complete agreement between assessors occurred for 785 (86%) of the items rated (201 deemed to capture participation, 584 deemed not to, and none where both assessors were unsure). This level of agreement gave a κ of 0.70 (95% confidence interval [95% CI] 0.65–0.75).

Table 3. Rating of all items into 3 categories by the 4 assessors

| Assessor | Participation, no. of items (% of items) | Unsure, no. of items (% of items) | Nonparticipation, no. of items (% of items) |
| --- | --- | --- | --- |
| A | 286 (31) | 1 (0) | 625 (69) |
| B | 215 (24) | 75 (8) | 622 (68) |
| C | 272 (30) | 72 (8) | 568 (62) |
| D | 187 (21) | 42 (5) | 683 (75) |
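As a worked check, the κ reported for the experienced pair can be reproduced from the category totals in Table 3 and the agreed counts given in the text. The sketch below assumes unweighted Cohen's κ and a simple large-sample standard error; the article does not state which variance formula was used, so the interval calculation is an assumption, and the function name is hypothetical.

```python
import math

N_ITEMS = 912  # items rated by every assessor


def kappa_from_counts(agreed, totals_1, totals_2, n):
    """Cohen's kappa from per-category agreed counts and each assessor's totals."""
    p_obs = sum(agreed) / n
    p_exp = sum(a * b for a, b in zip(totals_1, totals_2)) / (n * n)
    kappa = (p_obs - p_exp) / (1 - p_exp)
    # Simple large-sample standard error (assumed approximation, see lead-in).
    se = math.sqrt(p_obs * (1 - p_obs) / (n * (1 - p_exp) ** 2))
    return p_obs, kappa, (kappa - 1.96 * se, kappa + 1.96 * se)


# Category order: participation, unsure, nonparticipation.
assessor_a = (286, 1, 625)   # Table 3
assessor_b = (215, 75, 622)  # Table 3
agreed_ab = (201, 0, 584)    # agreed counts reported in the text

p_obs, kappa, (ci_low, ci_high) = kappa_from_counts(
    agreed_ab, assessor_a, assessor_b, N_ITEMS
)
print(f"agreement = {p_obs:.0%}, kappa = {kappa:.2f}, "
      f"95% CI {ci_low:.2f}-{ci_high:.2f}")
```

Under these assumptions the calculation returns 86% actual agreement and a κ of 0.70 with an interval of 0.65 to 0.75, matching the reported values.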

The proportion of items for which there was complete agreement between assessors ranged from 50% (Disease Repercussions Profile) to 100% (Arthritis Helplessness Index, Health Assessment Questionnaire, Dartmouth Cooperative Information Project Charts; Table 2). The highest proportions of agreed participation items were found in instruments designed to capture societal consequences of health conditions. In 5 of these instruments, 50% or more of the items were agreed to be measuring participation: the Reintegration to Normal Living Index (82%), the Impact on Participation and Autonomy Questionnaire (65%), the New Handicap Scale (55%), the Craig Handicap Assessment and Reporting Technique (52%), and the London Handicap Scale (50%). The Disease Repercussions Profile, also designed to capture societal consequences, contained only 1 agreed participation item out of the 6 rated.

Generic health instruments tended to contain a smaller proportion of agreed participation items (i.e., 10 items [28%] from the Medical Outcomes Study Short Form 36, and 35 items [26%] from the Functional Limitations Profile). Of the instruments designed to capture the consequences of musculoskeletal conditions, the Arthritis Impact Measurement Scale (version 2) contained the greatest proportion (25%) of instrument items agreed as measuring participation.
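The proportions quoted for the societal-consequence instruments follow directly from the item counts in Table 2. The sketch below recomputes and ranks them for the experienced pair (A vs B); the counts are copied from Table 2, while the variable and function names are hypothetical.

```python
# (instrument, total items, items agreed as participation by assessors A and B)
societal_instruments = [
    ("Disease Repercussions Profile", 6, 1),
    ("Craig Handicap Assessment and Reporting Technique", 27, 14),
    ("New Handicap Scale", 11, 6),
    ("Reintegration to Normal Living", 11, 9),
    ("Impact on Participation and Autonomy", 23, 15),
    ("London Handicap Scale", 6, 3),
]


def participation_share(total_items, agreed_items):
    """Percentage of an instrument's items agreed to capture participation."""
    return 100 * agreed_items / total_items


# Rank instruments from the highest to the lowest share of participation items.
ranked = sorted(
    ((name, participation_share(total, agreed))
     for name, total, agreed in societal_instruments),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, share in ranked:
    print(f"{name}: {share:.0f}%")
```

Even the highest-ranked instrument falls short of 100%, which is the basis for the conclusion that no instrument consists entirely of participation items.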

Agreement between 2 inexperienced assessors.

The number of items considered by each assessor (C and D) to be measuring participation was 272 (30%) and 187 (21%), respectively (Table 3). Complete agreement between assessors occurred in 653 (72%) of all items (132 deemed to capture participation, 513 deemed not to, and 8 about which both assessors were unsure). This gave a κ of 0.40 (95% CI 0.34–0.46). The agreement within instruments ranged from 6% (McMaster-Toronto Arthritis Patient Function Preference Questionnaire) to 100% (Arthritis Helplessness Index).

The pattern of agreement was similar to that of the experienced assessors. Four of the 6 instruments designed to capture societal functioning contained 50% or more items agreed by the inexperienced assessors as measuring participation: the Impact on Participation and Autonomy questionnaire (74%), the Reintegration to Normal Living Index (73%), the New Handicap Scale (55%), and the London Handicap Scale (50%). The Disease Repercussions Profile contained no items agreed to measure participation. In contrast to the experienced assessors, the inexperienced assessors agreed that >50% of the items in the Lambeth Disability Screening Questionnaire (64%) measured participation.

Agreement between the experienced and inexperienced assessors.

Complete agreement between the experienced and the inexperienced assessors occurred in 556 (61%) of the items. There was agreement that 97 items measured participation and that 459 did not (Table 4). Of the 201 items considered by both of the experienced assessors to measure participation, there were 104 items about which one or both of the inexperienced assessors disagreed. Of 584 items considered by both experienced assessors not to measure participation, there were 125 items about which one or both of the inexperienced assessors disagreed. Kappa scores for agreement beyond chance within experienced and inexperienced pairs were significantly different from each other (P < 0.001).
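The comparison of the two κ values can be illustrated with a simple chi-square statistic on 1 degree of freedom. This is a sketch under stated assumptions, not the authors' exact procedure (the text cites reference 17 without giving the formula): the standard errors are back-calculated from the reported 95% CIs, and the two κ estimates are treated as independent even though both pairs rated the same 912 items. Function names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Back-calculate a standard error from a reported 95% confidence interval."""
    return (upper - lower) / (2 * z)


def compare_kappas(kappa_1, se_1, kappa_2, se_2):
    """Chi-square (1 df) for the difference between 2 kappas, assumed independent."""
    chi2 = (kappa_1 - kappa_2) ** 2 / (se_1 ** 2 + se_2 ** 2)
    # Upper-tail probability of a chi-square variate with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


se_experienced = se_from_ci(0.65, 0.75)    # kappa = 0.70
se_inexperienced = se_from_ci(0.34, 0.46)  # kappa = 0.40

chi2, p_value = compare_kappas(0.70, se_experienced, 0.40, se_inexperienced)
print(f"chi-square = {chi2:.1f}, P < 0.001: {p_value < 0.001}")
```

With these inputs the statistic is far above the 1-df critical value for P = 0.001 (10.83), which is consistent with the reported significance level.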

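Table 4. Agreement between observers, experienced compared with inexperienced

| A versus B “experienced” | C versus D “inexperienced”: agreed participation | C versus D “inexperienced”: agreed not participation | C versus D “inexperienced”: disagreed | Totals |
| --- | --- | --- | --- | --- |
| Agreed participation | 97 | 14 | 90 | 201 |
| Agreed not participation | 7 | 459 | 118 | 584 |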
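The figures quoted in the text for agreement between the two pairs can be recovered from the two rows of Table 4 shown above. The sketch below uses only those cells; the dictionary layout and variable names are hypothetical.

```python
# Rows: experienced pair (A vs B). Columns: inexperienced pair (C vs D), in the
# order agreed participation, agreed not participation, disagreed.
table_4 = {
    "agreed participation": (97, 14, 90),
    "agreed not participation": (7, 459, 118),
}
N_ITEMS = 912

row_totals = {label: sum(cells) for label, cells in table_4.items()}

# Items on which both pairs reached the same agreed rating.
full_agreement = (
    table_4["agreed participation"][0] + table_4["agreed not participation"][1]
)

# Items agreed by the experienced pair but not confirmed by the inexperienced pair.
unconfirmed_participation = (
    row_totals["agreed participation"] - table_4["agreed participation"][0]
)
unconfirmed_nonparticipation = (
    row_totals["agreed not participation"] - table_4["agreed not participation"][1]
)

print(row_totals)
print(f"{full_agreement} items ({full_agreement / N_ITEMS:.0%}) agreed by both pairs")
print(unconfirmed_participation, unconfirmed_nonparticipation)
```

The row totals (201 and 584), the 556 items (61%) agreed by both pairs, and the 104 and 125 unconfirmed items all match the values given in the text.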


We have reasoned that measuring participation in population-based studies is important for describing the personal and social consequences of joint pain in older adults. In this study, we sought to determine the extent to which existing health measurement instruments might be used in a population survey to measure levels of participation associated with joint pain. We identified a number of instruments that, by design, had the potential to be applied in this way. We rated individual items to determine the proportion within each instrument that captured participation and used this to indicate the extent to which instruments and their scale scores exclusively represent participation. Our findings demonstrate that participation coverage within these measures is variable, and there were no instruments that consisted entirely of participation items. Instruments with high proportions of relevant items may give reasonable estimates of participation frequency. However, these instruments were not designed to capture participation and contain items that do not measure participation. The removal of items that do not capture participation may affect the validity of these instruments, and further validation studies would be required prior to any application as participation measures in the population.

Instruments designed to capture societal consequences, consistent with the handicap concept, contained the greatest proportion of participation items. This was expected, as the conceptual models of handicap and participation are similar. The conceptual basis is important to consider when selecting an instrument because the underlying theory is shaping the measurement (43). Instruments designed to measure generic health status, quality of life, or the consequences of musculoskeletal conditions contained a smaller proportion of participation items. These instruments were not designed specifically to reflect the conceptual models of handicap or participation, and their ability to measure them has been questioned (44, 45).

The Reintegration to Normal Living Index contained the greatest proportion of participation items (9 of 11). It was designed as a clinical tool to measure levels of global function and has been applied in survey questionnaires (37, 46). The second highest proportion was in the Impact on Participation and Autonomy (15 of 23), designed as a handicap questionnaire (38). Both questionnaires contained items that were not rated as participation items. This may reflect differences between the concepts of participation and handicap. Measurement of handicap has proven challenging, with the small number of instruments and lack of uniformity between them suggesting the concept is underdeveloped (34, 45). It remains to be seen if similar problems occur with participation.

Our findings were based on an analysis of instruments that might contain participation items. These were identified by a comprehensive and reproducible search of published literature and application of standard criteria to include only those that could be applied in population surveys to measure participation in relation to joint pain. We know of no other instruments for inclusion. Those such as the Western Ontario McMaster Universities Osteoarthritis Index (47), the Roland and Morris Low Back Questionnaire (48), and the Disabilities of the Arm, Shoulder and Hand outcome measure (49) were not included because they were designed to capture consequences of site-specific conditions. The WHO Disability Assessment Schedule (50) was published after our searches were performed. Other instruments with participation items (e.g., Assessment of Life Habits [51]) were not designed to be self completed.

In the absence of a gold standard, we used agreement between the experienced assessors to determine the proportion of items within each instrument that captured participation. There are no other studies with which to compare this level of agreement. However, the level of agreement beyond chance for the experienced assessors was significantly higher than for the inexperienced assessors. Intra-assessor variation was not assessed. We cannot therefore conclude whether the disagreement within the inexperienced pair or within the experienced pair was reflecting general inconsistencies or systematic differences. However, agreement beyond chance between the inexperienced assessors was poor compared with that achieved by the experienced pair. This suggests that greater experience and familiarization with the WHO concepts lead to a greater common understanding and certainty in the application of these concepts.

To guide the development of the conceptual and measurement models of participation, the main sources of disagreement were analyzed. Assessors found it difficult to determine if actions, tasks, or life situations occurred at the level of the individual (i.e., activities) or society (i.e., participation), an issue that also affected the application of the ICIDH (52). For example, with the item “Are you able to use the telephone?” (Self-Evaluation of Life Function question 3 [24]), one assessor from each pair assessed the item as measuring participation, whereas the other assessor did not. Disagreement concerned whether the telephone is a contextual factor that affects the fulfillment of the task.

There were 97 items that the experienced and the inexperienced observers agreed captured participation, and these provide examples of participation items likely to be recognized more straightforwardly by any observer. Such items did not refer to specific actions, tasks, or life situations, but captured a collection of them, similar to the domain and subdomain headings presented in the ICF. For example, “I participate in social activities with family, friends, and/or business acquaintances as is necessary or desirable to me” (Reintegration to Normal Living Index question 7 [37]).

The level of disagreement between the inexperienced assessors raises questions about the ability of the participation concept to be quickly understood and easily applied. It suggests that WHO literature does not lead to a consistent understanding of the concept, which may be why it is measured in a number of ways. The WHO, in introducing the ICF, does not present a gold standard for measurement or specific strategies of measurement, but instead presents a base that requires tailoring to meet specific needs (53) as well as a description of the concept (54). To apply the concept for research purposes, further development and direction are required.

In conclusion, there appear to be no existing health measurement instruments consisting entirely of participation items. It may be appropriate to develop both the concept and its translation into measurement to capture levels of participation in the general population.


The authors acknowledge Dr. Clare Jinks and Jonathan Hill for their participation in the study.

All terms are defined by the World Health Organization (9).




Activity.

The execution of tasks or actions by an individual. Negative aspect is activity limitation (formerly “disability” in the International Classification of Impairments, Disabilities, and Handicaps [ICIDH]), which is defined as difficulty an individual may have in executing a task, e.g., difficulty walking 100 yards.

Body structures.

Anatomic parts of the body, such as organs, limbs, and their components. Negative aspect is impairment, which is defined as a problem with body structure, e.g., radiographic joint space narrowing or osteophyte formation.

Body functions.

The physiologic functions of body systems (including psychological functions). Negative aspect is impairment, which is defined as a problem with body function, e.g., limited joint range of movement, loss of muscle strength.

Contextual factors.

These represent the complete background to an individual's life and living. They include 2 components: environmental factors and personal factors, which may have an impact on the individual with a health condition and that individual's health and health-related status.

Environmental factors.

These make up the physical, social, and attitudinal environment in which people live and conduct their lives, e.g., poor access to public transport, health care professionals' attitudes to osteoarthritis and aging.


Functioning.

An umbrella term encompassing all body functions, activities, and participation. Negative aspect is disability, which is an umbrella term for impairment, activity limitation, and participation restriction.

Major life areas.

A domain of functioning that covers the tasks, actions, and life situations required to engage in education, work, and employment and to conduct economic transactions.


Participation.

Involvement in life situations. Negative aspect is participation restriction (formerly “handicap” in ICIDH), which is defined as problems an individual may experience in involvement in life situations, e.g., difficulties obtaining/retaining paid employment.

Personal factors.

These are the particular background of an individual's life and living, and comprise features of the individual that are not part of a health condition or health status, e.g., age, sex, coping style.