Development and validation of a German version of the joint protection behavior assessment in patients with rheumatoid arthritis


Abstract

Objective

Joint protection (JP) is an important part of the treatment concept for patients with rheumatoid arthritis (RA). The Joint Protection Behavior Assessment short form (JPBA-S) assesses the use of hand JP methods by patients with RA while preparing a hot drink. The purpose of this study was to develop a German version of the JPBA-S (D-JPBA-S) and to test its validity and reliability.

Methods

A manual was developed through consensus with 8 occupational therapist (OT) experts as the reference for assessing patients' JP behavior. Twenty-four patients with RA and 10 healthy individuals were videotaped while performing 10 tasks reflecting the activity of preparing instant coffee. Recordings were repeated after 3 months for test–retest analysis. One rater assessed all available patient recordings (n = 23, recorded twice) for test–retest reliability. The video recordings of 10 randomly selected patients and all healthy individuals were independently assessed for interrater reliability by 6 OTs who were explicitly asked to follow the manual. Rasch analysis was performed to test construct validity and transform ordinal raw data into interval data for reliability calculations.

Results

Nine of the 10 tasks fit the Rasch model. The D-JPBA-S, consisting of 9 valid tasks, had an intraclass correlation coefficient of 0.77 for interrater reliability and 0.71 for test–retest reliability.

Conclusion

The D-JPBA-S provides a valid and reliable instrument for assessing JP behavior of patients with RA and can be used in German-speaking countries.

INTRODUCTION

Individuals with rheumatoid arthritis (RA) experience physical impairment and functional limitations, even though impressive advances in drug treatment have been achieved (1). A multidisciplinary approach in the management of RA is important, with physiotherapy and occupational therapy aiming at maintaining or improving independence and quality of life (2, 3).

Hand involvement during the course of the disease occurs inevitably. Within 5 years of onset, finger and wrist joints are affected (4) and destruction of the dominant hand is more frequently observed (5). Joint protection (JP) is therefore an important intervention. Principles of JP have been developed based on anatomic and biomechanical research to guide occupational therapists (OTs) in their work with patients with RA (6), e.g., altering working methods (use of proximal joints, dynamic activities), energy conservation (balance between activity and rest), and using assistive devices.

JP has beneficial short-term effects on pain and function in patients with established RA and moderate functional problems (7, 8). Using assistive devices reduces pain during task performance (7) and altering working methods reduces difficulties in activities of daily living (8). If JP is taught using behavioral education methods, it can also have a long-term impact on reducing pain and maintaining function for individuals with less than 5 years' disease duration (9–11). JP is also taught as a preventive intervention to patients with recent disease onset. However, its effectiveness at this early stage has not yet been convincingly demonstrated (12).

Discrepancy has been found between self-reported and observed JP behavior (8). An assessment instrument that systematically and objectively evaluates effectiveness of JP interventions is needed. Hammond and Lincoln developed and evaluated the Joint Protection Behavior Assessment (JPBA) (13) assessing JP behavior while preparing instant coffee and a snack, because most JP methods that are taught clinically focus on protecting hand and wrist joints during kitchen activities. In the JPBA, 5 JP principles (6) are assessed while performing finger-wrist activities: reducing effort using labor-saving gadgets and assistive devices, avoiding lifting, and having good workplace organization; distributing load over several joints; using joints in stable positions; using stronger, larger (proximal) joints; and avoiding positions of deformity.

The original JPBA consists of 20 tasks integrating these JP principles. Several aspects of validity and reliability of the JPBA have been extensively examined (13, 14) and the JPBA has been used in clinical studies (9–11, 15, 16). A short version of the JPBA (JPBA-S), consisting of 10 tasks for the activity of preparing instant coffee, has been found to be reliable compared with the full-length JPBA (Spearman's correlation 0.94) (14). The cultural adaptation and validation of a German version was therefore based on the short form. Preparing instant coffee is also a common activity in Switzerland and requires little time, which minimizes the risk of individuals with RA becoming fatigued.

The purpose of the present study was to cross-culturally adapt the original JPBA-S to a German (Deutsch) version (D-JPBA-S) and to develop an assessment manual and assess the psychometric properties in a German-speaking Swiss population with RA. Before assessing interrater, intrarater, and test–retest reliability, special attention was paid to the construct validity of the D-JPBA-S.

PARTICIPANTS AND METHODS

Participants.

Eight OTs from different hospital rheumatology departments in Zurich, Switzerland who were experienced in treating patients with RA and teaching JP were invited to participate in the development of the German manual. We videotaped the JP performance of 24 patients who were consecutively recruited from the outpatient facility of the Department of Rheumatology, University Hospital Zurich between June and July 2004. All patients fulfilled the 1987 American College of Rheumatology (formerly the American Rheumatism Association) classification criteria for RA (17); were receiving stable disease-modifying antirheumatic drug (DMARD) treatment, including anti–tumor necrosis factor treatment, steroids, and nonsteroidal antiinflammatory drugs, for at least 4 weeks; and had mild to moderate disease activity (Disease Activity Score in 28 joints [DAS28] <5.1). All participants had attended at least 1 JP instruction session since onset of the disease. Patients with severe RA and functional limitations preventing JP behavior or independent task performance were excluded. All but 1 patient (due to exacerbation of comorbidities) participated in the test–retest recordings after 2–3 months (n = 23). All patients had stable disease activity during this period. Self-perceived disease activity, measured with the Rheumatoid Arthritis Disease Activity Index (RADAI), and erythrocyte sedimentation rate remained unchanged (Table 1). However, the DMARD dose was slightly increased in 2 patients. Ten non–health professional employees of the University Hospital without health problems, matched in age and sex to the RA group, participated as controls (Table 1). The local research ethics committee approved the study protocol and all individuals provided informed consent prior to participation. Six therapists (4 OTs and 2 physiotherapists [PTs]), recruited from different rheumatology departments in Zurich, assessed the video recordings.

Table 1. Demographic and clinical characteristics of study participants*
Characteristic | Healthy controls (n = 10) | RA patients (n = 23), baseline | RA patients (n = 23), 3 months | Correlation with D-JPBA-S (R) at baseline
Women/men, no. | 7/3 | 18/5 | |
Age, median (IQR) years | 57 (47–63) | 63 (47–70) | |
Disease duration, median (IQR) years | NA | 11 (7–18) | |
Hochberg functional class, median (IQR) | NA | 2 (1–4) | NM | 0.44
DMARDs, no. of patients | NA | 22 | 22 |
Steroids, no. of patients | NA | 11 | 11 |
NSAIDs, no. of patients | NA | 10 | 10 |
DAS28 | NA | 3.2 ± 1.5 | NM | 0.45
ESR, mm/hour | NA | 14.0 ± 11.8 | 17.0 ± 16.5 | 0.42§
RADAI | NA | 2.6 ± 1.6 | 2.6 ± 1.7 | 0.43
Hand pain (RADAI) | NA | 1.0 ± 0.95 | 0.8 ± 0.97 | 0.57
HAQ score | NA | 1.3 ± 0.6 | 1.4 ± 0.6 | 0.42
General health (HAQ), median (IQR) | NA | 7 (5–8) | 6 (5–9) | −0.55
JAM, median (IQR)# | NA | 2 (1–3) | 1 (1–3) | 0.38§
Grip strength, median (IQR)# | NA | 16 (6–24.5) | 13 (7–26) | −0.63
ROM wrist flexion# | NA | 48.2 ± 26.1 | 57.6 ± 23.1 | 0.03
ROM wrist extension# | NA | 36.3 ± 17.7 | 43.9 ± 19.9 | −0.05

* Values are the mean ± SD unless otherwise indicated. RA = rheumatoid arthritis; D-JPBA-S (R) = German version of the revised Joint Protection Behavior Assessment short form; IQR = interquartile range; NA = not applicable; NM = not measured; DMARDs = disease-modifying antirheumatic drugs; NSAIDs = nonsteroidal antiinflammatory drugs; DAS28 = Disease Activity Score in 28 joints; ESR = erythrocyte sedimentation rate; RADAI = Rheumatoid Arthritis Disease Activity Index; HAQ = Health Assessment Questionnaire; JAM = Joint Alignment and Motion scale; ROM = range of motion.
P < 0.0001.
No significant change between baseline and 3 months.
§ P < 0.05.
P < 0.001.
# Of dominant hand.

Manual development.

Face validity.

The tasks contained in the UK version of the JPBA-S were checked for cultural applicability because of potential differences in equipment used in Switzerland. As a result, “putting in an electric plug” was removed, because the different plug design means that this is not a physically difficult task for Swiss patients with RA. It was replaced by “opening a milk pack,” a convenient alternative that applies several JP principles and is frequently used in JP education. The tasks of filling, carrying, and pouring from a kettle were replaced with holding, carrying, and pouring from a pan (optionally an electric kettle) because saucepans are more commonly used in Switzerland to boil water. We anticipated that both men and women performed all 10 tasks as routine daily activities. To determine face validity, JP literature and the UK JPBA manual were reviewed to identify which JP principles were being applied during task performance (13). Several principles can be applied to each task depending on the method of performance. There are some differences from the UK JPBA-S because some tasks are performed differently in Switzerland (Table 2).

Table 2. Face and content validity of the D-JPBA-S*
D-JPBA-S tasks | Joint protection principles (face validity) 1 | 2 | 3 | 4 | 5

* D-JPBA-S = German version of the Joint Protection Behavior Assessment short form. √ = joint protection principle fulfilled.

Principles: 1 = reducing effort by using aids, using assistive devices, and avoiding lifting, as well as good organization of workplace; 2 = distributing load over several joints; 3 = using joints in stable positions; 4 = use of strongest, largest (proximal) joints; 5 = avoiding positions of deformity.

Omitted after Rasch analysis.

Turn on water tap
Hold pan  
Turn off water tap
Carry full pan  
Open coffee jar 
Close coffee jar 
Pour hot water into cups 
Open milk pack   
Hold milk pack to pour milk 
Carry full cup(s)  

Content validity.

All methods of performing the 10 tasks were described in behavioral codes. These codes either described normal hand use (i.e., as performed by healthy individuals) or hand use consistent with joint protective adaptation in patients with RA. We translated all codes of the selected tasks from the UK JPBA-S and added new codes found in German leaflets or books about JP, reported by experienced rheumatology OTs, and identified in the video recordings of individuals with and without RA. In total, 91 behavioral descriptions were generated for the 10 tasks, between 6 and 11 for each. A draft manual containing these descriptions (illustrated with photographs to ensure understanding) was developed and sent to 8 OT experts. They were asked individually to score each code as a correct, partially correct, or incorrect JP behavior for patients with mild to moderate RA with wrist and hand involvement but without severe finger, hand, elbow, or shoulder deformities, because these can lead to difficulty performing common JP methods and require more idiosyncratic solutions.

Final scores allocated to the behavioral codes were based on the preliminary decision that consensus about being a correct or incorrect method by at least 6 of the 8 expert OTs was necessary. Descriptions that did not achieve this level of consensus were scored as partially correct. The manual was used as reference for assessing the video recordings.
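The consensus rule described above can be sketched as a small function. This is only an illustration of the stated rule (at least 6 of 8 experts agreeing on correct or incorrect, otherwise partially correct); the label strings and the function name are assumptions made for the example.

```python
from collections import Counter

EXPERTS = 8      # number of expert OTs consulted (from the text)
CONSENSUS = 6    # minimum number of agreeing experts (from the text)

def final_score(votes):
    """Return the final label for one behavioral code.

    votes: list of 8 labels, each 'correct', 'partial', or 'incorrect'
    (hypothetical label names chosen for this sketch).
    """
    if len(votes) != EXPERTS:
        raise ValueError("expected one vote per expert")
    counts = Counter(votes)
    for label in ("correct", "incorrect"):
        if counts[label] >= CONSENSUS:
            return label
    # No sufficient consensus on correct/incorrect: scored as partially correct.
    return "partial"

print(final_score(["correct"] * 6 + ["partial"] * 2))
print(final_score(["correct"] * 5 + ["incorrect"] * 3))
```

With 8 votes, the two consensus outcomes cannot both reach 6, so the order in which they are checked does not matter.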

Video recordings and additional measures.

Video recordings were completed in kitchen facilities of the University Hospital Zurich. Participants were asked to use the same styles of faucets, containers for boiling water, milk packs, and assistive devices they normally used to ensure the assessment situation was as similar as possible to their home. All utensils were heavy enough to offer sufficient resistance to require a JP response from participants with RA. Participants were asked to make 2 cups of coffee in the same sequence and manner as they would normally do at home. They were kept unaware of the true purpose of the video recordings to reduce socially desirable responses. They were informed that the video camera would only focus on their hands and not their faces to preserve anonymity. Light conversation continued during the video recording to distract participants from consciously paying attention to their hand movements. The assessment was repeated after 3 months. Video recordings were transferred to Pinnacle Instant CD/DVD 8.0 software (Pinnacle Systems, Mountain View, CA) and were edited on compact discs for assessments.

The following parameters were measured for patients with RA: physical functional ability was assessed using the Hochberg functional classes (18) and the Health Assessment Questionnaire (HAQ), a disease-specific self-administered 20-item questionnaire (19); self-perceived disease activity and typical RA symptoms such as pain and morning stiffness were assessed using the RADAI, a self-administered 5-item questionnaire (20); general health status was assessed using a 10-cm visual analog scale with the end points bad and excellent; impairment of the dominant hand was measured using a goniometer (for active wrist joint range of motion), the Joint Alignment and Motion Scale (for finger and wrist joint deformity) (21), and a Jamar hand dynamometer (Lafayette Instrument Company, Lafayette, IN) (for grip strength) (22); and disease activity was assessed using the DAS28, calculated from the results of a 28 tender joint count, a 28 swollen joint count, and erythrocyte sedimentation rate (23).

Assessment procedures.

Discriminant validity.

Assessments were performed with patients with RA and healthy individuals to determine if their behaviors differed regarding JP.

Cross-sectional validity.

JP behavior within the RA group was correlated with functional impairment (assessed with the HAQ) and hand pain (assessed with RADAI pain items).

Reliability assessments.

Four OTs and 2 PTs independently assessed JP performances of 10 randomly selected patients with RA and all 10 healthy participants (interrater reliability). Two random duplicate video recordings of 2 patients (patients A and B) were included to determine intrarater reliability. Raters were blinded to the presence of duplicates. These duplicates were reassessed 4 weeks later by all raters, thus simulating the clinical situation of OTs reassessing their patients. One of the 6 raters assessed the video recordings of all 23 patients at both time points. The raters were asked to strictly follow the manual to minimize observer drift while assessing.

Rasch analysis.

The Rasch model reverses the traditional view of the data-model relationship, i.e., data must fit the model, meaning that the observed frequencies should not differ too much from expected values (24). Rasch model theory states that response probabilities change as a function of participant ability and item difficulty (expressed as logits), i.e., the probability that a person with a logit score of 1.0 will pass an item with a difficulty of 1.0 logit is 50%, but the probability that he or she will pass an item with a difficulty >1 logit or <1 logit is <50% or >50%, respectively. Rasch models provide various error estimates and fit statistics, especially for testing unidimensionality (i.e., if indeed a single dominant trait is being measured) and scale additivity (i.e., the probability that difficult items are only passed by high-scoring participants whereas less-able participants only pass easier items). This particularly allows gathering of further evidence of the construct validity of a measure. Each item and person is calibrated to provide a difficulty estimate and an ability estimate, respectively, of the location on an abstract linear continuum from less to more, thus providing an equal interval scale representing the variable, in this case, JP behavior.
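As a minimal sketch of the relationship described above, the dichotomous Rasch model gives the probability of passing an item as a logistic function of the difference between person ability and item difficulty (both in logits). The function name is chosen for this example only.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of passing a dichotomous item under the Rasch model.

    Both arguments are in logits; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

print(rasch_probability(1.0, 1.0))   # equal ability and difficulty: 0.5
print(rasch_probability(1.0, 2.0))   # harder item: below 0.5
print(rasch_probability(1.0, 0.0))   # easier item: above 0.5
```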

Statistical analysis.

The Rasch Partial Credit Model was applied (25), because the steps (thresholds) between the adjacent scores (incorrect/partially correct/correct = 0/1/2) might be different across tasks. The raters were also accounted for as a person factor to control for bias. Complete data for all 120 ratings were available, and 90 ratings without extreme scores (0 points) (26) were analyzed for construct validity.
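To illustrate why item-specific thresholds matter, the following sketch computes category probabilities under the Partial Credit Model for a single item. It uses the standard formulation in which the log-odds of adjacent categories equal ability minus the corresponding threshold; the threshold values are hypothetical and are not estimates from this study.

```python
import math

def pcm_probabilities(ability, thresholds):
    """Category probabilities for one item under the Partial Credit Model.

    ability: person location in logits.
    thresholds: item-specific step locations in logits; two thresholds
                give the three categories 0/1/2 used in the text.
    """
    # Cumulative sums of (ability - threshold); category 0 has an empty sum.
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (ability - delta))
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical thresholds for illustration only.
probs = pcm_probabilities(ability=0.0, thresholds=[-1.0, 1.5])
print([round(p, 3) for p in probs])
```

With a single threshold, the function reduces to the dichotomous Rasch model, which is the situation for tasks whose scoring categories are collapsed.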

Individual item fit to the model was examined with α at 5%. To reach overall probability in the 10-item D-JPBA-S testing, Bonferroni correction was used throughout and therefore the significance values were set at 0.005 (27).
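The adjusted significance level follows from dividing α by the number of items (0.05 / 10 = 0.005). The sketch below applies that threshold to the initial chi-square probabilities reported in Table 3. Exact fractions are used so that the borderline value 0.005 is not misclassified by floating-point rounding; note that the reported probabilities are themselves rounded to 3 decimals.

```python
from fractions import Fraction

ALPHA = Fraction("0.05")
N_ITEMS = 10
threshold = ALPHA / N_ITEMS          # Bonferroni-adjusted level: 0.005

# Initial chi-square probabilities as reported in Table 3 (tasks 1-10).
initial_p = ["0.002", "0.000", "0.574", "0.003", "0.040",
             "0.014", "0.030", "0.005", "0.014", "0.128"]

# A task misfits when its probability is strictly below the threshold.
misfitting = [task for task, p in enumerate(initial_p, start=1)
              if Fraction(p) < threshold]
print(float(threshold), misfitting)  # 0.005 [1, 2, 4]
```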

All reliability tests were performed for the original D-JPBA-S (D-JPBA-S [O]; i.e., using ordinal raw scores for all 10 tasks) and the revised D-JPBA-S (D-JPBA-S [R]; i.e., using linear data of all tasks fitting the Rasch model). Intraclass correlation coefficients (ICCs) were calculated using 2-way random-effects models and the consistency definition for all reliability measures. The ICC provides information on the ability of ≥2 observers to differentiate between subjects. For interrater reliability, we expected an ICC(2,6) of ∼0.80. For intrarater reliability, we expected an ICC(2,1) of ∼0.80. To represent a real change in clinical practice and research, a test–retest change determined by a specific measurement must be at least as large as the smallest detectable difference (SDD), which is calculated as follows: SDD = 1.96 × √(2 × SEM²), where SEM (standard error of measurement) is SD × √(1 − r) and r is the reliability coefficient (28). Pearson's correlation coefficients were calculated to measure associations between the D-JPBA-S (R) data and disease-specific data; the Mann-Whitney U test was used to test differences between healthy individuals and patients with RA. Rasch analysis was performed using the Rasch Unidimensional Measurement Model RUMM2020 software package (RUMM Laboratory, Duncraig, Western Australia). All ICC calculations and statistical testing were performed using the SPSS software, version 12.0 (SPSS, Chicago, IL).
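For readers unfamiliar with consistency-type ICCs, the following sketch derives the single-rater and average-of-k-raters coefficients from two-way ANOVA mean squares. It is an illustration under stated assumptions, not a reproduction of the SPSS computation used in the study; the ratings table is hypothetical.

```python
def icc_consistency(ratings):
    """Two-way consistency ICCs from an n-subjects x k-raters table.

    Returns (single-rater ICC, average-of-k-raters ICC).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into subjects, raters, and error.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    average = (ms_rows - ms_error) / ms_rows
    return single, average

# Hypothetical scores: 4 subjects rated by 2 raters.
toy = [[1, 2], [2, 2], [3, 4], [4, 3]]
single, average = icc_consistency(toy)
print(round(single, 3), round(average, 3))
```

Under the consistency definition, a constant offset between raters does not lower the coefficient, because systematic rater differences are removed from the error term.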

RESULTS

Content validity of the D-JPBA-S.

Agreement on the scores between ≥6 of the 8 OT experts was achieved for 53 of the 91 behavioral descriptions (58%), with 22 descriptions scored as correct and 31 as incorrect. There was insufficient agreement on 38 descriptions, which were therefore scored as partially correct.

Construct validity of the D-JPBA-S using Rasch analysis.

Examining fit of the 10 D-JPBA-S tasks to the Rasch model revealed that task 1 (turn on tap), task 2 (hold pan), and task 4 (carry pan) were significant at P < 0.005 (chi-square probabilities, all after Bonferroni correction), i.e., the observed values of these 3 tasks were significantly different from the expected values and therefore did not fit the model (Table 3). Additionally, the thresholds for task 2 (hold pan), task 4 (carry pan), and task 8 (open milk pack) were disordered, i.e., their scoring categories were not progressing in a logical order. It can be expected that as a person's ability increases, it will be more likely for him or her to obtain a higher score; however, in the case of disordered thresholds, the items do not work in this way. Subsequently, the scoring categories incorrect and partially correct were collapsed for the 3 disordered tasks, resulting in dichotomous data of 0 (incorrect or partially correct) and 1 (correct) for tasks 2, 4, and 8. No uniform differential item functioning was found, meaning that no task was biased by raters, sex, or age. After rescoring, task 8 (open milk pack) still did not fit the model at the 0.5% significance level and was therefore removed, resulting in a model fitting all remaining items, i.e., a valid assessment was obtained (Table 3). Test-of-fit statistics demonstrated a mean ± SD item location (i.e., difficulty) of 0 ± 2.2 and a mean ± SD person location (i.e., ability) of −3.4 ± 1.3, implying that participants' ability was too low in relation to the items' difficulty (Figure 1). The formal test of invariance (item–trait interaction) revealed a total item chi-square of 46.4 (P = 0.001), indicating significant deviation between the observed data and what was expected from the model at group level. Reliability indices were 0.79 (person separation index, indicative of the power of the D-JPBA-S to discriminate between respondents) and 0.77 (Cronbach's alpha). Person logits of the D-JPBA-S (R) were transformed into an arbitrarily chosen 0–18 interval scale for further calculations.
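The rescoring and rescaling steps can be sketched as follows. The collapse of categories for tasks 2, 4, and 8 follows the description above. The linear mapping of logits onto a 0–18 scale is an assumption, because the text does not state how the transformation was done, and the logit bounds used in the example are hypothetical.

```python
# Tasks whose thresholds were disordered (from the text).
DISORDERED_TASKS = {2, 4, 8}

def rescore(task, score):
    """Collapse 0/1/2 to 0/1 for disordered tasks; leave others unchanged."""
    if score not in (0, 1, 2):
        raise ValueError("score must be 0, 1, or 2")
    if task in DISORDERED_TASKS:
        return 1 if score == 2 else 0
    return score

def to_interval_scale(logit, logit_min, logit_max, scale_max=18.0):
    """Linearly map a logit onto 0..scale_max (assumed transformation)."""
    return (logit - logit_min) / (logit_max - logit_min) * scale_max

# Hypothetical ratings for one recording: task number -> raw score.
raw = {1: 2, 2: 1, 3: 0, 4: 2, 8: 1}
print({task: rescore(task, s) for task, s in raw.items()})

# Hypothetical logit bounds, chosen only to show the mapping.
print(to_interval_scale(-1.5, logit_min=-6.0, logit_max=3.0))
```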

Table 3. Individual item fit of the D-JPBA-S (R)*
 | Initial values unchanged | | | After rescoring tasks 2, 4, and 8 | | | After removing task 8 | |
 | Item | Chi-square value | Chi-square probability | Item | Chi-square value | Chi-square probability | Item | Chi-square value | Chi-square probability
Task 1 | Turn on tap | 12.34 | 0.002 | Turn on tap | 1.04 | 0.569 | Turn off tap | 0.373 | 0.830
Task 2 | Hold pan | 20.18 | 0.000 | Carry pan | 1.59 | 0.452 | Carry pan | 0.418 | 0.812
Task 3 | Turn off tap | 1.11 | 0.574 | Hold pan | 1.60 | 0.450 | Carry cups | 3.500 | 0.174
Task 4 | Carry pan | 11.51 | 0.003 | Turn off tap | 2.00 | 0.369 | Pour water | 5.322 | 0.070
Task 5 | Open jar | 6.45 | 0.040 | Pour water | 3.21 | 0.201 | Turn on tap | 5.598 | 0.060
Task 6 | Close jar | 8.50 | 0.014 | Carry cups | 3.78 | 0.151 | Hold pan | 6.707 | 0.035
Task 7 | Pour water | 7.00 | 0.030 | Open jar | 9.52 | 0.009 | Pour milk | 6.799 | 0.033
Task 8 | Open milk | 10.73 | 0.005 | Pour milk | 9.93 | 0.007 | Close jar | 6.995 | 0.030
Task 9 | Pour milk | 8.53 | 0.014 | Close jar | 10.05 | 0.007 | Open jar | 10.670 | 0.005
Task 10 | Carry cups | 4.11 | 0.128 | Open milk | 27.00 | 0.000 | | |

* Initial values are presented in serial order; values after rescoring and after removing nonfitting tasks are presented in chi-square probability order. D-JPBA-S (R) = German version of the revised Joint Protection Behavior Assessment short form.
Significant at P < 0.005 (after Bonferroni correction).
Figure 1.

Person/item threshold-targeting graph of the German version of the revised Joint Protection Behavior Assessment short form ([ratings of] persons: n = 120; items: n = 9). Locations of persons (= person abilities) and of each item threshold (1 threshold for the dichotomous tasks 2 and 4, 2 thresholds for all other polytomous items) on the interval scale, representing the measure of joint protection behavior. Easiest item thresholds are from incorrect to partially correct for the tasks “pour milk” and “turn on tap,” with mean logits of −3.3 and −2.8, respectively (on the left). Most difficult item thresholds are from partially correct to correct for the tasks “turn on tap,” “open jar,” and “pour water,” with mean logits of 7.2 and 7.4, respectively (on the far right).

Reliability.

The demographic characteristics of the healthy participants and those with RA were comparable (Table 1). Mean values of the 6 raters' scorings were between 3.5 and 5.4 on the D-JPBA-S (R) scale and differences were not significant (Kruskal-Wallis H test P = 0.50).

Interrater reliability.

Overall interrater reliability for the D-JPBA-S (O) was 0.79 (95% confidence interval [95% CI] 0.74–0.85), ranging between 0.84 (95% CI 0.76–0.91) and 0.70 (95% CI 0.54–0.82) for each pair of raters. Reliability values slightly decreased when calculated for the D-JPBA-S (R), being 0.77 (95% CI 0.70–0.83) across all raters and ranging between 0.84 (95% CI 0.75–0.90) and 0.65 (95% CI 0.47–0.80) for each pair of raters.

Intrarater reliability.

Intrarater agreement of each rater was generally higher at time point 1 than at time point 2, for both patient A and patient B. The intrarater agreement range for patient A was 80–100% (mean ± SD 95% ± 8.4%) at time point 1 and 50–100% (mean ± SD 75% ± 20.7%) at time point 2; for patient B, values were 100% at time point 1 and 70–100% (mean ± SD 90% ± 11.7%) at time point 2. Results were the same for the D-JPBA-S (O) and D-JPBA-S (R). Because raters scored this sample within a very restricted range, resulting in low variability, ICC calculations were not applicable.
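A percentage agreement of this kind can be sketched as the share of tasks given the same score on both assessments of the same recording. That reading of the reported percentages is an assumption, and the scores below are hypothetical.

```python
def percent_agreement(first, second):
    """Share of tasks (in %) given the same score on both assessments."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty score lists of equal length")
    same = sum(a == b for a, b in zip(first, second))
    return 100.0 * same / len(first)

# Hypothetical scores (0/1/2) for 10 tasks, not data from the study.
original = [2, 1, 0, 1, 2, 2, 0, 1, 1, 2]
duplicate = [2, 1, 0, 1, 2, 1, 0, 1, 2, 2]
print(percent_agreement(original, duplicate))  # 8 of 10 match: 80.0
```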

Test–retest reliability.

Patients repeated the kitchen activity after a mean ± SD of ∼11 ± 2.5 weeks. ICCs were 0.65 (95% CI 0.27–0.87) for the D-JPBA-S (O) and 0.71 (95% CI 0.31–0.88) for the D-JPBA-S (R). On the 18-point linear scale, the median D-JPBA-S (R) score on test 1 was 7.7 points (interquartile range [IQR] 3.4–9.3) and on test 2 was 5.8 (IQR 3.35–7.7). Score changes over the 2 tests were between 0 and 12.2, and the mean ± SD score change was 1.1 ± 3.7 points. SDD was 5.5 points on the linear scale.
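The SDD formula from the statistical analysis section (SDD = 1.96 × √(2 × SEM²), with SEM = SD × √(1 − r)) can be checked numerically. The inputs below, SD = 3.7 and r = 0.71, are taken from the values reported in this paragraph; whether these were the exact inputs used by the authors is an assumption.

```python
import math

def sem(sd, r):
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - r)

def sdd(sd, r):
    """Smallest detectable difference: 1.96 * sqrt(2 * SEM^2)."""
    return 1.96 * math.sqrt(2.0 * sem(sd, r) ** 2)

# Assumed inputs: SD of the score change and the test-retest ICC.
print(round(sdd(3.7, 0.71), 1))  # 5.5
```

Under these assumed inputs the result agrees with the reported SDD of 5.5 points.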

Disease-related factors and JP behavior.

Calculations in this section were performed with D-JPBA-S (R) linear data. The D-JPBA-S (R) scores of the RA participants were negatively correlated with grip strength (r = −0.63, P < 0.001) and with general health (r = −0.55). Correlations with the other disease-related factors were significantly positive, except for range of motion at the wrist joint, for which no correlation was found (Table 1).

Discriminant validity.

The D-JPBA-S (R) discriminated significantly (Mann-Whitney U test; z = −8.215, P < 0.0001) between healthy individuals and those with RA regarding JP behavior. The median score was 0 points (IQR 0–2.9) for healthy individuals and 6.5 (IQR 5.0–9.0) for those with RA (Table 1).

Cross-sectional validity.

In patients with RA, Pearson's correlation coefficient of JP behavior (measured with the D-JPBA-S [R]) with functional impairment (measured with the HAQ) was 0.42 (P < 0.0001), and the correlation with hand pain was 0.57 (P < 0.0001) (Table 1).

DISCUSSION

The final version of the D-JPBA-S, as obtained with Rasch modeling, i.e., consisting of 9 valid tasks, is suitable for measuring JP behavior. Traditionally, analysis of outcome data focused on summing and dividing raw scores that are ordinal; however, calculations with such data may not be justified. Performing Rasch analysis is far more than a conceptual issue and its results had practical implications for the construct of the D-JPBA-S. For task 2 (hold pan) and task 4 (carry pan), it was difficult to assess whether the assisting hand held the pan's weight (scored as correct) or was only supporting (partially correct). Therefore, raters may have randomly assigned scores, not perceiving a substantial difference. Collapsing incorrect and partially correct scores in this case is advantageous without losing information. Task 8 (open milk pack) was not an appropriate item because it did not discriminate between JP performance of healthy individuals and patients with RA. Both groups had trouble opening the milk pack, irrespective of health status or awareness of protecting joints.

Reliability calculations were based on linear scores of the D-JPBA-S (R) as well as on summed raw scores of the D-JPBA-S (O). Reliability for the D-JPBA-S (R) was slightly lower because one item was deleted. Because the D-JPBA-S will be used as an evaluative assessment, it is of no use to collapse all ordered polytomous scales into dichotomous scales to raise reliability, as this would diminish precision.

More important is the accurate measurement of change between 2 time points by transforming raw scores into linear data because raw score changes might be misinterpreted. Every linear difference (test 2 minus test 1) corresponds to a range of raw score differences, which differ depending on test 1 initial status (29). Test–retest reliability integrates variability within the patients' group and within the rater, i.e., a change of the rater's assessment might reasonably be due to different JP performances of some patients.

The period between test and retest may appear to be long. However, we anticipated that noticeable changes in habitual JP behavior could occur due to unpredictable daily pain changes. This was also identified in an earlier study (8) and was confirmed in our video recordings. Different JP performances due to large pain changes in (few) individuals explain our SDD of 5.5 points (∼30% of the total range), even though most patients were in a stable condition and the overall correlation between pain and JP behavior was moderate. Although the usually low initial scores promote large improvements, it might be difficult to detect true differences in individual patients when disease-dependent changes interfere with real changes.

The discrepancy between difficulty of items and persons' ability also illustrates that individuals without RA have no reason to perform JP and that individuals with RA perform less JP than might be expected because they do not recall JP instructions. Participants with RA stated that the effective drug treatment had lowered their perceived need and their motivation to apply JP during daily activities. However, difficulty levels of items are very different and there is a large gap within the scale. Further development of the scale should take this into account, e.g., by weighting the items, to improve its appropriateness (30).

The manual describes the application of the D-JPBA-S and all possible JP methods and is illustrated with pictures. This is essential to ensure reliable assessments. The decision about how much agreement was required to assign the scores correct, partially correct, and incorrect for the manual was arbitrary. Requiring a higher level of agreement for correct/incorrect scores means more partially correct scores. Persons with RA often show partially correct behavior, having been told about JP behavior without fully understanding the principles behind it or having developed their own idiosyncratic methods, and therefore the potential for improvement is quite substantial. For example, a common, easy-to-learn principle is to work bilaterally.

Intrarater agreement between time points 1 and 2 was almost 100%. Raters may have recognized the individuals and, being convinced of their first scorings, persisted in their scores. Intrarater agreement after 4 weeks was considerably lower for some raters, suggesting that they performed assessments in a more unbiased and critical fashion. This second value might therefore be more accurate and nearer to the reality of clinical practice in which assessments are repeated some weeks later in the course of OT intervention.

Our patients may be considered representative of the RA population on relevant characteristics such as sex, age, and disease severity, and there is evidence that measurement constructs are stable across samples from a common population regardless of sample size (31). This validation provided the prerequisites for using the D-JPBA-S in research. Further analysis (e.g., using generalizability theory) is needed to allow estimations of change on an individual patient level.

AUTHOR CONTRIBUTIONS

Ms Niedermann had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study design. Niedermann, Forster, Hammond, Uebelhart, de Bie.

Acquisition of data. Niedermann, Forster.

Analysis and interpretation of data. Niedermann, Forster, Hammond, de Bie.

Manuscript preparation. Niedermann, Forster, Hammond, Uebelhart, de Bie.

Statistical analysis. Niedermann, de Bie.

Acknowledgements

We thank all of the therapists who assisted in this project, in particular Mieke Visscher, OT, at the Institute of Physical Medicine (IPM) of the University Hospital Zurich (USZ), who assisted in the video recording of the patients; the expert OTs who collaborated in developing the manual: Franziska Heigl, Corina Jacobs, Ulla Jörn Good, Regula Kubli, Christine Meier, Anne Rovsing, Sunita Sinha, Irma Stettler, and Mieke Visscher; and the OTs and PTs who dedicated their time assessing the video recordings: Vera Beckmann, Nicolette Bruns, Ulrike Trinks, Sunita Sinha, and Mieke Visscher. We thank the OT and PT teams of the IPM USZ for providing their therapy kitchen facilities for the video recordings; Alex Tobler for technical assistance; and Leanne Pobjoy for her help in preparing the manuscript. Last but not least we thank the members of the Matilda Bay Club-Rasch Measurement in the Social Sciences discussion group for their helpful comments.
