Reliability, validity, and responsiveness of five at-work productivity measures in patients with rheumatoid arthritis or osteoarthritis




Arthritis often impacts a worker's ability to be productive while at work. However, the ideal approach to measuring arthritis-attributable at-work productivity loss remains unclear. Our objective was to evaluate the relative strengths and weaknesses of 5 measures aimed at quantifying health-related at-work productivity loss and to determine the best available instrument for this population.


In a 12-month longitudinal design, the psychometric properties (reliability, validity, and responsiveness) of 5 self-reported measures of at-work productivity were compared in workers with either rheumatoid arthritis (RA) or osteoarthritis (OA). We tested the Workplace Activity Limitations Scale (WALS), 6-item Stanford Presenteeism Scale (SPS-6), Endicott Work Productivity Scale (EWPS), RA Work Instability Scale (WIS), and Work Limitations Questionnaire (WLQ).


Across all measures, participants (n = 250, 120 with RA and 130 with OA) consistently reported mild losses of at-work productivity. The Cronbach's alpha of the scales ranged from 0.71 (for SPS-6) to 0.94 (for EWPS), indicating some concerns over the internal consistency of the SPS-6. The RA WIS demonstrated the strongest construct validity (|r| = 0.54–0.74), whereas the WALS was most responsive to perceived changes in work ability. Despite its increasing popularity and potential application for costing analysis, the WLQ did not compare favorably with the other scales, possibly due to psychometric concerns with its physical demands subscale.


The measures revealed unique conceptualizations of at-work disability, and no single scale emerged as clearly superior. However, current results slightly favor the WALS and RA WIS for measuring at-work productivity loss in workers with arthritis.


Arthritis is among the most prevalent disorders affecting Canadians (1–3), including people of working age. Many workers with arthritis experience high absenteeism and disability while at work (presenteeism), often leading to eventual work change or work loss (4–9). Although various approaches to quantify the impact of arthritis on employment have been proposed, the best way to measure and estimate the economic costs of worker productivity loss attributable to arthritis is currently unknown (10–13). To date, measurement approaches have largely focused on absenteeism (e.g., days off work), yet it is increasingly recognized that productivity loss while at work actually represents an even greater proportion (41% versus 12% due to absenteeism) of the indirect costs associated with arthritis (14). Moreover, with continuing advances in disease management it is anticipated that people living with arthritis may be better able to maintain their ability to work, thereby needing fewer episodic absences from work. As such, there is a growing need to advance approaches that specifically measure productivity loss at work to better capture the overall impact of arthritis on worker productivity.

The most recent literature review revealed 16 different measurement instruments available for quantifying at-work disability, and highlighted differing perspectives on operationalizing this concept (10). For example, some scales were designed to assess the degree of difficulty in performing specific work tasks, whereas others focused on productivity loss by assessing the amount of time during which difficulties were experienced. Little is currently known about the basic psychometric properties of these instruments in a population with arthritis, or about their ability to sensitively capture changes over time (10, 11, 15). Concurrent comparisons of the psychometric properties of these scales are also needed to help reveal the relative strengths and weaknesses of the measures (10, 11), and to provide evidence-based guidance for the selection of outcome instruments in future studies. Of the available measures, only 5 were considered multi-item self-report measures with specific focus on at-work disability rather than combined absenteeism and presenteeism: the Workplace Activity Limitations Scale (WALS) (16), the 6-item Stanford Presenteeism Scale (SPS-6) (17), the Endicott Work Productivity Scale (EWPS) (18), the Work Instability Scale (WIS) for rheumatoid arthritis (RA) (19), and the Work Limitations Questionnaire (WLQ) (20). Of these, 3 had been originally developed (16, 19) or tested (20) in arthritis, whereas the remaining 2 scales (17, 18) offered sufficiently different perspectives of at-work disability and therefore were also considered of high interest for comparison. This study concurrently evaluated and compared the measurement properties of these 5 at-work productivity measures in patients attending rheumatology clinics or outpatient allied health professional services for the management of RA or osteoarthritis (OA). 
Our specific objectives were to determine the reliability, validity, and responsiveness for each measure and the single best measure based on their comparative psychometric performance for workers with RA or OA.


Recruitment and sample size.

A 12-month longitudinal survey was conducted by convenience sampling of patients who were followed at 3, 6, and 12 months after initial baseline assessment. The 3 sites consisted of 2 tertiary-level rheumatology clinics in urban teaching hospitals (n = 142) in Toronto, Ontario, Canada and an outpatient arthritis treatment program providing multidisciplinary services (n = 108) in Vancouver, British Columbia, Canada. Cross-sectional baseline data were used for evaluating reliability and validity. Data collected at the 12-month followup were used to assess responsiveness to change. Inclusion criteria were: 1) attendance at an outpatient rheumatology clinic with a rheumatologist diagnosis of RA or OA (Toronto), or attendance at an arthritis treatment program within the past 2 years with RA or OA recorded as the reason for referral by the referring physician (Vancouver); and 2) paid employment for ≥1 month prior to recruitment. Respondents were excluded if they did not speak English, because the questionnaires have not yet been translated into other languages. Informed written consent was obtained from all participants. Research ethics approval was obtained from all participating institutions.

A sample size of 117 was required to detect differences in a correlation of 0.70 from 0.85 with a power of 0.80 and α = 0.05 (0.001 with Bonferroni adjustment). We recruited 120 participants with RA and 130 with OA to allow for subgroup evaluations by type of arthritis in future analyses. In this study, all subjects were pooled into a single sample because the psychometric properties did not differ across the diseases (21). At n = 250, differences between correlations of 0.75 and 0.85 could be detected with similar power.

Data collection.

Each questionnaire included items on sociodemographic characteristics (age, sex, education, and marital status), health and work variables including items on arthritis symptomatology (type, severity, and duration), the Self-Administered Comorbidity Questionnaire (22), current work status (full versus part time) and occupation type based on the National Occupational Classification developed by Human Resources and Skills Development Canada (23), the 5 at-work productivity measures, and a series of items on health-related at-work difficulties to be used for construct validation of the scales.

Measures (in the order fielded in the questionnaire).


The 12-item WALS asks about arthritis-related employment activity limitations and was patterned after the Health Assessment Questionnaire, but with questions specific to the work place (16). The scale measures the degree of difficulty with various job-related tasks that tax upper and/or lower limb function (e.g., gripping, crouching), as well as difficulties with commuting, scheduling, concentration, and pace of work. Responses on a 4-point Likert-type scale range from 0 to 3, where 0 = no difficulty and 3 = not able to do. If the item was not applicable, a score of 0 was given. Cronbach's alphas of 0.78 and 0.81 have been reported in a longitudinal sample with arthritis (n = 349–491) at 4 different time points, each 18 months apart (16, 24). The twelfth item was excluded from scale scoring based on discussions with the developer. A total score (out of 33) was calculated based on the response for 11 items, with a higher total score indicating greater difficulties at work.


The SPS is a relatively new instrument with few validation studies available. Various versions of the scale have been previously tested, including the original SPS-32 (15), the SPS-13 (25), and the SPS-6 (17). Originally intended for workers in knowledge-based jobs, the SPS-6 measures workers' perceived ability to concentrate on work tasks despite the distractions of health impairments and pain, based on a 1-month recall period (15, 17). The SPS-6 consists of 6 questions answered on a 5-point Likert scale, with response options ranging from strongly agree to strongly disagree. Scale items were scored between 1 and 5, where 1 = low disability and 5 = high disability, giving a total scoring range of 6–30. A total score was not calculated if any items were missing.


The EWPS was developed to quantify the frequency of work performance and productivity attitudes and behaviors over a 1-week period, for a broad range of diseases and occupations (15, 18). It covers 4 domains: attendance, quality of work, performance capacity, and personal factors (social, mental, physical, and emotional) in 25 items. Five response options are offered (range 0–4) and the scale score is calculated out of 100, with 100 representing lowest productivity (15). The test–retest reliability and validity of the scale have only been tested in persons with depression (18). According to the developer, up to one-third of items can be missing and replaced by the mean of the remaining items.
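The missing-data rule described above can be illustrated with a minimal sketch. The function name `ewps_total` is hypothetical, and the reading of "up to one-third" as at most 8 of 25 items is an assumption; this is not the developer's official scoring code.

```python
from typing import Optional, Sequence

N_ITEMS = 25  # number of EWPS items, each scored 0-4 (total out of 100)


def ewps_total(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Return an EWPS total (0-100), or None if too many items are missing.

    Missing items are passed as None.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses")
    answered = [r for r in responses if r is not None]
    n_missing = N_ITEMS - len(answered)
    # Assumption: "up to one-third" is read as at most 8 of 25 items.
    if n_missing > N_ITEMS // 3:
        return None
    # Replacing each missing item with the mean of the answered items is
    # equivalent to scaling the mean of the answered items by the item count.
    return sum(answered) / len(answered) * N_ITEMS
```

For example, a respondent who answers 17 items with a score of 2 and leaves 8 blank would receive a total of 50 under this reading.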


The RA WIS was originally developed for RA to assess potential mismatch between workers' functional abilities and job demands, and to identify individuals who may be in need of work place modifications to sustain employment (19). Job flexibility, good working relationships, and symptom control were established as key constructs according to qualitative work during the development of this scale. The scale consists of 23 yes/no items (where yes = 1 and no = 0), for which a total score of <10 represents low work instability, 10–17 represents moderate work instability, and >17 represents high work instability (19). Increasing levels of work instability are believed to be associated with greater risk of work loss. This instrument has demonstrated a test–retest reliability of rs = 0.89 in workers with arthritis.
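The scoring and cutoffs just described can be sketched as follows. The function names are hypothetical, and the handling of missing answers is not specified in the text, so this sketch simply requires all 23 answers.

```python
from typing import Sequence

N_ITEMS = 23  # RA WIS yes/no items (yes = 1, no = 0)


def ra_wis_total(answers: Sequence[bool]) -> int:
    """Sum the yes answers across the 23 items."""
    if len(answers) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} answers")
    return sum(1 for a in answers if a)


def ra_wis_category(total: int) -> str:
    """Map a total score to the work instability bands given in the text."""
    if not 0 <= total <= N_ITEMS:
        raise ValueError("total out of range")
    if total < 10:
        return "low"
    if total <= 17:
        return "moderate"
    return "high"
```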


The WLQ was originally developed by Lerner et al to measure the impact of chronic diseases and treatment on work performance (26). It is a 25-item questionnaire asking about the proportion of time over the past 2 weeks during which difficulty was experienced in 4 different domains: time management, physical demands, mental-interpersonal, and output demands. Uniquely, the physical demands subscale has reversed instructions, and therefore, as per the developers' instructions, scores of the 3 other subscales are reversed such that higher scores consistently reflect a greater proportion of time spent having difficulty at work (27). With increasing popularity, the WLQ has been tested in various chronic conditions, including depression (28), OA (20), and RA (29). Because the WLQ quantifies presenteeism in terms of amount of time, it also has the potential to be translated into a monetary value for economic estimations of the burden of illness. More recently, a WLQ productivity loss index, based on the weighted sum of subscale scores (out of 100), was developed to estimate the percentage of productivity loss due to health problems (27). In this study, the psychometric properties of each of the 4 WLQ subscales were evaluated separately, along with the WLQ productivity loss index.

Statistical analysis.

Sample description.

Univariate statistics (number, mean, SD, frequency distributions) were used to describe the sociodemographic characteristics of the participants.

Distribution of at-work productivity scores

Univariate statistics (number, mean, SD) were calculated for each of the at-work productivity measures and items used for construct validation of the scales. The distributions of the scale scores were examined for normality, as well as floor and ceiling effects, which were considered significant if >15% of participants were at the minimum or maximum score (30).
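The floor and ceiling criterion can be expressed as a short check. This is an illustrative sketch only; the function name and return structure are hypothetical, and the 15% threshold is the one cited in the text.

```python
from typing import Dict, Sequence


def floor_ceiling(scores: Sequence[float], min_score: float,
                  max_score: float, threshold: float = 0.15) -> Dict[str, object]:
    """Report the share of respondents at the scale minimum and maximum.

    An effect is flagged when the share exceeds the threshold (>15%).
    """
    n = len(scores)
    if n == 0:
        raise ValueError("no scores supplied")
    at_floor = sum(1 for s in scores if s == min_score) / n
    at_ceiling = sum(1 for s in scores if s == max_score) / n
    return {
        "floor_share": at_floor,
        "ceiling_share": at_ceiling,
        "floor_effect": at_floor > threshold,
        "ceiling_effect": at_ceiling > threshold,
    }
```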


Cronbach's alpha coefficients, Kuder-Richardson Formula-20 (KR-20), and item-total correlations were calculated for relevant instruments to assess their reliability for this population. For unidimensional scales, a Cronbach's alpha or KR-20 of ≥0.70 is deemed acceptable at the group level (31, 32), although ≥0.90 is considered to be desirable (33). Item-total correlations of ≥0.3 for individual scale items are also desirable.
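For readers unfamiliar with the coefficient, Cronbach's alpha can be computed from a respondents-by-items matrix as sketched below. The function name is hypothetical and the sketch assumes complete data; with 0/1 items the same formula yields KR-20.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data (one row per respondent).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of the summed scale score across respondents.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Items that move together perfectly give an alpha of 1, whereas mutually uncorrelated items give an alpha of 0.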

Construct validity

Theoretical constructs linked to the impact of health on work productivity were used to assess the degree of convergence with the measures (Table 1). These constructs, derived from individual items or scales adapted from existing tools, were classified as either work-oriented or disease-oriented constructs, although none were considered a gold standard. It is hypothesized that an ideal instrument designed for measuring at-work productivity might demonstrate stronger associations to work-oriented constructs compared with disease-oriented constructs that do not specifically address the work element.

Table 1. Constructs used for assessing the validity of the at-work productivity measures, with a priori hypothesized levels of correlation*

| Construct | Instrumentalization, source (ref.) | Univariate statistics: score range | Univariate statistics: no. | Univariate statistics: mean ± SD |
| --- | --- | --- | --- | --- |
| Work-oriented constructs | | | | |
| Self-rated work productivity† | NRS single item | 0–10 | 250 | 7.8 ± 2.2 |
| Perceived impact of health problems on work† | NRS single item, adapted from the Work Productivity and Activity Impairment Scale (34) | 0–10 | 249 | 2.4 ± 2.5 |
| Self-rated difficulty doing work‡ | NRS single item | 1–7 | 248 | 2.6 ± 1.6 |
| Satisfaction with occupational performance‡ | NRS single item, adapted from the Canadian Occupational Performance Measure (35) | 0–10 | 250 | 8.4 ± 1.6 |
| Self-rated ability to work‡ | NRS single item, adapted from the Canadian Occupational Performance Measure (35) | 0–10 | 250 | 7.9 ± 2.1 |
| Intrusion of arthritis on work ability‡ | Work item from the Illness Intrusiveness Rating Scale (36) | 1–7 | 250 | 2.6 ± 1.6 |
| Self-rated job performance in past week† | NRS single item, adapted from the WHO Health and Work Performance Questionnaire (37) | 0–10 | 247 | 7.9 ± 1.8 |
| Arthritis hindrance on work performance§ | Single item adapted from the Health and Labor Questionnaire (38) | 0 = no, 1 = to a degree, 2 = very much | 246 | 0.6 ± 0.7 |
| Disease-oriented constructs | | | | |
| General perceived disability† | HAQ (39) | 0–3 | 249 | 0.8 ± 0.6 |
| Self-rated arthritis severity† | NRS single item | 0–7 | 250 | 3.3 ± 1.8 |
| Pain intensity over past week† | NRS single item (40) | 0–100 (intervals of 10) | 250 | 37.1 ± 27.6 |

* NRS = numeric rating scale; WHO = World Health Organization; HAQ = Health Assessment Questionnaire.
† Moderate association hypothesized (0.5 < |r| < 0.75), with r in absolute value to correct for scoring orientation of some constructs.
‡ Strong association hypothesized (|r| ≥ 0.75), with r in absolute value to correct for scoring orientation of some constructs.
§ Known-groups validity assessed: logical and statistically significant gradient in scale scores expected (i.e., very much > to a degree > no; P < 0.05 for the analysis of variance F test).

Between-scale correlation

Spearman's correlation coefficients were calculated between at-work productivity measures to determine potential convergence of these scales. If 2 scales measure a similar construct, a high correlation (r > 0.8) would be expected.
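As an illustration of the statistic, Spearman's coefficient is Pearson's correlation computed on ranks, with tied values sharing their average rank. The sketch below uses hypothetical helper names and is not the software used in the study.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation between two sets of scale scores."""
    return pearson(average_ranks(x), average_ranks(y))
```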

Responsiveness to change

The responsiveness of the measures was assessed against 2 different global indices of change, which offered different perspectives of at-work difficulties. First, change in work ability was assessed at the 12-month followup by a single item that asked respondents to rate their change in ability to do usual work activities in relation to baseline (0 = much worse, 5 = no change, 10 = much better). Second, change in work productivity (−10 = much worse, 0 = no change, 10 = much better) was assessed as the difference in the self-rated work productivity score (0–10) between baseline and 12-month followup.

We hypothesized that specific instruments might be more responsive to constructs with a similar conceptualization of work disability; for example, the WALS (focused on the ability to perform tasks) versus the WLQ (focused on productivity loss from a temporal perspective). Both global indices of change were dichotomized in 2 ways to identify those who had improved (versus all others) and those who had deteriorated (versus all others). An improvement in work ability was defined as a change in self-rated work ability of >6, whereas deterioration was defined as a change of <4. In terms of work productivity, a change in self-rated work productivity >1 was considered an improvement, whereas a change <−1 was considered a deterioration. Defined cutoffs were adapted from previous guidelines on global rating scales in which values within ±1 are considered as unchanged (41, 42).

Analyses were conducted 3 ways based on the model by Deyo et al (43). First, mean change, SD of change, effect size, and standardized response mean (SRM; mean change divided by SD of change) were calculated for each scale in patients indicating a change. The second approach was to use a receiver operating characteristic (ROC) curve. Changes in at-work productivity scores (12-month minus baseline) were compared with the gold standard (yes/no) of improved or deteriorated according to the above criteria. The ability to correctly classify a respondent was described by plotting sensitivity (vertical) versus 1 − specificity (horizontal) for each scale. The area under the ROC curve (AUC) was then calculated using nonparametric (Wilcoxon's) statistics (44, 45) as an index to quantify discriminative ability. Finally, we correlated change scores for each scale against the global index of change using Spearman's rank correlations. This allowed use of the individual increments of change, and would weight most heavily in the center where there was a large proportion of people describing no change in ability to work. An overall rank was determined from the cumulative summation of rankings, based on the comparative performance on all individual responsiveness criteria assessed (7 total: effect size, SRM, and AUC, each for improvement and for deterioration, plus the correlation coefficient r).
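The responsiveness indices can be sketched as follows, as an illustration rather than the study's analysis code. Function names are hypothetical. The effect size uses the SD of baseline scores as its denominator, per the Table 5 footnote, and the AUC uses the standard equivalence between the nonparametric AUC and the proportion of concordant pairs (ties counted as one-half).

```python
from statistics import mean, stdev
from typing import Sequence


def effect_size(baseline: Sequence[float], followup: Sequence[float]) -> float:
    """Mean change divided by the SD of the baseline scores."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline: Sequence[float],
                               followup: Sequence[float]) -> float:
    """Mean change divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(changes)


def auc(changed: Sequence[float], others: Sequence[float]) -> float:
    """Nonparametric AUC: probability that a change score from the
    'changed' group exceeds one from all others (ties count 0.5)."""
    wins = 0.0
    for c in changed:
        for o in others:
            if c > o:
                wins += 1.0
            elif c == o:
                wins += 0.5
    return wins / (len(changed) * len(others))
```

As written, `auc` suits deterioration, where larger increases in scale score are expected in the changed group; for improvement, the change scores would be negated before the call.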


Sample description.

The study sample consisted of 250 participants previously diagnosed with RA (n = 120) or OA (n = 130), the majority of whom were female (82.7%). Their mean ± SD age was 50.6 ± 9.2 years (range 19–65 years). More than half (54.7%) had been diagnosed with arthritis for >5 years, and only 9.8% had been diagnosed for <1 year. Most (68.1%) were working full time, and 27.9% were working part time. The distribution of occupation types was: business/finance/administration (44.1%), health/science/arts/sports (31.1%), sales/services (17.6%), and trades/transport/equipment operators (7.2%). Almost half of the sample (48.0%) had ≥1 comorbidities. Followup subject retention rate was 85.2% (n = 213) at 12 months.

Description of the scales and constructs.

The mean scores for the various constructs (Table 1) and the low mean scores across all measures (Table 2) correspondingly suggested a relatively mild loss of at-work productivity in this sample. Only 26 (11.7%) of 223 participants surpassed an RA WIS score of 17, the marker indicating high risk for work loss. The lack of applicability of some items to the respondent's job was problematic in the WLQ physical demands subscale: 82 missing data points for one particular item (difficulty lifting) were attributable to this. However, due to a generous allowance for missing data according to the scoring instructions (up to 50% imputable), physical demands subscale scores were available for 243 of 250 subjects.

Table 2. Univariate statistics and reliability indices for 5 different measures of at-work productivity in workers with arthritis*

| Instrument (items, no.) | Participants, no. | Possible range of scores | Mean ± SD | Work productivity loss based on scale range, no. (%) | Cronbach's alpha or KR-20 | Item-total correlation, range |
| --- | --- | --- | --- | --- | --- | --- |
| WALS (11)‡ | 234 | 0–33 | 8.0 ± 5.1 | 0 (0); 8 (3.4) | 0.87 | 0.42–0.66 |
| SPS-6 (6) | 244 | 6–30 | 13.3 ± 5.2 | 0 (0); 33 (13.5) | 0.71 | 0.11–0.56 |
| EWPS (25) | 249 | 0–100 | 21.5 ± 14.3 | 0 (0); 8 (3.2) | 0.94 | 0.39–0.76 |
| RA WIS (23) | 223 | 0–23 | 8.3 ± 6.4 | 1 (0.4); 21 (9.4) | 0.92 | 0.34–0.71 |
| WLQ index | 231 | 0–28.6 | 6.5 ± 5.5 | 0 (0); 13 (5.6) | N/A | N/A |
| TM (5) | 241 | 0–100§ | 29.8 ± 29.9 | 7 (2.9); 48 (19.9) | 0.86 | 0.60–0.76 |
| PD (6) | 243 | 0–100 | 37.2 ± 29.7 | 8 (3.3); 32 (13.2) | 0.77 | 0.38–0.63 |
| MI (9) | 247 | 0–100§ | 17.9 ± 23.0 | 3 (1.2); 65 (26.3) | 0.94 | 0.50–0.90 |
| OD (5) | 245 | 0–100§ | 19.1 ± 23.1 | 3 (1.2); 87 (35.5) | 0.88 | 0.61–0.81 |

* KR-20 = Kuder-Richardson Formula-20; WALS = Workplace Activity Limitations Scale; SPS-6 = 6-item Stanford Presenteeism Scale; EWPS = Endicott Work Productivity Scale; RA WIS = rheumatoid arthritis Work Instability Scale; WLQ = Work Limitations Questionnaire; N/A = not applicable; TM = time management; PD = physical demands; MI = mental-interpersonal; OD = output demands.
† Higher scores indicate greater health-related productivity loss at work for all scales/subscales.
‡ Only 11 of 12 items were included in scale summation, as per the developer's guide (16).
§ Scores were reversed (the WLQ PD subscale has reverse instructions).

Reliability and construct validity.

The Cronbach's alpha or KR-20 of the scales ranged from 0.71–0.94 (Table 2). The 5 scales generally achieved moderate correlations (|r| = 0.43–0.74) to work-oriented constructs, but a wider correlation range (|r| = 0.36–0.76) was observed against disease-oriented constructs (Table 3). Correlations between the 4 WLQ subscales and constructs were generally weaker (|r| = 0.23–0.65), although WLQ physical demands had especially low correlations to work-oriented constructs (|r| = 0.23–0.39). Specifically, the perceived impact of health problems on work correlated with the greatest number of scales/subscales (n = 8) at the hypothesized level. Among the scales, the RA WIS correlated to the largest number of constructs (n = 7 of 11) above the a priori hypothesized level, and also to the most work-oriented constructs (n = 4 of 8, tied with EWPS). The RA WIS also demonstrated the strongest known-group validity (F = 105.1 versus F = 9.5–68.9 for the other scales) in differentiating workers experiencing varying levels of arthritis hindrance to work.

Table 3. Correlations of at-work productivity measures against various theoretical constructs*

| Theoretical constructs (a priori hypothesized relationship) | WALS | SPS-6 | EWPS | RA WIS | WLQ index | TM | PD | MI | OD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Work-oriented constructs | | | | | | | | | |
| Self-rated work productivity (r < −0.5) | −0.43 | −0.51† | −0.54† | −0.54† | −0.49 | −0.34 | −0.27 | −0.47 | −0.49 |
| Perceived impact of health problems on work (r > 0.5) | 0.62† | 0.67† | 0.64† | 0.73† | 0.67† | 0.50† | 0.39 | 0.62† | 0.65† |
| Self-rated difficulty doing work (r > 0.75) | 0.66 | 0.59 | 0.58 | 0.71 | 0.58 | 0.45 | 0.32 | 0.54 | 0.61 |
| Satisfaction with occupational performance (r < −0.75) | −0.52 | −0.57 | −0.62 | −0.68 | −0.57 | −0.39 | −0.28 | −0.58 | −0.60 |
| Self-rated ability to work (r < −0.75) | −0.58 | −0.59 | −0.62 | −0.67 | −0.60 | −0.41 | −0.32 | −0.58 | −0.63 |
| Intrusion of arthritis on work ability (r > 0.75) | 0.64 | 0.63 | 0.55 | 0.74 | 0.60 | 0.44 | 0.35 | 0.56 | 0.60 |
| Self-rated job performance in past week (r < −0.5) | −0.45 | −0.49 | −0.62† | −0.56† | −0.49 | −0.37 | −0.23 | −0.52† | −0.51† |
| Arthritis hindrance on work performance‡ | 68.9§ | 55.2§ | 40.8§ | 105.1§ | 51.8§ | 25.4 | 9.5 | 30.1 | 44.5§ |
| Disease-oriented constructs | | | | | | | | | |
| General perceived disability (r > 0.5) | 0.76† | 0.45 | 0.36 | 0.66† | 0.49 | 0.46 | 0.39 | 0.36 | 0.48 |
| Arthritis severity (r > 0.5) | 0.60† | 0.56† | 0.36 | 0.62† | 0.42 | 0.39 | 0.29 | 0.39 | 0.45 |
| Pain intensity over past week (r > 0.5) | 0.64† | 0.57† | 0.40 | 0.67† | 0.48 | 0.42 | 0.33 | 0.42 | 0.48 |
| Total constructs that met expectation, no. of 11 | 5 | 5 | 4 | 7 | 2 | 1 | 0 | 2 | 3 |

* See Table 2 for definitions.
† A priori hypothesis met.
‡ F values from analysis of variance between known groups ([0, no] vs. [1, to a degree] vs. [2, very much]) were reported.
§ Tukey P < 0.05 for all t-test comparisons (i.e., 0 vs. 1, 1 vs. 2, 0 vs. 2) and evidence of logical gradient.

Correlation between scales.

Spearman's rank correlations between the 5 at-work productivity measures generally revealed only moderate correlations (0.5 < r < 0.75) among the scales (Table 4). Only the WALS and RA WIS were slightly more correlated (r = 0.77), whereas the EWPS and WALS showed the least agreement (r = 0.55). Three of the 4 WLQ subscales correlated well to the WLQ index score (for time management r = 0.76, for mental-interpersonal r = 0.82, and for output demands r = 0.87), although the physical demands subscale showed only a moderate correlation at r = 0.49. Notably, the physical demands subscale also had the lowest range of correlations with all other measures (r = 0.17–0.49).

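Table 4. Spearman's correlations between measures of at-work productivity in workers with arthritis*

| | WALS | SPS-6 | EWPS | RA WIS | WLQ index | TM | PD | MI | OD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RA WIS | 0.77 | 0.69 | 0.64 | 1 | | | | | |
| WLQ index | 0.61 | 0.63 | 0.61 | 0.67 | 1 | | | | |

* See Table 2 for definitions.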


In terms of responsiveness to change in work ability, the WALS was consistently the strongest performer, ranking first in 5 of the 7 measured indices and achieving the best cumulative summation of rankings (SOR) at SOR = 10 (Table 5, Figure 1). The RA WIS also compared favorably with the remaining scales, ranking second overall, and no worse than second in 5 of 7 criteria (SOR = 16). The WLQ index (fifth, SOR = 31) was less responsive to this construct compared with others, ranking no greater than fourth in any single index of responsiveness.
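The summation of rankings for change in work ability can be retraced from the values reported in Table 5. The sketch below assumes that scales are ranked by the absolute magnitude of each index, that tied values share the better rank (as the tied AUC values in the table suggest), and that the first block of indices refers to improvement and the second to deterioration; the function and variable names are hypothetical.

```python
from typing import Dict, List, Sequence

SCALES = ["WALS", "SPS-6", "EWPS", "RA WIS", "WLQ index"]

# Responsiveness indices for change in work ability, as reported in Table 5.
CRITERIA: Dict[str, List[float]] = {
    "effect size (improved)": [-0.58, -0.50, -0.27, -0.46, -0.30],
    "SRM (improved)": [-0.79, -0.48, -0.35, -0.64, -0.28],
    "AUC (improved)": [0.71, 0.62, 0.55, 0.69, 0.57],
    "effect size (deteriorated)": [0.65, 0.31, 0.52, 0.20, 0.18],
    "SRM (deteriorated)": [0.50, 0.35, 0.63, 0.88, 0.20],
    "AUC (deteriorated)": [0.76, 0.70, 0.77, 0.76, 0.71],
    "Spearman's r": [-0.37, -0.25, -0.17, -0.35, -0.20],
}


def competition_ranks(values: Sequence[float]) -> List[int]:
    """Rank by absolute magnitude (1 = largest); ties share the better rank."""
    mags = [abs(v) for v in values]
    return [1 + sum(1 for m in mags if m > x) for x in mags]


def sum_of_rankings(criteria: Dict[str, List[float]]) -> List[int]:
    """Add up each scale's rank across all responsiveness criteria."""
    per_criterion = [competition_ranks(v) for v in criteria.values()]
    return [sum(column) for column in zip(*per_criterion)]


SOR = dict(zip(SCALES, sum_of_rankings(CRITERIA)))
```

Under these assumptions the sums reproduce the reported values (WALS 10, SPS-6 23, EWPS 24, RA WIS 16, and WLQ index 31).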

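Table 5. Responsiveness indices for measures of at-work productivity against 2 theoretical global indicators of change*

| Index | WALS | SPS-6 | EWPS | RA WIS | WLQ index | TM | PD | MI | OD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Work ability | | | | | | | | | |
| Improved | | | | | | | | | |
| Mean of change | −2.3 | −2.4 | −3.9 | −2.7 | −1.4 | 0.2 | −7.7 | 0.7 | −4.3 |
| SD of change | 2.9 | 4.9 | 11. | | | | | | |
| Effect size† | −0.58 (1) | −0.50 (2) | −0.27 (5) | −0.46 (3) | −0.30 (4) | 0.01 | −0.25 | 0.03 | −0.21 |
| SRM | −0.79 (1) | −0.48 (3) | −0.35 (4) | −0.64 (2) | −0.28 (5) | 0.01 | −0.2 | 0.02 | −0.18 |
| AUC | 0.71 (1) | 0.62 (3) | 0.55 (5) | 0.69 (2) | 0.57 (4) | 0.46 | 0.58 | 0.47 | 0.55 |
| Deteriorated | | | | | | | | | |
| Mean of change | 3. | | | | | | | | |
| SD of change | 6.8 | 4.9 | 12.3 | 1.6 | 5.7 | 29.7 | 27.7 | 21.0 | 26.1 |
| Effect size† | 0.65 (1) | 0.31 (3) | 0.52 (2) | 0.20 (4) | 0.18 (5) | | | | |
| SRM | 0.50 (3) | 0.35 (4) | 0.63 (2) | 0.88 (1) | 0.20 (5) | | | | |
| AUC | 0.76 (2) | 0.70 (5) | 0.77 (1) | 0.76 (2) | 0.71 (4) | 0.64 | 0.66 | 0.71 | 0.71 |
| Spearman's r | −0.37 (1) | −0.25 (3) | −0.17 (5) | −0.35 (2) | −0.20 (4) | −0.10 | −0.16 | −0.19 | −0.11 |
| Sum of rankings | 10 | 23 | 24 | 16 | 31 | | | | |
| Overall rank | 1 | 3 | 4 | 2 | 5 | | | | |
| Work productivity | | | | | | | | | |
| Improved | | | | | | | | | |
| Mean of change | −1.4 | −2.2 | −2.4 | −1.2 | −2.7 | −4.5 | −13.6 | −5.0 | −4.4 |
| SD of change | 3. | | | | | | | | |
| Effect size† | −0.30 (3) | −0.48 (1) | −0.15 (5) | −0.22 (4) | −0.43 (2) | −0.17 | −0.48 | −0.17 | −0.17 |
| SRM | −0.37 (3) | −0.42 (2) | −0.15 (5) | −0.29 (4) | −0.64 (1) | −0.16 | −0.5 | −0.21 | −0.17 |
| AUC | 0.64 (2) | 0.56 (4) | 0.52 (5) | 0.57 (3) | 0.66 (1) | 0.55 | 0.65 | 0.67 | 0.47 |
| Deteriorated | | | | | | | | | |
| Mean of change | 1.− | | | | | | | | |
| SD of change | 6.4 | 4.3 | 13. | | | | | | |
| Effect size† | 0.21 (3) | 0.32 (2) | 0.45 (1) | 0.00 (5) | 0.07 (4) | − | | | |
| SRM | 0.18 (3) | 0.38 (2) | 0.49 (1) | 0.00 (5) | 0.08 (4) | −0.30 | 0.20 | 0.20 | 0.03 |
| AUC | 0.64 (3) | 0.69 (1) | 0.69 (1) | 0.55 (4) | 0.54 (5) | 0.56 | 0.63 | 0.65 | 0.56 |
| Spearman's r | −0.24 (2) | −0.25 (1) | −0.22 (3) | −0.17 (5) | −0.19 (4) | −0.03 | −0.22 | −0.26 | −0.11 |
| Sum of rankings | 19 | 13 | 21 | 25 | 21 | | | | |
| Overall rank | 2 | 1 | 3 (tied) | 5 | 3 (tied) | | | | |

* Comparative rankings based on the magnitude of the indices of the individual criteria are shown in parentheses. SRM = standardized response mean (mean change divided by SD of change); AUC = area under the receiver operating characteristic curve. See Table 2 for additional definitions.
† Mean change divided by the SD of the baseline score.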
Figure 1.

Receiver operating characteristic (ROC) curves assessing the responsiveness of the 5 at-work productivity measures to 2 different global indicators of change: A, change in work ability, and B, change in work productivity. Only ROC curves for deterioration are shown. In both parts, the y-axis = sensitivity (true-positive rate) and the x-axis = 1 − specificity (false-positive rate). The area under the curve (AUC) and 95% confidence intervals are indicated for each measure. The further the curve is away from the diagonal (representing AUC = 0.5), the more responsive the scale is to the change experienced in those who have deteriorated versus those who have not (i.e., no change or improved). WALS = Workplace Activity Limitations Scale; SPS-6 = 6-item Stanford Presenteeism Scale; EWPS = Endicott Work Productivity Scale; RA WIS = rheumatoid arthritis Work Instability Scale; WLQ = Work Limitations Questionnaire.

More variable performance in the responsiveness indices was observed for change in work productivity, especially between improvements and deteriorations (Table 5, Figure 1). For improvements, the SPS-6 showed the largest effect size (−0.48), and the WLQ productivity loss index had the largest SRM (−0.64) and AUC (0.66). For deteriorations, however, the EWPS was the most responsive, with the best effect size (0.45), SRM (0.49), and AUC (0.69, tied with the SPS-6). Overall, the SPS-6 (SOR = 13) was considered the most sensitive and consistent performer against this global indicator of change, and the WALS was second overall (SOR = 19).


We directly compared the psychometric properties of 5 at-work productivity measures in a disease-matched sample to help fill an important gap in the literature (10, 11, 13). Although the study findings supported the cross-sectional validity and reliability of all 5 measures for workers with either RA or OA, no single scale emerged as clearly superior to the others.

The key weaknesses of the SPS-6 were its low internal consistency and an especially low range of item-total correlations. However, at only 6 items, it was consistently responsive to both global indicators of change. The EWPS was a consistent performer, but comparatively, it did not stand out on any psychometric criterion. The 23-item RA WIS stood out in terms of construct validity because it had the top-ranked correlations with all but 1 of the work-oriented constructs. Although originally developed for RA, current results and recent research (46, 47) offer cumulative support for its validity for RA and OA. A strong correlation between the 11-item WALS and RA WIS was suggestive of some convergence in their conceptual foci. Given similar emphasis on assessing activity limitations at work, both scales aligned closely with associated constructs (e.g., difficulty doing work), and were predictably more responsive to changes in work ability than work productivity. In contrast, the WLQ productivity loss index, with its orientation to measure losses in productivity in terms of time, fared better in terms of responsiveness to changes in work productivity than work ability, although its correlations with productivity-oriented constructs (i.e., self-reported work productivity) were lower than expected. It should be considered that problems within the WLQ physical demands subscale may have affected the performance of the WLQ productivity loss index. In most psychometric categories, the physical demands subscale performed poorly, even when compared with its sister subscales. Recognizing that it uniquely has reversed instructions (respondents are asked to indicate the time they were able, instead of the time they had difficulty), we speculate that its lower internal consistency and construct validity may reflect the respondents' lack of awareness of the flip in response instructions. As such, it may be worthwhile to consider corrections to the physical demands subscale in future applications of this tool.

In our study, only low-to-moderate correlations were found between the 5 measures. Similarly, low correlations have also been reported with other work productivity measures (25, 48, 49). For example, the WLQ correlated at r = 0.50 (productivity loss index) and r = 0.20–0.49 (range of subscales) against the SPS-13 (25), and only at r = 0.38 with the Work Productivity Short Inventory (49). Collectively, this speaks to the lack of comparability between these instruments, which may be rooted in the contextualized nature of work disability. When assessing the effects of health interventions or adaptive work strategies, it is likely that some scales will be sensitive to improvements in task performance (16, 24), although it is important to recognize that such changes may be obscured in other scales that are more sensitive to other forms of improvements, such as time efficiency at work. Therefore, the lack of conceptual convergence between the scales and their tendencies to align with related constructs are likely a reflection of the diversity of the instruments' core concepts.

Methodologically, the use of a single sample of workers with arthritis was believed to be the optimal approach for standardizing the type and severity of health condition to provide a valid psychometric comparison of multiple measures in this population. The application of multiple approaches to evaluate responsiveness was also informative. Study limitations included the lack of objective or supervisor-rated indicators of work performance to serve as constructs, because all questionnaire items were self-reported. Also, the order of the 5 measures was not randomized in the questionnaire. The generalizability of the current results to other health conditions cannot be ascertained without further investigation.

In determining the best measure, it should be noted that we revised our initially high expectations because none of the scales consistently achieved the expected correlations in terms of construct validity. In part, this may be due to the absence of a recognized gold standard with which to compare work productivity measures. It was also anticipated that the measures would correlate more strongly with work-oriented than with disease-oriented constructs, yet only the performance of the EWPS appeared to fit this pattern. The lack of dissociation in these correlations may reflect that items in the at-work productivity measures were not sufficiently specific to the work demands of the respondents, or that the disease was impacting both our disease- and work-oriented constructs similarly (50). It should be considered that measures that demonstrate applicability across a wide range of work contexts are likely to relate to disease-oriented constructs such as activity limitations outside of work. Conversely, measures with high specificity may lack generalizability across different work contexts.

In conclusion, the 5 instruments for measuring at-work productivity in workers with arthritis demonstrated variable and modest psychometric performance. No single scale was clearly superior, although the current results favor the RA WIS and WALS based on their strong comparative performance against the other measures. Both are, incidentally, the only scales among the 5 that were originally developed for persons with arthritis. Direct users of these scales should consider 3 key things: 1) the target concept, be it difficulties with work tasks, risk of work disability, or loss of productivity, with consideration for its relevance to the specific diseases or occupations of interest; 2) the purpose of the intended application, given that the scales appeared to vary in responsiveness depending on the specific direction of change (i.e., measuring deterioration or improvement); and 3) the scope of the current study, which focused on worker productivity as an outcome measure. Additional work is needed to test the functioning of these scales as indicators of indirect costs in economic analyses.


All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. Beaton had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Beaton, Tang, Gignac, Lacaille, Badley, Anis, Bombardier.

Acquisition of data. Beaton, Lacaille, Anis, Bombardier.

Analysis and interpretation of data. Beaton, Tang, Gignac, Lacaille.


Abbott provided an unrestricted grant to support data collection at one of our sites. Abbott had no direct role in the study design, data collection, analysis, or interpretation of the data, writing of the manuscript, approval of the manuscript content, or in the publication process for this work. Neither submission nor publication of this article was contingent on any approval from Abbott.


The authors would like to acknowledge the participating institutions: the Mount Sinai Hospital (Toronto, Ontario), the Martin Family Centre for Arthritis Care and Research at St. Michael's Hospital (Toronto, Ontario), and the Mary Pack Arthritis Program (Vancouver, British Columbia). We also thank the Institute for Work & Health (Toronto, Ontario) and the Arthritis Community Research & Evaluation Unit (Toronto, Ontario) for providing in-kind support for this study. Finally, the authors would like to acknowledge contributions from members of the Canadian Arthritis Network Work Productivity Group: Xingshan Cao, Timea Donka, Rebecca Dubé, Katherine Edwards, Novelette Fraser, Taucha Inrig, Carol Kennedy, Jessica Lee, Xin Li, Samra Mian, Ludmila Mironyuk, Anusha Govinda-Raj, Pam Rogers, Rebeka Sujic, Debbie Sutton, Ada Todd, Dwayne Van Eerd, Rebecca Wickett, Jessica Widdifield, and Wei Zhang.