Mail-delivered arthritis self-management tool kit: A randomized trial and longitudinal followup


ClinicalTrials.gov identifier: NCT00449474.



To determine the effectiveness of an intervention Tool Kit of arthritis self-management materials to be sent once through the mail, and to describe the populations reached.


Spanish speakers (n = 335), non-Hispanic English-speaking African Americans (n = 156), and other non-Hispanic English speakers (n = 404) were recruited separately and randomized within each of the 3 ethnic/racial categories to immediately receive the intervention Tool Kit (n = 458) or to a 4-month wait-list control status (n = 463). At the end of 4 months, controls were sent the Tool Kit. All subjects were followed in a longitudinal study for 9 months. Self-administered measures included health status, health behavior, arthritis self-efficacy, medical care utilization, and demographic variables. Using analyses of covariance and t-tests, analyses were conducted for all participants and for Spanish- and English-language groups.


At 4 months, comparing all intervention subjects with randomized wait-list controls, there were significant (P < 0.01) benefits in all outcomes except medical care utilization and self-rated health. The results were maintained at 9 months compared with baseline. On average, the Tool Kit reached persons ages 50–56 years with 12–15 years of schooling. There were few differences between English- and Spanish-language participants in either the effectiveness or reach variables.


A mailed Arthritis Self-Management Tool Kit proved effective in improving health status, health behavior, and self-efficacy variables for up to 9 months. It also reached younger persons in both English- and Spanish-language groups and Spanish speakers with higher education levels than previous studies of the small-group Arthritis Self-Management Program.


Arthritis is the most common cause of disability, a major reason for outpatient visits, and one of the most prevalent chronic conditions (1, 2). Among older people, arthritis frequently complicates other chronic conditions (3). Over the past 25 years, consistent evidence has accumulated for the effectiveness of arthritis self-management education delivered in small-group, home study, computer, and Internet formats (4–10).

Consequently, numerous national bodies have recommended arthritis self-management education to complement medical care (11–14). Despite these recommendations, arthritis self-management education has reached only a limited number of people (Boutaugh M: personal communication). Many Arthritis Foundation chapters have had difficulty disseminating arthritis self-management education programs. Supplementing the 6-lesson small-group format of the classic Arthritis Self-Management Program (ASMP) with one-time mail delivery may foster dissemination by lessening problems in recruiting, training, and retaining group leaders; identifying accessible and acceptable meeting places; and scheduling programs. One study noted that many vulnerable populations have not been included in study samples (15). Another study reported that less than 50% of a closed eligible population participated, even when Internet and small-group programs were offered repeatedly over many years (16). Other recent studies have questioned clinical significance (17, 18). No studies were found where arthritis self-management education was delivered in a one-time mailing.

This study sought to replicate effectiveness findings, establish clinical significance, and extend the reach of arthritis self-management education. Here we report on the 4-month randomized trial and 9-month longitudinal study of a mailed Arthritis Self-Management Tool Kit.


The Arthritis Program of the Centers for Disease Control and Prevention (CDC) sponsored the development, dissemination, and evaluation of the Tool Kit. The mail-delivered Tool Kit was designed for persons with multiple arthritic conditions, English- or Spanish-language skills, and a range of literacy levels. Participants could self-tailor the materials.

Intervention description.

The Arthritis Self-Management Tool Kit is packaged in a plastic envelope and contains 1) a “Self Test” to help participants determine how arthritis affects their lives and self-tailor the use of the Tool Kit, including items related to pain, fatigue, physical limitations, and health worries; participants score this test themselves and are directed to specific parts of the Tool Kit based on their scores; 2) information sheets: Working with Your Doctor and the Health Care System, Exercise, Medications, Healthy Eating, Fatigue and Pain Management, Finding Community Resources, and Dealing with One's Emotions; 3) information sheets on key process components of the ASMP: Action Planning, Problem Solving, Deciding What to Try, and Individualizing an Exercise Program; 4) The Arthritis Helpbook or Cómo Convivir Con Su Artritis (19, 20); 5) audio relaxation and exercise compact discs (CDs); and 6) an audio CD of all material printed on the information sheets.

Intervention formatting.

In an effort to reach audiences diverse in language and literacy levels, the information sheets were written in English, translated to Spanish, and then translated back to English. They were recorded by men and women of different ages, with varying types of arthritis, and from different racial/ethnic groups. This ensured modeling of essential content and processes by persons similar to the participants. The English version used African American, Asian American, and European American voices, and the Spanish version used Hispanic American, Mexican, and Central American voices.

The Spanish audio exercise CD was translated into English, and the English audio relaxation CD was translated into Spanish. The Arthritis Helpbook and Cómo Convivir Con Su Artritis were originally written in English and Spanish, respectively.

The Arthritis Helpbook and Cómo Convivir Con Su Artritis were written at sixth- to seventh-grade levels, and the information sheets were written at eighth- to ninth-grade levels. The CDs are designed for individuals who choose to listen to, rather than read, the materials. All materials are culturally appropriate presentations, not linguistically precise translations of the language in which they were created.

Study design.

The Tool Kit was mailed to intervention participants at the beginning of the study and to control participants when they had completed their 4-month questionnaire. Longitudinal data were collected from all participants 9 months after they received the Tool Kit.

Coordinating centers.

Stanford University coordinated the Spanish-language arm of the study and the University of North Carolina (UNC) at Chapel Hill coordinated the English arm of the study. The Tool Kit intervention and study protocols were developed collaboratively and used at both universities.

Sample composition.

The study was designed to include 900 participants, randomized equally into treatment and control groups. We aimed to include 300 Spanish-speaking participants and 600 English-language participants, including 300 non-Hispanic African Americans and 300 non-Hispanic whites. This would allow enough power at a 0.80 level to detect effect size differences of 0.3 at the 0.05 significance level (2-tailed test) in the total sample and within each of the 3 ethnic groups, given an expected attrition of 20%. The effect size (0.3) was the level considered to reflect clinically significant changes in previous outcome studies (21). Although 900 participants were enrolled, the number of African American participants was less than intended, so the English-speaking subset could not be further segmented by ethnicity.
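The relationship among per-arm sample size, attrition, effect size, and power described above can be explored with a short calculation. The following is a minimal sketch using a normal approximation, not the authors' own computation, whose exact method is not reported. The function names and the `baseline_corr` parameter (a hypothetical correlation between baseline and follow-up scores, which determines how much a baseline-adjusted analysis reduces residual variance) are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_arm_power(n_per_arm, effect_size, alpha=0.05, baseline_corr=0.0):
    """Approximate power of a two-sided, two-arm comparison of means.

    Uses the normal approximation. baseline_corr is a hypothetical
    baseline-to-follow-up correlation: adjusting for baseline shrinks
    the residual SD by sqrt(1 - baseline_corr**2).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    adjusted_d = effect_size / sqrt(1 - baseline_corr ** 2)
    ncp = adjusted_d * sqrt(n_per_arm / 2)
    # Probability of rejecting the null hypothesis in either tail.
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


def completers(n_enrolled, attrition=0.20):
    """Participants expected to remain after the stated attrition."""
    return round(n_enrolled * (1 - attrition))


# Total sample: 450 enrolled per arm, 20% expected attrition.
power_total = two_arm_power(completers(450), effect_size=0.3)
```

Calling `two_arm_power` with different values of `n_per_arm` and `baseline_corr` shows how sensitive the achievable power within a single ethnic group is to the assumed strength of the baseline covariate.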


We employed a phased targeted recruitment strategy in an effort to reach both Spanish and English speakers, including African Americans. Nineteen CDC-sponsored state health department arthritis units were recruited in phase 1. They received a recruitment packet containing general recruitment tips and specific suggestions for each population, as well as press releases, public service announcements, and sample flyers in both English and Spanish. Units had access to a dedicated Web site where all recruitment materials could be downloaded. They were invited to participate in monthly conference calls and were encouraged to contact UNC for technical assistance; UNC also e-mailed monthly recruitment updates. This 3-month effort resulted in the recruitment of 85 study participants, 60 (71%) of whom were non-Hispanic whites.

Because few participants were recruited, researchers assumed responsibility for recruiting in phase 2. We advertised with flyers and in lay health magazines, gave face-to-face talks, made guest appearances on radio and television talk shows, sent e-mail and Web announcements, and encouraged professional referrals. An additional 836 participants were recruited and enrolled. The most effective recruiting methods were group-specific and included the use of public service announcements on national Spanish-language television, participation in African American radio talk shows, and personal contact (non-Hispanic African Americans). Advertisements in lay health magazines and announcements on arthritis Web sites resulted in the greatest number of non-Hispanic white participants.

Potential participants called English- or Spanish-language toll-free telephone numbers, where a staff member explained the study and determined their eligibility. The study protocol and consent forms were approved by the Institutional Review Boards for human subjects at Stanford and UNC.

Eligibility criteria.

There were 4 eligibility criteria: age ≥18 years; self-reported physician diagnosis of osteoarthritis, rheumatoid arthritis, or fibromyalgia, or the presence of chronic joint symptoms as determined by CDC criteria; no prior participation in the small-group ASMP or Chronic Disease Self-Management Program; and the ability to complete research questionnaires.

Data collection procedures.

Outcome data were collected via a self-report questionnaire in the language and by the method preferred by the participant, either by mail or telephone. Previous studies have determined that equivalent data are collected by both data collection methods (22). Baseline data were collected prior to randomization. Three attempts were made by telephone, at varying times, to gather missing or incomplete data.

Outcome measures.

There were 7 health status measures. Visual numeric scales used to measure pain and fatigue were developed at Stanford and were found to correlate highly with visual analog scales with a higher completion rate (23). The Health Distress Scale was adapted from the Medical Outcomes Study and focuses on the distress associated with health problems (24). Self-rated global health comes from the National Health Survey and has been found to be predictive of future health status (25). The Activities Limitation Scale measures the impact of disease on role activities such as recreation and chores (26). The 8-item version of the Health Assessment Questionnaire (HAQ) measures disability and is based on the measure used in the National Health Survey (27). Reported internal consistency and test–retest reliabilities for health distress, global health, activities limitation, and HAQ measures ranged from 0.85 to 0.92 (24–27). Depression was measured with the Patient Health Questionnaire, a 9-item self-administered version of the PRIME-MD developed to screen for depression in a medical office setting (28).

Three health-related behaviors were measured: stretching and strengthening exercise, aerobic exercise, and the use of techniques to improve communication with doctors. These instruments were developed and validated by the Stanford Patient Education Research Center (29).

Four medical care utilization measures were used: self-reported outpatient visits to physicians, emergency room visits, number of nights in the hospital, and number of hospitalizations. In a previous study, Ritter et al found that self-report of outpatient visits (r = 0.70) and days in the hospital (r = 0.83) correlated with chart audit data (30).

We also measured perceived self-efficacy, i.e., participants' confidence to manage their arthritis. The short (8-item) scale was built on an earlier model (31), and it has a Cronbach's alpha of 0.94.

Detailed information about these instruments, including their psychometric properties, can be found at the Stanford Patient Education Research Center Web site (29). The psychometric properties of the Spanish-language instruments are described in detail by González et al (32, 33).

Additional variables.

Demographic variables included age, sex, years of education, marital status, and race/ethnicity. Participants were also asked about their arthritis diagnosis or condition and how much they used and how useful they found the various Tool Kit materials. Use and usefulness items were developed for this study. Participants were asked to indicate whether they used each resource in the Tool Kit never, a few times, several times, or regularly. They were asked to rate the usefulness of the materials on a scale of 0–10 (not at all useful to very useful).

Statistical analyses.

Based on prior studies of the small-group ASMP, we hypothesized that program participants, in comparison with those randomized to the control group, would experience better outcomes at 4 months for health status measures, health behaviors, and arthritis self-efficacy, and would also have reductions in medical care utilization variables. It was further hypothesized that these differences would be maintained 9 months after entry into the program.

T-tests were used to compare baseline demographic and outcome variables for the intervention participants with those for the usual-care controls, and to compare the 4-month and 9-month dropouts from data collection with those who completed questionnaires. An analysis of covariance (ANCOVA) was used to compare the intervention effect (treatment versus control) at 4 months after controlling for the baseline value of the outcome variable. All analyses were performed using only cases with data at 4 months and with all cases. The all-case analysis assumed that those not completing the 4-month questionnaires had no change from baseline (intent-to-treat). The sample was then segmented by language group, English or Spanish, and analyzed separately. The analyses were repeated to look at outcomes at 9 months after controlling for baseline values. Means and percentages were computed on all use and usefulness variables.
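The baseline-adjusted comparison and the no-change substitution for missing follow-up data can be sketched in a few lines. This is an illustrative reimplementation in Python, not the authors' SAS code; the function names are hypothetical. With one covariate and two groups, the adjusted treatment effect equals the difference in follow-up means minus the pooled within-group slope times the difference in baseline means.

```python
from statistics import mean


def carry_baseline_forward(baseline, followup):
    """Intent-to-treat fill: a missing follow-up value (None) is set to the
    baseline value, i.e. the participant is assumed to have no change."""
    return [b if f is None else f for b, f in zip(baseline, followup)]


def ancova_treatment_effect(baseline, followup, treated):
    """Baseline-adjusted treatment-minus-control difference at follow-up.

    Equivalent to the treatment coefficient from regressing follow-up on
    baseline and a treatment indicator (one covariate, two groups).
    """
    groups = {True: ([], []), False: ([], [])}
    for x, y, t in zip(baseline, followup, treated):
        groups[bool(t)][0].append(x)
        groups[bool(t)][1].append(y)

    sxy = sxx = 0.0
    means = {}
    for key, (xs, ys) in groups.items():
        mx, my = mean(xs), mean(ys)
        means[key] = (mx, my)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)

    slope = sxy / sxx  # pooled within-group slope of follow-up on baseline
    (mx_t, my_t), (mx_c, my_c) = means[True], means[False]
    return (my_t - my_c) - slope * (mx_t - mx_c)
```

Running `ancova_treatment_effect` once on completers only and once after `carry_baseline_forward` mirrors the two analyses described above (cases with 4-month data versus all cases).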

To establish the clinical meaningfulness of statistically significant changes, we also examined how many of the 7 health status outcomes improved by an effect size of ≥0.30 (defined as the change score over the pooled SD of the baseline score) at 4 months for each individual. Fischer et al found that an effect size of 0.30 was personally important for people with chronic conditions (21). We also identified participants with ≥3 improvements of an effect size of ≥0.30. Three or more improvements is the criterion we have used in previous studies (34, 35). Treatment participants were compared with control participants using t-tests for the sum of improvements. Chi-square tests were computed to compare treatment and control improvements of an effect size of ≥0.30, and participants who had ≥3 versus <3 improvements of an effect size of 0.3. All data analyses were performed using SAS, version 9.1 (36).
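The per-participant criterion can be illustrated with a small sketch. The function names, the example participant, and the rounded SD values below are hypothetical and are not study data; the sketch assumes, as in Table 2, that lower scores are better on all 7 health status measures, so an improvement is a negative change.

```python
def count_improvements(changes, baseline_sds, threshold=0.30):
    """Number of outcomes on which one participant improved by >= threshold.

    changes: follow-up minus baseline for each outcome (lower is better,
    so an improvement is a negative change).
    baseline_sds: pooled baseline SD of each outcome, in the same order.
    """
    return sum(
        1
        for change, sd in zip(changes, baseline_sds)
        if -change / sd >= threshold
    )


def clinically_improved(changes, baseline_sds, min_outcomes=3):
    """True if the participant meets the >= 3 improvements criterion."""
    return count_improvements(changes, baseline_sds) >= min_outcomes


# Hypothetical participant; SDs are illustrative round numbers only.
# Order: distress, activity limitation, general health, disability,
# depression, pain, fatigue.
example_sds = [1.35, 1.10, 0.95, 0.49, 6.5, 2.27, 2.58]
example_changes = [-0.5, -0.4, 0.0, -0.1, -1.0, -1.0, -0.5]
n_improved = count_improvements(example_changes, example_sds)
```

Summing `count_improvements` over participants gives the quantity compared between groups with t-tests, and `clinically_improved` gives the binary outcome compared with chi-square tests.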



After 921 participants completed informed consent forms and baseline questionnaires, they were randomized to usual-care control (n = 463) and intervention (n = 458) groups. Of these, 414 (89%) usual-care and 359 (78%) intervention participants completed 4-month questionnaires. Nine-month questionnaires were completed by 648 (70%) participants (Figure 1).

Figure 1. Flow chart of study participants. * The large number of Spanish speakers who left contact information but then did not participate reflects greater interest in the Spanish-language arm of the intervention than available program space. Participation was offered on a first-come, first-served basis. Sp = Spanish; Eng = English.

The mean ages of the participants were 54.3 years for the intervention group and 53.4 years for the control group. The mean numbers of years of education were 13.6 for the intervention group and 13.9 for the control group. Approximately 15% of both groups were men, and ∼50% were married (Table 1). There were no significant differences in any of the demographic variables between the control and intervention groups. Means were also comparable between language groups, although Spanish speakers had 2–3 fewer years of education than English speakers. A total of 546 participants (51%) reported having osteoarthritis, 238 (33%) reported having rheumatoid arthritis, 441 (30%) reported having fibromyalgia, and 120 (13.5%) reported having other arthritic conditions, including those who met the CDC criteria for chronic joint symptoms. Of the English-speaking participants, 14–16% reported having fibromyalgia alone; far fewer Spanish-speaking participants (4–5%) did so. The average number of comorbidities was 1.

Table 1. Baseline demographic and disease variables*

| Variable | Total: Treatment (n = 458) | Total: Control (n = 463) | English speakers: Treatment (n = 294) | English speakers: Control (n = 292) | Spanish speakers: Treatment (n = 164) | Spanish speakers: Control (n = 171) |
| --- | --- | --- | --- | --- | --- | --- |
| Age, mean ± SD (median) years (range 18–95) | 54.3 ± 12.2 (54.4) | 53.4 ± 12.3 (53.8) | 56.3 ± 11.7 (55.9) | 55.3 ± 12.0 (55.7) | 51.1 ± 12.3 (49.7) | 50.2 ± 12.0 (51.8) |
| Education, mean ± SD years (range 2–23) | 13.6 ± 3.83 | 13.9 ± 3.80 | 14.8 ± 2.98 | 15.1 ± 2.99 | 11.6 ± 4.31 | 12.0 ± 4.25 |
| Non-Hispanic white | 43.7 | 43.6 | 68.0 | 69.2 | 0 | 0 |
| Non-Hispanic African American | 17.2 | 16.6 | 26.9 | 26.4 | 0 | 0 |
| Rheumatoid arthritis | 33.8 | 32.9 | 21.1 | 22.3 | 56.7 | 50.9 |
| Only fibromyalgia | 10.5 | 11.9 | 13.9 | 15.8 | 4.3 | 5.3 |
| Chronic joint symptoms | < | | | | | |
| Any other arthritic condition | 17.5 | 17.8 | 14.0 | 13.7 | 25.0 | 25.7 |
| Comorbidities, mean ± SD (range 0–6) | 1.15 ± 1.22 | 1.30 ± 1.30 | 1.33 ± 1.26 | 1.52 ± 1.39 | 0.49 ± 0.50 | 0.51 ± 0.50 |

* Values are the percentage unless otherwise indicated.


Table 2 shows the means at baseline for the outcome variables. Only communication with a physician and nights in the hospital were significantly different between experimental and control participants (P = 0.019 and 0.032, respectively). After applying Bonferroni corrections, neither was significantly different.

Table 2. Baseline values of outcome variables*

| Variable | Range, desirable direction | Total: Treatment (n = 458) | Total: Control (n = 463) | English speakers: Treatment (n = 294) | English speakers: Control (n = 292) | Spanish speakers: Treatment (n = 164) | Spanish speakers: Control (n = 171) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Health distress | 0–5, ↓ | 2.72 ± 1.36 | 2.71 ± 1.33 | 2.39 ± 1.30 | 2.32 ± 1.23 | 3.33 ± 1.24 | 3.37 ± 1.22 |
| Activity limitation | 0–4, ↓ | 2.04 ± 1.14 | 2.06 ± 1.07 | 1.98 ± 1.15 | 2.13 ± 1.06 | 2.14 ± 1.10 | 1.96 ± 1.07 |
| General health | 1–5, ↓ | 3.33 ± 0.957 | 3.44 ± 0.943 | 3.11 ± 0.890 | 3.30 ± 0.906 | 3.73 ± 0.942 | 3.66 ± 0.965 |
| Disability | 0–3, ↓ | 0.652 ± 0.496 | 0.636 ± 0.489 | 0.561 ± 0.455 | 0.598 ± 0.453 | 0.840 ± 0.513 | 0.727 ± 0.528 |
| Depression (PHQ scale) | 0–27, ↓ | 10.2 ± 6.59 | 9.79 ± 6.41 | 9.80 ± 6.79 | 9.41 ± 6.41 | 10.8 ± 6.19 | 10.4 ± 6.36 |
| Pain, visual numeric scale | 0–10, ↓ | 6.93 ± 2.26 | 6.92 ± 2.27 | 6.60 ± 2.16 | 6.69 ± 2.11 | 7.54 ± 2.28 | 7.30 ± 2.47 |
| Fatigue, visual numeric scale | 0–10, ↓ | 6.39 ± 2.60 | 6.43 ± 2.55 | 6.17 ± 2.57 | 6.42 ± 2.36 | 6.79 ± 2.61 | 6.44 ± 2.85 |
| Arthritis self-efficacy | 1–10, ↑ | 5.12 ± 2.24 | 5.19 ± 2.26 | 5.05 ± 2.20 | 5.10 ± 2.13 | 5.27 ± 2.31 | 5.34 ± 2.46 |
| Aerobic exercise, minutes/week | | 85.5 ± 101 | 95.0 ± 109 | 93.3 ± 111 | 95.9 ± 104 | 71.4 ± 78.1 | 93.6 ± 118 |
| Range of motion exercise, minutes/week | | 41.0 ± 52.9 | 43.9 ± 55.3 | 46.0 ± 54.8 | 45.8 ± 55.1 | 32.0 ± 48.1 | 40.1 ± 55.4 |
| Communication with a doctor | 0–5, ↑ | 2.70 ± 1.32 | 2.90 ± 1.30 | 3.01 ± 1.24 | 3.12 ± 1.21 | 2.14 ± 1.27 | 2.52 ± 1.37 |
| Physician visits, past 4 months | 0–50 | 3.83 ± 5.09 | 4.38 ± 5.87 | 4.60 ± 5.75 | 5.48 ± 6.81 | 2.46 ± 3.20 | 2.53 ± 2.96 |
| Emergency department visits, past 4 months | 0–20 | 0.359 ± 1.22 | 0.313 ± 0.832 | 0.399 ± 1.44 | 0.251 ± 0.718 | 0.287 ± 0.716 | 0.415 ± 0.987 |
| Hospitalizations, past 4 months | 0–3 | 0.103 ± 0.351 | 0.148 ± 0.550 | 0.099 ± 0.342 | 0.071 ± 0.270 | 0.110 ± 0.368 | 0.275 ± 0.812 |
| Nights in the hospital, past 4 months | 0–12 | 0.273 ± 1.21 | 0.657 ± 3.64 | 0.344 ± 1.457 | 0.502 ± 3.30 | 0.146 ± 0.589 | 0.918 ± 4.15 |

* Values are the mean ± SD unless otherwise indicated. PHQ = Patient Health Questionnaire.
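The Bonferroni adjustment applied to the baseline comparisons can be sketched as follows. The number of comparisons is not stated for this step; the sketch assumes a family of 15 tests, one for each outcome variable, and the function name is hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05, n_tests=None):
    """Flag which P values remain significant after Bonferroni correction.

    n_tests is the size of the family of comparisons; it defaults to the
    number of P values supplied.
    """
    n = n_tests if n_tests is not None else len(p_values)
    threshold = alpha / n
    return [p < threshold for p in p_values]


# Baseline P values for communication with a physician and nights in the
# hospital, tested against an assumed family of 15 outcome comparisons.
flags = bonferroni_significant([0.019, 0.032], n_tests=15)
```

Under this assumption the per-test threshold is 0.05/15, which is well below both reported P values, in line with the statement that neither difference remained significant.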

Instrument reliability.

When Cronbach's alphas were computed for multi-item health status instruments, the results were consistent with previous results (23–27). For English speakers, the Cronbach's alphas were 0.93, 0.86, 0.91, and 0.92 for activities limitation, HAQ disability, health distress, and self-efficacy, respectively. For Spanish speakers, the Cronbach's alphas were 0.91, 0.87, 0.88, and 0.95 for activities limitation, HAQ disability, health distress, and self-efficacy, respectively. The newer Patient Health Questionnaire depression scale had an internal consistency reliability of 0.90 in English and 0.87 in Spanish.
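For readers unfamiliar with the statistic, Cronbach's alpha for a k-item scale is k/(k − 1) multiplied by 1 minus the ratio of the summed item variances to the variance of the total score. A minimal sketch follows; the function name and input layout are assumptions made for illustration, and no study data are reproduced.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a multi-item scale.

    item_scores: one list per item, each holding that item's scores for
    the same respondents in the same order.
    """
    k = len(item_scores)
    item_variance_sum = sum(variance(item) for item in item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))
```

Items that move together across respondents push the total-score variance well above the sum of the item variances, which is what drives alpha toward 1.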


Those who failed to complete the 4-month questionnaire were younger and more likely to be English speakers than those who completed the questionnaires (P < 0.01). They also had significantly higher levels of depression and fatigue and did less aerobic exercise (P = 0.01, 0.03, and 0.01, respectively).

When the 4-month treatment noncompleters (n = 99) were compared with 4-month control noncompleters (n = 49), the controls were more likely to be Spanish speakers (P = 0.021) and have higher health distress at baseline than the treatment noncompleters (P = 0.024). All other outcome variables were similar at baseline for treatment and control participants.

When the 9-month treatment dropouts (n = 131) were compared with control dropouts (n = 143) at baseline, the only significant differences were that the treatment dropouts were more likely to be non-Hispanic African Americans and English speakers (P = 0.035). There were no significant differences in the outcome variables.

Four-month outcomes.

Changes in all health status variables were in the hypothesized direction, with 6 of 7 variables being significantly different between treatment and control groups after correcting for multiple comparisons. Changes in all 3 health behaviors were also in the hypothesized direction and statistically significant, as was the change in self-efficacy. There were no significant differences in medical care utilization variables. When ANCOVAs were rerun using intent-to-treat methodology (last value substituting for missing data), the results were nearly identical. Table 3 shows the mean change scores from the actual (non-missing) cases and the associated P values. When we segmented the sample by language group, the group results were substantially the same as the overall group (Table 3).

Table 3. Four-month change scores*

| Variable | Total: Treatment (n = 359) | Total: Control (n = 414) | Total: P† | English speakers: Treatment (n = 215) | English speakers: Control (n = 261) | English speakers: P† | Spanish speakers: Treatment (n = 144) | Spanish speakers: Control (n = 153) | Spanish speakers: P† |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Health distress | −0.529 ± 1.16 | −0.092 ± 1.05 | < 0.001 | −0.391 ± 1.02 | 0.045 ± 0.988 | < 0.001 | −0.734 ± 1.33 | −0.325 ± 1.10 | < 0.001 |
| Activity limitation | −0.389 ± 0.965 | −0.060 ± 0.823 | < 0.001 | −0.344 ± 0.864 | −0.034 ± 0.789 | 0.003 | −0.457 ± 1.10 | −0.103 ± 0.878 | < 0.001 |
| General health | −0.059 ± 0.759 | 0.00 ± 0.719 | 0.039 | −0.024 ± 0.725 | 0.019 ± 0.708 | 0.253 | −0.182 ± 0.793 | −0.032 ± 0.738 | 0.082 |
| Disability | −0.100 ± 0.357 | 0.026 ± 0.343 | < 0.001 | −0.059 ± 0.300 | 0.031 ± 0.294 | < 0.001 | −0.160 ± 0.422 | 0.016 ± 0.379 | < 0.001 |
| Depression (PHQ scale) | −1.45 ± 5.07 | 0.166 ± 4.30 | < 0.001 | −1.35 ± 4.68 | 0.354 ± 4.03 | < 0.001 | −1.60 ± 5.61 | −0.150 ± 4.72 | 0.015 |
| Pain, visual numeric scale | −1.23 ± 2.26 | −0.488 ± 2.02 | < 0.001 | −1.01 ± 1.99 | −0.355 ± 1.85 | < 0.001 | −1.55 ± 2.58 | −0.712 ± 2.26 | 0.003 |
| Fatigue, visual numeric scale | −0.807 ± 2.38 | −0.237 ± 2.17 | < 0.001 | −0.451 ± 2.05 | −0.050 ± 1.960 | 0.002 | −1.33 ± 2.72 | −0.556 ± 2.46 | 0.015 |
| Arthritis self-efficacy | 0.837 ± 2.28 | 0.088 ± 2.07 | < 0.001 | 0.902 ± 2.18 | 0.046 ± 1.90 | < 0.001 | 0.740 ± 2.43 | 0.159 ± 2.33 | 0.034 |
| Aerobic exercise, minutes/week | 40.2 ± 119 | −7.37 ± 103 | < 0.001 | 31.7 ± 113 | −11.0 ± 94.2 | < 0.001 | 52.8 ± 126 | −1.18 ± 117 | 0.001 |
| Range of motion exercise, minutes/week | 24.2 ± 68.1 | 3.05 ± 62.2 | < 0.001 | 11.9 ± 59.5 | −2.88 ± 55.2 | < 0.001 | 42.6 ± 75.9 | 13.1 ± 71.76 | 0.002 |
| Communication with a doctor | 0.262 ± 1.15 | −0.010 ± 1.05 | 0.017 | 0.213 ± 1.03 | −0.010 ± 0.963 | 0.046 | 0.333 ± 1.32 | −0.011 ± 1.19 | 0.194 |
| Physician visits, past 4 months | 0.009 ± 4.77 | −0.362 ± 4.62 | 0.987 | −0.010 ± 5.53 | −0.400 ± 5.37 | 0.789 | 0.007 ± 3.40 | −0.300 ± 3.05 | 0.451 |
| Emergency department visits, past 4 months | −0.057 ± 0.891 | −0.018 ± 0.982 | 0.461 | −0.048 ± 0.947 | 0.036 ± 1.08 | 0.689 | −0.069 ± 0.8080 | −0.105 ± 0.804 | 0.601 |
| Hospitalizations, past 4 months | 0.063 ± 1.01 | −0.012 ± 0.576 | 0.527 | 0.101 ± 1.26 | 0.024 ± 0.401 | 0.251 | 0.007 ± 0.418 | −0.072 ± 0.779 | 0.563 |
| Nights in the hospital, past 4 months | 0.140 ± 2.21 | −0.086 ± 4.84 | 0.638 | 0.126 ± 2.45 | 0.071 ± 5.12 | 0.989 | 0.159 ± 1.81 | −0.346 ± 4.33 | 0.470 |

* Values are the change ± SD. PHQ = Patient Health Questionnaire.
† From analyses of covariance.

All outcome analyses were repeated using change scores and simple t-tests instead of ANCOVAs. Statistically significant improvements were identical, although the outcome for communication with a doctor was stronger using t-tests (P < 0.001) rather than ANCOVAs (P = 0.017). As noted, the differences between the treatment and control groups for communication with a doctor at baseline were not statistically significant after taking into account multiple comparisons.

Nine-month outcomes.

After 4 months of waiting, control participants were sent the same intervention materials (the Arthritis Self-Management Tool Kit) as treatment participants. All participants were asked to complete a questionnaire 9 months after receiving the materials. Of those originally randomized, 70% completed the questionnaire (Table 4). When t-tests were used to compare baseline values with the values of the outcome variables at 9 months, the 7 health indicators, the 3 health behaviors, and arthritis self-efficacy were significantly improved at 9 months (Table 4). There were no significant changes in health care utilization after taking into account multiple comparisons, but emergency department visits were marginally reduced (P = 0.011). The statistically significant tests using intent-to-treat methodology were nearly identical to the same tests using only those who completed 9-month questionnaires. When the data were broken down by language group, once again Spanish and English speakers showed the same results as the overall sample, with the exception that overall general health was not significantly improved for English speakers.

Table 4. Nine-month change scores*

| Variable | Total (n = 648): Change ± SD | Total: P* | English speakers (n = 398): Change ± SD | English speakers: P* | Spanish speakers (n = 250): Change ± SD | Spanish speakers: P* |
| --- | --- | --- | --- | --- | --- | --- |
| Health distress | −0.580 ± 1.15 | < 0.001 | −0.432 ± 1.01 | < 0.001 | −0.813 ± 1.32 | < 0.001 |
| Activity limitation | −0.460 ± 1.01 | < 0.001 | −0.445 ± 0.925 | < 0.001 | −0.482 ± 1.13 | < 0.001 |
| General health | −0.120 ± 0.772 | < 0.001 | −0.030 ± 0.739 | 0.413 | −0.260 ± 0.802 | < 0.001 |
| Disability | −0.106 ± 0.388 | < 0.001 | −0.076 ± 0.347 | < 0.001 | −0.163 ± 0.439 | < 0.001 |
| Depression (Patient Health Questionnaire scale) | −1.96 ± 5.10 | < 0.001 | −1.94 ± 4.94 | < 0.001 | −2.01 ± 5.36 | < 0.001 |
| Pain, visual numeric scale | −1.27 ± 2.34 | < 0.001 | −1.09 ± 2.06 | < 0.001 | −1.54 ± 2.69 | < 0.001 |
| Fatigue, visual numeric scale | −0.859 ± 2.52 | < 0.001 | −0.698 ± 2.06 | < 0.001 | −1.11 ± 3.08 | < 0.001 |
| Arthritis self-efficacy | 0.869 ± 2.28 | < 0.001 | 0.941 ± 2.22 | < 0.001 | 0.755 ± 2.38 | < 0.001 |
| Aerobic exercise, minutes/week | 32.1 ± 114 | < 0.001 | 23.0 ± 98.6 | < 0.001 | 46.7 ± 133 | < 0.001 |
| Range of motion exercise, minutes/week | 20.3 ± 66.0 | < 0.001 | 13.9 ± 59.6 | < 0.001 | 30.4 ± 73.9 | < 0.001 |
| Communication with a doctor | 0.304 ± 1.13 | < 0.001 | 0.320 ± 1.06 | < 0.001 | 0.280 ± 1.22 | < 0.001 |
| Physician visits, past 4 months | −0.268 ± 4.97 | 0.157 | −0.432 ± 5.64 | 0.154 | −0.016 ± 3.70 | 0.946 |
| Emergency department visits, past 4 months | −0.079 ± 0.783 | 0.011 | −0.065 ± 0.803 | 0.113 | −0.100 ± 0.751 | 0.036 |
| Hospitalizations, past 4 months | 0.009 ± 0.565 | 0.673 | 0.065 ± 0.572 | 0.026 | −0.076 ± 0.543 | 0.028 |
| Nights in the hospital, past 4 months | 0.047 ± 2.41 | 0.622 | 0.204 ± 2.89 | 0.166 | −0.196 ± 1.33 | 0.021 |

* From t-tests comparing the change score with zero change.
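The test of each 9-month change score against zero can be approximated from summary statistics alone. The sketch below is illustrative rather than the authors' SAS procedure: it uses a normal approximation to the t distribution, which is close for samples of several hundred, and it assumes that every participant in a group contributed to every variable. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def change_vs_zero(mean_change, sd_change, n):
    """Test statistic and two-sided P for H0: mean change = 0.

    Uses the normal approximation to the t distribution.
    """
    t_stat = mean_change / (sd_change / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# Summary values as reported for the 9-month change scores.
t_distress, p_distress = change_vs_zero(-0.580, 1.15, 648)  # total sample
t_health, p_health = change_vs_zero(-0.030, 0.739, 398)     # English speakers
```

Because the inputs are rounded and the per-variable sample sizes may be smaller than the group totals, the resulting P values should be expected to approximate, not reproduce, the published ones.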

Additional analyses.

Clinically meaningful change.

Statistical improvements of an effect size of ≥0.30 for ≥3 of the 7 health indicators were considered clinically meaningful. At 4 months, 55% of treatment participants met this criterion compared with 34% of control participants (chi-square P < 0.001). The mean number of improvements at 4 months of an effect size of ≥0.30 among the 7 health indicators was 2.9 for the treatment participants and 2.0 for the usual-care control participants (P < 0.001). Table 5 shows the proportion of treatment and control participants who improved by an effect size of ≥0.30 for each of the 7 health indicators at 4 months. After taking into account multiple comparisons, the difference between treatment and control group participants is significant for 6 of the 7 outcomes.

Table 5. Percentage of participants who improved by effect sizes of ≥0.30 for each health indicator variable (4 months)

| Variable | Treatment | Control | P (treatment vs. control) |
| --- | --- | --- | --- |
| Health distress | 52.6 | 33.2 | < 0.001 |
| Activity limitation | 46.9 | 26.9 | < 0.001 |
| General health | 24.5 | 18.0 | 0.029 |
| Disability | 34.9 | 20.6 | < 0.001 |
| Depression (Patient Health Questionnaire scale) | 23.8 | 15.4 | 0.003 |
| Pain, visual numeric scale | 59.4 | 45.1 | < 0.001 |
| Fatigue, visual numeric scale | 51.1 | 38.2 | < 0.001 |
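The chi-square comparison of the proportions meeting the ≥3 improvements criterion (55% of treatment versus 34% of control participants) can be sketched as follows. The cell counts are not reported, so the sketch reconstructs approximate counts from the percentages and the numbers of 4-month completers (359 and 414); these counts and the function name are assumptions made for illustration.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table
    [[a, b], [c, d]]; returns (statistic, P)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value


# Counts reconstructed from the reported percentages (assumption):
# about 55% of 359 treatment and 34% of 414 control participants.
treat_yes, treat_n = round(0.55 * 359), 359
ctrl_yes, ctrl_n = round(0.34 * 414), 414
stat, p = chi_square_2x2(treat_yes, treat_n - treat_yes,
                         ctrl_yes, ctrl_n - ctrl_yes)
```

The same function applies row by row to Table 5 once each percentage is converted to an approximate count in the same way.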

Use of the materials.

Four months after receiving the Tool Kit, participants were asked how they used the materials and what they found useful. Only 3% of participants reported not using any of the materials.

Participants who used the Tool Kit found it useful (mean ± SD 7.40 ± 2.51). The proportion of the participants who used the different types of materials at least some of the time varied from 95% for the book to 69% for the exercise CD. The book was rated the most useful of the materials, with only 3% of the participants finding it not useful. In contrast, the exercise and relaxation CDs were not useful for 19% and 13% of the participants, respectively. The Spanish speakers were the most likely to find every component of the Tool Kit helpful.


CDC-funded state arthritis programs are not required, nor do they have the resources, to participate in research. Consequently, it is not surprising that they had limited success in recruiting participants. This finding contributed to the CDC decision to embed the dissemination of arthritis self-management intervention programs in existing program delivery systems like Arthritis Foundation chapters, and to reconsider how programs could be promoted through public health systems.

The hypotheses that there would be improvement in all outcome variables at 4 and 9 months after receiving the materials were partially confirmed. There were significant improvements in health status variables, health behaviors, and arthritis self-efficacy, but not in health care utilization.

The results were similar for Spanish-speaking participants and English-speaking non-Hispanic whites, although there was insufficient power among non-Hispanic African Americans to adequately test the significance of the changes in outcomes. All changes except for the change in self-reported general health were in the expected direction at both 4 and 9 months and are comparable with the findings of Goeppinger and colleagues among a largely African American sample (15).

These findings suggest that an arthritis self-management intervention, packaged in a Tool Kit and sent to participants in a single mailing, was as effective as previously studied intervention delivery modalities (37). A single mailing reached younger participants and better educated Spanish speakers, both with fewer comorbidities, than the small-group ASMP (15, 38).

Based on the results of this study, Arthritis Foundation chapters, public health agencies, and health care practitioners can confidently promote both the small-group ASMP and mail-delivered Tool Kit interventions (Arthritis Self-Management Tool Kit; Bull Publishing Company, Boulder, CO). The Tool Kit reached not only working-age populations where small-group interventions may be less feasible, but also a better educated Spanish-speaking sample than in earlier effectiveness trials. The Tool Kit may also be useful for organizations with limited financial or programmatic resources for dissemination. The availability and efficacy of the Tool Kit in Spanish and English also enhances its value. The CDC Arthritis Program has listed the Tool Kit as a “promising practice” and allows funded state health departments to use federal monies to support dissemination.

The main limitations of this study are methodologic. Our inability to enroll the targeted number of African American participants suggests that our recruitment methods may have been inappropriate or the mailed Tool Kit format unappealing. Barriers to research participation by African Americans are well documented and may also have contributed (39). A future study of the program might target non-Hispanic African Americans and use the recruitment strategies we found most successful.

Because participants could not be blinded to the intervention, we cannot rule out the possibility of an attention effect. It is unlikely, however, that a one-time mailing would have sustained attention effects for 9 months. We also cannot entirely rule out that differential health among noncompleters of questionnaires had an effect on the results, particularly at 9 months, when there was no longer a randomized control group. At 4 months, the relative similarity at baseline between the treatment and control noncompleters suggests that this is unlikely to be a serious problem. Nine-month results must be interpreted more cautiously.

Examining 15 outcome variables increased the risk of Type I error, but the overall consistency of the effectiveness results suggests that Type I error was unlikely. The differences in change scores at 4 months and 9 months for all but medical utilization measures favored the treatment group, and all scores except self-reported general health and communication with a physician at 4 months were significant at P < 0.001.
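As a rough illustration of why testing 15 outcomes inflates the risk of Type I error, the sketch below computes the family-wise error rate and a Bonferroni-adjusted threshold. The per-test alpha of 0.05 and the independence of the 15 tests are assumptions made for illustration only; neither is stated in the study, and the function names are hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise rate near alpha."""
    return alpha / n_tests


N_OUTCOMES = 15  # number of outcome variables examined in the study
ALPHA = 0.05     # ASSUMPTION: conventional per-test alpha, not given in the text

fwer = familywise_error_rate(ALPHA, N_OUTCOMES)
threshold = bonferroni_threshold(ALPHA, N_OUTCOMES)

print(f"Family-wise error rate (independent tests): {fwer:.3f}")
print(f"Bonferroni per-test threshold: {threshold:.4f}")
```

Under these assumptions the uncorrected family-wise rate is a little over one half, while the Bonferroni threshold is about 0.0033. Results significant at P < 0.001 would therefore remain significant even after this conservative correction.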

Using ANCOVAs to present 4-month results controlling for baseline values of outcome variables was a more conservative approach than using t-tests, although it did not affect any variables except for communication with a doctor. The change in this variable was statistically significant in both the English and Spanish subsamples using t-tests, but not when using ANCOVAs.
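The difference between the two approaches can be sketched as follows, assuming the t-tests compared change scores between groups. A change-score comparison implicitly fixes the baseline slope at 1, whereas ANCOVA estimates a pooled within-group slope from the data and adjusts the follow-up difference by it. The data below are invented toy values for illustration, not study data, and the function names are hypothetical.

```python
from statistics import mean


def within_group_sums(baseline, followup):
    """Return within-group sum of squares (baseline) and cross-products."""
    bx, by = mean(baseline), mean(followup)
    sxx = sum((x - bx) ** 2 for x in baseline)
    sxy = sum((x - bx) * (y - by) for x, y in zip(baseline, followup))
    return sxx, sxy


def change_score_difference(t_base, t_follow, c_base, c_follow):
    """Treatment minus control difference in mean change (follow-up - baseline)."""
    return (mean(t_follow) - mean(t_base)) - (mean(c_follow) - mean(c_base))


def ancova_adjusted_difference(t_base, t_follow, c_base, c_follow):
    """Follow-up difference adjusted for baseline using the pooled within-group slope."""
    sxx_t, sxy_t = within_group_sums(t_base, t_follow)
    sxx_c, sxy_c = within_group_sums(c_base, c_follow)
    slope = (sxy_t + sxy_c) / (sxx_t + sxx_c)
    return (mean(t_follow) - mean(c_follow)) - slope * (mean(t_base) - mean(c_base))


# Toy data (NOT from the study): a symptom score where lower is better.
t_base, t_follow = [4, 5, 6, 7, 8], [3, 4, 4, 5, 6]
c_base, c_follow = [3, 4, 5, 6, 7], [3, 4, 5, 5, 6]

change_diff = change_score_difference(t_base, t_follow, c_base, c_follow)
ancova_diff = ancova_adjusted_difference(t_base, t_follow, c_base, c_follow)

print(f"Change-score difference: {change_diff:.2f}")
print(f"ANCOVA-adjusted difference: {ancova_diff:.2f}")
```

In this toy example the treatment group starts with worse scores, so part of its larger raw change is attributable to baseline differences, and the ANCOVA-adjusted estimate is smaller in magnitude. This is one way an outcome can be significant by t-test but not by ANCOVA, though the direction of the discrepancy depends on the data.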

It should be noted that, although statistically and even clinically significant, some improvements are modest. The question remains: do they have clinical meaning? We think that they do. The study sample is very heterogeneous on all demographic and disease variables. In addition, there were no exclusion criteria based on symptom severity. Many study participants probably enrolled with low levels of symptom severity and therefore had little room for improvement (floor effect). All of these factors influence effect sizes. Fifty-five percent of the treatment group reported improvements of an effect size of ≥0.30 for 3 or more of the 7 health indicators. Participants also reported that the materials were useful. Given these findings, it is important that the mailed Tool Kit be included in the approved list of evidence-based arthritis self-management interventions.
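The proportion of participants improving on several indicators can be computed along the lines of the sketch below. The text does not state how individual effect sizes were calculated; this sketch assumes each participant's improvement is divided by the baseline standard deviation of the indicator, which is only one common convention. The function names and the toy values are hypothetical and are not study data.

```python
from statistics import stdev


def standardized_improvements(baseline, followup, lower_is_better=True):
    """Per-participant improvement in units of the baseline standard deviation."""
    sd = stdev(baseline)
    sign = 1 if lower_is_better else -1
    return [sign * (b - f) / sd for b, f in zip(baseline, followup)]


def share_of_responders(improvements_by_indicator, threshold=0.30, min_indicators=3):
    """Fraction of participants meeting the threshold on at least min_indicators."""
    n_participants = len(improvements_by_indicator[0])
    responders = 0
    for i in range(n_participants):
        met = sum(1 for indicator in improvements_by_indicator if indicator[i] >= threshold)
        if met >= min_indicators:
            responders += 1
    return responders / n_participants


# Toy standardized improvements: 7 indicators (rows) x 4 participants (columns).
toy_improvements = [
    [0.5, 0.1, 0.4, 0.0],
    [0.3, 0.2, 0.4, 0.1],
    [0.6, 0.0, 0.1, 0.0],
    [0.0, 0.5, 0.35, 0.2],
    [0.1, 0.0, 0.0, 0.0],
    [0.2, 0.1, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.0],
]

toy_share = share_of_responders(toy_improvements)
toy_standardized = standardized_improvements([2, 4, 6], [1, 4, 7])

print(f"Share improving on 3 or more indicators: {toy_share:.2f}")
```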


All authors were involved in contributions to study conception and design, acquisition of data, or analysis and interpretation of data, and drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. Goeppinger had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.


We gratefully acknowledge the able assistance of Teresa J. Brady, PhD, our Technical Monitor at the CDC; Katy Plant, MPH, who helped develop the English-language exercise CDs and the personalized CD recordings of the information sheets included in the Arthritis Self-Management Tool Kit; Virginia González, who developed the Spanish-language information sheets; and Janice Pigg, MSN, who energetically recruited respondents for the needs assessment.