Collaboration with patients in the design of patient-reported outcome measures: Capturing the experience of fatigue in rheumatoid arthritis




Patient-reported outcome measures (PROMs) need to include concepts and language relevant to patients and be easily understood. These studies aimed to develop draft PROMs to measure rheumatoid arthritis (RA) fatigue and its impact by collaborating with patients to identify language and experiences, create draft PROM items, and test them for comprehension, with decisions supported throughout by a patient research partner.


In study 1, interview transcripts of RA patients describing their fatigue (n = 15) were subjected to content and inductive thematic analysis to identify fatigue language and experiences. In study 2, 3 focus groups of RA patients (n = 17) explored these and developed the wording for visual analog scales (VAS) and identical numerical rating scales (NRS), then a draft multi-item questionnaire was developed with the patient research partner. Study 3 comprised 15 RA patients who completed the PROMs during cognitive interviewing to explore understanding.


Studies 1 and 2 identified key patient terminology (fatigue, exhaustion) and 12 potential fatigue concepts (Cognition, Coping, Duration, Emotion, Energy, Frequency, Impact, Planning, Quality of Life, Relationships, Sleep, and Social Life). Patients' proposals were clarified into draft screening VAS/NRS for fatigue severity, effect, and coping, plus a draft 45-item questionnaire. Study 3 showed that 14 questions required clarification or revision of response options.


Collaboration with patients enabled development of draft RA fatigue PROMs grounded in the patient data, strengthening face and content validity and ensuring comprehension. The draft conceptual framework that emerged has resulted in draft PROMs ready for item reduction, and testing of construct and criterion validity and reliability.


Systematic attempts by health professionals and methodologists to standardize disease assessment in rheumatoid arthritis (RA) have been formulated through the Outcome Measures in Rheumatology Clinical Trials (OMERACT) meetings, leading to a “core set” of 8 outcomes as an international standard in RA clinical trials (1, 2). The RA “core set” did not include fatigue, yet fatigue is an integral part of RA, experienced by almost all patients (3, 4). It is an important symptom that is considered overwhelming, uncontrollable, and different from normal tiredness in severity, quality, and unpredictability, and affects every aspect of life (5–7). RA fatigue may be a complex multicausal symptom with different components such as pain, depression, and inflammation contributing in varying degrees and combinations at different times in individual patients (8).

When RA patients first participated in OMERACT 6 (2002), they raised awareness of fatigue (9), stimulating new research (6, 10–13) so that at OMERACT 8 (2006), international consensus was reached that whenever possible, fatigue should be measured in all RA studies alongside the core set, using an instrument validated for RA fatigue, including concepts patients consider essential (14).

A patient-reported outcome measure (PROM) must be valid and reliable (15). There is usually no “gold standard” with which to compare a PROM, but tests include: face validity (appears sensible), content validity (all necessary but no unnecessary items included), construct validity (results converge with or discriminate between appropriate variables), reliability (test–retest consistency in individuals with a stable condition, internal consistency between items), and sensitivity to change (in patients whose clinical condition has altered) (15–18).

However, in a systematic review of the validity and reliability of fatigue PROMs used in RA studies, 3 independent reviewers identified only 4 questionnaires that had sufficient evidence of validation against published guidelines (10) to measure RA fatigue: Functional Assessment of Chronic Illness Therapy–Fatigue (FACIT-F) (19), Multidimensional Assessment of Fatigue (MAF) (20), Profile of Mood States (POMS) (21), and Short Form 36 (SF-36) vitality subscale (22). RA fatigue and the words to describe it may differ from other conditions and may not be captured by a generic questionnaire, yet only the MAF is disease specific, and it has a high nonresponse rate (8, 23). Moreover, although the MAF was developed to be multidimensional (20), it yields a single global score, whereas the literature suggests that RA fatigue may comprise several dimensions (5–7, 24). Identifying these dimensions could inform a conceptual framework of the patient perception of RA fatigue, and the subsequent development of a multidimensional PROM that measured these concepts separately could have potential for individually targeted interventions.

Patients have an understanding of fatigue and its effect on their everyday life that only they can describe (25–27). Therefore, an understanding of the lived experience of RA fatigue could be gained through listening to patients (experts) and applied to the development of PROMs, which will then capture issues relevant to patients (15).

To our knowledge, to date there is no standardized wording for a visual analog scale (VAS) or numerical rating scale (NRS) for RA fatigue. The review (10) and updated search (28) revealed that most studies do not describe the wording, timeframe, or descriptors of their VAS, and from those that were reported, 15 different versions were identified, making it difficult to compare findings. These VAS only measured fatigue severity, yet the perceived impact of and ability to cope with RA fatigue may be as (or more) important.

The US Food and Drug Administration requires PROMs to measure relevant concepts, including concerns of patients, and recommends cognitive interviewing before finalization to ensure content validity (15). It seems almost self-evident that in order to collect accurate, relevant data, it is crucial that patients understand PROM questions in the way intended. However, tests of this understanding are rarely reported. “Think aloud” is a form of cognitive interviewing in which patients think aloud as they complete a PROM. The researcher can therefore determine whether patients have problems interpreting questions in the way intended and whether there are appropriate response options, enabling pilot PROMs to be clarified (29).

Based on the limitations of current measures and the clear need to develop a measure grounded in patient experiences, these studies aimed to develop draft PROMs to measure RA fatigue and its impact by collaborating with patients to identify language and experiences, create draft PROM items, and test them for comprehension. Study 1 explored the patient perspective, study 2 drafted fatigue items in collaboration with patients, and study 3 tested these for comprehension. All 3 studies were supported by a patient research partner on the research team, with experience of significant fatigue and previous research studies, who contributed to understanding and interpreting the results (30). A draft multi-item questionnaire and 3 simple screening VAS (with identical NRS versions) were to be developed, ready to go forward with statistical evaluation.


The Central and South Bristol Research Ethics Committee approved all of the studies (06/Q2006/104) and the United Bristol Healthcare Trust approved the original qualitative study (E5223).

Study 1 (capturing the patient perspective of fatigue).

In-depth interviews with 15 RA patients had previously been conducted to understand the fatigue experience (6). The purposive sample comprised RA patients with fatigue ≥7 of 10 and a range of characteristics. Interviews used 6 neutral questions to explore fatigue descriptions, cause, consequences, self-management, sleep, and communication. The transcripts were reanalyzed by one author (JN) who was not party to the previous detailed analysis, with the aims of exploring the language of fatigue and generating conceptual labels of the fatigue experience. Analysis was supported by the NVivo 7 data management package (31). First, transcripts were searched for fatigue language, and content analysis was used to identify and quantify descriptors (32). Second, with no prior conceptual framework for RA fatigue, transcripts were searched for descriptions of the fatigue experience and inductive thematic analysis was used to extract small units of meaning, which were given codes (labels) (32, 33). To support the rigor of thematic analysis, 3 random transcripts were independently analyzed (SH), findings were compared, and common codes agreed. Finally, similar codes were grouped together to form bigger categories, and categories were combined into broad concepts describing the experience of RA fatigue.
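The descriptor-counting step of the content analysis can be illustrated with a minimal sketch. This is not the NVivo procedure used in the study: the function name, the transcript snippets, and the descriptor list are all hypothetical, and simple whole-word matching is assumed as a stand-in for the researcher's coding judgment. It shows how the two quantities reported for each descriptor (the number of transcripts using it, and its total frequency) differ.

```python
import re
from collections import Counter


def count_descriptors(transcripts, descriptors):
    """Return (sources, frequency) Counters for each descriptor.

    sources[d]   = number of transcripts in which d appears at least once
    frequency[d] = total number of occurrences of d across all transcripts
    """
    sources = Counter()
    frequency = Counter()
    for text in transcripts.values():
        lowered = text.lower()
        for d in descriptors:
            # Whole-word match, so that "tired" does not match "retired".
            pattern = r"\b" + re.escape(d.lower()) + r"\b"
            hits = len(re.findall(pattern, lowered))
            if hits:
                sources[d] += 1
                frequency[d] += hits
    return sources, frequency


# Invented snippets for illustration only (not study data).
transcripts = {
    "P01": "I feel tired, really tired, and exhausted by the afternoon.",
    "P02": "Just weary, exhausted, as though everything has drained away.",
    "P03": "Since I retired I have no energy at all.",
}
descriptors = ["tired", "exhausted", "energy", "weary"]

sources, frequency = count_descriptors(transcripts, descriptors)
for d in descriptors:
    print(f"{d}: {sources[d]} transcript(s), {frequency[d]} occurrence(s)")
```

Automated matching of this kind can only approximate content analysis, because it cannot tell whether a word such as energy is being used to describe fatigue; in practice each hit would still need to be read in context.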

Study 2 (developing draft PROMs).

People with RA (34) ages >18 years and with English as a first language were recruited from outpatient clinics if they had a fatigue score ≥7 over the previous week on a 10-cm VAS (no fatigue/extreme fatigue) (35). A sampling frame ensured a range of ages, sexes, and occupations. Three focus groups were held and discussion and debate were facilitated (JN, SH) (36). Each group was asked to provide the words they use to describe fatigue, and then was given the fatigue descriptors arising from study 1, plus descriptors from the 4 best questionnaires (10). Descriptors from all 3 sources were then discussed by each group and ranked in order of preference for a fatigue PROM. Patients were then facilitated to discuss and generate timeframes, stem questions, and anchors for VAS for fatigue severity, impact, and coping (identical wording for NRS versions). Finally, patients discussed whether the concepts of fatigue identified in study 1 were relevant for inclusion in a multi-item questionnaire, and whether any had been omitted. Group debate and decisions were recorded on flip charts and also audio recorded and transcribed verbatim. The study team, including the patient research partner (MU), made the final decision on wording for the VAS/NRS based on the focus groups' recommendations and required measurement properties. Using data and recommendations from studies 1 and 2 and examples from the literature, a 45-item draft questionnaire was similarly developed and discussed in detail with the patient research partner before piloting (study 3).

Study 3 (cognitive interviewing).

Fifteen patients were recruited using the same criteria and method as study 2. Patients completed the 3 draft VAS/NRS and the 45-item draft questionnaire while thinking aloud, thus allowing the researcher to follow their thought processes and probe problematic wording or response options (29). Data were taped and transcribed and responses to each individual question were classified in relation to the intended meaning under recommended headings: Understanding, Retrieval (of information), Judgment (on a response), and Response (appropriate response option available) (15, 37–39).
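The classification of think-aloud responses can be sketched as a simple tally. The sketch below is an assumption about how such coding might be recorded, not the authors' procedure: the question identifiers, the example records, and the function name are hypothetical. Only the four headings come from the text.

```python
from collections import defaultdict

# The four headings under which responses were classified.
CATEGORIES = ("Understanding", "Retrieval", "Judgment", "Response")


def questions_by_category(problems):
    """Map each category to the set of questions with at least one problem.

    `problems` is an iterable of (question_id, category) pairs, one pair per
    problem observed during a cognitive interview.
    """
    by_category = defaultdict(set)
    for question_id, category in problems:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        by_category[category].add(question_id)
    return {c: by_category[c] for c in CATEGORIES}


# Hypothetical coded observations (not study data).
problems = [
    ("Q12", "Understanding"),
    ("Q12", "Understanding"),  # same problem raised by a second patient
    ("Q12", "Response"),
    ("Q30", "Judgment"),
    ("Q30", "Response"),
]

flagged = questions_by_category(problems)
all_flagged = set().union(*flagged.values())
print({c: sorted(q) for c, q in flagged.items()})
print("Questions needing revision:", sorted(all_flagged))
```

Because one question can be flagged under more than one heading, the per-category counts can sum to more than the number of distinct questions needing revision.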


Study 1.

Fifteen patients (12 women) participated in the original study (mean ± SD age 55.6 ± 14.4 years, mean ± SD disease duration 12.6 ± 8.6 years) (6). Content analysis revealed 12 fatigue descriptors (Table 1), with all of the patients using tired and most using energy, exhausted, and fatigued: “Just weary, sort of exhausted, fatigued, I dunno, just as though everything's drained away from me” (female, age 63 years). However, patients stipulated that RA fatigue was different from the tiredness they experienced before RA, implying that tired was an inappropriate descriptor: “This [RA] tiredness is just…absolute…I can't do anything else” (female, age 50 years). The patient descriptors were mapped onto the 4 questionnaires previously identified (10), which showed that several descriptors in those questionnaires were not used by these patients and may not be understood (listless, pep, sluggish, and bushed).

Table 1. Occurrence of fatigue descriptors in patient transcripts and 4 existing fatigue questionnaires (study 1)*
Descriptor | Transcript sources (n = 15) | Frequency in transcripts | Present in POMS, SF-36, MAF, or FACIT-F
Tired | 15 | 171 | POMS, SF-36
Energy (have/lack) | 9 | 33 | SF-36
Fatigued | 7 | 18 | POMS, FACIT-F, MAF
Worn out | 2 | 2 | POMS, SF-36
Different from normal tiredness | 8 | 14 | -
Zonked out | 1 | 1 | -
Listless | 0 | 0 | POMS, FACIT-F
Pep (full of) | 0 | 0 | SF-36

* POMS = Profile of Mood States; SF-36 = Short Form 36; MAF = Multidimensional Assessment of Fatigue; FACIT-F = Functional Assessment of Chronic Illness Therapy–Fatigue.

Inductive thematic analysis identified 240 detailed codes that could be combined into larger categories (n = 25), which were then mapped onto the 4 questionnaires. Fourteen categories that patients specifically related to fatigue and its impact were not represented in the existing questionnaires: acceptance, importance, loss of control, cause, cognition, depression, duration, onset, medication, mood, planning, quality of life, sleep, and family life. The remaining 11 patient categories appeared in one or more questionnaires, but none contained all 11. Furthermore, the detailed transcript analysis indicated that even where they were present, 4 categories (coping, emotion, impact, and social life) were insufficiently covered. On reverse mapping, one item in an existing questionnaire was not found in any patient transcript (too tired to eat [FACIT-F]). Another category (energy) used in the existing questionnaires had not been coded during analysis of the patient transcripts because it was the opposite of fatigue. However, lack of energy may be termed fatigue and on reexamination, energy was found in 12 transcripts. Because the aim was to measure the level and impact of RA fatigue, 4 unrelated categories were discarded (acceptance, causes, medication, and importance). The remaining 21 categories, plus the new category energy, were then grouped into broad concepts (n = 12): Impact, Sleep, Relationships, Coping, Frequency, Cognition, Emotion, Duration, Social Life, Planning, Quality of Life, and Energy (Figure 1).
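The category bookkeeping in this paragraph can be checked with a short worked calculation. The sketch uses only the counts and category names stated above; the variable names are illustrative, and "cause" and "causes" are assumed to refer to the same category.

```python
TOTAL_CATEGORIES = 25

# The 14 categories not represented in the existing questionnaires.
unrepresented = {
    "acceptance", "importance", "loss of control", "cause", "cognition",
    "depression", "duration", "onset", "medication", "mood", "planning",
    "quality of life", "sleep", "family life",
}

# The 4 categories discarded as unrelated to the level and impact of fatigue.
discarded = {"acceptance", "cause", "medication", "importance"}

# The 12 broad concepts that the retained categories were grouped into.
concepts = [
    "Impact", "Sleep", "Relationships", "Coping", "Frequency", "Cognition",
    "Emotion", "Duration", "Social Life", "Planning", "Quality of Life",
    "Energy",
]

# Every discarded category comes from the unrepresented group.
assert discarded <= unrepresented

summary = {
    "represented_in_questionnaires": TOTAL_CATEGORIES - len(unrepresented),
    "retained_after_discard": TOTAL_CATEGORIES - len(discarded),
    "grouped_including_energy": TOTAL_CATEGORIES - len(discarded) + 1,
    "broad_concepts": len(concepts),
}
print(summary)
```

The figures reconcile: 25 categories less 14 unrepresented leaves the 11 that appeared in at least one questionnaire, and 25 less 4 discarded leaves 21, which with the added energy category gives 22 categories grouped into 12 concepts.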

Figure 1. Broad categories merging to form key concepts, exemplified by quotations (study 1).

The Impact of fatigue on every area of life was considerable (Figure 1). Patients felt unable to achieve their optimum physical and mental function (a particular problem for those in paid employment), lacked enthusiasm, and had to modify expectations and adopt unwelcome, dependent roles. Poor Sleep and waking unrefreshed led to daytime fatigue and needing to sleep in the day, which was generally unavoidable, but which some found restorative. Relationships could be strained because fatigue led to lack of patience, irritability, and frustration. One patient described how she had to make a choice between going out with her boyfriend, or staying in and having sex with him, as she did not have energy to do both. Patients felt too fatigued to entertain or fell asleep while socializing, causing embarrassment and guilt. Coping with fatigue was largely managed by pacing, but some found nothing helped. The Frequency of episodes of fatigue varied from daily to weekly or less often. Many patients experienced difficulties with Cognition when fatigued, such as poor memory and inability to assimilate information or problem solve. The reduced ability to participate produced a negative effect on Emotion, with patients feeling angry, frustrated, upset, and embarrassed, which they attributed directly to their fatigue. Fatigue varied in Duration from short periods to permanent fatigue. Fatigue limited Social Life in the amount of time spent with friends, and lack of Energy meant patients were unable to accomplish what they wanted to. Planning was difficult due to the unpredictability of fatigue, which had a particular impact on social life. Quality of Life was adversely affected by fatigue from a global perspective. These potential descriptors and 12 fatigue concepts were then debated by the focus groups (study 2).

Study 2.

Eleven female and 6 male patients participated (n = 2 ages ≤39 years, n = 8 ages 40–59 years, n = 7 ages ≥60 years, mean ± SD fatigue 8.05 ± 0.98). From a combined list of fatigue descriptors from the 3 sources (see the Patients and Methods section), each group reached a consensus on their top 5 fatigue descriptors (Table 2). Exhausted was ranked first by all of the groups, whereas tired was selected by 2 groups (ranked second and fourth). Drained and worn out were selected by every group.

Table 2. Top 5 fatigue descriptors ranked by focus groups (study 2)
Rank | Group 1 | Group 2 | Group 3
3 | Worn out | Drained | Worn out
4 | Tired | No energy | Drained
5 | Weary | Worn out | Heaviness

Each group created screening VAS/NRS for fatigue severity, impact, and coping. Patients debated a range of time scales, reflecting that fatigue changes within a day as well as within a week. Patients wished to capture a period representative of the routine of daily life; therefore, all of the groups reached consensus on 7 days as an appropriate period for which they would be able to remember their activities and fatigue levels.

In discussions regarding the wording of the stem questions, patients clearly distinguished between severity, effect, and coping with fatigue, and sought phrases that most people with RA would understand and that would reflect that the fatigue was due to arthritis. Debate regarding the anchors centered on the words exhaustion, energy, and fatigue, and all of the groups decided that tired was inadequate to reflect the RA experience:

“I think every person gets tired whether you're RA or not, but not everyone gets fatigue, know what I mean? A healthy person wouldn't come home at the end of the day and be fatigued, they might be tired but they won't have fatigue” (male, age 41 years).

All of the groups agreed that the word effect was an appropriate descriptor of fatigue impact, with 2 groups placing it in both the stem question and anchor. However, 2 groups felt that for the coping VAS, a distinction should be made between managing and coping. They suggested that manage related to practical issues, and cope concerned their emotions:

“There is definitely a difference there because you can feel like you might have managed your fatigue in the last week reasonably well but you haven't actually coped with it very well” (female, age 39 years).

One group designed a VAS using manage, one used cope, and one designed a VAS for each. The final wording of the VAS/NRS scales was decided by the research team based on the recommendations from the focus groups, tempered by clinical relevance, measurement properties, and the need for clarity (NRS version) (Figure 2).

Figure 2. Numerical rating scale to measure severity, effect, and coping with rheumatoid arthritis fatigue.

Finally, each focus group discussed the development of a multi-item questionnaire. The majority of patients in all groups felt that the 12 fatigue concepts from study 1 were a valid reflection of RA fatigue and should be taken forward to questionnaire development.

Wording for questions to cover the 12 concepts of RA fatigue was developed from the patient interviews and focus groups (studies 1 and 2) using an iterative process, with successive rounds of proposed questions being discussed by the team, including the patient partner, who provided practical insights into everyday life with RA. For example, she explained that people with RA may shop via the Internet, so a proposed question regarding shopping was changed to ask whether fatigue affected the ability to leave the house to shop. This process led to the development of 45 items covering the 12 concepts, including 4 coping/managing questions because the patients in study 2 had struggled to define these.

The response options Not at all/A little/Quite a bit/Very much reflected phraseology and recommendations from patients in studies 1 and 2 and were selected for most questions because patients prefer verbal statements to numbered responses (40). The order of questions may influence completion rates; therefore, easy questions relating to daily activities were placed first and the question regarding sex life, which some may consider sensitive, was placed in the middle (41).

Study 3.

The draft VAS/NRS and multi-item questionnaire were completed by 15 patients (9 female) with a range of ages (n = 2 ages ≤39 years, n = 6 ages 40–59 years, n = 7 ages ≥60 years) and educational levels (n = 7 normal schooling, n = 4 college/apprenticeship, and n = 4 university). It was clear that each VAS/NRS was understood by patients in the way in which the authors intended. However, a minor change from horizontal to vertical boxes was required for the layout of 3 questionnaire items to encourage patients to select only one response.

There were no problems with Retrieval (of information) over 1 week, but 14 questions required clarification to improve Understanding (n = 8 questions), Judgment (n = 5), and/or Response (n = 12). Although these required only minor changes to the questions or response options, they made important differences in the interpretation of the questions. An example of the Understanding category was the draft question, “Have you cancelled plans because of fatigue?” where 2 patients assumed that plans meant plans to go out:

“I don't go out so I can't cancel” (female, age ≥60 years).

“Say you had plans at home to sort out a cupboard or do a bit of washing?” (JN).

“No I wouldn't do that” (female, age ≥60 years).

“So would fatigue have an impact on that sort of thing?” (JN).

“Yeah” (female, age ≥60 years).

The wording failed to capture plans to do things at home such as housework; therefore, an example was added to the question, e.g., “Plans to go out or do jobs around the home or garden.” An example of the Judgment and Response categories was that several patients were unable to select an appropriate response for certain questions and it was clear that for these, an additional response option (Does not apply to me) was required. The wording of other questions was clarified in a similar fashion, leading to the final draft 45-item questionnaire in which all questions and responses had clearly interpretable meanings for RA patients (Table 3).

Table 3. Example questions from the draft 45-item rheumatoid arthritis (RA) fatigue questionnaire for each potential RA fatigue concept
Draft concept | Example question
Impact | Has fatigue limited your ability to leave the house to go shopping or do errands?
Sleep | Have you fallen asleep during the day without wanting to?
Relationships | Has fatigue had a negative effect on your relationship with your partner?
Coping | Do you feel you can manage your fatigue?
Frequency | Over the past week, how often have you experienced fatigue?
Cognition | Have you forgotten things because of fatigue?
Emotion | Have you felt down or depressed because of fatigue?
Duration | How long, on average, has each episode of fatigue lasted this week?
Social life | Have you left a social event early due to fatigue?
Energy | Have you lacked physical energy?
Planning | Have you cancelled plans because of fatigue?
Quality of life | Has your quality of life been affected by fatigue this week?


These studies have collaborated with patients in interviews and focus groups, and with a patient research partner to understand the patient perspective of fatigue in RA. The detailed analysis of qualitative data (study 1), undertaken with no a priori conceptual framework for RA fatigue, identified 12 potential fatigue concepts (Figure 1). These include issues that are not represented in existing PROMs used for RA fatigue (e.g., emotional or cognitive fatigue) and reflect how patients view RA fatigue in a wider context that extends beyond severity to encompass the impact of fatigue on their lives. The 3 short scales cover distinct and important (but rarely measured) aspects of fatigue: two patients with similarly high fatigue scores may have very different impacts because of differences in perceived coping abilities. Although existing PROMs such as the RA-specific MAF (20) have been developed to include items from several dimensions, they yield only a single, global fatigue score. However, given its method of development, the emergent draft multi-item questionnaire may have the potential to measure several concepts of RA fatigue. This may have therapeutic implications (e.g., cognitive versus physical fatigue), and a forthcoming study will evaluate the draft multi-item questionnaire to identify the strongest of these 45 items, which may further elucidate the conceptual framework for patient perception of RA fatigue.

Patient collaboration in the development of these draft PROMs occurred through 3 formal, interactive studies, and collaboration with a patient research partner at all stages of question development. Not only were the patients willing and able to make a significant contribution, but they clearly appreciated the measurement properties of wording, timeframe, and descriptors (study 2), and raised the different uses of the words cope and manage. It has been shown that perceived control is necessary for patients to utilize RA management strategies, and that control includes the strategies that patients perceive as enabling them to cope (42). It may be that although some patients use cope and manage interchangeably, others differentiate and may consider one to be an internal state (i.e., emotion-based coping) and the other a reaction to an external environmental condition (i.e., problem-based managing) (43). There is clear potential for such questions to be answered in a different way from that intended; therefore, both terms will be evaluated in the draft questionnaire.

Our results provide evidence of the need for researchers to collaborate with patients in selecting the correct wording and format when designing questionnaires in order to reduce the risk of systematic misunderstandings. In cognitive interviewing (study 3), although most of the 45 carefully prepared questions were interpreted as intended by most patients, there was significant misunderstanding in 14 questions which, if unidentified, could have led to inaccurate data collection. On this basis, we recommend that cognitive interviewing be included in all studies that develop PROMs (15).

The draft multi-item questionnaire has been carefully phrased to include the stem “RA fatigue” in all of the questions. Patients themselves specifically linked these impacts to fatigue (study 1), and often this attribution will be straightforward (e.g., my pain is mild, it is fatigue that is stopping me doing things). At times it may be difficult to distinguish such fatigue attributions from causal links to depression, pain, or disability, but by providing this reminder with each question, it is hoped that patients will focus on fatigue.

This series of studies recruited only English-speaking patients from one teaching hospital in the UK, which may limit generalizability. The qualitative literature largely reports similar patient experiences across Europe and the US (5–7), but it would be helpful for these PROM studies to be conducted in another population. The cutoff for fatigue in studies 1 and 2 was 7 of 10, but patients with lower levels of fatigue may also experience significant problems. Secondary qualitative data were used in study 1, and although these data were collected using open and neutral questions to understand the phenomenon of fatigue from the patient perspective, the original intent did not include informing development of a PROM.

The development of items for fatigue PROMs based on data from experts (i.e., patients) is not novel in musculoskeletal disease (20, 44), nor is the use of cognitive interviewing to clarify PROM fatigue items. However, in PROMs used for RA fatigue, cognitive interviewing has previously only been used after the final validation of a generic subscale (SF-36), when it is too late to change the wording (45). Cognitive interviewing has been used for the clarification of draft PROM items for an item bank of generic fatigue questions (25% of items subsequently eliminated) (46). The development of conceptual frameworks grounded in patient data is recommended (15) but rarely reported, and no current RA fatigue scale contains multiple concepts that can be measured separately. The strength of this series of studies is that they address all 3 recommendations: they are grounded in collaboration with patients to develop PROM items specifically for RA fatigue (with patients involved in the design of each question), they used cognitive interviewing to field test the items before validation studies, and they elucidated a draft conceptual framework (Figure 1 and Table 3).

These PROMs are now ready for evaluation of construct and criterion validity and reliability. Comparison of the VAS and NRS versions of the screening scales should show the strongest version. It is envisaged that some questions in the 45-item questionnaire may be redundant and that the retained questions will permit detailed measurement of the fatigue experience, including separate concepts, and thus facilitate development of individually tailored fatigue interventions. The wider implications of this work are that it is possible, beneficial, and essential to collaborate with patients intimately in all stages of PROM development.


All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Dr. Nicklin had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Nicklin, Cramp, Kirwan, Urban, Hewlett.

Acquisition of data. Nicklin.

Analysis and interpretation of data. Nicklin, Cramp, Kirwan, Urban, Hewlett.


The authors would like to thank the staff and patients at the Bristol Royal Infirmary Rheumatology Department for their assistance with this research.