Validity of two new patient-reported outcome measures in systemic sclerosis: Patient-reported outcomes measurement information system 29-item health profile and functional assessment of chronic illness therapy–dyspnea short form

Abstract

Objective

Many patient-reported outcome (PRO) instruments used in systemic sclerosis (SSc) trials are limited by lack of validation, licensing fees, and complicated scoring systems. We assessed the construct validity for discriminative purposes of 2 new PRO instruments, the Patient-Reported Outcomes Measurement Information System 29-item Health Profile (PROMIS-29) and the Functional Assessment of Chronic Illness Therapy–Dyspnea short form (FACIT-Dyspnea), measuring health status and dyspnea in SSc patients.

Methods

Seventy-three patients participated in a cross-sectional study at a tertiary SSc program. PROMIS-29, FACIT-Dyspnea, and legacy PRO instruments used in clinical trials (Medical Research Council Dyspnea Score, St. George's Respiratory Questionnaire, Health Assessment Questionnaire disability index, and Short Form 36) were administered. Composite severity scores using an adaptation of the Medsger Disease Severity Index were generated using clinical, diagnostic, and laboratory information. PROMIS-29 and FACIT-Dyspnea scores were compared with legacy PRO measures and composite severity scores.

Results

Patients (84% women) had a mean age of 51 years (range 22–72 years). The mean SSc disease duration from the onset of the first non–Raynaud's phenomenon symptom was 7.2 years (range 0–45 years). Spearman's correlation coefficients between the FACIT-Dyspnea and PROMIS physical functioning scores and the legacy PRO instruments were generally high (range 0.50–0.86). Correlations of the PROMIS and FACIT-Dyspnea scores with composite disease severity scores were more modest but statistically significant (range 0.33–0.48, P < 0.01).

Conclusion

PROMIS-29 and FACIT-Dyspnea are valid instruments to measure the health status of SSc patients. PROMIS-29 and FACIT-Dyspnea may be preferable to legacy instruments because they are freely available in multiple languages and simple to administer, score, and interpret.

INTRODUCTION

Systemic sclerosis (SSc; scleroderma) is a rare chronic connective tissue disease that causes skin and internal organ fibrosis, production of autoantibodies, and dysregulation of vascular homeostasis. Given the systemic nature of the disease, SSc can greatly impact quality of life (1). Accepted SSc clinical trial outcomes include assessment of functional, laboratory, and radiographic disease markers, and patient-reported outcomes (PROs). A PRO instrument is a patient-completed questionnaire that assesses a symptom or functional limitation.

A multitude of PRO instruments, including the St. George's Respiratory Questionnaire (SGRQ) (2), Short Form 36 (SF-36) (3), Health Assessment Questionnaire (HAQ) disability index (DI) (3, 4), Medical Research Council (MRC) Dyspnea Score, and others, have been developed, validated in SSc (as indicated), and used to assess treatment response in SSc clinical trials. However, the lack of a standardized and uniformly scored set of PRO instruments to measure various SSc manifestations complicates between-study comparisons and interpretation of PRO results. In order to address this impediment to SSc and other disease-focused research, the National Institutes of Health (NIH) established the Patient-Reported Outcomes Measurement Information System (PROMIS; online at www.nihpromis.org). The PROMIS is a network of NIH-funded research sites and coordinating centers working collaboratively to develop a series of dynamic tools to reliably and validly measure PROs (5).

In 2004, PROMIS investigators developed a plan to create a bank of PRO items and short forms that could be used by all investigators conducting patient-oriented research. Multidimensional scales, including the PROMIS 29-item Health Profile (PROMIS-29), and symptom-specific instruments such as the Functional Assessment of Chronic Illness Therapy–Dyspnea short form (FACIT-Dyspnea) were created in English and Spanish. The PROMIS-29 and FACIT-Dyspnea were validated in the general US and chronic obstructive pulmonary disease (COPD) populations, respectively, and were standardized using a T score metric with a mean ± SD set to 50 ± 10 to ensure simple scoring and interpretable results (6–8).

The purpose of this study was to assess the construct validity for discriminative purposes of the PROMIS-29 and FACIT-Dyspnea for the measurement of general health and dyspnea in SSc. We hypothesized that these new instruments to assess global health and dyspnea would perform as well as legacy instruments (SGRQ, MRC Dyspnea Score, HAQ DI, SF-36) that have been used in SSc clinical trials. The benefit of the new instruments is that they are available in multiple languages (English and Spanish), free, and simple to administer, score, and interpret.

Significance & Innovations

  • This is the first study in systemic sclerosis (SSc) to assess the construct validity for discriminative purposes of 2 new patient-reported outcome (PRO) instruments: the Patient-Reported Outcomes Measurement Information System 29-item Health Profile (PROMIS-29) and the Functional Assessment of Chronic Illness Therapy–Dyspnea short form (FACIT-Dyspnea).

  • Study results indicate that the PROMIS-29 and FACIT-Dyspnea have construct validity for discriminative purposes compared to legacy PRO instruments and a disease severity measure, and can be used to assess general health status and dyspnea in SSc patients.

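  • The PROMIS-29 and FACIT-Dyspnea may be superior to legacy instruments such as the Short Form 36 and the Health Assessment Questionnaire disability index because they are freely available in multiple languages and simple to administer, score, and interpret.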

PATIENTS AND METHODS

All patients met the American College of Rheumatology criteria for SSc or had at least 3 of the 5 features of the CREST (calcinosis, Raynaud's phenomenon, esophageal dysmotility, sclerodactyly, telangiectasias) syndrome (9). With institutional review board approval, informed consent was obtained from patients seen at a tertiary care scleroderma program to participate in the Northwestern Scleroderma Patient Registry and Biorepository. One optional component of the registry is the annual completion of PRO instruments. During regularly scheduled followup clinic visits, participants were given self-administered, paper-based PRO instruments (PROMIS-29, FACIT-Dyspnea, SF-36, MRC Dyspnea Score, SGRQ, and HAQ DI), always in the same order. Of the 90 participants approached to complete the instruments between December 2, 2009, and July 21, 2010, 3 declined to participate, 7 did not complete the full battery, and 7 did not have a diagnosis of limited cutaneous or diffuse cutaneous SSc and were excluded from the analysis. The remaining 73 subjects were included. One research assistant manually transcribed the responses into a computerized database, and a second research assistant reviewed a random 10% sample of the entered PRO data, which showed an error rate of <0.5%.

All laboratory and diagnostic tests were performed as clinically indicated. For each patient, the laboratory values (brain natriuretic peptide, hemoglobin, and creatinine) measured at one clinical laboratory and obtained closest to the PRO instrument completion date were analyzed. Additional analyses restricted to laboratory data obtained within 6 and 12 months of the PRO instrument completion date were also conducted. Pulmonary function test (PFT) parameters, including forced vital capacity % predicted and diffusing capacity for carbon monoxide (DLCO) % predicted, were obtained by querying the Northwestern Medicine Enterprise Data Warehouse (EDW), an electronic data repository that is updated nightly and contains all of the clinical information collected across the Northwestern University campus through a variety of clinical reporting tools, including the electronic medical record. Estimated pulmonary artery systolic pressure on 2-dimensional Doppler echocardiography performed for clinically indicated reasons was obtained by manual chart abstraction, because echocardiography reports are less amenable to electronic data capture via the EDW. The modified Rodnan skin thickness score (MRSS) was measured by an experienced scleroderma clinical specialist (10).
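The rule for pairing each questionnaire with the laboratory result closest in time, with an optional restriction to a 6- or 12-month window, can be sketched in Python as follows. This is a minimal illustration, not the registry's extraction code: the `(date, value)` record layout, the function name `closest_lab_value`, and the approximation of 6 and 12 months as 183 and 365 days are all assumptions.

```python
from datetime import date
from typing import Optional, Sequence, Tuple


def closest_lab_value(
    results: Sequence[Tuple[date, float]],
    pro_date: date,
    max_days: Optional[int] = None,
) -> Optional[float]:
    """Return the lab value whose date is closest to pro_date.

    Results before or after pro_date are both eligible. If max_days is set,
    results more than max_days away are ignored (e.g., 183 or 365 days as
    hypothetical stand-ins for the 6- and 12-month windows).
    Returns None when no result qualifies.
    """
    candidates = []
    for when, value in results:
        distance = abs((when - pro_date).days)
        if max_days is None or distance <= max_days:
            candidates.append((distance, value))
    if not candidates:
        return None
    # min() with a key returns the first minimal element, so ties go to
    # whichever qualifying result appears earlier in `results`.
    return min(candidates, key=lambda c: c[0])[1]


# Hypothetical hemoglobin history (g/dl) for one patient.
hemoglobin = [(date(2009, 3, 1), 11.8), (date(2010, 1, 15), 12.4), (date(2011, 6, 1), 12.9)]
print(closest_lab_value(hemoglobin, date(2010, 2, 1)))  # 12.4 (measured 17 days earlier)
print(closest_lab_value(hemoglobin, date(2012, 12, 1), max_days=183))  # None
```

Running the same extraction with `max_days=None`, `183`, and `365` mirrors the sensitivity analyses described above.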

PROMIS and FACIT patient-reported instruments.

PROMIS investigators framed the overall PRO measurement model according to the classic World Health Organization definition of health as “a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity.” Using confirmatory factor analysis to establish the unidimensionality of components of self-reported health, followed by item-response theory models to calibrate items within banks, the investigators populated a hierarchical framework around these 3 broad health components: physical, mental, and social. Each component was hypothesized, and confirmed, to have discrete, “bankable” symptoms and functional impacts (in the case of physical health); affective, behavioral, and cognitive effects (in the case of mental health); and relationships and function (in the case of social health).

The PROMIS network developed item (question) banks and short forms in more than 20 health domains, as well as a set of global health items and 29-, 43-, and 57-item profile measures (www.nihpromis.org) (8). To create a brief and practical yet inclusive profile, a consensus-building process was used to select 7 of these domains for the PROMIS-29 used in this study (11, 12). The 7 domains span physical, mental, and social health and cover the areas of self-reported health most relevant to the majority of people with chronic illness: physical function, anxiety, depression, fatigue, sleep disturbance, satisfaction with social role participation, and pain interference. The PROMIS-29 includes 4 items from each of these 7 core domains plus one 11-point (0–10) rating scale for pain intensity. Norm-based scores have been calculated for each domain, such that a score of 50 ± 10 represents the mean ± SD of the general population. Higher scores represent more of the domain being measured. Therefore, on the symptom-oriented domains of the PROMIS-29 (anxiety, depression, fatigue, pain interference, and sleep disturbance), higher scores represent worse symptoms, whereas on the function-oriented domains (physical functioning and satisfaction with social role participation), higher scores represent better functioning.
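The T score metric itself is a linear rescaling: a standardized score (an IRT theta estimate or a z score relative to the reference population) is multiplied by 10 and shifted by 50. The sketch below shows only that rescaling and, assuming scores are normally distributed in the reference population, the implied percentile. In practice, PROMIS short forms are scored from published raw-score-to-T-score conversion tables or scoring software, which this sketch does not reproduce; the function names are illustrative.

```python
from statistics import NormalDist

REFERENCE = NormalDist(mu=50.0, sigma=10.0)  # T metric: mean 50, SD 10


def t_from_theta(theta: float) -> float:
    """Rescale a standardized score (reference mean 0, SD 1) to the T metric."""
    return 50.0 + 10.0 * theta


def theta_from_t(t: float) -> float:
    """Inverse rescaling: distance from the reference mean in SD units."""
    return (t - 50.0) / 10.0


def reference_percentile(t: float) -> float:
    """Share of the reference population scoring below t, assuming normality."""
    return REFERENCE.cdf(t)


print(t_from_theta(0.51))                    # approximately 55.1
print(round(reference_percentile(60.0), 3))  # 0.841
```

Because every domain shares this metric, a difference of 5 T score points always corresponds to 0.5 SD of the reference population, whichever domain is being compared.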

The FACIT-Dyspnea was developed using item-response theory methodology and input from COPD experts to select the most informative and relevant subset of items for measuring dyspnea in patients with COPD (13). Item-response theory, or latent-trait theory, is based on the idea that the probability of a given response to an item is a mathematical function of person (patient) and item parameters. Including items that assess dyspnea across a spectrum of severity permits accurate measurement of dyspnea for each patient. The FACIT-Dyspnea has been shown to be an internally consistent, reliable, and valid measure of dyspnea in men and women with a self-reported and formal diagnosis of COPD (6, 7).
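To make the IRT idea concrete, the sketch below computes category probabilities under Samejima's graded response model, one common polytomous IRT model and the one generally used to calibrate PROMIS item banks; the text does not state which model was used for the FACIT-Dyspnea. The discrimination and threshold values are hypothetical, chosen only to show how the probability of endorsing more severe response categories rises with the latent trait θ.

```python
import math
from typing import List, Sequence


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """P(X = k | theta) for k = 0..K-1 under the graded response model.

    a          -- item discrimination (> 0)
    thresholds -- strictly increasing b_1 < ... < b_{K-1} for K ordered categories
    """
    if a <= 0 or any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("need a > 0 and strictly increasing thresholds")
    # Boundary curves P(X >= k | theta) = 1 / (1 + exp(-a (theta - b_k))),
    # padded with P(X >= 0) = 1 and P(X >= K) = 0.
    at_least = [1.0]
    at_least += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    at_least.append(0.0)
    # Each category probability is the gap between adjacent boundary curves.
    return [at_least[k] - at_least[k + 1] for k in range(len(thresholds) + 1)]


def expected_response(theta: float, a: float, thresholds: Sequence[float]) -> float:
    """Mean response category implied by the model at a given theta."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical 4-category dyspnea item: a = 2.0, thresholds at -1.0, 0.5, 1.5.
for theta in (-2.0, 0.0, 2.0):
    print(theta, [round(p, 3) for p in grm_category_probs(theta, 2.0, [-1.0, 0.5, 1.5])])
```

Items whose thresholds sit at different points on θ are what let a short form measure accurately across the spectrum of severity.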

Respondents are presented with 10 common tasks. For each task, they indicate whether they performed it over the past 7 days and, if so, how short of breath they became; separately, they rate how difficult the task was to complete because of shortness of breath. Patients who report not having done a task are asked whether this was due to shortness of breath or simply because they had no opportunity to do the task in the past week. Two scores are created: dyspnea and dyspnea-related functional limitation. Higher scores represent worse dyspnea or functional limitation. As with the PROMIS, scores are expressed on the T score metric (mean ± SD 50 ± 10); however, the reference population is people with self-reported COPD.
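The branching for tasks that were not performed can be expressed as a small item-level scoring rule. The sketch below is hypothetical: the text describes the follow-up question but not how each answer is scored, so the choices here (a task avoided because of shortness of breath is scored at the most severe level; a task with no opportunity is treated as missing) and the 0 to 3 response range are assumptions for illustration only. Conversion of the raw sum to a T score is not shown.

```python
from enum import Enum
from typing import List, Optional, Tuple


class NotDoneReason(Enum):
    SHORTNESS_OF_BREATH = "shortness of breath"
    NO_OPPORTUNITY = "no opportunity"


MAX_LEVEL = 3  # assumed most severe response level (0 = no shortness of breath)


def score_task(level: Optional[int], reason: Optional[NotDoneReason] = None) -> Optional[int]:
    """Score one task; level is None when the task was not performed."""
    if level is not None:
        if not 0 <= level <= MAX_LEVEL:
            raise ValueError(f"level must be between 0 and {MAX_LEVEL}")
        return level
    if reason is NotDoneReason.SHORTNESS_OF_BREATH:
        return MAX_LEVEL  # assumption: avoidance due to dyspnea counts as most severe
    return None  # assumption: no opportunity (or no reason given) is treated as missing


def raw_summary(
    responses: List[Tuple[Optional[int], Optional[NotDoneReason]]],
) -> Tuple[int, int]:
    """Return (sum of scored items, number of scored items)."""
    scored = [s for s in (score_task(lv, r) for lv, r in responses) if s is not None]
    return sum(scored), len(scored)


example = [
    (1, None),
    (None, NotDoneReason.SHORTNESS_OF_BREATH),
    (None, NotDoneReason.NO_OPPORTUNITY),
    (0, None),
]
print(raw_summary(example))  # (4, 3)
```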

Legacy PRO instruments.

The SF-36 consists of 36 items assessing 8 domains: physical functioning, social functioning, role limitations due to physical health, bodily pain, mental health, role limitations due to emotional health, vitality, and general health perceptions. Scores were calculated according to the manual, including the physical and mental component scores (14). On all SF-36 scales, lower scores represent worse outcomes; the component scores are norm-based, so values <50 indicate worse-than-average health. QualityMetric requires an annual SF-36 licensing agreement and a fee for each study.

The SGRQ was designed to measure health impairment in patients with asthma and COPD and has been validated in patients with SSc (2). Symptoms, activity, and impacts scores are calculated using the item weights provided with the instrument, with higher scores indicating poorer quality of life (15). The total score is calculated from 16 items and their respective weights. Special permission is required before administering the SGRQ.
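The weighted scoring described above follows a simple ratio: each component score is 100 times the summed weights of the responses given, divided by the largest summed weight attainable for that component. The sketch below assumes every item is single-choice with one weight per option and that all items are answered; the item names and weights are hypothetical placeholders, not the published SGRQ weights, and the manual's missing-data rules are not implemented.

```python
from typing import Mapping


def weighted_component_score(
    items: Mapping[str, Mapping[str, float]],
    answers: Mapping[str, str],
) -> float:
    """Score = 100 * sum(weight of chosen option) / sum(max option weight per item).

    items   -- item id -> {option label: weight}
    answers -- item id -> chosen option label (every item must be answered)
    """
    max_total = sum(max(options.values()) for options in items.values())
    if max_total <= 0:
        raise ValueError("maximum attainable weight must be positive")
    chosen_total = sum(items[item][answers[item]] for item in items)
    return 100.0 * chosen_total / max_total


# Hypothetical two-item component: a true/false item and a 3-option item.
items = {
    "q1": {"true": 60.0, "false": 0.0},
    "q2": {"rarely": 10.0, "often": 30.0, "daily": 40.0},
}
print(weighted_component_score(items, {"q1": "true", "q2": "often"}))  # 90.0
```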

The MRC Dyspnea Score is a freely available instrument for measuring breathlessness (16). It grades the severity of breathlessness on a 5-point scale, from 0 (no trouble with breathlessness) to 4 (very severe breathlessness). The MRC Dyspnea Score has been validated in patients with COPD and idiopathic pulmonary fibrosis, but not in patients with SSc (17, 18).

The HAQ DI was developed to assess functional disability in patients with rheumatoid arthritis and includes 2 or 3 questions for each of 8 activity domains (dressing and grooming, arising, eating, walking, hygiene, reach, grip, and activities of daily living) and a pain visual analog scale (VAS) (19, 20). Each domain is scored on an ordinal scale from 0 (no impairment) to 3 (maximal functional impairment) using the highest score among its items, and the disability index is the mean of the domain scores. The pain VAS is anchored at 0 (no pain/limitation) and 100 (very severe pain/limitation) and is converted to the 0–3 ordinal scale (1 cm = 0.2 points), but it is not factored into the overall disability index score.
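A sketch of the conventional HAQ DI arithmetic, as commonly described in the HAQ scoring instructions: each category takes the highest of its item scores, use of an aid or device raises a category score of 0 or 1 to 2, and the disability index is the mean of the category scores, computed only when at least 6 of the 8 categories are answered. The category names and input layout are illustrative assumptions, and the separate pain VAS is not included.

```python
from typing import Mapping, Optional, Sequence

MIN_CATEGORIES = 6  # the index is not computed with fewer answered categories


def haq_di(
    categories: Mapping[str, Sequence[Optional[int]]],
    uses_aid: Optional[Mapping[str, bool]] = None,
) -> Optional[float]:
    """Mean of per-category scores (each 0-3); None if too few categories answered."""
    uses_aid = uses_aid or {}
    category_scores = []
    for name, item_scores in categories.items():
        answered = [s for s in item_scores if s is not None]
        if not answered:
            continue  # an unanswered category is skipped, not scored as 0
        score = max(answered)  # the highest item score represents the category
        if uses_aid.get(name, False) and score < 2:
            score = 2  # aid/device adjustment
        category_scores.append(score)
    if len(category_scores) < MIN_CATEGORIES:
        return None
    return sum(category_scores) / len(category_scores)


responses = {
    "dressing": [1, 0], "arising": [0, 0], "eating": [2, 1, 0], "walking": [0, 1],
    "hygiene": [0, 0, 3], "reach": [1, 1], "grip": [0, 0, 0], "activities": [1, 2, 0],
}
print(haq_di(responses))                     # 1.25
print(haq_di(responses, {"arising": True}))  # 1.5
```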

The Medsger Disease Severity Scale, which is widely used in clinical trials, assesses the impact of SSc on 9 organ systems (general health, peripheral vascular, skin, joint/tendon, muscle, gastrointestinal tract, lungs, heart, and kidneys) (21, 22). Each organ system is scored from 0 (no involvement) to 4 (severe involvement); no composite score is calculated. The Medsger Disease Severity Scale has not been validated in patients with SSc (23).

Study design.

We conducted a cross-sectional study in a cohort of well-characterized patients with SSc to assess the construct validity for discriminative purposes of 2 new PRO instruments. According to Consensus-Based Standards for the Selection of Health Measurement Instruments (COSMIN) terminology, construct validity is “the degree to which the scores of a health-related PRO (HR-PRO) instrument are consistent with hypotheses (for instance with regard to internal relationships, relationships to scores of other instruments, or differences between relevant groups) based on the assumption that the HR-PRO instrument validly measures the construct to be measured” (24). We hypothesized that the PROMIS-29 and FACIT-Dyspnea would demonstrate construct validity because these instruments were designed to cover as full a range as possible of the latent variable measured by each bank (e.g., pain, fatigue).

Statistical analysis.

To generate a composite SSc disease severity score, a modified Medsger Disease Severity Scale was calculated using the available laboratory values (brain natriuretic peptide, hemoglobin, and creatinine), physical examination findings (MRSS), and clinical markers of disease (PFT and echocardiography) in our cohort. Each variable was first classified into levels of severity, and a score of 0, 1, or 2 was assigned to each level:

  1) MRSS: 0–6 coded as 0, 7–14 as 1, and ≥15 as 2.
  2) DLCO % predicted: ≥80% coded as 0, 71–79% as 1, and ≤70% as 2.
  3) Estimated right ventricular systolic pressure on 2-dimensional Doppler echocardiography: ≤50 mm Hg coded as 0 and >50 mm Hg as 1.
  4) Brain natriuretic peptide: ≤60 pg/ml coded as 0, 61–99 pg/ml as 1, and ≥100 pg/ml as 2.
  5) Hemoglobin: ≥12 g/dl coded as 0, 10.6–11.9 g/dl as 1, and ≤10.5 g/dl as 2.
  6) Creatinine: ≤1.59 mg/dl coded as 0, 1.60–2.99 mg/dl as 1, and ≥3.00 mg/dl as 2.

The individual item scores were summed to create the composite SSc disease severity score (potential range 0–11), which was used in the analyses described below.
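The composite score is a sum of banded item scores. The Python sketch below encodes the cutoffs listed above with two assumptions of our own: values falling between the stated bands (for example, a DLCO of 79.5% or a brain natriuretic peptide of 60.5 pg/ml) are assigned as shown in the comparisons, and a missing measurement contributes 0, because the text does not say how missing data were handled. Hemoglobin is taken in g/dl; the class and dictionary names are illustrative.

```python
from dataclasses import dataclass, fields
from typing import Callable, Dict, Optional


@dataclass
class SeverityInputs:
    mrss: Optional[int] = None               # modified Rodnan skin score
    dlco_pct: Optional[float] = None         # DLCO, % predicted
    rvsp_mmhg: Optional[float] = None        # estimated RVSP, mm Hg
    bnp_pg_ml: Optional[float] = None        # brain natriuretic peptide, pg/ml
    hemoglobin_g_dl: Optional[float] = None  # hemoglobin, g/dl
    creatinine_mg_dl: Optional[float] = None # creatinine, mg/dl


# One banding rule per field; returns 0, 1, or 2 (RVSP only 0 or 1).
SCORERS: Dict[str, Callable[[float], int]] = {
    "mrss": lambda v: 0 if v <= 6 else (1 if v <= 14 else 2),
    "dlco_pct": lambda v: 0 if v >= 80 else (1 if v > 70 else 2),
    "rvsp_mmhg": lambda v: 0 if v <= 50 else 1,
    "bnp_pg_ml": lambda v: 0 if v <= 60 else (1 if v < 100 else 2),
    "hemoglobin_g_dl": lambda v: 0 if v >= 12 else (1 if v > 10.5 else 2),
    "creatinine_mg_dl": lambda v: 0 if v < 1.60 else (1 if v < 3.00 else 2),
}


def composite_severity(x: SeverityInputs) -> int:
    """Sum of banded item scores; missing items contribute 0 (assumption)."""
    total = 0
    for f in fields(x):
        value = getattr(x, f.name)
        if value is not None:
            total += SCORERS[f.name](value)
    return total


print(composite_severity(SeverityInputs(
    mrss=10, dlco_pct=75, bnp_pg_ml=50, hemoglobin_g_dl=11.0, creatinine_mg_dl=1.2,
)))  # 3
```

With every item in its worst band the function returns 2 + 2 + 1 + 2 + 2 + 2 = 11, the maximum possible composite score.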

We selected end points for the validation analyses that we anticipated would be most related to the legacy measures. Construct validity for the new instruments was assessed using Spearman's correlation coefficients between legacy measures and FACIT-Dyspnea and PROMIS-29 scores, and by comparing mean FACIT-Dyspnea and PROMIS-29 scores across groups defined by the MRC Dyspnea Score and by the composite measure of disease severity. Analysis of variance was used to test the differences in scores between groups. Effect sizes (mean difference/pooled SD) were calculated for the differences between adjacent groups to aid in interpretation of differences. All analyses were conducted using SAS, version 9.2.
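The correlation analysis was run in SAS. As a language-neutral illustration, the Python sketch below computes Spearman's coefficient as the Pearson correlation of average ranks (so ties are handled), an approximate 95% confidence interval from the Fisher z transformation with standard error 1/√(n − 3), and a strength label using the 0.3 and 0.5 cut points given in the Results, applied to |r| with boundary values assigned to the lower band. The source does not say how its confidence intervals were computed, so the Fisher interval is an assumption; it is one common approximation for Spearman's coefficient.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need paired samples with n >= 3")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sx * sy)


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> Tuple[float, float]:
    """Approximate CI via z = atanh(r), SE = 1/sqrt(n - 3); needs |r| < 1 and n > 3."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def strength(r: float) -> str:
    """Label |r| as low (<= 0.3), moderate (<= 0.5), or high (> 0.5)."""
    a = abs(r)
    return "low" if a <= 0.3 else ("moderate" if a <= 0.5 else "high")


print(round(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 3))  # 0.8
lo, hi = fisher_ci(0.82, 69)
print(round(lo, 2), round(hi, 2))  # 0.72 0.88
```

With r = 0.82 and n = 69 (the number of SGRQ total scores in Table 2), this approximation gives the same interval reported in Table 3 for FACIT-Dyspnea versus SGRQ total score, which is consistent with, but does not confirm, a Fisher-type interval.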

RESULTS

The battery of 2 new and 4 legacy PRO measures was administered to 73 patients with SSc. Descriptive statistics for patient characteristics and for the laboratory results obtained closest to the date of PRO instrument completion are shown in Table 1. The results did not differ significantly when the analyses used all laboratory data or only laboratory data obtained within 6 or 12 months of the PRO instrument completion date (data not shown). More than three-quarters of the participants were women (n = 61 [83.6%]), and 54.8% had limited cutaneous SSc. Mean disease duration was 7.2 years (range 0–45 years) when defined as the time since the onset of the first non–Raynaud's phenomenon symptom and 8.0 years (range 0–36 years) when defined as the interval since the appearance of Raynaud's phenomenon. Only 3 patients (4.1%) had renal insufficiency.

Table 1. Demographics and laboratory and clinical characteristics of disease severity in patients with SSc*

                                                                  Value (n = 73)
Clinical information
  Age, mean ± SD (range), years                                   51.1 ± 10.9 (22–72)
  Women, no. (%)                                                  61 (83.6)
  Years since onset of RP, mean ± SD (range)                      8.0 ± 7.5 (0–36)
  Years since onset of first non-RP symptom, mean ± SD (range)    7.2 ± 7.6 (0–45)
  SSc disease subtype lcSSc, no. (%)                              40 (54.8)
  MRSS, mean ± SD                                                 9.2 ± 8.0
  Antitopoisomerase positive, no. (%) (n = 70)                    22 (31.4)
  Anticentromere positive, no. (%) (n = 63)                       14 (22.2)
  Brain natriuretic peptide, no. (%) (n = 63)                     11 (17.5)
  Hemoglobin, mean ± SD, g/dl                                     12.5 ± 1.5
  Renal insufficiency (creatinine >1.59 mg/dl), no. (%)           3 (4.1)
Diagnostic information
  FVC, mean ± SD, % predicted                                     78.5 ± 17.3
  DLCO, mean ± SD, % predicted                                    65.6 ± 20.5
  Estimated RVSP, mean ± SD, mm Hg                                35.9 ± 11.1
  Modified Medsger Disease Severity Index, mean ± SD (range)      2.79 ± 1.97 (0–9)

* SSc = systemic sclerosis; RP = Raynaud's phenomenon; lcSSc = limited cutaneous SSc; MRSS = modified Rodnan skin thickness score; FVC = forced vital capacity; DLCO = diffusing capacity for carbon monoxide; RVSP = right ventricular systolic pressure on 2-dimensional Doppler echocardiography.

Descriptive statistics for all PRO measures are listed in Table 2. Because the new PRO instruments use a T score metric, all domain means fall on a common scale near 50 (range 40.5–55.1). This contrasts with the legacy PRO instruments, which use various scoring systems, as reflected in the wide distribution of means (range 0.83–78.2). Figure 1 depicts PROMIS-29 domain scores for the scleroderma cohort relative to the US population. For some domains, such as physical functioning and satisfaction with social role participation, a high score (>50) is favorable. Our SSc cohort reported poorer physical functioning (mean 46.7) and satisfaction with social role participation (mean 48.4) than the general population. For other domains, such as fatigue, pain interference, and sleep disturbance, a high score indicates greater decrements in health-related quality of life. Scleroderma patients reported more anxiety (mean 50.3), fatigue (mean 51.8), pain interference (mean 55.1), and sleep disturbance (mean 52.1), although the degree of impairment was less than the minimum clinically important difference (0.5 SD, or 5 points) for all domains except pain interference (25). Both FACIT-Dyspnea scores were ∼1 full SD better than the average for individuals with COPD (Table 2). Therefore, our cohort of patients with scleroderma reported less dyspnea than patients with COPD.
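The comparison with the general population amounts to orienting each domain so that a positive difference means worse health, then checking it against the 5-point minimum clinically important difference. The sketch below does this for the PROMIS-29 means in Table 2; the domain labels and function name are illustrative, and comparing cohort means with the reference mean of 50 is descriptive rather than a formal statistical test.

```python
SYMPTOM_DOMAINS = {"anxiety", "depression", "fatigue", "pain interference", "sleep disturbance"}
FUNCTION_DOMAINS = {"physical functioning", "social role participation"}
REFERENCE_MEAN = 50.0
MCID = 5.0  # 0.5 SD on the T metric

# PROMIS-29 cohort means (T scores) from Table 2.
cohort_means = {
    "physical functioning": 46.7,
    "anxiety": 50.3,
    "depression": 49.4,
    "fatigue": 51.8,
    "pain interference": 55.1,
    "sleep disturbance": 52.1,
    "social role participation": 48.4,
}


def excess_burden(domain: str, mean_t: float) -> float:
    """T score points by which the cohort is worse than the reference (negative = better)."""
    diff = mean_t - REFERENCE_MEAN
    if domain in SYMPTOM_DOMAINS:
        return diff   # higher symptom scores are worse
    if domain in FUNCTION_DOMAINS:
        return -diff  # lower function scores are worse
    raise KeyError(f"unknown domain: {domain}")


for domain, mean_t in cohort_means.items():
    print(f"{domain:28s} {excess_burden(domain, mean_t):+.1f}")

flagged = [d for d, m in cohort_means.items() if excess_burden(d, m) >= MCID]
print(flagged)  # ['pain interference']
```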

Table 2. Means and medians for the FACIT-Dyspnea, PROMIS-29, and legacy patient-reported outcome instruments in patients with systemic sclerosis*

                                                          N    Mean ± SD      Median (range)
New instruments
  FACIT-Dyspnea                                           72   40.5 ± 10.2    39.2 (27.7–67.4)
  FACIT-Dyspnea functional limitations                    71   41.1 ± 11.1    40.3 (29.6–68.8)
  PROMIS-29 physical functioning                          73   46.7 ± 9.2     45.3 (26.8–56.9)
  PROMIS-29 anxiety                                       73   50.3 ± 9.4     49.0 (40.3–78.0)
  PROMIS-29 depression                                    73   49.4 ± 8.7     48.9 (41.0–71.4)
  PROMIS-29 fatigue                                       73   51.8 ± 12.0    51.0 (33.7–75.8)
  PROMIS-29 pain interference                             73   55.1 ± 10.4    55.7 (41.6–75.6)
  PROMIS-29 sleep disturbance                             73   52.1 ± 9.7     52.8 (32.0–73.3)
  PROMIS-29 satisfaction with social role participation   73   48.4 ± 11.4    46.3 (29.0–64.1)
Legacy instruments
  SGRQ symptom score                                      73   25.0 ± 22.4    19.9 (0–83.6)
  SGRQ activity score                                     70   36.6 ± 33.2    35.3 (0–98.9)
  SGRQ impacts score                                      70   13.0 ± 17.2    5.6 (0–78.1)
  SGRQ total score                                        69   22.2 ± 21.1    16.6 (0–82.6)
  HAQ DI                                                  72   0.83 ± 0.82    0.63 (0–2.63)
  SF-36 general health                                    73   49.2 ± 24.2    50 (0–95)
  SF-36 physical functioning                              73   59.6 ± 29.4    60 (0–100)
  SF-36 role limitations-physical                         72   63.2 ± 32.1    68.8 (0–100)
  SF-36 bodily pain                                       73   61.3 ± 27.5    62 (0–100)
  SF-36 mental health                                     73   71.5 ± 20.9    75 (5–100)
  SF-36 role limitations-emotional                        73   78.2 ± 27.6    91.7 (0–100)
  SF-36 social functioning                                73   73.0 ± 28.2    75 (0–100)
  SF-36 vitality                                          73   52.4 ± 8.0     50 (31.2–68.8)
  MRC Dyspnea Score 0, no. (%)                                 29 (44.6)
  MRC Dyspnea Score 1, no. (%)                                 24 (36.9)
  MRC Dyspnea Score 2–4, no. (%)                               12 (18.5)

* FACIT-Dyspnea = Functional Assessment of Chronic Illness Therapy–Dyspnea short form; PROMIS-29 = Patient-Reported Outcomes Measurement Information System 29-item Health Profile; SGRQ = St. George's Respiratory Questionnaire; HAQ = Health Assessment Questionnaire; DI = disability index; SF-36 = Short Form 36; MRC = Medical Research Council.
Figure 1. Patient-Reported Outcomes Measurement Information System 29-item Health Profile function and symptom domain scores. A high functional domain score is favorable (left); a low symptom domain score is favorable (right).

To assess the construct validity of the PROMIS-29 and FACIT-Dyspnea relative to the legacy instruments in measuring general health status and dyspnea, we calculated Spearman's correlation coefficients between PRO instrument scores that assess similar domains. FACIT-Dyspnea and FACIT-Dyspnea functional limitations scores were compared with the SGRQ, MRC Dyspnea Score, and HAQ DI, while PROMIS-29 physical functioning was compared with the HAQ DI and the SF-36 physical component score (Table 3). We prospectively defined |r| ≤ 0.3 as a low correlation, 0.3 < |r| ≤ 0.5 as a moderate correlation, and |r| > 0.5 as a high correlation (26), and considered |r| ≥ 0.50 good evidence for construct validity. As expected, FACIT-Dyspnea and FACIT-Dyspnea functional limitations scores were strongly correlated with SGRQ, MRC Dyspnea Score, and HAQ DI scores (r = 0.64–0.82). PROMIS-29 physical functioning was strongly correlated with HAQ DI and SF-36 physical component scores (r = −0.82 and 0.86, respectively). Additionally, PROMIS-29 anxiety and depression scores were correlated with SF-36 mental component scores (r = −0.50 and −0.70, respectively), both meeting the |r| ≥ 0.50 threshold. These data suggest that the PROMIS-29 and FACIT-Dyspnea are good alternatives to existing measures of disease burden.

Table 3. Spearman's correlation coefficients (95% confidence intervals) comparing the FACIT-Dyspnea and PROMIS-29 scales with legacy patient-reported outcome instruments for the measurement of dyspnea and physical function in patients with systemic sclerosis*

                                       SGRQ total          MRC Dyspnea Score   HAQ DI                 SF-36 PCS           SF-36 MCS
FACIT-Dyspnea                          0.82 (0.72, 0.88)   0.74 (0.60, 0.83)   0.64 (0.47, 0.76)      –                   –
FACIT-Dyspnea functional limitations   0.78 (0.67, 0.86)   0.68 (0.52, 0.79)   0.76 (0.63, 0.84)      –                   –
PROMIS-29 physical functioning         –                   –                   −0.82 (−0.88, −0.72)   0.86 (0.78, 0.91)   –
PROMIS-29 anxiety                      –                   –                   –                      –                   −0.50 (−0.66, −0.30)
PROMIS-29 depression                   –                   –                   –                      –                   −0.70 (−0.80, −0.55)

* All P < 0.01. Dashes indicate correlations not reported. FACIT-Dyspnea = Functional Assessment of Chronic Illness Therapy–Dyspnea short form; PROMIS-29 = Patient-Reported Outcomes Measurement Information System 29-item Health Profile; SGRQ = St. George's Respiratory Questionnaire; MRC = Medical Research Council; HAQ = Health Assessment Questionnaire; DI = disability index; SF-36 = Short Form 36; PCS = physical component score; MCS = mental component score.

To further evaluate construct validity of the new versus legacy PRO instruments, we assessed the correlation between new and legacy instruments and the modified Medsger Disease Severity Index using Spearman's correlation coefficients as above. PROMIS-29 physical functioning, pain interference, and satisfaction with social role participation domains were moderately correlated with the SSc Disease Severity Index (r = 0.37–0.48). A moderate degree of correlation was also seen between both FACIT-Dyspnea scores and the Disease Severity Index (r = 0.33 and 0.43 for dyspnea and functional limitations, respectively). These moderate correlations suggest that physician and patient-reported assessments of disease severity often differ.

As summarized in Table 4, mean FACIT-Dyspnea scores and PROMIS-29 physical functioning scores differed across MRC Dyspnea Score groups, with the poorest mean score in the most severe MRC Dyspnea Score category. All effect sizes for the differences between adjacent groups were >0.30. Mean FACIT-Dyspnea scores and PROMIS-29 physical functioning and satisfaction with social role participation scores also differed across groups defined by the SSc Disease Severity Index, with most effect sizes >0.30. The range of effect sizes for group differences was similar for the legacy measures (SF-36, SGRQ, and HAQ DI). These observations suggest that the new and legacy PRO instruments are comparable in their ability to detect meaningful differences between clinically distinct groups of SSc patients.
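The adjacent-group effect sizes and the analysis of variance can be sketched as follows. The pooled SD uses the conventional degrees-of-freedom weighting, which is an assumption because the source gives only "mean difference/pooled SD"; recomputing from the rounded summary statistics in Table 4 therefore need not reproduce the published values exactly. Only the F statistic is computed here, to stay within the standard library; `scipy.stats.f_oneway` returns both F and its P value.

```python
import math
from statistics import fmean
from typing import Sequence


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Degrees-of-freedom-weighted pooled standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def adjacent_effect_size(m1: float, sd1: float, n1: int,
                         m2: float, sd2: float, n2: int) -> float:
    """(mean of more severe group - mean of less severe group) / pooled SD."""
    return (m2 - m1) / pooled_sd(sd1, n1, sd2, n2)


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """F = between-group mean square / within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = fmean(x for g in groups for x in g)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


print(adjacent_effect_size(0.0, 1.0, 10, 1.0, 1.0, 10))  # 1.0
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))           # 13.5
```

Ordering the arguments from less to more severe group keeps the signs consistent with Table 4: positive for instruments where higher scores mean worse health and negative for PROMIS-29 physical functioning and the SF-36 physical component score.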

Table 4. Mean ± SD scores and effect sizes of the FACIT-Dyspnea and PROMIS-29 and the legacy patient-reported outcome instruments by the MRC Dyspnea Score*

Instrument / MRC Dyspnea Score group    Mean ± SD     Effect size†   P‡
FACIT-Dyspnea                                                        <0.001
  MRC Dyspnea Score 0 (n = 28)          33.1 ± 6.2    1.42
  MRC Dyspnea Score 1 (n = 24)          42.9 ± 5.5    1.60
  MRC Dyspnea Score 2–4 (n = 12)        53.9 ± 9.9
FACIT-Dyspnea functional limitations                                 <0.001
  MRC Dyspnea Score 0 (n = 27)          34.0 ± 8.6    0.93
  MRC Dyspnea Score 1 (n = 24)          42.0 ± 6.6    1.47
  MRC Dyspnea Score 2–4 (n = 12)        54.6 ± 11.1
PROMIS physical functioning                                          <0.001
  MRC Dyspnea Score 0 (n = 29)          51.5 ± 8.4    −0.89
  MRC Dyspnea Score 1 (n = 24)          45.2 ± 5.7    −1.36
  MRC Dyspnea Score 2–4 (n = 12)        35.6 ± 5.1
SGRQ total score                                                     <0.001
  MRC Dyspnea Score 0 (n = 27)          6.2 ± 9.5     1.69
  MRC Dyspnea Score 1 (n = 22)          28.5 ± 13.0   1.64
  MRC Dyspnea Score 2–4 (n = 12)        50.1 ± 18.5
HAQ DI                                                               <0.001
  MRC Dyspnea Score 0 (n = 29)          0.5 ± 0.7     0.59
  MRC Dyspnea Score 1 (n = 23)          0.9 ± 0.7     1.02
  MRC Dyspnea Score 2–4 (n = 12)        1.6 ± 0.7
SF-36 physical component score                                       <0.001
  MRC Dyspnea Score 0 (n = 28)          47.4 ± 11.1   −1.07
  MRC Dyspnea Score 1 (n = 24)          37.0 ± 8.2    −0.91
  MRC Dyspnea Score 2–4 (n = 12)        28.0 ± 8.7

* FACIT-Dyspnea = Functional Assessment of Chronic Illness Therapy–Dyspnea short form; PROMIS-29 = Patient-Reported Outcomes Measurement Information System 29-item Health Profile; MRC = Medical Research Council; SGRQ = St. George's Respiratory Questionnaire; HAQ = Health Assessment Questionnaire; DI = disability index; SF-36 = Short Form 36.
† Mean difference/pooled SD between adjacent groups.
‡ By analysis of variance.

DISCUSSION

Using a cohort of SSc patients, we demonstrated the construct validity for discriminative purposes of 2 new PRO instruments, the PROMIS-29 and FACIT-Dyspnea. The PROMIS-29 and FACIT-Dyspnea will likely be more efficient and meaningful PRO instruments compared to legacy instruments in clinical SSc trials because they are freely available; simple to administer, score, and interpret; available in multiple languages; and have been validated in many diseases, now including SSc.

We found moderate to high degrees of correlation between PROMIS-29 domains and FACIT-Dyspnea and legacy instruments. All PROMIS-29 domains were moderately to highly correlated with the SF-36 physical component score, SGRQ, and HAQ DI, which have been validated in patients with SSc. Correlation between PROMIS-29 domains and the MRC Dyspnea Score was moderate. The FACIT-Dyspnea was highly correlated with the SGRQ total score, MRC Dyspnea Score, HAQ DI, and SF-36 physical component score, as expected. These results suggest the PROMIS-29 and FACIT-Dyspnea are acceptable alternatives to the SGRQ, HAQ DI, SF-36, and MRC Dyspnea Score in clinical studies.

To further evaluate the construct validity of the PROMIS-29 and FACIT-Dyspnea in SSc, we examined the correlation between these new PRO instruments and the modified Medsger Disease Severity Index. Some PROMIS-29 domains (physical functioning, pain interference, and satisfaction with social role participation) and the FACIT-Dyspnea were moderately correlated with the modified Medsger Disease Severity Index. We did not expect a high degree of correlation between the Disease Severity Index and PROMIS-29 domains such as anxiety, depression, sleep disturbance, and satisfaction with social role participation.

The lack of a strong correlation between the severity scale and the FACIT-Dyspnea may stem primarily from limited variability in pulmonary involvement among our subjects. Our finding that the timing of laboratory testing (creatinine, brain natriuretic peptide, and hemoglobin) relative to PRO instrument completion had no effect on the modified Medsger Disease Severity Index score suggests that laboratory values were stable over time and that our cohort was relatively healthy; patients with stable SSc undergo laboratory testing infrequently. Additionally, although 5 of the 6 items in the Index (MRSS, DLCO, hemoglobin, brain natriuretic peptide, and estimated right ventricular systolic pressure on echocardiography) can influence dyspnea, the mean Severity Index score in our cohort was only 2.79 (potential range 0–11, observed range 0–9), further indicating that patients were relatively healthy by our metric.

The FACIT-Dyspnea and PROMIS-29 successfully differentiated between groups defined by MRC Dyspnea Score category or by the SSc Disease Severity Index, with most effect sizes >0.30. Because effect sizes >0.2 are likely to be clinically important (27, 28), this indicates the construct validity for discriminative purposes of these new instruments in patients with SSc.

One important advantage of PROMIS instruments such as the PROMIS-29 is the ability to compare results in a study cohort with the general US population. SSc patients reported poorer physical functioning and satisfaction with social role participation than the general US population (25). Levels of anxiety and depression were similar, whereas fatigue, pain interference, and sleep disturbance were higher than in the age-adjusted general US population (25). The greater sleep disturbance in our cohort corroborates a recent study that found sleep disturbances in many SSc patients, associated with greater dyspnea, depressed mood, and more severe reflux symptoms (29).

Because lung disease (pulmonary arterial hypertension and interstitial lung disease) is the leading cause of death in SSc patients and can cause dyspnea, we compared PROMIS domain scores in our SSc patients with those of COPD patients from a prior study (25). Our SSc cohort reported better fatigue, anxiety, depression, physical function, and pain scores than the COPD patients (25). FACIT-Dyspnea scores were also substantially better than those of COPD patients (by >1 SD, i.e., more than twice the minimum clinically important difference) (6, 7). The ability to compare results of new PRO instruments across diseases is a major strength of the PROMIS items and short forms, and it will likely provide meaningful insights into disease burden across a broad range of conditions.

Limitations of this study include our inability to calculate a complete Medsger Disease Severity Index, because our registry did not collect some of the required information (i.e., tendon friction rubs, joint contractures, and digital ulcers). Also, as stated previously, our cohort was relatively healthy, with a low mean score on the modified Medsger Disease Severity Index. Administering the PROMIS-29 and FACIT-Dyspnea to a cohort of SSc patients with low, moderate, and high disease severity is needed to further determine the sensitivity of these instruments in SSc.

Future studies should also include longitudinal analyses to assess the sensitivity to change of the PROMIS and FACIT instruments as indices of SSc disease activity. An important future research need is to determine whether the PROMIS-29 and FACIT-Dyspnea predict outcomes in SSc (e.g., development and/or progression of lung and skin disease, or death), as PROs do in cancer. Indeed, studies have shown that PROs predict clinical outcomes such as death better than physician assessments do in patients with cancer (30, 31).

The results of our study demonstrate the construct validity for discriminative purposes of the PROMIS-29 and FACIT-Dyspnea as measures of general health status and dyspnea in SSc patients. The potential advantages of the PROMIS-29 and FACIT-Dyspnea include simple scoring procedures using a T-score metric, the ability to compare results with a variety of other populations (the US general population and patients with other chronic diseases), availability in multiple languages (soon to include Chinese), and lack of licensing costs. The PROMIS-29 and FACIT-Dyspnea are attractive alternatives to legacy PRO instruments for measuring general health status and dyspnea in patients with SSc.

Table 5. Mean ± SD scores and effect sizes of the FACIT-Dyspnea and PROMIS-29 and the legacy patient-reported outcome instruments by disease severity using the modified Medsger Disease Severity Index score*

SSc Disease Severity Index score group  Mean ± SD     Effect size†

FACIT-Dyspnea (P = 0.003‡)
  0–1 (n = 25)                          35.0 ± 7.6    0.94
  2–3 (n = 20)                          44.1 ± 9.8    −0.13
  ≥4 (n = 27)                           42.8 ± 10.8

FACIT-Dyspnea functional limitations (P = 0.003‡)
  0–1 (n = 24)                          35.3 ± 7.7    0.65
  2–3 (n = 19)                          42.1 ± 10.7   0.31
  ≥4 (n = 28)                           45.3 ± 11.9

PROMIS physical functioning (P < 0.001‡)
  0–1 (n = 25)                          52.4 ± 6.7    −0.81
  2–3 (n = 20)                          45.6 ± 8.9    −0.41
  ≥4 (n = 28)                           42.2 ± 9.0

PROMIS social role (P = 0.004‡)
  0–1 (n = 25)                          53.9 ± 10.3   −0.53
  2–3 (n = 20)                          48.1 ± 12.9   −0.42
  ≥4 (n = 28)                           43.6 ± 9.2

SGRQ total score (P = 0.004‡)
  0–1 (n = 22)                          10.0 ± 15.3   0.89
  2–3 (n = 19)                          27.8 ± 23.6   0.00
  ≥4 (n = 28)                           27.9 ± 19.9

HAQ DI (P < 0.001‡)
  0–1 (n = 25)                          0.3 ± 0.5     0.62
  2–3 (n = 19)                          0.8 ± 0.7     0.72
  ≥4 (n = 28)                           1.3 ± 0.9

SF-36 physical component score (P < 0.001‡)
  0–1 (n = 25)                          48.6 ± 8.6    −0.99
  2–3 (n = 19)                          38.3 ± 12.3   −0.36
  ≥4 (n = 28)                           34.5 ± 10.2

* FACIT-Dyspnea = Functional Assessment of Chronic Illness Therapy–Dyspnea short form; PROMIS-29 = Patient-Reported Outcomes Measurement Information System 29-item Health Profile; SSc = systemic sclerosis; SGRQ = St. George's Respiratory Questionnaire; HAQ = Health Assessment Questionnaire; DI = disability index; SF-36 = Short Form 36.
† Mean difference/pooled SD.
‡ By analysis of variance.
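
The P values in Table 5 come from analysis of variance across the three severity groups. As a rough cross-check, a one-way ANOVA can be approximated from the published group sizes, means, and SDs. The Python sketch below is an illustration under stated assumptions, not the authors' analysis: the paper describes its test only as "by analysis of variance," and rounded summary statistics can shift the result. It uses the closed-form tail probability of an F distribution with 2 numerator degrees of freedom, P(F > f) = (1 + 2f/d)^(-d/2), so it handles exactly three groups and needs no external statistics library. The function name is hypothetical.

```python
def one_way_anova_from_summary(groups):
    """One-way ANOVA F test from a list of (n, mean, sd) tuples.

    Returns (F, df_between, df_within, p). The closed-form p-value used
    here is valid only for exactly 3 groups (df_between == 2).
    """
    if len(groups) != 3:
        raise ValueError("closed-form p-value implemented only for 3 groups")
    k = len(groups)
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group sum of squares recovered from each group's SD.
    ss_within = sum((n - 1) * sd**2 for n, _, sd in groups)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Survival function of F(2, d): P(F > f) = (1 + 2f/d) ** (-d/2).
    p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    return f_stat, df_between, df_within, p_value


# FACIT-Dyspnea rows of Table 5 as (n, mean, SD).
facit = [(25, 35.0, 7.6), (20, 44.1, 9.8), (27, 42.8, 10.8)]
f, d1, d2, p = one_way_anova_from_summary(facit)
print(f"F({d1}, {d2}) = {f:.2f}, P = {p:.4f}")
```

For the FACIT-Dyspnea rows this gives P of about 0.003, in line with the table, although values recomputed from rounded summaries need not match the published ones exactly.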

AUTHOR CONTRIBUTIONS

All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Dr. Hinchcliff had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Hinchcliff, Thavarajah, Chang, Cella.

Acquisition of data. Hinchcliff, Varga, Chung, Podlusky, Carns.

Analysis and interpretation of data. Hinchcliff, Beaumont, Chang.

Ancillary