A number of studies (all n <200) have assessed health-related quality of life (HRQOL) in patients with systemic sclerosis (SSc), but no systematic review of the effect of SSc on HRQOL has been done. The objective of this study was to systematically review the literature on HRQOL in SSc measured using the Medical Outcomes Trust Short Form 36 (SF-36).
A comprehensive search was conducted in August 2007 using Medline, CINAHL, and EMBase to identify original research studies reporting SF-36 scores of SSc patients. Selected studies were reviewed and characteristics of the study samples and SF-36 data were extracted. Bayesian meta-analysis and meta-regression were performed to obtain pooled estimates of SF-36 physical component summary (PCS) and mental component summary (MCS) scores for all patients as well as by limited and diffuse disease status.
Twelve data sets with a total of 1,127 SSc patients were included in the systematic review. HRQOL was impaired in patients with SSc, with pooled SF-36 PCS scores being more than 1 SD below the general population (38.3; 95% credible interval [95% CI] 35.2, 41.5) and pooled SF-36 MCS scores being ∼0.5 SDs below the general population (46.6; 95% CI 44.2, 49.1). SF-36 PCS scores were 3.5 points (95% CI −1.0, 8.0) lower in patients with diffuse compared with limited disease.
This study provides robust evidence of the presence and magnitude of impairment in HRQOL in patients with SSc. Although the impairment appears greater in physical health, mental health impairment is also reported.
A focus of medical research has traditionally been measurement of mortality and morbidity. As chronic diseases have become more prevalent (1), researchers have begun to realize that these measures alone are not sufficient to capture the experience of disease. Patient-reported outcomes, including measurement of health-related quality of life (HRQOL), have emerged as important outcomes of interest. In addition, information regarding HRQOL serves a number of other purposes. First, in clinical trials, treatment efficacy and/or improved survival need to be balanced against adverse effects and impaired HRQOL. Second, HRQOL data can be used by health care policymakers to identify needs and allocate resources for patients with various diseases. Finally, in the clinical setting, HRQOL data can allow busy clinicians to monitor their patients' status and make treatment decisions.
Systemic sclerosis (SSc) is a multisystem disorder characterized by a disturbance in fibroblast function, microvascular disease, and immune system activation, culminating in fibrosis of the skin and internal organs (2). Although it is a heterogeneous disorder, 2 common clinical subsets are recognized in terms of skin involvement: limited cutaneous SSc (lcSSc; skin involvement distal to the elbows and knees) or diffuse cutaneous SSc (dcSSc; skin involvement proximal to the elbows and knees in addition to the trunk) (3). SSc is associated with significant morbidity, including disfiguring skin thickening, finger ulcers, joint contractures, pulmonary hypertension, interstitial lung disease, chronic diarrhea, and renal failure. Functional disability is considerable (4), and rates of clinically significant depressive symptoms are high even compared with other medical patient groups (5). The disease therefore encompasses broad multidimensional issues, including biologic, psychological, and social processes. Thus, it would not be surprising that HRQOL should be impaired. However, to date, there has been relatively little work on HRQOL in patients with SSc, and experts have recommended additional research in this area (6).
Given the paucity of data, this systematic review of the literature was carried out to gain greater insight into the HRQOL of patients with SSc. Specifically, we had 2 objectives: the primary objective was to determine to what extent HRQOL is impaired in SSc, and the secondary objective was to determine whether there are differences in HRQOL between patients with lcSSc and dcSSc. The Medical Outcomes Trust Short Form 36 (SF-36) (7) is a widely used generic measure of HRQOL. Therefore, to maximize the comparability of the studies selected in this systematic review, we decided to limit the review to those studies using the SF-36 as the main outcome measure of HRQOL.
MATERIALS AND METHODS
Methodology of the systematic review.
We performed this systematic review of the literature according to guidelines proposed by Stroup et al for the reporting of meta-analyses of observational studies in epidemiology (8).
The SF-36 is composed of 36 questions that can be grouped into 8 domains: physical functioning, role physical, bodily pain, general health, vitality, social functioning, role emotional, and mental health. Each of the domains can be scored separately and has a range of 0–100, with 0 indicating the worst HRQOL and 100 indicating the best HRQOL. The scores of the domains can also be combined into 2 summary scores: the physical component summary (PCS) and the mental component summary (MCS) scores. The summary scores are standardized to responses from the US general population, for which the mean ± SD score is 50 ± 10.
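As a rough illustration of the norm-based scoring described above, a summary score can be re-expressed in SD units relative to the US general population. The sketch below assumes only the stated population mean of 50 and SD of 10; the function name `sd_units_from_norm` is hypothetical, and the 2 input values are the pooled estimates reported in this review.

```python
# Norm-based scoring constants taken from the text: mean 50, SD 10.
POP_MEAN = 50.0
POP_SD = 10.0


def sd_units_from_norm(score: float) -> float:
    """Return how many population SDs a summary score lies from the norm mean.

    Negative values indicate a score below the general population mean.
    """
    return (score - POP_MEAN) / POP_SD


# Pooled estimates reported in this review (not new data).
pcs_z = sd_units_from_norm(38.3)  # approximately -1.17
mcs_z = sd_units_from_norm(46.6)  # approximately -0.34
```

Under these assumptions, the pooled PCS score lies a little more than 1 SD below the population norm, and the pooled MCS score lies roughly one-third of an SD below it.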
Search strategy and study selection.
Medline, EMBase, and CINAHL up to August (week 3) 2007 were independently searched by 2 investigators (MH, EN) using the following search strategies: ((scleroderma[mh] OR scleroderma[tiab]) AND sf 36[tiab]) for Medline and (scleroderma and sf 36) for CINAHL and EMBase. The searches were also repeated using systemic sclerosis instead of scleroderma. In addition, reference lists of selected studies and a recent review article (9) were also hand searched by a single investigator (MH). Two investigators (MH, EN) independently reviewed the abstracts of each reference identified by the search to identify full-length, published, original research studies that included SSc patients and that reported SF-36 data. All such studies were selected for full-article review. Differences were resolved by consensus.
The same 2 investigators reviewed the articles identified as potentially eligible based on their abstracts and determined eligibility for study inclusion, again based on consensus. Studies were selected according to the following criteria: 1) the study presented original data, 2) the study included SSc patients, 3) the study reported SF-36 subscale or summary score data, 4) studies of any design without restriction as to language were included, 5) in the case of duplication with multiple articles publishing data on the same cohort, the most complete data set or the article whose focus was more specifically on HRQOL in SSc was included, 6) data available only in abstract form were excluded, and 7) studies with a mixed patient population were included if data on SSc patients were available, separately.
Description of studies.
Two investigators (MH, EN) independently extracted the data from each selected study using a structured data extraction form. Differences were resolved by consensus. The following information was systematically extracted: 1) study design (e.g., randomized trial, cohort, cross-sectional); 2) country where the study was performed; 3) characteristics of the patients: sample size, criteria used to identify patients with SSc (American College of Rheumatology [formerly the American Rheumatism Association] classification or other classification system), age, percentage of female patients, mean disease duration, and percentage of patients with limited and diffuse skin involvement; and 4) SF-36 subscale and summary scores. For clinical trials and cohort studies, SF-36 scores at baseline were recorded.
Authors of individual studies were contacted to obtain complementary data (in particular, SF-36 summary scores) necessary to perform the meta-analysis. Several (11–13) graciously replied and provided the requested data.
Data were extracted and summarized in tabular form. The SF-36 subscale scores reported in most studies were not standardized and were not directly comparable. However, the SF-36 summary scores are standardized and can be compared directly. Therefore, we performed a meta-analysis with the studies for which we had SF-36 summary scores (9 studies, n = 955). We used Bayesian meta-analysis and meta-regression methods to perform our analysis for 2 reasons. First, it was important to be able to select appropriate covariates for the meta-regression. Given the small number of studies, Bayesian methods are more reliable for model selection because they do not depend on the large-sample properties of the tests (14). Second, Bayesian methods, particularly when using the WinBUGS/OpenBUGS (15) software, naturally handle problems with missing covariate values. We note that the primary drawback of Bayesian hierarchical modeling is the sensitivity of results to specification of the prior distributions for the parameters. Therefore, we performed a sensitivity analysis for the most critical choices of prior distribution (see Supplementary Appendix A, available in the online version of this article at http://www3.interscience.wiley.com/journal/77005015/home).
We used the deviance information criterion (DIC) to select the covariates for the meta-regression models. The DIC is similar in nature to the more commonly used likelihood-based model selection criteria for hierarchical models, Akaike's information criterion and the Bayesian information criterion. These 3 criteria share a common generic form: they reward models that fit the observed data well (as measured by −2 × log-likelihood) and penalize models that are increasingly complex (where the penalty is some function of the number of model parameters and the number of observations). There are 2 major practical differences between the DIC and its frequentist counterparts. First, rather than calculating the fit of the model based on the maximized log-likelihood, the model fit component of the DIC is an average of the −2 × log-likelihood over all posterior samples. Second, the number of parameters used to penalize for model complexity is estimated from the data, rather than being fixed in advance. The second difference is the more important one, because the estimated number of parameters adjusts for the extent to which studies share parameters in the model.
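For reference, the 3 criteria described above can be written side by side. The notation below is an assumption introduced for illustration and follows the usual textbook definitions rather than anything stated in this article: L is the likelihood, \hat{\theta} the maximum likelihood estimate, \bar{\theta} the posterior mean of the parameters, k the number of model parameters, and n the number of observations.

```latex
\begin{align*}
D(\theta) &= -2 \log L(y \mid \theta) \\
\mathrm{AIC} &= D(\hat{\theta}) + 2k \\
\mathrm{BIC} &= D(\hat{\theta}) + k \log n \\
\mathrm{DIC} &= \bar{D} + p_D,
  \qquad \bar{D} = \mathrm{E}_{\theta \mid y}\left[ D(\theta) \right],
  \qquad p_D = \bar{D} - D(\bar{\theta})
\end{align*}
```

In this formulation, \bar{D} is the posterior average of the deviance (the fit term), and p_D plays the role of the effective number of parameters estimated from the data (the complexity penalty).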
All Bayesian parameter estimates and associated 95% credible intervals (95% CIs) were obtained using the WinBUGS and OpenBUGS software and the R statistical package (16, 17). All study estimates are based on runs of 3 chains with 100,000 samples from each, thinned by a factor of 10, yielding 30,000 iterations for each analysis result presented. The Gelman and Rubin convergence diagnostic (18, 19) in WinBUGS was used to diagnose convergence of the Markov chain Monte Carlo samplers. Finally, we also analyzed our data with frequentist methods, using the MiMa meta-regression package (20), to allow for comparison in the cases where we had complete data and to more objectively assess the impact of our Bayesian model assumptions.
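The thinning arithmetic and the idea behind the convergence diagnostic can be sketched as follows. This is a minimal illustration under stated assumptions, not the WinBUGS implementation: it uses the basic potential scale reduction factor (often called R-hat), which may differ in detail from the version WinBUGS reports, and the chains are simulated stand-ins drawn from an arbitrary normal distribution rather than the study's sampler output. The function names `thin` and `gelman_rubin` are hypothetical.

```python
import random
from statistics import mean, variance


def thin(chain, step):
    """Keep every `step`-th draw of one chain."""
    return chain[step - 1::step]


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    # W: mean of the within-chain variances.
    within = mean([variance(c) for c in chains])
    # B: n times the variance of the chain means.
    between = n * variance([mean(c) for c in chains])
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Simulated stand-ins for sampler output (NOT the study's chains):
# 3 chains of 100,000 independent draws from the same distribution.
random.seed(1)
chains = [[random.gauss(38.0, 2.0) for _ in range(100_000)] for _ in range(3)]

thinned = [thin(c, 10) for c in chains]
total_kept = sum(len(c) for c in thinned)  # 3 * 100,000 / 10 = 30,000
r_hat = gelman_rubin(thinned)              # near 1 when the chains agree
```

Values of the diagnostic close to 1 suggest that the chains are sampling from the same distribution; values well above 1 suggest that the sampler should be run longer.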
The search process identified 22 unique titles (11–13, 21–39) (Supplementary Figure 1, available in the online version of this article at http://www3.interscience.wiley.com/journal/77005015/home). During the title and abstract review process, 1 study was excluded because it did not report SF-36 data for patients with SSc (29). Twenty-one articles were selected for full-text review. Eight studies were excluded: 5 (27, 30–32, 34) because they reported on subsets or duplicated data reported in other included studies (23, 28, 37), with only minor changes in the overall sample size and SF-36 scores; 1 (25) because it reported data from 2 other studies, of which 1 was reported separately and included in the selection (26) and the other was published only in abstract form (40); 1 (35) because it did not report any SF-36 data; and 1 (36) because it reported SF-36 data on 15 patients, of whom only 4 had SSc, without reporting their results separately. One study reported supplementary data (24) to another eligible study (38), and data from the 2 reports were therefore combined.
Studies included in the systematic review.
Therefore, 12 studies with a total of 1,127 SSc patients were included in the systematic review. Four studies were from the US (21, 23, 24, 26, 38), 4 were from Italy (13, 33, 37, 39), 2 were from France (22, 28), 1 was from the UK (11), and 1 was from Canada (12). Characteristics of the SSc patients included in the studies are shown in Table 1. SF-36 PCS and MCS scores were available for 9 studies (n = 955).
Table 1. Characteristics of the studies included in the systematic review*
SF-36 data were extracted from the selected studies (Supplementary Table 1, available in the online version of this article at http://www3.interscience.wiley.com/journal/77005015/home). The SF-36 PCS scores ranged from 33.4 to 43.8 and the SF-36 MCS scores ranged from 41.0 to 50.7. A Bayesian meta-analysis was performed to pool the SF-36 PCS and MCS data from the 9 studies (n = 955) that reported these data (Figure 1). The resulting posterior mean estimate of the population overall PCS score was 38.3 (95% CI 35.2, 41.5). Without adjusting for any covariates, we estimated a between-study SD of 4.1 (95% CI 2.4, 8.3) (Table 2). Similarly, we obtained an overall estimate of 46.6 (95% CI 44.2, 49.1) for the MCS score and an unadjusted between-study SD of 3.0 (95% CI 1.6, 6.2). The significant heterogeneity between studies, particularly in the PCS score estimate, is visible in the forest plots in Figure 1 and supported by the large between-study SDs reported above.
Table 2. Meta-analysis and meta-regression models for the SF-36 PCS and MCS scores*
Smaller deviance information criterion (DIC) values indicate better model fit. Between-study heterogeneity is assessed using the SD of the adjusted study means and the coefficient estimate is the simple meta-regression coefficient for the model of interest. SF-36 = Short Form 36; PCS = physical component summary; MCS = mental component summary; 95% CI = 95% credible interval; HRQOL = health-related quality of life.
We performed meta-regression analyses for the 9 studies included in the meta-analysis. The goal of the meta-regression was to use selected population characteristics to try to explain (and therefore reduce) the observed heterogeneity between studies. We looked at various models using age, percentage of dcSSc patients, percentage of female subjects, and duration of disease among study participants as covariates in the regression models. Table 2 shows the DIC values and the estimated SDs of the SF-36 PCS and MCS scores for the unadjusted baseline models and for models adjusting for the selected covariates. For purposes of interpretation, lower DIC values indicate better model fit. For the SF-36 PCS scores, we found that age (DIC 37.0) or percentage of dcSSc patients (DIC 37.4) yielded slightly better-fitting models than the baseline unadjusted model (DIC 37.7). Of note, however, was that the effect of age and percentage of patients with dcSSc went in opposite directions, with increasing age being associated with better SF-36 PCS scores and dcSSc being associated with worse SF-36 PCS scores. The model containing both covariates actually performed worse, due to the correlation between age and percentage of patients with dcSSc. Percentage of female subjects and disease duration did not have any effect, either by themselves or in addition to the other covariates. None of the covariates helped to explain the observed heterogeneity in the SF-36 MCS scores.
Forest plots for the pooled study estimates adjusting for the percentage of dcSSc patients are also shown in Figure 1. Although the pooled unadjusted and adjusted estimates appear similar, the heterogeneity in the forest plots is reduced in the adjusted models for the SF-36 PCS. In addition, the SDs in the adjusted models for the SF-36 PCS are almost half of the SDs in the unadjusted models (Table 2). Therefore, adjusting for the percentage of patients with dcSSc considerably decreases the heterogeneity among studies included in the meta-analysis for the SF-36 PCS. Similar findings were obtained when adjusting for age (data not shown).
Comparison of limited and diffuse disease subsets.
Eight studies (n = 797) reported SF-36 scores by the extent of skin involvement (Supplementary Table 2, available in the online version of this article at http://www3.interscience.wiley.com/journal/77005015/home). SF-36 PCS scores ranged from 36.8 to 43.8 in patients with lcSSc compared with 32.4 to 43.7 in patients with dcSSc. SF-36 MCS scores ranged from 40.0 to 54.1 in patients with lcSSc compared with 40.0 to 50.6 in patients with dcSSc. Two studies (n = 157) (12, 28) reported significantly worse PCS scores in patients with dcSSc compared with lcSSc, and 1 study (n = 24) (33) reported significantly worse MCS scores in patients with lcSSc compared with dcSSc. Of the 8 studies, 2 contained only subjects of one disease type (1 study with only dcSSc patients and 1 study with only lcSSc patients), and these were excluded from the meta-analysis looking at differences between subsets. Therefore, the results in this section are derived from the data of 6 studies (n = 414) (Table 2 and Figure 2).
Since there were few studies and most had small sample sizes, we pooled the results using meta-analytic methods rather than using meta-regression. Table 2 shows the results of 2 different models for this meta-analysis. The fixed-effects model assumes that there is no between-study heterogeneity in the SF-36 difference between patients with lcSSc and dcSSc. The random-effects model assumes that between-study heterogeneity exists in the differences between lcSSc and dcSSc patients. The DIC values for the random-effects models for both the SF-36 PCS and MCS were the lowest, indicating better fit than the fixed-effects models. Using the random-effects model, we estimated that patients with lcSSc had an SF-36 PCS score that was 3.5 points (95% CI −1.0, 8.0) higher than patients with dcSSc. On the other hand, lcSSc and dcSSc patients did not seem to differ in their SF-36 MCS scores (the estimates for the difference in scores between the 2 groups were close to 0 in both models). The forest plots in Figure 2 show the results of the random-effects model.
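The contrast between the 2 model assumptions can be illustrated with a frequentist analogue. The sketch below is not the Bayesian model used in this study: it uses inverse-variance fixed-effects pooling and the DerSimonian-Laird moment estimator for the random-effects model, and the per-study differences and standard errors are hypothetical values chosen only for illustration. The function names are likewise hypothetical.

```python
from math import sqrt


def fixed_effect(diffs, ses):
    """Inverse-variance pooled estimate assuming no between-study heterogeneity."""
    w = [1 / se ** 2 for se in ses]
    est = sum(wi * d for wi, d in zip(w, diffs)) / sum(w)
    return est, sqrt(1 / sum(w))


def dersimonian_laird(diffs, ses):
    """Random-effects pooled estimate with a moment estimate of tau^2."""
    w = [1 / se ** 2 for se in ses]
    fe, _ = fixed_effect(diffs, ses)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (d - fe) ** 2 for wi, d in zip(w, diffs))
    k = len(diffs)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, floored at 0
    w_star = [1 / (se ** 2 + tau2) for se in ses]
    est = sum(wi * d for wi, d in zip(w_star, diffs)) / sum(w_star)
    return est, sqrt(1 / sum(w_star)), tau2


# HYPOTHETICAL per-study PCS differences (lcSSc minus dcSSc) and standard
# errors; these are NOT the values from the included studies.
diffs = [6.0, 1.0, 5.5, -1.0, 4.0, 7.0]
ses = [1.5, 2.0, 2.5, 3.0, 1.8, 2.2]

fe_est, fe_se = fixed_effect(diffs, ses)
re_est, re_se, tau2 = dersimonian_laird(diffs, ses)
```

When the estimated between-study variance is greater than 0, the random-effects standard error exceeds the fixed-effects one, which is one reason an interval for the pooled difference can include 0 under a random-effects model even when the point estimate is clearly positive.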
Of note, we also found considerable heterogeneity between studies included in this meta-analysis (Table 2). However, since we did not have covariate information on the individual subsets (limited or diffuse) for many of the studies, we decided not to perform a meta-regression on these data. Instead, we examined how the inference about the difference in HRQOL between lcSSc and dcSSc patients would change under different values of true between-study heterogeneity (as measured by the SD of the true study differences). Figure 3 shows that for the PCS, the amount of between-study heterogeneity does not affect the estimate of the difference, but does affect the perceived likelihood of the difference being greater than zero. For the MCS, there is no amount of between-study heterogeneity that would lead to the conclusion that there was a difference between the mental HRQOL of lcSSc and dcSSc patients.
Sensitivity analyses and computational details.
We conducted a sensitivity analysis to assess the robustness of our results to the specification of our models. First, we refit the meta-analyses of the preceding sections to the 8 studies with completely observed data using frequentist random-effects meta-analysis methods in the R statistical package via the meta library and the MiMa meta-regression software. We found that the frequentist approach yielded parameter estimates very similar to those obtained with the Bayesian models. In the one situation where they differed, the frequentist meta-analysis showed a more statistically significant difference between dcSSc and lcSSc patients (P = 0.047) than the Bayesian meta-analysis of the differences. Similarly, we tested the robustness of our Bayesian model to prior specification by obtaining results for the 4 prior specifications detailed in Supplementary Appendix A (available in the online version of this article at http://www3.interscience.wiley.com/journal/77005015/home), in particular the important prior distribution on the random-effects variance. We did not see any substantial differences in the parameter estimates themselves (Figure 4), although in the case of the differences, one interval contained 0 (for the prior in Figure 4D), whereas the 3 others did not.
In this systematic review of 12 studies with a total of 1,127 SSc patients, we found significant impairment in the HRQOL of SSc patients. Although the impairment in physical health appeared greater (SF-36 PCS score was more than 1 SD below that of the general population), mental health was also impaired (SF-36 MCS score was ∼0.5 SDs below that of the general population). Moreover, the physical health of patients with dcSSc was ∼0.5 SDs below that of patients with lcSSc, whereas mental health was impaired to the same extent in both subsets of disease. The minimum clinically important difference (MCID) is an important measure of change in HRQOL (41) and represents the smallest change in the score that patients can perceive. A change in the score of 2.5–5.0 for the SF-36 summary scores has been suggested as representing an MCID and has been previously used in SSc (23). Therefore, although the differences identified in this study were obtained using cross-sectional data, the differences in PCS but not MCS scores between dcSSc and lcSSc are also likely to be clinically meaningful.
The significance of this study is 2-fold. First, it provides evidence of the presence and magnitude of impairment in HRQOL in SSc, both in physical and mental health. Although several small studies had found that SF-36 PCS scores were impaired in SSc, the results were inconsistent (Supplementary Tables 1 and 2, available in the online version of this article at http://www3.interscience.wiley.com/journal/77005015/home). Moreover, SF-36 MCS scores were thought to be relatively preserved in SSc, leading some to argue that, despite significant impairments in physical health, SSc patients adapt well to their slowly progressing disease (23). This was incongruent, however, with reports of high rates of depressive symptoms in SSc (5, 42). Thus, this study provides strong evidence that HRQOL is considerably impaired both in the physical and mental health domains in SSc. Second, policymakers may be unfamiliar with this rare but devastating disease. These data, which show significant impairment in the physical health of SSc patients of more than 1 SD compared with the general population, provide valuable evidence that physicians and patient groups can use to advocate for resources for patients who have this severe and devastating disease.
In the meta-regression analyses, we found that controlling for age or the percentage of patients with dcSSc, but not both together, decreased the heterogeneity between studies. In addition, we found that the effect of age and percentage of dcSSc patients was in opposite directions, with increasing age being associated with better SF-36 PCS scores and dcSSc being associated with worse SF-36 PCS scores. Although the finding related to age appears counterintuitive, we hypothesize that it may be the result of confounding and survival bias, with patients with lcSSc having better survival than those with dcSSc and thus surviving to older ages. In our analyses stratified by lcSSc and dcSSc, we showed that patients with lcSSc had better SF-36 PCS scores than those with dcSSc. However, although data were not available to allow us to demonstrate that the patients with lcSSc included in the systematic review were older than those with dcSSc, there is nevertheless some independent evidence to suggest this. First, patients with dcSSc are believed to have worse survival than those with lcSSc (43, 44). Second, in the only study included in the review that reported age separately according to limited or diffuse status, 70% of patients with lcSSc were age ≥55 years compared with only 49% of those with dcSSc. Therefore, we believe that the model adjusting for lcSSc or dcSSc (Figure 1) is the model that provides the best estimate of the pooled SF-36 PCS (38.2; 95% CI 36.2, 40.4) and MCS (46.6; 95% CI 43.9, 49.2) scores in SSc.
A study such as this is not without limitations. A systematic review of published studies is limited by the fact that it excludes unpublished data and this may result in publication bias, whereby studies with negative results may be less likely to have been published and included in the analysis. We attempted to examine this using funnel plots (see Supplementary Figure 2, available in the online version of this article at http://www3.interscience.wiley.com/journal/77005015/home). These showed that only the overall meta-analysis of the MCS showed evidence of asymmetry, suggesting that the larger studies tended to have more normal MCS scores. The other 3 analyses did not show significant asymmetry, although this is difficult to fully assess given the small number of studies. Confounding is also possible, given the lack of individual patient data. Nevertheless, we attempted to control for some of the possible confounding in the meta-regression by adjusting for common confounders including age and sex, albeit at the group level. Unfortunately, we did not have characteristics of patients by subset of disease (limited or diffuse) and could not perform a meta-regression for that part of the analysis. Therefore, we acknowledge that confounding remains a possibility in that analysis. Finally, the patient inclusion criteria for each study with regard to disease subset were clearly quite varied, with studies ranging from having only lcSSc patients to only dcSSc patients (Table 1). Such heterogeneity in selection could in fact affect the analysis. First, it could cause our estimates to be less precise due to the possibility of estimating different population parameters in each study. Second, if the patients selected for the studies were somehow different from the general population of SSc patients, this could also have an impact on the generalizability of our results.
However, we view the heterogeneity of the patient population as a strength rather than as a weakness of our study. We were still able to detect a difference in physical HRQOL, in spite of the very different patient populations and the limited number of studies. In our opinion, it is far more likely that having studies with more homogeneous populations would strengthen our results, rather than reveal a systematic bias.
In conclusion, this study provides robust evidence that HRQOL is considerably impaired in patients with SSc. This finding should now serve as our call to action to identify targets and implement interventions that have the ability to improve the HRQOL of those living with this devastating disease.
All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Dr. Hudson had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study conception and design. Hudson, Thombs, Steele.
Acquisition of data. Hudson, Newton.
Analysis and interpretation of data. Hudson, Thombs, Steele, Panopalis, Baron.
INVESTIGATORS OF THE CANADIAN SCLERODERMA RESEARCH GROUP
Investigators of the Canadian Scleroderma Research Group, in addition to the authors, are as follows: J. Pope: London, Ontario; J. Markland: Saskatoon, Saskatchewan; D. Robinson: Winnipeg, Manitoba; N. Jones: Edmonton, Alberta; N. Khalidi, E. Kaminska: Hamilton, Ontario; P. Docherty: Moncton, New Brunswick; M. Abu-Hakima, S. LeClercq, M. Fritzler: Calgary, Alberta; A. Masetto: Sherbrooke, Quebec; D. Smith: Ottawa, Ontario; E. Sutton: Halifax, Nova Scotia; J.-P. Mathieu, S. Ligier: Montreal, Quebec, Canada.