Systemic sclerosis disease modification clinical trials design: Quo vadis?


  • Fabian A. Mendoza,

    1. Jefferson Institute of Molecular Medicine and Scleroderma Center, Thomas Jefferson University, Philadelphia, Pennsylvania
  • Lynette L. Keyes-Elstein,

    1. Statistical and Clinical Coordinating Center for Autoimmune Disease Clinical Trials, NIH/National Institute of Allergy and Infectious Diseases, Bethesda, Maryland
  • Sergio A. Jimenez

    Corresponding author
    1. Jefferson Institute of Molecular Medicine and Scleroderma Center, Thomas Jefferson University, Philadelphia, Pennsylvania
    • Jefferson Institute of Molecular Medicine, Thomas Jefferson University, 233 South 10th Street, Suite 509 BLSB, Philadelphia, PA 19107


Since the first description of the treatment of scleroderma (systemic sclerosis [SSc]) by Carlos Curzio in 1753, which included warm milk and vapor baths, bleeding from the foot, and oral administration of small doses of quicksilver (1), until the most recent clinical trials of anti–transforming growth factor β monoclonal antibodies and novel tyrosine kinase inhibitors (2–10), numerous medications have been used in treating this condition. Despite intense efforts to develop effective therapies, SSc remains the autoimmune rheumatic disease with the highest case-specific mortality. Furthermore, there is a general perception that no disease-modifying therapeutic modalities are available and that any therapeutic approaches currently employed are clinically ineffective. Although numerous therapeutic agents have been suggested to be effective in uncontrolled clinical trials or in descriptive unblinded studies, rigorous evaluation failed to confirm these suggestions.

The relative rarity of SSc, its clinical heterogeneity, and the lack of quantitative and unbiased assessment tools to evaluate the effectiveness of therapeutic interventions have imposed serious limitations on the design and performance of informative clinical trials in this disease. Based on different pathophysiological concepts, diverse trial design approaches have been used to develop phase I and phase II trials for potential SSc disease-modifying drugs (11). Despite these efforts, effective treatments for SSc remain elusive, with no clear and widely accepted standard for treating its broad spectrum of clinical manifestations. The paucity of data showing effective disease-modifying therapeutic interventions for SSc, our desire to help patients with this disease, and the pressing need for effective therapies, coupled with the expected increase of new trials in the near future, make the limitations in the performance of controlled clinical trials an obstacle that must be surmounted. Consequently, we focus this View article on establishing a framework and describing an approach to the design and development of disease-modifying SSc clinical trials that may be useful for investigators designing and engaging in such trials.

It is important to emphasize that cutaneous SSc involvement constitutes a specific subset of the broad disease spectrum and that paradigms and clinical response expectations for other manifestations of the disease, such as pulmonary arterial hypertension or SSc-related interstitial lung disease (ILD), should not be directly extrapolated from the observations related to skin involvement.

We are aware that satisfying all clinical trial requirements in terms of statistical correctness, ethical perspective, and feasibility will be difficult for SSc disease-modification trials. Therefore, recognizing that it will not be possible to design a universally acceptable fixed clinical trial design, it is our purpose to discuss several crucial aspects of trial design that will need to be considered and addressed in the design of future SSc disease-modification clinical trials.

Randomized Controlled Trials Versus Single-Arm Open-Label Trials

Single-arm open-label trials are used to obtain preliminary data regarding the safety of a drug and may help to develop and refine the parameters for the outcome variables to be evaluated. In any single-arm trial, observed improvements may not be attributable to study treatment. For example, if the outcome of interest is not an objective measure, both physicians and patients may believe they observe improvement that is not genuine. Measures based on rater interpretation, such as the modified Rodnan skin thickness score (MRSS) or visual analog scales (VAS), are vulnerable to this type of observation bias. Another possible explanation for observed improvements not related to the study treatment could be the natural course of the disease. For example, skin will soften spontaneously in some patients with SSc. Conversely, disease course could also obfuscate a positive treatment effect; patients may show no improvement during the trial, but the disease would have progressed in the absence of the study treatment. In SSc, the heterogeneous clinical presentation and the variability in the natural course of the disease render single-arm studies of limited value to explore effectiveness of any potential therapeutic agent. To illustrate how the variable course of the disease can affect treatment outcome, it should be remembered that Carlos Curzio's treatment of the scleroderma patient he described (1) resulted in softening of her skin after 11 months of therapy.

The evaluation of safety data for a single-arm trial can also lead to ambiguous findings. For a heterogeneous disease such as SSc, which gives rise to a complex array of clinical manifestations impacting multiple organ systems, distinguishing adverse reactions related to the disease from those attributable to the therapeutic intervention can be a significant challenge. These considerations became quite apparent in a recent single-arm open study of imatinib mesylate in which it was extremely difficult to attribute numerous clinical manifestations with certainty either to the disease course or to adverse reactions to the therapeutic intervention (Spiera R: personal communication). However, it should be emphasized that single-drug open-label trials have an important role in acquiring preliminary information about safety, in obtaining valuable mechanistic insights into the proposed intervention, and in assisting the development and honing of outcome measures. Furthermore, these trials are substantially less difficult and less expensive to conduct because recruitment and design are simpler.

A well-designed blinded randomized controlled trial (RCT) can overcome the weaknesses outlined above for the single-arm study. In the discussion to follow, we review issues associated with the design and analysis of RCTs as they apply to trials in patients with SSc. We attempt to keep our recommendations consistent with published guidance documents from the Food and Drug Administration (FDA) and the International Conference on Harmonisation (ICH), which provide principles and methodology to ensure that the assessments of efficacy and safety are unbiased and scientifically valid.

Randomized Placebo-Controlled Trials

A key decision when designing an RCT is the choice of control group. Placebo-controlled trials are preferred when the objective is to assess efficacy and safety of the study treatment after accounting for the influences of disease course and physician or patient expectations. Placebo-controlled trials are considered ethical when no effective treatment is known or when the use of placebo does not expose the subjects to excessive risk of harm, provided that subjects are informed of the risks. Even when the use of placebo is deemed ethical, physicians and their patients may be reluctant to participate in a placebo-controlled study, which presents a problem for recruiting sufficient numbers of subjects for the trial. When the use of placebo presents ethical or practical concerns, the study treatment can be compared to a prior existing therapy or standard of care. When the control is an active therapy, the objective of the trial is to demonstrate either noninferiority or superiority of the study treatment compared to the active control, the choice likely depending on the perceived effectiveness of the active control. Regarding SSc, the most relevant question in this matter is whether there is an accepted standard of disease-modifying treatment for these patients.

Although SSc is characterized by broad multisystemic organ involvement, the involvement of skin in SSc is almost universal and constitutes a specific subset of the broad disease spectrum. At the present time, there is no universally accepted disease-modifying therapy for the cutaneous manifestations of the disease. Only a few studies have demonstrated a statistically significant impact on the MRSS. In a study described by Tashkin et al (12), subjects with SSc-associated lung disease treated for 1 year with oral cyclophosphamide had significant improvement in MRSS at 12 months, compared to subjects receiving placebo. However, owing to its adverse event profile, cyclophosphamide is not a preferred treatment option for skin disease in the absence of ILD.

From a review of 8 RCTs for diffuse skin disease in SSc, excluding trials for treatment of Raynaud's phenomenon and SSc-associated ILD (2–9), only 1 study showed a statistically significant difference between treated and control groups. This study was a small trial (n = 29) employing methotrexate (MTX) as a disease-modifying drug, which showed a statistically significant improvement over placebo (P = 0.03) at 24 weeks on a composite response variable (defined by improvement in MRSS, diffusing capacity for carbon monoxide, or VAS for general well-being). The impact on the MRSS alone was not significant (P = 0.06) (5). The lack of significance of the effect on the MRSS for this MTX trial and the failure of the other RCTs to find statistically significant treatment effects do not necessarily mean that all the treatments are ineffective. A failed study may have included too few subjects to detect a meaningful clinical difference, may have been too short in duration for the treatment to impact the pathologic effects of the disease on the skin, or may have been biased in some way against the treatment. For example, in a second RCT designed to evaluate MTX in 71 randomized subjects, an analysis of 47 subjects who completed the study showed no statistical difference between MTX and placebo (8). A reanalysis of the data using a Bayesian approach and applying multiple imputation methods for missing data to reduce bias associated with loss of subjects suggested that subjects receiving MTX were more likely to have a favorable MRSS response than subjects receiving placebo. While this demonstrates that evaluation of a simple hypothesis sometimes does not provide the entire picture and that a thoughtful analysis can glean information important to evaluating the full potential of a new therapy, the frequentist approach (i.e., hypothesis testing) is currently still the standard by which the FDA will approve a drug for a labeling indication and by which clinical practice is changed. The example of MTX for treatment of SSc skin manifestations illustrates the importance of overcoming design challenges in order to obtain unambiguous results.
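
The point that a failed study may simply have included too few subjects can be made concrete with an approximate power calculation. The sketch below is illustrative only: it uses a two-sided normal-approximation test for a difference in mean MRSS change, and the effect size (5 MRSS units) and standard deviation (8 units) are hypothetical values chosen for the example, not estimates from any of the trials cited here. The function name two_arm_power is likewise an assumption of this sketch.

```python
from statistics import NormalDist


def two_arm_power(n_per_arm: int, delta: float, sd: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two group means.

    n_per_arm: subjects per arm (equal allocation assumed)
    delta:     true difference in means (hypothetical)
    sd:        common standard deviation of the outcome (hypothetical)
    """
    z = NormalDist()
    standard_error = sd * (2.0 / n_per_arm) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / standard_error
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical comparison: roughly 15 per arm versus 60 per arm.
power_small = two_arm_power(n_per_arm=15, delta=5.0, sd=8.0)
power_large = two_arm_power(n_per_arm=60, delta=5.0, sd=8.0)
print(f"power with 15 per arm: {power_small:.2f}")
print(f"power with 60 per arm: {power_large:.2f}")
```

Under these assumed inputs, a trial with about 15 subjects per arm would miss a real 5-unit difference more often than it would detect it, which is one reason a nonsignificant result from a small trial is weak evidence of ineffectiveness.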

Another interesting example of a study that failed to find significant difference between treatment groups is the high-dose versus low-dose D-penicillamine trial (2). In this trial, a low dose of D-penicillamine was deemed by the investigators to be an appropriate control. Although this study has been widely quoted as supporting the concept that D-penicillamine is not an effective disease-modifying therapy for SSc, this conclusion rests on the validity of the assumption that the low dose of D-penicillamine is not at a therapeutic or biologically active level. Recent research, however, demonstrating an extremely important role of extracellular matrix stiffness in the fibrotic process calls the validity of this assumption into question. Remarkable results from several recent studies (13–16) have shown that increased matrix stiffness resulting from lysyl oxidase 2–mediated cross-linking of collagen molecules and from interactions with specific integrins can result in a potent activation of profibrotic pathways and in increased production of collagen and other extracellular matrix proteins with worsening of the fibrotic process. These studies clearly indicate that disruption or blockage of collagen cross-linking would be expected to exert a potent antifibrotic effect. D-penicillamine is one of the most potent inhibitors of collagen cross-linking. These effects can last for prolonged periods of time depending on the turnover rates of the affected collagen molecules. Therefore, even very low doses of the drug may have quite potent antifibrotic effects mediated by the modification of extracellular matrix stiffness. Based on this new information, a more appropriate conclusion is that the high-dose versus low-dose D-penicillamine study was not able to differentiate between the two interventions or determine whether either intervention was effective or ineffective. As such, rejection of D-penicillamine as a potential disease-modifying agent for SSc may have been premature.

Given the absence of a universally accepted disease-modifying treatment for skin manifestations, ethical concerns associated with designing a placebo-controlled study are diminished. Furthermore, placebo-controlled trials are essential to establishing the absolute benefit of a new treatment over the natural course of the disease. Recruitment into placebo-controlled trials, however, can be difficult, especially in a disease like SSc, in which disease progression has serious and irreversible consequences. Physicians are eager to offer SSc patients a disease-modifying therapy, even one that is not proven or universally accepted, if there is a reasonable expectation of benefit. In addition, owing to anxiety about the future progression of their disease, patients are often unwilling to accept randomization to placebo, and if they do consent they may be quick to leave the study in the absence of perceived benefit. Physician and patient reluctance to participate, combined with the rarity of SSc, can compromise the ability to complete an adequately powered blinded parallel-group placebo-controlled trial within a reasonable timeframe.

To address this issue, use of historical controls as a comparator rather than placebo controls has been employed in some studies to describe the natural course of the disease. Generally, this approach will not produce comparable treatment groups, and hence bias is not adequately controlled, and subsequent findings are inconclusive. While use of historical controls cannot be recommended in most situations, valid alternative placebo-controlled designs that minimize exposure to placebo and/or provide a potentially active treatment to all subjects are worth consideration at the protocol development stage.

Since no single design will be satisfactory in all situations, several alternative placebo-controlled design options are discussed below. Some are standard and well known, others are progressive designs developed for trials in diseases other than SSc, but all should be considered for their potential to offer a way forward for clinical research on treatments for patients with SSc. Among several alternative designs outlined in the 2001 FDA guidance on choice of controls, add-on, early escape, and randomized withdrawal designs are some standard options that might be considered for SSc trials (17). Each option has advantages and disadvantages.

Add-on design

For the add-on design, all subjects are randomized to either the study treatment group or placebo group in combination with a background therapy on which all subjects are maintained. The background therapy would typically be a standard of care. If no proven or uniformly accepted standard of care therapy is available, a control therapy with reasonable expectation of benefit (e.g., MTX for cutaneous disease) might be appropriate given certain limitations (noted below). The advantage of this design is that all patients will receive some form of treatment, which might make the trial more acceptable to physicians and patients and easier to recruit, but there are several considerations. First, the effect of the study drug alone cannot be assessed. If the combination of background-plus-study treatment proves to be better than background-plus-placebo, no definitive conclusions can be drawn about how the study treatment would fare as a monotherapy.

Second, if the background therapy is effective, then it might be reasonable to assume that the difference between the background-plus-study treatment combination and background-plus-placebo combination would be smaller than the difference between study drug and placebo in the absence of background therapy. Under this assumption, the add-on study would need to be larger than the parallel-group placebo-controlled trial to achieve the same level of power. Third, if the effectiveness of the control is not proven (e.g., MTX for cutaneous disease), and a clinically significant difference between groups is not detected, the study will be inconclusive, i.e., both drugs might or might not be effective.
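
The sample-size consequence of a smaller expected difference in an add-on study can be illustrated with the standard normal-approximation formula for comparing two means. This is a sketch under stated assumptions: equal allocation, a two-sided test, and hypothetical values for the standard deviation (8 MRSS units) and for the expected differences (5 units without background therapy, 3 units on top of an effective background therapy). None of these numbers comes from the trials discussed here, and the function name n_per_arm is introduced only for this example.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects per arm needed to detect a mean difference `delta`.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2, rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Hypothetical scenarios: same variability, smaller expected difference in the add-on study.
n_parallel = n_per_arm(delta=5.0, sd=8.0)  # study drug versus placebo alone
n_add_on = n_per_arm(delta=3.0, sd=8.0)    # background-plus-drug versus background-plus-placebo
print(n_parallel, n_add_on)
```

Because the required sample size scales with the inverse square of the expected difference, even a moderate reduction in that difference inflates the add-on study considerably, which matters in a disease as rare as SSc.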

Finally, this design is likely to be most useful in cases where the background and study therapies have different mechanisms of action, but the protocol development team should also be cognizant of potential adverse interactions between the therapies.

Early escape design

In the early escape design, randomized treatment (placebo or study treatment) is discontinued if the disease worsens or does not improve by a prespecified point in time. The advantage of this design is that subject exposure to an ineffective treatment is minimized. Responder status or time-to-failure end points are the most appropriate for this design. Since the consequences of SSc progression can be severe and irreversible, close monitoring for disease progression and implementation of plans for rescue therapy are necessary in every study on ethical grounds. However, it is important to recognize the limitations that institution of rescue therapy could pose on analysis of quantitative end points (see discussion below on patient selection challenges and risk of progression).

Randomized withdrawal

In the randomized withdrawal trial, all subjects receive the study drug during an open-label phase of the study. Subjects who respond during the open-label phase are randomized to either stay on the study treatment or withdraw. For example, SSc patients could be treated with open-label study treatment for a predetermined period such as 9 months. After this period, subjects with a stable or improved MRSS could be randomized to either stay on the study treatment or withdraw. After another period of observation, e.g., 6–9 months, if the subjects who remained on study treatment had significantly better improvement in the MRSS than those who were withdrawn, the study treatment would be considered effective. One important consideration in this design is that if the effect of the study treatment is expected to continue on after withdrawal, then the observation period will have to be long enough to witness deterioration of the response in order to observe differences between study arms. Another consideration is that the study treatment should be tapered slowly to minimize the risk of disease exacerbation associated with rapid withdrawal. A limitation of this design is that only the subset of the population that had a favorable response is included in the randomized portion of the study, so estimated treatment effects cannot be generalized to the population as a whole.

Over the last decade, some innovative designs have been developed to improve efficiency, enhance recruitment, and address the ethical issue of equipoise when prior studies suggest a reasonable expectation of benefit. Some of these designs are the three-stage design (18, 19), response conditional cross-over (20), placebo-phase design (21), and various response-adaptive designs (22). These designs can allow smaller numbers of subjects to be required for the study, allow a reduction in the number of subjects in the placebo arm, and they have been successfully used in studies of rare diseases. However, the relatively slow development of skin induration changes in SSc necessary to cause a meaningful modification of the MRSS as an outcome measure, the potential carry-over effect of previous interventions, and the prolonged duration of disease-modifying interventions for SSc may limit the potential application of these designs for SSc clinical trials.

Randomized Active Controlled Trials

In situations where an effective therapy is available, a placebo-controlled trial may no longer be ethical. In SSc, some of the target organ manifestations such as renal crisis and pulmonary hypertension have highly effective therapy options. Marginal efficacy of cyclophosphamide has also been demonstrated in SSc-associated ILD. If available, an active control should be considered for RCTs when failure to treat could result in life-threatening consequences or significant morbidity. For example, the currently ongoing autologous stem cell transplantation SCOT (Scleroderma: Cyclophosphamide or Transplantation) trial and its mirror European study, ASTIS (Autologous Stem Cell Transplantation International Scleroderma), use cyclophosphamide as an active control. Results of these trials are not known at present. It should be pointed out that in these two trials, the aggressive treatment scheme, the inclusion of patients with SSc-related alveolitis, and the requirement of “life-threatening disease” in their inclusion criteria clearly justify the use of a potent although potentially harmful active control. However, the well-documented lack of durable effects of cyclophosphamide on SSc lung involvement and the substantial frequency of severe side effects suggest great caution in the use of this drug as a comparator active control. Obviously, standard therapy for other manifestations of the disease should be allowed in both groups.

Patient Selection Challenges and Risk of Progression

To mitigate effects of the variable clinical course of SSc, and owing to the paradigm that early interventions are crucial for an effective SSc therapy, most of the recent trials have been focused on subpopulations of patients with recent-onset disease. Recent-onset disease has been defined in different ways. A cut-off point of 18–24 months from the appearance of the first non–Raynaud's phenomenon SSc involvement has been utilized in some studies and seems reasonable. However, to provide greater homogeneity regarding the skin involvement, it may be appropriate to define the onset of the disease as the appearance of clinically detectable skin induration to avoid problems that may be introduced, for example, by the presence of gastrointestinal symptoms such as gastroesophageal reflux. Furthermore, if skin thickening is a major outcome of the study, patients should have a substantial degree of skin involvement at the time of recruitment. For example, a requirement for subjects to have diffuse cutaneous SSc with an MRSS >16 at enrollment will enable the study to demonstrate the effectiveness of the therapeutic intervention employing the currently available outcome tools.

Although the selection of recent-onset and moderate to severe skin involvement is useful from the clinical and mechanistic perspective, there is strong evidence to suggest that there is a substantial risk of rapid and even fatal disease progression in patients with rapidly progressive skin involvement, particularly related to worsening renal and pulmonary involvement (2–9). Given the substantial risk of potentially serious, irreversible, or life-threatening consequences associated with severe disease progression, aggressive monitoring is required to detect rapid progression of skin disease, and to recognize early signs of renal disease or alveolitis. Criteria for use of salvage therapies for patients with worsening lung and skin involvement should be prespecified in the protocol and are strongly recommended. However, choosing the drugs for salvage is difficult and represents another point of controversy. The recently published recommendations from the European League Against Rheumatism/EULAR Scleroderma Trials and Research group for management of SSc may assist investigators in selection of appropriate salvage therapies (23). One difficulty is that with the exception of the use of angiotensin-converting enzyme inhibitor for renal crisis, there are no available data conclusively demonstrating that the use of any drug can decrease SSc mortality. Pulmonary involvement has emerged as the most frequent cause of mortality in patients with SSc, currently accounting for approximately 60% of all SSc-related deaths. Since cyclophosphamide has been shown to stabilize recent-onset SSc-related alveolitis and ILD (12), it is suggested as the rescue therapy for patients who develop clinically significant alveolitis and ILD. For patients with rapidly progressing skin involvement indicative of a dangerous progression of disease, there is no clear choice of rescue therapy. 
Given the modest effects observed for MTX in two RCTs, and positive evidence from uncontrolled trials suggesting possible effectiveness of mycophenolate mofetil (24–29), either of these two agents may be an appropriate choice.

Although failure to provide appropriate salvage therapy in the event of severe disease progression would be unethical, implementation of this design feature poses significant challenges with respect to analysis of the data and interpretation of the study results. Whereas randomization helps to assure treatment groups are comparable at the start of the trial, salvage therapy, which will be implemented erratically and differentially across treatment arms, undermines comparability and weakens the ability of the trial to adequately assess the impact of study treatment on patient outcomes. If the “need for salvage therapy” is a component of the primary end point (e.g., response failure or time to failure), the impact of bias could be minimal, provided that criteria for implementing salvage therapies are prespecified, clearly and objectively defined, and implemented consistently.

Conversely, numeric end points, for example, the MRSS, may be highly sensitive to bias if the time point for the primary assessment is after some subjects have received salvage therapy. Under this scenario, the potential for bias should be recognized during the design phase with strong consideration of 1) how often salvage therapy is likely to be needed in the population under study; 2) what baseline characteristics would be useful predictors of subjects who might go on to salvage; and, importantly, 3) how the chosen salvage therapies will impact the primary end point of interest. Consider an example where the primary end point is the MRSS at a given time point, and mycophenolate mofetil is chosen as the salvage therapy for rapidly progressing skin disease. Assuming that both study treatment and mycophenolate mofetil are effective in lowering the skin score, if salvage with mycophenolate mofetil is used more often in the placebo arm than in the study treatment arm, then the observed difference between placebo and study treatment arms will be smaller than would have been observed in the absence of the salvage therapy. That is, the estimated effect will be biased against the study treatment and statistical significance will be more difficult to achieve. Typical data analysis approaches including exclusion of subjects who have received salvage therapy, ignoring that some subjects have received salvage, and carrying forward the last observation prior to salvage will all yield biased results. If baseline predictors of the need for salvage are available, an alternative might be to consider primary end point data for subjects receiving salvage as missing, and apply prespecified approaches for handling missing data (see discussion below). Importantly, since no statistical technique can overcome the impact of bias with absolute certainty, the study development team should discuss the likely severity of the bias due to salvage and risk posed to valid inference. If the risk is too high, alternative end points or designs should be considered.
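
The direction of the bias in the mycophenolate mofetil example can be shown with simple expected-value arithmetic. The sketch below is deliberately crude and every number in it is hypothetical: it assumes a mean MRSS change of 0 in untreated placebo subjects, a true treatment effect of -5 units, an additional -4 units in any subject who receives salvage, and salvage rates of 40% (placebo) and 10% (study treatment). It also ignores the fact that salvaged subjects are selected for worsening disease. The function name observed_difference is an assumption of this example.

```python
def observed_difference(
    true_effect: float,
    salvage_effect: float,
    p_salvage_placebo: float,
    p_salvage_treated: float,
) -> float:
    """Expected treated-minus-placebo difference in mean MRSS change
    when salvage therapy is used at different rates in the two arms.

    Placebo subjects are assumed to have a mean change of 0 before salvage.
    """
    placebo_mean = p_salvage_placebo * salvage_effect
    treated_mean = true_effect + p_salvage_treated * salvage_effect
    return treated_mean - placebo_mean


# Hypothetical inputs; negative values mean a lower (better) skin score.
true_effect = -5.0
diluted = observed_difference(
    true_effect=true_effect,
    salvage_effect=-4.0,
    p_salvage_placebo=0.40,
    p_salvage_treated=0.10,
)
print(f"true effect: {true_effect:.1f}, expected observed difference: {diluted:.1f}")
```

With these assumed inputs the expected observed difference shrinks from 5 units to about 3.8 units, so a trial powered for the undiluted effect would be underpowered for what it can actually observe.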

As noted above, criteria for implementing salvage therapies should be clearly and objectively defined, but these decisions are also challenging. For cutaneous disease, it is suggested that a change exceeding the minimum important difference (MID) as defined by Khanna et al (30) and based on several prior SSc clinical trials should be utilized. Khanna et al estimated a 3.2–5.3 MID for MRSS improvement (e.g., ∼15–25% for an MRSS of 21); therefore, we suggest that a worsening of >6 on the MRSS (e.g., ∼30% worsening in skin involvement for an MRSS of 21) should trigger the institution of the salvage therapy arm. Similarly, for evaluation of pulmonary worsening and based on data from previous clinical trials, a 15% reduction in total lung capacity or forced vital capacity accompanied by the demonstration of new onset of alveolitis or fibrotic lesions in a high-resolution computed axial tomography scan should indicate the institution of salvage therapy with cyclophosphamide. Based on the above considerations, an RCT design (placebo versus study drug) with salvage therapy for patients with worsening SSc pulmonary and skin involvement is suggested (Figure 1).
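
The arithmetic behind these thresholds, and the suggested triggers themselves, can be written out as a short sketch. The function names are hypothetical and introduced only for illustration. The lung criterion is interpreted here as a relative 15% reduction from baseline in percent-predicted total lung capacity or forced vital capacity; the text does not state whether a relative or an absolute reduction is intended, so that reading is an assumption.

```python
def percent_of_baseline(change_units: float, baseline_mrss: float) -> float:
    """Express an MRSS change as a percentage of the baseline score."""
    return 100.0 * change_units / baseline_mrss


def skin_salvage_triggered(baseline_mrss: float, current_mrss: float, threshold: float = 6.0) -> bool:
    """True if the MRSS has worsened by more than `threshold` units."""
    return (current_mrss - baseline_mrss) > threshold


def lung_salvage_triggered(
    baseline_value: float,
    current_value: float,
    new_hrct_findings: bool,
    relative_drop: float = 0.15,
) -> bool:
    """True if TLC or FVC fell by at least `relative_drop` of baseline
    (assumed to be a relative reduction) AND new alveolitis or fibrotic
    lesions are seen on high-resolution CT."""
    drop = (baseline_value - current_value) / baseline_value
    return drop >= relative_drop and new_hrct_findings


# Worked numbers from the text, for a baseline MRSS of 21.
print(round(percent_of_baseline(3.2, 21)))  # lower bound of the MID
print(round(percent_of_baseline(5.3, 21)))  # upper bound of the MID
print(round(percent_of_baseline(6.0, 21)))  # suggested salvage threshold
```

Note that a worsening of exactly 6 units does not meet the skin criterion as written, since the suggested threshold is a worsening of more than 6.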

Figure 1.

Example of a systemic sclerosis disease modification randomized controlled trial (RCT) design. Note that the trial includes salvage therapy arms for worsening skin involvement and deterioration of lung function caused by new-onset alveolitis or progression of lung fibrosis. The study also includes a post-study followup (f/u) phase. TLC = total lung capacity; mRSS = modified Rodnan skin thickness score; tx = treatment; PFT = pulmonary function test; HRCT = high-resolution computed tomography; SHAQ = Scleroderma Health Assessment Questionnaire.

Subjects meeting a criterion for salvage therapy should remain in the study and continue with all planned study procedures and assessments. Furthermore, subjects and study personnel should generally remain blinded to the study treatment assignment after salvage therapy is instituted. Per the ICH E9 guidelines, breaking the blind for an individual subject should only be considered when management of patient care would vary depending on the treatment assignment. In contrast to these guidelines, however, an FDA guidance on safety reporting requirements that went into effect in March 2011 outlines different reporting requirements for study treatment and comparator arms, which results in unblinding the treatment assignment for any subject experiencing an unexpected serious adverse event. According to the new guidelines, suspected unexpected serious adverse reactions (SUSARs), which by definition can only occur in the study treatment arm, are reported in an expedited manner, but suspected unexpected adverse experiences occurring in the comparator arm are not to be reported on the expedited timeline. Since expedited reports for SUSARs are also sent to institutional review boards, site personnel will be unblinded. In a study with few unexpected serious adverse events, the impact of unblinding the treatment assignment on a few subjects may be minimal. However, the scientific validity of the study could be jeopardized if the fraction of unblinded treatment assignments becomes too large. Study teams should monitor this and consider consulting with the FDA on alternate reporting strategies should a concern arise.

Duration of the Trial

Owing to the slow turnover of collagen in the tissues, the trial requires a long observation period to detect clinically evident changes. For example, the results of a recent study employing the tyrosine kinase inhibitor imatinib mesylate showed that improvement in the MRSS was not detectable at 3 months and that a statistically significant improvement was observed at 6 months of treatment, which continued until the termination of the study at 12 months (10). The same pattern was observed in a recent retrospective trial with mycophenolate mofetil, which showed statistically significant differences compared with a historical cohort only after 6 months (29). Furthermore, our own analysis of a cohort of recent-onset rapidly progressive SSc patients treated with mycophenolate mofetil showed worsening MRSS during the first 6 months of treatment, which was followed by statistically significant improvement in the MRSS in the subsequent 6 months of the trial (31). Therefore, based on the experience from these and other studies, a 12-month intervention phase would be required as the minimal duration for a meaningful trial. Followup at the end of the intervention is also advisable, either as a part of the trial itself or as a continued extension.

End Points

A comprehensive review of potential end points for the study of patients with SSc is beyond the scope of this discussion, but would include a wide variety of time-to-event outcomes, responder definitions, quantitative measures associated with individual manifestations of the disease, quality of life measures, patient-reported outcomes, and biomarkers. One end point used almost universally in SSc clinical trials is the MRSS, the only validated clinical measure of changes in the extent and severity of skin involvement in SSc (32–34). The MRSS is an improvement over the original description of the skin score (32), in terms of practical application and reduction of interobserver variability and standard deviation. In a study of intra- and interrater variation in the assessment of the MRSS in a diverse population of SSc patients with both diffuse and limited cutaneous disease of variable severity, the coefficient of interrater variation (i.e., a measure of agreement between raters) was estimated at 25%. In contrast, the intrarater coefficient of variation (i.e., within-rater agreement) was 12%. These two findings taken together suggest that repeated MRSS assessments on an individual subject should be made by the same rater. If the MRSS is the primary end point for a longitudinal study, the logistics of retaining the same rater throughout the study should be considered by the protocol development team.

A significant challenge has been to identify end points that describe disease progression across multiple organ systems. An example is event-free survival, an end point of key interest in both the SCOT and ASTIS trials, where events are defined as death or significant organ failure. A second example, also from the SCOT study, is a new global composite rank score (GCRS) that reflects each subject's “order” relative to every other subject in the study based on the following hierarchy of component outcome variables: death, failure of event-free survival, change in forced vital capacity, change in Scleroderma Health Assessment Questionnaire score, and change in MRSS. Although it has intuitive appeal, the GCRS will have to be evaluated to determine whether scores represent an accurate assessment of disease progression or improvement.
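The idea of ordering each subject relative to every other subject along a hierarchy of outcomes can be sketched in a few lines. The following is a minimal illustration only and is not the SCOT algorithm: the field names, the sign conventions (a higher change in forced vital capacity is treated as better, while lower changes in SHAQ and MRSS are treated as better), and the scoring rule (number of subjects outperformed minus number who fared better) are all assumptions made for the example. A real scheme would presumably also define how large a difference must be at each level before it counts; no such thresholds are assumed here.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    # All fields are hypothetical names chosen for this illustration.
    died: bool
    efs_failure: bool    # failed event-free survival
    fvc_change: float    # change in forced vital capacity (higher = better)
    shaq_change: float   # change in SHAQ score (lower = better)
    mrss_change: float   # change in MRSS (lower = better)


def hierarchy_key(s):
    """Tuple ordered by the outcome hierarchy; a larger tuple means a better outcome."""
    return (not s.died, not s.efs_failure,
            s.fvc_change, -s.shaq_change, -s.mrss_change)


def compare(a, b):
    """+1 if a fared better than b, -1 if worse, 0 if tied at every level."""
    ka, kb = hierarchy_key(a), hierarchy_key(b)
    return (ka > kb) - (ka < kb)


def composite_rank_scores(subjects):
    """For each subject, sum the pairwise comparisons against all other subjects."""
    return [
        sum(compare(s, t) for j, t in enumerate(subjects) if j != i)
        for i, s in enumerate(subjects)
    ]


# Four made-up subjects: one death, one organ failure, two event-free.
subjects = [
    Subject(True, True, 0.0, 0.0, 0.0),
    Subject(False, True, -8.0, 0.5, 4.0),
    Subject(False, False, 5.0, -0.3, -6.0),
    Subject(False, False, -5.0, 0.2, 2.0),
]
scores = composite_rank_scores(subjects)
```

Because the comparison stops at the first level on which two subjects differ, lower levels of the hierarchy contribute only when the higher levels are tied, which is one reason the behavior of such a score needs empirical evaluation.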

The development of new outcome measures is a priority in SSc clinical research. With more precise, sensitive, and quantitative outcome measures, clinical trials will require fewer individuals to be enrolled for a shorter duration.

Threats to Study Validity

Dropouts and missing data

High dropout rates have been observed in past trials of disease-modifying therapy in patients with SSc. For example, 50% of subjects were lost in the D-penicillamine trial and 20% in the bovine collagen trial (2, 6). Missing data due to subject withdrawal can result in biased estimates of the effect of the study treatment on both efficacy and safety end points, and undermine the validity of study conclusions. A comprehensive review of methods for prevention and handling of missing data in clinical trials has been recently published by the National Research Council (35). A few highlights from these recommendations are repeated in the discussion below.

Importantly, statistical approaches to account for missing data in the analysis will always rely on unverifiable assumptions about the missing data mechanism. Hence, strategies designed to minimize missing data on key outcomes during the design and implementation phases of the study have more power to ensure study integrity than statistical techniques. The protocol design team should consider reasons why subjects might withdraw prematurely and consider potentially mitigating study design features. For example, SSc subjects may withdraw prematurely due to intolerance of the study treatment, lack of perceived treatment effect, or inconvenience attributed to long followup times or too many visits. To address withdrawal due to tolerability issues, the study design might include flexible dosing or include a run-in period to ensure tolerability prior to randomization. Options to minimize dropouts due to lack of perceived treatment effect include implementation of salvage therapies or considering a design that minimizes exposure to ineffective treatment or offers active treatment to most or all subjects. The problem of long followup times is challenging in SSc trials given the long time required to observe changes in fibrosis, but care should be taken to keep the number of study visits to a minimum and avoid unnecessary data collection at each visit. While the study is actively following subjects, site personnel should be trained to understand the importance of working to keep subjects in the study and to collect and report data. Subjects who are withdrawn from treatment should still be encouraged to continue to return for study visits and data collection. Sites should be adequately compensated for these efforts as an incentive to maintain vigilance in these efforts. Finally, data managers should monitor and query sites for missing data on a routine basis.

Despite best efforts, some degree of missing data will be present in nearly every trial. A prespecified analysis plan to evaluate the study objectives should include information on all randomized subjects. This is consistent with the principles of an intent-to-treat analysis and requires that missing data for randomized subjects who withdrew prior to assessment of the end point of interest be accounted for in the analysis. This approach is in contrast to many published reports on SSc trials where analysis of data from completers is presented as the primary analysis. Analysis of only the completers is unbiased only in the rare situation where subject data are missing completely at random, which means missingness is attributable to reasons completely unrelated to any study variable. For example, data may be missing due to equipment failure or withdrawal of the subject due to an out-of-town transfer away from the study clinical facilities. Furthermore, analysis of only the completers also wastes valuable information that is available on withdrawn subjects.

Many statistical analysis tools are currently available that reduce bias associated with missing data and make use of available information on withdrawn subjects. These include single imputation methods, maximum likelihood, multiple imputation, Bayesian methods, and methods based on generalized estimating equations. The choice of method depends on assumptions regarding the missing data mechanism and on the assumptions underlying the methodology. Single imputation methods such as last observation carried forward (LOCF) are commonly applied in SSc trials but rely on the often unrealistic assumption that the observation at the time of withdrawal would not change if measured later. In the case where subjects' outcomes are expected to worsen with time, the LOCF approach can underestimate the treatment effect if a disproportionate number of subjects on the placebo arm withdraw. Reasonable assumptions about missing data should be discussed with the study team prior to choosing the statistical approach. In addition, the team should consider baseline assessments and historical information that might be predictive of future withdrawal, which could be incorporated into the analytical plan. In particular, the skin thickness progression rate, which has been shown to predict mortality and early internal organ involvement, should be assessed at baseline for studies in diffuse cutaneous SSc (36). Finally, sensitivity analyses are also important to assess the degree to which the estimated treatment effects rely on the assumptions.
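The way LOCF can understate a treatment effect when outcomes worsen over time can be seen in a small deterministic example. The sketch below uses entirely hypothetical numbers, not data from any trial: every placebo subject worsens by 1 MRSS point per month from a baseline of 25, every treated subject remains stable, and half of the placebo subjects withdraw after the month 3 visit.

```python
MONTHS = [0, 3, 6, 9, 12]  # hypothetical visit schedule


def trajectory(baseline, slope_per_month, withdraw_after=None):
    """MRSS at each visit; None marks visits missed after withdrawal."""
    return [
        baseline + slope_per_month * m
        if withdraw_after is None or m <= withdraw_after else None
        for m in MONTHS
    ]


def locf(series):
    """Replace each missing value with the last observed value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def mean_final_score(arm):
    """Mean month-12 MRSS in an arm after LOCF imputation."""
    finals = [locf(series)[-1] for series in arm]
    return sum(finals) / len(finals)


# Placebo arm as it would look with no withdrawals (MRSS 37 at month 12).
placebo_complete = [trajectory(25, 1.0) for _ in range(4)]
# Same arm when half withdraw after month 3 (LOCF carries forward MRSS 28).
placebo_observed = (
    [trajectory(25, 1.0) for _ in range(2)]
    + [trajectory(25, 1.0, withdraw_after=3) for _ in range(2)]
)
treated = [trajectory(25, 0.0) for _ in range(4)]

true_effect = mean_final_score(placebo_complete) - mean_final_score(treated)
locf_effect = mean_final_score(placebo_observed) - mean_final_score(treated)
print(true_effect, locf_effect)  # 12.0 7.5
```

In this constructed case the carried-forward values make the placebo arm look healthier than it would have been, so the estimated between-group difference shrinks from 12 points to 7.5 points.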

Inadequate power

Another consequence of the premature withdrawal of subjects from a trial is that the subjects remaining are likely to be a subset of subjects with more stable disease or with greater improvement in disease who have not experienced intolerable toxicities. This subset no longer represents the target population. In addition, the outcome assessments in the placebo and study treatment arms are likely to be more similar than would have been expected if subjects had not withdrawn. As a consequence, statistical power for the test of the treatment effect is reduced.

A secondary consequence is in the design of future studies. In building a rationale for sample size estimates for a new trial, statisticians turn to published trials to extract estimates of responder rates and/or mean outcomes for the comparator arm along with estimates of the standard deviation. Extreme outcome values are likely to be under-represented among completers, which will bias estimated means and tend to reduce standard deviations. If the standard deviation estimate is too small, the resultant sample size estimate will also be too small. Care must be taken when using estimates from a trial with a high dropout rate in order to ensure that the assumptions used in the design of a new trial are appropriate.
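The sensitivity of the sample size to the standard deviation estimate can be made concrete with the standard normal-approximation formula for comparing two means, n per arm = 2[(z(1 − α/2) + z(power)) × SD / Δ]². The sketch below applies that textbook formula; the difference of 5 MRSS points and the standard deviations of 8 and 6 are hypothetical values chosen for illustration, not estimates from any published trial.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Subjects per arm for a two-sided, two-sample comparison of means
    (normal approximation, equal allocation, common standard deviation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Hypothetical planning values: detect a 5-point difference in MRSS change.
n_with_realistic_sd = n_per_arm(delta=5.0, sd=8.0)    # 41 per arm
n_with_shrunken_sd = n_per_arm(delta=5.0, sd=6.0)     # 23 per arm
print(n_with_realistic_sd, n_with_shrunken_sd)
```

Because the required sample size scales with the square of the standard deviation, an estimate that is too small by one quarter (6 instead of 8 in this example) cuts the planned enrollment by nearly half.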

In addition to ensuring that estimates derived from prior studies are appropriate, several other factors can impact the validity of sample size/power estimates and can, if ignored, undermine the success of the trial. First, the statistical test on which the sample size/power estimate is based must be consistent with the objectives and design of the study as well as the analysis planned for the primary end point upon completion of the trial. The sample size and power needs for key secondary end points should also be considered. If the analysis methods underlying the sample size/power calculations are inconsistent with the methods planned for analysis, the study may be under-powered to detect the treatment effect of interest or over-powered and larger than necessary.

Second, the goal in selecting the sample size is to ensure the study is adequately powered to detect a clinically meaningful effect. A clinically meaningful effect should not be confused with a statistically significant effect; a treatment group difference of any magnitude, no matter how small, can be statistically significant if the study is large enough. The objective is to design an efficient study that has a reasonable probability (i.e., power) of detecting a clinically meaningful difference between treatment groups should one in fact exist. Since there is not always consensus on the magnitude of the clinically meaningful effect, the protocol development team should consider this question during the design stage, consulting the literature and expert opinion as needed. This question has been considered by investigators in the field for studies where the MRSS is of interest. A recent Delphi exercise showed that a reduction of at least 35% from the baseline is necessary to consider the effect clinically relevant and statistically significant (37). This estimate is more generous than the MID calculation discussed above, but it may represent a more desirable goal in patients with severe skin disease.
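The distinction between statistical significance and clinical meaning can be illustrated by holding a small between-group difference fixed and letting the sample size grow. The sketch below uses a simple two-sided z-test with a known common standard deviation; the 0.5-point difference and the standard deviation of 8 are hypothetical values, chosen so that the difference would be unlikely to be considered clinically meaningful.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(diff, sd, n_per_arm):
    """Two-sided p-value of a z-test for a difference in means between two
    equal-sized arms, assuming a known common standard deviation."""
    standard_error = sd * sqrt(2.0 / n_per_arm)
    z = abs(diff) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Same hypothetical 0.5-point difference, two very different study sizes.
p_small_study = two_sided_p(diff=0.5, sd=8.0, n_per_arm=50)
p_large_study = two_sided_p(diff=0.5, sd=8.0, n_per_arm=5000)
print(round(p_small_study, 3), round(p_large_study, 4))
```

With 50 subjects per arm the difference is far from significant at the 0.05 level, whereas with 5,000 per arm the same difference is significant, although its clinical relevance has not changed.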

Third, the adequately powered study must be feasible. Too often, site investigators will overestimate how many subjects can be randomized within a reasonable time frame, and a large study will ultimately fail due to insufficient recruitment. Another ill-advised practice is choosing a feasible sample size first, then selecting a clinically significant effect large enough to produce the desired sample size. The danger in this practice is that if the treatment effect for a new drug is smaller than that used for planning but still worthy of clinical consideration, the study may have insufficient power to detect the effect. Although compromise is often needed owing to limited resources, a better approach is to consider alternate design and end point options before making a final decision on how best to meet the study objectives.

Translational Research Considerations

Given the pressing need for the development of new disease-modifying drugs for SSc, every effort should be made to address the potential mechanism of action of the proposed intervention. Skin biopsy samples can provide invaluable information with a low risk for the patient. Since many of the intracellular pathways involved in the expression of the fibrotic phenotype are not well defined in the context of the natural course of the disease, it is suggested that, depending on the intervention proposed, patients in both placebo and treatment groups should undergo skin biopsies. Analysis of these skin biopsy samples will allow the development and validation of more sensitive outcomes and the testing of potential therapeutic interventions in a shorter time and with a smaller number of subjects. The identification of serum biomarkers of disease activity should also become a high priority. Cytokine and autoantibody profiles from SSc patient samples are valuable sources for biomarker candidate selection and subsequent testing in SSc and healthy populations.

Although the MRSS has been considered a validated outcome for SSc disease-modifying trials, it is not quantitative and is prone to subjective variability. Therefore, substantial efforts should be devoted to the development and validation of more sensitive and quantitative tools such as quantification of extracellular matrix (ECM) protein expression on skin biopsy samples before and after treatment (38) or novel approaches based on newer molecular technologies such as global gene expression analyses (39–41), proteomics (42), or kinomics (43). Proteomics, microarrays, and kinomics are extremely powerful tools that have provided valuable mechanistic information, particularly in oncologic disorders; however, they are still underused in patients with SSc. Their inclusion in any SSc mechanistic studies will provide invaluable data to direct and guide future research in the field.


All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published.


The expert assistance of Melissa Bateman in the preparation of the manuscript is gratefully acknowledged.