Abatacept for lupus nephritis: Alternative definitions of complete response support conflicting conclusions

Authors

  • David Wofsy,

    Corresponding author
    1. University of California, San Francisco
    • Arthritis/Immunology Unit (111R), San Francisco VA Medical Center, 4150 Clement Street, San Francisco, CA 94121
    • Dr. Wofsy has received consulting fees from Bristol-Myers Squibb, Genentech, and Vifor Pharma (less than $10,000 each).

    • Drs. Wofsy and Diamond were not compensated for this investigator-initiated analysis.

  • Jan L. Hillson,

    1. Bristol-Myers Squibb, Princeton, New Jersey
    • Dr. Hillson owns stock or stock options in Bristol-Myers Squibb.

  • Betty Diamond

    1. Feinstein Institute for Medical Research, Manhasset, New York

    • Dr. Diamond has received consulting fees and/or honoraria from Genentech, Pfizer, Merck-Serono, and Merck (less than $10,000 each).


Abstract

Objective

Recent clinical trials in lupus nephritis have all used different criteria to assess complete response. The objective of this analysis was to compare several previously proposed criteria, using the same data set from a large trial of abatacept in lupus nephritis (IM101075). In so doing, we sought to determine which criteria are most sensitive to differences among treatment groups and to further examine the potential of abatacept in lupus nephritis.

Methods

Patients in the IM101075 trial received abatacept at 1 of 2 different dose regimens or placebo, in each case on a background of mycophenolate mofetil and corticosteroids. Using data from this trial, we assessed rates of complete response at 12 months according to 5 sets of criteria, from 1) the trial protocol, 2) the Aspreva Lupus Management Study (ALMS) trial of mycophenolate mofetil, 3) the Lupus Nephritis Assessment with Rituximab (LUNAR) trial of rituximab, 4) an ongoing National Institutes of Health trial of abatacept (Abatacept and Cyclophosphamide Combination: Efficacy and Safety Study [ACCESS]), and 5) published recommendations of the American College of Rheumatology.

Results

According to the complete response definition from the IM101075 study protocol, there was no difference among treatment groups in the IM101075 study. In contrast, according to the ALMS, LUNAR, and ACCESS criteria, rates of complete response among patients in the IM101075 study were higher in both treatment groups relative to control. The largest differences were obtained with use of the LUNAR criteria (complete response rate of 6% in the control group, compared to 22% and 24% in the 2 abatacept groups).

Conclusion

The choice of definition of complete response can determine whether a lupus nephritis trial is interpreted as a success or a failure. The results of this analysis provide an evidence-based rationale for choosing among alternative definitions and offer a strong rationale for conducting further studies of abatacept in lupus nephritis.

Bristol-Myers Squibb (BMS) recently completed a large, multicenter, randomized, double-blind, placebo-controlled phase II/III study of abatacept in patients with active lupus nephritis (IM101075). In this trial the primary outcome measure was not achieved (1), but the trial has provided a rich data set that can be used to address important questions about trial design and about abatacept. This report addresses two of those questions: 1) What primary outcome measure is best suited to lupus nephritis trials; and 2) Does abatacept have promise as a treatment for lupus nephritis? It also poses a third important question: What would success or failure of abatacept treatment teach us about mechanisms of pathogenesis of lupus nephritis?

At present, there is no consensus on how to define the primary outcome measure for a lupus nephritis trial. The landmark National Institutes of Health (NIH) trial of pulse cyclophosphamide used progression to end-stage renal disease as its primary outcome measure (2). The Euro-Lupus group also used treatment failure as the primary outcome measure in its trials, although its definition of failure was broader than the NIH definition (3, 4). More recently, trials of mycophenolate mofetil, rituximab, and abatacept have focused on success rates rather than failure rates, but success has been defined differently in each trial (1, 5, 6). This variation among trial designs makes it difficult to compare trials, and leaves us to wonder which outcome measure is most sensitive to change, which most closely reflects clinical status, and which correlates best with long-term outcome.

PATIENTS AND METHODS

Study design.

IM101075 was a 12-month, multicenter, randomized, double-blind, placebo-controlled phase II/III trial of abatacept versus placebo on a background of standard-of-care treatment with mycophenolate mofetil and corticosteroids in patients with active lupus nephritis. Women and men at least 18 years of age were eligible for the trial. All patients met at least 4 of the 11 components of the American College of Rheumatology (ACR) classification criteria for systemic lupus erythematosus (SLE), either sequentially or concomitantly (7). It was not required that all 4 criteria be present at the time of study entry.

Renal biopsy had been performed within 12 months prior to screening in all patients, confirming class III or IV active proliferative lupus glomerulonephritis. Patients were excluded if the biopsy result was classified as class III(C), IV-S(C), or IV-G(C) according to the International Society of Nephrology/Renal Pathology Society 2003 classification criteria (8) or class IIIc or IVd according to the World Health Organization 1982 classification criteria (9). In cases in which the biopsy had been performed >3 months prior to screening, the C3 or C4 level had to be below the lower limit of normal, and/or the anti–double-stranded DNA level above the upper limit of normal, during the current flare. In addition, all patients were required to have evidence of active disease at screening, defined as a urine protein:creatinine ratio of ≥50 mg/mmole and an active urinary sediment (>5 red blood cells/high-power field and/or >8 white blood cells/high-power field without infection, and/or cylindruria, confirmed during the current flare). Patients with serum creatinine levels >3 mg/dl were excluded. Prohibited background medications included azathioprine, leflunomide, methotrexate, and cyclosporine. Patients could not have received cyclophosphamide within 3 months of randomization, gamma globulin within 4 months, rituximab within 6 months, plasmapheresis within 1 year, or abatacept at any time prior to study entry.

Three hundred patients were randomized 1:1:1 among 3 groups; 2 patients did not receive study medication. One group received placebo infusions on days 1, 15, and 29, and every 28 days thereafter for 12 months. A second group received abatacept infusions at a fixed, weight-tiered dose approximating 10 mg/kg according to the same schedule. The third group received a higher dose of abatacept for the first 5 infusions (30 mg/kg), followed by a fixed, weight-tiered dose of ∼10 mg/kg every 28 days. All patients also received mycophenolate mofetil and corticosteroids throughout the trial. The target dose of mycophenolate mofetil was 2 gm/day for Caucasian and Asian American patients and 3 gm/day for patients of African descent, with a goal of achieving the target dose by day 57. Oral corticosteroids were initiated at a dosage equivalent to 30–60 mg/day of prednisone or prednisolone, with dosage based on treatment received prior to randomization. A tapering regimen that was designed to achieve a dosage of 10 mg/day by week 12 of treatment was recommended, but adherence to this regimen was not required if the site investigator deemed that it was not in the patient's best interest. Patients who were taking antiproteinuria agents (e.g., angiotensin II receptor blockers, angiotensin-converting enzyme inhibitors, or any combination of these drugs) continued these medications at a stable dosage, but initiation of these therapies during the trial was prohibited. Similarly, antimalarial medications and nonsteroidal antiinflammatory drugs were allowed, provided the dosage was stable.

Comparison of outcome measures.

Four recent lupus nephritis trials have each used different definitions for complete response. None of these definitions matches the published recommendations of the ACR (10). The distinguishing features among these approaches are summarized in Table 1, which compares the per-protocol complete response definition from IM101075 (clinicaltrials.gov identifier NCT00430677), the ACR recommendations, and the complete response definitions used in the Lupus Nephritis Assessment with Rituximab (LUNAR) trial of rituximab (NCT00282347) (6), the Aspreva Lupus Management Study (ALMS) trial of mycophenolate mofetil (clinicaltrials.gov identifier NCT00377637) (5), and an ongoing NIH-sponsored trial of abatacept in combination with cyclophosphamide (Abatacept and Cyclophosphamide Combination: Efficacy and Safety Study [ACCESS]) (clinicaltrials.gov identifier NCT00774852).

Table 1. Definitions of complete response in lupus nephritis trials: distinguishing features*

| Source of criteria | Urine protein:creatinine ratio, gm/gm | Creatinine or estimated glomerular filtration rate | Urinalysis, cells or casts | Steroid taper required | Criteria must be met on 2 successive visits |
| --- | --- | --- | --- | --- | --- |
| BMS trial | ≤0.26 | Within 10% of screening or baseline value | Normal | No | Yes |
| ACR recommendations | ≤0.20 | Within 25% of screening or baseline value | Normal | Not addressed | No |
| LUNAR trial | ≤0.50 | Within 15% of screening or baseline value | Normal | Yes | No |
| ALMS trial | ≤0.50 | Normal | Normal | Yes | No |
| ACCESS trial | ≤0.50 | Normal or within 25% of baseline value | Not a component | Yes | No |

* The Bristol-Myers Squibb (BMS) trial allowed enrollment of patients with urine protein:creatinine ratios of ≥0.44 gm/gm. The Lupus Nephritis Assessment with Rituximab (LUNAR) trial and the Abatacept and Cyclophosphamide Combination: Efficacy and Safety Study (ACCESS) trial restricted enrollment to patients with urine protein:creatinine ratios of ≥1.0 gm/gm. The Aspreva Lupus Management Study (ALMS) trial restricted enrollment to patients with proteinuria of ≥1 gm/24 hours, abnormal serum creatinine levels, and/or abnormal urinalysis results. ACR = American College of Rheumatology.

In several respects, the criteria used by BMS in IM101075 set a higher bar for complete response than the other trials. The urine protein:creatinine ratio had to be ≤0.26 gm/gm (30 mg/mmole), the estimated glomerular filtration rate (GFR) had to remain within 10% of the screening/pre-flare value, and these criteria needed to be achieved on 2 consecutive visits. None of the other trials required that criteria be met at 2 visits. The ACR recommendations also set a high bar for the urine protein:creatinine ratio, whereas the other 3 trials were more lenient. Philosophies regarding renal function differed. The LUNAR trial required that the serum creatinine level be within 15% of the baseline level, whereas the ACR and ALMS criteria allowed greater deviation from baseline. The ACCESS trial criteria are the most liberal, requiring only that the serum creatinine level be within the normal range or, if abnormal, within 25% of baseline. The ACCESS trial is also most lenient in that urinalysis results are not included among the complete response criteria. Finally, while systematic tapering of steroids was recommended in the BMS protocol, it was not required. The ACR recommendations do not include any comment on steroid administration, and the LUNAR, ALMS, and ACCESS criteria all required that steroids be successfully tapered to ≤10 mg/day.
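The distinguishing features in Table 1 can be written down as a small rule set. The following is a minimal sketch, not the analysis code used in the study: the `Definition` and `Visit` types and all field names are hypothetical, renal function is collapsed into a single relative change from baseline (the BMS protocol used estimated GFR, the others serum creatinine), "within X%" is read as an absolute deviation of at most X%, and only the distinguishing features listed in Table 1 are modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Definition:
    """Distinguishing features of one complete-response definition (Table 1)."""
    name: str
    max_upcr: float                   # ceiling for urine protein:creatinine ratio, gm/gm
    renal_tolerance: Optional[float]  # allowed fractional deviation from baseline, if any
    normal_renal_suffices: bool       # True if a value in the normal range is enough
    needs_normal_urinalysis: bool
    needs_steroid_taper: bool         # prednisone <= 10 mg/day
    needs_two_visits: bool


# Values transcribed from Table 1.
DEFINITIONS = (
    Definition("BMS", 0.26, 0.10, False, True, False, True),
    Definition("ACR", 0.20, 0.25, False, True, False, False),
    Definition("LUNAR", 0.50, 0.15, False, True, True, False),
    Definition("ALMS", 0.50, None, True, True, True, False),
    Definition("ACCESS", 0.50, 0.25, True, False, True, False),
)


@dataclass(frozen=True)
class Visit:
    """Hypothetical per-visit record; field names are assumptions."""
    upcr: float              # gm/gm
    renal_rel_change: float  # (value - baseline) / baseline
    renal_normal: bool
    urinalysis_normal: bool
    prednisone_mg_day: float


def visit_meets(defn: Definition, v: Visit) -> bool:
    """Check a single visit against one definition."""
    if v.upcr > defn.max_upcr:
        return False
    within_tolerance = (
        defn.renal_tolerance is not None
        and abs(v.renal_rel_change) <= defn.renal_tolerance
    )
    if not ((defn.normal_renal_suffices and v.renal_normal) or within_tolerance):
        return False
    if defn.needs_normal_urinalysis and not v.urinalysis_normal:
        return False
    if defn.needs_steroid_taper and v.prednisone_mg_day > 10:
        return False
    return True


def complete_response(defn: Definition, last_two: Tuple[Visit, Visit]) -> bool:
    """Apply the definition to the final visit, or to both visits if required."""
    previous, final = last_two
    if defn.needs_two_visits:
        return visit_meets(defn, previous) and visit_meets(defn, final)
    return visit_meets(defn, final)


# Invented example patient: modest residual proteinuria, tapered steroids.
previous = Visit(0.45, 0.05, True, False, 10.0)
final = Visit(0.40, 0.12, True, True, 10.0)
results = {d.name: complete_response(d, (previous, final)) for d in DEFINITIONS}
print(results)
```

For this invented patient, the proteinuria ceilings alone separate the BMS and ACR definitions from the other three, which illustrates how the same record can be counted as a complete response under some definitions and not others.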

There were also differences in entry criteria among the 4 trials. The BMS trial allowed enrollment of patients with a urine protein:creatinine ratio as low as 0.44 gm/gm (50 mg/mmole), whereas the other trials required a urine protein:creatinine ratio of at least 1.0 gm/gm, or total proteinuria ≥1 gm/24 hours.
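Because the text quotes urine protein:creatinine ratios in both mg/mmole and gm/gm, a short conversion helps when comparing thresholds. This is a sketch that assumes only the molar mass of creatinine (about 113.12 gm/mole); the function name is illustrative. The values quoted in the text (50, 30, and 339 mg/mmole) come out close to 0.44, 0.26, and 3 gm/gm, respectively.

```python
# Molar mass of creatinine is about 113.12 g/mol, i.e. 0.11312 g per mmol.
CREATININE_G_PER_MMOL = 0.11312


def mg_per_mmol_to_g_per_g(ratio_mg_per_mmol: float) -> float:
    """Convert mg protein per mmol creatinine to g protein per g creatinine."""
    protein_g = ratio_mg_per_mmol / 1000.0  # mg -> g
    return protein_g / CREATININE_G_PER_MMOL


for threshold in (30, 50, 339):
    print(threshold, "mg/mmol is about", mg_per_mmol_to_g_per_g(threshold), "g/g")
```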

We followed 3 rules in applying the alternative definitions of complete response to the BMS data set: 1) for each set of trial criteria, we included only patients from the BMS study who also met the entry criteria for the trial whose complete response criteria were being applied (in each case, at least 80 patients per group qualified to be included in the “trial-specific” analysis set); 2) treatment was considered to have failed in patients whose prednisone dosage had not been tapered to ≤10 mg/day at day 365; and 3) the analyses were performed under blinded conditions with regard to the patients' treatment group. The proportion of patients achieving complete response according to each of the alternative definitions was calculated for each treatment group in the BMS trial, and 95% confidence intervals were computed.
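The text does not state which interval method was used. As an illustration only, the sketch below computes the difference between two complete response rates with a simple Wald (normal approximation) 95% confidence interval, which is an assumption here and not necessarily the method used in the study. The inputs are the rounded LUNAR-criteria percentages and group sizes reported in Table 2, so the result is approximate.

```python
import math


def diff_with_wald_ci(p_treat, n_treat, p_ctrl, n_ctrl, z=1.96):
    """Difference in proportions (treatment minus control) with a Wald CI.

    z=1.96 corresponds to a two-sided 95% interval under the normal
    approximation; this choice of method is an assumption.
    """
    diff = p_treat - p_ctrl
    se = math.sqrt(
        p_treat * (1 - p_treat) / n_treat + p_ctrl * (1 - p_ctrl) / n_ctrl
    )
    return diff, diff - z * se, diff + z * se


# LUNAR criteria, abatacept 10/10 (22% of 87) vs. control (6% of 80).
diff, lower, upper = diff_with_wald_ci(0.22, 87, 0.06, 80)
print(f"difference={diff:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

An interval whose lower bound lies above zero corresponds to the situation in which the confidence interval distinguishes treatment from control. With small counts, other interval methods may behave better than the Wald interval.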

RESULTS

Summary of patient outcomes.

Table 2 shows the outcomes for the patients in each of the 3 treatment groups, as assessed by the 5 distinct sets of criteria for response. As previously reported (1), withdrawal rates were comparable in all groups.

Table 2. Patient outcomes at 12 months*

| | Control treatment | Abatacept 10/10 treatment | Abatacept 30/10 treatment |
| --- | --- | --- | --- |
| BMS criteria (n = 100, 99, 99) | | | |
| Complete response | 3 | 3 | 5 |
| Partial response | 18 | 27 | 21 |
| Poor response | 57 | 45 | 49 |
| Withdrawn | 22 | 24 | 24 |
| ACR criteria (n = 100, 99, 99) | | | |
| Complete response | 6 | 14 | 13 |
| Partial response | 21 | 28 | 31 |
| Poor response | 51 | 33 | 31 |
| Withdrawn | 22 | 24 | 24 |
| LUNAR criteria (n = 80, 87, 86) | | | |
| Complete response | 6 | 22 | 24 |
| Partial response | 21 | 21 | 17 |
| Poor response | 48 | 33 | 38 |
| Withdrawn | 25 | 24 | 21 |
| ALMS criteria (n = 95, 99, 94) | | | |
| Complete response | 13 | 25 | 28 |
| Partial response | 23 | 20 | 19 |
| Poor response | 39 | 30 | 29 |
| Withdrawn | 25 | 25 | 24 |
| ACCESS criteria (n = 80, 87, 86) | | | |
| Complete response | 19 | 36 | 33 |
| Partial response | 16 | 16 | 19 |
| Poor response | 40 | 24 | 27 |
| Withdrawn | 25 | 24 | 21 |

* Patients in the abatacept treatment groups received 12 months of treatment at 10 mg/kg every 28 days (abatacept 10/10) or 12 months of treatment at 30 mg/kg every 28 days for 5 months followed by 10 mg/kg every 28 days for the remainder of the treatment period (abatacept 30/10). Values are the percent of patients with the given outcome; n values are the number of patients analyzed from the control group, abatacept 10/10 group, and abatacept 30/10 group, respectively. See Table 1 for definitions.

Complete response rates.

The differences in complete response rate between each abatacept treatment group and the control group, along with 95% confidence intervals, are depicted in Figure 1. Use of the IM101075 per-protocol criteria resulted in the lowest rates of complete response, with no significant difference between either of the abatacept treatment groups and the control group. For each of the other trial criteria (ALMS, LUNAR, ACCESS), the rates of complete response were increased in both treatment groups relative to control. The ACR criteria also demonstrated a >2-fold increase in complete response rates in both treatment groups relative to control, but the confidence intervals fell just short of clearly distinguishing treatment from control. The complete response rates were highest when determined using the ACCESS trial criteria (Table 2), reflecting a more lenient definition of complete response, but the magnitude of the difference between the control and treatment groups was comparable when assessed with the ALMS, LUNAR, and ACCESS criteria (Figure 1).

Figure 1.

Difference in rates of complete response in each of the abatacept treatment groups (12 months of treatment at 10 mg/kg every 28 days [10/10] or 12 months of treatment at 30 mg/kg every 28 days for 5 months followed by 10 mg/kg every 28 days for the remainder of the treatment period [30/10]) compared to the rate in the standard-of-care control group, with complete response defined according to the criteria from the Bristol-Myers Squibb study IM101075 protocol (BMS), the American College of Rheumatology (ACR) recommendations, and the Lupus Nephritis Assessment with Rituximab (LUNAR), Aspreva Lupus Management Study (ALMS), and Abatacept and Cyclophosphamide Combination: Efficacy and Safety Study (ACCESS) trials. Bars show the 95% confidence interval.

Patients with nephrotic levels of proteinuria.

Slightly more than 50% of the patients in the trial had urine protein:creatinine ratios of >339 mg/mmole (3 gm/gm) at screening and/or baseline. Among this subgroup, there were substantial differences in rates of complete response as defined using the different criteria (Table 3), and the confidence intervals denoted significant differences between abatacept-treated and control patients (Figure 2). Rates of complete response as defined using all of the criteria sets except the BMS criteria were 3–4-fold higher with abatacept than with control treatment among these patients with the highest levels of proteinuria.

Table 3. Rates of complete response in patients with nephrotic levels of proteinuria (>339 mg/mmole [3 gm/gm]) at screening and/or baseline*

| Criteria | Control treatment | Abatacept 10/10 treatment | Abatacept 30/10 treatment |
| --- | --- | --- | --- |
| BMS | 1/54 (2) | 1/49 (2) | 2/56 (4) |
| ACR | 1/54 (2) | 3/49 (6) | 7/56 (13) |
| LUNAR | 2/53 (4) | 8/48 (17) | 13/56 (23) |
| ALMS | 3/54 (6) | 9/49 (18) | 14/56 (25) |
| ACCESS | 4/53 (8) | 15/48 (31) | 17/56 (30) |

* Patients in the abatacept treatment groups received 12 months of treatment at 10 mg/kg every 28 days (abatacept 10/10) or 12 months of treatment at 30 mg/kg every 28 days for 5 months followed by 10 mg/kg every 28 days for the remainder of the treatment period (abatacept 30/10). Values are the number of complete responders/number assessed (%). See Table 1 for definitions.

Figure 2.

Difference in rates of complete response in each of the abatacept treatment groups compared to the rate in the standard-of-care control group in the analysis limited to patients who had nephrotic levels of proteinuria (urine protein:creatinine ratio >339 mg/mmole [3 gm/gm]) at screening and/or baseline, with complete response defined according to the criteria from the BMS study IM101075 protocol, the ACR recommendations, and the LUNAR, ALMS, and ACCESS trials. Bars show the 95% confidence interval. See Figure 1 for explanations and definitions.
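Since Table 3 reports counts rather than only percentages, the rate differences and fold differences for the nephrotic subgroup can be recomputed directly. The sketch below does only that arithmetic; the `TABLE_3` dictionary and function names are illustrative, and no confidence intervals or significance tests are implied.

```python
# (complete responders, patients assessed) for control, abatacept 10/10,
# and abatacept 30/10, transcribed from Table 3.
TABLE_3 = {
    "BMS": ((1, 54), (1, 49), (2, 56)),
    "ACR": ((1, 54), (3, 49), (7, 56)),
    "LUNAR": ((2, 53), (8, 48), (13, 56)),
    "ALMS": ((3, 54), (9, 49), (14, 56)),
    "ACCESS": ((4, 53), (15, 48), (17, 56)),
}


def rate(cell):
    """Proportion of complete responders in one table cell."""
    responders, assessed = cell
    return responders / assessed


def summarize(table):
    """For each criteria set, return (rate difference, rate ratio) per abatacept arm."""
    summary = {}
    for criteria, (control, *treated) in table.items():
        p_control = rate(control)
        summary[criteria] = [
            (rate(arm) - p_control, rate(arm) / p_control) for arm in treated
        ]
    return summary


for criteria, arms in summarize(TABLE_3).items():
    for label, (difference, ratio) in zip(("10/10", "30/10"), arms):
        print(f"{criteria} {label}: difference={difference:.3f}, ratio={ratio:.2f}")
```

Ratios computed from such small numbers of responders are unstable, so they are best read alongside the absolute differences and the intervals shown in Figure 2.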

DISCUSSION

It is in the nature of clinical trials research that a series of educated guesses is made before the experiment begins. These educated guesses define the study population, the precise outcome measures and the hierarchy among them, and the size of the trial. Occasionally, practical considerations, such as budget, influence the “guesses,” especially where power analyses are concerned. Sometimes these guesses prove to be accurate. However, in a worst-case scenario, wrong guesses can lead to rejection of a potentially effective drug or acceptance of a dangerous one.

It was not our intent in this study to challenge the primacy of prospectively defined criteria in clinical trials. However, as in all areas of science, there is much to be learned from a careful and open-minded analysis of the data, going beyond binary concepts of success or failure that are based on preconceived notions. In the case of the trial described here, we have shown that the conclusion of failure was driven not as much by the data as it was by trial design decisions that were made before any data were collected. The analyses described here do not establish that abatacept is effective in lupus nephritis. Future studies are needed to confirm or refute these findings. However, it is important to emphasize that the criteria that were examined in the present study were not developed post hoc for the purpose of this analysis. Rather, they are the criteria, established a priori, that have been used in other prominent lupus nephritis trials, as well as the published recommendations of the ACR. In that context, our findings suggest that there may still be a role for abatacept as a potential therapy for lupus nephritis.

In the absence of any proven effective treatment for lupus nephritis, it has not been possible to resolve differences of opinion about trial design. As a result, different primary outcome measures have been used in the different trials. The data presented here provide a basis in evidence, rather than opinion, for deciding which definitions of complete response may be most capable of discerning differences among treatment groups.

We examined the data on each patient individually to determine why the per-protocol complete response definition performed differently than the other definitions. There was no single answer. Rather, several issues combined to limit the discriminatory power of the definition. The requirement that a very low urine protein:creatinine ratio be reached (≤30 mg/mmole) eliminated some patients who met the more lenient urine protein:creatinine ratio target set for the ALMS, LUNAR, and ACCESS trials. However, the urine protein:creatinine ratio level required in the ACR criteria is even more stringent, yet differences between the treatment and control groups were still demonstrated with these criteria. The strict definition of stable renal function (estimated GFR within 10% of baseline) eliminated patients who failed to meet the standard for complete response solely due to serum creatinine fluctuations that were within the variability of the laboratory test (e.g., changes from serum creatinine levels of 0.5 mg/dl to 0.6 mg/dl). Finally, the requirement that the goal be achieved at 2 successive visits created a situation in which trivial and unsustained variation from any component of the outcome measure on day 337 (most commonly, transient microscopic hematuria) eliminated some patients who otherwise met complete response criteria on day 365.

Each of these problems contributed approximately equally to the reduced frequency of complete responses when assessed by the BMS criteria. In short, the bar was placed too high. Because so few patients reached the bar, the protocol-defined criteria failed to discern differences between groups.

The post hoc analysis reported herein suggests that data from the IM101075 trial may provide insight into whether certain subsets of patients with lupus nephritis might be more likely than others to benefit from abatacept (e.g., based on age, sex, race/ethnicity, duration of disease, kidney biopsy classification, estimated GFR, lupus serology, etc.). These analyses will be the subject of a subsequent report. However, it should be noted that those results will have to be interpreted with great caution, because no post hoc analysis can prove efficacy either in the overall population or in any subpopulation; future prospective trials will be needed for that purpose. Even if those trials confirm these results, the relatively low rates of complete response described here illustrate the considerable unmet need for more effective treatments.

There is a strong rationale for the notion that abatacept might be effective in the treatment of lupus nephritis. Abatacept (CTLA-4Ig) is a recombinant fusion protein comprising the extracellular domain of human CTLA-4 and a modified fragment of the Fc domain of human IgG1. It acts by competing with CD28 for binding to CD80/CD86. By inhibiting CD28 engagement on T cells and plasma cells (11, 12), abatacept interferes with mechanisms that have been implicated in lupus nephritis. This mechanistic rationale is strongly supported by the findings of studies in murine models of SLE, in which treatment with abatacept or other forms of CTLA4-Ig has been shown to arrest and even reverse established lupus nephritis (13–15).

Although the primary trial end point was not achieved in the IM101075 study, the analyses presented here lead us to ask what a positive response to abatacept might teach us about the pathogenesis of lupus nephritis and how this might influence the choice of therapeutic targets. There are several possible explanations for the potential efficacy of abatacept. The first relates to the activation of naive T cells, which requires CD80/CD86 engagement with CD28 on the T cell (11). If abatacept is proven to be effective in lupus nephritis, ascertaining whether treatment diminishes the number of activated T cells in blood may help address a critical unanswered question regarding renal flares: Is disease triggered by the activation of naive cells, which is prevented by abatacept, or by memory cells, which are not directly affected? An alternative mechanism that might explain the effect of abatacept in SLE involves direct effects on plasma cells (12). A detailed analysis of autoantibodies and B cell subsets may help to determine whether abatacept reduces plasma cell survival, which requires CD28 engagement. These mechanistic questions have not yet been explored in the context of a clinical trial. We are hopeful that the ACCESS trial, which is currently approaching full enrollment, will help to clarify the key question of the effectiveness of abatacept in lupus nephritis and, if efficacy is demonstrated, may help us refine our understanding of lupus pathogenesis.

AUTHOR CONTRIBUTIONS

All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Dr. Wofsy had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Wofsy, Hillson, Diamond.

Analysis and interpretation of data. Wofsy, Hillson, Diamond.

ADDITIONAL DISCLOSURES

Author Hillson is an employee of Bristol-Myers Squibb.

Acknowledgements

We thank Stephanie L. Meadows-Shropshire for statistical support.
