The BeSt study was designed and conducted by rheumatologists participating in the Foundation for Applied Rheumatology Research (FARR) in 18 peripheral and 2 university hospitals in the Western part of The Netherlands. The Medical Ethics Committee at each participating center approved the study protocol, and all patients gave written informed consent before inclusion. Patients with early RA, as defined by the American College of Rheumatology (ACR; formerly, the American Rheumatism Association) 1987 revised criteria (24), were recruited between April 2000 and August 2002. Patients had to have a disease duration of ≤2 years, be age ≥18 years, and have active disease with ≥6 of 66 swollen joints, ≥6 of 68 tender joints, and either an erythrocyte sedimentation rate (ESR) ≥28 mm/hour or a global health score of ≥20 mm on a 0–100-mm visual analog scale, where 0 = best and 100 = worst. Exclusion criteria included previous treatment with DMARDs other than antimalarials, concomitant treatment with an experimental drug, a malignancy within the last 5 years, bone marrow hypoplasia, a serum aspartate aminotransferase or alanine aminotransferase (ALT) level >3 times the upper limit of normal, a serum creatinine level >150 μmoles/liter or an estimated creatinine clearance <75 ml/minute, diabetes mellitus, alcohol or drug abuse, concurrent pregnancy, wish to conceive during the study period, or inadequate contraception.
Treatment allocation and intervention.
Patients were allocated to 1 of 4 treatment groups by variable block (9–13) randomization, stratified per center. Closed envelopes containing the patient study number, the allocated treatment group, and preprinted prescriptions for the allocated treatment were distributed and stored by ascending stratified randomization number in the participating centers. After receiving authorization by telephone from the study coordinator, the local rheumatologists enrolled eligible patients.
Patients received sequential monotherapy (group 1), step-up combination therapy (group 2), initial combination therapy with tapered high-dose prednisone (group 3), or initial combination therapy with infliximab (group 4). For all groups, the treatment protocol described a number of subsequent treatment steps for patients whose medication failed. The decision of whether to adjust medication was made every 3 months based on the Disease Activity Score in 44 joints (DAS44), which was calculated by a research nurse who remained blinded to the allocated treatment group during the entire study period. If the patient did not reach a DAS44 of ≤2.4, the treating physician immediately adjusted therapy by proceeding to the next step in the allocated treatment group. If the clinical response was consistently adequate (DAS44 of ≤2.4 for at least 6 months), medication was gradually tapered until 1 drug remained at a maintenance dose. The DAS44 cutoff level of 2.4 was chosen because observational studies have shown that rheumatologists are generally satisfied with the treatment results and do not intensify therapy if the DAS44 is ≤2.4 (25,26).
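The 3-monthly decision rule described above can be sketched in code. This is an illustrative sketch only, not the study's software: the function name, the action labels, and the reading of "at least 6 months" as 3 consecutive 3-monthly assessments (months 0, 3, and 6) are assumptions introduced here.

```python
DAS44_TARGET = 2.4  # cutoff stated in the protocol


def treatment_decision(das44_history, visits_for_taper=3):
    """Return 'advance', 'taper', or 'continue' for the current visit.

    das44_history: DAS44 values from consecutive 3-monthly visits, oldest first.
    visits_for_taper: ASSUMPTION. Number of consecutive visits at or below
    the target taken to represent "at least 6 months" of adequate response.
    """
    if not das44_history:
        raise ValueError("at least one DAS44 value is required")

    current = das44_history[-1]
    if current > DAS44_TARGET:
        # Target not reached: proceed to the next step of the allocated group.
        return "advance"

    recent = das44_history[-visits_for_taper:]
    if len(recent) == visits_for_taper and all(v <= DAS44_TARGET for v in recent):
        # Consistently adequate response: taper toward a maintenance dose.
        return "taper"

    # Target reached, but not yet for long enough to taper.
    return "continue"
```

For example, a history of `[2.6, 2.0, 2.0]` yields `"continue"` under this reading, because the earliest of the 3 most recent values exceeds 2.4.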
The patients assigned to sequential monotherapy (group 1) started with 15 mg/week methotrexate (MTX), which was increased to 25–30 mg/week if the DAS44 was >2.4. Subsequent steps for patients with an insufficient response were sulfasalazine (SSZ) monotherapy, leflunomide monotherapy, MTX with infliximab, gold with methylprednisolone, and, finally, MTX with cyclosporin A (CSA) and prednisone.
The patients assigned to step-up combination therapy (group 2) also started with 15 mg/week MTX, which was increased to 25–30 mg/week if the DAS44 was >2.4. If response to therapy was still insufficient, SSZ was added, followed by the addition of hydroxychloroquine (HCQ) and then by prednisone. Patients whose disease failed to respond to the combination of these 4 drugs subsequently switched to MTX with infliximab, MTX with CSA and prednisone, and, finally, to leflunomide.
The patients assigned to initial combination therapy with prednisone (group 3) started with the combination of 7.5 mg/week MTX, 2,000 mg/day SSZ, and 60 mg/day prednisone (the last of which was tapered in 7 weeks to 7.5 mg/day). In the case of a DAS44 of >2.4, MTX was augmented to 25–30 mg/week, and if the response was still insufficient, the combination was replaced subsequently by the combination of MTX with CSA and prednisone, followed by MTX with infliximab, leflunomide monotherapy, gold with methylprednisolone, and, finally, by azathioprine (AZA) with prednisone. In the case of a persistent DAS44 of ≤2.4, first prednisone was tapered to zero after 28 weeks, and then MTX was tapered to zero after 40 weeks.
The patients assigned to the initial combination with infliximab (group 4) started with 25–30 mg/week MTX with 3 mg/kg infliximab at weeks 0, 2, and 6 and every 8 weeks thereafter. After 3 months, the dose of infliximab was increased to 6 mg/kg/every 8 weeks if the DAS44 was >2.4. Extra DAS44 calculations for dose adjustments were performed every 8 weeks within 1 week before the next infusion of infliximab. If the DAS44 was >2.4, the dose of the next infusion was increased to 7.5 mg/kg/every 8 weeks and finally to 10 mg/kg/every 8 weeks. If patients still had a DAS44 of >2.4 while receiving MTX with 10 mg/kg infliximab, medication was subsequently switched to SSZ, then to leflunomide, then to the combination of MTX, CSA, and prednisone, then to gold with methylprednisolone, and, finally, to AZA with prednisone. In the case of a persistent good response (DAS44 of ≤2.4 for at least 6 months), the dose of infliximab was reduced (from 10 to 7.5, 6, and then 3 mg/kg) at each subsequent infusion until it was stopped.
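The infliximab dose steps for group 4 can be summarized as a small ladder. The sketch below is illustrative and rests on stated assumptions: the function name and action labels are hypothetical, the flag `sustained_response` stands in for "DAS44 of ≤2.4 for at least 6 months", and the timing rules (first increase after 3 months, later checks every 8 weeks) are not modeled.

```python
DOSE_LADDER_MG_PER_KG = [3.0, 6.0, 7.5, 10.0]  # doses named in the protocol
DAS44_TARGET = 2.4


def next_infliximab_step(current_dose, das44, sustained_response=False):
    """Return (action, dose_mg_per_kg) for the next infusion.

    action is one of 'escalate', 'switch', 'reduce', 'stop', 'maintain';
    dose is None when infliximab is no longer given.
    """
    i = DOSE_LADDER_MG_PER_KG.index(current_dose)  # ValueError if off-ladder

    if das44 > DAS44_TARGET:
        if i == len(DOSE_LADDER_MG_PER_KG) - 1:
            # Insufficient response at 10 mg/kg: switch to the next drug.
            return ("switch", None)
        return ("escalate", DOSE_LADDER_MG_PER_KG[i + 1])

    if sustained_response:
        if i == 0:
            # Already at 3 mg/kg: the next reduction is discontinuation.
            return ("stop", None)
        return ("reduce", DOSE_LADDER_MG_PER_KG[i - 1])

    return ("maintain", current_dose)
```

The restriction that infliximab could be discontinued only once, described later in the protocol, would need additional state and is deliberately left out of this sketch.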
An overlap period of 1 month was used when switching from 1 single DMARD to the next. Unless otherwise specified, the doses of the different drugs were as follows: for MTX, 25–30 mg/week (oral or subcutaneous); for SSZ, 2,000–3,000 mg/day; for leflunomide, 20 mg/day; for HCQ, 400 mg/day; for prednisone, 7.5 mg/day; for CSA, 2.5 mg/kg/day; for gold, 50 mg/week (intramuscular) with 120 mg methylprednisolone (intramuscular) at weeks 0, 4, and 8; for AZA, 2–3 mg/kg/day; and, for infliximab, 3–10 mg/kg/every 8 weeks (intravenous), as described above in greater detail for group 4.
In all groups, if the clinical response was consistently adequate (DAS44 of ≤2.4 for at least 6 months), drugs were tapered to monotherapy at a maintenance dose, which was 10 mg/week for MTX, 2,000 mg/day for SSZ, 10 mg every other day for leflunomide, 50 mg every other week for gold, or 2 mg/kg/day for AZA. Prednisone and infliximab were always the first drugs to be tapered to a dose of zero. If disease activity flared (DAS44 >2.4) after tapering a drug, the last effective dose was reintroduced. In all groups, prednisone could be reintroduced only once: if, after a second discontinuation, the DAS44 increased again to >2.4, then the next step in the protocol was taken. Infliximab could be discontinued only once; after reintroduction, it could be tapered again, but only to a maintenance dose of 3 mg/kg/every 8 weeks. If side effects occurred, the responsible drug was reduced to the lowest tolerated dose. If a drug was not tolerated at all or contraindicated, patients receiving monotherapy proceeded to the next step in the allocated treatment group, and patients receiving combination therapy proceeded with the other drug(s) of the combination.
Contraindications for treatment with infliximab included the following: a known allergy to murine proteins, a chronic infectious disease, serious infections which occurred within the last 3 months, opportunistic infections which occurred within the last 6 months, a neurologic or cerebral disease, a lymphoproliferative disease, active tuberculosis (TB) within the last 2 years, and evidence of an old or latent TB infection for which latent TB therapy (isoniazid [INH]–based therapy or another regimen recommended by local experts) was not instituted prior to infliximab therapy. Prior to infliximab therapy, all patients were evaluated for TB with a purified protein derivative skin test and a chest radiograph. At the beginning of 2002, heart failure was added as a contraindication for treatment with infliximab. Previously enrolled patients with heart failure who had already received infliximab continued therapy and were closely monitored.
Concomitant treatment with nonsteroidal antiinflammatory drugs and intraarticular injections with corticosteroids were permitted. Other parenteral corticosteroids were not allowed. The use of DMARDs or oral corticosteroids was only permitted as dictated by the treatment protocol. All patients received 1 mg/day folic acid during treatment with MTX.
Assessment of end points.
Every 3 months, assessments were performed by a research nurse who was blinded to the allocated treatment group. Primary end points were functional ability, measured by the Dutch version of the Health Assessment Questionnaire (D-HAQ) (27), and radiographic joint damage according to the modified Sharp/Van der Heijde score (SHS), with a range of 0–448 (28), assessed on radiographs of the hands and feet obtained at baseline and after 1 year of followup. Higher D-HAQ scores indicate poorer function. All radiographs were read by 2 trained assessors who were blinded to the patient's identity, treatment center, and date of radiograph and who scored the radiographs paired, in random order, and independently. The intraobserver coefficients were 0.93 and 0.94, and the interobserver coefficient was 0.93. The mean score of the 2 assessors was used for the analysis. A patient was classified as having erosive disease if the mean erosion score was >0.5. Progression of radiographic joint damage was defined as a change in radiographic score greater than the smallest detectable difference (SDD), as well as by a change in the total radiographic score of >0.5 (29,30). The SDD was 5.92, 3.76, and 3.75 for total SHS, erosion score, and joint space narrowing score, respectively. Secondary end points were 20%, 50%, and 70% improvement according to the ACR response criteria (31) and clinical remission, defined as a DAS44 of <1.6 (32).
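The radiographic definitions above reduce to a few threshold comparisons, sketched here for illustration. The function and key names are hypothetical. The thresholds are the values reported in the text, and the 2 progression definitions are kept as separate functions because the text reports both.

```python
# Smallest detectable difference (SDD) per score, as reported in the text.
SDD = {"total": 5.92, "erosion": 3.76, "narrowing": 3.75}


def mean_score(assessor_1, assessor_2):
    """Mean of the 2 assessors' scores, the value used for analysis."""
    return (assessor_1 + assessor_2) / 2


def has_erosive_disease(mean_erosion_score):
    """Erosive disease: mean erosion score greater than 0.5."""
    return mean_erosion_score > 0.5


def progressed_by_sdd(baseline, followup, score="total"):
    """Progression: change strictly greater than the SDD for that score."""
    return (followup - baseline) > SDD[score]


def progressed_by_half_point(baseline_total, followup_total):
    """Alternative definition: change in total score greater than 0.5."""
    return (followup_total - baseline_total) > 0.5
```

A change in total SHS from 10.0 to 16.0, for instance, counts as progression under both definitions, whereas a change from 10.0 to 15.5 exceeds 0.5 but not the SDD of 5.92.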
To maintain uniformity in scoring and assessment quality, all research nurses were trained at study initiation and every 6 months thereafter. Two trial physicians verified adherence to the protocol every 3 months. All protocol deviations were recorded.
At each control visit, the following laboratory tests were performed: ESR, complete blood cell count, and serum levels of ALT, gamma glutamyl transpeptidase, bilirubin, lactate dehydrogenase, creatinine, electrolytes, and glucose. The treating physician recorded all adverse events (AEs) and serious AEs and, if necessary, made treatment adjustments in accordance with the protocol. Serious AEs were defined as any adverse reaction resulting in any of the following outcomes: a life-threatening condition or death, a significant or permanent disability, a malignancy, hospitalization or prolongation of hospitalization, a congenital abnormality, or a birth defect.
Statistical analysis.
A total sample size of 468 patients (117 per group) was needed to obtain 80% power to detect a difference of at least 0.2 in the D-HAQ score, which was set as a clinically relevant difference, with a 5% significance level and adjusting for multiple comparisons between groups, assuming an SD of 0.45. This sample size also ensured >80% power to detect a difference of ≥20% in the change score of radiographic damage as measured by the SHS.
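For orientation, the standard normal-approximation formula for comparing 2 group means is sketched below. It is not presented as the calculation used for the study: the text does not state which multiplicity adjustment was applied, so the Bonferroni division and the parameter `n_comparisons` are assumptions, and the result should not be expected to reproduce the reported 117 per group.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80, n_comparisons=1):
    """Approximate patients per group for a two-sided comparison of 2 means.

    delta: smallest difference to detect (0.2 on the D-HAQ in the text).
    sd: assumed standard deviation (0.45 in the text).
    n_comparisons: ASSUMPTION. Bonferroni divisor for pairwise comparisons.
    """
    z = NormalDist()  # standard normal distribution
    alpha_adjusted = alpha / n_comparisons
    z_alpha = z.inv_cdf(1 - alpha_adjusted / 2)
    z_beta = z.inv_cdf(power)
    # n = 2 * ((z_alpha + z_beta) * sd / delta)^2, rounded up.
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


unadjusted = n_per_group(delta=0.2, sd=0.45)
adjusted = n_per_group(delta=0.2, sd=0.45, n_comparisons=6)
```

Any correction for multiple comparisons lowers the per-test significance level and therefore raises the required number of patients per group relative to the unadjusted figure.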
All outcomes were calculated in an intention-to-treat (ITT) analysis using all available data. Measures with a Gaussian distribution, expressed as the mean and SD, were analyzed using a one-way analysis of variance. In the case of an overall significant difference between the groups, a post hoc least significant difference test was performed for the primary outcomes, and Tukey's honestly significant difference test was used for the secondary outcomes to correct for multiple testing. Outcome measurements with a non-Gaussian distribution, expressed as the median and interquartile range (IQR), were analyzed by the Kruskal-Wallis test. Pairwise comparisons between groups were performed using the Mann-Whitney U test. For the SHS, the change scores were reported both as the mean and as the median. Categorical variables such as sex and rheumatoid factor (RF) positivity were compared between treatment groups using the chi-square test. A subgroup analysis of the progression of radiographic joint damage was performed in patients who either did or did not have erosive disease at baseline.