Reporting of randomized controlled clinical trials (RCTs) often is suboptimal. Cancer drug trials are complicated further by multiple survival and response endpoints. The authors of this report determined the frequency of reporting of time-to-event endpoints and tumor response outcomes in advanced colorectal cancer and examined the relation between the year of publication and the reported effectiveness of 5-fluorouracil or equivalent agents.
A literature search identified 144 RCTs that involved 35,853 patients. The patient characteristics, trial designs, and methods for endpoint reporting were extracted. The clinical effectiveness of 5-fluorouracil or equivalent agents was analyzed in 3 time periods (pre-1990, 1990s, and 2000s) in 28,636 patients.
One hundred twenty-nine trials (90%) reported overall survival (OS) and response rates; whereas time to progression (44%), duration of response (43%), progression-free survival (22%), and time to treatment failure (12%) were reported less frequently. Except for stable and progressive disease, the frequency of reporting of endpoints did not improve over the period studied. The median OS for patients who received 5-fluorouracil or equivalent agents increased significantly (from 9.4 months before 1990 to 13.5 months after 2000). During the same period, the rate of stable disease increased (38.2%, 40.5%, and 45.1% for pre-1990, 1990s, and 2000s, respectively; P = .004); whereas the rate of progressive disease decreased significantly (39.2%, 33.3%, and 27.8%, respectively; P = .002).
Randomized clinical trials (RCTs) are the undisputed gold standard for establishing drug safety and efficacy. To address well-recognized deficits in reporting of trials,1 the Consolidated Standards of Reporting Trials (CONSORT) Statement was developed in 1996,2 updated in 2001,3 and refined in 2010.4, 5 Completion of the CONSORT checklist is designed to improve the transparency, accuracy, and completeness of trial reporting. In doing so, it is envisaged that the public will be better placed to appraise the health impact of a new therapy. Several studies have examined the impact of CONSORT in disease types6, 7 or more generally.8, 9 Only a few studies have evaluated the scientific quality of RCTs in cancer,10-14 and, to our knowledge, there are no studies in colorectal cancer. Modest improvements have been demonstrated in some areas, for example, increased use of intention-to-treat analysis.11 However, all of those studies concluded that the quality of trial reporting remains suboptimal despite the widespread promulgation of the CONSORT statement. In terms of the results component of the checklist, CONSORT requires a flow diagram, the period of recruitment, baseline data, and the outcomes of primary and secondary endpoints.
In cancer drug trials, improvement in overall survival (OS) is the most convincing measure of drug efficacy and patient benefit. Measuring survival requires prolonged follow-up and large patient numbers, and it may be confounded by the use of effective rescue therapies. To address this limitation, many studies have introduced a range of intermediate time-to-event endpoints,15, 16 such as the time to disease progression (TTP), progression-free survival (PFS), the time to treatment failure (TTF), and the time to response (TTR). In addition, greater emphasis has been placed on subcategorizing response in terms of complete response (CR), partial response (PR), stable disease (SD), or progressive disease (PD) rather than using the overall response rate (RR). The duration of response (DOR), which is measured from the time of initial response to documented tumor progression, also is being reported increasingly. It is proposed that the use of intermediate endpoints will shorten the duration of drug development, minimize the exposure of patients to ineffective or toxic treatments, and accelerate the availability of new medicines.17 Several cancer drugs have been marketed and publicly subsidized on the basis of their impact on intermediate time-to-event endpoints. New biologic agents, such as bevacizumab, cetuximab, or panitumumab, are characterized by a cytostatic mechanism of action rather than by cytotoxicity, which is used to characterize chemotherapy. For this reason, SD has been proposed as a relevant clinical endpoint by some investigators rather than objective tumor shrinkage. To date, this endpoint has not been accepted by regulators.
Nevertheless, this change in the paradigm of cancer trial design and conduct has prompted drug regulators, healthcare payers, and the pharmaceutical industry to explore indirect ways of establishing the veracity of claims regarding new medicines. In this regard, the body of published literature potentially holds baseline information on trial populations and on the natural history of cancer with and without cytotoxic chemotherapy. Analysis of these data also may allow quantification of the changes in intermediate endpoints for patients who are exposed to chemotherapy and distillation of the changes related to the therapeutic effect of the drug versus those attributable to baseline improvements over time in medical care. Although studies have demonstrated that many trial reports fail to comply with the CONSORT statement, it remains important to test the possibility that historic trial reports contain such useful information.
Colorectal cancer is 1 of the most common cancers in the world and has a poor outcome in advanced stages. Over the last 40 years, fluoropyrimidines have been an integral part of the treatment for patients with advanced colorectal cancer (ACRC). Despite constant modulation (ie, bolus vs infusional or oral vs intravenous), it was not until the early years of this decade that treatment outcome options for patients with ACRC improved significantly.18, 19 For these reasons, a study of first-line clinical trials in ACRC provides an ideal opportunity to address several significant questions related to cancer trial endpoints and the adequacy of the trial reports in this area. The objectives of this study were to analyze published RCTs of ACRC during the period from 1966 to 2005 to determine the frequency of reporting of time-to-event endpoints and tumor response outcomes. We also examined the relation between year of publication and the reported effectiveness of 5-fluorouracil (5-FU) or equivalent agents.
MATERIALS AND METHODS
Search Strategy and Selection Criteria
We searched electronic databases (MEDLINE [1966 to June 2005]; EMBASE [1980-2005]; the Cochrane Library, Issue 2; PubMed [June 14, 2005]; Current Contents [1993 to June 2005]; CINAHL [1982 to June 2005]; and OldMedline [1950-1965]) to identify RCTs in ACRC, as reported previously.20 Trials were included in the current study if they assessed first-line pharmacologic treatments, contained ≥2 arms, and reported OS and either response or TTP. Exclusion criteria were nonrandomized design, single-arm trials, and trial duplicates. In the event of updated trial results, information from the most recent published article was included. There were no language restrictions, and all abstracts were screened. To reduce the bias resulting from unpublished data, the abstracts from conferences and medical meetings or proceedings also were examined. These abstracts included those published until June 1, 2005 by the American Society of Clinical Oncology, the European Cancer Organization/European Society for Medical Oncology, and the National Cancer Institute.
The data-extraction process and the methods that we used to assess the quality of each study have been previously reported.20 To validate the accuracy of the data and to ensure trial updates had been incorporated, 1 author (I.N.) re-extracted the dataset before analysis. Figure 1 provides a flow diagram describing the number of trials, arms, and study participants.
The following trial characteristics were extracted: size, number of treatment arms, recruitment period, and year of publication. The reported baseline patient characteristics of median age, sex, previous adjuvant therapy, tumor site (colon or rectum), and Eastern Cooperative Oncology Group (ECOG) performance status were obtained for each treatment group in every trial along with the number of patients (measured as intention to treat [ITT], evaluable, and eligible). Because of the inconsistency of reporting, performance status was summarized only as the percentage in each category when available (ECOG 0, 1, 2, 3, or 4).
For each treatment group in each trial, time-to-event outcomes (OS, DOR, TTP, PFS, TTF, and TTR) were recorded in months. Outcomes were recorded based on 3 possible patient groups: the ITT population, the evaluable population, or the eligible population. For tumor response endpoints (CR, PR, SD, PD, and RR), the number of participants who achieved the specified response was entered for each arm. The population on which the response was based (ie, ITT, evaluable, or eligible) also was recorded, and response rates were calculated based on the appropriate patient population. Although outcomes were presented for 3 possible groups, outcomes were included in analyses based on the ITT population as the first choice, followed by those obtained from an evaluable population. Outcomes based on eligible populations were not analyzed and were deemed “not recorded.”
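The population-selection rule described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' extraction code: the dictionary keys (`"itt"`, `"evaluable"`, `"eligible"`) and the function names are hypothetical.

```python
from typing import Optional, Tuple

# Priority order described in the text: ITT first, then evaluable.
# Outcomes available only for the eligible population count as "not recorded".
PRIORITY = ("itt", "evaluable")


def select_outcome(reported: dict) -> Tuple[Optional[str], Optional[float]]:
    """Return (population, value) for the preferred population, or (None, None)."""
    for population in PRIORITY:
        value = reported.get(population)
        if value is not None:
            return population, value
    return None, None


def response_rate(responders: int, population_size: int) -> float:
    """Response rate (%) on the population on which the response was based."""
    if population_size <= 0:
        raise ValueError("population size must be positive")
    return 100.0 * responders / population_size
```

For an arm reporting a median OS for both ITT and evaluable patients, `select_outcome` returns the ITT value; an arm reporting only an eligible-population value yields `(None, None)`.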
To examine the changes in the frequency of reporting trial endpoints over time, first, we identified the subset of trial arms that contained only 5-FU or equivalent agents. These agents were classified as 5-FU alone (bolus and infusional); 5-FU (bolus and infusional) in combination with biomodulating agents like methotrexate, N-phosphonacetyl-L-aspartic acid, allopurinol, dipyridamole, trimetrexate, 5-methyltetrahydrofolate, levamisole, or folinic acid; 5-FU plus chemoimmunotherapy (protein A sepharose, thymostimulin, interferon); oral fluoropyrimidines (capecitabine, tegafur-uracil, eniluracil/FU, doxifluridine); thymidylate synthase inhibitors (raltitrexed); and 5-FU–containing regimens in combination with other cytotoxics (cisplatin) for which the outcome was not superior to the standard 5-FU–based arm. For each study, 2 authors (H.-T.A. and R.W.) identified the trial arms that belonged to 1 of the 6 5-FU categories described above. The total patient population for this analysis was estimated from treatment-group sizes using the ITT population as the first choice; otherwise, we used the evaluable population (Fig. 1). We considered it appropriate to set a cutoff for this analysis at June 1, 2005, because most trial arms in the recent 5 years contained combination agents of oxaliplatin or irinotecan with or without biologic agents.
Overall trial characteristics were summarized using descriptive statistics. Patient characteristics were summarized for all arms and for 5-FU or equivalent arms, weighting for the size of each trial arm.
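As an illustration of weighting by trial-arm size, the following Python sketch computes a size-weighted mean of a reported characteristic (for example, median age per arm). The function name and the convention of using `None` for arms that did not report the characteristic are assumptions made for this example.

```python
def weighted_mean(values, weights):
    """Mean of per-arm values weighted by arm size, skipping unreported (None) values."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no reported values with positive weight")
    return sum(v * w for v, w in pairs) / total_weight
```

With hypothetical median ages of 60 and 64 years in arms of 100 and 300 patients, the weighted mean is 63 years, whereas the unweighted mean would be 62 years.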
The overall frequency of outcome reporting was tabulated for all time-to-event and rate-based outcomes using both trial and arm as units of analysis. Outcome reporting for selected outcomes also was summarized for the periods before 1990, from 1990 to 1999, and from 2000 to 2005. Associations between reporting and period were assessed using the overall chi-square test and the chi-square test for trend with no weighting applied.
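The two tests named above can be sketched in standard-library Python for a table of "reported" versus "not reported" counts across ordered periods. This is an illustrative sketch under stated assumptions: the trend statistic uses the Cochran-Armitage form with equally spaced scores (the text does not state which variant or scores were used), the function names are hypothetical, and the analysis in the study was run in SAS.

```python
import math


def chi_square_overall(reported, totals):
    """Pearson chi-square for a k x 2 table (reported vs not reported); returns (stat, df)."""
    n_total = sum(totals)
    p_bar = sum(reported) / n_total
    if not 0 < p_bar < 1:
        raise ValueError("overall proportion must lie strictly between 0 and 1")
    stat = sum(n * (r / n - p_bar) ** 2 for r, n in zip(reported, totals))
    stat /= p_bar * (1 - p_bar)
    return stat, len(totals) - 1


def p_value_two_df(stat):
    """Upper-tail p-value of a chi-square statistic with 2 df (3 periods)."""
    return math.exp(-stat / 2.0)


def chi_square_trend(reported, totals, scores=None):
    """Cochran-Armitage trend chi-square (1 df); returns (stat, p_value)."""
    if scores is None:
        scores = list(range(len(totals)))  # assumed equally spaced period scores
    n_total = sum(totals)
    p_bar = sum(reported) / n_total
    if not 0 < p_bar < 1:
        raise ValueError("overall proportion must lie strictly between 0 and 1")
    t = sum(s * (r - n * p_bar) for s, r, n in zip(scores, reported, totals))
    sum_ns = sum(n * s for n, s in zip(totals, scores))
    sum_ns2 = sum(n * s * s for n, s in zip(totals, scores))
    variance = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)
    stat = t * t / variance
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

The overall test asks whether reporting differs among the periods in any way, whereas the trend test concentrates on a monotonic change across the ordered periods and is therefore more sensitive to a steady increase or decrease.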
Outcomes of OS, TTP, DOR, and RR, SD, and PD were summarized for 5-FU or equivalent arms. The mean (95% confidence interval) of the median or the rates of these outcomes were plotted for the time periods and overall. Simple linear regression models weighted by trial arm size were constructed to investigate the relation between these selected outcomes and year of publication. Regressions were summarized by the slope, representing the estimated change in outcome for each year of publication. Residual plots were inspected to assess the assumptions of linear regression. Data were analyzed using SAS software (version 9.2; SAS Institute Inc., Cary, NC).
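A simple linear regression weighted by trial-arm size can be sketched as follows. This is a minimal standard-library illustration of the weighted least-squares slope, not the SAS procedure used in the study; the function name and argument names are hypothetical.

```python
def weighted_linear_regression(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x; returns (slope, intercept).

    x: year of publication per arm, y: outcome per arm (e.g. median OS in months),
    w: weight per arm (here assumed to be the arm size).
    """
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept
```

The slope is read as the estimated change in the outcome per year of publication; larger arms pull the fitted line toward their values more strongly than small arms.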
Trial Characteristics and Reporting of Baseline Patient Demographics
In total, 144 ACRC RCTs, either phase 2 or phase 3, were identified that involved 35,853 patients in 340 trial arms. All trials studied first-line cytotoxic therapy either alone or in combination. The median number of patients per trial was 187 (range, 25-1120 patients per trial). Most trials (n = 111 trials; 77%) included 2 treatment arms, and 33 trials (23%) reported on ≥3 arms. Eight studies (6%) were reported in the form of abstracts, and the remaining 136 studies (94%) were published as full articles. Twenty RCTs (14%) were phase 2 studies, and 124 RCTs (86%) were phase 3 studies. Less than 10% of the RCTs in this study predated 1986, and most were published during the 1990s and later. The median duration of follow-up was published in only 44 trials (31%); and, considering these trials separately, the median duration of follow-up was 21 months (range, 4-61 months). Most trials that are cited in this report were published within 2 to 3 years of completing patient accrual.
We identified a total of 279 arms that used 5-FU–based or equivalent agents from 131 trials (data not shown). Of those 279 arms, 226 arms reported population sizes based on an ITT population, and 51 arms reported only evaluable sizes, providing an estimated total sample size of 28,636 patients. Two arms were halted: 1 because of futility21 and another as a result of prespecified interim analysis.22 The major patient baseline characteristics of the 5-FU or equivalent cohort were similar to those of the entire study population, and there were no differences in the frequency with which those characteristics were recorded (Table 1).
Table 1. Reported Patient Baseline Characteristics for All Arms and for 5-Fluorouracil/Equivalent Agents
[Table body not recovered. Column groups: all arms (No. of arms reporting data, N=340) and 5-FU or equivalent arms (No. of arms reporting data, N=279). Rows include prior adjuvant therapy (%), colon disease site (%), and performance status (%).]
Table footnote: Of the 289 arms that reported sex, there were 159 arms in which the sum of men and women was equal to the intent-to-treat number of patients, 86 arms in which the sum was equal to the evaluable number of patients, 42 arms in which the sum was equal to the number of eligible patients, and 2 arms in which the sum was not equal to any of the given numbers of patients. Some publications reported the exact numbers of men and women or reported only percentages, yet others included only information on 1 sex group.
Frequency of Reporting of Time-to-Event Endpoints and Tumor Response Outcomes
In terms of reporting clinical outcomes, 129 trials (90%) identified OS as an endpoint, and a similar number of studies reported RR (130 trials; 90%), as shown in Table 2. The number of patients with SD or PD was captured in about 60% of trials. We were able to derive the disease control rate (DCR) (DCR = RR + SD) for 83 trials (58%), incorporating 189 arms and 18,128 patients. Most trials (123 of 144; 85%) used World Health Organization (WHO) criteria to assess response, 3 trials (2%) used Southwest Oncology Group criteria, 2 trials (1%) used Response Evaluation Criteria in Solid Tumors (RECIST), and the remaining 16 trials (11%) failed to specify any criteria. In general, time-to-event endpoints were reported less frequently than response rates (Table 2). It is noteworthy that the reporting of most time-to-event outcomes did not alter significantly over the period examined in this study (Table 3). For response-based outcomes, there was weak evidence of a decrease in reporting of RR (P = .07) and evidence of an increase in reporting of SD (P = .04) and PD (P = .0002) over time. Reporting of both OS and DOR decreased in the 1990s compared with the preceding decade, although reporting improved from 2000. TTP reporting, however, increased in the 1990s but decreased after 2000.
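The derivation DCR = RR + SD stated above can be expressed per arm as in the following Python sketch. The function name is hypothetical, and returning `None` when either component was not reported is an assumption that mirrors the fact that the DCR could be derived for only a subset of trials.

```python
def disease_control_rate(responders, stable, population_size):
    """DCR (%) = RR + SD, computed from counts; None if either component is unreported."""
    if responders is None or stable is None:
        return None
    if population_size <= 0:
        raise ValueError("population size must be positive")
    return 100.0 * (responders + stable) / population_size
```

For a hypothetical arm of 100 patients with 20 responders (CR plus PR) and 40 patients with SD, the DCR is 60%.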
Table 2. Frequency of Reporting Time-to-Event and Response Outcomes in Trials
[Table body not recovered. Columns include median No. of patients per arm and total No. of patients. Rows include time to disease progression, duration of response, time to treatment failure, and time to response.]
Table 3. Number and Percentage of Arms Reporting Selected Outcomes, by Year of Publication
[Table body not recovered. Columns include No. of arms and No. of arms (%). Rows include the publication periods (for example, 1990 to <2000) and a test for difference.]
OS indicates overall survival; TTP, time to progression; DOR, duration of response; RR, response rate; SD, stable disease; PD, progressive disease.
Changes in Reported Clinical Effectiveness of 5-Fluorouracil in Trial Participants With Time
A comparison of the 3 time periods revealed a significant increase in the median OS, from 9.4 months in the pre-1990 period to 13.5 months after 2000 (Table 4; Fig. 2, top). With regard to response endpoints, the rate of SD increased significantly, and the rate of PD declined, yet there was no change in the overall RR (Table 4; Fig. 2, bottom).
Table 4. Clinical Outcomes of 5-Fluorouracil-Based or Equivalent Agents Over Time
The current results indicate that the reported OS for trial arms that received 5-FU or equivalent agents increased significantly over the study period. Factors that contributed to this increase include arm crossover to a more active treatment arm (for instance, oxaliplatin and irinotecan), lead time bias, and improvements in symptom control and palliative care. In contrast to the changes in survival, we observed that neither the response duration nor the time to progression altered across the study period. This illustrates the observation that the growing availability of alternative treatments does not have an impact on these intermediate endpoints. For this reason, such treatments are likely to represent reliable historic comparison points for future studies.
Response rates were reported more frequently than time-to-event endpoints. It is noteworthy that there was a consistent improvement in the rate of reporting of SD and PD; and, although the reporting still was less than that for the RR, >60% of trial arms after 2000 captured these endpoints. Concomitant with this increased reporting, we also observed a highly significant increase in the percentage of patients who achieved SD and an absolute decrease of approximately 11 percentage points in the percentage with PD. These changes may be related to the availability of more stringent follow-up imaging in recent years. For example, more frequent follow-up schedules could result in an increased fraction of SD, especially in slow-growing tumors, in which time intervals are too short to recognize true PD. Our observations regarding SD have implications for those who advocate the use of this endpoint as a valid outcome in early trials of new agents.23 Our data suggest that there is a lack of precision in defining this term, and the classification of tumors into this category may be open to considerable interpretation.
Although there is ongoing discussion about the importance of reporting according to the CONSORT statement, there are very few reports that have comprehensively evaluated compliance with CONSORT, and we know of none in the setting of metastatic CRC. Although the purpose of our study was not to rectify this gap, we were able to make several observations. Baseline patient characteristics, including age and performance status, were poorly reported. The duration of follow-up was published in only 31% of the studies, and 11% of trials failed to mention the criteria they used to assess response. We also noted that the beginning of the time interval for measuring time-to-event endpoints was defined variously as the date of diagnosis, the date of enrolment onto the trial, or the first visit to the oncologist. Similar problems arose in defining the end of the time interval. In some studies, this was defined by measured progression, whereas, in others, it was defined as patient death (regardless of etiology) or the last known review. In many studies, this information was not provided. The consequence of a lack of standardization around measurement is that the time-to-event interval may appear shorter or longer, depending on the date selected for completion or initiation of the measurement. Although the constant variation of time-to-event definitions is not primarily a problem of trial quality, it may have an impact on the correct interpretation of surrogate markers. Readers must be aware that different classifications for the same endpoint often are used and commonly are combined in meta-analytic approaches. Strictly speaking, comparisons between trials that use the same surrogate endpoint but a different definition should not be made or at least should not be disclosed.
Our study has several significant limitations. First, the 3 time periods chosen for analysis were necessarily arbitrary. The period from 2000 onward was selected to capture the impact of the publication of CONSORT in 1996, and the period after 1990 reflected the period after the publications of guidelines regarding trial reporting.24, 25 The inclusion of very old studies also may have biased our conclusions; however, <10% of the studies we included were published before 1986. Second, comparisons between trials inevitably were compromised by the use of different response criteria and undisclosed variations in medical imaging techniques. Even with the use of guidelines, such as WHO and RECIST, consistency could not be guaranteed, especially given the measurement variations mentioned above. Third, by pooling the 5-FU trial arms, we have made the assumption that the route of administration or the method of administration does not alter the response rates.
In addition, we considered only published trials in our analysis; the inclusion of unpublished trials, for example, those with negative outcomes or poor quality designs, may have altered our conclusions. Finally, a small number of studies that were included in our analysis were phase 2 trials (14%) rather than phase 3 trials. Although phase 2 studies traditionally have different endpoints than phase 3 trials, all studies that were included in our analysis met predetermined criteria for inclusion, and phase was not 1 of the criteria.
In summary, we have highlighted several issues related to the quality and reporting of RCTs in ACRC that need to be considered when using these data in meta-analyses for constructing new statistical hypotheses or for benchmarking the potential impact of new therapies that have been evaluated only in single-arm trials. Compliance with the CONSORT statement would greatly improve reporting; however, our results lend strong support to the proposal of others26, 27 that greater attention is needed to the definitional issues around how and when time-to-event and response endpoints should be collected and reported.
CONFLICT OF INTEREST DISCLOSURES
Supported by the Cancer Council of New South Wales.