Secular changes in the quality of published randomized clinical trials in rheumatology

Abstract

Objective

To assess the quality of published randomized clinical trials (RCTs) in rheumatology and to determine whether there has been improvement in quality between 2 time periods, 1987–1988 and 1997–1998.

Methods

Using MEDLINE and a hand search of selected rheumatology journals, we identified RCTs of adult rheumatic diseases published in English in 1987–1988 or 1997–1998. We examined trial quality with an expanded version of the Jadad scale, which assesses the adequacy of reported random sequence generation, allocation concealment, blinding, and analysis. All trials were read by 1 reviewer, with prior standardization using a random sample read by 2 reviewers. We also evaluated “high”- versus “low”-impact journals based on citation index.

Results

Two hundred forty RCTs (119 from 1987–1988 and 121 from 1997–1998) were assessed. Results showed improvement in the quality of the trials, but the rates of reported random sequence generation, allocation concealment, power, and intent-to-treat analyses were persistently low. Low rates of reports of random sequence generation, allocation concealment, and intent-to-treat analyses were present even in the high-impact journals.

Conclusion

There has been improvement in the quality of reporting of RCTs in rheumatology between 1987–1988 and 1997–1998. However, methodologic problems such as lack of allocation concealment, inadequate random sequence generation, lack of reporting of power, and lack of intent-to-treat analyses remain common. Many of these problems are established sources of bias in RCTs and are easily rectifiable.

We rely on randomized clinical trials (RCTs) as the most reliable evidence of a treatment's efficacy. The methodologic quality of RCTs and their reporting should influence the way we interpret the evidence contained within them. In the mid-1990s, 2 independent international efforts joined forces to develop the CONSORT statement, a structured guide to reporting of RCTs in medical journals with the aim of improving the quality of reporting of published RCTs (1).

The objectives of our study were to assess the quality of published RCTs in rheumatology, to determine whether quality improved between 2 time periods (1987–1988 and 1997–1998), and to evaluate whether important problems remain in the reporting of clinical trial methods in rheumatology. The earlier period predates, and the later period postdates, the era of CONSORT and other published efforts to improve RCT reports in medicine in general and in rheumatology specifically.

Potentially important problems in trial methods include inadequate random sequence generation, lack of allocation concealment, and imperfect blinding, all of which have been shown to yield inflated estimates of a treatment's effect compared with trial reports without these problems (2, 3). Random sequence generation is a process designed to randomly allocate participants to each treatment group as a way of evenly distributing known and unknown confounding variables. Allocation concealment is a process distinct from blinding that conceals the random assignment sequence from the investigator and from the participant before and until the allocation of therapy, which is also intended to reduce selection bias. Blinding seeks to mask the participant, the assessor, or both from the nature of the intervention to avoid ascertainment bias.
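As an illustration of computer-based random sequence generation, the following is a minimal Python sketch. It assumes permuted-block randomization with 2 arms, which is only one of several acceptable approaches and is not described in the trials reviewed here; the function name `permuted_block_sequence`, the arm labels, the block size, and the seed are all hypothetical.

```python
import random


def permuted_block_sequence(n_participants, arms=("A", "B"), block_size=4, seed=None):
    """Build an allocation list from randomly permuted, balanced blocks.

    Hypothetical helper for illustration only; not taken from any trial
    protocol discussed in the text.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # a dedicated generator makes the list reproducible
    sequence = []
    while len(sequence) < n_participants:
        # Each block holds every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_participants]


# Example: a sequence for 12 participants, prepared before recruitment starts.
allocation = permuted_block_sequence(12, seed=2024)
print(allocation)
```

Generating the list is only half of the process. To conceal allocation, the list would be held by someone other than the recruiting investigator (for example, a central site or pharmacy) and revealed one assignment at a time.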

Problems with these methods have been associated with biased estimates of treatment effect. For example, the effect estimates for RCTs with inadequate allocation concealment were 37–41% greater than those with adequate allocation concealment (2, 3). In addition, published RCT reports that failed to describe allocation concealment at all had effect estimates that were biased to a similar degree as those with inadequate methods (3). Those RCTs with inadequate random sequence generation and inadequate blinding had effect estimates that were 5–11% and 17% greater, respectively, compared with those with adequate methods (2, 3). The frequency of these particular correctable deficiencies in rheumatology trials has not been assessed, and their enumeration served as an additional goal of this study. In addition to these problems, we focused on the reporting of power analyses in trials that have shown no differences between treatments, and we focused on the presence or absence of intent-to-treat analyses. Both of these issues reflect widely accepted elements of trial publication quality (1, 4).

MATERIALS AND METHODS

Search strategy. We sought RCTs on rheumatic diseases in adults that were published in English in 1987–1988 and 1997–1998. These included RCTs on osteoarthritis, rheumatoid arthritis (RA), fibromyalgia, systemic lupus erythematosus, systemic sclerosis, primary Sjögren's disease, vasculitis, Behçet's disease, gout, pseudogout, ankylosing spondylitis, psoriatic arthritis, and seronegative arthritis. RCTs that evaluated back pain and soft tissue rheumatism or only evaluated adverse effects were excluded.

The RCTs were found using a MEDLINE search that incorporated MeSH terms for “clinical trials,” the English language, and the diseases outlined above. In addition, we performed a hand search of 6 rheumatology journals: Annals of the Rheumatic Diseases, Arthritis & Rheumatism, British Journal of Rheumatology, Journal of Rheumatology, Osteoarthritis & Cartilage, and Scandinavian Journal of Rheumatology. A trial was designated to be a “randomized clinical trial” if the terms “randomization,” “randomly,” or “randomized” appeared in the title, abstract, or methods section, and if it was a prospective clinical trial with a parallel or crossover design. Abstracts were reviewed by 1 reviewer (CLH), and RCT reports were divided into those ineligible for the study and those for which the entire report needed to be reviewed. A total of 307 trials were reviewed for eligibility (by CLH), and a random subset was then reviewed by a second reviewer (DTF). There were no discrepancies between the 2 reviewers in the selection of the RCTs.

Evaluation of RCT quality. Each RCT report was assessed for quality using a version of the Jadad scale (5), modified to include more detailed information regarding the methods of allocation concealment and analysis (see Table 1). Analysis was classified according to the primary analysis undertaken in the Results section. After 2 reviewers (CLH and DTF) standardized the data extraction by using a sample of RCTs from other years, all trials were evaluated by 1 reviewer (CLH). A computer-generated random sample of RCTs (n = 11) from both time periods was evaluated by a second reviewer (DTF) to determine interobserver reliability for allocation concealment, double blinding, randomization, and intent-to-treat analysis (kappa 0.80 for all features combined, 95% confidence interval 0.64–0.97). In addition, we randomly selected a further subset from both time periods (n = 41) using a computer-generated random number list. These RCTs were blinded for authors, institution, journal, and year and then evaluated by the primary reviewer (CLH). Intraobserver reliability for blinded compared with nonblinded RCTs for allocation concealment, double blinding, randomization, and intent-to-treat analysis was 0.81 (95% confidence interval 0.70–0.91).
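The reliability figures above are kappa statistics, which correct raw agreement for the agreement expected by chance. The following Python sketch shows how an unweighted Cohen's kappa could be computed for 2 sets of ratings. The ratings are invented for illustration and are not the study data, and the text does not specify how the 4 features were combined or how the confidence intervals were derived, so neither is reproduced here.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions per label.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of allocation concealment for 11 trials (not study data).
reviewer_1 = ["adequate", "unclear", "unclear", "adequate", "unclear", "unclear",
              "unclear", "adequate", "unclear", "unclear", "unclear"]
reviewer_2 = ["adequate", "unclear", "unclear", "unclear", "unclear", "unclear",
              "unclear", "adequate", "unclear", "unclear", "unclear"]

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))
```

In this invented example the reviewers disagree on 1 of 11 trials, yet kappa is noticeably lower than the raw agreement of 10/11 because most trials fall into a single category.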

Table 1. Evaluation of methodologic quality

| Characteristic, quality | Description |
| --- | --- |
| Randomization sequence generation | |
| Adequate | Random number table, computer random number generation, coin tossing, shuffling cards, adaptive randomization |
| Inadequate | Case record number, alternation, date of admission, date of birth, even/odd, minimization |
| Unstated/unreported | |
| Allocation concealment | |
| Adequate | Central allocation (randomization or allocation occurring at separate site from participants), local pharmacy allocation, numbered or coded bottles, serially numbered opaque sealed envelopes |
| Unclear/unreported | |
| Double blinding | |
| Adequate | Use of active placebos, identical placebos, or dummies is mentioned. |
| Inadequate | |
| Unclear/unreported | |
| Description of withdrawals and dropouts | |
| Adequate | Description of participants who did not complete the observation period or who were not included in the analysis must be described. |
| Inadequate | No statement of withdrawals, or the description of withdrawals and dropouts does not distinguish between groups. |
| Analysis | |
| Intent-to-treat | All participants randomized were included in analysis. |
| Modified intent-to-treat | Analysis excluded participants who never received treatment or who were never evaluated while receiving treatment. |
| Completers analysis | Inclusion of only participants who completed treatment protocol. |
| Unclear/not done | |
| Manufacturer support* | |
| 0) No manufacturer support | |
| 1) Acknowledged grant support by a pharmaceutical manufacturer | |
| 2) Pharmaceutical employee listed as author | |
| 3) Stated the drug was supplied by manufacturer | |
| 4) Publication in journal supplement sponsored by a pharmaceutical manufacturer | |

* Categories defined by Rochon et al (see ref. 6).

Data extraction. Demographic data regarding the trial, including disease, country of origin, type of intervention, type of trial (parallel versus crossover), number of participants, and length of trial, were extracted. We also collected information regarding manufacturer support according to categories defined in a previous study by Rochon et al in 1994 (6).

Statistical analysis. Categorical data were analyzed using chi-square tests (or Fisher's exact test when numbers were small). Continuous measures were analyzed using t-tests or Wilcoxon tests for nonparametric data. P values reported are 2-sided. Analyses were undertaken comparing the 2 time periods (1987–1988 and 1997–1998) and comparing RCTs from “high”- and “low”-impact journals. The citation index for each journal in which an RCT in the study was published was determined from the 1998 Science Citation Index; a journal's citation index is a function of how often published articles from that journal are cited subsequently in other journal articles. RCTs from journals without a citation index were excluded from this analysis. RCTs were considered to be from high-impact journals if the citation index was above the median of the journals included in the study. The remainder were considered low impact.
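For readers who want to see how a comparison between the 2 time periods works, the following Python sketch computes a Pearson chi-square test for a 2 × 2 table. It assumes no continuity correction (the text does not say whether one was applied), and it uses the crossover trial counts from Table 2 purely as example input. The function name `chi_square_2x2` is hypothetical, and the result is not claimed to match any P value reported in this article.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table
    [[a, b], [c, d]]; returns (statistic, two-sided P value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Crossover versus other designs: 29 of 119 RCTs in 1987-1988, 10 of 121 in 1997-1998.
statistic, p_value = chi_square_2x2(29, 119 - 29, 10, 121 - 10)
print(f"chi-square = {statistic:.2f}, P = {p_value:.4f}")
```

When any expected cell count is small, this approximation becomes unreliable, which is why an exact test is the usual fallback.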

RESULTS

Of the 307 trials reviewed, 240 were included in the study, and these were almost equally divided between the 2 time periods (Table 2). The proportion of RCT reports evaluating treatments for RA decreased over the 10-year period, with an increase in trials of uncommon diseases such as connective tissue diseases and vasculitis. The proportion of RCTs involving drug interventions, and particularly nonsteroidal antiinflammatory drugs, decreased. There was an increase in the number of participants in drug intervention trials (median 46 in 1987–1988 [range 9–493] versus 88 in 1997–1998 [range 7–10,051]; P = 0.0001 by Wilcoxon rank test). There was also a secular increase in acknowledgment of manufacturer support in drug therapy trials and a reduction in those published in industry-sponsored supplements (Table 2). However, there was no difference between manufacturer-supported and non-manufacturer-supported RCTs in the adequacy of random sequence generation, allocation concealment, double blinding, or analysis (data not shown).

Table 2. Characteristics of trials*

| Type of trial | 1987–1988 (n = 119) | 1997–1998 (n = 121) |
| --- | --- | --- |
| Disease type | | |
| Rheumatoid arthritis | 55 (46.2) | 47 (38.8) |
| Osteoarthritis | 35 (29.4) | 38 (31.4) |
| Fibromyalgia | 5 (4.2) | 9 (7.4) |
| Connective tissue disease/vasculitis | 12 (10.1) | 20 (16.5) |
| Other | 12 (10.1) | 7 (5.8) |
| Drug therapy | 105 (88.2) | 90 (74.4) |
| NSAID (drug RCTs only) | 41/105 (39.0) | 20/90 (22.2) |
| Published in rheumatology journals | 60 (50.4) | 86 (71.1) |
| Crossover trials | 29 (24.4) | 10 (8.3) |
| Number of participants | | |
| Mean ± SD | 81.7 ± 93.3 | 323.1 ± 1,243.6 |
| Median (range)§ | 46 (9–493) | 88 (7–10,051) |
| Length of trial (months) | | |
| Mean ± SD | 4.5 ± 4.3 | 7 ± 9.3 |
| Median (range) | 3 (0.1–24) | 3 (0.25–60) |
| Manufacturer support (drug RCTs only) | | |
| No manufacturer support | 43/105 (41.0) | 27/90 (30.0) |
| Acknowledged grant from pharmaceutical manufacturer | 18/105 (17.1) | 47/90 (52.2) |
| Pharmaceutical employee listed as author | 11/105 (10.5) | 10/90 (11.1) |
| Drug supplied by manufacturer | 20/105 (19.0) | 4/90 (4.4) |
| Publication in journal supplement sponsored by a pharmaceutical manufacturer | 13/105 (12.4) | 2/90 (2.2) |

* Values are the no. (%) of randomized clinical trials (RCTs). NSAID = nonsteroidal antiinflammatory drug.
† P = 0.02 by chi-square test.
‡ P = 0.001 by chi-square test.
§ P = 0.0001 by Wilcoxon rank test.

Although there were modest improvements in methods, the problems with inadequate random sequence generation, lack of allocation concealment, and lack of intent-to-treat analyses remained common (Table 3). There were small numbers of RCTs in which random sequence generation was adequately performed; however, in most RCT reports (89.9% in 1987–1988, 79.3% in 1997–1998), the method of randomization was not stated. Similarly, in most RCT reports (87.4% in 1987–1988, 80.2% in 1997–1998), the method of allocation concealment was not described. When reported, central allocation was the most common form of allocation concealment used in both time periods (5.9% in 1987–1988, 12.4% in 1997–1998). Pharmacy allocation, numbered and coded bottles, and opaque envelopes were less commonly used forms of allocation concealment. In most double-blind trials, double blinding was both adequately described and performed.

Table 3. Quality characteristics based on year of publication and impact factor in both time periods combined (21 RCTs from journals without citation index excluded)*

| Characteristic | Year of publication: 1987–1988 (n = 119) | Year of publication: 1997–1998 (n = 121) | Impact factor: low impact (n = 73) | Impact factor: high impact (n = 146) |
| --- | --- | --- | --- | --- |
| Random sequence generation | 8.4 | 17.4 | 8.2 | 14.4 |
| Allocation concealment | 11.8 | 19.0 | 6.9 | 21.9‡ |
| Double blinding (% of double-blinded RCTs) | 73.7 | 85.5 | 82.7 | 81.5 |
| Description of dropouts and withdrawals | 58.0 | 58.7 | 56.7 | 59.6 |
| Power analysis (% of RCTs with negative results) | 9.8 | 35.1 | 15.8 | 26.5 |
| Intent-to-treat analysis | 19.3 | 29.8 | 24.7 | 26.0 |

* Values are the % of randomized clinical trials (RCTs).
† P = 0.05 versus 1987–1988.
‡ P = 0.005 versus low-impact RCTs.

Of the 240 RCTs included in the study, 146 were considered to be from high-impact journals and 73 from low-impact journals, with 21 RCTs excluded from this analysis for being published in journals without a citation index. Report of adequate random sequence generation was uncommon in both the high- and the low-impact journals (Table 3). However, although report of adequate allocation concealment was uncommon in both groups, it was reported more often in high-impact journals (P = 0.005) (Table 3).

There were definite improvements in analysis of RCT data. For example, the proportion of trial reports using intent-to-treat analyses increased from 19.3% (1987–1988) to 29.8% (1997–1998), and those with a completers analysis as the sole analysis presentation became less common (42.0% versus 23.0%, respectively; P = 0.018). Nonetheless, analyses in the majority of RCTs, even in 1997–1998, were not done on an intent-to-treat basis. A minority of RCTs in both time periods (1987–1988 11.8%, 1997–1998 10.7%) did not perform any statistical comparison of the 2 intervention groups. The description of the analysis in the Methods section in some RCTs either did not specify the type of analysis that was done or was misleading. For example, in 1997–1998, 46 RCT reports (38.0%) stated that an intent-to-treat analysis would be performed, but this was only done in 36 (29.8%). In a further 25 RCTs in 1997–1998, a modified intent-to-treat analysis was performed, which excluded participants who never received treatment or who were never evaluated while receiving treatment.

Another secular improvement noted was a significant increase in the number of RCTs with a negative result that reported power analyses. However, in 1997–1998, 65% of “negative” trials (those in which the treatment comparisons showed no statistically significant difference) still failed to provide evidence on whether there was adequate statistical power to detect meaningful differences (Table 3). There were no differences between RCTs published in high- and low-impact journals in the performance of an intent-to-treat analysis (high impact 26.0% versus low impact 24.7%) or in the reporting of power calculations in “negative” RCTs (high impact 26.5% versus low impact 15.8%).

DISCUSSION

Methodologic problems such as lack of allocation concealment, inadequate random sequence generation, and lack of reporting of power and intent-to-treat analyses are common in rheumatology RCT publications, even in high-impact journals. Reporting of methods of random sequence generation and allocation concealment remains infrequent, whereas there have been substantial improvements in the inclusion of intent-to-treat analyses. Our results imply that rheumatology RCT reports could be improved by implementation of reporting guidelines for trials. These guidelines can be found at the CONSORT Web site (www.consort-statement.org).

Lack of allocation concealment, which gives rise to the largest inflation of effect estimates, is the most common problem. The low proportion of RCTs describing adequate allocation concealment is comparable with other specialties. For example, in a study of perinatal RCTs, adequate allocation concealment was present in only 31.6% (3). In a study of 73 RCTs published in the Archives of Dermatology between 1976 and 1997, only 1% described adequate randomization, 3% adequate allocation concealment, and 6% intent-to-treat analyses (7). A previous study of secular trends in published clinical trials of disease-modifying antirheumatic drugs in RA from the time periods 1945–1969, 1970–1979, and 1980–1989 showed no change in the amount of information given on eligibility criteria, random allocation, method of randomization, or blinding. However, there was an increase in the description and complexity of statistical methods used (8).

The use of the intent-to-treat analysis, i.e., analysis of all those randomized for treatment in the trial, constitutes the most valid analytic approach in a randomized trial. Exclusion of randomized subjects can lead to an overestimation of clinical effectiveness (9). Although we have shown that in rheumatology RCTs, analytic approaches have improved, use of the intent-to-treat approach is still reported in fewer than half of the trials, even in high-impact journals. Furthermore, in rheumatology RCTs we found that, even if the RCT publication describes an intent-to-treat analysis, this description is often misleading. A study of 249 RCTs published in 1997 in the Lancet, British Medical Journal, New England Journal of Medicine, and Journal of the American Medical Association demonstrated that the intent-to-treat approach was often inadequately described and applied. Of the 119 RCTs that stated that intent-to-treat analysis was performed, 13% did not actually perform this analysis and there was a wide variation in the handling of missing data (10).
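The difference between an intent-to-treat analysis and a completers analysis can be made concrete with a small Python sketch. All data below are invented for illustration. The sketch also assumes that participants who dropped out are counted as nonresponders, which is only one of several ways of handling missing outcomes.

```python
def response_rate(participants, arm, completers_only=False):
    """Proportion of responders in one arm.

    completers_only=False keeps everyone who was randomized (intent-to-treat);
    completers_only=True keeps only those who finished the protocol.
    """
    group = [p for p in participants if p["arm"] == arm]
    if completers_only:
        group = [p for p in group if p["completed"]]
    if not group:
        raise ValueError(f"no participants to analyze in arm {arm!r}")
    return sum(p["responded"] for p in group) / len(group)


def make_arm(arm, completer_responders, completer_nonresponders, dropouts):
    """Build hypothetical participant records; dropouts count as nonresponders."""
    rows = [{"arm": arm, "completed": True, "responded": True}] * completer_responders
    rows += [{"arm": arm, "completed": True, "responded": False}] * completer_nonresponders
    rows += [{"arm": arm, "completed": False, "responded": False}] * dropouts
    return rows


# Invented trial: 10 participants randomized per arm, more dropouts on treatment.
trial = make_arm("treatment", 5, 1, 4) + make_arm("control", 4, 5, 1)

itt_difference = response_rate(trial, "treatment") - response_rate(trial, "control")
completers_difference = (response_rate(trial, "treatment", completers_only=True)
                         - response_rate(trial, "control", completers_only=True))
print(f"intent-to-treat difference: {itt_difference:.2f}")
print(f"completers difference:      {completers_difference:.2f}")
```

In this invented trial, dropping the participants who withdrew makes the treatment look considerably better than it does when everyone randomized is analyzed.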

RCTs that report no difference between treatment groups should ideally present results of a power analysis (11), which would provide information as to whether the study had a sufficient number of subjects to detect a likely treatment effect. Unfortunately, fewer than one-half of null (negative) trials provide such information.
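To show what such a power analysis involves, the following Python sketch approximates the power of a 2-sided comparison of 2 response proportions with equal group sizes. It uses a standard normal approximation and ignores the negligible opposite tail. The response rates of 50% and 30% are assumptions chosen for illustration, and the group size of 23 simply halves the median trial size of 46 participants reported for 1987–1988.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power to detect p1 versus p2 with n_per_group per arm
    (normal approximation, two-sided test at level alpha)."""
    if not (0 < p1 < 1 and 0 < p2 < 1) or n_per_group <= 0:
        raise ValueError("proportions must lie in (0, 1) and n_per_group must be positive")
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar))          # spread assumed under no difference
    se_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))     # spread under the true difference
    z = (abs(p1 - p2) * sqrt(n_per_group) - z_alpha * se_null) / se_alt
    return normal.cdf(z)


# Assumed response rates of 50% versus 30% (hypothetical values).
small_trial = power_two_proportions(0.5, 0.3, n_per_group=23)
larger_trial = power_two_proportions(0.5, 0.3, n_per_group=100)
print(f"23 per group:  power = {small_trial:.2f}")
print(f"100 per group: power = {larger_trial:.2f}")
```

Under these assumptions a trial of the 1987–1988 median size would be expected to miss a 20-percentage-point difference more often than it detects it, which is why a "negative" result without a power statement is difficult to interpret.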

Major medical journals, recognizing biases in reporting of industry-sponsored trials, have recently mandated that first authors of reports of these trials have access to trial data and the opportunity to analyze these data (12). Although we have addressed different issues in this investigation, we share the dilemma being addressed by those editors, in that we too recognize that there is bias in reporting of data from randomized trials. In both cases, we seek to lessen that bias so that the data reported provide valid evidence on treatment efficacy.

Our findings of trial report deficiencies could be due either to actual limitations in the methods of trials or to incomplete reporting of these methods (or both). The problem of adequate reporting of trial methods is a dual responsibility of editors and authors. Journals can require a standard of reporting of RCT methods to be followed by authors. A number of general medical and specialty medical journals already adhere to the CONSORT statement as one means of ensuring this. A recent study showed that RCTs in 3 general medical journals that adopted the CONSORT guidelines showed improvement in the overall trial quality score based on the Jadad scale and showed improvement in the reporting of allocation concealment between 1994 and 1998, whereas the comparator journal (New England Journal of Medicine), which did not adopt the guidelines, had no change (13). Following the original CONSORT guidelines would have resulted in improvements in most of the specific problems identified in rheumatology trials. Adherence to such a protocol need not put extra burden on peer reviewers; instead, a checklist could be completed by the investigators or editorial staff. Such a checklist is easily available from the CONSORT Web site (www.consort-statement.org).

Without complete descriptions of methods, RCTs should not be published, if we are to avoid already documented biases. The use of adequate trial methods requires planning at the implementation phase of the RCT. Since it is possible to implement adequate methods of random sequence generation and allocation concealment in any trial setting, including non-drug therapy and open trials, there is no reason that these important sources of biases could not be reduced. In addition, the use of inappropriate analysis can further cloud the true effect of the treatment. As we seek more marginal benefits of interventions, particularly when comparing 2 therapeutic interventions, these types of preventable methodologic biases become even more critical (14). The goal of both trial investigators and editors should be to provide the most explicit account of the RCT to the wider rheumatology community so that the appropriate use of the information for our patients can be made.
