Potential conflict of interest: Nothing to report.
To determine the current quality of reporting of randomized clinical trials (RCTs) in the field of gastroenterology and hepatology, we evaluated the methodological reporting of RCTs in six major gastroenterology and hepatology journals. Relevant trials published in 2006 were identified by searching MEDLINE using a highly sensitive search strategy. For each trial, we retrieved the reported methodological quality (generation of the allocation sequence, allocation concealment, double-blinding, and sample size calculation), the number of patients, the disease area, and the funding source. The status of reporting the methodological quality of RCTs was descriptively reported. One hundred five trials were included in the final analysis; of these, 81% (85/105) reported adequate generation of the allocation sequence, 61% (64/105) reported adequate allocation concealment, 51% (54/105) were double-blind, and 75% (79/105) reported adequate sample size calculation. The reported methodological quality greatly improved when compared with historical cohorts. Conclusion: This study shows that there was substantial improvement in the reported methodological quality in the major gastroenterology and hepatology journals, but this quality can be further improved. (HEPATOLOGY 2009.)
In the era of evidence-based medicine, high-quality health care means a practice that is consistent with the best available evidence. In the hierarchy of clinical evidence, randomized controlled trials (RCTs) are generally considered to be the best means of ascertaining the value of a particular therapy; therefore, well-designed and well-conducted RCTs are invaluable for health practitioners in clinical decision making.1 However, flaws in the randomization methods of RCTs can lead to overestimation of intervention benefits by 30%,2–4 and trials without double-blinding may exaggerate them by 14%.2–4 To maintain a high standard and improve the quality of RCTs, a revised CONSORT statement was published in 2001,5 and it was subsequently adopted by many high-impact journals.6, 7
A methodological assessment of RCTs in several major gastroenterology and hepatology journals prior to 2001 has been published,8–11 but there have been no similar reports since then. Whether the quality of methodological reporting has been improved remains to be established. To address these issues, we systematically evaluated the methodological reporting of RCTs in six major gastroenterology and hepatology journals in 2006.
Abbreviation: RCT, randomized clinical trial.
Materials and Methods
This study included all RCTs published as full text articles in Gastroenterology, Hepatology, Gut, Journal of Hepatology, American Journal of Gastroenterology, and Clinical Gastroenterology and Hepatology in 2006. We decided to include trials published in these journals because they are leading gastroenterology and hepatology journals, and their methodological reporting has not been systematically studied since 2001. We decided to study the trials published in 2006 because the revised CONSORT statement was published in 2001 to improve the quality of reports of RCTs.5
Trials were considered to be RCTs if the words “random,” “randomly,” “randomization,” or “randomized” were used to describe the allocation method in the text. Trials published as abstracts, quasi-randomized trials, trials including animals or volunteers, trials dealing with subgroups of patients from RCTs or long-term follow-up of RCTs, or trials not reporting the outcomes of randomized patients were excluded. The relevant trials were identified by searching MEDLINE using a highly sensitive search strategy developed and validated by Robinson and Dickersin,12 and all phases of the search strategy are shown in Table 1. The search strategy we used was slightly different from the original one, because each year the US National Library of Medicine makes changes and additions to the MeSH and Publication Type terms, and evaluation studies and comparative studies are now treated as Publication Types, not MeSH terms (K. A. Robinson, personal communication). Two of the authors (B. Y. and G. J.) then hand-searched all the issues in the six journals to check if any potential trial was missing.
Table 1. All Phases of the Search Strategy to Identify Potential Relevant Trials in MEDLINE
(randomized controlled trial [pt] OR controlled clinical trial [pt] OR randomized controlled trials [mh] OR random allocation [mh] OR double-blind method [mh] OR single-blind method [mh] OR clinical trial [pt] OR clinical trials [mh] OR (“clinical trial” [tw]) OR ((singl* [tw] OR doubl* [tw] OR trebl* [tw] OR tripl* [tw]) AND (mask* [tw] OR blind* [tw])) OR (“latin square” [tw]) OR placebos [mh] OR placebo* [tw] OR random* [tw] OR research design [mh:noexp] OR comparative study [pt] OR evaluation studies [pt] OR follow-up studies [mh] OR prospective studies [mh] OR crossover studies [mh] OR control* [tw] OR prospective* [tw] OR volunteer* [tw]) NOT (animal [mh] NOT human [mh])
Methodological quality was critically appraised according to the following elements as described in the Cochrane Handbook for Systematic Reviews of Interventions13: (1) generation of the allocation sequence, classified as adequate (e.g., computer-generated random numbers) or unclear (not described); (2) allocation concealment, classified as adequate (e.g., central independent unit, sealed envelopes), unclear (not described), or inadequate (e.g., open table of random numbers); (3) double-blinding, classified as adequate (e.g., participants, caregivers, outcome assessors, and analysts unaware of treatment allocation by identical placebo), unclear (not described), inadequate (e.g., tablets versus injections), or not double-blinded. In addition, we assessed the sample size calculation (yes = clearly reported; no = not reported, or power and sample size not specified). A positive result was defined as one intervention being significantly better than the other; a negative result was defined as one intervention not being significantly better than the other. The interobserver agreement between the two authors (B. Y. and G. J.) was rated by calculating the kappa value. Any disagreement was resolved by discussion between the two reviewers (B. Y. and G. J.); if the disagreement could not be resolved by discussion, the opinion of the senior reviewer (D.-W. Z.) was sought.
This study was designed to describe the current status of reported methodological quality in six major gastroenterology and hepatology journals; it was not designed to detect significant changes in the reported methodological quality relative to historical cohorts. Statistical comparisons between cohorts were therefore not performed. Descriptive statistics (mean, standard deviation, median) were employed to summarize selected findings. All statistical analyses were performed with SPSS version 11.0 for Windows (SPSS Inc., Chicago, IL).
A total of 1,247 studies from the six journals were retrieved; the process of selecting eligible trials is shown in Fig. 1. Among these 1,247 studies, 1,130 were excluded because they had a nonrandomized design; were animal studies; were letters or reviews; or were extension, subgroup, or pooled analyses of RCTs. Another 12 studies were excluded because they enrolled volunteers. There were 105 trials (reported in 104 articles) eligible for the analysis, the principal characteristics of which are summarized in Table 2. Generally, there were more trials with positive results (64/105) than with negative results (41/105). There were 59 (56%) trials from Europe, 18 (17%) trials from Asia, and 30 (29%) trials from the United States and Canada.
Table 2. Principal Characteristics of the Included Trials
Column headings: No. of Trials (%); 95% Confidence Interval
Row labels: Effect of intervention; Public and industry
The 105 trials included a median of 60 patients per arm (range, 8–2587; 25th percentile, 50 patients; 50th percentile, 119 patients; 75th percentile, 205 patients; 95th percentile, 684 patients). The included trials were categorized into 17 disease areas. Eighteen trials dealt with irritable bowel syndrome, 14 with hepatitis C virus, eight with inflammatory bowel diseases, seven with gastroesophageal reflux disease, five with liver cirrhosis, five with colonoscopy, five with hepatitis B virus, four with Helicobacter pylori, three with Barrett esophagus, three with hepatocellular carcinoma, three with peptic ulcer bleeding, three with variceal bleeding, two with colorectal adenomas, two with peptic ulcers, two with nonalcoholic fatty liver disease, and 21 with miscellaneous conditions.
Kappa values for the interobserver agreement between the two reviewers (B. Y. and G. J.) were 0.87 for generation of allocation sequence, 0.81 for allocation concealment, 0.88 for double-blinding, and 0.87 for sample size calculation. All of these values indicate almost perfect or substantial agreement.
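As an illustration of how a kappa value of this kind is obtained, the following is a minimal sketch of Cohen's kappa for two reviewers, written in Python. The ratings are hypothetical and are not data from this study, which used SPSS; the function name `cohens_kappa` is ours.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who classified the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of ten trials (NOT data from this study).
reviewer_1 = ["adequate"] * 6 + ["unclear"] * 4
reviewer_2 = ["adequate"] * 5 + ["unclear"] * 5
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

Kappa corrects the raw agreement rate for the agreement expected by chance, which is why it is preferred over simple percent agreement when one category dominates.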
Generation of the allocation sequence was adequate in 85 of the 105 trials and unclear in the remaining 20 trials. The allocation concealment was adequate in 64 trials and unclear or inadequate in 41 trials. Double-blinding was adequate in 54 trials and unclear or absent in 51 trials. Of the trials without adequate double-blinding, 57% (29/51) dealt with medication; meanwhile, of 83 medical trials, only 65% (54/83) were double-blinded. Adequate sample size calculation was reported by 79 trials; 51 trials set the power at 80%, and five trials set the power at 95% or higher; the other 49 trials set the power between 80% and 95%. When the trials were stratified, industry-sponsored, large-scale (n > 100), multicenter studies generally had better reporting of methodological quality than public-sponsored, small-scale (n < 100), single-center studies (Table 3). A multivariate logistic regression analysis revealed that the number of patients (n > 100) was an independent determinant of adequate allocation sequence generation (P < 0.001) and adequate sample size calculation (P = 0.036); the number of participating centers (n > 1) was an independent determinant of adequate double-blinding (P = 0.014); and there was no independent determinant of adequate allocation concealment. Compared with previous reports,8–10 the reported methodological quality of RCTs in the major gastroenterology and hepatology journals had improved in 2006 (Table 4).
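The proportions above are reported with 95% confidence intervals in the tables. The text does not state which interval method was used, so the following Python sketch assumes the Wilson score interval purely for illustration; the function name `wilson_interval` is ours.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score interval for a binomial proportion.

    Assumption: the article does not name its interval method; Wilson is
    used here only as a common, well-behaved choice.
    """
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the text: adequate sequence generation in 85 of 105 trials.
low, high = wilson_interval(85, 105)
print(f"85/105 = {85 / 105:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

With only 105 trials the interval spans well over ten percentage points, which is worth keeping in mind when comparing these figures with historical cohorts.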
Table 3. Reporting of Methodological Quality of RCTs in Major Gastroenterology and Hepatology Journals in 2006 According to Different Strata
Measures: Adequate Allocation Sequence Generation; Adequate Allocation Concealment; Adequate Sample Size Calculation
Strata: Single-center (n = 39); Multicenter (n = 66); No. of patients >100 (n = 59); No. of patients <100 (n = 46); Industry (n = 34); Public (n = 32)
Abbreviation: CI, confidence interval.
Table 4. Reporting of Methodological Quality of RCTs in Major Gastroenterology and Hepatology Journals in Different Study Periods
Measures: Adequate allocation sequence generation (95% CI); Adequate allocation concealment (95% CI); Adequate double-blinding (95% CI); Adequate sample size calculation (95% CI)
Abbreviations: CI, confidence interval; NS, not specified.
The present study revealed that there was substantial improvement in the reported methodological quality in the major gastroenterology and hepatology journals when compared with earlier studies on methodological quality. Nonetheless, in the study period, 19% of all RCTs did not report adequate generation of the allocation sequence, 39% did not report adequate allocation concealment, 49% were not adequately double-blinded, and 25% did not report adequate sample size calculation. These findings suggest that there is still room for improvement in the reporting of the methodological quality of RCTs.
This study had some limitations. First, the 2005 edition of the Cochrane Handbook was the only edition available during manuscript preparation; a 2008 edition is now available that deals with more methodological components associated with risks of bias.14 Second, because of the small sample size, we gave the proportions with 95% confidence intervals, whose width calls for more modest optimism regarding the observed improvement. Third, the accuracy of the methodological quality assessment is affected by the quality of the reporting; a trial with severe methodological limitations may be considered an excellent trial if the limitations were not reported, whereas a well-designed and well-conducted trial may be considered to carry substantial bias if it was inadequately reported.15 Readers of medical journals should note the difference between “not done” and “not reported” in the reporting of trials. For example, if the authors of a report do not mention the use of a computer to generate random numbers, readers should not assume that this was not done. On the other hand, the reported use of a computer-generated sequence does not necessarily mean that it was adequately done. However, previous research has shown that the quality of the design and conduct of RCTs is positively related to the reporting quality.16 Thus, our study may at least partly reflect the actual methodological quality of RCTs in the six major gastroenterology and hepatology journals.
It is well known that up-to-date systematic reviews and meta-analyses of RCTs represent the highest level in the hierarchy of evidence.17 However, they may yield misleading conclusions if the included RCTs suffer from severe methodological flaws, and evidence has shown that such RCTs may introduce bias; therefore, it is extremely important to measure the quality of RCTs. In general, there are two kinds of systems for assessing quality. The first kind comprises systems that employ quality scales, in which relevant items are assigned numerical points that are then added to give a summary score; many such checklists and quality scales have been developed. Nevertheless, it has been demonstrated that the use of summary scores to identify trials of high quality is often highly misleading and should be avoided altogether.18 This is also the recommendation in the 2008 Cochrane Handbook.14 Thus, we decided to use the second kind of system, which simply evaluates whether important components are reported, including generation of the allocation sequence, allocation concealment, and double-blinding.2, 3
The process of randomization is designed to provide the following important advantages: (1) avoidance of investigator bias, (2) appropriately balanced arms, and (3) analysis of study results without statistical modeling assumptions. In the present study, adequate generation of the allocation sequence was reported by 81% of all RCTs, a higher proportion than in previously published reports.8–10
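To make "computer-generated" sequence generation concrete, the following Python sketch produces a permuted-block allocation sequence, one common way to obtain balanced arms. It is an illustration only: the included trials did not necessarily use this method, and the function name, arm labels, and block size are hypothetical.

```python
import random


def permuted_block_sequence(n_participants, arms=("A", "B"), block_size=4, seed=None):
    """Generate an allocation sequence using randomly permuted blocks.

    Every complete block contains each arm equally often, so the arm sizes
    can never differ by more than block_size / len(arms).
    """
    if n_participants < 0:
        raise ValueError("n_participants must be non-negative")
    if block_size <= 0 or block_size % len(arms) != 0:
        raise ValueError("block_size must be a positive multiple of the number of arms")
    rng = random.Random(seed)  # a recorded seed makes the sequence reproducible
    per_arm = block_size // len(arms)
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * per_arm
        rng.shuffle(block)  # random order within the block
        sequence.extend(block)
    return sequence[:n_participants]


# Hypothetical example: 20 participants, two arms, blocks of four.
allocation = permuted_block_sequence(20, seed=2006)
print(allocation.count("A"), allocation.count("B"))
```

Generating the sequence is only half of the safeguard: the list must also be kept from the staff who enrol patients, which is the separate issue of allocation concealment.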
An empirical study has shown that absence of adequate allocation concealment is related to potential bias.19 Therefore, allocation concealment has been considered to be more important than other components of allocation (such as the generation of the allocation sequence) in reducing bias.20 Thus, studies can be judged on the method of allocation concealment. However, there are different attitudes about the importance of allocation concealment.21, 22 Our study revealed that 61% of all trials reported adequate allocation concealment. The reporting of allocation concealment can therefore be further improved: although it is not always feasible to conduct a double-blind trial (especially in the surgical or endoscopic field), it is always possible to adequately conceal the allocation sequence by using a remote telephone randomization service, a central independent unit, or sequentially numbered, opaque, sealed envelopes. According to the 2008 edition of the Cochrane Handbook, of the different methods of allocation concealment, those using envelopes are more susceptible to manipulation than other approaches. If investigators use envelopes, they should develop and monitor the allocation process to preserve concealment.14
It is also clear that some biases can be eliminated or diminished by double-blinding. However, about half of all the trials did not employ adequate double-blinding, even though 79% (83/105) of all trials dealt with medical intervention. This indicates that investigators need to place more emphasis on this aspect of methodological quality control in order to minimize potential bias.
The statistical power of a trial is its chance of detecting a clinically important difference between the active treatment and the control arm when that difference actually exists. If a trial cannot demonstrate a statistically significant benefit, an erroneous conclusion may be drawn that the treatment is not beneficial, when in fact the trial did not enroll enough patients. Therefore, it is essential to take this key component into consideration when evaluating the methodological quality of a trial. However, the level of power is not interpretable without reference to the effect size to be detected. Thus, a 95%-powered trial is not necessarily better than an 80%-powered trial if the effect size is much larger in the former trial. We found that 25% of all trials did not report sample size calculation. In addition, over half of the RCTs with negative results presented at American Society of Clinical Oncology annual meetings did not have an adequate sample size to detect a medium-sized treatment effect.23 Similarly, in our study we found that 24% (10/41) of RCTs with negative results did not report sample size calculation. Therefore, we recommend that future researchers clearly describe the process of sample size calculation so that the results of negative RCTs can be interpreted appropriately and the probability of missing an important therapeutic improvement may be reduced.
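The dependence of sample size on both power and effect size can be sketched with the standard normal-approximation formula for comparing two proportions. This is a textbook approximation, not the calculation used by any of the included trials, and the response rates below are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Patients needed per arm to compare two proportions (two-sided test).

    Normal approximation with unpooled variance:
        n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    if not (0 < p_control < 1 and 0 < p_treatment < 1):
        raise ValueError("proportions must lie strictly between 0 and 1")
    if p_control == p_treatment:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2
    return ceil(n)


# Hypothetical response rates, chosen only to show the trade-off.
large_effect_95 = sample_size_per_arm(0.50, 0.70, power=0.95)
small_effect_80 = sample_size_per_arm(0.50, 0.60, power=0.80)
print(large_effect_95, small_effect_80)
```

In this sketch the 95%-powered trial aimed at a 20-point difference needs fewer patients than the 80%-powered trial aimed at a 10-point difference, which is why a stated power means little unless the targeted effect size is reported alongside it.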
In conclusion, since the revised CONSORT statement was published in 2001, the reported methodological quality of RCTs in major gastroenterology and hepatology journals has improved substantially. Nevertheless, we hope that editors and authors take note of our recommendations and pay more attention to the reporting of methodological issues in order to maintain the RCT as the gold standard for evaluation of intervention efficacy.
We are indebted to the authors of the primary studies; without their contributions, this work would have been impossible. We also thank Li Feng, M.D., Division of Gastroenterology and Hepatology, Department of Medicine, University of Maryland Medical Center, for his comments and advice regarding the early version of this manuscript.