Keywords:

  • biostatistics;
  • outcome event;
  • surgical trial

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Methods
  5. Results
  6. Discussion
  7. References

Background

Surgical trials sometimes fail to clearly identify the primary outcome events of interest. This results in trials that are diffuse and difficult to interpret.

Objective

The objective of this study was to systematically review the use of outcome events in surgical trials.

Data sources

Surgical trials published between 1 January 2007 and 30 June 2010 in 26 peer-reviewed journals representing a wide range of specialty interests were used in this study.

Review methods

Copies of all potentially relevant articles were scrutinized to identify the admissible surgical trials. Two investigators experienced in health research methods used a standardized form to extract discrete information (i.e. it was an ‘identifying and counting’ exercise that did not require subjective evaluations). All forms were double-checked.

Results

Twenty-four per cent (130 out of 531) of the trials failed to declare the primary outcome events – 11% (56 out of 531) of the trials indicated the primary outcome events in the abstract, but not in the body of the article. The compliant trials used a median of three primary outcome events (interquartile range: 2–5, absolute range: 1–17), and a median of 19 statistical comparisons (interquartile range: 9–32, absolute range: 1–130). Only 2% (11 out of 531) of the trials made an adjustment for the multiple testing of statistical significance (9 of these trials declared a single primary outcome event). Composite outcome events appeared in 9% (48 out of 531) of the trials and these studies contained a median of 24 statistical comparisons.

Conclusions

Many surgical trials fail to clearly define the specific outcome events of interest, and this is often accompanied by a subversive number of statistical comparisons.


Introduction

Outcome events, or endpoints as they are also called, are the measurements made by investigators to evaluate the results of clinical trials.[1] The primary outcome measures should be directly related to the main aim of the study, and when planning clinical trials, they are used to estimate the number of patients required to obtain a reliable result. Most clinical trials include secondary outcome events that evaluate peripheral issues – clinical trials are onerous and expensive to perform, so investigators use them to gain as much potentially useful information as possible. The CONSORT (CONsolidated Standards of Reporting Trials) statement requires ‘completely defined pre-specified primary and secondary outcome measures, including how and when they were assessed’.[2] Compliance with these criteria allows the readers to separate the key issues from the fringe topics. Failure to comply with these requirements inevitably results in clinical trials that are diffuse and difficult to interpret.

The aim of this study was to systematically review the use of outcome events in surgical trials. We paid particular attention to the declaration of primary outcome events, the number and nature of the outcome events, the length of follow-up and the number of statistical comparisons.

Methods

A systematic review was conducted of surgical trials published between 1 January 2007 and 30 June 2010 in 26 peer-reviewed journals (American Journal of Surgery, Annals of Internal Medicine, Annals of Plastic Surgery, Annals of Surgery, Annals of Thoracic Surgery, ANZ Journal of Surgery, Archives of Surgery, British Journal of Surgery, British Medical Journal, Canadian Journal of Surgery, Diseases of the Colon & Rectum, European Journal of Surgical Oncology, Journal of the American College of Surgeons, Journal of the American Medical Association, Journal of Bone & Joint Surgery (USA), Journal of Paediatric Surgery, Journal of Trauma, Journal of Vascular Surgery, Lancet, Neurosurgery, New England Journal of Medicine, Otolaryngology Head & Neck Surgery, Surgery, Breast, Urology and World Journal of Surgery). These are readily available international journals that have a high impact factor for their specialty. We inspected all of the relevant issues of the journals: we did not rely on databases, such as Medline, to identify articles. Digital and paper copies were made of all potentially relevant articles. These candidate articles were scrutinized to identify the admissible surgical trials, which were defined as prospective studies that: (i) evaluated the effect of an intervention on health outcomes after the patients were randomized into groups, and (ii) considered a topic directly relevant to the practice of surgery (i.e. they did not include interventions performed by physicians and radiologists).

Two investigators (JLH and JCH) experienced in health research methods used a standardized form to extract data. This form contained identifying material and information about the outcome events (the declaration of primary and secondary endpoints, the number and nature of the outcome events, the length of follow-up and the number of statistical comparisons). Words such as ‘main’ or ‘principal’ were accepted as being indicative of primary outcome events: words such as ‘other’ or ‘additional’ were accepted as being indicative of secondary outcome events. Non-clinical endpoints were only considered in this context (i.e. we did not include investigations that were used incidentally to monitor therapy). We diminished the risk of observer variation by only collecting discrete information (i.e. it was an identifying and counting exercise that diminished the need for subjective evaluations). We double-checked the data for each article and resolved discrepancies by discussion.

The scope and magnitude of the study make it representative of the recent surgical literature. Hence, this study is descriptive in nature and we have refrained from using comparative statistics. Data are described using the median, interquartile range and absolute range statistics.
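
As an illustration only, the three descriptive statistics named above could be computed as in the following minimal Python sketch. The data and the function name `describe` are hypothetical and are not taken from the study. The 'inclusive' quartile convention is also an assumption, because the article does not state which convention was used.

```python
from statistics import median, quantiles


def describe(counts):
    """Return the median, interquartile range and absolute range of a list of counts."""
    # The 'inclusive' method is one of several quartile conventions;
    # other conventions give slightly different cut points.
    q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    return {
        "median": median(counts),
        "iqr": (q1, q3),
        "range": (min(counts), max(counts)),
    }


# Hypothetical numbers of statistical comparisons in seven trials (not study data).
comparisons = [1, 9, 12, 19, 25, 32, 130]
summary = describe(comparisons)
print(summary)
```

Reporting the interquartile range alongside the absolute range keeps a single extreme trial, such as the hypothetical one with 130 comparisons, from dominating the summary.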

Results

There were 590 candidate trials. Close scrutiny led to the exclusion of 59 trials – 36 did not use clinical endpoints (11 concerned education/training and 7 were about devices or prostheses), 11 were not trials (7 used data from a published clinical trial), 7 were non-surgical (3 were about percutaneous coronary artery procedures and 2 evaluated Caesarian sections) and 5 were reviews. This left 531 admissible studies.

Nine journals accounted for 68% (359 out of 531) of the admissible trials (Table 1). Eleven specialty interests accounted for 70% (374 out of 531) of the admissible trials (Table 2). The orthopaedic trials were related to the knee (n = 22), spine (n = 13) and hip (n = 13). Gastroduodenal surgery included bariatric surgery (n = 8) and surgery for gastro-oesophageal reflux (n = 7).

Table 1. The nine journals that each published more than 20 of the 531 admissible surgical trials (the absolute numbers are in parentheses)

  Journal                                  Trials
  Annals of Surgery                        14% (74)
  British Journal of Surgery               12% (65)
  Journal of Bone & Joint Surgery (USA)    9% (50)
  Urology                                  7% (35)
  The Lancet                               6% (31)
  Diseases of the Colon & Rectum           5% (29)
  World Journal of Surgery                 5% (27)
  Annals of Thoracic Surgery               5% (24)
  New England Journal of Medicine          5% (24)

Table 2. The specialty interests most often encountered in the 531 admissible surgical trials (the absolute numbers are in parentheses)

  Specialty interest    Trials
  Orthopaedic           13% (71)
  Colorectal            12% (64)
  Vascular              8% (42)
  Urology               8% (42)
  Gastroduodenal        7% (36)
  Cardiac               6% (30)
  Hepatobiliary         4% (22)
  Breast                4% (19)
  Otolaryngology        3% (16)
  Pancreas              3% (16)
  Groin hernia          3% (16)

Sixty-three per cent (332 out of 531) of the admissible trials declared primary outcome events in the Methods section. Thirteen per cent (69 out of 531) declared primary outcome events in other parts of their article: Abstract alone (n = 56), Results and Abstract (n = 8), Introduction and Abstract (n = 4), and Results alone (n = 1). Twenty-four per cent (130 out of 531) failed to declare primary outcome events in any part of their article.
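
The arithmetic behind these proportions can be checked with a short Python sketch. The counts are those reported in the paragraph above; the variable names are illustrative only.

```python
# Counts of admissible trials by where the primary outcome events were declared
# (figures taken from the paragraph above).
declared = {
    "Methods section": 332,
    "other parts of the article": 69,
    "not declared": 130,
}

# The three categories should account for every admissible trial.
total = sum(declared.values())
percentages = {where: round(100 * n / total) for where, n in declared.items()}

for where, n in declared.items():
    print(f"{where}: {percentages[where]}% ({n} out of {total})")
```

The three categories sum to the 531 admissible trials, which confirms that they are mutually exclusive and exhaustive.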

Fifty per cent (266 out of 531) of the admissible trials declared secondary outcome events in the Methods section, while 22% (115 out of 531) declared secondary outcome events in other parts of their article. Hence, 28% (150 out of 531) of the trials failed to declare secondary outcome events in any part of the article.

The generic outcome events most often encountered in the 531 admissible trials were as follows: death 20% (n = 108), length of stay in hospital 17% (n = 90), functional status (including time until return to work) 15% (n = 78), quality of life 14% (n = 75), patient satisfaction 9% (n = 49), costs 6% (n = 33) and psychosocial status 3% (n = 15). Non-clinical endpoints were used at least once in 52% (278 out of 531) of the admissible studies – imaging (n = 152), biochemistry (n = 103) and pathology (n = 29).

Table 3 contains summary statistics for the numbers of primary outcome events and statistical comparisons. The trials reported a median of three primary outcome events (interquartile range: 2–5, absolute range: 1–17), and a median of 19 statistical comparisons (interquartile range: 9–32, absolute range: 1–130). Five trials declared more than 10 primary outcome events. Ten per cent (51 out of 531) of the trials reported more than 50 statistical comparisons, and 6 of the trials reported more than 100 statistical comparisons. Only 11 trials made an allowance for multiple testing of statistical significance and nine of these trials used one primary outcome event. Only 39% (205 out of 531) of the trials declared that their analyses were based on an ‘intention to treat’.

Table 3. The relationship between the declared primary outcome events and the number of statistical comparisons in the 531 admissible surgical trials

                             Declared number of primary outcome events
  Statistical comparisons    1 (n = 244)    2–6 (n = 89)    Nil (n = 198)
  Median                     21             21              19
  Interquartile range        10–34          10–35           9–32
  Absolute range             0/1–120        0/1–121         1–130

Only 2% (12 out of 531) of the trials failed to declare how long they followed up patients. In 79% (422 out of 531) of the trials, the patients were followed up for a declared time after discharge from hospital: 18% (98 out of 531) of the trials only evaluated the patients while they were in hospital. There was no trend to report a standardized definition of operative mortality: 20% (107 out of 531) of the trials declared death as an outcome – only 30% (32 out of 107) of these trials declared a specific period of review, the commonest being 30 days, which was used in 16% (17 out of 107) of the relevant studies.

Discussion

We found that 24% of the surgical trials under evaluation failed to declare the primary outcome events. This is similar to the non-compliance rate of 26% for similar criteria in a review of 490 surgical trials published in the ANZ Journal of Surgery and the British Journal of Surgery between 1969 and 2003.[3] At first sight, this appears to be relatively good when compared with the report from Altman's group that one-half of clinical trials published in prestigious medical journals specify the primary outcome event.[4, 5] However, in those studies, the criteria were strict. They looked for the ‘explicit’ definition of a primary or main outcome. We were more lenient by avoiding decisions about what was, or was not, ‘explicit’, and rather than just concentrating on the Methods section, we looked for an indication in any part of the article. Eleven per cent of the surgical trials that we surveyed only declared the primary outcome events in the abstract. It is surprising that more than 1 in 10 surgical trials fail to mention such key information in the text of their article.

Laxity in declaring the primary outcome event is a major fault. The effectiveness of clinical trials depends on a tight articulation between the aims, outcome events and conclusions. Vagueness about the main outcome events indicates an unwillingness to reliably evaluate a precise hypothesis. In the absence of predefined primary outcomes, investigators can selectively report outcomes based on post hoc analyses. The International Committee of Medical Journal Editors' ‘Uniform Requirements for Manuscripts Submitted to Biomedical Journals: Writing and Editing for Biomedical Publication’[6] states that ‘Both the main and the secondary objectives should be clear, and any pre-specified subgroup analyses should be described’. This is consistent with the CONSORT statement's warning that having several primary outcomes ‘incurs the problems of interpretation associated with multiplicity of analyses and is not recommended’.[7] Our study provides evidence that this concern is valid.
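
To illustrate why multiplicity matters, the following Python sketch estimates the chance of at least one false-positive result when several comparisons are each tested at the conventional 5% level. It assumes that the comparisons are independent, which is a simplification, and it uses the Bonferroni correction as one well-known example of an adjustment; the article does not state which adjustments the reviewed trials applied. The function names are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-comparison threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


# 1 comparison, the median of 19 comparisons, and the maximum of 130 comparisons.
for n_tests in (1, 19, 130):
    fwer = familywise_error_rate(n_tests)
    threshold = bonferroni_threshold(n_tests)
    print(f"{n_tests} comparisons: family-wise error rate {fwer:.2f}, "
          f"Bonferroni threshold {threshold:.4f}")
```

Under these assumptions, a trial with the median of 19 unadjusted comparisons has a chance of about 0.62 of producing at least one spuriously significant result.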

We observed a lack of uniformity in the presentation of generic outcome events such as operative death. Ideally, outcomes that are used during clinical trials, surgical audits and the management of surgical services within health-care systems would use common criteria so that reliable estimates could be made of the importance of variations between different services.

Nine per cent of the trials that we reviewed used a composite outcome event. Composite outcome events are usually based on a combination of traditional outcomes. They allow complex outcomes to be expressed as a single quantity.[1, 8] This should enhance the power of studies to reliably detect clinically important differences in outcome. However, when Ferreira-González et al. explored the use of composite endpoints in 114 cardiovascular trials, they came to the conclusion that: ‘Higher event rates and larger treatment effects associated with less important components may result in misleading impressions of the impact of treatment’.[9] Our study indicates that surgical trials add to these concerns by just using composite endpoints as part of the mix of multiple outcome events. The use of composite endpoints was not associated with an anticipated reduction in the number of outcome events and a constrained approach to testing for statistical significance.
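
The concern raised by Ferreira-González et al. can be illustrated with a small numerical sketch in Python. All of the risks below are hypothetical and are not drawn from any trial. The components are assumed to occur independently, and the function name `composite_risk` is illustrative only.

```python
def composite_risk(component_risks):
    """Risk of at least one component event, assuming independent components."""
    event_free = 1.0
    for risk in component_risks:
        event_free *= 1 - risk
    return 1 - event_free


# Hypothetical risks (not study data): death is unaffected by the treatment,
# while a common minor wound complication is halved.
control = {"death": 0.02, "minor wound complication": 0.20}
treated = {"death": 0.02, "minor wound complication": 0.10}

control_composite = composite_risk(control.values())
treated_composite = composite_risk(treated.values())
relative_reduction = 1 - treated_composite / control_composite

print(f"Composite risk: control {control_composite:.3f}, treated {treated_composite:.3f}")
print(f"Relative reduction in the composite outcome: {relative_reduction:.0%}")
```

In this hypothetical case, the composite outcome falls by roughly 45% even though mortality is unchanged, because the more frequent and less important component drives the result.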

Only 6% of the trials that we reviewed used costs as an outcome event, and in many of these studies, the analysis appeared to be trivial. A recent report suggested that less than one-half of surgical trials that provided estimates of costs were formal studies (they only mentioned costs in the Methods and Results sections).[10] It is exceedingly rare to find a surgical trial that is based on a comprehensive economic analysis, which is understandable because it requires considerable effort to determine the monetary value of benefits.

The design of our study has some potential weaknesses. In order to collect a large number of articles, the period under review was 3.5 years. The initial publications appeared in January 2007 and, given that methodological standards tend to improve with time, may not accurately reflect the standards of the contemporary literature for readers (i.e. a chronological bias). We were concerned about the risk of observer variation. This was offset by only collecting discrete information, double-checking the data for each article and resolving discrepancies by discussion. We were careful to adopt an impartial attitude to the data and tried to avoid a ‘group think’ mentality that actively sought ‘errors’. Finally, our study is descriptive in nature, which negates the ability to make analytical comparisons between the specialty interests, the impact factor of the journal or markers of study quality.

In conclusion, surgical trials tend to be too diffuse. Both readers and biostatisticians would appreciate surgical trials that link a relevant aim to a modest number of primary outcome events and carefully targeted statistical comparisons. We found that more than one-quarter of the surgical trials declared five or more primary outcome events and one-half of the trials contained more than 18 statistical comparisons. Furthermore, we found that multiple testing for statistical significance persisted despite the use of composite outcome events and the selective declaration of primary outcome events. Hence, in this instance, compliance with guidelines had a limited ability to indicate the quality of a highly relevant component of the published articles. The ability of well-informed authors to ‘play the game’ should not be confused with the structural integrity of a study.

References
