Through the looking glass
One surprising feature of clinical research is that many randomised trials have been performed in general medicine, but only a few in surgery. Yet, we should expect this when we consider the contrasting nature of the two disciplines. In randomised trials in general medicine, the experimental treatment, in the form of tablets or injections, is of uniform potency; the diagnosis of the condition is certain; and the definition of cure is usually free of bias. Thus, in trials of thrombolytic therapy in myocardial infarction, the vials of the experimental thrombolytic drug have uniform pharmacological potency; the diagnosis of myocardial infarction is certain; and the definition of cure, avoidance of death, is free of bias. In randomised trials in surgery, the experimental treatment, in the form of a new operation, is of variable potency, depending on the training of the surgeons; the diagnosis of the condition may be uncertain; and the definition of cure may not be free of bias. The problems besetting randomised trials of surgical treatments are described in the Commentary by Paul Hilton (pages 1081–1088).
Hilton uses the example of the United Kingdom Tension-Free Vaginal Tape randomised trial, recently published in the BMJ [1]. This large multicentre trial compared the new treatment, the tension-free vaginal tape, with the standard treatment, Burch colposuspension, in women with genuine stress incontinence. It is not the results of the trial that concern us here, but the questions that arise from its conduct. Hilton describes the problems of randomisation, intention-to-treat analysis, length of follow-up and the power of the trial, which are common to randomised trials in any setting; he also considers additional problems found in trials of surgical treatments: differences in interpretation of the trial according to the definition of cure, bias in the ascertainment of cure and variations in the performance of individual hospitals.
Cure of stress incontinence can be assessed in five ways: the symptoms experienced by the woman, the severity of these symptoms, urodynamic investigations, measurement of the quality of life and economic evaluation. Hilton shows elegantly how the rate of cure can vary greatly depending on which of the five assessments is thought to be the most important. Thus, if the definition of cure depends on cystometry, 80% of women were cured; if it depends on a negative pad test, 70% of women were cured; if it depends on the combination of cystometry and a negative pad test, 60% of women were cured; and if it depends on the woman reporting ‘no leakage’, 30% of women were cured. But when the women's satisfaction with their treatment was measured, 80% of women were satisfied or highly satisfied, notwithstanding the results of cystometry or the pad test, or whether or not they were completely dry. The International Continence Society has gone a long way towards establishing definitions in urinary incontinence; now is the time for it to establish a definition of cure of urinary incontinence, one that can be used in randomised trials and in meta-analyses of randomised trials. The available evidence suggests that the preferred definition should depend on the perceptions of the women and not on those of their physicians.
No matter what the definition of the outcome of a trial, the investigators ascertaining this outcome should not be aware of the treatment given to the woman. In randomised trials in general medicine, this blinding is easily achieved with placebos; but it is morally unjustified to give a woman a placebo operation. Usually, the main interest is not that the new treatment is superior to the standard, but that it is equivalent in efficacy while imposing much less cost on the woman. New surgical treatments tend therefore to be less invasive; in fact, they tend to be entirely different surgical procedures. Thus, the tension-free vaginal tape operation is performed under local anaesthesia, Burch colposuspension under general anaesthesia; the tension-free vaginal tape operation requires only two small incisions in the abdomen with no dissection of tissue planes, Burch colposuspension requires a much larger incision in the abdomen with extensive dissection of tissue planes; the tension-free vaginal tape operation requires only simple forms of post-operative pain relief for a short time, Burch colposuspension requires opiate analgesia for several days. They are completely different operations, and it is illogical to compare their immediate effects. More importantly, because the operations are so different, blinding of the investigators to the treatment the women received is impossible, and so ascertainment bias may occur. Paradoxically, this bias is more likely with the objective tests of urinary continence, such as urodynamic tests, which are carried out by the physician, who may have unconscious biases towards one treatment or the other; and less likely with the subjective tests of urinary continence, such as ‘stress leakage’, which are perceived by the woman, who will not have unconscious biases towards one treatment or the other.
The Commentary also questions the external validity of the trial. The investigators included urogynaecologists, urologists and general gynaecologists working in teaching and district hospitals, in order to make the trial as generalisable as possible. Overall, there was no difference in the rate of cure between the two treatments, but there were differences between the 14 centres taking part in the trial. In eight centres, the objective cure rate with the tension-free vaginal tape was greater than with Burch colposuspension, while in four centres the opposite occurred. With such heterogeneity, an estimate of the treatment effect obtained with meta-analytic techniques that allow for differences between centres may be quite different from a global estimate that does not account for them. Almost certainly, the differences between the hospitals are due to differences in experience, for the trial shows an association between the number of women enrolled by a centre into the trial and the rate of objective cure. This emphasises a major problem with randomised trials of surgical treatments. Since the hypothesis being tested in most such trials is that the experimental treatment is as good as the standard treatment (but is less invasive or requires less post-operative pain relief or is cheaper), the trial has to be very large. In order to satisfy the calculation of the number of women required for the trial, more and more centres and more and more surgeons are recruited; the more surgeons are recruited, the less experience they will have. The potency of the therapy is then not uniform. In the end, the trial is unable to test the new surgical treatment, the results are inconclusive and the resources spent on the trial are wasted.
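The claim that a trial testing whether a new operation is "as good as" the standard has to be very large can be illustrated with a rough sample-size sketch. The code below uses the standard normal-approximation formula for a non-inferiority comparison of two proportions, assuming the true cure rates are equal. The function name `n_per_arm`, the 80% cure rate, the margins, the one-sided 2.5% significance level and the 80% power are all illustrative assumptions, not figures taken from the trial's own design.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(cure_rate, margin, alpha=0.025, power=0.80):
    """Approximate number of women per arm for a non-inferiority trial.

    Hypothetical helper for illustration only. Assumes both operations
    truly have the same cure rate, a one-sided test at level `alpha`,
    and the usual normal approximation to the binomial.
    `margin` is the largest deficit in cure rate (as a proportion)
    that would still be accepted as "as good as".
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)       # quantile for the desired power
    variance = 2 * cure_rate * (1 - cure_rate)  # sum of the two binomial variances
    return ceil((z_alpha + z_beta) ** 2 * variance / margin ** 2)


if __name__ == "__main__":
    # Illustrative margins only: the narrower the margin, the larger the trial.
    for margin in (0.20, 0.10, 0.05):
        print(f"margin {margin:.2f}: {n_per_arm(0.80, margin)} women per arm")
```

Because the margin enters the formula squared, halving the acceptable difference roughly quadruples the number of women required. Under these assumptions, this is the pressure that pushes investigators to recruit more centres and more surgeons.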
We should congratulate the investigators in the United Kingdom Tension-Free Vaginal Tape trial for successfully completing a randomised trial to test a new surgical treatment. The Commentary by Paul Hilton describes many of the difficulties encountered in undertaking this trial. These considerations call into question the whole concept of randomised trials of surgical treatments, for they may be impossible to carry out. If at the stage of the design of a trial it is found that the ascertainment of cure is likely to be biased, and above all, if the trial may be so large that it cannot be accomplished without recruiting surgeons with lesser training, then the investigators should have the courage not to perform the trial. It may be that in the future new surgical treatments will not be tested by randomised trial, but by an uncontrolled case series where cases are entered into a national database, where the operations are undertaken only by surgeons with adequate training and where the definition of cure is clearly established.