Stats.con


Stats.con. By James Penston. The London Press, London, 2010, £12.99, 318 pp. ISBN 978-1-907313-33-2.

The scientific status of many disciplines, including sociology, economics and much of clinical medicine, is derived in part from their use of statistical methods. Yet James Penston argues that the method is fundamentally flawed. Building on a previous shorter book [1], he investigates the history, methodology, consequences and philosophy of statistical methods in a beautifully written critical review.

Penston cites under-reported squabbles over the probability theories upon which hypothesis tests are based [2], and ongoing disputes over the correct way to implement the method [3,4]. On top of these intrinsic problems there is a great deal of bias in the production and reporting of statistics. To cite just one example, Penston notes that during the last Conservative Government in the UK, the definition of unemployment changed more than 20 times in an attempt to hide the increasing number of people without work. Then there are numerous examples of outright fraud, often perpetrated by those with a financial interest in getting a particular result.

It should come as no surprise, then, when conclusions derived from statistical methods are contradictory. One study claims that coffee increases the risk of pancreatic cancer, another suggests it does not. One study suggests that red meat increases the risk of colorectal cancer, another suggests it does not . . . Such conflicting findings are familiar to anyone who watches the news or reads the daily paper. If any other discipline produced so many conflicting results, it would be thrown into disrepute.

Penston focuses his attack on the ‘golden child’ of the statistical method, namely the large-scale randomized trial. If these trials can be revealed as flawed, then a fortiori, Penston argues, the whole methodology topples.

Large-scale randomized trials reveal only small absolute effects; otherwise they would not need to be so large. Small apparent effects, Penston argues, are rarely indicative of actual (ontological) differences in the two interventions being compared. For example, the investigators in the EUROPA study randomized 12 218 patients in 23 countries with stable ischaemic heart disease to receive perindopril or placebo. The primary outcome was a composite of cardiovascular death, non-fatal myocardial infarction or cardiac arrest. Compared with 8% in the experimental group, 9.9% of the participants in the ‘placebo’ group experienced the outcome: perindopril appeared to have a 1.9% absolute effect. Based on this result, it was suggested that all patients with ischaemic heart disease take perindopril.
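The link between small effects and large trials can be illustrated with a rough sample-size calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions. The 5% two-sided significance level and 80% power are conventional assumptions of mine, not figures taken from Penston or from the EUROPA protocol, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect p_control vs p_treated.

    Normal-approximation formula for two independent proportions.
    alpha (two-sided) and power are assumed conventional values.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated                 # absolute risk difference
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Event rates quoted in the review for EUROPA (rounded figures).
small_effect = n_per_group(0.099, 0.08)
# A hypothetical, much larger effect for contrast (not from any trial).
large_effect = n_per_group(0.20, 0.10)

print(f"1.9-point difference: about {small_effect} patients per arm")
print(f"10-point difference:  about {large_effect} patients per arm")
```

Under these assumptions, a 1.9-point difference needs on the order of 3,500 patients per arm, whereas a 10-point difference needs only about 200. This is an illustration of the arithmetic only, not a reconstruction of how any particular trial was designed.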

The first problem with small absolute effects is that they are often presented in terms of relative risk reductions, which exaggerate the differences (and which many people, including educated doctors, mistake for absolute differences). Second, the fact that the trials are described accurately as ‘large’ makes them appear more definitive. More fundamentally, it only takes small biases during the allocation, treatment or reporting phases to tip the scales [5].
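The gap between relative and absolute presentations can be made concrete with the EUROPA figures quoted above. The following is a minimal sketch using the rounded event rates given in this review (9.9% and 8%); the function name is my own, and the number needed to treat is simply the reciprocal of the absolute difference, not a figure reported by Penston.

```python
def risk_summary(control_risk, treated_risk):
    """Return (absolute risk reduction, relative risk reduction, NNT)."""
    arr = control_risk - treated_risk  # absolute difference in event rates
    rrr = arr / control_risk           # the same difference as a share of control risk
    nnt = 1 / arr                      # patients treated per event avoided
    return arr, rrr, nnt


# Rounded event rates quoted in the review for EUROPA.
arr, rrr, nnt = risk_summary(control_risk=0.099, treated_risk=0.08)

print(f"Absolute risk reduction: {arr:.1%}")
print(f"Relative risk reduction: {rrr:.1%}")
print(f"Number needed to treat:  {nnt:.0f}")
```

The same result can thus be reported as a reduction of about 19% (relative) or 1.9 percentage points (absolute), which corresponds to treating roughly 53 patients for one to avoid the composite outcome.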

In fact, Penston gives us good reason to believe that small biases are inevitable, and there is a large body of empirical work supporting the view that these biases often exaggerate the apparent benefits of the treatment [6–8]. In the EUROPA study, for example, Penston calls our attention to the following:

  1. Differential dropout rates (23% in the perindopril group, 21% in the placebo group).
  2. 10.5% of the patients in the run-in phase of the trial were excluded, mostly for reasons related to treatment with perindopril (this would be likely to affect the external validity of the trial).
  3. While randomization, combined with the empirical law of large numbers, reduces the chances of baseline imbalances, small imbalances can produce small absolute effect differences.
  4. All five members of the EUROPA executive committee declared a conflict of interest.

Either by themselves, or certainly combined, these biases could have accounted for the apparent positive benefit of perindopril in the study.

Subsequently, the PEACE study randomized 8290 patients in four countries to receive trandolapril or placebo and failed to show any difference. The PEACE authors claimed that differences between PEACE and EUROPA conclusions could have been due to the fact that the treatment was slightly different, or that the patients were slightly different. Yet Penston rightly complains that these claims were made after the fact to explain the discrepancy. An alternative view is that small absolute effects are not indicative of any real difference between experimental treatment and placebo.

In spite of its many strengths, the book can be criticized on two grounds. First, Penston's position lacks subtlety; second, his solutions are at least equally problematic.

There are at least some circumstances when small effects produced by large-scale studies are important. For example, a large body of consistent epidemiological studies suggested that, contrary to what Dr Spock suggested, significantly more babies who slept on their backs survived [9,10]. Moreover, Sudden Infant Death Syndrome (SIDS) rates have historically been very low in countries where babies traditionally sleep on their backs [11]. It is also relevant that the studies were not contaminated by additional commercial or political influences. The absolute benefit of placing babies to sleep on their backs is small – 0.6 to 0.3 in 1000 in the USA and the UK. But the outcome is undoubtedly important. Here we have a case where large-scale studies provide important information. Another example is the small absolute benefit of corticosteroid injections to reduce mortality in premature babies [12].

Moreover, while statistical results may not help us predict what will happen to an individual, they do matter. Paul Meehl expressed his frustration at those who denied this fundamental point in an amusing way: ‘An advocate of this anti-actuarial position would have to maintain, for the sake of logical consistency, that if one is forced to play Russian roulette a single time and is allowed to select a gun with one or five bullets in a chamber, the uniqueness of the event makes the choice arbitrary’ [13].

Second, Penston does not spell out any alternatives in any detail. To the extent that he does propose other methods (clinical experience and basic science research), they fall prey to the very same objections he levels against the statistical method. Human nature being what it is, individual clinicians and basic scientists are – and indeed have been – known to be strongly influenced by commercial and political pressures. In fact, it is far cheaper to influence a group of experts than it is to conduct a large-scale trial. Hence, Penston's own criticisms of bias and fraud (chapters III and IV) apply all the more to the other methods. He admits this tacitly when he notes, ‘Fraud, in one form or another, is found in every walk of life’ (p. 177). Furthermore – and this problem has been all but ignored – the problems with external validity (chapter V) apply equally to studies of underlying biological mechanisms. Finally, untamed clinical reasoning is notoriously problematic. A clinician might observe dramatic remission from depression after antidepressant therapy, but without further examination it is impossible to rule out that the recovery was due to ‘placebo’ effects. These problems are exacerbated by naïve and well-educated clinicians' (and most other people's) common reasoning fallacies [14,15].

To conclude, Penston's three major points are defensible:

  1. the foundation of the statistical method is the subject of a lively but under-reported debate;
  2. bias in the production, interpretation and reporting of statistics makes their results problematic; and
  3. small absolute differences revealed in large-scale randomized trials must be viewed with great suspicion.

I look forward to Penston's next book, where he might provide guidelines for when we might use statistical methods more fruitfully, and where he spells out a better alternative in more detail.
