Acknowledgments. For their suggestions on drafts of this article we thank Nekane Balluerka, Jarrett Byrnes, Roger Mead, Paul Murtaugh, Daniel O’Keefe, Michael Riggs, David Savitz, Allan Stewart-Oaten, Sheela Talwalker, Scott Urquhart, Alan Welsh, and three anonymous reviewers.
LOPSIDED REASONING ON LOPSIDED TESTS AND MULTIPLE COMPARISONS
Version of Record online: 10 MAY 2012
© 2012 Australian Statistical Publishing Association Inc.
Australian & New Zealand Journal of Statistics
Volume 54, Issue 1, pages 23–42, March 2012
How to Cite
Hurlbert, S. H. and Lombardi, C. M. (2012), LOPSIDED REASONING ON LOPSIDED TESTS AND MULTIPLE COMPARISONS. Australian & New Zealand Journal of Statistics, 54: 23–42. doi: 10.1111/j.1467-842X.2012.00652.x
- Issue online: 25 JUL 2012
Keywords
- Bonferroni procedure;
- comparison-wise error rate;
- directed tests;
- false discovery rate;
- family-wise error rate;
- one-tailed tests;
- randomized clinical trials;
- set-wise error rate;
- significance tests;
- split-tailed tests;
- type I error
Abstract
For those who have not recognized the disparate natures of tests of statistical hypotheses and tests of scientific hypotheses, one-tailed statistical tests of null hypotheses such as δ ≤ 0 or δ ≥ 0 have often seemed a reasonable procedure. We earlier reviewed the many grounds for not regarding them as such. To have at least some power for detection of effects in the unpredicted direction, several authors have independently proposed the use of lopsided (also termed split-tailed, directed or one-and-a-half-tailed) tests, two-tailed tests with α partitioned unequally between the two tails of the test statistic distribution. We review the history of these proposals and conclude that lopsided tests are never justified. They are based on the same misunderstandings that have led to massive misuse of one-tailed tests as well as to much needless worry, for more than half a century, over the various so-called ‘multiplicity problems’. We discuss from a neo-Fisherian point of view the undesirable properties of multiple comparison procedures based on either (i) maximum potential set-wise (or family-wise) type I error rates (SWERs), or (ii) the increasingly fashionable, maximum potential false discovery rates (FDRs). Neither the classical nor the newer multiple comparison procedures based on fixed maximum potential set-wise error rates are helpful to the cogent analysis and interpretation of scientific data.
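To illustrate the construction the abstract refers to, the sketch below shows what a lopsided (split-tailed) z-test looks like when a total α of 0.05 is partitioned unequally between the tails. The function name and the particular 0.04/0.01 split are our own illustrative choices, not taken from the article, which argues against such tests rather than recommending any specific split:

```python
from statistics import NormalDist

def lopsided_z_test(z, alpha_predicted=0.04, alpha_other=0.01):
    """Split-tailed (lopsided) z-test sketch: the total type I error rate
    is divided unequally between the two tails of the test-statistic
    distribution, giving more power in the predicted direction while
    retaining some power in the unpredicted one.

    alpha_predicted -- tail probability allotted to the predicted direction
    alpha_other     -- tail probability allotted to the unpredicted direction
    Returns 'predicted', 'unpredicted', or None (no rejection).
    """
    # Critical value for the predicted (upper) tail, e.g. about +1.75 at 0.04
    upper = NormalDist().inv_cdf(1 - alpha_predicted)
    # Critical value for the unpredicted (lower) tail, e.g. about -2.33 at 0.01
    lower = NormalDist().inv_cdf(alpha_other)
    if z >= upper:
        return "predicted"
    if z <= lower:
        return "unpredicted"
    return None
```

For comparison, an equal-tailed test at total α = 0.05 would use critical values of roughly ±1.96, so a statistic of z = 1.8 rejects under this lopsided split but not under the equal-tailed test.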