LOPSIDED REASONING ON LOPSIDED TESTS AND MULTIPLE COMPARISONS

Authors


  • Acknowledgments. For their suggestions on drafts of this article we thank Nekane Balluerka, Jarrett Byrnes, Roger Mead, Paul Murtaugh, Daniel O’Keefe, Michael Riggs, David Savitz, Allan Stewart-Oaten, Sheela Talwalker, Scott Urquhart, Alan Welsh – and three anonymous reviewers.

Author to whom correspondence should be addressed.

Summary

For those who have not recognized the disparate natures of tests of statistical hypotheses and tests of scientific hypotheses, one-tailed statistical tests of null hypotheses such as δ ≤ 0 or δ ≥ 0 have often seemed a reasonable procedure. We earlier reviewed the many grounds for not regarding them as such. To have at least some power for detection of effects in the unpredicted direction, several authors have independently proposed the use of lopsided (also termed split-tailed, directed or one-and-a-half-tailed) tests, two-tailed tests with α partitioned unequally between the two tails of the test statistic distribution. We review the history of these proposals and conclude that lopsided tests are never justified. They are based on the same misunderstandings that have led to massive misuse of one-tailed tests as well as to much needless worry, for more than half a century, over the various so-called ‘multiplicity problems’. We discuss from a neo-Fisherian point of view the undesirable properties of multiple comparison procedures based on either (i) maximum potential set-wise (or family-wise) type I error rates (SWERs), or (ii) the increasingly fashionable maximum potential false discovery rates (FDRs). Neither the classical nor the newer multiple comparison procedures based on fixed maximum potential set-wise error rates are helpful to the cogent analysis and interpretation of scientific data.
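To make concrete what the summary describes (though the article argues against the procedure), the following is a minimal sketch of a lopsided test for a standard normal test statistic. The α split of 0.04 in the predicted tail and 0.01 in the unpredicted tail is purely illustrative, and the function name is hypothetical, not from the article.

```python
# Illustrative sketch only: a lopsided (split-tailed) test partitions the
# total alpha unequally between the two tails of the test statistic
# distribution. Here 0.04 goes to the predicted (upper) tail and 0.01 to
# the unpredicted (lower) tail, so total alpha remains 0.05.
from statistics import NormalDist


def lopsided_test(z, alpha_upper=0.04, alpha_lower=0.01):
    """Decide on a standard normal statistic z with unequal tail alphas."""
    nd = NormalDist()
    upper_crit = nd.inv_cdf(1 - alpha_upper)  # upper critical value
    lower_crit = nd.inv_cdf(alpha_lower)      # lower critical value
    if z >= upper_crit:
        return "reject: effect in predicted direction"
    if z <= lower_crit:
        return "reject: effect in unpredicted direction"
    return "fail to reject"
```

Compared with an equal-tailed test at the same total α, the lower critical value is pushed further out, so the test retains only slight power against effects in the unpredicted direction, which is exactly the compromise the article criticizes.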
