Effect sizes and p values: What should be reported and what should be replicated?

Authors


  • Preparation of this report was aided by National Science Foundation grants DBC-9205890, SES-9110572, and SHK-9422242, and by National Institute of Mental Health grant MH-41328.

  • We thank J. Scott Armstrong, David Bakan, Ronald P. Carver, Jacob Cohen, Alice Eagly, Lynne Edwards, Robert Frick, Gideon Keren, Lester Krueger, Fred Leavitt, Joel R. Levin, Clifford Lunneborg, David Lykken, Paul E. Meehl, Ronald C. Serlin, Bruce Thompson, David Weiss, and this journal's reviewers for comments on an earlier draft. Although these comments helped greatly in sharpening and clarifying arguments, the article's subject matter regrettably affords no uniform consensus, so the final version contains assertions that will be unacceptable to some commenters.

Address reprint requests to: Anthony G. Greenwald, Department of Psychology, Box 351525, University of Washington, Seattle, WA 98195-1525, USA. E-mail: agg@u.washington.edu

Abstract

Despite publication of many well-argued critiques of null hypothesis testing (NHT), behavioral science researchers continue to rely heavily on this set of practices. Although we agree with most critics' catalogs of NHT's flaws, this article also takes the unusual stance of identifying virtues that may explain why NHT continues to be so extensively used. These virtues include providing results in the form of a dichotomous (yes/no) hypothesis evaluation and providing an index (p value) that has a justifiable mapping onto confidence in repeatability of a null hypothesis rejection. The most-criticized flaws of NHT can be avoided when the importance of a hypothesis, rather than the p value of its test, is used to determine that a finding is worthy of report, and when p=.05 is treated as insufficient basis for confidence in the replicability of an isolated non-null finding. Together with many recent critics of NHT, we also urge reporting of important hypothesis tests in enough descriptive detail to permit secondary uses such as meta-analysis.
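
The mapping from a p value onto confidence in replicability can be illustrated with a small calculation. The sketch below is not taken from the article. It assumes a normally distributed test statistic, an exact replication with the same sample size, and a true effect equal to the originally observed one. The function name `replication_probability` is hypothetical.

```python
from statistics import NormalDist


def replication_probability(p_observed, alpha=0.05):
    """Chance that an exact replication rejects the null in the same direction.

    Assumptions (illustrative, not from the article): the test statistic is
    standard normal under the null, both p values are two-tailed, and the true
    effect equals the effect observed in the original study.
    """
    std_normal = NormalDist()
    # z score corresponding to the original two-tailed p value
    z_observed = std_normal.inv_cdf(1 - p_observed / 2)
    # z score the replication must exceed to reach two-tailed alpha
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    # Under the assumptions, the replication z is Normal(z_observed, 1)
    return 1 - std_normal.cdf(z_critical - z_observed)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.005, 0.001):
        print(f"p = {p}: replication probability = {replication_probability(p):.2f}")
```

Under these assumptions an isolated result at p=.05 has only an even chance of reaching p≤.05 again in an exact replication, which is one way to see why the abstract treats p=.05 as an insufficient basis for confidence in replicability.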
