Editorial: The first cut is the deepest: why do the reported effects of treatments decline over trials?
Article first published online: 6 JUN 2011
© 2011 The Author. Journal of Child Psychology and Psychiatry © 2011 Association for Child and Adolescent Mental Health
Journal of Child Psychology and Psychiatry
Volume 52, Issue 7, pages 729–730, July 2011
How to Cite
Ozonoff, S. (2011), Editorial: The first cut is the deepest: why do the reported effects of treatments decline over trials?. Journal of Child Psychology and Psychiatry, 52: 729–730. doi: 10.1111/j.1469-7610.2011.02425.x
- Issue published online: 6 JUN 2011
In a previous editorial for this journal (September 2010), I focused on the importance of null results and their role in moving the science of psychopathology forward. I revisit this theme here, now focusing on intervention science, inspired, in part, by a recent provocative paper in The New Yorker entitled ‘The truth wears off’ (Lehrer, 2010). Written for a general audience, this article inspired discussion, sometimes heated, among both members of the public and scientists. It is highly relevant to the mission of JCPP and therefore worth additional commentary here.
Lehrer (2010) describes a pattern in the scientific literature in which the size of effects diminishes over time: initial published reports describe robust findings, but as more publications focus on the topic, the effects become smaller and less significant, until in some instances they disappear altogether. He uses the phrase ‘the decline effect’ to describe this phenomenon, which has been documented in diverse fields from physics to ecology to psychology and psychiatry. Many authors had commented on this phenomenon (see, e.g., Ioannidis, 2005; McMahon, Holly, Harrington, Roberts, & Green, 2008) before Lehrer brought the issue to the awareness of a general audience. What has created the greatest controversy is the assertion that the decline effect calls into question the foundation of the scientific method. Does it?
Lehrer (2010) raises a number of likely explanations for the decline effect, as have other commentators on the issue. Regression to the mean may be operating when an initial exciting finding turns out to be a random statistical aberration and is corrected by subsequent studies. Another contributing factor is publication bias. When a hypothesis is first being investigated, it is difficult to publish anything other than positive findings. It is only once a finding is well-established that disconfirmatory and null results become interesting, leading to an apparent decline in effect in published studies over time. Selective reporting, in which authors don’t submit papers that yield unexpected findings or disconfirm a hypothesis, is another possibility. Samples in initial studies tend to be homogeneous, but when later trials use more diverse samples, effects are often reduced in magnitude. Access to and quality of interventions available in the community may improve over time, reducing the size of differences between a tested intervention and a treatment-as-usual comparison condition.
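The first two explanations, regression to the mean and publication bias, can be illustrated with a small simulation. The sketch below is illustrative only and is not taken from Lehrer (2010) or any study cited here: the true effect of 0.2, the sample sizes, the 1.96 cut-off, and the function names are all assumptions chosen for the example. It simulates many two-arm trials of the same modestly effective treatment and compares the average effect across all trials with the average across only those that reached significance, a rough stand-in for what gets published early on.

```python
import math
import random
import statistics


def simulate_study(true_effect, n_per_arm, rng):
    """Return (observed effect, significant?) for one simulated two-arm trial.

    Outcomes are drawn from unit-variance normal distributions, so the
    difference in means is already on a standardized scale.
    """
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    # Standard error of a difference in means, from the sample variances.
    se = math.sqrt(statistics.variance(treated) / n_per_arm
                   + statistics.variance(control) / n_per_arm)
    # Approximate test: only positive, "significant" results count.
    significant = diff / se > 1.96
    return diff, significant


def mean_effects(true_effect, n_per_arm, n_studies, seed):
    """Return (mean effect of all studies, mean effect of significant ones, count)."""
    rng = random.Random(seed)
    results = [simulate_study(true_effect, n_per_arm, rng)
               for _ in range(n_studies)]
    all_effects = [d for d, _ in results]
    published = [d for d, sig in results if sig]
    mean_published = statistics.fmean(published) if published else float("nan")
    return statistics.fmean(all_effects), mean_published, len(published)


if __name__ == "__main__":
    # Hypothetical scenario: small early trials, then larger later trials.
    for label, n in (("small (n=20 per arm)", 20), ("large (n=200 per arm)", 200)):
        all_mean, pub_mean, k = mean_effects(0.2, n, 2000, seed=1)
        print(f"{label}: all studies {all_mean:.2f}, "
              f"significant only {pub_mean:.2f} ({k} of 2000)")
```

Under these assumptions, small trials that happen to cross the significance threshold must have overestimated the true effect, so a literature that begins with such trials will appear to decline as larger studies, and null results, accumulate.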
The recent publication trend of brief reports and single experiments may also contribute to the decline effect. Twenty years ago, it was the norm to include within one publication multiple experiments confirming and extending a finding. This is no longer the standard, resulting in less rigorously tested findings being published. Initial studies that represent the first explorations of a new hypothesis may not have the funding necessary for large sample sizes, control groups, randomization, and condition masking, all of which reduce the likelihood of errors, biases, and statistical flukes. Recent examples of the decline phenomenon in the autism literature include both the vaccine–autism link and the efficacy of the potential intervention secretin. In both cases, highly cited initial effects and associations were not replicated in multiple later trials and epidemiological studies of larger sample size and better design.
The take-home message may be that ‘the truth takes time’, rather than ‘the truth wears off’, as Myers (2010) countered in his widely read science blog. ‘This is not a failure of science, unless you’re somehow expecting instant gratification on everything or confirmation of every cherished idea’, he writes. Initial evidence, no matter how impressive, should be interpreted with caution. We should not expect the right answer to emerge immediately, but can hope that ultimately the scientific method will prevail. This will only happen, however, if we uphold higher standards and require published studies to be carefully designed and well-powered. We must strive to create a scientific culture in which the design and methods of the study are more important to the publication process than the results. We need to undertake and publish more studies that seek to replicate previous findings. It is not uncommon for editors to reject a paper because ‘the results are largely confirmatory’, as if this were an entirely uninteresting finding or a fatal flaw. Instead, as Ioannidis (2006) says, ‘replication and rigorous evaluation become as important as or even more important than discovery’. In this new scientific culture, not only would replications be welcomed, but well-designed and well-powered studies with null results would be appreciated as just as important as those with positive results, because knowing what doesn’t work is as important as knowing what does.
How is this relevant to the current issue of JCPP? One article in particular brings this timely debate into focus. Carter et al. (2011) describe a randomized trial of the Hanen More Than Words (HMTW) intervention for young children with autism. This treatment is widely used in parts of the world and, in employing parents as interventionists, is a relatively economical treatment that could be utilized even in communities without major autism resources. In short, the HMTW program may have tremendous potential, but, until now, has not been subjected to rigorous examination. Carter et al.’s study was reasonably well-powered and very carefully designed, using randomization and a modified intent-to-treat design. It specified outcome measures in advance, including both proximal measures (e.g., change in parental responsivity) and distal measures (e.g., change in child communication skills and symptom levels). Outcome assessment was multimodal, extending beyond parent report (very important in this case because parents, as interventionists, were aware of treatment condition) to include direct evaluation of both parent–child and examiner–child interactions.
The study found no change in child behaviors as a result of the intervention, contrary to expectation. Improvements in the more proximal outcome measure were found, but as the primary intent of the intervention was to have downstream effects on the development and functioning of the children with autism themselves, the authors interpreted this as an essentially negative trial, stating, ‘these findings, across multiple child outcomes, raise concerns about the general appropriateness of the HMTW intervention’.
The authors followed up the negative results of their trial by exploring possible moderators of treatment outcome, revealing potentially important information about specific treatment-matching characteristics. They found that children with low levels of object interest had the best outcomes, but unearthed an unanticipated negative outcome for children with high levels of object interest, who demonstrated attenuated development in the HMTW intervention compared with the treatment-as-usual condition. It will be vital to replicate these findings, but they begin to help us understand not only what works and what doesn’t, but what works for whom.
These findings are consistent with another well-designed and well-powered study of a different parent-delivered treatment for autism, the Preschool Autism Communication Trial (PACT; Green et al., 2010). An initial investigation of this intervention demonstrated large group differences in a small sample (Aldred, Green, & Adams, 2004), but effect sizes declined and enthusiasm was dampened (‘these findings suggest that the optimistic results from other studies should be reassessed’ and ‘we cannot recommend this intervention … for the reduction of autism symptoms’) once a much larger multisite investigation was undertaken (Green et al., 2010).
Carter et al.’s (2011) paper is a good example of what we strive for in JCPP. Despite finding largely null results, the manuscript was submitted, the reviewers recognized its importance, and the article was published. The findings, while initially unexpected by the authors, are convergent with those of Green et al. (2010) using a similar communication intervention for autism. Here, science is working nicely (over a relatively short time period) to refine ideas and suggest new ways forward.
Journals have a serious responsibility to publish studies with strong designs, large samples, and clearly reported results. This obligation is shared by authors, reviewers, and editors alike. We must pledge to put the rigor of the study above the findings. The funding process must invest in supporting replication science, just as the editorial process must appreciate that confirmatory results and well-powered null results can have as great an impact as novel ‘discoveries’. We should be wary of initial reports of significant findings when they come from a single study and wait to weigh the accumulated evidence across multiple well-powered investigations. With such efforts, the science of intervention can move forward and we can hope that the decline effect itself will someday be in decline.
Thanks are due to Jonathan Green and Tony Charman for their valuable inputs to this editorial.
- Aldred, Green, & Adams (2004). A new social communication intervention for children with autism: Pilot randomized controlled treatment study suggesting effectiveness. Journal of Child Psychology and Psychiatry, 45, 1420–1430.
- Carter et al. (2011). A randomized controlled trial of Hanen’s ‘More Than Words’ in toddlers with early autism symptoms. Journal of Child Psychology and Psychiatry, 52, 741–752.
- Green et al., & the PACT Consortium (2010). Parent-mediated communication-focused treatment in children with autism (PACT): A randomized controlled trial. Lancet, 375, 2152–2160.
- Ioannidis (2005). Why most published research findings are false. PLoS Medicine, 2, 696–701.
- Ioannidis (2006). Evolution and translation of research findings: From bench to where? PLoS Clinical Trials, 1, 1–5.
- Lehrer (2010). The truth wears off. The New Yorker, 13, 52–57.
- McMahon, Holly, Harrington, Roberts, & Green (2008). Do larger studies find smaller effects? European Child and Adolescent Psychiatry, 17, 432–437.
- Myers (2010). Science is not dead. Retrieved April 29, 2011, from http://scienceblogs.com/pharyngula/2010/12/science_is_not_dead.php.