Keywords: Örebro prevention programme; outcome evaluation; youth drinking

During 2007–10, my co-author Anna Strandberg and I performed an independent effectiveness trial of the Örebro prevention programme (ÖPP) [1]. ÖPP is a Swedish universal-level prevention programme which aims to delay and reduce drunkenness among 13–16-year-old youth through brief presentations to parents at the six regular termly teacher–parent meetings. It was first evaluated by the developers Koutakis & Stattin in 1999–2001, with scientific reporting of the results in 2008 [2]. Özdemir & Stattin's [3] first criticism concerns the fact that the number of presentations in our trial was lower than stipulated, and that this might be a reason why the programme did not work as intended. We clearly agree with this statement, which at the same time needs to be contextualized. As a trial of programme effectiveness, the research question in our study is not whether the programme works under optimal research conditions when delivered by the programme developers, but whether it works when delivered by multiple programme presenters under regular practice conditions. The tendency for fidelity and dosage to decrease when programmes go into wide dissemination has been pointed out as one of the major challenges of current prevention research [4]. Current standards also note that ‘… a program that produces significant effects in an efficacy trial may or may not yield similar effects under real-world conditions’ ([5], p. 3). Thus, the implementation issue is one important reason for conducting effectiveness trials in the first place. In our opinion, the fact that we ended up with a mean of 4.7 (standard deviation = 0.9) presentations rather than six, in spite of extensive efforts to coordinate with and facilitate matters for participating schools, is itself one significant finding from this trial. The main reasons were that termly parent meetings were not part of regular practice for all schools, and that some teachers and/or parents found the programme presentations too repetitive.
As these observations speak of hindrances that prevent schools from carrying out the programme as intended, we consider them noteworthy and important. In a Swedish report we have elaborated further the basis for our conclusion that, overall, implementation levels in our trial fall somewhat below those in the original study, but at the same time exceed those in regular practice [6].

The second criticism by Özdemir & Stattin concerns the reduction of variance resulting from dichotomization of the outcome variables. Dichotomization is not uncommon in prevention research, and is presumably often conducted for the same reasons that we describe in our trial, i.e. severely skewed data distributions that do not respond well to transformations owing to large numbers of zeros, with few events in the categories indicating higher drinking and drunkenness frequencies. For outcome variables such as life-time prevalence of drunkenness (debut), the binary form is necessary for other and obvious reasons. Our first and more general comment on this is that, while acknowledging the difficulties of power and sample size estimation in general, and in cluster-randomized trials in particular, we judged the sample size of 1750 participants as sufficient to detect a small-to-moderate effect size. Although this estimate could probably have been better informed than ours, we note that this is twice the sample size and five times the number of schools of the original ÖPP trial, where small-to-moderate effect sizes were found (using a continuous outcome measure, however). Our second comment on this critique is more important, and pertains specifically to our trial and data. Özdemir & Stattin applied a latent growth modelling approach to the original metric of the drunkenness and consumption variables. Their description of these analyses is not very detailed, but as latent growth models require and benefit from multiple time-points, we assume that they used data from all three measurements (T1, T2 and T3). The major problem with the model that yielded the significant finding (i.e. that of life-time drunkenness) is that it includes data flawed by differential attrition at the second measurement occasion (T2).
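The data situation that motivates dichotomization can be illustrated with a small sketch. The simulated distribution below is entirely hypothetical (a zero-inflated frequency item of the general kind described, not the trial data); it shows only the mechanics of collapsing a heavily zero-inflated count into the binary any-versus-none form.

```python
import random

random.seed(1)

# Illustrative only: a hypothetical zero-inflated drunkenness-frequency item.
# Most respondents report zero occasions; a minority report 1-12 occasions.
# The 75% zero share and the count range are assumptions, not trial values.
n = 1000
frequencies = [0 if random.random() < 0.75 else random.randint(1, 12)
               for _ in range(n)]

zeros = frequencies.count(0)
print(f"share of zeros: {zeros / n:.2f}")  # heavily zero-inflated distribution

# Dichotomize: any drunkenness occasion versus none -- the binary form used
# when skewed counts do not respond well to transformation.
binary = [1 if f > 0 else 0 for f in frequencies]
print(f"prevalence (binary): {sum(binary) / n:.2f}")
```

The binary variable discards frequency information, which is the variance-reduction cost Özdemir & Stattin point to; the trade-off is a well-behaved outcome when the count categories above zero are too sparse to model.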
As reported in our paper, data at T2 showed significantly higher dropout rates in the control group (9.4 versus 6.5%, χ2 = 5.16, P < 0.05), but with a more selective attrition in the intervention group. That is, youth in the ÖPP condition who were lost to follow-up at T2 were significantly more likely than their completing counterparts to have reported life-time drunkenness at baseline (P = 0.01), while a corresponding difference was not present in the control group (P = 0.17). Such pre-test differences on the outcome of interest have been suggested to provide the best available estimate of the spurious effect that can be expected at post-test [7]. Thus, we assume that the life-time drunkenness variable at T2 is most probably biased in favour of the ÖPP group. Because the observed group differences in attrition rates and pre-test drunkenness status were not present at T3, we believe that for any analysis to produce reliable estimates of programme effects on the drunkenness variables, it should be based on T1 and T3 (9th grade) data only. T3 also has other benefits, such as greater sensitivity due to higher drunkenness frequencies than at T2 (in fact, significant programme effects were not observed until the 9th grade in the initial study of ÖPP). Against this background, we do not consider the re-analyses by Özdemir & Stattin to be a solid ground for rejecting the null hypothesis and changing our conclusion about programme effectiveness.
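The attrition comparison reported above can be approximately reproduced from the published figures. The arm sizes are not given in this letter, so the split of the 1750 participants into two arms of 875 below is an assumption; with rounding of the percentages, the resulting statistic is close to, but not identical with, the reported χ2 = 5.16.

```python
# Approximate check of the reported attrition chi-square (chi2 = 5.16, P < 0.05).
# Equal arms of 875 are an assumption (only the total n = 1750 is stated),
# so the result approximates rather than reproduces the published value.
n_control, n_opp = 875, 875
drop_control = round(0.094 * n_control)  # 9.4% dropout, control group
drop_opp = round(0.065 * n_opp)          # 6.5% dropout, OPP group

# 2x2 chi-square test of independence, without continuity correction:
a, b = drop_control, n_control - drop_control
c, d = drop_opp, n_opp - drop_opp
n = a + b + c + d
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(f"chi2 = {chi2:.2f}")  # in the vicinity of the reported 5.16
```

That the statistic lands near the published value under these assumed arm sizes is consistent with a roughly balanced randomization.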

With regard to the third criticism, we agree strongly that the test of mediating mechanisms is an important issue for trials of intervention effects. The encouraging results of these analyses suggest that our null findings are not due to an erroneous programme theory, but to other factors. As mentioned, one of these may be that national media campaigns have led to elevated levels of restrictiveness in Swedish parents in general. Finally, rather than adopting a one-sided search for reasons why this trial failed to replicate the effects identified in the first study, we point to the need for an independent evaluation which takes into account the total evidence base of all three studies. Meta-analytical procedures and a thorough assessment of risk of bias in the initial study, as well as in the two subsequent trials [1, 8], would improve the basis for the effect estimates and provide guidance on the degree of confidence to place in them.
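The pooling step such a synthesis would involve can be sketched as standard fixed-effect inverse-variance weighting. The effect estimates and standard errors below are hypothetical placeholders only, not values from the original ÖPP study or the two subsequent trials.

```python
import math

# Fixed-effect inverse-variance pooling across k trials -- a sketch of the
# meta-analytic step proposed above. The log odds ratios and standard errors
# are HYPOTHETICAL placeholders, not results from the three ÖPP studies.
estimates = [-0.40, 0.05, -0.10]   # hypothetical log odds ratios, one per trial
std_errors = [0.15, 0.12, 0.14]    # hypothetical standard errors

weights = [1 / se ** 2 for se in std_errors]            # inverse-variance weights
pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

print(f"pooled log OR = {pooled:.3f}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

A full synthesis would also weigh each trial's risk of bias and consider a random-effects model if between-study heterogeneity is non-trivial; the sketch shows only the basic pooling arithmetic.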

Declarations of interest


Maria Bodin
National Board of Health and Welfare, Stockholm, Sweden. E-mail:

