Commentary on Brennan et al. (2011): Towards more interpretable evaluations


Brennan et al.'s review [1] highlights the need for more complex intervention models and multiple outcome measures in evaluations of interventions in and around licensed premises. In this commentary, I discuss applications of basic evaluation design principles that can address this need and improve the comparability and meaningfulness of research in this area.

Differentiating outcome(s)

Brennan et al. [1] reviewed evidence for two outcomes: disorder and severe intoxication. However, the effectiveness of an intervention may differ depending on the outcome examined. For example, responsible beverage service (RBS) programs may reduce intoxication under some circumstances (see [2]), but they are unlikely to reduce violence significantly: the effects of RBS on intoxication tend to be small, and intoxication is only one of several factors contributing to bar violence [3,4]. Other factors include staff behavior [5], environmental conditions [3,4] and behavioral norms [6]. Conversely, the Safer Bars program, which focuses exclusively on managing problem behavior rather than on RBS, reduced moderate–severe aggression [7], but would not necessarily be expected to reduce intoxication.

Linking interventions and outcomes

Both interventions and evaluations in this area could benefit from explicit logic models that link measures of implementation to mediating variables and ultimate outcomes. For example, the Safer Bars evaluation found significant improvements in knowledge and attitudes among bar staff who received training [8], but no significant environmental changes [5]. This pattern suggests that staff training, not environmental change, was the primary mechanism by which violence was reduced. Twelve-month follow-up data provided further support for this explanation: (a) venues with lower turnover of Safer Bars-trained managers and security staff had lower rates of aggression than venues with higher turnover [7], and (b) managers reported making few environmental changes following the program.

Community mobilization projects, in particular, often include multiple interventions undertaken simultaneously, making it difficult to identify which components are effective. A model that identifies mediating and moderating variables can contribute to a better understanding of how multiple-component programs work. With respect to moderating variables, for example, Wagenaar et al. [9] hypothesized that their community intervention would have a greater effect on 18–20-year-olds (the primary target group) than on 15–17-year-olds. However, they found similar effects for both age groups, suggesting that the intervention may have influenced community norms in addition to having effects attributable to the development of age-specific policies.

A logic model can also reveal unrealistic outcome expectations for interventions that are limited in scope or duration. For example, large-scale, multi-component community interventions may have substantial and measurable community-level impacts, whereas bar-level interventions are unlikely to have effects that are measurable at the community level, even if they are significant at the bar level, unless they are delivered community-wide. The STAD project (Stockholm Prevents Alcohol and Drug Problems) illustrates the importance of intervention potency and duration: over a 10-year time-frame, the effects of this multi-component program actually increased [10], in contrast to most other interventions, whose effects have tended to diminish over time [5].
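A simple worked calculation shows one reason why a bar-level effect can be diluted at the community level. Assume, hypothetically, that participating venues account for a share s of all incidents in the community, that the intervention reduces incidents in those venues by a relative amount r, and that it has no effect elsewhere (no displacement or spillover). The community-wide relative reduction R is then the product of the two; the numbers below are illustrative assumptions, not estimates from any study.

```latex
R_{\text{community}} = s \times r,
\qquad \text{e.g. } s = 0.10,\; r = 0.30 \;\Rightarrow\; R_{\text{community}} = 0.03
```

Under these assumptions, a 30% reduction in the participating venues translates into only a 3% reduction community-wide, which may be hard to distinguish from ordinary year-to-year variation in community-level counts.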

Avoiding confounding of implementation and outcome measures

Police arrest data are often used to estimate an intervention's effect on crime. However, these data are unlikely to provide a valid assessment when the intervention itself involves changes in police activity, because more intensive enforcement can increase arrests even if the underlying level of crime falls. For example, a police-led intervention [11] and a community intervention [12] that both involved enhanced policing found an increase in alcohol-related arrests following the intervention. The community study [12], however, also found a significant decrease in emergency room visits for assault, suggesting a real reduction in assaults, possibly mediated by enhanced policing.

Controlling threats to validity

Evaluations in this area sometimes ignore well-known threats to internal and external validity [13]. For example, when an intervention is introduced in response to a natural spike in problems, improvement following the intervention could be attributable to regression to the mean. This design problem is not solved by using a non-equivalent comparison area, because the comparison area was not selected on the basis of an unusually high level of problems and so would not be expected to show the same decline. Regression to the mean could account for the dramatic decline in assault rates following the first year of the Geelong Accord [14,15], although data from subsequent years suggest a possible real effect of the Accord.
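Regression to the mean can be illustrated with a small simulation. The sketch below is purely hypothetical: the function name `simulate_regression_to_mean`, the rates, the noise levels and the "top 10% of areas" trigger are arbitrary assumptions chosen for illustration, not data from the Geelong Accord or any other study. Each simulated area has a stable underlying assault rate and no intervention effect is modelled at all; areas whose first-year count falls in the top decile are treated as "intervention" areas, and the remaining areas serve as a non-equivalent comparison group.

```python
import random
import statistics


def simulate_regression_to_mean(n_areas=10_000, mean_rate=50.0, between_sd=5.0,
                                noise_sd=8.0, spike_quantile=0.9, seed=1):
    """Hypothetical simulation: annual assault counts with NO intervention effect.

    All parameter values are arbitrary assumptions for illustration.
    Returns the mean year-1 to year-2 change for areas selected because of a
    first-year spike ('intervention' areas) and for all other areas
    (a non-equivalent comparison group).
    """
    rng = random.Random(seed)
    # Each area has a stable underlying rate that does not change between years.
    true_rates = [rng.gauss(mean_rate, between_sd) for _ in range(n_areas)]
    # Observed count = underlying rate + independent year-to-year noise.
    year1 = [rate + rng.gauss(0.0, noise_sd) for rate in true_rates]
    year2 = [rate + rng.gauss(0.0, noise_sd) for rate in true_rates]

    # 'Intervene' only where year 1 was unusually high (top 1 - spike_quantile).
    threshold = sorted(year1)[int(spike_quantile * n_areas)]
    treated = [i for i in range(n_areas) if year1[i] >= threshold]
    comparison = [i for i in range(n_areas) if year1[i] < threshold]

    def mean_change(indices):
        # Average change from year 1 to year 2 within the given group of areas.
        return statistics.mean(year2[i] - year1[i] for i in indices)

    return mean_change(treated), mean_change(comparison)


if __name__ == "__main__":
    treated_change, comparison_change = simulate_regression_to_mean()
    print(f"Mean change, 'intervention' areas: {treated_change:+.1f}")
    print(f"Mean change, comparison areas:     {comparison_change:+.1f}")
```

Because the "intervention" areas were selected on an unusually high first-year count, their second-year counts tend to fall back toward their underlying rates, whereas the comparison areas change little. A naive comparison of the two groups would therefore credit a non-existent intervention with a decline.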

To conclude, implementing and evaluating interventions in the community poses many methodological challenges. As Brennan et al. suggest, one way to make sense of results from a diverse set of studies is research to ‘unify disparate measures of harm so that studies can be compared’. However, findings would be more interpretable if interventions and evaluations were based on logic models that clearly identify the process by which the intervention is expected to work and that distinguish between measures of implementation, mediation and outcome. Reviews could also use a logic model to frame their assessment of existing findings. Such an approach would allow meaningful reviews to be conducted without unnecessarily restrictive criteria that exclude important studies (e.g. [16,17]) solely because they do not meet specific design requirements.

Declaration of interests