Behaviour modification interventions are often delivered to participants in teaching subgroups. For example, experimental and control smoking cessation programmes may be given to 15 classes (subgroups) of 10 (otherwise independent) individuals each. We present general statistical tests and power estimates for comparing continuous outcomes from two interventions in settings where the magnitude of teaching subgroup heterogeneity, the number of subgroups and the subgroup size can differ between intervention arms. The methods are applied to data from a trial to reduce disease-transmitting sexual behaviour. The statistical impact of teaching subgroup heterogeneity increases as (a) the number of participants in a subgroup increases, and (b) the ratio of the ‘averaged experimental and control subgroup effect variance’ to the study subject variance increases. If plausible levels of teaching subgroup heterogeneity are ignored, the true sizes of tests with nominal 0.05 two-sided type I errors range from 0.055 to 0.47, while, when planning studies, estimated sample sizes are only 11.1–95.2 per cent of the true requirements. Copyright © 2002 John Wiley & Sons, Ltd.
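The dependence on subgroup size and variance ratio described above can be illustrated with a minimal sketch. It is not the general test developed in the paper: it assumes equal subgroup sizes and equal variances in both arms, a large-sample z-test, and the standard variance-inflation (design effect) result for clustered means. The function names and the symbols `m` (participants per subgroup) and `theta` (ratio of subgroup effect variance to study subject variance) are hypothetical and chosen only for illustration.

```python
import math


def variance_inflation(m, theta):
    """Ratio of the true variance of an arm mean to the variance assumed
    when subgroup effects are ignored.

    Assumption: equal subgroup size m and a common variance ratio
    theta = (subgroup effect variance) / (study subject variance).
    Equivalent to 1 + (m - 1) * rho with rho = theta / (1 + theta).
    """
    return (1.0 + m * theta) / (1.0 + theta)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def true_size_if_ignored(m, theta, z_crit=1.959964):
    """Approximate true two-sided type I error of a nominal 0.05 z-test
    whose standard error ignores subgroup heterogeneity."""
    inflation = variance_inflation(m, theta)
    return 2.0 * (1.0 - normal_cdf(z_crit / math.sqrt(inflation)))


def naive_sample_size_fraction(m, theta):
    """Fraction of the truly required sample size obtained when the
    sample size calculation ignores subgroup heterogeneity."""
    return 1.0 / variance_inflation(m, theta)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the trial data.
    for m in (5, 10, 20):
        for theta in (0.01, 0.05, 0.10):
            size = true_size_if_ignored(m, theta)
            frac = naive_sample_size_fraction(m, theta)
            print(f"m={m:2d} theta={theta:.2f} "
                  f"true size={size:.3f} naive n fraction={frac:.3f}")
```

Under these simplifying assumptions, both the true test size and the shortfall in the planned sample size grow with `m` and with `theta`, which matches the qualitative pattern in points (a) and (b). The exact figures reported in the abstract come from the paper's more general setting and need not be reproduced by this sketch.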