The mixed-model analysis of late-stage variety evaluation trials is described, using either a one-stage or a two-stage approach. Two sets of trials, from Australia and the UK, provided realistic scenarios for a simulation study that evaluated the different methods of analysis. The study showed that, across a range of models, a one-stage approach gave the most accurate predictions of variety performance, both overall and within each environment, as measured by mean squared error of prediction or realized genetic gain. A weighted two-stage approach performed adequately for variety predictions both overall and within environments, but an unweighted two-stage approach performed poorly in both cases. A generalized heritability measure was developed to compare methods.
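The difference between weighted and unweighted two-stage analysis can be illustrated with a deliberately simplified sketch. It assumes that stage one has already produced, for a single variety, an estimated mean and the variance of that estimate in each trial, and that stage two is a plain inverse-variance average rather than the full mixed model described here. The function name `combine_trial_means` and all numbers are hypothetical and chosen only for illustration.

```python
def combine_trial_means(means, variances, weighted=True):
    """Combine one variety's stage-one means across trials.

    means     -- per-trial estimated means for the variety
    variances -- per-trial variances of those estimates (squared standard errors)
    weighted  -- if True, weight each trial by 1 / variance; otherwise weight equally

    Simplifying assumption: a fixed-effect average stands in for the
    second-stage mixed model.
    """
    if not means or len(means) != len(variances):
        raise ValueError("means and variances must be non-empty and of equal length")
    if weighted:
        # Precise trials (small variance) receive more weight.
        weights = [1.0 / v for v in variances]
    else:
        # The unweighted analysis ignores stage-one precision entirely.
        weights = [1.0] * len(means)
    total_weight = sum(weights)
    return sum(w * m for w, m in zip(weights, means)) / total_weight


# Hypothetical stage-one results for one variety in three trials;
# the third trial is much less precise than the other two.
trial_means = [5.0, 6.0, 9.0]
trial_variances = [0.25, 0.25, 4.0]

weighted_mean = combine_trial_means(trial_means, trial_variances, weighted=True)
unweighted_mean = combine_trial_means(trial_means, trial_variances, weighted=False)
print(weighted_mean, unweighted_mean)
```

In this toy case the imprecise third trial pulls the unweighted mean upward, while the weighted mean stays close to the two precise trials. This shows one way in which discarding stage-one precision can distort the combined prediction.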