We discuss the use of the Bayesian evidence ratio, or Bayes factor, for model selection in astronomy. We treat the evidence ratio as a statistic and investigate its distribution over an ensemble of experiments, considering both simple analytical examples and some more realistic cases, which require numerical simulation. We find that the evidence ratio is a noisy statistic, and thus it may not be sensible to decide to accept or reject a model based solely on whether the evidence ratio reaches some threshold value. The odds suggested by the evidence ratio bear no obvious relationship to the power or Type I error rate of a test based on the evidence ratio. The general performance of such tests is strongly affected by the signal-to-noise ratio in the data, the assumed priors and the threshold in the evidence ratio that is taken as ‘decisive’. The comprehensiveness of the model suite under consideration is also very important. The usefulness of the evidence ratio approach in a given problem can be assessed in advance of the experiment, using simple models and numerical approximations. In many cases, this approach can be as informative as a much more costly full-scale Bayesian analysis of a complex problem.
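As an illustration of what treating the evidence ratio as a statistic can mean in practice, the following is a minimal sketch of a toy ensemble experiment. It is not taken from the analysis summarised above, and every specific in it is an assumption made for illustration: Gaussian data with known noise `sigma`, a null model with mean fixed at zero, an alternative model with a zero-centred Gaussian prior of width `tau` on the mean, and a 'decisive' threshold of ln B > 5. The function names `ln_bayes_factor`, `simulate` and `rate_above` are hypothetical. For this toy problem the evidence ratio is analytic, so its distribution over many simulated experiments, and the resulting Type I error rate and power, can be estimated directly.

```python
import numpy as np


def ln_bayes_factor(xbar, s2, tau2):
    """ln B10 for M1 (mu ~ N(0, tau2)) against M0 (mu = 0).

    xbar is the sample mean and s2 = sigma^2 / n is its sampling variance.
    Under M0 the sample mean is N(0, s2); under M1 it is N(0, s2 + tau2).
    All other factors of the likelihood cancel in the ratio.
    """
    return (0.5 * np.log(s2 / (s2 + tau2))
            + 0.5 * xbar**2 * (1.0 / s2 - 1.0 / (s2 + tau2)))


def simulate(mu_true, n=20, sigma=1.0, tau=1.0, n_trials=100_000, seed=0):
    """Return ln B10 for an ensemble of n_trials simulated experiments."""
    rng = np.random.default_rng(seed)
    s2 = sigma**2 / n
    # Draw the sample mean directly; it is a sufficient statistic here.
    xbar = rng.normal(mu_true, np.sqrt(s2), size=n_trials)
    return ln_bayes_factor(xbar, s2, tau**2)


def rate_above(ln_b, threshold):
    """Fraction of experiments whose ln B10 exceeds the threshold."""
    return float(np.mean(ln_b > threshold))


# Illustrative (assumed) 'decisive' threshold on the log evidence ratio.
threshold = 5.0

ln_b_null = simulate(mu_true=0.0)           # ensemble with M0 true
ln_b_alt = simulate(mu_true=0.5, seed=1)    # ensemble with a real signal

type_i = rate_above(ln_b_null, threshold)   # false 'decisive' detections
power = rate_above(ln_b_alt, threshold)     # true 'decisive' detections

print(f"Type I error rate: {type_i:.4f}")
print(f"Power:             {power:.4f}")
print(f"Scatter of ln B10 with signal present: {ln_b_alt.std():.2f}")
```

Re-running the sketch with different values of `n`, `tau` and `threshold` gives a quick, if crude, way to see how the signal-to-noise ratio, the assumed prior and the chosen threshold affect the performance of such a test before any data are taken.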