An ensemble of models can be interpreted in two ways. The first treats each model as an approximation of the true system with some random error. Alternatively, the true system can be interpreted as a sample drawn from a distribution of models, such that model and truth are statistically indistinguishable. Both interpretations are ubiquitous and have different consequences for the uncertainty of model projections, yet the choice between them is rarely defended. Here we argue that the two seemingly conflicting views are in fact complementary, and that the interpretation of the ensemble may evolve seamlessly from the former to the latter. We show that some ‘truth plus error’-like properties exist for historical and present-day climate simulations in the CMIP archive, and that they can be explained by the ensemble design and tuning to observations, even though both the models and the tuning are imperfect. For future projections, structural differences in model response arise that are independent of the present-day state, so the ‘indistinguishable’ interpretation is increasingly favored. Our inability to define performance metrics that identify ‘good’ and ‘bad’ models can be explained by the models having largely exploited the available observations. The remaining model error is largely structural, and the observations are often uninformative for further reducing model biases or narrowing the range of projections covered by the ensemble. The discussion here is motivated by the use of multi-model ensembles in climate projections, but the arguments are generic to any situation where multiple different models constrained by observations are used to describe the same system.
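The statistical distinction between the two interpretations can be illustrated with a toy Monte Carlo sketch (an illustration only; the Gaussian distributions, unit spread, and ensemble sizes are assumptions, not quantities from this work). Under ‘truth plus error’, models scatter independently around a fixed truth, so the ensemble-mean error shrinks roughly like 1/sqrt(n). Under the ‘indistinguishable’ view, the truth is one more draw from the same distribution as the models, so the ensemble-mean error plateaus at the scale of the ensemble spread regardless of ensemble size.

```python
import random
from statistics import fmean

random.seed(0)

def mean_abs_err_truth_plus_error(n, trials=4000, sigma=1.0):
    """'Truth plus error': each model = fixed truth (0) + random error.
    Returns the average absolute error of the n-model ensemble mean."""
    total = 0.0
    for _ in range(trials):
        ensemble_mean = fmean(random.gauss(0.0, sigma) for _ in range(n))
        total += abs(ensemble_mean)  # truth is 0 by construction
    return total / trials

def mean_abs_err_indistinguishable(n, trials=4000, sigma=1.0):
    """'Indistinguishable': truth is one extra draw from the same
    distribution the n models are drawn from."""
    total = 0.0
    for _ in range(trials):
        truth = random.gauss(0.0, sigma)
        ensemble_mean = fmean(random.gauss(0.0, sigma) for _ in range(n))
        total += abs(ensemble_mean - truth)
    return total / trials

for n in (1, 5, 50):
    print(n,
          round(mean_abs_err_truth_plus_error(n), 3),   # shrinks with n
          round(mean_abs_err_indistinguishable(n), 3))  # plateaus near sigma
```

The contrast in the last column is the practical consequence noted in the text: under the exchangeable interpretation, adding models does not drive the ensemble-mean error toward zero, because the residual truth-minus-ensemble discrepancy is itself a draw of size sigma.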