Observations have shown that massive galaxies at high redshift have much smaller effective radii than galaxies of similar mass today; however, recent work has shown that they have similar central densities. The primary growth in size, therefore, relates to the apparent relative abundance of low-density material at low redshifts. Various models have been proposed to accomplish this, but the exact contribution of these mechanisms, relative to others that would, for example, lower the density of the system uniformly, or relate to possible observational misestimates of the stellar mass distribution, remains uncertain, as does the degree to which this evolution is driven by processes of initial spheroid formation versus subsequent ‘dry’ assembly of spheroids. These different possibilities also yield dramatically different constraints on any possible evolution in the M_BH–σ relation. Here, we compile observations of spheroid properties as a function of redshift and use them to test the different proposed models, each of which we have calibrated and studied in a suite of high-resolution hydrodynamic simulations. We show that the evolution in progenitor disc gas fractions with redshift gives rise to the initial formation of smaller spheroids at high redshift. We then consider how these early-forming systems must evolve to be consistent with the larger sizes of old spheroids today. We consider (1) equal-density ‘dry’ mergers, (2) later major or minor ‘dry’ mergers with less dense galaxies, (3) adiabatic expansion after significant gas mass loss, (4) gradients in stellar mass-to-light ratios from young nuclear stellar populations (yielding smaller R_e at early times, an effect that vanishes as the system fades), (5) biases in the stellar mass estimation of high-redshift (young) systems (from e.g. uncertain asymptotic giant branch starlight contributions) and (6) observational effects (possible biases in fitting or missed light from surface brightness dimming, or the effects of different definitions of effective radii). In principle, any of these models could be tuned to explain any observed effective radius evolution. However, the predicted evolution in velocity dispersions, central stellar mass surface densities and profile shape is very distinct for each. Comparing with observations, only model (2), later or minor ‘dry’ mergers with less dense systems, is consistent with the constraints as an explanation of the entire effect. Moreover, it is the only model that allows for any evolution in M_BH–σ towards more massive black holes (BHs) at high redshift. Still, the amount of merging needed for this to explain the observed factor of ∼6 size evolution is larger than that predicted by hierarchical growth and clustering constraints. We, therefore, consider a cosmologically motivated model with high-resolution simulations, in which the initial galaxy forms in a gas-rich merger and is observed at an appropriate age under representative conditions, then evolves while undergoing a ‘typical’ level of dry merging and mass loss. We show that this case is consistent with all the observational constraints without tension with cosmological expectations. Effect (2), which builds up an extended, low-density envelope, dominates the evolution (giving a factor of ∼2–3 in size evolution), but effects (1), (3), (4) and (possibly) (6) each contribute an additional ∼20 per cent size evolution (net factor of ∼2), together bringing the natural cosmological predictions into good agreement with the combination of observational constraints. We discuss implications for the evolution in correlations between BH and host bulge properties and show that this model naturally predicts some evolution similar to that observed; better observations of BH masses could also constrain host galaxy merger histories.
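
As a rough arithmetic check of the size budget quoted above, the following minimal sketch combines the dominant contribution from effect (2) with the four secondary effects. It assumes the contributions are independent and combine multiplicatively, which is an illustrative simplification and not a statement of how the simulations combine them; the function name `combined_size_growth` and its parameters are hypothetical.

```python
def combined_size_growth(dominant_factor, secondary_fraction=0.20, n_secondary=4):
    """Return (secondary_factor, total_factor) for the size growth.

    dominant_factor    -- growth factor from effect (2), quoted as ~2-3
    secondary_fraction -- fractional growth per secondary effect (~20 per cent)
    n_secondary        -- number of secondary effects: (1), (3), (4), (6)

    Assumption (not from the source): effects multiply independently.
    """
    secondary_factor = (1.0 + secondary_fraction) ** n_secondary
    return secondary_factor, dominant_factor * secondary_factor


if __name__ == "__main__":
    # Bracket the quoted range of ~2-3 for the dominant effect.
    for dominant in (2.0, 3.0):
        secondary, total = combined_size_growth(dominant)
        print(f"dominant x{dominant:.1f}, secondary x{secondary:.2f}, "
              f"total x{total:.2f}")
```

Under this assumption, four effects of ∼20 per cent each compound to a factor of about 2.1, in line with the quoted net factor of ∼2, and the total growth spans roughly 4 to 6, which brackets the observed factor of ∼6 at its upper end.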