We use semi-analytic techniques to study the formation and evolution of brightest cluster galaxies (BCGs). We show the extreme hierarchical nature of these objects and discuss the limitations of simple ways to capture their evolution. In a model where cooling flows are suppressed at late times by active galactic nucleus (AGN) activity, the stars of BCGs are formed very early (50 per cent at z ∼ 5, 80 per cent at z ∼ 3) and in many small galaxies. The high star formation rates in these high-z progenitors are fuelled by rapid cooling, not by merger-triggered starbursts. We find that model BCGs assemble surprisingly late: half their final mass is typically locked up in a single galaxy after z ∼ 0.5. Because most of the galaxies accreted on to BCGs have little gas content and red colours, late mergers do not change the apparent age of BCGs. It is this accumulation of a large number of old stellar populations – driven mainly by the merging history of the dark matter halo itself – that yields the observed homogeneity of BCG properties. In the second part of the paper, we discuss the evolution of BCGs to high redshifts, from both observational and theoretical viewpoints. We show that our model BCGs are in qualitative agreement with high-z observations. We discuss the hierarchical link between high-z BCGs and their local counterparts. We show that high-z BCGs belong to the same population as the massive end of local BCG progenitors, although they are not in general the same galaxies. Similarly, high-z BCGs end up as massive galaxies in the local Universe, although only a fraction of them are actually BCGs of massive clusters.