We have used a combination of high-resolution cosmological N-body simulations and semi-analytic modelling of galaxy formation to investigate the processes that determine the spatial distribution of galaxies in cold dark matter (CDM) models and its relation to the spatial distribution of dark matter. The galaxy distribution depends sensitively on the efficiency with which galaxies form in haloes of different mass. In low-mass haloes, galaxy formation is inhibited by the reheating of cooled gas by feedback processes, whereas in high-mass haloes, it is inhibited by the long cooling time of the gas. As a result, the mass-to-light ratio of haloes has a deep minimum at the halo mass, ∼10¹² M⊙, associated with L* galaxies, where galaxy formation is most efficient. This dependence of galaxy formation efficiency on halo mass leads to a scale-dependent bias in the distribution of galaxies relative to the distribution of mass. On large scales, the bias in the galaxy distribution is related in a simple way to the bias in the distribution of massive haloes. On small scales, the correlation function is determined by the interplay between various effects, including the spatial exclusion of dark matter haloes, the distribution function of the number of galaxies occupying a single dark matter halo and, to a lesser extent, dynamical friction. Remarkably, these processes conspire to produce a correlation function in a flat, Ω₀ = 0.3, CDM model that is close to a power law over nearly four orders of magnitude in amplitude. This model agrees well with the correlation function of galaxies measured in the Automated Plate Measurement (APM) survey. On small scales, the model galaxies are less strongly clustered than the dark matter, whereas on large scales they trace the occupied haloes. Our clustering predictions are robust to changes in the parameters of the galaxy formation model, provided that only models which match the bright end of the galaxy luminosity function are considered.
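The abstract does not say what the "simple" large-scale relation is. One common form, assumed here for illustration, takes the galaxy bias to be the halo bias averaged over halo mass and weighted by the number of galaxies each halo hosts. The minimal sketch below computes that average over a few mass bins. The function name `galaxy_bias` and all input values are hypothetical and are not taken from the paper.

```python
def galaxy_bias(halo_abundance, mean_occupation, halo_bias):
    """Occupation-weighted mean halo bias over mass bins (assumed relation)."""
    if not (len(halo_abundance) == len(mean_occupation) == len(halo_bias)):
        raise ValueError("all inputs must have one entry per mass bin")
    # Each bin contributes in proportion to the galaxies it contains:
    # (number of haloes) x (mean number of galaxies per halo).
    weights = [n * occ for n, occ in zip(halo_abundance, mean_occupation)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("no galaxies in any mass bin")
    return sum(w * b for w, b in zip(weights, halo_bias)) / total


# Toy, made-up inputs for three halo mass bins (not values from the paper).
abundance = [100.0, 10.0, 1.0]   # relative number density of haloes
occupation = [0.0, 1.0, 10.0]    # mean number of galaxies per halo
bias = [0.7, 1.0, 2.0]           # halo bias in each bin

b_gal = galaxy_bias(abundance, occupation, bias)
print(b_gal)
```

In this toy setup the low-mass bin holds most of the haloes but no galaxies, so it does not contribute to the galaxy bias. This mirrors the abstract's statement that on large scales the galaxies trace the occupied haloes.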