Galaxies are (biased) tracers of the dark matter in the Universe. This mapping can be conveniently divided into two parts: the connection between galaxies and dark matter haloes (halo occupation statistics) and the relation between haloes and the underlying matter distribution. The former is the focus of this paper, in which we formulate the concept of non-linear and stochastic galaxy biasing in the framework of halo occupation statistics. Using two-point statistics in projection, we define the galaxy bias function, $b_g(r_p)$, and the galaxy–dark matter cross-correlation function, both as functions of the projected distance $r_p$. We use the analytical halo model to predict how the scale dependence of these two functions over the range $0.1 \lesssim r_p \lesssim 30\,h^{-1}\,{\rm Mpc}$ depends on the non-linearity and stochasticity of halo occupation models. In particular, we quantify the effects of the presence of central galaxies, the assumed radial distribution of satellite galaxies, the richness of the halo, and whether the number of satellite galaxies in a halo of given mass follows a Poisson distribution. Overall, brighter galaxies show a stronger scale dependence, which extends out to larger radii. In real space, we find that galaxy bias becomes scale independent for radii $r \geq 1$–$5\,h^{-1}\,{\rm Mpc}$, depending on luminosity. However, galaxy bias remains scale dependent out to much larger radii when one uses the projected quantities defined in this paper. These projected bias functions have the advantage that they are more easily accessible observationally, and their scale dependence carries a wealth of information about the properties of galaxy biasing. To constrain the parameters of the halo model observationally and to unveil the origin of galaxy biasing, we propose the use of a projected bias function that is obtained by combining weak gravitational lensing and galaxy clustering, and that can be measured with existing and forthcoming imaging and spectroscopic galaxy surveys.
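To make the projected quantities concrete, the following is a minimal sketch assuming the standard stochastic-bias definitions applied to projected correlation functions: $b_g = \sqrt{w_{gg}/w_{mm}}$ and a cross-correlation coefficient $r_{gm} = w_{gm}/\sqrt{w_{gg}\,w_{mm}}$. These definitions, the symbol `r_gm`, and the helper names `projected_bias` and `clustering_over_lensing` are illustrative assumptions rather than the paper's exact estimators. The sketch also assumes $w_{gm}$ has already been inferred from the lensing signal. Under these assumptions, the ratio $w_{gg}/w_{gm}$ equals $b_g/r_{gm}$ and requires no matter auto-correlation $w_{mm}$, which is why a combination of clustering and lensing is observationally attractive. The input numbers are a hypothetical toy model, not results.

```python
import numpy as np


def projected_bias(w_gg, w_gm, w_mm):
    """Bias quantities from projected auto- and cross-correlation functions.

    Assumed (standard stochastic-bias) definitions, not taken from the paper:
      b_g  = sqrt(w_gg / w_mm)            galaxy bias function
      r_gm = w_gm / sqrt(w_gg * w_mm)     galaxy-matter cross-correlation coefficient
    """
    w_gg, w_gm, w_mm = (np.asarray(w, dtype=float) for w in (w_gg, w_gm, w_mm))
    b_g = np.sqrt(w_gg / w_mm)
    r_gm = w_gm / np.sqrt(w_gg * w_mm)
    return b_g, r_gm


def clustering_over_lensing(w_gg, w_gm):
    """Ratio w_gg / w_gm, equal to b_g / r_gm under the assumed definitions.

    Needs only galaxy clustering (w_gg) and galaxy-galaxy lensing (w_gm).
    """
    return np.asarray(w_gg, dtype=float) / np.asarray(w_gm, dtype=float)


# Hypothetical toy model, not fitted to any data: power-law matter signal,
# constant bias b = 1.5, and a cross-correlation coefficient that drops below 1
# on small scales to mimic stochasticity inside haloes.
rp = np.logspace(-1, np.log10(30.0), 9)   # projected radius [h^-1 Mpc], 0.1 to 30
w_mm = 50.0 * rp**-0.8
b_true = 1.5
r_true = 1.0 - 0.3 * np.exp(-rp)          # tends to 1 on large scales
w_gg = b_true**2 * w_mm
w_gm = b_true * r_true * w_mm

b_g, r_gm = projected_bias(w_gg, w_gm, w_mm)
ratio = clustering_over_lensing(w_gg, w_gm)

for x, b, r, q in zip(rp, b_g, r_gm, ratio):
    print(f"rp={x:6.2f}  b_g={b:.3f}  r_gm={r:.3f}  b_g/r_gm={q:.3f}")
```

In this toy setup, $b_g$ is constant by construction, yet the clustering-to-lensing ratio rises on small scales because $r_{gm} < 1$ there. This illustrates how a scale-dependent projected bias function can carry information about stochasticity that $b_g$ alone would hide.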