We present a detailed comparison between the well-known smoothed particle hydrodynamics (SPH) code GADGET and the new moving-mesh code AREPO on a number of hydrodynamical test problems. Through a variety of numerical experiments with increasing complexity we establish a clear link between simple test problems with known analytic solutions and systematic numerical effects seen in cosmological simulations of galaxy formation. Our tests demonstrate deficiencies of the SPH method in several sectors. These accuracy problems not only manifest themselves in idealized hydrodynamical tests, but also propagate to more realistic simulation set-ups of galaxy formation, ultimately affecting local and global gas properties in the full cosmological framework, as highlighted in companion papers by Vogelsberger et al. and Keres et al. We find that an inadequate treatment of fluid instabilities in GADGET suppresses entropy generation by mixing, underestimates vorticity generation in curved shocks and prevents efficient gas stripping from infalling substructures. Moreover, in idealized tests of inside-out disc formation, the convergence rate of gas disc sizes is much slower in GADGET due to spurious angular momentum transport. In simulations where we follow the interaction between a forming central disc and orbiting substructures in a massive halo, the final disc morphology is strikingly different in the two codes. In AREPO, gas from infalling substructures is readily depleted and incorporated into the host halo atmosphere, facilitating the formation of an extended central disc. Conversely, gaseous sub-clumps are more coherent in GADGET simulations, morphologically transforming the central disc as they impact it. The numerical artefacts of the SPH solver are particularly severe for poorly resolved flows, and thus inevitably affect cosmological simulations due to their inherently hierarchical nature. Taken together, our numerical experiments clearly demonstrate that AREPO delivers a physically more reliable solution.