Large-scale hydrological models, which simulate the terrestrial water cycle at continental and global scales, are fundamental to many studies in the Earth system sciences. However, because knowledge of real-world systems is imperfect, the models cannot be expected to capture all aspects of large-scale hydrology equally well. To gain insight into the strengths and shortcomings of nine large-scale hydrological models, we assessed their ability to capture the mean annual runoff cycle. Unlike most other studies, which rely on discharge observations from continental-scale river basins, our study is based on observed runoff from a large number of small, near-natural catchments in Europe. We evaluated the models' ability to capture the average magnitude, the amplitude, and the timing of the mean annual runoff cycle. The study revealed large uncertainties in modeled runoff from these small catchments. Model performance differed widely; however, the ensemble mean (the mean of all model simulations) yielded rather robust predictions. Model performance varied systematically with climatic conditions and was best in regions with little influence of snow. In cold regions, many models exhibited low correlations between observed and simulated mean annual cycles, which can be associated with shortcomings in simulating the timing of snow accumulation and melt. Local (grid-cell) scale differences between observed and simulated runoff can be large, and local biases often exceeded 100%. These local uncertainties contrast with a relatively good regional average performance, which ultimately reflects the purpose of the models: to capture regional hydroclimatology.
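The three aspects of the mean annual runoff cycle named above (average magnitude, amplitude, and timing) and the ensemble mean can be illustrated with a minimal Python sketch. The specific measures are assumptions chosen for illustration and are not necessarily those used in the study: relative bias of the mean for magnitude, relative difference in the max-min range for amplitude, and Pearson correlation for timing. All function names and the runoff values are hypothetical.

```python
import math


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def relative_bias(obs, sim):
    """Difference in mean runoff, in percent of the observed mean (magnitude)."""
    return 100.0 * (mean(sim) - mean(obs)) / mean(obs)


def relative_amplitude_difference(obs, sim):
    """Difference in the max-min range of the cycle, in percent (amplitude).

    Assumption: amplitude is taken as the range of the 12 monthly means.
    """
    obs_range = max(obs) - min(obs)
    sim_range = max(sim) - min(sim)
    return 100.0 * (sim_range - obs_range) / obs_range


def pearson_correlation(obs, sim):
    """Pearson correlation between two cycles (used here as a timing measure)."""
    mean_obs, mean_sim = mean(obs), mean(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov / math.sqrt(var_obs * var_sim)


def ensemble_mean(simulations):
    """Month-by-month mean over the cycles of several models."""
    return [mean(month_values) for month_values in zip(*simulations)]


# Made-up mean annual cycles (12 monthly means, mm/month) for one grid cell.
observed = [30, 28, 35, 60, 90, 70, 45, 35, 30, 32, 34, 31]
# Hypothetical model A: same values, but the snowmelt peak arrives one month early.
model_a = observed[1:] + observed[:1]
# Hypothetical model B: correct timing, but twice the observed runoff.
model_b = [2 * value for value in observed]

for name, cycle in [("A", model_a), ("B", model_b),
                    ("ensemble", ensemble_mean([model_a, model_b]))]:
    print(name,
          round(relative_bias(observed, cycle), 1),
          round(relative_amplitude_difference(observed, cycle), 1),
          round(pearson_correlation(observed, cycle), 2))
```

In this toy case, model A has no bias in the mean but a degraded correlation because its peak is shifted, whereas model B has perfect correlation but a large bias. Separating the three measures makes it possible to tell such errors apart.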