Desertification threatens 70% of all drylands worldwide by diminishing the provision of economic and ecosystem services. However, since long-term vegetation dynamics of semiarid ecosystems are difficult to study, the opportunities to evaluate desertification and degradation properly are limited. In this study, we tailored, calibrated and tested a spatially explicit simulation model (DINVEG) to describe the long-term dynamics of dominant grass and shrub species in the semiarid Patagonian steppe. We used inverse techniques to identify parameterizations that yield model outputs in agreement with detailed field data, and we performed sensitivity analyses to reveal the main drivers of long-term vegetation dynamics. Whereas many parameterizations (10–45%) matched single field observations (e.g. grass and shrub cover, species-specific density, aboveground net primary production [ANPP]), only a few parameterizations (0.05%) matched all field observations simultaneously. Sensitivity analysis pointed to demographic constraints for shrubs and grasses in the emergence and recruitment phase, respectively, which contributed to balanced shrub-grass abundances in the long run. Vegetation dynamics of simulations that matched all field observations were characterized by a stochastic equilibrium. The soil water content in the top layer (0–10 cm) during the emergence period was the strongest predictor of shrub densities and shrub population growth rates, and of grass growth rates. Grasses controlled shrub demography because grasses and juvenile shrubs overlap in resource use (i.e. both depend on the water content in the top layer). In agreement with field observations, ecosystem function buffered the strong variability in precipitation (a simulated coefficient of variation [CV] in ANPP of 16% vs a CV in precipitation of 33%). Our results show that seedling emergence and recruitment are critical processes for long-term vegetation dynamics in this steppe.
The methods presented here could be widely applied when data for direct parameterization of individual-based models are lacking, but data corresponding to model outputs are available. Our modeling methodology can reduce the need for long-term data sets when answering questions regarding community dynamics.
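
The inverse approach described above can be illustrated with a minimal sketch: sample candidate parameterizations, run the model for each, and retain only those whose outputs fall within the observed ranges for every field pattern at once. Everything in the block below is an assumption made for illustration. The function toy_model is a hypothetical stand-in and is not DINVEG, and the parameter names, prior ranges and observation ranges are invented values rather than data from this study.

```python
import random

# Hypothetical acceptance ranges for two field patterns (illustrative values only).
OBSERVED_RANGES = {
    "grass_cover": (0.20, 0.35),
    "shrub_cover": (0.08, 0.18),
}

# Hypothetical prior ranges for two illustrative demographic parameters.
PARAM_RANGES = {
    "grass_recruitment": (0.0, 1.0),
    "shrub_emergence": (0.0, 1.0),
}


def toy_model(params):
    """Hypothetical stand-in for one simulation run; this is NOT DINVEG."""
    g = params["grass_recruitment"]
    s = params["shrub_emergence"]
    # Each life form is suppressed by the other, mimicking competition.
    return {
        "grass_cover": 0.5 * g * (1.0 - 0.5 * s),
        "shrub_cover": 0.3 * s * (1.0 - 0.6 * g),
    }


def sample_params(rng):
    """Draw one parameterization uniformly from the prior ranges."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}


def matches(outputs, pattern):
    """True if the model output for one pattern lies within its observed range."""
    lo, hi = OBSERVED_RANGES[pattern]
    return lo <= outputs[pattern] <= hi


def inverse_filter(n_samples, seed=0):
    """Count single-pattern matches and collect parameterizations matching all patterns."""
    rng = random.Random(seed)
    single_matches = {pattern: 0 for pattern in OBSERVED_RANGES}
    accepted = []
    for _ in range(n_samples):
        params = sample_params(rng)
        outputs = toy_model(params)
        hits = {pattern: matches(outputs, pattern) for pattern in OBSERVED_RANGES}
        for pattern, ok in hits.items():
            single_matches[pattern] += int(ok)
        if all(hits.values()):
            accepted.append(params)
    return single_matches, accepted


if __name__ == "__main__":
    n = 2000
    single_matches, accepted = inverse_filter(n)
    for pattern, count in single_matches.items():
        print(f"{pattern}: {100.0 * count / n:.1f}% of parameterizations match")
    print(f"all patterns: {100.0 * len(accepted) / n:.1f}% match simultaneously")
```

Because a parameterization is accepted only when it matches every pattern, the accepted fraction can never exceed the smallest single-pattern fraction. This is the same qualitative contrast as the one reported above between single-observation and simultaneous matches, although the toy percentages carry no ecological meaning.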