This paper is part of a series on the multi-institutional North American Land Data Assimilation System (NLDAS) project. It compares and evaluates streamflow and water balance results from four land surface models (LSMs) within the continental United States. The LSMs were run for the retrospective period from 1 October 1996 to 30 September 1999, forced by atmospheric observations from the Eta Data Assimilation System (EDAS) analysis, measured precipitation, and satellite-derived downward solar radiation. All model runs were performed on a common 1/8° latitude-longitude grid and used the same database for soil and vegetation classifications. We evaluated these simulations against U.S. Geological Survey (USGS) measured daily streamflow data for 9 large basins and 1145 small- to medium-sized basins, ranging from 23 km² to 10,000 km², distributed over the NLDAS domain. Model runoff was routed with two schemes: a common distributed routing model and a lumped, optimized linear routing model. The diagnosis of the model water balance results reveals strengths and weaknesses in the models, our insufficient knowledge of the ad hoc parameters used for the model runs, the interdependence of model structure and model physics, and the lack of good forcing data in parts of the United States, especially in regions with extended snow cover. Overall, the differences between the LSM water balance terms are of the same magnitude as the mean water balance terms themselves. Modeled mean annual runoff differs between models by a factor of up to 4 in some regions, and the corresponding difference in mean annual evapotranspiration is about a factor of 2. The analysis of runoff timing for the LSMs demonstrates the importance of correct snowmelt timing: the resulting differences in streamflow timing can be up to four months. All LSMs underestimate runoff in areas with significant snowfall.
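As a rough illustration of what a lumped linear routing model does, the sketch below convolves a basin-mean runoff series with a normalized response function to produce streamflow at the outlet. The abstract does not state the form of the response function or how it was optimized, so the exponential kernel, the `timescale_days` parameter, and the function names here are illustrative assumptions, not the method used in the paper.

```python
import math


def exponential_kernel(timescale_days, n_days):
    """Hypothetical response function: exponential decay, normalized to sum to 1.

    Normalization means the kernel only delays and spreads runoff;
    it does not create or remove water.
    """
    weights = [math.exp(-k / timescale_days) for k in range(n_days)]
    total = sum(weights)
    return [w / total for w in weights]


def route_runoff(runoff, kernel):
    """Causal discrete convolution of daily runoff with the response kernel.

    The output is longer than the input by len(kernel) - 1 days so that
    the recession after the last runoff day is kept.
    """
    routed = [0.0] * (len(runoff) + len(kernel) - 1)
    for t, r in enumerate(runoff):
        for k, h in enumerate(kernel):
            routed[t + k] += r * h
    return routed


# Illustrative input: a single runoff pulse on day 1 (arbitrary units).
runoff = [0.0, 10.0, 0.0, 0.0, 0.0]
kernel = exponential_kernel(timescale_days=2.0, n_days=4)
routed = route_runoff(runoff, kernel)
```

In a scheme of this kind, "optimizing" the model would amount to adjusting the kernel (here, the single assumed `timescale_days` parameter) for each basin so that routed runoff best matches observed streamflow timing; total volume is unchanged by the routing step, which is why volume biases point back to the LSM water balance rather than to the routing.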