We conducted an ensemble modeling exercise using the Terrestrial Observation and Prediction System (TOPS) to evaluate the uncertainty in carbon flux estimates that arises from structural differences among ecosystem models. The experiment ran public-domain versions of BIOME-BGC, LPJ, CASA, and TOPS-BGC over North America at 8 km resolution for the period 1982–2006. We developed the Hierarchical Framework for Diagnosing Ecosystem Models (HFDEM) to separate the simulated biogeochemistry into a cascade of three functional tiers and to examine their characteristics sequentially in climate (temperature–precipitation) space and in other variable spaces. Analysis of simulated annual gross primary production (GPP) in the climate domain indicates general agreement among the models: all show optimal GPP in regions where annual average temperature (T, °C) and annual total precipitation (P, mm) are related by P = 50T + 500. However, the models differ in both the magnitude and the distribution patterns of simulated GPP. For forests in cold/temperate regions, the GPP gradient along P = 50T + 500 ranges from ∼50 g C m−2 yr−1 °C−1 (CASA) to ∼125 g C m−2 yr−1 °C−1 (BIOME-BGC); for nonforests, the GPP distributions diverge even more. All models show a positive linear relationship between annual GPP and annual mean leaf area index (LAI). In BIOME-BGC and LPJ, this relationship produces a positive feedback in which LAI growth enhances GPP. The models constrain this feedback in different ways, which gives them different sensitivities to disturbances such as fire; these differing sensitivities contribute significantly to the GPP diversity described above. The ratio of independently simulated net primary production (NPP) to GPP is close to 50% on average; however, its distribution patterns vary significantly between models, reflecting the difficulty of estimating autotrophic respiration across climate regimes.
Although these results are specific to the model versions we tested, the methodology developed here could be applied to other modeling exercises.
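To make the climate-space diagnostic concrete, the Python sketch below estimates a GPP gradient along the line P = 50T + 500 from gridded output. It is not the HFDEM implementation. The `Cell` record, the tolerance `tol_mm` that defines "near the line", the optional cold/temperate cutoff `t_max`, and the use of an ordinary least-squares slope of GPP against T are all illustrative assumptions. The data in the usage example are synthetic, not TOPS output.

```python
"""Hypothetical sketch: GPP gradient along the optimal climate line P = 50T + 500.

Assumptions (not from the source): each grid cell carries annual mean
temperature T (deg C), annual total precipitation P (mm), and annual GPP
(g C m-2 yr-1); a cell is "on the line" if |P - (50T + 500)| <= tol_mm;
the gradient is an ordinary least-squares slope of GPP versus T.
"""
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Cell:
    t_c: float   # annual mean temperature, deg C
    p_mm: float  # annual total precipitation, mm
    gpp: float   # annual GPP, g C m-2 yr-1


def optimal_precip(t_c: float) -> float:
    """Precipitation on the optimal-GPP line P = 50T + 500."""
    return 50.0 * t_c + 500.0


def cells_near_line(cells: Iterable[Cell], tol_mm: float) -> List[Cell]:
    """Keep cells whose precipitation lies within tol_mm of the line."""
    return [c for c in cells if abs(c.p_mm - optimal_precip(c.t_c)) <= tol_mm]


def ols_slope(xs: List[float], ys: List[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("temperature has no spread")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def gpp_gradient_along_line(
    cells: Iterable[Cell], tol_mm: float = 100.0, t_max: Optional[float] = None
) -> float:
    """GPP gradient (g C m-2 yr-1 per deg C) for cells near the line.

    t_max (hypothetical) optionally restricts the fit to colder cells.
    """
    sel = [c for c in cells_near_line(cells, tol_mm)
           if t_max is None or c.t_c <= t_max]
    return ols_slope([c.t_c for c in sel], [c.gpp for c in sel])


if __name__ == "__main__":
    # Synthetic cells: GPP rises 100 units per deg C on the line;
    # off-line cells (800 mm wetter) are excluded by the tolerance.
    on_line = [Cell(t, optimal_precip(t), 500.0 + 100.0 * t) for t in range(-5, 16)]
    off_line = [Cell(t, optimal_precip(t) + 800.0, 0.0) for t in range(-5, 16)]
    print(gpp_gradient_along_line(on_line + off_line))  # 100.0
```

The same `ols_slope` helper could, under the same assumptions, be used to fit the linear GPP–LAI relationship mentioned above; separating cell selection from the regression keeps each diagnostic step independently testable.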