Free-air CO2 enrichment (FACE) experiments provide a remarkable wealth of data which can be used to evaluate and improve terrestrial ecosystem models (TEMs). In the FACE model-data synthesis project, 11 TEMs were applied to two decadelong FACE experiments in temperate forests of the southeastern U.S.—the evergreen Duke Forest and the deciduous Oak Ridge Forest. In this baseline paper, we demonstrate our approach to model-data synthesis by evaluating the models' ability to reproduce observed net primary productivity (NPP), transpiration, and leaf area index (LAI) in ambient CO2 treatments. Model outputs were compared against observations using a range of goodness-of-fit statistics. Many models simulated annual NPP and transpiration within observed uncertainty. We demonstrate, however, that high goodness-of-fit values do not necessarily indicate a successful model, because simulation accuracy may be achieved through compensating biases in component variables. For example, transpiration accuracy was sometimes achieved with compensating biases in leaf area index and transpiration per unit leaf area. Our approach to model-data synthesis therefore goes beyond goodness-of-fit to investigate the success of alternative representations of component processes. Here we demonstrate this approach by comparing competing model hypotheses determining peak LAI. Of three alternative hypotheses—(1) optimization to maximize carbon export, (2) increasing specific leaf area with canopy depth, and (3) the pipe model—the pipe model produced peak LAI closest to the observations. This example illustrates how data sets from intensive field experiments such as FACE can be used to reduce model uncertainty despite compensating biases by evaluating individual model assumptions.