Driven primarily by concern about rising levels of atmospheric CO2, ecologists and earth system scientists are collecting vast amounts of data related to the carbon cycle. These measurements are generally time-consuming and expensive to make, and research funding is increasingly hard to come by. Two questions are therefore important: “Which data streams provide the most valuable information?” and “How much data do we need?” These questions are relevant not only to model developers, who need observational data to improve, constrain, and test their models, but also to experimentalists and to those designing ecological observation networks.
Here we address these questions using a model–data fusion approach. We constrain a process-oriented forest ecosystem carbon-cycle model with 17 different data streams from the Harvard Forest (Massachusetts, USA). We iteratively rank each data stream according to its contribution to reducing model uncertainty. Results show the importance of some measurements commonly unavailable to carbon-cycle modelers, such as estimates of turnover times from different carbon pools. Surprisingly, many data streams are largely redundant in the presence of others and do not lead to a significant improvement in model performance. A few select data streams account for the largest reduction in parameter-based model uncertainty. Projections of future carbon cycling were poorly constrained when only hourly net-ecosystem-exchange measurements were used to inform the model. They were well constrained, however, with only 5 of the 17 data streams, even though many individual parameters remained unconstrained. The approach taken here should stimulate further cooperation between modelers and measurement teams and may be useful in setting research priorities and allocating research funds.
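The iterative ranking described above can be illustrated with a minimal sketch. The abstract does not specify the ranking procedure, so the greedy forward selection below is an assumption, as are the names `rank_streams`, `toy_uncertainty`, and `TOY_STREAMS`. The stream names other than hourly net ecosystem exchange and soil turnover, and all numerical values, are invented for illustration and are not results from the study. In the toy setup each stream adds precision to a single parameter, and uncertainty is the summed posterior variance across parameters.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def rank_streams(
    streams: Sequence[str],
    uncertainty: Callable[[FrozenSet[str]], float],
) -> List[Tuple[str, float]]:
    """Greedy forward selection (an assumed procedure, not the paper's).

    At each step, add the stream that gives the lowest uncertainty when
    combined with the streams already selected. Returns the streams in
    ranked order, each paired with the uncertainty after adding it.
    """
    selected: List[str] = []
    remaining = list(streams)
    ranking: List[Tuple[str, float]] = []
    while remaining:
        best = min(
            remaining,
            key=lambda s: uncertainty(frozenset(selected + [s])),
        )
        selected.append(best)
        remaining.remove(best)
        ranking.append((best, uncertainty(frozenset(selected))))
    return ranking


# Hypothetical toy streams: name -> (parameter constrained, precision added).
# All values are made up for illustration only.
TOY_STREAMS = {
    "hourly_nee": ("photosynthesis", 4.0),
    "daily_nee": ("photosynthesis", 3.0),
    "soil_turnover": ("soil_turnover", 9.0),
    "leaf_area": ("allocation", 1.0),
}
PRIOR_PRECISION = 1.0  # assumed prior precision for every parameter


def toy_uncertainty(subset: FrozenSet[str]) -> float:
    """Sum of posterior variances over parameters for a set of streams."""
    precision = {param: PRIOR_PRECISION for param, _ in TOY_STREAMS.values()}
    for name in subset:
        param, added = TOY_STREAMS[name]
        precision[param] += added
    return sum(1.0 / p for p in precision.values())


ranking = rank_streams(list(TOY_STREAMS), toy_uncertainty)
for name, remaining_uncertainty in ranking:
    print(f"{name}: {remaining_uncertainty:.3f}")
```

In this toy case `daily_nee` is ranked last because `hourly_nee` already constrains the same parameter, which mirrors the kind of redundancy between data streams noted above. A real application would replace `toy_uncertainty` with a measure derived from the constrained model, for example the spread of its projections.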