The Role of Data Assimilation in Predictive Ecology

In this rapidly changing world, improving the capacity to predict future dynamics of ecological systems and their services is essential for better stewardship of the earth system. Prediction relies on models that describe our understanding of the major processes that underlie system dynamics and data about these processes and the present state of ecosystems. Prediction becomes more effective when models are well informed by data. A technological revolution in the capacity to collect data now provides very different opportunities to test hypotheses and project future dynamics than when many standard statistical tests were first developed. Data assimilation is an emerging statistical approach to combine models with data in a rigorous way to constrain model parameters and system states, identify model error, and improve ecological prediction. In this paper, we illustrate how data assimilation can improve ecological prediction to support decision-making by reviewing applications of data assimilation across four different research fields: (1) emerging infectious disease, (2) fisheries, (3) fire, and (4) the terrestrial carbon cycle. Across these fields, data assimilation substantially improves prediction accuracy, highlighting its important role in enabling predictive ecology. Data assimilation with regional and global models faces major challenges, such as the large number of parameters to be estimated, high computational demands, the need to integrate multiple and heterogeneous data sets, and complex social-ecological interactions. Nevertheless, data assimilation provides an important statistical approach that has great potential to enhance the predictive capacity of ecological models in a changing climate. Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


INTRODUCTION
We live in a period marked by numerous environmental challenges: rapid climate change, profound alteration of biogeochemical cycles, unsustainable depletion of natural resources, rapid spread of invasive species, emerging infectious disease, and unprecedented threats from natural and anthropogenic disturbance (IPCC 2007). Predicting future changes in natural resources and the environment is therefore critical to effectively inform policy-makers if society hopes to continue to extract and use natural resources to support local-to-global economies and sustain thriving human societies. Toward that end, we, as a research community, need to advance the predictive capacity of ecology to better anticipate future states and services of ecosystems and enable a better stewardship of the earth system.
Despite the great societal need to predict future changes in the environment and natural resources under global change, predictive ecology has not been well developed, partly because we still lack effective approaches to improve the accuracy of ecological prediction. Data assimilation (DA) has the potential to enable and empower predictive ecology (Luo et al. 2011). DA refers to a suite of statistical techniques used to improve process models based on data. A process model describes how a system works by characterizing major components and their interactions under various forcing scenarios. Like regression analysis, DA combines a model with data, aims to minimize differences between data and model, and uses an algorithm to obtain the parameter values that minimize those differences (Lewis et al. 2006). Unlike simple regression, DA can be applied to complex process models and multiple heterogeneous data sets, can optimize tens or hundreds of parameters and state variables simultaneously, and has the capacity to navigate complex parameter spaces. As such, a DA-trained process model can not only better describe observed dynamics of an ecological system but can also provide improved predictions of future states of the system in a manner consistent with process understanding (Luo et al. 2011, Keenan et al. 2012). For example, DA has long been used successfully to improve numerical weather prediction. The capacity of DA to improve ecological predictions, however, has not been fully explored.
This paper examines how DA may enable and empower predictive ecology. We first discuss approaches used to predict future states of ecological systems. Then we use examples from four research areas (infectious diseases, fisheries, fire, and the carbon cycle) to show that DA can make predictive ecology possible by improving ecological predictions to the extent that prediction results can guide policy making. We conclude with a discussion of the future challenges and opportunities for further development of DA to enable predictive ecology.

APPROACHES TO PREDICTING FUTURE STATES OF ECOLOGICAL SYSTEMS
Ecological prediction involves describing the future state of ecological systems with fully specified uncertainties (Clark et al. 2001). There are two types of ecological prediction: (1) classical prediction of the most likely future state of an ecological system, conditioned on current observations and trends; and (2) projections, which are "what-if" analyses of the most likely future state of a system under explicit scenarios of climate, land use, human population, technologies, and economic activity (Luo et al. 2011). Classical prediction may be applied to fast-evolving systems, such as the spread of an infectious disease, whose dynamics are strongly governed by their current state, whereas projection becomes necessary when alternative scenarios, such as disturbance impacts on ecosystem carbon dynamics, are plausible. Both types of prediction must quantify the past and current states of ecological systems as a starting point and then use models to project the future dynamics.
Traditionally, models have been the primary tool for predicting the future states of ecological systems. For example, biogeochemical models have been incorporated into earth system models to predict responses and feedbacks of the terrestrial carbon cycle under climate-change scenarios (Lawrence and Swenson 2011). Those predictions have been incorporated into the assessment reports of the Intergovernmental Panel on Climate Change (IPCC) to guide mitigation and adaptation efforts by governments and the public (IPCC 2007, 2013). The models used have incorporated the dominant processes of ecological systems in order to quantitatively explore ecosystem dynamics. Each of these models usually projects one deterministic trajectory of the future behavior of the system. The trajectory, in most cases, does not capture the dynamics of the ecosystem in the real world because using fixed values for all the parameters in one deterministic model does not account for uncertainty in the parameters and state variables. Therefore, models alone often do not represent the past and current system dynamics closely enough to allow confidence in their predictions (Schwalm et al. 2010, Dietze et al. 2011, Keenan et al. 2012).
Predictive ecology needs both process models, to represent key processes that determine the dynamic behavior of an ecological system, and data, to identify those key processes and constrain model parameters and state variables via data assimilation. DA treats the model structure and ranges of parameter values as prior information in a Bayesian framework to represent the current state of knowledge. It uses global optimization techniques to update parameters and state variables of a model based on information contained in multiple, heterogeneous data sets that describe the past and current states of an ecosystem (Fig. 1). The posterior distributions of parameters estimated through DA usually include the maximum likelihood estimates and are used for forward modeling toward prediction. The probability density function of predicted future states after DA usually has a narrower spread than that without DA for the same model structure and priors (Weng and Luo 2011, Keenan et al. 2013). When data do not match the model structure and/or priors, data sets should not be assimilated into a model before the model structure and/or priors are reexamined and adjusted (LeBauer et al. 2013).
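The narrowing of the posterior relative to the prior can be illustrated with the simplest possible case. The sketch below is not taken from any of the cited studies: it assumes a single parameter with a Gaussian prior and independent Gaussian observation errors, and the function name and all numbers are hypothetical. Real DA applications replace this closed-form update with a process model and a numerical sampler or filter.

```python
def normal_update(prior_mean, prior_var, obs, obs_var):
    """Combine a Gaussian prior with Gaussian observations of the same quantity."""
    # Precisions (inverse variances) add; each observation contributes 1 / obs_var.
    post_precision = 1.0 / prior_var + len(obs) / obs_var
    post_var = 1.0 / post_precision
    # The posterior mean is the precision-weighted average of prior and data.
    post_mean = post_var * (prior_mean / prior_var + sum(obs) / obs_var)
    return post_mean, post_var


# Hypothetical prior on a parameter and four hypothetical measurements of it.
prior_mean, prior_var = 0.5, 0.04
obs = [0.30, 0.34, 0.28, 0.32]
obs_var = 0.01

post_mean, post_var = normal_update(prior_mean, prior_var, obs, obs_var)
print(f"prior:     mean={prior_mean:.3f}, variance={prior_var:.4f}")
print(f"posterior: mean={post_mean:.3f}, variance={post_var:.4f}")
```

Under these assumptions the posterior variance is always smaller than the prior variance, and the posterior mean moves from the prior toward the data in proportion to their relative precision.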
DA techniques have long been successfully applied to improve the accuracy of weather forecasts by initializing an atmospheric model with estimates of recent and current weather that form a basis for weather forecasts for the next few days (Kalnay 2002). Numerical weather forecasting was first attempted in the 1920s but did not produce realistic forecasts until the 1950s when DA could be done by computer simulation. Since then weather forecasts have improved steadily and are currently much more accurate. This great success results from advances in (1) mathematical models to represent physical processes of atmosphere dynamics, (2) improved observation technologies, and (3) DA to infuse observations into models to continuously update predictions. In DA, both model and data uncertainty are fully specified, as the information content of a prediction is inversely proportional to its uncertainty (Chatfield 1995, Clark et al. 2001, Liu and Gupta 2007). It is also important to understand which processes drive predictive uncertainty (Weng and Luo 2011, Dietze et al. 2013, LeBauer et al. 2013). Thus, data-model integration via DA acknowledges the critical importance of not just data quantity, but also the uncertainties in both the model and the data, in providing a predictive understanding. Overall, DA can improve ecological prediction by (1) providing estimates of parameters, initial values, and state variables, (2) quantifying uncertainties with respect to those parameters, initial conditions, and modeled states of an ecological system, (3) selecting alternative model structures, and (4) providing a quantitative basis to evaluate sampling strategies for future experiments and observations that will enable improvement to models and predictions (Luo et al. 2011).
This paper mentions a few DA techniques, which mostly are variants of the Kalman filter (KF). The KF is a recursive algorithm for estimating initial conditions, parameters, and state variables of a model from a series of heterogeneous, intermittent measurements (Kalman 1960). The KF iteratively repeats two sequential steps: forecast and update. The forecast step evolves the currently estimated ecosystem state forward in time using the model. The update step adjusts target parameters by combining observations of the current state of a system with the results from a model. The Ensemble Kalman Filter (EnKF) is a commonly used and flexible variation on the KF that uses Monte Carlo techniques to generate ensemble predictions for the forecast step (Gao et al. 2011). For a detailed description of DA techniques and principles, please refer to classical books by Kalnay (2002), Lewis et al. (2006), and Evensen (2007) or some of the published papers, including those by Wang et al. (2009), Harrison (2011), and Peng et al. (2011).
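The forecast and update steps can be sketched for a one-dimensional linear system. This is a minimal illustration rather than the implementation used in any cited study: the function name, the model x[t] = a * x[t-1] plus noise, and all numbers are assumptions. Missing measurements are marked with None to mimic intermittent data; parameters could be estimated alongside the state by appending them to the state vector, which this scalar sketch does not do.

```python
def kalman_filter(observations, x0, p0, a, q, r):
    """Scalar Kalman filter.

    Assumed model: x[t] = a * x[t-1] + process noise (variance q)
                   y[t] = x[t] + observation noise (variance r)
    Entries of None in observations mark time steps without a measurement.
    Returns a list of (state estimate, estimate variance) pairs.
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Forecast step: evolve the state and its variance with the model.
        x = a * x
        p = a * p * a + q
        if y is not None:
            # Update step: weight model and data by their uncertainties.
            gain = p / (p + r)
            x = x + gain * (y - x)
            p = (1.0 - gain) * p
        estimates.append((x, p))
    return estimates


# Hypothetical measurements of a slowly varying quantity, one of them missing.
observations = [10.2, None, 9.7, 10.1]
estimates = kalman_filter(observations, x0=8.0, p0=4.0, a=1.0, q=0.1, r=0.5)
for t, (x, p) in enumerate(estimates):
    print(f"t={t}: estimate={x:.2f}, variance={p:.3f}")
```

In this sketch the variance shrinks whenever a measurement is assimilated and grows by q when a step has no data, which is the sense in which the filter carries uncertainty forward in time.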

APPLICATION OF DA IN PREDICTIVE ECOLOGY
As a relatively new technique in ecology, DA has only recently been applied to a breadth of research issues. In this section, we review examples of its applications in four diverse fields of ecological research. In each field, we briefly describe the system and research issues, the process-based models used to describe those systems, and the effectiveness of DA in constraining the models and their predictions.

Infectious disease
The study of epidemics has long recognized the importance of connecting observations with theory. DA is a powerful tool for rigorously establishing these connections, allowing predictions of the trajectory of infectious disease outbreaks, an excellent example of the importance of DA to societally relevant predictive ecology. The dynamics of infectious diseases are classically described by Susceptible-Infected-Removed (SIR) models, which predict threshold responses to both population size and R0, the ratio between contagion and recovery rate (Kermack and McKendrick 1927). Given these theoretical thresholds, DA has historically been solely focused on estimating the parameters of SIR models (LaDeau et al. 2011). However, one of the primary challenges in disease prediction is that in the early stages of an outbreak the key parameters almost always have to be estimated from very few data points. Even with well-studied human diseases like influenza, key parameters still need to be estimated each year from limited information (Hooten et al. 2010). Because the onset of an outbreak is also stochastic, non-linear, and sensitive to initial conditions, accurate prediction of outbreaks remains challenging, whether it is for an animal, plant, or human disease.
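The threshold behavior of the SIR model can be seen in a short simulation. The sketch below is illustrative only: it assumes a closed population expressed as fractions, a simple forward-Euler integration, and hypothetical rate values, with R0 taken as beta / gamma. In a density-dependent formulation the contagion term scales with population size, which is one way the population-size threshold arises.

```python
def simulate_sir(beta, gamma, i0, days, dt=0.1):
    """Forward-Euler integration of the SIR equations on population fractions.

    beta is the contagion rate and gamma the recovery rate (both per day).
    Returns final susceptible, infected, and removed fractions and the peak
    infected fraction reached during the run.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    peak_i = i
    for _ in range(int(round(days / dt))):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak_i = max(peak_i, i)
    return s, i, r, peak_i


gamma = 0.25  # hypothetical recovery rate: mean infectious period of 4 days
results = {}
for beta in (0.2, 0.5):  # hypothetical contagion rates giving R0 = 0.8 and 2.0
    results[beta] = simulate_sir(beta, gamma, i0=0.001, days=300)
    s, i, r, peak_i = results[beta]
    print(f"R0={beta / gamma:.1f}: peak infected={peak_i:.3f}, ever infected={r:.3f}")
```

With R0 below 1 the infected fraction only declines from its initial value, whereas with R0 above 1 the same initial seeding produces a large outbreak. This sensitivity is why early, data-poor estimates of the contagion and recovery rates matter so much for forecasts.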
A growing body of DA literature goes beyond just parameter estimation to demonstrate the potential for real-time estimation and prediction of both parameters and state variables (such as the number of people currently infected). In real-time estimation, updates to forecasts are generated automatically based on new observations as soon as those observations are made. For example, the emergence of real-time data on human influenza indirectly from Google Flu Trends has spawned multiple attempts at real-time assimilation (Dukic et al. 2012, Shaman and Karspeck 2012). The real-time modeling during the 2001 foot-and-mouth epidemic in the UK (Keeling et al. 2003, Tildesley et al. 2008) contributed substantially to decisions to restrict animal movement and cull livestock populations. These actions have been credited with helping to control the outbreak, and more generally successes like these have contributed to progress of disease modeling toward a more predictive science (Tildesley et al. 2008).
The real-time daily forecast of the 2009 H1N1 outbreak in Singapore is an excellent example of the use of DA to quickly create an operational forecast system (Fig. 2; Ong et al. 2010). This largely grass-roots effort was pulled together within a month with no budget during the early stages of the outbreak by researchers at local hospitals and universities who asked local clinics to report cases of influenza-like illness (Ong et al. 2010). A Susceptible-Exposed-Infectious-Removed model was updated daily using a sequential DA approach to evolve the estimates of both the model parameters and the numbers of individuals in each state. The parameters of the model are estimated within the Bayesian statistical paradigm in which semi-informative prior distributions are assigned to parameters and incoming data are incorporated via the likelihood function to obtain a time series of posterior distributions for the parameters and unobserved state space. By doing so, DA increased the confidence in parameter estimates over time, while allowing parameters to change as the epidemic evolved. From these daily updated states and parameters, probabilistic forecasts of the epidemic's progression were generated for the coming month and forwarded to a public website. Initial forecasts were adversely affected by uncertainty in the parameters, caused by the vagueness of the subjective prior distributions and the scarcity of information from the data (Ong et al. 2010). As a result, early in the epidemic the model tended to over-predict the magnitude of the outbreak. However, the accuracy improved through time. The model correctly predicted the timing of the peak infection weeks ahead of time and provided a remarkably accurate forecast of the declining phase of the epidemic. These forecasts were publicized by the local media and are believed to have contributed to increased transparency, improved risk communication and mitigation, and reduced panic.
While much of the above discussion focused on human diseases, there is also a growing body of examples of DA in plant, animal, and zoonotic disease systems. Outside the context of real-time forecasting, Bayesian state-space models have been used for combined parameter and state estimation in systems ranging from measles in sub-Saharan Africa (Ferrari et al. 2008), to white pine blister rust in the greater Yellowstone ecosystem (Hatala et al. 2011), to chronic wasting disease of North American mule deer (Farnsworth et al. 2006). More generally, hierarchical models are being used in disease modeling to better partition different sources of uncertainty, such as separating process variability from observation error (LaDeau et al. 2011).

Fisheries
Sustainable exploitation of marine resources, such as commercial fisheries and fish farming, is becoming increasingly important for economic development. Given past fishery collapses, decision-makers need information on current and potential future states of fish stocks. DA techniques, such as the Kalman filter, were first applied to fisheries models in the 1990s (Schnute 1991). Now, a wide array of DA techniques has been applied to fishery models that generally describe population dynamics and spatial distributions of fish presence and absence.
Fish stock assessment models are designed to determine the effects of the fishery on fish populations and usually include the demographic processes of birth, natural death, harvest, growth, maturation, and movement. The assessment models can be constrained either by time series of fish catch, to infer current and target fish stock abundance and the maximum sustainable yield, or by a time series of detailed fishery catch-at-age data, to reconstruct the virtual abundance of each annual cohort that had been fished (Methot and Wetzel 2013). For example, after using DA to integrate standardized catch-per-unit-of-effort (CPUE) data into stock assessment, the prediction uncertainty was reduced for simulation of the annual variation of trevally (Pseudocaranx dentex) off the west coast of New Zealand (Maunder and Langley 2004).
The Kalman filter (KF) is the most common DA technique used to improve fish stock predictions. Holt and Peterman (2004) analyzed 24 sockeye salmon stocks and compared the mean square error (MSE) and percent bias of predicted recruitments (predicted relative to observed) estimated with and without KF. They found that DA lowered MSE for about 35% of the stocks and yielded bias closer to zero. Gronnevik and Evensen (2001) used DA for fisheries modeling and stock assessment. Three DA techniques, EnKF, ensemble smoother (which uses data backward in time to improve the estimates at prior times), and ensemble Kalman smoother (an extension of EnKF that uses data to improve the estimates at prior times and works better with nonlinear dynamics), were applied to an age-structured population model using catch-at-age data for Icelandic cod. All three estimates that used DA had lower variance of prediction than that with no data assimilation (Fig. 3) (Gronnevik and Evensen 2001).
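The two skill measures named above can be computed as follows. This is a sketch under stated assumptions: percent bias is taken here as the mean relative error times 100, which is one common definition and may differ from the exact formula used by Holt and Peterman (2004), and the recruitment numbers are hypothetical values chosen only to show the arithmetic.

```python
def mean_square_error(predicted, observed):
    """Average squared difference between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def percent_bias(predicted, observed):
    """Mean relative error in percent; positive values indicate over-prediction."""
    return 100.0 * sum((p - o) / o for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical recruitment values (arbitrary units), not data from any study.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 440.0]

mse = mean_square_error(predicted, observed)
bias = percent_bias(predicted, observed)
print(f"MSE = {mse:.1f}, percent bias = {bias:.2f}%")
```

Computing both measures for forecasts made with and without a filter, stock by stock, gives the kind of comparison summarized in the paragraph above.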
In addition to stock assessment models, species distribution modeling is widely used to estimate the presence and absence of fish species using the geographical and environmental characteristics of each fishing location. To better predict fish occurrence, some models, such as a hierarchical Bayesian spatial model (Munoz et al. 2013) and generalized linear and additive models (GLM and GAM) (Guisan et al. 2002), were improved by DA in terms of either model structure or parameters. For example, Bayesian Kriging, which can incorporate parameter uncertainty into the prediction process, was used to generate the maps of horse mackerel occurrence probabilities that incorporated spatial and parameter uncertainties, geographical characteristic data, and chlorophyll-a concentration (Munoz et al. 2013). In classical geostatistical predictions, the true range of uncertainties is underestimated because many parameters are estimated through the statistical model, potentially leading to the overestimation of predictive accuracy. By incorporating the geographical characteristic data and chlorophyll-a concentration data into the model, the occurrence of the species in unsampled areas was predicted with more accuracy and fully quantified uncertainty.

Fire
Fire shapes the structure, function, and biodiversity of many ecosystems, such as savannah, heathland, and forest. Most fire management relies on the prediction of fire frequency, severity, and spread (McKenzie et al. 2000, Xue et al. 2012) to assist decision-making. In addition, fire is a key determinant of the global carbon cycle (Bowman et al. 2009).
Simulation models that describe fire behavior are used to manage fire. For example, Linn and Cunningham (2005) modeled the forward spread of grassland fires under ambient atmosphere winds and different initial lengths of the fire lines. AIOLOS-F, a fire behavior prediction model, was developed as a decision support tool (Croba et al. 1994). However, many of the existing models were developed for academic research rather than for practical use and real-time prediction. Accurate fire prediction requires not only physical models but also datasets of fuel and weather, both of which are quite variable in both time and space (Keane et al. 2001). Moreover, fire behavior is highly non-linear and complex, with interactions among the combustion processes, the landscape, local atmospheric environment, and vegetation characteristics, as well as human aspects (Lavorel et al. 2007). The performance of fire models is also sensitive to boundary conditions, which are usually unknown (Sullivan 2009). Therefore, there are many challenges in predicting fire behavior with sufficient accuracy to support the decision-making process.
DA can address some of the above-mentioned issues in fire modeling and improve the accuracy of fire predictions. For example, Mandel et al. (2008) used an ensemble Kalman filter (EnKF) to assimilate measured temperature and remaining fuel into a simple wildfire model in order to effectively track the location of the fireline. They found that fire temperature simulation was significantly improved with EnKF, and the trained model accurately tracked the measured fireline regardless of ignition location. This successful application showed that even with a relatively simple fire model and significant errors in the initial conditions, DA can help train the model to realistically predict fire behavior with high confidence.
Sequential Monte Carlo (SMC), also known as particle filtering, is another effective DA technique that has been used for integrating observations into wildfire models. Wildfire models typically have nonlinear and unstable behaviors. SMC methods are particularly useful for wildfire models because they use an ensemble-based approach and apply the Bayesian recursion algorithms directly to highly nonlinear state-space models (Doucet et al. 2001). SMC uses Bayesian inference and a stochastic sampling technique to recursively estimate the state of dynamic systems from given observations. Xue et al. (2012) dynamically assimilated ground temperature sensor data of a wildfire into a discrete event wildfire model called DEVS-FIRE. After assimilating ground temperature sensor data, the DA system improved the accuracy of wildfire prediction. The application of DA to fire behavior modeling is currently not very common. Nonetheless, DA is a promising tool to help accurately predict fire behavior.
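The recursive estimation performed by SMC can be sketched with a bootstrap particle filter. The example is a minimal illustration under loud assumptions: the toy system is a noisy logistic growth model chosen only because it is nonlinear, not a wildfire model and not DEVS-FIRE, and every function name, constant, and the synthetic data are hypothetical.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles, init, transition, likelihood, rng):
    """Return the posterior-mean state estimate after each observation."""
    particles = [init(rng) for _ in range(n_particles)]
    means = []
    for y in observations:
        # Forecast: push every particle through the stochastic model.
        particles = [transition(x, rng) for x in particles]
        # Weight: score each particle by how well it explains the observation.
        weights = [likelihood(y, x) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to equal weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        means.append(sum(w * x for w, x in zip(weights, particles)))
        # Resample: duplicate likely particles and drop unlikely ones.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means


# Hypothetical toy system: noisy logistic growth toward a ceiling of 100.
PROCESS_SD, OBS_SD = 1.0, 2.0


def init(rng):
    return rng.uniform(0.0, 50.0)  # vague initial guess of the state


def transition(x, rng):
    return x + 0.1 * x * (1.0 - x / 100.0) + rng.gauss(0.0, PROCESS_SD)


def likelihood(y, x):
    return math.exp(-0.5 * ((y - x) / OBS_SD) ** 2)


# Synthetic truth and noisy observations generated from the same toy model.
rng = random.Random(42)
truth, state = [], 10.0
for _ in range(30):
    state = transition(state, rng)
    truth.append(state)
observations = [value + rng.gauss(0.0, OBS_SD) for value in truth]

means = bootstrap_particle_filter(observations, 500, init, transition, likelihood, rng)
print(f"final truth = {truth[-1]:.1f}, filter estimate = {means[-1]:.1f}")
```

Because the particles are propagated through the model itself, no linearization is needed, which is the property that makes this family of methods attractive for strongly nonlinear systems.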

Terrestrial carbon cycle
The development of DA approaches has been an active area of research in terrestrial carbon cycle models. This application is stimulated partly by climate change research and partly by an increasing availability of data. From very simple models with scores of parameters to highly complex models with thousands of parameters, biogeochemical models have been widely used to explore ecosystem responses and feedbacks to climate changes on century-to-millennium time scales (McGuire et al. 2001, Friedlingstein et al. 2006) or to study interactions of multiple global-change factors on land management and ecosystem services on decadal or shorter time scales (Schroter et al. 2005, Schmid et al. 2006, Pretzsch et al. 2008). DA has shown promise in improving carbon cycle predictions (Williams et al. 2005, Luo et al. 2011, Peng et al. 2011), spurred by increasing amounts of experimental data available and active process-based model development.
In the early stages of research, DA was applied primarily to relatively simple terrestrial ecosystem models and quickly became a very effective tool to diagnose model structure (Keenan et al. 2012), evaluate the usefulness of different datasets (Keenan et al. 2013), and quantify relative contributions of model and data in constraining model parameters and carbon cycle model predictions (Weng and Luo 2011). Several DA studies have indicated that different datasets and their error structures can influence the parameter estimates and model predictions (Luo et al. 2003, Xu et al. 2006, Richardson et al. 2010, Weng et al. 2012). Keenan et al. (2013), for example, found that many datasets have redundant information for constraining model performance. Only five out of the seventeen available data streams were necessary to constrain the model (Fig. 4). In particular, numerous studies have demonstrated that flux data, such as soil respiration and net ecosystem exchange (NEE), do not contain sufficient information to constrain pool-related parameters (Wu et al. 2009, Keenan et al. 2013). Weng and Luo (2011) evaluated information contributed by model structure vs. data to short- and long-term prediction of forest carbon dynamics. Measurements over ten years at a forest ecosystem primarily constrained fast, upstream carbon pools (e.g., foliage and fine root) whereas model structure determined slow, downstream pools (e.g., slow and passive soil organic matter). This suggests that both process understanding to improve model structure and datasets to constrain model parameters are important for long-term carbon predictions.
Application of DA to regional and global carbon cycle models is challenging largely due to prohibitive computational demands. Instead, comprehensive global models are often calibrated against data at multiple locations. For example, Kuppel et al. (2012) assimilated net CO2 flux (NEE) and latent heat flux (LE) measurements collected from 12 temperate deciduous broadleaf forest sites into the ORCHIDEE model. Several studies have successfully assimilated regional and global data sets into models (Barrett 2003, Zhou et al. 2009, Zhou et al. 2012, Hararuk et al. 2014). For example, Hararuk and Luo (in revision) applied DA to constrain the Community Land Model (CLM3.5) against a global soil organic carbon (SOC) database. The constrained model explained 41% of the global variability in the observed SOC in comparison with the initial 13%.
While increasingly employed for model calibration, DA has not been widely used for predictions. Gao et al. (2011) used EnKF to assimilate eight sets of data from Duke Forest during 1996 to 2004 into a terrestrial ecosystem model as a basis for predicting the daily carbon pools (state variables) from 2004 to 2012 (Fig. 5). Uncertainties in predicted carbon sinks increased over time for the long-term carbon pools but remained constant over time for the short-term carbon pools. In addition, prediction of future carbon dynamics requires future weather data as forcing, since ecological system dynamics are influenced by weather and climate. Thus, climate models and emissions scenarios are required to generate future weather data that are used to drive forecasting with an ecological model. Keenan et al. (2013) showed how prediction uncertainties to 2100 were greatly reduced by the incorporation of orthogonal data constraints. The identification of the key datasets that contain the information needed to improve model predictions of future states and fluxes remains an important and outstanding challenge.

FUTURE CHALLENGES AND OPPORTUNITIES OF DA
All four areas discussed here use models to represent major processes underlying the system dynamics. When models are not constrained by data before they are used to project future states, the uncertainty in those projections is typically quite large. When models are constrained by data, uncertainty in predictions declines and accuracy increases. Among all four areas, there are a few examples of real-time forecasting capability. In the case of infectious diseases and wildfire, real-time forecasts can save human lives and thus have immediate societal value. Natural resource managers, such as in fisheries and agriculture, would likewise benefit from frequent, on-demand access to forecast models that have assimilated the most up-to-date information. Likewise, both in natural resources and in other sectors of the economy, there is increasing concern about the carbon cycle impacts of human activity. However, decision makers rarely have access to the best available models or data when evaluating the sustainability of alternative scenarios, especially those operating at a local scale. Nevertheless, DA may not improve prediction when ecological processes are not well understood or never observed (Luo et al. 2011), and so any projections will always be contingent on model structure.
While DA has the potential to advance predictive ecology, future DA applications have to deal with several challenges, such as combining multiple data sources, assimilating data across diverse scales and processes, complex model structures, models with large numbers of parameters to be estimated, high computational demands, and the need to expand applications into complex social-ecological systems.

Models become more and more complex: challenges and opportunities
One of the challenges to further advancing predictive ecology is related to complex model structures. Terrestrial carbon models are continuously being developed, driven by our evolving understanding of terrestrial ecosystems. For example, recent advances have incorporated detailed descriptions of nutrient dynamics (Zaehle et al. 2010), community dynamics (Medvigy et al. 2009), and responses to disturbances (Prentice et al. 2011) in attempts to simulate the real-world carbon cycle as realistically as possible. As a consequence, models become more complex but less tractable. Because most DA methods are computationally expensive, DA with complex global models has become infeasible unless innovative methods are developed for DA or model structure analysis.
In addition, complex models make it very difficult to separate model structural error from parameter error when using traditional model-comparison approaches (Keenan et al. 2011). Recent advances in theoretical analysis of model structure have the potential to better identify model structural and parameter errors (Xia et al. 2013). Advances in the analysis of model uncertainty are likewise allowing better identification of model parameter error (Dietze et al. 2013, LeBauer et al. 2013). Although DA is not as easily applied to complex models as a whole, key components of models can be identified for targeted improvement.

Data become more and more available: challenges and opportunities
The field of ecology is rapidly becoming data-rich, as new automated measurement devices, wireless networking, and improved cyber infrastructure greatly facilitate the collection of large amounts of data. Current initiatives in continental-scale monitoring through "ecological observatories", such as NEON (http://www.neoninc.org), ICOS (http://www.icos-infrastructure.eu), FLUXNET (http://www.fluxnet.ornl.gov; http://www.fluxdata.org), and LTER (http://www.lternet.edu), promise an even greater data deluge in the future. As scientists deal with larger and larger datasets, advanced techniques, such as data mining (e.g., Moffat et al. 2010) and model uncertainty analysis (e.g., Keenan et al. 2011), are becoming more widely applied.
The integration of the newly available data with ecological models is a clear direction forward. It has been shown that using multiple data streams reduces model uncertainty by more than the increase in sample size alone would explain, both for present-day model performance (Richardson et al. 2010) and for projecting ecosystem function under future climate change (Keenan et al. 2012). That said, not all data types give an equivalent improvement in model performance, and indeed some data streams are redundant given the availability of others (Keenan et al. 2013). The true information content of high-frequency vs. low-frequency observations has also yet to be systematically assessed (e.g., Weng and Luo 2011). Despite these challenges, it is becoming essential to use multiple constraints to inform continental (Haverd et al. 2013) and global (Smith et al. 2013) models.
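As a minimal illustrative sketch (not drawn from the cited studies), the following toy example suggests why a second, complementary data stream can reduce uncertainty far more than additional samples of the first. It assumes a linear-Gaussian model with two hypothetical parameters, a and b; the function name `posterior_covariance`, the "flux-like" and "biometric-like" streams, and all numerical values are assumptions made only for illustration.

```python
def posterior_covariance(prior_var, streams):
    """Posterior covariance of two parameters (a, b) under a linear-Gaussian model.

    Each stream is ((h_a, h_b), obs_var, n_obs): n_obs independent observations
    of h_a * a + h_b * b with Gaussian error variance obs_var.
    """
    # Start from the precision (inverse covariance) of an independent prior.
    p_aa = 1.0 / prior_var
    p_bb = 1.0 / prior_var
    p_ab = 0.0
    # Each data stream adds its information to the precision matrix.
    for (h_a, h_b), obs_var, n_obs in streams:
        weight = n_obs / obs_var
        p_aa += weight * h_a * h_a
        p_ab += weight * h_a * h_b
        p_bb += weight * h_b * h_b
    # Invert the 2x2 precision matrix to obtain the posterior covariance.
    det = p_aa * p_bb - p_ab * p_ab
    return [[p_bb / det, -p_ab / det], [-p_ab / det, p_aa / det]]


# Hypothetical streams: a "flux-like" stream sees only the sum a + b,
# while a "biometric-like" stream sees a alone.
flux = ((1.0, 1.0), 1.0, 100)
flux_doubled = ((1.0, 1.0), 1.0, 200)
biometric = ((1.0, 0.0), 1.0, 5)

# Posterior variance of b under three data scenarios (prior variance 100).
var_b_flux = posterior_covariance(100.0, [flux])[1][1]
var_b_more_flux = posterior_covariance(100.0, [flux_doubled])[1][1]
var_b_combined = posterior_covariance(100.0, [flux, biometric])[1][1]
print(var_b_flux, var_b_more_flux, var_b_combined)
```

In this toy setting, doubling the number of flux-like observations leaves the posterior variance of b essentially unchanged (about 50 in both cases), because that stream constrains only the sum a + b. Adding just five biometric-like observations reduces it to about 0.2, because the two streams together separate a from b.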

Ecological issues become more complex in coupled human-nature systems
Many of the most important biophysical changes occurring today are substantially influenced by human decisions and actions, such as changes in climate, land-cover, and species composition. These ecological changes, in turn, strongly affect society and its management of natural resources. Future predictions are therefore best made in a social-ecological context, recognizing the important feedbacks between the human and ecological components of social-ecological systems. Social scientists are generally reluctant to forecast human actions, and therefore the biophysical consequences of these actions, because people respond not only to past events but also make decisions in response to perceptions of potential future outcomes.
DA has been most successful in improving predictions of the responses of ecological systems to well-understood causes of variation. Building on this experience, it is most likely to be useful for predicting trajectories of rapidly changing variables that follow predictable temporal patterns and also respond predictably to events that are difficult to anticipate. Tropical deforestation, for example, responds predictably to road construction in combination with other variables (Geist and Lambin 2002), but the institutional factors that precipitate road development are less predictable. DA that tracks road construction may therefore be a useful predictor of the locations and rates of deforestation and associated declines in carbon sequestration and biodiversity. Stewardship, which involves shaping the future of social-ecological systems to facilitate ecosystem resilience and human well-being (Chapin et al. 2011), would benefit from the use of DA to track the trajectories of these systems toward or away from critical transitions and thresholds.

Real- or near-real-time predictions
To make predictions accessible for policymaking, we need to develop real- or near-real-time forecasting capability, i.e., to use the most recent available data to update predictions. Real-time forecasting has been successfully applied to infectious diseases and should be valuable for fire behavior, where it can save human lives and thus has immediate societal value. Near-real-time predictions can have great economic and societal importance for fisheries and the carbon cycle, as their economic values and societal benefits can also be realized in long-term predictive estimates of ecological processes. To realize real- or near-real-time prediction, we need not only informative datasets, process-oriented models, and DA but also cyberinfrastructure to support data assimilation (Luo et al. 2011, Dietze et al. 2013).
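The forecast-update cycle that underlies real- or near-real-time prediction can be sketched with a toy ensemble Kalman filter. This is a minimal sketch under stated assumptions, not the implementation used in any of the studies cited here: the scalar state, the linear growth model, the noise levels, and the names `forecast_step`, `enkf_update`, and `run_cycle` are all hypothetical. The state is assumed to be observed directly, and an entry of `None` marks a time step with no new observation (a pure forecast).

```python
import random
import statistics


def forecast_step(ensemble, growth, process_sd, rng):
    """Advance each member one time step with a toy linear model plus process noise."""
    return [growth * x + rng.gauss(0.0, process_sd) for x in ensemble]


def enkf_update(ensemble, obs, obs_var, rng):
    """Stochastic (perturbed-observation) EnKF analysis for a directly observed scalar."""
    forecast_var = statistics.variance(ensemble)
    # Kalman gain: weight given to the observation relative to the forecast.
    gain = forecast_var / (forecast_var + obs_var)
    obs_sd = obs_var ** 0.5
    # Each member is nudged toward its own perturbed copy of the observation.
    return [x + gain * (obs + rng.gauss(0.0, obs_sd) - x) for x in ensemble]


def run_cycle(observations, n_members=100, seed=0):
    """Alternate forecast and analysis steps; return final ensemble and spread history."""
    rng = random.Random(seed)
    # Broad initial ensemble representing poor knowledge of the initial state.
    ensemble = [rng.gauss(10.0, 5.0) for _ in range(n_members)]
    spreads = []
    for obs in observations:
        ensemble = forecast_step(ensemble, growth=1.02, process_sd=0.5, rng=rng)
        if obs is not None:  # assimilate only when a new observation has arrived
            ensemble = enkf_update(ensemble, obs, obs_var=1.0, rng=rng)
        spreads.append(statistics.stdev(ensemble))
    return ensemble, spreads


# Five time steps with (made-up) observations, then five forecast-only steps.
observations = [10.5, 10.9, 11.0, 11.4, 11.5, None, None, None, None, None]
final_ensemble, spreads = run_cycle(observations)
print([round(s, 2) for s in spreads])
```

In this sketch the ensemble spread contracts while observations are being assimilated and widens again once the model runs forward without data, which is the motivation for feeding the most recent observations into the forecast as soon as they become available.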

Fig. 1. Techniques framework for ecological prediction. Data assimilation uses observed data to train models and updates the model output so as to accurately describe the past and current states of the system and finally to better predict the future states.

Fig. 2. Real-time forecasts of H1N1 in Singapore. The first three rows depict observations (crosses) and forecasts (gray shaded area) at three points in time during the outbreak (indicated by the triangle on the time axis). The cross in the lower left indicates an independent estimate of infection from adult seroconversion. The bottom panel shows the posterior absolute deviation between predicted and observed incidence. As data assimilation trained the model (going from the first row to the third row), the forecasts (gray shading) became more certain (row 3) and prediction error declined (row 4). Figure reproduced from Ong et al. 2010.

Fig. 3. The variances of the estimated fishing mortalities for 7-year-old fish using pure ensemble integration with no data assimilation (Ens pred) and three data assimilation techniques: the ensemble Kalman filter (EnKF), the ensemble smoother (ES), and the ensemble Kalman smoother (EnKS) (modified from Gronnevik and Evensen 2001). The figure shows that DA greatly reduces the variance of prediction.

Fig. 4. Model uncertainty for the simulations of carbon fluxes (NEE, GPP, Ra, and Rh) for the FöBAAR model (Keenan et al. 2012). Three different approaches to constrain the model are shown: (1) using all data constraints (flux and biometric data) (black), (2) using short- and long-term flux data constraints (dark gray), and (3) using only short-term flux data constraints (light gray). The shaded areas represent the confidence in model predictions. The figure shows that the model uncertainty can be quantified and reduced by assimilating different datasets into the model (modified from Keenan et al. 2012).

Fig. 5. Daily analysis from 1996-2004 (green) and daily forecast of carbon pools from 2004-2012 (yellow) at Duke Forest with 100 ensembles after eight data sets were assimilated into the TECO model using the Ensemble Kalman Filter (EnKF). The uncertainties of the analysis were reduced when data were assimilated into the model, and the uncertainties of the forecasted carbon pools were relatively stable (modified from Gao et al. 2011).