Imbalanced land surface water budgets in a numerical weather prediction system

There has been a significant increase in the skill and resolution of numerical weather prediction models (NWPs) in recent decades, extending the time scales of useful weather predictions. The land surface models (LSMs) of NWPs are often employed in hydrological applications, which raises the question of how hydrologically representative LSMs really are. In this paper, precipitation (P), evaporation (E), and runoff (R) from the European Centre for Medium‐Range Weather Forecasts global models were evaluated against observational products. The forecasts differ substantially from observed data for key hydrological variables. In addition, imbalanced surface water budgets, mostly caused by data assimilation, were found on both global (P‐E) and basin scales (P‐E‐R), with the latter being more important. Modeled surface fluxes should be used with care in hydrological applications, and further improvement in LSMs in terms of process descriptions, resolution, and estimation of uncertainties is needed to accurately describe the land surface water budgets.


Introduction
The skill and resolution of numerical weather prediction models have increased substantially over the past decades [Palmer et al., 2007;Rodwell et al., 2010;Simmons and Hollingsworth, 2002;Wedi, 2014]. The skill of precipitation forecasts for the extratropical region has, for example, increased by 2 days over a 14 year period (1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008) for the European Centre for Medium-Range Weather Forecasts (ECMWF) [Rodwell et al., 2010]. The improvements of numerical weather prediction models (NWPs) increase the potential for their use in hydrological applications. Traditionally, the land surface model (LSM) component was developed with a main focus on the energy balance [Overgaard et al., 2006] but has, through better representations of hydrological processes and higher resolution, become more similar to hydrological models [Balsamo et al., 2011]. These model developments have brought the two model types closer together and LSMs are increasingly being used in applications where hydrological models would have previously been employed. An example of this is the Global Flood Awareness System which uses runoff from the LSM which is coupled to the atmospheric forecast system of the ECMWF in order to produce flood forecasts across the globe . LSM and NWP outputs have also been applied for forecasting and monitoring of, e.g., wind, health, fire, droughts, and malaria Thomson et al., 2006]. The use of reanalysis products, i.e., outputs from static model versions and assimilation systems run retrospectively, has also increased and includes applications such as simulation of floods [Pappenberger et al., 2012] and droughts [Dutra et al., 2008]. Reanalysis products have the advantage of providing meteorological fields from a single model in contrast to operational forecasts for which the ever-changing models and data assimilation systems can introduce artificial variability [Compo et al., 2011].
However, model outputs are normally evaluated with a meteorological focus without consideration of hydrologically relevant spatial and temporal integrations. The benchmarking of model performance against observations is usually done for measures such as average daily precipitation over a grid cell rather than the more hydrologically relevant quantity of precipitation over a catchment falling within the catchment concentration time (e.g., 3 days). Measured river discharge in particular is an ideal variable to assess the integrated performance and diagnose possible improvements of a forecast but has seldom been used to evaluate operational weather prediction models. In this paper, we make an intercomparison of a suite of modeled surface fluxes from the ECMWF with global observation-based data on precipitation, evaporation, and runoff and analyze the long-term surface water budgets of the models. It should be noted that the paper does not address forecast skill or errors, but is entirely focused on the hydrological representativeness of the NWP system.

ECMWF Model System
The atmospheric general circulation model is coupled to an ocean wave model, the land surface scheme Tiled ECMWF Scheme for Surface Exchange over Land (TESSEL) [van den Hurk et al., 2000] or the improved scheme (H-TESSEL) [Balsamo et al., 2011[Balsamo et al., , 2009, and an ocean general circulation model to form the integrated forecast system (IFS), which is the basis for ECMWF's modeling system [Persson, 2013]. The IFS is used in different combinations and resolutions to produce both deterministic and probabilistic forecasts on the medium range up to the seasonal scale. A total of seven model outputs were used in this study; two realizations from the operational forecast system, three reanalysis products, and two off-line runs with the land surface scheme H-TESSEL.

Operational Forecasts
The operational analysis and deterministic forecast are based on the most recent IFS cycle and produces high-resolution forecasts, currently globally on reduced Gaussian grid with a spectral resolution of T1279 (16 km). The operational ensemble forecast system is run on a lower resolution (T639) than the deterministic forecast and provides an indication of possible effects of errors in the initial analysis by representing the evolution from perturbed initial conditions. In this study, the high-resolution short-range forecast (24 h from the analysis, O H1d ) and a longer lead time (10 days O L10d ) forecast were used. The reason for selecting the longer lead time forecast was to be able to quantify impacts of the analysis on the forecasted variables. At lead time 10 days, the model is less affected by the data assimilation and is more representative of the model climatology.

Reanalysis Data
Reanalysis data, as opposed to the operational forecasts, are produced using a fixed model version in order to maintain consistency throughout the reanalysis period. Three reanalysis products were included in this study: ERA-40, ERA-Interim, and ERA-CM. ERA-40 is available for 1 September 1957 to 31 August 2002 at a spectral resolution of T159 (about 125 km), and the data assimilation was based on a 6-hourly 3D-Var analysis [Uppala et al., 2005]. ERA-Interim, which is available from 1 January 1979 to present at a spectral resolution of T255 (about 80 km), is based on a more recent model version and data assimilation system, 12-hourly 4D-Var . ERA-CM [Hersbach et al., 2013] is the most recent reanalysis used in this study and is available at the same horizontal resolution as ERA-40 (T159) from 1899 to 2009. It is based on a newer model version compared to ERA-40 and ERA-Interim but does not assimilate any atmospheric variables and consists of an ensemble of 10 members.
ERA-40 and ERA-Interim are based on the same LSM, TESSEL, whereas the operational forecasts and ERA-CM used the improved scheme H-TESSEL. The tiled structure of the model grids allows for six different types over land (intercepted water, bare ground, high/low vegetation, and shaded/exposed snow) and two over water (open and frozen). The development of H-TESSEL addressed two main shortcomings of TESSEL: the globally fixed soil texture and the Hortonian runoff scheme [Balsamo et al., 2009]. Precipitation only reaches the ground if the interception layer has been saturated and is then partitioned between infiltration and surface runoff. The soil, which can be covered by a snow pack, is represented by four different layers and the bottom of the column is assumed to be draining freely. From each soil layer, water can also be withdrawn as root extractions. Each tile represents a separate water and energy balance. The soil hydrology in H-TESSEL was modified compared to TESSEL by allowing for spatially varying soil types, using van Genuchten formulation of soil hydraulic properties and changing the surface runoff generation to be based on variable infiltration capacities derived from soil type and topography [Balsamo et al., 2009].

Off-Line Runs of the Land Surface Scheme
Results derived from two off-line runs conducted at ECMWF of the land surface scheme H-TESSEL without data assimilation were used: Land C and Land U . Both of the products have been forced with ERA-Interim data, but Land C was based on precipitation data which had been bias corrected to match monthly totals from the Global Precipitation Climatology Project (GPCP) version 2.1 data set (a predecessor to the data Geophysical Research Letters 10.1002/2015GL064230 set described in section 2.2), whereas Land U was based on uncorrected precipitation data [Balsamo et al., 2013]. The correction methodology is described in detail in Balsamo et al. [2010] and aims to preserve small-scale features of the higher-resolution ERA-Interim data at the same time as matching monthly precipitation totals from the coarser GPCP data. This is achieved by calculating a rescaling factor (quotient between GPCP precipitation and ERA-Interim precipitation) at the coarser resolution after interpolation and then bilinearly interpolating the rescaling factors to the higher resolution. The high-resolution ERA-Interim precipitation is then rescaled using the high-resolution factors. Note that the corrected product is published as ERA-Interim/Land in Balsamo et al. [2013] but is called Land C herein to avoid confusion with ERA-Interim. The precipitation forcing of the Land U model is not directly comparable to the ERA-Interim precipitation used in this study, due to potential forecast drift [Kållberg, 2011] since they were retrieved from slightly differing forecast steps.

Validation Data
Observed discharge data were retrieved from the Global Runoff Data Centre (GRDC) in June 2013 together with basin boundaries from the GIS-polygon data set developed by Lehner [2011]. Precipitation data were obtained from the GPCP, specifically the Version 2.2 Monthly Precipitation Analysis product, which is a merged satellite and rain gauge data set available on global scale at a spatial resolution of 2.5°× 2.5°l ongitude-latitude from 1979 to present [Adler et al., 2003]. Evaporation estimates were obtained from the LandFlux-EVAL multidata set synthesis product (E LFE ) [Mueller et al., 2013], available as a monthly global product at 0.5°× 0.5°longitude-latitude resolution. The diagnostic product covering 1989-2005, which is based on five data sets derived from in situ and/or satellite data, was used as a benchmark in this study.

Preprocessing of Data
To facilitate comparisons, the gridded data were bilinearly interpolated to 0.5°× 0.5°regular longitude-latitude grids and the ECMWF land mask was used to distinguish land and ocean points. Discharge data were quality controlled (for details, see Kauffeldt et al. [2013]) and basins with long-term runoff coefficients higher than unity, basin areas less than 10,000 km 2 and/or more than one daily observation missing per month for January 1995 to December 2001 were excluded, which rendered a set of 611 gauging stations used in the study.

Results and Discussion
All model outputs have a positive precipitation bias compared to GPCP data (Figures 1a-1c). ERA-40 shows a particularly strong bias which increases during the period and has been attributed to issues in the humidity assimilation Uppala et al., 2005]. Both operational models show strong positive biases, which decrease toward the end of the period. The decrease in bias is the result of many factors with the most important one being an updated parameterization of convection, which improved convective precipitation in the tropics [Bechtold et al., 2012]. The strong positive bias in the 1 day forecast (O H1d ) in comparison with the forecasts at lead time 10 days (O H1d ) is an effect of the data assimilation scheme, which decreases (increases) moisture in the boundary layer (higher model levels) over the tropics. This moisture transport causes an increase in precipitation for the first time steps (Figures 1a-1c). ERA-Interim displays a sudden jump in 1992, which has been attributed to issues in the assimilation scheme . The operational models show high evaporation rates (Figures 1d-1f) compared to the other products, which can be due to several reasons since the model version, the assimilation scheme, and/or the assimilated observations differ between each of these products. High uncertainties in global land evaporation estimates are seen from the large range of the observation-based E LFE data (Figure 1e) and the ECMWF model estimates. Precipitation and evaporation are expected to be in an approximate balance over the global domain. However, for all models, apart from ERA-CM, substantial imbalances are seen for extended periods (Figure 1g). ERA-40 shows a strong positive bias because of excessive precipitation. ERA-Interim also shows a positive bias until 1992 when it becomes less biased, mainly because of the spurious decrease in precipitation. Both operational models show positive biases for some time, but it is most pronounced for O H1d , which (together with ERA-40) shows spurious precipitation rates exceeding evaporation rates over the oceans. Similar issues over oceans have also been reported for other reanalysis products [Trenberth et al., 2011]. The ECMWF modeling system does not enforce moisture conservation [Berrisford et al., 2011], which is clear from the imbalances of precipitation and evaporation over the global domain. ERA-CM, which does not include any data assimilation, displays the smallest imbalance (À0.01 mm yr À1 ) which is also relatively stable over time. Long-term (1995Long-term ( -2001 average runoff estimates from the models were benchmarked against observed data for 611 basins worldwide ( Figure S1 in the supporting information). The long-term averaging allowed travel times to be neglected. Mean relative errors (MRE) between the runoff rates for the different models and observed data from GRDC show that substantial differences are found for many basins for all models (Figure 2). MRE larger than ±20% are found for the majority of basins for all models: ERA-40 83%, ERA-Interim 75%, ERA-CM ensemble members 59-66%, O H1d 76%, O L10d 73%, Land C 62%, and Land U 56% of the basins ( Figure S2 in the supporting information). Uncertainties in discharge estimates can be high, depending on the flow regime but are typically ±10-20% for medium to high daily flows [McMillan et al., 2012]. Hence, an MRE band of ±20% can be expected to be a conservative estimate of the uncertainty of the long-term monthly discharge for most basins.  (Figure 1g), but also on cell and basin scales, when analyzing surface water fluxes from individual models (Figure 3). Long-term (1995Long-term ( -2001 evaporation exceeds precipitation in extensive areas for ERA-Interim (Figure 3a). Similar patterns are observed for ERA-40 and the operational models ( Figure S3 in the supporting information). Although glacier melting can create regional-scale decadal fluxes that are potentially measurable, these and other long-term water storage changes are so small that they can safely be neglected in comparison to the imbalances shown here. In addition, assuming storage changes can be neglected over the long term, many areas show an imbalance between runoff and the difference between precipitation and evaporation ( Figure 3b and Figure S4 in the supporting information). These water budget imbalances also translate to basin scale ( Figure 3c, Table 1, and Figure S5 in the supporting information). As much as 15% of the basins exhibit long-term evaporation exceeding precipitation by 10% or more . Imbalances between long-term runoff and the difference between precipitation and evaporation exceeding 10% of mean precipitation are found for 44-66% of all basins for the models using data assimilation (ERA-40, ERA-Interim, and the operational models). However, the land surface models (TESSEL and H-TESSEL) satisfy closure of the water balance at numerical precision levels, which is confirmed (for H-TESSEL) by the balanced surface water budgets of the offline runs. The ERA-CM output indicates that the coupled system run without data assimilation leads to a relatively small imbalance between P and E globally and no land surface imbalances. Instead, the surface water imbalances (E > P and P-E ≠ R) appear when data assimilation is used (ERA-40, ERA-Interim, and the operational runs). The assimilation affects the modeled moisture in several ways, e.g., through adjustment of atmospheric humidity and soil moisture. In the LSM, both soil moisture and snow are used as nudge factors to generate the best possible meteorological initial conditions. However, the resulting moisture increments lead to a disrupted hydrological cycle, where water and energy are conserved neither locally nor globally.

Summary
This study shows that there are strong biases in key hydrological variables for all of the global model outputs. In addition, the data assimilation process causes water budget issues which are seen across scales (grid cell, basin, and global). Since they are persistent over long time periods, this indicates that the soil moisture and snow increments compensate for systematic deficiencies in the models. The reason for using data assimilation, to improve the forecast performance for atmospheric variables, is at the same time reducing the usability of hydrological model components. Better process descriptions and higher resolution may help to alleviate these issues in the forecasting system, and bias correction will be a necessity for hydrological applications until they are resolved. However, bias correction is fraught with issues such as assumptions of stationarity, which are especially problematic in extrapolation for climate-impact studies . In addition, developing higher resolution and better process description LSMs requires us to carefully consider our lack of knowledge of the parameterization of these processes, which ultimately could lead to a plateau in forecast performance [Beven and Cloke, 2012].
Experiments with improved cloud physics and higher spatial resolution indicate that modeling of precipitation will improve in the future [Haiden et al., 2014]. LSM skill is also expected to improve by, e.g., merging different available precipitation data sets and using improved quantity and quality of soil moisture information. However, there is room for extensive development of a more physically realistic representation of surface (and subsurface) processes in LSMs, along with new strategies to deal with parameterization uncertainties, before they can be reliably used in hydrological impact studies. Evaluation of model performance against observed discharge can be used as guidance for such developments by analyzing performance for different hydrometeorological conditions.