Geophysical Research Letters

Real-time correction of ERA-Interim monthly rainfall



[1] Gridded precipitation from reanalysis can be a valid proxy for observations in many regions where local measurements are not available. Despite the continuous improvements in assimilation systems and the growing availability of in situ and remote observations, model errors can nevertheless still affect reanalysis quality. A temporal and spatial correction of ERA-Interim rainfall monthly means is proposed. It is shown that the strategy of postprocessing the data set is crucial for drought monitoring and forecasting.

1 Introduction

[2] Rainfall observations used to monitor or forecast anomalies in operational systems such as drought monitoring need to meet specific requirements. The data set should be long enough (30 years) and statistically homogeneous, implying that observations should as much as possible (i) avoid changes in rain-gauge locations and measuring equipment and (ii) use the same algorithms to derive precipitation from remote sensing data, even when using different platforms. The data also need to be available in near real time.

[3] The global rain-gauge network available through the global telecommunication system [World Meteorological Organization, 2009] provides measurements with a coverage that is only satisfactory in the Northern Hemisphere. Data are sparse over the tropical band and almost absent over the oceans. Over the last 10 years, improvements in microwave technology have led to a number of satellite missions that can be used in global rainfall retrieval algorithms. While these remote sensing measurements have helped to fill coverage gaps, they nevertheless have the disadvantage of being less accurate than in situ stations [Thiemig et al., 2012]. Moreover, they often rely on specific assumptions concerning the surface properties and cloud structure which are uncertain [Tompkins and Adebiyi, 2012]. Most importantly, they have a limited time span. To gain from the strengths of both in situ and remote observing systems, a successful approach has been to merge station data with satellite information, as in the Global Precipitation Climatology Project (GPCP), which over the past 10 years has released various global data sets at different spatial and temporal resolutions [Huffman and Bolvin, 2000]. These data sets are usually available with a delay of several months due to the quality checks performed on the data.

[4] In general, long-term homogeneity and near real time updates are the two criteria that are difficult to achieve on a global scale for a data set based only on observations. Dynamical model outputs can therefore supply a valid alternative, provided that the model retains the same spatial and temporal structure for the whole period to avoid disruptions due to model changes. ERA-Interim (ERA-I) is the latest of the European Centre for Medium-Range Weather Forecasts reanalysis products. It starts on 1 January 1979 and is extended forward in near real time. It employs a sequential 4-D-Var data assimilation scheme which ensures the optimal consistency between the available observations and the model background. ERA-I largely improves on previous reanalysis products over land [Dee et al., 2011] and could therefore be a useful proxy to monitor precipitation in regions with a lack of in situ observations. Nevertheless, some deficiencies remain, especially over the African continent, which is the region of interest of this study. One issue is related to the presence of a dry nonlinear trend which is clearly visible when the seasonal climatology for the summer season (June-July-August (JJA)) of the first decade (1979–1989) is compared to the climatology of the third decade (1999–2009) (Figure 1a). This trend, not being present in the GPCPv2.2 data set (Figure 1b), appears to be a model artifact. It is reasonable to believe that it stems from the availability of rain-microwave observations (from satellites such as Tropical Rainfall Measuring Mission/Special Sensor Microwave Imager) in the most recent years which, enriching the 4-D-Var assimilation cycle [Benedetti et al., 2005], led to an improvement in the representation of the low-level humidity fields [Andersson et al., 2005]. This means that ERA-I climate is unrealistically wet in the late 1970s and 1980s while reconverging to more accurate predictions in the later decades. Most of the annual precipitation over Africa is controlled by the south to north and back displacement of the Intertropical Convergence Zone which also controls the onset, spatial extent, and offset of the West-Africa Monsoon [Janicot et al., 2008]. The Sahel band is where the spurious trend has its maximum. The highlighted problem therefore generates an unrealistic strengthening and northward displacement of the monsoon cycle in the first decades of the data set. ERA-I precipitation also suffers from other biases (over the Andes and on the west coasts of continents, for example) [Dee et al., 2011]; nevertheless, biases over Africa are particularly severe due to the scarcity of observations to constrain the assimilation cycle and the limitations of the convection and land surface parameterizations over the region [Agusti-Panareda et al., 2010]. As a consequence, ERA-I is not a valid data set for precipitation monitoring over Africa. Indeed, when employed in the operational drought forecasting system of Dutra et al. [2012], it performed poorly.

Figure 1.

Climatology difference for the summer season (JJA) between decades 1979–1989 and 1999–2009. (a) ERA-Interim prediction and (b) GPCPv2.2 observed trend. (c) Time evolution of the June-July-August biases spatially averaged on the West Africa area (shaded in the box) for the original ERA-I data set and the two stages of the correction applied.

[5] In this work, we propose a two-step correction to improve both the spatial and temporal variability of ERA-Interim monthly precipitation. This is a justifiable strategy, at least in the short term, considering on the one hand the uncertainties and time required to achieve model improvements and on the other the growing demand for gridded reanalysis fields in many sectorial applications in regions which cannot benefit from an adequate observational network. An important aspect of the correction proposed is that it is applied in real time. This means that ERA-I precipitation is corrected using GPCPv2.2 data [Huffman and Bolvin, 2000] (chosen as the benchmark data set) even for recent periods, when GPCPv2.2 is not yet available. This allows the corrected data set not only to be employed operationally for long range forecast verification but also to drive real-time sectorial applications.

[6] Two sequential methods are applied which in turn correct the temporal variability and spatial location of rainfall. The first method works on single grid points by rescaling the rainfall fields using regression coefficients of model time series against observed ones [Molteni, 2013]. Then, the precipitation localization is improved by performing an empirical orthogonal function (EOF) mapping of the rescaled fields over the observed mode of spatial variability, following an idea implemented to correct long range forecasts [Di Giuseppe et al., 2013]. The first step of the correction (local point rescaling) can be applied globally since it operates on independent grid points. The spatial remapping instead is designed to operate on X-Y planes and needs to be applied over defined regions. In this article, we will concentrate on Africa. However, this methodology can be extended to other regions of the globe.

[7] The following section presents the correction methodologies (further detailed in the supporting information). It is then followed by an evaluation of the corrected data set not only in terms of the precipitation fields but also in the drought monitoring system of Dutra et al. [2012] taken as a sectorial application example.

2 ERA-Interim Temporal Rescaling and Spatial Remapping

[8] ERA-I correction is applied in real time on monthly means, m. GPCPv2.2 is used as the observation data set. The model fields, E, are first aggregated onto the GPCPv2.2, G, grid (2.5° × 2.5°) using conservative remapping. Grid-point precipitation is rescaled to generate a first intermediate data set, ELR. Then, a spatial reshaping of the precipitation fields is applied [Di Giuseppe et al., 2013]. The final corrected product, ELR+mapping, therefore represents the combination of a pointwise rainfall amount rescaling and a spatial mapping. In each of the two steps, the calibration data set is not fixed but generated for each month. It includes all months from available years except the year to be corrected. Leaving out the month to be corrected is necessary both for cross validation and for applying the method in real time, when observations might not be available. The two steps are briefly explained below.
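The leave-one-year-out construction of the calibration sample can be sketched as follows. This is a minimal illustration only: the function name `calibration_sample` and the array layout (years along the first axis, one calendar month per array) are assumptions and are not taken from the paper or its supporting information.

```python
import numpy as np

def calibration_sample(fields, years, target_year):
    """Return one calendar month's fields from every year except target_year.

    fields: array of shape (n_years, n_lat, n_lon) for a single calendar month.
    years:  1-D array with the year of each field.
    """
    years = np.asarray(years)
    keep = years != target_year          # drop the year being corrected
    return fields[keep], years[keep]

# Toy example: 5 years of a 2 x 2 grid for one calendar month.
years = np.arange(1979, 1984)
fields = np.arange(5 * 2 * 2, dtype=float).reshape(5, 2, 2)

# Cross validation: 1981 is withheld from its own calibration sample.
train, train_years = calibration_sample(fields, years, target_year=1981)

# Real-time use: the target year lies beyond the record, so nothing is dropped.
train_rt, _ = calibration_sample(fields, years, target_year=2013)
```

The same selection serves both purposes mentioned above: it withholds the target year for cross validation, and it never requires an observation for the year being corrected.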

[9] The simplest approach for rescaling modeled rainfall to match observations is to define a local scaling, F, as F(j,m,y)=G(j,m,y)/E(j,m,y), where F(j,m,y) only depends on the exact values of G and E at a specific location, j, and time (in our case, month m and year y). This simple approach has two main drawbacks. First, when E(j,m,y)=0, F(j,m,y) is undefined. Second, F(j,m,y) exists only when model and observations are simultaneously defined.
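Both drawbacks of the local scaling are easy to reproduce on a toy series. The sketch below uses made-up values and a hypothetical helper `local_scaling`; missing data are marked as NaN, which is an assumption of this example.

```python
import numpy as np

def local_scaling(G, E):
    """Pointwise ratio F = G / E; NaN where E is zero or either input is missing."""
    G = np.asarray(G, dtype=float)
    E = np.asarray(E, dtype=float)
    F = np.full(G.shape, np.nan)
    valid = np.isfinite(G) & np.isfinite(E) & (E != 0)
    F[valid] = G[valid] / E[valid]
    return F

G = np.array([2.0, 1.0, np.nan, 3.0])   # observed rainfall, NaN = not available
E = np.array([4.0, 0.0, 1.5, 3.0])      # modeled rainfall
F = local_scaling(G, E)
# F[1] is undefined because E is zero; F[2] is undefined because G is missing.
```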

[10] To avoid these limitations, Molteni [2013] suggested building, for each spatial point, j, a rescaling coefficient F({j},m,{y}) as a regression coefficient on a suitable “neighborhood” of E(j,m,y) where the neighborhood is defined as a function of:

  1. [11] space (using adjacent grid points {i})

  2. [12] year (using years {y} sufficiently close to the selected year, in order to filter out long-term trends)

  3. [13] the position of E(j,m,y) within a suitably sampled distribution of E, i.e., the rank of E({j},m,{y}) within a selected sample of values for the same month.

[14] Since F is provided by a linear regression (LR) approach rather than a local rescaling, the constraint on the simultaneous existence of model and observed rainfall at grid point level is relaxed. Moreover, an additional advantage is that the rescaling factor can be extrapolated to future times to provide a real-time correction to ERA-I for instants when G(j,m,y) is not available. Details on how the “neighborhood” of {j} and {y} are chosen to calculate F({j},m,{y}) are provided in the supporting information.
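A neighborhood regression of this kind might look like the sketch below. It is not the scheme of Molteni [2013], whose actual neighborhood choices are in the supporting information: the zero-intercept least-squares slope, the square spatial window (`half_width`), the symmetric year window (`year_window`), and the function name are all assumptions, and the rank-based selection in item 3 of the list above is omitted.

```python
import numpy as np

def regression_factor(E, G, years, j_lat, j_lon, target_year,
                      half_width=1, year_window=5):
    """Zero-intercept least-squares slope of G on E over a neighbourhood.

    E, G: arrays (n_years, n_lat, n_lon) for one calendar month.
    Pools grid points within half_width of (j_lat, j_lon) and years within
    year_window of target_year, excluding target_year itself.
    """
    years = np.asarray(years)
    t = (np.abs(years - target_year) <= year_window) & (years != target_year)
    la = slice(max(j_lat - half_width, 0), j_lat + half_width + 1)
    lo = slice(max(j_lon - half_width, 0), j_lon + half_width + 1)
    e = E[t, la, lo].ravel()
    g = G[t, la, lo].ravel()
    ok = np.isfinite(e) & np.isfinite(g)      # tolerate missing values
    denom = np.sum(e[ok] ** 2)
    return np.sum(e[ok] * g[ok]) / denom if denom > 0 else np.nan

# Synthetic check: the "observations" are exactly half the model rainfall.
rng = np.random.default_rng(0)
years = np.arange(1979, 2011)
E = rng.gamma(2.0, 2.0, size=(years.size, 4, 4))
G = 0.5 * E

F = regression_factor(E, G, years, j_lat=1, j_lon=2, target_year=1990)
# Real-time case: 2012 is outside the record, so only earlier years are pooled.
F_rt = regression_factor(E, G, years, j_lat=1, j_lon=2, target_year=2012)
```

Because the factor is estimated from a pool of points and years, a single zero or missing value no longer makes it undefined, and a target year without observations still receives a factor.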

[15] This first pointwise rescaling is applied to correct the local amount of precipitation and is expected therefore to improve the intraseasonal and intraannual variability of the data set. It would nevertheless fail to correct any systematic shift in the precipitation patterns. If, for example, the West Africa monsoon were persistently displaced southward, as documented in Tompkins and Feudale [2010], the northern points on the fringes of the Sahel will never experience precipitation. As a consequence, the rescaling coefficient, F({j},m,{y}), will be locally undetermined. The correction of large-scale patterns is taken care of by applying a spatial remapping based on an EOF decomposition of the precipitation anomalies with respect to the climate mean (average of the data sets for the period 1979–2010).

[16] In Di Giuseppe et al. [2013], the spatial remapping was applied to long range forecasts by using the hindcast set at shorter lead times and GPCPv1.1 pentad data to generate the correction maps. Here we use a different approach since each year in the ERA-I data set is extracted in turn and corrected using the remaining years as the training data set. In this case, the spatial correction is applied to monthly means by constructing artificial mapped EOFs (more details in the supporting information) which can be regarded as a spatial correction mask that, if applied to the model temporal variability, would match the observed spatial patterns. Any model mode of variability that does not correlate well with observed modes (e.g., the spurious trend in Figures 1a and 1b) will result in a zero mapped anomaly and is therefore removed.
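One plausible reading of the mapped EOF idea is sketched below; the actual construction is described in the supporting information and may differ. The assumptions here are: model EOFs are obtained by a singular value decomposition of the training anomalies, each "mapped" pattern is the regression of the observed anomalies on one model principal component, and the function name and `n_modes` parameter are hypothetical.

```python
import numpy as np

def mapped_eof_correction(E_train, G_train, e_target, n_modes=3):
    """Rebuild a model anomaly from observed patterns tied to model PCs.

    E_train, G_train: anomaly arrays (n_years, n_points) for the training years.
    e_target:         model anomaly (n_points,) for the year being corrected.
    """
    # EOFs of the model anomalies: rows of Vt are spatial patterns.
    _, _, Vt = np.linalg.svd(E_train, full_matrices=False)
    eofs = Vt[:n_modes]                         # (n_modes, n_points)
    pcs = E_train @ eofs.T                      # (n_years, n_modes)
    var = np.sum(pcs ** 2, axis=0)
    var[var < 1e-12 * var.max()] = np.inf       # ignore numerically empty modes
    # Regress observed anomalies on each model PC. The PCs are mutually
    # orthogonal, so the regressions decouple into one division per mode.
    mapped = (pcs.T @ G_train) / var[:, None]   # (n_modes, n_points)
    # Project the target anomaly on the model EOFs, rebuild with mapped patterns.
    return (e_target @ eofs.T) @ mapped

# Synthetic case: the model puts the real signal at the wrong grid point and
# adds a spurious trend that has no counterpart in the observations.
a = np.array([1., -1., 1., -1., 1., -1., 1., -1.])    # real interannual signal
b = np.array([3., 3., 1., 1., -1., -1., -3., -3.])    # spurious trend, a.b = 0
p_obs = np.array([0., 1., 0., 0.])                    # observed pattern
p_mod = np.array([1., 0., 0., 0.])                    # displaced model pattern
q = np.array([0., 0., 1., 1.])                        # pattern of the trend

E_train = np.outer(a, p_mod) + np.outer(b, q)
G_train = np.outer(a, p_obs)
e_target = 2.0 * p_mod + 5.0 * q
corrected = mapped_eof_correction(E_train, G_train, e_target)
```

In this toy case the displaced signal is moved to the observed location with its amplitude preserved, while the trend mode, being uncorrelated with the observations, maps to zero and drops out.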

[17] The quality of the correction mask defined by the mapped EOF is based on the predictive skill of ERA-Interim over all the years except the one being corrected. While the model structure such as the physics parameterizations and the data assimilation framework is invariant over the 32 year reanalysis period, the atmospheric and oceanic observational network is not, with satellite and buoy observations becoming increasingly beneficial, especially for the tropical regions where climate anomalies are more directly impacted by sea surface temperatures and where land-based conventional observational networks are typically sparse [e.g., Tompkins and Feudale, 2010; Fink et al., 2011]. Poor quality initialization reducing skill in the earlier periods will correspondingly weaken the mapped EOFs and lead to anomalies being under-predicted for the present day.

[18] One way to strengthen the “projection” of the model modes of variability over the observed ones is to apply the mapping on areas that are homogeneous in terms of their precipitation climatology. For this reason, the mapping is applied separately on three subregions (shown in Figure 1c) which are defined quantitatively with a k-means clustering algorithm applied to the monthly means of the GPCPv2.2 data set, imposing the “closeness” of the precipitation anomaly time series [Di Giuseppe et al., 2013]. The regions identified agree with the common qualitative assessment of the African rainfall climatology. In fact, Western Africa (WAF) undergoes a single monsoon between July and September while central South Africa (CSAF) is characterized by a rainy season between December and February and Eastern Africa (EAF) undergoes two rainy seasons.
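The clustering step can be illustrated with a plain k-means (Lloyd's algorithm) on synthetic seasonal cycles. Everything specific in this sketch is an assumption: the feature vector (a 12 month cycle per grid point), the deterministic farthest-point initialization, and the three idealized regimes that only loosely mimic the WAF, CSAF, and EAF descriptions above. The paper's own features and distance measure may differ.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with farthest-point initialization.

    X: (n_points, n_features). Returns (labels, centroids).
    """
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])       # farthest point from current set
    centroids = np.array(centroids)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

months = np.arange(12)

def bump(center, width=1.5):
    """Idealized rainy season peaking at a given month (0 = January)."""
    d = np.minimum(np.abs(months - center), 12 - np.abs(months - center))
    return np.exp(-0.5 * (d / width) ** 2)

# Three idealized regimes: one boreal summer peak, one austral summer peak,
# and a double rainy season. Five noisy grid points per regime.
regimes = [bump(7), bump(0), bump(3) + bump(10)]
rng = np.random.default_rng(1)
X = np.vstack([r + 0.02 * rng.standard_normal((5, 12)) for r in regimes])
labels, centroids = kmeans(X, k=3)
```

Grid points sharing a seasonal cycle end up in the same cluster, which is the sense in which the subregions are homogeneous in their precipitation climatology.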

3 Quality of the Corrected ERA-I Data Set

[19] African rainfall follows seasonal cycles which differ across the continent. The verification of the quality of the correction applied is therefore performed over the three previously defined regions by looking at the different seasons.

[20] Figure 1c shows the time evolution of the JJA biases spatially averaged on the WAF area for the three data sets (ERA-I, ERA-ILR, and ERA-ILR+mapping) against GPCPv2.2 for the period in which the data set is available. The dry trend is clearly visible in the original data set and is mostly corrected by the linear regression rescaling. The spatial-based approach, if applied to the raw data, would be unable to explicitly rectify the wrong time correlation between the model and the data. The additional remapping is engineered to improve the intensity and location of the monsoon events. By looking at the spatial and temporal correlation of anomalies in the three regions and for the four seasons (Figure 2), it is possible to disentangle the improvement provided by each correction step. The corrected data sets (ERA-ILR and ERA-ILR+mapping) are significantly different from the original data when their interval of correlation is outside the ERA-I confidence interval. Therefore, even if there is a hint of improvement in the temporal correlation (Figure 2a), neither of the correction steps provides a statistically significant improvement. Since the calculation is performed by first averaging precipitation over vast areas, the limited impact can be due to compensation of geographically nonuniform performance, and the extent of the confidence intervals highlights the large sampling uncertainty associated with the small sample size (only 32 seasons). To assess whether the spatial distribution of precipitation improves, the anomaly correlation versus GPCPv2.2 is calculated spatially for each time and in each season (Figure 2b). The results are presented in terms of the anomaly correlation distribution across the years by showing the 30%, 50% (median), and 70% values of the correlation distribution. These intervals are not associated with statistical significance tests and should therefore not be interpreted in a statistical way. It is nevertheless clear that the EOF remapping generally improves the precipitation patterns over the other two data sets.

Figure 2.

Anomaly correlation of ERA-Interim versus GPCPv2.2 for the default data set and the two stages of the correction (ELR and ELR+mapping). Correlations are calculated for the three macroregions (WAF, EAF, and CSAF). (a) Temporal correlation of the precipitation anomalies. Bars represent the 95% confidence interval for the null hypothesis using the Fisher r-to-z transformation [Fisher, 1970]. (b) Spatial correlation of the precipitation anomalies for all grid points in the three macroregions. Bars represent the 30%, the median, and 70% of the probability distribution.
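For reference, a confidence interval based on the Fisher r-to-z transformation can be computed as in the sketch below. It uses the textbook form (z = atanh(r), standard error 1/sqrt(n - 3)); the function name and the example correlation of 0.5 are illustrative assumptions, not values from the paper.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Confidence interval for a correlation r estimated from n samples."""
    z = math.atanh(r)                      # Fisher r-to-z transformation
    se = 1.0 / math.sqrt(n - 3)            # standard error in z space
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - q * se), math.tanh(z + q * se)

# With 32 seasons, a correlation of 0.5 has an interval of roughly 0.18 to 0.72.
lo, hi = fisher_ci(0.5, 32)
```

The width of this interval for a sample of 32 illustrates why moderate changes in temporal correlation between the data sets do not reach statistical significance.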

[21] The main reason to design a correction for ERA-I rainfall was to extend the usability of the data set in specific sectorial applications. The operational benefit of the corrected ERA-I is assessed using the drought monitoring system of Dutra et al. [2012]. In this system, the drought is diagnosed using the Standardized Precipitation Index (SPI) [McKee et al., 1993] which is simply based on the probability of an observed precipitation deficit occurring over a given time period, and recommended by the World Meteorological Organization (WMO) as a standard to characterize droughts [Hayes et al., 2011]. The integration time has typical values of 3, 6, and 12 months allowing for a range of meteorological, agricultural, and hydrological applications.
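The structure of an SPI calculation (accumulate over the integration time, then convert the probability of the accumulated amount into a standard normal deviate) can be sketched as follows. Note the swapped-in technique: McKee et al. [1993] fit a gamma distribution to the accumulated precipitation, whereas this sketch uses rank-based empirical probabilities so that it needs no distribution fitting. The function name, the assumption that the series starts in January, and the synthetic rainfall are all illustrative.

```python
import numpy as np
from statistics import NormalDist

def spi_empirical(precip, scale=12):
    """Rank-based SPI-like index for a monthly series starting in January.

    precip: 1-D monthly totals. Returns an array of the same length, with NaN
    for the first scale-1 months where the accumulation is incomplete.
    """
    p = np.asarray(precip, dtype=float)
    csum = np.concatenate([[0.0], np.cumsum(p)])
    acc = np.full(p.size, np.nan)
    acc[scale - 1:] = csum[scale:] - csum[:-scale]   # running sum over `scale`
    spi = np.full(p.size, np.nan)
    inv = NormalDist().inv_cdf
    for m in range(12):                              # one sample per calendar month
        idx = np.arange(m, p.size, 12)
        idx = idx[np.isfinite(acc[idx])]
        ranks = acc[idx].argsort().argsort() + 1     # 1 = driest
        prob = (ranks - 0.5) / idx.size              # empirical probability
        spi[idx] = [inv(x) for x in prob]
    return spi

# Thirty years of synthetic monthly rainfall (arbitrary units).
rng = np.random.default_rng(42)
precip = rng.gamma(2.0, 30.0, size=30 * 12)
spi12 = spi_empirical(precip, scale=12)
```

Negative values indicate accumulated rainfall below the median for that calendar month, and changing `scale` to 3 or 6 gives the shorter integration times mentioned above.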

[22] The temporal correlation of the 12 month SPI (SPI12) between the standard ERA-I and GPCPv2.2 (Figure 3a) reveals the limited usability of the standard ERA-I over Africa. With the exception of south and north-west Africa, ERA-I cannot be used to monitor drought. The introduction of the pointwise rescaling (ERA-ILR) slightly improves the correlations in central west and east Africa (Figure 3b). It is nevertheless the combination of the temporal rescaling and spatial remapping (ERA-ILR+mapping) that has a significant impact, increasing the correlations (in several areas > 0.5 increase in Figure 3e) also in central west and east Africa. A limited impact is instead observed in central Africa where ERA-Interim has very limited forecast skill and GPCP also has large uncertainties due to the lack of in situ observations [Dutra et al., 2012]. In this region, any postprocessing will have no value, and only improvements in the model itself can be beneficial. The spatial mapping is especially beneficial when applied after the linear rescaling. Applying the spatial mapping alone to the original ERA-I data as in Di Giuseppe et al. [2013] had a much more limited impact. Similar results were found when comparing the 3 and 6 month SPI.

Figure 3.

(a, b, and c) 12 month SPI correlations for the standard ERA-I data set and the two stages of the correction (ELR and ELR+mapping) versus GPCPv2.2. (d, e, and f) Difference in the correlation between the data sets. Solid contours highlight where differences are significant using the Fisher r-to-z transformation.

4 Conclusions

[23] Gridded precipitation from reanalysis could be used as a valid proxy for observations in many regions where local measurements are not available. Because reanalysis integrations do not have the stringent time constraint of operational analysis, they benefit from the availability of a larger and better quality controlled set of observations at the expense of an outdated modeling system and lower resolution. Nevertheless, model errors can still affect the output quality, especially in regions of sparse observations, which is where these fields are most likely to be employed.

[24] In this work, we have proposed a correction that is applied to the monthly means of ERA-Interim, the latest ECMWF reanalysis product. The correction performs, in two sequential steps, both the time rescaling and spatial remapping of the original ERA-I precipitation fields using an observation data set as benchmark. The important aspect of the method in comparison to standard calibrations is that the corrected data set is provided in real time as soon as ERA-I is made available, without delays due to, for example, the lack of simultaneous observations. This means that it can be employed in near real time sectorial applications which require precipitation monitoring, such as the drought monitoring and forecast system of Dutra et al. [2012].

[25] The correction has potential to solve the relevant deficiencies of ERA-I precipitation over Africa. Moreover, we have highlighted its practical benefit in meteorological applications, demonstrating that this corrected ERA-I can now be usefully employed to monitor drought over Africa.


[26] This study was funded by the European Commission Seventh Framework Programme “FP7” projects QWeCI (grant agreement 243964) and DEWFORA (grant agreement 265454). We are grateful to Adrian M Tompkins, Fredrik Wetterhall, Dick Dee, and two anonymous reviewers for their valuable comments on the manuscript.

[27] The Editor thanks two anonymous reviewers for their assistance in evaluating this paper.