An improved estimate of daily precipitation from the ERA5 reanalysis

Precipitation is an essential climate variable and a fundamental part of the global water cycle. Given its importance to society, precipitation is often assessed in climate monitoring activities, such as in those led by the Copernicus Climate Change Service (C3S). To undertake these activities, C3S predominantly uses ERA5 reanalysis precipitation. Research has shown that short‐range forecasts for precipitation made from this reanalysis can provide valuable estimates of the actual (observed) precipitation in extratropical regions but can be less useful in the tropics. While some of these limitations will be reduced with future reanalyses because of the latest advancements, there is potentially a more immediate way to improve the precipitation estimate. This is to use the precipitation modelled in the Four‐Dimensional Variational (4D‐Var) data assimilation window of the reanalysis, and it is the aim of this study to evaluate this approach. Using observed 24‐h precipitation accumulations at 5637 stations from 2001 to 2020, results show that smaller root‐mean‐square errors (RMSEs) and mean absolute errors are generally found by using the ERA5 4D‐Var precipitation. For example, for all available days from 2001 to 2020, 87.5% of stations have smaller RMSEs. These improvements are driven by reduced random errors in the 4D‐Var precipitation because it is better constrained by observations, which are themselves sensitive to or influence precipitation. However, there are regions (e.g., Europe) where larger biases occur, and via the decomposition of the Stable Equitable Error in Probability Space score, this is shown to be because the 4D‐Var precipitation has a wetter bias on ‘dry’ days than the standard ERA5 short‐range forecasts. The findings also highlight that the 4D‐Var precipitation does improve the discrimination of ‘heavy’ observed events. 
In conclusion, an improved ERA5 precipitation estimate is largely obtainable, and these results could prove useful for C3S activities and for future reanalyses, including ERA6.


| INTRODUCTION
Precipitation is an essential climate variable (i.e., one of the variables used to describe Earth's climate) and a fundamental part of the global water cycle. It is important for public water supply, food production, the health of the natural environment and inland waterway transport, and above-average or below-average precipitation can indicate floods or droughts, respectively. For these reasons, precipitation is frequently examined in climate monitoring activities. In the Copernicus Climate Change Service (C3S; Buontempo et al., 2022), a service implemented by the European Centre for Medium-Range Weather Forecasts (ECMWF) on behalf of the European Commission, precipitation is monitored in close-to-real-time monthly bulletins (https://climate.copernicus.eu/climate-bulletins) and in an annual European State of the Climate report (https://climate.copernicus.eu/ESOTC). To undertake these activities, C3S predominantly uses precipitation output from the ECMWF ERA5 reanalysis, a product that provides a comprehensive record of the global atmosphere, land surface and ocean waves (Hersbach et al., 2020). Precipitation is generally observed as an accumulation over a specified period and therefore cannot be readily obtained from an instantaneous analysis. Instead, the current standard practice in ERA5 is to accumulate precipitation from short-range forecasts initialized from the analysis.
The performance of ERA5 precipitation has been assessed in multiple studies (e.g., Bandhauer et al., 2021; Beck et al., 2019; Bell et al., 2021; Crossett et al., 2020; Jiang et al., 2021; Lavers et al., 2022; Tarek et al., 2020). One study by Lavers et al. (2022) evaluated the skill of ERA5 precipitation in capturing 24-h observed precipitation at 5637 stations from 2001 to 2020. The results showed that the smallest random errors occurred in the winter Extratropics and the largest in the Tropics; errors grew in the summer Extratropics but generally did not reach the levels found in the Tropics. These findings suggested that users could have most confidence in ERA5 precipitation in extratropical regions. The study also identified processes that were not well captured in the ERA5 reanalysis, including convection in tropical cyclones, the underestimation of the orographic enhancement of precipitation and the overestimation of precipitation on dry days. While these issues are likely to be reduced in newer reanalysis products, such as the planned ERA6, it is also important to consider other opportunities for improving the estimation of precipitation.
An alternative approach is to derive precipitation accumulations from the final trajectory within the Four-Dimensional Variational (4D-Var) data assimilation window. The 4D-Var system produces the best estimate of the state of Earth's atmosphere by combining short-range forecasts with meteorological observations distributed in space and time. The final trajectory has the advantage that it is directly constrained by observations (including observations sensitive to precipitation) and thus might be expected to provide a better estimate of the true precipitation, provided that the analysis increments added to obtain the starting point of the trajectory are sufficiently well balanced. However, this precipitation estimate has not hitherto been assessed. The aim of this study is to evaluate the precipitation from the final 4D-Var trajectory and to determine whether it provides an improved estimate of ERA5 precipitation compared with that currently given by the ERA5 short-range forecasts. Using the same in situ station data as Lavers et al. (2022), this evaluation will identify any improvements in precipitation estimates, which could have implications for C3S monitoring activities and the broader climate science community.

| Precipitation observations
This study uses the same gauge-based precipitation observations assembled by Lavers et al. (2022), which are briefly described here. The precipitation data were retrieved from the ECMWF archive for the period 1 January 2001 to 31 December 2020, and 24-h observed precipitation totals were extracted or, where possible, calculated from sub-daily periods at seven reporting times: 0000, 0100, 0300, 0400, 0500, 0600 and 1200 UTC. Any 24-h totals greater than 500 mm or (erroneously) less than 0 mm were removed, and a station was included in this evaluation only if it met a 50% daily availability criterion both across the whole study period and within each meteorological season (December, January and February [DJF]; March, April and May [MAM]; June, July and August [JJA]; September, October and November [SON]). Note also that the study uses raw reported precipitation totals, so these values can be affected by systematic and random measurement errors (Muchan & Dixon, 2019), by problems recording snowfall in rain gauges that lack a heating element, and by the fact that some rain gauges perform better than others.
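The quality control described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the processing code used in the study: each station series is assumed to be a dictionary of 24-h totals (mm) keyed by date, and the 50% criterion is interpreted as requiring valid totals on at least half of all calendar days in the period and, separately, on at least half of the days falling in each season. The function and constant names are hypothetical.

```python
from datetime import date, timedelta

# Meteorological seasons as defined in the text (month number -> season).
SEASON = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
          6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}
MAX_24H_MM = 500.0      # totals above this are treated as erroneous
MIN_AVAILABILITY = 0.5  # the 50% daily availability criterion (assumed interpretation)


def clean_totals(totals):
    """Keep only 24-h totals (mm) in the plausible range 0-500 mm inclusive."""
    return {day: v for day, v in totals.items() if 0.0 <= v <= MAX_24H_MM}


def passes_availability(totals, start, end):
    """True if valid totals exist on at least 50% of all days in [start, end]
    and on at least 50% of the days of each season within that period.

    totals: dict mapping datetime.date -> cleaned 24-h total (mm).
    """
    expected = {s: 0 for s in ("DJF", "MAM", "JJA", "SON")}
    present = {s: 0 for s in expected}
    day = start
    while day <= end:
        season = SEASON[day.month]
        expected[season] += 1
        if day in totals:
            present[season] += 1
        day += timedelta(days=1)
    # Whole-period check first, then the per-season checks.
    if sum(present.values()) < MIN_AVAILABILITY * sum(expected.values()):
        return False
    return all(present[s] >= MIN_AVAILABILITY * expected[s] for s in expected)
```

Checking each season separately matters: a station with a long summer gap could pass the whole-period test yet still be excluded.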

| ERA5 reanalysis precipitation
ERA5 is the latest ECMWF reanalysis providing a global record of the atmosphere, land surface and ocean waves, and it is based on the ECMWF Integrated Forecasting System (IFS) Cy41r2 (Hersbach et al., 2020). Two ERA5 precipitation estimates are calculated and evaluated: (1) the short-range background forecasts, which currently provide the standard ERA5 precipitation product, and (2) the final trajectory of the 4D-Var data assimilation system. To calculate the 24-h precipitation estimates, the short-range background forecasts use accumulations from the first 12 h of the forecasts initialized at 0600 and 1800 UTC, while the 4D-Var trajectory uses the 0900 and 2100 UTC assimilation windows. All ERA5 precipitation fields were extracted from the ECMWF archive, and the 24-h precipitation totals were computed from 1 January 2001 to 31 December 2020. Both products are also available from the C3S Climate Data Store (ERA5 hourly data on single levels and ERA5 complete, respectively). Note that ERA5 does not assimilate any rain-gauge data, although composite radar/rain-gauge precipitation estimates over the United States east of the Rockies are assimilated from 2009 onwards.
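Because each 24-h total is assembled from two 12-h segments (forecasts from 0600 and 1800 UTC, or the 0900 and 2100 UTC assimilation windows), one way to match a station's reporting time is to build an hourly series and sum the 24 hours ending at that time. The sketch below assumes hourly accumulations are available for each 12-h segment and have already been converted to millimetres (ERA5 archives precipitation in metres of water equivalent); the data layout and function names are hypothetical, and this is not necessarily how the study computed its totals.

```python
from datetime import datetime, timedelta


def hourly_series(segments):
    """Flatten 12-h segments into one hourly series.

    segments: dict mapping segment start time -> list of 12 hourly
    accumulations (mm); element k covers start + k h to start + (k + 1) h.
    Returns a dict keyed by the END time of each hour.
    """
    series = {}
    for start, values in segments.items():
        if len(values) != 12:
            raise ValueError(f"segment starting {start} does not have 12 hourly values")
        for k, value in enumerate(values):
            series[start + timedelta(hours=k + 1)] = value
    return series


def total_24h(series, end):
    """Sum the 24 hourly accumulations in the period (end - 24 h, end]."""
    hours = [end - timedelta(hours=h) for h in range(24)]
    missing = [t for t in hours if t not in series]
    if missing:
        raise KeyError(f"no hourly value for the hour ending {missing[0]}")
    return sum(series[t] for t in hours)
```

For the short-range forecast estimate the segments would start at 0600 and 1800 UTC; for the 4D-Var estimate, at 0900 and 2100 UTC. Only the segment start times differ.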

| Precipitation evaluation
The nearest-neighbour approach was used to select the ERA5 grid point closest to each station. This method is used at ECMWF for operational verification so that the raw model value is compared with the raw observation, and the same approach is employed herein for ERA5. Then, for both ERA5 precipitation estimates, the root-mean-square error (RMSE) and mean absolute error (MAE) of the ERA5-minus-observation differences were calculated at each station for all days and for the days in each season across 2001-2020. The change in the RMSE and MAE when using the 4D-Var precipitation was determined by computing the relative difference with respect to the short-range forecast precipitation, as follows:

$$\Delta_{\mathrm{RMSE}} = 100 \times \frac{\mathrm{RMSE}_{\mathrm{4D\text{-}Var}} - \mathrm{RMSE}_{\mathrm{SRF}}}{\mathrm{RMSE}_{\mathrm{SRF}}} \tag{1}$$

$$\Delta_{\mathrm{MAE}} = 100 \times \frac{\mathrm{MAE}_{\mathrm{4D\text{-}Var}} - \mathrm{MAE}_{\mathrm{SRF}}}{\mathrm{MAE}_{\mathrm{SRF}}} \tag{2}$$

where the subscripts 4D-Var and SRF denote the 4D-Var and short-range forecast precipitation, respectively, so that negative values indicate smaller errors with the 4D-Var precipitation.

The Stable Equitable Error in Probability Space (SEEPS) score was also calculated (Haiden et al., 2012; Rodwell et al., 2010, 2011). SEEPS uses a 3 × 3 contingency table to evaluate the skill of a precipitation product in discriminating between 'dry', 'light precipitation' and 'heavy precipitation'. A 'dry' day, which occurs with climatological probability $p_1$, is one on which the precipitation accumulation, after rounding to the nearest 0.1 mm, is less than or equal to 0.2 mm. Herein, only stations with $p_1$ less than 95% are considered, to reduce the sensitivity to sampling uncertainty in arid climates. The 'light' and 'heavy' precipitation categories are computed with respect to the 2001-2020 climatology with a consistent definition at all stations (for all days and each season), with the threshold between these two categories defined such that 'light precipitation' occurs twice as often as 'heavy precipitation' on average (e.g., Haiden et al., 2012). At each station over 2001-2020, the contingency table is populated with the fraction of days in each category, and the SEEPS is then determined as the scalar product of this 3 × 3 contingency table and the scoring matrix (Haiden et al., 2012) given below:

$$\mathbf{S} = \frac{1}{2}\begin{pmatrix} 0 & \dfrac{1}{1-p_1} & \dfrac{4}{1-p_1} \\[1ex] \dfrac{1}{p_1} & 0 & \dfrac{3}{1-p_1} \\[1ex] \dfrac{1}{p_1} + \dfrac{3}{2+p_1} & \dfrac{3}{2+p_1} & 0 \end{pmatrix}$$

In this matrix, the observed categories 'dry', 'light' and 'heavy' are oriented from left to right and the corresponding forecast categories from top to bottom. The SEEPS is a negatively oriented score: it is 0 for a perfect forecast and has an expected value of 1 for an unskilled forecast. Note, though, that the SEEPS can exceed 1 for short periods when the scoring matrix is based on a climatology. As with the RMSE and MAE, the relative difference in SEEPS between the 4D-Var and short-range forecast precipitation was calculated. Furthermore, the SEEPS was decomposed to diagnose the sources of precipitation error. In particular, two errors were investigated: (1) the prediction of 'light' precipitation when a 'dry' day occurred, a known problem in numerical weather prediction models (Rodwell et al., 2011); and (2) the prediction of 'light' precipitation when a 'heavy' event was observed.
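The station-level verification steps (nearest-neighbour matching, RMSE, MAE and the relative percentage difference against the short-range forecast) can be sketched as below. This is an illustrative sketch, not ECMWF's verification code: it assumes a regular latitude-longitude grid (such as the 0.25° grid on which ERA5 is commonly distributed) and measures "nearest" by great-circle distance using a brute-force search; all function names are hypothetical.

```python
import math


def _angular_distance(lat1, lon1, lat2, lon2):
    """Haversine great-circle angular distance (radians) between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)  # sin^2(dlam / 2) handles longitude wrap-around
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * math.asin(math.sqrt(min(1.0, a)))


def nearest_grid_point(lat, lon, grid_lats, grid_lons):
    """Indices (i, j) of the regular lat-lon grid point nearest to a station."""
    best = None
    for i, glat in enumerate(grid_lats):
        for j, glon in enumerate(grid_lons):
            d = _angular_distance(lat, lon, glat, glon)
            if best is None or d < best[0]:
                best = (d, i, j)
    return best[1], best[2]


def _check(model, obs):
    if len(model) != len(obs) or not obs:
        raise ValueError("model and obs must be non-empty and of equal length")


def rmse(model, obs):
    """Root-mean-square of the model-minus-observation differences."""
    _check(model, obs)
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def mae(model, obs):
    """Mean absolute model-minus-observation difference."""
    _check(model, obs)
    return sum(abs(m - o) for m, o in zip(model, obs)) / len(obs)


def relative_difference(score_4dvar, score_srf):
    """Percentage change relative to the short-range forecast score;
    negative values mean smaller errors with the 4D-Var precipitation."""
    return 100.0 * (score_4dvar - score_srf) / score_srf
```

For example, a station RMSE of 9.0 mm with the 4D-Var precipitation against 10.0 mm with the short-range forecasts gives a relative difference of −10%.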

| RESULTS AND DISCUSSION
Figure 1 shows the relative percentage difference (Equation 1) of the RMSE at the 5637 stations for all days and for the days in DJF and JJA. Strikingly, most stations have smaller RMSE values with the 4D-Var precipitation, as illustrated by the dominance of blue colours in the maps (Figure 1a,c,e) and by most of each boxplot lying below the 0% line (Figure 1b,d,f). The largest improvement occurs when using all days, for which 87.5% of stations have lower RMSEs, whereas boreal winter has the fewest stations (69.4%) with lower RMSEs, mostly because of larger errors across Europe, Canada and northeast Asia (orange and red markers in Figure 1c). In boreal summer, RMSEs are reduced at 80.7% of stations (Figure 1e,f). The stations with the largest reductions in RMSE, shown by the darkest blue colours, are in the eastern United States, western Europe (e.g., France), eastern China, southeast South America and eastern Australia (Figure 1a,c,e). Furthermore, the mean RMSE across all stations decreases, with relative differences of −2.6%, −1.9% and −2.1% for all days, DJF and JJA, respectively.

FIGURE 1 Maps and boxplots of the relative percentage difference of the RMSE at the 5637 stations for (a,b) all days and for the days in (c,d) December, January and February (DJF) and (e,f) June, July and August (JJA). The bottom and top of the boxes are the 25th and 75th percentiles, respectively, the line in the box is the median, the dot in the box is the mean, and the whiskers are the 1st and 99th percentiles. The notches in the boxplots show the 95% confidence interval around the median calculated from 1000 bootstrap samples.

The relative percentage difference of the MAE (Equation 2) at the 5637 stations for all days and for the days in DJF and JJA is shown in Figure 2. First, as with the RMSE, the maps are mostly blue, which signifies that smaller MAE values are found with the 4D-Var precipitation; this is clearly seen, for example, in the eastern United States, eastern China and eastern Australia (Figure 2a,c,e). Second, however, compared with the RMSE, there are slightly fewer stations where the MAE decreases when using the 4D-Var precipitation: improvements in the MAE are found at 77.1%, 60.8% and 75.8% of stations for all days, DJF and JJA, respectively. A poorer fit to observed precipitation is especially noticeable in Europe (Figure 2a,c,e) and, in DJF, in Canada and northern Asia (Figure 2c). These larger MAEs in DJF, which are also seen to a lesser extent in the RMSE (Figure 1c), occur at a time of colder and drier conditions. This suggests that they may result from a larger overestimation of small precipitation amounts on days observed to be dry in the 4D-Var precipitation estimate. This is a common problem in numerical weather prediction systems, and it is possibly worsened by the need for stronger adjustments to counter the growth of model systematic error at the beginning of the data assimilation window; as a result, there may be a spin-up of the 4D-Var trajectory as it tries to return to its preferred model climate.
In JJA, however, the MAEs are generally smaller in the Northern Hemisphere (as with the RMSE in Figure 1e), which suggests that the more intense convective precipitation prevalent in this season is mostly better captured by the 4D-Var precipitation. On average across all stations, the mean MAE decreases, with relative differences of −2.6%, −1.3% and −2.4% for all days, DJF and JJA, respectively (Figure 2b,d,f).
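The station summaries quoted above (the percentage of stations with a negative relative difference and the mean relative difference) and the notched boxplots can be reproduced in outline as follows. The text states only that the notches come from 1000 bootstrap samples; the percentile bootstrap used here, the fixed random seed and the function names are assumptions for illustration.

```python
import random
import statistics


def summarize(rel_diffs):
    """Percentage of stations with a negative relative difference (an
    improvement with 4D-Var) and the mean relative difference (%)."""
    improved = 100.0 * sum(1 for x in rel_diffs if x < 0) / len(rel_diffs)
    return improved, statistics.fmean(rel_diffs)


def bootstrap_median_ci(values, n_boot=1000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the median (assumed method)."""
    rng = random.Random(seed)
    n = len(values)
    # Median of each resample drawn with replacement, sorted for the percentiles.
    medians = sorted(statistics.median(rng.choices(values, k=n)) for _ in range(n_boot))
    k = round((1.0 - level) / 2.0 * n_boot)  # resamples cut from each tail
    return medians[k], medians[n_boot - 1 - k]
```

With 1000 resamples and a 95% level, the interval runs from the 26th to the 975th smallest resampled median.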
To investigate further the differences between the two ERA5 precipitation estimates, the SEEPS score and the error contributions to it are evaluated for all days in Figure 3. For the total SEEPS, 62.7% of stations have lower values with the 4D-Var precipitation (Figure 3b), but many stations have a poorer SEEPS, as illustrated by the warm colours across Europe (Figure 3a). As suggested above, the prediction of 'light' precipitation when a 'dry' day was observed may be the source of the problem, and the error arising from this is presented in Figure 3c. The red markers on the map show strikingly that a majority of stations perform worse with the 4D-Var precipitation, with the boxplot showing that 73.6% of stations have a larger contribution to the SEEPS than when using the short-range forecast precipitation (Figure 3d). Stations across the Northern Hemisphere are particularly affected by this larger error. The other source of error analysed here is when a 'heavy' event is observed but a 'light' event is predicted, as shown in Figure 3e,f. For this type of error, an improvement occurs at 64.3% of stations, and Europe, for example, is notable for the number of stations marked by a dark blue colour. The SEEPS and its decomposition were also evaluated for DJF and JJA, and results broadly similar to those for all days were found in both seasons (not shown).
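The SEEPS decomposition can be illustrated with the sketch below: each cell of the 3 × 3 contingency table (as a fraction of days) multiplied by the corresponding entry of the Haiden et al. (2012) scoring matrix gives that cell's contribution, and the contributions sum to the total score. The 'light forecast, dry observation' and 'light forecast, heavy observation' cells correspond to the two errors examined in Figure 3. The climatology helper estimates the light/heavy threshold as the 2/3 quantile of wet-day totals using `statistics.quantiles`; that estimator, the inclusive threshold and all function names are assumptions, not the study's implementation.

```python
import statistics

DRY, LIGHT, HEAVY = 0, 1, 2
DRY_MAX_MM = 0.2  # 'dry' if the total, rounded to 0.1 mm, is <= 0.2 mm


def category(total_mm, heavy_threshold):
    """Assign a 24-h total (mm) to the dry / light / heavy category."""
    x = round(total_mm, 1)
    if x <= DRY_MAX_MM:
        return DRY
    return LIGHT if x <= heavy_threshold else HEAVY


def climatology(obs_totals):
    """Estimate p1 and the light/heavy threshold from observed totals; the
    threshold (2/3 quantile of wet days) makes light ~twice as common as heavy."""
    rounded = [round(x, 1) for x in obs_totals]
    wet = [x for x in rounded if x > DRY_MAX_MM]
    p1 = 1.0 - len(wet) / len(rounded)
    return p1, statistics.quantiles(wet, n=3)[1]


def seeps_matrix(p1):
    """Scoring matrix; rows = forecast, columns = observed (dry, light, heavy)."""
    if not 0.0 < p1 < 1.0:
        raise ValueError("p1 must lie strictly between 0 and 1")
    return [
        [0.0, 0.5 / (1 - p1), 2.0 / (1 - p1)],
        [0.5 / p1, 0.0, 1.5 / (1 - p1)],
        [0.5 / p1 + 1.5 / (2 + p1), 1.5 / (2 + p1), 0.0],
    ]


def seeps(fcst, obs, p1, heavy_threshold):
    """Mean SEEPS over paired days and the contribution of each
    (forecast, observed) cell; the contributions sum to the total."""
    if len(fcst) != len(obs) or not obs:
        raise ValueError("fcst and obs must be non-empty and of equal length")
    s = seeps_matrix(p1)
    counts = [[0] * 3 for _ in range(3)]
    for f, o in zip(fcst, obs):
        counts[category(f, heavy_threshold)][category(o, heavy_threshold)] += 1
    contrib = [[s[i][j] * counts[i][j] / len(obs) for j in range(3)] for i in range(3)]
    return sum(map(sum, contrib)), contrib
```

For the comparison in Figure 3c,d, one would take `contrib[LIGHT][DRY]` for each precipitation estimate at a station and compute its relative difference in the same way as for the RMSE and MAE.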

| CONCLUSIONS
This study has evaluated two estimates of ERA5 precipitation (the standard estimate, which uses short-range forecasts, and an estimate from the final 4D-Var trajectory of the data assimilation system) to determine whether an improved estimate of ERA5 precipitation can be provided. First, the results suggest that an improved precipitation estimate can be obtained by using the final 4D-Var trajectory: when considering all available days in the 2001-2020 period, 87.5% of stations have lower RMSEs with this approach. This drops, however, to 69.4% and 80.7% of stations when only days in DJF and JJA, respectively, are assessed. The improvement at most stations is driven by a reduced random error in the 4D-Var precipitation, which arises because this precipitation, coming from the data assimilation system, is drawn closer to observations that are themselves sensitive to or influence precipitation. Second, the MAE is also generally lower, but there are regions (e.g., Europe) where no improvement is found. The decomposition of the SEEPS score shows that this issue with the MAE in Europe (and other regions) arises because the 4D-Var precipitation has a wetter bias on 'dry' days than the ERA5 short-range forecast precipitation. Finally, the findings highlight that the 4D-Var precipitation improves the discrimination of 'heavy' observed events; because squaring the errors gives extreme values a larger influence on the RMSE, this is also consistent with the improved RMSE values found.
In conclusion, it is generally possible to improve the daily ERA5 precipitation estimate by using the direct output from the 4D-Var data assimilation system. Aspects worth considering in future studies include the optimal time within the data assimilation window from which to take the accumulations, which may be a trade-off between the benefit of a better constraint by observations and the problems associated with increased model spin-up, as well as the effects of the diurnal cycle. This assessment could also be applied to other accumulated variables from 4D-Var to determine whether improvements can be obtained for them. These results could prove useful for C3S climate monitoring activities and for the planning of future reanalyses, such as ERA6, and may be relevant to other global reanalysis products.
FIGURE 2 Maps and boxplots of the relative percentage difference of the MAE at the 5637 stations for (a,b) all days and for the days in (c,d) December, January and February (DJF) and (e,f) June, July and August (JJA). The boxplot key is as in Figure 1.

FIGURE 3 Maps and boxplots of the relative percentage difference of the SEEPS score (a,b) and of the SEEPS decomposition for a dry observation and light forecast (c,d) and for a heavy observation and light forecast (e,f), computed over all days at the 5637 stations. Stations where the percentage of dry days is greater than 95% are excluded; these stations are shown as magenta dots (the number of omitted stations is given in the legends). The boxplot key is as in Figure 1.