Global hydrological reanalyses: The value of river discharge information for world‐wide downstream applications – The example of the Global Flood Awareness System GloFAS

Global hydrological reanalyses are modelled datasets providing information on river discharge evolution everywhere in the world. With multi‐decadal daily timeseries, they provide long‐term context to identify extreme hydrological events such as floods and droughts. By covering the majority of the world's land masses, they can fill the many gaps in river discharge in‐situ observational data, especially in the global South. These gaps impede knowledge of both hydrological status and future evolution and hamper the development of reliable early warning systems for hydrological‐related disaster reduction. River discharge is a natural integrator of the water cycle over land. Global hydrological reanalysis datasets offer an understanding of its spatio‐temporal variability and are therefore critical for addressing the water–energy–food–environment nexus. This paper describes how global hydrological reanalyses can fill the lack of ground measurements by using earth system or hydrological models to provide river discharge time series. Following an inventory of alternative sources of river discharge datasets, reviewing their advantages and limitations, the paper introduces the Copernicus Emergency Management Service (CEMS) Global Flood Awareness System (GloFAS) modelling chain and its reanalysis dataset as an example of a global hydrological reanalysis dataset. It then reviews examples of downstream applications for global hydrological reanalyses, including monitoring of land water resources and ocean dynamics, understanding large‐scale hydrological extreme fluctuations, early warning systems, earth system model diagnostics and the calibration and training of models, with examples from three Copernicus Services (Emergency Management, Marine and Climate Change).

Freshwater, and water in rivers especially, is an essential resource critical for the well-being of the environment and humans, but its unequal distribution across the world and its large temporal variability make it the source of some of the most devastating natural disasters. Between 2001 and 2020, floods and droughts affected nearly 65 and 80 million people per year, respectively, with on average 163 flood events per year recorded in the Emergency Event Database (CRED, 2022). In 2021, economic losses attributed to natural disasters were estimated to exceed USD 250 billion worldwide, with floods alone causing USD 70 billion of damage. Losses from the July event in Europe were estimated at USD 43 billion (Kramer & Ware, 2022), providing further evidence that society is not yet resilient to hydrological extremes. According to the Intergovernmental Panel on Climate Change (IPCC), there is already evidence that global warming affects hydrological extremes (Arias et al., 2021). Moreover, flood damages in Europe are expected to exceed €12.5 billion a year under a scenario of 2°C above pre-industrial levels if no adaptation measures are in place (Ciscar et al., 2018).
Robust detection and attribution of non-stationarity to support the IPCC depend on the availability of good-quality hydrological time series data that are as long and complete as possible (Slater et al., 2021). Such hydrological datasets are also fundamental to better understand the water-energy-food-ecosystem nexus (including the economic and societal links of the water cycle), to adapt to future challenges (for example, those associated with climate change and population growth) and to deliver the European Green Deal for a climate-neutral continent. When accessible reliably and in near real time, hydrological monitoring and forecasting information, such as that delivered by early warning systems, is also effective to: (i) help achieve the goals of the Sendai Framework for Disaster Risk Reduction of the United Nations (United Nation, 2015), (ii) tackle water-related UN Sustainable Development Goals, especially SDG6 and associated goals (UN Water, 2016) and (iii) support improved water management (Dixon et al., 2020) and reduction of hydrological extreme risks such as flooding (Alfieri et al., 2018). Already, initiatives such as the HydroSOS programme of the World Meteorological Organization (WMO) have emerged, aiming explicitly to monitor hydrological status at the global scale (Jenkins et al., 2020) to address those world challenges.
However, quantifying the global hydrological status is challenging owing to the large gaps (both in time and space) in hydrological observational data in major global catalogues such as the Global Runoff Data Centre (GRDC), the international archive of historical monthly and daily river discharge records collected on behalf of the WMO (see Figure 1). Further adding to the challenge is the declining availability of gauged river discharge data across the world (Lavers et al., 2019). This inhibits the ability to make timely decisions for effective water management and mitigation actions, as these require the evolution of the hydrological status to be monitored close to real time.
This paper presents how global daily hydrological river discharge 'reanalysis' (or simulations) time series can address the in-situ data availability challenge, with a particular focus on the Copernicus Emergency Management Service's (CEMS) Global Flood Awareness System (GloFAS) hydrological reanalysis (Harrigan et al., 2020), and presents examples of global hydrological reanalysis applications worldwide including several from different Copernicus Services.

| Remote sensing and in-situ river discharge observations
The many gaps in shared river discharge observation networks undoubtedly limit the creation of daily hydrological time series across the world. Innovations in sensor technology, especially low-cost techniques (Acharya et al., 2021), the development of earth observation (from satellite or aircraft) and the modern cyber-infrastructure offered by cloud computing have opened multiple alternatives to costly traditional in-situ river discharge measurements, an opportunity recognized and supported by WMO (Dixon et al., 2020). Several initiatives have received large investments for generalizing the use of Earth Observations (EO), including the EUMETSAT Satellite Application Facility on Support to Operational Hydrology and Water Management H-SAF (https://hsaf.meteoam.it/About) in Europe and the NOAA Office of Satellite and Product Operations (https://www.ospo.noaa.gov/Products/land/surface.html) in the United States of America. But despite promising local applications (e.g., derivation of river discharge time series from multi-spectral sensors by Tarpanelli et al., 2020), there is to date no operational EO-derived global river discharge product, with EO data still focusing on water budget fluxes and storage variables (e.g., precipitation, evapotranspiration, soil moisture or snow cover), whilst measurements of inland water bodies, including surface water storage and river discharge, remain poorly explored (Durand et al., 2021). Next-generation satellite missions, such as the Surface Water and Ocean Topography radar altimetry mission (SWOT; Biancamaria et al., 2016) or the NASA-ISRO SAR mission (NISAR), offer groundbreaking opportunities for inland water measurements, but they also pose technical challenges (Blumenfeld, 2017), and much testing and verification are likely to be necessary after launch (SWOT was launched in December 2022) before they can be fully used operationally. Finally, issues of spatial resolution and imagery, orbital frequency and gaps in the signal from clouds or vegetation interference (Fassoni-Andrade et al., 2021) make it difficult for EO to provide truly continuous coverage in time and space globally.
In-situ observational river discharge datasets, however, remain unique reference information sources, critical for the calibration of hydrological models and useful for evaluating model performance. Sources like the GRDC provide monthly to daily time series of river discharge (and associated metadata including catchment drainage area) collected by national hydrological and hydro-meteorological services and shared with the international community through a unique data catalogue. More recently, community efforts such as Caravan (Kratzert et al., 2023) have emerged to collect, clean up and standardize meteorological, hydrological and associated metadata datasets, resulting in freely available open-source dataset packages to facilitate large-scale hydrological research.

| Modelled river discharge datasets
To fill the existing gaps in in-situ observational river discharge data, traditional physically based hydrological models or machine learning algorithms combining ground observation and atmospheric datasets have been developed to generate river discharge datasets with global coverage (Ghiggi et al., 2021; Lin et al., 2019). Land surface or earth system models (ESM) (coupled with routing modules such as CaMa-Flood [Yamazaki et al., 2011] or mRM [Thober et al., 2019]; see, e.g., Tijerina et al., 2021 for review) or traditional large-scale hydrological models (e.g., the open-source OS LISFLOOD, de Roo et al., 2000) have increasingly been used as tools to simulate river discharge time series over continental or global domains (see, e.g., Sood and Smakhtin (2015) for a review of global hydrological models, or Bierkens (2015) for analysis of the evolution of global hydrological modelling). Common applications generally include simulation of the past for trend analysis, risk assessment or resource quantification, delivery to hydroclimate services such as early warning systems and assessment of the long-term effect of climatic changes or scenario testing (Bierkens, 2015). By construction, large-scale hydrological models aim to resolve the water balance and generally close the water budget when used off-line, avoiding some of the known artefacts due to data assimilation (Chevallier et al., 2017).
However, whilst modelling is a credible alternative to in-situ measurements and EO river discharge products, modelling applications can be limited by the quality of their outputs, which reflects the difficulty of representing key hydrological processes (e.g., see Clark et al., 2015 for ESM) and of calibrating model parameters (Bierkens, 2015). Another challenge associated with modelled river discharge data is the availability of good-quality climate- and weather-related variables to drive large-scale models (Kingston et al., 2020) at relevant spatial and temporal resolutions. Furthermore, these variables also need to be as consistent as possible in space and time globally so that simulations of different hydroclimatic regions and periods can be compared.
To ensure temporal and spatial continuity and consistency of weather data to generate river discharge time series, ground measurements need to be shared, processed and interpolated at the required resolution. But despite global initiatives such as the WMO Integrated Global Observing System (WIGOS) and its metadata repository (OSCAR), there are no universally recognized global surface variable time series datasets used for hydrological modelling. Additionally, the spatial scale, temporal frequency, limited record length, potential water-balance inconsistency and seasonal and geographical biases in remotely sensed water cycle variables make them not always suitable for regular global applications such as daily simulation of river discharge, limiting the uptake of EO weather-related products for operational hydrological applications (Beck, Vergopolan, et al., 2017; Dembélé et al., 2020). An alternative to observation-based forcing data exists in weather reanalysis datasets. Originally developed to provide consistent (in time and space) information on a range of climate, ocean and land variables, they can be used to force global land surface or hydrological models to generate hydrological reanalysis datasets.
Reanalyses are created using a numerical prediction model and a data assimilation scheme to generate gridded data of the earth system. Whilst they are not without biases (Beck, Vergopolan, et al., 2017; Lavers et al., 2022), reanalyses are sometimes preferred to pure EO data due to their record length (typically covering periods from 1979, when satellite data emerged) and their continuous and spatially consistent global features. This preference for reanalyses is shown by the recent increase in their uptake in earth system applications (Baatz et al., 2021) and the efforts to facilitate their use (CREATE project; Potter et al., 2018). Note that the use of land data assimilation (typically soil moisture and snow extent) in earth system reanalysis could introduce spurious trends in derived variables such as runoff and snow melt, in turn propagating to river discharge (Zsótér, Cloke, et al., 2020). Such datasets are therefore less suitable for stationarity assessment compared to those generated by forcing off-line hydrological models with weather data only, although the presence of precipitation biases and trends in the tropics (see, e.g., Lavers et al., 2022) could also impact the quality of river discharge time series.

| THE CEMS GloFAS HYDROLOGICAL REANALYSIS
A widely used example of a near real-time hydrological reanalysis dataset is GloFAS, a freely available global dataset of daily river discharge time series. It is produced operationally by the CEMS as part of the GloFAS Early Warning System for floods, which was originally developed jointly by the European Commission's Joint Research Centre (JRC) and the European Centre for Medium-Range Weather Forecasts (ECMWF). As CEMS GloFAS is an operational service, it benefits from regular upgrades, with major changes in the hydrological modelling chain introduced as 'cycle upgrades' that are associated with a strict version control. This section describes the production chain and associated data access.

| Hydrological process simulation
The main hydrological modelling engine of the GloFAS hydrological reanalysis is the open-source grid-based OS LISFLOOD hydrological model (Burek et al., 2013; de Roo et al., 2000; van der Knijff et al., 2010, https://ec-jrc.github.io/lisflood/). Model parameters are linked to global geo-physical maps of land cover type and use, topography, soil texture and depth, river channel morphology and water-demand data (for a detailed description, see https://ec-jrc.github.io/lisflood-model/4_1_annex_input-files/ and Choulga et al., 2024) and calibrated using in-situ river discharge observations and ERA5 forcing following the Distributed Evolutionary Algorithms in Python approach (DEAP; Fortin et al., 2012). The calibration aims to optimize daily river discharge simulation (Alfieri et al., 2020; Hirpa et al., 2018) over as long a period as possible within the available hydrological observational record for each catchment. Before GloFAS v4, default parameter values were used for the modelling of catchments for which discharge data were not available. In GloFAS v4, a regionalization method based on geographical proximity and climatic similarity (Beck et al., 2016; Parajka et al., 2005) was used to transfer the parameters from calibrated gauged catchments (donors) to ungauged catchments. See Grimaldi et al. (2024) for a full description of the GloFAS v4 calibration.
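The idea of transferring parameters from a donor catchment chosen by proximity and climatic similarity can be illustrated with a minimal sketch. It is not the GloFAS v4 regionalization procedure (see Grimaldi et al., 2024 for that): the scoring rule, the `aridity` descriptor, the `distance_scale_km` and `climate_weight` parameters and the catchment names are all hypothetical choices made for illustration only.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def select_donor(target, donors, distance_scale_km=1000.0, climate_weight=1.0):
    """Return (name, score) of the donor with the lowest combined score.

    Hypothetical score: scaled distance plus weighted absolute difference in a
    single climate descriptor (here an aridity index). Lower is better.
    """
    best_name, best_score = None, math.inf
    for name, donor in donors.items():
        dist = haversine_km(target["lat"], target["lon"], donor["lat"], donor["lon"])
        climate_gap = abs(target["aridity"] - donor["aridity"])
        score = dist / distance_scale_km + climate_weight * climate_gap
        if score < best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Illustrative catchments: the nearest donor has a very different climate.
ungauged = {"lat": 10.0, "lon": 20.0, "aridity": 0.5}
calibrated = {
    "near_but_dry": {"lat": 10.5, "lon": 20.5, "aridity": 2.0},
    "farther_but_similar": {"lat": 12.0, "lon": 22.0, "aridity": 0.55},
}
donor_name, donor_score = select_donor(ungauged, calibrated)
```

With these illustrative numbers the more distant but climatically similar catchment is selected, whereas setting `climate_weight=0` reduces the rule to nearest-neighbour transfer.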
The hydrological modelling performance of the GloFAS hydrological reanalysis is assessed with every major cycle by comparing simulated daily river discharge time series with river gauge observations. Figure 2 shows the modified Kling-Gupta Efficiency criterion (KGE', a measure of how well the modelled time series reproduces the observed time series, typically used in hydrological model verification; Gupta et al., 2009; Kling et al., 2012) associated with GloFAS v4, showing higher performance (dark blue) in large parts of North and South America, Central Europe and Asia. Lowest performance (grey) is often concentrated in catchments with strongly regulated rivers or in regions of complex hydrological processes, such as the United States and Canadian Prairies (Shook et al., 2021). It is not within the scope of this paper to provide in-depth analysis of the possible causes of poor hydrological simulation, and interested readers are invited to read the relevant papers (e.g., Harrigan et al., 2020 and Hirpa et al., 2018 for GloFAS v2.1, Alfieri et al., 2020 for GloFAS v3 and Grimaldi et al., 2024 for GloFAS v4). Note that GloFAS hydrological modelling performance is similar to that of other global hydrological models (Arheimer et al., 2020; Beck, van Dijk, et al., 2017; Murray et al., 2023) and has been shown to improve with each GloFAS major cycle (Figure 3). Since GloFAS v3, the KGE' score associated with the operational system is provided through the GloFAS web interface (www.globalfloods.eu). However, users are recommended to conduct their own evaluation before using GloFAS data for any downstream application, as locally relevant observational data and performance metrics are likely to provide additional meaningful information that complements global assessments.
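For users carrying out their own evaluation, the modified Kling-Gupta Efficiency can be computed from paired simulated and observed series. The sketch below follows the formulation of Kling et al. (2012), KGE' = 1 - sqrt((r - 1)^2 + (beta - 1)^2 + (gamma - 1)^2), where r is the linear correlation, beta the ratio of means and gamma the ratio of coefficients of variation. The function name and the handling of missing values (pairs containing `None` are dropped) are assumptions of this example, not part of any GloFAS tool.

```python
import math


def modified_kge(simulated, observed):
    """Modified Kling-Gupta Efficiency (KGE') after Kling et al. (2012).

    Pairs in which either value is None are ignored. A value of 1 indicates a
    perfect match between simulated and observed series.
    """
    pairs = [(s, o) for s, o in zip(simulated, observed) if s is not None and o is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two paired values are required")
    sim = [s for s, _ in pairs]
    obs = [o for _, o in pairs]
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    if mean_s == 0 or mean_o == 0 or std_s == 0 or std_o == 0:
        raise ValueError("KGE' is undefined for zero-mean or constant series")
    cov = sum((s - mean_s) * (o - mean_o) for s, o in pairs) / n
    r = cov / (std_s * std_o)                        # correlation component
    beta = mean_s / mean_o                           # bias ratio
    gamma = (std_s / mean_s) / (std_o / mean_o)      # variability ratio (CV ratio)
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


# A simulation that doubles every observation has perfect timing and relative
# variability (r = 1, gamma = 1) but a bias ratio of 2, so KGE' = 0.
kge_doubled = modified_kge([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
```

Because the three components are available separately inside the function, it is straightforward to adapt it to report which of correlation, bias or variability dominates a low score at a given station.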
During the evolution of the CEMS GloFAS service, two hydrological modelling configurations have been used to generate the GloFAS hydrological reanalysis datasets: (1) until GloFAS v2, surface and sub-surface runoff data from ECMWF Integrated Forecasting System (IFS) were used as input to OS LISFLOOD, with horizontal water fluxes along the river network simulated by the routing component of OS LISFLOOD to produce river discharge (Alfieri et al., 2013); (2) from GloFAS v3, meteorological data from ECMWF IFS were used as input to OS LISFLOOD with all rainfall-runoff processes related to river discharge simulated by OS LISFLOOD. The CEMS GloFAS system has also been run at two spatial resolutions: 0.1° for versions up to GloFAS v3 and 0.05° for GloFAS v4. The hydrological configuration and spatial resolution used to produce GloFAS datasets are implicit from the associated version number (see Section 3.3 for a detailed description of the GloFAS version control).
FIGURE 2 GloFAS v4 hydrological performance as shown on the GloFAS Information System mapviewer. The colour dots represent the modified Kling-Gupta Efficiency (KGE') metric for all river discharge stations with river discharge observation data available for at least 4 years to the GloFAS team. The KGE' is calculated over the whole available observational data period for each station.

| Hydro-meteorological forcing data
The hydro-meteorological forcing input variables to run OS LISFLOOD (i.e., weather or runoff variables) are from the Copernicus Climate Change Service (C3S) ERA5 climate reanalysis (Hersbach et al., 2020), which provides seamless coverage both geographically and in time. ERA5 is the latest generation of reanalysis from the ECMWF (Hersbach et al., 2020), available since 2019 as a C3S product. It offers over 240 parameters at 31-km spatial resolution and up to hourly temporal resolution across the globe from 1950 to present, with post-processed products delivered as regular latitude-longitude grids easily accessible from the C3S Climate Data Store (CDS). One feature of ERA5 is its 'Timely' component ERA5T, which provides products in near real time with a 3-5 day latency using the same modelling framework as ERA5, revolutionizing access to near real-time climate-related information worldwide.

| Production chain configuration, available datasets and version control
GloFAS hydrological reanalysis has two configurations designed to provide the best possible datasets for different applications (Figure 4). One is a consolidated dataset, which uses quality-assured ERA5 forcing data to provide a reference hydrological dataset for long-term hydrological analysis and defining climatological thresholds. The other is a timely dataset (GloFAST), which uses ERA5T forcing data to deliver data as near real-time as possible for monitoring and early warning applications.
Both GloFAS reanalysis datasets are time series on a regular latitude/longitude grid and cover the majority of the world's land masses. The data are available at a daily temporal resolution from 1 January 1979, with the timely dataset being updated once per day and available in near real time (i.e., 2-5 days behind real time), whilst the consolidated dataset is updated monthly and available with a delay of 3 months.
FIGURE 3 Cumulative probability density function of the modified Kling-Gupta Efficiency (KGE') criterion value associated with GloFAS v2.1 (orange), GloFAS v3.1 (red) and GloFAS v4.0 (purple) calculated using the 745 river discharge stations common to all GloFAS calibrations. The KGE' is calculated over the whole available observational data for each station in the 1980-2021 verification period.
The first variable to have been made available is the 24-h averaged river discharge, with daily soil wetness index, total runoff and snow water equivalent time series expected to be added to the offering in 2024. Mean elevation and drainage network area for each GloFAS grid cell are also made available to help users identify the correct river grid cell when extracting data. Information about the auxiliary datasets and their use is provided on the GloFAS wiki pages (see https://confluence.ecmwf.int/display/CEMS/Auxiliary+Data).
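One common way to use the drainage (upstream) area grid when extracting data for a gauging station is to search the cells around the station coordinates for the one whose upstream area best matches the catchment area reported in the station metadata. The sketch below illustrates that idea on a small nested list; the function name, the `search_radius` parameter and the example values are hypothetical, and the procedure is an assumption of this example rather than the documented GloFAS station-mapping method.

```python
import math


def best_matching_cell(upstream_area_m2, row, col, reported_area_m2, search_radius=1):
    """Find the grid cell near (row, col) whose upstream area is closest to the
    reported catchment area.

    upstream_area_m2 is a 2-D list (rows of cells); None marks cells without data.
    Returns ((row, col), relative_error) for the best cell in the search window.
    """
    n_rows, n_cols = len(upstream_area_m2), len(upstream_area_m2[0])
    best_cell, best_error = None, math.inf
    for r in range(max(0, row - search_radius), min(n_rows, row + search_radius + 1)):
        for c in range(max(0, col - search_radius), min(n_cols, col + search_radius + 1)):
            area = upstream_area_m2[r][c]
            if area is None or area <= 0:
                continue  # skip cells with no valid upstream area
            error = abs(area - reported_area_m2) / reported_area_m2
            if error < best_error:
                best_cell, best_error = (r, c), error
    return best_cell, best_error


# Illustrative 3 x 3 window: the station coordinates fall in cell (1, 2), a small
# tributary cell, while the main river runs through column 1.
area_grid = [
    [1.0e8, 2.0e8, 1.5e8],
    [3.0e8, 5.2e9, 2.5e8],
    [1.2e8, 5.0e9, 4.0e8],
]
cell, rel_error = best_matching_cell(area_grid, row=1, col=2, reported_area_m2=5.15e9)
```

In this example the nearest cell would give discharge for a catchment roughly twenty times too small, while the selected neighbouring cell matches the reported area to within about 1%. A large remaining relative error is a useful flag that the station may not be represented on the model river network.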
The strict version control associated with the CEMS GloFAS service and associated GloFAS hydrological reanalysis datasets is provided to users as a 2-digit number, with the most relevant for the GloFAS hydrological reanalysis being the first number that changes with every major change in the GloFAS hydrological modelling chain (Figure 5).

| Data access and support
The CEMS GloFAS hydrological reanalysis datasets are openly available from the C3S CDS following a simple registration process (see https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-historical?tab=overview). Users can either request data using an interactive web form or through the REST-based CDS API for programmatic access. Files are provided as gridded time series in NetCDF-4 and GRIB2 formats, and the data are packed to increase storage efficiency. Metadata information (see Table 1) includes versioning, a description of the coordinate reference system (CRS) and spatial grid and details on the production methods of the data.
Data can be retrieved for defined periods of time and sub-regions using a latitude-longitude bounding box. Between November 2019 (the first publication of a GloFAS hydrological reanalysis dataset on the CDS) and December 2023, over 78 TB of data have been downloaded from the C3S CDS in over 566,000 requests by 2113 users from 114 countries (Figure 6).
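A programmatic request for a period and bounding box might look like the sketch below. It assumes the third-party `cdsapi` Python client with valid CDS credentials; the dataset identifier `cems-glofas-historical` is taken from the catalogue URL, but the request keys and values (`system_version`, `hydrological_model`, `product_type`, `variable`, `hyear`, `hmonth`, `hday`, `data_format`, `area`) are assumptions that should be checked against the request generated by the CDS web form for the dataset. The helper names are hypothetical.

```python
def build_glofas_request(year, months, days, north, west, south, east,
                         system_version="version_4_0", product_type="consolidated"):
    """Build a request dictionary for the GloFAS historical dataset.

    Key names and values are assumptions modelled on the CDS web form; copy the
    request shown by the form for the authoritative version.
    """
    if north <= south:
        raise ValueError("north must be greater than south")
    return {
        "system_version": [system_version],
        "hydrological_model": ["lisflood"],
        "product_type": [product_type],
        "variable": ["river_discharge_in_the_last_24_hours"],
        "hyear": [str(year)],
        "hmonth": [f"{m:02d}" for m in months],
        "hday": [f"{d:02d}" for d in days],
        "data_format": "grib2",
        "area": [north, west, south, east],  # bounding box: N, W, S, E in degrees
    }


def download_glofas(request, target_path):
    """Submit the request with the cdsapi client (requires CDS credentials)."""
    import cdsapi  # third-party package, imported lazily so the builder works without it

    client = cdsapi.Client()
    client.retrieve("cems-glofas-historical", request, target_path)


# Example: first two days of July 2022 over a box covering the Rhine basin.
rhine_request = build_glofas_request(2022, [7], [1, 2], north=52.0, west=5.0, south=47.0, east=12.0)
# download_glofas(rhine_request, "glofas_rhine_2022-07.grib2")  # uncomment to download
```

Keeping the request builder separate from the download call makes it possible to log or version the exact request alongside the downloaded file, which helps when migrating between GloFAS versions.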
For transparency and ease of use, the GloFAS hydrological reanalysis datasets have a unique catalogue entry in the C3S CDS, providing access to legacy versions from version 2.1 to the operational version at the time of the visit and, when relevant, pre-operational versions. Access to pre-operational datasets allows users of GloFAS reanalysis data to adapt their downstream processing chains to a new dataset before it becomes operational, which is especially important if there is a change in the modelling chain (e.g., from version 2 to 3) or spatial resolution (e.g., from version 3 to 4). Access to legacy datasets (updated daily in a quasi-operational mode for a few months after the upgrade) provides more time for users to migrate from one version to another after the operational switch. However, it is recommended to always use the latest operational version in any near real-time application. Legacy and pre-operational datasets are not maintained operationally, with legacy suites always expected to be phased out a few months after a new major cycle, and pre-operational datasets remaining under development until the official launch; hence they are not fully quality-controlled and are potentially subject to change. Additionally, when used in combination with other GloFAS datasets such as forecasts and reforecasts, the same GloFAS version must be used for all datasets so that the modelling chains are consistent.
Alternative data access streams (e.g., direct access from the ECMWF MARS data catalogue and a tailored FTP service) are possible and recommended when data are to be used for operational applications, as the C3S CDS does not provide a 24/7 operational service. Such a service can be set up upon request through the GloFAS web interface 'contact form'.
Documentation is provided directly on the C3S CDS and the GloFAS website, with a dedicated live documentation repository for the CEMS-Flood forecasting services (https://confluence.ecmwf.int/display/CEMS/Global+Flood+Awareness+System) providing detailed information on CEMS GloFAS models, procedures and operational system, including its versioning system, and a user guide corner (https://confluence.ecmwf.int/display/CEMS/CEMS-Flood+User+Guide+Corner) with detailed explanations regarding data type and access and a FAQ section. A CEMS GloFAS Data Support service is also available, where users can ask for clarifications on the dataset production and content, or highlight any issues they find in its access and use. It is accessible through the 'contact us' form on the GloFAS website.

TABLE 1

| Variable | Units | Description |
| --- | --- | --- |
| River discharge | m³ s⁻¹ | Volume rate of water flow, including sediments, chemical and biological material, in the river channel averaged over a time step through a cross-section. The value is an average over a 24-h period. |
| Elevation | m | The mean height elevation above sea level for each pixel in the GloFAS domain. Accessible via the link in the Documentation tab. |
| Upstream area | m² | The total upstream area for each river pixel. This is defined as the catchment area for each river segment, meaning the total area that contributes with water to the river at the specific grid point. The upstream area always includes the area of the pixel. Accessible via the link in the Documentation tab. |

| Water resources monitoring
Water is vital for the human and environmental health of the planet, but it is unequally distributed across the world, and can be associated with periods of surplus (or floods) or deficit (or droughts) linked with weather and climate variability, adding stress to the impacted populations and ecosystems. Understanding how much, when and where water is available is key to respond to WMO's ambition of 'thorough knowledge of the water resources of our world' (https://wmo.int/content/wmoseight-ambitions-addressing-water), so that informed planning can be developed to mitigate potential water-related disasters. By providing information on river discharge time series in most land masses of the world, global hydrological reanalyses are powerful tools to study global water resources independently of the ground observational network availability. This could be to assess the water resource potential (Figure 7a) or monitor the most recent hydrological status worldwide against long-term means (Figure 7b for a zoom over Europe). The availability of global hydrological reanalyses covering several decades also provides context to quantify flood and drought magnitudes and understand water resources variability. Two retrospective assessments have relied on hydrological simulations to identify areas of deficit or surplus of water. At global scale, the WMO Global State of the Water Report 2022 (WMO, 2023) used an ensemble of global reanalyses (including GloFAS hydrological reanalysis v4) to establish that large parts of North and South America, Africa and Asia experienced below-normal river conditions in 2022, in contrast to southern Africa and Canada, which had higher-than-normal conditions, a pattern similar to that seen in 2021. At European scale, the Copernicus Seasonal review ranked the summer 2022 as the second driest in Europe since 1990, with 2003 being the driest (https://climate.copernicus.eu/seasonal-review-europes-record-breakingsummer), using simulated hydrological river discharge from CEMS (EFAS v4). The dataset made it possible to highlight that between June and August 2022, river discharge was below normal in 65% and exceptionally low in 30% of rivers of the European river network (Figure 7b). The dataset also showed that river discharge reached record low levels across July and August in the Rhine river basin, just 1 year after record floods on the Rhine in July 2021, causing devastating impacts on public water supply, agriculture, power generation, industry and ecosystems. Whilst for Europe, studies such as that presented in Figure 7b do not necessarily rely on modelled weather reanalysis input thanks to a very dense meteorological observational network (Figure 7b uses the EMO5 dataset (Thiemig et al., 2022) as forcing data), similar rapid assessments are possible globally at any time using datasets such as C3S ERA5T and CEMS GloFAST.
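A status assessment of this kind can be sketched by ranking a recent discharge value against the long-term reanalysis values for the same location and time of year. The example below is a minimal illustration only: the percentile thresholds (10, 25, 75 and 90) and the class labels are assumptions chosen for the example and are not the definitions used in the Copernicus or WMO assessments cited above.

```python
def empirical_percentile(value, climatology):
    """Percentage of climatological values that lie strictly below `value`."""
    if not climatology:
        raise ValueError("climatology must contain at least one value")
    below = sum(1 for c in climatology if c < value)
    return 100.0 * below / len(climatology)


def classify_status(value, climatology, thresholds=(10.0, 25.0, 75.0, 90.0)):
    """Classify a discharge value against a climatological sample.

    The thresholds are illustrative percentile bounds (hypothetical defaults).
    """
    very_low, low, high, very_high = thresholds
    percentile = empirical_percentile(value, climatology)
    if percentile < very_low:
        return "exceptionally low"
    if percentile < low:
        return "below normal"
    if percentile <= high:
        return "normal"
    if percentile <= very_high:
        return "above normal"
    return "exceptionally high"


# Illustrative climatology: 40 years of mean summer discharge (m3/s) at one cell.
summer_climatology = [float(q) for q in range(1, 41)]
status_dry_year = classify_status(3.0, summer_climatology)
```

Applying the same classification to every river cell and counting the share of cells in each class gives network-wide summaries of the kind quoted above (for example, the percentage of rivers below normal).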

| Global ocean monitoring
Given the tremendous social, economic and biological value of coastal zones, Copernicus recognized the need for enhanced core monitoring of the coastal ocean (Le Traon et al., 2019; Mercator Ocean International and European Environment Agency, 2018), enabling response to European policies (MSFD, WFD, MSP, Flood Directive, ICZM, Bathing Water, Common Fisheries Policy) as well as enhancing climate change resilience (Mercator Ocean International, 2021; Melet et al., 2020). As river freshwater input into global ocean modelling directly influences coastal and off-shore ocean salinity and near-surface mixing, there is a need for better characterization of the land boundary and of the land-to-sea forcing.
Existing integrated systems such as the global ocean reanalysis of the Copernicus Marine Service typically rely on climatological (i.e., long-term average) river discharge input, hence not accounting for any interannual variability in freshwater input (Lellouche et al., 2021). However, Zuo et al. (2019) showed that by using the bias-corrected time-varying GloFAS hydrological reanalysis (v2.1) as river discharge input to the NEMO ocean model instead of a monthly climatology (here denoted BT06), sea surface salinity biases were reduced in regions affected by freshwater input such as the Amazon or the North Atlantic. Such improvement was also seen in regions known for their local ERA5 precipitation biases such as the west coast of North America (Figure 8a), suggesting potential for better simulation and monitoring of coastal zones. Using a time-varying land freshwater input dataset instead of a monthly climatology also improved the representation of large-scale ocean circulation features such as the Atlantic Meridional Overturning Circulation (AMOC) and Antarctic Circumpolar Current (ACC). Figure 8b shows that the AMOC transport is systematically lower (by ~2 Sv) and more consistent with the RAPID-MOCHA observations (Smeed et al., 2017) from 2008 onward compared with simulations using climatological freshwater input (Zuo et al., 2020). Note that the GloFAS system does not include ice-melt discharges in the Antarctic or Greenland, and the same BT06 climatology was used in all NEMO simulations for these regions. Therefore, improvement in the ACC transport is likely due to improved oceanic transports from other parts of the ocean in the NEMO-GloFAS simulation, which leads to accumulated water property change in the Southern Ocean and a better representation of the density gradient across the Subantarctic Front. An alternative approach could be to use satellite-based observations to derive ice-sheet mass changes and corresponding ice-melt discharges in the Antarctic and Greenland to complement the GloFAS reanalysis. Also note that river discharge into the Arctic Ocean in GloFAS v2.1 has high uncertainty due to poor monitoring of the pan-Arctic drainage area. This has been improved in the new GloFAS v3.1 product (see Winkelbauer et al., 2022).

| Understanding large-scale variability and fluctuation
Large-scale climatic teleconnections, such as ENSO or MJO, have been associated with wet/dry anomalies (Rashid & Wahl, 2022) and local hydrological extremes (Towner et al., 2021), but their effect at large scale is difficult to extrapolate from analyses of gauged records, despite the potentially devastating impact that synchronous floods and/or droughts could have on global food, energy and water security. By using hydrological reanalyses as proxy observational records, the spatial analytical domain can be extended from gauged locations to continental (Africa, Ficchì & Stephens, 2019) or global (Ward, Eisner, et al., 2014; Ward, Jongman, et al., 2014) scales, hence overcoming the observational network gaps in regions vulnerable to hydro-climatic hazards but with insufficient ground-based measurements, such as East Africa. These studies have highlighted areas where climate variability can affect the frequency, timing and magnitude of flood hazards, knowledge that could be useful to inform flood management and agricultural planning, for example. Back extension of the analytical period to before traditional in-situ records began (typically in the 1960s) has also become possible thanks to the availability of centennial atmospheric reanalysis datasets such as NOAA's 20CR (from 1871, Compo et al., 2011) or ECMWF's ERA-20CM (from 1899, Hersbach et al., 2015). Additionally, reanalysis ensembles of multiple plausible realizations of weather patterns also exist, either from a multi-model system (Hofer et al., 2012) or from the same model (CERA-20CM, Hersbach et al., 2015; Laloyaux et al., 2018), so that uncertainties can be better accounted for. A centennial ensemble hydrological reanalysis with a hybrid Numerical Weather Prediction (NWP)-land surface-river routing modelling configuration similar to that of GloFAS v1 and v2 was generated by Emerton et al. (2017) based on ERA-20CM. The dataset effectively offered a much larger sample of data than possible from traditional observational records (10 streamflow simulations for each of the 30 El Niño and 33 La Niña events) so that areas with a robust signal between teleconnection and hydro-hazards could be identified. Importantly, the authors were also able to show that the hydrological signal differed from the rainfall signal contained in the forcing atmospheric reanalysis (Figure 9), showing the importance of studying global hydrological reanalyses for understanding hydrological extremes. Nevertheless, caution is needed when interpreting the results because of the potential impact of spurious trends caused by changes in data sources and measurement technologies over the years.
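To illustrate the kind of composite analysis that such an ensemble makes possible, the following Python sketch estimates, for a single location and calendar month, the probability that river flow exceeds a climatological quantile during event years. It is a minimal sketch under stated assumptions: the function name `prob_high_flow`, the 75th percentile threshold, the event list and the synthetic data are all hypothetical and are not taken from Emerton et al. (2017).

```python
import numpy as np

def prob_high_flow(flow, event_years, years, q=0.75):
    """Fraction of (member, event-year) samples above a climatological quantile.

    flow: array (n_members, n_years) of monthly mean discharge for one
    calendar month at one location; years: array (n_years,);
    event_years: iterable of years flagged as events (e.g. El Nino years).
    """
    flow = np.asarray(flow, dtype=float)
    years = np.asarray(years)
    # Climatological threshold pooled over all members and all years
    threshold = np.quantile(flow, q)
    # Columns of the array that belong to event years
    mask = np.isin(years, list(event_years))
    if not mask.any():
        raise ValueError("no event years found in the record")
    return float((flow[:, mask] > threshold).mean())

# Synthetic illustration: 10 members, 110 years, hypothetical event years
rng = np.random.default_rng(0)
years = np.arange(1901, 2011)
event_years = years[::4]  # assumed event list, for illustration only
flow = rng.gamma(shape=2.0, scale=50.0, size=(10, years.size))
flow[:, np.isin(years, event_years)] *= 1.5  # impose a wet signal in event years
p_event = prob_high_flow(flow, event_years, years)
```

With 10 members per event, each event contributes 10 samples to the estimate, which is what allows a more robust signal to be detected than from a single observed record. Repeating the same calculation on the forcing precipitation would allow the two signals to be compared.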

| Early warning systems
Early warning systems (EWS) are climate service tools designed to support disaster risk reduction and climate adaptation. In autumn 2022, the WMO launched the 'Early Warning for all EW4A' initiative (https://public.wmo.int/en/earlywarningsforall) in response to the UN Secretary General, aiming to address existing gaps: nearly 30% of WMO members have no multi-hazard early warning system and more than one third of the world population is not covered by early warnings (WMO, 2020).
Arguably, hydrological forecasting systems are some of the best developed large-scale EWS (see the review by Emerton et al., 2016), with most hydrological EWS relying on some form of hydrological reanalysis for three important aspects. The first is to provide initial conditions (or hydrological state variables) as close as possible to reality; this is because the quality of estimates at a given time influences the accuracy of future hydrological states owing to the strong autocorrelation in river discharge. This is the case for both real-time and past forecasts (used for forecast skill verification), regardless of the forecast horizon (see, e.g., Harrigan et al., 2023 or Emerton et al., 2018). For EWS driven by physically based models, hydrological simulations based on the same hydrological model configurations are generally preferred over in-situ hydrological observations to define the initial conditions (especially as many of the state variables of hydrological models do not have observations) to ensure consistency in the modelling chain, regardless of the overall forecasting techniques (e.g., based on statistics, climatology or NWP information; Troin et al., 2021). For large-scale distributed EWS requiring spatially consistent and continuous information, the hydrological reanalysis can be forced with atmospheric reanalysis (e.g., ERA5T, Hersbach et al., 2020, used in GloFAS), hybrid datasets (e.g., HydroGFD; Berg et al., 2021) or observation-based datasets (e.g., EMO5, updated in real time, used in the CEMS European Flood Awareness System EFAS (Thiemig et al., 2022), or UK Met Office precipitation and potential evaporation used in the UK Hydrological Outlook (Prudhomme et al., 2017)).
The second use of hydrological reanalysis is to help derive EWS hydrological forecast products that highlight when and where hydrological events (floods or droughts) are expected, in support of decision making; this is especially critical when EWS are based on probabilistic or ensemble forecasting (Ramos et al., 2007). The products can then be made available as graphics (maps/graphs) and tables, through a tailored information system, or used to send warning notifications to subscribers (Emerton et al., 2016). For transparency, interpretation, reproducibility and spatial consistency, the same well-defined criteria are used to generate the products automatically, generally comparing the EWS hydrological simulations with a reference climatology. Examples include flood thresholds for GloFAS (www.globalfloods.eu; Alfieri et al., 2013) and the GEOGLOWS Global Streamflow system (https://geoglows.ecmwf.int/; Hales et al., 2022), or drought thresholds for the CEMS European Drought Observatory EDO (https://edo.jrc.ec.europa.eu/edov2/php/index.php?id=1000; Cammalleri et al., 2021) and the African Flood and Drought Monitor (http://hydrology.soton.ac.uk/apps/afdm/; Sheffield et al., 2014), all of which are based on historical reference simulations. Using reference simulations produced with a modelling framework as consistent as possible with that of the real-time EWS limits the potential influence of systematic biases on event identification, for example, in uncalibrated regions (Emerton et al., 2020), and allows the use of consistent information across the whole domain, including in ungauged catchments (Reed et al., 2007). However, the reference simulations used to define climatological thresholds can influence the skill of the forecasts (Hirpa et al., 2016). Historical reanalyses, which do not explicitly include biases associated with the weather forecast component, are better suited to monitoring and short-term forecasting, whilst forecast range-dependent reference climatologies and thresholds based on reforecasts have been advocated by Alfieri et al. (2019) and Zsótér, Prudhomme, et al. (2020) for forecasts beyond 10 days and are used in the operational GloFAS-Seasonal forecast system (Emerton et al., 2018). Note also that automatic, generalized thresholds applied to the whole EWS geographical domain may not be suitable for all applications, and local user-defined thresholds can be preferred to trigger specific actions, for example, in the humanitarian sector (Coughlan-Perez et al., 2015).
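The derivation of climatological thresholds from a reference simulation can be sketched as follows. This Python example fits a Gumbel distribution to annual maximum discharge by the method of moments and converts return periods into discharge thresholds. The choice of a Gumbel distribution, the moment-based fit, the return periods (2, 5 and 20 years) and the function names are assumptions made for illustration; the operational procedure of any particular EWS may differ.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def gumbel_thresholds(annual_maxima, return_periods=(2, 5, 20)):
    """Return-period discharge thresholds from a Gumbel fit (method of moments)."""
    x = np.asarray(annual_maxima, dtype=float)
    if x.size < 2:
        raise ValueError("need at least two annual maxima")
    # Gumbel moments: variance = (pi^2 / 6) * scale^2, mean = loc + gamma * scale
    scale = np.sqrt(6.0) * x.std(ddof=1) / np.pi
    loc = x.mean() - EULER_GAMMA * scale
    thresholds = {}
    for T in return_periods:
        p = 1.0 - 1.0 / T  # annual non-exceedance probability
        thresholds[T] = loc - scale * np.log(-np.log(p))
    return thresholds

def exceeded(forecast_discharge, thresholds):
    """Highest return period whose threshold the forecast exceeds, or None."""
    hit = [T for T, q in thresholds.items() if forecast_discharge > q]
    return max(hit) if hit else None

# Synthetic annual maxima (m3/s) standing in for a reanalysis time series
rng = np.random.default_rng(42)
annual_maxima = rng.gumbel(loc=1000.0, scale=200.0, size=40)
thresholds = gumbel_thresholds(annual_maxima)
level = exceeded(1500.0, thresholds)
```

Because the thresholds and the forecasts come from the same modelling chain in this setup, a systematic bias in simulated discharge affects both sides of the comparison, which is the consistency argument made above.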
The third hydrological reanalysis application in EWS is to provide reference data for verifying the forecasts anywhere, even in ungauged catchments. Forecast skill can be derived by directly comparing forecasted and historically simulated reanalysis time series, for example, using traditional hydrological 'goodness-of-fit' statistics (e.g., Bischiniotis et al., 2019 in Peru) or using a skill score comparing the forecast system of interest to a simple benchmark forecast to assess the added value of the EWS information (e.g., Alfieri et al., 2014, Arnal et al., 2018, Greuell et al., 2018, Wetterhall & Di Giuseppe, 2018 and Wanders et al., 2019 over Europe, Harrigan et al., 2023 over the world; Figure 10). Finally, hydrological reanalysis can also be used as perfect forecasts to identify sources of, and response to, forecast errors (Arnal et al., 2017). Forecast skill is important metadata that helps users make informed decisions when interpreting the EWS real-time forecasts. In the CEMS GloFAS system, the skill is summarized through a forecast performance product directly available on the GloFAS mapviewer (www.globalfloods.eu) that can be overlaid with the forecast layer.
FIGURE 9 Regions where the difference in probability of abnormally high precipitation compared to probability of high river flow, in the month of February during an El Niño, is greater than 10% (based on the ensemble mean). Pink shading indicates that the probability of high precipitation is smaller than the probability of high river flow, whilst green shading indicates that probabilities are larger for precipitation. From Emerton et al., 2017.
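A skill score of the type described above can be sketched in a few lines of Python. The example computes the continuous ranked probability score (CRPS) of an ensemble forecast using the standard ensemble estimator, then the corresponding skill score (CRPSS) against a benchmark ensemble, with reanalysis values used as proxy observations. The function names and array layouts are assumptions chosen for illustration, not the implementation used in any operational system.

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against a scalar verifying value."""
    x = np.asarray(members, dtype=float)
    # Mean absolute error of the members against the verifying value
    term1 = np.abs(x - obs).mean()
    # Half the mean absolute difference between all pairs of members
    term2 = 0.5 * np.abs(x[:, None] - x[None, :]).mean()
    return term1 - term2

def crpss(forecasts, benchmark, reference):
    """CRPSS: 1 is perfect, 0 matches the benchmark, negative is worse.

    forecasts: (n_times, n_members); benchmark: (n_times, n_bench_members);
    reference: (n_times,) reanalysis values used as proxy observations.
    """
    crps_f = np.mean([crps_ensemble(f, y) for f, y in zip(forecasts, reference)])
    crps_b = np.mean([crps_ensemble(b, y) for b, y in zip(benchmark, reference)])
    if crps_b == 0:
        raise ValueError("benchmark CRPS is zero; skill score undefined")
    return 1.0 - crps_f / crps_b
```

In practice the benchmark could be, for example, a climatological ensemble drawn from the reanalysis itself for the same time of year, so that the score measures the value added by the forecast over simply knowing the climatology.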

| Model calibration and training
Calibration of model parameters is an important part of setting up a hydrological modelling system. Due to large gaps in the in-situ river discharge observational network, inconsistency in the quality control procedures and differences in the methods of measurement of observations, model calibration can be very difficult in many basins around the world. Hydrological reanalysis can provide consistent and multi-decadal data that act as 'proxy observations' with which to calibrate other hydrological models. For example, Senent-Aparicio et al. (2021) used GloFAS hydrological reanalysis to calibrate the Soil and Water Assessment Tool (SWAT) model for the Grande San Miguel River Basin, where in-situ observations are sparse. For the period 2005-2010, this calibration methodology led to an increase in skill of the simulation of monthly river discharge compared to observations. Abate et al. (2023) also found positive results by using GloFAS hydrological reanalysis and actual evapotranspiration (AET) from the Moderate Resolution Imaging Spectroradiometer (MODIS) to calibrate the SWAT model for the ungauged Kobo-Golina catchment in Ethiopia. Alternatively, hydrological reanalyses can be used to extend existing observational records (Mbuvha et al., 2022), creating a longer record on which to calibrate hydrological models. Care must be taken to ensure that good performance of the hydrological reanalysis is expected for the basin of interest, and a pre-processing step may be required before application.
In addition to process-based models, data-driven methods have gained traction in the forecasting community due to their efficient computing use compared with traditional physically based NWP and hydrological models. Mosavi et al. (2018) give an overview of machine learning models used in flood prediction. One hydrological forecasting example is Google's end-to-end flood warning system, which uses a multiple linear regression model and a long short-term memory (LSTM) neural network to forecast river stage (Nevo et al., 2022). Forecasting of river stage was preferred to that of river discharge due to the large training data needs of machine learning (ML) models and the greater accessibility of river stage observations compared to river discharge observations. However, hydrological reanalysis can provide training datasets for data-driven hydrological forecast models, similarly to recent ML-based weather forecast models trained on ERA5 atmospheric reanalysis (e.g., Pangu-Weather, Bi et al., 2022 and GraphCast, Lam et al., 2023). For example, Rahman et al. (2022) successfully used ERA5 and GloFAS hydrological reanalysis to train and compare multiple ML algorithms for the Danube catchment.
FIGURE 10 GloFAS v3 continuous ranked probability skill score (CRPSS) for reforecasts (spanning the full calendar year period) against a climatology benchmark for extended lead time of 15 days with respect to GloFAS river discharge reanalysis at 5997 river points. Optimum value of CRPSS is 1. Blue (red) dots show catchments with positive (negative) skill.
Downstream applications may also benefit from ML-based methods trained on hydrological reanalysis data. Goharian et al. (2022) used GloFAS hydrological reanalysis along with remote sensing data to train a reservoir management decision model to optimize flood control and hydropower generation. Hybrid data-driven and process-based forecasts are also growing in popularity as they allow the efficiency and convenience of machine learning methods to be combined with physically informed process-based models (Slater et al., 2023). Hydrological reanalysis can then act as forcing and/or training data. For example, Hunt et al. (2022) trained an LSTM neural network on ERA5 and GloFAS hydrological reanalysis from 1990 to 2019 and then used operational forecasts from ECMWF IFS and CEMS GloFAS to predict streamflow up to 10 days ahead for several catchments in the continental United States.

| Earth system modelling diagnostics
Earth system science is recognized as critical for the understanding of bio-geosphere interactions, including the climate (Steffen et al., 2020), with the Earth system model (ESM) concept gaining traction for NWP and climate modelling (Bauer et al., 2021; Swart et al., 2019; Valcke et al., 2006; Ziehn et al., 2020). As for any global modelling system, new developments and changes in process representation are difficult to assess due to the lack of observational data across the world. To evaluate the benefit of developments and assess the quality of forecasts, hydrological reanalyses have been used as 'proxy' observations or 'hard-to-beat' benchmarks (Pappenberger et al., 2015). Whilst diagnostics generally rely on in-situ observations, benchmarking against an existing hydrological reanalysis product provides confidence in the applicability of new developments to simulate river discharge processes, a component of the water cycle often neglected in ESM diagnostics. Here we summarize two examples of application based on hydrological reanalysis datasets: the snow scheme component of an ESM and high-resolution land processes.
ESM enhancements are often modular, with each component investigated individually before integration into an operational system. ecLand (Boussetta et al., 2021) is the modelling framework of the land component of ECMWF IFS, enabling the easy running of experiments prior to operational implementation of new developments of the ESM within the NWP modelling chain. One critical component in NWP is snow, an important land surface process influencing both the energy and water balances and in particular the diurnal surface temperature cycle. Whilst complex representation of snow processes through multiple snow layers was shown to improve both snow characteristics (depth and mass) and 2 m temperature compared to single-layer modelling in ecLand (Arduini et al., 2019), the evaluation was limited to nine super sites with in-situ snow and soil temperature observation datasets. Zsótér et al. (2022) extended the evaluation to river discharge at 453 locations with daily observed time series of at least 8 years of good quality data in snow-impacted regions (defined as having a snow to rainfall ratio of at least 10%). By also comparing river discharge and hydrological-related variables such as snowmelt, surface and subsurface runoff from a hydrological reanalysis configuration using 11 different snow module parameterizations within the ecLand land surface scheme and the CaMaFlood river routing (Yamazaki et al., 2011), they were able to diagnose errors introduced in the multi-layer snow scheme (Figure 11) and identify a parameterization avoiding a degradation in the land hydrological processes. The resulting scheme was introduced operationally in ECMWF IFS 48R1.
ERA5-Land is a global reanalysis dataset of the land component of ERA5. It aims to enhance the ERA5 reanalysis through a representation of the land processes at a high spatial resolution (9 km for ERA5-Land against 32 km for ERA5) to better account for the impact of orography and thermodynamics of near-surface states and to include upgrades in the parameterization of the soil thermal conductivity, soil water balance, snowpack processes and potential evapotranspiration compared with ERA5 (Muñoz-Sabater et al., 2021). It is produced as part of the C3S and is available through the C3S CDS. Similarly to ERA5, ERA5-Land does not explicitly simulate river discharge, but the dataset includes surface and subsurface runoff that can be routed offline. As part of the ERA5-Land evaluation strategy, river discharge was simulated using the GloFAS v2.1 modelling chain forced with ERA5-Land runoff data and compared with observed time series for a network of 1285 locations. Results were benchmarked against the skill of river discharge simulated from GloFAS hydrological reanalysis version 2.1 forced by ERA5. They showed an overall improvement in river discharge skill when using ERA5-Land in most parts of the world, attributed to a decrease in biases and an increase in correlation, although regions such as the Great Plains of Canada and of the USA and southern America showed degradation that could be caused by an excess in accumulated snow and an overestimation of evaporation (Figure 12).
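The benchmarking logic described above, where a new simulation is scored relative to an existing one, can be sketched with a Kling-Gupta efficiency (KGE) skill score. The Python example below assumes a commonly used modified KGE formulation (correlation, bias ratio and ratio of coefficients of variation) and the usual skill score form (KGE_new - KGE_benchmark) / (1 - KGE_benchmark); whether these match the exact definitions behind Figure 12 is an assumption, and the function names are illustrative.

```python
import numpy as np

def modified_kge(sim, obs):
    """Modified KGE: 1 is a perfect match between simulation and observations."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]   # linear correlation
    beta = sim.mean() / obs.mean()    # bias ratio
    # Variability ratio expressed with coefficients of variation
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)

def kgess(kge_new, kge_benchmark):
    """Skill of a new simulation relative to a benchmark simulation."""
    if kge_benchmark == 1.0:
        raise ValueError("benchmark is already perfect; skill score undefined")
    return (kge_new - kge_benchmark) / (1.0 - kge_benchmark)
```

For one station, `kgess(modified_kge(sim_new, obs), modified_kge(sim_benchmark, obs))` is positive when the new forcing improves on the benchmark and negative when it degrades it, which corresponds to the blue and red dots in a map such as Figure 12.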

| DISCUSSION AND CONCLUSION
Access to global river discharge time series at high temporal (daily) and spatial (kilometre-scale) resolutions that are available for multiple decades and up to near-real time offers the prospect of numerous functions and applications. Examples include quantifying spatiotemporal patterns and variability to help with long-term planning anywhere, monitoring the latest water resources and hydro-hazards (e.g., floods and droughts) for increased preparedness and rapid impact assessments, and hydrological model calibration or training in data-sparse regions.
In the absence of a global river discharge observational network (in-situ or from EO) providing open information everywhere at all times, model-based simulations are a cost-effective alternative. However, users should be aware of the limitations associated with any modelled dataset before deciding if it is appropriate for their needs.
For global hydrological reanalyses such as CEMS GloFAS, the quality of simulations is impacted by that of the data used in the system, whether they are forcing data like ERA5/ERA5T precipitation and other meteorological variables, in-situ river discharge measurement data for hydrological model calibration, or terrestrial water cycle verification data (e.g., soil moisture, snow water equivalent, evaporation, groundwater recharge) to assess process representation. Simplified or poor representation of hydrological processes, such as river routing, groundwater storage or evaporation processes, and lack of or inadequate accounting for human influence in the modelling chain, such as reservoir management or water abstraction, also affect the quality of the simulations. Often models rely on simplified schemes or long-term average estimates that do not account for annual variability or do not include sufficiently accurate information on geomorphology to describe complex processes appropriately. Despite their shortcomings, global hydrological reanalyses can fill important spatial and temporal gaps in our knowledge and provide qualitative information on the hydrological status compared with climatic normals. Global hydrological reanalyses are especially useful if complemented with metadata and additional information, for example, on modelling performance or included processes (e.g., local drainage network, evaluation statistics, reservoir location maps or expected downstream streamflow influence), and tools to extract the information accurately. Global hydrological reanalysis applications also provide unique opportunities for model verification and enhanced process understanding, especially in areas with limited in-situ observations.
GloFAS river discharge reanalysis has already proven a popular dataset, with over 566,000 requests to download more than 78 TB between November 2019 and December 2023. Thanks to the physically based hydrological modelling behind GloFAS simulations, the hydrological reanalysis data catalogue offering can easily be extended to include other hydrological variables such as soil wetness index or snow water equivalent, or even post-processed variables such as flood inundation extent estimates. Such developments could expand the number of downstream applications to additional sectors, such as hydropower, agriculture and hydro-disaster (floods and droughts) insurance, or be used to complement, verify and extend existing services such as the Global Drought Observatory (e.g., with combined low flow and soil moisture indices) and the Global Flood Monitoring (currently based on SAR imagery from the Sentinel constellation), two services of the CEMS. With a global coverage encompassing all watersheds except Antarctica, a high spatial resolution, a continuous time series with a temporal resolution of at least 1 day, a period covering the satellite era and a commitment to closely approach real time, the product proposed by GloFAS meets most of the criteria imposed by a real-time forecasting and/or reanalysis system such as the ones operated in the Global Monitoring and Forecasting Centre (GLO MFC) of the Copernicus Marine Service, and provides key opportunities for coastal applications using offshore information.
FIGURE 11 Daily climatological mean time series of (a) river discharge, (b) snowmelt, (c) surface runoff, (d) subsurface runoff from 11 snow module configuration experiments for the Olenek river at the station of Sukhana in eastern Siberia (with area of 127,000 km²) simulated with the ecLand-CaMaFlood configuration. All water-related variables are displayed as catchment totals in order to compare them directly to river discharge (please note the values are divided by 1000 for river discharge, snowmelt and subsurface runoff). Each coloured line is associated with a snow-scheme parameterization, with river discharge observations shown as a dashed black line. For simplicity, the experiment characteristics are not explained here but can be found in Zsótér et al. (2022).
FIGURE 12 Modified KGESS for GloFAS reanalysis v2.1 modelling chain forced by ERA5-Land runoff against the GloFAS v2.1 benchmark (forced by ERA5 runoff) across 1285 observation stations. Optimum value of KGESS is 1. Blue (red) dots show catchments with positive (negative) skill gained using ERA5-Land forcing. From Muñoz-Sabater et al., 2021.
Synergies between Copernicus Services are instrumental to the production and dissemination of hydrological reanalysis datasets. In the case of GloFAS, resources are shared between C3S, which provides input data (ERA5), efficient data infrastructure and the sharing facilities of the CDS (such as download forms, APIs and toolbox functionalities), and CEMS, which provides the hydrological modelling capabilities and an operational processing framework to provide near real-time updates of proxy river discharge observations anywhere in the world.
KEYWORDS climate services, Copernicus, global hydrological reanalysis, hydrological extremes, large-scale hydrological modelling, observational gaps

FIGURE 6 Number of C3S CDS users of the GloFAS reanalysis per country from November 2019 to 19 December 2023.
(a) Mean GloFAS v4 hydrological reanalysis daily river discharge over 1980-2019. Darker blue river sections have larger river discharge. For more detail on GloFAS v4, please refer to Grimaldi et al. (2024). (b) Monthly average river discharge anomalies for June, July and August (JJA) 2022 generated using the EFAS v4 system forced by the EMO5 dataset (Thiemig et al., 2022). The categories 'exceptionally high (low)', 'notably high (low)', 'above (below) normal' and 'normal' range relate to the percentile ranges >90 (<10), 75-90 (10-25), 60-75 (25-40) and 40-60 for the 1991-2020 reference period. Shades of blue indicate higher, and shades of brown lower, discharge than normal. Grey indicates normal range discharge. Adapted from https://climate.copernicus.eu/seasonal-revieweuropes-record-breaking-summer. For both (a) and (b) only rivers with drainage areas greater than 1000 km² are shown.
FIGURE 8 (a) Differences of RMS error in salinity (in PSU) from two NEMO simulations with land freshwater input from either GloFAS hydrological reanalysis version 3 (with bias corrections) or monthly mean climatology discharges (BT06, Bourdalle-Badie & Treguier, 2006). RMS errors are averaged over the upper 75 m of the ocean column and against all available ocean in-situ observations (Good et al., 2013) over the 2010-2017 period. A negative value (blue) means that the NEMO+GloFAS simulated ocean state is closer to observations than the one produced by NEMO+BT06, whilst a positive value (red) means the inverse. (b) Time series of the maximum Atlantic Meridional Overturning Circulation transports at 26.5° N (Sverdrup, 1e6 m³ s⁻¹) in ocean reanalyses (12-month running mean): solid black = observations from the RAPID mooring array; dashed green = ocean reanalysis with time-varying land freshwater input from GloFAS-ERA5; dashed blue = ocean reanalysis with seasonal climatology of land freshwater input (BT06).