The utility of impact data in flood forecast verification for anticipatory actions: case studies from Uganda and Kenya

Skilful flood forecasts have the potential to inform preparedness actions across scales, from smallholder farmers through to humanitarian actors, but require verification first to ensure such early warning information is robust. However, verification efforts in data-scarce regions are limited to only a few sparse locations at pre-existing river gauges. Hence, alternative data sources are urgently needed to enhance flood forecast verification to better guide preparedness actions. In this study, we assess the usefulness of less conventional data such as flood impact data for verifying flood forecasts compared with river-gauge observations in Uganda and Kenya. The flood impact data contains semi-quantitative and qualitative information on the location and number of reported flood events derived from five different data repositories (Dartmouth Flood Observatory, DesInventar, Emergency Events Database, GHB, and local) over the 2007 – 2018 period. In addition, river-gauge observations from stations located within the affected districts and counties are used as a reference for verification of flood forecasts from the Global Flood Awareness System. Our results reveal both the potential and the challenges of using impact data to improve flood forecast verification in data-scarce regions. From these, we provide a set of recommendations for using impact data to support anticipatory action planning.


| INTRODUCTION
Climate change, variability, and environmental changes disproportionately affect the agricultural sector in Africa, with important implications for anticipatory action as part of humanitarian response. In the agricultural sector, these changes could force smallholder farmers who depend on rain-fed crops or flood-recession agriculture to significantly adjust their farm activities (Ficchì & Stephens, 2019; Ochieng et al., 2016; Salack et al., 2015). In Uganda, farmers need reliable and skilful information on the rainy season onset and amount of rainfall, as well as flood occurrence, duration, magnitude, and severity, approximately 1-2 months before the season onset to inform their coping strategies (Mitheu et al., 2022). Decision-makers and humanitarian actors aiming to reduce risks and protect livelihoods are also increasingly considering forecast information to inform early action mechanisms and operational decisions (Coughlan De Perez et al., 2016; Emerton et al., 2020; Hansen et al., 2019; Lopez et al., 2020; Nidumolu et al., 2020). Given this, the skill of any forecast information provided needs to be transparent and well understood if it is to inform preparedness actions appropriately.
In the context of users' needs, forecasts should be evaluated based on their potential to trigger early actions, which can reduce expected losses if an extreme event occurs (Lopez et al., 2020). The evaluation should also consider the consequences of 'acting in vain', which are particularly important in disaster risk reduction and humanitarian actions (Coughlan De Perez et al., 2015). Indeed, several studies have shown that verified and skilful forecasts have the potential to improve preparedness actions for both the agricultural and humanitarian sectors (Coughlan De Perez et al., 2016; MacLeod et al., 2021; Nidumolu et al., 2020; Nyadzi et al., 2019; Paparrizos et al., 2020). However, this verification is carried out only for regions with long-term historical hydrometeorological observations, typically from in situ stations such as river gauges. In forecast verification, these observations are commonly known as conventional observations (Marsigli et al., 2021).
In data-scarce regions, where conventional observations are limited (Coughlan De Perez et al., 2016; Ogutu et al., 2017), less conventional verification data can be derived from, for example, social media reports, citizen-volunteered information, impact/damage reports, and insurance data. The resulting information can be used to bridge the forecast verification gap through non-traditional approaches, as it provides a more direct representation of the event (Marsigli et al., 2021). For example, information from insurance databases (Bernet et al., 2017; Cortès et al., 2018), as well as online tools such as Google Trends and Twitter feeds (de Bruijn et al., 2019; Thompson et al., 2022), has been used as reference information to evaluate the occurrence of floods. Impact data have also been used with river-gauge observations to identify the magnitude of discharge that is associated with flooding (Coughlan De Perez et al., 2016). Notably, impact data offer an advantage in the verification of forecast information, because they can be derived from openly accessible data repositories containing quantitative and qualitative information across large spatial areas that enable a better and more direct representation of the impacts of the extreme event. However, impact data can only be used for forecast verification in areas where exposure and vulnerability exist, so that impacts are reported.
It is worth noting that global data repositories such as the Emergency Events Database (EM-DAT; EM-DAT, 2020) and the United Nations Disaster Inventory System (DesInventar [DI]; UNISDR, 2018) are prone to biases due to known limitations (Gall et al., 2009). These limitations include under-reporting/over-reporting of the hazards, aggregated spatial coverage, over-representation of certain locations, and/or a focus on specific type(s) of impacts. Furthermore, differences in the criteria for the inclusion of events in the repositories may result in non-uniformity in the estimates of the impacts reported in each repository. In addition, if unverified, impact data collection methods (e.g., from governments and media) may lead to errors in the resulting information (Guha-Sapir & Below, 2002). Despite these caveats, these data repositories represent a potentially valuable source of less conventional data for monitoring and verifying hazards. For example, impact data can be integrated with other geophysical parameters to sub-categorise flash floods from the primary corresponding disaster type (Kruczkiewicz, Bucherie, et al., 2021). Therefore, if the limitations of impact data are appropriately understood, with guidance on their interpretation and relevant recommendations, impact data can be improved to effectively support anticipatory actions.
In this study, we assess the usefulness of flood impact data to verify flood forecast information across Uganda and Kenya compared with river-gauge observations. We verify the river flood forecast from the Global Flood Awareness System (GloFAS) of the Copernicus Emergency Management Service (Harrigan et al., 2023) using two reference observations: river-gauge observations and flood impact data, the latter derived from several global and national data repositories.
The study addresses two research questions:
1. How suitable are impact data for verifying flood forecasts compared to river-gauge observations?
2. Where river-gauge observations are limited or unavailable, how best can impact data be used to verify flood forecasts and ensure anticipatory actions are informed?
Through focussed case studies in two East African countries, we investigate the non-traditional approach of forecast verification using impact data relative to the traditional way of verification using river-gauge observations. Consequently, we provide recommendations on how best impact data can be used in areas with no or limited river-gauge observations to increase confidence in the use of forecast products in data-scarce regions.
In this section, we describe the case study regions and the datasets used for the analysis, that is, the GloFAS reforecast discharge data, river-gauge observations, and the impact data from several data repositories.

| Case study regions
The Netherlands-based IKEA Foundation is supporting the Uganda and Kenya Red Cross Societies (URCS and KRCS, respectively) to develop early warning mechanisms to prepare for floods through the Innovative Approaches for Response Preparedness (IARP) project. In Uganda, several high-risk areas were identified using vulnerability and risk layers developed by the National Emergency Operations and Coordination Centre (NECOC), including a total of 15 districts, for the early action protocol (EAP) development. These regions are prone to flooding and waterlogging across the two rainy seasons between May and November (April-May, Long Rains; September-November, Short Rains). In Kenya, flood-prone river basins including Tana, Nzoia, and Athi are considered for the implementation of flood early actions. Examples of early actions include community awareness, distribution of cash and shelter kits, and dissemination of early warning information, among others (see KRCS, 2021; URCS, 2021).
The case study regions in Uganda and Kenya were selected based on locations with available river-gauge observations. In Uganda, the districts of Katakwi and Amuria on the Akokorio River (hereafter 'Katakwi'), Tororo (Butaleja) and Mbale (Bududa and Manafwa) on the Manafwa River (hereafter 'Manafwa'), and Kiboga, Mubende, and Hoima on the Mayanja River (hereafter 'Mayanja') are considered. In Kenya, the counties of Tana River and Garissa on the Tana River (hereafter 'Tana'), Busia and Siaya on the Nzoia River (hereafter 'Nzoia'), and Taita Taveta and Kilifi on the Athi River (hereafter 'Athi') are considered. Figure 1 shows the locations of the river-gauge stations and the affected counties/districts in Kenya and Uganda, respectively.

| GloFAS flood forecasts
GloFAS is an operational global ensemble flood forecasting system developed jointly by the European Commission's Joint Research Centre (JRC), the European Centre for Medium-Range Weather Forecasts (ECMWF), and University of Reading researchers (Alfieri et al., 2013). The system provides probabilistic extended-range discharge forecasts for up to 45 days and seasonal outlooks up to 4 months lead time (Emerton et al., 2018) over the entire globe at a resolution of 0.1°. GloFAS is driven by an ensemble of medium- to extended-range meteorological forecasts from the ECMWF Integrated Forecast System to produce 51 ensemble members of daily streamflow at various lead times. Its hydrological model, LISFLOOD, has been calibrated using daily streamflow data at over 1200 river basins worldwide (Hirpa et al., 2018).
GloFAS v3.1 hydrological performance was evaluated for the period 1979-2019 for over 1500 verification stations across the world using various verification metrics (Kling-Gupta Efficiency, bias, variance, etc.). Prudhomme and Zsoter (2021) provide details on the hydrological assessment methodology and further discussion on GloFAS performance evaluation. GloFAS provides daily discharge amounts (m³/s) from which probabilities of flood threshold exceedance can be derived. For flood detection, these forecast time series are compared against a set of flood thresholds that are derived from the same model climatology (Zsoter et al., 2020) to avoid the impact of systematic biases in the GloFAS climatology on flood forecast probabilities. In this study, we use daily GloFAS v3.1 reforecast discharge data from 2007 to 2018 extracted for the gauge locations in Kenya and Uganda, respectively (Figure 1).

| Flood thresholds
In the 30-day operational GloFAS forecast interface (https://www.globalfloods.eu/), four different flood return periods (2, 5, 10, and 20 years) are provided and can be used as the thresholds for severe flood events. Zsoter et al. (2020) provide a detailed explanation of how these return periods are computed using GloFAS ensemble reforecasts. Furthermore, thresholds computed as percentiles of the daily river flow time series can also be used to define various hydrological conditions (e.g., high/low river flows) and have been used by several authors to evaluate forecasts from GloFAS or similar forecasting systems (see Alfieri et al., 2013; Arnal et al., 2018; Emerton et al., 2018; MacLeod et al., 2021). For example, high percentiles (90th percentile or greater) have been used to show a high likelihood of floods when the river flow at a gauging station is above that percentile (MacLeod et al., 2021). In the broad hydrological literature, the notation for flow percentiles is not always consistent or clear, so when percentiles are used, the definition needs to be specified clearly.
In this study, we adopt the traditional definition of percentiles used in statistics, where the kth percentile (with k in the range of 1-100) of a time series is the level at or below which k percent of the values in its distribution fall (the inclusive definition of percentile is adopted). For example, the 90th percentile is equal to or greater than 90% of the river discharge values recorded during the specified period. In flood-related studies, a percentile flow can also be referred to in terms of 'percent exceedance' to indicate the percentage of time that the discharge value is likely to be equalled or exceeded (see Flow, Excedance and Percentiles, 2023; National River Flow Archive, 2023). Thus, in this study we use the 90th, 95th, and 99th percentiles calculated from the reforecast (all ensemble members) or observed time series of daily discharge, corresponding to high-flow levels exceeded only on a minor portion of the days in the data, that is, 10%, 5%, and 1%, respectively.
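The percentile definition above can be illustrated with a minimal sketch. The discharge values and the function names `flood_thresholds` and `exceedance_fraction` are hypothetical, and the inclusive interpolation of Python's `statistics.quantiles` is assumed here as a stand-in; it is not necessarily the estimator used in the study.

```python
import statistics

def flood_thresholds(discharge, percentiles=(90, 95, 99)):
    """Return {percentile: threshold} using the inclusive percentile definition."""
    # quantiles(..., n=100) returns 99 cut points; cut point k-1 is the kth percentile.
    cuts = statistics.quantiles(discharge, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in percentiles}

def exceedance_fraction(discharge, threshold):
    """Fraction of days on which discharge is strictly above the threshold."""
    return sum(q > threshold for q in discharge) / len(discharge)

# Hypothetical daily discharge series (m^3/s), for illustration only.
daily_q = list(range(1, 101))
thresholds = flood_thresholds(daily_q)
share_above_90th = exceedance_fraction(daily_q, thresholds[90])
```

In this toy series the 90th, 95th, and 99th percentile thresholds are exceeded on 10%, 5%, and 1% of days, which matches the 'percent exceedance' reading of the same thresholds.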

| River-gauge observations
Observed point-based discharge time series for the river gauges considered here were provided by the Department of Water Resources Management (DWRM) in Uganda and by the Kenya Water Authority (WRA) for Kenya. The time series consist of daily discharge values over long periods, with all stations having at least 5 years of daily discharge data over the study period. The river-gauge observations corresponding to the period of the impact data (2007-2018) have been used for the subsequent analysis.

| Flood impact data
Flood impact data have been used to extend our capability to verify GloFAS flood forecasts beyond conventional observations from sparse gauge networks.The flood impact data contain semi-quantitative and qualitative information on the location and number of reported flood events derived from five different data repositories: (1) Dartmouth Flood Observatory (DFO) Archive (Brakenridge, 2015), (2) DI (UNISDR, 2018), (3) EM-DAT (EM-DAT, 2020), (4) the Global Hazard Weekly Bulletin (PHE, 2015), and (5) local sources (URCS, KRCS, media, etc.) for the 2007-2018 period.These data were collated for Uganda and Kenya for the study regions (districts/counties) for further analysis.The characteristics of these data repositories are summarised in Table 1.
In an ideal situation, an impact would be defined as a combination of the number of people affected and the quantitative estimate of any loss of property and livelihoods.However, the used repositories do not have enough quantitative loss and damage information disaggregated to sub-national administrative units to enable the quantification of impacts and the severity of the flood events.We, therefore, consider the number of flood events reported as a proxy to the impact with an assumption that flood events that result in considerable impacts would be reflected in the data repositories used.The flood events are then classified as either 1 or 0 if the event was reported or not, respectively.The assessment of the number of flood events from the various sources, as well as the overlap (events that are common across the repositories used here), would help understand which data repository is used to identify the highest number of flood events for each study location.

| METHODOLOGY
Here, we outline the comparative analysis of river-gauge observations and impact data and the verification of GloFAS flood forecasts using two reference data sets through a set of skill scores. To assess the usefulness of flood impact data in verifying flood forecasts, first, the adequacy of the impact data in supplementing the river-gauge observations is evaluated using Type I and Type II error indices. Second, the flood forecast data are verified using river discharge and impact data as reference, and the verification outcomes based on the probability of detection (POD) and false alarm ratio (FAR) are compared.

| Comparison of river-gauge observations and impact data
In this part of the analysis, we compare the river-gauge observations and impact data. The river discharge value (Q) that has the potential to cause flooding is defined using the 90th and 95th percentiles as thresholds, that is, a flood event (binary) occurs when Q is above the threshold, and it does not occur if Q is below the threshold. The total flood events from impact data are derived from the data repositories while considering the overlaps using the timestamp to avoid duplication in the total events. This means that an event that occurs across all the data repositories for the same timestamp is considered one event. The total flood events from impact data (binary) are then compared with river-gauge observations (binary).
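The two binary series described above can be sketched as follows. All dates, discharge values, the threshold, and the function names are hypothetical and serve only to illustrate the thresholding and the timestamp-based de-duplication.

```python
from datetime import date

def gauge_flood_days(discharge_by_day, threshold):
    """Days on which observed discharge Q is above the flood threshold."""
    return {day for day, q in discharge_by_day.items() if q > threshold}

def merge_impact_events(reports_by_repository):
    """Union of reported flood dates; a date reported by several
    repositories is counted as one event."""
    merged = set()
    for reported_days in reports_by_repository.values():
        merged.update(reported_days)
    return merged

# Hypothetical observations (m^3/s) and repository reports.
observed_q = {
    date(2007, 8, 1): 120.0,
    date(2007, 8, 2): 310.0,
    date(2007, 8, 3): 295.0,
}
reports = {
    "EM-DAT": {date(2007, 8, 2)},
    "DI": {date(2007, 8, 2), date(2007, 8, 5)},
}

flood_days = gauge_flood_days(observed_q, threshold=300.0)
impact_days = merge_impact_events(reports)  # 2 August is counted once
```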
Here, we assess the consistency of impact data false-positive and false-negative outcomes using a window of 7 days (from the day of the observed event up to 7 days ahead) against the flood events picked from the river-gauge observations. Using a 2 × 2 contingency table, the false-positive outcome is used to compute the 'Type I error', which represents the ratio of the flood events detected by river-gauge observations with no reported impacts to the total number of flood events (from gauge observations). Additionally, the false-negative outcome is used to compute the 'Type II error', which represents the ratio of flood events detected in the impact data but not by river-gauge observations to the total number of flood events (from impact data). We first compare the river-gauge data (binary) with the impact data (binary) from the various sources across the locations. Next, we compare the river-gauge observations against impact data from a single data repository to assess whether impact data from some repositories are better than others in detecting flood events. Type I and II errors are calculated according to the equations in Table 2.
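A minimal sketch of the two error indices, following the verbal definitions above, is given below. The matching rule (an impact report counts if it falls between the day of the gauged event and 7 days after it) is an interpretation of the text, and the function name and dates are hypothetical; the equations in Table 2 remain the authoritative definition.

```python
from datetime import date

def type_errors(gauge_events, impact_events, window_days=7):
    """Return (Type I, Type II) error ratios for two lists of event dates.

    Type I: gauged flood events with no impact report within the window,
            divided by all gauged flood events.
    Type II: impact events with no gauged flood in the preceding window,
             divided by all impact events.
    """
    def matched(gauge_day, impact_day):
        return 0 <= (impact_day - gauge_day).days <= window_days

    gauge_unmatched = sum(
        not any(matched(g, r) for r in impact_events) for g in gauge_events
    )
    impact_unmatched = sum(
        not any(matched(g, r) for g in gauge_events) for r in impact_events
    )
    type_i = gauge_unmatched / len(gauge_events) if gauge_events else float("nan")
    type_ii = impact_unmatched / len(impact_events) if impact_events else float("nan")
    return type_i, type_ii

# Hypothetical event dates, for illustration only.
gauge = [date(2007, 8, 1), date(2007, 8, 20), date(2007, 9, 10)]
impact = [date(2007, 8, 3), date(2007, 9, 30)]
type_i, type_ii = type_errors(gauge, impact)
```

In this example only the 1 August gauged event has a matching report, so two of three gauged events are unmatched (Type I of about 0.67) and one of two impact reports is unmatched (Type II of 0.5).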

| Flood forecast verification using river-gauge observations and impact data
A set of skill scores was used to evaluate the occurrence of forecasted floods from the GloFAS system against river-gauge observations and impact data. The ability of the forecast to discriminate between events and non-events is commonly measured using skill metrics calculated from a 2 × 2 contingency table. Two skill scores were used to quantify the occurrence of flood events (Wilks, 2006): (1) POD or hit rate, which measures the fraction of observed events that were correctly predicted (perfect score of 1) and (2) FAR, which indicates the fraction of the predicted events that did not occur (perfect score of 0). Table 3 shows the equations used to calculate the skill scores.
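For reference, the two scores can be computed from contingency-table counts as in the sketch below, which uses the standard definitions (POD as hits over hits plus misses, FAR as false alarms over hits plus false alarms). The counts are hypothetical and the function names are illustrative.

```python
def probability_of_detection(hits, misses):
    """Fraction of observed events that were correctly forecast (perfect score 1)."""
    observed = hits + misses
    return hits / observed if observed else float("nan")

def false_alarm_ratio(hits, false_alarms):
    """Fraction of forecast events that did not occur (perfect score 0)."""
    forecast = hits + false_alarms
    return false_alarms / forecast if forecast else float("nan")

# Hypothetical counts from a 2 x 2 contingency table.
pod = probability_of_detection(hits=6, misses=4)
far = false_alarm_ratio(hits=6, false_alarms=2)
```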
In this study, the verification of flood forecast events is based on the need to provide reliable flood forecast information to inform anticipatory actions taken by the communities and humanitarian actors. The preferred verification outcome will therefore depend on the decision-making strategies the actors are willing to take. For example, humanitarian actors might need to decide if actions should be taken based on any forecast probability, which might be costly due to the number of events but would ensure reduced losses if the events materialise.
The second alternative would be to take actions based on a forecast that shows a high likelihood of event occurrence, to minimise the expenses that would be incurred if the actions turn out to be in vain (see Lopez et al., 2020). Various factors identified from the EAPs developed by URCS and KRCS have been adopted in this study. First, a flood forecast with a 60% chance of happening triggers early actions. Hence, we consider forecasts that indicate a forecast probability of 60% and above to correspond to a flood forecast event, and below 60% to correspond to a 'no-flood' event. Second, in the calculations of events correctly forecasted, an action lifetime of 7 days is considered. 'Action lifetime' is defined as the length of time during which an action will remain effective in reducing impacts (Coughlan De Perez et al., 2016). In forecast verification, the action lifetime is commonly known as the 'margin of error' and it is used to give more tolerance to the forecasts such that even if the forecast is late but materialises within the duration of the action lifetime, the actions will still be considered successful. For example, if an action is taken and a flood occurs up to 7 days after the forecasted date, this will still be considered a 'hit' if the action lifetime is at least 7 days (see Figure 2 for a visual description of the action lifetime and margin of error). Depending on the type of action, the action lifetime can range from 7 to 90 days. This can also vary depending on a specific country's flexibility on the actions to take and the acceptable number of times the stakeholders are willing to 'act in vain'. For Uganda and Kenya, the stakeholders set the probability of 'action in vain' to 50%, indicated using the FAR. Figure 2 considers a margin of error of 5 days and an action lifetime of 10 days. However, these parameters can still vary depending on the type of action.

TABLE 1 Characteristics of the data repositories that were used to derive impact data for the study.
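The trigger rule and the action-lifetime tolerance can be sketched as follows. The ensemble values, the day indices, and the choice to count hits per trigger and misses per event are assumptions made for illustration; the EAP rules themselves are only summarised in the text.

```python
def exceedance_probability(ensemble_discharge, threshold):
    """Share of ensemble members with discharge above the flood threshold."""
    members = list(ensemble_discharge)
    return sum(q > threshold for q in members) / len(members)

def classify_triggers(trigger_days, event_days, action_lifetime=7):
    """Count hits, false alarms, and misses using integer day indices.

    A trigger is a hit if a flood occurs on the trigger day or within
    `action_lifetime` days after it; otherwise it is a false alarm.
    A flood with no trigger in the preceding `action_lifetime` days is a miss.
    """
    def covered(trigger, event):
        return 0 <= event - trigger <= action_lifetime

    hits = sum(any(covered(t, e) for e in event_days) for t in trigger_days)
    false_alarms = len(trigger_days) - hits
    misses = sum(not any(covered(t, e) for t in trigger_days) for e in event_days)
    return hits, false_alarms, misses

# Hypothetical 51-member ensemble: 31 members above a 300 m^3/s threshold.
ensemble = [350.0] * 31 + [200.0] * 20
probability = exceedance_probability(ensemble, threshold=300.0)
triggered = probability >= 0.60  # 60% trigger adopted from the EAPs

# Hypothetical trigger and flood days (day-of-record indices).
hits, false_alarms, misses = classify_triggers([10, 40], [14, 60])
```

Here 31 of 51 members (about 61%) exceed the threshold, so the forecast counts as a flood forecast event. The trigger on day 10 is a hit because the flood on day 14 falls within the 7-day action lifetime, the trigger on day 40 is a false alarm, and the flood on day 60 is a miss.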
Using distinct flood discharge thresholds (i.e., 90th and 95th percentiles) calculated from the GloFAS reforecast data and river-gauge observations, we verify flood forecasts using river-gauge observations and impact data as a reference. This study was therefore not meant to evaluate the hydrological performance of GloFAS (calibration and validation of GloFAS time series) but to assess the usefulness of the two reference datasets in forecast verification. Using a 7-day action lifetime and a 60% probability of flooding, we compute the differences in the skill scores (POD and FAR) for forecast-observed data and forecast-impact data pairs, respectively. Here, if the difference between the 'POD observed' and 'POD impact' is negative and the FAR difference is positive, impact data are more favourable in skill assessment than river-gauge observations, and vice versa. Additionally, if the river-gauge observations or impact data (or both) report a flood event for the same days as in the GloFAS flood forecast (within the action lifetime of 7 days from the warning), the reference data (observed or impact) are favourable in skill assessments.
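The sign convention for the skill-score differences can be made explicit with a short sketch. The scores are hypothetical, and the 'mixed' label for cases where the two differences disagree is an addition for illustration, not a category used in the study.

```python
def compare_references(pod_observed, far_observed, pod_impact, far_impact):
    """Return (POD difference, FAR difference, more favourable reference)."""
    d_pod = pod_observed - pod_impact
    d_far = far_observed - far_impact
    if d_pod < 0 and d_far > 0:
        favoured = "impact"      # higher POD and lower FAR with impact data
    elif d_pod > 0 and d_far < 0:
        favoured = "observed"    # higher POD and lower FAR with gauge data
    else:
        favoured = "mixed"       # hypothetical label: the two scores disagree
    return d_pod, d_far, favoured

# Hypothetical skill scores at one location and lead time.
d_pod, d_far, favoured = compare_references(
    pod_observed=0.5, far_observed=0.6, pod_impact=0.7, far_impact=0.4
)
```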

| Impact data from the data repositories
In Uganda, in two districts (Katakwi and Manafwa) the reported impacts from the data repositories show a higher number of events reported in 2007, 2010, 2011, 2012, and 2018 from DI and DFO as compared with other years.However, the flood events for Mayanja from all the data repositories across the years are low.Table 4 shows the number of events across Uganda and the three locations from 2007 to 2018.The number of flood events from each repository presented in Table 4 is independent, that is, it does not consider any overlap across the repositories.
The analysis of the number of flood events from multiple and single data repositories shows that in Katakwi there are 434 flood events where DI recorded the highest number of events at 36%, followed by DFO at 19% (Table 5).Data collected across Katakwi by URCS also provide a substantial contribution (14%) to the flood events in the area.The overlap from multiple data repositories (EM-DAT, DI, and DFO) contributes to 11% of the total flood events.In Manafwa from a total of 304 events, the highest number of events are from single source DI and overlap between EM-DAT and DFO, at 33% and 28% respectively.EM-DAT alone contributes 14% of the total events.In Mayanja, only two data repositories contribute to the flood events.These are the DI at 23% and EM-DAT at 77% totalling 102 events.
In Kenya, many flood events were reported in 2007, 2010, 2011, 2013, 2015, and 2018 across the country and the three study locations (see Table 4).EM-DAT also records the highest number of flood events across the three locations contrasting with findings in Uganda, whereas DI reported the lowest.For example, in Nzoia EM-DAT represents 69% of the total flood events, local sources contribute 12%, whereas DI covers 6%.The overlaps between the various sources contribute marginally across the locations.For example, EM-DAT and DI together contribute <1% in Tana, 3% in Nzoia, and 1% in Athi (Table 5).
4.2 | How adequate are the impact data in supplementing river-gauge observations in identifying flood events?
The comparative analysis in the three locations in Uganda, using combined impact data from the various data repositories and observed gauge data, shows varied results across locations and thresholds. For example, in Katakwi (Figure 3a), using the 90th percentile from the river-gauge observations, the impact data capture 60% of all gauged flood events, but 42% of the reported flood events from the impact data do not correspond to flows above the 90th percentile threshold. This could mean that either the threshold is too high, with lower flows still causing impacts, or the impacts reported were a result of another form of flooding, such as flash floods or waterlogging from heavy rainfall. In Manafwa and Mayanja (Figure 3b,c), Type I and Type II errors across the thresholds are high (above 0.5), which could mean that the quality and quantity of available impact data for these locations were not adequate (Type I), and that the impacts reported were not a result of riverine flooding (Type II).
The comparative analysis shows a high Type I error across the 90th and 95th percentile in the Kenyan locations.This means that though the observations indicate flood events, there were no impact data to correspond to these events or the quality of the available impact data was not good enough.On the other hand, the Type II error is also high across the locations, suggesting that impacts reported resulted from different forms of flooding, such as flash floods.For example, in Tana at the 90th percentile, impact data capture only 40% of all gauged flood events, but half of the reported flood events do not correspond to flows above the 90th percentile.Figure 4a-c shows the comparative analysis across the thresholds for Tana, Nzoia, and Athi respectively.
The analysis using a single data repository shows an increase in Type I error in all the locations in Kenya and Uganda (Figure 5a,b). For example, in Katakwi, using DI alone results in a Type I error of 0.59, compared with a Type I error of 0.39 when using four data repositories (DI, EM-DAT, local, and DFO). In Tana, EM-DAT results in a Type I error of 0.79, compared with 0.61 when using data from all the repositories. Type II error fluctuates across the locations (Figure 5c,d). For example, at the 90th percentile, despite Nzoia having almost the same number of flood events from EM-DAT and local sources, Type II error is higher when using local sources than when using EM-DAT (Figure 5d). This shows that at the same (higher) threshold, for example the 90th percentile, more events are likely to be missed (events falling below the threshold) from the local source, which takes into consideration more localised events, than from a high-impact data repository like EM-DAT. In other words, a data repository that applies a low threshold for including events in its database may require a lower gauge-based threshold to correctly identify the flood events than a data repository that applies a high threshold for inclusion.

| Where river-gauge observations are limited or unavailable, how best can the impact data be used to verify flood forecasts and ensure anticipatory actions are informed?
We plotted the difference between the forecast skill scores (POD and FAR) obtained using the river-gauge observations and impact data (i.e., POD_observed − POD_impact and FAR_observed − FAR_impact) as a reference for verifying flood forecasts across all the locations and two percentile thresholds to assess their potential in forecast verification (Figure 6). The results show that impact data give a more favourable assessment of skill than the observed data at the 90th and 95th percentiles across lead times in Katakwi (i.e., POD_impact > POD_observed and FAR_impact < FAR_observed). For other locations at a lead time of up to 15 days, the impact data underestimate the GloFAS skill in terms of POD and FAR. At longer lead times (>15 days), Nzoia shows a good assessment of skill in terms of POD. These outcomes can be associated with the quantity and quality of the impact data that were available for most locations (except Katakwi and partly Nzoia), which also corresponds to the findings in Section 4.2. The highest difference in the POD of up to 0.4 is seen in Mayanja at the 90th percentile, while other locations show a difference of below 0.2. The FAR is, however, spread out across locations, with a change of about 0.5 in Mayanja and Athi. POD and FAR graphs for the study locations at the 90th and 95th percentiles using river-gauge observations and impact data are provided in Figure S1.

| DISCUSSION
The use of less conventional data, such as impact data, in forecast verification is gaining interest among researchers and practitioners. However, these data sources, just like hydro-meteorological data, are subject to errors and biases (Wilby et al., 2017). Despite these shortcomings, the impact data have the potential to ensure early warning systems are robust. In this section, we discuss the findings and implications of using impact data to verify flood forecasts and the assumptions that have been considered. First, we discuss the available impact data in the East African countries (Uganda and Kenya). Second, we highlight the adequacy of the impact data compared with river-gauge observations and how that may influence forecast verification. Last, we highlight the potential and challenges of using impact data to verify forecast information in data-scarce regions and provide recommendations that can be useful in improving the impact data to ensure effective early actions.

| What does the available impact data from Uganda and Kenya tell us?
Among the four main data repositories used in this study, DI had the highest number of flood events in Uganda (Katakwi and Manafwa districts), whereas across Kenya and the three counties, EM-DAT reports the highest number of flood events (Table 4). The differences can be associated with the criteria used for the inclusion of impact data in these repositories as well as the country-specific regulations on the collection and systematic reporting of impact data (Osuteye et al., 2017). For example, in Katakwi, if we consider a specific period from 1 August 2007 to 31 October 2007, EM-DAT reported a total of 11 flood events while DI reported 9 flood events (all considering the 7-day window, Section 3.1). Among the events, seven flood events overlap across the repositories, while EM-DAT has four distinct events and DI has two distinct events. Therefore, using DI alone will result in fewer (−4) flood events, while using EM-DAT alone will result in fewer (−2) flood events. This is just one example, and the differences in flood events across data repositories might increase or decrease. Due to such differences, using only one repository can lead to a bias in the outputs generated (e.g., underestimation of event frequency).
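The Katakwi example can be reproduced with simple set arithmetic. The event identifiers below are placeholders standing in for the reported events; only the counts (11 in EM-DAT, 9 in DI, 7 shared) come from the text.

```python
# Placeholder event identifiers; only the counts are taken from the example above.
shared = set(range(1, 8))             # 7 events reported by both repositories
emdat = shared | set(range(8, 12))    # 4 events reported only by EM-DAT -> 11 in total
di = shared | set(range(12, 14))      # 2 events reported only by DI -> 9 in total

combined = emdat | di                 # de-duplicated events from both repositories

missed_with_di_only = len(combined - di)        # events lost when using DI alone
missed_with_emdat_only = len(combined - emdat)  # events lost when using EM-DAT alone
```

Combining the two repositories gives 13 distinct events, so relying on DI alone misses 4 of them and relying on EM-DAT alone misses 2.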
Although we disaggregated the impact data into districts and counties, we only used the qualitative information classified as impact/no impact to guide the analysis. This is because no direct quantitative loss estimates that would help in understanding the severity of each flood event are available for these locations. Quantitative estimates are usually reported as aggregated quantities across a region, rather than disaggregated quantities for smaller geographical areas within the region (Gall, 2015). For example, in EM-DAT, the 2007 flooding between August and October that impacted different parts of Uganda is combined as one record (Disaster number 2007-0408; EM-DAT, 2020), with the quantified impact on, for example, the 'number of people affected' also aggregated. The insufficient reporting of quantitative estimates in areas of small spatial coverage can limit the analysis and affect the robustness of any conclusion, especially from a livelihood perspective (Osuteye et al., 2017). In addition, these repositories differ in the parameters used for reporting. For example, EM-DAT reports only one parameter, 'number of people affected', whereas DI reports the same using two parameters: 'directly affected' and 'indirectly affected'. As also noted in Below et al. (2010), this hinders the direct quantitative comparison between the two data repositories.
5.2 | How adequate are the impact data in identifying thresholds for impactful river flooding and in verifying flood forecasts?
Setting up early warning mechanisms for floods often depends on thresholds derived from river-gauge data to identify the level at which the river discharge may result in impactful flooding. In data-scarce regions, impact data can help to determine such thresholds (Coughlan De Perez et al., 2016), but this requires a large number of good quality impact data to reduce the chances of over-representation/under-representation of impacts (Ranger et al., 2011). We have found that even within the same country impact data are not consistently available across all locations (Barabadi & Ayele, 2018), which may lead to bias in the outputs. Our analysis shows that using more than one source of impact data reduces the chances of a Type I error, that is, a situation where flooding occurs but impact data are not available. For example, although EM-DAT contributes over 69% of all impacts reported in each of Tana, Nzoia, and Athi, using this repository alone results in an increase in Type I error (flood observed in gauged data but not reported) compared with using all three repositories (EM-DAT, DI, Local; Figure 5b). This can be associated with the inclusion criteria for the various data repositories. For example, a repository like EM-DAT represents only high-impact flood events, leaving out low-impact flood events.
We have found that the consistency between impact data and river-gauge data varies markedly across the thresholds, but the variability is location-dependent. For example, in Katakwi, there is good correspondence between the river-gauge observations and impact data at the 90th percentile. This suggests impact data can be used to identify the critical river discharge thresholds at which impactful flooding occurs. These findings are consistent with the scientific literature, where impact data have been successfully used to define flood thresholds. For example, Young et al. (2021) used impact reports to determine the rainfall thresholds that resulted in flooding in the city of Alexandria, Egypt.
Although we used the percentile-based method to identify flood events, we acknowledge that high-impact events are generally higher than the 99th percentile (MacLeod et al., 2021). To ensure robustness of the statistical analysis, however, we adopted the 90th and 95th percentile thresholds, as several previous authors did (e.g., Arnal et al., 2018; MacLeod et al., 2021). These percentiles may include low-impact flood events that are likely to affect limited local areas (with relatively high frequency, e.g., 5% of days over a year for the 95th percentile), but they are useful in cases where impact data are used in the verification because of the differences in the inclusion criteria of flood events in the various data repositories (see Table 1). In some previous studies, even lower thresholds are used because of data availability limitations, to ensure robustness in the verification. For example, Arnal et al. (2018) used terciles (33rd and 66th percentiles) of the simulated streamflow for the verification of seasonal streamflow forecasts and discussed the need to consider high thresholds such as the 95th percentile if more data were available. We therefore recommend that further studies, where longer data periods are available, should look at the representativeness of results across flood thresholds higher than the 99th percentile.
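To make the percentile-based identification of flood days concrete, the following minimal Python sketch computes a flow threshold and flags the days that exceed it. The flow series is synthetic and the function names are hypothetical; the linear-interpolation percentile is one common convention and is an assumption here, not necessarily the one used in this study.

```python
import random


def percentile(values, q):
    """q-th percentile (0-100), interpolating linearly between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def exceedance_mask(flows, q):
    """Return the q-th percentile threshold and a per-day flag for flow above it."""
    threshold = percentile(flows, q)
    return threshold, [flow > threshold for flow in flows]


# Synthetic daily river flow for 12 years (illustrative values, not gauge data).
rng = random.Random(0)
daily_flow = [rng.gammavariate(2.0, 50.0) for _ in range(365 * 12)]

for q in (90, 95):
    threshold, mask = exceedance_mask(daily_flow, q)
    share = sum(mask) / len(mask)
    print(f"{q}th percentile: threshold={threshold:.1f}, share of days above={share:.3f}")
```

By construction, roughly 10% and 5% of days exceed the 90th and 95th percentile thresholds, respectively, which is why these thresholds also capture relatively frequent, low-impact events.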
Other locations in Uganda and Kenya show an increase in Type I error as the river flow threshold decreases, and an increase in Type II error as it increases. The increase in Type I error can be related to the inadequacy or the low quality of the impact data used in this analysis: a false positive is produced both when the impact data are inadequate (the repository did not include an observed event) and when they are of low quality (the timestamp of the impact record is incorrect). Type II error could have resulted if the impacts reported were caused not by riverine flooding but by other subtypes of flooding, and this can also be influenced by the inclusion criteria, which are specific to each data repository. Although a repository like EM-DAT differentiates floods using subtypes such as riverine and flash flooding, DI does not include such subtypes. These subtypes would help ensure that flood events are further categorised before analysis to reduce the Type II error. In addition, such differentiation can help in designing appropriate preparedness and response interventions, which vary based on the subtype of flooding (Nauman et al., 2021; Paprotny et al., 2021). To further confirm the source of the increase in Type II error, data derived from satellite imagery (e.g., Sentinel-1 and Sentinel-2) could be used to identify whether floods occurred as well as their spatial location (with respect to rivers), which can help discriminate riverine floods (Tarpanelli et al., 2022).
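The two error types discussed above can be sketched as matching rates between gauge-identified floods and impact reports. The Python example below is a minimal sketch under stated assumptions: the dates are invented, the matching tolerance of 7 days is a hypothetical parameter, and expressing each error as a fraction of unmatched events is an assumed normalisation rather than the definition used in this study.

```python
from datetime import date, timedelta


def unmatched_fraction(events, reference, tolerance_days=7):
    """Share of `events` with no `reference` date within +/- tolerance_days."""
    if not events:
        return float("nan")  # undefined when there are no events to match
    tolerance = timedelta(days=tolerance_days)
    unmatched = [
        event for event in events
        if not any(abs(event - ref) <= tolerance for ref in reference)
    ]
    return len(unmatched) / len(events)


# Invented dates for one location (illustrative only).
gauge_floods = [date(2007, 8, 5), date(2007, 9, 12), date(2010, 5, 1), date(2012, 11, 20)]
impact_reports = [date(2007, 8, 8), date(2007, 9, 10), date(2011, 3, 3)]

# Type I: flood observed in the gauge data but not reported in the impact data.
type_1 = unmatched_fraction(gauge_floods, impact_reports)
# Type II: impact reported without a corresponding flood in the gauge data.
type_2 = unmatched_fraction(impact_reports, gauge_floods)

print(f"Type I error: {type_1:.2f}, Type II error: {type_2:.2f}")
```

Recomputing both fractions at different flow thresholds, and with different combinations of repositories, would show how each error responds to the choice of threshold and data source.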
The differences in POD and FAR vary across the study locations considered here. Except in Katakwi and partly in Nzoia (>15 days lead time), where we get a more favourable assessment of skill when using impact data, the other locations show that using impact data underestimates the GloFAS skill in terms of both POD and FAR. Though the differences are minimal in the majority of the locations, this still means that impact data cannot be adequately used to verify flood forecasts in most locations, as highlighted previously by Gall (2015). However, the available river-gauge observations and impact data could be used to train the hydrological model used in the GloFAS system through calibration and validation in specific locations that show poor detection of flood events. In other words, the available historical impact data and gauge observations can be used to assess the hydrological skill of GloFAS using scores such as the Nash-Sutcliffe efficiency, which assesses temporal variability and agreement between the modelled and observed data (see Teule et al., 2020). Overall, being aware of the uncertainties that can result from using the available impact data can help ensure the outputs are used appropriately in supporting anticipatory actions.
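For readers less familiar with these scores, the minimal Python sketch below gives the standard contingency-table definitions of POD and FAR, together with the Nash-Sutcliffe efficiency. The counts and flow values are invented for illustration and are not results from this study.

```python
def pod(hits, misses):
    """Probability of detection: share of observed events that were forecast."""
    return hits / (hits + misses)


def far(hits, false_alarms):
    """False alarm ratio: share of forecast events that were not observed."""
    return false_alarms / (hits + false_alarms)


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_term = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - squared_error / variance_term


# Invented counts: the same forecasts scored against two reference datasets.
references = {
    "river gauge": {"hits": 12, "misses": 8, "false_alarms": 4},
    "impact data": {"hits": 9, "misses": 11, "false_alarms": 7},
}
for name, c in references.items():
    print(name, round(pod(c["hits"], c["misses"]), 2), round(far(c["hits"], c["false_alarms"]), 2))

# Invented flow series for the efficiency score.
print(round(nse([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0]), 3))
```

Scoring the same forecasts against both references, as in the loop above, is what makes the difference in POD and FAR between gauge-based and impact-based verification visible.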
5.3 | How best can the impact data be used to verify flood forecasts in data-scarce regions?
Our exploratory analysis has highlighted several factors that affect the efficacy of impact data for verifying flood forecasts in most of the study locations in Uganda and Kenya. These include the inadequacy of event records, poor data quality, and coarse spatial resolution/granularity, among others. Therefore, using impact data may result in underestimation of forecast skill, leading to reduced confidence in using the forecast to support anticipatory actions. In other words, if we verify against impact data that unwittingly underestimate the forecast skill, we might discard a forecast that is good enough to support preparedness actions for vulnerable people. Nevertheless, the positive results obtained for Katakwi in Uganda and Nzoia in Kenya show that, with some improvements, impact data could be used to determine critical thresholds for flooding and inform the design of early warning mechanisms in data-scarce regions. For such regions, the following improvements would increase the usability of impact data.

| Characterising the gaps/uncertainties
The uncertainties in the impact data should be explicitly stated, as well as the implications for the outputs, especially if the outputs are intended to inform actions. The uncertainty around an estimate can be denoted using the standard error, which indicates how far a sample-based estimate is likely to lie from the true value. The standard error can be calculated by dividing the standard deviation by the square root of the sample size (Walker, 2018). From our analysis, the standard error in the FAR calculation varies from 0.02 to 0.05. Therefore, if the recommended forecast FAR to trigger humanitarian action is <0.5, using impact data will require a FAR of <0.4 to minimise actions taken in vain. Continuous operational evaluation of the forecasts is also required in situations where real-time reference data are available.
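The standard error calculation described above can be sketched in a few lines of Python. The FAR samples below are invented, and the function names are hypothetical. The text does not state how the stricter FAR of <0.4 was derived; the margin of two standard errors used here is purely an assumption that happens to be numerically consistent with the upper end of the reported range (0.5 − 2 × 0.05 = 0.4).

```python
import math
import statistics


def standard_error(samples):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def conservative_trigger(nominal_far, se, n_standard_errors=2.0):
    """Stricter FAR trigger after subtracting an assumed margin of k standard errors."""
    return nominal_far - n_standard_errors * se


# Invented FAR values, e.g. from repeated verification samples (illustrative only).
far_samples = [0.30, 0.35, 0.40, 0.45, 0.50]
se = standard_error(far_samples)
print(f"standard error of FAR: {se:.3f}")

# Assumed margin: two standard errors at the upper end of the reported range.
print(f"adjusted FAR trigger: {conservative_trigger(0.5, 0.05):.2f}")
```

A different margin, or a confidence interval built in another way, would lead to a different adjusted trigger, so the choice should be stated alongside any operational threshold.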

| Combining databases
A combination of impact data from multiple data repositories should be explored, especially if the data are scarce (Barabadi & Ayele, 2018). Because the repositories differ in the methods and criteria used in their compilation, combining them can help reduce biases and the possibility of missed events in the reference datasets for forecast verification. For example, in Tana, comparing river-gauge observations with impact data from all repositories rather than from EM-DAT alone reduced the Type I error from 0.8 to 0.6 (Figure 5b). However, the combination should be carefully explored to avoid duplication of entries, especially from repositories fed from the same primary source or where there is a slight difference in the timestamp for the same event. Some of these duplication challenges can be handled by using a tolerance interval such that entries that fall within a certain interval are considered one event. In this study, an interval of 7 days was used.
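One possible way to apply such a tolerance interval is sketched below in Python. The dates and repository contents are invented, and the chaining rule (a report joins the current event if it falls within 7 days of the previous report) is an assumption; the exact matching rule used in this study may differ.

```python
from datetime import date, timedelta


def merge_reports(report_dates, tolerance_days=7):
    """Group report dates for one location into events using a tolerance interval.

    A report joins the current event if it falls within `tolerance_days` of the
    previous report in that event; otherwise it starts a new event.
    """
    tolerance = timedelta(days=tolerance_days)
    events = []
    for report in sorted(report_dates):
        if events and report - events[-1][-1] <= tolerance:
            events[-1].append(report)
        else:
            events.append([report])
    return events


# Invented reports for a single district from three repositories.
emdat = [date(2007, 8, 3), date(2007, 9, 15)]
di = [date(2007, 8, 6), date(2007, 9, 30)]
local = [date(2007, 8, 9)]

events = merge_reports(emdat + di + local)
for event in events:
    print(event[0].isoformat(), "to", event[-1].isoformat(), f"({len(event)} reports)")
```

Reports should be grouped per district or county before merging, since entries with similar dates in different locations are separate events.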
The combination should also consider the differences in the indicators used in each repository. For example, EM-DAT reports the 'number of people affected' as one indicator, while DI reports the same in two separate indicators (i.e., 'directly affected' and 'indirectly affected'). In addition, EM-DAT makes clear differentiations of the disaster type and subtypes, such as riverine flood and flash flooding, whereas DI does not have such differentiation. Such differences make it challenging to combine and compare the data and to disaggregate them further, for instance, if only one subtype of the disaster is to be monitored. For example, in our analysis, most Type II errors could have resulted from impact data that were not necessarily from riverine flooding.
Harmonising and differentiating these parameters and clarifying their meanings would help minimise these difficulties (Below et al., 2010). This can be done by ensuring that these subtypes are indicated during the data collection process or by applying index-based approaches to differentiate between the various disaster subtypes (see Kruczkiewicz, Bucherie, et al., 2021). In addition, satellite data (e.g., from Sentinel-1 and Sentinel-2) can be used alongside the impact reports to identify the nature, extent, and spatial location of flooding, which can complement the impact reports in future applications in forecast verification. The usefulness of satellite images in assessing flood event types and extent has already been demonstrated in several recent studies, although these datasets also have their own current limitations that should be taken into account (see Landuyt et al., 2019; Notti et al., 2018; Tarpanelli et al., 2022).

| Harmonising primary data collection and information management processes
Primary data collection process. Primary data collection in most countries is done through normal government procedures. This is mainly done using the damage and needs assessment approach at the local level, with the collected data analysed at the national level (see The International Bank for Reconstruction and Development & The World Bank, 2010). If the collected information shows that impacts are considerable, the country may decide to seek external support. In this case, the United Nations Office for Coordination of Humanitarian Affairs (UN-OCHA) may coordinate more rapid needs assessments to collect more information using approaches such as the Multi-sector Initial Rapid Assessment (MIRA) framework (Inter-Agency Standing Committee, 2015). Countries can, however, use their own guidelines for collecting the data. In Uganda, the Office of the Prime Minister is tasked with the collection and uploading of impact data in the DI. However, recent interviews in Uganda noted that rapid response assessments and collection of impact data are carried out by various institutions, including the Office of the Prime Minister, the Uganda Red Cross Society, the Humanitarian OpenStreetMap Team, local NGOs, and the district office, among others (personal communication, October 2020). There is a need to harmonise the data collection process through clear guidelines and dedicated institutions to reduce the likelihood of competing reports of unknown credibility (Guha-Sapir & Below, 2006).
Furthermore, impact reporting can benefit from improved weather and river-gauge networks. Improvements to gauge networks can be planned alongside improvements in impact data collection (Baddour & Douris, 2018). This can improve flood forecasting systems by providing key inputs for hydrological model calibration and forecast verification, as well as for further verification of impact reports.
Information management process. Impact data collected through primary sources such as in-country institutions are often uploaded to data repositories such as DI. Due to a lack of resources, most countries might not be uploading the collected information regularly. Therefore, the impact data collected are held in internal disaster management systems and managed by the primary institutions. National data repositories could be explored to ensure that all impact data collected in-country are stored in a central in-country repository for ease of accessibility.

| Impact data outside the official public sources
A broader and more accurate collection of temporal and geospatial data on disaster occurrence would ensure improved risk estimations (Bakkensen et al., 2018). An extended search for impact data that are available in in-country archives (for example, in private institutions and insurance companies) but not yet in the open repositories would therefore help improve the quantity and level of detail (spatial-temporal data) of the available impact data. For example, a study by Smith and Katz (2013) shows that significant under-reporting of disaster loss estimates can occur when only public sources are relied upon because of their ease of accessibility.

| Use of new technologies
New technologies such as artificial intelligence can be used to expand impact data (van den Homberg et al., 2018). Initiatives to expand the impact data, for example, through web scraping, text mining (Margutti & van den Homberg, 2020), the application of earth observation data (Kruczkiewicz, McClain, et al., 2021; Nauman et al., 2021), and social media platforms should be explored. For example, online and social media platforms such as Google Trends and Twitter have shown promising results in the detection and reporting of flood events (de Bruijn et al., 2019; Rossi et al., 2018; Thompson et al., 2022). In addition, an ongoing study by van den Homberg et al. (2022) has shown that flood impact data generated from news articles can complement data from global repositories such as DI both geographically and temporally, improving the usefulness of the data. Ensuring that any new data are interoperable with data from these repositories will require clear technical guidelines and protocols (Wirtz et al., 2014) such as the WMO data standardisation initiative (see Baddour & Douris, 2018).
Overall, impact data represent an important source of less conventional data for monitoring and improving early warning and preparedness actions. There is also great potential for improving the quantity and quality of these data by strengthening in-country disaster monitoring capabilities and by ensuring that standardised data collection processes are in place that capture all the relevant data features, such as flood extent, gauge level, and contact information, among others (Integrated Research on Disaster Risk, 2014).

| CONCLUSION
As the world faces an uncertain future due to climate variability, environmental and climate change, and an increase in extreme hydrometeorological events, investing in early warning early action mechanisms can be an effective way to prepare for and adapt to these extreme events. However, such an investment will require understanding how forecast information performs in detecting these extreme events to ensure that anticipatory actions are not taken in vain. While forecast verification has been successful in regions where long-term hydrometeorological observations are available, it is very challenging in data-scarce regions.
Verification of forecasts using non-traditional approaches that use less conventional data would ensure the development of these mechanisms even in locations with scarce or no conventional observations. In this study, we investigated the usefulness of flood impact data to verify flood forecasts. Our findings show that although existing impact data have shortcomings, they also have potential for flood event analysis and forecast verification and can be used in regions with no long-term hydrometeorological observations. These impact data may, however, require improvement to enhance their utility and make the forecast verification more acceptable and reliable. Among the recommendations outlined above, supporting national institutions to streamline impact data collection and expanding impact data using new technologies are of critical importance. Addressing these issues will, however, require a recognition of the role that impact data can play in verifying hydrometeorological forecasts and in identifying trends in extreme events to inform risk management. In addition, a collaborative effort among international humanitarian actors, disaster management institutions, the private sector, and local communities is needed to ensure that quality impact data are collected consistently and made available in near real-time.
FIGURE 1 Flood occurrence maps for Kenya and Uganda show the study counties/districts and the river gauge locations. The map was created using impact data collated from four different data repositories from 2007 to 2018. The colour scheme represents the number of years out of the 12 years considered when floods occurred, ranging from low (1-3 years), moderate (4-6 years), high (7-9 years), and very high (10-12 years).

FIGURE 3 Comparative analysis of the impacts (all sources) and observed data at three percentile thresholds (80th, 90th, and 95th) of daily river flows from the gauged stations for (a) Katakwi, (b) Manafwa, and (c) Mayanja in Uganda.

FIGURE 4 Comparative analysis of the impacts and observed data at three percentile thresholds (80th, 90th, and 95th) of daily river flows from the gauged stations for (a) Tana, (b) Nzoia, and (c) Athi in Kenya.

FIGURE 5 Type I and Type II error at 90th percentile for all data repositories (including overlaps) and single source data repositories for (a) Type I in Uganda locations, (b) Type I in Kenya locations, (c) Type II Uganda locations, and (d) Type II Kenya locations. DI, DesInventar; EM-DAT, Emergency Events Database.

FIGURE 6 Differences in POD and FAR for locations in Uganda (Katakwi, Manafwa, and Mayanja) and Kenya (Tana, Nzoia, and Athi) across lead times at the 90th and 95th percentiles. FAR, false alarm ratio; POD, probability of detection.

TABLE 5 Percent of the total number of flood events from multiple (overlaps) and single source data repositories for the study locations in Uganda and Kenya.
Note: The first two sources that represent the highest percentage over each district/county are highlighted in bold. Abbreviations: DFO, Dartmouth Flood Observatory; DI, DesInventar; EM-DAT, Emergency Events Database.