An Empirical Social Vulnerability Map for Flood Risk Assessment at Global Scale (“GlobE‐SoVI”)

Fatalities caused by natural hazards are driven not only by population exposure, but also by the vulnerability of the exposed population to these events, determined by intersecting characteristics such as education, age and income. Empirical evidence of the drivers of social vulnerability, however, is limited due to a lack of relevant data, in particular on a global scale. Consequently, existing global‐scale risk assessments rarely account for social vulnerability. To address this gap, we estimate regression models that predict fatalities caused by past flooding events (n = 913) based on potential social vulnerability drivers. Analyzing 47 variables calculated from publicly available spatial data sets, we establish five statistically significant vulnerability variables: mean years of schooling; share of elderly; gender income gap; rural settlements; and walking time to nearest healthcare facility. We use the regression coefficients as weights to calculate the “Global‐Empirical Social Vulnerability Index (GlobE‐SoVI)” at a spatial resolution of ∼1 km. We find distinct spatial patterns of vulnerability within and across countries, with low GlobE‐SoVI scores (i.e., 1–2) in, for example, Northern America, northern Europe, and Australia; and high scores (i.e., 9–10) in, for example, northern Africa, the Middle East, and southern Asia. Globally, education has the highest relative contribution to vulnerability (roughly 58%), acting as a driver that reduces vulnerability; all other drivers increase vulnerability, with the gender income gap contributing ∼24% and the elderly another 11%. Due to its empirical foundation, the GlobE‐SoVI advances our understanding of social vulnerability drivers at global scale and can be used for global (flood) risk assessments.


Introduction
Every year, extreme events caused by natural hazards such as storms, floods, droughts, and heat waves result in severe impacts globally. These impacts range from asset-level damages to internally displaced people, and increasing mortality (EM-DAT, 2021; Formetta & Feyen, 2019; IDMC, 2021). At the same time, these events and their impacts are expected to worsen in the future due to climate change and population growth in locations exposed to climate-related hazards (C. Field, 2012; Oppenheimer et al., 2014; Thiery et al., 2021). However, impacts are not only driven by the population exposed to extreme events, but also by their individual characteristics such as age, gender, and income that determine people's vulnerability to these hazards (e.g., Cutter et al., 2003; A. Fekete & Rufat, 2023; Lloyd et al., 2022; Rufat et al., 2015). Therefore, it is important to consider such characteristics of social vulnerability to gain a better understanding of how society can cope with and adapt to hazard events (J. Birkmann, 2006; Cutter et al., 2003; J. Hinkel, 2011; Oppenheimer et al., 2014).
The drivers of climate-related risks have been conceptualized in the IPCC's risk framework, where risk results from the interaction of hazards, exposure to these hazards, and vulnerability of the exposed elements (C. Field, 2012; Oppenheimer et al., 2014). While social vulnerability research has been a growing field at local to national scales (Abbas & Routray, 2014; Bakkensen et al., 2017; Cutter et al., 2003; Fekete, 2009; Fekete & Rufat, 2023; Holand & Lujala, 2013; Hung et al., 2016; Liu & Li, 2016; Lloyd et al., 2022; Reckien, 2018; Rufat et al., 2019; Spielman et al., 2020; Tate et al., 2021), only few global-scale risk assessments account for social vulnerability (Carrão et al., 2016; Scheffran & Battaglini, 2011). This is contrary to physical vulnerability, which is often assessed by using (empirical or expert-based) damage functions such as depth-damage functions that are widely applied in flood risk assessments (e.g., Hallegatte et al., 2013; Hinkel et al., 2014; Jongman et al., 2015; Tiggeloven et al., 2020; Winsemius et al., 2016). However, the assumed causal relationships for physical vulnerability do not hold true when assessing social vulnerability, as social vulnerability is driven by a variety of intersecting characteristics that cannot be captured in a continuous function (J. Hinkel, 2011; Reckien, 2018; Yoon, 2012). While several global studies have derived social vulnerability estimates based on impacts of, and exposure to, past hazard events (Bouwer & Jonkman, 2018; Formetta & Feyen, 2019; Jongman et al., 2015; Kakinuma et al., 2020; Tanoue et al., 2016), none of these studies empirically explore the socioeconomic characteristics that drive these vulnerability estimates. One notable exception is a recent study that has found economic inequality to be an important driver of flood fatalities in 67 countries (Lindersson et al., 2023). Therefore, recent work has stressed the need to develop a better understanding of vulnerability drivers at broad scales to be able to account for social vulnerability to climate hazards in global-scale risk assessments (Clement et al., 2021; J. Hinkel et al., 2021; Raju et al., 2022; Thiery et al., 2021).
Global studies that account for social vulnerability in assessing risks primarily apply index-based approaches that are built on national-level input data. Examples are the World Risk Index (Aleksandrova et al., 2021; Welle & Birkmann, 2015) and the INFORM index (Marin-Ferrer et al., 2017), which assess countries' risk levels to natural and climatic hazards as well as natural and human-made hazards, respectively (J. Birkmann et al., 2022). Several global studies analyze social vulnerability in a spatially explicit (i.e., subnational) manner, for example, related to drought risk (Carrão et al., 2016), political stability and conflict (Scheffran & Battaglini, 2011), or food security (Ericksen et al., 2011). However, these studies largely rely on national-level data of social vulnerability characteristics due to a lack of globally consistent, spatially explicit data that allow for differentiating spatial patterns in vulnerability at subnational levels. Therefore, the majority of spatially explicit social vulnerability assessments have been conducted at local to regional scales where high-resolution spatial data of vulnerability characteristics are more readily available (de Sherbinin et al., 2019; A. Fekete et al., 2010; Preston et al., 2011; Yoon, 2012). One prominent example of a spatially explicit index-based approach is the Social Vulnerability Index (SoVI) (Cutter et al., 2003; Flanagan et al., 2018), which was designed for the USA, and has been adopted and modified for a range of applications related to climate hazards (Flanagan et al., 2011; Holand & Lujala, 2013; Myers et al., 2008; Reckien, 2018; Schmidtlein et al., 2008, 2011; Spielman et al., 2020; Tate, 2012, 2013; Tate et al., 2021). Thus far, however, no subnational social vulnerability index exists that characterizes vulnerability to climate hazards and can be consistently applied at the global scale.
The main motivation for using index-based approaches is the need to combine the variety of intersecting characteristics that drive social vulnerability. Examples of these characteristics are age, gender, ethnicity, income, occupation, and living conditions (Cutter et al., 2003; Tate, 2012). While indices have been widely used to communicate the degree of social vulnerability to decision-makers (e.g., J. Birkmann, 2006; J. Birkmann et al., 2022; J. Hinkel, 2011; Rufat et al., 2019; Visser et al., 2020), they have been increasingly criticized due to their limited transparency as all vulnerability drivers are combined into one composite index (Reckien, 2018; Rufat et al., 2019; Spielman et al., 2020). These indices primarily rely on a theory-based (i.e., top-down) understanding of vulnerability drivers based on the literature, assuming that all drivers included in the assessment are valid in characterizing social vulnerability. Therefore, index-based approaches often lack validation against empirical data of past hazards and their impacts (A. Fekete, 2012; Moreira et al., 2021; Ran et al., 2020; Tate, 2013). Several local to national studies have validated social vulnerability indices against past events (Bakkensen et al., 2017; A. Fekete, 2009; Liu & Li, 2016; Lloyd et al., 2022; Myers et al., 2008; Rufat et al., 2019; Schmidtlein et al., 2011). However, validation at global scales is scarce and, if done, is not disaggregated to the subnational level, but based on national-level data (J. Birkmann et al., 2022; Visser et al., 2020). Therefore, the validity of index-based approaches in reflecting actual vulnerability on the ground often remains unconfirmed (Bakkensen et al., 2017; A. Fekete, 2019; Reckien, 2018; Rufat et al., 2019; Spielman et al., 2020).
The current practice in social vulnerability assessment raises the need for developing a spatially explicit approach to characterize social vulnerability at global scale and to validate these findings against empirical data based on past hazard events (de Sherbinin et al., 2019; A. Fekete, 2019; Moreira et al., 2021; Ran et al., 2020; Reckien, 2018). We address this need by developing a globally consistent SoVI for flooding, based on publicly available, spatially explicit data sets of potential social vulnerability drivers such as age, education, income, settlement type, and access to healthcare. To ensure its validity, we use an empirical bottom-up approach, estimating a multiple linear regression model that predicts fatalities due to flooding with the help of social vulnerability characteristics. We test 47 potential vulnerability variables calculated for almost 1,000 past flooding events across the globe based on high-resolution flood maps per event (Tellman et al., 2021), and produce a "Global-Empirical Social Vulnerability Index (GlobE-SoVI)" with a selection of those vulnerability variables that contribute most to predicting flood fatalities (see Section 2 for further detail). By developing a more in-depth understanding of social vulnerability and its drivers in a spatially explicit manner, our results can inform policymaking in developing strategies that reduce social vulnerability to natural hazards (in particular flooding), for instance related to spatial planning, promoting socioeconomic development, and adaptation planning.

Materials and Methods
We followed the IPCC's risk framework, conceptualizing vulnerability as one component in driving climate risks and impacts in addition to hazards and exposure to these hazards (C. Field, 2012; Oppenheimer et al., 2014). Figure 1 provides an overview of the methodological steps taken to develop the GlobE-SoVI: 1) we selected and preprocessed spatially explicit data of potential vulnerability drivers based on a literature review of commonly used vulnerability variables; 2) this was followed by a flood impact analysis predicting flood fatalities by estimating a multiple linear regression model based on 47 vulnerability variables calculated for each flood event; 3) we combined all established vulnerability variables into a global database of normalized SoVI values, the GlobE-SoVI. These three main steps are described in detail in the following sections.

Vulnerability Drivers
We first reviewed the relevant literature to gain an overview of social vulnerability drivers as established in previous work (e.g., Cutter et al., 2003; Hardy & Hauer, 2018; Jurgilevich et al., 2017; Madajewicz, 2020; Rufat et al., 2015; Yoon, 2012). Due to limited availability of spatially explicit data that reflect these drivers at global scale, we focused on nine key categories of potential vulnerability drivers to climate hazards as established in the literature (e.g., Cutter et al., 2003; Lloyd et al., 2022; Rufat et al., 2015), for which spatial data were openly available: age, gender, education, income, poverty, settlement type, healthcare access, flood protection standards, and disaster preparedness (Table S1 in Supporting Information S1). While subnational data on most of these potential drivers were readily available, we used travel time to the nearest healthcare facility as a proxy for healthcare access (following e.g., Cutter et al., 2003). Furthermore, we tested two national-level indices as proxies for disaster preparedness (Table S2 in Supporting Information S1).
An overview of the data sets that were explored in the study is provided in Table 1. We processed all data for the year 2010 (or the closest year available), determined by the availability of the Gridded Population of the World version 4.11 (GPW v4.11) data set, which provides demographic age and gender data. The data were preprocessed to ensure global alignment, both in terms of temporal and spatial resolution. We complemented education and income data of the Subnational Human Development Index (SHDI) with data from other years and with national-level data from the HDI (UNDP (United Nations Development Programme), 2022) for those countries where subnational data were unavailable. To be able to combine all vulnerability drivers into the GlobE-SoVI, we converted all data to raster files with a spatial resolution of 30 arcsec (i.e., approximately 1 km at the equator).
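As a quick check of the stated cell size, the following minimal sketch converts an angular resolution into an approximate east-west ground distance. It assumes a spherical Earth with the WGS84 equatorial radius; the function name `arcsec_to_km` is illustrative and not part of the study's code.

```python
import math

# WGS84 semi-major axis in km; the spherical approximation is an assumption.
EARTH_EQUATORIAL_RADIUS_KM = 6378.137


def arcsec_to_km(arcsec: float, latitude_deg: float = 0.0) -> float:
    """Approximate east-west ground width of an angular step at a latitude."""
    angle_rad = math.radians(arcsec / 3600.0)
    return EARTH_EQUATORIAL_RADIUS_KM * angle_rad * math.cos(math.radians(latitude_deg))


cell_equator = arcsec_to_km(30)      # slightly below 1 km
cell_60n = arcsec_to_km(30, 60.0)    # cells narrow toward the poles
print(round(cell_equator, 3), round(cell_60n, 3))
```

The east-west width of a 30 arcsec cell shrinks with the cosine of latitude, which is why the "approximately 1 km" figure is tied to the equator.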

Flood Impact Analysis
For the flood impact analysis, we used satellite-based flood maps of past flooding events (2000–2018) from the Global Flood Database (GFD) that covers 913 events spread across 110 countries (Tellman et al., 2021). To be able to calculate social vulnerability characteristics per event, we aggregated the flood maps to the spatial resolution of the vulnerability drivers (i.e., 30 arcsec). To account for population exposure, we employed the Global Human Settlement Layer (GHSL) population data set (GHS-POP), following the approach of Tellman et al. (2021), who used the population data of 2000 for all flooding events until 2007 and the data of 2015 for all remaining events from 2008 to 2018. We then calculated 47 potential vulnerability variables per event (Table S2 in Supporting Information S1), along with population exposure as well as flood duration, which we used as a proxy for hazard intensity in addition to the spatial footprint of the floodplain.
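The rule for matching each event to a population layer can be written as a small lookup. This is a sketch of the rule as described above; the function name `ghs_pop_epoch` is hypothetical.

```python
def ghs_pop_epoch(event_year: int) -> int:
    """Return the GHS-POP epoch used for a flood event in a given year.

    Events up to 2007 use the 2000 layer; events from 2008 to 2018 use 2015.
    """
    if not 2000 <= event_year <= 2018:
        raise ValueError("event year outside the 2000-2018 range of the flood maps")
    return 2000 if event_year <= 2007 else 2015


print(ghs_pop_epoch(2004), ghs_pop_epoch(2012))
```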
To account for the fact that impacts are driven by the interaction of hazard, exposure and vulnerability, we estimated a multiple linear regression model to predict flood fatalities (Equation 1), similar to previous work (Bakkensen et al., 2017; Lindersson et al., 2023; Lloyd et al., 2022; Peduzzi et al., 2009). Being aware of the limitations of regression models (e.g., causality cannot be inferred) (Lindersson et al., 2023), we used this modeling approach for a first-order analysis as interpretation of results was straightforward and the available data were too limited for a more data-intensive modeling approach. We used flood fatalities (Fat) as reported for all events i included in the GFD as the dependent variable of the model. We excluded events where the reported number of fatalities exceeded the exposed population, resulting in 911 observations that were explored in the analysis. As very high numbers of fatalities were reported for several events, we log-transformed the data to deemphasize these outliers (Feng et al., 2014). Subsequently, we conducted a trend analysis on Fat_i to account for possible temporal changes in fatalities due to changes in social vulnerability that we would not be able to capture as the analyzed vulnerability drivers were only available for one time step. This analysis yielded a slightly decreasing trend in 2000–2018, albeit with low significance (p < 0.1) and a low fit (adjusted R² = 0.003) (Figure S1 in Supporting Information S1); therefore, we did not detrend the data.
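Equation 1 itself is not reproduced in this text. One plausible form, given only the variables named in this section, is sketched below. The coefficient symbols, the error term, and the additive structure are assumptions. The exact log transformation (e.g., whether an offset was added) and any transformation of Dur or Pop are not stated here.

```latex
\ln(\mathit{Fat}_i) = \beta_0 + \beta_{\mathit{Dur}}\,\mathit{Dur}_i + \beta_{\mathit{Pop}}\,\mathit{Pop}_i
  + \sum_{k=1}^{n} \beta_k\,\mathit{Vul}_{k,i} + \varepsilon_i
```

Here i indexes flood events and k indexes the vulnerability variables retained in a given model configuration.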
We built the regression model in a stepwise approach, combining an automated stepwise variable selection algorithm with manual variable testing based on significance, multicollinearity testing, and model fit (similar to Rufat & Botzen, 2022). In doing so, we ensured that we did not overlook significant variables or include nuisance variables (i.e., coincidentally significant variables), which is a well-known limitation of stepwise regression techniques (Smith, 2018). To allow for evaluating the improvement in model fit (i.e., the adjusted R²) when accounting for vulnerability in addition to exposure and hazard, we estimated a baseline model using flood duration (Dur) and population exposure (Pop) as the only explanatory variables. We then explored the 47 different vulnerability variables (Vul_1, Vul_2, …, Vul_n) calculated per flood event (e.g., share of elderly, mean years of schooling, rural settlements; Table S2 in Supporting Information S1), following these six steps:
1. Data preselection: As the potential vulnerability variables were calculated from nine data sets (Table 1), several variables were perfectly collinear (Figure S2 in Supporting Information S1). Therefore, we could not estimate a first model configuration based on all variables. To establish the variables to include in the model, we tested each variable individually by adding it to the baseline model, retaining only those variables that were statistically significant (p < 0.05) (Burton, 2015; Lindersson et al., 2023).
2. Stepwise selection: We ran the automated stepwise variable selection algorithm on the preselected variables.
3. Significance: We removed non-significant variables from the resulting model one-by-one.
4. Multicollinearity: We removed multicollinear variables one-by-one based on the variance inflation factor (VIF) (James et al., 2013). If the remaining variables exceeded the p value defined during this process, we repeated steps 3 and 4 iteratively until all variables were statistically significant and had a VIF value < 5.
5. Double counting: If variables were calculated from the same data set and would conceptually reflect the same potential vulnerability driver (e.g., mean vs. median consumption per day), we retained the variable leading to a higher model fit (Tate, 2013).
6. Adding variables: We tested those statistically significant variables that we had to drop in step 1 due to perfect collinearity one-by-one to ensure that we did not overlook significant variables (Smith, 2018). We added them to the final model configuration if they were statistically significant (p < 0.05), increased the model fit (adjusted R²), and did not result in multicollinearity (VIF < 5).
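The multicollinearity criterion (VIF < 5) used in the steps above can be illustrated with a minimal NumPy sketch. It covers only the VIF part of the procedure, not the significance tests or the stepwise algorithm. The function names, the rule of dropping the variable with the highest VIF first, and the synthetic data are assumptions for illustration, and the sketch assumes no column is constant.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF per column: 1 / (1 - R^2) from regressing that column on the others."""
    n, p = X.shape
    out = np.empty(p)
    for k in range(p):
        y = X[:, k]
        others = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
        out[k] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def prune_by_vif(X, names, threshold=5.0):
    """Drop the highest-VIF variable until every remaining VIF is below threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        values = vif(X)
        worst = int(np.argmax(values))
        if values[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names


# Synthetic example: "c" is almost a copy of "a", so one of the two is dropped.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)
kept = prune_by_vif(np.column_stack([a, b, c]), ["a", "b", "c"])
print(kept)
```

In practice the choice between two collinear variables would also depend on significance and model fit, as described in steps 3 to 6.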

Calculation of the GlobE-SoVI
Based on the results of the regression analysis, we produced one global raster of each established vulnerability variable (e.g., mean years of schooling by gender, share of elderly) as input for the GlobE-SoVI calculation, by processing the relevant data sets from Table 1. As the regression analysis was based on summary statistics per flooding event, we made a few adjustments at the raster level to ensure consistency of each variable raster with the established vulnerability variables. For example, we analyzed the share of each settlement type in the flood zone; however, as the settlement data are categorical, we produced a Boolean raster instead of percent shares (Table S3 in Supporting Information S1). To establish the GlobE-SoVI score (Vul) per raster cell j (Equation 2), we calculated the weighted sum of all variable rasters using the respective coefficients established in the regression analysis, following previous work (Hagenlocher & Castro, 2015; Nicholson et al., 2019; Tate, 2013). Although we log-transformed flood fatalities to estimate the regression model, we refrained from exponentiating the GlobE-SoVI after calculating the weighted sum, therefore assuming a linear behavior of social vulnerability.
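Equation 2 is not reproduced in this text. Based on the description of a coefficient-weighted sum over the variable rasters, it can be sketched as follows. The symbol x_{k,j} for the value of variable k in cell j is an assumption, as is the omission of an intercept and of the hazard and exposure terms.

```latex
\mathit{Vul}_j = \sum_{k=1}^{5} \beta_k \, x_{k,j}
```

Here β_k is the regression coefficient of vulnerability variable k from the final model, so a variable with a negative coefficient (education) lowers the score and the remaining four raise it.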
Earth's Future 10.1029/2023EF003895
To remove outliers, we established the 98% confidence interval (following Tate, 2013) by replacing all Vul_j values below (above) the 1st (99th) percentile with the respective percentile values that we calculated based on the GlobE-SoVI map ("winsorization") (Hagenlocher & Castro, 2015; Hagenlocher et al., 2018). Finally, we normalized the GlobE-SoVI to values ranging from 1 (low vulnerability) to 10 (high vulnerability) by scaling the data linearly, which is commonly done in social vulnerability assessments (Anderson et al., 2019; Hagenlocher et al., 2018; Meijer et al., 2023; Tate, 2013). We used a minimum of 1 based on the assumption that people are vulnerable to flooding to a certain degree and a maximum of 10 to prevent misconception of the GlobE-SoVI as reflecting percentages.
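The winsorization and linear rescaling described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the study's code. The function name, the handling of missing cells as NaN, and the example values are assumptions.

```python
import numpy as np


def winsorize_and_rescale(vul, lower_pct=1.0, upper_pct=99.0, new_min=1.0, new_max=10.0):
    """Clip values to the given percentiles, then scale linearly to [new_min, new_max]."""
    vul = np.asarray(vul, dtype=float)
    lo, hi = np.nanpercentile(vul, [lower_pct, upper_pct])
    if hi == lo:
        raise ValueError("percentile bounds coincide; cannot rescale")
    clipped = np.clip(vul, lo, hi)  # NaN cells (no data) stay NaN
    return new_min + (clipped - lo) / (hi - lo) * (new_max - new_min)


# Hypothetical weighted sums with one extreme value that gets pulled in.
raw = np.concatenate([np.linspace(-1.0, 1.0, 99), [50.0]])
scores = winsorize_and_rescale(raw)
print(scores.min(), scores.max())
```

Clipping before rescaling keeps a few extreme cells from compressing the rest of the map into a narrow band of scores.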
Furthermore, we aggregated the raster maps of each vulnerability variable and the final GlobE-SoVI to administrative units based on the Global Administrative Areas (GADM) data, version 4.1 (GADM, 2022). We employed the administrative units with the highest spatial detail available per country (Figure S3 in Supporting Information S1) and calculated the mean per variable. We used this approach to create continuous and consistent maps of the GlobE-SoVI independent from the spatial detail of the input data as these differed markedly across countries, with high-resolution input data for countries such as Italy and the USA and low-resolution input data for countries such as Australia and Libya (CIESIN-Center for International Earth Science Information Network-Columbia University, 2018). The resulting GlobE-SoVI map can, therefore, be combined with any population data set to characterize the social vulnerability of the exposed population.
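Aggregating a raster to administrative units amounts to a zonal mean. The sketch below assumes the administrative units have already been rasterized to the same grid as non-negative integer zone IDs. It takes an unweighted mean over cells, which ignores differences in cell area. The function name and the toy arrays are illustrative.

```python
import numpy as np


def zonal_mean(values, zone_ids):
    """Mean of raster values per zone, skipping NaN (no-data) cells."""
    values = np.asarray(values, dtype=float).ravel()
    zone_ids = np.asarray(zone_ids).ravel()
    valid = ~np.isnan(values)
    sums = np.bincount(zone_ids[valid], weights=values[valid])
    counts = np.bincount(zone_ids[valid])
    return {int(z): float(sums[z] / counts[z]) for z in np.nonzero(counts)[0]}


# Toy 2 x 2 raster: zone 0 covers the top row, zone 1 the bottom row.
scores = np.array([[1.0, 3.0], [np.nan, 10.0]])
zones = np.array([[0, 0], [1, 1]])
means = zonal_mean(scores, zones)
print(means)
```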

Results
In this section, we first present the results of the regression analysis, including the final regression model (i.e., Model 3) that we estimated to predict flood fatalities, along with a brief description of the five social vulnerability variables established during this process. Second, we describe the spatial patterns of the GlobE-SoVI and each vulnerability variable used to calculate it, both across and within countries. Third, we show the relative contribution of each vulnerability variable to the overall GlobE-SoVI score, established by aggregating the raster data to different world regions.

Regression Analysis
We tested several model configurations to predict (log-transformed) flood fatalities based on the steps described in the Materials and Methods (Section 2.2). Table 2 presents three selected models (see Table S4 in Supporting Information S1 for the first model configuration after step 1) with the following configurations:
• Model 1 constitutes the baseline model, accounting for hazard duration and population exposure only, which leads to an adjusted R² of 0.141.
• Model 2 includes all vulnerability variables retained after running the stepwise selection algorithm (step 2; n = 20) based on the initial set of preselected variables (step 1; n = 28). This intermediate model configuration leads to the best model fit (adjusted R² = 0.288), irrespective of the variables' statistical significance level or multicollinearity.
• Model 3 presents the final model configuration including five social vulnerability variables that contribute significantly to predicting flood fatalities (p < 0.05) while at the same time showing limited multicollinearity (VIF < 5). We obtained this model configuration by iteratively removing all non-significant (n = 11) and multicollinear (n = 4) variables from Model 2 one-by-one (steps 3 and 4); this was followed by dropping one variable due to double counting (step 5) and adding one variable initially dropped due to perfect collinearity (step 6). The explanatory power of the final model almost doubles (adjusted R² = 0.278) compared to Model 1.
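For readers less familiar with the fit statistic, the standard definition of the adjusted R² can be sketched as follows, together with the ratio behind the "almost doubles" statement. The function name is illustrative, and the unadjusted R² values of the three models are not given in this text.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2: penalizes R^2 for the number of explanatory variables."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Ratio of the reported adjusted R^2 values of Model 3 and Model 1.
ratio = 0.278 / 0.141
print(round(ratio, 2))
```

Because the penalty grows with the number of predictors, the adjusted statistic allows a fairer comparison between the two-variable baseline and the larger models.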
Considering the model errors produced by Model 3, we see relatively high absolute and relative errors when predicting flood fatalities per event (Figure S4 in Supporting Information S1), largely driven by two extreme events where very high fatalities (≥10,000) were reported. When removing these events, the average absolute error amounts to 57 fatalities per event. Model 3 overestimates the number of fatalities in 55% of all events, where the average absolute error is 14 fatalities, while the relative error is 242%. This large relative error can mainly be explained by an overestimation of the fatalities during events with a low number of fatalities (e.g., the model predicts two fatalities for an event with one casualty). In those events where the model underestimates fatalities, the average absolute error is 112 fatalities, corresponding to a relative error of 86%.
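The effect of small event sizes on the relative error can be made concrete with a short sketch. It assumes the relative error is the absolute error divided by the observed fatalities, which is a common definition but is not spelled out in this text. The second event is hypothetical.

```python
def absolute_error(predicted: float, observed: float) -> float:
    """Absolute difference between predicted and observed fatalities."""
    return abs(predicted - observed)


def relative_error_pct(predicted: float, observed: float) -> float:
    """Absolute error as a percentage of the observed value (assumed definition)."""
    if observed == 0:
        raise ValueError("relative error is undefined for zero observed fatalities")
    return 100.0 * absolute_error(predicted, observed) / observed


small_event = relative_error_pct(2, 1)        # one fatality off, 100% relative error
large_event = relative_error_pct(1100, 1000)  # 100 fatalities off, 10% relative error
print(small_event, large_event)
```

An overestimate of a single fatality in a small event thus produces a far larger relative error than an overestimate of 100 fatalities in a large one.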
From the five social vulnerability variables included in Model 3, we find four to have a positive (i.e., increasing) effect on social vulnerability and one to have a negative (i.e., decreasing) effect. Education reduces social vulnerability and is represented by mean years of schooling (at the age 25+) weighted by gender. The share of elderly people, defined as the population aged ≥65 years, increases social vulnerability, as does the share of rural settlements (in percent), defined as settlement class 13 "rural cluster" in the GHS-SMOD data (Florczyk et al., 2019; Table 1). Similarly, the gender income gap increases social vulnerability. This variable expresses the relative difference in income between males and females (in percent), with 0 reflecting complete income equality and 100 reflecting complete income inequality (i.e., men generate all income). Furthermore, we find the maximum walking time to the nearest healthcare facility (in hours) to increase social vulnerability. This variable is used as a proxy for healthcare access in exposed locations during and directly after a flooding event.

The GlobE-SoVI: Spatial Patterns of Social Vulnerability
Figure 2 presents the GlobE-SoVI map (Figure 2a) along with the five vulnerability variables (Figures 2b-2f) used to calculate the GlobE-SoVI, aggregated to the administrative unit level (see Section 2.3). The original vulnerability map in raster format along with each variable raster are visualized in Figure S5 in Supporting Information S1. For all variables, we see distinct spatial patterns, both across as well as within countries. With values of 1-2, the GlobE-SoVI (Figure 2a) is lowest in parts of Northern America, northern Europe (e.g., Norway), Australia, and Israel. These low values are primarily driven by high education levels (Figure 2b), a relatively low gender income gap (Figure 2d), a low share of rural settlements (Figure 2e), and low walking times to the nearest healthcare facility (Figure 2f). We see highest vulnerability (i.e., values of 9-10) in large parts of northern Africa (e.g., Morocco, Niger, Sudan), the Middle East (particularly Yemen), and southern Asia (e.g., Afghanistan, Pakistan). In these regions, education levels are low (Figure 2b), the gender income gap is high (Figure 2d), and the walking time to the nearest healthcare facility is moderate to high (Figure 2f).
Note. Coef = regression coefficient; SE = robust standard error; " " p < 0.1; "*" p < 0.05; "**" p < 0.01; "***" p < 0.001.
Most regions are characterized by a mix of low to moderate and moderate to high vulnerability. In Latin America, Sub-Saharan Africa, and Eastern and Central Asia, all vulnerability variables attain a mix of mostly moderate values. However, some parts of Sub-Saharan Africa (e.g., Angola, Mozambique) form an exception, with low education levels, low shares of elderly, and a comparatively low gender income gap. Furthermore, we see highly diverse spatial patterns of high and low vulnerability in some countries such as Spain, Romania, Estonia, and South Korea, mostly due to the spatial distribution of the elderly population (Figure 2c).

Relative Contributions of Vulnerability Variables Across World Regions
By calculating the mean of each vulnerability variable (see Figure 2) per world region, we are able to derive their relative contributions to the GlobE-SoVI in each region, which differ considerably across the world (Figure 3). Globally, education levels make up the highest relative contribution with more than 58%, having a negative effect on social vulnerability, that is, reducing the GlobE-SoVI. The elderly population has a positive contribution of roughly 11% to the GlobE-SoVI, while the gender income gap contributes almost 24% to social vulnerability. Rural settlements have a very minor contribution of 0.1%, and the walking time to the nearest healthcare facility makes up 7% of the GlobE-SoVI score.
Figure 2. See Table 1 for data sources of panels (b-f).
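One way to obtain signed relative contributions of the kind reported above is to weight each regional mean by its regression coefficient and express the result as a share of the summed absolute terms. This is an assumption about the calculation, not a description of the study's code. All coefficient and mean values below are hypothetical placeholders.

```python
def relative_contributions(coefs: dict, means: dict) -> dict:
    """Signed share (in percent) of each variable in the sum of absolute weighted terms."""
    terms = {name: coefs[name] * means[name] for name in coefs}
    total = sum(abs(term) for term in terms.values())
    return {name: 100.0 * term / total for name, term in terms.items()}


# Hypothetical coefficients and regional means, for illustration only.
example_coefs = {"education": -0.30, "elderly": 0.05, "income_gap": 0.02,
                 "rural": 0.01, "walking_time": 0.10}
example_means = {"education": 8.0, "elderly": 9.0, "income_gap": 45.0,
                 "rural": 0.5, "walking_time": 3.0}
shares = relative_contributions(example_coefs, example_means)
print({name: round(value, 1) for name, value in shares.items()})
```

Under this definition the absolute shares sum to 100%, and the sign of each share follows the sign of its coefficient.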
With regard to education levels, we see the highest negative (i.e., reducing) effect on vulnerability in Australia & New Zealand (−68.3%) and in Northern America (−65.3%), which is about 15% higher than the global mean. The lowest education levels, and hence the lowest effect on reducing vulnerability, can be found in the Middle East & North Africa (about −44.5%), Southern & South Asia (−46.2%), and Western, Middle & Eastern Africa (−46.8%), thereby lying approximately 20% below the global mean. The elderly population contributes most to vulnerability in Western Europe (i.e., 25.8%), whose contribution is roughly 2.3 times higher than the global mean, followed by Eastern Europe & Russia where the contribution (12.9%) is about 1.5 percentage points higher than the global mean. In Northern America (6.5%) and the Middle East & North Africa (7.2%), elderly influence vulnerability the least. The effect of the gender income gap on vulnerability differs markedly across world regions, ranging from 14.4% in Western Europe to almost 44% in the Middle East & North Africa, which corresponds to roughly 40% lower and roughly 85% higher contributions than the global mean, respectively. Rural settlements have a consistently very minor effect on vulnerability, with the highest contribution of 0.6% in Southern & South-East Asia. Last, the walking time to the nearest healthcare facility contributes to vulnerability to a limited degree in most world regions, ranging from 1.1% in Western Europe to 5.8% in Eastern Europe & Russia; however, in Northern America (13.2%) and the Small Island States (12%), its influence is almost twice as high as the global mean.
Across all world regions, education levels make up the highest relative contribution to vulnerability. In the Middle East & North Africa, Southern & South Asia, and Western, Middle & Eastern Africa, the contribution of the gender income gap is almost as large as the one of education (in absolute terms). Therefore, these three regions deviate from the overall patterns of contributions per vulnerability variable to the GlobE-SoVI that were established in the global mean. A second noticeable exception is Western Europe, where the elderly population has the second highest contribution, and the gender income gap plays a less important role in driving social vulnerability. Furthermore, the relative contributions per vulnerability variable in Southern Africa stand out compared to the rest of the continent, being well aligned with the global mean patterns, thereby deviating considerably from other African regions. Similarly, the vulnerability patterns in Latin America and Central & Eastern Asia are well aligned with the global mean.

Advancing Global-Scale Social Vulnerability Assessments
Using observed flood maps of past flooding events to empirically derive social vulnerability drivers based on fatalities caused by these events, we advance previous research that was hampered by a lack of observed hazard data (Bouwer & Jonkman, 2018; Formetta & Feyen, 2019; Jongman et al., 2015; Kakinuma et al., 2020; Tanoue et al., 2016). Through estimating a multiple linear regression model, we are able to account for the relative contribution of hazard (i.e., event duration), exposure (i.e., population) and each vulnerability variable to flood fatalities (Table 2). This is different to previous assessments that assumed an equal contribution of hazard, exposure, and vulnerability to risk (Bouwer & Jonkman, 2018; Jongman et al., 2015; Tanoue et al., 2016). Furthermore, existing social vulnerability assessments at local to national scale commonly use Principal Component Analysis (PCA) to reduce the multitude of vulnerability variables included in the assessment in order to create a composite index (e.g., Cutter et al., 2003; Rabby et al., 2019; Rufat et al., 2019; Tate, 2012; Zhou et al., 2014). While we considered a PCA-based regression model, we refrained from using this approach as it resulted in a lower model fit (Table S5 in Supporting Information S1) and would compromise interpretability and reproducibility of the results due to the multitude of variables included. Instead, our approach focuses on five empirically validated key drivers of social vulnerability to calculate the GlobE-SoVI.
Making the input data of these key drivers and the code for each analysis step publicly available (see Data Availability Statement), we facilitate reproducibility of the study results, thereby addressing the call for more transparent approaches to assessing social vulnerability (Moreira et al., 2021; Reckien, 2018; Rufat et al., 2019; Spielman et al., 2020). The established regression coefficients can be used to recalculate the GlobE-SoVI when updated and/or more highly resolved (both spatially and temporally) data sets become available. Combining the GlobE-SoVI with spatial data of population exposure helps characterize the socioeconomic dimension of flood impacts more comprehensively: establishing locations where high exposure and social vulnerability coincide leads to a more refined picture of social vulnerability hotspots. These hotspots can form the basis for more informed policies that are tailored to prevailing socioeconomic conditions, for instance in the context of global frameworks such as the UN's 2030 Agenda for Sustainable Development (United Nations, 2015), the Sendai Framework for Disaster Risk Reduction (UNDRR, 2015) and the Paris Agreement on climate change (UNFCCC, 2016). Although the GlobE-SoVI is empirically developed based on flood impacts, we are confident that, due to their generic nature, the established key drivers of social vulnerability are relevant for characterizing social vulnerability to other hazards as well, although dedicated studies should be performed to illustrate to what extent this is the case.
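The idea of reusing regression coefficients as weights and overlaying the result with exposure can be sketched as follows. This is a hedged Python illustration only: the coefficient values in `WEIGHTS`, the three example cells, the min-max rescaling to a 1-10 range, and the hotspot thresholds are all assumptions made for the example and are not taken from Table 2 or from the published processing chain.

```python
# Hypothetical weights: the sign pattern follows the text (schooling reduces
# vulnerability, all other variables increase it); the magnitudes are invented.
WEIGHTS = {
    "schooling": -0.20,        # mean years of schooling
    "elderly": 0.05,           # share of elderly (%)
    "income_gap": 0.03,        # gender income gap (%)
    "rural": 0.01,             # rural settlements (%)
    "healthcare_time": 0.10,   # walking time to healthcare (hours)
}

def raw_index(cell):
    """Weighted sum of the five vulnerability variables for one grid cell."""
    return sum(WEIGHTS[name] * cell[name] for name in WEIGHTS)

def rescale_1_to_10(values):
    """Min-max rescaling to [1, 10]; an assumed choice, not the published one."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [1 + 9 * ((v - lo) / (hi - lo)) for v in values]

# Three invented grid cells with an exposed-population count.
cells = [
    {"schooling": 12, "elderly": 20, "income_gap": 10, "rural": 5,
     "healthcare_time": 0.2, "population": 50000},
    {"schooling": 3, "elderly": 5, "income_gap": 60, "rural": 80,
     "healthcare_time": 3.0, "population": 20000},
    {"schooling": 7, "elderly": 10, "income_gap": 40, "rural": 40,
     "healthcare_time": 1.0, "population": 500},
]

scores = rescale_1_to_10([raw_index(c) for c in cells])

# Hotspot: high vulnerability score AND high exposure (thresholds are assumptions).
hotspots = [i for i, (c, s) in enumerate(zip(cells, scores))
            if s >= 7 and c["population"] >= 10000]
print([round(s, 2) for s in scores], hotspots)
```

Because the weights are stored separately from the input data, the same calculation can be rerun unchanged when newer or finer input layers replace the example cells.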

Drivers of Social Vulnerability
The five key drivers of social vulnerability and their effect (i.e., increasing or decreasing) on social vulnerability confirm the relationships assumed in the literature (Table S1 in Supporting Information S1). An interesting insight of this study is that while per capita income, which is often used as an indicator of vulnerability (e.g., Cutter et al., 2003; Meijer et al., 2023; Rufat et al., 2019; Tate, 2013; Yoon, 2012), has been statistically significant in explaining flood fatalities (see Model 2 in Table 2), income inequality between genders seems to play a more important role in driving vulnerability. This insight confirms previous work (e.g., Rufat et al., 2015) that discusses income as a potential proxy for other vulnerability variables that reflect socioeconomic status such as education levels, which we establish as the most influential driver of social vulnerability globally. The high influence of education levels and the gender income gap on social vulnerability established in this study suggests that policies aiming to reduce vulnerability may need to focus on improving access to and quality of education as well as promoting gender equality. These two aspects are already targeted in the UN's 2030 Agenda for Sustainable Development (United Nations, 2015), and are specifically addressed in Sustainable Development Goal (SDG) 4 "Quality education" and SDG 5 "Gender equality."
We emphasize that while the established vulnerability variables significantly contribute to the prediction of flood fatalities, we are not able to establish exact causality in our analysis. Therefore, these variables may be proxies for a variety of other characteristics such as the quality of infrastructure, early warning systems, or the strength of local institutions (Burton, 2015; Rufat et al., 2015, 2019), which we were not able to account for in this study due to a lack of relevant global data. One such example is the walking time to the nearest healthcare facility, which we use as an indicator for healthcare access. Although we based this indicator selection on previous work (e.g., Cutter et al., 2003; Drakes & Tate, 2022), it does not measure which parts of society actually have access to healthcare, but is rather a measure of accessibility of a place in terms of transport infrastructure (Weiss et al., 2020). Therefore, the drivers established in this first-order analysis should be viewed as a first generic set of global vulnerability drivers which can be expanded whenever more detailed data are available (see Section 4.4 for a discussion on data and model uncertainties inherent in global-scale studies).

Comparison to Previous Validation Exercises
While several previous studies validate social vulnerability models using empirical data of past hazard events and their impacts in a spatially explicit manner (i.e., at subnational level), these validation exercises are performed on local to national levels using different dependent variables (e.g., fatalities, damages, migration, people seeking emergency shelters) and hazards (e.g., earthquakes, floods, typhoons) (Bakkensen et al., 2017; Burton, 2015; A. Fekete, 2009; Lloyd et al., 2022; Myers et al., 2008; Rufat et al., 2019; Schmidtlein et al., 2011). The majority of these studies use a top-down validation approach, first developing a social vulnerability index based on a set of indicators, followed by validation of the developed SoVI using past hazard events. One study uses a bottom-up validation approach like the one used in this study: Burton (2015) determines a set of potential variables to characterize disaster resilience, and then establishes those that are significant in predicting disaster recovery. Due to their different validation approaches, regression model specifications, and spatial foci, the results of previous studies are not directly comparable to our results. Nonetheless, we see model fits similar to ours, with R² of, for example, 0.2 (Myers et al., 2008), 0.26 (Rufat et al., 2019), 0.27 (Bakkensen et al., 2017), and 0.28 (Burton, 2015). Therefore, we are confident that the results of this study provide a valuable step forward in global-scale social vulnerability assessments.

Data and Model Uncertainties
Due to the global focus of the study, we rely on the integration of a diverse range of data sets, some of which are available at coarse spatial resolution only (Table 1). Even within one data set, spatial units can differ substantially across countries, resulting from differences in the definition of census units, with high spatial detail in countries like France and the USA and low detail for instance in Egypt and Libya (e.g., Figure 2c). These differences are prone to the so-called Modifiable Areal Unit Problem (MAUP) (Openshaw, 1983), which often leads to statistical bias in spatial analysis, as aggregated spatial units smoothen deviations at the local scale (Gao & O'Neill, 2021). For example, the share of elderly (Figure 2c) is unlikely to be 0% or 100% if the spatial units have low spatial detail while these shares may occur in units with high spatial detail, thereby resulting in more spatially diverse vulnerability patterns (see, e.g., Australia vs. Italy in Figure 2a). Furthermore, the lack of relevant global data sets of potential vulnerability drivers constrains our analysis to a selected set of drivers; therefore, we possibly miss relevant drivers of social vulnerability.
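The smoothing effect described for the MAUP can be made concrete with a small Python sketch. The eight fine-scale units and their counts are invented for illustration, and the grouping into two coarse units of four is an arbitrary assumption; the point is only that population-weighted aggregation pulls extreme shares toward the middle.

```python
# Invented fine-scale units: (elderly residents, total residents).
fine_units = [
    (0, 50), (10, 40), (30, 30), (5, 80),
    (20, 20), (2, 60), (0, 10), (15, 45),
]

def share(units):
    """Share of elderly across one or more units (population-weighted)."""
    return sum(e for e, _ in units) / sum(t for _, t in units)

# Shares at high spatial detail: extremes such as 0% and 100% occur.
fine_shares = [e / t for e, t in fine_units]

# Aggregate every four neighbouring units into one coarse unit.
coarse_units = [fine_units[i:i + 4] for i in range(0, len(fine_units), 4)]
coarse_shares = [share(group) for group in coarse_units]

print("fine  :", [round(s, 3) for s in fine_shares])
print("coarse:", [round(s, 3) for s in coarse_shares])
```

In this toy case the fine units span the full range from 0 to 1, whereas both coarse units end up close to one quarter, which mirrors the contrast between finely and coarsely reported countries noted above.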
By calculating the GlobE-SoVI based on the results of one regression model for the entire globe, we are not able to account for differences across countries, in terms of vulnerability drivers and their importance. As social vulnerability is considered to be location-specific, the literature recommends adjusting the SoVI to the local context (Burton, 2015; Cutter et al., 2016; A. Fekete, 2009; Lloyd et al., 2022). While we considered estimating the model separately for different income groups, assuming that social vulnerability drivers differ depending on the income level, the number of reported flood events (n = 913) was too small to estimate robust sub-models. Estimating sub-models would also allow for using different impact metrics for each sub-model, based on the assumption that the manifestation of flood impacts differs across income groups. For example, while fatalities may be a suitable impact metric in low-income countries, other impact metrics such as property losses may be more relevant in high-income countries (Bakkensen et al., 2017; J. Birkmann et al., 2022). Indeed, previous work has used different impact metrics to assess and validate social vulnerability indices such as damages (Bakkensen et al., 2017; Schmidtlein et al., 2011); net migration (Myers et al., 2008); disaster assistance claims (Bakkensen et al., 2017; A. Fekete, 2009; Rufat et al., 2019); the number of evacuations (A. Fekete, 2019; Nicholson et al., 2019); or people seeking emergency shelters (A. Fekete, 2009, 2019). However, as these are local- to national-scale studies, such impact metrics are more readily available than at the global scale, where consistent reporting of impacts caused by natural hazards is limited (a notable exception being the Emergency Events Database, EM-DAT (EM-DAT, 2021)). Therefore, previous research has called for developing a database that consistently reports impacts (including fatalities) per event, also accounting for natural hazards other than flooding (Cutter, 2017). Such an initiative may lead to more reliable numbers of reported flood fatalities. The reported fatalities seemed questionable for some events included in the Global Flood Database (GFD) (e.g., 100,000 fatalities), potentially driven by the fact that the data underlying the GFD largely rely on news reports (Brakenridge, 2023). Moreover, provided that spatial footprints of reported events are available, such a database would allow for establishing social vulnerability drivers to other natural hazards in addition to flooding, which is particularly relevant in the growing field of multi-risk research (Drakes & Tate, 2022; Ward et al., 2022).

Conclusions
This study advances social vulnerability assessments at global scale, by empirically establishing five key drivers of social vulnerability based on past flooding events, and combining these drivers to develop a global map of social vulnerability to flooding (i.e., the GlobE-SoVI). Through estimating a multiple linear regression model, we identify five vulnerability variables most significant for predicting flood fatalities in addition to event duration and population exposure, which improve the model fit substantially, almost doubling the adjusted R² from 0.141 to 0.278. Using the regression coefficients, we calculate the GlobE-SoVI as a weighted sum of the five vulnerability variables. The spatial patterns of social vulnerability differ markedly within and across countries, mostly driven by education levels (i.e., mean years of schooling), followed by income inequality (i.e., gender income gap); age (i.e., share of elderly); healthcare access (i.e., walking time to nearest healthcare facility); and rural settlements.
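For reference, the adjusted R² quoted above penalizes the ordinary R² for the number of predictors, which is why it is a fair basis for comparing the smaller and larger model. The standard textbook definition is given below; here n denotes the number of events (913 in this study) and p the number of predictors excluding the intercept. Reading p as 2 for the model with duration and population only, and 7 for the model with the five added vulnerability variables, is an assumption based on the description in the text, not a value stated by the authors.

```latex
\bar{R}^{2} \;=\; 1 - \left(1 - R^{2}\right)\frac{n - 1}{n - p - 1}
```

With n much larger than p, the correction factor stays close to 1, so the reported near-doubling reflects a genuine gain in explained variance rather than an artifact of adding variables.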
Future work can refine this first-order assessment by addressing the uncertainties inherent in this study. First, it is worth exploring whether better model fits can be attained when estimating different sub-models per income group or world region; assessing impact metrics other than fatalities (e.g., migration, evacuations); or using other modeling approaches. Furthermore, follow-up research can carry out similar analyses for other hazards such as earthquakes, landslides, or heatwaves, and the combination of multiple hazards, and assess whether and how the established social vulnerability drivers and their importance differ across hazards. Similarly, accounting for temporal dynamics in social vulnerability, which can change due to the impacts of a hazard event (e.g., de Ruiter & van Loon, 2022), but also due to future changes in socioeconomic development under a range of scenarios (e.g., the Shared Socioeconomic Pathways, SSPs (O'Neill et al., 2017)), deserves further attention. Most of these issues depend on the availability of high-quality and spatially explicit data, which are currently limited at this scale of analysis, calling for mainstreaming the reporting of impacts caused by natural hazards at larger scales, including their spatial footprints; characteristics of the affected population; and a wide range of impact metrics such as fatalities, displacements, and evacuations. Once more detailed data become available (see Mester et al., 2023 for a recent example), the results of this work can be refined as the model code and corresponding data sets produced in this study are publicly available, making the calculations of the GlobE-SoVI fully reproducible. Despite these needs for future research, our analysis helps to develop a more in-depth understanding of the characteristics that drive social vulnerability globally, along with their spatial distributions. Therefore, the results of this study can support decision-making in developing strategies that reduce social vulnerability to flooding, for instance related to spatial planning, socioeconomic development, and adaptation planning. The GlobE-SoVI constitutes an important step forward in climate risk assessments as it is the first social vulnerability model that can be consistently applied at the global scale, thereby bridging the social and natural sciences, which has been a major limitation in social vulnerability research (de Sherbinin et al., 2019).

Figure 2. GlobE-SoVI (a) and its five input variables (b-f) per administrative unit. (a) GlobE-SoVI score; (b) education levels in mean years of schooling; (c) percent elderly (age 65+); (d) gender income gap in percent; (e) percent rural settlements (in 2015); (f) walking time to the nearest healthcare facility in hours. Gray = no data. See Table 1 for data sources of panels (b-f).

Figure 3. Relative contribution (in %) of each social vulnerability variable to the GlobE-SoVI per world region (based on UN regions). Note: Higher education levels reduce vulnerability, while all other variables increase vulnerability (Table 2).
If perfectly collinear variables still remained, we dropped those leading to a lower model fit.
2. Stepwise selection: We ran a stepwise regression algorithm based on all preselected variables, using the "stats" package in R. The algorithm drops or adds variables to the model per step, searching for the best model fit based on the Akaike information criterion (AIC) (R Core Team, 2022).
3. Statistical significance: We analyzed the p values of the model variables established by the stepwise selection algorithm and removed variables with low statistical significance (p > 0.05) one-by-one, starting with the highest p value.
4. Collinearity testing: We tested the remaining variables for multicollinearity based on the variance inflation factor (VIF) test. Aiming for a VIF < 5 to ensure low collinearity, we removed the respective variables one-by-one, starting with the highest VIF.
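The collinearity step above can be sketched in a few lines. The following Python example is an illustration under stated assumptions, not the authors' R implementation: the three synthetic columns `a`, `b`, and `c` are invented, and the helper names (`solve`, `r_squared`, `vif`, `drop_collinear`) are hypothetical. It computes each variable's VIF as 1 / (1 − R²) from regressing that variable on the others, then removes the variable with the highest VIF until all values fall below 5.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(rows, y):
    """R² of an OLS fit of y on the given predictor rows (with intercept)."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - sum(b * xi for b, xi in zip(beta, x))) ** 2
                 for x, yi in zip(X, y))
    return 1.0 - ss_res / ss_tot

def vif(columns):
    """VIF per variable; assumes no perfectly collinear columns remain."""
    out = {}
    for name, target in columns.items():
        others = [v for k, v in columns.items() if k != name]
        out[name] = 1.0 / (1.0 - r_squared(list(zip(*others)), target))
    return out

def drop_collinear(columns, threshold=5.0):
    """Remove the highest-VIF variable one at a time until all VIFs < threshold."""
    cols = dict(columns)
    while len(cols) > 1:
        scores = vif(cols)
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        del cols[worst]
    return cols

# Synthetic predictors: c is nearly a copy of a, b is independent.
random.seed(1)
a = [random.gauss(0, 1) for _ in range(500)]
b = [random.gauss(0, 1) for _ in range(500)]
c = [ai + random.gauss(0, 0.1) for ai in a]

kept = drop_collinear({"a": a, "b": b, "c": c})
print(sorted(kept), {k: round(v, 2) for k, v in vif(kept).items()})
```

One of the two near-duplicate columns is dropped and the independent column survives, which is the behaviour the VIF < 5 criterion is meant to enforce.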

Table 2
Regression Results for Different Model Configurations Including the Final Selected Model 3