Integration of catastrophe and entropy theories for flood risk mapping in peninsular Malaysia

A major challenge in flood mapping using multi-criteria decision analysis (MCDA) is the selection of the flood risk factors and the estimation of their relative importance. A novel MCDA method, which integrates two state-of-the-art MCDA methods based on catastrophe and entropy theories, is proposed for mapping flood risk in Peninsular Malaysia, an area very susceptible to flooding events. A literature review was undertaken to identify the various socioeconomic, physical and environmental factors that can influence flood vulnerability and risk. A set of variables was selected using an importance index developed from a questionnaire survey. Population density, percentage of vulnerable people, household income, local economy, percentage of foreign nationals, elevation and forest cover were all deemed highly relevant in mapping flood risk and determining the zones of maximum vulnerability. Spatial integration of factors using the proposed MCDA revealed that coastal regions are highly vulnerable to floods when compared to inland locations. Flood risk maps indicate that the northeastern coastal region of Malaysia is at greatest risk of flooding. The prediction capability of the integrated method was found to be 0.93, which suggests good accuracy of the proposed method in flood risk mapping.


| INTRODUCTION
Extreme rainfall-driven events such as floods or droughts have increased in severity and frequency in many regions (Nashwan, Ismail, & Ahmed, 2019a; Sediqi et al., 2019). In many parts of the world, floods have had devastating impacts in terms of loss of life and property (Dewan, 2013a; Nashwan, Shahid, & Wang, 2019b; Pour, Bin Harun, & Shahid, 2014; Yaseen et al., 2019). To ameliorate the risks that flooding events pose to the development of a region, it is important to define the spatial distribution of at-risk locations, particularly in the context of ongoing climate change (Pour, Wahab, Shahid, Asaduzzaman, & Dewan, 2020; Santos & Reis, 2018).
Tropical regions tend to be more susceptible to changes in climate (Noor, Ismail, Shahid, Nashwan, & Ullah, 2019; Rahman et al., 2019; Shahid et al., 2017), and thus more susceptible to climate variability (Mishra & Liu, 2014). Recent studies have reported the adverse consequences for the societies and economies of tropical regions arising from the increasing frequency and severity of weather extremes (Noor et al., 2019; Noor, Ismail, Chung, Shahid, & Sung, 2018; Sa'adi, Shahid, Ismail, Chung, & Wang, 2017; Shahid et al., 2016; Wong, Yusop, & Ismail, 2018). Malaysia, located in the tropics, is one country which has experienced climate and hydrological extremes in recent years (Khan et al., 2019; Mayowa et al., 2015; Sa'adi et al., 2017). The impacts of extreme rainfall and monsoonal rain-driven floods are increasingly evident in this region (Nashwan, Ismail, & Ahmed, 2018a). The flood in December 2014 affected thousands of people and resulted in huge economic loss to the country. As a result of these types of events, the development of flood management processes has been proposed to mitigate the negative impacts on people and the economy (Salarpour, Yusop, Yusof, Shahid, & Jajarmizadeh, 2013).
Extreme rainfall is generally considered to be the major driver of flooding in Malaysia. Various physical and socioeconomic factors, however, can also amplify the impact of these flood events. Alias et al. (2019) identified forest cover, elevation, and population density as having a great influence on the spatial variability of flood impacts. In assessing the flood risk of a region, it is important that physical and social factors be considered (Rahman et al., 2019).
Numerous studies on flood vulnerability and risk mapping have been conducted in recent years (Chen et al., 2014; Dano et al., 2019; Dewan & Yamaguchi, 2008; Elsheikh, Ouerghi, & Elhag, 2015; Feloni, Mousadis, & Baltas, 2019; Jato-Espino, Lobo, & Ascorbe-Salcedo, 2019; Matori, Lawal, Yusof, Hashim, & Balogun, 2014; Nigussie & Altunkaynak, 2019; Pradhan & Youssef, 2011). In general, different factors can be considered in a multi-criteria decision analysis (MCDA) system. The major challenges in developing flood risk maps using MCDA are the selection of indicators and the weighting of the factors according to their importance in defining flood risk. A large number of physical, environmental and socio-economic factors are typically responsible for shaping the vulnerability of an area (Cutter et al., 2008; Dewan, 2013a; Dewan, 2013b; Rahman et al., 2019). Many criteria for the selection of indicators have been proposed in the literature, including their availability, measurability, practicality, relevance, and degree of responsiveness and sensitivity (Alamgir et al., 2019; Nashwan, Shahid, Chung, Ahmed, & Song, 2018b; Yli-Viikari et al., 2007). The selection criteria for indicators should be based on the specific characteristics of the study area and the research questions to be answered.
Various knowledge-based and data-driven MCDA methods have been proposed for the mapping of risk associated with natural hazards such as flooding (Dewan, 2013a; Dewan, 2013b). In a knowledge-based method, the perceived influence of factors on flood susceptibility is based on the opinion of the decision-makers. The outcomes of a knowledge-based MCDA are therefore prone to bias arising from personal preferences (Ahmed et al., 2015; Nashwan et al., 2018b). This limits their applicability in many cases, particularly with regard to risk mitigation decision-making. The data-driven method attempts to overcome this drawback by assigning weightings to the different factors based on the properties of the data itself (Ahmed, Shahid, & Nawaz, 2018; Alamgir et al., 2019). For this reason, the data-driven MCDA approach is often preferred for flood risk mapping.
Catastrophe and entropy theories are two such MCDA methods which have been found to be highly effective in the modelling and mapping of different natural hazards (Agarwal et al., 2016; Ahmed et al., 2015; He-Hua, Jian-Jun, Xiao-Yan, & Ye, 2018; Singh, Jha, & Chowdary, 2018; Zhou, Ma, Chen, Wu, & Luo, 2018). Catastrophe theory was developed to characterise discontinuous dynamic systems where changes are abrupt. It is an objective method that estimates the importance of factors based on their internal structure, and thus assists in avoiding human bias in the decision-making process. Entropy is a measure of the uncertainty of a random variable. It can be used to evaluate how the controlling factors influence the outcome, e.g., how different socio-economic factors govern flood susceptibility. Therefore, like catastrophe theory, it can be used to assign weights without the input of expert opinion. Both methods, however, have inherent advantages and disadvantages when determining indicator weights. The weight assigned to an indicator by catastrophe theory is often influenced by the number of groups into which the indicator has been classified (Cui & Singh, 2015). Therefore, indicator weights are partly influenced by human judgement (Al-Abadi, Shahid, & Al-Ali, 2016). This shortcoming can be avoided using an entropy-based weighting method (Castillo, Castelli, & Entekhabi, 2015; Tang & Wang, 2013). Integration of these two theories can make the weighting approach more robust when assessing the risks associated with flood events.
A data-driven MCDA approach, integrating both catastrophe and entropy theories, is proposed in this study in order to provide an unbiased evaluation of the spatial pattern of flood risk in Peninsular Malaysia. The study considers flood risk as a system consisting of different subsystems, each of which can be evaluated using indicators. Catastrophe theory was used to assign the ranks of the indicators of the subsystems, while entropy theory was used to assign weights to the different subsystems. The methodology proposed in this study can be employed for systematic evaluation of flood vulnerability factors, and to evaluate natural hazard risk in other regions.
(Figure 1). The topography consists of an irregular, inland mountainous region surrounded by shorelines, notably around the Peninsula. It is situated within a tropical climatic zone with year-round high temperature and humidity. The daily average temperature in the Peninsula varies between 21 and 32 °C. Rainfall in the region is controlled by the interaction between two monsoonal systems and the heterogeneous land and sea surfaces. Most rainfall occurs during the two monsoonal seasons, the northeast monsoon between November and February, and the southwest monsoon between May and August (Muhammad et al., 2019). The northeast monsoon is the more intense of the two systems. Extreme rainfall events can often occur on consecutive days, leading to severe flooding, particularly in the west of Peninsular Malaysia (Nashwan, Ismail, & Ahmed, 2019a).

| Geospatial data
Secondary data were obtained and used in this study. District-level socio-economic data for Peninsular Malaysia were collected from the Statistical Yearbooks of Malaysia (DOSM, 2018). The flood hazard map prepared by the Department of Irrigation and Drainage (DID) in 2016 using long-term historical inundation data was collected and digitised. Maps of various physical factors related to flood vulnerability (such as forest cover) were generated from existing land use maps of 2018. An elevation map was produced from an Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) digital elevation model (DEM) (https://earthexplorer.usgs.gov/).

FIGURE 1 The geographical location and topography of Peninsular Malaysia

| METHODOLOGY
3.1 | Importance index for identification of flood vulnerability factors
Flood vulnerability factors were first identified through a literature survey. The identified factors were carefully checked, and those found relevant to the study area were selected. A questionnaire survey was conducted among the various stakeholders in order to rank the factors according to their importance in defining flood vulnerability in Peninsular Malaysia. A non-random judgmental procedure was followed to select the samples from academics, disaster management experts, local councillors and people involved in emergency management (e.g., relief and rescue operations). Judgmental sampling was used because only a limited number of people possess knowledge of the different factors affecting flood vulnerability. A total of 50 respondents were selected. Attention was given to maintaining homogeneity across the different groups. The age of the respondents ranged between 36 and 54 years, with a median age of 44 years. A structured questionnaire was distributed to the selected respondents. All of them participated voluntarily in the survey and returned their responses within a week.
Respondents were asked to rank the factors on a scale of 1 to 3. A wider scale (e.g., 1 to 5) does not significantly change the ranking of factors, but does have the potential to confuse the respondents, so a scale of 1 to 3 was used. The responses were used to rank the factors using the Importance Index (Lim & Alum, 1995) defined in Equation (1), where n1, n2 and n3 are the total numbers of responses of 1, 2 and 3, respectively, provided by the respondents during the questionnaire survey. The values of the importance index range from 0 to 1, where 1 indicates highly important, and 0 indicates no importance. The factors are usually ranked according to the importance index, and the top-ranked factors are considered to be the most useful factors to include in the development of flood vulnerability maps. Usually, the first few top-ranked factors are selected, but the choice of cut-off tends to be subjective and can be biased towards a modeller's preference. This potential lack of objectivity can be overcome through the use of a data classification method known as Jenks optimization (Jenks, 1967). This method classifies the importance index values according to their variance. The classification is derived in such a way that the variance in the importance index within a class is minimised, while the variance among the classes is maximised. Factors that were placed in the top class by the Jenks optimization tool were considered for inclusion in the flood vulnerability mapping work.
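The expression for the importance index is not reproduced in this text, so the following is only a minimal sketch of how such an index could be computed. It assumes a commonly used score-weighted form in which a response of 3 denotes the highest importance and the index is the share of the maximum attainable score. The function name, the weights and the response counts are all hypothetical and may differ from the form given by Lim and Alum (1995).

```python
def importance_index(n1, n2, n3):
    """Assumed form: score-weighted share of the maximum attainable score.

    n1, n2, n3 are the numbers of respondents answering 1, 2 and 3.
    ASSUMPTION: 3 is the most important rating; the published formula may differ.
    """
    total = n1 + n2 + n3
    if total == 0:
        raise ValueError("at least one response is required")
    return (1 * n1 + 2 * n2 + 3 * n3) / (3 * total)


# Hypothetical response counts (n1, n2, n3) from 50 respondents per factor.
responses = {
    "population density": (2, 8, 40),
    "forest cover": (10, 20, 20),
    "road density": (30, 15, 5),
}

# Rank factors from most to least important.
ranked = sorted(responses, key=lambda f: importance_index(*responses[f]), reverse=True)
for factor in ranked:
    print(factor, round(importance_index(*responses[factor]), 3))
```

Under this assumed form the index reaches 1 only when every respondent gives the top rating, which is consistent with the upper bound described in the text.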

| Integration of catastrophe and entropy theories
Using catastrophe theory, flood vulnerability is considered to comprise several subsystems, each of which can be evaluated based on one or more criteria or indicators. The values of all indicators are first normalised between 0 and 1, with 1 indicating high vulnerability to flooding (e.g., high population density indicates high vulnerability to flood). Equation (2) is used for indicators whose higher values indicate lower vulnerability (e.g., more forest cover indicates less vulnerability to flooding) and Equation (3) is employed for indicators whose higher values indicate higher vulnerability. In the normalisation formulae (Wang, Liu, Zhang, & Chen, 2011), X is the indicator value, and a1 and a2 are the minimum and maximum values of the indicator. The catastrophe fuzzy membership functions are then used to assign a ranking to each indicator. This helps in removing incompatibility issues between the initial indicator values (Ahmed et al., 2015; Wang et al., 2011). There are seven catastrophe models that can be used for the estimation of catastrophe fuzzy membership functions, depending on the number of indicators of a subsystem. The catastrophe models and the formulae used for estimation of the membership function or rating of each indicator are shown in Table 1, in which a represents the state variable and u, v, w, x are control variables. The state variable is related to the control variables based on the different catastrophe models. a_i represents the catastrophe fuzzy membership function of the control variable i, where i can be u, v, w or x depending on the model. Details of the models and estimation of catastrophe fuzzy membership functions can be found in Wang et al. (2012).
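Because Equations (2) and (3) and Table 1 are not reproduced in this text, the sketch below illustrates the two steps under stated assumptions. The normalisation is taken to be ordinary min-max scaling, reversed for indicators where larger values mean lower vulnerability. The membership rule uses the exponents commonly quoted in the catastrophe progression literature (1/2, 1/3, 1/4, ... for the first, second, third, ... control variable, as in the fold, cusp, swallowtail, butterfly and wigwam models). The function names are hypothetical and the rule may not match Table 1 exactly.

```python
def normalise(x, a1, a2, higher_is_more_vulnerable=True):
    """Min-max scaling to [0, 1], where 1 means high vulnerability.

    a1 and a2 are the minimum and maximum of the indicator.
    ASSUMPTION: this is the standard form consistent with the text.
    """
    if a2 == a1:
        raise ValueError("a1 and a2 must differ")
    scaled = (x - a1) / (a2 - a1)
    return scaled if higher_is_more_vulnerable else 1.0 - scaled


def catastrophe_membership(controls):
    """Assumed progression rule: k-th control variable -> value ** (1 / (k + 1)).

    controls holds normalised values in [0, 1], ordered by importance
    (u, v, w, ...), giving u**(1/2), v**(1/3), w**(1/4), ...
    """
    if any(not 0.0 <= c <= 1.0 for c in controls):
        raise ValueError("control variables must be normalised to [0, 1]")
    return [c ** (1.0 / (k + 2)) for k, c in enumerate(controls)]


# Hypothetical indicator values.
density = normalise(150.0, 100.0, 300.0)                                  # 0.25
forest = normalise(150.0, 100.0, 300.0, higher_is_more_vulnerable=False)  # 0.75
print(catastrophe_membership([density, forest]))
```

The ordering of the control variables matters here, since the first variable receives the square root and later variables receive progressively higher roots.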
The entropy method is used to estimate the weight of each subsystem. If the number of subsystems is m and the number of indicators of a subsystem is n, the eigenvalue matrix Y is constructed from the normalised values of the indicators (Equation 4; Chen & Li, 2010). The matrix is used to calculate the ratio index (Equation 5), which is then used to estimate the entropy (Equation 6; Amiri et al., 2014). The weight of each subsystem is finally calculated using Equation (7).
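Equations (4) to (7) are not reproduced in this text. As an illustration only, the sketch below follows the standard entropy weight method, which matches the verbal description: each value is divided by the column sum to give a ratio, the ratios give an entropy scaled to [0, 1], and the weights are proportional to one minus the entropy. The layout of the matrix (rows as classes or samples, columns as the items being weighted) and the function name are assumptions.

```python
import math


def entropy_weights(Y):
    """Standard entropy weight method (assumed to match Equations 5 to 7).

    Y[i][j] is the non-negative value in row i for column j.
    Returns one weight per column; the weights sum to 1.
    """
    m = len(Y)
    if m < 2:
        raise ValueError("at least two rows are required")
    n = len(Y[0])
    k = 1.0 / math.log(m)  # scales entropy to the range [0, 1]
    divergences = []
    for j in range(n):
        column = [row[j] for row in Y]
        total = sum(column)
        if total <= 0:
            raise ValueError("each column must have a positive sum")
        ratios = [v / total for v in column]  # ratio index
        entropy = -k * sum(p * math.log(p) for p in ratios if p > 0)
        divergences.append(1.0 - entropy)
    spread = sum(divergences)
    if spread <= 0:
        raise ValueError("all columns are uniform; weights are undefined")
    return [d / spread for d in divergences]


# Hypothetical matrix: three rows, two columns.
Y = [[0.1, 0.3], [0.2, 0.3], [0.7, 0.4]]
print(entropy_weights(Y))
```

A column whose values are spread unevenly has lower entropy and therefore receives a larger weight, which is the behaviour the method relies on to weight subsystems without expert input.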

| Computation of flood vulnerability and risk
Socio-economic and physical factors were subsequently integrated to estimate the flood vulnerability index (FVI) (Balica, Douben, & Wright, 2009) using Equation (8), where F represents a flood vulnerability factor, N is the number of factors, w indicates the weights of the factors and r is the rank of different values.
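Equation (8) is not reproduced in this text. A plausible reading of the variable definitions is a weighted linear combination of the factor ranks, and the sketch below assumes that form. The function name and the example numbers are hypothetical, and the published equation may include a different scaling.

```python
def flood_vulnerability_index(weights, ranks):
    """Assumed form of the FVI: sum of w_i * r_i over the N factors.

    weights: entropy-based weight of each factor.
    ranks: catastrophe-based rank of the class a location falls in, per factor.
    """
    if len(weights) != len(ranks):
        raise ValueError("weights and ranks must have the same length")
    return sum(w * r for w, r in zip(weights, ranks))


# Hypothetical values for one district with three factors.
fvi = flood_vulnerability_index([0.5, 0.3, 0.2], [0.8, 0.4, 0.6])
print(round(fvi, 3))
```

In a GIS workflow the same sum would be evaluated for every district or raster cell to produce the vulnerability map.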

| Assessment of model performance
The performance of the proposed method was evaluated using the receiver operating characteristic (ROC) curve.
The ROC considers only two classes (A and B) to validate the model, so each prediction has one of four possible outcomes. If the method correctly identifies a flood zone, the outcome is a true positive (TP); if it misses a flood zone, the outcome is a false negative (FN). Similarly, if the method correctly identifies a non-flood zone, the outcome is a true negative (TN); if it labels a non-flood zone as a flood zone, the outcome is a false positive (FP). In the ROC curve (Huang & Ling, 2005), the TP rate is plotted against the FP rate and the area under the curve (AUC) is then estimated to define model accuracy. The AUC of the ROC curve is widely used in evaluating the performance of a classification model. It provides a measure of model capability for identifying different classes. ROC analysis draws on different measures of model performance such as sensitivity, TP/(TP + FN), specificity, TN/(TN + FP), and false alarm rate, FP/(TN + FP). The AUC-ROC is considered to be a composite metric for the reliability estimation of different properties of the classification model performance (Huang & Ling, 2005).
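The AUC expression is not reproduced in this text. As a minimal sketch, the code below computes the rates named above from confusion-matrix counts and estimates the AUC through its rank interpretation: the probability that a randomly chosen flood location receives a higher score than a randomly chosen non-flood location, with ties counted as one half. This equals the area under the curve of TP rate against FP rate. The function names and the sample data are hypothetical.

```python
def confusion_rates(tp, fp, tn, fn):
    """Sensitivity, specificity and false alarm rate from outcome counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_alarm_rate": fp / (fp + tn),
    }


def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for observed flood locations, 0 otherwise.
    scores: modelled risk values for the same locations.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation sample of six locations.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(roc_auc(labels, scores))
print(confusion_rates(tp=8, fp=1, tn=9, fn=2))
```

The pairwise form is quadratic in the sample size, which is acceptable for district-level validation; a sorted-rank implementation would be preferable for large rasters.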

| Identification of flood vulnerability factors
A large number of factors related to flood vulnerability in Peninsular Malaysia have been documented in various studies (Alias et al., 2019; Dano et al., 2019; Elsheikh et al., 2015). A total of 19 factors were identified from the literature review based on availability, measurability and sensitivity. These factors are given in Table 2. A questionnaire survey was conducted among the stakeholders (50 individuals in total) to rank these 19 variables based on their importance and relevance to local conditions. The responses were then used to calculate the importance index of each factor. The estimated importance and the rank of each factor, based on the importance index, are given in Table 2. The ranking of different factors was classified using the Jenks optimization method (Table 3). The first column of Table 3 shows the class and the second column shows the factors belonging to that class. The seven top factors identified are: (1) population density; (2) percentage of vulnerable people; (3) elevation; (4) Gini coefficient; (5) percentage of foreign nationals; (6) household income; and (7) forest cover. These factors were considered important, in the Malaysian context, for delineating flood risk and were carried forward for further study.
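The classification step can be illustrated with a small dynamic-programming sketch of the natural-breaks criterion: sorted values are split into contiguous classes so that the total within-class squared deviation is as small as possible. This is not the specific tool used in the study; the function name and the sample index values are hypothetical.

```python
def jenks_classes(values, k):
    """Split values into k contiguous classes minimising within-class
    squared deviation from the class mean (natural-breaks criterion)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums allow the squared deviation of data[i:j] in constant time.
    s1, s2 = [0.0], [0.0]
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    cut[c][j] = i

    # Walk the stored cut points backwards to recover the classes.
    classes, j = [], n
    for c in range(k, 0, -1):
        i = cut[c][j]
        classes.append(data[i:j])
        j = i
    return classes[::-1]


# Hypothetical importance index values for eight factors.
index_values = [0.92, 0.90, 0.88, 0.71, 0.69, 0.50, 0.48, 0.47]
for group in jenks_classes(index_values, 3):
    print(group)
```

Factors falling in the highest class would then be the ones carried forward, which replaces an arbitrary "top few" cut-off with one determined by the data.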

| Spatial distribution of flood vulnerability
Data for the flood vulnerability factors identified for Peninsular Malaysia were used to prepare flood maps. The spatial distribution of the factors identified as most important in assessing flood vulnerability in Peninsular Malaysia is presented in Figure 2. The values of population density, percentage of vulnerable people in the total population, Gini coefficient, household income and percentage of foreign population in the total population were divided into five classes using the natural breaks algorithm. Most of the land of Peninsular Malaysia is situated at elevations below 200 m, so elevation data below 200 m were classified into four equal classes when preparing the elevation map. Figure 2 shows the high variability in population density which characterises Peninsular Malaysia.

| Spatial distribution of composite flood vulnerability
The ranks of different features of each flood vulnerability subsystem were estimated using the catastrophe theory.
The results are presented in Table 4. The values of each indicator were classified using the Jenks optimization technique. The classified values of the indicators are given in the second column of Table 4. The mean value of each class was used as the initial value for the estimation of the rank of each class, and the mean values were first normalised using Equations (2) and (3). The mean and the normalised values of the different classes of the indicators are given in the third and fourth columns of Table 4. The catastrophe functions were then used for the estimation of the ranks of each class. The Wigwam model (   Table 4. Next, the weight of each subsystem was estimated using the entropy theory (Table 5). The initial values of the different classes of the indicators were used to estimate the ratio of each value using Equation (5). Each value was divided by the sum of all initial values of an indicator. Then the entropy of each class was estimated using Equation (6), the results of which are presented in the fourth column of Table 5. Finally, the weight of each subsystem was estimated using Equation (7). These results are presented in the last column of Table 5.
The estimated weight of different subsystems, and rating of different classes of the indicator of the subsystems, were used in Equation 8 to estimate flood vulnerability.
The vulnerability values ranged from 0.249 to 0.569. The values were then classified into five categories using the Jenks optimization method.
Figure 4 shows a flood hazard map of the area. Recent floods that occurred in Peninsular Malaysia in 1965, 1967, 1971, 1973, 1983, 1988, 1993, 1998, 2000, 2007, 2008, 2009, 2010 and 2014 are taken into account. The map was developed based on inundation areas during historical floods, estimated through ground observation and satellite images. The result indicates that floods mostly occur on the coastal plains. The flood hazard vulnerability is higher in the south, especially in the southeast and southwest coastal areas, while flood vulnerability is lower in the centre of the Peninsula as these areas are generally not exposed to floods.
Figure 5 shows that the highest flood risk zone (0.452-0.569) is located in the northeast coastal region. A large part of the southeast and southwest coasts also has a high flood risk rating (0.389-0.451). In general, it appears that locations with a high flood risk are more prevalent on the east coast than on the west coast.

| Flood risk estimation
There is a general perception that flood risk is much higher in the northeast coastal region (Nashwan, Shahid, et al., 2018b; Pradhan & Youssef, 2011). Floods are also highly devastating when they occur in urbanised areas along the central-western coast. The flood risk map generated in this study was found to match this perception very well. Comparison of the flood risk map with the flood vulnerability factors revealed the causes of high flood risk in the northeast and central-western coasts. In the northeast, a high ratio of vulnerable people, low household income and greater inequality in wealth distribution (i.e., a high Gini coefficient) are major factors contributing to flood risk. On the other hand, high population density together with a high ratio of foreign nationals is associated with high flood risk in the central-western coastal region of Peninsular Malaysia.

| Validation of flood risk map
The general perception regarding flood risk is that devastating flood events occur more frequently, that is, the risk is higher, in the northeast districts of Malaysia. Some districts in the southeast, however, are also severely affected by frequent floods. The AUC ROC tool was used to evaluate whether the flood risk map, developed using the integrated method proposed in this study, was able to map the flood risk zones of Peninsular Malaysia. The results are shown in Table 6 and indicate that the prediction capability of the integrated method is 0.93, reinforcing the suggestion that integrated use of both the catastrophe and entropy methods can provide the locations of flood risk with good accuracy.

| CONCLUSION
A data-driven MCDA approach integrating both catastrophe and entropy theories is proposed in this work, which can provide an unbiased assessment of flood risk distribution in Peninsular Malaysia. This method can also be used for systematic assessment of the factors relevant to flood vulnerability and risk zone delineation. Seven major factors were found to account for flood risk. These were population density, percentage of vulnerable people, household income, economy of the region, percentage of foreign nationals, elevation and forest cover. Using the proposed MCDA technique, this study revealed that coastal regions of Peninsular Malaysia are more vulnerable to floods than inland locations. The highest flood risk was observed on the northeast coast. The efficiency of the proposed method was assessed using the AUC-ROC tool, which indicated an accuracy of 0.93. The spatial variability of flood-susceptible zones, and the factors that influence it, can be used to develop measures necessary for reducing future flood risk in Malaysia. The methodological framework proposed in this study for the reliable mapping of flood risk can be applied elsewhere. Although the proposed method was used to determine flood risk locations, there is opportunity to improve this work. For example, the sensitivity of flood vulnerability factors can be evaluated to understand their relative importance. In addition, the accuracy of the maps generated in this study depends on the quality and resolution of the data. Hence, the effect of uncertainty in determining flood vulnerability and risk can be estimated in future work. Furthermore, data for smaller administrative units (sub-district or council) can be used to map flood risk areas more accurately.

FIGURE 5 Spatial distribution of flood risk

TABLE 6 The performance of integrated catastrophe-entropy method in mapping flood risk

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.