Flood susceptibility mapping and assessment using a novel deep learning model combining multilayer perceptron and autoencoder neural networks

Floods are among the most destructive natural disasters, causing financial damage and casualties worldwide every year. Recently, data‐driven techniques combined with remote sensing (RS) and geographical information systems (GIS) have been widely used for flood susceptibility mapping. This study presents a novel hybrid model combining the multilayer perceptron (MLP) and autoencoder models to produce susceptibility maps for two study areas located in Iran and India. Nine and twelve factors were considered as predictor variables for the Iran and India cases, respectively. The prediction capability of the proposed hybrid model was compared with that of the traditional MLP model using the area under the receiver operating characteristic curve (AUROC). The AUROC values for the MLP and autoencoder‐MLP models were, respectively, 75 and 90% (Iran) and 74 and 93% (India) in the training phase, and 60 and 91% (Iran) and 81 and 97% (India) in the testing phase. The results suggest that the hybrid autoencoder‐MLP model outperforms the MLP model and can therefore be used as a powerful model for flood susceptibility mapping in other studies.

KEYWORDS
deep learning, flood susceptibility, GIS, mapping, multilayer perceptron

1 | INTRODUCTION
Floods are considered one of the most important and destructive natural hazards in Iran, and their frequency and intensity have increased in recent years (Ahmadlou et al., 2019; Arabameri, Rezaei, Cerdà, Conoscenti, & Kalantari, 2019; Khosravi et al., 2019; Rahmati, Pourghasemi, & Zeinivand, 2016; Termeh, Kornejady, Pourghasemi, & Keesstra, 2018). Iran experiences floods of different intensities every year because it has a semi-arid to arid climate with low, mostly showery annual precipitation that is unevenly distributed in space and time (Sharifi Garmdareh, Vafakhah, & Eslamian, 2018). For example, the flood that occurred in 25 Iranian provinces during the first week of March 2019 left at least 19 people dead and caused billions of dollars' worth of damage. Recent observations reveal that changes in the amount and intensity of precipitation vary across regions of the Indian subcontinent because of rising temperatures induced by global warming (Pachauri et al., 2014). Regions that previously received less precipitation now receive more water due to climate change, and the increasing discharge in catchments implies a higher risk of floods (Field, Barros, Stocker, & Dahe, 2012; Pachauri et al., 2014). Hence, flood hazard prediction and zoning in susceptible regions are highly important and can help reduce the damage caused by this phenomenon. This study examines flood susceptibility in two study areas in Iran and India.
In general, studies follow one of two approaches when modelling phenomena such as floods and landslides. The first approach is to build a model and test its performance in various regions and on different data (Shafizadeh-Moghadam, Asghari, Taleai, Helbich, & Tayyebi, 2017). In the second approach, various models are tested in one study area and their performances are compared (Ahmadlou et al., 2019). This study uses the first approach for flood susceptibility mapping.
In the expert-based approach, the opinion of experts is first used, in the form of information layers, to determine the factors that influence flood occurrence (Fernández & Lutz, 2010). Multi-criteria decision-making (MCDM) methods, such as the analytic hierarchy process (AHP), are then used to weight the factors and, finally, the factors are combined using these weights (Souissi et al., 2019). These methods rely mostly on the technical knowledge of the experts and are, therefore, prone to errors (Khosravi et al., 2019). In the data-driven approaches, statistical methods, machine learning, and data mining techniques are applied to the historical floods in a region together with the characteristics of the affected locations, such as their topographical, climatic, and geological characteristics (Kia et al., 2012; Wang et al., 2015). In other words, these methods learn from existing data on where floods occurred in the past and on the characteristics of those locations. Because it is based on historical floods, this approach provides researchers with an accurate tool (Khosravi et al., 2019).
Data-driven approaches have been employed in various studies to prepare flood susceptibility maps (Bui et al., 2016; Bui et al., 2020; Costache et al., 2020c; Khosravi et al., 2019; Kia et al., 2012). With the advancements in machine learning and data mining techniques, more advanced models are continually being proposed in this field, enabling researchers to combine them with GIS and RS for zoning and detection of susceptible regions (Bui et al., 2020). Artificial neural networks (ANNs) are among the most widely used algorithms in various disciplines (Costache & Bui, 2019; Kia et al., 2012; Shi, Wang, Tang, & Zhong, 2020). ANNs are highly powerful tools and have been reported to provide appropriate results in various studies (Kia et al., 2012; Pradhan, 2010). However, a traditional ANN with random initialization can get trapped in local optima. This can be mitigated by using a deep-learning algorithm, the autoencoder neural network, on top of the MLP neural network to obtain a better initialization (Vincent, Larochelle, Lajoie, Bengio, & Manzagol, 2010). An autoencoder improves the accuracy and efficiency of MLP neural networks through a nonlinear mapping that both reduces the dimension of the problem and serves as a feature extraction procedure (Hernández, Sanchez-Anguix, Julian, Palanca, & Duque, 2016; Oliveira, Barbar, & Soares, 2014). The MLP network is then used for prediction and estimation. Hence, the main objective of this study is to obtain flood susceptibility maps using a model combining autoencoders and MLP neural networks.
2 | STUDY AREAS AND DATA SET

2.1 | The first study area
The first study region is located in Golestan Province, Iran, extending between latitudes 36°27′ and 38°14′ N and longitudes 53°40′ and 56°30′ E (Figure 1). The study area covers 12,050 km², with altitudes ranging from −147 to 3,348 m above sea level and precipitation ranging from 180 to 880 mm. The northern parts of the region have lower rainfall intensity than the southern parts. The north of the region is surrounded by agricultural land, while the south is surrounded by forest. Part of the Alborz mountain range is also located in the south of the region. The terrain can be clearly divided into plains and mountains, and the slope of the land decreases from the heights to the plains. Where the plains meet the foothills of the northern Alborz, severe erosion and dense alluvial deposition have covered part of the older relief with newer sediments, so that it appears as hills only in some places. Deadly floods occurred in this province in 2001, 2002, 2005, and 2019.

2.2 | The second study area
The second study area (Figure 2), which spans parts of the Upper and Lower Ganga basins of the Ganga River Basin (GRB), is one of the most densely populated regions in India (Singh, 1971). The area has an altitude range of 45-96 m above sea level and a precipitation range of 1,001-1,281 mm. It faces devastating floods every year during the monsoon period under the influence of South-West monsoon rainfall (Vittal et al., 2016). The population distribution in the region puts many lives at risk during floods. On average, hundreds of people die or go missing every year in India because of floods, mostly in the GRB. The colossal recurring losses of property and agricultural produce are recorded in various government reports for this study area. The study area is the confluence zone of major rivers (Ghaghara, Gandak, Ganga, Son, and Kosi) and other minor tributaries of the Ganga (Arora, Pandey, Siddiqui, Hong, & Mishra, 2019). Hence, the risk of inundation during the monsoon period is much higher than in other regions of India.
The region experiences four seasons: summer (April-May), monsoon (June-September), post-monsoon (October-December), and winter (January-March) (Dimri et al., 2019). This sub-tropical humid region experiences its highest temperatures from April to July, when recorded maxima range between 35 and 45 °C. The lowest temperatures are recorded in December and January, when they fall to 3-4 °C. On the arrival of the monsoon, in late June and early July, high-intensity rainfall inundates the upper catchments of the GRB; as a result, the lower basins (the study area) experience exceptional flooding during late August and September. River discharge has been observed to rise to 50 to 100 times the average discharge (Shukla & Singh, 2004), causing floods.

2.3 | Data set
A flood inventory map and flood conditioning factors are required for flood susceptibility mapping using data-driven methods. The flood inventory map acts as the target variable to be modelled, and the flood conditioning factors are the independent variables (predictors) used to model it.
The flood inventory map contains the locations of past floods (147 and 300 flood events for Iran and India, respectively). Various methods are available to determine these points, including field observations, satellite images, and Google Earth imagery. The 147 flood points in Golestan Province were recorded by the Golestan Water Resources Organisation. For GRB, Landsat 5 multispectral scanner (MSS) images and the shuttle radar topography mission (SRTM) version 4 digital elevation model (DEM) were used to create flood zones (Table 1). Then, 300 flood points were generated within these zones using the random points tool in a GIS environment. To identify non-flood points, random sampling was first performed in ArcGIS 10.4 and, from the generated points, 300 non-flood points where flooding cannot occur were selected using field surveys, topographic maps, and Google Earth. The same process was used to generate 147 non-flood points in Golestan Province. During the modelling, 70% of the flood and non-flood points were used for training and 30% for testing. Flood occurrence in a region is affected by various factors (Kourgialas & Karatzas, 2017; Talukdar et al., 2020). For Golestan Province, nine factors were selected: altitude, slope, aspect, plan curvature, topographic wetness index (TWI), lithology, distance to drainage, rainfall, and land use. For GRB, 12 factors were selected: altitude, slope, aspect, plan curvature, distance from the river, rainfall, river density, TWI, land use/land cover, distance from roads, soil type, and geomorphology. In both cases, the selection was based on previous studies and data availability. Tables 2 and 3 show the source of the input data, the original format of the source data (vector or raster), the original map scale or spatial resolution of the source data, and the derived map (factor). Altitude is considered one of the important factors in most studies related to flood susceptibility mapping (Costache, 2019; Janizadeh et al., 2019).
In high-altitude regions, flood occurrence is highly unlikely, whereas flat regions have a high potential for flooding (Janizadeh et al., 2019). This factor can be prepared using the DEM. Other topographic factors, such as slope, aspect, plan curvature, and TWI, are also extracted from the DEM. Slope influences flood occurrence through its direct effect on surface runoff: an increase in slope reduces the time available for infiltration, allowing a larger volume of water to enter the river bed and cause flooding (Tehrany, Pradhan, Mansor, & Ahmad, 2015). Aspect and curvature are two other topographic factors considered in this study. Moreover, Equation 1 is used to calculate the TWI, a water-related factor that is highly important in flood occurrence (Pourghasemi, Pradhan, Gokceoglu, & Moezzi, 2012; Ali et al., 2020):
$$\mathrm{TWI} = \ln\left(\frac{A}{\tan\alpha}\right) \qquad (1)$$
In Equation 1, A is the catchment area and α is the slope angle. Rainfall is another influential factor in flood occurrence (Bracken, Cox, & Shannon, 2008). Floods can occur when the amount of water flowing from a catchment exceeds the capacity of its drains. However, flood occurrence due to rainfall also depends on other factors such as land use and land cover, soil type, and the characteristics of waterways, such as their size and shape (Ahmadlou et al., 2019). The geology factor was also used in the modelling because of its direct effect on infiltration and surface runoff. Activities associated with land use (e.g., urban development or deforestation) are among the most important human factors affecting flood occurrence. The effect of this factor can vary from one land use type to another as well as at small, medium, or large scales. For example, lack of vegetation and/or urban growth in a region can lead to floods. The land use/land cover (LULC) map of Golestan Province was prepared with the maximum likelihood (ML) supervised classification technique using a Landsat 8 Operational Land Imager (OLI) satellite image.
For the GRB LULC map, the Climate Change Initiative (CCI) LULC 2008 dataset, which is based on the ML method, was used: the part covering the study area was extracted from this ready-to-use dataset and used as the LULC conditioning factor. Distance to drainage is another important factor affecting flood occurrence (Tehrany, Pradhan, & Jebur, 2013; Wang et al., 2015). Figures 3 and 4 show the factors, along with their categories, that influence flood occurrence for the Iran and India cases, respectively. In this study, ArcGIS and Environment for Visualising Images (ENVI) software were used to prepare the conditioning factors. The modelling process was programmed in the MATLAB software.
TABLE 3 Details of input data and derived data parameters for GRB
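As an illustration of how the TWI in Equation 1 might be evaluated for a single raster cell, the following minimal Python sketch assumes the standard form TWI = ln(A / tan α). The function name `twi` and the `min_slope_deg` clamp (which avoids division by zero on flat cells) are hypothetical choices for this example and are not taken from the study.

```python
import math


def twi(catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index for one cell: ln(A / tan(alpha)).

    catchment_area: upslope contributing area A (assumed > 0).
    slope_deg: slope angle alpha in degrees.
    min_slope_deg: assumed lower bound on the slope so that perfectly
        flat cells do not cause a division by zero (illustrative only).
    """
    if catchment_area <= 0:
        raise ValueError("catchment_area must be positive")
    slope = max(slope_deg, min_slope_deg)
    return math.log(catchment_area / math.tan(math.radians(slope)))


if __name__ == "__main__":
    # Flatter cells with the same contributing area get a higher TWI.
    for slope in (1.0, 10.0, 45.0):
        print(slope, round(twi(100.0, slope), 3))
```

For a fixed contributing area, the index grows as the slope decreases, which matches the expectation that flat, low-lying cells accumulate more water.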
3 | METHODS
Figure 5 shows the stages of the research and the models employed. After preparing the flood inventory map and flood conditioning factors, the frequency ratio (FR) model was used to determine the correlation between flood occurrence and the considered variables. In the next step, two models, namely the MLP and autoencoder-MLP, were used for the preparation of susceptibility maps, and

3.1 | Frequency ratio
The FR quantifies the correlation between flood occurrence and the various factors affecting it (Oh, Kim, Choi, Park, & Lee, 2011). For each class of a variable, the FR is the ratio of the percentage of floods occurring in that class to the percentage of the area covered by that class (Lee & Sambath, 2006). Hence, Equation 2 is used to determine the FR value for each class of the variables (Lee & Sambath, 2006):
$$\mathrm{FR}_i = \frac{N_{pix}(S_i) \big/ \sum_{j=1}^{n} N_{pix}(S_j)}{N_{pix}(N_i) \big/ \sum_{j=1}^{n} N_{pix}(N_j)} \qquad (2)$$
where n is the number of classes for the considered variable, $N_{pix}(S_i)$ is the number of pixels containing floods in the i-th class of the considered variable, and $N_{pix}(N_i)$ is the number of all pixels in that class. A higher FR indicates a stronger correlation between flood occurrence and the respective class and, conversely, a lower ratio suggests a weaker correlation.
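The FR definition above (share of flood pixels in a class divided by the share of all pixels in that class) can be sketched in a few lines of Python. The function name `frequency_ratio` and the pixel counts in the usage example are hypothetical and serve only to illustrate the calculation.

```python
def frequency_ratio(flood_pixels, class_pixels):
    """Return the FR of every class of one conditioning factor.

    flood_pixels[i]: number of flood pixels falling in class i.
    class_pixels[i]: total number of pixels in class i.
    """
    if len(flood_pixels) != len(class_pixels):
        raise ValueError("both lists must have one entry per class")
    total_flood = sum(flood_pixels)
    total_area = sum(class_pixels)
    if total_flood == 0 or total_area == 0:
        raise ValueError("totals must be non-zero")
    ratios = []
    for floods, area in zip(flood_pixels, class_pixels):
        flood_share = floods / total_flood   # share of floods in this class
        area_share = area / total_area       # share of the map in this class
        ratios.append(flood_share / area_share if area_share > 0 else 0.0)
    return ratios


if __name__ == "__main__":
    # Made-up counts: two equally sized classes, 30 vs 10 flood pixels.
    print(frequency_ratio([30, 10], [500, 500]))
```

An FR above 1 means the class contains more floods than its share of the area would suggest; an FR below 1 means it contains fewer.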

3.2 | Multilayer perceptron
ANNs are considered among the most widely used and most accurate machine learning techniques in various fields and are highly capable of modelling nonlinear relationships between the target variable and the explanatory variables (Kia et al., 2012). An MLP neural network is composed of a single input layer, multiple hidden layers, and a single output layer. Each of these layers is made up of several neurons, the smallest information processing units (Jain, Mao, & Mohiuddin, 1996; Zurada, 1992). In these networks, the output of the first layer (the input layer) is used as the input to the next layer (a hidden layer). This continues through the following layers until the outputs of the last hidden layer are fed to the output layer as its inputs. The MLP includes a set of weights that must be tuned during training. The back-propagation (BP) method is commonly used to train MLP networks (Jain et al., 1996). This algorithm randomly selects the initial weights and biases and compares the output computed by the network with the real values. The difference between the computed and real outputs is measured using a criterion such as the root-mean-square error (RMSE) or mean square error (MSE), after which the network weights are updated based on the delta rule. Hence, the overall network error is distributed among the various nodes in the network (Jain et al., 1996). This process continues until the error reaches a stable level. The MLP model specifications in this study are as follows. A total of 4 fully connected layers were used in a sequential model, such that any given neuron in each layer is connected to all neurons in the next layer (see, for example, Golestan Province in Figure 6). Of these 4 layers, 3 were used for data processing, and the last layer was used for prediction.
A total of 15, 10, 5, and 1 neurons were used in the first, second, and third hidden layers and in the fourth (output) layer, respectively. The rectified linear unit (ReLU; Nair & Hinton, 2010) was applied to all 4 layers as the activation function after processing. The ReLU is defined in Equation 3 (Nair & Hinton, 2010):
$$f(x) = \max(0, x) \qquad (3)$$
This function is nonlinear and provides the same benefits as the sigmoid but with better performance (Zeiler et al., 2013). After the training run is finished, the MLP network is applied to the test data to assess its accuracy.
FIGURE 4 Flood conditioning factors for GRB. GRB, Ganga River Basin
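The layer layout described above (15, 10, 5, and 1 neurons, each followed by ReLU) can be illustrated with a forward pass written in plain Python. This is only a sketch under stated assumptions: the helper names (`build_mlp`, `forward`), the uniform random initialisation range, and the zero biases are hypothetical, and training by back-propagation is deliberately left out.

```python
import random


def relu(x):
    """Rectified linear unit: max(0, x)."""
    return x if x > 0.0 else 0.0


def build_mlp(n_inputs, sizes=(15, 10, 5, 1), seed=0):
    """Create randomly initialised (weights, biases) pairs, one per layer.

    The initialisation range is an assumption made for illustration.
    """
    rng = random.Random(seed)
    layers = []
    n_in = n_inputs
    for n_out in sizes:
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
        n_in = n_out
    return layers


def forward(x, layers):
    """Propagate one sample through every fully connected layer."""
    activations = list(x)
    for weights, biases in layers:
        activations = [
            relu(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


if __name__ == "__main__":
    # Nine inputs, matching the nine conditioning factors of the Iran case.
    mlp = build_mlp(n_inputs=9)
    print(forward([0.5] * 9, mlp))
```

Each layer receives the previous layer's outputs as its inputs, which is the chaining described in the text; an untrained network like this one produces arbitrary non-negative outputs.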

3.3 | Autoencoder-MLP
This model is composed of two structures, namely the autoencoder neural network (Chicco, Sadowski, & Baldi, 2014; Sun et al., 2016) and the MLP neural network. Instead of feeding the input data directly to the MLP for prediction, the autoencoder neural network is first used for feature extraction, after which the extracted features are provided to the MLP neural network for prediction.
Autoencoders are neural networks that learn to produce an output layer similar to the input layer (Chicco et al., 2014; Sun et al., 2016). This process is carried out in two stages by an encoder and a decoder. In the first stage, the input data are compressed into the hidden layer by the encoder, after which they are reconstructed from the hidden layer by the decoder (Chen, Shi, Zhang, Wu, & Guizani, 2017). In this model, the objective is not to use the decoder output but to use the hidden layer produced by the encoder. This hidden layer is a compressed representation of the data and therefore contains compact, informative features of the initial data that positively affect the prediction results (Chen et al., 2017; Sun et al., 2016). Autoencoders can also discover nonlinear relationships between variables, a key capability for making correct predictions (Chen et al., 2017). Hence, autoencoders are used for two reasons: compressing the data and extracting nonlinear relationships between variables.
The stacked autoencoder (SAE) was used in this study (Shin, Orton, Collins, Doran, & Leach, 2012; Vincent et al., 2010). Figure 7 shows the SAE architecture. The SAE is a neural network composed of several layers of autoencoders, such that the output of each autoencoder is fed to the next autoencoder as its input (Shin et al., 2012; Vincent et al., 2010). As mentioned earlier, two stages are involved in the combined autoencoder-MLP model. In the first stage, the SAE maps the input data to the hidden layer through the nonlinear mapping of its encoder segment (see, for example, Golestan Province in Figure 7). The hidden layer thus holds a nonlinear, compressed representation of the input features. These features are then provided to the second autoencoder, where they are encoded to produce new features. The resulting output is fed to the third autoencoder as input, and the same procedure continues up to the last autoencoder. The encoding step of an encoder is performed according to Equations 4 and 5 (Shin et al., 2012; Vincent et al., 2010), where w and b are the weight and bias vectors, respectively. In Equation 4, l is the index of the hidden layer and $h_{l-1}$ is the output of the (l − 1)-th hidden layer, which serves as the input to hidden layer l. Therefore, in the first stage of the model, the features are extracted through multiple layers of encoders using the SAE. In the second stage, the features extracted by the last SAE layer are given to the MLP layer as the input for prediction.
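For reference, the layer-wise encoding described above is commonly written in the following form. This is a hedged sketch consistent with the symbols defined in the text (w, b, l, and $h_{l-1}$), not necessarily the exact notation of Equations 4 and 5; the input vector x, the activation function f, and the total number of encoder layers L are notational assumptions introduced here.

```latex
% First encoder: maps the input vector x to the first hidden representation.
h_{1} = f\left(w_{1}\, x + b_{1}\right)

% Subsequent encoders: each layer encodes the output of the previous one.
h_{l} = f\left(w_{l}\, h_{l-1} + b_{l}\right), \qquad l = 2, \dots, L
```

Under this reading, the representation $h_L$ produced by the last encoder is what the MLP stage receives as its input.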
The autoencoder-MLP model used in this study includes a total of 5 layers, the first 4 of which are associated with the autoencoder, while the last layer belongs to the MLP neural network (Figure 7). A total of 5 neurons were used in the first autoencoder layer, 15 in the second layer, 10 in the third layer, 5 in the fourth layer, and 1 in the last layer (Figure 7). The activation function was applied to all layers after preprocessing. A linear function was used for the third layer, whereas the ReLU was used for the remaining layers. The processing was performed as described earlier, with the output of each layer fed to the next layer as its input. Hence, after the features are extracted by the autoencoders in the first stage, the MLP performs the prediction in the second stage and completes the model. The first encoder of the SAE is shown in Figure 8. Once the training stage is finished, the autoencoder-MLP model is applied to the test data to assess its accuracy.
The FR was used to determine the correlation between each class of variables and floods. The results are presented in Tables 4 and 5 for the two cases. As shown in Table 4, the 45-270 m altitude class, the 0-3° slope class, the flat aspect class, and the flat class of the plan curvature factor were among the most important classes and were assigned the highest weights by the FR method. On the contrary, the altitude class above 1,260 m received the lowest weight and, therefore, plays the least important role in flood occurrence. Moreover, the 500-1,000 m class of the distance to drainage factor, the Proterozoic class of the lithology factor, the water class of the land use factor, and the 600-800 mm precipitation class had the highest effect on flood occurrence.
As shown in Table 5 for GRB, the altitude above 45 m class, the slope above 7° class, the flat aspect class, and the flat class of the plan curvature factor were among the most important classes and were assigned the highest weights by the FR method. On the contrary, the altitude above 65.7 m class, the 5-7° slope class, the east aspect class, and the convex class of the plan curvature factor received the lowest weights and, therefore, play the least important role in flood occurrence. Moreover, the 0-600 m class of the distance to drainage factor, the Oxbow Lake class of the geomorphology factor, the water class of the land use factor, and the 1,213-1,281 mm precipitation class had the highest effect on flood occurrence. Important classes for the other factors can be seen in Table 5.

| Application of MLP and autoencoder-MLP in flood susceptibility modelling
After conducting the correlation analysis and determining the weight of each class of variables, the MLP and autoencoder-MLP models were implemented in Python. Seventy percent of each dataset was used as training data, and the remaining 30% was used to test the models.
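The 70/30 partition can be sketched as follows. This minimal example assumes that the split is applied separately to the flood and non-flood points so that both subsets keep the same class balance; the text does not state this explicitly. The function name `stratified_split` and the fixed `seed` are hypothetical.

```python
import random


def stratified_split(flood_points, non_flood_points, train_frac=0.7, seed=42):
    """Split flood (label 1) and non-flood (label 0) points into train/test.

    Each class is shuffled and cut separately (an assumption), so the
    class proportions are the same in the training and testing subsets.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label, points in ((1, flood_points), (0, non_flood_points)):
        shuffled = list(points)
        rng.shuffle(shuffled)
        cut = round(train_frac * len(shuffled))
        train.extend((p, label) for p in shuffled[:cut])
        test.extend((p, label) for p in shuffled[cut:])
    return train, test


if __name__ == "__main__":
    # Point identifiers stand in for the 147 flood and 147 non-flood
    # locations of the Golestan case.
    train, test = stratified_split(range(147), range(147, 294))
    print(len(train), len(test))
```

Fixing the seed makes the partition reproducible, which matters because a different random split yields different training and testing sets.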
After 200 training iterations, all cells in the two regions were fed into the MLP and autoencoder-MLP models and their flood susceptibility index was calculated. Figures 9 and 10 show the flood susceptibility maps of the two models for the Iran and India cases, respectively. After the two models had produced predictions for the entire region, the natural break classification method was used to classify these maps into five classes: very low, low, moderate, high, and very high. Natural break classification is one of the most common methods in natural hazard mapping for classifying conditioning factors as well as susceptibility maps. This method identifies natural groupings within the data, which is useful because it creates maps that accurately represent trends in the data (Baz, Geymen, & Er, 2009). For example, for a map with different values, this method groups areas that have similar values. Geometric interval and quantile classification are other methods of splitting, but they do not create the best division. For example, quantile classification simply assigns an equal number of cells to each class, regardless of how the values cluster. These two methods are easy and fast, but they do not produce the desired output. For Golestan Province, the five classes cover, respectively, 33.76, 6.78, 6.71, 6.68, and 46.07% of the total study area for the MLP model, and 19.52, 13.85, 15.28, 14.09, and 37.26% for the autoencoder-MLP model. The results indicate that 52.75 and 51.35% of the entire region fall into the high and very high flood susceptibility classes in the MLP and autoencoder-MLP models, respectively. Examining the cells classified into the high flood susceptibility class by the MLP model shows that the majority of these cells are in the 45-270 m altitude class, in the Cenozoic geological class, and in the agricultural lands of the region.
FIGURE 6 The multilayer perceptron model structure
For GRB, these 5 classes (very low, low, moderate, high, and very high) cover, respectively, 26.24, 29.26, 22.17, 16.02, and 6.31% of the entire region for the MLP model, and 10.74, 21.97, 35.37, 23.74, and 8.18% for the autoencoder-MLP model. The results indicate that 22.33 and 31.92% of the entire region fall into the high and very high flood susceptibility classes in the MLP and autoencoder-MLP models, respectively.
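The natural break classification used to obtain the five susceptibility classes can be illustrated with a small dynamic-programming sketch that chooses class boundaries minimising the within-class sum of squared deviations. This is a generic textbook-style formulation, not the GIS implementation used in the study; the function names `natural_breaks` and `classify` are hypothetical, and the cubic running time makes it suitable only for small samples.

```python
import bisect


def natural_breaks(values, k):
    """Return the upper bound of each of k classes for 1-D data.

    Boundaries minimise the total within-class sum of squared deviations.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    # Prefix sums allow the squared error of any slice in O(1).
    ps, ps2 = [0.0], [0.0]
    for v in data:
        ps.append(ps[-1] + v)
        ps2.append(ps2[-1] + v * v)

    def sse(i, j):
        """Sum of squared deviations from the mean of data[i:j]."""
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + sse(i, j)
                if cand < cost[c][j]:
                    cost[c][j] = cand
                    back[c][j] = i
    # Walk back through the table to recover each class's upper bound.
    bounds, j = [], n
    for c in range(k, 0, -1):
        bounds.append(data[j - 1])
        j = back[c][j]
    return bounds[::-1]


def classify(value, bounds):
    """Index of the class whose upper bound first reaches the value."""
    return min(bisect.bisect_left(bounds, value), len(bounds) - 1)


if __name__ == "__main__":
    sample = [1, 2, 3, 10, 11, 12, 50, 51, 52]
    breaks = natural_breaks(sample, 3)
    print(breaks, [classify(v, breaks) for v in sample])
```

With five classes, the returned class indices 0 to 4 would correspond to the very low, low, moderate, high, and very high labels.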
It is noteworthy that the first study area, Golestan Province, contains a larger share of susceptible land: the combined share of the high and very high susceptibility classes is 52%, compared with 27% in the second study area, the Middle Ganga Plain (MGP) (average values of the MLP and autoencoder-MLP outputs). The main reason for this difference between the two study areas under the same models is the altitude and slope of the regions. In Golestan Province, the crescent-shaped upper part covers the low-altitude regions ranging from −147 m to 270 m (Figure 3(c)), and low slopes (0-3°; Figure 3(a)) are also recorded in the same place. In addition, high rainfall is received in the upper catchment of the study area, which supplies surplus water to the lower catchments (the low-altitude part) of the region. Ultimately, this low-altitude, low-slope part of Golestan Province receives more water during and after rainfall. These are the main reasons behind the 52% share of land with high and very high flood susceptibility. In the second study area, by contrast, the whole region is characterised by low altitudes and low slopes (Figure 4), and it is part of the lower catchment of the Ganga River Basin, India. Therefore, the high and very high susceptibility lands, 27%, are found only along rivers and in low depressions.
FIGURE 7 The autoencoder-MLP hybrid model structure. MLP, multilayer perceptron
Based on the maps produced by both models, in Golestan Province the areas in the very low to moderate susceptibility classes are mainly located in the southern and southwestern parts of the region where the Alborz mountain range acts as a barrier preventing the entry of humidity derived from the Caspian Sea into these regions. Consequently, these areas have low rainfall and a dry climate. As a result, the probability of flood occurrence in these parts of the study area is low. The areas with high flood susceptibility are located in the northern and northwestern parts of the region. Evaporation of the Caspian Sea increases the humidity in these areas giving rise to heavy precipitations that can lead to floods. The proximity of the water table to the ground surface, as well as the saturated soil in these areas, can increase the intensity of floods.
In the India case study, the low-altitude floodplains (<50 m) of the region are mapped as high and very high flood susceptibility zones by the autoencoder-MLP model (Figure 10). The main concentration of high to very high susceptibility zones is found across the eastern MGP, where the highest average annual rainfall (>1,100 mm) is recorded. The monsoon rainfall reaches the area in late June and early July, submerging the low-altitude basins first and causing exceptional flooding (Arora et al., 2019).
During the monsoon period, the high volume of water discharged from upstream flows into the downstream catchments, and flood water spreads over the eastern parts of the region (Bhatt & Rao, 2016). Earlier flood records also show that sudden rainfall in the post-monsoon period on areas already wet and flooded by monsoon rainfall brings further disaster in August and creates havoc. Apart from these two major factors, river density plays a crucial role in distinguishing the most and least flooded regions: high-density regions form in permeable soils with low relief (altitude) (Gajbhiye, Mishra, & Pandey, 2014). The regions with dense stream networks in the central northern and north-eastern parts account for zones of high flood susceptibility.
High river density and higher rainfall are recorded in the confluence zones of rivers, which have a higher probability of flooding than other parts of the study area. Earlier studies have observed that flood probability is higher at confluence points (Kadam & Sen, 2012). The study area has a confluence zone on its eastern margin, where four rivers (i.e., the Ganga, Ghaghara, Son, and Rapti) meet and increase the discharge of flood water into the low-relief basin in the eastern parts.
The area under the ROC curve was used to assess the accuracy of the results of the MLP and autoencoder-MLP models. As shown in Figures 11 and 12, the areas under the curve for the MLP and autoencoder-MLP models were 79 and 97% in the training phase and 82 and 96% in the testing phase for Golestan Province, and 74 and 93% in the training phase and 81 and 97% in the testing phase for GRB, respectively, indicating that the autoencoder-MLP model outperforms the MLP model in terms of accuracy in both study areas.
Although the MLP is one of the best-known and most widely used machine learning models, it has been used in few studies for flood susceptibility mapping. Janizadeh et al. (2019) compared a standalone MLP with alternating decision tree (ADT), functional tree (FT), kernel logistic regression (KLR), and quadratic discriminant analysis (QDA) models. In their study, the MLP achieved poorer results than ADT and KLR. However, the hybrid autoencoder-MLP model can achieve better results.
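The area under the ROC curve used for this comparison can be computed directly from labels and predicted susceptibility scores. The sketch below uses the pairwise (Mann-Whitney) interpretation: the AUROC is the probability that a randomly chosen flood point receives a higher score than a randomly chosen non-flood point, with ties counted as one half. The function name `auroc` and the example values are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for flood points, 0 for non-flood points.
    scores: predicted susceptibility for each point (higher = more prone).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0      # flood point ranked above non-flood point
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up scores: three of the four flood/non-flood pairs are ordered
    # correctly, giving an AUROC of 0.75.
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of flood and non-flood points.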
One limitation of the autoencoder-MLP hybrid model is that its results differ from run to run because the initial weights are assigned randomly. To overcome this limitation, the model can be run several times and the run with the highest accuracy selected as the final model. Another limitation is related to the sampling technique used for training and testing the model. Every time random sampling is used, different training and testing datasets are generated, so the models built with these datasets can differ. To address this problem, the random sampling can be repeated several times and the best model selected.

| CONCLUSION
Flood susceptibility mapping can be used as an important information resource for planners and managers to reduce the hazards resulting from this phenomenon.
In this study, a hybrid model composed of the MLP and autoencoder models was constructed to prepare flood susceptibility maps for two study areas in Iran and India. For Golestan Province, nine factors (altitude, aspect, slope, plan curvature, TWI, lithology, distance to drainage, land use, and rainfall) were considered as the effective factors in flood occurrence. For GRB, 12 factors (altitude, slope, aspect, plan curvature, distance from the river, rainfall, river density, TWI, land use/land cover, distance from roads, soil type, and geomorphology) were considered. The hybrid autoencoder-MLP combines the capabilities of MLP neural networks, one of the most powerful machine learning techniques, with those of autoencoder neural networks. In this hybrid model, the autoencoder was used to reduce the number of features and eliminate the ineffective ones from the modelling process. The results showed that the autoencoder-MLP model provided considerably better results than the MLP model in both study areas.