Seismic damage prediction of RC buildings using machine learning

Decision-makers and stakeholders require a rapid assessment of potential damage after earthquake events in order to develop and implement disaster risk reduction strategies and to respond systematically in post-disaster situations. Manual damage investigation after an earthquake is a complicated, labor-intensive, time-consuming, and error-prone process. The development of fragility curves is time-consuming, and such curves cannot predict the damage for wide classes of structures because they consider few structural properties and only one seismic characteristic. Furthermore, the nonlinear finite element method cannot be applied to large numbers of buildings because of the time and cost involved. This paper presents machine learning (ML)-based seismic damage prediction of RC buildings. Some previous research works considered only seismic parameters or only structural parameters to train the ML models and predict structural damage. Such ML models may not fully reveal the underlying complexity of the relationship between input parameters and building performance, which limits their applicability. This paper evaluates the feasibility of using ML techniques such as K-nearest neighbor, random forest, decision tree, support vector machine, and artificial neural network to rapidly predict earthquake-induced reinforced concrete building damage considering both the structural properties and ground motion characteristics. The machine learning models are trained using simulation results. Because real earthquake damage datasets are lacking or have limited access, most research works used the Scikit-learn train_test_split function to randomly split the entire dataset into training and testing datasets, and the performance of the proposed ML technique is evaluated using the testing dataset. However, in this study, the performance of the different ML models is evaluated using a real earthquake damage dataset of RC buildings collected after the 2015 Nepal earthquake. The overall accuracy on the testing dataset suggests that machine learning algorithms can predict the seismic damage of reinforced concrete buildings quickly and with reasonable accuracy. This study is beneficial for emergency response and recovery planning after an earthquake.


INTRODUCTION
The assessment of existing reinforced concrete (RC) buildings for seismic damage is a challenging structural engineering problem and a key issue for disaster mitigation and resilience. The seismic damage assessment of these structures helps determine whether the buildings can be used safely after an earthquake by estimating the probability of each damage level. Furthermore, it provides valuable information to the concerned emergency departments about the areas of maximum damage and thus helps decision-makers and stakeholders to develop and implement disaster risk reduction strategies and to respond systematically in post-disaster situations. Therefore, the seismic damage assessment of existing RC buildings is a major concern.
The proposed methods for seismic damage assessment can be categorized into three types: (1) visual inspection, (2) fragility curves, and (3) nonlinear simulation-based methods. Colonna et al. 1 presented a visual inspection of a school in Teramo in terms of detection of crack patterns and evaluation of the seismic damage index. Hafner et al. 2 presented a post-earthquake assessment of buildings through visual inspection after the Zagreb earthquake in March 2020 to identify the safety and usability of buildings. Visual inspection of buildings in the post-earthquake phase is also carried out by well-trained professionals based on the damage data recorded from past earthquakes. 1,3,4 However, the visual inspection method requires the mobilization of many well-trained professionals to the affected areas at the same time. There may be too few of these experts for rapid seismic assessment of buildings after an earthquake, and mobilization may be delayed by paralyzed transportation. The time available to collect detailed building information may also be limited. The evaluation of usability must be based on visual inspection and professional judgment, as well as interviews with local technicians to acquire information about local construction practice. 5 Manual damage inspection of buildings after an earthquake is a complicated, labor-intensive, time-consuming, and error-prone process. Furthermore, the validation of these methods is limited to specific geographical and seismic zones. 6
Fragility curves define the probability of exceeding a particular damage state of an element or structure as a function of an earthquake's intensity measure parameter. The fragility curves are developed using observed damage data considering various characteristics of buildings and seismic characteristics. [7][8][9] Polese et al. 
10 derived class-representative capacity curves and relative fragility curves for minor, moderate, extensive, and complete damage states. Gautam et al. 11 presented seismic fragility functions of RC buildings affected by the 2015 Gorkha Earthquake in Nepal. The fragility functions are based on a detailed damage inspection of numerous buildings that were damaged by the earthquake and its aftershocks. Gaudio et al. 12 constructed fragility curves taking the material characteristics, modeling parameters, damage state thresholds, and uncertainties in seismic demand into account through a Monte Carlo simulation technique. Existing damage relationships for reinforced concrete structures are reviewed by Rosetto et al. 13 with the goal of applying them to a European (and similar) seismic risk assessment scenario. Based on a data bank of 99 post-earthquake damage distributions observed in 19 earthquakes and involving 340,000 RC structures, new empirical fragility curves for reinforced concrete building populations are derived. However, the development of fragility curves from observed data is time-consuming, and the curves cannot easily be developed for different regions or countries with diverse construction characteristics. 14 Furthermore, seismic damage curves allow damage prediction for classes of similar structures characterized by a small number of parameters, and typically consider only one seismic parameter. Hence, they are unable to estimate the damage for wider classes of structures. 15
Many research efforts have been made in the past to predict the seismic damage of buildings using nonlinear simulation-based methods. Castellazzi et al. 16 used limit analysis and nonlinear finite element (FE) analysis to investigate the seismic damage of a historical Basilica church in Italy. Preciado 17 presented a methodology for assessing the seismic damage of all types of towers and slender unreinforced masonry structures (e.g., lighthouses and minarets). 
Four validated 3D FEM models representing European towers are used to develop the approach. Castori et al. 18 proposed three modeling strategies, each with a different level of complexity: the equivalent frame model, the rigid macro-block model, and the finite element model, for seismic damage assessment of Civic Museum in Tuscany. Ahmed et al. 19 conducted a numerical study using advanced numerical simulations in ATENA environment for seismic damage assessment of confined masonry structures adopting micro-modeling approach. Ceroni et al. 20 used finite element model to assess the seismic damage of

NOVELTY OF CURRENT RESEARCH
• It is found that some of the research works only considered seismic parameters or structural parameters to train the ML models and predict the structural damage assessment. However, these ML models may not fully reveal the underlying complexity of the relationship between input parameters and building performance. As a result, their applicability will be limited.
• This paper evaluates the feasibility of using ML techniques such as K-nearest neighbor, random forest, decision tree, support vector machine, and artificial neural network, to rapidly predict earthquake-induced reinforced concrete building damage considering both the structural properties and ground motion characteristics.
• Due to lack of real earthquake damage datasets or limited access, most of the research works used Scikit Learn train_test_split function to randomly split the entire datasets into training and testing datasets and the performance of the proposed ML technique are evaluated using the testing datasets. However, in this study, the performances of different ML models are evaluated using real earthquake damage datasets of RC buildings collected after 2015 Nepal earthquake.
• The overall accuracy on testing datasets suggests the capability of machine learning algorithms in successfully predicting the seismic damage of reinforced concrete buildings in quick time with reasonable accuracy.
heritage masonry buildings by means of non-linear static analyses according to the provisions of Eurocode 8. A case study is carried out on an old masonry building (the Matica Hrvatska) to compare the damage results from inspection with the numerical results. The damage evaluated from in-situ assessment cannot perfectly correspond with the damage patterns observed in the numerical model because of a lack of information, since the seismic behavior of buildings is quite complex and is influenced by many parameters. 21 Most research works using the nonlinear finite element method evaluate the seismic damage of a single building; the method cannot be used for a large number of buildings because it requires more time and money. 22
In recent years, the use of artificial intelligence (AI) techniques has increased rapidly, and they have been applied widely in several engineering disciplines. These techniques provide an opportunity to reduce computational burdens and improve prediction efficiency. 23 Machine learning (ML) methods have recently received significant attention and are establishing themselves as a new class of powerful intelligence technologies for use in seismic and structural engineering with proven effectiveness. AI-driven technologies are projected to become more feasible and necessary in the future with improvements in processing capability and data accumulation. 24 Harirchian et al., 25 Xie et al., 26 and Sun et al. 27 conducted comprehensive literature reviews on the most popular and recently developed ML approaches for assessing building damage. A summary of some of the most important research works is provided below. A study conducted by Latour and Omenzetter 15 explored the capacity of an artificial neural network (ANN) to reliably assess the earthquake-induced damage of planar RC frames utilizing the results of nonlinear time history studies. Tang et al. 
28 proposed a machine learning-based quick seismic risk assessment methodology to reduce the computational cost of assessing the possible loss of a building due to an earthquake during its intended life. The prediction ability of different machine learning algorithms such as artificial neural network, support vector machine, classification and regression tree and random forest were investigated. Morfidis et al. 22 used artificial neural networks to achieve an optimum prediction for the damage state of RC buildings. Machine learning-based methods, including both regression-based and classification-based algorithms were implemented by Hwang et al. 23 for reliable prediction of seismic response and structural collapse classification of RC frame buildings in future earthquake considering component-and system-level modeling uncertainties. Zhang et al. 29 used classification and regression tree (CART) and Random Forest to probabilistically identify the structural safety state of an earthquake-damaged building. Burton et al. 30 implemented ML-based approaches to evaluate the vulnerability of buildings to aftershock collapse using mainshock intensity, seismic response, and certain damage indicators. The proposed framework is applied to a four-story RC special moment frame building. Vafaei et al. 31 trained perceptron neural network to develop a correlation between the inter-story drift ratios and the plastic hinge rotation of reinforced concrete shear wall. The data required for the training were obtained from nonlinear modal pushover analyses. Sun et al. 32 adopted kernel-based machine learning approaches to reconstruct seismic response demands across several tall buildings (20-42 stories). Xu et al. 33 proposed a machine learning-based method to predict the structural types of buildings for city scale seismic damage simulations. Mangalathu et al. 
34 studied the feasibility of different machine learning techniques to classify building damage as red, yellow, or green utilizing the damage data from the 2014 South Napa earthquake. A deep learning, Convolutional Neural Network (CNN)-based rapid regional post-event seismic damage assessment methodology was proposed by Lu et al. 35 utilizing the time-frequency distributions of ground motions. Rofooei et al. 36 carried out nonlinear analysis of 2-D moment-resisting frames to obtain the data to train a perceptron network and investigated the influence of structural and seismic parameters utilizing ANN. Xu et al. 37 presented a Long Short-Term Memory (LSTM) neural network architecture-based framework for real-time regional seismic damage assessment. Xiong et al. 38 introduced a convolutional neural network (CNN) and unmanned aerial vehicle (UAV)-based automated method for assessing building seismic damage.
Building damage is associated with the features of the structural system and the ground motion characteristics, which consist of many parameters. In particular, it is quite difficult to determine the extent to which structural or seismic parameters influence structural performance and to identify the main parameters that may cause damage. Molas et al. 38 used neural networks to predict earthquake damage using recorded ground motion indices such as peak ground acceleration (PGA), peak ground velocity (PGV), peak ground displacement (PGD), and Seismic Intensity (SI). Simulated earthquake ground motions are utilized to create a well-distributed data set, and the damage is represented by the ductility factor derived from non-linear analysis of two single-degree-of-freedom structural models. Kiani et al. 39 used spectral acceleration, acceleration spectrum intensity, spectrum intensity, displacement spectrum intensity, PGA, cumulative absolute energy, 5%-75% significant duration, and 5%-95% significant duration as input features and implemented different classification-based machine learning tools (logistic regression, lasso regression, support vector machine, Naive Bayes, decision tree, random forest, linear and quadratic discriminant analyses, neural networks, and K-nearest neighbors) for predicting the structural responses and deriving the fragility curves. Xu et al. 40 proposed a method based on machine learning algorithms for accurately predicting seismic damage in real time using several intensity measures (IMs). The complex characteristics of ground motions were represented by 48 IMs, and the seismic damage was evaluated using nonlinear time history analysis. Support vector machine, logistic regression, and decision tree were adopted to develop the mapping rules between the IM vectors and the seismic damage. 
Kia and Sensoy 42 investigated the impact of nine seismic parameters (PGA, PGV, PGD, PGA/PGV, PGA/PGD, PGV/PGD, frequency content, effective time, and fault line distance) on the performance of ANNs in predicting the seismic damage level of RC frames. Arslan 42 performed a series of analytical studies using neural networks to study the influence of structural parameters on the seismic response of RC buildings, which were assumed to cause damage to the structures during an earthquake. Concrete compressive strength, yield and ultimate strength of steel, transverse reinforcement, infill wall ratio, short column, strong column-weak beam, and shear wall ratio were taken into consideration while determining the corresponding structural parameters. In another study conducted by Arslan et al. 43 an artificial neural network was utilized to predict the seismic response of existing medium- and high-rise RC buildings using 23 structural parameters as inputs. In the study conducted by Zhang et al. 29 for evaluating the performance of a structural system, non-universal structural parameters were involved that were only effective for a specific structure. These studies show that only ground motion parameters or only structural parameters are considered as inputs to train the ML models for evaluating the response of structural systems. The developed machine learning techniques may not fully disclose the underlying complexity of the interactions between input parameters and building performance. 28 As a result, the applicability of the machine learning models will be limited. 40 For reliable and accurate prediction of structural damage during an earthquake, considering both seismic parameters and structural parameters helps boost the adaptability of the machine learning models. 15,44,45 This study presents machine learning-based seismic damage prediction of RC structures considering both the structural parameters and seismic parameters. 
A set of 10 structural parameters and 7 ground motion parameters is considered. Moreover, because real earthquake damage datasets are unavailable or have limited access, most research works adopted the Scikit-learn train_test_split function to split the whole dataset randomly into training and testing datasets, and the performance of the proposed ML techniques is evaluated using the testing dataset. However, in this study, the performance of the different ML models is evaluated using the 2015 Nepal earthquake RC building damage dataset. The prediction model can be embedded in an electronic tool that provides quick estimates of seismic damage to RC building structures after an earthquake. Furthermore, this study supports fast decision making, the implementation of disaster risk reduction strategies, and systematic response in post-disaster situations.
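The difference between the two evaluation protocols described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record tuples are placeholders rather than real feature vectors, the helper `random_split` is a hypothetical stand-in for a library splitter, and the 20% test fraction is an assumed value. Only the sample counts (2487 simulated samples and 67 surveyed RC buildings) come from the text.

```python
import random


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of `records` and split it into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records: (source, sample_id). Real records would hold the
# 10 structural and 7 ground motion parameters plus the damage grade.
simulated = [("simulation", i) for i in range(2487)]
surveyed = [("survey", i) for i in range(67)]

# Protocol A (common practice): hold out a random part of a single dataset.
train_a, test_a = random_split(simulated, test_fraction=0.2)

# Protocol B (this study): train on all simulated samples and test on an
# independent set of field observations.
train_b, test_b = simulated, surveyed

print(len(train_a), len(test_a), len(train_b), len(test_b))
```

In protocol A the test samples come from the same source as the training samples, so the score mainly measures interpolation within that source. In protocol B the score measures how well a model trained on simulations transfers to surveyed buildings.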

OVERVIEW OF MACHINE LEARNING-BASED SEISMIC DAMAGE ASSESSMENT
This study presents machine learning-based seismic damage prediction of RC buildings considering both the structural and seismic parameters. The framework of machine learning-based seismic damage assessment of RC buildings is shown in Figure 1. It consists of four sub-sections, including developing the simulation dataset, training the ML models, and testing on real earthquake damage data. Each sub-section of the proposed framework is discussed below in more detail.

DEVELOP TRAINING DATASETS FROM SIMULATION
The majority of machine learning (ML) algorithms are intended for the analysis of balanced datasets. Moderate and strong earthquakes are rare events; hence, real-world datasets are imbalanced and have majority and minority classes. The number of samples in each class should be evenly distributed so that the predictive ML models are not biased toward the majority class and do not ignore the minority class. 46 In this study, the training datasets are prepared using simulation results. The advantage of using simulation results is that amplitude scaling of the earthquake waves can be used to increase the number of samples for the moderate to collapse damage grades, as explained in the subsequent section.

Structural parameters
The structural parameters considered in this study are number of stories, inter-story height, height of building, fundamental period, plinth area, age of buildings, land surface condition, position, plan configuration, and type of superstructure. RC buildings ranging from 3 to 7 stories, with an assumed inter-story height of 3 m and with varying dimensions in the x-direction and y-direction, are considered as shown in Table 1. Furthermore, these buildings are rectangular or square in plan and regular in elevation to represent a significant number of RC buildings designed with the help of modern seismic design codes. The fundamental period T 1 is assumed to be proportional to the number of stories N. 47 The thickness of the slab and the sizes of the beams and columns are chosen such that they meet the requirements of the different design codes. [48][49][50] The slab thickness is taken as 150 mm for all buildings. The different sizes of beams and columns considered are shown in Table 2. The buildings are assumed to have been built within the last 50 years, and the analysis is carried out with different damping ratios that vary from 0.3 to 0.5. Two types of superstructures are considered: brick cement mortar walls and stone cement mortar walls. These buildings are assumed to be built on a flat land surface and not attached to neighboring buildings.

Earthquake parameters
At present, several real-time earthquake monitoring networks have been established, such as those in Japan, 51 China, 52 the USA, 53 the European Commission, 54 and Turkey. 55 When an earthquake occurs, these networks can capture and transmit ground motions in real time. In this study, 11 earthquake waves recorded by the Kyoshin Network (K-Net) and the Kiban Kyoshin Network (KiK-Net) are used, as shown in Table 3. These earthquakes are selected randomly from the years 2003 to 2022 and have a seismic intensity greater than 4 on the Japan Meteorological Agency (JMA) scale, in order to obtain a nearly balanced damage dataset for each damage grade (from null or slight damage to collapse). The extent of building damage caused by an earthquake is closely connected to the characteristics of the record, which consist of many parameters. The evaluation of the influence of seismic motions on structures, and especially on RC buildings, is an extremely complex and multi-parametric problem. As a result, a large number of seismic parameters have been introduced for evaluating earthquake effects on structures. 22 In this study, seven seismic parameters are considered, as illustrated in Table 4, to investigate the structural damage of the buildings due to an earthquake: peak ground acceleration (PGA), peak ground velocity (PGV), the ratio PGV/PGA, seismic intensity (SI), and pseudo-spectral acceleration (PSA) at 0.3 s, 1 s, and 3 s. The amplitude of the seismic parameters considered in the training dataset is shown [...]
TABLE 3 Recorded earthquake waves (K-Net and KiK-Net).
[...] 56 which is also adopted by the United States Geological Survey (USGS). 57 It has a lower bound of 1, meaning that no shaking is felt and no damage is observed, and an upper bound of 10, meaning that extreme shaking is felt and very heavy damage is observed. The Newmark beta method is applied to calculate the pseudo-spectral acceleration, expressed in terms of the acceleration due to gravity, at different periods. 
The ratio of PGV/PGA is a seismic parameter accounting for frequency content of the input motion since PGA and PGV are associated with motions of different frequencies. 58 The earthquake waves with higher PGV/PGA values have larger damage potential. 59
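The peak parameters and the pseudo-spectral acceleration described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a uniformly sampled acceleration record in m/s^2, integrates velocity with the trapezoidal rule, and uses the constant-average-acceleration variant of Newmark's beta method (gamma = 1/2, beta = 1/4) on a linear single-degree-of-freedom oscillator with an assumed 5% damping ratio. The function names and the synthetic sine record are hypothetical.

```python
import math


def peak_parameters(acc, dt):
    """Return (PGA, PGV, PGV/PGA) for a ground acceleration record in m/s^2."""
    vel = [0.0]
    for a0, a1 in zip(acc, acc[1:]):
        vel.append(vel[-1] + 0.5 * (a0 + a1) * dt)  # trapezoidal integration
    pga = max(abs(a) for a in acc)
    pgv = max(abs(v) for v in vel)
    return pga, pgv, pgv / pga


def pseudo_spectral_acceleration(acc, dt, period, damping=0.05):
    """PSA (m/s^2) of a linear SDOF oscillator with unit mass, computed with
    Newmark's beta method (gamma = 1/2, beta = 1/4)."""
    gamma, beta = 0.5, 0.25
    omega = 2.0 * math.pi / period
    k = omega ** 2                 # stiffness per unit mass
    c = 2.0 * damping * omega      # damping per unit mass
    u, v = 0.0, 0.0
    a = -acc[0] - c * v - k * u    # initial relative acceleration
    k_eff = k + gamma * c / (beta * dt) + 1.0 / (beta * dt ** 2)
    u_max = 0.0
    for ag in acc[1:]:
        p_eff = (-ag
                 + u / (beta * dt ** 2) + v / (beta * dt)
                 + (0.5 / beta - 1.0) * a
                 + c * (gamma * u / (beta * dt)
                        + (gamma / beta - 1.0) * v
                        + dt * (0.5 * gamma / beta - 1.0) * a))
        u_new = p_eff / k_eff
        v_new = (gamma * (u_new - u) / (beta * dt)
                 + (1.0 - gamma / beta) * v
                 + dt * (1.0 - 0.5 * gamma / beta) * a)
        a_new = ((u_new - u) / (beta * dt ** 2)
                 - v / (beta * dt)
                 - (0.5 / beta - 1.0) * a)
        u, v, a = u_new, v_new, a_new
        u_max = max(u_max, abs(u))
    return omega ** 2 * u_max      # PSA = omega^2 * peak relative displacement


# Synthetic 1 Hz sine record (illustration only, not a recorded earthquake).
dt = 0.01
record = [math.sin(2.0 * math.pi * i * dt) for i in range(1001)]
pga, pgv, ratio = peak_parameters(record, dt)
psa_1s = pseudo_spectral_acceleration(record, dt, period=1.0)
print(pga, pgv, ratio, psa_1s)
```

Dividing the returned PSA by 9.81 m/s^2 expresses it in terms of the acceleration due to gravity, as done in the text. Calling the function with periods of 0.3 s, 1 s, and 3 s gives the three PSA features.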

Parameter for assessing RC buildings' damages
The expected seismic damage of RC buildings is expressed in terms of damage indices. Damage indices quantitatively evaluate the degree of seismic damage that the cross section of a structural member or the whole structure has suffered. In this study, the Maximum Inter-story Drift Ratio (MISDR) is used to reflect the seismic damage of RC buildings, as explained in Figure 4. MISDR is generally considered an effective indicator of global structural and nonstructural damage of reinforced concrete buildings. 60 Previous research has used MISDR for the assessment of the inelastic response of structures. 61,62 The response under seismic excitation is obtained by performing Nonlinear Time History Analysis (NLTHA) using Newmark's beta method with incremental steps (predictor-corrector method) for an MDOF bilinear model. Using Equations (1), (2), and (3), Equation (9) can be obtained as the corrected acceleration response, which can be used further to calculate the corrected velocity and displacement responses.
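As a small illustration of the damage index, the sketch below computes MISDR from floor displacement histories. It assumes that the time history analysis has already produced lateral floor displacements relative to the ground and that all stories have the same height. The function name and the sample numbers are hypothetical and are not taken from the study.

```python
def max_interstory_drift_ratio(displacements, story_height):
    """Return the maximum inter-story drift ratio (MISDR).

    displacements[t][i] is the lateral displacement (m) of floor i + 1 at
    time step t, measured relative to the ground.
    story_height is the uniform inter-story height (m).
    """
    misdr = 0.0
    for floors in displacements:
        below = 0.0  # the ground does not move in relative coordinates
        for u in floors:
            misdr = max(misdr, abs(u - below) / story_height)
            below = u
    return misdr


# Hypothetical 3-story response at two time steps (m), story height 3 m.
history = [
    [0.003, 0.005, 0.006],
    [0.006, 0.015, 0.018],
]
misdr = max_interstory_drift_ratio(history, story_height=3.0)
print(misdr)
```

The maximum is taken over both time steps and stories, so a single number summarizes the worst drift demand experienced by the building during the record.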

Data preparation and analysis
The database consists of 2487 samples. Larger amplitude earthquakes are inherently less frequent; therefore, it is quite difficult to obtain moderate, heavy, or collapse damage states, which results in an uneven distribution of damage state data that may influence the training of the model negatively. 63 Hence, the amplitude scaling method 64 is adopted in this study. In this method, amplified ground motions are generated by multiplying recorded seismic waves by suitable scaling factors. The scaling factor is limited to 4 in this study, since larger scale factors are considered inappropriate and can cause significant errors. [65][66][67] Since the nonlinear dynamic response of a building is strongly dependent on the characteristics of the earthquake input, the selected ground motions should resemble those of practical applications. Hence, the selected earthquakes are scaled in such a way that the PGA is less than 1.53 g, since the buildings step into nonlinear states at approximately 0.2 g PGA. 68 Figure 5 shows the distribution of the damage state dataset before and after the amplitude scaling. Likewise, Figure 6 shows the number of damage class data based on number of building [...] Figure 7 presents scatter plots that illustrate the building damage grades as a function of pairs of input variables to show any noticeable trends. Interestingly, Figure 7A-I does not reflect any specific pattern in the damage grades as a function of pairs of predictor variables. Unlike the results reported by Mangalathu et al. 34 and Boatwright et al. 70 no specific trend is observed in the damage grades as a function of pseudo-spectral acceleration at 0.3 s and construction years. One possible reason is that only RC buildings less than 50 years old are considered.
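The amplitude scaling rule described above can be sketched as follows. The two limits (scale factor of at most 4 and scaled PGA below 1.53 g) are taken from the text. The function name, the choice to return `None` for a rejected factor, and the short sample record are assumptions made for illustration.

```python
G = 9.81  # acceleration due to gravity (m/s^2)


def scale_record(acc, factor, max_factor=4.0, pga_limit_g=1.53):
    """Scale an acceleration record (m/s^2) by `factor`.

    Returns the scaled record, or None when the factor exceeds `max_factor`
    or the scaled PGA is not below `pga_limit_g` (expressed in g).
    """
    if factor <= 0.0 or factor > max_factor:
        return None
    scaled = [factor * a for a in acc]
    pga_g = max(abs(a) for a in scaled) / G
    if pga_g >= pga_limit_g:
        return None
    return scaled


# Hypothetical record with a PGA of 4 m/s^2 (about 0.41 g).
record = [0.0, 2.0, -4.0, 1.0]
accepted = {f: scale_record(record, f) is not None for f in (1.0, 2.0, 3.0, 4.0, 5.0)}
print(accepted)
```

For this sample record, factors up to 3 pass both checks, a factor of 4 is rejected by the PGA limit (16 m/s^2 is about 1.63 g), and a factor of 5 is rejected by the scale factor limit.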

MACHINE LEARNING ALGORITHMS
In this study, a machine learning-based approach is presented to check the applicability of ANN, DT, SVM, KNN, and RF for quick and reliable prediction of the seismic damage states of RC buildings. The predictive models are developed and trained using the Scikit-learn library, which is a widely used and robust library for machine learning in Python. The highest accuracies obtained with the hyperparameters considered during training and testing of the different machine learning algorithms are shown in Table 6. The accuracy of these ML algorithms is determined using Equation 11. Furthermore, the performance of the ML models can be evaluated based on model metrics such as precision, recall, and f1-score, which are calculated using the confusion matrix parameters as illustrated in Table 7. The percentage of predicted damage grades that are correctly assigned by the ML models is called precision. Recall is the percentage of actual damage grades that are correctly assigned by the ML models. The harmonic mean of precision and recall is known as the f1-score. A good classifier should have precision, recall, and f1-score close to 1. Precision and recall become 1 when the numerator and denominator are equal, that is, TP = TP + FP and TP = TP + FN, respectively. This implies that FP and FN are zero. As FP and FN increase, the denominator grows larger than the numerator, and the precision and recall decrease, respectively. The f1-score equals 1 when both precision and recall are 1, in other words when FP and FN are zero.
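The metrics defined above can be computed directly from a multiclass confusion matrix, as the following sketch shows. It assumes that rows hold the observed grades and columns hold the predicted grades. The 3-class matrix is a made-up example rather than a result from this study, and the function name is hypothetical.

```python
def classification_metrics(cm):
    """Return (accuracy, per_class) for a square confusion matrix.

    cm[i][j] is the number of samples with observed class i and predicted
    class j. per_class is a list of (precision, recall, f1) tuples.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, observed other
        fn = sum(cm[k]) - tp                       # observed k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2.0 * precision * recall / denom if denom else 0.0
        per_class.append((precision, recall, f1))
    return accuracy, per_class


# Made-up 3-class confusion matrix (illustration only).
cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
accuracy, per_class = classification_metrics(cm)
print(accuracy, per_class[0])
```

When every off-diagonal entry is zero, FP and FN vanish for each class and all three metrics equal 1, which matches the limiting case described in the text.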

TRAINING ACCURACY OF DIFFERENT ML MODELS
The performance of the different ML models adopted in this study is evaluated using the confusion matrix and model metrics such as precision, recall, and f1-score. The confusion matrix is a plot of the observed versus the predicted damage grades. Each diagonal element in the confusion matrix represents the damage grades that are classified correctly by the ML models. Each off-diagonal element represents an incorrectly predicted damage grade. Figure 8 shows the confusion matrices of the different ML models. It is observed that RF correctly predicts 689 out of 691 samples of damage grade '0', 517 out of 517 of damage grade '1', 560 out of 561 of damage grade '2', 346 out of 346 of damage grade '3', and 371 out of 372 of damage grade '4' on the training dataset. In other words, RF predicts the training dataset with an overall accuracy of 99.83%. The precision, recall, and accuracy of the different ML models in Figure 8 show that RF obtained higher precision and recall than the other ML models. The overall accuracy obtained by DT, SVM, KNN, and ANN is 88.4%, 99.5%, 74.5%, and 88.3%, respectively. Table 8 illustrates the performance of the different machine learning models adopted in this study in terms of f1-score. It shows that RF obtained a higher f1-score than the other ML models. The accuracy, precision, recall, and f1-score obtained using the different ML algorithms show that they have the capability to predict the multiclass damage grades of RC buildings after an earthquake. These ML models are further utilized to predict the damage grades of RC buildings in real earthquake damage data in the next section.
The importance of each feature in the model is measured using feature importance scores. The feature importance score is a value assigned to each feature while the model is developed. The Gini index, commonly referred to as Gini impurity, measures the probability that a randomly selected sample is classified incorrectly when it is labeled at random according to the class distribution. 
It can be calculated by subtracting the sum of the squared probabilities of each class from one, as shown in Equation 12. In this study, the RF algorithm with its associated Gini feature importance is used to measure the feature importance score. It examines the amount by which the Gini index decreases at each split made on a feature. The larger the decrease in the Gini index achieved by a feature, the more important the feature.
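The Gini calculation described above (one minus the sum of squared class probabilities) and the impurity decrease of a single split can be sketched as follows. This is a simplified illustration of the quantity that tree-based feature importance accumulates over the splits of a forest; it is not the Scikit-learn implementation, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 - sum over classes of p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Decrease in Gini impurity when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Toy damage grades at a tree node (illustration only).
parent = [0, 0, 1, 1]
useful_split = impurity_decrease(parent, [0, 0], [1, 1])   # separates the grades
useless_split = impurity_decrease(parent, [0, 1], [0, 1])  # leaves them mixed
print(gini_impurity(parent), useful_split, useless_split)
```

A feature whose splits separate the damage grades well produces large decreases and therefore receives a high importance score, while a feature whose splits leave the grades mixed contributes nothing.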
The importance of each input parameter in reflecting the damage grades of buildings is shown in Figure 9. The highest feature importance score is observed for construction years, followed by the plinth area and PGA. Furthermore, PGV, SI, PSA at 0.3 s, PGV/PGA, and PSA at 1 s are found to have similar contributions, and the plan configuration has the least contribution when expressing the building's damage grade.

PREPARATION OF TESTING DATASET
Here, the ML models are trained using the simulation dataset of RC buildings designed based on the IS code. Because of the similarity of construction materials (brick, reinforcement bar, cement, etc.), construction techniques, occupancy load, soil properties, etc., buildings designed based on the IS code have been recommended for adoption in Nepal. 71,72 Hence, the ML models trained using the simulation data can be validated using the post-earthquake damage data collected from Nepal. To evaluate the performance of the various ML models on unseen data that were not used during training, the real earthquake RC building damage dataset from the 2015 Nepal earthquake is used. On April 25, 2015, an earthquake with a magnitude of 7.8 (Mw) struck Nepal's central region (Gorkha) at 11:56 NST (local time). After the earthquake, a building damage assessment survey was conducted by the Nepal government in the earthquake-affected districts, and the building damage was categorized into five grades ranging from damage grade 1 (negligible to slight damage) to damage grade 5 (collapse) following the EMS-98 guidelines 73 as explained in Table 9 (http://eq2015.npc.gov.np/docs/#/faqs/faqs). The household survey data are freely available on the official website of Nepal's National Planning Commission (http://eq2015.npc.gov.np/). The damage database contains 762,106 building records collected across eleven districts of Nepal: Okhaldhunga, Sindhuli, Ramechhap, Dolakha, Sindhupalchok, Kavrepalanchok, Nuwakot, Rasuwa, Dhading, Makwanpur, and Gorkha. Figure 10 shows the available information on the buildings recorded after the 2015 Nepal earthquake, which is further illustrated in Table 10. According to reports, 79% required extensive repair or reconstruction, and 36% were destroyed. The current study focuses on the prediction of seismic damage in RC buildings using machine learning. 
As a result, only the RC buildings from the 2015 Nepal earthquake damage database are evaluated in this study. A total of 67 RC buildings are considered in order to validate the trained ML models developed in the previous section. Of the many parameters in the damage database, only the structural parameters that were used to train the ML models are retained. For example, the selected buildings have 3 to 7 stories, a rectangular or square plan, no attachment to other buildings, a construction age of less than 50 years, a flat ground surface condition, and brick or stone cement mortar superstructures. Figure 11 shows the summary of the testing datasets based on the structural parameters. Figure 12 shows the distribution of the datasets based on the number of stories. The testing dataset contains the largest number of 3-story RC buildings, that is, 24, and four 7-story RC buildings. Furthermore, the damage grades explained in Table 9 are related to the MISDR values, where an MISDR value less than 0.0025 refers to no structural damage and slight nonstructural damage, and so on up to an MISDR value greater than 0.015, which refers to very heavy structural damage or collapse of the building, as reported by Masi et al. 69 Nanos and Elenas 74 reported that MISDR values also correlate well with the structural and nonstructural damage of RC buildings after severe earthquakes. The seismic properties of the Nepal earthquake are taken from the USGS website. 53

TABLE 9 General guidance on damage grade for a building.
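The selection of testing buildings described above can be sketched as a simple filter over the survey records. The field names and category labels below are hypothetical and the three records are invented for illustration; the actual column names and codes in the damage database may differ.

```python
# Hypothetical field names and labels; the real survey database may use others.
ALLOWED_PLANS = {"rectangular", "square"}
ALLOWED_SUPERSTRUCTURES = {"brick_cement_mortar", "stone_cement_mortar"}

def matches_training_scope(building):
    """True if a record falls inside the parameter ranges used for training."""
    return (
        3 <= building["stories"] <= 7
        and building["plan"] in ALLOWED_PLANS
        and not building["attached"]
        and building["age_years"] < 50
        and building["ground"] == "flat"
        and building["superstructure"] in ALLOWED_SUPERSTRUCTURES
    )

# Invented example records, for illustration only.
records = [
    {"id": "A", "stories": 3, "plan": "rectangular", "attached": False,
     "age_years": 12, "ground": "flat", "superstructure": "brick_cement_mortar"},
    {"id": "B", "stories": 2, "plan": "square", "attached": False,
     "age_years": 8, "ground": "flat", "superstructure": "brick_cement_mortar"},
    {"id": "C", "stories": 5, "plan": "square", "attached": True,
     "age_years": 20, "ground": "flat", "superstructure": "stone_cement_mortar"},
]

testing_set = [b for b in records if matches_training_scope(b)]
print([b["id"] for b in testing_set])
```

Only record A passes: B has too few stories and C is attached to a neighboring building.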

DISCUSSION ON ML MODEL PERFORMANCE ON TESTING DATASET
The ML models trained in the previous section are used to predict the seismic damage of RC buildings in the testing dataset. The confusion matrices of the different ML models on the testing dataset, along with their recall and precision, are shown in Figure 13. RF outperforms the other ML models considered in this study, with an overall damage prediction accuracy of 74.62% on the testing dataset. It correctly predicts 14 of the 16 buildings with damage grade '0', 13 of the 16 with damage grade '1', 7 of the 12 with damage grade '2', 7 of the 11 with damage grade '3', and 9 of the 12 with damage grade '4'. Overall accuracies of 64.17%, 68.65%, 62.68%, and 70.14% are obtained from DT, SVM, KNN, and ANN, respectively, on the testing dataset. The performance of the ML models on the training and testing datasets is summarized in Table 11. SVM and RF both obtain prediction accuracies of over 99% on the training dataset; however, RF obtains about 6 percentage points higher prediction accuracy than SVM on the testing dataset. SVM, in its basic form, is a linear binary classifier that assumes the multidimensional data are linearly separable in the input space. SVM establishes an optimal hyperplane to separate the dataset into a discrete number of predefined classes using the training data. In practice, the data samples of various classes often overlap and are not always linearly separable, as in the 2015 Nepal earthquake damage dataset. As a result, the kernel trick is introduced to address this limitation of linear SVM. The kernel trick projects the input dataset into a higher-dimensional feature space to improve the separability between classes and has a significant impact on how well SVM performs on a subset of the training dataset or validation dataset.

FIGURE 13 Performance of different ML models for the testing dataset.
This space can theoretically be of infinite dimension and yet allow for linear discrimination. Kernel-based methods can be quite sensitive to overfitting, 75 which possibly explains why SVM shows higher prediction accuracy on the training dataset but lower prediction accuracy on the testing dataset compared with RF. Random forest is an ML technique that increases accuracy by integrating multiple classifiers to address the same problem. The integration of multiple classifiers decreases variance and may yield more reliable results. In addition, a voting approach is used to assign to each unlabeled sample the label that receives the maximum number of votes from the multiple classifiers. 76 Furthermore, instead of using all predictor variables as in DT, RF uses the bagging technique to improve the stability and accuracy of the integrated models while reducing variance. 77 The bagging technique randomly splits the data into smaller sections while creating the trees, so correlated features may or may not be used for a particular tree. Therefore, the final prediction, based on the maximum number of votes from the multiple classifiers, makes RF better than a single decision tree, enhancing its accuracy and reducing overfitting. 78 Furthermore, since the recall value is more useful in predicting the damage, the higher recall values obtained by the different ML models for each damage grade are shown in Table 11. The number of correct predictions of damage grades is shown in Figure 14. It is observed that RF performs well in predicting damage grades '1', '3', and '4', whereas SVM performs well for damage grade '0' and ANN for damage grades '2' and '4'. Furthermore, Table 12 illustrates that RF achieves the higher f1-score for all damage grades except damage grade '2', for which ANN achieves the higher f1-score. Hence, it is found that RF performs better than the other ML models.
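The per-grade recall and the overall accuracy of RF on the testing dataset can be checked directly from the counts of correct predictions quoted earlier in this section. The short sketch below uses only those counts; the variable names are chosen for illustration.

```python
# Correct predictions and totals per damage grade for RF on the testing
# dataset, as quoted in the text (14/16, 13/16, 7/12, 7/11, 9/12).
rf_correct = {0: 14, 1: 13, 2: 7, 3: 7, 4: 9}
rf_total = {0: 16, 1: 16, 2: 12, 3: 11, 4: 12}

# Recall for a grade = correctly predicted buildings / buildings of that grade.
recall = {grade: rf_correct[grade] / rf_total[grade] for grade in rf_total}

# Overall accuracy = all correct predictions / all testing buildings.
n_correct = sum(rf_correct.values())
n_buildings = sum(rf_total.values())
overall_accuracy = n_correct / n_buildings

for grade, value in recall.items():
    print(f"damage grade {grade}: recall = {value:.2%}")
print(f"overall accuracy = {n_correct}/{n_buildings} = {overall_accuracy:.4%}")
```

The totals add up to the 67 testing buildings, and 50/67 is approximately 74.63% when rounded, which corresponds to the reported 74.62% when truncated to two decimals.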
From this, it can be considered that once the ML models are well trained, they can be used to predict the seismic damage of RC buildings after an earthquake in a short time with a reasonable level of accuracy. Lastly, the fragility-based seismic damage prediction model is developed as shown in Figure 15. Numerous research works have developed damage classification or prediction models by constructing fragility functions. 7-11,79 The fitted fragility function is developed as explained by Baker, 80 where estimates of the fragility function parameters are obtained by maximizing the likelihood function. Figure 15 gives the probability of exceeding a particular damage grade as a function of peak ground acceleration, expressed in units of gravitational acceleration. In this study, it is observed that the seismic damage prediction accuracy of the machine learning algorithms is higher on the training dataset and lower on the testing dataset. One reason is the lack of a sufficient number of training samples that can characterize the real-world damage data correctly. The recall values obtained from RF for damage grades 3 and 4 are lower than 80%, which can be due to the smaller number of training samples compared with the other damage grades. Since real-world damage observations capture the aleatoric variability and epistemic uncertainties, the training dataset should be prepared well enough to capture these uncertainties for higher prediction accuracy. The real-world damage data are affected by many structural parameters, as shown in Table 10. However, in the present study, only a few parameters are considered while developing the machine learning models. For example, the effect of structural pounding, which refers to the collisions that occur between adjacent buildings during an earthquake, is not incorporated while developing the simulation dataset.
Considering the building locations and soil conditions can improve the damage prediction accuracy of ML models. 34,81 Several research works have reported pounding between adjacent structures during the 2015 Nepal earthquake. 82,83 Furthermore, the short column effect was also reported 82 in the 2015 Nepal earthquake, caused by landing beams for stairs, construction of buildings on moderate or steep slopes that creates long and short foundation columns, etc. However, the short column effect is not considered while generating the simulation datasets. Likewise, including other superstructure types such as mud mortar brick walls, timber, bamboo, etc., to reflect local construction materials, together with various building plan configurations, could increase the prediction accuracy.
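The maximum likelihood fitting of a fragility function mentioned in this section can be sketched as follows. The sketch assumes a lognormal cumulative distribution for the probability of exceeding a damage grade at a given PGA, with median theta and dispersion beta, and a binomial likelihood over groups of buildings. The PGA levels and exceedance counts are synthetic, and a coarse grid search stands in for a numerical optimizer; none of these values come from the study.

```python
import math

def lognormal_cdf(x, theta, beta):
    """P(exceeding the damage grade | PGA = x) for median theta, dispersion beta."""
    return 0.5 * (1.0 + math.erf(math.log(x / theta) / (beta * math.sqrt(2.0))))

def log_likelihood(theta, beta, pga, n_buildings, n_exceed):
    """Binomial log-likelihood (constant binomial coefficients omitted)."""
    total = 0.0
    for x, n, z in zip(pga, n_buildings, n_exceed):
        # Clamp probabilities so that log() stays finite.
        p = min(max(lognormal_cdf(x, theta, beta), 1e-12), 1.0 - 1e-12)
        total += z * math.log(p) + (n - z) * math.log(1.0 - p)
    return total

def fit_fragility(pga, n_buildings, n_exceed):
    """Grid search for the (theta, beta) pair that maximizes the likelihood."""
    best = None
    for i in range(1, 201):          # theta from 0.01 g to 2.00 g
        theta = 0.01 * i
        for k in range(1, 101):      # beta from 0.02 to 2.00
            beta = 0.02 * k
            ll = log_likelihood(theta, beta, pga, n_buildings, n_exceed)
            if best is None or ll > best[0]:
                best = (ll, theta, beta)
    return best[1], best[2]

# Synthetic illustration only: PGA levels (g), buildings observed at each
# level, and how many of them reached or exceeded the damage grade.
pga = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
n_buildings = [40, 40, 40, 40, 40, 40]
n_exceed = [1, 6, 14, 22, 32, 37]

theta_hat, beta_hat = fit_fragility(pga, n_buildings, n_exceed)
print(f"median PGA = {theta_hat:.2f} g, dispersion = {beta_hat:.2f}")
```

One fragility curve of this kind would be fitted per damage grade, each giving the exceedance probability as a function of PGA.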

CONCLUSION
The assessment of building damage after an earthquake is a crucial step for decision-makers and stakeholders in emergency response and recovery planning. The damage state of a building can vary from negligible or slight damage to collapse, depending on building characteristics, soil conditions, and earthquake and ground motion characteristics. This study aims to obtain a reliable and rapid prediction of the seismic damage of RC buildings after an earthquake using ML techniques. For this purpose, different ML techniques are utilized, namely RF, SVM, DT, ANN, and KNN. These ML models are trained using the simulation datasets, and their performance is evaluated using the real earthquake damage datasets obtained after the 2015 Nepal earthquake. The simulation datasets are prepared considering 35 RC buildings with different structural characteristics, and nonlinear time-history analysis is carried out using 11 seismic waves recorded by K-Net and KiK-Net. To address the uneven distribution of samples across the damage grades, the amplitude scaling method is adopted. RF achieved the highest overall accuracy of 99.83% and KNN the lowest accuracy of 74.5% on the training dataset. This shows that the ML models adopted in this study are capable of successfully predicting the seismic damage of RC buildings.
The ML models trained using the simulation dataset are further used to predict the seismic damage of RC buildings recorded after the 2015 Nepal earthquake. An overall accuracy of over 62% is obtained from all the ML algorithms considered in this study. RF obtained the highest prediction accuracy of 74.62% on the testing dataset, and KNN the lowest prediction accuracy of 62.68%. Furthermore, the performance of these ML models is evaluated based on precision, recall, and f1-score. RF performed better than the other ML models. The performance of the different ML models on both the training and testing datasets implies that ML techniques can be utilized for successful prediction of seismic damage grades in a short period of time with reasonable accuracy.
This study is carried out to evaluate the seismic damage of RC buildings after an earthquake using different ML techniques. Increasing the number of training datasets by considering more earthquake records can increase the seismic damage prediction accuracy for RC buildings. Furthermore, in the present study, only the RC buildings that have structural characteristics similar to those of the training datasets are considered as testing datasets to predict their damage grades after the earthquake. For example, the testing datasets include only buildings with 3-7 stories, a building age of up to 50 years, no attachment to neighboring buildings, a flat ground condition, two types of plan configuration, two types of superstructure, etc. However, the training datasets can be expanded further by considering the many other structural features that are available in the real earthquake damage database, as shown in Table 10. This can help in obtaining a more reliable and rapid seismic damage prediction for numerous RC buildings. In addition, this study can be extended to multiple types of building structures such as masonry, bamboo, wooden, steel, composite, etc., to predict the seismic damage of a larger number of buildings after an earthquake. The authors consider these as future work.

ACKNOWLEDGMENTS
The authors would like to thank Kyoshin Network (K-Net) and Kiban Kyoshin Network (KiK-Net) for providing the open-source database from which the earthquake events were downloaded.

CONFLICT OF INTEREST STATEMENT
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.