Integration of a Deep-Learning-Based Fire Model Into a Global Land Surface Model

Fire is a crucial factor in terrestrial ecosystems playing a role in disturbance for vegetation dynamics. Process-based fire models quantify fire disturbance effects in stand-alone dynamic global vegetation models (DGVMs) and their advances have incorporated both descriptions of natural processes and anthropogenic drivers. Nevertheless, these models show limited skill in modeling fire events at the global scale, due to stochastic characteristics of fire occurrence and behavior as well as the limits in empirical parameterizations in process-based models. As an alternative, machine learning has shown the capability of providing robust diagnostics of fire regimes. Here, we develop a deep-learning-based fire model (DL-fire) to estimate daily burnt area fraction at the global scale and couple it within JSBACH4, the land surface model used in the ICON-ESM. The stand-alone DL-fire model forced with meteorological, terrestrial and socio-economic variables is able to simulate global total burnt area, showing 0.8 of monthly correlation ( r m ) with GFED4 during the evaluation period (2011–2015). The performance remains similar with the hybrid modeling approach JSB4-DL-fire ( r m = 0.79) outperforming the currently used uncalibrated standard fire model in JSBACH4 ( r m = −0.07). We further quantify the importance of each predictor by applying layer-wise relevance propagation (LRP). Overall, land properties, such as fuel amount and water content in soil layers, stand out as the major factors determining burnt fraction in DL-fire, paralleled by meteorological conditions over tropical and high latitude regions. Our study demonstrates the potential of hybrid modeling in advancing fire prediction in ESMs by integrating deep learning approaches in physics-based dynamical models.


• Deep neural networks (DNN) can
accurately predict global burnt area fraction on a daily scale • Integration of the DNN into a physics-based land model significantly improves the estimation of biomass burnt damage in vegetation dynamics • The DNN accounts for regional fire variations by assigning varying degrees of importance to each predictor

Supporting Information:
Supporting Information may be found in the online version of this article.
represent fires in Earth system models (ESMs), with uncertainties in the representation of fire disturbance still dominating the overall uncertainties in the estimation of carbon fluxes from land (Hardouin et al., 2022).
Global fire models have been developed based on empirical and physical understanding of the fire process, and these have been incorporated within dynamic global vegetation models (DGVMs) (Hantson et al., 2016).In the early stage of global fire modeling, burnt area was estimated based on the amount of dry fuel and the length of fire season (Thonicke et al., 2001).The representation of frequency of fire occurrence was advanced by considering weather-driven fire risk (Lenihan, 1998).Venevsky et al. (2002) added characteristics of fire spread by adopting the Rothermel's rate-of-spread (RoS) equations (Rothermel, 1972).Based on the RoS, more advanced fire related physical representations were introduced (Pfeiffer et al., 2013;Thonicke et al., 2010) and implemented in various DGVMs (Drüke et al., 2019;Lasslop et al., 2014;Yue et al., 2016).Human activity impacts are also considered as nonlinear functions for fire ignition and suppression based on population density, gross domestic product (GDP) and land-use changes (Kloster et al., 2010;le Page et al., 2015;F. Li et al., 2013).
Although there has been remarkable progress in global fire modeling, there are still many challenges remaining to represent the fire process and fire-vegetation interactions.For instance, fire characteristics, such as the completeness of combustion and plant mortality, are not robustly parameterized to reflect differences depending on vegetation types (Lasslop et al., 2014).Uncertainties in vegetation effects on fire remain as a main drawback in DGVMs (Forkel et al., 2019).Besides, while fire modeling has advanced with more sophisticated process based representations, there is still no agreement on the optimal level of complexity for a global fire model (Hantson et al., 2016).
Deep learning (DL), as a subset of machine learning (ML), has recently been incorporated in fire studies leading to significant advances within different aspects of fire science.For instance, convolutional neural network (CNN) is a class of deep learning algorithms that utilizes convolutional layers to extract spatial features from images or grid format data sets.The fundamental concept underlying CNN lies in its utilization of convolution layers.Convolution is a mathematical operation involving a small filter to detect and capture local patterns.By leveraging CNNs, the spatial behavior of fire has been successfully captured (Hodges & Lattimer, 2019;Radke et al., 2019).The long short-term memory modeling (LSTM, see Section 2) approaches have also demonstrated their capability in predicting fire damage and duration (Z.Li et al., 2021;Liang et al., 2019).As an extension of LSTM architecture, convolutional-LSTM (Shi et al., 2015) combines the advantages of both LSTM and CNN, rendering it particularly well-suited for tasks that necessitate the simultaneous understanding temporal patterns and spatial information.The model first processes input through convolutional operations to generate feature maps, similar to traditional CNNs.Subsequently, LSTM cells take these results as input and maintain their hidden states over time, facilitating the capture of temporal dependencies.Kondylatos et al. (2022) incorporated meteorological, environmental and anthropogenic drivers into a convolutional-LSTM to comprehensively address the spatiotemporal context for wildfire danger prediction.Other studies leveraged ML/DL methods to characterize various aspects of fire occurrence, such as fire weather (Son, Kim, et al., 2022;Son, Ma, et al., 2022), lightning ignition (Coughlan et al., 2021), fire susceptibility (Zhang et al., 2021) and fuel availability (D'Este et al., 2021).
The main objective of this study is to leverage the advantages of DL in enhancing a process-based model, with a specific focus on improving biomass burnt damage simulation.To achieve this, we first develop a DL-based global fire model, consisting of three independent modules representing weather-driven fire danger, land properties, and anthropogenic effects on burnt areas.Subsequently, we integrate this model into the land model of the Icosahedral Non-hydrostatic Earth System Model (ICON-ESM), serving as a surrogate for process-based fire representation.Lastly, we apply an interpretable method on the DL model to analyze regional characteristics and the limits of our hybrid modeling approach.Compared to a previous DL surrogate fire model (Zhu et al., 2022), our study has advances in two folds: (a) we incorporate LSTM based recurrent model architecture to consider time dependent memory effects from dynamic weather and vegetation processes; and (b) our model training was based on observational data sets, except for fuel load, allowing it to be coupled with any DGVM.
SON ET AL. 10.1029/2023MS003710 3 of 17 model implemented to estimate fire damage based on combustible fuel availability and fuel dryness (Jungclaus et al., 2022).As one of the most simple fire representations, it can be applied in any global land surface model.The primary objective of the fire scheme is more focused on the disturbance effect on natural land cover changes, rather than fire occurrence and interactions, limiting its role on vegetation dynamics and carbon cycling in ecosystems.Instead, the previous version of JSBACH (JSBACH3.2) used the SPITFIRE fire model (Thonicke et al., 2010) to simulate global fire regimes, but it has not yet been implemented in JSBACH4.
In the simple fire scheme, the fuel availability is represented by the total litter density (   ) and is compared to the litter threshold (  0 ).The fuel dryness is estimated from surface level air relative humidity (  rh ) smoothed with a persistence factor (p) at each time step (Equation 1).When the humidity decreases lower than its threshold (  rh0 ), the fraction of burned area (  FBA ) is assumed to linearly increase as humidity decreases: (1) where,   denotes the frequency of fire occurrence: set as 6 years for woody and 2 years for grass type vegetation.The burnt fraction is computed on a daily time step and is utilized to update the relocation of carbon and nitrogen, assuming that all vegetation within the burned area perishes, notwithstanding the fact that only a part of it undergoes complete combustion.We take the simple fire model (hereafter referred to as JSB4-simple) as the baseline for model evaluation.The standalone version of JSBACH4 is used to run JSB4-simple with the default configurations as used in JSBACH3.2 and described in Reick et al. (2021).

Deep Learning (DL) Fire Model
The deep learning fire model (DL-fire) is composed of three modules: weather-driven fire danger, land properties and anthropogenic effects (Figure 1).The development of the modules for weather danger (W-LSTM) and land properties (L-LSTM) are based on the long short-term memory network approach (LSTM) (Hochreiter & Schmidhuber, 1997).LSTM is an advanced recursive neural network to handle temporal dynamic behaviors from  ([b,18], gray vector).The output of each module is merged by element-wise multiplication, resulting in a dimension of 8 (red vector), which matches the number of PFTs, except for the bare land type, used in L-LSTM.The burnt fraction is finally calculated by summing up the merged output per PFT, which is obtained by taking the inner product between the output and the PFT vector (orange vector), and multiplying it with two physical constraint terms.sequential data.The key aspect of the LSTM approach is its memory unit, called cell state that maintains information on states over timesteps, and its update is regulated by input and forget gates: (3) where  sigmoid is the sigmoid function,  tan ℎ is the hyperbolic tangent function, and ⨀ denotes the element-wise product of vectors.The output dimension of the LSTM is set to 8 to be equal with the number of the plant functional types (PFTs), except for the bare land type.
The anthropogenic effect module uses two layers of fully connected feed-forward network: where   denotes the input vector for anthropogenic variables and   is hidden layer vectors.The   and   terms are weight matrices and bias vectors for the input and hidden vectors.The function  act represents a nonlinear transformation using a softplus function (Dugas et al., 2000) in this study.The vector   is the output vector of the anthropogenic effect module that has the same dimension as the outputs of the W-LSTM and L-LSTM modules.
The final output, the fraction of burned area, is the computed sum of all PFTs, except for the bare land type, after multiplying results of the three modules and the fractions of PFTs (orange vector in Figure 1).Also, we use the fraction of bare land (  _bare ) and snow (  _snow ), fuel (above ground plant litter in JSBACH4) and relative humidity not only as LSTM input predictors, but also as constraints on fire occurrence and intensity: × fire prone area × dry fuel availability (11) fire prone area = 1 −  _bare −  _snow (12) where   ,   ,   denote output vectors of W-LSTM, L-LSTM and anthropogenic effect modules and  _PFTs denotes the fractions of PFTs.We use sigmoidal curve function (   ) to transform relative humidity into a non-linear space. rh0 is the threshold of relative humidity for fire occurrence set as 60 (%),  fuelnorm is normalized fuel using its maximum and minimum values during the training period (Equation 15).

Burnt Fraction
For model training and evaluation, we used daily burned area from the Global Fire Emissions Database (GFED4) (Randerson et al., 2015) and calculated the burnt fraction for each grid cell.The GFED4 burned area product is based on the Moderate Resolution Imaging Spectroradiometer (MODIS) Collection 5.1 (MCD64A1 v5.1), globally available at 0.25° × 0.25° spatial resolution.
Extreme data imbalance between instances of fire and no-fire is observed over all regions (Table 1).If the data with a large proportion of no-fire instances are directly used for model training, it is highly likely to mislead model outputs to converge into zero values.In order to reduce the risk of zero convergence, we adopt two strategies.We first used a Gaussian kernel with 30 days of window size to smooth the burned area (step 1 in Table 1).Subsequently, we downsample no-fire instances according to ratios in Table 1 (step 2 ratio), reducing the imbalanced ratios to be close to 1:1 for all regions.

Input Variables
The DL-fire uses 50 predictors which are divided into three sub-modules to predict burnt fraction illustrated in detail in Tables 2-4.The weather danger module (W-LSTM) uses 9 predictors, including anomalies of temperature, specific and relative air humidity.Weather variables, such as temperature, specific/relative air humidity, wind speed and precipitation, are obtained from ERA5 (Hersbach et al., 2020) and lightning climatology is based on a data set from the spaceborne Optical Transient Detector (OTD) and Lightning Imaging Sensor (LIS) on the Tropical Rainfall Measuring Mission (TRMM) satellite (Cecil et al., 2014).The anomalies are calculated by extracting daily climatology (mean values on a day of year basis) during the years 1950-2020.
The land property module (L-LSTM) takes 23 predictors including the water volumes in four soil layers which are obtained from ERA5-Land (Muñoz-Sabater et al., 2021) and the Leaf Area Index (LAI) which is derived from the collection-5 MODIS LAI product (Myneni et al., 2015).We also calculate daily anomalies for the water volumes and LAI using the above mentioned method during 1950-2020 and 2003-2020, respectively.The topographic factors, such as elevation, slope and roughness, are taken from Amatulli et al. (2018).The amount of fuel is simulated by JSB4-simple.The area distributions of plant functional types (PFTs) are obtained from Pongratz et al. (2008), given as inputs for running JSBACH4 and we remap PFTs to be nine types as outlined in Table 3.
The anthropogenic effect module (A-NN) takes into account a total of 18 predictors from five different characteristics: population density (Klein Goldewijk et al., 2017), gross domestic product (GDP) and human development Fire:no-fire Step 1 Step index (HDI) (Kummu et al., 2018), total road density (Meijer et al., 2018) and 14 fractions representing the state of land use (Hurtt et al., 2020).
All the input variables are regridded and aggregated to a daily timestep and 0.25° spatial resolution to be consistent with the GFED4.Except for PFT fractions constrained in the range of [0,1], we normalized predictors using maximum and minimum values of each region based on the training period ( ,train_max and ,train_min , where r denotes a GFED region in Figure S1 of the Supporting Information S1), ideally to be in the range of [0,1]:

Model Setup for Training and Simulation With JSBACH4
We develop 14 regional models based on GFED reference regions (Figure S1 in Supporting Information S1).
The dimension of the hidden layer is set to be 64 for all the three module architectures and dropout regularization is implemented for the anthropogenic module layers with 10% of probability to randomly inactivate neural

Table 3
Input Predictors for Land Property Module (L-LSTM) and Their Brief Ecological Description network nodes.For the LSTM modules, the sequence length of training data set is set to 14 days.We use the mean square error (MSE) loss function with ADAM optimizer (Kingma & Ba, 2014) by setting the learning rate to 0.001 and batch size to 1,024.To avoid overfitting on the training data set, we stop model training after a span of 30 epochs where no further improvement is observed in the validation data set.
The DL-fire is trained without coupling to the dynamics of JSBACH4, as an offline learning approach.When the DL-fire is integrated into JSBACH4, all the land properties are provided by physics-based dynamics processes, except for topography.The other predictors are set to be forced by data sets used for model training and it allows the evaluation of simulation results from the year 2001.We perform experiments on the R2B4 ICON-grid system with spin-up time of 51 years, starting from the year 1950, and evaluate simulation results from 2001 to 2015.During the spin-up period (before the year 2001), we set all anthropogenic variables to be static at the state of 1st January 2001.

Evaluation Metrics
To quantify the performance in simulating spatial variation, we apply the normalized mean error (NME) with area weights suggested by Hantson et al. (2020):  We calculate the Pearson correlation coefficient between daily (r d ), monthly (r m ) and interannual (r i ) variability in predicted burnt fraction and GFED4, and the mean phase difference (MPD) to evaluate seasonal variation (Kelley et al., 2013).To quantify a distance between two phases, time unit is first transformed as an angle vector: where   denotes month (January-December).Then real (   ) and imaginary (   ) component vectors are calculated by: The phase (   ) is described by direction of the vectors (Equation 20) and MPD quantifies the phase difference by Equation 21: where  P is phase from model simulation and   from observation at grid cell   .

Layer-Wise Relevance Propagation
To interpret the decision making process of the DL-fire model, we apply the layer-wise relevance propagation (LRP) (Bach et al., 2015) to decompose contributions from the input space.LRP computes relevance scores for each individual input by propagating relevance from the model output back through the neural network layers.While the total amount of relevance scores in each layer is kept consistent, the relevance in a layer is redistributed to the previous layer considering weights and input values, and this process repeats until getting the scores for the input layer.Here, we normalized relevance scores for each timestep so that the absolute values sum up to 1. Then we composite the normalized scores during the evaluation period to compare relative attribution with a global aspect.

DL-Fire Model Evaluation
Globally, the predicted burnt fraction shows a good overall accordance with the GFED4 estimates during the evaluation period (Figures 2a and 2b) with a NME of 0.64 (Table 5).The pattern of seasonal cycle is also accurately captured with 0.3 of MPD and 0.73 of r d .Monthly aggregated predictions show a comparable correlation score (r m = 0.80) to that of a previous DL model (0.76) (Joshi & Sukumar, 2021), noting that the evaluation period is different for both studies.However, high fractions, especially in the second half of the years 2011 and 2012, are underestimated (Figure 2c) indicating a degrading performance skill in interannual variability (r i = 0.35).
Regionally developed models vary in their performance skills.All the regional models show a NME lower than 1.0 and the best score is achieved in the northern part of South America (NHSA, 0.48), whereas NME is relatively high in regions where it shows large burnt fractions, such as Boreal North America (BONA), the southern part of South America (SHSA), the southern part of Africa (SHAF) and Central Asia (CEAS).The model for Central America (CEAM) shows high predictability in seasonal variation with 0.19 of MPD, and the BONA, SHSA, Africa and Equatorial Asia (EQAS) also perform well with a performance higher than 0.8 of r d .The lowest daily correlations are obtained in the temperate North America (TENA, 0.47) and CEAS (0.41), showing underestimations in each of the fire seasons (Figures S2b and S2k in Supporting Information S1). 8 out of 14 regional models perform well on predicting interannual fire patterns with higher than 0.8 of r i .The least interannual predictability is shown across Southeast Asia (SEAS) and SHAF (r i = −0.14, 0.08) due to a lack in detecting high burnt fractions (Figures S2i and S2l in Supporting Information S1).These results, especially due to the SHAF region, cause a considerable drop in the interannual predictability at the global scale.

Coupling With JSBACH4
When the DL-fire model is coupled with JSBACH4 (JSB4-DL-fire), burnt fraction prediction skill is significantly enhanced in comparison to the simple fire model (JSB4-simple).JSB4-DL-fire improves NME score from 0.75 to 0.67 at the global scale, and NME decreases in 10 out of 14 regions (Table 6).Although burnt fractions in Africa and Siberia are underestimated, JSB4-DL-fire successfully captures the spatial variation of burnt fraction, especially across fire prone regions, such as Africa, South America, and Australia (Figure 3a).
Furthermore, burnt fractions in fuel-limited areas are improved to be close to zero in JSB4-DL-fire.JSB4-simple sets nonzero constant parameter for the minimum degree of fire damage (see Section 2), the results of JSB4-simple show higher than 0.1%/year of damage over almost all areas, including deserts and extremely cold regions (Figure 3c).Due to this oversimplified parameterization, arid areas and high latitudes, such as BONA, TENA, Europe (EURO), Middle East (MIDE) and Asia (BOAS and CEAS), show poor NME scores (2.34, 2.49, 2.06, 6.10, 1.40, and 1.39, respectively).These discrepancies are effectively addressed by JSB4-DL-fire with fuel and PFT constraints, improving NME to be lower than 1.0 across all the regions, except for MIDE.
The global spatial variation in fire seasonality is compared by visualizing the month with maximum fire damage per grid cell during the year 2001-2015 (Figure 3b).JSB4-DL-fire shows overall coincide fire season distribution with GFED4, and the best score of MPD is achieved over CEAM (0.19,  Table 6).Compared to JSB4-simple, the seasonal phase difference in AUST is also improved (MPD = 0.26), but JSB4-DL-fire achieves slightly increased scores in 8 out of 14 regions.Nevertheless, the most notable improvement in JSB4-DL-fire is found in temporal correlations.While the global mean of the JSB4-simple simulation has a statistically insignificant relationship with GFED4 (r d , r m ≈ 0 and r i = 0.17), the JSB4-DL-fire considerably increases the correlations (r d = 0.61, r m = 0.79, r i = 0.37).We also compare their seasonality during 2011-2015 (DL evaluation period), showing that the month to month variability in JSB4-simple is highly underestimated, showing a limited range in monthly burned area values, whereas spatial and seasonal patterns of JSB4-DL-fire generally match well with GFED4 (Figure S3 in Supporting Information S1).
Regionally, the performance of JSB4-DL-fire is most marked in SHSA and SHAF (Figures 4e and 4i) with scores higher than 0.8 of r d (Table 6).JSB4-DL-fire also effectively reduces underestimation in NHAF and AUST (Figures 4h and 4n) as well as the overestimation in BONA, BOAS and CEAS (Figures 4a,4i,and 4k).Among 14 regions, JSB4-DL-fire enhances r d in 9 and r m in 12 of them.In terms of interannual variability, the biggest improvement is found in BOAS, increasing r i from 0.1 to 0.76, whereas the variability in SEAS and MIDE are the least predictable (−0.04 and −0.12, respectively).Although JSB4-DL-fire outperforms JSB4-simple in general, in comparison to the model validation results forced by observation (Table 6), the predictability of DL-fire is degraded over almost all the regions by integrating with JSBACH4.These changes in predictability by being coupled with JSBACH4 will be further discussed in terms of JSBACH4 internal biases in the next section.

Model Interpretation
To understand how the DL fire model makes its predictions, we implement LRP for evaluating the contribution of each predictor.Globally, the fraction of bare land shows the highest absolute attribution with more than 16.3% of relevance score (Figure 5a).Its role, as a key component in identifying no or low risk of fire, is highlighted across regions, where there are large portions of arid lands or deserts, such as SHSA, MIDE, SHAF, and AUST (Figures S4e, S4g, S4i, and S4n in Supporting Information S1).Fuel load also shows a high ratio of contribution (14.1%) based on its multiple roles as a constraint (7.4%) as well as an input of L-LSTM (6.7%).The volume of water in the 4th soil layer (SWL4) counts as the 3rd key factor associated with burned fraction in that it can be considered an extreme condition when dryness has reached deeper soil layers.Considering that the sum of soil dryness-related variable scores occupies 34.4% of the total relevance, the changes in soil dryness play as key drivers in the DL-fire.Evaluation Metric Scores for JSB4-DL-Fire (JSB4-Simple) Meteorological predictors, despite their small impacts in the global aspect (6.2%, Figure 5b), display significant importance in some tropical and high latitude regions.For instance, tropical rain forests are very fire-resistant during the wet season due to high humidity.Models trained over NHSA and EQAS show high relevance of relative humidity and temperature to capture the climatic characteristics and their distinct seasonality (Figures S4d and S4m in Supporting Information S1).The strong influence of meteorological predictors is also noticeable over BONA and BOAS, especially temperature contributes the most (12.3% and 16.4% respectively) (Figures S4a and S4j in Supporting Information S1).These results are associated with fire-climate interactions in boreal forests where fire frequency and extent are affected depending on temperature variation (Hu et al., 2015;Kim et al., 2020) and their positive feedbacks under climate change (Oris et al., 2014).

Discussion and Conclusion
In this study, we introduce a deep learning based fire model (DL-fire) and implement it within the physics-based land surface model JSBACH4.The DL-fire predicts burnt fraction based on weather conditions, land properties and anthropogenic effects, performing well in predicting spatial and seasonal variation.When the DL-fire operates as a coupled module within JSBACH4 (JSB4-DL-fire), the quality of fire damage simulation improves noticeably compared to the simple fire scheme in JSBACH4.However, the predictability of JSB4-DL-fire is not as accurate as the validation results of DL-fire forced by observation.Since the only differences between the two are from land property predictors, either observed or simulated, its main reason is presumed to be internal biases of JSBACH4.   3 and 4. (b) Compares the relevance between the groups and their scores are displayed on top of bars.
To investigate the impact of JSBACH4 internal biases on fire prediction, we compare the predictors from a validation data set and the simulated by JSBACH4.In terms of global perspective, the JSB4-DL-fire predictions overall underestimate burned fraction from May to September, and subsequently its rising and falling seasonal pattern is roughly a month lagged from September to February (Figure S5 in Supporting Information S1).These similar discrepancies are found in LAI over Africa.The simulated LAI in NHAF is overall underestimated with a month lagged peak in its seasonality (Figure S6h in Supporting Information S1).In SHAF, LAI shows opposite seasonal behavior from July to November (Figure S6i in Supporting Information S1), causing an underestimation of fire damage (Figure 4i).
Regionally, MIDE and SEAS show the most apparent discrepancies due to overestimation in JSB4-DL-fire.JSBACH4 shows a tendency to underestimate water contents in all the soil layers (Figures S7-S10 in Supporting Information S1), except for the content of the first layer (SWL1) in MIDE (Figure S7g in Supporting Information S1).Considering that water availability in the topmost layer plays a vital role for vegetation (Seneviratne et al., 2010) and agricultural productivity (Battista et al., 2016), the biases of SWL1 can mislead DL-fire to exaggerate combustible fuel amount or its conditions on the ground.Similarly, overestimated durations of burnt fraction and LAI in SEAS coincide with each other (Figure 4l; Figure S6l in Supporting Information S1).To effectively address internal biases of physics-based models, it was suggested to merge deep learning as an external post-processing method (Reichstein et al., 2019;Son, Kim, et al., 2022;Son, Ma, et al., 2022).However, this approach is not directly applicable in this study due to dynamical interactions between predictors and DGVMs.
Instead, an online training approach, developing the deep learning model concurrently with running DGVMs will be our next step to advance the function of DL-fire in ESMs.
Representing interannual variability in global burnt area is yet a continuous effort for improvement in fire-enable DGVMs.Most of the DGVMs have not yet proven to successful in reproducing interannual variability (Hantson et al., 2020), and their limited skills cause uncertainties for the global carbon budget estimation (Bastos et al., 2020).Previous DL model showed ability to capture observed interannual patterns, but it still requires further verification due to its short evaluation period (Joshi & Sukumar, 2021).Although JSB4-DL-fire either performs well at a global scale, significant regional improvements are observed with higher than 0.7 of r i over 6 out of 14 regions (Table 6).These results suggest that ML/DL based hybrid approach can be a solution for the interannual variability problems in DGVMs.
A critical limitation of our DL-fire model is its lack of mechanical understanding.In contrast to process-based fire models that incorporate scientific principles to account for physical processes and ecological interactions, our current model primarily focuses on estimating burned area statistics without explaining underlying fire processes, such as ignition, spread and extinction.Consequently, this limitation can potentially result in inconsistent or unrealistic simulations in terms of fire frequency and duration.Furthermore, another issue may arise regarding the fire-climate-carbon feedback when the land model is coupled as a part of an ESM, which has not been evaluated in this study.
To address the limitations of ML/DL, it has been suggested to develop independent models specifically tailored for fire ignition and spread (Tang et al., 2023).Also, there has been a proposal to integrate computational fluid dynamics and finite element methods with ML (Ye & Hsu, 2022).These suggestions highlight the potential for further advancements in fire modeling by leveraging the strengths of ML/DL in conjunction with established fire dynamics.By pursuing these avenues of research, our upcoming study will focus on developing DL processes that effectively harmonize fire dynamics.
Humans influence fire regimes in various ways that either promote or limit fire.Population growth and urban expansion generally result in the proliferation of human ignition sources, thereby increasing the likelihood of fires.
In contrast, fire suppression efforts and changes in land-use practices contribute to a decline in fire activity (Andela et al., 2017;Bowman et al., 2011).Our model underrates roles of these factors showing conspicuously low global relevance (0.05%, Figure 5b).These consequences can be due to a coarse time resolution of anthropogenic data set.Since all the anthropogenic variables are interpolated from annual records or used as static values, they cannot provide any information associated to seasonal variation or anomalous daily events.Besides, some of the major man-made fire damages, particularly agricultural burnings, can be explained by weather seasonality and vegetation states (Korontzi et al., 2006).However, it should be pointed out that our model globally utilized C3 annual crops (c3ann) the most among anthropogenic drivers (Figure S11a in Supporting Information S1) to identify crop related activities, and regionally in NHAF, BOAS, SEAS, and EQAS (Figures S11i, S11k, S11m, and S11n in Supporting Information S1).Population follows as the second influential anthropogenic factor and HDI also show relatively higher relevances in developed regions (0.02% in TENA and 0.07% in EURO), echoing their socioeconomic impacts on fire (F.Li et al., 2013;Teixeira et al., 2021).These results may suggest its potential of further improvement of human impacts on fire activities with more sophisticate data set and adapted model architecture.
Small fires play a crucial role in shaping the global-scale patterns of burned areas and carbon emissions (Van Der Werf et al., 2017).However, our current model lacks training data for sub-500m fires, as it is not readily available on a daily scale.To compensate, we additionally evaluate our model's performance by considering burned fractions, incorporating sub-500m fires, but on a monthly scale (Randerson et al., 2012).Overall, the metrics demonstrate similar scores (Table S1 in Supporting Information S1), with a slight increase in NME over regions where the burnt fraction is underestimated, such as NHSA, NHAF, and SEAS (Figures 4d, 4h, and 4i), due to the inclusion of small fire, leading to an augmented observed burnt fraction.EURO shows the most notable decline in performance with r m of 0.61, although its biases are reduced by 0.66 in NME.We assume that the presence of a very small amount of burnt fractions resulting from Gaussian kernel smoothing prevents significant degradation in the model's performance, despite not explicitly considering small fires.
However, this approach does not comprehensively address the imbalance between small and big fires (Table S2 in Supporting Information S1), which could contribute to the suboptimal performance in extreme burnt fractions.The inclusion of small fires and their distinction from big fires should be considered another essential task to further enhance the model's predictive capabilities.
Regarding a global or local training approach, it can be argued which one in particular is a better option, either one single global model or multiple regional models.A global coverage model can be efficient in terms of model development and coupling with DGVMs, but for it not to lose regional characteristics, it may require more trainable parameters and higher complexity in architecture.When we tested to train a singular global model with the identical architecture as our local models (Figure S12 in Supporting Information S1), its global mean prediction accuracy notably decreased (r m = 0.1).As previously discussed (Zhu et al., 2022), the global training approach only demonstrates relatively satisfactory results across Africa, where high burned areas prevail (NME = 0.73 and 0.72 for NHAF and SHAF, respectively), while substantial biases are observed in other regions with low-fire areas.Considering that we were unable to achieve the desired outcome even with a higher dimension of hidden layers (not shown), there is a need to explore more advanced model architectures that can effectively incorporate local fire characteristics (Lehmann et al., 2014).For the local approach, there are two major points to be considered: (a) the number of regions that should be considered and, (b) whether a unified or a specialized model design for each region should be developed.Exploration of these options would enable us to further upgrade prediction performances, however, this is not addressed in this study.
One of the main purposes of ESMs is to project climate changes based on future scenarios.However, in this study, we decide not to project future fire regime changes with DL-fire, although it is technically executable.This is because our model is currently composed of 14 regional models, and it cannot practically reflect global bioclimatic changes.Finally, we argue that further approaches should focus on developing and training one global DL model coupled with the host land surface model, and by that learning aspects of regional fire variability which would support conducting fully hybrid projection simulations.

Data Availability Statement
We utilized the burned area data set from the GFED4 archive within the Global Fire Emissions Database (Randerson et al., 2015) as the primary target for developing and evaluating our model.Furthermore, meteorological date were retrieved from ERA5 (Hersbach et al., 2020), lightning climatology data were provided through the NASA Earth Science Data and Information System (ESDIS) project and the Global Hydrology Resource Center (GHRC) Distributed Active Archive Center (DAAC) (Cecil et al., 2014).The Earth Science Data Systems (ESDS) facilitated access to LAI from the MCD15A3H version 6 of the Moderate Resolution Imaging Spectroradiometer (Myneni et al., 2015).Global topography data are credited to Amatulli et al. (2018), GDP and HDI data sets are attributed to Kummu et al. (2018).Population density data were obtained from the History database of the Global Environment (Klein Goldewijk et al., 2017), total road density data was sourced from the Global Roads Inventory Project data set (Meijer et al., 2018), and land use states were acquired from the Land-Use Harmonization project (Hurtt et al., 2020).Our model simulation results are openly available in Zenodo at Son et al. (2023).

Figure 1 .
Figure 1.Flowchart of DL-fire model.An input vector with 50 predictors ([b,50], where b is the batch size) is divided into three sub-modules: 9 predictors for W-LSTM ([b,9], shown in blue vector), 23 predictors for L-LSTM ([b,23], green vector), and 18 predictors for A-NN([b,18], gray vector).The output of each module is merged by element-wise multiplication, resulting in a dimension of 8 (red vector), which matches the number of PFTs, except for the bare land type, used in L-LSTM.The burnt fraction is finally calculated by summing up the merged output per PFT, which is obtained by taking the inner product between the output and the PFT vector (orange vector), and multiplying it with two physical constraint terms.

Figure 2 .
Figure 2. Spatial and temporal comparison between and GFED4 and DL-fire predictions.The maps of (a) DL-fire and (b) GFED4 visualize annual burnt fraction averaged over evaluation period (2011-2015).(c) Compares global mean of burnt fraction from GFED4 (black) and DL-fire (blue).

Figure 3 .
Figure 3. Spatial maps of burnt fraction and its seasonality.The maps on the left (a.JSB4-DL-fire, c. JSB4-simple and e. GFED4) show annual burnt fraction averaged over the years 2001-2015, and the right (b.JSB4-DL-fire, d.JSB4-simple, and f.GFED4) visualize the peak month of burnt fraction.All areas with annual burnt fraction less than 0.1%/yr are masked out (white).

Figure 5 .
Figure 5. Global predictor importance assessment.(a) Shows predictors with the highest 30 LRP relevance scores and they are color-coded in four groups: weather conditions (blue), land properties (green), anthropogenic effects (gray) and PFTs (orange).Full names of PFTs and land use states (LU) are in Tables3 and 4. (b) Compares the relevance between the groups and their scores are displayed on top of bars.

Table 4
Input Predictors for Anthropogenic Effect Module (A-NN) and Their Brief Ecological Description   the simulated value and  cell area at grid cell   .  is the mean of the observed values.A smaller value of NME describes better agreement with observation and zero is for perfect match between observation and model simulation.If NME is larger than 1, model performance is worse than simple prediction with statistical mean value.