Predicting Geothermal Heat Flow in Antarctica With a Machine Learning Approach

We present a machine learning approach to statistically derive geothermal heat flow (GHF) for Antarctica. The adopted approach estimates GHF from multiple geophysical and geological data sets, assuming that GHF is substantially related to the geodynamic setting of the plates. We apply a Gradient Boosted Regression Tree algorithm to find an optimal prediction model relating GHF to the observables. The geophysical and geological features are primarily global data sets, which are often unreliable in polar regions due to limited data coverage. Quality and reliability of the data sets are reviewed and discussed in line with the estimated GHF model. Predictions for Australia, where an extensive database of GHF measurements exists, demonstrate the validity of the approach. In Antarctica, only a sparse number of direct GHF measurements are available. Therefore, we explore the use of regional data sets of Antarctica and its tectonic Gondwana neighbors to refine the predictions. With this, we demonstrate the need for adding reliable data to the machine learning approach. Finally, we present a new geothermal heat flow map, which exhibits intermediate values compared to previous models, ranging from 35 to 156 mW/m2, and visible connections to the conjugate margins in Australia, Africa, and India.


of 16
Alternatively, statistical analysis (Goutorbe et al., 2011;Lucazeau, 2019) can be used to establish global models of GHF. Various features deemed appropriate for heat flow characterization are used to predict GHF in areas not covered with measurements. The known values are extrapolated to regions with a similar geological setting, often relying on global data sets. For example, Lucazeau (2019) presented a new global GHF model by empirically selecting an optimum of 14 geological and geophysical observables. However, global maps often lack information in ice-covered regions and are mostly just interpolated into these areas. Therefore, the results from these studies might be a simplification for Antarctica, and an empirical decision might not necessarily be the right choice.
A promising alternative has been presented by Rezvanbehbahani et al. (2017) for estimating GHF in Greenland. In their study, they use a machine learning algorithm to find the optimal predictors. However, some features might not be meaningful for polar regions or are overrepresented.
As in the study for Greenland, we apply a supervised machine learning regression approach with a more thorough choice of data sets. We use a slightly enhanced implementation and geological features from various global and regional model to evaluate the influence of the input data. Furthermore, we use combined heat flow data from three different data sets, including a thoroughly compiled database for Antarctica (Burton-Johnson et al., 2020). As a result, we test the method's performance on the geophysically well-known continent Australia and present a new heat flow map for Antarctica.

Method
To make quantitative predictions, a regression algorithm that captures the complex linkages between heat flow and its geological environment is required. For this, we use gradient boosting regression, a supervised machine learning technique, which iteratively creates an ensemble of regression trees. Each tree additively fits to the loss function gradient of the previous (Friedman, 2001). Rezvanbehbahani et al. (2017) already give a comprehensive mathematical description with the application to heat flow and geological data.
The advantage of regression trees is that they are able to capture nonlinear relationships by doing recursive partitioning of the data with logical splitting conditions. Therefore, the algorithm walks through every feature to find the best threshold for splitting, constrained by some user-defined criteria like the maximum depth, maximum number of leaves, and minimum samples per leaf of the tree. At each iteration of the gradient boosting procedure, a tree is trained on a subset of the data to predict the steepest gradient descent step. In contrast to gradient descent, which optimizes the model parameters, gradient boosting optimizes (boosts) the model or tree itself. Saving the trees as an ensemble model in memory enables to output predictions for any future sample. Unlike, for example, neural networks, the regression trees are traceable and easy to interpret.

Regularized Tree Boosting: XGBoost
We use an advanced and more regularized implementation of gradient boosting regression, the open-source library XGBoost (Chen & Guestrin, 2016), provided among others for Python, which is highly efficient in its performance. Some of the advantages are the possibility of regularization, stochastic subsampling, parallel processing, and built-in cross-validation for parameter tuning. For example, the algorithm can prune a tree backward and remove splits that are beyond positive gain. In contrast to basic gradient boosting, this implementation would not stop splitting upon the first negative loss but go deeper if positive loss follows.
The goal is to find the best tree model that fits the training data x i and target values y i . The quality of the established model is then reviewed by a defined objective function (Chen & Guestrin, 2016 where n is the number of training values and K is the number of trees. L is the loss function that measures how well the model explains the training data by comparing them to the predicted values ˆi E y . Ω is the regularization term that regulates the complexity of the tree f k . Each f k is defined by an independent tree structure q and leaf Further overfitting is prevented by introducing a shrinkage that scales every new tree added to the model by a factor ϵ. It, therefore, controls the influence of an individual tree, and empirical studies showed that small values (ϵ ≤ 0.1) lead to better predictions (Friedman, 2001).

Final Model Evaluation
The data are divided into a training and a test set. The former is used to build the final model, which is then validated to the test data. Here, we use squared errors as loss function and a fivefold grid-search to find the optimal values for shrinkage, maximum tree depth, and subsampling. A quantitative model evaluation is carried out analogous to Rezvanbehbahani et al. (2017) by calculating the normalized root-mean-square error (RMSE/mean) of the test set and the model score and R 2 score, which are relative measures for how well the model explains the variability between observed and predicted train and test values, respectively.
For our analysis, we split the available observed heat flow values randomly into 80% training and 20% test data, but make sure that all Antarctic measurements are within the training data set. According to Rezvanbehbahani et al. (2017), the prediction accuracy improves by increasing the number of samples from the region of interest within the training set. The grid-search results in a shrinkage of ϵ = 0.01 for K = 1,000, a maximum tree depth of 11 vertices, a subsampling of 0.7, meaning that prior to building a tree, the training data are further split up and only 70% of the data are used. We chose γ = 120 and keep the default value λ = 1.
Furthermore, the importance of an individual feature for the model is calculated for every tree and subsequently averaged over the whole model. It describes the reduction of uncertainty of the target values y i provided by each attribute split point and is weighted by the number of observations in the respective node. Consequently, the feature becomes more important when it is essential for reducing the loss function and, therefore, contributes to improving the prediction model. Highly correlated input features can lead to an improper interpretation of their respective relevance. The split points might be chosen equally often between these features, thereby decreasing their associated importance. Reducing the number of strongly correlated features is, therefore, essential for a sensible analysis.

Data
In the following, we present the data that are used for model construction. As target values y i we use a GHF measurement compilation from three different databases and for the predictor variables x i , we suggest a number of geological features, which are linked to each GHF value.

Geothermal Heat Flow
Global GHF measurements are available on http://heatflow.org/ (Hasterok, 2019). We complemented this data set with the database from Lucazeau (2019) and for Antarctica, we explicitly used the revised data set from Burton-Johnson et al. (2020). It is important to mention that these thermal gradient measurements can only be understood as estimates of GHF because several processes might influence the result, for example, surface temperature variation and hydrothermal circulation.
Anomalously high values (>200 mW/m 2 ) are assumed to be not representative of a whole continent, but related to local geothermal processes (10 0 − 10 1 km, Bachu, 1988) and, therefore, not considered in our statistical analysis. In Antarctica, they are also associated with low quality and appear to be exceptional ( Figure S1). GHF can vary significantly over a lateral spatial resolution of ∼10 km. Employing a relatively high model resolution of 0.25° lead to artifacts and strong gradients in the predicted heat flow map, which is partly related to the low resolution of the global data sets. Therefore, we scaled the resolution down to 0.5° which appears sufficient to resolve the continental trends. As we concentrate on Antarctica, we excluded marine measurements and all measurements assigned the quality "D: Data not used in heat flow maps" according to Lucazeau (2019). As a consequence, the global ∼70,000 measurements are reduced to ∼10,000 after filtering, binning, and limiting to continental data (>1,000 m below sea level). The mean heat flow of the remaining measurements is 64 mW/m 2 with a standard deviation of 25.6 mW/m 2 ( Figures S2 and S3).

Geological and Geophysical Information
Data used for the model prediction need to be selected carefully and only features with a possible relation to heat flow are useful (Table 1). Furthermore, we tried to use data sets where independent information are included to prevent possible biases and global maps with homogeneous coverage. Some of the data sets used in earlier studies are quite unreliable for Antarctica or superseded by efforts to incorporate local data in recent years. A model built on information that is not available or associated with high errors in the region of interest is impractical. Therefore, we only chose data sets that include reliable data for Antarctica.
The crustal thickness is an essential parameter for heat flow estimation. We used a kriging interpolation of different Moho depths, which was compiled using active source seismic data  incorporating available seismic stations for Antarctica from the USGS GSC database (Mooney, 2015). In contrast to CRUST1.0 (Laske et al., 2013), it does not involve tectonic regularization, which results in smoother transitions. We chose the lithosphere-asthenosphere boundary (LAB) from the LithoRef18 model , calculated by joint inversion and analysis of multiple data sets: gravity anomalies, geoid, satellite-derived gravity gradients, and elevation along with seismic, thermal, and petrological prior information. The initial lithospheric thickness is a hybrid model based on several global tomography models. Tectonic units Schaeffer and Lebedev (2015) 6 Gravity mean curvature Ebbing et al. (2018) 7 Vertical magnetic field Ebbing et al. (2021) 8 Distance to ridges Coffin et al. (1997) 9 Distance to trenches Coffin et al. (1997) 10 Distance to transform faults Coffin et al. (1997) 11 Distance to young rifts Şengör and Natal'in (2001)  Along with depth information, we also used several distance measures to geological features like trenches, transform faults, and ridges from the PLATES project by Coffin et al. (1997). Likewise, we included the distance to young rifts from Şengör and Natal'in (2001), where young means not older than 65 Ma (according to Goutorbe et al. (2011)). Elevated heat flow is closely related to the vicinity of volcanoes. Geothermal exploration is often carried out in volcanically active regions, which might introduce a bias. On the other hand, volcanoes can be useful indicators of high heat flow. We used locations of Pleistocene volcanoes from the Global Volcanism Program (2013). For Antarctica, possible locations are given by van Wyk de Vries et al. (2018), who used ice sheet bed-elevation data to locate conical structures in West Antarctica, supported by aerogravity, satellite imagery, and databases of confirmed volcanoes. Yet, we only took results with an intermediate or higher certainty factor (≥3).
Indicators of the geological composition of the crust are gravity anomalies. We used the mean curvature, inferred from the two horizontal and independent satellite gravity gradient components (Ebbing et al., 2018). This parameter is used to interpret gravity anomalies by trying to delineate geometric information of subsurface structures from an observed nongeometric quantity. It describes how much a line deviates from being straight or a surface from being flat. Also, we wanted to reduce the effect from the density contrast between crust and mantle since we are already employing a Moho depth individually and, therefore, used the isostatic residuals.
We, furthermore, used a composed topography with the global EARTH2014 (Hirt & Rexer, 2015) and the Antarctic Bedmachine model (Morlighem et al., 2020). The former is an improved global topographic model compared to ETOPO1 (Amante & Eakins, 2009), which we combined with the newest available topographic high-resolution model of Antarctica based on mass conservation.
Schaeffer and Lebedev (2015) produced a tectonic regularization of the Earth's crust by clustering the resulting large phase and group velocity data set from the global tomographic model SL2013sv (Schaeffer & Lebedev, 2013). It clusters six continental and oceanic regions of different ages with a relatively low resolution of two degrees. To handle this categorical feature, we label encoded the clusters in the order of decreasing age: cratons: 1, PreC, fold belts modified cratons: 2, Phanerozoic continents: 3, Rigdes Backarcs: 4. We decided not to onehot encode this feature to decrease the dimensionality and facilitate the interpretation. Nevertheless, training the model with the one-hot encoded tectonic feature leads to almost identical results.
The magnetic signal from the crust is an important feature that potentially correlates strongly with GHF. The deepest magnetic sources can be assumed to correlate to the Curie depth, the temperature at which rocks (mostly magnetite for the crust) lose their ability to orientate in the direction of the applied field. The depth of this isotherm would be one of the most important factors for estimating GHF when known with low uncertainty (Lösing et al., 2020). We did not use Curie depth estimates themselves because it was always by far the most dominant feature in first tests. Regarding the uncertainties behind the Curie depth estimation (Núñez Demarco et al., 2020;Pappa & Ebbing, 2021), we prefer an independent prediction. Therefore, we use magnetic data from the satellite magnetic lithosphere model LCS-1 (Olsen et al., 2017). One problem arising with magnetic data is the directional dependency of the crustal field. To circumvent this, we use the magnetic field anomalies after reduction to the pole with an equivalent dipole approach .
In addition, we use the vertically integrated susceptibility model by Hemant and Maus (2005). The VIS model is established with a GIS-approach using a global geological map as a base. Within the geological units, laboratory susceptibility of different rock types and seismically inferred crustal thickness is used to define an optimized model by inverting satellite magnetic data. Although the model is not entirely independent from the magnetic data sets, we use it as an indicator of the prevailing geology and trust that the machine learning approach will be able to make an adequate weighting of the individual data sets.

Regional Data for Antarctica and Its Gondwana Neighbours
Besides the global data sets, we add regional data sets to the analysis. While our focus is on an improved assessment of GHF in Antarctica, we will also replace the Moho depth from the global data sets for the adjacent continents. Latest during Gondwana, southern Africa, Australia, and East Antarctica were connected (Meert & Van Der Voo, 1997). Tectonic affiliations can be observed (Daczko et al., 2018;Mulder et al., 2019) and are supported by gravity and magnetic data (Aitken et al., 2016;Ebbing et al., 2021;Ferraccioli et al., 2011). Similar to Pollett et al. (2019), we assume here that the tectonic history of these now separated continents has been relatively quiescent and that the crustal part is primarily influencing the surface heat flow in thermotectonic stable terrains (Förster & Förster, 2000;Mareschal & Jaupart, 2013).
Regional data for this part of Gondwana might improve the prediction model. The data coverage and quality for Australia are significantly better than for the other parts, and hence, we use Australia for the first test of our approach. For Australia, an estimate of the Moho depth is available by AuSREM (Australian Seismological Reference Model) (Kennett et al., 2018). For Southern Africa, we use the seismic Moho depth model by Youssof et al. (2013), which is concentrated on the Kalahari craton and not the entire region. The lithospheric model for Antarctica is based on the integrated geophysical data from Pappa et al. (2019). Additionally, we evaluate the application of a seismological model AN1 by An et al. (2015aAn et al. ( , 2015b. The latter is modified so that the West Antarctic Moho (everything west of the Transantarctic Mountains) of AN1 is replaced with the most recent and refined model by Shen et al. (2018) (called Shen18, hereafter).

Results
First, the validity of the machine learning approach is tested for Australia. Following, we investigate the significance of individual features and their influence on the model statistics. Next, we show the predicted geothermal heat flow in Antarctica and finally present some of the uncertainties of this approach.

Predicting Heat Flow for a Well-Known Region
For the first test of our approach, we predict GHF for Australia, where considerably more measurements are available (values have been prepared as described in Section 3.1) and geophysical information might be more accurate than for Antarctica. Figure 1a shows the prediction for the Australian continent and the available measurements.
The model agrees well with most of the measurements, but few very high values (≥120 mW/m 2 ) are somewhat underestimated. However, the result resembles the kriging model from Pollett et al. (2019) with the three major provinces: eastern, central, and western Australia, ranging from values between ∼30 and 40 mW/m 2 in the Yilgarn Craton to 100-120 mW/m 2 in the Cenozoic part of East Australia and the Adelaide Fold Belt.
Because of the relatively high amount of measurements, we can use them exclusively to establish the Australian model. This leads to a measurably better correlation (R 2 increases by 0.05) between actual and predicted values for Australia than by using all global values (Figure 1b). Only one value of 159 mW/m 2 stands out. It is more accurately predicted by all global measurements, due to the higher amount of high heat flow values (≥160 mW/ m 2 ) within the global data. For Antarctica, this approach is not practicable because of the sparsity of continental measurements.

Testing Individual Features
The prediction strongly relies on the deployed features. Therefore, it is important to review their quality and their significance for heat flow. We can evaluate the different impacts of each feature on the training set (model score), the test set (R 2 score), and the normalized RMSE by running the algorithm with only one feature at a time (Figure 2). It is noticeable that high model scores are not necessarily accompanied by high R 2 scores or the other way around. The scores range from 0.10 to 0.29 for the training set and from 0.001 to 0.11 for the test set. However, the RMSE does not vary significantly, with values between 0.37 and 0.39. The distance to volcanoes is yielding the highest score in the model training and the second-lowest RMSE. Hence, the proximity to volcanic regions is essential for heat flow prediction and is most accurate according to the regression boosting algorithm. The feature "Tectonics" inferred from seismic tomography indicates a high R 2 score but a low model score.
Further, we investigated the influence of the number of features by gradually increasing it from 1 to 12 (only one Moho and LAB depth, Figure 2). We successively add the feature with the next highest model score, according to their ranking. Therefore, we start with the distance to volcanoes, add Moho depth subsequently, and so on. The progressions of the model score and R 2 score and the normalized RMSE show convergent behavior. Up to six features, the scores and the RMSE change noticeably, after which they reach a plateau, and adding more features has no influence on these measures anymore. The resulting scores for all 12 features are 0.94 for the training set and 0.44 for the test set, meaning that our prediction model explains almost half of the data variety. The normalized RMSE settles in a value of 0.29, so on average, the prediction makes a relative error of 29%.

Heat Flow Prediction for Antarctica
The 12 different features (Table 1) are now used to predict the heat flow for Antarctica. For the Antarctic Moho and LAB depths, like in all former calculations, we chose the lithospheric model by Pappa et al. (2019). Figure 3 shows the resulting heat flow model and the few existing measurements on the continent with their respective measured heat flow value.
Overall, the machine learning approach predicts higher heat flow in West than in East Antarctica, which is consistent with previous heat flow models (e.g., An et al., 2015b;Martos et al., 2017;Shen et al., 2020). The predicted mean heat flow of 63 mW/m 2 is close to the global mean of 64 mW/m 2 . Exceptionally high values up to 156 mW/m 2 are predicted for the region between Ellsworth Land and Marie Byrd Land, which are lower than the extremely high measurement of 180 mW/m 2 (Fisher et al., 2015) in this area. The heat flow in East Antarctica appears to be more homogeneous and ranges mostly between 50 and 60 mW/m 2 . However, we find the lowest values (<40 mW/m 2 ) in Dronning Maud Land and around Dome Fuji. Most of the available measurements in Antarctica fit well to the predictions. In general, they are somewhat underestimated. Plotting predicted values against their actual counterparts (Figure 4) shows that most predictions are close to the exact prediction on the diagonal line. Still, heat flow values higher than 90 mW/m 2 are all below. Especially, the exceptionally high value of 187 mW/m 2 in Enderby Land, which is the binned mean of two measurements (Nagao & Kaminuma, 1983) is immensely underpredicted. It is visible that this particular value has no considerable influence on the prediction and the immediate neighbor, a relatively low measurement of 50 mW/m 2 is fitted better. In contrast, the high value beneath the West Antarctic Ice Sheet has a significant influence, and a large area exhibits noticeably warm heat flow.
How the features contributed to the minimization of the loss function can be shown as importance of each feature. According to the algorithm, proximity to volcanoes and Moho depth have the most substantial effect on heat flow predictions (Figure 4). The subsequent order of feature importance's can vary slightly. Magnetic field and LAB seem to have an inferior influence and are, in this case, the two least important features.

Uncertainties
The uncertainties of this method are considered to be significantly dependent on the features used. Figure 5 shows heat flow predictions for three different selections of features and their respective differences to the result presented above (Figure 3).
Because of the high importance of the Moho depth on the prediction (Figure 4), we investigate the influence of using a different lithospheric model than in Section 4.3. Therefore, we exchange it with the combined lithospheric model AN1 and Shen18. The prediction for this case shows mainly colder values, especially in East Antarctica (Figures 5a and 5b). In a broad region of George V Land and Wilkes Basin, we can observe the highest differences of about 30 mW/m 2 .
Further, we show the heat flow prediction with only global input features (Figures 5c and 5d). It is visible that coastal regions are slightly colder, and central Antarctica exhibits slightly warmer areas.
Lastly, we used only six features with the highest model scores according to Figure 2 (volcanoes, Moho depth, trenches, LAB, ridges, and transform faults). Here, the same regional models as in Section 4.3 are used. Compared to the results in Section 4.3, both East and West Antarctica are considerably colder, particularly between Ellsworth Land and Marie Byrd Land with the highest differences up to 66 mW/m 2 (Figures 5e and 5f).
We calculated the maximum absolute difference between all presented prediction models (results from Section 4.3 and Figures 6a-6e). It is visible that West Antarctica is considerably more uncertain than East Antarctica, with absolute differences of up to about 82 mW/m 2 . High uncertainties are also observed in George V Land and Terre Adélie, whereas most of East Antarctica indicates minor differences between 0 and 25 mW/m 2 . The resulting maximum and minimum heat flow maps are intended to indicate the upper and lower boundary for potential dynamic ice sheet simulations ( Figure 6).

Discussion
The established prediction model exhibits an R 2 score of 0.44 in the test set and a relative RMSE of 29%. Both values are lower than the respective results from Rezvanbehbahani et al. (2017), which most probably results from the applied Gauss filter on heat flow in their study. The reduced variability of the heat flow values can be captured more accurately by the low-resolution global models. We decided not to Gauss filter the measurements because binning and excluding extremely high values (>200 mW/m 2 ) is already a significant modification.
The first test for Australia shows a close fit between actual and predicted heat flow. The actual measured values are considered to be more reliable and to better reflect the regional scale due to the higher spatial coverage. Further, Australian heat flow estimates are derived from bedrock boreholes in contrast to most Antarctic ones, which allows the measurement of thermal conductivity and heat production from samples (e.g., Mather et al., 2018;Pollett et al., 2019). Thus, regarding the close fit and the agreement with previous studies (Artemieva & Mooney, 2001;Cull, 1982;Pollett et al., 2019), we are optimistic that the method gives reasonable results.
Likewise, for Antarctica, the predicted values fit closely to the actual measurements. However, values higher than 90 mW/m 2 are predicted slightly colder. One reason for this is possibly that the ratio between global heat flow measurements above and below 90 mW/m 2 is about 1:3 and for continental values almost 1:4. On the other hand, measurements are likely biased toward tectonically active regions and hence high values due to the nature of geothermal exploration. Therefore, we are not aiming for a perfect prediction of the measurements because that would mean overfitting, which is already restrained by the algorithm. And, regarding the uncertainties that go along with these measurements (Burton-Johnson et al., 2020), it is probably more realistic to have low variance and hence a more reasonable regional trend.
On the other hand, the method primarily relies on the measurements. As a result, the area around the South Pole is likely to exhibit a close fit to the value of 61 ± 1 mW/m 2 (Price et al., 2002), which is different from suggested elevated heat flow in this area based on estimated melt rates from ice sheet modeling (Jordan et al., 2018).  (Fisher et al., 2015) and 240 mW/m 2 (Clow et al., 2012), where latter has been filtered out before binning. Both go along with very high uncertainties (the former value is calculated from an earlier study on ice cores by Fudge et al. (2013) so no thermal equilibration was accomplished). Though there are two of these high measurements in the same place from different studies, it is more likely to have elevated heat flow in this area. In this study, we show the results with the inclusion of the former value into the training set. This yields noticeably high heat flow in West Antarctica, yet not as high as the actual measurements. Omitting this value leads to significantly colder heat flow in this area, indicating that the geological setting might be unique. One high value in East Antarctica (Lon: −39.5, Lat: −69.5) results from binning two values 184 and 190 mW/m 2 from one publication (Nagao & Kaminuma, 1983). Again both of these values are highly uncertain. A very close value of 50 mW/m 2 , also from the same publication, probably compensates for their influence (all three values are within ∼30 km). A comparably high variability is also found, for example, by (Begeman et al., 2017) measuring a value of 88 ± 7 mW/m 2 near the Whillans ice stream and close (∼100 km) to another measurement of 285 ± 80 mW/m 2 (Fisher et al., 2015). They suggest shallow magmatic intrusions or fluid advection in the crust as a possible explanation. In our model, we did not account for such high variability.
Testing the inclusion of global heat flow measurements up to 400 mW/m 2 in the algorithm leads to slightly increased heat flow in already warm regions with differences up to 40 mW/m 2 ( Figure S4). Still, values over 90 mW/m 2 are underpredicted ( Figure S5).
In this study, we focused on a feasible selection of features and analyzed their influence on the prediction. Using data, which are not available in polar regions due to the thick ice layer like heat production provinces or rock type, might improve the statistics for global prediction but has no added value for Antarctic heat flow prediction. Distinct choices of features can lead to very different predictions, especially for regions without constraining measurements, even if the model statistics do not change.
According to the algorithm, proximity to the next closest volcano and crustal thickness are good predictors with high training and test scores. Tectonic units by Schaeffer and Lebedev (2015) display contradicting scores, yielding the lowest model score but at the same time the highest R 2 score. This might emanate from its categorical character and the large-scale structures, which lead to a low variance prediction model. However, the order of feature importance for the final heat flow prediction (Figure 4) does not necessarily coincide with the order of the individual scores (Figure 2), which indicates that the algorithm builds a more complex relationship among the features than a linear regression would probably do.
Global models are often strongly inaccurate in Antarctica due to low data coverage. Therefore, we decided to included regional data and models, which are more detailed and incorporate more information from, for example, satellite data or exclusive Antarctic surveys. For the final heat flow model, we preferred the lithospheric model by Pappa et al. (2019) because it combines satellite gravity gradients with seismological estimates (from AN1), thermodynamic modeling, and the consideration of isostasy. Subsequently, it is validated with seismic Moho depth estimates. The combination of different geophysical observables makes the model more robust. The alternative lithospheric model (combination of AN1 and Shen18) exhibits lower values in George V Land and Terre Adélie. Specifically, this area represents the Antarctic tectonic counterpart of the Australian Gawler Craton (Payne et al., 2009;Williams et al., 2018). The high heat flow region in southeast Australia fits well to the warm region in George V Land in our presented map (Section 4.3), which supports our decision. At all conjugate margins, the predicted heat flow in East Antarctica (Section 4.3) exhibits visible connections to its presumed tectonic counterparts (Figure 7). In contrast, previous heat flow models (An et al., 2015b;Martos et al., 2017;Shen et al., 2020) do not indicate these connections (Pollett et al., 2019).
However, apart from the east-west division, there is no noticeable consensus. While the magnetic model by Martos et al. (2017) exhibits very high heat flow regions (130-180 mW/m 2 ) in West Antarctica, the model by Shen et al. (2020) is relatively cold and no heat flow higher than 90 mW/m 2 is found. Our result could be classified as an intermediate model with values between these rather extreme models. For comparison, we calculated the maximum absolute difference between the models by Martos et al. (2017) and Shen et al. (2020) and the predicted final model from this study (Figure 8). Therefore, we resampled the former models to a resolution of 0.5°. We can observe that the heat flow models mainly agree in central East Antarctica and very poorly in West Antarctica with a maximum absolute difference of 125 mW/m 2 .

Conclusions
Machine learning based on gradient boosting regression is a suitable approach for predicting heat flow, which has been demonstrated for Australia and the largely unknown content of Antarctica. The machine learning approach overcomes simplifications like using laterally constant thermal values and does not rely on a single geophysical method but uses several different features. Further, the algorithm learns from other continents with higher coverage of geophysical data and available heat flow measurements.
Our new Antarctic GHF model incorporates all available measurements and considers the information of in total 12 geological and geophysical observables, which contain information on regional tectonic settings and geology. To study the advantage of using regional data sets, we included crustal depth models of Antarctica and its conjugate margins of South Africa and Australia to improve and specify the prediction for Antarctica. A simple Gondwana reconstruction shows the similarities in heat flow predictions for the conjugate margins, illustrating the added value using the formerly adjacent continents as guidance for East Antarctica. While uncertainties remain, we provide minimum and maximum results, which can be used as input for further sensitivity analysis, for example, for ice sheet dynamics and ice flow (e.g., Rogozhina et al., 2012;Seroussi et al., 2017). Future efforts should not only aim to improve the statistics of the prediction model but deliberate about which and how many features are reasonable and trustworthy. As shown here, the addition of regional data is preferred for global data with limited accuracy in polar regions.
Furthermore, parameters like heat production and thermal conductivity have not yet been considered but would certainly change the results significantly (Lösing et al., 2020). However, to achieve this, a wide rock property database must be established. Also, using updated data like a new tomography for cluster analysis of tectonic units along with a multivariate analysis of lithospheric domain boundaries like Stål et al. (2019) could be beneficial.

Data Availability Statement
The authors are grateful for the revision and comments from Wolfgang Szwillus and thank Jonas Liebsch for the compilation of both global heat flow data sets. Heat flow data bases used in this study: (a) Hasterok (2019) http:// www.heatflow.org/ (downloaded 10.02.2020, an updated version is now available by Jennings et al. (2021)).