High‐Resolution Maps of Near‐Surface Permafrost for Three Watersheds on the Seward Peninsula, Alaska Derived From Machine Learning

Permafrost soils are a critical component of the global carbon cycle and are locally important because they regulate the hydrologic flux from uplands to rivers. Furthermore, degradation of permafrost soils causes land surface subsidence, damaging infrastructure that is crucial for local communities. Regional and hemispherical maps of permafrost are too coarse to resolve distributions at a scale relevant to assessments of infrastructure stability or to illuminate geomorphic impacts of permafrost thaw. Here we train machine learning models to generate meter‐scale maps of near‐surface permafrost for three watersheds in the discontinuous permafrost region. The models were trained using ground truth determinations of near‐surface permafrost presence from measurements of soil temperature and electrical resistivity. We trained three classifiers: extremely randomized trees (ERTr), support vector machines (SVM), and an artificial neural network (ANN). Model uncertainty was determined using k‐fold cross validation, and the modeled extents of near‐surface permafrost were compared to the observed extents at each site. At‐a‐site near‐surface permafrost distributions predicted by the ERTr produced the highest accuracy (70%–90%). However, the transferability of the ERTr to the sites outside of the training data set was poor, with accuracies ranging from 50% to 77%. The SVM and ANN models had lower accuracies for at‐a‐site prediction (70%–83%), yet they had greater accuracy when transferred to the non‐training site (62%–78%). These models demonstrate the potential for integrating high‐resolution spatial data and machine learning models to develop maps of near‐surface permafrost extent at resolutions fine enough to assess infrastructure vulnerability and landscape morphology influenced by permafrost thaw.

• High-resolution maps of near-surface permafrost on hillslopes were generated from machine learning models • The normalized difference vegetation index was the most important feature in prediction of near-surface permafrost • The support vector machines generally generated the highest balanced accuracy for predictions at sites where the model was not trained

Supporting Information:
Supporting Information may be found in the online version of this article. 10.1029/2023EA003015 2 of 17 Among the dominant factors controlling the distribution of permafrost in discontinuous regions are topography, vegetation, snowpack, and soil properties.A primary effect of topography is the influence on the amount of solar radiation absorbed by the soil surface.The magnitude of solar radiation received by the ground surface is influenced by topographic aspect with differences in north-south surface temperatures as great as ∼3-4°C (Hasler et al., 2015;Hipp et al., 2014).Elevation influences summer and winter air and soil temperatures and also influences the distribution and depth of snow (Jorgenson et al., 2010).Topography can also affect permafrost stability by controlling the distribution of snow across the landscape.For example, snow accumulates in topographic concavities, leading to the deepest late-season snowpack in these locations (Bennett et al., 2022;Parr et al., 2020).Vegetation influences permafrost distribution by regulating the temperature flux from the land surface to depth within the soil, and the feedbacks between vegetation and snow distribution have particularly strong effects on permafrost (Jorgenson et al., 2010;Lawrence & Swenson, 2011).Thick permafrost and thin seasonal thaw layers have been documented under dense spruce and alder vegetation, indicating that thick vegetation cover can protect the ground from solar radiation (Campbell et al., 2021).Snow has high albedo, which limits warming of the snow surface, and low thermal conductivity, which insulates the ground from cold winter temperatures (T.Zhang et al., 2001).Together, these properties of seasonal snow cover can result in increased mean annual ground temperatures and subsequent permafrost thaw (T.Zhang, 2005;Atchley et al., 2015).Soil properties, specifically soil moisture and organic layer thickness also influence active layer thickness (ALT).The impacts of soil moisture on ALT are complex, and increases in soil moisture have been demonstrated to result in both shallower and deeper active layers (Clayton et al., 2021).Thicker dry soil organic layers can promote shallower ALT by providing an insulating layer and protecting soils from warm summer temperatures (Fisher et al., 2016).While the effects of topography, vegetation, snow, and soil properties are broadly understood, their relative impact on near-surface permafrost at a single site are not well known, and to-date, no single dominant variable controlling permafrost loss has been identified.
Permafrost distributions are typically mapped at the scale of 10s of meters (Pastick et al., 2015) to kilometers (Gruber, 2012;Obu et al., 2019;Ran et al., 2022).Maps of global permafrost distributions, for example, have been generated at ∼1 km 2 scale using estimates of mean annual ground temperature (MAGT) derived from measurements of mean annual air temperature and adjustments for topography (Gruber, 2012).Similarly, a 1 km scale map of permafrost distributions in the Northern Hemisphere was generated using the temperature at the top of permafrost model, which calculates MAGT based on measurements of mean annual air temperature and empirical adjustments for the thermal effects of snow cover (Obu et al., 2019).An updated Northern Hemisphere 1 km resolution map of MAGT and ALT, which is the seasonally thawed soil layer, was produced using an ensemble statistical forecast, which included information about freezing degree days, thawing degree days, leaf area index, snow cover duration, precipitation, solar radiation, soil organic content, soil bulk density, coarse fragment content, and compiled ground measurement data (Ran et al., 2022).The resultant maps provide the highest accuracy circumpolar maps to date, with a root mean square error or 1.3 ± 0.13°C for MAGT and 86.61 ± 19.61 cm for ALT.
At a higher spatial resolution, near-surface permafrost (within 1 m of the ground surface) has been mapped at 30 m resolution for all of Alaska using empirical relationships between nearly 17,000 observations of permafrost presence versus absence and environmental variables, including topography, vegetation, and climate, with an average accuracy of ∼85% (Pastick et al., 2015).However, near-surface permafrost extent in discontinuous regions is often variable at meter scale (Douglas et al., 2016;Léger et al., 2019), and decameter-and kilometer-scale estimates are too coarse to accurately capture permafrost distributions.Although these regional and hemispheric maps of permafrost extent are useful for assessments of the potential global impacts of permafrost thaw, the products are too coarse to resolve permafrost distributions at a scale relevant to infrastructure stability, to identify feedback processes driving permafrost degradation, or to assess landscape dynamics related to permafrost disturbance.
Geophysical techniques have been demonstrated to be powerful methods for mapping the extent of permafrost at high spatial resolution to intermediate depths of 1-10s of meters (Campbell et al., 2021;Dafflon et al., 2016;Uhlemann et al., 2021).For example, electrical resistivity tomography (ERT), which images the distribution of electrical resistivity in the subsurface, can illuminate contrasts in subsurface resistivity where highly resistive layers are interpreted as permafrost in the absence of shallow bedrock.Measurements of ERT on the Seward Peninsula in Alaska, revealed that greater snow accumulation around tall shrubs insulates the ground, hindering freezing during the winter months, whereas thinner snowpack above graminoids allows for freezing of the ground 10.1029/2023EA003015 3 of 17 surface and prevents warm surface water from infiltrating deep into the subsurface (Uhlemann et al., 2021).Similarly, ground penetrating radar and electromagnetic induction have been used to develop high-resolution models of subsurface properties to infer distributions of permafrost at Twelvemile Lake in Alaska (Briggs et al., 2017;Campbell et al., 2021).Although geophysical methods can generate high-resolution predictions of permafrost extent, these methods are time consuming, spatially limited, and often require significant post processing, which precludes the use of these methods for broad-scale predictions of permafrost distribution.
Recently, studies have demonstrated the potential for supervised machine learning models to generate high-resolution predictions of the extent of permafrost within relatively small study areas, but application of these models has generally been limited to mountainous environments (Deluigi et al., 2017;Zakharov et al., 2021).Specifically, support vector machines (SVM), logistic regressions, and random forests have been used to generate maps of alpine permafrost probability at 10 m 2 resolution within 588 km 2 of the Western Swiss Alps, with an accuracy >80% for the training data sets (Deluigi et al., 2017;Deluigi & Lambiel, 2013).The use of machine learning models for predicting near-surface permafrost presence in tundra landscapes have been limited to a few studies (Campbell et al., 2021;Daly et al., 2022).An artificial neural network (ANN) was developed to predict the probability of permafrost occurrence at 2.5 m 2 resolution using ground-truth training data from interpretations of ground penetrating radar transects at Twelvemile Lake, Alaska (Campbell et al., 2021).Although the method was only tested on a small spatial area, the results at Twelvemile Lake highlight the strong potential for the integration of geophysical measurements, remote sensing metrics, and machine learning algorithms to map permafrost distribution at fine spatial scales.Similarly, a binary logistic regression was developed using a combination of borehole ground truth data, topographic variables, and remote sensing products to generate maps of near-surface permafrost, with an accuracy of ∼73% for a site within the Northwest Territories, Canada (Daly et al., 2022).At the site, vegetation class was a strong predictor of permafrost presence with a higher probability of permafrost occurring under coniferous forests and peat plateaus.
Here we test the application of machine learning models to generation of high-resolution maps of permafrost extent for three watersheds within the discontinuous permafrost region on the Seward Peninsula in Alaska.Specifically, we test the performance of three different classes of machine learning algorithms: a random forest variant, SVM, and ANNs.We use geophysical measurements and field observations to identify areas of near-surface permafrost and the geophysical ground truth data, topographic metrics derived from lidar point clouds, and multispectral indices derived from satellite imagery to train machine learning models to generate high-resolution (3 m) predictions of the extent of near-surface permafrost at each of the sites.The machine learning predictions provide insight into landscape and ecosystem metrics indicative of near-surface permafrost presence.We then assess the potential for transferability of machine learning predictions to sites outside of the training area.

Study Locations
The three sites selected for this study are all located on the Seward Peninsula in western Alaska (Figure 1).The climate of the Seward Peninsula is characterized as a cool continental and subarctic climate within the Koppen-Geiger classification system (Peel et al., 2007) Two sites are located near the Bering Sea coast, northwest of Nome, AK, along the Teller Highway at miles 27 and 47, and are referred to as Teller 27 (TL27) and Teller 47 (TL47), respectively.Both Teller sites are underlain by discontinuous permafrost (Uhlemann et al., 2021), and at both locations vegetation is largely dominated by mixed graminoids and with patches of tussock tundra (Figures 1e and 1f), with willow shrubs prevalent in the gullies and along streams (Konduri & Kumar, 2021).The third site (Kougarok; KG) is located roughly 100 km northeast of Nome and is situated along a small tributary to the Kutzitrin River.Vegetation at KG is similarly a mixture of graminoids and willow shrub (Figure 1g), but unlike TL27 and TL47, alders are present along various hillslope positions at the site (Salmon et al., 2019).

Geophysical Observations
Geophysical surveys were conducted at each of the three sites in the late summers of 2018, 2019, and 2021 to identify permafrost locations during the period of maximum seasonal thaw depth.Electrical properties are commonly used to map the subsurface in permafrost landscapes because electrical resistivity is directly influenced by subsurface temperatures (Uhlemann et al., 2021).Electrical resistivity tomography (ERT) data were collected along ∼200 to ∼600 m lines at four transects at TL27, two transects at KG, and six transects at TL47 (Figures 1b-1d).The transects were chosen to image areas that were previously identified by frost probing to represent transitions between presence and absence of near-surface permafrost; however the ERT data provide a more continuous measurement of near-surface permafrost presence compared to probing alone.ResIPy (Blanchy et al., 2020) was used to pre-process and invert the ERT data to derive the resistivity distribution (Uhlemann et al., 2023).
Depth-resolved soil temperature data co-located with resistivity observations were measured using a novel distributed temperature probe, which contains an array of digital temperature sensors spaced at 0.05 m inside a 0.8 m steel tube (Dafflon et al., 2022).Each co-located measurement location was recorded using a GNSS RTK GPS unit.Co-located measurements of mean resistivity between 1 and 3 m depth and temperature values at 0.8 m depth were classified as locations with or without near-surface permafrost using a weighted kMeans classifier.Samples with temperatures <1.8°C were classified as permafrost, but if resistivity values exceeded 3,000 Ωm, the temperature threshold for classification was increased to 2.4°C.Full details of the co-located measurement classification schemes are given in Uhlemann et al. (2023).

Manual Determination of Permafrost Presence Using Field Observations
At the TL47 site, geophysical evaluations of permafrost presence and absence were supplemented with direct observations of soil profiles exposed in soil trenches, frost probing, and measurements of temperature to 0.75-1.20 m depth collected in September 2021 and August 2022 using a SpotOn temperature probe, which has an accuracy of 0.8°C, and was calibrated to 0°C in ice water.Points were classified as indicating the presence of near-surface permafrost if the temperature was <0.5°C.Precise location data were collected at each observation point using a GNSS RTK GPS unit and labeled as either containing or lacking near-surface permafrost.Additionally, permafrost absence was manually assigned to points located within the gullies, where bedrock is exposed.These observations contributed an additional 134 ground truth points to the TL47 data set.At TL27 and KG additional determination of permafrost presence or absence were determined from 56 and 46 measurements of temperature at 0.8 m depth, respectively.Locations were determined to be underlain by near-surface permafrost if the temperature was <0.5°C.With the combined geophysical and manual observations, there are 168 non-permafrost and 75 permafrost points at TL27, 40 non-permafrost and 187 permafrost points at KG, and 193 non-permafrost and 202 permafrost points at TL47.Ground-truth location data are provided in E. Thaler et al. (2023).

Machine Learning Model Training
At each of the three sites, we tested the capability of three different classes of machine learning models (extremely randomized trees (ERTr), support vector machines (SVM), and artificial neural networks (ANN)) to predict the extent of near-surface permafrost.These three algorithms were selected because they represent dominant machine learning classes and have been used for mapping mountain permafrost in the Himalaya (Baral & Haq, 2020).The ERTr and the SVM models were developed using the ExtraTreesClassifier and the svm algorithms, respectively, from the Python package SciKit-Learn (Pedregosa et al., 2011), and we used the Python package Keras to generate the ANN.While input feature standardization is commonly performed prior to training machine learning models when features span different ranges or different units, we did not scale the data because the actual values of landscape metrics are important for determining landscape position, which is known to be a predictor of permafrost presence (Jorgenson et al., 2010).
ERTr (Geurts et al., 2006) is an ensemble learning method similar to random forests in which the results of numerous decorrelated decision trees are aggregated to generate a classification result.Unlike random forest algorithms, which create decision trees over bootstrapped subsets of the data set, ERTr algorithms generate decision trees over the full data set.ERTr can provide higher performance on noisy feature data (Geurts et al., 2006) like those associated with high-resolution topography.SVM (Gunn, 1998) is a linear classification model, effective for high-dimensional data sets, that separates data into classes by fitting an optimal hyperplane whose dimensions are defined by the input feature vectors.The algorithm defines the widest possible margin to separate the classes in the feature space.SVM requires a defined kernel parameter which is used to project the input vectors into high-dimensional space.Although many kernels exist for SVM, here we use a linear kernel which allows for the retrieval of feature importance, or a mathematical representation of the input features with the greatest predictive power.ANNs are composed of an input layer, an output layer, and hidden layers (Hagan et al., 1997).Unlike ERTr and SVM models which generate binary maps of permafrost extent (presence or absence), results from ANNs are given as the probability of occurrence (0%-100%).The ANN results were binarized by setting a threshold of 50%, where pixels with values greater than the threshold are classified as permafrost.
Hyperparameters are variables that control the model training process, and are variable for each of the three models trained in this study.For the ERTr and SVM models, hyperparameter selection was optimized using the RandomizedSearchCV function in Scikit-Learn, which randomly samples hyperparameters from given distributions and evaluates model accuracy using a K-fold cross-validation.For both the ERTr and SVM, we use a five-fold cross-validation and evaluated model performance by training the models on 80% of the data using the optimized hyperparameters and testing the accuracy of the model predictions on the 20% training data set.For each training-test split, the data were stratified such that each set maintained the same percentage of classes as the full data set.Hyperparameter tuning for the ANN was performed using the kerastuner Python library.Accuracy assessment of the optimized hyperparameters was performed on a 20% testing data set.
The ERTr, SVM, and ANN models were trained for two types of assessment: (a) at-a-site prediction and (b) transferability.The at-a-site models were trained using four different sets of training data, one for each of the individual sites and one for each combination of sites which included the given site.For example, the models for TL27 were trained using the following ground truth data sets: TL27, TL27 + KG, TL27 + TL47, and TL27 + KG + TL47.
For each method, the at-a-site models were trained using an 80-20 training and test split, and for each training data set, we used a five-fold cross-validation to assess model accuracy and uncertainty.Model transferability was evaluated by training models using 100% of the data from one or two sites and testing the model at the site from which data were not included in the model training.For example, a model for was trained using all of the ground truth data from TL47 and KG and was tested on near-surface permafrost predictions at the TL27 site.
For accuracy assessments of the models, we report Balanced accuracy (BA), which is commonly used to evaluate classifications when the classes are imbalanced, as is the case with the permafrost training data at TL27 and KG.
BA is defined as where TPR is the true positive rate, or the proportion of true positives (permafrost presence) correctly predicted by the model, and TNR is the true negative rate, or the proportion of negative classes (permafrost absence) correctly predicted by the model.BA values of 1 indicate a perfect prediction.We additionally evaluated model predictive performance of the transferred models by plotting receiver operator characteristic (ROC) curves, which are plots of the TPR versus the false positive rate, for each model and then calculating the area under the curve (AUC) for each ROC curve.AUC scores demonstrate the model's ability to distinguish between classes and range from 0 to 1, where 0 indicates the model has no ability to separate between classes, and 1 represents perfect classification.We also report precision, which is a measure of the classifier's ability to identify the near-surface permafrost pixels and recall, which is a measure of the proportion of true positives classified correctly by the model (3)

Topographic and Vegetation Metrics
Lidar data were collected by the National Center for Airborne Laser Mapping in August 2021 (Singhania, 2021).
The lidar points clouds were processed to generate a digital elevation model (DEM), with the vegetation removed using the lidR package (Roussel et al., 2020) developed for the R software.The topographic rasters were produced at 3 m spatial resolution to match the resolution of the multispectral features.The metrics slope and aspect were calculated with the Geospatial Data Abstraction Library (GDAL/OGR contributors, 2022) using the 3 m DEM.Topographic curvature, a measure of landscape convexity and concavity, was calculated as a polynomial fit over a 45 m window, which is the window size that smooths out the microtopographic roughness present at the sites due to numerous hillslope failures (Del Vecchio et al., 2022), yet still captures the site-scale trends in topographic curvature, such as hollows and ridgetops.Vegetation canopy height models (CHM) were derived from the lidar point clouds at each site using the grid_canopy function in the lidR package.The function uses a point-to-raster algorithm, which can result in data holes in the CHM raster.The gaps in the CHM were filled using the three nearest neighbor pixels and an inverse-distance weighting algorithm.The normalized difference vegetation index (NDVI; (Rouse et al., 1974)) was also generated at each site using the near-infrared (NIR) and Red bands from 3 m PlanetScope imagery from the PlanetLabs satellite constellation (PlanetLabs, 2018) acquired during early August 2021, which is the peak vegetation greenness period for the region.The PlanetScope images were filtered for 0% cloud cover and snow cover.The CHM provides canopy structure and is primarily useful for distinguishing shrub (Salix spp.) dominated areas from those covered with ground-level or non-woody vegetation, such as the field horsetail (Equisetum arvense) and Sphagnum spp., whereas NDVI is useful for distinguishing between the various vegetation types at the sites.The classifiers were trained using either CHM, NDVI, or CHM and NDVI to test which vegetation metric produced more robust models.

Snow Cover Duration
PlanetScope imagery (PlanetLabs, 2018) was additionally used to calculate snow cover at 3 m spatial resolution within each study area.Snow covered area was estimated using the Blue Snow Threshold algorithm (E. A. Thaler et al., 2023), which uses a dynamic threshold in the blue wavelengths to generate binary estimates of snow cover.
Although the Blue Snow Threshold algorithm is unable to directly estimate snowpack thickness, an assumption is made that snow cover duration is greatest in locations with the deepest snowpack (Parr et al., 2020).For each pixel within each study area, the last numerical calendar day in 2021 with snow cover was determined by generating a time series of snow-covered area using each cloud-free PlanetScope image available at the sites beginning in mid-March, when snow cover begins to recede below full coverage at the sites.There is not a cloud-free image available for every day during the snow melt season; however, the number of images adequately reveals locations within each site where snow cover duration, and therefore snow depth, are greatest.The resultant data set is a raster with cell values indicating the last numerical day that a pixel was snow covered in the available imagery.
From March 1 to 1 July 2021, the number of images available for Teller 27, Teller 47, and Kougarok is 13, 15, and 17, respectively.Maps of the input features for each site are shown in Figure 2.

Feature Extraction and Importance
The remote sensing observations (topographic metrics, canopy height, and snow duration) were extracted at each of the ground truth observation locations.The high-resolution input feature rasters have significant variability between neighboring pixels; however, the noise in the data was reduced by generating a 3 × 3 window around each ground truth location and assigning the mean value of the nine raster cells to the center cell for each of the features.
For all three machine learning algorithms, feature importance for generating the predictions of permafrost extent was assessed from the 80-20 training-test splits.For the ERTr model, permutation feature importance, which measures the decrease in model performance after randomly shuffling a feature vector, was calculated.Features with greater importance result in a larger decrease in model performance when they are shuffled.Feature importance for the SVM models was determined by comparing the absolute value of the weight coefficients, which is a measure of how important each feature is in determining the separation of the hyperplane.This method of SVM feature importance is only applicable to SVMs trained with linear kernels.There are no native methods for extracting feature importance from ANNs; therefore, we utilized the SHAP Python package (Lundberg et al., 2020;Lundberg & Lee, 2017) to generate feature importance for the ANN models.

Comparison to Coarser Near-Surface Permafrost Map Products
Finally, we compared the percentage of near-surface permafrost predicted using the machine learning method with the greatest accuracy at each site to the extent of permafrost predicted by three existing near-surface permafrost products Pastick et al. (2015), Obu et al. (2019), and Ran et al. (2022), referred to as Pastick, Obu, and NIEER (Northwest Institute of Eco-Environment and Resources), respectively.Each of the three existing products gives probabilities for near-surface permafrost at each pixel.We convert the probabilities into binary representations of near-surface permafrost presence by assigning all pixels with a probability ≥0.5 a permafrost presence.Conversely, pixels with a probability <0.5 were assumed to be lacking near-surface permafrost.For each of the at-a-site predictions, the transferred predictions, and the three existing map products, we plot the predicted percentage of the study site with near-surface permafrost.

At-A-Site Predictions of Permafrost Extent
At a given site, predictions of permafrost extent generated by the ERTr model typically had the highest BA when training data from the site was incorporated into the model (Figures 3 and 4).Accuracies for TL27, KG, and TL47 were 0.70 ± 0.03, 0.91 ± 0.06, and 0.83 ± 0.05, respectively for the models that included NDVI as an input feature.Incorporation of training data from the other two sites generally decreased the prediction accuracy for the at-a-site predictions (Figures 3a-3c).For example, incorporating training data from KG and TL47 into the ERTr also trained at TL27 decreased the accuracy of predicted near-surface permafrost extent at TL27 by 6%.The SVM models had nearly equal maximum accuracy compared to the ERTr at TL27 and TL47 with maximum BA scores of 0.70 ± 0.03 and 0.83 ± 0.07, respectively but had lower accuracy at KG (0.82 ± 0.11).The ANN models consistently had lower accuracy scores across all sites with maximum accuracies of 0.63 ± 0.08, 0.57 ± 0.03, 0.72 ± 0.02 for TL27, KG, and TL47, respectively (Figure 3, Table S1 in Supporting Information S1).

Assessment of Model Transferability
Although the ERTr models generate predictions of permafrost extent with the greatest BA for the at-a-site tests, the SVM models consistently outperform when transferred to a site where they have not been trained (Figures 3d-3f).The balanced accuracies for the transferred ERTr models are on average ∼10% lower than the at-a-site models, ranging from 0.5 to 0.77.The greatest accuracy for the transferred ERTr models are those trained using TL47 + KG, TL47, and TL27 + KG at TL27 (0.62), KG (0.7), and TL47 (0.62), respectively.Balanced accuracies for the transferred SVM models are generally greater than the accuracies from the transferred ERTr models and range from 0.59 to 0.78.The greatest BA for the transferred SVM models are those trained using TL47 + KG, TL47, and TL27 + KG at TL27 (0.62), KG (0.78), and TL47 (0.77), respectively.For the three best transferred models, we report precision, recall, and AUC in Table 1.Generally, the transferred ANN models performed worse than the transferred ERTr and SVM models, with mean accuracies from all transferred combinations of training data of 0.59, 0.62, and 0.66, for the ANN, ERTr and SVM models, respectively (Figures 3d-3f).
Balanced accuracies for both at-a-site and transferred models were generally higher for predictions of near-surface permafrost presence when NDVI was included as the only vegetation feature compared to models that used CHM or the models that incorporated both CHM and NDVI (Figures S1 and S2, Tables S2 and S3 in Supporting Information S1).For the ERTr models that  substituted CHM for NDVI as the vegetation metric, there was in increase in at-a-site accuracy at TL27, with a maximum of 0.85, but the accuracy decreased at KG (maximum of 0.85) and TL47 (maximum of 0.71) (Figure S1, Table S2 in Supporting Information S1).CHM had little to no effect on the at-a-site or transferred ERTr, SVM, or ANN models compared to the models that included NDVI.Inclusion of both NDVI and CHM did not greatly influence the at-a-site or transfer accuracy of the models and generated results with comparable accuracy to the NDVI models (Figure S2, Table S3 in Supporting Information S1).Detailed results for the models developed with both CHM and CHM and NDVI are given in Supporting Information S1.

Comparison to Existing Map Products
Our at-a-site machine learning models predict the percentage of near-surface permafrost at the sites to be 13%, 94%, and 68% for TL27, KG, and TL47, respectively.The transferred models predict extents at TL27, KG, and TL47 of 28%, 66%, and 61%, respectively.There is significant disagreement between the existing permafrost mapping products available for the three sites (Figures 5 and 6).The NIEER product (Ran et al., 2022) predicts 100% permafrost extent at all of the sites, while the Obu map (Obu et al., 2019) predicts 0% permafrost at TL27, 100% at KG, and 33% at TL47.The Pastick product (Pastick et al., 2015) estimates the extent of near-surface permafrost as 15%, 87%, and 13% for TL27, KG, and TL47, respectively.

Discussion
Our machine learning models not only predict the current extent of near-surface permafrost at high spatial resolution within the three study locations, but they also provide information about the factors driving the predictions and give insight on the relationships between vegetation, snow, and topography that promote permafrost degradation.

Feature Importance
Feature importance is a key component of machine learning analysis, as it can provide insight on process drivers (Figure 7).Across all three implementations (ERTr, SVM, and ANN), NDVI is consistently among the top two most dominant predictive features, which demonstrates that vegetative greenness is a significant indicator of near-surface permafrost presence/absence.When NDVI is replaced by canopy height (CHM), the vegetation metric becomes is only the most important feature in the TL47 ERTr model, consistently overshadowed by the importance of slope and aspect in every other instance (Figure S3 in Supporting Information S1).When both CHM and NDVI are used as vegetation metrics, NDVI is consistently a more important predictive feature than CHM in all iterations across all sites except for the SVM and ERTr at TL27 (Figure S4 in Supporting Information S1).
The CHM provides information about the height of vegetation across the sites but is primarily useful for determining the height of shrubs and trees and is not necessarily useful for segregating vegetation beyond shrub and non-shrub.However, CHM might be a useful feature for predicting near-surface permafrost in landscapes that contain a mixture of trees and shrubs.NDVI is also useful for distinguishing shrubs, which have a higher NDVI value (Figure S11 in Supporting Information S1), from non-shrub vegetation, but unlike CHM, NDVI is more sensitive to shorter, non-woody vegetation, whose presence can be related to the presence or absence of permafrost (Figure S11 in Supporting Information S1).In regions of discontinuous permafrost, near-surface permafrost has been shown to be absent beneath shrubs (Salix) and horsetail (Equisteum) (Gill, 1973;Jorgenson et al., 2001;Nguyen et al., 2009).Snow becomes trapped beneath shrubs (Bennett et al., 2022;Pomeroy et al., 2006), which acts as an insulator, keeping the soil warmer than the cold winter atmospheric temperatures (Gisnås et al., 2014;Grünberg et al., 2020;Y. Zhang et al., 2018).Horsetail is a resilient plant, which grows readily in disturbed soils (Emers et al., 1995), such as those that are vertically or horizontally displaced as permafrost thaws (Nguyen et al., 2009).Given adequate spatial resolution, horsetail NDVI signature is distinguishable from other short non-woody vegetation (Riedel et al., 2005), such as Sphagnum moss (Figure S11 in Supporting Information S1) and lichen, which can be indicative of permafrost presence because mosses can keep soil temperatures low and inhibit thaw (Fisher et al., 2016;Grünberg et al., 2020).Whereas the CHM only provides information about the height of a plant, which is not necessarily related to permafrost presence, the NDVI signature of the various plants that relate to the presence or absence of permafrost is unique and is hence a superior vegetation metric for prediction of near-surface permafrost in ecosystems, such as the tundra at these study sites.Additionally, NDVI is a more readily available metric than CHM, which is typically derived from processing lidar point cloud data.
The feature importance from the machine learning models also reveals that topography could be an indicator of permafrost degradation at all three sites.For all three algorithms, topography is among the top two dominant factors at all three sites.At TL27 and KG, slope is the second most important feature, and aspect is the most important at TL47 for the ERTr models, while slope is the dominant feature at TL27 and KG for the SVM and ANN models, and aspect is the most important feature at TL47 for the SVM and ANN models.On shallow hillslopes, slow rates of surface and groundwater movement can cause water to accumulate, which provides an increase in thermal energy and can lead to increased permafrost thaw (McKenzie & Voss, 2013).Whereas on steeper hillslopes, water movement is faster, leading to less water ponding, less thermal input, and hence less permafrost degradation (Magnússon et al., 2022).Topographic curvature also influences late-season snowpack and snow drifts, such that concavities (areas of positive topographic curvature) contain deep snowpacks (Bennett et al., 2022) that can increase the thermal degradation of permafrost.Furthermore, shrubs (Salix) are most common in topographic concavities and aid in the trapping and longevity of snowpack throughout the season.The influence of the late-season snow cover is also demonstrated by the importance of snow duration in the predictions.At all sites snow duration is the third most important feature in the predictions for the ERTr models.Topographic aspect is an important predictor of near-surface permafrost loss at TL47 (Figure 7), where relatively steep south-facing slopes that receive the bulk of the summer solar radiation are lacking near-surface permafrost.In general the ERTr models tend to generate a more balanced feature importance than the SVM and ANN models; however, as noted above, the order of feature importance is not consistent between the algorithms (Figure 7).For example, although snow duration is consistently important across the ERTr models at all three sites, it becomes the least important factor in the SVM models at TL27 and KG and the second least important factor in the TL47 SVM model and is the second least important factor in all of the ANN models.The variability in feature importance among the three different model types is likely due to the different algorithms used for calculating feature importance for each of the models.Regardless of the variability and the specific mathematical reasoning driving the variability, it remains that NDVI is a consistently dominant predictive factor across all models and sites.

Model Applicability
Generating a set of input features that capture the inter-site variability in near-surface permafrost extent is critical for developing transferable machine learning models (Ran et al., 2022).Furthermore, determining the algorithms which produce the greatest accuracy when tested at non-training site is crucial for creating reliable, high-resolution maps of near-surface permafrost extent in locations where validation data are sparse or unavailable (Baral & Haq, 2020).Although the ERTr algorithm is useful for predicting near-surface permafrost extent for sites with ample training data (Figure 3), it does not perform well when transferred to sites where ranges in the values of the input features extend beyond the range on the training data set.On the other hand, the SVM and ANN models maintain roughly the same BA for the at-a-site predictions as for the predictions to the non-trained sites (Figure 3), which might suggest that, although at-a-site accuracies are lower, the models are more transferable.The SVM models offer a slight advantage over the ANN models as they, in general, generate predictions with a greater BA, are simpler to train and apply, and feature importance is simpler to extract from linear SVMs.
Our models should not be expected to transfer to other environments or ecosystems without additional information that incorporates climate variables.Here we have trained and tested models on three locations within the same ecosystem (tussock tundra with patches of willow shrubs), climate, and similar topography (hillslopes) where we would expect drivers of near-surface permafrost degradation to be similar.We do not expect that the models trained at these sites would transfer well to sites with vastly different vegetation or topography, such as boreal forests and floodplains.If they were applied to areas with known, continuous, near-surface permafrost, they would predict areas of non-permafrost, even when those locations are underlain by near-surface permafrost.Similarly, if our models were applied to temperate climate zones, such as the southern United States, they would also predict discontinuous permafrost in these locations, where clearly there is none.Incorporation of additional climate variables, such as mean annual air temperature, or geographic information, such as latitude, could help constrain the regions over which the models predict near-surface permafrost presence.
Because our models include only topographic and multispectral features, and beyond snow duration, do not include climate data, they are a static representation of the current state of near-surface permafrost at the sites.Hence the current machine learning models presented here cannot be used to generate updated estimates of near-surface permafrost extent under different climate change scenarios.However, incorporation of additional climate variables, such as mean annual air temperature (Hu et al., 2022) or ground surface temperature (Li et al., 2019) would make the models more dynamic and capable of generating predictions of changing permafrost extent when provided new temperature information.Furthermore, by training the models with ground surface or air temperature data, they can be forced with long-term temperature trend data from climate models to make decadal predictions of permafrost extent.Without incorporating these climate features into the machine learning models, the models will only represent the time period of ground truth observation and cannot be used for assessing future impacts of climate change on permafrost stability or the rate of permafrost degradation.
The machine-learning models for near-surface permafrost extent presented here represent a marked improvement in the spatial resolution of permafrost predictions through the incorporation of high-resolution (3 m) topographic and multispectral data.The effect of the spatial resolution is clearly demonstrated in the comparison of the machine learning models (both at-a-site and transferred) to the existing near-surface permafrost products (Figures 5 and 6).The Obu and NIEER products, which are produced at a 1 km 2 spatial resolution do not capture any nuance of the permafrost distributions at the sites.However, the Pastick product, which is produced at 30 m spatial resolution, is a clear improvement in capturing the degree of discontinuity in near-surface permafrost extent at the sites, and the predicted percentage of permafrost in the Pastick maps agrees best with the at-a-site model predictions at both TL27 and KG.These comparisons indicate the importance of developing high-resolution site-specific models of near-surface permafrost rather than relying on coarse pan-arctic products to inform infrastructure vulnerability or geomorphological dynamics associated with permafrost degradation.

Conclusions
High-resolution maps of near-surface permafrost extent in discontinuous permafrost regions are necessary to assess potential infrastructure vulnerability related to the changing thermal state of permafrost and to understand how the permafrost degradation is expressed in landscape form and structure at the hillslope-to watershed-scale.Although regional and hemispheric maps of permafrost extent have been produced, they are often at resolutions too coarse to be used to inform vulnerability assessments or to provide insight on hillslope-scale geomorphic responses to permafrost degradation.We trained and tested three machine learning models: ERTr, SVMs, and ANNs to generate high-resolution (3 m) maps of near-surface permafrost extent.The predictions derived from the machine learning models that we employ in this study demonstrate an advancement in the spatial estimation of the current extent of permafrost in three ecologically similar watersheds, and suggest that given adequate feature inputs, there is potential for machine learning models to be trained and transferred to predict near-surface permafrost extent in locations with similar ecology and topography, but where training data are sparse.However, predictions of near-surface permafrost extent in ecological and topographic settings, which are not included in the training data should be viewed skeptically, as ecosystem factors strongly influence the thermal state of soil.Our models, and similar models based on topographic-climatic features do not predict the true thermal state of the ground, rather they generate maps of potential permafrost terrains, suggesting that study site investigations are necessary to map the true soil thermal state.High-resolution maps of near-surface permafrost extent, like those presented here, are a tool for policy makers and land managers for assessing areas and infrastructure potentially underlain by near-surface permafrost and vulnerable to damage if the permafrost were to thaw.

Figure 1 .
Figure 1.(a) Map of study areas on the Seward Peninsula, Alaska (location shown in map inset) shown on the Bing Imagery basemap.The blue point shows the location of the city of Nome, AK.The three study locations (Teller 27, Teller 47, and Kougarok) are indicated by the white circles.Hillshade maps are of the three study sites (b) Teller 27, (c) Kougarok, (d) Teller 47.The points indicate the location of the ground truth observation of permafrost presence (green) and absence (black) derived from classification of co-located electrical resistivity and temperature measurements as well as spot temperature measurements, frost probe refusal, and soil pit observations collected in the late summers of 2018, 2019, and 2021.(e) Photograph of Teller 27 showing graminoids on the slopes and willow shrubs in the hollow (credit: JCR).(f) Photograph of Kougarok with graminoids and alder shrubs (credit: Nathan Conroy).(g) Photograph of Teller 47 showing graminoids on hillslopes and willow shrubs in hollows (credit: JCR).Coordinates are given in WGS84 UTM Zone 3N.

Figure 2 .
Figure 2. Maps of input features for model training at (a) Teller 27, (b) Kougarok, and (c) Teller 47. Canopy height and the topographic metrics (aspect, curvature, and slope) derived from lidar point clouds, and the normalized difference vegetation index and the last snow-covered day for each pixel were derived from PlanetScope imagery (PlanetLabs, 2018).Waterbodies and roads are not masked for the analysis.

Figure 3 .
Figure 3. Balanced accuracy for at-a-site model predictions at (a) Teller 27 (TL27), (b) Kougarok (KG), and (c) Teller 47 (TL47), with error bars showing uncertainty from the cross-fold validation.Balanced accuracies for the transferred models are shown in d-f.Accuracies are given for the extremely randomized trees (brown bars; ERTr), the support vector machines (pink bars; SVM), and the artificial neural network (blue bars; ANN) classifiers for each combination of training data.For example, TL27 + KG indicates the model training using data from only the Teller 27 and Kougarok sites.

Figure 4 .
Figure 4. Prediction of permafrost extent for Teller 27 (a), (d), Kougarok (b), (e), and Teller 47 (c), (f) from models that produced the best Balanced accuracy (BA) for the predictions overlain on hillshades for each site.Panels (a)-(c) are the predicted permafrost extent (blue polygons) derived from the at-a-site models, which utilized an 80-20 training-test split.The points indicate the location of ground truth data for near-surface permafrost (green) and non-permafrost points (black).Panels (d)-(f) show the predictions of permafrost extent from the transferred models which did not incorporate training data from the prediction site.For example, the model used for predicting permafrost extent at Teller 27 (d) used training data from only Kougarok and Teller 47.The mean BA for each prediction is given on each panel.The receiver operator characteristic curves, and the area under the curves (AUC) for each transfer (d)-(f) are shown in g, h, i for Teller 27, Kougarok, and Teller 47, respectively.An AUC score of 1 indicates a perfect prediction.

Figure 6 .
Figure 6.Estimated extent of near-surface permafrost for Teller 27 (a, b, c), Kougarok (d, e, f), and Teller 47 (c, f, i) from three existing map products: Pastick (Pastick et al., 2015) (first column), Obu (Obu et al., 2019) (second column), and NIEER (Ran et al., 2022) (third column) overlain on hillshades for each site.The blue polygons indicate that permafrost is predicted at the location, and where the blue polygons are absent (as in b), near-surface permafrost is predicted to be absent.

Figure 7 .
Figure 7. Feature importance derived from the extremely randomized trees (a)-(c), the support vector machines (d)-(f), and the artificial neural network (ANN) (ANN) (g)-(i) classifiers for each of the three sites.(a, d, g) Teller 27, (b, e, h) Kougarok, (c, f, i) Teller 47.These panels only show the classifiers trained using the normalized difference vegetation index, which are generally the higher performers.