Antarctic Topographic Realizations and Geostatistical Modeling Used to Map Subglacial Lakes

Antarctic subglacial lakes can play an important role in ice sheet dynamics, biology, geology, and oceanography, but it is difficult to definitively constrain their character and locations. Subglacial lake locations are related to factors including heat flux, ice surface slope, ice thickness, and bed topography, though these relationships are not fully quantified. Bed topography is particularly important for determining where water flows and accumulates, but digital elevation models of the ice sheet bed rely on interpolation and are unrealistically smooth, biasing estimates of subglacial lake location and surface area. To address this issue, we use geostatistical methods to simulate realistically rough bed topography. We use our simulated topography to predict subglacial lake distribution across the continent using a binomial logistic regression, which uses physical parameters and known lake locations to calculate the probabilities of lake occurrences. Our results suggest that topography models interpolated without appropriate geostatistics overestimate subglacial lake surface area and that total lake surface area is lower than previously predicted. We find that radar‐detected lakes are more likely to occur in the interior of East Antarctica, while altimetry‐detected (active) lakes are expected to be found in West Antarctica and near the grounding line. We observe that radar‐detected lakes have a high correlation with heat flux and ice thickness, while active lakes are associated with higher ice velocity.


Introduction
Subglacial lakes are water bodies between the ice and bedrock that are produced through a combination of the insulation and pressure of the ice sheet, frictional heating, and geothermal heat flux (Pattyn, 2010;Robin et al., 1970;Siegert, 2000). They are an integral part of the Antarctic subglacial hydraulic system and are known to influence ice sheet dynamics by interaction with overlying ice, drainage, and the reduction of basal friction (e.g., Kamb, 1987;Siegfried et al., 2016;Stearns et al., 2008). Constraints on water storage are necessary for estimating the magnitude of past outburst events, predicting sea level rise contribution from meltwater discharge, and modeling ice sheet history (Dowdeswell & Siegert, 1999). The character, size, and spatial distribution of subglacial lakes provide constraints for present and former ice sheet dynamics and the processes that drive ice sheet movement (e.g., Bell, 2008). In addition to being important for ice sheet dynamics, subglacial lakes have biological and geological implications: With no sunlight, no exposure to the atmosphere, and cold temperatures, subglacial lakes are an extreme habitat that could contain unique forms of life (e.g., Christner et al., 2006Christner et al., , 2014Siegert et al., 2001), and these lakes may contain sediment columns that record up to thousands or millions of years of ice sheet history (Smith et al., 2018).
Subglacial lakes were first identified in radio-echo sounding (RES) data in the 1960s (Robin et al., 1970). In radar images (Figure 1a), lakes are characterized by bright, flat reflections at the ice/water interface (e.g., Carter et al., 2007;Oswald & Robin, 1973;Wright & Siegert, 2012). In some cases, the radar record is ambiguous (Matsuoka, 2011) and requires more refined processing techniques for interpreting the state of the basal interface (Schroeder et al., 2015). Less distinct lakes and basal water features have been identified with radar by their specularity (Carter et al., 2007;Schroeder et al., 2014;Young et al., 2016), scattering properties , and reflectivity after correcting for radar attenuation (Chu et al., 2016). Since their initial discovery, RES surveys have led to the discovery of hundreds of lakes beneath the Antarctic ice sheet (e.g., Siegfried & Fricker, 2018;Wright & Siegert, 2012), shown in Figure 1b. The largest of these, Vostok Subglacial Lake, has a surface area of 14,000 km 2 (Siegert et al., 2005).  Studinger et al., 2004). (b) Map of known active and radar-detected lake locations (Bell et al., 2007;Siegfried & Fricker, 2018;Wingham et al., 2006;Wolovick et al., 2013;Wright & Siegert, 2012). (c) Active lake detected by changes in surface elevation. Repeat ICESat profiles at Engelhardt Subglacial Lake (Track 0087 from Zwally et al., 2014) show a drop in elevation between 2003 and 2010. Efforts have been made to use radar-detected lakes to constrain the total amount of liquid water beneath the ice sheets in Antarctica. Dowdeswell and Siegert (1999) estimated that subglacial water bodies cover 54,000 km 2 in Antarctica. This study used about 70 lakes found in 400,000 line kilometers of radar data surveyed between 1967 and 1979 to extrapolate a continent-wide estimate by assuming that the amount of water found along the survey lines is proportional to the rest of the continent. This estimate assumed that the distribution of lakes is uniform throughout the continent and that the fraction of the flight lines containing lakes is representative of the rest of Antarctica. The radar data used in the Dowdeswell and Siegert (1999) study were recorded on 35-mm optical film (Drewry et al., 1982). The low image quality and analog nature of this data set make it difficult to observe features, so it is possible that not all the lakes were found. Livingstone et al. (2013) modeled lake locations by calculating hydraulic potential and estimated that subglacial lakes make up ∼4% of the ice/bed interface. A minimum area of 5 km 2 of water surface was used to identify lakes. After masking out cold-bedded regions using a thermodynamic ice sheet model, the fractional lake coverage dropped to 2.7%, or a cumulative area of about 350,000 km 2 , an order of magnitude larger than Dowdeswell and Siegert (1999). This suggests that bed temperature plays an important role in modeling both lake surface area and locations. Goeller et al. (2016) also mapped expected lakes using hydraulic potential with a grid resolution of 5 km, yielding 4.9% lake coverage, and then scaled down their modeled cumulative lake surface area by an order of magnitude based on the number of predicted lakes that were not confirmed by radar observations, yielding an estimated coverage of 0.6%, or 77,000 ± 18,000 km 2 .
The Livingstone et al. (2013) and Goeller et al. (2016) models relied on Bedmap2 bed topography (Fretwell et al., 2013), which was generated by combining all available radar measurements of topography and interpolating between measurements. This interpolation created unrealistically smooth bed topography. Bedmap2 contains regions with bed elevation uncertainty of up to 1,000 m (Fretwell et al., 2013), and interpolation artifacts from the smoothing process likely biased continental-scale models of subglacial lake size distribution toward larger lakes (Goeller et al., 2016;Livingstone et al., 2013). This bias is exacerbated by the 5-km grid resolution used in the Goeller et al. (2016) model (more than half of known lakes are shorter than 5 km Dowdeswell & Siegert, 1999).
Logistical and financial limitations on RES surveys used to measure bed topography impede the acquisition of high-resolution bed measurements for the entire continent. Gaps in data coverage are typically filled by kriging interpolation (e.g., Fretwell et al., 2013;Lythe & Vaughan, 2001), which has a smoothing effect. An alternative approach is to use available data to stochastically simulate topography so that unsurveyed areas have statistically realistic roughness. This method of "conditional simulation" is commonly used in mining applications and reservoir modeling (e.g., Deutsch & Wang, 1996;Journel, 1974) and can generate multiple simulations that reflect the spatial uncertainty in the model. Conditional simulations of bed topography have been implemented at a catchment scale (Goff et al., 2014), but no realistically rough digital elevation models (DEMs) exist at a continental scale for Antarctica.
Recent studies also suggest that subglacial lakes can exhibit dynamic behavior. Observations of vertical changes of the ice surface with interferometric synthetic aperture radar techniques have led to the discovery of subglacial water bodies with volume changes on time scales as short as weeks, referred to as "active" subglacial lakes (Gray et al., 2005) (Figure 1c). ERS-2 satellite radar altimetry and ICESat laser altimetry data improved the spatial and temporal coverage of active lake observations and uncovered over 100 lakes (Fricker et al., 2007;Smith et al., 2009;Wingham et al., 2006). These lakes have been observed to be hydrologically connected, with water cascading through multiple lakes and ultimately emptying across the grounding line (Fricker & Scambos, 2009). Subglacial lake drainage has been linked to changes in ice velocity (Siegfried et al., 2016;Smith et al., 2017;Stearns et al., 2008), so the subglacial water storage and drainage system may play a role in ice sheet models and sea level rise projections. Active lakes have also been identified beneath the rapidly flowing Thwaites Glacier (Smith et al., 2017). However, subglacial lakes identified through surface deformation are not characterized by a flat, bright bed in RES observations (e.g., Siegert et al., 2014;Wright & Siegert, 2012), suggesting that our physical understanding of these features remains incomplete. Mapping of active subglacial lakes is also limited by the relatively brief time scale over which we have continent-wide surface observations (Siegfried & Fricker, 2018).
In this paper, we generate conditional simulations of Antarctic bed topography to map subglacial lakes and investigate differences between active and radar-detected lakes. To overcome limitations from smoothing in Bedmap2, we simulate 100 DEMs with realistic roughness and demonstrate the influence of topographic roughness on subglacial water coverage. Multiple simulations create an ensemble of topographic realizations that reflect spatial uncertainty.
To predict lake locations and thereby further improve our surface area estimate, we implement a binomial logistic regression. This method uses statistical relationships between known subglacial lakes and bed elevation, heat flux (Martos et al., 2017;Maule et al., 2005;Shapiro & Ritzwoller, 2004), ice velocity (Rignot et al., 2017), strain (Alley et al., 2018), the curvature (i.e., second derivative) of hydraulic potential, ice surface slope, ice thickness, flow accumulation models (Schwanghart & Scherler, 2014), and the distance along the ice flow path from the grounding line to calculate the relative probability of finding a subglacial lake at a given location. This process improves the accuracy of lake surface area estimates by incorporating other conditions into determining lake locations rather than only using bed geometry. The regression analysis is done separately for active and radar-detected lakes in order to map their respective locations and suggest regions that may contain undiscovered lakes. The correlation coefficients between lake probability and each physical parameter allow us to determine the relative importance of certain conditions for lake occurrence, providing insights into the complexities of the Antarctic subglacial hydrological environment.

Methods
We conducted a geostatistical analysis of subglacial conditions in order to generate statistically realistic maps of bed topography and hydrology and to examine trends and properties of subglacial lakes. First, bed topography was simulated at long and short wavelengths. Longer-wavelength roughness was added to regions of Bedmap2 with bed uncertainty of 1,000 m. This was done through multiple-point direct sampling (Mariethoz et al., 2010), a geostatistical method that fills in gaps in bed topography by using other regions as training data (section 2.1.2). Small-scale roughness was added to the large-scale roughness results by simulating small-wavelength topography through a Fourier analysis of radar bed topography data, similar to the method used in Hu and Tonder (1992) (section 2.1.3). The simulated DEMs were used to calculate ice thickness, water routing models (Schwanghart & Scherler, 2014), bed slope, and curvature of hydraulic potential. Curvature was used to find areas of hydraulic concavity where water would be likely to collect. An eigenvalue decomposition was used to orthogonalize or reduce dependency between the data sets; this is needed to satisfy assumptions in logistic regression and avoid overfitting the model (Mandel, 1982). A binomial logistic regression analysis was used to statistically model subglacial lake occurrence. This regression outputs maps of the probabilities of lake occurrence across Antarctica (section 2.2.4). We applied a threshold to the probability maps in order to classify the continent into lake versus no-lake areas. To better understand which factors influence lake occurrence, the correlation coefficients between the physical predictors and the lake occurrence probability maps were used to study sensitivity. The general workflow, illustrated in Figure 2, is described in detail in the sections that follow.

DEM Simulation
We created multiple DEM realizations at 1-km resolution by simulating topographic roughness at short and long wavelengths and adding it to Bedmap2. We simulated large-scale topographic roughness through multiple-point direct sampling (Mariethoz et al., 2010) in regions of Bedmap2 with uncertainty greater than or equal to 1,000 m (section 2.1.2). For short spatial wavelengths, we added small-scale (high-frequency) roughness to the entire continent, similar to the approach used in Goff et al. (2014) where short-wavelength roughness is "draped" over a smoother surface. Small-scale roughness was based on the difference between Bedmap2 and radar measurements of bed topography (CReSIS, 2018). The smoothest and roughest extremes of this residual roughness were selected to generate a random surface using methods from Hu and Tonder (1992) (section 2.1.3). Fifty simulations were made with each roughness and added to the direct sampling results, totaling to 100 synthetic DEMs.

Topography Data
We used Bedmap2 topography (Fretwell et al., 2013) as the basis for our topographic simulations. Bedmap2 was created using traditional interpolation methods (e.g., cosine filtering and kriging). Topography was interpolated between radar flight lines with bed elevation picks at 1-km resolution. Eighty-three percent of the Bedmap2 cells are within 20 km of a data point (Fretwell et al., 2013), but some regions are as far as hundreds of kilometers from a data point. The unsurveyed regions were isotropically interpolated, creating unrealistically smooth bed topography. Bed elevation uncertainty can reach up to 1,000 m (Fretwell et al., 2013). Even cells containing radar-derived bed data were smoothed for internal consistency, reducing short-scale roughness.

Multiple-Point Direct Sampling Simulation (Large-Scale Roughness)
Multiple-point statistics (MPS) simulations were used to simulate topography in any region of Bedmap2 where bed uncertainty is 1,000 m or greater, such as the sample region in Figure 3a. MPS methods (e.g., Meerschman et al., 2013;Straubhaar et al., 2011) extract and store patterns and spatial dependencies between points in a training image in order to simulate information elsewhere. The multiple-point direct sampling method, developed by Mariethoz et al. (2010), implements MPS through direct sampling of the training image and simulates the grid cell of interest without having to store statistical information. MPS, in particular direct sampling, has been used in reservoir modeling to fill gaps by learning from the data where no gap exists (e.g., Caers et al., 2001;Strebelle, 2002). The DeeSse commercial software was used to implement direct sampling in our simulations. We defined Bedmap2 grid cells with uncertainty of 1,000 m or more as unknown (Figure 3b), and all other areas as part of the training image. We assumed that the surveyed regions of Antarctica are representative of the areas with gaps. Fifty realizations were generated so that many possible perturbations of topography could be used to estimate uncertainty in the subglacial lake maps due to long-wavelength topographic uncertainty. The added long-wavelength topography is shown in Figure 3c. Full methods and implementation are described in Mariethoz et al. (2010), Meerschman et al. (2013), and Straubhaar et al. (2011).

Fourier Analysis of Topography Data (Small-Scale Roughness)
We added short-wavelength roughness to the direct sampling results ( Figure 3d) through a Fourier analysis of radar bed topography from the Multichannel Coherent Radar Depth Sounder (MCoRDS) operated by the Center for Remote Sensing of Ice Sheets (CReSIS, 2018). We analyzed over 70 flight lines from 2010 and 2011 to view a wide sample of roughness behavior, shown in Figure 4a. We calculated the difference between MCoRDS topography data and Bedmap2 for the flights. Following the roughness parameterization described in Taylor et al. (2004) and Bingham and Siegert (2009), the residual roughness of each flight line was characterized by its roughness energy, or the integral of the power spectral density. The roughest and smoothest extremes of the residual topography were used as parameters for the simulations in order to encapsulate the roughness extremes in our model. The roughest residual topography was found in the Transantarctic Mountains, and the smoothest came from Thwaites Glacier. To create synthetic roughness, the power spectral density was converted to amplitude, and circular symmetry was imposed to represent amplitude in two dimensions. A random phase shift between 0 and 2 was applied to each element, and then the inverse Fourier transform was computed to produce a 2-D surface (e.g., Hu & Tonder, 1992). We generated 50  (CReSIS, 2018) used to determine small-scale added roughness. The roughness energy of the difference between the measured bed and Bedmap2 are plotted. (b) MCoRDS radar topography compared to Bedmap2 and our simulated topography. The plotted segments show the bed segments with the roughest and smoothest residual roughness (Bedmap2 topography subtracted from measured topography). The residual roughness from the "smooth" and "rough" segments were used to simulate small-scale roughness, which was added to the long-wavelength simulations to form the "smooth" and "rough" DEMs. The yellow lines in (b) show one realization of simulated topography at the sample locations. roughness simulations using the smoothest residuals, referred to as "smooth topography" throughout this paper, and 50 with the roughest residuals, referred to as "rough topography" (Figure 4b). This roughness was added to the MPS results, giving a total of 100 DEMs that include statistically realistic roughness. We assessed the quality of simulated topography by comparing the frequency content of MCoRDS bed topography and our simulated topography along flight lines.

Subglacial Lake Simulation
We employed a logistic regression to estimate subglacial lake locations. This method used statistical relationships between physical predictors and known lake locations to estimate the probability of having a lake at any given location in Antarctica. Our predictors included heat flux, ice velocity, surface slope, bed elevation, ice thickness, bed slope, strain rate, distance along ice flow path from the grounding line, flow accumulation, and hydraulic potential curvature. For each DEM, we calculated geometric parameters (ice thickness, bed slope, flow accumulation, and hydraulic potential curvature) from the simulated bed topography; other parameters (heat flux, ice velocity, surface slope, strain, and distance along flow path) were kept unchanged between simulations. We used this set of parameters in a regression to fit a model that calculates the probability of lake occurrence across Antarctica. We ran simulations with each topography data set and lake type combination (Table 1). We also modeled active and RES lakes with Bedmap2 topography for comparison.

Subglacial Lake Data
Known lake locations of RES-identified lakes were used to train the regression model. The largest database of subglacial lake positions comes from the Wright and Siegert (2012) inventory. This inventory contains both active and radar-identified lakes. We removed the active lakes from the inventory, leaving 246 RES lakes. We also included 105 RES lakes found in the Gamburtsev Mountain Range, identified by Wolovick et al. (2013).
We ran separate models with active lakes or lakes that are inferred by rapid changes in surface elevation. One hundred thirty-one lakes came from the Siegfried and Fricker (2018) database, which collated all known active subglacial lakes that have an outline. We included four additional lakes from the Adventure Subglacial Trench (Wingham et al., 2006) and four lakes located at the onset of Recovery Glacier (Bell et al., 2007). The upper Recovery lakes have been detected by radar with mixed success (Langley et al., 2011), but because of their proposed dynamic nature (Langley et al., 2011), we included them in the active lake data set.
Our logistic regression is informed by predictors at known lake locations, but it also requires information at known "non-lake" locations or places where radar surveying has shown that there are no lakes. The non-lake locations for RES lakes were identified using a map of radar coverage used in Bedmap2 (Fretwell et al., 2013). The locations with a known lake, or within a radius of the observed lake length, were removed so that the remainder of the flight lines could be considered "known" non-lakes. We assumed that all (non-active) subglacial lakes beneath radar flight lines were detected by radar and correctly identified. The non-lake locations for active lakes were identified using ICESat coverage (Zwally et al., 2014). Any point surveyed by ICESat that did not contain an active lake was considered a non-lake. Because there is a significantly greater number of non-lake data points than known lake points with which to train the regression model, the estimated probabilities of lake occurrence will be very low and should not be interpreted in an absolute sense (see further discussion in section 4.2).

Other Data Sets
We included geothermal heat flux to provide information on which areas have sufficient geothermal heat for the bed to be at the pressure melting point. We included the Shapiro and Ritzwoller (2004), Maule et al. (2005), and Martos et al. (2017) heat flux data sets. The Maule et al. (2005) and Martos et al. (2017) data sets were determined using magnetic susceptibility, where satellite magnetic data were used to estimate the Curie depth or depths at which rocks reach the Curie temperature and become nonmagnetic. These were used in a thermal model to estimate heat flux at the bed surface (Maule et al., 2005;Martos et al., 2017). The Shapiro and Ritzwoller (2004) heat flux distribution employed a seismic model to extrapolate heat flux measurements to tectonically and structurally similar regions that do not have measurements.
There is not an established connection between ice velocity and subglacial lakes, but we included ice velocity as a parameter so that we can explore a potential relationship. We used interferometric synthetic aperture radar-derived ice velocity data from the National Aeronautics and Space Administration's Making Earth System Data Records for Use in Research Environments (MEaSUREs) program (Rignot et al., 2017).
We included distance along ice flow path from the grounding line to investigate a possible relationship between lakes and position along flow pathways. This distance was determined by calculating the flowline at each grid cell and summing over the distance. The flowlines were calculated using the Antarctic Mapping Tools "flowline" function, which uses MEaSUREs ice velocity to compute the streamline from a given location (Greene et al., 2017).
We included strain rate in our logistic regression to explore a possible relationship with lake occurrence, particularly active lakes. We used the along flow, across flow, and shear strain calculated from satellite-derived ice velocity (Alley et al., 2018).
Flat ice surfaces have been used to predict the locations of subglacial lakes (e.g., Ridley et al., 1993). We calculated maximum surface slopes from Bedmap2 surface elevation data (Fretwell et al., 2013), and bed slope from our population of synthetic DEMs.
The synthetic DEMs were used to calculate the curvature of hydraulic potential and water routing models for inclusion in the logistic regression. Hydraulic potential gradient is often used to predict water movement (e.g., Goeller et al., 2016;Livingstone et al., 2013;Pattyn, 2010), but because it is a function of bed slope and surface slope, which are already used in our model, hydraulic potential gradient would be redundant. Instead, we used hydraulic potential curvature to find areas of concavity where water would be likely to collect. Hydraulic potential was evaluated as follows: where w is the density of water (1,000 kg/m 3 ), i is the density of ice (917 kg/m 3 ), g is gravitational acceleration, h is bed elevation, and H is ice thickness (Shreve, 1972). We calculated hydraulic potential and then took the second derivative.
We also estimated subglacial water flow paths using a water routing model to show the effect of upstream catchment area on lake location. We used the FLOWobj function from TopoToolbox (Schwanghart & Scherler, 2014) to delineate subglacial water drainage pathways based on subglacial hydropotential gradients and calculated flow accumulation. We created models using both "carve" and "fill" algorithms, which route water in different ways. The "fill" option routes water through the center of flat areas, while the "carve" option finds the deepest points in flat areas and adjusts the flow path to reach the deep spots.

Eigenvalue Decomposition
The regression input parameters were not independent of each other so they were decorrelated to avoid overfitting the regression model. For example, there is presumably some correlation between the heat flux data sets. To avoid attributing too much weight to trends that exist within multiple variables, we conducted an eigenvalue decomposition or orthogonal linear transformation to decorrelate the variables (Joliffe & Morgan, 1992;Mandel, 1982). We normalized each data set, then calculated the covariance matrix, and solved for the eigenvectors, or principal components. This results in 15 principal components that are an orthogonal representation of the data.

Binomial Logistic Regression
We used a binomial logistic regression to model a relationship between known lake locations and the principal components (i.e., orthogonalized parameter inputs). A binomial logistic regression model uses a logistic function to relate predictor variables to a binary dependent variable, in this case the location of subglacial lakes. It is commonly used to predict landslide hazards (e.g., Ohlmacher & Davis, 2003). We used a logistic regression to fit coefficients between the parameters and known lakes and then used the coefficients to calculate the log odds of lake occurrence, which were converted to probability: where the X values are the principal components and the C values are the coefficients.

Lake Statistics and Surface Area Calculation 2.3.1. Expected Area and Size Distribution
The calculated lake location probabilities from our logistic regression are relative, not absolute, so the expected cumulative lake surface area cannot be calculated by taking the sum of the probabilities of the grid cells multiplied by their areas. Instead, we set a threshold or cutoff point where all probabilities above a certain number were considered a lake, and all of the grid cells with lower probabilities were considered a non-lake. We used data from the Antarctic Gamburtsev Province Project (AGAP) Ferraccioli et al., 2011) over the Gamburtsev Mountains to constrain lake coverage for RES lakes. We used AGAP because it is densely surveyed and there are a large number of lakes with which to tune this threshold.
We assumed that all lakes in the area were detected. For each map, we analyzed the percent of known lakes and non-lakes generated as a function of the percent of the area covered by lakes. The intersection between these curves is the optimal threshold. We chose our search radius such that the cumulative length of lakes calculated along radar flight lines was within an order of magnitude of known lake lengths in this area.
In the Gamburtsev Mountains, there are ∼80 km of observed lakes along the flight lines, so we chose our radius so that our optimal threshold generates lakes that intersect roughly 80 km of flight lines ( Figure 5). This was used to pick a cutoff point that was applied to the rest of the continent to develop a binary map of lakes (section 3.3.1), which was used to calculate total surface area. We made histograms of lake length to show the distribution of lake sizes (section 3.3.2). Lake lengths were determined by finding the the area of each expected lake in the binary map, assuming it had a circular shape and calculating the diameter.

Correlation Coefficients
The correlation coefficients between each physical parameter and the lake probability maps shows the sensitivity of lake occurrence to each variable. For each simulation, we calculated a correlation coefficient for each physical parameter. This result suggests which variables are most important for predicting subglacial lake occurrence.

Model Accuracy Evaluation
To demonstrate the effectiveness of the lake location predictions, we computed the percent of known lakes that were matched by the simulations at a given probability cutoff threshold. We evaluated lake accuracy at different cutoff thresholds. Matches were determined using both a 5-km search radius, as used in Goeller et al. (2016), and a 20-km search radius. The 20-km search radius was used to account for topographic variability in the simulations that may have shifted lake positions. The results were averaged across all the smooth simulations and all the rough simulations.

Topographic Models
We generated 100 synthetic DEMs, the first Antarctic DEMs with statistically realistic topography. Fifty were simulated from smoother radar bed picks (Figures 6a-6c), and 50 were simulated from rougher observed bed topography (Figures 6da and 6e). The standard deviation of bed elevation between our set of simulations, shown in Figure 6g and 6h, ranges from 100 to 700 m.
The power spectral density plots in Figure 7 show the wavelength dependent power of the Center for Remote Sensing of Ice Sheets radar bed measurements (Figure 3), Bedmap2, and the simulated topography. The mean of the power spectrum for each smooth ( Figure 7a) and rough (Figure 7b) simulation is plotted. The power spectral densities for the simulated DEMs are more similar to the radar topography than Bedmap2 at all wavelengths. For the smooth topography, the spectral behavior of the simulated  topography resembles that of the radar bed measurements but has a slightly lower power at 1-km wavelengths. The rough topographic simulations closely match the spectral behavior of the bed measurements but have a higher power at subkilometer wavelengths. The rough topography has a higher corner wavelength than the smooth topography.

Subglacial Lake Simulations
We generated a subglacial lake simulation for each of the scenarios in Table 1: 1 with Bedmap2 and RES lakes, 1 with Bedmap2 and active lakes, 50 with smooth simulated topography and RES lakes, 50 with smooth simulated topography and active lakes, 50 with rough simulated topography and RES lakes, and 50 with rough simulated topography and active lakes. Here we describe the results of the RES lake and active lake simulations:

RES Lake Simulations
The probability maps for RES lakes (Figure 8) showed relatively high probability (P > 0.003) areas clustered in the interior of East Antarctica, especially at Dome A, Dome C, and Dome F. Vostok Subglacial Lake emerged with high probability. Dome F contained a large probable lake that persisted across different roughnesses and simulations. There were no probable lakes in West Antarctica or near the coast. The regional trends remained the same across simulations with different roughness, though the small-scale behavior varied (Figures 8f-8h). Known lakes in the interior of East Antarctica aligned well with probable lakes, but known lakes in West Antarctica or the South Pole region did not have high probabilities.

Active Lake Simulations
Relatively high probabilities of active subglacial lake occurrence (P > 0.0004) were widespread in West Antarctica and some sectors of East Antarctica (Figure 9). High probabilities occurred near known active subglacial lakes at Rutford, Institute, Foundation, and Recovery ice streams. Evans Glacier had high probabilities though no active lakes have been detected there. Pine Island and Thwaites glaciers showed high probabilities of active lake occurrence, as did Byrd Glacier. The regional trends were consistent across different roughnesses.

The Effect of Roughness on Surface Area and Lake Size for RES Lakes 3.3.1. Surface Area
The probability threshold values for determining what constitutes a lake were 0.0029, ∼0.0035, and ∼0.0033 for the Bedmap2, smooth, and rough simulations, respectively. These were used to generate maps of radar lake locations ( Figure 10). The subglacial lakes are clustered in the center of East Antarctica around Vostok, Dome C, and Dome A and extend upward toward Dome F. No lakes were generated near the coast or in West Antarctica.
The added roughness reduced the expected total surface area of subglacial lakes. Using the Bedmap2 topography 0.60% or 77,000 km 2 of the continent had a subglacial lake (Figure 10d). For the smooth topographic simulations 0.41 ± 0.12% of subglacial Antarctica had a lake or about 52,000 ± 15,000 km 2 (Figure 10e). And for the rough simulations, subglacial lake surface area fell to 0.32 ± 0.18% or 41,000 ± 23,000 km 2 (Figure 10f).

Size Distribution
The Bedmap2 lake model generates 6,078 lakes with a mean size of 12.6 km 2 . The smooth simulations generate more than double as many lakes and have a mean area of 3.3 km 2 . The rough simulations have a mean lake total area about half the size of that in Bedmap2 but have a similar number of lakes. About 30% of known lakes have an observed length of 1 km or less (Figure 11a), while in each simulation, roughly 80% of the lakes have a length of 1 km (Figures 11b-11d). The smooth, rough, and Bedmap2 simulations have similar lake length distributions.

Correlation With Physical Parameters
The correlation coefficients between the probability maps and the input parameters, shown in Figure 12, reflect the importance of each variable for predicting lake occurrence. Both types of lake have correlation coefficients of about 0.2 or greater for ice thickness and heat flux, though ice thickness is more strongly correlated with RES lakes. Distance along flow path from the grounding line is more strongly correlated with RES lakes than active lakes. Ice velocity has a correlation coefficient of ∼0.35 with active lakes but is independent of radar lakes. The water routing models are independent of RES lakes but have a positive correlation with active lakes. The "fill" water routing algorithm has a stronger correlation than the "carving" algorithm. The three heat flux data sets correlate similarly well with both active and radar lakes, though the Martos et al. (2017) data set performs slightly better for both lake types. Both varieties of lake are largely independent of strain rate, bed slope, surface slope, and hydraulic curvature.

Lake Simulation Accuracy
We assessed the accuracy of our lake location simulations by calculating the percentage of known lakes that were identified as a lake in our simulation for a range of probability thresholds (Figure 13). For RES lake simulations, the top 10% highest probability cells account for about 70% of known lakes. For active lakes, about 90% of known lakes were captured in the top 10% of the simulations with rough topography and about 70% for the smooth topography. Simulated active lake location accuracy increased with roughness.

Bed Topography
To the best of our knowledge, our topographic simulations are the first full-continent DEMs of Antarctica with simulated roughness and are the first glaciological application of multiple-point direct sampling. The direct sampling and Fourier analysis allow us to overcome the spatial limitations of RES surveys. It also helps ensure that inferred regional variations in subglacial hydrology are not impacted by kriging artifacts nor variations in the density of bed elevation measurements. While radar coverage across Antarctica has expanded in the past decades and mass conservation techniques can be used to provide topographic constraints (e.g., Morlighem et al., 2011), geostatistical simulation is the only way to realistically represent bed roughness. Furthermore, mass conservation techniques rely on ice surface velocity and make physical assumptions about the bed (e.g., Morlighem et al., 2011). Our DEMs were generated independently of surface observations, making them an unbiased parameter.
In this study, the synthetic topography enabled the creation of realistic lake size distributions, but it has the potential to improve other models as well. Topographic roughness has implications for water storage (e.g., Cael et al., 2017) and should be taken into account when estimating subglacial water storage and the magnitude of drainage events. Topographic simulations could be used to route subglacial water, constrain ice volume estimates, or allow us to accurately model sliding behavior. The size of topographic features in part controls basal sliding and deformation behavior (e.g., Weertman, 1964), so realistically representing Figure 11. RES lake length histograms for (a) known lakes, (b) expected lakes from simulation using Bedmap2, (c) expected lakes from simulations using smooth topography, and (d) expected lakes from simulations using rough topography. Lake lengths are shown on a logarithmic scale.
roughness could improve ice sheet models. The ensemble of multiple topographic realizations provides a basis for uncertainty quantification in ice sheet modeling.

Subglacial Lake Probabilities and Surface Area
Based on the characteristics of known lake and non-lake locations, we developed a probabilistic metric for subglacial lake location and integrated surface area. Because significantly more non-lake data points than known lake locations were used in the regression analysis, even the highest lake probability values were on the order of 10 −3 . The active lake regression analysis uses more non-lake data than in the RES lake analysis, so probabilities were about an order of magnitude lower than radar-detected lake probabilities. Therefore, the probabilities of active subglacial lake occurrence should be interpreted as relative, not absolute. The probabilities within each map are internally consistent but cannot be compared to probabilities for a different lake or topography type. Therefore, these are not the true probabilities of lake occurrence.
We calculated surface area estimates for RES lakes by using AGAP data to set a probability threshold for determining which probabilities constitute a lake (Table 2) but were unable to do so for active lakes. This is because we did not have reliable non-lake data set for active lakes with which to find an optimal surface area. Our surface area estimates show that cumulative RES lake surface area decreases with increasing bed roughness and that calculations made using Bedmap2 topography can overestimate surface area by as much as a factor of 4. Our cumulative subglacial lake estimate was between 0.14% and 0.53%, or roughly 18,000 to 68,000 km 2 . This is comparable to the Dowdeswell and Siegert (1999) and Goeller et al. (2016) estimates, which were 54,000 km 2 and 77,000 ± 18,000 km 2 , respectively, though neither study generated a map of lake locations. The Dowdeswell and Siegert (1999)  ours because they use the ratio of lakes intercepted by airborne radar flight lines as a constraint on surface area. Our estimate is an order of magnitude smaller than Livingstone et al. (2013), which estimated 2.7%, or 350,000 km 2 . This difference is likely because we used rough topography and set a probability cutoff point (section 2.3.1) for determining surface area.
Our expected radar-detected lake number and size distribution differs from previous studies and known lake length measurements. Only a third of known lakes have an observed length of 1 km or less, while our simulations show that roughly 80% of lakes are expected to be this size. This difference is likely explained by Figure 13. The percent of known subglacial lakes that are matched by the simulations as a function of increasing lake surface area for (a) RES-detected and (b) active lakes. From left to right, the probability cutoff threshold is decreased so that a greater percent of the continent is covered by lake. At each point, the percentage of known lake matches is evaluated. Lake matches are computed for both a 5-and 20-km search radius. For the smooth and rough results, the mean recovery is shown across all simulations. For example, approximately 80% of known RES lakes are in the 20% highest probability pixels.

Table 2
Cumulative Subglacial Lake Surface Area and Lake Number Across Different Studies Study Surface area Expected number of lakes Dowdeswell and Siegert (1999) 54,000 km 2 Livingstone et al. (2013) 2.7%, or 350,000 km 2 9,360 Goeller et al. (2016) 0.6% or 77,000 ± 18,000 km 2 921 ± 300 Scenario 1 (Bedmap2 topography) 0.6%, or 77,000 km 2 6,078 Scenario 3 (smooth topography) 0.41 ± 0.12%, or 52,000 ± 15,000 km 2 15,119 ± 9,637 Scenario 5 (rough topography) 0.32 ± 0.18%, or 41,000 ± 23,000 km 2 5,034 ± 2,500 the sampling bias in known lakes as smaller lakes are harder to detect. Livingstone et al. (2013) only counted lakes larger than 5 km 2 , so smaller lakes are not included. Goeller et al. (2016) used a grid resolution of 5 km, although more than 90% of our expected lakes and 70% of the known lake lengths used in our study are smaller than 5 km. Each of our expected lake maps had a median lake area of 1 km 2 , but the mean sizes varied with topographic roughness. Our lake simulation results using Bedmap2 yielded lakes with a mean surface area of 12.6 km 2 , while our smooth topography simulation had a mean lake size of 3.3 km 2 .
Although the Bedmap2 topography generated larger lakes and a greater cumulative surface area, the smooth topography had a greater number of lakes. The smooth topographic simulations yielded more than twice as many expected lakes as the Bedmap2 topography and more than 15 times as many lakes as the Goeller et al. (2016) model.
The rough topography yields fewer, larger lakes than the smooth topography. This may be due to the difference in corner wavelengths (Figure 7). Because the rough topography has a higher corner wavelength than the smooth topography, it will have a larger dominant wavelength. Therefore, the rough topography has both higher relief and wider basins, which produces larger lakes.
These results suggest that most undiscovered lakes would have to have an area less than 5 km 2 in order to match our expected size distributions and number of lakes. Our simulations can not generate lakes with an area smaller than 1 km 2 , though there are likely many. This would not substantially influence cumulative lake surface area, but it could affect the distribution of lake sizes.

Lake Simulations and Locations
Our lake location probability maps confirmed the locations of many known lakes and also showed potential locations of undiscovered lakes. For RES lakes, Dome A, Dome F, and Princess Elizabeth Land have promising regions that lack dense radar surveys. For active lakes, Evans Glacier, Pine Island Glacier, Bailey Ice Stream, and the lower portion of Byrd Glacier all had high probabilities, though they have no known active lakes. While satellite altimetry does not have the spatial constraints of ice-penetrating radar and can theoretically image the entire Antarctic surface, active lakes may remain undetected because the ice surface is too rough, the lake was too small relative to the ice thickness, or the lake was not active during the period of observation. Our probability maps can assist in locating active lakes where altimetry methods might not be suited for detection.
The high correlation coefficients with heat flux suggest that heat flux plays an important role in modulating lake locations. This was also found to be the case in the Livingstone et al. (2013) study, which used the Pattyn (2010) model of warm and cold-bedded regions. However, known lake locations were used in the Pattyn study, which introduces circular logic in the Livingstone et al. (2013) model. Our method of incorporating heat flux data through regression allowed us to account for the basal thermal regime without bias.
Not all known lake locations were matched by our model. Some of this error is likely due to topographic uncertainty. Because the topography, ice thickness, and subglacial water flow paths change with each realization, lake locations will vary. Some known RES lakes were located in very low probability areas, such as West Antarctica and Victoria Land (Figure 8a). This is why the accuracy curve in Figure 13a did not converge to 100% until almost the entire continent was made to be a lake. While some of the known RES lakes had low RES lake probabilities, they coincided with high probability active lake regions, such as at Mercer Ice Stream. This suggests that these lakes may have more in common with active lakes. It is also possible Journal of Geophysical Research: Earth Surface 10.1029/2019JF005420 that some of the known lakes in these locations are false positives. Lake identification typically relies heavily on human interpretation and rarely undergoes a stringent verification and classification process (Carter et al., 2007).
The active lake simulations with rough topography have a higher accuracy than those with smooth topography (Figure 13b). This is likely an effect of the water routing model, which correlates strongly with active lake location. The ice surface slope has 10 times the effect on hydraulic potential that bed slope does (Shreve, 1972), so a steep bed slope is needed to create an area with high flow accumulation. Therefore, the steep bed slopes resulting from rougher topography are more conducive to creating areas with high flow accumulation and local storage.
The lake maps shed light on the complex network of subglacial hydrology and are helpful for a variety of purposes. They could be used to inform future field campaigns to study lakes, classify hydrological regions of Antarctica, estimate subglacial water storage, or predict frozen and thawed regions. They could also be used to parameterize sliding behavior in ice sheet models, constrain the magnitude of outburst events, or investigate drainage networks. Some of the highest probability locations for active lakes in Antarctica were Pine Island and Thwaites glaciers, which have experienced significant mass loss (Rignot et al., 2019). These maps could provide insights on the processes that govern the movement and retreat of these glaciers.

Active Versus Radar-Detected Lakes
The maps and correlation coefficients showed similarities and differences between active and radar lakes. The advantage of this approach is that we can use all the information available in our model without an a priori assumption about importance. Both active and RES lake probabilities correlated well with ice thickness and heat flux. The higher importance of ice thickness for RES lakes suggests that they form through pressure melting in regions with thick ice and remain in that location. Radar lakes have a higher correlation with distance along flow path from the grounding line than active lakes, indicating that radar lakes tend to exist further upstream. This is also confirmed by the probability maps, which show radar lakes clustered in the center of East Antarctica and active lakes grouped closer to the coast. Active lakes were correlated with the water routing models while RES lakes were not. This suggests that active lakes are governed by water routing, while RES lake locations depend on ice thickness. The strong positive correlation of active lakes with ice velocity also suggests that active lake meltwater is generated through frictional heating or that these lakes cause increased ice velocity.
Our results suggest that active and RES lakes are fundamentally different types of hydrological features. RES lakes are located near the divides with small hydraulic catchments, while active lakes are closer to the grounding line and accumulate water from large upstream catchments and frictional melting. Active lakes were predicted beneath ice streams, while RES lakes were not. Active lakes appeared in West Antarctica, while RES lakes were predominantly in East Antarctica, suggesting that West Antarctica and East Antarctica have extremely different hydrological environments.
The active lake locations and correlation coefficients also help explain why active lakes generally cannot be detected with ice-penetrating radar. Active lakes were linked with high ice velocity, ice streams, and water flowing in from upstream. This suggests that active lakes may be too dynamic to form the flat ice/water interface required for a clean radar reflection (e.g., Siegert et al., 2014).
Certain bed-dependent variables did not correlate well with either lake type because they change dramatically with each topographic simulation. Therefore, bed slope and hydraulic potential curvature were nearly independent of lake probability, though in reality they may not be. The different topography with each simulation introduced noise into these parameters that overwhelmed the original signal. The importance of these variables was likely underestimated by our modeling study.

Assumptions and Uncertainty
While our DEMs offer realistic representations of bed topography, they are simplified, particularly in that simulated roughness does not vary with region or direction. For example, the same roughness statistics are used to simulate small-scale topography on the ocean floor and the Transantarctic Mountains, but the seafloor is likely much smoother due to sedimentation. Bed topography is not constrained by ice velocity or mass conservation. The simulations do not recreate morphological features or drainage pathways. The power spectral density plots (Figure 7) show that the simulated topography behaves more similarly in the frequency domain to radar topography than Bedmap2 does, but there is still some difference between the simulated and actual topography. This difference could be attributed to artifacts that are generated by adding simulated small-scale roughness to long-wavelength topography. Depending on the constructive or destructive interference of the small-wavelength topography with long-wavelength topography, the resulting topography could have somewhat different spectral content from the measured topography. Future work on topographic simulation methods is needed to account for regional variations in roughness, possibly through using local topographic data to inform the roughness characteristics of nearby topography. Bed topography has also been linked to ice surface topography (e.g., Bindschadler & Choi, 2007); this relationship could be used to constrain topographic predictions.
Although the ice-bed interface of Antarctica exceeds 12 × 10 6 km 2 , only 351 RES lakes and 139 active lakes are used in the model. This is an exceedingly small amount of data with which to train a continent on through regression analysis. We are unable to use available lake length data because we do not know if the given coordinate for each lake is the center point. We also assume that all radar flight lines without lakes are non-lakes, though dim or smaller lakes could easily go undetected by radar or are not identified when processing radargrams. Future hydrological models would benefit from a definitive non-lake database. It would also be useful to have a range of coordinates for each observed lake rather than one coordinate and a length. Similarly, our use of ICESat coverage for determining non-lake locations does not completely eliminate observational bias. Active lake detection relies on repeat measurements before and after drainage, so the lack of a detected active lake does not necessarily mean that one is not there. We assume that this does not substantially alter our predictions.
The cutoff thresholds for determining which probabilities qualify as a lake were set using the AGAP radar data in the Gamburtsev Mountains Ferraccioli et al., 2011). This data set was chosen for its dense coverage and large quantity of lakes, but it covers only a small area. The AGAP surveys are the best training data available, but it is by no means representative of the entire continent. While our model places lakes in the locations with the highest probabilities and does not put any lakes in low probability areas, a small proportion of the low probability regions likely will still have lakes in reality. These maps should be used to complement, not replace other studies.

The Use of Geostatistics
This project synthesized the available physical data of Antarctica and simulated bed topography to calculate subglacial lake surface area in a way that accounted for uncertainty in bed roughness and physical drivers. The use of geostatistics in this model had several advantages: (1) We directly incorporated the roughness of topographic data in bed topography simulations, (2) multiple realizations of topography and lake simulations were created in order to encapsulate a range of possible bed conditions, (3) multiple data sets were integrated to predict lake occurrence while minimizing physical assumptions, (4) it showed the relationships between physical predictors and lake occurrence, and (5) by applying the same model to active and RES lakes, we could objectively investigate how they differ. This allowed us to mitigate typical issues associated with data limitations and physical uncertainty that frequently hamper physical models.

Conclusion
We have developed a protocol for simulating subglacial topography and generated 100 new DEMs with statistically realistic roughness. These DEMs enabled the development of 200 probability maps of radar and active lake probabilities and 100 binary maps of radar lakes. All data products are available online (http://www.usap-dc.org/view/dataset/601213). Radar lakes are expected to cover 0.32-0.41% of Antarctica with 5,000 to 15,000 lakes. Our statistical simulation of lake occurrence results in a lower total surface area and higher number of lakes than past estimates and demonstrates that lake surface area and number depends on bed roughness. Therefore, topographic roughness exerts a control on the hydrological conditions of Antarctica. Our simulations are a realistically rough alternative to DEMs made by kriging and can be used as a boundary condition in other models. This has implications for basal sliding, water routing, and sea level rise contributions from meltwater.
Our results suggest that active and RES lakes are distinct hydrological features that occur in different locations and are linked with different physical conditions. Active lakes are more likely to be found beneath West Antarctic ice streams and outlet glaciers, including a very high likelihood at Thwaites Glacier and Pine Island Glacier, while radar lakes occur in interior East Antarctica near the divides. No active lakes have 10.1029/2019JF005420 been detected at Evans Glacier or the lower portion of Byrd Glacier, but our results suggest that these locations have a high likelihood of having active lakes. Ice thickness, heat flux, and distance inland along flow path from the grounding line were the dominant predictors for RES lakes. Active lakes have a strong positive correlation with ice velocity, heat flux, and flow accumulation models. This suggests that RES lakes are stable feature that are far upstream of the grounding line and accumulate water through pressure melting, while active lakes are further downstream, collect water from upstream sources, and are part of a dynamic hydraulic system closer to the grounding line.
Limitations in data coverage are a hurdle to characterizing the properties and systems of subglacial environments. Our use of geostatistical methods allowed us to realistically represent subglacial conditions and investigate hydrological complexities. The protocols established in this paper could enable further investigations of subglacial conditions.