A Bayesian Logistic Regression for Probabilistic Forecasts of the Minimum September Arctic Sea Ice Cover

This study introduces a Bayesian logistic regression framework that is capable of providing skillful probabilistic forecasts of Arctic sea ice cover, along with quantifying the attendant uncertainties. The presence or absence of ice (absence defined as ice concentration below 15%) is modeled using a categorical regression model, with atmospheric, oceanic, and sea ice covariates at 1‐ to 7‐month lead times. The model parameters are estimated in a Bayesian framework, thus enabling the posterior predictive probabilities of the minimum sea ice cover and parametric uncertainty quantification. The model is fitted and validated to September minimum sea ice cover data from 1980 through 2018. Results show overall skillful forecasts of the minimum sea ice cover at all lead times, with higher skills at shorter lead times, along with a direct measure of forecast uncertainty to aide in assessing the reliability.


Introduction
Sea ice is a major component of the Arctic climate system, influencing atmospheric and oceanic circulation by inhibiting heat and moisture exchanges between the ocean and atmosphere and reflecting most of the incident solar radiation. Since the beginning of the satellite era in 1978, sea ice cover in the Arctic ocean has seen a decrease in all seasons, with an annual decrease rate of 4% per decade (Cavalieri & Parkinson, 2012) and the most dramatic decline experienced in September (Onarheim et al., 2018;Stroeve & Notz, 2018). Since 2007, the September Arctic sea ice extent has consistently remained below the pre-2007 records. Decreasing summer ice cover has led to an extension of the Arctic ocean open water season by about a week each decade (Stroeve, Markus, et al., 2014;Stroeve & Notz, 2018). This increase in Arctic accessibility is of keen interest for geopolitics, resource extraction, shipping, tourism, and scientific research. Skillful forecasts of minimum sea ice and open water passages is therefore of great importance to these and other stakeholders.
The changing Arctic climate and high variability of sea ice has sparked growing interest in modeling and predicting summer sea ice conditions. Following the then record low September sea ice extent in 2007, the Study of Environmental Arctic Change (SEARCH, https://www.searcharcticscience.org/seaiceoutlook) began soliciting Sea Ice Outlooks (SIOs) from the scientific community, forecast centers, and the public. Now managed by the Sea Ice Prediction Network (SIPN) (https://www.arcus.org/sipn/sea-ice-outlook), ©2020 The Authors. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. interest and participation in SIO has grown from 39 submitted predictions in 2008 to 112 predictions in 2018, with a variety of dynamical (coupled ice-ocean models or coupled ice-ocean-atmosphere models), statistical-, mixed-, and heuristic-based methods (see (Hamilton & Stroeve, 2016).
Dynamical models for seasonal sea ice predictions use sea-ice conditions, ocean temperatures, and/or atmospheric conditions to initialize the models for each season (Msadek et al., 2014). These models can be coupled ice-ocean models with prescribed atmospheric forcing such as the Pan-Arctic Ice-Ocean Modeling Assimilation System (PIOMAS)  or coupled ice-ocean-atmosphere models such as CFSv2 (Wang et al., 2012) and CanSIPS (Sigmond et al., 2013). Typically, these models are used to create ensemble forecasts from which estimates of the minimum September sea ice extent as well as spatial field probabilities can be obtained. While these methods have shown promise in recent years, skillful forecasts at lead times greater than 4 months remain elusive (Bushuk et al., 2017) and forecast uncertainty is not well quantified.
Statistical forecasting of Arctic sea ice cover dates back to the 1980s and has developed substantially in recent years. Early forecasts focused on predictions utilizing the relationships between sea ice and atmospheric (Barnett, 1980;Walsh, 1980), oceanic (Johnson et al., 1985), or sea ice (Johnson et al., 1985;Walsh, 1980) predictors. The primary focus was to predict a single parameter such as total sea ice extent, either regionally (Drobot, 2007;Drobot & Maslanik, 2002) or pan-Arctic (Drobot et al., 2006;Lindsay et al., 2008). More recently, statistical methods have been used to provide spatial field predictions of sea ice concentration using a variety of techniques such as a linear Markov model (Yuan et al., 2016), a vector autoregressive model , a deep neural network (Chi & Kim, 2017), a convolutional neural network (Kim et al., 2020), and a data-adaptive harmonic technique (Kondrashov et al., 2018). There is no statistical method in vogue to generate probabilistic spatial field forecasts presented in the yearly SIO, which motivates the need for a statistical method to join the dynamical model forecasts.
This study presents a novel approach to probabilistic minimum sea ice forecasts by employing multivariate Bayesian logistic regression at Pan-Arctic grid points. By utilizing a statistical framework, the output of this technique is not merely a point estimate of predicted probabilities, but rather a distribution from which a direct quantification of uncertainty is available. Forecasts are made for minimum sea ice cover for 1-to 7month lead times, and performance is evaluated across the spatial domain as well as at local grid points for each lead time using both a rolling-validation technique and a drop-one cross-validation technique. This method is a compliment to the existing suite of statistical and dynamical sea ice forecasting models, and the scope of this paper is to introduce this method, rather than a stand-alone study of sea ice variability.

Sea Ice
Sea ice data are obtained from the Sea Ice Index products housed at the National Snow and Ice Data Center (NSIDC) (Fetterer et al., 2017). The data are derived from two sources: (1) the Near-Real-Time DMSP SSMIS Daily Polar Gridded Sea Ice Concentrations (NRTSI) from the Special Sensor Microwave Imager/Sounder (SSMIS) on board the Defense Meteorological Satellite Program (DMSP) satellites (Maslanik & Stroeve, 1999) and (2) the combined Nimbus Scanning Multichannel Microwave Radiometer (SMMR, 1979(SMMR, -1987, the DMSP Special Sensor Microwave/Imager (SSM/I, 1987(SSM/I, -2007, and the Special Sensor Microwave Imager/Sounder (SSMIS, 2007 to present) 25 km × 25 km gridded sea ice concentration data product from the NASA Team sea ice algorithm (Cavalieri et al., 1996). September minimum sea ice cover at each grid cell (where ice is deemed "present" if the concentration is above 15%) is calculated for the period 1980-2018.

Atmospheric and Oceanic Variables
Studies have shown that spring atmospheric conditions have a strong impact on summer sea ice melt (Kapsch et al., 2014(Kapsch et al., , 2019. Favorable conditions transport moisture to the Arctic, increasing cloud cover and water vapor affecting anomalous sea ice concentrations (Kapsch et al., 2013). Increased cloud and water vapor have been linked to earlier melt onset through increased longwave downwelling (Mortin et al., 2016). Because of its long memory, the ocean can be a source for long-term predictability of the climate and sea ice (Boer, 2004;Griffies & Bryan, 1997;Guemas et al., 2016;Lindsay et al., 2008). Building on the extensive literature studying the predictability and variability of sea ice (Huang et al., 2019;Ionita et al., 2019;Olonscheck et al., 2019;Rigor et al., 2002;Tivy et al., 2007;Wang et al., 2015Wang et al., , 2019Yuan et al., 2016), several local atmospheric predictors were considered for inclusion including downwelling longwave and shortwave radiation, cloud fraction, sea level pressure, total column water vapor, and surface air temperature. Additionally, the first principal component (PC) of geopotential height at 500 and 1,000 hpa (north of 60°l atitude) and Atlantic and Pacific (as defined by WOCE Data Products Committee. 2002, for latitudes north of the equator) sea surface temperatures (SSTs) were included as predictors. Selection of covariates was determined using Bayesian Information Criterion (BIC) and heuristic decision making. Relying on BIC only results in different model structures at each grid cell and lead time. For simplicity, a single model structure was chosen based on BIC results and heuristic selection. Allowing different covariates per grid cell or lead time is left for future work. The final selected model uses local surface air temperature, downwelling longwave radiation, and sea ice concentration as well as the first PC of geopotential height at 500 hpa and Atlantic and Pacific SSTs. All atmospheric and oceanic values are derived from the Modern Era Retrospective Analysis for Research and Applications (MERRA-2) (Gelaro et al., 2017). MERRA-2 uses the Goddard Earth Observing System, Version 5.12.4 (GEOS-5) atmospheric model and Global Statistical Interpolation (GSI) analysis scheme and has an approximate spatial resolution of 0.5°latitude by 0.625°l ongitude. MERRA-2 has been shown to perform well compared to other common reanalysis products in terms of Arctic atmospheric variables (Graham et al., 2019) as well as oceanic variable (Lim et al., 2017). Local data are subsetted to only include locations with a latitude >40°N and are regridded to match the coordinate reference system projection of the sea ice data (polar stereographic projection at a grid cell size of 25 × 25 km).

Logistic Regression Model
Logistic regression is a technique that specifies the probability that a binary response variable is one class given the values of explanatory variables: π(x) = p(Y = 1|X = x), where π(x) lies between 0 and 1, the random variable Y takes on values 0 or 1, and X is a vector of predictors. In this study, the response variable is the presence/absence of sea ice on the day of minimum extent, and the explanatory variables are the anomalous sea ice concentration, atmosphere, and ocean predictors discussed in sections 2.1 and 2.2. The authors recognize that these covariates can exhibit a strong trend over the study period, but because the response variable is binary and cannot be detrended, anomalous values are used rather than detrended values. An exploratory analysis using detrended values yields results that do not change significantly.
The probability π(x) is parameterized as where X i represents the ith of m covariates with corresponding regression coefficient β i , typically estimated by maximum likelihood whose uncertainties rely on large-sample central limit theorem approximations. Small sample sizes can lead to complete or quasi-complete separation in which the response variable Y is completely separated by a predictor variable, X 1 , for example, p (Y i = 1|X i1 ≥ 2) = 0 and p (Y i = 1| X i1 < 2) = 1. Separation is a common problem in logistic regression and was apparent when applied here, but Bayesian methods can be used to mitigate this effect with application of prior distributions.

Bayesian Framework
The Bayesian approach provides a method for updating the distribution of statistical parameters of the model based on observations given some prior beliefs about the true behavior of the system. Here, we seek the posterior distribution of regression coefficients, p(β|data), using a prior distribution for β, p(β), and a conditional likelihood, L (data|β). Our conditional likelihood is products of independent Bernoulli trials whose probability parameters are parameterized according to the logistic regression above. Following Gelman et al., 2008, we use independent center zero Cauchy prior distributions for each regression coefficient with scale 10 for the intercept and 2.5 for other regression coefficients. In practice, the posterior distribution of the regression coefficients in logistic regression is not available in closed form and is approximated by generating samples that are approximately distributed according to the posterior using Markov Chain Monte Carlo (MCMC) techniques (Gelman et al., 2013). MCMC approaches can be computationally prohibitive; we therefore approximate the posterior mode of regression coefficients using an approximate Expectation-Maximization (EM) algorithm (Dempster et al., 1977), and gather associated standard errors in accordance with Gelman et al. (2008), using the "arm" package (Gelman & Su, 2018) in the R programming language. Given the posterior mode, posterior predictive samples are generated by sampling regression coefficients according to a multivariate normal distribution followed by samples from the logistic predictive model, conditioned on the posterior parameter samples. The final output is an approximated posterior predictive distribution of the probability of the presence of sea ice ( Figure S1 in the supporting information).

Application
With this framework, a separate Bayesian logistic regression model is fitted at each grid point with valid predictor values. Any location that has never seen a change in sea ice cover is omitted. The output is a posterior distribution of probabilistic forecasts of sea ice cover at each grid point from which a median value and credible intervals can be determined. Because monthly mean values are used which are not available until the following month, a 1-month lead time (made in August) corresponds to July monthly mean values. Additionally, a single value pan-Arctic sea ice extent is derived from these probabilistic forecasts as outlined in section 3.3.

Skill Measurement
To test the overall performance of this model, a rolling-validation scheme is applied in which years 1980 through 1989 were used for training the model to forecast year 1990, after which years 1980 through 1990 were used to forecast year 1991, and so on through prediction year 2018. The skill is quantified via Brier Scores (BS) for each lead time. A Brier Score is the mean square error of probability forecasts for dichotomous events, in this case presence (1) or absence (0) of sea ice. The BS is where f t are the forecasted probabilities, o t are the observed outcomes, and N is the number of forecasts. The BS takes on values in [0,1], with a smaller score representing better predictions. To compare this technique to a reference case, the Brier Skill Score (BSS) is calculated by comparing the Brier Score to climatology following where BS ref is the Brier Score using climatology as the forecasted probabilities. Here, BSS = 0 is equal to climatology, BSS > 0 outperforms climatology, and BSS < 0 is worse than climatology. Climatology is calculated as the proportion of the presence of sea ice at each grid over all the training years. We use two forms of climatology: (1) using all the previous years to predict the current year, hereon referred to as the full climatology, and (2) using the previous 10 years to predict the current year, hereon referred to as the 10-year climatology. This method is also used to measure the skill of the fully fit model compared to a concentration-only model in which sea ice concentration is the only predictor, showcasing the value of the atmospheric and oceanic predictors.
Assessment of Brier Scores are implemented in two ways: (1) A single "global" BS is calculated across the spatial domain (north of 67°latitude) for each year individually in a similar fashion as the SIO, where N is the total number of grid locations, and (2) a local BS at each individual grid point where N is the total number of years (29).
The performance of this method is also evaluated using a drop-one cross-validation scheme applied to every year of observation at each lead time. For example, if year 2000 is dropped, the model is fit on all other years and a prediction is made with year 2000 data. This is repeated for all years. These results show only slightly greater "global" skill than the rolling-validation scheme (though with a very similar temporal pattern, Figure S2). The local BS using cross validation show slight improvement compared to the rolling-validation results by increasing the number of locations that show skill ( Figure S3). The rolling validation replicates the practical forecast system and is therefore chosen for further discussion in this paper, while the cross-validation results can be found in the supporting information text.

Earth and Space Science
The forecasted Pan-Arctic sea ice extent is compared to two baseline statistical models: a climatology model and an anomaly persistence model. The forecast from the climatology model is simply the mean minimum sea ice extent from all previous years and therefore does not change with lead time. The anomaly persistence model assumes that the sea ice extent anomaly at a given lead time will be the same as the anomaly during the September minimum. The skill of the Bayesian logistic regression model is compared to these baseline models with common quantitative performance measures, namely the Nash-Sutcliffe model efficiency coefficient (NSE), the root-mean-square error (RMSE), and the mean absolute error (MAE) (Chai & Draxler, 2014;Chatur et al., 2013;DeRousseau et al., 2019;Nash & Sutcliffe, 1970).
NSE, historically used to assess the predictive power of hydrologic models, quantifies how well a model simulation can predict the outcome variable. Given the observed value, y i , the predicted value, ŷ i , and the average value of the data (climatology at a given grid cell), y, NSE is calculated as NSE takes on values in (−∞,1] with higher values indicating greater accuracy. A score of 0 indicates that the model predictions are as accurate as the mean of the observed data, and a score that is less than zero indicates that the mean is a better predictor than the model.
The MAE is a measure of the average magnitude of errors of a model by using the absolute value of the errors. It represents the median error of the model and is measured on the same scale as the observed variable, with lower values indicating a better model fit. Given observed values, y i , predicted values, ŷ i , and n observed values, MAE is calculated as The RMSE is the standard deviation of the prediction errors and indicates how concentrated the data are around the model fit. Similar to MAE, RMSE is measured on the same scale as the observed variable and is always positive due to the squared residuals in its calculation. These squared terms additionally highlight the effect of outliers, an important feature when high errors are a concern. RMSE is calculated as In this study all three metrics are used to capture the accuracy and median and mean errors of Pan-Arctic minimum September sea ice extent forecasts.

Forecasted Probabilities and Uncertainties
To assess the performance of this method under varying conditions, forecasts made for a recent low-extent year and a recent high-extent year (2012 and 2009, respectively) are examined in closer detail (Figure 1). The year 2012 is the record low sea ice minimum year since the start of the satellite era, which resulted from a combination of preconditioned overall ice reductions and a strong summer storm that entered the central Arctic in early August (Parkinson & Comiso, 2013). The year 2009, on the other hand, had the highest September minimum extent in the last decade. Despite an Arctic dipole anomaly (DA) in June and July bringing warm southerly winds into the Chukchi and East Siberian Seas, August and September were dominated by low pressure in the Chukchi and Beaufort Seas along with high pressure over Greenland, likely favoring ice divergence and slowing the decline in ice extent (Stroeve et al., 2012). Both years proved challenging for forecast systems (Stroeve, Hamilton, et al., 2014).
The output of each forecast contains a probability spatial field (Figures 1a and 1c) as well as spatially mapped width of the central 95% posterior predictive credible interval (Figures 1b and 1d). Overall, the forecasts capture the general spatial features of the minimum extent with greater accuracy at shorter lead times. Greater

10.1029/2020EA001176
Earth and Space Science uncertainties are mostly restricted to areas near the minimum sea ice edge signifying high confidence across most of the Arctic but inexact forecasts surrounding the sea ice edge.
Global Brier scores are used in the SIO to measure the skill of forecast submissions and are therefore used here as well (Figure 1, white text). Forecasts for 2009 ( Figure 1c) contain similar skill across all lead times. In fact, January predictors provide the second most skillful forecast. The uncertainty however, shows generally wider credible intervals at longer lead times, decreasing as lead time decreases (Figure 1d). Forecasts for 2012 show roughly constant skill using winter predictors (January through March), followed by a substantial increase in skill from May to June (Figure 1a). In both years, there is a spike in uncertainty using July predictors compared to May and June. Similar results are found for years 2007 (second lowest extent year, Figures S4a and S4b) and 1996 (highest extent years, Figures S4c and S4d). Because forecasts made in July rely mostly on sea ice concentration as a predictor, inclusion of the remaining covariates is a potential source of this surge in uncertainty.

Forecast Skill Through Rolling Validation
The rolling-validation local Brier skill scores (found individually at each grid cell) show high forecast skill at mid-Arctic latitudes along the Northern Sea Route and in the Beaufort and Chukchi Seas (Figure 2). At shorter lead times, high skill is present across most of the spatial domain, and as lead time increases the total area over which the model outperforms both climatology forecasts decreases (Figures 2a and 2c). A similar

10.1029/2020EA001176
Earth and Space Science pattern is observed when compared to the sea ice concentration-only model, although skill decreases in June and July as the sea ice concentration becomes the dominant predictor ( Figure 2b). The high skill compared to the sea ice concentration-only model at long lead times can be explained by lower variability in sea ice concentration during the winter and spring as well as spring atmospheric conditions affecting melt onset which impacts ice retreat later in the season (e.g., Stroeve et al., 2016).
The full climatology BSS maps (Figure 2a) show little changes from January to April, then a gradual increase in areal coverage of skill appears from April to July. The time period with increasing skill is when the melt season ramps up and is an important transition period for the Arctic climate system. Persson (2012) found that spring atmospheric processes can create significant surface warming in the Arctic prior to melt onset occurring. As melting progresses, the surface albedo decreases, allowing more heat to be absorbed, further enhancing melt (Stroeve et al., 2012). This signifies a transition where predictive power is transferred from the atmosphere to the sea ice.
The 10-year climatology BSS maps (Figure 2c) show less skill compared to the full climatology BSS, particularly at lead times greater than 3 months. Because the 10-year climatology uses sea ice cover from more recent years in which sea ice covers a smaller area compared to the 1980s and 1990s, the lower BSS suggests that our forecast system exhibits bias toward overpredicting the probability of sea ice cover. This is explored further using a reliability diagram (Figure 3) in which forecasted probabilities from all grid boxes and all years are binned into small groups of width 0.05 ranging from 0 to 1 (i.e., [0,0.05), [0.05,0.1), …, ([0.95,1]). Then, for each bin, the frequency of occurrence of sea ice is determined from observations. Bias toward overpredicting sea ice cover is seen here as most values from each lead time are below the 1 to 1 line, particularly with forecasted probabilities closer to 1.
The contribution of atmospheric and oceanic covariates are examined by individually dropping these covariates, refitting the model with the remaining covariates, forecasting, and comparing to the full model fit ( Figure 4). As before, positive BSS values indicate locations where the fully fit model outperforms the model with the dropped covariate, indicating that the inclusion of that particular covariate adds predictive skill.

Earth and Space Science
Surface air temperature adds skill across much of the Arctic domain at lead times of 1 to 3 months ( Figure 4a). This area decreases as lead time increases, but skill is still provided in the Beaufort and Chukchi Seas. Downwelling longwave radiation provides skill during the winter and spring, but less so at shorter lead times (Figure 4b). During winter the surface energy budget is largely dominated by longwave radiation and with spring comes the onset of melt in the snowpack atop sea ice. Once liquid water is present, longwave radiation becomes less important in driving sea ice melt as demonstrated by providing less skill at shorter lead times. Pacific and Atlantic SSTs provide skill in similar locations at concurrent lead times with a notable increase in skill across much of the Arctic using May predictors (Figures 4c and  4d, respectively). The first principal component of wintertime extratropical SSTs has been linked to changes in Arctic sea ice cover by influencing planetary wave propagation, thereby affecting the strength of the polar vortex . Because sea ice concentration is at or nearly 100% across most of the Arctic during winter, winter and early spring SSTs can provide information on the state of the Arctic climate when little information can be gained from the state of the sea ice. Geopotential height at 500 hpa provides skill at longer lead times (4 to 7 months) in the central Arctic (Figure 4e). Rigor et al. (2002) found that the memory of the wintertime AO persists through most of the subsequent year, influencing sea ice advection and heat transport to the Artic from lower latitudes. While the first principal component of geopotential height at 500 hpa is not the same as the AO index, it is representative of synoptic scale circulation patterns that could impart similar impacts on summer sea ice.
Some of the skill maps in Figure 4 can appear spatially inconsistent, a result of separate model fitting at each location. While this does not affect the performance of this model the authors recognize that this is a feature that could be addressed further. Possible approaches are to apply a spatial smoother to the fitted regression coefficients, place a spatial process on the prior distribution, include a spatial component in a Pan-Arctic model, or fit models regionally rather than locally, but this is left for future work.
Forecasts of the 2018 minimum sea ice cover probability via this approach were submitted to the SIO in June, July, and August of 2018 ( Figure 5) (Bhatt et al., 2019). Out of 120 total forecasts submitted to the

Earth and Space Science
SIO in 2018, a total of 10 spatial field forecasts were submitted in June, 14 in July, and 12 in August. These forecasts were made with a mixture of statistical and dynamic methods. The skill of each forecast was measured using spatially averaged Brier Scores where a value of 0 represents a perfect forecast and a 1 represents no skill. The approach presented in this paper yielded a lower Brier score (higher skill) than all other methods in June and July and was only outperformed by 1 forecast in August.

Earth and Space Science
Although 2018 was a fairly average extent year in the last decade, this method shows promise with the relatively high performance and will continue to make forecasts in future years. The forecasted single value Pan-Arctic sea ice extent can also be obtained with this method and was submitted to SIO in 2018. This forecast, however, was not as accurate as other statistical methods (Bhatt et al., 2019) as Pan-Arctic sea ice extent is not the value this method is designed for. Further discussion of this single value sea ice extent is discussed in the following section.

10.1029/2020EA001176
Earth and Space Science

Pan-Arctic Sea Ice Extent and Global Brier Scores
Because this method outputs a probability at a given grid point, to obtain a Pan-Arctic sea ice extent each of these probabilities needs to be converted into a binary output, the presence or absence of sea ice cover. This is achieved by establishing a probability threshold (i.e., if the threshold is 50%, then any grid cell with a probability <50% is set to "ice free" and any grid cell with a probability ≥50% is set to "ice covered"). This threshold is found by varying it from 1% to 99%, calculating the total sea ice extent by summing the area of grid cells that are "ice covered," and comparing to the observed minimum extent. This is performed for every year during rolling validation and at every lead time which shows approximately a 70% probability threshold to be the most accurate. This threshold, which should be 50% for a binary prediction, is a result of the bias toward overpredicting sea ice cover probability discussed earlier.
Using this 70% threshold, time series of the forecast of minimum extent at each lead time are compared to historical observations (Figure 6a). All lead times capture the overall decreasing trend from 1990 to 2018, with large deviations in 2015 and 2016 ( Figure 6a) and corresponding high Brier scores (Figure 6b). The 2015 errors are likely due to the dramatic changes the Arctic experienced that summer, with unusually low temperatures in May and June followed by one of the then warmest Julys on record, and finally slowed extent loss in late August and September . This resulted in underpredictions at short lead times, and overpredictions at longer lead times. The strong central Arctic cyclone in August of 2016 is likely the cause of overpredicted extents that year, which is also observed in 2012 when a similar cyclone occurred.
The global Brier score ( Figure 6b) provides a measure of performance year by year and is the metric used in the SIO post season reports (i.e., Bhatt et al., 2019). Long lead times (January-April) show a decreasing trend in skill from 1990 to 2018 while short lead times (May-July) show either no trend or increasing skill, resulting in a slightly greater mean Brier score (averaged across all lead times) for years 2009-2018 (mean BS = 0.092) compared to years 1990-1999 (mean BS = 0.089). Recent years show a noticeably larger spread in Brier scores compared to the 1990s, a result of greater differences between long and short lead times. Since 1999, the Arctic has seen a dramatic decrease in overall sea ice thickness and multiyear ice (MYI) coverage which now covers less than one third of the Arctic ocean (Kwok, 2018). This shift to predominantly seasonal ice cover renders summer sea ice more susceptible to the highly variable summer atmospheric circulation (discussed further in section 4), impacting the effectiveness of long lead seasonal forecasts as demonstrated by the greater spread and higher peaks in Brier score since 2000 (Figure 6b).
Three metrics outlined in section 2.6 are used to assess the skill in Pan-Arctic September minimum sea ice extent forecasts compared to baseline statistical models (Figure 7). At all lead times the forecasted values show greater accuracy using all three metrics when compared to both the climatology and anomaly persistence models. The decreasing gap between skill scores for our model and the anomaly persistence model as lead time decreases below 3 months reiterates the discussion in section 3.2 in that forecast skill comes primarily from sea ice concentration as lead time decreases.
Interestingly, both forecast values and anomaly persistence outputs show a decrease in skill as lead times decrease from winter to spring with the least accurate model forecasts occurring in March and the least accurate anomaly persistence forecasts occurring in April for NSE and RMSE and May for MAE. This touches on the idea of a "spring predictability barrier" found by Bonan et al. (2019) in which forecasts initialized prior to May are less skillful than forecasts initialized after May. However, here we have demonstrated that winter geopotential height and SSTs add predictive power at longer lead times.

Summary and Conclusion
With expanding open ocean conditions in summer months, interest in Arctic accessibility is growing quickly with much potential for socioeconomic benefit. The need for accurate seasonal forecasts of sea ice cover is therefore of great use to stakeholders for route planning and is of interest to the scientific community. This study has developed a novel approach for seasonal probabilistic forecasts of the minimum sea ice cover using a Bayesian logistic regression framework. The output is a probabilistic spatial map of the location of sea ice during the sea ice minimum extent, resulting in a direct measure of uncertainty at each grid point. By including a combination of atmospheric, oceanic, and sea ice predictors, this method was able to make accurate forecasts up to 7 months in advance across much of the Arctic Ocean. Additionally, providing the uncertainty of forecasts can provide users with a measure of reliability of the predictive power of this method.
The progression of BSSs from January to July depict the predictability of sea ice through the change in season. From January through March the climatology BSSs remain fairly constant, but in April there is a noticeable increase in skill, coinciding with the onset of the melt season. The sea ice concentration remains largely unchanged from March to April across most of the Arctic, so this increase in skill comes from the atmospheric predictors. With melt water appearing in late March and April, the Arctic's climate system adjusts to accommodate these new conditions and is able to absorb more solar radiation, influencing local temperature and pressure gradients and therefore atmospheric circulation. From May through July, the climatology BSS increases substantially, with the dominant source of predictability shifting to the sea ice concentration. This coincides with more pronounced changes in sea ice concentration across the Arctic in the summer months.
The forecasted total sea ice extents (calculated via the method described in section 3.3) show less accuracy using January through March predictors (particularly for extreme events), but skill increases with decreasing lead time beginning in April. While years with extreme events pose challenges for this forecasting method, there is potential for improved performance by simply including more observations as they become available, particularly as low extent years progressively become the norm. There will, however, always be a limit on seasonal predictive accuracy due to variability in summer atmospheric circulation (Serreze & Stroeve, 2015), a point that has historically received much attention in the literature (Lynch et al., 2001;Maslanik & Stroeve, 1999;Serreze et al., 1995;Stroeve et al., 2012Stroeve et al., , 2008. In case studies depicting the impact of variability in summer atmospheric circulation on minimum September extent, Serreze and Stroeve (2015) point out that yearly variations of temperature and wind direction strongly influence the dynamic and thermodynamic states of sea ice. Because summer weather conditions in the Arctic are highly variable (Serreze & Barrett, 2008, 2010, seasonal forecasts of summer sea ice will always suffer. Additionally, recent studies have shown that there may be a summer sea ice predictability barrier in spring for some Arctic regions (Bonan et al., 2019;Bushuk et al., 2017;Day et al., 2014). By assessing general circulation model outputs, these studies found a universal drop in prediction skill when models are initialized in May compared to June, particularly in the marginal seas of the Arctic basin. Given these limitations, the probabilistic forecasting method presented here provides accurate predictions, particularly at long lead times by utilizing the predictive power of wintertime geopotential heights and SSTs.