Downscaled subseasonal fire danger forecast skill across the contiguous United States

The increasing complexity and impacts of fire seasons in the United States (US) have prompted efforts to improve early warning systems for wildland fire management. Outlooks of potential fire activity at lead times of several weeks can help in wildland fire resource allocation as well as complement short-term meteorological forecasts for ongoing fire events. Here, we describe an experimental system for developing downscaled ensemble-based subseasonal forecasts for the contiguous US using NCEP's operational Climate Forecast System version 2 model. These forecasts are used to calculate forecasted fire danger indices from the US National Fire Danger Rating System in addition to forecasts of evaporative demand. We further illustrate the skill of subseasonal forecasts on weekly timescales using hindcasts from 2011 to 2021. Results show that while forecast skill degrades with time, statistically significant week 3 correlative skill was found for 76% and 30% of the contiguous US for Energy Release Component and evaporative demand, respectively. These results highlight the potential value of experimental subseasonal forecasts in complementing existing information streams in weekly-to-monthly fire business decision making for suppression-based decisions and geographic reallocation of resources during the fire season, as well as for proactive fire management actions outside of the core fire season.


| INTRODUCTION
Fire hazards have become increasingly recognized in recent decades, particularly in the western United States (US) where wildland burned area increased approximately 500% during 1984-2020 (Higuera & Abatzoglou, 2021). Increased damages from wildfires have prompted interest in fire intelligence for both suppression activities and proactive mitigation efforts (e.g., prescribed fire) including that of fire weather forecasts. In the US, forecasts include short-term fire weather forecasts from the National Weather Service and associated warnings (e.g., Red Flag Warning issued within 48 h of critical fire weather conditions), and longer-range monthly-to-seasonal outlooks from Predictive Services (Clark et al., 2020; Owen et al., 2012). Previous studies have shown promising skill of medium-range forecasts of daily fire weather for lead times of up to a week (Di Giuseppe et al., 2020; Worsnop et al., 2020), empirical seasonal forecasts of fire activity (Chen et al., 2020; Preisler et al., 2011) and seasonal fire danger (Sampath et al., 2021). In this paper, we evaluate subseasonal forecasts straddling this time continuum that have the potential to not only inform prepositioning and mobilization of resources for fire suppression but also windows of opportunity for using managed and prescribed fire.
Subseasonal forecasts have garnered increased effort in the past decade given the joint interests in improving weather and climate-readiness across society and increased computational capacity and scientific knowledge (White et al., 2017). These subseasonal timescales represent key time frames for proactive decision-making yet present a notorious predictability gap in the weather-climate continuum between skill inherited from initial conditions in the case of short-to-medium range weather forecasts (~10 days), and the skill inherited from boundary conditions in seasonal climate forecasts. Subseasonal forecasts are influenced by both initial conditions and boundary conditions (e.g., soil moisture), as well as subseasonal teleconnections such as the Madden Julian Oscillation. While deterministic solutions are still commonly used in short-term forecasts, subseasonal forecasts are often probabilistic in nature due to larger uncertainty at weeks 2-4 lead time, and more targeted to provide weekly summaries. Numerous studies have documented skill at 1-3 week lead times for temperature and precipitation forecasts (Baker et al., 2019) and for the occurrence of ridging events (Gibson et al., 2020), and such forecasts remain an active area of research (Pegion et al., 2019). Similarly, subseasonal forecasts have been applied to meteorological extremes such as severe weather (Gensini et al., 2020) and tropical storms (Robertson et al., 2020) as well as to inform water resources, agriculture, and energy impacts (White et al., 2017). To date, fewer studies have examined the potential for seasonal-to-subseasonal forecasts for fire danger (Bedia et al., 2018; Roads et al., 2005; Worsnop et al., 2021), leaving a gap in elucidating where and when current operational subseasonal forecasts may aid in fire decision support as well as where current systems fall short in providing useful information.
The goal of this study is to detail an experimental subseasonal forecast system and quantify the predictive skill for different fire weather and fire danger indices across the contiguous US. Unlike previous studies (Roads et al., 2005), we evaluate the skill of subseasonal forecasts across CONUS and across different seasonal windows due to the distinct spatiotemporal variations in fire danger and application of forecasts. We aim to provide information on where such forecasts provide statistically significant skill and where forecast skill is currently insufficient given the need to articulate subseasonal forecasts to decision-makers (White et al., 2017).

| DATA AND METHODS
We use the Climate Forecast System version 2 (CFSv2; Saha et al., 2014), which is one of the North American operational subseasonal-to-seasonal dynamical forecast models (Pegion et al., 2019). The CFSv2 is a fully coupled atmosphere-ocean-land forecast system with multiple forecasts per day (00 UTC, 06 UTC, 12 UTC, and 18 UTC, with each forecast including four ensemble members). In this study, we use 6-hourly surface (wind velocity, maximum and minimum temperature, specific humidity, precipitation rate) and radiative flux (downward shortwave flux at the surface) forecast outputs on approximately a 1-degree horizontal grid. We acquired 10 years of archived operational forecasts over two 3-day periods per month from the National Centers for Environmental Information for the period May 2011-Apr 2021. The first 3-day forecast period was initialized at the beginning of the month (first day and previous 2 days) while the second forecast period was initialized in the middle of the month (15th day of the month and previous 2 days), yielding a total of 12 ensemble members for each set. A lagged ensemble approach for model evaluation efforts has been shown to improve forecast skill, particularly at longer subseasonal leads (Vitart & Takaya, 2021). Although the experimental forecasts use four times as many ensemble members (i.e., there are four ensemble members for each forecast at 00Z, 06Z, 12Z, and 18Z), the full set of ensemble members for each forecast was not available for the archived forecasts analyzed. While the amalgamation of forecasts with different initial conditions can degrade short-term predictive skill, larger ensemble sizes generally improve skill for subseasonal timescales (Vitart & Takaya, 2021), which are the focus of this paper. Herein, we focus on forecasts out to 28 days.
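The lagged-ensemble bookkeeping described above can be illustrated with a minimal sketch. The function names (`lagged_init_times`, `ensemble_size`) are hypothetical and not part of any CFSv2 tooling. The sketch assumes only what the text states: four initialization cycles per day, and a window made up of an anchor day plus the previous 2 days.

```python
from datetime import datetime, timedelta

# Initialization cycles per day stated in the text (00, 06, 12, 18 UTC).
CYCLES_UTC = (0, 6, 12, 18)


def lagged_init_times(anchor, n_days=3):
    """Return initialization times for the anchor day and the previous
    n_days - 1 days, oldest first (hypothetical helper)."""
    times = []
    for back in range(n_days - 1, -1, -1):
        day = anchor - timedelta(days=back)
        for hour in CYCLES_UTC:
            times.append(datetime(day.year, day.month, day.day, hour))
    return times


def ensemble_size(anchor, members_per_cycle=1, n_days=3):
    """Number of lagged-ensemble members for a given anchor day."""
    return len(lagged_init_times(anchor, n_days)) * members_per_cycle


if __name__ == "__main__":
    anchor = datetime(2011, 5, 1)
    for t in lagged_init_times(anchor):
        print(t.isoformat())
```

With one archived member per cycle this gives the 12-member hindcast ensemble, and with four members per cycle it gives the 48 members of the experimental real-time system.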
We briefly describe the system for downscaling CFSv2 forecasts, which is used both in the evaluation of retrospective forecast skill and in the experimental real-time system (Figure 1). Forecasts of 6-hourly surface and radiative fluxes from CFSv2 from each ensemble member are used to calculate daily maximum and minimum temperature, 24-h precipitation accumulation, daily mean wind speed, daily mean downward shortwave radiation at the surface, and daily mean specific humidity. We define a day based on a quasi-calendar day for CONUS ending at 06 UTC. Forecasts are statistically downscaled to a ~4-km grid using a simple spatial interpolation of coarse-scale CFSv2 anomalies. This approach bilinearly interpolates forecast anomalies from CFSv2 and superposes them on climatologies from gridMET (Abatzoglou, 2013) for the aforementioned variables. Briefly, gridMET is a surface meteorological dataset covering the contiguous US from 1979-present that was developed using an amalgamation of regional reanalyses and station-based products and has been widely used in fire-climate and fire-weather studies. A common reference period (2011-2021) is used for calculating anomalies. Incorporating a longer reference period, achieved by appending CFSv2 calibration climatologies for 1999-2010 to those calculated from the operational CFSv2 products, did not materially alter the conclusions of forecast skill. We additionally account for biases in CFSv2 imparted by model drift by adjusting the climatology as a function of forecast lead time and initialization date. Studies have shown some additional forecast skill imparted through statistical postprocessing of forecasts (Worsnop et al., 2020). We opted to use a simple approach as slightly more sophisticated approaches such as bias-correction spatial downscaling have shown nominal differences in seasonal forecast skill (Barbero et al., 2017). The resultant forecasts are herein referred to as CFSv2-gridMET.
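As an illustration only, the anomaly-interpolation step might be sketched as below. This is not the operational code. It assumes additive anomalies for every variable (the text does not specify the anomaly form per variable), regular grids whose resolutions differ by an integer `ratio`, and a model climatology already matched to the forecast lead time and initialization date. All function and argument names are hypothetical.

```python
def bilinear(coarse, y, x):
    """Bilinearly interpolate a 2-D list at fractional index (y, x),
    clamping to the grid edges."""
    ny, nx = len(coarse), len(coarse[0])
    y = min(max(y, 0.0), ny - 1.0)
    x = min(max(x, 0.0), nx - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, ny - 1), min(x0 + 1, nx - 1)
    wy, wx = y - y0, x - x0
    top = coarse[y0][x0] * (1 - wx) + coarse[y0][x1] * wx
    bottom = coarse[y1][x0] * (1 - wx) + coarse[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def downscale_anomalies(forecast, model_clim, fine_clim, ratio):
    """Superpose interpolated coarse-grid anomalies on a fine-grid climatology.

    forecast, model_clim : coarse 2-D lists (same shape); model_clim is assumed
                           to be specific to lead time and initialization date.
    fine_clim            : fine-grid observed climatology (2-D list).
    ratio                : fine cells per coarse cell along each axis (assumed).
    """
    # Step 1: anomalies on the coarse grid (additive form assumed).
    anomaly = [[f - c for f, c in zip(f_row, c_row)]
               for f_row, c_row in zip(forecast, model_clim)]
    # Step 2: interpolate each anomaly to the fine grid and add the climatology.
    downscaled = []
    for i, clim_row in enumerate(fine_clim):
        downscaled.append([clim + bilinear(anomaly, i / ratio, j / ratio)
                           for j, clim in enumerate(clim_row)])
    return downscaled
```

Because only anomalies are taken from the model, the fine-scale spatial structure comes entirely from the observed climatology, which is why verification focuses on temporal skill rather than on mean bias.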
Diurnal relative humidity (RH) fields are not standard output from most subseasonal forecasts yet are needed to calculate many fire danger indices. First, we approximated forecasted daily maximum and minimum RH using paired daily mean specific humidity and daily minimum and maximum temperature, respectively. Second, we bias-corrected these forecasts using monthly climatological differences (i.e., 2011-2021) between gridded historical data and approximated maximum and minimum RH values to account for systematic biases inherited through this approach. These differences were added to the estimated maximum and minimum RH to obtain our final values in CFSv2-gridMET.
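The two-step RH approximation can be sketched as follows. Several pieces are assumptions not given in the text: a fixed surface pressure (the source does not say what pressure was used), the Tetens-type saturation vapor pressure formula as used in FAO-56, and clipping to 0-100%. The bias terms stand in for the monthly climatological differences described above, and all names are hypothetical.

```python
import math


def sat_vapor_pressure_kpa(t_c):
    """Saturation vapor pressure (kPa) at temperature t_c (deg C), Tetens form."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def _clip(value, low=0.0, high=100.0):
    return min(max(value, low), high)


def rel_humidity(q, t_c, p_kpa=101.325):
    """Relative humidity (%) from specific humidity q (kg/kg) and temperature.
    p_kpa is an assumed surface pressure."""
    vapor_pressure = q * p_kpa / (0.622 + 0.378 * q)
    return _clip(100.0 * vapor_pressure / sat_vapor_pressure_kpa(t_c))


def daily_rh_extremes(q_mean, tmin_c, tmax_c, bias_max=0.0, bias_min=0.0,
                      p_kpa=101.325):
    """Approximate daily (RHmax, RHmin).

    RHmax pairs daily mean q with Tmin; RHmin pairs it with Tmax. The additive
    bias terms represent monthly climatological differences between observed
    and approximated RH (second step in the text).
    """
    rh_max = rel_humidity(q_mean, tmin_c, p_kpa) + bias_max
    rh_min = rel_humidity(q_mean, tmax_c, p_kpa) + bias_min
    return _clip(rh_max), _clip(rh_min)
```

Using a single daily mean specific humidity for both extremes ignores the diurnal cycle of moisture, which is one source of the systematic bias that the additive correction is meant to remove.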
Observation data were based on the 4-km surface gridded meteorological dataset gridMET (Abatzoglou, 2013). These data include the complete set of variables needed to calculate several fire danger indices from the US National Fire Danger Rating System (NFDRS, Cohen & Deeming, 1985) as well as reference evapotranspiration (ETo). Herein, we focused validation efforts on a reduced set of computed variables and timescales that have potential interest for fire management: (i) weekly average (e.g., week 1, days 1-7; week 2, days 8-14) Energy Release Component (ERC), (ii) weekly average 100-h dead fuel moisture (FM100), and (iii) weekly total ETo. We also perform a supplemental validation exercise for the occurrence of extreme fire weather conditions defined by the days where the Burning Index (BI) exceeded local 97th percentile conditions. Note that while subseasonal forecasts of individual meteorological variables were bias corrected to gridMET climatologies, our verification efforts focus on temporal skill attributes. FM100, ERC, and BI are outputs from the NFDRS, which computes a suite of numerical outputs that are proxies for potential fire behavior and have direct use in fire management. FM100 is a proxy for the moisture content of 100-h dead fuels having diameters of 25-to-75 mm. FM100 responds to weekly fluctuations in precipitation, relative humidity, temperature, and day length and is used both in risk assessment for large fires as well as a factor in some prescribed burn plans. We specifically calculate ERC and BI corresponding to fuel model G (dense conifer with heavy fuels) that has been widely used nationally. Specifically, the ERC is a weather-climate build-up index of potential fire energy release that considers the influence of meteorology on live and dead fuels but is insensitive to wind speed. 
ERC is widely used for tracking seasonal fire potential and large-fire potential and has strong serial correlation inherited from FM100 as well as from 1000-h dead fuel moisture, which represents the moisture content of dead fuels 75-200 mm in diameter and entrains a 1000-h time lag. BI is influenced by both ERC and fire weather conditions (namely wind speed), is a proxy for flame length and difficulty of fire control, and is frequently used in wildland fire decision-making (Jolly et al., 2019). ETo is calculated based on the American Society of Civil Engineers (ASCE) standardized Penman-Monteith reference evapotranspiration approach (Allen et al., 1998) and represents the potential flux of moisture from a well-watered reference surface to the atmosphere. ETo, along with its standardization in the Evaporative Demand Drought Index (EDDI; Hobbins et al., 2016; McEvoy et al., 2016), has gained traction as a fire-relevant variable given its established links to wildland fire danger and activity (Abatzoglou & Kolden, 2013; McEvoy et al., 2019). Forecasts for fire danger indices are initialized by running NFDRS using gridMET observations leading up to the forecast period and appending downscaled forecasts for days 1-28, while ETo forecasts exclusively use downscaled forecasts. Validation metrics for weekly forecasts included anomaly correlation coefficient (ACC), mean absolute error (MAE), and bias at the native resolution of gridMET across CONUS. The ACC is based on the Pearson correlation coefficient (r) of temporal anomalies at each location and excludes latent correlation entrained through the seasonal climatology. To increase sample sizes and reduce sampling variability, we calculate all metrics using 3-month centered-moving windows (e.g., skill assessments for March pool all forecasts issued in Feb-Apr). Skill assessment is performed using the ensemble mean of the forecasts.
While the forecast system produces daily output, we examine weekly aggregated forecasts for week 1 (days 1-7) through week 4 (days 22-28). Forecast skill for ACC is deemed to be statistically significant (p < 0.01) when r > 0.3.
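A minimal sketch of the weekly aggregation and ACC calculation at a single grid cell follows. It assumes non-overlapping 7-day weeks (week 4 taken as days 22-28) and that forecast and observed climatologies are supplied per sample. The helper names are hypothetical, and the r > 0.3 threshold is simply the one quoted in the text.

```python
import math

# Forecast-day ranges (1-indexed, inclusive) for each lead week (assumed).
WEEKS = {1: (1, 7), 2: (8, 14), 3: (15, 21), 4: (22, 28)}


def weekly_mean(daily, week):
    """Mean of a 28-element daily series over the given lead week."""
    start, end = WEEKS[week]
    values = daily[start - 1:end]
    return sum(values) / len(values)


def anomaly_correlation(forecasts, observations, fc_clim, ob_clim):
    """Pearson r between forecast and observed anomalies at one location.

    Each argument holds one value per forecast sample (e.g., all ensemble-mean
    week-3 forecasts pooled over a 3-month window and all years).
    """
    fa = [f - c for f, c in zip(forecasts, fc_clim)]
    oa = [o - c for o, c in zip(observations, ob_clim)]
    f_mean = sum(fa) / len(fa)
    o_mean = sum(oa) / len(oa)
    cov = sum((a - f_mean) * (b - o_mean) for a, b in zip(fa, oa))
    var_f = sum((a - f_mean) ** 2 for a in fa)
    var_o = sum((b - o_mean) ** 2 for b in oa)
    if var_f == 0.0 or var_o == 0.0:
        return float("nan")  # correlation undefined for a constant series
    return cov / math.sqrt(var_f * var_o)


def is_significant(r, threshold=0.3):
    """Apply the r > 0.3 significance criterion quoted in the text."""
    return r > threshold
```

Removing the climatology before correlating is what prevents the seasonal cycle from inflating the score.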
We additionally provide a more detailed assessment of probabilistic skill for metrics aggregated to a subset of Predictive Service Areas (PSAs). PSAs are fire management units designed to capture subregional commonalities in fire weather, fuels, and climate and are widely used geographic units for monitoring and forecasting fire danger and resource allocation decisions. We illustrate probabilistic skill for three PSAs in the western US that span geographical and climatological gradients: (i) Northern Sierra PSA in California; (ii) Payette PSA in western Idaho; and (iii) White Mountains-Gila PSA on the Arizona-New Mexico border.
Probabilistic skill assessments at the PSA level use two complementary approaches that both use the full set of ensemble forecasts. First, we quantify the skill of forecasts to capture at least 1 day of fire weather extremes on weekly time scales using Brier Skill Scores (BSS). This exercise explicitly asks whether subseasonal forecasts capture periods of high fire danger that have important consequences for fire suppression. Fire weather extremes were defined by BI values exceeding local 97th percentile values for the 2011-2021 period pooled for the entire calendar year. We calculate BSS on weekly time scales where reference climatological forecasts were generated by bootstrap resampling 10 years (n = 1000) from the historical years (2011-2021) to generate ensembles of climatologies to compare against. Second, we calculate Ranked Probability Skill Scores (RPSS) for weekly ERC and ETo forecasts across the ensemble. The RPSS provides a measure of the reliability of probabilistic forecasts relative to climatology. We considered quintiles for category probabilities and used the observed 2011-2021 period as the climatology for our reference forecast. For both BSS and RPSS, values of 1 indicate perfect skill whereas values of 0 indicate skill that is no better than a reference forecast.
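The two probabilistic scores can be sketched as below. This is a simplified illustration under stated assumptions. Event probability is taken as the fraction of ensemble members with at least one exceedance day in the week. Reference probabilities for the BSS are passed in directly, so the bootstrap-resampled climatology is not reproduced here. The RPSS reference is assumed to assign equal probability (0.2) to each quintile. All names are hypothetical.

```python
def event_probability(member_weeks, threshold):
    """Fraction of members with at least one day above threshold in the week.
    member_weeks: list of per-member lists of daily values."""
    hits = sum(1 for days in member_weeks if max(days) > threshold)
    return hits / len(member_weeks)


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, ref_probs, outcomes):
    """BSS = 1 - BS_forecast / BS_reference."""
    return 1.0 - brier_score(probs, outcomes) / brier_score(ref_probs, outcomes)


def ranked_probability_score(cat_probs, obs_cat):
    """RPS for one forecast: squared differences of cumulative probabilities.
    obs_cat is the 0-based index of the observed category."""
    cum_forecast, cum_observed, total = 0.0, 0.0, 0.0
    for k, p in enumerate(cat_probs):
        cum_forecast += p
        if k == obs_cat:
            cum_observed = 1.0
        total += (cum_forecast - cum_observed) ** 2
    return total


def rpss(forecast_probs, obs_cats, n_categories=5):
    """RPSS against an equal-probability climatological reference (assumed)."""
    reference = [1.0 / n_categories] * n_categories
    rps_forecast = sum(ranked_probability_score(p, o)
                       for p, o in zip(forecast_probs, obs_cats))
    rps_reference = sum(ranked_probability_score(reference, o) for o in obs_cats)
    return 1.0 - rps_forecast / rps_reference
```

Because the RPS works on cumulative probabilities, a forecast that misses by one quintile is penalized less than one that misses by several, which suits ordered categories such as fire danger classes.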

| RESULTS
Skill for week 3 forecasts of ERC, FM100, and ETo across the primary western US fire season (April, June, August, October) illustrates several commonalities and differences (Figure 2). First, we see notably higher ACC for ERC than for FM100 or ETo, arising through the stronger memory of ERC. Nearly 76% of CONUS had statistically significant ACC for week 3 ERC forecasts, whereas approximately 28% and 30% of CONUS had significant ACC for FM100 and ETo, respectively (Table 1a). ACC-based skill for ETo forecasts would be comparable for EDDI given its construct. Second, we see generally higher ACC across western and central CONUS, notably for FM100 and ERC.
We further illustrate differences in ACC forecast skill for ERC and FM100 as a function of lead time for forecasts made in July (Figure 3). As with most subseasonal forecasts, skill progressively degrades with lead time from week 1 to week 4. However, the inherent memory of 1000-h fuels in ERC calculations allows for high ACC to persist well into weeks 3 and 4. By contrast, FM100 ACC degrades more rapidly. For example, we found significant ACC skill for week 4 ERC forecasts across approximately 54% of CONUS compared with 16% of CONUS for week 4 FM100 forecasts (Table 1a). Supplemental analysis of ACC for the meteorological variables is provided in Table S1, which showed widespread skill in weeks 1-2 that declined at longer lead times. Biases were relatively low across CONUS at various lead times (Table 1b). By contrast, forecast error for ensemble means expressed through MAE showed a progressive increase from week 1 through week 4 (Table 1c).
A more detailed analysis of probabilistic forecast skill for select PSAs elucidates additional forecast attributes. First, we found that forecasts of weeks with at least 1 day of extreme fire weather (97th percentile BI) showed skill (BSS > 0.15) for week 1 across the three PSAs (Figure 4). However, skill for weeks 2-4 was variable across PSAs, suggesting limited skill at longer time horizons. The decline in skill for fire weather extremes beyond a 2-week lead time is consistent with reduced forecast skill, particularly for wind speed, which factors strongly into extremes in BI (Table S1; Worsnop et al., 2020). By contrast, similar validation exercises for ERC extremes, which do not incorporate wind speed, showed BSS > 0.15 through weeks 3 and 4 (Figure S1).
TABLE 1 (a) Percent of CONUS with significant anomaly correlation coefficient (ACC) skill at various lead times. Significant skill is quantified by ACC with p < 0.01 (or r > 0.3). Results are further averaged across all months. Values for ERC and FM100 refer to weekly means whereas values for ETo refer to weekly sums. (b) Mean forecast bias over CONUS for weeks 1-4. Results are further averaged across all months. (c) Mean absolute error of ensemble mean forecasts over CONUS for weeks 1-4. Reported are mean absolute errors averaged for all months.
Probabilistic forecasts of weekly ERC and ETo for the three PSAs showed positive RPSS for lead times of up to week 4 (Figure 5). Consistent with ACC, we show coherently higher RPSS for ERC compared with ETo across PSAs. Generally, week 1-2 ERC forecasts had RPSS > 0.2 during the core of the fire season across PSAs, although with somewhat lower RPSS for week 3 forecasts in July-September when the greatest amount of burned area occurs. In general, probabilistic skill realized through RPSS coincided with skill realized through the ensemble mean ACC.

| DISCUSSIONS AND CONCLUSIONS
We demonstrate that statistically downscaled subseasonal forecasts of fire danger from an experimental forecast system have significant skill at up to 3-weeks lead time across CONUS. Building on prior efforts that show forecast skill among fire danger indices in week 2 (Worsnop et al., 2021), we highlight seasonal differences in forecast skill across select fire danger metrics as well as at longer lead times that may be of value in fire management planning efforts. Similar to previous studies, we find greater week 2-3 skill for fire danger metrics like ERC that evolve more slowly to changing conditions and are strongly influenced by initial fuel moisture as compared with ETo or pure fire weather indices that do not incorporate fuel moisture or other antecedent conditions (Roads et al., 2010;Worsnop et al., 2020). Nonetheless, we show that ETo exhibits moderate week 3 skill across a majority of CONUS similar to the results of  and likely due to the heightened sensitivity of ETo to temperature and complementary relationship of evaporation at longer timescales inherent in coupled atmosphere-land processes.
FIGURE 3 Anomaly correlation coefficient for (a-d) ERC and (e-h) FM100 for week 1 through week 4 for July forecasts from CFSv2-gridMET. Results are shown only for the ensemble mean. Correlation values >0.3 are considered statistically significant.
While we identify geographic and seasonal skill in fire danger metrics, most notably as realized through ERC, several areas of reduced or limited skill are noted. First, we show limited skill for periods of high wind-driven fire danger beyond week 2. This is likely due to weaker skill in forecasts of wind-driven critical fire weather events, given that we demonstrate some skill in weekly meteorological variables that contribute to fire danger indices and ETo (Table S1). Second, we show that probabilistic skill is weaker during the core fire season in the western US, highlighting a need to elucidate contributors to reduced subseasonal forecast skill and means to overcome such challenges. This is potentially due to the MJO, a prominent source of subseasonal forecast skill, being both weaker and having a more limited influence on temperature and precipitation during summer across the western US (Slingo & Palmer, 2011). Finally, we note that while the data behind subseasonal forecasts can be viewed on daily timescales, there is limited correlative skill in most daily forecasts beyond 14 days. Skill for weekly timescales exceeds that for daily timescales at longer lead times, similar to that highlighted in Worsnop et al. (2021).
CFSv2 is one of several subseasonal forecasting systems. Advances in subseasonal forecast skill may be gained through multimodel approaches (Pegion et al., 2019) as well as through the use of machine learning approaches (Gibson et al., 2021). We used a simple statistical downscaling approach here for computational efficiency and to mirror that of the semi-operational system. More sophisticated statistical and dynamical downscaling of raw model output may add value (Worsnop et al., 2020; Worsnop et al., 2021), but it is beyond the scope of the present study. Finally, while we report the skill of forecasts, studies have increasingly highlighted windows of opportunity for subseasonal and seasonal forecasts imparted through strong signals in the El Niño-Southern Oscillation (Krishnamurthy et al., 2021; Mariotti et al., 2020). For example, Jones et al. (2011) showed improved forecast skill for precipitation extremes over CONUS during active phases of the Madden Julian Oscillation.
FIGURE 4 Brier Skill Scores (BSS) of weekly forecasts of at least 1 day of extreme fire weather (Burning Index exceeding local 97th percentile) for three Predictive Service Areas (PSA). Inset map shows the geographies covered by the Northern Sierra (yellow), White Mountain-Gila (red), and Payette (blue) PSAs.

FIGURE 5 Ranked Probability
The procedures for downscaled CFSv2 forecasts (CFSv2-gridMET) have been incorporated into an experimental subseasonal forecast system. This system produces an ensemble of 48 forecasts for the next 28 days over the contiguous United States based on CFSv2 forecasts from the previous 3 days. Forecasts include surface meteorological variables, NFDRS fire danger indices, ETo, and EDDI, with raw data and visualizations available through two platforms, the Climate Toolbox (ClimateToolbox.org) and Climate Engine (app.ClimateEngine.com; Huntington et al., 2017). Figure 6 provides examples of these visualization platforms. In the Climate Toolbox, the subseasonal forecast tool shows the daily spread of the 48 forecasts for the next 28 days and the probability of the forecasts falling within management-relevant percentile-based categories (Figure 6a). This visualization provides a novel location-specific alternative for the probability information and allows the user to tailor the display to management-specific categories, as very high and extreme fire danger represent days above the 90th and 97th percentile that incur increased levels of fire operations preparedness, tactics, and hazards (Jolly et al., 2019). A second example from Climate Engine (Figure 6b) shows a map of week-3 ERC anomaly forecasts.
FIGURE 6 (a) Example visual from Climate Toolbox's Subseasonal Forecast tool, which provides users the ability to display daily forecast probability for fire danger indices relative to local climatological conditions based on the percent of ensemble members agreeing on forecast categories. Shown are the forecast probabilities for the Energy Release Component (ERC) in Wawona, California for forecasts initialized during 18-20 Jul 2022. Extreme fire danger represents ERC values above the 97th percentile, Very High fire danger represents conditions between the 90th and 97th percentile, and High and Moderate fire danger represent ERC values between the 80th and 90th percentile and 50-80th percentile, respectively. (b) Map of week 3 ensemble-mean ERC departure from the 1981-2010 climatological average from Climate Engine.
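The percentile-based category probabilities shown in the Climate Toolbox display can be illustrated with a short sketch. This is not the platform's code. The threshold dictionary, the function name, and the "Below 50th" label for values under the 50th percentile are hypothetical, and the handling of values exactly equal to a threshold is an assumption.

```python
def category_probabilities(member_values, thresholds):
    """Fraction of ensemble members falling in each fire danger category.

    member_values : one forecast value per ensemble member for a given day.
    thresholds    : local climatological percentile values, keyed by
                    percentile, e.g. {50: ..., 80: ..., 90: ..., 97: ...}.
    """
    p50, p80, p90, p97 = (thresholds[k] for k in (50, 80, 90, 97))
    counts = {"Below 50th": 0, "Moderate": 0, "High": 0,
              "Very High": 0, "Extreme": 0}
    for value in member_values:
        if value > p97:
            counts["Extreme"] += 1
        elif value > p90:
            counts["Very High"] += 1
        elif value > p80:
            counts["High"] += 1
        elif value > p50:
            counts["Moderate"] += 1
        else:
            counts["Below 50th"] += 1
    n = len(member_values)
    return {name: count / n for name, count in counts.items()}


if __name__ == "__main__":
    # Illustrative values only: 48 members and made-up ERC thresholds.
    members = [10] * 24 + [55] * 12 + [70] * 6 + [80] * 4 + [95] * 2
    local = {50: 50, 80: 65, 90: 75, 97: 90}
    for name, prob in category_probabilities(members, local).items():
        print(f"{name}: {prob:.3f}")
```

Expressing the forecast as member agreement per category ties the probabilities to the same locally defined thresholds that trigger changes in preparedness levels.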
While macroscale burned area is strongly correlated to seasonal and intraseasonal variability in fire danger metrics (Abatzoglou & Kolden, 2013; McEvoy et al., 2019), fire activity at more refined spatial and temporal scales is confounded by a host of other factors such as fuels, ignitions, and fire suppression resources. Nonetheless, incorporation of subseasonal fire danger forecasts may be used to improve fire outlook forecasts (Preisler et al., 2016; Turco et al., 2018). Skillful fire danger forecasts may be of particular value when paired with longer-range outlooks of critical fire weather such as lightning outbreaks (Abatzoglou & Brown, 2009) or wind events (Jones et al., 2010). The moderate forecast skill of ERC at week-2 and week-3 time scales is promising for the development of early warning systems that may inform regional fire management during the primary fire season for suppression-based decisions and geographic reallocation of resources. Finally, the identification of skillful forecasts is just an initial step in providing usable information for decision-making. Coproduction of forecasts with fire management decision-makers can help further address specific information gaps and systematic evaluation of forecast skill for such gaps, identification of barriers in using forecasts in the context of current decision-making systems, and thresholds of forecast quality and uncertainty needed to translate data into decisions (Hartmann et al., 2002).