Statistical characteristics of Arctic forecast busts and their relationship to Arctic weather patterns in summer

Recently, human activity in the Arctic region, such as trans‐Arctic shipping, has increased due to the reduction in Arctic sea ice. Accurate weather forecasts will become increasingly important as the level of human activity in the Arctic continues to increase. Operational numerical weather predictions (NWPs) have improved considerably over recent decades; however, they still occasionally generate large forecast errors referred to as “forecast busts.” This study investigates forecast busts over the Arctic between 2008 and 2019 using operational forecasts from five leading NWP centers. Forecasts with an anomaly correlation coefficient below its climatological 10th percentile and a root‐mean‐square error above its 90th percentile at a lead time of 144 hr are regarded as “busts.” The occurrence frequency of forecast busts decreased from 7–13% in 2008 until 2012 and remained between 2 and 6% for the period 2012–2019. Arctic forecast busts were most frequent in the May and July–September periods (~6 to 7%) and less frequent between December and March (~4%). Summertime forecast busts occurred more frequently when the initial pattern was the Greenland Blocking (GB) or Arctic Cyclone (AC) pattern than when it was one of the other patterns. Some busts occurred without a weather pattern transition (~22 to 40%), whereas the others occurred with a transition. These results suggest that users should treat forecasts initialized on the GB and AC patterns with particular caution.


| INTRODUCTION
Improvements in our understanding of both dynamical and physical processes, as well as in computational efficiency, have allowed numerical weather predictions (NWPs) to improve significantly over recent decades (Bauer et al., 2015). The leading NWP centers across the globe now routinely provide high-resolution deterministic and low-resolution ensemble forecasts on medium-range timescales. However, NWPs occasionally generate very poor forecasts ("forecast busts") despite the huge improvements in forecast skill (Rodwell et al., 2013).
Forecast busts across Europe have been investigated in many previous studies. Rodwell et al. (2013) showed that the verifying analysis composite for forecast busts across Europe shows blocking over Scandinavia, and the initial analysis composite shows the Rockies trough accompanied by high convective available potential energy (CAPE) over North America. Lillo and Parsons (2017) showed that these busts occurred during large-scale pattern transitions caused by the amplification of Rossby waves. Grams et al. (2018) also showed the importance of moist processes associated with the warm conveyor belt for the European forecast busts. Based on three bust cases, Magnusson (2017) found that the error sources originated from the tropical Pacific, North America, and the North Atlantic.
Recently, human activity in the Arctic region, such as trans-Arctic shipping, has increased due to the reduction in Arctic sea ice (Eguíluz et al., 2016; Melia et al., 2016). Accurate weather forecasts are becoming increasingly important as human activity continues to increase in the Arctic. Although the forecast skill over the Arctic has been increasing for the past 10 years (Jung and Matsueda, 2016), operational predictions occasionally generate very poor forecasts (e.g., July 9 and 10 in Figure 1), as in the case of the European forecast busts. Yamagami et al. (2018a; 2018b; 2019) showed that operational ensemble forecasts generate large central pressure and position errors at ≥4.5 days before the mature stage of extraordinary Arctic cyclones. This suggests that such extraordinary Arctic cyclones could be one of the events that lead to Arctic forecast busts. Forecast skill for the Arctic atmosphere has a large influence on our ability to accurately forecast Arctic sea ice (Nakanowatari et al., 2018) and the midlatitude atmosphere (Jung et al., 2014), especially during periods affected by Scandinavian blocking. These previous studies indicate that Arctic forecast busts would significantly influence the forecasts of other climate systems and other regions.
This study investigated the characteristics of forecast busts over the Arctic by using operational forecasts from major NWP centers and the relationship between forecast busts and weather patterns over the Arctic in summer.

| Forecast data
The operational forecast data used in this study are available from the TIGGE database (Swinbank et al., 2016) managed by the European Centre for Medium-Range Weather Forecasts (ECMWF). We used ensemble forecast data from five NWP centers: the Canadian Meteorological Centre (CMC), ECMWF, the Japan Meteorological Agency (JMA), the US National Centers for Environmental Prediction (NCEP), and the UK Met Office (UKMO). These five NWP centers show higher performance than the other NWP centers in the TIGGE database in the Northern Hemisphere (Matsueda and Tanaka, 2008; Swinbank et al., 2016) and over the Arctic (Jung and Matsueda, 2016). Ensemble forecasts initialized at 1200 UTC every day from January 1, 2008 to December 31, 2019 were used in this study. Note that some data are missing for each NWP center (in particular, there are long missing periods in 2017 and 2018 for CMC and in 2014 for UKMO). The forecast data had a grid spacing of 2.5° and a temporal resolution of 1 day.

| Forecast skill and threshold of forecast bust
To detect the forecast busts, we used the uncentered anomaly correlation coefficient (ACC) and root-mean-square error (RMSE) of the latitude-weighted geopotential height at 500 hPa (Z500) over the Arctic (≥65°N) as follows (Wilks, 2019):

$$\mathrm{ACC} = \frac{\sum_{i=1}^{N} (Z500_{f,i} - Z500_{c,i})(Z500_{a,i} - Z500_{c,i})}{\sqrt{\sum_{i=1}^{N} (Z500_{f,i} - Z500_{c,i})^{2} \, \sum_{i=1}^{N} (Z500_{a,i} - Z500_{c,i})^{2}}}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (Z500_{f,i} - Z500_{a,i})^{2}}$$

where $Z500_f$, $Z500_a$, and $Z500_c$ are the predicted, analyzed, and climatological Z500, respectively, $i$ indexes the grid points, and $N$ is the total number of grid points over the Arctic. We used the own-control analysis (an initial field of the control forecast) from each NWP center to calculate the ACC and RMSE in a bias-free manner. The climatological Z500 was calculated using the ECMWF Reanalysis 5 (ERA5) data (Hersbach et al., 2020). Rodwell et al. (2013) defined forecast busts over Europe as forecasts with an ACC of less than 0.4 (ACC_thre = 0.4) and an RMSE greater than 60 m (RMSE_thre = 60 m) at a lead time of 144 hr. However, the forecast skill differs among the NWP centers and among regions. The number of busts shows a large (small) dependency on ACC_thre for larger (smaller) RMSE_thre (Figure S1); that is, the number of busts is sensitive to both ACC_thre and RMSE_thre. To obtain a subjective threshold for the Arctic forecast busts, we calculated the probability density functions (PDFs) of the ACC and RMSE in each month using control forecasts from each NWP center at a lead time of 144 hr over our analysis period from 2008 to 2019. Then, the climatological 10th percentile value of the ACC and the 90th percentile value of the RMSE were retrieved from each PDF. When a control forecast showed an ACC of less than the 10th percentile value of the ACC and an RMSE greater than the 90th percentile value of the RMSE for that month at a lead time of 144 hr, the forecast was regarded as a forecast bust.
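As an illustration of the two skill scores described above, the following minimal Python sketch computes an uncentered ACC and an RMSE over a set of grid points. It is not the code used in this study: the function names, the flat-list data layout, and in particular the square-root-of-cosine latitude weighting applied to the fields are assumptions made for the example.

```python
import math

def sqrt_cos_weights(lats_deg):
    """Assumed weighting: sqrt(cos(lat)), so squared terms carry cos(lat) area weights."""
    return [math.sqrt(math.cos(math.radians(lat))) for lat in lats_deg]

def acc_rmse(forecast, analysis, climatology, lats_deg):
    """Return (uncentered ACC, RMSE) for flat lists of Z500 values in metres.

    lats_deg gives the latitude of each grid point and is used only for weighting.
    """
    w = sqrt_cos_weights(lats_deg)
    # Weighted anomalies relative to climatology (no spatial mean removed: "uncentered").
    f_anom = [wi * (f - c) for wi, f, c in zip(w, forecast, climatology)]
    a_anom = [wi * (a - c) for wi, a, c in zip(w, analysis, climatology)]
    num = sum(x * y for x, y in zip(f_anom, a_anom))
    den = math.sqrt(sum(x * x for x in f_anom) * sum(y * y for y in a_anom))
    acc = num / den
    # RMSE of the weighted forecast-minus-analysis difference over N grid points.
    n = len(forecast)
    sq_err = sum((wi * (f - a)) ** 2 for wi, f, a in zip(w, forecast, analysis))
    rmse = math.sqrt(sq_err / n)
    return acc, rmse
```

A forecast identical to its verifying analysis gives an ACC of 1 and an RMSE of 0 m, while a forecast whose anomalies have the opposite sign everywhere gives an ACC of -1.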

| ACC and RMSE distributions
The distribution of the ACC for Z500 in the Arctic at a lead time of 144 hr shows that 50% of the forecasts (25th-75th percentile values, colored box in Figure 2a) between 2008 and 2019 had ACC values of 0.6-0.9 in all months and for all NWP centers, except for CMC in May, June, and October, JMA in June and July, and NCEP in June.
The ACC was typically highest in February (the average ACC was 0.73-0.81, white circle) and lowest in June (the average was 0.63-0.71). The 10th, 25th, 75th, and 90th percentiles as well as the median ACC values also showed a similar seasonal cycle to the average. All of the NWP centers showed the largest standard deviation of ACC in May or June, except for UKMO, whose standard deviation was largest in October. CMC, ECMWF, JMA, and NCEP showed the second largest standard deviation in October. The standard deviation of ACC was smallest in March for CMC, November for ECMWF and JMA, and December for NCEP. These results indicate that, in terms of ACC, the spatial distribution of synoptic systems is more predictable in winter (highest ACC in February) and less predictable in early summer (lowest ACC in June). The RMSE at a lead time of 144 hr also shows a similar seasonal cycle to the ACC (Figure 2b). The RMSE was highest in January (the average RMSE was 81.7-95.7 m) and lowest in August (the average was 66.6-74.2 m). The seasonal cycles of the 10th, 25th, 75th, and 90th percentiles, as well as the median values, were similar to that of the average RMSE for all NWP centers. The standard deviation of the RMSE was highest in December or January and lowest in June or August, which is consistent with the amplitude of the geopotential height anomaly in winter and summer.
Overall, ECMWF showed the highest skill in all months, and CMC or JMA showed the lowest skill among the five NWP centers. The standard deviations of ACC and RMSE were smallest for ECMWF, indicating that the quality of the ECMWF forecasts is more stable than that of the other centers.
As mentioned above, the bust threshold was the 10th percentile value of the ACC and the 90th percentile value of the RMSE in each month for the individual NWP centers (Table S1, Supporting Information). For the forecasts initialized on July 10, 2016, the ACCs for ECMWF, NCEP, and UKMO were lower than the 10th percentile value (dotted lines in Figure 1a), and at the same time, the RMSEs for these forecasts were higher than the 90th percentile value (Figure 1b). On the other hand, the ACC for the JMA forecast initialized on July 10 was lower than the 10th percentile value, but its RMSE was lower than the 90th percentile value. Thus, we regarded the ECMWF, NCEP, and UKMO forecasts as busts, but not the CMC and JMA forecasts.
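The two-condition test described above can be sketched as follows. This is a minimal illustration rather than the study's implementation: the helper names, the record layout of (month, ACC, RMSE) tuples, and the linear-interpolation percentile are all assumptions.

```python
from collections import defaultdict

def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def monthly_thresholds(records):
    """records: iterable of (month, acc, rmse) for one center at one lead time.

    Returns {month: (10th percentile of ACC, 90th percentile of RMSE)}.
    """
    accs, rmses = defaultdict(list), defaultdict(list)
    for month, acc, rmse in records:
        accs[month].append(acc)
        rmses[month].append(rmse)
    return {m: (percentile(accs[m], 10), percentile(rmses[m], 90)) for m in accs}

def is_bust(month, acc, rmse, thresholds):
    """A forecast is a bust only if BOTH conditions hold for its month."""
    acc_thre, rmse_thre = thresholds[month]
    return acc < acc_thre and rmse > rmse_thre
```

Under this rule, a forecast with a very low ACC but an RMSE below the 90th percentile, as for the JMA forecast in the example above, is not flagged.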

| Frequency of forecast busts
The proportion of forecasts that were busts over the Arctic was highest in 2008 for all NWP centers except NCEP (Figure 3a). In 2008, about 13% of forecasts were busts for CMC and ECMWF, 10% for JMA and UKMO, and 7.5% for NCEP. The proportion of forecast busts decreased significantly from 2008 to 2012 for all NWP centers, falling to between 3 and 5% (i.e., ca. 10-18 days) in 2012. The decrease in forecast busts indicates improvements in the forecast systems (e.g., model resolution, assimilation systems, and boundary conditions). Rodwell et al. (2013) showed that the number of European busts had a local maximum in 2008, suggesting that the frequency of less predictable patterns for the operational ECMWF model was higher in 2008 than in the other years. As with the European flow patterns, the frequency of less predictable Arctic flow patterns might have been higher in 2008 than in the other years. Although the proportion of busts remained below 6% after 2012 for all NWP centers, except for JMA and NCEP in 2015, the year of the local maximum differed among the NWP centers. The seasonal cycle of the proportion of forecast busts has two peaks (Figure 3b). One is in May, and the other is in mid- to late summer. Although all NWP centers clearly show the peak in May, the timing of the later peak differs among the NWP centers (July for ECMWF; August for CMC, JMA, and UKMO; and September for NCEP). At these peaks, the proportion of forecast busts was approximately 6-7% (ca. 21-26 days). The proportion of forecast busts was lowest in winter, with a value of around 4% (ca. 14 days). As a large number of the Arctic forecast busts occurred in summer, we focus on these summertime busts in the next subsection.
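The yearly and monthly proportions discussed above amount to grouping bust flags by initial date. The following short Python sketch shows one way to do this; the function name and the input layout (a list of initial dates with a matching list of bust flags) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import date

def bust_proportion(init_dates, bust_flags, key):
    """Percentage of forecasts flagged as busts, grouped by key(initial date)."""
    total, busts = Counter(), Counter()
    for d, flag in zip(init_dates, bust_flags):
        total[key(d)] += 1
        busts[key(d)] += int(bool(flag))
    return {k: 100.0 * busts[k] / total[k] for k in total}

# Toy input: four daily forecasts in each of two Januaries, one of them a bust.
dates = [date(2008, 1, d) for d in range(1, 5)] + [date(2009, 1, d) for d in range(1, 5)]
flags = [True, False, False, False, False, False, False, False]

by_year = bust_proportion(dates, flags, key=lambda d: d.year)
by_month = bust_proportion(dates, flags, key=lambda d: d.month)
```

For a center with no missing data, a yearly proportion of 4% corresponds to roughly 0.04 × 365 ≈ 14.6 forecasts, which matches the "ca. 14 days" quoted in the text.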

| Frequency of summer forecast busts and its relationship to Arctic weather patterns
As with the annual bust proportion, the number of forecast busts in summer generally decreased from 2008 to 2019 for all NWP centers except JMA (Figure 4). More than 10 busts occurred over the period 2008-2010 for all NWP centers. In particular, 15 ECMWF forecasts were busts in 2008 (Figure 4b). After 2011, the number of forecast busts was at most six during summer for NCEP and UKMO (Figure 4d,e), indicating that the summer busts have a similar interannual variability to the annual busts for these two centers (Figure 3a). For the CMC and ECMWF, the number of busts remained relatively high until 2013, but the number decreased significantly after 2013 (Figure 4a,b). In contrast to these NWP centers, the number of forecast busts for JMA was large, even in 2016 and 2017 (Figure 4c), indicating that the summer forecast busts contribute to the higher proportion of annual busts for JMA in 2016 and 2017 (Figure 3a).
To investigate the atmospheric situation over the Arctic associated with these busts, we classified the Arctic atmospheric circulation into five weather patterns based on the k-means clustering method for 20 non-normalized principal components of the Z500 anomaly over the Arctic area, as used by Matsueda and Kyouda (2016) and Matsueda and Palmer (2018). The five weather patterns are the Arctic Dipole (AD), Greenland Blocking (GB), Arctic Cyclone (AC), Beaufort High (BH), and Summer NAO (SNAO) patterns (Figure S2a-e and Data S1). The classification revealed that forecasts initialized on any of these weather patterns can bust (Figure 4a-e). However, the dominant initial weather pattern for busts differed among the years and NWP centers.
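For readers unfamiliar with k-means clustering, the following toy Python sketch shows plain Lloyd's algorithm applied to vectors of principal-component scores (one vector per day). It is an illustration only and not the classification code of this study: the function names and the user-supplied initial centroids are assumptions, and the study itself used 20 principal components and five clusters.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd's algorithm.

    points: list of PC-score vectors; init_centroids: starting cluster centers.
    Returns (labels, centroids) once the assignment stops changing.
    """
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Once the centroids are fixed, any analyzed or predicted field can be assigned to a weather pattern by projecting it onto the same principal components and calling `nearest` on the resulting score vector.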
Over the period 2008-2013, a large number of forecasts initialized on the GB pattern were busts for all NWP centers. In particular, between 40 and 80% of the forecast busts were initialized on the GB pattern. The number of busts initialized on the GB pattern decreased until 2013 for all NWP centers. For JMA, NCEP, and UKMO at a lead time of 144 hr, the BH pattern was predicted less often (left number at top-right corner in Figure 4h-j) than it was analyzed in ERA5 (Figure 4m-o). These results imply that the transition from the initial GB pattern to the BH pattern after 144 hr is less frequent in the forecasts than in the analysis. In contrast, the AD pattern was predicted more often (Figure 4f-i) than it was analyzed (Figure 4k-n) for CMC, ECMWF, JMA, and NCEP, implying that the transition from the initial GB pattern to the AD pattern is more frequent in the forecasts than in the analysis. These results suggest that the persistence of high pressure over Greenland is difficult for the NWP models to predict.
The number of forecast busts initialized on the GB pattern decreased significantly over the period 2014-2019. Since the frequency of the analyzed GB pattern in summer over the period 2014-2019 (22.3%) was nearly the same as that over the period 2008-2013 (23.7%), this reduction indicates that the westward propagation of high pressure could be predicted correctly after 2013 due to improvements in NWP systems. Although CMC and ECMWF show no dominant initial weather pattern associated with the busts over the period 2014-2019 (Figure 4a,b), the dominant initial weather pattern for JMA, NCEP, and UKMO was AC (Figure 4c-e). In  (Simmonds and Rudeva, 2012; Yamagami et al., 2017). Besides, the AC pattern was dominant in the summer of 2017 (51/91 days), and an extraordinary AC was detected on August 10, 2017 using the threshold in Yamagami et al. (2018b). For the extraordinary ACs, CMC and ECMWF showed a higher prediction skill for the central pressure than JMA, NCEP, and UKMO (Yamagami et al., 2019). In contrast, CMC showed the lowest prediction skill for the central position among the five NWP centers. These results suggest that busts associated with extraordinary ACs would have occurred due to errors in predicting the deepening of the ACs. However, during the AC pattern, some busts were associated with extraordinary ACs, whereas the others were associated with ordinary ACs. These results indicate that the JMA, NCEP, and UKMO models have difficulties predicting the wandering, persistence, and decay of the ACs.
There are no dominant predicted or analyzed weather patterns at a lead time of 144 hr (Figure 4f-o). Unlike the European forecast busts, which are associated with Scandinavian blocking in the verifying analysis (Rodwell et al., 2013), the Arctic forecast busts are not associated with a specific weather pattern in the verifying analysis.
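Comparing the initial weather pattern of each bust with the pattern at a lead time of 144 hr reduces to tallying label pairs. The sketch below shows one simple way to do so; the function name and the use of pattern abbreviations as string labels are assumptions for illustration, and the toy labels are not data from this study.

```python
from collections import Counter

def transition_summary(initial_patterns, later_patterns):
    """Count (initial, later) pattern pairs and the share without a transition.

    Returns (Counter of pairs, percentage of cases where the pattern is unchanged).
    """
    pairs = Counter(zip(initial_patterns, later_patterns))
    n = sum(pairs.values())
    unchanged = sum(count for (first, last), count in pairs.items() if first == last)
    return pairs, 100.0 * unchanged / n

# Toy labels only: initial pattern and pattern at 144 hr for five hypothetical busts.
initial = ["GB", "GB", "AC", "GB", "AC"]
at_144hr = ["AD", "GB", "AC", "BH", "SNAO"]
pairs, pct_unchanged = transition_summary(initial, at_144hr)
```

Applying the same function once to the predicted patterns and once to the analyzed patterns gives the two sets of counts that can then be compared for each initial pattern.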

| SUMMARY AND CONCLUSIONS
This study investigated the characteristics of the Arctic forecast busts using operational forecasts from five leading NWP centers. To define the threshold for Arctic forecast busts, we assessed the forecast skill of the operational forecasts for each month over the period 2008-2019. The ACC (RMSE) over the Arctic was highest in February (January) and lowest in June (August) for all NWP centers. The number of busts is sensitive to both the ACC and RMSE thresholds. Therefore, we used the 10th percentile of the ACC and the 90th percentile of the RMSE from each NWP center for each month as the subjective threshold for forecast busts over the Arctic.
Considering the proportion of forecasts in each year that were busts, 7% (NCEP) to 13% (CMC and ECMWF) were busts in 2008, but the proportion of busts then decreased significantly from 2008 to 2012 for all NWP centers. The proportion of forecast busts was between 2 and 6% for all NWP centers from 2013 to 2019, but the year of local maximum differed among the NWP centers. The monthly variability of forecast busts showed that the proportion of busts increased in the May and July-September periods (~7%), but decreased in December-March (~4%).
To investigate the relationship between forecast busts and atmospheric circulation in summer, we classified the Arctic atmospheric circulation into five patterns. The five atmospheric patterns were Arctic Dipole (AD), Greenland Blocking (GB), Arctic Cyclone (AC), Beaufort High (BH), and Summer NAO (SNAO). The dominant initial weather pattern associated with forecast busts was GB between 2008 and 2013 for all NWP centers. For the JMA, NCEP, and UKMO forecast busts, the AC pattern also shows a higher proportion. Although the forecast busts initialized on the GB pattern decreased after 2013 for all NWP centers, the forecast busts initialized on the AC pattern were still dominant for JMA, NCEP, and UKMO. In contrast, the CMC and ECMWF forecast busts did not show a specific initial weather pattern in recent years. The Arctic forecast busts were not associated with specific weather patterns at a lead time of 144 hr. Some summertime busts occurred without a weather pattern transition (~22% for UKMO to 40% for JMA), whereas the others occurred with a transition (Figure S5). These results suggest that the summer busts were presumably associated with differences in the position of synoptic systems (e.g., differences in the direction in which ACs wander).
The European forecast busts occurred during Scandinavian blocking episodes (Rodwell et al., 2013), and the sources of the errors were over North America and the equatorial Pacific (Magnusson, 2017). Over the Arctic, the forecast busts were associated with the initial GB and AC patterns. For the ECMWF bust initialized on July 10, 2016 (Figure 1), the initial weather pattern was the GB pattern, and it persisted up to a lead time of 96 hr. The GB pattern changed to the AD pattern at lead times of 120 and 144 hr. The comparison between the five higher-skill and five lower-skill members showed large positive and negative differences across the polar vortex at a lead time of 144 hr (Figure S3g), and their source was the initial difference around the polar vortex (Figure S3a). Besides, the spread of the control analysis among the five NWP centers in summer (Figure S3a) was large over the Pacific side of the Arctic Ocean and Greenland, as with that in winter. The analysis spread classified by the weather patterns was large around the polar vortex for all patterns and over Greenland for the GB and SNAO patterns (Figure S4b-f). These areas are possible sources of the initial errors for the Arctic forecast busts. Observations over the Arctic region have potential impacts on improvements of forecasts over the Arctic and mid-latitudes (Yamazaki et al., 2015; Sato et al., 2018; Lawrence et al., 2019). This study also supports the impact of the increased Arctic observations conducted during YOPP SOP-NH1 and 2 on the operational global forecasts. Therefore, additional observations over the areas of larger spread for each pattern could reduce the Arctic forecast busts.
This study suggests that users should assess the forecast uncertainty using ensemble forecasts and the differences in forecasts among the NWP centers when forecasts are initialized on the GB and AC patterns, especially on the AC pattern in recent years. For the European forecast busts, moist processes associated with the warm conveyor belt (Grams et al., 2018) and mesoscale convection over North America (Parsons et al., 2019) contribute to the large errors. In addition, Day et al. (2019) showed that the deterioration of the Arctic forecast reduces midlatitude forecast skill during Scandinavian blocking episodes. Further studies of the detailed processes associated with error growth in Arctic forecast busts and the impact of Arctic forecast busts on mid-latitude forecast skill will be needed in the future.

ACKNOWLEDGMENTS
The author thanks the ECMWF for providing the ERA5 and TIGGE datasets. This study was supported by the Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Research Activity Start-up, Number 19K23454.