Use and impact of Arctic observations in the ECMWF Numerical Weather Prediction system

This paper presents an assessment of the usage of Arctic atmospheric observations in the Numerical Weather Prediction (NWP) system of the European Centre for Medium‐Range Weather Forecasts, and of their impact on the quality of short‐ to medium‐range forecasts. The Arctic has low coverage of conventional data north of 70°N but one of the highest levels of coverage of satellite sounding data on Earth. The impact of Arctic observations on forecast skill was assessed by performing Observing System Experiments, in which different observation types were removed from the full observing system. This assessment was complemented by an analysis of Forecast Sensitivity to Observation Impact diagnostics. To our knowledge it is the first time that comprehensive numerical experimentation has been carried out to explore the role of different Arctic observations in a state‐of‐the‐art global operational NWP system. All Arctic observations were found to have a positive impact on forecast skill in the Arctic region, with the greatest tropospheric impacts on both short‐ and medium‐range forecasts due to microwave, conventional and infrared sounding observations. Results indicate the great importance of microwave sounding data and conventional data, which are found to be the key observing systems in the summer and winter seasons, respectively. These observations were found to have positive and statistically significant impacts on forecasts not only in the Arctic but also in the midlatitude regions at longer lead times. Differences between the seasons are most likely due to problems assimilating microwave sounding observations over snow and sea ice, leading to a reduced impact in winter. There is also the suggestion of increased importance of conventional data in winter, and other factors may also play a role.


INTRODUCTION
In recent years, there has been increased interest in accurate weather predictions for the polar regions of the globe. The climate is changing more rapidly in the Arctic than in sub-polar regions, a phenomenon known as Arctic amplification. Rapid warming of the Arctic opens opportunities for more shipping routes and increased tourism, but also creates challenges such as an increased risk of oil spills, which are likely to have more catastrophic effects here than in other regions. Because of this, it has become more important than ever to have accurate weather predictions for the Arctic region. Good weather predictions in the Arctic are also important for midlatitude forecasts in Eurasia and North America at longer lead times, due to linkages between the Arctic and the Northern Hemisphere midlatitudes (Francis and Vavrus, 2012; Cohen et al., 2014; Overland et al., 2015). For example, Jung et al. (2014) demonstrated that improved Arctic forecasts resulted in improved forecasts in the midlatitudes for a boreal winter period, particularly over eastern Europe, northern Asia and North America. This paper contributes to ongoing international modelling activities in the framework of the Year of Polar Prediction (YOPP; www.polarprediction.net) coordinated by the World Meteorological Organization (WMO), by presenting an assessment of the impact and use of different Arctic observation types in the Numerical Weather Prediction (NWP) system of the European Centre for Medium-Range Weather Forecasts (ECMWF). This assessment relies on extensive numerical experimentation with the ECMWF Integrated Forecasting System (IFS), which is used operationally to produce global weather forecasts at scales from days to months.
Although studies on the impact of Arctic observations on Arctic and midlatitude weather forecasts have been carried out previously (e.g., Inoue et al., 2013; Sato et al., 2017), these have tended to focus on a single observation type and were conducted in assimilation systems that used fewer observations than would be common in current operational NWP, with very limited usage of satellite data in particular. To our knowledge, this is therefore the first time that comprehensive numerical experimentation has been carried out to explore the role of different Arctic observing systems in a state-of-the-art NWP system. NWP systems rely on satellite and conventional observations to determine the initial state of the atmosphere. These initial conditions are produced with comprehensive data assimilation systems which optimally combine the available observations with short-range forecasts to produce a best estimate of the state of the atmosphere. In the ECMWF 4D-Variational (4D-Var) data assimilation system, the largest number of observations are assimilated from satellite data in the form of radiances from microwave and infrared sounding and imaging instruments. Additional assimilated observations include AMV winds, GPSRO bending angles and scatterometer winds, as well as conventional observations including those from radiosondes and wind profilers, synoptic observations (land- and ship-based), drifting buoy observations and aircraft data.
Conventional data coverage is relatively sparse in the Arctic, particularly north of 70°N, where observations of this type tend to be more costly than in sub-polar regions. However, there is a very high level of coverage of satellite sounding data in the Arctic as compared to the equatorial region. Indeed, the polar regions are the most densely observed regions of the globe in terms of the data obtained from low Earth orbit (LEO) (or polar-orbiting) satellites, which include observations from high-impact microwave and infrared temperature and humidity sounders. Over the past 20 years satellite data have become increasingly important for global NWP, due to both an increase in the number of observations and the development of sophisticated data assimilation systems to fully exploit these data (Bauer et al., 2015). Given the high availability of satellite data over the poles, there is therefore huge potential for reducing forecast errors with good use of satellite observations. However, challenges remain with assimilating data in the Arctic, particularly in winter, due to higher forecast model errors in polar regions (Bauer et al., 2014; Bauer and Jung, 2016; Pithan et al., 2018) and difficulties in estimating the surface emission in the forward model over snow and sea ice for satellite data. In particular, the microwave surface emission is more variable over snow and sea ice, and there are assumptions in the modelling of this that are likely to be inaccurate (e.g., specular reflection; Guedj et al., 2010; Bormann et al., 2017).
The impact of different observing systems on weather forecasts can be assessed using Observing System Experiments (OSEs) (e.g., Bouttier and Kelly, 2001; English et al., 2004; Lord et al., 2004; Kelly and Thépaut, 2007; Radnoti et al., 2010; McNally, 2014; Bormann et al., 2019), or estimated using Forecast Sensitivity to Observation Impact (FSOI) diagnostics (Baker and Daley, 2000; Langland and Baker, 2004; Errico, 2007; Gelaro et al., 2007; Cardinali, 2009; Daescu and Todling, 2009). In the former, data assimilation experiments are run to remove different observation types from the full observing system and the increase in forecast error is assessed at various lead times (up to day 10 in our case). In the latter, the observation impact on 24-hr forecasts in the full observing system is estimated using adjoint-based methods. We use both of these techniques to assess the impacts of different observation types in the Arctic on forecast skill. The aim is twofold: to evaluate the impact of conventional Arctic observations, which are costly to obtain, and to evaluate whether we are making good use of the high number of satellite data from this region. (Note that all acronyms relating to satellite observations are expanded in the Appendix.) We focus on two recent summer and winter periods (June-September 2016 and December 2017 to March 2018) and compare results from the Arctic OSEs to those from global OSEs run for the same period, which are also presented independently by Bormann et al. (2019). A companion paper by Day et al. (2019) further investigates the regime dependence of the midlatitude impacts resulting from denying observations over the polar region during winter. This paper is organised as follows. First we present a summary of the ECMWF NWP system and the use of observations in the Arctic region. We then present the results from the OSEs, focusing on impacts on short- and medium-range forecast skill in the summer and winter seasons. Finally, we present an analysis of FSOI results and conclude with a discussion of all the results, making recommendations for future developments.

ECMWF INTEGRATED FORECASTING SYSTEM (IFS)
The ECMWF global NWP system, known as the Integrated Forecasting System (IFS), uses an incremental 4D-Var data assimilation system for determining the initial state of the atmosphere (Rabier et al., 2000). The 4D-Var method is used to produce a high-resolution deterministic analysis and an ensemble of analyses (from an ensemble of data assimilations [EDA]) at lower resolutions (Bonavita et al., 2012). Satellite and conventional observations are assimilated in the ECMWF (atmospheric) 4D-Var system in two 12-hr assimilation windows per day, covering 0900 UTC to 2100 UTC and 2100 UTC to 0900 UTC. Most satellite observations are assimilated as radiances, so a radiative transfer forward model is needed to transform model variables such as temperature and humidity into radiances. The RTTOV model (Saunders et al., 1999; Saunders et al., 2018) is used for clear-sky conditions and the RTTOV-SCATT model is used for cloudy and precipitating conditions (Bauer et al., 2006; Geer et al., 2009). For tropospheric sounding channels sensitive to the surface, the surface emission is estimated prior to assimilation using the model skin temperature and an estimate of surface emissivity. Constant emissivity values are used for infrared instruments. For microwave observations, either a dynamic retrieval is used over land and sea ice (Di Tomaso and Bormann, 2012; Baordo and Geer, 2016), following the methods of Karbou et al. (2005), or values from the FASTEM model are used over the ocean (Liu et al., 2011; Bormann et al., 2012; Kazumori and English, 2015). A skin temperature sink variable is then used during the assimilation of clear-sky observations to account for uncertainties in the model skin temperature. A new skin temperature is retrieved at each cycle and for each observation location but is then discarded before the next assimilation cycle.
At each data assimilation cycle, observations are combined optimally with the short-range forecast (background) from the previous cycle, in order to produce a best estimate of the state of the atmosphere. The relative weights given to the observations and background values are determined, respectively, by pre-defined observation error covariance matrices and background error covariance matrices generated from the EDA (Bonavita et al., 2012). These matrices characterise the uncertainties in the random errors in observation and background fields, with the former including contributions from the instrument noise, representation errors due to spatial mismatches between observations and the model, and forward model errors.
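As a minimal illustration of how the error statistics set the relative weights, consider a single background value and a single observation of the same quantity. The sketch below is the textbook scalar analogue of the analysis update, not the IFS 4D-Var itself; the function name and the numerical values are hypothetical.

```python
def scalar_analysis(background, observation, sigma_b, sigma_o):
    """Combine one background value with one observation of the same quantity.

    sigma_b and sigma_o are the assumed background and observation error
    standard deviations. The weight given to the observation is the ratio of
    the background error variance to the total error variance.
    """
    gain = sigma_b**2 / (sigma_b**2 + sigma_o**2)
    analysis = background + gain * (observation - background)
    # The analysis error variance is smaller than either input variance.
    analysis_var = (1.0 - gain) * sigma_b**2
    return analysis, gain, analysis_var


# Hypothetical numbers: a 270 K background with a 2 K error and a
# 272 K observation with a 1 K error.
analysis, gain, analysis_var = scalar_analysis(270.0, 272.0, 2.0, 1.0)
```

With these illustrative values the observation receives the larger weight because its assumed error is smaller than that of the background; increasing `sigma_o` pulls the analysis back towards the background.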
Systematic observation errors for satellite observations (including forward model and instrument errors) and aircraft data are handled through the use of a variational bias correction (VarBC) scheme (Dee, 2004; Auligné et al., 2007). Before assimilating data, satellite radiances and AMVs are thinned both temporally and spatially, keeping only one observation from each instrument inside an area roughly 125 × 125 km in size and within a 30-min interval. Spatial thinning (horizontal and vertical) is also applied to aircraft data and vertical thinning to radiosonde data with high vertical resolution. Note that no vertical averaging is applied to radiosonde data. Thinning is necessary in order to avoid spatially correlated observation errors (e.g., due to representation error), which are currently neglected in 4D-Var data assimilation. Quality control is also applied to remove observations with gross errors (including errors in both the observations and in the forward model) using a check against the short-range forecast (first-guess check), and to apply cloud screening for satellite data assimilated in clear-sky conditions.
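The thinning step can be sketched as a simple binning procedure. The example below is an illustration under simplifying assumptions rather than the operational scheme: positions are given on a flat plane in kilometres instead of latitude and longitude, the first observation found in each bin is the one retained (the actual selection criterion is not described here), and the dictionary keys are hypothetical.

```python
def thin_observations(observations, box_km=125.0, window_min=30.0):
    """Keep one observation per instrument, spatial box and time window.

    Each observation is a dict with the hypothetical keys 'instrument',
    'x_km', 'y_km' (position on a flat plane) and 'time_min'.
    """
    kept = {}
    for obs in observations:
        key = (
            obs["instrument"],
            int(obs["x_km"] // box_km),
            int(obs["y_km"] // box_km),
            int(obs["time_min"] // window_min),
        )
        # Retain only the first observation that falls in each bin.
        kept.setdefault(key, obs)
    return list(kept.values())


sample = [
    {"instrument": "A", "x_km": 10.0, "y_km": 20.0, "time_min": 5.0},
    {"instrument": "A", "x_km": 60.0, "y_km": 90.0, "time_min": 12.0},  # same bin as the first
    {"instrument": "A", "x_km": 10.0, "y_km": 20.0, "time_min": 45.0},  # later time window
    {"instrument": "B", "x_km": 10.0, "y_km": 20.0, "time_min": 5.0},   # different instrument
    {"instrument": "A", "x_km": 200.0, "y_km": 20.0, "time_min": 5.0},  # different box
]
thinned = thin_observations(sample)
```

Only the second sample observation is discarded, since it shares an instrument, box and time window with the first.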

USAGE OF ATMOSPHERIC OBSERVATIONS IN THE ARCTIC
Typically, around 2.5 million observations are assimilated per day in the Arctic region. Of the observations assimilated north of 60°N, 4-6% are conventional data (with exact numbers varying by season), 20-23% are microwave radiances, 67-72% are infrared radiances, 1.6-2.0% are GPSRO bending angles, 0.7-1.1% are AMVs derived from polar-orbiting satellite image sequences and 0.2% are scatterometer 10-m winds. Note that AMVs derived from a combination of geostationary and polar-orbiting satellite images are not assimilated, and geostationary satellite data are not assimilated north of 60°N.
Conventional observations assimilated in the Arctic include surface pressure (ps) observations, temperature (T), wind (u and v) and specific humidity (q) from radiosondes and aircraft, wind (u and v) from wind profilers, 10-m wind over the ocean (u10m and v10m), 2-m relative humidity (rh2m, during the daytime only) and geopotential height (Z), as listed in Table 1. Note that only a small number of wind profiler observations are assimilated north of 60°N, from sites in Norway and Finland, and geopotential height observations are only assimilated for a small subset of data for which surface pressure has not been reported. Conventional observations generally have better coverage over land than over the ocean because most radiosonde and surface pressure observations are situated over land, with fewer observations over the Arctic Ocean (Figure 1). In particular, there are very few radiosonde observations north of 70°N, and there are fewer surface pressure and 10-m wind observations over the Arctic Ocean in winter than in summer due to fewer data from buoys and ships.

Table 1. Types of conventional observations assimilated in the ECMWF system and the atmospheric heights of the observations. Observations assimilated include the near-surface observations of geopotential height (Z), 10-metre u and v wind components (u10m, v10m), 2-m relative humidity (rh2m) and surface pressure (ps), as well as atmospheric measurements at different heights of temperature (T), humidity (q) and u and v wind components (u, v)
Conventional observation type | Observations assimilated | Atmospheric height
Synoptic stations (land- and ship-based) | Z, u10m, v10m, rh2m, ps | Surface

In the Arctic, the microwave and infrared radiances represent over 90% of the total number of assimilated observations. There are more polar-orbiting satellite data assimilated in the Arctic than in other regions due to the high frequency of revisits over the poles. The high number of observations from polar-orbiting satellites north of 65°N is illustrated in Figure 2a,b, which shows the number of observations assimilated in the operational ECMWF system for different seasons for the microwave temperature sounding channel 12 (peaking around 10 hPa) of AMSU-A on MetOp-A. All channels of all microwave and infrared sounding instruments show similarly good coverage north of 60°N, although lower-peaking channels will be affected by cloud screening and surface-related quality control (e.g., Figure 2c,d).
The main microwave observations assimilated both globally and in the Arctic are the 50-60 GHz temperature sounding channels on the AMSU-A and ATMS instruments, and the 183 GHz water vapour sounding channels (referred to hereafter as the humidity sounding channels) on a number of instruments including MHS, ATMS, MWHS-2, MWHS, GMI and SSMIS, 16 instruments in all. All microwave temperature sounding channels are assimilated in clear-sky conditions only, after screening observations from cloud-sensitive channels in cloudy and precipitating regions. The majority of humidity sounding channels are assimilated in all-sky conditions following the methods of Bauer et al. (2010), Geer et al. (2010), Geer and Bauer (2011) and Baordo and Geer (2016), which were developed initially for the assimilation of microwave imagers and were more recently applied to the assimilation of humidity sounding channels (Geer et al.). The use of microwave sounding data over the Arctic has changed greatly over the last 10 years, with the majority of data now assimilated over all surface types including ocean, snow-covered land, snow-free land and sea ice (e.g., Di Tomaso and Bormann, 2012; Baordo and Geer, 2016). The current usage of microwave sounding channels in the Arctic is summarised in Table 2.

Table 2. Use of microwave radiances from different instruments assimilated in the Arctic (north of 60°N). Note that the lowest-peaking 183 GHz channels are not used (north of 60°N) in the all-sky system due to cold air outbreak model biases in this area. In addition, some AMSU-A and MHS channels have been blacklisted for some instruments due to operational problems such as increased noise, failure of the channel etc. and are not assimilated globally. This includes approximately 2-3 channels for each AMSU-A instrument, and channel 3 of NOAA-19 MHS

Assimilated infrared observations include radiances from four instruments: IASI (two instruments), CrIS and AIRS, which are primarily sensitive to atmospheric temperature, humidity and ozone. Only observations unaffected by cloud are assimilated, as well as a few occurrences of overcast conditions (McNally, 2009). Compared to microwave data, infrared observations have a greater number of channels and a higher vertical resolution, leading to overall more infrared observations assimilated than microwave observations. Currently 100-200 channels are assimilated per instrument, with the majority in the long-wave temperature sounding band, but with some ozone and humidity sounding channels. However, infrared observations have lower temporal coverage than microwave observations since they are available from fewer satellites with fewer local crossing times. They are also more sensitive to cloud cover than microwave observations, which leads to a higher number of observations being removed during the cloud-screening process as compared to the microwave temperature sounders.
While there is a high temporal coverage of polar-orbiting satellite data over the Arctic, surface-sensitive observations (sensing the lower troposphere) are difficult to use in winter and over snow and sea ice. More specifically, we cannot yet make the best use of these observations because forecast model errors in these polar regions are still large and also because observation operators are incomplete or have large errors, limiting the capability to extract information obtained from satellites. For example, it is more difficult to assimilate surface-sensitive microwave channels over snow and sea ice due to the difficulties in estimating the surface contribution in the radiative transfer (e.g., in terms of modelling the surface emissivity, skin temperature and reflection properties). One source of forward model error is that currently a specular reflection is assumed over snow, whereas in reality the reflection is more likely to be close to a diffuse (Lambertian) reflection (e.g., Guedj et al., 2010; Bormann et al., 2017). These aspects are evident in the large differences between the observations (o) and the short-range forecasts in radiance space (b) for the winter period, particularly over areas of sea ice and snow cover (not shown). Improving the use of satellite radiances over snow and sea ice is an area of ongoing research.
Since the quality control rejects observations where the absolute observation-minus-background (o − b) value is larger than a threshold, the greater mean differences between the observations from the temperature sounding microwave channels and the background lead to more of these observations being rejected in winter than in summer (see Figure 2c,d for AMSU-A channel 5 of MetOp-A, peaking around 500-850 hPa). For this channel 24% more data from the polar regions (north of 60°N) are used in the summer period than in the winter. Note that similarly strong o − b differences can also be seen over sea ice and snow for the microwave humidity sounding channels (not shown), which also leads to a higher rejection rate of the data in the quality control process. Improvements in the radiative transfer modelling, the skin temperature treatment or the representation of relevant physical processes may allow more of these observations from the Arctic region to be used.
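The rejection criterion described above can be sketched as follows. This is an illustration only: the brightness temperatures and the 2 K threshold are hypothetical, and how the operational threshold is derived is not specified here.

```python
def first_guess_check(observations, backgrounds, threshold):
    """Return one boolean per observation: True where it is accepted.

    An observation is rejected when the absolute observation-minus-background
    departure exceeds the threshold.
    """
    return [abs(o - b) <= threshold for o, b in zip(observations, backgrounds)]


# Hypothetical brightness temperatures (K) and their background equivalents.
obs = [250.1, 248.7, 255.9, 251.3]
bkg = [250.0, 249.0, 251.0, 251.0]
accepted = first_guess_check(obs, bkg, threshold=2.0)
used_fraction = sum(accepted) / len(accepted)
```

In this example the third observation, with a departure of about 4.9 K, is rejected. Larger systematic departures over snow and sea ice would push more observations past the threshold, which is the mechanism behind the lower winter data usage noted above.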
There are also difficulties in assimilating infrared observations in polar regions compared to other areas. As for the microwave observations, the surface emission is more difficult to model over sea ice, leading to higher biases for surface-sensitive channels. Furthermore, for infrared observations there can also be problems with cloud detection, which is more difficult in colder areas (e.g., Eresmaa, 2014). The cloud detection for infrared observations relies primarily on thermal contrast between the clouds and the surface. Over warm oceans clouds tend to be visible as a strong cold signal. However, in polar latitudes, if the cloud top has a similar temperature to the ocean there will be no contrast, making cloud detection difficult. Similarly, cloud detection relies on an accurate estimate of the model skin temperature, which can be problematic over sea ice, as highlighted above. These aspects make cloud detection in the infrared part of the spectrum more difficult in polar regions.

Experimental set-up
In order to study the impact of different observing systems in the Arctic, Observing System Experiments (OSEs) were carried out to remove different observation types over the polar regions. This allowed us to directly measure the impact of losing these observations from the full observing system, given the current usage of the data. OSEs were run by removing data at latitudes north of 60°N and south of 60°S for the following observing systems:
• all microwave temperature and humidity sounding observations, including those for all 16 instruments given in Table 2
This list represents all the atmospheric observing systems assimilated in the Arctic with the exception of scatterometer winds, which are not considered due to the small number of assimilated data in this region. Note that in the microwave OSE only the 183 GHz channels of SSMIS and GMI were removed, since these are the only channels assimilated north of 60°N; the assimilation of microwave imager channels at 18-90 GHz is restricted to latitudes between 60°N and 60°S. In these OSEs data were removed south of 60°S and north of 60°N so that the experiments could also be used to assess the impact of observations in the southern polar region in future work. We do not expect the removal of observations in the Antarctic to affect Arctic or Northern Hemisphere midlatitude forecasts, and indeed we found no evidence of this in maps showing the changes in forecast error.
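The data denial used in a polar OSE amounts to a latitude-based filter on one observation type. The sketch below illustrates the idea only and is not taken from the IFS; the record layout and the type labels are hypothetical.

```python
def deny_polar(observations, denied_type, lat_limit=60.0):
    """Remove one observation type poleward of lat_limit in both hemispheres.

    Each observation is a dict with the hypothetical keys 'type' and 'lat'
    (latitude in degrees, positive northwards).
    """
    return [
        obs
        for obs in observations
        if not (obs["type"] == denied_type and abs(obs["lat"]) > lat_limit)
    ]


sample_obs = [
    {"type": "microwave", "lat": 75.0},     # removed (north of 60N)
    {"type": "microwave", "lat": 45.0},     # kept (midlatitude)
    {"type": "microwave", "lat": -70.0},    # removed (south of 60S)
    {"type": "conventional", "lat": 80.0},  # kept (different type)
]
remaining = deny_polar(sample_obs, "microwave")
```

All other observation types, and the denied type equatorward of the limit, pass through unchanged, so the control and the denial experiment differ only in the polar data of one observing system.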
A set of global OSEs was also run for the main observation groups so that the impacts from the polar OSEs could be compared to global and hemispheric results. In this paper we present selected results from these experiments in order to put the Arctic OSEs into context. A more thorough analysis of the global OSEs is presented by Bormann et al. (2019).
Data assimilation experiments were run for the periods June-September 2016 and December 2017 to March 2018 using spectral triangular truncation of 399 with a cubic-octahedral grid (TCo399, ∼25 km horizontal resolution) (Malardel et al., 2016) and with 137 vertical levels. These experiments have the same vertical resolution, but a lower horizontal resolution than the operational high-resolution deterministic ECMWF analysis (TCo1279, ∼9 km). For each of the experiments, 10-day forecasts were started twice daily from the 0000 UTC and 1200 UTC analyses. Cycle 43R3 of the ECMWF IFS was used for the summer period, which is the same as the operational version at the time of the study, and cycle 45R1 was used for the winter period, which was operational from June 2018 to June 2019 (details of the ECMWF model cycles can be found online at www.ecmwf.int). The main differences in observational use between cycles 43R3 and 45R1 were the introduction of non-surface-sensitive infrared sounding channels over land and the accounting for horizontal drift in the assimilation of radiosondes. These changes led to forecast improvements, particularly for the introduction of infrared observations over land. However, they are unlikely to affect the conclusions of this study, since a comparison of global OSEs run here against those run in 2014, as described by McNally (2014), showed little change in terms of the relative impacts of the different observation types.
The winter experiments partially cover a special period with additional radiosonde launches in the Arctic, the Year of Polar Prediction (YOPP) Special Observing Period (www.polarprediction.net) in February-March 2018. During this time the number of radiosonde launches was doubled for selected Arctic land radiosonde stations. This led to a 30-35% increase in the number of radiosonde observations assimilated north of 60°N for February-March as compared to December-January.
Control experiments were run for both the summer and winter periods, assimilating all data globally. Changes in forecast error in the OSEs with respect to these controls were used to evaluate the impact of the various Arctic observing systems on forecast skill, at various lead times (up to 10 days) and throughout the atmosphere. Note that for computational reasons the EDA was not rerun for any of the OSEs; instead, a fixed EDA was used which included the full observing system. This means that background error covariances may have been underestimated in the denial experiments, further degrading their forecasts and thus leading to a potential overestimation of the observation impacts as compared to the control.

Verifying changes in forecast error
In the following, we evaluate the impact of each observing system in the Arctic by analysing the change in forecast error induced by removing the respective observing system from the NWP system. To do this, one has to compare the forecasts to a proxy for the truth, which is typically chosen to be either a verifying analysis or observations. Possible verifying analyses include the experiment's own analysis, the operational ECMWF high-resolution deterministic analysis or the analysis of the control experiment. Verifying shorter-range forecasts in particular is generally problematic because the changes in forecast errors are likely to be smaller than the analysis uncertainty (McNally, 2014). Short-range forecast errors (particularly for 12-hr forecasts) are also likely to be strongly correlated to analysis errors such that the analysis does not provide an independent verification. Because of this, it is usual to compare short-range (up to 12-hr) forecasts to observations, including both conventional and satellite data.
To investigate the effects of using different verification references, we calculated changes in the standard deviation of forecast error for different OSEs, normalised by the values for the control, using different verifying references including operational analyses, own analyses and radiosonde observations. Figure 3 illustrates these changes for geopotential height at 500 hPa (Z500) for the microwave summer OSE and the conventional winter OSE: the experiments with the largest forecast error increase for the summer and winter periods, respectively. It is evident that up to day 3 (day 2) for the summer microwave OSE (winter conventional OSE), the change in apparent forecast error depends strongly on the choice of the verifying reference. At later times, however, the change in forecast error becomes larger than the differences in the verifying analysis (or between the analyses and the observations) and the choice of verification no longer has an effect. Large values for the verification against operations are likely because the operational analysis and the control forecast share some errors as they use the same observing system, and verification against operations hence favours the control. In contrast, the low values for the radiosonde-based verification likely reflect larger errors in the verifying reference here (relative to the analysis) combined with selective geographical sampling. Problems with analysis-based verification of short-range forecasts are not confined to the Arctic region, as discussed for instance by Geer et al. (2010). In the following, we verify short-range (12-hr) forecasts against observations, for the previously stated reasons, and short- to medium-range (up to day 10) forecasts against ECMWF operational analyses, since this is the more independent of the analysis choices. However, for the latter we focus on forecast scores from day 2 onwards, where different verifications show broadly consistent results.
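The verification metric can be sketched as below. The exact form of the normalisation is an assumption here (the difference between experiment and control standard deviations, divided by the control value), a single common verifying reference is used for both experiments, and the Z500 values are hypothetical.

```python
from statistics import pstdev


def normalised_std_change(exp_forecasts, ctl_forecasts, reference):
    """Change in forecast-error standard deviation, relative to the control.

    Positive values mean that the experiment (e.g. a denial OSE) has larger
    forecast errors than the control when both are verified against the same
    reference (an analysis or observations).
    """
    exp_err = [f - r for f, r in zip(exp_forecasts, reference)]
    ctl_err = [f - r for f, r in zip(ctl_forecasts, reference)]
    std_ctl = pstdev(ctl_err)
    return (pstdev(exp_err) - std_ctl) / std_ctl


# Hypothetical Z500 values (m) at four verification points.
reference = [5500.0, 5520.0, 5480.0, 5510.0]
control = [5502.0, 5518.0, 5482.0, 5508.0]  # errors of +/- 2 m
denial = [5503.0, 5517.0, 5483.0, 5507.0]   # errors of +/- 3 m
change = normalised_std_change(denial, control, reference)
```

Swapping the `reference` list for a different verifying analysis or for observations changes the result, which is the sensitivity examined in Figure 3. Verification against each experiment's own analysis would require a separate reference per experiment, which this sketch does not cover.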

Short-range forecast impact in the Arctic
In order to evaluate the impact of different observing systems in the Arctic region, we first investigate the effect on short-range (up to 12-hr) forecasts in the region where those observations were removed, that is, north of 60°N. The short-range forecast impact of the OSEs can be evaluated against radiosonde data; the results are shown in Figure 4. This comparison is restricted to areas of radiosonde coverage, that is, at latitudes of 60-70°N (Figure 1c) and to data used in the assimilation. The use of radiosonde data as a verification tool makes it difficult to perform a strict comparison of the impact of the different OSEs, since this metric will likely favour the conventional OSE due to the selective spatial sampling. However, this verification does still provide an indication of the relative impact of different satellite observation types as well as an indication of the heights and variables affected by the conventional data.
Each observing system has a positive impact on short-range forecast accuracy in the Arctic region. Indeed, the loss of each individual observing system leads to a degradation in short-range forecast quality, as indicated by an increase in the standard deviation of observation minus forecast (o − f) values relative to the control (Figure 4). The conventional data have a strong impact on short-range forecasts, with positive impacts on all variables and most atmospheric heights, as one would expect. The largest impacts, relative to other observations, are on temperature in the winter, stratospheric wind (10-100 hPa) and lower tropospheric short-range forecasts (temperature, wind and humidity at 700-1,000 hPa). Conventional data also appear to have a stronger relative impact in winter, particularly for tropospheric temperature (300-1,000 hPa) and wind (250-300 hPa). The temperature and humidity impacts occur at similar heights in summer and winter, but the wind  Figure 5. It appears that radiosondes are responsible for the stratospheric impacts and for approximately half of the tropospheric impacts (e.g., Figure 5a), with the rest due to other conventional data. The observations from synoptic stations have a statistically significant impact on specific humidity and vector wind at 1,000 hPa, which is likely due to surface pressure observations since far fewer 10-m wind or 2-m relative humidity observations are assimilated.
Of the different satellite observation types, the microwave data have the highest overall impacts on temperature, wind and specific humidity, with particularly strong impacts on temperature and wind at 500 hPa and on humidity at 300-700 hPa. It is interesting to investigate whether these impacts are due to the temperature sounders, humidity sounders or a combination of the two. The microwave temperature sounders typically improve the temperature fields, which in turn leads to improvements in the geostrophic winds. Humidity sounders have sensitivity to humidity, temperature, cloud and precipitation, and can affect wind via the improvement of any of these fields. In recent years, the impact of microwave humidity sounders has grown, due to an increased number of observations and improvements in the assimilation of these data, such that the global impact of humidity-sensitive instruments is now equivalent to the impact of temperature sounders. Results of the polar OSEs indicate that both types of instrument have similar impacts on short-range forecasts in the Arctic, as they do globally. As shown in Figure 6, we obtain similar wind and temperature impacts from both the microwave temperature and humidity sounders in the Arctic. The wind impact from microwave temperature sounders is likely due to geostrophic balance effects in response to an improved temperature field, and for microwave humidity sounders it is likely due to a combination of effects such as 4D-Var tracing effects, as well as balance adjustments in response to temperature improvements.
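For reference, the link between temperature and wind invoked here can be written with the standard textbook geostrophic and thermal wind relations on pressure surfaces. These are general dynamical relations rather than a description of how the IFS imposes balance; the symbols f (Coriolis parameter), g (gravitational acceleration), R (gas constant for dry air) and p (pressure) are introduced only for this illustration.

```latex
% Geostrophic wind from the geopotential height Z on a pressure surface
u_g = -\frac{g}{f}\,\frac{\partial Z}{\partial y}, \qquad
v_g = \frac{g}{f}\,\frac{\partial Z}{\partial x}

% Thermal wind: vertical shear of the geostrophic wind set by the
% horizontal temperature gradient
\frac{\partial u_g}{\partial \ln p} = \frac{R}{f}\,\frac{\partial T}{\partial y}, \qquad
\frac{\partial v_g}{\partial \ln p} = -\frac{R}{f}\,\frac{\partial T}{\partial x}
```

A more accurate horizontal temperature gradient therefore constrains the vertical shear of the balanced wind, which is consistent with temperature sounders improving wind forecasts without observing wind directly.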
The infrared sounders improve the tropospheric temperature, humidity and wind short-range forecasts at similar atmospheric heights to the microwave observations (Figure 4). Compared to the microwave observations, the improvements are generally smaller, however, which is likely due to a lower temporal and geographical coverage (fewer instruments overall and no data in cloudy regions). There are also some differences between the impacts of infrared and microwave data due to the different sensitivities of the channels assimilated. The infrared data have a stronger effect on temperature at 850-1,000 hPa than the microwave data, particularly in the summer, and the microwave temperature sounding channels have a stronger effect higher in the atmosphere (5-100 hPa) than the assimilated infrared temperature sounding channels. It is interesting to note that the humidity impact of infrared observations is neutral in the winter period, whereas microwave observations still have an impact on humidity. There are ongoing investigations into improving the use of infrared water vapour channels at ECMWF, for example through assimilating additional channels with a higher humidity sensitivity, and assimilation in all-sky conditions. The positive impact from microwave humidity sounding channels in both winter and summer suggests that such improvements in the use of infrared water vapour channels could have benefits for the Arctic as well as for lower latitudes.
The main effect of GPSRO data is on temperature in the upper troposphere-lower stratosphere (UTLS), particularly for the summer period where GPSRO has the highest impact (Figure 4). This impact is to be expected, since these data are most sensitive at this height (e.g., Collard and Healy, 2003). AMVs have the smallest impact of any observation type in the Arctic. However, there are still statistically significant impacts on short-range forecast skill, with small improvements to temperature and wind speed around 500-700 hPa due to AMVs. When AMVs were first assimilated at ECMWF in the polar regions the impact was much greater (Bormann and Thépaut, 2004), so this smaller impact is likely a reflection of the increased usage of other observation types, particularly microwave humidity and temperature sounders which also act to improve wind. Indeed, AMV winds now provide only 0.7-1.1% of all assimilated observations in the Arctic, and so a lower impact is to be expected.

Medium-range forecast impact in the Arctic summer
We will now assess the impact of Arctic observations on forecast accuracy for days 2-10 over the Arctic region (north of 60°N). This will be presented in terms of the change in the standard deviation of forecast error, as verified against the ECMWF high-resolution operational analysis, with respect to the control experiment. This metric was chosen because in data assimilation we aim to use observations primarily to correct random forecast errors. Results for the root mean square forecast error are similar, however, as random forecast errors dominate statistics in the medium range.
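The relation between the two metrics can be made explicit: the mean square error is the sum of the squared mean error and the error variance, so the two measures converge when the mean error is small. The following is a minimal sketch of this decomposition. The function name and the sample values are illustrative assumptions and are not taken from the experiments.

```python
import math
import statistics


def decompose_forecast_error(errors):
    """Return (mean error, standard deviation, RMS error) for a sample of forecast errors."""
    mean_error = statistics.fmean(errors)
    # Population standard deviation: spread of the errors about their mean.
    std_error = statistics.pstdev(errors)
    rms_error = math.sqrt(statistics.fmean(e * e for e in errors))
    return mean_error, std_error, rms_error


# Hypothetical forecast-minus-analysis differences (arbitrary units).
sample_errors = [1.2, -0.8, 0.5, -1.5, 0.9, -0.1]
mean_error, std_error, rms_error = decompose_forecast_error(sample_errors)
# rms_error**2 equals mean_error**2 + std_error**2 up to rounding.
```

When the mean error is close to zero, as assumed here for the medium range, the RMS error and the standard deviation are nearly identical.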
The normalised change in the standard deviation of forecast error is shown for the summer period in Figure 7 for temperature at 100 hPa, geopotential height at 500 hPa (Z500), vector wind at 500 hPa and temperature at 1,000 hPa. Positive values indicate an increase in forecast error when the observations are removed, meaning that observations act to improve forecasts. Note that we look at Z500 in particular because it is a good measure of the performance in terms of large-scale circulation. Error bars indicate the 95% confidence intervals for the Student's t-test, with the Šidák correction (assuming 20 independent tests) and inflation for temporal correlation of errors applied, following the recommendations of Geer (2016). It is worth bearing in mind when analysing these results that a 4-month period may not be long enough to test very small impacts with statistical significance. A statistically neutral impact could therefore indicate either a genuinely neutral impact or that the change in forecast error is too small to achieve statistical significance over a 4-month period. Results indicate statistically significant increases in medium-range forecast error relative to the control from the microwave, conventional and infrared data in the troposphere, suggesting that, as for the short-range forecast impacts, these are the leading observing systems for tropospheric weather forecasts in the Arctic summer. The largest changes in forecast error are due to the microwave and conventional data, which have statistically significant impacts up to day 6 and day 4, respectively. For example, the losses of microwave and conventional data lead, respectively, to a 7% and 5% increase in the Z500 normalised standard deviation of forecast error on day 4 in the Arctic (Figure 7). The tropospheric impacts of AMVs and GPSRO are generally not statistically significant in the medium range, but GPSRO does have a statistically significant impact on UTLS temperature up to days 2-3.
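As an illustration of the two ingredients of this metric, the sketch below computes a normalised change in the standard deviation of forecast error and the per-test significance level implied by a Šidák correction. It assumes that the normalised change is the fractional difference relative to the control, which is consistent with the percentages quoted above but is not spelled out here. The function names and input values are hypothetical, and the inflation for temporal correlation and the t-test itself are omitted.

```python
import statistics


def normalised_std_change(denial_errors, control_errors):
    """Fractional change in the standard deviation of forecast error relative to the control."""
    sd_denial = statistics.pstdev(denial_errors)
    sd_control = statistics.pstdev(control_errors)
    return (sd_denial - sd_control) / sd_control


def sidak_level(family_alpha, n_tests):
    """Per-test significance level that keeps the family-wise level at family_alpha."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / n_tests)


# Hypothetical forecast errors for a control and a denial experiment.
control = [1.0, -1.0, 1.0, -1.0]
denial = [1.1, -1.1, 1.1, -1.1]
# A positive value means the error grows when the observations are withheld.
change = normalised_std_change(denial, control)

# 95% family-wise confidence with 20 tests assumed independent.
per_test_alpha = sidak_level(0.05, 20)
```

With 20 tests the per-test level is roughly 0.0026 rather than 0.05, which is why small impacts can fail to reach significance over a 4-month sample.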
Overall the impacts at different atmospheric levels are consistent with the short-range forecast impacts (Figure 5), with positive impacts in the troposphere and stratosphere from microwave and conventional observations, tropospheric impacts from infrared observations and UTLS temperature impacts from GPSRO.
The strongly positive impact of satellite data, particularly microwave observations, is somewhat similar to the impact of satellite data in the Southern Hemisphere in both seasons, shown for the global OSEs in Figure 8a,c. This suggests that we make good use of satellite observations over the Arctic in summer. The relative impact of conventional data in the Arctic summer appears somewhat higher than the relative impact in the Southern Hemisphere summer (December 2017 to February 2018; Figure 8c) and lower than in the Northern Hemisphere summer (Figure 8b). This is consistent with a higher level of coverage in the Arctic north of 60°N than for the Southern Hemisphere, but a lower coverage than for the Northern Hemisphere as a whole (particularly in comparison to polar-orbiting satellite data).
Results of the microwave temperature and humidity sounder OSEs indicate that roughly half of the tropospheric microwave impacts are due to the temperature sounders and half to the humidity sounders, as shown in Figure 9a. This is consistent with the short-range forecast impacts, shown in section 4.3, which indicate that these instruments have very similar impacts on tropospheric temperature and wind.
The impact of conventional observations was also investigated with OSEs for different observation types, including for all radiosonde data, aircraft data, synoptic (land and ship) stations and drifting buoys. The results shown in Figure 9b are more difficult to interpret since no individual observing system has statistical significance in the medium range beyond day 2. It seems therefore that the medium-range impact is due to a combination of all observation types, with a suggestion that the radiosonde and near-surface observations are the most significant.

Medium-range forecast impact in winter
Results of the polar OSEs run for the winter season (December 2017 to March 2018) show that, as for the summer period, the main tropospheric medium-range forecast impacts are due to microwave, conventional and infrared data, as illustrated in Figure 10. However, in contrast to the summer period, the conventional data appear to have greater impacts than microwave data, particularly for low-level temperatures (1,000 hPa) where this difference is statistically significant at day 2. This is similar to the results of the short-range forecast, where the relative impacts of conventional observations on temperature are greater in winter than in summer, as shown in Figure 4.
The increase in the relative impacts of conventional data compared to microwave data is likely due mainly to the difficulties associated with assimilating microwave data over snow and sea ice, as discussed in section 3. In particular, there are known sources of error in the estimation of the surface contribution in the forward model, reflected in an increase in mean observation-minus-forecast values for tropospheric channels over snow and sea ice. Improvements in the treatment of surface emission in the forward model should lead to improved usage of these data in the Arctic winter. This will be discussed further in section 6.
While the reduced impact of microwave data is most likely the dominant cause of the summer-winter difference, there may also be an increase in the importance of conventional data in the winter season. This is supported by results from the global OSEs over the Southern Hemisphere, which also show a greater relative impact from conventional data for the Southern Hemisphere winter season (e.g., Figure 8), albeit still with a higher overall impact from microwave data. The increased importance of the conventional data in winter may be related to radiosondes introducing vertical structure into the temperature fields, which is likely to be particularly important in winter due to the prevalence of inversions (Overland et al., 1997; Uttal et al., 2002; Serreze and Barry, 2005), which may not be well captured by the forecast model. This idea is supported by the fact that the conventional OSEs show changes in the mean temperature in the analysis, with a strong vertical structure for both the summer and winter seasons (see section 4.7). The impacts of infrared observations, GPSRO and AMVs are overall very similar in winter to those in summer, with a positive impact in the troposphere from infrared data in particular, and on UTLS temperature for GPSRO. The latter GPSRO impact is somewhat higher in the atmosphere in the winter, however, at around 100 hPa compared to 100-200 hPa for the summer season. This is consistent with the short-range forecast fits to radiosonde data which also show this for the GPSRO OSE.

FIGURE 7 Normalised change in the standard deviation of forecast error for the summer Arctic OSEs, shown for (a) temperature at 100 hPa, (b) geopotential height at 500 hPa, (c) vector wind at 500 hPa and (d) temperature at 1,000 hPa over the Arctic region (north of 60°N). Note that forecast errors are verified against the ECMWF (high-resolution) operational analysis. Values are given as fractions

Connection to the midlatitudes
Thus far we have examined the impacts of Arctic observations on medium-range forecasts north of 60°N, but a degradation in the Arctic short-range forecasts could also impact the quality of midlatitude forecasts at later times. Results from the polar OSEs show that there are impacts on the midlatitudes from around day 2 onwards in both seasons, with statistically significant impacts for the microwave summer OSE and conventional winter OSE. This can be seen in the form of normalised changes in the standard deviation of forecast error as averaged over the Northern Hemisphere midlatitudes (20-60°N, not shown) as well as in maps of the normalised changes in the standard deviation of forecast error, as shown in Figure 11.
As Figure 11a-c shows, the loss of Arctic microwave observations affects the midlatitudes in the summer particularly over North America and the Atlantic Ocean, with the latter impact extending towards the Iberian Peninsula and North Africa at around day 5. There is also an impact on the midlatitude forecast skill for the conventional winter OSE (Figure 11d,e), particularly over Eurasia and, to a lesser extent, North America. These areas of degradation in the midlatitudes for the Arctic winter are consistent with the findings of Jung et al. (2014), who showed that in the winter period an improvement in skill over the Arctic should lead to improved forecasts in the midlatitudes over North America and Eurasia. The midlatitude summer and winter impacts shown here are likely to be regime-dependent and will be explored in further work. Indeed, an analysis for the winter season has already been carried out by Day et al. (2019) and results indicate a link to Scandinavian blocking for the winter period.
As well as forecast errors in the Arctic affecting midlatitudes, the reverse can also be true, with forecast errors in the midlatitudes affecting the accuracy of weather forecasts in the Arctic. To investigate this, we compared the forecast skill north of 60°N for the polar OSEs with that of the global OSEs (Figure 12). This comparison indicates whether the medium-range forecast impact in the Arctic comes from the Arctic observations alone (polar OSEs) or from a combination of Arctic and midlatitude observations (if there is a difference between the global OSE and polar OSE impacts). In the summer season, for both microwave and conventional OSEs, results are similar for the global and polar experiments, suggesting that conditions in the midlatitudes do not greatly affect the Arctic in the summer. However, in winter there appears to be a quite substantial impact in the Arctic from the midlatitude observations. This is consistent with the results of Bauer et al. (2014), who showed a high EDA spread in the Atlantic in winter, with large uncertainties in the forecasts linked to storms. This result indicates that the Arctic is more affected by Northern Hemisphere midlatitude initial conditions for the winter period than for the summer period.

FIGURE 10 As Figure 7, but for the winter OSE

Impact on the mean temperature analysis in the Arctic
In addition to an increased normalised standard deviation of forecast error, removing observations in the Arctic was found to impact the mean analysis values. These changes can provide us with an estimate of the uncertainty in the mean analysis state and can also indicate areas where there are forecast model biases. The latter can be difficult to identify, however, since observations, observation operators and forecast models all potentially have systematic errors. Results from the global OSEs show that all observations affect the mean analysis in the midlatitudes and in the Tropics, as well as in the polar regions, and the magnitude of the changes in the Arctic is generally similar to that in sub-polar regions. These global effects are presented in detail by Bormann et al. (2019); here we focus only on features that are specific to the Arctic, namely mean changes in the tropospheric temperature.
The conventional data were found to introduce mean temperature changes with a strong vertical structure in summer and in winter, as shown in Figure 13. Conventional observations also introduce a global change in the geopotential height which is strongest over the Arctic for the summer season (not shown). The impact of conventional data on the mean analysis was further explored by investigating whether this was due to a direct effect or to the role of these observations as an anchor for VarBC. Additional conventional polar OSEs were run for both summer and winter seasons, with VarBC replaced by a fixed bias correction taken from the control experiment at each cycle. This led to no significant changes in the mean analysis (or forecast skill) compared to the conventional polar OSE for either season (results not shown). This suggests that when conventional data are removed in the Arctic there is still sufficient anchoring from other radiosonde observations in the midlatitudes, as well as from GPSRO, for the bias correction of other data. The impact of the conventional data on the mean analysis in the troposphere is therefore a direct effect, rather than a result of anchoring VarBC for other observations.
The effect of conventional data on temperature in the Arctic is particularly striking for the winter season, and could be an indication of systematic errors in the forecast model. Indeed, the mean changes for winter (Figure 13b) are consistent with temperature biases that are known to occur due to snow over sea ice not being represented by the forecast model. This lack of snow leads to a skin temperature that is too warm in winter and atmospheric temperature forecasts that are too warm near the surface, with a compensating cold bias above it (S. Keeley, personal communication, 2019). The reverse of this effect can be observed in results from the conventional OSEs, both in the atmospheric analysis (Figure 13b) and with a subsequent increase in skin temperature forecasts (by <0.5 K) up to day 2. Conventional observations therefore appear to correct systematic forecast model biases in temperature for the winter season. However, a long-term solution would be to improve the forecast model by adding representations of snow over sea ice. Furthermore, the impact of the mean changes in temperature reduces with forecast lead time, and the medium-range forecasts are dominated by changes in random error (from day 2 onwards).

This box average acts to smooth the data and highlights the important features. Statistical significance is indicated by the hatchings and is calculated as the 95% confidence interval following the Student's t-test method, with a Šidák correction applied (assuming 320 independent tests from 64 boxes × 5 experiments). The 22.5 × 22.5 degree box size was chosen since this gave forecast errors that are uncorrelated between neighbouring boxes (and independent testing can then be assumed).
Mean changes in the tropospheric temperature analysis were also observed for the microwave and infrared OSEs, with a cooling of around 0.1-0.2 K for the infrared OSEs at 700-1,000 hPa in both summer and winter, and warming and cooling effects of 0.1-0.2 K for the microwave OSEs in the summer and winter seasons, respectively, at similar heights (not shown). These bias changes are likely linked to errors in the cloud detection or in the forward model. Again, the changes are small compared to the random forecast errors that dominate beyond day 2.

FORECAST SENSITIVITY TO OBSERVATION IMPACT (FSOI) DIAGNOSTICS
Another means of quantifying the observation impact is to use Forecast Sensitivity to Observation Impact (FSOI) diagnostics, which estimate the relative impacts of each observation type on short-range (24-hr) forecasts. The FSOI values were calculated for the control experiments following Cardinali (2009). The FSOI is the product of three terms: the background departures in observation space, the transpose of the Kalman gain matrix (dependent on observation and background errors and the observation forward model) and a measure of 24-hr forecast error sensitivity to the initial analysis state. Forecast errors are calculated using the analysis as a reference, and the measure of forecast error used is the dry energy norm (e.g., Rabier et al., 1996). Note that the FSOI can also be calculated using observations as a reference (e.g., Cardinali, 2018) and this is an area of active research. FSOI values are summed over all observations assimilated for a given observation type, and normalised by the total global FSOI. The estimated observation impact by design depends on the background departures, the assigned observation errors, the number of observations assimilated and whether the observations are assimilated in regions of high forecast error growth (active regions).
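The structure of this product, and of the subsequent summation and normalisation, can be illustrated with a toy example. The sketch below is not the operational calculation: the gain matrix is simply prescribed rather than derived from background and observation error covariances, and all names and numbers (`departures`, `gain`, `sensitivity`, the observation types) are hypothetical. Each observation's contribution is taken as its departure multiplied by the corresponding element of the transposed gain applied to the sensitivity vector.

```python
def fsoi_per_observation(departures, gain, sensitivity):
    """Per-observation impact: d_j * (K^T g)_j for a gain K of shape (n_state, n_obs)."""
    n_state = len(sensitivity)
    n_obs = len(departures)
    # Map the forecast-error sensitivity from state space into observation space.
    kt_g = [sum(gain[i][j] * sensitivity[i] for i in range(n_state)) for j in range(n_obs)]
    return [d * s for d, s in zip(departures, kt_g)]


def relative_fsoi_by_type(impacts, obs_types):
    """Sum impacts per observation type and express each as a percentage of the total."""
    total = sum(impacts)
    by_type = {}
    for value, obs_type in zip(impacts, obs_types):
        by_type[obs_type] = by_type.get(obs_type, 0.0) + value
    return {obs_type: 100.0 * value / total for obs_type, value in by_type.items()}


# Toy system: 2 state variables, 3 observations (all values are made up).
departures = [1.0, 2.0, 0.5]
gain = [[0.5, 0.0, 0.2],
        [0.0, 0.25, 0.1]]
sensitivity = [-1.0, -2.0]
obs_types = ["microwave", "conventional", "microwave"]

impacts = fsoi_per_observation(departures, gain, sensitivity)
relative = relative_fsoi_by_type(impacts, obs_types)
```

Because the relative values are shares of the total, they do not depend on the sign convention chosen for the forecast error change, provided all contributions have the same sign as in this toy case.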
There are some key differences between using FSOI values and OSEs to measure observation impacts. FSOI values estimate the impact of each observation in the context of the full observing system, whereas OSEs indicate the impact of losing an observation type from the system. However, FSOI values allow us to easily identify which subsets of observations/channels have the most impact, as well as geographical areas of high impact, information that could not easily be obtained using OSEs. The two techniques can therefore be considered complementary. Furthermore, FSOI results are usually in reasonable agreement with results from OSEs in extratropical regions (Gelaro and Zhu, 2009). The relative FSOI values shown in Figure 14 for various observation types assimilated in the Arctic can be interpreted as follows. A relative FSOI value of 3% for microwave Arctic observations indicates, for example, that the microwave observations assimilated in the Arctic contribute 3% of the overall reduction in total global forecast error as measured by the dry energy norm used in these FSOI calculations. Note that together all observations in the Arctic (north of 60°N) contribute approximately 7% of the overall reduction in global forecast error (7.3% in summer and 7.5% in winter), compared to approximately 30% each for observations in the Northern or Southern Hemisphere midlatitudes (20-60°N or 20-60°S).
The FSOI values indicate that the three observation types with the greatest impact in the Arctic are the microwave, conventional and infrared observations. The microwave observations have the greatest impact in the summer period and the conventional observations in the winter period. The FSOI results are thus broadly consistent with results found in the OSE for both the short (Figure 4) and medium range (Figures 7 and 10). There are some differences between the results from the OSEs and FSOI values, however. For example, the FSOI values suggest that GPSRO and AMVs have similar impacts to each other, and the relative impacts of infrared data in summer appear higher in FSOI compared to, for example, changes in the normalised standard deviation of forecast error at day 4 (Figure 7). These differences are not surprising, since FSOI is a measure of short-range, not medium-range, forecast impact, and the use of the dry energy norm would favour tropospheric impacts leading to an underestimation of the (UTLS) impact of GPSRO. However, it is interesting to note these differences, since if FSOI values were used as a replacement for OSEs they would not give entirely the same message.
Maps of the cumulative FSOI values per grid point, normalised by the total global FSOI value, indicate a greater impact over land (particularly over Siberia) and sea ice for AMSU-A and MHS instruments in summer as compared to winter (e.g., as shown in Figure 15). This further supports the hypothesis that the reduced impact from microwave observations in winter is due to a reduced impact from areas of snow and sea ice. There is also a small increase in FSOI values for conventional data in the winter period, particularly over Iceland and Scandinavia, possibly indicating increased importance of conventional observations in these areas in winter. The FSOI values also indicate a higher impact of all observations in the Northern Hemisphere midlatitudes over the Pacific and Atlantic Oceans in winter, likely due to greater forecast error growth in this region during the winter season. Other features of note include the high impact of AMSU-A data over Greenland in both the summer and the winter, which can be linked to a high standard deviation of observation-minus-forecast values for channel 6 AMSU-A in this area, and a high impact of conventional data over the Pacific coast of North America due to the large number of aircraft data in this region. Note that FSOI values with a small magnitude in Figure 15c,d are due to the small number of (aircraft) observations being assimilated in this area during the 4-month period.

DISCUSSION AND CONCLUSIONS
In this paper we have investigated the usage of different Arctic observation types in the ECMWF NWP system and the impact of assimilating them on the quality of short- and medium-range forecasts. Our investigation relied on comprehensive numerical experimentation consisting of both OSEs and FSOI diagnostics. The results demonstrate the importance of both satellite and conventional data in the Arctic region in determining the initial conditions used for NWP. As NWP systems are also used to produce modern reanalyses such as ERA5 (Hersbach et al., 2018), improvements in the use of data for NWP will also lead to improved reanalyses. All of the large observation groups considered (conventional data, microwave data, infrared data, GPSRO and AMVs) were found to improve short-range forecasts in the Arctic. This demonstrates the complementarity of the different observing systems, with tropospheric and stratospheric impacts from microwave and conventional data, tropospheric impacts (particularly on low-level temperatures) from infrared observations, UTLS temperature and wind impacts from GPSRO, and low-level temperature and wind impacts from AMVs. The greatest short-range tropospheric impacts in the Arctic were found to be due to microwave, conventional and infrared observations, as shown by the analysis of OSE results and by FSOI statistics. During summer these impacts were greatest for the microwave observations, while during winter they were greater for conventional observations. During summer, removing the microwave or conventional data also led to statistically significant forecast degradations in the Arctic region at days 4-6 in the troposphere, with the greatest impacts due to the microwave data (Figure 7).

Figure 14, such that positive values indicate that the observations act to reduce forecast error. The summation of values shown here for latitudes north of 60°N (indicated as a solid black line) would give the same values as shown in Figure 14
Unlike for the microwave and conventional observations, the impact of infrared data does not persist into the medium range beyond day 2 (Figure 7). During winter, the conventional data have the greatest impacts over the Arctic region at days 4-6, but the impacts are smaller and differences between the impacts of the various observing systems are not statistically significant (Figure 10).
The microwave and conventional observations were found to be important for midlatitude forecasts in the medium range, with impacts particularly over North America and the Atlantic (stretching down to the Iberian Peninsula and North Africa at day 5) in summer and over North America and Eurasia in winter. These are likely to be regime-dependent; this has been explored in more detail for the winter season by Day et al. (2019). By comparing results from Arctic OSEs to global OSEs we were also able to analyse the impacts of midlatitude observations (south of 60°N) on medium-range weather forecasts in the Arctic (north of 60°N). The medium-range impacts were found to be substantial in the winter season (for both conventional and microwave OSEs) but negligible in the summer season. This suggests that medium-range weather forecasts in the Arctic are mainly influenced by Arctic observations in the summer, but that during winter they are also strongly influenced by midlatitude observations.
The strong positive impact of conventional data in the Arctic shows the importance of these observations, which are more costly to obtain than in sub-polar regions, particularly during the winter season. Investments in the usage of these data, such as extending the conventional observational network, for example through increased geographical or temporal coverage, would likely be useful for weather forecasting and for future reanalyses. During 2018 and 2019 there were periods of increased frequency of radiosonde observations in the Arctic during the YOPP Special Observing Periods. Initial investigations regarding the February-March 2018 Special Observing Period, through an OSE in which these additional observations were removed, suggest some benefits for short-range forecasts, whereas the medium-range impact is more marginal (not shown). Evaluation of this impact is ongoing.
The differences in the relative impacts of Arctic observations for the summer and winter seasons are striking, particularly for the microwave data, and there are likely to be several reasons for this. A major factor is likely to be the suboptimal assimilation of microwave sounding data (e.g., AMSU-A, MHS, ATMS) over snow and sea ice. The strong positive impact of microwave data in the summer season demonstrates the potential of these observations. Improving their use over snow and sea ice is therefore likely to prove very beneficial for forecasts in the Arctic and midlatitudes. Improving the use of microwave data would also benefit future ECMWF reanalyses for time periods as far back as 1979, when the first microwave sounding instrument was launched. One key area of development would be to improve the forward model over snow and sea ice, where there are known problems with the current treatment of surface emission and reflection. For example, a Lambertian reflection assumption could be used over snow instead of the currently assumed specular reflection. Work on this is ongoing, and initial results indicate a decrease in biases in observation-minus-forecast values of surface-sensitive microwave channels when a Lambertian assumption is used. A longer-term goal would be to link the snow emission and reflection to a radiative transfer forward model which could accurately represent all of the relevant radiative processes within the snow pack (absorption, emission and scattering). This would need to be combined with a multilayer snow model representing all of the physical processes. The use of a microwave emission model could also lead to additional benefits, since microwave observations over snow contain information about the snow pack which is currently not being exploited. A more physical representation of the surface emission could lead to improved characterisation of the snow properties using the microwave data.
Other aspects of the NWP system will contribute to observation impacts, including the quality of the forecast model and the specification of background error covariances in the Arctic. The assumed background error covariances are important for the usage of all observations, since they determine the weight given to them in the analysis. They also play a key role in the assimilation of satellite radiances, since they affect the vertical structure of the increments as well as the separation of radiance signals into different geophysical variables. This is particularly important for cases where the vertical structure may not be well captured by the background. Indeed, recent investigations suggest that the EDA used to specify background error covariances is underdispersive in the lower troposphere in the Arctic winter. This could result in the suboptimal assimilation of observations and should therefore be investigated further. In addition, known forecast model biases, for example relating to surface temperatures over snow, will hinder the effective use of observations from the Arctic winter. Currently, the snow state is initialised in the land surface data assimilation system through the assimilation of the NOAA/NESDIS IMS snow cover product along with in situ snow depth observations. However, the forecast model lacks some complexity for snow, modelling it as a single layer and with no explicit representation of snow over sea ice. Improvements in the representation of snow in the forecast model are expected in the future, however, such as through the ongoing development of a multilayer snow model.