Implementation of an Adaptive Bias‐Aware Extended Kalman Filter for Sea‐Ice Data Assimilation in the HARMONIE‐AROME Numerical Weather Prediction System

Sea ice surface temperature is an important variable for short‐range numerical weather prediction systems operating in the Arctic. However, when provided by numerical sea ice models, this variable is seldomly constrained by the observations, thus introducing errors and biases in the simulated near‐surface atmospheric fields. In the present study a new sea ice data assimilation framework is introduced in the HARMONIE‐AROME numerical weather prediction system to assimilate satellite sea ice surface temperature products. The impact of the new data assimilation procedure on the model forecast is assessed through a series of model experiments and validated against sea ice satellite products and in‐situ land observations. The validation results showed that using sea ice data assimilation reduces the analyzed and forecasted ice surface temperature root mean square error (RMSE) by 0.4 °C on average. This positive impact is still traceable after 3 h of model forecast. It also reduces the 2 m temperature RMSE on average by 0.2 °C at the analysis time with effects persisting for up to 24 h forecast over the Svalbard and Franz Josef Land archipelagos. As for the 2 m specific humidity and 10 m wind speed, no effect was observed. Possible impact on the upper‐air fields was assessed by comparing the model forecast against the radiosonde soundings launched from Spitsbergen, with no clear improvement found. Implications of using a coupled surface‐atmosphere data assimilation technique in HARMONIE‐AROME are discussed.

and ice compactness. Numerical weather prediction systems, on the other hand, often treat the ocean surface and sea ice merely as the lower boundary condition for the atmosphere. In this case surface properties of the ice cover, such as ice surface temperature and roughness, become highly important because they define the interface for energy exchange between the atmosphere and underlying medium.
Although the fully coupled atmosphere-ocean-sea ice numerical weather prediction systems are in active development, and becoming more common in operational applications (see, e.g., Browne et al., 2019;Guiavarc'h et al., 2019;Meixner et al., 2020), the noncoupled NWP systems are still widely utilized. Compared to a fully coupled system that resolves evolution of ocean and sea ice explicitly, noncoupled NWP systems often use simplified parameterization schemes for representing these components. While obviously lacking in the complexity of resolved processes, noncoupled NWP systems have some advantages that make them especially suitable for routine weather forecasting. These systems are less computationally expensive than coupled systems which allows higher spatial resolution or increased ensemble size. Also, a noncoupled system has fewer degrees of freedom than a coupled one making the model tuning less difficult. Since a noncoupled system can not represent all the complexity of physical processes a coupled system is capable of, it usually relies on external products and observational data sets for accurate representation of the sea ice state. This is often done by using a data assimilation procedure.
In operational ocean modeling systems the most common sea ice observations used for data assimilation are ice concentration satellite products (Cummings, 2005;Lea et al., 2015;Waters et al., 2015). Other observational products are also applied in these systems (e.g., sea ice drift in the TOPAZ4 ocean data assimilation system, see Sakov et al., 2012), though not as universally as ice concentration. Assimilation of other parameters, such as sea ice thickness or snow depth over sea ice, is assessed by a number of studies (e.g., Fritzner et al., 2019;Xie et al., 2018) with promising results. However, ice compactness in an NWP system is often provided as an external ice concentration field and stays constant during the model integration. Having precise ice thickness values is also less important because simple ice models applied in NWP usually resolve only thermodynamic processes, and snow cover of any considerable depth plays a more important role than ice thickness in the evolution of the surface energy budget (Batrak & Müller, 2019). In contrast, assimilation of the ice temperature observations could help by improving the accuracy of the lower boundary condition as well as aiding the atmospheric data assimilation which often relies on the quality of surface temperature in observation operators (e.g., assimilation of satellite radiances as discussed by Lawrence, Goddard, et al., 2019).
In the HARMONIE-AROME numerical weather prediction system (Bengtsson et al., 2017) sea ice is parameterized by a one-dimensional scheme SICE (Batrak et al., 2018). SICE is a multilayer thermodynamic sea ice model which represents processes of ice growth and melting and includes an adapted version of the land-surface snow module (ISBA-ES, Boone, 2000) to simulate the snow layer. Ice-covered areas are defined by means of an external ice concentration field which is updated at each surface analysis but remains constant during the model integration. In operational applications of HARMONIE-AROME (see, e.g., Müller et al., 2017) SICE is not constrained by the observations. The sea ice state is steered by the atmospheric model and freely evolves. This limitation, combined with missing representation of the ice dynamics, causes noticeable misrepresentations in the ice surface temperature field. However, the use of ice surface temperature products, which are seldomly applied for sea ice data assimilation in operational numerical weather prediction, could improve the initial state of the model and benefit the model forecast of near-surface atmospheric variables.
This study presents a data assimilation procedure based on the extended Kalman filter to assimilate a near real time ice surface temperature product in the sea ice scheme of HARMONIE-AROME. The new assimilation scheme is tested in an operational-like environment over a small experimental model domain containing a considerable amount of sea ice. The model simulations are cross-validated using an independent satellite ice surface temperature product and observations from the SYNOP stations located over the Svalbard and Franz Josef Land archipelagos. Effects induced by the developed assimilation procedure on the upper air model variables are assessed by comparing the model forecasts against the radiosonde soundings. The paper is organized as follows: Section 2 describes the data assimilation algorithm and utilized observations. Section 3 provides information about the numerical experiments and discusses the obtained results. In the final section a summary of obtained results is given and possible future developments are discussed.

Description of the Sea Ice Data Assimilation Scheme
The EKF algorithm within HARMONIE-AROME is implemented as a part of the surface parameterization package SURFEX (Masson et al., 2013). Surface data assimilation in HARMONIE-AROME is performed at each assimilation cycle (see the schematic on Figure 1a) by SURFEX, which provides a data assimilation framework allowing a choice between optimal interpolation (OI), simplified extended Kalman filter, and ensemble Kalman filter (see, e.g., Fairbairn et al., 2015;Mahfouf et al., 2009). Since this framework was developed for data assimilation over land surfaces, it can not be easily adapted for use in other parameterization schemes. Therefore, in the present work, a dedicated data assimilation procedure is introduced for sea ice data assimilation.
The data assimilation procedure for the SICE scheme is based on the discrete extended Kalman filter (EKF), which is a generalization of the classic algorithm by Kalman (1960) to nonlinear systems. The EKF is applied iteratively and, adapting the notation by Ide et al. (1997), defined in the following way: where b x is the model background state vector; a x is the analysis vector; y is the observation vector;  is the nonlinear model operator;  is the nonlinear observation operator; M is the linearized model operator matrix; H is the linearized observation operator matrix; B is the background error covariance matrix; Q is the model error covariance matrix; R is the observation error covariance matrix; K is the Kalman gain matrix; A is the a-posteriori background error covariance matrix, note the form of the matrix A formulation owing to nonoptimal Kalman gain K in the EKF; I is the identity matrix; and the − superscript mark indicates the values from the previous analysis.
Original formulations of the extended Kalman filter provided by the Equation 1 do not restrict the range of values which the state vector could take after analysis. However, in case of sea ice temperature or ice concentration, which are naturally bounded, applying the EKF could result in unphysical states in the ice model, such as negative ice concentration or ice temperature higher than melting temperature. The framework introduced in the present study allows for variable transformation from bounded to continuous space to avoid this situation.
In the present study variable transformation is introduced adopting the following notation: : : where Ξ is a vector in the original space; f Ξ is a vector in the transformed space; f is the direct transform function; and 1 f is the corresponding inverse transform function. Then, the nonlinear model and observation operators, and the corresponding Jacobian matrices, calculated for a transformed-space state vector 0 f x , could be written in the following form: where   is the nonlinear model operator in the transformed space;   is the nonlinear observation operator in the transformed space;  is the function composition operator; D is the differentiation operator; and f  denotes the possible transformation of the observed values. In the present application f  is applied only to the ice concentration observations.
For each variable the transformation functions f and 1 f are defined using the following logarithmic transform: where a and b are the lower and the upper bounds of the corresponding variable, respectively.

The Bias-Aware EKF
The optimality of Kalman filtering relies on the assumed statistical properties of the state and observation vectors. The random error components of the variables in these vectors are assumed to be zero-mean and follow the normal distribution. Although the normality requirement often holds to some extent, in case of NWP applications both the model and observational errors are subjected to having not only random but also systematic components-biases. In the presence of systematic errors, the performance of the standard or "bias-blind" EKF is degraded, with the filter unable to eliminate the model bias. In cases when considerable systematic errors are observed in the system, bias-aware filtering is applied, and biases are represented explicitly in the filter.
When applying the SICE scheme, HARMONIE-AROME tends to have a considerable positive bias of the ice surface temperature if snow cover over sea ice is misrepresented. Assuming that systematic component of the observational errors is small compared to the model errors, the model bias could be explicitly estimated using the bias-aware EKF formulations suggested by Dee (2005); Dee and Da Silva (1998). In this approach the classic EKF algorithm is extended by adding the model bias estimation step: where  K is the Kalman gain for model bias;  B is the background error covariance matrix for the model bias; b Δ is the model bias background state vector; a Δ is the model bias analysis vector. Thus, in bias-aware EKF analysis becomes a two-step procedure. At the first step, an updated estimate of the model bias for each control variable is obtained by means of Equation 9. Then, the estimated model bias is applied for bias correction of the state vector in the EKF analysis equation (Equation 10). The background error covariance matrix  B is usually not known and should be represented by an error covariance model (see, for example, Dee & Todling, 2000). In the present study, following the approach of Dee and Da Silva (1998),  B is defined as:  where  is a tuning constant set to 0.125 to allow slow evolution of the estimated model bias. At the model cold start an estimate of the model bias Δ should be provided to the filter, in this study the filter is initialized BATRAK 10.1029/2021MS002533 5 of 26 with zero model bias, thus requiring a spin-up period of several assimilation cycles to populate the estimated model bias vector.
The original form of the analysis equation in the bias-aware EKF (Equation 10) applies bias correction for all the control variables at the analysis stage. However, this approach is not optimal for the fast evolving model variables, such as ice temperature, because the model quickly compensates this correction after a few integration time steps. To increase the effect of the bias correction in SICE EKF, estimated model bias is corrected incrementally during the model forecast following Radakovich et al. (2001) by introducing a bias correction flux term into the surface energy balance equations of snow and sea ice in the SICE scheme. The flux correction term is defined as follows: where  is the length of the assimilation cycle, which is equal to 3 h in the present study. This flux is added to the surface energy balance equation: where t C is the surface thermal resistance; s T is the surface temperature; and B represents the sum of radiative, turbulent and conductive heat fluxes.
In SICE EKF only the correction for the estimated surface temperature bias is applied in form of the bias correction flux. This is sufficient because the ice thermal profile is driven mainly by the evolution of the surface energy balance (especially in winter time when shortwave solar radiation penetrating the ice pack is negligible). The estimated model bias for other control variables is used only to calculate the innovation vector. Thus, Equation 10 is transformed to: To take into account the growing uncertainty of the estimated model bias the method applied by Carton et al. (2000) is is a tuning constant. In the current study, following the results of preliminary tests,  is set to 0.98 resulting in the exponential decay of the estimated bias with the e-folding time of 50 assimilation cycles or 6.25 days assuming the 3 hourly cycling.
In the following text the term EKF denotes the bias-aware extended Kalman filter as defined by the system of Equation 15 with the superscript marks indicating the transformed elements omitted.

Adaptive Estimation of the Matrix Q
The model error covariance matrix Q is an important parameter of the Kalman filter which could be crucial for good performance of the filter. This matrix provides information about the errors induced by the model itself since the model is not ideal and only approximates real-life processes. The matrix Q can be considered as a tuning parameter of the filter, although different approaches for defining Q exist. In some applications the elements of Q are prescribed based on a prior expert knowledge and the behavior of the model (see, e.g., Draper et al., 2009;Kourzeneva, 2014). Another approach is to use the true information about the modeled BATRAK 10.1029/2021MS002533 6 of 26 system for computing Q. Obviously, the full true state of the modeled system is never available and the only source of information which could provide some insight is the observations.
In the presented EKF framework for sea ice data assimilation the model error covariance matrix Q is constructed as a block matrix consisting of two components: where  denotes the matrix direct sum. H Q represents the thermal profile part of the matrix Q and defined as: where i and j are the row and column indices of , H i j q in H Q , and  ,  and  are free parameters. R Q corresponds to the part of the state vector which does not contain any prognostic variables of the SICE scheme. In the current implementation R Q is a single-element matrix providing the error variance of the sea ice concentration: where  is the assimilation cycle length in hours, and  is a free parameter. The 0.279 constant is calculated for the 24 h persistent ice cover model using the IFS-HRES sea ice cover output. Therefore where  ,  ,  , and  are the scalar parameters which define the matrix.
To estimate the parameters of the matrix Q the method proposed by Dee (1995) is applied. This technique is based on the assumption that the innovations should follow the normal distribution (i.e.  where v is the innovation vector, and   T S HBH Q is the innovation covariance matrix). It uses the maximum likelihood to estimate a parameter vector  which would result in the best fit between the innovation vector and the assumed Gaussian distribution. The size of the innovation vector, or in other words, the number of available observations, should be considerably higher than the number of the estimated parameters to provide accurate innovation-vector-based estimates of the covariance parameters (this requirement could be alleviated to some extent by a generalized likelihood formulation which uses more than a single assimilation step, see Solonen et al., 2014 for additional details).
In case of sea ice data assimilation in SURFEX, the EKF is applied to each sea ice grid cell individually and this condition does not hold since the number of available observations is usually of the same order as the size of the parameter vector. Thus, to estimate , the size of the innovation vector should be considerably increased while keeping the number of estimated parameters. This could be done if the same parameter vector  is used to construct the matrix Q when running the EKF for different grid cells. Then, without changing the semantics of the SURFEX EKF scheme the innovation vector and the innovation covariance matrix can be expressed for some selected set  of ice-covered grid cells as: where  denotes the vector concatenation. In this case the maximum-likelihood function of Dee (1995) becomes: which gives a single likelihood estimate for a selected set of sea ice grid cells. Now the size of the innovation vector is much larger than the size of the parameter vector, allowing effective estimation. Taking into account the Equation 20,  can be efficiently computed as: where  denotes the matrix determinant. The gradient of the maximum-likelihood function is obtained in a similar way: where  n is the n-th element of the parameter vector  (see Dee, 1992 for details on the derivation of the gradient of the maximum-likelihood function). Then  is estimated by minimizing the Equation 22 applying the L-BFGS-B algorithm (Morales & Nocedal, 2011;Zhu et al., 1997). Obtained maximum-likelihood estimate of the parameter vector  * is then smoothed, as suggested by Dee (1995): where   is the previous estimate of the matrix Q parameters, and   [0;1] is the smoothing constant with the value set to 0.9.
In the present study  is defined as containing all the sea ice grid cells within the model domain applicable for the EKF procedure. It is also possible to reduce the size of  thus allowing multiple estimations of  for a given assimilation cycle.

Specific Considerations for Snow-Covered and Snow-Free Surfaces
When applying the EKF for sea ice temperature assimilation in SICE, two regimes of the sea ice cover should be considered. These two regimes are distinguished by presence of the snow layer on top of the sea ice. This layer not only insulates the ice from the atmosphere but it also has different thermodynamic properties than sea ice resulting in very short response time of the snow surface temperature to changes in the atmospheric conditions. Furthermore, when sea ice in the model is represented as an ice-only layer without snow cover, its temperature tends to persistently show a considerable warm bias, if snow cover is present in reality.
The SICE EKF framework described in the present study uses separate thermal profile-based state vectors for snow-free and snow-covered sea ice grid cells. State vector for the snow-free sea ice grid cells includes ice temperatures from four ice layers and SIC. When sea ice is covered by snow the EKF state vector includes snow heat content from the 12 snow layers and SIC but the ice temperatures are not included.

Ice Covered by the Snow Layer
Most of the year sea ice is covered by a layer of snow. Thermal properties of snow such as heat capacity and thermal conductivity differ considerably from those of sea ice, and snow layer shows much more rapid changes in the thermal profile. The snow surface temperature is strongly affected by the atmospheric forcing and promptly reacts to its changes and thus can be referenced as a "fast variable". Extremely short memory of the snow surface temperature makes the application of the data assimilation techniques for the sea ice surface temperature products less straightforward.
For example, the standard approach for computing the matrix M in the SURFEX implementation of the EKF within HARMONIE-AROME is the following: First the atmospheric model is run to produce the forecast, which is used to generate the atmospheric forcing. The model precipitations are excluded from the forcing to reduce the nonlinearities within the snow model. Then, the atmospheric forcing is used to run a series of standalone SURFEX simulations where each simulation is initialized from the previous surface analysis but with the model field representing a control variable modified by a prescribed additive perturbation. Standalone SURFEX simulations propagate the perturbed surface state forward in time for the cycle length (3 h in the present application). Finally, the output from the set of standalone SURFEX simulations is used to calculate M (see the schematic represented on the Figure 1b showing the surface data assimilation procedure with the EKF in SURFEX) by applying the finite-difference method to obtain Jacobians column by column as: where i x is the i-th control variable in the state vector; and  i x is the corresponding perturbation.
Although this approach works reasonably well for slowly evolving variables, such as ice temperature, it can not be applied to the full snow model. The fast nature of the evolution of the snow thermal profile and the strong steering signal from the atmospheric forcing result in a quick deterioration of the computed matrix M, and for the 3 hourly cycling M shows almost no connection between the individual components of the state vector. Using such M greatly degrades the performance of the filter by hindering the forward propagation of the background error covariances.
To compute a well-formed matrix M, a simplified version of the snow scheme is called when running the perturbed SURFEX simulations. For the snow scheme applied in SICE, the surface energy balance can be written in the following way: where t C is the surface thermal resistance; R is the surface radiative balance; Q is the turbulent flux of sensible and latent heat; C is the conductive heat flux from the snow pack to the snow surface; and A denotes the set of atmospheric forcing variables. In Equation 26 all the right hand side terms depend on the snow surface temperature s T , and the process of changing the snow temperature through the heat diffusion within the snow pack is the slowest process. For the standalone SURFEX simulations the snow surface temperature is assumed to be strongly steered by the atmospheric forcing for the fast-evolving components of the surface energy balance (i.e. where  is a transformation function) and s T is excluded from the radiative balance and turbulent heat flux terms as follows: This form of the surface energy balance essentially leaves only the heat conduction processes to be represented in the calculated matrix M. Although this approach allows to calculate a well-formed M, it has an obvious drawback of introducing the discrepancy between the non-linear and linearized versions of the snow scheme model operator. This inconsistency could result in the deteriorated performance of SICE EKF, especially for longer assimilation cycles.

Snow-Free Ice Surface
During the summer time snow layer actively melts which results in a snow-free state of the sea ice model. When ice model enters the snow-free state a new state vector for the EKF routine is constructed, which is based on the ice surface temperature profile, and the filter is re-initialized. The ice variables are less responsive to the changes in atmospheric forcing and a well-formed matrix M can be computed from the full model in contrary to the case of snow-covered ice.
When sea ice is covered by a snow layer less than 3 cm thick, this snow is ignored in the data assimilation procedure and ice is assumed to be still snow-free. For the grid cells where snow layer becomes thicker than 3 cm, the filter is re-initialized and the state vector is reverted to the state vector based on the snow temperature profile variables.
This distinction between the snow free and snow covered sea ice states is made to save computational resources and simplify the technical implementation of the filter. The SICE scheme uses the multilayer snow scheme that has 12 snow layers in the default configuration, but the ice temperature profile in SICE is constructed with four ice layers. Thus, during the summer snow-free period the SICE state vector is considerably shorter than the one applied for the snow-covered sea ice grid cells. BATRAK 10.1029/2021MS002533 9 of 26

Observations
Although in-situ observations of sea ice provide the most accurate information about the sea ice state, they have a number of limitations rendering them more useful for verification studies than for the routine data assimilation in NWP. First, in-situ observations often have very high locality (i.e. they report only local conditions which could be nonrepresentative for a grid cell-sized area) which makes them more difficult to utilize in operational short-and medium-range weather forecasting models where typical effective resolution starts from (10)  km (assuming that horizontal resolution of regional NWP systems is ranging from 1 to 4 km; see, e.g., Chapter 17 of WMO, 2015, and an example of an operational HARMONIE-AROME application described by Müller et al., 2017). Second, these data sets are sparse and sporadic owing to the high deployment cost of the measuring units and the vast size of the Arctic Ocean. And third, in-situ observations over sea ice often become available for data assimilation with considerable delays, which limit their applicability in operational NWP. Therefore, the more suitable observation type for sea ice data assimilation is remote sensing data. These data sets have known processing time and they cover considerable areas, compared to in-situ observations. The limited spatial resolution of satellite observations makes them more representative on the model grid cell scale.
In the present study ice surface temperature observations are provided by a near real time Level-2 ice surface temperature product (Dybkjaer et al., 2018) derived from the measurements taken by the VIIRS instrument onboard the Suomi NPP satellite. This product has 750 m spatial resolution and it is derived from the infrared channel data, therefore providing ice surface temperature only for cloud-free conditions. Also the quality of the selected product depends on the applied cloud detection algorithms, which generally are less accurate in night-time conditions resulting in lower quality of the product.
The data assimilation framework within SURFEX requires observations to be provided on the same spatial grid as the model due to the one-dimensional nature of the data assimilation procedure which does not provide a means for operating with neighboring grid cells. The standard approach in HARMONIE-AROME, which is applied to the analysis of snow depth, and screen level temperature and humidity observations, is to first perform the two-dimensional OI procedure which uses the latest available forecast as the model background field (Taillefer, 2002) and then treat the obtained gridded analysis as observations in the one-dimensional SURFEX data assimilation. The SICE EKF implementation follows this approach by using the two-dimensional OI to obtain the gridded observations as the first processing step, which is technically implemented as a separate analysis procedure that uses only the ice surface temperature observations. Screen level air temperature observations, considering their general sparsity in the Arctic, are not taken into account during the ice surface temperature analysis.
Very high density of the VIIRS SIST product results in spatially correlated observational errors making the application of such a product in a data assimilation framework much more computationally expensive since the usual assumption of uncorrelated observations does not hold (Liu & Rabier, 2002). To ensure that the observational errors are uncorrelated, various approaches are applied, for example, thinning (Järvinen & Undén, 1997) or creating superobservations (Benjamin, 1989). A clear disadvantage of thinning is that a considerable number of available observations is discarded. Applying superobservations, though showing a benefit over thinning (Hoffman, 2018), results in smoothed observations which could have a negative effect on the performance of the data assimilation procedure. In the present study neither a thinning procedure nor the superobservation approach is applied to the observations. Instead, the observation error covariances are scaled as discussed by, for example, Butterworth et al. (2002), to reduce the weight of observations. Scaling is done in the following way: assuming that the effective resolution of HARMONIE-AROME over ocean and sea ice is close to eight grid cells (Skamarock, 2004) or 20 km for a 2.5 km grid and taking into account the Whittaker-Nyquist-Kotelnikov-Shannon sampling theorem, the observational error standard deviations are scaled by a constant factor of  1 20 13.3 2 0.75 .
Since the VIIRS SIST product provides information only in cloud-free conditions, presence of cloud cover results in data gaps in the product. When these gaps are large the OI procedure produces the 2D analysis field which does not differ from the model background field for a considerable number of sea ice grid cells. As a consequence, for these grid cells the analysis departures are zero and the model would be treated as a perfect one by the filter. This results in underestimated parameters of the matrix Q, and also, in the case of BATRAK 10.1029/2021MS002533 10 of 26 the bias-aware EKF procedure, deteriorates the estimated model bias. To avoid these effects an additional processing step after the 2D OI procedure removes gridded observations in all model grid cells located farther than 30 km away from the valid VIIRS SIST product pixels.
The part of the observation operator in SICE EKF, which connects the ice temperature model variable and the ice surface temperature reported by the observational product is defined as: where sist T is the surface temperature over an ice-covered grid cell (which can also contain some open sea); SIC is the ice concentration;  w is the emissivity of water; sst T is the sea surface temperature;  i is the emissivity of the sea ice surface; ist T is the temperature of the topmost model layer in the SICE scheme (which could be either ice or snow layer, depending on the sea ice state). Thus, ice concentration has a strong effect on the model equivalent of the sea ice temperature, especially when fraction of open water is relatively high. In result, when the model and ice surface temperature product show a considerable discrepancy in the ice concentration, the EKF can produce unrealistically huge increments deteriorating the model state. To alleviate this effect the ice concentration field was added to the state vector in SICE EKF. Consecutively, the observational vector was extended with ice concentration field provided by IFS-HRES boundary data. However, the ice concentration field from IFS-HRES is still used for defining ice-covered regions, and for grid cells where ice cover in HARMONIE-AROME is not consistent with updated ice cover provided by IFS-HRES (e.g., if IFS-HRES reports an ice-covered grid cell, but it is ice-free in HARMONIE-AROME or vice versa) the EKF is not applied and SICE state is initialized (or cleared) according to IFS-HRES ice cover.
To simplify the implementation of the EKF in SURFEX a fixed-size observation vector is used and the missing observations are specified via the values of the observation error covariance matrix, which is defined as: which always has a finite value because in the present study sea ice concentration is provided by IFS-HRES and has no missing values. In the present study both  2 sist and  2 sic are prescribed and do not vary in time nor space. This definition allows updating the background error covariance matrix even in the case when the full set of observations is not available by effectively zeroing the corresponding elements of the Kalman gain matrix.

Design of the Numerical Experiments
To assess the performance of the developed SICE EKF analysis scheme it was tested within HARMO-NIE-AROME for a dedicated model domain which contains a considerable amount of sea ice covered grid cells (see Figure 2). This domain is relatively small, which saves computational resources, but large enough to include both Svalbard and Franz Josef Land archipelagos, thus allowing verification against in-situ observation from SYNOP and radiosonde stations. Moreover, it includes both the marginal ice zone and areas of seasonal ice in the Barents sea, as well as the areas closer to the North Pole with perennial ice cover. The model domain has a size of  500 500 grid cells with 65 vertical levels and spatial resolution of 2.5 km. Boundary data for all the experiments were provided by the global NWP system IFS-HRES, and HARMO-NIE-AROME used a 60 s integration time step.
Two types of numerical experiments were conducted: experiments of the first type were designed to test the performance of the filter over a considerably long time period; the second type of experiments assessed the possible effects of the new scheme on numerical weather forecasts. The summary of all the numerical experiments is presented in the Table 1.
For testing the performance of the filter two experiments were conducted: The reference run HA-REF without sea ice data assimilation, and the sensitivity run HA-EKF, which includes SICE EKF. Both experiments started at September 1, 2019 and run with 3 hourly cycling strategy until the February 1, 2020. The sea ice cover at the start of the experiments was initialized as a snow-free ice surface with uniform thickness of 0.75 m. HA-REF and HA-EKF were performed without upper air data assimilation while instead, at each assimilation cycle, the blending procedure was applied to mix the atmospheric model fields of HARMONIE-AROME with IFS-HRES fields. The surface data assimilation in HA-REF uses the simplified EKF framework  to assimilate screen level air temperature and relative humidity, as well as the snow depth observations over land surfaces. The HA-EKF experiment follows the surface data BATRAK 10.1029/2021MS002533 12 of 26 To study the effects of adding the implemented sea ice data assimilation procedure on the HARMO-NIE-AROME forecasts, three additional experiments were performed. These experiments are as follows: the reference experiment 3DVAR-REF, the sensitivity experiment 3DVAR-EKF, and the experiment with coupled surface and atmospheric data assimilation procedures 3DVAR-EKF-TS. The 3DVAR-REF experiment follows the configuration of the HA-REF experiment, except that for the upper air data assimilation the three-dimensional variational analysis is applied. For the upper air data assimilation conventional observations were complemented by remote sensing radiance observations provided by AMSU-A and IASI instruments. The 3DVAR-EKF experiment follows the general configuration of 3DVAR-REF but also includes the EKF procedure for surface data assimilation over sea ice. These two experiments use the default settings of HARMONIE-AROME regarding the sequence of the surface and upper air data assimilation procedures, which are performed independently and do not influence each other. But taking into account the fast nature of evolution of the ice surface temperature, it can be beneficial to use the best possible estimate of the surface temperature field when performing upper air data assimilation. For example, improved surface temperature field available at the analysis time can improve the model equivalent of satellite radiance observations, since the model observation operator uses surface temperature as one of the input parameters (for the details on HARMONIE-AROME observation operator for satellite observations see Saunders et al., 2018). In the HARMONIE-AROME implementation of 3DVAR the surface temperature field in the model background state is taken directly from the previous cycle forecast, thus the data assimilation routine operates with modeled state of the surface temperature, which is liable to modeling errors and biases. However, a better estimate of the surface temperature field is available after performing surface data assimilation procedure, which is done prior to upper air data assimilation in HARMONIE-AROME. Therefore, the surface temperature field in the model background state can be replaced by the latest surface analysis field before performing the upper-air assimilation. To assess the possible effects of this approach an additional experiment, 3DVAR-EKF-TS, was conducted. This experiment is based on 3DVAR-EKF, but, at each assimilation cycle the surface data assimilation procedure is run first to produce the surface temperature analysis field. Then, the surface temperature field of the model background state in upper air data assimilation is replaced by the generated surface analysis field thus introducing a coupling between the surface and upper air data assimilation. Each of the three experiments in this group covers the time period from December 1, 2019 to January 1, 2020. Initial conditions of the model state on December 1, 2019 were taken from the HA-EKF experiment and the upper air data assimilation system was initialized by taking the background BATRAK 10 whether the EKF data assimilation procedure for sea ice was applied in the corresponding experiment. b upper air data assimilation (or initialization) procedure. c whether the surface temperature field in the model background state was replaced with the analysis provided by the surface data assimilation procedure prior to performing atmospheric data assimilation.

Table 1
Overview of the Numerical Experiments error covariance matrix and variational bias correction parameters from the operational numerical weather prediction system AROME-Arctic (Müller et al., 2017). It is possible to use the interpolated background error covariance matrix from AROME-Arctic because the experimental domain is located almost completely inside the AROME-Arctic domain and has the same vertical grid and spatial resolution. For the 3DVAR-EKF and 3DVAR-EKF-TS the EKF state parameters were taken from the HA-EKF experiment on November 30, 2019 at 21 UTC cycle.
The experiments 3DVAR-REF, 3DVAR-EKF, and 3DVAR-EKF-TS were configured to use a 3 hourly cycling strategy with 3 h forecasts at 00, 03, 06, 09, 15, 18, 21 UTC cycles, and with 48 h forecast for 12 UTC cycle for verification purposes. The number of available observations for upper air data assimilation in HAR-MONIE-AROME over Arctic model domains depends on satellite overpasses and varies between different cycles with 00 UTC cycle tending to have the least amount of available observations and 12 UTC having the most. This can be illustrated by the Figure 3 showing the available observations (excluding blacklisted and rejected) within the experimental model domain on December 15, 2019, or Table 3

Validation Data Sets
To study the effects of introducing the sea ice data assimilation scheme in HARMONIE-AROME, model forecasts from the numerical experiments were verified against independent observations. In-situ observations of the sea ice surface temperature are scarce in the Arctic and provide information only about the local conditions. Therefore, to get a general overview of the performance of the model system a comparison against a remote sensing product should be applied. Because SICE EKF already assimilates a SIST product derived from the VIIRS instrument, an independent data set would be preferable for validation purposes. The present study uses a near real time SIST product derived from the MODIS instruments onboard the Terra and Aqua satellites (Hall & Riggs, 2015a, 2015b, which is an independent data set covering a considerable area. The ice surface temperature product is provided as 5 min swathes and it has 1 km spatial BATRAK 10.1029/2021MS002533 14 of 26 resolution. As the first step, the SIST product is aggregated on the 2.5 km model grid, selecting only the pixels marked by the quality assurance algorithm of the product as "good quality". Then the aggregated ice surface temperature product is compared with the HARMONIE-AROME model surface temperature forecasts ignoring the model grid cells where total cloud cover in the model is higher than 1 octet. To better understand the effects induced by the introduced sea ice data assimilation procedure on the forecast of the near-surface atmospheric variables such as screen-level temperature and humidity, and 10 m wind speed, model results were additionally validated against in-situ observations from 13 SYNOP stations located within the model domain (see the Figure 2). These stations routinely report atmospheric conditions with 1 h temporal resolution, except for the station located in the Franz Josef Land archipelago, which reports with 3 h resolution.
The model domain also includes a radiosonde sounding station located on the western coast of the Spitsbergen island. Soundings from this station are launched at 12 UTC. The present study used data taken during sonde ascent at the main pressure levels for assessing the effects of the ice surface temperature analysis on the model forecast of the upper atmosphere parameters. However, since this station is not located on the sea ice, only indirect effects can be studied.

Results of Validation Against Observations
As a first step, the ice surface temperature fields in HA-REF and HA-EKF were verified against the MODIS SIST product. Figure Figure 5 indicates that HA-EKF better represents the ice surface temperature field than HA-REF, but this improvement is rather short term and quickly diminishes with the forecast lead time. This conclusion is supported by the RMSE growth maps (see the Figures 5c and 5d). HA-REF shows no clear pattern in RMSE growth and fluctuations within the computed field originate from the irregularities in the observational product and discrepancies in the available MODIS product pixels between different forecast reference times. From the other side, HA-EKF has an area of increased error growth over the seasonal ice cover in the Barents sea. In this area, errors in the HA-EKF experiment grow faster than in HA-REF, thus diminishing the difference between two experiments after 3 h of model forecast.
To better assess the added value of the applied sea ice data assimilation procedure, weekly values of the ice surface temperature RMSE for HA-REF and HA-EKF were computed. Figure 6 shows the series of weekly (ME) and standard deviation of errors (ESTD) for the ice surface temperature in HA-REF and HA-EKF (which are connected with RMSE as   2 2 2 RMSE ME ESTD ). Both HA-REF and HA-EKF show positive ME, which is considerably higher during the autumn months of the experiments (Figure 7a). This is a known feature of the SICE scheme in HARMONIE-AROME resulting from the simplified implementation of the snow cover parameterization and misrepresentation of the ice cover in autumn. Using sea ice data assimilation helps to reduce this warm bias to some extent, but it is unable to remove the bias completely. Beginning in mid-November both HA-REF and HA-EKF show similar evolution of the ice surface temperature bias. Although, since the MODIS SIST product is available only for the cloud-free conditions, comparing it with model SIST can show spurious warm bias due to discrepancy between the MODIS cloud mask and the HARMONIE-AROME model cloud cover. At least some part of the warm bias found in Figure 7a can be attributed to this effect. The time series of ESTD scores show a somewhat opposite evolution than that of the ME series (see the Figure 7b) with ESTD having lower values in the beginning of the experiment. Such a feature of ESTD can be explained by the fact that in September sea ice is still in the melting regime and shows less temperature variability than in winter time. When ice starts freezing and a considerable amount of snow is accumulated on top of sea ice, the system becomes less constrained, which results  Three additional experiments (3DVAR-REF, 3DVAR-EKF, and 3DVAR-EKF-TS) were conducted to better understand the effects of the introduced assimilation scheme. The model forecasts from these experiments were compared against the in-situ observations from the SYNOP stations located over the Svalbard and Franz Josef Land archipelagos. Figure 8 shows the effect of the sea ice data assimilation on the air temperature model forecast as a relative change in 2 m temperature RMSE between 3DVAR-REF and 3DVAR-EKF for the individual SYNOP stations, computed for the forecast lead times from 0 to 12 h from the 12 UTC cycles. For most of the SYNOP stations within the model domain, the 3DVAR-EKF experiment shows reduced 2 m temperature RMSE, especially for the stations located over the eastern part of the Svalbard archipelago, which are situated close to the ice-covered areas. For the six stations on the western coast of Spitsbergen, the difference in 2 m temperature RMSE is smaller and three stations show a 1% increase of RMSE in 3DVAR-EKF forecasts compared to that of 3DVAR-REF. For the Hopen island station 3DVAR-EKF shows a 5% increase in 2 m temperature RMSE compared to the values obtained from the reference experiment BATRAK 10.1029/2021MS002533 19 of 26 While the Figure 8 shows a generally positive effect of introducing the SICE EKF analysis, it provides only summary scores for each of the stations. To get a better understanding of the induced effects, the verification scores were calculated for the individual forecast lead times. Figure 9 shows RMSE scores as a function of the forecast lead time for 2 m air temperature, 2 m specific humidity and 10 m wind speed for 3DVAR-REF, 3DVAR-EKF and 3DVAR-EKF-TS. As can be seen from the Figure 9a, 2 m temperature RMSE in both 3DVAR-EKF and 3DVAR-EKF-TS is considerably lower than in the reference experiment 3DVAR-REF.  Although changes in the ice surface temperature induced by the applied analysis procedure have a considerable effect on the near surface parameters such as 2 m temperature, improved representation of the skin temperature can benefit the upper air variables as well. Direct effects of having the improved surface temperature are manifested, for example, in form of more realistic turbulent fluxes between air and underlying surface. But these effects are localized in the lower-most part of the model atmosphere and are expected to be less pronounced for the higher levels of the model grid. From the other side, having surface temperature analysis available in 3DVAR can result in additional impact in the upper air fields originating from the modified model equivalent of the assimilated observations, especially in case of radiance assimilation.
To assess the relative effect of using surface temperature analysis in the upper air 3DVAR analysis, the forecasts of the air temperature in 3DVAR-EKF and 3DVAR-EKF-TS were compared against that of 3DVAR-REF. This comparison was performed by calculating the root mean square difference (RMSD) between the reference and sensitivity experiments and then assessing the change in RMSD between 3DVAR-EKF and 3DVAR-EKF-TS. Figure 10 shows the effect of using the surface temperature analysis in 3DVAR as a function of the forecast lead time and vertical coordinate. As could be seen from the figure 3DVAR-EKF-TS shows higher RMSD values than 3DVAR-EKF, or in other words 3DVAR-EKF-TS differs from 3DVAR-REF more than 3DVAR-EKF. This difference is more apparent in the lower part of the model atmosphere and close to zero for the upper-most model levels. Although 3DVAR-EKF-TS shows a clear increase in RMSD it could not be attributed to improved or degraded forecast skill because the reference experiment 3DVAR-REF does not provide the true state of the atmosphere. To verify the effect of the introduced data assimilation routine on the evolution of upper air model variables they should be compared with an observational data set.
Inside the model domain there is a sounding station (see the Figure 2, marked with a cross), which regularly launches radiosondes at 12 UTC thus providing the required observational data. Observations from this station were not assimilated during the model runs therefore provide an independent data set, which could be used for the analysis. of air temperature (see the Figure 11a) 3DVAR-EKF and 3DVAR-EKF-TS show no clear advantage over the reference experiment 3DVAR-REF.
Moreover, 3DVAR-EKF shows higher than in 3DVAR-REF ESTD at the heights between 850 hPa and 500 hPa. The 3DVAR-EKF-TS experiment reduces air temperature ESTD compared to 3DVAR-EKF, but shows no considerable improvement over the result obtained from 3DVAR-REF.
For the lowest level at 925 hPa both 3DVAR-EKF and 3DVAR-EKF-TS show reduced ESTD compared to 3DVAR-REF (with the values of 1.19°C for 3DVAR-EKF and 3DVAR-EKF-TS, and 1.31°C for 3DVAR-REF). The ESTD profiles for the air specific humidity, shown on the Figure 11b, indicate that for the levels higher than 700 hPa 3DVAR-REF, 3DVAR-EKF, and 3DVAR-EKF-TS show the same ESTD values. For the levels below 700 hPa 3DVAR-EKF shows reduced ESTD values compared to both 3DVAR-REF and 3DVAR-EKF-TS. Specific humidity quickly diminishes with height, and this characteristic feature is reflected in the ESTD profile. Figure 11c shows the wind speed ESTD profile with 3DVAR-EKF performing close to 3DVAR-REF for all the levels except 400 hPa where it shows increased ESTD value. For the 3DVAR-EKF-TS experiment wind speed ESTD is lower than in 3DVAR-EKF for all the pressure levels. For the levels below 500 hPa 3DVAR-EKF-TS shows lower ESTD than 3DVAR-REF and 3DVAR-EKF. Compared to 3DVAR-REF, 3DVAR-EKF-TS reduces wind speed ESTD by 0.2 1 ms on average in the lower part of the profile. For the forecast lead times of 24 and 48 h difference in ESTD profiles between 3DVAR-REF, 3DVAR-EKF, and 3DVAR-EKF-TS for all the three parameters is greatly reduced (not shown).
Comparison against the radiosonde observations shows that for air temperature and humidity, except for the lowest pressure levels, 3DVAR-EKF and 3DVAR-EKF-TS has no clear advantage over 3DVAR-REF. For the wind speed 3DVAR-EKF-TS shows a minor improvement over 3DVAR-REF and 3DVAR-EKF in the lower part of the wind speed profile. Although, the conclusions that can be drawn from this analysis should be taken with care because there is only a single sounding station within the model domain and the experimental period is relatively short. Thus the difference between the model experiments might not be statistically significant.

Discussion and Conclusions
Sea ice plays an important role in formation of Arctic weather and climate (see, e.g., Vihma, 2014). Numerical weather prediction systems running over the Arctic require accurate representation of the sea ice state to represent the processes of air-surface interaction. For short-range numerical weather prediction, sea ice surface temperature becomes an important characteristic of the sea ice state because it steers the turbulent exchange of heat between the ice surface and the lowest model level. Thus having accurately represented ice surface temperature helps to improve the temperature profile of the lowest part of the model grid directly above the ice-covered regions and in the remote areas where air masses are advected, for example, during cold air outbreaks. Numerical parameterizations can not represent the full complexity of the physical processes governing the sea ice state and thus prone to systematic and non-systematic errors. In the HARMO-NIE-AROME NWP system sea ice is parameterized by a simple one-dimensional scheme SICE. In the present study an adaptive bias-aware extended Kalman filter assimilation scheme was implemented for SICE in HARMONIE-AROME to improve the modeled ice surface temperature. This procedure fills the gap in observation coverage and assimilates a near real time SIST product derived from the VIIRS instrument observations. Assimilation of the satellite product is performed in a two-step way, which is standard BATRAK 10.1029/2021MS002533 22 of 26 for surface data assimilation in HARMONIE-AROME. In the first step, the 2D OI is performed using VIIRS product and model surface temperature forecast. In the second step, the 1D EKF procedure is applied over ice-covered grid cells to update the vertical profiles of ice or snow temperature and ice concentration. The systematic component of the model error is estimated by the filter and corrected during the model forecast by applying an auxiliary correction flux to the surface energy balance in SICE. The filter adapts the parameters of the model error covariance matrix following the Gaussian likelihood estimation based on the innovation vector. To reduce the nonlinearity of the sea ice scheme within the assimilation cycle of SICE EKF, HARMONIE-AROME uses a 3 h assimilation cycle, and a simplified version of the snow scheme is used for deriving the linearized model operator matrix.
The developed system was assessed by comparing the model forecast of the ice surface temperature against an independent satellite ice surface temperature product. Assessment showed that applying SICE EKF procedure helps to reduce RMSE of the ice surface temperature, although the effect diminishes with the forecast length. Spatially aggregated series showed that the implemented EKF scheme reduces the systematic error of the ice surface temperature during autumn months although it is unable to completely remove the model bias. Non-systematic errors are considerably reduced throughout the whole experimental period.
To assess the effect of the introduced sea ice data assimilation procedure on the model forecast of near surface parameters such as 2 m air temperature and humidity, and 10 m wind speed, an additional series of experiments was performed with 48 h forecasts performed for 12 UTC cycles. For these experiments, verification was done against the land SYNOP stations located over the Svalbard and Franz Josef Land archipelagos. According to the verification results, experiments which include the SICE EKF procedure show reduced 2 m temperature RMSE over the first 24 h of the model forecast compared to the reference run without sea ice data assimilation. For the 2 m air specific humidity and 10 m wind speed introducing sea ice data assimilation procedure shows no clear improvement over the reference experiment.
The impact of SICE EKF on the upper air variables was studied by verifying the numerical experiments against the independent observations retrieved from the radiosonde soundings. While showing no clear advantage over the reference run at the pressure levels above 850 hPa, near the surface the sensitivity experiment shows indication of reduced air temperature ESTD. For the air specific humidity and wind speed no clear positive impact was observed. Such a weak effect of the sea ice data assimilation procedure can be partly attributed to the size of the experimental model domain. In the present study the model domain is relatively small, thus boundary conditions have a stronger effect on the model state than in large domains.
The effects of using the latest surface analysis in the model background for upper air data assimilation were studied in a separate experiment. This experiment showed that using the latest available surface temperature analysis in 3DVAR results in a model state that differs more from the reference run than when using the standard approach without updating the model background state before 3DVAR. This indicates potential benefits of the suggested approach. Verification against the radiosonde observations shows that using updated model background state results in no clear improvement for air temperature and air humidity fields but reduces wind speed ESTD for the lower part of the profile (below 500 hPa).
The results of the model experiment verification against the satellite and in-situ observations indicate that applying the sea ice data assimilation procedure helps to improve the representation of the sea ice surface temperature in the model, and model forecast of the near surface air temperature over the regions located close to the ice covered areas. Positive impact of the assimilation scheme in the ice surface temperature field is still traceable after 3 h of model forecast, although less prominent than for the 0 h forecast lead time. The effects of the implemented assimilation procedure can be traced not only in the screen-level parameters but also in the upper-air fields, though weaker. Stronger effects were observed when using updated model background state, although upper air verification was performed using the data from the only sounding station located within the model domain, and these findings should treated as preliminary. A more general study over an extended model domain, which includes several sounding stations, is required to assess the effects of SICE EKF on the atmospheric variables in full detail. Moreover, updated surface temperature analysis incorporates not only the sea ice analysis but also the updated sea surface temperature and land surface temperature fields, thus the effects observed in the upper air fields could not be attributed to the sea ice analysis only. Possible application of the developed procedure in the operational environment requires additional research when introducing the new scheme in the so-called long cut-off cycle in HARMONIE-AROME, which is necessary due to the timeliness of the satellite products that are to be assimilated. Another aspect, which should be considered, is the computational cost. Running a surface analysis in HARMONIE-AROME with SICE EKF for sea ice and the simplified EKF for soil data assimilation has the computational complexity of   (max , ) m n  for a given set of observations, where m and n are the sizes of the state vectors in SICE EKF and in soil analysis, respectively. Thus, if state vector in SICE EKF is larger than state vector of soil analysis, computational cost of surface analysis is increased compared to the reference configuration.
Future work on SICE EKF includes a more detailed evaluation of the performance of the implemented filter over the extended model domain and longer verification period. The performance of the filter during the melting period, which did not receive much attention in the present work, should be further studied. The introduced data assimilation framework allows adding new observation types, which opens a great opportunity for further improvement of the representation of the sea ice state. For example, ice albedo, which is important in summer time for accurate representation of the surface energy balance, is one potential candidate for extending the data assimilation framework. BATRAK 10.1029/2021MS002533 25 of 26

Acknowledgments
The author would like to thank Roger Randriamampianina, Erin Thomas, Gert-Jan Marseille, and Avichal Mehra for their valuable comments. Comments and suggestions from the three anonymous reviewers helped to considerably improve the paper. This research was funded by the Research Council of Norway through the "Advanced models and weather prediction in the Arctic: enhanced capacity from observations and polar process representations (ALERTNESS)" project (NFR-280573). This is a contribution to the Year of Polar Prediction (YOPP), a flagship activity of the Polar Prediction Project (PPP), initiated by the World Weather Research Program (WWRP) of the World Meteorological Organisation (WMO).