Evaluating errors due to unresolved scales in convection‐permitting numerical weather prediction

In numerical weather prediction (NWP), observations and models are quantitatively compared for the purposes of data assimilation and forecast verification. The spatial and temporal scales represented by the observation and model may differ and this results in a scale mismatch error which may be biased and correlated. The aim of this paper is to investigate the structure of representation error in convection‐permitting NWP models for four meteorological variables: temperature, specific humidity, zonal and meridional wind. We use high‐resolution data from the experimental Met Office London Model (approximately 300 m grid‐length) to simulate perfect observations and lower‐resolution model data. The scale mismatch error and its bias, variance and correlation are calculated from the perfect observation and low‐resolution model equivalents. Our new results show that the scale mismatch bias is significant in the boundary layer for temperature and specific humidity, whereas the variance is significant in the boundary layer for all analysed variables. Contrary to previous studies using low‐resolution (km‐scale) data, horizontal correlations are shown to be insignificant. However, all variables exhibit considerable vertical representation error correlation throughout the boundary layer. Our results suggest that significant biases and vertical correlations exist that should be accounted for to give maximum observation impact in data assimilation and for fairness in model verification and validation.

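In numerical weather prediction (NWP), observations and models are quantitatively compared for the purposes of data assimilation and forecast verification. The spatial and temporal scales represented by the observation and the model may differ, which results in a scale mismatch error, also known as the error due to unresolved scales (Daley, 1993; Hodyss and Satterfield, 2017; Janjić et al., 2018). These errors may be biased and correlated; hence, to obtain the best result from the data assimilation or verification process, the uncertainty of the error due to unresolved scales must be accounted for (Stewart et al., 2008; 2013; Schutgens et al., 2016; Ben Bouallegue et al., 2020).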
In data assimilation, the treatment of the error due to unresolved scales depends on how the scale mismatch arises. For cases when the observation footprint is larger than the model resolution, for example, geostationary satellite data with an instantaneous field of view several kilometres wide assimilated into a convection-permitting (km-scale) NWP model (Janjić et al., 2018), the error can be mitigated by appropriately filtering the model fields. For cases when the observation footprint is smaller than the model resolution, for example when high-resolution weather radar data are assimilated, the assimilation system must account for the uncertainty in the scales unresolved by the model. This uncertainty in the unresolved scales can be accounted for by using adapted data assimilation methods such as the Schmidt-Kalman filter (Schmidt, 1966; Janjić and Cohn, 2006), but the most common approach is to use standard data assimilation methods, accounting for the uncertainty due to unresolved scales by including it in the observation-error covariance matrix (Lorenc, 1981). Here the error due to unresolved scales contributes, along with pre-processing and observation operator error, to the error of representation (Janjić et al., 2018).
Many recent studies have estimated the entire observation-error covariance matrix and then hypothesised about the contribution of the error due to unresolved scales (e.g., Stewart et al., 2014; Waller et al., 2016b; 2016a; Bormann et al., 2016; Wang et al., 2018). However, it is possible to estimate uncertainty in the unresolved scales in isolation from the instrument error and other sources of observation error (Fielding and Stiller, 2019). Previous studies have used high-resolution observations (e.g., Oke and Sakov, 2008) or high-resolution model data (e.g., Daley, 1993; Waller et al., 2014; Liu and Rabier, 2002; Satterfield et al., 2017) to approximate the error due to unresolved scales. However, these methods rely on unrealistic assumptions about periodicity, or require an ensemble of forecasts or very-high-resolution observational data. Schutgens et al. (2016) proposed an alternative method that uses high-resolution model data to simulate perfect observations and perfect model data from which a scale mismatch error can be calculated.
Recently, in NWP there has been limited research into estimating the scale mismatch error that arises for specific observations or atmospheric variables. The work of Waller et al. (2014) showed for mesoscale NWP (12 km grid-length) that the scale mismatch errors for temperature and specific humidity are spatially correlated and state-dependent; furthermore, the errors are larger (in a relative sense) for humidity than for temperature. The fields of oceanography and atmospheric chemistry have also investigated the significance of representation error (e.g., Oke and Sakov, 2008; Boersma et al., 2016; Karspeck, 2016; Schutgens et al., 2017) and as with NWP it is shown that the errors are state-dependent.
Scale mismatch is not only a problem for data assimilation, but also for model validation where models are evaluated against observational datasets, or products derived from them. In the verification context, the representation error is often referred to as spatial sampling error. It is particularly problematic for global climate models where relatively low-resolution models are often compared with historical point observations. In this context, for a variety of geophysical variables, it has been found that spatial sampling error is often significant and larger than measurement error and should not be ignored in the verification process (Schutgens et al., 2016). Furthermore, the work of Bock and Parracho (2019) showed that, for integrated water vapour data from GPS, representation errors can be strongly enhanced by specific topographic and climatological features (e.g., steep topography).
The aim of this paper is to carry out a novel investigation of the structure of representation error in a convection-permitting NWP model. We will follow the method of Schutgens et al. (2016) from which the error due to unresolved scales can be calculated and hence the scale mismatch bias and uncertainty estimated. A detailed description of this methodology is given in Section 2. We will apply this methodology to data from the Met Office's ∼300 m grid-length London Model, described in Section 3, to provide an estimate of scale mismatch uncertainty present in a model with a grid-length of approximately 1.5 km (equivalent to the Met Office's operational convection-permitting UK variable resolution model). We consider the scale mismatch error for four variables: temperature, specific humidity, zonal and meridional wind. We note that, despite its high resolution, the London Model is still not able to resolve all scales and processes and hence our estimates should be treated as a lower bound to representation error for high-resolution observations. Our results are presented in Section 4. We show that the representation error bias and variance for all variables are most significant in the lowest model levels. We also show that the representation error for all variables is uncorrelated in the horizontal; however, significant correlation (defined as absolute value of correlation greater than 0.2) exists in representation error at different model levels. We conclude in Section 5 that, for convection-permitting data assimilation, the scale mismatch error will be an important contributor to the observation error in the boundary layer.

ERRORS OF REPRESENTATION
In this section we define representation errors and the methodology we will use in this manuscript to estimate them.

Definition
In data assimilation, a scale mismatch error arises when the observations contain information on different spatial scales compared to the model they are assimilated into. The difference between the observation, $y \in \mathbb{R}^{N_p}$, and its model equivalent, $H(x^b)$, where $x^b \in \mathbb{R}^{N_m}$ is the background state and $H: \mathbb{R}^{N_m} \rightarrow \mathbb{R}^{N_p}$ is the linear observation operator, is known as the innovation,
$$d = y - H(x^b). \tag{1}$$
We note that in this study we only use a linear observation operator, and hence use this in the definition of representation error; however, a full derivation of representation error using a nonlinear operator can be found in Janjić et al. (2018). The innovation may also be expressed in terms of the error-free observation, $y^{\mathrm{truth}}$, the true state at the model resolution, $x^{\mathrm{truth}}$, and the errors in the background and observation,
$$d = y^{\mathrm{truth}} + \epsilon^i - H(x^{\mathrm{truth}} + \epsilon^b) \tag{2}$$
$$\phantom{d} = y^{\mathrm{truth}} - H(x^{\mathrm{truth}}) + \epsilon^i - H(\epsilon^b), \tag{3}$$
where $\epsilon^b \in \mathbb{R}^{N_m}$ is the background error and $\epsilon^i \in \mathbb{R}^{N_p}$ is the instrument, or measurement, error associated with the observation.
The difference between the error-free observation and the true state at the model resolution mapped into observation space results in the error of representation,
$$\epsilon^r = y^{\mathrm{truth}} - H(x^{\mathrm{truth}}). \tag{4}$$
As defined mathematically in Janjić et al. (2018), this representation error consists of three different errors: the quality control or pre-processing error which arises when there are imperfections in the selection or processing of observations; the observation operator error which arises when an approximate mapping between model and observation space is used; and the error due to unresolved scales and processes which arises when the observations and model contain information at different scales. The representation error is most commonly accounted for in the data assimilation process by including the representation error covariance, $F \in \mathbb{R}^{N_p \times N_p}$, along with the instrument error covariance, $E \in \mathbb{R}^{N_p \times N_p}$, in the total observation error covariance matrix $R = E + F$.
In this manuscript, we investigate the error due to unresolved scales and processes and calculate the associated representation error covariance, F. Therefore, to ensure that the representation error in Equation (4) consists only of the error due to unresolved scales, we ensure that we have no quality control or processing uncertainty and no approximation except for interpolation in the observation operator.
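As an illustration of how a total observation error covariance of the form R = E + F might be assembled, the following is a minimal sketch in plain Python. It assumes a diagonal instrument error covariance E (uncorrelated instrument errors) and estimates F as a sample covariance of representation-error samples; the function names and the sample values are hypothetical and are not taken from the study.

```python
def sample_covariance(samples):
    """Sample covariance of `samples` (one row per time, one column per observation)."""
    n = len(samples)
    p = len(samples[0])
    means = [sum(row[a] for row in samples) / n for a in range(p)]
    return [
        [
            sum((row[a] - means[a]) * (row[b] - means[b]) for row in samples) / (n - 1)
            for b in range(p)
        ]
        for a in range(p)
    ]


def total_observation_error_covariance(instrument_variances, representation_samples):
    """Return R = E + F, with E diagonal (assumption) and F estimated from samples."""
    F = sample_covariance(representation_samples)
    p = len(F)
    return [
        [F[a][b] + (instrument_variances[a] if a == b else 0.0) for b in range(p)]
        for a in range(p)
    ]


# Hypothetical representation-error samples: 4 times, 2 observation locations.
r_samples = [[0.1, 0.2], [-0.1, -0.2], [0.3, 0.1], [-0.3, -0.1]]
R = total_observation_error_covariance([0.25, 0.25], r_samples)
```

Because the sample covariance removes the time mean, any representation error bias is not contained in F and would need to be treated separately.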

Estimation
We use the methodology proposed in Schutgens et al. (2016) to calculate the difference between high-resolution observations and low-resolution model observation equivalents. We refer the reader to Schutgens et al. (2016) for a full description of the method; we present a brief overview in this section. A high-resolution gridded model field is used to generate both the high-resolution observations and low-resolution model observation equivalents. The 3D high-resolution model grid-points are given coordinates $(i, j, k)$ such that $(i, j)$ are the horizontal coordinates, $i = 1, \ldots, n_i$ and $j = 1, \ldots, n_j$, with vertical coordinates $k = 1, \ldots, n_k$. The (true) value of the model field, $x$, at time $t$ and location $(i, j, k)$ is denoted $x^{\mathrm{truth}}_{ijkt}$. Thus we define a perfect error-free direct observation at point $(i, j, k)$ as
$$y_{ijkt} = x^{\mathrm{truth}}_{ijkt}. \tag{5}$$
The low-resolution model field at the same location, $x^m_{ij\hat{k}t}$, is defined by taking a weighted average over a $(2\Delta_i + 1) \times (2\Delta_j + 1) \times (2\Delta_k + 1)$ region of the high-resolution field,
$$x^m_{ij\hat{k}t} = \sum_{i'=i-\Delta_i}^{i+\Delta_i} \sum_{j'=j-\Delta_j}^{j+\Delta_j} \sum_{k'=k-\Delta_k}^{k+\Delta_k} w_{i'j'k'} \, x^{\mathrm{truth}}_{i'j'k't}, \tag{6}$$
where $\Delta_i$, $\Delta_j$ and $\Delta_k$ are the longitudinal, latitudinal and vertical half-sizes of a low-resolution grid box in the coordinate indices and $w_{ijk}$ is a normalized weighting function. We note that in many models the vertical level height, $h^{\mathrm{truth}}_{ijk}$, has some dependence on the underlying terrain, with this dependence decreasing with increasing model level number. Because the low-resolution model fields are obtained by averaging across model levels, it is reasonable to assume that the value of the low-resolution model field variable is valid at the averaged model level height,
$$h^m_{ij\hat{k}} = \sum_{i'=i-\Delta_i}^{i+\Delta_i} \sum_{j'=j-\Delta_j}^{j+\Delta_j} \sum_{k'=k-\Delta_k}^{k+\Delta_k} w_{i'j'k'} \, h^{\mathrm{truth}}_{i'j'k'}. \tag{7}$$
Since the height of the high- and low-resolution fields may differ, we use $k$ to denote high-resolution model levels and $\hat{k}$ for the low-resolution model levels.
To calculate the representation error, it is necessary to calculate the observation model equivalent $y^m_{ijkt}$. The model equivalent of the observation is calculated by linearly interpolating the low-resolution model data from the two nearest model levels to the observation height,
$$y^m_{ijkt} = x^m_{ij\hat{k}'t} + \alpha \left( x^m_{ij\hat{k}''t} - x^m_{ij\hat{k}'t} \right). \tag{8}$$
Here $x^m_{ij\hat{k}'t}$ is the value at the nearest model level below the observation, $x^m_{ij\hat{k}''t}$ is the value at the nearest model level above the observation and
$$\alpha = \frac{h^{\mathrm{truth}}_{ijk} - h^m_{ij\hat{k}'}}{h^m_{ij\hat{k}''} - h^m_{ij\hat{k}'}}.$$
As the observation and model equivalent are defined to represent the same variable at the same location, the representation error consists only of a scale mismatch error and may be calculated by
$$r_{ijkt} = y_{ijkt} - y^m_{ijkt}. \tag{9}$$
Using samples of these representation errors calculated at different times, we are able to calculate, and examine the spatial variability of, the representation error bias, variance and correlation.
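The vertical interpolation step and the resulting representation error can be sketched for a single model column as follows. This is a minimal illustration in plain Python under stated assumptions: the level heights are taken to increase monotonically, an observation with no bracketing pair of levels yields no model equivalent, and all function names and numerical values are hypothetical.

```python
def model_equivalent(obs_height, level_heights, level_values):
    """Linearly interpolate low-resolution level values to the observation height.

    `level_heights` must be increasing. Returns None when the observation
    height is not bracketed by two model levels.
    """
    for lower in range(len(level_heights) - 1):
        h_below, h_above = level_heights[lower], level_heights[lower + 1]
        if h_below <= obs_height <= h_above:
            # Fractional distance of the observation between the two levels.
            alpha = (obs_height - h_below) / (h_above - h_below)
            x_below, x_above = level_values[lower], level_values[lower + 1]
            return x_below + alpha * (x_above - x_below)
    return None


def representation_error(obs_value, obs_height, level_heights, level_values):
    """Observation minus its interpolated model equivalent (None if undefined)."""
    y_m = model_equivalent(obs_height, level_heights, level_values)
    return None if y_m is None else obs_value - y_m


# Hypothetical column: averaged level heights (m) and low-resolution temperatures (K).
heights = [100.0, 200.0, 400.0]
temperatures = [280.0, 279.0, 277.0]

r = representation_error(279.8, 150.0, heights, temperatures)
r_below_lowest_level = representation_error(280.5, 50.0, heights, temperatures)
```

In this example the observation at 150 m lies midway between the lowest two levels, so the model equivalent is 279.5 K; the observation at 50 m lies below the lowest level and no model equivalent is returned.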

MODEL DATA
In this section we describe the Met Office London Model used to generate data for this study. We then describe the specific data and methodology used in our experiments.

The Met Office London Model
The London Model is run routinely and used to produce T+36 hr forecasts at 0600 and 2100 UTC. One of the major differences between the London Model and the Met Office's UK variable resolution (UKV) model is the improved resolution of orography and land use data (including the vegetation fractions, urban morphology, land/sea mask and aerosol emissions). The increased resolution of the underlying orography, shown in Figure 1, allows the hills and valleys in the domain to be much better resolved than in lower-resolution models. According to Lean et al. (2019), who carried out experiments with a suite of high-resolution research models over London in comparison with high-resolution observations, model grid-lengths of order 100 m may be sufficient for predicting many bulk and statistical properties of convective boundary layers.

Case-study data and methodology
To estimate representation error for the UKV model we use data from all 600 T+6 hr forecasts (i.e., forecasts valid at 0300 and 1200 UTC) from the London Model between 6 March 2018 and 31 December 2018. The weather for this period is described by Kendon et al. (2019).
In this work we consider the scale mismatch uncertainty for four variables: air temperature, specific humidity, meridional and zonal wind. Using the method given in Section 2.2 we estimate the representation error for observations on the order of a 1.5 km grid. We assume that we are able to directly observe variables at the position of the model grid. Hence, the observations are taken to be every fifth grid point from the high-resolution grid. We note that it is possible to generate indirect observations and their model equivalents, but this has potential to introduce additional sources of error. In this study we consider only direct observations as we focus solely on estimating the scale mismatch error.
We derive our low-resolution model fields using Equation (6) with $\Delta_i = \Delta_j = 2$ and $\Delta_k = 0$, with equal weight given to each point, that is $w_{ijk} = 1/25$. (We note that, in the calculation of the low-resolution model fields, we do not average across vertical levels because, at present, the London Model and the UKV use the same vertical levels.) This horizontal reduction in resolution results in an $N_i \times N_j = 80 \times 72$ low-resolution lat-lon grid that has a spacing approximately equal to the UKV model. Data within 3 km of the high-resolution model lateral boundaries are neglected from the calculation to ensure that our estimates are not affected by boundary condition spin-up. To ensure our observation and model equivalent are at consistent heights, we calculate the observation model equivalent by interpolating the model data from the two nearest model levels to the observation height using Equation (8). We note that in the lowest levels it is not always possible to interpolate between levels as the observation height may be lower than the lowest model level. In operational data assimilation, it is common that the observation value is adjusted to be representative of the model surface height. However, here we will neglect the data from the lowest two levels to ensure that all our model equivalents have been calculated in a consistent manner.
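The horizontal coarse-graining described above can be illustrated on a single 2D level. The sketch below, in plain Python, takes every fifth grid point as an observation and uses an equally weighted 5 × 5 box mean (weights of 1/25) as the low-resolution value. The function name and the idealized test field are hypothetical, and the exclusion of points near the lateral boundaries and the vertical interpolation are omitted for brevity.

```python
def coarse_grain(field, half_width=2, stride=5):
    """Return (observations, model_values) for box centres on a 2D field.

    Each observation is the high-resolution value at a box centre; each model
    value is the equally weighted mean over the surrounding
    (2 * half_width + 1) ** 2 high-resolution points.
    """
    n_i, n_j = len(field), len(field[0])
    n_box = (2 * half_width + 1) ** 2
    observations, model_values = [], []
    for i in range(half_width, n_i - half_width, stride):
        obs_row, model_row = [], []
        for j in range(half_width, n_j - half_width, stride):
            box_sum = sum(
                field[a][b]
                for a in range(i - half_width, i + half_width + 1)
                for b in range(j - half_width, j + half_width + 1)
            )
            obs_row.append(field[i][j])
            model_row.append(box_sum / n_box)
        observations.append(obs_row)
        model_values.append(model_row)
    return observations, model_values


# Idealized 10 x 10 field with curvature in one direction (value = i squared).
high_res = [[float(i * i) for _ in range(10)] for i in range(10)]
obs, model = coarse_grain(high_res)

# Scale mismatch error at each box centre: observation minus box mean.
errors = [[o - m for o, m in zip(o_row, m_row)] for o_row, m_row in zip(obs, model)]
```

For this quadratic field the box mean exceeds the central value by 2 at every centre, so each error equals −2. A field that varies linearly across the box would give zero error, which shows that the scale mismatch arises from sub-box structure rather than from the mean gradient.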
Using the high-resolution observation and low-resolution model data, we are able to calculate the representation error (using Equation (9)) for all available data in our case-study period. We also use Equation (9) to provide an 'orography representation error', $o_{ij}$, that is, the difference between the high-resolution and low-resolution orography fields, for each grid point. We note that we calculate the representation error at each observation location, and therefore have a 3D estimate of representation error, $r_{ijkt}$, for each of the $N_t = 600$ T+6 hr forecasts produced by the London Model during the case-study period. Averaging the estimated scale mismatch errors over the period of the case-study provides climatological statistics appropriate for a 1.5 km model. We estimate the representation error bias and standard deviation for all of the observation locations on our 3D grid. These results will be used to evaluate the spatial distribution of representation error. In particular we will consider the horizontal distribution of estimated bias and variance fields for three representative model levels. The levels selected are as follows: level 5 (approximately 133 m, in the boundary layer), level 20 (approximately 1,605 m, around the region of the boundary-layer top), and level 40 (approximately 5,872 m, in the mid-troposphere). To further assess the results, we also consider the mean and absolute maximum representation error bias and standard deviation for each model level. The equations for the metrics we calculate are given in Table 1. We also consider how representation error is correlated in the zonal, meridional and vertical directions.
It is known that the representation error uncertainty is state- and time-dependent, with the error being larger during convective events and for variables that have smaller-scale features, for example, humidity (Waller et al., 2014). Hence, we note that, by calculating the scale-mismatch error statistics over the case-study period, we may overestimate or underestimate the uncertainty for specific weather events. However, our results provide an initial climatological estimate of representation error, for the four chosen variables, which could be included in a data assimilation scheme, or used to guide the interpretation of verification results.
Table 1. Equations for representation error (RE) bias and standard deviation, mean and absolute maximum representation error bias and standard deviation, and Pearson correlation coefficient.
Note: The Pearson coefficient equation describes the correlation between estimated representation error mean and orography representation error; the correlation coefficient between estimated representation error standard deviation and absolute value of orography representation error is calculated using the same equation with $r_{ijk}$ and its associated mean replaced with $|r_{ijk}|$ and its associated mean.
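The kinds of metric named in Table 1 can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than a transcription of the table: the standard deviation uses an N − 1 normalization, the Pearson coefficient takes its standard form, and the function names, error samples and orography differences are all hypothetical.

```python
import math


def bias_and_std(errors_over_time):
    """Time mean (bias) and sample standard deviation at one location."""
    n = len(errors_over_time)
    bias = sum(errors_over_time) / n
    variance = sum((e - bias) ** 2 for e in errors_over_time) / (n - 1)
    return bias, math.sqrt(variance)


def level_mean_and_abs_max(values):
    """Mean and absolute maximum of a statistic over all points on one level."""
    return sum(values) / len(values), max(abs(v) for v in values)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical errors at three locations on one level (four times each).
errors = [
    [0.1, 0.3, 0.1, 0.3],
    [-0.2, -0.2, -0.2, -0.2],
    [0.0, 0.0, 0.0, 0.0],
]
# Hypothetical orography differences (m) at the same three locations.
orography_error = [30.0, -30.0, 0.0]

stats = [bias_and_std(e) for e in errors]
biases = [b for b, _ in stats]
stds = [s for _, s in stats]
mean_bias, max_abs_bias = level_mean_and_abs_max(biases)
bias_orography_corr = pearson(biases, orography_error)
```

In this constructed example the level-mean bias is close to zero while the absolute maximum bias is 0.2, illustrating how a level can be unbiased on average yet contain locally biased points.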
To assess the relative importance of the representation error variances that we find, we compare our results against the WMO OSCAR observation uncertainty requirements for high-resolution NWP (WMO). The WMO provides three different levels of uncertainty requirement: threshold (minimum requirement for data to be useful), breakthrough (data will provide significant but not optimal benefit) and goal (ideal requirement above which further improvements are not necessary). Here we use the goal uncertainty. For the four variables considered in this study, the WMO OSCAR goal observation uncertainty requirements are detailed in Table 2. For specific humidity, the desired error is given as a percentage of the variable value. We therefore calculate the average specific humidity at each of our model levels of interest (levels 5, 20 and 40) and calculate the desired uncertainty. This information is included in Table 3. We remind the reader that the representation error forms part of the total observation error and therefore values of representation error lower than those given in Table 2 do not necessarily imply that the representation error is negligible.
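The comparison against an uncertainty requirement amounts to a simple ratio, sketched below in plain Python. The function names are hypothetical, and the numerical inputs are assumptions for illustration only: the 0.5 K temperature requirement is chosen to be consistent with the percentages quoted in the results rather than copied from Table 2, and the humidity values are invented.

```python
def percent_of_requirement(error, requirement):
    """Express an error magnitude as a percentage of an uncertainty requirement."""
    return 100.0 * abs(error) / requirement


def humidity_requirement(mean_specific_humidity, relative_requirement):
    """Absolute requirement when it is specified as a fraction of the variable value."""
    return relative_requirement * mean_specific_humidity


# Temperature bias of 0.14 K against an assumed 0.5 K requirement.
temperature_percent = percent_of_requirement(0.14, 0.5)

# Hypothetical level-mean specific humidity (kg/kg) and a hypothetical 2% requirement.
q_requirement = humidity_requirement(0.007, 0.02)
humidity_percent = percent_of_requirement(0.00004, q_requirement)
```

Because the humidity requirement scales with the level-mean humidity, the same absolute error corresponds to a larger percentage at drier, higher levels.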

RESULTS
In this section we present our estimated representation error statistics for air temperature, specific humidity, zonal and meridional wind. In Section 4.1 we present the spatial distribution of the scale-mismatch error bias and variance. We present the vertical and horizontal correlations in Sections 4.2 and 4.3 respectively.

Spatial distribution of representation error bias and variance
We begin this section by giving a basic description of all the figures presented in this section. In subsequent paragraphs we give a more detailed discussion of the results, where we find it instructive to discuss the figures in tandem. We first consider the horizontal estimated bias and variance fields for the three model levels discussed in Section 3. Other model levels within the appropriate atmospheric layers exhibit qualitatively similar behaviour to the results shown. The results for temperature, specific humidity, zonal and meridional wind are plotted in Figures 2-5 respectively. We note that, for each variable and each model level, the colour scale of the plots varies; this is to allow the spatial distributions to be evaluated. Due to the model staggered grid, the winds are plotted at slightly different heights from the temperature and specific humidity. Results from a stratospheric model level will also be discussed, but results are not plotted. We also consider if there is a relationship between the estimated representation error bias or standard deviation and the 'orography representation error' (the difference between the high-resolution orography and its low-resolution equivalent). For temperature, the Pearson correlation coefficient between the representation error mean (standard deviation) and the orography representation error (absolute value) for each model level are plotted in Figure 6. The absolute maximum and mean values for bias and standard deviation at all model levels for temperature, specific humidity, zonal and meridional winds are plotted in Figure 7.
Figures 2a, 3a, 4a and 5a show that the bias in the boundary layer is considerable in some locations, with bias magnitudes of 0.14 K, 0.00004 kg⋅kg⁻¹ and 0.14 m⋅s⁻¹ for temperature, specific humidity and wind respectively. (This is comparable to approximately 28, 29 and 14% of the WMO goal uncertainty requirements.)
It is also notable that the underlying orography structure (shown in Figure 1) is visible in the representation error bias for all variables, particularly temperature and specific humidity. The bias has a larger magnitude over areas with higher surface elevation. Figure 6 quantifies the correlation between the representation error bias and difference between the high- and low-resolution orography for temperature. Despite the visible structure of the orography in the representation error bias (Figure 2), there is no substantial correlation between the orography difference and temperature bias; this is also the case for the other variables (not shown).
Figure 2. Spatial distribution of (a, b, c) representation error bias and (d, e, f) standard deviation for temperature (K) at the selected model levels.
We now consider the behaviour of representation error bias with height. By comparing panels (a, b, c) of Figures 2-5, we see that for all variables the bias decreases with height; for example, the magnitude of bias for temperature reduces from 0.14 K (28% of the WMO uncertainty requirements) in the boundary layer (Figure 2a) to 0.025 K (5%) in the troposphere (Figure 2c). Whilst the magnitude of the bias decreases with height, the structure of the orography is visible and persists throughout the boundary layer and troposphere; only in the stratosphere is the bias for all variables independent of the orography. The visible structure of the orography in the representation error bias fields is expected due to the large influence of the high-resolution orography and land use data in the London Model. We note that the influence of the orography is less strong and decreases more rapidly for the wind variables than for temperature and specific humidity. Figure 7 confirms that the absolute maximum bias decreases with height for all variables.
We see that the absolute maximum bias for temperature is non-zero up to 15 km. For both temperature and specific humidity, the bias is largest in the lowest 5 km whereas for the wind variables the absolute maximum bias is only large in the lowest model levels. Figure 7 also shows that model level average representation error is unbiased.
The difference between the absolute maximum and mean biases highlights the spatial variability of the representation error bias seen in Figures 2-5. This suggests that it will be necessary to account for the spatial dependence of representation error bias in the data assimilation system.
By comparing panels (d, e, f) of Figures 2-5 we see that, as with the bias, for all variables the standard deviation decreases with height. For example, the estimated standard deviations for wind are 0.56 m⋅s⁻¹ and account for 56% of the WMO uncertainty requirement in the boundary layer, whereas in the troposphere this reduces to 0.18 m⋅s⁻¹ (18%). This is expected since the natural variability of the analysed fields decreases with height and therefore the difference between high-resolution observation and low-resolution model is expected to decrease and become less variable. The Pearson correlation coefficient for temperature representation error standard deviation and orography representation error plotted in Figure 6 (black line) shows that for temperature the standard deviation is related to the orography throughout the boundary layer and troposphere. In contrast, for specific humidity and winds the representation error standard deviation is only related to the orography in the lowest levels. At level 20 for all variables (and level 5 for specific humidity) there is a decrease in the standard deviation in a region just southeast of the centre of the domain corresponding to the city of London. It is possible that this is related to the change in roughness parameter associated with the city compared to its surrounding areas. For the model levels in the stratosphere (not shown) the error standard deviations for all variables are very small and there is no obvious spatial variation. From Figure 7a,c,d we see that the absolute maximum and average representation error standard deviations for temperature and wind are non-zero at all model levels. For specific humidity above 10 km, both the absolute maximum and average representation errors are very small, and it is likely that for these regions the representation error could be neglected without detriment to the analysis. Figure 7 allows us to compare the magnitude of the bias with that of the standard deviation.
For the specific humidity and wind variables, the absolute maximum and mean bias are always smaller than the absolute maximum and mean standard deviation. However, for temperature the absolute maximum bias is of similar magnitude to the mean standard deviation throughout the troposphere. Idealized studies have shown that it is more important to account for the bias caused by representation error rather than the representation uncertainty (Bell et al., 2020). Therefore, our results suggest that the treatment of representation error bias may be more important for temperature, whereas correctly accounting for the error variances may be more important for specific humidity and wind.

Representation error vertical correlation
We next consider how the representation errors are correlated in the vertical. Figure 8 shows the correlations between the representation errors at different model levels for temperature, specific humidity and zonal and meridional wind.
The representation error for temperature has significant correlation (defined as absolute value of correlation greater than 0.2) in the boundary layer. The representation error is positively correlated within the boundary layer; however, the levels within the boundary layer are weakly anti-correlated to those levels near the boundary-layer top. We hypothesize that this negative correlation is due to the variation in the boundary-layer top since the calculations average representation error differences over the year, and day and night. Between the boundary-layer top and the tropopause, the representation error still exhibits some weak positive correlation. The estimated representation error correlations resemble the temperature background-error correlations that exist when mixing occurs between the air in the boundary layer and the free atmosphere above (Fowler et al., 2010).
The representation error correlations for specific humidity exhibit similar behaviour to those for temperature, with the strongest correlation between model levels in the boundary layer. However, unlike temperature, there are no negative correlations in the region of the boundary-layer top.
Figure 6. Pearson correlation coefficient for temperature (K) as a function of model level, calculated as described in Table 1, between estimated representation error mean and orography representation error (grey), and between estimated representation error standard deviation and absolute value of orography representation error (black).
The representation error correlations for zonal and meridional wind are qualitatively similar. Representation error is positively correlated in the first ten levels; negative correlation is seen between the surface levels and the boundary-layer top. Above 2 km there is no significant representation error correlation between model levels. The change in sign of the correlations in the boundary layer is a consequence of the typical dynamics of wind in the boundary layer, and is also seen in background-error covariance models for divergent wind (Brousseau et al., 2011).
We note that, for the vertical error correlations calculated here, we have assumed that observations in a single column are directly above each other. In operational data assimilation it is unlikely that profiles of direct observations will share exactly the same latitudinal and longitudinal position, and in this case the representation error correlation should ideally account for both a horizontal and vertical component.
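A vertical correlation matrix of the kind discussed in this section can be estimated from samples of representation error in a single column, as in the following minimal sketch in plain Python. The function names and sample values are hypothetical; the sketch assumes the observations in a column lie directly above one another and that every level has non-zero variance. The 0.2 threshold follows the definition of significant correlation used in the text.

```python
import math


def correlation_matrix(samples):
    """Pearson correlation between columns of `samples`.

    `samples[t][k]` is the representation error at time t and model level k.
    """
    n = len(samples)
    p = len(samples[0])
    means = [sum(row[k] for row in samples) / n for k in range(p)]
    anomalies = [[row[k] - means[k] for k in range(p)] for row in samples]

    def product_sum(a, b):
        return sum(row[a] * row[b] for row in anomalies)

    return [
        [
            product_sum(a, b) / math.sqrt(product_sum(a, a) * product_sum(b, b))
            for b in range(p)
        ]
        for a in range(p)
    ]


def significant_pairs(corr, threshold=0.2):
    """Level pairs whose absolute correlation exceeds the threshold."""
    p = len(corr)
    return [(a, b) for a in range(p) for b in range(a + 1, p) if abs(corr[a][b]) > threshold]


# Hypothetical errors at three levels of one column over four times.
column_errors = [
    [1.0, 2.0, 1.0],
    [-1.0, -2.0, 1.0],
    [2.0, 4.0, -1.0],
    [-2.0, -4.0, -1.0],
]
corr = correlation_matrix(column_errors)
pairs = significant_pairs(corr)
```

Here the lowest two levels are perfectly correlated and the third level is uncorrelated with both, so only the pair (0, 1) is flagged.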

Representation error horizontal correlation
We finally consider horizontal correlation for representation error. For each model level we calculate correlations in the meridional and zonal directions. Our results (not shown) suggest that, for all levels and for all variables in both the meridional and zonal directions, there is no significant representation error correlation. This result differs from previous findings which show that representation error (calculated on a different scale, and using a different method, to the results presented here) can exhibit significant horizontal correlations (Waller et al., 2014). Given that representation error has been shown to be state-dependent (Janjić and Cohn, 2006), it is possible that our results show no significant correlation as any structure has been averaged out as we have used representation error samples across a nine-month period.
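One simple way to examine horizontal correlation along a single direction is to average the Pearson correlation over all pairs of points separated by a given number of grid points. The sketch below, in plain Python, is an assumed illustration and not necessarily the calculation used to obtain the results above; the function names and sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lag_correlation(errors, lag):
    """Mean correlation between points `lag` grid points apart.

    `errors[t][i]` is the representation error at time t and position i
    along one horizontal direction on a single model level.
    """
    n_points = len(errors[0])
    series = [[row[i] for row in errors] for i in range(n_points)]
    pairs = [(series[i], series[i + lag]) for i in range(n_points - lag)]
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical errors at three neighbouring points over four times,
# constructed so that the time series at different points are uncorrelated.
row_errors = [
    [1.0, 1.0, 1.0],
    [-1.0, 1.0, -1.0],
    [1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
correlations = [lag_correlation(row_errors, lag) for lag in range(3)]
```

For these constructed samples the correlation is 1 at zero lag and 0 at lags of one and two grid points, which is the pattern expected when errors at neighbouring observation locations are uncorrelated.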

CONCLUSIONS
Scale mismatch errors arise in data assimilation when observations contain information on different spatial and temporal scales compared to the model they are assimilated into. When the observations contain information on a smaller scale than the assimilating model, the representation errors can be biased and correlated. Furthermore, these errors are known to be dependent on the particular weather regime. It has been usual in NWP to account for the scale mismatch error by inflating the observation-error variance. Recent work has attempted to estimate covariance matrices for a variety of observations, but these estimated matrices are approximate, and combine all sources of observation uncertainty, and hence it is not simple to assess the contribution of the scale mismatch error. In this manuscript we investigate the importance of scale mismatch error for four variables in regional data assimilation. We estimate representation errors for temperature, specific humidity, zonal and meridional wind by taking the difference between perfect observations and low-resolution model data all simulated using high-resolution model data. Our high-resolution data are provided by T+6 hr forecasts from the Met Office's experimental London Model, and we simulate low-resolution model data with a resolution equivalent to the Met Office UKV convection-permitting model. We note that, despite its high resolution, the London Model is still not able to resolve all scales and processes and hence our estimates should be treated as a lower bound to representation error for high-resolution observations. Our results show that representation error bias and variance, for all the variables analysed, are largest in the lower model levels.
Previous work (e.g., Boutle et al., 2016) has shown that the increased resolution orography has significant impact on increasing variability in the London Model fields, so it is intuitive that the largest differences between our simulated perfect observation and simulated low-resolution model will occur in lower levels where the resolution of the orography impacts the model fields most. In addition to the orography, the land use data, including the vegetation, urban morphology and aerosol emissions may also contribute to the representation error at lower model levels, though we have not been able to quantify their contribution in this investigation. For temperature it is shown that the absolute maximum representation error bias has a similar magnitude to the mean standard deviation. Bell et al. (2020) showed that it is more important to account for bias caused by representation error than the representation error variance; therefore, this may be particularly important for temperature and specific humidity. We show that estimated correlations in the horizontal are insignificant. This differs from previous work with lower-resolution data over a 1.5 hr window (Waller et al., 2014), which has shown that representation errors can exhibit spatial correlations. It is known that horizontal representation errors are state- and time-dependent. The absence of horizontal correlations in our results may be due to averaging the data over a large case-study period. Furthermore, the previous work used a different method to calculate the scale-mismatch errors which required a periodicity assumption which we do not make here.
In contrast to the horizontal correlations, significant correlations exist between vertical levels for all of the variables analysed. For temperature and specific humidity, large positive representation error correlations persist throughout model levels in the boundary layer; weaker correlations are also present between the levels in the troposphere. For wind, the representation error is positively correlated in the boundary layer, but negatively correlated at the boundary-layer top. Our finding that the correlation of scale mismatch error is much greater in the vertical than in the horizontal is consistent with a similar result shown in Fielding and Stiller (2019).
The experiments described in this manuscript have allowed us to examine, in isolation, the scale mismatch component of the representation error. Our results suggest that, for convective-scale data assimilation, the scale mismatch error will be an important contributor to the observation error in the boundary layer. Furthermore, significant biases and vertical correlations exist which should be accounted for if observations are to be accurately assimilated and maximum information extracted.