Forecast verification: Relating deterministic and probabilistic metrics

The philosophy of forecast verification is rather different between deterministic and probabilistic verification metrics: generally speaking, deterministic metrics measure differences, whereas probabilistic metrics assess reliability and sharpness of predictive distributions. This article considers the root‐mean‐square error (RMSE), which can be seen as a deterministic metric, and the probabilistic metric Continuous Ranked Probability Score (CRPS), and demonstrates that under certain conditions, the CRPS can be mathematically expressed in terms of the RMSE when these metrics are aggregated. One of the required conditions is the normality of distributions. The other condition is that, while the forecast ensemble need not be calibrated, any bias or over/underdispersion cannot depend on the forecast distribution itself. Under these conditions, the CRPS is a fraction of the RMSE, and this fraction depends only on the heteroscedasticity of the ensemble spread and the measures of calibration. The derived CRPS–RMSE relationship for the case of perfect ensemble reliability is tested on simulations of idealised two‐dimensional barotropic turbulence. Results suggest that the relationship holds approximately despite the normality condition not being met.


INTRODUCTION
Operational numerical weather prediction (NWP) centres use a range of metrics to monitor and communicate forecast performance and make decisions about model upgrades. These metrics, which summarise information contained in forecasts and verifying analyses and convert them into scalar values, can broadly be divided into two categories. The first category of metrics quantifies differences between a single forecast and the verification field. Since these metrics depend on only one forecast state, they can be viewed as deterministic metrics. They are often used to communicate forecast skill to the public (Bauer et al., 2015) as well as in theoretical predictability studies (e.g., Lorenz, 1969; Leith, 1974). On the other hand, probabilistic metrics measure the sharpness and reliability of forecast distributions generated by ensembles of forecasts. The philosophy of probabilistic verification is therefore rather different from that of deterministic verification. That being said, it is intuitive to expect that, as probabilistic forecasts evolve in time, the loss of information manifested by the widening of forecast distributions should somehow be matched by the growth of deterministic errors when individual ensemble members, or indeed the ensemble mean, are compared against the verifying analysis. Yet not much is known about whether this relationship can be quantified mathematically, beyond the fact that the ensemble spread should agree with the root-mean-square error (RMSE) of the ensemble mean when the forecast is reliable. There has been some progress in this direction, with Gneiting and Raftery (2007) and Leutbecher and Haiden (2021) establishing certain analytic formulae for the probabilistic metric Continuous Ranked Probability Score (CRPS). In this article we shall demonstrate further that, in a bulk sense and under certain conditions, the CRPS is a function of the RMSE of the ensemble members.
Furthermore, this RMSE can be related to the squared difference between the ensemble mean and the verifying analysis, which is in itself a deterministic verification metric. In this way, the CRPS-RMSE relationship may draw a link between deterministic and probabilistic verification.
The article is structured as follows. Section 2 introduces the CRPS and the RMSE, and discusses in what ways the RMSE can be interpreted as a deterministic verification metric. Our main result, the CRPS-RMSE relationship, is derived in Section 3. Its usefulness is explored for simulations of idealised two-dimensional (2D) barotropic turbulence in Section 4, where departures from the predicted relationship will be discussed in the light of the validity of the conditions imposed in the derivation. Section 5 summarises the results and concludes the article.

PRELIMINARIES
We adopt the notation of Gneiting and Raftery (2007) in respect of scoring rules for probabilistic predictions. Let $P$ denote the predictive distribution of a scalar random variable $U$ which materialises at value $u$. A scoring rule $S(P, u)$ is a function of the predictive distribution and the verification value. If, given a predictive distribution $P$, the verification value follows some (conditional) distribution $Q$, then the average score over many predictions with distribution $P$ can be denoted by $S(P, Q) := \mathbb{E}_Q[S(P, u)]$, with the second argument of the function $S(\cdot, \cdot)$ now being a distribution instead of a scalar value.¹ However, in contrast with the set-up of Gneiting and Raftery (2007), we shall assume that scores are negatively oriented, so that forecasts with lower scores are better. Hence proper scores over a given class $\mathcal{P}$ of distributions have the property
$$S(Q, Q) \le S(P, Q) \quad \text{for all } P, Q \in \mathcal{P}. \tag{1}$$
If, for every $Q \in \mathcal{P}$, the equality $S(Q, Q) = S(P, Q)$ holds only when $P$ and $Q$ are the same,² then the score is known to be strictly proper (Gneiting and Raftery, 2007). The special situation where $P = Q$ is known as the ensemble being reliable (although we acknowledge that other definitions and characterisations exist).

Continuous Ranked Probability Score
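The CRPS is a widely used metric that evaluates the full ensemble distribution of a continuous scalar variable and penalises unsharp distributions. It is the integral of the squared difference between the cumulative distribution function (CDF) of the forecast and that of the verification:
$$\mathrm{CRPS}(P, u) = \int_{-\infty}^{\infty} \left(F(x) - H_u(x)\right)^2 \, dx, \tag{2}$$
where $F(x)$ is the CDF of $P$ and $H_u(x)$ is the Heaviside function at the verification value $u$ (equal to 0 for $x < u$ and to 1 for $x \ge u$). An equivalent expression for the CRPS, often known as the 'kernel representation', is available for distributions $P$ whose first moments are finite:
$$\mathrm{CRPS}(P, u) = \mathbb{E}_P|U - u| - \tfrac{1}{2}\,\mathbb{E}_P|U - U'|, \tag{3}$$
where $U$ and $U'$ are independent random variables drawn from the distribution $P$ (Gneiting and Raftery, 2007). A proof of equivalence is provided in Lemmata 2.1 and 2.2 of Baringhaus and Franz (2004). Gneiting and Raftery (2007) noted that the CRPS is a strictly proper score over a very general class of distributions, namely the class of Borel probability measures whose first moments are finite. For the special case of normal distributions $P = \mathcal{N}(\mu_P, \sigma_P^2)$, an explicit formula for $\mathrm{CRPS}(P, u)$ is available (Gneiting and Raftery, 2007):
$$\mathrm{CRPS}(P, u) = \sigma_P\left[z\,\mathrm{erf}\!\left(\frac{z}{\sqrt{2}}\right) + \sqrt{\frac{2}{\pi}}\,e^{-z^2/2} - \frac{1}{\sqrt{\pi}}\right], \qquad z = \frac{u - \mu_P}{\sigma_P}, \tag{4}$$
where $\mathrm{erf}(z) := (2/\sqrt{\pi})\int_0^z e^{-y^2}\,dy$ is the error function. Denoting by $\phi$ the probability density function (PDF) of a standard normal random variable and by $\Phi$ its CDF, this formula can be obtained by substituting $F(x) = \Phi\big((x - \mu_P)/\sigma_P\big)$ into Equation (2), integrating by parts and invoking the identity
$$\int_{-\infty}^{\infty} \Phi(x)\left[1 - \Phi(x)\right] dx = \frac{1}{\sqrt{\pi}}. \tag{5}$$
Equation (4) can be integrated over a normal $\mathcal{N}(\mu_Q, \sigma_Q^2)$ kernel to yield a formula for the expected score $\mathrm{CRPS}(P, Q)$:
$$\mathrm{CRPS}(P, Q) = \int_{-\infty}^{\infty} \mathrm{CRPS}(P, u)\,\frac{1}{\sigma_Q}\,\phi\!\left(\frac{u - \mu_Q}{\sigma_Q}\right) du = r^2\sigma_Q \int_{-\infty}^{\infty} \left(x\,\mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right) + \sqrt{\frac{2}{\pi}}\,e^{-x^2/2} - \frac{1}{\sqrt{\pi}}\right)\phi(rx + b)\,dx, \tag{6}$$
where the second form follows from the substitution $x = (u - \mu_P)/\sigma_P$, for which $(u - \mu_Q)/\sigma_Q = rx + b$, and where
$$b = \frac{\mu_P - \mu_Q}{\sigma_Q} \tag{7}$$
is the relative bias and
$$r = \frac{\sigma_P}{\sigma_Q} \tag{8}$$
is the ratio of standard deviations, or simply the spread ratio. The Appendix demonstrates that the integral can be expressed analytically as
$$\mathrm{CRPS}(P, Q) = \sigma_Q\, f(b, r), \tag{9}$$
where
$$f(b, r) = \sqrt{\frac{2(1 + r^2)}{\pi}}\exp\!\left(-\frac{b^2}{2(1 + r^2)}\right) + b\,\mathrm{erf}\!\left(\frac{b}{\sqrt{2(1 + r^2)}}\right) - \frac{r}{\sqrt{\pi}}. \tag{10}$$

¹ Without ambiguity, $S(\cdot, \cdot)$ can mean either the score for an individual prediction or the expected score over many predictions, depending on whether the second argument is a scalar variable or a distribution.
² This should be understood in the sense of measure theory: $P$ and $Q$ only have to be equal up to a null set.

FIGURE 1 (a) $f(b, r)$, (b) the RMSE and (c) $\mathrm{CRPS}(P, Q)/\mathrm{RMSE}(P, Q)$ as functions of relative bias $b = (\mu_P - \mu_Q)/\sigma_Q$ and spread ratio $r = \sigma_P/\sigma_Q$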
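As a quick numerical sanity check of Equation (4), the closed form can be compared with direct quadrature of the definition in Equation (2). This is a sketch assuming NumPy and SciPy; the function names `crps_normal` and `crps_by_integration` are illustrative and not part of any verification package.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erf
from scipy.stats import norm


def crps_normal(u, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) for outcome u, as in Equation (4)."""
    z = (u - mu) / sigma
    return sigma * (z * erf(z / np.sqrt(2.0))
                    + np.sqrt(2.0 / np.pi) * np.exp(-0.5 * z**2)
                    - 1.0 / np.sqrt(np.pi))


def crps_by_integration(u, mu, sigma):
    """CRPS from its definition, Equation (2): integral of (F(x) - H_u(x))^2."""
    F = lambda x: norm.cdf(x, loc=mu, scale=sigma)
    # H_u(x) is 0 below u and 1 from u onwards, so split the integral at u.
    below, _ = quad(lambda x: F(x) ** 2, -np.inf, u)
    above, _ = quad(lambda x: (1.0 - F(x)) ** 2, u, np.inf)
    return below + above


for u in (-1.3, 0.0, 2.5):
    print(u, crps_normal(u, 0.4, 1.7), crps_by_integration(u, 0.4, 1.7))
```

The two columns should agree to quadrature accuracy; the closed form is the one worth using in practice, since it is vectorised and exact.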
Note that, provided the verifying distribution $Q$ is fixed, the qualitative properties of $\mathrm{CRPS}(P, Q)$ are fully determined by the function $f(b, r)$, which is shown in Figure 1a. This formula for $\mathrm{CRPS}(P, Q)$ agrees exactly with the one obtained by Leutbecher and Haiden (2021), who used the kernel representation of the CRPS (Equation (3)) as the starting point of their derivation.
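The closed form in Equations (9) and (10) can likewise be checked by averaging Equation (4) over many verification values drawn from $Q$. In this sketch the means and standard deviations are arbitrary illustrative choices, and the Monte Carlo estimate should agree with $\sigma_Q f(b, r)$ only up to sampling error.

```python
import numpy as np
from scipy.special import erf


def crps_normal(u, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) for outcome u (Equation (4))."""
    z = (u - mu) / sigma
    return sigma * (z * erf(z / np.sqrt(2.0))
                    + np.sqrt(2.0 / np.pi) * np.exp(-0.5 * z**2)
                    - 1.0 / np.sqrt(np.pi))


def f_br(b, r):
    """Dimensionless expected CRPS, CRPS(P, Q) / sigma_Q (Equation (10))."""
    s2 = 2.0 * (1.0 + r**2)
    return (np.sqrt(s2 / np.pi) * np.exp(-b**2 / s2)
            + b * erf(b / np.sqrt(s2))
            - r / np.sqrt(np.pi))


rng = np.random.default_rng(0)
mu_q, sigma_q = 1.0, 2.0          # verifying distribution Q (illustrative)
mu_p, sigma_p = 1.6, 1.2          # predictive distribution P (illustrative)
b = (mu_p - mu_q) / sigma_q       # relative bias, Equation (7)
r = sigma_p / sigma_q             # spread ratio, Equation (8)

u = rng.normal(mu_q, sigma_q, size=1_000_000)   # verifications drawn from Q
crps_mc = crps_normal(u, mu_p, sigma_p).mean()  # Monte Carlo estimate of CRPS(P, Q)
crps_exact = sigma_q * f_br(b, r)               # Equation (9)
print(crps_mc, crps_exact)
```

For a reliable forecast, $(b, r) = (0, 1)$, the function reduces to $f(0, 1) = 1/\sqrt{\pi}$.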

Root-mean-square error
The root-mean-square error (RMSE) is the square root of the ensemble members' mean squared error (MSE) from the verification value. The latter is defined as
$$\mathrm{MSE}(P, u) := \mathbb{E}_P\left[(U - u)^2\right] \tag{11}$$
for an outcome $u \in \mathbb{R}$ and a distribution $P$ of its forecast $U$. Mathematically speaking, this is the MSE of $u$ as an estimator of the forecast variable $U$, although this may seem somewhat counter-intuitive in a forecasting context. Nevertheless, the standard bias-variance decomposition of the MSE applies:
$$\mathrm{MSE}(P, u) = (\mu_P - u)^2 + \sigma_P^2, \tag{12}$$
where $\mu_P$ and $\sigma_P$ are respectively the mean and the standard deviation of $P$. Assuming that the verifying distribution $Q$ for $u$ has mean $\mu_Q$ and standard deviation $\sigma_Q$, the expected score $\mathrm{MSE}(P, Q)$ is
$$\mathrm{MSE}(P, Q) := \mathbb{E}_Q\left[\mathrm{MSE}(P, u)\right] = \mathbb{E}_Q\left[(\mu_P - u)^2\right] + \sigma_P^2 = (\mu_P - \mu_Q)^2 + \sigma_Q^2 + \sigma_P^2 = \sigma_Q^2\left(1 + b^2 + r^2\right), \tag{13}$$
where $b$ and $r$ are as in Equations (7) and (8). The second equality can be established by observing that $\mathbb{E}_Q[(\mu_P - u)^2]$ is the MSE of $\mu_P$ as an estimator of $u$, whence the same bias-variance decomposition applies. From Equation (13), it follows that
$$\mathrm{RMSE}(P, Q) := \sqrt{\mathrm{MSE}(P, Q)} = \sigma_Q\sqrt{1 + b^2 + r^2}. \tag{14}$$
Note that we have not defined $\mathrm{RMSE}(P, u)$. Should it be defined by taking the square root of Equation (11), then the $\mathrm{RMSE}(P, Q)$ defined in Equation (14) would generally not be equal to $\mathbb{E}_Q[\mathrm{RMSE}(P, u)]$. Hence, strictly speaking, the RMSE does not fit into the framework of scoring rules. It is simply a convenient transformation of the scoring rule $\mathrm{MSE}(P, u)$, with the advantage of having the same physical dimensions as the variable $u$ of interest. Given that the RMSE relates to the MSE bijectively and monotonically, we may nevertheless apply the concepts of scoring rules to the RMSE, bearing in mind that in this sense the two quantities are synonymous. The RMSE is not a proper score over any non-degenerate class of distributions (Gneiting and Raftery, 2007), as graphically confirmed in Figure 1b.
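The contrast between Equations (13)-(14) and the propriety of the CRPS can be seen numerically. The sketch below (NumPy/SciPy, illustrative parameter values) first checks Equation (13) by Monte Carlo and then, for an unbiased forecast, scans the spread ratio $r$: the expected CRPS is minimised at the reliable value $r = 1$, whereas the RMSE keeps decreasing towards the degenerate forecast $r = 0$, consistent with its impropriety.

```python
import numpy as np
from scipy.special import erf


def f_br(b, r):
    """CRPS(P, Q) / sigma_Q for normal P and Q (Equation (10))."""
    s2 = 2.0 * (1.0 + r**2)
    return (np.sqrt(s2 / np.pi) * np.exp(-b**2 / s2)
            + b * erf(b / np.sqrt(s2))
            - r / np.sqrt(np.pi))


def rmse_over_sigma_q(b, r):
    """RMSE(P, Q) / sigma_Q for normal P and Q (Equation (14))."""
    return np.sqrt(1.0 + b**2 + r**2)


# Monte Carlo check of Equation (13): many cases, 50 forecast draws and one verification each.
rng = np.random.default_rng(1)
mu_q, sigma_q, mu_p, sigma_p = 0.0, 1.5, 0.3, 0.9      # illustrative values
forecasts = rng.normal(mu_p, sigma_p, size=(20_000, 50))
verif = rng.normal(mu_q, sigma_q, size=(20_000, 1))
mse_mc = np.mean((forecasts - verif) ** 2)
b, r = (mu_p - mu_q) / sigma_q, sigma_p / sigma_q
mse_exact = sigma_q**2 * (1.0 + b**2 + r**2)

# With b = 0, scan the spread ratio: the CRPS is smallest at r = 1 (proper),
# whereas the RMSE is smallest at the degenerate forecast r = 0 (improper).
r_grid = np.linspace(0.0, 3.0, 3001)
r_best_crps = r_grid[np.argmin(f_br(0.0, r_grid))]
r_best_rmse = r_grid[np.argmin(rmse_over_sigma_q(0.0, r_grid))]
print(mse_mc, mse_exact, r_best_crps, r_best_rmse)
```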
The RMSE discussed here should not be confused with the RMSE of the ensemble mean, which is based on the MSE of the ensemble mean, defined as
$$\mathrm{MSE}_{\mathrm{mean}}(P, u) := (\mu_P - u)^2. \tag{15}$$
By verifying the ensemble mean as if it were a deterministic forecast in itself, $\mathrm{MSE}_{\mathrm{mean}}(P, u)$, and therefore its associated RMSE, can be seen as a score with deterministic roots. Compared with Equation (12), $\mathrm{MSE}_{\mathrm{mean}}(P, u)$ lacks the contribution from the ensemble variance $\sigma_P^2$. If the alternative formulation
$$\mathrm{MSE}_{\mathrm{mean}}(P, Q) = \mathbb{E}_Q\left[(\mu_P - u)^2\right] = \sigma_Q^2\left(1 + b^2\right)$$
were to be used in place of the MSE and RMSE defined in Equations (13) and (14), then all expressions involving $1 + b^2 + r^2$ throughout this article would have to be replaced by $1 + b^2$. (As a consequence, the multiplicative factor $\sqrt{2\pi}$ often mentioned in the forthcoming sections would become $\sqrt{\pi}$.)
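The change of normalising factor for the ensemble-mean variant follows from evaluating both ratios at the reliable point $(b, r) = (0, 1)$. A minimal sketch, reusing the closed form for $f(b, r)$ from Equation (10):

```python
import numpy as np
from scipy.special import erf


def f_br(b, r):
    """CRPS(P, Q) / sigma_Q for normal P and Q (Equation (10))."""
    s2 = 2.0 * (1.0 + r**2)
    return (np.sqrt(s2 / np.pi) * np.exp(-b**2 / s2)
            + b * erf(b / np.sqrt(s2))
            - r / np.sqrt(np.pi))


b, r = 0.0, 1.0                                          # reliable forecast
factor_members = np.sqrt(1 + b**2 + r**2) / f_br(b, r)  # RMSE/CRPS with the member-based MSE
factor_mean = np.sqrt(1 + b**2) / f_br(b, r)            # RMSE/CRPS with the ensemble-mean MSE
print(factor_members, np.sqrt(2 * np.pi))   # both ≈ 2.5066
print(factor_mean, np.sqrt(np.pi))          # both ≈ 1.7725
```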

DERIVATION OF THE CRPS-RMSE RELATIONSHIP
So far we have seen the basic mathematical properties of the CRPS and the RMSE. Since the former is proper while the latter is improper, it is generally impossible to draw a one-to-one correspondence between the two. Nevertheless, if we compare Equations (9) and (14), we obtain
$$\frac{\mathrm{CRPS}(P, Q)}{\mathrm{RMSE}(P, Q)} = \frac{f(b, r)}{\sqrt{1 + b^2 + r^2}}. \tag{16}$$
What Equation (16) suggests is that, on average, the CRPS and the RMSE are related through a multiplicative factor dependent on $b$ and $r$ as far as predictions of normally distributed scalar variables are concerned. This multiplicative factor as a function of $b$ and $r$ is shown in Figure 1c. The average, as defined in Subsection 2.1, refers to aggregation over a large number of cases that share the same predictive and verifying distributions ($P$ and $Q$). However, standard verification practice in NWP aggregates these scores across dimensions defined a priori, such as grid points and forecast start dates, rather than by predictive and verifying distributions. How can Equation (16) be modified to accommodate this?
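Equation (16) can be checked by Monte Carlo for a single pair $(P, Q)$: average the CRPS of Equation (4) and the MSE of Equation (12) over verifications drawn from $Q$, then divide the mean CRPS by the square root of the mean MSE. The values of $b$ and $r$ below are arbitrary; this is an illustration, not part of the derivation.

```python
import numpy as np
from scipy.special import erf


def crps_normal(u, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) for outcome u (Equation (4))."""
    z = (u - mu) / sigma
    return sigma * (z * erf(z / np.sqrt(2.0))
                    + np.sqrt(2.0 / np.pi) * np.exp(-0.5 * z**2)
                    - 1.0 / np.sqrt(np.pi))


def f_br(b, r):
    """CRPS(P, Q) / sigma_Q for normal P and Q (Equation (10))."""
    s2 = 2.0 * (1.0 + r**2)
    return (np.sqrt(s2 / np.pi) * np.exp(-b**2 / s2)
            + b * erf(b / np.sqrt(s2))
            - r / np.sqrt(np.pi))


rng = np.random.default_rng(2)
mu_q, sigma_q = 0.0, 3.0
b, r = -0.4, 1.3                                   # illustrative relative bias and spread ratio
mu_p, sigma_p = mu_q + b * sigma_q, r * sigma_q     # invert Equations (7) and (8)

u = rng.normal(mu_q, sigma_q, size=1_000_000)
crps_avg = crps_normal(u, mu_p, sigma_p).mean()
mse_avg = np.mean((mu_p - u) ** 2 + sigma_p**2)    # Equation (12) averaged over Q
ratio_mc = crps_avg / np.sqrt(mse_avg)
ratio_eq16 = f_br(b, r) / np.sqrt(1.0 + b**2 + r**2)
print(ratio_mc, ratio_eq16)
```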
It is important to bear in mind that, in the notation $S(P, Q)$ for a given score $S$, there is an implied conditioning on the predictive distribution being $P$, since $S(P, Q)$ is the average of $S(P, u)$ over many $P$-distributed predictions. $Q$ in this notation refers to the distribution of the verification value $u$, but it is also conditional upon the predictive distribution being $P$. To derive a formula for an aggregated score that takes into account the different possibilities of predictive distributions, it is therefore necessary to include information about the heteroscedasticity of $P$, that is, the relative frequency of occurrence of different predictive distributions. Since we have assumed that $P$ is normal, such heteroscedasticity can be interpreted as a joint meta-distribution $\Theta$ of the parameters $\mu_P$ and $\sigma_P$. In this case the aggregated score $S^*$ can be written as
$$S^* := \mathbb{E}_\Theta\left[S(P, Q)\right]. \tag{17}$$
This is the expectation of a conditional quantity, $S(P, Q)$. Without prescribing any specific forecast frequency distribution $\Theta$, we can only simplify this expression further by making the extra assumption that $S(P, Q)$ be in fact unconditional on $P$. This is equivalent to saying that all forecasts have the same relative bias $b$ and spread ratio $r$, regardless of $\mu_P$ and $\sigma_P$. It includes the case where all forecasts are reliable (i.e., $P = Q$, or $(b, r) = (0, 1)$), but also the more general case where forecasts are consistently biased or over/underdispersive by a certain percentage. Substituting CRPS for $S$ and using Equation (9), we have
$$\mathrm{CRPS}^* = \mathbb{E}_\Theta\left[\sigma_Q f(b, r)\right] = \frac{f(b, r)}{r}\,\mathbb{E}_\Theta\left[\sigma_P\right]. \tag{18}$$
Similarly, substituting MSE for $S$ and using Equation (13) gives
$$\mathrm{MSE}^* = \mathbb{E}_\Theta\left[\sigma_Q^2\left(1 + b^2 + r^2\right)\right] = \frac{1 + b^2 + r^2}{r^2}\,\mathbb{E}_\Theta\left[\sigma_P^2\right], \tag{19}$$
so that
$$\mathrm{RMSE}^* := \sqrt{\mathrm{MSE}^*} = \frac{\sqrt{1 + b^2 + r^2}}{r}\sqrt{\mathbb{E}_\Theta\left[\sigma_P^2\right]}. \tag{20}$$
Equations (18) and (20) thus provide expressions for the CRPS and the RMSE aggregated under heteroscedastic conditions, where $P$'s parameters can vary from grid point to grid point, and from one forecast start date to another. These expressions assume the normality of forecast distributions as well as the consistency of the relative bias and spread ratio across all forecasts.
Combining these expressions and denoting
$$h := \frac{\mathrm{Var}_\Theta\left[\sigma_P\right]}{\left(\mathbb{E}_\Theta\left[\sigma_P\right]\right)^2} \tag{21}$$
for the relative heteroscedasticity of the ensemble's standard deviation, so that $\mathbb{E}_\Theta[\sigma_P^2] = (1 + h)\left(\mathbb{E}_\Theta[\sigma_P]\right)^2$, we have
$$\frac{\mathrm{CRPS}^*}{\mathrm{RMSE}^*} = \frac{f(b, r)}{\sqrt{1 + b^2 + r^2}} \cdot \frac{1}{\sqrt{1 + h}}. \tag{22}$$
Here, we see that the ratio between the aggregated CRPS and the aggregated RMSE is the product of two terms: $f(b, r)/\sqrt{1 + b^2 + r^2}$, which depends only on the relative bias and the spread ratio; and $1/\sqrt{1 + h}$, which depends only on the relative heteroscedasticity of the ensemble spread. Since $h \ge 0$ by definition, it follows that $f(b, r)/\sqrt{1 + b^2 + r^2}$ is the upper bound of this CRPS-RMSE ratio (provided that the predictive and verifying distributions are both normal), which is the same as the multiplicative factor given in Equation (16) and shown graphically in Figure 1c. In the limit where the standard deviation $\sigma_P$ is homoscedastic, that is, $\mathrm{Var}_\Theta[\sigma_P] \to 0$, the bound is attained and Equation (16) is recovered.
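A Monte Carlo sketch of Equation (22): each case receives its own $\sigma_P$, drawn here from a lognormal meta-distribution $\Theta$ (an arbitrary, hypothetical choice), while $b$ and $r$ are held fixed as the derivation requires. Aggregating the CRPS and the MSE over all cases and comparing with the closed form exposes the extra factor $1/\sqrt{1 + h}$.

```python
import numpy as np
from scipy.special import erf


def crps_normal(u, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) for outcome u (Equation (4))."""
    z = (u - mu) / sigma
    return sigma * (z * erf(z / np.sqrt(2.0))
                    + np.sqrt(2.0 / np.pi) * np.exp(-0.5 * z**2)
                    - 1.0 / np.sqrt(np.pi))


def f_br(b, r):
    """CRPS(P, Q) / sigma_Q for normal P and Q (Equation (10))."""
    s2 = 2.0 * (1.0 + r**2)
    return (np.sqrt(s2 / np.pi) * np.exp(-b**2 / s2)
            + b * erf(b / np.sqrt(s2))
            - r / np.sqrt(np.pi))


rng = np.random.default_rng(3)
n_cases = 1_000_000
b, r = 0.2, 0.8                                                # fixed relative bias and spread ratio
sigma_p = rng.lognormal(mean=0.0, sigma=0.7, size=n_cases)     # hypothetical Theta for sigma_P
mu_p = rng.normal(0.0, 5.0, size=n_cases)                      # means do not affect the ratio
sigma_q = sigma_p / r                                          # keeps r fixed in every case
mu_q = mu_p - b * sigma_q                                      # keeps b fixed in every case
u = rng.normal(mu_q, sigma_q)                                  # one verification per case

crps_star = crps_normal(u, mu_p, sigma_p).mean()               # aggregated CRPS, Equation (17)
rmse_star = np.sqrt(np.mean((mu_p - u) ** 2 + sigma_p**2))     # aggregated RMSE
h = sigma_p.var() / sigma_p.mean() ** 2                        # Equation (21)
ratio_mc = crps_star / rmse_star
ratio_eq22 = f_br(b, r) / np.sqrt((1.0 + b**2 + r**2) * (1.0 + h))
print(h, ratio_mc, ratio_eq22)
```

With this choice $h \approx e^{0.49} - 1 \approx 0.63$, so the aggregated ratio sits roughly 22% below the homoscedastic bound of Equation (16).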
Using Equation (10), we can see that $f(0, 1) = 1/\sqrt{\pi}$, so that for reliable predictions of normally distributed random variables the CRPS-RMSE relationship simplifies to
$$\mathrm{CRPS}^* = \frac{\mathrm{RMSE}^*}{\sqrt{2\pi(1 + h)}}, \tag{23}$$
and the ratio $\mathrm{CRPS}^*/\mathrm{RMSE}^*$ is bounded above by $1/\sqrt{2\pi}$. The bound $f(b, r)/\sqrt{1 + b^2 + r^2}$ is robust to small biases, since it is an even function of $b$, so that its first-order sensitivity to $b$ vanishes at $b = 0$. For example, a 5% bias only increases this bound by 0.06%. It is more sensitive to small degrees of non-calibration of the ensemble spread: to first order, the bound increases by 0.5% for every 1% of under-dispersiveness of the ensemble, and vice versa. Table 1 provides more detail on how the bound responds to small departures from perfect ensemble reliability.
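The sensitivities quoted above can be reproduced from Equation (10). In this sketch, `bound` is a hypothetical helper name for the factor $f(b, r)/\sqrt{1 + b^2 + r^2}$, and the printed percentages are relative changes with respect to the reliable case $(b, r) = (0, 1)$.

```python
import numpy as np
from scipy.special import erf


def f_br(b, r):
    """CRPS(P, Q) / sigma_Q for normal P and Q (Equation (10))."""
    s2 = 2.0 * (1.0 + r**2)
    return (np.sqrt(s2 / np.pi) * np.exp(-b**2 / s2)
            + b * erf(b / np.sqrt(s2))
            - r / np.sqrt(np.pi))


def bound(b, r):
    """Upper bound f(b, r) / sqrt(1 + b^2 + r^2) of CRPS*/RMSE* (attained when h = 0)."""
    return f_br(b, r) / np.sqrt(1.0 + b**2 + r**2)


ref = bound(0.0, 1.0)
pct_bias = 100 * (bound(0.05, 1.0) / ref - 1)    # 5% relative bias
pct_under = 100 * (bound(0.0, 0.99) / ref - 1)   # 1% under-dispersion
pct_over = 100 * (bound(0.0, 1.01) / ref - 1)    # 1% over-dispersion
print(ref, 1 / np.sqrt(2 * np.pi))   # both ≈ 0.3989
print(pct_bias)                      # ≈ +0.06
print(pct_under, pct_over)           # ≈ +0.50 and ≈ -0.50
```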

THE CRPS-RMSE RELATIONSHIP IN AN IDEALISED 2D TURBULENCE MODEL
The CRPS-RMSE relationship for normally distributed random variables is numerically tested in an experiment involving 2D barotropic turbulence. Due to the nature of perfect-model idealised turbulence simulations, we are only able to test the relationship for reliable predictions (Equation (23)), where $P = Q$. The turbulence is governed by the equation
$$\frac{\partial \zeta}{\partial t} + J(\psi, \zeta) = f + d, \tag{24}$$
where $t$ is the time, $\psi$ is the velocity streamfunction,³ $\zeta = \Delta\psi$ is the vorticity, $\Delta$ is the 2D Laplacian operator and $J(\psi, \zeta) = \partial_x\psi\,\partial_y\zeta - \partial_y\psi\,\partial_x\zeta$ is the Jacobian (advection) term. Equation (24) is solved pseudo-spectrally in a doubly periodic domain, with a truncation wavenumber of $k_t = 1024$. This is equivalent to a $(2k_t) \times (2k_t) = 2048 \times 2048$ grid. The forcing $f$ and dissipation $d$ are prescribed in spectral space. By forcing at both large and small scales, a hybrid $k^{-3}$-$k^{-5/3}$ background spectrum is obtained, where $k$ is the scalar wavenumber. The length-scale at which the spectral break sits respects the canonical hybrid spectrum observed and simulated in the midlatitude upper troposphere (Nastrom and Gage, 1985; Judt, …).

³ The velocity streamfunction is related to the velocity $(u, v)$ by $(u, v) = (-\partial\psi/\partial y,\ \partial\psi/\partial x)$.

TABLE 1
Relative changes of $f(b, r)/\sqrt{1 + b^2 + r^2}$ compared to the case $(b, r) = (0, 1)$, when (a) the relative bias $b$ is varied but the spread ratio $r$ is fixed at 1, and (b) the spread ratio $r$ is varied but the relative bias $b$ is fixed at 0

Experimental design
A long control integration of Equation (24) is taken as the verification. When the turbulence is fully developed and reaches a statistically stationary state, a normally distributed random variable centred at zero is added to all Fourier coefficients of the vorticity field to generate the 'truth'. The variance of the random variable, $\sigma^2$, depends on the magnitude but not the direction of the wavevector. It can be shown that a perturbation of magnitude $\varepsilon(k)$ relative to the energy spectral density $E(k)$ of the control integration can be generated by choosing $\sigma^2 = [\varepsilon(k)E(k)/(2\pi)]\,k$. In this experiment, $\varepsilon(k)$ is fixed to be $10^{-6}$ across all $k$. Next, $M = 4$ ensemble members are generated from the 'truth' using the same perturbation statistics as the generation of the 'truth' from the control integration. All perturbations for the four ensemble members and for the 'truth' are mutually independent. The perturbed simulations are integrated for a fixed period of $T = 150$ non-dimensional time units, by the end of which the error has almost fully saturated.
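The perturbation procedure can be sketched as follows, assuming a square, doubly periodic grid on a $2\pi$ domain and NumPy's unnormalised FFT convention. The helper `perturb_vorticity` and its argument `var_of_k` are hypothetical; the mapping from the relative perturbation magnitude and $E(k)$ to a per-coefficient variance depends on the Fourier normalisation, so it is left to the caller rather than hard-coded. Scaling the transform of real white noise by a function of $|\mathbf{k}|$ preserves Hermitian symmetry, which keeps the perturbed field real.

```python
import numpy as np


def perturb_vorticity(zeta, var_of_k, rng):
    """Add a zero-mean Gaussian perturbation whose Fourier-coefficient variance
    depends only on the scalar wavenumber |k| (hypothetical helper).

    zeta     : real (n, n) vorticity field on a doubly periodic 2*pi domain
    var_of_k : callable giving the target E|delta_zeta_hat(k)|^2 as a function of |k|,
               in NumPy's unnormalised fft2 convention (an assumption)
    """
    n = zeta.shape[0]
    k1d = np.fft.fftfreq(n, d=1.0 / n)               # integer wavenumbers 0, 1, ..., -1
    kx, ky = np.meshgrid(k1d, k1d, indexing="ij")
    k = np.hypot(kx, ky)                             # scalar wavenumber |k|
    # FFT of real unit white noise: Hermitian-symmetric coefficients with E|xi_hat|^2 = n^2.
    xi_hat = np.fft.fft2(rng.standard_normal((n, n)))
    dzeta_hat = np.sqrt(var_of_k(k)) * xi_hat / n    # now E|dzeta_hat|^2 = var_of_k(|k|)
    # The scaling depends on |k| only, so the inverse transform is real up to round-off.
    return zeta + np.fft.ifft2(dzeta_hat).real


rng = np.random.default_rng(0)
zeta0 = rng.standard_normal((64, 64))
zeta1 = perturb_vorticity(zeta0, lambda k: np.full_like(k, 4.0), rng)
```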
The experiment is repeated for $N = 30$ start dates. This can be thought of as $N_1 = 5$ years, among which the control integrations are fully independent, and $N_2 = N/N_1 = 6$ start dates per year initialised at intervals of $0.1T$.
The choice of a relatively small $M$ and large $N$ is motivated by Leutbecher (2019). That article suggests that if the CRPS for reliable ensembles is adjusted using
$$\mathrm{CRPS}_\infty = \frac{M}{M + 1}\,\mathrm{CRPS}_M \tag{25}$$
to remove the effects of the finite ensemble size, then, for similar computational cost, reducing the number of ensemble members used for numerical experimentation returns more robust results than reducing the number of start dates. The experimental design guarantees a reliable ensemble, since the verification is statistically indistinguishable from the $M$ ensemble members. As such, Equation (23) is expected to hold, subject to $P$ remaining a normal distribution as the simulation evolves.
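For a reliable ensemble, the kernel representation (Equation (3)) gives an expected $M$-member CRPS of $\frac{M+1}{2M}\,\mathbb{E}|U - U'|$, compared with $\frac{1}{2}\,\mathbb{E}|U - U'|$ for an infinite ensemble, which is where a factor $M/(M+1)$ comes from. The sketch below checks this for $M = 4$ with exchangeable, normally distributed members and verification; it is an illustration under these assumptions, not the implementation used in the experiments.

```python
import numpy as np


def crps_ensemble(ens, u):
    """CRPS of the empirical distribution of an ensemble via the kernel form, Equation (3).
    ens has shape (..., M); u has the shape of ens without its last axis."""
    term1 = np.mean(np.abs(ens - u[..., None]), axis=-1)
    term2 = np.mean(np.abs(ens[..., :, None] - ens[..., None, :]), axis=(-2, -1))
    return term1 - 0.5 * term2


rng = np.random.default_rng(4)
M, n_cases, sigma = 4, 200_000, 1.0
# Reliable by construction: members and verification are i.i.d. draws from N(0, sigma^2).
ens = rng.normal(0.0, sigma, size=(n_cases, M))
u = rng.normal(0.0, sigma, size=n_cases)

crps_M = crps_ensemble(ens, u).mean()
crps_inf_est = M / (M + 1) * crps_M        # finite-ensemble adjustment
crps_inf_exact = sigma / np.sqrt(np.pi)   # sigma * f(0, 1) for a normal, reliable forecast
print(crps_M, crps_inf_est, crps_inf_exact)
```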
The scalar variables of interest chosen for this study are the velocity components $u$ and $v$. For each start date and grid point, the CRPS and the MSE are computed for both velocity components in physical space. The computation of the CRPS is performed using the algorithm set out by Hersbach (2000). These metrics are then aggregated over $\Lambda := \mathcal{X} \times \mathcal{T} \times \mathcal{D}$, where $\mathcal{X}$ represents the set of $2048^2$ grid points, $\mathcal{T}$ the 30 start dates and $\mathcal{D}$ the two canonical directions ($u$ and $v$), but remain functions of the forecast lead time. Isotropy of the turbulence enables the scores for $u$ and $v$ to be combined without changing the results.
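For an ensemble, the CRPS is evaluated for the step-function CDF of the members. The sketch below integrates Equation (2) exactly for that step function, piece by piece between the sorted members and the verification, and checks it against the kernel form of Equation (3). Hersbach's (2000) algorithm evaluates the same quantity and additionally yields a decomposition of the score, which this sketch does not reproduce; the function names are illustrative.

```python
import numpy as np


def crps_step_cdf(ens, u):
    """Integrate Equation (2) exactly for the ensemble's empirical (step) CDF."""
    x = np.sort(np.asarray(ens, dtype=float))
    M = x.size
    pts = np.sort(np.append(x, u))          # breakpoints where F or H_u can jump
    total = 0.0
    for lo, hi in zip(pts[:-1], pts[1:]):
        mid = 0.5 * (lo + hi)
        F = np.searchsorted(x, mid, side="right") / M   # fraction of members <= mid
        H = 1.0 if mid >= u else 0.0                      # Heaviside step at u
        total += (F - H) ** 2 * (hi - lo)
    return total   # outside [pts[0], pts[-1]] the integrand (F - H)^2 is zero


def crps_kernel(ens, u):
    """Kernel representation, Equation (3), for the same empirical distribution."""
    x = np.asarray(ens, dtype=float)
    return np.mean(np.abs(x - u)) - 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))


rng = np.random.default_rng(5)
ens, u = rng.normal(size=7), 0.3
print(crps_step_cdf(ens, u), crps_kernel(ens, u))
```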
When the metrics are aggregated, the quantity $\mathbb{E}_\Lambda[S(P, u_i)]$ is computed for each lead time, where $S$ can be the CRPS or the MSE, and where $u_i$ represents a generic velocity component. The law of iterated expectations guarantees
$$\mathbb{E}_\Lambda\left[S(P, u_i)\right] = \mathbb{E}_\Theta\left[\mathbb{E}_Q\left[S(P, u_i)\right]\right] = \mathbb{E}_\Theta\left[S(P, Q)\right] = S^*, \tag{26}$$
the last two equalities of which result from the definition of $S(P, Q)$ and Equation (17), respectively. In this way, $\mathrm{CRPS}^*$ and $\mathrm{RMSE}^*$ (the square root of $\mathrm{MSE}^*$) can be computed empirically, and they should satisfy Equation (23) subject to the normality assumption. To account for the finite ensemble size, the aggregated CRPS is corrected by Equation (25) before being compared with the aggregated RMSE.
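Putting the pieces together, the aggregation in Equation (26) amounts to averaging per-point scores over all aggregation axes before taking any square root. The sketch below uses a hypothetical array layout (start dates, velocity components, grid rows, grid columns, members) and synthetic, reliable, normally distributed data whose standard deviation varies from point to point; after the $M/(M+1)$ adjustment, $\sqrt{2\pi}\,\mathrm{CRPS}^*_\infty/\mathrm{RMSE}^*$ should then be close to $1/\sqrt{1 + h}$, as in Equation (23).

```python
import numpy as np


def aggregated_scores(ens, truth):
    """Aggregate CRPS and MSE over every axis except the member axis.

    ens   : (n_dates, n_comp, ny, nx, M) forecasts at one lead time (hypothetical layout)
    truth : (n_dates, n_comp, ny, nx) verifying values
    Returns (CRPS*_M, RMSE*); RMSE* is the square root of the mean MSE, not a mean of RMSEs.
    """
    err = ens - truth[..., None]
    mse = np.mean(err**2, axis=-1)                                    # Equation (11)
    spread = np.mean(np.abs(ens[..., :, None] - ens[..., None, :]), axis=(-2, -1))
    crps = np.mean(np.abs(err), axis=-1) - 0.5 * spread               # Equation (3)
    return crps.mean(), np.sqrt(mse.mean())


rng = np.random.default_rng(6)
shape, M = (10, 2, 64, 64), 4
sigma = np.exp(0.5 * rng.standard_normal(shape))            # heteroscedastic spread (synthetic)
truth = sigma * rng.standard_normal(shape)                   # verification ~ N(0, sigma^2)
ens = sigma[..., None] * rng.standard_normal(shape + (M,))   # members ~ N(0, sigma^2): reliable

crps_M, rmse_star = aggregated_scores(ens, truth)
crps_inf = M / (M + 1) * crps_M                              # finite-ensemble adjustment
h = sigma.var() / sigma.mean() ** 2                          # Equation (21) over all points
lhs = np.sqrt(2 * np.pi) * crps_inf / rmse_star
rhs = 1 / np.sqrt(1 + h)
print(lhs, rhs)
```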

Results
For notational purposes in this subsection, we denote the start date by $t_0$, and write $U(t, t_0, \mathbf{x}, \mathbf{e}_1)$ for $u(t, t_0, \mathbf{x})$ and $U(t, t_0, \mathbf{x}, \mathbf{e}_2)$ for $v(t, t_0, \mathbf{x})$. A subscript "f" attached to $U(t, t_0, \mathbf{x}, \mathbf{e}_i)$, $u(t, t_0, \mathbf{x})$ or $v(t, t_0, \mathbf{x})$ (where $i = 1$ or $2$) indicates a forecast, in which case the variable is understood to be a random variable with distribution $P$. The absence of the subscript indicates the verification, which is also interpreted as a random variable but with distribution $Q$. Figure 2 illustrates the growth of the error energy spectrum. More precisely, it is the spectrum of the ensemble-mean error energy aggregated over all grid points and start dates, that is, the spectral decomposition of
$$\mathbb{E}_{\mathcal{X} \times \mathcal{T}}\left[\frac{1}{2}\sum_{i=1}^{2}\Big(\mathbb{E}_P\left[U_{\mathrm{f}}(t, t_0, \mathbf{x}, \mathbf{e}_i)\right] - U(t, t_0, \mathbf{x}, \mathbf{e}_i)\Big)^2\right]. \tag{27}$$
In two spatial dimensions⁵ and where the ensemble is reliable ($P = Q$), this is equivalent to the spectral decomposition of
$$\frac{1}{2}\,\mathbb{E}_\Lambda\left[\mathbb{E}_P\left[\left(U_{\mathrm{f}} - U\right)^2\right]\right] = \frac{1}{2}\,\mathbb{E}_\Lambda\left[\mathrm{MSE}(P, u_i)\right] = \frac{1}{2}\,\mathrm{MSE}^* = \frac{1}{2}\left(\mathrm{RMSE}^*\right)^2, \tag{28}$$
where Equations (11) and (26) have been used in the first two equalities respectively. As such, Figure 2 may also be interpreted as the evolution of the power spectrum of $\mathrm{RMSE}^*$. Following an initial period of adjustment that leads to fast saturation of the mesoscale $k^{-5/3}$ range, a synoptic-scale peak emerges in the error spectrum. The spectrum then grows more or less uniformly in spatial scale and gradually saturates the $k^{-3}$ range. After that, the growth slows down as the largest scales approach saturation. These observations are consistent with those reported in Leung et al. (2020).

⁵ The equivalence between Expressions (27) and (28) is not extendable to higher spatial dimensions, because only in two dimensions is the factor 1/2 for the kinetic energy also the factor used to compute the average over $\mathcal{D}$. In higher dimensions, the ensemble-mean error energy can be related to the MSE of velocity components via a constant multiplicative factor.

FIGURE 2 Growth of the ensemble-mean error energy spectrum, or equivalently the power spectrum of $\mathrm{RMSE}^*$ (curves from bottom to top, plotted at intervals of $0.1T$), whose initial condition is indicated by the lowest curve
Like $\mathrm{RMSE}^*$, $\mathrm{CRPS}^*_\infty$ can be spectrally decomposed. To compute $\mathrm{CRPS}^*_\infty$ for a wavenumber or range of wavenumbers, one simply picks out the associated waves in spectral space, transforms them to physical space, then aggregates the score over $\Lambda$ and applies Equation (25). Such $\mathrm{CRPS}^*_\infty$ may be compared with $\mathrm{RMSE}^*$ for the same wavenumber(s) using Equation (23). Four spectral ranges are considered: the planetary, synoptic, meso- and sub-mesoscale ranges, the last of which corresponds to $k \in [513, 1024]$. The evolution of these metrics is shown in Figure 3a. Generally speaking, they grow steadily (in exponential terms) before asymptoting smoothly to their respective saturation values. The same figure also shows $\mathrm{RMSE}^*$ associated with these scales but normalised by $\sqrt{2\pi}$, so that, according to Equation (23), the curves for the CRPS and the RMSE would coincide if $P$ were normal and $\sigma_P$ were homoscedastic. Broadly speaking, the agreement between the two is extremely close for all four spectral ranges, spanning several orders of magnitude of growth and, importantly, capturing the differences in saturation times between the different spectral ranges. This shows that, for these simulations, the normalised RMSE represents a good proxy for the CRPS. However, the discrepancy between the two curves is non-trivial throughout most of the simulation, although it remains within a factor of two. To enable closer examination of the discrepancy, the ratio of the two curves is plotted in Figure 3b. Evidently, the discrepancy is stronger at the planetary and synoptic scales. For smaller scales, the CRPS and the normalised RMSE agree better, especially after the error at these scales has saturated. Figure 4 shows the ratio $\sqrt{2\pi}\,\mathrm{CRPS}^*_\infty(t)/\mathrm{RMSE}^*(t)$ for the full field without decomposition into wavebands (thick solid curve). In addition, the thin solid curves of a lighter shade show the evolution of the ratios for the $N = 30$ individual start dates, that is, with $\Lambda = \mathcal{X} \times \mathcal{D}$ instead of $\mathcal{X} \times \mathcal{T} \times \mathcal{D}$. Considerable variation in this ratio across the 30 cases is seen, particularly at short lead times.
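The band-pass step described above might look as follows, assuming a square, doubly periodic grid on a $2\pi$ domain, integer wavenumbers from `numpy.fft.fftfreq`, and half-open bands $k_{\min} \le |\mathbf{k}| < k_{\max}$ so that adjacent bands neither overlap nor leave gaps. The helper name and banding convention are assumptions, and the band edges in the example are arbitrary. Each member and the verification would be filtered with the same band before the scores are aggregated.

```python
import numpy as np


def bandpass(field, k_min, k_max):
    """Keep Fourier modes with k_min <= |k| < k_max and return the field in physical space.

    field : real array whose last two axes form a square, doubly periodic grid.
    """
    n = field.shape[-1]
    k1d = np.fft.fftfreq(n, d=1.0 / n)             # integer wavenumbers
    kx, ky = np.meshgrid(k1d, k1d, indexing="ij")
    k = np.hypot(kx, ky)                            # scalar wavenumber |k|
    mask = (k >= k_min) & (k < k_max)
    # The mask depends only on |k|, so it is symmetric under k -> -k and the result is real.
    return np.fft.ifft2(np.fft.fft2(field) * mask).real


rng = np.random.default_rng(7)
field = rng.standard_normal((3, 64, 64))            # e.g. a stack of 3 fields
edges = [0, 4, 16, np.inf]                          # arbitrary illustrative band edges
bands = [bandpass(field, lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]
print(np.allclose(sum(bands), field))               # disjoint bands add back to the field
```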
According to Equation (23), the solid curves in Figure 4 are expected to coincide with $1/\sqrt{1 + h}$ if the ensemble is normally distributed. Computing this ratio involves evaluating the ensemble's standard deviation $\sigma_P$, but the sample size ($M = 4$) is too small to estimate $\sigma_P$ robustly. To mitigate this, a larger ensemble of $M = 48$ members is run to estimate the heteroscedasticity of the ensemble's standard deviation. This is done only for a single start date ($N = 1$) owing to limited computational resources. As shown by the dashed curve of Figure 4, the fraction $1/\sqrt{1 + h}$ exhibits two local minima during the integration, the more extreme of which corresponds to a relative heteroscedasticity of $h \approx 1$. This curve agrees closely with the ratio $\sqrt{2\pi}\,\mathrm{CRPS}^*_\infty(t)/\mathrm{RMSE}^*(t)$ for the same large-ensemble experiment (dotted curve), which is itself visually indistinguishable from the collection of thin solid curves that depict this ratio for individual cases involving the smaller ensemble ($M = 4$). The agreement between the dotted and dashed curves suggests that heteroscedasticity is responsible for the discrepancy between $\mathrm{CRPS}^*_\infty(t)$ and $(1/\sqrt{2\pi})\,\mathrm{RMSE}^*(t)$. Any non-normality that the ensemble might develop during the simulation would thus appear to have a negligible impact on the CRPS-RMSE ratio, at least for the particular case considered in this experiment.

FIGURE 3 (a) $\mathrm{CRPS}^*_\infty(t)$ (solid) and $(1/\sqrt{2\pi})\,\mathrm{RMSE}^*(t)$ (dashed) for the planetary, synoptic, meso- and sub-mesoscale (from dark to light shades), up to $T = 150$. (b) The ratio $\sqrt{2\pi}\,\mathrm{CRPS}^*_\infty(t)/\mathrm{RMSE}^*(t)$ between the solid and dashed curves of (a) for the respective shades

FIGURE 4 … $\sqrt{2\pi}\,\mathrm{CRPS}^*_\infty(t)/\mathrm{RMSE}^*(t)$ but for $\Lambda = \mathcal{X} \times \mathcal{D}$ (i.e., for specific start dates), for the $M = 4$-member ensemble (thin solid curves of a lighter shade) and the $M = 48$-member ensemble (dotted curve)
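Why a 4-member ensemble is not enough to estimate $h$ can be seen in a synthetic, homoscedastic example where the true $h$ is zero. The hypothetical helper below plugs per-point sample standard deviations into Equation (21); with few members, the sampling noise in those standard deviations masquerades as heteroscedasticity.

```python
import numpy as np


def relative_heteroscedasticity(ens):
    """Estimate h = Var[sigma_P] / (E[sigma_P])^2 (Equation (21)) from an ensemble.
    ens: (n_points, M) array; sigma_P is estimated per point by the sample std (ddof=1)."""
    s = np.std(ens, axis=-1, ddof=1)
    return s.var() / s.mean() ** 2


rng = np.random.default_rng(5)
n_points = 100_000
h_est = {}
# Homoscedastic truth: sigma_P = 1 at every point, so the true h is 0.
for M in (4, 48):
    ens = rng.standard_normal((n_points, M))
    h_est[M] = relative_heteroscedasticity(ens)
print(h_est)
```

For $M = 4$ the spurious value is about 0.18, whereas for $M = 48$ it is about 0.01, small compared with the values of $h$ near 1 found in the experiment.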

Non-normality of the ensemble distribution
Despite the fact that the departure of the normalised CRPS-RMSE ratio from unity can be primarily explained by the flow's heteroscedasticity, it is of interest to check explicitly whether the $M = 48$-member ensemble is normally distributed. This is done by evaluating the ensemble's skewness and excess kurtosis at each of the $2048^2$ grid points and for each of the two velocity components, and comparing histograms of these statistics across the $2048^2 \times 2$ samples with those obtained via a Monte Carlo simulation involving $2048^2 \times 2$ groups of 48 standard normal random variables, all mutually independent. Figure 5 shows the result for several lead times. Initially the two distributions are almost identical. This is to be expected, since the perturbations are normally distributed by design (Subsection 4.1). The difference grows as the flow evolves, suggesting that non-normality in the ensemble distribution builds up as the simulation progresses. This is hardly surprising, since non-normality is a known feature of 2D turbulence (Farge et al., 1999). Yet it also highlights that the extent of non-normality found here does not substantially affect the derived CRPS-RMSE relationship.
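The Monte Carlo reference for this normality check can be sketched with `scipy.stats.skew` and `scipy.stats.kurtosis` (the latter returns excess kurtosis by default). Comparing against simulated normal samples of the same size, rather than against zero, matters because the sample statistics are biased for $M = 48$: for normal data the mean sample excess kurtosis is about $-6/(M+1) \approx -0.12$. The group count and the gamma-distributed stand-in ensemble below are illustrative assumptions, not the article's data.

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(8)
M, n_groups = 48, 50_000     # far fewer groups than 2048**2 * 2, for speed

# Reference: groups of M independent standard normal variables.
reference = rng.standard_normal((n_groups, M))
ref_skew = skew(reference, axis=1)
ref_kurt = kurtosis(reference, axis=1)   # excess kurtosis (Fisher definition)

# Stand-in for a non-normal ensemble: right-skewed gamma-distributed members.
ensemble = rng.gamma(2.0, size=(n_groups, M))
ens_skew = skew(ensemble, axis=1)

bins = np.linspace(-2.0, 3.0, 51)
ref_hist, _ = np.histogram(ref_skew, bins=bins, density=True)
ens_hist, _ = np.histogram(ens_skew, bins=bins, density=True)
print(ref_kurt.mean(), -6 / (M + 1))         # sample bias of the excess kurtosis
print(ref_skew.mean(), ens_skew.mean())      # the skewed stand-in shifts the histogram
```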

DISCUSSION AND SUMMARY
In this article, we have derived a functional relationship between two forecast verification metrics: the CRPS and the RMSE (Sections 2 and 3). The CRPS is a standard probabilistic score that rewards forecasts that are both sharp and reliable. The RMSE, on the other hand, is the square root of the sum of the ensemble variance and the squared error of the ensemble mean. In some contexts, only the latter contribution is included in the definition of the RMSE, which makes it a deterministic verification metric, since the ensemble mean can be verified as if it were a deterministic prediction in its own right. The fact that the CRPS and the RMSE can be functionally related provides a link between deterministic and probabilistic verification. Assuming that the predictive and verifying distributions are both normal, the relationship takes the form
$$\frac{\mathrm{CRPS}^*}{\mathrm{RMSE}^*} = \frac{f(b, r)}{\sqrt{\left(1 + b^2 + r^2\right)(1 + h)}}, \tag{29}$$
where $b$ is a measure of bias (Equation (7)), $r$ is a measure of non-calibrated ensemble dispersion (Equation (8)), $f(b, r)$ is as given in Equation (10), and $h$ is the relative heteroscedasticity of the ensemble's standard deviation $\sigma_P$ as defined in Equation (21). The asterisks accompanying the notations CRPS and RMSE refer to aggregation over a sample, and the heteroscedasticity refers to the variability of $\sigma_P$ across the dimensions of aggregation. The CRPS-RMSE relationship is subject to a technical assumption that the measures of non-calibration ($b$ and $r$) do not depend on the predictive distribution.
When predictions are reliable, Equation (29) reduces to
$$\mathrm{CRPS}^* = \frac{\mathrm{RMSE}^*}{\sqrt{2\pi(1 + h)}}. \tag{30}$$
The relationship in this special case has been tested on simulations of idealised 2D turbulence (Section 4), in which ensembles are reliable by the experimental design. Heteroscedastic effects are present, and are found to depend considerably on the length-scale and the forecast lead time. To our knowledge, the origins of such heteroscedastic effects in idealised turbulence are not well understood. It would be interesting to investigate the scale-dependence of heteroscedasticity in the future. In any case, if we were to ignore such heteroscedastic effects, a simpler form of the CRPS-RMSE relationship would be obtained:
$$\mathrm{CRPS}^* \approx \frac{\mathrm{RMSE}^*}{\sqrt{2\pi}}. \tag{31}$$
Equation (31) is robust to small ensemble biases, although it is more sensitive to over- and underdispersion of the ensemble.
The CRPS-RMSE relationship may be applied to any scalar meteorological variable in the real world, provided that the distribution of the variable is not too far from normal. Inhomogeneity and anisotropy of the atmospheric flow imply that the results will depend on the domain and direction of aggregation. It remains to be seen how the heteroscedasticity observed in NWP simulations compares with that reported here for idealised 2D turbulence.

ACKNOWLEDGEMENTS
The authors thank Richard Scott for providing his code for modification for the numerical simulations in Section 4, which would not have been possible without further support from high-performance computing resources at the European Centre for Medium-Range Weather Forecasts. The authors also wish to thank the two anonymous reviewers for their helpful and valuable comments on an earlier version of the manuscript.

AUTHORS' DECLARATION
An earlier and expanded version of this article forms part of the first author's PhD thesis (Leung, 2020).

APPENDIX. EXPECTED CRPS FOR NORMAL PREDICTIVE AND VERIFYING DISTRIBUTIONS
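The integral in Equation (6) can be simplified to provide an analytic expression for $\mathrm{CRPS}(P, Q)$, the expected CRPS for normal predictive and verifying distributions. With $x = (u - \mu_P)/\sigma_P$, so that $(u - \mu_Q)/\sigma_Q = rx + b$, the integral will be decomposed into three contributions according to the terms inside the outermost parentheses of the integrand:
$$\mathrm{CRPS}(P, Q) = r^2\sigma_Q\int_{-\infty}^{\infty} x\,\mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\phi(rx + b)\,dx + r^2\sigma_Q\sqrt{\frac{2}{\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,\phi(rx + b)\,dx - \frac{r^2\sigma_Q}{\sqrt{\pi}}\int_{-\infty}^{\infty}\phi(rx + b)\,dx. \tag{A1}$$
These contributions will be evaluated one by one. To begin, we have
$$r^2\sigma_Q\sqrt{\frac{2}{\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,\phi(rx + b)\,dx = \frac{2r^2\sigma_Q}{\sqrt{2\pi(1 + r^2)}}\exp\!\left(-\frac{b^2}{2(1 + r^2)}\right), \tag{A2}$$
$$-\frac{r^2\sigma_Q}{\sqrt{\pi}}\int_{-\infty}^{\infty}\phi(rx + b)\,dx = -\frac{r\sigma_Q}{\sqrt{\pi}}, \tag{A3}$$
since they are Gaussian integrals. As for the contribution
$$r^2\sigma_Q\int_{-\infty}^{\infty} x\,\mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\phi(rx + b)\,dx, \tag{A4}$$
we proceed by first seeking an indefinite integral $A(x)$ of $x\,\phi(rx + b)$, so that, after integration by parts, Expression (A4) can be written as
$$r^2\sigma_Q\left[A(x)\,\mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]_{-\infty}^{\infty} - r^2\sigma_Q\sqrt{\frac{2}{\pi}}\int_{-\infty}^{\infty} A(x)\,e^{-x^2/2}\,dx. \tag{A5}$$
It is not difficult to see that
$$A(x) = -\frac{1}{r^2}\left[\phi(rx + b) + b\,\Phi(rx + b)\right], \tag{A6}$$
so that $A(\infty) = -b/r^2$ and $A(-\infty) = 0$. Substituting these into (A5), the first term equals $-b\sigma_Q$, whereas the second term is
$$-r^2\sigma_Q\sqrt{\frac{2}{\pi}}\int_{-\infty}^{\infty} A(x)\,e^{-x^2/2}\,dx = 2\sigma_Q\int_{-\infty}^{\infty}\phi(rx + b)\,\phi(x)\,dx + 2b\sigma_Q\int_{-\infty}^{\infty}\Phi(rx + b)\,\phi(x)\,dx. \tag{A7}$$
The first term on the right-hand side of Equation (A7) is a Gaussian integral which evaluates to
$$\frac{2\sigma_Q}{\sqrt{2\pi(1 + r^2)}}\exp\!\left(-\frac{b^2}{2(1 + r^2)}\right). \tag{A8}$$
The second term equals
$$b\sigma_Q\left[1 + \mathrm{erf}\!\left(\frac{b}{\sqrt{2(1 + r^2)}}\right)\right], \tag{A9}$$
which can be established by using $\Phi(y) = \left[1 + \mathrm{erf}(y/\sqrt{2})\right]/2$, considering
$$I(r, b) := \int_{-\infty}^{\infty}\mathrm{erf}\!\left(\frac{rx + b}{\sqrt{2}}\right)\phi(x)\,dx, \qquad \frac{\partial I}{\partial b} = \frac{2}{\sqrt{2\pi(1 + r^2)}}\exp\!\left(-\frac{b^2}{2(1 + r^2)}\right),$$
and writing
$$I(r, b) = I(r, 0) + \int_0^b \frac{\partial I}{\partial b'}(r, b')\,db' = \mathrm{erf}\!\left(\frac{b}{\sqrt{2(1 + r^2)}}\right) \tag{A10}$$
(note that $I(r, 0) = 0$, as the integrand is in that case an odd function). Hence we can write (A4) as
$$\sigma_Q\left[\sqrt{\frac{2}{\pi(1 + r^2)}}\exp\!\left(-\frac{b^2}{2(1 + r^2)}\right) + b\,\mathrm{erf}\!\left(\frac{b}{\sqrt{2(1 + r^2)}}\right)\right]. \tag{A11}$$
Substituting this and Equations (A2) and (A3) into (A1), and therefore Equation (6), we finally arrive at $\mathrm{CRPS}(P, Q) = \sigma_Q f(b, r)$, where
$$f(b, r) = \sqrt{\frac{2(1 + r^2)}{\pi}}\exp\!\left(-\frac{b^2}{2(1 + r^2)}\right) + b\,\mathrm{erf}\!\left(\frac{b}{\sqrt{2(1 + r^2)}}\right) - \frac{r}{\sqrt{\pi}}. \tag{A12}$$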
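The intermediate results of the Appendix can be spot-checked numerically. This sketch assumes that $I(r, b)$ denotes $\int \mathrm{erf}\big((rx + b)/\sqrt{2}\big)\,\phi(x)\,dx$ and that $A(x) = -[\phi(rx + b) + b\,\Phi(rx + b)]/r^2$; it compares these with quadrature and a central-difference derivative, and finally checks the closed form for $f(b, r)$ against direct quadrature of Equation (6). Parameter values are arbitrary.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erf
from scipy.stats import norm

b, r = 0.7, 1.4                  # illustrative relative bias and spread ratio
mu_q, sigma_q = 0.0, 1.5
mu_p, sigma_p = mu_q + b * sigma_q, r * sigma_q

# I(r, b) by quadrature versus its closed form erf(b / sqrt(2 (1 + r^2))).
I_num, _ = quad(lambda x: erf((r * x + b) / np.sqrt(2.0)) * norm.pdf(x), -np.inf, np.inf)
I_closed = erf(b / np.sqrt(2.0 * (1.0 + r**2)))

# A(x) should be an antiderivative of x * phi(r x + b): compare a central difference.
A = lambda x: -(norm.pdf(r * x + b) + b * norm.cdf(r * x + b)) / r**2
x0, dx = 0.3, 1e-5
dA = (A(x0 + dx) - A(x0 - dx)) / (2.0 * dx)
target = x0 * norm.pdf(r * x0 + b)


def crps_normal(u, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) for outcome u (Equation (4))."""
    z = (u - mu) / sigma
    return sigma * (z * erf(z / np.sqrt(2.0)) + np.sqrt(2.0 / np.pi) * np.exp(-0.5 * z**2)
                    - 1.0 / np.sqrt(np.pi))


# Equation (6) by quadrature versus sigma_Q f(b, r).
s2 = 2.0 * (1.0 + r**2)
f_closed = np.sqrt(s2 / np.pi) * np.exp(-b**2 / s2) + b * erf(b / np.sqrt(s2)) - r / np.sqrt(np.pi)
crps_quad, _ = quad(lambda u: crps_normal(u, mu_p, sigma_p) * norm.pdf(u, mu_q, sigma_q),
                    -np.inf, np.inf)
print(I_num, I_closed, dA, target, crps_quad, sigma_q * f_closed)
```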