A diagnostic suite to assess NWP performance


  • T.-Y. Koh,

    1. School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
    2. Earth Observatory of Singapore, Nanyang Technological University, Singapore
    3. Temasek Laboratories, Nanyang Technological University, Singapore
    • Corresponding author: T.-Y. Koh, School of Physical and Mathematical Sciences, Nanyang Technological University, 21 Nanyang Link, SPMS-04-01, 637371 Singapore. (kohty@ntu.edu.sg)

  • S. Wang,

    1. School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
  • B. C. Bhatt

    1. Temasek Laboratories, Nanyang Technological University, Singapore


[1] A suite of numerical weather prediction (NWP) verification diagnostics applicable to both scalar and vector variables is developed, highlighting the normalization and successive decomposition of model errors. The normalized root-mean square error (NRMSE) is broken down into contributions from the normalized bias (NBias) and the normalized pattern error (NPE). The square of NPE, or the normalized error variance α, is further analyzed into phase and amplitude errors, measured respectively by the correlation and the variance similarity. The variance similarity diagnostic is introduced to verify variability, e.g. under different climates. While the centered RMSE can be reduced by under-prediction of variability in the model, α penalizes over- and under-prediction of variability equally. The error decomposition diagram, the correlation-similarity diagram and the anisotropy diagram are introduced. The correlation-similarity diagram is compared with the Taylor diagram: it has the advantage of analyzing the normalized error variance geometrically into contributions from the correlation and the variance similarity. Normalization of the error metrics removes the dependence on the inherent variability of a variable and allows comparison among quantities of different physical units and from different regions and seasons. The method is used to assess the Coupled Ocean/Atmosphere Mesoscale Prediction System (COAMPS). NWP performance degrades progressively from the midlatitudes through the subtropics to the tropics. But similar cold and moist biases are noted, and position and timing errors are the main cause of pattern errors. Although the suite of metrics is applied to NWP verification here, it is generally applicable as a diagnostic for differences between two data sets.

1. Introduction

[2] Numerical weather prediction (NWP) is an established scientific discipline whose beginnings can be traced to even before the advent of computers [Richardson, 1922; Yoden, 2007]. As NWP models grew in number and complexity [Charney et al., 1950; Lynch, 2008], model verification became an increasingly important task. Objective evaluation of forecast quality is crucial for both scientific and operational purposes [Brier and Allen, 1951]. But a consensus about what constitutes a good-quality forecast is difficult to achieve, even if attention is confined to just two aspects of quality: accuracy, defined as agreement between pairs of observation and forecast; and skill, measured with respect to a reference standard of performance [Murphy, 1993].

[3] A variety of verification procedures has been developed and a review of these can be found e.g. in Wilks [2006]. Here, we review the most commonly used elementary yardsticks for accuracy: the bias (a.k.a. unconditional bias), root-mean square error (RMSE) and Pearson's correlation coefficient. Each measure has its own strengths and shortcomings, where the latter are not necessarily addressed by other diagnostics:

[4] 1. The bias indicates the overall systematic difference between forecast and reality, supporting useful guiding notions like “the model is too wet/dry or too warm/cold”; but whether a bias is large or small is hard to judge from its value alone, without context.

[5] 2. The RMSE gives a good estimate of the overall error between the model and the observations, but it tends to vary directly with the standard deviation of the observed quantities [Koh and Ng, 2009]. This means the size of RMSE is not solely due to the model's performance per se, e.g. small errors for temperature and humidity in the tropics and large errors for wind in the upper troposphere are somewhat expected from the corresponding small or large variabilities in physical quantities themselves.

[6] 3. The correlation coefficient is useful to detect errors arising from phase lead or lag between forecast and observation but is independent of the difference in the variance of forecast and observation. So having a correlation of one is of dubious significance if forecast variance is much smaller than observed variance and is left uncorrected.

[7] Since the error information given by any one error metric is always either incomplete or not detailed, there is a need for a suite of suitably chosen error metrics. One example is the decomposition of the mean square error (MSE) into correlation, conditional bias, unconditional bias, and possibly other contributions [cf. Murphy, 1988, equation (12)]. However, the trade-off of recognizing the nature of the error and decomposing a single metric into many components is that we simply have too many metrics to look at.

[8] The situation was improved when Taylor [2001] recognized that a simple geometrical relation exists between the centered RMSE and the standard deviations of forecast and observation and proposed a compact diagram to visualize these metrics. The Taylor diagram has since become generally accepted and is useful for comparing RMSE or other skill scores between different models.

[9] Another issue that is often overlooked in verification studies is the rigorous mathematical generalization of diagnostics from scalar to vector variables. Common methods for analyzing the error of vectors, such as wind, invariably break them up into Cartesian or polar components, and each component is treated separately as a scalar [Anthes et al., 1989; Qian et al., 2003; Hanna and Yang, 2001; Hogrefe et al., 2001]. The result is that the information associated with the covariance between vector components is missed [Koh and Ng, 2009]. From a mathematical point of view, components of vectors are not scalars and do not respect the same invariance principles under transformations of the reference frame. So the danger is that the diagnostics of separate vector components even when taken together do not completely describe or might even mis-represent physical reality.

[10] In the first part of this paper, we aim to develop a systematic suite of elementary diagnostics — some are new while others have been published — which can (1) shed light on different aspects of model accuracy; (2) be neatly summarized in a few diagrams that relate the diagnostics geometrically; (3) be easily generalized from scalar to vector variables. The total error is neatly resolved into bias and pattern error and the latter is further decomposed into errors arising from the mismatch in the phase or amplitude of variations. For a scalar, the error metrics can be succinctly summarized in two diagrams, whereas one more diagram is needed for two-dimensional vectors to investigate the anisotropy of vector error distribution.

[11] The suite of diagnostics is put to the test in the second part of this paper by assessing the performance of the Coupled Ocean/Atmosphere Mesoscale Prediction System (COAMPS®). More than merely demonstrating the utility of the error metrics, the objective here is to contrast the model performance in the tropics and extratropics, so as to evaluate the current-day skill of tropical NWP. COAMPS is a limited-area mesoscale model originally developed for NWP in the USA and has been demonstrated to be efficacious in various parts of the world [e.g., Kong, 2002; Liu et al., 2007]. But the model is largely unverified in tropical regions such as Southeast Asia. The earlier work of Koh and Ng [2009] verified the COAMPS model against two months of intensive radiosonde observations from the South China Sea Monsoon Experiment (SCSMEX) [Ding et al., 2004] using a more restricted set of diagnostics. With the suite of error metrics advanced here, we extend that effort by verifying the COAMPS model against one year of radiosonde data for Southeast Asia and comparing the results with those for the southeastern USA.

[12] The paper is organized as follows: Section 2 contains a brief review of the current diagnostic framework. In section 3, we advance a diagnostic framework for model accuracy and summarize these diagnostics into two (for scalar variable) or three (for vector variable) diagrams. Section 4 describes the COAMPS model and the observation data set used for the verification study. Section 5 highlights the main advantages of the proposed diagnostic tools and section 6 evaluates COAMPS model performance in the tropics and extratropics. The main conclusions are summarized and discussed in section 7. Table 1 provides a list of acronyms and symbols used in this paper for convenient reference.

Table 1. Acronyms, Symbols, Their Meanings and Defining Equations.

Acronym/Symbol | Meaning | Defining Equation(s)
MSE | mean square error | -
RMSE | root-mean square error | (3)
NRMSE | normalized root-mean square error | (26)
NPE | normalized pattern error | (27)
NBias | normalized bias | (28)
O | observed variable | -
F | forecast/modeled variable | -
D | discrepancy of forecast/model from observation | (1)
σO | standard deviation of observation | -
σF | standard deviation of forecast/model | -
σD | standard deviation of discrepancy, or centered RMSE | -
σ̂F | forecast variability normalized by observed variability | σ̂F = σF/σO
σ̂D | centered RMSE normalized by observed variability | σ̂D = σD/σO
σ̂*F | anti-symmetric measure of variance similarity, used in Yu et al. [2006] | (11)
σ̂*D | anti-symmetric measure of normalized error variance | (12)
ν | fractional difference of forecast from observation | (in text)
ρ | correlation (for scalars) | (4)
ρ | correlation (for vectors) | (18)
ψ | angle on the Taylor diagram | (9)
η | variance similarity | (19)
η* | modified variance similarity | (33)
ϕ | angle on the correlation-similarity diagram | (20)
α | normalized error variance | (17)
α* | one example of a skill score | (22)
δ | normalized root-mean square error, NRMSE | (26)
σ | normalized pattern error, NPE | (27)
μ | normalized bias, NBias | (28)
γ | angle on the error decomposition diagram | (29)
θ | preferred direction of vector pattern error | -
εs | symmetrized definition of eccentricity, used to measure vector error anisotropy | (32)
ε | conventional definition of eccentricity | (B3)
β | alternative symmetrized definition of eccentricity | (see Koh and Ng [2009])
a | square root of the larger eigenvalue of var(D) | -
b | square root of the smaller eigenvalue of var(D) | -

2. Brief Review

2.1. Back to Basics

[13] (Advanced readers may wish to skip this subsection.)

[14] The discrepancy D between the forecast F and the observation O of a vector variable is defined as

D = F − O    (1)

Note that any vector equation is applicable to a scalar because scalars are one-dimensional vectors.

[15] For scalars, the common error statistics, namely the bias, RMSE and correlation coefficient ρ, are defined respectively as

Bias ≡ 〈D〉    (2)
RMSE ≡ √〈D²〉    (3)
ρ ≡ cov(F, O)/(σFσO)    (4)

Since 〈(D − 〈D〉)²〉 = 〈D²〉 − 〈D〉², the RMSE can be decomposed as

RMSE² = 〈D〉² + var(D)    (5)

var(D) is the bias-corrected MSE, a.k.a. the centered MSE, which quantifies the matching between the fluctuating components of observation and forecast. In this paper, we refer to the RMSE as a measure of “total error” and the centered MSE as a measure of the model's “pattern error”. The term “random error” is often used to denote “pattern error”, but we have reserved the label “random” to mean “arising through chance” in this paper. var(D) was further resolved in equation (10) of Murphy [1988] as

var(D) = var(F) + var(O) − 2 cov(F, O)    (6)

Taylor [2001] expressed this relationship in terms of standard deviations σD, σF and σO of D, F and O respectively, and noted the resemblance to the cosine rule as follows:

σD² = σF² + σO² − 2σFσOρ    (7)
c² = a² + b² − 2ab cos ψ    (8)

Here a, b and c are the lengths of the three sides of a triangle and ψ is the angle opposite side c. Equation (8) allows all the metrics in equation (7) to be displayed succinctly in geometric form on a Taylor diagram [Taylor, 2001, Figure 2] by taking the polar angle as

ψ = arccos ρ    (9)
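As a concrete illustration of the decomposition RMSE² = 〈D〉² + var(D) above, the following sketch computes the bias, total error and pattern error for a synthetic scalar series. The function name and the data are ours, for illustration only:

```python
import numpy as np

def error_decomposition(f, o):
    """Split the total error into bias and pattern (centered) parts.

    Returns (bias, rmse, centered_rmse) for scalar series f (forecast)
    and o (observation); rmse**2 == bias**2 + centered_rmse**2 holds
    exactly for sample statistics.
    """
    d = np.asarray(f, float) - np.asarray(o, float)   # discrepancy D = F - O
    bias = d.mean()                                   # <D>
    rmse = np.sqrt((d ** 2).mean())                   # sqrt(<D^2>)
    centered_rmse = d.std()                           # sigma_D, the pattern error
    return bias, rmse, centered_rmse

rng = np.random.default_rng(0)
o = rng.normal(size=1000)                             # synthetic "observations"
f = o + 0.5 + 0.3 * rng.normal(size=1000)             # warm-biased, noisy "model"
bias, rmse, sigma_d = error_decomposition(f, o)
```

Note that `np.std` with its default `ddof=0` is the population form required for the identity to hold exactly.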

[16] To use the centered MSE to quantify model accuracy for distinct physical fields and for climatologically disparate regions, some kind of normalization is necessary [Koh and Ng, 2009]. The most frequently used normalization reference is σO2 [Murphy, 1988; Taylor, 2001; Jolliff et al., 2009]. From equation (7):

σ̂D² = 1 + σ̂F² − 2σ̂Fρ    (10)

where σ̂D = σD/σO and σ̂F = σF/σO (Figure 1). The normalized forecast variation σ̂F is used to discern any difference in the amplitude of variations between the observation and forecast. In this approach, there are a few outstanding issues, as explained next.

Figure 1.

Non-dimensional Taylor diagram showing all standard deviations normalized with respect to that of the observation. Circular contours of σ̂D are drawn centered on the reference (observation). Values of the correlation ρ are labeled along the circular edge, and the radial distance σ̂F is labeled on the horizontal axis. The “best” model with ρ = 1, σ̂F = 1 and hence σ̂D = 0 is marked by a star.

2.2. A Few Issues

[17] First, both the MSE and centered MSE sometimes reward forecasts that underestimate atmospheric variability [Arpe et al., 1985; Taylor, 2001]. This feature is inherited by the metric σ̂D and can be demonstrated by fixing the correlation in equation (10) and minimizing σ̂D. The minimum occurs at σ̂F = ρ ≤ 1, which except when ρ = 1, runs contrary to the intuitive idea of having optimal model performance when σ̂F = 1 for positive correlation [Taylor, 2001; Jolliff et al., 2009]. If minimizing the RMSE were the objective, having a model under-predict the variability when the correlation is not perfect would be advantageous. But sometimes (e.g. when the prediction of variability itself is of interest), it is reasonable to wish that model under-prediction of variability be reflected as a disadvantage, i.e. as an increase in some other error metric. For negative correlations, minimal σ̂D is achieved when the model variability vanishes (σ̂F → 0). That is, if the modeled variations tend to oppose the observed, a relatively constant forecast compared to the observations is preferred to minimize the RMSE.
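The reward for under-dispersion can be checked directly from equation (10): for fixed ρ, the normalized centered MSE is minimized at σ̂F = ρ rather than at σ̂F = 1. A minimal numerical check (variable names are ours):

```python
import numpy as np

# Normalized centered MSE from equation (10): sigma_hat_D^2 = 1 + s^2 - 2*s*rho,
# where s stands for sigma_hat_F. For fixed rho the minimum over s lies at
# s = rho, so an under-dispersive forecast (s < 1) is rewarded whenever rho < 1.
def norm_centered_mse(s, rho):
    return 1.0 + s ** 2 - 2.0 * s * rho

s = np.linspace(0.0, 2.0, 2001)
s_best = s[np.argmin(norm_centered_mse(s, 0.7))]      # minimum at s = rho = 0.7
s_neg = s[np.argmin(norm_centered_mse(s, -0.5))]      # negative rho: minimum at s = 0
```

For ρ = 0.7 the grid minimum sits at σ̂F = 0.7, and for ρ = −0.5 it sits at σ̂F = 0, mirroring the discussion above.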

[18] Secondly, unlike the MSE and centered MSE, the normalized measure σ̂D as a difference diagnostic between forecast and observation is not invariant under the exchange of forecast and observation. This lack of symmetry (and anti-symmetry) is rather uncharacteristic for a difference measure, as even the most elementary of such measures, |A − B| (or A − B), between two quantities A and B possesses symmetry (or anti-symmetry) under the exchange of A and B. Relatedly, σ̂F also does not have this property and so fails to represent equally the severity of under-prediction (0 ≤ σ̂F < 1) and over-prediction (1 < σ̂F < ∞) of atmospheric variability. This problem is only partially overcome by the definition used by Yu et al. [2006]:

display math

Substituting the above definition into equation (7), we get

display math

While σ̂*F and σ̂*D are anti-symmetric under the exchange σF ↔ σO for σF ≷ σO, this is achieved by sacrificing continuity of the measures at σF = σO (for ρ ≠ 1 in the case of σ̂*D). Since good models tend to have σF ≈ σO, the discontinuity makes σ̂*D in equation (12) a rather inconvenient measure of pattern error.

3. Development of Error Diagnostic Suite

3.1. Pattern Error

[19] To address the questions raised in the last section and generalize the decomposition of pattern errors to vectors, we start from the matrix identity

var(D) = var(F) + var(O) − cov(F, O) − cov(O, F)    (13)

where cov(F, O) = 〈(F − 〈F〉)(O − 〈O〉)ᵀ〉 is the covariance matrix [Feller, 1968] and var(A) ≡ cov(A, A), where A = D, O, F. Scalar variables are one-dimensional vectors, which means the matrices reduce to the familiar scalar quantities: var(D), var(O), var(F) and cov(F, O). For a vector variable, the trace of a covariance matrix is the sum of its diagonal elements, and so,

σA² ≡ tr[var(A)] = Σi var(Ai),  A = D, O, F    (14)
tr[cov(F, O)] = Σi cov(Fi, Oi)    (15)

By taking the trace of equation (13) and normalizing by (σO2 + σF2) instead of just σO2, we have

α = 1 − ρη    (16)


where

α ≡ σD²/(σO² + σF²)    (17)
ρ ≡ tr[cov(F, O)]/(σFσO)    (18)
η ≡ 2σFσO/(σF² + σO²)    (19)

[20] α is the normalized error variance, first defined in Koh and Ng [2009] for vector and scalar variables. There, the diagnostic was shown not to depend on observation variability, thus yielding insights into the forecast errors in a mesoscale model.

[21] ρ is a natural generalization of the correlation coefficient for vector variables, already defined by Dietzius [1916]. Other definitions of vector correlation also exist [e.g., Court, 1958; Crosby et al., 1993], but Aparna et al. [2005] in evaluating the modeling of sea breezes found the correlation diagnostic based on Dietzius' definition to be least noisy.

[22] η is a new measure called the “variance similarity”, or “similarity” for short. It is defined here as the ratio of the geometric mean to the arithmetic mean of σO² and σF². Like σ̂F, employed implicitly in the non-dimensional Taylor diagram, the variance similarity η compares the standard deviations of the forecast and observation. But unlike σ̂F, the variance similarity η has the following advantages:

[23] 1. It is invariant under the exchange of observation and forecast. Since this means η is equivalent for reciprocal values of σF/σO (i.e. x and 1/x, where x ∈ ℝ+), the under- and over-prediction of atmospheric variability are penalized equally.

[24] 2. It increases monotonically as the variance of observation and forecast approach each other. η ≪ 1 indicates that the observation and forecast fluctuate with very different amplitudes; η = 1 denotes matching standard deviation of variations, σF = σO.

[25] In fact, all three metrics, α, ρ and η, are non-dimensional, symmetric with respect to the observation and forecast, and valid for both scalar and vector variables.
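The three metrics can be computed together for scalar or vector series. The sketch below is our own implementation of equations (17)-(19) for (N, d) arrays (d = 1 recovers the scalar case), not the authors' code; it also lets one confirm the identity α = 1 − ρη of equation (16) and the symmetry under exchanging forecast and observation:

```python
import numpy as np

def pattern_diagnostics(F, O):
    """Normalized error variance alpha, correlation rho and variance
    similarity eta per equations (17)-(19), for forecast F and
    observation O of shape (N, d); d = 1 is the scalar case."""
    F, O = np.asarray(F, float), np.asarray(O, float)
    if F.ndim == 1:
        F, O = F[:, None], O[:, None]
    Fa, Oa = F - F.mean(0), O - O.mean(0)                    # anomalies
    var_f = (Fa ** 2).sum(1).mean()                          # sigma_F^2 = tr var(F)
    var_o = (Oa ** 2).sum(1).mean()                          # sigma_O^2 = tr var(O)
    var_d = ((Fa - Oa) ** 2).sum(1).mean()                   # sigma_D^2 = tr var(D)
    rho = (Fa * Oa).sum(1).mean() / np.sqrt(var_f * var_o)   # eq (18)
    eta = 2.0 * np.sqrt(var_f * var_o) / (var_f + var_o)     # eq (19)
    alpha = var_d / (var_f + var_o)                          # eq (17)
    return alpha, rho, eta

rng = np.random.default_rng(1)
O = rng.normal(size=(500, 2))                                # synthetic (u, v) "wind"
F = 0.8 * O + 0.2 * rng.normal(size=(500, 2))                # damped, noisy "forecast"
alpha, rho, eta = pattern_diagnostics(F, O)
```

By construction α = 1 − ρη holds to rounding error, and swapping F and O leaves all three metrics unchanged.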

[26] Motivated by the analogy to the cosine rule in Taylor [2001], the relation in equation (16) can be visualized geometrically, by taking either ρ or η as the cosine of an angular coordinate leaving the other as the radial coordinate. Here, we define the angular coordinate ϕ as

ϕ = arccos η    (20)

ϕ is 0 when forecast and observation vary to the same extent; ϕ is π/2 when either the observation or the forecast is constant. For σF ≈ σO, ϕ is approximately equal to the fractional difference between σF and σO (Appendix A). Equation (16) becomes

α = 1 − ρ cos ϕ    (21)

Figure 2, described in detail in its caption, shows how α, ρ and ϕ are related geometrically; it shall henceforth be called the “correlation-similarity” diagram.

Figure 2.

The correlation-similarity diagram: the magnitude of the correlation |ρ| is the radial distance, denoting the phase agreement, where the lower and upper semi-circles are for positive and negative ρ respectively; ϕ is the angle from the vertical axis, representing the amplitude agreement, where the left and right semi-circles are for σF < σO and σF > σO respectively. The values of η are labeled along the circular edge of the plot and circular contours corresponding to ρ/η = ±1/4, ±1/2, ±1, ±2, ±4 are drawn. |y| denotes the vertical distance from the center of the plot. α increases vertically upward and its values are marked on the left. The “best” model with ρ = 1, η = 1 and hence α = 0 lies at the lowest point and is marked by a star.

[27] The correlation-similarity diagram is designed to make explicit the fundamental symmetry of α and η under σF ↔ σO as measures of differences between two data sets. This symmetry would not be obvious if η were chosen as the radial coordinate, or if α were plotted on a non-dimensional Taylor diagram (which uses (arccos ρ, σ̂F) coordinates). The reason is that with η or σ̂F as a radial coordinate, α as a function of these coordinates cannot be visualized symmetrically about σF = σO. More concrete comparisons with the Taylor diagram will be made in section 5.2 with the help of NWP model verification results.

[28] Equation (16) resolves non-zero α into contributions from the disagreement in phase (ρ < 1) and in amplitude (η < 1) between observation and forecast variations, which is visualized geometrically on the correlation-similarity diagram: the (ρ, arccos η) plane is divided up by isolines of ρ/η. As ρ/η = y implies ρ = y cos ϕ, isolines of ρ/η = y are circles of diameter y, as shown in Figure 2. Models that lie within (outside) the ρ/η = 1 circle are dominated by phase (amplitude) errors, and the smaller (larger) the value of ρ/η, the greater is the relative contribution from phase (amplitude) errors.

[29] In the correlation-similarity diagram, α is denoted by the vertical distance. At the bottom, α = 0 denotes no pattern error: forecast and observation variability match in both amplitude and phase. In the middle, α = 1 denotes that the model does only as well as a random forecast made from knowing the climatological mean and having equal chance of predicting an anomaly in one direction or the opposite, i.e. cov(F, O) = 0 (cf. equations (13) and (17)). At the top, α = 2 denotes maximum pattern error: forecast and observation vary with the same amplitude but are exactly out of phase. Therefore, α calibrates the pattern error made by a model against that made by the aforementioned random forecast.

[30] With reference to Figure 2, for fixed non-zero η, α is minimized by increasing the correlation ρ. For fixed positive ρ, minimizing α implies maximizing η, which means the forecast variability approaches the observed variability (equation (16)). For fixed negative ρ, minimizing α requires minimizing η, i.e. increasing the disagreement in the amplitude of the variations, which may sound perplexing at first. But actually, when the forecast variations tend to oppose the observed variations (and hence exhibit negative correlation), it makes sense for the least disagreement to be achieved by either forecast or observation being nearly constant. Nonetheless, η → 0 also corresponds to either σF → ∞ or σO → ∞. This would mean α is minimized simply because the normalization reference tends to infinity, but this is a common problem for normalized measures. For example, σ̂D is also minimized in regions where the observed variability σO is very large for negative ρ (cf. Figure 1). Luckily, negative correlations usually indicate some severe problem in the underlying physics or dynamics, and so models in practical use seldom fall into this category.

[31] As for η = 0, α is independent of ρ, which is desirable because when either forecast or observation is constant, ρ is undefined. In fact, α can be proven from first principles to be unity in this case. On the other hand, for ρ = 0, η is well-defined, but α is independent of η and equals one. The model performs only as well as a random forecast, but even so, one may reasonably expect to distinguish among those forecasts that rightly reproduce the observed amplitude of variations, those that do not, and those that are constant. So, although α is a good measure of the pattern error against the random forecast, it is not a good skill score to use when the model performs similarly to a random forecast, i.e. when α ≈ 1. But in such cases, η itself clearly provides the necessary supplementary information, and so one might be motivated to design a simple skill score:

display math

[32] Now, α* is in no way unique: it is similar to the skill scores already proposed in section 5 of Taylor [2001]. In contrast, the definition of α is special in the sense that it arises naturally from the symmetric normalization of the centered MSE and results in a simple relation for pattern, phase and amplitude errors (equation (16)). Thus, we focus on the separate use of α and η in the present work and refer the interested reader to Taylor [2001] for a discussion on designed skill scores similar to α*.

3.2. Bias

[33] The correlation-similarity diagram only represents the pattern error and its decomposition into phase and amplitude errors. Thus, there is still a need to incorporate the bias into the diagnostic framework. The generalization of equation (5) for vectors is the matrix equation

〈DDᵀ〉 = 〈D〉〈D〉ᵀ + var(D)    (23)

By taking the trace, we have

〈D·D〉 = 〈D〉·〈D〉 + σD²    (24)

[34] We divide equation (24) by (σO2 + σF2) to remove the dependence on observation variability while preserving the symmetry under the exchange of forecast and observation. Using equation (17) and with some rearrangement and substitutions,

δ² = σ²(1 + μ²)    (25)


where

δ ≡ RMSE/√(σO² + σF²)    (26)
σ ≡ σD/√(σO² + σF²) = √α    (27)
μ ≡ 〈D〉/σD    (28)

[35] δ is the normalized RMSE (NRMSE). σ is the normalized pattern error (NPE), as it arises from the normalization of the standard deviation of the error fluctuations. μ is the normalized bias (NBias). The use of σD to standardize the bias 〈D〉 is recommended when the two populations are not independent [Rosenthal, 1991]. The magnitude of NBias is also the paired t-statistic multiplied by 1/√N, where N is the number of degrees of freedom in the data sample. Despite N being unknown because of spatiotemporal correlation among data values, NBias is a direct measure of statistical significance: a larger NBias implies a more significant bias. So a comparison of NBias between two regions is also a comparison of the statistical significance of the model bias in the two regions.
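The total-error decomposition can be sketched in the same spirit. The helper below is our own illustration of equations (26)-(28) for (N, d) arrays (for scalars the magnitude of the bias is used, so μ here is non-negative):

```python
import numpy as np

def total_error_diagnostics(F, O):
    """NRMSE delta, NPE sigma and NBias mu (equations (26)-(28)) for
    forecast F and observation O of shape (N, d); the identity
    delta^2 = sigma^2 * (1 + mu^2) of equation (25) holds exactly."""
    F, O = np.asarray(F, float), np.asarray(O, float)
    if F.ndim == 1:
        F, O = F[:, None], O[:, None]
    D = F - O                                                 # discrepancy, eq (1)
    var_f = ((F - F.mean(0)) ** 2).sum(1).mean()              # sigma_F^2
    var_o = ((O - O.mean(0)) ** 2).sum(1).mean()              # sigma_O^2
    var_d = ((D - D.mean(0)) ** 2).sum(1).mean()              # sigma_D^2
    delta = np.sqrt((D ** 2).sum(1).mean() / (var_f + var_o))  # NRMSE, eq (26)
    sigma = np.sqrt(var_d / (var_f + var_o))                   # NPE, eq (27)
    mu = np.linalg.norm(D.mean(0)) / np.sqrt(var_d)            # |NBias|, eq (28)
    return delta, sigma, mu

rng = np.random.default_rng(3)
O = rng.normal(size=400)
F = O + 0.4 + 0.5 * rng.normal(size=400)                      # biased, noisy "forecast"
delta, sigma, mu = total_error_diagnostics(F, O)
```

Because δ² = σ²(1 + μ²), δ exceeds σ whenever the bias is non-zero, which is the relation displayed on the error decomposition diagram.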

[36] To present the diagnostics on a polar plot, we can define an angle γ based on the NBias by

γ = arctan μ    (29)

Note that for small bias, tan γ ≈ γ and so γ approximates NBias. Equation (25) can be rearranged as

σ = δ cos γ    (30)

For magnitudes of NBias less than 0.5, the bias makes a negligible contribution to the total error (i.e. RMSE), since for |tan γ| ≲ 0.5, cos γ ≳ 0.9 and δ ≈ σ. Figure 3 illustrates the geometric relation between σ, δ and γ embodied in equation (30). It shows the decomposition of the total error into bias and pattern error contributions in a non-dimensionalized manner and will henceforth be called the “error decomposition” diagram. For a scalar variable, positive and negative biases are distinguished by plotting in the right and left quadrants respectively. For a vector variable, only the right quadrant is used (which could be interpreted as the 2D meridional projection of a 3D hemispherical plot in which the azimuthal angle denotes the direction of the vector bias).

Figure 3.

The error decomposition diagram: the normalized RMSE, δ, is the radial distance labeled along the bottom of the plot; γ is the angle from the vertical axis and is a measure of the normalized bias, μ. The normalized pattern error, σ, is the upward vertical distance, bounded above by δ. The values of μ are labeled around the edge of the plot, where values between −0.5 and 0.5 (bold dashed lines) denote a negligible contribution of the bias, compared to the pattern error, to the total error (i.e. RMSE). The “best” model with δ = 0 is marked by a star.

3.3. Vector Variables

[37] For a vector variable A, σA² ≡ tr[var(A)] does not capture all independent pieces of information in var(A). For a two-dimensional vector like the horizontal wind (u, v), there are three independent pieces of information pertaining to variance, exemplified by var(u), var(v) and cov(u, v). Koh and Ng [2009] proposed the use of an ellipse to capture all this information. Figure 4 (reproduced from Koh and Ng [2009]) shows such an ellipse, where a and b are respectively the square roots of the larger and smaller eigenvalues of var(A). (Such a representation can be generalized to an n-ellipsoid for n-dimensional vectors.) Taking the vector A as the error D of some two-dimensional vector variable, the centered MSE, σD² = a² + b², would be a measure of the size of the “error ellipse”.

Figure 4.

Graphical representation of the mean and variance of a 2D vector A, where λmin and λmaj are the smaller and larger eigenvalues of var(A). The axes of the ellipse are aligned with the corresponding eigenvectors, or equivalently, principal components of vector A. For example, A could be F, O or D.

[38] Suppose the eigenvector with eigenvalue a2 makes a clockwise angle of θ with the vertical axis (Figure 4), where 0 ≤ θ < π. Using the horizontal and vertical unit vectors as the basis, Appendix B shows that

var(D) = (σD²/2) [ 1 − εs cos 2θ     εs sin 2θ ]
                 [ εs sin 2θ     1 + εs cos 2θ ]    (31)

where εs is a symmetrized measure of eccentricity of the ellipse, defined as:

εs = (a² − b²)/(a² + b²)    (32)

For εs ≪ 1, εs is the fractional difference between a and b (proven in Appendix B).

[39] Evidently from equation (31), besides σD, two other diagnostics are necessary to complete the description of vector pattern errors: εs and θ, which measure respectively the extent of anisotropy and the preferred direction of the vector errors. Both diagnostics can be displayed in another polar plot, the “error anisotropy” diagram (Figure 5), in which the radial distance is εs and the polar angle is 2θ, reflecting the order-2 rotational symmetry of the orientation of the error ellipse. εs = 0 corresponds to isotropy while εs = 1 corresponds to maximal anisotropy (i.e. the vector errors align in a straight line). Note that εs and θ characterize the vector pattern error but are not errors per se.
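In practice εs and θ can be obtained from an eigen-decomposition of the 2×2 error covariance matrix. The sketch below is ours and assumes θ is measured clockwise from the v (vertical) axis, our reading of the convention in Figure 4:

```python
import numpy as np

def error_anisotropy(D):
    """Symmetrized eccentricity eps_s and preferred direction theta
    (0 <= theta < pi) of 2-D vector errors D with shape (N, 2)."""
    Da = np.asarray(D, float)
    Da = Da - Da.mean(0)
    C = Da.T @ Da / len(Da)                   # 2x2 covariance matrix var(D)
    evals, evecs = np.linalg.eigh(C)          # eigenvalues in ascending order
    b2, a2 = evals                            # b2 = b^2 <= a2 = a^2
    eps_s = (a2 - b2) / (a2 + b2)             # symmetrized eccentricity
    u, v = evecs[:, 1]                        # major-axis eigenvector (u, v)
    theta = np.arctan2(u, v) % np.pi          # clockwise angle from vertical axis
    return eps_s, theta

rng = np.random.default_rng(4)
# Synthetic errors stretched along the zonal (u) direction: strong anisotropy.
D = np.column_stack([3.0 * rng.normal(size=5000), 0.5 * rng.normal(size=5000)])
eps_s, theta = error_anisotropy(D)
```

For this example the major axis is nearly horizontal, so θ is close to π/2 and εs is close to (9 − 0.25)/(9 + 0.25) ≈ 0.95.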

Figure 5.

The error anisotropy diagram: the radial distance εs is a symmetrized measure of eccentricity quantifying the extent of anisotropy of the vector pattern errors; the angular coordinate 2θ reflects the preferred direction θ of the errors. Note the cardinal directions are folded together. There is no “best” model on this diagram although the center is special as it stands for isotropy in the vector pattern error.

4. Assessing NWP Skill

4.1. Model Description and Domains

[40] The atmospheric module of the COAMPS model is a nonhydrostatic, compressible atmospheric model using terrain-following σz coordinates [Gal-Chen and Somerville, 1975]. There are 11 prognostic variables: horizontal wind, vertical velocity, Exner's function, potential temperature, mixing ratios of water vapor, cloud water, cloud ice, rain and snow, and turbulent kinetic energy. The main physical parameterizations are Harshvardhan et al.'s [1987] radiation scheme, Kain and Fritsch's [1990] cumulus scheme, Rutledge and Hobbs's [1983] cloud microphysics scheme, Mellor and Yamada's [1974] turbulence scheme and Louis's [1979] surface flux scheme. Simple land [Deardorff, 1980] and sea [Chang, 1985] models are incorporated. The lateral boundary conditions are of Davies's [1976] type. More model details can be found in the work of Hodur [1997].

[41] Navy Operational Global Atmospheric Prediction System (NOGAPS) [Hogan and Rosmond, 1991] provides global fields for the initial cold start and boundary conditions for the 81-km domain. Boundary conditions for the 27-km domain were taken from the 81-km domain. A MultiVariate Optimal Interpolation (MVOI) scheme analyzes conventional and satellite observations every 12 hours [Barker, 1992]. The analyses were used to initialize the model and 12, 24, 36 and 48 h-forecasts were made from 00 and 12 UTC daily. The 12-hr forecasts could also be used to warm start the assimilation-forecast cycles.

[42] Southeast Asia (94°E to 132°E and 11.5°S to 25.5°N) and southeastern USA (25°N to 40°N and 75°W to 95°W) were chosen to compare the accuracy and skill of COAMPS in NWP for the tropics and extratropics respectively. Moreover, the COAMPS model was developed in the USA for weather prediction and so it seems appropriate to benchmark model performance against that region. Southeast Asia straddles the equator reaching northern subtropical latitudes and its climate is heavily influenced by the Asian-Australian monsoon. Southeastern USA spans the subtropics and midlatitudes and, according to the most recent Koppen-Geiger climate classification [Kottek et al., 2006; Peel et al., 2007], has a temperate humid climate with hot summers. The model was separately run using Mercator and Lambert conic conformal projections in the two regions. Two-level nesting at reference resolutions 81 km and 27 km and 61 vertical levels were used in each region. Twice daily COAMPS assimilation-forecast cycles were carried out from 00 UTC 1 June 2007 to 31 May 2008 in both regions.

4.2. Observation Data Set for Verification

[43] Twice-daily radiosonde data were used for NWP verification in this work. Two scalars and one vector, namely temperature, dew point depression and horizontal wind, were verified using the suite of diagnostics developed in section 3. Data were obtained at 9 World Meteorological Organization (WMO) mandatory pressure levels (in mb): 1000, 925, 850 (lower troposphere); 700, 500, 400 (middle troposphere); 300, 200 and 100 (upper troposphere). The seasons were delineated based on the change in the observed mean wind and temperature across each month. In this paper, we focus on the summer and winter periods: summer is defined as June to September for Southeast Asia and June to August for the USA; winter is defined as December to March for Southeast Asia and December to February for the USA.

[44] The radiosonde stations in Southeast Asia were further split into two groups: continental (higher-latitude) and maritime (lower-latitude). For convenience henceforth, “continental Southeast Asia”, “maritime Southeast Asia” and “southeastern USA” will be referred to as the “continental”, “maritime” and “US” regions respectively (Figures 6 and 7). There are 21 continental, 27 maritime and 24 US stations in the data set, after rejecting stations with little data. The only exceptions are stations 46, 47 and 48 in Figure 6: they are retained despite having less data because they fill in data gaps in the Philippines and the South China Sea. Stations 24, 37 and 40 are grouped with the continental stations because they lie in the higher tropical latitudes and are much influenced by continental air masses.

Figure 6.

The 48 radiosonde stations in Southeast Asia, with the solid curve dividing them into continental and maritime groups with 21 and 27 stations respectively.

Figure 7.

The 24 radiosonde stations in and around southeastern USA.

[45] Data quality control is of paramount importance as none of the statistics used (bias, variance, covariance) is robust; that is, they are all sensitive to outliers [Huber and Ronchetti, 2009]. The radiosonde data used have all passed through a rigorous quality control process, including comparison with climatological ranges, a lapse-rate test, a vertical wind-shear test and a hydrostatic consistency check [Baker, 1992]. To screen out extreme temperature and dew point depression readings, values more than 3 standard deviations away from the mean of the readings over one season at each pressure level were eliminated. Dew point depression at 100 mb and 200 mb was not used since instrumental error was believed to be larger than forecast error because of the very low specific humidity.
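The seasonal 3-standard-deviation screening can be sketched as follows (a minimal illustration in NumPy; the threshold logic follows the text but the array values are invented):

```python
import numpy as np

def screen_outliers(values, n_sigma=3.0):
    """Boolean mask keeping readings within n_sigma standard deviations
    of the seasonal mean at one pressure level (as described in the text)."""
    values = np.asarray(values, dtype=float)
    dev = np.abs(values - values.mean())
    return dev <= n_sigma * values.std()

# Illustrative seasonal 850-mb temperatures (K) with one extreme reading
temps = np.concatenate([np.linspace(298.0, 300.0, 20), [312.0]])
mask = screen_outliers(temps)
clean = temps[mask]  # the 312.0 K reading is discarded
```

Note that the mean and standard deviation are themselves computed from the unscreened sample, so a single gross outlier inflates the threshold; with very few readings at a level, such screening can miss outliers entirely.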

4.3. Verification Method

[46] Model forecasts were first linearly interpolated from σz coordinates onto the mandatory pressure levels. The Mercator and Lambert conformal grids were Delaunay triangulated and the forecast values at the triangles' vertices (which are also model grid points) were interpolated onto the station locations that lie within the triangles. The interpolation error was subsumed conceptually as part of the forecast error, because NWP forecasts for station locations always involve some form of interpolation.
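The horizontal interpolation step can be sketched with SciPy, whose LinearNDInterpolator Delaunay-triangulates the grid points and interpolates linearly within each triangle, as described above; the grid coordinates and test field below are invented for illustration:

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Illustrative model grid points (projected x-y coordinates) and a linear
# test field defined on them; real COAMPS grids would be used in practice.
rng = np.random.default_rng(0)
grid_xy = rng.uniform(0.0, 10.0, size=(200, 2))
field = 2.0 * grid_xy[:, 0] - 0.5 * grid_xy[:, 1] + 1.0

# Triangulate the grid (Delaunay) and interpolate linearly within each
# triangle onto the station locations.
interp = LinearNDInterpolator(grid_xy, field)
stations = np.array([[2.5, 3.5], [7.0, 1.2]])
station_values = interp(stations)
```

Linear interpolation reproduces a linear field exactly inside the convex hull of the grid, which makes the scheme easy to sanity-check; stations outside the hull return NaN and must be excluded.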

[47] The mean and variance of the forecasts, observations and forecast-observation discrepancies for each variable at each pressure level were calculated over all stations. δ, σ and μ, and hence α and γ, were computed for the decomposition of the total error. ρ and η, and hence ϕ, were calculated to further resolve the pattern error. For the wind vector, the eigenvalues and eigenvectors of the variance tensor were calculated to derive εs and θ, where θ was measured clockwise from north.
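As a concrete illustration, the scalar diagnostics can be computed as below. The formulas are our reading of the definitions in sections 2 and 3 (σD² = σF² + σO², μ = bias/σD, η = 2σFσO/(σF² + σO²), α = 1 − ρη, δ² = μ² + α) and should be checked against the paper's equations:

```python
import numpy as np

def error_diagnostics(f, o):
    """Normalized error diagnostics for a scalar variable, following the
    decomposition described in the text: delta**2 = mu**2 + alpha and
    alpha = 1 - rho*eta.  The definitions below are our reading of the
    paper's equations and should be checked against sections 2 and 3."""
    f, o = np.asarray(f, float), np.asarray(o, float)
    sf, so = f.std(), o.std()
    sigma_d = np.sqrt(sf**2 + so**2)           # combined variability
    mu = (f.mean() - o.mean()) / sigma_d       # NBias
    rho = np.corrcoef(f, o)[0, 1]              # correlation (phase agreement)
    eta = 2.0 * sf * so / (sf**2 + so**2)      # variance similarity (amplitude)
    alpha = max(1.0 - rho * eta, 0.0)          # normalized error variance
    return dict(mu=mu, rho=rho, eta=eta, alpha=alpha,
                npe=np.sqrt(alpha),            # NPE
                delta=np.sqrt(mu**2 + alpha))  # NRMSE
```

For a perfect forecast (f = o) all the error metrics vanish; doubling the forecast variability leaves ρ = 1 but lowers η to 0.8, giving α = 0.2, which shows the symmetric penalty on amplitude errors.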

5. Results I: Advantages of the Proposed Diagnostics

5.1. Normalization of Errors

[48] To illustrate the effect of normalization, Figure 8 provides an example of the old and the proposed error diagnostics for temperature at 850 mb across the different regions and seasons. In both panels, while bias and normalized bias exist, the total errors (RMSE and NRMSE) are dominated by the contribution from pattern errors (centered RMSE and NPE). Bias, centered RMSE and RMSE are all larger during winter for the higher latitudes (USA and continental regions). After the ambient variability of temperature is removed by normalization, the pattern error (NPE) and hence the total error (NRMSE) show the opposite trend: they are smaller for winter in the US and continental regions, but larger in the maritime region regardless of the monsoon season. The continental region during the summer monsoon also has larger pattern and total errors, like the maritime region. The trend in pattern and total errors reflected by the proposed diagnostics is more reasonable: convection, which is highly active in equatorial regions and in the summer season, is generally not captured as well by models as the large-scale baroclinic eddies of the higher latitudes, especially in winter.

Figure 8.

Comparison of the (a) old and (b) proposed set of error diagnostics for the 24 h-forecasts of temperature (K) at 850 mb from 27-km grid using warm start for the different domains and seasons (U, USA; C, continental; M, maritime regions; w, winter; s, summer). Circles, triangles and crosses denote respectively bias, pattern and total error measures. The ambient variability in temperature is represented by the square root of the combined variance of forecast and observation and denoted by asterisks.

[49] While the normalization of the bias by σD to yield NBias does not qualitatively alter the trend in model performance, it gives a clear picture that the bias makes a negligible contribution to the total error measured by NRMSE, as the NBias magnitudes are all less than 0.5. Likewise, NPE being less than 1 provides an unequivocal quantitative statement of the extent to which the model performs better than a random forecast drawn from the historical mean and variability.

5.2. Comparison With the Taylor Diagram

[50] The utility of the proposed diagrams lies in the presentation of the inter-relationship between the normalized error metrics in an easy-to-understand geometrical way, which will be demonstrated in section 6 and Figures 12, 13 and 14. But it is worthwhile first comparing the correlation-similarity diagram with the non-dimensional Taylor diagram since the two present similar information but in different ways. They differ in the choice of angular coordinate (arccos η or arccos ρ) and in the normalization reference used to yield different measures of variance similarity (η or σ̂F).

[51] In the Taylor diagram, analogy with the cosine rule dictates arccos ρ as the angular coordinate. So there is the advantage of user familiarity in keeping with this convention. But if we make this choice, we leave η as the radial coordinate, which does not visualize amplitude errors clearly. The reason is that when the model variability is close to the observed, η ≈ 1 − ν²/2, where ν is the fractional difference between the observed and modeled variabilities (Appendix A). ν² is much smaller than |ν|, which means η as a radial coordinate does not resolve differences as well as σ̂F ≡ 1 + ν employed in the Taylor diagram.

[52] To illustrate the above problem, we adopt tentatively arccos ρ as the angular coordinate and define as the radial coordinate a modified measure of variance similarity:

η* = η  for σF ≤ σO;   η* = 2 − η  for σF > σO.

The modified definition above serves to plot the data for σF > σO in an outer ring as in the Taylor diagram. Otherwise those data would have to be plotted in a lower semi-circle to avoid overlapping with data having the same value of η but with σF < σO, which would break the continuity across σF = σO. In preserving this continuity in the radial direction, explicit visualization of the symmetry of η and α under σO ↔ σF exchange is lost. For σF > σO, η* ≈ 1 + ν²/2 for small ν and so the same problem of poor resolution exists around η ≈ 1 as for σF ≤ σO, where η* ≡ η.

[53] Figure 9 compares the Taylor diagram with the alternative correlation-similarity diagram for temperature, showing clearly that the information is poorly resolved with the choice of (arccos ρ, η) as angular and radial coordinates respectively. Having adopted the opposite convention, i.e. (arccos η, ρ) as angular and radial coordinates respectively, Figure 13b shows that the same information is resolved as well as in the Taylor diagram because arccos η ≈ |ν| for small ν.

Figure 9.

(a) The Taylor diagram and (b) the alternative correlation-similarity diagram corresponding to Figure 13b for temperature. Colored contours denote isolines of normalized error variance α and black contours denote isolines of the correlation-to-similarity ratio at the values ρ/η = ±1/4, ±1/2, ±1, ±2, ±4. The “best” model is marked by a star. Results from 4 pressure levels (1000, 850, 500, 300 mb) are plotted in sequence as a line, where kinks mark the values at the pressure levels and a circle denotes the 1000-mb level. Only 24 h-forecasts at 27 km resolution using warm starts are shown.

[54] An important question now arises: when is η preferred to σ̂F as a measure of variance similarity? The answer for the case at hand would determine the choice between the correlation-similarity diagram and the non-dimensional Taylor diagram. σ̂F is evidently useful whenever σ̂D is the diagnostic of interest because the two quantities are geometrically related in the non-dimensional Taylor diagram. Other diagnostics like α can also be plotted on the Taylor diagram (colored contours in Figure 9a) but there is no general advantage if those diagnostics rather than σ̂D are of interest. In particular, one would avoid σ̂D as a measure of pattern error if the symmetry between over- and under-prediction of variance is important (see also section 5.3).

[55] When the wish is to analyze pattern errors into contributions from phase and amplitude errors, α, ρ and η are the appropriate diagnostics, and plotting the data on the correlation-similarity diagram (Figure 13b) facilitates understanding by appeal to geometric reasoning and direct reference to the (ρ, η)-coordinates, which is not possible on the Taylor diagram. The isolines of ρ/η which help in the analysis take the simple form of circles in the correlation-similarity diagram but are complicated on the Taylor diagram (black contours in Figure 9a). There is also no need to plot α-isolines on the correlation-similarity diagram because α is simply the vertical coordinate.

5.3. A Different Account of Amplitude Errors

[56] Figure 10 compares the model performance in dew point depression at 850 mb for the three regions in summer as evaluated by σ̂D and α, plotted for convenience on the non-dimensional Taylor diagram and the correlation-similarity diagram respectively. While the general increase in pattern error with forecast time is reflected in both diagrams, there are important differences in the details. Focusing on the 12 h-forecasts (triangles), the pattern error in the US region is apparently the largest among the three regions as indicated by σ̂D (magenta contours) in the Taylor diagram, but the smallest as indicated by α (vertical coordinate) in the correlation-similarity diagram. In the Taylor diagram, model under-prediction of variance is rewarded with a smaller σ̂D so that the diagnosed performance is enhanced in the continental and maritime regions where σF < σO (as explained after equation (10) in section 2). In fact, since the correlation ρ ≈ 0.4, in this case σ̂D minimizes when σF ≈ 0.4 σO! In the correlation-similarity diagram, α minimizes at σF = σO for all ρ > 0: the under-prediction of variance increases α for the maritime and continental regions.
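This contrast can be verified numerically. Taking the cosine-rule form of equation (10), σ̂D² = 1 + σ̂F² − 2ρσ̂F, and α = 1 − ρη with η = 2σ̂F/(1 + σ̂F²) (both are our reading of the text's definitions), a brute-force minimization over σ̂F = σF/σO at ρ = 0.4 recovers the stated minimizers:

```python
import numpy as np

rho = 0.4                            # correlation, as in the 12 h-forecasts above
s = np.linspace(0.01, 2.0, 200001)   # s = sigma_F / sigma_O

# Normalized centered RMSE (cosine rule) and normalized error variance;
# both expressions are our reading of equations (10) and (16).
sigma_d_sq = 1.0 + s**2 - 2.0 * rho * s
alpha = 1.0 - rho * (2.0 * s / (1.0 + s**2))

s_best_taylor = s[np.argmin(sigma_d_sq)]   # minimized near s = rho
s_best_alpha = s[np.argmin(alpha)]         # minimized near s = 1
```

Differentiating confirms the grid search: σ̂D² is minimized at σ̂F = ρ, whereas α is minimized where η peaks, i.e. at σ̂F = 1, for any ρ > 0.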

Figure 10.

(a) The Taylor diagram and (b) correlation-similarity diagram and (c and d) their respective magnified portions for dew point depression at 850 mb on 27 km grid using cold start. Three regions (black, US; red, continental; blue, maritime) during summer are plotted where each line traces in sequence the 12-, 24-, 36- and 48-h forecasts. The 12-h forecast is denoted by a triangle.

[57] In Figure 10d, the 12 h-forecast in the maritime region has poorer performance in α than the continental region despite closer agreement between modeled and observed variance (i.e. higher η, dashed radial lines). The reason lies in the larger phase error (i.e. smaller ρ, dashed curves) in the maritime region. In Figure 10c, compared to the continental region, the better prediction of variability (i.e. σ̂F nearer 1) in the maritime region is the main and somewhat ironic reason for the poorer performance measured by σ̂D. So even when the diagnoses from the two metrics are consistent, the reasons identified may not be the same. Mathematically speaking, for σF ≈ σO, the contribution of ν ≡ σ̂F − 1 to σ̂D is first-order (from equation (10)) and that to α is second-order (from equations (16) and (A2)). Thus, σ̂D gives stronger emphasis to the contribution of amplitude errors than α. This strengthens the reward for under-prediction of variance and eclipses the role of better correlation in achieving a smaller σ̂D.

[58] If the non-dimensional Taylor diagram is used, it may be advantageous to also check the model performance against α-isolines. Otherwise, plotting on the correlation-similarity diagram would be good because all data points lie within the ρ/η = 1/2 circle (Figure 10b), reaffirming that the contribution of amplitude errors is secondary to that of phase errors. Moreover, in the US region, ρ/η falls rapidly with forecast time, showing that the growth rate of amplitude errors is weaker than the growth rate of phase errors.

5.4. Directional Information on Vector Errors

[59] The proposed diagnostics reveal important directional information on the wind vector errors, which has been largely overlooked in model verification so far (perhaps for the lack of appropriate diagnostics before now). Figure 11 shows that for all regions and seasons, the anisotropy in the wind error variance is weak (i.e. εs ≲ 0.2). This suggests that the dominant source of pattern errors in the wind prediction is isotropic, e.g. more likely to arise in the model from spurious gravity waves than from Kelvin waves (which have only east-west velocity perturbations).
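A sketch of the eigen-analysis behind εs and θ (section 4.3) is given below; the normalization εs = (λ1 − λ2)/(λ1 + λ2) for the eigenvalues λ1 ≥ λ2 of the error covariance tensor is our assumption and may differ from the paper's exact definition:

```python
import numpy as np

def wind_error_anisotropy(du, dv):
    """Eigen-analysis of the 2x2 covariance tensor of the vector wind
    error (du, dv).  The anisotropy eps = (l1 - l2)/(l1 + l2) is an
    assumed normalization; theta is the major-axis direction in degrees,
    measured clockwise from north and folded into [0, 180)."""
    cov = np.cov(np.vstack([du, dv]))        # 2x2 error covariance tensor
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    l2, l1 = eigvals
    major = eigvecs[:, 1]                    # eigenvector of the larger eigenvalue
    eps = (l1 - l2) / (l1 + l2)              # 0 = isotropic errors
    theta = np.degrees(np.arctan2(major[0], major[1])) % 180.0
    return eps, theta

# Illustrative errors dominated by the east-west (u) component
du = np.array([3.0, -3.0, 1.0, -1.0])
dv = np.array([0.1, -0.1, 0.05, -0.05])
eps, theta = wind_error_anisotropy(du, dv)   # strongly anisotropic, axis near 90 deg
```

Since an eigenvector and its negative define the same axis, θ is only meaningful modulo 180 degrees, hence the folding.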

Figure 11.

Anisotropy diagram for wind. Only a magnified view of the central portion of the diagram is shown. Results from 4 pressure levels (1000, 850, 500, 300 mb) are plotted in sequence as a line and a circle denotes the 1000-mb level. Only 24 h-forecasts at 27 km resolution using warm starts are shown.

[60] Earlier results in Koh and Ng [2009] seem to indicate some relation of the weak anisotropy to the background wind in Southeast Asia, using spring-early summer radiosonde data from the South China Sea Monsoon Experiment (1 May to 30 June 1998). Here, we have separately used longer data sets to diagnose the winter (December to March) and summer (June to September) monsoons of the maritime region (blue lines in Figure 11) and made two new findings: (1) there is a near-90 degree rotation in the major axis of the pattern error between the upper level (300 mb) and near-surface (1000 mb); (2) separately at these two levels, the major axis also rotates nearly 90 degrees between winter and summer. Such reversing behavior in the (weak) error anisotropy between seasons and between upper level and near-surface is reminiscent of the reversal of background wind between seasons and between upper and lower levels in the monsoon system. But how the background wind influences the growth of pattern errors in specific directions is still unclear and awaits further research.

6. Results II: Evaluation of COAMPS Model

6.1. Overall Model Performance

[61] Apart from the error anisotropy diagram presented in section 5.4, the error decomposition diagram (Figure 12) and correlation-similarity diagram (Figure 13) summarize concisely the other error metrics for the COAMPS model. They offer insights into the nature of the prediction error, be it overall bias or phase and amplitude errors.

Figure 12.

Error decomposition diagrams for (a) wind, (b) temperature and (c) dew-point depression. In practice, only the region δ ≤ 1.2 needs to be shown, compared to Figure 3. The same pressure levels from the same model runs are displayed using the same key as in Figure 11. (NB. In Figure 12a, the values for the 500-mb and 850-mb levels for continental summer are so close that they appear as a single kink in the line plot at this scale.)

Figure 13.

Correlation-similarity diagrams for (a) wind, (b) temperature and (c) dew-point depression. In practice, only the semi-circle for positive correlation needs to be shown, compared with Figure 2. The darker circles and circular arcs denote ρ/η = 1/4, 1/2, 1, 2, 4. The same pressure levels from the same model runs are displayed using the same key as in Figure 11.

[62] For wind (Figure 12a), the magnitude of NBias μ is mostly smaller than 0.3 for all pressures, regions and seasons. However, NBias can be larger than 0.5 at certain levels for temperature (Figure 12b) and dew point depression (Figure 12c). Therefore, bias contributes negligibly to the RMSE for wind but this may not be so for temperature and dew point depression. A cold bias was observed across the three regions and two seasons at all but the 300-mb level. The model also over-predicts the humidity (negative bias for dew point depression) at all but the 850-mb level in the maritime and US regions. Using the more restricted SCSMEX data set in Southeast Asia, Koh and Ng [2009] noted similar cold and moist biases and suggested that the two could be related.

[63] The decomposition of the normalized error variance α into correlation ρ and variance similarity η makes it clear that poor correlation contributes more to the pattern error: in all cases (Figure 13), the error metrics lie within the ρ/η = 1 circle or even within the ρ/η = 1/2 circle. The large spatiotemporal phase errors for all variables reflect the need to improve the timing and location of weather systems in the NWP model much more than their intensity. Alternatively, forecasters may rely more on intensity forecasts and make suitable modifications to timing and location forecasts. For wind (Figure 13a), η ≥ 0.975 signifies that wind variability is well captured by the model and subgrid wind variations are not important. In contrast, there is a general under-prediction of variability (ϕ < 0) for temperature (Figure 13b) and dew point depression (Figure 13c). This is understandable because the modeled values represent averages over a grid box whereas observed values are point measurements subject additionally to subgrid variations not captured by the model.

6.2. Regional and Seasonal Differences

[64] Comparison of the error metrics across regions (Figures 12 and 13) reveals that the overall performance for all variables degrades generally from US, through continental, to maritime regions. We denote this ordering in short-hand notation as:

US > continental > maritime,

where “>” denotes better overall model performance.

The ordering in NRMSE δ across regions is evident for wind and temperature but less so for dew point depression (Figure 12). Although expected to exist, the gap in model performance between the tropical and extratropical regions is rather large. For example, for temperature (Figure 12b), 0.2 ≲ δ ≲ 0.4 in the US region whereas 0.5 ≲ δ ≲ 1.0 in the maritime region; this performance gap arises mainly in NPE σ because NBias μ has similar magnitude in both regions. Analyzing the pattern errors (Figure 13), from the extratropics to the tropics, decreasing correlation ρ contributes more to the degradation of NPE for wind and temperature than decreasing variance similarity η does. For example, for temperature at 500 mb, ρ > 0.9 in US winter but ρ < 0.3 in maritime winter (Figure 13b).

[65] As for seasonal differences, Figures 12b and 13b portray the same trend of model performance for temperature (whether measured by NBias or NPE, correlation or variance similarity) as follows:

winter > summer (US and continental regions);   summer > winter (maritime region).

The seasonal order of performance for the maritime region is opposite to that of the other two regions and is likely related to differences in regional climate. The larger bias and pattern (phase and amplitude) errors for temperature in US or continental summer and in maritime winter may perhaps be attributed to generally recognized deficiencies in convective parametrizations [e.g., Slingo et al., 1994]. In the midlatitudes, convection is more active in summer; in Southeast Asia, it is more active in summer for the continental region and in winter for the maritime region [Robertson et al., 2011].

[66] The seasonal differences for wind errors in Southeast Asia show a distinctive character that is absent in the US region: the order of model performance in NRMSE, NPE and correlation for the two seasons switches in going from low to high levels (cf. the kinks in the continental summer and maritime winter lines in Figures 12a and 13a).

lower levels: summer > winter (continental), winter > summer (maritime);
upper levels: winter > summer (continental), summer > winter (maritime).

The reversal in seasonal performance across altitude is not seen for NBias and variance similarity (i.e. the angular coordinates in the two plots), whose departures from the ideal are small in the first place.

[67] We hypothesize that this feature of wind errors has its root in the position and timing errors of modeled convection, which, as mentioned, is more intense in continental summer and maritime winter [Robertson et al., 2011]. Heavy precipitation during convection releases much latent heat that drives the mass circulation in the tropics. Due to the strong vertical gradient in density, the effect on wind velocity is much stronger at upper levels than at lower levels. The wrong position and/or timing of convection should also impact the temperature field more at the upper levels where the latent heat is released. This is borne out by close inspection of Figures 13a and 13b: degradation of correlation occurs for both wind and temperature at 300 and 500 mb in continental summer and maritime winter (see the kinks in the line plots). But the already poor temperature correlations at the lower levels mean that there is no reversal in seasonal performance across altitudes for temperature, unlike for wind.

[68] Other factors may complicate the wind correlations at 850 mb in maritime Southeast Asia. For example, Teo et al. [2011] noted that the land/sea breezes in a mesoscale model are over-sensitive to the small-scale terrain and island features unique to the local region.

6.3. Common Error Features

[69] Despite the above contrasts between regions and seasons, there are a few model error features that are common across the three regions and two seasons, which we demonstrate by comparing temperature at 850 mb. Figure 14 illustrates the most obvious one: degrading the model resolution from 27 km to 81 km increases the bias a little but does not much affect the pattern error, correlation or variance similarity as diagnosed over the synoptic observation networks in Figures 6 and 7. This is likely due to the radiosonde networks not adequately sampling mesoscale weather. In tropical regions where mesoscale weather systems dominate, there is a dire need to build up mesoscale observation networks for better monitoring and modeling for forecast and research [Koh and Teo, 2009]. Mass et al. [2002] report that in midlatitude regions, lower-resolution model runs may sometimes even score better on objective accuracy measures than higher-resolution runs: larger-scale frontal features are adequately captured at lower resolution, while the simultaneous lack of mesoscale model detail and insufficient mesoscale observations means that location or timing errors of mesoscale weather systems are not penalized.

Figure 14.

(a and b) Error decomposition and (c and d) correlation-similarity diagrams for temperature at 850 mb. Three regions for 2 seasons are plotted for model runs at 27-km (Figures 14a and 14c) and 81-km (Figures 14b and 14d) resolutions. Unlike in the previous figures, each line traces in sequence the 12-, 24-, 36- and 48-h forecasts. The 12-h forecast is denoted by a circle for warm start or by a triangle for cold start. The darker circles and circular arcs in Figures 14c and 14d denote ρ/η = 1/4, 1/2, 1, 2, 4.

[70] The advantage of warm starts (circles) over cold starts (triangles) is evident in Figure 14: warm starts show much smaller bias and larger variance similarity (i.e. nearer the vertical axes in all panels) at the same forecast hour for each region and season. Thus, past data assimilation by the regional model helps correct for model bias and better constrains the amplitude of future temperature variations. However, the improvement in correlation from cold starts to warm starts is small, if present at all. This may be understandable: in the data assimilation process, the adjustment of modeled values toward observed values can in principle imply either a forward or a backward shift in the phase of an oscillating variable, so phase errors may not necessarily be reduced.

[71] With increasing forecast hour, the large biases initially present in cold starts are progressively corrected by compensating model dynamics, indicating that the biases are not in equilibrium with other model fields (lines from triangles in Figures 14a and 14b). For example, the cold bias at 850 mb may be reduced by turbulent heat transport from the surface and convergence of radiative flux from the warmer atmosphere above and below. Thus, even without further data assimilation after the initial time, allowing spin-up times of 48 hours or more from cold starts would effectively cut down the initial bias. However, in warm starts where the model has already spun up, the biases tend to grow (lines from circles in Figures 14a and 14b). As correlations fall consistently for both warm and cold starts (all lines with forecast hour in Figures 14c and 14d), position and timing errors from poor model simulation of wavelengths, periods and/or phase velocities accumulate over model integration time. It is plausible that nonlinear interactions feed these errors into the mean state, thereby increasing the bias as is apparent for warm starts. Thus, model integration itself cannot remove bias beyond the spin-up period and in fact can increase pattern errors, especially in phase.

7. Conclusions and Discussion

[72] In this paper we have presented a suite of verification diagnostics applicable to scalar and vector variables, following the normalization and successive decomposition of model errors. The breakdown of the normalized root-mean square error (NRMSE, δ) enables us to determine how much of the overall model error is attributable to the normalized bias (NBias, μ) and the normalized pattern error (NPE, σ). The normalized error variance (α = σ²) is further analyzed into contributions from phase and amplitude errors, measured respectively by the correlation (ρ) and the variance similarity (η). The suite of error diagnostics is summarized in two (for scalars) or three (for vectors) diagrams facilitating visualization and comprehension.

[73] The proposed diagnostic framework was implemented to assess the COAMPS model in maritime and continental Southeast Asia and in the southeastern USA. It was found that NBias contributes much less to NRMSE for wind than for temperature and dew point depression. Most of the pattern error arises from poor correlation, as the agreement between the amplitudes of variations is rather good regardless of the variable, region or season. Thus, much effort is needed to cut down position or timing errors, such as those arising from poor model simulation of phase velocities. The analysis also reveals that the model mostly under-predicts atmospheric variability, which might be expected as models simulate grid-box averages while radiosonde measurements are point-like.

[74] The variance similarity η introduced here is a useful diagnostic to verify weather variability. Such verification is important, e.g. for the forecast of weather fluctuations under different climates. While η is defined in terms of standard deviations, one can easily generalize the diagnostic by taking the ratio of the geometric mean to the arithmetic mean of the modeled and observed 90th or 95th percentile values to verify forecasts of extreme rainfall, temperature or wind. The penalization of both over- and under-prediction would be particularly desirable in this case.
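The suggested generalization amounts to replacing standard deviations by upper-percentile values in the geometric-to-arithmetic-mean ratio; a minimal sketch (the percentile level and function name are illustrative):

```python
import numpy as np

def percentile_similarity(f, o, q=95):
    """Similarity of forecast and observed extremes: the ratio of the
    geometric to the arithmetic mean of the two q-th percentile values,
    generalizing eta as suggested in the text (q and names illustrative)."""
    pf, po = np.percentile(f, q), np.percentile(o, q)
    return np.sqrt(pf * po) / (0.5 * (pf + po))

obs = np.arange(101.0)   # illustrative positive-valued data
over = 4.0 * obs         # over-predicted extremes
# Over- and under-prediction score identically: swapping f and o
# leaves the ratio unchanged, and it equals 1 only when the two
# percentiles agree.
```

Note that this ratio is only meaningful for positive percentile values, as for extreme rainfall or wind speed.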

[75] The analysis by region shows that the model performance deteriorates from southeastern USA through continental Southeast Asia to maritime Southeast Asia. The state of the art in NWP skill in the tropics still needs to catch up with that in the midlatitudes. The error characteristics of continental Southeast Asia are more similar to those of southeastern USA, suggesting that climate type plays a bigger part than geographic location in introducing model errors. This is manifested most clearly for temperature, where the winter error metrics are better for southeastern USA and continental Southeast Asia, but not for maritime Southeast Asia. The degradation of model performance across regions and seasons is perhaps linked with the modeling of convection, which is more active in tropical regions and in the summer season.

[76] Biases are revealed as mostly model effects: they are qualitatively, and sometimes even quantitatively, the same across regions and seasons (e.g. NBias for temperature). The importance of data assimilation to reducing bias is clearly revealed, although model spin-up can reduce the bias somewhat while increasing pattern errors, especially in phase disagreement. The lack of adequate observation of mesoscale weather systems, especially in the tropics, is a serious issue for NWP verification at resolutions of O(10 km).

[77] In comparing pattern errors, we have employed the correlation-similarity diagram but we recommend that this tool be used in a complementary manner to the more established Taylor diagram. The difference in the two diagnostic diagrams lies mainly in the normalization process:

[78] 1. The Taylor diagram can be used without normalization; alternatively, the non-dimensional Taylor diagram uses the observed variance σO² as the normalization reference, leading to σ̂F as the measure of the agreement between modeled and observed variability. The geometric relation based on the cosine rule is thereby preserved as equation (10).

[79] 2. The correlation-similarity diagram uses the combined observed and model variance (σO² + σF²) as the normalization reference, recognizing that the variability inherent in the model's climate (e.g. ranging from the tropics to the midlatitudes) affects the pattern error as much as the observed variability does. This normalization preserves the symmetry under σO ↔ σF exchange and leads to η as the measure of variance similarity and a different geometric relation in equation (16).

[80] The Taylor diagram remains useful in presenting unnormalized errors. When an agreed normalization standard is available, such as a single observation data set, the non-dimensional Taylor diagram is useful to present not only σ̂D with its implicit cosine-rule geometry but also other diagnostics like α.

[81] When models are compared based on largely different observation data sets with disparate variances (e.g. between different regions, seasons or vertical levels), when it is a priori unclear whether the “observation” should be given greater emphasis than the “model” (e.g. between radiosonde soundings and satellite analyses), or when over- and under-prediction of variance are to be treated equally so that σ̂D is of lesser diagnostic value, there is no strong reason to prefer the non-dimensional Taylor diagram per se, except possibly user familiarity. In such cases, the normalized error variance α is a good alternative diagnostic and the correlation-similarity diagram is useful for the geometric interpretation of α in terms of contributions from phase and amplitude errors. But users must watch for a major technical difference from the Taylor diagram: the correlation-similarity diagram uses arccos η as the angular coordinate, which is necessary to better resolve the plot visually about η = 1 and to respect the visual symmetry between over- and under-predicted variance without sacrificing the continuity across σF = σO.

[82] A caveat worth noting is that this set of diagnostics does not account for the impact on the phase and amplitude of observed variables from atmospheric processes occurring at scales smaller than the grid resolution and not explicitly represented in the model. For example, a small-scale gravity wave absent from the model could upset an otherwise good correlation between model and observations at the resolved scales. These subgrid processes may also have an unknown effect on the bias unless they are constrained in principle to have zero average impact. Thus, good performance according to this set of diagnostics may not always be practically attainable unless the observations are first screened to remove these unresolved and unrepresented processes. This caveat in fact applies to a wide class of error diagnostics, including the commonly used RMSE.

[83] The non-dimensionalization of the diagnostics by the inherent scales of variability in the real world and in the model makes it possible to compare model performance among different physical variables, pressure levels, and climatologically disparate regions and seasons. This is vital for a physically meaningful analysis [Watterson, 1996]. In general, verification with such non-dimensional diagnostics should be carried out before limited-area models are implemented in regions different from those for which they were originally developed.

[84] The mathematical generalization of the diagnostic framework to vector variables in this work is only a beginning. By analyzing the complete covariance matrix of the vector wind error, the true extent of anisotropy in the wind errors and the preferred direction of the error variance can be revealed; these would not have been rightly captured by treating the u-wind and v-wind as separate scalars. Further effort in this direction may benefit the development of other diagnostics involving vectors.

[85] Finally, it must be emphasized that although the suite of metrics developed here is applied to verifying an NWP model, it is generally applicable as diagnostics for differences between two data sets.

Appendix A: Preference for (ρ, arccos η)- Over (arccos ρ, η)-Coordinates

[86] An equally legitimate choice might be to use η as the radial distance and arccos ρ as the polar angle. However, models often perform much better in amplitude than in phase, so that σF ≈ σO. In this case, η ≈ 1, ϕ = arccos η ≪ 1, and

η = cos ϕ ≈ 1 − ϕ²/2.    (A1)

Writing σF/σO = 1 + ν, where |ν| ≪ 1, equation (19) becomes

η = 2(1 + ν)/[1 + (1 + ν)²] = 1 − ν²/2 + O(ν³).    (A2)

Comparing the last two expansions for η and ignoring O(ν³), O(ϕ³) and higher-order terms,

ϕ ≈ |ν|.    (A3)

Thus, when the forecast and observation have nearly the same variability, ϕ, i.e. arccos η, is approximately the fractional difference |ν| between the two variabilities (equation (A3)). On the other hand, η depends on the square of the fractional difference, ν², to leading order (equation (A2)), so a small fractional difference between σF and σO shows up as a very small departure of η from 1. Thus, plotting in (arccos ρ, η)-coordinates would squeeze the data points together near η = 1, while plotting in (ρ, arccos η)-coordinates spreads these data points outward from arccos η = 0. Hence, (ρ, arccos η)-coordinates are chosen for the correlation-similarity diagram.
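These expansions are easy to verify numerically. The sketch below assumes the symmetric variance-similarity form η = 2σFσO/(σF² + σO²) for equation (19) (an assumption of the sketch, consistent with the symmetry under σO ↔ σF described above), sets σF/σO = 1 + ν, and confirms that 1 − η shrinks like ν²/2 while ϕ = arccos η shrinks only like |ν|:

```python
import math

def eta_of_ratio(r):
    # variance similarity for sigma_F/sigma_O = r (assumed symmetric form)
    return 2.0 * r / (1.0 + r * r)

for nu in (0.1, 0.05, 0.01):
    eta = eta_of_ratio(1.0 + nu)
    phi = math.acos(eta)
    # eta departs from 1 only at second order in nu ...
    assert abs((1.0 - eta) - nu * nu / 2.0) < nu ** 3
    # ... while phi = arccos(eta) departs from 0 at first order
    assert abs(phi - abs(nu)) < nu * nu
```

For example, a 5% difference in variability (ν = 0.05) moves η by only about 0.00125 but moves arccos η by about 0.05, which is why the (ρ, arccos η)-coordinates spread the data points apart.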

Appendix B: Error Anisotropy and Measures of Eccentricity

[87] The matrix var(D) is diagonalized when expressed with its eigenvectors as the basis as follows:

var(D) = | a²  0  |
         | 0   b² |,    (B1)

where a² and b² (with a ≥ b ≥ 0) are the eigenvalues.

Taking A = D in Figure 4, using the horizontal and vertical unit vectors as the basis and with the notation s = sin θ, c = cos θ,

var(D) = | a²c² + b²s²   (a² − b²)sc  |
         | (a² − b²)sc   a²s² + b²c² |.    (B2)

To simplify the above equation, note that the eccentricity ε of an ellipse with semi-major axis a and semi-minor axis b is conventionally defined by:

ε² = 1 − b²/a² = (a² − b²)/a².    (B3)

But |ε| is not invariant under the exchange a ↔ b. Other measures of eccentricity can be defined to preserve this symmetry; e.g. from the two expressions for ε² above, we are motivated to construct

β = (a² − b²)²/(a²b²),   εs = (a² − b²)/(a² + b²).    (B4)

β was already defined in Koh and Ng [2009], and εs is defined in equation (32). Both β and εs are (squares of) symmetrized measures of eccentricity. Among the three measures ε, β and εs, only εs appears naturally in equation (B2): factoring out (a² + b²)/2 leaves diagonal entries 1 ± εs cos 2θ and off-diagonal entries εs sin 2θ. εs is therefore preferred for representing the anisotropy in the variance of vectors, contrary to the choice made in Koh and Ng [2009].

[88] For small εs, εs is approximately the fractional difference between a and b as shown below:

Writing a/b = 1 + μ, where |μ| ≪ 1,

εs = [(1 + μ)² − 1]/[(1 + μ)² + 1] = (2μ + μ²)/(2 + 2μ + μ²) ≈ μ.    (B5)
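As a numerical check on the algebra above, the sketch below builds var(D) by rotating diag(a², b²) through θ, verifies the εs-factored form of the matrix term by term, and confirms the small-εs approximation. It is pure Python with the 2 × 2 matrices written out by hand, and it takes εs = (a² − b²)/(a² + b²) as the definition (an assumption consistent with the usage above); treat it as an illustrative sketch, not the authors' code.

```python
import math

def var_D(a, b, theta):
    """Rotate diag(a^2, b^2) by theta: R diag(a^2, b^2) R^T for 2x2."""
    s, c = math.sin(theta), math.cos(theta)
    return [[a*a*c*c + b*b*s*s, (a*a - b*b)*s*c],
            [(a*a - b*b)*s*c,   a*a*s*s + b*b*c*c]]

def var_D_eps(a, b, theta):
    """Same matrix via the symmetrized eccentricity eps_s (assumed form)."""
    eps = (a*a - b*b) / (a*a + b*b)     # eps_s, assumed definition
    m = (a*a + b*b) / 2.0               # mean of the eigenvalues
    return [[m * (1 + eps * math.cos(2*theta)), m * eps * math.sin(2*theta)],
            [m * eps * math.sin(2*theta),       m * (1 - eps * math.cos(2*theta))]]

# the two forms agree entry by entry
a, b, theta = 3.0, 2.0, 0.7
M1, M2 = var_D(a, b, theta), var_D_eps(a, b, theta)
assert all(abs(M1[i][j] - M2[i][j]) < 1e-9 for i in range(2) for j in range(2))

# small eps_s: eps_s ~ fractional difference mu, where a/b = 1 + mu
for mu in (0.1, 0.01):
    eps = ((1 + mu) ** 2 - 1) / ((1 + mu) ** 2 + 1)
    assert abs(eps - mu) < mu * mu
```

Note that exchanging a and b flips the sign of εs but leaves its magnitude unchanged, which is the symmetry that the conventional ε lacks.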


Acknowledgments

[89] The authors would like to acknowledge Sue Chen of the Naval Research Laboratory (Monterey), USA, for her advice and assistance in setting up our COAMPS forecast system for research purposes. Special thanks go to Aaron Chiang Qi Ming of the School of Physical and Mathematical Sciences, Nanyang Technological University, for his help with the COAMPS verification work. This is Earth Observatory of Singapore contribution 41.