3.1. Pattern Error
 To address the questions raised in the previous section and to generalize the decomposition of pattern errors to vectors, we start from the matrix identity

var(D) = var(F) + var(O) − cov(F, O) − cov(O, F),    (13)
where cov(F, O) = 〈(F − 〈F〉)(O − 〈O〉)⊺〉 is the covariance matrix [Feller, 1968] and var(A) ≡ cov(A, A), where A = D, O, F. Scalar variables are one-dimensional vectors, for which the matrices reduce to the familiar scalar quantities var(D), var(O), var(F) and cov(F, O). For a vector variable, the trace of the covariance matrix is the sum of its diagonal elements, and so

σA² ≡ tr[var(A)],  A = D, O, F.
By taking the trace of equation (13) and normalizing by (σO² + σF²) instead of just σO², we have

α = 1 − ρη,    (16)

where α ≡ σD²/(σO² + σF²), ρ ≡ tr[cov(F, O)]/(σFσO) and η ≡ 2σFσO/(σO² + σF²).
 α is the normalized error variance, first defined in Koh and Ng for vector and scalar variables. There, the diagnostic was shown not to depend on the observation variability, thus yielding insights into the forecast errors in a mesoscale model.
 ρ is a natural generalization of the correlation coefficient for vector variables, already defined by Dietzius. Other definitions of vector correlation also exist [e.g., Court, 1958; Crosby et al., 1993], but Aparna et al., in evaluating the modeling of sea breezes, found the correlation diagnostic based on Dietzius' definition to be the least noisy.
 η is a new measure called “variance similarity” or “similarity” for short. It is defined here as the ratio of the geometric mean to the arithmetic mean of σO² and σF². Like the ratio σF/σO employed implicitly in the non-dimensional Taylor diagram, the variance similarity η compares the standard deviations of the forecast and observation. But unlike σF/σO, the variance similarity η has the following advantages:
 1. It is invariant under the exchange of observation and forecast. Since this means η is equal for reciprocal values of σF/σO (i.e. x and 1/x, where x ∈ ℝ⁺), the under- and over-prediction of atmospheric variability are penalized equally.
 2. It increases monotonically as the variances of observation and forecast approach each other. η ≪ 1 indicates that the observation and forecast fluctuate with very different amplitudes; η = 1 denotes matching standard deviations of variation, σF = σO.
 In fact, all three metrics, α, ρ and η, are non-dimensional, symmetric with respect to the observation and forecast, and valid for both scalar and vector variables.
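As an illustration of how the three diagnostics follow from the sample statistics, the sketch below computes α, ρ and η for paired forecast and observation samples of a vector variable. This is a minimal sketch: the function name `pattern_diagnostics` and the NumPy layout (samples stored row-wise) are illustrative assumptions, not from the original.

```python
import numpy as np

def pattern_diagnostics(F, O):
    """Compute alpha, rho, eta for forecast F and observation O.

    F, O: arrays of shape (N, d) -- N samples of a d-dimensional vector.
    """
    Fa = F - F.mean(axis=0)                 # forecast anomalies
    Oa = O - O.mean(axis=0)                 # observation anomalies
    n = len(F)
    var_F = np.trace(Fa.T @ Fa) / n         # sigma_F^2 = tr[var(F)]
    var_O = np.trace(Oa.T @ Oa) / n         # sigma_O^2 = tr[var(O)]
    cov_FO = np.trace(Fa.T @ Oa) / n        # tr[cov(F, O)]
    var_D = np.trace((Fa - Oa).T @ (Fa - Oa)) / n   # sigma_D^2

    alpha = var_D / (var_O + var_F)                      # normalized error variance
    rho = cov_FO / np.sqrt(var_F * var_O)                # vector correlation
    eta = np.sqrt(var_F * var_O) / (0.5 * (var_F + var_O))  # variance similarity
    return alpha, rho, eta
```

By construction α = 1 − ρη holds exactly, since σD² = σF² + σO² − 2 tr[cov(F, O)].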
 Motivated by the analogy to the cosine rule in Taylor, the relation in equation (16) can be visualized geometrically by taking either ρ or η as the cosine of an angular coordinate, leaving the other as the radial coordinate. Here, we define the angular coordinate ϕ as

ϕ ≡ arccos η.
ϕ is 0 when forecast and observation vary to the same extent; ϕ is π/2 when the observation or forecast is constant. For σF ≈ σO, ϕ is approximately equal to the fractional difference between σF and σO (Appendix A). Equation (16) becomes

α = 1 − ρ cos ϕ.    (17)
Figure 2, described in detail in its caption, shows how α, ρ and ϕ are related geometrically; this plot shall henceforth be called the “correlation-similarity” diagram.
Figure 2. The correlation-similarity diagram: the magnitude of correlation |ρ| is the radial distance, denoting the phase agreement, where the lower and upper semi-circles are for positive and negative ρ respectively; ϕ is the angle from the vertical axis, representing the amplitude agreement, where the left and right semi-circles are for σF < σO and σF > σO respectively. The values of η are labeled along the circular edge of the plot and circular contours corresponding to ρ/η = ±1/4, ±1/2, ±1, ±2, ±4 are drawn. |y| denotes the vertical distance from the center of the plot. α increases vertically upward and its values are marked on the left. The “best” model with ρ = 1, η = 1 and hence α = 0 lies at the lowest point and is marked by a star.
 Equation (16) resolves non-zero α into contributions from the disagreement in phase (ρ < 1) and in amplitude (η < 1) between observation and forecast variations, which is visualized geometrically on the correlation-similarity diagram: the (ρ, arccos η)-plane is divided up by isolines of ρ/η. As ρ/η = y implies ρ = y cos ϕ, isolines of ρ/η = y are circles of diameter y, as shown in Figure 2. Models that lie within (outside) the ρ/η = 1 circle are dominated by phase (amplitude) errors, and the smaller (larger) the value of ρ/η, the greater is the relative contribution from phase (amplitude) errors.
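The geometry of the diagram can be made concrete with a small coordinate-mapping sketch. The sign conventions below are inferred from the caption of Figure 2, assuming the center of the plot as the origin and y increasing upward; the function name `diagram_coords` is illustrative.

```python
import math

def diagram_coords(rho, eta, sigma_F, sigma_O):
    """Map (rho, eta) to Cartesian coordinates on the correlation-similarity
    diagram: radial distance |rho|, angle phi = arccos(eta) from the vertical,
    right/left half for sigma_F >/< sigma_O, lower/upper half for rho >/< 0."""
    phi = math.acos(eta)
    x = abs(rho) * math.sin(phi) * (1 if sigma_F >= sigma_O else -1)
    y = -abs(rho) * math.cos(phi) * (1 if rho >= 0 else -1)
    # In either half, y = -rho * eta, so alpha = 1 - rho*eta is recovered
    # as 1 + y: alpha increases vertically upward, from 0 at the bottom.
    return x, y
```

Note that regardless of the sign of ρ, the vertical coordinate satisfies y = −ρη, so the height above the lowest point of the diagram equals α.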
 In the correlation-similarity diagram, α is denoted by the vertical distance. At the bottom, α = 0 denotes no pattern error: forecast and observation variability match in both amplitude and phase. In the middle, α = 1 denotes that the model performs only as well as a random forecast made from knowing the climatological mean and having equal chance of predicting an anomaly in one direction or the opposite, i.e. cov(F, O) = 0 (cf. equations (13) and (17)). At the top, α = 2 denotes maximum pattern error: forecast and observation vary with the same amplitude but are exactly out of phase. Therefore, α calibrates the pattern error made by a model against that made by the aforementioned random forecast.
 With reference to Figure 2, for fixed non-zero η, α is minimized by increasing the correlation ρ. For fixed positive ρ, minimizing α implies maximizing η, which means the forecast variability approaches the observed variability (equation (16)). For fixed negative ρ, minimizing α requires minimizing η, i.e. increasing the disagreement in the amplitude of the variations, which may sound perplexing at first. But when the forecast variations tend to oppose the observed variations (hence the negative correlation), it makes sense for the least disagreement to be achieved by either the forecast or the observation being nearly constant. Nonetheless, η → 0 also corresponds to either σF → ∞ or σO → ∞. This would mean α is minimized simply because the normalization reference tends to infinity. This is a common problem for normalized measures: for example, D is also minimized in regions where the observed variability σO is very large for negative ρ (cf. Figure 1). Fortunately, negative correlations usually indicate some severe problem in the underlying physics or dynamics, and so models in practical use seldom fall into this category.
 As for η = 0, α is independent of ρ, which is desirable because when either the forecast or the observation is constant, ρ is undefined. In fact, α can be proven from first principles to be unity in this case. On the other hand, for ρ = 0, η is well-defined, but α is independent of η and equals one. The model then performs only as well as a random forecast, but even so, one may reasonably expect to distinguish among those forecasts that rightly reproduce the observed amplitude of variations, those that do not, and those that are constant. So, although α is a good measure of the pattern error against the random forecast, it is not a good skill score to use when the model performs similarly to a random forecast, i.e. when α ≈ 1. But in such cases, η itself clearly provides the necessary supplementary information, and so one might be motivated to design a simple skill score:
 Now, α* is in no way unique: it is similar to the skill scores already proposed in section 5 of Taylor. In contrast, the definition of α is special in the sense that it arises naturally from the symmetric normalization of the centered MSE and results in a simple relation among pattern, phase and amplitude errors (equation (16)). Thus, we focus on the separate use of α and η in the present work and refer the interested reader to Taylor for a discussion on designed skill scores similar to α*.
 The correlation-similarity diagram only represents the pattern error and its decomposition into phase and amplitude errors. Thus, there is still a need to incorporate the bias into the diagnostic framework. The generalization of equation (5) for vectors is the matrix equation

〈DD⊺〉 = var(D) + 〈D〉〈D〉⊺.
By taking the trace, we have

〈|D|²〉 = σD² + |〈D〉|².    (24)
 We divide equation (24) by (σO² + σF²) to remove the dependence on observation variability while preserving the symmetry under the exchange of forecast and observation. Using equation (17) and with some rearrangement and substitutions,

δ² = σ²(1 + |μ|²),    (25)

where δ ≡ √[〈|D|²〉/(σO² + σF²)], σ ≡ σD/√(σO² + σF²) = √α and μ ≡ 〈D〉/σD.
 δ is the normalized RMSE (NRMSE). σ is the normalized pattern error (NPE), as it arises from the normalization of the standard deviation of the error fluctuations. μ is the normalized bias (NBias). The use of σD to standardize the bias 〈D〉 is recommended when the two populations are not independent [Rosenthal, 1991]. The magnitude of NBias is also the paired t-statistic multiplied by 1/√N, where N is the number of degrees of freedom in the data sample. Despite N being unknown because of spatiotemporal correlation among data values, NBias is a direct measure of statistical significance: a larger NBias implies a more significant bias. So a comparison of NBias between two regions is also a comparison of the statistical significance of the model bias in the two regions.
 To present the diagnostics on a polar plot, we can define an angle γ based on the NBias by

tan γ ≡ μ.
Note that for small bias, tan γ ≈ γ and so γ approximates the NBias. Equation (25) can be rearranged as

σ = δ cos γ.    (30)
For magnitudes of NBias less than 0.5, the bias makes a negligible contribution to the total error (i.e. RMSE), since for |tan γ| ≲ 0.5, cos γ ≳ 0.9 and δ ≈ σ. Figure 3 illustrates the geometric relation between σ, δ and γ embodied in equation (30). It shows the decomposition of the total error into bias and pattern error contributions in a non-dimensionalized manner and will henceforth be called the “error decomposition” diagram. For a scalar variable, positive and negative biases are distinguished by plotting in the right and left quadrants respectively. For a vector variable, only the right quadrant is used (which could be interpreted as the 2D meridional projection of a 3D hemispherical plot in which the azimuthal angle denotes the direction of the vector bias).
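A minimal sketch of this error decomposition for a scalar variable, assuming the definitions above (δ and σ normalized by √(σO² + σF²), μ by σD); the function name `error_decomposition` is illustrative.

```python
import numpy as np

def error_decomposition(F, O):
    """Compute NRMSE delta, NPE sigma, NBias mu and the angle gamma
    for scalar forecast series F and observation series O."""
    D = F - O
    mse = np.mean(D**2)               # total error squared, <D^2>
    bias = D.mean()                   # <D>
    var_D = D.var()                   # sigma_D^2 (error fluctuations)
    norm = F.var() + O.var()          # sigma_O^2 + sigma_F^2
    delta = np.sqrt(mse / norm)       # normalized RMSE
    sigma = np.sqrt(var_D / norm)     # normalized pattern error, sqrt(alpha)
    mu = bias / np.sqrt(var_D)        # normalized bias
    gamma = np.arctan(mu)             # angle on the error decomposition diagram
    return delta, sigma, mu, gamma
```

The identities δ² = σ²(1 + μ²) and σ = δ cos γ hold exactly, because MSE = σD² + 〈D〉².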
Figure 3. The error decomposition diagram: the normalized RMSE, δ, is the radial distance labeled along the bottom of the plot; γ is the angle from the vertical axis and is a measure of the normalized bias, μ. The normalized pattern error, σ, is the upward vertical distance. The values of μ are labeled around the edge of the plot, where values between −0.5 and 0.5 (bold dashed lines) denote a negligible contribution of the bias compared to the pattern error to the total error (i.e. RMSE). The “best” model with δ = 0 is marked by a star.
3.3. Vector Variables
 For a vector variable A, σA² ≡ tr[var(A)] does not capture all the independent pieces of information in var(A). For a two-dimensional vector like the horizontal wind (u, v), there are three independent pieces of information pertaining to variance, exemplified by var(u), var(v) and cov(u, v). Koh and Ng proposed the use of an ellipse to capture all this information. Figure 4 (reproduced from Koh and Ng) shows such an ellipse, where a and b are respectively the square roots of the larger and smaller eigenvalues of var(A). (Such a representation can be generalized to an n-ellipsoid for n-dimensional vectors.) Taking the vector A as the error D of some two-dimensional vector variable, the centered MSE, σD² = a² + b², would be a measure of the size of the “error ellipse”.
Figure 4. Graphical representation of the mean and variance of a 2D vector A, where λmin and λmaj are the smaller and larger eigenvalues of var(A). The axes of the ellipse are aligned with the corresponding eigenvectors, or equivalently, principal components of vector A. For example, A could be F, O or D.
 Suppose the eigenvector with eigenvalue a² makes a clockwise angle of θ with the vertical axis (Figure 4), where 0 ≤ θ < π. Using the horizontal and vertical unit vectors as the basis, Appendix B shows that

var(D) = (σD²/2) [ I + εs ( −cos 2θ  sin 2θ ; sin 2θ  cos 2θ ) ],    (31)
where εs is a symmetrized measure of the eccentricity of the ellipse, defined as:

εs ≡ (a² − b²)/(a² + b²).
For εs ≪ 1, εs is the fractional difference between a and b (proven in Appendix B).
 Evidently from equation (31), besides σD, two other diagnostics are necessary to complete the description of vector pattern errors: εs and θ, which measure respectively the extent of anisotropy and the preferred direction of the vector errors. Both diagnostics can be displayed in another polar plot, the “error anisotropy” diagram (Figure 5), in which the radial distance is εs and the polar angle is 2θ, reflecting the order-2 rotational symmetry of the orientation of the error ellipse. εs = 0 corresponds to isotropy while εs = 1 corresponds to maximal anisotropy (i.e. the vector errors align in a straight line). Note that εs and θ characterize the vector pattern error but are not errors per se.
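The ellipse diagnostics can be extracted from sample errors by an eigen-analysis of var(D). The sketch below assumes 2D error vectors stored row-wise; the helper name `error_anisotropy` is illustrative.

```python
import numpy as np

def error_anisotropy(D):
    """Eigen-analysis of var(D) for 2D vector errors D of shape (N, 2):
    returns sigma_D^2 = a^2 + b^2, the symmetrized eccentricity eps_s, and
    the orientation theta of the major axis (clockwise from the vertical,
    in [0, pi))."""
    C = np.cov(D, rowvar=False)          # 2x2 covariance matrix var(D)
    evals, evecs = np.linalg.eigh(C)     # eigenvalues ascending: b^2, a^2
    b2, a2 = evals
    eps_s = (a2 - b2) / (a2 + b2)        # symmetrized eccentricity
    vx, vy = evecs[:, 1]                 # major-axis eigenvector (x, y)
    theta = np.arctan2(vx, vy) % np.pi   # clockwise angle from vertical axis
    return a2 + b2, eps_s, theta
```

Since the eigenvector is defined only up to sign and the ellipse has order-2 rotational symmetry, θ is reduced modulo π, matching the 2θ angular coordinate of the error anisotropy diagram.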
Figure 5. The error anisotropy diagram: the radial distance εs is a symmetrized measure of eccentricity quantifying the extent of anisotropy of the vector pattern errors; the angular coordinate 2θ reflects the preferred direction θ of the errors. Note the cardinal directions are folded together. There is no “best” model on this diagram although the center is special as it stands for isotropy in the vector pattern error.