Atmospheric Modeling and Analysis Division, National Exposure Research Laboratory, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA

The use of unbiased symmetric metrics, as outlined in Yu et al. (2006) (hereafter YU2006), simplifies interpretation of comparisons between two datasets. The interpretation is similar for under- and overestimation, in that the modified bias statistic can be viewed in terms of a factor by which one dataset over- or underestimates the other. These metrics are also designed to keep a limited number of very small data values from inflating the results during normalization. These features make unbiased symmetric metrics powerful, easy-to-use tools for quickly understanding the overall difference between two datasets.

In the context presented in YU2006, the metrics are used for air quality parameters, e.g. concentrations produced by a model compared to observations. Within this context, the formulations developed in YU2006 are fully adequate, since concentrations of aerosols and trace gases are always positive. However, cases exist where one needs to compare datasets with negative mean values, such as top-of-the-atmosphere (TOA) shortwave aerosol radiative forcing. In those cases, the formulas presented in YU2006 for the normalized mean bias factor, B_{NMBF}, and the normalized mean absolute error factor, E_{NMAEF}, break down. In this article, we show that, by taking advantage of the symmetry about zero of positive and negative numbers, one can use the absolute values of the dataset means in the original formulas for B_{NMBF} and E_{NMAEF} defined in YU2006 when the means of both datasets being compared are negative. The basic interpretation of the metrics remains identical to that of the original formulations, while the updated versions presented here apply to a wider range of cases and are therefore more robust. We therefore recommend that the updated formulations be used going forward.

2. Metric formulation and expansion to negative means

The original formulations for B_{NMBF} and E_{NMAEF} from YU2006 are:

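$$B_{NMBF} = \begin{cases} \dfrac{\sum_{i}(M_i - O_i)}{\sum_{i} O_i} = \dfrac{\bar{M}}{\bar{O}} - 1, & \bar{M} \ge \bar{O} \\[2ex] \dfrac{\sum_{i}(M_i - O_i)}{\sum_{i} M_i} = 1 - \dfrac{\bar{O}}{\bar{M}}, & \bar{M} < \bar{O} \end{cases} \tag{1}$$

and

$$E_{NMAEF} = \begin{cases} \dfrac{\sum_{i}\lvert M_i - O_i\rvert}{\sum_{i} O_i}, & \bar{M} \ge \bar{O} \\[2ex] \dfrac{\sum_{i}\lvert M_i - O_i\rvert}{\sum_{i} M_i}, & \bar{M} < \bar{O} \end{cases} \tag{2}$$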

where M and O represent the modeled and observed quantities, the i subscript represents a particular value at a given point, and the overbar indicates the mean over all points.
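As a rough illustration of Eqs. (1) and (2), the following Python sketch computes the original YU2006 metrics from paired model and observation sequences. The function names `nmbf_original` and `nmaef_original` are our own hypothetical names, and the sketch assumes equal-length, paired inputs with positive means, the only case YU2006 considered.

```python
def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    values = list(values)
    return sum(values) / len(values)


def nmbf_original(model, obs):
    """B_NMBF as in YU2006, Eq. (1); assumes both means are positive."""
    m_bar, o_bar = _mean(model), _mean(obs)
    if m_bar >= o_bar:
        return m_bar / o_bar - 1.0  # same as sum(M - O) / sum(O)
    return 1.0 - o_bar / m_bar      # same as sum(M - O) / sum(M)


def nmaef_original(model, obs):
    """E_NMAEF as in YU2006, Eq. (2); assumes both means are positive."""
    model, obs = list(model), list(obs)
    abs_err = sum(abs(m - o) for m, o in zip(model, obs))
    # Normalize by sum(O) when the model mean is at least the observed mean,
    # otherwise by sum(M).
    if _mean(model) >= _mean(obs):
        return abs_err / sum(obs)
    return abs_err / sum(model)


# Hypothetical data: the model is twice the observations everywhere.
print(nmbf_original([2, 4, 6], [1, 2, 3]))   # 1.0 -> overestimation by a factor of 2
print(nmaef_original([2, 4, 6], [1, 2, 3]))  # 1.0
```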

The interpretation of these metrics, as described in YU2006, is straightforward if both M̄ and Ō are positive. This caveat is implicit in the context assumed by YU2006 but is never stated explicitly in the article. Users of the metrics need to be aware of it; otherwise, they may unwittingly misinterpret their results when working with negative quantities.

The interpretation of B_{NMBF} is as follows. The sign of B_{NMBF} indicates whether the modeled mean under- or overestimates the observed mean, and the magnitude of B_{NMBF} indicates the factor of that under- or overestimation. Specifically, YU2006 phrase the interpretation as ‘if B_{NMBF} is positive, the model overestimates the observations by a factor of B_{NMBF} + 1’ and ‘if B_{NMBF} is negative, the model underestimates the observations by a factor of 1 − B_{NMBF}.’ A simpler way to think of this, so that only one formula need be remembered, is that the magnitude of the model mean under- or overestimates the magnitude of the observed mean by a factor of 1+|B_{NMBF}|, with a negative B_{NMBF} indicating underestimation and a positive B_{NMBF} indicating overestimation. Referring to the magnitudes of the means is unnecessary when the means are positive, but it anticipates the revised metric presented below.
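A quick worked check of this single-formula reading, using the two positive-mean rows of Table I (N = 1, so each mean is a single value):

$$\bar{M} = 5,\ \bar{O} = 1:\quad B_{NMBF} = \frac{\bar{M}}{\bar{O}} - 1 = 4,\qquad 1 + |B_{NMBF}| = 5 = \frac{\bar{M}}{\bar{O}}$$

$$\bar{M} = 1,\ \bar{O} = 5:\quad B_{NMBF} = 1 - \frac{\bar{O}}{\bar{M}} = -4,\qquad 1 + |B_{NMBF}| = 5 = \frac{\bar{O}}{\bar{M}}$$

Both rows give the same factor, 5, and only the sign distinguishes overestimation from underestimation.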

The interpretation of E_{NMAEF} given by YU2006 is that E_{NMAEF} is the ratio of the mean absolute gross error to either the mean observation, in the case of overestimation, or the mean model value, in the case of underestimation. The mean absolute gross error is defined as

$$E_{MAGE} = \frac{1}{N}\sum_{i=1}^{N}\lvert M_i - O_i\rvert \tag{3}$$

where N is the number of samples. Put more simply, E_{NMAEF} is the ratio of the mean absolute gross error to the smaller of the two means.
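A minimal Python sketch of Eq. (3) and of this "smaller mean" reading of E_{NMAEF}, assuming paired, equal-length inputs with positive means; `mage` and `nmaef_via_smaller_mean` are illustrative names of our own. For positive means, dividing E_{MAGE} by min(M̄, Ō) reproduces Eq. (2), because Σ|M_i − O_i|/ΣO_i = E_{MAGE}/Ō and Ō is the smaller mean whenever M̄ ≥ Ō (and symmetrically for M̄ < Ō).

```python
def mage(model, obs):
    """Mean absolute gross error, Eq. (3): (1/N) * sum(|M_i - O_i|)."""
    pairs = list(zip(model, obs))
    return sum(abs(m - o) for m, o in pairs) / len(pairs)


def nmaef_via_smaller_mean(model, obs):
    """E_NMAEF read as E_MAGE divided by the smaller of the two (positive) means."""
    model, obs = list(model), list(obs)
    m_bar = sum(model) / len(model)
    o_bar = sum(obs) / len(obs)
    return mage(model, obs) / min(m_bar, o_bar)


# Hypothetical positive data: E_MAGE = 2.0 and the smaller mean is 2.0.
print(mage([2, 4, 6], [1, 2, 3]), nmaef_via_smaller_mean([2, 4, 6], [1, 2, 3]))
```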

The problem with the above interpretations of B_{NMBF} and E_{NMAEF} when M̄, Ō, or both are negative is demonstrated in Table I. This table shows the simplest case of comparing a single pair of model and observation points, i.e. N = 1, so M_{1} = M̄ and O_{1} = Ō. Given this assumption, one can quickly calculate the metrics mentally for discussion purposes. The table is broken into three sections based on the signs of the means. The column ‘Factor of magnitude under/over’ indicates the factor by which the model under- or overestimates the observed mean, defined here as |M̄/Ō| for |M̄|>|Ō| and |Ō/M̄| for |M̄|<|Ō|; note that the values are all 5. So, based on the desired behavior of B_{NMBF}, |B_{NMBF}| should equal 4 for all of these scenarios, with B_{NMBF} positive for cases with |M̄|>|Ō| and negative for |M̄|<|Ō|. This is indeed the case when both means are positive, but not when one or both of them are negative. Likewise, negative means yield misleading values of E_{NMAEF}. In our examples, where the factors all equal 5, if both means have the same sign, E_{MAGE} = 4 and the smaller mean magnitude is 1, so the desired value of E_{NMAEF}, the ratio of the two, is 4. For the two negative means, this does not occur; E_{NMAEF} is −0.8 instead. When the signs of the two means differ, E_{MAGE} = 6, so for the first two mixed-sign scenarios the desired E_{NMAEF} is 6, but the original formula yields −6. The sign difference could be considered trivial in this simple case. However, other scenarios, such as those in the last two lines of the table, lead to misleading conclusions if E_{NMAEF} is taken to be E_{MAGE} divided by the smaller mean: there the original formula gives −1.2 rather than 6.
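The original-formulation columns of Table I can be reproduced with a few lines of Python. This sketch writes Eqs. (1) and (2) in terms of the means, which is exact for N = 1; the helper names are hypothetical.

```python
def nmbf_from_means(m_bar, o_bar):
    """YU2006 Eq. (1) expressed with the means."""
    return m_bar / o_bar - 1.0 if m_bar >= o_bar else 1.0 - o_bar / m_bar


def nmaef_from_means(m_bar, o_bar):
    """YU2006 Eq. (2) with N = 1, so that E_MAGE = |M̄ - Ō|."""
    err = abs(m_bar - o_bar)
    return err / o_bar if m_bar >= o_bar else err / m_bar


# (M̄, Ō) pairs from Table I, in table order; each has a magnitude factor of 5.
scenarios = [(5, 1), (1, 5), (-1, -5), (-5, -1), (5, -1), (-1, 5), (1, -5), (-5, 1)]
original = {s: (nmbf_from_means(*s), nmaef_from_means(*s)) for s in scenarios}

for (m_bar, o_bar), (b, e) in original.items():
    print(f"M={m_bar:>3}  O={o_bar:>3}  B_NMBF={b:6.2f}  E_NMAEF={e:6.2f}")
```

Every row has the same magnitude factor of 5, yet the output changes sign and size as soon as either mean is negative, which is the breakdown described above.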

Table I. Comparison of the original and revised formulations of the normalized mean bias factor and normalized mean absolute error factor. To simplify the example, each mean consists of a single value. If the modeled and observed datasets had multiple points, the bias and error statistics would not always have such similar values.

Scenario | M̄ | Ō | Factor of magnitude under/over | B_{NMBF} | B′_{NMBF} | E_{NMAEF} | E′_{NMAEF}
Both means positive
M̄ > 0; M̄ > Ō | 5 | 1 | 5 | 4 | 4 | 4 | 4
M̄ > 0; M̄ < Ō | 1 | 5 | 5 | −4 | −4 | 4 | 4
Both means negative
M̄ < 0; M̄ > Ō | −1 | −5 | 5 | −0.8 | −4 | −0.8 | 4
M̄ < 0; M̄ < Ō | −5 | −1 | 5 | 0.8 | 4 | −0.8 | 4
Means have mixed signs
M̄ > 0 & Ō < 0; M̄ > Ō | 5 | −1 | 5 | −6 | N/A | −6 | N/A
M̄ < 0 & Ō > 0; M̄ < Ō | −1 | 5 | 5 | 6 | N/A | −6 | N/A
M̄ > 0 & Ō < 0; M̄ > Ō | 1 | −5 | 5 | −1.2 | N/A | −1.2 | N/A
M̄ < 0 & Ō > 0; M̄ < Ō | −5 | 1 | 5 | 1.2 | N/A | −1.2 | N/A

Given the above problems with interpreting B_{NMBF} and E_{NMAEF} for the case of one or more negative means, the revised metrics are defined as:

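$$B'_{NMBF} = \begin{cases} \dfrac{|\bar{M}|}{|\bar{O}|} - 1, & |\bar{M}| \ge |\bar{O}| \text{ and } \bar{M}/\bar{O} > 0 \\[2ex] 1 - \dfrac{|\bar{O}|}{|\bar{M}|}, & |\bar{M}| < |\bar{O}| \text{ and } \bar{M}/\bar{O} > 0 \\[1ex] \text{undefined}, & \bar{M}/\bar{O} < 0 \end{cases} \tag{4}$$

and

$$E'_{NMAEF} = \begin{cases} \dfrac{\sum_{i}\lvert M_i - O_i\rvert}{\lvert\sum_{i} O_i\rvert} = \dfrac{E_{MAGE}}{|\bar{O}|}, & |\bar{M}| \ge |\bar{O}| \text{ and } \bar{M}/\bar{O} > 0 \\[2ex] \dfrac{\sum_{i}\lvert M_i - O_i\rvert}{\lvert\sum_{i} M_i\rvert} = \dfrac{E_{MAGE}}{|\bar{M}|}, & |\bar{M}| < |\bar{O}| \text{ and } \bar{M}/\bar{O} > 0 \\[1ex] \text{undefined}, & \bar{M}/\bar{O} < 0 \end{cases} \tag{5}$$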

where primes distinguish the updated versions of the metrics from the original formulations. The formulas are very similar to the original versions, except that the absolute values of the means are used in all calculations and conditions, and additional conditions on the signs of the means make the metrics undefined when those signs differ.
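A minimal Python sketch of Eqs. (4) and (5), returning `None` where the revised metrics are undefined. The function names are hypothetical, and the sketch makes one assumption not stated in the text: a zero mean is treated like a sign mismatch (undefined), which also avoids dividing by zero.

```python
def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    values = list(values)
    return sum(values) / len(values)


def _same_sign(a, b):
    # Assumption: a zero mean counts as a sign mismatch, so the metric is undefined.
    return a * b > 0


def nmbf_revised(model, obs):
    """B'_NMBF, Eq. (4): built from |M̄| and |Ō|; None when the signs differ."""
    m_bar, o_bar = _mean(model), _mean(obs)
    if not _same_sign(m_bar, o_bar):
        return None
    m_abs, o_abs = abs(m_bar), abs(o_bar)
    if m_abs >= o_abs:
        return m_abs / o_abs - 1.0  # magnitude overestimated (or equal)
    return 1.0 - o_abs / m_abs      # magnitude underestimated


def nmaef_revised(model, obs):
    """E'_NMAEF, Eq. (5): E_MAGE over the smaller of |M̄| and |Ō|; None when signs differ."""
    model, obs = list(model), list(obs)
    m_bar, o_bar = _mean(model), _mean(obs)
    if not _same_sign(m_bar, o_bar):
        return None
    mage = sum(abs(m - o) for m, o in zip(model, obs)) / len(obs)
    return mage / min(abs(m_bar), abs(o_bar))


print(nmbf_revised([-5], [-1]), nmaef_revised([-5], [-1]))  # 4.0 4.0
print(nmbf_revised([5], [-1]), nmaef_revised([5], [-1]))    # None None
```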

The values from the original and revised formulations are identical when both means are positive. So, existing work using the metrics under these circumstances (e.g. Zhang et al., 2006; Sartelet et al., 2007; Bao et al., 2010) is unaffected by which version of the metrics was used, and future work using the revised formulations will not cause confusion. For clarity, the defining principles from YU2006 are restated below with this modification. The revised formulations are predicated on the assumption that the metrics should follow these rules.

For B′_{NMBF} the updated rules are:

1. The sign of B′_{NMBF} indicates whether the magnitude of the model mean under- or overestimates the magnitude of the observed mean, with B′_{NMBF} < 0 indicating |M̄|<|Ō|, and B′_{NMBF} > 0 indicating |M̄|>|Ō|.

2. 1+|B′_{NMBF}| indicates the factor by which the magnitude of the model mean under- or overestimates the magnitude of the observed mean.

3. Because of the first two rules, B′_{NMBF} is undefined when the signs of M̄ and Ō differ.

4. The range of B′_{NMBF} is −∞ < B′_{NMBF} < ∞, with a value of 0 indicating the best agreement.

Rule 3 deserves further justification. If B′_{NMBF} is to indicate both model under- or overestimation and the factor of that difference from the observations, there is unfortunately no logical way for both requirements to hold while B′_{NMBF} also uniquely identifies a given situation. Referring again to Table I, and also Figure 1, for the scenario of M̄ = 5 and Ō = −1, the ratio of the model mean to the observed mean is 5/(−1) = −5. Depending on which sign one argues should be used for this ratio, the corresponding value of B′_{NMBF} based on Rule 2 would be either −6, if the ratio were allowed to stay negative, or 4, if the absolute value of the ratio were used. Either value would indicate different model behavior if the signs of the means matched; i.e. the value of B′_{NMBF} would not uniquely identify one situation. This issue is shown graphically in Figure 1, where the combinations of M̄ and Ō from Table I are plotted along with lines indicating different ratios for the factor differences. If the signs of the means match, unique, easily identifiable ratios can be associated with each M̄ and Ō pairing, as shown by the two colored quadrants containing the ratio lines. However, if the signs of M̄ and Ō do not match, there is no unique way to define the ratio without confusing it with the ratio for matching signs, as indicated by the absence of ratio lines in the two quadrants where the signs differ. Therefore, it is preferable to give up some generality by explicitly making B′_{NMBF} undefined when the signs of the two means differ, rather than to introduce ambiguity.
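The non-uniqueness argument can be checked numerically. This Python sketch takes the mixed-sign case M̄ = 5, Ō = −1 from the text, forms both candidate values of B′_{NMBF}, and shows that each is already the value of some same-sign pairing; the comparison pairs (1, 7) and (5, 1) are hypothetical examples of ours, and `nmbf_same_sign` is an illustrative name.

```python
def nmbf_same_sign(m_bar, o_bar):
    """B'_NMBF for means of matching sign, written with magnitudes as in Eq. (4)."""
    m_abs, o_abs = abs(m_bar), abs(o_bar)
    return m_abs / o_abs - 1.0 if m_abs >= o_abs else 1.0 - o_abs / m_abs


# Mixed-sign case from the text: M̄ = 5, Ō = -1, so M̄/Ō = -5.
ratio = 5 / -1
keep_sign = ratio - 1.0            # -6 if the negative ratio is kept
use_magnitude = abs(ratio) - 1.0   # 4 if the absolute ratio is used

# Both candidates already describe same-sign situations:
# -6 is an underestimation by a factor of 7 (e.g. M̄ = 1, Ō = 7),
#  4 is an overestimation by a factor of 5 (e.g. M̄ = 5, Ō = 1).
print(keep_sign, nmbf_same_sign(1, 7))      # -6.0 -6.0
print(use_magnitude, nmbf_same_sign(5, 1))  # 4.0 4.0
```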

For E′_{NMAEF}, the updated rules are:

1. The value of E′_{NMAEF} indicates the ratio of E_{MAGE} to the smaller of the magnitudes of the two means.

2. The range of E′_{NMAEF} is 0 ≤ E′_{NMAEF} < ∞, with a value of 0 indicating the best agreement.

Because B′_{NMBF} and E′_{NMAEF} form a pair that together characterizes the relationship between the model and observation values, we choose to make E′_{NMAEF} undefined whenever B′_{NMBF} is undefined. This is not strictly necessary, since the rules defining E′_{NMAEF} still hold when the signs of the means differ, but this choice will prevent confusion among users of the metrics.

Table I shows the values of the revised metrics for the given scenarios. The results can now be interpreted consistently across a much wider range of conditions: the interpretations of both B′_{NMBF} and E′_{NMAEF} hold whenever the signs of the two means match.
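The revised columns of Table I can be reproduced the same way. In this Python sketch, written in terms of the means (exact for N = 1), `None` stands for the table's "N/A"; the function names are hypothetical, and a zero mean is treated as undefined by assumption.

```python
def nmbf_prime_from_means(m_bar, o_bar):
    """Eq. (4) with the means; None marks N/A (signs differ, or a zero mean by assumption)."""
    if m_bar * o_bar <= 0:
        return None
    m_abs, o_abs = abs(m_bar), abs(o_bar)
    return m_abs / o_abs - 1.0 if m_abs >= o_abs else 1.0 - o_abs / m_abs


def nmaef_prime_from_means(m_bar, o_bar):
    """Eq. (5) with N = 1, so E_MAGE = |M̄ - Ō|; None marks N/A."""
    if m_bar * o_bar <= 0:
        return None
    return abs(m_bar - o_bar) / min(abs(m_bar), abs(o_bar))


# (M̄, Ō) pairs from Table I, in table order.
scenarios = [(5, 1), (1, 5), (-1, -5), (-5, -1), (5, -1), (-1, 5), (1, -5), (-5, 1)]
revised = {s: (nmbf_prime_from_means(*s), nmaef_prime_from_means(*s)) for s in scenarios}

for (m_bar, o_bar), (b, e) in revised.items():
    print(f"M={m_bar:>3}  O={o_bar:>3}  B'_NMBF={b}  E'_NMAEF={e}")
```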

One limitation remains for B′_{NMBF} and E′_{NMAEF}: they cannot be used when the modeled and observed means differ in sign. This limitation applies only to the signs of the means, not to individual points; the metrics can still be computed when individual points within the analysis take both signs. So, with judicious use, the metrics can be applied in most situations. However, to prevent confusion, we do not encourage using the metrics when the individual points within the two datasets include both negative and positive values.

It should also be noted that similar issues to those raised here, regarding the applicability of the metrics in YU2006 for negative means, exist for other metrics highlighted in that paper besides B_{NMBF} and E_{NMAEF}. For example, the normalized mean bias defined as

$$B_{NMB} = \frac{\sum_{i}(M_i - O_i)}{\sum_{i} O_i} = \frac{\bar{M}}{\bar{O}} - 1 \tag{6}$$

also suffers from non-uniqueness and interpretation errors when the signs of the means differ, since the ratio of the means is offset by −1. If M̄ = 1 and Ō = −2, then B_{NMB} = −1.5, whereas M̄ = −1 and Ō = 2 give the same B_{NMB} = −1.5. So, if one were presented with B_{NMB} = −1.5, one would not know whether the model over- or underestimated the observations. As was done for B′_{NMBF}, restricting the definition of B_{NMB} to the case where the signs of the two means match clarifies the issue for users and permits using the metric with two negative means. However, B_{NMB} still has the limitation that its possible values are asymmetric for over- and underestimation, as discussed in YU2006; therefore, B′_{NMBF} is often a better choice.
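The B_{NMB} ambiguity and its sign-restricted remedy are easy to check. In this Python sketch, `nmb` follows Eq. (6) and `nmb_restricted` is our hypothetical name for the restricted version; it treats a zero mean as undefined by assumption. When the signs match, M̄/Ō = |M̄|/|Ō|, so the restricted form agrees with Eq. (6), and it reproduces the NMB′ row of Table II (e.g. −0.14 for IPSL-CM4).

```python
def nmb(m_bar, o_bar):
    """Normalized mean bias, Eq. (6): M̄/Ō - 1."""
    return m_bar / o_bar - 1.0


def nmb_restricted(m_bar, o_bar):
    """B_NMB limited to means of matching sign; None otherwise (zero means included, by assumption)."""
    if m_bar * o_bar <= 0:
        return None
    return abs(m_bar) / abs(o_bar) - 1.0


# Opposite situations give the same unrestricted value.
print(nmb(1, -2), nmb(-1, 2))                        # -1.5 -1.5
print(nmb_restricted(1, -2), nmb_restricted(-1, 2))  # None None
# IPSL-CM4 means from Table II.
print(round(nmb_restricted(-19.09, -22.16), 2))      # -0.14
```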

The normalized mean absolute error, defined as

$$E_{NMAE} = \frac{\sum_{i}\lvert M_i - O_i\rvert}{\sum_{i} O_i} \tag{7}$$

can also become negative when Ō < 0, which is outside its defined range of values. However, this is easily remedied by redefining it according to the same principles used for B′_{NMBF} and E′_{NMAEF}, resulting in

$$E'_{NMAE} = \frac{\sum_{i}\lvert M_i - O_i\rvert}{\lvert\sum_{i} O_i\rvert} \tag{8}$$

so that 0 ≤ E′_{NMAE} < ∞ is guaranteed.
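A short Python sketch contrasting Eq. (7) with Eq. (8) on hypothetical negative-valued data; the function names are our own.

```python
def nmae_original(model, obs):
    """E_NMAE, Eq. (7): sum|M_i - O_i| / sum(O_i); negative whenever sum(O) < 0."""
    return sum(abs(m - o) for m, o in zip(model, obs)) / sum(obs)


def nmae_revised(model, obs):
    """E'_NMAE, Eq. (8): divide by |sum(O_i)| so the result is never negative."""
    return sum(abs(m - o) for m, o in zip(model, obs)) / abs(sum(obs))


model = [-3.0, -1.0]  # hypothetical values
obs = [-2.0, -2.0]
print(nmae_original(model, obs), nmae_revised(model, obs))  # -0.5 0.5
```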

3. Example interpretation and use of updated metrics

With the above changes to the metric formulations, the normalized mean bias factor and normalized mean absolute error factor can be used for a broader range of applications, and their benefit is best seen through application. YU2006 presents a good example showcasing the simplicity of using B_{NMBF} and E_{NMAEF} versus other metrics, and the reader is referred to that article for a more detailed discussion. Here, two examples are given. The first is a simple example highlighting the subtle change to the metric interpretation. The second is a real-world example using cloud radiative forcing.

The idealized example uses the values shown in Table I. If a user were presented with a B′_{NMBF} of −4 for an observed mean of −5, they could quickly determine that the magnitude of the model mean underestimates the magnitude of the observed mean, because B′_{NMBF} is negative. Next, adding 1 to |B′_{NMBF}| gives 5, so the magnitude of the model mean underestimates the magnitude of the observed mean by a factor of 5. If desired, this can quickly be used to determine the model mean, M̄ = Ō/(1+|B′_{NMBF}|) = (−5)/5 = −1. (When B′_{NMBF} indicates an overestimation, the ratio of |M̄| to |Ō| flips, so one would reconstruct M̄ using M̄ = Ō(1+|B′_{NMBF}|).) Symmetrically, if the observed mean were 5 instead of −5, a B′_{NMBF} of −4 would indicate that the model mean is 1. Likewise, starting from the model mean instead of the observed mean, one could use similar logic to determine the observed mean. While this might sound like a long chain of reasoning when written out in full, in practice it quickly becomes routine and intuitive. The symmetry of the factor for over- and underestimation, together with the symmetry about zero, makes the metric appealing.
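The back-calculation described above can be sketched in Python as follows, assuming means of matching sign; `model_mean_from_obs` and `obs_mean_from_model` are hypothetical helper names. The last line applies it to the INM-CM3.0 values in Table II as a consistency check.

```python
def model_mean_from_obs(o_bar, b_prime):
    """Recover M̄ from Ō and B'_NMBF (matching signs assumed)."""
    factor = 1.0 + abs(b_prime)
    # Negative B' means |M̄| is smaller than |Ō| by `factor`; otherwise larger.
    return o_bar / factor if b_prime < 0 else o_bar * factor


def obs_mean_from_model(m_bar, b_prime):
    """Recover Ō from M̄ and B'_NMBF (matching signs assumed)."""
    factor = 1.0 + abs(b_prime)
    return m_bar * factor if b_prime < 0 else m_bar / factor


print(model_mean_from_obs(-5.0, -4.0))  # -1.0
print(model_mean_from_obs(5.0, -4.0))   # 1.0
print(obs_mean_from_model(-1.0, -4.0))  # -5.0
print(round(model_mean_from_obs(-22.16, -0.33), 2))  # INM-CM3.0 model mean
```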

A real-world situation where the updated metrics prove useful is the comparison of shortwave cloud radiative forcing. Figure 2 shows monthly mean TOA global shortwave cloud radiative forcing from March 2000 through February 2010 from Clouds and the Earth's Radiant Energy System (CERES) satellite observations (Wielicki et al., 1996) versus global climate model output for eight models from the World Climate Research Programme (WCRP) Coupled Model Intercomparison Project Phase 3 (CMIP3) (Meehl et al., 2007, http://www-pcmdi.llnl.gov/ipcc/about_ipcc.php). The data are based on the dataset used by Dessler (2010) to analyze cloud radiative feedbacks on climate. As the graph shows, some models clearly diverge from the 1 : 1 line and therefore from realistic behavior. The INM-CM3.0 model underestimates the magnitude of the forcing relative to the observations, while the UKMO-HadCM3 model overestimates it. The corresponding metrics using the updated definitions are shown in Table II. Of the models shown, these two represent the two extremes, with normalized mean bias factors of −0.33 and 0.27, respectively, indicating that the INM-CM3.0 model underestimates the magnitude of the CERES observed mean by a factor of 1.33 and UKMO-HadCM3 overestimates it by a factor of 1.27. The corresponding normalized mean absolute error factors for the two models are 0.33 and 0.27, respectively, indicating that the mean absolute difference between the modeled and observed values is roughly 0.3 times the smaller of the two mean magnitudes. Because the normalized mean bias factor and normalized mean absolute error factor are similar in magnitude, these models consistently over- or underestimate the observations, rather than having overestimates compensate for underestimates.
If underestimates compensated for overestimates, the two means would be brought closer together, and the normalized mean bias factor would indicate better agreement than the normalized mean absolute error factor. This is one way in which using the two metrics together yields greater insight. An example of this compensating effect can be seen for the ECHAM5/MPI-OM model, whose modeled values straddle the 1 : 1 line, resulting in a normalized mean bias factor of 0.03 but a slightly larger normalized mean absolute error factor of 0.08.
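The compensation effect can be illustrated with hypothetical data (not the CMIP3 values). In this Python sketch, one model is uniformly too negative, while the other alternates between too negative and not negative enough by the same amount; both have E′_{NMAEF} = 0.2, but only the first has a nonzero B′_{NMBF} (0.2 versus 0). The helper name is our own.

```python
def nmbf_and_nmaef(model, obs):
    """Return (B'_NMBF, E'_NMAEF) per Eqs. (4)-(5), or None if the signs of the means differ."""
    n = len(obs)
    m_bar, o_bar = sum(model) / n, sum(obs) / n
    if m_bar * o_bar <= 0:
        return None
    m_abs, o_abs = abs(m_bar), abs(o_bar)
    b = m_abs / o_abs - 1.0 if m_abs >= o_abs else 1.0 - o_abs / m_abs
    mage = sum(abs(m - o) for m, o in zip(model, obs)) / n
    return b, mage / min(m_abs, o_abs)


obs = [-20.0, -20.0, -20.0, -20.0]           # hypothetical observations
consistent = [-24.0, -24.0, -24.0, -24.0]    # always too negative
compensating = [-24.0, -16.0, -24.0, -16.0]  # errors of both signs cancel in the mean

for label, model in [("consistent", consistent), ("compensating", compensating)]:
    b, e = nmbf_and_nmaef(model, obs)
    print(f"{label:13s} B'_NMBF={b:.2f}  E'_NMAEF={e:.2f}")
```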

Table II. Comparison of different metrics for monthly mean TOA global shortwave cloud radiative forcing from eight CMIP3 global climate models versus CERES, as shown in Figure 2. The correlation, mean bias (MB) and mean absolute gross error (MAGE) use the metric definitions directly from YU2006. The normalized mean bias (NMB′), normalized mean absolute error (NMAE′), normalized mean bias factor (NMBF′) and normalized mean absolute error factor (NMAEF′) use the updated metric definitions, which require matching signs for the observed and modeled means and use the absolute values of the means in the respective formulas.

Model | NCAR PCM1 | IPSL-CM4 | INM-CM3.0 | UKMO-HadCM3 | ECHAM5/MPI-OM | NCAR CCSM3 | GFDL-CM2.0 | GFDL-CM2.1
Mean observation | −22.16 | −22.16 | −22.16 | −22.16 | −22.16 | −22.16 | −22.16 | −22.16
Mean model | −23.97 | −19.09 | −16.66 | −28.17 | −22.76 | −23.73 | −27.40 | −26.59
Number | 119 | 119 | 119 | 119 | 119 | 119 | 119 | 119
Correlation | 0.78 | 0.96 | 0.79 | 0.83 | 0.76 | 0.86 | 0.73 | 0.82
Metrics based on difference
MB | −1.81 | 3.07 | 5.50 | −6.01 | −0.60 | −1.57 | −5.24 | −4.44
MAGE | 2.11 | 3.08 | 5.50 | 6.01 | 1.85 | 1.84 | 5.24 | 4.44
Metrics based on relative difference
NMB′ | 0.08 | −0.14 | −0.25 | 0.27 | 0.03 | 0.07 | 0.24 | 0.20
NMAE′ | 0.10 | 0.14 | 0.25 | 0.27 | 0.08 | 0.08 | 0.24 | 0.20
NMBF′ | 0.08 | −0.16 | −0.33 | 0.27 | 0.03 | 0.07 | 0.24 | 0.20
NMAEF′ | 0.10 | 0.16 | 0.33 | 0.27 | 0.08 | 0.08 | 0.24 | 0.20
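As a consistency check, the NMBF′ and NMAEF′ rows of Table II can be recomputed from the tabulated means and MAGE alone, since E′_{NMAEF} = E_{MAGE}/min(|M̄|, |Ō|). The Python sketch below only transcribes Table II values; the function names are hypothetical. It also highlights a sign subtlety: because MB = M̄ − Ō keeps the raw sign, a negative MB for these negative means (e.g. UKMO-HadCM3, −6.01) accompanies a positive NMBF′ (0.27), i.e. an overestimation of the magnitude.

```python
O_BAR = -22.16  # observed (CERES) mean, identical for every model in Table II

# Model mean and MAGE transcribed from Table II.
MODELS = {
    "NCAR PCM1": (-23.97, 2.11),
    "IPSL-CM4": (-19.09, 3.08),
    "INM-CM3.0": (-16.66, 5.50),
    "UKMO-HadCM3": (-28.17, 6.01),
    "ECHAM5/MPI-OM": (-22.76, 1.85),
    "NCAR CCSM3": (-23.73, 1.84),
    "GFDL-CM2.0": (-27.40, 5.24),
    "GFDL-CM2.1": (-26.59, 4.44),
}


def nmbf_prime(m_bar, o_bar):
    """B'_NMBF from the means (Eq. 4); all means here are negative, so signs match."""
    m_abs, o_abs = abs(m_bar), abs(o_bar)
    return m_abs / o_abs - 1.0 if m_abs >= o_abs else 1.0 - o_abs / m_abs


def nmaef_prime(m_bar, o_bar, mage):
    """E'_NMAEF from the means and MAGE (Eq. 5)."""
    return mage / min(abs(m_bar), abs(o_bar))


results = {
    name: (nmbf_prime(m_bar, O_BAR), nmaef_prime(m_bar, O_BAR, mage))
    for name, (m_bar, mage) in MODELS.items()
}
for name, (b, e) in results.items():
    print(f"{name:14s} NMBF'={b:6.2f}  NMAEF'={e:5.2f}")
```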

4. Conclusion

It has been shown that the original formulations of the unbiased symmetric metrics defined in YU2006 break down when the mean of one or both of the datasets being compared is negative. This is not an issue for the application described in YU2006, where the datasets always have positive values. However, other applications that involve negative values, such as TOA aerosol radiative forcing, could benefit from the revised formulations of the metrics presented here. With these updated formulas, and with subtle yet important clarifications of how to interpret the metrics according to the guidelines used to develop the formulas, a broader range of applications can benefit from these metrics.

The simpler interpretation of the metrics presented in Section 2, which originally worked only when the means of the model and observations are positive, can now also be applied when both means are negative by using the updated formulations. To reiterate this simple way of thinking about the revised metrics: for B′_{NMBF}, the magnitude of the model mean under- or overestimates the magnitude of the observed mean by a factor of 1+|B′_{NMBF}|, with a negative B′_{NMBF} indicating underestimation and a positive B′_{NMBF} indicating overestimation. E′_{NMAEF} is the ratio of the mean absolute gross error to the smaller of the magnitudes of the two means. Both B′_{NMBF} and E′_{NMAEF} are undefined when the signs of the two means differ.

Acknowledgements

The authors wish to thank Elaine Chapman for her insight and assistance improving this manuscript. The contribution of Dr W. I. G. to this work was supported by a US DOE Early Career Research grant to him at Pacific Northwest National Laboratory (PNNL) under Contract DE-AC06-76RLO1830. PNNL is operated for the US DOE by Battelle Memorial Institute. The contribution of Dr S. Y. was funded and managed by the US Environmental Protection Agency through its Office of Research and Development. It has been subjected to the Agency's administrative review and approved for publication. The authors also want to thank Dr Dessler for his help in obtaining the data used in Figure 2.