
Generalized approach for using unbiased symmetric metrics with negative values: normalized mean bias factor and normalized mean absolute error factor

Authors

  • William I. Gustafson Jr,

    Corresponding author
    1. Atmospheric Sciences and Global Change Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
    • Atmospheric Sciences and Global Change Division, Pacific Northwest National Laboratory, Post Office Box 999, MSIN K9-30, Richland, WA 99352, USA.
  • Shaocai Yu

    1. Atmospheric Modeling and Analysis Division, National Exposure Research Laboratory, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
    • The contributions of these authors to this article were prepared as part of their official duties as United States Federal Government employees.


Abstract

Unbiased symmetric metrics provide a useful measure for quickly comparing two datasets, with similar interpretations for both under- and overestimations. Two examples are the normalized mean bias factor and the normalized mean absolute error factor. However, the original formulations of these metrics are valid only for datasets with positive means. This article presents a methodology to use and interpret the metrics with datasets that have negative means. For positive means, the updated formulations give results identical to the original formulations, so researchers are encouraged to use the updated formulations going forward without introducing ambiguity. Copyright © 2012 Royal Meteorological Society

1. Introduction

The use of unbiased symmetric metrics, as outlined in Yu et al. (2006) (hereafter YU2006), simplifies the interpretation of comparisons between two datasets. The interpretation is similar for both under- and overestimation, in that the modified bias statistic can be viewed as the factor by which one dataset over- or underestimates another. These metrics are also designed to minimize inflation of the results by a limited number of very small data values during normalization. These features make unbiased symmetric metrics powerful, easy-to-use tools for quickly understanding the overall difference between two datasets.

In the context presented in YU2006, the metrics are used for air quality parameters, e.g. modeled concentrations compared to observations. Within this context, the formulations developed in YU2006 are fully adequate, since concentrations of aerosols and trace gases are always positive. However, cases exist where one needs to compare datasets with negative mean values, such as top-of-the-atmosphere (TOA) shortwave aerosol radiative forcing. In those cases, the formulas presented in YU2006 for the normalized mean bias factor, BNMBF, and the normalized mean absolute error factor, ENMAEF, break down. In this article, we show that by taking advantage of the symmetry of positive and negative numbers about zero, one can use the absolute values of the dataset means in the original formulas for BNMBF and ENMAEF defined in YU2006 when the means of the two datasets being compared are negative. The basic interpretation of the metrics remains identical to that of the original formulations, making the updated versions of the metrics presented here more robust. Therefore, we recommend that the updated formulations be used going forward.

2. Metric formulation and expansion to negative means

The original formulations for BNMBF and ENMAEF from YU2006 are:

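$$
B_{\mathrm{NMBF}} =
\begin{cases}
\dfrac{\sum_{i}(M_i - O_i)}{\sum_{i} O_i} = \dfrac{\bar{M}}{\bar{O}} - 1, & \bar{M} \ge \bar{O},\\[8pt]
\dfrac{\sum_{i}(M_i - O_i)}{\sum_{i} M_i} = 1 - \dfrac{\bar{O}}{\bar{M}}, & \bar{M} < \bar{O},
\end{cases}
\tag{1}
$$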

and

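$$
E_{\mathrm{NMAEF}} =
\begin{cases}
\dfrac{\sum_{i}|M_i - O_i|}{\sum_{i} O_i}, & \bar{M} \ge \bar{O},\\[8pt]
\dfrac{\sum_{i}|M_i - O_i|}{\sum_{i} M_i}, & \bar{M} < \bar{O},
\end{cases}
\tag{2}
$$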

where M and O represent the modeled and observed quantities, the i subscript represents a particular value at a given point, and the overbar indicates the mean over all points.

The interpretation of these metrics, as described in YU2006, is straightforward if both M̄ and Ō are positive. This caveat is implicit in the context assumed by YU2006 but is never stated explicitly in that article. Users of the metrics need to be aware of it, or they may unwittingly misinterpret their results when working with negative quantities.

The interpretation of BNMBF is as follows. The sign of BNMBF indicates whether the model mean under- or overestimates the observed mean, and the magnitude of BNMBF indicates the factor of the under- or overestimation. Specifically, YU2006 phrase the interpretation as ‘if BNMBF is positive, the model overestimates the observations by a factor of BNMBF + 1’ and ‘if BNMBF is negative, the model underestimates the observations by a factor of 1 − BNMBF.’ A simpler way to think of this, so that only one formula need be remembered, is that the magnitude of the model mean under- or overestimates the magnitude of the observed mean by a factor of 1 + |BNMBF|, with a negative BNMBF indicating underestimation and a positive BNMBF indicating overestimation. While the reference to the magnitudes of the means is unnecessary for positive means, it foreshadows the revised metric presented below.
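Writing out both cases shows that the two YU2006 phrasings collapse to the single 1 + |BNMBF| rule. This is only a restatement of the text above, checked against the value BNMBF = −4 from the positive-means rows of Table I:

```latex
\begin{aligned}
B_{\mathrm{NMBF}} \ge 0:&\quad \text{factor} = B_{\mathrm{NMBF}} + 1 = 1 + |B_{\mathrm{NMBF}}|,\\
B_{\mathrm{NMBF}} < 0:&\quad \text{factor} = 1 - B_{\mathrm{NMBF}} = 1 + |B_{\mathrm{NMBF}}|,\\
B_{\mathrm{NMBF}} = -4:&\quad \text{factor} = 1 - (-4) = 1 + |-4| = 5.
\end{aligned}
```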

The interpretation of ENMAEF given by YU2006 is that ENMAEF is the ratio of the mean absolute gross error, EMAGE, to either the mean observation, in the case of overestimation, or the mean model value, in the case of underestimation. The mean absolute gross error is defined as

$$
E_{\mathrm{MAGE}} = \frac{1}{N}\sum_{i=1}^{N} |M_i - O_i|,
\tag{3}
$$

where N is the number of samples. More simply, ENMAEF is the mean absolute gross error divided by the smaller of the two means.
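As a minimal sketch, Eqs. (1) to (3) can be written in Python as below. It assumes the paired model and observed values arrive as equal-length sequences of floats with positive means (the setting of YU2006). The function names are illustrative and are not taken from YU2006.

```python
from statistics import fmean
from typing import Sequence


def mage(model: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute gross error, Eq. (3): (1/N) * sum of |M_i - O_i|."""
    if len(model) != len(obs) or len(model) == 0:
        raise ValueError("model and obs must be non-empty and equal in length")
    return fmean(abs(m - o) for m, o in zip(model, obs))


def nmbf_original(model: Sequence[float], obs: Sequence[float]) -> float:
    """Original normalized mean bias factor, Eq. (1), written with the means."""
    m_bar, o_bar = fmean(model), fmean(obs)
    if m_bar >= o_bar:
        return m_bar / o_bar - 1.0   # overestimation branch
    return 1.0 - o_bar / m_bar       # underestimation branch


def nmaef_original(model: Sequence[float], obs: Sequence[float]) -> float:
    """Original normalized mean absolute error factor, Eq. (2)."""
    m_bar, o_bar = fmean(model), fmean(obs)
    # sum|M_i - O_i| / sum O_i equals E_MAGE / O_bar because N cancels (same for M).
    denominator = o_bar if m_bar >= o_bar else m_bar
    return mage(model, obs) / denominator


# Positive means: model [4, 6] (mean 5) against observations [1, 1] (mean 1).
print(nmbf_original([4.0, 6.0], [1.0, 1.0]))   # 4.0
print(nmaef_original([4.0, 6.0], [1.0, 1.0]))  # 4.0
```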

The problem with the above interpretations of BNMBF and ENMAEF when M̄ and Ō are negative is demonstrated in Table I. This table shows the simplest case of comparing a single pair of model and observation points, i.e. N = 1, so M1 = M̄ and O1 = Ō. Under this assumption, the metrics can easily be calculated mentally for discussion purposes. The table is divided into three sections based on the signs of the means. The column ‘Factor of magnitude under/over’ gives the factor by which the model under- or overestimates the observed mean, defined here as |M̄/Ō| for |M̄| > |Ō| and |Ō/M̄| for |M̄| < |Ō|; note that this factor is 5 in every scenario. Based on the desired behavior of BNMBF, |BNMBF| should therefore equal 4 in all of these scenarios, with BNMBF positive when |M̄| > |Ō| and negative when |M̄| < |Ō|. This is indeed the case when both means are positive, but not when one or both of them are negative. Likewise for ENMAEF, negative means yield misleading results. In our examples, when both means have the same sign, EMAGE = 4 and the smaller absolute mean is 1, so the desired value of ENMAEF, their ratio, is 4. For the two negative means, this does not occur; ENMAEF is −0.8 instead. When the signs of the two means differ, EMAGE = 6 and the smaller absolute mean is again 1, so the desired ENMAEF is 6, but the original formula gives −6 for the first two such rows. The sign difference could be considered trivial in this simplistic case. However, other scenarios, as demonstrated by the last two rows of the table, lead to misleading conclusions if ENMAEF is interpreted as EMAGE divided by the smaller mean: there, the original formula gives −1.2 rather than 6.

Table I. Comparison of the original and revised formulations of the normalized mean bias factor and normalized mean absolute error factor. To simplify the example, the means consist of only one value. If the modeled and observed datasets had multiple points, the bias and error statistics would not always have such similar values
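| Scenario | M̄ | Ō | Factor of magnitude under/over | BNMBF | B′NMBF | ENMAEF | E′NMAEF |
|---|---|---|---|---|---|---|---|
| Both means positive | | | | | | | |
| M̄ > 0; M̄ > Ō | 5 | 1 | 5 | 4 | 4 | 4 | 4 |
| M̄ > 0; M̄ < Ō | 1 | 5 | 5 | −4 | −4 | 4 | 4 |
| Both means negative | | | | | | | |
| M̄ < 0; M̄ > Ō | −1 | −5 | 5 | −0.8 | −4 | −0.8 | 4 |
| M̄ < 0; M̄ < Ō | −5 | −1 | 5 | 0.8 | 4 | −0.8 | 4 |
| Means have mixed signs | | | | | | | |
| M̄ > 0 & Ō < 0; M̄ > Ō | 5 | −1 | 5 | −6 | N/A | −6 | N/A |
| M̄ < 0 & Ō > 0; M̄ < Ō | −1 | 5 | 5 | 6 | N/A | −6 | N/A |
| M̄ > 0 & Ō < 0; M̄ > Ō | 1 | −5 | 5 | −1.2 | N/A | −1.2 | N/A |
| M̄ < 0 & Ō > 0; M̄ < Ō | −5 | 1 | 5 | 1.2 | N/A | −1.2 | N/A |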
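The original-formula columns of Table I can be reproduced with a short Python sketch. With N = 1, EMAGE is simply |M̄ − Ō|, so Eqs. (1) and (2) can be evaluated directly on the means. The scenario list is transcribed from Table I, and the function names are illustrative.

```python
# (model mean, observed mean) pairs from Table I; each "dataset" is one point (N = 1).
SCENARIOS = [(5, 1), (1, 5), (-1, -5), (-5, -1), (5, -1), (-1, 5), (1, -5), (-5, 1)]


def nmbf_original(m_bar: float, o_bar: float) -> float:
    """Eq. (1) expressed with the means."""
    return m_bar / o_bar - 1 if m_bar >= o_bar else 1 - o_bar / m_bar


def nmaef_original(m_bar: float, o_bar: float) -> float:
    """Eq. (2) with N = 1, where E_MAGE reduces to |M_1 - O_1|."""
    e_mage = abs(m_bar - o_bar)
    return e_mage / (o_bar if m_bar >= o_bar else m_bar)


for m, o in SCENARIOS:
    print(f"M={m:>3}  O={o:>3}  B_NMBF={nmbf_original(m, o):6.2f}  "
          f"E_NMAEF={nmaef_original(m, o):6.2f}")
```

The printed values match the BNMBF and ENMAEF columns of Table I, including the misleading −0.8, −6 and −1.2 entries.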

Given the above problems with interpreting BNMBF and ENMAEF for the case of one or more negative means, the revised metrics are defined as:

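$$
B'_{\mathrm{NMBF}} =
\begin{cases}
\dfrac{|\bar{M}|}{|\bar{O}|} - 1, & |\bar{M}| \ge |\bar{O}| \text{ and } \bar{M}\,\bar{O} > 0,\\[8pt]
1 - \dfrac{|\bar{O}|}{|\bar{M}|}, & |\bar{M}| < |\bar{O}| \text{ and } \bar{M}\,\bar{O} > 0,\\[8pt]
\text{undefined}, & \bar{M}\,\bar{O} < 0,
\end{cases}
\tag{4}
$$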

and

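$$
E'_{\mathrm{NMAEF}} =
\begin{cases}
\dfrac{\sum_{i}|M_i - O_i|}{N|\bar{O}|}, & |\bar{M}| \ge |\bar{O}| \text{ and } \bar{M}\,\bar{O} > 0,\\[8pt]
\dfrac{\sum_{i}|M_i - O_i|}{N|\bar{M}|}, & |\bar{M}| < |\bar{O}| \text{ and } \bar{M}\,\bar{O} > 0,\\[8pt]
\text{undefined}, & \bar{M}\,\bar{O} < 0,
\end{cases}
\tag{5}
$$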

where primes distinguish the updated versions of the metrics from the original formulations. The formulas are very similar to the originals, except that the absolute values of the means are used in all calculations and conditions, and additional conditions on the signs of the means make the metrics undefined when those signs differ.
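A minimal Python sketch of Eqs. (4) and (5) follows. It returns None where the revised metrics are undefined. It makes two assumptions beyond the text: a mean of exactly zero is treated as undefined (the factor would otherwise divide by zero), and the inputs are equal-length float sequences. The function names are hypothetical.

```python
from statistics import fmean
from typing import Optional, Sequence


def mage(model: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute gross error, Eq. (3)."""
    if len(model) != len(obs) or len(model) == 0:
        raise ValueError("model and obs must be non-empty and equal in length")
    return fmean(abs(m - o) for m, o in zip(model, obs))


def means_share_sign(m_bar: float, o_bar: float) -> bool:
    """True only if both means are strictly positive or strictly negative."""
    return m_bar * o_bar > 0


def nmbf_revised(model: Sequence[float], obs: Sequence[float]) -> Optional[float]:
    """Revised normalized mean bias factor B'_NMBF, Eq. (4); None if undefined."""
    m_bar, o_bar = fmean(model), fmean(obs)
    if not means_share_sign(m_bar, o_bar):
        return None
    m_abs, o_abs = abs(m_bar), abs(o_bar)
    return m_abs / o_abs - 1.0 if m_abs >= o_abs else 1.0 - o_abs / m_abs


def nmaef_revised(model: Sequence[float], obs: Sequence[float]) -> Optional[float]:
    """Revised normalized mean absolute error factor E'_NMAEF, Eq. (5); None if undefined."""
    m_bar, o_bar = fmean(model), fmean(obs)
    if not means_share_sign(m_bar, o_bar):
        return None  # kept undefined whenever B'_NMBF is undefined
    # Dividing by the smaller magnitude covers both defined branches of Eq. (5).
    return mage(model, obs) / min(abs(m_bar), abs(o_bar))


# Individual points of both signs are fine as long as the two means share a sign.
model, obs = [-3.0, 1.0], [-6.0, -4.0]   # means -1 and -5
print(nmbf_revised(model, obs), nmaef_revised(model, obs))  # -4.0 4.0
print(nmbf_revised([5.0], [-1.0]))                          # None
```

For positive means, these functions return the same values as the original formulas, matching the first two rows of Table I.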

The original and revised formulations yield identical values when both means are positive. Existing work that used the metrics under these circumstances (e.g. Zhang et al., 2006; Sartelet et al., 2007; Bao et al., 2010) is therefore unaffected by which version was used, and no confusion will result from future work using the revised formulations. For clarity, the defining principles from YU2006 are restated below with this modification; the revised formulations are predicated on the assumption that the metrics should follow these rules.

For BNMBF the updated rules are:

  1. The sign of BNMBF indicates whether the magnitude of the model mean under- or overestimates the magnitude of the observed mean, with BNMBF < 0 indicating |M̄| < |Ō| and BNMBF > 0 indicating |M̄| > |Ō|.
  2. 1 + |BNMBF| is the factor by which the magnitude of the model mean under- or overestimates the magnitude of the observed mean.
  3. Because of the first two rules, BNMBF is undefined when the signs of M̄ and Ō differ.
  4. The range of BNMBF is −∞ < BNMBF < ∞, with a value of 0 indicating the best agreement.

Rule 3 deserves further justification. If BNMBF is to indicate both whether the model under- or overestimates the observations and the factor of that difference, there is no logical way to satisfy both requirements while also having each value of BNMBF identify a unique situation. Referring again to Table I, and also to Figure 1, for the scenario of M̄ = 5 and Ō = −1, the ratio M̄/Ō is 5/(−1) = −5. Depending on which sign one argues should be used for this ratio, Rule 2 would give a BNMBF of either −6, if the ratio were allowed to stay negative, or 4, if the absolute value of the ratio were used. Either value would indicate different model behavior if the signs of the means matched; i.e. the values of BNMBF would not uniquely indicate one situation. This issue is shown graphically in Figure 1, where the combinations of M̄ and Ō from Table I are plotted along with lines indicating different ratios for the factor differences. If the signs of the means match, there are unique, easily identifiable ratios associated with each M̄ and Ō pairing, as shown by the two colored quadrants containing the ratio lines. However, if the signs of M̄ and Ō do not match, there is no way to define the ratio without confusing it with a ratio for matching signs, as indicated by the absence of ratio lines in the two quadrants where the signs differ. Therefore, it is preferable to sacrifice some generality by explicitly making BNMBF undefined when the signs of the two means differ, rather than to introduce ambiguity.
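The collision described above can be made concrete with a small Python sketch. The mixed-sign pair M̄ = 5, Ō = −1 is taken from Table I. The same-sign pair M̄ = 1, Ō = 7 is a hypothetical example chosen here only because it yields −6.

```python
def nmbf_same_sign(m_bar: float, o_bar: float) -> float:
    """B'_NMBF from Eq. (4) for two means that share a sign (uses magnitudes)."""
    m_abs, o_abs = abs(m_bar), abs(o_bar)
    return m_abs / o_abs - 1 if m_abs >= o_abs else 1 - o_abs / m_abs


ratio = 5 / -1               # M/O for the mixed-sign pair: -5.0
keep_sign = ratio - 1        # -6.0 if the negative ratio is used as-is
drop_sign = abs(ratio) - 1   # 4.0 if the absolute value of the ratio is used

# Each candidate already describes a different same-sign situation:
print(keep_sign, nmbf_same_sign(1, 7))  # -6.0 -6.0  (a factor-7 underestimate)
print(drop_sign, nmbf_same_sign(5, 1))  # 4.0 4.0    (a factor-5 overestimate, matching signs)
```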

Figure 1.

The black dots show the M̄ and Ō combinations from Table I. The orange regions show where M̄ underestimates Ō and the blue regions show where M̄ overestimates Ō, while the gray lines show where a pair of M̄ and Ō would lie on the graph for different factor differences between the two values.

For ENMAEF, the updated rules are:

  1. The value of ENMAEF is the ratio of EMAGE to the smaller of the magnitudes of the two means.
  2. The range of ENMAEF is 0 ≤ ENMAEF < ∞, with a value of 0 indicating the best agreement.

Because BNMBF and ENMAEF form a pair that together characterize the relationship between the model and observation values, we choose to make ENMAEF undefined when BNMBF is undefined. While this is not strictly necessary, since the rules defining ENMAEF hold when the signs of the means differ, this choice will prevent confusion by users of the metrics.

Table I shows the values for the revised metrics for the given scenarios. The results now can be interpreted consistently across a much wider range of conditions. Interpretations of both BNMBF and ENMAEF hold when the signs of both means match.

One limitation remains for BNMBF and ENMAEF: they cannot be used when the modeled and observed means differ in sign. This limitation does not prevent their use when individual points within the analysis have both signs; it applies only when the signs of the means differ. So, with judicious use, the metrics can be applied in most situations. However, to prevent confusion, we do not encourage using the metrics when the individual points within the two datasets include both negative and positive values.

It should also be noted that similar issues to those raised here, regarding the applicability of the metrics in YU2006 for negative means, exist for other metrics highlighted in that paper besides BNMBF and ENMAEF. For example, the normalized mean bias defined as

$$
B_{\mathrm{NMB}} = \frac{\sum_{i}(M_i - O_i)}{\sum_{i} O_i} = \frac{\bar{M}}{\bar{O}} - 1,
\tag{6}
$$

also suffers from non-uniqueness and interpretation errors when the signs of the means differ, since it is the ratio of the means offset by −1. If M̄ = 1 and Ō = −2, then BNMB = −1.5, whereas M̄ = −1 and Ō = 2 give the same BNMB = −1.5. So, if one were presented with BNMB = −1.5, one would not know whether the model over- or underestimated the observations. As with BNMBF, restricting the definition of BNMB to cases where the signs of the two means match resolves the issue for users and permits using the metric with two negative means. However, BNMB still retains the limitation, discussed in YU2006, that its possible values are asymmetric for over- and underestimation; therefore, BNMBF is often a better choice.
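The following Python sketch reproduces the BNMB ambiguity above from Eq. (6) and shows one way to apply the sign restriction. Returning None for mismatched signs, and using the magnitudes of the means as in the NMB′ row of Table II, are choices made for this illustration; the function names are hypothetical.

```python
from typing import Optional


def nmb_original(m_bar: float, o_bar: float) -> float:
    """Eq. (6) expressed with the means: M/O - 1."""
    return m_bar / o_bar - 1


def nmb_restricted(m_bar: float, o_bar: float) -> Optional[float]:
    """NMB defined only when the means share a sign; None otherwise."""
    if m_bar * o_bar <= 0:
        return None
    # With matching signs, |M|/|O| equals M/O, so this agrees with Eq. (6).
    return abs(m_bar) / abs(o_bar) - 1


print(nmb_original(1, -2), nmb_original(-1, 2))       # -1.5 -1.5: indistinguishable
print(nmb_restricted(1, -2), nmb_restricted(-1, -2))  # None -0.5
```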

The normalized mean absolute error, defined as

$$
E_{\mathrm{NMAE}} = \frac{\sum_{i}|M_i - O_i|}{\sum_{i} O_i},
\tag{7}
$$

also becomes negative when Ō < 0, which is outside its defined range of values. This is easily remedied by redefining it using the same principles used to redefine BNMBF and ENMAEF, resulting in

$$
E'_{\mathrm{NMAE}} = \frac{\sum_{i}|M_i - O_i|}{N|\bar{O}|},
\tag{8}
$$

so that 0 ≤ ENMAE < ∞ is guaranteed.
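A short Python sketch contrasts Eq. (7) with Eq. (8) on a made-up pair of negative-valued series (the numbers are illustrative and not from the article). It uses the identity N|Ō| = |Σ O_i| for the revised denominator; the function names are hypothetical.

```python
from typing import Sequence


def nmae_original(model: Sequence[float], obs: Sequence[float]) -> float:
    """Eq. (7): sum|M_i - O_i| / sum O_i; negative whenever sum O_i < 0."""
    return sum(abs(m - o) for m, o in zip(model, obs)) / sum(obs)


def nmae_revised(model: Sequence[float], obs: Sequence[float]) -> float:
    """Eq. (8): sum|M_i - O_i| / (N |O_bar|), written as / |sum O_i|."""
    return sum(abs(m - o) for m, o in zip(model, obs)) / abs(sum(obs))


model, obs = [-1.0, -3.0], [-4.0, -6.0]
print(nmae_original(model, obs), nmae_revised(model, obs))  # -0.6 0.6
```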

3. Example interpretation and use of updated metrics

With the above changes to the metric formulations, the normalized mean bias factor and normalized mean absolute error factor can be used for a broader range of applications. The benefit of these metrics can be seen through their application. YU2006 presents a good example showcasing the simplicity of using BNMBF and ENMAEF versus other metrics, and the reader is referred to that article for a more detailed discussion. Here, two examples are given. The first is a simple example highlighting the subtle change to the metric interpretation. The second is a real-world example using cloud radiative forcing.

The idealized, simple example uses the values shown in Table I. If a user were presented with a BNMBF of −4 for an observed mean of −5, they could quickly determine that the magnitude of the model mean underestimates the magnitude of the observed mean, because BNMBF is negative. Next, adding 1 to |BNMBF| gives 5, so the magnitude of the model mean underestimates the magnitude of the observed mean by a factor of 5. If desired, this can be used to determine the model mean as M̄ = Ō/(1 + |BNMBF|) = (−5)/5 = −1. (When BNMBF indicates an overestimation, the ratio of |M̄| to |Ō| flips, so one would reconstruct the model mean using M̄ = Ō(1 + |BNMBF|).) Symmetrically, if the observed mean were 5 instead of −5, a BNMBF of −4 would indicate that the model mean is 1. Likewise, starting from the model mean instead of the observed mean, similar logic gives the observed mean. While this chain may sound long when written out in full, in practice it becomes routine and intuitive. The symmetry of the factor for over- and underestimation, as well as the symmetry about zero, makes the metric appealing.
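The reconstruction steps above can be sketched in Python as follows, assuming B′NMBF is defined (the two means share a sign). Because the means share a sign, scaling the known mean by the factor 1 + |BNMBF| carries the sign along automatically. The helper names are hypothetical.

```python
def model_mean_from_obs(b_nmbf: float, o_bar: float) -> float:
    """Recover the model mean from B'_NMBF and the observed mean."""
    factor = 1.0 + abs(b_nmbf)
    # B < 0: |M| is |O| shrunk by the factor; B >= 0: |M| is |O| grown by it.
    return o_bar / factor if b_nmbf < 0 else o_bar * factor


def obs_mean_from_model(b_nmbf: float, m_bar: float) -> float:
    """Recover the observed mean from B'_NMBF and the model mean."""
    factor = 1.0 + abs(b_nmbf)
    return m_bar * factor if b_nmbf < 0 else m_bar / factor


print(model_mean_from_obs(-4.0, -5.0))  # -1.0, as in the worked example
print(model_mean_from_obs(-4.0, 5.0))   # 1.0, the symmetric positive case
print(obs_mean_from_model(-4.0, -1.0))  # -5.0
```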

A real-world situation where the updated metrics prove useful is the comparison of shortwave cloud radiative forcing. Figure 2 shows monthly mean TOA global shortwave cloud radiative forcing from March 2000 through February 2010 from the Clouds and the Earth's Radiant Energy System (CERES) satellite observations (Wielicki et al., 1996) versus output from eight global climate models from the World Climate Research Programme (WCRP) Coupled Model Intercomparison Project Phase 3 (CMIP3) (Meehl et al., 2007, http://www-pcmdi.llnl.gov/ipcc/about_ipcc.php). The data are the same as those used by Dessler (2010) to analyze cloud radiative feedbacks on climate. As can be seen on the graph, some models clearly diverge from the 1 : 1 line and therefore from realistic behavior. The INM-CM3.0 model underestimates the magnitude of the forcing relative to the observations, while UKMO-HadCM3 overestimates it. The corresponding metrics using the updated definitions are shown in Table II. Of the models shown, these two represent the extremes, with normalized mean bias factors of −0.33 and 0.27, respectively, indicating that INM-CM3.0 underestimates the magnitude of the CERES observed mean by a factor of 1.33 and UKMO-HadCM3 overestimates it by a factor of 1.27. The corresponding normalized mean absolute error factors for the two models are 0.33 and 0.27, respectively, indicating that the average absolute difference between the modeled and observed values is approximately 0.3 times the smaller of the two mean magnitudes. Because the normalized mean bias factor and normalized mean absolute error factor are similar in magnitude, these models consistently either over- or underestimate the observations, without overestimates compensating for underestimates.
If underestimates were compensating for overestimates, the two means would be closer together, and the normalized mean bias factor would indicate better agreement than the normalized mean absolute error factor. This is one way in which using the two metrics together leads to greater insight. An example of this compensating effect can be seen for the ECHAM5/MPI-OM model, whose modeled values straddle the 1 : 1 line, resulting in a normalized mean bias factor of 0.03 but a slightly larger normalized mean absolute error factor of 0.08.
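As a consistency check, the NMBF′ and NMAEF′ rows of Table II can be recomputed from the tabulated mean and MAGE values with a short Python sketch. It uses only the rounded summary numbers in Table II, not the underlying monthly data, so in general the last digit could differ slightly. For these eight models, the rounded results agree with the table. The helper names are hypothetical.

```python
OBS_MEAN = -22.16  # CERES mean observation, Table II

# model name -> (mean model value, MAGE), transcribed from Table II
TABLE_II = {
    "NCAR PCM1": (-23.97, 2.11),
    "IPSL-CM4": (-19.09, 3.08),
    "INM-CM3.0": (-16.66, 5.50),
    "UKMO-HadCM3": (-28.17, 6.01),
    "ECHAM5/MPI-OM": (-22.76, 1.85),
    "NCAR CCSM3": (-23.73, 1.84),
    "GFDL-CM2.0": (-27.40, 5.24),
    "GFDL-CM2.1": (-26.59, 4.44),
}


def nmbf_from_means(m_bar, o_bar):
    """Eq. (4) from the two means; None when the signs differ."""
    if m_bar * o_bar <= 0:
        return None
    m_abs, o_abs = abs(m_bar), abs(o_bar)
    return m_abs / o_abs - 1 if m_abs >= o_abs else 1 - o_abs / m_abs


def nmaef_from_mage(e_mage, m_bar, o_bar):
    """Eq. (5) from E_MAGE and the two means; None when the signs differ."""
    if m_bar * o_bar <= 0:
        return None
    return e_mage / min(abs(m_bar), abs(o_bar))


# All means in Table II are negative, so neither helper returns None here.
for name, (m_bar, e_mage) in TABLE_II.items():
    b = nmbf_from_means(m_bar, OBS_MEAN)
    e = nmaef_from_mage(e_mage, m_bar, OBS_MEAN)
    print(f"{name:14s} NMBF'={b:5.2f}  NMAEF'={e:5.2f}")
```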

Figure 2.

Comparisons of monthly mean TOA global shortwave cloud radiative forcing (CRF) for eight global climate models versus CERES satellite observations from March 2000 through February 2010. The 1 : 1, 2 : 1 and 1 : 2 lines are shown for reference.

Table II. Comparison of different metrics for monthly mean TOA global shortwave cloud radiative forcing from eight CMIP3 global climate models versus CERES, as shown in Figure 2. The correlation, mean bias (MB) and mean absolute gross error (MAGE) use the metric definitions directly from YU2006. The normalized mean bias (NMB′), normalized mean absolute error (NMAE′), normalized mean bias factor (NMBF′) and normalized mean absolute error factor (NMAEF′) use the updated metric definitions, which require matching signs for the observed and modeled means and use the absolute values of the means in the respective formulas.
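| Model | NCAR PCM1 | IPSL-CM4 | INM-CM3.0 | UKMO-HadCM3 | ECHAM5/MPI-OM | NCAR CCSM3 | GFDL-CM2.0 | GFDL-CM2.1 |
|---|---|---|---|---|---|---|---|---|
| Mean observation | −22.16 | −22.16 | −22.16 | −22.16 | −22.16 | −22.16 | −22.16 | −22.16 |
| Mean model | −23.97 | −19.09 | −16.66 | −28.17 | −22.76 | −23.73 | −27.40 | −26.59 |
| Number | 119 | 119 | 119 | 119 | 119 | 119 | 119 | 119 |
| Correlation | 0.78 | 0.96 | 0.79 | 0.83 | 0.76 | 0.86 | 0.73 | 0.82 |
| Metrics based on difference | | | | | | | | |
| MB | −1.81 | 3.07 | 5.50 | −6.01 | −0.60 | −1.57 | −5.24 | −4.44 |
| MAGE | 2.11 | 3.08 | 5.50 | 6.01 | 1.85 | 1.84 | 5.24 | 4.44 |
| Metrics based on relative difference | | | | | | | | |
| NMB′ | 0.08 | −0.14 | −0.25 | 0.27 | 0.03 | 0.07 | 0.24 | 0.20 |
| NMAE′ | 0.10 | 0.14 | 0.25 | 0.27 | 0.08 | 0.08 | 0.24 | 0.20 |
| NMBF′ | 0.08 | −0.16 | −0.33 | 0.27 | 0.03 | 0.07 | 0.24 | 0.20 |
| NMAEF′ | 0.10 | 0.16 | 0.33 | 0.27 | 0.08 | 0.08 | 0.24 | 0.20 |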

4. Conclusion

We have shown that the original formulations of the unbiased symmetric metrics defined in YU2006 break down when the mean of one or both of the datasets being compared is negative. This is not an issue for the application described in YU2006, where the datasets always have positive values. However, other applications that involve negative values, such as TOA aerosol radiative forcing, could benefit from the revised formulations presented here. With these updated formulas, together with subtle yet important clarifications of how to interpret the metrics based on the rules used to derive them, a broader range of applications can benefit from these metrics.

The simpler interpretation of the metrics presented in Section 2, which originally worked only when the means of the model and observations are positive, can now also be applied when both means are negative by using the updated formulations. To reiterate, in the context of the revised formulations the interpretations are as follows. For BNMBF, the magnitude of the model mean under- or overestimates the magnitude of the observed mean by a factor of 1 + |BNMBF|, with a negative BNMBF indicating underestimation and a positive BNMBF indicating overestimation. ENMAEF is the mean absolute gross error divided by the smaller of the magnitudes of the two means. Both BNMBF and ENMAEF are undefined when the signs of the two means differ.

Acknowledgements

The authors wish to thank Elaine Chapman for her insight and assistance improving this manuscript. The contribution of Dr W. I. G. to this work was supported by a US DOE Early Career Research grant to him at Pacific Northwest National Laboratory (PNNL) under Contract DE-AC06-76RLO1830. PNNL is operated for the US DOE by Battelle Memorial Institute. The contribution of Dr S. Y. was funded and managed by the US Environmental Protection Agency through its Office of Research and Development. It has been subjected to the Agency's administrative review and approved for publication. The authors also want to thank Dr Dessler for his help in obtaining the data used in Figure 2.
