Optimal fingerprinting under multiple sources of uncertainty

Abstract

Detection and attribution studies routinely use linear regression methods referred to as optimal fingerprinting. Within this methodological paradigm, it is usually recognized that multiple sources of uncertainty affect both the observations and the simulated climate responses used as regressors. These include, for instance, internal variability, climate model error, and observational error. When all errors share the same covariance, the statistical inference is usually performed with the so-called total least squares procedure, but to date no inference procedure is readily available in the climate literature to treat the general case where this assumption does not hold. Here we address this deficiency. After a brief overview of the error-in-variables models literature, we describe an inference procedure based on likelihood maximization, inspired by a recent article dealing with a similar situation in geodesy. We evaluate the performance of our approach via an idealized test bed. We find the procedure to outperform existing procedures when the latter wrongly neglect some sources of uncertainty.

1 Introduction

An important goal of climate research is to determine the causes of past global warming in general and the responsibility of human activity in particular; this question has thus emerged as a research topic known as detection and attribution (D & A). Over the past two decades, the results produced in this area have been instrumental in delivering far-reaching statements and in raising awareness of anthropogenic climate change (Intergovernmental Panel on Climate Change's Fourth Assessment Report: Hegerl et al. [2007]). From a methodological standpoint, D & A studies routinely use linear regression methods referred to as optimal fingerprinting, whereby an observed climate change y is regarded as a linear combination of externally forced signals x added to an internal climate variability term ϵ which is treated as noise [Hegerl and Zwiers, 2011]. The regressors x—or “fingerprints”—consist here of spatial, temporal, or space-time patterns of response to external forcings as anticipated by one or several climate models. Practically, the results of the inference, and in particular the magnitude of the uncertainty range on the regression coefficients, determine whether these fingerprints are present in the observations and whether or not the observed change is attributable to a given cause.

Over the years, this approach has been formulated using a suite of multivariate linear regression models that were progressively refined, involving more and more general formulations together with inference procedures of increasing complexity. For instance, the standard general linear model and its usual inference treatment—i.e., ordinary least squares (OLS)—were first introduced explicitly by Hegerl et al. [1996] and were then refined by Allen and Tett [1999]. In Allen and Stott [2003], an important further advance was introduced by recognizing that the forcing responses x are actually not known with certainty: they are tainted by a residual noise originating from the internal variability which is also present in climate model simulations, even after averaging over the simulation ensemble, because the ensemble size is usually small. Technically, the inference scheme described in Allen and Stott [2003] takes advantage of the fact that the errors on the observed response y and on the regressors x are both assumed to be driven by internal variability (i.e., actual and simulated, respectively) and consequently share a common covariance structure. A joint “prewhitening” of y and x is thus possible, allowing next for a total least squares (TLS) inference [Van Huffel and Vandewalle, 1994].

This innovation was taken one step further by Huntingford et al. [2006; hereinafter HSAL06], who introduced the more general hypothesis that the forcing responses x are tainted not only by residual internal variability but also by some error associated with the climate model. From a model formulation standpoint, this contribution was an important milestone, not only because it incorporated model uncertainty into D & A studies for the first time but also because it set the ground for taking into account several other sources of error (e.g., observational error) that may affect D & A. Nonetheless, HSAL06 presents two shortcomings from a model inference standpoint. First, its more general assumptions imply that the errors on y and x no longer have the same covariance structure; thus, the TLS algorithm cannot be applied: a different inference scheme is needed. While HSAL06 recognized this need, it fell short of giving any detailed description of the inference technique used and instead merely refers to the procedure described in Nounou et al. [2002] in the context of chemical analysis. Thus, the paper is not self-contained in this respect, and the method is not easily accessible to D & A practitioners. Second, the procedure developed by Nounou et al. [2002] appears to treat a statistical model which—to the best of our understanding—is different from the one formulated by HSAL06 and is neither applicable nor implementable in the situation at stake (see section 2).

Because of the general relevance of the HSAL06 model formulation in D & A, it is fundamental in our view to ensure that a sound and easily implementable statistical inference method is available for this model. The main objective of this paper is to lay out such an inference method. Section 2 recalls the model, introduces a few notations, and then offers a brief overview of the statistical literature available for this type of regression model, highlighting the peculiarity of the model studied. Section 3 lays out our inference scheme, and section 4 illustrates it and evaluates its performance on simulated series. Section 5 discusses results and concludes.

2 Defining and Discussing Error-in-Variable Models

We propose to study the following class of error-in-variables (EIV) models:

y = x*β + ϵ,
x = x* + ν,    (1)

where the observation vector y of length n and the simulation ensemble average response matrix x of size n×p are known. In contrast, the matrix of actual regressors x* of size n×p is not known with certainty, but the known matrix x is assumed to be a noised version of it. The vector of regression coefficients β of length p is unknown and represents the main focus of the inference, as in any classical linear regression. Concerning the noise dependence structure, we assume that the observation noise vector ϵ and the column vectors ν1,..., νp of the n×p noise matrix ν are all independent and Gaussian distributed with zero mean and covariance matrices Σ, Ω1,...,Ωp, respectively.

It should be underlined that in both the D & A and the EIV literature, the model structural equation (1) is often formulated under the reduced form y = (x − ν)β + ϵ, in which the latent variable x* was eliminated by using x* = x − ν. However, it can be argued that this form is somewhat ambiguous (see Appendix A); hence, we prefer the formulation of equation (1).

It should also be noted that the regression models mentioned in section 1 are all special cases of the general formulation of equation (1). Denoting by Σ the covariance matrix associated with internal variability, Allen and Tett [1999] corresponds to Ωi = 0, Allen and Stott [2003] to Ωi = Σ/nr with nr the size of the ensemble, and HSAL06 to Ωi = Σ/nr + Λ, where Λ is the covariance associated with the climate model error, as illustrated in the short sketch below. In practice, these matrices are usually estimated beforehand from ensemble runs [Allen and Tett, 1999; Ribes et al., 2012]. While this estimation step has been shown to critically influence the end result, it is in general handled preliminarily and rather independently from the regression inference which is at stake here. It is thus omitted for concision in the present paper, where Σ and Ωi are assumed to be known.
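
As a concrete illustration, the minimal Python sketch below sets up these three nested choices of Ωi; the identity placeholder for Σ and the value of σ² are arbitrary, and only nr = 5 and the form Λ = σ²I anticipate the test bed of section 4.

```python
import numpy as np

# Illustrative covariance choices for the three nested model variants
# discussed above (Sigma is assumed to have been estimated beforehand).
n, nr = 275, 5                   # dimension and ensemble size used in the test bed
Sigma = np.eye(n)                # placeholder for the internal variability covariance
sigma2 = 0.1                     # hypothetical climate model error variance
Lambda = sigma2 * np.eye(n)      # simple model error covariance, as in section 4

Omega_ols = np.zeros((n, n))     # Allen and Tett [1999]: regressors assumed noise free
Omega_tls = Sigma / nr           # Allen and Stott [2003]: residual internal variability only
Omega_eiv = Sigma / nr + Lambda  # HSAL06: internal variability plus model error
```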

An extensive body of literature, dating back as far as Adcock [1878], is available on EIV linear regression models, presenting a variety of model formulations for applications in different fields (e.g., see the book of Fuller [1987] or the review article of Gillard [2010]). Reviewing it is beyond our scope; instead, we highlight a distinguishing aspect that makes the D & A setup a somewhat peculiar situation with respect to most EIV models found in the literature. The main peculiarity of our model resides in the assumed structure of noise dependence (see Appendix B). In a nutshell, our setup assumes dependence between rows and independence between columns (i.e., n×n covariances), whereas most existing EIV models consider the symmetric case of independence between rows and dependence between columns (i.e., p×p covariances). An example of a contribution falling into the latter category is Nounou et al. [2002], wherein all covariance matrices are p×p. Note that an overlap does exist between these two contrasted categories, but it is reduced to the case where all the noise components are independent. Such a “white” setup is a commonplace one, referred to as the Deming regression when p=1 [Deming, 1943], that can be handled with the TLS algorithm when noise variances are known and is the focus of many articles in the more problematic case where variances are known only partially [Gillard, 2010]. While the model of Allen and Stott [2003] is not a “white” setup strictly speaking, it can easily be connected to this case by “whitening” the data. In contrast, the general case of HSAL06 at stake here is neither “whitenable” nor does it have any overlap with the traditional EIV setup of row independence and column dependence. The method of Nounou et al. [2002]—as most EIV methods found in the literature—is hence not able to treat the inference problem at stake here and is simply not implementable in the first place because of the mismatch in matrix sizes.

As a notable exception within the above-described EIV landscape, Schaffrin and Wieser [2008; hereinafter SW08] treated, in the context of an application in geodesy, the more general case where both rows and columns are simultaneously dependent. This contribution assumes that the dependence within the noise matrix ν has a block structure described by the Kronecker product Q ⊗ Ω, where Q (respectively Ω) is a fixed known p×p (respectively n×n) covariance matrix describing the dependence among columns (respectively rows), which means that E(νijνkl) = Qjl × Ωik for any (i,j,k,l). It then proposes an inference procedure based on minimizing a weighted squared error criterion, which coincides with the negative loglikelihood of the model under a Gaussian setting. Minimization is performed by means of an iterative algorithm coined “weighted total least squares” (WTLS), obtained by (i) deriving the first-order nonlinear conditions in β and x* (or, equivalently in the paper, in β and the residuals, under the model structural constraint), (ii) expressing their solution as a function of itself (i.e., a “fixed point” formulation), and (iii) iterating the fixed point equations to convergence. Results show fast convergence toward the exact solution (when known).
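
To make the Kronecker noise structure concrete, the short sketch below (with arbitrary, purely illustrative sizes and matrices) builds the covariance of the column-stacked noise matrix and checks one entry against E(νijνkl) = Qjl × Ωik.

```python
import numpy as np

# SW08 noise structure: E(nu_ij nu_kl) = Q_jl * Omega_ik.
# With nu stacked column-wise, cov(vec(nu)) = kron(Q, Omega).
n, p = 4, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n)); Omega = A @ A.T / n  # arbitrary SPD n x n (row dependence)
B = rng.standard_normal((p, p)); Q = B @ B.T / p      # arbitrary SPD p x p (column dependence)

cov_vec_nu = np.kron(Q, Omega)                        # (n*p) x (n*p) covariance of vec(nu)
i, j, k, l = 1, 0, 3, 1                               # check one entry against the formula
assert np.isclose(cov_vec_nu[j * n + i, l * n + k], Q[j, l] * Omega[i, k])
```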

By choosing Q = Ip, the framework of SW08 does correspond to our model, provided all the covariance matrices Ωi thereof are equal. Therefore, running the above algorithm would be a valid solution to solve our model in this case. However, this assumption may be restrictive, and on the other hand, the algorithm does not produce confidence intervals but merely a point estimate, an important shortcoming in a D & A context. These two limitations motivate us to design a new procedure, which is adapted from the one of SW08 and follows a similar approach.

3 Inference of the EIV Model

We wish to build an inference procedure for β, i.e., to derive a point estimate and confidence intervals that depend only on the two known variables x and y. For this purpose, we follow a maximum likelihood estimation (MLE) approach, as is commonplace in EIV regression (e.g., SW08) and in optimal fingerprinting [e.g., Allen and Stott, 2003]. The loglikelihood function of model (1) is

ℓ(β, x* ; y, x) = −(1/2) (y − x*β)ᵀ Σ⁻¹ (y − x*β) − (1/2) ∑i=1,...,p (xi − xi*)ᵀ Ωi⁻¹ (xi − xi*) + constant.    (2)

We wish to maximize (2) in (β,x*). The first-order conditions are

x*ᵀ Σ⁻¹ (y − x*β) = 0,    (3)
βi Σ⁻¹ (y−i − βi xi*) + Ωi⁻¹ (xi − xi*) = 0,  i = 1,...,p,    (4)

where xi and xi* are the ith column vectors of x and x*, respectively, and y−i = y − ∑j≠i βj xj*. The system of p+1 equations corresponding to equations (3) and (4) must now be solved in (β, x*), but the solution cannot be obtained under a closed form because of the nonlinearity of the system implied by the product terms βi xi*. Following SW08, we can thus reformulate expressions (3) and (4) as a fixed point equation to be solved in (β, x*):

β = (x*ᵀ Σ⁻¹ x*)⁻¹ x*ᵀ Σ⁻¹ y,    (5)
xi* = (Ωi⁻¹ + βi² Σ⁻¹)⁻¹ (Ωi⁻¹ xi + βi Σ⁻¹ y−i),  i = 1,...,p.    (6)

Next, we take advantage of equations (5) and (6) to implement the following iterative procedure:

step 0: set t = 0 and initialize β(0);
step 1: compute x*(t+1) = argmax over x* of ℓ(β(t), x* ; y, x), column by column, using equation (6);
step 2: compute β(t+1) = argmax over β of ℓ(β, x*(t+1) ; y, x), using equation (5);
increment t and iterate steps 1 and 2 until ‖β(t+1) − β(t)‖ < δ0,    (7)

where δ0 > 0 is the requested precision level. Scheme (7) yields a value (β̂, x̂*) which is by construction a fixed point of the function φ(β, x*) defined by equations (5) and (6) and thereby satisfies the first-order conditions.

Defining the partial MLE as the value that maximizes the likelihood in one given variable when all others are held fixed, the right-hand terms of equations (5) and (6) also give the expressions of the so-called partial MLEs β̂(x*) and x̂*(β). Scheme (7) may thus also be interpreted as a partial iterative maximization. Such a procedure is not uncommon in statistics [e.g., Lauritzen, 1996], and it is similar to the widely used expectation-maximization (EM) procedure [Dempster et al., 1977]. Under the EM procedure, β̂ would indeed also be obtained by iterative maximization in β (“M” step), but this maximization would be performed on a likelihood function obtained by averaging over x* (“E” step), whereas in (7) it is obtained by maximizing over x* (step 1). The partial iterative maximization approach is thus sometimes referred to as “hard EM” because of this similarity. Finally, scheme (7) may also be paralleled with the Gibbs algorithm widely used in Bayesian statistics [Geman and Geman, 1984], which similarly consists of a “partial iterative” simulation of β and x* conditionally on one another. Rephrased in this context, scheme (7) basically retains at each step the most likely value of the conditional distribution, as opposed to a random realization thereof.
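
For concreteness, the following Python sketch implements scheme (7) as reconstructed above from equations (5) and (6); the OLS initialization, the convergence test, and the naive n×n solves are illustrative choices (Appendix D describes a lighter implementation), and all names are ours rather than the paper's.

```python
import numpy as np

def wtls_eiv(y, x, Sigma, Omegas, delta0=1e-8, max_iter=1000):
    """Partial iterative maximization of the loglikelihood (2), i.e., scheme (7).

    y: (n,) observations; x: (n, p) noised regressors;
    Sigma: (n, n) covariance of eps; Omegas: list of p (n, n) covariances.
    Returns the maximum likelihood estimates (beta_hat, xstar_hat).
    """
    n, p = x.shape
    Sigma_inv = np.linalg.inv(Sigma)
    Omega_invs = [np.linalg.inv(O) for O in Omegas]

    # step 0: initialize beta with the OLS estimate and x* with x
    beta = np.linalg.solve(x.T @ Sigma_inv @ x, x.T @ Sigma_inv @ y)
    xstar = x.astype(float)

    for _ in range(max_iter):
        # step 1: update each column of x* given beta, equation (6)
        for i in range(p):
            y_minus_i = y - xstar @ beta + beta[i] * xstar[:, i]
            A = Omega_invs[i] + beta[i] ** 2 * Sigma_inv
            b = Omega_invs[i] @ x[:, i] + beta[i] * Sigma_inv @ y_minus_i
            xstar[:, i] = np.linalg.solve(A, b)
        # step 2: update beta given x*, equation (5)
        beta_new = np.linalg.solve(xstar.T @ Sigma_inv @ xstar, xstar.T @ Sigma_inv @ y)
        converged = np.linalg.norm(beta_new - beta) < delta0
        beta = beta_new
        if converged:
            break
    return beta, xstar
```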

With a point estimate β̂ in hand, we now turn to confidence intervals. A substantial benefit of working with an MLE approach is that commonplace and widely applicable results are available to approximate confidence intervals. In particular, if we define the profile likelihood for each individual coefficient βi as

ℓi(βi ; y, x) = max over (β−i, x*) of ℓ(β, x* ; y, x),

where β−i = (β1,...,βi−1,βi+1,...,βp), then the corresponding profile likelihood ratio test statistic is asymptotically distributed according to the χ2 distribution with 1 degree of freedom. This means that an approximate 1−α confidence region for βi is the set of values {βi : 2[ℓi(β̂i ; y, x) − ℓi(βi ; y, x)] ≤ c1−α}, where c1−α is the (1−α)th quantile of the χ2 distribution with 1 degree of freedom (see Venzon and Moolgavkar [1988] for more details). The confidence interval is thus obtained by solving 2[ℓi(β̂i ; y, x) − ℓi(βi ; y, x)] = c1−α for βi, which is done here using the Newton-Raphson algorithm [Press et al., 2007]. The profile likelihood ℓi(βi ; y, x) is derived every time it is needed by application of the same partial iterative maximization procedure, but with βi held constant. More specifically, step 1 is run identically, but step 2 is replaced by

β−i(t+1) = (x*−iᵀ Σ⁻¹ x*−i)⁻¹ x*−iᵀ Σ⁻¹ (y − βi xi*), evaluated at x* = x*(t+1),

where x*−i denotes the matrix x* deprived of its ith column.
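
A minimal Python sketch of this interval construction is given below, building on the wtls_eiv sketch above; it recomputes the profile loglikelihood by the constrained iteration just described (step 1 unchanged, step 2 restricted to β−i), and it uses a simple bracketing root finder in place of the Newton-Raphson step, with an arbitrary bracketing width.

```python
import numpy as np
from scipy.stats import chi2
from scipy.optimize import brentq

def loglik(beta, xstar, y, x, Sigma_inv, Omega_invs):
    # Loglikelihood (2) up to an additive constant.
    r = y - xstar @ beta
    val = -0.5 * r @ Sigma_inv @ r
    for i, Oi in enumerate(Omega_invs):
        d = x[:, i] - xstar[:, i]
        val -= 0.5 * d @ Oi @ d
    return val

def profile_loglik(b_i, i, y, x, Sigma_inv, Omega_invs, beta0, n_iter=200):
    # Maximize (2) over x* and beta_{-i} with beta_i fixed at b_i
    # (step 1 unchanged, modified step 2); fixed iteration count for simplicity.
    n, p = x.shape
    beta = beta0.astype(float).copy()
    beta[i] = b_i
    xstar = x.astype(float)
    others = [j for j in range(p) if j != i]          # assumes p >= 2
    for _ in range(n_iter):
        for j in range(p):                            # step 1: equation (6)
            y_minus_j = y - xstar @ beta + beta[j] * xstar[:, j]
            A = Omega_invs[j] + beta[j] ** 2 * Sigma_inv
            b = Omega_invs[j] @ x[:, j] + beta[j] * Sigma_inv @ y_minus_j
            xstar[:, j] = np.linalg.solve(A, b)
        Xo = xstar[:, others]                         # modified step 2: beta_{-i} only
        rhs = y - b_i * xstar[:, i]
        beta[others] = np.linalg.solve(Xo.T @ Sigma_inv @ Xo, Xo.T @ Sigma_inv @ rhs)
    return loglik(beta, xstar, y, x, Sigma_inv, Omega_invs)

def profile_ci(i, y, x, Sigma, Omegas, beta_hat, alpha=0.1, width=5.0):
    # Approximate 1 - alpha profile likelihood confidence interval for beta_i.
    Sigma_inv = np.linalg.inv(Sigma)
    Omega_invs = [np.linalg.inv(O) for O in Omegas]
    c = chi2.ppf(1 - alpha, df=1)                     # chi-square quantile, 1 degree of freedom
    l_hat = profile_loglik(beta_hat[i], i, y, x, Sigma_inv, Omega_invs, beta_hat)
    ratio = lambda b: 2.0 * (l_hat - profile_loglik(b, i, y, x, Sigma_inv, Omega_invs, beta_hat)) - c
    lo = brentq(ratio, beta_hat[i] - width, beta_hat[i])  # width must bracket the roots
    hi = brentq(ratio, beta_hat[i], beta_hat[i] + width)
    return lo, hi
```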

Finally, one important element in D & A studies consists in deriving a goodness-of-fit metric after the inference has been performed, in order to check the consistency of the assumed model. A classic approach to such a consistency test within the proposed EIV linear regression model would be, for instance, to use the weighted sum of squared residuals obtained from the maximized loglikelihood ℓ(β̂, x̂* ; y, x). This quantity conveniently follows a χ2 distribution under the present assumption of known covariances; hence, a critical value could straightforwardly be obtained for application of the test—in theory. However, in practice, covariances are not known but estimated, which has been shown to strongly influence the test's distribution. How to deal with the latter remains an open question at present (see Ribes et al. [2012] for a detailed explanation) which is beyond the scope of this letter.
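
For completeness, a small sketch of the residual statistic mentioned above, computed from the outputs of the wtls_eiv sketch; the choice of the reference χ2 distribution and the treatment of estimated covariances are deliberately left aside, as in the text.

```python
import numpy as np

def weighted_ssr(beta_hat, xstar_hat, y, x, Sigma_inv, Omega_invs):
    # Weighted sum of squared residuals, i.e., -2 times the maximized
    # loglikelihood (2) up to an additive constant.
    r = y - xstar_hat @ beta_hat
    stat = r @ Sigma_inv @ r
    for i, Oi in enumerate(Omega_invs):
        d = x[:, i] - xstar_hat[:, i]
        stat += d @ Oi @ d
    return stat
```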

4 Illustration and Simulation Results

This section illustrates and evaluates the performance of our WTLS procedure by applying it to simulated values of x and y. The use of simulated rather than real data aims at verifying that our inference procedure performs correctly, a goal which requires the actual values of β and x* to be known. Our idealized data simulation assumptions are based on Ribes et al. [2012]. They aim at replicating realistically a global data set of twentieth century temperature, as obtained after dimension reduction, and are described in detail in Appendix C together with assumptions regarding Σ. Regarding Ωi, we assumed as in HSAL06 that Ωi = Σ/nr + Λ for all i, with nr = 5 and Λ = σ²I representing the climate model error covariance—a default choice in the absence of a sound estimate of the covariance associated with model error. We ran the inference procedure on ten samples of size N = 1000, corresponding to ten values of σ² ranging from 0 to a multiple of σ0², where σ0 denotes a reference error amplitude.

On average, scheme (7) converged in 24 iterations, which took 0.05 s using a desktop computer and the algorithmic optimization described in Appendix D. Computing confidence intervals took an extra 0.1 s, which overall keeps the procedure applicable for higher values of both p and n (within the same order of magnitude). Figures 1a–1d show an illustration for a randomly selected simulation under σ = σ0. The performance of the MLE β̂ was evaluated based on the average mean squared error (1/N) ∑k ‖β̂(k) − β‖², where k = 1,...,N denotes the simulation index. The accuracy of the confidence interval I0.9 was evaluated by comparing the empirical frequency with which each actual scalar coefficient βi falls into the interval to its theoretical value 0.9. For the sake of comparison to existing procedures, both performance metrics were also systematically derived on each simulation by conducting an OLS and a TLS inference. In the context of this test bed, it is clear that the latter two procedures do make an incorrect assumption on Ωi—i.e., Ωi = 0 and Ωi = Σ/nr, respectively—yet they provide a useful benchmark.
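
The two performance metrics can be computed as in the short sketch below (array shapes are illustrative: beta_hats stacks the N estimates, and ci_low, ci_high hold the per-coefficient 90% interval bounds).

```python
import numpy as np

def mse_and_coverage(beta_hats, beta_true, ci_low, ci_high):
    # beta_hats: (N, p) estimates; beta_true: (p,) actual value;
    # ci_low, ci_high: (N, p) confidence interval bounds.
    mse = np.mean(np.sum((beta_hats - beta_true) ** 2, axis=1))
    coverage = np.mean((ci_low <= beta_true) & (beta_true <= ci_high), axis=0)
    return mse, coverage  # average mean squared error and per-coefficient frequency
```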

Figure 1.

(a–d) Illustration of the inference procedure for a simulation with n = 275, p = 2, and σ = σ0 and (e and f) performance results. Data scatterplot (x1, y) (blue dots) and its estimated noise-free counterpart (green circles) shown in Figure 1a. Contour plot of the negative profile loglikelihood −ℓ(β) and trajectory of β(t) showing convergence to the minimum shown in Figure 1b. Plot of the negative profile loglikelihood −ℓ1(β1) shown in Figure 1c. Plot of the χ2 probability level and confidence interval shown in Figure 1d. Average mean squared error of the estimator obtained with our procedure (EIV, black line), TLS (blue line), and OLS (red line) shown in Figure 1e. Frequency of the actual value of β falling within the 90% confidence interval for our procedure, TLS, and OLS shown in Figure 1f.

Our simulation results are summarized in Figures 1e and 1f. They show that the procedure is able to estimate the true value of β with a moderate dispersion, no bias (not shown), and a low mean squared error (MSE) overall. As expected, the MSE performance of the estimator decreases when σ increases, since the signal-to-noise ratio then degrades. Expectedly as well, this degradation is smaller than the degradation in the performance of the TLS and OLS estimators, because for the latter two, performance is further degraded by the increasing discrepancy between the model assumed for inference and the correct model. Therefore, our results also highlight the important benefit of using the correct covariances in the presence of multiple sources of error. Finally, the procedure also appears to produce relatively accurate confidence intervals. Our approach works well for small and moderate noise levels, but it appears to have its limits for large σ (Figures 1e and 1f). The performance decline occurs in the direction of an underestimation of actual uncertainty, i.e., overconfidence. Additional research is needed to treat the case of large σ.

5 Discussion and Conclusion

We have proposed here an inference method which is suitable for the optimal fingerprinting case in which multiple sources of error, each with its own dependence structure, are assumed to be present in both observations and simulated responses. The method is applicable in particular to the situation studied by HSAL06 where two sources of error (i.e., internal variability and climate model error) are assumed to be at play, but it could also cope with different and/or additional error terms. Observational error would be a natural candidate, taking advantage of recent observational data sets which give a precise quantification of observational uncertainty and its dependence structure.

Technically, the method relies on an iterative likelihood maximization scheme that is similar to the one introduced in SW08, which may be viewed as an extension of the classic TLS algorithm and was hence coined “weighted total least squares” (WTLS) by these authors. It appears to perform well when applied to idealized simulation data, with respect to both point estimation and confidence intervals. Yet the method has a few limitations that open opportunities for further improvements and require discussion. First, while in practice the algorithm always converged toward a global maximum of the likelihood function when applied to simulated data, its convergence was neither mathematically established nor was the fact that the limit—if reached—is the desired global maximum. This limitation might be overcome by further theoretical work based for instance on De Leeuw [1994], who established a few general conditions of convergence for a similar partial maximization iterative algorithm. As disturbing as this situation may feel, it is in our view fair to say that it is far from unusual. Indeed, convergence conditions of even widely used algorithms in statistics (e.g., k-means) are not always very rigorously understood, yet they make sense heuristically and work in practice—as can be argued to be the case of our procedure and that of SW08 from which it is inspired. On the other hand, we derived confidence intervals based on an approximation that holds asymptotically, but the associated convergence conditions that determine its validity domain were not investigated in detail. Actually, as suggested by our results under high model error σ (Figure 1f), it would be reasonable to expect that convergence is not reached for n as low as a few hundreds when the signal-to-noise ratio is too weak, causing overconfidence in the present situation. Several options are available to address this issue; in particular, the Bayesian framework is well suited to deal with situations where frequentist, asymptotic results are prone to fail. In any case, as problematic as it might be, this limitation applies equally to most existing D & A methods—in particular to the aforementioned TLS procedure—in which confidence intervals are derived based on the same asymptotic approximation, thus justifying in our view a specific methodological focus in D & A on this question.

Despite these limitations, the benefit of using our method instead of the OLS or TLS algorithms, which under our simulation assumptions rely on an incorrect model, appears to be material. Indeed, our results show that neglecting error sources that actually significantly affect the data studied is heavily penalized: first, by a considerably higher error on the regression coefficient β (Figure 1e), and second, by an important underestimation of the associated confidence intervals (Figure 1f). Due to this ambiguous combination of an additional error on β together with an overconfident uncertainty quantification, it is difficult to speculate whether our procedure would in general yield confidence intervals of smaller or larger amplitude than those obtained under an OLS or TLS treatment applied to the same data, and hence whether it would lead to more or less assertive D & A statements. Settling this question would be important in our view and would require using our method to revisit one or several previous case studies—for instance, that of HSAL06 later reproduced in Hegerl et al. [2007] would be a natural option. Such a comparison would certainly be relevant and instructive but was beyond the scope and length constraint of the present letter, which limited itself to methodological aspects.

Appendix A: Notations and Causality

The causal relationship between the true unknown regressors x* and their observed version x matters. Two contrasted situations prevail in this respect. In our setup, as in the vast majority of cases, the observed variable x is a noised version of the latent variable x*, i.e., x is caused by x*. By contrast, the symmetric case where it is the latent variable x* which is a noised version of the observed variable x, i.e., where x* is caused by x, may be treated as well in theory. In nonrigorous language, the error in the variable x is thus “additive” in the latter case and “subtractive” in the former. This interpretation inspires the following two notations: y = (x + ν)β + ϵ and y = (x − ν)β + ϵ, respectively; the latter being often used in D & A and in the EIV literature in lieu of equation (1). Nevertheless, these two notations are confusing because they are strictly speaking equivalent—one may indeed algebraically substitute ν with −ν in the second equation to obtain the first—yet they aim at representing two different situations. We are faced here with the well-known, extensively discussed deficiency of algebraic equations in representing causal relationships, due to the fact that the former are symmetrical objects, whereas the latter are directional ones [Pearl, 2009]. The consensual way to cope with this deficiency is to augment the equation with a so-called “path diagram” in which arrows are drawn from causes to effects [Wright, 1921]. Under this convention, the path diagram x* → x should be added to equation (1) to reflect our “subtractive noise” situation, and x → x* should be added for the “additive noise” situation. Finally, it is worth emphasizing that the causal distinction discussed here is not a merely theoretical one; it does strongly influence inference results (not shown), so we can definitely not afford the confusion.

Appendix B: Noise-Dependence Structure

In our model, we assume dependence between rows and independence between columns, instead of independence between rows and dependence between columns as in most existing EIV models. Row dependence corresponds here to dependence in space and time. Internal variability, for instance, typically exhibits large-scale modes of variability (e.g., El Niño–Southern Oscillation and the Atlantic Multidecadal Oscillation (AMO)), so that the value observed at a given location or instant is not independent of the value observed nearby. Here we are using the same type of assumption to describe the model response error (e.g., overly sensitive models may simulate an overestimated greenhouse gas (GHG) warming at most locations, or errors in large-scale patterns). Column independence corresponds to independence between the column vectors of the noise ν. The internal variability component of ν can be considered independent because each model run represents an independent realization of internal variability by construction. On the other hand, it is fair to say that—at least theoretically—one may expect some degree of column dependence in the model error component of the noise ν, for two or more forcings that affect the same physical processes. However, in practice in D & A applications, the forcings usually considered (e.g., GHGs, aerosols) involve distinct physical processes yielding contrasted responses in order for attribution to be possible, which tends to lessen this issue.

Appendix C: Simulation Test Bed Assumptions

The model patterns x* correspond to simulation ensemble averages from the Centre National de Recherches Météorologiques Climate Model version 5 under natural and anthropogenic forcings, respectively (i.e., p = 2). Dimension reduction was based on spherical harmonics for spatial reduction and averaging by decade for temporal reduction, eventually yielding n = 275. The covariance matrix Σ associated with internal variability was estimated on an ensemble of control simulations of the same model by means of linear shrinkage. From the assumed covariances and patterns, we finally simulate successively the noise vectors ϵ and (νi)i=1,2 and the data to be used for inference, x = x* + ν and y = x*β + ϵ, with β = (1,1).
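
A minimal sketch of this simulation step is given below; the identity Σ, the random placeholder patterns, and the value of σ² are stand-ins for the quantities described above (shrinkage-estimated covariance and CNRM-CM5 responses), which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, nr = 275, 2, 5
beta_true = np.array([1.0, 1.0])

Sigma = np.eye(n)                     # placeholder for the shrinkage-estimated covariance
sigma2 = 0.1                          # hypothetical model error variance
Omegas = [Sigma / nr + sigma2 * np.eye(n) for _ in range(p)]
xstar = rng.standard_normal((n, p))   # placeholder for the forced response patterns

nu = np.column_stack([rng.multivariate_normal(np.zeros(n), O) for O in Omegas])
eps = rng.multivariate_normal(np.zeros(n), Sigma)
x = xstar + nu                        # noised regressors available for inference
y = xstar @ beta_true + eps           # synthetic observations
```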

Appendix D: Implementation Details

The computational cost of the above calculations is driven by the inversion of the n×n matrix Ωi⁻¹ + βi²Σ⁻¹, its most intensive piece, which must be performed at every iteration of step 1. If the latter were to be performed naively, the algorithm would be prohibitively costly when n is large. This computational burden can be significantly lightened by computing beforehand the square root Ωi^(1/2), then forming the product Ωi^(1/2) Σ⁻¹ Ωi^(1/2) and diagonalizing it into Pi Δi Piᵀ, with Piᵀ Pi = I and Δi diagonal, for i = 1,...,p. Equation (6) can then be modified into

xi* = Ωi^(1/2) Pi (I + βi² Δi)⁻¹ (x̃i + βi Δi ỹ−i),    (D1)

where ũ = Piᵀ Ωi^(−1/2) u for any vector u. Under the transformed expression (D1) of equation (6), the time-consuming n×n matrix inversion of step 1 now becomes much lighter, since I + βi² Δi is diagonal.
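
A Python sketch of this precomputation, consistent with the reconstruction of (D1) above (the names and the use of a dense matrix square root are illustrative choices):

```python
import numpy as np
from scipy.linalg import sqrtm

def precompute(Sigma, Omegas):
    # One-off decompositions so that step 1 no longer requires an n x n inversion.
    Sigma_inv = np.linalg.inv(Sigma)
    pre = []
    for O in Omegas:
        O_half = np.real(sqrtm(O))                             # Omega_i^(1/2)
        O_half_inv = np.linalg.inv(O_half)                     # Omega_i^(-1/2)
        vals, P = np.linalg.eigh(O_half @ Sigma_inv @ O_half)  # P_i Delta_i P_i^T
        pre.append((O_half, O_half_inv, P, vals))
    return Sigma_inv, pre

def update_xstar_i(i, beta, xstar, x, y, Sigma_inv, pre):
    # Step 1 update of column i via the diagonalized form (D1):
    # only the diagonal matrix I + beta_i^2 * Delta_i needs "inverting".
    O_half, O_half_inv, P, vals = pre[i]
    y_minus_i = y - xstar @ beta + beta[i] * xstar[:, i]
    x_t = P.T @ (O_half_inv @ x[:, i])                         # tilde transform of x_i
    y_t = P.T @ (O_half_inv @ y_minus_i)                       # tilde transform of y_{-i}
    return O_half @ (P @ ((x_t + beta[i] * vals * y_t) / (1.0 + beta[i] ** 2 * vals)))
```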

Acknowledgements

We gratefully acknowledge the Centre National de la Recherche Scientifique (CNRS), Météo France, the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), and the University of Buenos Aires (UBA) for their support in this collaboration. We would also like to thank Dáithí Stone and an anonymous reviewer for helpful suggestions and insightful reviews. Part of this work has been supported by the LEFE-MESDA project, the ANR-DADA project, the EU-FP7 ACQWA project (www.acqwa.ch), the PEPER-GIS project, the ANR-MOPERA project, the ANR-McSim project, and the MIRACCLE-GICC project.

The Editor thanks two anonymous reviewers for their assistance in evaluating this paper.
