## 1. INTRODUCTION

[2] *Lorenz* [1963] shattered the paradigm of a clockwork universe when he proved that if the solution to a dynamical system is not periodic, then small uncertainties in the state grow so large as to render the forecast no better than a randomly drawn state from the system. No longer was there reason to expect weather to be as predictable as the motion of the planets or the tides of the ocean. Lorenz's result implies that the time during which the state is predictable, i.e., when prediction errors lie below those based on random selection of realistic states, is finite in nonperiodic systems. Therefore, when the state is predictable, the prediction error variance is less than the error variance of random selections. The difference between these two variances can be called the predictable variance. Current numerical weather prediction models indicate that the predictable variance is relatively small after about 3 weeks [*Simmons and Hollingsworth*, 2002]. Despite being small, predictable variance beyond 3 weeks may still be of interest. Specifically, certain spatial or temporal structures may be highly predictable beyond 3 weeks but difficult to detect because the unpredictable structures superposed on them dominate. For instance, climate in a limited region might be highly predictable beyond 3 weeks, but this predictability might be difficult to detect in an analysis that pools all regions together. Similarly, a component that is highly predictable only in a certain season may be difficult to detect in an analysis that pools all seasons together. Also, components that are predictable beyond 3 weeks may be persistent and hence explain much of the variability of monthly means, even if they explain little of the daily variability [*Shukla*, 1981a]. In addition, large-scale structures tend to be more persistent, and hence more predictable, than small-scale structures [*Lorenz*, 1969; *Shukla*, 1981b]. 
These and numerous other examples illustrate that predictability beyond 3 weeks can be identified by appropriate filtering in space or time. The question arises as to whether there exists an optimal method for finding such predictability. Consider the following techniques that have been used to identify predictable structures in weather and climate data sets: (1) *Barnett and Preisendorfer* [1987] used canonical correlation analysis to identify relations between sea surface temperatures and land surface temperatures. (2) *Lorenz* [1965] used singular value decomposition to identify the initial conditions that maximized error growth. (3) *Deque* [1988] and *Renwick and Wallace* [1995] used a version of principal component analysis to identify the most predictable patterns in operational forecast models. (4) *Hasselmann* [1979, 1997] developed “fingerprint” methods for detecting climate change. (5) *Venzke et al.* [1999] extended signal-to-noise analysis to multivariate settings to identify predictable variables in climate change scenarios. (6) *Schneider and Griffies* [1999] used discriminant analysis to find components that maximize predictive power.

[3] Each of these methods has some legitimate claim to identifying maximally predictable components. A natural question is how the methods compare when applied to the same problem. Despite appearances to the contrary, the above methods are consistent: They all give the same result, on average, when applied to variables that are jointly normally distributed. Demonstrating this consistency on a case-by-case basis would be unsatisfying because it would not give insight into the underlying reasons for the consistency. The purpose of this paper is to summarize and clarify a theoretical framework, based on information theory, that reveals the underlying unity of various multivariate statistical methods. Indeed, all of the specific examples listed above are equivalent to maximizing certain terms of a measure of predictability called relative entropy. This paper also shows that the framework provides sensible answers to a variety of questions that otherwise have no clear answer. This topic constitutes only a fraction of the vast literature on predictability; for a review of other topics in the predictability of weather and climate we recommend the book edited by *Palmer and Hagedorn* [2006]. The remainder of this section outlines, with a minimum of mathematics, the main results reviewed in this paper.

[4] In section 2 we introduce the basic concepts in predictability theory. Specifically, we define the forecast and climatological distributions and two guiding principles of predictability. The first principle is that a variable is unpredictable if its forecast distribution is identical to its climatological distribution. Hence a necessary condition for predictability is that the forecast and climatological distributions differ. Intuitively, a measure of predictability should measure the “difference” between the two distributions. More precise statements about measures of predictability are difficult to formulate without knowing the motives of the user. In practice, predictability might be defined more restrictively, e.g., the forecast should have more information (less uncertainty) than the climatology, to account for small ensemble sizes or an imperfect model.
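The idea that predictability requires the forecast distribution to differ from the climatological distribution can be made concrete with a small sketch. The following computes the Kullback-Leibler divergence between two one-dimensional normal distributions using its standard closed form; the function name and the numbers are illustrative, not taken from this paper.

```python
import math

def kl_gaussian(mu_f, var_f, mu_c, var_c):
    """Kullback-Leibler divergence KL(forecast || climatology) between
    one-dimensional normal distributions (standard closed form)."""
    return (math.log(var_c / var_f) / 2.0
            + (var_f + (mu_f - mu_c) ** 2) / (2.0 * var_c)
            - 0.5)

# A forecast identical to the climatology is unpredictable: the divergence is 0.
print(kl_gaussian(0.0, 1.0, 0.0, 1.0))   # 0.0
# A shifted, sharper forecast differs from the climatology: divergence > 0.
print(kl_gaussian(0.5, 0.25, 0.0, 1.0))
```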

[5] The second principle of predictability, which has been emphasized by *Schneider and Griffies* [1999] and *Majda et al.* [2002], is that a measure of predictability should be at least invariant to linear, invertible transformations of the variables. Measures that satisfy this invariance do not depend on the arbitrary basis set used to represent the variables. Three measures of predictability have been proposed that satisfy these principles: mutual information [*Leung and North*, 1990], predictive information [*Schneider and Griffies*, 1999], and relative entropy [*Kleeman*, 2002]. If only average predictability is considered, then all three measures are equal, and no distinction exists between the measures. A fourth measure, called the Mahalanobis error, is introduced here that satisfies the two principles and can be interpreted as a multivariate generalization of the familiar normalized mean square error.
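The invariance requirement can be checked numerically. The sketch below evaluates predictive information for Gaussian distributions, one half the log ratio of climatological to forecast generalized variance, before and after an arbitrary invertible change of basis; the covariance matrices are illustrative constructions of our own.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4

# Illustrative forecast covariance, and a climatology with strictly more spread.
A = rng.standard_normal((K, K)); cov_f = A @ A.T + 0.1 * np.eye(K)
B = rng.standard_normal((K, K)); cov_c = cov_f + B @ B.T

def predictive_information(cov_f, cov_c):
    """Predictive information of Gaussian distributions: half the log ratio
    of climatological to forecast generalized variance."""
    return 0.5 * np.log(np.linalg.det(cov_c) / np.linalg.det(cov_f))

# An arbitrary invertible change of basis x -> T x transforms each covariance
# to T cov T^T; the measure is unchanged because det(T)^2 cancels in the ratio.
T = rng.standard_normal((K, K)) + 5.0 * np.eye(K)
pi_original    = predictive_information(cov_f, cov_c)
pi_transformed = predictive_information(T @ cov_f @ T.T, T @ cov_c @ T.T)
print(pi_original, pi_transformed)  # identical up to rounding
```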

[6] In section 3 and later, we confine our attention to normal distributions, for which numerous analytical results exist. Section 4 gives explicit expressions for the predictability of normally distributed variables in terms of their mean and covariances.
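As a preview of the kind of expression section 4 derives, the relative entropy between two multivariate normal distributions has a standard closed form, which splits into a term quadratic in the mean difference (the "signal") and terms involving only the covariances (the "dispersion"). The function and test values below are illustrative.

```python
import numpy as np

def relative_entropy_gaussian(mu_f, cov_f, mu_c, cov_c):
    """Relative entropy of a normal forecast with respect to a normal
    climatology (standard closed form).  The quadratic term in the mean
    difference is the 'signal' part; the remaining terms form the
    'dispersion' part."""
    K = len(mu_f)
    cinv = np.linalg.inv(cov_c)
    signal = (mu_f - mu_c) @ cinv @ (mu_f - mu_c)
    dispersion = (np.log(np.linalg.det(cov_c) / np.linalg.det(cov_f))
                  + np.trace(cinv @ cov_f) - K)
    return 0.5 * (signal + dispersion)

mu_c, cov_c = np.zeros(2), np.eye(2)
# Identical distributions: relative entropy vanishes, i.e., no predictability.
print(relative_entropy_gaussian(mu_c, cov_c, mu_c, cov_c))  # 0.0
# A shifted, sharper forecast: positive relative entropy.
print(relative_entropy_gaussian(np.array([1.0, 0.0]), 0.5 * np.eye(2), mu_c, cov_c))
```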

[7] Section 5 reviews an important concept called the whitening transformation. A whitening transformation produces a set of uncorrelated variables with equal variances. The importance of this transformation lies in the fact that when it is applied to forecast variables, many familiar techniques, such as analysis of variance, principal component analysis, and singular value decomposition, give immediate results about predictability. Indeed, predictability of a forecast can be deduced solely from the whitened forecast variables, as discussed in section 6.
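For Gaussian statistics, a whitening transformation can be built directly from the eigendecomposition of the climatological covariance matrix. A minimal sketch, with an illustrative covariance matrix of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative climatological covariance (symmetric positive definite).
A = rng.standard_normal((3, 3))
cov_c = A @ A.T + 0.1 * np.eye(3)

# Whitening transformation from the eigendecomposition cov_c = E Lambda E^T:
# W = Lambda^{-1/2} E^T, so that y = W x has uncorrelated, unit-variance
# components.
evals, evecs = np.linalg.eigh(cov_c)
W = np.diag(evals ** -0.5) @ evecs.T

print(np.round(W @ cov_c @ W.T, 10))  # the identity matrix
```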

[8] In section 7, we discuss a technique called predictable component analysis, which finds components with maximum predictability, analogous to the way principal component analysis finds components with maximum variance. The state of a system can be represented by a sum of predictable components ordered such that the first has maximum predictability, the second has maximum predictability subject to being uncorrelated with the first, and so on. Remarkably, the same predictable components are obtained whether one optimizes predictive information, relative entropy (ignoring the signal term), mutual information, the Mahalanobis error, or classical measures such as normalized mean square error, average signal-to-noise ratio, or anomaly correlation. Section 7 also shows that optimizing the signal term in relative entropy yields the same results as fingerprint methods in climate change analysis.
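For Gaussian statistics, one concrete route to predictable components is to whiten with the climatological covariance and then eigendecompose the whitened forecast covariance; the eigenvalue attached to each component is the fraction of climatological variance the forecast leaves unexplained along that component, so the smallest eigenvalue marks the most predictable component. A sketch with illustrative covariances:

```python
import numpy as np

rng = np.random.default_rng(2)
K = 4

# Illustrative covariances: a climatology, and a forecast with less spread.
A = rng.standard_normal((K, K)); cov_c = A @ A.T + np.eye(K)
B = 0.3 * rng.standard_normal((K, K)); cov_f = B @ B.T + 0.05 * np.eye(K)

# Whiten with the climatological covariance ...
evals, evecs = np.linalg.eigh(cov_c)
W = np.diag(evals ** -0.5) @ evecs.T

# ... then eigendecompose the whitened forecast covariance.
lam, V = np.linalg.eigh(W @ cov_f @ W.T)
P = V.T @ W   # rows project the state onto the predictable components

print(lam)  # ascending: lam[0] belongs to the most predictable component
# The components have unit climatological variance and are uncorrelated under
# both distributions: P cov_c P^T = I and P cov_f P^T = diag(lam).
```

Ordering by ascending eigenvalue mirrors the ordering described above: each successive component maximizes predictability subject to being uncorrelated with the previous ones.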

[9] In section 8 we discuss how singular vector analysis and canonical correlation analysis emerge naturally in this predictability framework. A vexing question in the use of singular vectors to measure predictability is which norm should be chosen. Information theory provides a sensible answer to this question. Specifically, singular vectors correspond to predictable components when the initial vector norm is based on the observation covariance matrix, and the final vector norm is based on the climatological covariance matrix. The initial vector norm constrains the initial vectors to have equal probability density, which ensures that they are equally likely to arise in the observations. The final vector norm reduces to relative variance in one dimension, as appropriate for predictability measures, and is invariant with respect to linear transformations, which ensures that the singular vectors do not depend on the coordinate system in which the state is represented. These results shed light on the role of norms in predictability. Section 9 shows that the above framework includes data assimilation, clarifying the fact that the predictability framework accounts for both dynamics and initial condition uncertainty.
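The norm choice described above can be sketched concretely: with an initial norm built from an observation covariance and a final norm built from a climatological covariance, the singular vectors of the correspondingly whitened propagator maximize final-time amplitude in the climatological norm per unit initial amplitude in the observation norm. The propagator and covariances below are arbitrary stand-ins, not quantities from this paper.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(3)
K = 3

M = rng.standard_normal((K, K))                               # illustrative propagator
A = rng.standard_normal((K, K)); cov_0 = A @ A.T + np.eye(K)  # initial (observation) covariance
B = rng.standard_normal((K, K)); cov_c = B @ B.T + np.eye(K)  # climatological covariance

# Whiten the propagator: initial norm from cov_0, final norm from cov_c.
S0  = np.real(sqrtm(cov_0))
Scm = np.real(sqrtm(np.linalg.inv(cov_c)))
U, s, Vt = np.linalg.svd(Scm @ M @ S0)

# Map the leading right singular vector back to state space.
x0 = S0 @ Vt[0]

# Its amplification -- final amplitude in the climatological norm per unit
# initial amplitude in the observation norm -- equals the leading singular value.
num = (M @ x0) @ np.linalg.inv(cov_c) @ (M @ x0)
den = x0 @ np.linalg.inv(cov_0) @ x0
print(np.sqrt(num / den), s[0])  # equal up to rounding
```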

[10] The role of norms in predictability theory is further clarified in section 10. It is shown in certain idealized examples that if the above norms are chosen, components that maximize signal are identical to components that minimize error, whereas this consistency is lost if other norms are chosen. A surprising result is that the choice of initial norm can determine whether one maximizes or minimizes predictability. Section 10 also shows how to generalize singular vectors to models with stochastic forcing. This generalization is important because if the model is stochastic, the growth of initial condition error captures only part of the total forecast spread since the stochastic component also contributes to spread.

[11] Section 11 shows that the above framework includes linear stochastic models: One need only make the proper identification between stochastic model parameters and the forecast and climatological distributions. This connection allows a dynamical interpretation of predictability.
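The identification can be sketched for a stable linear stochastic model dx/dt = A x + noise: the climatological covariance is the stationary solution of a Lyapunov equation, and, assuming a perfectly known initial state, the forecast covariance at lead tau is the climatological covariance minus the part the propagator carries forward from the initial time. The model below, and the names A, Q, and tau, are our own illustration.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Illustrative stable linear stochastic model  dx/dt = A x + noise,
# with noise covariance Q.
A = np.array([[-1.0, 0.5],
              [0.0, -2.0]])
Q = np.eye(2)

# Climatological covariance: stationary solution of  A S + S A^T + Q = 0.
cov_c = solve_continuous_lyapunov(A, -Q)

def forecast_cov(tau):
    """Forecast covariance at lead tau, assuming a perfectly known initial
    state: the climatological spread minus the part that the propagator
    carries forward from the initial time."""
    P = expm(A * tau)
    return cov_c - P @ cov_c @ P.T

for tau in (0.1, 1.0, 10.0):
    print(tau, np.trace(forecast_cov(tau)) / np.trace(cov_c))
# the ratio approaches 1: at long leads the forecast relaxes to the climatology
```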

[12] In section 12 we discuss the fascinating and not fully understood role of nonnormality in predictability. Generalized stability analysis tells us that singular vectors of nonnormal systems grow more strongly than those of normal systems with the same dynamical eigenvalues. One might surmise from this that nonnormality diminishes predictability since it enhances error growth. However, one could equally well surmise that nonnormality enhances predictability because it enhances signal growth. Confounding these arguments is the fact that nonnormality increases the climatological variances, so the difference between forecast and climatological variances becomes difficult to guess. The solution to this dilemma, which becomes clear only after rigorous analysis, is that nonnormality enhances predictability. The minimum predictability, for all measures of predictability (ignoring the signal term in relative entropy), occurs when the whitened dynamical operator is normal. This condition occurs when the dynamical operator and noise covariance matrix can be diagonalized simultaneously, which, in turn, is precisely the condition for detailed balance. Remarkably, the minimum value of predictability depends only on the real part of the eigenvalues of the dynamical operator. This result further implies that the predictability of normal systems is independent of the location of spectral peaks in the power spectrum. A conjecture for the upper bound of predictability is also discussed.
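The effect of nonnormality can be illustrated with two linear stochastic models that share eigenvalues and noise covariance but differ in normality; consistent with the result quoted above, the nonnormal operator yields larger predictive information at a fixed lead time. The operators and lead time are illustrative choices of our own.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

def predictive_information(A, Q, tau):
    """Predictive information at lead tau for dx/dt = A x + noise with
    noise covariance Q, assuming a perfectly known initial state."""
    cov_c = solve_continuous_lyapunov(A, -Q)
    P = expm(A * tau)
    cov_f = cov_c - P @ cov_c @ P.T
    return 0.5 * np.log(np.linalg.det(cov_c) / np.linalg.det(cov_f))

Q = np.eye(2)
A_normal    = np.diag([-1.0, -2.0])          # normal operator
A_nonnormal = np.array([[-1.0, 10.0],
                        [0.0, -2.0]])        # same eigenvalues, nonnormal

print(predictive_information(A_normal, Q, 1.0))     # about 0.082
print(predictive_information(A_nonnormal, Q, 1.0))  # about 0.397, i.e., larger
```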

[13] A fundamental limitation of the above framework is that the forecast distribution of the climate system is unknown and must be estimated from finite samples. This review concludes with a discussion of some promising techniques for dealing with finite samples.