2.1. Motivation for Singular Spectrum Analysis (SSA)
[31] SSA is designed to extract information from short and noisy time series and thus provide insight into the unknown or only partially known dynamics of the underlying system that generated the series [Broomhead and King, 1986a; Fraedrich, 1986; Vautard and Ghil, 1989]. We outline here the method for univariate time series and generalize for multivariate ones in section 4.2.
[32] The analogies between SSA and spatial EOFs are summarized in Appendix A, along with the basis of both in the Karhunen–Loève theory of random fields and of stationary random processes. Multichannel SSA (see section 4.2) is numerically analogous to the extended EOF (EEOF) algorithm of Weare and Nasstrom [1982]. The two different names arise from the origin of the former in the dynamical systems analysis of univariate time series, while the latter had its origins in the principal component analysis of meteorological fields. The two approaches lead to different methods for the choice of key parameters, such as the fixed or variable window width, and hence to differences in the way of interpreting results.
[33] The starting point of SSA is to embed a time series {X(t):t = 1,…,N} in a vector space of dimension M, i.e., to represent it as a trajectory in the phase space of the hypothetical system that generated {X(t)}. In concrete terms this is equivalent to representing the behavior of the system by a succession of overlapping “views” of the series through a sliding M-point window.
[34] Let us assume, for the moment, that X(t) is an observable function of a noise-free system's dependent variables X_{i}(t), as defined in (4), and that the function ϕ that maps the p variables {X_{i}(t):i = 1,…,p} into the single variable X(t) has certain properties that make it generic in the dynamical systems sense of Smale [1967]. Assume, moreover, that M > 2d + 1, where d is the dimension of the underlying attractor on which the system evolves, and that d is known and finite. If so, then the representation of the system in the “delay coordinates” described in (5) below will share key topological properties with a representation in any coordinate system. This is a consequence of Whitney's [1936] embedding lemma and indicates the potential value of SSA in the qualitative analysis of the dynamics of nonlinear systems [Broomhead and King, 1986a, 1986b; Sauer et al., 1991]. The quantitative interpretation of SSA results in terms of attractor dimensions is fraught with difficulties, however, as pointed out by a number of authors [Broomhead et al., 1987; Vautard and Ghil, 1989; Palus and Dvorák, 1992].
[35] We therefore use SSA here mainly (1) for data-adaptive signal-to-noise (S/N) enhancement and associated data compression and (2) to find the attractor's skeleton, given by its least unstable limit cycles (see again Figure 1 above). The embedding procedure applied to do so constructs a sequence {X̃(t)} of M-dimensional vectors from the original time series X, by using lagged copies of the scalar data {X(t):1 ≤ t ≤ N},

X̃(t) = (X(t), X(t + 1), …, X(t + M − 1));    (5)

the vectors X̃(t) are indexed by t = 1,…,N′, where N′ = N − M + 1.
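In algorithmic terms, the embedding step amounts to stacking lagged copies of the series as the rows of a matrix. The following Python/NumPy sketch illustrates this; the function name and the toy series are ours, chosen purely for illustration:

```python
import numpy as np

def embed(x, M):
    """Build the N' x M trajectory of lagged copies of x.

    Row t holds the augmented vector (x[t], x[t+1], ..., x[t+M-1]),
    so there are N' = N - M + 1 rows.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    N_prime = N - M + 1
    return np.array([x[t:t + M] for t in range(N_prime)])

# Toy example: N = 6 points and a window M = 3 give N' = 4 overlapping views.
X = embed([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], M=3)
```

Each row is one “view” of the series through the sliding M-point window mentioned above.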
[36] SSA allows one to unravel the information embedded in the delay-coordinate phase space by decomposing the sequence of augmented vectors thus obtained into elementary patterns of behavior. It does so by providing data-adaptive filters that help separate the time series into components that are statistically independent, at zero lag, in the augmented vector space of interest. These components can be classified essentially into trends, oscillatory patterns, and noise. As we shall see, it is an important feature of SSA that the trends need not be linear and that the oscillations can be amplitude and phase modulated.
[37] SSA has been applied extensively to the study of climate variability, as well as to other areas in the physical and life sciences. The climatic applications include the analysis of paleoclimatic time series [Vautard and Ghil, 1989; Yiou et al., 1994, 1995], interdecadal climate variability [Ghil and Vautard, 1991; Allen and Smith, 1994; Plaut et al., 1995; Robertson and Mechoso, 1998], as well as interannual [Rasmusson et al., 1990; Keppenne and Ghil, 1992] and intraseasonal [Ghil and Mo, 1991a, 1991b] oscillations. SSA algorithms and their properties have been investigated further by Penland et al. [1991], Allen [1992], Vautard et al. [1992], and Yiou et al. [2000]. The SSA Toolkit first documented by Dettinger et al. [1995a] was built, largely but not exclusively, around this technique.
2.2. Decomposition and Reconstruction
[38] In this section we illustrate the fundamental SSA formulae with the classical example of a climatic time series, the Southern Oscillation Index (SOI). SOI is a climatic index connected with the recurring El Niño conditions in the tropical Pacific. It is defined usually as the difference between the monthly means of the sea level pressures at Tahiti and at Darwin (Australia). We use this definition and the monthly data given in the archive http://tao.atmos.washington.edu/pacs/additional_analyses/soi.html.
[39] The SOI data in this archive are based on the time series at each of the two stations being deseasonalized and normalized [Ropelewski and Jones, 1987]. The seasonal cycle is removed by subtracting the average of the values for each calendar month over a reference interval, in this case 1951–1980. The residues from this operation, called monthly anomalies in the climatological literature, are then normalized with respect to the standard deviation computed over the same interval.
[40] The archived SOI data for 1866–1997 are from the Climate Research Unit of the University of East Anglia, and those for 1998–1999 are from the Climate Prediction Center of the U.S. National Centers for Environmental Prediction (NCEP). They are obtained by taking the difference between the anomalies at Tahiti and those at Darwin and dividing by the standard deviation of this difference over the same 30-year reference interval. The time interval we consider here goes from January 1942 to June 1999, during which no observations are missing at either station; this yields N = 690 raw data points. Note that this raw SOI time series is centered and normalized over the reference interval 1951–1980, but not over the entire interval of interest. We show in Figure 2 the SOI obtained from this raw data set. It actually has mean −0.0761 and standard deviation equal to 1.0677. All subsequent figures, however, use a version of the series that has been correctly centered over the interval January 1942 to June 1999.
[41] SSA is based on calculating the principal directions of extension of the sequence of augmented vectors {X̃(t):t = 1,…,N′} in phase space. The M × M covariance matrix C_{X} can be estimated directly from the data as a Toeplitz matrix with constant diagonals; that is, its entries c_{ij} depend only on the lag i − j [cf. Vautard and Ghil, 1989]:

c_{ij} = (1/(N − |i − j|)) Σ_{t=1}^{N−|i−j|} X(t) X(t + |i − j|).    (6)
The eigenelements {(λ_{k}, ρ_{k}):k = 1,…,M} of C_{X} are then obtained by solving

C_{X} ρ_{k} = λ_{k} ρ_{k}.    (7)
The eigenvalue λ_{k} equals the partial variance in the direction ρ_{k}, and the sum of the λ_{k}, i.e., the trace of C_{X}, gives the total variance of the original time series X(t).
[42] An equivalent formulation of (7), which will prove useful further on, is given by forming the M × M matrix E_{X} that has the eigenvectors ρ_{k} as its columns and the diagonal matrix Λ_{X} whose elements are the eigenvalues λ_{k}, in decreasing order:

Λ_{X} = E_{X}^{t} C_{X} E_{X};    (8)
here E_{X}^{t} is the transpose of E_{X}. Each eigenvalue λ_{k} gives the degree of extension, and hence the variance, of the time series in the direction of the orthogonal eigenvector ρ_{k}.
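A minimal numerical sketch of the Toeplitz estimate (6) and of the eigendecomposition just described; the function name and normalization convention are illustrative, and production SSA codes differ in details such as centering and lag weighting:

```python
import numpy as np

def toeplitz_covariance(x, M):
    """Toeplitz estimate of the M x M lag-covariance matrix: each entry
    depends only on the lag |i - j|, with each lagged product normalized
    by the number of available terms."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    c = np.array([np.dot(x[:N - k], x[k:]) / (N - k) for k in range(M)])
    idx = np.arange(M)
    return c[np.abs(idx[:, None] - idx[None, :])]

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
x -= x.mean()                      # center the series first
C = toeplitz_covariance(x, M=30)

# Solve the symmetric eigenproblem C rho_k = lambda_k rho_k;
# eigh returns ascending eigenvalues, so flip to decreasing order.
lam, E = np.linalg.eigh(C)
lam, E = lam[::-1], E[:, ::-1]
```

The trace of C equals the sum of the partial variances λ_{k}, as stated in the text.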
[43] A slightly different approach to computing the eigenelements of C_{X} was originally proposed by Broomhead and King [1986a]. They constructed the N′ × M trajectory matrix D that has the N′ augmented vectors X̃(t) as its rows and used singular-value decomposition (SVD) [see, for instance, Golub and Van Loan, 1996] of

C_{X} = (1/N′) D^{t} D    (9)
to obtain the square roots of λ_{k}. The latter are called the singular values of D and have given SSA its name.
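The equivalence of the two routes can be verified numerically: the singular values of D, squared and divided by N′, reproduce the eigenvalues of the covariance estimate C_{X} = D^{t}D/N′. A brief sketch, with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200)
M = 20
N_prime = len(x) - M + 1
D = np.array([x[t:t + M] for t in range(N_prime)])  # trajectory matrix

# Broomhead-King estimate of the lag-covariance matrix.
C = D.T @ D / N_prime

# The singular values s_k of D satisfy s_k**2 / N' = lambda_k,
# the eigenvalues of C, both taken in decreasing order.
s = np.linalg.svd(D, compute_uv=False)
lam = np.sort(np.linalg.eigvalsh(C))[::-1]
```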
[44] Allen and Smith [1996] and Ghil and Taricco [1997] have discussed the similarities and differences between the approaches of Broomhead and King [1986a] and Vautard and Ghil [1989] in computing the eigenelements associated in SSA with a given time series X(t). Both the Toeplitz estimate (6) and the SVD estimate (9) lead to a symmetric covariance matrix C_{X}. In addition, the eigenvectors ρ_{k} of a Toeplitz matrix are necessarily odd and even, like the sines and cosines of classical Fourier analysis. The Toeplitz approach has the advantage of better noise reduction when applied to short time series, as compared with the SVD approach. This advantage comes at the price of a slightly larger bias when the time series is strongly nonstationary over the interval of observation 1 ≤ t ≤ N [Allen, 1992]. Such bias-versus-variance tradeoffs are common in estimation problems.
[45] To obtain S/N separation, one plots the eigenvalue spectrum illustrated in Figure 3. In this plot an initial plateau that contains most of the signal is separated by a steep slope from the noise; the latter is characterized by much lower values that form a flat floor or a mild slope [Kumaresan and Tufts, 1980; Pike et al., 1984; Vautard and Ghil, 1989].
[46] As the M × M matrix C_{X} is symmetric, standard algorithms [Press et al., 1988] will perform its spectral decomposition efficiently, as long as M is not too large. The choice of M is based on a tradeoff between two considerations: quantity of information extracted versus the degree of statistical confidence in that information. The former requires as wide a window as possible, i.e., a large M, while the latter requires as many repetitions of the features of interest as possible, i.e., as large a ratio N/M as possible. The choice of M = 60 in Figure 3 allows us to capture periodicities as long as 5 years, since Δt = 1 month, and thus the dimensional window width is MΔt = 5 years; on the other hand, N/M ≅ 11 is fairly safe, and the diagonalization of C_{X} for this moderate value of M does not introduce large numerical errors either.
[47] In Figure 3, there is a clear grouping of the first five eigenvalues, followed by a very steep slope of three additional eigenvalues. The latter are well separated from the first five, as well as from the remaining 52 eigenvalues, which form the mildly sloping and flattening out “tail” of the SSA spectrum.
[48] The S/N separation obtained by merely inspecting the slope break in a “scree diagram” of eigenvalues λ_{k} or singular values λ_{k}^{1/2} versus k works well when the intrinsic noise that perturbs the underlying deterministic system and the extrinsic noise that affects the observations are both white, i.e., uncorrelated from one time step to the next (see definition of ξ(t) in equation (1)). This rudimentary separation works less well when either noise is red, i.e., when it is given by an AR(1) process (see section 3.3) or is otherwise correlated between time steps [Vautard and Ghil, 1989]. The difficulties that arise with correlated noise led Allen [1992] and Allen and Smith [1994] to develop Monte Carlo SSA (see section 2.3).
[49] When the noise properties can be estimated reliably from the available data, the application of a so-called “prewhitening operator” can significantly enhance the signal separation capabilities of SSA [Allen and Smith, 1997]. The idea is to preprocess the time series itself or, equivalently but often more efficiently, the lag covariance matrix C_{X}, such that the noise becomes uncorrelated in this new representation. SSA is then performed on the transformed data or covariance matrix and the results are transformed back to the original representation for inspection.
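One simple way to realize such a prewhitening operator, sketched below under the assumption that an estimate C_n of the noise lag covariance is available, is to transform C_{X} with the inverse symmetric square root of C_n; the function name is ours and the treatment is deliberately schematic:

```python
import numpy as np

def prewhiten(C_x, C_n):
    """Return W @ C_x @ W with W = C_n^(-1/2): in the new representation
    the assumed noise covariance becomes the identity, i.e., white."""
    vals, vecs = np.linalg.eigh(C_n)
    W = vecs @ np.diag(vals**-0.5) @ vecs.T   # symmetric inverse square root
    return W @ C_x @ W, W

# Sanity check: prewhitening the noise covariance itself gives the identity.
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
C_n = A @ A.T + 5.0 * np.eye(5)    # a positive-definite "noise" covariance
C_w, W = prewhiten(C_n, C_n)
```

SSA would then be applied to the transformed matrix, and the resulting EOFs mapped back with the inverse of W.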
[50] By analogy with the meteorological literature, the eigenvectors ρ_{k} of the lag covariance matrix C_{X} have been called empirical orthogonal functions (EOFs) [see Preisendorfer, 1988, and references therein] by Fraedrich [1986] and by Vautard and Ghil [1989]. The EOFs corresponding to the first five eigenvalues are shown in Figure 4. Note that the two EOFs in each one of the two leading pairs, i.e., EOFs 1 and 2 (Figure 4a) as well as EOFs 3 and 4 (Figure 4b), are in quadrature and that each pair of EOFs corresponds in Figure 3 to a pair of eigenvalues that are approximately equal and whose error bars overlap. Vautard and Ghil [1989] argued that subject to certain statistical significance tests discussed further below, such pairs correspond to the nonlinear counterpart of a sinecosine pair in the standard Fourier analysis of linear problems.
[51] In the terminology of section 1 here, such a pair gives a handy representation of a ghost limit cycle. The advantage over sines and cosines is that the EOFs obtained from SSA are not necessarily harmonic functions and, being data adaptive, can capture highly anharmonic oscillation shapes. Indeed, relaxation oscillations [Van der Pol, 1940] and other types of nonlinear oscillations [Stoker, 1950], albeit purely periodic, are usually not sinusoidal; that is, they are anharmonic. Such nonlinear oscillations often require therefore the use of many harmonics or subharmonics of the fundamental period when carrying out classical Fourier analysis, while a single pair of SSA eigenmodes might suffice. Capturing the shape of an anharmonic oscillation, such as a seesaw or boxcar, albeit slightly rounded or smoothed, is easiest when the SSA window is exactly equal to the single period being analyzed.
[52] Projecting the time series onto each EOF yields the corresponding principal components (PCs) A_{k}:

A_{k}(t) = Σ_{j=1}^{M} X(t + j − 1) ρ_{k}(j).    (10)
Figure 5 shows the variations of the five leading PCs. Again, the two PCs in each of the pairs (1, 2) and (3, 4) are in quadrature, two by two (see Figures 5a and 5b). They strongly suggest periodic variability at two different periods, of about 4 and 2 years, respectively. Substantial amplitude modulation at both periodicities is present, too.
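The projection defining the PCs is a single matrix product of the trajectory matrix with the EOF basis. A sketch, with illustrative names and a random orthonormal basis standing in for actual EOFs:

```python
import numpy as np

def principal_components(x, E):
    """A_k(t) = sum over j of X(t + j - 1) * rho_k(j): project each
    M-point view of the series onto the EOFs (columns of E)."""
    x = np.asarray(x, dtype=float)
    M = E.shape[0]
    N_prime = len(x) - M + 1
    D = np.array([x[t:t + M] for t in range(N_prime)])
    return D @ E   # N' x M array, one column of length N' per PC

# With an orthonormal basis E the projection is lossless:
# the trajectory matrix can be recovered as A @ E.T.
rng = np.random.default_rng(3)
x = rng.standard_normal(100)
M = 10
B = rng.standard_normal((M, M))
_, E = np.linalg.eigh(B + B.T)     # any orthonormal M x M basis will do here
A = principal_components(x, E)
```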
[53] The fifth PC, shown in Figure 5c, contains both a longterm, highly nonlinear trend and an oscillatory component. We shall discuss the trend of the SOI series in connection with Figures 6b and 16a further below.
[54] We can reconstruct that part of a time series that is associated with a single EOF or several by combining the associated PCs:

R_{𝒦}(t) = (1/M_{t}) Σ_{k∈𝒦} Σ_{j=L_{t}}^{U_{t}} A_{k}(t − j + 1) ρ_{k}(j);    (11)

here 𝒦 is the set of EOFs on which the reconstruction is based. The values of the normalization factor M_{t}, as well as of the lower and upper bound of summation L_{t} and U_{t}, differ between the central part of the time series and its end points [Ghil and Vautard, 1991; Vautard et al., 1992]:

M_{t} = t,  L_{t} = 1,  U_{t} = t    for 1 ≤ t ≤ M − 1;
M_{t} = M,  L_{t} = 1,  U_{t} = M    for M ≤ t ≤ N′;
M_{t} = N − t + 1,  L_{t} = t − N + M,  U_{t} = M    for N′ + 1 ≤ t ≤ N.    (12)
[55] The reconstructed components (RCs) have the property of capturing the phase of the time series in a well-defined least squares sense, so that X(t) and R_{𝒦}(t) can be superimposed on the same timescale, 1 ≤ t ≤ N. This is an advantage of the RCs over the PCs, which have length N′ = N − M + 1 and do not contain direct phase information within the window width M.
[56] No information is lost in the reconstruction process, since the sum of all individual RCs gives back the original time series. Partial reconstruction is illustrated in Figure 6 by summing the variability of PCs 1–4, associated with the two leading pairs of eigenelements; it is common to refer to such a reconstruction (equation (11)), with 𝒦 = {1, 2, 3, 4}, as RCs 1–4. The portion of the SOI variability thus reconstructed contains 43% of the total variance. It captures the quasi-oscillatory behavior isolated by these two leading pairs, with its two distinct near-periodicities.
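The reconstruction of equation (11), including the end-point normalization, reduces to averaging along the anti-diagonals of the outer products of PCs and EOFs. The sketch below (0-based indices and illustrative names, unlike the 1-based text) also verifies the completeness property just stated: summing the RCs over all modes returns the original series.

```python
import numpy as np

def reconstruct(A, E, K):
    """Sum the contributions A_k(t - j + 1) * rho_k(j) for k in K along
    anti-diagonals and divide by the number of contributing terms, which
    is smaller near the two end points of the series."""
    N_prime, M = A.shape[0], E.shape[0]
    N = N_prime + M - 1
    total = np.zeros(N)
    count = np.zeros(N)
    for k in K:
        for t in range(N_prime):
            total[t:t + M] += A[t, k] * E[:, k]
    for t in range(N_prime):
        count[t:t + M] += 1.0   # M in the interior, fewer near the ends
    return total / count

# Completeness check: summing the RCs over all M modes returns the series.
rng = np.random.default_rng(4)
x = rng.standard_normal(80)
M = 8
D = np.array([x[t:t + M] for t in range(len(x) - M + 1)])
_, E = np.linalg.eigh(D.T @ D / D.shape[0])
A = D @ E
x_rec = reconstruct(A, E, K=range(M))
```

Because the normalization does not depend on the size of 𝒦, partial reconstructions over disjoint index sets add up to the full series.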
[57] It is clear that the partial SOI reconstruction (bold curve in Figure 6a) is smooth and represents the essential part of the interannual variability in the monthly SOI data (thin curve). Each of the two pairs of RCs, 1–2 and 3–4, can be thought of as retracing a ghost limit cycle in the phase space of the tropical climate system. These two limit cycles can then be said to form the robust skeleton of the attractor. It is unlikely that a time series of a few hundred points, like the SOI or other typical climatic time series, will suffice to capture the attractor's fine structure [Vautard and Ghil, 1989].
[58] This robust skeleton, however, provides sufficient information for most practical purposes. In particular, warm events (El Niños) and cold ones (La Niñas) over the eastern tropical Pacific are captured quite well, during the 57.5 years of record, as minima and maxima of the partially reconstructed SOI. We check this statement by comparing the bold curve in Figure 6a with the vertical arrows along the figure's upper and lower abscissae.
[59] These arrows correspond to strong (bold arrows) or moderate (light arrows) El Niño–Southern Oscillation (ENSO) events. The events are defined subjectively, calendar quarter by calendar quarter, from reanalyzed surface temperature data produced at NCEP and the U.K. Met. Office for the years 1950–1999 (see http://www.cpc.noaa.gov/products/analysis_monitoring/ensostuff/ensoyears.html).
[60] Large positive peaks in the partial SOI reconstruction, i.e., those that exceed one standard deviation, match the strong La Niñas quite well (downward pointing arrows on the upper abscissa). The only exceptions are the 1950 and 1971 cold events, which were of moderate strength, and the weak 1996 La Niña.
[61] The same good match obtains between the large negative peaks in the figure's bold curve, i.e., those that exceed one standard deviation, and the strong El Niños (upward pointing arrows on the lower abscissa). The only notable exception is the large peak in 1977–1978, which was classified subjectively as a weak warm event. The 1957–1958 and 1991–1992 events appear as moderate-size minima in the partial SOI reconstruction. They are included in the NCEP list as strong El Niños for one (January–March 1958) or two (January–June 1992) seasons, but neither was strong during the second half of the calendar year. Thus the only discrepancies between the oscillatory part of the SOI, based on RCs 1–4, and the subjective NCEP classification are in the intensities (moderate versus strong, or vice versa) of a few events.
[62] Earlier SSA results do support the present emphasis on the doubly periodic character of ENSO phenomena. They include the analyses of Rasmusson et al. [1990] for sea surface temperatures and near-surface zonal winds in the tropical Indo-Pacific belt, those of Keppenne and Ghil [1992] for a slightly different treatment of the SOI, as well as those of Jiang et al. [1995a] for sea surface temperatures and of Unal and Ghil [1995] for sea level heights in the tropical Pacific. In all these data sets and SSA analyses, a quasi-biennial and a lower-frequency, quasi-quadrennial oscillatory pair were reliably identified among the leading SSA eigenelements.
[63] Shown in Figure 6b is also a filtered version of RC5, which captures well the small but significant long-term trend of the SOI time series in Figure 2. To eliminate the oscillatory component apparent in PC5 (Figure 5c), we applied SSA with the same 60-month window to the full RC5. The two leading eigenmodes correspond to a pure trend, shown in Figure 6b, while the second eigenpair corresponds to a quasi-biennial oscillation (not shown). The SOI trend in Figure 6b agrees, up to a point, with the one captured by the multitaper reconstruction in section 3.4.2 (see Figure 16a there). Given the recent interest in the interdecadal variability of ENSO, we postpone further discussion of this result for the moment when its multitaper version is also in hand.
[64] Reliable S/N separation and identification of oscillatory pairs is not always as easy as in the case of interannual climate variability in the tropical Pacific. Global surface-air temperatures, for instance, present a considerably more difficult challenge for identifying interannual and interdecadal oscillations. Elsner and Tsonis's [1991] excessive reliance on eigenvalue rank order as a criterion of significance in SSA has led to considerable confusion in this case [see Allen et al., 1992a, 1992b].
[65] Reliable identification of the true signal conveyed by a short, noisy time series and of the oscillatory components within this signal requires effective criteria for statistical significance, which are treated in the next section. Subject to these caveats, a clean signal, obtained by partial reconstruction over the correct set of indices 𝒦, provides very useful information on the underlying system, which is often poorly or incompletely known.
[66] Such a signal can then be analyzed further, both visually and by using other spectral analysis tools that are described in section 3. The maximum entropy method (MEM), which we describe in section 3.3, works particularly well on signals so enhanced by SSA [Penland et al., 1991].
2.3. Monte Carlo SSA
[67] In the process of developing a methodology for applying SSA to climatic time series, a number of heuristic [Vautard and Ghil, 1989; Ghil and Mo, 1991a; Unal and Ghil, 1995] or Monte Carlo [Ghil and Vautard, 1991; Vautard et al., 1992] methods have been devised for S/N separation or the reliable identification of oscillatory pairs of eigenelements. They are all essentially attempts to discriminate between the significant signal as a whole, or individual pairs, and white noise, which has a flat spectrum. A more stringent “null hypothesis” [Allen, 1992] is that of red noise, since most climatic and other geophysical time series tend to have larger power at lower frequencies [Hasselmann, 1976; Mitchell, 1976; Ghil and Childress, 1987].
[68] For definiteness, we shall use here the term red noise exclusively in its narrow sense, of an AR(1) process given by (1) with M = 1 and 0 < a_{1} < 1, as required by weak or wide-sense stationarity (see Appendix A for an exact definition). Other stochastic processes that have a continuous spectral density S(f) which decreases monotonically with frequency f will be called “warm colored.”
[69] The power spectrum S(f) of the AR(1) process is given by [e.g., Chatfield, 1984]

S(f) = S_{0} (1 − r²)/(1 − 2r cos(πf/f_{N}) + r²).    (13)
Here 0 < S_{0} < ∞ is the average value of the power spectrum, related to the white-noise variance σ² in (1) by

S_{0} = σ²/(1 − r²),    (14)
while r is the lag-one autocorrelation, r = a_{1}, and the Nyquist frequency f_{N} = 1/(2Δt) is the highest frequency that can be resolved for the sampling rate Δt. Note that in (1) and (5) we have used Δt = 1 for simplicity and without loss of generality, since Δt can always be redefined as the time unit. It is useful at this point to recall, for clarity's sake, that it is not necessary to do so. The characteristic decay timescale τ of the AR(1) noise can be estimated by

τ = −Δt/log r.    (15)
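For concreteness, the red-noise spectrum and decay time can be evaluated numerically; the sketch below simply checks the two properties noted in the text, monotone decrease with f and an average value of S_{0} over the resolved frequencies (names and parameter values are illustrative):

```python
import numpy as np

def ar1_spectrum(f, s0, r, f_nyq):
    """S(f) = S0 (1 - r^2) / (1 - 2 r cos(pi f / f_nyq) + r^2)."""
    return s0 * (1.0 - r**2) / (1.0 - 2.0 * r * np.cos(np.pi * f / f_nyq) + r**2)

r, s0, f_nyq = 0.6, 1.0, 0.5          # Delta t = 1, so f_N = 1/2
f = np.linspace(0.0, f_nyq, 1001)
S = ar1_spectrum(f, s0, r, f_nyq)

# Trapezoidal average of S over [0, f_N]; it should recover S0.
avg = np.sum((S[:-1] + S[1:]) / 2.0) * (f[1] - f[0]) / f_nyq

# Decay timescale tau = -Delta t / log r; about 1.96 time units for r = 0.6.
tau = -1.0 / np.log(r)
```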
[70] In general, straightforward tests can be devised to compare a given time series with an idealized noise process: The continuous spectrum of such a process is known to have a particular shape, and if a particular feature of the data spectrum lies well above this theoretical noise spectrum, it is often considered to be statistically “significant.” A single realization of a noise process can, however, have a spectrum that differs greatly from the theoretical one, even when the number of data points is large. It is only the (suitably weighted) average of such sample spectra over many realizations that will tend to the theoretical spectrum of the ideal noise process. Indeed, the Fourier transform of a single realization of a rednoise process can yield arbitrarily high peaks at arbitrarily low frequencies; such peaks could be attributed, quite erroneously, to periodic components.
[71] More stringent tests therefore have to be used to establish whether a time series can be distinguished from red noise or not. Allen [1992] devised such a test that compares the statistics of simulated red-noise time series with those of a given climatic time series. The principle of this test is implicit in some of Broomhead and King's [1986a] ideas. The application of SSA in combination with this particular Monte Carlo test against red noise has become known as “Monte Carlo SSA” (MC-SSA) [see Allen and Smith, 1994, 1996].
[72] MC-SSA can be used, more generally, to establish whether a given time series can be distinguished from other well-defined processes. We only present here, for the sake of brevity and clarity, the original test against an AR(1) process. Allen [1992] proposes, in fact, to estimate the mean X_{0} of the process at the same time as the other parameters. We therefore rewrite (1) here for the particular case at hand as

X(t) − X_{0} = a_{1}[X(t − 1) − X_{0}] + σξ(t);    (16)
here, as in (1), ξ(t) is a Gaussian-distributed white-noise process with zero mean and unit variance.
[73] When testing against the process (16), the first step in MC-SSA is to estimate the mean X_{0} and the coefficients a_{1} and σ from the time series X(t) by using a maximum likelihood criterion. Allen and Smith [1996] provide low-bias estimators that are asymptotically unbiased in the limit of large N and close to unbiased for series whose length N is at least an order of magnitude longer than the decorrelation time τ = −1/log r. Cochrane and Orcutt [1949] have shown that when τ is not very small relative to N, the use of a crude estimate for the mean X_{0}, i.e., “centering” the time series first, can lead to severe biases in the subsequent estimation of a_{1}. This is not the case for the SOI time series used here, as τ ≪ N for it. Hence we have used an SOI time series, based on the data in Figure 2, that has been centered.
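A deliberately crude sketch of this parameter estimation step, using simple moment estimators rather than the low-bias maximum likelihood estimators of Allen and Smith [1996]; all names are ours:

```python
import numpy as np

def ar1_fit(x):
    """Crude moment estimates of an AR(1) model: mean X0, lag-one
    autocorrelation a1, and white-noise amplitude sigma."""
    x = np.asarray(x, dtype=float)
    x0 = x.mean()
    d = x - x0
    a1 = np.sum(d[:-1] * d[1:]) / np.sum(d[:-1]**2)       # lag-one regression
    sigma = np.sqrt(np.mean((d[1:] - a1 * d[:-1])**2))    # residual amplitude
    return x0, a1, sigma

# Recover the known parameters of a long synthetic red-noise series.
rng = np.random.default_rng(5)
a1_true, sigma_true, x0_true = 0.7, 1.0, 2.0
n = 20000
x = np.empty(n)
x[0] = x0_true
for t in range(1, n):
    x[t] = x0_true + a1_true * (x[t - 1] - x0_true) + sigma_true * rng.standard_normal()
x0_hat, a1_hat, sigma_hat = ar1_fit(x)
```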
[74] On the basis of the estimated values X̂_{0}, â_{1}, and σ̂ of these parameters, an ensemble of simulated red-noise data is generated and, for each realization, a covariance matrix C_{R} is computed. In the nonlinear dynamics literature, such simulated realizations of a noise process are often called surrogate data [Drazin and King, 1992; Ott et al., 1994].
[75] The covariance matrices of the surrogate data are then projected onto the eigenvector basis E_{X} of the original data by using (8) for their SVD,

Λ_{R} = E_{X}^{t} C_{R} E_{X}.    (17)
[76] Since (17) is not the SVD of the particular realization C_{R}, the matrix Λ_{R} is not necessarily diagonal, as it is in (8). Instead, Λ_{R} measures the resemblance of a given surrogate set with the original data set of interest. The degree of resemblance can be quantified by computing the statistics of the diagonal elements of Λ_{R}. The statistical distribution of these elements, determined from the ensemble of Monte Carlo simulations, gives confidence intervals outside which a time series can be considered to be significantly different from a random realization of the process (16). For instance, if an eigenvalue λ_{k} lies outside a 90% noise percentile, then the red-noise null hypothesis for the associated EOF (and PC) can be rejected with this level of confidence. Otherwise, that particular SSA component of the time series cannot be considered as significantly different from red noise. Additional problems posed by the multiplicity of SSA eigenvalues and other finer points are also discussed by Allen [1992] and Allen and Smith [1996].
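The surrogate-projection test can be sketched as follows; the ensemble size, the white-noise null (a_1 = 0, a special case of (16)), and all names are illustrative simplifications of the actual MC-SSA procedure:

```python
import numpy as np

def mc_ssa_percentiles(E, a1, sigma, N, M, n_surr=100, seed=0):
    """For each surrogate AR(1) realization, project its covariance matrix
    C_R onto the data EOF basis E and keep the diagonal of E^t C_R E;
    return the 5th and 95th percentiles over the ensemble."""
    rng = np.random.default_rng(seed)
    diags = np.empty((n_surr, M))
    for i in range(n_surr):
        x = np.empty(N)
        x[0] = sigma / np.sqrt(1.0 - a1**2) * rng.standard_normal()
        for t in range(1, N):
            x[t] = a1 * x[t - 1] + sigma * rng.standard_normal()
        D = np.array([x[t:t + M] for t in range(N - M + 1)])
        C_r = D.T @ D / D.shape[0]
        diags[i] = np.diag(E.T @ C_r @ E)   # not diagonal in general
    return np.percentile(diags, [5, 95], axis=0)

# A sinusoid buried in white noise: its eigenvalue pair should stand
# clearly above the 95th percentile of the surrogate band.
rng = np.random.default_rng(6)
N, M = 400, 20
t_idx = np.arange(N)
x = 2.0 * np.sin(2.0 * np.pi * t_idx / 10.0) + rng.standard_normal(N)
D = np.array([x[t:t + M] for t in range(N - M + 1)])
lam, E = np.linalg.eigh(D.T @ D / D.shape[0])
lam, E = lam[::-1], E[:, ::-1]
lo, hi = mc_ssa_percentiles(E, a1=0.0, sigma=1.0, N=N, M=M)
```

Eigenvalues with lam above hi in their respective EOF directions would be deemed significant against this null, as described in the text.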
[77] As the next step in the analysis of our SOI time series, we apply an MC-SSA noise test to it. In order to enhance the readability of the diagrams for the SSA spectra in the presence of MC-SSA error bars, we associate a dominant frequency with each EOF detected by SSA, as suggested by Vautard et al. [1992], and plot in Figure 7 the eigenvalues (diamonds) versus frequency, following Allen and Smith [1996].
[78] Such a plot is often easier to interpret, with respect to the MC-SSA error bars, than plotting versus the eigenvalue's rank k as in Figure 3. Care needs to be exercised, however, since the dominant-frequency estimate may be ambiguous or uncertain, due to the possible anharmonicity of the EOFs, especially for low frequencies. This is patently the case for the fifth mode in Figures 3–5. This mode appears at zero frequency in Figure 7, while we know very well that it has a quasi-biennial oscillatory component besides its capturing the SOI's nonlinear trend.
[79] The error bars shown in Figure 7 for each EOF represent 90% of the range of variance found in the state-space direction defined by that EOF in an ensemble of 1000 red-noise realizations; that is, they denote the interval between the 5th and 95th percentiles. Hence eigenvalues lying outside this interval are relatively unlikely (at the 10% level) to be due merely to the red-noise process (equation (16)) against which they are being tested.
[80] The high values in Figure 7 exhibit a significant quasi-biennial oscillation and an oscillatory component with a period of about 50 months. The low values near 1 cycle yr^{−1} are due to the fact that the seasonal cycle has been removed prior to the analysis, and the consequent suppression of power near annual-cycle periods has not been taken into account in the noise parameter estimation. Allen and Smith [1996] recommend that as far as possible, seasonal cycles should not be removed prior to the analysis, but that their presence should be taken into account explicitly in the parameter estimation [see also Jiang et al., 1995a; Unal and Ghil, 1995]. This recommendation has to be weighed against two related considerations. First, the reliable identification of a given periodicity becomes harder as the number of periodicities to be estimated increases, and second, the seasonal cycle and its physical causes are fairly well known and understood.
[81] The MC-SSA algorithm described above can be adapted to eliminate known periodic components and test the residual against noise. This adaptation can provide better insight into the dynamics captured by the data. Indeed, known periodicities, like orbital forcing on the Quaternary timescale or seasonal forcing on the intraseasonal-to-interannual one, often generate much of the variance at the lower frequencies manifest in a time series and alter the rest of the spectrum. Allen [1992] and Allen and Smith [1996] describe this refinement of MC-SSA, which consists of restricting the projections given by (17) to the EOFs that do not account for known periodic behavior.
[82] Monte Carlo simulation is a robust, flexible, and nonparametric approach to assessing the significance of individual eigenmodes in SSA. Since it can be computationally intensive, Allen and Smith [1996] suggest a much faster, albeit parametric, alternative. In an appendix they also clarify the relationship between confidence intervals from MC-SSA and earlier heuristic approaches [e.g., Vautard and Ghil, 1989; Ghil and Mo, 1991a; Unal and Ghil, 1995].
[83] MC-SSA provides, furthermore, the means of evaluating the significance of frequency separation between apparently distinct spectral peaks, without appealing to the application of additional spectral methods. The associated “bandwidth” is 1/M and is discussed further in a multichannel context in section 4.2. Given the fact that the assignment of frequency to an eigenpair is not entirely unique (see discussion of the zero-frequency component in Figure 7), we still recommend the application of other spectral methods for the detailed study of spectral content, in addition to SSA and MC-SSA.
2.4. Multiscale SSA and Wavelet Analysis
[84] Wavelet analysis has become a basic tool for the study of intermittent, complex, and self-similar signals, because it works as a mathematical microscope that can focus on a specific part of the signal to extract local structures and singularities [Strang, 1989; Meyer, 1992, 1993; Daubechies, 1992]. In climate dynamics [Meyers et al., 1993; Weng and Lau, 1994; Torrence and Compo, 1998] and in geophysics [Kumar and Foufoula-Georgiou, 1997], wavelets have been used mostly to follow changes in frequency of one or more periodic signals. While SSA follows amplitude and phase modulation of a signal easily (see section 2.2, as well as Plaut and Vautard [1994] and Moron et al. [1998]), a narrow band of frequencies that vary in time, from one line in the band to another (see section 3.1), is captured typically by a single pair of SSA eigenmodes.
[85] A wavelet transform requires the choice of an analyzing function or “mother wavelet” ψ that has general admissibility properties [Meyer, 1992; Daubechies, 1992], as well as the more specific property of time and frequency localization; that is, ψ and its Fourier transform ℱψ must decay rapidly outside a given interval. Functions ψ based on a Gaussian, ψ(x) = exp (−x^{2}), first proposed in this context by Gabor [1946], possess the localization property even though they do not satisfy the admissibility condition that their integral over the real line ℝ vanish [Delprat et al., 1992].
[86] A ψ-wavelet transform W_{ψ} in continuous time and frequency is simply a projection of a signal X(t), −∞ < t < ∞, onto b-translated and a-dilated versions of ψ:

W_{ψ}(a, b) = a^{−1/2} ∫_{−∞}^{∞} X(t) ψ((t − b)/a) dt.    (18)
If most of ψ is concentrated in the interval [−1, 1], say (up to a rescaling), then (18) is clearly an analysis of X in the interval [b − a, b + a]. Using the successive derivatives ψ^{(n)} of a given mother wavelet ψ in (18) is equivalent (up to a normalization factor) to a ψ analysis of the successive derivatives of the time series X; this is easy to see through an integration by parts.
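As a numerical illustration of the projection (18), the sketch below uses the first derivative of a Gaussian as ψ and a direct quadrature on a uniform grid; the function names are ours and the discretization is the simplest possible one.

```python
import numpy as np

def psi1(u):
    """First derivative of a Gaussian: localized, and admissible
    because its integral over the real line vanishes."""
    return -2.0 * u * np.exp(-u * u)

def wavelet_coeff(x, t, a, b):
    """W_psi(a, b): projection of the samples x(t) onto the
    b-translated, a-dilated wavelet, by direct quadrature of (18)."""
    dt = t[1] - t[0]
    return np.sum(x * psi1((t - b) / a)) * dt / np.sqrt(a)
```

For a sine wave of angular frequency ω, |W_{ψ}| peaks near the scale at which the dilated wavelet oscillates in step with the signal, and it is small when a is much too narrow or much too wide for the local oscillation.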
[87] The original signal, or a filtered version of it, can be reconstructed from the family of wavelet transforms. Hence for scale values a in an interval I, a reconstructed version X_{I} of the signal X(t) is

X_{I}(t) = A_{ψ}^{−1} ∫_{a∈I} ∫_{−∞}^{∞} W_{ψ}(a, b) a^{−1/2} ψ((t − b)/a) db (da/a^{2}).    (19)
A_{ψ} is a normalization factor which depends only on the mother wavelet ψ. This formulation is essentially a band-pass filter of X through I; if I is the positive real line, I = ℝ^{+}, then X_{I}(t) = X(t). Note that the Gaussian ψ(x) = exp(−x^{2}/2) itself cannot be used in the reconstruction formula (19), because it does not satisfy ∫_{−∞}^{∞} ψ(x) dx = 0, although its first derivative does. The forward transform of (18), however, is well defined, and the Gaussian can be used as an analyzing tool [Arneodo et al., 1993; Yiou et al., 2000].
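A discretized sketch of the analysis-plus-reconstruction pair (18)-(19) follows; the Gaussian first derivative serves as ψ, and the scale grid as well as the omission of the constant A_{ψ} are our own simplifications, so the output is proportional to, rather than equal to, X_{I}.

```python
import numpy as np

def psi1(u):
    """First derivative of a Gaussian (admissible: zero integral)."""
    return -2.0 * u * np.exp(-u * u)

def cwt(x, t, scales):
    """Forward transform (18) on a grid: W[i, j] at scale a_i, position t_j."""
    dt = t[1] - t[0]
    return np.array([(x[None, :] * psi1((t[None, :] - t[:, None]) / a)).sum(1)
                     * dt / np.sqrt(a) for a in scales])

def reconstruct(W, t, scales):
    """Band-pass reconstruction over the sampled scale interval,
    following (19) up to the overall constant A_psi."""
    dt = t[1] - t[0]
    da = np.gradient(scales)
    xr = np.zeros_like(t)
    for i, a in enumerate(scales):
        # correlate back with the same dilated wavelet, weight db da / a^2
        xr += ((W[i][:, None] * psi1((t[None, :] - t[:, None]) / a)).sum(0)
               * dt * da[i] / (np.sqrt(a) * a ** 2))
    return xr
```

When the sampled scale interval covers the scale of a periodic signal, the reconstruction reproduces the signal's shape in the interior of the record; outside that band, or near the edges, it acts as the band-pass filter described above.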
[88] A large number of wavelet bases have been introduced to satisfy the conflicting requirements of completeness, localization in both time and frequency, and orthogonality or, for nonorthogonal bases, limited redundancy. To provide an optimal multiscale decomposition of a given signal, an automatic time-varying adjustment of the mother wavelet's shape may be desirable. This could replace the current practice of searching through extensive “libraries” of mother wavelets (e.g., http://www.mathsoft.com/wavelets.html). To provide such a data-adaptive variation in basis with time, Yiou et al. [2000] have introduced multiscale SSA.
[89] Systematic comparisons between SSA, the wavelet transform, and other spectral analysis methods have been carried out by Yiou et al. [1996] and in Table 1 of Ghil and Taricco [1997]. Further analogies between certain mathematical features of SSA and wavelet analysis were mentioned by Yiou [1994]. Table 2 here summarizes the most useful mathematical parallels between the two time series analysis methods.
Table 2. Analogy Between SSA and Wavelet Analysis

Method              | SSA                                | Wavelet Transform
--------------------|------------------------------------|------------------
Analyzing function  | EOF ρ_{k}                          | mother wavelet ψ
Basic facts         | ρ_{k} eigenvectors of C_{X}        | ψ chosen a priori
Decomposition       | ∑_{t′=1}^{M} X(t + t′)ρ_{k}(t′)    | ∫ X(t)ψ((t − b)/a) dt
Scale               | W = αMΔt                           | a
Epoch               | t                                  | b
Average and trend   | ρ_{1}                              | ψ^{(0)}
Derivative          | ρ_{2}                              | ψ^{(1)}
[90] In SSA the largest scale at which the signal X is analyzed in (10) is approximately NΔt, the length of the time series, and the largest period is the window width MΔt. As a consequence, the EOFs ρ_{k} contain information from the whole time series, as in the Fourier transform.
[91] In order to define a local SSA, the SSA methodology was extended by using a time-frequency analysis within a running time window whose size W is proportional to the order M of the covariance matrix. By varying M, and thus W in proportion, one obtains a multiscale representation of the data. A local SSA is performed by sliding windows of length W ≤ NΔt, centered on times b = W/2, …, NΔt − W/2, along the time series. This method is useful when the local variability, assumed to be the sum of a trend, statistically significant variability, and noise, changes in time.
[92] A priori, the two scales W and M can vary independently, as long as W is larger than MΔt, W/(MΔt) ≥ α > 1, and α is large enough [Vautard et al., 1992]. In the wavelet transform, however, the number of oscillations of the mother wavelet is fixed and independent of the scale (width) of the analyzing wavelet. In this spirit, Yiou et al. [2000] fixed the ratio W/M = αΔt and thereby relied on the oscillation property of the EOFs to provide a fixed number of zeroes for the data-adaptive “wavelet” ρ_{k} of each local SSA analysis. They used α = 3 in most of their calculations, as well as only the one or two leading EOFs, ρ_{1} and/or ρ_{2}, on each W interval. This provides an analysis at a fixed scale W (see Table 2). Sampling a set of W values that follows a geometric sequence, for instance in powers of 2 or 3, provides a multiscale analysis very similar to the wavelet transform.
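A minimal sketch of such a local analysis at one fixed scale W may help; the naming is ours, Δt = 1 is assumed, and the window is slid one sample at a time, whereas Yiou et al. [2000] also vary W geometrically to obtain the multiscale picture.

```python
import numpy as np

def local_ssa(x, M, alpha=3):
    """Leading local EOF in each sliding window of width W = alpha * M.

    Returns, for each admissible window center b, the leading
    eigenvalue and EOF of the local lag-covariance matrix."""
    W = alpha * M
    out = []
    for b in range(W // 2, len(x) - W // 2):
        seg = x[b - W // 2: b + W // 2]
        seg = seg - seg.mean()
        K = W - M + 1
        D = np.array([seg[j: j + M] for j in range(K)])  # local trajectory matrix
        lam, E = np.linalg.eigh(D.T @ D / K)             # local lag-covariance
        out.append((b, lam[-1], E[:, -1]))               # leading local EOF
    return out
```

Repeating this for a geometric sequence of W values, and retaining one or two leading EOFs per window, yields the data-adaptive analog of a wavelet scalogram described in the text.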
[93] For a given position b and fixed W, we thus obtain local EOFs that are the direct analogs of analyzing wavelet functions. The number of EOF oscillations increases roughly with order, and the zeroes of ρ_{k+1} separate those of ρ_{k}; this emulates an important property of successive analyzing wavelets. The first EOF thus corresponds approximately to an analyzing wavelet function with a single extremum and no zero inside the window, for instance, the Gaussian or the “Mexican hat”; such a basic wavelet is denoted by ψ = ψ^{(0)} in Table 2. The second EOF has a single zero and is reminiscent of the first derivative of the Gaussian, denoted by ψ^{(1)} in Table 2, and so on. Vautard and Ghil [1989] demonstrated this oscillation property of the EOFs for red noise in continuous time, and Allen [1992] did so for the same type of noise process in discrete time. Appendix B of Yiou et al. [2000] provides a more general proof that is based on the concept of total positivity for lag-covariance matrices.
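This oscillation property is easy to verify numerically for discrete-time red noise, whose unit-variance AR(1) lag-covariance matrix is C_{ij} = a^{|i−j|}; the short script below (our own) counts sign changes of the EOFs, which play the role of wavelet order.

```python
import numpy as np

def sign_changes(v):
    """Number of sign changes along the components of v."""
    s = np.sign(v[np.abs(v) > 1e-10])
    return int(np.sum(s[:-1] != s[1:]))

M, a1 = 30, 0.7
lags = np.abs(np.subtract.outer(np.arange(M), np.arange(M)))
C = a1 ** lags                      # red-noise (AR(1)) lag-covariance matrix
lam, E = np.linalg.eigh(C)
E = E[:, np.argsort(lam)[::-1]]     # EOFs ordered by decreasing variance
zeros = [sign_changes(E[:, k]) for k in range(5)]
```

The leading EOF has no interior zero (a smoother, like ψ^{(0)}), the second has exactly one (like ψ^{(1)}), and so on, with the zeroes of ρ_{k+1} separating those of ρ_{k}, as stated above.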
[94] For each b and each EOF ρ_{k}, it is possible to obtain local PCs A_{k} and RCs R_{k} (see equations (10) and (11)). The kth local PC at time b is

A_{k}^{b}(t) = ∑_{t′=1}^{M} X(t + t′) ρ_{k}^{b}(t′),    (20)

and the corresponding RC is

R_{k}^{b}(t) = (1/M_{t}) ∑_{t′=L_{t}}^{U_{t}} A_{k}^{b}(t − t′) ρ_{k}^{b}(t′),    (21)
with b − W/2 ≤ t ≤ b + W/2. The crucial difference between this local version and global SSA is that the RCs are obtained here from local lag-covariance matrices. As b varies from W/2 to NΔt − W/2, the RCs are therefore truncated near the edges of the time series.
[95] We thus see that the local SSA method provides simultaneous “wavelet transforms” of the data by a set of analyzing wavelet functions, corresponding to the M different EOFs of the lag-covariance matrix. When W = αMΔt is small, local SSA provides a small-scale analysis of the signal with a few distinct analyzing functions, i.e., a small subset of EOFs indexed by k. This is reasonable, as there are not many possible structures at scales that approach the sampling timescale. On the other hand, at large scales, local SSA can also provide the simultaneous analysis by many different analyzing mother wavelet functions, {ρ_{k}: 1 ≤ k ≤ M}, and thus reflect the great complexity of the structures that is possible over the entire time series.
[96] The most important property of this local SSA analysis is that the analyzing functions are data adaptive. In other words, the shape of these analyzing functions is not imposed a priori, as in a wavelet analysis, but depends explicitly on the time series itself. For instance, an oscillatory behavior could be followed in a given time series by white or colored noise and then by deterministically intermittent behavior. Such changes in behavior could indicate regime transitions that the system which generates the signal underwent while under observation. If so, an analyzing wavelet that is adapted to each section of the signal helps follow such regime transitions in time.
[97] Yiou et al. [2000] performed multiscale SSA on the monthly SOI data for the years 1933–1996 (see section 2.2). The parameters were α = 3 and geometric scale increments of 2. They computed, moreover, an “instantaneous” frequency by least-squares fitting a sine wave to each local EOF of interest, as done in Monte Carlo SSA for the global EOFs. The instantaneous frequency can also be obtained from a complex wavelet transform [Delprat et al., 1992; Farge, 1992], by using information on the phase of the transform.
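The sine-fit step can be sketched as a one-parameter search over trial frequencies; the function name and the frequency grid below are our own choices, as Yiou et al. [2000] do not specify this exact discretization.

```python
import numpy as np

def instantaneous_frequency(eof, freqs):
    """Frequency (cycles per sample) whose sine/cosine pair fits the
    local EOF best in the least-squares sense."""
    t = np.arange(len(eof))
    best_f, best_r = freqs[0], np.inf
    for f in freqs:
        G = np.column_stack([np.sin(2 * np.pi * f * t),
                             np.cos(2 * np.pi * f * t)])
        coef, *_ = np.linalg.lstsq(G, eof, rcond=None)
        r = np.sum((eof - G @ coef) ** 2)   # residual of the sine fit
        if r < best_r:
            best_f, best_r = f, r
    return best_f
```

Applying this to the leading local EOF at each window center b traces the instantaneous frequency as a function of time, which is the quantity plotted in Figure 8.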
[98] The analysis of Yiou and colleagues did not reveal any evidence of self-similarity or fractality in the SOI. Instead, they found a preferred scale of variability between 3 and 5 years (not shown), which corresponds to ENSO's low-frequency mode (see section 2.2 and citations therein). The first two local EOFs are consistently paired and in phase quadrature, which shows that the nonlinear oscillation associated with this mode is robust and persists throughout the 60-odd years being examined.
[99] The computation of the instantaneous frequency in multiscale SSA allows one to detect an abrupt frequency shift of ENSO's low-frequency mode near 1960 (Figure 8). The characteristic periodicity goes from 57 months (between 1943 and 1961) to 39 months (between 1963 and 1980). A decrease in period in the early 1960s had already been observed by Moron et al. [1998] in tropical Pacific sea surface temperatures, by using multichannel (global) SSA, and by Wang and Wang [1996] in a sea level pressure record at Darwin, using wavelet methods. Mann and Park [1996b, Figure 9] also observed a “pinch out” in the amplitude of the quasi-biennial oscillation in the 1960s.
[100] Moron et al. [1998] noticed, on the one hand, a change in the low-frequency mode's periodicity in the early 1960s by using multichannel SSA (see section 4.2) with different window widths (72 ≤ M ≤ 168 months) on sea surface temperature fields for 1901–1994 (their Figure 2 and Table 4). On the other hand, these authors found that the trend of the sea surface temperatures in the tropical Pacific exhibited an increase from 1950 on (their Figure 4). They related this surface trend to a change in the parameters, such as the thermocline depth along the equator, of the coupled ocean-atmosphere oscillator responsible for ENSO [Ghil et al., 1991; Neelin et al., 1994, 1998].
[101] The frequency of a linear oscillator always changes smoothly as a function of the coefficients in the linear ODE that governs it. That is typically the case for nonlinear oscillators that depend on a single parameter as well [Stoker, 1950]. Two notable exceptions involve period doubling [Feigenbaum, 1978; Eckmann, 1981] and the Devil's staircase [Feigenbaum et al., 1982; Bak, 1986]. In both these routes to chaos, frequencies change abruptly as one parameter, in the former, or two, in the latter [Schuster, 1988], change gradually.
[102] Moron et al. [1998] hypothesized therefore that as one of the parameters of the ENSO oscillator crosses a threshold value in the early 1960s, the period of the seasonally forced climatic oscillator jumps from one step of the Devil's staircase to another. This would confirm an important aspect of recent theoretical work on ENSO's oscillatory regularity, as well as on the irregular occurrences of major warm and cold events [Chang et al., 1994; Jin et al., 1994, 1996; Tziperman et al., 1994].
[103] Yiou et al. [2000] compared the results in Figure 8 with those of Moron et al. [1998], using multichannel SSA (see section 4.2), and those of Wang and Wang [1996]. The latter authors used both the standard wavelet transform with a Morlet mother wavelet and Mallat and Zhang's [1993] waveform transform to analyze the Darwin sea level pressures for 1872–1995. The frequency jump in Figure 8 is much sharper than when using either global SSA or wavelet-type methods (not shown here). It is also sharper than the changes in SOI variability described by Torrence and Compo [1998], who used both Morlet and Mexican-hat basis functions for their analysis. Multiscale SSA can thus resolve regime transitions in the evolution of a nonlinear system more sharply than either global SSA or a wavelet analysis that uses a fixed set of basis functions.