Tracking and prediction of large-scale organized tropical convection by spectrally focused two-step space–time EOF analysis



An empirical orthogonal function (EOF) analysis in time and space is applied to extract the coherent signals of convectively coupled equatorial waves, intraseasonal oscillations, and other disturbances from unfiltered satellite outgoing long-wave radiation (OLR) anomaly data. The algorithm produces a basis of time indices for the coherent signals in selected bands of the zonal wave-number frequency domain and also generates reduced-noise versions of wave-number frequency filtered data that can be computed in real time. Multiple linear regression is applied to forecast the time indices of each wave-number frequency band, and the predicted indices are applied to reconstruct the predicted filtered OLR fields. A cross-validation analysis demonstrates that the predicted Madden–Julian oscillation (MJO) signals exhibit skill to 25 days across the global Tropics, and beyond 30 days across some of the higher latitudes of the Tropics. Copyright © 2011 Royal Meteorological Society

1. Introduction

Prediction at the sub-seasonal weather–climate interface is on the frontier of investigation in atmospheric sciences. Spectral peaks in proxies for moist deep convection in the Tropics that stand well above the power at neighbouring wave numbers and frequencies suggest the presence of coherent intraseasonal signals that might yield predictability by empirical means. The associated signals contribute substantially to the sensible weather in the tropical atmosphere and are also associated with signals in the extratropical circulation (Wallace and Gutzler, 1981; Ferranti et al., 1990; Weickmann et al., 1997; Mo and Higgins, 1998; Hendon et al., 2000; Higgins et al., 2000; Jones and Schemm, 2000; Mo, 2000; Nogues-Paegle et al., 2000; Branstator, 2002; Jones et al., 2004a, 2004b; Weickmann and Berry, 2007). The associations between these signals in the tropical atmosphere and extratropical circulations suggest that although such pronounced spectral peaks are less apparent in the midlatitudes, the portions of the midlatitude signals that are coherent with the tropical convective modes might yield a predictable intraseasonal background state for the synoptic weather.

The 30–60-day Madden–Julian oscillation (MJO: Madden and Julian, 1994; Zhang, 2005) acts along with interannual patterns such as the El Niño/Southern Oscillation (ENSO) to modulate the organization and evolution of such moist deep convection and the global atmospheric circulation (Roundy et al., 2010). Synoptic- to planetary-scale waves couple to convection and modulate its evolution on higher frequencies (Kiladis et al., 2009). These waves include convectively coupled equatorial Rossby (ER), Kelvin, mixed Rossby–gravity (MRG), and easterly waves (e.g. Kiladis et al., 2005). All of these disturbances evolve together as part of the same nonlinear system. Changes in one mode influence the background conditions felt by the others. Nevertheless, some of these wave and climate modes evolve quasi-systematically on intraseasonal time-scales (herein considered roughly 10 to 90 days) with linear signals apparently dominating their evolution. The term ‘linear’ herein implies that if a particular convective mode can be described by a basis of time series, then a future state of that mode is approximately a linear combination of the present signals in that basis. A portion of the nonlinear signal might also yield predictability. For example, nonlinear interaction with the seasonally evolving base state is associated with changes in the structure and propagation characteristics of the MJO and convectively coupled waves. When such modulation occurs in similar ways across many years, statistical models can be trained to diagnose and predict its effects.

Recently, Roundy and Schreck (2009, hereafter RS09) developed a statistical system for identifying and tracking the signals of convective modes proximate to selected pronounced spectral peaks. They calculated the leading time extended empirical orthogonal functions (EEOFs) of outgoing long-wave radiation (OLR) anomalies filtered in time and space to emphasize the signals of selected modes of organized convection. They found that by projecting unfiltered OLR anomalies onto the EEOF patterns of the filtered anomalies they could extract signals associated with the target modes. The time extensions include only the recent past (i.e. no future data are required) so that the algorithm can be applied in real time. The EEOF patterns provide general synoptic-dynamic models of the progression of each of the target modes because they include zonal, meridional and temporal degrees of freedom.

Other authors have applied EOF methods to track the MJO in real time. For example, Wheeler and Hendon (2004, hereafter WH04) combined OLR and zonal wind data averaged over 15°N to 15°S in an EOF analysis, and they labelled the resulting principal component (PC) time series the real-time multivariate MJO (RMM) indices. By averaging over latitude, WH04 eliminated the meridional degree of freedom. Omission of this degree of freedom allows the leading pair of RMM PCs to explain more of the total variance in the resulting reduced basis, but the approach sacrifices detail associated with seasonal and event-to-event variations in the structure of the MJO and its meridional propagation. Concentration of most of the variance associated with the MJO into a single pair of eigenmodes allows the RMM PCs to be plotted in a simple two-dimensional phase diagram. The phase concept allows users to sort each day when the signal is deemed present into one of eight zonal phases. Identification of the dates characterized by a given phase facilitates the calculation of composite patterns associated with that phase.
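The phase concept can be illustrated with a short Python sketch. It is not WH04's code: it assumes the commonly used convention in which the phase angle is atan2(RMM2, RMM1), the eight 45° sectors are numbered counterclockwise starting at -180°, and days with RMM amplitude below 1 are treated as having no significant MJO signal. The function names `rmm_phase` and `dates_in_phase` are hypothetical.

```python
import math
from typing import Optional, Sequence


def rmm_phase(rmm1: float, rmm2: float, min_amplitude: float = 1.0) -> Optional[int]:
    """Return the MJO phase (1-8) for one day, or None when the amplitude is small.

    Assumed convention: angle = atan2(RMM2, RMM1), with eight 45-degree sectors
    numbered counterclockwise starting at -180 degrees (phase 1).
    """
    amplitude = math.hypot(rmm1, rmm2)
    if amplitude < min_amplitude:
        return None  # weak or absent MJO signal on this day
    angle = math.degrees(math.atan2(rmm2, rmm1))  # in (-180, 180]
    return int(((angle + 180.0) % 360.0) // 45.0) + 1


def dates_in_phase(dates: Sequence, rmm1: Sequence[float], rmm2: Sequence[float],
                   phase: int) -> list:
    """Select the dates assigned to one phase, e.g. to build a composite for that phase."""
    return [d for d, a, b in zip(dates, rmm1, rmm2) if rmm_phase(a, b) == phase]


# A strong signal with RMM1 > 0 and a small positive RMM2 falls in phase 5.
print(rmm_phase(1.0, 0.5))
```

Compositing then reduces to averaging the anomaly fields on the dates returned for each phase.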

In contrast, the RS09 EEOF projection algorithm retains the meridional degree of freedom, adds temporal degrees of freedom through its time extensions, and keeps several leading eigenmodes; together these allow it to specify more completely the temporal and spatial variations of observed convectively coupled modes. Variance associated with the MJO is spread over a large number of the RS09 EEOF PCs. Thus many different combinations of PCs in the EEOF system can represent a given phase of the two-PC RMM system, each with different characteristics of meridional structure or phase speed. The EEOF PCs thus define a basis much larger than that generated by two-mode approaches. Because the variance is spread across this broader basis, a two-dimensional phase diagram based on the leading two PCs of the EEOF system does not capture enough of the variance to be as useful as the RMM phase diagram. The choice between a multi-mode and a two-mode approach to indexing the MJO therefore depends on the ultimate objective of the user.

Development of time indices to represent the MJO and convectively coupled waves allows for empirical prediction of the associated modes. Statistical models can be applied to diagnose and predict the spatial patterns associated with PCs in either the EEOF or the dual PC approaches. Jiang et al. (2008) applied multivariate linear lag regression to predict the RMM PCs and the associated seasonally varying spatial patterns. They incorporated the influence of seasonal variations in the MJO by generating regression coefficients based on OLR and wind data during each calendar month, and then they substituted the observed values of the RMM PCs into the resulting models to obtain the predicted OLR anomalies. They calculated correlations between the predicted and observed values of local OLR anomalies and found that coefficients dropped to below 0.5 after 15 days and 0.2 after 30 days. Their results demonstrated predictions that were better than most numerical weather prediction models prior to 2008 (most of which showed skill extending only to 5–7 days, e.g. Seo et al., 2005; Waliser, 2005). They further demonstrated that multiple linear regression produces better forecasts for OLR anomalies associated with the MJO than a sampling of other empirical techniques. Since Jiang et al. (2008), some authors have shown skill in numerical model forecasts of MJO signals that exceeds that of the Jiang et al. benchmark (Bechtold et al., 2008; Seo et al., 2009).

This article develops an improvement of the Roundy and Schreck (2009) EEOF technique that allows for better spatial and temporal resolution along with better-behaved PCs. Then it describes a linear regression algorithm similar to that of Jiang et al. (2008) for generating forecasts of modified EEOF PCs in the spectral band of the MJO.

2. Data and methods

2.1. EEOF approach

Interpolated OLR data (Liebmann and Smith, 1996) were obtained from the National Oceanic and Atmospheric Administration Earth System Research Laboratory (NOAA ESRL) website. OLR is a relatively good proxy for moist deep convection in the Tropics. Although other types of data have proven more effective in recent years for tracking and diagnosing the structures of convectively coupled modes, OLR data are available over a longer period, making them better suited for analysis of intraseasonal to interannual patterns. These interpolated data are used from June 1974 through 31 December 2003, except for a period of missing data from 17 March to 31 December 1978. NOAA uninterpolated OLR data, interpolated linearly in space, are used for the remainder of the analysis, from 2004 to the present. Anomalies are generated by subtracting the local mean and the seasonal cycle (represented by its first three harmonics), both estimated from the period 1974–2006. A modified version of the algorithm of RS09 is applied to generate EEOF patterns based on the filtered OLR anomalies, and the unfiltered anomalies are applied to generate the associated PCs and to reconstruct filtered data as follows:

  • 1. Filter the OLR anomalies for a selected band in the zonal wave-number frequency domain (filter bands are shown in Figure 1; 100-day low-pass filtered anomalies were also analysed). These bands are broader than those of Wheeler and Kiladis (1999) and Roundy and Frank (2004) in order to include more of the total variance, since the main objective is prediction of sensible weather associated with equatorial waves, intraseasonal oscillations, and other coherent signals in the same general vicinity of the spectrum. Broad bands also retain event-to-event variations in propagation. For example, Roundy (2008) showed that Kelvin waves tend to propagate more quickly across the Western Hemisphere than the Eastern, and that over the Eastern Hemisphere they slow down as they propagate through the local active convective phase of the MJO and speed up through the suppressed phase. As different events evolve differently, their contributions to the spectrum appear at different wave numbers and frequencies, leading to a smoother spectrum with power distributed more evenly. Filtering with broad bands retains such variations, while narrow filters like those of Wheeler and Kiladis (1999) and Roundy and Frank (2004) would exclude them. Those previous works benefited from narrow filter bands because they helped demonstrate the clear association between some observed signals and the dispersion characteristics of theoretical waves. In contrast, the core objective of the present analysis is to assess and predict coherent signals associated with more of the variance in broader neighbourhoods of the full spectrum.
  • 2. Construct a matrix X whose columns are time series of the filtered OLR anomalies on the full 2.5° grid from 30°S to 30°N, so that rows correspond to time. (The original RS09 algorithm applied a reduced grid.)
  • 3. Find the leading EOFs E of the matrix X (these are the eigenvectors of $X^{\mathrm{T}}X$ corresponding to the largest eigenvalues). Find the corresponding principal component time series (PCs) U, following
    $$U = XE. \tag{1}$$
  • 4. Construct a matrix Xpc from the matrix U of leading PC time series, and extend it by appending the same PCs at successive time lags, i.e.
    $$X_{\mathrm{pc}} = \left[\,U(t)\;\; U(t-1)\;\; U(t-2)\;\; \cdots\;\; U(t-n+1)\,\right]. \tag{2}$$
    The number of rows in the matrix Xpc is the length of the dataset, and the number of columns is the number of EOFs retained for the selected band times the number of time steps n included in the extensions.
  • 5. Calculate the leading EOFs Epc of Xpc and the corresponding PCs $U_{\mathrm{pc}} = X_{\mathrm{pc}}E_{\mathrm{pc}}$. Although the PCs U for a given band form an orthogonal basis, correlations between them occur when time lags are considered. For example, a positive signal in one eigenmode might frequently precede a positive signal in another eigenmode. Such correlations frequently relate to preferred propagation patterns in the filtered data. If one spatial EOF includes a pattern shifted to the east of another EOF, and if the corresponding process propagates eastward, then the time extensions would show that activity in the eastern pattern follows activity in the western one, with a time lag dependent on the phase speed. This EOF analysis based on time extensions of the spatial PCs thus associates spatial patterns with temporal evolution, allowing the resulting patterns to include preferred propagation characteristics of modes of tropical convection. Steps 1–5 complete the calculation of the extended EOF patterns of the filtered data. Diagnosis of these preferred propagation patterns in the filtered data allows the subsequent steps (below) to extract similar signals from unfiltered data.
  • 6. Construct a matrix Xunfil identical to X except that it contains unfiltered OLR anomalies from which the reconstructions for any lower-frequency bands have first been subtracted. For example, before the MJO band projections are calculated from the unfiltered anomalies, the 100-day low-pass projections are subtracted. Similarly, for the Kelvin band, the 100-day low-pass, MJO and ER-band projections are subtracted. Data included in the 100-day low-pass projections are smoothed with a 10-day running mean; otherwise no smoothing is applied.
  • 7. Find
    $$U_{\mathrm{unfil}} = X_{\mathrm{unfil}}E. \tag{3}$$
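  • 8. Construct the matrix
    $$X_{\mathrm{pc}}^{\mathrm{unfil}} = \left[\,U_{\mathrm{unfil}}(t)\;\; U_{\mathrm{unfil}}(t-1)\;\; \cdots\;\; U_{\mathrm{unfil}}(t-n+1)\,\right]. \tag{4}$$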
  • 9. Find
    $$U_{\mathrm{pc}}^{\mathrm{unfil}} = X_{\mathrm{pc}}^{\mathrm{unfil}}E_{\mathrm{pc}}. \tag{5}$$
    These reconstructed PCs Upcunfil are similar to Upc.
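  • 10. An approximation of the filtered data, Xrecon, applicable in real time but with reduced noise, is reconstructed by following
    $$X_{\mathrm{pc}}^{\mathrm{recon}} = U_{\mathrm{pc}}^{\mathrm{unfil}}E_{\mathrm{pc}}^{\mathrm{T}}, \tag{6}$$
    $$X_{\mathrm{recon}} = X_{\mathrm{pc},0}^{\mathrm{recon}}E^{\mathrm{T}}, \tag{7}$$
    where $X_{\mathrm{pc},0}^{\mathrm{recon}}$ denotes the zero-lag block of $X_{\mathrm{pc}}^{\mathrm{recon}}$, i.e. its first m columns, m being the number of spatial EOFs retained.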
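Steps 2–10 can be summarized in a compact NumPy sketch. It is an illustration under stated assumptions, not the author's code: the function names (`leading_eofs`, `lag_extend`, `fit_eeof`, `project`) are hypothetical, time runs along rows, EOFs are obtained from a singular value decomposition (whose right singular vectors are the eigenvectors of X^T X), rows without a complete lag history are left as NaN, and Eq. (7) is taken to use only the zero-lag block of the time-extended reconstruction.

```python
import numpy as np


def leading_eofs(X, k):
    """Leading k EOFs of X (time x space) as columns.

    The right singular vectors of X are the eigenvectors of X^T X,
    ordered by decreasing eigenvalue.
    """
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T


def lag_extend(U, n, step=1):
    """Eqs (2)/(4): columns [U(t), U(t - step), ..., U(t - (n - 1) * step)].

    Only past data are used, so the result is available in real time.
    Rows without a complete history are filled with NaN.
    """
    T, m = U.shape
    out = np.full((T, m * n), np.nan)
    for j in range(n):
        lag = j * step
        out[lag:, j * m:(j + 1) * m] = U[:T - lag]
    return out


def fit_eeof(X_filt, m, n, k, step=1):
    """Steps 2-5: spatial EOFs E, then EOFs Epc of the time-extended PCs."""
    E = leading_eofs(X_filt, m)                # step 3
    U = X_filt @ E                             # Eq (1)
    Xpc = lag_extend(U, n, step)               # Eq (2)
    complete = ~np.isnan(Xpc).any(axis=1)      # drop rows lacking a full history
    Epc = leading_eofs(Xpc[complete], k)       # step 5
    return E, Epc


def project(X_unfil, E, Epc, n, step=1):
    """Steps 6-10: project unfiltered anomalies onto the EEOF templates."""
    U_unfil = X_unfil @ E                      # Eq (3)
    Xpc_unfil = lag_extend(U_unfil, n, step)   # Eq (4)
    Upc_unfil = Xpc_unfil @ Epc                # Eq (5)
    Xpc_recon = Upc_unfil @ Epc.T              # Eq (6)
    m = E.shape[1]
    X_recon = Xpc_recon[:, :m] @ E.T           # Eq (7), zero-lag block only
    return Upc_unfil, X_recon


# Synthetic example: 400 days, 30 grid points, 5 spatial and 5 extended EOFs.
rng = np.random.default_rng(0)
X_filt = rng.standard_normal((400, 30))
X_unfil = X_filt + 0.5 * rng.standard_normal((400, 30))
E, Epc = fit_eeof(X_filt, m=5, n=10, k=5)
Upc_unfil, X_recon = project(X_unfil, E, Epc, n=10)
```

Retaining fewer extended EOFs (a smaller `k`) discards more of the variance, which is how the truncation acts as a noise filter; the `step` argument allows coarser lag spacing such as the 10-day stepping used for the low-pass band.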
Figure 1.

Boxes represent bands of the zonal wave-number frequency domain applied to filter OLR anomalies for EEOF pattern extraction. The bands are plotted on a wave-number frequency power spectrum of OLR anomalies normalized by dividing by a smoothed background spectrum, calculated as in Roundy (2008). Thin grey curves represent dispersion solutions for waves on the equatorial beta plane (Kiladis et al., 2009) for the indicated equivalent depths (h). CPD is cycles per day.

The EOF patterns E and Epc form templates of the temporal–spatial patterns of the filtered data. The algorithm projects unfiltered data onto these templates to generate time indices of the modes in each band (Upcunfil) and to reconstruct the filtered data in real time without applying a Fourier transform at the end of the dataset. The EOF analysis of the time-extended PCs distinguishes modes that propagate in different directions or with different phase speeds; EOF analysis performed only in space cannot distinguish between eastward and westward propagation if both eastward- and westward-moving disturbances have similar spatial structures. Exclusion of higher EOFs reduces noise. Compared with RS09, the algorithm also dramatically reduces the computer memory required while increasing the temporal and spatial resolution of the EEOF patterns identified in each band.

The number of time lags n varies across the filter bands to save memory. Lower-frequency modes include longer histories (a larger n) to better resolve the modes. The 2–10-day westward band is assigned 12 days, the ER and MJO bands 100 days, the Kelvin band 35 days, and the 100-day low-pass band 1000 days. With the exception of the low-pass band, these values match or exceed the longest period included in the band, so that the EOF patterns resolve at least one cycle. Results are not sensitive to small changes in n. All time lags are applied in increments of 1 day, as indicated in Eqs (2) and (4), except for the low-pass band, which uses 10-day time stepping (so that its 1000-day history spans 100 time steps). Thus, except for the 100-day low-pass band, this approach eliminates the smoothing of the unfiltered data that was necessary in RS09.

Objective determination of the appropriate number of EOFs to retain is difficult and might not improve upon subjective approaches. Several tens of EOFs are required to explain most of the variance in each band. These are taken from a potential basis of as many as 360000 time-extended EOFs (i.e. 144 zonal grid points × 25 latitudes × 100 time steps), although with reduction of the number of spatial PCs retained in step 3, the maximum possible number of time-extended EOFs Epc would range between 3800 and 20000, depending on the number of time extensions n included. In any case, those retained are a tiny fraction of the total available. Traditionally, retaining several tens of EOFs is considered bad practice (e.g. Wilks, 2006). However, retaining just a handful is insufficient to resolve the spatial structures of observed convective disturbances.

Reconstructions were made retaining different numbers of EOFs, corresponding to different fractions of the total variance, in an attempt to diagnose the best number of EOFs to retain. The objective was to retain the smallest basis of EOFs that yields a signal sufficiently representative of the high-amplitude patterns in unfiltered and filtered OLR anomalies. Retaining 20% of the variance is sufficient to explain most of the signal in the MJO and Kelvin bands over the core warm-pool regions, but 75% of the variance is close to the point beyond which adding further EOFs does not appear to markedly improve the reconstruction of occasional high-amplitude anomalies outside those regions (for example, seasonal monsoon zones and the northeast tropical Pacific basin). Substantially reducing the variance retained degrades the effective resolution of the structure and evolution of observed large-scale convective systems away from the most active convective zones. Most of the variance excluded when 75% is retained occurs in geographical regions where comparatively little activity in the selected band tends to occur.

The number of EOFs retained was thus set to explain 75% of the total variance in the spatial EOFs, and the same number of EOFs was retained following the time extensions (thus if m spatial EOFs were retained, m time-extended EOFs were retained as well). This setting provides a reasonable representation of the original filtered data, especially across the warm-pool zones of the tropical oceans (and the Pacific Ocean cold tongue for the low-pass band), while still discarding a substantial amount of noise. For the remainder of this analysis, I retain enough eigenmodes to explain 75% of the variance in each band, and I refer to the result as the ‘signal’ and the remaining 25% of the variance as ‘noise’. These percentages apply to the spatial EOFs; the numbers retained are 82, 48, 200, 122 and 46 for the ER, MJO, 2–10-day westward, Kelvin and 100-day low-pass bands, respectively. Since the process of time extension spreads variance among a larger number of PCs, this choice implies that less than 75% of the total variance in each band is retained. In general, bands with signals characterized by larger wave numbers and higher frequencies require greater numbers of EOFs to resolve the observed signals, consistent with the larger number of degrees of freedom expected in such bands. Although the number of EOFs retained is not objectively defined, the analysis below shows that the resulting signals represent patterns in unprocessed OLR anomalies well, and that prediction of the extracted MJO signals is skilful in independent data beyond one month in some regions of the Tropics.
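The variance-based truncation rule can be written as a short function. This sketch assumes that the fraction of variance is measured by the cumulative sum of the squared singular values of the (time × space) anomaly matrix, which equal the eigenvalues of X^T X; the function name `n_eofs_for_variance` is hypothetical.

```python
import numpy as np


def n_eofs_for_variance(X, fraction=0.75):
    """Smallest number of leading EOFs of X explaining at least `fraction` of the variance.

    Squared singular values of X are the eigenvalues of X^T X, i.e. the
    variance carried by each EOF (up to a constant factor).
    """
    eigenvalues = np.linalg.svd(X, compute_uv=False) ** 2
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    # First index at which the cumulative fraction reaches the threshold.
    return int(np.searchsorted(cumulative, fraction) + 1)


# Eigenvalues 4, 1, 1, 1 give cumulative fractions 0.57, 0.71, 0.86 and 1.0,
# so three EOFs are needed to reach 75%.
X = np.diag([2.0, 1.0, 1.0, 1.0])
m = n_eofs_for_variance(X, 0.75)
```

Under the scheme described above, the same count m would then also be used for the number of time-extended EOFs.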

2.2. Forecasting by multiple linear regression

This section describes an approach for predicting the projected MJO-band OLR anomalies obtained in section 2.1, step 10. The first step applies the matrix of principal components for the MJO band, Upcunfil, in a multiple linear regression model to predict itself at a time lag τ:

$$U_{\mathrm{pc}}^{\mathrm{unfil}}(t+\tau) = U_{\mathrm{pc}}^{\mathrm{unfil}}(t)\,A_{\tau}, \tag{8}$$

where Aτ contains the regression coefficients (one column of coefficients for each predicted PC). Eq. (8) is solved by least squares for Aτ using only days of the year from 45 days before to 45 days after the day of the year on which the intended forecast is made. Focusing on one time of year allows the algorithm to better forecast the evolution of patterns in the selected wave-number frequency band that evolve differently during different seasons (similar to Jiang et al. (2008), who calculated the regression coefficients in their approach by calendar month). The training period includes all available data from 27 March 1975 through 8 April 2007. That period represents 11700 days, or 2925 days for each regression calculation after restricting the training data to the seasonal window.
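The seasonally windowed regression can be sketched as follows. This is a minimal illustration under assumptions, not the author's implementation: `U` holds the PCs with time along rows, `doy` gives the day of year of each row, the ±45-day window wraps around the end of the year using a 365-day year, Eq. (8) is fitted by ordinary least squares without an intercept, and the helper names are hypothetical.

```python
import numpy as np


def seasonal_window(doy, target_doy, half_width=45, year_length=365):
    """True where the day of year lies within +/- half_width days of target_doy (circular)."""
    d = np.abs(np.asarray(doy) - target_doy) % year_length
    return np.minimum(d, year_length - d) <= half_width


def fit_lag_regression(U, doy, target_doy, tau, half_width=45):
    """Least-squares A_tau in U(t + tau) = U(t) A_tau, trained on days near target_doy."""
    t = np.arange(U.shape[0] - tau)                 # days with a verifying value at t + tau
    t = t[seasonal_window(doy[t], target_doy, half_width)]
    A_tau, *_ = np.linalg.lstsq(U[t], U[t + tau], rcond=None)
    return A_tau                                    # one column per predicted PC


def forecast(U_now, A_tau):
    """Predicted PCs tau days ahead from the current PCs."""
    return U_now @ A_tau


# Synthetic PCs that rotate 0.2 radians per day; the regression recovers the rotation.
theta = 0.2
M = np.array([[np.cos(theta), np.sin(theta)], [-np.sin(theta), np.cos(theta)]])
U = np.empty((1000, 2))
U[0] = [1.0, 0.0]
for i in range(1, 1000):
    U[i] = U[i - 1] @ M
doy = np.arange(1000) % 365 + 1
A3 = fit_lag_regression(U, doy, target_doy=100, tau=3)
```

A separate Aτ is fitted for each lead time τ and each forecast day of year; the predicted PCs would then be passed through the reconstruction of step 10.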

As in steps 9 and 10 of section 2.1, the predicted PCs are then used to reconstruct the filtered OLR anomalies corresponding to the forecast. Skill is assessed with a blind hindcast dataset generated by cross-validation. Hindcasts are made for each day of each year, with the regression models trained on data from all other years. This validation approach is complicated because the original filtered data and the EEOF patterns themselves contain temporal information (filtering in the frequency domain draws on data from after the forecast date) that might artificially enhance the apparent skill of the hindcasts. To address this problem, new EEOF patterns and PCs were calculated for each year of the dataset: the data for a given year are removed from the calculation of the EEOF patterns that are then applied for predicting signals in that year. This approach fragments the dataset, which might reduce the skill of prediction, but tests reveal that the set of structures in the basis of EEOFs changes little from year to year in the MJO band.
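The year-by-year cross-validation loop can be sketched generically. In this sketch the `fit` and `predict` callables are hypothetical placeholders for the full procedure described above (recomputing the EEOF patterns and regression coefficients without the held-out year, then hindcasting that year). Concatenating the remaining years leaves a gap in the record, which is the fragmentation mentioned in the text.

```python
import numpy as np


def leave_one_year_out(years):
    """Yield (held_out_year, train_indices, test_indices) for each year in the record."""
    years = np.asarray(years)
    for year in np.unique(years):
        yield int(year), np.flatnonzero(years != year), np.flatnonzero(years == year)


def cross_validated_hindcasts(data, years, fit, predict):
    """Fit on all other years, then hindcast each held-out year.

    fit(train_data) -> model; predict(model, test_data) -> hindcasts.
    Lagged quantities should be built within contiguous segments,
    because removing a year leaves a gap in the training record.
    """
    hindcasts = {}
    for year, train, test in leave_one_year_out(years):
        model = fit(data[train])
        hindcasts[year] = predict(model, data[test])
    return hindcasts


# Toy usage with a "climatology" model: each year's hindcast is the mean of the other years.
data = np.arange(5.0)
years = [2000, 2000, 2000, 2001, 2001]
out = cross_validated_hindcasts(data, years, fit=np.mean,
                                predict=lambda mean, x: np.full(len(x), mean))
```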

Roundy and Schreck (2009) showed that nearly every EEOF pattern in their analysis appears as one of a pair of eigenmodes in temporal and spatial quadrature, with the members of each pair explaining nearly the same amount of variance. This conclusion also applies to the revised algorithm considered here. Since removal of a single year can redistribute small amounts of variance between the eigenmodes, the order of the eigenvectors is not necessarily preserved across the calculations for different years. This cross-validation approach therefore confounds testing of the skill of prediction of individual PCs. However, the projected OLR anomalies associated with combinations of many PCs do not appear to be significantly affected, so this study assesses the skill of the forecasts through analysis of the predicted projected OLR fields. Since the number of EOFs retained is set a priori to retain 75% of the variance and the testing dataset is not applied to select predictors, artificial skill due to predictor screening (DelSole and Shukla, 2009) does not contribute to the results. Artificial skill associated with a large number of predictors also does not confound this analysis because the results are tested with independent data.

3. Results

3.1. Projected OLR signals

Sections 3.1–3.3 analyse the projected signals and their similarity to the filtered data. Figure 2 shows an example of unfiltered OLR anomalies (shading) and projected OLR signals (contours), averaged from 2.5°N to 12.5°N for January–July 1997. Blue shades suggest active convection. Solid contours represent negative projected OLR anomalies and dashed contours positive. Black represents the 100-day low-pass projections; red, the MJO band; green, the ER wave band; magenta, the Kelvin band; and cyan, the 2–10-day westward band. During the first half of the period, low-frequency convection is suppressed over the western Indian Ocean basin and enhanced over the western Pacific and Atlantic Oceans. A transition to a pattern consistent with El Niño occurred during April, with enhanced low-frequency convection indicated over the Pacific Ocean and suppressed low-frequency convective signals developing over the Maritime Continent. Red contours indicate alternating periods of enhanced and suppressed convection in the MJO band, with the strongest signals across the Indian Ocean and Maritime Continent regions. Some MJO band signals also occur over the Atlantic Ocean and Africa. Each MJO event evolves differently, but each anomaly represented by red contours appears to be associated with substantial signals in unfiltered OLR anomalies. The green contours representing equatorial Rossby waves support similar conclusions. Fast signals associated with Kelvin waves circle the globe, as indicated by the magenta contours. Extratropical wave signals can also project substantially onto signals in the Kelvin band, but extratropical waves and Kelvin waves are not independent (Straub and Kiladis, 2003). Easterly waves, mixed Rossby–gravity waves, and some tropical cyclone signals appear together in the cyan contours that represent the 2–10-day westward band. Overall, the contoured signals appear to represent the target disturbances in the unfiltered data well.
A similar diagram including only the EOFs that explain 20% of the variance in each band shows neither the negative (active convective) low-frequency signal east of the date-line nor the Western Hemisphere MJO band OLR anomalies (not shown).

Figure 2.

OLR anomalies (shading, W m−2) averaged from 2.5°N to 12.5°N for January–July 1997. Contours represent EEOF projected OLR anomalies, with solid contours negative and dashed positive. Black represents 100-day low-pass signals, and red, green, magenta and cyan represent the MJO, ER, Kelvin and 2–10-day westward bands, respectively. The contour interval is 7.5 W m−2, and the zero contour is omitted.

3.2. Spectrum analysis

A zonal wave-number frequency spectrum analysis is applied to the projected OLR anomalies in the ER, MJO, 2–10-day westward and Kelvin bands. Projected data obtained from section 2.1 step 10 are sorted into longitude–time arrays at each latitude from 15°S to 15°N. Each array is then broken into a series of 360-day segments overlapping each other by 180 days. The ends of each segment are then tapered in time by multiplication by cosine bells to reduce spectral leakage. The overlap recovers data lost to the tapering. This general approach is similar to that followed by Wheeler and Kiladis (1999), but with longer time windows. The resulting power is then normalized for plotting by dividing by the product of the number of days in each window and the number of zonal grid points. No background is removed, because noise is already reduced by excluding the EEOFs associated with the bottom 25% of the variance in each band.
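The segment-and-taper procedure can be sketched with NumPy for one latitude. Assumptions beyond the text: the cosine-bell taper is taken to be a Hann window over each 360-day segment, the input is a (time × longitude) array of daily projected anomalies, wave-number signs are flipped so that eastward propagation appears at positive wave number for positive frequency, and the function name `space_time_power` is hypothetical. Power from the individual latitudes between 15°S and 15°N would then be combined, for example by summing.

```python
import numpy as np


def space_time_power(data, seg_len=360, overlap=180):
    """Zonal wave-number frequency power of a (time, longitude) array.

    Segments of seg_len days overlapping by `overlap` days are tapered in
    time, transformed with a 2-D FFT, and their power is averaged.
    """
    n_time, n_lon = data.shape
    taper = np.hanning(seg_len)[:, None]           # assumed cosine-bell taper
    power = np.zeros((seg_len, n_lon))
    n_seg = 0
    for start in range(0, n_time - seg_len + 1, seg_len - overlap):
        segment = data[start:start + seg_len] * taper
        power += np.abs(np.fft.fft2(segment)) ** 2
        n_seg += 1
    power /= n_seg                                 # average over segments
    power /= seg_len * n_lon                       # normalization described in the text
    freq = np.fft.fftfreq(seg_len, d=1.0)          # cycles per day
    # numpy's FFT places an eastward wave (k > 0, freq > 0) at index -k, so flip the sign.
    wavenumber = -np.rint(np.fft.fftfreq(n_lon) * n_lon).astype(int)
    return freq, wavenumber, power


# Eastward wave: zonal wave number 5, period 40 days, on a 144-point (2.5-degree) grid.
t = np.arange(720)[:, None]
x = np.arange(144)[None, :]
wave = np.cos(2 * np.pi * (5 * x / 144 - t / 40))
freq, k, p = space_time_power(wave)
pos = freq > 0
i, j = np.unravel_index(np.argmax(p[pos]), p[pos].shape)
```

For the synthetic eastward wave, the positive-frequency peak sits at 1/40 cycles per day and wave number +5, which checks the sign convention.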

Figure 3 shows the resulting spectra for the ER, MJO, 2–10-day westward and Kelvin bands in panels (a)–(d), respectively. Although slight overlap in power occurs between the ER and 2–10-day westward bands and between the MJO and Kelvin bands, this analysis demonstrates that the projection algorithm separates the signals well. Any overlap is not generated by redundant signals, because the ER projections and the MJO projections are removed from the data before generating the 2–10-day westward and Kelvin projections, following step 6. Power in each band tends to concentrate in the lower frequencies of the band. Previous works have associated this concentration with a red noise background.

Figure 3.

Power spectra of projected OLR anomalies in the indicated bands. Results are assessed from the full data record.

The spectral peak of projected signals in the Kelvin band (Figure 3(d)) spreads across a broad range of wave numbers at each frequency. Previous works have suggested that the spectral peak associated with Kelvin waves is narrow in wave number relative to the peak associated with the MJO. Power extending from the core of the Kelvin peak to higher wave numbers has generally been assumed to be part of the red background. However, Figure 3(d) suggests that a portion of that power is associated with coherent signals. Thus the peak associated with Kelvin waves may in fact be broader than previously thought, and less sharply separated from the MJO peak. This analysis also suggests that substantial coherent signals occur in the ‘spectral gap’ between the MJO and Kelvin peaks, where power has previously been assumed to be part of the background. These results call into question the previous assumption that power above a smoothed background is a necessary condition for the presence of coherent signals.

Results are sensitive to the number of EOFs retained. In addition to the spectra calculated for projections including 75% of the variance, I also calculated spectra for reconstructed data including 20%, 40% and 60% of the variance in order to assess the extent to which the spectrum of background noise might contribute to Figure 3 (not shown for brevity). Traditional interpretations of the background spectrum of Wheeler and Kiladis (1999) would suggest that the spectrum should broaden outward from the spectral peaks as more EOFs, and hence more noise, are included. Instead, power begins near the lowest frequencies and wave numbers in each band and extends outward with increasing numbers of EOFs retained, with the rate of increase greatest in the regions of the strongest spectral peaks (not shown). Thus coherent signals suggested by the leading EOFs do not lie just along the dispersion curves of shallow water theory or immediately proximate to the strongest spectral peaks. Further, the spectra of data reconstructed for individual leading EOFs are broad, with power both proximate to and away from the spectral peaks. This result is not surprising because observed waves are not well characterized by sinusoidal structures. Thus much of what is widely accepted as background is actually associated with coherent signals. These results suggest that coherent signals in the bands are red, just like the noise.

3.3. Comparison of projected and wave-number frequency filtered OLR

Figure 4 shows the temporal Pearson correlation coefficients of the filtered and the projected OLR anomalies at each grid point across the Tropics. Correlation coefficients exceed 0.9 in regions of maximum OLR variance in the same wave-number frequency bands (Roundy and Frank, 2004). Correlations as low as 0.3 occur in regions of climatologically low OLR variance, such as the region of the subtropical ridge west of South America. Figure 5 shows the corresponding fraction of the local variance in the filtered OLR explained by the projected OLR. As with the correlation analysis, the fraction of the local variance of filtered OLR explained by the projected OLR is greatest in regions where OLR in the target wave-number frequency band varies most. The projected data explain less than 10% of the local variance in some geographical regions where OLR anomalies in the target bands tend to be small. These results suggest that most of the noise discarded in the projection process originates from regions with climatologically low variance. Some of this ‘noise’ might be associated with real weather events. However, consistent with RS09, these results suggest that the EEOF projection algorithm reduces some of the noise associated with filtering in the wave-number frequency domain, such as ringing of signals from regions of high activity into regions of climatologically low activity. The projected signals thus might provide a better representation of actual coherent patterns in unfiltered data than the filtered data do.
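The grid-point diagnostics of Figures 4 and 5 can be sketched as below. The Pearson correlation is standard; the definition of the fraction of variance explained, 1 - var(filtered - projected) / var(filtered), is an assumption, since the text does not state the formula. Inputs are (time × grid point) arrays, and the function names are hypothetical.

```python
import numpy as np


def local_correlation(filtered, projected):
    """Temporal Pearson correlation at each grid point (inputs: time x points)."""
    f = filtered - filtered.mean(axis=0)
    p = projected - projected.mean(axis=0)
    return (f * p).sum(axis=0) / np.sqrt((f ** 2).sum(axis=0) * (p ** 2).sum(axis=0))


def fraction_variance_explained(filtered, projected):
    """Assumed definition: 1 - var(filtered - projected) / var(filtered), per grid point."""
    return 1.0 - (filtered - projected).var(axis=0) / filtered.var(axis=0)


rng = np.random.default_rng(1)
filtered = rng.standard_normal((500, 3))
projected = 2.0 * filtered  # correct pattern, but twice the amplitude
r = local_correlation(filtered, projected)
fve = fraction_variance_explained(filtered, projected)
```

In the example, a projection with the right pattern but twice the amplitude correlates perfectly yet explains none of the variance under this definition, which is one reason the two diagnostics can differ.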

Figure 4.

Local correlation between filtered and projected OLR anomalies for the bands indicated in the panel titles. Results are assessed from the full data record.

Figure 5.

Fraction of the local variance in OLR anomalies filtered for the indicated bands of the wave-number frequency domain explained by the corresponding EEOF projected OLR anomalies. Results are assessed from the full data record.

3.4. Example hindcasts

Figure 6 shows projected OLR signals in the MJO band (shading) averaged from 7.5°S to 7.5°N, with contours representing blind hindcasts of the same anomalies at lead times of 7, 14, 21 and 28 days (panels (a)–(d), respectively). Blue shading suggests active convection, whereas forecast active convective signals are contoured in red to enhance the contrast. The 7-day forecast indicates high-amplitude signals nearly identical to the verification. Although the forecast signal degrades at longer lead times, some patterns remain well represented even at 28 days. On the other hand, some hindcast anomalies at 21- and 28-day lead times suggest outcomes opposite to the verification, especially during April–June 1988.

Figure 6.

Verification EEOF projected OLR anomalies in the MJO band (shading, with active convection suggested in blue), and the corresponding cross-validated predicted signal at lead times of 7, 14, 21 and 28 days in panels (a)–(d), respectively. All signals are averaged from 7.5°S to 7.5°N. Red contours indicate negative anomalies and blue contours positive anomalies. The contour interval is 5 W m−2, and the zero contour is omitted.

3.5. Assessment of skill

Cross-validated hindcasts were generated for projected OLR anomalies at daily lead times up to 30 days for 1974–2006. The standard deviations, root mean square (RMS) errors and correlations were calculated for the hindcasts at each lead time, in both space and time, across 30°S to 30°N and across selected geographical regions. Verification data are the OLR projections onto the EEOFs of the MJO band. For comparison, a similar assessment was prepared for OLR hindcasts calculated following Jiang et al. (2008). These hindcasts were based on regressing OLR data against the WH04 RMM PCs as they described, with a different set of regression coefficients calculated for each calendar month. Cross-validation was not applied to generate my reconstruction of the Jiang et al. results, which might slightly inflate the associated skill. For consistency, these forecasts are verified against the same EEOF projection benchmark used to assess the EEOF forecasts.
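A cross-validated regression hindcast of the PC time indices can be sketched as follows, under stated simplifications. The sketch regresses the PCs at lead τ on all PCs at the initial day (plus an intercept) and withholds one calendar year at a time. It ignores the seasonal dependence of the regression coefficients mentioned in the Conclusions, uses no other predictors, and does not exclude training samples that start in the preceding year but verify in the withheld year. The function name and arguments are hypothetical.

```python
import numpy as np


def cv_regression_hindcast(pcs, years, lead):
    """Leave-one-year-out multiple linear regression hindcast of PCs.

    pcs:   array (time, n_pc) of daily PC time indices.
    years: integer array (time,) with the calendar year of each day.
    lead:  forecast lead in days (>= 1).
    Returns an array (time, n_pc) of hindcasts valid at each day;
    the first `lead` days have no initial condition and stay NaN.
    """
    if lead < 1:
        raise ValueError("lead must be at least 1 day")
    x, y = pcs[:-lead], pcs[lead:]  # predictors at t, predictands at t + lead
    init_years = years[:-lead]
    hind = np.full(pcs.shape, np.nan)
    for yr in np.unique(init_years):
        test = init_years == yr
        train = ~test
        # Fit on all other years, with an intercept column.
        xa = np.column_stack([np.ones(train.sum()), x[train]])
        coef, *_ = np.linalg.lstsq(xa, y[train], rcond=None)
        xt = np.column_stack([np.ones(test.sum()), x[test]])
        # Store each forecast at its verification day (t + lead).
        hind[np.nonzero(test)[0] + lead] = xt @ coef
    return hind


# Toy PCs: a 45-day oscillation, which is exactly linear in its own past.
t = np.arange(730)
theta = 2 * np.pi / 45
pcs = np.column_stack([np.cos(theta * t), np.sin(theta * t)])
years = 2000 + t // 365
hind = cv_regression_hindcast(pcs, years, lead=10)
```

Because the toy oscillation is a pure rotation of the two PCs, a linear map predicts it exactly and the withheld-year hindcasts reproduce the verification; real PCs would of course not be predicted perfectly.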

The Taylor diagram (Taylor, 2001) provides a convenient assessment of forecast skill. This diagram has become popular among climate scientists for demonstrating the quality of climate model output relative to observed data. It summarizes the correlation between the forecast and verification data, the amplitude of the forecast signal and its RMS error. Figure 7 shows the Taylor diagram for hindcast MJO band OLR from 30°S to 30°N for the EEOF forecast (red) and the Jiang et al. (2008) approach (blue). The standard deviation of the observed signal is 6 W m−2, and a forecast in the lower right corner of the diagram would have perfect correlation with the data reconstructed from the EEOF PCs and zero RMS error relative to those same data. The EEOF forecasts lose correlation with lead time more quickly than they lose amplitude. Nevertheless, correlations remain significantly different from zero to the 25-day lead. RMS error never exceeds the standard deviation of the verification data at any lead time. Comparison with the results of the Jiang et al. approach shows that the EEOF projected dataset explains twice as much variance as that obtained following Jiang et al. This larger amount of variance explains most of the improvement of the EEOF approach over the Jiang et al. approach. The Jiang et al. approach shows slightly lower RMS error than the EEOF forecast after the 24-day lead. Consistent with this interpretation of Figure 7, the EEOF forecast approach shows some skill across the global Tropics approaching 25-day lead times.
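The three statistics on a Taylor diagram are tied together by a law-of-cosines relation (Taylor, 2001): the squared centred RMS difference equals the forecast variance plus the verification variance minus twice their product times the correlation. That is why one plotted point conveys all three. The sketch below pools all points in space and time, as in the 30°S to 30°N assessment, and uses the centred RMS difference that the diagram plots; whether the RMS errors quoted in the text are centred is an assumption here, and the function name is hypothetical.

```python
import numpy as np


def taylor_stats(forecast, obs):
    """Return (std_forecast, std_obs, correlation, centred RMS difference).

    Inputs may have any shape (e.g. time x lat x lon); all points are pooled.
    """
    f = np.ravel(forecast) - np.mean(forecast)
    o = np.ravel(obs) - np.mean(obs)
    sf, so = f.std(), o.std()
    r = np.mean(f * o) / (sf * so)
    rmsd = np.sqrt(np.mean((f - o) ** 2))
    return sf, so, r, rmsd


rng = np.random.default_rng(1)
obs = rng.standard_normal((50, 10))
fcst = 0.6 * obs + 0.4 * rng.standard_normal((50, 10))
sf, so, r, rmsd = taylor_stats(fcst, obs)
# Law of cosines behind the diagram's geometry.
print(np.isclose(rmsd**2, sf**2 + so**2 - 2 * sf * so * r))
```

Since anomaly fields have near-zero means, the centred and uncentred RMS differences are usually close for data like the projected OLR anomalies.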

Figure 7.

Taylor diagram representing the skill of the hindcast OLR anomalies in the MJO band. Red indicates the result for the EEOF approach, and blue represents the result from the comparable forecast of Jiang et al. (2008). Numbers of the same colour near the plotted points give the forecast lead time in days. A perfect forecast would have an RMS difference (RMSD) of 0, a correlation of 1 and a standard deviation of 6 W m−2.

Figure 8 shows a similar Taylor diagram for the EEOF approach including only India (15–25°N, 65–85°E). Correlations and error statistics include only 1 June–31 August, to assess the skill of the EEOF algorithm in predicting the Indian southwest monsoon. The 30-day forecast has a correlation of roughly 0.5 and retains roughly half the standard deviation of the verification data. These results suggest that skill in predicting MJO band signals in the Indian summer monsoon extends to at least 30 days. All correlations plotted in Figure 8 exceed the 99% significance level.

Figure 8.

Taylor diagram as in Figure 7, but correlations, standard deviations, and RMS errors are calculated only for the region 15°N to 25°N and 65°E to 85°E during June, July and August to demonstrate skill of the EEOF technique in predicting MJO band OLR anomalies during the Indian southwest monsoon.

These results suggest that correlation and error evolve differently at different geographical locations and times of the year. The skill of the forecast might also vary with the amplitude of the predicted signal. Figure 9(a) shows the correlation skill assessed at each latitude for the region between 40°E and 90°E, including only times when the forecast signal at that lead time exceeds ±1 standard deviation of the cross-validated forecast data. Correlation drops off more quickly near the Equator than at higher latitudes, with correlations remaining statistically significant poleward of 15°N or 15°S beyond 30-day lead times. Figure 9(b) shows the corresponding skill score (SS) relative to climatology (the zero anomaly):

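$$\mathrm{SS} = 1 - \frac{\sum (f - o)^2}{\sum o^2} \qquad (9)$$

where f and o denote the forecast and verification OLR anomalies and the sums are taken over the verification sample.
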
Figure 9.

(a) Pattern correlation between EEOF forecast and verification OLR anomalies in the MJO band for the region 40°E to 90°E as a function of latitude and lead time, including only times when the forecast indicates signals in excess of ± 1 SD at the same lead times in the cross-validated hindcast dataset. (b) The corresponding skill score (SS) as defined in Eq. (9). Minimum skill is −0.015. Contours are plotted every 0.05 for both panels (a) and (b).

Skill declines to zero on the Equator by the 26-day lead but remains positive farther from the Equator through 30 days. In a similar analysis of SS for times when the forecast amplitude is less than 1 standard deviation (SD), skill drops to zero within 10 days (not shown). Thus high-amplitude forecast signals can be used with greater confidence, whereas forecasts of low-amplitude signals might not be as useful.
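The amplitude-conditioned comparison above can be sketched as follows. The sketch assumes Eq. (9) takes the standard mean-square-error form relative to a zero-anomaly reference, SS = 1 − Σ(f − o)²/Σo², with f and o the forecast and verification anomalies. It applies the ±1 SD threshold point by point to a single time series at one lead time, which simplifies the regional, latitude-by-latitude assessment of Figure 9. The function names are hypothetical.

```python
import numpy as np


def skill_score(forecast, obs):
    """MSE skill score against a zero-anomaly (climatology) reference.

    1 is a perfect forecast, 0 matches climatology, negative is worse.
    """
    return 1.0 - np.sum((forecast - obs) ** 2) / np.sum(obs**2)


def skill_by_amplitude(forecast, obs, n_sd=1.0):
    """Skill for strong (|f| > n_sd * SD) and weak forecast signals.

    forecast, obs: 1-D arrays of cross-validated hindcasts and
    verification at one location and one lead time.
    """
    strong = np.abs(forecast) > n_sd * np.std(forecast)
    return (skill_score(forecast[strong], obs[strong]),
            skill_score(forecast[~strong], obs[~strong]))


rng = np.random.default_rng(2)
obs = rng.standard_normal(500)
fcst = 0.7 * obs + 0.3 * rng.standard_normal(500)
ss_strong, ss_weak = skill_by_amplitude(fcst, obs)
```

Note that a forecast of zero everywhere scores exactly 0, and a forecast with the right pattern but double the amplitude also scores 0, so this score rewards both phase and amplitude.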

Figure 9 suggests that, with increasing lead time, skill shifts smoothly from the Equator toward the Poles. This poleward migration of skill might be associated with poleward-moving signals linked with the MJO during some times of the year, such as the northern summer over Asia and the western North Pacific Ocean or the southern summer over the southwest Pacific Ocean. Thus the present state of MJO band convection yields more information about convection at high latitudes of the Tropics past week 3 than it yields about future convection near the Equator at the same time. This result suggests that the algorithm might be less effective at forecasting new events near the Equator than at forecasting the continued poleward drift of existing events.

4. Conclusions

This work describes a new algorithm for diagnosing signals from satellite OLR data associated with convectively coupled waves, intraseasonal oscillations and climate variations. Signals in selected broad bands of the zonal wave-number frequency domain are associated with space–time eigenvectors and principal components (PCs) that can be applied to dissect the patterns of convection in real time and to predict their temporal and spatial evolution. The PCs serve as time indices of signals in the selected bands. The filtered data are reconstructed from unfiltered data by projecting the unfiltered data onto the space–time eigenmodes of the filtered data. As presently applied in real time at and, the resulting reconstructed filtered fields explain 60–75% of the variance in the corresponding filtered data, and they represent the filtered signals well in the geographical regions where those signals vary most. The PCs in a given band are readily predicted by multiple linear regression, with regression coefficients calculated from data at the same time of year as the forecast.
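The projection step summarized above can be sketched as follows. The sketch assumes daily anomaly arrays of shape (time, space), a lag window of `window` days, and space–time EOFs obtained from a singular value decomposition of the time-lagged filtered data. It keeps the reconstruction on the last day of each window so that no future data are needed, as real-time use requires. The window length, the absence of any normalization, and the function names are hypothetical choices, not the settings used in this study.

```python
import numpy as np


def lagged_matrix(data, window):
    """Stack `window` consecutive days into one row.

    data: (time, space). Returns (time - window + 1, window * space);
    row j covers days j .. j + window - 1.
    """
    n_t = data.shape[0]
    return np.stack([data[j:j + window].ravel()
                     for j in range(n_t - window + 1)])


def eeof_projection(filtered, unfiltered, window, n_modes):
    """Project unfiltered anomalies onto space-time EOFs of filtered ones.

    Returns the PCs (time indices) and the reconstructed field on the last
    day of each window, shape (time - window + 1, space).
    """
    _, _, vt = np.linalg.svd(lagged_matrix(filtered, window),
                             full_matrices=False)
    eeofs = vt[:n_modes]  # orthonormal rows: leading space-time patterns
    pcs = lagged_matrix(unfiltered, window) @ eeofs.T
    recon = (pcs @ eeofs).reshape(-1, window, filtered.shape[1])
    return pcs, recon[:, -1, :]


rng = np.random.default_rng(3)
field = rng.standard_normal((60, 5))
pcs, recon = eeof_projection(field, field, window=4, n_modes=3)
```

With `n_modes` equal to the full rank of the lagged matrix, the projection returns its input unchanged; truncating to the leading modes is what discards the noise discussed in Section 3.3.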

Although the approach is applied at the same website to predict signals in all five wave-number frequency bands analysed here, for simplicity this article assesses only the skill of the forecasts for signals in the MJO band because these signals are of greatest interest. The forecast MJO band signals exhibit some skill to about 25 days when that skill is assessed across the global Tropics and more substantial skill past 30 days for strong predicted signals at high latitudes of the Tropics (such as India and over the northwest tropical Pacific Ocean). Analysis of the hindcasts of signals in the other bands will be included in a future manuscript.


The NOAA Earth System Research Laboratory graciously provided OLR data for this analysis. Funding was provided by the National Science Foundation, grant 0850642.