## 1. Introduction

Prediction at the sub-seasonal weather–climate interface is at the frontier of research in the atmospheric sciences. Spectral peaks in proxies for moist deep convection in the Tropics that rise well above the power at neighbouring frequencies and wavenumbers suggest the presence of coherent intraseasonal signals that might be predictable by empirical means. These signals contribute substantially to the sensible weather of the tropical atmosphere and are also associated with signals in the extratropical circulation (Wallace and Gutzler, 1981; Ferranti *et al.*, 1990; Weickmann *et al.*, 1997; Mo and Higgins, 1998; Hendon *et al.*, 2000; Higgins *et al.*, 2000; Jones and Schemm, 2000; Mo, 2000; Nogues-Paegle *et al.*, 2000; Branstator, 2002; Jones *et al.*, 2004a, 2004b; Weickmann and Berry, 2007). These associations between tropical and extratropical signals suggest that, although such pronounced spectral peaks are less apparent in the midlatitudes, the portions of the midlatitude signals that are coherent with the tropical convective modes might provide a predictable intraseasonal background state for synoptic weather.
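As a rough illustration of what it means for a spectral peak to stand above its neighbourhood, the Python sketch below divides a raw periodogram by a heavily smoothed "background" spectrum. The helper name `spectral_peak_ratio`, the repeated 1-2-1 smoothing and the number of smoothing passes are illustrative assumptions, not the procedure of any study cited here, and the sketch treats a single time series rather than the wavenumber–frequency spectra usually examined for tropical convection.

```python
import numpy as np


def spectral_peak_ratio(x, dt_days=1.0, n_smooth=40):
    """Return frequencies (cycles/day) and raw power divided by a smoothed background.

    Hypothetical helper: the background is estimated by applying a 1-2-1 filter
    n_smooth times along the frequency axis, which spreads out isolated peaks
    while retaining the broad shape of the spectrum.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                            # remove the time mean
    power = np.abs(np.fft.rfft(x)) ** 2         # raw periodogram
    freqs = np.fft.rfftfreq(x.size, d=dt_days)  # frequencies in cycles per day
    kernel = np.array([0.25, 0.5, 0.25])        # 1-2-1 smoother
    background = power.copy()
    for _ in range(n_smooth):
        padded = np.pad(background, 1, mode="edge")  # repeat end values at the edges
        background = np.convolve(padded, kernel, mode="valid")
    return freqs, power / background


# Synthetic example: a 45-day oscillation buried in white noise.
rng = np.random.default_rng(0)
t = np.arange(900)
series = np.sin(2 * np.pi * t / 45) + 0.5 * rng.standard_normal(t.size)
freqs, ratio = spectral_peak_ratio(series)
k = np.argmin(np.abs(freqs - 1 / 45))
print(f"period {1 / freqs[k]:.0f} days, peak-to-background ratio {ratio[k]:.1f}")
```

Ratios well above 1 mark frequencies whose power exceeds the local background; more smoothing passes broaden the background and raise the ratio at a sharp peak.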

The 30–60-day Madden–Julian oscillation (MJO: Madden and Julian, 1994; Zhang, 2005) acts together with interannual patterns such as the El Niño/Southern Oscillation (ENSO) to modulate the organization and evolution of moist deep convection and of the global atmospheric circulation (Roundy *et al.*, 2010). Synoptic- to planetary-scale waves couple to convection and modulate its evolution at higher frequencies (Kiladis *et al.*, 2009). These waves include convectively coupled equatorial Rossby (ER), Kelvin, mixed Rossby–gravity (MRG), and easterly waves (e.g. Kiladis *et al.*, 2005). All of these disturbances evolve together as part of the same nonlinear system, so changes in one mode alter the background conditions experienced by the others. Nevertheless, some of these wave and climate modes evolve quasi-systematically on intraseasonal time-scales (herein roughly 10 to 90 days), with linear signals apparently dominating their evolution. Herein, ‘linear’ means that if a particular convective mode can be described by a basis of time series, then a future state of that mode is approximately a linear combination of the present signals in that basis. A portion of the nonlinear signal might also be predictable. For example, nonlinear interaction with the seasonally evolving base state is associated with changes in the structure and propagation characteristics of the MJO and convectively coupled waves. When such modulation recurs in similar ways across many years, statistical models can be trained to diagnose and predict its effects.
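One compact way to write this definition is shown below, assuming the basis consists of $K$ time series collected in a vector $\mathbf{p}(t)$ and a lead time $\tau$. The symbols and the least-squares form of the propagator $\mathbf{A}_\tau$ are introduced here for illustration only; they are not taken from RS09 or from the later sections of this article.

$$
\mathbf{p}(t+\tau) \approx \mathbf{A}_\tau\,\mathbf{p}(t),
\qquad
\mathbf{p}(t) = \left[p_1(t), \ldots, p_K(t)\right]^{\mathsf{T}},
$$

$$
\mathbf{A}_\tau = \mathbf{C}_\tau\,\mathbf{C}_0^{-1},
\qquad
\mathbf{C}_\tau = \left\langle \mathbf{p}(t+\tau)\,\mathbf{p}(t)^{\mathsf{T}} \right\rangle,
$$

where $\langle\cdot\rangle$ denotes a time average. This $\mathbf{A}_\tau$ minimizes the mean squared error $\langle \lVert \mathbf{p}(t+\tau) - \mathbf{A}\,\mathbf{p}(t) \rVert^2 \rangle$; seasonal modulation of the kind described above could be accommodated by estimating a separate $\mathbf{A}_\tau$ for each part of the year.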

Recently, Roundy and Schreck (2009, hereafter RS09) developed a statistical system for identifying and tracking the signals of convective modes associated with selected pronounced spectral peaks. They calculated the leading time-extended empirical orthogonal functions (EEOFs) of outgoing long-wave radiation (OLR) anomalies filtered in time and space to emphasize the signals of selected modes of organized convection. They found that by projecting unfiltered OLR anomalies onto the EEOF patterns of the filtered anomalies, they could extract the signals associated with the target modes. The time extensions include only the recent past (i.e. no future data are required), so the algorithm can be applied in real time. Because they include zonal, meridional and temporal degrees of freedom, the EEOF patterns provide general synoptic-dynamic models of the progression of each target mode.
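The projection idea can be sketched in Python with NumPy. The sketch assumes that the inputs are already anomalies with the spatial (latitude × longitude) dimensions flattened into one axis; it builds each time extension from the current day and earlier days only, takes EEOF patterns from a singular value decomposition of the filtered data, and projects unfiltered anomalies onto them. The function names, the absence of any area weighting or normalization, and the use of SVD are assumptions for illustration, and RS09 may differ in these details. In the synthetic check, a noise-free travelling wave stands in for the filtered anomalies.

```python
import numpy as np


def lagged_embedding(field, n_lags):
    """Stack field(t), field(t-1), ..., field(t-n_lags+1) side by side.

    field: (n_time, n_space) anomalies. Row i of the result corresponds to
    day i + n_lags - 1 and uses only that day and earlier days, so the
    embedding needs no future data.
    """
    n_time = field.shape[0]
    blocks = [field[n_lags - 1 - lag : n_time - lag] for lag in range(n_lags)]
    return np.concatenate(blocks, axis=1)


def eeof_patterns(filtered, n_lags, n_modes):
    """Leading EEOF patterns (unit-length rows) and singular values of filtered anomalies."""
    X = lagged_embedding(filtered, n_lags)
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:n_modes], s


def project_onto_eeofs(anomalies, patterns, n_lags):
    """PCs obtained by projecting (possibly unfiltered) anomalies onto EEOF patterns."""
    return lagged_embedding(anomalies, n_lags) @ patterns.T


# Synthetic eastward-moving wave (45-day period) on 36 longitude points.
rng = np.random.default_rng(1)
t = np.arange(400)[:, None]
x = np.arange(36)[None, :]
clean = np.cos(2 * np.pi * (x / 36 - t / 45))       # stands in for filtered data
noisy = clean + 0.5 * rng.standard_normal(clean.shape)  # stands in for unfiltered data

patterns, s = eeof_patterns(clean, n_lags=5, n_modes=2)
pcs = project_onto_eeofs(noisy, patterns, n_lags=5)
```

A single travelling wave is captured by one pair of EEOFs, so the third singular value is essentially zero here; real filtered OLR spreads variance over more modes.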

Other authors have applied EOF methods to track the MJO in real time. For example, Wheeler and Hendon (2004, hereafter WH04) combined OLR and zonal wind data averaged over 15°N to 15°S in an EOF analysis, and they labelled the resulting principal component (PC) time series the real-time multivariate MJO (RMM) indices. By averaging over latitude, WH04 eliminated the meridional degree of freedom. Omission of this degree of freedom allows the leading pair of RMM PCs to explain more of the total variance in the resulting reduced basis, but the approach sacrifices detail associated with seasonal and event-to-event variations in the structure of the MJO and its meridional propagation. Concentration of most of the variance associated with the MJO into a single pair of eigenmodes allows the RMM PCs to be plotted in a simple two-dimensional phase diagram. The phase concept allows users to sort each day when the signal is deemed present into one of eight zonal phases. Identification of the dates characterized by a given phase facilitates the calculation of composite patterns associated with that phase.
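A phase assignment and phase composite of this kind can be sketched as follows. The sketch assumes the commonly used convention in which phase 1 occupies the 180°–225° sector of the (RMM1, RMM2) plane and phases advance counterclockwise in 45° sectors, and it treats days with amplitude $\sqrt{\mathrm{RMM1}^2 + \mathrm{RMM2}^2} \ge 1$ as days when the signal is present. That threshold is a conventional choice rather than one stated in this article, and the function names are hypothetical.

```python
import numpy as np


def rmm_phase(rmm1, rmm2):
    """Phase number (1-8) from the angle of (RMM1, RMM2).

    Assumes phase 1 spans 180-225 degrees and phases advance counterclockwise
    in 45-degree sectors (a commonly used convention).
    """
    angle = np.degrees(np.arctan2(rmm2, rmm1)) % 360.0
    # The final % 8 guards against floating-point remainders rounding to 360.
    sector = np.floor(((angle - 180.0) % 360.0) / 45.0).astype(int) % 8
    return sector + 1


def phase_composites(field, rmm1, rmm2, min_amplitude=1.0):
    """Mean of field (n_time, ...) over active days in each phase.

    Days with amplitude below min_amplitude (an assumed threshold) are excluded;
    phases with no qualifying days are omitted from the result.
    """
    amplitude = np.hypot(rmm1, rmm2)
    phase = rmm_phase(rmm1, rmm2)
    active = amplitude >= min_amplitude
    composites = {}
    for p in range(1, 9):
        days = active & (phase == p)
        if days.any():
            composites[p] = field[days].mean(axis=0)
    return composites
```

For example, a day with RMM1 = 1.0 and RMM2 = 0.1 lies at about 6° and falls in phase 5 under this convention.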

In contrast, by retaining the meridional degree of freedom, adding temporal degrees of freedom and keeping several leading eigenmodes, the RS09 EEOF projection algorithm specifies the temporal and spatial variations of observed convectively coupled modes more completely. Variance associated with the MJO is spread over a large number of the RS09 EEOF PCs. Thus many different combinations of EEOF PCs can represent a given phase of the dual-PC RMM system, each with different meridional structure or phase speed. The EEOF PCs therefore define a basis much larger than that generated by dual-mode approaches. Because the variance is spread across this broader basis, a two-dimensional phase diagram based on the leading two EEOF PCs does not capture enough variance to be as useful as one based on the RMM PCs. The choice between a multi-mode and a two-mode approach to indexing the MJO therefore depends on the user's ultimate objective.
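Whether a leading pair of modes carries enough variance to support a phase diagram can be checked from the singular values of the data matrix. The helper below is a hypothetical illustration, not part of RS09 or WH04: it returns the fraction of total variance captured by the leading `n_modes` EOFs of a centred (time × feature) array. Applying it to an RMM-style input and to a lagged EEOF-style input would quantify how concentrated each basis is.

```python
import numpy as np


def leading_variance_fraction(data, n_modes=2):
    """Fraction of total variance in data (n_time, n_features) explained by the
    leading n_modes EOFs, computed from singular values of the centred array."""
    X = np.asarray(data, dtype=float)
    X = X - X.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False)
    variance = s ** 2
    return variance[:n_modes].sum() / variance.sum()


rng = np.random.default_rng(2)
# Rank-2 signal plus weak noise: the leading pair should dominate.
signal = rng.standard_normal((500, 2)) @ rng.standard_normal((2, 20))
data = signal + 0.05 * rng.standard_normal(signal.shape)
print(f"leading pair explains {leading_variance_fraction(data):.3f} of the variance")
```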

Development of time indices to represent the MJO and convectively coupled waves allows for empirical prediction of the associated modes. Statistical models can be applied to diagnose and predict the spatial patterns associated with the PCs of either the EEOF or the dual-PC approach. Jiang *et al.* (2008) applied multivariate linear lag regression to predict the RMM PCs and the associated seasonally varying spatial patterns. They incorporated the influence of seasonal variations in the MJO by generating regression coefficients from OLR and wind data for each calendar month, and then substituted the observed values of the RMM PCs into the resulting models to obtain predicted OLR anomalies. They calculated correlations between the predicted and observed local OLR anomalies and found that the correlation coefficients dropped below 0.5 after 15 days and below 0.2 after 30 days. These predictions were better than those of most numerical weather prediction models before 2008, most of which showed skill extending only to 5–7 days (e.g. Seo *et al.*, 2005; Waliser, 2005). They further demonstrated that multiple linear regression produces better forecasts of OLR anomalies associated with the MJO than a sampling of other empirical techniques. Since Jiang *et al.* (2008), some authors have shown skill in numerical model forecasts of MJO signals that exceeds the Jiang *et al.* benchmark (Bechtold *et al.*, 2008; Seo *et al.*, 2009).
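A minimal sketch of month-dependent lag regression in this spirit follows. It assumes that the predictors are index PCs on day $t$, that the predictands are gridded OLR anomalies on day $t+\tau$ flattened into one spatial axis, that coefficients are fitted by ordinary least squares for each calendar month of the initial day, and that skill is the correlation over time at each grid point. These choices and all function names are illustrative assumptions and should not be read as the exact Jiang *et al.* (2008) configuration; the synthetic data use crude 30-day "months".

```python
import numpy as np


def fit_monthly_lag_regression(pcs, olr, months, lead):
    """OLS coefficients mapping PCs on day t to OLR anomalies on day t + lead.

    pcs: (n_time, n_pcs); olr: (n_time, n_space); months: (n_time,) in 1..12.
    A separate coefficient matrix (intercept row first) is fitted for each
    calendar month of the initial day t. Requires lead >= 1.
    """
    X_all, Y_all, m_all = pcs[:-lead], olr[lead:], months[:-lead]
    coefs = {}
    for m in np.unique(m_all):
        sel = m_all == m
        X = np.column_stack([np.ones(sel.sum()), X_all[sel]])  # intercept + PCs
        coefs[int(m)], *_ = np.linalg.lstsq(X, Y_all[sel], rcond=None)
    return coefs


def predict_olr(coefs, pcs_now, month_now):
    """Predicted OLR anomaly pattern at the fitted lead from today's PCs."""
    return np.concatenate([[1.0], pcs_now]) @ coefs[int(month_now)]


def local_correlation(pred, obs):
    """Correlation over time at each grid point; pred and obs are (n_time, n_space)."""
    p = pred - pred.mean(axis=0)
    o = obs - obs.mean(axis=0)
    return (p * o).sum(axis=0) / np.sqrt((p ** 2).sum(axis=0) * (o ** 2).sum(axis=0))


# Synthetic check: OLR at t + lead is an exact linear function of the PCs at t.
rng = np.random.default_rng(3)
n, lead = 730, 5
pcs = rng.standard_normal((n, 2))
B = rng.standard_normal((2, 3))
olr = np.zeros((n, 3))
olr[lead:] = pcs[:-lead] @ B
months = (np.arange(n) // 30) % 12 + 1  # crude 30-day "months" for illustration
coefs = fit_monthly_lag_regression(pcs, olr, months, lead)
pred = np.array([predict_olr(coefs, pcs[t], months[t]) for t in range(n - lead)])
skill = local_correlation(pred, olr[lead:])
```

With real data, `skill` would be computed for each lead separately, giving the kind of correlation-versus-lead curve summarized above.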

This article develops an improvement to the RS09 EEOF technique that provides better spatial and temporal resolution along with better-behaved PCs. It then describes a linear regression algorithm, similar to that of Jiang *et al.* (2008), for forecasting the modified EEOF PCs in the MJO spectral band.