Stratified statistical models of North Atlantic basin-wide and regional tropical cyclone counts


  • Michael E. Kozar,

    Corresponding author
    1. Department of Meteorology, Pennsylvania State University, University Park, Pennsylvania, USA
    2. Now at Center for Ocean-Atmospheric Prediction Studies, Florida State University, Tallahassee, Florida, USA
    • Corresponding author: M. E. Kozar, Center for Ocean-Atmospheric Prediction Studies, Florida State University, 200 RM Johnson Bldg., 2035 E. Paul Dirac Dr., Tallahassee, FL 32306-2840, USA. (

  • Michael E. Mann,

    1. Department of Meteorology, Pennsylvania State University, University Park, Pennsylvania, USA
  • Suzana J. Camargo,

    1. Lamont-Doherty Earth Observatory, Earth Institute at Columbia University, Palisades, New York, USA
  • James P. Kossin,

    1. National Climatic Data Center, NOAA, Asheville, North Carolina, USA
    2. Also at Cooperative Institute for Meteorological Satellite Studies, Madison, Wisconsin, USA
  • Jenni L. Evans

    1. Department of Meteorology, Pennsylvania State University, University Park, Pennsylvania, USA


[1] Using the historical Atlantic tropical cyclone record, this study examines the empirical relationships between climate state variables and Atlantic tropical cyclone counts. The state variables considered as predictors include indices of the El Niño/Southern Oscillation and North Atlantic Oscillation, and both “local” and “relative” measures of Main Development Region sea surface temperature. Other predictors considered include indices measuring the Atlantic Meridional Mode and the West African monsoon. Using all of the potential predictors in a forward stepwise Poisson regression, we examine the relationships between tropical cyclone counts and climate state variables. As a further extension of past studies, both basin-wide named storm counts and cluster analysis time series representing distinct flavors of tropical cyclones are modeled. A wide variety of cross-validation metrics reveal that basin-wide counts or sums over appropriately chosen clusters may be more skillfully modeled than the individual cluster series. Ultimately, the most skillful models typically share three predictors: indices for the Main Development Region sea surface temperatures, the El Niño/Southern Oscillation, and the North Atlantic Oscillation.

1. Introduction

[2] The potential origins of interannual and longer-term variability in North Atlantic tropical cyclones (TCs) have been investigated in numerous studies over the past decade. Anomalous recent levels of activity, and in particular the record-breaking 2005 Atlantic Hurricane Season, have spurred scientific interest in this topic. A number of recent studies have used statistical regression models to examine the apparent impact of climate state variables on TC activity, including recent trends.

[3] The El Niño/Southern Oscillation (ENSO) has long been known to impact Atlantic TC activity, with El Niño (La Niña) tending to diminish (enhance) seasonal TC activity [e.g., Gray, 1984]. Indices of ENSO accordingly represent a predictor commonly used in statistical modeling of Atlantic TC activity [e.g., Bove et al., 1998; Elsner, 2003; Elsner and Jagger, 2006; Mann et al., 2007]. A number of studies have also considered the role of the North Atlantic Oscillation (NAO) in Atlantic TC activity [e.g., Elsner et al., 2000a, 2000b; Elsner, 2003; Elsner and Jagger, 2006; Mann et al., 2007]; the NAO influences seasonal TC activity through its effect on large-scale storm tracks [e.g., Elsner et al., 2000a; Elsner, 2003; Kossin et al., 2010]. A warm ocean favors the formation and development of TCs [e.g., Gray, 1968], as it is closely tied to key thermodynamic quantities involved in the energetics of TCs, such as potential intensity [Emanuel, 1995]. Numerous studies modeling Atlantic TC activity thus incorporate sea surface temperatures (SST) over the Main Development Region (MDR; 6°–18°N, 20°–60°W) during the primary season (Aug–Oct) for Atlantic TC formation [e.g., Hoyos et al., 2006; Emanuel, 2005; Sabbatelli and Mann, 2007; Mann et al., 2007]. Recently, there has been some debate within the research community as to whether MDR SSTs themselves [e.g., Emanuel, 2005] or some “relative” measure of SST that measures warmth of the MDR relative to the tropical mean [e.g., Vecchi et al., 2008; Ramsay and Sobel, 2011] is a more appropriate measure of thermodynamic influences on Atlantic TC activity.

[4] Two recent studies [Sabbatelli and Mann, 2007; Mann et al., 2007] have modeled annual basin-wide TC counts on three predictors: ENSO, the NAO, and Aug–Oct MDR SSTs. These analyses employed Poisson regression, a tool that is appropriate for modeling a Poisson process with a rate of occurrence that is conditional on underlying state variables [e.g., Elsner et al., 2000b; Elsner, 2003]. The resulting statistical model displayed significant predictive skill, accounting in cross-validation for roughly half of the variance in annual Atlantic TC counts. However, it may be possible to enhance the skill in TC count statistical modeling exercises by exploring a wider range of potential climate predictors. This study thus extends the framework established by these previous studies by employing a more exhaustive analysis of Atlantic TC counts, using a larger array of candidate predictors in the context of Poisson regression-based statistical modeling exercises.

[5] Additional potential predictors to consider include rainfall in the Sahel region in Western Africa during the boreal summer. Studies have found that negative precipitation anomalies in this region coincide with an increase of dry African dust layers, which have the potential to inhibit TC genesis [Prospero and Lamb, 2003]. Moreover, an anomalously dry season has the potential to alter the characteristics of moist easterly waves that can eventually develop into TCs [Goldenberg and Shapiro, 1996]. Furthermore, previous studies hypothesize that droughts in Western Africa are consistent with stronger upper-level westerlies that can increase the amount of shear in the Atlantic basin [Landsea and Gray, 1992]. Recent studies use this information to correlate precipitation patterns from the African monsoon to Atlantic TC development, but this relationship is non-stationary [Bell and Chelliah, 2006; Zhang and Delworth, 2006; Fink et al., 2010]. While the stratospheric Quasi-Biennial Oscillation (QBO) was argued in some early work to influence Atlantic TC activity [Gray, 1984], recent work of Camargo and Sobel [2010] questions the existence of any such relationship in the modern record. A more practical limitation is that there is no reliable long-term record of the QBO index. The QBO was therefore not included among the potential predictors.

[6] Furthermore, Kossin and Vimont [2007] suggested that the Atlantic Meridional Mode (AMM), a dynamically reproducible mode of variability, might explain the overarching variability in the tropical North Atlantic. Much like ENSO in the Pacific, the AMM is the leading mode of a coupled air and sea feedback process in the Atlantic. Therefore, climate features such as Caribbean sea level pressure (SLP) anomalies, African rainfall amounts, and the areal extent of the Atlantic Warm Pool (AWP) are suspected to be manifestations of the AMM.

[7] Several alternative metrics of ENSO [Barnston et al., 1997] also could be employed in this analysis (see Figure 1). In addition to the Niño3.4 index favored by, for example, Mann et al. [2007], one might alternatively employ the Niño3 or Niño1 + 2 indices of ENSO [e.g., Kossin et al., 2010]. Finally, some researchers have hypothesized that the influence of tropical Atlantic SSTs on Atlantic TC activity is best measured through a “relative” MDR SST index [Vecchi et al., 2008], the difference between the MDR and the global tropical mean SSTs, or through consideration of the North Atlantic SST and the mean SST over the global tropics separately [Villarini et al., 2010]. The global tropical mean SST index therefore is considered as a potential predictor, in addition to the “absolute” and “relative” measures of the Atlantic MDR SST. Incorporating all of these potential alternative climate state variables into an extensive pool of candidate predictors allows for a more comprehensive and robust exploration of the appropriate statistical models relating climate and Atlantic TC counts, building upon previous works such as those done by Villarini et al. [2010] and Sabbatelli and Mann [2007].

Figure 1.

Locations of the four Niño regions. The indices measured in the Niño 3, 3.4, and 1 + 2 regions are used in this study. This figure is a recreation of a diagram available online from Climate Prediction Center, National Centers for Environmental Prediction, National Weather Service, NOAA (El Niño regions, available at, accessed 2009).

[8] In addition to expanding the pool of potential predictors, it is worthwhile to decompose historical basin-wide TC counts into subgroups that cluster with respect to their path, location of genesis, and other track characteristics [Elsner et al., 1996; Nakamura et al., 2009; Kossin et al., 2010]. Previous studies and annual seasonal forecasts have displayed skill in explaining much of the nonrandom interannual variance in North Atlantic TC counts as a whole [e.g., Sabbatelli and Mann, 2007; Mann et al., 2007]. However, successful seasonal predictions of particular “flavors” of TCs in the Atlantic basin have remained elusive. In principle, the factors that govern different flavors of TCs may differ, and additional predictive skill, as well as insight, might arise from modeling them separately, rather than collectively [e.g., Lehmiller et al., 1997].

2. Data

[9] Sabbatelli and Mann [2007] and Mann et al. [2007] used three predictors in modeling annual Atlantic basin-wide TCs: (1) the post-season boreal winter December–February (DJF) Niño 3.4 SST index, (2) the post-season boreal winter December–March (DJFM) NAO index [Jones et al., 1997], and (3) the in-season August–October (ASO) mean MDR SSTs. The SST predictors here are derived from a consensus SST product, which is based on a blend of three individually published SST products [Rayner et al., 2003; Smith and Reynolds, 2003; Kaplan et al., 1998]. Instead of focusing on how the use of each individual SST product affects the results, as was done previously [e.g., Villarini et al., 2010], this study focuses on the sensitivity between TCs and the respective climate state variables through the blended consensus products for all SST based predictors.

[10] The pool of candidate predictors in our study includes these indices, but the additional series discussed in the introduction are also tested: (4) the in-season June–September (JJAS) Sahel precipitation index (from Joint Institute for the Study of the Atmosphere and the Ocean (Sahel rainfall index (20–10N, 20W–10E), 1900–May 2009, available at, accessed 2009)), and alternative ENSO indices of the post-season boreal winter including (5) the Niño1 + 2 (from the NOAA Climate Prediction Center, NOAA/CPC) and (6) the Niño 3 SST indices. Also considered as a candidate predictor is (7) the ASO “relative” MDR SST index, which is calculated by subtracting a time series of SSTs averaged across the global tropics (23.5°N to 23.5°S) from the averaged North Atlantic MDR SST series. In addition, the pool of candidate predictors includes (8) the ASO global tropical SSTs that were used to calculate the relative MDR index above, (9) the May–June (MJ) NAO index [Jones et al., 1997], and (10) the in-season June–November (JJASON) AMM index (from NOAA/CPC). These data sets are all available across various time periods, ranging as far back as the mid-to-late 1800s. Plots of all time series are shown in Figure 2, and further details are provided in Table 1.
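The relative MDR SST index in item (7) is a year-by-year difference of two series. The following is a minimal sketch of that subtraction; the function name and the SST values are hypothetical and chosen only for illustration, not taken from the consensus SST product used in the study.

```python
# Hypothetical ASO-mean SSTs in degrees C (illustrative values only).
mdr_sst = [27.9, 28.1, 27.6, 28.4]
tropical_mean_sst = [27.2, 27.3, 27.1, 27.4]


def relative_sst(mdr, tropical):
    """Relative MDR SST: MDR mean minus global tropical mean, year by year."""
    if len(mdr) != len(tropical):
        raise ValueError("both series must cover the same years")
    return [m - t for m, t in zip(mdr, tropical)]


relative_mdr_sst = relative_sst(mdr_sst, tropical_mean_sst)
```

Because the relative index is an exact linear combination of predictors (3) and (8), the three SST series carry overlapping information.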

Figure 2.

Time series (1878–2007) of potential predictors, indicating conditions that are favorable (red) and unfavorable (blue) for TC activity.

Table 1. List of All State Variables That Are Considered as Candidate Predictors, as Explained in Section 2^a

No.  Predictor                                        Time Interval Used
1    post-season Dec–Feb Niño 3.4 index               1878–2007
2    post-season Dec–Mar NAO index                    1878–2007
3    in-season Aug–Oct mean Atlantic MDR SSTs         1878–2007
4    in-season Jun–Sep Sahel precipitation index      1900–2007
5    post-season Dec–Feb Niño 1 + 2 index             1950–2007
6    post-season Dec–Feb Niño 3 index                 1878–2007
7    in-season Aug–Oct relative Atlantic MDR SSTs     1878–2007
8    in-season Aug–Oct mean global tropical SSTs      1878–2007
9    pre/in-season May–June NAO index                 1878–2007
10   in-season Jun–Nov AMM index                      1950–2007

^a The longest time interval of each predictor is also listed.

[11] As described above and in Table 1, some of the potential predictors in this modeling study are averages taken during or after the season, e.g., the boreal DJF winter. Thus, incorporation of these predictors into statistical seasonal forecast models relies on the assumption of statistical persistence of the associated anomalies. Despite the potentially limiting nature of that assumption, seasonal predictions incorporating such predictors have proven remarkably successful, matching or beating in their skillfulness the predictions made by other approaches (see: Hurricane2012.html). In the current study, however, our emphasis and interest is not in seasonal forecasting but in a better understanding of the physical ties between TC variability and tropical climate state variables, whether they are measured before, during, or after the season.

[12] In addition to considering this expansive set of tropical predictors, it may be desirable to partition the basin-wide TC best-track database (“HURDAT”) [Jarvinen et al., 1984] into clusters of like storms. Grouping storm counts by storm track using a “cluster” methodology has proven to be advantageous in several previous studies, as it allows for an analysis that can relate regional TC activity to spatially varying climate features such as ENSO, AMM, NAO, and MJO [e.g., Camargo et al., 2007a, 2007b, 2008; Kossin et al., 2010]. Specifically, this cluster technique has been previously applied to Atlantic extratropical cyclones [Gaffney et al., 2007], western North Pacific typhoons [Camargo et al., 2007a, 2007b], eastern North Pacific hurricanes [Camargo et al., 2008], Fiji TCs [Chand and Walsh, 2009], southern hemisphere TCs [Ramsay et al., 2012], and, most relevant to this study, North Atlantic TCs [Kossin et al., 2010]. This path clustering technique is accomplished by utilizing a mixture model, in which every component consists of a quadratic regression curve of TC position versus time. The model is fit to the data via a maximum likelihood estimation of the parameters [e.g., Kossin et al., 2010]. Each TC track is then assigned to one of the K different quadratic regression models, with each model being described by regression coefficients and a noise matrix. As is the case in the K-means method, the number of clusters used in this methodology is not uniquely determined in the cluster analysis. Therefore, in-sample log likelihood values are used to obtain the optimum number of clusters, just as in Kossin et al. [2010] and Camargo et al. [2007a, 2008]. Kossin et al. [2010] found that at least four clusters are needed to capture the track types being partitioned (Figure 3). Consequently, in addition to the basin-wide Atlantic TC series, our analysis uses an extension of the four-cluster decomposition of Atlantic TCs of Kossin et al. [2010], dating back to 1878 in this case instead of 1950. Further, by combining multiple clusters we can also analyze an inherently larger subset of TCs, focusing on, for example, clusters containing TCs of primarily tropical origin, or clusters that contain the majority of societally impactful TCs (i.e., intense hurricanes and landfalling storms).

Figure 3.

TC tracks from 1950 to 2007 for each of the four clusters, as separated by the cluster analysis methods detailed within the text (adapted from Kossin et al. [2010]).

[13] Due to improvements over time in the detection of TCs from technological advances such as aircraft reconnaissance and satellites, there is likely a bias in estimates of basin-wide TC counts in earlier decades of HURDAT [see, e.g., Landsea, 2007; Chang and Guo, 2007; Mann et al., 2007]. Therefore, a recently published adjustment to HURDAT [Vecchi and Knutson, 2008] is used in our modeling exercises. This adjustment, which dates back to 1878, provides an estimate of potential missing TCs in HURDAT prior to the advent of weather satellites by using historical ship tracks in the pre-satellite era in combination with modern storm track information. All analyses in this article of the total basin-wide Atlantic TC counts, therefore, make use of the Vecchi and Knutson [2008] (“VK08”) estimates. Because the individual cluster series depend on storm track information, these count adjustments cannot be translated to the individual cluster series. As a result, since the four individual cluster series are derived from counts and tracks within HURDAT, they likely contain an uncorrected undercount bias, especially early in the record (prior to the advent of aircraft reconnaissance).

[14] Ultimately, this leads to a total of six target series: the VK08-adjusted basin-wide TC counts, the four individual TC cluster series defined in Kossin et al. [2010], and a combination of those clusters representing the majority of societally significant storms that are more likely to become intense hurricanes and/or make landfall (clusters 2–4 in Kossin et al. [2010]). For each of these series (shown in Figure 4), we seek to derive objective, optimal statistical models in terms of the potential underlying climate factors.

Figure 4.

Time series (1878–2007) of the primary TC predictands that are analyzed in this study. Red (blue) indicates positive (negative) TC count anomalies with respect to the mean of each individual time series. Basin-wide TC counts have been adjusted as described in text. Note the wide variation in range between the ordinate axes.

[15] A potential downside of modeling the individual cluster TC series is that the sample size contributing to seasonal mean counts is often greatly diminished relative to basin-wide counts. More specifically, the relative sampling uncertainty goes as the square root of the total number of storms divided by the total number of storms (i.e., $\sqrt{n_{TC,tot}} / n_{TC,tot} = n_{TC,tot}^{-1/2}$). Therefore, when analyzing exclusive quantities such as major hurricane landfalls or, in this instance, individual clusters of TCs, the sample sizes may be prohibitively small for establishing statistical skill or significance in any underlying statistical model, which again underscores the utility of considering combinations of multiple clusters. Sample size in this sense does not refer to the length of the individual time series, which is equivalent to the number of years in an annual time series.
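The scaling of relative sampling uncertainty with storm total can be made concrete with a short calculation. This is a sketch only; the function name is hypothetical and the storm totals below are round numbers picked for illustration, not counts from HURDAT.

```python
import math


def relative_sampling_uncertainty(n_storms):
    """Relative Poisson sampling uncertainty: sqrt(n) / n = n ** -0.5."""
    if n_storms <= 0:
        raise ValueError("n_storms must be positive")
    return math.sqrt(n_storms) / n_storms


# Hypothetical totals: a basin-wide record versus progressively smaller subsets.
for n in (1600, 400, 100):
    print(n, relative_sampling_uncertainty(n))
```

Cutting the storm total by a factor of four doubles the relative uncertainty, which is why sums over several clusters are easier to model than any single cluster.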

3. Methods

[16] Using the expanded pool of ten predictors discussed in section 2 (Table 1 and Figure 2), and the set of six target predictands (adjusted basin-wide TC counts, each of the four cluster TC count series, and the combination of multiple clusters; Figure 4), Poisson regression is applied to model climate influences on annual TC counts. This approach assumes TC counts can be appropriately represented by a Poisson process, in which the probability of observing a certain number of TCs (nTC) in any given year is defined as

$$P(n_{TC}) = \frac{\lambda^{n_{TC}} e^{-\lambda}}{n_{TC}!}$$

[17] In a Poisson distribution, the mean occurrence rate, λ, is the sole free parameter, and the unconditional case has a maximum likelihood value of the mean annual count. The null hypothesis is that λ is a constant, while Poisson regression tests the alternative hypothesis that λ is a function of other variables, e.g., climate state variables. Poisson regression models are governed by the following equation

$$E(n_{TC} \mid X) = \lambda = \exp(\beta^{T} X)$$

where in our case, E is the expected number of TCs in a single year as predicted by the Poisson regression model, X represents a vector of the various climate predictors, and β contains the regression coefficients. Some recent studies suggest that alternative forms of regression models might also be suited to model the influence of climate state variables on TC activity [e.g., Villarini et al., 2010; Mestre and Hallegatte, 2009], but we choose to build on the rich body of work applying Poisson regression to modeling TC count data [e.g., Solow and Nicholls, 1990; Solow and Moore, 2000, 2002; Elsner et al., 2000b, 2001; Elsner, 2003; Elsner and Jagger, 2006; Villarini et al., 2010; Tippett et al., 2011].
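The Poisson probability and the log-linear dependence of the expected count on the predictors can be sketched in a few lines. The code below is an illustrative implementation, not the one used in the study: it assumes an explicit intercept term, fits the coefficients by Newton-Raphson iteration on the Poisson log-likelihood, and uses hypothetical function names and synthetic data.

```python
import math


def poisson_pmf(n, lam):
    """P(n events) for a Poisson process with mean annual rate lam."""
    return lam ** n * math.exp(-lam) / math.factorial(n)


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_poisson(X, y, iters=50):
    """Newton-Raphson fit of log E[y] = b0 + b1*x1 + ...; returns [b0, b1, ...]."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # start at the constant-rate model
    for _ in range(iters):
        mu = [math.exp(sum(b * v for b, v in zip(beta, r))) for r in rows]
        grad = [sum((yi - mi) * r[j] for yi, mi, r in zip(y, mu, rows))
                for j in range(p)]
        hess = [[sum(mi * r[j] * r[k] for mi, r in zip(mu, rows))
                 for k in range(p)] for j in range(p)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta


def predict(beta, X):
    """Expected counts exp(b0 + b . x) for each predictor row in X."""
    return [math.exp(beta[0] + sum(b * v for b, v in zip(beta[1:], x))) for x in X]


# Synthetic example: one standardized predictor and hypothetical annual counts.
X = [[-1.2], [-0.5], [0.0], [0.4], [0.9], [1.5]]
y = [6, 8, 9, 12, 13, 18]
beta = fit_poisson(X, y)
expected = predict(beta, X)
```

With an intercept in the model, the fitted expected counts sum to the observed total, a useful sanity check on any Poisson regression fit. Setting every slope to zero recovers the null hypothesis of a constant rate equal to the mean annual count.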

[18] A standard forward stepwise (Poisson) regression is applied to each of the six available predictands (Figure 4) using all possible combinations of nx predictors (nx = 1,2,…,nx,max; where nx,max is the maximum number of predictors available). Not all predictors extend back over the full interval to 1878 (Table 1), so models are tested over three possible time intervals, the shortest of which allows testing of all predictors, and the longest of which has a smaller pool of candidate predictors: nx,max = 7 for the analyses from 1878, nx,max = 8 for the analyses from 1900, and nx,max = 10 for the analyses from 1950. Since each of the potential predictors measures tropical climate features, they overlap to varying degrees (Table 2). Therefore, to minimize redundancy among the predictors, at most one NAO index, one ENSO index, and one MDR or tropical mean SST index is used in any modeling exercise.
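The redundancy rule can be expressed as a filter on predictor subsets. The sketch below is one reading of that rule, with the assumption that the absolute MDR, relative MDR, and tropical mean SST indices all belong to a single SST group; the short predictor names and function names are hypothetical labels for the entries of Table 1.

```python
from itertools import combinations

# Group membership for each candidate predictor (None = no redundancy group).
GROUPS = {
    "Nino3.4": "ENSO", "Nino3": "ENSO", "Nino1+2": "ENSO",
    "NAO_DJFM": "NAO", "NAO_MJ": "NAO",
    "MDR_SST": "SST", "MDR_rel": "SST", "Trop_SST": "SST",
    "Sahel": None, "AMM": None,
}


def admissible(subset):
    """True if the subset uses at most one index from each redundancy group."""
    seen = set()
    for name in subset:
        group = GROUPS[name]
        if group is None:
            continue
        if group in seen:
            return False
        seen.add(group)
    return True


def admissible_subsets(pool):
    """All non-empty subsets of the pool that satisfy the redundancy rule."""
    return [c for k in range(1, len(pool) + 1)
            for c in combinations(pool, k) if admissible(c)]


full_pool = list(GROUPS)                      # all ten predictors (1950 onward)
pool_1878 = ["Nino3.4", "Nino3", "NAO_DJFM", "NAO_MJ",
             "MDR_SST", "MDR_rel", "Trop_SST"]  # predictors available from 1878
```

Under this reading, no admissible model can contain more than five predictors (one per group plus Sahel rainfall and the AMM), even when all ten candidates are available.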

Table 2. Correlation Matrix Showing the Pearson Linear Correlation Coefficients (r) Between Each of the Ten Candidate Predictors That Are Listed in Table 1^a

Pearson Linear Correlation Coefficients (1950–2007)
Columns, in order: ENSO indexes (3.4, 3, 1 + 2); NAO indexes (DJFM, MJ); other SST indexes (MDR, Rel., Trop.); other indexes (AMM, Sahel)

ENSO indexes: −0.02, −0.03, −0.41, 0.40, −0.26, −0.25
   1 + 2: 0.69, 0.77, 1.00, 0.14, −0.16, 0.07, −0.34, 0.46, −0.17, −0.31
NAO indexes:
Other SST indexes:
Other indexes:

^a In order to include all ten of the predictors, each correlation is calculated for the shortest time interval included in this study, 1950–2007.

3.1. Development of the Poisson Model via a Stepwise Forward Regression Approach

[19] The Poisson regression model is first constructed using each possible choice of a single predictor from the pool of nx,max available predictors. In this first iteration, the predictor that yields the lowest mean squared error (MSE) is chosen. From the remaining pool of nx,max − 1 candidates, a second predictor is considered, now selecting the bivariate combination with the lowest MSE (i.e., the choice of second predictor is dependent on its interaction with the first predictor chosen). This procedure is repeated until all nx,max predictors are used. Cross-validation statistics (see below) are used to select the optimal order, nx,opt, of the statistical model (with the goal that nx,opt < nx,max). The statistical model with nx,opt variables and the lowest averaged cross-validated MSE is selected.
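The screening procedure is a greedy search followed by a cross-validated choice of model order. The sketch below shows only that control flow; the scoring callables stand in for a fitted Poisson regression, and the predictor names, gain values, and overfitting penalty are hypothetical numbers invented so that the example runs on its own.

```python
def forward_stepwise(candidates, score):
    """Greedy forward selection.

    score(subset) returns an error to minimise, e.g. the calibration MSE of a
    model fitted on that subset. Returns one (subset, error) pair per order.
    """
    chosen, remaining, path = [], list(candidates), []
    while remaining:
        # Add the candidate that gives the lowest error alongside those already chosen.
        best = min(remaining, key=lambda c: score(chosen + [c]))
        chosen = chosen + [best]
        remaining.remove(best)
        path.append((tuple(chosen), score(chosen)))
    return path


def select_order(path, cv_score):
    """Pick the model on the stepwise path with the lowest cross-validated error."""
    return min(path, key=lambda item: cv_score(item[0]))


# Toy stand-in scores (hypothetical numbers, not results from the study).
GAIN = {"MDR_SST": 4.0, "Nino3.4": 2.0, "NAO_DJFM": 1.0, "Sahel": 0.1}


def toy_mse(subset):
    """Calibration error always falls as predictors are added."""
    return 12.0 - sum(GAIN[s] for s in subset)


def toy_cv_mse(subset):
    """Validation error adds a per-predictor penalty to mimic overfitting."""
    return toy_mse(subset) + 0.6 * len(subset)


path = forward_stepwise(GAIN, toy_mse)
best_subset, best_mse = select_order(path, toy_cv_mse)
```

In this toy case the calibration error keeps falling through all four steps, but the cross-validated error is lowest for the three-predictor model, so the fourth predictor is rejected.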

[20] Goodness of fit of the resulting statistical models is measured by a suite of metrics, including mean squared error (MSE) and coefficient of determination (R2) [e.g., Wilks, 2005] and χ2 statistics measuring both the goodness of fit of the statistical model and the “adequacy” of the fit. Adequacy is defined here by the χ2 test of independence as the consistency of the residual variance with purely random Poisson statistical behavior. The χ2 statistics are defined as:

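$$\chi^2_{fit} = \sum_{i=1}^{n_{yrs}} \frac{(O_i - E_i)^2}{E_i}$$
$$\chi^2_{adequacy} = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$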

where E once again is the expected TC counts from the Poisson regression model's predictions, O is the observed TC count, nyrs is the number of years in the time interval, eij is a cell in the 2 × 2 contingency table constructed from expected model-predicted counts, and oij is a cell in a 2 × 2 contingency table constructed from the observed counts. Ultimately, as values of αfit approach zero, the probability that such a skillful model would arise from chance alone becomes increasingly low. As values of αadequacy approach one, the probability that residual unresolved variance is consistent with purely random Poisson process behavior becomes increasingly high.

[21] Independent cross-validation experiments are used to evaluate the predictive skill of the underlying statistical models. In these experiments, the model is calibrated over one-half of the data set, an independent prediction of TC counts is made for the other half, and the goodness of fit of the prediction is evaluated. The procedure is repeated alternately using the first and second half of the data for calibration/validation, and an average set of validation scores is obtained. A variety of statistical measures are favored in the climate literature for evaluating the cross-validation skill of statistical models. Among these are the validation MSE, and various forms of the coefficient of determination R2 (which measures the fraction of variance resolved by the statistical model). Calculating a validation R2 score adopting the in-sample baseline mean yields what is referred to as the “reduction of error” (“RE”), while calculating a validation R2 score adopting the out-of-sample baseline mean yields a somewhat more challenging metric that is sometimes referred to as the “coefficient of efficiency” or “CE.” These two metrics are calculated as:

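$$RE = 1 - \frac{\sum (O_{out} - E_{out})^2}{\sum (O_{out} - \bar{O}_{in})^2}$$
$$CE = 1 - \frac{\sum (O_{out} - E_{out})^2}{\sum (O_{out} - \bar{O}_{out})^2}$$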

where E again is the expected TC counts based on the prediction from the Poisson regression model, O is the number of TCs from the observed predictand, and the subscripts designate the in-sample or out-of-sample half of the model or predictand data. Additionally, the squared linear correlation coefficient (r2) [e.g., Wilks, 2005] can be used to measure the fraction of resolved variance, though it should be noted that r2 is insensitive to both the mean and variance of the estimate, and so is a somewhat less rigorous validation measure. In principle, the most skillful models should out-perform the others with respect to most, if not all, of these alternative cross-validation skill metrics.
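The three validation scores differ only in the reference against which prediction errors are judged. The sketch below follows the conventional definitions, in which RE uses the calibration-period (in-sample) mean as its baseline and CE uses the validation-period (out-of-sample) mean; the function names and the count values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def reduction_of_error(obs_val, pred_val, obs_cal):
    """RE: skill relative to predicting the calibration-period mean."""
    baseline = mean(obs_cal)
    sse = sum((o - e) ** 2 for o, e in zip(obs_val, pred_val))
    ref = sum((o - baseline) ** 2 for o in obs_val)
    return 1.0 - sse / ref


def coefficient_of_efficiency(obs_val, pred_val):
    """CE: skill relative to predicting the validation-period mean."""
    baseline = mean(obs_val)
    sse = sum((o - e) ** 2 for o, e in zip(obs_val, pred_val))
    ref = sum((o - baseline) ** 2 for o in obs_val)
    return 1.0 - sse / ref


def r_squared(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


# Hypothetical counts: the validation half has a higher mean than the calibration half.
obs_cal = [8, 10, 12]
obs_val = [12, 14, 16]
pred_val = [13, 14, 15]   # right mean, but too little variance

re_score = reduction_of_error(obs_val, pred_val, obs_cal)
ce_score = coefficient_of_efficiency(obs_val, pred_val)
r2_score = r_squared(obs_val, pred_val)
```

In this example RE exceeds CE because RE rewards the prediction for capturing the shift in mean between the two halves, whereas r2 is a perfect 1 even though the predicted variance is too small, which illustrates why r2 is the least demanding of the three.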

[22] In addition to the primary cross-validation methods detailed above, we adapt a one-year validation methodology to assess the skill of our basin-wide TC count models. To accomplish this we predict the number of total TCs in a single year, training the model on the data (predictors and TC counts) from all of the previous years. A table with the results of these tests, which verified to within one standard deviation of a Poisson process 80% of the time from 1983 to 2007, is included in the auxiliary material.
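The one-year validation can be sketched as an expanding-window loop. The code below is an illustration under stated assumptions: the model is passed in as generic fit and predict callables, a constant-rate model stands in for the full Poisson regression, a prediction counts as verified when the observed count lies within one Poisson standard deviation (the square root of the expected count), and all names and data are hypothetical.

```python
import math


def one_year_ahead(years, X, counts, fit, predict, start_year):
    """Predict each year from start_year onward using only earlier years."""
    results = []
    for i, year in enumerate(years):
        if i == 0 or year < start_year:
            continue
        model = fit(X[:i], counts[:i])          # train on all previous years
        expected = predict(model, X[i])         # forecast the target year
        hit = abs(counts[i] - expected) <= math.sqrt(expected)
        results.append((year, expected, counts[i], hit))
    return results


def fit_constant(train_X, train_y):
    """Stand-in model: unconditional Poisson rate, i.e. the mean annual count."""
    return sum(train_y) / len(train_y)


def predict_constant(rate, x):
    return rate


# Hypothetical record; the predictor rows are unused by the stand-in model.
years = [2000, 2001, 2002, 2003, 2004, 2005]
X = [[0.0]] * len(years)
counts = [10, 12, 8, 14, 10, 11]

results = one_year_ahead(years, X, counts, fit_constant, predict_constant, 2003)
hit_rate = sum(r[3] for r in results) / len(results)
```

Replacing the stand-in callables with a Poisson regression fit and its prediction function gives the same loop for a model with climate predictors.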

[23] Overall, the principle behind our approach is to make the determination of statistical models linking climate and TCs as objective as possible. This goal is achieved by employing (i) a large pool of predictors that includes all or nearly all predictors that have been suggested in previous statistical modeling exercises, and (ii) an objective stepwise screening process both for selecting predictors from a larger pool of candidate predictors, and for independently evaluating the skillfulness of competing statistical models.

4. Results

4.1. Basin-Wide Tropical Cyclone Counts

[24] Time series resulting from the three Poisson models (spanning the three data intervals) for the adjusted basin-wide TC counts are shown in Figure 5a, while key results from the calibration and cross-validation exercises are summarized in Table 3. The cross-validation results provide some support for recent work by Mann et al. [2007] favoring the use of absolute (rather than “relative”) MDR SST, and the use of ENSO and NAO indices as additional predictors. The Niño 3.4 index is statistically favored over the other two (Niño 3 and Niño 1 + 2) ENSO indices considered in all but the middle time interval (1900–2007). NAO shows less robustness as a predictor of total counts, with no consistency in the choice of NAO predictor across the three models. Based on χ2 tests on each interval (Table 3), the statistical models are highly significant (αfit ≪ 0.05) and provide no compelling evidence for unresolved structure (αadequacy ∼ 0.70–0.90).

Figure 5.

Statistical models for Atlantic basin-wide and cluster TC count series using the predictors specified in Table 1. Colors in each panel correspond to: TC counts (black), and the models trained on the interval 1878–2007 (red), 1900–2007 (blue), and 1950–2007 (green). Total TC counts (a) are adjusted as discussed in text.

Table 3. Results of Calibration and Cross Validation Tests Employed in the Statistical Modeling Exercises^a

Predictand                  Time Interval  Predictors Used                     Calibration Statistics       Cross Validation Scores
                                                                               MSE   R2    αfit  αadequacy  MSE    RE    CE     r2
Unadjusted TC counts        1878–2007      MDR SST, Niño 3.4, DJFM NAO         8.82  0.49  0.00  0.64       11.06  0.55  0.29   0.41
VK08 adjusted TC counts     1878–2007      MDR SST, Niño 3.4, DJFM NAO         9.00  0.44  0.00  0.85       9.84   0.47  0.39   0.39
                            1900–2007      MDR SST, Niño 3                     9.42  0.43  0.00  0.71       11.09  0.43  0.32   0.34
                            1950–2007      MDR SST, Niño 3.4, MJ NAO           8.84  0.47  0.00  0.90       12.24  0.37  0.21   0.30
Cluster one TCs             1878–2007      MDR SST, DJFM NAO                   2.73  0.14  0.00  0.32       3.47   0.23  −0.23  0.08
                            1900–2007      MDR SST, Sahel Precip, DJFM NAO     2.68  0.19  0.00  0.39       3.29   0.29  −0.09  0.07
                            1950–2007      MJ NAO, Niño 3                      2.61  0.14  0.10  0.87       2.81   0.15  0.10   0.13
Cluster two TCs             1878–2007      Niño 3.4, MDR SST                   2.69  0.
                            1900–2007      MDR (relative), Niño 3.4            2.83  0.
                            1950–2007      DJFM NAO, MDR SST                   2.40  0.11  0.16  0.43       2.81   0.01  0.00   0.04
Cluster three TCs           1878–2007      Tropical SST, Niño 3.4, DJFM NAO    2.52  0.35  0.00  0.00       4.55   0.33  −0.42  0.12
                            1900–2007      Tropical SST, Niño 3.4, DJFM NAO    2.66  0.37  0.00  0.00       4.97   0.21  −1.97  0.07
                            1950–2007      MDR SST, Niño 1 + 2, MJ NAO         2.32  0.48  0.00  0.64       2.66   0.53  0.31   0.37
Cluster four TCs            1878–2007      MDR (relative), Niño 3, MJ NAO      2.49  0.
                            1900–2007      MDR (relative), Niño 3.4, MJ NAO    2.49  0.
                            1950–2007      AMM, MDR SST, Niño 1 + 2            1.18  0.34  0.00  0.31       1.61   0.19  0.11   0.26
Sum of last three clusters  1878–2007      MDR SST, Niño 3.4, DJFM NAO         6.73  0.43  0.00  0.43       7.56   0.45  0.34   0.36
                            1900–2007      MDR SST, Niño 3.4                   7.43  0.41  0.00  0.37       9.69   0.34  0.22   0.29
                            1950–2007      MDR SST, Niño 3.4, DJFM NAO         5.94  0.56  0.00  0.73       9.14   0.38  0.20   0.40

^a The various statistics are tabulated as defined in the text. Predictors are indicated in the order they are selected in the forward stepwise screening regression. The results of the χ2 tests are measured with respect to the probability (α) of rejecting the relevant null hypothesis. As values of αfit approach zero, the probability that such a skillful model would arise from chance alone becomes increasingly low. As values of αadequacy approach one, the probability that residual unresolved variance is consistent with purely random Poisson process behavior becomes increasingly high.

[25] As in Mann et al. [2007], just under half of the total variance (R2 = 0.43–0.49) in annual basin-wide TC counts is resolved in calibration using each of the training intervals, with cross-validation tests indicating similar levels of skill, depending on the precise skill metric used (RE = 0.37–0.55, CE = 0.21–0.39; r2 = 0.30–0.41; Table 3). The models also indicate significant skill in reproducing lower frequency variability when 5 and 10 year filters are applied (calibration and validation r2 ≈ 0.5). Overall, the validation scores in Table 3 give somewhat mixed results with respect to whether the adjusted “VK08” TC series or uncorrected TC series yields a more skillful model, with MSE and CE favoring the VK08 series but RE and r2 favoring the uncorrected series.

[26] Somewhat surprisingly, dependence of total-basin TC counts on the additional predictors tested here, such as the AMM index and Sahel precipitation, is not objectively supported by the forward regression exercises. It is possible that the shorter training intervals that are available for using these predictors (Sahel rainfall extends only back to 1900, and the AMM series extends only back to 1950) are simply not sufficient to identify potential additional useful information in these series. Redundancies between these unsupported predictors with the ENSO, NAO, and/or SST indices identified in the model development (Table 2) also contribute to their lack of inclusion in the model. Indeed, one complication in comparing the relative merit of competing predictors over relatively short (i.e., less than 60 year) time intervals is the fact that the competing predictors differ substantially in their frequency domain attributes [see, e.g., Mann and Emanuel, 2006]. For example, the AMM is closely correlated with Atlantic TC activity on both interannual and decadal timescales [Kossin and Vimont, 2007; Vimont and Kossin, 2007] while MDR SST is most strongly correlated with Atlantic TC activity on decadal and longer timescales [e.g., Mann and Emanuel, 2006]. Furthermore, we must also consider the possibility that climate state variables utilized in these models may be subject to biases due to evolving observation and data sampling techniques. In combination with the previously discussed TC undercount bias, these factors may explain why the correlation between the adjusted counts and the statistical model is much higher in the modern era than it is in the 19th and early 20th centuries (Figure 5a).

4.2. Tropical Cyclone Cluster Time Series

[27] The results for the models of the four individual TC cluster counts (Figures 5b–5e) are more mixed than those obtained for models of basin-wide TC counts. Overall, a larger mix of predictors emerges in the forward stepwise screening regression approach, but in most cases the statistical skill, the regression adequacy, or both are called into question by the statistical results obtained.

[28] The first cluster (Figure 4b) contains TCs that originate primarily over the northern and eastern parts of the basin (Figure 3a); these storms tend to have significant curvature in their paths that is modulated by subtropical high variability, placing these TCs into a more baroclinic environment [Kossin et al., 2010]. Overall, this cluster is the most populated of the four clusters, accounting for 31% of all basin-wide TCs from 1878 to 2010. Models trained on this cluster (Figure 5b) tend to be less skillful than those found above for basin-wide TC counts (Table 3), though interestingly, two of the same predictors from the total basin-wide count models (MDR SST and winter NAO) are nonetheless chosen in the forward stepwise screening regression procedure for cluster 1 TCs using the full (1878–2007) interval. The calibration and cross-validation results suggest that less than 20% of the interannual variance is resolved by the statistical model, though CE scores are close to zero, indicating that much of the out-of-sample skill comes from predicting the changes in mean counts. The level of skill in the statistical models using shorter training intervals is further reduced and there is substantial evidence (i.e., low values of αadequacy) for unresolved structure in these models. For the shortest training interval (1950–2007), the selected model does not pass statistical significance (αfit > 0.05). Interestingly, this first cluster is the only one for which there is some hint of the predictive value of Sahel rainfall; however, the cross-validation results in this 1900–2007 case indicate limited statistical skill.

[29] The second cluster (Figure 4c) contains TCs that typically originate in the Gulf of Mexico or the Caribbean Sea and have a northward component in their tracks (Figure 3b). TCs within this cluster are strongly modulated on intraseasonal timescales by the Madden-Julian Oscillation [Kossin et al., 2010] and account for 29% of total basin-wide observed TCs from 1878 to 2007. Cross-validation results for this cluster are uniformly poor. Since Caribbean genesis events form a substantial portion of this cluster, the differing TC variability with ENSO between the western and eastern Caribbean may be confounding this signal. For example, phases of eastern Caribbean TC activity correspond better with decadal trends in ENSO than do phases of western Caribbean TC activity [e.g., Evans et al., 2011; Giannini et al., 2000, 2001]. It is also possible that the standard climate factors considered here simply do not have any decisive relationship with this particular family of Atlantic TCs.

[30] The third cluster (counts: Figure 4d; tracks: Figure 3c) includes many of the more intense TCs and ones that largely form in the eastern part of the North Atlantic Basin. This cluster contains a large fraction of recurving TCs and the highest percentage of hurricanes of any cluster [Kossin et al., 2010]. From 1878 to 2007, this cluster contains 22% of all storms in the basin. The statistical models for this particular cluster (Figure 5d) show among the greatest apparent skill of all four clusters considered, though the results are quite variable with respect to the time interval considered. The chosen predictors are again similar (MDR SST, ENSO, and NAO), though the flavors of the indices chosen are different (Niño 1 + 2 for ENSO and the pre/within-season MJ NAO). Interestingly, the mean tropical SSTs are preferred over either the Atlantic MDR or the MDR anomaly index for the two longest intervals. The adequacy tests fail over these two intervals, however (αadequacy ≪ 0.05), suggesting that there is substantial unresolved structure in the residuals. Moreover, the cross-validation results are both inferior and highly variable with respect to the particular metric used for the two longer training intervals. Over the most recent interval (1950–2007), the cross-validation results indicate skill that is competitive with that obtained above for basin-wide storms, suggesting that 30–50% of the interannual variance can be skillfully resolved. In this case, it is possible that the shortest interval yields the most skill because the cluster series is unadjusted for potential undercount biases and is particularly dependent on the observed storm tracks, which have been more accurately observed in the modern era, especially in the eastern part of the basin.

[31] The fourth cluster of TCs (counts: Figure 4e; tracks: Figure 3d) corresponds to storms that develop primarily in the southern part of the basin and typically have predominantly east-to-west tracks. Of the four clusters, this cluster comprises the fewest storms, accounting for only 19% of the total storm counts from 1878 to 2007. Cross-validation scores generally indicate that the statistical models (Figure 5e) can skillfully resolve more than 20% of the interannual variance, though the precise skill varies with the time interval considered. The most skillful model is derived for the most recent (1950–2007) interval, where, unlike any other cluster, the AMM (only available back to 1950) is chosen as one of the three predictors (the other two are MDR SST and Niño 1 + 2). Given the rather small area of the Niño 1 + 2 region, its inclusion both here and in models for cluster 3 is interesting. However, one interpretation of this correlation between the Niño 1 + 2 index and these southernmost clusters may be that Atlantic TCs in the southern part of the basin are influenced by a larger area of tropical Pacific SSTs, possibly further to the north along the South American coast or even west into the eastern portions of the Niño 3 region.

[32] Adequacy tests for the 1950–2007 interval give an indeterminate result (αadequacy ∼ 0.3), suggesting the possibility of unresolved residual structure. For the two longer intervals (1900–2007 and 1878–2007), a different set of predictors is chosen, including in both cases the “relative” MDR SST series, a series that is not selected for any of the other cluster series or for basin-wide TC counts. In both cases, there is substantial unresolved structure in the residuals, in agreement with the adequacy test results (αadequacy < 0.05).
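As a rough illustration of how an adequacy level of this kind can be obtained, the sketch below assumes that the test is the usual residual-deviance chi-square test for a Poisson regression, with degrees of freedom equal to the number of years minus the number of fitted coefficients. That assumption, the function names, and the use of the Wilson-Hilferty normal approximation for the chi-square tail (chosen only to keep the example free of external libraries) are ours and are not taken from the study.

```python
import math

def poisson_deviance(y, mu):
    """Residual deviance of a Poisson model with fitted means mu."""
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi > 0:                      # y*log(y/mu) is zero when y == 0
            total += yi * math.log(yi / mi)
        total -= yi - mi
    return 2.0 * total

def chi2_upper_tail(x, dof):
    """Approximate P(chi-square with dof degrees of freedom > x).

    Uses the Wilson-Hilferty cube-root normal approximation, which is
    adequate for the moderate-to-large dof of annual count series.
    """
    x = max(x, 0.0)                     # guard against round-off below zero
    spread = math.sqrt(2.0 / (9.0 * dof))
    z = ((x / dof) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * dof))) / spread
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def adequacy_alpha(y, mu, n_predictors):
    """Significance level of the residual deviance (assumed test form).

    Small values mean the residuals are larger than Poisson sampling
    noise alone would produce, i.e., unresolved structure remains.
    """
    dof = len(y) - n_predictors - 1     # one extra coefficient: intercept
    return chi2_upper_tail(poisson_deviance(y, mu), dof)
```

Under this reading, a value well above 0.05 gives no evidence against the model, while values below 0.05 point to structure in the residuals that the chosen predictors do not capture.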

[33] Overall, models targeting the rather small cluster subsets proved to be less skillful than the models targeting basin-wide activity as a whole. Therefore, we also consider a larger subset composed of the sum of the last three clusters (Figure 4f). This combination is selected in order to remove the weaker and mostly open ocean storms in cluster 1, which allows the analysis to focus on storms of greater societal impact (i.e., landfalling and more intense hurricanes). The resulting cluster combination accounts for roughly two-thirds of all observed TCs and 85% of all landfall events since 1950. It is of great interest that the vast majority of the most destructive storms in recent history are also included, with the only notable exception being Hurricane Bob (1991).

[34] Not surprisingly, the same three predictors preferred in modeling basin-wide activity (absolute MDR SSTs, Niño 3.4, and DJFM NAO) were utilized in the models trained on this subset of TCs of primarily tropical origin. Overall, these Poisson models for the three-cluster subset of storms (Figure 5f) exhibit much more skill, based on cross-validation tests, than do the models that targeted individual clusters. In fact, models across all three training intervals often match and in some cases exceed the calibration and cross-validation scores of the models trained on total TC counts. For all three training intervals, the models resolve more than 40% of the interannual variance, and χ2 tests indicate high significance (αfit ≪ 0.05). However, models trained on the longer two time intervals indicate the possibility of unresolved residual structure (αadequacy < 0.5), possibly at least in part due to heteroscedasticity in the storm track errors through time.

[35] A more restrictive combination of clusters 3 and 4, which focuses on TCs of mainly tropical origins in the southern and eastern parts of the basin, might also be considered as a potential target TC series. Such a partition focused on “tropical” type storms might be better represented by the tropical predictors used in this study. The resulting models for this partition (on each of the three time intervals) were consistent with, albeit less skillful than, those created for the less restrictive cluster combination (clusters 2, 3, and 4). Once again, the models for this partition favored the use of the “absolute” MDR SST index, the Niño 3.4 index, and the NAO index. Adequacy tests fail over all three of the intervals in this case, however (αadequacy < 0.10), suggesting that there is substantial unresolved structure in the residuals.

5. Conclusions

[36] The results of the statistical analyses with HURDAT, a recent set of adjusted counts, and observed climate predictors confirm previous findings [e.g., Mann et al., 2007], that Atlantic basin-wide TC counts can be skillfully modeled in terms of the MDR SST, ENSO, and the boreal winter post-season NAO index, resolving roughly half of the total interannual variance in both calibration and cross-validation. In the present analysis, the model predictors are chosen from a larger set of ten potential predictors (including other alternative predictors that have been argued for in some previous studies) through a forward stepwise screening regression approach. Thus, the recovery of the three predictors identified by Mann and collaborators provides confirmation of the significance of these climate factors on North Atlantic basin TC counts. Adequacy tests suggest little evidence of any residual unresolved structure in the resulting statistical model, hinting at the possibility that these three climate predictors account for essentially all non-random year-to-year variability in Atlantic basin-wide TC counts.

[37] The results presented here do not support the use of a “relative” MDR SST index in place of the “absolute” MDR SST as a predictor of TC activity [Vecchi et al., 2008; Ramsay and Sobel, 2011]. The most skillful models for basin-wide storm counts, and the majority of the clusters, include the actual MDR SSTs as a predictor, rather than either the relative SSTs or the global tropical mean SSTs. Despite the high correlations between the different SST indices, simply substituting relative SSTs for “absolute” MDR SSTs in our basin-wide TC count models results in an increased mean squared error of more than 25% for each of the target time intervals.

[38] When observed Atlantic TCs are partitioned into four distinct “flavors” of TCs through a cluster analysis [Kossin et al., 2010], the findings are neither as consistent nor as significant as they are with the total number of TCs in the adjusted historical record. For example, the second cluster (which largely includes storms of Gulf of Mexico and western Caribbean origin) shows no clear statistical relationship with any of the ten candidate climate indices considered. However, other clusters (associated, for example, with storms of tropical Atlantic origin) do appear to be modeled with some degree of skill in terms of climate predictors. Most commonly, the predictors that emerge in the statistical models for the TC cluster series are the same ones that emerged in the model for basin-wide TC counts: MDR SST, indices of ENSO, and indices of the NAO, usually in this order. Considering the upward ocean heat fluxes that drive TCs, it is not surprising physically that a measure of SST in the basin during the most active period of the season is the most significant predictor. Likewise, the consistent dependence of the models in this study on measures of ENSO and the NAO is expected due to their relationships to inhibiting wind shear and large-scale steering, respectively. It is interesting, nonetheless, that in most cases the indices for ENSO and the NAO are post-season averages, not in-season measurements. This is likely due to the fact that the NAO and ENSO both exhibit a stronger signal in the boreal winter than in the spring or early summer. In addition, under certain limited circumstances, predictors such as Sahel rainfall, the AMM, or “relative” MDR SST are selected in the statistical models.

[39] Models for combinations of clusters yielded higher skill than the individual clusters alone, and are comparable with the full TC count models. Statistical tests for this series showed consistent predictive skill across multiple training periods, similar to the models for basin-wide activity. Encouragingly, the same three predictors (MDR SSTs, Niño 3.4, and DJFM NAO) emerged for the “cluster combinations” used here. Ultimately, the three clusters (2, 3, and 4) included in the larger partition of storms contain 100% of all category 5 storms, 88% of all major hurricanes, and 85% of all landfall events since 1950. Therefore, additional work on these types of TCs is clearly warranted to explain the remaining unexplained variance within these models of highly societally significant storms. Models of these somewhat skillful clusters might be improved further by excluding subtropical storms that are likely explained by a modified set of climate state variables that also captures variability in the midlatitudes.

[40] While the analyses indicate that basin-wide TC counts or large (judiciously chosen) subsets thereof can be more skillfully modeled than can the TC counts for the individual clusters, the total number of TCs across the North Atlantic basin does not necessarily correlate with the destructiveness of particular hurricane seasons. Therefore, in many applications, it is important to know not just how many TCs may form in a particular season, but what types of TCs are likely to form and where they are likely to track. There is evidence in several of the analyses of TC cluster series of substantial unresolved, non-random structure in the residual unexplained variability. One possible interpretation of this result is that there are yet other climatic processes that may condition TC behavior in a given season that are not captured by even the somewhat expansive pool of candidate predictors considered in this study. Further correction of remaining biases, both in best-track TC records and in the climate predictors themselves, could potentially lead to improved results in modeling climate influences on Atlantic TC behavior, and even more so to modeling improvements for the subset of Atlantic TCs having the greatest impact on the residents of that region.


[41] We thank Sonya Miller for her help in obtaining the various SST data sets and for other technical assistance. In addition, we thank Eugene Clothiaux for his helpful comments and suggestions. Thanks are also in order for the three anonymous reviewers who provided thoughtful comments and suggestions that improved the quality of this work. Finally, we acknowledge the support for this study from NSF grant ATM-0735973.