Abstract
[1] We obtain new insights into the reliability of long-term historical Atlantic tropical cyclone (‘TC’) counts through the use of a statistical model that relates variations in annual Atlantic TC counts to climate state variables. We find that the existence of a substantial undercount bias in late 19th through mid-20th century TC counts is inconsistent with the statistical relationship between TC counts and climate.
1. Introduction
[2] Positive trends over recent decades have been established for various measures of the powerfulness of Atlantic TCs [Emanuel, 2005a; Webster et al., 2005; Sriver and Huber, 2006; Hoyos et al., 2007]. These trends appear connected with increasing tropical Atlantic sea surface temperatures (SSTs) [Emanuel, 2005a; Hoyos et al., 2007], and the increasing SSTs are likely driven in large part by anthropogenic climate change [Mann and Emanuel, 2006; Trenberth and Shea, 2006; Santer et al., 2006]. More debated, however, are long-term trends in Atlantic TC occurrence rates or ‘counts’ and, in particular, their reliability in earlier decades.
[3] Observations of TC behavior have improved over time, particularly since the mid 20th century with the development of more sophisticated observing systems such as aircraft reconnaissance and satellite technology. Prior to the availability of modern observing systems, detection of tropical storms was based on observations along coastlines and from ships. As the density of those observations becomes increasingly sparse back in time, it is plausible that an increasing number of TCs were missed by observers in prior decades.
[4] In some studies, a substantial undercount bias has been argued to exist, particularly prior to the availability of aircraft reconnaissance in 1944. Nyberg et al. [2007] used a biological proxy-based reconstruction of past TC activity to infer average counts for major (category 3 and higher) hurricanes between 1870–1943 that exceed the high levels of the past decade. This finding would imply an average pre-1944 undercount bias of 2–3 storms a year for major hurricanes alone. Assuming that major hurricanes account for between a sixth and a third of all named storms (the range of decadal variation in this fraction from the available historical record by Holland and Webster [2007]), such an estimate would imply a dramatic pre-1944 average annual undercount of 6–20 named storms.
[5] Others [Landsea, 2005, 2007; Landsea et al., 2004] have argued for a smaller, but still substantial, undercount. Landsea [2007] estimated undercounts based on changes over time in the proportion of total reported Atlantic named storms (“PTL”) that made landfall on islands or along the U.S. coastline. PTL was interpreted as a proxy for the underreporting of total TC counts due primarily to decreasing track density back in time, with higher PTL interpreted as reflecting increased undercount bias. The increase in average PTL over 1900–1965 (75%) relative to 1966–2006 (59%) was interpreted by Landsea [2007] as reflecting an undercount bias of 2.2 total named TCs for the earlier period. Combined with a speculative additional 1 storm bias that was assumed to hold prior to 2003, this led Landsea [2007] to estimate an aggregate undercount bias of 3.2 named TCs prior to 1966.
[6] Yet still other studies conclude that the long-term record of Atlantic TC counts is likely reliable back through the late 19th century [Emanuel, 2005a; Mann and Emanuel, 2006; Holland and Webster, 2007]. Some analyses [Neumann et al., 1999; Holland and Webster, 2007; Chang and Guo, 2007] estimate an average undercount of at most only about one TC through the early 20th century. Indeed, an alternative analysis of long-term PTL changes to that described above implies only a modest undercount. Extending PTL prior to 1900, we find that PTL actually decreased prior to 1900 (see auxiliary material), implying, by the reasoning of Landsea [2007], a decrease, rather than increase, in undercount bias prior to 1900. Yet logically the undercount bias should be higher prior to 1900 owing to a further decrease in ship tracks and coastal population density. The mean value of PTL was 67% over 1851–1899, and lower still (61%) for 1851–1885 (see Holland [2007] for a similar analysis). This latter value is statistically indistinguishable from the average PTL over the modern period 1966–2006 (59%), which, by Landsea's reasoning, would imply a minimal undercount over 1851–1885.
[7] Of course, estimation of undercount based on the assumption of a fixed relationship between total TC counts and the number of landfalling storms is perilous. Such an approach assumes, in particular, that the large-scale atmospheric steering which determines the trajectories of TCs once they've formed is constant, when there is in fact strong evidence that it is highly variable over time [Elsner, 2003]. Yet, other methods of estimating the undercount bias in early TC count data (e.g., interpretation of ship logs, or use of proxy climate data) rest on what are arguably equally tenuous assumptions.
[8] Seeking to resolve the discrepancies between the different assessments of Atlantic TC undercount bias discussed above, we have turned to an alternative approach that does not require any of the above assumptions. Instead, our approach draws upon a recently developed statistical model that conditions expected total Atlantic TC counts on underlying climate variables.
2. Methods and Data
[9] While a significant component of the variation in TC counts from year to year represents the chance fluctuations of a random Poisson process, systematic changes over time in the mean expected rate of occurrence (i.e., annual TC counts) are believed to result from large-scale climate forcing. Past work has shown that the El Niño/Southern Oscillation (ENSO) [Gray, 1984; Bove et al., 1998; Elsner et al., 2000, 2006; Elsner, 2003], the North Atlantic Oscillation (NAO) [Elsner, 2003; Elsner et al., 2000, 2006], and tropical Atlantic Sea Surface Temperatures (SST) [Gray, 1968; Shapiro, 1982; Shapiro and Goldenberg, 1998; Saunders and Harris, 1997; Goldenberg et al., 2001; Emanuel, 2005b; Mann and Emanuel, 2006; Holland and Webster, 2007] each lead to variations in annual Atlantic TC counts. The first two factors influence the amount of vertical wind shear in the atmosphere (less shear is more favorable for TC development), and the tendency for recurvature of storms (which influences the likelihood that TCs encounter a favorable thermodynamic environment for development). Their impacts on TC development are limited primarily to the latter part of the storm season (the boreal fall and early winter) and can therefore be measured using the boreal winter seasonal means conventionally used to define ENSO and NAO indices. The third factor represents the local thermodynamic factors influencing development, and is conventionally [e.g., Emanuel, 2005a] measured by its state during the peak of the storm season.
[10] The impact of these factors on expected TC counts can be accounted for [Elsner, 2003; Elsner et al., 2000, 2006] through a variant on linear regression known as ‘Poisson Regression’, which is appropriate for modeling the influence of some set of independent variables (‘covariates’) on the expected rate of a Poisson distributed random process. We employ a recently developed and validated Poisson regression model [Sabbatelli and Mann, 2007] for conditional expected Atlantic annual TC counts, updated here using a new blended SST product derived by averaging three alternative published SST datasets [Rayner et al., 2003; Smith and Reynolds, 2003; Kaplan et al., 1998], all available back to 1870. Historical Atlantic total named storm counts [Jarvinen et al., 1984] were regressed against three covariates: (1) mean SSTs during the August–October (‘ASO’) peak of the tropical storm season over the main development region (‘MDR’: 6°–18°N, 20°–60°W) for Atlantic TCs, (2) the boreal winter Niño3.4 SST index of ENSO (normalized SST averaged over the region 5°S–5°N, 120°–170°W), and (3) the boreal winter NAO index [Jones et al., 1997]. None of the three covariates are significantly correlated with each other. However, we do find a statistically significant lagged correlation relating the Niño3.4 index to the MDR SST series for the following year's storm season, consistent with the observation elsewhere [Trenberth and Shea, 2006] that ENSO events influence tropical Atlantic SST in the following summer. This lagged influence of ENSO is implicitly accounted for in our analysis via the use of MDR SST as a statistical predictor. The MDR SST, Niño3.4, and NAO series used are shown along with the TC count series in Figure 1. Data used and other supplementary information can be found at: http://www.meteo.psu.edu/~mann/TC_GRL07.
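As a rough illustration of the Poisson regression described above, the sketch below fits log(μ) = β0 + β1·x1 + β2·x2 + β3·x3 by Newton's method using only the Python standard library. The function names (`solve_linear`, `fit_poisson`, `predict`) and the eight-year data set are invented for illustration; they are not the series or the fitting code used in this study.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x


def fit_poisson(covariates, counts, iterations=50, tol=1e-10):
    """Maximum-likelihood fit of log(mu) = b0 + sum_j b_j * x_j."""
    rows = [[1.0] + list(x) for x in covariates]  # prepend intercept column
    k = len(rows[0])
    # Start from the intercept-only solution: mu equals the mean count.
    beta = [math.log(sum(counts) / len(counts))] + [0.0] * (k - 1)
    for _ in range(iterations):
        mu = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in rows]
        # Gradient of the log-likelihood: X^T (y - mu)
        grad = [sum(row[j] * (y - m) for row, y, m in zip(rows, counts, mu))
                for j in range(k)]
        # Fisher information: X^T diag(mu) X
        info = [[sum(row[i] * row[j] * m for row, m in zip(rows, mu))
                 for j in range(k)] for i in range(k)]
        step = solve_linear(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def predict(beta, covariates):
    """Expected annual count for each row of covariates."""
    return [math.exp(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
            for x in covariates]


# Invented anomalies (SST, ENSO index, NAO index) and invented counts.
demo_covariates = [(0.3, -0.5, 0.2), (-0.2, 1.0, -0.4), (0.5, 0.1, 0.9),
                   (-0.4, -0.8, -0.1), (0.1, 0.6, -0.7), (0.7, -1.2, 0.3),
                   (-0.6, 0.4, 0.5), (0.0, -0.3, -0.6)]
demo_counts = [10, 6, 11, 7, 8, 15, 5, 9]

beta = fit_poisson(demo_covariates, demo_counts)
fitted = predict(beta, demo_covariates)
print([round(b, 3) for b in beta])
```

The log link keeps the expected count positive for any covariate values, which is one reason a Poisson model is usually preferred to ordinary linear regression for count data.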
[11] The statistical model for Atlantic TC counts resulting from training (or ‘calibrating’) over the full 1870–2006 interval is shown in Figure 2. The statistical model captures a substantial fraction (R^{2} = 50%, i.e., half) of the total annual variance in TC counts (statistically significant at the p ≪ 0.01 level). The skill of the model was diagnosed through ‘validation’ experiments [see Sabbatelli and Mann, 2007], wherein the full 137-year interval 1870–2006 was divided approximately into a first (1870–1938) and second (1939–2006) half. The statistical model was trained alternately using either the first or second half, with TC counts predicted for the other (validation) subinterval using only the climate state variables and the statistical relationship that was developed over the training interval. In these tests, the statistical model was found to successfully predict only slightly less (R^{2} = 43%; p ≪ 0.01) of the annual TC variance than was nominally resolved above during calibration over the full 1870–2006 period. Similar results were achieved for a bivariate Poisson regression using only the MDR and Niño3.4 series (Figure 2), though the skill estimates were found to be slightly lower (R^{2} = 43% for calibration and R^{2} = 36% for validation).
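The bookkeeping behind such a split-half validation can be sketched as follows. The helper names are hypothetical, and the definition of R^{2} used here (one minus the ratio of residual to total sum of squares) is an assumption; the text does not state which definition was applied.

```python
def split_years(first_year, last_year, last_training_year):
    """Return (early, late) lists of years, split after last_training_year."""
    early = list(range(first_year, last_training_year + 1))
    late = list(range(last_training_year + 1, last_year + 1))
    return early, late


def r_squared(observed, predicted):
    """Fraction of variance resolved: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# The two halves named in the text: 1870-1938 and 1939-2006.
early, late = split_years(1870, 2006, 1938)
print(len(early), len(late))

# Invented observed and predicted counts, for illustration only.
toy_observed = [6, 9, 12, 8, 10]
toy_predicted = [7, 9, 11, 8, 9]
print(round(r_squared(toy_observed, toy_predicted), 2))
```

In a validation experiment the predictions passed to `r_squared` would come from a model trained only on the other half, so the score measures out-of-sample skill.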
[12] We subsequently used the statistical model to investigate the issue of potential TC undercount bias. Our underlying assumption is that a properly bias-corrected record of TC counts should yield long-term relationships between TC activity and climate that are consistent over time. We therefore investigated the implications of our model given alternative possible adjustments of annual TC counts reflecting varying levels of assumed undercount prior to the aircraft reconnaissance period.
3. Results
[13] We first analyzed the scenario of no undercount bias, using the unadjusted TC count series in our analyses. We found (Figure 3a) that training over the modern 1944–2006 interval yielded predictions over the pre-reconnaissance (1870–1943) interval (mean annual count = 8.84 TCs) that slightly overpredicted the ‘observed’ TC counts (mean annual count = 7.65 TCs). The significance of this difference, which suggests an undercount of roughly 1.2 TCs per year prior to 1944, can be estimated based on the null hypothesis of a fixed mean Poisson process. Under this null hypothesis, and recognizing that the underlying samples are large (∼100), the difference between predicted and observed mean count rates can be approximated as t-distributed [see e.g., Wilks, 2005]. Application of Student's t-test indicates that the difference, while modest, is highly significant (Table 1). In other words, the predicted and observed means are statistically inconsistent.
Table 1. Results of t Tests for Differences Between Predicted and Observed Mean Occurrence Rates μ for Early and Late Prediction Intervals as Discussed in Text^{a}
Series                       t       Φ    p
Observed vs. Predicted Occurrence Rate (1870–1943)
Unadjusted Series            −3.71   73   0.0004
‘Lightly Adjusted’ Series    0.018        0.99
‘Heavily Adjusted’ Series    4.76         <0.0001
Observed vs. Predicted Occurrence Rate (1944–2006)
Unadjusted Series            3.00    62   0.004
‘Lightly Adjusted’ Series    0.17         0.87
‘Heavily Adjusted’ Series    −4.11        0.0001
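The comparison of means for the unadjusted early interval can be sketched in a few lines. The function name is hypothetical, and the form of the standard error (Poisson variance set equal to the observed mean, divided by the number of years) is an assumption about how the statistic was formed, not a procedure stated in the text. Under that assumption the result lands close to the value tabulated for the unadjusted 1870–1943 series.

```python
import math


def poisson_mean_t(observed_mean, predicted_mean, n_years):
    """t statistic for an observed versus a predicted mean annual rate.

    Assumption: standard error = sqrt(observed_mean / n_years), i.e. the
    variance of a Poisson count is taken to equal its mean.
    """
    standard_error = math.sqrt(observed_mean / n_years)
    t_stat = (observed_mean - predicted_mean) / standard_error
    degrees_of_freedom = n_years - 1
    return t_stat, degrees_of_freedom


n_early = 1943 - 1870 + 1  # number of years in the early interval
t_unadjusted, dof = poisson_mean_t(7.65, 8.84, n_early)
print(round(t_unadjusted, 2), dof)
```

Turning the statistic into a p-value additionally requires the Student t distribution, which the standard library does not provide; SciPy's `scipy.stats.t` is one option.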
[14] Motivated by this finding, we next considered a scenario which assumes a very modest early undercount of 1.2 TCs per year during the pre-reconnaissance period by constructing a ‘lightly adjusted’ TC series in which the annual TC count was simply increased by 1.2 over the early (1870–1943) subinterval. In this case, observed and predicted 1870–1943 means (8.85 and 8.84 respectively) are both visually (Figure 3b) and statistically (Table 1) indistinguishable.
[15] Finally, we considered a scenario which assumes a substantial early undercount of 3 TCs per year during the pre-reconnaissance period by constructing a ‘heavily adjusted’ TC series in which the annual TC count was increased by 3 over the 1870–1943 subinterval. In this case (Figure 3c) the actual mean count (10.65) is substantially higher than the predicted annual mean count (8.84), and the difference is highly significant (Table 1).
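The construction of the adjusted series amounts to adding a constant to every early-interval year. A minimal sketch follows; the function name and the four-year example counts are invented for illustration.

```python
def adjust_counts(years, counts, undercount, last_early_year=1943):
    """Add a constant undercount to every year up to last_early_year."""
    return [count + undercount if year <= last_early_year else count
            for year, count in zip(years, counts)]


# Invented counts straddling the 1943/1944 boundary.
years = [1942, 1943, 1944, 1945]
counts = [10, 10, 14, 11]

lightly_adjusted = adjust_counts(years, counts, 1.2)
heavily_adjusted = adjust_counts(years, counts, 3)
print(lightly_adjusted)
print(heavily_adjusted)
```

Because the shift is constant, the early-interval mean moves by exactly the assumed undercount, which is why the unadjusted mean of 7.65 becomes 8.85 and 10.65 in the two scenarios.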
[16] Generalizing the analysis further, we established that assumed undercounts of either less than 0.53 or more than 1.91 TCs per year yielded observed means that are statistically inconsistent with the predicted means (i.e., the null hypothesis of equal means can be rejected at the p < 0.05 level). Our analyses consequently suggest the average pre-1944 undercount to be between 0.5 and 2, with a most likely value of 1.2.
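One way to reproduce this kind of range is to scan candidate undercounts and keep those for which the t statistic stays inside the two-sided 5% critical value. The sketch below does so with three labeled assumptions: the standard error is sqrt(adjusted mean / number of years), the critical value of t for 73 degrees of freedom is taken as approximately 1.993, and the means are the rounded values quoted in the text. All names are hypothetical.

```python
import math

OBSERVED_MEAN = 7.65   # unadjusted 1870-1943 mean quoted in the text
PREDICTED_MEAN = 8.84  # model-predicted 1870-1943 mean quoted in the text
N_YEARS = 74           # 1870 through 1943 inclusive
T_CRITICAL = 1.993     # assumed two-sided 5% critical t value, 73 d.o.f.


def t_for_undercount(undercount):
    """t statistic after adding a constant undercount to the early mean.

    Assumption: Poisson variance equals the (adjusted) mean.
    """
    adjusted_mean = OBSERVED_MEAN + undercount
    standard_error = math.sqrt(adjusted_mean / N_YEARS)
    return (adjusted_mean - PREDICTED_MEAN) / standard_error


# Scan undercounts from 0 to 3 TCs per year in steps of 0.01.
candidates = [i / 100 for i in range(0, 301)]
accepted = [u for u in candidates if abs(t_for_undercount(u)) < T_CRITICAL]
lowest, highest = min(accepted), max(accepted)
print(lowest, highest)
```

Under these assumptions the accepted undercounts span roughly 0.5 to 1.9 TCs per year, in line with the bounds quoted above.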
[17] While a potential criticism of our analysis is that our pre-reconnaissance statistical model estimates might be compromised by expanded uncertainties in the SST data used prior to World War II, we find this unlikely to be the case. Similar results were obtained using any one of the three individual SST products in place of the blended SST product used above (see auxiliary material). As the different SST products employ different mixes of in situ and remotely observed SST measurements, and make different assumptions about corrections for, e.g., the early 20th century switch from bucket to ship intake measurements of seawater properties, our results appear to be robust with respect to uncertainties in SST data. Moreover, we obtained results consistent with those described above when the roles of the training and prediction periods were switched in our analysis (Figures 3d, 3e, and 3f; Table 1). When the statistical model was trained on the early (1870–1943) interval for which the data quality is ostensibly poorer, the predicted TC counts for the late (1944–2006) interval slightly underpredicted observed counts with the unadjusted TC count series, overpredicted observed counts for the ‘heavily adjusted’ scenario, and yielded predicted and observed means that were statistically indistinguishable for the ‘lightly adjusted’ scenario. Similar results were also obtained (see auxiliary material) using (1) only the two predictors MDR SST and Niño3.4, (2) Niño3 rather than Niño3.4 as a measure of ENSO, and (3) an alternative choice of division (1939 rather than 1944) between the ‘early’ and ‘late’ subintervals of the TC record. We conclude that our results are robust with respect to uncertainties in climate data and the other methodological details of our analysis.
Supporting Information
Auxiliary material for this article contains seven figures.