The alternating warm/cold phenomenon in the Pacific, known as the El Niño–Southern Oscillation (ENSO), is characterized by large perturbations to the worldwide climate. Indices have been defined to characterize this phenomenon. However, the commonly used indices contain an unwanted effect from the annual cycle that can be reduced by digital filtering. Applying a filtered ENSO index, NL, to data from 1856 to the present allows various quantities to be calculated more accurately. The new results are as follows. (1) The distribution of positive values of NL is Gaussian. Thus, large-magnitude El Niño events come from the tail of this distribution and not from some rare external perturbation. (2) The probability of occurrence of an El Niño of any magnitude can be calculated. An El Niño as large as that of 1997–1998 will occur about once every 70 ± 20 years, while an El Niño 25% larger will occur about once every 700 ± 200 years. (3) The distribution of negative values of NL deviates from Gaussian because of a deficiency of large La Niña events. (4) Examination of the 20 largest El Niño events since 1856 shows no increase in the frequency of such events with time.
 Anomalies in the sea surface temperature (SST) of particular equatorial regions of the Pacific Ocean show the El Niño/La Niña phenomena: alternating warm/cold regimes with periods of 2–7 years. Corresponding anomalies are observed in the pressure difference between Tahiti (central Pacific) and Darwin, Australia (western Pacific), and are characterized by the Southern Oscillation Index. Bjerknes was one of the first to report a correlation between these two phenomena: “… the temperature variations at the Pacific equator are associated with Sir Gilbert Walker's Southern Oscillation.” This correlation was made quantitative in a later paper by Rasmusson and Wallace, who determined that the correlation coefficient between equatorial SST and pressure anomalies was greater than 0.8. Many later investigations have demonstrated a strong correlation between the two phenomena, which together are called the El Niño–Southern Oscillation (ENSO). Various aspects of the ENSO phenomenon are well documented [Philander, 1990; Glantz, 2001].
 El Niño events are of special interest because of their potential to cause worldwide climate effects. For example, the El Niño of 1982–1983 caused both droughts and floods, with an estimated death toll in the thousands [Glantz, 2001]. It was called the “El Niño of the Century.” Fifteen years later came the El Niño of 1997–1998, which was “bigger,” “hotter,” and even more devastating, and which was also called the “El Niño of the Century” [Glantz, 2001]. Missing from these characterizations are precise scientific definitions of the magnitude and timing of an event, both of which are of great importance. Quinn gives a semiquantitative definition of the “strength” of an El Niño to describe 125 events from 1479 to 1990. He used the terms very weak, weak, moderate, strong, and very strong, each with an optional plus or minus, yielding a 15-level scale. This scale, however, is based on a variety of subjective factors, such as drought and famine reports, that other scientists might interpret differently. Similarly, the dates of an event are described only as early, mid, or late, resulting in an ambiguity of at least 1 year. More precise definitions are given in this paper.
 To characterize El Niño/La Niña events and to test models, useful indices are needed. Such indices should not be “contaminated” by an unwanted signal from the annual cycle. The temperature anomaly indices are constructed from SST measurements that contain a strong annual-cycle signal, which is “removed” by various techniques; however, a small component remains. This component can be further reduced by the digital filter described below, producing a modified index NL that allows quantitative determination of quantities such as the magnitudes of events, the distribution of amplitudes, and their asymmetry.
 Observations also show that there is an asymmetry between the warm and cold phases of the ENSO cycle. The term “El Niño/La Niña asymmetry” usually means that the amplitudes of El Niño events are larger than those of La Niña events [Hoerling et al., 1997; Jin et al., 2003; An and Jin, 2004]. This definition is, however, not quantitative. Quantitative measures of asymmetry include the skewness S, which measures deviations from Gaussian behavior (described in Appendix A). Skewness has been shown to be important in a number of studies of nonlinear processes [An and Jin, 2004; Burgers and Stephenson, 1999; Hannachi et al., 2003]. Although S is a measure of asymmetry, it is ambiguous. An excess of positive events (El Niños) over that expected from a Gaussian distribution may lead to a positive S. However, a positive S will also occur if there is a deficiency of negative events relative to a Gaussian, as will be shown in this paper. Another common measure of deviation from Gaussian behavior is the kurtosis K. It measures the relative contribution of the “tails,” but because it is an even moment, insensitive to the sign of the deviations, it is not a measure of asymmetry. The probability density function (pdf) of the index values itself is a better starting point, because quantitative measures of asymmetry other than S may be obtained from it. A number of studies of the pdf of various ENSO SST time series have been made [Burgers and Stephenson, 1999; Hannachi et al., 2003; Penland and Sardeshmukh, 1995; Trenberth and Hoar, 1996]. In particular, Burgers and Stephenson show asymmetry in the pdfs of the four ENSO SST indices, Nino12, Nino3, Nino3.4, and Nino4, for the period 1950–1997. However, none of these studies considers the pdfs of the positive and negative values separately, as is done in this paper.
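To illustrate the ambiguity of S described above, the following minimal sketch draws a synthetic Gaussian sample and then removes values below a hypothetical cutoff of −1.5σ, mimicking a deficiency of large negative events. No positive values are added, yet S becomes clearly positive. The data are synthetic, not the Nino record, and the cutoff is an illustrative assumption (the break reported in section 3 is a change of slope, not a hard truncation).

```python
import numpy as np


def skewness(x):
    """Sample skewness S = mean(((x - mean) / sigma)**3), sigma = population std."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)


rng = np.random.default_rng(42)
gaussian = rng.standard_normal(100_000)

# Hypothetical "deficiency of large negative events": drop values below -1.5 sigma.
# Nothing is added on the positive side.
deficient = gaussian[gaussian > -1.5]

print(round(skewness(gaussian), 3))   # close to 0
print(round(skewness(deficient), 3))  # clearly positive (about 0.4 in theory)
```

A positive S alone therefore cannot distinguish an excess of El Niños from a shortage of large La Niñas; examining the two halves of the pdf separately can.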
 Some models with nonlinear dynamics lead to deviations from a Gaussian pdf [Jin et al., 2003]. The converse does not hold: a non-Gaussian pdf is not necessarily evidence for nonlinear dynamics, because models with linear dynamics plus stochastic forcing can also produce a nonzero S [Sardeshmukh and Sura, 2008]. Thus, Gaussian-deviation metrics such as S do not provide sufficient information to distinguish among the various models, and additional characterizations of the pdf of the observational data are needed.
 This paper is organized as follows. In section 2, the data sources are given, a filter that further reduces the effect of the annual cycle is described, and the filtered index NL is introduced. In section 3, the data are analyzed, and histograms of the index values reveal an asymmetry. Discussion is given in section 4, and the summary in section 5.
2. Data and Methods
2.1. Data Sources
 In the early 1980s, four geographic regions and their corresponding temperature anomaly indices, all with “Nino” in their names, were introduced by researchers at the Climate Analysis Center [Barnston et al., 1997] (now the Climate Prediction Center (CPC)). Nino1 is a small region off the coasts of Peru and Ecuador. Nino2 is just north of Nino1. Nino3 straddles the equator and extends longitudinally between 90°W and 150°W. Nino4 also straddles the equator and is just west of Nino3. Nino1 and Nino2 have since been combined into a single index, Nino12. Barnston et al., in a general study aimed at finding the location in the tropical Pacific with the strongest correlation with the core ENSO phenomenon, found that a region overlapping Nino3 and Nino4 was best. They introduced a new index, Nino3.4, that “…[may] be regarded as an appropriate general SST index of the ENSO state by researchers, diagnosticians, and forecasters.” The longitude and latitude ranges of the regions are given in Table 1. Monthly values of the average SST for the four Nino regions and their indices are provided by the CPC; the values begin in 1950 and are updated monthly (data at http://www.cpc.ncep.noaa.gov/data/indices/sstoi.indices).
Table 1. Longitude and Latitude Range of the Regions
150°W to 160°E
Kaplan et al. have constructed values of the Nino indices extending back to 1856; these are used in this paper. The Kaplan Nino3.4 index is shown in Figure 1.
2.2. Methods and Definitions
 A widely used method to reduce the effect of the annual cycle is to subtract from each measured monthly SST value the climatological value for that calendar month, where the climatology is the set of monthly SST values averaged over a specified number of years. Some variation of this method has been used to produce the Nino index values. As shown below, part of the annual-cycle effect still remains.
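The climatology subtraction described above can be sketched as follows. This is a minimal illustration, assuming a monthly series that starts in January; the base-period arguments `first_year` and `n_years` are hypothetical parameters and do not reproduce the base periods actually used for the CPC or Kaplan indices.

```python
import numpy as np


def monthly_anomalies(sst, first_year=0, n_years=None):
    """Return (anomalies, climatology) for a monthly series starting in January.

    The climatology is the average SST for each calendar month over the base
    period [first_year, first_year + n_years), with years counted from the
    start of the series; the whole record is used if n_years is None.
    """
    sst = np.asarray(sst, dtype=float)
    month = np.arange(sst.size) % 12  # 0 = January, ..., 11 = December
    start = 12 * first_year
    stop = sst.size if n_years is None else start + 12 * n_years
    base_sst, base_month = sst[start:stop], month[start:stop]
    climatology = np.array([np.nanmean(base_sst[base_month == m]) for m in range(12)])
    return sst - climatology[month], climatology


# Synthetic example: a pure annual cycle plus one warm year (year 5).
t = np.arange(120)  # 10 years of monthly values
sst = 26.0 + 1.5 * np.cos(2 * np.pi * t / 12)
sst[60:72] += 0.5
anom, clim = monthly_anomalies(sst, first_year=0, n_years=4)
print(anom[60:72].round(3))  # the 0.5 warm anomaly, with the annual cycle removed
```

With real data the annual cycle varies from year to year, so a fixed climatology cannot remove it exactly, which is the residual effect the text addresses.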
 Another way to reduce an unwanted signal that occurs at a particular frequency f0 (here, the annual cycle) in a data time series is to pass the data through a low-pass filter whose cutoff frequency is below f0. The digital filter used in this study to remove more of the annual cycle is described next.
2.2.1. The 12 Month Digital Filter F
 Consider monthly time series data that have been put through a digital filter,
This filter is a low-pass filter: frequencies f well below (1/12) month−1 pass with little attenuation, while higher frequencies are reduced in magnitude. The monthly time resolution of the original time series is preserved, but six future values are required. This particular low-pass filter also has an important property that is not generally recognized. The Fourier transform of F12 is H12(f) = sin(12πf)/sin(πf), which is zero at every multiple of the frequency f = (1/12) month−1 up to the Nyquist frequency of 0.5 month−1 [Smith, 1997]. Thus, this filter “removes” signals whose frequency is exactly (1 yr)−1, together with all harmonics of (1 yr)−1.
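The zeros of H12 can be checked numerically. Because the defining equation for F is not reproduced here, this sketch assumes F is an unweighted 12-month running mean whose window for month t covers months t−5 through t+6 (so that six future values are used); the function names `running_mean_12` and `response_12` are hypothetical. The 1/12 normalization gives unit gain at f = 0 and does not change the location of the zeros.

```python
import numpy as np


def running_mean_12(x):
    """Hypothetical form of F: unweighted 12-month running mean.

    Month t receives the mean of x[t-5], ..., x[t+6] (five past months, the
    current month, six future months). The first 5 and last 6 months, where
    the window is incomplete, are returned as NaN.
    """
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    window_means = np.convolve(x, np.ones(12) / 12.0, mode="valid")  # length N - 11
    out[5:5 + window_means.size] = window_means
    return out


def response_12(f):
    """Gain sin(12*pi*f) / (12*sin(pi*f)) of the normalized filter; f in cycles/month."""
    f = np.atleast_1d(np.asarray(f, dtype=float))
    num = np.sin(12 * np.pi * f)
    den = 12 * np.sin(np.pi * f)
    gain = np.ones_like(f)  # limit value at f = 0
    ok = ~np.isclose(den, 0.0)
    gain[ok] = num[ok] / den[ok]
    return gain


t = np.arange(600)  # 50 years of synthetic monthly data
annual = 0.5 * np.cos(2 * np.pi * t / 12 + 0.3) + 0.2 * np.sin(2 * np.pi * t / 6)
slow = np.cos(2 * np.pi * t / 48)  # a 4-year oscillation

print(np.nanmax(np.abs(running_mean_12(annual))))  # effectively zero (rounding error only)
print(response_12([1 / 12, 2 / 12, 6 / 12]))       # zeros, to rounding, at annual harmonics
print(response_12(1 / 48))                          # about 0.90: slow signal mostly passed
```

The annual cycle and its semiannual harmonic vanish after filtering, while a 4 year oscillation keeps about 90% of its amplitude.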
2.2.2. Filtered Index NL
 The annual effect in the Nino SST anomalies is further reduced by applying the filter F to the Nino index. The filtered index NL is defined as
(the subscript L stands for low pass). Others [Tziperman et al., 1997] have used this 12 month filter F on Nino index time series, but without naming the resulting quantity. The reduction of the annual-cycle effect can be quantified from the Fourier spectra of Nino3.4 and NL(Nino3.4): the power integrated under the annual-frequency peak is 21% of the total spectral power for Nino3.4 but only 14% for NL(Nino3.4). Both Nino3.4 and NL(Nino3.4) are shown in Figure 1. Unless stated otherwise, all Nino indices are hereafter understood to be NL(Nino) indices.
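The fraction of spectral power under the annual peak can be estimated from a periodogram. This is a minimal sketch; the half-width of the “annual peak” band (`half_width`, in cycles per month) is a hypothetical choice, and the paper's exact integration limits are not stated here, so it will not necessarily reproduce the 21% and 14% values.

```python
import numpy as np


def annual_band_fraction(x, half_width=0.01):
    """Fraction of periodogram power within +/- half_width (cycles/month) of 1/12.

    NaNs (e.g., the incomplete ends of a filtered series) are dropped and the
    mean is removed before the FFT; the zero-frequency bin is excluded.
    """
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0)  # monthly sampling -> cycles per month
    power, freqs = power[1:], freqs[1:]
    in_band = np.abs(freqs - 1.0 / 12.0) <= half_width
    return power[in_band].sum() / power.sum()


t = np.arange(600)  # 50 years of synthetic monthly data
mixed = 0.5 * np.cos(2 * np.pi * t / 12) + np.cos(2 * np.pi * t / 60)
print(round(annual_band_fraction(mixed), 3))  # 0.2 = 0.5**2 / (0.5**2 + 1**2)
```

Power scales with amplitude squared, so an annual component with half the amplitude of a 5 year oscillation carries one fifth of the total power.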
2.2.3. Definition of an El Niño/La Niña Event
 A quantitative definition of an El Niño/La Niña event is needed. The Quinn definition described above is not satisfactory because it does not give a numerical value. Trenberth proposed the following definition based upon Nino3.4: an El Niño event occurs when a 5 month running mean of Nino3.4 exceeds a threshold of 0.4°C, and a La Niña event occurs when it falls below −0.4°C. The Trenberth definition is widely used in forecasting the beginning of an event. However, it is not useful here because it gives neither the magnitude of the event nor the date of its maximum magnitude. In this paper, an El Niño/La Niña event is defined by its maximum or minimum. From the NL(Nino3.4) values of amplitude versus time, one determines the following:
 1. Date: the date when there is a maximum (El Niño) or minimum (La Niña) in the amplitude can be determined with a relative accuracy of ±1 month. The conventional definition of the date of an El Niño/La Niña event as the “beginning” of the event is ambiguous and will give earlier dates.
 2. Amplitude: value at the date determined in (1).
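The date and amplitude definitions above amount to locating local extrema of the NL series. A minimal sketch follows; the amplitude cutoff `threshold` is a hypothetical parameter (the text does not give one), and `month_to_date` assumes index 0 corresponds to January 1856, the start of the Kaplan record.

```python
import numpy as np


def find_events(nl, threshold=0.5):
    """Return (el_ninos, la_ninas) as lists of (month_index, amplitude).

    El Niño: local maximum with amplitude >= threshold.
    La Niña: local minimum with amplitude <= -threshold.
    threshold is a hypothetical cutoff; NaN neighborhoods are skipped.
    """
    nl = np.asarray(nl, dtype=float)
    warm, cold = [], []
    for t in range(1, nl.size - 1):
        prev, cur, nxt = nl[t - 1], nl[t], nl[t + 1]
        if np.isnan(prev) or np.isnan(cur) or np.isnan(nxt):
            continue
        # On a flat top or bottom, the first month of the plateau is reported.
        if cur >= threshold and cur > prev and cur >= nxt:
            warm.append((t, float(cur)))
        elif cur <= -threshold and cur < prev and cur <= nxt:
            cold.append((t, float(cur)))
    return warm, cold


def month_to_date(index, start_year=1856):
    """Convert a month index to (year, month), assuming index 0 is January of start_year."""
    return start_year + index // 12, index % 12 + 1


warm, cold = find_events([0, 0.3, 1.2, 0.8, -0.2, -1.0, -0.6, 0.1, 0.4, 0.2])
print(warm, cold)  # one El Niño maximum at index 2, one La Niña minimum at index 5
```

Because NL is already smoothed over 12 months, simple neighbor comparisons are less prone to spurious extrema than they would be on the unfiltered index.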
3. Analysis of Data
Figure 1 shows Nino3.4 and NL(Nino3.4), in which one sees in both time series the well-known El Niño and La Niña events: the El Niño events of 1972–1973, 1982–1983, and 1997–1998 are clearly evident as large maxima.
3.1. Distribution of Amplitudes
 Various statistical quantities of Nino12, Nino3, Nino3.4, and Nino4 were computed. Table 2 gives the trend, standard deviation, and variance. Also given are the metrics for deviation from a Gaussian probability density function (pdf): skewness S and kurtosis K.
Figure 2 shows the histogram of n, the number of occurrences of an amplitude a in a bin of width b (= 0.1 K), versus the amplitude a for Nino12, Nino3, Nino3.4, and Nino4. These data are compared to a Gaussian distribution function,
$$n(a) = \frac{n_0}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{a^2}{2\sigma^2}\right), \qquad (3)$$
where σ is the standard deviation, N0 is the total number of values, and n0 = N0b (see Appendix A). Also plotted is a Gaussian pdf that has been fit to only the positive values of Nino3.4 (next paragraph). Note the deficiency of large negative values relative to the Gaussian distribution.
 To quantify the asymmetry, the positive and negative amplitudes were considered separately. Figure 3a shows log(n) versus a² for the positive values. For a zero-mean Gaussian, log(n) is a linear function of a², so the distribution appears on this plot as a straight line whose negative slope is set by σ. The black line through the Nino3.4 values fits the Gaussian distribution (3) with parameters
The values of the three other Nino indices follow this same distribution.
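Such a fit can be sketched as a weighted linear regression of ln n on a², since ln n = ln A − a²/(2σ²) for a zero-mean Gaussian. This is a minimal illustration on synthetic data, not the paper's fitting procedure: the weighting by √n (the approximate inverse standard error of ln n for Poisson counts) and the minimum bin count `min_count` are assumptions.

```python
import numpy as np


def fit_positive_gaussian(values, bin_width=0.1, min_count=5):
    """Fit ln n(a) = ln A - a**2 / (2 sigma**2) to the histogram of positive values.

    Returns (A, sigma). Bins with fewer than min_count entries (a hypothetical
    cutoff) are skipped because ln n is undefined or very noisy there.
    """
    v = np.asarray(values, dtype=float)
    v = v[v > 0]
    edges = np.arange(0.0, v.max() + bin_width, bin_width)
    counts, edges = np.histogram(v, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = counts >= min_count
    # Weight each bin by sqrt(count): the standard error of ln n is about 1/sqrt(n).
    slope, intercept = np.polyfit(centers[keep] ** 2, np.log(counts[keep]),
                                  1, w=np.sqrt(counts[keep]))
    return np.exp(intercept), np.sqrt(-1.0 / (2.0 * slope))


rng = np.random.default_rng(1)
sample = rng.normal(0.0, 0.8, 200_000)  # synthetic anomalies, sigma = 0.8 K
A, sigma = fit_positive_gaussian(sample)
print(round(sigma, 3))  # close to 0.8
```

Fitting only the positive half means a shortage of large negative values, as in Figure 3b, cannot bias the estimated σ.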
Figure 3b shows that the negative values follow this same distribution down to about a ≈ −0.9 K, where there is a break to a steeper slope, indicating that an effect absent for positive amplitudes becomes important. At larger negative magnitudes, the values from the eastern Pacific (Nino12) fall off less steeply than those from the central Pacific (Nino4). This may be associated with the shallow mixed layer in the eastern Pacific.
3.2. Large Magnitude El Niño/La Niña Events
 The Nino3.4 and NL(Nino3.4) time series since 1856 were examined, and a list of the largest positive events (El Niños) and negative events (La Niñas) was compiled. Table 3 lists the 20 largest El Niño and 14 largest La Niña events, ordered by NL(Nino3.4) magnitude. No El Niño/La Niña event listed by Quinn or Trenberth is missing from Table 3. The ordering by NL(Nino3.4) magnitude differs from that by Nino3.4 magnitude because the Nino3.4 time series contains a residual annual-cycle component that is not an ENSO signal. The El Niño of 1997–1998 ranks first in both schemes; however, the 1972 El Niño, third by Nino3.4 magnitude, is only twelfth in the NL(Nino3.4) ranking.
 When fitted separately, the 20 largest El Niños listed in Table 3, which are part of the positive distribution shown in Figure 3a, are described by the same Gaussian distribution.
4. Discussion
 The distribution of NL values is asymmetric. Specifically, the distribution of positive values is Gaussian, while the distribution of negative values deviates from Gaussian beyond a certain amplitude because of a deficiency of large La Niña events. The magnitude of the deviation increases from the eastern Pacific to the central Pacific.
 Some models show that nonlinearities in the fundamental physical processes lead to asymmetry (see An and Jin for a review). Other studies show that models with linear dynamics produce Gaussian pdfs and that non-Gaussian distributions are evidence either for nonlinear processes or for linear processes plus stochastic noise; see the discussion by Sardeshmukh and Sura. However, none of these models adequately describes the asymmetry reported in this paper.
 The positive amplitudes of NL(Nino) were found to be Gaussian distributed. Thus, large-amplitude El Niños are merely rare events from the “tail” of this distribution, not the result of a new climate perturbation. The negative amplitudes, however, follow the same Gaussian distribution for small magnitudes but deviate beyond a particular value, suggesting an additional process. The result is significantly fewer large negative events than the Gaussian predicts. Thus, the skewness of the amplitude distribution is determined not by “extra” positive events but by a deficiency of large negative events.
4.2. Frequency of Large-Amplitude El Niños
 The question of whether there have been more El Niño events since the late 1970s than in earlier periods has been discussed [Mendelssohn et al., 2005; Trenberth and Hoar, 1996; Rajagopalan et al., 1997; Harrison and Larkin, 1997]. In Figure 4, the amplitudes of the 20 largest El Niño events (Table 3) are plotted versus date. When the record is divided into three 50 year segments (early, middle, and recent), the segments contain 7, 6, and 7 large El Niños, respectively, suggesting that the frequency of large El Niño events has not increased with time. The increasing background trend reported by Mendelssohn et al. is not inconsistent with this result.
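One simple way to quantify how compatible the counts 7, 6, and 7 are with a constant rate is a Pearson chi-square test against equal expected counts. This test is not part of the original analysis; it is a hedged illustration. The segment boundaries (starting in 1856, 50 years wide) are an assumption, and with only 20 events such a test can detect only large changes in rate. For three segments (2 degrees of freedom) the p-value has the closed form exp(−χ²/2), which is all the code relies on.

```python
import math


def counts_per_segment(years, start=1856, width=50, n_segments=3):
    """Count event years in consecutive segments [start, start + width), ..."""
    counts = [0] * n_segments
    for y in years:
        k = (y - start) // width
        if 0 <= k < n_segments:
            counts[k] += 1
    return counts


def chi_square_equal(counts):
    """Pearson chi-square statistic against equal expected counts, and its p-value.

    The closed-form p-value exp(-chi2 / 2) holds only for 2 degrees of freedom,
    i.e., exactly 3 segments.
    """
    if len(counts) != 3:
        raise ValueError("the p-value formula here assumes exactly 3 segments")
    expected = sum(counts) / 3
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return chi2, math.exp(-chi2 / 2)


chi2, p = chi_square_equal([7, 6, 7])  # segment counts quoted in the text
print(round(chi2, 3), round(p, 3))     # chi2 = 0.1, p about 0.95
```

A p-value near 0.95 means the observed counts are entirely consistent with an unchanging rate of large El Niños.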
4.3. Probability of Large-Amplitude El Niños
 Because the distribution of positive NL values, which includes the large El Niños, is Gaussian, the probability of an El Niño of any large magnitude can be estimated using equations (4) and (A5). For example, an El Niño as large as that of 1997–1998 is predicted to occur approximately once every 70 ± 20 years. How often would an El Niño 25% larger occur? Approximately once every 700 ± 200 years.
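The return-period estimate can be sketched in a few lines. This sketch assumes that the positive monthly values form the positive half of a zero-mean Gaussian with standard deviation σ, so that a fraction erfc(x/(√2σ)) of the 6y positive values in y years exceed x, and that the return period is the y for which this expected count equals 1. The values of `sigma` and `x_big` below are hypothetical, not the fitted parameters of equation (4) or the measured 1997–1998 amplitude, so the sketch does not reproduce the 70 and 700 year figures.

```python
import math


def return_period_years(x, sigma, positive_per_year=6.0):
    """Years needed for the expected number of monthly values above x to reach 1.

    Assumes the positive values are the positive half of a zero-mean Gaussian
    with standard deviation sigma, so a fraction erfc(x / (sqrt(2) * sigma))
    of the positive_per_year * y positive values in y years exceed x.
    """
    exceed_fraction = math.erfc(x / (math.sqrt(2.0) * sigma))
    return 1.0 / (positive_per_year * exceed_fraction)


sigma = 0.8  # hypothetical standard deviation (K)
x_big = 2.0  # hypothetical large-event amplitude (K)
print(return_period_years(x_big, sigma), return_period_years(1.25 * x_big, sigma))
```

Because the Gaussian tail falls off faster than exponentially, a modest increase in amplitude multiplies the return period many times over, which is why a 25% larger event is so much rarer.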
5. Summary
 New results were obtained from a study of NL, a modified (low-pass filtered) version of the Pacific sea surface temperature El Niño/La Niña indices.
 The distribution of NL values is asymmetric. The positive values are shown to be Gaussian distributed, while the negative values deviate from Gaussian because of a deficiency of large La Niña events. Thus, large-magnitude El Niño events come from the tail of the Gaussian distribution and not from some rare external perturbation.
 The study of the 20 largest El Niños shows that their occurrence is not more frequent now than in the past.
 None of the previously published models adequately describe the results of this paper.
 The most important practical application of these results is that the probability of an El Niño of a particular amplitude can be calculated from the Gaussian distribution. A 1997–1998 El Niño will occur once in approximately 70 ± 20 years and one 25% larger will occur once in approximately 700 ± 200 years.
Appendix A: Statistical Analysis
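 Consider a set of scalar quantities: x1, x2, …, xN. The mean, variance, skewness, and kurtosis are
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \quad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^2, \quad S = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i-\bar{x}}{\sigma}\right)^3, \quad K = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i-\bar{x}}{\sigma}\right)^4 - 3,$$
where σ is called the standard deviation.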
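These four statistics can be computed directly. This minimal sketch uses 1/N normalization throughout and assumes K is the excess kurtosis (fourth standardized moment minus 3), so that both S and K are zero for a Gaussian; if the original definition omits the −3, subtract nothing in the last line.

```python
import numpy as np


def moments(x):
    """Mean, variance, skewness S, and excess kurtosis K (1/N normalization)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    var = np.mean((x - mean) ** 2)
    z = (x - mean) / np.sqrt(var)  # standardized values
    return mean, var, np.mean(z ** 3), np.mean(z ** 4) - 3.0


print(moments([1, 2, 3, 4, 5]))  # mean 3, variance 2, S = 0, K approximately -1.3
```

A flat, short-tailed set such as 1 to 5 has negative K; a Gaussian sample gives K near 0.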
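 If N is large, the number ΔN(x) of values between x and x + Δx is
$$\Delta N(x) = n_0\,p(x),$$
where p(x) is the probability density function (pdf), N0 is the total number of values, and n0 = N0Δx. The Gaussian pdf is
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{\left(x-\bar{x}\right)^2}{2\sigma^2}\right],$$
where σ is the standard deviation and the mean x̄ = 0. The number of values exceeding x is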
where erfc is the complementary error function. For the monthly data used in the text, the number of positive values is N0 = (1/2)(12y) = 6y, where y is the number of years and the factor 1/2 arises because only half of the values are positive. Thus,
The number of years to obtain N+(x) = 1 for a given x is