Extreme-Event Magnetic Storm Probabilities Derived From Rank Statistics of Historical Dst Intensities for Solar Cycles 14 – 24

Intense magnetic storms are hazardous for technological operations and infrastructure (e.g., Cannon et al., 2013; Daglis, 2005; Thomson, 2007). The magnetic storm of March 1989, for example, caused the collapse of the Canadian Hydro-Québec electricity transmission grid (Bolduc, 2002), caused numerous operational “anomalies” in United States power-grid systems (North American Electric Reliability Corporation, 1990), and damaged a high-voltage transformer at a nuclear power plant in Salem, New Jersey (Barnes et al., 1991; Rossi, 1990). The same storm damaged satellites and interfered with satellite operations, and it disrupted over-the-horizon radio communication and geophysical surveys around the world (e.g., Allen et al., 1989; Boteler, 2019). The magnetic storm of May 1921 caused widespread disruption to radio communication and telegraph and telephone systems (e.g., Hapgood, 2019; Silverman, 2001), and, notably, it caused fires in telegraph stations used by railroad companies in New York City and State (e.g., Love et al., 2019a). The magnetic storm of September 1909 (e.g., Hayakawa, Ebihara, Cliver, et al., 2019; Love Abstract A compilation is made of the largest and second-largest magnetic-storm-maximum intensities, −Dst1 and −Dst2, for solar cycles 14–24 (1902–2016) by sampling Oulu Dcx for cycles 19–24, using published −Dstm values for 4 intense storms in cycles 14, 15, and 18 (1903, 1909, 1921, 1946), and calculating 15 new storm-maximum −Dstm values (reported here) for cycles 14–18. Three different models are fitted to the cycle-ranked −Dst1 and −Dst2 values using a maximum-likelihood algorithm: A Gumbel model, an unconstrained Generalized-Extreme-Value model, and a Weibull model constrained to have a physically justified maximum storm intensity of −Dstm = 2500 nT. All three models are good descriptions of the data. Since the best model is not clearly revealed with standard statistical tests, inference is precluded of the source process giving rise to storm-maximum −Dstm values. Of the three candidate models, the constrained Weibull gives the lowest superstorm occurrence probabilities. Using the compiled data and the constrained Weibull model, a once-per-century storm intensity is estimated to be −Dst1 = 663 nT, with a bootstrap 68% confidence interval of [497, 694] nT. Similarly, the probability that a future storm will have an intensity exceeding that of the March 1989 superstorm, −Dstm > 565 nT, is 0.246 per cycle with a 68% confidence interval of [0.140, 0.311] per cycle. Noting (possibly slight) ambiguity in the rankings of storm intensities, using the same methods, but storms more intense than those identified for cycles 14–16, would yield a higher once-per-century intensity and a higher probability for a −Dstm > 565 nT storm.

A standard measure of storm-time geomagnetic disturbance is Dst, a scalar time series index formed by averaging geomagnetic variation recorded at low-latitude, ground-based observatories (e.g., Mayaud, 1980;Menvielle et al., 2011). Dst is often interpreted in terms of the westward-directed magnetospheric ring current (e.g., Daglis, 2006). The intensification of this current during main-phase development of a magnetic storm causes a decrease in low-latitude, horizontal-component geomagnetic field strength, and, correspondingly, a decrease in Dst from its prestorm, near-zero value (e.g., Loewe & Prölss, 1997). This decrease in Dst is sometimes taken as the definition of a magnetic storm (e.g., Gonzalez et al., 1994). Storm-maximum −Dst m intensity values sampled from long durations of Dst are often used in statistical analyses of storm-occurrence rates (e.g., Love, 2020;Riley, 2018;Tsubouchi & Omura, 2007;Yermolaev et al., 2013). The greatest −Dst m values represent the most intense storms. These are usually driven by interplanetary coronal-mass ejections (CMEs) (e.g., Gonzalez et al., 2011;Richardson & Cane, 2012) originating from solar active regions and coronal filaments (e.g., Gopalswamy et al., 2010;Kilpua et al., 2017;Webb & Howard, 1994). The occurrence probability of intense storms waxes and wanes in broad correlation with solar-cycle increases and decreases in sunspot number (e.g., Chapman et al., 2020;Echer et al., 2011;Le et al., 2012), though the intensity and detailed evolution of each magnetic storm depend, ultimately, on the geoeffectiveness of solar-wind coupling with the Earth's magnetosphere (e.g., Lyon, 2000;Zhang et al., 2004).
While recognizing that Dst describes only part of each storm's complex evolution (e.g., Borovsky & Shprits, 2017;Kamide, 2006;Lanzerotti, 1992), we note that it is often used in projects for estimating space-weather hazards. This can be understood in terms of the storm-substorm relationship (e.g., Daglis et al., 2003;Kamide et al., 1998). With main-phase intensification of the magnetospheric ring current, as approximately measured by Dst, the auroral oval widens and slips to lower latitudes (e.g., Milan et al., 2009;Siscoe, 1976b;Yokoyama et al., 1998). Then, with the diversion of substorm currents along field lines from the ring current, through the ionosphere, and back out to the ring current (e.g., Ganushkina et al., 2017;Welling, 2019), geoelectric fields are induced in the solid Earth across mid-latitudes. Consequently, Dst plays an important role, though not a singular role, in projects for estimating magnetic storm hazards for electric power grids (e.g., Lucas et al., 2020;Ngwira et al., 2013;Oughton et al., 2017;Pulkkinen et al., 2012;Woodroffe et al., 2016). Thermospheric density, which affects drag on low-orbiting satellites, and spacecraft charging, which can damage onboard electronics, are observed to be correlated with intensification of −Dst (e.g., Bowman et al., 2008;Ganushkina et al., 2017;Guo et al., 2010;Oliveira & Zesta, 2019) and mid-latitude geomagnetic disturbance (e.g., Koskinen et al., 2010;Qian & Solomon, 2012). Disturbances of the ionosphere, the source of storm time interference to over-the-horizon radio communication and global-positioning systems (GPS), are correlated with abrupt changes in Dst (e.g., Astafyeva et al., 2014;Basu et al., 2010).
In light of the history of the impacts of space weather, some scenarios anticipate that future magnetic "superstorms," often defined in terms of extreme values of Dst (e.g., Cliver & Dietrich, 2013;Gonzalez et al., 2011;Lakhina & Tsurutani, 2018), could bring widespread interference and damage to technological systems (e.g., Kappenman, 2012;Riley et al., 2018) and carry significant economic cost (e.g., Baker et al., 2008;Eastwood et al., 2017;Schulte in den Bäumen et al., 2014). For this reason, national and international space-weather projects (e.g., National Science and Technology Council, 2019; Schrijver et al., 2015) have identified prediction and long-term forecasting of intense storms as priorities for scientific investigation (e.g., Baker, 2002;Hapgood, 2011;Morley, 2020). In this context, we assemble and develop a dataset of storm-maximum intensities −Dst m for solar cycles 14-24 (1902-2016). We model the statistics of the most intense and second most intense storms per solar cycle using the mathematical formalism of extreme-value theory for ranked data. We examine possible secular change in the data and possible correlations of the −Dst m values with sunspot number. We estimate future storm occurrence probabilities and associated confidence intervals. Results inform projects for improving societal resilience to space-weather hazards (e.g., Green et al., 2016;Jonas et al., 2016). (for four-station versions of the indices) is about 3%. Since the many-observatory versions of Oulu Dst involve a lot more geographic averaging than a four-observatory Dst, the many-observatory version is less affected by localized differences in geomagnetic disturbance. In this respect, the many-observatory version of the index is more accurate than a four-station Dst. At storm maximum, the average relative difference between the Oulu four-observatory and multiobservatory versions is about 8%. Mindful of these issues, in our statistical analysis of storm intensities, for cycles 19-24 (1957-2016), we use the Oulu corrected storm-maximum −Dst m values, and, when available, we use the multiobservatory version of Oulu −Dst m , Section 5; we do not use Kyoto Dst m .

10.1029/2020SW002579
Estimating Dst for years prior to IGY (before 1957) can be challenging due to different and changing data-reporting conventions and the geographic sparsity of observatories. We have two related quibbles that affect the quality of the pre-IGY extension of the four-observatory version of the Oulu Dst index back to 1932, which is calculated using data from Cape Town (CTO) South Africa, KAK, HON, and SJG. The first concerns timestamp conventions. Before and including 1956, for example, KAK observatory data are reported in yearbooks as spot values with timestamps at the top of each UT hour (00:00, 01:00, etc.) (Kakioka Magnetic Observatory, 1959, p. 9); on the other hand, the pre-IGY HON and SJG data (all the way back to 1915) follow the post-IGY convention of hourly averages centered on the bottom of the UT hour (e.g., Hazard, 1918, p. 5); the same is true for all of the CTO data back to 1932 (e.g. Magnetic Observatory, University of Cape Town, 1944). These important issues are not recorded in the digital files held by the WDS, but they can make a difference when averaging the data to calculate Dst-for given HON,SJG,and CTO data (from,say,00:30 UT), which KAK data value should be used (00:00 or 01:00) in an average? Linear interpolation of the KAK data to obtain values at the bottom of the hour results in degraded 2-h resolution. We do not know how the KAK timestamp is handled for pre-IGY Oulu Dst. Our second quibble concerns the reporting of data gaps. For some magnetic storms, observatory hourly values are fillers intended to indicate that either the light trace of the analog magnetometer system had wandered off the edge of the photographic paper (as we discuss in Section 2, during periods of great geomagnetic disturbance), or, for whatever reason, an hourly value was difficult to estimate. Filler values should be treated as data gaps; they should not be used in calculating Dst. The presence of fillers is indicated in observatory yearbooks (such as with footnotes or other notations), but this important information is not given in the WDS files. A difficult example of this is seen with the HON data for the July 1941 storm, three hourly values are fillers. Unfortunately, the 1941 HON yearbook, held in National Oceanic and Atmospheric Administration archives in Boulder, Colorado, was never formally published (likely due to priorities brought by the World War); a photograph of the relevant yearbook page for HON July 1941 is given in a supplementary file accompanying this report. Inspection of the HON July 1941 time series on the Oulu website shows that filler values are not properly treated in the calculation of Oulu Dst. Mindful of these issues, for cycles 14-18, we use −Dst m values taken either from publications on specific storms, or we calculate −Dst using data values taken from yearbooks and WDS digital files, Section 6. We show the Oulu Dst time series from 1932 to 2016 in Figure 1a.

Autocorrelation and Sampling
We adopt a statistical approach for analyzing storm-maximum intensity values and predicting future probabilities, but autocorrelation in the Dst time series is incompatible with the data independence that is often assumed in statistical analyses. Most obviously, the Dst time series is serially correlated-space-weather conditions vary continuously in time, and, as a result, from one hour to the next, a particular Dst value is often similar to that of the previous and subsequent hour. This applies during quiet conditions, and it applies during the one-to three-day durations of magnetic storms, though the hour-to-hour variance in Dst is higher during a magnetic storm than it is during quiet times. Over longer timescales, the occurrence of one storm is often correlated with the occurrence of other storms due to the nature of the solar wind (e.g., Borovsky, 2020). So, for example, a sunspot group might be the source of multiple coronal mass ejections, each causing a separate magnetic storm (e.g., Gonzalez et al., 2011), or a high-speed stream of plasma emitted from a long-lasting coronal hole can sometimes cause a pair of magnetic storms separated in time by a solar-rotation period (e.g., Richardson & Cane, 2012;Tsurutani et al., 2006).
Data suitable for statistical analysis can be extracted from a time series by down-sampling or averaging over timescales longer than those characteristic of the autocorrelations (e.g., von Storch, 1995;Wilks, 2019, Chapter 5.2.4). A simple sampling method, often used in extreme-event analysis, is selecting the maximum values from ranked sequences of data in blocks of time covering a data time series (e.g., Coles, 2001, Chapter 1.1;Davison & Huser, 2015, Section 2). Block-maximum sampling removes autocorrelation over timescales shorter than the duration of each block; it is used, in particular, in analyses of stochastic processes that are modulated periodically or quasi-periodically. So, for example, statistical analyses are made of annual-maximum rainfall (e.g., Katz et al., 2002). In our study, the natural modulation to consider is that of the solar cycle. Identification of the most intense and second most intense storms, −Dst 1 and −Dst 2 , for each solar cycle, is a type of block-maximum sampling. LOVE 10.1029/2020SW002579

Storm Intensities: Solar Cycles 19-24
We rank storm-maximum −Dst m value from the Oulu Dst time series within each solar cycle, Figure 1a, keeping the most intense −Dst m = −Dst 1 and second most intense −Dst m = −Dst 2 storms for each of solar cycles 19 to 24 (years 1954-2016). In Table 1, we list these values. The most intense storm of cycle 19 was that of February 1958 (−Dst 1 = 422 nT), followed closely by that of September 1957 (−Dst 2 = 419 nT). For cycle 20, the most intense storm was that of May 1967 (e.g., Knipp et al., 2016;Webb, 1969) (−Dst 1 = 374 nT), followed by that of March 1970 (e.g., Pudovkin et al., 1972) (−Dst 2 = 283 nT). For cycle 21, the most intense storm was that of July 1982 (−Dst 1 = 326 nT) (e.g. Wik et al., 2009)  that of November 1991 (−Dst 2 = 366 nT). For cycle 23, the most intense storm was that of November 2003 (−Dst 1 = 533 nT), which came after the more vigorous, but in terms of maximum −Dst m , slightly smaller Halloween storm of October 2003 (e.g., Balch et al., 2004;Gopalswamy et al., 2005); the second most intense storm of this cycle was that of March 2001 (−Dst 2 = 367 nT). Cycle 24 was notably weak; the St. Patrick's day storm of March 2015 was the most intense (e.g., Wu et al., 2016)    We list, in Table 1, −Dst m intensities for 19 magnetic storms that occurred in solar cycles 14 to 18 (years 1902-1954); this list helps fill in and extend back in time lists of storms given by other investigators (e.g., Bell et al., 1997;Gonzalez et al., 2011;Lakhina & Tsurutani, 2018;Vennerstrom et al., 2016). Four of these storm intensities are from published papers: October 1903 (Hayakawa, Ribeiro, et al., 2020), September 1909 (Love et al., 2019a), May 1921 (Love et al., 2019b), and March 1946 (Hayakawa, Ebihara, et al., 2020). The other 15 storm intensities are estimated using methods like those used by others for estimating pre-1957 Dst (e.g., Hayakawa, Ribeiro, et al., 2020;Love et al., 2019a,b). In summary, we inspect observatory yearbooks and WDS data files. For each storm and for each observatory record, we check timestamp conventions, and we check for data gaps and filler values. We make a point of searching for data from low-latitude observatories that are widely separated in longitude to obtain a good circum-global-average measure of storm disturbance. For various reasons, we sometimes use data from observatories that are different from those used for calculating standard versions of Dst. For example, we identify three acceptable observatory records for the March 1920 storm; noting that no observatory was operated in South Africa until 1932, we use San Fernando (SFS), Spain yearbook data (Instituto y Observatorio de Marina, 1921); we found no suitable records for Asian-Australian longitudes, we do not use any KAK data because of incompatibility of timestamps; we use HON WDS data files (Hazard, 1922a); we use TUC WDS data files (Hazard, 1922b) in place of Vieques (VQS), Puerto Rico (the predecessor observatory to SJG) data, which have a gap (Hazard, 1923). For the October 1926 and July 1928 storms, we use Watheroo (WAT), Australia, WDS data files (Fleming et al., 1947). For the March 1941 storm, we use Apia (API), Samoa, WDS data files in place of HON data, which have a gap; we found no suitable records for American longitudes. For the July 1941 storm, in place of HON data, which have filler values (per yearbook), we use WDS TUC data files. For each observatory record, we subtract a pre-storm 24-h quiet period from the storm-time period to obtain disturbance data values. With latitude factors, we average the disturbance data to obtain estimates of storm-maximum −Dst m .
We examine the validity of our identification of −Dst 1 and −Dst 2 for solar cycles 17 and 18 by comparing them with the Oulu Dst index. Our cycle-ranked −Dst 1 and −Dst 2 data values and that of Hayakawa, Ebihara, et al. (2020) for the March 1946 storm, Table 1, are all slightly greater than the corresponding Oulu −Dst 1 and −Dst 2 values. The average relative difference is 5%. This is less than the 8% averaging error we estimate for Dst 1 and Dst 2 in Section 3. Therefore, even though we have some quibbles with parts of the Oulu Dst time series, Section 3, we understand that the Oulu Dst time series is accurate enough to allow us to confidently identify which storms are the most intense and second most intense, −Dst 1 and −Dst 2 , for cycles 17 and 18. For cycle 17, the storm of March 1941 (e.g., Parkinson, 1941;White, 1941) was the most intense (−Dst 1 = 469 nT), and the storm of July 1941 (e.g., Nelson, 1941;Ogg, 1941) was the second most intense (−Dst 2 = 447 nT). For cycle 18, the storm of March 1946 (e.g., Ogg, 1946;Parkinson, 1946) was the most intense (−Dst 1 = 512 nT), and the storm of January 1949 was the second most intense (−Dst 2 = 344 nT).
Lacking a time-continuous record of Dst for earlier solar cycles, we seek an objective method for identifying each cycle's most intense storms. We turn to the aa index, a mid-latitude measure of the range of geomagnetic vector variation over 3-h windows of time at two nearly antipodal observatories (one in the northern hemisphere and one in the southern hemisphere) (Mayaud, 1980). The aa index is continuous in time from 1868 to the present, and so it covers our period of interest. Extended durations of high aa represent vigorous magnetospheric convection (e.g., Thomsen, 2004) leading up to storm-maximum intensity as measured by −Dst m (e.g., Ebihara & Ejiri, 1998;Jordanova et al., 2001). Therefore, using a "homogeneous" version of aa (Lockwood, Chambodut, et al., 2018), Figure 1b, we calculate a running 24-h average of the index time series, something often called AA*. In Table 1, the listed storms for each of cycles 14-16 have the highest-ranked maximum values     * * * 1 5 , m AA AA AA , and, from calculated −Dst m for each of those storms, we infer each cycle's −Dst 1 and −Dst 2 . For example, we infer that the most intense and second most intense storms of cycle 14 were, respectively, those of September 1909 (−Dst 1 = 595 nT) and October 1903 (−Dst 2 = 513 nT); we note that those two storms exhibited the cycle's highest each case, those storms also exhibited their cycle's highest * 1 AA and second-highest * 2 AA levels of mid-latitude disturbance.
Though we identify intense storms, there is room for doubt as to whether or not our identification of −Dst 1 and −Dst 2 are, in fact, the most intense and second most intense storms of solar cycles 14-16. We would like to estimate the confidence we can have in our inferred rankings. We note from Table 1 that, for six of the eight of cycles 17-24, the −Dst 1 and −Dst 2 values are among the storms with the five highest * m AA ; for cycle 19, −Dst 2 is associated with the 17th highest * m AA , and for cycle 21, −Dst 2 is associated with the 11th highest * m AA . We might, therefore, estimate that the probability that the correct −Dst 1 and −Dst 2 are among the storms with the five highest * m AA for any particular solar cycle is 6/8 or 0.750. From this, the joint probability that storms with the five highest * m AA for all three of cycles 14-16 will be −Dst 1 and −Dst 2 for each cycle is (0.750) 3 = 0.422. In other words, under the assumptions made here, there might be a reasonably large probability (0.578) that the correct −Dst 1 and −Dst 2 for cycles 14-16 are greater than those we identify. Despite this, we suspect that we have, in fact, identified the most intense storms for these cycles, but because of lingering uncertainty, we regard the −Dst 1 and −Dst 2 intensities listed in Table 1 for solar cycles 14-16 as lower bounds on storm intensities for those cycles. This affects our choice of model in Section 12.

Secular Change
Qualitatively, the solar-cycle-ranked storm intensities, −Dst 1 and −Dst 2 , shown in Figure 1a and listed in Table 1, are not uniformly distributed in time. There were, for example, four so-called superstorms, with −Dst m > 500 nT (e.g., Lakhina & Tsurutani, 2018), during solar cycles 14-18 (October 1903, September 1909, May 1921, but only two such storms occurred during cycles 19-24 (March 1989 and. Considering, first, the most intense storms of each cycle, a Kolmogorov-Smirnov test (e.g., Bohm & Zech, 2010, Chapter 10.3.5;Press et al., 1992, Chapter 14.3) indicates that the discrepancies between the distributions of the −Dst 1 intensities for cycles 14-18 and for cycles 19-24 are something that would be exceeded by random data with a probability of p = 0.367. For the second most intense storms of each cycle, −Dst 2 , p = 0.367, the same value as for −Dst 1 due to small data numbers. These probabilities are of marginal significance, insufficient to comfortably dismiss the possibility that the discrepancies are only due to random fluctuations in data taken from the same distribution. Still, if any of the estimates of −Dst 1 or −Dst 2 for cycles 14-16 are underestimates, a possibility we discuss in Section 6, then we might have statistically significant evidence of secular change in storm intensities; but, for now, we do not have such evidence.
For data, such as −Dst 1 and −Dst 2 , that are positive with possible outliers, it is useful to analyze the statistics of their logarithms (e.g., Tukey, 1977). The geometric mean, which for a set of n positive data {x i } is If we accept the cycle rankings given in Table 1 as they are, then, the geometric mean of − Dst 1 for cycles 14-24 is 457 nT; and this appears to be a relatively well-centered mean; out of 11 cycles, 6 had − Dst 1 > 457 nT and 5 − Dst 1 < 457 nT. For cycles 14-18, the geometric mean of − Dst 1 is 547 nT, but for cycles 19-24, it is much lower at 393 nT. Student's t test (e.g., Bohm & Zech, 2010, Chapter 3.6.11;Press et al., 1992, Chapter 6.2) gives p = 0.125 that the means for cycles 14-18 and cycles 19-24 would arise from two distributions with the same mean. The geometric mean of −Dst 2 for cycles 14-18 is 390 nT, but for cycles 19-24, it is lower at 307 nT; Student's t test gives p = 0.153. As with the Kolmogorov-Smirnov tests, these probabilities, for the geometric means of either −Dst 1 or −Dst 2 , are of marginal significance, insufficient to comfortably dismiss the possibility that it is only a statistical fluke that storms were more intense during cycles 14-18 than during cycles 19-24. Again, the data do not provide persuasive statistical evidence of a long-term systematic change (over the past 11 solar cycles) in storm intensity.
A related issue is possible correlation between cycle-maximum storm intensity −Dst 1 and sunspot number. As we already noted, and as is well known, the probability of intense storms is modulated roughly in correlation with the rise and decline in sunspot number with each solar cycle, though there are excep-LOVE 10.1029/2020SW002579 tions to this rule-notably, the storm of October 1903 came very early in cycle 14 (e.g., Hayakawa, Ribeiro, et al., 2020). However, our interest here is not the details of intra-cycle modulation but, rather, a possible long-term relationship between cycle-maximum storm intensity and sunspot number. In Figure 1c, we plot daily R D , monthly average R M , and 13-month running average R 13 sunspot number for solar cycles 14-24 (Clette et al., 2014;SILSO World Data Center, 1902. In Table 1, we list sunspot number R D−2 for 2 days before each −Dst m , and we list R M and R 13 , for each −Dst m . We calculate the Pearson correlation coefficient (e.g., Press et al., 1992, Chapter 14.5) between these sunspot numbers and −Dst 1 . For R D−2 , the correlation is small and, possibly surprisingly, negative, −0.154, but the probability that such a correlation coefficient could be realized from random data, is large, p = 0.651. For R M (R 13 ) and −Dst 1 , the correlation coefficient is −0.352 (−0.360), and p = 0.288 (p = 0.275). None of these correlations can be deemed significant.
A similar insignificant correlation is found by Kilpua et al. (2015) between aa and sunspot number. They interpreted this as due to a small-scale turbulent dynamo process in the Sun, one that is only indirectly related to the generation of sunspots. Indeed, we note that energetic solar flares (and, therefore, CMEs) most frequently originate in active regions with complex heliomagnetic polarity structures (fragmentation, helicity) (e.g., Lefèvre et al., 2016;Sammis et al., 2000). CMEs can also originate in disappearing filaments, sometimes at locations in the corona rather far removed from active regions (e.g., Gopalswamy et al., 2010;Webb et al., 2000); an example of a storm generated by such a CME is that of November 1991 (e.g., Cliver et al., 2009), −Dst 2 = 366 nT. These several factors are not well measured by sunspot number, and, therefore, we appreciate that sunspot number is not likely to be tightly correlated with cycle-maximum magnetic storm intensity. Multiply this by the fact that the geoeffectiveness of a CME, and, therefore, storm intensity, depends on solar-wind velocity, density, and interplanetary magnetic field (IMF) strength and orientation, quantities that, themselves, evolve as the CME traverses the Sun-Earth distance. It is, therefore, unsurprising that the intensities of the most intense storms are not well correlated with sunspot number.

Stochastic Processes
Since our dataset of the most intense and second most intense storms, −Dst 1 and −Dst 2 , for each solar cycle are ranked samples from a larger set of storm-maximum −Dst m values, in considering candidate statistical models of −Dst 1 and −Dst 2 , we first consider candidate statistical models of the source −Dst m values. The most conventional model used in extreme-value statistical analyses is the power-law. For positive random data {x > μ} that are power-law distributed, the probability density and cumulative functions are and (e.g., Clauset et al., 2009;Newman, 2005;Sornette, 2006), where μ is a location parameter, ξ is a shape parameter, ξ > 0, and   ( ; , ) P x is the complementary cumulative. Some physical systems that exhibit power-law statistics can be described in terms of self-organizing criticality (SOC), whereby a system evolves to an unstable state that collapses only to evolve again to a similarly unstable state (e.g., Aschwanden, 2011;Sornette, 2006;Turcotte, 1999). In the storm-time magnetosphere, SOC is associated with the buildup of magnetic field at the magnetopause and in the magnetotail that is abruptly rearranged with reconnection (e.g., Angelopoulos et al., 1999;Chang, 1999). Power-law statistics are found in numerical simulations of substorm dynamics (e.g., Klimas et al., 2000;Uritsky & Pudovkin, 1998) and in some analyses of ground-level geomagnetic data (e.g., Balasis et al., 2009;Pulkkinen et al., 2006;Wanliss, 2005). Power-law models have been used to describe extreme-value storm intensities (e.g., Kataoka, 2013;Riley, 2012;Yermolaev et al., 2013), but the occurrence probabilities of the very most extreme intensities are often overestimated by power-law models (e.g., Riley, 2018, Figure 2).
A lognormal model is sometimes assumed to represent the statistics of storm-maximum intensities (e.g., Haines et al., 2019;Love et al., 2015;Pulkkinen et al., 2012). For positive random data {x} that are lognormally distributed, the probability density function is where lnx is the natural logarithm of x, and μ is the lnx-population mean, a location parameter, and σ 2 is the lnx-population variance, a scale parameter. The cumulative function is and where   ( ; , ) L x is the complementary cumulative. Use of a lognormal model is often justified by appealing to the central limit theorem, under which a lognormal process arises from the multiplication of random variables generated by independent underlying processes (e.g., Crow and Shimizu, 1988;Marshall & Olkin, 2007;Mitzenmacher, 2004;Sornette, 2006). Recognizing this, we note that the solar-wind variables of velocity, density, and IMF magnetic field strength display lognormal properties (e.g., Burlaga, 2001;Burlaga & Lazarus, 2000;Veselovsky et al., 2010). It is tempting to imagine a lognormal process for storm-maximum intensities as arising from a multiplicative combination of solar-wind variables that plausibly govern the geoeffectiveness of solar-wind-magnetospheric coupling (e.g., Newell et al., 2007). However, even if spatial and temporal correlations in solar-wind variables could be removed through sampling, so as to render a statistical subset of the solar-wind data, the response of the magnetospheric-ionospheric system to solar-wind forcing is nonlinear (e.g., Vassiliadis et al., 1995), and, therefore, the statistical association of −Dst m with solar-wind variables is not straightforward.
Both the power-law and the lognormal functions asymptotically approach zero in the extreme-value limit,     lim ( ) 0 and lim ( ) 0, but the nature of these two distributions is distinctively different as this limit is approached. The distinguishing property of the power-law distribution is its "regular-variation" or "self-similarity," with any one part of the distribution resembling another part under rescaling by a multiplicative factor r, On the other hand, the lognormal distribution is not regularly varying at infinity, and the tail of the lognormal approaches zero more rapidly than that of the power law, In other words, for given fits to data, { μ, ξ } and { μ, σ}, extrapolated occurrence probabilities will be higher for power-law models than for lognormal models. Although we regard these observations as important, we remain mindful of the possibility that storm statistics are not described by either power-law or lognormal processes.

10.1029/2020SW002579
We can, furthermore, consider the possibility that magnetic storm intensity has an upper limit. Ring-current intensity is likely saturated when a balance is attained between plasma pressure within the magnetosphere and the pressure of the magnetic dipole (Vasyliūnas, 2011). With such a balance, the greatest possible value for −Dst m is approximately 2500 nT, far higher than anything that has been observed. Following on from our discussion of a lognormal process, then, we might assume that magnetic storm intensities are realized from an upper-limit lognormal process (e.g. Love, 2020), where the variable is normally distributed, and {0 < x < υ} is domain-right-limited, with probability density function, The corresponding cumulative function is (Bezdek & Solomon, 1983;Mugele & Evans, 1951). An interesting comparison is of the upper-limit lognormal distribution with a scaled beta distribution, for which the probability density function is where I x (α, β) is the regularized incomplete beta function. Near the right endpoint, the properties of the upper-limit lognormal, are antisymmetric to those of the lognormal (8). On the other hand, near the right endpoint, the beta distribution is regularly varying, and so, as the right endpoint is approached, the beta distribution is similar to a power-law (7) (e.g., Embrechts et al., 1997, example 3.3.17). And, furthermore, in symmetry with Equation 9, In other words, for given fits to data, {α, β, υ} and {μ, σ, υ}, extrapolated occurrence probabilities for intensities approaching the right endpoint will be higher for beta models than for upper-limit lognormal models.

Models for Block-Maximum Samples
As we discuss in Section 4, our ranked −Dst 1 and −Dst 2 data values are a solar cycle-by-cycle (block-byblock) down-sampling of the Dst time series. This removes most of the autocorrelation, giving us a dataset suitable for statistical analysis. But how are our hypotheses that magnetic storm intensities −Dst m might be the result of a lognormal or power-law source processes, or upper-limit lognormal or beta source processes, reflected in the statistical properties of the −Dst 1 and −Dst 2 samples? Extreme-value theory provides us with an answer to this question. Under the Fisher-Tippett-Gnedenko theorem (Albeverio & Piterbarg, 2006, Section 3.2.1; Davison & Huser, 2015, Section 2; Gomes & Guillou, 2015, Section 2.1), the generalized extreme-value distribution (GEV) (e.g., Bali, 2003;Walshaw, 2013) describes block maxima samples taken from an extremely wide variety of source distributions. For positive data {x} that are GEV distributed, the probability density function is and the cumulative function is The GEV distribution has been applied in a diversity of scientific projects (e.g., Ghil et al., 2011), in statistical analyses of space-weather phenomena (e.g., O'Brien et al., 2007;Tsiftsi & De la Luz, 2018), and, in particular, in statistical analyses of magnetic storm intensities (e.g., Chen et al., 2019;Elvidge, 2020;Love, 2020;Nikitina et al., 2016;Woodroffe et al., 2016). The GEV distribution is certainly useful for parameter exploration. Since block-maximum samples taken from the tail of practically any source distribution can be modeled by the GEV distribution, it can also be used without hypothesizing a source process (Hewitt, 1970, p. 343), but this is something we want to avoid.
Formal statistical hypothesis testing can be performed by comparing special cases of the GEV distribution with data. Depending on the shape parameter, ξ, the GEV distribution reduces to one of three special cases, from which we understand that the Fréchet distribution is both domain-unlimited to the right and regularly varying at infinity. And, under the Fisher-Tippett-Gnedenko theorem, block-maximum samples taken from a right-unbounded source distribution that is regularly varying at infinity will be Fréchet distributed. The Weibull distribution is domain-right-limited, with maximum value at υ = μ − σ/ξ, and regularly varying at that right endpoint; samples taken from a right-limited source distribution that is regularly varying at its right endpoint will be Weibull distributed. The Gumbel distribution is right-unbounded and not regularly varying at infinity. Notably, samples taken from a right-unbounded distribution that is not regularly varying at infinity, or samples taken from a right-limited distribution that is not regularly varying at its right endpoint, will be Gumbel distributed (e.g., Albeverio & Piterbarg, 2006, Chapter 3.2.1; Alves & Neves, 2014).
In light of our discussion in Section 8, we understand that block-maximum samples from a power-law source distribution will be Fréchet distributed-the power-law distribution is said to be in the "domain of attraction" of the Fréchet distribution (e.g., Embrechts et al., 1997, Example 3.3.31;Ferreira, 2009;Walshaw, 2013). Block-maximum samples taken from a beta source distribution will be Weibull distributed-the beta distribution is in the Weibull domain of attraction. Block-maximum samples taken from either a lognormal distribution, or a upper-limit lognormal source distribution, will be Gumbel distributed-both the lognormal and upper-limit lognormal distributions are in the Gumbel domain of attraction. We note that the Fréchet and Gumbel distributions have both been used in some statistical analyses of magnetic storms (e.g., Silbergleit, 1997;Weigel & Baker, 2003). The Weibull distribution has also been used in some statistical analyses of magnetic storms (e.g., Gopalswamy, 2018;Moriña et al., 2019;Watari et al., 2001), though, so far as we can tell, without invoking a specific underlying source process.

Framework for Ranked Samples
For our statistical analysis, we need to develop a framework for ranked data. For pairs of samples {x 1 > x 2 }, each taken from blocks of data, the conditional GEV probability density function for x 2 given x 1 is and the joint density function for x 1 and x 2 is (e.g., Chandler, 1952;Nagaraja, 1982;Solow & Beet, 2004

Maximum-Likelihood Estimation
We fit models to the solar-cycle-ranked −Dst 1 and −Dst 2 data listed in Table 1 using the maximum-likelihood method (e.g., Bohm and Zech, Chapter 6.5, 2010, Chapter 6.5;Roe, 2001, Chapter 13.2). We consider three different types of models: (1) a Gumbel model, obtained by fitting the parameters {μ, σ} with the shape parameter ξ = 0, (2) an unconstrained GEV model, obtained by fitting {μ, σ, ξ} with ξ treated as a free parameter, (3) a constrained Weibull model, obtained, as a special-case of a GEV model, by fitting {μ, σ} with ξ determined by μ − σ/ξ = υ = 2500 nT, the theoretical maximum intensity that could possibly be realized for a magnetic storm (Vasyliūnas, 2011). Each model fitted to the data maximizes the joint likelihood where the product is taken over n solar cycles.
We use a simplex algorithm (e.g., Press et al., 1992, Chapter 10.4) to maximize  and, thereby, obtain the maximum-likelihood parameters.

Comparisons of Models
In Figure 2, we show complementary cumulatives of the −Dst 1 and −Dst 2 intensities for the 11 solar cycles 14-24 and complementary cumulatives of the Gumbel, 1 G and 2 G , the unconstrained GEV, 1 Γ and 2 Γ , and the constrained Weibull, 1 W and 2 W models, each fitted using the joint probability density function, Equation 24. We also list in Figure 2 in tabular form Gumbel, GEV, and Weibull model estimates of the probabilities that solar-cycle maximum intensity −Dst 1 will exceed a range of thresholds, 500, 565, 600, …, 1000 nT. The maximum-likelihood parameters for the Gumbel model are μ = 401 nT, σ = 128 nT. From these parameters, we can calculate model moments (e.g., Forbes et al., 2011, Chapter 19): the mode of −Dst 1 (the most likely value) is μ; the mean of −Dst 1 is μ + ϵσ = 475 nT, where ϵ ≃ 0.577 is the Euler-Mascheroni constant; the median of −Dst 1 is μ − σ ln(ln 2) = 448 nT; for this model, the theoretical maximum value is infinite. The maximum-likelihood parameters of the unconstrained GEV model are μ = 401 nT, σ = 129 nT, with a tiny shape parameter ξ = −0.018. From these parameters, we can calculate model moments: the mode of −Dst 1 is each of these GEV moments is very similar to the corresponding Gumbel moments. Perhaps more interesting is the fact that the GEV estimated ξ is slightly negative. This implies a Weibull-type distribution with a maximum storm intensity of μ − σ/ξ = 7522 nT, far higher than the theoretical maximum of 2500 nT. The maximum-likelihood parameters of the constrained Weibull model, with a maximum storm intensity set at 2500 nT, are μ = 403 nT, σ = 130 nT, ξ = −0.062, for which the mode is 411 nT, the mean is 470 nT, and the median is 450 nT.
Qualitatively, all three models are good representations of the data, especially when considering the data's scatter. Among the three models, the Gumbel shows the heaviest distributional tail. The constrained Weibull model shows the lightest distributional tail. The tail of the unconstrained GEV model is tucked in between the tails of the Weibull and Gumbel models. Given these simple observations, one might reasonably wonder whether or not standard statistical tests can actually tell us which of the three models is best. Bootstrap resampling (e.g. Boos, 2003;Efron & Tibshirani, 1993) can quantify model uncertainty. We take solar-cycle −Dst 1 and −Dst 2 pairs as a population, which we sample with replacement. We fit each sample with a GEV model, obtaining, in each case, a new estimate of the shape parameter ξ. From numerous resamplings and refittings, we obtain a bootstrap set {ξ}. The (centered) 68% confidence interval is [−0.429, 0.136]. This statistical spread is wide enough that we cannot be especially confident in the ξ = −0.018 estimate for the shape parameter. That the 68% interval ranges from negative to positive values means that that the cycle-ranked −Dst 1 and −Dst 2 data are, on a simple statistical basis, insufficient for discriminating between a (ξ > 0) right-unbounded, regularly varying Fréchet model, a (ξ < 0) right-limited, regularly varying Weibull model, or a (ξ = 0) right-unbounded, nonregularly varying Gumbel model.
The relative goodness of a model fitted to data is often measured by comparing likelihoods (e.g., Bohm & Zech, 2010, Chapter 10.3.4). The Gumbel, GEV, and Weibull models, respectively, have ln( )  of 129.007, 129.002, 129.039 (units, here, are not relevant). The Weibull model has the highest likelihood. To investigate whether or not this is significant, as before, we perform a bootstrap analysis, repeatedly sampling with replacement the solar-cycle −Dst 1 and −Dst 2 pairs, fitting models to each sample, and accumulating log-likelihood bootstrap sets {ln( )}  for each model type. From these sets, we obtain 68% confidence intervals on ln( )  for each model type. For the Weibull model, the interval is [121.921, 131.280], which is very wide compared to differences between the log-likelihoods of the three models. The intervals for the other models are similarly wide. From this, we understand that all three models are close to equally good representations of the cycle-ranked −Dst 1 and −Dst 2 data.
Next, we consider statistical significance. Would data that are statistically similar to those that we have be likely realizations of any of the fitted models? To answer this question, we turn, again, to the Kolmogorov-Smirnov test, but we recognize that it is circular to estimate significance with the same data used to estimate a model's parameters (e.g. Chave, 2017, Chapter 8.2.4;Corral & González, 2019, Section 3.2;Steinskog et al., 2007), therefore, we turn to bootstrap resampling. Treating each model as a "hypothesis," we take multiple random bootstrap samplings with replacements of solar-cycle pairs −Dst 1 and −Dst 2 , each time calculating a Kolmogorov-Smirnov p-value against the model. For each model, the median p-value of the resamplings is a suitable estimate of significance (e.g., Clauset   Complementary cumulatives of the cycle-ranked −Dst 1 and −Dst 2 data (solar cycles 14-24) and complementary cumulatives of fitted models: Gumbel G (green, brown), GEV Γ (black, gray, with ξ treated as a free parameter), and constrained Weibull W (red, blue, with ξ such that the right limit ν = 2500 nT), each for parameters μ, σ, ξ that maximize the likelihood , Equation 29, for the joint density function given by 24. Also listed are the 68% confidence interval for ξ for the GEV model, log-likelihoods ln( )  for all three models, Kolmogorov-Smirnov p-values, separately for −Dst 1 and −Dst 2 , and model estimates of the probabilities 1 G , 1 Γ , and 1 W that solar-cycle maximum intensity −Dst 1 will exceed a range of thresholds, 500 nT, … 1000 nT.
González, 2019, Section 3.2; Steinskog et al., 2007). The median p-values for −Dst 1 and for the Gumbel, GEV, and Weibull models are, respectively, 0.424, 0.350, 0.386; for −Dst 2 , they are 0.383, 0.332, 0.336. These probabilities are not small enough to motivate rejection of any of the three models as descriptions of the cycle-ranked −Dst 1 and −Dst 2 data.
Since the statistical tests do not allow us to confidently discriminate between the various extreme-value models, we understand that we cannot draw inferences about the nature of the source distribution. As far as we can tell, any one of power-law, lognormal, upper-limit lognormal, or beta sources could give the −Dst m from which we block sample for −Dst 1 and −Dst 2 data values. At this point, we recall the challenges we discuss in Section 6 in identifying the most intense storms for solar cycles 14-16. There, we accepted the fact that our compiled −Dst 1 and −Dst 2 data values for cycles 14-16 should be regarded as lower bounds because there is a (possibly slim) chance that we missed more intense storms. Bearing that in mind, we note, from Figure 2, that this choice makes little difference for storm intensities across a range of, say, 500-700 nT-all three models give similar −Dst 1 probabilities across that range. On the other hand, for greater storm intensities, such as −Dst 1 > 900 nT, differences between the three models are more significant. For a storm as intense as −Dst 1 > 1000 nT, the Gumbel model gives an occurrence probability of 0.010/cycle. The constrained Weibull model gives an occurrence probability of 0.005/cycle, half that of the Gumbel. Therefore, seeking not to exaggerate the storm probabilities, in what follows, we chose to emphasize results from the model that gives the lowest extreme-value storm probabilities: the Weibull model that is constrained to be right-limited for storm intensities at 2500 nT.

Storm Occurrence Rates
In Figure 3a, we show complementary cumulatives of the −Dst 1 and − Dst 2 intensities for solar cycles 14-24 and complementary cumulatives of the constrained Weibull model, 1 W and 2 W , fitted using the joint probability density function, Equation 24. We list in tabular form Weibull-model estimates of the probabilities 1 W that solar-cycle maximum intensity −Dst 1 will exceed a range of thresholds, 500, 565, 600, …, 1000 nT. We also show and list the corresponding 68% confidence intervals obtained from bootstrap resampling of −Dst 1 and −Dst 2 pairs and fitting a Weibull model to each sampling. Note, in particular, the probability that −Dst 1 will exceed 500 nT is  1 0.384 W /cycle, corresponding to an average return rate of just 2.6 solar cycles. The corresponding 68% confidence interval is [0.253, 0.456]/cycle. In Figure 3b, we show Weibull models fitted, independently, to the −Dst 1 intensities and the −Dst 2 intensities, using a likelihood function constructed from the (nonjoint) Weibull density function given by Equation 17. As noted by Solow and Beet (2004), a nonjoint fitting does not exploit the information content of ordering in the data, that is, the information content of the conditional probability of −Dst 2 given −Dst 1 ; but this is essentially what was done in some previous investigations (e.g., Siscoe, 1976a;Willis et al., 1997). For −Dst 1 > 500 nT storms, the probability of the nonjoint model is  , with ξ such that the right limit υ = 2500 nT. (a) Models fitted to − Dst 1 and − Dst 2 (solar cycles 14-24) and using the joint density function given by 24. (b) Models are fitted independently to − Dst 1 and − Dst 2 (cycles 14-24) using the non-joint density function given by 17. (c) Models fitted to − Dst 1 and − Dst 2 (separately for cycles 14-18, cycles 19-24, and cycles 14-24 as dashed line) using the joint density function given by 24; here, for purposes of clarity, fits to − Dst 2 are not shown. Listed are model estimates of the probabilities 1 W that solar-cycle maximum intensity − Dst 1 will exceed a range of thresholds, 500 nT, ⋅⋅⋅ 1000 nT, corresponding 68% confidence intervals, and Kolmogorov-Smirnov p-values, separately for − Dst 1 and − Dst 2 . joint-probability model, Figure 3a, gives more conservative estimates of future superstorm occurrence probabilities than the nonjoint model, Figure 3b.
In Figure 3c, we show results for a bisection test, fitting −Dst 1 and −Dst 2 separately for solar cycles 14-18 and for cycles 19-24; for clarity, we only show complementary cumulatives for −Dst 1 and 1 W even though, as in Figure 3a, we are fitting Weibull models using the joint density function that includes −Dst 2 . We recall, from Section 7, that neither a Kolmogorov-Smirnov test nor a Student's t test is sufficient to reject the possibility that −Dst 1 and −Dst 2 values from cycles 14-18 and values from cycles 19-24 are both taken the same distribution. Still, in Figure 3c, we see practical differences in the fitted Weibull models. For cycles 14-18, the probability of a −Dst 1 > 500 nT storm is 0.492/cycle, but for cycles 19-24 the probability is 0.216/ cycle. As we have already noted, from Figure 3a, modeling data for cycles 14-24 (inclusive of several intense pre-IGY storms) yields a probability for −Dst 1 > 500 nT storm of 0.384/cycle. This is 78% higher than the probability (0.216/cycle) obtained by only modeling data for cycles 19-24. We recall from Section 6 that − Dst 1 and −Dst 2 for cycles 14-16 should be regarded as lower bounds. If there were storms more intense for those cycles, more intense than we list in Table 1, then the methods we use here would yield higher −Dst 1 > 500 nT probabilities.

Probability of 1859, 1921, 1989 Storms
The Carrington event of September 1859 is recognized as a superstorm (e.g., Cliver & Dietrich, 2013;Hayakawa, Ebihara, Wills, et al., 2019). It brought widespread interference to telegraph systems (e.g., Boteler, 2006) and caused low-latitude aurora (e.g., Green et al., 2006), but its absolute intensity is uncertain. Only one low-latitude observatory, that of Colaba (CLA), India, reported a complete record of the storm (Tsurutani et al., 2003). The CLA record has been used to estimate a peak Carrington intensity of −Dst 1 = 850 nT (Siscoe et al., 2006) and −Dst 1 = 1050 nT , even though it is understood that local-time asymmetry in low-latitude geomagnetic disturbance, caused, for example, by field-aligned currents, can result in local disturbance being low or high relative to a mean disturbance level from multiple observatories. If we accept the intensity estimates for the Carrington event, then it and the magnetic storm of May 1921 are the only two storms over the past 15 solar cycles (10 through 24) with −Dst 1 > 900 nT. From our Weibull model, a storm of such intensity has an occurrence probability of  1 0.013 W /cycle (average return rate of 76.9 cycles). That is a low probability, something also noted by Moriña et al. (2019).
For perspective, we note that the bootstrap (centered) 68% confidence interval on the Weibull occurrence probability for storms with −Dst 1 > 900 nT, from Figure 3a, is [0.004, 0.027]/cycle. The upper threshold on this interval, 0.027/cycles, corresponds to a return rate of 37.0 cycles. The 68% confidence interval means that there is a one-sided probability of 16% that the per-cycle probability is greater than 0.027/cycles (and that the return rate is less than 37.0 cycles). A more rigorous check can be obtained with a Kolmogorov-Smirnov test. Here, we use the Weibull model fitted to the −Dst 1 and −Dst 2 data from cycles 14-24, exactly as in Figure 3a, but, for the Kolmogorov-Smirnov test, we add a 12th −Dst 1 value of 850 nT, similar to the Carrington intensity. Then, as in Section 12, we obtain a set of bootstrap p-values. The median of these is 0.325, only slightly lower than the 0.386 median value obtained using only data from cycles 14-24. This is not small enough to motivate the Weibull model's rejection, even for a rare Carrington intensity. In this light, and recalling that we chose to focus on Weibull models because they give lower maximum-likelihood probabilities than the Gumbel and unconstrained GEV models, we understand that the historical occurrence of the Carrington event cannot be viewed as necessarily inconsistent with our modeling.
Regarding the March 1989 storm (−Dst 1 = 565 nT), the most intense storm since the IGY, we note from Table 1 that, over solar cycles 14-24, only the September 1909 and May 1921 storms had, to our knowledge, higher intensities. In extending the span of −Dst 1 and −Dst 2 from solar cycles 19-24 (IGY and afterward) to cycles 14-24 (inclusive of several pre-IGY superstorms), from Figures 3a and 3c, our estimate of the probability of a −Dst 1 > 565 nT storm is higher by 126%, from 0.118/cycle (average return rate of 8.47 cycles) to 0.246/cycle (return rate of 4.1 cycles). We recall from Section 6, that our −Dst 1 and −Dst 2 intensities for cycles 14-16 should be regarded as lower bounds. If there were storms more intense for those cycles, then the methods we use here would yield a higher probability for the occurrence of a 1989-like storm. For now, we can say that previous suggestions that the March 1989 storm was, essentially, a once-per-century storm (e.g., Boteler, 2019;Love et al., 2018;Pulkkinen et al., 2012) appear to be "optimistic"-a storm of such intensity apparently occurs about every 45 years.

Once-Per-Century Storm Intensity
To estimate storm intensity for a given exceedance probability, we invert Equation 18 to obtain the quantile function, We can use this to estimate the intensity of a "once-per-century" storm. Assuming that each solar cycle is 11 years in duration, 9 cycles equals about 99 years. For  Γ 1 / 9 and using the maximum-likelihood parameters for the Weibull model obtained by fitting −Dst 1 and −Dst 2 for solar cycles 14-24, we obtain a once-per-century intensity of −Dst 1 = 663 nT and a corresponding bootstrap 68% confidence interval of [497,694] nT. We recall, again, the ambiguity in the rankings of storm intensities for cycles 14-16. Using our methods with storms more intense for cycles 14-16 would yield a higher once-per-century storm intensity.

Discussion and Conclusions
Our cycle-ranked extreme-event analysis of −Dst 1 and −Dst 2 represents a more complete exploitation of the available data than previous analyses that only use cycle-maxima −Dst 1 (e.g., Love, 2020;Silbergleit, 1997). Our extension of the number of solar-cycle ranked −Dst m intensities to cover cycles 14-24 (1902-2016) is, we might expect, a better representation of extreme-value storm intensities than the −Dst m values taken from shorter intervals, such as the post-IGY interval covering cycles 19-24 (1957-2016) and used in some previous analyses (e.g., Love, 2020;Riley & Love, 2017;Silbergleit, 1997;Tsubouchi & Omura, 2007;Yermolaev et al., 2013). Notably, the extended dataset encompasses several storms with −Dst m > 500 nT (1903,1909,1921,1946) that occurred before 1957, whereas only two such storms (1989 and 2003) occurred after 1957. Although significance tests tell us that we cannot reject the possibility that this difference is just a statistical fluke, Section 7, the difference is real, and it affects estimates of the occurrence probabilities of future intense storms. We recall, for example, from Section 14 and Figure 3, that using the extended dataset yields a maximum-likelihood probability for a storm as intense as that of 1989 of 0.246/cycle (return rate of 4.1 cycles). This is more than twice as high as the probability obtained using the same methods and the shorter standard 1957-2016 dataset.
Such results generate interest in obtaining additional estimates of storm intensities. We recall, from Section 6, that there is some (possibly slight) ambiguity in the rankings of storm intensities for cycles 14-16. This ambiguity is why we have noted that, using the same methods, but storms more intense than those identified for cycles 14-16, would yield a higher once-per-century intensity and a higher probability for a −Dst m > 565 nT storm. Opportunities for improvement in statistical analyses like that reported here might come by pushing the storm-intensity compilation back to even earlier solar cycles, though this entails challenges. Consider, for example, solar cycle 13. The storm of February 1892,  * 1 249.6 AA nT, was one of that cycle's most intense storms (e.g., Superintendent of the U.S. Naval Observatory, 1892). It was completely recorded at Colaba (CLA), India (Moos, 1910, Plate 81 D), San Fernando (SFS), Spain (Instituto y Observatorio de Marina, 1893), and Zi-Ka-Wei (ZKW), China. In contrast, the storm of August 1894 (e.g., Finn, 1894),  * 3 193.6 AA nT, was completely recorded at CLA (Moos, 1910, Plate 81 H), but it was not at SFS and not at ZKW, per yearbooks. We know of no observatory record for either of these storms from the American sector. One could probably reliably estimate −Dst m for the first storm, but, with a recording from only one observatory, estimating −Dst m for the second storm is more problematic.
Finally, from Section 8, we recall our discussion of several candidate stochastic source processes that might describe storm-maximum intensities. On physical grounds, we prefer those source processes with associat-LOVE 10.1029/2020SW002579 ed distributions that are right-limited-storms of infinite intensity are not possible, and so the distribution describing extreme storm intensities should reasonably accommodate an upper limit. Such distributions might be either regularly varying, a property that can arise from self-organized critical processes, or they might be nonregularly varying, a property that can arise from the multiplicative action of multiple random processes. We recall, from Section 9, that block samples of right-limited, regularly varying data are Weibull distributed, while block samples from right-limited, non-regularly varying data are Gumbel distributed (though the Gumbel, itself, is not right-limited). And we recall, from Section 12 and Figure 2, that a Gumbel model and a Weibull model (one constrained to have a theoretical upper limit of 2500 nT) both provide good representations of the cycle-ranked −Dst 1 and −Dst 2 data, and, therefore, we are not able to discriminate between source processes that are regularly varying or not regularly varying. This issue is important for accurately predicting the probabilities of the most extremely intense magnetic storms, differences in the far-end tails of the candidate models, for storm intensities of (say) −Dst 1 > 1000 nT, can be a factor of two. Seeking not to overstate storm probabilities, we chose to emphasize the constrained Weibull model in this report since it gives lower probabilities than the Gumbel. We wonder if physics-based theories or simulations might better inform the most appropriate statistical model of extreme-value storm intensities or the theoretical upper-limit on storm intensity. If we knew that the Gumbel model is appropriate for cycle ranked −Dst m , possibly because the source distribution is upper-limit lognormal, then the probabilities of the most extreme storms, estimated using the data in Table 1, would be higher than those reported here. On the other hand, if we knew that the Weibull model is appropriate, but that the theoretical upper limit on storm intensity is less than the 2500 nT that Vasyliūnas (2011) estimates, then the probabilities of the most extreme storms, estimated using the data in Table 1, would be lower than those reported here. Either way, we see opportunities for physics-based theories to further advance statistical analyses of extremely intense magnetic storms and to better predict the probabilities of their future occurrences.

Data Availability Statement
The standard Dst index is available from the Kyoto WDC (wdc.kugi.kyoto-u.ac.jp). The Oulu Dst index is available from the University of Oulu, Finland (dcx.oulu.fi). Historical observatory hourly data are available from the Kyoto WDC and the Edinburgh WDC (www.wdc.bgs.ac.uk). We used observatory yearbooks from the USGS, the National Oceanic and Atmospheric Administration, the Linda Hall Library, Kansas City, Missouri, the National Diet Library Digital Collections, Japan (dl.ndl.go.jp), and google.com. The homogeneous aa index is available as a supplement to Lockwood, Chambodut, et al. (2018) (https://doi.org/10.1051/ swsc/2018038). Sunspot numbers are available from the WDC-SILSO (sidc.be/silso), Royal Observatory of Belgium, Brussels. als (journal reviewers) for reviewing a draft manuscript. This work was supported by the U.S. Geological Survey Geomagnetism Program. We thank the USGS Library for support in obtaining documents.