Cumulative rate analysis (CURATE): A clustering algorithm for swarm dominated catalogs


Corresponding author: K. M. Jacobs, Victoria University of Wellington, PO Box 600, Kelburn, Wellington 6012, New Zealand.


[1] We present a new cumulative rate (CURATE) clustering method to identify earthquake sequences especially in regions with swarm activity. The method identifies sequences by comparing observed rates to an average rate. It is distinct from previous clustering techniques in that no direct assumptions about physical processes relating to temporal decay or earthquake-earthquake interaction are made. Instead these assumptions are replaced by a more general one, that earthquakes occurring within a sequence likely share a common physical trigger, which is manifested by a change in rate. The use of rate as the primary selection parameter emphasizes that temporal proximity is the main commonality among different sequence types. To investigate catalog-scale earthquake sequence characteristics, we apply the method along with four standard (de-)clustering methods to a catalog of 4845 M ≥ 2.45 earthquakes from 1993 through 2007 in the Central Volcanic Region of New Zealand. Despite the distinct focus of the method on sequence formation, the declustered catalog of the CURATE method sits within the suite of declustered catalogs produced by other methods. A stochastic reconstruction based on epidemic-type aftershock sequence parameters is also presented to test the differences between catalogs that exclusively contain mainshock-aftershock sequences and areas that exhibit multiple physical processes. We test the declustered catalogs produced by all methods for a Poisson temporal distribution and propose that this be used to ensure reasonable selection parameters. The CURATE method will be especially useful for identifying swarms, creating likelihoods of the size and duration of sequences, and refining earthquake forecasts that include swarms at regional and local scales.

1 Introduction

1.1 Earthquake Sequences

[2] Earthquakes often occur in groups, clustering in time and space. This temporal clustering is defined as an increase in rate above a given background rate, where the background is the typical rate of earthquakes observed in the area of interest. These clusters have often been divided into two categories: mainshock-aftershock sequences and earthquake swarms [Mogi, 1963; Scholz, 2002]. For this study, we define an earthquake sequence to be any group of earthquakes with a rate above an average rate that are also related in space; this will include both categories. An increase in rate is the common element between both types of earthquake sequences. Classically a mainshock-aftershock sequence is one in which the largest magnitude event occurs first, or early in the sequence if there are foreshocks. According to Bath's law there is also an expected magnitude difference between the two largest shocks of 1.2 magnitude units [Bath, 1965]. However, that value is an average and many authors have shown that there is a wide range of observed magnitude differences for mainshock-aftershock sequences globally [e.g., Felzer et al., 2002; Helmstetter and Sornette, 2003].

[3] Recent works by Christophersen and Smith [2008] and Zhuang et al. [2008] confirm that foreshock sequences behave like mainshock-aftershock sequences, which happen to have a smaller first event [Felzer et al., 2004]. We will thus include foreshock sequences when we refer to “mainshock-aftershock” sequences. Equations and models (e.g., Omori's law [Utsu et al., 1995] and epidemic-type aftershock sequence (ETAS) [Ogata, 1998]) can now be used to calculate the expected number of earthquakes (aftershocks) once a large mainshock occurs, and the physical cause of those aftershocks seems to be relatively clear [Scholz, 2002]. The same is not true for earthquake swarms in the sense that no parallel laws of abundance and decay exist for swarm sequences and there is no way to forecast them. There is not even a generally accepted quantitative definition of an earthquake swarm, although most authors would agree with a qualitative definition similar to Mogi's [1963] that earthquake swarms are groups of earthquakes that are closely clustered in time and space (although the duration may be years) and which have no predominant earthquake or “mainshock”. Because the term “swarm” is used to classify sequences that simply have the absence of a distinct mainshock, they encompass a variety of spatial and temporal patterns. This implies that the largest earthquake is not the main physical trigger for the subsequent sequence and that magnitude-dependent clustering techniques [e.g., Gardner and Knopoff, 1974; Reasenberg, 1985] may not be as effective in identifying swarm sequences.

[4] Most authors who have studied earthquake sequences in volcanic regions note the variation in earthquake rate, duration, and magnitudes [Eiby, 1966; Gibowicz, 1973a; Hurst et al., 2008; McNutt, 2005; Sherburn, 1992; Toda et al., 2002]. Benoit and McNutt [1996] report that swarm sequences have a tendency to be recorded with durations that are related to fixed periods of observation (e.g., daily, weekly). These observations emphasize the need for a general model to detect and quantify the observed behavior.

1.2 Motivation for a New Method

[5] Clustering methods allow objective quantitative comparisons of sequences within the same catalog and between different geographic regions. The resulting sequence catalogs (catalogs of clusters identified by clustering methods) can be used to set a benchmark to measure the relative size of future sequences and thus lend themselves as aids to hazard assessment and physical interpretation. While the comparison of sequences within any one method is objective, the choice of which method to use is subjective, because many methods exist to identify clusters and remove them from an earthquake catalog (see van Stiphout et al. [2012] from the CORSSA project [Zechar et al., 2011] for a review of methods). Standard clustering techniques often focus on the removal of sequences rather than the sequences themselves and rely on aftershock behavior and relationships to previous seismicity for cluster assignment. These assumptions may not apply to regions or over magnitude ranges with swarm seismicity, for which the laws of temporal and spatial scaling are not known. Thus, despite the large number of available methods, standard clustering methods are not used in studies of swarm sequences; many authors instead set arbitrary spatial and rate boundaries to define individual sequences [e.g., Vidale and Shearer, 2006]. Therefore, we have developed a new cumulative rate method, which we call “CURATE,” to characterize sequences of earthquakes that could include both mainshock-aftershock and swarm sequences.

[6] To demonstrate the differences between our method and other declustering and clustering methods, we have applied several of the standard methods to our data set. The three types of methods analyzed here are window methods [Gardner and Knopoff, 1974], link-based methods (Single-Link Clustering [Frohlich and Davis, 1990] and Reasenberg [1985]) and Stochastic Declustering [Zhuang et al., 2002]. Summaries of the four specific methods are presented in section 4.

[7] All four of the standard methods presented here assume a single process or suite of processes that depend on earthquake-earthquake interaction and are roughly governed by some combination of Omori's law time decay (aftershock) or a fault interaction zone between successive earthquakes. Single-link clustering makes the fewest physical assumptions, but still relies on the idea that earthquakes are directly caused by preceding seismicity.

[8] Swarms are understood [Scholz, 2002] to be distinct from the decay processes of aftershock sequences. Brodsky [2006] identified some late triggered swarm earthquakes as aftershocks of events that occurred during the passage of surface waves. These triggered swarms were well fit by an Omori's law decay. It remains to be seen whether nontriggered swarm sequence decay, where observed, can be described by Omori's law and if so whether the parameters fall within the range of values observed in true aftershock sequences.

[9] The CURATE method uses seismicity rate as the main link between earthquakes. Using rate allows us to eliminate the assumption that earthquakes close in time and space are caused by each other and allows us to make the more general statement that an increased rate of earthquakes is evidence that they likely share a common physical trigger. In some cases that physical trigger will be another earthquake, but in the case of many swarms it could be a different physical impetus that may be completely separate from the preceding seismicity (e.g., increased pore-fluid pressure over a region [e.g., Hainzl, 2004]). Separating the definition of increased activity from a direct distance relationship between events will enable us to select and vary a distance parameter based on the region of interest and the anticipated scale of activity. It will also allow quantitative analysis of swarms with multiple bursts (potentially multiple inputs) and the variation of duration, number of events, and energy output between such subswarms. The method is easy to apply and can be used at a variety of temporal and spatial scales. We anticipate it will be most useful at regional (not global) and local scales in magnitude ranges that include some swarm activity (likely Mmax < 6.5).

2 Sequence Selection

2.1 Introduction to Sequence Selection

[10] In order to use any clustering technique, including the CURATE method, it is first necessary to start with a complete earthquake catalog (i.e., any catalog where the lower magnitude limit [Mcut] is greater than or equal to the magnitude of completeness, Mc [e.g., Wiemer and Wyss, 2000]). Unlike typical clustering methods, the CURATE method starts by placing a portion of the earthquake catalog (that with rates below the mean rate) into an initial declustered catalog. We essentially make a subset of the catalog that contains earthquakes that occur at a rate above a certain rate threshold and search for spatial links between those earthquakes. Most earthquakes occurring above this rate will not be background earthquakes and so the probability of including background earthquakes does not increase dramatically when the spatial search window is extended marginally. The remaining steps of the method assign all earthquakes occurring at rates above the mean rate into individual clusters and may place additional earthquakes into the declustered catalog. Figure 1 shows a detailed flow chart through the steps of the method that are described in the following sections. The free parameters of the method are the spatial and temporal boundaries that define the earthquake catalog (this defines the mean earthquake rate), a distance search rule (which limits the area size of potential sequences), and a time window (used to allow for lulls in activity above Mc where there may be ongoing activity at a magnitude below Mc).

Figure 1.

Flow chart outlining the steps the CURATE method uses to identify sequences (and an accompanying declustered catalog). Curved paths represent steps where processed data are rerun through an earlier step one or more times.

2.2 Step 1: CURATE

[11] The first step we take to identify earthquake sequences is to check for a temporal relationship between earthquakes. We use an application of the cumulative sum (CUSUM) method [Page, 1954; Tam, 2009] to characterize the rate. At the time of each earthquake in the catalog, the CUSUM uses the average daily number of earthquakes for the entire period to estimate the expected cumulative number of earthquakes from the beginning of the catalog up to that point in time. The expected cumulative total is subtracted from the real (observed) cumulative total for each earthquake in the catalog (see Figure 2). This produces a comparison to the average rate that is analogous to a reduced travel–time curve. CUSUM methods are often used to detect subtle changes in rate from a background rate that also has fluctuations [Page, 1954; Tam, 2009]. The average or mean rate used in this method will always be higher than the true background rate because it is calculated from the raw earthquake catalog that includes sequences. The resulting sequences produced by this method contain earthquakes that occur at the highest rates and are, we propose, most likely to be related. The use of the mean makes the CURATE method more sensitive to the actual temporal distribution of earthquakes than methods that only consider the individual times between earthquakes.

Figure 2.

Plots of the CURATE and cumulative number of earthquakes with time from 1993 through 2007.5. Any upward movement (positive slope, but not necessarily greater than zero value) is an indication of above average seismicity rate. For example, (A) points to the upward line related to the 1998 Haroharo earthquake swarm (380 earthquakes) and (B) points out the upward line related to a mainshock-aftershock sequence of a magnitude 4.85 earthquake in 2002 (50 earthquakes). The bold black line shows a mean cumulative rate trend for reference.

[12] Equations (1) and (2) describe the CUSUM calculation carried out for each day in the catalog, where D is the daily average number of earthquakes, ts is the time of the first earthquake in the catalog, ti is the time (in decimal days from the beginning of the catalog) of an earthquake in the catalog, tf is the time of the last earthquake in the catalog, and N(ts,t) is the number of earthquakes observed between time ts and t.

\[ D = \frac{N(t_s, t_f)}{t_f - t_s} \tag{1} \]
\[ \mathrm{CURATE}(t_i) = N(t_s, t_i) - D\,(t_i - t_s) \tag{2} \]

[13] This calculation identifies all earthquakes for which the time from the previous earthquake is less than 1/(mean rate), i.e., upward slopes in Figure 2. These events are marked as parts of potential sequences with individual sequences being defined by continuous periods of increase. These periods of increased activity generally look like vertical lines on the graph because of the short time span (hours to days) over which they occur. This approach identifies all types of sequences. If desired, mainshock sequences can be differentiated by other criteria later. This first step creates a list of temporally related earthquakes, which we refer to as “potential sequences.”
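The first step can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this study: it assumes that D is the total number of earthquakes divided by the catalog duration and that the CURATE value is the observed cumulative count minus D times the elapsed time, as the definitions above imply. The function names, the 1-based counting of N, and the choice to include the event that precedes the first short gap in each group are assumptions made for the example.

```python
from typing import List


def curate_curve(times: List[float]) -> List[float]:
    """Observed minus expected cumulative count at each event time.

    `times` are sorted event times in decimal days.
    """
    ts, tf = times[0], times[-1]
    d = len(times) / (tf - ts)  # D: mean daily rate over the whole catalog
    # N(ts, ti) is taken as the 1-based index of event i (assumption).
    return [(i + 1) - d * (t - ts) for i, t in enumerate(times)]


def potential_sequences(times: List[float]) -> List[List[int]]:
    """Group event indices linked by interevent times shorter than 1/D."""
    ts, tf = times[0], times[-1]
    max_gap = (tf - ts) / len(times)  # 1 / (mean rate)
    groups: List[List[int]] = []
    current = [0]
    for i in range(1, len(times)):
        if times[i] - times[i - 1] < max_gap:
            current.append(i)       # rate still above average: extend the group
        else:
            groups.append(current)  # gap too long: close the group
            current = [i]
    groups.append(current)
    # Groups of one event are not potential sequences.
    return [g for g in groups if len(g) >= 2]


times = [0.0, 10.0, 10.1, 10.2, 20.0, 30.0, 30.5, 40.0]
print(potential_sequences(times))
```

With eight events in 40 days the mean interevent time is 5 days, so only the two short bursts are returned as potential sequences.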

2.3 Step 2: Distance-Rule

[14] Once potential sequences have been identified, we apply a distance rule to check that these groups of earthquakes are spatially related. This step is akin to the distance element of a windowing method but it will not sweep in as much seismicity because the time element is not a window, and part of the seismicity (i.e., the seismicity on days of low rate) has already been eliminated from inclusion in the spatial window. The distance rule is chosen by the user, and essentially functions as the scale on which we expect, or wish to search for, possible physical mechanisms underlying the sequences. The initial CURATE calculation will select all earthquakes in our study area during periods of elevated activity regardless of the geographic location. Thus, the potential sequences may include distant earthquakes that are part of the background activity away from the sequence location, or simultaneous sequences occurring in separate locations. To ensure earthquakes in each sequence are spatially related, we first calculate a mean location for each potential sequence, and the distance of each earthquake from the mean location. If the distance from the mean is greater than the distance-rule, the earthquake is eliminated from the sequence.

[15] After events have undergone the distance selection, the mean sequence location is recalculated with the remaining earthquakes. Due to this recalculation it is possible that some sequences will have earthquakes that are now at distances greater than the distance-rule from the mean location (although all remaining earthquakes in the sequence will be within the distance-rule of the original mean location). To ensure that this step does not create a “cookie-cutter” effect and include only parts of a secondary sequence that may fall along the search boundary, any sequence that has more than 5% of the recalculated distances greater than the distance rule is split into two sequences, with earthquakes less than the distance-rule forming one sequence and earthquakes above that limit in another. This creates an upper 95% limit on our distance-rule and eliminates some ambiguity on the spread of distances between earthquakes in a sequence. These two sequences are not rechecked against the distance rule because they already belong to a subset of events within the distance rule of some point.
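The distance selection and the 5% split described above can be sketched as follows. The sketch assumes that epicenters have already been projected to planar coordinates in kilometers and that the mean location is a simple arithmetic mean; the function names and the `split_fraction` parameter are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x_km, y_km), planar coordinates (assumption)


def mean_location(events: List[Point]) -> Point:
    n = len(events)
    return (sum(x for x, _ in events) / n, sum(y for _, y in events) / n)


def distance(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def apply_distance_rule(events: List[Point], rule_km: float,
                        split_fraction: float = 0.05):
    """Return (sequences, rejected) for one potential sequence."""
    center = mean_location(events)
    kept = [e for e in events if distance(e, center) <= rule_km]
    rejected = [e for e in events if distance(e, center) > rule_km]
    if not kept:
        return [], rejected
    # Recompute the mean location from the remaining earthquakes.
    center = mean_location(kept)
    inside = [e for e in kept if distance(e, center) <= rule_km]
    outside = [e for e in kept if distance(e, center) > rule_km]
    # Split when more than 5% of the recalculated distances exceed the rule.
    if len(outside) > split_fraction * len(kept):
        return [inside, outside], rejected
    return [kept], rejected


burst = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
seqs, rejected = apply_distance_rule(burst + [(100, 1)], rule_km=20.0)
print(len(seqs), rejected)
```

Rejected earthquakes are returned separately so that they can be searched again for further sequences.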

[16] To account for cases where a smaller, distant sequence is occurring simultaneously with a large sequence (that dominates the mean sequence location), earthquakes that are initially rejected from a potential sequence are returned to the catalog of earthquakes that have not yet been assigned to any sequence. This catalog subset is then searched again for earthquakes related in time (using the previously calculated value of D from equation (1)) iteratively, including application of the distance rule to any found sequences, until no more sequences are found among the rejected earthquakes. Groups of earthquakes that are produced after the application of the distance-rule are called “proto-sequences.” Earthquakes not in a proto-sequence are considered to be the background seismicity and are equivalent to a declustered catalog (see section 4.4 for a comparison of this background seismicity to that from standard declustering algorithms).

2.4 Step 3: Day-Rule

[17] The third step of the CURATE method, which combines related proto-sequences, is akin to a link-based system in that it allows for expansion of the sequence spatial dimensions outside of the initial circular search area through the concatenation of sequences. To determine the expected cumulative number of earthquakes at a particular time, the CURATE method multiplies the time (in decimal days) of the catalog by the daily average number of earthquakes. So to be selected as part of a cluster the interevent time between earthquakes must be less than 1/(mean rate). This may produce sequences that are in the same location whose start and end dates are just a day or two apart. Depending on Mcut (chosen minimum magnitude of the catalog ≥ Mc) this may or may not reflect separate causal processes. To address this potential problem, we have introduced an allowance of a certain number of days between sequences or a “day rule.” If activity continues in a similar location (defined by the distance-rule) within the time defined by the day rule, then the activity is assumed to be related. If no other proto-sequences exist within the day and distance rules the sequence will remain unchanged. Sequences that are found to be related are then concatenated into a single sequence with new parameters (duration/mean location/etc). This concatenation can expand the total sequence area from the original distance-rule limits. Note that this step does not reintroduce background earthquakes, which were identified in step 2 (earthquakes that do not exceed the CURATE). It simply concatenates existing proto-sequences. The products of step 3 are our “Initial Sequences.”
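A minimal sketch of the day rule is given below. The text does not specify how "a similar location" is measured, so the sketch assumes that two proto-sequences are related when the gap between them is no longer than the day rule and their mean locations are no farther apart than the distance rule. Coordinates are assumed to be planar kilometers, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Event = Tuple[float, float, float]  # (time_days, x_km, y_km)


def summary(seq: List[Event]):
    """Start time, end time, and mean location of a sequence."""
    n = len(seq)
    return (min(e[0] for e in seq), max(e[0] for e in seq),
            sum(e[1] for e in seq) / n, sum(e[2] for e in seq) / n)


def related(a: List[Event], b: List[Event],
            day_rule: float, dist_rule: float) -> bool:
    a0, a1, ax, ay = summary(a)
    b0, b1, bx, by = summary(b)
    gap = max(b0 - a1, a0 - b1, 0.0)  # zero when the sequences overlap in time
    return gap <= day_rule and math.hypot(ax - bx, ay - by) <= dist_rule


def apply_day_rule(protos: List[List[Event]],
                   day_rule: float, dist_rule: float) -> List[List[Event]]:
    """Concatenate related proto-sequences until no pair is related."""
    seqs = [sorted(p) for p in protos]
    merged = True
    while merged:
        merged = False
        for i in range(len(seqs)):
            for j in range(i + 1, len(seqs)):
                if related(seqs[i], seqs[j], day_rule, dist_rule):
                    seqs[i] = sorted(seqs[i] + seqs[j])
                    del seqs[j]
                    merged = True
                    break
            if merged:
                break  # restart: the merged sequence has new parameters
    return seqs


a = [(0.0, 0.0, 0.0), (0.5, 1.0, 0.0)]
b = [(2.0, 2.0, 1.0), (2.2, 3.0, 1.0)]
c = [(2.1, 200.0, 0.0), (2.3, 201.0, 0.0)]
d = [(20.0, 0.0, 0.0), (20.1, 1.0, 1.0)]
result = apply_day_rule([a, b, c, d], day_rule=3.5, dist_rule=20.0)
print([len(s) for s in result])
```

Because the mean location is recomputed after each concatenation, the total area of a merged sequence can grow beyond the original distance-rule limit, as noted above.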

2.5 Step 4: Boundary Checking

[18] The final step is designed to ensure that sequences in our catalog are not arbitrarily truncated at the catalog border. This is necessary for our current study because the northern (offshore) boundary of the Central Volcanic Region (CVR) is not defined tectonically or by a natural decrease of activity, but instead by our ability to measure the activity. Due to the arbitrary nature of that boundary we search for earthquakes (not new sequences) beyond the border that are related by the distance and day rules to an existing initial sequence. Because added earthquakes may change the start and end of an initial sequence, the results are run through step 3 once more to check for related sequences. Any sequences of only one event identified during the method are added to the declustered catalog, and all sequences of two or more earthquakes make up our final sequence catalog. Groups of earthquakes that have been through this final step are simply called “sequences.” Step 4 may be unnecessary in studies where the boundaries are drawn solely on the basis of seismic activity, but should be performed in areas where boundaries are based on arbitrary parameters related to network or catalog variability. See Wang et al. [2010] for more about the importance of boundary testing in clustering methods.

3 Application of the Method: Central Volcanic Region, New Zealand

3.1 The Setting of the CVR, New Zealand

[19] As an example application, we use seismicity in the CVR in the North Island, New Zealand. The CVR is interpreted as a backarc extensional basin that represents a continuation of the southern end of the Lau-Havre trough [Stern et al., 2006; Wallace et al., 2004]. The Hikurangi trough lies off the east coast of the North Island where the Pacific plate subducts beneath the Australian plate. The eastern portion of the CVR has high heat flow and volcanism [Bibby et al., 1995; Wilson et al., 1995], and is often referred to separately as the Taupo Volcanic Zone (TVZ). Most of the earthquakes observed in the CVR occur in the TVZ. We chose this catalog because it is an active swarm region and swarms there have been observed and documented as early as 1922 [Bryan et al., 1999; Eiby, 1966; Garrick and Gibowicz, 1983; Gibowicz, 1973b; Hayes et al., 2004; Hurst and McGinty, 1999; Hurst et al., 2008; Sherburn, 1992]. While the chosen region is dominated by swarms, we note that it has also experienced large mainshock-aftershock sequences, the largest of which in the last 50 years was the 1987 Edgecumbe earthquake (M = 6.3) [Smith and Oppenheimer, 1989]. Many other regions worldwide also experience some degree of swarm activity in addition to mainshock-aftershock activity [Scholz, 2002; Vidale and Shearer, 2006].

3.2 Earthquake Data and Completeness

[20] Our earthquake data come from the New Zealand GeoNet catalog of located earthquakes from 1993 through July 2007 within a triangular boundary around the Central Volcanic Region (vertices: –39.7°, 175.25°; –37.65°, 178°; –37.65°, 175.25°) and a depth of 40 km or less (Figure 3). The eastern and western boundaries are defined by decreases in shallow seismicity outside the CVR. The northern boundary, unlike the other boundaries, has been defined on the basis of completeness and not as a delineation of change in earthquake activity. A spatial test of Mc shows that north of –37.25° values of Mc increase to Mc = 3.0 and continue to increase northward (offshore) because the seismic station network is land-based. The stochastic reconstruction presented in section 4.5 requires data outside the target region to establish the background rate and history of the catalog, so we chose a northern limit 0.4° farther south to accommodate the need for completeness beyond the boundaries of the polygon. Using the method of Cao and Gao [2002], which uses successive magnitudes to determine a stable b-value and establish Mc, with a northern boundary of –37.65° (Figure 4), we find a magnitude of completeness (Mc) of 2.45. Mc has decreased with time in general and is also lower in some limited geographic regions, but we have chosen to work with a single Mc to make comparisons in time and space clearer. The catalog contains 4845 earthquakes M ≥ 2.45 within the chosen polygon and time period.
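For orientation, the mean rate implied by these catalog totals can be estimated with a short calculation. The start and end dates below (1 January 1993 and 1 July 2007) are assumptions standing in for the times of the first and last earthquakes, which are not given here, so the result is approximate.

```python
from datetime import date


def mean_daily_rate(n_events: int, start: date, end: date) -> float:
    """Average number of earthquakes per day over the catalog period."""
    return n_events / (end - start).days


# Catalog totals from the text; the exact start and end dates are assumed.
rate = mean_daily_rate(4845, date(1993, 1, 1), date(2007, 7, 1))
max_gap_days = 1.0 / rate  # interevent time threshold, 1 / (mean rate)
print(f"{rate:.3f} events/day, {max_gap_days:.2f} days")
```

Under these assumptions the mean rate is a little over 0.9 earthquakes per day, so interevent times shorter than roughly 1.1 days count as above-average activity.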

Figure 3.

(A) Location map of the Central Volcanic Region with GeoNet earthquake data for all shallow (< 40 km) seismicity M ≥ 2.45 from 2005. Note that southeast of the CVR/TVZ is an area of shallow subduction related earthquakes.

Figure 4.

Plot used to determine the magnitude of completeness in the method of Cao and Gao [2002]. The horizontal line is shown to help trace the stable range of Mc and the corresponding b-value (1.24).

3.3 Sequence Catalog

[21] Here we present the results of a single sequence catalog using a minimum number of events in a sequence (Nmin) = 4, distance-rule = 20 km, and a day-rule = 3.5 days. The CURATE plot for this catalog is shown in Figure 2. Like all clustering algorithms, the CURATE method identifies clusters as small as two earthquakes; however, the smallest clusters in any method are most likely to be affected by selection rules specific to the individual method. We chose to reduce the possible bias of small clusters in the sequence parameters by using a larger value (Nmin = 4) for the sequence analysis. These parameters have been chosen on the basis of previously described swarm activity in the CVR. Large swarms have been documented by Hurst et al. [2008], Sherburn [1992], and others. We leave the distance rule relatively large to encompass such activity and to see if any precursory activity is seen on these scales.

[22] Using these parameters the CURATE method defines 163 sequences in the CVR comprising 2583 earthquakes (out of 4845), with individual sequences containing between Nmin = 4 and 380 events (Figure 5). Table 1 shows the number of sequences affected by each processing step. Only six earthquakes are added in the final processing step (the combination of two sequences of three earthquakes). Table 2 gives the range of changes made to individual sequences in each of the later steps of the method. We exclude steps one and two from the table as earthquakes kept and rejected at these stages depend more on the earthquake catalog itself than the method. Note that we do not allow sequences to exist entirely above the northern boundary so the northern boundary step cannot add sequences, and very few sequences are affected by the final boundary check. The mean, median, and standard deviation of Table 2 step 4b are the same, indicating that just two sequences have been joined together by the final application of the day rule. The median time added to the sequence duration is less than half of the day rule (1.2 vs. 3.5 days), which suggests that the selection parameter is not at a critical value; it is not being used to its maximum extent and is therefore reasonable. While the median number of earthquakes added is small, the maximum is large. This likely indicates that a large sequence was continuing with events below Mc (as suggested by the change in sequence size in Table 1). Table 3 gives the sequence statistics including the number of earthquakes, duration, Mmax (maximum magnitude), and area (an elliptical area enclosing all earthquakes in the sequence). The values indicate that the majority of identified sequences are small and over quickly. The low mean and median values for the area (relative to the possible area of the search radius, ~1256 km2) indicate that the large search radius for sequence earthquakes is not inflating sequence areas (Table 3). The three largest sequences in the catalog have all been identified by previous authors as likely swarm sequences [Hurst et al., 2008].

Figure 5.

Sequence abundance and distribution. All sequences are plotted at their mean locations and scaled and colored by the number of earthquakes they contain.

Table 1. Number of Sequences and Earthquakes for Different Sequence Selection Steps

| | Proto Sequences | Initial Sequences | Sequences |
|---|---|---|---|
| # sequences | 177 | 163 | 163 |
| # earthquakes | 2373 | 2577 | 2583 |
| Max # earthquakes in a sequence | 336 | 380 | 380 |
| # sequences above 10 | 47 | 43 | 43 |
| # sequences above 100 | 3 | 6 | 6 |

Table 2. Additions Made to Sequences by Different Processing Steps

| | Step 3: Day Rule | Step 4: Boundary Check | Step 4b: Reapplication of Day Rule |
|---|---|---|---|
| Number of sequences combined | | | |
| Number of earthquakes added | | | |
| Time (days) added to sequence duration | | | |
| Moment added to sequence (Nm) | | | |

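Table 3. Statistics of Key Sequence Parameters

| | Min | Max | Mean | Standard Deviation | Median |
|---|---|---|---|---|---|
| Duration (days) | < 1 | 56.95 | 3.46 | 7.22 | 0.00 |
| # Earthquakes | 4 | 380 | 16 | 39 | 4 |
| Area (km2) | < 1 | 930.87 | 94.62 | 142.72 | 34.24 |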

[23] Distance rules from 10 to 50 km and day rules from 3 to 7 days were also tested. The largest increase in the number of earthquakes included in sequences between any two distance rules occurred between 10 and 20 km (Table 4). The 10 km rule was too small to encompass all activity. The 20 km rule appeared to be the smallest distance rule capable of describing sequences in ways that matched published descriptions of that activity. We do not attempt to prove a “best” set of parameters. The sequence catalogs are not sensitive to the day and distance rule parameters (for further discussion see section 5.2). We will show in section 5.1 that we can reproduce foreshock probabilities previously calculated for this region. This gives us further confidence that the parameters presented here are reasonable.

Table 4. Number of Earthquakes in Catalogs With Varying Day and Distance Rules. Values in Bold Text Correspond to Catalogs Displayed in Figure 6 and Listed in Table 5
Radius (km)

Figure 6.

Cumulative number of earthquakes in the declustered catalogs with time for five different clustering techniques, some with multiple parameter values.

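Table 5. Parameters of Different Declustering Methods (Durations in Days, and Areas in km2)

| Method | # seqs | # seqs 4+ | # seqs 10+ | Largest seq # | Largest seq Mmax | Largest seq Duration | Largest seq Area | Area Mean | Area Median | Area Max | Duration Mean | Duration Median | Duration Max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SLC (C = 1, DST = 5) | 466 | 141 | 37 | 370 | 4.68 | 29.1 | 407 | 17.7 | 4.7 | 407 | 2.1 | 0.4 | 108.2 |
| SLC (C = 10, DST = 3) | 463 | 124 | 26 | 293 | 4.68 | 3.0 | 202 | 8.0 | 2.3 | 202 | 0.34 | 0.12 | 8.07 |
| SLC (C = 10, DST = 30) | 495 | 218 | 68 | 596 | 4.75 | 97.1 | 3290 | 429.9 | 134.1 | 5942 | 7.4 | 4.1 | 203.4 |
| Stochastic Declustering | 438 | 154 | 49 | 647 | 4.68 | 2410.0 | 2821 | 1068.8 | 6568306.08.54570.0 | | | | |
| CURATE (Nmin = 4) | 482 | 163 | 43 | 380 | 4.68 | 29.1 | 932 | 71.4 | 17.4 | 932 | 1.5 | 0.3 | 57.0 |
| CURATE 20_3 | 488 | 166 | 43 | 341 | 4.68 | 11.56 | 931.62 | 70.80 | 17.36 | 931.62 | 1.32 | 0.34 | 56.95 |
| CURATE 20_3.5 | 482 | 163 | 43 | 380 | 4.68 | 29.12 | 931.62 | 71.39 | 17.36 | 931.62 | 1.45 | 0.35 | 56.95 |
| CURATE 20_4 | 476 | 163 | 43 | 380 | 4.68 | 29.12 | 931.62 | 79.36 | 17.14 | 931.62 | 1.62 | 0.38 | 56.95 |
| CURATE 20_5 | 455 | 162 | 43 | 380 | 4.68 | 29.12 | 947.16 | 85.95 | 18.30 | 947.16 | 2.02 | 0.42 | 114.37 |
| CURATE 20_7 | 420 | 157 | 44 | 380 | 4.68 | 29.12 | 945.99 | 102.32 | 25.74 | 945.99 | 3.01 | 0.47 | 159.56 |
| CURATE 10_3.5 | 456 | 141 | 40 | 371 | 4.68 | 29.12 | 509.10 | 30.18 | 8.55 | 509.10 | 1.29 | 0.20 | 56.95 |
| CURATE 15_3.5 | 464 | 154 | 44 | 377 | 4.68 | 29.12 | 685.07 | 43.22 | 12.91 | 685.07 | 1.39 | 0.25 | 56.95 |
| CURATE 20_3.5 | 482 | 163 | 43 | 380 | 4.68 | 29.12 | 931.62 | 71.39 | 17.36 | 931.62 | 1.45 | 0.35 | 56.95 |
| CURATE 25_3.5 | 489 | 172 | 48 | 513 | 4.68 | 31.25 | 1974.16 | 132.70 | 26.96 | 1974.16 | 1.58 | 0.41 | 56.95 |
| CURATE 30_3.5 | 495 | 179 | 54 | 519 | 4.68 | 31.25 | 2471.66 | 181.30 | 32.58 | 2471.66 | 1.64 | 0.43 | 56.95 |
| CURATE 40_3.5 | 504 | 192 | 54 | 522 | 4.68 | 31.25 | 7526.54 | 386.24 | 65.03 | 7526.54 | 1.79 | 0.54 | 60.37 |
| CURATE 50_3.5 | 492 | 210 | 61 | 525 | 4.68 | 31.25 | 7686.64 | 667.43 | 135.66 | 7686.64 | 2.05 | 0.65 | 74.26 |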

4 Comparison With Other Methods

4.1 Link-Based Methods

[24] The single-link cluster (SLC) analysis of Frohlich and Davis [1990] was run using MATLAB scripts we generated. The SLC method defines a single space-time parameter as $d_{ST} = \sqrt{d^2 + C^2 \tau^2}$, where τ is the time between each pair of earthquakes, d is the corresponding distance between those earthquakes, and C (often given a value of 1 km/day) is a constant to convert the time to an equivalent space/time distance. This single space-time distance (dST) is used to link earthquakes. Thus, earthquakes closer in time can be farther apart in space for the same dST.
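The trade-off between time and distance can be illustrated with a short function. It assumes the Frohlich and Davis [1990] form dST = sqrt(d² + C²τ²), with d in km, τ in days, and C in km/day; the function and parameter names are chosen only for this example.

```python
import math


def space_time_distance(d_km: float, tau_days: float,
                        c_km_per_day: float = 1.0) -> float:
    """Single space-time distance between two earthquakes (in km)."""
    return math.sqrt(d_km ** 2 + (c_km_per_day * tau_days) ** 2)


# Two pairs with the same d_ST: the pair closer in time is farther apart in space.
near_in_space = space_time_distance(d_km=3.0, tau_days=4.0)
near_in_time = space_time_distance(d_km=4.0, tau_days=3.0)
print(near_in_space, near_in_time)
```

Raising C makes a given time separation count for more distance, which shortens the time span over which earthquakes can be linked for a fixed DST.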

[25] A value of dST is calculated for each earthquake pair in the catalog and each earthquake is then linked to the earthquake with the minimum dST value. This creates a set of trees. These trees are then linked by the shortest distance between any two trees until all earthquakes in the catalog are linked. Then, a limiting value DST is chosen and links greater than this value are severed to create distinct clusters. Davis and Frohlich [1991] give an equation for determining the best DST value to use to make clusters. The equation depends on the median link length of the catalog when all earthquakes are joined, and can be applied for median link lengths between 8 and 300 km. The close temporal proximity of lower magnitude earthquakes dramatically lowers the median link length even at high C values. The calculated median link lengths for the CVR catalog at Mc = 2.45 were 2.07–4.89 km for C values between 0.05 and 10 km/day; these are well below 8 km. This is likely to be a generic problem when interpreting results for local catalogs or at low cutoff magnitudes. Davis and Frohlich [1991] suggest that smaller DST values should be used in regions with higher background rates. This means that the maximum separation between clustered events is partially dependent on the background rate of seismicity. Because the temporal and spatial distances are treated equally, their errors are also treated equally. For catalogs with low Mc, the value of DST becomes very close to the location errors; hence, some of the decisions of whether to link earthquakes will be influenced by location error and will not be consistent if different times or regions of the catalog have different location errors (this is a separate issue from variations in Mc). Note that variation of the DST and C parameters can create a declustered catalog of almost any size. We present results at three different C/DST combinations to show the variation in the duration and the size of clusters with parameter choice. A discussion of the effect of these parameter choices is presented in section 5.2.
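The linking and severing steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the MATLAB scripts used in this study. It assumes the standard space-time distance dST = sqrt(d² + C²τ²), events given as hypothetical (time in days, x in km, y in km) tuples in a local Cartesian frame, and it relies on the property that severing all links longer than DST in a single-link tree should leave the same groups as joining every pair with dST ≤ DST. The function names are illustrative only.

```python
import math


def space_time_distance(d_km, tau_days, c_km_per_day=1.0):
    """Space-time distance d_ST = sqrt(d^2 + C^2 * tau^2), in km."""
    return math.sqrt(d_km ** 2 + (c_km_per_day * tau_days) ** 2)


def slc_clusters(events, c_km_per_day, d_st_max_km):
    """Group events connected by chains of links no longer than d_st_max_km.

    events: list of (t_days, x_km, y_km) tuples (hypothetical layout).
    Returns a list of clusters, each a list of event indices.
    """
    parent = list(range(len(events)))

    def find(i):
        # Union-find root lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(events)):
        ti, xi, yi = events[i]
        for j in range(i + 1, len(events)):
            tj, xj, yj = events[j]
            d = math.hypot(xi - xj, yi - yj)
            d_st = space_time_distance(d, abs(ti - tj), c_km_per_day)
            if d_st <= d_st_max_km:
                parent[find(i)] = find(j)  # keep this link

    groups = {}
    for i in range(len(events)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy catalog: three events 1 day and 1 km apart, plus one distant event.
toy = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 2.0, 0.0), (50.0, 40.0, 40.0)]
clusters_c1 = slc_clusters(toy, c_km_per_day=1.0, d_st_max_km=3.0)
clusters_c10 = slc_clusters(toy, c_km_per_day=10.0, d_st_max_km=3.0)
```

In this toy case the first three events form one cluster when C = 1 km/day, but raising C to 10 km/day with the same DST turns one day of separation into about 10 km of space-time distance, so every event becomes a singleton. This is the sensitivity to the C/DST pair noted in the text.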

[26] The other common link-based method is Reasenberg [1985], which we implemented using ZMAP codes by A. Allmann [Wiemer, 2001]. The Reasenberg method defines a maximum space and time “interaction zone” to look for earthquakes that may be related to each earthquake in the catalog. This distinguishes it from the SLC method because one earthquake can create multiple links to other earthquakes. Both the spatial and temporal zones assume typical mainshock-aftershock fault behavior. The time of the interaction zone is based on the maximum time (according to an expectation value derived from Omori's law decay) to observe the next earthquake in the sequence. For simplification this time takes on a maximum value of one day for earthquakes not already in a cluster and a maximum of ten days for events already associated with a cluster. The spatial interaction zone is based on the Kanamori and Anderson [1975] fault source dimension model multiplied by a scaling factor “Q” (generally equal to 10). Like the SLC method, this distance assumption limits the area of the defined sequence by assuming each earthquake is triggered by another earthquake in the catalog.

4.2 Window Method

[27] The Gardner and Knopoff [1974] method is a forward-looking window-based technique that focuses on the creation of the declustered catalog. The method sets magnitude-dependent space-time windows within which to remove earthquakes from the catalogs as clusters. We use the normal Gardner-Knopoff windows, which are conservative and err on the side of removing more earthquakes from the catalog with, e.g., (31 km/22 days) for M = 3.5, and (100 km/790 days) for M = 6.5. We ran the Gardner and Knopoff [1974] method using ZMAP codes by J. Woessner [Wiemer, 2001]. Gardner and Knopoff windows specifically for New Zealand were proposed by Savage and Rupp [2000]. Their windows are systematically larger (more conservative) than the normal parameters and we chose to have smaller clusters by using the smaller, original windows. Thus the results presented here will return a larger declustered catalog. The forward-looking nature of the algorithm separates some clear mainshock-aftershock sequences into two sequences due to other earthquakes occurring beforehand. In such cases the clusters themselves are not physically meaningful (arbitrary separation of a single sequence), but the desired declustered outcome is unchanged. The windowing technique also takes out large amounts of seismicity due to the time window involved. As we show below with stochastic reconstructions of the catalog, the decay time is probably largely overestimated for swarm activity (section 4.5).
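A forward-looking window pass of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the ZMAP implementation: `window_fn` is a hypothetical lookup from magnitude to (radius in km, duration in days), the toy step function below only reuses the two window sizes quoted above, and a later, larger event is left to open its own window.

```python
import math


def window_decluster(events, window_fn):
    """Forward-looking window declustering sketch.

    events: list of (t_days, x_km, y_km, magnitude), sorted by time.
    window_fn: hypothetical lookup, magnitude -> (radius_km, duration_days).
    Returns (background, clusters); clusters maps the index of the event
    that opened a window to the indices captured inside it.
    """
    removed = set()
    clusters = {}
    for i, (ti, xi, yi, mi) in enumerate(events):
        if i in removed:
            continue
        radius_km, duration_days = window_fn(mi)
        for j in range(i + 1, len(events)):
            tj, xj, yj, mj = events[j]
            if tj - ti > duration_days:
                break  # events are time sorted, so the window is closed
            if j in removed or mj > mi:
                continue  # a larger event opens its own window later
            if math.hypot(xj - xi, yj - yi) <= radius_km:
                removed.add(j)
                clusters.setdefault(i, []).append(j)
    background = [i for i in range(len(events)) if i not in removed]
    return background, clusters


def toy_window(magnitude):
    # Illustrative step function built from the two windows quoted in the text.
    return (100.0, 790.0) if magnitude >= 6.5 else (31.0, 22.0)


toy = [
    (0.0, 0.0, 0.0, 3.5),
    (1.0, 2.0, 0.0, 3.0),
    (2.0, 3.0, 0.0, 4.5),
    (3.0, 4.0, 0.0, 3.8),
    (400.0, 0.0, 0.0, 3.0),
]
background, clusters = window_decluster(toy, toy_window)
```

In this toy catalog the first four events are close in space and time, yet the M 3.5 event that precedes the M 4.5 event opens its own window, so the activity is split into two clusters. This mirrors the arbitrary separation of a single sequence described above.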

4.3 Probabilistic Method

[28] The final method we use for comparison is the stochastic declustering technique of Zhuang et al. [2002]. Stochastic declustering assigns a probability that each earthquake is a background event or an offspring of a previous earthquake in the catalog. Earthquakes are then put into clusters based on these probabilities. The method estimates the background intensity μ(x,y) by first estimating the total spatial intensity (variable kernel method) and modeling the branching structure (assigned using modeled ETAS model parameters for the specific earthquake catalog to be analyzed as inputs into a productivity function with Omori's law temporal decay). The ETAS model is a point process model that consists of a background process plus a modified Omori's law for temporal decay with a magnitude-dependent abundance defined as

$$\kappa(M) = A\,e^{\alpha (M - M_c)} \qquad (3)$$

where α is the efficiency of an earthquake to produce aftershocks, A is the productivity, and Mc is the cutoff magnitude of the earthquake catalog [Ogata, 1998].

[29] That is to say, each time an event, namely (ti, xi, yi, Mi), occurs, it triggers a nonstationary Poisson process with a rate κ(Mi)g(t − ti)f(x − xi), where g represents a probability density function corresponding to Omori's law, and f is the probability density function for the locations of the directly triggered events. Here we refer to Zhuang et al. [2008] for details. The ETAS parameters modeled for the CVR (α = 0.74, A = 0.66) are relatively low for productivity values, but are consistent with previous models that include the CVR [Zhuang et al., 2002]. However, this CVR α-value is much less than the α-value for the whole New Zealand region in Zhuang et al. [2008], which is 1.92 for all shallow events (depth < 40 km) for M4.3+ and 1.75 for M4.0+, implying that the difference in triggering productivity between events of different magnitudes in the CVR is not as significant as in other parts of New Zealand. Other possible reasons for the low α value are discussed in section 4.5.
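To make the effect of a low α concrete, the short sketch below evaluates the productivity for the CVR parameters quoted above. It assumes the exponential ETAS form κ(M) = A exp[α(M − Mc)] of Ogata [1998] and Mc = 2.45; the function name is illustrative only.

```python
import math


def productivity(magnitude, a, alpha, m_c):
    """Expected number of direct offspring, kappa(M) = A * exp(alpha * (M - Mc))."""
    return a * math.exp(alpha * (magnitude - m_c))


# CVR parameters quoted in the text; Mc is the catalog cutoff magnitude.
cvr = {"a": 0.66, "alpha": 0.74, "m_c": 2.45}
kappa_by_magnitude = {m: productivity(m, **cvr) for m in (2.45, 3.45, 4.45)}

# Growth in productivity per unit of magnitude depends only on alpha.
ratio_cvr = math.exp(0.74)  # CVR
ratio_nz = math.exp(1.92)   # whole New Zealand region, M4.3+ fit
```

Under this form, one unit of magnitude raises the expected number of direct offspring by a factor of roughly 2 in the CVR, compared with a factor of nearly 7 for the whole New Zealand region, which is one way to read the statement that productivity differs less between magnitudes in the CVR.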

[30] The probability that an event belongs to the class of triggered events can be estimated as the proportion of the contribution of the triggering by all the previous events to the total occurrence rate. Once probabilities of being an offspring event (pj) have been constructed for each earthquake, background earthquakes can be (nonuniquely) separated from offspring events stochastically by generating a set of uniform random numbers U on the interval [0, 1] and assigning each earthquake (j) as background if Uj < 1 – pj, and all others as offspring. Similarly, to create specific clusters, the ancestor of any offspring event can be found by taking the earliest earthquake (smallest value of I) such that $U_j < 1 - p_j + \sum_{i=1}^{I} p_{i,j}$, where the probability that j is an offspring of event i, pi,j, is dependent on the magnitude, time, and spatial position of earthquakes i and j. See Zhuang et al. [2002, 2004] for a full description of the method. Due to the stochastic nature of the simulations, each declustering run is different. Zhuang et al. [2002] observed that most earthquakes (70–80%) show clear probabilities (pj < 0.1 or pj > 0.9) of being either a background or a clustered event. Thus, the exact number of earthquakes (and hence the area and duration) of individual sequences will vary, but the overall number of sequences and their relative sizes are similar between simulations.
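The random assignment step can be sketched as below. This is a minimal illustration that assumes the pairwise probabilities pi,j are already available (here as a hypothetical nested list `p_pair`) and that ancestors are chosen with a cumulative-sum rule, where event j is background if Uj < 1 − pj and otherwise takes the earliest i whose running sum exceeds Uj. The names and data layout are assumptions made for the example.

```python
import random


def assign_ancestors(p_pair, rng):
    """Draw one stochastic declustering realization.

    p_pair[j][i]: probability that event j is a direct offspring of the
    earlier event i (i < j); the remainder 1 - sum(p_pair[j]) is the
    background probability. Returns None (background) or an ancestor index
    for every event.
    """
    ancestors = []
    for probs in p_pair:
        cumulative = 1.0 - sum(probs)  # background probability 1 - p_j
        u = rng.random()
        ancestor = None
        if u >= cumulative:
            for i, p_ij in enumerate(probs):
                cumulative += p_ij
                if u < cumulative:
                    ancestor = i
                    break
            else:
                ancestor = len(probs) - 1  # guard against rounding error
        ancestors.append(ancestor)
    return ancestors


def build_clusters(ancestors):
    """Follow ancestor links back to each background (root) event."""
    roots, clusters = [], {}
    for j, a in enumerate(ancestors):
        root = j if a is None else roots[a]
        roots.append(root)
        clusters.setdefault(root, []).append(j)
    return clusters


# Deterministic toy case: event 1 is certainly triggered by event 0,
# and event 2 is certainly triggered by event 1.
toy_p = [[], [1.0], [0.0, 1.0]]
toy_ancestors = assign_ancestors(toy_p, random.Random(0))
toy_clusters = build_clusters(toy_ancestors)
```

With intermediate probabilities, repeated calls with different random seeds give different cluster memberships, which is why each declustering run differs.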

4.4 General Observations

[31] In the following section the CURATE sequence catalog has been set to Nmin = 2 to include the possibility of clusters as small as two events, which occur in other methods. Figure 6 shows the range of declustered catalogs (seismicity identified as background) for the different methods. The CURATE declustered catalog falls within the range of the standard methods presented. The cumulative curves for the Reasenberg [1985] and Single-Link-Clustering (C/DST = 10/3) [Davis and Frohlich, 1991] declustered catalogs are not as smooth as the other curves, indicating that these methods may not be identifying all clustering activity (see Figure 6). At the other end of the range, Stochastic Declustering, Gardner and Knopoff, and SLC (C/DST = 10/30) have assigned the most events to sequences and have the smallest declustered catalogs. Table 5 gives a comparison of the area, duration, and parameters of the largest identified sequence for all five methods and for varying selection parameters of the CURATE method. The assumptions about earthquake-earthquake interaction and causation tend to give the Reasenberg catalog more short clusters, but where larger clusters develop, the durations are larger than observed in the CURATE catalog. The Gardner-Knopoff and Stochastic Declustering methods give longer durations and larger areas than the CURATE and link-based methods. Figure 7 shows duration histograms that more clearly illustrate the differences between the sequences defined by the different methods. The CURATE and link-based methods assign most sequences durations of one day or less and have very few sequences lasting longer than a week. More conservative techniques (Gardner-Knopoff, Stochastic Declustering, and Single-Link-Clustering with a large C/DST value) define sequence catalogs with a broader range of durations, with durations up to years long.

Figure 7.

Duration histograms for clusters defined by five different clustering methods, some with multiple parameter values.

4.5 Stochastic Reconstruction

[32] We present the results of a reconstructed synthetic catalog to fully compare the ability of an aftershock-based model to describe the seismicity in the CVR. If aftershock models and temporal decay assumptions are appropriate, then a synthetic catalog generated using ETAS parameters should match the general characteristics of the observed seismicity. A simplified description of the reconstruction method is included here. See Zhuang et al. [2004] for a complete description of stochastic reconstruction. Stochastic reconstruction utilizes the same principles as stochastic declustering. It is based on the idea that if the background process can be modeled and the aftershock productivity and its associated time and spatial functions are known, then it should be possible to stochastically create a synthetic earthquake catalog. The technique introduced by Zhuang et al. [2004] uses ETAS parameters as input to the productivity function of a branching structure with Omori's law temporal decay. Once background events are determined, a Gaussian deviation is added to the locations of those background events. The times of those background events are kept, but the locations are randomly reordered and a magnitude is assigned to each background time by resampling the magnitudes of all events in the target catalog. To account for boundary effects, a larger space/time window around the target catalog is used to carry out the simulation. These background events are then allowed to produce offspring according to the same formulations used in stochastic declustering with an ETAS branching process, with their temporal occurrence following Omori's law and their spatial distribution governed by a long-range inverse power decay. The reconstruction method allows for direct comparisons between the model assumptions and the real catalog data. Here the polygon (–37°, 175°; –39.9°, 175°; –39.9°, 175.25°; –37.25°, 178.25°; –37°, 178.25°) is used to account for boundary effects, and the catalog from 1987 through 1992 is used to inform the background process history. We then extracted all reconstructed events within the target boundaries (–39.7°, 175.25°; –37.65°, 178°; –37.65°, 175.25°) and times (1993–2007.5) of the catalog analyzed earlier. Ten simulations were run and returned between 5104 and 5364 earthquakes. A cumulative time plot of the reconstructed catalogs and the observed catalog is shown in Figure 8, with insets showing the calculated CURATE for the real catalog and for one of the reconstructed catalogs (5231 events). A clear deficiency of temporal clustering is seen in the reconstructed catalog.

Figure 8.

Cumulative number of events with time for the real and ten stochastically reconstructed catalogs. Note the relatively uniform slope of the reconstructed catalogs with very weak temporal clustering. Insets (A) and (B) show the CURATE plots for the real catalog and one of the stochastic reconstructions.

[33] This lack of clustering may result from an underestimate of α. A number of studies on smaller regions have shown that misfits of ETAS models may sometimes be corrected by using a background function μ(x,y) that is temporally nonstationary [Hainzl and Ogata, 2005; Lombardi et al., 2006, 2010; Lohman and McGuire, 2007; Llenos et al., 2009]. Our results show that this problem is also observable on a catalog-wide scale over long time periods. Note that while the parameter μ(x,y) is often referred to as the background rate, in a time-varying context it is much more accurately the rate of independent events, and not the background rate in a traditional sense. These previous studies have shown that the number of events with time (and temporal clustering) can be reproduced by allowing μ(x,y) to vary with time. However, an increase in the rate of independent events would also alter the results of stochastic declustering and limit the ability to identify individual swarm clusters, with more individual clusters likely during periods with increased rates of independent events. The inability of a temporally stationary μ(x,y) to characterize our data set also suggests that catalog-scale analysis methods that use a single set of aftershock decay parameters (time-invariant ETAS, Reasenberg, Gardner-Knopoff) are unlikely to be useful in assessing sequence behavior in swarm-dominated regions.

4.6 Further Comparison With Link-Based Methods

[34] Most of the declustered catalogs produced by the two link-based methods are larger than, but similar in size to, the declustered catalog of the CURATE method (Figure 6). The SLC method with parameters of C = 10 and DST = 3 is in this range; however, the cumulative declustered catalog (Figure 6) is not as smooth, and Figure 6 and Table 5 demonstrate that those parameters limit almost all sequence durations to less than 3 days. Other possible SLC (C/DST) combinations are considered in section 5.2. The Reasenberg method and the SLC method with parameters of C = 1 and DST = 5 are the closest matches to our CURATE method in terms of the largest sequence and the total number of sequences (Table 5), as well as the duration distributions (Figure 7), so we have looked at those two sequence catalogs in more detail. As noted above, the CURATE sequence catalog is shown at Nmin = 2. Despite the different approaches of these two methods and the CURATE method, a few sequences are identical in all three sequence catalogs. These are all sequences that occur over short timescales and are extremely close in space (day/km range). Figure 9 shows a direct comparison of methods for time periods around two large sequences near Haroharo in the Okataina caldera complex (Figure 3). The larger 1998 sequence (Figure 9, 3–4) is described by Hurst et al. [2008] as the “Haroharo sequence.”

Figure 9.

Map and magnitude comparisons for time periods around two large sequences near Haroharo: (1–2) November 1997 and (3–4) February 1998. Columns 1 and 3 are magnitude with time plots for each of three clustering methods: (A) CURATE, (B) Reasenberg, (C) SLC (C = 1, DST = 5). Columns 2 and 4 are map views of the same sequences shown in columns 1 and 3. The colors and shapes in (A) each mark distinct sequences identified by that method. Shapes in plots (B) and (C) represent sequences that contain earthquakes that overlap with the sequences identified by the CURATE method (A); colors in (B) and (C) still identify distinct sequences identified by these methods. The largest sequence in each time period is represented by black circles. Colors and shapes do not translate between columns 1–2 and 3–4. See text for further explanation.

[35] There are three types of differences seen in Figure 9. The first is the difference caused by assigning time decay/physical causality to preceding seismicity. The Reasenberg method is the only method of these three that explicitly assumes temporal decay in keeping with aftershocks and Omori's law. This can be seen in Figure 9 (B1) by the assignment (black dots) of five more earthquakes to the main sequence in the 20 days following the bulk of the activity. This effect is also apparent in Figure 9 (B3), where the blue star sequence that starts around day 70 continues after day 80 with three more earthquakes not identified by the CURATE (A3) or the SLC (C3) methods.

[36] The second major difference apparent in Figure 9 is a lack of events in some sequences, arising from the way spatially diffuse increases in seismicity and migration of events are dealt with. The CURATE method searches an entire area over the time period of interest, whereas the two link-based methods assume that all earthquakes are caused by other earthquakes in the catalog, so inter-earthquake distances must be within a certain maximum length. This spatial lack of events is most evident in the Reasenberg method (Figure 9, B2 and B4), with fewer events overall and a lack of events away from the densest part of clusters. Not including these more distant or sparse events in the sequence catalog is one of the reasons why the Reasenberg declustered catalog (Figure 6) has more events than that of any other method presented here. Although this effect is less evident in the SLC method (Figure 9, C), it is worth noting that in both the 1997 time period (A1–2, blue diamonds) and the 1998 time period (A3–4, green triangles) there is a sequence of two distant events preceding the main sequence that is not identified by either Reasenberg or SLC.

[37] Finally, some sequences that the CURATE method identifies as a single sequence are separated into two or more sequences on the basis of their lack of spatial proximity (spatial break). This division is often not based on the area over which the sequence is occurring but arises from the nature of link-based methods that necessarily link successive earthquakes. In Figure 9(1–2) a single sequence of 13 events (blue stars) is identified in the CURATE method (A1–2). The same sequence appears as eight events (stars) in three separate sequences in the SLC (C1–2) method. The effect is also seen in Figure 9(3–4). Because we do not have a priori knowledge of the area over which to expect swarm sequences, it is desirable to look for precursors to large-scale activity over the same spatial area as the anticipated activity. The spatial break is most evident in the SLC method because each earthquake only shares a single link; however, Figure 9 (B) shows that the Reasenberg method also contains traces of this effect. The Reasenberg method (Figure 9, B3–4) does not include two events (green circles) in the larger main sequence (black circles) even though the events look spatially proximal to that activity. This effect can arise if the main activity migrates with time while decay continues in the original regions. Thus, it appears that the CURATE method's independent treatment of distance may capture the decay of sequences better than the link-based methods. In addition to possible decay, the CURATE method allows for larger sequence areas that are independent of the spatial succession of events. In contrast to the link-based methods, the Stochastic Declustering method combines both large sequences in Figure 9 into a single sequence lasting over 5 years.

5 Testing Sequence Selection Parameters

[38] Next we analyze the robustness of the selection parameters presented here for CURATE. We also propose a method for testing whether the chosen selection parameters are reasonable.

5.1 Comparison With Previous Foreshock Results

[39] To test the consistency of our sequence definitions and the resulting sequences, we have attempted to replicate results of a recent study of foreshocks in the TVZ conducted by Tormann et al. [2008]. They define a foreshock as “an earthquake that is followed by an event of equal or larger magnitude within 5 days and 50 km from the initial event.” They used a catalog of earthquakes with magnitudes ≥ 4.0 from 1964 to 2007 (they also determined that their findings are independent of the time period chosen). Tormann et al. [2008] used a magnitude and time-dependent window to remove aftershocks based on the Gardner and Knopoff [1974] method modified for New Zealand by Savage and Rupp [2000]. We have compared our sequences to their findings by estimating the empirical probability of foreshocks occurring in our sequence catalog. We assume that any foreshock-mainshock pair will be contained in a single sequence. We do not allow for a foreshock to be an earthquake outside of the sequences. A foreshock sequence then will be any sequence with a maximum magnitude M ≥ 4.0 event with at least one smaller M ≥ 4.0 event preceding it within 5 days and 50 km in the same sequence. Multiple foreshocks will not be taken into account. The number of possible foreshock sequences will be composed of sequences with at least one M ≥ 4.0 event and any M ≥ 4.0 earthquakes that are not part of any sequence (declustered catalog). The M ≥ 4.0 events in the declustered catalog are not in a sequence and therefore are not considered foreshocks in this analysis, so the foreshock probability can only reach 1 if there are no M ≥ 4.0 earthquakes in the declustered catalog. The boundaries of our catalog are larger than those used by Tormann et al. [2008] so first we limited our search to earthquakes and sequences that are within the TVZ boundaries (–37.00°, 175.85°; –39.29°, 175.55°; –37.50°, 177.40°). Then we identified 16 sequences within those boundaries that contained at least one M ≥ 4.0 event. 
The number of M ≥ 4.0 earthquakes (in the same area) in the declustered catalog was six. Using the constraints above, there were four sequences with at least one foreshock. This gives us an empirical probability of ~18.2% (4/22), which is within the 24.2% ± 7.7% range of Tormann et al. [2008].
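The arithmetic behind this estimate can be checked with a few lines of Python. The counts are those given above; the only assumption is that the range quoted from Tormann et al. [2008] is read as 24.2% plus or minus 7.7%.

```python
# Counts reported in the text for the TVZ boundaries.
foreshock_sequences = 4   # sequences with at least one M >= 4.0 foreshock
sequences_with_m4 = 16    # sequences containing at least one M >= 4.0 event
declustered_m4 = 6        # M >= 4.0 events outside any sequence

# Every M >= 4.0 sequence or isolated event is a possible foreshock case.
candidates = sequences_with_m4 + declustered_m4
probability = foreshock_sequences / candidates

# Assumed reading of the published range: 24.2% plus or minus 7.7%.
low, high = 0.242 - 0.077, 0.242 + 0.077
within_range = low <= probability <= high
```

Because the six isolated M ≥ 4.0 events enter the denominator but can never count as foreshocks, the estimate is bounded above by 16/22 for this catalog.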

5.2 Temporal Distribution of the Declustered Catalog

[40] It has long been suggested that the background rate of seismicity is random in time and should follow a Poisson distribution [e.g., Gardner and Knopoff, 1974; Ogata, 1988; Vere-Jones, 1970]. At this stage we do not offer a method for optimizing the sequence selection parameters (catalog boundaries, distance rule, day rule), but we suggest that a Poisson test of the declustered catalog can be used to test whether a given set of selection parameters is appropriate. This will not give a unique set of parameters, but it can identify unreasonable selection parameters and catalogs whose mean rate is unrepresentative of a background rate. This approach may be developed to find optimized parameters by identifying the best fit to a Poisson distribution.

[41] Luen and Stark [2012] recently suggested that some declustering methods may remove too much seismicity to achieve a Poisson result. If too many earthquakes are removed from the catalog, the number of aftershocks expected for each earthquake will be overestimated and the rate of mainshocks will be underestimated. It is therefore ideal to leave in as much seismicity as possible.

[42] We have used two different methods to test the declustered catalogs presented in section 4.4 for a temporal Poisson distribution. The first method utilizes the χ2 test as used in Gardner and Knopoff [1974]. This consists of binning the seismicity into 10 day time windows and counting the number of earthquakes that occur in each bin. To test the influence of the choice of bin, we have incremented the starting bin by the minimum interevent time in the declustered catalog (0.05 days) for start times between 0 and 10 days. This gives us 210 different bin starts. For each possible bin start the seismicity is tabulated over 10 day windows and the mean is used to calculate a Poisson distribution of those window counts. If the position of the bins does not affect the results, the fraction of bin configurations that pass the test will be binary: either 0 (completely non-Poisson) or 1 (completely Poisson). Another property of a Poisson distribution is that its mean should be equal to its variance [Dixon and Massey, 1968]. Thus, the variance divided by the mean (dispersion) should be insignificantly different from one. The second test simply calculates the dispersion for each set of bins. According to Vere-Jones [1970], overdispersion (> 1) indicates some degree of clustering, and under-dispersion (< 1) indicates more regularity than a Poisson distribution.
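The dispersion test with shifted bin starts can be sketched as follows. This is a minimal illustration rather than the code used for Figure 10: it assumes event times in days, uses the sample variance, drops any incomplete final window, and leaves out the χ2 comparison against the Poisson probabilities. The function names are illustrative only.

```python
from statistics import mean, variance


def bin_counts(times, start, width, end):
    """Count events in consecutive windows of `width` days from `start`."""
    n_bins = int((end - start) // width)  # incomplete last window is dropped
    counts = [0] * n_bins
    for t in times:
        k = int((t - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


def dispersion(counts):
    """Variance-to-mean ratio; close to 1 for Poisson-distributed counts."""
    return variance(counts) / mean(counts)


def dispersion_by_offset(times, width, step, end):
    """Dispersion for every bin start between 0 and `width` in `step` days."""
    n_offsets = int(round(width / step))
    return [
        dispersion(bin_counts(times, k * step, width, end))
        for k in range(n_offsets)
    ]


# Two toy extremes over 100 days of 10 day windows.
regular = [i + 0.5 for i in range(100)]      # exactly one event per day
clustered = [0.01 * i for i in range(50)]    # 50 events in the first half day
d_regular = dispersion(bin_counts(regular, 0.0, 10.0, 100.0))
d_clustered = dispersion(bin_counts(clustered, 0.0, 10.0, 100.0))
d_offsets = dispersion_by_offset(regular, width=10.0, step=2.5, end=100.0)
```

The perfectly regular toy catalog is under-dispersed (dispersion of zero) for every bin start, while the single burst is strongly overdispersed. A real declustered catalog would be expected to sit between these extremes, with values near one indicating a Poisson-like background.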

[43] A surprisingly low percent of catalogs returned a Poisson result when using the entire time period (Figure 10, A2). It looks possible from the declustered catalog rates (Figure 6) that there is a significant change in rate before and after 1998. The non-binary results in the Poisson fraction (Figure 10) also show that the position of the bins does have an effect on whether the result passes the χ2 test.

Figure 10.

Poisson fits for declustered catalogs. (A–C) Cumulative distribution of values of variance/mean (dispersion) for 210 different binning configurations of 10 day windows for the three time periods marked (e.g., A: 1993–2007.5), colored by method in the key. Poisson distribution should have a dispersion of one. A2-C2 show the relative fraction of those 210 configurations that return a χ2 value consistent with a Poisson distribution. Methods are ordered by the number of earthquakes in the declustered catalog from smallest to largest. The lack of a colored bar indicates no configurations returned a Poisson result.

[44] The Reasenberg catalog returns the highest dispersion values and never returns a Poisson result for any time period (Figure 10). It is unsurprising that the Reasenberg declustered catalog is not Poisson given that it has the largest declustered catalog. The maps in Figure 9 (low amounts of clustered seismicity in B compared to A and C) also show how much temporally clustered seismicity is left in the declustered catalog. Van Stiphout et al. [2012] ran several declustering algorithms on the ANSS catalog in the California Collaboratory for the Study of Earthquake Predictability (CSEP) testing region and found that varying the parameters of the Reasenberg and Gardner and Knopoff methods has little impact on the χ2 test results. Although their conclusions suggest that other parameter choices for the Reasenberg method would not influence our results, we did run several tests of different p, Q, and τ values to ensure that we had not chosen an unfairly large catalog. The smallest declustered catalog produced from the range of parameters tested was with p/Q/τ values of 0.95/100/10, which gave a declustered catalog of 1944 earthquakes. This value of Q is beyond those typically tested, yet even with this catalog the method gives a median dispersion value of 1.45 and still does not return any Poisson result. This matches the conclusions of Van Stiphout et al. [2012] and other authors indicating that the Reasenberg method rarely returns a Poisson declustered catalog and implies that there is something in the method itself, not just the large size of this declustered catalog that leads to the high dispersion value.

[45] Our results for the Gardner and Knopoff method also match those found by Van Stiphout et al. [2012], with that method returning a high percentage of Poisson results. However, by tracking the dispersion values directly, we find that the Gardner-Knopoff method is consistently under-dispersed. It is tempting to think that this is due to an overabundance of long time intervals (low number of events in the declustered catalog), but the stochastic declustering method (branching ratio 0.7198) also has a relatively small declustered catalog and returns dispersion values that are either near one or slightly overdispersed. The reason for this is unclear and warrants further investigation. The CURATE method has dispersion values that are closer to one than those of Gardner and Knopoff, except in the early time period.

[46] The SLC method does not perform well in the overall catalog or in the early time period (1993–1997). We ran the two Poisson tests on six different C/DST pairs, which gave definitions of the largest sequence that were similar to the CURATE method. Three of the declustered catalogs (C/DST = 1/10, 3/10, 10/15) have fewer events than the CURATE declustered catalog, and three (C/DST = 1/5, 0.25/3, 0.05/3) have more events. The results of the tests and the sizes of the declustered catalogs are given in Table 6, where the median value is given to represent the dispersion test. The catalog presented for earlier comparison, C/DST = 1/5, is highlighted with bold text in Table 6. As expected based on the catalog sizes, the three largest declustered SLC catalogs give the largest dispersion values. The dispersion values of catalogs with larger DST values do not strictly increase with increasing catalog size. Increases in C for the same DST cause a corresponding increase in dispersion, whereas increases in DST for the same C value lead to a decrease in dispersion. The poor performance of the small DST catalogs in the early time period may indicate that the increased rate of seismicity in the early time period was due to relatively dispersed seismicity. Table 6 also shows that the CURATE method performs reasonably well over a range of selection parameters. For the whole time period (1993–2007.5) and the early time period (1993–1997), all the SLC catalogs are overdispersed and return higher dispersion values than the CURATE method.

Table 6. Results of Poisson Testing for Different SLC and CURATE Declustered Catalogs. The SLC and CURATE Catalogs Considered Throughout the Text are Highlighted in Bold Text Throughout the Row
Method | C / Dist-Rule | DST / Day-Rule | Catalog Size | Dispersion | % Poisson | Dispersion | % Poisson | Dispersion | % Poisson

[47] The range of CURATE selection parameters shows that several possible selection parameters return better Poisson results than our initial selection parameters. We have elected to use the smaller (20 km) distance rule to match previous descriptions of large seismicity. The day-rule analysis shows that we may be able to improve our results by using slightly higher day rules of 4 or 5 days. While these day rules give slightly better values, note that the size of the declustered catalogs varies by less than 3.0% from that reported for the 20 km/3.5 day rules. Tables like this, paired with information about the scale of activity in a data set, will allow users of the CURATE method to find parameter ranges that are appropriate. Small distance rules (< 5 km) may be appropriate for small studies of microseismicity, but care should be taken when the distance rule approaches the size of the location errors.

6 Discussion: Utility of Sequence Catalogs

[48] The map view of the sequence catalog (Figure 5) provides us with a first-order look at sequence activity in the study region. One initial observation is the apparent increased likelihood for larger sequences to occur in the northern part of our study region (Figure 5). This is unsurprising given the larger area and much denser distribution of faults in the northern part of the TVZ [Villamor and Berryman, 2006]. The largest sequence south of Lake Taupo is a sequence to the west of Ruapehu volcano (Figure 3), preceding its 1995 eruption [Hurst and McGinty, 1999]. Although Hurst and McGinty [1999] refer to this activity as “swarms,” we note that the largest of these sequences picked out by our method has many characteristics of a mainshock-aftershock sequence. The largest event (M = 4.8) occurs as the 9th of 37 earthquakes and has a magnitude separation of 0.86 (high for the TVZ), and the sequence exhibits temporal decay after the occurrence of the largest event. All of these characteristics seem to indicate it may be a foreshock-mainshock-aftershock sequence. Its most unusual feature is its proximity in time to the later eruptive activity at Ruapehu. Other distant (10–30 km) sequences preceding eruptions have been documented [Fisher et al., 2009; Umakoshi et al., 2001]. In all cases links to the subsequent eruptive/magmatic activity are not conclusive; however, all reported cases show the preceding seismicity to be anomalous compared to previously recorded decades of seismicity. Hurst and McGinty [1999] suggested that a broader region around volcanoes be monitored for pre-eruptive seismicity. We propose that sequence catalogs created with the CURATE method are an ideal way to facilitate such observations. Even without probabilistic modeling, sequence catalogs can be used as a first-order tool to identify anomalous behavior.

7 Conclusions

[49] We have developed CURATE, a new clustering method that is more general than standard clustering techniques in that no specific causality is assumed. The use of earthquake rate as the primary indicator of activity allows us to vary the distance search parameter more independently than other methods. This independent treatment of distance will enable better identification of sparse increases in activity and lead to the better categorization of decay of sequences. It will allow us to investigate the spatial scale of any potential precursory activity.

[50] Applying the method to a data set from the Central Volcanic Region of New Zealand, we identified small earthquake sequences preceding two large swarms in the Haroharo region. If other large swarm sequences are found to have small precursors, it could have implications for hazard assessments and investigations of underlying physical processes. Anomalous earthquake sequences can readily be identified using sequence catalogs.

[51] A stochastic reconstruction, based on ETAS parameters, indicates that methods with a single set of temporal decay assumptions will put background earthquakes into clusters more frequently in areas or magnitude ranges with swarm activity. The lack of temporal clustering in the reconstructed catalogs replicates, on a large spatial and temporal scale, results from small spatio-temporal data sets [e.g., Hainzl and Ogata, 2005; Lombardi et al., 2006], which have shown that ETAS models with temporally stationary background rates cannot replicate swarms well.

[52] Despite the novelty of the CURATE method, it produces a declustered catalog that is consistent with those of other clustering algorithms. We propose Poisson testing of the declustered catalog as a check that the chosen selection parameters are reasonable. Testing of the declustered catalog presented here also shows that the CURATE method will add to the suite of clustering tools already available. The use of sequence catalogs generated by the CURATE method in seismically active areas will improve the ability to identify the timing and scale of anomalous behavior and provide useful parameters for incorporating earthquake swarm information into earthquake forecasts and hazard assessments.
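As an illustration of what a Poisson check on a declustered catalog can look like, the sketch below computes the index of dispersion (variance-to-mean ratio) of event counts in equal-length time bins. This is one simple option and is not necessarily the test applied in this study. For a temporally homogeneous Poisson process the ratio is close to 1; the p-value uses a large-sample normal approximation to the chi-square distribution of the statistic, so it should be treated as indicative only. The function name `dispersion_test` and the choice of `n_bins` are assumptions.

```python
import math
from typing import Sequence, Tuple


def dispersion_test(
    times: Sequence[float], t_start: float, t_end: float, n_bins: int
) -> Tuple[float, float]:
    """Index-of-dispersion check for Poisson behaviour in time.

    Returns (D, p), where D = variance / mean of the binned counts and p is
    a two-sided p-value from the normal approximation
    z = (D - 1) * sqrt((n_bins - 1) / 2), which holds for many bins.
    """
    if n_bins < 2 or t_end <= t_start:
        raise ValueError("need n_bins >= 2 and t_end > t_start")
    width = (t_end - t_start) / n_bins
    counts = [0] * n_bins
    for t in times:
        k = min(int((t - t_start) / width), n_bins - 1)  # last edge is inclusive
        counts[k] += 1
    mean = sum(counts) / n_bins
    if mean == 0:
        raise ValueError("catalog is empty")
    var = sum((c - mean) ** 2 for c in counts) / (n_bins - 1)
    d = var / mean
    z = (d - 1.0) * math.sqrt((n_bins - 1) / 2.0)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return d, p
```

A dispersion well above 1 with a small p-value suggests that residual clustering remains in the declustered catalog, whereas a value well below 1 suggests that the catalog is more regular than a Poisson process, as might happen if too many events were removed. The result depends on the bin length, so it is worth repeating the check for several values of `n_bins`.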


[53] The authors thank Thomas VanStiphout and two anonymous reviewers for useful reviews of the manuscript. We would like to thank the New Zealand Earthquake Commission (EQC) for funding this project and to acknowledge the New Zealand GeoNet project and its sponsors EQC, GNS Science and LINZ, for providing all seismic data used in this study.