Keywords:

  • paleointensity;
  • error analysis

Abstract


[1] Determining the strength of the ancient geomagnetic field (paleointensity) can be time-consuming and can result in high data rejection rates. The current paleointensity database is therefore dominated by studies that contain only a small number of paleomagnetic samples (n). It is desirable to estimate how many samples are required to obtain a reliable estimate of the true paleointensity and the uncertainty associated with that estimate. Assuming that real paleointensity data are normally distributed, an assumption adopted by most workers when they employ the arithmetic mean and standard deviation to characterize their data, we can use distribution theory to address this question. Our calculations indicate that if we wish to have 95% confidence that an estimated mean falls within a ±10% interval about the true mean, as many as 24 paleomagnetic samples are required. This is an unfeasibly high number for typical paleointensity studies. Given that most paleointensity studies have small n, adequately defined confidence intervals around estimated means are essential. We demonstrate that the estimated standard deviation is a poor basis for defining confidence intervals when n < 7. Instead, the standard error should be used to provide a 95% confidence interval, thus facilitating consistent comparison between data sets of different sizes. The estimated standard deviation should, however, retain its role as a data selection criterion because it is a measure of the fidelity of a paleomagnetic recorder. To ensure consistent confidence levels, within-site consistency criteria must depend on n. Defining such a criterion using the 95% confidence level results in the rejection of ∼56% of all currently available paleointensity data entries.

1. Introduction


[2] Obtaining detailed information about past geomagnetic field behavior is key to our understanding of the geodynamo and its evolution. However, obtaining reliable estimates of paleofield strength (paleointensity) is problematic and suffers from high failure rates [e.g., Perrin, 1998; Riisager et al., 2002]. Many studies therefore contain only a small number of paleomagnetic samples (Figure 1), sometimes too few to estimate the uncertainty in the mean result (i.e., n = 1). Currently, >70% of entries in the PINT08 database [Biggin et al., 2009] are based on four paleomagnetic samples or fewer (n ≤ 4; Figure 1). This brings the reliability of paleointensity estimates based on small data sets into question.

Figure 1. Histogram of paleointensity data entries from the PINT08 database [Biggin et al., 2009]. Over 70% of the data entries have n ≤ 4. An additional 71 entries do not report n.

[3] If we consider paleointensity data to be normally distributed, the probability that an estimated mean (m) falls within ±10% of the true mean (μ) can be calculated from the normal cumulative distribution function (Figure 2a). If we choose a commonly applied within-site consistency criterion, that the true standard deviation (σ) must be ≤25% of the true mean, we can calculate the number of paleomagnetic samples required for an accurate estimated mean at the 95% confidence level. For the worst-case scenario, when σ/μ = 0.25, n must be ≥24 for normally distributed data. Biggin et al. [2003], also assuming normality, estimated the number of paleomagnetic samples required for 95% confidence that m falls within ±10% of μ. Using historical data sets, they estimated that at least 6–22 paleomagnetic samples were required to achieve this. Our generally applicable number is larger than the data set–specific values given by Biggin et al. [2003]. Regardless, such large data sets are uncommon in paleointensity studies; confidence limits on estimated means are therefore important for fully quantifying paleointensity data.

Figure 2. The probability that the estimated mean falls within 10% of the true mean for (a) normally distributed data and (b) lognormally distributed data. The probabilities depend on the true standard deviation (σ) of the underlying distribution, which has been scaled as a percentage of the true mean.

[4] It is intuitive that sampling small numbers of point values can lead to fortuitously low, or high, estimated standard deviations (s), and it has been acknowledged in paleointensity studies that a small standard deviation is no guarantee of accuracy [Biggin et al., 2003]. However, little work has been undertaken to quantify the uncertainties associated with small paleointensity data sets. In this study, we use analytical and numerical calculations to assess the usefulness of statistics commonly used in paleointensity analyses. These calculations are based on the assumption that real paleointensity data are normally or lognormally distributed. These statistics and assumptions are then tested using historical data sets for which the true geomagnetic field intensities are known. This is in contrast to Biggin et al. [2003], who used the estimated mean of each data set to define the "true" field intensity.

2. Methods


[5] In statistical theory, a sampling distribution is the probability distribution of a given statistic obtained from a random selection of point values from a population distribution (the complete distribution of values). When sufficient point values are obtained from a population distribution, their distribution will approximate the population distribution. Throughout this paper, we use the term sample in the statistical sense of referring to a subset of a population distribution, and refer to physical specimens used in paleointensity studies as paleomagnetic samples. Each individual paleointensity estimate can be viewed as a point value that is randomly selected from a population distribution.

[6] Most paleointensity studies characterize data using the estimated mean (m) and estimated standard deviation (s) under the assumption of normality, i.e.,

  • $m = \frac{1}{n}\sum_{i=1}^{n} x_i$   (1)

and

  • $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - m\right)^2}$,   (2)

where $x_i$ is the ith datum and n is the number of data. Cochran's theorem tells us that for normally distributed random variables, the distributions of sample (estimated) means and sample (estimated) variances are independent. Sample means follow a normal distribution with true mean μ and true variance $\sigma^2/n$, while sample variances are chi-square (χ²) distributed with (n − 1) degrees of freedom:

  • $\frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2_{n-1}$.

Hence, sample standard deviations are χ distributed:

  • $\frac{s\sqrt{n-1}}{\sigma} \sim \chi_{n-1}$.

Examples of the distribution of sample standard deviations are shown in Figure 3.

Figure 3. Examples of the distribution of sample standard deviations obtained from normally distributed data for n = 2 and n = 100. As n increases, the distribution becomes narrower and more symmetric.

[7] The known distributions of sample means and sample variances for normal distributions provide analytical solutions for understanding the behavior of m and s. Details of the analytical solutions are given in Appendices A–C. Assessing nonnormal distributions, however, is more complicated from an analytical viewpoint because the true standard deviation and true mean are frequently dependent and can be related in a nonlinear fashion. The easiest approach, therefore, is to derive numerical solutions. To assess lognormally distributed data we have used 10⁶ random samples of varying size, n, to determine the behavior of m and s. This approach can be generalized for any distribution, as follows.

[8] 1. Randomly select n data from the specified distribution.

[9] 2. Calculate the estimated mean (m) and estimated standard deviation (s) of the n data, assuming a normal distribution (i.e., equations (1) and (2)).

[10] 3. Repeat the above steps 10⁶ times.

[11] 4. Identify the number of samples that conform with the criteria to be investigated (e.g., the number of samples with a confidence interval (m ± 1s) that includes the true mean). This allows the probability of each outcome to be estimated.

[12] 5. Repeat steps 1–4 for samples of size n + 1.

[13] The lognormal distribution that we investigated using this approach was set to have a true mean of 30 (a typical geomagnetic field strength in μT) and varying true standard deviations (σ = 1%–100% of the true mean). The true standard deviations are defined as percentages of the true mean; the results are therefore independent of the absolute value of the true mean. The lognormal distribution parameters (γ and θ) were calculated using standard equations [Aitchison and Brown, 1957]:

  • $\gamma = \ln\left(\frac{\mu^2}{\sqrt{\mu^2 + \sigma^2}}\right)$

and

  • $\theta = \sqrt{\ln\left(1 + \frac{\sigma^2}{\mu^2}\right)}$

[14] Strictly, the use of equations (1) and (2) in step 2 is only valid for a normal distribution. However, irrespective of the real paleointensity data distributions, this is how most paleointensity studies analyze their data.
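The numerical procedure outlined in steps 1–5 is straightforward to implement. The following Python fragment is a minimal sketch (not the authors' code) that draws lognormal samples with the true mean and scaled true standard deviation described above and estimates, for each n, the probability that the interval m ± 1s includes the true mean:

    import numpy as np

    rng = np.random.default_rng(0)
    mu = 30.0                                              # true mean (microtesla)
    sigma = 0.25 * mu                                      # true standard deviation (25% of mu)
    theta = np.sqrt(np.log(1.0 + sigma**2 / mu**2))        # lognormal shape parameter
    gamma = np.log(mu**2 / np.sqrt(mu**2 + sigma**2))      # lognormal scale parameter
    n_trials = 100_000                                     # the study uses 10^6; fewer here for speed

    for n in range(2, 9):
        x = rng.lognormal(mean=gamma, sigma=theta, size=(n_trials, n))  # step 1
        m = x.mean(axis=1)                     # step 2: estimated mean, equation (1)
        s = x.std(axis=1, ddof=1)              # step 2: estimated standard deviation, equation (2)
        frac = np.mean(np.abs(m - mu) <= s)    # step 4: does m +/- 1s include the true mean?
        print(n, round(float(frac), 3))

Step 1 can be swapped for any other population distribution without changing the rest of the loop.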

3. Results


3.1. Obtaining an Accurate Estimate of the True Mean

[15] As noted in section 1, n ≥ 24 is required for 95% confidence that m falls within ±10% of μ for normally distributed data, under the criterion that σ/μ = 0.25 (Figure 2a). For lognormally distributed data under the same conditions (Figure 2b), n must also be ≥24 for m to be within ±10% of μ at the 95% confidence level. These two values represent worst-case scenarios under these conditions. When σ/μ is lower, smaller n can be used to achieve the same 95% confidence level.

3.2. Confidence Limits Using the Standard Deviation

[16] To assess the usefulness of the estimated standard deviation, s, for providing confidence intervals at small n, we calculate the probability that the true mean lies within an interval around the estimated mean defined by a multiple of the estimated standard deviation. Strictly, s does not reflect the precision of m; rather, it provides a coverage interval for the data distribution. For normally distributed data the interval m ± 1s will include approximately 68% of the data, and approximately 95% of the data will be included in the interval m ± 2s. The analytical solution for normally distributed data and the numerical solution for lognormal data are shown in Figures 4a and 4b, respectively. For the analytical solution, the probabilities that μ lies within a multiple of s of m are independent of σ. For the lognormal distribution, however, these probabilities decrease by ∼10% over a two-order-of-magnitude increase in σ; the dependence on σ is most pronounced at low n (<5). This dependence is small enough to be viewed as negligible, and we have averaged the probabilities over all σ values. A contour plot of the maximum probability difference between different values of σ is given in Appendix B.

Figure 4. The probability that the true mean lies within a range defined by a multiple of the estimated standard deviation (s) for (a) normally distributed data and (b) lognormally distributed data.

[17] As would be expected, as n increases there is a greater probability of the true mean lying within ±1s. When n = 7 or 8, one estimated standard deviation is sufficient to provide an uncertainty interval that corresponds to a 95% confidence interval for normally and lognormally distributed data, respectively. These are more achievable sample numbers for typical paleointensity studies. When we consider smaller values of n, increasing multiples of s are required to provide the same level of confidence. For n = 2, as many as 8 estimated standard deviations are needed to define the equivalent 95% confidence interval around the estimated mean for normally distributed data (Figure 4a), and 11 estimated standard deviations are required for lognormally distributed data (Figure 4b).

3.3. Confidence Limits Using the Standard Error

[18] An alternative parameter that can be used to define the confidence interval around an estimated mean is the standard error, $SE = s/\sqrt{n}$. The SE, also known as the standard deviation of the mean or the standard error of the mean, is an estimate of the true standard deviation of the distribution of sample means that would be obtained if repeat sampling of the population distribution were possible (i.e., an estimate of the square root of the variance of the distribution of sample means; see section 2). We refer to this parameter only as the SE, to avoid confusion with the estimated standard deviation, s. The probabilities of μ falling within a multiple of the SE of m, for normally and lognormally distributed data, are shown in Figure 5. As is the case with the estimated standard deviation uncertainty interval probabilities, the lognormal SE confidence interval probabilities depend on σ. This dependence produces a maximum probability difference of ∼10% and, as above, the probabilities have been averaged over all σ values (see Appendix B). In many respects the SE provides a poorer method of defining confidence intervals around m: the probabilities of μ falling within a multiple of the SE of m are generally lower than if s were used, and the confidence levels defined by the SE depend on n. However, the SE can be used to provide a consistent confidence interval (CI) given that, for a normal distribution, the percentiles of the distribution of sample means can be approximated by a t distribution:

  • $CI = m \pm t_{\alpha/2,\, n-1} \times SE$,

where $t_{\alpha/2,\, n-1}$ is the two-tailed critical t value for the (1 − α) × 100th percentile (i.e., the (1 − α) confidence level) with (n − 1) degrees of freedom. The white lines in Figure 5 are the t critical values for n at the 95% confidence level. For normally distributed data, these multiples of the SE provide 95% confidence that μ falls within the confidence interval of m for all n. For the lognormal data, t × SE fails to provide a consistent 95% confidence level; however, the confidence levels vary from 91% to 94%, with an average of 93%, which is more consistent than that provided by ±1s. In general, the larger the deviation from normality, the lower this confidence level becomes.
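As a simple illustration of the t × SE confidence interval just defined, the following Python sketch computes a 95% interval for a small data set; the intensity values are hypothetical and are not taken from Table 1:

    import numpy as np
    from scipy.stats import t

    data = np.array([39.3, 43.1, 36.8, 41.0, 38.2])   # hypothetical intensities (microtesla), n = 5
    n = data.size
    m = data.mean()
    se = data.std(ddof=1) / np.sqrt(n)                 # standard error, SE = s / sqrt(n)
    t_crit = t.ppf(0.975, df=n - 1)                    # two-tailed critical value, 95% confidence
    print(f"{m:.1f} uT, 95% CI: {m - t_crit * se:.1f} to {m + t_crit * se:.1f} uT")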

Figure 5. The probability that the true mean lies within a range defined by a multiple of the standard error for (a) normally distributed data and (b) lognormally distributed data. The white lines on each plot represent the t critical values for n at the 95% confidence level.

3.4. Within-Site Consistency

[19] As noted in section 1, low within-site scatter, defined as the ratio of the estimated standard deviation to the estimated mean (δB (%) = (s/m) × 100), may not be an indication of accuracy and may arise fortuitously when n is small [Biggin et al., 2003]. We calculate the probability that δB (%) ≤ 25% for randomly sampled data (Figure 6). The probability intuitively has a strong dependence on the true standard deviation of the underlying distribution. However, the confidence level also varies with n. For n = 2, when σ/μ is 15% there is only a ∼90% probability that δB (%) will be ≤25%, which increases to >95% for n ≥ 4. Confidence levels are lower for lognormally distributed data, and under the same circumstances n ≥ 5 is needed for 95% confidence or better.

Figure 6. The probability that δB (%) ≤ 25% of the estimated mean for (a) normally distributed data and (b) lognormally distributed data. The true standard deviation has been scaled as a percentage of the true mean.

4. Discussion


[20] When dealing with real paleointensity data, parameters such as m, s, and the SE can be estimated from the data. Only in recent times, with the use of DGRF data [Maus et al., 2005], can we obtain values for μ, but values for σ remain unobtainable. In the following discussion we examine historical data sets where μ can be obtained from DGRF data and make use of the criteria outlined above.

4.1. How Are Real Data Distributed?

[21] A key issue is how well the considered distributions represent real paleointensity data. The descriptive statistics of a number of historical paleointensity data sets from a range of localities, methods, and materials are summarized in Table 1. Biggin et al. [2003] used the Anderson-Darling (AD) test [Anderson and Darling, 1952; Stephens, 1986] to show that three historical data sets could not be distinguished from a normal distribution at the 0.05 significance level. We expand on this approach by considering additional data sets and testing for lognormality (Table 2). In addition, we have used the AD test to calculate the probability that the data sets are normally distributed with m = μ, or that they are lognormally distributed with γ = ln μ (which assumes that the true mean is the median value of the lognormal distribution and greatly simplifies the calculation of γ and θ).

Table 1. Descriptive Statistics of Real Paleointensity Data Sets^a

Reference | Location | Method^b | Material | Year | m (μT) | s (μT) | δB (%) | t × SE (μT) | t × SE (%) | n | μ (μT) | IEF (%)
Pick and Tauxe [1993] | EPR^c | T | SBG^d | 1990 | 37.1 | 5.2 | 14.0 | 3.3 | 8.9 | 12 | 37.0 | 0.3
Tsunakawa and Shaw [1994] | Oshima | S | Lava | 1986 | 43.6 | 1.1 | 2.5 | 9.9 | 22.7 | 2 | 45.5 | −4.2
Tsunakawa and Shaw [1994] | Sakurajima | S | Lava | 1946 | 39.4 | 6.4 | 16.2 | 15.9 | 40.4 | 3 | 46.0 | −14.3
Rolph [1997] | Mt. Etna | S | Lava | 1971 | 39.2 | 9.0 | 23.0 | 4.3 | 11.1 | 19 | 43.9 | −10.7
Hill and Shaw [2000] | Hawaii | MW | Lava | 1960 | 31.6 | 3.6 | 11.4 | 1.1 | 3.6 | 41 | 36.2 | −12.7
Calvo et al. [2002] | Mt. Etna | T | Lava | 1928 | 50.1 | 8.0 | 16.0 | 7.4 | 14.8 | 7 | 42.3 | 18.4
Yamamoto et al. [2003] | Hawaii | S | Lava | 1960 | 39.4 | 7.9 | 20.1 | 6.1 | 15.4 | 9 | 36.2 | 8.8
Yamamoto et al. [2003] | Hawaii | T | Lava | 1960 | 51.9 | 14.2 | 27.4 | 6.6 | 12.8 | 20 | 36.2 | 43.4
Mochizuki et al. [2004] | Oshima | S | Lava | 1986 | 46.4 | 4.7 | 10.1 | 4.9 | 10.6 | 6 | 45.5 | 2.0
Mochizuki et al. [2004] | Oshima | T | Lava | 1986 | 51.0 | 4.1 | 8.0 | 2.3 | 4.5 | 15 | 45.5 | 12.1
Chauvin et al. [2005] | Hawaii | T | Lava | 1950 | 39.3 | 4.5 | 11.5 | 5.6 | 14.2 | 5 | 36.0 | 9.2
Chauvin et al. [2005] | Hawaii | T | Lava | 1955 | 39.3 | 3.7 | 9.4 | 3.4 | 8.7 | 7 | 36.0 | 9.2
Chauvin et al. [2005] | Hawaii | T | Lava | 1960 | 33.6 | 4.9 | 14.6 | 5.1 | 15.3 | 6 | 36.2 | −7.2
Donadini et al. [2007] | Helsinki | MW, T | Brick | 1906 | 47.9 | 4.2 | 8.8 | 1.8 | 3.8 | 23 | 49.6 | −3.4
Michalk et al. [2008] | Hekla | T | Lava | 1913 | 43.3 | 6.8 | 15.7 | 10.8 | 25.0 | 4 | 52.0 | −16.7
Paterson et al. [2010] | Láscar | T | Pyroclastic | 1993 | 24.3 | 1.2 | 4.9 | 0.5 | 2.0 | 26 | 24.0 | 1.3
Muxworthy et al. (submitted manuscript, 2010) | Parícutin | T | Lava | 1943 | 48.7 | 11.5 | 23.6 | 4.4 | 9.0 | 29 | 45.0 | 8.2
Muxworthy et al. (submitted manuscript, 2010) | Vesuvius | T | Lava | 1944 | 49.1 | 25.2 | 51.3 | 12.1 | 24.7 | 19 | 44.0 | 11.6

a. The estimated mean geomagnetic field intensity and estimated standard deviation are m and s, respectively; δB (%) = (s/m) × 100; t × SE is the 95% confidence interval defined by the standard error, also given as a percentage of the estimated mean; n is the number of paleomagnetic samples accepted for the mean paleointensity estimate; μ is the expected geomagnetic field intensity determined from DGRF data [Maus et al., 2005]; and IEF (%) is the intensity error fraction (= ((m − μ)/μ) × 100).
b. T, data obtained using the Thellier method and its variants [Thellier and Thellier, 1959; Coe, 1967]; S, data obtained using the Shaw method and its variants [Shaw, 1974]; MW, data obtained using the microwave method and its variants [Walton et al., 1993].
c. East Pacific Rise.
d. Submarine basaltic glass.
Table 2. Probability That the Investigated Data Sets Are Normally or Lognormally Distributed^a

Reference | Pnorm | P*norm | Plognorm | P*lognorm
Pick and Tauxe [1993] | 0.030 | 0.402 | 0.103 | 0.501
Tsunakawa and Shaw [1994] | 0.227 | 0.456 | 0.227 | 0.457
Tsunakawa and Shaw [1994] | 0.257 | 0.353 | 0.220 | 0.365
Rolph [1997] | 0.649 | 0.111 | 0.780 | 0.092
Hill and Shaw [2000] | 0.527 | 0.000 | 0.359 | 0.000
Calvo et al. [2002] | 0.931 | 0.117 | 0.906 | 0.113
Yamamoto et al. [2003] | 0.062 | 0.296 | 0.165 | 0.391
Yamamoto et al. [2003] | 0.036 | 0.002 | 0.330 | 0.001
Mochizuki et al. [2004] | 0.815 | 0.928 | 0.884 | 0.968
Mochizuki et al. [2004] | 0.518 | 0.005 | 0.569 | 0.004
Chauvin et al. [2005] | 0.736 | 0.269 | 0.649 | 0.275
Chauvin et al. [2005] | 0.265 | 0.134 | 0.304 | 0.129
Chauvin et al. [2005] | 0.096 | 0.284 | 0.094 | 0.269
Donadini et al. [2007] | 0.442 | 0.105 | 0.439 | 0.089
Michalk et al. [2008] | 0.079 | 0.197 | 0.058 | 0.213
Paterson et al. [2010] | 0.940 | 0.332 | 0.965 | 0.381
Muxworthy et al. (submitted manuscript, 2010) | 0.023 | 0.059 | 0.001 | 0.067
Muxworthy et al. (submitted manuscript, 2010) | 0.000 | 0.156 | 0.109 | 0.587

a. Pnorm and Plognorm are the probabilities that the data sets have been drawn from a continuous normal or lognormal distribution, respectively, according to the Anderson-Darling test. P*norm and P*lognorm are the probabilities, obtained using the Anderson-Darling test, that the data sets have been drawn from a continuous normal distribution with m = μ, or a lognormal distribution with γ = ln μ. If P ≥ 0.05, the data set cannot be distinguished from the theoretical distribution at the 0.05 significance level.

[22] For all but one data set (the Parícutin data set of A. R. Muxworthy et al. (A Preisach methodology to determining absolute paleointensities: 2. Field testing, submitted to Journal of Geophysical Research, 2010)) the AD test cannot reject the null hypothesis that the data sets have been sampled from continuous lognormal distributions at the 0.05 significance level. With the exception of four data sets (Pick and Tauxe [1993], the Thellier data from Yamamoto et al. [2003], and both data sets from Muxworthy et al. (submitted manuscript, 2010)), all data sets could also be sampled from continuous normal distributions. Considering the probabilities that the data sets are distributed around the expected values (P* values in Table 2), we observe that the data from Hill and Shaw [2000], and the Thellier data from Yamamoto et al. [2003] and Mochizuki et al. [2004] are not normally or lognormally distributed. Two of these data sets are from the 1960 lava flow on Hawaii, which has been noted for yielding absolute paleointensity results that are inconsistent with the expected value [Tanaka and Kono, 1991; Tsunakawa and Shaw, 1994; Hill and Shaw, 2000; Yamamoto et al., 2003]. This may be the result of bias due to the presence of chemical or thermochemical remanent magnetizations [e.g., Tsunakawa and Shaw, 1994; Hill and Shaw, 2000; Yamamoto, 2006; Fabian, 2009]. Mochizuki et al. [2004] noted that their Thellier data are systematically higher than expected and suggested that an inherent rock magnetic property or thermal alteration due to laboratory heating has caused this bias.

[23] It is worth considering the statistical power of the AD test with respect to the data being analyzed. In general, goodness-of-fit tests lose accuracy with decreasing n, and the AD test is no exception. Given the small size of some of the data sets considered here, some of the probabilities should be viewed with caution. P values (Table 2) were calculated using the asymptotically derived analytical solution for the AD test [Stephens, 1986]. However, no analytical solution is currently available for the P* probabilities, which were therefore estimated using a Monte Carlo approximation with 10⁷ simulations [e.g., Stephens, 1974, 1979]. The effect is that the P* probabilities are poorly constrained close to the tails of the distribution (i.e., P* ≈ 0.05 and P* ≈ 0.95). This is of most concern when P* ≈ 0.05, which means that about four of the P* probabilities (representing three data sets) are poorly constrained. Another consideration is the sensitivity of the goodness-of-fit test. The AD test is sensitive to deviations from normality at the tails of the distribution; that is, a small number of large outliers can dramatically reduce the calculated probability that the data are normally distributed. Given the nature of paleointensity data, where nonideal behavior can be difficult to exclude from data sets, this is a possibility. The Kolmogorov-Smirnov (KS) test, on the other hand, is more sensitive to deviations close to the median value of the distribution (i.e., large numbers of data that deviate from normality close to the mean will reduce the calculated probability). The one-sample KS test for normality and lognormality, using the estimated mean and estimated standard deviation, returns probabilities ≥0.138. This provides additional evidence that the data sets could be sampled from either a normal or a lognormal distribution at the 0.05 significance level.
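For readers who wish to apply similar checks to their own data, the sketch below illustrates one-sample KS tests against normal and lognormal models with estimated parameters; the intensity values are hypothetical, and the p-values are approximate because the parameters are estimated from the same data:

    import numpy as np
    from scipy import stats

    x = np.array([37.1, 42.5, 35.0, 40.2, 38.8, 44.1, 36.3])  # hypothetical intensities (microtesla)

    # KS test against a normal model with estimated mean and standard deviation
    p_norm = stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1))).pvalue
    # KS test for lognormality: test the logarithms against a normal model
    logx = np.log(x)
    p_lognorm = stats.kstest(logx, 'norm', args=(logx.mean(), logx.std(ddof=1))).pvalue

    print(p_norm, p_lognorm)   # p >= 0.05: normality/lognormality cannot be rejected at the 0.05 level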

[24] For scalar paleointensities, given that the intensity must be >0 for all practical purposes, the distributions must be non-Gaussian. In general, paleointensity data sets could be lognormally distributed (Table 2). However, most data sets cannot be distinguished from a normal distribution. Our simulations indicate that treating lognormal data normally (i.e., using the arithmetic mean and the standard deviation, equations (1) and (2), respectively) produces statistics that behave in an approximately normal fashion. Importantly, these statistics and probabilities represent best-case scenarios, and in reality the confidence levels of these statistics will be lower. In addition, large deviations or systematic biases due to nonideal paleointensity behavior cannot be identified with these methods, and all statistics of paleointensity data rely on the assumption that such biases can be successfully identified and excluded from final data sets.

4.2. Implications for the Paleointensity Database

[25] While the SE provides a better estimate of the confidence interval around an estimated mean, the estimated standard deviation, s, remains useful for paleointensity studies. In one respect, s can be viewed as a measure of the fidelity of a paleomagnetic recorder, accounting for the natural (or laboratory-induced) variability of paleointensity results from a group of specimens. It should therefore retain its role as a paleointensity data selection criterion. However, additional considerations are necessary if s is to be used in this way.

[26] The known distribution of sample variances for normally distributed data allows a confidence interval for the true standard deviation to be quantified from s:

  • $s\sqrt{\frac{n-1}{\chi^2_{1-\alpha/2,\, n-1}}} \leq \sigma \leq s\sqrt{\frac{n-1}{\chi^2_{\alpha/2,\, n-1}}}$,

where $\chi^2_{1-\alpha/2,\, n-1}$ and $\chi^2_{\alpha/2,\, n-1}$ are the two-tailed χ² critical values with (n − 1) degrees of freedom at the (1 − α/2)th and (α/2)th percentiles. As illustrated by Figure 7a, the confidence intervals are large for small n and decrease as n increases. For n = 2, the 95% confidence interval is 0.4s ≤ σ ≤ 31.9s, but for n = 30 the interval is only 0.8s ≤ σ ≤ 1.3s. This quantifies the intuitive notion that s is poorly constrained for small n, for normally distributed data. If we wish to use s as a selection criterion for paleointensity analysis, we need to take into account the high degree of variability of s for small n. That is, criteria such as δB (%) must have a sample size dependence, the necessity of which can be seen in Figure 6. If a static δB (%) criterion were used, as is the case in most previous studies, a data set with n = 2 and s = 15% of the mean would be accepted for further analysis along with a data set with n = 30 and s = 15% of the mean. In reality, at the 95% confidence level, the true scatter (σ) of the former data set could range from 6% to 479% of the mean, while for the latter data set σ will lie within the range 12%–20%. Clearly, the n = 30 data set is more reliable, yet if we impose δB (%) ≤ 25%, both data sets would be deemed acceptable.
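The χ²-based interval above is straightforward to evaluate numerically; a short sketch, assuming normally distributed data, reproduces the multipliers quoted for n = 2 and n = 30:

    import numpy as np
    from scipy.stats import chi2

    def sigma_bounds(n, alpha=0.05):
        # Multipliers b_low, b_high such that b_low*s <= sigma <= b_high*s at (1 - alpha) confidence
        lower = np.sqrt((n - 1) / chi2.ppf(1 - alpha / 2, n - 1))
        upper = np.sqrt((n - 1) / chi2.ppf(alpha / 2, n - 1))
        return lower, upper

    print(sigma_bounds(2))    # approximately (0.45, 31.9), i.e., 0.4s to 31.9s
    print(sigma_bounds(30))   # approximately (0.80, 1.34)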

Figure 7. (a) Upper and lower 95% confidence limits for the estimated standard deviation as a function of n. These limits assume normally distributed data. (b) Sample size–dependent within-site consistency (δBn (%)) threshold values that ensure that the maximum acceptable within-site scatter is ≤25% at the 95% confidence level.

[27] The ratio $\sqrt{n}\, m/s$ can be shown to follow a noncentral t distribution with noncentrality parameter $\phi = \sqrt{n}\,\mu/\sigma$ (Appendix C). This allows a sample size–dependent within-site criterion (δBn (%)) to be defined implicitly by

  • $\frac{\sqrt{n}}{R_{max}} = t'_{\alpha,\, n-1}\!\left(\frac{\sqrt{n} \times 100}{\delta B_n\,(\%)}\right)$,

where $t'_{\alpha,\, n-1}(\phi)$ is the one-tailed noncentral t critical value with (n − 1) degrees of freedom and noncentrality parameter ϕ, and where Rmax is the desired maximum acceptable within-site scatter (e.g., the commonly used threshold of 25%, i.e., Rmax = 0.25). This formulation exactly corresponds to the confidence level contours for the normal distribution shown in Figure 6. Because δBn (%) appears within the noncentrality parameter, no unique analytical solution can be derived; however, accurate solutions can be obtained rapidly using a numerical approach. The cutoff values that give 95% confidence that s/m ≤ 25% for normally and lognormally distributed data are shown in Figure 7b. Table 3 provides δBn (%) values for various maximum values of s/m and n, assuming normally distributed data. Implementing a sample size–dependent within-site consistency criterion ensures a consistent confidence level (e.g., 95%) in all selected data. Assuming normality and choosing a maximum within-site scatter of 25%, this approach gives a cutoff value of δBn (%) ≤ 12.56% for n = 2 and δBn (%) ≤ 20.43% for n = 30 (Figure 7b and Table 3).

Table 3. Threshold Values for δBn That Ensure a 95% Confidence Level That the Estimated Standard Deviation Is Less Than a Specified Maximum Percentage of the Estimated Mean^a

n | Maximum Percentage
  | 5% | 10% | 15% | 20% | 25%
2 | 2.55 | 5.09 | 7.61 | 10.11 | 12.56
3 | 2.89 | 5.76 | 8.61 | 11.43 | 14.20
4 | 3.10 | 6.18 | 9.24 | 12.26 | 15.23
5 | 3.25 | 6.48 | 9.68 | 12.85 | 15.97
6 | 3.36 | 6.70 | 10.02 | 13.30 | 16.53
7 | 3.45 | 6.89 | 10.30 | 13.67 | 16.99
8 | 3.53 | 7.04 | 10.52 | 13.97 | 17.37
9 | 3.59 | 7.17 | 10.72 | 14.23 | 17.69
10 | 3.65 | 7.28 | 10.88 | 14.45 | 17.97
11 | 3.70 | 7.38 | 11.03 | 14.65 | 18.22
12 | 3.74 | 7.46 | 11.16 | 14.82 | 18.44
13 | 3.78 | 7.54 | 11.28 | 14.98 | 18.64
14 | 3.81 | 7.61 | 11.38 | 15.12 | 18.82
15 | 3.84 | 7.67 | 11.48 | 15.25 | 18.98
16 | 3.87 | 7.73 | 11.57 | 15.37 | 19.13
17 | 3.90 | 7.79 | 11.65 | 15.48 | 19.27
18 | 3.93 | 7.84 | 11.73 | 15.58 | 19.39
19 | 3.95 | 7.88 | 11.80 | 15.67 | 19.51
20 | 3.97 | 7.93 | 11.86 | 15.76 | 19.62
21 | 3.99 | 7.97 | 11.92 | 15.84 | 19.72
22 | 4.01 | 8.00 | 11.98 | 15.92 | 19.82
23 | 4.03 | 8.04 | 12.03 | 15.99 | 19.91
24 | 4.04 | 8.07 | 12.08 | 16.06 | 20.00
25 | 4.06 | 8.11 | 12.13 | 16.13 | 20.08
26 | 4.07 | 8.14 | 12.18 | 16.19 | 20.16
27 | 4.09 | 8.17 | 12.22 | 16.25 | 20.23
28 | 4.10 | 8.19 | 12.26 | 16.30 | 20.30
29 | 4.12 | 8.22 | 12.30 | 16.35 | 20.37
30 | 4.13 | 8.24 | 12.34 | 16.41 | 20.43
35 | 4.18 | 8.35 | 12.50 | 16.63 | 20.72
40 | 4.23 | 8.44 | 12.64 | 16.81 | 20.95
45 | 4.27 | 8.52 | 12.76 | 16.97 | 21.14
50 | 4.30 | 8.58 | 12.85 | 17.10 | 21.31

a. These threshold values (in %) assume normally distributed data.
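The threshold values in Table 3 (and the normal distribution curve in Figure 7b) can be reproduced by numerically solving the implicit relation above. A minimal Python sketch (not the authors' code) using the noncentral t CDF is:

    import numpy as np
    from scipy.stats import nct
    from scipy.optimize import brentq

    def delta_Bn(n, R_max=0.25, confidence=0.95):
        """Largest sigma/mu (as a fraction) for which P(s/m <= R_max) >= confidence,
        assuming normally distributed data (see Appendix C)."""
        def coverage(ratio):
            # P(s/m <= R_max) = 1 - F_nct(sqrt(n)/R_max; df = n - 1, ncp = sqrt(n)/(sigma/mu))
            return 1.0 - nct.cdf(np.sqrt(n) / R_max, df=n - 1, nc=np.sqrt(n) / ratio)
        # coverage() decreases as sigma/mu grows; bracket the root below R_max
        return brentq(lambda r: coverage(r) - confidence, 0.3 * R_max, R_max)

    for n in (2, 5, 10, 30):
        print(n, round(100 * delta_Bn(n), 2))   # ~12.56, 15.97, 17.97, 20.43 (cf. Table 3)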

[28] The PINT08 paleointensity database [Biggin et al., 2009] contains 3576 data entries. For the purposes of analyzing long-term global paleointensity variations it is necessary to compare intensities in the form of virtual (axial) dipole moments (V(A)DM). Currently, only 3049 of the PINT08 entries report a V(A)DM. Using only these entries and excluding data entries with n = 1 and data with no reported n or s, 2173 entries remain. If we apply δB (%) ≤ 25%, 1936 entries remain. This is, generally speaking, the extent to which most database analyses go, although some analyses impose restrictions on the paleointensity method used. If we apply the above-described sample size–dependent within-site criterion, δBn (%), 1560 data entries are left, which represents ∼44% of all available data. This is a further reduction of ∼12% compared with using the δB (%) criterion. The result of this pruning of the database, however, is that we have a consistent confidence in the remaining data, despite having variable n. The application of this new criterion does not greatly change the general long-term trends in geomagnetic field intensity variation (Figure 8a). It does, however, exacerbate the problem of scarce data in certain time periods: no data are available in the Middle to Upper Triassic (244–202 Ma) and only two data points pass the δBn (%) criterion from the Lower Devonian to the end of the Proterozoic Eon (∼524–407 Ma). A more detailed view of the number of data accepted before and after applying the δBn (%) criterion is shown in Figure 8b.

Figure 8. (a) Average V(A)DM during the Phanerozoic determined from the PINT08 data set (green), after application of δB (%) ≤ 25% (blue), and after application of δBn (%) ≤ 25% (red). Average V(A)DMs are calculated for 5 Myr bins. Lines indicate consecutive bin averages. (b) Number of data points per bin. The scale has been truncated at 160 points per bin for clarity. This excludes only the first bin (0–5 Ma) which includes 1324 points from PINT08, 785 after applying the δB (%) criterion, and 621 after applying the δBn (%) criterion.

4.3. How Many Samples Are Enough?

[29] Determining the optimal number of samples for a paleointensity study is often a subjective decision that depends on the degree of confidence required for the study in question. As outlined above, as many as 24 samples would be the optimal minimum number, but this is rarely achievable. When only one data point is available, no information can be obtained to quantify the uncertainty; a minimum of n = 2 should therefore be used. This at least allows calculation of s and quantification of a confidence interval, despite this interval being large. However, investigators should aim to maximize the number of successful results by collecting as many paleomagnetic samples as possible per unit investigated. Studies that collect only a few paleomagnetic samples per unit (i.e., 10 or fewer) are most likely to produce data sets that have large or unquantifiable confidence intervals. Given that paleointensity studies can have high failure rates, as many as 30–40 paleomagnetic samples should be collected per unit.

4.4. Comparison of Confidence Intervals

[30] When applied to real data sets, how well do the confidence intervals defined by the SE compare with other methods of estimating confidence intervals? Table 4 summarizes, for the data sets in Table 1, the uncertainty interval defined by the estimated standard deviation and the confidence intervals defined by the standard error (t × SE) and estimated by a nonparametric statistical bootstrap. Both t × SE and the bootstrapped confidence limits reflect the 95% confidence level, while the uncertainty interval of the standard deviation, under ideal circumstances, reflects ∼68% coverage (i.e., ∼68% of the data will fall within ±1s of the estimated mean). Two standard deviations, which should represent ∼95% coverage, are also included in Table 4; however, 2s is rarely used in paleointensity studies. The uncertainty intervals defined by the estimated standard deviation and the confidence intervals defined by t × SE involve the assumption that the data sets are normally distributed. The bootstrapped confidence intervals involve no assumptions about the distribution of the data sets.

Table 4. Confidence Intervals Around the Estimated Mean Using ±1s and ±t × SE and Estimated Using a Statistical Bootstrap Approach^a

Reference | μ (μT) | s CI Lower | s CI Upper | μ Within Range^b (2s) | t × SE CI Lower | t × SE CI Upper | μ Within Range | Bootstrapped CI Lower | Bootstrapped CI Upper | μ Within Range
Pick and Tauxe [1993] | 37.0 | 31.9 | 42.3 | Y (Y) | 33.8 | 40.4 | Y | 35.0 | 41.1 | Y
Tsunakawa and Shaw [1994] | 45.5 | 42.5 | 44.7 | N (Y) | 33.7 | 53.5 | Y | 42.9 | 44.4 | N
Tsunakawa and Shaw [1994] | 46.0 | 33.0 | 45.8 | N (Y) | 23.5 | 55.3 | Y | 32.1 | 43.5 | N
Rolph [1997] | 43.9 | 30.2 | 48.2 | Y (Y) | 34.9 | 43.5 | N | 35.5 | 43.3 | N
Hill and Shaw [2000] | 36.2 | 28.0 | 35.2 | N (Y) | 30.5 | 32.7 | N | 30.5 | 32.7 | N
Calvo et al. [2002] | 42.3 | 42.1 | 58.1 | Y (Y) | 42.7 | 57.5 | N | 44.6 | 55.5 | N
Yamamoto et al. [2003] | 36.2 | 31.5 | 47.3 | Y (Y) | 33.3 | 45.5 | Y | 35.4 | 45.4 | Y
Yamamoto et al. [2003] | 36.2 | 37.7 | 66.1 | N (Y) | 45.3 | 58.5 | N | 46.8 | 59.2 | N
Mochizuki et al. [2004] | 45.5 | 41.7 | 51.1 | Y (Y) | 41.5 | 51.3 | Y | 43.5 | 50.4 | Y
Mochizuki et al. [2004] | 45.5 | 46.9 | 55.1 | N (Y) | 48.7 | 53.3 | N | 49.2 | 53.2 | N
Chauvin et al. [2005] | 36.0 | 34.8 | 43.8 | Y (Y) | 33.7 | 44.9 | Y | 35.3 | 42.4 | Y
Chauvin et al. [2005] | 36.0 | 35.6 | 43.0 | Y (Y) | 35.9 | 42.7 | Y | 37.1 | 42.4 | N
Chauvin et al. [2005] | 36.2 | 28.7 | 38.5 | Y (Y) | 28.5 | 38.7 | Y | 30.0 | 36.9 | Y
Donadini et al. [2007] | 49.6 | 43.7 | 52.1 | Y (Y) | 46.1 | 49.7 | Y | 46.2 | 49.6 | Y
Michalk et al. [2008] | 52.0 | 36.5 | 50.1 | N (Y) | 32.5 | 54.1 | Y | 36.4 | 47.1 | N
Paterson et al. [2010] | 24.0 | 23.1 | 25.5 | Y (Y) | 23.8 | 24.8 | Y | 23.9 | 24.8 | Y
Muxworthy et al. (submitted manuscript, 2010) | 45.0 | 37.2 | 60.2 | Y (Y) | 44.3 | 53.1 | Y | 44.6 | 52.8 | Y
Muxworthy et al. (submitted manuscript, 2010) | 44.0 | 23.9 | 74.3 | Y (Y) | 37.0 | 61.2 | Y | 41.4 | 66.4 | Y

a. CI, confidence interval.
b. Does μ fall within the range defined by ±2 standard deviations?

[31] Uncertainty intervals defined by the estimated standard deviation (±1s) include the true mean for 12 of the 18 data sets investigated. This uncertainty interval fails when there is a bias in the data [e.g., Hill and Shaw, 2000] or when the data set contains few values [e.g., Michalk et al., 2008]. The 2s uncertainty intervals include μ in all cases, but in some instances 2s defines a range of ±50 μT (e.g., the Vesuvius data of Muxworthy et al. (submitted manuscript, 2010)). In addition, it is unlikely that the estimated standard deviation will represent a consistent confidence level for data sets with n < 7 (Figure 4); for at least six data sets, therefore, the estimated standard deviation does not provide 95% coverage (Table 4). The t × SE confidence intervals include the true mean for 13 of the data sets, including cases where n is small. Four of the five data sets for which the t × SE confidence interval does not include μ are rejected by the AD test for being normally or lognormally distributed about the expected means at the 0.05 significance level. This suggests that there may be a bias in these data sets, as noted by the authors [Hill and Shaw, 2000; Yamamoto et al., 2003; Mochizuki et al., 2004]. For these data sets, ±1s also fails to include the true mean. Rolph [1997] noted that the paleointensity results from the 1971 lava flow from Mt. Etna may be affected by chemical remanent magnetization. Despite having relatively large n (≥7), these five data sets yield inaccurate results (intensity error fraction, ∣IEF∣ ≥ 10.7% (Table 1)).

[32] The statistical bootstrap confidence intervals were determined using a bias-corrected accelerated bootstrap method [Manly, 2007] with 10⁶ repeat samplings to define the 95% confidence interval around the mean (Table 4). The bootstrap method most consistently fails to yield confidence intervals that include the true mean. It has been noted by others that the bootstrap method can underestimate the uncertainties of data sets with few values [e.g., Schenker, 1985]. A comparison between bootstrap and t × SE confidence intervals from a Monte Carlo analysis of a normal distribution suggests that 20 point values are required for the bootstrap confidence interval to be within 10% of that defined by t × SE, and as many as 40 point values are needed to reduce this to within 5%. This makes bootstrapped confidence intervals unsuitable for most paleointensity data sets.
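To illustrate this comparison, the sketch below contrasts a t × SE interval with a BCa bootstrap interval for a small hypothetical data set (not one of the published sets in Table 1); SciPy's bootstrap routine is used here as a stand-in for the bias-corrected accelerated method of Manly [2007]:

    import numpy as np
    from scipy import stats

    x = np.array([43.3, 36.5, 50.1, 41.2])   # hypothetical intensities (microtesla), n = 4

    m = x.mean()
    se = x.std(ddof=1) / np.sqrt(x.size)
    t_crit = stats.t.ppf(0.975, df=x.size - 1)
    print("t x SE 95% CI:", m - t_crit * se, m + t_crit * se)

    # Bias-corrected accelerated (BCa) bootstrap interval for the mean
    res = stats.bootstrap((x,), np.mean, confidence_level=0.95,
                          n_resamples=10_000, method='BCa')
    print("BCa bootstrap 95% CI:", res.confidence_interval.low, res.confidence_interval.high)

With so few values the bootstrap interval is typically narrower than the t × SE interval, consistent with the underestimation discussed above.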

5. Conclusions


[33] We have assessed the calculation of appropriate confidence intervals for paleointensity data using theoretical and numerical approaches, as well as real data sets. More statistical consideration is required when analyzing paleointensity data than is generally applied in such studies. Statistical analysis of real paleointensity data sets indicates that, in general, paleointensity data can be approximated by normal or lognormal distributions around the expected values, irrespective of the method or material used. The exclusion of directional information, which precludes negative values, makes scalar paleointensity data fundamentally non-Gaussian. Despite this, owing to small sample sizes and low standard deviations of the underlying distributions, the data can be approximated as normally distributed. This approximation fails when the data suffer from undetected bias, and it requires that paleointensity selection criteria successfully exclude nonideal behavior.

[34] Using a combination of analytical and numerical techniques, we have illustrated that the estimated standard deviation alone is insufficient to provide a consistent confidence level when quantifying the uncertainty of a mean paleointensity estimate. Instead, the 95% confidence interval defined by the standard error (t × SE) should be used as the uncertainty estimate for a mean paleointensity estimate. This ensures that the same confidence level is maintained when comparing data sets of different sizes, which is not the case for the estimated standard deviation when n < 7. Comparisons indicate that use of the standard error to define the confidence interval around an estimated paleointensity provides a better uncertainty estimate than the estimated standard deviation or a statistical bootstrap. The estimated standard deviation should, however, still be used as a data selection criterion; it provides a measure of the variability of a paleomagnetic recorder. In order to maintain a consistent confidence level, criteria such as δB (%) should incorporate a sample size dependence. This is needed to reflect the larger uncertainties associated with standard deviation estimates based on small n. Using the new criterion defined here (δBn (%)) considerably reduces the paleointensity database available for long-term geomagnetic analysis; however, it provides a consistent and more rigorous confidence level in the data that remain.

[35] In using both the estimated standard deviation and the standard error for analyzing paleointensity data, authors should explicitly state in which form the uncertainties are presented. As a general recommendation, we encourage authors to maintain the typically used approach and report paleointensity estimates ± one estimated standard deviation, along with n. This allows the standard error to be calculated and helps to maintain consistent data reporting. In addition, we recommend that the standard error is referred to as such, and not as the standard deviation of the mean, which can cause confusion with the estimated standard deviation, s.

[36] With respect to the question of how many samples are enough to obtain a reliable paleointensity estimate, the expression "safety in numbers" remains true. Ideally, at least 24 acceptable paleointensity results are desirable, although this has rarely been achieved in the published literature. The lack of any quantifiable uncertainty when n = 1 should automatically preclude such data sets from any meta-analysis; n = 2 is therefore the minimum sample size. Given the typically high failure rates, paleointensity studies should endeavor to collect a minimum of 30–40 paleomagnetic samples per flow (or stratigraphic level) in the hope of obtaining at least 7–8 acceptable results. Collection of fewer paleomagnetic samples can lead to acquisition of data sets that have large confidence intervals or that are insufficient to provide reliable estimated means and uncertainties (i.e., when n = 1). Modern methods that enable analysis of larger numbers of paleomagnetic samples, such as the microwave technique, should aid investigators in achieving this goal.

Appendix A: Accuracy of the Estimated Mean


[37] We wish to identify the probability of obtaining an estimated mean, m, that falls within ±10% of the true mean, μ:

  • $P\left(0.9\mu \leq m \leq 1.1\mu\right)$.

This can be calculated using the normal cumulative distribution function (CDF; $f_{norm}$):

  • $P\left(0.9\mu \leq m \leq 1.1\mu\right) = f_{norm}\left(1.1\mu;\, \mu,\, \sigma_m\right) - f_{norm}\left(0.9\mu;\, \mu,\, \sigma_m\right)$,

where $\sigma_m = \sigma/\sqrt{n}$ is the standard deviation of the sample means.
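A minimal numerical check of this calculation, assuming normally distributed data with σ = 0.25μ, is:

    import numpy as np
    from scipy.stats import norm

    def prob_within_10pct(n, sigma_over_mu=0.25):
        # P(0.9*mu <= m <= 1.1*mu) for normally distributed data
        sigma_m = sigma_over_mu / np.sqrt(n)   # std of the sample means, as a fraction of mu
        return norm.cdf(0.1 / sigma_m) - norm.cdf(-0.1 / sigma_m)

    for n in (5, 10, 20, 24, 30):
        print(n, round(prob_within_10pct(n), 3))   # reaches ~0.95 at n = 24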

Appendix B: Confidence Intervals


[38] To determine the usefulness of the estimated standard deviation for defining confidence intervals, we calculate the probability that m lies within an interval around μ that is defined by a multiple (i) of s, i.e.,

  • $P\left(\mu - i s \leq m \leq \mu + i s\right)$.

Rearranging and multiplying throughout by $\sqrt{n}/s$, we obtain

  • $P\!\left(-i\sqrt{n} \leq \frac{m - \mu}{s/\sqrt{n}} \leq i\sqrt{n}\right)$.

Here (m − μ) follows a normal distribution and $s\sqrt{n-1}/\sigma$ a χ distribution, the ratio of which is t distributed with (n − 1) degrees of freedom. The t distribution CDF ($f_t$) can therefore be used to calculate the probabilities:

  • $P\left(\mu - i s \leq m \leq \mu + i s\right) = f_t\!\left(i\sqrt{n};\, n-1\right) - f_t\!\left(-i\sqrt{n};\, n-1\right)$.

For the confidence intervals using the standard error ($SE = s/\sqrt{n}$), a similar approach can be used:

  • $P\left(\mu - i\,SE \leq m \leq \mu + i\,SE\right) = P\!\left(-i \leq \frac{m - \mu}{s/\sqrt{n}} \leq i\right)$.

Hence:

  • $P\left(\mu - i\,SE \leq m \leq \mu + i\,SE\right) = f_t\!\left(i;\, n-1\right) - f_t\!\left(-i;\, n-1\right)$.
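These probabilities are easily evaluated numerically; a short sketch, assuming normally distributed data, is:

    import numpy as np
    from scipy.stats import t

    def prob_mu_within_i_s(i, n):
        # P(mu - i*s <= m <= mu + i*s)
        return t.cdf(i * np.sqrt(n), df=n - 1) - t.cdf(-i * np.sqrt(n), df=n - 1)

    def prob_mu_within_i_se(i, n):
        # P(mu - i*SE <= m <= mu + i*SE)
        return t.cdf(i, df=n - 1) - t.cdf(-i, df=n - 1)

    print(round(prob_mu_within_i_s(1, 7), 3))    # first exceeds 0.95 at n = 7 (cf. section 3.2)
    print(round(prob_mu_within_i_se(1, 7), 3))   # noticeably lower when the SE is used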

[39] The numerical simulations for lognormally distributed data have a probability variation that depends on the true standard deviation, σ. For both the estimated standard deviation and standard error probabilities (Figures 4 and 5), this dependence produces a maximum probability difference of ∼10% as σ varies from 1% to 100%. Maximum probability difference contour plots are given in Figure B1.

Figure B1. The maximum difference, across all σ values, in the probability that μ falls within the confidence interval of m defined by a multiple of (a) the estimated standard deviation or (b) the standard error. These plots apply only to lognormally distributed data.

Appendix C: Within-Site Consistency


[40] We wish to consider the probability that the ratio of the estimated standard deviation to the estimated mean (s/m) is less than a specified value ($R_{max}$), say ≤0.25:

  • $P\!\left(\frac{s}{m} \leq R_{max}\right)$.

If we consider a noncentral t distribution:

  • $T' = \frac{Z + \phi}{\sqrt{V/\nu}}$,

where Z is a standard normal distribution, ϕ is the noncentrality parameter, and V is a χ² distribution with ν degrees of freedom. Given the known distributions of m and s² (see section 2), we can show that:

  • $\frac{\sqrt{n}\, m}{\sigma} \sim N\!\left(\frac{\sqrt{n}\,\mu}{\sigma},\, 1\right)$

and

  • $\frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2_{n-1}$.

Therefore, $\sqrt{n}\, m/s$ is distributed according to a noncentral t distribution. It can then be shown that $\sqrt{n}\, m/s$ is noncentral t distributed provided that:

  • $\nu = n - 1$ and $\phi = \frac{\sqrt{n}\,\mu}{\sigma}$.

Hence (for m > 0):

  • $P\!\left(\frac{s}{m} \leq R_{max}\right) = P\!\left(\frac{\sqrt{n}\, m}{s} \geq \frac{\sqrt{n}}{R_{max}}\right)$,

which can be calculated using the noncentral t distribution CDF ($f_{nct}$):

  • $P\!\left(\frac{s}{m} \leq R_{max}\right) = 1 - f_{nct}\!\left(\frac{\sqrt{n}}{R_{max}};\; \nu = n-1,\; \phi = \frac{\sqrt{n}\,\mu}{\sigma}\right)$.
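A brief numerical check of this expression, assuming normally distributed data, reproduces the probabilities quoted in section 3.4:

    import numpy as np
    from scipy.stats import nct

    def prob_within_site(n, sigma_over_mu, R_max=0.25):
        # P(s/m <= R_max), evaluated with the noncentral t CDF
        return 1.0 - nct.cdf(np.sqrt(n) / R_max, df=n - 1, nc=np.sqrt(n) / sigma_over_mu)

    print(round(prob_within_site(2, 0.15), 2))   # ~0.90, as quoted in section 3.4
    print(round(prob_within_site(4, 0.15), 2))   # >0.95 for n >= 4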

Acknowledgments


[41] This study was funded by the Royal Society and JSPS. We thank Alan Kimber, Richard Lockhart, and Robin Willinck for statistical advice and Lisa Tauxe for providing data. We are grateful to Andrew Roberts for his comments and advice. We thank Josh Feinberg, Yongjae Yu, and two anonymous reviewers for their helpful comments that improved this paper.

References


Supporting Information

Filename | Format | Size | Description
ggge1747-sup-0001tab01.txt | plain text document | 2K | Tab-delimited Table 1.
ggge1747-sup-0002tab02.txt | plain text document | 2K | Tab-delimited Table 2.
ggge1747-sup-0003tab03.txt | plain text document | 1K | Tab-delimited Table 3.
ggge1747-sup-0004tab04.txt | plain text document | 2K | Tab-delimited Table 4.
