Interlaboratory comparison study of Mg/Ca and Sr/Ca measurements in planktonic foraminifera for paleoceanographic research

Authors


Abstract

[1] Thirteen laboratories from the USA and Europe participated in an intercomparison study of Mg/Ca and Sr/Ca measurements in foraminifera. The study included five planktonic species from surface sediments from different geographical regions and water depths. Each of the laboratories followed its own cleaning and analytical procedures and had no specific information about the samples. Analysis of solutions of known Mg/Ca and Sr/Ca ratios showed that the intralaboratory instrumental precision is better than 0.5% for both Mg/Ca and Sr/Ca measurements, regardless of whether ICP-OES or ICP-MS is used. The interlaboratory precision on the analysis of standard solutions was about 1.5% and 0.9% for Mg/Ca and Sr/Ca measurements, respectively. These are equivalent to Mg/Ca-based temperature repeatability and reproducibility on the analysis of solutions of ±0.2°C and ±0.5°C, respectively. The analysis of foraminifera suggests an interlaboratory variance of about ±8% (%RSD) for Mg/Ca measurements, which translates to a reproducibility of about ±2–3°C. The relatively large range in the reproducibility of foraminiferal analysis is primarily due to relatively poor intralaboratory repeatability (about ±1–2°C) and a bias (about 1°C) due to the application of different cleaning methods by different laboratories. Improving the consistency of cleaning methods among laboratories will, therefore, likely lead to better reproducibility. Even more importantly, the results of this study highlight the need for standards calibration among laboratories as a first step toward improving interlaboratory compatibility.

1. Introduction

[2] In the past few years there has been an increasing interest in using foraminiferal Mg/Ca as a proxy for seawater paleotemperatures [Elderfield and Ganssen, 2000; Hastings et al., 1998; Lea et al., 1999; Nürnberg et al., 1996; Rosenthal et al., 1997]. The interest in Mg-paleothermometry is primarily due to the fact that, in principle, by making paired measurements of δ18O and Mg/Ca on the same shells, one can use Mg/Ca to adjust for the temperature-dependency of δ18O and isolate the δ18O of seawater [Mashiotta et al., 1999]. Oxygen isotope ratios, recorded in fossil foraminiferal shells, have been paleoceanographers' principal tool for constraining glacial-interglacial variations in sea level and continental ice volume. However, in addition to the climatically driven change of seawater 18O/16O composition due to changes in the amount of 16O sequestered in continental ice, records of foraminiferal δ18O also incorporate the temperature effect on the isotopic fractionation during calcification. The relative contribution of these two factors cannot be separated without an additional independent variable (e.g., an independent estimation of the calcification temperature). While several proxies for sea surface temperatures have been developed (e.g., faunal assemblage analysis, alkenone unsaturation index), Mg-paleothermometry offers a few distinctive advantages since it is measured on the same phase as δ18O. First, it gives the actual temperature at which the foraminifer shell precipitated. Second, combined measurements of foraminiferal δ18O and Mg/Ca on the same specimens avoid errors potentially introduced by studying different specimens (or different organisms), which might have not lived at the same depth, season or time. Paired δ18O and Mg/Ca measurements on the same samples allow for reconstructing the temporal relationships between changes in the sea surface temperature (SST) and ice-volume in the past. 
Depending on the oceanographic context, a record of δ18Owater can provide valuable information about regional paleosalinity or global ice volume.

[3] It is now clear that Mg/Ca paleothermometry has reached a critical stage whereby steps should be taken to assure compatibility between measurements generated by different groups. This study was designed to examine the variability in measurements of Mg/Ca among laboratories. It also included Sr/Ca measurements because of the interest in studying secular variations in seawater Sr/Ca [Martin et al., 1999; Stoll and Schrag, 2000; Stoll et al., 1999]. In general, measurements of foraminiferal Sr/Ca seem more consistent than Mg/Ca among different laboratories; however, down core variations in foraminiferal Sr/Ca have significantly lower amplitude than in Mg/Ca. For example, glacial-interglacial variations of planktonic foraminiferal Mg/Ca are typically on the order of 20–25% and only 5–6% for Sr/Ca. The small size of the signal means that it is important also to test the compatibility among laboratories with respect to Sr/Ca. Because most laboratories are measuring both elemental ratios at the same time, we included both Mg/Ca and Sr/Ca in this study.

[4] Differences in elemental ratios measured in foraminiferal shells by different laboratories may be due to the following reasons: (1) offsets due to instrument calibration or incompatibility of the working standards; (2) offsets due to differences in methods applied by individual laboratories for cleaning foraminifera shells; (3) differences due to natural variability in the sample assemblages. The latter primarily reflects differences in the growth period and depth of calcification in the water column of shells of the same species as well as the mixing of shells of different ages after burial in the sediment. The first two items lead to systematic offsets among laboratories whereas the natural variability results in random differences in data produced by different laboratories as well as by each individual laboratory. The current study was designed to assess whether there are any significant offsets in measurements of Mg/Ca and Sr/Ca among laboratories and understand the possible causes of these offsets.

2. Experimental Design

[5] The study included 13 participants from the USA and Europe. Each laboratory received both standard solutions and foraminifera samples and was assigned a random identification number. The identification numbers along with the instruments and cleaning methods used by each laboratory are listed in Table 1. In most cases, laboratories used either inductively coupled plasma - optical emission spectroscopy (ICP-OES) or inductively coupled plasma - mass spectrometry (ICP-MS). The study was designed to assess the significance of systematic errors (i.e., those introduced by instrumental biases and/or methodological differences) relative to the natural variability in foraminiferal Mg/Ca and Sr/Ca.

Table 1. Instrumentation and Cleaning Methods
Laboratory ID   Instrument   Method(a)
108             ICP-OES      1
129             DCP-OES      5
140             ICP-OES      1
142             ICP-OES      1
171             ICP-OES      4
213             ICP-MS       2
360             ICP-OES      2
558             ICP-MS       1
578             ICP-MS       2
628             ICP-OES      1
856             ICP-OES      1
867             ICP-OES      2
881             ICP-OES      1
935             ICP-OES      3

(a) Methods are as follows: (1) “Mg cleaning” method: Rinse (ddH2O/methanol), oxidation, dilute acid leach (ultrasonication employed for clay removal and leaches); (2) “Cd cleaning” method: Rinse (ddH2O/methanol), reduction, oxidation, dilute acid leach (ultrasonication employed for clay removal and leaches); (3) rinse (ddH2O/methanol), oxidation, rinse (ddH2O/methanol), dry in oven (ultrasonication employed for clay removal and leaches); (4) rinse (ddH2O/methanol), reduction, oxidation (ultrasonication employed for clay removal and leaches); (5) oxidation with 5% sodium hypochlorite (full strength Clorox), 6x rinse ddH2O. Oxidation, 30% H2O2/0.1M NaOH solution; reduction, anhydrous hydrazine/NH4OH solution.

2.1. Standards

[6] Two different standard solutions were prepared from three new single element standards (High Purity®; Ca = 1000 ± 3 ppm, Mg = 1000 ± 3 ppm, Sr = 1000 ± 3 ppm). The mixed standard solutions were prepared by spiking the appropriate volumes (determined gravimetrically) of the Mg and Sr standards directly into the primary Ca standard to obtain specific Mg/Ca and Sr/Ca ratios. Each laboratory received the two standard solutions with Ca concentrations of approximately 1000 ± 3 ppm and the following elemental ratios: Standard 1, Mg/Ca = 1.856 ± 0.005 mmol mol−1, Sr/Ca = 1.011 ± 0.003 mmol mol−1; Standard 2, Mg/Ca = 3.682 ± 0.005 mmol mol−1, Sr/Ca = 2.025 ± 0.003 mmol mol−1. The laboratories were instructed to dilute the solutions to the appropriate working concentration and analyze them multiple times as regular samples against their working standards.
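The spiking arithmetic behind these ratios can be illustrated with a short sketch. It converts a target molar ratio into the mass concentration of Mg or Sr required in a 1000 ppm Ca solution. The atomic masses are standard values; the function name `spike_ppm` is hypothetical, and the sketch ignores dilution of the Ca standard by the added spike, which the gravimetric preparation would account for.

```python
# Standard atomic masses (g/mol).
ATOMIC_MASS = {"Ca": 40.078, "Mg": 24.305, "Sr": 87.62}

def spike_ppm(element, ratio_mmol_per_mol, ca_ppm=1000.0):
    """Mass concentration (ppm, i.e. mg/kg) of `element` that gives the
    requested element/Ca molar ratio (mmol/mol) at a Ca content of `ca_ppm`."""
    ca_mmol_per_kg = ca_ppm / ATOMIC_MASS["Ca"]            # mg/kg -> mmol/kg
    element_mmol_per_kg = ratio_mmol_per_mol * 1e-3 * ca_mmol_per_kg
    return element_mmol_per_kg * ATOMIC_MASS[element]      # mmol/kg -> mg/kg

# Target ratios of Standard 1 (mmol/mol).
for element, ratio in [("Mg", 1.856), ("Sr", 1.011)]:
    print(f"{element}: {spike_ppm(element, ratio):.3f} ppm")
```

For Standard 1 this gives roughly 1.13 ppm Mg and 2.21 ppm Sr, which shows how small the spike is relative to the Ca matrix.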

2.2. Foraminifera

[7] Foraminifera samples used for this study were picked from surface sediments from different geographical regions. The samples included five species, some of which are often used for reconstructing upper seawater temperatures: Globigerinoides sacculifer (355–425 μm; without sac), G. ruber (212–300 μm; white variety), Pulleniatina obliquiloculata (425–500 μm), Globigerina bulloides (300–355 μm), and Orbulina universa (425–500 μm). The size fractions are those typically used for Mg/Ca and Sr/Ca measurements.

[8] Cores selected for this study represent regions with significantly different surface water temperatures (Figure 1). The cores cover a wide water depth range and thus represent varying preservation states. In general, the selected sites represent four regions: (1) Western subtropical Atlantic samples are from a relatively shallow core (1312 m) from Little Bahama Banks (LBB) characterized by high degree of foraminiferal shell preservation reflecting the calcite oversaturation state of the bottom water [Bainbridge, 1981]. Overlying mean annual SST = 26.7°C [Levitus and Boyer, 1994]. (2) Western Equatorial Pacific samples are from three cores on Ontong Java Plateau (OJP) covering a water depth range between ∼2 and 4.5 km. This depth transect is characterized by an increasing degree of foraminiferal shell dissolution reflecting the calcite undersaturation state of the bottom water [Brown and Elderfield, 1996; Rosenthal et al., 2000]. Overlying mean annual SST = 29.2°C. (3) North Atlantic samples are from a box-core recovered at relatively shallow depth (1605 m) off the coast of Portugal. Overlying mean annual SST = 17°C. (4) South Pacific samples are from a box-core recovered from the Tasman Plateau (2140 m) south of New Zealand. Overlying mean annual SST = 13°C.

Figure 1.

Sample information. Re-picks of the same samples are color-coded (except for black). Asterisks indicate LBB, Little Bahama Bank; OJP, Ontong Java Plateau.

[9] Each individual sample (codes a through r, Figure 1), weighing about 30 mg, was divided into six subsamples of about 5 mg each (marked from 1 to 6; Figure 2). The number of shells in each subsample varied depending on the individual species shell weight (e.g., about 350 shells of G. ruber and 150 shells of G. sacculifer). To minimize differences due to the natural variability within each subsample, we followed the “crush & split” protocol proposed by Boyle [1995]: each subsample weighing 5 mg was gently crushed and the broken parts were mixed thoroughly in order to “homogenize” the subsample. Subsequently, the homogenized subsample was split into 8 aliquots using a razor blade. The crushed splits were placed in 8 acid-leached 0.5 mL safe-lock Eppendorf® vials. Four of the vials were marked with one random ID (e.g., IC22) and the other four vials were marked with another random ID (e.g., IC56). Four laboratories received two splits (e.g., IC22 and IC56) of the same subsample, which are considered true replicates of the same foraminifera shells. Other laboratories received shells from different subsamples, considered here as re-picks of the same sample. The ID numbers were random, so sample details were not known to the participating laboratories and each vial was treated as an independent sample. A total of 32–34 vials, each containing ∼500 μg of sample, were sent to each laboratory. The laboratories were instructed to follow exactly the same methods of sample cleaning and analytical protocols that they use in their routine work.

Figure 2.

Schematic depiction of the experimental design of this study.

2.3. Statistical Evaluation

[10] In evaluating the results of the interlaboratory comparison study, we generally follow the procedures adopted by previous calibration studies [Rosell-Melé et al., 2001]. For each individual laboratory we report the means of replicate measurements and, when comparing among laboratories, calculate the mean, median and mode of all measurements. The dispersion of the data is reported in terms of standard deviation (SD, 1σ) and relative standard deviation (%RSD, which is equivalent to the coefficient of variation in percentage units). In addition, using analysis of variance (ANOVA), the intralaboratory variance (repeatability, r) and the interlaboratory variance (reproducibility, R) are estimated [Nilsson et al., 1997]. The repeatability is an estimate of the reliability of the procedures used by each individual laboratory. The reproducibility is a measure of comparability of results obtained by different laboratories. The repeatability and reproducibility are calculated from the intralaboratory variance (Sr) and interlaboratory variance (SR), respectively, at the 95% confidence level where repeatability, r = 2.8 Sr and reproducibility, R = 2.8 SR.

[11] Thus converted to degrees centigrade, the reproducibility is a measure of the closeness of agreement between two temperature estimates obtained by different laboratories. In other words, temperature estimates that differ from each other by less than the reproducibility of the method are considered the same within the 95% confidence level. It is important, however, to remember that these statistical parameters are valid only when the data are normally distributed.
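The relationship between the variance components and the limits r and R can be sketched in code. The following is a minimal illustration, assuming a balanced design (the same number of replicates in every laboratory) and the usual one-way ANOVA decomposition into within-laboratory and between-laboratory variance. The exact computation behind the tables in this paper is not spelled out beyond the factor of 2.8, so the function name and the example numbers are hypothetical.

```python
import numpy as np

def precision_from_anova(data):
    """One-way ANOVA precision estimates for a balanced design.

    data: 2-D array-like, one row per laboratory, one column per replicate.
    Returns (s_r, s_R, r, R), with r = 2.8 * s_r and R = 2.8 * s_R.
    """
    x = np.asarray(data, dtype=float)
    n_labs, n_rep = x.shape
    lab_means = x.mean(axis=1)
    # Within-laboratory mean square: the repeatability variance.
    ms_within = ((x - lab_means[:, None]) ** 2).sum() / (n_labs * (n_rep - 1))
    # Between-laboratory mean square.
    ms_between = n_rep * ((lab_means - x.mean()) ** 2).sum() / (n_labs - 1)
    # Between-laboratory variance component, floored at zero.
    var_lab = max((ms_between - ms_within) / n_rep, 0.0)
    s_r = np.sqrt(ms_within)
    s_R = np.sqrt(ms_within + var_lab)
    return s_r, s_R, 2.8 * s_r, 2.8 * s_R

# Invented duplicate Mg/Ca measurements (mmol/mol) from three laboratories.
example = [[3.50, 3.56], [3.71, 3.66], [3.42, 3.47]]
s_r, s_R, r, R = precision_from_anova(example)
print(f"s_r={s_r:.3f}  s_R={s_R:.3f}  r={r:.3f}  R={R:.3f}")
```

Dividing r and R by the mean of the measurements gives the relative values (RSDr% and RSDR%) reported in the tables.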

[12] The current study was designed to assess several parameters that might affect the comparability of Mg/Ca measurements obtained by different laboratories: (1) the degree of intralaboratory and interlaboratory analytical precision determined from repeated analysis of spiked standard solutions, (2) repeatability of each laboratory when analyzing splits from the same subsample of foraminiferal shells, (3) reproducibility among laboratories when analyzing splits from the same subsample of shells, (4) the contribution of natural variability to the variability in Mg/Ca and Sr/Ca measurements as revealed from the analysis of different subsamples of the same sample.

3. Results and Discussion

3.1. Standards

[13] The analysis of the two spiked solutions is used to evaluate the precision and accuracy of instrumental procedures used by different laboratories, when no sample manipulation is required. The results highlight a few important points:

[14] 1. The intralaboratory analytical precision (%RSD) for Mg/Ca and Sr/Ca measurements is generally better than 0.50% (Tables 2a and 2b). Results from solution analysis suggest that the short-term instrumental precision is similar for both ICP-MS and ICP-OES instruments. Long-term precision cannot be assessed from the current results.

Table 2a. Statistical Results of the Analysis of Two Spiked Solutions for Each Individual Laboratory
Standard   Lab ID    Mg/Ca   SD      Sr/Ca   SD
S1         108(a)    1.729   0.020   1.022   0.005
S2         108(a)    3.663   0.023   2.052   0.005
S1         129       1.791   0.007   1.008   0.009
S2         129       3.573   0.001   1.990   0.023
S1         140       1.861   0.008   1.018   0.006
S2         140       3.761   0.072   2.013   0.007
S1         142       1.972   0.011   1.021   0.002
S2         142       3.848   0.011   2.024   0.003
S1         171       1.898           1.045
S2         171       3.766           2.097
S1         558       1.847   0.001   1.012   0.000
S2         558       3.638   0.019   1.999   0.002
S1         578       1.843   0.012   1.009   0.001
S2         578       3.594   0.023   1.995   0.002
S1         628       1.870   0.004   1.028   0.002
S2         628       3.658   0.013   2.041   0.004
S1         856       1.985           0.952
S2         856       3.663           1.812
S1         867       1.858   0.008   1.027   0.001
S2         867       3.697   0.009   2.070   0.006
S1         881       1.904   0.066   1.023   0.003
S2         881       3.794   0.049   2.045   0.007
S1         935       1.765   0.024
S2         935       3.673   0.029

(a) Results from laboratory #108 were added in proofs. Therefore these results are not included in the figures and the statistical calculations presented in this paper.

Table 2b. Statistical Results of the Analysis of Two Spiked Solutions
                           Standard 1          Standard 2
Parameter                  Mg/Ca     Sr/Ca     Mg/Ca     Sr/Ca
Expected value             1.856     1.011     3.682     2.025
Mean                       1.879     1.019     3.744     2.018
Mean(a)                    1.867     1.020     3.697     2.023
Median                     1.866     1.021     3.731     2.018
Mode                       1.868     1.025     3.690     2.052
Intralaboratory SD         0.009     0.003     0.019     0.008
Intralaboratory %RSD       0.5%      0.3%      0.5%      0.4%
Interlaboratory SD         0.063     0.012     0.091     0.037
Interlaboratory %RSD       3.4%      1.2%      2.4%      1.8%
Interlaboratory SD(a)      0.024     0.007     0.062     0.022
Interlaboratory %RSD(a)    1.3%      0.7%      1.7%      1.0%
Sr                         0.010     0.004     0.021     0.009
Repeatability r            0.028     0.011     0.059     0.025
RSDr%                      1.51%     1.10%     1.58%     1.25%
SR(a)                      0.027     0.008     0.074     0.025
Reproducibility R*         0.077     0.023     0.208     0.070
RSDR%*                     4.1%      2.3%      5.6%      3.5%

(a) Excluding all the following data: Mg/Ca standard 1, all results from laboratories 129, 142, 935 and one high Mg/Ca datum from laboratory 881; Mg/Ca standard 2, all results from laboratories 129, 142, 935 and the high values from laboratory 140; Sr/Ca both standards, data from laboratory 856. n.d., not determined.

[15] 2. The Mg/Ca data of both spiked solutions show a bimodal distribution (Figure 3). This is especially evident for standard 2. In both cases the main mode is close to the expected value, whereas the second mode is higher than the expected value. The bimodality is not seen in the distribution of the Sr/Ca results (Figure 4). Each of the spiked solutions was sent for analysis in duplicate 500 μl vials. All the aliquots, however, came from the same batch solutions, and therefore it is unlikely that the bias is due to error in the preparation of the solutions. This is supported by the fact that there is no such bias in the Sr/Ca data. There are two other possibilities to explain the bimodality. First, Mg contamination during handling of one of the spiked solutions might explain the bimodal results reported by laboratory 140 for standard 2. It is unlikely, however, that contamination can account for the bias in the results reported by other laboratories (e.g., 142). The second possibility is that the offset is due to inaccuracies in the preparation of working standards by some laboratories, poor matching between the calcium concentrations of the working standards and that of the spiked solutions (i.e., matrix effects) or the use of inappropriate calibration methods. For example, it has been reported by laboratory 935 that they did not dilute the spiked solution and therefore, the Ca concentration of these solutions was substantially higher than that of the working standards. Such matrix mismatch can account for the fact that measurements of standard 1 by laboratory 935 gave relatively low Mg/Ca ratios [Lear et al., 2002]. We suspect that this is also the case with laboratory 129. Another example is provided by the Mg/Ca results from laboratory 856. Their reported value for S1 is 7% higher than the expected value, whereas S2 is consistent with the expected value. Unfortunately this laboratory measured each standard only once so it is difficult to ascertain the significance of these values. Taken at face value, however, they suggest a problem with their calibration curve, leading to non-linear offsets. These latter biases can lead to significant offsets in absolute temperature estimates (Figure 3c). That results from foraminiferal analyses obtained by this laboratory are often higher than those obtained by other laboratories is consistent with our suggestion that the bimodality is due to problems with particular laboratories.

Figure 3.

(a, b) Histograms of individual Mg/Ca measurements of spiked solutions 1 and 2, respectively, obtained by each laboratory. (c) The difference between the mean of Mg/Ca measurements and the expected Mg/Ca in terms of temperature (°C), calculated from data in Table 2a, using the calibration: Mg/Ca = 0.38 exp(0.090T) where T is the seawater temperature in °C [Anand et al., 2003]. Note that the Mg/Ca scale and bin size of both Figures 3a and 3b represent the same relative change (±9% and ±0.5%, respectively) with respect to the mean value in both cases.

Figure 4.

(a, b) Histograms of individual Sr/Ca measurements of spiked solutions 1 and 2, respectively, obtained by each laboratory. (c) The difference between the mean of the Sr/Ca measurements and the expected Sr/Ca in mmol mol−1 (Table 2a). Note that the Sr/Ca scale and bin size in both Figures 4a and 4b represent the same relative change (±15% and ±0.9%, respectively) with respect to the mean value in both cases. Also, note that the anomaly of laboratory #856 is larger than −0.10 mmol mol−1 and therefore is not shown in Figure 4c.

[16] Results of Sr/Ca analysis are almost normally distributed. For both solutions the modes of the measured data deviate (at the 95% level) from the expected values. The discrepancy is likely due to inaccuracy in the estimation of the expected values. Values obtained by laboratory 856 are consistently lower than others, suggesting inaccuracies in their working standards, which lead to significant offsets in foraminiferal analyses as shown below.

[17] 3. The interlaboratory analytical precision (%RSD) is 3.4% and 2.4% for Mg/Ca analysis of solutions 1 and 2, respectively. Likewise, the interlaboratory Sr/Ca precision is 1.2% and 1.8% for the same solutions (Tables 2a and 2b). These RSDs are significantly larger than the intralaboratory precision. We recalculated the mean and standard deviation without outliers. Thus we excluded the following data points: standard 1, all results from laboratories 129, 142, and 935 and one high Mg/Ca datum from laboratory 881; standard 2, all results from laboratories 129, 142, and 935 (note that although the results from laboratory 935 fall within the range of the other data, this laboratory did not match the matrix of the solutions with that of its standards, and we decided not to include its data) and the high values from laboratory 140. On the basis of this edited data set, we estimate the Mg/Ca interlaboratory precision on the analysis of the two solutions to be about 1.3% and 1.7% for standards 1 and 2, respectively. Likewise, the Sr/Ca interlaboratory precision is ≤1% for both standard solutions (excluding data from laboratory 856 for both standard solutions). The repeatability (r) of the Mg/Ca results for both solutions is on average about 1.5%. The reproducibility (R) is about 4% and 6% for solutions 1 and 2, respectively (Tables 2a and 2b). Converting these values to degrees centigrade, using the recently published calibration equation: Mg/Ca = 0.38 exp(0.090T) where T is the seawater temperature in °C [Anand et al., 2003], we calculate repeatability and reproducibility of ±0.2°C and ±0.5°C, respectively, for the analysis of solutions only.
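The conversion from a relative Mg/Ca uncertainty to degrees can be written out explicitly. Because the calibration is exponential, the pre-exponential constant cancels and a relative change in Mg/Ca maps to ln(1 + change)/0.090 in °C. The sketch below is only an illustration of that arithmetic; the function name is hypothetical and the rounded percentages are those quoted in the text.

```python
import math

A = 0.090  # exponential constant (per °C) in Mg/Ca = 0.38 exp(0.090 T)

def relative_to_celsius(relative_change, a=A):
    """Temperature change (°C) equivalent to a relative change in Mg/Ca
    (e.g. 0.015 for 1.5%); the pre-exponential constant 0.38 cancels."""
    return math.log(1.0 + relative_change) / a

for label, rel in [("repeatability, ~1.5%", 0.015),
                   ("reproducibility, ~4%", 0.04),
                   ("reproducibility, ~6%", 0.06)]:
    print(f"{label}: {relative_to_celsius(rel):.2f} °C")
```

The results are close to 0.2°C for the repeatability and between about 0.4 and 0.6°C for the reproducibility, in line with the rounded values of ±0.2°C and ±0.5°C.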

3.2. Foraminifera

[18] Averages of duplicate measurements of Mg/Ca of individual species from various sites are given in Figures 5 and 6. The Sr/Ca data are given in Figures 7 and 8. The complete data set from the analysis of the solutions and foraminifera samples is available as a downloadable appendix at the end of the paper. The results of Mg/Ca and Sr/Ca analysis in foraminifera by different laboratories show large scatter that is significantly greater than the precision of replicate measurements of the same sample by each individual laboratory. The interlaboratory variability is also substantially larger than that obtained from the analysis of spiked solutions. To better quantify the errors associated with these analyses we need to understand the different sources of error. Errors can be divided into systematic and random types. Systematic offsets may be due to the application of different cleaning protocols or to contamination from reagents used during the cleaning stage and the analysis (blank problem). As discussed previously, it is possible that some of the offsets are due to the use of inaccurate working standards. Random errors are often caused by contamination from adhering sediments or insufficient matrix matching between samples and standards. Random errors are typically recognized as outliers and are characterized by poor repeatability of replicate analysis of the same sample. At first glance the scatter in the data is quite large (Figures 6 and 8). To better quantify the comparability of Mg/Ca and Sr/Ca measurements by different laboratories we have filtered out data that are clearly biased relative to the entire population (note, however, that all the data are given in the tables). To identify outliers, we used Youden plots in which the means of one sample measured by each laboratory are plotted against the means of another sample. Samples that are offset from the next datum point by more than 3 standard deviations (calculated for the entire sample set) are considered to be outliers. The rejected data include: (1) Both Mg/Ca and Sr/Ca data sets from laboratory 129. Data from this laboratory are substantially higher than obtained by other laboratories and exhibit relatively poor reproducibility. The bias is likely a result of the cleaning procedure used by this laboratory. This lab tested a simple and rapid cleaning procedure that is clearly inadequate (i.e., oxidation with 5% sodium hypochlorite, followed by multiple rinses with deionized water without ultrasonication); (2) Two Sr/Ca data sets (laboratories 360 and 856) that are significantly lower than the other data sets. The analysis of spiked solutions shows that Sr/Ca values obtained by laboratory 856 are about 10% lower than the average value, which accounts for the bias observed in the foraminifera data. Although laboratory 360 did not analyze the spiked solutions, it is likely that inaccuracies in their working standards cause the observed offset in the foraminiferal data. These offsets can be easily corrected in future work, therefore we do not include them in the statistical evaluation; (3) 28 Mg/Ca data points that are significantly higher than the other results. The outliers constitute about 6% of the Mg/Ca data set (excluding the results from laboratory 129). These high values are probably due to contamination from adhering clays [Barker et al., 2003], which have high magnesium content [Emiliani, 1955]. The fact that there is a significant difference in the number of outliers among different laboratories indicates that some laboratories hold more stringent cleaning protocols and supports our argument that the outliers are likely a result of insufficient cleaning during sample preparation. The low number of Sr/Ca outliers (about 1%) is consistent with the fact that detrital clays are relatively enriched with Mg but not with Sr.
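One possible reading of the 3-standard-deviation gap criterion is sketched below. It is an interpretation, not the procedure actually applied to the data: the values for one sample are sorted, and any value separated from its nearest neighbour toward the median by more than three standard deviations is flagged, together with everything beyond it. The function name, the caller-supplied `sd` and the example values are all hypothetical.

```python
import numpy as np

def flag_gap_outliers(values, sd, n_sd=3.0):
    """Flag values separated from their nearest inward neighbour by more
    than n_sd * sd. `sd` is supplied by the caller (assumption: it is the
    standard deviation chosen for the whole sample set)."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x)
    xs = x[order]
    gaps = np.diff(xs)                 # gaps[i] = xs[i + 1] - xs[i]
    mid = len(xs) // 2
    flagged_sorted = np.zeros(len(xs), dtype=bool)
    # High side: everything above the first large gap at or past the median.
    upper = np.nonzero(gaps[mid:] > n_sd * sd)[0]
    if upper.size:
        flagged_sorted[mid + upper[0] + 1:] = True
    # Low side: everything below the last large gap before the median.
    lower = np.nonzero(gaps[:mid] > n_sd * sd)[0]
    if lower.size:
        flagged_sorted[:lower[-1] + 1] = True
    flagged = np.zeros(len(x), dtype=bool)
    flagged[order] = flagged_sorted    # map back to the input order
    return flagged

# Invented laboratory means (mmol/mol) for one sample; the last is suspect.
means = [3.50, 3.60, 3.40, 3.55, 5.90]
print(flag_gap_outliers(means, sd=0.3))
```

Supplying `sd` from outside avoids estimating the spread from data that already contain the suspect value, which would make a 3-standard-deviation gap hard to reach in small sets.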

Figure 5.

Foraminiferal Mg/Ca data. Crush and split subsamples are marked by the same color. Each datum represents the mean of duplicate analyses except for italicized values for which one of the duplicates was considered an outlier (strike-through values are not considered as the two duplicates are considered outliers). Asterisks indicate that all data from this laboratory were not considered in the statistical analysis (see text).

Figure 6.

Mg/Ca results from the analysis of five species of planktonic foraminifera. Each point represents an average of the analysis of two replicates from the same subsample. Boxes represent the mean and standard deviation (1 S.D.) for each sample, excluding the outliers (see details of the calculation in the text). Temperature estimates are based on the calibration of Anand et al. [2003].

Figure 7.

Foraminiferal Sr/Ca data. Crush & split subsamples are marked by the same color. Each datum represents the mean of duplicate analyses except for italicized values for which one of the duplicates was considered an outlier (strike-through values are not considered as the two duplicates are considered outliers). Asterisks indicate that all data from this laboratory were not considered in the statistical analysis (see text); nd, not determined.

Figure 8.

Sr/Ca results from the analysis of five species of planktonic foraminifera. Each point represents an average of the analysis of two replicates from the same subsample. Boxes represent the mean and standard deviation (1SD) for each sample, excluding the outliers (see details of the calculation in the text).

[19] In the following discussions and statistical calculations we ignore all the outliers in order to obtain a better estimate of the community performance. Mean, median, standard deviation (1σ), interlaboratory variance (SR) and reproducibility (R) values were calculated from individual Mg/Ca and Sr/Ca measurements in foraminifera and are given in Table 3. In most cases, the results represent 28 individual measurements. The means and standard deviations are also shown in Figures 6 and 8. The standard deviations (1σ) of Mg/Ca measurements obtained by the different laboratories range from ±0.2 to 0.6 mmol mol−1, or an average of ±8% RSD (Table 3). In comparison, the Sr/Ca variability among laboratories is about ±0.02 to 0.06 mmol mol−1, or an average of ±3% RSD. Using recent field calibrations [Anand et al., 2003], we calculate the errors associated with temperature estimates to be between ±0.3 and 1.3°C (1σ) or better than ±1°C for all samples. Using the same temperature calibration for all the samples, we estimate an interlaboratory reproducibility of 20–30% or 2–3°C. This estimate is somewhat misleading, however. In previous studies, interlaboratory discrepancies were often circumvented by choosing site-specific calibrations that yield the correct modern surface temperature when applied to core top samples. These calibrations not only corrected for differences in foraminiferal preservation among sites (see below) but also for poor interlaboratory reproducibility. So, while the interlaboratory reproducibility of Mg/Ca measurements is on the order of 20–30%, in practice the reproducibility of Mg/Ca-derived temperature estimates is significantly better.

Table 3. Interlaboratory Precision (SD) and Reproducibility (R) of Mg/Ca and Sr/Ca (in mmol mol−1) for the Analysis of Foraminifera Samples
Sample Code   Mean    Median   SD      RSD%     SR      R       RSDR%

Mg/Ca
a             8.33    8.18     0.67    8.1%     0.81    2.27    27.2%
b             3.54    3.58     0.21    5.8%     0.25    0.69    19.4%
c             4.46    4.46     0.32    7.3%     0.42    1.18    26.4%
d             3.56    3.51     0.28    8.0%     0.35    0.97    27.2%
e             2.90    2.91     0.28    9.8%     0.36    1.01    34.8%
f             5.00    4.91     0.42    8.4%     0.51    1.43    28.6%
g             2.76    2.68     0.26    9.5%     0.32    0.89    32.2%
h             3.32    3.30     0.29    8.8%     0.36    1.01    30.2%
i             1.76    1.79     0.15    8.5%     0.18    0.51    28.8%
j             2.61    2.57     0.33    12.6%    0.37    1.04    39.6%
k             4.55    4.59     0.34    7.5%     0.41    1.15    25.2%
l             4.22    4.16     0.28    6.6%     0.34    0.94    22.4%
m             4.18    4.16     0.18    4.2%     0.21    0.59    14.2%
n             2.72    2.70     0.29    10.7%    0.34    0.95    35.0%
o             4.33    4.34     0.28    6.4%     0.34    0.94    21.8%
p             2.88    3.01     0.31    10.8%    0.39    1.08    37.4%
q             3.37    3.30     0.31    9.1%     0.37    1.04    30.8%
r             4.14    4.13     0.31    7.4%     0.37    1.04    25.0%

Sr/Ca
a             1.38    1.38     0.06    4.6%     0.08    0.22    16.0%
b             1.36    1.36     0.04    2.6%     0.04    0.12    8.9%
c             1.40    1.41     0.03    2.0%     0.04    0.11    8.0%
d             1.37    1.37     0.05    3.3%     0.06    0.16    11.5%
e             1.34    1.35     0.05    3.4%     0.06    0.16    11.7%
f             1.46    1.46     0.04    3.0%     0.05    0.15    10.2%
g             1.40    1.40     0.03    2.4%     0.04    0.11    8.2%
h             1.52    1.52     0.06    3.8%     0.07    0.20    13.3%
i             1.34    1.35     0.04    3.0%     0.05    0.14    10.1%
j             1.37    1.37     0.04    2.7%     0.04    0.12    9.1%
k             1.46    1.47     0.05    3.3%     0.06    0.17    11.4%
l             1.33    1.34     0.04    2.7%     0.04    0.12    9.3%
m             1.34    1.35     0.04    2.9%     0.05    0.13    10.0%
n             1.39    1.41     0.04    2.9%     0.05    0.14    9.9%
o             1.35    1.36     0.04    3.1%     0.05    0.15    10.8%
p             1.35    1.36     0.05    3.5%     0.06    0.16    12.1%
q             1.50    1.50     0.06    3.8%     0.07    0.19    12.9%
r             1.44    1.45     0.04    2.9%     0.05    0.14    9.9%
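The temperature uncertainties quoted for the foraminiferal data follow from inverting the exponential calibration Mg/Ca = 0.38 exp(0.090T) [Anand et al., 2003]. The sketch below applies that inversion to the mean and the mean ± 1 SD of two Mg/Ca entries of Table 3. It assumes, as the text does, that the same calibration is used for every sample; the function names are hypothetical.

```python
import math

B, A = 0.38, 0.090  # Mg/Ca = B * exp(A * T); Mg/Ca in mmol/mol, T in °C

def mgca_to_temperature(mgca, b=B, a=A):
    """Invert the exponential calibration to obtain temperature (°C)."""
    return math.log(mgca / b) / a

def temperature_range(mean, sd):
    """Temperatures at mean - 1 SD, at the mean, and at mean + 1 SD."""
    return tuple(mgca_to_temperature(v) for v in (mean - sd, mean, mean + sd))

# Mg/Ca mean and SD (mmol/mol) for samples b and m of Table 3.
for code, mean, sd in [("b", 3.54, 0.21), ("m", 4.18, 0.18)]:
    low, mid, high = temperature_range(mean, sd)
    print(f"{code}: {mid:.1f} °C ({low:.1f} to {high:.1f})")
```

For sample b this gives about 24.8°C with a 1σ range of roughly ±0.7°C, which falls within the ±0.3 to 1.3°C span quoted above.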

[20] The interlaboratory precision for the analysis of foraminiferal samples is about four times worse than that obtained on the spiked solutions. In the following sections we explore the relative contribution of several parameters to the scatter in Mg/Ca data. These include intralaboratory precision, natural variability among samples and cleaning efficacy.

3.3. Sources of Variability in Mg/Ca Analysis

3.3.1. Intralaboratory Variance

[21] To test the precision and repeatability of individual laboratories in the analysis of foraminiferal samples we compare results from the analysis of four replicates from the same samples (Table 4, Figure 9). The samples represent four pairs of re-picked samples including b&d, c&o, e&p, h&q (the pairs are color coded in Figure 1). The average intralaboratory precision on the analysis of G. sacculifer samples is about ±3% (RSD) and we find no significant difference between the deep Pacific (OJP) and shallow Atlantic sites. The repeatability is accordingly good, about ±1°C. The precision on the analysis of P. obliquiloculata from LBB and G. bulloides from the North Atlantic is significantly lower, about ±1.5° and 2°C, respectively. Variably high Mg/Ca ratios, found in G. bulloides shells from North Atlantic core tops, have been attributed to insufficient removal of high Mg calcite layers during the cleaning process, although the nature of these has not been fully investigated [Elderfield and Ganssen, 2000]. Similar problems have been encountered in studies of other planktonic species from LBB, which might explain the poor repeatability of the P. obliquiloculata results (Y. Rosenthal, unpublished data, XXXX).

Figure 9.

Comparison of intralaboratory Mg/Ca standard deviations (±1σ) obtained from the analysis of replicates of each sample, calculated in terms of temperature using the calibration of Anand et al. [2003]. (a) Sample replicates of G. sacculifer from Ontong Java Plateau Core MW91-1 BC56 (4401 meters below seafloor); (b) sample replicates of G. sacculifer from Little Bahama Bank Core OC205 BC60 (1312 meters below seafloor); (c) sample replicates of P. obliquiloculata from Little Bahama Bank Core OC205 BC60 (1312 meters below seafloor); (d) sample replicates of G. bulloides from North Atlantic Core M39059-2 (1605 meters below seafloor). Also compared are the RSD% values obtained by laboratories using the “Mg cleaning” (thin line) and “Cd cleaning” (bold line) methods.

Table 4. Intralaboratory Precision (SD) and Repeatability (r) of Mg/Ca and Sr/Ca for Replicate Analyses of Four Pairs of Repicked Foraminifera Samples
Sample   Species   Region   SD      RSD%   Sr      r       RSDr%
Mg/Ca
b&d      sacc      OJP      0.12    3.4%   0.13    0.36    10.3%
c&o      sacc      LBB      0.12    2.8%   0.14    0.39    9.1%
h&q      obliq     LBB      0.16    5.6%   0.19    0.53    18.7%
e&p      bull      N. Atl   0.27    8.0%   0.30    0.84    24.9%

Sr/Ca
b&d      sacc      OJP      0.010   0.7%   0.011   0.031   2.3%
c&o      sacc      LBB      0.018   1.3%   0.020   0.056   4.1%
h&q      obliq     LBB      0.010   0.7%   0.011   0.031   2.3%
e&p      bull      N. Atl   0.035   2.3%   0.039   0.109   7.2%

[22] The precision and repeatability of Sr/Ca results are significantly better than those of Mg/Ca (Table 4). Interestingly, Sr/Ca data from the analysis of G. bulloides shells also show the largest variance. The intralaboratory relative precision (%RSD) and repeatability of Mg/Ca and Sr/Ca measurements of spiked solutions are essentially the same (Tables 2a and 2b). In contrast, the relative precision (%RSD) and repeatability of Sr/Ca analysis of foraminifera shells are 2–5 times better than those of Mg/Ca, suggesting that the difference in the results is not due to instrumental biases, but is rather related either to natural variability in the samples or to the cleaning methods used for preparing samples for the analysis.
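
The repeatability columns of Table 4 can be cross-checked with a short calculation. The sketch below assumes that the repeatability limit is r = 2.8 × Sr, where 2.8 is the conventional rounded value of 1.96 × √2 used for the 95% limit on the difference between two results (for example in ISO 5725). That definition is not restated in this section, so it is an assumption here, and the names TABLE4_MG and repeatability_limit are illustrative only.

```python
# Repeatability standard deviations (Sr, mmol/mol) and tabulated
# repeatability limits (r) for Mg/Ca, copied from Table 4.
TABLE4_MG = {
    "b&d": {"Sr": 0.13, "r": 0.36},
    "c&o": {"Sr": 0.14, "r": 0.39},
    "h&q": {"Sr": 0.19, "r": 0.53},
    "e&p": {"Sr": 0.30, "r": 0.84},
}

# Assumed factor: 1.96 * sqrt(2), conventionally rounded to 2.8.
REPEATABILITY_FACTOR = 2.8


def repeatability_limit(sr, factor=REPEATABILITY_FACTOR):
    """Return the assumed 95% repeatability limit for a given Sr."""
    return factor * sr


for pair, row in TABLE4_MG.items():
    r_calc = repeatability_limit(row["Sr"])
    print(f"{pair}: r(calc) = {r_calc:.2f}, r(table) = {row['r']:.2f}")
```

Under this assumption the calculated limits agree with the tabulated r values to within rounding for all four sample pairs.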

3.3.2. Natural Variability

[23] Natural variability is primarily caused by differences in the depth of calcification of individual shells of the same species in the water column and by the mixing of shells of different ages after burial in the sediment. Histograms of all replicate measurements from each laboratory for the analysis of Mg/Ca in individual foraminifera samples are shown in Figure 10. Also, for each site we compare the mean and standard deviation obtained by all participating laboratories (excluding outliers) with the mean and standard deviation obtained from individually crushed and split subsamples. Analyses carried out on the same subsamples are marked in Tables 4 and 5 by the same color. As shown, there are noticeable differences in mean Mg/Ca ratios among re-picked samples. However, within the precision of the data (±1σ) the mean Mg/Ca ratios for individual subsample splits are, in most cases, not significantly different from each other and are statistically the same as the mean of the entire sample population. In some cases, however, the average Mg/Ca value for a subsample (i.e., crushed and homogenized) is significantly different from that obtained for other subsamples. The data also suggest that homogenizing the samples results in better sample reproducibility than that obtained on re-picked samples. These results suggest that although natural variability may sometimes be significant, it cannot account for all of the observed variability among the different laboratories. Consequently, a significant part of the variability must be related to differences in cleaning methods.

Figure 10.

Histograms of Mg/Ca measurements of each foraminiferal sample (a through r) obtained by different laboratories (top panel). In the bottom panel the mean Mg/Ca and standard deviation (±1σ) for all the laboratories (bold) is compared with the mean ±1σ of “crush & split” subsamples. Note that the range in each figure is 50% of the mean of all data. Bins are all 0.1 mmol/mol. The standard deviations are calculated from the mean of each duplicate sample rather than for each individual analysis, as was done in Table 3.

Table 5. Interlaboratory Variance (SD) and Reproducibility (R) of Mg/Ca Analyses (in mmol mol−1) Obtained Only From Laboratories Using the “Mg cleaning” Protocol
Sample   Mean   Median   SD     RSD%    SR     R      RSDR%
Mg/Ca
a        8.46   8.64     0.60   7.1%    0.78   2.18   25.8%
b        3.64   3.66     0.14   3.8%    0.17   0.48   13.1%
c        4.56   4.52     0.27   5.9%    0.32   0.90   19.6%
d        3.73   3.67     0.23   6.2%    0.30   0.84   22.5%
e        3.03   3.03     0.15   5.0%    0.20   0.56   18.5%
f        5.24   5.18     0.34   6.5%    0.42   1.18   22.4%
g        2.85   2.75     0.24   8.4%    0.28   0.78   27.5%
h        3.40   3.47     0.32   9.4%    0.42   1.18   34.6%
i        1.86   1.86     0.09   4.8%    0.12   0.34   18.1%
j        2.71   2.68     0.22   8.1%    0.28   0.78   28.9%
k        4.76   4.74     0.25   5.3%    0.32   0.90   18.8%
l        4.33   4.33     0.21   4.8%    0.27   0.76   17.5%
m        4.24   4.21     0.16   3.8%    0.20   0.56   13.2%
n        2.78   2.74     0.30   10.8%   0.34   0.95   34.2%
o        4.45   4.45     0.19   4.3%    0.25   0.70   15.7%
p        3.01   3.01     0.18   6.0%    0.23   0.64   21.4%
q        3.47   3.44     0.30   8.6%    0.37   1.04   29.9%
r        4.27   4.23     0.26   6.1%    0.33   0.92   21.6%

3.3.3. Analytical Difference

[24] The analysis of spiked solutions suggests that a significant part of the interlaboratory variability in both Mg/Ca and Sr/Ca is due to inconsistencies among standards prepared by different laboratories. As suggested above, these inconsistencies account for about 2% (RSD) of the total variability. An additional source of variability in Mg/Ca is the offset caused by differences in the cleaning methods applied by different laboratories. In general, the two most commonly used cleaning protocols are: (1) the “Mg cleaning” protocol, which is a short version of the Boyle and Keigwin [1985] method. This method includes multiple rinses with distilled deionised water (ddH2O) and methanol in an ultrasonic bath, followed by oxidation in a warm (60°C) 30% H2O2/0.1M NaOH solution and typically a single acid leaching step with 250 μL 0.001N HNO3 ultrasonicated for 30 seconds; and (2) the “Cd cleaning” method, which is the full version [Boyle and Keigwin, 1985; Boyle and Rosenthal, 1996]. This protocol includes multiple rinses with ddH2O and methanol in an ultrasonic bath, followed by a reduction step in a hot anhydrous hydrazine/NH4OH solution, then oxidation in a warm 30% H2O2/0.1M NaOH solution and finally multiple acid leachings (0 to 4 times depending on sample size), each with 250 μL 0.001N HNO3 ultrasonicated for 30 seconds. The “Cd cleaning” protocol includes the reductive step, which was devised to remove Mn- and Fe-oxides adhering to the shells, and is used when other metal ratios (e.g., Cd/Ca, Ba/Ca and U/Ca) are measured in addition to Mg/Ca and Sr/Ca, typically by laboratories using ICP-MS. It has been shown that cleaning of planktonic foraminifera leads to a progressive decrease in bulk shell Mg/Ca ratios; more rigorous cleaning results in lower Mg/Ca ratios [Hastings et al., 1998]. In that study, the authors argued that the reduction step does not have a discernible effect on Mg/Ca and concluded that samples cleaned by either the “Mg method” or the “Cd method” should yield statistically similar ratios. However, other experiments on the effects of cleaning on metal ratios in benthic foraminifera suggested that the addition of a reductive step might lead, in some samples, to a significant decrease in Mg/Ca [Martin and Lea, 2002]. These authors suggest that the lowering of Mg/Ca values following oxidation and reduction appears to reflect the removal of contaminant Mg associated with remnant organic matter and adsorbed phases. However, given that only one of the two samples showed a decrease in Mg/Ca following the reductive step, it remains debatable whether the difference between samples cleaned with and without reductive treatment is significant. The current study provides a sufficiently large data set to rigorously address this issue.

[25] To test the consistency between the two cleaning methods we compare Mg/Ca results obtained for all species from laboratories using the “Mg cleaning” protocol with those using the “Cd cleaning” method. We exclude from this analysis the data from O. universa because of the relatively high Mg/Ca ratios characteristic of this species (and their large scatter); including these high mean and SD values tends to heavily bias the statistical analysis. Also excluded from this discussion are the results from laboratories whose analysis of spiked solutions showed poor consistency with other data sets. The comparison shows that data generated by laboratories using the “Mg cleaning” method are internally consistent within the analytical errors (Figure 11a). Likewise, data generated only by the “Cd cleaning” method are also internally consistent, with possibly one exception (Figure 11b); results from laboratory 360 are lower (up to 7% at the high end) than the other data sets. The offset is not statistically significant, however. Because there are no measurements of the spiked solutions from this laboratory, we cannot evaluate the reasons for this apparent offset.

Figure 11.

Comparison of Mg/Ca data from the analysis of planktonic foraminifera obtained from different laboratories. Each point represents the mean of the analysis of two replicates from the same subsample. (a) Data from laboratories using the “Mg cleaning” method. Note that all the data fall on the 1:1 line; (b) Data from laboratories using “Cd cleaning”. Note that, except for laboratory 360, all the data fall on the 1:1 line; (c) Comparison of data generated by laboratories using “Cd cleaning” with data from a laboratory using “Mg cleaning” (Lab #628). Note that results obtained from the “Cd cleaning” method are about 15% lower than those obtained from the “Mg cleaning” method.

[26] A regression analysis between the two cleaning methods yields the following relationship: y = (0.85 ± 0.02) x + (0.20 ± 0.04), where y marks results from “Cd cleaning” and x from “Mg cleaning”. Note that the intercept is not significantly different from zero. This is somewhat expected since both cleaning methods exhibit similar precision. The results suggest that Mg/Ca ratios obtained from planktonic shells cleaned by the “Cd cleaning” protocol are on average 15% lower than those obtained by the “Mg cleaning” method (Figure 11c). This offset, which is attributable to the addition of a reductive step, is a robust result (r2 > 0.97) and is consistent with the results of a recent study by Barker et al. [2003]. It is also noteworthy that the same slope and intercept are obtained when using either MODEL I or MODEL II regressions. The variance in this estimate is related to several factors such as the preservation state of the shells and the species. Barker et al. [2003] demonstrate that the decrease in bulk shell Mg/Ca associated with the reductive step is largely a result of partial dissolution of high-Mg layers of the foraminiferal shell; hydrazine reduction is a very corrosive step, leading to a significant dissolution of the calcitic shell. These authors also show that removal of Mn-Fe-oxides is unlikely to explain the observed decrease in Mg/Ca. Partial dissolution of Mg-rich layers of the shell is consistent with processes occurring on the seafloor as suggested in several studies [Brown and Elderfield, 1996; Rosenthal et al., 2000]. It is also consistent with analyses of planktonic foraminifera using a flow-through dissolution method [Benway et al., 2003; Haley and Klinkhammer, 2002].

[27] We use the temperature calibration of Anand et al. [2003] to estimate the temperature bias due to cleaning effects. Figure 12 shows the difference in temperature estimates of samples cleaned with the “Cd cleaning” method relative to results obtained by laboratory 628, which used the “Mg cleaning” method, for all the different sites. While there is significant scatter in the data, it is also clear that on average, samples from the “Cd cleaning” method yield temperatures that are about 0.6°C colder than those obtained by the less rigorous cleaning. This difference is significant at the 95% confidence level (±2 SE). In contrast with Mg/Ca, Sr/Ca ratios are not significantly dependent on the choice of cleaning methods.
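
To illustrate how the regression between the two cleaning methods maps onto a temperature bias, the following sketch applies y = 0.85x + 0.20 to a few hypothetical “Mg cleaning” ratios and converts both values to temperature. It assumes an exponential calibration of the form Mg/Ca = B exp(A T), with A = 0.090 per °C and B = 0.38 mmol/mol taken here as the multispecies constants of Anand et al. [2003]. The constants, the function names and the input ratios are assumptions for illustration and are not stated in this section.

```python
import math

A = 0.090  # assumed exponential constant, per deg C
B = 0.38   # assumed pre-exponential constant, mmol/mol
SLOPE, INTERCEPT = 0.85, 0.20  # regression coefficients quoted in the text


def mg_to_temperature(mg_ca, a=A, b=B):
    """Invert Mg/Ca = b * exp(a * T) for temperature T (deg C)."""
    return math.log(mg_ca / b) / a


def cd_from_mg(mg_cleaned):
    """Predict the 'Cd cleaning' ratio from a 'Mg cleaning' ratio."""
    return SLOPE * mg_cleaned + INTERCEPT


def cleaning_bias(mg_cleaned):
    """Temperature from 'Cd cleaning' minus temperature from 'Mg cleaning'."""
    return mg_to_temperature(cd_from_mg(mg_cleaned)) - mg_to_temperature(mg_cleaned)


# Hypothetical 'Mg cleaning' ratios in mmol/mol.
for x in (2.0, 3.0, 4.0, 5.0):
    print(f"Mg/Ca = {x:.1f} mmol/mol: bias = {cleaning_bias(x):+.2f} deg C")
```

Under these assumptions the bias does not depend on B and lies between roughly −0.6 and −1.3°C for ratios of 2 to 5 mmol/mol, which is of the same order as the offsets discussed in the text.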

Figure 12.

Difference between Mg temperatures from Lab 628, which employed “Mg cleaning”, and laboratories 213, 578 and 867, which employed the “Cd cleaning” technique. On average, the “Mg cleaning” Mg temperatures are 0.63°C warmer than the “Cd cleaning” Mg temperatures (solid black line). The gray shaded area defines the ±2 S.E. (95% confidence interval) of the data shown. Temperature estimates are based on the calibration of Anand et al. [2003].

[28] The current study and that of Barker et al. [2003] provide strong evidence to suggest that differences in cleaning methods constitute a significant source of interlaboratory variability in Mg/Ca analysis. Thus, when applied in paleoceanographic studies, cleaning offsets may lead to significant biases in estimating absolute seawater temperature. This is especially the case when samples used in down core studies were cleaned differently than those used for the calibration. The offset does not have, however, a discernible effect on the estimates of relative temperature changes (e.g., glacial-interglacial temperature variability) (Y. Rosenthal, unpublished data). On the basis of this study, there is no clear difference in the intralaboratory %RSD between samples cleaned with the “Mg cleaning” and “Cd cleaning” methods (Figure 9). The main issue is, therefore, the interlaboratory reproducibility. In Table 5 we recalculated the interlaboratory standard deviation (SD) and reproducibility (R) of Mg/Ca analyses based only on measurements from laboratories using the “Mg cleaning” protocol. The average standard deviation for all the samples is 6% and the reproducibility is about 22%, as compared with an average SD of 8% and reproducibility of 28% obtained from measurements using both the “Mg cleaning” and “Cd cleaning” methods. This is a considerable improvement (about 25%) in interlaboratory performance, which translates to a temperature error of about ±0.7°C rather than the ±1°C obtained previously.
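
The averages quoted for Table 5 and their temperature equivalents can be reproduced with a few lines. The sketch below assumes an exponential calibration of the form Mg/Ca = B exp(A T) with A = 0.090 per °C, taken here as the value of Anand et al. [2003]; under that form a relative change f in Mg/Ca corresponds to a temperature change of ln(1 + f)/A, independent of B. The constant and the function name temperature_error are assumptions for illustration.

```python
import math
from statistics import mean

# RSD% and RSDR% columns of Table 5 (samples a through r).
RSD = [7.1, 3.8, 5.9, 6.2, 5.0, 6.5, 8.4, 9.4, 4.8,
       8.1, 5.3, 4.8, 3.8, 10.8, 4.3, 6.0, 8.6, 6.1]
RSD_R = [25.8, 13.1, 19.6, 22.5, 18.5, 22.4, 27.5, 34.6, 18.1,
         28.9, 18.8, 17.5, 13.2, 34.2, 15.7, 21.4, 29.9, 21.6]

A = 0.090  # assumed exponential constant of the calibration, per deg C


def temperature_error(rsd_percent, a=A):
    """Temperature equivalent (deg C) of a relative Mg/Ca error in percent."""
    return math.log(1.0 + rsd_percent / 100.0) / a


mean_rsd = mean(RSD)      # average interlaboratory RSD%, 'Mg cleaning' only
mean_rsd_r = mean(RSD_R)  # average relative reproducibility, RSDR%

print(f"mean RSD% = {mean_rsd:.1f}, mean RSDR% = {mean_rsd_r:.1f}")
print(f"Mg cleaning only: +/-{temperature_error(mean_rsd):.2f} deg C")
print(f"both methods (8% RSD): +/-{temperature_error(8.0):.2f} deg C")
```

With these inputs the Table 5 columns average to about 6% and 22%, and the two relative standard deviations convert to roughly ±0.7°C and ±0.9°C, close to the figures given in the text.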

3.3.4. Preservation Effects

[29] Discussion of the alteration of Mg/Ca ratios in planktonic foraminifera due to post-depositional dissolution is beyond the scope of this paper. However, our results show that differences in Mg/Ca due to variable preservation states of the samples significantly exceed the magnitude of analytical differences estimated here. For example, G. ruber samples from a shallow core in Little Bahama Bank in the Atlantic Ocean show significantly higher Mg/Ca than samples from the Ontong Java Plateau in the Pacific Ocean, although sea surface temperature is significantly warmer at the latter site (26.7°C and 28.9°C in the LBB and OJP, respectively). In addition, there is a significant difference in Mg/Ca between shallow and deep cores in the OJP. Similar trends are observed for the other planktonic species studied here. Sr/Ca ratios do not exhibit similar depth-dependent trends, suggesting that post-depositional dissolution has a much smaller effect, if any, on Sr/Ca ratios in planktonic foraminifera. Using the calibration of Anand et al. [2003] we show that dissolution effects may lead to offsets in Mg/Ca-based temperature estimates of up to 4°C. Clearly, corrections for post-depositional dissolution effects need to be made by using either regional calibrations [Lea et al., 2000] or dissolution-corrected calibrations [Dekens et al., 2002; Rosenthal and Lohmann, 2002].

4. Conclusions

[30] Results from this study suggest that, in principle, Mg/Ca-based estimates of seawater temperatures are reproducible among laboratories. The few large deviations from the mean population's value are mainly caused by the use of inaccurate working standards or inadequate/inconsistent cleaning. After omitting the errant data sets we estimate an interlaboratory RSD of about ±8% for Mg/Ca measurements. This suggests an interlaboratory reproducibility of about ±2–3°C. In comparison, the intralaboratory repeatability is about ±1–2°C. Both parameters strongly depend on the type of the sample (i.e., species and degree of preservation). A significant degree of the scatter in the data is caused by the inconsistency in cleaning methods used by the different laboratories. We find a consistent bias, of about 1°C, between the “Cd cleaning” method that includes both oxidative and reductive steps and the “Mg cleaning” version that includes only an oxidation step. These analytical offsets have a significant effect on estimates of absolute temperatures and therefore, on interlaboratory comparability. The offsets, however, do not have a significant effect on estimates of relative down core temperature changes. Obviously, adopting a consistent cleaning method would lead to better reproducibility among laboratories measuring Mg/Ca. In practice, the problem is partly circumvented because laboratories are using local calibration equations to adjust their core top temperature estimates to the overlying observed hydrography. Using this practice, laboratories not only correct for differences in foraminiferal preservation among sites but also inadvertently for poor interlaboratory reproducibility. So in fact, the reproducibility of temperature estimates is often better than that of the Mg/Ca data on which they are based.

[31] In practice, it would be advantageous if all laboratories adopted a single cleaning protocol for Mg/Ca analysis. The results of this study and that of Barker et al. [2003] suggest that the “Mg cleaning” protocol is sufficiently effective in removing all contamination sources. That the “Mg cleaning” protocol is also significantly less laborious than the “Cd cleaning” version further recommends it as a universal method for Mg/Ca measurements. Indeed, many laboratories currently using ICP-OES already use a variation of the short method, so this proposal may seem reasonable. However, laboratories studying other trace metal proxies (e.g., Cd/Ca, Ba/Ca, U/Ca) will continue to use the “Cd cleaning” protocol. Thus it is difficult to foresee the adoption of a single method in the near future. Though a universal method may not be currently feasible, comparability among laboratories can be improved. First, we recommend that working standards be calibrated against independently quantified standards. Second, given the high Mg content of clays, we suggest that Al/Ca, Fe/Ca or Ti/Ca be measured along with Mg/Ca to monitor for possible contamination from adhering clays. Third, it is becoming apparent that the dilute acid leaching step may have a significant impact on the bulk shell Mg/Ca ratio [Barker et al., 2003]. Therefore it seems that the two cleaning methods could be made more consistent if the leaching step in the “Mg cleaning” method is adjusted to compensate for the lack of a reductive step (and its concomitant partial dissolution). This, however, needs to be rigorously assessed.

[32] An obvious, but nonetheless important, conclusion of this study is the need for standards calibration among laboratories as a first step toward improving interlaboratory compatibility. Clearly, with the increase in the number of laboratories involved in Mg/Ca measurements for paleoceanographic studies, there is an urgent need to develop an agreed-upon solid standard to be used by all the laboratories, much as is done in isotope analyses. It would also be beneficial if newcomers consulted with more experienced laboratories about cleaning and analytical techniques and carried out their own intercomparison studies before proceeding with paleoceanographic research. Reducing the uncertainties in Mg/Ca-based temperature estimates, both due to interlaboratory inconsistency and dissolution effects, offers the potential of using paired δ18O and Mg/Ca data for rigorous paleoceanographic reconstructions in a similar manner to that conducted with modern data. Issues raised here should be addressed through further research and perhaps also in a workshop to discuss the different approaches.

Acknowledgments

[33] The authors acknowledge the collaborative spirit of all the participating parties. The project was supported by NSF grants OCE0117569 and OCE9986716 to YR. Special thanks to Bill Curry, Anja Müller and Will Howard who provided samples for this study. Comments from the editor Bill White and two anonymous reviewers greatly improved the manuscript.
