Preparation and quality control of in‐house reference materials for marine dissolved inorganic carbon and total alkalinity measurements

Accurate measurements of seawater carbonate chemistry are crucial for marine carbon cycle research. Certified reference materials (CRMs) are typically analyzed alongside samples to correct measurements for calibration drift. However, the COVID-19 pandemic led to limited access to CRMs. In response to this shortage, we prepared and monitored in-house reference materials (IHRMs) for total alkalinity (TA) and dissolved inorganic carbon (DIC) over 12 and 15 months, respectively. Overall, TA was stable, but a slight increase in DIC of about 2 μmol kg⁻¹ occurred over 15 months. The increase could potentially be attributed to bacterial growth, despite mercuric chloride fixation and repeated UV exposure. This small increase was, however, most likely within our instrument and measurement uncertainties. Our repeated measurements also identified a few bottles that had TA or DIC concentrations 4–5 μmol kg⁻¹ higher than the rest, indicating issues during cleaning, fixation, or storage of individual bottles. This study emphasizes the importance of careful and continuous monitoring if self-prepared IHRMs are used. Given that the amount of work required is very high, IHRM preparation is only recommended when CRMs are not available.

Marine carbonate chemistry comprises a critical area of research involving millions of seawater samples collected over the past 50 yr (Lauvset et al. 2022). Oceans play an important role in the global carbon cycle, and the marine inorganic carbon cycle has been of prime interest for researchers, particularly because ~26% of anthropogenic carbon dioxide (CO₂) emissions from fossil fuel combustion, cement production, and land use change are naturally stored in the oceans (Gattuso et al. 2015, Friedlingstein et al. 2022). Two of the most frequently measured parameters used to determine the inorganic carbon system in seawater are dissolved inorganic carbon (DIC) and total alkalinity (TA). DIC refers to the sum of CO₂, bicarbonate (HCO₃⁻), and carbonate (CO₃²⁻) ions, and TA to the balance of proton acceptors over donors (Dickson et al. 2007, Middelburg et al. 2020). Accurate and precise measurements of these parameters are essential to quantify ongoing oceanic CO₂ uptake, develop models for climate change projections, and inform policymakers. Certified reference materials (CRMs) have played a vital role in achieving accurate and reliable measurements of marine carbonate chemistry. Since 1990, Prof. Andrew G. Dickson and his team at Scripps Institution of Oceanography have been supplying CRMs to the scientific community (Dickson 2010). By comparing the measured value of a CRM to the certified value, operators can determine a correction factor that compensates for changes in instrument calibration that can occur during a measurement day or over longer periods of time, for example, changes in the H₂O concentration in the reference cell of an infrared analyzer or in titrant molarity for TA. This accuracy control is crucial to produce trustworthy databases of marine carbonate chemistry.
The accurate determination of carbonate chemistry speciation is important not only in monitoring changes in the oceanic carbon budget, but also in experimental assessments of the effects of changing carbonate chemistry on marine organisms and ecosystems. Another important aspect of CRMs is that their certified concentrations have been shown to remain stable over extended periods of time (Dickson 2010).
The COVID-19 pandemic had a negative impact on industries and supply chains worldwide, including the availability of CRMs (Nicola et al. 2020). Between February 2020 and January 2021, CRMs were not available, which forced scientists to explore alternatives (National Centers for Environmental Information 2022). One option is the use of self-made in-house reference materials (IHRMs) that are measured and validated against a CRM. By replacing CRMs with IHRMs, significantly fewer CRMs are required, as they are only used to check the stability and consistency of the IHRM batch over time, so that carbonate chemistry measurements can be sustained for longer. Also, for applications where only "weather" instead of "climate" accuracy is required (Newton et al. 2015), IHRMs might be an option. While this option may appear cost-effective, it requires extensive labor and careful preparation to ensure consistency across the batch and stability of individual bottles over time.
In this study, we present a protocol for preparing and monitoring IHRMs for DIC and TA when CRMs are unavailable or low measurement uncertainties are not required. For that purpose, a batch of seawater was equilibrated to atmospheric pCO₂ in an effort to increase DIC stability once an IHRM is opened. Furthermore, batch consistency and temporal stability over a period of 15 months following bottling were assessed. Finally, we discuss how the stability of these IHRMs could be improved, describe limitations, and consider whether the effort of producing and monitoring them is ultimately worthwhile.

Seawater preparation
Approximately 70 L of natural seawater were collected ~100 m offshore of Wategos Beach, Byron Bay, 2481 New South Wales, Australia (28°38′08″S, 153°37′48″E). The seawater was stored in the dark at 4°C for 7 d before being filtered using a peristaltic pump connected to a 0.2 μm Whatman Polycap 75 AS filter. The filtered seawater was collected into a 100 L high density polyethylene tank that had been thoroughly washed and rinsed with Milli-Q (18.2 MΩ) beforehand.
The seawater was aerated with laboratory air for 5 d to match ambient pCO₂. Two lime wood microdiffusors (88 × 30 × 30 mm, Sander) were connected to an air pump and placed at the bottom of the tank. A gas washing bottle filled with Milli-Q was installed between the pump and the diffusors to minimize evaporation (Fig. 1). The pH, an indicator of pCO₂ at constant alkalinity and temperature, was monitored daily with an Aquatrode Plus with Pt1000 (Metrohm) connected to an 888 Titrando (Metrohm). Aeration was stopped once the pH was stable, that is, changed by < 0.01 over 24 h, which corresponds to a change in pCO₂ of ~10 μatm, the diurnal variability in the laboratory air.

Seawater bottling
All glassware used for this project was subject to a strict 4-step cleaning process. The bottles were acid washed with a 0.01 M HCl solution (although much higher molarities might be more effective against biofilms), rinsed with tap water, cleaned in a dishwasher with a final Milli-Q rinse, and dried for 24 h at 60°C. Bottling of the seawater was done according to the Standard Operating Procedure (SOP) 1 from Dickson et al. (2007), using 500 mL borosilicate 3.3 glass-stoppered bottles (high-quality, water and acid resistant, as per Corning Laboratory Pyrex glassware), and took about 1.5 h. The bottles were filled by gravity from the bottom of the tank, using a drawing tube placed at the bottom of the bottle. Each bottle was filled to the top with CO₂-equilibrated seawater, with at least half of its volume allowed to overflow. Each bottle was then closed with a clean glass stopper, removing the excess water and preventing any CO₂ exchange with air. Once all bottles were filled (only about 2/3 of the 70 L of seawater in the tank were used to minimize potential changes in DIC while bottling), 5 mL were removed from each bottle with a pipette, corresponding to 1% of the bottle volume (Dickson et al. 2007). The bottle neck and glass stopper were then carefully dried (Kimwipe) and greased using Apiezon L grease. Finally, 250 μL of a saturated mercuric chloride (HgCl₂) solution, corresponding to 0.05% of the total volume, was added to each bottle before it was closed and securely sealed.
All bottles were mixed to distribute the mercuric chloride evenly and exposed to UV light for at least 2 h, although it is acknowledged that the glass walls most likely absorbed a significant amount of the UV radiation. In total, 65 bottles were filled. The laboratory pCO₂ was measured during bottling on a Picarro G2201-I gas analyzer. This allowed us to evaluate the degree of CO₂ equilibration by comparing measured atmospheric CO₂ with calculated seawater pCO₂.

DIC, TA, and nutrient measurements
DIC and TA concentrations were measured against CRM batches #190 and #195 (Dickson 2010). DIC was measured using an automated infrared inorganic carbon analyzer coupled to a LICOR Li7000 infrared detector (Gafar and Schulz 2018). TA was measured using an open-cell titration (Dickson et al. 2007), with an 848 Titrino plus coupled to an 869 Compact Sample Changer (Metrohm). The ionic strength of the titrant (0.05 M HCl) was adjusted to ~0.72 mol kg⁻¹ (corresponding to a salinity of 35) with sodium chloride (Zeebe and Wolf-Gladrow 2001). Overall precision for an individual sample was estimated by error propagation of measured sample and CRM standard deviations (Dickson et al. 2007), yielding an average of ± 1.1 μmol kg⁻¹ for DIC and ± 0.7 μmol kg⁻¹ for TA.
Before bottling from the 100 L tank, a natural seawater sample was collected in a 250 mL air-tight polycarbonate bottle for analysis of nitrate, nitrite, phosphate, and silicate concentrations using spectrophotometric methods (Hansen and Koroleff 1999). The latter two concentrations are necessary for accurate pCO₂ calculation.
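The CRM referencing and precision estimate described above can be sketched in a few lines. This is only an illustration under stated assumptions: the correction is taken to be a simple multiplicative factor (certified value divided by the mean measured CRM value), and the error propagation is taken to be a quadrature sum of the sample and CRM standard deviations. The text does not spell out either formula, and all numbers below are hypothetical, not data from this study.

```python
import math
import statistics

# Hypothetical values for illustration only (umol/kg); not data from this study.
CRM_CERTIFIED_DIC = 2040.0
crm_measured = [2038.0, 2039.0, 2040.0]
sample_measured = [2055.0, 2056.0, 2057.0]


def correction_factor(certified, crm_replicates):
    """Assumed multiplicative correction: certified / mean measured CRM value."""
    return certified / statistics.mean(crm_replicates)


def propagated_sd(sample_replicates, crm_replicates):
    """Assumed error propagation: quadrature sum of the two standard deviations."""
    return math.hypot(
        statistics.stdev(sample_replicates),
        statistics.stdev(crm_replicates),
    )


factor = correction_factor(CRM_CERTIFIED_DIC, crm_measured)
corrected_dic = statistics.mean(sample_measured) * factor
precision = propagated_sd(sample_measured, crm_measured)

print(f"correction factor: {factor:.6f}")
print(f"corrected DIC: {corrected_dic:.1f} +/- {precision:.1f} umol/kg")
```

A laboratory following Dickson et al. (2007) more closely may propagate relative rather than absolute uncertainties; the quadrature sum here is the simplest form and is labeled as an assumption for that reason.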

Repeated DIC and TA measurements
Two critical aspects of reference materials are (1) their consistency of concentrations across the batch (from the first bottle to the last) and (2) their stability of concentrations over time.
In order to assess stability, randomly selected bottles were opened and TA and DIC were analyzed over a period of 12 and 15 months, respectively (0, 3, 6, 9, and 12/15 months). To assess consistency, at 0, 6, and 12/15 months, four bottles were opened each time, covering the entire range of bottle numbers, that is, one close to the first bottle, one around the 20th bottle, one around the 40th bottle, and one around the 60th bottle. For TA, five replicates were measured from each bottle, with the highest and lowest values being excluded. For DIC, nine replicates were measured, discarding the two highest and two lowest values of each bottle. For 3 and 9 months, one random bottle was opened, and 7 replicates for TA and 15 replicates for DIC were analyzed. The two highest and two lowest TA measurements and the four highest and four lowest DIC measurements were excluded from further analysis.
To allow for carbonate chemistry speciation calculations, salinity was measured on ~200 mL of CO₂-equilibrated seawater taken from the 100 L tank. Using a conductivity cell (6.0917.080, Metrohm) connected to a 914 pH/conductometer (Metrohm), the salinity was measured on the practical salinity scale (Lewis and Perkin 1981) and recorded as 36.74.
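The replicate screening described above (dropping a fixed number of the highest and lowest values per bottle) can be illustrated as follows. The function name and the replicate values are hypothetical; only the trimming counts come from the text.

```python
import statistics


def trim_extremes(replicates, n_each_side):
    """Drop the n highest and n lowest values and return the rest, sorted."""
    if len(replicates) <= 2 * n_each_side:
        raise ValueError("not enough replicates left after trimming")
    ordered = sorted(replicates)
    return ordered[n_each_side:len(ordered) - n_each_side]


# Hypothetical TA replicates from one bottle (umol/kg): five measured,
# highest and lowest excluded, leaving a triplicate.
ta_replicates = [2291.4, 2290.8, 2293.9, 2291.0, 2289.5]
ta_kept = trim_extremes(ta_replicates, n_each_side=1)

print(ta_kept)
print(f"mean = {statistics.mean(ta_kept):.1f}, sd = {statistics.stdev(ta_kept):.1f}")
```

With the counts given in the text, the same call with `n_each_side=2` turns nine DIC replicates into five, and `n_each_side=4` turns fifteen into seven.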

Carbonate chemistry calculations and statistical tests
Carbonate chemistry speciation at room temperature (21°C) was calculated using the CO2SYS script (Sharp et al. 2021) for MATLAB® (MathWorks), based on the measured salinity, TA, DIC, as well as the phosphate and silicate concentrations. The dissociation constants for sulfuric and carbonic acids, as well as total boron concentrations, were taken from Dickson (1990), Lueker et al. (2000), and Uppström (1974), respectively.
To assess the consistency and stability of TA and DIC over time, linear regressions were performed, and F tests were conducted at a 95% confidence level. Furthermore, control charts were prepared following SOP 22 from Dickson et al. (2007) to evaluate the consistency in the measurement technique. These charts compile all the collected data over the 12 months for TA and 15 months for DIC. A warning range and a control range were calculated, corresponding to the average TA or DIC from all the measurements ± 2 and ± 3 standard deviations, respectively. Typically, 95% of the measurements (70 out of 74 for DIC, and 40 out of 42 for TA) should remain within the warning limits, and ideally none beyond the control limits (Dickson et al. 2007).

Control charts
A total of 42 individual measurements of TA and 74 individual measurements of DIC were conducted over a period of 12 and 15 months, respectively (Fig. 2). The overall mean and standard deviation for TA were calculated at 2292.1 ± 2.2 μmol kg⁻¹. Two measurements at the 3 months mark exceeded the upper warning limit, but none was outside the control limits (Fig. 2). The measurements from the first two bottles were also close to the upper warning limit, while the remaining measurements were more centered around the mean TA.
Regarding DIC, the mean and standard deviation were 2056.7 ± 2.0 μmol kg⁻¹. Again, about 95% of the measurements were within the warning limits, with only four measurements at the 9 months mark exceeding the upper warning limit (Fig. 2). No measurements were beyond the control limits.
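As a worked illustration of the control chart limits, the sketch below derives the warning (± 2 SD) and control (± 3 SD) limits from the reported overall means and standard deviations. The function names and the single test value passed to `classify` are hypothetical; the means and standard deviations are those given in the text.

```python
def control_limits(mean, sd):
    """Warning limits at mean +/- 2 SD, control limits at mean +/- 3 SD."""
    return {
        "LCL": mean - 3 * sd,
        "LWL": mean - 2 * sd,
        "UWL": mean + 2 * sd,
        "UCL": mean + 3 * sd,
    }


def classify(value, limits):
    """Place a single measurement relative to the chart limits."""
    if value < limits["LCL"] or value > limits["UCL"]:
        return "beyond control limits"
    if value < limits["LWL"] or value > limits["UWL"]:
        return "beyond warning limits"
    return "within warning limits"


# Overall mean and SD as reported in the text (umol/kg).
ta_limits = control_limits(2292.1, 2.2)
dic_limits = control_limits(2056.7, 2.0)

for name, limits in (("TA", ta_limits), ("DIC", dic_limits)):
    print(name, {key: round(val, 1) for key, val in limits.items()})

# Hypothetical TA measurement, used only to show the classification.
print(classify(2297.0, ta_limits))
```

For TA this gives warning limits of roughly 2287.7 to 2296.5 μmol kg⁻¹ and control limits of roughly 2285.5 to 2298.7 μmol kg⁻¹; for DIC, roughly 2052.7 to 2060.7 and 2050.7 to 2062.7 μmol kg⁻¹.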

Batch consistency
Ensuring consistency in TA and DIC among the reference material bottles is a crucial aspect of their preparation. Such consistency was tested at 0, 6, and 12/15 months (Fig. 3). At the 0 months mark, a statistically significant negative trend in TA was observed. TA decreased in bottles filled later during the bottling process, that is, at higher bottle number, with an overall decrease of 3.8 μmol kg⁻¹ from the first to the last bottle. However, this trend was not observed at the 6 and 12 months marks, with respective averages of 2291.0 ± 0.9 and 2291.2 ± 0.8 μmol kg⁻¹.
For DIC, a statistically significant trend was observed at the 6 months mark, where there was an increase of 4.4 μmol kg⁻¹ between the first and the last bottle. The corresponding average DIC was calculated at 2056.2 ± 2.3 μmol kg⁻¹. However, as for TA, this trend was not observed at the 0 and 15 months marks, with respective averages of 2055.4 ± 1.0 and 2057.6 ± 0.8 μmol kg⁻¹.

Batch stability
The initial TA, that is, at 0 months, was 2293.8 ± 1.8 μmol kg⁻¹ with higher variability than at other time points, ranging from 2291.3 to 2295.7 μmol kg⁻¹ (Fig. 4). Subsequent measurements had standard deviations below 1 μmol kg⁻¹, with average TA of 2296.2 (based on one measurement), 2291.0 ± 0.9, 2289.2 ± 0.7, and 2291.2 ± 0.8 μmol kg⁻¹. There was a statistically significant negative trend in TA over time, decreasing by ~0.25 μmol kg⁻¹ per month.
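The kind of trend analysis used here can be sketched with an ordinary least-squares fit and its F statistic. Note the assumptions: the fit below runs through the five time-point averages quoted above, unweighted, whereas the individual measurements are not listed in the text. The resulting slope therefore should not be expected to reproduce the reported value of about 0.25 μmol kg⁻¹ per month, and the significance outcome may differ as well. Function and variable names are illustrative.

```python
def linear_fit_with_f(x, y):
    """Ordinary least-squares line plus the regression F statistic (1, n-2 df)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual and regression sums of squares
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = slope ** 2 * sxx
    f_stat = ssr / (sse / (n - 2))
    return slope, intercept, f_stat


# Time-point averages of TA quoted in the text (months, umol/kg).
months = [0, 3, 6, 9, 12]
ta_means = [2293.8, 2296.2, 2291.0, 2289.2, 2291.2]

slope, intercept, f_stat = linear_fit_with_f(months, ta_means)
print(f"slope = {slope:.3f} umol/kg per month, F(1, {len(months) - 2}) = {f_stat:.2f}")
```

Converting the F statistic to a p-value requires the F distribution, for example `scipy.stats.f.sf(f_stat, 1, n - 2)` if SciPy is available; the result is then compared against 0.05 for the 95% confidence level used in this study.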

Dissolved inorganic nutrient concentrations, laboratory pCO₂, and other carbonate chemistry parameters
Combined nitrate and nitrite, phosphate, and silicate concentrations were 9.59 ± 0.02, 0.32 ± 0.03, and 1.41 ± 0.07 μmol kg⁻¹, respectively. The laboratory atmospheric pCO₂ during the bubbling process was 479.1 ± 4.3 μatm, as measured by the Picarro gas analyzer. The pCO₂ of each bottle opened at the beginning of the experiment was calculated at 477.1, 477.0, 481.1, and 486.9 μatm, with an average of 480.53 ± 4.7 μatm.
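The degree of equilibration can be checked directly from the numbers above. The short sketch below recomputes the average and sample standard deviation of the four calculated bottle pCO₂ values and their offset from the measured laboratory air; variable names are illustrative.

```python
import statistics

# Calculated seawater pCO2 of the four bottles opened at 0 months (uatm).
bottle_pco2 = [477.1, 477.0, 481.1, 486.9]
# Measured laboratory air pCO2 (uatm).
LAB_AIR_PCO2 = 479.1

mean_pco2 = statistics.mean(bottle_pco2)
sd_pco2 = statistics.stdev(bottle_pco2)  # sample standard deviation (n - 1)
offset = mean_pco2 - LAB_AIR_PCO2

print(f"seawater pCO2: {mean_pco2:.1f} +/- {sd_pco2:.1f} uatm")
print(f"offset from laboratory air: {offset:.1f} uatm")
```

The offset of about 1.4 μatm is well within the standard deviation of either value, which is the basis for treating the batch as equilibrated with laboratory air.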

Discussion
As mentioned earlier, reference materials play a crucial role in ocean carbonate chemistry measurements. The consistency of DIC and TA concentrations throughout the reference material batch and their stability over time are two crucial factors in the reference material preparation. Additionally, repeated measurements made on an opened bottle must be closely monitored to ensure no drift, for instance due to CO₂ gas exchange or evaporation.

Batch consistency
The overall mean TA was ~2291.9 ± 2.2 μmol kg⁻¹. A statistically significant correlation was observed between the bottle number and TA for the first sampling point at 0 months (Fig. 3). However, a gradient within the 100 L tank is unlikely, given that the seawater was continuously mixed by aeration for several days and by physical means prior to bottling. Furthermore, a concentration decrease during bottling is improbable since the batch would need to be diluted, and potential evaporation would lead to an increase rather than a decrease in TA. Additionally, no statistically significant trend of TA with bottle number was observed at 6 and 12 months. Thus, the observed trend is most likely linked to an issue with the initial measurements of bottles #1 and #22 at the 0 months mark, which exhibited an almost continuous decrease of 2 μmol kg⁻¹ (Fig. 2). Moreover, the last two bottles at the 0 months mark fit the measurements at 6 and 12 months quite well. A discussion of potential reasons for apparent outliers can be found in section "Outliers."
The overall mean DIC was 2056.5 ± 1.7 μmol kg⁻¹. A statistically significant trend with bottle number was observed in the DIC measured at 6 months, which was mostly driven by measurements in bottle #65. However, no statistically significant trend was observed at 0 and 15 months. This suggests that the DIC concentration across the batch was consistent, and that there was likely an issue with bottle #65 or its measurement.

Batch stability
Over a 12-month period, a statistically significant linear trend in TA was observed, with a decrease of 0.25 μmol kg⁻¹ per month. However, if the problematic data points identified above, that is, from bottles #1 and #22 measured at 0 months, were removed, the decrease would only be 0.1 μmol kg⁻¹ per month, without statistical significance. Ultimately, TA appeared to remain relatively stable over a 12-month period.
In contrast to TA, removing DIC data from the potentially problematic bottle #65 would hardly change the slope, and the trend would still be statistically significant. It is unlikely that the increase in DIC was due to atmospheric CO₂ ingassing, as all bottles were tightly sealed with a greased glass stopper, and the seawater had been equilibrated to laboratory air pCO₂, from which it differed by only 1.4 μatm. One potential explanation for an increase in DIC would be bacterial respiration, despite HgCl₂ poisoning and UV exposure, both highly effective procedures to stabilize biological samples over time (Kirkwood 1992; Dickson et al. 2007). And indeed, although very rare, growth of HgCl₂-resistant bacteria has been reported/suspected in some CRM batches (Dickson 2010; National Centers for Environmental Information 2022). However, the measured increase in DIC over a 15-month period was only about 2.2 μmol kg⁻¹, which falls within the typical range of uncertainty for replicate measurements using state-of-the-art techniques (Dickson 2010; Riebesell et al. 2011). In the end, it is unclear whether the observed increase in DIC was the result of biological activity or due to chance. Whichever explanation is true, the increase in DIC over 15 months was relatively small, so that for practical purposes, the batch could still be considered stable over this period of time.

Outliers
Several potential outliers were observed for both the DIC and TA data. While instrument errors during measurements cannot be ruled out, another explanation would be that individual bottles indeed had different TA and DIC concentrations. This is despite treating all bottles the same and very carefully during preparation steps (washing, poisoning, sealing, and storage). Different TA could be explained by issues during the washing process, while higher DIC concentrations could be due to incomplete or missing HgCl₂ addition to certain bottles.
When it comes to potential instrument errors, the apparent decrease in TA at the 0 months mark with consecutive measurements (Fig. 2a) may be attributed to acid carryover in the autosampler when the dispenser, stirrer, and pH electrode are not thoroughly rinsed with Milli-Q between samples. While the measurement procedures have been designed to prevent such carryover, it is still a possibility under certain circumstances that would explain the decrease in TA concentrations of replicates in a measurement run.

Further considerations
To ensure batch stability over time, one could consider the use of artificial seawater instead of natural seawater for the preparation of IHRMs. This is because biological activity, which can cause changes in DIC and TA concentrations, is likely reduced in artificial compared to natural seawater due to the absence of significant macronutrients such as nitrate/nitrite and phosphate, even in nonpoisoned samples. Moreover, the lack of particulate or dissolved organic carbon in artificial seawater also impedes heterotrophic growth.
Various recipes for artificial seawaters can be found in the literature, including those by Kester et al. (1967), Harrison et al. (1980), Berges et al. (2001), and Hunt and Mandoli (2008). Using a very basic artificial seawater recipe, that is, a sodium chloride solution, allows for infrared and coulometric DIC measurements, but when it comes to TA determination using potentiometric titrations, the standard fitting equations developed for a full seawater ion matrix are not applicable (Dickson et al. 2007). The error introduced to TA measurements by using artificial seawater is difficult to assess, as IHRM measurements are ultimately referenced against a CRM. Also, carbonate chemistry speciation, such as pCO₂, cannot be calculated from TA and DIC in a pure sodium chloride solution because all stoichiometric equilibrium constants required have been determined in a full seawater ion matrix. Hence, the recipe of Kester et al. (1967) is recommended as it is closest to natural seawater in terms of ion composition. Another advantage of artificial seawater is that DIC and TA can be controlled directly through the combined addition of sodium carbonate and sodium bicarbonate, or sodium carbonate and hydrochloric acid. The pCO₂ could also be adjusted directly without the need for aeration. Finally, commercially available salts for artificial seawater preparation could contain inorganic and/or organic alkalinity contaminants. These contaminants could artificially raise TA above targeted levels but should not constitute a problem if the actual samples are ultimately referenced against CRMs, with the IHRMs' main purpose being to assess daily or longer-term calibration drift, as done here.
Another important aspect to consider when using IHRMs is that overall uncertainty is typically higher than for CRMs. In our case, with 2.2 and 2.0 μmol kg⁻¹ for TA and DIC, respectively, it was about three times higher than for a typical CRM. That means that samples corrected with this IHRM will also have up to three times higher uncertainty for calculated carbonate chemistry speciation parameters. For instance, the uncertainty in calculated pCO₂ is about 5 μatm in a CRM, but about 14 μatm in our IHRM (Table 1). Having significantly higher uncertainty in measured and calculated carbonate chemistry parameters might pose problems for certain applications where TA and DIC measurements require low uncertainty, that is, for detecting so-called "climate" signals (long-term changes in carbonate chemistry speciation) as opposed to "weather" signals of spatial or diel/seasonal variability (Newton et al. 2015).
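The uncertainty ratio referred to here (IHRM standard deviation divided by the average of the CRM standard deviations) is simple arithmetic, sketched below for calculated pCO₂. Only the rounded values quoted in the text are used; the individual CRM standard deviations from Table 1 are not reproduced here, so a single value of about 5 μatm stands in for their average. The function name is illustrative.

```python
import statistics


def uncertainty_ratio(ihrm_sd, crm_sds):
    """IHRM standard deviation divided by the mean of the CRM standard deviations."""
    return ihrm_sd / statistics.mean(crm_sds)


# Rounded values quoted in the text for calculated pCO2 (uatm).
IHRM_PCO2_SD = 14.0
crm_pco2_sds = [5.0]  # stand-in for the average of the two CRM batches

pco2_ratio = uncertainty_ratio(IHRM_PCO2_SD, crm_pco2_sds)
print(f"pCO2 uncertainty ratio: {pco2_ratio:.1f}")
```

A ratio of about 2.8 is consistent with the statement that uncertainties are up to three times higher when samples are corrected with the IHRM.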

Conclusion
In summary, our findings demonstrate that the preparation of consistent and stable in-house reference material for DIC and TA is achievable. Despite the slightly higher overall uncertainty, IHRMs are an alternative to CRMs when access to them becomes limited or when only "weather" instead of "climate" signals need to be detected. However, careful preparation procedures, which include aeration for pCO₂ equilibration, regular cross-referencing against CRMs to identify outliers or IHRM instability, and constant quality control, are required to achieve reliable and reproducible results. These processes are time-consuming and hence pose a considerable cost for most laboratories, which favors the use of commercially available CRMs. Therefore, the production of in-house reference materials is probably only advisable when access to CRMs is severely limited, as was the case during the COVID-19 pandemic.

]), and CaCO₃ saturation state with respect to calcite (ΩC) and aragonite (ΩA) are for 21°C using TA, DIC, and salinity data of two CRMs and the respective average IHRM from the 15 months survey. An uncertainty ratio is calculated for each parameter, where the overall IHRM standard deviation is divided by the averaged CRM standard deviations.

Fig. 1. Seawater equilibration setup. Atmospheric air was pumped into Milli-Q before bubbling the natural seawater contained in the tank.

Fig. 2. Control charts for individual TA (a) and DIC (b) measurements performed over the 12 and 15 months monitoring period. A gray line represents the average (mean), gray dotted lines represent the upper and lower control limits (UCL and LCL), while gray dashed lines represent the upper and lower warning limits (UWL and LWL). See materials and procedures, section 5, for details.

Fig. 3. Measured TA (triplicates) and DIC (five replicates) from four bottles at 0, 6, and 12/15 months, expressed in μmol kg⁻¹. A linear regression was fitted through each time point. An asterisk of the corresponding color is displayed on the right-hand side of the graph when the regression is statistically significant.

Table 1. Comparison of uncertainties for carbonate chemistry parameters using CRMs or our IHRMs.