Sampling errors arising from carousel entrainment and insufficient flushing of oceanographic sampling bottles

Collection of representative water samples is necessary for limnology and oceanography. Unfortunately, some common sampling practices using current technology have the potential to introduce significant sampling errors. For example, modern carousel hardware and software can permit closing of sampling bottles as soon as the bottle reaches the desired depth rather than allowing sufficient time (i.e., soak time) for ambient water to flush the sampling bottles. The large size of many conductivity–temperature–depth (CTD)/carousels and their associated instrumentation also increases the impacts of water entrainment as the equipment travels within the water column. Finally, some modern sampling bottles have small openings relative to their volumes, a factor that inhibits bottle flushing, particularly if the bottle closures are not completely open. Inspection of data from selected research cruises suggests that insufficient soak times can produce biased water samples. In this study, we undertook field experiments that help to quantify the errors that can arise from CTD carousel entrainment and insufficient bottle flushing. The experiments demonstrate that under stratified conditions, soak times of more than 2 min may be required to collect representative water samples. The experiments also demonstrate the occurrence of stratification within sample bottles. Some protocols that may reduce sampling errors are suggested.

Obtaining representative samples is a problem that pervades the scientific enterprise. Early oceanographic investigators such as Nansen gave considerable thought to the design of sampling bottles and thermometers (e.g., equipment examples in Sverdrup et al. 1942). In this study, we suggest that the process of obtaining water samples may need renewed attention.
Conductivity-temperature-depth (CTD)/rosette systems became common in oceanography after about 1975 (e.g., Fofonoff et al. 1974). Prior to that, in situ water sampling generally required attaching an array of individual bottles to a hydrowire (Fig. 1). These bottles were often equipped with reversing mercury thermometers that required several minutes at the sampling depth (soak time) for equilibration, and the bottles were closed (tripped) by messengers that descended at 150-200 m/min (U.S. Naval Oceanographic Office 1968: Pub. 607; Sverdrup et al. 1942). The delay associated with messengers could add appreciably to the "soak" times required for thermal equilibration. Estimating the depth at which samples were collected with the older technology was somewhat problematic, being dependent on measurements of wire out length, wire angles, and the differences in the readings of protected (against pressure) and unprotected reversing thermometers. Today, this older technology has been almost entirely replaced by CTD/rosette systems (carousels) with bottles that can be electronically triggered to close (trip) at any time interval after a bottle reaches a desired sampling depth (Fig. 1). In addition, the need for obtaining chemically cleaner samples and/or large samples for various biological and trace metal programs has sometimes resulted in the employment of sampling bottles (Grasshoff et al. 1999) with small opening areas relative to bottle volume, a factor that inhibits flushing (Weiss 1971). For example, the diameter of the openings on the 10 and 12 liter Niskin type bottles used on the Sikuliaq and Healy is 74 mm, and the diameter of the opening on a much larger 30 liter Go-Flo bottle is only 90 mm. Although a recent ongoing program, GO-SHIP, recommends waiting at least 20 s at depth before tripping bottles, and in some cases possibly up to 1 min for better results (Kawano 2010; Swift 2010), these cautions are often neglected in other programs.
The results presented here suggest that these recommended soak times can be too short under some circumstances and that sometimes soak times in excess of 2-3 min are required to obtain representative samples.
The motivation for this study arises from past real-world experiences on expeditions such as the U.S. Joint Global Ocean Flux Studies of the Arabian Sea and Southern Ocean (Smith et al. 1998, 2000), and from more recent analysis of data from the Arctic. The Arctic data are from the Shelf Basin Interactions Project (SBI; Grebmeier and Harvey 2005), specifically from one cruise, HLY0303, where rapid carousel deployments occurred. We provide an example (Fig. 2) of four vertical profiles that show large differences between the measured bottle salinity (measured shipboard using a salinometer) and the CTD electronic profile. These data from a strongly salt-stratified Arctic water column suggest that tripping bottles on a carousel after insufficient soak times leads to bottle salinities that do not represent conditions at the bottle tripping depths. Note that on the up portion of the casts, bottle salinities are often significantly higher than the corresponding CTD salinities. Conversely (and as would be expected), the deepest bottles tripped after the CTD has stopped descending sometimes yield salinity values that tend to be lower than the CTD salinities, despite the likelihood of reduced winch speeds as the carousel approaches the bottom and pauses due to data entry tasks when switching from downcast to upcast.
Weiss (1971) evaluated the flushing characteristics of several types of oceanographic sampling bottles attached to a hydrowire, including Nansen and NMS series (Niskin type bottles; General Oceanics), ranging in size from 1.3 to 30 liters. He developed an idealized model of how sample bottle flushing affects the concentration of a chemical constituent within the bottle:
$$c = c_o \, e^{-az/v} \qquad (1)$$
where $c_o$ is the initial concentration, $c$ is the final concentration, $a$ is the opening area of the bottle, $v$ is the bottle volume, and $z$ is the distance traveled.
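As a rough illustration of the flushing model, the sketch below assumes the idealized exponential form c/c_o = exp(-a z / v), a circular and fully unobstructed opening, and the bottle sizes and opening diameters quoted in the introduction. The function names are hypothetical, and the numbers are only as good as those assumptions.

```python
import math


def flushing_length(volume_l: float, opening_diameter_mm: float) -> float:
    """Return the flushing length v/a in metres.

    Assumes a circular, fully open bottle end (an idealization).
    """
    volume_m3 = volume_l / 1000.0
    radius_m = opening_diameter_mm / 2000.0
    area_m2 = math.pi * radius_m ** 2
    return volume_m3 / area_m2


def relict_fraction(distance_m: float, volume_l: float,
                    opening_diameter_mm: float) -> float:
    """Fraction c/c_o of the original constituent left after travelling z.

    Uses the assumed idealized form exp(-a z / v).
    """
    return math.exp(-distance_m / flushing_length(volume_l, opening_diameter_mm))


if __name__ == "__main__":
    bottles = [("10 L Niskin type", 10.0, 74.0),
               ("12 L Niskin type", 12.0, 74.0),
               ("30 L Go-Flo", 30.0, 90.0)]
    for label, volume_l, diameter_mm in bottles:
        length = flushing_length(volume_l, diameter_mm)
        print(f"{label}: v/a = {length:.2f} m")
```

Under these assumptions the larger bottle has the longer flushing length, which is the sense in which a small opening relative to volume inhibits flushing.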
In addition to bias due to bottle flushing characteristics, water entrainment by the carousel can also introduce sampling bias. An example is given (Fig. 3) that shows a two-layer system with a more saline bottom boundary layer and a less saline surface mixed layer. The downcast CTD salinity displays a sharp gradient between 50 and 60 dbar. During the ascent (at 30 dbar/min) through the gradient, the CTD salinities display bias over a length of about 30 dbar.
Intra-bottle stratification can also bias results. Smethie and Buchholtz (1980) investigated intra-bottle stratification for dissolved oxygen while determining best practices to sample small-scale (2 m) gradients. They deployed 30 liter Niskin bottles directly on a hydrowire every 2 m over a 10 m length. They then oscillated the hydrowire up and down ("yo-yoed") with the bottles open at an amplitude of 2-4 m with an oscillation period of 10 s, for a total of about 100 s before tripping the bottles. After retrieval, five oxygen samples were taken from each bottle. They found no evidence of intra-bottle stratification in this experimental procedure that simulated rougher sea states. However, the data presented here are taken from salt-stratified quiescent water and demonstrate that intra-bottle stratification can occur.
To better quantify the impact of modern oceanographic practices on the quality of bottle data, and to suggest protocols that can alleviate these problems, experiments were conducted during three field expeditions (Fig. 4).

Materials and procedures
Similar Sea-Bird model SBE 32 carousels (Sea-Bird Electronics) were employed aboard the USCGC Healy and RV Sikuliaq. The frames' horizontal diameters were 1.50 m, and their heights were 1.80 m. The USCGC Healy's carousel was equipped with twenty-four 12 liter Niskin bottles (Fig. 5) fitted with external springs. The carousel on the RV Sikuliaq was equipped with twenty-four 10 liter Niskin type bottles from Ocean Test Equipment (Fig. 6) fitted with internal springs. Both CTD/rosette systems employed Sea-Bird SBE 911plus CTDs equipped with dual temperature and conductivity sensors, a dissolved oxygen sensor, and various other sensors (e.g., chlorophyll fluorometers). The dual temperature and conductivity sensors are often referred to as "primary" and "secondary" sensors, but we refer to them as sensor package I and sensor package II, since the inherent accuracy and precision of each sensor suite is equivalent.

Experiment types
Three main types of experiments were conducted to assess the potential impacts of carousel entrainment and insufficient bottle flushing. Experiment Type I was designed to evaluate the frequently employed soak-time procedure, in which the rosette is stopped for a period of time at the desired sampling depth before the bottles are closed. Bottles were tripped over time periods of up to 280 s while the carousel was stopped at a sampling depth.
Experiment Type II was similar to Type I experiments in that the rosette was stopped before the bottles were closed, but a yo-yo motion that consisted of raising the carousel 1 dbar and then lowering it 1 dbar was added after each bottle trip to mimic the motion of the ship in somewhat rougher seas. Since the yo-yo motion was not sufficiently precise to always move the carousel exactly back to the original depth, the reported salinity values for Type II experiments give a depth range. The target bottle tripping times for experiment Types I and II conducted during cruise HLY1301 are shown in Table 1.
Experiment Type III matched the "tripping on the fly" method, where the rosette is not stopped when tripping a bottle. Five separate casts were conducted for this experiment, each at a different ascent speed, ranging from 3 to 25 m/min. Bottles were tripped at selected intervals as the carousel ascended. These experiments were based on the assumption that downcast and upcast data should be the same if there were no artifacts introduced by flushing and entrainment. This seems reasonable for our relatively quick turnaround shallow profiles.
Two supplementary experiments were also conducted: (1) During some Type I experiments, bottle salinity values from the top and bottom of Niskin sampling bottles were compared to assess the potential for within bottle stratification; (2) During HLY1702, paired Niskin bottles with one as fully open as possible and the other with the end caps partially obstructing the bottle opening were closed at the same time to assess the impact of inconsistent bottle cocking and Weiss' v/a parameter on bottle flushing.
It is important to note that, except for one Sikuliaq station (SKQ201505s_001), sea states were relatively calm during all experiments and that the ships were often in ice, meaning ship motion, both linear (i.e., heave) and rotational (i.e., pitch and roll), had a negligible role in vertical oscillations of the carousel during the experiments. This facilitated evaluating the impacts of bottle flushing and entrainment, and varying soak times independent of ship roll.
Each downcast, regardless of the experiment type, was deployed similarly as per GO-SHIP protocol (McTaggart et al. 2010). The carousel was submerged to 10 m below the surface for about a minute, brought back up to 1 m below the surface, and then immediately lowered to the sea floor at 25-30 m/min. The ascent speeds for Type I and II experiments were 25-30 m/min between sampling depths. Differences between the salinities recorded by the CTD at the time a Niskin bottle was closed and the salinity of the water within the bottle were used to assess the degree of bottle flushing. Note that the sample bottles are located above the CTD sensor, so some bias may be introduced by m/dbar scale salinity gradients. This factor was taken into account during analysis of the data.

Water samples
During the cruises, salinity samples were collected from the carousel sampling bottles using 250 mL clear glass bottles with plastic screw tops with conical inserts (cruise SKQ201505s) or plastic caps and separate inserts (cruises HLY1301 and HLY1702). Duplicates were always collected during cruise HLY1301. Salinities were determined on-board with Guildline salinometers (8400B Autosal on the USCGC Healy; 8410A Portasal on the RV Sikuliaq). The International Association for the Physical Sciences of the Oceans (IAPSO) seawater standard was used to calibrate the salinometers: for cruise SKQ201505s, batch P155, expiration date = September 2015, $K_{15}$ = 0.99981; for cruise HLY1301, P-series batch not available, $K_{15}$ = 0.99984; for cruise HLY1702, batch P160, expiration date = July 2019, $K_{15}$ = 0.99983. Salinity bottles were rinsed at least three times before collecting a sample. The bottles were then filled to the bottle neck. They were generally the first samples drawn, that is, from the bottom of the sample bottles, but in the experiments that explored the possibility of stratification within sampling bottles, an initial sample was followed by a sample drawn when the bottles were almost empty, as is often the case during experiments where many types of samples are drawn. During cruise SKQ201505s, two samples were drawn only when examining intra-bottle stratification. The salinity samples were then stored for up to 3 d before analysis. Sample temperatures were monitored with digital thermometers to ensure that sample temperatures were close to the Autosal or Portasal water bath temperatures (21°C on the USCGC Healy and 23 or 25°C on the RV Sikuliaq) before testing. Once equilibrated, the salinity samples were analyzed. The salinometers were calibrated before and at the end of each run (no more than 24 samples) with IAPSO Seawater Standard for cruise HLY1301 and before each run for cruise SKQ201505s (no more than 16 samples).
The salinometers on both ships were connected to computers that employed Scripps Institution of Oceanography's Ocean Data Facility software to guide and prompt the analyst.

CTD salinity corrections
The SBE 911plus CTD, Guildline Autosal, and Guildline Portasal have stated salinity accuracies of ± 0.003 (Sea-Bird; http://www.seabird.com/sbe911plus-ctd; last accessed: 2018-02-17), ± 0.002 (Guildline 2006), and ± 0.003 (Guildline 2002), respectively. Thus, the salinity differences between a well calibrated CTD and well calibrated salinometer sample from a well-flushed bottle should be within about ± 0.004 for both the USCGC Healy and RV Sikuliaq data. CTD salinity values can, however, start to drift after a factory calibration. The CTD employed during cruise HLY1301 was calibrated 4 months before the cruise, which was apparently long enough for significant sensor drift to appear in the data. To harmonize cruise HLY1301 CTD and Autosal data, the salinity differences, $\Delta s$, between each CTD sensor package and well-flushed bottle samples from mixed bottom layers were compared (Table 2):
$$\Delta s = s_b - s_c \qquad (2)$$
where $s_b$ is the bottle salinity value and $s_c$ is the CTD salinity value. There were five stations for cruise HLY1301 where CTD and well-flushed bottle salinity values (180 s soak times) were collected in mixed bottom layers at least 10 dbar thick. USCGC Healy CTD sensor package I had a mean salinity 0.014 lower than the Autosal salinities, suggesting a significant post calibration shift. CTD sensor package II had a mean salinity 0.007 lower than the Autosal values, just outside of the expected range. Because the bottle trip CTD files (.BTL) only provided data from sensor package I, analysis of the Healy data employed the sensor package I data after correcting for the 0.014 difference described above. Two tests on cruise SKQ201505s suggested that Portasal salinities averaged 0.002 higher than CTD salinities. Since this difference is well within the stated accuracies of the CTD and the Portasal salinometer, no corrections for CTD vs. salinometer salinities were made for these samples.
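The offset correction described above can be sketched as follows, taking the salinity difference as bottle minus CTD (s_b - s_c). The paired values are hypothetical and are not cruise data; the function names are illustrative. The root-sum-square combination of instrument accuracies is an assumed convention shown only because it gives values close to the quoted ± 0.004; the text does not state how that figure was derived.

```python
import math
from statistics import mean


def mean_offset(bottle_salinities, ctd_salinities):
    """Mean of (s_b - s_c) over paired samples from well-mixed layers."""
    if len(bottle_salinities) != len(ctd_salinities) or not bottle_salinities:
        raise ValueError("need equal-length, non-empty sequences")
    return mean(b - c for b, c in zip(bottle_salinities, ctd_salinities))


def correct_ctd(ctd_salinity, offset):
    """Shift a CTD salinity by the bottle-minus-CTD offset."""
    return ctd_salinity + offset


def combined_accuracy(*accuracies):
    """Root-sum-square of independent accuracies (assumed convention)."""
    return math.sqrt(sum(a * a for a in accuracies))


# Hypothetical paired values from five well-flushed bottles (NOT cruise data).
BOTTLE = [32.514, 32.498, 32.530, 32.507, 32.521]
CTD = [32.500, 32.485, 32.515, 32.493, 32.507]

if __name__ == "__main__":
    offset = mean_offset(BOTTLE, CTD)
    print(f"mean bottle - CTD offset: {offset:.3f}")
    print(f"corrected CTD value: {correct_ctd(32.500, offset):.3f}")
    print(f"CTD + Autosal: {combined_accuracy(0.003, 0.002):.4f}")
    print(f"CTD + Portasal: {combined_accuracy(0.003, 0.003):.4f}")
```

A positive offset here means the CTD reads fresher than the salinometer, which is the sign of the drift reported for sensor package I.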
To help normalize the results for ambient gradients of different strength during experiment Types I and II, the percent of undisturbed ambient water at a given depth present in each bottle sample was calculated using percent ambient salinity as a proxy:
$$\text{percent ambient} = \frac{s_b - s_s}{s_a - s_s} \times 100 \qquad (3)$$
where $s_b$ is the bottle salinity, $s_a$ is the ambient salinity, and $s_s$ is the ambient salinity of the prior depth where the carousel stopped. The ambient salinity ($s_a$) is defined as the last CTD salinity reading at the conclusion of each time series experiment. These calculations come with two caveats. The first is that it is presumed the last CTD salinity reading is a reliable estimate of the ambient value. The second caveat has to do with the vertical displacement between the bottles and CTD on the carousel, the bottles being located 1 m above the CTD. Thus, in relatively large salinity gradients, real salinity differences between the bottles and CTD can exist even after they have equilibrated to ambient water. Therefore, equilibrated bottle salinities can be significantly less saline than CTD salinity values when salinities increase strongly with depth, which will result in apparent percent values over 100%. All we are trying to do in these calculations is to give a rough idea of how much ambient water is in the bottle. Our two-point calculations start with the bottle filled with ambient water from well below the sampling depth for which we are calculating percent ambient water. This tends to maximize the difference in salinities and probably leads to overestimates of percent ambient water. Since this conservative type of calculation showed that there are problems, we did not feel the need to go into higher mathematics. Experimental conditions necessitated somewhat subjective criteria for choosing the prior depth salinities that provided the baseline for estimating percent bottle flushing over time. These calculations, nevertheless, proved useful for visualizing bottle flushing progress in the face of varying vertical salinity gradients. The ambient salinity of the prior depth ($s_s$) was estimated in three different ways. If there was only one bottle tripping depth at a given station at or near the halocline, $s_s$ is the CTD salinity from the deepest part of the profile. If the first sampling depth in a series is at the bottom of the profile, $s_s$ is the CTD salinity at 12 dbar from the carousel downcast. For the rest of the sampling depths in a series, $s_s$ is the CTD salinity for the last bottle trip at the prior sampling depth.
[Figure legend: "CTDcont" is the CTD salinity trace binned into 1 s averages. "CTDdisc" is the CTD salinity at the time of Niskin bottle tripping. "Btl S1" is the salinity determined by salinometer from a sample taken initially (bottom of Niskin bottle). "Btl S2" is the salinometer salinity taken from a sample collected after the bottle was mostly emptied and represents water that was initially near the top of the Niskin bottle.]
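The percent ambient calculation can be sketched as a two end-member mixing estimate, assuming the form 100 × (s_b - s_s) / (s_a - s_s): a bottle still full of prior-depth water scores 0%, and a bottle matching the ambient salinity scores 100%. The function name and the example salinities are hypothetical.

```python
def percent_ambient(s_b: float, s_a: float, s_s: float) -> float:
    """Percent of ambient water in a bottle, using salinity as a proxy.

    s_b: bottle salinity
    s_a: ambient salinity (last CTD reading of the time series)
    s_s: ambient salinity of the prior depth where the carousel stopped
    """
    if s_a == s_s:
        raise ValueError("ambient and prior-depth salinities must differ")
    return 100.0 * (s_b - s_s) / (s_a - s_s)


if __name__ == "__main__":
    # Hypothetical upcast case: prior stop was deeper and saltier.
    s_s, s_a = 32.60, 32.00
    for s_b in (32.60, 32.15, 32.00, 31.97):
        print(f"bottle {s_b:.2f}: {percent_ambient(s_b, s_a, s_s):.0f}% ambient")
```

The last example value is fresher than the ambient CTD reading, as can happen when the bottle sits above the CTD in a strong gradient, and it yields an apparent value above 100%.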
Only data collected at or deeper than 12 dbar were included in the quantitative analysis of the Type I-III experiments and the intra-bottle stratification experiments. We decided on this protocol because of indications that the results could, at times, have been impacted by ship discharges (engine cooling water) and turbulence from the ship's propulsion system at depths < 12 dbar.

Overview
Detailed descriptions for all of the experiments are provided in the M.S. thesis by Paver (2017). A reanalysis and condensation of this thesis forms the basis for this article. The reanalysis included adding continuous CTD data from bottle sampling depths to the analysis of Type I and II experiments. In addition, some minor corrections are included. For example, it was necessary to switch to a reliance on the corrected data from Healy (HLY1301) sensor package I rather than from sensor package II because it was discovered that the bottle trip files (.BTL) could only provide data from sensor package I. Examples of each type of experiment are provided here, followed by summary information for the ensemble of data for each type of experiment. Data that relate to bottle cocking variations and intra-bottle salinity stratification are also presented. Points to keep in mind when interpreting the detailed descriptions of Type I-III experiments are as follows: (1) Entrainment (Fig. 2) and insufficient flushing cause initial bottle salinities to be biased low when the carousel is descending in an increasing gradient and to be biased high when the carousel is ascending; (2) The sampling bottles act like low-pass filters such that their signals can lag the CTD data.
[Figure caption fragment: (c-e) for 9, 17, and 49 dbar, "CTDcont" is the CTD salinity trace binned into 1 s averages. "CTDdisc" is the CTD salinity at Niskin bottle tripping. "Btl S1" is the salinity determined by salinometer from a sample taken initially (bottom of Niskin bottle). "Btl S2" is the salinometer salinity taken from a sample collected after the bottle was mostly emptied and represents water that was initially near the top of the Niskin bottle.]

Details for two selected Type I experiments
The first Type I experiment (Fig. 7) took place during station HLY1301_00501. The carousel was stopped in the middle of a strong salinity gradient at about 15 dbar where the ambient salinity was 32.00. Twelve bottles were tripped over a period of 281 s, but the bottle for t = 60 s malfunctioned. Initial bottle and CTD salinities were higher than the ambient salinity, as expected, at 32.589 and 32.042, respectively. After the initial sample, the next 120 s show lower salinities than the final values, which may suggest an oscillatory motion of the isohalines that could be interpreted as an internal wave initiated by the motion of the carousel. The CTD and bottle salinities appear to stabilize and reflect ambient conditions between 120 and 150 s, with the bottle salinities lower by an amount that can be explained by their higher position in the water column and the local salinity gradient. Salinities were taken from the top and bottom of the Niskin bottles, and they suggest stratification in some of the bottles.
Data from another Type I experiment (HLY1301_07901) conducted at multiple depths (Fig. 8, Table 3) suggest that it would take 90 s for the CTD salinities to stabilize at 17 dbar and 30 s at 49 dbar. They also show that it would take more than 180 s (the length of the experiments) for the bottle salinities to stabilize. The 9 dbar data from this station do not meet the 12 dbar criterion for inclusion in our calculations of stability times, but they do provide a further example suggesting that carousel motion in regions of stratification may stimulate internal waves. These data reveal a wave-like feature over an interval that approximately matches the local Brunt-Väisälä frequency (100 s period) calculated from the CTD downcast data using Sea-Bird Seasoft software (Sea-Bird Electronics 2013). Although the 9 dbar data were excluded from estimates of the time required for the CTD and bottle data to mirror ambient conditions because of their shallow depth (< 12 dbar), they do not appear to be unduly influenced by ship effects.
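To give a feel for what a buoyancy period of about 100 s implies, the sketch below uses the simplified relation N² = (g/ρ₀)(dρ/dz) and period = 2π/N. This is not the Seasoft calculation used in the study: it ignores compressibility, and the reference density, gravity value, and density gradient are illustrative assumptions.

```python
import math


def buoyancy_period(drho_dz: float, rho0: float = 1025.0, g: float = 9.81) -> float:
    """Return the buoyancy (Brunt-Vaisala) period 2*pi/N in seconds.

    drho_dz: density increase with depth, in kg m^-3 per metre (must be > 0).
    rho0, g: assumed reference density (kg m^-3) and gravity (m s^-2).
    Simplified form that neglects compressibility.
    """
    if drho_dz <= 0:
        raise ValueError("stable stratification requires drho_dz > 0")
    n_squared = (g / rho0) * drho_dz
    return 2.0 * math.pi / math.sqrt(n_squared)


if __name__ == "__main__":
    # Hypothetical gradients, chosen only to bracket a ~100 s period.
    for gradient in (0.1, 0.4125, 1.0):
        period = buoyancy_period(gradient)
        print(f"drho/dz = {gradient:.4f} kg m^-4 -> period = {period:.0f} s")
```

Because the period scales with the inverse square root of the gradient, stronger stratification shortens the time scale of any oscillation set up by the carousel.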

Details for a selected Type II experiment
This Type II experiment (HLY1301_3802, Fig. 9) was taken at a depth of 20 dbar in a salinity gradient of 0.2/dbar. After the initial bottle trip, the carousel was yo-yoed about 1 dbar.
[Figure caption fragment: Percent ambient salinity is calculated as described in Materials and procedures. (c) "CTDcont" is the CTD salinity trace binned into 1 s averages. "CTDdisc" is the CTD salinity at Niskin bottle tripping. "Btl S1" is the salinity determined by salinometer from a sample taken initially (bottom of Niskin bottle). "Btl S2" is the salinometer salinity taken from a sample collected after the bottle was mostly emptied and represents water that was initially near the top of the Niskin bottle.]

Details for a selected Type III experiment
During this Type III experiment (HLY1301_00901, Fig. 10), 12 bottles were tripped as the carousel ascended at a speed of 14 dbar/min. Salinity inversions from depths shallower than 12 dbar suggest ship effects, so only the eight deepest sampling locations were analyzed. These data suggest a strong positive correlation between deviations from the downcast CTD salinities and the local salinity gradient. With one exception, the magnitudes of the CTD and bottle salinity deviations from the downcast values were similar, suggesting that in these cases, entrainment may have contributed to the deviations more than bottle flushing.
To get a rough idea of how the salinity differences in our on-the-fly experiments translate into percent ambient water in the data for Fig. 10, we have made the assumption that downcast CTD salinity values can be substituted for $s_a$ and $s_s$ in Eq. 3. The results suggest that tripping on-the-fly can produce samples that only have a minority of their water from the tripping depth. For the data shown, the percent ambient values deeper than our 12 dbar cutoff were mostly less than 20% with only one value above 50%. Our other on-the-fly experiments gave better results, but the percent ambient values were frequently less than 60%.

Summary of Type I experiments
The Type I experimental data suggest that the time required for bottle samples to replicate ambient salinity values in the encountered gradients (i.e., Δs/dbar) usually exceeded 1 min (with only 2 exceptions out of 13 observations) and often exceeded 3 min (8 out of 13 observations; Fig. 11, Table 3). Equilibration times increased with increasing salinity gradients, as would be expected, because the number of e-foldings required for bottle values to approach ambient values within instrument accuracy increases with increasing salinity differences. Here, one e-folding is the vertical travel distance over which a fraction 1-1/e of the relict water mass is removed from the bottle.
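The dependence on the initial salinity difference can be made concrete with a small sketch. It assumes purely exponential flushing, so that the residual difference after n e-foldings is the initial difference times exp(-n), and it treats the tolerance as a chosen target such as the ± 0.004 agreement expected between a CTD and a salinometer. The function name is hypothetical, and the example difference of 0.589 is the initial bottle salinity minus the ambient salinity quoted for station HLY1301_00501.

```python
import math


def e_foldings_needed(initial_difference: float, tolerance: float) -> float:
    """Number of e-foldings for a salinity difference to decay to tolerance.

    Assumes purely exponential decay: residual = initial * exp(-n).
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    difference = abs(initial_difference)
    if difference <= tolerance:
        return 0.0
    return math.log(difference / tolerance)


if __name__ == "__main__":
    tolerance = 0.004
    for difference in (0.05, 0.589, 1.2):
        n = e_foldings_needed(difference, tolerance)
        print(f"initial difference {difference:.3f} -> {n:.1f} e-foldings")
```

Each doubling of the initial difference adds about 0.7 e-foldings under this assumption, so strong gradients lengthen the required flushing but only logarithmically.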

Intra-bottle stratification during Type I experiments
Thirty-three observations (Fig. 12) were obtained during the Type I experiments for intra-bottle stratification. Twenty of these observations displayed significant stratification. Twelve of the salinity differences were within expected instrumental accuracy, but still tended to display stratification, because the upper bottle salinities were less saline than the lower bottle salinities. Data also suggest that larger intra-bottle salinity differences are present in relatively larger ambient salinity gradients, as would be expected.

Type II experiments
Five Type II experiments were conducted, but two of these were from depths shallower than 12 dbar and are therefore not included in the estimates for the time required to achieve ambient conditions. The remaining Type II experiments (Table 3) suggest that bottle samples usually took no more than 90 s to replicate ambient salinity values in the encountered gradients (Fig. 11). The CTD salinities generally equilibrated to ambient water within 45 to 50 s. As with the Type I experiments, equilibration times appeared to increase with increasing salinity gradients. Overall, equilibration times for the Type II experiments are shorter than for the Type I experiments, and there was little evidence of intra-bottle salinity stratification during Type II experiments, presumably because of the induced yo-yo motion. In Fig. 11, we include with the Type II data an open water experiment (SKQ001) where the time required to achieve ambient conditions was 98 s (Table 3), because there was significant ship motion during this Type I experiment.

Type III experiments
We undertook five experiments, during which the carousel was raised at a steady pace while the bottles were tripped on the fly, because some research programs (e.g., Measures et al. 2008; Cutter and Bruland 2012) collect water samples from continuously ascending carousels to limit exposure to trace metals, etc. This means that there was essentially no soak time for each bottle at the tripping depth. During the HLY1301 experiments, ascent rates for each subsequent cast were systematically reduced in speed: full (25 m/min), half (12 m/min), quarter (6 m/min), and eighth (3 m/min) during the upcast. The bottles were closed in sequence once the carousel reached the bottom of a halocline to amplify the salinity difference. The ascent rate for the SKQ201505s experiment was about 10 dbar/min. These bottles were tripped in sequence, about once per minute, starting in the bottom boundary layer, through the boundary layer, and into the surface mixed layer. The data are presented in three parts: (1) the CTD salinity collected during the downcast (cd), (2) the CTD salinity collected during the upcast (cu), and (3) the bottle salinity collected during the upcast (bo). Each deployment was analyzed by calculating: (1) bottle salinity and the downcast CTD salinity differences (bo-cd), (2) salinity differences between the upcast CTD salinity and the downcast CTD salinity at bottle tripping pressures (cu-cd), and (3) salinity gradient at each bottle tripping depth (Table 4). The gradient is calculated from the following equation:
$$G = \frac{s_{p+5} - s_p}{5} \qquad (4)$$
where $s_p$ is the CTD downcast salinity value at a bottle tripping depth pressure, $p$, and $s_{p+5}$ is the downcast salinity 5 dbar below the tripping depth. Unlike the gradient calculations from Type I and II experiments, the gradient in this series is calculated from the water column below the bottle tripping depth. Since the carousel is always ascending, even during bottle trips, the water above the carousel never has a chance to rebound downward, as it would when the carousel is stopped. The combined results (Fig. 13) for cu-cd linear regression gave a slope of 3.934 and an $R^2$ of 0.95. The combined results for bo-cd linear regression gave a slope of 4.656 and an $R^2$ of 0.98. Overall, these results suggest a significant positive relationship between the magnitude of a salinity gradient and the bias of the related data collected during the carousel's upcast.
The bo-cd salinity differences are consistently greater than the cu-cd salinity differences. The difference between bo-cd and cu-cd was small relative to the overall signal, suggesting that entrainment could be a larger factor than bottle flushing.
[Table footnote: Each cast contains three columns: G is the salinity gradient (Δs/m) for each bottle trip, C is the CTD upcast salinity minus CTD downcast salinity (Δs), and B is the bottle salinity minus the CTD downcast salinity (Δs). Data in grayed out cells were not included in the linear regression because of potential ship effects at depths < 12 dbar.]
[Table footnote: Each station contains three rows: Full Cock is the salinity value of the bottle with the fully cocked end plug, Partial Cock is the salinity value of the bottle with the partially cocked end plug, and the Difference is the partially cocked water salinity value minus the fully cocked water salinity value.]
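The Type III analysis steps (a 5 dbar gradient below each tripping depth, followed by a regression of salinity bias on that gradient) can be sketched as below. The gradient is taken as (s_{p+5} - s_p)/5, and the fit is ordinary least squares with an intercept, which is an assumption since the regression form is not specified in the text. All data values are hypothetical and the function names are illustrative.

```python
def downcast_gradient(s_p: float, s_p_plus_5: float,
                      interval_dbar: float = 5.0) -> float:
    """Salinity gradient over the interval below the tripping depth."""
    return (s_p_plus_5 - s_p) / interval_dbar


def linear_fit(x, y):
    """Ordinary least squares fit; returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical downcast salinities at p and p + 5 dbar (NOT cruise data).
    pairs = [(31.00, 31.05), (31.20, 31.45), (31.60, 32.10), (32.00, 33.00)]
    gradients = [downcast_gradient(s_p, s_p5) for s_p, s_p5 in pairs]
    # Hypothetical bottle-minus-downcast biases (bo-cd) at the same depths.
    biases = [0.05, 0.21, 0.43, 0.95]
    slope, intercept, r2 = linear_fit(gradients, biases)
    print(f"slope = {slope:.2f}, intercept = {intercept:.3f}, R^2 = {r2:.3f}")
```

The same fit applied to cu-cd and bo-cd separately gives the two slopes that the text compares.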

Bottle cocking experiments
According to Weiss' (1971) equation, the parameter v/a has a major impact on bottle flushing. Therefore, if bottle closures are cocked in ways that change the area of the opening for water, this could have a significant impact on bottle flushing (Fig. 14). To test this hypothesis, a suite of experiments was conducted during USCGC Healy cruise HLY1702 (26 August 2017-15 September 2017) in the Chukchi Sea. Two bottles were used for each experiment: one bottle's end plugs were set to allow the bottle ends to be as fully open as practical, whereas the second bottle closures were situated to expose roughly half of the openings (Fig. 14). Both bottles were closed at the same time during the experiment. The bottles were tripped on the fly as the carousel was ascending through a strong salinity gradient. These experiments were not meant to evaluate the quality of the salinity data as they relate to the ambient water, but rather to provide an assessment of the potential impact of flushing differences between fully open bottles and those that were partially blocked by the way the bottle was cocked.
[Figure caption fragment: ... would have a maximum impact on carousel data and another case where it would have a minimal impact. As the diagram suggests, a ship that is stationary with respect to the water column could be a worst-case scenario.]
The results of the five experiments (Table 5) show a mean salinity difference of 0.04, which depended on the strength of the gradient, with the partially cocked bottles having higher salinity values as expected. The salinity difference is significantly greater than the reported standard error of ±0.002 for the Autosal aboard the USCGC Healy. Weiss (1971) showed that bottle flushing can be an issue for individually deployed bottles (not part of a carousel). This study shows that the employment of large carousel systems can exacerbate the flushing problem initially explored by Weiss in four ways: (1) modern sampling bottles often have large volumes and relatively small openings, resulting in relatively large flushing lengths (v/a); (2) the bottles may be cocked in inconsistent ways (e.g., Fig. 14) that result in even longer flushing lengths; (3) the bottles may be tripped with minimal soak times, sometimes as soon as the carousel reaches the desired depth; and (4) the shape and size of modern carousels cause them to entrain significant quantities of water as they move through the water column.
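To make the flushing length v/a concrete, the following sketch evaluates it for the bottle dimensions quoted in the introduction (74 mm openings on 10 and 12 liter Niskin type bottles, 90 mm on a 30 liter Go-Flo). The function name and the `open_fraction` parameter are hypothetical, and the half-open case assumes that the effective opening area scales directly with the exposed fraction of the opening, which is a simplification.

```python
import math


def flushing_length_m(volume_l, opening_diameter_mm, open_fraction=1.0):
    """Return v/a in metres for a bottle volume (liters) and opening diameter (mm).

    open_fraction is an assumed scaling of the opening area for a partially
    cocked closure (1.0 = fully open, 0.5 = roughly half exposed).
    """
    volume_m3 = volume_l / 1000.0
    radius_m = opening_diameter_mm / 2000.0
    area_m2 = math.pi * radius_m**2 * open_fraction
    return volume_m3 / area_m2


niskin_10 = flushing_length_m(10, 74)        # about 2.3 m
niskin_12 = flushing_length_m(12, 74)        # about 2.8 m
goflo_30 = flushing_length_m(30, 90)         # about 4.7 m
niskin_10_half = flushing_length_m(10, 74, open_fraction=0.5)  # about 4.7 m
```

Under this simple area scaling, a half-open 10 liter bottle has roughly the same flushing length as a fully open 30 liter bottle.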

Discussion
The upcast CTD and bottle salinity results of the "on-the-fly" experiments (Type III experiments) sometimes display relatively small differences that could, in some cases, be attributed to their relative locations on the carousel and the strength of the ambient salinity gradient. However, if the reasonable assumption is made that the downcast CTD salinities are as accurate as technology permits, comparison of the upcast and downcast CTD salinity readings almost always suggests that the upcasts entrain deeper water, biasing both the upcast CTD and the measured bottle salinities in relation to the water at the sampling depth. Since the upcast bottle and CTD salinities are sometimes closer to each other than to the downcast CTD salinities, it is reasonable to infer that the upcast bias in both bottle and CTD salinities is often dominated by entrainment. The ensemble of the results indicates that bottle flushing characteristics contribute additional bias during Type III experiments. The results also suggest that the local gradient may be more influential than carousel ascent speed when tripping on the fly. Tripping bottles on the fly during the carousel ascent is not a preferred method for collecting representative water samples in any vertical property gradient unless avoiding contamination (for example, by trace metals) is a concern that overrides the artifacts introduced by entrainment and flushing.
The overall results of Type I and II experiments display similar biases in the CTD and bottle data at t = 0 s, which should be expected since yo-yoing in the Type II experiment did not start until after the first bottle trip. Type I and II experiments sample the same depth for a period of time; therefore, these data provide an indication of the time scale over which the impact of the entrainment plumes and internal gravity waves is important. We assume that the effect of an entrainment plume ends when the upcast CTD salinities are indistinguishable from the downcast CTD salinities. Generally, it took less than 100 s for CTD values to reach this state, but bottle salinities frequently took much longer and sometimes did not become indistinguishable from the ambient salinities over the entire experimental lifetime of Type I experiments. This is likely the result of the added time required for bottle flushing. Although the data are limited, the Type II yo-yo experiments, meant to reflect a rougher sea state, and the one open-water experiment display a relatively faster approach to the "true" salinity (Table 3), which is to be expected since yo-yoing should promote bottle flushing (e.g., Smethie and Buchholtz 1980).
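One way to put a number on "indistinguishable from the downcast CTD salinities" is sketched below. The function name, the synthetic time series, and the tolerance are assumptions for illustration: the 0.002 tolerance is borrowed from the Autosal standard error reported above, and other thresholds may be more appropriate for a given sensor.

```python
import numpy as np


def equilibration_time(time_s, salinity, ambient, tol=0.002):
    """First time after which |salinity - ambient| stays within tol.

    Returns None if the record ends while still outside the tolerance.
    The tolerance is an assumed threshold, not a value from the study.
    """
    t = np.asarray(time_s, dtype=float)
    deviation = np.abs(np.asarray(salinity, dtype=float) - ambient)
    outside = np.nonzero(deviation > tol)[0]
    if outside.size == 0:
        return float(t[0])          # within tolerance from the start
    last_outside = outside[-1]
    if last_outside == t.size - 1:
        return None                 # never settled within the record
    return float(t[last_outside + 1])


# Hypothetical record at a fixed depth: an initial bias that decays with time.
time_s = np.arange(0.0, 190.0, 10.0)
ambient = 32.000
ctd_upcast = ambient + 0.05 * np.exp(-time_s / 30.0)

settle_time = equilibration_time(time_s, ctd_upcast, ambient)
```

The same function could be applied to a series of bottle salinities tripped at successive soak times, where a return value of `None` would correspond to bottles that never reached ambient salinity during the experiment.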
Estimates of appropriate soak time range widely, from 20 s (Kawano 2010; McTaggart et al. 2010; Swift 2010) to the more than 2 min that were sometimes required during this study. Part of the explanation may be that ship drift, and thus lateral carousel drift, alters the carousel's position relative to the entrainment plume (Fig. 15). Ship motion is also a major factor, as suggested by the comparison of results from the Type I and Type II experiments. This requires further study.
Entrainment causes density inversions in the water column. Initial sinking and dispersion of the entrainment plume plus bottle flushing times dominate the signals in our experiments, but there are suggestions of subsequent rebound in the form of internal gravity waves. Oscillations in salinity over time while the instrument package was at rest were observed in some of the experiments. An example is provided in Fig. 8, and the observed period for this oscillation was 100 s. Using the CTD downcast data (binned into 1 dbar increments) to calculate the Brunt-Väisälä frequency, at which internal waves should oscillate around a density gradient, also yields a period of 100 s. Although water column oscillations could be due to externally forced internal waves, the strong damping observed in the oscillations over time suggests that the internal waves could be initiated by the carousel in a region of strong density gradients.
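The buoyancy period can be estimated from a binned density profile as sketched below, using $N^2 = (g/\rho)\,\partial\rho/\partial z$ with $z$ positive downward and period $T = 2\pi/N$. This is a simplified illustration: the function name is hypothetical, 1 dbar is treated as 1 m, density is assumed to have been computed elsewhere from the CTD data, and the synthetic linear gradient is chosen only so that the period comes out near 100 s.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2


def buoyancy_period_s(depth_m, density_kg_m3):
    """Brunt-Vaisala period (s) at each level of a density profile.

    depth_m is positive downward. Levels with unstable or neutral
    stratification (N^2 <= 0) return NaN.
    """
    depth = np.asarray(depth_m, dtype=float)
    rho = np.asarray(density_kg_m3, dtype=float)
    drho_dz = np.gradient(rho, depth)
    n_squared = (G / rho) * drho_dz
    with np.errstate(divide="ignore", invalid="ignore"):
        period = np.where(n_squared > 0, 2.0 * np.pi / np.sqrt(n_squared), np.nan)
    return period


# Hypothetical 1 m binned profile through a strong pycnocline.
depth = np.arange(20.0, 31.0, 1.0)
density = 1025.0 + 0.4125 * (depth - 20.0)

period = buoyancy_period_s(depth, density)
mid_period = period[5]  # value at 25 m
```

For real casts, an equation-of-state library would normally be used to derive density (or the buoyancy frequency directly) from salinity, temperature, and pressure.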
The initial magnitude of CTD and bottle salinity deviations from ambient conditions appears to arise from entrainment plume and bottle flushing characteristics, and these signals decrease over 1-3 min. Smaller deviations arising from internal waves initiated by the CTD/carousel motions might persist for longer periods. Smethie and Buchholtz (1980) showed that, when sampling strong dissolved oxygen gradients, intra-bottle stratification is negligible if the bottles are yo-yoed (to simulate ship motion). Intra-bottle experiments conducted during the Type I experiments under quiescent conditions aboard the USCGC Healy and RV Sikuliaq suggest that intra-bottle stratification can occur, in some cases even after the bottle was left open to equilibrate for up to 180 s. The results also display an increase in salinity differences with increased ambient gradients.
Intuitively, one would expect that bottle flushing characteristics would be improved if care is taken to ensure that bottles are cocked so that they are as open as possible. Our paired bottle experiments on HLY1702 suggest that this is the case (Fig. 14, Table 5).
Salinity was chosen as the indicator variable for this bottle flushing study because a secondary instrument with faster flushing characteristics, the CTD, was available for comparison. While salinity is a good proxy for describing the effects of carousel entrainment and bottle flushing, other variables, such as nutrient and dissolved oxygen gradients, do not always parallel salinity gradients (Codispoti et al. 2005). Thus, what appears to be acceptable flushing with respect to salinity may not always be sufficient for other variables.

Recommendations
Several recommendations follow from these results. Bottle sample collection using a carousel depends on the environmental conditions. When practical, the ship should be allowed to drift moderately downwind, as this allows the carousel to drift out of the entrainment plume (Fig. 15).
According to Weiss' equations (Weiss 1971), each mixing length movement removes 67% of relict water. After moving five mixing lengths, the fraction of water from other depths remaining in the bottle would be $0.33^5$, or only about 0.4%, which may be negligible in most open ocean gradients. Since 10-12 liter Niskin type bottles have a mixing length of about 2 m, and the carousel motion induced by ship roll is often on the order of 2 m, tripping bottles after three complete ship rolls (6 mixing lengths) may often be adequate. We note that Swift (2010) suggests waiting for two ship rolls before tripping, but the number of rolls to wait depends on how strong the rolls are and on the mixing lengths of the bottles that are being employed. Thus, there is no simple "rule of thumb," and the appropriate wait depends strongly on actual conditions. Comparison of downcast CTD, upcast CTD, and bottle salinities is a useful technique for estimating whether bottle flushing and entrainment effects have been appropriately minimized. Waiting for several ship rolls or longer to obtain a bottle sample would smooth the signal a bit, but any biases should be similar to those that are present in historical data. In addition, we presume that these values would often average out to be close to the true ambient value, as suggested by Smethie and Buchholtz (1980). If a ship maintains station location, either naturally or by dynamic positioning, and/or is a large ship in relatively quiescent waters, this study indicates that soak times required to obtain representative samples may exceed 180 s.
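The arithmetic in this paragraph can be reproduced with a short sketch. The 67% removal per mixing length and the 2 m values come from the text; the function names are hypothetical, and treating one complete roll as an upward plus a downward excursion is an assumption made to match the "three rolls, 6 mixing lengths" estimate.

```python
import math

FLUSH_FRACTION = 0.67  # fraction of relict water removed per mixing length


def relict_fraction(mixing_lengths, flush_fraction=FLUSH_FRACTION):
    """Fraction of relict water left after moving a number of mixing lengths."""
    return (1.0 - flush_fraction) ** mixing_lengths


def mixing_lengths_needed(target_relict, flush_fraction=FLUSH_FRACTION):
    """Smallest whole number of mixing lengths that reaches the target fraction."""
    return math.ceil(math.log(target_relict) / math.log(1.0 - flush_fraction))


def mixing_lengths_from_rolls(n_rolls, roll_excursion_m=2.0, mixing_length_m=2.0):
    """Mixing lengths travelled, assuming each roll moves the carousel up and back."""
    return n_rolls * 2.0 * roll_excursion_m / mixing_length_m


after_five = relict_fraction(5)                 # about 0.004, i.e. 0.4%
lengths_for_1pct = mixing_lengths_needed(0.01)  # 5 mixing lengths
three_rolls = mixing_lengths_from_rolls(3)      # 6.0 mixing lengths
```

Changing `roll_excursion_m` or `mixing_length_m` shows how quickly the required number of rolls grows for weak rolls or for bottles with long flushing lengths.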
Regardless of the gradient in salinity or other property of interest, failing to allow sufficient soak time will cause the bottle sample to represent an average of the entrainment plume and ambient water properties rather than a purely ambient estimate. For variables other than salinity whose strong gradients differ significantly from the salinity gradient, samples should be collected over a 3-min period at several depths to assess the soak times needed to meet requirements.
With respect to proper bottle cocking practices, our results (Table 5) suggest that more attention should be given to ensuring that the bottle caps are not restricting the flow of water through the bottles. Finally, for intra-bottle stratification, our data indicate that the value of any given variable may change vertically within the bottle under quiescent conditions and strong gradients. Thus, samples from the bottom and top of a sampling bottle may sometimes need to be averaged.
Future research should include closer investigation of the mechanical and environmental factors that contribute to carousel entrainment and bottle flushing. This includes carousel designs, additional instrument placement/designs, and bottle designs. Weiss' (1971) study shows that bottles designed with large effective a/v ratios are preferred. There are other types of sampling devices that might be worthy of further development, such as the WOCE water sampler (Albro et al. 1990), the PRISTINE sampler (Rijkenberg et al. 2015), and pumping systems (Codispoti et al. 1991). Instruments that measure current speed, such as a Lowered or Shipboard Acoustic Doppler Current Profiler, could help to determine whether samples are collected in quiescent waters or in waters with rapid currents that might shorten the equilibration timescales.
In summary, our recommendations are as follows: (1) ensure the sampling bottle caps are fully cocked prior to deployment, (2) allow the ship to drift with the current while on station, (3) in moderate swells (greater than 1 m), allow at least three swells to pass while flushing the bottle at a given depth/pressure, (4) in quiescent waters, allow the sampling bottle to flush for up to 3 min, and (5) average sample values from the top and bottom of the sampling bottle when in vertical gradients.