MARHYS (MARine HYdrothermal Solutions) Database: A Global Compilation of Marine Hydrothermal Vent Fluid, End Member, and Seawater Compositions

We introduce a database of hydrothermal vent fluid compositions extracted from peer-reviewed publications. The database includes general fluid parameters (e.g., temperature, salinity, and pH) as well as major-, minor-, and trace-element concentrations (including rare earth elements) of dissolved cations, anions, reduced carbon compounds, and gases. In addition, isotopic compositions of elements and molecules are included. Each parameter in the database is given in a uniform unit, which enables direct intercomparison of data from the incorporated sources. The database provides detailed information about geographic location, the date of sampling, and a broad set of supplementary information that enables clear identification of each individual sample and the original data source. This metadata was used to merge compositional data for discrete vent fluid samples that originate from different publications. Hence, the database provides a more complete set of compositional parameters than the original sources do. To facilitate operability, the database is provided as an Excel® sheet, which enables users to visualize, sort, filter, and evaluate the database in a straightforward way. The sample information metadata enables extraction of available vent fluid data for specific regions, tectonic settings, host rock types, etc. The database will be a useful tool in determining (1) mechanisms that set fluid chemistry, as well as (2) regional and global geochemical fluxes across the seabed. To demonstrate the extent of the database, we examine the global distribution of magnesium, chloride, and sodium concentrations in vent fluid samples that are incorporated in the MARHYS Database.

of vents, and provides a wealth of general information on these but no information on the compositions of hydrothermal vent fluids. In 2013, the count of active vent sites was 521, of which only about half were visually confirmed (S. E. Beaulieu et al., 2013). Despite a growing number of sea-going expeditions that focus specifically on sampling submarine hydrothermal vents, a full assessment of vent fluid compositional variability and the underlying control mechanisms has not been achieved. More than one hundred individual publications list compositional parameters of thousands of individual vent fluid samples, and the number of reported vent fluid compositions is steadily increasing. Compilations that focus on specific regions or settings have been presented in the literature to achieve useful comparisons of new and published data (de Ronde & Stucker, 2015; Hannington et al., 2005; Pierre et al., 2018). Yet, there is a growing demand for a compilation of the full set of published vent fluid data, which would enable researchers to examine data within the framework of the global range of vent fluid compositions. We present a first database that provides this service. It holds detailed sample information and a broad spectrum of measured or calculated chemical parameters for focused and/or diffuse hydrothermal vent fluid, end member, and background seawater compositions.

Compilation of Vent Fluid Data
One of the challenges in creating this global database was to acquire additional information about the samples and the circumstances of sample recovery (e.g., description of exact location, time of sampling, sampler type, expedition, geocoordinates, etc.) and to arrange them into a common format for all samples. In each individual publication, these metadata are reported in a unique way. Sample information may be tabulated along with chemical data, be mentioned in the flow-text, provided in appendices, or they are not given at all. A part of our work was to acquire this information and to combine information about the same vent locations (or the same expeditions) given in individual sources. Our aim was to complement regularly lacking sample information to establish an entirely consistent set of sample information for individual vent sites.
The set of measured parameters is highly variable in the publications used in data mining, as they are often tailored to specific questions and research goals. There is no convention on how to report compositional data for vent fluids, and therefore chemical species and units are reported in different formats. Two of the most common compounds for which concentrations are tabulated with different species names in the headers are carbon and sulfide. Dissolved inorganic carbon (DIC) concentrations are sometimes given as CO2, HCO3−, or DIC. Likewise, sulfide concentrations are reported as HS−, H2S, or total sulfide. The use of multiple species names is misleading, and we used uniform labels in the database (ΣCO2, ΣH2S). The record ΣH2S in the database represents total sulfide concentration, just as the record ΣCO2 represents the concentration of DIC.
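The label harmonization described above can be pictured as a simple lookup. The following is a minimal sketch only; the function name, the alias table, and the normalization rules are assumptions made for illustration and do not reproduce the procedure actually used to build the database.

```python
# Hypothetical alias table: reported species names -> uniform database labels.
_ALIASES = {
    "ΣCO2": ["CO2", "HCO3-", "DIC"],
    "ΣH2S": ["HS-", "H2S", "total sulfide"],
}


def _normalize(name):
    """Strip spaces, unify the minus sign, and ignore case."""
    return name.replace(" ", "").replace("\u2212", "-").lower()


# Flat lookup from normalized alias to uniform label.
_LOOKUP = {
    _normalize(alias): label
    for label, aliases in _ALIASES.items()
    for alias in aliases
}


def uniform_label(reported):
    """Return the uniform label; unknown names are returned unchanged."""
    return _LOOKUP.get(_normalize(reported), reported)


print(uniform_label("HCO 3 −"))
print(uniform_label("total sulfide"))
```

Names without an alias (for example "Mg") pass through untouched, so the lookup can be applied to every column header of a source table.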
In terms of units, some sources report compositional data as mass-related quantities (e.g., mmol/kg), while other sources provide volume-related quantities, also called molarities (e.g., mmol/L). When mass-related quantities are reported, it is often not stated whether the values refer to kg of solvent (molalities) or kg of solution. A common problem is that numerous sources that report vent fluid data do not even clearly state the unit they report in. Often values are just reported as "mM" or "µM" without further explanation. By convention, "M" refers to mol per liter of fluid, but in some data sources "mM" or "µM" are explicitly stated to refer to a kilogram of fluid. It therefore remains unclear whether all values reported in the unit "M" are indeed molarities rather than molalities or values per kg of solution. Commonly, rare earth elements (REE) are reported in units of mass (e.g., ng/kg), and volatile concentrations in some publications are reported in units of volume of ideal gas under standard temperature and pressure (e.g., cc STP/L). These different units have their raison d'être and result from different analytical techniques, but they hamper the intercomparability of vent fluid data. Moreover, concentrations in fluids from different hydrothermal settings are highly variable and are hence given with different unit prefixes. For example, authors who investigate dissolved H2 in ultraslow spreading environments report concentrations in mmol/L, while authors who investigate arc systems report concentrations in µmol/L. We homogenized all unit prefixes for the individual compounds according to the most common concentrations in vent fluids. In conclusion, this database provides compositions of vent fluids, end members, and background seawater with standardized species names and units coupled to a consistent set of sample information. The format for all database records (sample information or metadata, species names, and units) is given in Table 1.
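The kinds of conversion involved in such a unit homogenization can be sketched as follows. This is an illustration under stated assumptions, not the conversion routine used for the database: the function names are hypothetical, the gas is treated as ideal with STP taken as 0 °C and 1 atm (22.414 L/mol), and the fluid density needed to go from per-liter to per-kilogram values must be supplied by the user.

```python
# Assumption: STP = 0 °C and 1 atm, ideal gas molar volume in litres per mole.
MOLAR_VOLUME_STP_L = 22.414

# SI prefix factors relative to the base unit (mol).
PREFIX = {"m": 1e-3, "µ": 1e-6, "n": 1e-9}


def convert_prefix(value, from_prefix, to_prefix):
    """Rescale a value between unit prefixes, e.g. µmol/L -> mmol/L."""
    return value * PREFIX[from_prefix] / PREFIX[to_prefix]


def per_litre_to_per_kg_solution(conc_per_litre, density_kg_per_l):
    """Convert an amount per litre of fluid to an amount per kg of fluid."""
    return conc_per_litre / density_kg_per_l


def mass_to_amount(mass_conc, molar_mass_g_per_mol):
    """Convert a mass concentration to an amount, e.g. ng/kg -> nmol/kg."""
    return mass_conc / molar_mass_g_per_mol


def cc_stp_per_l_to_mmol_per_l(cc_stp_per_l):
    """Convert cc STP per litre to mmol per litre for an ideal gas."""
    litres_of_gas = cc_stp_per_l * 1e-3            # 1 cc = 1e-3 L
    moles = litres_of_gas / MOLAR_VOLUME_STP_L     # L / (L/mol) = mol
    return moles * 1e3                             # mol -> mmol


# 500 µmol/L expressed in mmol/L
print(convert_prefix(500.0, "µ", "m"))
```

Converting molarities to per-kilogram values is only as good as the density estimate, which is one reason why the ambiguity of "mM" in the source literature cannot be resolved after the fact.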

General Structure
The database provides three layers of information, which together comprise 151 possible records for each sample entry. The first layer is called "Sample information" and contains 25 records. It provides a wealth of information about the vent location, the time of sampling, the circumstances of sample recovery, and the original data source. The second layer of information is called "Location" and contains three records: the geocoordinates (latitude and longitude) and the depth (below sea level) of the samples/vents. The third layer represents the core of the database and contains some basic fluid parameters and the chemical data (the remaining 123 records).

Sample Information
We use published data from the ASHES vent field on the Juan de Fuca Ridge to give an example of a well-documented sampling campaign and to explain the entries in the database. The "Sample information" section (Table 1) includes a unique sample ID provided by the investigator when the fluid sample was recovered. The "Sample-ID" usually comprises a numbering scheme that is individual for every cruise (e.g., 1721-3, 1721-4, etc.). Whenever possible, the designations from the original datasets were retained in the database. This is not possible in some instances: when sample names are too simple (e.g., A1, A2, etc.), when the sample names in different sources are identical, or when the end member or seawater compositions are just named "end member" or "seawater." In these cases, the name of the vent site and the year of sampling are added, so that the sample IDs remain unique and are given as "Vent site"-"Sample-ID/Vent-ID"-"Year of sampling" in the database. The "Vent-ID" is the name of an individual orifice if a name was given to the sampled orifice (e.g., INF-s, INF-t, etc.). The parameter "Vent site" refers to the small-scale area of venting and represents subareas (e.g., "Inferno," "Mushroom," "Virgin Mound," etc.) of larger vent fields. The "Vent area," in turn, represents the large-scale vent field (e.g., "Ashes") that may contain several vent sites, each with several individual vents. This comprehensive way of classifying the location enables the user to effectively find vent fluid samples for individual vent areas, different vent fields, or even individual vents mentioned in different publications. The location description is complemented by the "Volcanic edifice" record, which can also be used to find multiple samples originating in individual vent fields (e.g., "Axial seamount," to stay with the Ashes example). For mid-oceanic spreading centers, the ridge segment is given as volcanic edifice (e.g., "Endeavor segment" for the Endeavor vent field).
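The naming rule for ambiguous sample IDs can be illustrated with a short sketch. The function, the set of generic names, and the duplicate check are assumptions for illustration; in particular, deciding that a name such as "A1" is "too simple" is a judgment call that is not modeled here, and the year in the usage line is a placeholder, not a real sampling year.

```python
# Hypothetical set of names that never identify a sample on their own.
GENERIC_NAMES = {"end member", "seawater"}


def unique_sample_id(sample_id, vent_site, year, existing_ids):
    """Keep the original ID when it is unambiguous; otherwise build
    "Vent site"-"Sample-ID/Vent-ID"-"Year of sampling"."""
    is_generic = sample_id.lower() in GENERIC_NAMES
    is_duplicate = sample_id in existing_ids
    if not (is_generic or is_duplicate):
        return sample_id
    return f"{vent_site}-{sample_id}-{year}"


# Placeholder year, used only to show the resulting format.
print(unique_sample_id("end member", "Inferno", 2000, set()))
```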
The fields "Rock type" provide reported rock compositions for the area adjacent to the vent field. The possible rock types are "basalt, basaltic andesite, andesite, dacite, rhyolite, peridotite, gabbro, olivine-gabbro, and sediment." The entry "Rock type (primary)" mentions the predominant rock type and always contains exclusively one rock type. The entry "Rock type (secondary)" lists all other rock types reported for the area around the vent field. The "Rock type" records provide information about the most likely type of fluid-rock interactions that affect the vent area. The parameters "Region (small scale)" (e.g., "Endeavor segment") and "Region (large scale)" (e.g., "Juan de Fuca Ridge") are provided to give information about the larger scale tectonic setting of the vent sites and to allocate sites to large scale regions. The record "Tectonic setting" relates all vent sites to their tectonic setting ("Mid-oceanic spreading center," "Back-arc spreading center" [including back-arc rifts] or "Volcanic arc").
A number of records are further dedicated to identifying unique samples and to allocating data from different publications that refer to individual samples. Furthermore, these records help to discriminate samples from the same vents (or vent sites/areas) recovered in different expeditions. The data entries "Expedition," "Vessel," "DSV/ROV," and "Dive no." provide information on the expedition (or project) in which samples were taken, which research vessel was used, and which deep submergence vehicle (DSV) or remotely operated vehicle (ROV) recovered the sample. In addition, the record "Sampler type" provides information about the type of hydrothermal fluid sampler that was used to recover the sample. The sampler type provides information necessary to assess sample quality. For example, concentrations of dissolved gases can be selected for specific sampler types (as some samplers are gas-tight while others are not), or temperature measurements can be discriminated into on-line and separate temperature measurements (again a matter of the sampler used). The entry "Time of sampling" provides information on when the fluid sample was taken. Depending on the source literature, the date may be just the year, the month, or the exact day of sampling.
Six records in the "Sample information" section are dedicated to identifying the original data sources. Two records (primary and secondary) are dedicated to each of "Author," "Year of publication," and "Source ID." The records for authors contain the short form of the corresponding reference (e.g., Author X, Author X & Author Y, or Author X et al.). The source identifier provides DOIs, URIs, or comparable dynamic links to the original sources. Usually, one lead author, year of publication, and Source ID are given for each sample. For a number of samples, different chemical parameters are reported in more than one source. We have merged the data from different sources whenever we could confirm that the data in the different sources refer to one unique vent fluid sample (in total, 217 database entries were merged with other entries). Of two data sources, the secondary position is usually assigned to the source that provides less data or, if both provide an equal amount of data, to the one published later. As for the rock type records, the primary records consist of exclusively one entry, whereas the secondary fields can be composed of multiple authors, years of publication, and source IDs if several samples were merged into one entry. The database contains not only data of original vent fluid samples but also calculated end member compositions and concentrations of background seawater, as well as measured reference materials. The last record of the sample information section is the identifier "Sample type." This identifier marks the nature of the entry. The different identifiers are "HF" for hydrothermal fluid, "EM" for end member composition, "SW" for seawater composition, or "STD" for composition of a reference material.
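The merging of two sources that describe the same sample might look like the sketch below. The dictionary layout, the field names with the "(secondary)" suffix, and both functions are hypothetical; the ordering rule follows the description above (more data first, and on a tie the earlier publication first), and keeping the primary value on conflict is an added assumption. The records in the usage example are invented placeholders, not database entries.

```python
SOURCE_FIELDS = ("Author", "Year of publication", "Source ID")


def count_values(entry):
    """Number of reported (non-missing) data values in an entry."""
    return sum(
        1 for key, value in entry.items()
        if key not in SOURCE_FIELDS and value is not None
    )


def order_sources(a, b):
    """Return (primary, secondary): more data first, then earlier year."""
    key_a = (-count_values(a), a["Year of publication"])
    key_b = (-count_values(b), b["Year of publication"])
    return (a, b) if key_a <= key_b else (b, a)


def merge_entries(a, b):
    """Merge two entries for one sample into a single record."""
    primary, secondary = order_sources(a, b)
    merged = dict(primary)
    for key, value in secondary.items():
        # Fill only gaps; on conflict the primary value is kept (assumption).
        if key not in SOURCE_FIELDS and merged.get(key) is None:
            merged[key] = value
    for field in SOURCE_FIELDS:
        name = f"{field} (secondary)"
        merged[name] = list(primary.get(name, [])) + [secondary.get(field)]
    return merged


# Invented placeholder records for illustration only.
x = {"Author": "Author X et al.", "Year of publication": 2001,
     "Source ID": "doi:x", "Mg": 1.5, "Cl": None, "Li": 0.4}
y = {"Author": "Author Y", "Year of publication": 1999,
     "Source ID": "doi:y", "Mg": 1.6, "Cl": 540.0, "Na": 460.0}
print(merge_entries(x, y)["Author"])
```

Storing the secondary source fields as lists mirrors the statement that secondary fields can hold several authors, years, and source IDs when more than two entries are merged.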

Vent Coordinates
The second layer of information is the "Location" section. Here, the geocoordinates and the depth below sea level of the samples/vents are given. The database provides coordinates in decimal degrees, with westerly (and southerly) coordinates given as negative values and easterly (and northerly) coordinates given as positive values. Whenever possible, the coordinates refer to individual vents; otherwise they refer to vent sites or, in some cases, vent areas. If coordinates were not given in the original source but a detailed map of individual vents was provided, the coordinates were approximated from the printed map. The same was done for the depth of the vents. If not tabulated, the depth was retrieved from the text or read off bathymetric maps.
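The sign convention for coordinates can be sketched as a small conversion from degrees, minutes, and seconds with a hemisphere letter to signed decimal degrees. The function name and signature are assumptions for illustration, and the coordinates in the usage line are arbitrary example values.

```python
def to_signed_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert D/M/S plus hemisphere to signed decimal degrees.

    North and east are positive, south and west are negative.
    """
    value = degrees + minutes / 60.0 + seconds / 3600.0
    letter = hemisphere.upper()
    if letter in ("N", "E"):
        return value
    if letter in ("S", "W"):
        return -value
    raise ValueError(f"unknown hemisphere: {hemisphere!r}")


# Arbitrary example: 10°30' S
print(to_signed_decimal(10, 30, 0, "S"))
```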

Fluid Parameters
The third layer provides some basic fluid parameters (temperature, pH, alkalinity, salinity, TGC [Total Gas Concentration], density) and, most importantly, the entire set of chemical compositions. Apart from the basic fluid parameters, the main part of the database is divided into the categories "Cations," "Rare earth elements & yttrium," "Anions," "Dissolved gases," "Reduced carbon compounds," and "Isotopes" (the entire set of parameters is listed in Table 1). The parameters temperature, pH, and sodium concentration require special mention. Because fluid chemists commonly report fluid temperatures as the maximum measured temperature of a vent (some fluid samplers do not provide on-line temperature measurement), the database includes several temperature records: "T_sample," "T_max," "T_calc," and "T." The entry "T_sample" is given when on-line temperature measurement is available during sampling, "T_max" is given when the temperature was measured separately, and "T_calc" is given when an end member temperature was calculated. The record "T" lists either one of the previously mentioned temperatures or a temperature reported without information on the type of measurement. In rare cases where multiple temperatures are given, "T" preferably contains "T_sample," rather than "T_calc," rather than "T_max" for hydrothermal fluid compositions, and "T_calc," rather than "T_max," rather than "T_meas" for end member compositions. Similarly, the records "pH_meas," "pH_min," "pH_calc," and "pH" provide details on how pH values were derived. "pH_meas" represents measured pH values, whereas "pH_calc" represents calculated pH values for end member compositions. As the calculation of end member pH values is challenging, a number of sources report "pH_min," the minimum pH value measured for fluid samples, and assign this pH value to the end member composition.
The record "pH" represents one of the previously mentioned pH values or is assigned when no information about pH determination is given. If multiple pH determinations are specified, the record "pH" preferably contains "pH_calc," rather than "pH_meas," rather than "pH_min." Sodium concentrations are given either as "Na_meas," "Na_calc," or "Na." "Na_meas" represents measured concentrations, "Na_calc" represents sodium concentrations calculated from a charge balance, and "Na" represents one of the mentioned concentration types or values not specified. Here, in case of multiple Na determinations, "Na" contains "Na_calc" rather than "Na_meas."
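The preference orders for the summary records "T," "pH," and "Na" can be written down compactly. The sketch below assumes a dictionary per entry with hypothetical keys such as "T_sample" or "pH_calc" and a hypothetical "<parameter>_unspecified" key for values reported without information on their determination; it is an illustration of the stated rules, not the code used to fill the database.

```python
# Preference orders as described in the text; keys are (parameter, sample type).
# A sample type of None means the order applies to all sample types.
PREFERENCE = {
    ("T", "HF"): ["T_sample", "T_calc", "T_max"],
    ("T", "EM"): ["T_calc", "T_max", "T_meas"],  # "T_meas" as named in the text
    ("pH", None): ["pH_calc", "pH_meas", "pH_min"],
    ("Na", None): ["Na_calc", "Na_meas"],
}


def summary_value(entry, parameter, sample_type="HF"):
    """Pick the value for the summary record of a parameter."""
    order = PREFERENCE.get((parameter, sample_type))
    if order is None:
        order = PREFERENCE.get((parameter, None), [])
    for name in order:
        value = entry.get(name)
        if value is not None:
            return value
    # Fall back to a value reported without details (hypothetical key).
    return entry.get(f"{parameter}_unspecified")


# Placeholder values for illustration only.
print(summary_value({"T_sample": 300.0, "T_max": 320.0}, "T", "HF"))
```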

Extent of the Database and Some Basic Results
In its current state, the database contains 1,772 entries of hydrothermal fluid compositions, 694 entries of end member compositions, and 120 entries of seawater compositions from 97 vent areas reported in 103 sources. After the initial report of seafloor hydrothermal systems on the Galapagos spreading center in 1977, there has been a continuous increase in investigations covering vent systems all over the world. Especially after the year 2007, the number of investigated vent fluids increased (Figure 1). The global distribution of vent fields incorporated in the database (Figure 2) shows that the database includes vent fields in a range of geotectonic settings, from spreading centers (the Mid-Atlantic Ridge, East Pacific Rise, and Southwest Indian Ridge), through volcanic arcs (Mariana and Kermadec arcs), to back-arc spreading centers and back-arc rifts (in the Scotia Sea back-arc basin, the Manus back-arc basin, the Lau back-arc basin, and the Okinawa Trough back-arc basin). The database hence provides a representative sample size with regard to global hydrothermalism as sampled thus far.
An evaluation of magnesium (Mg), sodium (Na), and chloride (Cl) concentrations provides some interesting general conclusions for global vent processes. The distribution of Mg concentrations in vent fluid compositions incorporated in the database is bimodal (Figure 3a). Hydrothermal fluids issue either as focused flow, with barely any seawater entrained, or highly diluted, with seawater-like Mg concentrations. Across all hydrothermal vent fluid compositions, there is a continuum of Mg concentrations, but the two modes at near-zero and seawater-like Mg contents are most pronounced. This distribution may in part be a sampling artifact. Fluid chemists often aim to sample the most vigorous vents to recover hydrothermal fluids unaffected by entrainment of seawater. On the other hand, sites of diffuse venting where seawater entrainment is most extensive are often sampled because they may allow insights into subseafloor microbial activities. The Mg distribution may also reflect the nature of fluid flow below the vent sites. If hydrothermal activity is extensive, the upflow paths of the hydrothermal fluids become cemented with anhydrite, resulting in substantial focusing of the upflow zone and armoring against entrainment of seawater.
The distribution of Na and Cl provides important constraints on the nature of phase separation and albitization during fluid-rock interaction. The most important process affecting Na and Cl concentrations is phase separation. The distributions of both Na (Figure 3b) and Cl (Figure 3c) resemble a Gaussian normal distribution centered at around seawater concentration. Neither distribution is skewed, which shows that low-salinity fluids occur about as frequently as high-salinity fluids in hydrothermal environments. The difference between the two parameters is that the mean value of Cl concentrations is indistinguishable from seawater concentration, whereas the Na mean value is significantly lower than seawater (see Table 1). The difference between average fluid/end member concentrations and seawater likely reflects the albitizing nature of hydrothermal fluids and leads to the conclusion that, as a global average, hydrothermal fluids are depleted by ∼47 mmol Na per kilogram of vent fluid due to albitization. This evaluation shows how the MARHYS Database can help in quantifying global hydrothermal processes regarding flow regimes, subsurface fluid-rock interactions, or global hydrothermal mass fluxes.

Conclusions
Our database provides a new and important tool for hydrothermal vent researchers to find, compare, and evaluate global vent fluid data. The scientific community will benefit from the MARHYS Database and find a wide range of applications for it. The compilation of vent fluid compositional data enables scientists to evaluate global marine hydrothermal processes and to address a number of individual issues: a detailed evaluation of the chemical compositions in this database may result in new constraints on subsurface fluid-rock interactions, a broader knowledge of catabolic energies available for microbial activities, and, especially, new constraints on the impact of hydrothermal circulation on ocean chemistry. The database enables the assessment of globally and regionally averaged hydrothermal fluid compositions and hence improves calculation of chemical fluxes across the ocean-lithosphere interface. It also allows the computation of compositional or isotopic differences in fluids from three different geotectonic settings and different lithologies. We plan to update the database with future vent fluid data twice a year.

Data Availability Statement
The current version of the MARHYS Database (Diehl & Bach, 2020) can be downloaded from the PANGAEA data center at https://doi.org/10.1594/PANGAEA.921794. Please cite the correct version of the database along with this article. We are aware of the risk that users may extract data from the MARHYS Database and cite only the database instead of the original data sources. We ask users to keep in mind that the original authors who completed sampling campaigns, sample preparation, and chemical analyses did the principal and most essential part of the work. We encourage users of the database to cite the original literature whenever possible. Fluid chemists who wish to submit data, report flaws, or address general inquiries concerning the database may contact the support center at marhys@uni-bremen.de.

Acknowledgments
Bundesministerium für Bildung und Forschung grant no. 03G0263B. We thank Paul Berndt, who digitized some of the datasets. Special thanks to Patrick Monien for an initial check of the database and for useful comments on a number of formal aspects. We further acknowledge valuable contributions of Jun-ichiro Ishibashi and Jeff Seewald during the peer-review process. Their useful comments improved the quality of this manuscript and provided vital impulses for the further development of the database.