This paper demonstrates that a Raman spectroscopy, point-counting technique can be used for phase analysis of minerals commonly found in deep-sea hydrothermal plumes, even for minerals with similar chemical compositions. It also presents our robust autonomous identification algorithm and spectral database, both of which were developed specifically for deep-sea hydrothermal studies. The Raman spectroscopy expert algorithm was developed and tested against multicomponent mixtures of minerals relevant to the deep-sea hydrothermal environment. It is intended for autonomous classification where many spectra must be examined with little or no human involvement to increase analytic precision, accuracy, and data volume or to enable in situ measurements and experimentation.
 For a variety of geological studies, chemical speciation is as important as elemental concentrations. It is particularly important in deep-sea hydrothermal environments where steep gradients in chemistry and temperature define ecological habitats and reactions with seawater. Several studies suggest that scavenging reactions between seawater and hydrothermal plume particulates, composed of polymetallic sulfides and Fe/Mn (oxyhydr)oxides, are rapid enough that they may have a direct effect on global seawater composition [Kadko et al., 1995; German and Von Damm, 2003; SCOR Working Group, 2007]. However, this work relies on ex situ elemental and, for the most part, qualitative mineralogical measurements, which may not fully represent in situ conditions. This may be particularly true for fine-grained particulates [e.g., Navrotsky et al., 2008], which precipitate from hydrothermal plumes, aggregate, and sink to the seafloor to form metalliferous sediment deposits. More detailed investigations of particulate minerals are required to address questions concerning seawater chemistry and the fueling of chemosynthetic activity. Key to answering these questions is developing analytic techniques capable of in situ quantitative phase analysis.
 Raman spectroscopy can also be used for quantitative analysis, but unlike traditional methods for elemental chemistry, absolute peak intensities are too variable for quantitative use, especially for minerals, because peak intensities vary with crystal orientation. Relative chemical proportions can be determined in some circumstances using alternate spectral characteristics including (1) for gases and liquids, the ratio of Raman peak areas [e.g., Wopenka and Pasteris, 1987; Diller and Chang, 1980] and (2) for solids, point counting methods where multiple Raman spectra are used to identify and count mineral species [e.g., Haskin et al., 1997; Wang et al., 2003]. But practical methodologies are still being proven.
 The Raman spectroscopy expert algorithm (RaSEA), presented here, enables quantitative point counting. Whether this translates to quantitative, semiquantitative, or qualitative phase analysis depends on how the measurements are made. We believe it is appropriate to describe a method as quantitative if its accuracy and precision can be determined (e.g., the validation tests described here). In situ applications can also be quantitative if sample geometry is controlled [Breier et al., 2009] and reference standards are used; lacking these, results would be semiquantitative or qualitative, which may be more appropriate in some cases. RaSEA alone is evaluated here, but it is intended as the initial step in an analytic sequence that would provide both relative mineral proportions (i.e., in situ by RaSEA, shore-based by X-ray diffraction) and absolute elemental concentrations (i.e., shore-based by ICP-MS).
Breier et al. report on the other necessary part of this overall approach: an optically compatible, trace metal clean, suspended particle rosette (SUPR) multisampler. This sample collection system is designed to host in situ optical analysis systems, particularly for Raman spectroscopy. It solves the problems of sample geometry and control for in situ analysis of suspended particles by concentrating and trapping them on two-dimensional filters. These filters can be presented to the optical analysis system for as long or as often as needed and in a repeatable manner that allows for a focused beam and a minimal amount of seawater in the optical path. We are currently using this system to collect hydrothermal plume samples for the shore-based part of the analytic sequence just mentioned [Breier et al., 2008]. The ultimate goal is to add initial in situ speciation measurements to this analytic sequence by combining the SUPR system with an appropriate in situ Raman spectroscopy system. RaSEA would be the analytic tool used to process these in situ data.
2. Background: Challenge of Quantitative Phase Analysis
 Quantifying phases in multicomponent samples, particularly samples with little a priori knowledge of the constituents, is challenging regardless of the technique. For Raman spectroscopy, accuracy depends on the ability to acquire identifiable spectra (e.g., spectra with sufficient signal to noise) and factors such as differences in laser/phase interaction. Phase differences in reflectivity, translucence, and Raman scattering produce differences in spectral intensity and can bias the analysis. For quantitative X-ray diffraction (XRD) techniques, accuracy depends on factors such as sample preparation and the crystallinity of the constituents. When particle sizes are too coarse or significantly different between phases, accuracy suffers because of preferential crystal orientation and differential X-ray absorption [Dermatas et al., 2007]. Method-invisible phases (e.g., ionically bonded for Raman, amorphous for XRD) can also bias results. In addition, all quantitative techniques are affected by instrument setup. Raman spectroscopy is sensitive to focusing and changes in the optical path. XRD is sensitive to sample/diffractometer alignment.
 High-accuracy phase analysis typically requires a variety of techniques to refine the result. For example, the winner of the 3rd Reynolds Cup for quantitative analysis of multicomponent (≥10) clay-bearing samples achieved errors of ∼1 weight % per phase, but doing so required a combination of single line and pattern summation XRD techniques with internal standards as well as Rietveld refinement, grain size separation, and elemental, thermal, and oriented sample analysis [Omotoso et al., 2006]. The point is that no single method needs to stand alone. Our goal for RaSEA is a method of quantifiable phase analysis that can, at least in part, be used underwater. The underwater analysis does not need the accuracy of traditional XRD methods because the results will be refined by subsequent shore-based techniques.
Table 1 shows our current list of potentially important deep-sea hydrothermal minerals. We prepared particulate standards for 8 of the most common: anhydrite, pyrite, chalcopyrite, pyrrhotite, sphalerite, hematite, magnetite, and goethite. Solid specimens were crushed and hand-picked to remove secondary phases. Grains were ground and sieved to obtain four size classes: (1) <90 μm, (2) 90–250 μm, (3) 250–500 μm, and (4) 500–710 μm. Each size class was cleaned repeatedly in an ultrasonic bath of distilled water, rinsed with ethanol, air-dried, and stored in sealed glass vials. Qualitative powder XRD, using a Philips APD θ/2θ diffractometer, was performed on the <90 μm size class to confirm mineralogy [Bish and Post, 1989]. A JEOL JXA-733 electron microprobe was used to confirm stoichiometry on the basis of the average of five spot measurements on polished 90–250 μm grain mounts; microprobe mineral standards were used for calibration [Veinot, 1980]. Bulk elemental analysis on fused 90–250 μm particles, using a Jobin Yvon Ultima2 ICP-OES, was used to determine Al, Ba, Ca, Cu, Fe, Mg, Mn, Pb, Si, Zn to μg/g concentration using matrix matched standards [Ingamells, 1970].
 Raman spectra (n = 20 distinct points) were collected from each size class of the dry mineral standards and used to develop RaSEA. Raman spectra were also collected from <90 μm sized particles both wetted with, and immersed in, distilled water. Dry mineral samples were analyzed in 50 μL well plates, wetted samples were placed in Petri dishes, and immersed samples were allowed to settle in glass vials. RaSEA was tested on binary mixtures of pyrite with each of the other seven minerals. These were produced in five different mass ratios: 1:9, 1:3, 1:1, 3:1, and 9:1 using both the <90 and 90–250 μm particles. A seven-component mixture (excluding anhydrite for reasons to be discussed) of 90–250 μm particles was also tested. Each mixture was analyzed using point grid measurements (n = 100 for binary mixtures, n = 500 for the multicomponent mixture). To optimize the probability that each point measurement represented a unique particle, grid spacing was matched to particle size: 100 μm spacing for <90 μm particles and 250 μm spacing for 90–250 μm particles. To estimate precision, four replicate sets of measurements were made on the pyrite and goethite 1:1 mixture.
 We report Raman shifts in wave number (Δcm−1) relative to the incident laser. Raman peak intensities are evaluated on a relative basis (i.e., normalized by the height of the dominant peak). All spectra were collected using a Kaiser Optic Systems, Inc. Raman microprobe equipped with a 532 nm Invictus laser, a spectrograph (0–4400 Δcm−1 range, ∼5 cm−1 effective resolution) set to collect Stokes (red-shifted) spectra, an Andor camera (512 × 2044 active pixels, ∼1 cm−1 pixel mapping), and a motorized stage. The system was calibrated with NIST-calibrated halogen (for intensity) and neon (for wavelength) light sources and a cyclohexane standard (for Raman shift assignments). For dry and wetted particles, three different objective lenses (10X/0.25, 50X/0.75, and 100X/0.9) were used with the focal spot diameter varying from 100 μm to <10 μm respectively; spot size was matched to particle size. The laser power and exposure time were 15 mW and 5 s for all >90 μm particles and 5 mW and 15 s for the <90 μm particles. Ten exposures (cosmic ray filtered) were averaged for each point spectrum. An immersion optic was used to collect spectra from immersed particles using 0.5, 5, and 20 mW laser powers with 200, 20, and 5 s exposures, respectively. A fourth “noisy” spectrum, achieved using white side illumination, was also collected for each standard.
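The two reporting conventions above (Raman shift relative to the incident laser, and intensities normalized by the dominant peak) can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of RaSEA; the only physics assumed is that a wavelength in nm corresponds to a wave number of 1e7/λ in cm−1.

```python
def raman_shift_cm1(laser_nm, scattered_nm):
    """Raman shift (cm^-1) of scattered light relative to the incident laser.

    Positive values are Stokes (red-shifted) lines, as collected in this study.
    """
    # 1 nm = 1e-7 cm, so the wave number in cm^-1 is 1e7 / wavelength_nm.
    return 1e7 / laser_nm - 1e7 / scattered_nm


def normalize_to_dominant_peak(intensities):
    """Scale a spectrum so that its tallest point has a relative intensity of 1."""
    peak = max(intensities)
    if peak <= 0:
        raise ValueError("spectrum has no positive intensity to normalize by")
    return [value / peak for value in intensities]


# Example: light detected at 543.0 nm with the 532 nm laser used here.
shift = raman_shift_cm1(532.0, 543.0)
relative = normalize_to_dominant_peak([2.0, 4.0, 1.0])
```

With a 532 nm laser, the 200–1400 Δcm−1 window that contains most mineral peaks corresponds to scattered light only a few tens of nm to the red of the laser line.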
 Several natural hydrothermal particle samples from 9°50′N East Pacific Rise were also analyzed. Sinking particulates (1–5 mm sized) were collected during a sediment trap deployment. Suspended particulates (>1 μm) were collected by in situ filtration from the neutrally buoyant hydrothermal plume [Breier et al., 2007, 2008]. Visible images of the sinking particulates, taken with a Kodak M863 digital camera and side illumination, were reduced to white, black, and orange colored areas to estimate percent cover. Raman point measurements within these colored regions were used to estimate percentages of the different phases present.
4. Results and Discussion
 XRD, elemental and Raman spectroscopy analyses confirm that the standards are all near single phase except for pyrrhotite, which contains calcite, chalcopyrite, and sphalerite (Table 2). The hematite, goethite, and magnetite particles contain minor amounts of quartz and other (hydr)oxides. For the eight standards, as well as other minerals analyzed from Table 1, the majority of Raman peaks are within the 200–1400 Δcm−1 range (Figure 1).
 For each standard, we determined the variability of each peak and designated one peak each as primary and secondary, and several as tertiary on the basis of the utility of the peaks for phase identification. The measured peak positions are very similar to previous findings [White, 2009; Bouchard and Smith, 2003; Mernagh and Trudu, 1993]. There are some differences, however. Mernagh and Trudu reported intense peaks in natural sphalerite at both 275 and 350 Δcm−1, and a less intense peak in synthetic sphalerite at 300 Δcm−1. In our natural sphalerite, we measured the most intense peak at 298 ± 9 Δcm−1. Such differences are expected and, for minerals, can arise from lattice and interstitial substitutions, partial transitions to dimorphous crystal structures, and differing degrees of crystallinity. This underscores the need to characterize natural peak variability, as we do here by recording peak position standard deviations, as well as means, in the RaSEA database (auxiliary material).
 For the four sulfides investigated, the primary peaks occur between ∼300 and 400 Δcm−1. Of these, only pyrite contains a sulfur-sulfur bond, giving it a distinctive two-peak spectrum [Lutz and Muller, 1991]. The other three contain metal-sulfur bonds only. The chalcopyrite and sphalerite spectra, in particular, are very similar, both dominated by one intense peak at essentially the same location, 292 ± 4 and 298 ± 9 Δcm−1, respectively. Pyrrhotite has a weak and highly variable spectrum (Figure 1). It has two ideal crystal structures and theory predicts that neither should be Raman active [Mernagh and Trudu, 1993]. The most consistent peaks in our pyrrhotite spectra occurred at 471, 377, and 676 Δcm−1, although their irreproducibility precludes confirmation or estimation of standard deviations; these are very similar to the 465 and 373 Δcm−1 peaks reported by Battaglia et al. They may be the result of departures from ideal pyrrhotite structures, which could explain their high variability, but more pyrrhotite samples must be examined to fully characterize these peaks. Pyrrhotite also had the only spectra to differ significantly between wet and dry particles, showing what appear to be sulfate peaks at 980 and 1008 Δcm−1 in the spectra of wet particles.
 All three (hydr)oxides have intense peaks near 400 Δcm−1 and in the range 260–300 Δcm−1 (Figure 1). Hematite also has an intense peak at 1304 ± 23, and magnetite has a peak at 662 ± 10 Δcm−1 that may be intensified by laser resonance. Relative peak intensities are more variable than the sulfides, being affected by hydration in addition to the factors previously mentioned [Bouchard and Smith, 2003]. Studies note that these oxides may be susceptible to laser induced transformations [Bouchard and Smith, 2003; de Faria and Lopes, 2007]. In particular, goethite may dehydrate to hematite as evidenced by a growing peak at 1322 Δcm−1 [de Faria and Lopes, 2007], an effect that we witnessed occasionally, e.g., as evidenced by increased peak variability near 1300 Δcm−1 (Figure 1e).
4.1. Challenge of Automated Spectral Identification
 Peak finding and measurements, essential for spectral identification, are common analytic tasks for which there are well-established methods, but many rely on direct human involvement. Automating these tasks, to increase the spectral processing rate and eliminate bias due to human subjectivity, introduces several challenges. Specifically, these include distinguishing signals from noise, distinguishing peaks of interest from peaks of interference, and accounting for variable baselines. These issues are particularly relevant to the use of Raman spectra for phase analysis, which in this case is based on the uniqueness of spectral patterns. Even small peaks, just above the noise range, can be important. Fluorescence can easily obscure such peaks because Raman scattering is relatively weak: only 1 out of 10^8 incident photons is Raman scattered. Fluorescence also introduces non-Raman peaks and makes the baselines of Raman spectra highly variable, rendering peak height measurements less precise. In addition, since Raman scattering is anisotropic, the relative peak heights within a spectrum are variable and, as a result, overlapping peaks can produce variable peak shapes. Any automated spectral identification algorithm must be robust enough to deal with these factors.
4.2. Raman Spectroscopy Expert Algorithm
 Our approach to using Raman spectroscopy for quantifiable phase analysis relies on point counting and pattern recognition. Point counting offers the possibility of increasing accuracy by increasing the point count. Each point is identified as a single phase. No attempt is made to deconvolute spectra that may represent more than one phase. Instead, the possibility of such an event is minimized by using a laser focal spot that approximates particle size. Pattern recognition, by comparing curve fits limited to the observed range of peak properties, offers the ability to distinguish between similar phases even with partial interferences. No attempt is made to uniquely identify the origins of any specific peak. Consequently, the presence or absence of any individual peak has less impact on the identification. Also, as an expert algorithm, RaSEA requires several user-defined decision making thresholds set by human “result truthing” of trial runs; specific details are in the RaSEA manual. The current thresholds are matched to the spectrometer setup and database compounds discussed in this paper. These parameters may require adjustment for optimal performance under different circumstances, particularly if used for other mineral combinations with more spectral similarities. In such a case the decision making thresholds must be reconfirmed and adjusted by human “result truthing” of new standard mixtures.
 RaSEA uses a continuous wavelet transform (CWT) to locate peaks and estimate their relative size [Du et al., 2006], and two identification steps, peak lookup and constrained curve fitting, to identify compounds. The CWT enables robust peak identification because it allows relative peak sizes, positions, and noise to be estimated in a manner unaffected by variable baselines. The CWT scale factor is proportional to peak area and used as a measure of peak size. The CWT transformation effectively groups peaks by size. We use the average size of the smallest peaks as a measure of noise; peaks of interest are identified when their signal to noise (i.e., size) ratio is greater than a user-defined threshold. The positions of peak maxima and centroids can both be taken directly from the CWT transform. We use the centroid as our primary measure of peak position, but if the difference between the maximum and centroid positions is greater than a second user-defined threshold we record them as separate peaks. This identifies some irregular-shaped peaks that would otherwise be missed.
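The baseline-insensitive peak finding described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RaSEA Matlab code: it assumes a Ricker ("Mexican hat") wavelet, uses the largest coefficient over three hypothetical scales as the peak size (rather than the scale factor itself), and takes the mean size of the smaller half of all candidate peaks as the noise level. The function names, scales, and threshold are all illustrative.

```python
import math


def ricker(scale, half_width):
    """Sampled Ricker wavelet. Its mean is ~0 and it is symmetric, so constant
    and linear baselines contribute almost nothing to the transform."""
    norm = 2.0 / (math.sqrt(3.0 * scale) * math.pi ** 0.25)
    return [
        norm * (1.0 - (t / scale) ** 2) * math.exp(-0.5 * (t / scale) ** 2)
        for t in range(-half_width, half_width + 1)
    ]


def find_peaks_cwt_sketch(signal, scales=(2, 4, 8), snr_threshold=10.0):
    """Return [(index, size, snr), ...] for peaks whose size/noise ratio
    exceeds snr_threshold (a user-defined value, as in the text)."""
    n = len(signal)
    margin = 5 * max(scales)   # skip edges where the widest wavelet does not fit
    best = [0.0] * n           # largest coefficient over all scales at each index
    for scale in scales:
        half = 5 * scale
        wavelet = ricker(scale, half)
        for i in range(margin, n - margin):
            coeff = sum(w * signal[i - half + k] for k, w in enumerate(wavelet))
            best[i] = max(best[i], coeff)
    # Candidate peaks: interior local maxima of the positive response.
    candidates = [
        (i, best[i])
        for i in range(margin + 1, n - margin - 1)
        if best[i] > 0.0 and best[i] > best[i - 1] and best[i] >= best[i + 1]
    ]
    if not candidates:
        return []
    # Noise proxy (assumption): mean size of the smaller half of the candidates.
    sizes = sorted(size for _, size in candidates)
    small = sizes[: max(1, len(sizes) // 2)]
    noise = sum(small) / len(small)
    return [(i, size, size / noise) for i, size in candidates if size / noise > snr_threshold]


# Synthetic test spectrum: sloping baseline, two peaks, and a small ripple as "noise".
spectrum = [
    5.0 + 0.01 * i
    + 10.0 * math.exp(-0.5 * ((i - 120) / 4.0) ** 2)
    + 6.0 * math.exp(-0.5 * ((i - 260) / 4.0) ** 2)
    + 0.05 * math.sin(1.7 * i)
    for i in range(400)
]
peaks = find_peaks_cwt_sketch(spectrum)
```

Because the wavelet integrates to approximately zero, the sloping baseline in the synthetic spectrum does not need to be removed before the peaks are located. The sketch omits the centroid-versus-maximum comparison that RaSEA uses to split irregularly shaped peaks.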
 In the first RaSEA identification step, all sample peaks are compared to database peaks; if a match is found, a relevancy score is calculated. RaSEA scores on the presence of (1) primary peaks, (2) secondary peaks, (3) tertiary peaks, (4) the spacing between primary and secondary peaks, and (5) whether the most intense peak in the sample is a database mineral's primary peak. After scoring all potential matches, if one score is greater than the peak lookup threshold, RaSEA identifies the sample as the match. While this step is useful for reducing the number of potential matches, and qualitative analysis of dissimilar spectra, it is not very accurate at distinguishing similar phases. Therefore the peak lookup threshold is set to a value that produces few direct matches.
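The five scoring criteria of the peak lookup step can be illustrated with the following Python sketch. The weights, tolerances, threshold, and the two database entries ("A" and "B") are hypothetical values chosen for illustration; the actual RaSEA scoring scheme is documented in its manual and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class DbPeak:
    position: float   # mean Raman shift in the database, cm^-1
    tolerance: float  # accepted deviation from the mean, cm^-1


@dataclass
class DbMineral:
    name: str
    primary: DbPeak
    secondary: DbPeak
    tertiary: list = field(default_factory=list)
    spacing_tolerance: float = 3.0  # allowed error in primary-secondary spacing, cm^-1


# Hypothetical weights for the five criteria listed in the text.
WEIGHTS = {"primary": 4.0, "secondary": 2.0, "tertiary": 1.0, "spacing": 2.0, "dominant": 1.0}


def _match(db_peak, positions):
    """Closest sample peak within the database tolerance, or None."""
    hits = [p for p in positions if abs(p - db_peak.position) <= db_peak.tolerance]
    return min(hits, key=lambda p: abs(p - db_peak.position)) if hits else None


def lookup_score(sample_peaks, mineral):
    """Score one database mineral against sample_peaks = [(position, rel_intensity), ...]."""
    positions = [p for p, _ in sample_peaks]
    prim = _match(mineral.primary, positions)
    sec = _match(mineral.secondary, positions)
    score = 0.0
    if prim is not None:
        score += WEIGHTS["primary"]                      # (1) primary peak present
    if sec is not None:
        score += WEIGHTS["secondary"]                    # (2) secondary peak present
    score += WEIGHTS["tertiary"] * sum(                  # (3) tertiary peaks present
        _match(t, positions) is not None for t in mineral.tertiary
    )
    if prim is not None and sec is not None:             # (4) primary-secondary spacing
        expected = mineral.secondary.position - mineral.primary.position
        if abs((sec - prim) - expected) <= mineral.spacing_tolerance:
            score += WEIGHTS["spacing"]
    strongest = max(sample_peaks, key=lambda pk: pk[1])[0]
    if prim is not None and strongest == prim:           # (5) dominant peak is the primary
        score += WEIGHTS["dominant"]
    return score


def peak_lookup(sample_peaks, database, threshold):
    """Return (direct match or None, all scores); unmatched samples go on to curve fitting."""
    scores = {m.name: lookup_score(sample_peaks, m) for m in database}
    best = max(scores, key=scores.get)
    return (best if scores[best] > threshold else None), scores


database = [
    DbMineral("A", DbPeak(300.0, 5.0), DbPeak(350.0, 5.0), [DbPeak(400.0, 5.0)]),
    DbMineral("B", DbPeak(305.0, 5.0), DbPeak(500.0, 5.0)),
]
sample = [(301.0, 1.0), (350.5, 0.4), (401.0, 0.1)]
match, scores = peak_lookup(sample, database, threshold=8.0)
```

Both hypothetical minerals claim the peak at 301 Δcm−1 as their primary, which mirrors the chalcopyrite/sphalerite overlap described earlier; only the secondary, tertiary, and spacing criteria separate them. Raising the threshold makes direct matches rarer and defers more spectra to the curve-fitting step.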
Sobron et al. developed an algorithm, similar to RaSEA's first evaluation step, for qualitative applications during future Mars missions. It filters spectra, removes baselines, and finds peaks using a fast Fourier transform and identifies spectra by matching peak positions, intensities, and widths to a database. The goal of Sobron et al. is to qualitatively evaluate potential samples for biomarkers and specific minerals (e.g., jarosite), prior to collection and subsequent analysis by other techniques. Our goal is the quantitative analysis of mineral mixtures including those with similar spectra. Thus RaSEA, unlike the Sobron et al. algorithm, requires (1) a spectral database with observationally determined ranges for every peak's position, intensity, and width and (2) a second, pattern-matching, evaluation step.
 In the second, more accurate, RaSEA pattern-matching step, constrained curve fits are performed for each possible match. A standard least squares approach is used to fit database peaks to the spectra using mixed Gaussian and Lorentzian peak shapes:
where RI is Raman intensity, f is the Lorentzian shape fraction, RIG is the Gaussian intensity contribution, x is the Raman shift, RS is the peak position, FWHH is the full width half height, and RIL is the Lorentzian intensity contribution. For each possible match, peak position, intensity, and FWHH are constrained to the observed variations recorded in the database. The purpose is not to fit the unknown spectrum exactly but to fit the unknown within allowable spectral variations for the potential matches. RaSEA calculates a composite error for each fit, composed of the root mean square error, the correlation coefficient, the sum of the intensity differences between all sample and match peaks, and the differences in peak position, intensity, and width between the most intense sample peak and the primary database peak. RaSEA identifies the sample as the match with the smallest composite error. This is an inherently flexible approach to pattern-matching that can accommodate a range of spectral variations. More details are in the RaSEA manual, which is included as auxiliary material, along with the Matlab® code and database.
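The constrained fit can be sketched as follows. The peak shape used here is a common pseudo-Voigt form consistent with the variable definitions above, RI = (1 − f)·RIG·exp[−4 ln 2 (x − RS)²/FWHH²] + f·RIL/[1 + 4(x − RS)²/FWHH²]; the constants 4 ln 2 and 4 are assumptions and may not match the RaSEA implementation. The sketch also substitutes a coarse grid search for a proper bounded least squares solver, fits a single peak, and ranks candidates by root mean square error alone rather than by the full composite error. The position bounds are the mean ± 1 standard deviation values quoted earlier for chalcopyrite and sphalerite, treated here as hard limits; the width and intensity bounds are hypothetical.

```python
import math


def pseudo_voigt(x, rs, fwhh, ri_g, ri_l, f):
    """Mixed Gaussian/Lorentzian peak; f is the Lorentzian shape fraction (0..1)."""
    u = (x - rs) / fwhh
    gauss = ri_g * math.exp(-4.0 * math.log(2.0) * u * u)
    lorentz = ri_l / (1.0 + 4.0 * u * u)
    return (1.0 - f) * gauss + f * lorentz


def constrained_fit(xs, ys, pos_bounds, fwhh_bounds, amp_bounds, f=0.5, steps=41):
    """Fit one peak with position, width, and height confined to database bounds.

    Returns (rmse, position, fwhh, amplitude) for the best grid point.
    """
    best = None
    for i in range(steps):
        rs = pos_bounds[0] + (pos_bounds[1] - pos_bounds[0]) * i / (steps - 1)
        for j in range(steps):
            fwhh = fwhh_bounds[0] + (fwhh_bounds[1] - fwhh_bounds[0]) * j / (steps - 1)
            shape = [pseudo_voigt(x, rs, fwhh, 1.0, 1.0, f) for x in xs]
            denom = sum(s * s for s in shape)
            if denom == 0.0:
                continue
            # Closed-form least squares amplitude, then clamped to its allowed range
            # (for a one-parameter quadratic, clamping gives the bounded optimum).
            amp = sum(s * y for s, y in zip(shape, ys)) / denom
            amp = min(max(amp, amp_bounds[0]), amp_bounds[1])
            rmse = math.sqrt(sum((amp * s - y) ** 2 for s, y in zip(shape, ys)) / len(xs))
            if best is None or rmse < best[0]:
                best = (rmse, rs, fwhh, amp)
    return best


# Synthetic unknown: one peak at 303 cm^-1, outside the chalcopyrite position range.
xs = list(range(260, 341))
unknown = [pseudo_voigt(x, 303.0, 12.0, 1.0, 1.0, 0.5) for x in xs]

candidates = {"chalcopyrite": (288.0, 296.0), "sphalerite": (289.0, 307.0)}
fits = {
    name: constrained_fit(xs, unknown, bounds, fwhh_bounds=(8.0, 16.0), amp_bounds=(0.5, 1.5))
    for name, bounds in candidates.items()
}
best_match = min(fits, key=lambda name: fits[name][0])
```

The point of the bounds is visible in the example: the chalcopyrite model cannot move its peak far enough to reproduce the unknown, so its residual stays large, whereas the sphalerite model fits within its allowed range.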
4.3. RaSEA Validation and Testing
 Phase composition is approximated by point count fraction and compared to the actual sample weight %. Single-phase analyses show RaSEA's accuracy is generally very high (Table 3). Accuracy is highest for pyrite and chalcopyrite (99% and 96%, respectively) and good for sphalerite (93%). The misidentification rate is greater, but still low, for the Fe (hydr)oxides because of the variability, and similarities, in their spectra. For pyrrhotite, the accuracy and misidentification rate are poor for reasons previously mentioned. The identification of chalcopyrite, sphalerite, and calcite in the pyrrhotite standard and quartz in the hematite standard appears to be valid on the basis of XRD and elemental analysis. Except for pyrrhotite, and hematite whose identification rate is marginal, accuracy is sufficiently high that error is currently limited by counting statistics, not misidentification rates: for example, the 79 pyrite observations result in a 1σ error estimate of 11% (√79/79); this can be decreased by increasing the count. In addition, except for pyrrhotite and the intentionally “noisy” sphalerite and magnetite exposures, RaSEA correctly identified spectra both over a range of exposure times and under immersed conditions (Figure 2). Compared to the distilled water used in these measurements, the seawater present during in situ analysis will introduce a dissolved sulfate band at 981 Δcm−1; however, this should not affect in situ results since this spectral location is unoccupied by the peaks of these minerals.
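The counting-statistics limit quoted above follows from treating each phase count N as a Poisson variable, so that the 1σ relative error is √N/N = 1/√N. A minimal Python sketch, with hypothetical function names, shows how point identifications translate into fractions with error estimates and how many counts a target error would require.

```python
import math
from collections import Counter


def relative_counting_error(count):
    """1-sigma relative error of a count, assuming Poisson statistics: sqrt(N)/N."""
    if count <= 0:
        raise ValueError("count must be positive")
    return 1.0 / math.sqrt(count)


def point_count_fractions(labels):
    """Map each identified phase to (fraction of points, 1-sigma relative error)."""
    counts = Counter(labels)
    total = len(labels)
    return {
        phase: (n / total, relative_counting_error(n))
        for phase, n in counts.items()
    }


def points_needed(target_relative_error):
    """Smallest count of a phase whose relative error is at or below the target."""
    return math.ceil(1.0 / target_relative_error ** 2)


pyrite_error = relative_counting_error(79)
fractions = point_count_fractions(["pyrite"] * 3 + ["goethite"])
```

Because the error falls only as 1/√N, halving it requires four times as many identified points of that phase, which is the practical cost of detecting minor phases by point counting.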
Table 3. Identification of Mineral Standards With Scoring and Curve Fitting Using the 10X Optic
 Similar accuracies were achieved for the binary and multiphase mixtures (Tables 4 and 5). This demonstrates RaSEA can detect secondary phases down to concentrations of at least 10%, even in multiphase mixtures, and lower if the point count were increased. The accuracy improvement that is achieved through the second pattern-matching step is particularly evident here. With it, the pyrite and chalcopyrite fractions in the 1:1 mixture are estimated to be 52% and 45% respectively; without it they are 21% and 55%. This is the difference between RaSEA and results achievable by peak lookup alone, as in the work by Sobron et al.
Table 4. Identification of 90–250 μm Size Mineral Particulate Mixtures
 Precision was verified by four replicate measurements at separate locations on the surface of the pyrite and goethite 1:1 mixture; the mean and standard deviation for the pyrite and goethite counts were 41 ± 4 and 31 ± 2 respectively. Two matrix effects were observed. First, strong Raman scatterers (e.g., sulfates and carbonates) and large reflectance contrasts can bias counts: laser scattering from particles in focus can produce spectral features from phases outside the focal spot. The pyrite and anhydrite mixtures are an extreme example; pyrrhotite and calcite may be another. For each, the sulfide mineral has a high reflectivity, and the nonsulfide mineral is a much stronger Raman scatterer. Low laser power and a smaller focal spot reduce the effect. Second, some minerals, when present in the same spectrum, result in a combination of peaks that can be misidentified as a third mineral: pyrite and sphalerite mixtures result in a greater chalcopyrite misidentification. A smaller focal spot also reduces this effect (Table 6). However, using a higher-power objective lens for this purpose also reduces the depth of field and increases nondetections and misidentifications, because of increased difficulty in focusing.
Table 6. Identification of <90 μm Size Mineral Particulate Mixtures
 Natural particulate samples provide critical “sea truthing,” showing how real-world specimens differ from our controlled mixtures (Figure 3). The >1 mm size class sediment trap samples illustrate real-world heterogeneity; for such samples a point grid is impractical. One option is to use RaSEA in a targeted point-counting approach, to correlate mineralogy with an image color map. This approach approximates phase composition by area %. For example by surface area, the filtered sediment trap material is 79% dark colored sulfides dominated by chalcopyrite, 11% light colored elemental sulfur, and the remaining 4% has the characteristic color of an Fe oxide but is identified by Raman spectroscopy at different points as magnetite, hematite, pyrrhotite, and a recurring but as yet unidentified phase. These area % compositions can be refined to a weight % by subsequent total mass and elemental measurements (e.g., ICP-MS).
 In the case of suspended particulates, particle size is even smaller, and samples can contain significant fractions of organic material [Toner et al., 2009]. In such samples analyzed to date, intense broadband fluorescence obscured any Raman peaks. For laboratory applications, sample preparation can mitigate the current fluorescence limitations. But for in situ applications, this technical challenge must be overcome to make the method widely applicable. Several instrumental approaches for mitigating, and even eliminating, such fluorescence exist, including use of different laser wavelengths, time-resolved spectroscopy, and anti-Stokes (blue-shifted) Raman spectroscopy.
 Our study demonstrates that Raman spectroscopy can be used to discriminate between many of the major minerals produced as fine-grained particles in deep-sea hydrothermal systems. When used as part of an analytic sequence that includes elemental analysis, this methodology can be used to quantify mineral phases, even on small and/or fine-grained samples and at the μm scale. Laboratory applications are possible now. A variety of in situ applications may be possible in the future, including (1) analysis of stable, and metastable, mineral phases under in situ conditions; (2) long-term monitoring of mineralogical changes in sinking and suspended particulates as part of an ocean observatory; and (3) mapping and exploration of seafloor metalliferous sediment deposits and massive sulfides by autonomous underwater vehicles.
 We thank Meg Tivey and Olivier Rouxel for advice throughout; Lauren Mullineaux, Steve Manganini, Susan Mills, Margaret Sulanowska, Nilanjan Chatterjee, and the RESET and LADDER cruise (NSF OCE-0424953) scientific parties for help gathering samples and technical assistance; and Jill Pasteris, Karl Booksh, and an anonymous reviewer for constructive comments. Support for J.A.B. was provided through a RIDGE 2000 Postdoctoral Fellowship (NSF OCE-0550331).