Developing Tailored Data Combination Strategies to Optimize the SuperCam Classification of Carbonate Phases on Mars

The SuperCam instrument onboard the Mars 2020 Perseverance rover investigates Martian geological targets by combining multiple spectroscopic techniques. As Raman, Visible-Infrared Spectroscopy (VISIR), and Laser-Induced Breakdown Spectroscopy (LIBS) spectra deliver complementary information about the interrogated sample, the multivariate analysis of combined spectroscopic data sets is proposed here as a tool to optimize the capability of SuperCam to discriminate mineral phases on Mars. For this purpose, carbonate phases within the Ca-Mg-Fe ternary system were selected as a representative case study for laboratory investigation. After the characterization of model samples, the discrimination capability of mono analytical Raman, VISIR, and LIBS data sets was evaluated by applying a chemometric approach based on the combination of principal component analysis (for sample clustering) and linear discriminant analysis (for mineral classification). Afterward, the low-level (LL) combination of Raman, VISIR, and LIBS data was achieved by concatenating their spectra into a single data matrix. The mineral classification achieved by LL data sets outperformed that of the mono analytical ones, thus proving the complementarity between molecular and elemental spectroscopic techniques. Mineral classification was further improved by using a mid-level data combination strategy. After evaluating the benefits and limitations of the proposed combination strategies, future developments are outlined. The final objective of this research line is to develop a classification model based on data combination that optimizes the capability of SuperCam to discriminate relevant minerals on Mars, a key requirement for selecting the optimal targets to be cached for the future Mars Sample Return Mission.

at multiple locations of the crater floor (Beyssac et al., 2023; Corpolongo et al., 2023; E. Clavé et al., 2022; Tice et al., 2022; Wiens et al., 2022). The rover is now exploring the edges of a deltaic deposit that has elevated carbonate abundances according to orbital data (Zastrow, 2021). Carbonate-rich units were also detected at the margins of the crater rim (Horgan et al., 2020). Their presence was recently suggested to be related to lacustrine shoreline deposits formed ∼3.8-4 Gya (Noachian epoch; Ramirez & Craddock, 2018) during the closed-basin phase of the lake sequence.
Besides being often associated with the long-term interaction of primary rocks with liquid water, recent works (Grice et al., 2019; Melendez, Grice, & Schwark, 2013; Thiel & Hoppert, 2018) have shown that carbonate concretions can preserve biomarkers from ancient lifeforms at a geological time scale (Melendez, Grice, Trinajstic, et al., 2013). As such, a refined classification of the carbonate detections made by Perseverance is key for one of the main scientific objectives of the Mars 2020 mission, which is to evaluate the habitability of Jezero crater and to search for life markers (Farley et al., 2020). Indeed, the strong astrobiological potential of carbonate-bearing rocks makes them prime candidates for the samples to be cached and stored by the Perseverance rover for the future Mars Sample Return Mission (Muirhead et al., 2020).
Having this in mind, this work seeks to optimize the capability of the SuperCam instrument to correctly classify the carbonate minerals detected at the landing site. As an evolution of the ChemCam system onboard the Curiosity rover (Wiens et al., 2012), SuperCam is a remote multi analytical suite of five co-aligned techniques: Remote Micro-Imaging (RMI), Laser-Induced Breakdown Spectroscopy (LIBS), Time-Resolved Raman and Luminescence (TRR/L), Visible-Infrared Spectroscopy (VISIR), and sound recording (MIC). The instrument is composed of a Body Unit (BU), a Mast Unit (MU) and a Calibration Target (CT), the technical specifications of which are detailed elsewhere (Madariaga et al., 2022; Maurice et al., 2021; Wiens et al., 2021). As SuperCam performs multiple spectroscopic analyses at the same target, the complementarity between elemental and molecular data could be used to optimize the characterization of carbonates, thus improving the mineralogical discrimination achieved by the single techniques.
In other fields of research, e.g., the analysis of polymers (Shameem et al., 2017), pigments (Hoehse et al., 2012), explosives (Moros & Laserna, 2011, 2015), bacteria (Prochazka et al., 2018), and food (Zhao et al., 2020), the exploitation of complementary spectroscopic data was successfully optimized through the application of data combination strategies. In planetary exploration, the combination of spectroscopic data was introduced only recently (Gibbons et al., 2020; Manrique-Martinez et al., 2020; Moros et al., 2018; Rammelkamp et al., 2020). In this field, the few manuscripts published to date focused on low-level combination strategies, which are based on the multivariate analysis of spectra that have been previously concatenated into a single data matrix (Gibbons et al., 2020; Rammelkamp et al., 2020). Advancing this line of research, this work aims to gain a deeper understanding of the advantages that the mid-level combination of multiple spectroscopic data sets could provide for the proper classification of carbonate minerals on Mars.
Knowing that a wide variety of carbonate minerals within the Mg-Fe-Ca ternary system have been detected on Mars (Amador et al., 2018; Horgan et al., 2020), a collection of pure carbonate samples was investigated with a set of laboratory instruments. At first, X-ray diffractometry (XRD) and inductively coupled plasma-optical emission spectrometry (ICP-OES) were used to assess the mineralogical and geochemical composition of the samples. Raman, VISIR and LIBS spectra were then used to build mono analytical data sets. After comparing different methods for multidimensional scaling, the multivariate analysis of spectroscopic data sets was carried out by using a chemometric approach based on the combination of principal component analysis (PCA) and principal component-linear discriminant analysis (PC-LDA). Mono analytical classification results were then compared to those provided by combined data sets generated through the concatenation of VISIR, Raman and LIBS spectra (low-level strategy, LL) and through the combination of their pre-selected spectral parameters (mid-level strategy, ML). After comparing the advantages and disadvantages of using LL and ML data sets for the classification of carbonate minerals, the additional experiments required to develop the data combination approach further are outlined.

Sample Selection and Preparation
This work focuses on the study of aragonite (CaCO₃), calcite (CaCO₃), dolomite (CaMg(CO₃)₂), huntite (Mg₃Ca(CO₃)₄), magnesite (MgCO₃), siderite (FeCO₃) and ankerite (CaFe(CO₃)₂) phases. Most specimens were selected from the Analytical Database of Martian Minerals (ADaMM) library (M. Veneranda et al., 2022), a collection that includes physical samples of all the main mineral phases detected so far on Mars. Further specimens were provided by members of the SuperCam team. Prior to analysis, all samples were powdered using an agate mortar and sieved to obtain a grain size <45 μm. For each specimen, 0.5 g of powder was used for acidic digestion and X-ray diffraction analysis (see Section 3.1). An additional 0.5 g was placed in a cylindrical die, and pellets were formed by applying a pressure of 10 tn/cm² for 10 min.

Analytical Instruments
The mineralogical composition of powdered samples was investigated by XRD. To do so, a D8 Advance (Bruker) system was used in step scan mode by setting a scan range between 2 and 65° 2θ, with a step increment in 2θ of 0.01 and a count time of 0.3 s per step. Mineral identification was then performed with the Bruker DIFFRAC.EVA software and the Powder Diffraction File PDF-4 mineral database (ICDD) (Kabekkodu et al., 2002).
Mineralogical results were complemented by quantitative elemental analyses. Carbonate samples were first digested by following an ultrasound extraction method described elsewhere (Prieto-Taboada et al., 2012). After filtration and acidification with HNO₃, their Ca, Fe, Mg, Mn, Na, K, Sr, Al, Cr, Cu, Zn, and Ti content was determined by ICP-OES (Varian model 725-ES, Agilent). All analyses were performed in triplicate.
Concerning spectroscopic analyses, the scientific outcome of SuperCam was emulated by interrogating sample pellets with a combination of multiple laboratory systems. In detail, LIBS spectra were collected at 3 m distance by SimulCam. Developed at the University of Valladolid, this laboratory emulator of SuperCam is equipped with a frequency-doubled Nd:YAG laser (providing pulses of up to 120 mJ and 6 ns width at 532 nm), a transmission spectrometer and an intensified detector for Raman measurements emulating SuperCam. These measurements can be complemented by LIBS analysis using an intensified echelle spectrometer covering a range between 200 and 980 nm. The light collection system is composed of a 300 mm focal length f/4 refractive objective attached to a filtering stage that removes the Rayleigh scattering of the 532 nm laser. Finally, the collected light is conducted to the spectrometers through a round-to-linear bundle of seven fibers from the same manufacturer as SuperCam's. As explained elsewhere, the hardware components of the laboratory instrument are different from those of SuperCam, so the acquisition parameters needed to be adapted to obtain spectra of comparable (but not equivalent) quality. With this in mind, five spectra per sample were collected (under terrestrial atmospheric pressure) at different spots by accumulating 20 shots per spot, after five preliminary shots used to remove possible surface contamination from the targets. Every shot was recorded with a 2 µs delay between the laser shot and the spectral acquisition, and an acquisition window of 8 µs. Data collection and visualization were performed with the Solis software (Andor).
Raman analyses were carried out with a spectroscopic system assembled in the laboratory and composed of a BWN-532-100-OEM excitation laser (BWTek) emitting at 532 nm with a maximum power of 105 mW, a double track HoloSpec spectrometer (Kaiser) and a Newton detector (Andor) equipped with a CCD cooled to −70°C. The laser was focused on the samples through a BAC100-532E probe (BWTek) equipped with a 20x lens that ensures an analysis spot with a diameter of 100 μm. Compared to the remote investigations performed by SuperCam, the proximity Raman analysis performed in the laboratory provided spectra of higher quality in terms of signal-to-noise ratio (SNR). Using this setup, five analyses per sample were carried out by setting a laser power output of 20 mW, an acquisition time between 5 and 15 s and a number of accumulations varying from 20 to 50. Data collection and visualization were performed with the Solis software (Andor).
VISIR analyses were performed with the ASD High-Resolution FieldSpec 4 spectroradiometer (Analytical Spectral Devices Inc.). Carbonate samples were investigated at 30 cm distance using the ASD probe. An angle of 45° was set between the light source (halogen lamp) and the sample, as well as between the sample and the detector fiber. After instrument calibration (Spectralon® white reference), five spectra per sample were collected in the wavelength range from 350 to 2,400 nm with a mean spectral resolution of 3 nm in the visible region and 6 nm in the near infrared region. Spectra were collected through the RS3 software (ASD Inc.) by accumulating 10 scans per spot. Compared to SuperCam, the commercial instrument employed in this work mainly differs in the range of analysis (350-2,400 nm vs. 530-850 + 1,300-2,600 nm).

Data Processing
To optimize data interpretation and combination, LIBS, Raman and VISIR spectra were submitted to two levels of data pre-processing.

Low-Level Pre-Processing
The low-level pre-processing aimed at improving the overall quality of the spectra by correcting for artifacts associated with the employed analytical systems and by removing potential variations that are uncorrelated with the investigated mineral phases.
In detail, the SpectPro software (Marco Veneranda et al., 2020) was used for baseline correction, smoothing (Savitzky-Golay method) and normalization (to the maximum intensity) of Raman spectra. Furthermore, the instrumental response function of the Raman spectrometer was corrected by the use of standards as presented elsewhere (Sanz-Arranz et al., 2017). Concerning VISIR data, the straight line continuum was removed by following the method described in a previous study (Adrian Jon Brown, 2006). Each spectrum was then converted to apparent absorbance to facilitate data concatenation and spectral parameter extraction (see Section 2.4). As LIBS spectra did not present relevant continuum background or noise, low-level data processing consisted solely of normalizing them to their maximum intensity. After pre-processing, Raman, VISIR and LIBS spectra were concatenated to generate four low-level (LL) combination data sets, namely LL_Raman-VISIR, LL_VISIR-LIBS, LL_LIBS-Raman and LL_Raman-LIBS-VISIR.
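The concatenation step described above can be illustrated with a minimal Python sketch. It is not the routine used in this work: the function names and the random toy arrays are hypothetical, and the sketch assumes that every block has already been pre-processed and that rows refer to the same samples in the same order.

```python
import numpy as np

def normalize_to_max(spectra):
    """Scale each spectrum (row) so that its maximum intensity equals 1."""
    spectra = np.asarray(spectra, dtype=float)
    return spectra / spectra.max(axis=1, keepdims=True)

def low_level_combine(*blocks):
    """Concatenate pre-processed blocks along the variable (column) axis.

    Each block is an (n_samples, n_channels) array; all blocks must list
    the same samples in the same row order.
    """
    blocks = [np.asarray(b, dtype=float) for b in blocks]
    if len({b.shape[0] for b in blocks}) != 1:
        raise ValueError("all blocks must have the same number of rows")
    return np.hstack(blocks)

# Hypothetical toy data: 3 samples with 4 Raman, 5 VISIR and 6 LIBS channels.
rng = np.random.default_rng(0)
raman = normalize_to_max(rng.random((3, 4)) + 0.1)  # normalized to maximum
libs = normalize_to_max(rng.random((3, 6)) + 0.1)   # normalized to maximum
visir = rng.random((3, 5))                          # stands in for apparent absorbance

ll_raman_libs_visir = low_level_combine(raman, libs, visir)
print(ll_raman_libs_visir.shape)  # (3, 15)
```

Each row of the resulting matrix is one combined observation, so the downstream multivariate analysis treats the channels of all techniques as a single set of variables.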

Mid-Level Pre-Processing
After low-level pre-processing, key spectral indicators were selected from Raman, VISIR and LIBS data sets. Concerning Raman data, a dedicated MATLAB routine (version 9.13.0, R2022b; The MathWorks Inc., Natick, Massachusetts, 2022; https://www.mathworks.com) was used to perform automated band fitting with Voigt profiles. Afterward, the wavenumber position and normalized intensity of the main peaks of carbonates near 180, 250, 710 and 1,085 cm⁻¹ were selected as spectral indicators for mid-level data combination. Concerning VISIR data, the asymmetric Gaussian spectral band fitting approach was applied to obtain relevant spectral indicators for carbonate discrimination (Adrian Jon Brown, 2006). Successfully applied in previous works (A. J. Brown et al., 2020; Adrian J Brown et al., 2010), this method was used to fit the characteristic absorption feature of carbonates located near 2,300 nm. The resulting four parameters (centroid λ0, amplitude α, half width at half maximum (HWHM) σ, and asymmetry χ) were used as VISIR spectral indicators. For LIBS data, the normalized intensities of five characteristic emission lines each from Ca (430.2, 445.4, 559.0, 612.4, and 616.4 nm), Mg (382.9, 383.2, 516.7, 517.3, and 518.5 nm) and Fe (404.6, 432.6, 438.4, 440.5, and 495.8 nm) were used as LIBS spectral indicators. In this case, a preliminary evaluation was carried out to select spectral features that do not present self-absorption (this phenomenon being stronger at terrestrial than at Martian atmospheric pressure (Effenberger et al., 2010)) and are not overlapped by the emission of other elements. The spectral indicators selected from VISIR, Raman and LIBS data (see Table 1) were concatenated to generate four mid-level (ML) data sets, namely ML_Raman-VISIR, ML_VISIR-LIBS, ML_LIBS-Raman, and ML_Raman-VISIR-LIBS.
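As an illustration of how a mid-level row could be assembled, the following Python sketch reads the intensity of the listed LIBS lines from a spectrum and concatenates it with Raman and VISIR indicators. The line wavelengths come from the text; the function names, the 0.1 nm search window, the synthetic spectrum, and the Raman and VISIR placeholder values are assumptions made for the example, and the band fitting itself (Voigt and asymmetric Gaussian) is not reproduced.

```python
import numpy as np

# Emission lines (nm) listed in the text for Ca, Mg and Fe.
LIBS_LINES_NM = {
    "Ca": [430.2, 445.4, 559.0, 612.4, 616.4],
    "Mg": [382.9, 383.2, 516.7, 517.3, 518.5],
    "Fe": [404.6, 432.6, 438.4, 440.5, 495.8],
}

def line_intensities(wavelengths, spectrum, lines_nm, half_window_nm=0.1):
    """Return the highest intensity within +/- half_window_nm of each line."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    values = []
    for line in lines_nm:
        mask = np.abs(wavelengths - line) <= half_window_nm
        if not mask.any():
            raise ValueError(f"no channel within {half_window_nm} nm of {line} nm")
        values.append(spectrum[mask].max())
    return np.array(values)

def mid_level_vector(raman_indicators, visir_indicators, libs_indicators):
    """Concatenate the selected indicators into one row of an ML data set."""
    return np.concatenate([np.ravel(raman_indicators),
                           np.ravel(visir_indicators),
                           np.ravel(libs_indicators)])

# Synthetic LIBS spectrum (hypothetical): a single peak at the 430.2 nm Ca line.
wavelengths = np.linspace(200.0, 980.0, 15601)  # 0.05 nm step
spectrum = np.exp(-0.5 * ((wavelengths - 430.2) / 0.05) ** 2)
libs_indicators = np.concatenate(
    [line_intensities(wavelengths, spectrum, lines) for lines in LIBS_LINES_NM.values()]
)

# Placeholder values for illustration only (not measured data):
# four Raman peak positions (cm^-1) followed by four normalized intensities,
# then the four VISIR fit parameters (centroid, amplitude, HWHM, asymmetry).
raman_indicators = [180.0, 250.0, 710.0, 1085.0, 0.1, 0.2, 0.1, 1.0]
visir_indicators = [2300.0, 0.3, 20.0, 1.0]

ml_row = mid_level_vector(raman_indicators, visir_indicators, libs_indicators)
print(ml_row.shape)  # (27,)
```

With 8 Raman, 4 VISIR and 15 LIBS indicators, each sample is described by 27 variables instead of the thousands of channels of the concatenated spectra.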

Multidimensional Scaling (MDS)
For the chemometric analysis of spectroscopic data sets, the clustering capabilities of multiple MDS methods were compared. MDS is a chemometric tool that allows visualizing the similarities among the individual samples composing a data set. As a preliminary step of MDS, quantitative (dis)similarity values among the samples were measured. Among the main methods described elsewhere (Brereton, 2009), the measures evaluated in this work are the Euclidean, Cosine, Correlation, Cityblock and Chebyshev distances. For each method, the expected outcome is a dissimilarity matrix from which the original set of variables is reduced to a new set of primary dimensions that maximize the variance among samples. As such, the variance is explained by reducing the data volume with minimal loss of information. Using the obtained dissimilarity matrices as inputs, MDS returns a scatter plot in which the distance among samples is graphically represented in an abstract Cartesian space. In this work, MDS of the binary data matrices described in Section 2.3 was carried out by using a dedicated MATLAB routine.
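The two steps described above (dissimilarity matrix, then embedding) can be sketched in Python as follows. The dedicated MATLAB routine is not described in the text, so this example assumes classical (Torgerson) MDS, which may differ from the variant actually used; the function names and toy data are hypothetical.

```python
import numpy as np

def pairwise_distances(X, metric="euclidean"):
    """Dissimilarity matrix between the rows of X for the five measures compared."""
    X = np.asarray(X, dtype=float)
    if metric in ("euclidean", "cityblock", "chebyshev"):
        diff = X[:, None, :] - X[None, :, :]
        if metric == "euclidean":
            return np.sqrt((diff ** 2).sum(axis=2))
        if metric == "cityblock":
            return np.abs(diff).sum(axis=2)
        return np.abs(diff).max(axis=2)
    if metric == "correlation":
        X = X - X.mean(axis=1, keepdims=True)  # center each row, then use cosine
    elif metric != "cosine":
        raise ValueError(f"unknown metric: {metric}")
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return np.clip(1.0 - unit @ unit.T, 0.0, None)

def classical_mds(D, n_dims=2):
    """Embed a dissimilarity matrix D with classical (Torgerson) MDS."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims]
    lam = np.clip(eigvals[order], 0.0, None)  # drop small negative eigenvalues
    return eigvecs[:, order] * np.sqrt(lam)

# Hypothetical toy matrix: 6 samples described by 3 variables.
rng = np.random.default_rng(1)
X = rng.random((6, 3))
for metric in ("euclidean", "cosine", "correlation", "cityblock", "chebyshev"):
    coords = classical_mds(pairwise_distances(X, metric), n_dims=2)
    print(metric, coords.shape)
```

For the Euclidean distance, classical MDS yields the same configuration as PCA scores on mean-centered data, which is consistent with the later selection of PCA as the dimensionality reduction method.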

Principal Component Analysis (PCA)
After comparing the results of the different MDS methods described in Section 2.4.1, PCA was selected as the preferred statistical procedure for dimensionality reduction (Brereton, 2009; Bro & Smilde, 2014). Using the data matrices presented in Section 2.3 as inputs, the output of PCA is a set of principal components (PCs; eigenvectors that determine the directions of the new feature space), each of them explaining the variance of the data in decreasing order of magnitude (eigenvalue). PCA models were built using the Unscrambler X software (Camo Analytics). After running several preliminary tests, optimized PCA models were built on mean-centered data. Scree plots were used to define the number of significant PCs needed to effectively explain the spread of the data. PCA score plots and loadings were graphically represented by a dedicated MATLAB routine.
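For readers who want to reproduce the general procedure outside the Unscrambler X software, a minimal Python sketch of PCA on mean-centered data is given below. It uses a singular value decomposition, which is one standard way to obtain scores, loadings and explained variance; the function name and the random toy matrix are hypothetical.

```python
import numpy as np

def pca(X, n_components=None):
    """PCA on mean-centered data via singular value decomposition.

    Returns (scores, loadings, explained_ratio): loadings are the
    eigenvectors (one PC per row) and explained_ratio is the fraction of
    the total variance carried by each PC, in decreasing order.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # mean centering
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2 / (X.shape[0] - 1)       # variance along each PC
    explained_ratio = eigvals / eigvals.sum()
    k = len(s) if n_components is None else n_components
    return U[:, :k] * s[:k], Vt[:k], explained_ratio[:k]

# Hypothetical toy matrix: 20 observations described by 6 variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 6))
scores, loadings, explained_ratio = pca(X)

# The values a scree plot would display, one per PC.
print(np.round(explained_ratio, 3))
```

Plotting `explained_ratio` against the PC index gives the scree plot, and the columns of `scores` are the coordinates shown in score plots.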

Principal Components-Linear Discriminant Analysis (PC-LDA)
Once PCA was performed, the resulting PCs were used as inputs to build a classification model based on linear discriminant analysis (LDA) (Izenman, 2008). LDA is a linear classification method commonly used to study the association between a set of predictors and a categorical response. Knowing the categorical class to which each sample belongs, LDA evaluates the so-called between-class variance by calculating the distance between the centroids of the different classes. Afterward, the within-class variance is calculated by measuring the distance between the centroid and the samples of each class. As a result, a new dimensional space is generated that maximizes the between-class variance while minimizing the within-class one, thus optimizing the discrimination of unknown samples. To verify the reliability of the discrimination model, the 85 sample inputs composing each data matrix (5 spectra for each of the 17 carbonate samples) were divided into a training set (80% of the data, in which the class of each specimen is known) and a test set (20% of the data, in which the class is unknown). In each case, the classification accuracy, sensitivity, and specificity were computed and compared. The NCSS Statistical Software (NCSS, LLC.) was employed to build the PC-LDA models.
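The between-class and within-class reasoning above can be sketched in Python with a basic Fisher discriminant analysis. This is not the NCSS implementation: the nearest-centroid decision rule in the discriminant space is a simplification, the function names are hypothetical, and the synthetic clusters merely stand in for PC scores.

```python
import numpy as np

def fit_lda(X, y):
    """Fisher LDA: directions maximizing between-class over within-class scatter."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        Sw += (Xc - mu).T @ (Xc - mu)          # within-class scatter
        diff = (mu - grand_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)        # between-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1][: len(classes) - 1]
    W = eigvecs[:, order].real
    centroids = np.array([X[y == c].mean(axis=0) @ W for c in classes])
    return W, classes, centroids

def predict_lda(model, X):
    """Assign each row to the class with the nearest centroid (simplified rule)."""
    W, classes, centroids = model
    Z = np.asarray(X, dtype=float) @ W
    dist = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dist.argmin(axis=1)]

def classification_metrics(y_true, y_pred, positive):
    """Overall accuracy plus sensitivity and specificity for one class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    return {"accuracy": float(np.mean(y_true == y_pred)),
            "sensitivity": float(tp / (tp + fn)),
            "specificity": float(tn / (tn + fp))}

# Synthetic stand-in for PC scores: 3 well-separated classes, 4 PCs, 20 rows each.
rng = np.random.default_rng(3)
means = np.array([[0, 0, 0, 0], [10, 0, 0, 0], [0, 10, 0, 0]], dtype=float)
X = np.vstack([m + 0.5 * rng.normal(size=(20, 4)) for m in means])
y = np.repeat(np.array([0, 1, 2]), 20)

# 80/20 split taken within each class so that every class appears in both sets.
test_mask = np.tile(np.arange(20) >= 16, 3)
model = fit_lda(X[~test_mask], y[~test_mask])
y_pred = predict_lda(model, X[test_mask])
print(classification_metrics(y[test_mask], y_pred, positive=1))
```

Splitting within each class is a design choice of this sketch; the text states only the 80/20 proportion, not how the samples were assigned to each set.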

ICP-OES
According to the quantitative results provided by ICP-OES analyses, the Ca, Fe, and Mg contents of most of the investigated samples resemble those of pure carbonate standards. Ankerite 1 can be considered the only exception, as its iron (14.18 ± 2.10 wt%) and magnesium (27.03 ± 2.01 wt%) contents fit with an intermediate member of the dolomite-ankerite solid solution rather than with a pure ankerite endmember (CaFe(CO₃)₂). Minor degrees of Fe-Mg substitution were detected in additional samples, such as siderite 1, 2, and 3 (Mg from 2.32 to 3.66 wt%), magnesite 1 (Fe = 3.71 ± 0.13 wt%), dolomite 3 (Fe = 3.61 ± 0.32 wt%) and dolomite 1 (1.34 ± 0.21 wt%). Furthermore, several samples showed a relevant substitution of magnesium for manganese and iron. This replacement often occurs in natural carbonates and is closely dependent on the chemical environment in which they formed. In detail, siderite 2 was the sample showing the highest content of Mn (9.09 ± 1.20 wt%), followed by siderite 1, siderite 3, ankerite 1 and dolomite 3 (values between 1.41 and 1.91 wt%). Beyond the described cations, additional elements were detected in relevant amounts (between 1.03 and 2.49 wt%), as is the case for potassium (siderite 1, magnesite 1, ankerite 1), sodium (huntite 1 and siderite 3) and strontium (aragonite 1, 2 and 3, and huntite 1). Such detections are consistent with the elemental characterization of natural carbonates presented in previous works (Bishop et al., 2021; Chen et al., 2011; Sengupta et al., 2020; Zwicker et al., 2019) and can be related either to additional cation substitutions in the crystal structure or to the presence of minor mineralogical or chemical impurities.

XRD
Complementary to ICP-OES analyses, the mineralogical composition of the selected carbonates was assessed by XRD ( Figure S1 in Supporting Information S1 .08 2θ) respectively. Although they fit their respective mineral patterns quite well (01-078-2442, 01-083-1764, and 01-084-2067), magnesite, siderite and (above all) ankerite samples displayed different degrees of peak shifting, which is consistent with variations in their unit cell parameters. Considering the elemental compositions described in Section 3.1.1, it can be assumed that these variations are due to partial cationic substitution in their crystal structure. Furthermore, magnesite 1 is the only sample of the carbonate collection in which a mineral impurity was detected (quartz, main peak at 26.92, matching pattern 01-083-2495).

LIBS
Representative LIBS spectra of the seven mineral phases analyzed in this study are shown in Figure 1. All samples placed along the Ca-Mg axis of the ternary system displayed the characteristic emission lines of calcium (including 393.4 and 396.8 nm) and magnesium (336.1, 337.3, and 338.4 nm). Compared to those, iron-rich minerals such as ankerite and siderite additionally display multiple emission lines of Fe in the UV-Vis range (from 390 to 550 nm). The main Fe lines (432.6, 438.4, and 440.5 nm) were also identified in the magnesite 1, dolomite 3, dolomite 1 and dolomite 2 samples (in order of decreasing intensity), thus confirming the partial substitution of Mg by Fe. Note that characteristic lines in the deep UV region (below 280 nm) were discarded to focus on features that were less affected by the instrumental response.
Concerning the detection of additional minor/trace elements, LIBS results were found to be in agreement with ICP-OES results. In detail, the Mn enrichment of dolomite 3, siderite 1, siderite 3 and (above all) siderite 2 was confirmed by detecting the characteristic emission lines at 478.3 and 482.3 nm. Similarly, strontium (407.8 and 421.6 nm) was found in huntite 1 and all aragonite samples, thus fitting the elemental data provided in Table 2. As displayed in Figure 1, most carbonate targets display the potassium doublet at 766.5 and 769.9 nm, confirming that LIBS is highly sensitive toward the detection of some alkali and alkaline earth elements. Despite being below the limit of detection of ICP-OES, aluminum was also detected in most samples (lines at 394.4 and 396.1 nm). Similarly, intense Na lines at 588.9 and 589.5 nm were found in all the collected spectra. The characteristic line of Si (390.5 nm) was additionally detected in magnesite 1 spectra, fitting the identification of quartz impurities by XRD. The same emission line was observed in samples aragonite 1, dolomite 3 and ankerite 1, thus suggesting the presence of quartz (or other silicate minerals) at concentrations below the detection limits of XRD. Additionally, all the samples showed the expected lines for C and O.

Raman
As presented in Figure 2, the representative Raman spectra of Ca-Fe-Mg carbonates are characterized by a main peak near 1,100 cm⁻¹ (ν₁, symmetric stretching vibration of the CO₃²⁻ anion), followed by minor signals near 700 cm⁻¹ (ν₄, in-plane bending vibration), 280 cm⁻¹ (L, lattice libration mode) and 150 cm⁻¹ (T, lattice translational mode) (Bishop et al., 2021). Huntite can be easily identified by the position of the ν₁ peak centered at 1,120 cm⁻¹, while the proper discrimination of the remaining Ca-Fe-Mg carbonates relies on the evaluation of their secondary Raman features. For instance, the ν₁ peaks of calcite, aragonite and siderite were found within a very narrow wavenumber range (between 1,083 and 1,086 cm⁻¹), while their L mode spanned from 205 to 286 cm⁻¹. Similarly, Figure 2 shows the main peak of magnesite, dolomite and ankerite between 1,093 and 1,096 cm⁻¹, while their lattice modes went from 296 to 332 cm⁻¹ (L) and from 179 to 218 cm⁻¹ (T). Varying from 705 to 736 cm⁻¹, the ν₄ mode also provides valuable information for carbonate classification, although it is the least intense of the secondary Raman bands.

VISIR
All the laboratory VISIR spectra displayed the distinctive absorption band of carbonates near 2,350 nm (3ν₃ vibrational mode). As displayed in Figure 3, this spectroscopic feature shows a clear asymmetry due to the contribution of a secondary band centered at 2,300 nm (appearing in the form of a weak shoulder or distinctive peak, depending on the mineral phase). Knowing that the center, asymmetry and intensity values of the 3ν₃ band vary according to the structure and chemistry of the mineral under analysis (A. J. Brown et al., 2020), its detailed characterization is often used to discriminate carbonate minerals. A second absorption band, centered at 2,500 nm (2ν₃ + ν₁ mode), is generally used for the same purpose. However, unlike SuperCam, the commercial spectrometer employed in this work cannot detect this spectroscopic feature. Compared to calcite, aragonite and dolomite, Mg-rich carbonates (huntite and magnesite) mainly differed by the increased intensity of the absorption band found between 1,900 and 2,000 nm. Assuming the samples are not hydrated, this feature can be assigned to the combined 2ν₃ + ν₁ + ν₄ mode of the mineral. Similarly, the most distinctive spectroscopic feature of iron-rich carbonates (ankerite and siderite) is the Fe²⁺ band between 700 and 1,700 nm (the lower slope of ankerite between 1,300 and 1,700 nm is due to its small amount of Fe²⁺ in comparison to siderite). In addition to those, further absorption features of weak intensity can be detected, as is the case for the 3ν₂ + ν₁ mode observed in calcite and dolomite spectra (centered at 1,850 nm) (Bishop et al., 2021).

Preliminary Comparison of MDS Methods
The capability of multiple MDS methods to cluster carbonate minerals according to their spectral features was compared. Following the methods described in Section 2.3, three groups of spectroscopic data sets were prepared: (a) mono analytical (Raman, VISIR and LIBS), (b) low-level combination (LL_Raman-LIBS, LL_LIBS-VISIR, LL_Raman-VISIR and LL_Raman-LIBS-VISIR), and (c) mid-level combination (ML_Raman-LIBS, ML_LIBS-VISIR, ML_Raman-VISIR and ML_Raman-LIBS-VISIR). Afterward, a dedicated MATLAB routine was used to compute, for each MDS method, the dissimilarity among samples on the 10 data matrices. The distribution of samples in the new multidimensional space was graphically represented in the form of score plots. To identify the MDS method ensuring the highest degree of sample clustering and mineral discrimination, several parameters (provided by the custom-made algorithm used for data analysis and representation) were taken into consideration, including (a) the distance between the mean values of mineral clusters, (b) the area covered by the confidence ellipse (95%) of each cluster, and (c) the number of dimensions needed by each method to explain 95% of the total variance in the data sets.
As a representative example, Figure 4 displays the sample distribution returned by each MDS method based on the analysis of the LIBS data set. Overall, the Euclidean and Cosine distances outperformed the Correlation, Cityblock and Chebyshev distances by ensuring an improved discrimination of mineral clusters for all data sets. In detail, both Euclidean and Cosine distances were able to group samples according to their mineral composition. The area covered by the 95% confidence ellipses and the distance between mean cluster values were used to evaluate the capability of each method to separate mineral groups in the new Cartesian space. Although Euclidean and Cosine distances provided similar results in the first three dimensions, the former clustered mineral groups within smaller confidence ellipses while explaining the majority of the spectral variance using a lower number of coordinates (thus ensuring a more effective discrimination of samples at lower dimensions). For this reason, the Euclidean distance was selected as the optimal method for dimensionality reduction.
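The three comparison criteria listed above can be computed as in the following Python sketch. The custom-made algorithm is not described in the text, so the definitions used here are assumptions: the confidence ellipse is derived from the cluster covariance under a bivariate Gaussian model, and the function names and toy values are hypothetical.

```python
import numpy as np

def ellipse_area_95(points_2d):
    """Area of the 95% confidence ellipse of a 2-D cluster (Gaussian model)."""
    cov = np.cov(np.asarray(points_2d, dtype=float), rowvar=False)
    chi2_95 = -2.0 * np.log(0.05)  # 95% quantile of chi-square with 2 dof
    return float(np.pi * chi2_95 * np.sqrt(np.linalg.det(cov)))

def centroid_distances(scores, labels):
    """Euclidean distances between the mean positions of all clusters."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.array([scores[labels == c].mean(axis=0) for c in classes])
    diff = centroids[:, None, :] - centroids[None, :, :]
    return classes, np.sqrt((diff ** 2).sum(axis=2))

def dims_for_variance(explained_ratio, threshold=0.95):
    """Smallest number of dimensions whose cumulative variance reaches threshold."""
    cumulative = np.cumsum(explained_ratio)
    return int(np.searchsorted(cumulative, threshold) + 1)

# Hypothetical toy scores: two clusters of four points in a 2-D score plot.
cluster_a = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
cluster_b = cluster_a + np.array([3.0, 4.0])
scores = np.vstack([cluster_a, cluster_b])
labels = np.array(["a"] * 4 + ["b"] * 4)

classes, distances = centroid_distances(scores, labels)
print(distances[0, 1])                               # separation of the two means
print(ellipse_area_95(cluster_a))                    # ellipse area of cluster a
print(dims_for_variance([0.60, 0.30, 0.08, 0.02]))   # dimensions for 95% variance
```

Under these definitions, a method performs better when the centroid distances are large, the ellipse areas are small, and few dimensions are needed to reach the variance threshold.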

PCA
The PCA method was used for the multivariate analysis of spectral data sets. As the intensity of all spectra included in the matrices was normalized, a constant weight of 1 was applied to all variables. Starting from the analysis of LIBS results, Figures 5a and 5b revealed a clear separation of mineral phases into clusters. Except for calcite-aragonite, each cluster is well separated along principal component (PC) 1 (explaining 61% of the total spectral variance). As shown in Figure 6a, this result expresses the strong negative loading of Ca lines combined with the medium and weak positive loadings of Mg and Fe emissions, respectively. Compared to PC 1, the loading plot of PC 2 (12% of the total variance) is dominated by the strong positive influence of Fe lines, which are responsible for the discrimination of Mg-rich specimens from Fe-rich ones. The inspection of additional dimensions (up to PC 6, thus covering 99% of the variance) did not yield the separation of the aragonite (orthorhombic CaCO₃) and calcite (trigonal CaCO₃) clusters. According to the loadings represented in Figures 6a and 6k, Na and Al strongly influenced PC 2 and PC 3 scores.
Unlike LIBS, Raman PC scores show very small confidence ellipses (95%) for the mineral clusters, which helps the discrimination of mineral groups. As displayed in Figure 6b, the Raman PC 1 loading was mainly influenced by the position of the main peak of carbonates (ν₁). Indeed, low (min value at 1,085 cm⁻¹) and high (max value at 1,095 cm⁻¹) peak positions showed negative and positive loading values, respectively. As a result, PC 1 (Figures 5c and 5d, 46% of the variance) displayed a clear separation of samples into three main clusters: (a) aragonite-calcite-siderite (main peak between 1,084 and 1,086 cm⁻¹), (b) dolomite-ankerite-magnesite (1,094-1,098 cm⁻¹), and (c) huntite (1,120 cm⁻¹).
An effective separation of dolomite, ankerite and magnesite clusters was achieved along the PC 2 score (21% of the variance), whose loading was mostly influenced by the T vibrational mode (300, 298, and 329 cm⁻¹, respectively). Similarly, the PC 3 loading showed that the negative correlation of lattice modes L and T (minimum loading values at 184 and 289 cm⁻¹) plays a key role in the separation of calcite, aragonite and siderite clusters along PC 3 (14% of the variance).
Concerning VISIR results (Figures 5e and 5f), PC 1 (60%) revealed a clear separation of ankerite and siderite from the other carbonates, this being the result of the strong positive correlation expressed by the Fe²⁺ band (from 1,300 to 1,700 nm) on PC 1. As displayed in Figure 6e, the PC 2 loading is mostly affected by the negative correlation of the combined vibrational mode ν₂ + ν₁ + ν₄ (minimum value at 2,285 nm) and the positive correlation of the band centered at 2,345 nm (3ν₃ mode). As a result, the PC 2 score (24%) showed an effective discrimination between calcite, huntite and aragonite-dolomite clusters. The dolomite group was effectively separated from aragonite along PC 4 (not shown, 9%). On the other hand, no PC prevented the magnesite cluster from overlapping the confidence ellipses of calcite, dolomite and huntite. This limitation was mainly caused by the presence of quartz impurities in sample magnesite 1 (see Section 3.1.2). As SiO₂ partially affects the vibrational profile of VISIR spectra, the resulting confidence ellipse of the magnesite cluster was critically enlarged compared to those of the other minerals.

PC-LDA
After evaluating PCA results, the PC values for each mono analytical data set were used to run PC-LDA analysis. The results represented in Table 3 show that, when using four PCs, Raman produced the best model, achieving 100% accuracy, sensitivity, and specificity for both the training and validation sets. This result agrees with the PC scores plotted in Figure 5: an excellent separation between mineral groups was achieved within the first three PC dimensions. At the same time, the variance of each cluster can be circumscribed within a very small confidence ellipse (Figure S2 in Supporting Information S1). The model built on the standalone VISIR data performed the worst, with the lowest figures of merit across the board. An assessment of the discriminant scores reveals that the model misclassified both calcite and magnesite samples as dolomite, indicating poor separation of these carbonate classes based on VISIR spectra alone. The model built on the standalone LIBS data performed slightly better than the VISIR model, but misclassified calcite as aragonite during validation.
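A PC-LDA workflow of this kind can be sketched as below. The code is an illustrative assumption rather than the implementation used in this work: it relies on a textbook linear discriminant with a pooled within-class covariance, and the helper names (`fit_pc_lda`, `predict_pc_lda`, `toy_spectra`) as well as the synthetic single-peak spectra are hypothetical.

```python
import numpy as np

def fit_pc_lda(X, y, n_pcs):
    """Fit PCA on the training spectra, then LDA on the retained PC scores."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:n_pcs].T                                  # channels x n_pcs projection
    T = (X - mean) @ W                                # training PC scores
    classes = np.unique(y)
    mus = np.array([T[y == c].mean(axis=0) for c in classes])
    resid = T - mus[np.searchsorted(classes, y)]      # deviations from own class mean
    cov = resid.T @ resid / (len(T) - len(classes))   # pooled within-class covariance
    priors = np.array([np.mean(y == c) for c in classes])
    return {"mean": mean, "W": W, "classes": classes, "mus": mus,
            "cov_inv": np.linalg.pinv(cov), "priors": priors}

def predict_pc_lda(model, X):
    """Assign each spectrum to the class with the highest linear discriminant score."""
    T = (X - model["mean"]) @ model["W"]
    A = model["mus"] @ model["cov_inv"]               # one row of LDA weights per class
    const = -0.5 * np.sum(A * model["mus"], axis=1) + np.log(model["priors"])
    return model["classes"][np.argmax(T @ A.T + const, axis=1)]

def toy_spectra(rng, centers, n_per_class, n_channels=200):
    """Synthetic single-peak spectra; classes differ only in peak position."""
    x = np.arange(n_channels)
    X = np.vstack([np.exp(-0.5 * ((x - c) / 3.0) ** 2)
                   + rng.normal(0, 0.01, (n_per_class, n_channels))
                   for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y

rng = np.random.default_rng(0)
centers = [80, 95, 110]                               # hypothetical peak positions (channel index)
X_train, y_train = toy_spectra(rng, centers, 10)
X_val, y_val = toy_spectra(rng, centers, 5)
model = fit_pc_lda(X_train, y_train, n_pcs=4)         # four PCs, as in the text
accuracy = np.mean(predict_pc_lda(model, X_val) == y_val)
```

Fitting the PCA projection on the training set only and reusing it for the validation spectra keeps the validation figures of merit independent of the data used to build the model.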

PCA
As for mono analytical matrices, a constant weight of 1 was applied to all variables. PCA score plots of concatenated LIBS-Raman spectra (Figures 7a and 7b) showed an effective discrimination of mineral clusters within the first three PCs. In detail, PC 1 (55%) organizes the samples in three main clusters: (a) Ca endmembers (calcite and aragonite), (b) siderite + polycationic minerals, and (c) Mg endmembers (magnesite) (Figure S2 in Supporting Information S1). As represented in Figure 8a, this is due to the strong coefficient of correlation of Raman ν1 and L modes, combined with the inverse contribution of Ca (negative) and Mg (positive) LIBS emission lines. The PC 2 loading is characterized by the positive and negative contribution of Fe and Ca lines, respectively, which resulted in an effective discrimination of dolomite, huntite, ankerite and siderite clusters along this dimension (23% of the variance). PC 3 loading (9%) showed a strong increase in the coefficient of correlation of Raman L and T modes, which helped discriminate aragonite and calcite polymorphs (see Figure 7b). The separation of CaCO3 clusters is further enhanced along PC 4 (not shown, 8%).
Concerning the LL_LIBS-VISIR data set, the strongest positive loadings on PC 1 (Figure 8b) were the Fe lines detected by LIBS and the main Fe2+ slope detected by VISIR. As a result, Figure 7c shows a clear separation of siderite and ankerite from Mg and Ca minerals along PC 1 (57%) (Figure S2 in Supporting Information S1). The PC 2 score (24%) is mainly affected by the 2ν3 + ν1 + ν4 vibrational mode detected by VISIR (maximum value at 1,962 nm), as well as by Mg (positive loading) and Ca (negative loading) lines detected by LIBS, mainly resulting in an effective separation between calcite and dolomite clusters. Similarly, the positive loading values provided by the 3ν3 mode (maximum value at 2,300 nm) and by K and Mg emission lines allow the discrimination of ankerite and siderite along PC 3 (7%). The effective discrimination of dolomite from aragonite and of magnesite from huntite groups is achieved on PC 4 (4%, not shown).
The PCA score plots obtained from combined LL_Raman-VISIR data are represented in Figures 7e and 7f. As for LL_VISIR-LIBS data, the PC 1 score (41%) separates Fe-minerals from other samples (Figure S2 in Supporting Information S1). However, ankerite and siderite are represented in the negative axis of PC 1 due to the inverted loading contribution (from positive to negative) of the Fe2+ slope detected by VISIR. The contribution of VISIR on PC 2 and PC 3 loading was almost identical to what was measured from the LL_VISIR_LIBS data set (see Figure 8b). However, compared to LIBS, the contribution of Raman variables is negligible in PC 1 and strongly increases on PC 2 and PC 3. As a result, this data set mainly differs from the previous one by the improved discrimination of ankerite and siderite clusters on PC 2 (which is further enhanced on PC 3) and by the effective separation of aragonite and dolomite groups on PC 3 (12%). Once again, the discrimination of the magnesite cluster was only achieved on PC 4 (4%, not shown).
A last test was performed by combining Raman, VISIR and LIBS spectra in a single data set. As represented in Figure 8d, the contribution of VISIR and Raman variables on PC loadings can be considered equivalent to what was observed on Raman-VISIR data set. Concerning LIBS variables, their contribution was negligible on PC 1 while PC 2 showed the positive and negative correlation of Mg and Ca emission lines, respectively. In spite of including LIBS spectra in the data set, the resulting PC 1-2 score plot (Figure 7g) can be considered equivalent to the one represented in Figure 7e. The negligible contribution of LIBS variables was also observed on PC 3, where the positive correlation of Mg and K lines in the loading did not produce any significant change on the PC 1-3 score plot. A comparison of PC loadings proved that LIBS variables play a significant role in mineral clustering starting from PC 4 (5%, not shown).
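The low-level combination itself (concatenating the spectra of each technique into a single data matrix) can be sketched as follows. This is a minimal example under stated assumptions: the rows of every block are assumed to refer to the same samples in the same order, the normalization to unit maximum intensity is an assumed choice (the type of normalization is not specified here), and the function names and random arrays are hypothetical. The channel counts follow those reported for this work (10,000 for LIBS, 2,700 for Raman, 2,400 for VISIR).

```python
import numpy as np

def normalize_rows(X):
    """Scale each spectrum to unit maximum absolute intensity (assumed normalization)."""
    return X / np.abs(X).max(axis=1, keepdims=True)

def low_level_combine(*blocks):
    """Concatenate normalized spectra from several techniques, sample by sample."""
    n = blocks[0].shape[0]
    if any(b.shape[0] != n for b in blocks):
        raise ValueError("all blocks must contain the same samples in the same order")
    return np.hstack([normalize_rows(b) for b in blocks])

# Hypothetical stand-ins for measured spectra (random numbers, 12 samples).
rng = np.random.default_rng(1)
n_samples = 12
libs = rng.random((n_samples, 10000))
raman = rng.random((n_samples, 2700))
visir = rng.random((n_samples, 2400))
ll_all = low_level_combine(libs, raman, visir)    # analogous to a Raman-LIBS-VISIR matrix
ll_pair = low_level_combine(libs, raman)          # analogous to a LIBS-Raman matrix
```

The resulting matrix is then treated exactly like a mono analytical one, so any imbalance in the number of informative channels per technique carries over directly into the PC loadings.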

PC-LDA
The low-level data combination of Raman, VISIR, and LIBS spectra outperformed the mineral classification based on LIBS and VISIR mono analytical data sets. As represented in Table 3, data sets based on the LL combination of LIBS-Raman and Raman-LIBS-VISIR ensured 100% accuracy, sensitivity, and specificity for both training and validation sets. As for LL_VISIR-LIBS, the figures of merit are all above 97.6%, thus ensuring better mineral discrimination than the mono analytical VISIR and LIBS matrices (see Table 3). Concerning the LL_Raman-VISIR data set, the obtained figures of merit are lower than those ensured by Raman data alone (100%), thus proving the strong effect of VISIR variables on mineral clustering (see discussion section).
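For reference, the three figures of merit can be computed per class from the predicted and true labels as sketched below. The one-vs-rest definition used here is an assumption, since the exact formulas and any averaging across classes are not stated in this section; the function name and the toy labels are hypothetical.

```python
import numpy as np

def figures_of_merit(y_true, y_pred):
    """Per-class accuracy, sensitivity and specificity (one-vs-rest, assumed definition)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    out = {}
    for c in np.unique(y_true):
        tp = np.sum((y_true == c) & (y_pred == c))    # class c correctly recognized
        fn = np.sum((y_true == c) & (y_pred != c))    # class c missed
        fp = np.sum((y_true != c) & (y_pred == c))    # other classes assigned to c
        tn = np.sum((y_true != c) & (y_pred != c))    # other classes correctly rejected
        out[str(c)] = {
            "accuracy": (tp + tn) / len(y_true),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return out

# Toy labels: one calcite spectrum is misclassified as aragonite.
y_true = ["calcite", "calcite", "aragonite", "aragonite", "dolomite", "dolomite"]
y_pred = ["aragonite", "calcite", "aragonite", "aragonite", "dolomite", "dolomite"]
fom = figures_of_merit(y_true, y_pred)
```

In this toy case the single error lowers the sensitivity of calcite and the specificity of aragonite, while dolomite is unaffected, which is why the three figures are usually reported together.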

PCA
After combining Raman, VISIR and LIBS indicators, each variable was weighted by dividing it by its own standard deviation. PCA scores and loadings of ML data sets are represented in Figures 9 and 10. Starting from the analysis of the ML_LIBS-Raman data set, Figure 9a shows that all mineral clusters can be effectively discriminated by using only two PCs. This result was achieved thanks to the contrasting effect of L1-5 (Ca,  Figure S3 in Supporting Information S1). Raman variables (e.g., peak position and intensity of ν1 and librational modes) play a key role in separating clusters on PC 3 (11%), resulting in a strong increase in the separation of huntite and magnesite groups. Confidence ellipses showed that siderite is the cluster with the highest degree of data dispersion. This is mainly due to the elemental composition of sample siderite 2, which presents a lower concentration of Fe (due to the cationic substitution with Mn, see Table 2).
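The weighting applied to the mid-level indicators can be sketched as follows. This is a minimal illustration, assuming the indicators are stored as columns of a samples-by-indicators table and that the sample standard deviation (`ddof=1`) is used, which is not specified here; the function name, the column meanings and all numerical values are hypothetical.

```python
import numpy as np

def weight_by_std(X, ddof=1):
    """Divide each indicator (column) by its own standard deviation."""
    std = X.std(axis=0, ddof=ddof)
    if np.any(std == 0):
        raise ValueError("a constant indicator cannot be weighted by its standard deviation")
    return X / std

# Hypothetical indicator table; columns could be, for example:
# [Raman main-peak position (cm^-1), Raman intensity ratio, LIBS line ratio]
X = np.array([[1085.0, 0.10, 5.0],
              [1086.0, 0.12, 4.0],
              [1097.0, 0.30, 1.0],
              [1094.0, 0.25, 0.5]])
Xw = weight_by_std(X)
```

Without this step, indicators expressed in large units (such as peak positions) would dominate the PCA simply because of their numerical scale; after weighting, every indicator enters the analysis with unit variance.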
The score plots represented in Figures 9c and 9d (ML_LIBS-VISIR) are very similar to ML_LIBS-Raman ones, thus demonstrating that the elemental emission lines are more relevant than molecular spectral indicators in cluster separation over PC 1 (62%), PC 2 (22%) and PC 3 (10%) scores. Compared to the PCA results shown in Figures 9a and 9b, the distance between huntite and magnesite clusters decreases on PC 3. At the same time, aragonite and calcite ellipses overlap on both score plots. However, it must be emphasized that the weight of VISIR variables strongly increases in lower-variance PCs, allowing the separation of aragonite and calcite clusters at PC 4 (3%, thus overcoming the limitation of LIBS in discriminating mineral polymorphs) (Figure S3 in Supporting Information S1).
The PCA scores presented in Figures 9e and 9f proved that the ML_Raman-VISIR data set also ensures successful discrimination of carbonate clusters, as it improved the classification over most mono analytical and LL data sets. Besides showing an effective separation of all groups within the first two PCs, this is the model ensuring the highest separation between calcite and aragonite clusters. This result was obtained thanks to the complementarity between Raman and VISIR variables. For instance, R2, R6 and V1 presented a strong positive correlation with PC 1, while R8 and V3 had a strong negative loading value. Similarly, cluster separation on PC 2 was due to the positive correlation of Raman intensity values (R2, R3 and R4) and the negative correlation of VISIR parameters V1 and V2.
Finally, the PCA score plots from the ML_LIBS-VISIR-Raman data set show a similar sample distribution to ML_LIBS-Raman and ML_VISIR_LIBS ones, proving that elemental parameters play a dominant role in the mid-level discrimination of carbonate clusters. Also in this case, the discrimination between dolomite and huntite is enhanced on PC 3 (14%). Compared to ML_VISIR-LIBS, the proper separation of calcite and aragonite over PC 4 (5%) was strongly enhanced, thus proving the critical role played by Raman parameters (especially those related to lattice vibrational modes, R1, R2, R5 and R6) in the discrimination of the two carbonates.

PC-LDA
As represented in Table 5, the mid-level combination of Raman, VISIR and LIBS data outperformed LL and mono analytical data matrices, as it yielded 100% accuracy, sensitivity, and specificity for both the training and validation sets across all combination architectures.

Discussion
The results presented in Section 3.3 demonstrate that mineral discrimination based on the analysis of mono analytical data matrices is effective but displays some limitations. For example, LIBS did not yield the separation of aragonite (orthorhombic CaCO3) and calcite (trigonal CaCO3) clusters, thus confirming that this technique is not suitable for the discrimination of mineral phases that only differ by their crystal structure rather than by their elemental composition (polymorphs). Furthermore, the size of the confidence ellipses is negatively affected by the detection of multiple additional elements. Their presence can be consistent either with elemental impurities in the mineral structures (cationic substitutions related to the chemical environment in which the specimens were formed) or with additional mineral phases in the analyzed target (whose concentration is below the XRD limit of detection). Concerning SuperCam analysis, this result shows that LIBS data can be used to obtain a preliminary evaluation of the purity of the carbonate target under analysis. As with the LIBS results, VISIR-based mineral discrimination was also negatively affected by the detection of mineral impurities, which led to a drastic increase in the size of the mineral cluster, eventually causing it to overlap with other groups. Concerning Raman data, it must be emphasized that the outstanding cluster separation was achieved by using high-quality spectra in which all primary and secondary peaks are detected. This is in contrast with the low SNR of most Raman spectra collected by SuperCam on Mars to date, which limits the detection of weak Raman signals (E. Clavé et al., 2022). This contrast is consistent with the poorer sensitivity of time-resolved remote Raman spectroscopy (compared with proximity systems such as SHERLOC (Bhartia et al., 2021), also on Perseverance, and RLS on ExoMars), and detection is often made more difficult by the analysis of mineralogically heterogeneous targets.
From this work, it can be inferred that the mono analytical discrimination of mineral phases based on PCA and LDA is a promising approach as long as relatively pure carbonates are targeted.
Concerning the multivariate analysis of LL data matrices, the results summarized in Figure 7 and Table 4 proved the PCA and PC-LDA analysis of combined spectra improves the mineral discrimination over mono analytical models. This is in line with the results provided in previous works (Gibbons et al., 2020;Rammelkamp et al., 2020), confirming the complementarity of the analytical information derived from the combined use of different spectroscopic techniques. However, it must be underlined that, when performing the LL data combination of multiple spectroscopic data sets, the weight that each technique has on sample grouping can be markedly different. In this case, VISIR data proved to play a dominant role in all data sets. This can be explained by the fact that very broad bands characterize VISIR spectra, so that hundreds of input channels (variables) are needed to describe the influence of a single vibrational mode. In contrast, Raman peaks and (above all) LIBS emission lines are very narrow, so their influence in the PC loadings is comprised of very few variables. In the case of LIBS, this limitation is partially compensated by the higher number of variables that are generally used to describe the spectra (in this work, 10,000 for LIBS, 2,700 for Raman, 2,400 for VISIR). Looking ahead, the balance between techniques may be improved by using chemometric methods alternative to PCA and LDA. For instance, future works should evaluate the potential use of ICA (independent component analysis (Forni et al., 2013)), cluster analysis (Bedford et al., 2022;Rammelkamp et al., 2020), modified PLS-DA (partial least squares-discriminant analysis (Ollila et al., 2012)) and LASSO (least absolute shrinkage and selection operator (Gasda et al., 2021)), since they have been successfully applied to the analysis of spectroscopic data collected on Mars in previous rover missions. 
To optimize the extraction of the information while ensuring a comparable weight of their variables, the use of different chemometric tools for each spectroscopic technique can also be taken into consideration.
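The imbalance discussed above, in which a broad band spread over many channels outweighs a narrow line in the PC loadings, can be quantified with a simple diagnostic such as the one sketched below. This is an illustrative assumption rather than a method used in this work: the helper `block_loading_share` is hypothetical and the data are synthetic, with one "broad" feature occupying 300 channels and one "narrow" feature occupying 3 channels, both varying with comparable amplitude.

```python
import numpy as np

def block_loading_share(X, block_sizes, n_pcs=3):
    """Fraction of each PC's squared loading carried by each block of columns."""
    if sum(block_sizes) != X.shape[1]:
        raise ValueError("block sizes must add up to the number of columns")
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    sq = Vt[:n_pcs] ** 2                      # each row sums to 1 (unit-norm loadings)
    edges = np.cumsum([0] + list(block_sizes))
    return np.array([[row[a:b].sum() for a, b in zip(edges[:-1], edges[1:])]
                     for row in sq])

# Synthetic example: two independent sources of variation with similar amplitude,
# one expressed over 300 channels (broad band), one over 3 channels (narrow line).
rng = np.random.default_rng(0)
n = 15
t_broad, t_narrow = rng.normal(size=(2, n))
broad = np.outer(t_broad, np.ones(300))
narrow = np.outer(t_narrow, np.ones(3))
X = np.hstack([broad, narrow]) + rng.normal(0, 0.01, (n, 303))
share = block_loading_share(X, [300, 3], n_pcs=2)   # rows: PCs, columns: blocks
```

In this toy case PC 1 is carried almost entirely by the broad block even though both features vary by a similar amount per channel, which mirrors the dominance of VISIR variables reported for the LL data sets. Applied to real concatenated matrices, the same table would indicate which technique drives each PC.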
Beyond the results obtained from the chemometric analysis of LL data matrices, this is the first work comparing low-level and mid-level combination strategies applied to the analysis of multiple spectroscopic data sets obtained from mineral phases. As shown in Section 3.5.2, the mineral discrimination ensured by the PC-LDA of combined spectral indicators (ML) outperformed LL and mono analytical results by ensuring 100% accuracy, sensitivity and specificity for all ML data sets. The improved classification can be due to the selection of spectral parameters that: (a) removes the contribution provided by noise or additional spurious signals (overfitting) that are retained by the full spectral traces included in the low-level fusion condition, and (b) avoids the spectral contribution of elemental impurities (for LIBS) and/or additional minerals (for VISIR and Raman) eventually present in the analyzed target. Although this approach enhances the discrimination of pure carbonate targets, the selection of limited spectral parameters makes it insensitive to the additional mineral phases present in the investigated target.
Looking at the potential application of ML data combination to SuperCam data, the following aspects need to be taken into account: (a) the use of spectral parameters related to low intensity Raman peaks (e.g., the ν4 mode) should be avoided, since they are rarely detected on Mars, (b) the number of VISIR variables should be increased by adding the band parameter of the second characteristic signal of carbonates around 2,500 nm (which could not be detected by the VISIR instrument employed in this work, see Figure 3) and (c) the potential advantages and disadvantages derived from the use of semi-quantitative elemental values (from the major-oxide composition tables, MOC) as alternative LIBS indicators should be considered. Alternatively, the use of multivariate analysis for the automated feature extraction could be tested (e.g., PCA or Random Forest) (Pan et al., 2019).
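As an illustration of how a spectral indicator of this kind could be extracted, the sketch below returns the position and intensity of the strongest point inside a given spectral window. It is a simplified assumption (no baseline correction or peak fitting), not the extraction procedure used in this work; the function name, the window limits and the synthetic spectrum are hypothetical.

```python
import numpy as np

def peak_indicator(axis, spectrum, lo, hi):
    """Return (position, intensity) of the maximum of `spectrum` within [lo, hi]."""
    mask = (axis >= lo) & (axis <= hi)
    if not mask.any():
        raise ValueError("the window does not overlap the spectral axis")
    idx = np.argmax(spectrum[mask])
    return axis[mask][idx], spectrum[mask][idx]

# Synthetic Raman-like spectrum: a strong band near 1,086 cm^-1 and a weaker low-wavenumber band.
shift = np.arange(100.0, 1300.0, 1.0)
spectrum = (np.exp(-0.5 * ((shift - 1086.0) / 3.0) ** 2)
            + 0.3 * np.exp(-0.5 * ((shift - 282.0) / 5.0) ** 2))

# Window chosen to span the main carbonate peak positions discussed in the text.
position, intensity = peak_indicator(shift, spectrum, 1070.0, 1130.0)
```

Restricting the search to a window around an intense band is one way to avoid depending on weak peaks that may not be detected in low-SNR spectra.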
In light of the above, SuperCam laboratory models (operating in Los Alamos, USA and Toulouse, France) should be used in future work, these being the only instruments that perfectly replicate the scientific outcome of the flight model operating on Mars.

Conclusions
The purpose of this study was to understand the advantages provided by low- and mid-level data combination strategies in the discrimination of carbonate minerals that the SuperCam instrument onboard the Mars 2020/Perseverance rover is detecting at Jezero Crater.
To do so, a preliminary analytical work has been carried out in which a collection of carbonate samples from the Ca-Mg-Fe ternary system was investigated by using a set of laboratory instruments. After assessing their geochemical and mineralogical composition by ICP-OES and XRD analyses, Raman, LIBS and VISIR spectra were used to build mono analytical and combined data sets that were eventually analyzed by a multivariate analysis approach based on PCA (for unsupervised clustering) and PC-LDA (for supervised classification).
As described in Section 3.3, the multivariate analysis of mono analytical data sets highlighted certain limitations. Among them, the discrimination of mineral phases that only differ by their mineral structure (as is the case for aragonite and calcite) cannot be achieved by LIBS. At the same time, the reliability of the VISIR model proved to be critically affected by the presence of mineral impurities (as is the case for magnesite 1). Furthermore, although it returned 100% accuracy, sensitivity, and specificity in mineral classification, the reliability of the Raman model proved to be strongly dependent on the contribution of weak vibrational modes, whose detection on Mars is often challenging.
Although the low-level combination of spectroscopic data sets helped improve carbonate discrimination, the different analytical techniques proved to have markedly different contributions to PC scores and eventually to sample clustering, this being related to the different widths of LIBS lines, Raman peaks, and VISIR bands. As such, VISIR variables were found to dominate the loadings of PCs 1, 2 and 3, resulting in a poor classification of magnesite by the models based on LL_Raman-VISIR, LL_VISIR_LIBS, and LL_Raman-LIBS-VISIR data sets.
By running multivariate analysis on pre-selected parameters, the mid-level combination of spectroscopic data helped balance the contribution of Raman, VISIR and LIBS techniques, while enhancing their classification capability. By ruling out the negative contributions that the elemental or molecular impurities introduce to the classification models, PC-LDA results presented in Table 5 showed that all ML data sets achieved a 100% discrimination accuracy.
As such, this work proved the spectroscopic classification of carbonate minerals can be further optimized by the combination of spectroscopic parameters selected from LIBS, VISIR, and Raman data. Looking at the potential application of ML combination strategies to the interpretation of SuperCam spectra collected at Jezero crater, more solid classification methods need to be built by (a) expanding the number of samples to be used to test and validate the models (including mineral mixtures, and analysis of materials of different grain size), (b) using SuperCam laboratory models to obtain spectra qualitatively equivalent to those currently collected on Mars by the flight model and (c) testing complementary chemometric tools for data reduction and mineral discrimination. The refined model would finally need to be tested through the analysis of real terrestrial analogs of geological targets investigated on Mars.
In light of the further development of this research line, the final objective is to provide the SuperCam team with a solid classification model to improve the mineralogical characterization of rocks on Mars, which will eventually help in the selection of the targets to be cached for the future Mars Sample Return Mission. In a broader perspective, the proposed data combination approach can also find reliable applications to current and future planetary missions foreseeing the analysis of targets by multiple spectroscopic techniques (e.g., ESA/ExoMars mission).
SNR = Signal-to-noise ratio
TRR/L = Time-resolved Raman and luminescence
VISIR = Visible-infrared spectroscopy
XRD = X-ray diffractometry