Investigating the impacts of solid phase extraction on dissolved organic matter optical signatures and the pairing with high‐resolution mass spectrometry data across a freshwater stream network

Advancing our understanding of dissolved organic matter (DOM) chemistry in aquatic systems necessitates the integration of data streams from multiple analytical platforms. Some measurements require pretreatment with solid phase extraction (SPE), while others are performed directly on whole water samples. Evidence has suggested that SPE is biased against select DOM fractions, raising concerns over the ability to establish data linkages across platforms with variable needs for SPE pretreatment, such as those from optical measurements and those that provide high‐resolution molecular information. Here, we directly addressed this concern by assessing the impact of SPE on DOM optical properties through excitation–emission matrices with parallel factor analysis (PARAFAC) for 47 samples across a stream network within a single watershed reflective of variable DOM sources. PARAFAC data were further paired with molecular information obtained by Fourier transform ion cyclotron resonance mass spectrometry (FTICR‐MS). A comparison of PARAFAC models first revealed no systematic qualitative differences in major components between whole water DOM and DOM isolated by SPE (SPE‐DOM); however, quantitative biases against select components were observed. Further linkages with FTICR‐MS data revealed that the molecular fingerprint associated with each PARAFAC component was consistent between the whole water DOM and SPE‐DOM. Our results suggest that bulk scale linkages across these analytical platforms could be inferred irrespective of the observed quantitative biases resulting from SPE for samples within this example watershed. This work represents a key step toward the systematic evaluation of linkages between optical and high‐resolution mass spectrometry datasets in freshwater lotic environments.

Dissolved organic matter (DOM) is an important vessel for the transport of carbon and nutrients between terrestrial, aquatic, and coastal systems and, thus, an important component of biogeochemical cycles. DOM is a heterogeneous collection of compounds that is important for the transport of metals and pollutants (Yamashita and Jaffé 2008; McIntyre and Guéguen 2013), light attenuation and primary productivity (Zhang et al. 2007; Cory et al. 2015; Creed et al. 2018), and microbially mediated processes (Liu and Wang 2022). The chemical composition of DOM plays a key role in its reactivity, and many studies have begun to recognize the importance of connecting DOM compositional data collected from multiple analytical platforms (Nebbioso and Piccolo 2013; Minor et al. 2014).
Cross-platform comparisons of DOM composition are challenging because analytical windows vary across methodologies. In addition, variation in sample preparation procedures across platforms may selectively alter DOM composition prior to analysis. Absorbance and fluorescence spectroscopy, for example, target the optically active fraction of DOM and are generally performed on the original sample matrix (e.g., in-lab measurements, in situ sensors) (Fellman et al. 2010). Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS) is another powerful tool capable of resolving thousands of individual molecular formulae in complex DOM mixtures. In this case, direct injection analyses require desalting and extraction of DOM from the original sample matrix (Kujawinski 2002), which can introduce potential biases in the types of DOM identified (Li et al. 2016, 2017; Bahureksa et al. 2021; Nelson et al. 2022).
Solid phase extraction (SPE) is the most commonly used protocol for desalting and isolating DOM from environmental matrices (Minor et al. 2014). Of the many sorbents available, cartridges that use a styrene-divinylbenzene polymer (such as PPL) have become a primary means of extracting DOM because of their high affinity for a range of moderately polar to nonpolar compounds (Dittmar et al. 2008). PPL also provides relatively high recovery of dissolved organic carbon (DOC), the quantifiable fraction of DOM, typically within a range of 20-80% (Dittmar et al. 2008; Li et al. 2017). This range in recoveries is primarily driven by sample source, with freshwater samples reported to have higher recoveries than marine samples (Dittmar et al. 2008; Minor et al. 2012; Lechtenfeld et al. 2014; Perminova et al. 2014; Swenson et al. 2014).
Recent studies have suggested that DOC losses during SPE with PPL cartridges result from selective biases against certain DOM fractions. For example, investigations have shown decreased absorbance spectral slopes and homogenization of terrestrial-like signatures in SPE extracts (Arellano et al. 2018; Wünsch et al. 2018b). A recent study has also indicated decreased recovery of aromatic compounds following SPE (Chen et al. 2016a). However, the extent to which DOM signals in SPE extracts can be taken as representative of their original whole water counterparts varies across studies (Andrew et al. 2016; Wünsch et al. 2018b). By contrast, recent evidence has indicated that shifts in terrestrial-like signals are less important for FTICR-MS analyses because the biased fractions would have comparatively little impact on the collected mass spectra even if present (Raeke et al. 2016). It is important to note that the ionization potential of the various compound classes affected by SPE, and their influence on high-resolution data analyses, remain debated (Han et al. 2021; Qi et al. 2022).
The coupling of optical and FTICR-MS datasets has proven to be of strong interest to the research community (Herzsprung et al. 2012; Stubbins et al. 2014; Kellerman et al. 2015; Wagner et al. 2015; Timko et al. 2015a; Wünsch et al. 2018a). Such pairings would significantly increase the information content of optical data, which are much easier and faster to obtain than FTICR-MS measurements (Zherebker et al. 2020). Successful pairings could also enable better use of datasets from in situ optical sensor networks by linking them to molecular information that can be extrapolated across spatiotemporal scales.
The biases in DOM optical signatures caused by SPE have raised concerns over the interpretation of paired optical and FTICR-MS datasets (Wünsch et al. 2018b). These concerns have yet to be adequately addressed in the current literature. In this manuscript, we address this primary concern by answering the following questions: (1) How does SPE impact bulk optical signatures at a watershed scale with expected variability in DOM source and optical character? (2) Does SPE impede the ability to pair optical data with molecular information obtained from FTICR-MS datasets? To address these questions, we characterized the optical behavior of DOM in whole water and SPE extracts for samples collected throughout a large watershed of varying land cover (and thus DOM sources). Optical data were then compared with FTICR-MS data. We hypothesized that SPE would have a considerable impact on optical signatures, as observed in previous studies (Arellano et al. 2018; Wünsch et al. 2018b). We further hypothesized that these systematic biases would hinder the adequate pairing of optical and FTICR-MS datasets. The results from this study provide a key step in the evaluation of cross-platform linkages between complex DOM datasets, with implications for the interpretation of DOM chemistries across scales.

Sample collection and in-field processing
Forty-seven stream and river water samples were collected throughout the Yakima River Basin during the week of 30 August to 03 September 2021 (Supporting Information Fig. S1; Supporting Information Table S1) (Grieger et al. 2022). The samples represent an expected gradient of DOM sources ranging from forested headwaters to more agriculturally impacted and shrubland-dominated vegetation in the lower portion of the basin (Supporting Information Fig. S1). For each sample, 1 L of river water was collected and filtered on-site through sterile 0.2 μm PES filters (Sterivex) into acid-cleaned 1 L high-density polyethylene bottles for DOM characterization. A small aliquot of filtered sample was also collected in 40 mL glass vials for DOC measurements. Samples were kept on ice until returned to the laboratory and then stored at 4°C until further analysis.

SPE and extraction recovery
Within 1 week of collection, DOM was isolated from water samples via the SPE technique described by Dittmar et al. (2008). Briefly, the 1 L filtered samples were acidified to pH 2 with concentrated hydrochloric acid. Bond Elut PPL cartridges (1 g, 6 mL) were conditioned with 12 mL methanol followed by 18 mL pH 2 Milli-Q water. Samples were then loaded onto the cartridge, followed by an additional rinse with 12 mL pH 2 Milli-Q water to remove salts. Finally, the cartridges were vacuum dried, and samples were eluted with 6 mL methanol. Sample methanol extracts were stored at −80°C until further analysis.
To assess extraction efficiency for samples across the basin, a small aliquot of each methanol extract was evaporated with a CentriVap (1700 RPM at 30°C), and the remaining residue was redissolved in Milli-Q water and sonicated for 15 min to ensure complete dissolution. These samples were stored at 4°C and analyzed for DOC (herein referred to as SPE-DOC) and chemical composition (herein referred to as SPE-DOM). SPE-DOC and whole water DOC concentrations were then compared, and the percent DOC recovery for each sample was calculated.
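The percent recovery calculation can be illustrated with a minimal Python sketch. It assumes, as a labeled simplification, that the SPE-DOC measured in the redissolved aliquot is scaled back to the whole eluate and compared with the DOC mass originally loaded onto the cartridge. The function name, the aliquot and redissolution volumes, and the example inputs are hypothetical and are not values reported in this study.

```python
def percent_doc_recovery(
    doc_whole_mg_l: float,
    doc_redissolved_mg_l: float,
    v_sample_l: float,
    v_eluate_ml: float,
    v_aliquot_ml: float,
    v_redissolved_ml: float,
) -> float:
    """Percent of whole water DOC recovered by SPE (hypothetical helper).

    Assumes an aliquot (v_aliquot_ml) of the methanol eluate (v_eluate_ml) was
    dried and redissolved in v_redissolved_ml of Milli-Q water for DOC analysis.
    """
    # DOC mass (mg) measured in the redissolved aliquot
    mass_aliquot_mg = doc_redissolved_mg_l * v_redissolved_ml / 1000.0
    # Scale up to the whole eluate, i.e., all DOC retained and eluted by the cartridge
    mass_extracted_mg = mass_aliquot_mg * v_eluate_ml / v_aliquot_ml
    # DOC mass originally loaded onto the cartridge
    mass_loaded_mg = doc_whole_mg_l * v_sample_l
    return 100.0 * mass_extracted_mg / mass_loaded_mg


# Illustrative inputs (not study data): 2.0 mg/L whole water, 1 L loaded,
# 6 mL eluate, 0.5 mL aliquot redissolved in 10 mL reading 8.5 mg/L
recovery = percent_doc_recovery(2.0, 8.5, 1.0, 6.0, 0.5, 10.0)
print(f"{recovery:.1f}%")  # 51.0%
```

Tracking DOC as mass rather than concentration keeps the bookkeeping explicit when the eluate, aliquot, and redissolution volumes differ.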

DOC and optical analysis
Water samples collected in 40 mL vials (whole water samples) and SPE-DOC samples were subjected to DOC analysis within 1 week of collection. DOC concentration was measured as non-purgeable organic carbon with a Total Organic Carbon Analyzer (Shimadzu TOC-L). Samples were acidified online with 10% phosphoric acid and purged to remove traces of inorganic carbon. The analytical precision of repeat measurements was within 2%.
UV absorbance scans and three-dimensional excitation–emission matrices (EEMs) were collected simultaneously for all whole water samples and SPE-DOM samples using an Aqualog (Horiba Scientific). Samples were analyzed within 1 week of collection. Absorbance scans ranged from 230 to 800 nm in 3 nm increments. The absorbance at 254 nm (A254) is reported as a proxy for chromophoric DOM (CDOM). The specific UV absorbance at 254 nm (SUVA254) is reported as DOC-normalized A254 and is used as a proxy for DOM aromaticity (Weishaar et al. 2003). Spectral slopes of the natural log-transformed absorbance spectra were derived over 275-295 nm and 350-400 nm, and the spectral slope ratio (Sr, S275-295/S350-400) was calculated as a relative indicator of CDOM molecular weight (Helms et al. 2008).
EEM scans were collected over a wavelength range of 230 to 800 nm in 3 nm intervals. Post-processing of EEM scans included inner-filter corrections (Ohno 2002) and normalization to Raman scatter units based on daily water Raman scans collected at an excitation of 350 nm. Common fluorescence indices, including the humification index (HIX) and fluorescence index (FI), were calculated as indicators of the humic-like character and source of DOM (McKnight et al. 2001; Ohno 2002). EEMs were further subjected to parallel factor analysis (PARAFAC) in MATLAB version 2020b using the drEEM toolbox version 6.0 (https://openfluor.org) (Murphy et al. 2013). A separate four-component PARAFAC model was generated for the whole water samples and for the SPE-DOM samples. Each model was split-half validated under nonnegativity constraints, explaining 99.79% and 99.90% of the variability, respectively. Tucker's congruence coefficient (θ) (Tucker 1951) was used to determine the spectral similarity of comparable components identified in both models. For each PARAFAC component, θ was defined as the product of the θ values calculated for the individual excitation and emission spectra (θ = θEx × θEm). PARAFAC components were considered spectrally equivalent when θ > 0.95, while the cutoff for spectral similarity was set at θ = 0.92 (Lorenzo-Seva and ten Berge 2006; Murphy et al. 2008). The total sample fluorescence (Ftotal) was calculated as the sum of the PARAFAC component fluorescence in each sample.
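The absorbance-derived metrics above can be sketched in Python as follows, assuming a 1 cm path length and ordinary least-squares fits of the natural log-transformed spectrum within each wavelength window. Function names and the synthetic spectrum are illustrative only; the sketch is not the processing code used in this study.

```python
import math


def spectral_slope(wavelengths, absorbance, lo_nm, hi_nm):
    """Spectral slope S (nm^-1) over [lo_nm, hi_nm].

    Fits ln(absorbance) = b - S * wavelength by ordinary least squares and
    returns S as a positive number for spectra that decay with wavelength.
    """
    pts = [
        (w, math.log(a))
        for w, a in zip(wavelengths, absorbance)
        if lo_nm <= w <= hi_nm and a > 0
    ]
    if len(pts) < 2:
        raise ValueError("need at least two positive absorbance values in range")
    mean_w = sum(w for w, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxy = sum((w - mean_w) * (y - mean_y) for w, y in pts)
    sxx = sum((w - mean_w) ** 2 for w, _ in pts)
    return -sxy / sxx


def slope_ratio(wavelengths, absorbance):
    """Sr = S275-295 / S350-400 (Helms et al. 2008)."""
    return (spectral_slope(wavelengths, absorbance, 275, 295)
            / spectral_slope(wavelengths, absorbance, 350, 400))


def suva254(a254, doc_mg_l, path_length_cm=1.0):
    """SUVA254 (L mg-C^-1 m^-1): decadic A254 per meter of path divided by DOC."""
    a254_per_m = a254 * 100.0 / path_length_cm
    return a254_per_m / doc_mg_l


# Synthetic spectrum on the 230-800 nm, 3 nm grid: S = 0.015 below 330 nm, 0.010 above
wl = list(range(230, 801, 3))
spec = [math.exp(-0.015 * (w - 230)) if w < 330
        else math.exp(-0.015 * 100 - 0.010 * (w - 330)) for w in wl]
print(round(slope_ratio(wl, spec), 3))  # 1.5
print(suva254(0.1, 5.0))                 # 2.0
```

Because the slope is taken from a log-linear fit, any constant factor (such as converting decadic absorbance to a Napierian absorption coefficient) does not change S.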
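Tucker's congruence coefficient and the product rule θ = θEx × θEm can be expressed in a few lines. This is a minimal sketch, assuming both models' loadings are sampled on the same excitation and emission grids; the loadings shown are invented for illustration, and the drEEM routines used in this study are not reproduced here.

```python
import math


def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors on the same grid."""
    if len(x) != len(y):
        raise ValueError("loadings must share the same wavelength grid")
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


def component_theta(ex_a, em_a, ex_b, em_b):
    """Combined congruence for two PARAFAC components: theta = theta_Ex * theta_Em."""
    return tucker_congruence(ex_a, ex_b) * tucker_congruence(em_a, em_b)


def classify(theta, equivalent=0.95, similar=0.92):
    """Label a component pair using the cutoffs stated in the text."""
    if theta > equivalent:
        return "equivalent"
    if theta >= similar:
        return "similar"
    return "different"


# Illustrative loadings (not study data)
ex1, em1 = [0.1, 0.6, 1.0, 0.4], [0.2, 0.9, 0.5, 0.1]
ex2, em2 = [0.2, 1.2, 2.0, 0.8], [0.2, 0.9, 0.5, 0.1]  # ex2 is a rescaled ex1
theta = component_theta(ex1, em1, ex2, em2)
print(round(theta, 6), classify(theta))  # 1.0 equivalent
```

The coefficient is insensitive to uniform scaling, which is why a rescaled loading still yields θ = 1; only differences in spectral shape lower it.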

High-resolution mass spectrometry
The SPE-DOM samples were prepared for high-resolution mass spectrometry analysis by diluting an aliquot of the methanol extract to a final DOC concentration of 40 mg L−1 based on the measured DOC recovery in the SPE extracts. Samples were analyzed on a 12 Tesla (12 T) Bruker SolariX Fourier transform ion cyclotron resonance mass spectrometer (FTICR-MS; Bruker, SolariX) outfitted with a standard electrospray ionization (ESI) source.
The instrument is located at the Environmental Molecular Sciences Laboratory in Richland, WA. Ultra-high-resolution mass spectra were collected in negative mode with a resolving power of 220K at m/z 481.185 and a voltage of 4.2 kV. One hundred forty-four scans were co-added for each sample and internally calibrated using an OM homologous series separated by 14 Da (–CH2 groups). Mass accuracy was set to 1 ppm for singly charged ions in the 100-900 m/z range. Data were collected using ion accumulation times of 0.05 and 0.08 s. Bruker Daltonik Data Analysis (version 4.2) was used to convert raw spectra to a list of m/z values by applying the FTMS peak picker module with a signal-to-noise ratio (S/N) threshold of 7 and the absolute intensity threshold at the default value of 100. Peaks were aligned (0.5 ppm threshold) and assigned chemical formulae using Formularity (Tolić et al. 2017), with S/N > 7 and mass measurement error < 0.5 ppm. The Compound Identification Algorithm was set to consider C, H, O, N, S, and P and to exclude other elements, with constraints requiring at least 1 O and a maximum of 3 N, 2 S, and 1 P. The Formularity output was further processed in R v.4.0.0 using the R package "ftmsRanalysis" (Bramer et al. 2020). This package was used to remove 13C isotopic peaks and peaks outside the m/z range of confident assignment for FTICR-MS (200-900 m/z). In addition, this package was used to calculate common molecular indices, including double bond equivalents and the modified aromaticity index (AImod; Koch and Dittmar 2006, 2016).
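The molecular indices named above can be sketched as follows, using DBE = 1 + (2C − H + N + P)/2 and AImod as we understand it from Koch and Dittmar (2006, 2016). This is an illustrative Python sketch, not the ftmsRanalysis implementation, which may handle edge cases differently; the example formulae are chosen only to show the calculation.

```python
def dbe(c, h, n=0, p=0):
    """Double bond equivalents, DBE = 1 + (2C - H + N + P) / 2 (O and S do not contribute)."""
    return 1 + (2 * c - h + n + p) / 2


def ai_mod(c, h, o=0, n=0, s=0, p=0):
    """Modified aromaticity index (Koch and Dittmar 2006; 2016 correction).

    AImod = (1 + C - 0.5*O - S - 0.5*(H + N + P)) / (C - 0.5*O - S - N - P),
    set to 0 when the numerator or denominator is <= 0.
    """
    dbe_ai = 1 + c - 0.5 * o - s - 0.5 * (h + n + p)
    c_ai = c - 0.5 * o - s - n - p
    if dbe_ai <= 0 or c_ai <= 0:
        return 0.0
    return dbe_ai / c_ai


print(dbe(6, 6), round(ai_mod(6, 6), 3))  # benzene C6H6: 4.0 0.667
print(ai_mod(6, 14, o=1))                 # hexanol C6H14O: 0.0 (no aromaticity)
```

Oxygen is counted at half weight in AImod on the assumption that roughly half of the O atoms are carbonyl-bound, which is why AImod is less conservative than the original AI.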

Statistical analyses
All statistical analyses were performed in R version 4.0.5 (R Core Team 2023). Paired t-tests were used to determine significant differences in optically derived spectral indices between whole water DOM and SPE-DOM. Linear regressions and Pearson correlation coefficients were further used to relate DOC extraction efficiency to DOM composition as represented by optical measures.
To better assess the cross-compatibility of optical and FTICR-MS analyses, Spearman rank correlations were computed between the relative intensity of individual molecular formulae and individual PARAFAC components for both whole water DOM and SPE-DOM. Spearman rank correlations were used because of the non-normal distribution of FTICR-MS peak intensities and to avoid assuming linearity between peak intensity and analyte concentration (Kew et al. 2022). For these analyses, the FTICR-MS dataset was constrained to molecular formulae present in > 90% of all samples, which accounted for 2770 of the 16,536 molecular formulae identified across the whole dataset. These molecular formulae also represented the most abundant signals, accounting for an average of 81% ± 3% of the total signal intensity in each sample. For comparison, this analysis was also performed with the list extended to molecular formulae found in 25% and 50% of all samples (results in Supporting Information). Spearman rank correlations between EEM components and individual molecular formulae were considered significant at p < 0.001.
For each PARAFAC component within each sample, the average molecular properties (e.g., C, H, m/z, H/C) were calculated from the molecular formulae with significant relationships to that component. For example, if 100 molecular formulae had a positive relationship with PARAFAC component 1, then the average molecular properties of those 100 molecular formulae were calculated and considered representative of the molecular linkage between the FTICR-MS data and that PARAFAC component. Principal components analysis was then used to identify differences in sample behavior between whole water DOM and SPE-DOM based on the average molecular properties established for each PARAFAC component.
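A paired t-test on matched whole water and SPE-DOM index values can be sketched with the Python standard library alone. The sketch returns only the t statistic and degrees of freedom (n − 1, which gives df = 46 for 47 samples); it assumes p-values would come from a t-distribution routine such as scipy.stats.ttest_rel, which is not used here. Pearson correlations for the recovery relationships could likewise be computed with statistics.correlation (Python 3.10 and later). The example values are hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched samples.

    Differences are taken as before - after, matching the argument order of
    scipy.stats.ttest_rel(before, after).
    """
    if len(before) != len(after):
        raise ValueError("samples must be paired")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Illustrative index values for four samples (not study data)
whole_water = [1.0, 2.0, 3.0, 4.0]
spe_dom = [0.5, 1.5, 2.0, 3.5]
t, df = paired_t(whole_water, spe_dom)
print(round(t, 3), df)  # 5.0 3
```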
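The formula filtering and rank correlation step can be sketched as follows. This minimal version, an assumption rather than the study's code, keeps formulae detected in more than 90% of samples and computes Spearman's ρ as the Pearson correlation of average ranks. It omits significance testing; a routine such as scipy.stats.spearmanr, which returns both ρ and a p-value, could be used to apply the p < 0.001 criterion. Formula names and intensities are invented.

```python
import math


def average_ranks(values):
    """1-based ranks with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")  # constant input: correlation undefined
    return sxy / math.sqrt(sxx * syy)


def formula_rhos(intensities, component, min_fraction=0.9):
    """rho between a PARAFAC component and each formula detected in > min_fraction of samples.

    intensities maps formula -> relative intensity per sample (0 = not detected).
    """
    rhos = {}
    for formula, values in intensities.items():
        detected = sum(1 for v in values if v > 0) / len(values)
        if detected > min_fraction:
            rhos[formula] = spearman_rho(values, component)
    return rhos


# Illustrative data for five samples (not study data)
peaks = {
    "C10H12O5": [0.1, 0.2, 0.3, 0.4, 0.5],
    "C8H10O3": [0.5, 0.4, 0.3, 0.2, 0.1],
    "C15H21NO6": [0.0, 0.0, 0.2, 0.0, 0.0],  # too rare, filtered out
}
c1 = [10.0, 20.0, 30.0, 40.0, 50.0]
print(formula_rhos(peaks, c1))  # {'C10H12O5': 1.0, 'C8H10O3': -1.0}
```

Working on ranks rather than raw intensities is what removes the linearity assumption noted above.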
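The averaging of molecular properties over significantly linked formulae can be sketched as below. It assumes an unweighted mean, with p-values supplied from a prior correlation step; whether the study weighted formulae by intensity is not stated in this section, so that choice, along with all names and inputs, is hypothetical. The per-sample rows produced at the end are the kind of table that could then be passed to a standard PCA routine (not shown).

```python
def linked_formulae(rhos, pvalues, alpha=0.001, sign=1):
    """Formulae significantly related to a component in the requested direction."""
    return [f for f, r in rhos.items() if pvalues[f] < alpha and r * sign > 0]


def mean_properties(props, formulae):
    """Unweighted mean of each molecular property over the given formulae."""
    if not formulae:
        raise ValueError("no linked formulae")
    keys = props[formulae[0]].keys()
    return {k: sum(props[f][k] for f in formulae) / len(formulae) for k in keys}


def per_sample_fingerprints(props, intensities, formulae):
    """For each sample, mean properties of the linked formulae detected in it (None if none)."""
    n_samples = len(intensities[formulae[0]])
    out = []
    for s in range(n_samples):
        present = [f for f in formulae if intensities[f][s] > 0]
        out.append(mean_properties(props, present) if present else None)
    return out


# Illustrative inputs (not study data); "HC" stands for the H/C ratio
props = {"A": {"C": 10, "HC": 1.0}, "B": {"C": 20, "HC": 1.5}, "Z": {"C": 30, "HC": 0.8}}
rhos = {"A": 0.8, "B": 0.6, "Z": -0.7}
pvals = {"A": 1e-5, "B": 1e-4, "Z": 1e-6}
intens = {"A": [0.2, 0.0], "B": [0.1, 0.3], "Z": [0.4, 0.4]}
pos = linked_formulae(rhos, pvals)                   # ['A', 'B']
print(mean_properties(props, pos))                   # {'C': 15.0, 'HC': 1.25}
print(per_sample_fingerprints(props, intens, pos))   # [{'C': 15.0, 'HC': 1.25}, {'C': 20.0, 'HC': 1.5}]
```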

Assessment
Impacts of SPE on DOM absorbance
SPE caused a loss in DOC and had a variable impact on bulk optical properties. Across the whole dataset, DOC recovery from SPE was 51% ± 10%, which is slightly lower than, yet comparable to, the > 60% average recovery often reported for freshwater systems (Dittmar et al. 2008; Minor et al. 2012; Swenson et al. 2014; Chen et al. 2016b; Roebuck et al. 2020). The recovery of absorbance signals was also variable across the dataset and was wavelength dependent (Fig. 1a). For example, absorbance recovery was ~50% at low wavelengths (< 350 nm), declining steadily to ~40% at higher wavelengths (> 350 nm). The relatively stable extraction recovery at low wavelengths led to no significant change in the spectral slope at 275-295 nm (S275-295); however, the declining recovery above 350 nm likely contributed to the observed significant increase in the spectral slope at 350-400 nm (S350-400) for SPE-DOM.
The observed shifts in optical spectra indicate some systematic biases in the isolation of DOM by SPE. Decreasing spectral slope ratios (Sr) in the SPE-DOM (Table 1; paired t-test, t = 10.805, df = 46, p < 0.001) could be interpreted as preferential recovery of higher molecular weight DOM (Helms et al. 2008). Similar decreases in Sr for DOM in SPE extracts have been reported in other studies (Arellano et al. 2018). Alternatively, decreases in absorbance recovery at high wavelengths may result from molecular-level shifts in charge-transfer interactions that limit long-wavelength absorption (Sharpless and Blough 2014), though more work would be needed to evaluate this hypothesis.
In the low wavelength region, the absorbance recovery at 254 nm (A254) was 51% ± 8%, similar to the DOC recovery. Notably, DOC and A254 in the whole water DOM were positively related (r = 0.90, df = 45, p < 0.001). DOC and A254 were each also positively related to the DOC recovery from SPE (Fig. 2), suggesting that environmental matrices with higher DOC and stronger light absorption tend to yield the most efficient DOC recoveries. Furthermore, the comparable recoveries of DOC and A254 resulted in no significant difference in SUVA254 between whole water DOM and SPE-DOM (paired t-test, t = 0.99418, df = 46, p > 0.05), suggesting broadly comparable patterns in aromaticity across both data streams (Supporting Information Fig. S2; Table 1). We note that post-SPE trends in SUVA254 and Sr vary and are often system dependent (e.g., terrestrial vs. marine, variable land cover), suggesting some source-related controls on the aromatic character of DOM after SPE in this diverse watershed (Andrew et al. 2016; Chen et al. 2016b; Arellano et al. 2018; Wünsch et al. 2018b).

Impacts of SPE on DOM fluorescence
In the SPE-DOM, there was a clear bias in the recovery of fluorescence signals among different EEM wavelength regions. In general, the poorest recoveries were observed in the low excitation–low emission regions of the EEM (Fig. 1b). A similar trend has been previously reported in arctic fjords (Wünsch et al. 2018b). We note this pattern was more prevalent in this terrestrial-based dataset, further suggesting that DOM source may play an integral role in the recovery of fluorescence signals from SPE. Overall, differences in fluorescence recovery across the EEM led to significant shifts in commonly calculated optical indices, such as HIX and FI (tHIX = 7.3538, tFI = 11.485, both df = 46, both p < 0.001), both of which indicate more humic-like signatures in the SPE-DOM (Supporting Information Fig. S2; Table 1). While these indices provide useful first-order views of broad fluorescence trends, the measured differences suggest they may be limited as indicators of bulk scale linkages between whole water DOM and SPE-DOM.
To more holistically assess the impact of SPE on the identification of discrete fluorophores within this watershed, we subjected both the whole water DOM fluorescence and the SPE-DOM fluorescence to PARAFAC analysis. For each data stream, a four-component model was independently validated. Collectively, the whole water DOM and SPE-DOM generated largely equivalent PARAFAC models (Fig. 3; Supporting Information Table S2), with three of the four identified components considered spectrally equivalent (θ > 0.95). The four PARAFAC components were each assigned based on traditional literature interpretations of common fluorophores in aquatic systems (Wünsch et al. 2019), although their exact identity and source likely vary and are system dependent. The three spectrally equivalent components included a microbial humic-like component most similar to the common M peak (C1), a UV/VIS terrestrial humic-like component most similar to the common A and C peak combination (C3), and a protein-like component similar to the common T peak (C4). The fourth component was the commonly identified D peak (C2), which is often attributed to soils and other reduced environments such as sediments and groundwaters. While some small deviations in the excitation spectra were observed in the SPE-DOM (Fig. 3; Supporting Information Table S2), C2 was still considered spectrally similar (θ = 0.92). The strong comparability of all components suggests that the broader identification of fluorophores via PARAFAC was not significantly impacted by SPE.
Table 1. Average DOC concentrations and optical parameters for all 47 samples collected within the Yakima River watershed. The percent DOC and optical recoveries are displayed along with values of various optical indices recorded in the SPE extract and the average percent difference across the whole dataset between pre- and post-SPE analyses.
While the two PARAFAC models exhibited broad similarity, the overall distributions of components differed between whole water DOM and SPE-DOM. The three humic-like components each had favorable recoveries (> 60%). In contrast, the protein-like component (C4) exhibited the poorest recovery (35% ± 9%) and decreased in its relative contribution to bulk fluorescence post-SPE. This decrease in C4 was accompanied by an increase in the relative contribution of the microbial humic-like component (C1) post-SPE (Supporting Information Fig. S2; Table 1). Notably, despite these shifts in the measured PARAFAC component signals, there was clear linearity between individual components in the pre- and post-SPE samples (Supporting Information Fig. S2). This indicates that despite small shifts in the measured signals, the broad trends among components were preserved throughout the SPE process.
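The relative contributions and component recoveries discussed above reduce to simple ratios, sketched below. The scale argument is a hypothetical factor for converting SPE-DOM intensities back to the original sample-volume basis; how that normalization was performed is not described in this section, and the intensities are invented for illustration.

```python
def relative_contributions(components):
    """Percent of F_total (sum of PARAFAC component intensities) from each component."""
    total = sum(components.values())
    return {name: 100.0 * value / total for name, value in components.items()}


def component_recovery(whole, spe, scale=1.0):
    """Percent recovery per component.

    scale is a hypothetical factor converting SPE-DOM intensities back to the
    original sample-volume basis (set by the extraction and redissolution volumes).
    """
    return {name: 100.0 * spe[name] * scale / whole[name] for name in whole}


# Illustrative Raman-unit intensities for one sample (not study data)
whole = {"C1": 2.0, "C2": 1.0, "C3": 4.0, "C4": 1.0}
spe = {"C1": 1.4, "C2": 0.7, "C3": 2.6, "C4": 0.35}
print(relative_contributions(whole)["C4"])          # 12.5
print(round(relative_contributions(spe)["C4"], 1))  # 6.9
print({k: round(v) for k, v in component_recovery(whole, spe).items()})
# {'C1': 70, 'C2': 70, 'C3': 65, 'C4': 35}
```

The example shows how a component with low recovery loses share of Ftotal while the others gain share, even though every component still scales with its whole water counterpart.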
Our results are consistent with other studies suggesting that SPE can preferentially enrich DOM with humic-like signatures (Parlanti et al. 2000; McKnight et al. 2001; Arellano et al. 2018; Wünsch et al. 2018b). Of the many cartridges that have been used to isolate DOM, PPL cartridges are generally considered the most well balanced, as their extraction potential spans moderately polar to nonpolar compounds (Dittmar et al. 2008; Raeke et al. 2016). PPL extractions are also known to have poor recovery of peptides and other N-containing compounds (Chen et al. 2016a; Stücheli et al. 2018), which we suspect may be responsible for some of the primary shifts in optical signatures observed in this study (e.g., poor recovery of protein-like fluorescent signatures). Nonetheless, our results highlight that the primary features of the identified fluorescence components remained relatively unaffected by SPE, suggesting that the SPE-DOM fluorescence signals were qualitatively congruent with, and representative of, whole water DOM fluorescence signals. On the other hand, the quantitative differences observed between whole water and SPE-DOM PARAFAC signals reinforce concerns over the cross-comparability of PARAFAC data with data streams affected by SPE, such as high-resolution mass spectrometry (Wünsch et al. 2018b). We address this concern directly in the following section.

Impact of SPE on coupling EEM and FTICR-MS data
The pairing of optical and FTICR-MS data is of strong interest, as successful pairings could provide integral insights into the molecular fingerprint associated with optical signatures (Herzsprung et al. 2012; Stubbins et al. 2014; Kellerman et al. 2015; Wagner et al. 2015; Timko et al. 2015a; Wünsch et al. 2018a). Our results presented above reinforce the quantitative biases in SPE-DOM fluorescence signatures observed in other studies, which we hypothesized would limit the potential for extrapolating molecular information to whole water optical signatures. We tested this hypothesis by directly comparing linkages of both the whole water DOM and SPE-DOM PARAFAC models with molecular information obtained by FTICR-MS.
Based on the data presented herein, we reject our hypothesis that the linkages of FTICR-MS data with whole water and SPE-DOM optical signals differ considerably. We infer that despite some quantitative biases introduced into the optics by SPE, molecular-level information from FTICR-MS data can be adequately coupled with whole water PARAFAC signatures. The first line of evidence is presented as van Krevelen diagrams in Fig. 4, which show that low H/C and high O/C compounds were correlated with terrestrial-like optical signatures, similar to other reports across a variety of biogeochemical gradients (Kellerman et al. 2015; Wagner et al. 2015; Gonsior et al. 2016; Roebuck et al. 2020). Even with the SPE-induced quantitative biases in the PARAFAC signals (Supporting Information Fig. S2; Table 1), the distributions of molecular formulae positively and negatively associated with each PARAFAC component within van Krevelen space were broadly similar for whole water DOM and SPE-DOM (Fig. 4).
Additional support for successful cross-platform linkages is presented in Supporting Information Table S3, which shows that the majority of molecular formulae shared relationships with PARAFAC components between whole water DOM and SPE-DOM. For example, of the 670 molecular formulae positively related to C2 in the whole water samples, 616 were also positively associated with SPE-DOM C2. Of the four components, this agreement was weakest for the microbial humic-like C1, for which the number of positively related formulae increased by nearly 30% after SPE. This was likely driven by the high variability in C1 observed in the SPE-DOM relative to the whole water DOM (Supporting Information Fig. S2), which we suspect strengthened the relationships between C1 and individual molecular formulae following SPE. We present further evidence for this by comparing the r² values of the molecular formulae that had significant relationships with both whole water DOM C1 and SPE-DOM C1. Here, most formulae fell either above (positive relationships) or below (negative relationships) a 1 : 1 line, suggesting stronger relationships post-extraction (Supporting Information Fig. S3). Similar observations were made for C3 and C4, suggesting that while many formulae exhibit similar relationships, the strength of the linkages between individual formulae and PARAFAC components can be impacted by SPE.
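The overlap counts in Supporting Information Table S3 amount to set comparisons, sketched below in Python. The formula identifiers are hypothetical placeholders constructed to reproduce the 616 of 670 shared C2 formulae cited above (about 92%); the number of additional SPE-DOM formulae (40) is arbitrary.

```python
def shared_fraction(whole_linked, spe_linked):
    """Fraction of formulae linked to a component in whole water DOM that are also linked in SPE-DOM."""
    whole, spe = set(whole_linked), set(spe_linked)
    return len(whole & spe) / len(whole)


def percent_change_in_count(whole_linked, spe_linked):
    """Percent change in the number of linked formulae after SPE."""
    n_whole, n_spe = len(set(whole_linked)), len(set(spe_linked))
    return 100.0 * (n_spe - n_whole) / n_whole


# Hypothetical formula IDs reproducing the C2 counts in the text (616 of 670 shared)
whole_c2 = {f"F{i}" for i in range(670)}
spe_c2 = {f"F{i}" for i in range(616)} | {f"G{i}" for i in range(40)}
print(round(100 * shared_fraction(whole_c2, spe_c2), 1))  # 91.9
```

A percent_change_in_count near 30 would correspond to the C1 case described above, where SPE added many newly linked formulae rather than losing existing ones.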
While some variability was observed in the strength of linkages between individual molecular formulae and PARAFAC components, the average bulk scale molecular properties associated with each PARAFAC component showed good agreement between whole water DOM and SPE-DOM (Supporting Information Table S3). This was best observed via principal component analysis, in which the whole water DOM and SPE-DOM PARAFAC components were similarly grouped based on the average molecular properties of positively related molecular formulae (Fig. 5). PC1 separated samples primarily by their terrestrial (more aromatic, higher molecular weight) vs. protein-like (more reduced, high H/C) signatures. PC2 was distinguished by microbial humic-like signatures that were more closely linked with N-containing formulae.
Notable deviation of SPE-DOM along PC2 was observed for both the microbial humic-like component (C1) and the protein-like component (C4), suggesting some variability in N-based linkages among PARAFAC components after SPE. This variability highlights challenges associated with characterizing N from high-resolution data. For example, N-containing compounds are poorly extracted and experience ionization biases in negative-mode ESI ((−)ESI) (Fenn et al. 1990; Stücheli et al. 2018). However, the shift in average N linked with C1 appears to be primarily driven by the shift in the number of molecular formulae positively associated with each component in the SPE-DOM (Supporting Information Table S3). For C1, the 30% increase in positively related molecular formulae post-SPE came predominantly from N-poor molecular formulae (Cmean = 22 ± 10, Nmean = 0.0 ± 0.2), leading to a depletion of the average N signature in the SPE-DOM samples. Despite this enrichment of N-depleted molecular formulae associated with C1 post-SPE, the broader average molecular associations (C, H, AImod, etc.) were not considerably different (Fig. 5; Supporting Information Table S3), which suggests the samples maintained comparable bulk scale molecular associations after SPE for all PARAFAC components.
Finally, extending these analyses beyond PARAFAC to common optical indices (e.g., HIX, FI) yielded similar trends, with comparable molecular associations pre- and post-SPE. These data are provided in the Supporting Information (Supporting Information Fig. S4; Supporting Information Table S3) and indicate the broader applicability of linking FTICR-MS molecular information with general fluorescence data.

Discussion
Our results are consistent with other studies that have acknowledged SPE biases on DOM optical signatures (Chen et al. 2016a;Wünsch et al. 2018b) yet are also consistent with studies that have established linkages between FTICR-MS molecular data with whole water optical data (Wagner et al. 2015).Wünsch et al. (2018b) expressed valid concerns over the compatibility of linking whole water EEMs with SPEbased FTICR-MS data, notably pointing to the poor recovery of long wavelength fluorescence as a potential explanation for the poor linkages with FTICR-MS data reported in other studies (Stubbins et al. 2014).Our data support this concern in part, as SPE had a clear impact on the strength of correlations observed between individual formulae for three of the four PARAFAC components.In contrast to other studies, the longer wavelength PARAFAC component (C2) in this study provided the most stable linkages with FTICR-MS data (Supporting Information Figs.S2, S3), which was likely driven by a more consistent and higher recovery compared to those reported in other environmental settings (Stubbins et al. 2014;Chen et al. 2016a;Wünsch et al. 2018b).
Despite the small quantitative differences in the PARAFAC components after SPE and in their linkages with individual formulae, the average molecular properties of the significantly related molecular formulae were unchanged for each PARAFAC component. This observation is likely driven by a triple-blind phenomenon among SPE, optical, and FTICR-MS analyses, in which each analysis discriminates against molecules of similar structural character (Raeke et al. 2016; Chen et al. 2016a). More specifically, the preservation of broad trends for each component after SPE (Supporting Information Fig. S2) allowed FTICR-MS linkages to be transferred directly between SPE-DOM and whole water fluorescence signals.
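The comparison of whole water, SPE, and shared associations can be sketched as simple set bookkeeping, assuming each association set is a Python set of formula labels and each formula has a precomputed property value (e.g., O/C). The Jaccard overlap is an added summary that this study does not report, and the labels and values are hypothetical.

```python
from statistics import mean

def compare_associations(whole, spe, props):
    """Overlap of two association sets and the shift in a mean property."""
    shared = whole & spe
    union = whole | spe
    return {
        "shared": shared,
        "whole_only": whole - spe,
        "spe_only": spe - whole,
        # Jaccard index: shared formulae as a fraction of all associated formulae
        "jaccard": len(shared) / len(union) if union else 1.0,
        # Positive values mean the SPE-DOM set has the higher mean property
        "mean_shift": mean(props[f] for f in spe) - mean(props[f] for f in whole),
    }

# Hypothetical formulae with O/C ratios as the property of interest
props = {"F1": 0.40, "F2": 0.50, "F3": 0.60}
result = compare_associations({"F1", "F2"}, {"F2", "F3"}, props)
print(sorted(result["shared"]), result["jaccard"], result["mean_shift"])
```

A small `mean_shift` alongside a modest `jaccard` mirrors the pattern described here: the specific formulae change after SPE, but their bulk properties do not.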

Impacts of solid phase extraction on organic matter
Additional considerations may explain some of the variable optical behavior following SPE and may affect molecular linkages after extraction. For instance, our post-SPE fluorescence was measured without any primary matrix effects that may be associated with the whole water samples. Ionic strength and pH have, in some cases, been identified as sources of variation in optical data (Spencer et al. 2007; Gao et al. 2015), although these changes appear to be subtle at environmentally relevant pH (4.5-8). Differences in absolute fluorescence intensity for common peaks across this pH range are generally within 10% (Timko et al. 2015b; Groeneveld et al. 2022), and most studies report little or no impact of ionic strength on optical properties (Mobed et al. 1996; Boyd and Osburn 2004; Mei et al. 2009; Lu et al. 2015). Thus, we suspect that systematic differences between the pre- and post-SPE sample matrices added minimal uncertainty to our fluorescence data (Timko et al. 2015b; Wünsch et al. 2018b) or to their potential linkages with the FTICR-MS data. We remain cognizant of this potential limitation and recognize the need for further study.
In addition to matrix effects, fluorescence components are understood to represent a supramolecular assemblage of DOM moieties (Romera-Castillo et al. 2014; Cuss and Guéguen 2015; Wünsch et al. 2017; Wünsch et al. 2018c). DOM fluorophores can track multiple compounds (Herzsprung et al. 2012) or the interaction of compounds through charge-transfer interactions (Del Vecchio and Blough 2004; Yakimov et al. 2021). It remains possible that the disruption of such interactions by select molecular biases led to variable recoveries among the PARAFAC components after SPE. Because the extent of these assemblages is in part driven by DOM age and source (Cuss and Guéguen 2015), we suspect that the impacts on PARAFAC component recoveries, and any cascading impacts on pairing with FTICR-MS molecular information, may be partly DOM source dependent. Further investigation of targeted mechanisms, such as charge-transfer interactions, that could explain these observations is encouraged.
Finally, the ionization efficiency of molecules by ESI inevitably varies across DOM sources and matrices, and we highlight a few key points in this regard. First, our study did not compare directly with whole water FTICR-MS, as the ability to obtain peak information comparable to that of SPE samples is limited by additional interferences that affect compound ionization (Nelson et al. 2022). Because our goal was to link SPE-based FTICR-MS information with optical signals, the absence of comparable whole water FTICR-MS data does not impede the primary findings of this study. Next, as a separate part of this study described with results in the Supporting Information, we also investigated the implications of assuming SPE recoveries when preparing FTICR-MS samples for analysis, which we suspected might affect compound ionization and data interpretation. In short, this was found to be of negligible importance (Supporting Information Figs. S7, S8). Finally, while our primary correlation analyses were conservatively constrained to the most ubiquitous molecular formulae (i.e., formulae detected in > 90% of samples), extending these analyses to include more poorly ionizable compounds (e.g., formulae detected in 25% or 50% of samples) yielded comparable results (Supporting Information Figs. S5, S6; Supporting Information Tables S4, S5). While additional work is needed to extend these results across a variety of DOM sources and environmental matrices that may yield different interferences, our study highlights that bulk-scale molecular properties could be inferred through linkages with optical data in this dynamic freshwater system.
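The detection-frequency constraint can be sketched as a simple filter, assuming detections are stored per formula as sets of sample indices and that a strict "more than" comparison applies at every threshold. The sample count of 47 matches this study; the formula labels and detection counts are hypothetical.

```python
def filter_by_presence(detections, n_samples, min_fraction):
    """Keep formulae detected in more than `min_fraction` of samples."""
    return {
        formula
        for formula, samples in detections.items()
        if len(samples) / n_samples > min_fraction
    }

# Hypothetical detections: formula -> set of sample indices where it was found
detections = {
    "F_common": set(range(45)),
    "F_moderate": set(range(30)),
    "F_rare": set(range(12)),
    "F_sporadic": set(range(5)),
}
for threshold in (0.90, 0.50, 0.25):
    print(threshold, sorted(filter_by_presence(detections, 47, threshold)))
```

Lowering the threshold admits progressively less frequently detected formulae into the correlation step, which is the extension described above.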

Conclusions
Our results have important implications for the analysis of DOM via optical and high-resolution molecular analyses and for the impact of SPE on the cross-comparability of methods that characterize whole water DOM vs. SPE-DOM. Across this fluvial network, the recovery of DOC following SPE was strongly linked with the starting sample DOC concentration and composition, with samples of the highest light-absorbing capacity providing the best recoveries. Our study also recognizes some quantitative biases in optical signatures accrued via SPE that lead to an enrichment of terrestrial-like signatures. The minimal difference between the pre- and post-SPE PARAFAC models still enabled comparable linkages with molecular information from FTICR-MS. As such, this result suggests that SPE is unlikely to be a limitation when directly extrapolating molecular information obtained from FTICR-MS to whole water DOM fluorescence signals. It is important to recognize that these results do not imply that all molecular formulae with established linkages contribute directly to the fluorescence signals. Instead, they show that systematic shifts in organic matter source and processing within this system led to comparable shifts in both the optical and molecular-level data, which allowed information to be extrapolated across data streams.
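The DOC recovery relationship summarized above can be sketched as a carbon mass balance, assuming recovery is the carbon mass in the extract (DOC concentration times volume) divided by the carbon mass in the original sample. The Pearson correlation is one hypothetical way to quantify the link with a254. The study's exact recovery calculation is not restated in this section, and all values are invented.

```python
from math import sqrt

def doc_recovery(doc_sample, vol_sample, doc_extract, vol_extract):
    """Percent of sample DOC mass recovered in the SPE extract (mass balance)."""
    return 100.0 * (doc_extract * vol_extract) / (doc_sample * vol_sample)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical values: DOC in mg C/L, volumes in L
print(doc_recovery(5.0, 1.0, 50.0, 0.05))  # 2.5 mg C recovered of 5 mg C
a254 = [0.10, 0.20, 0.30, 0.40]
recoveries = [40.0, 50.0, 60.0, 70.0]
print(pearson_r(a254, recoveries))
```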
This study provides a first step toward increasing our confidence in the use of optical data as a surrogate for high-resolution datasets. Future studies are encouraged to extend these results across additional environmental gradients (e.g., hydrological, coastal). Additional work is also needed to better assess the mechanistic controls on shifting optical dynamics and their pairing with N-containing DOM. Future studies should also consider establishing ubiquitous molecular proxies (Medeiros et al. 2016) that could be linked with optical data and that have potential for extrapolation to field-based applications (e.g., in situ optical sensors). Such successful pairings could provide considerable insight, allowing a more predictive understanding of molecular-level transformations and biogeochemical cycling of aquatic DOM across scales.

Fig. 1. (A) Average recovery of absorbance signals (black line), with standard deviation shaded in gray; (B) average recovery of fluorescence signals; and (C) standard deviation of the percent recovery of fluorescence signals across this dataset. Figure design replicated for this dataset from Wünsch et al. (2018b).

Fig. 2. X-Y relationships to show the percent DOC recovery as a function of (A) DOC concentration in the whole water samples and (B) the absorbance at 254 nm in the whole water samples.

Fig. 3. A comparison of separate four-component PARAFAC models for EEMs collected from whole water and SPE extracts. Excitation and emission loadings are represented as solid and dotted lines, respectively. The whole water components and their equivalent SPE components are represented as black and gray lines, respectively. Spectral congruency (θ) is also provided; see Methods for more detailed information on θ.
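Spectral congruency between matched whole water and SPE loadings can be sketched as follows, assuming θ is Tucker's congruence coefficient, a common choice for comparing PARAFAC loadings (the precise definition of θ is in the Methods, which are not part of this section). Loadings are assumed to be sampled on a shared wavelength grid, and the values are hypothetical.

```python
from math import sqrt

def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    num = sum(a * b for a, b in zip(x, y))
    return num / sqrt(sum(a * a for a in x) * sum(b * b for b in y))

# Hypothetical emission loadings on a shared wavelength grid
whole_em = [0.1, 0.4, 0.9, 0.6, 0.2]
spe_em = [0.1, 0.5, 0.8, 0.6, 0.2]
print(round(tucker_congruence(whole_em, spe_em), 3))
```

The coefficient is insensitive to uniform scaling of either loading, so it compares spectral shape rather than intensity; some studies report the product of the excitation and emission coefficients as a single match score.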

Fig. 4. Van Krevelen plots showing the positively and negatively related molecular formulae associated with each PARAFAC component in the whole waters (A-D) and SPE extracts (E-H).

Fig. 5. Principal component analysis derived from the average molecular properties of molecular formulae from FTICR-MS analyses that were positively associated with each PARAFAC component in the whole water DOM, SPE-DOM, and the shared associations between the two.