Lignin phenol quantification from machine learning‐assisted decomposition of liquid chromatography‐absorbance spectroscopy data

Analysis of lignin in seawater is essential to understanding the fate of terrestrial dissolved organic matter (DOM) in the ocean and its role in the carbon cycle. Lignin is typically quantified by gas or liquid chromatography, coupled with mass spectrometry (GC‐MS or LC‐MS). MS instrumentation can be relatively expensive to purchase and maintain. Here we present an improved approach for quantification of lignin phenols using LC and absorbance detection. The approach applies a modified version of parallel factor analysis (PARAFAC2) to 2nd derivative absorbance chromatograms. It is capable of isolating individual elution profiles of analytes despite co‐elution and overall improves sensitivity and specificity compared to manual integration methods. For most lignin phenols, detection limits below 5 nmol L−1 were achieved, which is comparable to MS detection. The reproducibility across all laboratory stages for our reference material showed a relative standard deviation between 1.47% and 16.84% for all 11 lignin phenols. Changing the amount of DOM in the reaction vessel for the oxidation (dissolved organic carbon between 22 and 367 mmol L−1) did not significantly affect the final lignin phenol composition. The new method was applied to seawater samples from the Kattegat and Davis Strait. The total concentration of dissolved lignin phenols measured in the two areas was 4.3–10.1 and 2.1–3.2 nmol L−1, respectively, which is within the range found by other studies. Comparison with a different oxidation approach and detection method (GC‐MS) gave similar results and underlines the potential of LC and absorbance detection for analysis of dissolved lignin with our proposed method.

Permafrost thaw has led to additional release of ancient terrestrial organic matter, mainly derived from plant detritus, into aquatic environments (Vonk et al. 2012, 2013; Spencer et al. 2015; Wild et al. 2019). Determining the fate of terrestrial dissolved organic matter (tDOM) is important for understanding its role in a changing carbon cycle (Cole et al. 2007; Ciais et al. 2008). Concentrations of dissolved lignin, a biochemical marker for tDOM in seawater, are generally low. For example, in the Pacific Ocean tDOM on average makes up around 1% of the total dissolved organic carbon (DOC) (Hernes and Benner 2002). However, in Arctic Ocean surface waters 14-24% of the DOC is terrestrial (Benner et al. 2005), which reflects the closer proximity to riverine inputs from surrounding continents in the Arctic, compared to that of the large open ocean basins.
Lignin is an amorphous and highly branched phenolic biopolymer used as a building block in plant cell walls to create structural support (Monties and Fukushima 2001). Lignin exists only in vascular plants (Lewis and Yamamoto 1990; Monties and Fukushima 2001) and is therefore an excellent biomarker of terrestrial plant material. For measurement, the lignin macrostructure must first be oxidized into its constituent phenolic monomers, which, in addition to the phenol functionality, may carry aldehyde, ketone, and acid functional groups. These lignin-derived phenols can also be divided into groups depending on ring substitution (p-hydroxyl [P], vanillyl [V], syringyl [S]) and conjugation (e.g., cinnamyl [C]). The lignin concentration in DOM is reported as the total dissolved lignin phenols (TDLP) measured after its oxidation. Open ocean TDLP concentrations are at the pmol L−1 level, while coastal ocean values range up to 500 nmol L−1, depending on proximity to river discharge (Fichot et al. 2016). In estuaries, lignin is often strongly correlated with DOC, as both have a high concentration source in freshwater, and mix more or less conservatively with salinity (Hernes and Benner 2003; Fichot et al. 2016; Osburn et al. 2016). Ratios between P/V, C/V, and S/V can be used to determine the source of terrestrial material (Hedges and Mann 1979; Lobbes et al. 2000; Amon et al. 2003; Mann et al. 2016) and to reflect diagenetic fate in aquatic environments (Opsahl and Benner 1998; Hernes and Benner 2003; Kaiser et al. 2017).
Quantification of TDLP in seawater requires a sequence of steps including filtration, solid phase extraction (SPE), oxidation, purification, and finally quantification of individual lignin phenols. Filtration removes particulate matter and subsequent SPE concentrates the DOM (including dissolved lignin) and desalts the sample prior to oxidation (Dittmar et al. 2008). The oxidation is performed in a reaction vessel (typically a sealed Teflon or metal cylinder) and is mediated by the addition of a copper oxidant (Hedges and Ertel 1982; Yan and Kaiser 2018a). The reaction products are isolated from the reaction mixture afterwards using a hydrophilic lipophilic balanced (HLB) cartridge (Yan and Kaiser 2018b) containing a resin made from a copolymer of divinylbenzene and N-vinylpyrrolidone. The subsequently eluted lignin phenols are quantified using a combination of high-performance liquid chromatography (HPLC) or gas chromatography (GC), coupled with absorbance spectroscopy or MS (Hedges and Ertel 1982; Lobbes et al. 1999; Louchouarn et al. 2010; Kaiser and Benner 2012; Yan and Kaiser 2018b). The downscaling of the reaction vessel size (and thereby the volumes of reactants needed), together with the alternative use of CuSO4 as oxidant, has improved the yield of lignin phenols and offers superior accuracy and precision (Yan and Kaiser 2018a).
Due to its high sensitivity and specificity, MS has been the preferential quantification method for lignin phenols. However, widespread application of MS is limited by the high purchase and maintenance costs of instrumentation. In comparison, HPLC-absorbance spectroscopy traditionally struggles with lower sensitivity, lower specificity and increased background interference from DOM compared to MS. The resulting lower sensitivity therefore requires the use of more water for extraction and makes the HPLC-absorbance spectroscopy approach less desirable.
Machine learning approaches have become more readily accessible, and a modified version of parallel factor analysis (PARAFAC2) has shown great potential to overcome problems with shifting, overlapping, and low intensity peaks in chromatography (Amigo et al. 2010). Traditional PARAFAC finds unique underlying solutions of independent components (analytes) and scores (concentrations) across a range of samples and has, among other applications, been widely used to characterize excitation-emission fluorescence matrices of DOM samples (Bro 1997; Stedmon et al. 2003; Murphy et al. 2013). In contrast to PARAFAC, PARAFAC2 is less constrained and allows one of the modes to shift to a minor extent between samples, which makes it more suitable for chromatographic data where shifts in retention time and changes of peak shape often occur (Harshman 1972; Bro et al. 1999). This study is therefore focused on the development of a machine learning-assisted, absorbance-based HPLC approach, which mitigates current limitations (with sensitivity and specificity), allows the quantification of lower concentrations of lignin phenols in a complex mixture of DOM, and hence requires lower initial sample volumes of seawater.

Sampling
All plastic material used in the study was acid-cleaned and rinsed with ultrapure water before use, whereas all glassware was acid-cleaned and combusted. Seawater samples for analysis were taken at the entrance to the Baltic (Kattegat) and in the Davis Strait west of Baffin Island, Canada (see Table 1 for sampling details). Both cruises were conducted with the research vessel R/V Dana during autumn 2021.
Seawater samples were taken using a Niskin water sampler mounted on a CTD rosette (Seabird Scientific). Between 2.4 and 4.8 L of seawater was collected for each sample and filtered through a 0.2 μm cartridge filter (Polyethersulfone Membrane Capsule Filter, Sterlitech Inc.) using a peristaltic pump. After filtration the samples were acidified to approximately pH 2 by addition of hydrochloric acid (HCl). Subsamples (60 mL) were collected from the acidified seawater samples in combusted brown glass vials for measurement of DOC. The rest of the acidified seawater samples were then stored dark and cold (at 5 °C) for approximately 1 yr until extraction.

Solid phase extraction
Solid phase extraction of DOM was performed on the filtered and acidified seawater using Bond Elut Priority PolLutant (PPL) cartridges (200 mg sorbent, 3 mL cartridge capacity, Agilent) according to the method by Dittmar et al. (2008) with minor modifications. The samples were introduced into the PPL cartridges using PTFE tubing and tube adapters (Supelco) at a flowrate between 4 and 6 mL min−1 using a compact precision peristaltic pump (Shenchen). Before use, the PPL cartridges were cleaned with 3 mL methanol and conditioned with 6 mL of acidified ultrapure water (pH 2, HCl). After the extraction of DOM from the seawater samples, the sorbent was rinsed with 6 mL of acidified ultrapure water (pH 2, HCl) to remove salts. The sorbent was dried with a vacuum manifold using a pressure of −60 kPa for 5 min. The cartridges were then stored in acid-cleaned and combusted brown vials until further processing. The volume of the extracted seawater was determined using a 1 L measuring cylinder (± 10 mL) to be able to calculate the enrichment factor and approximate the environmental concentration of lignin phenols. From the extracted seawater, 60 mL of water was collected in brown glass vials for DOC measurement.

Dissolved organic carbon
DOC was determined using high-temperature catalytic combustion (TOC/VCPH, Shimadzu). Fifteen milliliters of acidified sample (collected before SPE extraction, pH 2 by HCl) was poured into a clean glass vial (combusted at 550 °C for 5 h). Samples were sparged with oxygen gas in the instrument's autosampler to remove all inorganic carbon prior to injection. A 100 μL sample per determination was injected onto the catalyst and a minimum of three injections were averaged to determine the mean instrument response. The detector response was converted to DOC concentrations via a seven-point standard curve between 0 and 311 μmol L−1 carbon using acetanilide. To ensure calibration stability, ultrapure water spiked with 62 and 104 μmol L−1 standards served as reference samples, which were measured after every 7th sample. In addition, the instrument performance was verified by determining that the DOC concentrations of the deep-sea community reference standard (Hansell laboratory, University of Miami) fell between 42 and 45 μmol L−1.

Cupric oxidation
The method for cupric oxidation applied in our study was a modified version of that developed by Yan and Kaiser (2018a). In-house built stainless steel reaction vessels with a volume of 850 μL were thoroughly cleaned before use by soaking in NaOH, then rinsed in methanol, soaked in ethanol overnight and finally rinsed multiple times with ultrapure water. In between samples, the cylinders and threads were submerged in ethanol for half an hour and rinsed thoroughly with ultrapure water afterwards.
The DOM retained on the PPL cartridge was eluted with 3 mL methanol (HPLC grade, Sigma-Aldrich). Afterwards the eluate was evaporated under nitrogen gas and redissolved in 3 mL of methanol for a second time (to achieve exact volumes). From this 3 mL methanol, either the whole volume (for seven samples) or 0.6 mL (for five samples) was transferred to the reaction vessel. The methanol was then evaporated in a fume hood with a gentle stream of nitrogen gas. Afterwards 45 μL of 10 mmol L−1 CuSO4, 40 μL of 0.2 mol L−1 ascorbic acid (antioxidant to avoid over-oxidation), and 748 μL of 1.1 mol L−1 NaOH were pipetted into the reaction vessel containing the dried DOM. Adding these volumes of reactant to the reaction vessel led to almost identical concentrations of each compared to those from Yan and Kaiser (2018a). The total pipetted volume filled the whole volume of the reaction vessel, leaving no headspace. The sealed reaction vessel was placed in an oven at 150 °C for 2 h, which was found to give the highest yield of lignin phenols according to Yan and Kaiser (2018a). Afterwards the vessel was cooled quickly using water and the reaction product was transferred into a 5 mL combusted glass vial. The vessel was rinsed with 3 × 800 μL of ultrapure water to make sure all of the mixture was transferred to the 5 mL glass vial. The dilution of the reaction vessel products with ultrapure water also avoided flocculation of the DOM upon subsequent acidification (data not shown).

HLB extraction and clean-up
The oxidized organics from the reaction vessel were purified using an HLB cartridge as described in Yan and Kaiser (2018b). Prior to the HLB extraction, 17 μL of 0.25 mmol L−1 cinnamic acid (CIN) was added (to give a final concentration of 21.25 μmol L−1) as an internal standard to account for loss of lignin phenols during the HLB extraction and the subsequent clean-up steps. After addition of CIN, the solution was acidified to pH 2 with 85 μL of 6 mol L−1 H2SO4. The cartridge was placed in the vacuum manifold and cleaned and conditioned with 2 × 1 mL methanol and 2 × 1 mL of acidified ultrapure water (pH ≈ 2, 7.4 mmol L−1 H3PO4). A vacuum hand pump was used to maintain a low flow (pressure at −25 kPa) across the HLB cartridge. The acidified cupric oxidized product solution (approximately 3.2 mL) was pipetted onto the HLB cartridge and subsequently 3 × 300 μL of a 20/80 (vol/vol) methanol/water mixture was applied to remove inorganics and weakly retained compounds (Yan and Kaiser 2018b). Finally, nitrogen gas was applied for 5 min to dry the sorbent. Elution of the sample into a 1.5 mL glass vial was carried out with a 30/70 (vol/vol) methanol/methyl acetate mixture. The addition of methanol to methyl acetate assists with the elution of acidic phenols that are strongly bound to the resin due to hydrogen bonds (Yan and Kaiser 2018b). To remove the residual methanol/methyl acetate from the cartridge, nitrogen gas was briefly applied. The methanol/methyl acetate was evaporated using nitrogen gas, and the sample was reconstituted in 190 μL ultrapure water with pH adjusted to 2.5 with 10 μL of 0.14 mol L−1 H3PO4. The 200 μL aqueous sample containing the oxidized DOM was then transferred to an HPLC vial with a 250 μL insert vial and stored at −18 °C until analysis.
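The stated internal standard concentration can be checked with a short calculation. The sketch below assumes that the 21.25 μmol L−1 refers to the CIN spike carried into the final 200 μL HPLC vial volume; the function name and argument names are illustrative only.

```python
def final_concentration_umol_per_l(spike_volume_ul, spike_conc_mmol_per_l, final_volume_ul):
    """Concentration (umol/L) of a spiked compound after it ends up in final_volume_ul."""
    # uL * mmol/L = nmol, because 1 mmol/L equals 1 nmol/uL
    spike_amount_nmol = spike_volume_ul * spike_conc_mmol_per_l
    # nmol/uL = mmol/L; multiply by 1000 to express the result in umol/L
    return spike_amount_nmol / final_volume_ul * 1000.0


# 17 uL of 0.25 mmol/L CIN, reconstituted in a 200 uL HPLC vial (assumed final volume)
cin_expected = final_concentration_umol_per_l(17.0, 0.25, 200.0)
print(round(cin_expected, 2))
```

Under this assumption the spike corresponds to 4.25 nmol of CIN, which reproduces the expected concentration used later for the recovery correction.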

Analytical standards
Pure lignin phenol monomers were used as analytical calibration standards. The 11 natural lignin phenols are named and abbreviated systematically, first by their chemical group related to the number of methoxy groups on the benzene structure, that is, p-hydroxyl (P), syringyl (S), vanillyl (V), coumaric (C as in CAD), and ferulic (F), followed by the type of functional group, additional to the phenol, that is, acid (AD), aldehyde (AL), and ketone (ON). The internal standard (CIN) was not named according to this system. p-Hydroxybenzoic acid (PAD), coumaric acid (CAD), and CIN were obtained from Merck. p-Hydroxybenzaldehyde (PAL), syringic acid (SAD), vanillin (VAL), and syringaldehyde (SAL) were obtained from Alfa Aesar. Acetovanillone (VON), ferulic acid (FAD), and acetosyringone (SON) were obtained from Acros Organics. p-Hydroxyacetophenone (PON) and vanillic acid (VAD) were obtained from Sigma-Aldrich and TCI Europe, respectively. Stock solutions with a concentration of 1.25 mmol L−1 in ultrapure water were made for each of the 11 lignin phenols and the internal standard. Each solution was stirred with a magnetic stirrer over the course of 2 d to ensure complete dissolution. The stock solutions were afterwards stored at −18 °C until further use.
Calibration curves consisted of 13 concentrations between 1 nmol L−1 and 25 μmol L−1, and all phenols were combined to yield standard mixtures containing different concentrations of each compound (calibration standards 1-13, Supporting Information Table S1). This approach ensured that any effect of coelution between phenols with concentration differences (e.g., 25 μmol L−1 and 10 μmol L−1) could be addressed when applying the 2nd derivative/PARAFAC2 method to create calibration curves. To calculate the limit of detection (LOD), additional calibration standards were made with concentrations mixed between 1 and 25 nmol L−1 (calibration standards 14-18, Supporting Information Table S1).
A standard mixture (20 μmol L−1 of each lignin phenol) was made by diluting the stock solutions in ultrapure water. To evaluate the recovery of lignin phenols and the internal standard in the presence of DOM background matrix, a selected seawater sample (Kattegat, st. 11/19 m) was spiked with the standard mixture so it contained known added concentrations (≈ 5 μmol L−1) of each lignin phenol. Both the standard mixture and the spiked sample were injected after every 7th sample.
To assess the reproducibility of the laboratory and analytical procedures, a solution was made from Suwannee River natural organic matter reference material (SRNOM, Cat. Num. 2R101N). The SRNOM was purchased from the International Humic Substances Society and had been isolated in 2012 using reverse osmosis (Green et al. 2015). One hundred and fifty milligrams of solid SRNOM was dissolved in 25 mL of methanol to yield approximately 250 mmol L−1 DOC. Three replicate samples, made from the SRNOM stock solution, were then subjected to cupric oxidation, clean-up, and analysis. Adding 120 μL of the SRNOM stock solution to the reaction vessel (the stainless steel cylinder for cupric oxidation in the oven) resulted in the addition of approximately 30 μmol DOC, which is similar to that extracted from the natural seawater samples.

Chromatography and detection
HPLC was performed on a Nexera X2 HPLC system (Shimadzu) equipped with LC-20AB pumps using a C18 column (Poroshell 120 EC-C18 4.6 × 100 mm, 2.7 μm particle diameter, Agilent) with a guard column attached. The C18 column and the guard column were heated to 50 °C using a column oven. The two mobile phases used for HPLC consisted of 7.4 mmol L−1 phosphoric acid in ultrapure water (mobile phase A; pH 2.4) and pure acetonitrile (mobile phase B). Many studies use a mixture of acetonitrile and methanol to elute phenolic compounds (Steinberg et al. 1984; Lobbes et al. 1999; Ingalls et al. 2010; Fischer and Höffler 2021). Here, we elected to use pure acetonitrile to speed up the elution and obtain compound absorbance spectra with less solvent interference. The total runtime for each sample was 40 min. Between 0 and 18 min, mobile phase B increases from 5% to 55% (elution of lignin phenols and the internal standard); between 18 and 20 min mobile phase B increases to 100% and afterwards stays at 100% until the 22 min mark (removing all residual organic compounds from the column). Between 22 and 35 min mobile phase B decreases back to 5% and stays there until the 40 min mark to stabilize the pressure before the next run starts. The mobile phases were pumped through the columns at a constant flowrate of 1 mL min−1 and the injection volume was kept at 50 μL for all samples and standards. The diode array detector (DAD; SPD-M30A, Shimadzu) measured absorbance between 240 and 700 nm at a rate of 1.5 Hz.
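The gradient program described above can be written as a small lookup function. This is only a sketch: the breakpoints are taken from the text, but the assumption that the solvent composition changes linearly between breakpoints is ours, and the names `GRADIENT` and `percent_b` are illustrative.

```python
# (time in min, % mobile phase B) breakpoints as described in the text
GRADIENT = [(0.0, 5.0), (18.0, 55.0), (20.0, 100.0),
            (22.0, 100.0), (35.0, 5.0), (40.0, 5.0)]


def percent_b(t_min):
    """Percentage of mobile phase B at time t_min, assuming linear ramps."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-40 min run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


for t in (0.0, 9.0, 19.0, 21.0, 40.0):
    print(t, percent_b(t))
```

With these breakpoints, the lignin phenols and the internal standard (eluting between roughly 8.6 and 16.7 min) all leave the column during the first ramp.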

PARAFAC2 modeling
The isolation of individual elution profiles and spectra of the lignin phenols was carried out using PARAFAC2. The analysis fits the following equation to the chromatographic data using an alternating least squares routine:

$$x_{ijk} = \sum_{f=1}^{F} a^{k}_{if}\, b_{jf}\, c_{kf} + e_{ijk}$$

where $x_{ijk}$ corresponds to the data points in the chromatographic output (sample × retention time × wavelength) from the HPLC-DAD instrument. For $x_{ijk}$, $k$ corresponds to sample, $i$ corresponds to retention time, and $j$ corresponds to absorbance wavelength. Each PARAFAC2 component $f$ is described by three vectors: the elution profile (retention time), $a^{k}_{if}$, the absorbance spectrum, $b_{jf}$, and the concentration, $c_{kf}$. Each element of $x_{ijk}$ is calculated as the sum of the contributions from the predefined number of components, $F$. The superscript $k$ in $a^{k}_{if}$ implies that the elution profiles can deviate slightly from each other between samples, which allows for minor retention time shifts between samples. The residual (unexplained) signal is contained in $e_{ijk}$. The goal is to explain the highest variance in the original data by the PARAFAC2 components and thereby minimize $e_{ijk}$. The algorithm fits models until the improvement between iterations falls below a given convergence criterion. For this study the convergence criterion was set to a relative change in error of $10^{-9}$ or less.
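To make the model structure concrete, the sketch below rebuilds the modeled signal for one sample from given loadings, that is, the sum over components of elution profile × spectrum × score, and computes the sum of squared residuals that the fit seeks to minimize. It does not perform the alternating least squares fit itself, and the toy numbers and function names are purely illustrative.

```python
def reconstruct_sample(A_k, B, c_k):
    """Modeled x[i][j] for one sample k: sum over f of A_k[i][f] * B[j][f] * c_k[f].

    A_k: elution profiles for this sample (retention time x component)
    B:   spectra shared by all samples (wavelength x component)
    c_k: scores for this sample (one value per component)
    """
    n_comp = len(c_k)
    return [[sum(A_k[i][f] * B[j][f] * c_k[f] for f in range(n_comp))
             for j in range(len(B))]
            for i in range(len(A_k))]


def sum_squared_residuals(X_k, X_hat):
    """Sum of squared differences between measured and modeled data."""
    return sum((x - xh) ** 2
               for row, row_hat in zip(X_k, X_hat)
               for x, xh in zip(row, row_hat))


# Toy example: 2 retention times, 3 wavelengths, 2 components
A_k = [[1.0, 0.0], [0.5, 1.0]]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
c_k = [2.0, 3.0]
X_hat = reconstruct_sample(A_k, B, c_k)
print(X_hat)
```

Because the elution profiles are indexed by sample while the spectra are shared, a peak may shift slightly between samples without forcing an additional component into the model.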
Before PARAFAC2 was applied, misalignment between chromatograms was corrected so that the peaks of the internal standard CIN were aligned across all samples (peaks in interval X in Fig. 1a). While this removes the majority of the misalignment between chromatograms (samples), minor peak-specific retention time shifts may remain. To circumvent issues associated with the considerable baseline drift due to changing solvent composition and elution of a broad DOM background signal, the chromatograms were transformed into their 2nd derivative (Fig. 1b).
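The two preprocessing steps can be sketched for a single-wavelength chromatogram as follows. The text does not specify how the shift or the derivative was computed, so the integer-index shift with edge padding and the central finite-difference derivative used here are assumptions, as are the function names.

```python
def align_to_reference_peak(signal, window, target_index):
    """Shift a chromatogram so the maximum inside `window` lands on target_index.

    window is a (start, stop) index pair bracketing the internal standard peak.
    Points shifted in from outside the record are padded with the edge values.
    """
    start, stop = window
    apex = max(range(start, stop), key=lambda i: signal[i])
    shift = target_index - apex
    n = len(signal)
    return [signal[min(max(i - shift, 0), n - 1)] for i in range(n)]


def second_derivative(signal, dt=1.0):
    """Central finite-difference 2nd derivative; the two end points are set to 0."""
    d2 = [0.0] * len(signal)
    for i in range(1, len(signal) - 1):
        d2[i] = (signal[i - 1] - 2.0 * signal[i] + signal[i + 1]) / dt ** 2
    return d2


peak = [0, 0, 1, 3, 1, 0, 0, 0]
aligned = align_to_reference_peak(peak, (0, 8), target_index=4)
drift = second_derivative([0.0, 1.0, 2.0, 3.0, 4.0])  # a linearly drifting baseline
print(aligned, drift)
```

The second call illustrates why the transformation helps: a linearly drifting baseline has a 2nd derivative of zero, so only the curved peak features remain. In practice the same operations would be applied at every wavelength of the DAD data.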
While in principle the PARAFAC2 analysis could be performed on the whole chromatogram across all samples, in practice this is very difficult, computationally slow, and often does not provide robust results due to the very large number of components required. Rather, it is commonplace to divide the chromatogram into peak windows that are subjected to individual analysis (Amigo et al. 2010). In line with this approach, the chromatograms were divided into 10 intervals (Fig. 1a,b) and each interval was characterized by PARAFAC2 independently following a routine created to achieve the best model solution (see Fig. 2). For each interval, 10 PARAFAC2 models were fitted for each number of components from one to four, which resulted in a total of 40 models per interval. The wavelength mode was constrained to non-negativity, while the retention time mode was left unconstrained due to the nature of the 2nd derivative space. The stop criterion for the fitting of the models was set to 3000 iterations, and convergence required a relative change in fit of $10^{-9}$ or less. The PARAFAC2 modeling was performed in MATLAB (version R2021a) using the PLS toolbox (PLS_Toolbox 8.6.1, Eigenvector Research, Inc.).
To assess the spectral character of the components found in each model and compare them to the pure lignin phenol spectra, a Tucker congruence coefficient (TCC) test was applied (Tucker 1951; Lorenzo-Seva and ten Berge 2006). A coefficient of 1 indicates that the spectra are perfectly similar, while a coefficient of 0 indicates that they are completely different. From the 40 models per interval, the best model was selected according to the scheme shown in Fig. 2. First, all models that did not converge were discarded. Second, for each number of components (one to four), only the model with the highest explained variance was retained. From these remaining four models, any model containing two components with similar spectra (TCC > 0.95), which indicates overfitting, was discarded because such models are statistically invalid (Rayens and Mitchell 1997; Lorenzo-Seva and ten Berge 2006). Among the remaining models, the one with the highest explained variance and a spectral loading with a TCC > 0.98 to the pure spectrum of the expected lignin phenol in that interval was chosen as the final model. This process was subsequently automated to run as a script in MATLAB with little user intervention. The MATLAB script is available from Bruhn et al. (2023a).
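The selection scheme can be summarized in a few lines of code. The following is a minimal Python sketch of the logic as described in the text, not the authors' MATLAB script; the dictionary keys (`converged`, `explained_variance`, `spectra`) and the function names are hypothetical, and the toy spectra are invented for illustration.

```python
import math


def tucker_congruence(x, y):
    """Tucker congruence coefficient between two spectra (1 = identical shape)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


def select_model(models, pure_spectrum, overfit_tcc=0.95, match_tcc=0.98):
    """Pick the final model for one interval, or None if no model qualifies."""
    # Steps 1 and 2: drop non-converged fits, keep the best fit per component number
    best_per_f = {}
    for m in models:
        if not m["converged"]:
            continue
        f = len(m["spectra"])
        if f not in best_per_f or m["explained_variance"] > best_per_f[f]["explained_variance"]:
            best_per_f[f] = m
    candidates = []
    for m in best_per_f.values():
        s = m["spectra"]
        # Step 3: two near-identical spectra in one model indicate overfitting
        duplicated = any(tucker_congruence(s[a], s[b]) > overfit_tcc
                         for a in range(len(s)) for b in range(a + 1, len(s)))
        # Step 4: one component must match the pure spectrum of the target phenol
        matches = any(tucker_congruence(sp, pure_spectrum) > match_tcc for sp in s)
        if not duplicated and matches:
            candidates.append(m)
    return max(candidates, key=lambda m: m["explained_variance"], default=None)


pure = [1.0, 2.0, 3.0]
models = [
    {"converged": True, "explained_variance": 90.0, "spectra": [[1, 2, 3]]},
    {"converged": True, "explained_variance": 97.0, "spectra": [[1, 2, 3], [3, 1, 0]]},
    {"converged": True, "explained_variance": 99.0,
     "spectra": [[1, 2, 3], [1, 2, 3.01], [3, 1, 0]]},
    {"converged": False, "explained_variance": 99.9, "spectra": [[1, 2, 3], [3, 1, 0]]},
]
chosen = select_model(models, pure)
print(len(chosen["spectra"]), chosen["explained_variance"])
```

In this toy case the three-component model explains the most variance but is rejected because two of its spectra are nearly identical, so the two-component model is selected.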

Quantification of phenolic monomers
The PARAFAC2 sample scores were used (instead of peak area or height as done with manual integration) in the construction of calibration curves. The calibration curves ranged over more than four orders of magnitude (1 nmol L−1 to 25 μmol L−1). The range was therefore split into three segments, low (1, 2, 5, 10, 25 nmol L−1), middle (25, 50, 200, 1000 nmol L−1), and high (1, 2.5, 5, 10, 25 μmol L−1), and linear regression was performed for each of them. For calculation of lignin phenol concentrations in a specific sample, the appropriate segment was selected based on the PARAFAC2 scores. For the middle and high segments the calibration curves were forced through zero. The calibration standards were run once, in the middle of the sample run.
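A segmented calibration of this kind could be sketched as below. The text does not state the exact rule used to choose a segment from a score, so the rule here (use the first segment whose highest standard score is not exceeded) is an assumption; the scores are synthetic and the function names are illustrative.

```python
def fit_line(conc, score, through_zero=False):
    """Ordinary least squares fit of score against concentration.

    Returns (slope, intercept); the intercept is 0 when through_zero is True.
    """
    if through_zero:
        slope = sum(c * s for c, s in zip(conc, score)) / sum(c * c for c in conc)
        return slope, 0.0
    n = len(conc)
    mc, ms = sum(conc) / n, sum(score) / n
    slope = (sum((c - mc) * (s - ms) for c, s in zip(conc, score))
             / sum((c - mc) ** 2 for c in conc))
    return slope, ms - slope * mc


def concentration_from_score(score, segments):
    """segments: list of (max_score, slope, intercept), sorted by max_score."""
    for max_score, slope, intercept in segments:
        if score <= max_score:
            return (score - intercept) / slope
    # Above the highest standard: extrapolate with the last segment
    _, slope, intercept = segments[-1]
    return (score - intercept) / slope


# Synthetic scores (not measured data): low segment with intercept, high forced through zero
low = fit_line([1, 2, 5, 10, 25], [3, 5, 11, 21, 51])
high = fit_line([1000, 2500, 5000], [3000, 7500, 15000], through_zero=True)
segments = [(51.0, *low), (15000.0, *high)]
print(concentration_from_score(11.0, segments), concentration_from_score(6000.0, segments))
```

Splitting the range keeps the many low-concentration standards from being dominated by the high-concentration points in a single regression.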
The environmental concentration (nmol L−1) of each lignin phenol in the original seawater sample was calculated by multiplying the concentration detected in the HPLC vial by a concentration factor $f_c$:

$$f_c = \frac{V_{elute}}{V_{aliquot}} \times \frac{V_{vial}}{V_{extract}}$$

where $V_{elute}$ is the volume eluted from the PPL cartridge (3 mL), $V_{aliquot}$ the volume of sample added to the reaction vessel for cupric oxidation (0.6 mL), $V_{vial}$ is the volume of the HPLC vial (0.2 mL), and $V_{extract}$ is the volume (mL) of original seawater passing through the PPL cartridge (see Table 1). The concentration of the internal standard (CIN) measured in each sample was used to calculate the recovery of lignin phenols during the HLB extraction and clean-up. The recovery was calculated as the ratio between the detected concentration and the expected concentration of CIN (21.25 μmol L−1) in the HPLC vial. This recovery fraction was then used to correct for loss of natural lignin phenols in the samples by dividing the detected concentrations by it. We note that this entire calculation approach, from HPLC vial to environment, assumes no extraction bias originating from the SPE extraction using PPL (Arellano et al. 2018).
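A worked example may help to follow the back-calculation. The sketch assumes that the concentration factor has the form (V_elute / V_aliquot) × (V_vial / V_extract), which is inferred from the volume definitions in the text, and it uses a hypothetical vial concentration and CIN reading; the function names are illustrative.

```python
def concentration_factor(v_elute_ml, v_aliquot_ml, v_vial_ml, v_extract_ml):
    """Assumed form of f_c: scales a vial concentration back to the seawater sample."""
    return (v_elute_ml / v_aliquot_ml) * (v_vial_ml / v_extract_ml)


def environmental_concentration(c_vial_nmol_l, cin_detected_umol_l, f_c,
                                cin_expected_umol_l=21.25):
    """Recovery-corrected concentration (nmol/L) in the original seawater."""
    recovery = cin_detected_umol_l / cin_expected_umol_l
    return c_vial_nmol_l / recovery * f_c


# 3 mL eluate, 0.6 mL aliquot, 0.2 mL vial, 2.4 L (2400 mL) of seawater extracted
f_c = concentration_factor(3.0, 0.6, 0.2, 2400.0)
# Hypothetical reading: 12 umol/L phenol in the vial, CIN detected at 17 umol/L (80% recovery)
c_env = environmental_concentration(12000.0, 17.0, f_c)
print(f_c, c_env)
```

In this example an 80% recovery raises the vial concentration from 12 to 15 μmol L−1 before the factor of about 4.2 × 10−4 converts it to an environmental concentration of 6.25 nmol L−1.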
TDLP concentration (nmol L−1) was calculated as the sum of the 11 lignin phenol concentrations (TDLP11). The lignin carbon molar ratio (LCMR) was calculated as the ratio between lignin carbon and bulk DOC. For comparability with earlier studies, the TDLP11 was additionally normalized to the bulk DOC of the sample by dividing the TDLP11 in mg by the DOC in g (Λ11: mg g DOC−1).
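The three summary quantities can be illustrated with a short calculation. The sketch below covers only three of the 11 phenols, uses their standard molar masses and carbon numbers, and takes hypothetical concentrations as input; the function and variable names are illustrative.

```python
CARBON_MOLAR_MASS = 12.011  # g/mol

# (molar mass in g/mol, number of carbon atoms) for three of the 11 phenols
PHENOLS = {
    "PAD": (138.12, 7),  # p-hydroxybenzoic acid, C7H6O3
    "VAL": (152.15, 8),  # vanillin, C8H8O3
    "SAL": (182.17, 9),  # syringaldehyde, C9H10O4
}


def lignin_summary(conc_nmol_l, doc_umol_l):
    """Return (TDLP in nmol/L, lambda in mg per g DOC, LCMR as mol C per mol C)."""
    tdlp = sum(conc_nmol_l.values())
    mass_ng_l = sum(c * PHENOLS[name][0] for name, c in conc_nmol_l.items())
    carbon_nmol_l = sum(c * PHENOLS[name][1] for name, c in conc_nmol_l.items())
    doc_ug_l = doc_umol_l * CARBON_MOLAR_MASS
    lam = mass_ng_l / doc_ug_l                     # ng per ug equals mg per g
    lcmr = carbon_nmol_l / (doc_umol_l * 1000.0)   # nmol C per nmol C
    return tdlp, lam, lcmr


# Hypothetical concentrations (nmol/L) and a bulk DOC of 100 umol/L
tdlp, lam, lcmr = lignin_summary({"PAD": 1.0, "VAL": 2.0, "SAL": 1.0}, 100.0)
print(tdlp, round(lam, 3), lcmr)
```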

Manual chromatogram integration
For comparison with the 2nd derivative/PARAFAC2 quantification method, two manual integration methods were also applied to the chromatograms: peak height (apex method) and peak area (perpendicular drop method). Before performing the two manual integration methods, new baselines were created under the chromatographic peaks of the lignin phenols and the internal standard. The baselines were created by drawing a line between the two lowest points on each side of the peak. In case of co-eluting peaks, the baseline was drawn between the two lowest points on each side of all peaks in the co-elution. The peak height in the apex method was measured as the distance between the newly created baseline and the apex of the peak (example in Supporting Information Fig. S1). The perpendicular drop method instead draws lines to the new baseline at the points where the peak starts and ends (example in Supporting Information Fig. S1). The area between these two lines is then calculated to determine the size of the peak. In case of co-elution with other peaks, the lowest point between the two peaks is used to draw the line. The peak height and peak area of each lignin phenol were determined using the extracted wavelength chromatogram at the wavelength of maximum absorbance for that specific lignin phenol (see Table 3 for the wavelength of maximum absorbance of all lignin phenols and the internal standard).

Fig. 1. […] absorbance. Intervals indicate where lignin phenols are eluting and are numbered with Roman numerals from I to X. The chromatograms have been corrected so that the retention times of the CIN peaks (interval X) align across samples. Table 3 lists which lignin phenols elute in a certain interval. The upper black lines indicate the limits of the intervals (which in most cases overlap).
Both manual integration methods were applied to the raw analyte-specific wavelength chromatograms of the calibration standards, standard mixtures, the spiked sample, and the same sample without spike. The derived peak heights and peak areas were then used to plot calibration curves and from these calculate lignin phenol and internal standard concentrations following the same procedure as described for the 2nd derivative/PARAFAC2 method.
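The two manual integration methods can be sketched for a single-wavelength chromatogram as follows. The baseline is the straight line between two anchor points, as described in the text; the use of the trapezoidal rule for the area and all function names are assumptions made for illustration.

```python
def baseline_at(t, anchor_left, anchor_right):
    """Value at time t of the straight baseline drawn between two (time, signal) anchors."""
    (t0, y0), (t1, y1) = anchor_left, anchor_right
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)


def apex_height(times, signal, i_left, i_right):
    """Apex method: distance from the baseline to the highest point of the peak."""
    anchors = (times[i_left], signal[i_left]), (times[i_right], signal[i_right])
    apex = max(range(i_left, i_right + 1), key=lambda i: signal[i])
    return signal[apex] - baseline_at(times[apex], *anchors)


def perpendicular_drop_area(times, signal, i_left, i_right, i_start=None, i_end=None):
    """Perpendicular drop: baseline-corrected area between i_start and i_end.

    i_left and i_right anchor the baseline; i_start and i_end are the drop lines
    (for co-eluting peaks, the valley between two peaks). They default to the anchors.
    """
    i_start = i_left if i_start is None else i_start
    i_end = i_right if i_end is None else i_end
    anchors = (times[i_left], signal[i_left]), (times[i_right], signal[i_right])
    corrected = [signal[i] - baseline_at(times[i], *anchors)
                 for i in range(i_start, i_end + 1)]
    area = 0.0
    for n in range(len(corrected) - 1):
        dt = times[i_start + n + 1] - times[i_start + n]
        area += 0.5 * (corrected[n] + corrected[n + 1]) * dt  # trapezoidal rule
    return area


times = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [1.0, 2.5, 6.0, 3.5, 3.0]  # a peak sitting on a rising baseline
height = apex_height(times, signal, 0, 4)
area = perpendicular_drop_area(times, signal, 0, 4)
print(height, area)
```

Both measures depend directly on where the baseline anchors and drop lines are placed, which is one reason they are sensitive to co-elution and background drift.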

Limit of detection
The LOD was calculated for each lignin phenol based on the low segment calibration curves (calibration standards 14-18, Supporting Information Table S1), by dividing the standard deviation of the calibration curve residuals by the slope of the calibration curve and multiplying by 3 (Shrivastava and Gupta 2011). For LOD determination in the presence of a DOM background, dilution of a natural seawater sample was performed by adjusting the injection volume in the HPLC instrument. A 1- to 100-fold dilution curve was used for this purpose; the dilution mimics the process of extracting less and less DOM from seawater onto the PPL cartridge. The LOD values were then calculated in the same way as for the calibration curves, but based on the residuals of the dilution curve.
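The LOD calculation amounts to three times the residual standard deviation divided by the slope. In the sketch below the residual standard deviation is computed with n − 2 degrees of freedom, which is a common convention but an assumption here, and the data points are synthetic.

```python
import math


def limit_of_detection(conc, response):
    """LOD = 3 * (standard deviation of regression residuals) / slope."""
    n = len(conc)
    mc, mr = sum(conc) / n, sum(response) / n
    slope = (sum((c - mc) * (r - mr) for c, r in zip(conc, response))
             / sum((c - mc) ** 2 for c in conc))
    intercept = mr - slope * mc
    residuals = [r - (slope * c + intercept) for c, r in zip(conc, response)]
    # Residual standard deviation with n - 2 degrees of freedom (assumed convention)
    s_res = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return 3.0 * s_res / slope


# Synthetic low-level calibration points (concentration, response)
lod = limit_of_detection([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.1, 3.9])
print(round(lod, 3))
```

The returned value is in the same units as the concentrations passed in, so nmol L−1 standards give an LOD in nmol L−1.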

Assessment of quantification methods
The results from PARAFAC2 and the two manual integration methods were compared to each other by assessing sensitivity, specificity and recovery for each lignin phenol across the three quantification methods (PARAFAC2, apex method, and perpendicular drop). Sensitivity comparison between the three quantification methods was assessed from the estimated LOD values for each. The specificity for the three quantification methods was assessed as the spectral similarity (TCC, see section "Selection of best PARAFAC2 solution") between the pure spectra and those extracted by the three different methods for each of the lignin phenols and the internal standard. For the perpendicular drop method, normalized spectra at each retention time point across the integrated lignin phenol peak were extracted and the TCC test was performed for each spectrum and a mean TCC value was then estimated for the whole peak. The recovery by the three quantification methods was examined as the ability to accurately find the added known concentrations of the lignin phenols and the internal standard in a spiked sample (see section "Analytical standards" for description of the spiked sample). The spiked sample was repeatedly measured 12 times spread out across a run with 92 samples.

Comparative lignin phenol analysis on GC-MS
We compared results of our newly proposed method for analysis of lignin phenols using HPLC-DAD with a more traditional method utilizing cupric oxide (CuO) oxidation, liquid-liquid extraction and derivatization, followed by GC-MS, based on prior methods (Louchouarn et al. 2000; Benner and Kaiser 2011; Osburn et al. 2016). For this comparison, we used water samples that were collected between 2019 and 2021 from surface waters of the Florida Coastal Everglades (FCE) and Florida Bay, FL. We hypothesized that if these divergent methods produced comparable lignin phenol concentrations and diagnostic ratios on coastal seawater samples, then the HPLC-DAD analysis is robust. Note that we made this comparison to test agreement between the methods and not to assess the accuracy of either method.

Statistical test and data availability
To test for differences between the three quantification methods (PARAFAC2, apex method, and perpendicular drop) and between groups of samples, one-way ANOVA tests and t-tests were performed with a significance level set to a p value of 0.05. The one-way ANOVA tests were performed in MATLAB (version R2021a) using the integrated anova1 function.
The MATLAB script for the 2nd derivative/PARAFAC2 algorithm is available from Bruhn et al. (2023a) and a dataset, including chromatograms from the oxidized DOM samples used in this study, can be downloaded from Bruhn et al. (2023b).

Extracted DOC
The DOC concentration in the original seawater samples varied between 103-227 and 65-72 μmol L−1, respectively, for Kattegat and Davis Strait (Table 1). Therefore, different sample volumes were used to extract DOM onto the PPL cartridges for the two locations, approximately 2.4 and 4.8 L from Kattegat and Davis Strait, respectively. The amount of DOC extracted onto the PPL cartridge was calculated from the difference in the DOC concentration between the originally sampled water and the permeate water (water after PPL extraction has been performed). The extracted amount varied between 50 and 312 μmol DOC for Kattegat and 107 and 245 μmol DOC for the Davis Strait (Table 2). Depending on the sample, either the whole extract (for seven samples) or a 20% aliquot (for five samples) was transferred to the reaction vessel for oxidation. This resulted in the amount of DOC in the reaction vials varying between 18-312 and 21-49 μmol for Kattegat and the Davis Strait, respectively (Table 2). The volume of the reaction vessel was 850 μL, so the concentration of DOC during the oxidation varied between 22-367 mmol L−1 for Kattegat and 25-58 mmol L−1 for Davis Strait (Table 2).
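The bookkeeping in this paragraph can be written out as a short sketch. The function names and the example concentrations are hypothetical; only the 850 μL vessel volume, the 20% aliquot and the 312 μmol upper amount come from the text.

```python
def extracted_doc_umol(sample_umol_per_l, permeate_umol_per_l, volume_l):
    """DOC retained on the cartridge: concentration difference times extracted volume."""
    return (sample_umol_per_l - permeate_umol_per_l) * volume_l


def vessel_doc_mmol_per_l(doc_umol, vessel_volume_ul=850.0, aliquot_fraction=1.0):
    """DOC concentration in the reaction vessel.

    umol / uL equals mol / L, so multiplying by 1000 gives mmol / L.
    """
    return doc_umol * aliquot_fraction / vessel_volume_ul * 1000.0


# Hypothetical sample: 150 umol/L before and 50 umol/L after extraction of 2.4 L
retained = extracted_doc_umol(150.0, 50.0, 2.4)
# Same extract with only a 20% aliquot transferred to the vessel
aliquot_conc = vessel_doc_mmol_per_l(retained, aliquot_fraction=0.2)
# Upper end reported for Kattegat: 312 umol in 850 uL
upper_conc = vessel_doc_mmol_per_l(312.0)
```

With the reported maximum of 312 μmol in the vessel, the sketch returns approximately 367 mmol L−1, matching the upper limit given above.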

Chromatographic separation and PARAFAC2 decomposition
The 11 lignin phenols and the internal standard eluted in 10 intervals between 8.6 and 16.7 min (Fig. 1a) in the order represented in Table 3. Five of the 12 compounds, more specifically PAD, VAD, SAD, PAL, and CIN, were successfully separated by the column. This is particularly clear in the standard mixture in ultrapure water (Fig. 1a, black line). However, the phenols that eluted in intervals V-IX had variable degrees of co-elution. The spectral components of the final PARAFAC2 models for each of the intervals are shown in Fig. 3. For nearly all intervals, more than one component was necessary to explain the elution profiles. For the intervals (VI-VIII) which overlapped, target phenol spectra were repeatedly isolated across the intervals. This can be seen, for example, by the reoccurrence of SAL and FAD in intervals VI and VII, and by VAD and SAD in intervals II and III (Fig. 3). However, in contrast to interval V, where three target lignin phenols (CAD, PON, and VAL) were isolated simultaneously, the models for SAL, FAD, VAD, and SAD had a better fit when they were kept separated in individual intervals. For nearly all intervals there were unidentified co-eluting compounds, whose spectra did not resemble the character of a single analyte species. The PARAFAC2 representation of the chromatograms for each interval reflected the measured data very well for all the samples, reference solutions, and standards, with the final models explaining > 98.7% of the variance in the original data across all intervals. An example of the fit is shown in Fig. 4, where the data in interval V are decomposed from three replicate measurements of the spiked sample. The actual model output is based on the 2nd derivative data, but the reintegrated data are also shown for a more intuitive representation (Fig. 4, column to the right). The results demonstrate how the PARAFAC2 modeling succeeded in splitting the co-eluting peaks into separate peaks in interval V (corresponding spectral components are shown in Fig. 3) and excluded interference from the neighboring peaks. While the CAD elution profile follows a Gaussian-like shape (red profile in Fig. 4d), PON and VAL (blue and magenta profiles in Fig. 4d) indicate a degree of fronting. The fourth component represents the background eluting signal and in this case has an absorption maximum at 258 nm (orange profile in Fig. 4d). Using manual integration, this signal ends up being combined into the others.

Fig. 3. Spectral components found from the PARAFAC2 decomposition of each interval. The components that match the pure spectrum of the lignin phenols are indicated in bold red lines for each interval. For interval V, three phenols are included in the same PARAFAC2 model (blue = PON, red = CAD, magenta = VAL). Non-bold components either represent co-eluting phenols from neighboring intervals, unknown compounds or residual background (not removed by 2nd derivative transformation).

Table 3. Overview of lignin phenols and spectral properties. The lignin phenols are presented in chromatographic order. The interval numerals refer to the highlighted sections in Fig. 1. λmax is the wavelength of maximum absorbance and εmax is the molar absorptivity. Columns: Interval; Lignin phenol name; Abbreviation; λmax (nm); εmax (L mol−1 cm−1).
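The role of the 2nd derivative transformation in suppressing background can be illustrated with a small sketch. It uses a plain central finite difference, which is an assumption made for illustration and is swapped in for whatever smoothing and differentiation scheme the authors' script applies; the peak and baseline parameters are hypothetical.

```python
import math


def second_derivative(y, dx=1.0):
    """Central finite-difference 2nd derivative at the interior points of y."""
    if len(y) < 3:
        raise ValueError("need at least three points")
    return [(y[i - 1] - 2.0 * y[i] + y[i + 1]) / dx ** 2 for i in range(1, len(y) - 1)]


# Hypothetical chromatographic trace: a Gaussian peak on a sloping linear baseline
step = 0.1
t = [step * i for i in range(41)]
peak = [math.exp(-0.5 * ((ti - 2.0) / 0.3) ** 2) for ti in t]
baseline = [0.05 + 0.02 * ti for ti in t]
signal = [p + b for p, b in zip(peak, baseline)]

d2_signal = second_derivative(signal, dx=step)
d2_peak = second_derivative(peak, dx=step)

# A linear baseline has zero 2nd derivative, so both traces agree
max_diff = max(abs(a - b) for a, b in zip(d2_signal, d2_peak))
# The peak apex appears as the most negative point of the 2nd derivative
apex_index = d2_peak.index(min(d2_peak)) + 1  # +1 restores the original indexing
```

A constant or linearly drifting background vanishes after the transformation, whereas curved background contributions remain, which is consistent with the residual background components noted in the Fig. 3 caption.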

Molar absorptivity
Using the calibration standards, the molar absorptivity (εmax) at the wavelength of maximum absorption (λmax) of each phenolic monomer was calculated (Table 3). εmax ranged from 3329 to 11,418 L mol−1 cm−1 and λmax ranged from 254 to 323 nm. It should be noted that εmax depends on the solvent, so the values reported here are specific to this mobile phase elution program. εmax was calculated to give a first indication of which phenolic monomers ought to have the lowest LOD. However, εmax was not a reliable predictor: CAD had the second highest εmax but far from the lowest LOD (Table 4), whereas PAD had the lowest LOD but far from the highest εmax. These contradictions are likely due to the additional factor of high co-elution with neighboring peaks.
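A molar absorptivity of this kind follows from the Beer-Lambert law, A = ε c l. The sketch below estimates ε as the least-squares slope of absorbance against concentration divided by the optical path length. It is a generic illustration rather than the authors' exact procedure; the function name, the default path length and the calibration values are assumptions.

```python
def molar_absorptivity(conc_mol_per_l, absorbance, path_cm=1.0):
    """Least-squares slope of absorbance vs. concentration, divided by path length.

    Returns epsilon in L mol-1 cm-1 (Beer-Lambert: A = epsilon * c * l).
    """
    n = len(conc_mol_per_l)
    if n != len(absorbance) or n < 2:
        raise ValueError("need at least two matching calibration points")
    mean_c = sum(conc_mol_per_l) / n
    mean_a = sum(absorbance) / n
    sxx = sum((c - mean_c) ** 2 for c in conc_mol_per_l)
    sxy = sum((c - mean_c) * (a - mean_a) for c, a in zip(conc_mol_per_l, absorbance))
    if sxx == 0:
        raise ValueError("concentrations must not all be equal")
    return sxy / sxx / path_cm


# Hypothetical calibration at lambda max: 1, 2 and 4 umol/L (illustration only)
conc = [1e-6, 2e-6, 4e-6]
absb = [0.01, 0.02, 0.04]
eps = molar_absorptivity(conc, absb)
```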

Sensitivity, specificity, and recovery
To assess the sensitivity of the three quantification methods (2nd derivative/PARAFAC2 method, apex method and perpendicular drop method), the LOD for each of the lignin phenols and the internal pure water standard were estimated (Table 4). The LOD values across all the lignin phenols were on the whole comparable, indicating little influence of the approach for pure water standards (ANOVA, p = 0.26). In contrast, natural water samples are prone to interference from the background matrix of DOM. The LOD values in the presence of a DOM background were on the whole higher than for the pure water standards (Table 4), and PARAFAC2 tended to give lower LOD values than the two manual integration methods for most of the lignin phenols, except VAD, CAD, and PAL. A statistical comparison of the LOD values found by the three quantification methods in the presence of DOM indicated that PARAFAC2 gave significantly lower values than perpendicular drop (t-test, p = 0.01), but did not differ significantly from the apex method (t-test, p = 0.06). The apex method, on the other hand, was not found to be significantly different from perpendicular drop (t-test, p = 0.14).

Fig. 4. [...] (magenta), and an unknown analyte/background noise (orange), for three measurements of the spiked sample. Output from PARAFAC2 is provided in the 2nd derivative form (c); however, for ease of visualization the output is also shown in its non-derivatized form (d). For both forms, the original data (a, b) are compared to modeled data (e, f), and the residuals (g, h) show the data not explained by the model.
To assess the specificity of the three quantification methods, the spectral similarity between the pure spectra and the spectra extracted by each of the methods was assessed (Table 5; Supporting Information Fig. S2). While the spectra derived from the PARAFAC2 models matched the pure standards (due to the procedure in Fig. 2), the spectra from the manual integration approaches often deviated, in particular if the sample was not spiked (Table 5). This questions the validity of using the manual integration approach without further method optimization for better peak separation. Despite the apparently good LODs of the manual approaches, they are clearly not isolating the signal from the specific phenols (Supporting Information Fig. S2). The recovery of known lignin phenol concentrations in spiked samples using the 2nd derivative/PARAFAC2, apex and perpendicular drop methods was 91-101%, 88-99%, and 83-101%, respectively (Table 6). This recovery is based on the quantification method only and does not include the performance of the HLB column, which is discussed in the next section. For the majority of the lignin phenols the 2nd derivative/PARAFAC2 approach resulted in comparable or significantly better recoveries. The only exceptions were SAD and PAL.

Table 5. Spectral similarity (TCC, where 1 means completely similar) between the normalized pure lignin phenol spectra and the normalized ones extracted from the three different integration methods. For the perpendicular drop method, all spectra at each retention point (n = number of retention points) across the integrated lignin phenol peak were extracted and normalized, the TCC test was performed for each of the spectra against the pure normalized lignin phenol spectrum, and a mean TCC was then calculated.
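The recovery percentages quoted in this section are the quantified concentration relative to the known spiked concentration. A minimal sketch of that calculation, averaged over repeated injections of the spiked sample, is given below; the function names and the measured values are hypothetical.

```python
def recovery_percent(measured, known):
    """Recovery of a single measurement relative to the known spiked concentration."""
    if known <= 0:
        raise ValueError("known concentration must be positive")
    return measured / known * 100.0


def mean_recovery(measurements, known):
    """Mean recovery across repeated injections of the same spiked sample."""
    if not measurements:
        raise ValueError("need at least one measurement")
    return sum(recovery_percent(m, known) for m in measurements) / len(measurements)


# Hypothetical repeated injections of a sample spiked at 5.0 umol/L
injections = [4.5, 5.0, 5.5]
mean_rec = mean_recovery(injections, 5.0)
```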

Loss of lignin phenols during HLB extraction and clean-up
The loss of the lignin phenols and the internal standard purely due to the performance of the HLB cartridge was tested by passing three standard mixture samples (ultrapure water and no cupric oxidation) through the HLB cartridge. The mean recovery on the HLB cartridge was 82% (± 19%) across all lignin phenols and the internal standard (Table 7). As CIN had a mean recovery of 86% (± 19%), using CIN as an internal standard for the HLB extraction and subsequent clean-up steps corrects very well for the loss of the lignin phenols. The effect of the purification using the HLB cartridge was monitored across 44 natural samples (including samples from an unpublished dataset containing marine samples), and the average recovery of CIN for these samples was likewise 86% (± 11%). Of the 44 samples, only four exhibited CIN losses higher than 40%. On the whole, the recovery of lignin phenols during the purification with HLB and the recovery of CIN in environmental samples were very similar to the findings from other studies (Kaiser and Benner 2012; Arellano et al. 2018).

Phenol concentration and composition in samples
The lignin phenol concentrations and indices for the seawater samples are shown in Table 8. For Kattegat, the TDLP11 ranged between 4.34 and 10.09 nmol L−1, while in the Davis Strait TDLP11 was lower and ranged between 2.08 and 3.19 nmol L−1. In all samples, PAD, VAD, and VAL had the highest concentrations, ranging from 1.12-2.54, 1.00-2.53, and 0.69-1.63 nmol L−1, respectively, in Kattegat, and 0.53-0.57, 0.50-0.77, and 0.14-0.59 nmol L−1, respectively, in the Davis Strait samples. The ratio between S (SAD, SAL, SON) and V (VAD, VAL, VON) phenols (S/V) for Kattegat and Davis Strait ranged between 0.15-0.26 and 0.19-0.34, respectively, while the ratio between C (CAD, FAD) and V phenols (C/V) was comparable between the two sites, ranging between 0.08 and 0.17. The acid (Ad) to aldehyde (Al) ratios for S phenols (Ad/Al (S)) for Kattegat ranged between 1.52 and 2.52, which was generally higher than that detected in the Davis Strait samples (1.33-1.64), while the equivalent ratio for V phenols (Ad/Al (V)) differed slightly between the two sites: 0.92-1.98 in Kattegat and 1.29-3.73 in Davis Strait. Carbon normalized TDLP11 values (Λ11) were similar between Kattegat and Davis Strait, ranging between 0.40-0.88 mg g DOC−1. The similarity in Λ11 implies that the fractions of terrestrial DOC are similar in both water masses. The LCMR, the ratio between lignin carbon and bulk DOC, ranged from 0.03-0.07 × 10−3. To assess the reproducibility from cupric oxidation to analysis, we examined three replicate samples from the SRNOM stock solution (concentrations and diagenetic values are shown in Table 9). The mean TDLP11 for the three replicate SRNOM samples was 293.04 μmol mol−1 DOC with a relative standard deviation (RSD) between replicates of 2.46%. Most of the lignin phenols had concentrations ranging between 2.43-26.24 μmol mol−1 DOC and RSD between 2.03% and 16.84%, while only PAD and VAD reached as high as 72.85 (± 11.22%) and 100.35 (± 2.70%) μmol mol−1 DOC. The S/V, C/V, Ad/Al (S), and Ad/Al (V) ratios were 0.24 (± 1.83%), 0.14 (± 3.24%), 6.06 (± 3.24%), and 9.65 (± 13.68%), respectively. Both the S/V and C/V for SRNOM were comparable to the seawater samples, while both the Ad/Al (V) and Ad/Al (S) were notably higher. The SRNOM reference yielded a Λ11 of 3.81 mg g−1 (± 2.46%), which is 4-10 times higher than the seawater samples, reflecting the higher fraction of lignin compared to oceanic DOM, as expected.

Table 6. The recovery (%) of known concentrations (~5 μmol L−1) in a spiked sample by the three quantification methods. The spiked sample was injected 12 times during the whole sample run.
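The diagnostic ratios and the RSD reported in this section can be computed as in the sketch below. The grouping of phenols into S, V and C families follows the text; treating the abbreviations ending in AD as acids and in AL as aldehydes, and using the sample standard deviation (n − 1) for the RSD, are assumptions made for illustration. All concentrations in the example are hypothetical.

```python
import statistics

S_PHENOLS = ("SAD", "SAL", "SON")
V_PHENOLS = ("VAD", "VAL", "VON")
C_PHENOLS = ("CAD", "FAD")


def lignin_ratios(conc):
    """Diagnostic ratios from a dict of phenol concentrations (any consistent molar unit)."""
    s = sum(conc[p] for p in S_PHENOLS)
    v = sum(conc[p] for p in V_PHENOLS)
    c = sum(conc[p] for p in C_PHENOLS)
    return {
        "S/V": s / v,
        "C/V": c / v,
        "Ad/Al (S)": conc["SAD"] / conc["SAL"],  # assumed: acid over aldehyde
        "Ad/Al (V)": conc["VAD"] / conc["VAL"],
    }


def rsd_percent(values):
    """Relative standard deviation (%), using the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


# Hypothetical concentrations in nmol/L (illustration only)
sample = {"SAD": 0.3, "SAL": 0.2, "SON": 0.1,
          "VAD": 2.0, "VAL": 1.0, "VON": 0.5,
          "CAD": 0.2, "FAD": 0.15}
ratios = lignin_ratios(sample)
replicate_rsd = rsd_percent([98.0, 100.0, 102.0])
```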

Assessment of effect of DOC concentration on lignin phenols and ratios
To investigate if there was an effect of DOC concentration in the reaction vessel on the environmental lignin phenol concentrations and diagenetic ratios, all the parameters were plotted against the DOC concentration for the Kattegat samples (Davis Strait samples were excluded to avoid a location bias). From Figs. 5, 6, no visual correlation with increasing DOC concentration in the reaction vessel was observed for any of the parameters. Instead, the samples from Kattegat were divided into two groups, samples with less (n = 3) and samples with more (n = 3) than the mean DOC concentration (60 mmol L−1) in the reaction vessel (see the black line in Figs. 5, 6 for division of samples), to perform a statistical comparison between the two groups. Comparing lignin phenols at their environmental concentration (nmol L−1) showed no statistical difference between the two groups as an effect of DOC concentration (ANOVA, p values between 0.12 and 0.88; see Fig. 5). Similarly, TDLP11 did not differ significantly between the two groups (ANOVA, p = 0.92; Fig. 6), and neither did S/V or the other diagenetic ratios.

Comparison to lignin phenols obtained from GC-MS
While total lignin phenol concentrations obtained by the machine learning-assisted HPLC-DAD method were similar to those determined with an established GC-MS method, there were clear differences in the composition (Fig. 7). SAL, SON, and SAD concentrations compared well between the two oxidation and quantification methods, but there were clear differences in the V and C phenols. The acid, aldehyde and ketone composition of the V phenols differed; however, the sum of V phenols compared well.

Discussion
The majority of the lignin phenols, including the internal standard, were fully resolved in the standard mixtures due to the HPLC separation, with the exception of the three lignin phenols (CAD, PON, VAL) in interval V. The 2nd derivative transformation and PARAFAC2 decomposition helped improve the separation without the necessity of increasing the complexity of the chromatographic method. In contrast, Fischer and Höffler (2021) included multiple elution steps to achieve proper separation of all lignin phenols. Our HPLC separation reduces the run time by 20 min compared to the HPLC-DAD method developed by Lobbes et al. (1999) and is overall similar in run time to the more recent HPLC-DAD method developed by Fischer and Höffler (2021).
There are considerable benefits with respect to sensitivity and spectral confirmation of analytes using PARAFAC2. Skov and Bro (2008) showed that applying PARAFAC2, in contrast to the automated manual integration algorithm provided by ChemStation (software from Agilent Scientific), increased the linearity for co-eluting peaks and thereby potentially improved LOD. Even so, it was apparent from the LOD values in Table 4 that for most of the lignin phenols in ultrapure water the sensitivity was not increased using the 2nd derivative/PARAFAC2 method. The presence of a DOM background overall resulted in a lower sensitivity for all three quantification methods, based on the estimated LOD values. While extrapolating baselines in the manual integration methods is an alternative approach to circumvent the signal from background DOM remaining in the sample after the HLB clean-up, the 2nd derivative/PARAFAC2 method is a far simpler procedure requiring little effort from the operator. In addition, the PARAFAC2 decomposition can better separate interferences of potential co-eluting unknown analytes from the final estimation of lignin phenol concentrations and therefore lead to increased sensitivity (i.e., lower LOD, Table 4). Achieving lower LOD values in the presence of DOM with the 2nd derivative/PARAFAC2 method also implies that the volume of original seawater sample required can be reduced compared to the use of manual integration techniques. Overall, our comparison of manual integration and PARAFAC2 modeling of chromatographic data supports the findings of Amigo et al. (2010), who also showed that PARAFAC2 is a very powerful tool for resolving highly overlapping peaks in chromatographic data. In addition, there is a considerable improvement in data processing, as the need for manual baseline calculation and integration of each peak, and the potential for user bias, are removed.
From Table 5 and Supporting Information Fig. S2 it is clear that the specificity of the 2nd derivative/PARAFAC2 method exceeds that of the manual methods for natural samples. The higher specificity for identification of lignin phenols using the 2nd derivative/PARAFAC2 method probably comes from the mathematical separation of co-eluting analytes and removal of background noise, which for manual integration cannot be removed properly and therefore interferes with the extracted spectrum of the lignin phenol, as seen in Supporting Information Fig. S2 (red lines).

Comparison to existing methods
The calculated LOD values for the 2nd derivative/PARAFAC2 method were an order of magnitude lower, for most of the lignin phenols, compared to previous studies with a similar analytical setup (Steinberg et al. 1984; Lobbes et al. 2000; Fischer and Höffler 2021). The LOD values achieved are comparable to newer MS studies on lignin quantification, where Reuter et al. (2017) found LOD values between 1 and 3 nmol L−1 and Yan and Kaiser (2018b) found between 1 and 8.7 nmol L−1 (10-87 femtomoles in 10 μL injection volume). The RSD for the triplicate SRNOM samples for the whole process, from cupric oxidation to HPLC analysis, ranged between 2.03% and 16.84%, with most of the lignin phenols having an RSD below 3%. Yan and Kaiser (2018b) found the RSD for their Suwannee River humic acid standard to range between 1.1% and 14.9% for the lignin phenols produced from their whole process, with most of the phenols having an RSD over 5%, using an ultra-HPLC-MS/MS setup. From a sensitivity and reproducibility aspect, the proposed HPLC-DAD combined with 2nd derivative/PARAFAC2 data decomposition is therefore competitive with more advanced instrumentation. One of the disadvantages of absorbance compared to MS detection has always been lower specificity. However, application of the 2nd derivative/PARAFAC2 method and spectral matching of isolated components clearly improves the specificity of HPLC-DAD.

Effects of organic matter content on oxidation
Yan and Kaiser (2018a) found that using DOM containing less than 100 μg DOC (8.33 μmol) in their reaction vessel (220 μL; 38 mmol L−1 DOC) not only resulted in substantially lower TDLP values, but also changed the composition and thereby the calculated ratios between lignin phenols, due to over-oxidation. However, they found that adding ascorbic acid and lowering the concentration of NaOH provided more stable results for DOC amounts down to 5 μg (0.42 μmol; 2 mmol L−1 DOC in the reaction vessel). From their study, however, it seems that there may also be an upper limit, with changes in lignin phenol composition at DOC concentrations higher than 38 mmol L−1 in the reaction vessel. Kaiser and Benner (2012) also found, using a slightly different oxidation method, that a minimum of 50 mmol L−1 DOC was required in the reaction vessel to avoid over-oxidation problems and that an increase to 133 mmol L−1 DOC changed neither lignin phenol composition nor TDLP values, although they recommended the addition of glucose as antioxidant for samples with less than 166 mmol L−1 DOC. In our study, the statistical analysis between samples with less and more than 60 mmol L−1 DOC in the reaction vessel indicated no significant difference in lignin phenol concentrations and ratios between the two groups (Figs. 5, 6). These findings suggest no systematic effect of high DOC concentration in the reaction vessel on lignin phenol yields, and that the CuSO4 concentration in the reaction vessel with the current method, similar to that proposed by Yan and Kaiser (2018a), is enough to oxidize large amounts of DOM (up to 367 mmol L−1). This was further confirmed with a larger dataset of field samples (see Supplementary Information Figs. S3, S4). These findings indicate that concerns with DOC concentration in the reaction vessel should mostly be focused on avoiding too small a sample size, which, however, appears to be countered by the addition of an antioxidant.
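The unit conversions quoted in the two paragraphs above (μg of carbon to μmol, amount to vessel concentration, and LOD concentration to amount injected) can be checked with a short sketch. The function names are hypothetical, and the molar mass of carbon (12.011 g mol−1) is the only value not taken from the text.

```python
CARBON_G_PER_MOL = 12.011  # standard atomic weight of carbon


def doc_ug_to_umol(mass_ug):
    """Convert a DOC mass in ug of carbon to umol of carbon."""
    return mass_ug / CARBON_G_PER_MOL


def doc_mmol_per_l(mass_ug, vessel_volume_ul):
    """DOC concentration in the vessel; umol / uL equals mol / L, times 1000 for mmol / L."""
    return doc_ug_to_umol(mass_ug) / vessel_volume_ul * 1000.0


def injected_fmol(conc_nmol_per_l, injection_ul):
    """Amount injected: nmol/L x uL = 1e-9 mol/L x 1e-6 L = 1e-15 mol, i.e. fmol."""
    return conc_nmol_per_l * injection_ul


high = (doc_ug_to_umol(100.0), doc_mmol_per_l(100.0, 220.0))  # about 8.33 umol, 38 mmol/L
low = (doc_ug_to_umol(5.0), doc_mmol_per_l(5.0, 220.0))       # about 0.42 umol, 2 mmol/L
lod_range_fmol = (injected_fmol(1.0, 10.0), injected_fmol(8.7, 10.0))
```

The rounded results reproduce the values given in parentheses in the text: 8.33 μmol and 38 mmol L−1 for 100 μg DOC, 0.42 μmol and 2 mmol L−1 for 5 μg DOC, and 10-87 femtomoles for a 10 μL injection.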

Comparison with measurements from other methods
Overall, the findings in our study agree with other studies that the sum of lignin phenols makes up < 1% of DOC in coastal waters (Harvey and Mannino 2001; Hernes and Benner 2003; Walker et al. 2009; Osburn and Stedmon 2011). The concentrations of lignin phenols found in this study for Kattegat are lower than what has previously been published for the same area. Osburn and Stedmon (2011) found that the sum of S and V phenols (S + V) in Kattegat ranged from 0.97 to 5.08 μg/L, whereas we found S + V to range between 0.13 and 0.29 μg/L. The S/V and Ad/Al (V) ratios for Kattegat agree with those found by Osburn and Stedmon (2011). Similar to Osburn and Stedmon (2011), we also found an inverse relationship between lignin and salinity (see Supporting Information Fig. S5). For the Davis Strait, the individual lignin phenol concentrations varied between 0.04 and 0.77 nmol L−1, while TDLP11 varied between 2.08 and 3.19 nmol L−1, which fits well with other studies in similar regions. Kaiser and Benner (2012) reported 1.9 nmol L−1 for TDLP11 in Arctic Ocean surface waters and Benner et al. (2005) found approximately 2 nmol L−1 for S + V for the Arctic outflow on East Greenland. The S/V ratios (0.19-0.34) are similar to those reported by Benner et al. (2005) for the East Greenland current (0.28) and by Kaiser and Benner (2012) for Arctic Ocean surface waters (0.32).
The comparison of lignin phenol concentrations obtained with those from a well-established GC-MS method that utilizes a different oxidation and clean-up procedure was promising. Overall, both methods showed similar results, in particular for the concentration of TDLP8, individual S phenols, sum of S phenols and sum of V phenols. Interlaboratory comparison between HPLC-DAD and GC-MS was also performed by Lobbes et al. (1999), where they found substantial differences for most of the lignin phenols (as much as 300% difference for PAD). Their outcomes were however similar to the findings here in our study revealing smaller differences when comparing the sum of lignin phenols between HPLC-DAD and GC-MS, instead of individual lignin phenols. Yan and Kaiser (2018b) showed when comparing HPLC and GC coupled with tandem MS, the deviations in oxidation and chromatography only led to differences of 0-16%.
That we were able to show similarities between the HPLC-DAD and GC-MS methods in quantifying lignin phenols holds promise for the future of our new method, though clearly future work is required to identify the reasons for the remaining differences. We speculate that the differences in concentrations obtained by the HPLC-DAD and GC-MS methods are mostly due to the oxidation (using CuSO4 and CuO, respectively) and preparation steps rather than the detection. Yan and Kaiser (2018a) discovered that the oxidation of DOM with CuSO4 compared to CuO mostly led to similar or higher concentrations of lignin phenols, and they found that this was clearest for V phenols. In our study, we only observed a higher concentration of VAD measured by the HPLC-DAD method using CuSO4, compared to the GC-MS method using CuO, whereas the rest of the lignin phenol concentrations were similar or lower. This possibly indicates that over-oxidation is occurring and clearly warrants further investigation and interlaboratory comparison.

Future perspectives
The seawater sample volume requirements for HPLC-DAD have been greatly reduced compared to earlier lignin quantifications, due to the use of smaller reaction volumes for the cupric oxidation and the 2nd derivative/PARAFAC2 method for quantification. Although as much as 2.4-5 L of seawater was initially used for extraction of DOM, lignin phenol concentrations for some samples were determined from only a fifth of the DOM, which equals extraction from 0.5 to 1.0 L. The extraction volume needed will, however, depend on the sample area, as the composition and concentration of DOM and hence lignin phenols can vary greatly. It is therefore necessary to investigate the minimum volume needed for extraction in a particular ocean region. The need for smaller sample volumes will lower the water budget on sampling expeditions and cut the run time for SPE extraction, reducing cost and equipment needed. The lower sample volume also means that lignin measurements can be carried out for small-scale experiment designs, that is, bio- and photodegradation experiments. In addition, switching to an ultra-HPLC setup can lower solvent usage even more and at the same time decrease run time, by increasing pressure, while maintaining resolution (Yan and Kaiser 2018b). Switching to ultra-HPLC can also improve LOD values, as peak resolution will also improve (peaks become taller and narrower).
A challenge with PARAFAC2 is often the skill required of the user to choose the right number of components. However, with the routine proposed here (see Fig. 2) the choice is now automated, and this should allow for wider use in lignin quantification using HPLC-DAD chromatography. The application of PARAFAC2 here also considerably reduces the time needed to process the chromatographic data and limits user bias. This is not the first demonstration of automation of PARAFAC2 in chromatography, as Johnsen et al. (2014) have also shown that they could automate the selection of the right model using a classification model (partial least squares-discriminant analysis) based on seven quality criteria. Moreover, a recent integrated approach titled PARAFAC2 based Deconvolution and Identification System (PARADISe) has encapsulated the complex coding and thereby made the application of PARAFAC2 extremely user-friendly and timesaving, and it has been shown to produce reliable results that are less user-dependent (Petersen and Bro 2018). In addition, Baccolo et al. (2021) have automated the application of PARADISe for untargeted GC-MS analysis, so that it can be applied over the entirety of the chromatogram and extract all relevant spectra, elution profiles and relative abundances of different components, which can be attributed to specific compounds.
The interlaboratory comparison between HPLC-DAD and GC-MS does not indicate which of the two techniques is more accurate, as it was not the purpose of the study and a well characterized community consensus reference material does not yet exist. The study does however highlight the need for such an exercise for further method development and standardization.

Comments and recommendations
Application of the 2nd derivative/PARAFAC2 method to HPLC-DAD chromatograms overall increased performance at lower concentrations in the presence of DOM. The LOD values were significantly improved compared to using the perpendicular drop manual integration method. The 2nd derivative/PARAFAC2 method also improved specificity and thereby confidence in the identification of targeted lignin phenols. The approach opens the opportunity to perform lignin quantification using HPLC-DAD with little effort required for optimizing chromatographic conditions and integrating chromatograms, since it facilitates the separation of highly overlapping peaks that are otherwise challenging to isolate. Applying the proposed PARAFAC2 algorithm to 2nd derivative chromatograms therefore speeds up the process and reduces interference from the background DOM matrix. The interlaboratory comparison further showed that HPLC-DAD with our 2nd derivative/PARAFAC2 method can achieve similar results to GC-MS. This reinforces the use of both techniques for the quantification of lignin phenols.
It was found that varying the sample size, and hence the concentration of DOC, in the reaction vessel did not significantly change the lignin phenol concentrations or diagenetic ratios across samples from Kattegat. This suggests that our cupric oxidation method can be applied to a variety of DOC concentrations without much concern. However, the DOC limits should still be tested for the specific sample area and laboratory conditions. Our 2nd derivative/PARAFAC2 method provides scientists and laboratories with an option to measure lignin in the ocean, even within a complex DOM matrix, using a simple HPLC-DAD setup instead of mass spectrometry. This alternative is cost-effective and eliminates the need for derivatization of the DOM prior to analysis. Finally, the 2nd derivative/PARAFAC2 method provides a powerful way to enhance analytical measurements that span aquatic environments and scientific disciplines, holding promise for other applications that provide a similar data structure, such as the quantification of amino acids and algal pigments.