Untargeted characterisation of dissolved organic matter contributions to rivers from anthropogenic point sources using direct‐infusion and high‐performance liquid chromatography/Orbitrap mass spectrometry

Rationale Anthropogenic organic inputs to freshwaters can exert detrimental effects on aquatic ecosystems, raising growing concern for both environmental conservation and water security. Current regulation under the EU Water Framework Directive (European Union, 2000/60/EC) addresses organic pollution by monitoring selected micropollutants; however, understanding aquatic ecosystem responses requires a comprehensive understanding of dissolved organic matter (DOM) composition. The introduction of high‐resolution mass spectrometry (HRMS) is set to greatly increase our understanding of the composition of DOM of both natural and anthropogenic origin derived from diffuse and point sources.
Methods DOM was extracted from riverine and treated sewage effluent using solid‐phase extraction (SPE) and analysed using dissolved organic carbon analysis, direct‐infusion high‐resolution mass spectrometry (DI‐HRMS) and high‐performance liquid chromatography (HPLC)/HRMS. The data obtained were analysed using univariate and multivariate statistics to demonstrate differences in background DOM, anthropogenic inputs and in‐river mixing. Compound identifications were achieved based on MS2 spectra searched against on‐line databases.
Results DI‐HRMS spectra showed the highly complex nature of all DOM SPE extracts. Classification and visualisation of extracts containing many thousands of individual compounds were achieved using principal component analysis (PCA) and hierarchical cluster analysis. Kruskal‐Wallis analyses highlighted significant discriminating ions originating from the sewage treatment works for more in‐depth investigation by HPLC/HRMS. The generation of MS2 spectra in HPLC/HRMS provided the basis for identification of anthropogenic compounds including pharmaceuticals, illicit drugs, metabolites and oligomers, although many thousands of compounds remain unidentified.
Conclusions This new approach enables comprehensive analysis of DOM in extracts without any preconceived ideas of the compounds which may be present. This approach has the potential to be used as a high throughput, qualitative, screening method to determine if the composition of point sources differs from that of the receiving water bodies, providing a new approach to the identification of hitherto unrecognised organic contribution to water bodies.


| INTRODUCTION
Fresh surface water is a fundamental resource not only for drinking water and irrigation, but also for supporting terrestrial and aquatic ecosystems. 1,2 Dissolved organic matter (DOM) is ubiquitous to all aquatic systems and is an extremely complex mixture of organic compounds although its composition has remained intractable due to the lack of suitable analytical methods. 3 DOM has been asserted to be a nutrient for autotrophs. 2,3 The range of compounds comprising DOM includes compounds generated naturally and through anthropogenic activities, and can include potentially toxic micropollutants which attract much attention in water quality legislation as they have been shown to have adverse impacts upon organisms within the aquatic ecosystems. [4][5][6] Despite individual anthropogenically derived compounds being at low concentration, the chronic exposure of stream biota to these compounds has been shown to have a wide range of acute ecotoxicological and chronic adverse effects on organisms. [4][5][6][7] These include the disruption of reproduction, 8,9 a reduction in biodiversity, 10 and dysmorphia in the maturation of organisms. 11 Furthermore, different compounds which affect organisms in a similar way can work synergistically amplifying the impact. 7,12 Regulations only cover a very minor proportion of the commonly identified micropollutants, and many others almost certainly remain to be discovered.

Micropollutants have been identified in different discharges, including those from sewage treatment works. 13,14 Sewage treatment works have been found to be a major gateway for the release of pharmaceuticals, 15 personal care products, 16 and plasticisers 17 into the environment. The concentration and presence/absence of target compounds have been found to vary between sewage treatment works and over time. 13,15,16,18,19 With over 9000 sewage treatment works in the UK and numerous other point sources, the identification of potentially ecotoxic compounds remains a challenge. Without identifying these micropollutants, the determination of ecotoxicity, effective mitigation solutions and environmental monitoring cannot be carried out.
The most common approach to the determination of organic compounds in both wastewater and the natural aquatic environment is targeted analysis using mass spectrometry (MS) approaches focusing on known or suspected compounds. 4,20 Optimised extraction methods are used to isolate and concentrate the target analytes with subsequent interrogation involving gas chromatography (GC) 21,22 or high-performance liquid chromatography (HPLC) 13,23 linked to MS. Targeted studies have largely focused on pharmaceuticals, 15,24 personal care products, 16,25 and pesticides, 21,26 with their concentrations or load in the riverine environment being used to assess the effectiveness of sewage treatment and local sources. 13,15,27 The obvious limitation of targeted analysis is that it requires a predetermined list of known compounds: it determines only the selected compounds and excludes all others originating from a point source or the environment.
The use of electrospray ionisation (ESI) and high-resolution mass spectrometry (HRMS) has revolutionised the analysis of complex mixtures of water-soluble compounds, such as DOM, allowing the exact masses of individual molecules to be determined. 28,29 The ionisation of intact molecules and their mass analysis using instruments with high resolving power and high mass accuracy mean that each ion in a spectrum potentially corresponds to a unique compound (taking account of adducts and isotopes). Application of this approach has revealed the extraordinary complexity and heterogeneity of DOM in the natural environment, as evidenced by the DI-HRMS spectra containing many thousands of resolved ions. 30,31 One of the major challenges of utilising these HR mass spectra of DOM lies in the interrogation of the data. Attempts have been made to assign formulae to the observed ions in the spectra, using rule-based calculations. [32][33][34] All studies include carbon, oxygen, nitrogen and hydrogen; however, the inclusion of heteroatoms, e.g. P, Cl and S, varies between studies. 30,35,36 Increasing the number of heteroatoms results in an exponential increase in the number of possible formulae for a single ion, resulting in a high level of uncertainty and false positives. 33 Isotopes and adducts, e.g. [M + Na]+, [M + K]+ and [M + Cl]−, will be present in all DI-HRMS spectra, but are rarely accounted for.
Hence, despite the high mass resolution attainable using modern Fourier-transform ion cyclotron resonance (FTICR) or Orbitrap™ MS instruments, the exceptional complexity of the mass spectra obtained largely defies conventional approaches to handling these unusual data sets.
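As an illustration of how adducts and isotopes inflate the ion count, the sketch below flags pairs of ions in a positive-ion peak list whose spacing matches a sodium adduct, a potassium adduct or a 13C isotopologue. It is not part of the workflow described here: the function names, the 5 ppm default and the assumption that the lighter ion of each pair is a singly charged [M + H]+ species are all illustrative choices.

```python
# Monoisotopic mass differences (Da) relative to a singly charged [M + H]+ ion.
NA_MINUS_H = 21.981944     # [M + Na]+ minus [M + H]+
K_MINUS_H = 37.955882      # [M + K]+ minus [M + H]+
C13_MINUS_C12 = 1.003355   # spacing of the first 13C isotopologue

SHIFTS = {"13C isotope": C13_MINUS_C12, "[M+Na]+": NA_MINUS_H, "[M+K]+": K_MINUS_H}


def within_ppm(observed, expected, tol_ppm):
    """True if observed lies within tol_ppm parts per million of expected."""
    return abs(observed - expected) <= expected * tol_ppm * 1e-6


def flag_related_ions(mz_values, tol_ppm=5.0):
    """Return (base m/z, related m/z, label) for every pair of ions whose
    spacing matches one of the known shifts (hypothetical helper)."""
    ions = sorted(mz_values)
    flagged = []
    for i, base in enumerate(ions):
        for other in ions[i + 1:]:
            for label, shift in SHIFTS.items():
                if within_ppm(other, base + shift, tol_ppm):
                    flagged.append((base, other, label))
    return flagged


if __name__ == "__main__":
    # Made-up peak list: one compound seen as four ions, plus an unrelated ion.
    for row in flag_related_ions([301.1410, 302.1444, 323.1229, 339.0969, 450.2000]):
        print(row)
```

In this made-up list, four of the five ions would trace back to a single compound, which is why counting resolved ions overstates the number of distinct molecular species unless such relationships are accounted for.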
An alternative approach is to move toward data visualisation rather than more conventional peak identification approaches. One such approach is the use of van Krevelen diagrams. Such diagrams use the hydrogen:carbon (H/C) and oxygen:carbon (O/C) ratios of the formulae assigned to ions as a basis for the comparison of DOM in water extracts. [37][38][39][40][41] These elemental ratios are used to assign ions to compound classes. 30,31,40,41 However, the interpretation of a van Krevelen diagram relies on the correct assignment of formulae, including appropriate numbers of heteroatoms. Incorrect assignments will lead to inaccurate interpretations of differences in the composition of DOM extracts. Furthermore, a single ion in a DI-HRMS spectrum may be the result of multiple isomers and, therefore, the full complexity is not revealed. In addition, a compound class assigned from a van Krevelen diagram may be correct for one isomer but incorrect for another isomer with the same formula.
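The coordinates of a formula on a van Krevelen diagram follow directly from its element counts: O/C is conventionally plotted on the x-axis and H/C on the y-axis. The following minimal sketch assumes simple formulae without brackets, charges or isotope labels; the function names are hypothetical.

```python
import re


def element_counts(formula):
    """Count atoms in a simple molecular formula such as 'C6H12O6'.
    Brackets, charges and isotope labels are not supported."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def van_krevelen_point(formula):
    """Return (O/C, H/C) for an assigned formula."""
    counts = element_counts(formula)
    carbon = counts.get("C", 0)
    if carbon == 0:
        raise ValueError("formula contains no carbon: " + formula)
    return counts.get("O", 0) / carbon, counts.get("H", 0) / carbon


if __name__ == "__main__":
    # Glucose (carbohydrate-like) and stearic acid (lipid-like) share the same
    # H/C but sit far apart on the O/C axis.
    for formula in ("C6H12O6", "C18H36O2"):
        print(formula, van_krevelen_point(formula))
```

Because the coordinates depend only on the assigned formula, any error in formula assignment moves the point on the diagram, and isomers with identical formulae cannot be distinguished.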
Despite this, van Krevelen diagrams have found utility in visualising differences in the composition of DOM extracts from different aquatic systems, addressing a range of questions relating to DOM source and variability between ecosystems, e.g. differences between water bodies in different geographical locations. 38,42,43 While van Krevelen diagrams have proved useful for visualising differences between DOM extract chemistries, the approach is non-statistical and does not exploit the full complexity of the data, e.g. ion intensities and molecular species without assigned formulae. An alternative, but still less widely applied, approach is multivariate statistics, in particular principal component analysis (PCA) of DI-HRMS spectra. PCA has been used to determine and visualise differences between the composition of DOM extracts from different solid-phase extraction (SPE) methods 44 and different water bodies within the same pristine catchment. 42 PCA requires only the detected ions and their intensities in different DI-HRMS spectra to determine if extracts are different. However, this approach has not been applied to point sources in comparison with their receiving environment.
Herein, we address the challenge of handling the complexity of riverine DOM analysed by HRMS. We have taken a comprehensive approach in order to retain a broad view of DOM composition and developed a method for data reduction based on a difference algorithm to highlight complex anthropogenic DOM contributions against a natural or semi-natural DOM background. To achieve this, we first recorded DI-HRMS spectra of DOM recovered by SPE, then used PCA as a rapid qualitative screening method to determine if differences exist between DOM extracts of point sources and the receiving aquatic environment. Following this, the difference algorithm, employing univariate statistics (Kruskal-Wallis analysis), was applied to allow the anthropogenic point source components to be identified in DI-HRMS spectra. Heatmaps and hierarchical cluster analysis were then used as data visualisation tools, which allowed compositional differences to be recognised.
The filtered water samples (1 L) were acidified to pH 2 using hydrochloric acid (30%, TraceSelect, Sigma-Aldrich, Gillingham, UK) and extracted using Oasis Hydrophilic-Lipophilic Balance (HLB) SPE cartridges (400 mg bed mass, 60 μm particle size, Waters Ltd, Elstree, UK). The cartridges were conditioned using HPLC grade methanol (3 mL, Rathburn Chemicals Ltd, Walkerburn, UK) and HPLC grade water (3 mL) before the acidified filtered water (1 L) was extracted.
After extraction, the cartridges were rinsed with acidified HPLC grade water (3 mL, Fisher Scientific) and dried under vacuum for 30 min. The extracts were eluted from the SPE cartridges with HPLC grade methanol (6 × 1 mL) and dried under a steady stream of nitrogen.
An aliquot of each extract (100 μL) was mixed to create a pooled quality control (QC) and an aliquot of each extract (50 μL) was removed and dried under a steady stream of nitrogen for DOC analysis. The pooled QC and all extracts were then stored at −85°C until required for analysis.

| DOC analysis
The dried 50-μL aliquots of the extracts were dissolved in MilliQ water (20 mL, Merck Millipore) before DOC analysis. Filtered water samples were analysed directly. All analyses were carried out using a TOC-L analyser (Shimadzu, Milton Keynes, UK) using the non-purgeable organic carbon (NPOC) method recommended by the manufacturer for the analysis of environmental water samples. The results, presented in Table 1, are the mean of three to five 150 μL injections, for which the coefficient of variation between replicate injections was <2%.
The total ion chromatogram (TIC) was assessed for any losses in signal during analysis. Extracts were analysed in random order. The mixed QC and calibration solutions were analysed after every five extracts, and the mass drift was 1.8 ppm over all analyses.

| HPLC/HRMS and HPLC/HRMS/MS analysis
The SPE extracts (10 μL) were analysed by HPLC/HRMS using a Dionex Ultimate HPLC system (Thermo Scientific) coupled to an Orbitrap™ Elite Hybrid Ion Trap-Orbitrap™ mass spectrometer with a HESI source. Chromatographic separation used an ACE UltraCore Super C18 column (150 × 2.1 mm i.d., 25 Å particle size; Hichrom, Reading, UK). The column was kept at a constant temperature of 50°C. The gradient program used HPLC grade water as mobile phase A and HPLC grade acetonitrile (Fisher Scientific) as mobile phase B, both with 0.1% formic acid (Fisher Scientific) as a modifier. The flow rate was kept constant at 350 μL min−1. The gradient program was as follows: 5% B for 1 min, 5% to 95% B linear gradient for 30 min, 95% B held for 5 min, return to 5% B in 1 min, and 5% B held for 4 min. All spectra were recorded at a nominal resolving power of 120,000 in positive ion mode for the mass range m/z 150 to 2000 in centroid mode, and the AGC target was set to 1,000,000.
The source voltage was set to 3.5 kV, the source temperature to 80°C, the sheath gas (nitrogen) flow rate to 30 arbitrary units (arb), the auxiliary gas (nitrogen) flow rate to 10 arb, the sweep gas (nitrogen) flow rate to 10 arb, and the capillary temperature to 275°C.
Between each analysis a solvent blank of HPLC water was run to ensure that there was no carry over between samples.
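The gradient program described above can be written as a list of time/composition breakpoints, which makes the total run time (41 min) and the mobile phase composition at any point explicit. This is only an illustrative sketch; the variable and function names are hypothetical and do not correspond to any instrument software.

```python
# (time in min, % mobile phase B) breakpoints read from the gradient description:
# 5% B for 1 min, 5-95% B over 30 min, 95% B for 5 min, back to 5% B in 1 min,
# then 5% B for 4 min.
GRADIENT = [(0.0, 5.0), (1.0, 5.0), (31.0, 95.0), (36.0, 95.0), (37.0, 5.0), (41.0, 5.0)]


def percent_b(t_min, program=GRADIENT):
    """Linearly interpolate % B at t_min minutes after injection."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0.5, 16.0, 33.0, 36.5, 41.0):
        print(t, percent_b(t))
```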

| Data processing
DI-HESI-HRMS files were converted from Thermo .raw to .mzML using MSConvert. All 100 scans were merged using the OpenMS SpectraMerger module in KNIME. 45,46 This was done because the XCMS package for the peak picking of DI-HRMS expects a single mass spectrum. Ion picking and alignment were performed using the XCMS package (v 1.52.0) in R (v 3.4.0) to create a data matrix of ion intensities aligned by mass. 47 The changes in the mass accuracy across the analytical run were assessed using the accurate mass of standard ions, and ions were aligned using a mass tolerance of 5 ppm. An ion was retained only if it was present in at least three of the five replicate DI-HRMS analyses.
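The alignment and replicate filter can be pictured with the simplified sketch below. It is not the XCMS algorithm: it groups m/z values greedily within a ppm window around the running group mean and keeps a group only if enough distinct replicates contribute to it. The function name and the example values are hypothetical.

```python
def ions_in_enough_replicates(replicates, min_present=3, tol_ppm=5.0):
    """Align ions across replicate peak lists and keep those found in at least
    min_present replicates. `replicates` is a list of lists of m/z values.
    Returns the mean m/z of each retained ion, in ascending order."""
    # Tag every m/z with the index of the replicate it came from, then sort by m/z.
    tagged = sorted((mz, rep) for rep, spectrum in enumerate(replicates) for mz in spectrum)
    groups = []
    for mz, rep in tagged:
        if groups:
            members = groups[-1]
            centre = sum(m for m, _ in members) / len(members)
            # Join the current group if within tol_ppm of its running mean.
            if abs(mz - centre) <= centre * tol_ppm * 1e-6:
                members.append((mz, rep))
                continue
        groups.append([(mz, rep)])
    kept = []
    for members in groups:
        if len({rep for _, rep in members}) >= min_present:
            kept.append(sum(m for m, _ in members) / len(members))
    return kept


if __name__ == "__main__":
    # Five made-up replicate peak lists for one extract.
    replicates = [
        [301.1410, 445.1200],
        [301.1412, 445.1201],
        [301.1409],
        [301.1411, 500.0000],
        [445.1199],
    ]
    print(ions_in_enough_replicates(replicates))
```

In this example the ion near m/z 500 appears in only one replicate and is discarded, whereas the other two ions pass the three-of-five criterion.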
The HPLC/HRMS files were likewise converted to .mzML using MSConvert. 48 Peak picking and alignment were performed using the XCMS (v 1.52.0) package in R (v 3.4.0) to create a data matrix of sample intensities aligned by mass and retention time. 47,49,50 The method used for peak picking was the centWave algorithm, which is recommended for peak picking and alignment of HPLC/HRMS data. Peaks were picked above a signal-to-noise (s/n) ratio of 10; the mass tolerance allowed was 10 ppm with a retention time tolerance range of 15 to 60 s.

| DI-HRMS analysis of SPE extracts
The DI-HRMS spectra of the upstream, sewage outfall and downstream extracts are shown in Figure 1.   However, it was quickly recognised that continuing with manual comparisons of this sort across the full spectral range would be prohibitively time-consuming due to the many thousands of ions present in these mass spectra. Set out below is a new protocol for processing such a dataset to allow in-depth interrogation of source contributions.

| Statistical comparisons of DOM based on DI-HRMS spectra
The starting point for the statistical analyses is to establish if differences exist between the compositions of extracts in relation to the ions present and their intensities. This begins with the creation of a data matrix of the ions aligned by their accurate masses and intensities for each DI-HRMS spectrum. After this "peak picking" step the DI-HRMS spectra were aligned to reveal 3237 ions detectable above a s/n ratio of 5. PCA was then applied to the generated data matrix to initially assess whether differences existed in composition between the extracts.
Even higher complexity is revealed through HPLC/HRMS than was apparent in the DI-HRMS spectra. The "peak picking" algorithm detected 14,325 individual components across all extracts, which were recognised by aligning their unique masses and retention times (m/z@rt), producing a second data matrix of peak areas. Each component's peak area was compared in ratio form across the three different extracts using a ternary plot (Figure 5). The components found in each of the three extracts show three main trends. (i) The green area of the ternary plot highlights components where <5% of the total peak area is attributable to the upstream extract, confirming that these components derive from the sewage outfall and downstream extracts. As shown by the ternary plot, most components have a higher contribution from the sewage outfall, as these plot between 50 and 100% on the axis of the sewage outfall.
This reflects their absence/low abundance in the river background (upstream), high abundance in the sewage effluent, and reduced abundance downstream due to in-stream dilution. (ii) The blue area of the plot highlights components where <5% of the total peak area is attributable to the sewage outfall. This shows that these components are predominantly found in the river (downstream and upstream extracts). (iii) The red area highlights components where >5% of the total peak area is found in each of the three sources, showing that these components are common to all SPE extracts.
Of the 96 components identified, 72 related to the polymer PPG, alluded to above and discussed further below. The other 24 compounds were a mixture of pharmaceuticals, illicit drugs, flame retardants and metabolites, as summarised in Table 2. Twenty-two of the compounds characterised have been previously identified in other sewage treatment effluents and/or surface water. 13,15,60,61 Two novel compounds were identified, namely the antiretroviral raltegravir and piperine, which is a natural product derived from black pepper.
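The three regions of the ternary plot described above amount to a simple rule on each component's share of the total peak area. The sketch below applies the 5% thresholds from the text; the function name, the class labels and the handling of components that fall outside the three described regions are assumptions made for illustration.

```python
def classify_component(upstream, outfall, downstream, threshold=0.05):
    """Classify one component from its peak areas in the three extracts.
    Labels are hypothetical names for the green, blue and red regions."""
    total = upstream + outfall + downstream
    if total <= 0:
        raise ValueError("component has no peak area in any extract")
    up, out, down = upstream / total, outfall / total, downstream / total
    if up < threshold:
        # Green region: <5% of the total area from the upstream extract.
        # Assumption: this rule takes priority if the outfall share is also <5%.
        return "sewage-derived"
    if out < threshold:
        # Blue region: <5% of the total area from the sewage outfall.
        return "river-derived"
    if up > threshold and out > threshold and down > threshold:
        # Red region: >5% of the total area in each extract.
        return "common"
    # Not covered by the three regions described in the text.
    return "unclassified"


if __name__ == "__main__":
    print(classify_component(0.0, 8.0e6, 2.0e6))
    print(classify_component(5.0e6, 1.0e5, 4.9e6))
    print(classify_component(3.0e6, 4.0e6, 3.0e6))
```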
The antiretroviral raltegravir was tentatively identified based on multiple CID spectra recorded at a range of energies. Further evidence for the identification of raltegravir was obtained using higher energy collision dissociation at the same collision energies used to obtain the reference spectra recorded in mzCloud (10-100 eV).
ii. Manual assessments of the DOM composition, while revealing specific spectral features driving differences in DOM composition, emphasise the need to use chemometric statistical methods to interrogate datasets of this complexity.
iii. PCA of the DI-HRMS spectra was readily able to resolve the different DOM sources, including in-stream mixing.
Hierarchical cluster analysis showed that the composition of the downstream DI-HRMS spectra was more similar to that of the sewage outfall spectra than those of the upstream extracts, confirming the importance of the point source contribution to the overall DOM.
iv. Heatmapping facilitated visualisation of the changes in the intensity of ions between DI-HRMS spectra including the determination of ion intensity changes which were not readily identifiable by directly comparing the DI-HRMS spectra.
v. Comparison of the sewage outfall and upstream DI-HRMS spectra using Kruskal-Wallis analysis provided a critical statistical data reduction step to identify the most important molecular species driving the differences in composition between the DOM extracts. Others remain to be investigated to determine their environmental behaviour and potential ecosystem impact in waters.
viii. Industrially produced oligomeric PPGs were identified using DI-HRMS and HPLC/HRMS in sewage effluent for the first time.
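The Kruskal-Wallis data reduction step referred to in point v can be sketched as follows for the two-group case (sewage outfall versus upstream replicates). This is a standard-library illustration rather than the implementation used in the study: the function names, the significance level of 0.05 and the absence of any multiple-testing correction are assumptions.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for two groups. The p-value uses the chi-square
    approximation with one degree of freedom."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    sum_a, sum_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (sum_a ** 2 / len(a) + sum_b ** 2 / len(b)) - 3 * (n + 1)
    # Standard correction for tied values.
    tie_counts = {}
    for value in pooled:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:
        return 0.0, 1.0  # all values identical, no evidence of a difference
    h = max(h / correction, 0.0)
    return h, math.erfc(math.sqrt(h / 2))


def discriminating_ions(intensities, alpha=0.05):
    """`intensities` maps m/z -> (outfall replicates, upstream replicates).
    Returns the m/z values whose intensities differ at the chosen alpha."""
    return [mz for mz, (outfall, upstream) in intensities.items()
            if kruskal_wallis_two_groups(outfall, upstream)[1] < alpha]


if __name__ == "__main__":
    # Made-up intensities for two ions, five replicates per extract.
    example = {
        309.2: ([9.1e5, 8.7e5, 9.4e5, 8.9e5, 9.0e5], [1.1e4, 0.9e4, 1.3e4, 1.0e4, 1.2e4]),
        411.1: ([5.0e4, 5.3e4, 4.8e4, 5.1e4, 4.9e4], [5.2e4, 4.7e4, 5.0e4, 5.4e4, 4.95e4]),
    }
    print(discriminating_ions(example))
```

With five replicates per group, complete separation of the two groups gives the smallest p-value the test can produce, so the choice of significance level and any correction for testing thousands of ions would need to be considered in practice.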
Overall, the results demonstrate that considerable value exists in combining DI-HESI-Orbitrap™-HRMS and HPLC/HESI-Orbitrap™-HRMS for the analysis of complex DOM extracts. Our approach also highlights the value of applying statistical approaches to the assessment of complex data sets to determine the components differing between sources. Such an approach would have value in assessing compositional differences of any point source in river systems or between temporal events driven biologically, seasonally and/or anthropogenically.