Source apportionment of polycyclic aromatic hydrocarbons in New York/New Jersey Harbour sediment

Data on polycyclic aromatic hydrocarbons (PAHs) measured in surface sediment and cores in the New York/New Jersey Harbour under the Contamination Assessment and Reduction Project (CARP) were examined via Positive Matrix Factorization (PMF), which revealed six sources. Two represented the higher and lower molecular weight (MW) fractions of coal tar and/or creosote (pyrogenic) sources and explained 30% of PAH mass in the sediment samples. Two sources were related to uncombusted petroleum (petrogenic) sources, such as heavy fuel oil and crude oil, and explained 49% of PAH mass. The final two sources were related to combustion (pyrogenic) sources such as gasoline‐ and diesel‐fuelled vehicles and explained 21% of the PAH mass. Sediment cores revealed that Σ22PAH increased from the pre‐industrial period until about 1980 and then decreased because of efforts to control water pollution via mechanisms such as the Clean Water Act.

sources are characterized by low molecular weight (LMW) and alkylated PAHs (Morrison & Murphy, 2010; Sofowote et al., 2008; Stogiannidis & Laane, 2015; Yan et al., 2006; Yu et al., 2015). Various qualitative and quantitative methods have been used for PAH source apportionment, such as diagnostic PAH ratio approaches (Yan et al., 2006; Yu et al., 2018), multivariate analysis methods (Larsen & Baker, 2003) and carbon isotope ratios (Saber et al., 2004; Yan et al., 2006). Receptor modelling statistical methods, such as Principal Components Analysis (PCA) and Positive Matrix Factorization (PMF), have also been applied for source apportionment of PAHs (Callen et al., 2014; Deka et al., 2016; Luo et al., 2008; Sofowote et al., 2008, 2011; Wang et al., 2013; Yu et al., 2015, 2016). The NY/NJ Harbour is the third largest harbour in the United States and one of the busiest urbanized harbours and industrialized coastal areas in the world (Cannon, 2008; Douglas et al., 2003; Gunster et al., 1993a,b; Mitchell et al., 2018). The NY/NJ Harbour is located in the highly developed industrial/commercial/transportation infrastructure of NY/NJ, which includes a marine port, numerous shipping and loading sites, oil refineries, international airports, former manufactured gas plants and several major highways with a considerable volume of vehicular traffic. Therefore, like other harbours around the world, the NY/NJ Harbour has been impacted by PAHs from multiple sources (Crawford et al., 1995; Gunster et al., 1993a,b; Naumova et al., 2002; Rodenburg et al., 2010; Valle et al., 2007; Yan, 2004; Yan et al., 2006).

The history of this region suggests the likely main sources of PAHs. The island of Manhattan was first settled by Europeans around 1624. The City of New York grew rapidly around this nexus, and after the completion of the Erie Canal in 1825, it became the largest city in the United States. Coal was the main source of fuel for decades, and the coal ash and other trash were disposed of at the city's margins, eventually filling in large areas that were later developed (Campbell-Dollaghan, 2015; Gunster et al., 1993a). During the 1950s, coal was replaced by oil and natural gas as a primary fuel source for the city's buildings (Yan, 2004). The harbour surrounding the city has housed multiple petroleum processing facilities and numerous coal gasification plants. In addition, NYC has always been a transportation hub housing three international airports, shipping terminals and major roadways. The various shipping terminals housed thousands of creosote-impregnated pilings, and the transport of petroleum led to oil spills throughout the harbour.
The Contamination Assessment and Reduction Project (CARP) (Hydroqual Inc., 2007) measured PAHs in air, water, sediment and discharges such as storm water and treated wastewater during 1998-2002 in the NY/NJ Harbour. The purpose of this work was to analyse the CARP database on PAH concentrations in the sediment of the NY/NJ Harbour to understand their sources and examine their spatial and temporal trends. This is the first comprehensive examination of PAH sources in NY Harbour, which is one of the largest harbours in the world by population. It relies on data measured from across the region, including all important sub-bays and tributaries. It also utilizes measurements from sediment cores, allowing an evaluation of the history of PAH pollution in this important region. This work is novel in the examination of a comprehensive data set for the entire harbour, and the results are useful in comparing this harbour to other industrialized estuaries around the world.

| Data collection
Concentrations of 24 PAHs (or PAH alkyl homologues) in 201 sediment samples from 38 sites in the NY/NJ Harbour (Figures 1 and 2) were measured as part of the CARP programme during 2000-2002 and taken from the CARP database (Hydroqual Inc., 2007). The data that support the findings of this study are available by request to the Hudson River Foundation (info@hudsonriver.org). The original purpose of the CARP was to provide data on contaminants in sediments and surface water and inputs (tributaries, treated sewage, etc.) to support the development of a detailed water quality model of the harbour (Contamination Assessment and Reduction Project [CARP], 2007).
The ultimate goal was to identify sources of contamination so that they could be controlled, leading to cleaner sediment that could be safely disposed at a low cost. Two of the PAHs in the database (C2 and C3 phenanthrenes/anthracenes) were not utilized in the PMF analysis because they were measured in very few samples. Sediment samples were collected in 2000 and 2001 in the form of surface sediment (0-10 cm) and sediment cores. Ponar grab samplers (stainless steel) were used to collect most of the surficial sediment samples, although the Smith-McIntyre sampler and modified Van Veen sampler were also used for some surface samples. A stainless-steel box corer was used to collect surficial and less than 40 cm core sediment samples. Deeper sediment cores were collected by electric vibrocore.
Cores were sectioned using a Tech Ops Extruder. Most cores were sectioned into 6 cm slices, although this varied, and deeper sections of cores were cut into larger sections in order to obtain enough contaminant mass for detection. PAHs were measured by SGS AXYS (British Columbia, Canada). Complete method information is given in the references (Hydroqual Inc., 2007; NYS Department of Environmental Conservation, 2003). Briefly, samples were extracted with dichloromethane and cleaned up using silica, gel permeation chromatography and treatment with activated copper. Gas chromatography was performed using an RTX-5 column. Concentrations of PAHs were measured by isotope dilution gas chromatography with low-resolution mass spectrometry in electron impact mode using multiple ion detection (MID), acquiring at least one characteristic ion for each target analyte and surrogate standard. Deuterated compounds were used as surrogates (17 PAHs) and labelled recovery standards (three PAHs). The data underwent an extensive quality assurance audit (Booz Allen, 2003).
This dating scheme is utilized here, but it is important to remember that all dates are approximate.

| PMF analysis
PMF is an advanced modelling tool developed by Paatero and Tapper (1994). PMF has been used to investigate contaminant sources in a variety of environmental media (Callen et al., 2014; Deka et al., 2016; Gupta et al., 2011; Luo et al., 2008; Sofowote et al., 2008, 2011; Wang et al., 2013; Yu et al., 2015, 2016). In PMF, the observation data matrix (X), which is composed of n observed samples and m chemical species, is approximated by the product of two matrices, the source contribution matrix (G) and the source profile matrix (F), as described in Equation (1):

$$X = GF + E \tag{1}$$

where E is the matrix of residuals. The solutions for the G and F matrices are obtained by minimizing the objective function Q, which is described by:

$$Q = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \frac{x_{ij} - \sum_{k=1}^{p} g_{ik} f_{kj}}{s_{ij}} \right)^{2}$$

Q represents the sum of squared deviations between the observations (X) and the model (GF), weighted by the measurement uncertainties (s_ij), where i is the sample, j is the analyte and k runs over the p factors. The error model −14 (EM = −14) was employed for approximating the error matrix to reduce the weight of data that are missing or below the detection limit.
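As a rough illustration of the objective function described above, the following Python sketch evaluates Q for small toy matrices. The function name `pmf_objective` and all numerical values are illustrative assumptions; they are not taken from PMF2 or from the CARP data, and the sketch only evaluates Q for given G and F rather than performing the iterative minimization that PMF2 carries out.

```python
def pmf_objective(X, G, F, S):
    """Return Q, the uncertainty-weighted sum of squared residuals.

    X: n x m observations, G: n x p contributions,
    F: p x m profiles, S: n x m uncertainties (all lists of lists).
    """
    n, m, p = len(X), len(X[0]), len(F)
    q = 0.0
    for i in range(n):
        for j in range(m):
            # Modelled value for sample i, analyte j: sum over the p factors
            modelled = sum(G[i][k] * F[k][j] for k in range(p))
            q += ((X[i][j] - modelled) / S[i][j]) ** 2
    return q


# Toy example (hypothetical values): two samples, two analytes, two factors
G = [[1.0, 0.0], [0.0, 2.0]]
F = [[0.5, 0.5], [0.25, 0.75]]
S = [[0.1, 0.1], [0.1, 0.1]]

X_exact = [[0.5, 0.5], [0.5, 1.5]]   # equals G times F, so Q is zero
X_noisy = [[0.6, 0.5], [0.5, 1.5]]   # one residual equal to one uncertainty

q_exact = pmf_objective(X_exact, G, F, S)
q_noisy = pmf_objective(X_noisy, G, F, S)
print(q_exact, q_noisy)
```

In the toy case a single residual equal to its uncertainty contributes approximately 1 to Q, which is why a larger assigned uncertainty reduces the influence of a data point on the fit.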
Details of PMF modelling are described elsewhere (Du et al., 2008; Paatero & Tapper, 1994; Rodenburg et al., 2011; Sofowote et al., 2008; Yu et al., 2015).

FIGURE 1 Map of the study area showing the New York/New Jersey Harbour and important sample locations.
FIGURE 2 Sum of 24 polycyclic aromatic hydrocarbons (PAHs) measured in surface sediment samples from multiple locations (see map, Figure 1) throughout the New York/New Jersey Harbour.

The PMF2 software of Paatero and Tapper (1994) was used. This programme requires three input data sets: the concentration matrix, the Limits of Detection (LOD) matrix and the uncertainty matrix. All three matrices have the same dimensions, in this case, 201 samples and 22 analytes. For the concentration matrix, one-half of the detection limit was used for measurements that were below the detection limit. The LOD matrix was constructed using the detection limits given in the CARP database. The uncertainty matrix was calculated from the standard deviation of the recoveries of corresponding deuterated surrogates that are listed in Table 1. Detected concentrations were assigned one times this uncertainty. Concentrations below the detection limit were assigned three times this uncertainty. The input matrices are given in Supporting Information. PMF2 was run for 2 to 7 factors. Each number of factors was run 10 times using seed values from 1 to 10. The optimal number of factors was chosen based on the criteria outlined by Reff et al. (2007).
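The construction of the concentration and uncertainty matrices described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `build_pmf_inputs`, the use of `None` to flag a non-detect and all numerical values are hypothetical, and the text does not specify whether the surrogate standard deviation is further scaled by concentration, so it is used here unscaled.

```python
def build_pmf_inputs(measured, lod, surrogate_sd):
    """Build concentration and uncertainty matrices for a PMF run.

    measured[i][j]: concentration, or None if not detected (assumed flag)
    lod[i][j]: detection limit for sample i, analyte j
    surrogate_sd[j]: SD of surrogate recoveries assigned to analyte j
    """
    conc, unc = [], []
    for row, lod_row in zip(measured, lod):
        conc_row, unc_row = [], []
        for value, limit, sd in zip(row, lod_row, surrogate_sd):
            if value is None or value < limit:
                # Below detection: substitute half the LOD, triple the uncertainty
                conc_row.append(0.5 * limit)
                unc_row.append(3.0 * sd)
            else:
                # Detected: keep the value, assign one times the uncertainty
                conc_row.append(value)
                unc_row.append(1.0 * sd)
        conc.append(conc_row)
        unc.append(unc_row)
    return conc, unc


# Hypothetical two-sample, two-analyte example
measured = [[12.0, None], [0.2, 8.0]]
lod = [[1.0, 2.0], [1.0, 2.0]]
surrogate_sd = [0.25, 0.5]

conc, unc = build_pmf_inputs(measured, lod, surrogate_sd)
print(conc)
print(unc)
```

Tripling the uncertainty for non-detects lowers their weight in the objective function, so substituted values have less influence on the fitted factors than measured ones.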
For identification, the cosine of theta (cosθ) was used to estimate the similarity between the PMF-generated fingerprints (the first vector) and the profiles of PAHs in known sources (the second vector). The cosθ ranges from 0 to 1. If cosθ is 1, the two vectors are identical in relative composition. When cosθ is 0, the two vectors are completely dissimilar. Usually, a value of cosθ larger than 0.7 suggests a high similarity between the two vectors (Soonthornnonda et al., 2011). Cosθ is a scale-free measure of proportional connectivity and provides the most meaningful analysis of this data (Weinand, 1974). The cosθ is calculated from

$$\cos\theta = \frac{\sum_{i} F_i S_i}{\sqrt{\sum_{i} F_i^{2}}\,\sqrt{\sum_{i} S_i^{2}}}$$

where F_i and S_i are the PAH percentages in the PMF fingerprint and the known source, respectively.
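The cosθ comparison described above is the standard cosine similarity between two composition vectors, and a minimal Python sketch is given here for illustration. The function name `cos_theta` and the example profiles are hypothetical and are not values from Table S1.

```python
import math


def cos_theta(fingerprint, source):
    """Cosine of the angle between a PMF fingerprint and a source profile.

    Both arguments are sequences of PAH percentages in the same analyte order.
    """
    if len(fingerprint) != len(source):
        raise ValueError("profiles must cover the same analytes")
    dot = sum(f * s for f, s in zip(fingerprint, source))
    norm_f = math.sqrt(sum(f * f for f in fingerprint))
    norm_s = math.sqrt(sum(s * s for s in source))
    if norm_f == 0.0 or norm_s == 0.0:
        raise ValueError("profiles must not be all zeros")
    return dot / (norm_f * norm_s)


# Hypothetical profiles: the second is the first scaled by two,
# so the measure (being scale-free) should be very close to 1.
same_shape = cos_theta([10.0, 30.0, 60.0], [20.0, 60.0, 120.0])

# Profiles with no analytes in common give exactly 0.
no_overlap = cos_theta([100.0, 0.0], [0.0, 100.0])

print(same_shape, no_overlap)
```

Because percentages are non-negative, the result always falls between 0 and 1, and the 0.7 threshold cited in the text can be applied directly to the returned value.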
For identification of the PMF-derived fingerprints, the PAH composition of sources was obtained from the following references: coal was taken from Stout and Emsbo-Mattingly (2008) (2004) and Yan et al. (2006), who examined sediment in the NY/NJ Harbour areas. However, the PAH concentrations were comparable to results elsewhere, such as the Chesapeake Bay (Foster & Wright, 1988) and Yangtze Estuary (Yu et al., 2015). (The sum of 16 EPA priority pollutant PAHs was measured in the studies mentioned above.)

TABLE 1 PAH analytes and their corresponding surrogates, with the standard deviation of the surrogate recoveries, which were used as uncertainty estimates in the PMF analysis.

| Derivation of PMF factors
The optimal number of factors was six (Figure 3), based on several lines of evidence. First, the relative standard deviation of the G matrix was 1.22% for the 10 seed runs, indicating a stable solution. Second, the measured concentrations of PAHs were regressed against the contribution of each factor (the G matrix) using multiple linear regression; for all selected factors, the coefficients were positive and highly

C0N, C1N, C2N, C3N, BIPH, ACEY, ACE and C0F were not measured in Sofowote et al. (2008).

and concentrations of parent PAHs higher than those of their alkylated homologues. Petrogenic sources were represented by two factors (Factors 1 and 5), which are characterized by a high proportion of LMW PAHs with alkylated PAHs more abundant than parents (Boehm et al., 2018; Sofowote et al., 2008; Stogiannidis & Laane, 2015; Wang et al., 2013; Yu et al., 2016). The dominance of PAHs derived from combustion (pyrogenic) over petrogenic PAHs in the NY/NJ Harbour areas was noted by other researchers (Yan, 2004; Yan et al., 2006).
The spatial variations in the factors are discussed in Supporting Information.

| Oil spills
Factors 1 and 5 were identified as spills of relatively fresh petroleum products. Factor 1 (petrogenic) explained 24% of the PAH mass (Figure 3) and was characterized by high loadings of C1P/A (33%), followed by C0P (14%) and C2N (12%). This factor is thought to represent heavy fuel oil. This conclusion is supported by the high loading of alkylated phenanthrene, naphthalene and volatile LMW PAHs, which suggests that this factor is related to fuel oil (Figure 3) (Stogiannidis & Laane, 2015; Yu et al., 2016). Additionally, the highest value of cosθ for this factor was 0.965 versus heavy fuel oils (Table S1; note that naphthalene homologues were excluded from the calculation of cosθ).
Factors 1 and 5 together comprise nearly half of the PAHs in harbour sediment. This high proportion is reasonable considering that the NY/NJ Harbour area has housed multiple oil refineries and oil storage facilities for more than 100 years. Currently, there are about 60 petroleum facilities and several container facilities in the harbour (Crawford et al., 1995;Yan, 2004). Fresh petroleum spills have in the past occurred frequently along the waterway (Crawford et al., 1995;Gunster et al., 1993a,b;Huntley et al., 1995;Yan, 2004). It has been reported that nearly 18 million gallons of petroleum products, including over 12 million gallons of #6 fuel oil, were released in the NY/NJ Harbour Estuary as a result of more than 1453 accidental incidents between 1986 and 1991 (Crawford et al., 1995;Gunster et al., 1993a, b

| Coal related
Factors 2 and 6 are thought to be related to coal-derived input, such as coal tar. Factor 2 (pyrogenic) explained 7% of the PAH mass (Figure 3). It is characterized by high loading of LMW PAHs, such as C0N (33% of the fingerprint), and contained high proportions of C1N (16%), C1P/A (11%) and FLUOR (10%) (Figure 3). This factor is thought to represent the lighter fraction of coal tar. The best match (highest cosθ) for this factor was with coal tar, cosθ = 0.942 (Table S1). Factor 6 is also thought to be related to coal sources. It explained 23% of the PAH mass. It is dominated by HMW parent PAHs such as FLUOR (28%) and PYR (25%), followed by BbjkF, BAP and C0C, suggesting it is pyrogenic. The value of cosθ between Factor 6 and creosote was 0.978 (Table S1). Creosote was typically produced from coal tar. These lines of evidence suggest that Factor 6 represents creosote and, more broadly, the heavy fraction of coal tar (Figure 3) (Evans et al., 2009; Huntley et al., 1995; Stout & Graan, 2010; Stout & Wang, 2016; Wang et al., 2013; Yu et al., 2016).
Together, Factors 2 and 6 explain 30% of the PAH mass in the harbour sediment. This high proportion is reasonable considering that coal has been used extensively as the feedstock for refining kerosene since ca. 1800, as fuel for residential heating and as the feedstock for manufactured gas plants (Hatheway, 2012). The influence of coal-related sources is consistent with the high population and early urbanization of the NY/NJ Harbour area, which housed many manufactured gas plants. Moreover, creosote, which is derived from coal tar, has been commonly used as a preservative for railway ties, pilings and bridges within NY/NJ Harbour (Hatheway, 2012; Yan, 2004; Yan et al., 2006). The large contribution from coal-related sources is in agreement with Valle et al. (2007), who concluded that creosote from marine pilings was responsible for 76% of PAH emissions to the water column and coal tar sealants were responsible for 22% of PAH emissions to land (as opposed to water).

| Vehicle emissions
Combustion in motor vehicle engines is a common source of PAHs (Harrison et al., 1996; Marr et al., 1999; Stout & Wang, 2016). Factors 3 and 4 were compared with the HMW portion of fingerprints of gasoline and diesel emissions reported by Sofowote et al. (2008) (Table S1). Factor 3 was dominated by high loadings of BbjkF (25%), BAA (12%), BAP (10%), C0C (10%), INDP (9.5%), BGHIP (9%) and C0A (3%) (Figure 3). Moreover, C0A (2.92% of Factor 3) is used as a tracer for gasoline (Callen et al., 2014; Sofowote et al., 2008; Yu et al., 2016). Factor 3 explains 14% of the PAH mass in the harbour sediment data set. Factor 4 explained 7% of the PAH mass (Figure 3) and is thought to represent diesel combustion products. The cosθ between Factor 4 and diesel emissions had a value of 0.813 (Table S1), which is similar to the value of 0.810 for the urban background that can include vehicle emissions. Furthermore, BbjkF, BAA, BEP and BGHIP (19%, 15%, 13% and 9%, respectively) dominated Factor 4 (Figure 3). These components are characteristic of diesel combustion (Sofowote et al., 2008; Yu et al., 2015, 2016). Together, the two petroleum combustion factors (3 and 4) explain 21% of PAH mass in the harbour sediment. The contributions of these factors were consistent with the highly urbanized nature of this area with its substantial road networks and vehicular, ship and train traffic, and the usage of diesel in powered engines (Valle et al., 2007; Yan, 2004; Yan et al., 2006).
Their weathered profile was similar to Factors 3 and 4, but this comparison is not exact because this weathered factor was found in the sediment samples that were sometimes exposed to the air allowing weathering, whereas the sediment samples of the NY/NJ Harbour examined in this study were never exposed to air. The author attributed diesel emissions to the increasing movement of large vessels using diesel fuels; gasoline sources were influenced by the highly urbanized nature of the harbour, which has substantial road networks and vehicular traffic.
In contrast, in the Yangtze River Estuary, China, PAHs were attributed to three sources: coal and gasoline combustion, coke plant emissions, and wood or grass combustion, with contributions of 50%, 25% and 25%, respectively (Yu et al., 2016). Yu et al. stated that coal and gasoline combustion sources were impacted by using coal for heating and gasoline emissions from traffic, whereas coke plant emissions were influenced by many coke plants and steel production factories. Because coal is less commonly used as a heating fuel in the United States, our conclusion that the contribution from coal sources in NY/NJ Harbour is lower is reasonable. Wood or grass combustion was influenced by the burning of straw to use as cooking fuel and fertilizer. Because this practice is rare in the NY/NJ Harbour, it is not surprising that our PMF analysis did not generate a factor related to biomass combustion.
In the present work, sources of PAHs related to coal, gasoline and diesel combustion were found to be important, contributing 23%, 14% and 7% of total PAHs, respectively. Differences in the abundance of each type of source between the urban areas mentioned above and the NY/NJ Harbour can be attributed to differences in the intensity of use of these sources across regions. Hudson River Estuary around the 1970s (Chillrud et al., 1999;Eisenbud, 1978;Yan et al., 2006). The closure of the Fresh Kills Landfill occurred in 2001 and therefore is not responsible for the decline of Σ 22 PAHs in this core, which was collected in 2000.

| Jamaica Bay (JB3) core
The Jamaica Bay core is dominated by a spike in Σ 22 PAHs in the 0.4 to 0.75 m slice ( Figure S8). This spike is attributed mostly to Factor 5 (crude oil), with some contribution from Factor 4 (diesel emissions), suggesting that it may be the result of an oil spill that happened between the 1960s and 1990s ( Figure S8) (Gunster et al., 1993a,b).
Below this slice (below 0.75 m), PAHs in this core are around  Figure S9). All six factors followed a similar temporal pattern. This location is reasonably well mixed, such that this core comes closest to characterizing the sources of PAHs to the harbour as a whole. Thus, the decline in PAHs towards the surface of this core may represent the overall improvement in water quality resulting from the implementation of the Clean Water Act in the 1970s and 1980s.

| Newark Bay (NWB01) core
The core was collected in Newark Bay near Bayonne City Park. The deepest slice of this core (0.9 m) reaches pre-industrial levels, with a Σ 22 PAHs concentration of 0.94 μg/g ( Figure S10). There is a spike in Σ 22 PAH concentration in the core slice from 0.24 to 0.36 m, which represents the 1950s or 1960s. This spike is driven primarily by Factor 3 (gasoline emissions) and Factor 4 (diesel emissions). Other than this spike, Σ 22 PAHs concentrations remain largely constant in the upper core. Several studies have suggested that petroleum combustion is one of the major PAH sources in Newark Bay (Huntley et al., 1995;Yan, 2004;Yan et al., 2006).

| Conclusions
Pre-industrial Σ 22 PAHs sediment concentrations were below about 1 μg/g. The concentration of total PAHs began to exceed these levels in the early part of the 20th century, and combustion sources are

The other authors declare no competing financial interest.

We thank the Hudson River Foundation (particularly Jim Lodge), which oversaw the CARP and continues to provide access to the CARP data and documentation.

DATA AVAILABILITY STATEMENT
The data analysed in this manuscript is available from the Hudson River Foundation (email: info@hudsonriver.org).