Mining airborne particulate size distribution data by positive matrix factorization


  • Liming Zhou,

    1. Center for Air Resources Engineering and Science and Department of Chemical Engineering, Clarkson University, Potsdam, New York, USA
    2. Now at Providence Engineering and Environmental Group LLC, Baton Rouge, Louisiana, USA.
  • Eugene Kim,

    1. Center for Air Resources Engineering and Science and Department of Chemical Engineering, Clarkson University, Potsdam, New York, USA
  • Philip K. Hopke,

    1. Center for Air Resources Engineering and Science and Department of Chemical Engineering, Clarkson University, Potsdam, New York, USA
  • Charles Stanier,

    1. Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
    2. Now at Department of Chemical and Biochemical Engineering, University of Iowa, Iowa City, Iowa, USA.
  • Spyros N. Pandis

    1. Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA


[1] Airborne particulate size distribution data acquired in Pittsburgh from July 2001 to June 2002 were analyzed as a bilinear receptor model solved by positive matrix factorization (PMF). The data were obtained from two scanning mobility particle spectrometers and an aerodynamic particle sampler with a temporal resolution of 15 min. Each sample contained 165 size bins from 0.003 to 2.5 μm. Particle growth periods in nucleation events were identified, and the data in these intervals were excluded from this study so that the size distribution profiles associated with the factors could be regarded as sufficiently constant to satisfy the assumptions of the receptor model. The values for each set of five consecutive size bins were averaged to produce 33 new size intervals. Analyses were made on monthly data sets to ensure that the changes in the size distributions from the source to the receptor site could be regarded as constant. The factors from PMF could be assigned to particle sources by examination of the number size distributions associated with the factors, the time frequency properties of the contribution of each source (Fourier analysis of source contribution values), and the correlations of the contribution values with simultaneous gas phase measurements (O3, NO, NO2, SO2, CO) and particle composition data (sulfate, nitrate, organic carbon/elemental carbon). Seasonal trends and weekday/weekend effects were investigated. Conditional probability function analyses were performed for each source to ascertain the likely directions in which the sources were located. Five factors were separated. Two factors, local traffic and nucleation, are clear sources, but each of the other factors appears to be a mixture of several sources that cannot be further separated.

1. Introduction

[2] There have been many studies that find relationships between elevated morbidity and mortality and higher particulate matter (PM) concentrations [Dockery and Pope, 1994; Pope et al., 1995; Brunekreef et al., 1995; van Bree and Cassee, 2000]. In order to develop an effective control strategy for airborne particles, the relationship between the sources and receptor concentrations needs to be understood.

[3] Positive matrix factorization (PMF) is a powerful tool for solving receptor models with aerosol composition data and has been used successfully in identifying the sources of the airborne particles in many studies [Xie et al., 1999; Lee et al., 1999; Ramadan et al., 2000; Chueinta et al., 2000; Polissar et al., 2001; Song et al., 2001]. Recently, the aerosol size distribution data have been analyzed by principal component analysis [Ruuskanen et al., 2001; Wahlin et al., 2001] and PMF [Kim et al., 2004].

[4] Particles in different size ranges have distinct characteristics, so the size ranges usually need to be classified before detailed analysis. A three-modal structure has been widely used to describe continental particle number size distributions, comprising a nucleation mode at 10–20 nm, an Aitken mode at 40–80 nm, and an accumulation mode at 100–300 nm [Whitby, 1978; Raes et al., 1997; Hussein et al., 2004]. Lognormal distribution functions representing the three modes were used to parameterize the size distribution, and the modes fit by these functions were thought to be associated with particle sources and with processes during transport [Birmili et al., 2001]. Using constant modal distribution functions also proved possible if particle growth from the nucleation mode to the Aitken or even the accumulation mode was not considered [Mäkelä et al., 2000]. For size distributions with a stable structure, using variable modal distributions instead of constant ones tends to overfit the data: small random variations of the modes and even measurement errors may be fit, and consequently the features of the modes may be smeared.

[5] The Pittsburgh supersite was located in a strong anthropogenic source region, and the size distribution measured there cannot be fit well by the aforementioned three-modal distribution [Stanier et al., 2004c]. Thus some other methods need to be applied for classifying the size ranges.

[6] Previously, size distribution data from July 2001 measured by the Pittsburgh Air Quality Study (PAQS) have been analyzed to identify the particle sources by PMF [Zhou et al., 2004]. Five sources were identified: secondary aerosol (from distant sources), local stationary combustion, remote traffic (probably from the interstate highway 1 mile away), local traffic (from the street and minor roads close to the receptor site), and nucleation, with decreasing sizes. The PMF method has successfully divided the particles into the five classes on the basis of both their size ranges and temporal behavior without prior assumptions such as lognormal distribution and the number of modes. The PMF analysis requires stationary or quasistationary size distributions measured at the receptor site. In a short period such as one month, conditions that might affect particle size changes such as photochemical activity or temperatures may be taken as relatively constant, and the change of the particle size distribution can be thought to be sufficiently constant to permit the PMF analyses.

[7] Our recent study has indicated linear relationships between size distribution data and chemical composition data simultaneously measured at the Pittsburgh supersite [Zhou et al., 2005], and this is a direct proof of the stationarity of the size distribution.

[8] In this study, a larger data set from PAQS, containing 1 year of size distribution data from July 2001 to June 2002, has been analyzed for source identification. Data mining is a process to discover patterns and relationships in data using various tools. The tools used in this study, including PMF, will be introduced in the next sections. Each factor found by PMF can be thought of as a pattern that represents the variations of a source or a group of sources. The possible sources associated with each factor (or pattern) will be investigated.

[9] Over a full year the atmospheric processes influencing the size distributions vary significantly, and our previous efforts in directly analyzing the full year data set proved to be inappropriate [Zhou et al., 2003]. Therefore this analysis will be performed on a month-by-month basis.

[10] In our previous work [Zhou et al., 2004], the days with extensive nucleation events, especially those events followed by particle growth, were excluded, and this leads to an incomplete description of the summer situation at the Pittsburgh area. In this study, a special method has been designed to remove only the data representing particle growth events, and thus a more thorough and complete investigation will be made.

2. Description of the Data

[11] The receptor site was located in Schenley Park, Pittsburgh (latitude 40.4395°, longitude −79.9405°). An overall summary and preliminary results for PAQS were given elsewhere [Wittig et al., 2003]. The size distribution data were obtained from July 2001 to June 2002. Samples were collected and measured continuously every 15 min. The data were from two scanning mobility particle spectrometers (SMPS) and an aerodynamic particle sampler (APS). Above 583 nm, the data used in this study represent the electrical mobility diameter inferred from aerodynamic mobility and estimated density [Khlystov et al., 2004]. The ratio of APS diameter and SMPS diameter is around 1.3. The samples were collected at 25% relative humidity, and “dry” particle distributions were obtained [Stanier et al., 2004a].

[12] The gas phase concentrations (O3, NO, NOx, SO2, and CO, with a 10 min resolution), particle mass and composition (PM2.5, sulfate, and nitrate, with 10 min time resolution; organic carbon (OC) and elemental carbon (EC), with 2 to 4 hour resolution), and meteorological conditions (wind direction, wind speed, etc., with 10 min resolution) were measured at the same time and location as the size distribution data. A detailed description of the instruments and sampling methods can be found elsewhere [Wittig et al., 2003; Stanier et al., 2004b]. Each size distribution sample contained 165 geometrically equal sized intervals covering the particle size range of 0.003–2.5 μm.

3. Exclusion of Data Representing Particle Growth

[13] Nucleation events observed at the Pittsburgh site were classified as regional and short-lived by Stanier et al. [2004b]. In a regional nucleation event, it is often observed that the newly formed particles continue growing. Particle growth events are confined to limited time intervals and size ranges from over 10 nm up to accumulation mode size. A typical growth event after the nucleation is shown in Figure 1, where the number concentrations on 2 July 2001 are shown. These particle growth events were also observed at many other places, and the experimental and theoretical studies on this phenomenon were recently reviewed by Kulmala et al. [2004]. Discussions of the mechanisms of these events are beyond the scope of this study.

Figure 1.

Typical particle growth on 2 July 2001.

[14] Receptor models require that the property being apportioned be stationary or, in this case, that there must be constant size distribution profiles associated with all factors. The temporal variations of the size distribution caused by particle growth events will deform the model solutions and make the sources unidentifiable. Particle growth during a regional nucleation event was identified, and the data representing the growth were removed by the following method. The number-concentration-weighted mean diameter is defined by the following equation:

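$$ d_{p,\mathrm{av}} = \frac{\sum_i N_i \, d_{p,i}}{\sum_i N_i} \tag{1} $$

where N_i is the number concentration in size bin i and d_{p,i} is the diameter of that bin.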

In addition to the variation of number concentration dN/dt, the number-concentration-weighted diameter can also be used to investigate the nucleation event. If there is a nucleation event, the number concentration increases sharply, and the mean diameter should simultaneously drop sharply since the newly formed particles have the smallest sizes and the largest number concentrations.
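The behavior described above can be illustrated with a minimal sketch. It assumes that equation (1) is the arithmetic mean of the bin diameters weighted by bin number concentration; the function name `weighted_mean_diameter`, the five-bin spectra, and all numerical values are hypothetical and are not taken from the Pittsburgh data.

```python
def weighted_mean_diameter(diam_nm, number_conc, d_min_nm=0.0):
    """Mean diameter weighted by number concentration, for bins larger than d_min_nm."""
    kept = [(d, n) for d, n in zip(diam_nm, number_conc) if d > d_min_nm]
    total = sum(n for _, n in kept)
    if total <= 0:
        raise ValueError("no particles above the size cutoff")
    return sum(d * n for d, n in kept) / total


# Hypothetical five-bin spectra: diameters in nm, concentrations in cm^-3.
diam = [4.0, 8.0, 20.0, 80.0, 200.0]
before = [100.0, 200.0, 500.0, 300.0, 50.0]
burst = [100.0, 5000.0, 500.0, 300.0, 50.0]  # sudden excess of 8 nm particles

# Total number concentration and mean diameter above a 6 nm cutoff.
n6_before = sum(n for d, n in zip(diam, before) if d > 6.0)
n6_burst = sum(n for d, n in zip(diam, burst) if d > 6.0)
dp6_before = weighted_mean_diameter(diam, before, d_min_nm=6.0)
dp6_burst = weighted_mean_diameter(diam, burst, d_min_nm=6.0)
print(n6_before, n6_burst, round(dp6_before, 1), round(dp6_burst, 1))
```

In this toy case the number concentration rises while the mean diameter falls, which is the signature of a nucleation burst described in the text.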

[15] Since many particle growth events occurred in the size range above 6 nm, the total number concentration and mean diameter above 6 nm, N6 and dp6,av, were used. Here, N6 is the total number concentration over 6 nm; dp6,av is computed by equation (1) for all particles larger than 6 nm. The beginning of particle growth is defined quantitatively as the point at which no significant decrease of mean diameter occurs when the number concentration has a sharp increase and the mean diameter rises gradually afterward. In Figure 2, t1 is found to be the start of particle growth. The mean diameter dp6,av does not change with time when particle growth becomes stable. A polynomial regression of log dp6,av against time was made to estimate the time after which the variation in mean diameter is sufficiently small, as shown in Figure 2. When the first derivative of the regressed value is below a criterion value, such as at t2 in Figure 2, the growth can be considered to have terminated.
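The termination criterion can be sketched as follows. This is only an illustration under stated assumptions: the polynomial degree, the base-10 logarithm, the criterion value of 0.02 (in log10 nm per hour), and the helper names `polyfit` and `growth_end_time` are all hypothetical, since the text does not specify them.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients c[0] + c[1]*x + ... via normal equations."""
    m = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    aug = [[sum(x ** (r + c) for x in xs) for c in range(m)]
           + [sum(y * x ** r for x, y in zip(xs, ys))] for r in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[r][m] / aug[r][r] for r in range(m)]


def growth_end_time(times_h, mean_diam_nm, degree=3, criterion=0.02):
    """First time at which the fitted d(log10 dp)/dt falls below the criterion."""
    coef = polyfit(times_h, [math.log10(d) for d in mean_diam_nm], degree)
    for t in times_h:
        slope = sum(k * coef[k] * t ** (k - 1) for k in range(1, len(coef)))
        if slope < criterion:
            return t
    return None


# Hypothetical growth curve at 15 min resolution: log10(dp) = 1 + 0.3 t - 0.025 t^2.
times = [0.25 * i for i in range(25)]
diam = [10 ** (1.0 + 0.3 * t - 0.025 * t * t) for t in times]
t2 = growth_end_time(times, diam, degree=2, criterion=0.02)
print(t2)
```

A low-order polynomial keeps the fitted derivative smooth, so the criterion is not triggered by scatter in individual 15 min values.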

Figure 2.

Number concentration and mean size variations on 2 July 2001.

[16] By assuming the number concentration distribution versus dp6,av to be lognormal (during the particle growth after nucleation, the number concentration distribution above 6 nm is usually unimodal), we define the data within ±1.2σ of the regressed dp6,av and between t1 and t2 as representing particle growth, where σ is the standard deviation of the lognormal distribution for each time interval. Figure 3 shows the result of this definition for 2 July 2001. The data thus defined were treated as missing values: they were replaced by the mean values and assigned high uncertainties (low weights), such as 10 times their concentrations. Thus these data do not influence the PMF result and are not reflected in the factors obtained by PMF. The data before and after this operation were inspected visually, and sometimes the parameters were adjusted manually to properly define the particle growth zone. If a regional nucleation event was not followed by particle growth, the corresponding data were not processed by this method. The days processed by this method are listed in Table 1. This table shows that this kind of particle growth rarely happened in winter.
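The treatment of the growth zone as missing values can be sketched for a single 15 min spectrum between t1 and t2. Several details here are assumptions rather than the authors' implementation: σ is taken in natural-log diameter space, the replacement "mean values" are taken to be per-bin means supplied by the caller, and the function name `mask_growth_zone` and all numbers are hypothetical.

```python
import math


def mask_growth_zone(diam_nm, spectrum, uncert, d_reg_nm, sigma_ln,
                     bin_means, width=1.2):
    """Return copies of spectrum and uncert with growth-zone bins treated as missing."""
    new_spec, new_unc = list(spectrum), list(uncert)
    for j, d in enumerate(diam_nm):
        # Bin lies within +/- width*sigma of the regressed mean diameter (log space).
        if abs(math.log(d) - math.log(d_reg_nm)) <= width * sigma_ln:
            new_unc[j] = 10.0 * spectrum[j]  # high uncertainty means low weight
            new_spec[j] = bin_means[j]       # value replaced by the bin mean
    return new_spec, new_unc


# Hypothetical spectrum for one time step inside the growth period.
diam = [5.0, 10.0, 20.0, 40.0, 80.0]
spec = [100.0, 400.0, 900.0, 300.0, 50.0]
unc = [10.0, 40.0, 90.0, 30.0, 5.0]
means = [120.0, 150.0, 200.0, 180.0, 60.0]
masked_spec, masked_unc = mask_growth_zone(diam, spec, unc, d_reg_nm=20.0,
                                           sigma_ln=0.6, bin_means=means)
print(masked_spec, masked_unc)
```

Because the masked bins keep a value but carry a very large uncertainty, they contribute almost nothing to the weighted fit while the data matrix stays complete.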

Figure 3.

Definition of particle growth zone for 2 July 2001.

Table 1. Days With a Particle Growth Zone Defined (Data by This Definition Excluded in PMF Analysis)

| Month | Days |
| --- | --- |
| July 2001 | 2, 11, 14, 15, 24, and 30 July |
| Aug. 2001 | 7, 11, 14, and 18 Aug. |
| Sept. 2001 | 1, 5, 11, 14, 16, and 26 Sept. |
| Oct. 2001 | 7, 10, 11, 15, 18, 20, 24, 25, 29, and 30 Oct. |
| Nov. 2001 | 3, 4, 8, 9, 10, and 22 Nov. |
| Dec. 2001 | None |
| Jan. 2002 | 25 and 26 Jan. |
| Feb. 2002 | 13, 14, and 25 Feb. |
| March 2002 | 6, 7, 9, 15, 23, 24, and 29 March |
| April 2002 | 2, 10, 11, 12, 16, 17, 18, and 19 April |
| May 2002 | 4, 5, 10, 11, 15, 16, 17, 23, 24, 26, 27, 28, 29, 30, and 31 May |
| June 2002 | 2, 3, 4, 5, 8, 9, 21, 22, 23, 24, and 29 June |

4. PMF and Other Tools for Mining the Size Distribution Data

4.1. PMF

[17] A detailed introduction to PMF can be found elsewhere [Paatero, 1997]. In this study, PMF2 (two-way PMF) is used to solve the following two-way receptor model:

$$ X = GF + E \tag{2} $$

or, in the elemental form,

$$ x_{ij} = \sum_{k} g_{ik} f_{kj} + e_{ij} \tag{3} $$

where X is the matrix of observed data and the element xij is the number concentration value of the ith sample at the jth size bin. G and F are the source contributions and size distribution profiles, respectively, of the sources that are unknown and to be estimated from the analysis. To be specific, gik is the concentration of particles from the kth source associated with the ith sample and fkj is the size distribution associated with kth source. E is a matrix of residuals.

[18] The residual sum of squares (Q) is minimized by finding the optimal F and G values as indicated in equations (2) and (3):

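$$ Q = \sum_{i} \sum_{j} \left( \frac{e_{ij}}{s_{ij}} \right)^{2} \tag{4} $$

where s_{ij} is the uncertainty estimate of x_{ij}.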
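As a small numerical illustration of the objective function, the sketch below evaluates Q for a given pair of G and F, taking Q as the sum over all samples and size bins of the squared residual divided by the squared uncertainty. It does not perform the PMF2 minimization itself; the function name `q_value` and the 2 × 2 matrices are hypothetical.

```python
def q_value(X, G, F, S):
    """Sum of squared residuals of X - G*F, each scaled by its uncertainty in S."""
    n_samples, n_bins, n_factors = len(X), len(X[0]), len(F)
    q = 0.0
    for i in range(n_samples):
        for j in range(n_bins):
            fitted = sum(G[i][k] * F[k][j] for k in range(n_factors))
            q += ((X[i][j] - fitted) / S[i][j]) ** 2
    return q


# Hypothetical example: 2 samples, 2 size bins, 2 factors.
G = [[1.0, 0.0], [0.0, 2.0]]   # source contributions
F = [[1.0, 2.0], [3.0, 4.0]]   # size distribution profiles
X = [[1.5, 2.0], [6.0, 7.0]]   # observed data
S = [[0.5, 0.5], [0.5, 0.5]]   # uncertainties
q = q_value(X, G, F, S)
print(q)
```

Here the residuals are 0.5, 0, 0, and −1, so the scaled residuals are 1, 0, 0, and −2. Scaled residuals of this kind are what the residual histograms of a PMF fit display.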

[19] Rotations are obtained by setting an FPEAK value [Paatero et al., 2002]. When the FPEAK value is positive, the following additional term is included in the objective function Q:

$$ \beta^{2} \left( \sum_{k} \sum_{j} f_{kj} \right)^{2} \tag{5} $$

where β2 corresponds to the FPEAK value. The term defined above attempts to pull the sum of all the elements of F toward zero and makes the program do elementary transformations for F and G by subtracting the F vectors from each other and adding corresponding G vectors to obtain a more physically realistic solution. Recently, a method for solving rotational ambiguities by detecting edges in G space has been developed, and the best solution is obtained without clearly defined edges in G space [Paatero et al., 2005]. In this study, the FPEAK value is set to 0.2 for each of the months to eliminate G space edges, and five similar factors were found for each month.

[20] The F values were normalized to make the sum of each size interval equal to 1 for each factor. The G values were then rescaled correspondingly. The uncertainties were estimated with the method described by Zhou et al. [2004]. To smooth the size distribution data and minimize the error caused by the discontinuity between instruments, every five consecutive size bins were combined into one, and 33 new size intervals were produced from the original 165 size bins. The analysis of volume size distribution data does not provide much more information since it generates similar source contributions as number size distribution analyses do [Zhou et al., 2004], and hence it will not be performed in this study. During the period between October 2001 and April 2002 the APS was not working, and only 29 new size intervals were produced for these months. This also caused incomplete information on volume size distribution. Table 2 summarizes the data set after the aforementioned treatment.
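The two bookkeeping steps in this paragraph, combining five consecutive bins and normalizing F while rescaling G, can be sketched as below. Following the abstract, the five bins are averaged; the function names and the small matrices are hypothetical.

```python
def average_bins(spectrum, group=5):
    """Average each run of `group` consecutive size bins into one interval."""
    if len(spectrum) % group != 0:
        raise ValueError("number of bins must be a multiple of the group size")
    return [sum(spectrum[i:i + group]) / group
            for i in range(0, len(spectrum), group)]


def normalize_profiles(G, F):
    """Scale each profile (row of F) to sum to 1 and rescale G so G*F is unchanged."""
    sums = [sum(row) for row in F]
    F_norm = [[v / s for v in row] for row, s in zip(F, sums)]
    G_scaled = [[g * s for g, s in zip(row, sums)] for row in G]
    return G_scaled, F_norm


# 165 original size bins collapse to 33 intervals.
coarse = average_bins([float(i) for i in range(165)])

# Hypothetical 2-factor solution.
G = [[2.0, 1.0], [0.5, 3.0]]
F = [[1.0, 3.0], [2.0, 2.0]]
G_scaled, F_norm = normalize_profiles(G, F)
print(len(coarse), F_norm, G_scaled)
```

Multiplying each column of G by the sum that its profile was divided by leaves the product GF, and therefore the fit, unchanged.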

Table 2. Description of the Whole Year Data (Data Representing Particle Growth Excluded)
Columns: Size (μm); Number Concentration (number cm⁻³): Mean (×10³), SD^a (×10³), Min, Max (×10³), Median (×10³), Lower 25% (×10³), Upper 25% (×10³); Mean Volume Concentration (μm³ cm⁻³)

^a SD, standard deviation.


4.2. Correlation Analyses

[21] Correlations of the source contributions with particle composition and gas phase data were investigated. Using the gas data as an additional input to PMF would not add much information and would require both data sets to be averaged to 30 min. Retaining the 15 min resolution allows a more thorough time series analysis than a 30 min resolution would, for example of time-of-day effects.
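For the correlation step, series with different time resolutions must first be brought to a common 30 min base (two 15 min source contribution values, three 10 min gas values). A minimal sketch follows, assuming simple block averaging and the Pearson correlation coefficient; the helper names and the two short series are hypothetical.

```python
import math


def block_average(values, size):
    """Average consecutive blocks of `size` values, dropping an incomplete tail."""
    usable = len(values) - len(values) % size
    return [sum(values[i:i + size]) / size for i in range(0, usable, size)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two series, truncated to equal length."""
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Two hypothetical hours of data.
contrib_15min = [1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 6.0, 8.0]
gas_10min = [10.0, 10.0, 10.0, 15.0, 15.0, 15.0,
             30.0, 30.0, 30.0, 35.0, 35.0, 35.0]
contrib_30 = block_average(contrib_15min, 2)
gas_30 = block_average(gas_10min, 3)
r = pearson_r(contrib_30, gas_30)
print(contrib_30, gas_30, r)
```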

4.3. Conditional Probability Function (CPF)

[22] A conditional probability function [Ashbaugh et al., 1985; Kim et al., 2004] was calculated using the source contributions obtained by PMF2 and wind direction values by the following equation:

$$ \mathrm{CPF} = \frac{m_{\Delta\theta}}{n_{\Delta\theta}} \tag{6} $$

where mΔθ is the number of occurrences in the direction sector Δθ for which the fractional contribution from the source exceeds a threshold (the upper 25th percentile of the fractional contribution from each source), and nΔθ is the total number of wind occurrences in the same direction sector. Fractional contributions are used to avoid the influence of atmospheric dilution on the CPF results. The angular width of each direction sector is set to 10°, giving 36 direction sectors. Samples with a wind speed below 1.0 m s−1 were excluded from the CPF analysis, which removed two thirds of the total samples. The sources are thought to be located in the direction sectors with high CPF values. When nΔθ is below 10, the CPF value is set to zero.
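The CPF calculation described above can be sketched as follows. Two details are assumptions made only for illustration: the threshold is computed as a nearest-rank 75th percentile over all samples of the source, and sectors are counted from 0° in 10° steps. The function name `cpf_by_sector` and the synthetic wind data are hypothetical.

```python
import math


def cpf_by_sector(wind_dir_deg, wind_speed, frac_contrib,
                  sector_width=10, min_speed=1.0, min_count=10):
    """CPF = m / n for each wind sector; sectors with n < min_count are set to 0."""
    # Assumption: nearest-rank 75th percentile of all fractional contributions.
    ranked = sorted(frac_contrib)
    threshold = ranked[math.ceil(0.75 * len(ranked)) - 1]
    n_sectors = 360 // sector_width
    n = [0] * n_sectors  # wind occurrences per sector
    m = [0] * n_sectors  # occurrences exceeding the threshold per sector
    for wd, ws, fc in zip(wind_dir_deg, wind_speed, frac_contrib):
        if ws < min_speed:
            continue  # calm samples are excluded
        s = int(wd % 360 // sector_width)
        n[s] += 1
        if fc > threshold:
            m[s] += 1
    return [m[s] / n[s] if n[s] >= min_count else 0.0 for s in range(n_sectors)]


# Hypothetical samples: low contributions from the east, high from the south,
# and a few very high values under calm conditions from the north.
wind_dir = [95.0] * 24 + [185.0] * 12 + [5.0] * 4
wind_spd = [2.0] * 24 + [3.0] * 12 + [0.5] * 4
frac = ([k / 100 for k in range(1, 25)] + [k / 100 for k in range(51, 63)]
        + [k / 100 for k in range(90, 94)])
cpf = cpf_by_sector(wind_dir, wind_spd, frac)
print(cpf[9], cpf[18], cpf[0])
```

In this example the calm northern samples are discarded, so only the southern sector (index 18) receives a nonzero CPF value.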

4.4. Fourier Analysis

[23] The source contributions from each month are linked in a temporal sequence to form annual source contribution series, and Fourier analyses have been performed for these annual series. In the signal-processing field, it is well known that Fourier analysis has no temporal specificity. For example, if the Fourier analysis says that the series has a daily pattern (24 hour period), it cannot determine that the daily pattern exists at any specific time in the series. Therefore many time-frequency analysis techniques have been developed and used in combination with Fourier analyses. To address the problem in this study, the Fourier analysis was performed for the source contributions of each month. The correlations of the source contributions with the gas and particle composition data were calculated for each month.
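The strength of a daily pattern can be examined by comparing the spectral power at 1/24 h−1 with the power at other frequencies. The sketch below evaluates the discrete Fourier sum directly at chosen frequencies for a series sampled every 15 min, for which the Nyquist frequency is 1/(2 × 0.25 h) = 2 h−1. The function name `power_at`, the normalization, and the synthetic one-week series are hypothetical.

```python
import cmath
import math


def power_at(series, freq_per_hour, dt_hours=0.25):
    """Spectral power of the mean-removed series at one frequency (in h^-1)."""
    n = len(series)
    mean = sum(series) / n
    total = sum((x - mean) * cmath.exp(-2j * math.pi * freq_per_hour * k * dt_hours)
                for k, x in enumerate(series))
    return abs(total / n) ** 2


# Hypothetical source contribution: one week at 15 min resolution with a
# sinusoidal 24 hour cycle of amplitude 3 around a mean of 10.
hours = [0.25 * k for k in range(7 * 96)]
series = [10.0 + 3.0 * math.sin(2 * math.pi * t / 24.0) for t in hours]
daily = power_at(series, 1 / 24)
other = power_at(series, 1 / 7)
print(daily, other)
```

Applying the same function to each month separately, rather than to the linked annual series, gives the month-by-month view that a single Fourier transform cannot provide.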

5. Results and Discussion

[24] Table 2 describes the average number and volume size distribution of the full year of data. The volume distribution mode is located between 0.1 and 1 μm. The smallest particles have the largest mean number concentration, but their median concentrations are not the highest. This situation may be caused by the frequent nucleation events that produce large numbers of new particles. Table 2 also indicates that the minimum number concentrations in all the size intervals are nearly zero, implying very low contributions from certain sources. The 12 months of data were summarized and described in detail elsewhere [Stanier et al., 2004c]. Figure 4 shows the histograms of the scaled regression residuals for all size intervals of all 12 months. Most of the residuals lie between −2 and +2, and the distributions are symmetric and close to normal. If the size distributions measured at the receptor were not stationary or quasistationary, such good fits could not be obtained. The five factors are arranged in order of decreasing size from factor 1 to factor 5. These five factors are similar to the sources identified in our previous study [Zhou et al., 2004]. Detailed discussion is provided as follows.

Figure 4.

Histograms of the scaled regression residuals. The horizontal axes indicate the scaled residual, and the vertical axes indicate the frequencies.

[25] Factor 1 has a number mode between 0.15 and 0.25 μm and a submode at 0.02 μm, as shown in Figure 5. Detecting edges in F space is a method to help resolve rotation problems [Henry, 2003]. For July 2001 the number concentrations of the two size intervals d1 (240 nm) and d2 (23 nm), the center size intervals of the two modes, were plotted in Figure 6. The edge corresponding to FPEAK = 0.2 was also plotted. When pulling down the submode by increasing the FPEAK value, the edge will rotate toward the d1 axis, but it is clear that the edge cannot overlap with the d1 axis, suggesting that the submode cannot be eliminated completely. Nevertheless, for the solutions with different rotations, the major mode and source contributions did not experience large changes, and our analysis and conclusions were not severely influenced.

Figure 5.

Size distribution profiles for each month.

Figure 6.

Discussion of the two modes of factor 1 in July 2001. (a) Size distribution of factor 1. (b) Scatterplot of the number concentration of the central size intervals of the two modes.

[26] In Figure 7, the Fourier analysis shows a weak daily pattern for factor 1. Table 3 indicates that the daily pattern is clear in February and April of 2002 but less clear in other months. Fourier analyses have also been applied to the sulfate and nitrate data for each month. These analyses found that sulfate does not have daily patterns except in the summer months (it had weak daily patterns in July and August of 2001 and June of 2002) and that nitrate usually has a strong diurnal pattern during the whole year except in November and December of 2001 and January and March of 2002. On the basis of these facts, one possible explanation for the periodicity of factor 1 is as follows: When the sulfate concentration is low and nitrate simultaneously has a strong diurnal pattern, as in February and April of 2002, the temporal behavior of factor 1 can be influenced by nitrate and show observable daily patterns. Another possible reason is that some local emissions have been included in factor 1. However, the reason cannot be completely clarified within the current data sets.

Figure 7.

Fourier transformation for the annual contribution of each factor. (The powers are all nearly zero for all factors from 0.2 h−1 to 2 h−1, the Nyquist frequency.)

Table 3. Results of Monthly Fourier Analysis for Daily Patterns^a

| Month | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 |
| --- | --- | --- | --- | --- | --- |
| July 2001 | no | weak | no | strong | strong |
| Aug. 2001 | no | no | no | strong | strong |
| Sept. 2001 | very weak | very weak | no | strong | strong |
| Oct. 2001 | very weak | weak | no | strong | strong |
| Nov. 2001 | no | very weak | no | strong | strong |
| Dec. 2001 | no | no | no | strong | strong |
| Jan. 2002 | no | no | no | strong | strong |
| Feb. 2002 | weak | very weak | no | strong | strong |
| March 2002 | no | no | no | strong | strong |
| April 2002 | weak | weak | no | strong | strong |
| May 2002 | no | weak | no | strong | strong |
| June 2002 | no | weak | no | strong | |

^a The strength of the daily patterns was decided by comparing the spectral intensity at 1/24 h−1 with other frequencies.

[27] Factor 1 has a strong correlation with PM2.5 mass over the entire period as indicated in Table 4 and is related to the major components of the PM2.5 mass at the Pittsburgh supersite. In winter the correlation of nitrate with factor 1 increases, suggesting that more nitrate is included in factor 1.

Table 4. Correlations r of the Source Contributions With Gas and Particle Composition Data^a

Factor 1
July 2001      0.
Aug. 2001     −
Sept. 2001   0.10   0.26   0.34   0.21   0.38   0.80   0.78   0.46      b      b
Oct. 2001    0.03   0.32   0.41   0.27   0.51   0.73   0.75   0.61   0.67   0.56
Nov. 2001   −0.05   0.42   0.48   0.33   0.57   0.90   0.68   0.40   0.89   0.67
Dec. 2001   −0.47   0.46   0.45   0.14   0.61   0.81   0.47   0.54   0.74   0.76
Jan. 2002   −0.42   0.51   0.56   0.17   0.47   0.70   0.48   0.67   0.56   0.55
Feb. 2002   −0.53   0.70   0.75   0.34   0.76   0.91   0.61   0.54   0.82   0.76
March 2002  −0.50   0.43   0.57   0.30   0.59   0.88   0.62   0.57   0.65   0.69
April 2002  −0.33   0.44   0.52   0.18   0.52   0.73   0.58   0.45   0.51   0.43
May 2002    −0.12   0.39   0.45   0.22   0.50   0.85   0.74   0.31   0.80   0.66

Factor 2
July 2001   −0.18   0.47   0.57   0.48   0.54   0.29   0.15   0.54   0.46   0.56
Aug. 2001   −0.15   0.34   0.48   0.32   0.35   0.25   0.23   0.31   0.31   0.39
Sept. 2001  −0.24   0.60   0.73   0.32   0.77   0.61   0.52   0.58      b      b
Oct. 2001   −0.46   0.66   0.78   0.46   0.56   0.46   0.23   0.61   0.43   0.58
Nov. 2001   −0.51   0.74   0.79   0.42   0.79   0.48   0.23   0.34   0.41   0.75
Dec. 2001   −0.46   0.53   0.53   0.36   0.67   0.81   0.47   0.40   0.80   0.81
Jan. 2002   −0.37   0.59   0.71   0.44   0.59   0.72   0.61   0.45   0.71   0.71
Feb. 2002   −0.61   0.71   0.78   0.58   0.70   0.79   0.56   0.38   0.66   0.71
March 2002  −0.50   0.53   0.68   0.59   0.61   0.70   0.57   0.43   0.53   0.61
April 2002  −0.41   0.50   0.59   0.48   0.54   0.49   0.46   0.34   0.36   0.34
May 2002    −0.29   0.60   0.68   0.47   0.64   0.65   0.41   0.38   0.64   0.71
June 2002   −0.21   0.36   0.58   0.27   0.61   0.45   0.19   0.52   0.50   0.58

Factor 3
July 2001   −0.24   0.37   0.40   0.23   0.28  −0.14  −0.23   0.23  −0.04   0.26
Aug. 2001   −0.27   0.32   0.44   0.18   0.31  −0.01  −
Sept. 2001  −0.31   0.36   0.51   0.29   0.43   0.13   0.07   0.30      b      b
Oct. 2001   −0.24   0.39   0.45   0.
Nov. 2001   −0.32   0.33   0.36   0.15   0.21   0.04  −0.15   0.17  −0.07   0.21
Dec. 2001   −
Jan. 2002   −0.20   0.38   0.42   0.40   0.30   0.28   0.41   0.35   0.20   0.33
Feb. 2002   −  −0.10   0.02
March 2002  −0.32   0.35   0.42   0.52   0.28   0.31   0.36   0.22   0.14   0.25
April 2002  −0.28   0.30   0.37   0.59   0.
May 2002    −0.21   0.33   0.41   0.49   0.
June 2002   −  −

Factor 4
July 2001   −  −0.03  −0.06   0.09  −0.11   0.10
Aug. 2001   −  −0.05  −0.20  −0.23   0.01  −0.17   0.03
Sept. 2001  −  −0.10  −0.18  −0.20  −0.03      b      b
Oct. 2001    0.  −0.03  −0.11  −0.03   0.02  −0.07   0.04
Nov. 2001    0.20  −0.17  −0.18   0.06  −0.23  −0.11  −0.17  −0.10  −0.16  −0.16
Dec. 2001    0.05  −0.04  −0.04   0.06  −0.09  −0.10  −0.20   0.06  −0.15  −0.02
Jan. 2002    0.  −  −0.15   0.03
Feb. 2002    0.11  −0.07  −0.08   0.13  −0.19  −0.23  −0.09  −0.10  −0.27  −0.16
March 2002  −
April 2002  −  −0.08  −0.05  −0.07  −0.03  −0.19  −0.02
May 2002     0.  −0.08  −0.08  −0.11  −0.03  −0.17  −0.07
June 2002   −  −0.05  −0.08   0.01  −0.10   0.01

Factor 5
July 2001    0.08  −0.13  −0.16  −0.02  −0.15  −0.15  −0.11  −0.19  −0.25  −0.18
Aug. 2001    0.12  −0.05  −0.12   0.03  −0.17  −0.22  −0.18  −0.19  −0.24  −0.17
Sept. 2001   0.09  −0.11  −0.20   0.03  −0.19  −0.24  −0.20  −0.24      b      b
Oct. 2001    0.17  −0.17  −0.25   0.03  −0.25  −0.27  −0.14  −0.27  −0.19  −0.12
Nov. 2001    0.17  −0.12  −0.15   0.13  −0.14  −0.11  −0.14  −0.07  −0.14  −0.17
Dec. 2001    0.27  −0.06  −0.08  −0.07  −0.10  −0.19  −0.21  −0.13  −0.15  −0.08
Jan. 2002    0.20  −0.08  −0.15  −0.10  −0.16  −0.28  −0.25  −0.21  −0.26  −0.17
Feb. 2002    0.35  −0.16  −0.21  −0.03  −0.20  −0.27  −0.20  −0.22  −0.22  −0.20
March 2002   0.21  −0.08  −0.14  −0.08  −0.11  −0.11  −0.10  −0.09  −0.02  −0.03
April 2002   0.13  −0.01  −0.05   0.16  −0.09  −0.03   0.00  −0.15  −0.08   0.02
May 2002     0.18  −0.11  −0.14   0.11  −0.14  −0.08  −0.11  −0.16  −0.14  −0.13
June 2002       c      c      c      c      c      c      c      c      c      c

^a The source contributions and the species with 10 min resolution were averaged to 30 min; for OC and EC the source contributions were averaged to the corresponding OC/EC sampling period.
^b OC or EC data are missing.
^c Half of the data between 3 nm and 15 nm were missing during this month because the SMPS was not functioning properly.

[28] As indicated in Table 4, factor 1 usually has high correlations with sulfate. Secondary sulfate was formed by the oxidation of SO2 through photochemical reactions. The main SO2 sources are coal-fired power plants hundreds of kilometers away. In winter the correlation of factor 1 with nitrate increases noticeably. The lower temperatures support ammonium nitrate in the solid phase, and the fraction of nitrate in PM2.5 mass increases. The increase of the correlation coefficient with nitrate may explain the higher correlation of factor 1 with PM2.5 than with sulfate during winter, and this is also consistent with the fact that there was much less sulfate in winter.

[29] Factor 1 has a higher correlation with OC than with EC in the summer and fall of 2001, but the correlations are similar for both species in the first half of 2002. The higher correlation with OC than with EC in the summer may be caused by secondary organic matter condensing onto preexisting particles during the transport as well as more production of secondary OC in the summer. When primary OC dominates, the correlations with OC and with EC are similar since they are from the same sources and have similar temporal variations. In Figure 8 the CPF analysis shows that factor 1 is from the south. Table 5 summarizes the characteristics of factor 1 and other factors.

Figure 8.

Wind profile and CPF of the whole year contribution of each factor.

Table 5. Summary of the Characteristics of All Factors

| | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 |
| --- | --- | --- | --- | --- | --- |
| Sources assigned | secondary and aged primary aerosol, fresh primary aerosol from local combustion sources | stationary combustion sources | remote Pittsburgh traffic, local point sources | local traffic | local nucleation |
| Size range | 0.15 ∼ 0.25 μm | 0.08 ∼ 0.1 μm in July, August and September 2001; 0.06 ∼ 0.07 μm in other months | 0.03 ∼ 0.04 μm | 15 nm | <10 nm |
| Diurnal pattern^a | very weak | weak | no | strong | strong |
| Weekday/weekend difference | small | no | moderate | significant | no |
| Correlations with gas and particle composition data | no correlation with ozone; strong correlation with sulfate; correlations with other species | negative correlation with ozone; strong correlations with other gases; correlations with sulfate, nitrate and OC/EC | weak correlations with NO, NOx and SO2; no correlations with other species | no obvious correlations with any species | a weak correlation with ozone; no correlations with other species |
| Dominating direction by CPF | south | south and southeast | southeast and northwest | no clear dominating directions | no clear dominating directions |

^a Monthly results can be found in Table 4.

[30] Factor 1 typically includes sulfate, nitrate (in winter), and secondary organics as well as primary organics that have aged in the atmosphere and have grown from their original size. The particles of the submodes cannot be from distant places; otherwise, they would be depleted during the transport. The small modes of factor 1 are more likely to be from fresh emissions of some combustion sources south of the site. Since the dominating direction of factor 1 is south, the particles of the small mode have the same variations as other large particles in factor 1, which may explain the correlation with CO. Factor 1 includes secondary, aged primary aerosol particles and also fresh primary particles from local combustion sources.

[31] The number mode of factor 2 is at 0.08 ∼ 0.1 μm in July, August, and September of 2001. In 2002 the number mode is at 0.06 ∼ 0.07 μm. In Figure 9 a daily pattern, caused by the reduction of mixing height at night, is clearly observed. The nocturnal increase of the source contribution, owing to the reduction of the mixing height, may be the cause of the negative correlation with ozone. Table 3 shows that the diurnal pattern of factor 2 does not appear in all months, indicating that this daily pattern is weak and is easily disturbed.

Figure 9.

Daily average contribution from each factor for weekdays and weekends. The strength of the daily pattern can be determined by the spectral intensity at 1/24 h−1 in Figure 7. The daily pattern is strong for factors 4 and 5, weak for factors 1 and 2, and nearly absent for factor 3.

[32] The correlations between factor 2 and these gases, NO, NOx, CO, and SO2, suggest that the particles are from combustion sources and arrived at the receptor site accompanied by these gases. The strong correlation with SO2 suggests coal burning, including coal power plants and steel mills. The correlation with CO can be explained by the emission from steel mills as well as a boiler 1 km away to the northwest of the site.

[33] Factor 2 has a weak correlation with OC in summer, and in winter the correlation becomes stronger. Like factor 1, the winter correlations with OC and with EC are similar, indicating primary OC sources. Factor 2 probably contains both emissions from wood burning and other local combustion sources that are too similar in size to separate. The average contribution of factor 2 in the winter months (December 2001, January 2002, February 2002, and March 2002) is higher than that in the summer months (July 2001, August 2001, and September 2001) by 34%. In comparison, the average number contribution of factor 3 increases from summer to winter by only 17%. The higher increase of the factor 2 contribution may suggest an additional source from wood burning. As shown in Figure 9, factor 2 has no significant differences between weekdays and weekends. The CPF analysis shows it is also from the south to southeast.

[34] Factor 2 is thus assigned to stationary combustion, including emissions from local combustion sources. It probably also includes wood burning in winter. The emission from the boiler may also be included, except in summer when it seldom ran. The higher correlation of factor 2 with SO2 and lower correlation with sulfate suggest that the sources are located close to the receptor site, <100 km [Zhou et al., 2004]. This is also consistent with results for a coal-fired power plant that was found to increase SO2 close to the source but that produced little sulfate because of the short distance and time for oxidation [Green et al., 2003].

[35] The number mode of factor 3 is at 0.04 μm in summer and 0.03 μm in the fall and winter. This factor has been associated with Pittsburgh traffic in a previous paper [Zhou et al., 2004], but its behavior is a little puzzling here. Factor 3 is weakly correlated with NO, NOx, and CO during all months. Figure 7 and Table 3 both indicate that it has no daily patterns. Table 6 also indicates that factor 3 has no correlation with the traffic flow on Interstate 376. One reason may be that the distance of transport weakened the daily pattern and the correlations, and meteorological conditions have more influence than the emissions. Another possible reason is the presence of some particles from local point sources in factor 3. Figure 7 indicates frequency peaks at 1/12 h−1, 1/8 h−1, and 1/6 h−1 for factor 3. These are the harmonic frequencies of 1/24 h−1, caused by a nonsinusoidal periodicity of 24 hours. Although these frequencies may suggest a hidden daily pattern, they only appeared in January and February of 2002, and we cannot conclude that factor 3 has daily patterns.
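
The statement that a nonsinusoidal 24 hour periodicity produces peaks at the harmonic frequencies can be checked numerically. The following is a minimal sketch, not the analysis code of this study: it assumes a synthetic series at 15 min resolution with a hypothetical rectangular 3 hour pulse repeated once per day, and the names `daily_pulse` and `dft_amplitude` are illustrative only.

```python
import cmath
import math

SAMPLES_PER_DAY = 96  # 15 min resolution, as in the measurements
N_DAYS = 8            # assumed length of the synthetic series


def daily_pulse(sample_index):
    """Hypothetical nonsinusoidal daily signal: a 3 h block once per day."""
    slot = sample_index % SAMPLES_PER_DAY
    return 1.0 if 24 <= slot < 36 else 0.0  # 0600 to 0900 LT


def dft_amplitude(series, cycles_per_day):
    """Amplitude of the Fourier component at the given frequency."""
    total = 0j
    for k, value in enumerate(series):
        angle = -2.0 * math.pi * cycles_per_day * k / SAMPLES_PER_DAY
        total += value * cmath.exp(1j * angle)
    return 2.0 * abs(total) / len(series)


series = [daily_pulse(k) for k in range(SAMPLES_PER_DAY * N_DAYS)]

# 1, 2, 3, and 4 cycles per day correspond to 1/24, 1/12, 1/8, and 1/6 h^-1.
harmonics = {c: dft_amplitude(series, c) for c in (1, 2, 3, 4)}
# 1.5 cycles per day is not a harmonic of the 24 h period.
off_harmonic = dft_amplitude(series, 1.5)

for cycles, amplitude in harmonics.items():
    print(f"{cycles} cycle(s) per day: amplitude {amplitude:.3f}")
print(f"1.5 cycles per day: amplitude {off_harmonic:.3e}")
```

In this synthetic case the fundamental and its first three harmonics all carry appreciable amplitude, while the non-harmonic frequency is essentially zero. A pure sinusoid with a 24 hour period would instead show a peak only at 1/24 h⁻¹.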

Table 6. Correlations of Gases and Source Contributions With Traffic Flow Volume on Schenley Drive, Forbes Avenue, and Interstate 376ᵃ

| Road | NO | NOx | CO | Factor 3 | Factor 4 |
| --- | --- | --- | --- | --- | --- |
| Schenley Driveᵇ | 0.056 | −0.0024 | −0.045 | 0.066 | 0.30 |
| Forbes Avenueᵇ | M0.021 | −0.081 | −0.145 | −0.0069 | 0.27 |
| Interstate 376ᶜ | −0.032 | −0.125 | −0.154 | 0.046 | 0.39 |

ᵃ Distance to the site: Schenley Drive, ∼200 m; Forbes Avenue, ∼600 m; and Interstate 376, ∼1600 m.
ᵇ For 1–31 January and 1–30 April 2002.
ᶜ For 11–31 January and 1–30 April 2002.

[36] If the periodic part of a series is taken as “signal” and the nonperiodic part as “noise,” then factor 3 has a weak signal, as just discussed, and the noise is large enough to prevent Fourier analysis from detecting a 24 hour periodicity. The effect of the noise is random, and it affects all times of day to the same extent. Since there are a large number of samples for each time of day on weekdays and weekends, noise alone would produce a horizontal line with small fluctuations for the daily average in Figure 9. Therefore, the concentration peak of factor 3 at the morning rush hour, as well as the weekday/weekend difference in Figure 9, suggests the influence of traffic.
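
The averaging argument can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the procedure used to produce Figure 9: the baseline, the size of the rush hour bump, the noise level, the number of days, and the random seed are all arbitrary, and `time_of_day_average` is a hypothetical helper.

```python
import random

SAMPLES_PER_DAY = 96        # 15 min resolution
N_DAYS = 120                # assumed number of days in the average
RUSH_SLOTS = range(28, 36)  # 0700 to 0900 LT, hypothetical rush hour


def synthetic_contribution(rng, slot):
    """Baseline plus a weak rush hour bump plus much larger random noise."""
    signal = 0.5 if slot in RUSH_SLOTS else 0.0
    noise = rng.gauss(0.0, 2.0)  # noise standard deviation is 4x the bump
    return 10.0 + signal + noise


def time_of_day_average(series):
    """Average all samples that share the same 15 min slot of the day."""
    sums = [0.0] * SAMPLES_PER_DAY
    counts = [0] * SAMPLES_PER_DAY
    for k, value in enumerate(series):
        slot = k % SAMPLES_PER_DAY
        sums[slot] += value
        counts[slot] += 1
    return [s / c for s, c in zip(sums, counts)]


rng = random.Random(0)
series = [
    synthetic_contribution(rng, k % SAMPLES_PER_DAY)
    for k in range(SAMPLES_PER_DAY * N_DAYS)
]
profile = time_of_day_average(series)

rush_mean = sum(profile[s] for s in RUSH_SLOTS) / len(RUSH_SLOTS)
other = [profile[s] for s in range(SAMPLES_PER_DAY) if s not in RUSH_SLOTS]
other_mean = sum(other) / len(other)
print(f"rush hour mean {rush_mean:.2f}, other hours mean {other_mean:.2f}")
```

Because the noise is independent of the time of day, averaging N days reduces its standard deviation by roughly a factor of √N, so a bump that is small relative to the noise in any single day can still stand out in the time-of-day average.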

[37] Although the number size distribution changes significantly within ∼100 m of highways [Zhu et al., 2002a, 2002b, 2004], the mode of the size distribution becomes stable at farther distances, and factor 3 is consistent with the unimodal size distributions found several kilometers downwind of the highway in Los Angeles [Kim et al., 2002].

[38] Factor 3 is probably a collection of point sources and remote Pittsburgh traffic. These particles are produced in the city, several kilometers from the measurement station rather than close to it. The current techniques cannot separate these two source categories, and the source characteristics are not identified well.

[39] Factor 4 has its number mode at 15 nm. Figure 9 clearly shows a concentration peak during the morning rush hour on weekdays and no such peak on weekends. Sometimes particles in the 10–20 nm range were formed by nucleation with no detectable subsequent growth, and these data were not excluded by the method described in the previous section. These nucleation events keep the average concentration of factor 4 from decreasing rapidly between 1000 local time (LT) and the afternoon. In Figure 7, Fourier analyses found two frequency peaks corresponding to a 7 day period and a 24 hour period. Factor 4 has no correlations with any gas or particle composition data, and this can be attributed to the small traffic flow around the site. The traffic flow rates at three places near the site (Schenley Drive, Forbes Avenue, and Interstate 376) were not correlated with NO, NOx, or CO, as shown in Table 6, indicating that the temporal variations of these gases do not show traffic patterns. The contribution of factor 4, in contrast, has a positive correlation with the traffic flows. These facts support the conclusion that factor 4 is from local traffic while most NO, NOx, and CO is not from local traffic.

[40] In 2002, from mid-February to mid-April, the roads near the station were closed in the early weekend mornings (0600–1200 LT) for motorless “buggy” practice. Figure 10 shows the contribution series of factor 4 and also the moving average of this series. The moving average time was chosen as 6 hours so that rapid variations were filtered. Usually, there are low emissions from local traffic on weekends, corresponding to low source contributions of factor 4. When the roads were closed for buggy practice, factor 4 shows the lowest contributions, as indicated in Figure 10. Factor 4 is thus assigned to be from local traffic. It may be from Forbes Avenue and other minor roads close to the measurement station within a distance of 1 km.
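
A 6 hour moving average at 15 min resolution spans 24 consecutive samples. The sketch below shows one way such a filter could be written; whether the window used for Figure 10 was trailing or centered is not stated here, so the trailing window is an assumption, and the input series is hypothetical.

```python
def moving_average(series, window):
    """Trailing moving average; returns len(series) - window + 1 values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    running = sum(series[:window])
    averaged = [running / window]
    for k in range(window, len(series)):
        # Slide the window: add the newest sample, drop the oldest one.
        running += series[k] - series[k - window]
        averaged.append(running / window)
    return averaged


WINDOW = 24  # 6 h at 15 min resolution

# Hypothetical contribution series: flat baseline with a one-sample spike.
series = [100.0] * 96
series[40] = 1300.0

smoothed = moving_average(series, WINDOW)
print(max(series), max(smoothed))  # prints: 1300.0 150.0
```

The single 15 min spike is spread over the 6 hour window and reduced from 1300 to 150, which is the sense in which rapid variations are filtered while sustained lows, such as those during road closures, remain visible.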

Figure 10.

Source contribution series of factor 4 from 1 January to 30 April 2002.

[41] Factor 5 represents particles smaller than 10 nm from nucleation. These small particles are not involved in the particle growth events that we have discussed. A detailed study of nucleation events during PAQS was presented by Stanier et al. [2004b]. On average, the number mode is larger than 3 nm, and this may be caused by nucleation occurring upwind or at higher elevations with subsequent downward mixing of the particles. The similarity of data between the upwind (Florence) nucleation and Pittsburgh indicates that the nucleation is occurring upwind of Pittsburgh [Stanier et al., 2004b]. Figure 7 shows a clear daily pattern, and Figure 9 indicates that local nucleation events are more active during the daytime, especially around noon. The similar pattern of mean and median values indicates that nucleation happened frequently in Pittsburgh, and this cannot be explained by occasional occurrences of nucleation with a large number of new particles produced.
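
The mean versus median reasoning can be made concrete with two hypothetical sets of noon contributions that share the same mean. The numbers below are invented for illustration and are not taken from the measurements.

```python
import statistics

BASELINE = 100.0  # hypothetical contribution on a day without nucleation

# Case A: moderate events on 14 of 20 days.
frequent = [500.0] * 14 + [BASELINE] * 6
# Case B: a single very large event in 20 days.
rare = [5700.0] * 1 + [BASELINE] * 19


def summarize(values):
    """Return (mean, median) of a list of contributions."""
    return statistics.mean(values), statistics.median(values)


frequent_mean, frequent_median = summarize(frequent)
rare_mean, rare_median = summarize(rare)

print(frequent_mean, frequent_median)  # prints: 380.0 500.0
print(rare_mean, rare_median)          # prints: 380.0 100.0
```

Both cases have the same mean, but only frequent events raise the median above the baseline. A noon peak that appears in the median as well as in the mean therefore points to frequent nucleation rather than a few extreme episodes.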

[42] Figure 11 illustrates the variation of the monthly mean number contribution from each factor. The fluctuations are within a factor of 2. Factors 1 and 2 have similar seasonal trends, high in the fall and low in the winter. They all reach their highest concentrations in November 2001, and the SO2 concentration is also the highest in that month. These high concentrations in November 2001 may be attributed to the dominant wind direction from the south, where more coal power plants are located. Figure 12 indicates the monthly volume contribution variations. The volume contribution is calculated from the number contribution and size distribution of each factor. For factor 1 the volume contribution only includes particles smaller than 0.5 μm for all months since the lack of APS data in some months prevents us from investigating the volume contribution over this size. The monthly variation of factor 1 is similar to PM2.5 mass concentration. In the summer of 2001, particles from all sources seem to be larger than in other seasons.
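
The conversion from number to volume contribution is not written out in this section. A minimal sketch, assuming spherical particles and using each size bin's representative diameter, is V = Σ N_i (π/6) d_i³. The function name, the optional size cutoff argument, and the three-bin distribution below are hypothetical.

```python
import math


def volume_concentration(diameters_um, number_conc_cm3, max_diameter_um=None):
    """Sum N_i * (pi / 6) * d_i**3 over bins, assuming spherical particles.

    Diameters in um and number concentrations in cm^-3 give um^3 cm^-3.
    Bins at or above max_diameter_um are skipped when a cutoff is given.
    """
    total = 0.0
    for d, n in zip(diameters_um, number_conc_cm3):
        if max_diameter_um is not None and d >= max_diameter_um:
            continue
        total += n * math.pi / 6.0 * d ** 3
    return total


# Hypothetical three-bin size distribution (not from the paper).
diameters = [0.1, 0.3, 1.0]     # um
numbers = [1000.0, 100.0, 1.0]  # cm^-3

v_all = volume_concentration(diameters, numbers)
v_sub = volume_concentration(diameters, numbers, max_diameter_um=0.5)
print(f"all bins: {v_all:.3f}, below 0.5 um: {v_sub:.3f} (um^3 cm^-3)")
```

Because volume scales with the cube of diameter, a single particle per cm³ at 1.0 μm contributes as much volume here as 1000 particles per cm³ at 0.1 μm, which is why truncating the distribution at 0.5 μm underestimates the volume of a factor with a coarse tail.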

Figure 11.

Monthly variations of average number contribution from each factor.

Figure 12.

Monthly variations of average volume contribution from each factor.

[43] Table 7 summarizes the average contribution from each factor through the full year. The number contributions from factors 2, 3, 4, and 5 are approximately the same, and factor 1 contributes far fewer particles. The volume contribution is dominated by factors 1 and 2, and factor 5 contributes little to the total volume concentration. The volume contribution of factor 1 is underestimated because of the missing APS data.

Table 7. Average Contribution of All Factors

| | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 |
| --- | --- | --- | --- | --- | --- |
| Number concentration, number cm⁻³ | 1402 | 4907 | 6182 | 5972 | 4514 |
| Volume concentration, μm³ cm⁻³ | 7.40 | 2.78 | 0.500 | 0.306 | 0.149ᵃ |

ᵃ June 2002 is not included.

6. Conclusion

[44] Positive matrix factorization and other data-mining techniques have been applied to extract source information from the full-year size distribution data of the Pittsburgh Air Quality Study. The data representing particle growth after nucleation events were excluded. The analysis was performed for each month, and the same five factors were found for all months.

[45] The five factors found in this analysis represent five different size patterns. Each pattern is caused by a source or source group. This analysis has succeeded in separating and identifying local traffic and nucleation. The method is less effective at separating the sources within factors 1, 2, and 3. Factor 1 includes local sources besides secondary and aged primary aerosol; factor 2 includes power plants, but these cannot be separated further from other stationary combustion sources; factor 3 is probably Pittsburgh traffic, but the evidence is insufficient.

[46] For the purpose of source apportionment, the approach by itself has significant limitations compared with traditional receptor modeling based on chemical composition data. Future comparison of the results of this approach with chemical-composition-based analyses will provide more information, especially in identifying the sources included in factors 1, 2, and 3. However, there is useful information that can be obtained from such analyses. Although the initial cost of the equipment is substantial, the operational costs are low, and the size distribution measurements provide far more information than can be obtained from particle counts alone.


[47] This research was conducted as part of the Pittsburgh Air Quality Study, which was supported by the U.S. Environmental Protection Agency under contract R82806101 and the U.S. Department of Energy National Energy Technology Laboratory under contract DE-FC26-01NT41017. This paper has not been subject to EPA's required peer and policy review and therefore does not necessarily reflect the views of the Agency. No official endorsement should be inferred.