The long-range transport of particulates can substantially contribute to local air pollution. The importance of airborne pollen has grown due to the recent climate change; the lengthening of the pollen season and rising mean airborne pollen concentrations have increased health risks. Our aim is to identify atmospheric circulation pathways influencing pollen levels in three European cities, namely Thessaloniki, Szeged, and Hamburg. Trajectories were computed using the HYSPLIT model. The 4 day, 6 hourly three-dimensional (3-D) backward trajectories arriving at these locations at 1200 UT are produced for each day over a 5 year period. A k-means clustering algorithm using the Mahalanobis metric was applied in order to develop trajectory types. The delimitation of the clusters performed by the 3-D function “convhull” is a novel approach. The results of the cluster analysis reveal that the main pathways for Thessaloniki contributing substantially to the high mean Urticaceae pollen levels cover western Europe and the Mediterranean. The key pathway patterns for Ambrosia for Szeged are associated with backward trajectories coming from northwestern Europe, northeastern Europe, and northern Europe. A major pollen source identified is a cluster over central Europe, namely the Carpathian basin with peak values in Hungary. The principal patterns for Poaceae for Hamburg include western Europe and the mid-Atlantic region. Locations of the source areas coincide with the main habitat regions of the species in question. Critical daily pollen number exceedances conditioned on the clusters were also evaluated using two statistical indices. An attempt was made to separate medium- and long-range airborne pollen transport.
 Pollen grains from various plant taxa are frequently implicated in respiratory allergy symptoms. The pollen-induced health risks are of great importance, since a worldwide increase in respiratory diseases has been observed over the past few decades [D'Amato et al., 2001]. Moreover, there is evidence that high levels of traffic-derived air pollutants may interact with pollen and bring about more intense respiratory allergy symptoms [Motta et al., 2006]. Conditions prevailing in modern cities (higher industrial and traffic-related air pollution load) are the reason why people living in urban areas are more affected by pollen-related respiratory allergies than those living in rural areas [D'Amato et al., 2001]. It was found that pollen-related allergies appeared more frequently in industrial regions, while in the rural areas patients showed only a small predisposition to any allergy [Obtulowicz et al., 1996]. According to Hjelmroos et al. , there is an indication that traffic-related air pollution can modify the composition of pollen grains. Pollen allergens may be modulated by air pollutants, facilitating an interaction between pollen and pollutants in the atmosphere outside the organism, which in turn may affect allergy-relevant phenomena [Emberlin, 1995].
 Allergenic pollen of different taxa can exceed critical values over substantial parts of Europe. In the Mediterranean region, pollen from Urticaceae taxa is among the most allergenic in the air and is often implicated in respiratory allergies (Figure 1) [Gioulekas et al., 2004]. In central Europe, pollen from ragweed (Ambrosia spp.) is the most important cause of allergy-associated respiratory diseases. The role of the Hungarian Great Plain, with Szeged in its geographical center, is quite unique due to its worldwide highest diurnal and annual ragweed pollen counts per m3 of air (Figure 2) [Makra et al., 2005]. In northern Europe, pollen grains from birch (Betula) are an important cause of pollinosis. Depending on the region, pollen grains from the Poaceae family are also considered to be important (Figure 3) [Koivikko et al., 1986].
 The long-range transport of airborne allergenic pollen has been analyzed, for example, by Lorenzo et al.  in order to identify source areas. Studying airborne pollen transport may help to comprehend pollen count variations and accurately predict its atmospheric concentrations [Damialis et al., 2005]. Several authors used single backward trajectories to interpret particular pollution episodes [Hjelmroos, 1991; Franzen et al., 1994; Rousseau et al., 2004, 2008], but this approach provides a rough approximation to the exact track and source of an air mass. Trajectories arriving at a given location over a relatively long period can be analyzed by a cluster analysis in order to investigate patterns of data within a data set. Cluster analysis has been applied by many authors to classify individual atmospheric trajectories into a relatively small number of groups [Dorling et al., 1992]. However, this procedure has not yet been used for interpreting airborne pollen levels. Moreover, cluster analysis identifies clusters of data sets without providing information on cause-effect relationships. Nevertheless, these structures can help us to interpret airborne pollen data, for example, identify transport patterns. Of course, the reliability of the results using cluster analysis increases with the number of the trajectories analyzed [Borge et al., 2007].
 To our knowledge, no study has been done concerning a combined investigation of different geographical locations and their pollen profiles in terms of long-range transport effects. Furthermore, no studies have applied clusters of trajectories in order to identify long-range transport patterns that may affect the biological component of the air quality of given locations over long periods. High temporal resolution pollen concentration data, such as bihourly course of daily pollen counts, can help in determining the origin of measured pollen (a daytime peak involves a possible local origin, while night or early dawn peaks may entail long-range transport) [Šikoparija et al., 2009], but there is no information in the literature on how to separate medium-range pollen transport involving local pollen dispersion from long-range pollen transport and, hence, on determining the contributions of these two components on the measured daily pollen concentration. In the present study, atmospheric pollen concentrations of three taxa being among the most allergenic species and having high pollen levels in three European cities (Urticaceae pollen for Thessaloniki, Southern Europe (Figure 1); Ambrosia pollen for Szeged, central Europe (Figure 2), and Poaceae pollen for Hamburg, a gateway to northern Europe (Figure 3)) have been evaluated using clusters of backward trajectories developed with the Mahalanobis metric, using data gathered over a 5 year period.
 The aim of our analysis is to identify geographical key regions responsible for high pollen levels at individual city locations. Backward trajectories arriving at these sites are clustered during the pollen season for each pollen type in order to identify regions with the highest pollen concentrations. ANOVA is used to determine whether mean pollen concentrations corresponding to different regions differ significantly. The clustering is performed using three-dimensional (3-D) backward trajectories. Cluster-dependent occurrences that exceed some critical concentration threshold are also evaluated with two statistical indices. Last, an attempt is made to separate the medium-range pollen transport including local pollen dispersion from the long-range transport of pollen.
2. Materials and Methods
2.1. Study Areas and Pollen Data
 Five year daily pollen data (1998–2002) from three European cities, namely Thessaloniki (Greece), Szeged (Hungary) and Hamburg (Germany) were used. These cities differ in their topography and climate and in plant taxa presence and abundance. For each city we selected those taxa that contribute significantly to the total atmospheric pollen concentration and that also have high allergenicity. Pollen grains in all three cities were collected using 7 day recording volumetric spore traps of the Hirst design [Hirst, 1952]. The traps were placed on the roofs of buildings in the centers of Thessaloniki, Szeged and Hamburg approximately 30 m [Damialis et al., 2007], 20 m [Makra et al., 2005] and 25 m [Nowak et al., 1996] above ground level, respectively. The identification and counting of pollen grains were both performed using light microscopy. Counts were expressed as mean daily pollen concentrations (number of pollen grains per m3 of air). A detailed description of the methods for preparation and scanning of the microscope slides is given by Käpylä and Penttinen and P. Mandrioli (Method for sampling and counting of airborne pollen and fungal spores, http://www.pollenwarndienst.at/upload/images/original/1995.pdf). Daily meteorological data (mean temperature, mean global solar flux, mean relative humidity and daily precipitation total) for Thessaloniki (Université de Thessaloniki, 2003), Szeged (Environmental and Natural Protection and Water Conservancy Inspectorate of Lower-Tisza Region, Szeged, Hungary) and Hamburg (http://www.dwd.de/klimadaten) were also utilized.
 Thessaloniki (40.13°N; 22.15°E) is located to the north of the Thermaikos Gulf, in the Aegean Sea (Figure 1). The taxon considered for Thessaloniki was nettle (Urticaceae) (Figure 1), as 15.3% of allergic people are sensitive to nettle pollen [Gioulekas et al., 2004]. Szeged (46.25°N; 20.10°E), the largest settlement in southeastern Hungary is located at the confluence of the rivers Tisza and Maros (Figure 2). The taxon selected for Szeged is ragweed (Ambrosia spp.) (Figure 2), as the sensitivity of patients to ragweed is 83.7% [Kadocsa and Juhász, 2000]. Hamburg (52.62°N; 10.15°E) is the second-largest city in Germany with approximately 1.8 million people (Figure 3). The pollen type selected for Hamburg is grass pollen (Poaceae) (Figure 3), as 24% of the local inhabitants have a positive reaction to grass [Nowak et al., 1996].
 In order to analyze the impact of pollen on human health, critical threshold values are usually defined. Thresholds may be defined either with the aim of delimiting clinical symptoms in allergic patients (clinical thresholds) or by a statistical analysis of aerobiological data sets (statistical thresholds). As it is not known whether the clinical thresholds for the pollen species in question can be determined with the same method, we applied statistical thresholds for pollen count exceedances. The threshold is taken to be the upper quartile for each pollen species; that is, the relative frequency of pollen concentrations above this threshold is 25%. Although any threshold selection is subjective, this value seems appropriate because it is large enough to define high-concentration events and low enough to leave sufficient cases for statistical analysis.
2.2. Backward Trajectories
 Within the framework of ETEX (European Tracer Experiment) research, the performance of three large-scale Lagrangian dispersion models, namely CALPUFF 5.8, FLEXPART 6.2 and HYSPLIT 4.8, was compared. Based on four statistical scores (Figure of Merit in Space, Probability of Detection, False Alarm Rate and Threat Score/Critical Success Index), the HYSPLIT model achieved the best performance [Anderson, 2008]. We therefore used the HYSPLIT model in our study [Draxler and Hess, 1998].
 Since a single backward trajectory has a large uncertainty and is of limited significance [Stohl, 1998], trajectories arriving at heights h = 500, 1500 and 3000 m AMSL over a 5 year period from 1998 to 2002 were taken. The 4 day, 6-hourly 3-D backward trajectories arriving at the three locations at 1200 UT were used in order to describe the horizontal and vertical movements of an air parcel arriving at the above-mentioned cities. The period examined over years covers the days of the pollen season for each pollen type. The pollen season is defined by its start and end dates. For the start (end) of the season we used the first (last) date on which 1 pollen grain m−3 of air is recorded and at least 5 consecutive (preceding) days also show 1 or more pollen grains m−3 [Galán et al., 2001]. In the case of a given pollen type, the longest pollen season during the 5 year period was considered for each year.
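The season definition above can be illustrated with a minimal sketch. It assumes that "at least 5 consecutive (preceding) days" means the 5 days immediately following the start date (or preceding the end date). The function name `pollen_season`, its parameters and the daily counts are hypothetical and serve only as an illustration.

```python
def pollen_season(counts, threshold=1.0, run=5):
    """Return (start, end) day indices of the pollen season, or None.

    start: first day with >= threshold grains m-3 that is followed by
           `run` further days at or above the threshold.
    end:   last day with >= threshold grains m-3 that is preceded by
           `run` days at or above the threshold.
    """
    n = len(counts)
    above = [c >= threshold for c in counts]
    # Scan forward for the first qualifying window of run + 1 days.
    start = next((i for i in range(n - run)
                  if all(above[i:i + run + 1])), None)
    # Scan backward for the last qualifying window of run + 1 days.
    end = next((i for i in range(n - 1, run - 1, -1)
                if all(above[i - run:i + 1])), None)
    if start is None or end is None:
        return None
    return start, end


# Hypothetical daily concentrations (grains per m3 of air).
counts = [0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 2, 2, 2, 2, 2, 2, 0, 1]
season = pollen_season(counts)  # (3, 15)
```

Isolated pollen days before the first sustained run (day 1) and after the last one (day 17) are excluded from the season under this reading.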
2.3. Cluster Analysis
 Cluster analysis is a common statistical technique to objectively group elements such as atmospheric trajectories using a similarity measure. The aim is to maximize the homogeneity of elements within the clusters and to maximize the heterogeneity among the clusters. Here a nonhierarchical cluster analysis with k-means algorithm using a Mahalanobis metric [Mahalanobis, 1936] was applied. Data to be clustered include the 4 day, 6-hourly coordinates (ϕ, latitude; λ, longitude; and h, height AMSL) of the 3-D backward trajectories for each city.
 The homogeneity within clusters was measured by RMSD, defined as the sum of the root mean square deviations of cluster elements from the corresponding cluster center over clusters. The RMSD value usually decreases with an increasing number of clusters. Thus, this quantity itself is not very useful for deciding the optimal number of clusters. However, the change of RMSD (CRMSD) versus the change of cluster numbers, or even the change of CRMSD (CCRMSD), is much more informative. Therefore, working with cluster numbers from 15 to 1, an optimal cluster number was selected so as to maximize the change in CRMSD. The rationale behind this approach is that the number of clusters producing the largest improvement in cluster performance compared to that for a smaller number of clusters is considered optimal.
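One possible reading of this selection rule is sketched below. It assumes that CRMSD(k) is the RMSD reduction obtained when going from k − 1 to k clusters, and that CCRMSD(k) = CRMSD(k) − CRMSD(k + 1), so the chosen k is the last cluster number that still yields a large improvement. The text does not fix these sign and index conventions, so they are assumptions, as are the function name and the RMSD values.

```python
def select_cluster_number(rmsd):
    """Pick a cluster number from total RMSD values.

    rmsd: dict mapping consecutive cluster numbers k to the total RMSD
          obtained with k clusters.
    """
    ks = sorted(rmsd)
    # CRMSD(k): improvement gained by using k instead of k - 1 clusters.
    crmsd = {k: rmsd[k - 1] - rmsd[k] for k in ks[1:]}
    # CCRMSD(k): how much larger that gain is than the next one.
    ccrmsd = {k: crmsd[k] - crmsd[k + 1] for k in ks[1:-1]}
    best = max(ccrmsd, key=ccrmsd.get)
    return best, crmsd, ccrmsd


# Hypothetical RMSD values for 1 to 6 clusters.
rmsd = {1: 100, 2: 80, 3: 65, 4: 40, 5: 37, 6: 35}
best_k, crmsd, ccrmsd = select_cluster_number(rmsd)  # best_k == 4
```

In this example the step from 3 to 4 clusters reduces RMSD by 25, while adding a fifth cluster gains only 3, so four clusters are retained.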
 Clustering was performed on complete trajectories using their 4 day, 6-hourly coordinates for three arrival heights and three cities. Thus, altogether 1 × 3 × 3 = 9 clustering procedures were implemented. The results of cluster analysis are described and presented for each city for the lowest (h = 500 m) arrival height because the largest influence of backward trajectories on the pollen concentration is expected at this height. The separation of the backward trajectory clusters and preparation of Figures 4, 5, 7, 8, 10, and 11 for clusters of backward trajectories were both performed using a novel approach with the help of a function called “convhull.” The algorithm (qhull procedure; http://www.qhull.org) gathers the extreme trajectory positions (positions farthermost from the center) belonging to a cluster, which are then enclosed. More precisely, the procedure creates the smallest convex hull with minimum volume covering the backward trajectories of the clusters. Note that the term “convex hull” is commonly used in computational geometry for the boundary of the minimal convex set containing a given nonempty finite set of points in the plane [Preparata and Hong, 1977; Cormen et al., 2001].
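The delimitation step can be sketched in Python with `scipy.spatial.ConvexHull`, which also relies on the Qhull library. This is an illustration rather than the MATLAB "convhull" call used in the study. The trajectory positions are synthetic, and the choice of 17 positions per trajectory (4 days at 6 h steps plus the arrival point) is an assumption.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)

# Hypothetical cluster: 40 trajectories x 17 positions each,
# columns are latitude (deg), longitude (deg) and height (m AMSL).
points = rng.normal(loc=[46.0, 20.0, 1500.0],
                    scale=[5.0, 8.0, 400.0],
                    size=(40 * 17, 3))

# Smallest convex set enclosing every trajectory position of the cluster.
hull = ConvexHull(points)

# Extreme positions that define the hull surface.
hull_vertices = points[hull.vertices]

# hull.simplices lists the triangular facets, which can be passed to a
# 3-D plotting routine to draw a transparent envelope for the cluster.
n_facets = len(hull.simplices)
```

Because the coordinates mix degrees and meters, `hull.volume` has no direct physical meaning here; the hull serves only to delimit and display the cluster.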
Borge et al. used a two-stage clustering procedure. They observed that the original one-stage cluster analysis including all trajectories was strongly influenced by the trajectory length. Long trajectories representing fast-moving air masses were highly disaggregated, even though they often came from the same geographical region. Many short trajectories representing slow-moving air masses, however, were grouped together, although they came from very heterogeneous regions. Therefore, short trajectories were reanalyzed to identify new clusters (second stage). The second-stage analysis is, however, not necessary if the metric in the clustering procedure is non-Euclidean: the need for two steps vanishes when a Mahalanobis metric is used. The issue of a two-stage cluster analysis arises from the different standard deviations of the coordinates of trajectory points that are far and near in time. To illustrate the role of different standard deviations, consider a difference of 200 km in the position of a given trajectory point. Such a difference some 1500 km from the arrival point seems relatively minuscule, while the same difference is considered very large close to the arrival point of the trajectory.
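The effect of the metric can be illustrated with a minimal sketch. It assumes a single covariance matrix estimated from all trajectories, in which case k-means under the Mahalanobis metric is equivalent to ordinary Euclidean k-means on whitened coordinates. The functions `whiten` and `kmeans`, the plain Lloyd iteration and the synthetic data are illustrative assumptions, not the MATLAB procedure used in the study.

```python
import numpy as np


def whiten(X, eps=1e-10):
    """Rotate and scale X so that its sample covariance is the identity.

    Directions with (near) zero variance, such as the common arrival
    point of all trajectories, are dropped.
    """
    Xc = X - X.mean(axis=0)
    w, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
    keep = w > eps * w.max()
    return Xc @ V[:, keep] / np.sqrt(w[keep])


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd iteration with Euclidean distances on Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers


# Hypothetical feature vectors whose coordinates have very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 500.0])
Z = whiten(X)
labels, centers = kmeans(Z, k=3)
```

After whitening, a given displacement counts for more in coordinates with a small spread (positions close to the arrival point) than in coordinates with a large spread (positions far back in time), which is the property the argument above relies on.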
 Clustering with the k-means algorithm was performed by using MATLAB 7.5.0 software. Trajectory clusters are projected on a stereographic polar plane supported by HYSPLIT [Taylor, 1997].
2.4. Analysis of Variance
 One-way analysis of variance (ANOVA) was used to determine whether the intergroup variance is significantly higher than the intragroup variance. After performing ANOVA on the averages of the groups in question, a post hoc Tukey test was applied to establish which groups differed significantly from each other [Tukey, 1985]. The results of ANOVA may clarify the possible role of long-range transport on local pollen levels. Significant differences between mean pollen concentrations of different cluster pairs may reveal an important influence of the origin and transport of air masses on local pollen levels.
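As a brief illustration of the comparison of intergroup and intragroup variance, the classical one-way ANOVA F statistic can be computed as follows. The group values are hypothetical, and the sketch covers only the textbook F ratio, not the post hoc Tukey test.

```python
def one_way_anova_f(groups):
    """Classical one-way ANOVA F statistic for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Intergroup sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Intragroup sum of squares: spread of values around their group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical standardized pollen values for three trajectory clusters.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_anova_f(groups)  # approximately 13
```

The statistic is then compared with the F distribution with k − 1 and N − k degrees of freedom, where k is the number of clusters and N the total number of days.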
 Pollen concentration like any variable driven by meteorological factors exhibits a strong annual course; moreover, a certain period of the year is free of pollen. Therefore, before applying ANOVA, the annual course of pollen data was removed and standardized data sets were used thereafter. Standardized data sets are free of annual cycles, guaranteeing that distinguishing between average pollen levels corresponding to trajectory types is due to the types themselves and is not related to periods of the year. Denoting the logarithm of a pollen data set by $x_t$, $t = 1, \ldots, n$, the expected value function $m(t)$ of $x_t$ is approximated by a linear combination of cosine and sine functions with periods of 1 year and one-half year. Namely, $m(t) = a_0 + a_1 \cos(w_1 t) + a_2 \cos(w_2 t) + b_1 \sin(w_1 t) + b_2 \sin(w_2 t)$ with $w_1 = 2\pi/365.25$ and $w_2 = 2w_1$. Unknown coefficients in this linear combination were estimated via the least squares technique. Then the standardized, and thus annual course-free data set is $y_t = \exp((x_t - m(t))/d(t))$, $t = 1, \ldots, n$, where unknown coefficients in $d^2(t) = a_0 + a_1 \cos(w_1 t) + a_2 \cos(w_2 t) + b_1 \sin(w_1 t) + b_2 \sin(w_2 t)$ are estimated as those in $m(t)$, except that $x_t$ has been replaced by $x_t^* = (x_t - m(t))^2$, $t = 1, \ldots, n$ when applying the least squares technique. The logarithm transformation is applied to reduce the high variability of pollen concentration data and thus to ensure a better performance of the least squares technique than for the raw data. This step of the procedure assumes that the probability distribution of aerobiological data can be well approximated by lognormal distributions [Limpert et al., 2008]. Note that standardized data values are dimensionless.
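The standardization can be sketched with ordinary least squares in NumPy. The sketch assumes strictly positive daily concentrations (so that the logarithm exists) and a day index t counted in days. The function names, the small positive floor applied to the fitted variance and the synthetic series are illustrative assumptions.

```python
import numpy as np


def harmonic_design(t):
    """Design matrix with a constant and annual and semiannual harmonics."""
    w1 = 2 * np.pi / 365.25
    w2 = 2 * w1
    return np.column_stack([np.ones_like(t),
                            np.cos(w1 * t), np.cos(w2 * t),
                            np.sin(w1 * t), np.sin(w2 * t)])


def remove_annual_course(pollen, t):
    """Return y_t = exp((x_t - m(t)) / d(t)) with x_t = log(pollen)."""
    x = np.log(pollen)
    A = harmonic_design(t)
    # Fit the expected value function m(t) by least squares.
    coef_m, *_ = np.linalg.lstsq(A, x, rcond=None)
    m = A @ coef_m
    # Fit d^2(t) the same way, using squared residuals as the response.
    coef_d, *_ = np.linalg.lstsq(A, (x - m) ** 2, rcond=None)
    # Assumption: guard against a non-positive fitted variance.
    d = np.sqrt(np.clip(A @ coef_d, 1e-12, None))
    return np.exp((x - m) / d)


# Synthetic two-year series with an annual cycle (illustration only).
rng = np.random.default_rng(0)
t = np.arange(730, dtype=float)
pollen = np.exp(2.0 + 1.5 * np.cos(2 * np.pi * t / 365.25)
                + rng.normal(scale=0.5, size=t.size))
y = remove_annual_course(pollen, t)
```

The logarithm of the resulting series has roughly zero mean and unit variance throughout the year, so values from different parts of the season become comparable.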
 The variability of daily pollen concentrations for each city was analyzed in order to discover whether the clusters had any influence on the pollen levels. An F test was used to check whether the difference among the cluster-averaged pollen concentrations was significant. If a significant difference among cluster-averaged pollen concentrations was found by ANOVA, then the Tukey test was carried out to identify those cluster pairs that were associated with significantly different pollen grain averages. There are several methods available for comparing means calculated from subsamples of a sample. A relatively simple but effective way is to use the Tukey test [Tukey, 1985]. This test performs well in terms of both the accumulation of type I errors and the test power. Like any other such test, it assumes the statistical independence of data. Consecutive pollen data, however, may be correlated and produce higher variances of the estimated means compared to uncorrelated data. The classical Tukey test was therefore modified using the variances of estimated means obtained with the help of autoregressive (AR) models fitted to data for each cluster separately.
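The effect of autocorrelation on the variance of an estimated mean can be illustrated for the simplest case. The sketch assumes a first-order model, AR(1), for which the large-sample variance of the mean is (σ²/n)(1 + φ)/(1 − φ). The order of the AR models fitted in the study is not stated, so this choice, the function name and the simulated series are assumptions.

```python
import numpy as np


def ar1_variance_of_mean(y):
    """Large-sample variance of the sample mean under an AR(1) model.

    Returns the adjusted variance and the estimated lag-1 coefficient.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    dev = y - y.mean()
    var = dev @ dev / (n - 1)
    # Lag-1 autocorrelation as the AR(1) coefficient estimate.
    phi = (dev[1:] @ dev[:-1]) / (dev @ dev)
    return var / n * (1 + phi) / (1 - phi), phi


# Simulated AR(1) series with coefficient 0.6 (illustration only).
rng = np.random.default_rng(0)
n = 5000
noise = rng.normal(size=n)
y = np.empty(n)
y[0] = noise[0]
for i in range(1, n):
    y[i] = 0.6 * y[i - 1] + noise[i]

var_mean, phi_hat = ar1_variance_of_mean(y)
naive_var_mean = y.var(ddof=1) / n
```

With positive autocorrelation the adjusted variance exceeds the naive value σ²/n, so a test that ignores the correlation would declare differences between cluster means significant too readily.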
2.5. Factor Analysis and Special Transformation
 Factor analysis (FA) identifies linear relationships among subsets of examined variables, which helps to reduce the dimensionality of the initial database without any substantial loss of information. First, a factor analysis was applied to the initial standardized data set consisting of 8 variables (3 climatic and 5 trajectory variables introduced in section 4) in order to transform the original variables to fewer variables. These new variables called factors can be viewed as the main climate/trajectory functions that potentially influence daily pollen concentration for the cities in question. The optimum number of retained factors is determined by the criterion of reaching a prespecified percentage of the total variance [Jolliffe, 1993]. This percentage value was set at 80% in our case. After performing a factor analysis, a special transformation of the retained factors was performed to discover to what degree the above-mentioned explanatory variables (3 climatic and 5 trajectory variables) affect the resultant variable (daily pollen concentration), and to give a rank of their influence [Jolliffe, 1993; Jahn and Vahle, 1968].
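The retention criterion can be sketched as follows, assuming a principal-component type extraction from the correlation matrix of the standardized variables. The extraction method is not specified in the text, so this is an assumption, and the function name and the synthetic 8-variable data set are hypothetical.

```python
import numpy as np


def n_factors_for_variance(X, target=0.80):
    """Smallest number of factors whose cumulative variance share
    reaches `target`, based on eigenvalues of the correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # sorted in descending order
    share = np.cumsum(eigvals) / eigvals.sum()
    # First position at which the cumulative share is >= target.
    return int(np.searchsorted(share, target) + 1), share


# Hypothetical data: 8 observed variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 8))
X = latent @ loadings + 0.3 * rng.normal(size=(500, 8))

n_factors, share = n_factors_for_variance(X)
```

Here `share[n_factors - 1]` is the fraction of total variance explained by the retained factors, which is at least 80% by construction.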
2.6. Statistical Characterization of Pollen Exceedance Episodes
 The role of long-range transport was studied by analyzing cluster occurrence on days when 24 h pollen concentrations exceeded a threshold value. This threshold was taken to be the upper quartile for each pollen species; that is, the relative frequency of pollen concentrations above this threshold is 25%. Although any threshold selection is subjective, this value seems appropriate because it is large enough to define high-concentration events and low enough to leave sufficient cases for statistical analysis. Based on the 5 year standardized data sets, this threshold value for Urticaceae in Thessaloniki, Ambrosia in Szeged and Poaceae in Hamburg is 2.05, 2.42 and 2.04, respectively.
 Two statistical indices related to the probability (INDEX1: probability of exceedance under cluster i) and frequency (INDEX2: probability of cluster i occurring on pollen exceedance days) of daily pollen exceedance episodes associated with different clusters of trajectories were calculated as by Borge et al. for each chosen pollen type. For a given site and cluster, INDEX1 is defined as
$$\mathrm{INDEX1}_i = \frac{D(>x)_i}{D_i} \times 100,$$
where $D_i$ is the number of days for which a backward trajectory belonging to cluster i is present (namely, the number of occurrences of cluster i), and $D(>x)_i$ is the number of 24 h pollen exceedances with occurrence of cluster i. INDEX1 tells us the likelihood of an exceedance occurring for a given cluster. INDEX2 is defined as
$$\mathrm{INDEX2}_i = \frac{D(>x)_i}{E} \times 100,$$
where E is the total number of pollen exceedance days recorded at a given site. INDEX2 may be interpreted as the likelihood of a certain trajectory cluster being present on a pollen exceedance day. Both indices are expressed as percentages.
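A short worked example may clarify the two indices. It assumes that both are expressed as percentages: INDEX1 for cluster i is the number of exceedance days under cluster i divided by the number of days with cluster i, and INDEX2 is the same count divided by the total number of exceedance days E. The function name and the daily values are hypothetical.

```python
from collections import Counter


def exceedance_indices(clusters, values, threshold):
    """Return (INDEX1, INDEX2) per cluster, both in percent."""
    days = Counter(clusters)                      # D_i
    exceed = Counter(c for c, v in zip(clusters, values)
                     if v > threshold)            # D(>x)_i
    total_exceed = sum(exceed.values())           # E
    index1 = {c: 100.0 * exceed[c] / days[c] for c in days}
    index2 = {c: (100.0 * exceed[c] / total_exceed if total_exceed else 0.0)
              for c in days}
    return index1, index2


# Hypothetical cluster membership and standardized pollen value per day.
clusters = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3]
values = [3.0, 0.5, 2.5, 1.0, 2.2, 2.1, 0.3, 0.4, 2.6, 0.1]
index1, index2 = exceedance_indices(clusters, values, threshold=2.05)
# index1 == {1: 50.0, 2: 100.0, 3: 25.0}
# index2 == {1: 40.0, 2: 40.0, 3: 20.0}
```

Cluster 2 has the highest INDEX1 because both of its days exceed the threshold, yet its INDEX2 only equals that of cluster 1, since INDEX2 also depends on how often a cluster occurs. The INDEX2 values sum to 100% over all clusters.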
 The 3-D clustering produced seven clusters based on a change in CRMSD (CCRMSD). For Urticaceae pollen all 3-D clusters of backward trajectories are presented, namely all trajectories with color-coded clusters (Figure 4, top left), all clusters without trajectories but with their 3-D convex hulls of different colors for the top view (Figure 4, top right) and all trajectory clusters enclosed by their smallest transparent convex hull (Figure 4, middle left) as well as their 90° rotated version (Figure 4, middle right). A vertical extension of the trajectory clusters enclosed by their transparent 3-D convex hulls has also been added (Figure 4, bottom).
 Clusters 1 and 6 including the largest number of trajectories are the most compact and are associated with air masses arriving from northern/northeastern Europe and western Europe, respectively. Cluster 3, being the third most frequent cluster, is characterized mostly by air currents originating from northern Europe. However, it exhibits a bigger variability in the origin of the trajectories compared to those of clusters 1 and 6. Clusters 2, 4 and 7, with transport patterns coming from western and northwestern Europe, are the most heterogeneous. They cover the largest regions with the longest trajectories associated with the fastest movements of the air along trajectories. Cluster 5 is also heterogeneous and is covered by the smallest convex hull that includes the shortest trajectories with the slowest air movements over southern Europe (Figure 5).
 Characteristics of Urticaceae pollen are tabulated according to the retained clusters (Table 1). Pairwise comparisons of the cluster averages using the Tukey test found 5 significant differences among the possible 21 cluster pairs (23.8%) (Figure 6). Only clusters of the above-mentioned significantly different cluster averaged pollen levels were then considered and analyzed (Table 1 and Figure 6). Mean Urticaceae pollen level of cluster 2 differs significantly from that of clusters 4 and 7; furthermore, the mean pollen concentration of cluster 5 differs significantly from those of clusters 3, 4 and 7. Note that cluster 6 has the highest mean pollen concentration that, however, does not differ significantly from any other cluster mean. This is because the test statistic depends not only on means to be compared, but also on amounts, variances and autocorrelations of these data. The mean pollen levels of clusters 2 and 5 are higher in each comparison. Therefore, high Urticaceae pollen levels can be clearly associated with air masses arriving at Thessaloniki from western Europe (cluster 2), as well as with a region that includes the shortest trajectories over the Mediterranean (cluster 5), implying that a substantial part of Urticaceae pollen is of local origin. The highest pollen concentration is observed in cluster 6, possibly transporting a substantial amount of pollen from western Europe. On the other hand, Urticaceae pollen levels are low if air masses come to Thessaloniki from northern/northeastern Europe (cluster 1), northern Europe (cluster 3) or northwestern Europe (clusters 4 and 7). The results obtained suggest that, contrary to the widespread occurrences of Urticaceae species all over Europe (Figure 1), the bulk of major pollen transported to the region arrives from the West (clusters 2 and 6) and is of local origin (cluster 5). 
The role of clusters 2, 5 and 6 in pollen transport is also supported by their slow-moving trajectories (Figure 4, bottom, and Figure 5).
Table 1. Parameters of Standardized Urticaceae Pollen Concentrations for the Individual Clusters, Thessaloniki^a
^a With h = 500 m. Bold denotes maximum; italic denotes minimum.
Rows (one column per cluster): Mean (pollen grain m−3); Standard deviation (pollen grain m−3); Number of trajectories.
 The daily pollen concentrations of Urticaceae pollen exceeded their threshold value of 2.05 on 277 days (Figure 13). Clusters 5 and 2 have the highest INDEX1 values, namely 44.2% and 33.3%, respectively. They provide the second and third highest mean pollen levels and so they are considered important clusters that contribute to the local pollen levels. The high INDEX1 value of these two clusters is in agreement with their high mean pollen concentrations. Their highest standard deviation confirms a higher chance for extreme pollen episodes for these clusters. The highest INDEX2 values belong to cluster 1 (25.6%) and cluster 6 (20.6%) and are characterized by medium and the highest mean pollen levels, respectively. Note that INDEX1 and INDEX2 are not independent parameters. When a cluster is frequent a high INDEX1 value means a high INDEX2 value, which is the case for clusters 1 and 6. Furthermore, the slow moving backward trajectories of clusters 1 and 6 (Figure 4, bottom, and Figure 5) significantly contribute to the transport and the high standard deviation of these clusters indicates a higher frequency of days with extreme pollen concentrations. Cluster 3 with air currents coming from northern Europe as well as clusters 4 and 7 with backward trajectories of western European origin are characterized by the lowest INDEX1 and INDEX2 values, which is in agreement with their lowest mean pollen levels. These clusters are the least important for determining the mean pollen level in Thessaloniki.
 Eight clusters were retained in a 3-D analysis based on CCRMSD. For Ambrosia pollen all 3-D backward trajectories are displayed simultaneously, i.e., all the trajectories are color coded (Figure 7, top left), all clusters without trajectories but with their 3-D convex hulls of different colors for the top view (Figure 7, top right) and all trajectory clusters delimited by their smallest transparent convex hull (Figure 7, middle left) as well as their 90° rotation (Figure 7, middle right). Furthermore, a vertical extension of the trajectory clusters delimited by their transparent 3-D convex hulls is presented (Figure 7, bottom).
 In clusters 1, 2 and 3 most trajectories originate from northwestern Europe, northeastern Europe and western Europe, respectively. They are long and, hence, fast-moving trajectories, but the fastest trajectories are in cluster 8. They transport air masses from western Europe and several trajectories are of mid-Atlantic origin, corresponding to very fast advection. Most trajectories of cluster 4, covering a large region, come from northern Europe. Cluster 5, with the highest number of trajectories, is heterogeneous and is associated with different circulation patterns over northern and eastern Europe. Clusters 6 and 7 consist of the shortest, i.e., the slowest, trajectories over southern and central Europe, respectively (Figure 8).
 In Table 2, the characteristics of Ambrosia pollen are summarized according to the retained clusters. Applying the Tukey test, 5 significant differences were detected among the possible 28 cluster pairs (17.9%) and are listed in Figure 9. Similar to the case of Thessaloniki, only clusters associated with significantly different means were then analyzed (see Table 2 and Figure 9). The average Ambrosia pollen level for cluster 5 differs significantly from the average values of clusters 1, 2, 3, 7 and 8. The highest mean Ambrosia pollen concentrations can be linked to the occurrences of cluster 7 (Carpathian basin, highest concentration), as well as to the occurrences of cluster 2 (northeastern Europe), cluster 4 (northern Europe) and cluster 1 (northwestern Europe). However, Ambrosia pollen levels are low if air masses are transported to the target area from western Europe (cluster 3, smallest concentration; cluster 8), northern and eastern Europe (the heterogeneous cluster 5) and southern Europe (cluster 6).
Table 2. Parameters of Standardized Ambrosia Pollen Concentrations for the Individual Clusters, Szeged^a
^a With h = 500 m. Bold denotes maximum; italic denotes minimum.
Rows (one column per cluster): Mean (pollen grain m−3); Standard deviation (pollen grain m−3); Number of trajectories.
Table 3. Parameters of Standardized Poaceae Pollen Concentrations for the Individual Clusters, Hamburg^a
^a With h = 500 m. Bold denotes maximum; italic denotes minimum.
Rows (one column per cluster): Mean (pollen grain m−3); Standard deviation (pollen grain m−3); Number of trajectories.
 Cluster 7 contains the shortest backward trajectories and it is rather typical over the Carpathian basin, which represents a specific key region for Ambrosia spp. Hence it may be regarded as a local cluster for Szeged. Trajectories in clusters 1, 2, and 4 (northwestern Europe, northeastern Europe and northern Europe, respectively) indicating very high pollen levels at the target site are associated with low concentrations at their origin, but passing over the Carpathian basin they add other sources to local pollen levels. Furthermore, clusters 2 and 4 in particular have slow-moving backward trajectories that favor transporting large amounts of pollen to the target area (Figure 7, bottom, and Figure 8). The transport of pollen by trajectories arriving from western Europe (clusters 3 and 8), southern Europe (cluster 6) and by trajectories of the heterogeneous cluster 5 is not as characteristic as transport from northwestern through northern to eastern European directions due to the different occurrence of ragweed in Europe (Figure 2). Furthermore, clusters 3 and 5 include high-moving trajectories (Figure 7, bottom, and Figure 8), which permit the transport of only a small amount of pollen. Accordingly, smaller long-range transport by the trajectories of these clusters results in smaller local pollen levels in Szeged.
 The daily pollen concentrations of Ambrosia spp. exceeded their limit value (2.42) on 148 days (Figure 13). The highest INDEX1 values are associated with cluster 1 (34.5%) and cluster 2 (32.1%). This result agrees with the fact that these clusters have very high mean pollen levels with high standard deviations (Table 2). Cluster 7 provides the third highest INDEX1 (31.1%) and the highest INDEX2 (25.0%). These very high index values reflect the fact that the Carpathian basin has the highest ragweed pollen load in Europe each summer [Makra et al., 2005]. Clusters 3 and 6 have very low pollen levels, in accordance with their lowest INDEX1 and INDEX2 values. The standard deviation of the daily pollen concentration is lowest for these clusters, indicating a low likelihood of extreme daily pollen levels.
 Seven clusters were identified for Hamburg based on CCRMSD. The 3-D backward trajectories for Poaceae pollen are shown in four views: all trajectories color-coded by cluster (Figure 10, top left); a top view of the clusters without backward trajectories, each represented by its 3-D convex hull in a different color (Figure 10, top right); all trajectory clusters enclosed by their smallest transparent convex hulls (Figure 10, middle left); and the same view rotated by 90° (Figure 10, middle right). In addition, the vertical extent of the trajectory clusters enclosed by their transparent 3-D convex hulls is shown (Figure 10, bottom).
 The trajectories of cluster 1 are fast and are mostly of mid-Atlantic origin (region of 50°N–60°N latitudes). The backward trajectories of cluster 2, being shorter and hence slower than those of cluster 1, arrive from the eastern mid-Atlantics. Cluster 3 transports air masses from northwestern Europe, while cluster 4 is heterogeneous but its trajectories have a definite northeastern component. Cluster 5 consists of the shortest, slowest-moving trajectories, which clearly originate from northern Europe. Cluster 6 contains the longest/fastest trajectories arriving from the western/southwestern direction (mid-Atlantic region, 40°N–50°N latitudes). Cluster 7 is a heterogeneous group with trajectories mainly having a northern European component (Figure 11).
 In Table 3, the characteristics of Poaceae pollen are presented for each retained cluster. The Tukey test revealed only 2 significant differences among the possible 21 cluster pairs (9.5%) (Figure 12). Hereafter, only clusters with significantly different mean pollen levels are analyzed in detail (Table 3 and Figure 12). The Poaceae pollen level of cluster 5 differs significantly from those of clusters 2 and 6 (Table 3 and Figure 12). The clusters with the highest Poaceae pollen levels represent very similar west-to-east backward trajectory directions: cluster 2 (western Europe involving eastern mid-Atlantics, third highest concentration); cluster 6, which has the longest and, consequently, fastest trajectories from the mid-Atlantic, running across western/southwestern Europe before reaching Hamburg (highest concentration); and cluster 7, a heterogeneous group involving mainly a western European component (second highest concentration). The backward trajectories of clusters 2, 6 and 7, originating from the northeastern Atlantics or mid-Atlantics, travel across France, Belgium, the Netherlands and northwestern Germany, the regions most polluted with Poaceae pollen, and hence they can transport large amounts of pollen to Hamburg. Clusters 2, 6 and 7 are characterized by slow-moving backward trajectories that help transport large amounts of pollen to the target area. Furthermore, cluster 2 has the highest frequency; hence it makes a significant contribution to local pollen levels. In contrast, Poaceae pollen levels are low if air masses arrive at Hamburg from northwestern Europe (clusters 1 and 3), northeastern Europe (cluster 4) and northern Europe (cluster 5, smallest concentration). Trajectories from the northern mid-Atlantics region cannot transport pollen to Hamburg (clusters 1 and 3). In northeastern Europe there is no habitat for Poaceae (cluster 4); consequently, pollen cannot be transported from there to Hamburg. In northern Europe only a small region is polluted by Poaceae pollen, thus pollen transport from this area to Hamburg is negligible (cluster 5).
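The pair counts quoted for the Tukey tests follow from the number of retained clusters: n clusters give n(n − 1)/2 unordered pairs. The following minimal Python sketch reproduces the reported counts and percentages for Szeged (8 clusters, 5 significant pairs) and Hamburg (7 clusters, 2 significant pairs). The function name `pair_summary` is introduced here for illustration only; the Tukey test itself is not performed.

```python
from itertools import combinations
from math import comb


def pair_summary(n_clusters, n_significant):
    """Return (number of cluster pairs, percent of pairs that differ significantly)."""
    n_pairs = comb(n_clusters, 2)
    return n_pairs, round(100.0 * n_significant / n_pairs, 1)


# Szeged: 8 clusters, 5 significant pairs; Hamburg: 7 clusters, 2 significant pairs.
szeged = pair_summary(8, 5)
hamburg = pair_summary(7, 2)

# The two Hamburg pairs reported as significant (cluster 5 versus clusters 2 and 6)
# are members of the full list of unordered cluster pairs.
hamburg_pairs = list(combinations(range(1, 8), 2))
significant_pairs = [(2, 5), (5, 6)]
assert all(pair in hamburg_pairs for pair in significant_pairs)

print(szeged, hamburg)
```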
 The daily pollen concentrations of Poaceae exceeded their limit value (2.04) on 189 days (Figure 13). The parameters of cluster 2 (second highest INDEX1 value, 33.8%; highest INDEX2 value, 25.4%) and cluster 6 (highest INDEX1 value, 34.4%; second highest INDEX2 value, 16.4%) are in agreement with the high mean pollen levels of these clusters. The high index values of these clusters, relating to the high variability of daily pollen concentrations, are supported by their highest standard deviations. The lowest INDEX1 and INDEX2 values are associated with clusters 1, 3, 4 and 5, which are also the clusters with the smallest mean pollen levels and the smallest standard deviations.
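The two exceedance indices are defined outside this section, so the sketch below rests on an assumed reading: INDEX1 is taken as the share of a cluster's days on which the limit value is exceeded, and INDEX2 as the cluster's share of all exceedance days. Under this reading, an INDEX2 of 25.4% with 189 exceedance days would correspond to about 48 days. The function name, the toy labels and the toy concentrations are hypothetical.

```python
from collections import Counter


def exceedance_indices(cluster_labels, concentrations, limit):
    """Per-cluster exceedance percentages under the assumed definitions.

    INDEX1: exceedance days in the cluster / all days in the cluster.
    INDEX2: exceedance days in the cluster / all exceedance days.
    """
    days_per_cluster = Counter(cluster_labels)
    exceed_per_cluster = Counter(
        c for c, x in zip(cluster_labels, concentrations) if x > limit
    )
    total_exceed = sum(exceed_per_cluster.values())
    result = {}
    for cluster, n_days in sorted(days_per_cluster.items()):
        n_exc = exceed_per_cluster[cluster]
        index1 = 100.0 * n_exc / n_days
        index2 = 100.0 * n_exc / total_exceed if total_exceed else 0.0
        result[cluster] = (index1, index2)
    return result


# Hypothetical toy data: one cluster label and one standardized concentration per day.
labels = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
values = [0.5, 2.5, 0.1, 3.0, 0.2, 0.3, 2.1, 0.4, 0.0, 0.6]
indices = exceedance_indices(labels, values, limit=2.04)
for cluster, (index1, index2) in indices.items():
    print(f"cluster {cluster}: INDEX1 = {index1:.1f}%, INDEX2 = {index2:.1f}%")
```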
4. Discussion and Conclusions
 A cluster analysis was applied to 4 day, 6-hourly backward trajectories arriving at Thessaloniki, Szeged and Hamburg over a 5 year period in order to identify the main atmospheric circulation pathways influencing pollen levels at these sites. The Mahalanobis metric was used, which avoids the need for the two-stage cluster analysis introduced by Borge et al. The 3-D delimitation of the backward trajectories for the clusters is a novelty.
 When determining important clusters that mainly influence pollen levels, the following aspects were considered: (1) the average pollen level of a given cluster should differ significantly from that of another cluster, (2) the average of the given cluster should be high, and (3) the INDEX1 value and/or INDEX2 value of the given cluster should be high. Two additional factors might be important, namely the given cluster should have a high frequency and the given cluster should have low-level backward trajectories.
 For Thessaloniki, despite the widespread occurrence of Urticaceae species all over Europe, the major pollen transport can be clearly associated with air masses arriving from western Europe (cluster 2). Moreover, the highest pollen concentration is observed in cluster 5, which consists of the shortest trajectories over the Mediterranean; this suggests that a substantial part of Urticaceae pollen is of local origin. The role of clusters 2 and 5 in pollen transport is emphasized by their highest mean pollen levels, highest INDEX1 values, slow-moving trajectories and high variability of daily pollen levels, which favors the occurrence of days with extreme pollen episodes. On the other hand, Urticaceae pollen levels are low if air masses reach Thessaloniki from northern/northeastern Europe (cluster 1), northern Europe (cluster 3) and northwestern Europe (clusters 4 and 7).
 For Szeged, cluster 7, which includes the shortest backward trajectories, has the highest pollen levels; its trajectories remain largely over the Carpathian basin, a key region for Ambrosia spp. Furthermore, trajectories in clusters 1, 2, and 4 (northwestern Europe, northeastern Europe and northern Europe, respectively) are associated with very high pollen levels at the target site despite low concentrations at their origin; traveling several hundred kilometers over the Carpathian basin, they pick up pollen from additional sources that contribute to the local pollen levels. The highest mean pollen levels, the slow-moving trajectories (of clusters 2 and 4 in particular), high INDEX1 and INDEX2 values, and high standard deviations, which indicate higher variability and hence a higher chance of daily extreme pollen episodes, highlight the importance of these clusters in pollen transport. Ambrosia pollen levels are low if backward trajectories arrive from western Europe (clusters 3 and 8), southern Europe (cluster 6) and from the heterogeneous cluster 5 without any clear direction. The amount of transported pollen is strongly influenced by the source region of the trajectories, as the occurrence of ragweed varies across Europe.
 For Hamburg, the highest Poaceae pollen levels are associated with cluster 2 (western Europe involving the eastern mid-Atlantics), cluster 6 (the mid-Atlantic region with western/southwestern air currents) and cluster 7 (a heterogeneous group with a large western European component). Backward trajectories corresponding to these clusters travel across France, Belgium, the Netherlands and northwestern Germany, the regions most polluted with Poaceae pollen, and hence they can transport large amounts of pollen to Hamburg. The role of these clusters in pollen transport is confirmed by their highest mean pollen levels, slow-moving backward trajectories, highest INDEX1 values and also the highest standard deviations, indicating the potential for more extreme daily pollen episodes. In contrast, Poaceae pollen levels are low if air masses arrive at Hamburg from northwestern Europe (clusters 1 and 3), northeastern Europe (cluster 4) and northern Europe (cluster 5). The reason for the low-level transport is that these source areas, being either over the Atlantic Ocean or over northern Europe, have negligible pollen levels.
 The main results show that areas where the species of interest are very common coincide with the above-mentioned key regions, which suggests that these regions characterize the distant sources. Members of the Urticaceae family are prevalent all over Europe (Figure 1), while members of the Poaceae family are also widespread in Europe except for northern Europe (Figure 3). The spread of Ambrosia species is more limited (Figure 2). In Europe, the three main habitat regions of Ambrosia are the Carpathian Basin with peak values in Hungary [Makra et al., 2005], the Rhône-Alpes region in France [Laaidi et al., 2003], and the western part of the Po River Plain, i.e., southern Lombardy in Italy [Mandrioli et al., 1998]. In eastern Europe, low and very low Ambrosia pollen levels are observed, while northern Europe is free of this pollen (Figure 2). According to these findings, the areas where the species of interest are very common coincide with the main habitat regions covered by the trajectory clusters developed.
 After classifying objective groups of back trajectories and, in this way, detecting the main circulation pathways for the cities in question, it is important to separate “local” and “transported” components of the actual pollen levels or, in other words, it is necessary to determine the relative weight of these two components in the measured pollen concentration. In this section a heuristic approximation is performed in order to separate medium and long-range airborne pollen transport for each given city.
 Transport can be classified by range into three groups. Short-range transport, corresponding to the local component of the pollen concentration, is very limited in time and space and does not take more than an hour. During this period meteorological conditions do not change significantly. Transport in this group extends from about 100 m to 1 km. For medium-range transport the appropriate time scale is 1 day and the space scale is from 1 to 100 km, while long-range transport covers several days and can be so extensive that it can be considered transcontinental circulation [Rantio-Lehtimäki, 1994].
 Short-range transport is only affected by local meteorological parameters. Pollen measuring networks minimize local effects by standard sampling at a height of 10–30 m above ground level. (If the trap were at ground level, it would mainly collect pollen from its immediate vicinity and the results from different sites would not be comparable.) With low wind speeds (v ≈ 0.7–0.8 m s−1) a trap installed at this height collects pollen from a region of 60–70 km radius, while strong winds can transport pollen to the trap from much farther away [Makra et al., 2008]. In this way, the pollen levels measured are relevant to medium- and long-range transport.
 Accepting the classification of Rantio-Lehtimäki [1994] concerning transport ranges implies that a 24 h pollen sampling covers the 100 km limit of medium-range transport if the wind speed is around 1.2 m s−1. This limit value is close to the above-mentioned range of 60–70 km radius for trapping pollen during a 24 h period. Accordingly, we discriminate only two categories of pollen transport, namely, medium-range transport (which contains a mix of local and more distant pollen sources gathered on the wind) and long-range transport (with an important role of wind parameters). A similar concept is outlined in the work of Gassmann and Pérez, who discriminate regional and extraregional pollen transport. It is very hard to distinguish the medium- and long-range transport of pollen, and this is why no attempt has been made to determine the relative weight of these two components in the measured pollen concentration.
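The distances quoted in this argument follow from distance = wind speed × time, assuming a constant wind speed and a straight path over the 24 h sampling period (a simplification made here for illustration). A minimal Python sketch of the arithmetic, with a function name chosen for this example:

```python
def travel_distance_km(wind_speed_m_s, hours=24.0):
    """Distance covered at a constant wind speed over the given time, in km."""
    return wind_speed_m_s * hours * 3600.0 / 1000.0


# Wind speeds mentioned in the text: 0.7-0.8 m/s (trap catchment) and 1.2 m/s.
for v in (0.7, 0.8, 1.2):
    print(f"{v:.1f} m/s over 24 h -> {travel_distance_km(v):.1f} km")
```

At 0.7–0.8 m s−1 the result is roughly 60–69 km, matching the 60–70 km radius, and at 1.2 m s−1 it is roughly 104 km, close to the 100 km limit of medium-range transport.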
 Therefore, we selected those days in the pollen season when 24 h mean pollen concentrations exceeded the upper quartile for each given pollen species and city. Pollen dispersion on a given day is substantially influenced by whether it is a nonrainy or rainy day. Accordingly, the arrival days of the backward trajectories were divided into two groups, i.e., nonrainy and rainy days. This kind of classification of days reveals the role of precipitation in the quantity of transported pollen [Spieksma and den Tonkelaar, 1986; Galán et al., 2000]. After performing a factor analysis, a special transformation was carried out for each city with the two groups and the 500 m, 1500 m and 3000 m arrival heights of the backward trajectories. Thus, altogether 3 × 2 × 3 = 18 procedures were performed. The number of retained factors was set by the criterion of reaching at least 80% of the total variance in the original variables [Jolliffe, 1993]. The main conclusions are as follows. For Thessaloniki on nonrainy days, long-range transport has a higher impact on daily pollen concentration, while on rainy days it is medium-range transport that has a higher impact. For Szeged on nonrainy days, medium-range transport is important, while on rainy days the two transport ranges have almost equal weights, with medium-range transport having a slightly higher value. For Hamburg, long-range transport has a primary role on both nonrainy and rainy days.
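The bookkeeping of this step can be sketched in Python as follows. The sketch enumerates the 18 city, day-type and arrival-height combinations and applies the 80% cumulative-variance retention rule to a set of eigenvalues. The eigenvalues are hypothetical and the function name is introduced here; the factor analysis and the special transformation themselves are not reproduced.

```python
from itertools import product


def n_factors_to_retain(eigenvalues, threshold=0.80):
    """Smallest number of leading factors whose cumulative share of the
    total variance reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, eigenvalue in enumerate(ordered, start=1):
        cumulative += eigenvalue
        if cumulative / total >= threshold:
            return k
    return len(ordered)


cities = ("Thessaloniki", "Szeged", "Hamburg")
day_types = ("nonrainy", "rainy")
arrival_heights_m = (500, 1500, 3000)
configurations = list(product(cities, day_types, arrival_heights_m))

# Hypothetical eigenvalues of a correlation matrix of eight variables.
example_eigenvalues = [3.2, 1.9, 1.4, 0.6, 0.4, 0.3, 0.15, 0.05]
retained = n_factors_to_retain(example_eigenvalues)

print(len(configurations), "procedures;", retained, "factors retained in the example")
```

In the hypothetical example the first three factors account for 6.5 of 8.0 units of variance (about 81%), so three factors are retained.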
 Note that these findings are valid only for variations of daily pollen concentrations accounted for by the eight explanatory variables; nothing is known about the portion of variance not explained by these variables. However, without saying anything about the ratio of pollen transported by different modes, the origin of the measured pollen can be studied by its daily course. If the mean bihourly course of daily pollen counts displays a daytime peak, the pollen is probably of local origin. On the other hand, if the peak occurs at night or at early dawn, the pollen may have arrived by long-range transport [Šikoparija et al., 2009]. According to the bihourly mean pollen concentrations, the bulk of Urticaceae pollen in Thessaloniki might have come via long-range transport (daily pollen peaks are between 0.00 and 4.00 h [Damialis et al., 2005]). Although Damialis et al. [2005] did not calculate bihourly mean pollen concentrations separately for nonrainy and rainy days, their results can be taken to represent nonrainy days, as rainy days are rare in the pollination period of Urticaceae in Thessaloniki. Since long-range transport has a higher impact on daily pollen levels for nonrainy days, our results are confirmed by their calculations. Ambrosia pollen in Szeged is considered to be the result of medium-range transport involving local pollen dispersion (daily pollen peaks occur between 8.00 and 16.00 h [Juhász and Juhász, 1997]). As medium-range transport was found to have a higher weight in forming daily pollen levels both for nonrainy and rainy days, our results are in accordance with the findings of Juhász and Juhász [1997]. The maximum daily Poaceae pollen concentrations in a rural area of west Wales, located at a somewhat lower latitude than Hamburg, are typically recorded between 14.00 and 16.00 h [Norris-Hill, 1999]. Hence, assuming that the same holds in the region of Hamburg, pollen grains trapped there may have arrived via medium-range transport including local pollen dispersion. Our analysis for Hamburg, however, shows the primary role of long-range pollen transport both for nonrainy and rainy days. This result does not agree with the conclusion of Norris-Hill [1999] for west Wales. The reason, despite the similar climate of the two areas, might be that Poaceae pollen levels are high and very high in continental western Europe (Figure 3) and the transport of pollen by backward trajectories crossing France, Belgium, the Netherlands and northwestern Germany contributes substantially to the measured pollen concentration around Hamburg (see clusters 2, 6 and 7, Figure 11).
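The peak-timing heuristic used in this comparison can be illustrated with a minimal Python sketch. It takes twelve mean bihourly pollen counts, finds the 2 h bin with the highest value, and labels the pollen as probably local or medium-range if the peak falls in the daytime and as possibly long-range otherwise. The daytime bounds (8 to 18 h), the function names and the example profiles are assumptions made for this illustration and are not taken from the cited studies.

```python
def peak_window(bihourly_means):
    """Return (start_hour, end_hour) of the 2 h bin with the highest mean count."""
    if len(bihourly_means) != 12:
        raise ValueError("expected 12 bihourly values covering 24 h")
    peak_bin = max(range(12), key=lambda i: bihourly_means[i])
    return 2 * peak_bin, 2 * peak_bin + 2


def likely_origin(bihourly_means, day_start=8, day_end=18):
    """Heuristic label from the time of the daily peak.

    The daytime window bounds are assumptions, not values from the text.
    """
    start, end = peak_window(bihourly_means)
    if start >= day_start and end <= day_end:
        return "local or medium-range"
    return "possibly long-range"


# Hypothetical mean bihourly counts (bins 0-2 h, 2-4 h, ..., 22-24 h).
night_peak = [30, 42, 25, 12, 8, 6, 5, 6, 7, 9, 14, 22]
afternoon_peak = [3, 2, 2, 4, 9, 15, 22, 31, 24, 12, 6, 4]

print(peak_window(night_peak), likely_origin(night_peak))
print(peak_window(afternoon_peak), likely_origin(afternoon_peak))
```

Such a label says nothing about the share of pollen carried by each transport mode; it only indicates which mode is consistent with the timing of the daily peak.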
 The authors would like to thank Roland Draxler for his useful advice and consultations on the HYSPLIT model, version 4.8; Miklós Juhász for providing pollen data of Szeged; Dimitrios Gioulekas for his part in collecting the pollen data for Thessaloniki; Siegfried Jäger for supplying pollen data and details about pollen sampling for Hamburg; and Zoltán Sümeghy for the digital mapping in Figures 1–3. The authors gratefully acknowledge the NOAA Air Resources Laboratory (ARL) for the provision of the HYSPLIT transport and dispersion model and READY website (http://www.arl.noaa.gov/ready/hysplit4.html) used in this publication. The European Union and the European Social Fund have provided financial support to the project under the grant agreement TAMOP 4.2.1/B-09/1/KMR-2010-0003 and TAMOP 4.2.1/B-09/1/KONV-2010-0005.