Predicting invasion success of forest pathogenic fungi from species traits

Authors

  • Aurore Philibert,

    1. INRA, UMR211 INRA-AgroParisTech BP01 78850 Thiverval-Grignon, France
    2. INRA, UMR518 INRA-AgroParisTech 16 rue Claude Bernard 75005 Paris, France
  • Marie-Laure Desprez-Loustau,

    Corresponding author
    1. INRA, UMR1202 BIOGECO INRA-Université Bordeaux 1, Equipe de Pathologie forestière, 69 route d’Arcachon, Pierroton, 33610 Cestas, France
  • Bénédicte Fabre,

    1. INRA, UMR1136 IAM Nancy-Université, Equipe Ecologie des Champignons Pathogènes Forestiers, 54280 Champenoux, France
  • Pascal Frey,

    1. INRA, UMR1136 IAM Nancy-Université, Equipe Ecologie des Champignons Pathogènes Forestiers, 54280 Champenoux, France
  • Fabien Halkett,

    1. INRA, UMR1136 IAM Nancy-Université, Equipe Ecologie des Champignons Pathogènes Forestiers, 54280 Champenoux, France
  • Claude Husson,

    1. INRA, UMR1136 IAM Nancy-Université, Equipe Ecologie des Champignons Pathogènes Forestiers, 54280 Champenoux, France
  • Brigitte Lung-Escarmant,

    1. INRA, UMR1202 BIOGECO INRA-Université Bordeaux 1, Equipe de Pathologie forestière, 69 route d’Arcachon, Pierroton, 33610 Cestas, France
  • Benoit Marçais,

    1. INRA, UMR1136 IAM Nancy-Université, Equipe Ecologie des Champignons Pathogènes Forestiers, 54280 Champenoux, France
  • Cécile Robin,

    1. INRA, UMR1202 BIOGECO INRA-Université Bordeaux 1, Equipe de Pathologie forestière, 69 route d’Arcachon, Pierroton, 33610 Cestas, France
  • Corinne Vacher,

    1. INRA, UMR1202 BIOGECO INRA-Université Bordeaux 1, Equipe de Pathologie forestière, 69 route d’Arcachon, Pierroton, 33610 Cestas, France
  • David Makowski

    1. INRA, UMR211 INRA-AgroParisTech BP01 78850 Thiverval-Grignon, France
    2. INRA, UMR518 INRA-AgroParisTech 16 rue Claude Bernard 75005 Paris, France

Correspondence author. E-mail: loustau@bordeaux.inra.fr

Summary

1. Biological invasions are a major consequence of globalization and pose a significant threat to biodiversity. Because only a small fraction of introduced species become invasive, identification of those species most likely to become invasive after introduction is highly desirable to focus management efforts. The predictive potential of species-specific traits has been much investigated in plants and animals. However, despite the importance of fungi as a biological group and the potentially severe effects of pathogenic fungi on agrosystems and natural ecosystems, the specific identification of traits correlated with the invasion success of fungi has not been attempted previously.

2. We addressed this question by constructing an ad hoc data set including invasive and non-invasive species of forest pathogenic fungi introduced into Europe. Data were analysed with a machine learning method based on classification trees (Random Forest). The performance of the classification rule based on species traits was compared with that of several random decision rules, and the principal trait predictors associated with invasive species were identified.

3. Invasion success was more accurately predicted by the classification rule including biological traits than by random decision rules. The effect of species traits was maintained when confounding variables linked to residence time and habitat availability were included. The selected traits were unlikely to be affected by a phylogenetic bias as invasive and non-invasive species were evenly distributed in fungal clades.

4. The species-level predictors identified as useful for distinguishing between invasive and non-invasive species were traits related to long-distance dispersal, sexual reproduction (in addition to asexual reproduction), spore shape and size, number of cells in spores, optimal temperature for growth and parasitic specialization (host range and infected organs).

5. Synthesis and applications. This study demonstrates that some species-level traits are predictors of invasion success for forest pathogenic fungi in Europe. These traits could be used to refine current pest risk assessment (PRA) schemes. Our results suggest that current schemes, which are mostly based on sequential questionnaires, could be improved by taking into account trait interactions or combinations. More generally, our results confirm the interest of machine learning methods, such as Random Forest, for species classification in ecology.

Introduction

Biological invasions, resulting from deliberate and unintentional species transfers, are a major consequence of globalization and pose a significant threat to biodiversity (Levine & D’Antonio 2003; Hulme 2009; Vilà et al. 2010). Many authors have ruled out a purely idiosyncratic phenomenon resulting from complex interactions between the history of introduction, the invading species and the recipient community and have sought to identify consistent characteristics of invading species or of invaded communities (Kolar & Lodge 2001; Gasso et al. 2009; Sakai et al. 2001). This important area of research is particularly relevant to risk assessment and management, as it provides a knowledge-based approach to identification of the most threatening species and the most vulnerable communities (Stohlgren & Schnase 2006). The species traits favouring invasion success (invasiveness) have been extensively investigated in animals and plants. As only a small fraction of introduced species become invasive (Williamson & Fitter 1996), it has been suggested that invaders possess traits favouring their successful establishment, spread and impact on recipient communities. In their recent cross-taxa review, Hayes & Barry (2008) identified climate/habitat match as the only characteristic consistently and significantly associated with invasive behaviour across biological groups. However, some specific attributes were found within groups, such as vigorous vegetative growth, long flowering period and attractiveness to humans for plants (Lloret et al. 2005; Pyšek & Richardson 2007).

The fungi constitute a large biological group, but the specific traits correlated with their invasion success have not yet been investigated (Desprez-Loustau et al. 2007). Furthermore, very few studies have addressed accidental introductions (Hayes & Barry 2008).

In this study, we aimed to identify traits linked to invasiveness in forest pathogenic fungi. The high environmental, economic and social impacts of introduced pathogens in forest ecosystems (Sache et al. 2011) and the failure of current regulatory measures to control them justify efforts to develop predictive approaches of invasion success of these organisms (Brasier 2008). Davis (2003) also suggested that greater emphasis in invasion ecology should be placed on intertrophic interactions between introduced species and long-term residents, because these interactions are more likely to cause extinction than interactions at the same trophic level. Indeed, several tree pathogens, such as Ophiostoma novo-ulmi, the agent of Dutch elm disease, or Cryphonectria parasitica, the agent of chestnut blight, almost drove their host species to extinction in the area of introduction (Desprez-Loustau et al. 2007).

We addressed this question by constructing an ad hoc data set for invasive and non-invasive species of forest pathogenic fungi introduced into Europe. Data were analysed with a machine learning method based on classification trees (Random Forest, RF) (Cutler et al. 2007; Stohlgren et al. 2010). The performance of the RF classification rule based on species traits was compared with that of several random decision rules. This approach made it possible to identify the most important species-level predictors for distinguishing between invasive and non-invasive species, with a view to improving pest risk assessment procedures in particular.

Materials and methods

Data

Species

Fungal species alien to Europe were recently listed in the European Union (EU)-funded DAISIE project (Desprez-Loustau 2009; Desprez-Loustau et al. 2010). Forty of these alien species were forest tree pathogens. Several data sources (European Plant Protection Organisation, EPPO, particularly interception data sets; Plant Protection services, bibliographic data bases) were used to add to the DAISIE data set, to obtain a more comprehensive set of species both alien to Europe (EU27, i.e. the 27 countries included in EU, Norway, Switzerland, and former Yugoslavia) and reported at least once in this area before May 2009. The resulting data set includes a total of 47 species: 40 true fungi (i.e. belonging to the Eumycota kingdom) and seven Phytophthora species (belonging to Oomycota, in the Stramenopila kingdom; Fig. 1). We included these seven Phytophthora species because, despite the distant relationship of Oomycota to Eumycota in the phylogenetic tree, there is strong morphological, biological and ecological convergence of Phytophthora with fungi (filamentous mycelium, production of spores and pathogenicity to plants). Separate analyses were performed with and without Phytophthora species.

Figure 1.

 Location of the species in the phylogenetic tree of fungi and pseudo-fungi = Stramenopiles (from James et al. 2006). Invasive and non-invasive refer to the classification in Table 1.

Response variable

The delimitation of two groups of species defined as ‘invasive’ or ‘non-invasive’ in Europe was not straightforward. We overcame the problems associated with different subjective judgements and the use of different terminologies to determine the status of individual species, by basing our classification on objective criteria. The 47 species were assigned to seven classes defining different stages in the introduction–invasion continuum (Table 1). Eventually, species not established in natural habitats were considered to be non-invasive (19 species), and species established in forests, on indigenous tree species (28 species), were considered to be invasive. This approach yielded two groups of comparable size, and the invasion criterion corresponded to the emergence of a new disease.

Table 1.   Classification of fungal species based on their invasion stage in Europe. Establishment implies repeated reports of the disease-causing organism. Spread was estimated from the distribution of the species in European countries (Desprez-Loustau 2009). Ecological impact relates to the infection of an indigenous host and the severity of the damage caused
Class | Invasion stage | Number of species | Status
1 | Introduced, non-established (interceptions) | 5 | Non-invasive
2 | Introduced, non-established in natural habitats (only anthropogenic habitats, such as nurseries, gardens and parks) | 14 | Non-invasive
3 | Introduced, established locally in non-anthropogenic habitats, significant local ecological impact | 4 | Invasive
4 | Introduced, widely spread, low ecological impact | 8 | Invasive
5 | Introduced, widely spread, low ecological but significant economic impact | 6 | Invasive
6 | Introduced, widely spread, high ecological and medium to high economic impact | 8 | Invasive
7 | Cryptogenic (alien or emerging) species, widely spread and with high ecological impact | 2 | Invasive
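
The mapping from invasion class to binary status can be checked with a few lines of code. The sketch below is illustrative only: the class counts are those of Table 1, while the function and variable names are hypothetical and not taken from the study.

```python
# Number of species per invasion class, as listed in Table 1.
SPECIES_PER_CLASS = {1: 5, 2: 14, 3: 4, 4: 8, 5: 6, 6: 8, 7: 2}


def status(invasion_class: int) -> str:
    """Classes 1-2 (not established in natural habitats) are non-invasive;
    classes 3-7 (established in forests on indigenous trees) are invasive."""
    if invasion_class not in SPECIES_PER_CLASS:
        raise ValueError(f"unknown invasion class: {invasion_class}")
    return "non-invasive" if invasion_class <= 2 else "invasive"


def count_by_status(species_per_class: dict) -> dict:
    """Sum the number of species falling in each binary status."""
    counts = {"non-invasive": 0, "invasive": 0}
    for invasion_class, n_species in species_per_class.items():
        counts[status(invasion_class)] += n_species
    return counts


if __name__ == "__main__":
    print(count_by_status(SPECIES_PER_CLASS))
```

The totals agree with the 19 non-invasive and 28 invasive species reported in the text.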

Explanatory variables

We established an initial list including many traits thought to be potential drivers of the successful transition between different stages of invasion: survival, population build-up, spread and impact. We tried to document these traits for the 47 species by an intensive search in bibliographic data bases, the Crop Protection Compendium (CABI 2009, http://www.cabicompendium.org/cpc/home.asp) and reference books (Viennot-Bourgin 1949; Lanier et al. 1976; Sinclair, Lyon & Johnson 2005) as the main data sources. The variables from this initial list that were not documented for a large proportion of species, and/or for which the level of uncertainty was high, were removed, and the definition of the remaining variables was refined to avoid ambiguities. Each species profile was then completed and checked by at least two people. The variables were initially binary, categorical or quantitative. All variables were transformed into binary variables, to ensure a similar weight in analyses and to facilitate interpretation based on biological hypotheses. For example, spore volume and shape were preferred to spore dimensions, included in the species descriptions, as more ecologically meaningful. They were computed from the length and width of mitospores (the spores produced by asexual reproduction, present in all 47 species) and were redefined as categorical variables using threshold values. The final data set included 19 binary variables, among which six had missing data (Table 2).
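
As an illustration of how spore dimensions might be turned into the binary size and shape traits, the sketch below uses the thresholds listed in Table 2 (250 and 2500 for volume, 3 for the length-to-width ratio). The volume formula is an assumption: the source formula is not reproduced here, so a prolate ellipsoid is used as a stand-in. The unit (cubic micrometres), the treatment of values exactly at a threshold and all function names are likewise assumptions.

```python
import math

# Thresholds taken from Table 2; units assumed to be cubic micrometres.
SMALL_MAX_VOLUME = 250.0
MEDIUM_MAX_VOLUME = 2500.0
SHAPE_MAX_RATIO = 3.0


def spore_volume(length_um: float, width_um: float) -> float:
    """ASSUMPTION: spore approximated as a prolate ellipsoid,
    V = (pi / 6) * length * width**2. Not necessarily the authors' formula."""
    return math.pi / 6.0 * length_um * width_um ** 2


def spore_traits(length_um: float, width_um: float) -> dict:
    """Binary size and shape traits derived from mitospore length and width."""
    volume = spore_volume(length_um, width_um)
    return {
        "small_size": volume < SMALL_MAX_VOLUME,
        # Inclusive bounds are an assumption; the table only says "between".
        "medium_size": SMALL_MAX_VOLUME <= volume <= MEDIUM_MAX_VOLUME,
        "shape": (length_um / width_um) < SHAPE_MAX_RATIO,
    }


if __name__ == "__main__":
    # Hypothetical spores of 10 x 5 and 40 x 10 micrometres.
    print(spore_traits(10.0, 5.0))
    print(spore_traits(40.0, 10.0))
```

A spore that is neither small nor medium is large by elimination, which is why two binary variables are enough to encode three size classes.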

Table 2.   Species traits and confounding variables. All variables except ‘date of introduction’ are binary (yes/no)
Variable | Definition | Number of missing data | Reasoning for predictor choice
Mitospore characteristics
 Small Size | Volume <250 μm³; calculated as [inline image] | 0 | As for any particle, spore size and shape affect release, transport and deposition, particularly for aerial pathogens (Stockmarr, Andreasen & Østergård 2007). Two binary variables related to spore volume were defined, based on the notion that different spore sizes may be associated with different strategies: small spores for higher dispersal, large spores for higher survival and medium spores representing a trade-off between the two strategies. Spore volume in 35 species of common native forest pathogenic fungi in France presented a trimodal distribution, from which the thresholds between classes were derived (M.L. Desprez-Loustau, unpublished data)
 Medium Size | Volume between 250 and 2500 μm³ | 0 | (same reasoning as for Small Size)
 Shape | Mitospore length-to-width ratio <3 | 0 | The threshold value of 3 corresponds to the optimal shape for ascospores defined by a biomechanical approach (Roper et al. 2008)
 Pigmented |  | 0 | Pigmented spores are more resistant to ultraviolet radiation (Durrell 1964), potentially increasing their survival
 Pluricellular |  | 0 | The cell number of spores has been shown to be a factor associated with survival in stressful environments (Shathele 2009)
 Diploid | Haploid vs. non-haploid | 0 | Ploidy affects fitness components, in particular spore germination (Quintanilla & Escudero 2006)
Existence of a mechanism of Long-Distance Dispersal | Mito- or meiospores vectored by wind, insects or running water (vs. rainsplash only) | 0 | Species with a long-distance dispersal mechanism are more likely to extend their range rapidly, thus becoming invasive
Life cycle
 Latency duration | Period of latency between mitospore infection and secondary spore production <30 days | 5 | Species with a short latency (generation time) are more likely to build up large populations rapidly
 Polycyclic | Number of spore production cycles per year | 3 | Same reasoning as for latency
 Sexual reproduction |  | 0 | Sexual fruiting bodies are often resistant forms, for overwintering for example, in fungi. Sexual spores are often wind dispersed. Recombination in sexual reproduction yields individuals with novel attributes, possibly increasing virulence to new hosts (Fraser et al. 2005)
 Selfing possible |  | 10 | Selfing may favour expansion, by overcoming the need for a sexual partner (Fraser et al. 2005)
Parasitic strategy
 Non-obligate parasite | Non-obligate parasites can live independently of their host | 0 | A saprophytic lifestyle can allow survival in the absence of suitable hosts
 Monogenic | Monogenic parasites need a single host species to complete their life cycle | 0 | Heteroxenic parasites, which need several host species to complete their life cycle, are less likely to find a local combination of these species in the introduced area
 Generalist | Host range including at least two families | 0 | Species with broad niches (‘generalists’) are expected to invade more easily than species with narrower niches (‘specialists’), because they are more likely to find their resources
 Infection of perennial tissues | Bark-wood vs. only expendable parts (leaves and twigs) | 0 | Parasites of perennial tree parts would be less likely to be affected by extinction
 Seed transmission |  | 0 | Seed transmission may be linked to higher dispersal rates and impact
 Endophytism | Endophytic parasites can live asymptomatically in their host until host conditions favour pathogenicity | 4 | Endophytic parasites are more likely to be overlooked when plants are introduced
Abiotic niche
 Optimal temperature | Optimal temperature for growth between 20 and 25 °C | 12 | Optimal growth temperature might be a proxy of climate-matching (the climate of origin is unknown for many species because of their unknown origin). We assumed that species with an optimum between 20 and 25 °C were more likely to be adapted to the temperate climate prevailing over much of Europe than species with an optimum below 20 or above 25 °C. Most native species of forest pathogenic fungi in France have their optimum growth temperature in the 20–25 °C range, whereas some species with lower optimum growth temperatures are found mostly in mountainous habitats and those with higher values are found in the Mediterranean area (M.L. Desprez-Loustau, unpublished data)
 Climate in the area of origin | Temperate vs. others (boreal, tropical, etc., in Köppen climate maps) | 14 | 
Confounding variables
 Date of introduction | Estimated as the date of first observation in Europe | 0 | Proxy of residence time; see reasoning in the main text
 Potential habitat | Surface area covered by host species of the same genus: small if <0·5% of the total EU30 forest area (Köbler & Seufert 2001), large otherwise | 0 | Species pathogenic on commonly occurring European tree species, such as oaks and pines, would have more opportunity to establish and to spread

Previous analyses of relationships between species traits and invasion success have identified several factors that may be important sources of bias, phylogeny and residence time in particular (Wilson et al. 2007; Hayes & Barry 2008; Pyšek, Křivánek & Jarošík 2009). The potential bias owing to phylogeny was probably limited in our study, because invasive species and non-invasive species were equally frequent in most taxonomic groups (Fig. 1). Residence time was accounted for by using the date on which each species was first observed in Europe, according to previous publications and data bases. We also included a variable related to the potential habitat in the introduced area: the area covered by host tree species (Table 2). Separate analyses were performed with and without confounding variables.

Statistical methods

The Random Forest method (RF) was used to determine whether species traits were good predictors of the invasion status of the species and to identify the best such predictors for our data set.

The Random Forest method is a machine learning classification method based on a large number of decision trees (Breiman & Cutler 2003). The goal of classification tree methods is to create a set of classification rules (the branches) from the input variables included in a training set, making it possible to predict the location of new observations (here species) within the groups (the nodes, in this case ‘invasive’ or ‘non-invasive’) from the values of the input variables for these observations. The classification rules are built by recursive binary partitioning of the data set. Each binary split, based on an explanatory variable, is selected according to an index measuring the quality of its contribution to the classification, the Gini index (Breiman et al. 1984; Therneau & Atkinson 1997). The main improvement of RF over single-tree methods such as CART (Classification and Regression Tree; Breiman et al. 1984) lies in the integration of two random selection processes in the construction of each tree, one for data (observations) and the other for variables, increasing the robustness of the method (Ghattas 2000; Prasad, Iverson & Liaw 2006). Data samples are generated by a bootstrap technique (Efron & Tibshirani 1993). Individuals (species in our case) present in a bootstrap sample are referred to as ‘in-bag’ data, whereas the remaining individuals form the ‘out-of-bag’ (OOB) data. A non-pruned classification tree is built from each bootstrap sample with the CART method. For each tree, only a random sample of the explanatory variables is used to determine the best split at each node.
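
Two ingredients of this description, the Gini criterion used to score a binary split and the bootstrap partition into in-bag and out-of-bag species, can be sketched as follows. This is a minimal illustration with hypothetical function names and toy data, not the code of the randomForest package.

```python
import random


def gini_impurity(labels):
    """Gini impurity of a two-class node; labels are 1 (invasive) or 0 (non-invasive)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)  # equals 1 - p**2 - (1 - p)**2


def split_gini(trait, labels):
    """Size-weighted Gini impurity of the two child nodes after splitting on a binary trait."""
    left = [y for x, y in zip(trait, labels) if x]
    right = [y for x, y in zip(trait, labels) if not x]
    n = len(labels)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n


def bootstrap_partition(n_species, rng):
    """Draw n_species indices with replacement (in-bag); the rest are out-of-bag."""
    in_bag = [rng.randrange(n_species) for _ in range(n_species)]
    out_of_bag = sorted(set(range(n_species)) - set(in_bag))
    return in_bag, out_of_bag


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    print(gini_impurity(labels))             # impurity before splitting
    print(split_gini([1, 1, 0, 0], labels))  # a trait that separates the classes perfectly
    in_bag, oob = bootstrap_partition(47, random.Random(1))
    print(len(in_bag), len(oob))
```

The best split at a node is the one giving the lowest weighted impurity among the randomly sampled candidate variables.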

The OOB data can be used to rank the explanatory variables, through the calculation of a criterion called mean decrease accuracy (MDA). For each variable, MDA measures the increase in the rate of misclassification resulting from a random permutation of the values of this variable, the values of the other variables remaining unchanged. Thus, variables with a positive MDA improve the classification, and the higher the MDA, the greater the importance of the variable for classification.
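
The permutation idea behind MDA can be illustrated with a simplified sketch. Here a single hypothetical classifier stands in for one tree of the forest and the toy data stand in for its out-of-bag species; in RF the quantity is computed tree by tree on the OOB data and then averaged. All names and data below are assumptions made for illustration.

```python
import random


def error_rate(predict, rows, labels):
    """Proportion of observations misclassified by `predict`."""
    wrong = sum(1 for row, y in zip(rows, labels) if predict(row) != y)
    return wrong / len(labels)


def permutation_importance(predict, rows, labels, column, rng):
    """Increase in misclassification rate after shuffling one variable,
    all other variables being left unchanged."""
    baseline = error_rate(predict, rows, labels)
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    permuted = [row[:column] + [value] + row[column + 1:]
                for row, value in zip(rows, shuffled)]
    return error_rate(predict, permuted, labels) - baseline


if __name__ == "__main__":
    # Toy data: column 0 determines the status, column 1 is pure noise.
    rows = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 1]]
    labels = [1, 1, 0, 0, 1, 0]
    predict = lambda row: row[0]  # hypothetical classifier using column 0 only
    rng = random.Random(0)
    print(permutation_importance(predict, rows, labels, 0, rng))
    print(permutation_importance(predict, rows, labels, 1, rng))
```

A variable that the classifier never uses gets an importance of exactly zero, whereas shuffling an informative variable can only degrade a classifier that was perfect beforehand.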

The rate of correct classification by RF was calculated by leave-one-out cross-validation. Both the overall rate of misclassification (for both invasive and non-invasive species) and the rate of misclassification of invasive species were calculated. The accuracy of the species classification achieved with RF was compared with that of four reference random decision rules with different probabilities of being invasive (P): P = 0 (no invasive species), P = 0·5 (50% of species invasive), P = 1 (100% of species invasive) and P = proportion of invasive species in the data set (28/47 for all species, 23/40 for Eumycota). The rates of correct classification associated with these rules were calculated as:

$$\frac{P\,n_I + (1 - P)\,n_{NI}}{n_I + n_{NI}}$$

for all species, and P for invasive species only, where n_I is the number of invasive species and n_NI is the number of non-invasive species.
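
The expected performance of the random decision rules can be worked through numerically. The sketch below assumes that a rule declaring each species invasive with probability P classifies a proportion (P·nI + (1 − P)·nNI)/(nI + nNI) of all species correctly, and a proportion P of the invasive species correctly; this assumption reproduces the random-rule columns of Tables 3 and 4. Function names are hypothetical.

```python
def random_rule_rates(p, n_invasive, n_non_invasive):
    """Expected rates of correct classification for a rule that declares each
    species invasive with probability p, independently of its traits.

    Returns (rate over all species, rate over invasive species only)."""
    total = n_invasive + n_non_invasive
    overall = (p * n_invasive + (1.0 - p) * n_non_invasive) / total
    return overall, p


if __name__ == "__main__":
    data_sets = {
        "Eumycota and Phytophthora": (28, 19),  # 28 invasive out of 47 species
        "Eumycota": (23, 17),                   # 23 invasive out of 40 species
    }
    for name, (n_inv, n_non) in data_sets.items():
        for p in (n_inv / (n_inv + n_non), 0.0, 0.5, 1.0):
            overall, invasive_only = random_rule_rates(p, n_inv, n_non)
            print(f"{name}: P = {p:.3f}, all species {100 * overall:.1f}%, "
                  f"invasive only {100 * invasive_only:.1f}%")
```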

Statistical analyses were carried out with R software (http://www.R-project.org, R Development Core Team 2009) and the randomForest package (Liaw & Wiener 2002). The three tuning parameters required by RF were set as follows:

  • The number of explanatory variables used at each node of each tree was set to the square root of the total number of variables, as described by Breiman (2001).

  • The total number of trees in the forest, corresponding to the number of bootstrap samples, was set at 100 000, after preliminary trials, as this was found to give robust results.

  • The number of iterations for the algorithm used to estimate missing values was set at 20.
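
To give a sense of scale for these settings, the sketch below computes the number of variables tried at each node for 19 traits (and for 21 variables once the two confounding variables are added), together with the expected share of trees for which a given species is out-of-bag. Rounding the square root down is an assumption, chosen because it matches the default of the randomForest package for classification; the function names are hypothetical.

```python
import math

N_TREES = 100_000  # number of trees reported in the text


def variables_per_node(n_variables: int) -> int:
    """Square root of the number of variables, rounded down (assumed rounding)."""
    return math.isqrt(n_variables)


def expected_oob_fraction(n_species: int) -> float:
    """Probability that a given species is absent from a bootstrap sample
    of size n_species drawn with replacement."""
    return (1.0 - 1.0 / n_species) ** n_species


if __name__ == "__main__":
    print(variables_per_node(19))  # species traits only
    print(variables_per_node(21))  # traits plus two confounding variables
    fraction = expected_oob_fraction(47)
    print(round(fraction, 3), round(fraction * N_TREES))
```

With 47 species, each species is expected to be out-of-bag for roughly a third of the trees, so a large forest leaves many trees available for its OOB prediction.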

Results

The proportions of species correctly classified with RF and with the four random decision rules are presented in Tables 3 and 4 for four data sets: the data sets including all species (i.e. Eumycota with Phytophthora spp.) or true fungi only, with and without confounding variables. The proportion of species classified correctly was higher with RF (62·5–68·1%) than with the random decision rules (40·4–59·6%), for all four data sets (Table 3). The performance of RF was even better when judged by its classification of just the invasive species, with the frequency of correct classification reaching 73·9–82·1% (Table 4). These values were systematically higher than the proportions of well-classified species obtained with random decision rules based on P < 1.

Table 3.   Performance of Random Forest (including species traits) and four random decision rules (with P, the probability of being invasive) expressed as the percentage of well-classified invasive and non-invasive species. Confounding variables are the date of introduction and potential habitat
Data set | Random Forest | Random decision rule, P = proportion of invasive species in the data set | Random decision rule, P = 0 | Random decision rule, P = 0·5 | Random decision rule, P = 1
Eumycota and Phytophthora, without confounding variables | 63·8 | 51·8 | 40·4 | 50 | 59·6
Eumycota, without confounding variables | 62·5 | 51·1 | 42·5 | 50 | 57·5
Eumycota and Phytophthora, with confounding variables | 68·1 | 51·8 | 40·4 | 50 | 59·6
Eumycota, with confounding variables | 62·5 | 51·1 | 42·5 | 50 | 57·5
Table 4.   Performance of Random Forest (including species traits) and four random decision rules (with P, the probability of being invasive) expressed as the percentage of correctly classified invasive species. Confounding variables are the date of introduction and potential habitat
Data set | Random Forest | Random decision rule, P = proportion of invasive species in the data set | Random decision rule, P = 0 | Random decision rule, P = 0·5 | Random decision rule, P = 1
Eumycota and Phytophthora, without confounding variables | 78·6 | 59·6 | 0 | 50 | 100
Eumycota, without confounding variables | 78·3 | 57·5 | 0 | 50 | 100
Eumycota and Phytophthora, with confounding variables | 82·1 | 59·6 | 0 | 50 | 100
Eumycota, with confounding variables | 73·9 | 57·5 | 0 | 50 | 100

The MDA values reported in Figs 2 and 3 were used to rank biological traits in order of importance for the classification of both invasive and non-invasive species. When no confounding variables were included in the analysis, the traits with the highest MDA were spore shape, sexual reproduction, long-distance dispersal, optimal temperature and climate in the area of origin. Pluricellular spore, spore size and infection of perennial tissues also had a positive MDA for both the Eumycota + Phytophthora and Eumycota data sets. The host range variable (generalist) was influential in the analysis of Eumycota but not when Phytophthora spp. were added. Most of the MDA values were lower when the two confounding variables were added to the biological traits, except for the non-obligate parasite variable, which gained relatively more importance (Fig. 3). However, even when the confounding variables were included in RF, spore shape, sexual reproduction and long-distance dispersal remained the biological traits with the highest MDA values.

Figure 2.

 Importance of the variables for classifying invasive and non-invasive species, as assessed by mean decrease accuracy (MDA) calculated with Random Forest. Data sets including Eumycota + Phytophthora or Eumycota species were considered successively. Variables were sorted in decreasing order of MDA for Eumycota + Phytophthora. Confounding variables were not included in the analysis.

Figure 3.

 Importance of the variables for classifying invasive and non-invasive species, as assessed by mean decrease accuracy (MDA) calculated with Random Forest. Data sets including Eumycota + Phytophthora or Eumycota species were considered successively, with two confounding variables included in the analysis (date of introduction and potential habitat). Variables were sorted as in Fig. 2.

The most important traits for the classification of just the invasive species (Fig. 4) were those already identified in Fig. 2. In addition, the polycyclic and latency duration variables also had a small positive effect on classification. As observed for the classification of all species, inclusion of the confounding variables reduced MDA for most biological traits but the variables with the highest MDA remained the same (Fig. 5).

Figure 4.

 Importance of the variables for the classification of just the invasive species, as assessed by mean decrease accuracy (MDA), calculated with Random Forest. Data sets including Eumycota+Phytophthora or Eumycota species were considered successively. Confounding variables were not included in the analysis. Variables were sorted as in Fig. 2.

Figure 5.

 Importance of the variables for the classification of just the invasive species, as assessed by their mean decrease accuracy (MDA), calculated with Random Forest. Data sets including Eumycota + Phytophthora or Eumycota species were considered successively, with two confounding variables included in the analysis (date of introduction and potential habitat). Variables were sorted as in Fig. 2.

Overall, three variables (sexual reproduction, spore shape and long-distance dispersal) were consistently among the most important traits according to MDA. Several other traits had a high MDA in most classifications: optimal temperature, pluricellular spore and climate in the area of origin. The spore size, type of infected organ and endophytism variables had a positive MDA in most cases, but with lower values. Some variables had a negative MDA in most cases, consistent with an absence of predictive value for the classification of species as invasive or non-invasive: selfing, monogenic, seed transmission, diploid spore and pigmented spore. The results obtained were consistent for the various data sets, both with and without Phytophthora spp., indicating that the same variables were important for differentiating between invasive and non-invasive fungi sensu stricto and sensu lato (Spearman’s rank correlation of MDA values: r = 0·86; P < 2·2 × 10⁻¹⁶). The introduction of the two confounding variables into the statistical analysis decreased the MDA values of the biological traits, but the ranks of the traits obtained with and without confounding variables were significantly correlated (r = 0·90, P < 2·2 × 10⁻¹⁶ for Eumycota + Phytophthora; r = 0·59, P = 0·01 for Eumycota only).
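
The rank comparison used here can be sketched as follows. Spearman's coefficient is computed as the Pearson correlation of the ranks, with tied values given their average rank. The MDA values in the usage example are invented for illustration and are not the values of the study; function names are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with ties replaced by the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical MDA values for five traits in two data sets.
    mda_all_species = [0.012, 0.009, 0.007, 0.001, -0.002]
    mda_eumycota = [0.010, 0.011, 0.005, 0.000, -0.001]
    print(spearman(mda_all_species, mda_eumycota))
```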

Discussion

Prediction of the invasiveness of pathogenic fungi is challenging. In addition to the difficulties highlighted for other taxa (Hayes & Barry 2008), fungi have been less extensively studied, are less visible, and were mostly introduced unintentionally (Desprez-Loustau et al. 2007). The data available for statistical analyses are therefore fewer in number and less accurate. The Random Forest method has proved to be an efficient and reliable classifier, highly suitable for data sets such as ours, in which data are reported for a small number of species but a large number of variables, with some missing data (Cutler et al. 2007; Stohlgren et al. 2010). Using this method, we showed that certain species traits were useful predictors of invasion success in forest pathogenic fungi alien to Europe.

Data-related issues

One of the limitations of our study was the small number of species included in the data sets. The use of native species as a control group would have provided a larger number of species, but we thought it inappropriate here. Many comparisons of invasive and native species have been performed in invasion ecology, but the principal objective in most of these studies was to investigate how the introduced species out-competed the resident species. Here, the determinant factor in the invasion of pathogenic fungi was assumed to be the parasitic behaviour of these species on trees rather than competition with indigenous species at the same trophic level.

Comparisons between established and non-established introduced species appeared more pertinent, but they may be subject to biases in reporting and information quality for the two groups (Keller & Drake 2009; Hayes & Barry 2008). Only in the last few decades have introduced non-established species been listed as interceptions during health inspections at European borders. Furthermore, trait data are usually more readily available for well-known invasive species.

The date of first observation in Europe, identified from accessible publications and data bases, was the best estimate of residence time we could find. Records for most fungal species, particularly for micromycetes, are too sparse for a statistical analysis of temporal trends of invasion and a more accurate estimation of the date of introduction and lag phase, as proposed for plants (Aikio, Duncan & Hulme 2010). Use of the date of first observation probably results in an underestimation of residence time, because many fungal species may remain inconspicuous and undetected for years, particularly if they are non-invasive.

The importance of residence time and other confounding variables in the prediction of invasion status and the identification of trait predictors

The key role of residence time in determining invasion success has been highlighted in previous studies and for other organisms (Wilson et al. 2007; Desprez-Loustau et al. 2010) and was confirmed in our analysis. The date of first observation was the variable with the greatest influence on classification accuracy in all analyses in which it was included. Recently introduced species that are currently assumed to be non-invasive may become invasive in the future. However, residence time did not appear to be a source of bias for the identification of other predictors. Indeed, the inclusion of a proxy for residence time in RF analyses did not eliminate the effects of trait-related variables and had little effect on the rankings of these variables, providing evidence of the reliability of the traits identified in our study. Furthermore, the rates of correct species classification were not increased by adding the confounding variables to the analysis. The performances of RF decision rules derived with and without these variables were very similar and were sometimes better without the confounding variables.

Most fungal clades contained similar numbers of invasive and non-invasive species in our data sets, and the output of analyses was similar for data sets restricted to Eumycota or extended to include Phytophthora spp. Most results in our study may therefore apply to different phylogenetic backgrounds for fungi sensu lato.

Trait predictors of invasion success

Classification rules based on species traits systematically outperformed random decision rules, providing a clear demonstration that some traits are useful predictors of invasion success in forest pathogenic fungi alien to Europe. RF analyses made it possible to establish a hierarchy in the importance of these traits. The three most consistently influential variables referred to attributes favouring the ability to spread. This is particularly true for the existence of a mechanism of long-distance dispersal mediated by wind, running water or insect vectors rather than rainsplash alone. Interestingly, spore shape, which was a less obvious predictor, also consistently obtained a high MDA. Aerial pathogens are overrepresented among invasive plant pathogenic fungi (Desprez-Loustau et al. 2010), and therefore also in our data set. As for any particle, spore size and shape affect release, transport and deposition, particularly for aerial pathogens. We used a very simple shape descriptor for which a threshold value corresponding to the optimal drag-minimizing shape for spores ejected in the air could be defined (Roper et al. 2008). Our results suggest that the length-to-width ratio of spores is a potentially useful proxy for fungal dispersal that is both relevant and readily available. Sexual reproduction, which was among the most influential variables, also has a role in survival and dispersal (Table 2). The importance of dispersal-related traits in our models indicates that natural dispersal after introduction into new areas plays a key role in the invasion process, although human activities are an important factor in the unintentional movement of pathogens. Sexual reproduction may also affect invasion success by generating more virulent variants. This may be of key importance by enabling pathogenic species to adapt to newly encountered host species following their introduction into a new geographic area (McDonald & Linde 2002; Parker & Gilbert 2004).

By contrast, traits mostly relating to asexual reproduction, such as the duration of the cycle and the number of sporulation cycles, played a less important role. They did not appear to be important variables for discrimination between invasive and non-invasive species and were of only borderline utility for the prediction of invasive species.
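
As an illustration of the shape descriptor discussed above, the length-to-width ratio of a spore and its comparison with a threshold can be sketched as follows. This is a hedged example only: the function names are hypothetical, the spore dimensions are invented, and the threshold of 2.0 is a placeholder rather than the value used in this study or derived by Roper et al. (2008).

```python
def length_to_width_ratio(length_um, width_um):
    """Return the spore length-to-width ratio (both dimensions in micrometres)."""
    if length_um <= 0 or width_um <= 0:
        raise ValueError("spore dimensions must be positive")
    return length_um / width_um


def ratio_above_threshold(length_um, width_um, threshold):
    """True if the length-to-width ratio exceeds the supplied threshold."""
    return length_to_width_ratio(length_um, width_um) > threshold


# Placeholder threshold, chosen only for this example (an assumption).
EXAMPLE_THRESHOLD = 2.0

# Invented dimensions for two hypothetical spores.
print(length_to_width_ratio(20.0, 5.0))                     # ratio of a long, narrow spore
print(ratio_above_threshold(20.0, 5.0, EXAMPLE_THRESHOLD))  # compared with the placeholder
print(ratio_above_threshold(6.0, 5.0, EXAMPLE_THRESHOLD))   # a nearly round spore
```

Such a binary variable is convenient because spore length and width are reported in most species descriptions, which is what makes the ratio readily available as a proxy.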

Other traits related to parasitic strategy were among the variables with a positive MDA, such as host range for Eumycota. Generalist species are thought to be better invaders than specialists because they are more likely to find appropriate resources in their new environment (Vall-llosera & Sol 2009). However, in the case of parasites, host specialization may not necessarily hinder success in new environments (Parker & Gilbert 2004; Gilbert & Webb 2007). Many successful invaders among forest pathogenic fungi, such as Cryphonectria parasitica, are specialist parasites that were able to make a host switch when introduced into new areas (Slippers, Stenlid & Wingfield 2005; Gilbert & Webb 2007). The decrease in MDA for the generalist trait when Phytophthora spp. were added might indicate a different behaviour: in contrast to fungi, most invasive Phytophthora spp. in our data set were generalists.

Finally, optimum temperature and climate of origin were among the most important predictors in our analyses. This result highlights the importance of climate-matching, previously identified as the only characteristic consistently associated with invasive behaviour in both animals and plants (Hayes & Barry 2008).

Management applications

The management of forest diseases in general, and of those caused by exotic pathogens in particular, is based principally on preventive measures. It is often difficult to apply eradication measures once a pathogen has been introduced, and such methods are rarely successful (Desprez-Loustau 2009). This may be because the populations of invasive species have already reached intractable levels by the time the first conspicuous symptoms are observed. Identification of the traits characterizing invaders has been proposed as an effective method for targeting efforts at the species most likely to be problematic, using risk assessment approaches (Keller, Lodge & Finnoff 2007; Keller & Drake 2009). Pest risk assessment (PRA) schemes have been developed by International Plant Protection Organisations as a way of justifying regulatory measures with potentially negative effects on trade. PRA schemes conform to the International Plant Protection Convention standards (ISPM) recognized by the World Trade Organisation (MacLeod et al. 2010). Most PRA schemes make use of structured questionnaires, in which experts are asked to score several items on an ordinal scale. Various combinations of verbal and numerical score ratings are used by the European Plant Protection Organisation (EPPO), the North American Plant Protection Organization (NAPPO) and Biosecurity Australia (BA). Until recently, PRA schemes focused principally on crop plants and economic impact, but ISPM11 was recently revised to include environmental aspects. Our results could be used to improve PRA schemes, particularly for pathogens of non-crop plants, such as forest trees, by supporting, ranking and refining the criteria already in use and suggesting new criteria. For example, question 11 of the EPPO scheme ‘Does the organism have intrinsic attributes that indicate that it could cause significant harm to plants’ could refer explicitly to long-distance dispersal, sexual reproduction and spore attributes, such as spore shape.

Our results also demonstrated the potential utility of methods, such as RF, that take interactions between variables into account. This approach is likely to prove more relevant than sequential trait-based approaches, because different ecological strategies, determined by particular combinations of traits, may be associated with invasion success (Küster et al. 2008). CART has already been used in risk assessments for invasive species in several studies (Kolar & Lodge 2002; Keller & Drake 2009; Vall-llosera & Sol 2009). Random Forest offers a highly reliable method that is increasingly used for risk prediction in clinical research (Sun 2010), but has so far been little used in ecology (but see Cutler et al. 2007; Stohlgren et al. 2010) and never, to our knowledge, in PRA.

Conclusion

Trait-based RF analyses predicted invasive species of forest pathogenic fungi in Europe with a success rate of 73·9–82·1%. This success rate is close to that reported for well-studied groups, such as plants, in which risk assessment has been shown to be of economic benefit (Keller & Drake 2009). Expanding the spatial frame of our study, thus including a wider range of species, such as invasive and non-invasive fungi and Phytophthora spp. in North America, would make it possible to validate and to generalize our results more widely, at least for temperate areas. Nevertheless, these preliminary results are promising and represent a first step towards the construction of accurate trait-based predictive models of invasive plant pathogens. These models could improve PRA, if combined with pathway analyses determining the likelihood of introduction (Brunel 2009).

Acknowledgements

This work was funded by the European PRATIQUE project (7th Framework Programme) and the French ANR EMERFUNDIS project. We thank D. Piou (Département Santé des Forêts) and Anne-Sophie Roy (EPPO) for providing data and T. Giraud, I. Sache and other participants of the Pratique and Emerfundis projects for interesting discussions. We thank the editor and anonymous reviewers for constructive comments on the manuscript and Julie Sappa for English editing.
