Ecological niche modelling techniques were applied to address the questions of the origins and potential geographic extent of sudden oak death, caused by the pathogen Phytophthora ramorum. Based on an ecological niche model derived from the phytopathogen's California distribution and distributions of potential host species, it was determined that the disease has high potential to colonize the southeastern United States, and that its likely source area is eastern Asia.
Species of Phytophthora are ecologically and economically important phytopathogens that occur in natural and agricultural ecosystems (Erwin & Ribeiro, 1996). Introduced Phytophthora species can be particularly devastating, causing pervasive and persistent blights or rots that affect single species (e.g. Port Orford cedar root rot; P. lateralis; Jules et al., 2002), ecological communities (e.g. P. cinnamomi in Australian Eucalyptus marginata forests; Weste, 2003) and important commercial crops (e.g. potato and tomato late blight; P. infestans; Fry & Goodwin, 1997).
In 2001, P. ramorum was described and identified as the causal agent of a twig blight affecting nursery and garden Rhododendron spp. in Germany and the Netherlands since 1993 (Werres et al., 2001). In 2002, P. ramorum was identified as the agent responsible for ‘sudden oak death’ (SOD; Rizzo et al., 2002), a syndrome first observed in 1994 and associated with widespread mortality of oaks (Quercus spp.) and tanoaks (Lithocarpus densiflorus) along the coastal ranges of California, USA. The recent emergence of P. ramorum and the discovery that North American and European forms are of different mating types suggest that this pathogen is an introduced species, undergoing independent introductions from its native distributional area (Werres et al., 2001; Rizzo & Garbelotto, 2003).
Phytophthora ramorum affects a variety of species, particularly members of the Fagaceae (oaks and beeches) and Ericaceae (e.g. Vaccinium spp. and Rhododendron spp.), and hosts tend to exhibit either bark cankers or foliar symptoms (e.g. leaf lesions and twig dieback; Rizzo & Garbelotto, 2003). Among susceptible bark canker hosts (i.e. trees for which infection is usually lethal), P. ramorum is particularly aggressive to oaks in the section Lobatae, commonly referred to as red or black oaks (hereafter, red oaks; Garbelotto et al., 2003; Rizzo & Garbelotto, 2003). Given the broad distribution and high diversity of North American red oaks (Jensen, 1997), the emergence of SOD has raised concerns over the possibility and consequences of this pathogen becoming established in other parts of the continent (Rizzo & Garbelotto, 2003).
To determine P. ramorum's potential North American and global distribution, this study used ecological niche modelling (ENM). ENM in general involves relating known occurrence points to a suite of spatially explicit variables representing features of the ecological landscape in a predictive algorithm to produce a rule set that describes the species’ ecological requirements (i.e. niche, sensu Grinnell, 1917). Although this technique is novel in plant pathology, it has been used extensively in biodiversity science and human disease research, so it is a logical tool for addressing the question of the potential geographic distribution of P. ramorum.
Materials and methods
Commonly employed ENM approaches include climate matching, basic multivariate logistic regression analyses, and models based on environmental similarity or distance (Guisan & Theurillat, 2000; James & McCulloch, 2002; Scott et al., 2002; Kadmon et al., 2003; Rushton et al., 2004). These techniques are typically deterministic, focussing on a single decision rule or a small set of rules to describe the potential distribution of a particular species. Because distributional limits can be governed by different factors across a species’ geographic range (Grinnell, 1917; Swihart et al., 2003), more complex, multiple-criterion ENM approaches are desirable (Elith et al., 2006). Moreover, comparisons among methods are still few and have focused on interpolation challenges, whereas the challenge at hand in this paper is one of extrapolation to continental and even global scales, so the choice of methodology is not clear-cut.
One such multiple-criterion ENM that has seen broad application in particular to extrapolative challenges (Peterson, 2003) is the Genetic Algorithm for Rule-Set Prediction (garp), a machine-learning application. Machine-learning algorithms are computationally powerful because they are nonparametric, nonlinear and relatively unaffected by collinearity, and can explore and describe complex relationships among variables in complex solution spaces (Lek et al., 1996; Olden, 2000). In the present study the decision to model the ecological niche of P. ramorum using garp was made because of its multiple decision-rule capability, known robust predictive performance (Peterson & Cohoon, 1999; Stockwell & Peterson, 2002; Anderson et al., 2003), and ability to project niche models onto different geographic areas.
garp model inputs are species occurrence data (latitude/longitude) and environmental variables. Generally, at least 20 occurrence points are needed to produce models that are not overfitted and that can predict into unsampled areas (Stockwell & Peterson, 2002). Basic environmental inputs are raster geographic information system (GIS) coverages (see detailed explanations below) consisting of topographic and climate variables; biotic variables can also be included (e.g. a coverage representing host density), but typically are not used because these data are often lacking and/or difficult to represent in a spatially explicit manner. garp partitions occurrence data into training and test datasets, the former used to build the model, and the latter used to test the accuracy of the model.
garp describes nonrandom relationships between a species’ occurrence and the environment using multiple rules. Rules are if… then statements relating to environmental conditions and whether a species is present or absent under them. garp employs four rule types: atomic rules (use only single values of a variable), envelope rules (spanning ranges of values), negative range rules (as envelope rules, but with ranges being excluded from the prediction rather than included), and logit rules (logistic regression models) (Stockwell & Peters, 1999). After building an initial set of rules, garp makes iterative changes to them: in each iteration (‘generation’), rules undergo ‘genetic’ changes – insertions, deletions, point mutations, and crossing over among rules can all occur. In this manner, garp explores solution space flexibly to identify nonrandom relationships between environmental conditions and species’ occurrences. The final rule set is arrived at when iterative changes fail to improve the model's predictive accuracy, or when a user-defined number of iterations has been reached.
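The if… then structure of range-based rules can be illustrated with a minimal sketch. It is a toy illustration only, not the garp implementation: the class name, the variable names (precip, tmin), the thresholds, and the convention that the first matching rule decides the prediction are all assumptions made for the example. An atomic rule can be mimicked by a range whose lower and upper bounds are equal; logit rules and the genetic operators are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Env = Dict[str, float]  # environmental values for one landscape pixel


@dataclass
class RangeRule:
    """Envelope-style rule (hypothetical class, for illustration only).

    negated=False: fires when every listed variable lies inside its range.
    negated=True:  fires when the pixel lies outside that envelope
                   (a stand-in for a negative range rule).
    """
    ranges: Dict[str, Tuple[float, float]]
    outcome: bool          # True = predict presence, False = predict absence
    negated: bool = False

    def applies(self, env: Env) -> bool:
        inside = all(lo <= env[var] <= hi for var, (lo, hi) in self.ranges.items())
        return (not inside) if self.negated else inside


def predict(rules: List[RangeRule], env: Env) -> Optional[bool]:
    """Return the outcome of the first rule that fires, or None if none does.

    First-match ordering is an assumption of this sketch.
    """
    for rule in rules:
        if rule.applies(env):
            return rule.outcome
    return None


# Hypothetical rule set: variable names and thresholds are invented.
rules = [
    RangeRule({"precip": (600.0, 1500.0), "tmin": (2.0, 12.0)}, outcome=True),
    RangeRule({"tmin": (-5.0, 40.0)}, outcome=False, negated=True),
]

if __name__ == "__main__":
    print(predict(rules, {"precip": 900.0, "tmin": 6.0}))
    print(predict(rules, {"precip": 100.0, "tmin": -20.0}))
    print(predict(rules, {"precip": 100.0, "tmin": 10.0}))
```

In this toy rule set a pixel can also fall through every rule, in which case no prediction is made for it.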
garp is not a deterministic algorithm, so its output (rule sets and associated predicted distributional areas) varies from run to run. To capture the variation among potential distributions, a best-subset approach can be employed, where the best models, as determined from overprediction (commission) and underprediction (omission) errors, are selected from the final model set (Anderson et al., 2003). These models can then be overlaid in a GIS to determine where they converge, that is, where spatially most or all of the models agree. garp is not a black box – it is possible to examine the rule sets, and a jackknife feature allows correlations to be made between model performance and input variables, providing some insight into how specific variables affect predictive accuracy (Peterson & Cohoon, 1999; Peterson et al., 2003a).
Because garp models distributions based on an organism's presence/absence within a landscape pixel (in this analysis, pixels were 0·08 × 0·08°; see below), one point was randomly selected from each pixel where P. ramorum occurred, yielding 68 spatially unique localities (Fig. 1a). From these records, a training dataset was compiled by randomly selecting a single point from each 0·25 × 0·25° block (∼28 × ∼28 km), ensuring input of at least 20 datapoints and capturing the overall spatial distribution of P. ramorum records. Based on this stratification, 25 points were used to build the models and 43 points for testing predictive accuracy, with models trained and results mapped at the finer 0·08° resolution.
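The two-stage thinning described above (one record per 0·08° pixel, then one training record per 0·25° block) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the grids are assumed to be aligned to 0° latitude/longitude, and the records left over after block selection are assumed to form the test set.

```python
import math
import random


def cell_index(lat, lon, size):
    """Index of the grid cell (size in degrees) containing a point.

    Assumes grids aligned to 0 degrees latitude/longitude.
    """
    return (math.floor(lat / size), math.floor(lon / size))


def one_per_cell(points, size, rng):
    """Randomly keep a single point from every occupied grid cell."""
    cells = {}
    for lat, lon in points:
        cells.setdefault(cell_index(lat, lon, size), []).append((lat, lon))
    return [rng.choice(group) for group in cells.values()]


def split_train_test(points, pixel=0.08, block=0.25, seed=0):
    """Thin to one point per pixel, then pick one training point per block.

    Points not chosen for training become the test set (an assumption).
    """
    rng = random.Random(seed)
    unique = one_per_cell(points, pixel, rng)
    train = one_per_cell(unique, block, rng)
    test = [p for p in unique if p not in train]
    return train, test


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    records = [(38.01, -122.51), (38.02, -122.52), (38.20, -122.60), (39.00, -123.00)]
    train, test = split_train_test(records)
    print(len(train), "training points;", len(test), "test points")
```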
Raster coverages (again, pixel size 0·08°) were assembled summarizing geographic distributions for 35 species of red oaks north of Mexico, as follows: Quercus acerifolia, Q. agrifolia, Q. arkansana, Q. buckleyi, Q. coccinea, Q. ellipsoidalis, Q. emoryi, Q. falcata, Q. georgiana, Q. graciliformis, Q. gravesii, Q. hemisphaerica, Q. hypoleucoides, Q. ilicifolia, Q. imbricaria, Q. incana, Q. inopina, Q. kelloggii, Q. laevis, Q. laurifolia, Q. marilandica, Q. myrtifolia, Q. nigra, Q. pagoda, Q. palustris, Q. parvula var. shrevii, Q. phellos, Q. pumila, Q. robusta, Q. rubra, Q. shumardii, Q. texana, Q. velutina, Q. viminea (Nixon, 1980) and Q. tardifolia, but excluding Q. wislizeni, as it has not yet been reported as a foliar or bark canker host of P. ramorum (Dodd et al., 2004). Two non-Quercus host trees, L. densiflorus and Arbutus menziesii, were also included. Distributional maps of 24 oaks and the two non-oak hosts were available as shapefiles (http://climchange.cr.usgs.gov/data/atlas/little/), which were converted into raster format. For the 11 remaining oak species, comparable datasets were created based on published range maps and distributional information (http://hua.huh.harvard.edu/FNA; http://www.csdl.tamu.edu/FLORA/tracy/main1.html; http://www.calflora.org). All host distributions were overlaid and aggregated over North America to derive a coverage of potential host tree distribution and used as a landscape mask to define the spatial extent across which SOD might be expected to occur in relation to host tolerances (see below).
Topographic and climatic inputs were from the Hydro-1K dataset (U.S. Geological Survey; USGS) data describing topography (aspect, elevation, flow accumulation, flow direction, topographic index; http://edcdaac.usgs.gov/gtopo30/hydro/), monthly normalized difference vegetation index (NDVI) from years 1996–2000 derived from advanced very high resolution radiometer (AVHRR) imagery (http://edc.usgs.gov/geodata/), and yearly mean values (1961–1990) of diurnal temperature range, ground frost frequency, minimum, maximum and mean temperatures, precipitation, solar radiation, vapour pressure, and wet-day frequency drawn from the Intergovernmental Panel on Climate Change (IPCC) data warehouse (http://ipcc-ddc.cru.uea.ac.uk/). USGS and NDVI data were 0·01 × 0·01° base resolution (1 × 1 km), generalized to 0·08 × 0·08° (8 × 8 km) for analysis; IPCC data were 0·5 × 0·5° resolution (50 × 50 km) subdivided to 0·08 × 0·08° (8 × 8 km) for analysis. Each of the three environmental data sources provides a distinct sort of information: topography summarizes landscape form; climatic data inform about the atmospheric conditions above that landscape; and NDVI provides a view of seasonal patterns of greenness across landscapes (obviously at least in part a function of topography and climate). NDVI can be used to infer climatic conditions, but tends to correlate well only with minimum temperature, mean temperature and precipitation, and soil moisture (Wang et al., 2001; Adegoke & Carleton, 2002).
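The two resampling directions mentioned above can be sketched in a few lines. The text does not state which resampling rules were applied, so block averaging (for generalizing 0·01° cells by a factor of 8) and nearest-neighbour lookup (for subdividing 0·5° cells to 0·08°) are assumptions of this sketch, as are the function names. Grids are plain lists of rows.

```python
def aggregate_mean(grid, factor):
    """Coarsen a grid by averaging non-overlapping factor x factor blocks.

    Block averaging is an assumption; other rules (mode, median) are possible.
    """
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            block = [grid[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def subdivide_nearest(grid, coarse_deg, fine_deg):
    """Refine a grid: each fine cell takes the value of the coarse cell
    that contains its centre (nearest-neighbour lookup, an assumption)."""
    n_rows = int(len(grid) * coarse_deg / fine_deg)
    n_cols = int(len(grid[0]) * coarse_deg / fine_deg)
    out = []
    for r in range(n_rows):
        src_r = min(int((r + 0.5) * fine_deg / coarse_deg), len(grid) - 1)
        row = []
        for c in range(n_cols):
            src_c = min(int((c + 0.5) * fine_deg / coarse_deg), len(grid[0]) - 1)
            row.append(grid[src_r][src_c])
        out.append(row)
    return out


if __name__ == "__main__":
    fine = [[2.0] * 8 for _ in range(8)]           # 8 x 8 cells of 0.01 degrees
    print(aggregate_mean(fine, 8))                  # one 0.08-degree cell
    coarse = [[1, 2], [3, 4]]                       # 2 x 2 cells of 0.5 degrees
    print(len(subdivide_nearest(coarse, 0.5, 0.08)), "rows at 0.08 degrees")
```

Because 0·5° is not an integer multiple of 0·08°, the refined grid cannot tile the coarse one exactly; the sketch simply truncates to whole fine cells.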
Based on these tradeoffs, five distinct ecological niche models were generated, differing in the type and spatial extent of environmental variables: NDVI + topography (constrained within the spatial distribution of potential host trees), NDVI + topography, climate + topography modelled over all of North America, and regional models of the latter two combinations. Regional models were based on environments represented within a 500-km buffer around known occurrence points.
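A regional mask of the kind described above can be approximated by testing whether a pixel centre lies within 500 km of any occurrence point. The sketch below uses great-circle (haversine) distance on a sphere of radius 6371 km; the text does not say how the buffer was constructed, so the distance measure, the radius and the function names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_buffer(pixel, occurrences, radius_km=500.0):
    """True if the pixel centre is within radius_km of any occurrence point."""
    return any(haversine_km(pixel[0], pixel[1], lat, lon) <= radius_km
               for lat, lon in occurrences)


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    occurrences = [(38.0, -122.5)]
    print(within_buffer((37.0, -122.0), occurrences))
    print(within_buffer((45.0, -100.0), occurrences))
```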
For each set of models, the 10 best replicate garp runs were chosen based on the combination of omission and commission error components calculated from the independent testing data mentioned above (Anderson et al., 2003). These ‘best subsets’ of models were summed to produce maps of potential SOD distribution. In these summary outputs, values ranged from 0 (areas where no models predicted potential presence) to 10 (areas where all models predicted potential presence). In general, attention was focussed on areas in which all best-subset models agreed in predicting potential presence. The best-subset models were then projected to other regions to outline potential geographic distributions for the pathogen in the USA and worldwide (Peterson & Vieglais, 2001; Peterson, 2003). This approach has been tested extensively in a variety of invasive species systems, and has performed well in predicting the geographic course of already-established invasions (Peterson & Vieglais, 2001; Papes & Peterson, 2003; Peterson, 2003; Peterson et al., 2003a,b).
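Summing binary predictions into a 0–10 agreement surface can be sketched as below. The summation follows the description above directly. The selection step does not: ranking runs by omission and then by commission is a stand-in assumption and does not reproduce the procedure of Anderson et al. (2003). The function names and dictionary keys are hypothetical.

```python
def best_subset(runs, n_best=10):
    """Pick n_best runs, ranked by omission error and then commission error.

    This ranking is an illustrative assumption, not the published procedure.
    Each run is a dict with keys 'omission', 'commission' and 'map'.
    """
    ranked = sorted(runs, key=lambda run: (run["omission"], run["commission"]))
    return ranked[:n_best]


def consensus(maps):
    """Sum binary (0/1) prediction grids cell by cell.

    With 10 input maps the result ranges from 0 (no model predicts
    presence) to 10 (all models predict presence).
    """
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*maps)]


if __name__ == "__main__":
    # Three tiny invented runs on a 2 x 2 landscape.
    runs = [
        {"omission": 0.10, "commission": 0.40, "map": [[1, 0], [1, 1]]},
        {"omission": 0.00, "commission": 0.55, "map": [[1, 1], [0, 1]]},
        {"omission": 0.30, "commission": 0.20, "map": [[0, 0], [0, 1]]},
    ]
    chosen = best_subset(runs, n_best=2)
    print(consensus([run["map"] for run in chosen]))
```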
To evaluate the accuracy of the garp predictions, the area under the receiver operating characteristic (roc) curve of the model's predictive performance was determined. An roc curve represents the relationship between the probability of a model correctly classifying presences (true positives; sensitivity; Y-axis) and the probability of it incorrectly classifying absences as presences (false positives; 1 – specificity; X-axis) (Fielding & Bell, 1997). If these probabilities are equal, the model has no discriminatory power and will perform no better than random; this relationship is represented as a ‘line of no information’, a 1:1 relationship between X and Y axes (slope = 1) under which the area is 0·5 (Hanley & McNeil, 1982). As such, the area under curve (auc) is the probability that the model will make a successful classification, ranging from random (0·5) to perfect discrimination (1·0). auc values of 0·81–1·00 are considered robust (Swets, 1988).
garp auc was calculated using a Wilcoxon test (Hanley & McNeil, 1982) to compare the number of landscape pixels where 0, 1, 2, etc., best-subset models predicted P. ramorum potential presence vs. the number of those pixels where the phytopathogen actually occurred (testing data; n = 25). The statistical significance of each model set was determined with a Z-test of the garp auc versus that of a random model (0·5; Hanley & McNeil, 1982).
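The calculation can be illustrated with a short sketch that works directly from counts of pixels at each level of model agreement. The auc is computed in its rank-based (Wilcoxon/Mann–Whitney) form, with ties counted as one half, and the standard error follows the approximation of Hanley & McNeil (1982). Treating background pixels as the comparison group, using a one-sided normal test against 0·5, and all function names and example counts are assumptions of the sketch rather than details taken from the analysis.

```python
import math


def auc_from_counts(pos_counts, neg_counts):
    """Rank-based AUC from counts per agreement level (0, 1, 2, ...).

    pos_counts[k]: test presences in pixels where k models predict presence.
    neg_counts[k]: comparison (background) pixels at the same level.
    Ties between a presence and a background pixel count as one half.
    """
    n_pos, n_neg = sum(pos_counts), sum(neg_counts)
    wins = 0.0
    below = 0  # background pixels with a strictly lower agreement level
    for p, n in zip(pos_counts, neg_counts):
        wins += p * (below + 0.5 * n)
        below += n
    return wins / (n_pos * n_neg)


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the AUC (Hanley & McNeil, 1982 approximation)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def z_test_vs_random(auc, se):
    """Z statistic and one-sided P for AUC greater than 0.5 (normal approximation)."""
    z = (auc - 0.5) / se
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Invented counts for three agreement levels, for illustration only.
    pos = [0, 1, 4]
    neg = [6, 3, 1]
    auc = auc_from_counts(pos, neg)
    se = hanley_mcneil_se(auc, sum(pos), sum(neg))
    z, p = z_test_vs_random(auc, se)
    print(auc, se, z, p)
```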
Results and discussion
In general, the results of garp analyses of the different environmental datasets yielded similar geographic predictions. roc analysis showed that models based on NDVI values performed particularly well, in contrast to the models based on just climate + topography (Table 1). The close similarity of results based on very different environmental datasets (Fig. 1) suggests that the predictions are at least robust to choice of environmental datasets. In general, these predictions indicate ample potential for spread north and south along the coast, as well as potential for spread to the western slopes of the Sierra Nevada.
Table 1. Receiver operating characteristic area under curve (roc-auc) scores evaluating predictive ability of ecological niche models presented herein, including the statistical significance (P) and standard errors (SE) for the five Phytophthora ramorum ecological niche model sets. Predictivity of all models was significantly better than random (P << 0·05), but the climate + topography models were less able to discriminate between presences and absences in the independent test dataset.
NDVI + topography
NDVI + topography (500-km buffer)
NDVI + topography (landscape mask)
Climate + topography
Climate + topography (500-km buffer)
To date, three different approaches have been used to model SOD distribution in California. Meentemeyer et al. (2004) applied a rule-based model, and predicted highest risk of SOD establishment in the state's northern and central coastal range between Del Norte and Santa Cruz counties (see Fig. 1 for county locations), and moderate risk in the foothills of the northern Sierra Nevada and in scattered patches of the southern coastal range from Santa Barbara county south to San Diego county. Support vector machines (svm) modelling by Guo et al. (2005) predicted areas of SOD risk in the central coastal range, from Mendocino county to Santa Barbara county, and the northern and central foothills of the Sierra Nevada. Venette & Cohen (2006) used a climate-matching model (climex) based on observed P. ramorum environmental tolerances, predicting favorable climatic conditions between Del Norte and Santa Cruz counties, and in the Sierra Nevada from Shasta county south to Tuolumne county. The garp model results are most similar to the svm outputs, but differ from all three approaches by predicting greater SOD potential distribution across more of the Sierra Nevada foothills and in San Diego county (Fig. 1).
Projecting the models across North America indicated highest potential for establishment in the southeastern USA (Fig. 2a). Previous efforts over the contiguous USA were made using climex (Venette & Cohen, 2006) and a set of five different approaches: rule-based, logistic regression, classification and regression trees, garp and svm (Kelly et al., in press). The climex, rule-based, and logistic regression models showed a broad SOD potential distribution across the eastern United States, with highest risk in the southeastern states, whereas the Kelly et al. (in press) garp and cart models demonstrated a band of highest risk spanning the middle southeast from the central Great Plains to the mid-Atlantic coast. The garp outputs described here most closely matched the region predicted by the svm model (Kelly et al., in press), which extends from the mid-Atlantic region southeast to eastern Texas.
Disagreement among the various modelling approaches may be the result of differences in computational approach, predictor variables, spatial/temporal scale, or combinations thereof. Discrepancies among models highlight uncertainties in predicting P. ramorum distribution (e.g. extent of potential range south of California's central coast). Conversely, consensus among modelling efforts strengthens the case for the validity of a prediction, and is a key consideration in evaluating an ENM's predictive ability when projected onto another region (i.e. beyond California). In this sense, for instance, the repeated conclusion that the southeastern USA is particularly vulnerable (Venette & Cohen, 2006; Kelly et al., in press; this study) should be a clear indication of high risk of establishment there. Mexico (Fig. 2a,b), which is generally considered a key centre of endemism for oaks (Nixon, 1993), should also be regarded as at considerable risk. Foliar hosts mediate the transmission of P. ramorum (Garbelotto et al., 2003), and the southeastern USA has high species richness among genera of known hosts (e.g. Rhododendron, Vaccinium) and potential hosts (e.g. Ericaceae: Kalmia, Gaylussacia) (http://plants.usda.gov/). This coincidence of ecological potential (e.g. in terms of climate) and presence of appropriate hosts indicates great potential for invasiveness in these regions.
Invasive plant pathogens have caused dramatic changes to native North American ecosystems, with substantial ecological and socioeconomic impacts (Crooks, 2002; Gilbert, 2002). Chestnut blight (Cryphonectria parasitica) and Dutch elm disease (Ophiostoma spp.) essentially eliminated the American chestnut (Castanea dentata) and American elm (Ulmus americana), respectively, from eastern deciduous forests (Keever, 1953; Karnosky, 1979). With the loss of chestnut and elm, red oaks became dominant, but the reduced species diversity and habitat heterogeneity of these stands provide fewer barriers to transmission of oak phytopathogens (Wilson, 2001). Of the 35 red oak species in the USA, 18 occur primarily in the southeast (Fig. 2b; Q. acerifolia, Q. arkansana, Q. buckleyi, Q. falcata, Q. georgiana, Q. hemisphaerica, Q. imbricaria, Q. incana, Q. inopina, Q. laevis, Q. laurifolia, Q. marilandica, Q. myrtifolia, Q. nigra, Q. phellos, Q. pumila, Q. shumardii and Q. texana). Many of these southeastern oaks are regional endemics or microendemics with relatively limited distributions (Jensen, 1997) that may predispose them to extinction if exposed to a novel pathogen (e.g. Terborgh, 1974). Oak seed mast is an important food for wildlife, the reduction of which could have substantial effects on the physical and trophic structure of eastern forests (McShea & Healy, 2002; Rizzo & Garbelotto, 2003). Red oaks are the most important commercial hardwood species in the southeastern USA (Howard, 2001; Johnson, 2001), and the regional timber industry may suffer significant economic impacts if P. ramorum becomes established. The southeastern USA appears to be at greatest risk of P. ramorum establishment and spread, given the region's appropriate ecological conditions, high richness of potential host plants, and relatively limited heterogeneity of deciduous forests.
The assessment made in this study is based on the observed geographic distribution of North American P. ramorum populations (the A2 mating type). To date, all P. ramorum isolates from North American forests have been A2, but the A1 type (representative of European populations) has been detected in US and Canadian nurseries (Ivors et al., 2004; Cave et al., 2005; Rizzo et al., 2005). The A1 and A2 mating types have different genotypic and phenotypic characteristics (Ivors et al., 2004), and sexual recombination between North American and European P. ramorum lineages could alter their ecology (e.g. environmental tolerances, range of suitable host species, and/or level of aggressiveness; Ivors et al., 2004; Cave et al., 2005; Rizzo et al., 2005).
Determining the geographic source of P. ramorum and sampling ‘native’ genomes would greatly enrich efforts towards understanding its genetic diversity, ecological amplitude, and the mechanisms of disease resistance in native-range hosts (Rizzo & Garbelotto, 2003; Ivors et al., 2004; Cave et al., 2005; Rizzo et al., 2005). Globally, regions of highest model convergence were widely scattered (Fig. 3b), but the most geographically coherent and consistently predicted potential source of P. ramorum was eastern Asia (Fig. 3c), a region speculated as the geographic origin of the pathogen (Brasier et al., 2004, 2005). Southern Japan (Kyushu, Shikoku, southern Chugoku, Kansai, southern Chubu and northwest Kanto prefectures), southern South Korea (Kyongsang-nampo province), and eastern China (Fujian and Yunnan provinces) were identified as high-priority areas for the search for native P. ramorum populations. These regions are within the ecological niche of the invasive Californian populations, have a diversity of known P. ramorum hosts (e.g. Ericaceae in Yunnan province; Brasier et al., 2004), and are well known as a source of species invasive to North America (National Research Council, 2002).
Finally, and most generally, P. ramorum distribution and ecology were modelled at least partly in the absence of information regarding host distributions and the importance of biotic interactions in shaping the species’ distribution. This approach stresses a key unresolved question in the broader field of community ecology (Ricklefs & Schluter, 1993) – what are the roles of interactions in shaping species’ distributions as a function of spatial scale and resolution? Although the suggestion has been made that autecology will determine coarse-scale distributions of species, whereas interactions will affect fine-scale distributions (Soberón & Peterson, 2005), much additional research is needed to resolve the question. In the present case, final and definitive testing will require considerable time, as P. ramorum spreads across regions and demonstrates its full distributional potential.
The views expressed in this paper are those of the authors and do not necessarily reflect the views or policies of the US Environmental Protection Agency. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.