Author for correspondence: Ralph Dean. Tel: (919) 513 0022; Fax: (919) 513 0024; Email: firstname.lastname@example.org
Fungi have an astounding and diverse impact on this planet. While they are agents of human diseases and the cause of allergic reactions, factories for the conversion of carbon in environmental and industrially adapted systems, and potential biological weapons, their importance as plant pathogens is unparalleled. In plants alone, fungi cause tens of thousands of different diseases and are responsible for massive losses of food, fiber and forestry at an estimated annual cost of hundreds of billions of dollars. These losses are not only realized in the incomes of individual farmers and state economies, but contribute significantly to world hunger problems and issues relating to safeguarding a global food supply. Our collective understanding of how fungi, particularly plant pathogens, grow, reproduce, identify a host and cause disease is still at a formative stage. There is an equal lack of detailed knowledge about how a plant recognizes that it is being attacked and then mounts an adequate defense response. The advent of genomic technologies has given researchers an unprecedented opportunity to address these mysteries in a powerful and more holistic manner. Where the genetic revolution of only a few years ago allowed for the characterization of single genes, today's genomic technologies are facilitating the evaluation of the entire complement of genes in an organism and the discovery of the suites of genes that act at any one time or under a particular condition. This review describes the recent development of tools for whole or partial genome analysis and multigenome comparisons. The discussion focuses on the rice blast pathosystem as a case study.
The amount of information available on the genomes of diverse organisms continues to accumulate at an astounding rate. The first major group of organisms for which genomic technologies were brought to bear was the prokaryotes. As a direct result of these early research efforts, virulence genes were discovered, evolutionary relations were defined and an understanding of how hosts respond to attack by bacterial pathogens was formulated. More specifically, whole genome studies of prokaryotic organisms have led to the discovery of new toxins, adhesins, invasins, polysaccharide surface structures and other pathogenicity determinants (Wren, 2000). Following the success obtained for bacterial systems, high throughput techniques were adapted for eukaryotic systems, including fungi. While the number of symbiotic and pathogenic fungi with active genome-based research is modest, as reviewed by Tunlid & Talbot (2002), the amount of resources available, and consequently the number of genes sequenced, is increasing. For a current assessment of the status of fungal genome sequencing projects, refer to http://cogeme.ex.ac.uk/index.html. The contributions of fungal genome data to the overall understanding of pathogenicity, and in turn of disease control, are in their infancy. While genome projects have been initiated for many fungal plant pathogens, arguably the most developed system is the rice blast genome initiative.
There are a number of diseases of rice, but none rival rice blast, caused by Magnaporthe grisea, in terms of historical impact on rice production and current potential for causing devastating epidemics. Few plant pathogens, fungal or other, have this impact at a global scale on nutrition, livelihood, and economic well being. As such, this fungus is recognized as a potential biological weapon and has been noted by the Centers for Disease Control (CDC) as a major worldwide pathological threat. Select strains can cause disease epidemics of barley, wheat, and pearl millet. In addition to being a major pathogen of agronomic crop plants, strains of M. grisea are an emerging problem on turf-type grasses in recreational and urban settings (Landschoot & Hoyland, 1992; Farman, 2002). The application of genomic technologies to the rice host and M. grisea has tremendous potential to increase our understanding of plant resistance responses and pathogen disease machinery that will lead to the development and deployment of durable control strategies.
There are several features that make M. grisea an ideal subject for high throughput genomic studies. The vast majority of phytopathogenic fungi, including M. grisea, belong to the taxonomic class Ascomycota or exist as related asexual forms (Agrios, 1997). M. grisea is further placed into the taxonomic subclass Pyrenomycetes because it produces sexual spores in a flask-shaped structure called a perithecium. It is notable that numerous well-studied fungal pathogens and nonpathogens fall into this subclass, such as Nectria (Fusarium), Glomerella (Colletotrichum), and most importantly Neurospora crassa. M. grisea is also closely related to Aspergillus (Emericella) nidulans, which has a rich history of investigation. Genes and processes discovered to be important for M. grisea pathogenesis are likely to be of importance in related fungal pathogens. This has been shown to be true for a number of genes first characterized in M. grisea. The most cited examples are the cAMP and MAP kinase-signaling pathways (Lee & Dean, 1993; Xu et al., 1998). Homologues of the M. grisea genes for these pathways have been found in other closely related fungi, and where the functions of these homologues have been tested, the genes have been shown to be involved in pathogenesis-related events similar to those in M. grisea (Butler et al., 2001; Dickman & Yarden, 1999; Horwitz et al., 1999; Durrenberger et al., 2001). It is reasonable to expect that the advances made using a developed system like rice blast will further the knowledge base for related fungi. For example, signaling pathways deduced to be important in M. grisea for tissue differentiation and infection-related processes were used to identify orthologous genes and pathways in other fungi such as Colletotrichum and Sclerotinia species (Dickman & Yarden, 1999; Mitchell & Dean, 1995; Yang & Borkovich, 1999).
Existing tools available for molecular studies with M. grisea
Unlike many fungal pathogens, such as the rusts and mildews, domesticated strains of M. grisea are genetically tractable and are easily cultured and manipulated in the laboratory (Leung et al., 1988). Techniques have been developed to permit the identification of genes involved in the early stages of the infection process. In particular, we are able to induce the fungus to germinate and elaborate appressoria ex planta (Hamer et al., 1988; Howard & Valent, 1996; Dean, 1997). Robust protoplast and Agrobacterium tumefaciens-based transformation systems are routine for this pathogen, with homologous recombination between genomic and transferred DNA occurring at a low but experimentally useful rate (Mitchell & Dean, 1995; Xu et al., 1998; Rho et al., 2001). Three published genetic maps are available: one based on single copy RFLPs (Skinner et al., 1993), another using single copy and repetitive DNA markers (Sweigard et al., 1993), and an integrated map that merges the two through shared markers (Nitta et al., 1997). The integrated map provided the scaffold for anchoring the physical map and the recently completed whole genome shotgun sequence assembly (unpublished data).
In 1998, the rice blast research community assembled an international consortium with the mission of launching a major genome initiative that would build on the extensive body of knowledge available for this pathogen to produce a comprehensive understanding of the pathogenic process and related biology. Over the past few years, we have developed the necessary resources to take a global whole system approach aimed at identifying entire pathways and processes requiring the activities of entire groups of genes and protein families. The tools needed for this research, including stage specific cDNA libraries, large insert BAC libraries, expression libraries, a yeast-two-hybrid system, microarray technologies and efficient infection assay platforms, are available. To launch the genome project, a genomic framework was constructed from a large insert (c. 130 kb) BAC library with a 25X genome equivalent (9216 clones). BAC clones were fingerprinted by HindIII digestion and assembled into 188 contigs spanning the entire genome of M. grisea. One hundred and twenty-three of these contigs were anchored to the genetic map using RFLP markers. Sequence reads were obtained from both ends of each BAC, giving a sequence tag connector or ‘anchor’ about every 3 kb throughout the genome (Zhu et al., 1999). In 2000, Dean (NCSU) and Ebbole (Texas A & M) were awarded funds from the USDA to sequence chromosome 7 (the smallest chromosome) as well as to produce 35 000 ESTs derived from fungal tissue grown under several different conditions. The NSF and USDA subsequently awarded additional funds in 2001 for a 6X shotgun sequence of the M. grisea genome as a joint project between the NCSU Fungal Genomics Laboratory and the Whitehead Institute. Draft sequencing was completed in the summer of 2002 and initial results are available at http://www-genome.wi.mit.edu/annotation/fungi/magnaporthe/.
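The library statistics quoted above can be checked with a back-of-envelope calculation. The sketch below assumes a genome size of 40 Mb, which is not stated in this paragraph but follows from the description elsewhere in the text of chromosome 7 as roughly 4 Mb, or 10% of the genome. The function names are illustrative only. Under idealized assumptions (every clone carries a full-length insert and every end read succeeds) the values come out near 30X coverage and one end sequence every 2.2 kb, slightly more optimistic than the reported 25X and 3 kb, as might be expected if some clones or reads were unusable.

```python
def library_coverage(n_clones: int, mean_insert_kb: float, genome_mb: float) -> float:
    """Genome equivalents = total cloned DNA / genome size."""
    return n_clones * mean_insert_kb / (genome_mb * 1000.0)


def end_read_spacing_kb(n_clones: int, genome_mb: float, reads_per_clone: int = 2) -> float:
    """Average distance between BAC end sequences if they were evenly spread."""
    return genome_mb * 1000.0 / (n_clones * reads_per_clone)


# ASSUMPTION: 40 Mb genome (inferred from "4 megabases covering 10%").
GENOME_MB = 40.0

print(f"coverage: {library_coverage(9216, 130, GENOME_MB):.1f}X")
print(f"end-read spacing: {end_read_spacing_kb(9216, GENOME_MB):.2f} kb")
```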
In 2001, the NSF Plant Genome program funded the ‘whole genome analysis of pathogen-host recognition and subsequent responses in the rice blast patho-system’ (NSF No. BDI-0115642) project. This 4-year project, involving seven investigators and directed by Dr Ralph Dean at NCSU, will result in the production of additional genomic materials, tools and information for this pathosystem. These include 50 000 random insertion fungal mutants as well as pools of genomic DNA from the mutants, rice and fungal microarrays, identification of secreted proteins, novel high throughput pathogenicity assays, a set of 35 000 rice ESTs and a comprehensive database that will provide a ‘one stop shopping’ site for all the data generated by the project (www.genome.arizona.edu/mgos).
The power of these resources is just being realized. The availability of ESTs, genome sequence data and microarray data will allow us to identify the presence and expression profile of particular genes or gene clusters, predict open reading frames and pathways, develop models for in-silico mutation experiments, and gain an understanding of genome structure. It is important to realize though, that this information will not provide the knowledge of the exact function of ESTs or predicted open reading frames. We will be restricted to inferring gene function based on homologies, when present, to similar gene functions in other organisms. The mutant bank and DNA pools will be valuable long-term resources to help us deduce the function of many genes with unknown function.
Construction of the physical map for M. grisea and identification of ESTs located on chromosome 7
The most thoroughly studied portion of the M. grisea genome is chromosome 7. This is the smallest chromosome at roughly 4 megabases covering 10% of the genome, which allows it to be easily separated from other chromosomes using established CHEF gel (Contour-clamped Homogeneous Electric Fields) protocols (Farman & Leong, 1995). This chromosome contains the mating type (mat a or α) locus as well as 20 mapped markers. As previously published by Zhu et al. (1999), fingerprinting and hybridization of a 25X HindIII BAC library resulted in the identification of six contigs of BACs on chromosome 7 and the identification of a minimum tiling path of 42 BACs across the chromosome. The physical map data including fingerprints, contig assemblies constructed using the program FPC (http://www.genome.arizona.edu/software/fpc/) and marker information have all been formalized into the federated database, Magnaporthe db, which can be found at http://www.fungalgenomics.ncsu.edu/Projects/mgdatabase/chromosome7.htm. Figure 1 shows an entry screen into the information available for each chromosome. The database is designed to allow the user to query BAC, contig, and chromosome information to relate the physical map data to the genetic map (Martin et al., 2002).
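The idea of a minimum tiling path can be illustrated with a greedy interval cover. This is a minimal sketch only: it assumes each clone already has start and end coordinates on a contig, whereas a fingerprint-based map such as the one built with FPC infers overlap from shared restriction bands rather than from known coordinates. The clone names and positions are hypothetical.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, start_kb, end_kb); hypothetical coordinates


def minimum_tiling_path(clones: List[Clone], start: int, end: int) -> List[str]:
    """Greedy cover of [start, end]: at each step take the clone that begins
    at or before the covered position and reaches furthest to the right."""
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[str] = []
    covered = start
    i = 0
    while covered < end:
        best = None
        # Scan every clone that overlaps (or abuts) the region covered so far.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


toy = [("b1", 0, 100), ("b2", 50, 180), ("b3", 90, 200),
       ("b4", 150, 300), ("b5", 190, 320)]
print(minimum_tiling_path(toy, 0, 320))
```

A gap in coverage raises an error rather than silently returning a broken path, which mirrors the way unjoined contigs remain separate in a physical map.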
To refine and annotate the physical map, we anchored appressorium stage ESTs to chromosome 7 using a two-step strategy. In the first step, 42 BAC clones composing the minimum tiling path across the chromosome were digested, the fragments were separated on a fingerprinting gel, immobilized on nylon membrane and hybridized to a pooled appressorium-specific cDNA library. The 128 hybridizing HindIII fragments isolated were arranged into two-dimensional multiplex pools and subsequently used as probes to hybridize to a cDNA library arrayed on nylon filters. Four hundred and sixty-six chromosome 7 specific cDNA clones hybridized and were then sequenced. One hundred and eighty-one were unique and were hybridized to the 25X HindIII BAC library using a two-dimensional multiplexing strategy composed of 16 pools of DNA containing 10–12 probes per pool. By decoding the hybridization patterns for each pool, the set of hybridizing BACs for each EST was determined. Several EST clones were hybridized individually to confirm the results of the pooling strategy.
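The decoding step of a two-dimensional pooling design can be sketched as follows. This is an illustration of the general technique, not the exact pool layout used in the study: it assumes probes are placed on a rectangular grid, one pool is made per row and per column, and a BAC is assigned to a probe when it is positive in both the probe's row pool and its column pool. All names and data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Layout = Dict[str, Tuple[int, int]]


def build_layout(probes: List[str], n_cols: int) -> Layout:
    """Place probes on a grid; each probe joins one row pool and one column pool."""
    return {p: divmod(i, n_cols) for i, p in enumerate(probes)}


def pool_hits(layout: Layout, truth: Dict[str, Set[str]]):
    """Simulate the experiment: a pool lights up every BAC hit by any probe in it."""
    rows: Dict[int, Set[str]] = {}
    cols: Dict[int, Set[str]] = {}
    for probe, (r, c) in layout.items():
        rows.setdefault(r, set()).update(truth.get(probe, set()))
        cols.setdefault(c, set()).update(truth.get(probe, set()))
    return rows, cols


def decode(layout: Layout, rows: Dict[int, Set[str]], cols: Dict[int, Set[str]]):
    """Candidate BACs for a probe = BACs positive in its row AND its column pool."""
    return {p: rows.get(r, set()) & cols.get(c, set()) for p, (r, c) in layout.items()}


layout = build_layout(["est1", "est2", "est3", "est4"], n_cols=2)
truth = {"est1": {"bacA"}, "est2": {"bacB"}, "est3": {"bacC"}, "est4": set()}
rows, cols = pool_hits(layout, truth)
print(decode(layout, rows, cols))
```

When two probes in different rows and columns hit the same BAC, the intersection rule also assigns that BAC to the probes at the other two corners of the rectangle. Ambiguities of this kind are one reason a subset of clones would be hybridized individually to confirm the pooled results.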
Using the previously constructed FPC physical map, each EST was anchored to the map by the location of the hybridizing BAC clones. This procedure not only allows for the anchoring of EST sequences, but also serves to validate contigs assembled by FPC. Twenty-three ESTs were determined to contain repetitive sequence and were not considered further in this analysis; an additional three did not contribute enough DNA to the pool for adequate hybridization results. Of the 155 remaining ESTs, 142 (92%) were confirmed to reside on chromosome 7. For hybridizing FPC contigs assigned to different chromosomes or not assigned to any chromosome, the corresponding hybridizing ESTs were probed against CHEF gel blots of M. grisea chromosomes to confirm or establish their chromosomal location (Fig. 2a). This analysis allowed us to anchor to chromosome 7 five previously unanchored FPC contigs (Fig. 2b). The majority of the ESTs were anchored to FPC contig 5, the largest contig assigned to chromosome 7, and FPC contig 44, a newly assigned chromosome 7 contig.
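The anchoring logic and the tally reported above can be expressed compactly. The sketch below assumes a simple lookup from BAC name to FPC contig and assigns each EST to the contig that holds most of its hybridizing BACs; the helper name and the toy BAC identifiers are hypothetical, and the actual assignment in the study also relied on CHEF blot confirmation. The counts in the second half are taken directly from the text.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def anchor_est(hit_bacs: Iterable[str],
               bac_to_contig: Dict[str, int]) -> Tuple[Optional[int], float]:
    """Return (contig, support) where support is the fraction of mapped
    hybridizing BACs that fall in the best-supported contig."""
    votes = Counter(bac_to_contig[b] for b in hit_bacs if b in bac_to_contig)
    if not votes:
        return None, 0.0
    contig, n = votes.most_common(1)[0]
    return contig, n / sum(votes.values())


# Hypothetical BAC names; contig numbers 5 and 44 are mentioned in the text.
bac_to_contig = {"bac01": 5, "bac02": 5, "bac03": 44}
print(anchor_est(["bac01", "bac02", "bac03"], bac_to_contig))

# Tally from the text: 181 unique ESTs, 23 repetitive, 3 with too little DNA.
remaining = 181 - 23 - 3
print(remaining, f"{100 * 142 / remaining:.0f}% confirmed on chromosome 7")
```

A support value below 1.0 flags an EST whose BACs fall in more than one contig, which is the situation the text resolves by probing CHEF gel blots.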
Chromosomal comparisons between M. grisea BAC and EST sequences and N. crassa
Genome comparisons can be done at several levels: whole chromosomes, sets of individual genes, or short regions the size of a single BAC clone. These comparisons are done to derive evolutionary relatedness and rates of evolution between organisms, to identify conserved open reading frames and functional gene regions, and to locate important regulatory elements or regions. The power of this experimental approach is best exemplified by comparisons between human and mouse genomes, where syntenic relationships have led to the identification of loci involved in human disease (Lund et al., 1999; Strippoli et al., 2000). While comparisons between these mammals have obvious direct impacts on human health, genome comparisons using fungal models are particularly facile compared with those of higher eukaryotes. Fungi, with genomes in the range of 13–42 Mb, are simpler organisms in which to study genome structure. Relative to their size, these genomes contain a large number of genes compared with those of humans and other mammals (Tunlid & Talbot, 2002). In the cases of M. grisea and S. cerevisiae, the gene density is roughly one gene every 3.5 kb and 2 kb, respectively, which means a rich data set is obtainable at a lower cost of time and money. Where fungi have been used, the identification of syntenic regions (regions of orthologous gene sets) and inferences about evolutionary relatedness based on synteny have relied on yeast models because of the wealth of sequence information available (Seoighe et al., 2000; Fischer et al., 2001). Interestingly, in comparisons between S. cerevisiae and C. albicans, only 9% of gene pairs are conserved, although blocks of 10 or more genes were found in which gene order is conserved even though transcriptional orientation is flipped (Seoighe et al., 2000). With the completion of draft sequences of the N. crassa and M. grisea genomes, we are positioned to perform detailed genome comparisons between these closely related filamentous fungi as well as with more distantly related yeasts.
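One simple way to quantify conservation of gene order is to ask what fraction of neighbouring gene pairs in one genome have orthologs that are also neighbours in the other. The sketch below is an illustration under strong assumptions: each genome is treated as a single ordered list of genes, the ortholog table is one-to-one, and orientation is ignored. It is not the procedure used in the cited yeast studies, and the gene names are invented.

```python
from typing import Dict, List


def conserved_adjacency_fraction(order_a: List[str],
                                 order_b: List[str],
                                 orthologs: Dict[str, str]) -> float:
    """Fraction of adjacent gene pairs in genome A whose orthologs are
    adjacent (in either order) in genome B. Pairs lacking an ortholog
    for either gene are skipped."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    total = conserved = 0
    for g1, g2 in zip(order_a, order_a[1:]):
        o1, o2 = orthologs.get(g1), orthologs.get(g2)
        if o1 not in pos_b or o2 not in pos_b:
            continue
        total += 1
        if abs(pos_b[o1] - pos_b[o2]) == 1:
            conserved += 1
    return conserved / total if total else 0.0


# Toy data: only the a1-a2 neighbourhood survives in genome B.
order_a = ["a1", "a2", "a3", "a4"]
order_b = ["b2", "b1", "b4", "b9", "b3"]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}
print(conserved_adjacency_fraction(order_a, order_b, orthologs))
```

Because adjacency is tested with an absolute difference, a pair whose order is inverted still counts as conserved, in keeping with the observation that gene order can persist when orientation does not.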
When BACs from chromosome 7 are compared with N. crassa using the BLASTn or tBLASTx algorithms, varying degrees of synteny spanning the chromosome are revealed. For example, when BAC 6J18 (GenBank accession AF267176) was aligned with the N. crassa genome, it aligned to four nonadjacent N. crassa contigs. Analysis of other chromosome 7 BACs shows alignment to 2–6 N. crassa contigs. Within syntenic regions, inversions and rearrangements were observed. These observations are similar to those published by Hamer et al. (2001).
ESTs and the 1050 predicted proteins anchored to chromosome 7 of M. grisea were compared to the set of N. crassa predicted proteins. The results of this comparison are summarized in Table 1. Nearly half (43.5%) of chromosome 7 genes align with linkage group I of N. crassa (2061 predicted proteins), and conversely, linkage group I of N. crassa contains the highest proportion of proteins that match a chromosome 7 gene. When the chromosome 7 genome sequence, constructed as a pseudo-molecule of adjoining BACs, was aligned to linkage group I of N. crassa, the result was the identification of regions with high degrees of similarity as well as many cases of inversions, gene duplications, and rearrangements. A simplified version of this result is depicted in Fig. 3 generated by the program GenomePixelizer (http://atgc.org/GenomePixelizer/GenomePixelizer_Welcome.html), which shows the alignment of chromosome 7 to N. crassa linkage group I.
Table 1. Summary of comparison between Magnaporthe grisea chromosome 7 specific ESTs/ORFs and Neurospora crassa predicted ORFs.
N. crassa linkage group | Number of N. crassa proteins that match M. grisea chromosome 7 with E < 1 × 10^−30 | Percent of N. crassa proteins in linkage group that match M. grisea chromosome 7 | Percent of M. grisea matches per N. crassa linkage group
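The three quantities named in the Table 1 column headings can be derived from a list of protein matches as sketched below. The interpretation of the last column (the share of all matches that falls in each linkage group) is an assumption, as are the function name and the toy input; the only figures taken from the text are the E-value cutoff of 1 × 10^−30 and, for orientation, the 2061 predicted proteins on linkage group I. No values from the actual table are reproduced here.

```python
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, float]  # (N. crassa protein id, linkage group, E-value)


def summarize_by_linkage_group(hits: Iterable[Hit],
                               group_sizes: Dict[str, int],
                               e_cutoff: float = 1e-30):
    """Return {linkage group: (n matching proteins,
                               % of proteins in the group that match,
                               % of all matches found in the group)}."""
    matched: Dict[str, Set[str]] = {}
    for protein, lg, evalue in hits:
        if evalue < e_cutoff:
            matched.setdefault(lg, set()).add(protein)  # count each protein once
    total = sum(len(s) for s in matched.values())
    summary = {}
    for lg, size in group_sizes.items():
        n = len(matched.get(lg, set()))
        summary[lg] = (n, 100.0 * n / size, 100.0 * n / total if total else 0.0)
    return summary


# Toy input only; protein ids and group sizes are invented.
toy_hits = [("p1", "I", 1e-50), ("p2", "I", 1e-40), ("p2", "I", 1e-35),
            ("p3", "II", 1e-31), ("p4", "II", 1e-5)]
for lg, row in summarize_by_linkage_group(toy_hits, {"I": 4, "II": 5, "III": 2}).items():
    print(lg, row)
```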
To evaluate synteny between M. grisea and another related ascomycete, ESTs determined to be on chromosome 7 of M. grisea were hybridized to high-density nylon filters containing an ordered cosmid library of Aspergillus nidulans. As shown in Fig. 4, ESTs specific to chromosome 7 of M. grisea predominantly hybridized to linkage group III of A. nidulans. As with N. crassa, large regions of inversions and rearrangements are seen. It is interesting to note that when the same genes were compared with the S. cerevisiae genome, genes appeared to be randomly distributed among chromosomes.
As we anticipate the release of genome sequence data for additional fungi (pathogens and nonpathogens), more informative comparative genome studies are being planned. Of particular importance will be similar synteny comparisons between pathogenic and nonpathogenic fungi and the identification of pathogenicity determinants or pathogenicity islands.
Development of bioinformatics tools to identify regions of sequence similarity between fungal genomes
Where the absence of genome data for fungi was once the major hurdle for researchers studying these systems, we are now faced with the inadequacies of current computing platforms and software for analyzing genome data. To look ahead, 10 fungal genomes with a minimum of 5X draft coverage will be publicly available by the end of 2003, in addition to the rapidly growing number of EST sequences being released. The challenge now is to design computing environments and databases that allow research biologists to ask meaningful questions of multiple genomes using a standardized set of tools. To this end, we developed the DeCIFR analysis system as a basic research tool for the high throughput analyses of multiple complete and incomplete sets of genomic data. DeCIFR is founded on the EnsEMBL (e!) system (http://www.ensembl.org/). Key advantages of this system are that it is an open source platform, highly developed, and customizable. To test its utility, we performed automated analysis of the Aspergillus fumigatus, Aspergillus niger, M. grisea, and N. crassa genomes. For each genome, the analyses consisted of gene predictions, repeat analysis and masking, EST, TIGR gene index(es), and GenBank dbEST comparisons. The resulting homologies are placed on a graphical display as an annotation feature on the genome of the query organism. Figure 5 displays the results of comparing M. grisea with the known information from the A. niger, A. fumigatus, and N. crassa genomes in addition to the GenBank EST and NT databases.
The base e! system was enhanced to display the results of homology searches using the BLAST suite of programs. Where additional information is available on the World Wide Web for a homology, the individual annotations are hyperlinked to the appropriate information source. These external data sources include the NCBI Entrez and TIGR GeneIndex systems. The contig view display tool enables visual identification of correlations between the annotations. New annotations and analyses are easily added to the display with minimal effort.
Generation of the predicted annotations is automatic and performed on a computational grid consisting of 32 CPUs. Approximately one-and-a-half calendar days are required to perform the illustrated analyses for a 30–40 megabase genome (Fig. 5). To accomplish this level of performance, we enhanced the base e! software to utilize the PBS (portable batch system) open source software running on the RedHat version of the Linux operating system. An execution, management, and control environment was developed that greatly simplifies the operation of the analysis system. The result is a package that uses freely available, unrestricted software and COTS (commercial off the shelf) computers and one which exhibits reasonable scaling characteristics.
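The kind of partitioning that lets a batch system spread one genome across many processors can be sketched as follows. This is a generic illustration, not a description of how the e! pipeline or PBS actually divides the work: the chunk size, overlap and function name are all assumptions. The final lines restate the figures from the text as total processor time.

```python
from typing import List, Tuple


def chunk_coordinates(length: int, chunk: int, overlap: int) -> List[Tuple[int, int]]:
    """Split [0, length) into windows of at most `chunk` bases, each sharing
    `overlap` bases with the next so features at a boundary are not lost."""
    if chunk <= overlap:
        raise ValueError("chunk must be larger than overlap")
    windows = []
    start = 0
    while start < length:
        end = min(start + chunk, length)
        windows.append((start, end))
        if end == length:
            break
        start += chunk - overlap
    return windows


# ASSUMPTION: 100 kb windows with 5 kb overlap on a 40 Mb genome.
jobs = chunk_coordinates(40_000_000, 100_000, 5_000)
print(len(jobs), "independent jobs")

# From the text: 32 CPUs for about 1.5 calendar days.
print(32 * 1.5, "CPU-days per 30-40 Mb genome")
```

Each window can then be submitted as a separate job, so the wall-clock time falls roughly in proportion to the number of processors as long as the jobs are independent.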
The results of the automatic analyses are stored in an object oriented relational database. The use of an object model allows ready data mining by the creation of custom programs – the product of which is readily integrated into the genome browser display. The underlying data store retains information regarding the physical location of features, the associated DNA, identifiers for the features and any additional information relating to the features. This wealth of information in a machine comprehensible format simplifies the task of comparative genomics. The current system retains information that allows a feature in one genome to hyperlink to a correspondingly similar feature in another genome. This capability forms the foundation required for automated syntenic analysis.
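A minimal sketch of such an object model is given below. It is not the EnsEMBL schema; the class and field names are hypothetical and chosen only to mirror the items listed above: a physical location, an identifier, free-form extra information, and links to similar features in other genomes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    genome: str
    contig: str
    start: int
    end: int
    strand: int  # +1 or -1


@dataclass
class Feature:
    feature_id: str
    kind: str                      # e.g. "predicted_gene", "est_match"
    location: Location
    extra: Dict[str, str] = field(default_factory=dict)
    links: List[str] = field(default_factory=list)  # ids of similar features


class FeatureStore:
    """In-memory stand-in for the database behind a genome browser."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Feature] = {}

    def add(self, feature: Feature) -> None:
        self._by_id[feature.feature_id] = feature

    def link(self, id_a: str, id_b: str) -> None:
        """Record a symmetric similarity link between two features."""
        self._by_id[id_a].links.append(id_b)
        self._by_id[id_b].links.append(id_a)

    def linked_in(self, feature_id: str, genome: str) -> List[Feature]:
        """Features in `genome` that are linked to the given feature."""
        linked = (self._by_id[i] for i in self._by_id[feature_id].links)
        return [f for f in linked if f.location.genome == genome]


store = FeatureStore()
store.add(Feature("mg_gene1", "predicted_gene", Location("M. grisea", "contig5", 100, 900, 1)))
store.add(Feature("nc_gene7", "predicted_gene", Location("N. crassa", "lgI", 4000, 4800, -1)))
store.link("mg_gene1", "nc_gene7")
print([f.feature_id for f in store.linked_in("mg_gene1", "N. crassa")])
```

Walking these links for consecutive features along a contig is the starting point for an automated synteny scan.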
Current work is in progress on incorporating additional gene prediction programs. Combining the putative exons with supportive evidence derived from homologies to other organisms, EST matches and syntenic observations will improve the accuracy of modeling of genes. The goal behind having such models is to simplify the task of discovering functional synteny between multiple organisms.
As detailed in this case study, a significant level of effort by several research laboratories was necessary to create the resources required to justify whole genome sequencing. With the recent release of the draft genome data, these resources, specifically sequenced genome markers, BAC end sequences, deep sets of EST sequences, ESTs anchored to chromosome 7, and chromosome 7 BAC-by-BAC sequence, are critical in enabling the validation of the shotgun genome assembly and verifying automated gene prediction efforts. The absence of such resources would relegate the data to the status of a best guess and likely vitiate any conclusions drawn from genome synteny comparisons. The large gap between the financial and technological resources available to study the model yeasts S. cerevisiae and Schizosaccharomyces pombe and those available for filamentous fungi that have a direct impact on human health and nutrition is slowly being filled. The release of the 6X draft genome sequence of M. grisea is a tremendous step forward in filling this gap; however, the real challenge is in deciphering the function of each predicted gene and placing its activities within the context of the biology of the pathogen. We are just starting to meet this challenge through the collaborative project funded by the National Science Foundation to perform a comprehensive functional gene analysis of the rice blast pathogen.
This case study reports on the development of the tools and materials generated for the rice blast pathosystem. These resources have been cultivated over the past decade not to be an end in themselves, but to empower the research community to ask fundamental biological questions concerning fungi, pathogens, and eukaryotic organisms in general. The Fungal Genomics Laboratory is now using these resources to extend the current understanding of fungal life. A central question being addressed is how fungal parasitism evolved and, since no pathogen can infect all plant species, what the origins of host and tissue specificity are. Using micro- and macroevolutionary approaches, we plan to examine the evolutionary process (e.g. gene flow, genetic drift, recombination, horizontal gene transfer, gene duplication, and sequence divergence) in detail (McDonald, 1997; McDonald & Linde, 2002). The foundation from which these questions will be answered is the existing and emerging sequence data from whole or partial genomes. However, our ability to use the growing volumes of genome data will be predicated on the emergence of new tools, methodologies, and algorithms that will allow us to integrate diverse structural, functional, and comparative genome data to test for genotype–phenotype associations within a phylogeny as well as across taxonomic groups.
Whole genome analysis is enabling researchers to further understand key determinants involved in pathogenicity. Casual browsing of the predicted open reading frames from the whole genome assembly reveals promising target pathogenicity genes. These include polyketide synthases, transporters, monooxygenases, kinases and phosphatases, cytochrome P450s, and peptide synthases. We are in the process of systematically cataloguing these potential pathogenicity determinants and islands and characterizing their role in infection, reproduction, and invasive plant growth.