Profiling the secretomes of plant pathogenic Proteobacteria

  • Edited by Mark Pallen.

*Corresponding author. Tel.: +44 1865 275 132; fax: +44 1865 275 074; e-mail address: gail.preston@plants.ox.ac.uk

Abstract

Secreted proteins are central to the success of plant pathogenic bacteria. They are used by plant pathogens to adhere to and degrade plant cell walls, to suppress plant defence responses, and to deliver bacterial DNA and proteins into the cytoplasm of plant cells. However, experimental investigations into the identity and role of secreted proteins in plant pathogenesis have been hindered by the fact that many of these proteins are only expressed or secreted in planta, that knockout mutations of individual proteins frequently have little or no obvious phenotype, and that some obligate and fastidious plant pathogens remain recalcitrant to genetic manipulation. The availability of genome sequence data for a large number of agriculturally and scientifically important plant pathogens enables us to predict and compare the complete secretomes of these bacteria. In this paper we outline strategies that are currently being used to identify secretion systems and secreted proteins in Proteobacterial plant pathogens and discuss the implications of these analyses for future investigations into the molecular mechanisms of plant pathogenesis.

1 Introduction

An overriding goal for bacterial plant pathologists has been to identify the key factors involved in bacteria–plant interactions. Researchers have screened thousands of transposon mutants to identify strains with deficiencies in virulence and pathogenesis. From these early studies, which largely focused on genetically amenable Gram-negative Proteobacteria, two themes began to emerge. First, the mutations with the greatest effect on pathogenesis frequently knocked out the function or expression of protein secretion systems. Second, mutations that knocked out individual proteins secreted by these systems often had little or no effect on the development of disease symptoms. Pathogenesis, it seemed, depended on the collective activity of a wide variety of proteins secreted by a range of protein secretion systems. Furthermore, bacteria that caused different types of plant diseases each seemed to exploit a different type of protein secretion system: tumorigenic Agrobacterium relied on the type IV (Vir) system; soft-rotting Erwinia on the type II (Out) system; and vascular pathogens and necrotizing bacteria such as Ralstonia, Xanthomonas and Pseudomonas on the type III (Hrp) system [1]. These clear-cut results presented a tractable and attractive means of profiling and classifying pathogens. However, by the mid-1990s, this simple and satisfying story began to collapse. Although cell wall degrading enzymes secreted by the Out system of soft rot pathogens such as Erwinia chrysanthemi and Erwinia carotovora were the cause of wholesale degradation of plant cell walls by these pathogens, the same bacteria were found to use a Hrp system to help establish endophytic populations in plant tissues [2]. Agrobacterium (Rhizobium) vitis was found to use a type II-secreted polygalacturonase to promote invasion of plant tissue prior to initiating tumorigenesis [3]. Finally, type III secretion genes, once thought to be the exclusive preserve of animal and plant pathogenic bacteria, were identified in plant symbionts and plant growth-promoting soil bacteria [4–6]. Researchers also began to identify roles for proteins secreted by additional protein secretion mechanisms, including the Tat pathway, autotransporter and two-partner secretion systems [7–9].

Further development of our understanding of the ecology and evolution of plant pathogenic bacteria, and of the role of secreted proteins in plant pathogenesis, depends on systematic investigations into the nature and number of proteins secreted by plant pathogenic bacteria during plant pathogenesis, investigations made possible by the availability of genome sequence data for an increasingly wide range of plant pathogens. Genome sequence data can be used to predict and profile the entire complement of secreted proteins in an organism, commonly referred to as the “secretome”. The opportunity to carry out systematic functional analyses of candidate secreted proteins will help to bypass many of the problems associated with the weak, redundant or suppressed phenotypes that have limited previous studies.

Plant pathogenic bacteria are relative latecomers to the genome sequencing revolution, with the first sequenced strain being Xylella fastidiosa 9a5c in 2000 [10]. However, there are now 11 complete, and at least 30 unfinished genome sequences for bacterial plant pathogens (Table 1), along with sequences for non-pathogenic and symbiotic relatives such as Pseudomonas fluorescens and nitrogen-fixing Rhizobia. We have included the opportunistic pathogen Pseudomonas aeruginosa PAO1 in Table 1 and in subsequent analyses. P. aeruginosa PAO1 was isolated from a clinical specimen, but several laboratory studies have shown that this bacterium is able to colonise plants and induce disease symptoms [11,12], and it forms a useful point of reference for comparison with Pseudomonas syringae.

Table 1. Plant pathogen genome sequencing projects

A list of current plant pathogen genome sequencing projects, with a brief summary of the pathology and host range of each pathogen, and of the protein secretion “profile” of each pathogen species prior to the availability of genome sequence data. Information on ongoing genome sequencing projects was obtained from Genomes Online (http://www.genomesonline.org).

Pathogen | Taxonomy | Pathology | Host range | Protein secretion profile ca. 2000 | Genome sequence (size^a, source)

Completed
Xylella fastidiosa 9a5c | γ-Proteobacteria | Chlorosis, leaf scorch, dystrophy | Broad^b (insect vector) | Unknown | 2.73 Mb AE003849 (2000)
Pseudomonas aeruginosa PAO1 | γ-Proteobacteria | Watersoaking, necrosis | Broad | Type III secreted effectors, proteases | 6.26 Mb AE004091 (2000)
Agrobacterium tumefaciens str. C58 | α-Proteobacteria | Galls | Broad | T-DNA associated proteins | 5.67 Mb AE008688 (2001); 5.67 Mb AE007869 (2001)
Ralstonia solanacearum GMI1000 | β-Proteobacteria | Wilt, chlorosis, necrosis | Broad | Type III secreted effectors | 5.81 Mb AL646052 (2002)
Xanthomonas campestris pv. campestris ATCC33913 | γ-Proteobacteria | Necrosis, chlorosis, black rot, wilt | Crucifers | Type III secreted effectors, cell-wall degrading exoenzymes | 5.08 Mb AE008922 (2002)
Xanthomonas axonopodis pv. citri str. 306 | γ-Proteobacteria | Canker, necrosis, dieback | Citrus | Type III secreted effectors | 5.27 Mb AE008923 (2002)
Xylella fastidiosa Temecula1 | γ-Proteobacteria | Chlorosis, leaf scorch, dystrophy | Broad (insect vector) | Unknown | 2.52 Mb AE009442 (2003)
Pantoea stewartii DC283 | γ-Proteobacteria | Chlorosis, discolouration, wilt | Corn | Type III secreted effectors | 5 Mb, BCM-HSGC, University of Wisconsin
Pseudomonas syringae pv. tomato DC3000 | γ-Proteobacteria | Chlorosis and necrosis | Tomato, Arabidopsis | Type III secreted effectors, type IV pili | 6.54 Mb AE016853 (2003)
Erwinia carotovora subsp. atroseptica SCRI1043 | γ-Proteobacteria | Soft rot, black rot of stems, wilt | Potato | Type III secreted effectors, cell wall degrading exoenzymes | 5.06 Mb BX950851 (2004)
Onion yellows Phytoplasma*** OY-M | Mollicute | Dystrophy – witches’ broom | Broad (onion, chrysanthemum) (insect vector) | Unknown | 0.86 Mb AP006628 (2004)

Unfinished
Clavibacter michiganensis subsp. sepedonicus ATCC33113 | Actinobacteria | Wilt, ring rot of tubers | Potato | Cell wall degrading enzymes | 2.5–2.6 Mb, Sanger
Clavibacter michiganensis subsp. sepedonicus NCPPB382 | Actinobacteria | Wilt, ring rot of tubers | Potato | Cell wall degrading enzymes, HR elicitor | 3.4 Mb, Competence Network, Bielefeld
Erwinia amylovora 273 | γ-Proteobacteria | Necrosis, watersoaking, dieback, canker | Rosaceae | Type III secreted effectors | 5 Mb, Sanger
Erwinia chrysanthemi str. 3937 | γ-Proteobacteria | Soft rot, black rot of stems, wilt | Broad | Type III secreted effectors, cell wall degrading exoenzymes | 3.7 Mb, TIGR
Pseudomonas aeruginosa UCBPP-PA14 | γ-Proteobacteria | Watersoaking, necrosis | Broad | Type III secreted effectors, protease | 6.52 Mb, Harvard
Pseudomonas syringae pv. syringae B728a | γ-Proteobacteria | Necrosis | Broad (Bean) | Type III secreted effectors | 5.99 Mb, DOE Joint Genome Institute
Pseudomonas syringae pv. phaseolicola 1448a | γ-Proteobacteria | Chlorosis, necrosis | Bean | Type III secreted effectors | 6 Mb, Cornell University and TIGR
Spiroplasma citri | Mollicute | Stunting, sterility, dystrophy | Citrus (insect vector) | Surface proteins (solute binding, adhesion), esterase, RNase | Central Washington University
Spiroplasma kunkelii CR2-3x | Mollicute | Stunting, sterility, chlorosis | Corn/maize (insect vector) | Unknown | 1.6 Mb, University of Oklahoma
Spiroplasma kunkelii NM2 | Mollicute | Stunting, sterility, chlorosis | Corn/maize (insect vector) | Unknown | Ohio State University
Streptomyces scabies 87-22 | Actinobacteria | Necrosis, dystrophy | Potato | Esterase | 8.1 Mb, Cornell University
Xanthomonas axonopodis pv. aurantifolii B | γ-Proteobacteria | Necrosis | Citrus | Type III secreted effectors, cell-wall degrading exoenzymes | 5 Mb, Sao Paulo State Consortium, Brazil
Xanthomonas axonopodis pv. aurantifolii C | γ-Proteobacteria | Necrosis | Citrus | Type III secreted effectors, cell-wall degrading exoenzymes | 5 Mb, Sao Paulo State Consortium, Brazil
Xanthomonas campestris pv. campestris 8004 | γ-Proteobacteria | Necrosis, chlorosis, black rot, wilt | Crucifers | Type III secreted effectors, cell-wall degrading exoenzymes | 5 Mb, Guangxi University, China
Xanthomonas campestris pv. campestris B100 | γ-Proteobacteria | Necrosis, chlorosis, black rot, wilt | Crucifers | Type III secreted effectors, cell-wall degrading exoenzymes | 5.1 Mb, Competence Network, Bielefeld
Xanthomonas campestris pv. vesicatoria 85-10 | γ-Proteobacteria | Necrosis, chlorosis | Crucifers, Solanaceae | Type III secreted effectors, cell-wall degrading exoenzymes | 5.4 Mb, Competence Network, Bielefeld
Xanthomonas citri | γ-Proteobacteria | Necrosis | Citrus | Type III secreted effectors, cell-wall degrading exoenzymes | 5.0 Mb, University of Sao Paulo, FAPESP, UNICAMP
Xanthomonas oryzae pv. oryzae | γ-Proteobacteria | Watersoaking, chlorosis, wilt | Rice, Graminaceae | Type III secreted effectors, cell-wall degrading exoenzymes | 4.8 Mb, NIAS Japan
Xanthomonas oryzae pv. oryzae KAXCC10331 | γ-Proteobacteria | Watersoaking, chlorosis, wilt | Rice, Graminaceae | Type III secreted effectors, cell-wall degrading exoenzymes | 4.5 Mb, NIAB, Macrogen
Xylella fastidiosa Ann1 | γ-Proteobacteria | Chlorosis, leaf scorch, dieback, death | Oleander (insect vector) | Unknown | 2.67 Mb, DOE Joint Genome Institute
Xylella fastidiosa Dixon | γ-Proteobacteria | Leaf scorch, stunting, death | Almond (insect vector) | Unknown | 2.62 Mb, DOE Joint Genome Institute
Xylella fastidiosa Pierce's disease strain | γ-Proteobacteria | Chlorosis, leaf scorch, dieback, death | Grapevine (insect vector) | Unknown | 2.7 Mb, Univ. of Campinas
Agrobacterium radiobacter biovar II K84 | α-Proteobacteria | Galls | Broad | T-DNA associated proteins | University of Washington
Agrobacterium vitis biovar III, S4 | α-Proteobacteria | Galls | Grapevine | T-DNA associated proteins, polygalacturonase | University of Washington
Burkholderia glumae | β-Proteobacteria | Chlorosis, necrosis | Rice | Type III secreted effectors, lipase | Seoul National University Crop Functional Genomics Research Centre
Pantoea citrea | γ-Proteobacteria | Pink-red discolouration of fruit | Pineapple | Type III secreted effectors? (elicits HR) | Genencor
Phytoplasma aster yellows witches’-broom | Mollicute | Chlorosis, dystrophy | Broad (insect vector) | Unknown | 800 kb, Ohio State University
Phytoplasma beet leafhopper transmitted virescence | Mollicute | Virescence | Broad (insect vector) | Unknown | Ohio State University
Phytoplasma maize bushy stunt | Mollicute | Stunting, chlorosis, dystrophy | Maize (insect vector) | Unknown | Ohio State University
Phytoplasma Western X disease | Mollicute | Necrosis, dieback, stunting | Prunus (insect vector) | Unknown | 670 kb, University of California, Davis

^a The calculated or predicted size of the genome is included where known.

^b The host range of obligate and fastidious pathogens such as Xylella fastidiosa and Phytoplasma species is frequently quite broad, but is restricted under field conditions by the feeding preferences of insect vectors.

*** The sequenced strain OY-M is a derivative of the field isolate OY-W obtained after maintaining OY-W for 11 years on a single host plant. It does not cause the stunting, hyperplastic phloem tissue and severe phloem necrosis symptoms associated with OY-W [229].

In this paper, we review the information and tools currently available to secretome pioneers, and discuss whether a deeper and broader understanding of protein secretion profiles will challenge our understanding of the mechanistic basis of plant pathogenesis and revolutionise our ability to identify and control plant diseases. We will largely limit our discussion and analyses to those organisms for which complete genome sequences are already available, which consist almost entirely of Gram-negative Proteobacteria. We refer the reader who wishes to learn more about individual plant pathogens to the following genome sequence papers and species or genus-specific reviews: Agrobacterium tumefaciens [13–16]; E. carotovora [17–19]; P. syringae pv. tomato [20,21]; P. aeruginosa [22–24]; Xanthomonas campestris pv. campestris [25,26]; Xanthomonas axonopodis pv. citri [25,27,28]; X. fastidiosa [10,29–31]; Ralstonia solanacearum [32–34].

2 Profiling protein secretion systems

The dual membrane envelope of Gram-negative bacteria presents a double barrier between the cytoplasm and the external environment. Three secretion systems, type I, III and IV, have evolved to transport proteins across both membranes in a single unified process. In addition, the Sec and Tat systems secrete proteins across the inner membrane into the periplasm, from where they can subsequently be transported across the outer membrane by the terminal branch of the type II secretion system, the fimbrial usher porin, autotransporter domains or two-partner secretion systems. Proteins can also cross membranes as a result of the action of MscL and holin proteins, the former being activated by hypoosmotic stress, the latter associated with the action of lytic phage. Although each system has a distinct structural organisation and mode of action, the systems share some structural and operational features. For example, the PulD family of outer membrane secretins includes proteins involved in two terminal branches of the general secretory pathway (GSP) (type II secretion and type IV pilus biogenesis), type III secretion, competence and phage assembly [35]. Many secretion systems, including type II, type III and type IV secretion systems, are dependent on the action of lytic transglycosylases, which probably act to enlarge gaps in the peptidoglycan cell wall to allow efficient assembly and anchoring of supramolecular transport complexes in the cell envelope [36].

The core components of each of the major secretory systems are illustrated in Fig. 1. As these core components are generally quite conserved, genome sequence data can be used to predict the repertoire of secretion system components present in each genome. The repertoire of secretion systems in each of the complete plant pathogen genomes is summarised in Table 2 and discussed below. The KEGG Pathway Database (http://www.genome.jp/kegg/pathway.html) provides a comprehensive and detailed listing of the components of the Sec, Tat, type II, III and IV systems, along with the presence, absence and number of genes encoding each component in each completed genome [37]. The Transport Classification Database (http://www.biology.ucsd.edu/msaier/transport/) also provides a useful interface for obtaining bioinformatic information on protein secretion systems and other membrane transport systems [38].

Figure 1.

Protein secretion systems in Gram-negative bacteria. The core components of each of the main systems are illustrated schematically. For simplicity, arbitrary numbers of subunits are shown for multimeric protein components.

Table 2. Secretion systems encoded in completely sequenced genomes of plant pathogenic bacteria

Organism | Sec (GSP) | Tat secretion | Type I secretion | Type II secretion | Type III secretion | Type IV secretion | Type V secretion
Agrobacterium tumefaciens C58 | Yes | Yes | Yes | No | No | Yes | Yes
Erwinia carotovora subsp. atroseptica SCRI1043 | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Ralstonia solanacearum GMI1000 | Yes | Yes | Yes | Yes | Yes | Partial | Yes
Pseudomonas aeruginosa PAO1 | Yes | Yes | Yes | Yes | Yes | No | Yes
Pseudomonas syringae pv. tomato DC3000 | Yes | Yes | Yes | Yes | Yes | No | Yes
Xanthomonas axonopodis pv. citri 306 | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Xanthomonas campestris pv. campestris ATCC33913 | Yes | Yes | No | Yes | Yes | Yes | Yes
Xylella fastidiosa 9a5c | Yes | Yes | Yes | Yes | No | Yes | Yes
Xylella fastidiosa Temecula1 | Yes | Yes | Yes | Yes | No | Yes | Yes

We queried the KEGG Pathway database [37] for the occurrence of Sec, Tat, Type I, Type II, Type III, Type IV and Type V secretion systems. Our diagnostic for the occurrence of Type V (autotransporter) secretion was the presence of proteins with a significant match to Pfam model PF03797.
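
The calls in Table 2 can be read as a simple presence/absence profile for each genome. The following Python sketch is an illustration only and is not part of the analysis described above: it transcribes the table into a dictionary, lists the systems that are not fully present in each genome, and groups genomes that share an identical profile. The names SYSTEMS, TABLE2, not_fully_present and group_by_profile are hypothetical, and the "Partial" call for R. solanacearum is assumed to count as "not fully present".

```python
# Column order follows Table 2.
SYSTEMS = ["Sec", "Tat", "Type I", "Type II", "Type III", "Type IV", "Type V"]

# Calls transcribed from Table 2; "Partial" is kept as its own state.
TABLE2 = {
    "A. tumefaciens C58":            ["Yes", "Yes", "Yes", "No",  "No",  "Yes",     "Yes"],
    "E. carotovora SCRI1043":        ["Yes", "Yes", "Yes", "Yes", "Yes", "Yes",     "Yes"],
    "R. solanacearum GMI1000":       ["Yes", "Yes", "Yes", "Yes", "Yes", "Partial", "Yes"],
    "P. aeruginosa PAO1":            ["Yes", "Yes", "Yes", "Yes", "Yes", "No",      "Yes"],
    "P. syringae pv. tomato DC3000": ["Yes", "Yes", "Yes", "Yes", "Yes", "No",      "Yes"],
    "X. axonopodis pv. citri 306":   ["Yes", "Yes", "Yes", "Yes", "Yes", "Yes",     "Yes"],
    "X. campestris ATCC33913":       ["Yes", "Yes", "No",  "Yes", "Yes", "Yes",     "Yes"],
    "X. fastidiosa 9a5c":            ["Yes", "Yes", "Yes", "Yes", "No",  "Yes",     "Yes"],
    "X. fastidiosa Temecula1":       ["Yes", "Yes", "Yes", "Yes", "No",  "Yes",     "Yes"],
}


def not_fully_present(profile):
    """Return the systems whose call is anything other than 'Yes'."""
    return [system for system, call in zip(SYSTEMS, profile) if call != "Yes"]


def group_by_profile(table):
    """Group organisms that share exactly the same row of calls."""
    groups = {}
    for organism, profile in table.items():
        groups.setdefault(tuple(profile), []).append(organism)
    return groups


missing = {organism: not_fully_present(profile) for organism, profile in TABLE2.items()}
groups = group_by_profile(TABLE2)

for organism, systems in missing.items():
    print(organism, "->", ", ".join(systems) if systems else "all seven systems present")
```

Grouping by identical rows is a deliberately coarse comparison; it ignores copy number and the identity of secreted substrates, which the following sections discuss system by system.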

2.1 The Sec system

The Sec system is ubiquitous and essential in Gram-negative bacteria. It is involved in the complete translocation of proteins across the cytoplasmic membrane to the periplasm and also facilitates protein integration into the lipid membrane. Proteins exported by the Sec system are synthesized in the cytoplasm with a cleavable N-terminal signal sequence and translocated in an unfolded state. The Sec machinery uses the energy from ATP hydrolysis and ΔP, the electrochemical proton gradient across the membrane, to effect transmembrane protein translocation [39]. The ATPase SecA guides the newly synthesized preproteins to the heterotrimeric complex SecYEG, which forms the “channel” of the translocase (Fig. 1). The protein precursor normally interacts with a chaperone, usually SecB in Gram-negative bacteria, prior to transport to maintain export competence [40]. SecB has been reported to function as a general chaperone, which interacts not only with Sec-secreted proteins, but also with cytoplasmic proteins and with proteins secreted by other secretion systems, including the type I HasA hemophore secretion system in Serratia [41–43], which is similar to systems present in E. carotovora (ECA1535/ECA1536) and P. aeruginosa (PA3406/PA3407).

SecD and SecF are required for efficient export. They form a membrane-bound complex with YajC, which is not essential for secretion and is not found in all prokaryotes [44]. However, YajC homologues are present in all the genomes listed in Table 2. SecD and SecF are present as a single fused protein in A. tumefaciens (Agr_c_2877p, Atu1562), Sinorhizobium meliloti (SMC02057, SMC02265) and Mesorhizobium loti (MLL1070). This is not a universal feature of α-Proteobacteria, as SecD and SecF are encoded by two genes in Bradyrhizobium japonicum (Bll4735, Bll4734) and Caulobacter crescentus (CC1991, CC1990). Insertion and translocation of membrane proteins are performed by the SRP system (signal recognition particle) in conjunction with YidC [45–47]. In E. coli, the SRP is composed of the Ffh protein and 4.5S RNA. SRP targets proteins to the SecYEG translocase via the FtsY receptor. YidC facilitates the release of the translocated proteins from the SecYEG channel. In some cases, YidC acts independently of the Sec system to assemble membrane proteins directly into the membrane [48].

The N-terminal leader peptide of many Sec-secreted proteins is cleaved in the periplasm by type I and type II signal peptidases. Each of the pathogens in Table 2 has one copy of SPaseII (lipoprotein signal peptidase, lspA) and at least one copy of SPaseI. According to the MEROPS peptidase database [49] both P. aeruginosa and R. solanacearum have two SPaseI-like genes: PA1303, PA0768, RSc1716 and RSc1061.

2.2 The Tat system

The recently discovered Tat system exports pre-folded proteins across the cytoplasmic membrane using the transmembrane proton gradient as the main driving force for translocation, and has been extensively characterised in E. coli [50]. Many Tat-secreted proteins contain redox cofactors that are inserted prior to export and have functional roles in respiratory and photosynthetic electron transport pathways. Some, such as the haemolytic (PlcH) and non-haemolytic (PlcN) phospholipases of P. aeruginosa, are transported to the periplasm and subsequently exported by outer membrane transporters such as the type II system [51].

Investigations into the role of Tat-secreted proteins in pathogenesis are hindered by the fact that Tat mutations frequently have pleiotropic effects on bacterial cells. For example, a TatC mutant of A. tumefaciens not only failed to export Tat-dependent proteins, but also exhibited defects in growth rate and cell division and released abundant levels of several proteins into the culture supernatant when grown in rich medium or in virulence gene inducing minimal medium. A majority of TatC mutant cells also displayed defects in motility and chemotaxis, and were strongly attenuated in virulence, although they retained the ability to carry out type IV secretion [52].

The Tat system is encoded by four genes: tatA, tatB, tatC and tatE. These genes are organised in two transcriptional units, tatABCD and tatE, but only tatA, tatB and tatC are essential for operation of the system; tatE is functionally identical to tatA and appears to be a poorly expressed duplication of tatA [53]. E. carotovora is the only plant pathogen in Table 2 that possesses a copy of tatE. tatD encodes a cytoplasmic protein with DNase activity. It is not involved in protein export and is not always found in organisms that have the other Tat components [54]. Each of the genomes in Table 2 contains at least two tatD homologues, and those of E. carotovora, P. aeruginosa and P. syringae contain three, which raises the possibility that functional redundancy complicates the interpretation of tatD phenotypes. However, Wexler et al. [54] have demonstrated that an E. coli strain with chromosomal deletions in all three tatD genes retains the ability to export Tat substrates. At present it seems likely that co-transcription of tatD with tatABC reflects similar regulatory requirements rather than a direct functional relationship.

tatA, tatB, tatC and tatE all code for integral membrane proteins. According to predictions based on TMHMM [55], TatA, TatB and TatE contain a single membrane-spanning domain, whereas TatC is composed of six transmembrane helices. TatA forms a large (~450 kDa) complex, while TatB and TatC proteins are assembled in another large (~700 kDa) and equimolar complex in the E. coli membrane [56] (Fig. 1). Genomic analyses indicate that only one copy of the tat operon is present in each of the plant pathogen genomes in Table 2. However, the non-pathogenic soil bacteria P. putida KT2440 and Acinetobacter sp. ADP1 both contain two sets of tat genes.
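
The idea behind counting membrane-spanning segments can be illustrated with a much simpler method than TMHMM, which is a hidden Markov model. The Python sketch below swaps in a Kyte-Doolittle sliding-window hydropathy scan: it averages residue hydropathy over a 19-residue window and counts separate runs in which the average reaches 1.6. The window size, the threshold, the function names and the toy sequences are all assumptions chosen for illustration; the sequences are invented and are not Tat proteins, and the result should not be expected to reproduce TMHMM predictions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy for every full window along the sequence."""
    scores = [KD[residue] for residue in seq.upper()]  # KeyError on non-standard residues
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def count_candidate_segments(seq, window=19, threshold=1.6):
    """Count separate runs of windows whose mean hydropathy reaches the threshold."""
    segments = 0
    inside = False
    for value in hydropathy_profile(seq, window):
        if value >= threshold:
            if not inside:
                segments += 1  # a new hydrophobic run starts here
            inside = True
        else:
            inside = False
    return segments


# Invented toy sequences: hydrophobic blocks separated by charged linkers.
one_block = "MKKDE" + "L" * 22 + "KRDE" * 6
two_blocks = "KKDE" + "L" * 22 + "KRDE" * 6 + "I" * 22 + "KKDE"

print(count_candidate_segments(one_block))
print(count_candidate_segments(two_blocks))
```

A scan of this kind tends to merge closely spaced helices and to miss short or amphipathic ones, which is one reason why model-based predictors are preferred for proteins such as TatC.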

2.3 Type I

The ATP-binding cassette (ABC) transporters comprise a huge superfamily of membrane transport proteins that are widespread in all organisms. Members of this superfamily are involved in several functions including transcriptional regulation and DNA repair as well as transport of proteins and solutes across membranes [57]. The diverse subfamilies and functions of ABC transporters are described in the Transport Classification Database (http://www.biology.ucsd.edu/msaier/transport/) [38] and TransportDB (http://www.membranetransport.org) [58]. The proteins known to be exported by ABC systems in plant pathogenic bacteria are predominantly proteases, lipases and hemolysins, although other proteins have also been linked to this type of secretion system, including polysaccharide degrading enzymes secreted by symbiotic S. meliloti [59]. One noteworthy plant pathogen ABC system is the Rax secretion system of Xanthomonas oryzae pv. oryzae. This system has been shown to be required for elicitation of the Xa21-mediated defence response in rice, a type of response typically associated with type III-secreted effector proteins. The Rax-dependent response has been shown to require a sulfotransferase-like protein encoded by raxST, but the nature of the secreted elicitor and the role of the Rax system in HR elicitation remain unclear [60].

ABC systems that secrete proteins are known as type I secretion systems, and are able to secrete proteins across both Gram-negative bacterial membranes in a single step, independently of the Sec apparatus. Each system consists of three components, an ABC ATP-binding protein, a membrane fusion protein (MFP), which forms a bridge between the outer and inner membrane, and an outer membrane pore channel protein, such as the TolC protein of the α-hemolysin transporter of E. coli. All ABC ATP-binding proteins share a common structure comprising two membrane-inserted hydrophobic domains and two hydrophilic domains associated with the interior side of the membrane. The hydrophilic domains are well-conserved and contain characteristic Walker motifs A and B [57,61,62]. Secreted substrates of type I secretion systems lack the N-terminal signal peptide motif characteristic of Sec substrates and instead have a C-terminal secretion signal, which is specific to the particular sub-type of transporter system [63].
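
As an illustration of how the conserved ATP-binding domain might be flagged in a predicted protein, the Python sketch below scans a sequence for the Walker A motif. It assumes the commonly quoted consensus G-x(4)-G-K-[ST], which is not spelled out in the text above, and the function name and test sequence are hypothetical. Walker B is too degenerate for a simple pattern search and is left out. A match only suggests a nucleotide-binding fold; it does not show that a protein is an ABC transporter, still less that it exports proteins.

```python
import re

# Assumed Walker A (P-loop) consensus: G, any four residues, G, K, then S or T.
# The lookahead wrapper lets overlapping candidate sites be reported too.
WALKER_A = re.compile(r"(?=(G.{4}GK[ST]))")


def find_walker_a(seq):
    """Return (1-based start position, matched residues) for each Walker A-like site."""
    return [(m.start() + 1, m.group(1)) for m in WALKER_A.finditer(seq.upper())]


# Invented peptide used only to exercise the pattern.
example = "MKLLSGPSGSGKSTLLRA"
print(find_walker_a(example))
```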

All of the plant pathogens in Table 2 encode numerous homologues of ABC transport components. The challenge for secretome profilers is to determine which of these are involved in protein secretion. Analyses based on homology to previously characterized type I systems suggest that all of the genomes in Table 2, except that of X. campestris, encode at least one type I system. The absence of type I secretion in X. campestris is somewhat remarkable considering that the genome of the related pathogen X. axonopodis and the two X. fastidiosa genomes all contain putative type I systems. However, the type I secretion system of X. axonopodis is located in a region around the putative termini of replication that is particularly rich in strain-specific genes [25]. This region contains a 133 kb insertion that harbours a type I secretion system MFP (XAC2201), ATPase (XAC2202) and genes encoding probable type I substrates XAC2197 and XAC2198. XAC1918 is also a likely candidate for type I export.

In addition to XAC2201 and XAC2202, X. axonopodis and X. campestris each encode twelve further homologues of the type I MFP (HlyD), and more than 30 ABC transporters. However, all of these proteins are annotated as being involved in efflux of drugs and cations or resemble transporters involved in small solute transport. So, although we cannot absolutely rule out the possibility that one or more of the ABC/MFP proteins in X. campestris are involved in protein secretion, we find no concrete evidence to suggest that a type I protein secretion system is present in this strain. Similarly, although each of the other plant pathogen genomes contains from 4 (X. fastidiosa) to 26 (P. aeruginosa) HlyD-like genes, in each case only a fraction of these genes are likely to be associated with type I protein secretion systems. A complete list of HlyD and ABC ATP-binding protein homologues in the nine plant pathogen genomes is available in Supplementary Tables 1 and 2.

2.4 Type II

In Gram-negative bacteria, type II secretion (also referred to as the main terminal branch of the GSP) is a two-step process in which proteins to be secreted are first translocated across the cytoplasmic membrane by a translocase (the Sec or Tat system) and then transported across the outer membrane by the type II secretion apparatus. The features of type II secretion systems have recently been reviewed in depth [64,65], and only a few key attributes are discussed here.

Type II secretion is mediated by a conserved, multi-component secretion apparatus, which spans both the inner and outer membrane (Fig. 1). Type II-secreted proteins may be released to the supernatant or remain attached to the cell surface. One subset of type II-like proteins makes up the type IV pilus biogenesis system [66,67]. Between 12 and 15 genes (named alphabetically from A to O and S) have been identified as essential for type II secretion. They are generally found in gene clusters whose organisation is relatively well conserved. Structural studies suggest that the components of the type II machinery assemble into a multi-protein complex that spans both membranes. The type II secretin, protein D, is inserted in the outer membrane, where it forms the secretion pore, and is involved in the recognition of secreted proteins. The other components of the system are associated with the cytoplasmic membrane (Fig. 1) [64]. The secretion signal for type II secretion remains poorly defined, and in many cases appears to be an attribute of folded or partially folded proteins. All of the completely sequenced plant pathogenic bacterial genomes encode a type II system, except that of A. tumefaciens, which does, however, encode homologues of pilus assembly proteins.

Sec transport across the inner membrane requires both ATPase activity and ΔP. In contrast, transport of periplasmic proteins across the outer membrane by type II secretins such as PulD in Klebsiella has been shown to depend on proton motive force rather than ATPase activity [68,69]. Furthermore, transport of a GSP-dependent endonuclease across the outer membrane of Serratia marcescens is strongly affected by the pH gradient across the outer membrane [70].

According to the KEGG Pathway Database [37], the sequenced plant pathogen genomes lack homologues of GspA (ExeA), OutB and OutS, even though these proteins are known to be required for type II secretion in E. chrysanthemi [71,72]. E. chrysanthemi is a soft rot pathogen for which the Out gene cluster has been sequenced, although the complete genome has not yet been published [73] (Table 1). Further examination of protein databases using PSI-BLAST reveals that several other plant pathogen genomes, including those of P. syringae and P. aeruginosa, do contain at least one chromosomal or plasmid-borne gene with some homology to GspA (ExeA), which is often clustered with pilus biogenesis genes. The role and significance of OutB and OutS in secretion appear to vary between strains. Condemine and Shevchik [74,75] have demonstrated that over-expression of OutD can compensate for lack of OutB, and suggest that OutS and OutB may have roles in chaperoning, stabilising and protecting OutD.

Several plant pathogen genomes contain multiple type II secretion systems or type II components (Table 3). P. aeruginosa, X. axonopodis, and X. campestris are the only Gram-negative bacteria to contain two full sets of type II secretion genes. Some plant pathogens contain multiple homologues of a subset of type II secretion components, most notably R. solanacearum, which contains at least four significant KEGG database hits to gspD, gspE and gspF and 10 hits to gspG, despite having only one hit to each of the remaining components. Some of these duplicated components cluster with the main type II cluster(s), others are isolated genes, scattered around the genome. It is conceivable that some of these components retain functional roles within the cell, perhaps in varying the substrate specificity or functional expression of type II systems. The presence of multiple gspDEFG genes is not unique to plant pathogens. It is also a feature of soil bacteria such as P. putida KT2440 (shown at the bottom of Table 3) and the nitrifying bacterium Nitrosomonas europaea.

Table 3. Distribution of the core components of the type II secretion system in plant pathogenic bacteria
Organism gspC gspD gspE gspF gspG gspH gspI gspJ gspK gspL gspM gspN

Information in this table is based on orthology tables from the KEGG Pathway database [37]. This table does not include putative type IV pilus and fimbrial biogenesis genes. ECA3105 and ECA3101 were not detected using KEGG criteria and are therefore shown in italics. The final row of the table shows equivalent predictions for the non-pathogenic soil bacterium P. putida KT2440.

Agrobacterium tumefaciens
             
Ralstonia solanacearum RS00558 RS00431 RS00568 RS00345 RS00346 RS00560 RS00561 RS00562 RS00561 RS00564 RS00565 RS00566
  RS00567 RS00952 RS00569 RS00348
  RS01250 RS01244 RS01243 RS00349
  RS02977 RS02972 RS02971 RS00549
     RS01241
     RS01251
     RS01252
     RS02970
     RS02978
     RS02979

Erwinia carotovora subsp. atroseptica ECA3110 ECA3109 ECA3108 ECA3107 ECA3106 ECA3105 ECA3104 ECA3103 ECA3102 ECA3101 ECA3100 ECA3099

Pseudomonas aeruginosa PA0679 PA0685 PA0686 PA0687 PA0681 PA0678 PA0680 PA0677 PA0682 PA0683 PA0684
PA1867 PA1382 PA2677 PA2676 PA2675 PA3100 PA2673 PA3098 PA3097 PA3096 PA3095
PA3104 PA1868 PA3103 PA3102 PA3101  PA3099
  PA3105 PA5210

Pseudomonas syringae pv. tomato  PSPTO3307 PSPTO0319 PSPTO3316 PSPTO3315 PSPTO3314 PSPTO3313 PSPTO3312 PSPTO3311 PSPTO3310 PSPTO3309 PSPTO3308
  PSPTO3317

Xanthomonas axonopodis pv. citri XAC0694 XAC0695 XAC0696 XAC0697 XAC0698 XAC0699 XAC0700 XAC0701 XAC0702 XAC0703 XAC0704 XAC0705
 XAC3534 XAC3544 XAC3543 XAC3542 XAC3541 XAC3540 XAC3539 XAC3538 XAC3537 XAC3536 XAC3535
  XAC4212

Xanthomonas campestris pv. campestris XCC3426 XCC0670 XCC0660 XCC0661 XCC0662 XCC0663 XCC0664 XCC0665 XCC0666 XCC0667 XCC0668 XCC0669
 XCC3425 XCC3424 XCC3423 XCC3422 XCC3421 XCC3420 XCC3419 XCC3418 XCC3417 XCC3416 XCC3415
  XCC4088

Xylella fastidiosa str. 9a5c  XF1527 XF1517 XF1518 XF1519 XF1520 XF1521 XF1522 XF1523 XF1524 XF1525 XF1526

Xylella fastidiosa str. Temecula1  PD0742 PD0732 PD0733 PD0734 PD0735 PD0736 PD0737 PD0738 PD0739 PD0740 PD0741

Pseudomonas putida  PP1046 PP1047 PP1048 PP1049 PP1050 PP1051 PP1052 PP1042 PP1053  PP1055
  PP3478 PP3483 PP3424 PP3423
   PP5190  PP3476
     PP3477

2.5 Type III

The Type III secretion system, or contact-dependent secretion system as it is sometimes known, is a specialised export system found in Gram-negative pathogens of plants and animals that can be used to deliver proteins into the cytoplasm of host cells. Type III secretion systems play a central role in the host interactions of many Gram-negative plant pathogens including P. syringae, P. aeruginosa, R. solanacearum, Xanthomonas spp. and E. carotovora. The flagellar biosynthetic mechanism also represents a form of type III secretion. However, in this paper we will use the term “type III secretion system” to refer exclusively to non-flagellar type III systems unless stated otherwise. Although several mammalian pathogen genomes have been shown to encode multiple type III systems, each of the type III-containing plant pathogen genomes in Table 2 encodes just one set of type III (Hrp) genes, in pathogenicity islands or on large virulence-associated plasmids. There is no evidence of type III secretion systems in the genomes of A. tumefaciens and X. fastidiosa, although a small number of symbiotic Rhizobiaceae have been shown to possess type III systems [4]. The mechanism and structure of type III secretion systems have been extensively reviewed [76–80].

The conserved core of the type III apparatus resembles the flagellar basal body, and is known as the “needle complex”. In fact some type III-secreted proteins can be delivered through the flagellar apparatus [81,82]. A recent study of sequence divergence in flagellar and type III systems does not support the idea that type III systems evolved from flagellar systems, but rather that both systems are ancient and derived from a common ancestral system [83]. In mammalian pathogens the type III needle complex consists of a short (<100 nm long, 8 nm diameter) surface appendage, inserted in a 30 nm diameter complex that traverses both bacterial membranes and the peptidoglycan cell wall [84–87]. Plant pathogen type III systems have been shown to elaborate a filamentous surface appendage known as the Hrp pilus, which can be several micrometers long [88–91].

At the heart of the type III needle complex is a triplet of highly conserved membrane proteins FliP/Q/R (cf. HrcR/S/T). Similar triplets can be found in the Sec system (SecF/E/Y) and in the F0F1-ATPase (ATP5/8/6), which suggests that this triplet is an ancient structure or protochannel for membrane channeling [92]. The membrane proteins FlhA (HrcV), FlhB (HrcU) and FliY (FliN + FliM) (HrcQ) are also conserved in most plant pathogen type III systems, although in P. syringae HrcQ is represented by two proteins, HrcQA and HrcQB, rather than a single protein as in most other type III systems. Blocker et al. [93] have proposed that the basally located ATPase, which has homology to FliI, docks as a homohexamer at the inner membrane base of the needle complex, forming a central channel through which proteins are exported. Export across the outer membrane is via a porin, HrcC, which belongs to the PulD family of outer membrane secretins [93]. The HrcC homolog YscC assembles into 20 nm diameter, multimeric ring-like structures with a central pore [94].

In contrast to the type II system, type III secretion does not normally involve a periplasmic intermediate, although P. syringae HrcC mutants accumulate the secreted protein HrpZ in the periplasm [95], and Salmonella flagellar proteins can be exported to the periplasm of appropriate mutants [96]. Type III transport across the inner and outer membranes appears to require both proton motive force and ATPase activity [97]. The HrcC pore is probably closed when proteins are not in transit to avoid compromising the outer membrane, although the mechanism for opening and closing the pore is not known. HrcJ is homologous to FliF, a component of the MS-ring complex, and is thought to bridge the inner and outer membrane components in P. syringae [98], where it may stabilize the channel between the two membranes or transduce energy or conformational signals between them.

Phylogenetic analyses indicate that type III secretion systems can be organized into five groups: (i) Ysc group, which includes the plasmid-borne Yersinia system, P. aeruginosa and Rhizobium; (ii) Hrp group I, which includes P. syringae and Erwinia spp.; (iii) Hrp group II, which includes Xanthomonas and Ralstonia; (iv) Inv/Mxi/Spa group, which includes Salmonella pathogenicity island I, EHEC system 2, and Yersinia enterocolitica Ysa; (v) Esa/Ssa, which includes EPEC system 1, Salmonella SPI-2 and the chromosomal Yersinia pestis system [76,99]. As is immediately apparent from this brief overview, type III systems are not distributed in line with rRNA-based phylogenetic relationships, and most type III gene clusters have at least some of the characteristic features of pathogenicity islands (PAIs) [100]. Type III systems within each group are not only similar at a sequence level, but also share distinctive features in terms of gene organization, regulation and secreted proteins. Pallen et al. [79] provide a detailed discussion of the evolution of type III systems in this issue.

Only two extracellular components of plant pathogen type III secretion systems have been identified: the Hrp pilus and harpins. The P. syringae, E. amylovora and R. solanacearum Hrp pili have been shown to have an essential role in effector secretion and translocation [88–91,101–103]. Harpins are also likely to have a role in effector translocation, but their actual function is less clear. Hrp pilins and harpins are generally secreted at higher levels in culture by P. syringae than effectors, which is consistent with their proposed function as secreted components of the translocation apparatus, rather than translocated proteins [89,104,105]. Harpins are not highly conserved at a primary sequence level, but are generically glycine-rich, cysteine-less proteins. Harpin-like domains are also found in homologues of the P. syringae type III-secreted protein HrpW, which also contains a pectate-binding domain [106]. Homologues of HrpW are present in the two Xanthomonas spp., R. solanacearum and E. carotovora [107].
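Because harpins are described here by composition (glycine-rich, cysteine-less) rather than by sequence conservation, a first-pass screen can be sketched in a few lines of Python. This is only an illustration: the function name and the 10% glycine threshold are assumptions chosen for the example, not published criteria, and a composition filter on its own would be expected to return many false positives.

```python
def harpin_like(seq: str, min_gly_fraction: float = 0.10) -> bool:
    """Crude composition screen: no cysteine and a high glycine fraction.

    min_gly_fraction is an arbitrary illustrative cut-off (assumption).
    """
    seq = seq.upper()
    if not seq:
        return False
    has_cysteine = "C" in seq
    gly_fraction = seq.count("G") / len(seq)
    return (not has_cysteine) and gly_fraction >= min_gly_fraction


# Synthetic test sequences (not real proteins)
gly_rich = "MSGGLGSGNGGSLTGAGGNQSGGLD"       # glycine-rich, no cysteine
with_cys = "MSGGLGSGNGGSLTGAGGNQSGGLDC"      # same, but contains cysteine
gly_poor = "MSLNTEQLKDATSVLNPQETRADLHKEW"    # few glycines
```

In practice such a screen would be combined with other evidence, for example genomic context or regulation, before a candidate is treated as a harpin.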

2.6 Type IV

Type IV secretion systems are used to export both proteins and nucleoprotein complexes, generally acting to deliver these factors from one cell to another. The best-studied example of type IV secretion is the VirB system of A. tumefaciens, which transports a nucleoprotein complex from the bacterium into a eukaryotic cell. Other well-characterised systems are the cag toxin secretion system of Helicobacter pylori, and the conjugative plasmid transfer system (tra system) used to transfer plasmids between bacterial cells (reviewed in [108–111]). In A. tumefaciens at least 10 proteins (VirB2-VirB11) associate to form an apparatus that spans from the bacterial cytoplasm, through both membranes to the outside of the cell where it forms a large pilus structure that is readily visible by electron microscopy. This pilus is not to be confused with the “type IV pilus” that is secreted by an apparatus that is homologous to the type II secretion system.

X. fastidiosa 9a5c, the two Xanthomonas spp., R. solanacearum and E. carotovora genomes all encode homologues of most of the type IV components. However, none of these genomes encodes a homologue of VirB7, E. carotovora lacks VirB3 and VirD4, and R. solanacearum lacks homologues of VirB1, VirB6 and VirB8, so it remains to be seen whether these systems are functional. Confirmation that type IV systems can have a role in plant pathogenesis other than T-DNA transfer comes from a recent paper by Engledow et al. [112], which describes the presence of two type IV systems in the opportunistic pathogen Burkholderia cenocepacia, a member of the β-Proteobacteria. At least one of these systems, the Ptw system, has a role in plant pathogenesis. However, this strain contains homologues of all 11 VirB genes and VirD4.

Curiously, the genomes of both X. campestris and X. axonopodis contain multiple homologues of VirB6 (XCC3129, XCC3294, XCC3297, XCC3301, XAC2607, XAC2608 and XAC2612), which are clustered with the type IV genes of X. axonopodis, and in a region some distance from the type IV cluster in X. campestris. The precise function of VirB6 remains unclear. VirB6 is highly hydrophobic, with multiple membrane spanning regions. Jakubowski et al. [113] speculate that VirB6 functions either as a component of the channel that translocates substrates across the inner membrane, or as part of a mechanism that channels periplasmic proteins to the outer membrane. Cascales and Christie [114] have recently provided evidence that the translocated T-DNA makes contact successively with VirD4, VirB11, VirB6, VirB8, VirB2 and VirB10. If VirB6 functions in substrate recognition it is conceivable that the different variants of VirB6 in Xanthomonas spp. could have divergent substrate specificity.

None of P. aeruginosa PAO1, P. syringae pv. tomato DC3000 or X. fastidiosa Temecula1 appears to possess a type IV secretion system. However, a recent paper by Stavrinides and Guttman [115] describes the presence of type IV genes on the plasmid pPMA4326A of P. syringae pv. maculicola ES2386. It is also interesting to note that a functional type IV system has recently been described for the plant symbiont Mesorhizobium loti strain R7A [116]. This type IV system has been shown to deliver proteins into eukaryote cells and contributes to efficient nodulation of plant roots in an analogous manner to the type III systems described in other symbiotic Rhizobiaceae [4].

2.7 Type V/two partner systems

Type V systems have recently been reviewed by Desvaux et al. and Jacob-Dubuisson et al. [117,118]. The first example of Type V secretion to be described was the IgA1 peptidase of Neisseria gonorrhoeae. This protein contains an N-terminal signal peptide that targets it for secretion into the periplasm via the Sec system. The C-terminus of the protein then forms a pore in the outer membrane through which the mature peptidase is released. Accordingly, this type of secretion has been dubbed the autotransporter system. Many autotransporters are readily recognised from their sequence, having a classical Sec-dependent signal peptide at the N terminus and a conserved domain at the C terminus that forms the β-barrel pore when embedded in the outer membrane. Bioinformatic prediction of autotransporter proteins is discussed in more detail in Section 5.

Two additional types of secretion system have historically come under the umbrella of type V secretion: the two-partner system (TPS) and the Oca family of autotransporters, which includes YadA. In the two-partner system the passenger domain and pore-forming transporter domain are translated as two separate proteins, referred to as TpsA and TpsB [117,118]. The secreted protein is thought to transit the periplasm as an unfolded protein, folding at the cell surface as it is translocated through the transporter. It is possible that the free energy of folding drives outer membrane translocation. Desvaux et al. [117] propose that the Oca family represents a third family of surface-associated autotransporters, with a distinctive topology. Oca family proteins are present in X. axonopodis pv. citri (XAC3548, XAC3546), X. campestris pv. campestris (XCC0658), X. fastidiosa (XF1529, XF1981, XF1516, PD0731, PD0824, PD0744) and R. solanacearum (RSp1620).

P. syringae contains the highest number of autotransporter-like proteins, with nine candidate proteins. The two Xylella genomes each contain six; X. campestris contains five; X. axonopodis and A. tumefaciens, four; P. aeruginosa, three; and E. carotovora and R. solanacearum, only two. One of the autotransporters in X. campestris, XCC2024, contains multiple hemagglutinin repeats, suggesting a filamentous structure involved in adhesion. Two more are predicted to have peptidase activity (XCC2025 and XCC1298) and another is predicted to be a lipase (XCC3148). These serine peptidases and lipase are conserved in X. fastidiosa (PD0313, PD0950, PD0218 and PD1879), which also encodes additional autotransporters of unknown function. Some of the nine autotransporters in P. syringae contain recognisable domains such as serine peptidase (PSPTO1650 and PSPTO1649), pertactin (PSPTO2225 and PSPTO211), acid phosphatase (PSPTO5200) and lipase (PSPTO0569). In P. aeruginosa, as well as serine peptidases and lipases, one autotransporter is predicted to be a metallo-peptidase (PA0328). A full list of predicted autotransporter proteins is available in Supplementary Table 3. There is relatively little experimental evidence for the biological roles of type V-secreted proteins in plant pathogenic bacteria, but the TPS-secreted adhesin HecA has been shown to affect the virulence of E. chrysanthemi in Nicotiana clevelandii [9].

2.8 The fimbrial usher system

The fimbrial usher system is used to assemble periplasmic proteins into fimbriae and pili [119,120]. These extracellular fibres mediate microbial attachment to host tissues and evasion of host defenses, as well as promoting microcolony and biofilm formation, a contributing factor both to the establishment of infection and to bacterial resistance to antibiotic treatment and ultraviolet light. The usher protein is mainly composed of membrane-spanning β-sheets, and dimerises to form a twin-pore complex in bacterial membranes [119]. Although they do not display a high degree of sequence identity, usher proteins do share a number of characteristics. One of these is the presence of two pairs of cysteines; one located in the N-terminal part and the second at the C-terminal extremity [121]. A. tumefaciens, X. axonopodis, X. campestris, X. fastidiosa Temecula1 and P. syringae each contain one usher protein, R. solanacearum and E. carotovora contain two and P. aeruginosa contains four. The highest numbers of usher proteins are found in the genomes of enterobacterial animal pathogens such as E. coli.

2.9 MscL, Holins and vesicle-mediated export

Three additional mechanisms that result in the transport of proteins across membranes are exemplified by MscL, bacteriophage Holin proteins and vesicle-mediated export of ClyA (HlyE). The mechanosensitive channel MscL is involved in protecting cells from hypoosmotic shock [122], and is conserved in all the genomes listed in Table 2. Exposure to hypoosmotic shock can result in the export of cytoplasmic proteins to the periplasm, a process which is thought to involve MscL [123]. Holins are encoded by genes corresponding to at least 35 families with no detectable orthologous relationships. They accumulate and oligomerize in the membranes of infected cells during late-gene expression, eventually forming a lesion that permeabilizes the membrane and allows endolysin to attack the peptidoglycan [124]. Holin secretion can also be triggered by membrane polarization [125]. Most plant pathogen genomes contain bacteriophage genes and Holin-like proteins. For example, X. fastidiosa 9a5c contains five prophages that represent ~7% of the genome, X. campestris two, and X. axonopodis one [126]. Most Holins are associated with bacteriophage lysis, which is terminal for bacterial cells. However, recent studies have suggested that the action of bacteriophage-like systems can result in the release of proteins that contribute to pathogenesis [127,128], although this has yet to be described in plant pathogens.

A final type of protein secretion is exemplified by the enterobacterial cytotoxin ClyA. This protein has an intrinsic ability to be translocated across the cytoplasmic membrane, accumulating in the periplasmic space without cleavage of any N-terminal signal. Wai et al. [129] have been able to demonstrate that ClyA is released in outer membrane vesicles (OMVs) that are discharged from the surface of bacterial cells, along with various OM proteins, LPS and phospholipids. As the vesicles are released they trap and transport some of the underlying periplasm. ClyA-containing vesicles are subsequently able to fuse with host cells, delivering their contents into the host cytoplasm. ClyA remains in an inactive monomeric form in the periplasm, and assembles into an active oligomeric, pore-forming cytotoxin under the altered redox conditions present in the OMV. This secretion system has only been described in mammalian pathogens to date, and it is not clear to what extent the plant cell wall will serve to prevent vesicles making contact with the plasmalemma of plant cells. However, P. aeruginosa has been shown to naturally release vesicles containing virulence factors such as hemolysin, phospholipase C, protease and alkaline phosphatase [130]. We must therefore consider the possibility that this mechanism could contribute to the export of any protein targeted to the periplasm by Sec, Tat or other mechanisms.

3 Predicting secretion signals

Secretion signals target secreted proteins for delivery through the secretion apparatus, either through direct interaction with cytoplasmic or periplasmic components of the secretion machinery, or through an intermediate such as chaperones. While most signals are presented in the form of the primary, secondary or tertiary structure of the secreted protein, there is evidence that in the type III secretion system, mRNA signals may also be involved. In the type IV system of A. tumefaciens, recognition of a protein signal is used to direct the transport of DNA [111]. Secretion signals and signal recognition mechanisms have a significant impact on the evolution and function of secretion systems. The simple N-terminal signals of the Sec, Tat and Hrp systems present opportunities for rapid evolution of new secreted proteins through recombination and mutagenesis, and permit promiscuous secretion of heterologous proteins following horizontal gene transfer. In contrast, the C-terminal signals of type I systems, and the complex secretion motifs of type II-secreted proteins restrict secretion to system-specific and strain or species-specific proteins. Knowledge of secretion signal motifs can be used to profile the secretome of an entire genome, identifying new targets for future investigations. Below we review the bioinformatic tools currently available for secretion signal-based profiling of the bacterial secretome and discuss the outcome and limitations of whole genome profiles of Sec, Tat and Hrp signals in plant pathogenic bacteria.

3.1 Sec/SRP signals

Substrates of the Sec secretion system share a characteristic signal peptide sequence, usually at the N terminus. Although these signal peptides share little sequence similarity, they do all share certain common characteristics including a short, positively charged N-terminal sequence, a sequence of 7–15 hydrophobic amino acids, and a short polar sequence that contains the site of cleavage by the signal peptidase and the cleavage consensus sequence ‘AXA’ (Fig. 2). Despite lack of sequence similarity and differences in amino acid composition and even length, signal peptides can be detected with considerable accuracy [131]. These characteristic features have been exploited by several bioinformatics applications that are used to predict proteins with signal peptides. Two of the first applications were SigCleave and SPScan, which were implementations of a simple scoring matrix method [132]. Nearly a decade later, SignalP1.1 was released [133], soon followed by SignalP2.0 [134], which used a neural network and a hidden Markov model (HMM) trained on a new improved sequence dataset, and showed significant improvement in accuracy and sensitivity compared with SigCleave and SPScan [135].

Figure 2.

Signal peptide motifs identified for the Sec, Tat and Hrp (type III) secretion systems.
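The three features of a Sec signal peptide listed above (a positively charged N-region, a hydrophobic stretch of 7–15 residues, and an AXA cleavage motif) can be turned into a toy rule-based check. The Python sketch below is an illustration only and is not the algorithm used by SigCleave, SPScan or SignalP; the function name, the hydrophobic alphabet, and the window sizes are all assumptions made for the example.

```python
import re

# Assumption: a deliberately simple hydrophobic alphabet for illustration
HYDROPHOBIC = "AVLIMFWC"
POSITIVE = set("KR")


def looks_like_sec_signal(seq: str, n_region: int = 8, h_min: int = 7,
                          h_max: int = 15, window: int = 40) -> bool:
    """Return True if the N terminus shows all three textbook features."""
    head = seq[:window].upper()
    # 1. At least one positively charged residue just after the start Met
    if not any(aa in POSITIVE for aa in head[1:n_region]):
        return False
    # 2. First uninterrupted hydrophobic run of at least h_min residues;
    #    runs longer than h_max look more like a transmembrane helix
    run = re.search("[%s]{%d,}" % (HYDROPHOBIC, h_min), head)
    if run is None or len(run.group()) > h_max:
        return False
    # 3. An AXA motif within a short distance after the hydrophobic run
    tail = head[run.end():run.end() + 12]
    return re.search("A.A", tail) is not None


# Illustrative inputs (treated here simply as test strings)
signal_like = "MKKTAIAIAVALAGFATVAQAAPKDNT"
cytoplasmic_like = "MSEQNNTEMTFQIQRIYTKDISFEAPNAPHVFQ"
tm_like = "MKK" + "L" * 20 + "AQA"
```

Real signal peptides frequently contain glycine, serine or threonine within the hydrophobic region, which is one reason why trained models outperform fixed rules of this kind.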

In addition to neural networks, another machine learning approach used for prediction of signal peptides is the support vector machine (SVM). SVMs have been used to predict signal peptides with greater than 90% accuracy [136]. A recent implementation of a support vector machine is CELLO [137], which attempts to classify protein sequences by subcellular location into one of five compartments: cytoplasmic, inner membrane, periplasmic, outer membrane and extracellular. PSORT-B attempts a similar classification using several different algorithms including a HMM combined with a SVM for signal peptide prediction [138]. The CELLO authors claim 78.9% accuracy for CELLO in prediction of extracellular proteins, surpassing the 70.0% accuracy of PSORT-B [137].

Both the H region of a signal peptide and transmembrane (TM) helices consist of a short stretch of hydrophobic amino acids. There is a danger of erroneously identifying a TM helix as a signal peptide or, conversely, classifying a signal peptide H region as a TM helix using, for example, the TM prediction tool TMHMM [55]. In order to try to discriminate between the two, Kall et al. [139] have recently developed Phobius, a combined TM topology and signal peptide predictor. False classifications of signal peptides were reduced from 26.1% for SignalP2.0/TMHMM to 3.9% for Phobius.

SignalP remains the most popular method for prediction of secreted proteins. The most recent version, SignalP3.0, continues to use both a neural network and a HMM [140]. The main improvement over the previous version is increased accuracy in prediction of signal peptidase cleavage sites. In comparative analyses SignalP3.0 performs significantly better than other machine learning and HMM methods [140], although it does not discriminate between proteins targeted to the membrane, periplasm or extracellular locations. Table 4 summarises the tools currently available for predicting the likelihood and location of Sec-secreted proteins. At present we would recommend using a combination of tools, including both SignalP3.0 and TMHMM, to identify and describe Sec-secreted proteins in plant pathogen genomes, as we have done in the analyses below.

Table 4.  Bioinformatics resources for prediction of N-terminal secretion signals in bacterial protein sequences
Name of application | Type of algorithm used | Availability
CELLO [137] | Support vector machine | Webserver: http://cello.life.nctu.edu.tw/
NNPSL | Neural network | Webserver: http://www.doe-mbi.ucla.edu/astrid/astrid.html
Phobius [139] | Hidden Markov model | Webserver: http://phobius.binf.ku.dk/
PSORT-B [138] | Hidden Markov model and support vector machine | Webserver: http://psort.org
SigCleave | Scoring matrix [132] | Part of the EMBOSS suite [230]: http://www.hgmp.mrc.ac.uk/Software/EMBOSS/
SignalP3.0 [140] | Neural networks and hidden Markov model | Webserver: http://www.cbs.dtu.dk/services/SignalP/
SPEPlip [218] | Neural network | Webserver (requires authentication): http://gpcr.biocomp.unibo.it/predictors/
SPScan | Scoring matrix [132] | Part of the GCG suite: http://www.accelrys.com/about/gcg.html
SubLoc [231] | Support vector machine [232] | Webserver: http://www.bioinfo.tsinghua.edu.cn/SubLoc/

Table 5 summarises the outcome of a SignalP3.0 analysis of the completed plant pathogen genomes (for the full results from this analysis see Supplementary Table 4). Our results indicate that Sec-secreted proteins could represent roughly 9–18% of the proteome in each organism, with the highest percentages in the facultative pathogens P. aeruginosa, X. campestris and R. solanacearum, and the lowest in the fastidious xylem pathogen X. fastidiosa. When we use TMHMM to exclude proteins with one or more transmembrane domains we find a similar distribution, but with percentages ranging from 4% to 11% of the proteome, with the largest number of Sec-positive, TM-negative proteins in P. aeruginosa. The relative proportion of proteins that lack TMs to TM proteins is quite constant across diverse pathogens, with proteins that lack TMs making up ~63% of the predicted Sec-positive proteins in E. carotovora, Pseudomonas spp. and Xanthomonas spp., ~55% in A. tumefaciens and R. solanacearum and ~50% in Xylella spp. The gross difference in the percentage and total number of secreted proteins between facultative and obligate pathogens may well reflect the different lifestyles of these bacteria. Facultative pathogens such as P. aeruginosa benefit from possessing a wide range of nutrient transporters, antibiotic resistance genes and secreted virulence factors to adapt to a wide range of environmental niches. In contrast, X. fastidiosa inhabits a relatively predictable environment, albeit one in which it cycles between a plant host and an insect vector.
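The bookkeeping behind these figures is simple and can be sketched in Python. The sketch assumes that signal peptide predictions and transmembrane helix counts have already been parsed into plain Python objects; the function names and protein identifiers are hypothetical, and no SignalP or TMHMM output format is implied. The final check uses the P. aeruginosa counts reported in Table 5 (985 Sec-positive proteins, 626 of them without predicted transmembrane regions, in a proteome of 5566).

```python
from typing import Dict, Set


def sec_positive_tm_negative(signal_ids: Set[str],
                             tm_helix_counts: Dict[str, int]) -> Set[str]:
    """Proteins with a predicted signal peptide and no predicted TM helix.

    Proteins missing from tm_helix_counts are treated as having zero helices.
    """
    return {pid for pid in signal_ids if tm_helix_counts.get(pid, 0) == 0}


def percent_of_proteome(count: int, proteome_size: int) -> float:
    """Percentage of the proteome, rounded to two decimals as in Table 5."""
    return round(100.0 * count / proteome_size, 2)


# Hypothetical identifiers, for illustration only
signal_ids = {"prot1", "prot2", "prot3"}
tm_helix_counts = {"prot1": 0, "prot2": 3}
soluble = sec_positive_tm_negative(signal_ids, tm_helix_counts)

# Counts taken from the P. aeruginosa row of Table 5
pa_all = percent_of_proteome(985, 5566)
pa_tm_negative = percent_of_proteome(626, 5566)
```

The same two functions reproduce the ratio discussed above: 626 of 985 Sec-positive proteins, about 63%, lack predicted transmembrane regions in P. aeruginosa.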

Table 5.  Predicted substrates of the Sec system
  1. Each proteome was downloaded from the Integr8 database (http://www.ebi.ac.uk/integr8/EBI-Integr8-HomePage.do) and N-terminal signal peptides were predicted using SignalP3.0 [140]. Each of the proteins having a predicted signal peptide was also scanned for putative transmembrane regions using TMHMM (http://www.cbs.dtu.dk/services/TMHMM/) [55]. The number and percentage of proteins with no predicted transmembrane regions is shown in parentheses. The complete list of proteins identified in these analyses is available online in Supplementary Table 4.

Proteome | Proteome size (number of proteins) | Signal peptides identified by SignalP: number of proteins | Signal peptides identified by SignalP: % of proteome
Circular chromosome of Agrobacterium tumefaciens (Washington University) | 2784 | 330 (160) | 11.85 (5.75)
Linear chromosome of Agrobacterium tumefaciens (Washington University) | 1876 | 247 (144) | 13.17 (7.68)
Plasmid AT of Agrobacterium tumefaciens (Washington University) | 543 | 59 (38) | 10.87 (7.00)
Plasmid pTiC58 of Agrobacterium tumefaciens (Washington University) | 198 | 23 (13) | 11.62 (6.57)
Agrobacterium tumefaciens (Washington University) (total) | 5401 | 659 (355) | 12.20 (6.57)
Circular chromosome of Agrobacterium tumefaciens (Cereon) | 2722 | 309 (142) | 11.35 (5.22)
Linear chromosome of Agrobacterium tumefaciens (Cereon) | 1833 | 216 (125) | 11.78 (6.82)
Plasmid AT of Agrobacterium tumefaciens (Cereon) | 547 | 53 (35) | 9.69 (6.40)
Plasmid Ti of Agrobacterium tumefaciens (Cereon) | 198 | 22 (11) | 11.11 (5.56)
Agrobacterium tumefaciens (Cereon) (total) | 5300 | 600 (313) | 11.32 (5.91)
Ralstonia solanacearum | 3440 | 525 (288) | 15.26 (8.37)
Megaplasmid of Ralstonia solanacearum | 1676 | 258 (143) | 15.39 (8.53)
Ralstonia solanacearum (total) | 5116 | 783 (431) | 15.30 (8.42)
Erwinia carotovora subsp. atroseptica | 4462 | 631 (393) | 14.14 (8.81)
Pseudomonas aeruginosa | 5566 | 985 (626) | 17.70 (11.25)
Pseudomonas syringae pv. tomato | 5471 | 722 (458) | 13.20 (8.37)
Plasmid pDC3000A of Pseudomonas syringae pv. tomato | 67 | 8 (7) | 11.94 (10.45)
Plasmid pDC3000B of Pseudomonas syringae pv. tomato | 70 | 8 (6) | 11.43 (8.57)
Pseudomonas syringae pv. tomato (total) | 5608 | 738 (471) | 13.16 (8.40)
Xanthomonas axonopodis pv. citri | 4312 | 645 (407) | 14.96 (9.44)
Plasmid pXAC64 of Xanthomonas axonopodis pv. citri | 73 | 5 (3) | 6.85 (4.11)
Xanthomonas axonopodis pv. citri (total) | 4385 | 650 (410) | 14.82 (9.35)
Xanthomonas campestris pv. campestris | 4180 | 669 (424) | 16.00 (10.14)
Xylella fastidiosa 9a5c | 2766 | 248 (117) | 8.97 (4.23)
Plasmid pXF51 of Xylella fastidiosa 9a5c | 64 | 10 (7) | 15.63 (10.94)
Xylella fastidiosa 9a5c (total) | 2830 | 258 (124) | 9.12 (4.38)
Xylella fastidiosa Temecula1 | 2034 | 221 (111) | 10.87 (5.46)

Strikingly, we observed substantial differences in number and homology of predicted Sec-secreted proteins in the two versions of the A. tumefaciens genome. When we aligned homologous Sec-signal genes identified in the two genomes according to the criterion of at least 80% sequence identity over 90% of their lengths, we found a significant number of Sec signal proteins predicted in the Cereon genome that were not identified as homologues in the Washington version of the proteome, and proteins in the Washington genome that were not identified in the Cereon genome. There are a number of potential reasons for this, including differences in start codon prediction and other ORF prediction criteria. Predictions of bacterial secretomes based on published annotations of the proteome are highly dependent on the accuracy of these annotations. Further work on improving ORF predictions from genome sequence data will help to resolve the differences between these genome sequences and annotations in the future. It is vital that proteome annotations are regularly reviewed and revised subsequent to genome publication. Interfaces such as the PeerGAD database for P. syringae (http://www.pseudomonas-syringae.org) [141]; PseudoDB and coliBASE for Pseudomonas and Erwinia genomes (PseudoDB.bham.ac.uk, coliBASE.bham.ac.uk) [142]; and PseudoCAP for P. aeruginosa (http://www.pseudomonas.com) have an important role to play in this process.
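The homology criterion used in this comparison (at least 80% identity over 90% of the sequence lengths) can be expressed as a small filter. The Python sketch below is an assumption-laden illustration: it reads "90% of their lengths" as coverage of both the query and the subject, it takes alignment statistics as already-parsed fields, and the class, field and function names are hypothetical rather than tied to any particular alignment program.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class Hit:
    query_id: str
    subject_id: str
    percent_identity: float   # identity over the aligned region, in %
    aligned_length: int       # number of aligned positions
    query_length: int
    subject_length: int


def is_homologue(hit: Hit, min_identity: float = 80.0,
                 min_coverage: float = 0.90) -> bool:
    """Apply the identity threshold and require coverage of both sequences."""
    cov_query = hit.aligned_length / hit.query_length
    cov_subject = hit.aligned_length / hit.subject_length
    return (hit.percent_identity >= min_identity
            and min(cov_query, cov_subject) >= min_coverage)


def without_homologue(query_ids: Iterable[str], hits: Iterable[Hit]) -> Set[str]:
    """Query proteins for which no hit passes the criterion."""
    matched = {h.query_id for h in hits if is_homologue(h)}
    return set(query_ids) - matched


# Hypothetical hits: a full-length match and one truncated at the N terminus,
# as might result from a different start codon prediction
hits = [
    Hit("cereon_1", "wash_1", 99.0, 290, 300, 295),
    Hit("cereon_2", "wash_2", 99.0, 200, 300, 205),
]
missing = without_homologue(["cereon_1", "cereon_2", "cereon_3"], hits)
```

The second hit shows how a difference in predicted start codon can cause a near-identical protein to fail the coverage test, which is one of the explanations offered above for the discrepancy between the two annotations.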

3.2 Tat signals

In E. coli, Tat-specific signal peptides are structurally very similar to Sec signals, but are slightly longer (26–52 amino acids). This is due in part to the presence of a longer N-region. Tat signal peptides harbour the consensus sequence motif S(T)-R-R-x-h-h-h (where h = hydrophobic residue) at the boundary between the N and H regions. Generally Tat signal peptides are less hydrophobic than their Sec counterparts, have a higher positive charge at the N-region and have a positive charge in the C-region [50,143]. Tat signals may be recognised directly by the secretion apparatus; however, it has been proposed that some Tat signal peptides operate in tandem with cognate binding chaperones to orchestrate the assembly and transport of complex enzymes [144].
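The consensus motif described above lends itself to a simple pattern search. The following Python sketch looks for S(T)-R-R-x-h-h-h near the N terminus using a regular expression. It is not a reimplementation of TATFIND: the function name, the hydrophobic alphabet and the 35-residue search window are assumptions made for illustration, and the example string is used only as a TorA-like test input.

```python
import re
from typing import Optional

# Assumption: a simple hydrophobic alphabet standing in for "h"
TAT_MOTIF = re.compile(r"[ST]RR.[AVLIMFW]{3}")


def find_tat_motif(seq: str, window: int = 35) -> Optional[int]:
    """Return the 0-based start of the first twin-arginine motif found in
    the first `window` residues, or None if there is no match."""
    match = TAT_MOTIF.search(seq[:window].upper())
    return match.start() if match else None


# Illustrative inputs
tora_like = "MNNNDLFQASRRRFLAQLGGLTVAGMLGPSLLTPRRATA"
sec_like = "MKKTAIAIAVALAGFATVAQAAPKDNT"

tat_position = find_tat_motif(tora_like)
sec_position = find_tat_motif(sec_like)
```

A motif match on its own is weak evidence, so a hit of this kind would normally be checked for a surrounding signal peptide before it is treated as a candidate Tat substrate.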

Early studies of Tat secretion systems suggested that its function was to transport prefolded redox proteins. However, more recent work indicates that the Tat pathway also secretes other non-redox proteins, such as secreted virulence factors in P. aeruginosa [8]. To gain insight into the range of substrates exported using the Tat system, Dilks et al. [145] developed and used a programme called TATFIND1.2 to identify proteins with Tat signals. TATFIND1.2 uses a set of rules to identify close matches to the characteristic twin arginine motif described above and depicted in Fig. 2. Dilks et al. used TATFIND1.2 to search the complete proteomes of 84 diverse prokaryotes, and concluded that the extent to which the Tat system is used varies greatly among different taxa. Among the organisms with the greatest numbers of predicted Tat substrates were R. solanacearum, P. aeruginosa, X. campestris and X. axonopodis, with 71, 57, 55 and 50 predicted substrates, respectively. They observed no correlation between the size of the genome and the number of Tat substrates. However, these bioinformatic predictions must be treated with caution. The presence of a putative Tat signal does not always guarantee export via this system [146,147]. Furthermore, although many of the putative Tat targets identified by Dilks et al. had the S(T)-R-R-x-h-h-h motif, they did not contain a signal peptide at the N terminus and so are unlikely to be true Tat substrates.

Table 6 summarises the outcome of a TATFIND analysis of the completed plant pathogen genomes, in which we also used SignalP3.0 as an additional filtering step (for the full results from this analysis see Supplementary Table 5). Our results indicate that Tat-secreted proteins could make up 0.05–0.87% of the proteome in each organism, with the highest percentages in A. tumefaciens and R. solanacearum, and the lowest numbers in X. fastidiosa. As with Sec signals this difference in the percentage and total number of secreted proteins between facultative and obligate pathogens may well reflect the different lifestyles of these bacteria. The list of plant pathogen proteins found to have Tat signal motifs includes amidases, plant cell wall degrading enzymes, siderophore receptors, and phospholipases. All of these are promising candidates for Tat-dependent virulence factors in plant pathogenic bacteria.
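The two-step filtering and the percentage calculation behind this summary can be sketched as follows. This is only an illustration under stated assumptions: the function name and the protein identifiers are hypothetical, and the inputs are assumed to be lists of protein identifiers already exported from the two prediction tools.

```python
def summarise_tat_filter(tatfind_hits, signalp_hits, proteome_size):
    """Intersect two lists of predicted protein IDs and express the
    counts as a percentage of the proteome (rounded to 2 decimals)."""
    tatfind = set(tatfind_hits)
    both = tatfind & set(signalp_hits)  # predicted by both tools

    def pct(n):
        return round(100.0 * n / proteome_size, 2)

    return {
        "tatfind_n": len(tatfind),
        "tatfind_pct": pct(len(tatfind)),
        "filtered_n": len(both),
        "filtered_pct": pct(len(both)),
        "filtered_ids": sorted(both),
    }


# Hypothetical identifiers and proteome size, for illustration only.
summary = summarise_tat_filter(
    tatfind_hits=["p1", "p2", "p3", "p4"],
    signalp_hits=["p2", "p4", "p9"],
    proteome_size=200,
)
print(summary)
```

The same percentage formula reproduces individual table entries; for example, 10 predicted substrates in a proteome of 2034 proteins corresponds to 0.49%.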

Table 6. Predicted substrates of the twin-arginine translocation (Tat) system

| Proteome | Proteome size (number of proteins) | TATFIND: number of proteins | TATFIND: % of proteome | TATFIND and SignalP3.0: number of proteins | TATFIND and SignalP3.0: % of proteome |
|---|---|---|---|---|---|
| Circular chromosome of Agrobacterium tumefaciens (Washington University) | 2784 | 22 | 0.79 | 15 (12) | 0.54 (0.43) |
| Linear chromosome of Agrobacterium tumefaciens strain (Washington University) | 1876 | 30 | 1.60 | 23 (17) | 1.23 (0.91) |
| Plasmid AT of Agrobacterium tumefaciens (Washington University) | 543 | 10 | 1.84 | 8 (7) | 1.47 (1.29) |
| Plasmid pTiC58 of Agrobacterium tumefaciens (Washington University) | 198 | 2 | 1.01 | 1 (1) | 0.51 (0.51) |
| Agrobacterium tumefaciens (Washington University) (total) | 5401 | 64 | 1.18 | 47 (36) | 0.87 (0.68) |
| Circular chromosome of Agrobacterium tumefaciens (Cereon) | 2722 | 20 | 0.73 | 11 (7) | 0.40 (0.26) |
| Linear chromosome of Agrobacterium tumefaciens (Cereon) | 1833 | 32 | 1.75 | 22 (16) | 1.20 (0.87) |
| Plasmid AT of Agrobacterium tumefaciens (Cereon) | 547 | 12 | 2.19 | 7 (6) | 1.28 (1.10) |
| Plasmid Ti of Agrobacterium tumefaciens (Cereon) | 198 | 2 | 1.01 | 1 (1) | 0.51 (0.51) |
| Agrobacterium tumefaciens (Cereon) (total) | 5300 | 66 | 1.25 | 41 (30) | 0.77 (0.57) |
| Ralstonia solanacearum | 3440 | 51 | 1.48 | 30 (16) | 0.87 (0.47) |
| Megaplasmid of Ralstonia solanacearum | 1676 | 20 | 1.19 | 11 (8) | 0.66 (0.48) |
| Ralstonia solanacearum (total) | 5116 | 71 | 1.39 | 41 (24) | 0.80 (0.47) |
| Erwinia carotovora subsp. atroseptica | 4462 | 27 | 0.61 | 14 (10) | 0.31 (0.22) |
| Pseudomonas aeruginosa | 5566 | 57 | 1.02 | 22 (11) | 0.40 (0.20) |
| Pseudomonas syringae pv. tomato | 5471 | 34 | 0.62 | 15 (13) | 0.27 (0.23) |
| Plasmid pDC3000A of Pseudomonas syringae pv. tomato | 67 | 0 | 0 | 0 | 0 |
| Plasmid pDC3000B of Pseudomonas syringae pv. tomato | 70 | 1 | 1.43 | 0 | 0 |
| Pseudomonas syringae pv. tomato (total) | 5608 | 35 | 0.62 | 15 (13) | 0.27 (0.23) |
| Xanthomonas axonopodis pv. citri | 4312 | 50 | 1.16 | 26 (21) | 0.60 (0.49) |
| Plasmid pXAC64 of Xanthomonas axonopodis pv. citri | 73 | 0 | 0 | 0 | 0 |
| Xanthomonas axonopodis pv. citri (total) | 4385 | 50 | 1.16 | 26 (21) | 0.60 (0.49) |
| Xanthomonas campestris pv. campestris | 4180 | 52 | 1.24 | 22 (18) | 0.53 (0.43) |
| Xylella fastidiosa 9a5c | 2766 | 17 | 0.61 | 6 (2) | 0.22 (0.07) |
| Plasmid pXF51 of Xylella fastidiosa 9a5c | 64 | 0 | 0 | 0 | 0 |
| Xylella fastidiosa 9a5c (total) | 2830 | 17 | 0.61 | 6 (2) | 0.22 (0.07) |
| Xylella fastidiosa Temecula1 | 2034 | 10 | 0.49 | 1 (0) | 0.05 (0) |

  1. Each proteome was downloaded from the Integr8 database (http://www.ebi.ac.uk/integr8/EBI-Integr8-HomePage.do) and Tat-dependent signal peptides were predicted using TATFIND1.2 [137]. Potential Tat substrates as predicted by TATFIND1.2 were further filtered by checking for a signal peptide using SignalP3.0 [146]. Each of the proteins having a predicted Tat signal peptide was also scanned for putative transmembrane regions using tmhmm (http://www.cbs.dtu.dk/services/TMHMM/) [55]. The number and percentage of proteins identified by TATFIND but not by SignalP3.0 and with no predicted transmembrane regions is shown in parentheses. The complete list of proteins identified in these analyses is available online in Supplementary Table 5.

3.3Hrp (type III) signals

The N-terminal signal peptide motifs of substrates of the Sec and Tat systems are relatively easy to recognize. However, our ability to identify the protein substrates of other secretion systems on the basis of secretion signal motifs remains quite limited. Many researchers have tried to define the type III targeting signals required for secretion by the type III system. A noteworthy feature of the type III system is that the pathway is capable of secreting heterologously expressed effectors derived from other pathogens, which suggests that the secretion signal or signals are broadly conserved across different species. This phenomenon was first observed with the reciprocal secretion of type III substrates by Yersinia, Shigella, and Salmonella spp. [148], and it has subsequently been shown that the type III systems of E. chrysanthemi and Y. enterocolitica can secrete P. syringae effectors [149,150]. The use of a universal targeting signal for type III substrates is consistent with the observation that some type III-secreted effector proteins are conserved in multiple genera of plant pathogens and animal pathogens [151–153].

Deletion mutation studies of type III-secreted effectors have shown that essential targeting information is frequently located within the first 10 to 15 amino acids of secreted proteins [154]. Fusion of the biologically active C-terminal portion of AvrRpt2 to the N-terminal regions of the X. campestris pv. vesicatoria AvrBs2 or P. syringae pv. maculicola AvrRpm1 effectors results in translocation of the hybrid proteins into plant cells, as indicated by type III/RPS2 dependent elicitation of the HR in infected Arabidopsis plants [155,156]. Some type III-secreted proteins require a chaperone for secretion to take place. For example, the HopPtoV chaperone ShcV binds to the N-terminal region of HopPtoV between amino acids 76 and 125 of the 391-residue full-length protein and is essential for HopPtoV secretion [155]. Type III chaperones can function in targeting proteins to the secretion apparatus or in the folding, refolding and stabilising of secreted proteins.

Despite these studies, the nature of the secretion signal remains controversial [157–159]. Studies with protein fusions involving the first 15 amino acids of Yersinia Yops showed that reporter hybrids could be secreted by the type III system despite frameshift mutations, suggesting that targeting information resides in the mRNA [160–162]. However, results obtained using wobble nucleotide mutations altering the mRNA argue against an mRNA signal, and the discovery that synthetic amphipathic sequences of eight residues can function as a targeting signal suggests that targeting information resides in a relatively general property of the protein [163]. It is conceivable that both mRNA and amino acid signals are used, as suggested for the flagellar protein FlgM [164].

The mRNA signal hypothesis has been explored in the context of plant pathogen effectors. Secretion of AvrPto1–15-Npt is not disrupted by frameshift mutations affecting the AvrPto sequence [150]. Similarly, frameshift mutations affecting the first 18 codons in avrBs2 do not abolish secretion of AvrBs2 by the type III system of X. campestris pv. vesicatoria, suggesting the possibility of an mRNA targeting signal [155].

Functional assays for type III-secreted proteins in P. syringae have been used to establish a training set from which to identify patterns associated with secreted proteins. Examination of these proteins has revealed amino-acid biases in the first 50 residues, most notably high serine and proline, a pattern of equivalent amino acids in the first five positions, such as a hydrophobic amino acid in position 3 or 4, and a lack of acidic amino acids in the first 12 positions [152,165,166] (Fig. 2). These patterns were used to search the genome of P. syringae pv. tomato DC3000 [165–168] to obtain a list of candidate proteins. However, if type III secretion systems do indeed rely on a multiplicity of signals, including N-terminal amino acids, mRNA and chaperones, it is quite likely that current training sets and bioinformatic predictions will be inherently inaccurate, and that we will need to devise training sets and motifs from subsets of effectors with different targeting mechanisms to improve predictions in the future. Current information on known and predicted type III effectors in P. syringae is available online at http://www.pseudomonas-syringae.org.
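The N-terminal patterns listed above can be computed for any protein sequence with a few lines of code. The sketch below only reports the three features named in the text; it does not classify proteins, and the hydrophobic residue set and function name are assumptions made for illustration rather than part of any published predictor.

```python
# Residue classes; the hydrophobic set is an illustrative assumption.
HYDROPHOBIC = set("AVLIMFWC")
ACIDIC = set("DE")


def n_terminal_features(protein):
    """Report the three N-terminal features described in the text:
    Ser/Pro content of the first 50 residues, a hydrophobic residue at
    position 3 or 4, and absence of Asp/Glu in the first 12 positions."""
    seq = protein.upper()
    first50 = seq[:50]
    ser_pro = first50.count("S") + first50.count("P")
    return {
        "ser_pro_fraction_first50": ser_pro / len(first50) if first50 else 0.0,
        # Positions 3 and 4 are indices 2 and 3 in 0-based numbering.
        "hydrophobic_at_3_or_4": any(aa in HYDROPHOBIC for aa in seq[2:4]),
        "no_acidic_in_first12": not any(aa in ACIDIC for aa in seq[:12]),
    }


# Made-up sequence used only to exercise the function.
print(n_terminal_features("MAISPS" + "A" * 44))
```

Any thresholds applied to these features (for example, what counts as "high" serine and proline content) would have to be derived from an experimentally validated training set.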

Models for type III secretion signals must also explain another feature of type III secretion. Selection of substrates for type III secretion varies according to environmental conditions and contact with the host cells. In P. syringae, some substrates, such as P. syringae HrpZ and the HrpA pilin are secreted when bacteria are cultured in a simple apoplast-mimicking minimal medium. Others, such as the P. syringae effector AvrPto, are only secreted at cool temperatures, and at a moderately acidic pH [169]. A subset of type III-secreted proteins, such as the P. syringae protein AvrB, have never been shown to be secreted in vitro by their native secretion system, but are delivered into the cytoplasm of host cells [152,169]. At present we do not know whether queuing and gating of type III effectors relies on the intrinsic properties of the secreted proteins, the secretion system, or on as yet unidentified factors. One clue comes from a recent paper by Buttner et al. [170], which shows that the HpaB protein of X. campestris pv. vesicatoria regulates the exit of at least five effector proteins from bacterial cells, but is not required for delivery of translocon components such as XopA and HrpF. HpaB also seems to play a role in restricting export of non-effectors. HpaB interacts directly with the first 50 amino acids of AvrBs3, which contain the signal for type III secretion. HpaB homologues are present in X. campestris, X. axonopodis, X. oryzae, R. solanacearum, Burkholderia mallei and Burkholderia pseudomallei. If we could establish a model that predicts both the presence of a secretion signal and the nature of the interaction with HpaB we would have a more accurate system for identifying and describing type III secreted effectors in Xanthomonas, Ralstonia and Burkholderia.

3.4Other secretion signals

Attempts to devise bioinformatic tools for predicting type I, type II, type IV and type V secretion motifs have generally not been successful. Current data suggest that for some of these systems, particularly the type II system, the signal depends on the secondary or tertiary structure of folded proteins. Studies of heterologous, chimeric and mutant type II-secreted proteins indicate that each protein has reconciled folding and secretion motif presentation in its own way, resulting in a situation where exoenzymes from E. chrysanthemi cannot be exported by E. carotovora and vice versa, and where even 3D structural comparisons of proteins secreted by the same system fail to reveal a unifying secretion motif [171,172]. Substrates for type I, II, IV and V systems are usually identified using other clues, including proximity to the genes encoding the transport apparatus, homology to previously identified proteins or protein domains and the presence of upstream promoter sequences, as discussed below.

4Clustering and context

N-terminal signal sequence predictions using SignalP and TATFIND can be used to predict proteins translocated through the cytoplasmic membrane to the periplasmic compartment. However, the presence of Sec and Tat signal peptides does not provide any indication as to whether proteins are secreted across the OM. The final location of many of these proteins will be the periplasm or membrane. The presence or absence of potential membrane-spanning domains can be used to provide indicative, but not conclusive, insight into final location, given the limitations of current prediction tools and the fact that some secreted proteins insert into and interact with host membranes. One additional piece of correlative evidence that can be used to support the identification of putative secreted proteins is their genetic and regulatory context, in particular clustering and co-expression with secretion system genes or other secreted proteins. Here we consider this proposition with respect to the type I, II and III systems.

Type I systems are composed of three cell envelope proteins (see Section 2.1). Genes encoding the type I system and type I-secreted proteins are typically located in close proximity (Fig. 3) [62]. For example, the genes coding for protease PrtW in E. carotovora, or PrtB and PrtC in E. chrysanthemi are clustered with genes encoding PrtD, PrtE, and PrtF which form the ABC exporter [173,174], and the gene encoding the AprA protease of P. aeruginosa is clustered with the genes encoding the AprDEF export system [175]. If we apply this reasoning to other bacteria, the tliDEF-like genes of P. syringae (PSPTO3330, PSPTO3329, PSPTO3328) are adjacent to a putative protease gene, and a putative type I system in R. solanacearum is adjacent to a gene encoding a hemolysin-like protein. The tliDEF genes of the non-pathogenic bacterium P. fluorescens SIK W1 have been shown to be involved in lipase secretion and are adjacent to lipase, protease and protease inhibitor genes [176]. The gene coding for the toxin rhizobiocin in A. tumefaciens C58 is linked with prsD, a gene encoding a putative ATPase and prsE, coding for the membrane component of an ABC transporter, which are similar to tliD and tliE and to a type I system found in S. meliloti [59]. Proximity does seem to be a useful tool for identifying at least some candidate type I-secreted proteins.
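The proximity argument can be turned into a simple screen over an ordered gene list. The following is a minimal sketch under stated assumptions: the gene order, the gene names other than tliD, tliE and tliF, the distance cut-off and the function name are hypothetical, and the replicon is treated as linear (wrap-around on a circular chromosome is ignored).

```python
def neighbours_of_secretion_genes(gene_order, apparatus_genes, max_distance=3):
    """Return genes lying within `max_distance` gene positions of any
    secretion apparatus gene, excluding the apparatus genes themselves.
    `gene_order` lists genes in their order along a single replicon."""
    apparatus = set(apparatus_genes)
    positions = [i for i, gene in enumerate(gene_order) if gene in apparatus]
    candidates = []
    for i, gene in enumerate(gene_order):
        if gene in apparatus:
            continue
        if any(abs(i - p) <= max_distance for p in positions):
            candidates.append(gene)
    return candidates


# Hypothetical gene order, for illustration only.
order = ["geneA", "geneB", "protease", "tliD", "tliE", "tliF",
         "geneC", "geneD", "geneE", "geneF"]
print(neighbours_of_secretion_genes(order, ["tliD", "tliE", "tliF"],
                                    max_distance=1))
```

Candidates returned by a screen of this kind are only leads: as the examples in the text show, neighbouring genes still need supporting evidence such as homology to known secreted proteins.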

Figure 3.

Gene organisation in type I secretion/protease clusters from P. aeruginosa, P. syringae, E. chrysanthemi, E. carotovora and A. tumefaciens. inh codes for a protease inhibitor.

Proteins are secreted by the type II (GSP) system after translocation across the inner membrane by either the Tat or the Sec system. The genes encoding the components of the Sec system are scattered across the chromosome, but the tatA, B and C genes, coding for the Tat system are organised in an operon. Genes encoding Tat or Sec-secreted proteins are generally not clustered with their cognate secretion genes. The genes encoding the type II secretion machinery are typically organised in a large cluster on the chromosome (Section 2.2), which may be associated with one or more genes encoding secreted proteins. In P. syringae pv. tomato DC3000, a phospholipase gene with a putative N-terminal cleavable signal sequence is located downstream of gspD, the secretin gene. The function of the protein and its location in the genome suggest that it is a good candidate for a type II secretion substrate. Similarly, the E. carotovora Out cluster contains two putative plant cell wall degrading enzyme coding genes, located between outB and outC. However, the genomes of Erwinia spp. are known to contain numerous genes encoding confirmed Out-secreted proteins that are not clustered with the main secretion system genes, so proximity is only moderately effective in predicting type II-secreted proteins [15,16].

Type III secretion system genes are found clustered in a single region of the chromosome (or on a megaplasmid in the case of R. solanacearum) along with genes encoding type III-secreted effector proteins. Type III-secreted effectors are also found singly and in clusters at distal regions in the genome. Many type III and effector gene clusters have the characteristic features of pathogenicity islands, which have an altered G + C content, contain more than one virulence gene and are bordered by tRNA genes and/or genetic mobile elements (reviewed recently in [100]). However, the secreted proteins associated with type III PAIs are not exclusively proteins secreted by the type III system. The 44 kb region containing the type III system of E. chrysanthemi encodes a wide range of potential virulence factors, including the TPS-secreted protein hecA, but none of these proteins appears to be a type III-secreted effector [9,177]. Co-localisation and features indicating acquisition by horizontal transfer from other organisms are significant, but not sufficient, criteria to identify candidate type III effectors [100,152].

5Homology and domain analyses

In the first section of this paper we considered how homology analyses can be used to identify the conserved components of secretion systems. Homology-based analyses are also a key tool in the identification of secreted proteins, and can be performed as direct sequence similarity analyses against databases of known secreted proteins. However, a more efficient approach may be to perform analyses based on conserved domains archived in databases such as Pfam and InterPro. Domain predictions already exist for many well-characterised types of secreted proteins, such as secreted exoenzymes, adhesins, toxins and a small number of type III effectors, and for a number of proteins of unknown function that have been observed multiple times in sequence databases. For example, Delepelaire proposes that the glycine-rich repeat GGXGXDXXX can be used to identify a widely distributed class of type I-secreted proteins [178]. Proteins with this motif are present in all of the pathogens listed in Table 2, with the exception of X. campestris, and are particularly numerous in R. solanacearum and plant symbionts such as B. japonicum and S. meliloti.
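A motif such as GGXGXDXXX is straightforward to search for with a regular expression. The sketch below is illustrative only: it assumes that X stands for any residue, counts overlapping occurrences, and uses a made-up sequence; the function name is hypothetical.

```python
import re

# GGXGXDXXX with X taken to mean "any residue" (an assumption).
# The lookahead lets overlapping occurrences be counted.
GLY_RICH_REPEAT = re.compile(r"(?=(GG.G.D...))")


def count_glycine_rich_repeats(protein):
    """Count occurrences (including overlapping ones) of GGXGXDXXX."""
    return len(GLY_RICH_REPEAT.findall(protein.upper()))


# Made-up sequence containing two copies of the repeat.
print(count_glycine_rich_repeats("AAGGAGADLLLGGSGNDTIYKK"))
```

Counting repeats per protein, rather than simply testing for presence, makes it easier to rank candidates, since a single chance match to a nine-residue pattern is weak evidence on its own.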

Analyses of proteins of unknown function are particularly effective when carried out in conjunction with analyses of gene context and signal sequences, as discussed above. For example, Ginalski et al. [179] used analyses of sequence similarity and fold recognition to propose that unknown proteins with Pfam domain DUF920 are a novel family of putative transglutaminase-like cysteine proteases called BTLCP proteins, each of which contains a conserved N-terminal peptide, and several of which are located adjacent to ABC transporters, suggesting two possible mechanisms for secretion. Members of this novel protease family are present in both A. tumefaciens (AGR_C_433p, AGR_C_3225p, AGR_L_1020p, AGR_L_2583p) and P. aeruginosa (PA1434). Similarly, Kim et al. [180] used sequence-based predictions to isolate two functional lipases from A. tumefaciens.

Homology and domain predictions are central to the identification of candidate autotransporters and two-partner systems, for which, like the type I, II and IV-secreted proteins, we have few tools to identify signal sequence motifs. All known autotransporters share certain features in their primary structure [181]. They generally have an N-terminal signal peptide, recognisable by SignalP, although some have an unusually long signal peptide (about 50 amino acids) [181,182]. At the C terminus there is an autochaperone domain and the β-domain, which forms the outer membrane pore through which the mature secreted protein exits the cell. The autochaperone domain and the β-domain are separated by a linker region. Adjacent to the N-terminal signal peptide is a passenger domain. In some of the best known autotransporters such as Neisseria gonorrhoeae IgA1 protease, this passenger domain has peptidase activity [183]. The function of these secretion domains may extend beyond secretion. For example, the passenger domain of the vacuolating toxin of Helicobacter pylori induces the formation of large cytoplasmic vacuoles in eukaryotic cells [184].

The C-terminal β-domain is represented in the Pfam protein domain database [185] under accession number PF03797 (http://www.sanger.ac.uk/cgi-bin/Pfam/getaccPF03797). In the current release (Pfam 15.0, August 2004), 665 proteins are listed as containing the autotransporter β-domain. These proteins are restricted to the Gram-negative Bacteria, including Proteobacteria, Planctomycetes, Cyanobacteria, Chlamydiae, Fusobacteria, and a couple of bacteriophages.

Apart from their domain composition, another characteristic feature of autotransporters is their size. Of the 665 autotransporters in Pfam 15.0, 117 are identified as fragments. For the remaining 548 autotransporter sequences the mean length is 1125 amino acids and 275 (50%) are more than 1000 residues long, whereas the majority of bacterial proteins are less than 800 residues long. The size of many autotransporter proteins is frequently linked to the presence of a large number of repeated, conserved motifs. An integrated strategy for identifying candidate autotransporters would therefore be to profile the proteome for large proteins with autotransporter motifs, a high degree of repetition in their internal structure, and the presence of characteristic repeated motifs associated with type V-secreted proteins, such as the Pfam domains Fil_haemagg, He_PIG and CADG. Other Pfam domains found in plant pathogen autotransporters include Hexapep, Pertactin, Lipase_GDSL, Peptidase_S8, Peptidase_M28, BNR, TIG and Lipoprotein_S. Individual autotransporter proteins may display one, two or more of these characteristics. A full list of proteins with autotransporter β-domains from the complete plant pathogen genomes is given in Supplementary Table 3.
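Part of this integrated strategy, combining the β-domain annotation with protein size, can be sketched as a simple filter. The example assumes that domain annotations are already available as lists of Pfam accessions or names per protein; the record layout, the protein identifiers and the function name are hypothetical, and the 1000-residue cut-off simply reuses the figure quoted above. Internal repetition is not assessed here.

```python
AUTOTRANSPORTER_BETA = "PF03797"  # Pfam accession cited in the text


def candidate_autotransporters(proteins, min_length=1000):
    """Select proteins annotated with the autotransporter beta-domain and
    report whether each is 'large' plus any co-occurring domains.
    `proteins` is an iterable of dicts with keys 'id', 'length', 'pfam'."""
    hits = []
    for protein in proteins:
        domains = set(protein["pfam"])
        if AUTOTRANSPORTER_BETA in domains:
            hits.append({
                "id": protein["id"],
                "large": protein["length"] > min_length,
                "other_domains": sorted(domains - {AUTOTRANSPORTER_BETA}),
            })
    return hits


# Hypothetical records, for illustration only.
records = [
    {"id": "protA", "length": 1250, "pfam": ["PF03797", "Pertactin"]},
    {"id": "protB", "length": 640, "pfam": ["PF03797"]},
    {"id": "protC", "length": 2100, "pfam": ["Fil_haemagg"]},
]
print(candidate_autotransporters(records))
```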

6Promoter predictions

In pathogens with Group I Hrp systems, such as P. syringae, expression of type III secretion system genes and secreted effectors is positively regulated by the type III-specific sigma factor HrpL. It is possible to use knowledge of HrpL-regulated promoters to identify candidate secreted proteins on the basis of expression studies and genome-wide promoter predictions [186]. Comparative analyses of HrpL-regulated promoters have revealed a conserved cis-located motif known as the hrp box, which was originally defined as GGAACC 15/16 bp CCAC [187–189], and has been refined through further analyses to tGGAACCg–13/14bp–gCCACncAg [186–189]. This consensus pattern fits the −35 −10 spacing observed for other ECF sigma factors [190]. However, no studies have conclusively linked the specific residues of the hrp box motif to HrpL binding, and the promoters of several HrpL-regulated genes deviate from this consensus sequence. Close matches to the Hrp consensus can be identified by searching the genome sequence against a scoring matrix or Hidden Markov model (HMM) trained on experimentally confirmed Hrp boxes using software such as ScanACE [191], MAST [192] and HMMER (http://hmmer.wustl.edu/). Fouts et al. [186] searched the complete genome of P. syringae pv. tomato DC3000 against a HMM trained on 51 known hrp promoter sequences to identify 12 new virulence-implicated genes. They found that genes downstream of significant matches (HMMER E value less than 1e-4) were detectably expressed in a HrpL dependent manner, but that there was no correlation between E value and level of expression.
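The original hrp box definition (GGAACC, a 15 or 16 bp spacer, then CCAC) is simple enough to search for with a regular expression on both DNA strands. The sketch below is a deliberately crude stand-in for the scoring-matrix and HMM searches described above, which tolerate deviations from the consensus; the function names and the test sequence are hypothetical.

```python
import re

# Original hrp box definition from the text: GGAACC, 15 or 16 bp, CCAC.
# The lookahead allows overlapping matches to be reported.
HRP_BOX = re.compile(r"(?=(GGAACC[ACGT]{15,16}CCAC))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_hrp_boxes(dna):
    """Return (strand, start, matched_sequence) for exact consensus
    matches. Starts are 0-based; for the '-' strand they are relative
    to the reverse-complemented sequence."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for match in HRP_BOX.finditer(seq):
            hits.append((strand, match.start(), match.group(1)))
    return hits


# Made-up sequence with one consensus match on the forward strand.
example = "TTT" + "GGAACC" + "A" * 15 + "CCAC" + "TTT"
print(find_hrp_boxes(example))
```

Because several HrpL-regulated promoters deviate from the consensus, an exact-match search of this kind will miss genuine hrp boxes; it is best regarded as a quick first pass before a probabilistic search.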

Despite the successful identification of numerous type III effectors by promoter searching, several limitations also became apparent. First, regulatory proteins and toxin biosynthesis genes, as well as type III effectors, are under the control of the Hrp regulatory system, and the presence of an upstream promoter does not distinguish between these and effectors. Co-regulation of secreted and non-secreted proteins is particularly problematic for secretome profiling studies when the candidate genes in question share no detectable homology with characterised proteins [186]. Second, several known effector genes have candidate promoters with very poor (not significant) E values, indicating that by using a stringent E value threshold, many effector genes are missed. In an attempt to overcome these limitations, Petnicki-Ocwieja et al. [166] combined the Hrp box promoter searches with additional criteria including evidence of horizontal transfer (atypical %GC, codon usage, etc.) and the presence of export signals in the amino acid sequences of putative effectors, as discussed in Section 3.

Promoter prediction analyses have been used to identify candidate secreted proteins in R. solanacearum (PIP box), Xanthomonas spp. (PIP box), E. carotovora (Hrp box) and A. tumefaciens (Vir box) (e.g. [193–195]). At present the use of this approach is largely limited by the accuracy of current models for promoter prediction and by our understanding of promoter function. The development of more sophisticated models that take into account not only primary sequence data, but also DNA composition and curvature, and binding sites for additional DNA-binding proteins such as IHF and CRP, is likely to enhance predictions based on co-regulation in the future, particularly when supported by experimental data from targeted and genome-wide expression analyses.

7Functional analyses of secreted proteins

Signal sequences, homology, genomic context and co-regulation can all be used to develop working models of the secretomes of plant pathogenic bacteria. However, confirmation of these models and further improvements in bioinformatic predictive tools depend on systematic functional profiling of secreted proteins. The availability of genome sequences for animal pathogens has encouraged the development and application of novel and established tools for identifying secreted proteins, often with the aim of identifying new vaccine candidates. In addition, knowledge of host-microbe interactions has been used to develop a range of techniques to identify proteins delivered into the cytoplasm of plant cells by type III and type IV systems. Fig. 4 gives a schematic overview of some of the tools and techniques currently in use, including a few techniques that have yet to be applied to plant pathogenic bacteria.

Figure 4.

Assays used to identify secreted proteins.

Enzyme tagging (Fig. 4.1) is one of the most established techniques for studying protein secretion and topology. Translational fusions to enzymes such as alkaline phosphatase (phoA) and chloramphenicol acetyl transferase (CAT) can be used in conjunction with colorimetric assays to determine the location of fusion proteins (e.g. [196,197]). Many studies have used transposon-based translational fusion reporter systems to survey individual proteins or whole genomes, but the screening process can be labour intensive. An accelerated variation of this strategy is to use fusions to antibiotic resistance proteins and to select for bacteria that secrete the resistance protein to a functional extracellular or periplasmic location (Fig. 4.2) [198,199]. Enzyme tagging has also been used to identify proteins secreted into plant or animal cells, using fusions to proteins such as adenylate cyclase (CyaA) or β-lactamase that give a clearly detectable phenotype when inside a eukaryotic cell (Fig. 4.11) [200–202]. Another variation on this technique is the Cre-reporter assay for translocation (CrAFT), where secreted proteins are fused to Cre recombinase, which acts on a genetically engineered target gene in the host genome [116,203,204]. One plant-specific method for intracellular detection is to use fusions to a truncated HR-eliciting protein such as AvrRpt2 (Fig. 4.10) [165,168]. A disadvantage of the HR assay for protein secretion is that because it relies on elicitation of programmed cell death it cannot be used in conjunction with assays for the effect of secreted proteins on cell function. We anticipate that future research will lead to the development and application of an increasing number of detection methods based on fusions to non-disruptive markers and to signal transduction molecules that induce reporter genes or recognisable phenotypes without having gross effects on cellular physiology, which will greatly facilitate future studies of effector function (Fig. 4.13).

All translational fusion techniques rely on the ability of the bacterium to secrete the fused protein, and in vitro detection methods require that the target protein be expressed and secreted in synthetic media. Both these factors mean that it can be difficult to draw firm conclusions from negative results. One solution to the problem of fusion protein compatibility with endogenous secretion systems has been to reduce the size of the fused peptide, using antibodies rather than enzyme assays to detect the secreted protein (Fig. 4.4). Epitope tags can be introduced by using excisable transposons, such as the transposon developed by Bailey and Manoil, which can be excised using Cre recombinase to leave a 63 codon haemagglutinin tag [205]. The availability of genome sequence data means that candidate proteins can also be amplified directly and combined with N-terminal and C-terminal tags using high-throughput cloning methods. Tagged proteins can be further characterized in vitro or visualised in situ (Fig. 4.12). One potentially useful technique for visualizing protein fusions to short peptide tags is exemplified by the use of biarsenical dyes that bind to short sequences containing four cysteine residues. These dyes are non-fluorescent until they bind to the tetracysteine motif, at which point they become strongly green (FlAsH-EDT2) or red (ReAsH-EDT2) fluorescent [206]. However, it is worth noting that almost all attempts to visualize type III effectors delivered by bacteria into plant cells in situ have failed, possibly because of the small number and high turnover rate of the molecules involved, although fusions to fluorescent tags have been used to visualize secreted effector proteins in mammalian cell cultures [207]. One notable exception is a study by Szurek et al. [208], which used an immunocytochemical approach to detect the X. campestris pv. vesicatoria effector AvrBs3 in the nuclei of infected plant cells.

One means of bypassing the problems associated with translational fusions to enzymes, elicitors and epitope tags is to focus on studies of native proteins, using proteomic techniques to identify proteins present in the supernatant and membrane fractions of bacterial cells (Fig. 4.6). This approach has been used to examine secreted proteins in E. chrysanthemi, Bacillus subtilis and P. aeruginosa [209–211]. Proteomic studies can be enhanced by using post-translational tagging methods based on a membrane impermeant tag, such as biotinylation with sulfo-NHS-LC-biotin (Fig. 4.3) [212]. Biotinylated samples can be probed with HRP-streptavidin to detect proteins exposed to labeling. Protein samples can also be probed with antisera from animals inoculated with the selected pathogen, or conversely, antisera raised against candidate secreted proteins can be tested for bactericidal activity (Fig. 4.5) [213]. Antisera can also be screened against peptide libraries to identify exposed epitopes. Although antisera and peptide library-based approaches have been used to study protein secretion profiles in animal pathogens, they have yet to be extensively applied to plant pathogens, possibly because of limited access to appropriate resources and expertise in the plant pathology community. Secreted proteins can also be identified through their interactions with proteins of the secretion apparatus either in vitro or using two-hybrid techniques [114,214–217].

The final, and perhaps most important, approach we shall consider here is the use of assays based on the known properties of plants and plant pathogens. Genomic studies have identified a large number of uncharacterized and potentially secreted enzymes, many of which show homology to previously characterized enzymes, including glucanases, pectate lyases, phospholipases and proteases, but the natural substrates of many of these enzymes are unknown. Collaborations between chemists, biochemists and plant pathologists are needed to develop techniques to profile the biosynthetic and degradative capabilities of these proteins (Fig. 4.8). Progress in plant pathogen genomics and proteomics has been complemented by simultaneous advances in plant transcriptomics, proteomics and metabolomics. One of the most exciting fields of plant pathogen research at present is the study of the mechanistic effects of secreted proteins on plant cells, using bacterial delivery (Fig. 4.9), purified proteins (Fig. 4.8) and stable and transient expression of bacterial proteins in plants and model eukaryotes such as yeast (Fig. 4.14) [152,218–221]. Systematic analyses of candidate secreted proteins can also be extended to include analyses of potential interactions with plant targets, using biochemical assays and one, two and three-hybrid systems to look at protein-protein and protein-nucleic acid interactions (Fig. 4.15). Finally, analyses of plant secondary metabolites are needed to identify the anti-microbial compounds, signals and nutrient sources that are the ligands and substrates for secreted proteins. It is very likely that our increasing knowledge of the biology of plant-bacteria interactions will in turn be used to identify additional secreted proteins, by virtue of transformations and modifications of these proteins that only occur inside, or in the apoplast of plant cells, such as tyrosine phosphorylation, acylation or proteolysis [152].

8 Concluding remarks

Experimental and bioinformatic studies of secreted proteins in plant pathogens have largely been limited to studies that build on the legacy of the pre-genomic era. Our overall picture of plant pathogen behaviour remains in some respects a caricature, drawn in the broad strokes of secretion system mutants and secreted proteins with clear biochemical functions and associated phenotypes. Relatively few researchers have taken up the challenge of describing and investigating the unexplored areas of the plant pathogen secretome, previously rendered invisible by the limitations of phenotypic analyses.

The first step in bridging this knowledge gap is to survey the secretomes of plant pathogens, and to identify key similarities and differences in their secretion systems and secreted proteins. In this paper we have tried to provide an overview of the tools and techniques available to researchers who wish to construct a detailed profile of a Proteobacterial secretome. We have used homology searches, domain analyses and signal peptide predictions to generate first-pass profiles of the secretomes of nine sequenced plant pathogens. These analyses suggest each genome contains tens or hundreds of uncharacterized secreted proteins, secreted by multiple secretion systems.
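As a rough illustration of what a sequence-based first-pass screen involves, the sketch below flags proteins whose N-terminus contains the strict twin-arginine consensus S/T-R-R-x-F-L-K. It is not the TATFIND or SignalP procedure used for the analyses described in this paper. The function names, the 40-residue window and the toy sequences are assumptions made purely for illustration, and dedicated predictors apply more permissive motif rules together with hydrophobicity and cleavage-site criteria.

```python
import re

# Strict twin-arginine consensus (S/T-R-R-x-F-L-K). Using only the strict
# consensus is an illustrative assumption; real predictors are more permissive.
TAT_CONSENSUS = re.compile(r"[ST]RR.FLK")


def has_tat_consensus(protein_seq, n_term_window=40):
    """Return True if the strict consensus occurs in the N-terminal window.

    n_term_window is a hypothetical cut-off chosen for this sketch.
    """
    n_terminus = protein_seq[:n_term_window].upper()
    return TAT_CONSENSUS.search(n_terminus) is not None


def first_pass_candidates(proteins, n_term_window=40):
    """Return identifiers of proteins whose N-terminus matches the consensus."""
    return [
        name
        for name, seq in proteins.items()
        if has_tat_consensus(seq, n_term_window)
    ]


# Made-up sequences, not taken from any genome.
toy_proteins = {
    "toy_tat": "MNDSRRQFLKGAALAGLSVAASAAQADEKTGDEKTG",
    "toy_cytoplasmic": "MSTEDLKKAVEEHGLDPNALKEGDTVEV",
    "toy_internal_motif": "M" + "A" * 50 + "SRRQFLK",
}

candidates = first_pass_candidates(toy_proteins)
print(candidates)
```

A screen of this kind only nominates candidates; the motif found deep inside the third toy sequence is ignored because it lies outside the N-terminal window, and any hit would still need confirmation by the experimental approaches discussed above.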

We now need to systematically profile the expression of these proteins during pathogenesis; to construct mutations in the full complement of secretion systems present in each model bacterium; and to characterise the secreted proteomes of bacteria grown in inducing media designed to mimic the plant environment. Ideally, such analyses will be complemented by high-throughput analyses of gene expression and function in vitro and in planta. The results of these studies will provide functional confirmation of the identity of secreted proteins, allowing us to refine bioinformatic predictions of promoters, domains and signal sequences.

We have highlighted only a few of many unanswered questions regarding secretion system function, particularly those relating to bioinformatic predictions of the bacterial secretome. One important research area for the future is an increased understanding of how protein “queuing” and secretion system “gating” are controlled in systems that secrete multiple proteins in an organized manner. The availability of sophisticated tools for studying protein-protein interactions is already providing new insight into the molecular mechanisms of protein targeting and translocation [114,222–224]. There are many unanswered questions about the physical assembly and localisation of secretion systems within bacterial cells and how this relates to their function. The increasing resolution of molecular imaging techniques such as electron cryomicroscopy now allows us to view the structural organization and assembly of secretion systems to a resolution of 17 Å [225]. Several studies have suggested that secretion systems are not distributed at random around the cell envelope, but are localized to specific sites such as the cell poles [e.g. 213,226].

The predictions we have discussed in this paper are all based on analyses of protein sequence and structure. Researchers are increasingly aware that mRNA sequence and conformation can have a significant effect on protein targeting and expression. Bioinformatic predictions of mRNA sequence and structure will certainly have an increasing role in secretome predictions in the future. In addition to the potential role of mRNA in targeting substrates to the type III system, mRNA may have a key role in regulating the expression and stoichiometry of secreted proteins. For example, Hienonen et al. [227] have reported that the mRNA of the type III pilin, HrpA, has an unusually long half-life, possibly as a result of a stable GC-rich loop in the 3′ end of the transcript. Future sequence-based predictions may be designed to predict both the destination and the duration of secreted proteins at both the mRNA and the protein level.
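One very simple mRNA-level feature that could feed into such predictions is the GC content of the 3′ end of a transcript. The sketch below computes it for a fixed window; the function names, the 30-nucleotide window and the toy sequence are assumptions for illustration only. GC content is a crude proxy: it does not predict secondary structure or half-life, and it is not the analysis reported by Hienonen et al.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def three_prime_gc(transcript, window=30):
    """Return the GC fraction of the last `window` bases of a transcript.

    `window` is a hypothetical parameter for this sketch; if the transcript
    is shorter than the window, the whole transcript is used.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return gc_fraction(transcript[-window:])


# Made-up transcript: an AU-rich body followed by a GC-rich 3' end.
toy_transcript = "AUAUAUAUAUAUAUAUAUAU" + "GGGGCCCCGC"

whole = gc_fraction(toy_transcript)
tail = three_prime_gc(toy_transcript, window=10)
print(f"whole transcript GC: {whole:.2f}, 3' end GC: {tail:.2f}")
```

Comparing the 3′ window with the transcript as a whole highlights local GC enrichment, which could then be followed up with a dedicated RNA folding tool.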

Finally, we have focused on protein secretion systems as delivery systems for proteins, but there is increasing interest in whether some systems, particularly cell-to-cell secretion systems such as the type III and type IV systems, can act as a one- or two-way conduit for other molecules. We already know that the type IV system secretes both proteins and DNA, but recent studies of peptidoglycan receptors and associated responses have suggested that other small molecules can pass through these channels and subsequently interact with host cell components [228].

Plant pathogens have adapted a wide variety of protein secretion systems for use in host interactions. Greater understanding of the origins, functions and regulation of these secretion systems and their substrates will yield important new insights into the evolution and ecology of plant pathogenic bacteria. The availability of genome sequence data for pathogenic and non-pathogenic bacteria, along with knowledge of the conserved features of protein secretion systems and secreted proteins, means that it is now possible to build a much more accurate profile of the modus operandi and distinguishing features of bacterial “cereal” killers.

9 Note added in proof

The genome sequence of the plant pathogen Xanthomonas oryzae pv. oryzae, listed in the unfinished genomes section of Table 1, has now been published: Lee, B.M., Park, Y.J., Park, D.S., Kang, H.W., Kim, J.G., Song, E.S., Park, I.C., Yoon, U.H., Hahn, J.H., Koo, B.S., Lee, G.B., Kim, H., Park, H.S., Yoon, K.O., Kim, J.H., Jung, C.H., Koh, N.H., Seo, J.S. and Go, S.J. (2005) The genome sequence of Xanthomonas oryzae pathovar oryzae KACC10331, the bacterial blight pathogen of rice. Nucl. Acids Res. 33, 577–586.

Acknowledgements

The authors thank Kieran Dilks for supplying us with the TATFIND1.2 software and helpful discussions; Soren Brunak for supplying us with SignalP3.0 software; Tracy Palmer for helpful discussion and reading of the manuscript. G.M.P. is supported by The Royal Society, NERC and the BBSRC. D.J.S. is supported by the Gatsby Charitable Foundation. I.C. is supported by the Swiss National Science Foundation.

Appendix A Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.fmrre.2004.12.004.
