The multidrug/oligosaccharidyl-lipid/polysaccharide (MOP) exporter superfamily

Authors


M. Saier, Division of Biological Sciences, University of California at San Diego, La Jolla, CA 92093 0116, USA. Fax: 001 858 534 7108, Tel.: 001 858 534 4084, E-mail: msaier@ucsd.edu

Abstract

The multidrug/oligosaccharidyl-lipid/polysaccharide (MOP) exporter superfamily (TC #2.A.66) consists of four previously recognized families: (a) the ubiquitous multi-drug and toxin extrusion (MATE) family; (b) the prokaryotic polysaccharide transporter (PST) family; (c) the eukaryotic oligosaccharidyl-lipid flippase (OLF) family and (d) the bacterial mouse virulence factor family (MVF). Of these four families, only members of the MATE family have been shown to function mechanistically as secondary carriers, and no member of the MVF family has been shown to function as a transporter. Establishment of a common origin for the MATE, PST, OLF and MVF families suggests a common mechanism of action as secondary carriers catalyzing substrate/cation antiport. Most protein members of these four families exhibit 12 putative transmembrane α-helical segments (TMSs), and several have been shown to have arisen by an internal gene duplication event; topological variation is observed for some members of the superfamily. The PST family is more closely related to the MATE, OLF and MVF families than any of these latter three families are related to each other. This fact leads to the suggestion that primordial proteins most closely related to the PST family were the evolutionary precursors of all members of the MOP superfamily. Here, phylogenetic trees and average hydropathy, similarity and amphipathicity plots for members of the four families are derived and provide detailed evolutionary and structural information about these proteins. We show that each family exhibits unique characteristics. For example, the MATE and PST families are characterized by numerous paralogues within a single organism (58 paralogues of the MATE family are present in Arabidopsis thaliana), while the OLF family consists exclusively of orthologues, and the MVF family consists primarily of orthologues. 
Only in the PST family has extensive lateral transfer of the encoding genes occurred, and in this family as well as the MVF family, topological variation is a characteristic feature. The results serve to define a large superfamily of transporters that we predict function to export substrates using a monovalent cation antiport mechanism.

Abbreviations

ABC, ATP-binding cassette
MATE, multi-drug and toxin extrusion
MOP, multidrug/oligosaccharidyl-lipid/polysaccharide
MPA, membrane-periplasmic auxiliary
MVF, mouse virulence factor
OLF, oligosaccharidyl-lipid flippase
PST, prokaryotic polysaccharide transporter
TMS, transmembrane segment

Major families of drug efflux pumps

Bacterial species that have developed clinical resistance to antimicrobial agents are increasing in numbers and have become a serious problem in hospitals [1]. One of the major mechanisms of drug resistance in both prokaryotes and eukaryotes involves drug efflux from cells. There are many drug efflux systems known in bacteria [2–5], and these belong to five ubiquitous transporter (super)families [6,7]. Four of them (RND, DMT, MFS and MATE; see below) use drug/H+ or Na+ antiport to energize drug efflux while one, the ATP-binding cassette (ABC), uses ATP hydrolysis.

Most characterized members of the resistance/nodulation/division (RND) superfamily [8] function as drug or heavy metal efflux pumps in Gram-negative bacteria. Homologues in Gram-positive bacteria serve as lipid exporters [9]. Small multidrug resistance (SMR) family pumps within the drug/metabolite transporter (DMT) superfamily [10] consist of homodimeric or heterodimeric structures with only four TMSs per subunit [11]. They export cationic drugs using a simple cation antiport mechanism involving a conserved glutamyl residue [12]. Drug exporters of the major facilitator superfamily (MFS) are found within six families, each known to transport a broad range of structurally distinct drugs [13,14].

The ABC superfamily of ATP-driven transporters includes many families that are potentially active in the uptake or efflux of metabolite analogues and other drugs. For instance, the oligopeptide uptake transporter of Salmonella typhimurium takes up aminoglycoside antibiotics such as kanamycin and neomycin (reviewed in [7]). Uptake systems segregate from the efflux systems phylogenetically [15]. Members of this superfamily can also act on many types of macromolecules, a unique characteristic of the ABC superfamily [16].

The MATE family of drug exporters (TC #2.A.66.1)

Only a few members of the multidrug and toxin extrusion (MATE) family [3] are characterized functionally (Table 1). These proteins include: (a) NorM and (b) VmrA in Vibrio parahaemolyticus, a halophilic marine bacterium that is one of the major causes of food poisoning in Japan [17–19]; (c) YdhE from Escherichia coli, a close homologue of NorM [20]; (d) Alf5 from the plant, A. thaliana [21]; (e) VcmA from Vibrio cholerae non-O1, a nonhalophilic Vibrio species [22] and (f) BexA from Bacteroides thetaiotaomicron [23]. One member of the family, the yeast Erc1 protein, elevates resistance to the methionine analogue, ethionine [24].

Table 1. Functionally characterized members of the multidrug and toxin extrusion (MATE) family. Drug classes: a, quinolones and fluoroquinolones; b, aminoglycosides; c, anthracyclines such as adriamycin hydrochloride; d, trimethoprim-sulfamethoxazole; e, miscellaneous antibiotics.
Organism | Abbreviation | Gene name (Accession number) | Substrates (drug class) | References
Arabidopsis thaliana | Ath57 | Alf5 (BAB02774) | Tetramethylammonium, PVP (polyvinylpyrrolidone), pyrrolidinone | [21]
Escherichia coli | Eco2 | NorM (YdhE) (P37340) | Ciprofloxacin^a, berberine, kanamycin^b, streptomycin^b, acriflavine, tetraphenylphosphonium ion (TPP), chloramphenicol^e, norfloxacin^a, enoxacin^a, fosfomycin, doxorubicin^c, trimethoprim^d, ethidium bromide, benzalkonium, deoxycholate | [1,25]
Bacteroides thetaiotaomicron | Bth1 | BexA (BAB64566) | Norfloxacin^a, ciprofloxacin^a, ethidium bromide | [19]
Vibrio cholerae | Vch1 | NorM (VcmA) (Q9KRU4) | Norfloxacin^a, ciprofloxacin^a, ofloxacin^a, daunomycin^c, doxorubicin^c, streptomycin^b, kanamycin^b, ethidium bromide, 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI), Hoechst 33342, acriflavine | [10]
Vibrio parahaemolyticus | Vpa1 | NorM (O82855) | Norfloxacin^a, ethidium bromide, kanamycin^b, ciprofloxacin^a, streptomycin^b | [25]
Vibrio parahaemolyticus | Vpa3 | VmrA (BAB68204) | 4′,6-diamidino-2-phenylindole (DAPI), TPP, acriflavine, ethidium bromide | [17]
Saccharomyces cerevisiae | Sce3 | Erc1 (382954) | Ethionine | [28]

NorM, VmrA and VcmA have been shown to function by a drug/Na+ antiport mechanism [17,19,22], and other members of the superfamily may also be drug/Na+ antiporters. Several members are annotated as DinF proteins. The functions of the DinF proteins are unknown, but expression of some of these proteins has been shown to be induced by DNA damage [25,26]. A function related to the export of nucleotides excised from damaged DNA during photorepair can be postulated.

The polysaccharide transporter (PST) family (TC #2.A.66.2)

Characterized protein members of the PST family are generally 400–500 amino acid residues in size and traverse the membrane 12 times as putative α-helical TMSs. Analyses conducted in 1997 [27] showed that members of the PST family formed two major clusters, one of which was concerned putatively with lipopolysaccharide O-antigen repeat unit export (flipping) in Gram-negative bacteria, the other of which was concerned with exopolysaccharide or capsular polysaccharide export in both Gram-negative and Gram-positive bacteria. However, numerous archaeal homologues are now recognized, and bacteria use PST systems to export other complex carbohydrates such as teichuronic acids [28]. The mechanism of energy coupling for PST exporters is not established.

PST transporters may function together with auxiliary proteins that regulate transport and allow passage of complex carbohydrates across both membranes of the Gram-negative bacterial envelope [29]. Thus, each Gram-negative bacterial PST system specific for an exo- or capsular polysaccharide functions in conjunction with a cytoplasmic membrane-periplasmic auxiliary (MPA) protein with a cytoplasmic ATP-binding domain (MPA1-C; TC #8.A.3) as well as an outer membrane auxiliary protein (OMA; TC #1.B.18) [27]. Each Gram-positive bacterial PST system functions in conjunction with a homologous MPA1 + C pair of proteins (TC #8.A.3) equivalent to an MPA1-C protein of Gram-negative bacteria. The C-domain has been shown to possess tyrosine protein kinase activity, suggesting that it functions in a regulatory capacity [30]. The lipopolysaccharide exporters may function specifically in the translocation of the lipid-linked O-antigen side chain precursor from the inner leaflet of the cytoplasmic membrane to the outer leaflet, but this possibility has not been established experimentally.

The oligosaccharidyl-lipid flippase (OLF) family (TC #2.A.66.3)

N-Linked glycosylation in eukaryotic cells follows a conserved pathway in which a tetradecasaccharide substrate (Glc3Man9GlcNAc2) is assembled initially in the endoplasmic reticular (ER) membrane as a dolichylpyrophosphate (Dol-PP)-linked intermediate before being transferred to an asparaginyl residue in a lumenal protein. An intermediate, Man5GlcNAc2-PP-Dol, is made on the cytoplasmic side of the membrane and translocated across the membrane so that the oligosaccharide chain faces the ER lumen where biosynthesis continues to completion [31]. The exporter in Saccharomyces cerevisiae that catalyzes the translocation step is the 574 amino acid nuclear division Rft1 protein with 12 putative TMSs [31]. Homologues are found in plants, animals and fungi.

The mouse virulence factor (MVF) family (TC #2.A.66.4)

A single member of the MVF family, MviN of Salmonella typhimurium, has been shown to be an important virulence factor for this organism when infecting the mouse [32]. In several bacteria, genes encoding MviN homologues occur in operons that also encode the uridylyl transferase, GlnD, that functions in the regulation of nitrogen metabolism [33]. Nothing more is known about the function of MviN or any other member of the MVF family. However, as will be shown below, these proteins are related to members of the PST and MATE families with greatest sequence similarity to members of the PST family. It is therefore possible that MVF family members are related functionally to PST family members exporting complex carbohydrates or related substances.

The MOP superfamily

In this paper, we show that the MATE family of drug exporters, the PST family of polysaccharide exporters, the OLF family of lipid-linked oligosaccharide exporters and the MVF family of mouse virulence-related proteins are all homologous and are, therefore, related by common descent. We designate the superfamily that includes these four related families the MOP (MATE/MVF-OLF-PST) superfamily. Currently sequenced members of these families are identified, and the distribution of their members in the living world is determined. While MATE family members are found in all domains of life (bacteria, archaea and eukaryotes), PST family members are restricted to prokaryotes (both archaea and bacteria), OLF family members are restricted to eukaryotes and MVF family members are restricted to bacteria. In contrast to the MATE and PST families that exhibit multiple paralogues in any one organism, the eukaryotic OLF family is currently very small, consisting of only eight sequenced orthologous members with no more than one homologue per organism, and the MVF family, while much larger, probably also consists primarily of orthologues. Because at least some members of the prokaryotic-specific PST family can flip lipid-linked oligosaccharides (i.e. O-antigen precursors of lipopolysaccharides in Gram-negative bacteria), members of this family may serve as the functional counterpart of the oligosaccharidyl lipid exporters of eukaryotes. The reported sequence analyses lead us to suggest that prokaryotic oligosaccharidyl-lipid exporters were the primordial systems that gave rise to all members of the MOP superfamily. We tabulate these proteins according to family and derive reliable multiple alignments upon which phylogenetic trees are based (see our align website: http://www.biology.ucsd.edu/msaier/align.html/). 
We also derive average hydropathy, similarity and amphipathicity plots which allow us to make transmembrane topological predictions, and these in turn lead to predictions regarding the evolutionary origins of protein topological types found within the MOP superfamily. Most importantly, our results allow us to propose that all members of the MOP superfamily function as secondary efflux carriers using a solute/cation antiport mechanism.

Computer methods

Sequences of the proteins that comprise the MATE, PST, OLF and MVF families were obtained separately by initial screening procedures involving psi-blast [34] and recursive psi-blast searches [10] using the screentransporter program without iterations [35]. Recognizable members were retrieved from genbank [36], swiss-prot and trembl [37] and the nonredundant database, nrdb90 [38] (e-value 10⁻³). Homologues were retrieved in the period March–June 2002.

Multiple sequence alignments were constructed using the clustal x program [39]. The gap penalty and gap extension values used with the clustal x program were 10 and 0.1, respectively, although other combinations were tried. Average hydropathy, similarity and amphipathicity plots were derived from clustal x alignments using the avehas program [40]. Phylogenetic trees were derived by the neighbor-joining method from alignments generated with the clustal x program using the blosum 62 scoring matrix. The phylogenetic trees were drawn using the treeview program [1,41].

Charge bias analyses of membrane protein topology were performed using the hmmtop [42], tmhmm [43], what [44] and toppred2 [45] programs. Motif searches were conducted using the meme program [46]. The statistical significance of intrafamilial and interfamilial protein sequence similarities (i.e. between members within each of the four families of the MOP superfamily as well as between members of the four constituent families) was established using the gap (ic) program [47,48] and the prss program [49] with the blosum 62 scoring matrix, a gap opening penalty of −8, a gap extension penalty of −2 and 500 random shuffles. The blast2 program [50] was used additionally for comparison of two sequences. The ic program was similarly used for analyses of internally duplicated intraprotein segments. Binary comparison scores are expressed in standard deviations (SD) [51]. A value of 9 SD is deemed sufficient to establish homology [52]. The tms-split program [53], which combines a TMS-prediction program (hmmtop) with a multiple alignment program (clustal w), was used in conjunction with the ic program to identify internal duplications within a family. The tms-align program [53] was used to position TMSs in one protein relative to its homologues. Thus, the positions of the extra TMSs in 14 TMS proteins relative to their 12 TMS homologues could be determined using this program.
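The idea behind a comparison score expressed in standard deviations can be illustrated with a minimal sketch. It is not the gap, ic or prss program: for brevity it assumes a simple match/mismatch scoring scheme and a linear gap penalty in place of the blosum 62 matrix and the separate gap opening and extension penalties, and the function names are hypothetical. The score of the real alignment is compared with the distribution of scores obtained after shuffling one of the two sequences, which preserves its composition but destroys its order.

```python
import random
import statistics


def global_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Assumption: a match/mismatch scheme stands in for a substitution matrix.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def comparison_score_sd(a, b, shuffles=500, seed=0):
    """Return (real score - mean shuffled score) / SD of the shuffled scores."""
    rng = random.Random(seed)
    real = global_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(shuffles):
        rng.shuffle(residues)  # same composition, randomized order
        shuffled_scores.append(global_score(a, "".join(residues)))
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores do not vary; cannot compute a z-score")
    return (real - statistics.mean(shuffled_scores)) / sd
```

Under these assumptions, calling comparison_score_sd on the two halves of a protein gives a value that can be read against the 9 SD criterion quoted above, although absolute values will differ from those produced with blosum 62 and affine gap penalties.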

The four tables of family members (Tables S1–S4) and the multiple alignments from which the results reported in this paper were derived (Figs. S1–S4), as well as additional supplementary supporting data can be found on our ALIGN website.

Results

The four families of the MOP superfamily

The general characteristics of the four currently recognized families within the MOP superfamily are summarized in Table 2. Columns 1 and 2 present the family abbreviations and the TC # while column 3 gives the number of family members identified. The MATE family is the largest with 203 members while the PST, MVF and OLF families are of decreasing sizes in that order (155, 45 and 8 recognized members, respectively). As shown by the results presented in column 4 of Table 2, most of the members of these four families fall within the same size range. However, a few of the homologues were much larger (Table 2). Thus, in the MATE family, two plant proteins, Ath10 and Ath8, have 1094 and 746 amino acid residues, respectively. The extended hydrophilic regions in these two proteins did not show sequence similarity with anything else in the databases. Of greater interest were four large MVF family homologues, all from high G + C Gram-positive bacteria. These four proteins were Mle of Mycobacterium leprae (1206 amino acid residues), Mtu of M. tuberculosis (1184 amino acid residues), Cgl of Corynebacterium glutamicum (1083 amino acid residues) and Sco of Streptomyces coelicolor (811 amino acid residues). Except for Sco, these proteins exhibit domains (residues ≈ 720–950) that are homologous to regions of eukaryotic-type serine/threonine kinases. Presumably, these C-terminal domains function in a regulatory capacity, possibly to control the activities of the N-terminal transporter protein domains.

Table 2. Characteristics of the families of the MOP superfamily. For the MATE family, two larger homologues were found in Arabidopsis thaliana: Ath10, 1094 aas, and Ath8, 746 aas (see web table S1, Results section). No sequence similarity was observed for the extra portions of these proteins. For the OLF family, two homologues in Kluyveromyces lactis, Kla, 417 aas, and A. thaliana, Ath, 401 aas (see web table S3, Results section) are believed to be fragments. For the MVF family, four larger homologues, all from high G + C Gram-positive bacteria, were found in Mycobacterium leprae, Mle, 1206 aas, Mycobacterium tuberculosis, Mtu, 1184 aas, Corynebacterium glutamicum, Cgl, 1083 aas and Streptomyces coelicolor, Sco, 811 aas. Mle, Mtu and Cgl all include C-terminal domains (residues 720–950) that are homologous to each other and to domains in eukaryotic-type serine/threonine protein kinases. A, archaea; B, bacteria; E, eukaryotes; aas, amino acid residues per polypeptide chain; TMSs, transmembrane α-helical segments.
Family | TC number | Number of members | Size range (number aas) | Number TMSs | Distribution among organisms
MATE | 2.A.66.1 | 203 | 400–695 | 12; 13 | A, B, E
PST | 2.A.66.2 | 155 | 346–582 | 12; 14 | A, B
OLF | 2.A.66.3 | 8 | 401–547 | 12 | E
MVF | 2.A.66.4 | 45 | 461–555 | 10, 12, 13, 14, 15 | B

Column 5 in Table 2 summarizes the topological types identified within each of the four families of the MOP superfamily. MATE and OLF family permeases may all have 12 (or possibly 13) TMSs, but about one-third of all PST family members are predicted to have 14 TMSs, and MVF family members have 10, 12, 13, 14 or 15 putative TMSs. These will be analyzed in greater detail below.

Finally, as summarized in column 6 of Table 2, each of the four families within the MOP superfamily has a distinctive organismal distribution. While the MATE family is present in all three domains of living organisms (archaea, bacteria and eukaryotes), the PST family is found both in archaea and bacteria but not in eukaryotes, while the OLF and MVF family members are restricted to eukaryotes and bacteria, respectively.

The MATE family

Our searches revealed that the MATE family contains 203 currently sequenced proteins, including representatives from all three domains of life (Table S1). In this sense, the family is ubiquitous. The family could be divided into 15 subfamilies (see phylogenetic tree displayed in Fig. 1). Most of the members are of about 450–550 amino acid residues in length and possess 12 putative TMSs. The yeast proteins are larger (up to about 700 residues) whereas the archaeal proteins are generally smaller. Large transporter size is a characteristic of the eukaryotic domain while small size is a characteristic of the archaeal domain [11].

Figure 1.

Phylogenetic tree for the multidrug and toxin extrusion (MATE) family. The tree was derived using the clustal x program. The 15 subfamilies are labeled 1–15 together with the class of organisms from which the included proteins were derived; B, bacteria; Ar, archaea; An, animals; Y, yeast; Pl, plants. The arrow indicates the probable root of the tree as determined with outlying sequences.

Table S1 presents a summary of the 15 subfamilies (phylogenetic clusters) of the MATE family. The subfamilies, some of which include sequence divergent proteins, are as presented in the phylogenetic tree shown in Fig. 1. The subfamily numbers as well as the names, organismal sources and abbreviations of the members of the MATE family are presented in columns 1–4 of Table S1. A short description of the proteins (column 5), the gene names, accession numbers and the database sources (genbank, trembl and swiss-prot) (columns 6–8) are also provided. Finally, columns 9–10, respectively, present the protein sizes in numbers of amino acid residues and the numbers of putative transmembrane α-helical TMSs per polypeptide chain, based on hydropathy plots. The same format of presentation is used for tabulation of the proteins of the PST, OLF and MVF families (see Tables S2–S4).

The functionally characterized members of the family are found in subfamilies 1 (Sce3), 3 (Ath57), 4 (Eco2, Vch1, Vpa1), 7 (Vpa3) and 9 (Bth1). Subfamily 1 consists exclusively of yeast proteins; subfamily 2 includes only mammalian proteins, and subfamily 3 contains only plant proteins, mostly from A. thaliana. Most of the other subfamilies consist exclusively of bacterial and/or archaeal proteins. Of them, only subfamilies 6, 7 and 10 include proteins from both of these prokaryotic domains. In subfamily 14, plant and bacterial proteins cluster very loosely together. Thus, seven subfamilies are bacterial specific, three include both archaeal and bacterial proteins, one is archaeal specific, one includes bacterial and plant proteins, and three are eukaryotic specific. The three eukaryotic subfamilies consist of yeast, animal and plant proteins, respectively (Fig. 1).

Many organisms exhibit multiple MATE family paralogues. For example, among the bacteria, E. coli and Bacillus subtilis each have four paralogues, Listeria innocua and L. monocytogenes each have six and Clostridium perfringens has eight (Table S1). In the eukaryotic kingdom, both S. cerevisiae and S. pombe have three paralogues, but these are not all orthologous to each other. Most impressively, A. thaliana has 58 MATE family paralogues. No archaeon has more than four MATE family paralogues. Individual paralogues from a single species may either be closely related, presumably arising from a recent gene duplication event, or distantly related, arising from an earlier gene duplication event. Extensive phylogenetic studies of more than 70 transporter families have shown that substrate specificity typically correlates with phylogeny [54–56] although exceptions have been reported [57]. This fact allows functional predictions for many uncharacterized transporters.

In Subfamily 7, the Thermotoga maritima homologue clusters loosely together with three archaeal proteins. T. maritima is of evolutionary significance because small-subunit ribosomal RNA phylogeny has suggested that this bacterium is one of the deepest and most slowly evolving bacterial lineages [58]. By using whole-genome similarity comparisons, T. maritima appears to be the most archaeal-like of all sequenced bacteria. It has been suggested that much of the similarity between T. maritima and the archaea is due to a shared ancestry of portions of their genomes as a result of lateral gene transfer [59]. In subfamilies 6 and 10, the archaeal proteins are so distant from the bacterial homologues that the results are probably consistent with vertical transmission from a common ancestor without lateral transfer.

Drug resistances demonstrated for characterized MATE family drug/Na+ antiporters are listed in Table 1. These proteins mediate resistance to a wide range of cationic dyes, fluoroquinolones, aminoglycosides and other structurally diverse antibiotics and drugs. It is interesting to note that while cationic dyes are generally amphipathic and positively charged, aminoglycosides are strongly hydrophilic, and norfloxacin is amphiphilic. Thus, MATE family transporter substrates are diverse in nature.

Average hydropathy and similarity plots for the MATE family are shown in Fig. 2A. All 12 peaks of hydrophobicity are well conserved. Two additional peaks that are very poorly conserved are found just preceding and following conserved peak 12. The peak of hydrophobicity preceding TMS 12 is due to an inserted sequence in just one protein, Hsp2 of the archaeon Halobacterium sp. NRC-1, while the C-terminal peak following putative TMS 12 is due to extension of the animal homologues. These few proteins may have 13 rather than the usual 12 TMSs that characterize the MATE family. The ‘extra’ regions in these few proteins are presumably nonessential for transport function.

Figure 2.

Average hydropathy plots (top) and average similarity plots (bottom) for the MATE (A), PST (B), OLF (C) and MVF (D) families. The avehas program [40] was used to generate the plots with a window size of 19. Alignment position is indicated at the bottom of the figures. The numbers above the hydropathy plots indicate the numbers of the putative TMSs. In A and B, but not C and D, nonhomologous hydrophilic extensions were removed prior to graph generation.
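How an average hydropathy curve of this kind can be computed is sketched below. This is an illustration, not the avehas program: it assumes the Kyte-Doolittle hydropathy scale (the scale used by avehas is not stated here), averages hydropathy over the residues in each alignment column while ignoring gaps, and then smooths the column profile with a sliding window of 19 positions. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_hydropathy(alignment):
    """Mean hydropathy of each alignment column, ignoring gaps and unknowns."""
    profile = []
    for column in zip(*alignment):
        values = [KD[r] for r in column if r in KD]
        profile.append(sum(values) / len(values) if values else 0.0)
    return profile


def smooth(profile, window=19):
    """Sliding-window mean; positions without a full window are dropped."""
    half = window // 2
    return [
        sum(profile[i - half:i + half + 1]) / window
        for i in range(half, len(profile) - half)
    ]
```

Peaks in the smoothed profile that span roughly 20 positions are the candidates for transmembrane segments; a window of 19 approximates the number of residues needed for an α-helix to cross the bilayer.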

Figure 3 shows an alignment of the first half of a MATE family protein with the second half of the same protein (PAB0243 from Pyrococcus abyssi). The two halves exhibit 40–50% similarity, 30% identity and a comparison score of 14.5 SD. These values are sufficient to establish homology [52]. Homology between the two halves of members of the PST and MVF families could also be established but not for the two halves of members of the OLF family.

Figure 3.

Binary alignment of the first half of a MATE family protein with its second half. This protein is one of the three paralogues from Pyrococcus abyssi (PAB0243). The two halves were aligned using the gap program with the blosum 62 scoring matrix, a gap opening penalty of −8, a gap extension penalty of −2 and 500 random shuffles. The two halves have a similarity of 40.5%, an identity of 29.8% and a comparison score of 14.5 SD. |, an identity; :, a close conservative substitution; ·, a more distant conservative substitution.

The PST family

The sequenced proteins of the PST family are tabulated in Table S2. These proteins are derived exclusively from bacteria and archaea. However, many diverse groups of these organisms are represented. The transport functions of these systems are indicated when gene position or biochemical evidence allows postulation of their substrates. The format of presentation for Table S2 (as well as Tables S3 and S4) is as for Table S1. Interestingly, and in contrast to MATE family members, many PST family members are predicted to exhibit 14 rather than 12 TMSs. However, few proteins are predicted to have odd numbers of TMSs (11 or 13). As will be shown below, the extra two TMSs in the 14 TMS PST family proteins are localized to the C-termini of these proteins.

A dendrogram for the PST family is shown in Fig. 4. Of the 12 clusters shown, only clusters 1, 2, 6 and 12 are restricted to the bacterial domain. All other clusters include both archaeal and bacterial proteins. This surprising observation shows that protein phylogeny does not correlate with organism phylogeny. In contrast to most families of transporters, including the MATE family of the MOP superfamily, extensive horizontal transfer may have occurred during the evolution of the PST family.

Figure 4.

Phylogenetic dendrogram for the polysaccharide transporter (PST) family. The dendrogram was derived essentially as described in the legend to Fig. 1. The 12 subfamilies are labeled 1–12 together with the class of organisms from which the included proteins were derived; B, bacteria; A, archaea.

In some cases we were able to provide convincing evidence that horizontal transfer had in fact occurred. For example, in subfamily 5, Mth3, from the archaeon Methanobacterium thermoautotrophicum, and Cac1, from the bacterium Clostridium acetobutylicum, gave a blast E-value of e−66 with 39% identity and 62% similarity. The gene encoding the Clostridium acetobutylicum protein (but not that encoding the Methanobacterium thermoautotrophicum homologue) showed a G + C content that differed substantially from that of the DNA of this organism overall (0.31 for the genome and 0.24 for the gene). These results taken together provide strong evidence for lateral transfer of PST family genes across the bacterial–archaeal boundary. It should be noted that evidence for lateral transfer of genes encoding cell surface bacterial polysaccharide biosynthetic enzymes is extensive [60–62].
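The G + C comparison used as evidence here amounts to a simple calculation, sketched below with hypothetical function names. The fraction of G and C bases is computed for the gene and for the genome, and the difference is reported; for the values quoted above, the gene lies 0.07 below the genome (0.24 versus 0.31). Ambiguous bases such as N are ignored, which is an assumption of this sketch.

```python
def gc_fraction(dna):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in dna.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "GC" for base in counted) / len(counted)


def gc_deviation(gene, genome):
    """G + C fraction of the gene minus that of the genome."""
    return gc_fraction(gene) - gc_fraction(genome)
```

A deviation on its own does not prove lateral transfer, since base composition also varies within genomes; in the text it is combined with the unusually high cross-domain sequence similarity.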

Figure 2B shows the average hydropathy (top) and similarity (bottom) plots for PST family members. Fourteen peaks of hydropathy are evident, and for each of the first 12 such peaks, there is a corresponding peak of average similarity. However, the last two peaks of hydropathy are not well conserved. These two peaks represent the extra peaks present in a minority of PST family members. The proteins exhibiting 14 rather than 12 putative TMSs can be identified by examining the data presented in Table S2.

About one-third of the PST family members were predicted to exhibit 14 TMSs. Surprisingly, these proteins were found in most subfamilies although only in subfamily 8 did the 14 TMS homologues predominate. In some cases, fairly close homologues were predicted to differ in topology. For example, Axy1 of Acetobacter xylinum (14 TMSs) and Pae6 of Pseudomonas aeruginosa (12 TMSs) in cluster 1 had essentially identical topologies except that Axy1 had a C-terminal extension including the extra two TMSs that were lacking in Pae6. In fact, all 14 TMS proteins that were checked carefully were homologous throughout their first 12 TMSs to the 12 TMSs of their shorter homologues but had an extra C-terminal 2 TMS segment. It was therefore concluded that the 14 TMS topological types arose from the 12 TMS proteins by addition of two TMSs at the C-termini. The phylogenetic analyses suggest that this event has occurred repeatedly throughout the evolutionary history of the PST family.

The OLF family

The proteins of the OLF family are presented in Table S3 and the corresponding phylogenetic tree is shown in Fig. 5. All eukaryotic organisms with a fully sequenced genome have one and only one OLF family member with the notable exception of Plasmodium falciparum, a eukaryotic parasite that lacks N-glycoproteins [63] and also lacks an OLF family homologue. These proteins are of 401–547 residues in length and display variable numbers of putative TMSs, from eight to 14. The two proteins with only eight putative TMSs, from Kluyveromyces lactis and A. thaliana, may be incomplete sequences, due to incomplete sequencing and to nonrecognition of exons, respectively. The other proteins are predicted to have 11–14 TMSs, and this prediction is dependent on the TMS prediction program used. The actual numbers of TMSs may be 12 as suggested by Helenius et al. 2002 [31]. The phylogeny of these proteins follows that of the organisms with the fungal, plant and animal proteins segregating as expected. This fact suggests orthologous relationships for all family members and therefore suggests a common function (Fig. 5).

Figure 5.

Phylogenetic tree for oligosaccharidyl-lipid flippase (OLF) family proteins. See the legend to Fig. 1 for format of presentation.

The average hydropathy and average similarity plots for the six full length members of the OLF family are shown in Fig. 2C. We interpret the results in terms of a 12 TMS topology, the same as the major topological type observed for the MATE and PST families. However, the first two putative TMSs are not strongly hydrophobic, and it is therefore possible that these are localized to the cytoplasmic side of the membrane as has been shown for members of the chromate-resistance (CHR) family of transporters (TC #2.A.51) [64]. Although the proteins of the OLF family can be hypothesized to exhibit a 6 + 6 TMS topology with a large, well-conserved cytoplasmic loop between putative TMSs 6 and 7 (Fig. 2C), homology between the two halves of these proteins could not be demonstrated.

As noted above, the first two putative TMSs displayed in Fig. 2C are quite hydrophilic, and they were therefore examined in greater detail. When putative TMS 1 of each full-length OLF family member was drawn as an α-helical wheel, the helices (which lack prolyl and glycyl residues) proved to be strongly amphipathic, with three fully conserved hydrophilic residues (helix residues Q4, R8 and N15) tightly clustered on one side of the helix. All other residues, including the fully conserved F12, proved to be strongly hydrophobic or slightly semipolar (data not shown). Putative helix 2 was similarly amphipathic, with four well-conserved hydrophilic or semipolar residues (S5, E9, Q12 and S13) localized to one side of the helix. These two helices could provide a partially hydrophilic transmembrane pathway for passage of lipid-linked oligosaccharides through the membrane. Alternatively, these two putative helices may be localized to the cytoplasmic surface of the membrane. The remarkable conservation of Q4, R8 and N15 in helix 1 suggests an important function for these residues.
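The clustering of these residues on one face of each helix can be checked with a simple helical-wheel calculation. The sketch below is illustrative only: it assumes ideal α-helix geometry (100° of rotation per residue), and the function names `wheel_angle` and `arc_spanned` are hypothetical helpers, not part of any program used in this study.

```python
# Helical-wheel projection under an assumed ideal alpha-helix geometry:
# 3.6 residues per turn, i.e. 100 degrees of rotation per residue.
DEGREES_PER_RESIDUE = 100.0  # assumption, not a value taken from the paper


def wheel_angle(position):
    """Angle (degrees) of a 1-based helix position on the wheel."""
    return ((position - 1) * DEGREES_PER_RESIDUE) % 360.0


def arc_spanned(positions):
    """Smallest arc (degrees) of the wheel that contains all positions."""
    angles = sorted(wheel_angle(p) for p in positions)
    if len(angles) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 360.0 - angles[-1])  # gap across the 0/360 boundary
    return 360.0 - max(gaps)


# Conserved hydrophilic or semipolar positions quoted in the text.
helix1 = {"Q4": 4, "R8": 8, "N15": 15}
helix2 = {"S5": 5, "E9": 9, "Q12": 12, "S13": 13}

for name, residues in (("helix 1", helix1), ("helix 2", helix2)):
    angles = {label: wheel_angle(pos) for label, pos in residues.items()}
    print(name, angles, "arc:", arc_spanned(residues.values()))
```

Under this assumed geometry, Q4, R8 and N15 fall within a 40° arc of the wheel, and the four helix 2 residues within a 100° arc, which is consistent with the one-sided clustering described above.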

The MVF family

The phylogenetic tree for the MVF family is shown in Fig. 6. Only two organisms, Pseudomonas aeruginosa and Streptomyces coelicolor, both large genome organisms, have more than a single MVF family member encoded within their genomes, and they have only two MVF family paralogues. Except for the two ‘extra’ paralogues, Pae2 and Sco2 (subfamily 8), all α-, β-, γ- and δ-proteobacterial proteins (23 proteins) are found in the lower half of the tree. These fall into three primary clusters: cluster 1 includes only α-proteobacterial homologues, cluster 2 includes only β- and γ-proteobacterial homologues, and cluster 3 includes the one δ-proteobacterial homologue. Within cluster 1, the phylogenies of all α-proteobacterial homologues follow the phylogenies of the 16S rRNAs [65], suggesting that they are orthologues. Within cluster 2, the phylogenies of most β- and γ-proteobacterial homologues are in accordance with those of the 16S rRNAs except for Vch which clusters with Pmu and Hin but should be between Ype and Pae, and Bap which should be close to Eco [65]. Finally, the separate clustering of the δ-proteobacterial protein, Bba, distant from all other homologues, is as expected.

Figure 6.

Phylogenetic tree for mouse virulence factor (MVF) family proteins. See the legend to Fig. 1 for format of presentation. The 15 subfamilies are indicated 1–15.

The upper part of the tree shows sequence divergent proteins from sequence divergent bacteria. Only a few clusters are noteworthy. Thus, cluster 5 includes all four high G + C Gram-positive bacterial homologues, cluster 6 includes the two cyanobacterial homologues, cluster 8 includes the two low G + C Gram-positive bacterial proteins, cluster 11 includes the two ε-proteobacterial proteins, and cluster 15 includes the chlamydial orthologues. Thus, with the exception of just four proteins (Bap, Vch, Pae2 and Sco2) the protein phylogenies follow the organismal (16S rRNA) phylogenies within experimental error. This fact suggests that most of these bacterial proteins are orthologues, possibly serving a single function.

The G + C contents and codon usage frequencies for the four anomalous genes were compared with the corresponding values for the protein-encoding regions of the genomes of the same organisms. For Bap, the G + C content was 26% for both the gene and the organism. For Vch, both values were 48%; for Sco2, the values were 76% for the gene and 72% for the genome; and for Pae2, the values were 72% for the gene and 67% for the genome. In no case was the codon usage frequency for the gene significantly different from that for the organism as a whole. These approaches therefore failed to provide further evidence for recent horizontal gene transfer of genes encoding MVF homologues.
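The G + C comparison described above amounts to a simple base count. The following is a minimal sketch, assuming plain nucleotide strings as input; `gc_content` is a hypothetical helper, and the only data used are the percentages reported in the text.

```python
def gc_content(sequence):
    """Percentage of G and C among unambiguous bases (A, C, G, T/U)."""
    seq = sequence.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = gc + sum(seq.count(base) for base in "ATU")
    if total == 0:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * gc / total


# G + C percentages reported in the text: (gene, genome coding regions).
reported = {
    "Bap": (26, 26),
    "Vch": (48, 48),
    "Sco2": (76, 72),
    "Pae2": (72, 67),
}

for name, (gene_gc, genome_gc) in reported.items():
    print(f"{name}: gene {gene_gc}%, genome {genome_gc}%, "
          f"difference {gene_gc - genome_gc} percentage points")
```

The largest reported difference is five percentage points (Pae2), in keeping with the conclusion that base composition gives no indication of recent horizontal transfer.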

All MVF family proteins fell within the size range 480–555 amino acid residues except for the high G + C Gram-positive bacterial homologues which were large (811–1184 amino acid residues) exhibiting 15 putative TMSs. Three of these four proteins exhibit soluble protein kinase-like domains of about 250 residues in their C-terminal regions. Most of the proteobacterial homologues appear to have 13 TMSs when analyzed with the tms-align program [53] (see Fig. 2D). Thus, there is probably some topological heterogeneity in the MVF family.

Establishment of homology for the four families of the MOP superfamily

The superfamily principle states that if A is homologous to B, and B is homologous to C, then A is homologous to C [52]. We have previously published the criteria used to establish homology, namely a comparison score in excess of 9 SD for two protein sequences of greater than 60 amino acid residues in length [52]. Nine SD corresponds to a probability of 10⁻¹⁹ that the observed sequence similarity arose by chance [51]. In order to establish homology between two coherent families, it is only necessary to establish homology between one member of each of these families. Two such representative examples for each interfamilial comparison are presented in Table 3, although many more with comparison scores in excess of 9 SD could have been selected. When an equivalent number of nonhomologous proteins were compared [i.e. proteins of the MOP superfamily with proteins of the major facilitator superfamily (MFS; TC #2.A.1)], values never exceeded 7 SD.
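The correspondence between 9 SD and a chance probability of about 10⁻¹⁹ can be reproduced from the upper tail of a standard normal distribution. The sketch below assumes that comparison scores behave as normally distributed z-scores; this is an assumption made here for illustration (the figure itself is taken from [51]), and `upper_tail_probability` is a hypothetical helper name.

```python
import math


def upper_tail_probability(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# 9 SD is the homology criterion; 7 SD is the ceiling observed for
# comparisons between nonhomologous proteins.
for z in (7, 9):
    print(f"{z} SD -> {upper_tail_probability(z):.2e}")
```

Under this assumption the 9 SD threshold corresponds to a one-sided probability on the order of 10⁻¹⁹, several orders of magnitude smaller than that for 7 SD.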

Table 3. Comparison scores establishing homology for the four families of the MOP superfamily. Comparison scores (expressed in standard deviations, SD), percentage identity and percentage similarity were determined using the gap program with 500 random shuffles. blosum 62 was the scoring matrix. A gap opening penalty of −8 and a gap extension penalty of −2 were used. E-value scores were generated using the blast 2 program with blosum 62 as the scoring matrix, a gap opening penalty of 11 and a gap extension penalty of 1. Prot, protein.
Family | Prot. 1 | Organism | Acc. No. | Family | Prot. 2 | Organism | Acc. No. | Comparison score (SD) | % identity | % similarity | E-value
MATE | Cac4 | Clostridium acetobutylicum | AAK81286 | PST | Pab1 | Pyrococcus abyssi | Q9UZH4 | 28 | 22 | 38 | 1e-10
MATE | Cac4 | Clostridium acetobutylicum | AAK81286 | PST | Mka1 | Methanopyrus kandleri AV19 | Q8TUX6 | 19 | 20 | 32 | 5e-07
MATE | Bsu3 | Bacillus subtilis | P54181 | MVF | Fnu1 | Fusobacterium nucleatum subsp. nucleatum ATCC 25586 | NP_603606 | 14 | 26 | 36 | 2e-07
MATE | Hsp2 | Halobacterium sp. NRC-1 | AAG20494 | MVF | Mlo1 | Mesorhizobium loti | NP_106007 | 11 | 27 | 35 | 7e-07
PST | Mac4 | Methanosarcina acetivorans str. C2A | Q8TNU8 | MVF | Fnu1 | Fusobacterium nucleatum subsp. nucleatum ATCC 25586 | NP_603606 | 18 | 26 | 41 | 6e-09
PST | Bsu2 | Bacillus subtilis | O34674 | MVF | Nme1 | Neisseria meningitidis Z2491 | AAF40731 | 19 | 26 | 36 | 2e-06
PST | Cpe1 | Clostridium perfringens | Q8XMR5 | OLF | Cel1 | Caenorhabditis elegans | NP_500581 | 13 | 25 | 37 | 1e-04
PST | Cpe1 | Clostridium perfringens | Q8XMR5 | OLF | Spo1 | Schizosaccharomyces pombe | T40744 | 10 | 23 | 36 | 1.4e-02
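A comparison score expressed in SD can be understood as the observed alignment score minus the mean score obtained after shuffling one sequence, divided by the standard deviation of the shuffled scores. The sketch below illustrates that idea only. It is not the gap program: the scoring scheme (match, mismatch and a linear gap penalty) is a toy stand-in for blosum 62 with separate opening and extension penalties, shuffling only the second sequence is an assumption, and the function names and example peptide are hypothetical.

```python
import random
import statistics

# Toy scoring scheme (assumption): NOT blosum 62 and NOT the gap penalties
# listed in the legend to Table 3.
MATCH, MISMATCH, GAP = 2, -1, -2


def global_alignment_score(a, b):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    previous = [j * GAP for j in range(len(b) + 1)]
    for i, residue_a in enumerate(a, start=1):
        current = [i * GAP]
        for j, residue_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (MATCH if residue_a == residue_b else MISMATCH)
            current.append(max(diagonal, previous[j] + GAP, current[j - 1] + GAP))
        previous = current
    return previous[-1]


def comparison_score(a, b, shuffles=500, seed=0):
    """(observed score - mean shuffled score) / SD of shuffled scores."""
    rng = random.Random(seed)
    observed = global_alignment_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(shuffles):
        rng.shuffle(residues)  # composition is preserved, order is randomized
        shuffled_scores.append(global_alignment_score(a, "".join(residues)))
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores show no variation")
    return (observed - statistics.mean(shuffled_scores)) / sd


# Hypothetical peptide, used only to exercise the functions.
example = "MKTLLVAGFWQRSPEYNDHICMKTAVLGFE"
print(round(comparison_score(example, example, shuffles=200), 1))
```

Because shuffling preserves amino acid composition, a high score of this kind reflects conserved residue order rather than similar composition alone.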

Figure S5 shows a binary alignment of an established MATE family member with an established PST family member. The two proteins exhibit 38% similarity and 22% identity with a comparison score of 28 SD. The corresponding alignment for interconnecting the OLF and PST families is presented in Fig. S6. The two proteins are 37% similar and 25% identical with a comparison score of 13 SD. The corresponding alignment for interconnecting the MVF and PST families is shown in Fig. S7. The two proteins are 36% similar and 26% identical, yielding a comparison score of 19 SD. These comparisons and at least one additional representative interfamilial comparison, giving values in excess of 9 SD, are summarized in Table 3. Many other binary comparisons gave comparison scores of greater than 9 SD. However, the values reported in Table 3 are more than sufficient to establish homology.

Figure 7.

Schematic depiction of the relative degrees of relatedness of the four families of the MOP superfamily.

No member of the OLF family gave a comparison score in excess of 6 SD with a member of the MATE or MVF family, and no member of the MVF family gave a score with a MATE family member as high as the values obtained with PST family members. The values recorded in Table 3 suggest a relative degree of relatedness of the four families as indicated in Fig. 7. It is clear that the PST family is more closely related to the other three families than any two of the latter are related to each other.
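The superfamily principle is a statement of transitivity, so the grouping of the four families can be sketched as a connected-components calculation over the interfamilial links that exceed 9 SD. The example below uses only the higher of the two comparison scores listed for each family pair in Table 3; the union-find helper `homology_groups` is a hypothetical illustration, not a program used in this study.

```python
# Higher of the two comparison scores (SD) per family pair in Table 3.
best_scores = {
    ("MATE", "PST"): 28,
    ("MATE", "MVF"): 14,
    ("PST", "MVF"): 19,
    ("PST", "OLF"): 13,
}
THRESHOLD_SD = 9  # homology criterion stated in the text


def homology_groups(scores, threshold):
    """Group families linked, directly or transitively, by scores > threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        root_a, root_b = find(a), find(b)
        if score > threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for family in list(parent):
        groups.setdefault(find(family), set()).add(family)
    return list(groups.values())


def link_counts(scores, threshold):
    """Number of above-threshold links per family."""
    counts = {}
    for (a, b), score in scores.items():
        if score > threshold:
            counts[a] = counts.get(a, 0) + 1
            counts[b] = counts.get(b, 0) + 1
    return counts


print(homology_groups(best_scores, THRESHOLD_SD))
print(link_counts(best_scores, THRESHOLD_SD))
```

All four families fall into a single group, and PST is the only family linked directly to each of the other three, which is the relationship depicted in Fig. 7.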

Identification of interfamilial conserved motifs

The meme program [46] can be used to identify conserved motifs in families of proteins, and these can be used to identify regions of sequence similarity between families [66]. We therefore applied this program and selected two interfamilial regions of conservation for the pairs of families interconnected by lines in Fig. 7. The results are presented in Table 4. Two members of each of the two families within the MOP superfamily being compared are presented, illustrating the sequence conservation between these families. For each family comparison, two motifs are presented.

Table 4. Interfamilial conserved sequence motifs in the MOP superfamily. Two conserved motifs per family comparison were selected for presentation. Column 1 presents the families from which the proteins selected (columns 2 and 5) were taken. The residue numbers of the first residue in each motif are presented in columns 3 and 6. The actual motifs are presented in columns 4 and 7 with residues conserved between families presented in bold print.
Family | Protein | Residue | Sequence | Protein | Residue | Sequence
PST vs. MVF
PST | Mth3 | 37 | PYLTRVLGP | Sto1 | 253 | GV…ALGNVLLPT
PST | Atu3 | 32 | PILARLLSP | Ssol3 | 243 | GV…ALNNVLLPT
MVF | Pae2 | 101 | PWLVRLLGP | Xfa | 283 | GV  ALGTVILPT
MVF | Sco2 | 103 | PVLVRALAP | Pmu | 275 | GI  AISTVILPT
PST vs. MATE
PST | Lla2 | 84 | IFGIFLVLAFGFGGGII | Mlo3 | 370 | NIGLNVVLIPR FGLWGAAMAT
PST | Bha3 | 403 | IFALATRPELGIMGAAL | Sag1 | 363 | NWLLNLVLIPH YAAYGAAMAT
MATE | Sty1 | 179 | IYGHFGMPELGGIGCGV | Vch3 | 170 | TSVLNLILDPI…LGIDGAAIAT
MATE | Sen1 | 179 | IYGHFGMPELGGIGCGV | Ape1 | 183 | SSILNVILDPI…LGAVGAAVAT
PST vs. OLF
PST | Mlo3 | 376 | VLIPRFGLWGAAMAT | Bsu2 | 59 | GFPAAVSKFVSKYNSKGDY
PST | Sag1 | 369 | VLIPHYAAYGAAMAT | Linn3 | 60 | GIPLAVAKYIAKYNAMEEY
OLF | Spo | 370 | MYIPFMAANGVLEAF | Kla | 108 | GLPLSIILISWQYSNLNSY
OLF | Ath | 273 | LYIIVLAMNGTSEAF | Sce | 121 | GFPLSIGLIAWQYRNINAY
MATE vs. MVF
MATE | Ccr1 | 145 | AEGATL…SLSLPVYA | Cac1 | 163 | DMKTPMKVNL
MATE | Tpa1 | 144 | AEGERY…YTLVPLSF | Pab1 | 153 | DTKTPMKLNI
MVF | Ype | 55 | AEGAFS  QAFVPILA | Sty | 392 | DIKTPVKIAI
MVF | Sty | 68 | AEGAFS  QAFVPILA | Sen | 365 | DIKTPVKIAI

When the PST and MVF families are compared, the two fully conserved motifs are P-L-R-L-P and G…A-V-L-P-T (Table 4). Comparison of the PST and MATE families gave the two motifs shown in Table 4, the second of which is striking in its degree of conservation. The PST/OLF and MATE/MVF comparisons are also presented in Table 4. Although the functional and/or structural significance of these motifs is not known, they illustrate the sequence similarity between families.
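The fully conserved positions quoted for the PST vs. MVF comparison can be recovered directly from the four sequences listed for each motif in Table 4. The sketch below is a simple column-by-column check, not a reimplementation of the meme program; `conserved_columns` is a hypothetical helper, and only the portion of the second motif that follows the gap is used.

```python
def conserved_columns(aligned_sequences, placeholder="."):
    """Keep residues identical in every sequence; mark other columns."""
    lengths = {len(s) for s in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(
        column[0] if len(set(column)) == 1 else placeholder
        for column in zip(*aligned_sequences)
    )


# PST (first two) and MVF (last two) sequences from Table 4.
motif_1 = ["PYLTRVLGP", "PILARLLSP", "PWLVRLLGP", "PVLVRALAP"]
# Second motif, residues following the gap.
motif_2_tail = ["ALGNVLLPT", "ALNNVLLPT", "ALGTVILPT", "AISTVILPT"]

print(conserved_columns(motif_1))       # P.L.R.L.P
print(conserved_columns(motif_2_tail))  # A...V.LPT
```

The identical columns correspond to the motifs P-L-R-L-P and A-V-L-P-T given in the text; the leading G of the second motif precedes the gap and is conserved in all four sequences.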

Discussion

In this paper we provide evidence that four previously recognized families, MATE [3], PST [27], OLF [31] and MVF [32] comprise a single superfamily. All of the functionally characterized proteins that comprise three of these four families transport their substrates with outwardly directed polarity. In the case of the MATE family, a solute/cation antiport mechanism is operative, and for the characterized members of this family, Na+ is used preferentially over H+ as the countertransported cation. The energy coupling mechanisms used by members of the PST and OLF families have not yet been investigated using experimental techniques, and a transport function for a member of the MVF family has not yet been demonstrated. However, inclusion of these four families within a single superfamily allows us to extrapolate from the MATE family to the other three families. We propose that all proteins within these families use a substrate/cation antiport mechanism to energize efflux of biologically important molecules, either small molecules as in the case of MATE family proteins, or macromolecules as in the case of the PST and OLF family proteins. Whether the PST and OLF family porters will prove to use a Na+-coupled mechanism as is true for functionally characterized MATE family members has yet to be determined, but this possibility should not be difficult to test. It seems highly likely that MVF family proteins will prove to be transporters using a similar mechanism.

Members of the MATE, OLF and MVF families all proved to exhibit sufficient sequence similarity with PST family members to establish homology. We could not show a similar degree of sequence similarity between any member of the MATE or MVF family and any member of the OLF family, and the degree of sequence similarity between members of the MATE and MVF families was substantially less than that between MVF and PST family members. The PST family is thus the link between the four families of the MOP superfamily, establishing that all four families are derived from a single origin [52]. These observations lead to the suggestion that proteins most closely related to members of the PST family were the primordial systems of the MOP superfamily.

If the primordial system was a complex carbohydrate exporter similar to members of the PST family, then such a system must have mutated to give rise to a primordial drug resistance exporter, the precursor of the MATE family. A similar pathway has been proposed for the evolution of drug exporters from carbohydrate exporters within the ABC superfamily [27,67]. The appearance of the eukaryotic homologues of the OLF family may have resulted from vertical transmission of a gene encoding a PST family protein to the developing eukaryotic kingdom. The strictly orthologous relationships of OLF family members are consistent with this suggestion.

We have provided evidence that proteins most closely related to the PST family were the primordial transporters and therefore suggest that complex polysaccharide export was the function of these proteins. The presence of numerous PST family paralogues in many prokaryotes is consistent with the interpretation that gene duplication events gave rise to paralogues for the purpose of exporting structurally dissimilar surface polysaccharides or their lipid-linked oligosaccharide precursors. The presence of only one OLF family member per eukaryote exhibiting N-linked protein glycosylation is consistent with the notion that these lipid-linked oligosaccharide exporters arose from a single prokaryotic PST family member, of the same function, possibly by vertical descent. As some current members of the PST family are lipid-linked O-antigen exporters [27], this possibility seems highly plausible. By contrast, the MATE family probably arose by mutation of a primordial PST exporter within the prokaryotic domain, and some of the members of this new family were transmitted to eukaryotes, possibly vertically, but also during the endosymbiotic invasion of the eukaryotic cell by blue-green bacteria to give rise to chloroplasts. The MATE family phylogenetic analyses (Fig. 1) are consistent with the suggestion that two distinct pathways for the introduction of these prokaryotic precursor proteins into eukaryotes may have been followed. Why plants such as A. thaliana have so many MATE family paralogues while yeast and animals have so few is an interesting question, worthy of future experimentation.

We have shown that each of the four families of the MOP superfamily has a different distribution in the living world (Table 1). The MATE family is ubiquitous, and many paralogues are found in eukaryotes as well as prokaryotes [68]. The PST family is widespread in both archaea and bacteria, and multiple paralogues are present in many of these organisms. OLF family proteins are found only in eukaryotes, and all identified members are probably orthologues. Finally, MVF family proteins are found exclusively in bacteria, and only two bacteria were found to have more than one such homologue. In fact, with only four exceptions (out of 45 recognized family members), the protein phylogenies followed the phylogenies of the 16S rRNAs within experimental error, leading to the possibility that most of these proteins are orthologous, serving the same function. What that function is remains a mystery, but there are a few clues: (a) loss of the function of the homologue in S. typhimurium compromises the virulence characteristics of this bacterium in mice; (b) several MVF family members are encoded within operons that also encode GlnD, the uridylate transferase that functions in nitrogen metabolic regulation in many bacteria; and (c) homology with MATE and PST family members leads to the clear suggestion that MVF proteins are secondary carriers catalyzing export of some biologically important molecule. Putting these facts together leads to the hypothesis that MVF family proteins export substances (like amino sugar-containing polysaccharides) that are important for virulence and are regulated in response to nitrogen availability. A functional genomic approach should provide answers to this interesting question.

The proteins of the MOP superfamily exhibit a variety of topological characteristics (Table 2). Thus, while all or most MATE and OLF family members appear to exhibit 12 TMSs, the first two TMSs in the OLF family proteins are strongly amphipathic, and therefore exhibit less hydrophobicity than would otherwise be expected. The same is true of the first two TMSs in the MVF family proteins which are similarly amphipathic (see Fig. 2C,D, respectively). Whether or not these TMSs are transmembrane or membrane surface localized has yet to be determined, but their striking conservation clearly suggests an important functional role. Additionally, MVF family proteins may have as few as 10 and as many as 15 TMSs, based on hydropathy analyses (Table 2).

The pathways taken for the appearance of these topological types are worthy of consideration. A primordial gene encoding a six TMS protein must have duplicated internally to give a 12 TMS protein. Then, two additional TMSs were added to many PST family proteins, and possibly one or two were added to give 13 or 14 TMS proteins in the MATE and MVF families. Finally, in high G + C Gram-positive bacteria, large C-terminal hydrophilic domains, including homologues of eukaryotic-like serine/threonine protein kinases, and a fifteenth putative TMS, became associated with MVF transporters. The function of these kinase domains is proposed to be related to the regulation of transport, but as the transport substrates are not known for MVF family proteins, the significance of this finding cannot yet be evaluated.

We have noted previously that evidence for horizontal transfer of genes encoding transport proteins between the archaeal, bacterial and eukaryotic domains is largely lacking. However, in this report we provide convincing evidence for horizontal transfer of genes encoding PST family members between bacteria and archaea. The PST family includes polysaccharide (or lipopolysaccharide precursor) exporters, and as the biosynthetic enzymes for these cell surface macromolecules are known to have been subject to extensive lateral transfer [60], it is not surprising that the associated transporter genes have been subject to similar pressures. Avoidance of immune surveillance by host animals may have provided the impetus for such gene transfer events.

Acknowledgements

Work in our laboratory was supported by NIH grants GM55434 and GM64368 from the National Institute of General Medical Sciences. We thank Mary Beth Hiller for her assistance in the preparation of this manuscript.

Supplementary material

The following material is available from http://www.blackwellpublishing.com/products/journals/suppmat/EJB/EJB3418/EJB3418sm.htm

Fig. S1. MATE alignments.

Fig. S2. PST alignments.

Fig. S3. OLF alignments.

Fig. S4. MVF alignments.

Fig. S5. Binary alignment of an established MATE family member with an established PST family member.

Fig. S6. Binary alignment of an established OLF family member with an established PST family member.

Fig. S7. Binary alignment of an established MVF family member with an established PST family member.

Table S1. MATE family members.

Table S2. PST family members.

Table S3. OLF family members.

Table S4. MVF family members.
