Flavogenomics – a genomic and structural view of flavin-dependent proteins


P. Macheroux, Institute of Biochemistry, Graz University of Technology, Petersgasse 12/II, A-8010 Graz, Austria
Fax: +43 316 873 6952
Tel: +43 316 873 6450
E-mail: peter.macheroux@tugraz.at


Riboflavin (vitamin B2) serves as the precursor for FMN and FAD in almost all organisms that utilize the redox-active isoalloxazine ring system as a coenzyme in enzymatic reactions. The role of flavin, however, is not limited to redox processes, as ∼ 10% of flavin-dependent enzymes catalyze nonredox reactions. Moreover, the flavin cofactor is also widely used as a signaling and sensing molecule in biological processes such as phototropism and nitrogen fixation. Here, we present a study of 374 flavin-dependent proteins analyzed with regard to their function, structure and distribution among 22 archaeal, eubacterial, protozoan and eukaryotic genomes. More than 90% of flavin-dependent enzymes are oxidoreductases, and the remaining enzymes are classified as transferases (4.3%), lyases (2.9%), isomerases (1.4%) and ligases (0.4%). The majority of enzymes utilize FAD (75%) rather than FMN (25%), and bind the cofactor noncovalently (90%). High-resolution structures are available for about half of the flavoproteins. FAD-containing proteins predominantly bind the cofactor in a Rossmann fold (∼ 50%), whereas FMN-containing proteins preferably adopt a (βα)8-(TIM)-barrel-like or flavodoxin-like fold. The number of genes encoding flavin-dependent proteins varies greatly in the genomes analyzed, and covers a range from ∼ 0.1% to 3.5% of the predicted genes. It appears that some species depend heavily on flavin-dependent oxidoreductases for degradation or biosynthesis, whereas others have minimized their flavoprotein arsenal. An understanding of ‘flavin-intensive’ lifestyles, such as in the human pathogen Mycobacterium tuberculosis, may result in valuable new intervention strategies that target either riboflavin biosynthesis or uptake.


Abbreviations: PDB, Protein Data Bank; RI, redundancy index


Biological cofactors are generally employed by enzymes to enable a wide and diverse range of biochemical transformations necessary for all aspects of life. Some of these cofactors, such as vitamin B12 and vitamin H (biotin), catalyze a small but nevertheless important set of biochemical reactions. Other cofactors, on the other hand, perform very different chemical tasks, and compete for the title of master of versatility, with vitamin B2 (riboflavin)-derived and vitamin B6-derived (e.g. pyridoxine and pyridoxamine) cofactors and cytochrome P450 being the most serious contenders. The yellow vitamin B2, or riboflavin, is synthesized by many bacteria and plants [1,2], and is then converted to FMN by riboflavin kinase (which catalyzes the phosphorylation of the ribityl side chain attached to N10 of the isoalloxazine ring system) and further adenylated to FAD by FAD synthetase, in two ATP-dependent reactions [3–5] (for structures see Fig. 1). These two modified forms of riboflavin occur exclusively in flavin-dependent enzymes. The biochemical utility of FMN and FAD is based on their redox-active isoalloxazine ring system, which is capable of one-electron and two-electron transfer reactions and, most importantly, of dioxygen activation [6]. Generations of enzymologists have marvelled at the astonishing diversity of flavin-dependent reactions, encompassing dehydrogenation [7], oxidation [8–10], monooxygenation [11–13], halogenation [14–16], and reduction (e.g. of disulfides and various types of double bond) [17], as well as their utility in biological sensing processes (e.g. light and redox status) [18–25]. Not surprisingly, this area has been the subject of numerous review articles that have attempted to fathom and rationalize the capabilities of the flavin cofactor [26–32].
The complexity of flavin-catalyzed reactions is further increased when they join forces with other redox-active cofactors, such as iron–sulfur clusters ([2Fe–2S], [3Fe–4S] and/or [4Fe–4S]) [33–35], heme [36], molybdopterin [37], or thiamine diphosphate [38].

Figure 1. Structure of riboflavin, FMN, and FAD. The redox-active isoalloxazine ring is shown in its oxidized and two-electron reduced state (red and blue). The numbering scheme for the isoalloxazine ring is indicated in the oxidized structure on the left.

Since the discovery of the first flavin-containing enzyme by Otto Warburg in the 1930s [39], the number of ‘yellow’ enzymes has steadily increased, and there has been a sharp rise in the last 20–30 years, owing to the rapid progress in molecular cloning and full genome sequencing. More recently, structural genomics has led to the structural characterization of many more and hitherto unknown flavoproteins. To gain an overview of flavoproteins, their genomic distribution, and their structural topologies, we have assembled a list of flavoproteins and searched for the encoding sequences in a selection of genomes. In addition, structural information on flavoproteins in the Protein Data Bank (PDB) was analyzed in order to define the flavin-binding pocket according to the PFAM classification scheme [40].

Nature’s flavoprotein arsenal

The list of flavin-dependent proteins was assembled mainly by using three online databases. First, the enzyme database BRENDA (http://www.brenda-enzymes.org/) was searched for FMN-dependent and FAD-dependent enzymes to compile a preliminary list. This initial list contained many false positives and also missed several flavin-dependent enzymes, as well as flavoproteins with no catalytic or no known catalytic function (e.g. flavin storage proteins). To verify the dependence of a protein on flavin, the primary literature was consulted, and a complementary search for classified enzymes in the Enzyme Structures Database (http://www.ebi.ac.uk/thornton-srv/databases/enzymes/) and the PDB (http://www.pdb.org/pdb/home/home.do) was conducted to link the list of flavoproteins to the available structural information.

The current list of flavoproteins contains 276 fully classified enzymes and 98 entries for enzymes with no or incomplete classification as well as flavoproteins without a demonstrated enzymatic activity (cofactor storage, electron transfer, repressor and response proteins; 17 entries). As could be expected for a redox-active cofactor, the majority of flavoenzymes are found in enzyme class 1: oxidoreductases account for 91% (251 entries), whereas transferases, lyases, isomerases and ligases contribute only 4.3% (12 entries), 2.9% (eight entries), 1.4% (four entries), and 0.4% (one entry), respectively (Fig. 2A). Within the class of oxidoreductases, the three largest subgroups are enzymes in EC 1.14 (61 entries for monooxygenases/hydroxylases), EC 1.1 (38 entries for enzymes oxidizing a CH–OH group), and EC 1.3 (30 entries for enzymes oxidizing a CH–CH group) (Fig. 2B).
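The class shares quoted above follow directly from the entry counts. As a minimal sketch (the dictionary keys and the function name `class_shares` are illustrative choices, not part of the original analysis), the arithmetic can be reproduced as follows:

```python
# Entry counts per EC class as quoted in the text for the
# 276 fully classified flavoenzymes (Table S1).
class_counts = {
    "oxidoreductases (EC 1)": 251,
    "transferases (EC 2)": 12,
    "lyases (EC 4)": 8,
    "isomerases (EC 5)": 4,
    "ligases (EC 6)": 1,
}


def class_shares(counts):
    """Return each class's share of all entries, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = class_shares(class_counts)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

The five counts sum to 276, and the computed shares agree with the rounded percentages given in the text.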

Figure 2. Pie chart of flavoproteins found in various enzyme classes: yellow, class 1 (oxidoreductases); orange, class 2 (transferases); red, class 4 (lyases); blue, class 5 (isomerases); and green, class 6 (ligases). This chart was generated by using the fully classified flavoenzymes (a total of 276) from Table S1.

FAD is clearly more common as a cofactor than FMN, with 289 proteins depending on FAD (75%) and 98 on FMN (25%) (note: entries where cofactor utilization is unclear were not considered; see Table S1). Riboflavin is not used in any enzymes (except for riboflavin kinase/FAD synthetase as a substrate), but appears to be the preferred storage form of the cofactor in some organisms (e.g. riboflavin-binding protein in chicken eggs and dodecin in archaeons [41,42]). In addition, organisms (e.g. mammals) lacking vitamin B2 biosynthesis employ riboflavin-specific transporters to sequester it from dietary sources by facilitated diffusion [43].

In the majority of enzymes, the cofactor is noncovalently bound in the active site. Covalent attachment of the flavin cofactor has been confirmed in 40 cases (see Table S2), corresponding to ∼ 10.8% of all flavoproteins listed in Table S1. Apparently, covalent attachment of FMN (five entries) occurs rarely as compared with that of FAD (35 entries). Different types of covalent attachment have been found for FMN. It is linked either to the 8α-position (via N3 of a histidine) or to the 6-position (via the thiol group of a cysteine) of the isoalloxazine ring [44], or, in one case, it is bicovalently linked to N1 of a histidine and the thiol group of a cysteine [45]. Only recently, a novel attachment of FMN to redox-driven ion pumps (RnfG and RnfD) via an ester linkage between the hydroxyl group of a threonine and the ribitylphosphate side chain of the cofactor was discovered [46]. On the other hand, covalent linkage of FAD always occurs via the 8α-position, to either the N1 or N3 of a histidine, a cysteine thiol, a tyrosine hydroxyl, or an aspartate carboxyl group (Table S2) [44,47]. In five enzymes, FAD is bicovalently attached via the 8α-position and 6-position of the isoalloxazine ring system [48]. Bicovalent attachment was first discovered only 5 years ago, but appears to be more common than monocovalent attachment to the 8α-position via cysteine, tyrosine, or aspartate [49,50].

Flavoprotein structures

The first structure of a flavin-dependent protein was reported in 1972 for a bacterial flavodoxin [51,52]. Several years later, the structures of the FAD-dependent enzymes glutathione reductase and 4-hydroxybenzoate 3-monooxygenase (p-hydroxybenzoate hydroxylase) were described [53,54]. Since that time, the numbers of deposited structures have risen to 646 and 1179 structures of FMN-dependent and FAD-dependent proteins, respectively (as of 31 December 2010), and this has been paralleled by efforts to relate the structures of flavoproteins to their functions [55–58]. The structure of flavodoxin, a small electron transfer protein that uses FMN as a cofactor, is not only the first but also by far the most frequently solved structure of all flavin-dependent proteins (> 120 entries in the PDB).

Currently, structures are available for 55 FMN-utilizing and 141 FAD-utilizing flavoproteins, accounting for ∼ 52% of all flavoproteins listed in Table S1. Overall, a total of 23 structural clans (according to the PFAM classification [40]) is represented by flavin-dependent proteins, and the structural topologies are therefore quite diverse in comparison with other cofactor-dependent enzyme families; for example, all pyridoxal 5′-phosphate-dependent enzymes adopt one of five different structural topologies [59].

As can be seen from Fig. 3, FMN and FAD binding are vastly different with respect to the topology of the binding pocket, indicating that the adenosine moiety strongly affects the mode of cofactor binding. The preferred structures for FMN binding are the classical (β/α)8-barrel (clan TIM_barrel), with 16 entries, and the flavodoxin-like fold (clan Flavoprotein), with 12 entries. Together, these two clans account for more than half of the currently known FMN-dependent structural types. Graphical representations of these two most common topologies in FMN-dependent proteins are shown in Fig. 4A,B. Within the clan TIM_barrel, five families are found in FMN-dependent enzymes: FMN_dh (six entries), Oxidored_FMN (five entries), and DHO_dh, Glu_synthase and NPD (one entry for each family). In the clan Flavoprotein, nine proteins adopt a Flavodoxin_1, two an FMN_red and one a recently discovered Flavodoxin_NrdI fold. All of the FMN-dependent proteins in this clan serve as electron transfer proteins or act as two-electron reductases for free flavin (FMN reductase) or other electron acceptors (e.g. azobenzene reductase). In addition to these two most abundant structural clans, FMN-dependent proteins are found in 12 rare folds. Some of these folds are unique structures, and are found in only one or a few enzymes, such as bacterial luciferase (Bac_luciferase), nitroreductase (Nitroreductase fold), phosphopantothenate-cysteine ligase (clan NADP_Rossmann/family DFP), and chorismate synthase (chorismate_syn). The latter two examples are very interesting, because these two enzymes do not catalyze net redox reactions and are not classical oxidoreductases, like most flavin-dependent enzymes (Fig. 2).
This observation suggests that FMN-dependent enzymes used for ‘aberrant’ activities have evolved independently from the canonical FMN-dependent oxidoreductases, or, in other words, the folds necessary to carry out the enzymatic reaction were not ‘borrowed’ from the oxidoreductases, but instead novel topologies have arisen during the evolution of these enzymes. As will be discussed below, this tendency for unusual reactions to call for unusual folds is also found in FAD-dependent enzymes.

Figure 3. Bar plot of the distribution of structural clans (according to the PFAM classification) in FMN-dependent (A) and FAD-dependent (B) flavoproteins.

Figure 4. Graphical representation of the two most common structural clans for FMN-dependent (A, B) and FAD-dependent (C, D) proteins. The examples show the structures of (A) flavodoxin from Desulfovibrio vulgaris (PDB entry 1fx1), (B) old yellow enzyme from Sa. cerevisiae (PDB entry 1oyc), (C) glutathione disulfide reductase (PDB entry 3grs), and (D) UDP-N-acetylmuramate dehydrogenase (PDB entry 1mbt), representing the clans Flavoprotein, TIM_barrel, NADP_Rossmann, and FAD_PCMH, respectively. The structure representations were generated with PyMOL.

The topologies found for FAD binding are dominated by the Rossmann fold or variations thereof, contained in the clan NADP_Rossmann (Fig. 4C) [56]. This structural clan comprises a large number of families (148), with nine families reported to serve for FAD binding. Almost half of the FAD-dependent proteins exhibit a fold in this clan (Fig. 3, bottom panel). Second to the clan NADP_Rossmann is the clan FAD_PCMH (two families; for a graphical example, see Fig. 4D), followed by the clan FAD_Lum_binding (five families) and the clan Acyl-CoA_dh (four families). Together, the structures found in these four clans account for 75% of all FAD-dependent proteins. The clans that are rare appear to occur predominantly in proteins with special biological functions, such as light-dependent DNA repair (deoxyribodipyrimidine photolyase, EC 4.1.99.3), oxidoreductase activity in the endoplasmic reticulum (ERO1), or electron transfer from acyl-CoA dehydrogenases to the electron transport chain (clan 4Fe–4S). As discussed above for FMN-dependent proteins, this observation suggests that employment of FAD-dependent enzymes for novel or unusual functions requires the adaptation of already existing topologies and, in some cases, new structural designs to fulfill the desired role.

The majority of covalently bound flavins are present as FAD rather than FMN (Table S2). Interestingly, covalent attachment of FAD occurs only in the two most abundant clans, NADP_Rossmann and FAD_PCMH, and is almost equally distributed between these two clans (Table S2). Several families in the clan NADP_Rossmann are associated with covalent FAD linkage (DAO, GMC_oxred_N, FAD_binding_2, Amino_oxidase, and Trp_halogenase). This is in contrast to the clan FAD_PCMH, where covalent linkage is found in the family FAD_binding_4 but not in the family FAD_binding_5, which comprises FAD-containing and molybdopterin-containing enzymes, such as xanthine oxidase and quinoline-2-oxidoreductase, to mention only two representatives of this family (Table S1). Covalent linkage is highly prevalent in the family FAD_binding_4: 11 of the 14 structures reported for this family show monocovalent or bicovalent flavin attachment, with UDP-N-acetylmuramate dehydrogenase, D-lactate dehydrogenase and alkyldihydroxyacetone phosphate synthase (EC 2.5.1.26) being the only exceptions (Table S2).

Impact of structural genomics consortia

Several structural genomics projects on prokaryotic and eukaryotic species have been initiated, in order to define the structures of expressed proteins in the target organism. A total of 173 entries (86 for FMN-utilizing proteins and 87 for FAD-utilizing proteins) have been deposited by structural genomics consortia since 1999, amounting to ∼ 10% of the total entries (∼ 1800 entries; 640 for FMN-utilizing proteins and ∼ 1160 for FAD-utilizing proteins). Analysis of the structural classification for FMN-dependent proteins reveals a strong bias towards the clan Nitroreductases, with a total of 27 entries (∼ 31%). As this clan has only a moderate frequency among FMN-dependent proteins (Fig. 3, top panel), this overrepresentation suggests that this type of structure is favored by the methodologies currently used in structural genomics pipelines. The aim of the consortia to elucidate the structures of as many different proteins as possible also leads to a serious lack of biochemical information, which renders some of the PDB entries difficult to interpret in terms of the biological function of the flavoprotein. On the other hand, several structures of new flavoproteins with unknown roles have been contributed by structural genomics initiatives. For example, a zinc-dependent protease from Bacteroides thetaiotaomicron (clan Glutaminase_I, family DJ-1/PfpI, PDB entry 3cne) and protein structures with a fold similar to the C-terminal domain of pyruvate kinase in the archaeons Archaeoglobus fulgidus and Methanobacterium thermoautotrophicum were recently deposited in the PDB (clan PK_C, PDB entries 1vp8 and 1t57). However, the role of the FMN cofactor in these two proteins is unclear. In the putative protease, the flavin isoalloxazine ring is sandwiched by two tryptophans at the interface of the dimeric protein, with the edge of the pyrimidine ring moiety at a distance of 15 Å from the presumably catalytic mononuclear zinc center.
Hence, the flavin does not appear to play a role in catalysis, but may instead be involved in dimerization of the protein or act as a gate for potential substrates to enter the active site. On the other hand, the flavin in the pyruvate kinase fold in archaeons is located in a central cavity of the protein, and engages in hydrogen bond interactions with several amino acid side chains. In this case, it seems plausible that the flavin plays a catalytic role, albeit in a type of fold that has not previously been implicated in flavoenzyme catalysis. Furthermore, an FMN-dependent oxidoreductase from Thermotoga maritima was the first structure of a flavin-dependent tRNA dihydrouridine synthase (clan TIM_barrel, family Dus; PDB entry 1vhn), an enzyme that has recently been characterized biochemically [60].

In the case of FAD, the entries provided by structural genomics consortia reflect the predominance of the clan NADP_Rossmann, with 44 of 87 entries belonging to this clan. Interestingly, several new structural families for FAD-dependent proteins were defined in the course of structural genomics efforts, such as the BLUF domain of blue light sensors in cyanobacteria (PDB entry 1x0p), the glucose-inhibited division protein A (GidA) domain in the clan NADP_Rossmann, the HI0933-like proteins (first discovered in target 0933 from Haemophilus influenzae, PDB entry 2gqf), and a siderophore-interacting protein (family FAD_binding_9 in the clan FAD_Lum_binding). In addition, a novel covalent attachment between a side chain carboxylate group of an aspartate and the 8α-position of the isoalloxazine system was discovered in an FAD-dependent halogenase involved in chloramphenicol biosynthesis in Streptomyces venezuelae [47]. As noted before, this structural information provides interesting leads for biochemists to follow up and subject these proteins to thorough biochemical characterization in order to reveal their cellular role.

Flavogenomics – occurrence and distribution of flavoproteins in prokaryotes and eukaryotes

Despite the availability of genomic sequence information, it proved difficult to obtain reliable information on the occurrence of flavoproteins encoded in the genomes of various organisms. This is mostly because of the lack of information on whether a flavin (FMN and/or FAD) cofactor is present and the precise biochemical reaction catalyzed by the enzyme. On the other hand, it is doubtful that all, or even most, of the proteins predicted by genomics will ever be subjected to a detailed characterization that would enable accurate functional assignment of a putative flavoenzyme. For most of the species analyzed, we used the annotations provided by the responsible sequencing facility, and included only those entries that gave a clear indication of flavin dependence (see Methods). This approach probably leads to an underestimation of the number of flavoproteins, as many ‘hypothetical’ or ‘putative’ proteins may be flavin-dependent but are not annotated as such. An interesting alternative to use of the existing annotations is the analysis of predicted protein families as provided by the Broad Institute for Neurospora crassa (http://www.broadinstitute.org/annotation/genome/neurospora/Pfam.html) and on the tuberculosis research platform for Mycobacterium tuberculosis and Streptomyces coelicolor (http://www.tbdb.org/). Therefore, we have also used our set of structural families (Table S3) to search for proteins predicted in the above-mentioned species. In the case of M. tuberculosis, a parallel analysis of the available genome annotation was conducted. The ‘structural family approach’ has generated a significantly higher number of predicted flavoproteins (141 versus 113), as many hypothetical proteins are found in protein families that are typical or even specific for flavoproteins (e.g. FAD_binding_4 or NPD) and hence were included as predicted flavin-dependent proteins. 
The disadvantage of this more ‘inclusive’ analysis is that some of the protein families, such as PAS_3, are not specific for flavin and may utilize other cofactors (e.g. heme). In any case, the task of eliminating the false positives and false negatives inherent in both approaches can only be performed by biochemical characterization of predicted and suspected flavoproteins. To this end, structural genomics may also play an important role; however, flavoenzymes that do not hold on tightly to the flavin cofactor (e.g. chorismate synthase) or use it only transiently during catalysis (e.g. hydroxypropylphosphonic acid epoxidase) may elude identification as flavin-dependent proteins.
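The trade-off between the two approaches can be made concrete with a small sketch of the 'structural family approach'. The two family sets below contain only the examples named in the text; the full list would come from Table S3. The protein identifiers, the placeholder family `Unrelated_family`, and the function name `classify` are hypothetical and serve only as illustration.

```python
# Families named in the text as typical or specific for flavoproteins.
FLAVIN_SPECIFIC = {"FAD_binding_4", "NPD"}
# Family named in the text as not flavin-specific (may bind e.g. heme).
FLAVIN_AMBIGUOUS = {"PAS_3"}


def classify(families):
    """Classify one protein from its set of predicted protein families."""
    families = set(families)
    if families & FLAVIN_SPECIFIC:
        return "predicted"
    if families & FLAVIN_AMBIGUOUS:
        # Candidate false positive: needs biochemical confirmation.
        return "ambiguous"
    return "not predicted"


# Hypothetical proteins and family assignments, for illustration only.
proteins = {
    "protA": ["FAD_binding_4"],
    "protB": ["PAS_3"],
    "protC": ["Unrelated_family"],
    "protD": ["PAS_3", "NPD"],
}
calls = {pid: classify(fams) for pid, fams in proteins.items()}
print(calls)
```

Keeping the ambiguous families in a separate set makes the likely false positives explicit, rather than silently counting them as flavoproteins.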

Although it is presently not possible to determine the exact number of flavoproteins, our analysis has revealed striking differences in the utilization of flavin-dependent proteins in various prokaryotic and eukaryotic species, which are reflected both by the total number and the percentage of genes encoding flavoproteins (Fig. 5). Several species appear to have a minimum number of flavin-dependent proteins that are required to maintain basic metabolic functions, such as succinate dehydrogenase, which is necessary for primary energy metabolism, and chorismate synthase and acetolactate synthase, which are necessary for amino acid biosynthesis. Examples of species with a minimal set of enzymes are Pyrococcus abyssi, T. maritima, and Saccharomyces cerevisiae (with 12, 12 and 48 entries, respectively). On the other hand, organisms such as M. tuberculosis, Neurospora crassa, S. coelicolor and Arabidopsis thaliana contain a relatively large number of genes encoding flavin-dependent proteins. In these cases, flavoenzymes are apparently involved in a species-specific lifestyle that requires a much larger set of flavoenzymes than are needed by the ‘flavin minimalists’ mentioned before. Closer inspection of the set of flavoenzymes in these organisms reveals a multitude of one or several types of flavin-dependent proteins. In order to estimate this redundancy of a ‘flavogenome’, we have defined the quotient of the number of distinct flavin-dependent proteins (i.e. with different EC numbers) and the total number of flavin-dependent proteins as a ‘redundancy’ index (RI) (RI = 1 indicates a nonredundant flavogenome, whereas RI < 1 indicates increasing redundancy; Fig. 5C). In the case of M. tuberculosis, ∼ 34 genes encoding acyl-CoA dehydrogenases and ∼ 10–15 genes encoding flavin-containing monooxygenases and oxidoreductases give rise to high redundancy (RI = 0.55; Fig. 5C).
The occurrence of this many acyl-CoA dehydrogenases is apparently related to the extensive and complex utilization of lipids from host cells by this pathogenic bacterium [61]. A large number of genes encoding acyl-CoA dehydrogenases is also found in S. coelicolor, and is only exceeded by putative flavin-dependent oxidoreductases, with 57 predicted genes. Again, the abundance of these flavoenzymes can be rationalized on the basis of the lifestyle: S. coelicolor is a rather immobile soil bacterium that can adapt to various carbon and nitrogen sources and produces a large number of biologically active compounds, such as antibiotics. In other words, the organism depends on metabolic power and versatility that are certainly conferred to some degree by flavin-dependent enzymes. In contrast to M. tuberculosis and S. coelicolor, N. crassa has apparently pursued a different metabolic strategy by using a broader array of flavoenzymes rather than a highly similar set, as indicated by the rather high RI (0.74 versus 0.5 and 0.55 for S. coelicolor and M. tuberculosis, respectively). As a result, N. crassa contains more than 100 different flavoproteins, more than any other species analyzed in our study. The large number of flavoenzymes in this filamentous fungus may be attributable to diverse biosynthetic routes leading to secondary metabolites, as well as the saprotrophic lifestyle, which requires the generation and secretion of oxidases and dehydrogenases to access organic matter in the environment. In this context, it is noteworthy that the protein family FAD_binding_4 constitutes the largest group among the predicted putative flavoenzymes in this species. Members of this family are typically oxidases that are capable of performing a wide range of substrate (e.g. sugars and alcohols) oxidation reactions [48].
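The RI defined above is a simple quotient and can be sketched in a few lines. The function name `redundancy_index` and the toy flavogenome below are illustrative assumptions; the input is taken to be one EC number per flavoprotein-encoding gene.

```python
def redundancy_index(ec_numbers):
    """RI = number of distinct EC numbers / total number of flavoproteins.

    RI = 1 means every flavoprotein has its own EC number (nonredundant);
    smaller values indicate increasing redundancy.
    """
    if not ec_numbers:
        raise ValueError("at least one flavoprotein is required")
    return len(set(ec_numbers)) / len(ec_numbers)


# Toy flavogenome, for illustration only: three genes sharing one
# acyl-CoA dehydrogenase EC number plus two single-copy enzymes.
toy_flavogenome = ["1.3.8.1", "1.3.8.1", "1.3.8.1", "4.1.99.3", "2.5.1.26"]
ri = redundancy_index(toy_flavogenome)
print(f"RI = {ri:.2f}")
```

In this toy case, three distinct EC numbers among five genes give RI = 0.60. Proteins without an assigned EC number are not handled here; how they are counted would have to follow the conventions of the original analysis.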

Figure 5. Occurrence and distribution of flavoproteins in 22 selected genomes. (A) The number of genes encoding flavin-dependent proteins in the genomes of My. genitalium, Ar. fulgidus, Me. jannaschii, P. abyssi, Pl. falciparum, To. gondii, Sa. cerevisiae, N. crassa, A. thaliana, D. melanogaster and Homo sapiens. (B) The numbers of predicted flavoproteins as percentages of the total proteins for the species in (A). (C) The RIs of flavoproteins in these genomes. Yellow bars indicate genomes with low redundancy, and brown bars indicate genomes with high redundancy.

The flavogenome of the model plant A. thaliana is the most prolific among the analyzed genomes. This is mostly because of the occurrence of two large groups of flavoproteins, monooxygenases and oxidases of the (S)-tetrahydroprotoberberine oxidase/berberine bridge enzyme family, with 31 and 26 members, respectively. As previously discussed for microbial genomes, the large number of enzymes in these two flavoprotein families is a reflection of the diversity of metabolic processes employed to synthesize a vast array of bioactive compounds. In the case of plants, natural products such as alkaloids and terpenes are among the compounds synthesized for signaling and defense purposes. Several members of the berberine bridge enzyme family are implicated in plant metabolism, such as (S)-tetrahydroprotoberberine oxidase, nectarin V [62], and pollen allergen proteins [63]. Therefore, it can be expected that most of the flavoproteins occurring in these two groups will catalyze distinct reactions on various different substrates.

The RI seems to be a useful tool with which to identify organisms that have a ‘flavin-dependent’ lifestyle because of their high demand for chemically complex biomolecules, and which are thus potentially vulnerable to inhibitors of riboflavin biosynthesis and/or uptake [64–66]. Although it is apparent that major species-specific differences exist, the currently estimated RIs are probably too low for several species, owing to the lack of biochemical knowledge of the enzymes in the most common flavoprotein families. Hence, future efforts to define the flavoprotein arsenal of an organism have to focus on three aspects: to capture all true flavin-dependent proteins, to eliminate false positives, and to characterize the flavoproteins biochemically in order to classify them accurately. As a significant first step, it would be useful to conduct an HMMER analysis [67] of the existing genomes to provide a list of potential flavoproteins, to enable scientists to target specifically these putative genes for biochemical and structural studies.
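As a rough sketch of what the post-processing of such an HMMER analysis might look like, the following parses the tabular output that `hmmscan --tblout` writes when a proteome is scanned against a profile database, and keeps the proteins that hit a flavin-associated family. It assumes the standard whitespace-separated layout, in which the first, third and fifth columns hold the profile (family) name, the query sequence name and the full-sequence E-value. The sample rows, protein names, E-value cutoff and the function name `flavin_candidates` are hypothetical.

```python
def flavin_candidates(tblout_text, flavin_families, max_evalue=1e-5):
    """Map each protein to the flavin-associated families it hits."""
    hits = {}
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete hit row
        family, protein, evalue = fields[0], fields[2], float(fields[4])
        if family in flavin_families and evalue <= max_evalue:
            hits.setdefault(protein, set()).add(family)
    return hits


# Hypothetical hit table in the assumed layout, for illustration only.
sample = """\
# target name  accession  query name  accession  E-value  score  bias
FAD_binding_4 - protA - 1.2e-30 105.3 0.1 2.0e-30 104.6 0.1 1.0 1 0 0 1 1 1 1 hypothetical description
NPD - protB - 4.0e-08 30.2 0.0 5.1e-08 29.9 0.0 1.1 1 0 0 1 1 1 1 hypothetical description
PAS_3 - protC - 3.0e-12 45.0 0.0 3.5e-12 44.8 0.0 1.0 1 0 0 1 1 1 1 hypothetical description
FAD_binding_4 - protD - 0.02 10.1 0.0 0.03 9.8 0.0 1.0 1 0 0 1 1 0 0 hypothetical description
"""

candidates = flavin_candidates(sample, {"FAD_binding_4", "NPD"})
print(candidates)
```

Here protC is dropped because PAS_3 is not in the chosen family set, and protD is dropped because its E-value exceeds the cutoff. A list produced this way only nominates candidates; as argued above, biochemical characterization is still needed to confirm them.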


Methods

Flavoproteins from different species were identified by screening pertinent databases. Microbial genomes were analyzed by screening the databases provided by the J. Craig Venter Institute (Ar. fulgidus DSM4304, Bacillus subtilis 168, Chlamydia trachomatis serovar D, Deinococcus radiodurans R1, Escherichia coli K-12, Helicobacter pylori 26695, Methanocaldococcus jannaschii, M. tuberculosis CDC1551, Mycoplasma genitalium G-37, Pseudomonas aeruginosa PAO, P. abyssi, Staphylococcus aureus MW2, T. maritima, and Vibrio fischeri ES114). Putative flavoproteins in N. crassa were retrieved by a web-based analysis of the known flavin-dependent protein families listed in Table S1 on http://www.broadinstitute.org/annotation/genome/neurospora/MultiHome.html. Flavoproteins in the yeast Sa. cerevisiae were identified with the annotations available on the yeast genome website at http://www.yeastgenome.org. A similar approach was used for M. tuberculosis and S. coelicolor A3(2) (http://www.tbdb.org/). Information on flavoproteins in the human parasites Plasmodium falciparum and Toxoplasma gondii was retrieved by inspection of http://plasmodb.org/plasmo/ and http://toxodb.org/toxo, respectively. Flavoproteins from A. thaliana were retrieved by a keyword and protein name search (flavin, FMN, FAD, dioxygenase, monooxygenase, hydroxylase, and the individual names of all flavoproteins listed in Table S1), with the ARabidopsis Gene EXpression Database (AREX) (http://www.arexdb.org/index.jsp). Analysis of flavoproteins in Drosophila melanogaster was based on a search in http://flybase.org/ and http://www.brenda-enzymes.org. Human flavoproteins were identified by a text search with the enzyme names from Table S1 in the Online Mendelian Inheritance in Man (OMIM) database (http://www.ncbi.nlm.nih.gov/omim).