Medium-chain dehydrogenases/reductases (MDR)

Family characterizations including genome comparisons and active site modelling

Authors


B. Persson, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, S-171 77 Stockholm, Sweden. Fax: + 46 8 337 462, Tel.: + 46 8 728 7730, E-mail: bengt.persson@mbb.ki.se

Abstract

Completed eukaryotic genomes were screened for medium-chain dehydrogenases/reductases (MDR). In the human genome, 23 MDR forms were found, a number that will probably increase because the genome is not yet fully interpreted. Partial sequences already indicate that at least three further members exist. Within the MDR superfamily, at least eight families were distinguished. Three families are formed by dimeric alcohol dehydrogenases (ADH; originally detected in animals/plants), cinnamyl alcohol dehydrogenases (originally detected in plants) and tetrameric alcohol dehydrogenases (originally detected in yeast). Three further families are centred around forms initially detected as mitochondrial respiratory function proteins, acyl-CoA reductases of fatty acid synthases, and leukotriene B4 dehydrogenases. The two remaining families with polyol dehydrogenases (originally detected as sorbitol dehydrogenase) and quinone reductases (originally detected as ζ-crystallin) are also distinct but with variable sequences. The most abundant families in the human genome are the dimeric ADH forms and the quinone oxidoreductases. The eukaryotic patterns are different from those of Escherichia coli. The different families were further evaluated by molecular modelling of their active sites as to geometry, hydrophobicity and volume of substrate-binding pockets. Finally, sequence patterns were derived that are diagnostic for the different families and can be used in genome annotations.

Abbreviations

MDR, medium-chain dehydrogenases/reductases
ADH, alcohol dehydrogenase
CAD, cinnamyl alcohol dehydrogenase
YADH, yeast alcohol dehydrogenase
MRF, mitochondrial respiratory function proteins
PDH, polyol dehydrogenases
QOR, quinone oxidoreductases
ACR, acyl-CoA reductase
LTD, leukotriene B4 dehydrogenase

Medium-chain dehydrogenases/reductases (MDRs) constitute a large enzyme superfamily with (including species variants) close to 1000 members[1,2]. The MDR enzymes represent many different enzyme activities of which alcohol dehydrogenases (ADHs) are the most closely investigated. They participate in the oxidation of alcohols, detoxification of aldehydes/alcohols and the metabolism of bile acids [3,4]. Another MDR branch has polyol dehydrogenase (PDH) activities originally detected for sorbitol dehydrogenase (SDH) [5]. All the corresponding substrates are widespread in nature because of their derivation from glucose, fructose, and general metabolism. In some organisms these substrates, such as polyols, can be accumulated at high concentrations constituting a protection against environmental stress, such as osmotic shock [6], and reduced or elevated temperature [7,8]. Polyol accumulation can, however, be harmful [9], suggesting a further protective role for these enzymes. An MDR family earlier recognized is cinnamyl alcohol dehydrogenase, CAD. This enzyme type in plants catalyses the last step in the biosynthesis of the monomeric precursors of lignin, the main constituent of plant cell walls [10]. This enzyme family has been extensively characterized through CAD from plant sources [11–13], because of its importance for the pulp industry [14]. Down-regulation or inhibition of CAD will reduce wood lignin content and yield a pulp of high quality [15]. A further MDR family long since recognized is the quinone oxidoreductase (QOR)-type, of which one mammalian form functions as a lens protein (ζ-crystallin) [16], mutational loss of which may result in cataract formation at birth. This suggests that ζ-crystallin has a role in the protection of the lens against oxidative damage [17]. In common therefore, as demonstrated by the examples above, all MDR families appear to have some members with protective functions in different organismal defences [2]. 
All MDR enzymes utilize NAD(H) or NADP(H) as cofactor and several but not all of the members have one zinc ion with catalytic function at the active site. Some, in particular classical, dimeric ADHs, also have a second zinc ion at a structural site, stabilizing an external loop present in those forms [18].

The availability of completed genomes provides an opportunity to evaluate all these members of the MDR superfamily. We have therefore studied the MDR enzymes corresponding to the products from available eukaryotic genomes (and for comparison, the Escherichia coli genome is also included, but not further analysed because of the distant relationships). The total number of MDR forms in each species was evaluated, orthologies were assigned and evolutionary relationships were characterized. In addition, separate sequence motifs were defined and the active site variability was investigated.

Materials and methods

Protein sequences translated from the complete genomes of Homo sapiens [19,20], Drosophila melanogaster [21], Caenorhabditis elegans [22], Arabidopsis thaliana [23], Saccharomyces cerevisiae [24] and Escherichia coli [25] were searched for MDR members using fasta [26] with known MDR proteins [1,2] as query sequences. Hits with an expect value (E value) below 10⁻¹⁰ were extracted and screened for MDR sequence patterns in order to find true members. These sequences were subsequently subjected to another round of fasta searches against the protein sequences from each genome to find further homologues. Multiple sequence alignments were calculated using clustalw [27]. Evolutionary trees were calculated from the alignments using the distance-based techniques neighbour joining and upgma, and a heuristic search to find the most parsimonious tree. The neighbour joining tree was created using clustalw and the other trees were created using paup [28]. The certainty of each branch point was assessed with bootstrap tests of the different trees. When all three methods agreed on a branch point with a bootstrap value above 90%, the corresponding branch was considered significant, and is marked with an asterisk in Fig. 1. Each protein that was not assigned an unambiguous placement during the bootstrap tests was manually investigated for the appearance of family-specific sequence patterns to aid in the classification process. The resulting evolutionary trees were displayed with treeview [29]. All sequences were checked vs. the swissprot database [30] for functional annotations.
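The significance rule for branch points can be written down compactly. The following is only a minimal sketch of that rule; the function name, the dictionary keys and the bootstrap percentages are hypothetical and are not taken from the analysis itself.

```python
def branch_is_significant(bootstrap_support, threshold=90.0):
    """Return True when all three tree-building methods support the branch
    with a bootstrap value strictly above the threshold (in percent)."""
    required = ("neighbour_joining", "upgma", "parsimony")
    return all(bootstrap_support.get(method, 0.0) > threshold for method in required)


# Hypothetical bootstrap percentages for two branch points (illustration only).
branches = {
    "branch_A": {"neighbour_joining": 98.0, "upgma": 95.0, "parsimony": 92.0},
    "branch_B": {"neighbour_joining": 99.0, "upgma": 88.0, "parsimony": 97.0},
}

# Branches that would receive an asterisk under the rule described above.
marked = [name for name, support in branches.items() if branch_is_significant(support)]
print(marked)
```

A branch with strong support from only two of the three methods is not marked, which makes the criterion deliberately conservative.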

Figure 1.

Evolutionary tree of the MDR enzymes from the six genomes investigated. Confidence levels of over 90% from the bootstrap test are marked with an asterisk at the corresponding branch point. The tree branches early, which indicates a divergent superfamily of ancient origin where a limited number of ancestral genes have diverged during the evolution of the separate species. The eight families are enclosed by thick lines.

The active sites of the MDR proteins were investigated by homology modelling using icm (Molsoft LLC, La Jolla, CA, USA) [31]. For the CAD and PDH families, the ketose reductase from Bemisia argentifolii (PDB accession no. 1e3j) [32] is the closest homologue with known three-dimensional structure. For the families of QOR, mitochondrial respiratory function proteins (MRF), leukotriene B4 dehydrogenases (LTDs) and acyl-CoA reductases (ACRs), the three-dimensional structure of E. coli quinone oxidoreductase (PDB accession no. 1qor) [33] was used as template. Active site residues were assigned according to the crystal structure of horse liver alcohol dehydrogenase with bound substrate (PDB accession no. 3bto) [34]. For the homology modelling, the active site residues were replaced according to the multiple sequence alignment of each MDR family (cf. Fig. 1). The replaced residues were positioned initially in the same rotamers as the original residues. Each replacement was followed by a conjugate gradient minimization of 100 function evaluations [35]. The last step in the replacement procedure was a conjugate gradient minimization of 1000 function evaluations to relieve any remaining unfavourable side chain interactions. The volume of the active site was measured as the space accessible for a carbon probe in the interior of the protein. The hydrophobicity index was calculated by averaging the hydrophobicity values [36] of the active site residues.
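The averaging step behind the hydrophobicity index can be illustrated as follows. This is a sketch under stated assumptions: the text cites reference [36] without naming the scale, so the Kyte-Doolittle hydropathy values are assumed here, and skipping alignment gaps is likewise an assumption that may differ from the procedure actually used. Under the Kyte-Doolittle assumption, the two gap-free active site strings used below evaluate to the values listed for them in Table 2.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text only cites ref. [36]).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_index(active_site):
    """Average the hydropathy values of the active site residues.

    Gap characters ('-') are skipped, which is an assumption of this sketch.
    """
    residues = [r for r in active_site.upper() if r != "-"]
    if not residues:
        raise ValueError("no residues to average")
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)


# Active site strings of HS_FAS_HUMAN and SC_YBR026C as listed in Table 2.
print(round(hydrophobicity_index("NRDMLLTLVKVTKLLAF"), 2))  # 0.78
print(round(hydrophobicity_index("NSDNQYNLQQVTGFWEK"), 2))  # -1.48
```

Positive values indicate a predominantly hydrophobic pocket, negative values a polar one.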

Results and discussion

MDR forms in the completed genomes

We find 23 MDR forms (Table 1) upon screening the human genome. In addition, we find three incomplete sequences, which cannot yet be finally evaluated and are therefore not included in this study. Thus, the total number of human MDR forms can be expected to increase slightly. The A. thaliana genome is the one with the greatest number of MDR members (38), which is consistent with a high gene duplication tendency in this organism [23]. Surprisingly, the genome of S. cerevisiae (like that of E. coli) also has many MDR members, especially in relation to its size: the larger genomes of C. elegans and D. melanogaster have only 13 and 10 members, respectively (Table 1). Evidently, the MDR superfamily exhibits different levels of variability and represents a number of different ancestral gene duplications followed by repeated acquisitions of new functions, ‘enzymogenesis’ [37].

Table 1. Number of sequences within each MDR family, discernible from the genomes investigated. The six enzymes that do not fit into the eight families are grouped in the column ‘Others’.
Genome | ADH | CAD | YADH | PDH | QOR | MRF | LTD | ACR | Others | Sum
H. sapiens (38922 ORFs) | 9 | – | – | 2 | 7 | 1 | 2 | 1 | 1 | 23
C. elegans (19000 ORFs) | 2 | – | 3 | 2 | 1 | 2 | 1 | 1 | 1 | 13
D. melanogaster (13500 ORFs) | 1 | – | – | 3 | 1 | 1 | – | 3 | 1 | 10
A. thaliana (25464 ORFs) | 9 | 8 | 1 | 1 | 7 | 1 | 11 | – | – | 38
S. cerevisiae (6000 ORFs) | 1 | 2 | 4 | 5 | 1 | 1 | 1 | – | – | 15
E. coli (4289 ORFs) | 1 | 2 | 1 | 9 | 1 | – | 1 | – | 2 | 17
Total | 23 | 12 | 8 | 22 | 18 | 6 | 16 | 5 | 6 | 116
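The remark that S. cerevisiae and E. coli carry many MDR members relative to genome size can be made concrete by normalizing the row sums of Table 1 to the ORF counts given there. The sketch below is only an illustration; the unit (MDR forms per 1000 ORFs) and the variable names are choices made here, not part of the original analysis.

```python
# (number of ORFs, number of MDR forms) per genome, taken from Table 1.
genomes = {
    "H. sapiens": (38922, 23),
    "C. elegans": (19000, 13),
    "D. melanogaster": (13500, 10),
    "A. thaliana": (25464, 38),
    "S. cerevisiae": (6000, 15),
    "E. coli": (4289, 17),
}


def mdr_per_1000_orfs(orfs, mdr_forms):
    """MDR forms per 1000 open reading frames."""
    return 1000.0 * mdr_forms / orfs


density = {species: mdr_per_1000_orfs(*counts) for species, counts in genomes.items()}

# List the genomes from the highest to the lowest relative MDR content.
for species, value in sorted(density.items(), key=lambda item: item[1], reverse=True):
    print(f"{species:16s} {value:.2f}")
```

On this measure the two smallest genomes rank highest, in line with the observation in the text.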

From the consensus evolutionary tree (Fig. 1), constructed from the aligned MDR sequences of six genomes (human, D. melanogaster, C. elegans, A. thaliana, S. cerevisiae and E. coli), we can first divide the MDR superfamily into families. Four families are clearly separated from the rest of the tree. These are the dimeric ADHs, CAD, yeast alcohol dehydrogenases (YADH) and mitochondrial respiratory function proteins (MRF). Notably, CAD is found not only in plants but also in the S. cerevisiae genome (as in the E. coli genome), indicating that CAD has a wider function than just lignin biosynthesis, which is consistent with the annotation in swissprot recently changed to mannitol dehydrogenase. Two families contain sequences that are distantly related. These are the PDHs and QORs. Finally, there are the two families of ACRs and LTDs, and a few forms that do not belong to any of the families mentioned.

Half of the families are zinc-containing MDRs and half are non-zinc-containing MDRs. A division can be drawn (dashed line in Fig. 1), with the zinc-containing MDRs in one of the halves (bottom Fig. 1, with families CAD, PDH, ADH, YADH) and the non-zinc-containing MDRs in the other (top Fig. 1 with families QOR, MRF, LTD, ACR).

The number of enzymes in each genome belonging to the different families is listed in Table 1. ADH is the family branch most frequently found in the human genome, while ADH and CAD are the most frequent in the A. thaliana genome. The YADH-type of enzyme is present not only in yeast but also in C. elegans, A. thaliana (and E. coli). These latter organisms therefore have alcohol dehydrogenases of both the dimeric and tetrameric ADH families. In Table 1, the few forms that do not fit into the eight families are grouped in the column ‘Others’.

Comparing our results with those of other databases, e.g. pfam [38] and cog [39], we find that several family members are also represented in the corresponding entries of those databases, supporting our results. However, in contrast to our work, pfam does not subclassify the MDR superfamily. In the cog database, the human and A. thaliana sequences are presently lacking. Furthermore, cog groups the YADH and CAD families together in cog 1064, and the MRF and QOR families together in cog 0604. cog and pfam also include six yeast proteins distantly related to MDR forms but with expect values far less significant than our threshold of 10⁻¹⁰, while we include some members with better expect values which are more closely related to the MDR family but not listed in the other classifications. Two of the distantly related yeast proteins were included in a previous, different genome comparison [2], giving 17 S. cerevisiae MDR forms instead of the present 15. However, for E. coli, the number of MDR forms (17) is unchanged.

Of the MDR families now observed, the dimeric and tetrameric ADH families have been recently analysed elsewhere [40] while six families are further considered below: PDH, CAD, QOR, MRF, LTD and ACR. Family distinguishing sequence patterns are also recognized.

The polyol dehydrogenase (PDH) family

The PDH family contains SDH, ketose reductase and threonine dehydrogenase. SDH is present in all genomes of this study, and a corresponding gene with retained function is traceable from prokaryotes to man. This conservation emphasizes that SDH has an important function common to a wide range of life forms. It further shows species-specific duplications, in a manner well known also in the classical ADH family [41]. The separate SDH duplications appear to have occurred independently in several lines, as reflected by the human, C. elegans, D. melanogaster and S. cerevisiae genomes. These isoforms show 81.2–99.7% residue identity in pairwise comparisons. In addition, S. cerevisiae has one further SDH form that is only 53% identical to the others, indicating the presence of widely separated duplications in the SDH group.
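Residue identities of the kind quoted above are derived from pairwise aligned sequences. A minimal sketch of such a calculation follows; the convention of ignoring columns that contain a gap is an assumption made here (other conventions exist), and the example sequences are hypothetical fragments rather than real SDH sequences.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of identical residues between two aligned sequences.

    Columns in which either sequence has a gap are ignored (an assumption
    of this sketch; the denominator convention varies between programs).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments (illustration only).
print(percent_identity("ACDEFGHIKL", "ACDEYGHIKL"))    # one mismatch in ten columns
print(percent_identity("ACDE-GHIKL", "ACDEFGHIKL"))    # the gap column is skipped
```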

The active site volumes of the PDHs range between 77 and 257 Å³. SDH typically has large volumes of between 210 and 257 Å³. Most of the active site residues are conserved through all PDH forms. Within the SDHs (entries 4–14 of Table 2), 10 out of 16 amino acid residues are strictly conserved, and the remaining residues are in most cases exchanged only conservatively. A few further enzymes in Table 2 are annotated as SDH, but their E values are much less significant than those of the verified SDHs. In addition, the residues, hydrophobicity and geometry at the active sites are different from those of the confirmed SDHs, indicating that they are likely to represent further types of polyol activities.
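The count of strictly conserved positions can be checked directly from the active site strings of entries 4–14 in Table 2. The sketch below is one straightforward way to do this; the function name is a choice made here, and a column counts as strictly conserved only if every sequence carries the same residue there.

```python
def strictly_conserved_columns(aligned):
    """Return the 0-based indices of columns with one identical residue in all sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    return [
        i for i in range(length)
        if aligned[0][i] != "-" and len({seq[i] for seq in aligned}) == 1
    ]


# Active site residues of the SDHs (entries 4-14 of the PDH block in Table 2).
sdh_sites = [
    "CSGFIKHEFTEVTFRY",  # EC_1742.b1774
    "CSYCAFHEFTEVMFRY",  # AT_MSG15-5
    "CSYIGFHEFTEVLFRY",  # CE_R04B5.5
    "CSFIGFHEFTEVLFRS",  # CE_R04B5.6
    "CSYIGFHEFTEVMFRY",  # DM_7298873
    "CSYIGFHEFTEVMFRY",  # DM_7299382
    "CSYIGFHEFTEVLFRY",  # DHSO_HUMAN
    "CSYIGFHEFTEVLFRY",  # Q9UMD6
    "CSYIAYHEFTEVMFRY",  # SC_YLR070C
    "CSYIGYHEFTEVMFRY",  # SC_YJR159W
    "CSYIGYHEFTEVMFRY",  # SC_YDL246C
]

conserved = strictly_conserved_columns(sdh_sites)
print(len(conserved), "of", len(sdh_sites[0]), "positions strictly conserved")
print("".join(sdh_sites[0][i] for i in conserved))
```

The same function can be applied to the active site strings of any other family in Table 2.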

Table 2. Members of the MDR superfamily. Active site residues correspond to the noncontinuous sequence. Annotations within parentheses are less certain due to a log E value above −20.

Protein | Annotation | log E value | Active site residues | Hydrophobicity index | Depth (Å) | Width (Å) | Volume (Å³)
PDH (a)
EC_4248.yjjN | Sorbitol dehydrogenase | −28 | CTANQ-HEVVEILRNA | −0.19 | 13.2 | 7.0 | 212
EC_4158.yjgV | L-idonate 5-dehydrogenase | −136 | CSYVGFHEFSEVMFRF | 0.37 | 16.7 | 7.2 | 173
DM_7300579 | Sorbitol dehydrogenase | −26 | SSVNR-HDLNQLCFRS | −0.72 | 16.2 | 5.9 | 144
EC_1742.b1774 | Sorbitol dehydrogenase | −45 | CSGFIKHEFTEVTFRY | −0.18 | 15.8 | 9.1 | 257
AT_MSG15–5 | Sorbitol dehydrogenase | −60 | CSYCAFHEFTEVMFRY | 0.16 | 16.4 | 7.2 | 211
CE_R04B5.5 | Sorbitol dehydrogenase | −71 | CSYIGFHEFTEVLFRY | 0.26 | 16.8 | 7.8 | 226
CE_R04B5.6 | Sorbitol dehydrogenase | −68 | CSFIGFHEFTEVLFRS | 0.55 | 15.8 | 7.6 | 244
DM_7298873 | Sorbitol dehydrogenase | −79 | CSYIGFHEFTEVMFRY | 0.14 | 16.9 | 8.9 | 210
DM_7299382 | Sorbitol dehydrogenase | −77 | CSYIGFHEFTEVMFRY | 0.14 | 15.9 | 8.4 | 210
DHSO_HUMAN | Sorbitol dehydrogenase | −141 | CSYIGFHEFTEVLFRY | 0.26 | 14.6 | 7.8 | 226
Q9UMD6 | Sorbitol dehydrogenase | −140 | CSYIGFHEFTEVLFRY | 0.26 | 14.6 | 7.8 | 226
SC_YLR070C | Sorbitol dehydrogenase | −72 | CSYIAYHEFTEVMFRY | 0.03 | 12.9 | 8.7 | 176
SC_YJR159W | Sorbitol dehydrogenase | −137 | CSYIGYHEFTEVMFRY | −0.11 | 12.9 | 7.7 | 207
SC_YDL246C | Sorbitol dehydrogenase | −136 | CSYIGYHEFTEVMFRY | −0.11 | 13.3 | 8.4 | 207
EC_1744.b1776 | Sorbitol dehydrogenase | −23 | CAHGS-HENLDAMMAY | −0.24 | 12.5 | 7.0 | 212
EC_3538.tdh | Threonine 3-dehydrogenase | −130 | CTIWSKHEGVDNIYGR | −0.68 | 12.5 | 8.0 | 205
EC_2496.b2545 | Sorbitol dehydrogenase | −24 | CSYRAKHEKYSTEWVT | −1.28 | 13.2 | 7.5 | 202
SC_YAL61W | (Sorbitol dehydrogenase) | −15 | CTEIFSHELAQVMMCY | 0.59 | 9.7 | 6.7 | 145
SC_YAL60W | Butanediol dehydrogenase | −152 | CSEIFSHEFLEVVIGY | 0.77 | 12.1 | 8.5 | 188
EC_0598.b0608 | Glutathione-dependent formaldehyde dehydrogenase | −67 | CSLIP-HELYSTRFKM | −0.34 | 10.2 | 5.9 | 77
EC_1550.rspB | Starvation sensing protein RSPB | −130 | CSIHN-HEVVEIFRLN | 0.05 | 16.0 | 6.8 | 161
EC_2050.gatD | Galactitol-1-phosphate 5-dehydrogenase | −133 | CSRAH-HEFSEVTWMN | −0.71 | 12.4 | 8.6 | 220
CAD (b)
AT_MLD14–17 | Cinnamyl-alcohol dehydrogenase | −132 | CTQGM-HEYVCTFTEE | −0.05 | 9.7 | 10.7 | 160
AT_F20D10–90 | Mannitol dehydrogenase | −84 | CSCHS-HEYVCSITQE | 0.09 | 12.5 | 9.8 | 213
AT_F20D10–110 | Mannitol dehydrogenase | −91 | CSMGM-HEYPCTLTQE | −0.01 | 8.3 | 9.0 | 181
AT_F20D10–100 | Mannitol dehydrogenase | −89 | CTMGL-HESKCTLTQE | 0.00 | 12.5 | 9.8 | 174
AT_T22F8–230 | Mannitol dehydrogenase | −87 | CTTGY-HEYICTLTQE | −0.39 | 10.9 | 8.0 | 183
AT_F7D8–5 | Mannitol dehydrogenase | −85 | CSTGF-HEYRCTLTQE | −0.32 | 12.8 | 7.3 | 223
AT_F7D8–21 | Mannitol dehydrogenase | −84 | CSTGF-HEYRCTLTQE | −0.32 | 12.8 | 7.8 | 223
AT_F28P22–13 | Cinnamyl-alcohol dehydrogenase | −67 | CAWGD-HEFICTITQQ | 0.34 | 12.7 | 7.8 | 198
EC_0317_b0325 | Mannitol dehydrogenase | −61 | CSQAG-HEYPCTSTQE | −0.69 | 9.1 | 10.2 | 213
EC_4160_yjgB | Mannitol dehydrogenase | −41 | CSMGF-HEI-CTVLRK | 0.44 | 12.7 | 7.8 | 204
SC_YCR105W | Mannitol dehydrogenase | −40 | CSIGP-HEMPCTLIEQ | 0.49 | 12.5 | 7.4 | 248
SC_YMR318C | Mannitol dehydrogenase | −40 | CSCGN-HEYPCTLLNQ | −0.03 | 12.0 | 8.1 | 237
QOR (c)
HS_hCP39890 | (Mycocerosic acid synthase) | −18 | NADLQYLALGETLFLSR | 0.22 | 17.0 | 7.1 | 201
AT_F18E5–200 | Quinone oxidoreductase | −28 | NADLQYLALGETLPAPR | −0.21 | 18.5 | 6.7 | 220
SC_YBR046C | Quinone oxidoreductase | −34 | NIEYFYRISTLTNSRLY | −0.41 | 12.5 | 8.7 | 124
EC_3946_qor | Quinone oxidoreductase | −121 | NIDYIYTAQLLTNNRLQ | −0.43 | 22.0 | 7.3 | 218
AT_k11j9–30 | Quinone oxidoreductase | −52 | NIDYFYMAGMLTQARMM | 0.21 | 11.2 | 8.9 | 137
HS_hCP34852 | (Quinone oxidoreductase) | −16 | NSDNYYF-YPVTFLAQG | −0.35 | 18.4 | 6.7 | 204
AT_F14J22–2 | (Alcohol dehydrogenase class III) | −9 | NSDNFYF-VFTTMLQAG | 0.13 | 17.6 | 7.0 | 199
HS_VAT1_HUMAN | Synaptic vesicle membrane protein VAT-1 homologue | −119 | ---MAYMVLSVTMQCHL | 0.97 | 18.8 | 10.2 | 289
HS_hCP47235 + hCP1631114 | Synaptic vesicle membrane protein VAT-1 homologue | −25 | NIDMVIFAFYMTLYVLW | 1.47 | 17.0 | 6.3 | 160
HS_hCP38146 | Synaptic vesicle membrane protein VAT-1 homologue | −22 | STHFDYALLLIENAEEA | 0.04 | 17.0 | 5.5 | 186
CE_F39B2–3 | Quinone oxidoreductase | −43 | NVDYIYKYGAVTNVAMS | 0.14 | 17.2 | 7.4 | 211
HS_QOR_HUMAN | Quinone oxidoreductase | −123 | NVEYIYSSSSITSATS- | −0.05 | 13.4 | 7.1 | 160
AT_T5P19–110 | Quinone oxidoreductase | −21 | NANLQYSSFLVTFVYSY | 0.35 | 16.4 | 7.9 | 222
HS_QORL_HUMAN | Quinone oxidoreductase-like 1 | −139 | SINKLKRILDRRGLNVW | −0.55 | 16.0 | 6.2 | 188
AT_K15M2–24 | Quinone oxidoreductase | −20 | NLDRIGRALTFTGLYGI | 0.30 | 13.1 | 5.3 | 168
AT_F5O8–27 | Quinone oxidoreductase | −109 | NVDKRFYNVKLTG-VN- | −0.56 | 19.1 | 7.6 | 252
AT_F25G13–100 | (Quinone oxidoreductase) | −18 | NVDKIITVLLVTPMTKK | 0.51 | 15.7 | 6.9 | 161
DM_7295851 | (ToxD protein) | −7 | NIDAMGRVVQYTPLTGG | −0.01 | 16.0 | 6.3 | 234
MRF (d)
SC_YBR026C | Mitochondrial respiratory function protein | −137 | NSDNQYNLQQVTGFWEK | −1.48 | 16.6 | 7.3 | 198
CE_Y48A6B_9 | Mitochondrial respiratory function protein | −22 | NLDNRYSFSTITGFAMW | −0.18 | 14.0 | 6.5 | 168
AT_T6D9–100 | (Mitochondrial respiratory function protein) | −15 | NSDNRYYSPSVTGFWSW | −1.08 | 14.1 | 7.1 | 199
DM_7303260 | Mitochondrial respiratory function protein | −32 | NADNTYNLALVTGFWRW | −0.31 | 18.1 | 7.3 | 243
CE_W09H1–5 | (Mitochondrial respiratory function protein) | −15 | NADNQYNDRLVTGFWRW | −1.27 | 16.6 | 6.8 | 202
HS_ENSP234985 | Mitochondrial respiratory function protein | −38 | NSDNMYNANLVTGFWQW | −0.68 | 16.1 | 6.3 | 201
LTD (e)
AT_F2K13–110 | NADP-dependent leukotriene dehydrogenase | −37 | SCDYMRKEEV-TMG-IE | −0.62 | 17.6 | 6.7 | 242
AT_F2K13–140 | NADP-dependent leukotriene dehydrogenase | −37 | SCDYMGKEEV-TMN-IQ | −0.56 | 16.4 | 6.5 | 251
AT_F2K13–150 | NADP-dependent leukotriene dehydrogenase | −20 | SCDYMGKEEV-TMN-IQ | −0.56 | 19.2 | 5.9 | 251
AT_F2K13–120 | NADP-dependent leukotriene dehydrogenase | −37 | SCDYMGQEEV-TMN-IQ | −0.54 | 16.5 | 6.6 | 236
AT_F2K13–130 | NADP-dependent leukotriene dehydrogenase | −34 | SCDYMGQY---TMN-IQ | −0.45 | 19.5 | 6.0 | 242
AT_T17B22–23 | NADP-dependent leukotriene dehydrogenase | −37 | SCDYMGEGEL-TMN-IK | −0.41 | 17.0 | 5.8 | 247
AT_F28B23–3 | NADP-dependent leukotriene dehydrogenase | −39 | SCDYMGVEEV-TMN-LQ | −0.13 | 20.3 | 5.9 | 256
AT_K18L3–100 | NADP-dependent leukotriene dehydrogenase | −37 | SCDHSGKEEV-TMN-VQ | −0.85 | 17.5 | 6.1 | 249
AT_k19a23–10 | NADP-dependent leukotriene dehydrogenase | −36 | SCDHSGKEEV-TMN-VQ | −0.85 | 17.5 | 6.6 | 249
AT_F24G16–110 | NADP-dependent leukotriene dehydrogenase | −35 | SCDYMRKEET-TMN-MQ | −1.25 | 17.0 | 10.1 | 245
AT_F5I14–32 | NADP-dependent leukotriene dehydrogenase | −37 | SCDYMRQEEL-TMN-LE | −0.85 | 18.5 | 7.3 | 225
HS_LB4D_HUMAN | NADP-dependent leukotriene dehydrogenase | −118 | TVDYMKMTTI-TAP-ME | −0.02 | 18.8 | 7.3 | 218
EC_1420_b1449 | NADP-dependent leukotriene dehydrogenase | −44 | SLDYMSGQDI-TLLRLQ | −0.05 | 11.7 | 6.4 | 161
CE_M106–3_short | NADP-dependent leukotriene dehydrogenase | −26 | SVDAQNETKV-TQHNRE | −1.65 | 20.0 | 6.8 | 248
HS_hCP39255 | NADP-dependent leukotriene dehydrogenase | −26 | SVDYMNQQTI-TQANRE | −1.18 | 19.5 | 7.3 | 192
SC_YML131W | (NADP-dependent leukotriene dehydrogenase) | −10 | SNDAQSETTI-TAG-VK | −0.57 | 17.9 | 6.4 | 291
ACR (f)
HS_FAS_HUMAN | Fatty acid synthase | −199 | NRDMLLTLVKVTKLLAF | 0.78 | 16.3 | 4.8 | 140
CE_F32H2–5 | Fatty acid synthase | −92 | NRDMLLAILQVTKLLSI | 0.91 | 16.2 | 5.5 | 186
DM_7289423 | Fatty acid synthase | −59 | NRDMLLIMVGVTKLLSL | 1.08 | 17.4 | 5.7 | 159
DM_7295848 | Fatty acid synthase | −88 | NRDMLLAMVKCTKLLSV | 0.64 | 15.6 | 4.8 | 189
DM_7295849 | Fatty acid synthase | −91 | NRDMLLAMVKCTKLLSV | 0.64 | 17.3 | 6.4 | 189

a Active site residues are at positions 44, 46, 50, 56, 57, 59, 69, 70, 118, 121, 155, 159, 274, 297, 298 and 299 in the numbering of DHSO_HUMAN.
b Active site residues are at positions 48, 50, 54, 60, 61, 70, 71, 122, 125, 164, 168, 284, 307, 308 and 309 in the numbering of CAD1_ARATH.
c Active site residues are at positions 42, 44, 45, 47, 48, 53, 64, 89, 90, 93, 124, 128, 241, 255, 264, 267 and 268 in the numbering of EC_3946_qor (QOR_ECOLI).
d Active site residues are at positions 63, 65, 66, 68, 69, 73, 94, 121, 122, 125, 156, 160, 285, 300, 309, 312 and 313 in the numbering of SC_YBR026C (MRF1_YEAST).
e Active site residues are at positions 45, 46, 47, 49, 50, 55, 63, 92, 93, 96, 128, 241, 256, 267 and 268 in the numbering of LB4D_HUMAN.
f Active site residues are at positions 1567, 1569, 1570, 1572, 1573, 1576, 1586, 1611, 1612, 1615, 1645, 1649, 1766, 1781, 1790, 1793 and 1794 in the numbering of FAS_HUMAN.

The zinc-liganding residues, Cys45, His70 and Glu156 (residue numbers according to human SDH) [42], are conserved in most PDH forms. In six PDHs, Glu156 is exchanged for Asp, Gln or Ser. The Asp might act as a zinc ligand [43], but the Gln or Ser are not likely to contribute zinc ligands [43]. The exact nature of one ligand has been much investigated previously [44]. In DM_7300579, two of the zinc ligands, Glu156 and Cys45, are missing and we postulate that this protein does not bind zinc. The coenzyme-binding motif in this protein deviates further, having two of the three ‘coenzyme-typical’ Gly residues [45] replaced by Cys and Ala, respectively. This is expected to give a change in the fold within this region, and this protein may therefore exhibit loss of enzymatic activity, or represent another activity [37].

The cinnamyl alcohol dehydrogenase (CAD) family

This family contains CAD and mannitol dehydrogenases (MTD), represented in A. thaliana by eight forms. However, we also find two of this family's forms in S. cerevisiae (and in E. coli). This family has 43 residues strictly conserved, of which close to half (19) are glycines, typical of unaltered folds [46]. In addition, seven cysteine residues are strictly conserved, of which six correspond to the zinc-liganding positions of ADH, suggesting the presence of two zinc ions in the CAD family.

In plants, the ancestral gene for the CAD family has been duplicated after the separation from fungi, giving rise to the CAD and MTD lines. The substrate specificity, however, has been retained, as both these enzymes act on primary alcohols/aldehydes. CAD is part of the shikimic acid pathway, which leads to synthesis of nearly all plant aromatic compounds. This pathway is unique for plants, bacteria and fungi [47], consistent with the fact that no CAD homologue could be found in the other organisms.

The hydrophobicity index is typically between −0.3 and +0.3 for most CAD/MTD forms (Table 2). The molecular modelling of the enzymes within the CAD family indicates that some enzymes have a deep (> 12 Å) and narrow (≈ 8 Å) active site, while others have a shallower (≈ 9 Å) and somewhat wider (≈ 10 Å) active site (Table 2).

Apart from the strictly conserved Cys48, His70, Glu71 and Cys164 (residue numbers according to CAD1_ARATH), the conserved active site residues of the CAD family are one Glu and four hydrogen-bonding residues (typically Ser/Thr/Gln).

Six A. thaliana CAD forms cluster together (59–98% residue identity), while two A. thaliana CAD enzymes (MLD14.17 and F28P22.13) form two separate lines.

The quinone oxidoreductase (QOR) family

The family containing QORs is variable but has distinct borders (Fig. 1). One enzymatic activity described for these members is QOR [48], but additional activities are likely to exist in the QOR family. In plants, QOR members give protection against diamide compounds, which may be metabolites of alkylating diazoate-derivatives [49].

Several proteins from the QOR family are found in the human genome only (Fig. 1), showing that this family has given rise to novel functions in mammals. These enzymes may therefore be highly important for mammalian metabolic conversions. As some of these enzymes are homologous to the synaptic vesicle protein VAT-1 from the ray Torpedo californica, the group might be involved in neuronal functions. This would be consistent with the increased number of QOR forms in mammals. The human VAT-1 homologue displays the largest active site volume (289 Å³) of the QOR subgroup. The VAT-1 related proteins have hydrophobic substrate pockets with hydrophobicity indices up to 1.47 (Table 2).

At the active sites of the proteins of the QOR family, three residues are conserved in nearly all forms: Asn41, Asp/Glu44 and Thr127. The QORs and human ζ-crystallin contain Tyr46 and Tyr52. The human QOR has orthologues in all species investigated except D. melanogaster (Table 3). The absence of a QOR member in D. melanogaster might indicate that another enzyme has evolved for this enzyme function in the fruit-fly, as is the case for ethanol dehydrogenase activity, which is often supplied by MDR enzymes, but in the fruit-fly is supplied by a short-chain dehydrogenase [5].

Table 3. Orthologues recognized within the six analysed genomes.
Family | H. sapiens | D. melanogaster | C. elegans | A. thaliana | S. cerevisiae | E. coli
PDH family (SDH activity) (a) | HS_DHSO_HUMAN, HS_Q9UMD6 | DM_7298873, DM_7299382 | CE_R04B5.5, CE_R04B5.6 | AT_MSG15–5 | YLR070C, YJR159W, YDL246C | EC_1742_b1774
QOR family (QOR activity) (a) | QOR_HUMAN | | CE_F39B2.3 | AT_k11j9–30 | SC_YBR046C | EC_3946.qor
MRF family (MRF) (b) | ENSP 234985 | DM_7303260 | CE_W09H1.5 | AT_T6D9–100 | |
ACR family (ACR activity) (a,c) | HS_FAS_HUMAN | DM_7295848_short, DM_7295849_short | CE_F32H2.5 | | |
LTD family (LTD activity) (a) | HS_LB4D_HUMAN, HS_hCP39255 | | CE_M106.3_short | (d) | SC_YML131W | EC_1420.b1449

a Shown for the human member; b shown for the S. cerevisiae member; c in one domain of fatty acid synthases; d all A. thaliana forms show equidistant relationships to the LTD forms of other species.

The mitochondrial respiratory function proteins (MRF) family

In yeast, it has been shown that SC_YBR026C is essential for mitochondrial respiratory function (MRF) [50]. This protein has clearly discernible homologues in all investigated eukaryotic species (Table 3), forming a family distinguishable from the other non-zinc-containing oxidoreductases (Table 2). The human orthologue (HS_ENSP234985) may be similarly important for mitochondrial function. The active site volumes are 169–243 Å³, indicating large substrates (Table 2). The substrate pocket is polar with hydrophobicity indices as low as −1.48, in contrast to that of most of the other investigated proteins. The active sites of these MRF proteins have seven out of 17 residues strictly conserved. All but two of these conserved residues are polar, contributing to an active site concluded to have many hydrogen bonds to the substrate(s).

The leukotriene B4 dehydrogenases (LTD) family

LTDs form a subgroup that has members from all genomes except that of D. melanogaster. In the human genome, we find two forms (LB4D_HUMAN and hCP39255), while in C. elegans and S. cerevisiae, there is only one form (as in E. coli). All these proteins form an orthologue cluster with reciprocal relationships better than 10⁻¹⁵ (Table 3). In addition, we find 11 A. thaliana members of this type. Because plants, like animals, have host-defence systems, although based upon linolenic [51] rather than arachidonic acid, the functions may indeed correspond. Another argument for retained function of the LTD family is that Urtica urens (nettle plant) uses leukotriene B4 as an immunoreactive agent in the defence against herbivores [52]. These proteins may also function as allyl ADHs as they have 70% sequence identity to a protein from Nicotiana tabacum that acts on monoterpene allylic alcohols [53]. The LTD active sites are all deep and narrow (Table 2). The active site is polar in all cases but one, and for the majority of the LTD members, several charged residues are present at the active site. The active site volumes are typically around 250 Å³, all consistent with activity on leukotrienes or similarly sized molecules. The typical residues at the active site are Ser45, Cys46, Asp47, Tyr49, Met50, Glu63, Thr128, Met241, and Asn256.
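The orthologue criterion mentioned above (reciprocal relationships better than 10⁻¹⁵) can be sketched as a check over all ordered pairs in a candidate cluster. This is an interpretation made here, namely that every pairwise comparison must fall below the threshold in both search directions; the protein names and E values are hypothetical placeholders, not results from the paper.

```python
from itertools import permutations


def is_reciprocal_cluster(members, evalues, threshold=1e-15):
    """True if every ordered (query, subject) pair has an E value below the threshold.

    Missing pairs are treated as non-hits (infinite E value).
    """
    return all(
        evalues.get((query, subject), float("inf")) < threshold
        for query, subject in permutations(members, 2)
    )


# Hypothetical E values for three placeholder proteins (illustration only).
evalues = {
    ("protA", "protB"): 1e-40, ("protB", "protA"): 1e-38,
    ("protA", "protC"): 1e-22, ("protC", "protA"): 1e-20,
    ("protB", "protC"): 1e-18, ("protC", "protB"): 1e-12,
}

print(is_reciprocal_cluster(["protA", "protB"], evalues))
print(is_reciprocal_cluster(["protA", "protB", "protC"], evalues))
```

In this toy data set the third protein fails in one search direction only, which is enough to exclude it from the cluster.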

The acyl-CoA reductase (ACR) family

ACRs form a family (Table 2) that contributes one domain of the fatty acid synthases and erythronolide synthases [54,55]. These ACR members have active site volumes ranging from 140 to 189 Å³ and they are only found in the human, D. melanogaster and C. elegans genomes. Orthologue analysis shows that only two of the three D. melanogaster forms are closely related to the human and C. elegans forms (Table 3). The active sites are hydrophobic (index between 0.64 and 1.08) with narrow and quite deep (15–17 Å) substrate pockets consistent with the nature of their fatty acid substrates. Four leucine residues are strictly conserved at the active site. Several conserved residues are clustered at a surface corresponding to the one that is perpendicular to the subunit interacting surface in dimeric MDR forms. It seems likely that these conserved residues are involved in protein–protein interactions in the multienzymes of fatty acid synthase and erythronolide synthase, defining the subunit-interacting areas.

Sequence patterns

The sequence comparisons and subdivisions make it possible to define sequence patterns useful for characterization of MDR members. For QOR, a prosite pattern [56] already exists (PS01162). However, this pattern is too insensitive to find all the sequences we now classify as QOR members. It only finds five of our presently recognized 18 QOR members. The QOR family is highly divergent, which may explain the poor result of the existing pattern. Based upon the sequences now available, we propose a new pattern that detects the QOR members better (Table 4). It finds 15 of the QOR members, threefold more than the existing prosite pattern, and it misses only three QOR forms, while detecting no false positives.

Table 4. Sequence patterns and screening results. The column hits gives the number of MDR forms detected in the six genomes investigated, fp (false positives) gives the number of nonmembers detected in swissprot, fn (false negatives) gives the number of proteins classified as a member but not found by the pattern.
Group | Pattern | hits | fp | fn
QOR | [GAS]-x-N-x(2)-[DEN]-x(5)-G-x(6,19)-[PS]-x(3)-[GA]-x-[ED]-x(2)-G-x-[VIL]-x(3)-G | 15 | 0 | 3
MRF | L-x(6)-[VL]-T-Y-G-G-M-[SA]-[KR] | 6 | 0 | 0
PDH | [GA]-[VIL]-[CS]-[GN]-[STA]-D-[VILMS]-[HKP]-x(14,27)-G-H-[ED]-x(2)-G-x-[VI]-x(10,12)-G-[DEQ]-x-[IV] | 22 | 1 | 0
CAD | C-G-x-C-x(2)-D-x(17)-G-H-E | 12 | 0 | 0
LTD | D-x-[YF]-x-[DE]-N-V-G-[GS]-x(3)-[DEN] | 16 | 0 | 0
ACR | W-x(5)-W-x(8)-P-x(2)-Y-x(3)-Y-Y | 5 | 0 | 0
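The screening results in Table 4 translate directly into a sensitivity (recall) per pattern, i.e. hits divided by hits plus false negatives. The short sketch below uses the numbers from Table 4 and, for comparison, the five of 18 QOR members found by the existing prosite pattern; the function and variable names are choices made here.

```python
# (hits, false positives, false negatives) per pattern, taken from Table 4.
table4 = {
    "QOR": (15, 0, 3),
    "MRF": (6, 0, 0),
    "PDH": (22, 1, 0),
    "CAD": (12, 0, 0),
    "LTD": (16, 0, 0),
    "ACR": (5, 0, 0),
}


def sensitivity(hits, false_negatives):
    """Fraction of classified family members that the pattern recovers."""
    return hits / (hits + false_negatives)


recall = {group: sensitivity(hits, fn) for group, (hits, fp, fn) in table4.items()}
for group, value in recall.items():
    print(f"{group}: {value:.3f}")

# Existing prosite QOR pattern: 5 of the 18 recognized QOR members.
old_qor = sensitivity(5, 18 - 5)
print(f"existing QOR pattern: {old_qor:.3f}")
```

Because hits were counted in the six genomes whereas false positives were counted in swissprot, the false-positive column is deliberately left out of this calculation.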

The sequence patterns for the PDH and the CAD subgroups are based on residues that bind the catalytic zinc and the substrate. The PDH pattern (Table 4) is somewhat complex but captures all the different substrate specificities of this group and gives only one false positive when the pattern is screened vs. swissprot. The CAD pattern (Table 4) is shorter and is highly specific due to an additional cysteine residue in the active site region, unique to this family. It finds no false positive match and misses none of our members. The MRF and the ACR patterns are based upon unusual sequence stretches. The MRF proteins have a highly conserved T-Y-G-G-M motif, ideal as the basis for a pattern. The pattern finds all the members and no false positive matches from swissprot (Table 4). The ACR pattern is based upon a sequence stretch with many aromatic residues, including two tryptophan residues, which are suitable for pattern recognition as tryptophan is the least frequently occurring amino acid (Table 4). The pattern for the LTD group is based upon the comparatively hydrophilic nature of the active site in this subgroup, is very specific, and recovers all members with no false positive matches in swissprot (Table 4).

The patterns are useful for proper recognition of new genomic sequences. They allow rapid annotation into the different families of the MDR superfamily of the huge amounts of sequences generated by ongoing genome projects. They are also ideal for finding particular enzymes in the ever increasing sequence databases.
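As an illustration of how such patterns can be applied to new sequences, the sketch below converts the prosite-style notation of Table 4 into a Python regular expression and scans a sequence with it. It handles only the elements that occur in Table 4 (single residues, bracketed alternatives, x, and repeat counts); other prosite features such as excluded residues or terminal anchors are not supported. The test sequence is a hypothetical string constructed to contain the MRF motif, not a real protein.

```python
import re

# One pattern element: x, a single residue, or a bracketed set, with an optional repeat.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a prosite-style pattern (subset of the syntax) into a regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        regex = "." if core == "x" else core
        if low is not None:
            # x(2) -> .{2}   and   x(6,19) -> .{6,19}
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


MRF_PATTERN = "L-x(6)-[VL]-T-Y-G-G-M-[SA]-[KR]"
mrf_regex = re.compile(prosite_to_regex(MRF_PATTERN))

# Hypothetical sequence built to contain the motif (illustration only).
hit = mrf_regex.search("MSLAAAAAAVTYGGMSKQ")
print(prosite_to_regex(MRF_PATTERN))
print(hit.group(0) if hit else "no match")
```

Scanning a translated genome then amounts to applying each compiled family pattern to every protein sequence and recording which family, if any, matches.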

Acknowledgements

Financial support from the Swedish Research Council, the Swedish Foundation for Strategic Research and Karolinska Institutet is gratefully acknowledged.
