It is generally accepted that the three-dimensional structure of a polypeptide chain is determined by its amino acid sequence. Nevertheless, similar folds can have very different sequences. One of the ultimate goals in sequence analysis is to predict the structure and function of a protein based solely on its sequence. In cases where the protein of interest shares at least 30% amino acid identity with another protein, the two proteins generally exhibit similar three-dimensional structure (Doolittle 1986). Alternatively, when proteins are known to have similar structure but divergent sequences, consensus sequence motifs can be used to assess the function of unassigned sequences. These consensus motifs usually correspond to residues interacting with cofactors, substrate, or other proteins.
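As a rough illustration of the 30% identity criterion mentioned above, the following minimal sketch computes percent identity from two pre-aligned sequences. The function names, the gap character, and the choice of denominator (columns in which neither sequence has a gap) are assumptions made for this example; published definitions of percent identity differ in the denominator they use, and the sketch does not perform the alignment itself.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumes the two strings come from the same pairwise alignment and
    therefore have equal length. The denominator used here (ungapped
    columns) is one of several conventions, chosen for illustration.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 30.0) -> bool:
    """True when identity reaches the threshold (30% as cited in the text)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

A pair that passes this check would, by the rule of thumb cited above, generally be expected to share a similar three-dimensional structure; a pair that fails it may still share a fold, which is where consensus motifs become useful.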
The increasing number of three-dimensional structures in the Protein Data Bank of proteins complexed with appropriate ligands provides an important tool for understanding the mechanisms of molecular recognition. In this study, we focused on flavin adenine dinucleotide (FAD) because it and its related cofactors, nicotinamide adenine dinucleotide (NADH) and adenosine triphosphate (ATP), appear in many biological processes.
Previous comparative structural studies of mononucleotide- and dinucleotide-binding proteins reveal that some exhibit a similar three-dimensional structure with conserved sequence motifs at positions crucial for binding, although the remaining sequence can vary greatly. One of the first such folds was identified by Rossmann (Rossmann et al. 1974), who discovered the correlation between the fold of dehydrogenases that bind the cofactor NADH and conserved sequence motifs. The basic structure consists of six parallel β-strands interspersed with α-helices that appear on both sides of the six-stranded β-sheet. This symmetrical α/β structure is built from two halves, β1α1β2α2β3 and β4α4β5α5β6, with a crossover α-helix (α3) connecting β3 and β4 (Fig. 1). Each of these halves is known as the classical mononucleotide-binding fold or the Rossmann fold.
A variation of the fold observed for dehydrogenases is found in FAD-containing proteins. This fold consists of one β1α1β2α2β3 Rossmann fold and a variation of both the second Rossmann fold and the crossover α-helix. This variation includes a three-stranded antiparallel β-sheet connecting β3 and β4, instead of the crossover α-helix observed in dehydrogenases. Moreover, the sixth strand, from the second Rossmann fold, is missing, whereas the fifth strand is retained but lies close to the end of the sequence. Different variations in which structural elements are added or deleted were found in proteins containing other mono- and/or dinucleotides, such as flavin mononucleotide (FMN) (Rao and Rossmann 1973) and nicotinamide adenine dinucleotide phosphate (NADPH) (Schulz and Schirmer 1974). Most of these proteins show a series of conserved amino acid residues at positions interacting with the mono/dinucleotide molecule.
We selected a nonredundant set of 32 protein-FAD complex structures from the Protein Data Bank (PDB) for structure-sequence analysis, with the goal of deepening our understanding of the principles of molecular recognition. On the basis of sequence-structure comparison and the interaction of cofactor atoms with the different protein residues, we identified several conserved motifs for each structural family. Some of these were previously characterized by others (Schulz and Schirmer 1974; Schulz et al. 1982; Wierenga et al. 1983; Schulz 1992; Correll et al. 1993; Lu et al. 1994; Ingelman et al. 1997; Fraaije et al. 1998) and some are newly derived. These conserved sequence motifs are called “most conserved” when they are present in all family members, and “partially conserved” when present in some but not all family members. In addition to the sequence-structure analysis, we investigated a more complete set of variables, including cofactor conformation, characteristics of the protein pocket in which the cofactor is bound, cofactor directionality, and correlation of cofactor moieties (adenine, pyrophosphate, isoalloxazine, etc.) interacting with conserved sequence motifs in the different family folds. Such fundamental discriminators may improve our understanding of protein evolution, in particular for FAD-binding proteins, for which many tertiary structures, often distantly related, are known. Furthermore, the presence of a variety of conserved sequence motifs in FAD families, specifically those that are pyrophosphate-binding, can be used as a tool for molecular recognition of phosphate analog-binding proteins.
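The distinction between “most conserved” and “partially conserved” motifs can be made concrete with a short sketch. The code below is only an illustration of the definition given above, not the procedure used in this study: the function name, the use of a regular expression to represent a motif, the extra "absent" label, and the toy sequences are all assumptions introduced for the example. The glycine-rich pattern `G.G..G` stands in for a dinucleotide-binding consensus of the GxGxxG type.

```python
import re
from typing import Mapping


def classify_motif(pattern: str, family_sequences: Mapping[str, str]) -> str:
    """Label a motif by how many members of one structural family contain it.

    "most conserved"      : motif found in every family member
    "partially conserved" : motif found in some but not all members
    "absent"              : motif found in none (label added for this sketch)
    """
    if not family_sequences:
        raise ValueError("family must contain at least one sequence")
    regex = re.compile(pattern)
    hits = sum(1 for seq in family_sequences.values() if regex.search(seq))
    if hits == len(family_sequences):
        return "most conserved"
    if hits > 0:
        return "partially conserved"
    return "absent"


# Hypothetical toy sequences, invented for illustration only.
toy_family = {
    "member_1": "MKVAVIGAGPSGLAA",
    "member_2": "MTDVLIGGGHAGTEA",
    "member_3": "MSKPLAVVTEEQLRK",
}

label_all = classify_motif("G.G..G", toy_family)
label_first_two = classify_motif(
    "G.G..G", {k: toy_family[k] for k in ("member_1", "member_2")}
)
```

In practice a motif would be assessed on structurally aligned positions rather than by a free pattern search over each sequence, so this sketch captures only the all-versus-some logic of the two labels.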