Protein sequences encoded in three complete bacterial genomes, those of Haemophilus influenzae, Mycoplasma genitalium and Synechocystis sp., and the first available archaeal genome sequence, that of Methanococcus jannaschii, were analysed using the BLAST2 algorithm and methods for amino acid motif detection. Between 75% and 90% of the predicted proteins encoded in each of the bacterial genomes and 73% of the M. jannaschii proteins showed significant sequence similarity to proteins from other species. The fraction of bacterial and archaeal proteins containing regions conserved over long phylogenetic distances is nearly the same and close to 70%. Functions of 70–85% of the bacterial proteins and about 70% of the archaeal proteins were predicted with varying precision. This contrasts with the previous report that more than half of the archaeal proteins have no homologues and shows that, with more sensitive methods and detailed analysis of conserved motifs, archaeal genomes become as amenable to meaningful interpretation by computer as bacterial genomes. The analysis of conserved motifs resulted in the prediction of a number of previously undetected functions of bacterial and archaeal proteins and in the identification of novel protein families. In spite of the generally high conservation of protein sequences, orthologues of 25% or less of the M. jannaschii genes were detected in each individual completely sequenced genome, supporting the uniqueness of archaea as a distinct domain of life. About 53% of the M. jannaschii proteins belong to families of paralogues, a fraction similar to that in bacteria with larger genomes, such as Synechocystis sp. and Escherichia coli, but higher than that in H. influenzae, which has approximately the same number of genes as M. jannaschii. Certain groups of proteins, e.g. molecular chaperones and DNA repair enzymes, thought to be ubiquitous and represented in the minimal gene set derived by bacterial genome comparison, are missing in M. jannaschii, indicating massive non-orthologous displacement of genes responsible for essential functions. An unexpectedly large fraction of the M. jannaschii gene products, 44%, shows significantly higher similarity to bacterial than to eukaryotic proteins, compared with 13% that have eukaryotic proteins as their closest homologues (the rest of the proteins show approximately the same level of similarity to bacterial and eukaryotic homologues or have no homologues). Proteins involved in translation, transcription, replication and protein secretion are most closely related to eukaryotic proteins, whereas metabolic enzymes, metabolite uptake systems, enzymes for cell wall biosynthesis and many uncharacterized proteins appear to be ‘bacterial’. A similar prevalence of proteins of apparent bacterial origin was observed among the currently available sequences from the distantly related archaeal genus, Sulfolobus. It is likely that the evolution of archaea included at least one major merger between ancestral cells from the bacterial lineage and the lineage leading to the eukaryotic nucleocytoplasm.
Present address: Sequana Therapeutics, Inc., 11099 North Torrey Pines Rd., La Jolla, CA 92037, USA