We have systematically made a set of precisely defined, single-gene deletions of all nonessential genes in Escherichia coli K-12. Open-reading frame coding regions were replaced with a kanamycin cassette flanked by FLP recognition target sites by using a one-step method for inactivation of chromosomal genes and primers designed to create in-frame deletions upon excision of the resistance cassette. Of 4288 genes targeted, mutants were obtained for 3985. To alleviate problems encountered in high-throughput studies, two independent mutants were saved for every deleted gene. These mutants—the ‘Keio collection’—provide a new resource not only for systematic analyses of unknown gene functions and gene regulatory networks but also for genome-wide testing of mutational effects in a common strain background, E. coli K-12 BW25113. We were unable to disrupt 303 genes, including 37 of unknown function, which are candidates for essential genes. Distribution is being handled via GenoBase (http://ecoli.aist-nara.ac.jp/).
The long-term goal of biomedical research has always been the complete understanding of biological systems. In the last century, reductionist approaches proved immensely powerful in elucidating many biochemical, genetic, and molecular mechanisms. In this century, we are entering a more synthetic phase in which we will accomplish the goal of completely understanding biological systems in their incredible living complexity. This understanding will be expressed in a number of models, ranging from traditional biological understanding (where individuals construct models in their heads) to formal mathematical models. In any case, reaching a complete understanding requires an unprecedented standardization and completeness of data, greatly improved methods of accessing and linking information, and improved techniques and approaches for mathematical modeling.
E. coli K-12 is the best-characterized organism at the molecular level. In the accompanying report, we describe its highly accurate sequence (Hayashi et al, 2006), which is perhaps more accurate than that of any other genome of similar size and may even be error free. Determination of a highly accurate sequence provided the impetus for re-annotation of its genome (Riley et al, 2006), which is of fundamental importance to studies not only of E. coli biology but also of other organisms because properties of more than half of its gene products have been experimentally determined.
More than a half-century of experimental investigation has led to the identification of nearly all the metabolic reactions and the small molecule metabolites involved therein. Many of the regulatory circuits have been identified and computational methods for the prediction of many regulatory sites are available. It is thus a truism that ‘… all cell biologists have two cells of interest, the one they are studying and Escherichia coli’ (Neidhardt, 1996). E. coli has the further advantage of being a simple unicellular organism without as extensive an elaboration of compartments and transport mechanisms as are present even in simple eukaryotes such as yeast (Figure 7, Holden, 2002). The completeness of our knowledge and the relative simplicity of E. coli provide compelling reasons for choosing it as the first cellular system to be targeted for complete understanding. This was clearly seen by Francis Crick when in 1973 (Crick, 1973) he proposed ‘Project K: the complete solution of E. coli.’ Of course, his suggestion was hopelessly premature, predating many key technologies, rapid computation, and the web (Crick, 2002).
With the goal of a complete understanding of E. coli as a simple cellular system, we have begun the construction of uniformly designed and comprehensively prepared resources. Here, we describe a complete set of precisely defined, single-gene deletions of nonessential E. coli K-12 genes. These mutants were constructed by using a PCR gene replacement method similar to the one used to create a nearly complete set of yeast gene mutants (Giaever et al, 2002), except that we used E. coli cells carrying a plasmid expressing the highly efficient λ Red recombinase (Datsenko and Wanner, 2000) (Figure 8).
Deletions were obtained for 3985 of 4288 targeted genes. Based on finding mutants with the predicted structures, the majority of these 3985 genes are probably nonessential. Because a small fraction (ca. 0.2%) of cells are predicted to contain genetic duplications (Anderson and Roth, 1977), a small number of these 3985 genes may in fact be essential. The majority of the 303 genes for which no mutants were obtained are candidates for essential genes, at least under our selection conditions (aerobic growth on a complex medium at 37°C).
In bacteria, genes are often arranged in operons that are transcribed as a unit and in which neighboring genes frequently overlap by a few to several nucleotides. In such arrangements, mutation of a single gene can simultaneously affect the function of neighboring or downstream genes. To circumvent these problems, mutants were designed with gene organization in mind, so that each mutation avoids affecting the properties of more than one gene. All mutants contain a kanamycin resistance cassette in place of the gene coding region. In most cases, the coding region from the 2nd codon through the 7th codon from the C-terminus has been deleted. The kanamycin resistance gene is oriented for expression of downstream genes (Figure 8A). Further, the mutants were constructed by use of a resistance cassette that can be easily eliminated (Datsenko and Wanner, 2000). The resultant kanamycin-sensitive derivatives are predicted to encode a small in-frame peptide in place of the mutated gene, in order to reduce effects on expression of downstream genes (Figure 8B).
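The deletion boundaries described above can be illustrated with a minimal sketch. It is not the primer-design procedure used to build the collection; the function name `deletion_span` is hypothetical, and it assumes that the open reading frame length includes the stop codon and that "the 7th codon from the C-terminus" is counted among sense codons only, so that the start codon, the last six sense codons, and the stop codon are retained.

```python
def deletion_span(orf_len_nt):
    """Return the deleted region as 1-based, inclusive nucleotide
    coordinates relative to the first base of the start codon.

    Assumptions (not stated in the text): orf_len_nt includes the stop
    codon, and codons are counted from the C-terminus without the stop.
    """
    if orf_len_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")

    n_sense = orf_len_nt // 3 - 1      # sense codons, stop codon excluded
    if n_sense < 8:
        raise ValueError("ORF too short for this deletion design")

    first_deleted_codon = 2            # the start codon (codon 1) is kept
    last_deleted_codon = n_sense - 6   # 7th codon from the C-terminus

    start_nt = (first_deleted_codon - 1) * 3 + 1
    end_nt = last_deleted_codon * 3
    return start_nt, end_nt


# Hypothetical ORF of 100 sense codons plus a stop codon (303 nt).
start, end = deletion_span(303)
deleted_len = end - start + 1
print(start, end, deleted_len)
```

Because the deleted span always covers a whole number of codons, the retained start codon and C-terminal codons stay in the same reading frame, which is the property the in-frame design relies on once the cassette is excised.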
Results of profiling the mutants for growth on synthetic and rich media are described in the manuscript. These mutants provide a new basic resource not only for systematic functional genomics studies but also as an experimental data source for systems biology approaches. By providing this resource openly to the research community, we hope to contribute to worldwide efforts directed towards a comprehensive understanding of the E. coli K-12 model cell. Accordingly, we are making the entire mutant collection freely available for nonprofit, noncommercial use via GenoBase (http://ecoli.aist-nara.ac.jp) for the cost of duplication and shipping. Commercial and for-profit investigators should contact one of the corresponding authors directly.