DNA replication in Escherichia coli is normally initiated at a single origin, oriC, dependent on initiation protein DnaA. However, replication can be initiated elsewhere on the chromosome at multiple ectopic oriK sites. Genetic evidence indicates that initiation from oriK depends on RNA-DNA hybrids (R-loops), which are normally removed by enzymes such as RNase HI to prevent oriK from misfiring during normal growth. Initiation from oriK sites occurs in RNase HI-deficient mutants, and possibly in wild-type cells under certain unusual conditions. Despite previous work, the locations of oriK and their impact on genome stability remain unclear. We combined 2D gel electrophoresis and whole genome approaches to map genome-wide oriK locations. The DNA copy number profiles of various RNase HI-deficient strains contained multiple peaks, often in consistent locations, identifying candidate oriK sites. Removal of RNase HI protein also leads to global alterations of replication fork migration patterns, often opposite to normal replication directions, and presumably eukaryote-like replication fork merging. Our results have implications for genome stability, offering a new understanding of how RNase HI deficiency results in R-loop-mediated transcription-replication conflict, as well as inappropriate replication stalling or blockage at Ter sites outside of the terminus trap region and at ribosomal operons.
The normal cycle of DNA replication in Escherichia coli is a highly coordinated process that begins with the DnaA-mediated opening of the DNA duplex at the chromosomal origin site oriC (for a review, see Mott and Berger, 2007). Following the assembly of replication machinery at the site, replication proceeds bidirectionally around the circular chromosome until the two replication forks meet in the terminus region, generally within the fork trap between TerA and TerC (Duggin and Bell, 2009).
As the key bacterial initiation protein, DnaA is generally essential for cell viability. However, E. coli cells can utilize an alternative mode of DNA replication, constitutive stable DNA replication (cSDR), that is independent of DnaA or concomitant protein synthesis (Kogoma, 1997). cSDR can drive chromosomal DNA replication in knockout mutants of the rnhA and recG genes, which encode RNase HI and RecG respectively (Ogawa et al., 1984; Hong et al., 1995). These two proteins are involved in the removal of RNA from R-loops (RNA-DNA hybrids in otherwise duplex DNA) that form on the chromosome. While cSDR is normally repressed under physiological conditions, it may be activated and play an important role even in wild-type cells under unusual conditions such as entry into stationary phase or replication after DNA damage (Hong et al., 1996; Camps and Loeb, 2005; also see below).
Based on these and other observations, the late Tokio Kogoma and colleagues proposed that replication initiation via cSDR involves specific chromosomal sites called oriK, where R-loops form by invasion of duplex DNA with a homologous RNA transcript (von Meyenburg et al., 1987; Kogoma, 1997). In this model, the invading RNA strand acts as a primer for replication initiation and the displaced DNA strand serves as an assembly site for the replicative helicase. While oriK sites have never been precisely identified, Kogoma's group attempted to map them using an imprecise chromosomal marker frequency approach that utilized 21 hybridization probes around the chromosome (de Massy et al., 1984a; see Masters and Broda, 1971, for the first use of copy number analysis in analysing E. coli replication). Based on the ratio of copy number of these probes in exponentially growing versus resting rnhA− ΔoriC cells, they argued for the existence of four or five oriK sites in the chromosome, mapped to broad (150–200 kb) regions (de Massy et al., 1984a). Each oriK site was argued to be quite weak in replication activity relative to that of oriC. Interestingly, the terminus region of the chromosome showed the highest copy number in rnhA− ΔoriC cells, which led the authors to conclude that at least two oriK sites exist in this region. Kogoma's group made great progress towards understanding the protein requirements for cSDR, which include DnaB, DnaC, DnaG, PriA, DNA polymerases I and III, and RecA (but no other recombination protein) (see Kogoma, 1997 for review). However, the true nature of oriK sites, proposed to be transcription units prone to R-loop formation, is still unknown.
A better understanding of cSDR may illuminate important aspects of chromosomal replication, its connection to chromosome segregation and the cell cycle, and genome stability. Assuming that multiple weak oriK sites exist around the chromosome, rnhA− cells, even when they have a functional oriC, should have replication forks running in the wrong direction relative to normal replication, an uncoordinated replication cycle, and opposing replication forks meeting in unusual places around the chromosome. Notably, E. coli strains with a reduced capacity to remove R-loops are unhealthy, as evidenced by the SOS-constitutive phenotypes of rnhA− and recG− mutants, as well as the inviability of rnhA recG double mutants (Hong et al., 1995). Intriguing hotspots for homologous recombination (Hot sites), mostly in the chromosomal terminus region, are also activated in rnhA− E. coli cells (Nishitani et al., 1993; Horiuchi et al., 1994). A subset of these Hot sites are adjacent to Ter sites flanking the terminus region, and the induced recombination was shown to be dependent on the Tus protein (Horiuchi et al., 1994). These results led to a model in which aberrant replication initiates from R-loops, the resulting forks stall at the Ter sites for abnormally long times, and breakage at the stalled forks triggers RecA/RecBCD-dependent homologous recombination (later results suggested that a second fork from the same direction could collide with the blocked fork to generate the broken end; Bidnenko et al., 2002).
In a more general sense, replication disturbances are responsible for genome instability caused by R-loops (see Discussion) (French, 1992; Rudolph et al., 2009; 2010; Boubakri et al., 2010; Srivatsan et al., 2010; Gan et al., 2011; De Septenville et al., 2012; Wimberly et al., 2013). Indeed, R-loops have been linked to DNA rearrangements, double-strand break (DSB) formation, and transcriptional elongation defects in species ranging from yeast to humans (Aguilera and Garcia-Muse, 2012). Wimberly et al. (2013) recently provided evidence that R-loop-triggered replication in starving phase (wild-type) E. coli leads to DNA breaks when the fork encounters nicks in the template, resulting in stress-induced mutation and amplification. This study not only highlights the importance of R-loops in genome instability but also provides evidence for R-loop-dependent replication in wild-type (non-growing) cells. Recent work has also begun to reveal a connection between genomic instability and the formation of RNA-DNA hybrids in cancer cells (Potenski and Klein, 2011). Our study of the cSDR system in E. coli is aimed at improving our understanding of the mechanisms underlying the propagation of R-loop structures in the cell, and their effects on chromosomal integrity.
Our goals in these studies were to better understand both the initiation of replication from R-loops at oriK sites in the chromosome and the consequences of the aberrant replication that is thereby induced. The prior studies on cSDR and Hot sites focused our attention on the terminus region in our search for oriK sites. First, two of the major oriK sites identified by Kogoma were apparently located within this region, although their positions were only crudely defined. Second, Tus-dependent Hot fragments were argued to result from the activity of nearby oriK sites, and Tus-independent Hot fragments in the terminus region could conceivably be dependent on oriK sites within (or, again, close to) the Hot fragment (Horiuchi et al., 1994). Third, focusing on the terminus region provides an additional tool that can be used to assess oriK activity: the accumulation of blocked forks at Ter sites during replication termination.
By employing the technique of two-dimensional agarose gel electrophoresis (Friedman and Brewer, 1995), we were able to visualize and quantify stalled replication forks on the non-permissive side of Ter sites in RNase HI-deficient cells. We attempted to localize oriK sites by inserting artificial Ter sites at specific locations within the terminus region and observing the effects on the proportion of forks stalled at the natural TerB site downstream. We then went on to use next-generation sequencing (NGS) analyses of the genomes of various rnhA− strains to identify regions of origin activity and to analyse the replication dynamics when forks are initiated at sites other than oriC. The use of NGS with deep read coverage provides a high level of accuracy in determining copy number around the E. coli chromosome – for example, pinpointing the location of oriC in dnaA+ cells to within a few kb of its known location. The NGS results show a dramatic reversal in apparent replication directions in an rnhA−dnaA− double mutant, with most forks apparently initiating in the general vicinity of the terminus region and replication often terminating in the general vicinity of oriC. We also found dramatic discontinuities in DNA copy number, implying replication fork stalling or blockage, at Ter sites outside of the terminus trap region and at ribosomal operons. Finally, the NGS profiles under several conditions strongly suggest that the E. coli chromosome contains multiple oriK sites, with some specific candidate locations evident from small peaks in the profiles.
Blocked forks at Ter sites accumulate at greater levels in rnhA− strains
Previous work implicated the terminus region of the E. coli chromosome as a likely site of oriK elements (see Introduction). We therefore began our analysis of DNA replication in rnhA knockout E. coli by measuring the blocked forks at Ter sites. The termination events of replication in E. coli generally take place in an area known as the replication trap, a 267 kb region delimited by the innermost Ter sites, TerA and TerC. The ten known Ter sites, shown in Fig. 1, are sequences to which the termination protein Tus binds in a polar manner, blocking the progression of incoming replication forks from one direction but not the other. In the trap region, clockwise-moving forks stall first at TerC, while counter-clockwise forks do so at TerA. In order to visualize these stalled forks, we made use of two-dimensional (2D) gel electrophoresis, a technique that allows digested DNA fragments to be separated by shape as well as size, thus making it possible to distinguish between different kinds of replication intermediates. The collection of Y-shaped replication intermediates resulting from the movement of a replication fork through a restriction fragment produces a ‘Y arc’ on 2D gels, and stalling of forks at a specific location such as a Ter site results in a discrete spot on the Y arc, which can subsequently be quantified (relative to the linear restriction fragment; see Fig. 2A). This approach was previously used to visualize and quantify Tus-Ter-blocked forks in wild-type E. coli, and as expected, the highest fraction of blocked forks was found at the innermost TerA and TerC sites (0.19% and 0.85% respectively, relative to the linear restriction fragment; see Duggin and Bell, 2009).
As a result of the added oriK replication origin activity in rnhA− cells, a greater fraction of blocked replication forks are expected to be found at Ter sites in these cells than in the wild-type, presumably dependent on the location of oriK sites around the chromosome. For all experiments in this study, we grew cells in LB broth because rnhA− mutants express higher levels of SOS in LB than in minimal media, consistent with a higher level of cSDR activity (Kogoma et al., 1993; also see Usongo et al., 2013). We compared replication fork intermediates at the TerB and TerC sites of a wild-type and an rnhA− derivative of E. coli strain MG1655. These two sites were chosen because previous studies had suggested that the TerB-TerC region contains oriK site(s) (see Introduction), which could be particularly evident from an accumulation of blocked forks at TerB (the upstream TerC should block most forks originating from outside this region; see Fig. 1).
Growing cells were harvested and lysed in low melting point agarose plugs to extract their large genomic DNAs without shearing, and restriction digestions were performed in the plugs prior to 2D gel electrophoresis and Southern blotting with an appropriate probe. Each blot clearly revealed an accumulation of replication intermediates as a discrete spot at the appropriate location on the Y arc, indicative of stalled forks at the Ter site (Fig. 2B). The fraction of blocked forks was quantified (in repeated experiments) by dividing the signal of the blocked forks by the sum of the blocked-fork and linear-spot signals (Fig. 2C).
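The quantification described above reduces to a single ratio. The following Python sketch illustrates that arithmetic only; the function name and the example intensities are hypothetical and do not come from the blots in this study.

```python
def blocked_fork_fraction(blocked_signal, linear_signal):
    """Fraction of a restriction fragment's signal present as blocked forks.

    Both arguments are background-corrected spot intensities from the same
    blot (arbitrary units). The result is blocked / (blocked + linear).
    """
    total = blocked_signal + linear_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return blocked_signal / total


# Hypothetical intensities, chosen only to illustrate the calculation.
fraction = blocked_fork_fraction(blocked_signal=15.0, linear_signal=85.0)
print(f"{fraction:.1%}")
```

Because the denominator includes the blocked-fork signal itself, the fraction is bounded between 0 and 1 regardless of how much DNA was loaded.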
Quantification of blocked forks in the wild-type gave values modestly higher than those of Duggin and Bell (2009), perhaps due to differences in growth conditions (they also used MG1655 as the wild-type, arguing against a strain difference). We found that the fraction of forks blocked at the TerB and TerC sites was dramatically higher in the rnhA− mutant (11-fold and sevenfold respectively; Fig. 2C). Fully 15% of the TerC restriction fragment signal consisted of blocked forks, indicating that a clockwise fork often arrives at TerC well before a counter-clockwise fork – that is, a strong asymmetry in replication. A sizable fraction (5.7%) of blocked forks was also detected in the TerB restriction fragment; these forks must have originated from oriK site(s) within the TerB-TerC region and/or forks that originated upstream of TerC but somehow leaked through it.
In an attempt to distinguish between these two alternatives, we inserted additional co-oriented Ter sites within non-essential genes between TerB and TerC in the rnhA− background. We used a recombineering technique (Baba et al., 2006) to construct four different rnhA− strains containing a 23 bp TerA site insertion in place of the non-essential genes yneK, flxA, ynfL, or ydgH (see Fig. 3A). The rationale is that an additional Ter site should substantially reduce the number of clockwise forks that leak through TerC and subsequently arrive at TerB, but should not reduce blocked forks at TerB that originated from oriK site(s) between TerB and the inserted site. We confirmed that every inserted Ter site did accumulate blocked forks (i.e. was functional; data not shown), but we focused our quantitative analysis on the forks at the TerB site.
The insertion of a Ter site at yneK, flxA or ynfL reduced the percentage of blocked forks at TerB by about threefold, to values around 2%, with somewhat different averages but overlapping error bars (Fig. 3B). However, insertion of a Ter site at ydgH caused a more dramatic decrease, down nearly 10-fold (to 0.59%; Fig. 3B). We also tested double Ter site insertions, one pair at the upstream flxA and yneK sites and the other pair at ydgH and ynfL, closer to TerB (see Fig. 3A). The percentage of blocked forks at TerB averaged 1.4% with the pair of upstream Ter sites, but less than 0.4% with the pair of Ter sites inserted closer to TerB (Fig. 3B). All of these results are consistent with the presence of one or more oriK sites in the 8.8 kb ydgH-ynfL interval, forks from which can only be blocked by the leftmost (ydgH) Ter site insertion. However, an important caveat remains, since the efficiency of the various inserted Ter sites may not be equal to each other. Further experiments are needed to definitively test for the presence of the proposed oriK site(s) in this interval and the rest of the terminus region.
Deep sequencing analysis reveals excess replicated DNA in the TerA-TerC region of rnhA− cells
We next pursued a more global approach to analyse the chromosomal replication pattern of rnhA− strains, using next generation sequencing (NGS) in an approach similar to those used in recent studies (Muller and Nieduszynski, 2012; Rudolph et al., 2013). Several prior global studies using either microarrays or NGS have documented the highest copy numbers for wild-type E. coli in the region centred at oriC and the lowest copy numbers in the terminus region, as expected for bidirectional replication from oriC (Simmons et al., 2004; Breier et al., 2005; Kouzminova and Kuzminov, 2008; Sangurdekar et al., 2010; Skovgaard et al., 2011; Kuong and Kuzminov, 2012; Rudolph et al., 2013).
The analysis is illustrated first with wild-type MG1655 (Fig. 4). Samples were prepared from both growing and chloramphenicol-treated cells, sequence reads were aligned to the reference MG1655 genome, and the numbers of sequence reads within 100 bp windows were tabulated. Prior to tabulation, we eliminated all sequence bins that included repetitive DNA segments (e.g. rRNA operons, IS elements, REP elements) which cannot be aligned correctly. The tabulation generates a much more robust genome copy number profile than that from the prior hybridization analysis for oriK sites, which measured only 21 loci (de Massy et al., 1984a).
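As a rough illustration of the tabulation step, the Python sketch below counts reads in fixed 100 bp windows and drops any window that overlaps a repetitive segment. It is not the pipeline used in this study: assigning each read to a bin by its 0-based start coordinate, the half-open repeat intervals, and all function names are assumptions made for the example.

```python
BIN_SIZE = 100  # window size in bp, as stated in the text


def bin_read_counts(read_starts, genome_length, bin_size=BIN_SIZE):
    """Count reads per window, assigning each read by its 0-based start."""
    n_bins = (genome_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def masked_bins(repeat_intervals, bin_size=BIN_SIZE):
    """Indices of windows overlapping any half-open [start, end) repeat."""
    masked = set()
    for start, end in repeat_intervals:
        masked.update(range(start // bin_size, (end - 1) // bin_size + 1))
    return masked


def tabulate(read_starts, genome_length, repeat_intervals, bin_size=BIN_SIZE):
    """Return {bin index: read count}, omitting windows that touch a repeat."""
    counts = bin_read_counts(read_starts, genome_length, bin_size)
    skip = masked_bins(repeat_intervals, bin_size)
    return {i: c for i, c in enumerate(counts) if i not in skip}


# Toy example: a 400 bp "genome" with one repeat spanning positions 120-179.
profile = tabulate([5, 50, 150, 250, 260], 400, [(120, 180)])
print(profile)
```

In the toy example the second window is removed because it overlaps the repeat, while the empty fourth window is kept with a count of zero.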
The read count profile for chloramphenicol-treated MG1655 (oriC inactive) was, as expected, fairly uniform across the genome with no obvious trend (Fig. 4B; note that all NGS data are presented using Log2 scales). Spikes and outliers were observed above and below the mainstream, often in the same location in the two datasets (plus and minus chloramphenicol). These spikes and a component of the data scatter might be caused by read count artefacts (PCR or sequencing variations based on local sequence characteristics). We therefore calculated the ratio of the read counts in the growing cell sample to those in the chloramphenicol-treated sample for each bin (and also divided by the overall ratio of all aligned read counts in the two samples to standardize between curves). This resulted in a significant smoothing of the data, with few spikes remaining (Fig. 4C). We also generated a LOESS curve (locally weighted polynomial regression; red line) which engages the 1000 adjacent data points in each direction to approximate the overall copy number pattern.
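The per-bin correction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name is hypothetical, the overall ratio is taken from the sums of the supplied bin counts, bins with a zero count in either sample are simply skipped (the text does not say how such bins were handled), and the LOESS smoothing step is not reproduced.

```python
import math


def normalized_log2_ratios(growing, reference):
    """Log2 of (growing / reference) per bin, scaled by the overall ratio.

    growing, reference: read counts for the same bins, in the same order.
    Bins with a zero count in either sample yield None (an assumption).
    """
    if len(growing) != len(reference):
        raise ValueError("samples must cover the same bins")
    overall = sum(growing) / sum(reference)  # standardizes sequencing depth
    ratios = []
    for g, r in zip(growing, reference):
        if g > 0 and r > 0:
            ratios.append(math.log2((g / r) / overall))
        else:
            ratios.append(None)
    return ratios


# Toy counts: the growing sample has twice the total reads of the reference.
print(normalized_log2_ratios([200, 100, 100], [50, 50, 100]))
```

Dividing by the overall ratio centres the profile so that a bin with average relative coverage sits at zero on the Log2 scale, which is what allows curves from differently sequenced samples to be compared.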
The overall shape of the adjusted MG1655 genome profile was similar to those from the above-cited past studies, with a dominant peak near oriC and the lowest copy numbers in the terminus region (Fig. 4C). The peak of the LOESS curve is at 3916.2 kb, less than 8 kb from the actual location of oriC (3923.9 kb), and the lowest value was at position 1602.5 kb, very close to TerC (1607.2 kb). As discussed above, TerC is the Ter site with the highest frequency of blocked forks in the wild-type, consistent with most replication termination events occurring in the vicinity of TerC. The profile displayed a heightened slope for about 300 kb on each side of oriC. This result suggests that replication forks travel slower over this interval, or perhaps that there was some unusual perturbation of replication initiation under our conditions; further experiments are needed to clarify this point. All the subsequent experiments below will utilize the same manipulations to generate corrected datasets (always relative to the chloramphenicol-treated MG1655 data in Fig. 4B).
Turning to the analysis of rnhA− E. coli, we found that the oriC peak was poorly defined and that the copy numbers were more evenly distributed across most of the chromosome (Fig. 5A). Indeed, the main peak is no longer even centred at oriC, but rather covers a broad region from about 3900 kb to 4300 kb (Fig. 5A; see Discussion). The data also revealed the presence of a secondary peak in the replication fork trap/terminus region, which appears to be bounded by the TerA site on the left and TerC on the right (with a more subtle discontinuity at TerB; see blow-up of this region in Fig. 5B). This strain contains a functional oriC but is also active for oriK sites, which have previously been linked to the terminus region (see above). The peak in the fork trap region is consistent with the presence of one or more oriK sites in this region. Another possible interpretation, not mutually exclusive, is that oriK sites exist in both halves of the chromosome but fire in an uncoordinated manner. If, in some chromosomes, leftward (counter-clockwise) forks are blocked for prolonged periods at TerA, while in other chromosomes, rightward (clockwise) forks are blocked for prolonged periods at TerC, then each of these two classes of chromosomes would have two copies of replication fork trap DNA, two copies of DNA on the oriK-proximal side of the trap, but only one copy of the DNA distal to the fork trap region (2:2:1 and 1:2:2 for leftwards:trap:rightwards). Summing over an equal mix of the two classes would give a peak in the TerA-TerC interval (3:4:3) even if no single chromosome has replicated only that region. For both MG1655 and the rnhA− single mutant, similar (though less well-defined) patterns were also seen using a microarray approach to measuring genome copy number profiles (Fig. S1).
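The copy number argument above is simple arithmetic, and the Python sketch below works through it. The two tuples are the 2:2:1 and 1:2:2 classes from the text (leftwards, trap, rightwards); the function name and the idea of varying the mixing fraction are illustrative additions, not part of the analysis in this study.

```python
from fractions import Fraction

# Copy numbers for (leftwards, trap, rightwards) in the two chromosome classes.
CLASS_A = (2, 2, 1)
CLASS_B = (1, 2, 2)


def population_profile(fraction_a):
    """Average copy number per region for a mix of the two classes.

    fraction_a: share of chromosomes in CLASS_A (the rest are CLASS_B).
    """
    f = Fraction(fraction_a)
    return tuple(f * a + (1 - f) * b for a, b in zip(CLASS_A, CLASS_B))


# Equal mix: (3/2, 2, 3/2), which is the 3:4:3 ratio given in the text.
print(population_profile(Fraction(1, 2)))
```

For any mixing fraction strictly between 0 and 1, the trap region stays at two copies while both flanks fall below two, so the trap appears as a peak in the population average even though no individual chromosome has replicated only that region.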
The replication origin of a cryptic prophage within the TerA-TerC region, called oriJ, has previously been shown to induce autonomous plasmid replication in cells lacking the remainder of the prophage (Diaz and Pritchard, 1978; Diaz et al., 1979). We wondered whether oriJ might be responsible for replication from the terminus region in rnhA− cells, contributing to the strong peak in the terminus region. To test this possibility, we inserted an ectopic Ter site in the rzpR locus at position 1421.5 kb, upstream of both oriJ and TerA, and in the same orientation as TerA (see Fig. 5C). If oriJ is generating much of the excess DNA just to the right of TerA, the peak should be maintained, while if oriK sites to the right of rzpR are responsible, we should see the overall terminus region peak contract. Indeed, the inserted Ter site at rzpR now became the border of the high-copy terminus region, and no discontinuity was evident at TerA (Fig. 5C; Fig. S2). We conclude that the peak in the terminus region is not caused by a strong oriK in the TerA-rzpR interval, which includes oriJ, and this experiment also provides direct confirmation that the Ter sites are necessary for the striking discontinuities in the peak shape.
Next, analysis of DNA from an rnhA−dnaA− double mutant revealed a striking profile (Fig. 6A). The terminus region provided the peak, this time bordered by discontinuities at TerA and TerB (Figs 6B and S1; see Discussion). A novel discontinuity was also evident at TerE, well to the left of the normal terminus region (Fig. 6B). TerE is oriented to trap leftward-moving forks, suggesting that forks trapped at this site were initiated at oriK site(s) between TerE and TerD (1081–1279 kb). Also, the region with the lowest read counts was not far from oriC, with a notable increase in read counts in the region to the right of oriC (see below). A striking discontinuity in read counts was also evident very near or at the TerG site (2375 kb; Fig. S3). TerG is oriented to trap rightward-moving forks, suggesting that many forks trapped at this site were initiated at oriK site(s) to the left of TerG, likely between TerB and TerG (1682–2375 kb). No discontinuity was evident at TerF (2315.7 kb), about 60 kb upstream of TerG (with respect to clockwise-moving forks; Fig. S3). TerF is known to be a weak Ter site (Sharma and Hill, 1992; Duggin and Bell, 2009), perhaps explaining why forks accumulate at TerG rather than TerF. A similar, though less well-defined, profile was revealed by microarray analysis of the rnhA−dnaA− double mutant (Fig. S1C).
An independent way to generate cells that are replicating only from oriK sites is to treat the rnhA− single mutant with chloramphenicol, which inhibits replication from oriC but not oriK sites (Kogoma, 1978; von Meyenburg et al., 1987). The profile of DNA from this condition looked very similar to the profile of the rnhA−dnaA− double mutant above (compare Figs 7A and 6A; also see Fig. S4 for the profile of the rzpR::Ter insertion strain treated with chloramphenicol). The peaks in the terminus region were very similar, and both profiles showed the lowest region to the left of oriC (roughly from 3500 kb to 4000 kb). A closer look at the regions at each edge of this trough is revealing (Fig. 7B; see also Fig. S5). The rrnD operon (3426.8 kb) is at or very near a copy number discontinuity on the left, and the rrnE operon (4206.2 kb) is at a discontinuity on the right (Fig. 7B; Fig. S5). There also appear to be less dramatic discontinuities at or near rrnB and rrnA. Several past studies have shown that replication forks have particular difficulty travelling through oppositely oriented, actively transcribing rrn operons (see Discussion). All of these rrn operons are oriented in the same direction as replication in the wild-type E. coli, but would be oriented in the opposite direction for replication forks that emanate from an oriK element in or near the terminus region. Replication fork pausing or blockage at the rrn operons therefore provides a likely explanation for the discontinuities highlighted in Figs 7B and S5. We also note that the profile in this region is not strictly defined by discontinuities at rrn operons – this is particularly noticeable around rrnD, where gene copy number is changing in a progressive way on both flanks. Nearby oriK sites and/or fork stalling at sites surrounding rrnD presumably contribute to this aspect of the profiles.
Replication profiles of rnhA− strains in the absence of Tus-mediated fork blockage
Several of the read count discontinuities above occur very near or at Ter sites, and the introduction of a Ter site at rzpR created a new discontinuity as well. These results already show that fork blockage by Tus-Ter complexes plays a major role in the shapes of the profiles. In addition to the local effects on read counts caused by fork blockage at a particular Ter site, Tus-mediated fork blockage would likely have more global effects on the profile – for example, by extending the expanse of the chromosome replicated by a fork from a given oriK when its partner fork is blocked at a Ter site. This effect could be complex considering that multiple oriK elements are likely spread around the chromosome, presumably differing in strength from one another. To approach this issue, we constructed rnhA− and rnhA−dnaA− derivatives with a deletion of the tus gene and analysed their genomic profiles.
As expected, introduction of the tus mutation into both rnhA− strains led to a flattening of the terminus region peak and the disappearance of the discontinuities at Ter sites (Fig. 8A and B). The overall profile of the rnhA−tus− strain was somewhat flatter than the rnhA− single mutant (compare Figs 8 and 5A). This general flattening could result from the global effect mentioned in the paragraph above.
The overall shape of the rnhA−dnaA−tus− profile was similar to that of the double rnhA−dnaA− mutant, absent the discontinuities and peaks in the regions of Ter sites (compare Figs 8B and 6A). However, the change in profile caused by the tus mutation in the terminus region was informative – instead of a single strong peak in the TerA-TerB region, the two regions flanking TerA-TerB had noticeable peaks (LOESS maxima at 1099.8, 1278.3 and 1766.5 kb). This result implies that the dramatic peak covering the TerA-TerB region in rnhA−dnaA−tus+ cells is NOT caused primarily by one or more strong oriK site(s) within that region. The region still might contain one or more oriK(s), but they are not strong enough to cause this peak. Reflecting back on the differential hybridization analysis of de Massy et al. (1984a), these data also negate the evidence for strong oriK site(s) in the TerA-TerB interval.
So what do the data indicate about the location of oriK sites in the chromosome? Several features of the two tus− profiles perhaps provide clues. First, the highest LOESS value in the rnhA−tus− profile occurs not at oriC but at position 4273 kb, more than 300 kb clockwise from oriC (Fig. 8A). This peak could reflect the presence of oriK site(s) in this region and/or replication fork blockage at rrn and other operons (see Discussion). Second, in the rnhA−dnaA−tus− triple mutant, where oriC is non-functional, the peak region of the entire genome is broadly in the region of 700 to 1600 kb, with the maximum LOESS value at 1278 kb (Fig. 8B). This peak region is not very sharply defined, consistent with the presence of two or more oriK sites. As in the tus+ strain, the genome pattern in this mutant is basically the opposite of normal, with the oriC region having the lowest copy number. Third, several discrete ‘bumps’ on the LOESS curves coincide in the two different tus− profiles; the most obvious is near 1760 kb, with the exact peak values being within 10 kb of each other in the two profiles (Fig. 8A and B; also see Discussion).
Hydroxyurea treatment amplifies origin peaks in the genome profile
Hydroxyurea (HU) inhibits the enzyme ribonucleotide reductase, leading to depletion of deoxyribonucleotides (dNTPs) in the cell and inhibition of replication. Kuzminov and colleagues recently showed a dramatic sharpening of the oriC peak with prolonged HU treatment, as measured by microarray analysis of genome copy number (Kuong and Kuzminov, 2012). Apparently, oriC initiation can continue in the presence of HU but the forks so initiated become stalled or move very slowly as they attempt to traverse the chromosome. This seemed like an excellent tool to try to define the location of oriK sites more precisely.
We began by treating wild-type MG1655 with HU for 4 h and analysing the genome profile. In agreement with the Kuzminov study, we found a very dramatic and sharp profile centred on oriC (with significantly less scatter than in the microarray data of Kuong & Kuzminov, 2012) (Fig. 9A). The LOESS-averaged intensity at the oriC peak was about fourfold higher than the intensity near the bottom of the peak (about 500 kb in either direction) and 8.6-fold higher than the lowest intensity near position 1591 kb. The highest LOESS value was at position 3925.8 kb, less than 2 kb from the known position of oriC, attesting to the high precision of the data. The median distance travelled by forks from oriC should be roughly the distance where the copy number drops in half, which corresponds to about 250 kb in either direction (261 kb clockwise and 245 kb counter-clockwise).
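The half-height estimate of fork travel can be illustrated with a short Python sketch. It assumes a smoothed profile sampled at increasing distances from the peak in one direction, with copy numbers on a linear scale (on the Log2 scale used in the figures, halving corresponds to a drop of one unit). The function name, the linear interpolation between samples, and the toy values are assumptions for illustration and are not taken from the data.

```python
def half_height_distance(distances_kb, copy_numbers):
    """Distance from the peak at which copy number first falls to half.

    distances_kb: increasing distances from the peak, starting at 0.
    copy_numbers: linear-scale copy number at each distance; the first
                  value is the peak height.
    Returns the interpolated distance, or None if half height is not reached.
    """
    half = copy_numbers[0] / 2
    for i in range(1, len(distances_kb)):
        if copy_numbers[i] <= half:
            d0, d1 = distances_kb[i - 1], distances_kb[i]
            c0, c1 = copy_numbers[i - 1], copy_numbers[i]
            # Linear interpolation between the two flanking samples.
            return d0 + (c0 - half) * (d1 - d0) / (c0 - c1)
    return None


# Toy profile: the curve crosses half of its peak value between 200 and 300 kb.
print(half_height_distance([0, 100, 200, 300], [8.0, 6.0, 5.0, 3.0]))
```

Applying the same function separately to the clockwise and counter-clockwise sides of a peak would give the kind of paired distances quoted above.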
A prominent oriC peak is also evident in the profile of the rnhA− strain treated with HU, as would be expected (Fig. 9B). Again, the highest LOESS value in this peak (position 3922.4 kb) was within 2 kb of the known position of oriC. Also similar to the MG1655 profile, the oriC peak in the rnhA− curve drops by half at about 250 kb in either direction from oriC (250 kb clockwise and 241 kb counter-clockwise).
The terminus region of the rnhA− strain treated with HU (Fig. 9B) shows a modest peak which is essentially eliminated in the rnhA−tus− strain treated with HU (Fig. S6). The shapes of these two profiles imply that oriK sites distant from oriC are active. First, the small peak in the terminus region of the rnhA− single mutant seems to reflect some oriK activity in the region to the right of TerA, since the copy number is higher on that side. Second, the entire oriC-distal region of this profile is not as depressed as in the MG1655 profile, with a ratio between the copy number at oriC compared with the lowest point (near position 1228 kb) being only about 4.8-fold (compared with 8.6-fold in MG1655; see above). Third, since the terminus peak is dependent on the presence of Tus, the peak presumably consists of Tus-blocked forks that were triggered by origins that are relatively nearby (since forks travel only about 250 kb on average in the presence of HU). These results support the conclusion that multiple weak oriK sites exist in the general region of the terminus (e.g. within a few hundred kb of the replication fork trap).
The copy number profile of the rnhA−dnaA− strain treated with HU argues that oriK activity is occurring throughout the genome, with the greatest activity in the region between about 1200 and 2400 kb (which has the highest average copy number in the genome; Fig. 9C). Rather than a few discrete peaks shaped like the oriC peak (only smaller), the profile is quite complex, with multiple peaks and slope changes. Small but noticeable peaks do occur with high LOESS values at about 1483 kb; 1908 kb; 2592 kb; 2832 kb; 3234 kb; and 4397 kb. It seems likely that some or all of these peaks reflect the activity of specific oriK sites, but stronger evidence is needed to confirm this inference. It is also likely that both Tus-Ter complexes and rrn operons have some effects on the overall profile. Based on both the rnhA− and rnhA−dnaA− profiles, we infer that multiple oriK elements are spread around the chromosome and each initiates replication at a much lower frequency than does oriC.
Global replication profiles in rnhA− E. coli
In this study, we analysed the replication profiles of wild-type and various rnhA− mutants of E. coli using a precise NGS approach, backed up by 2D gel and microarray approaches. Our studies revealed a dramatic increase in the levels of blocked forks at Ter sites in the terminus region when the rnhA gene is disrupted (Fig. 2), implying poorly coordinated replication in the rnhA− mutant (i.e. the two oppositely oriented replication forks arrive at these Ter sites at very different times). NGS and microarray analyses confirmed the unusual replication behaviour of rnhA− and particularly rnhA−dnaA− mutants. Wild-type cells show little or no excess of DNA in the replication fork trap interval (TerA-TerC), consistent with well-coordinated replication (Duggin and Bell, 2009; also see Figs 4C and S1A). However, both of the rnhA− strains show a prominent peak in this region (Figs 5, 6, S1B and C), which could be caused by oriK sites in the trap interval and/or uncoordinated replication (with some chromosomes having TerA-trapped clockwise forks and others having TerC-trapped counter-clockwise forks). The sharp peak delineated by TerA and TerC, however, was essentially abolished in tus− derivatives of the two rnhA− strains (Fig. 8). In the rnhA−dnaA−tus− triple mutant, peaks appeared in the regions flanking TerA-TerB rather than within that region. This argues that blockage of forks from outside the TerA-TerB interval plays a major role in the formation of the terminus peak in Tus-containing cells.
Additional results from the NGS analysis clearly reveal fork blockage by Tus-Ter complexes in rnhA− strains. The discontinuity at TerA in the rnhA− single mutant is abolished and replaced by a similar discontinuity at an artificial Ter site at the upstream rzpR locus (compare Fig. 5B and C). In addition, new and very striking discontinuities are evident at TerE, TerB, and TerG in the rnhA−dnaA− double mutant (Fig. 6; Fig. S3).
The tus− strains provide an informative view by eliminating the effects of Ter sites. The rnhA−dnaA−tus− strain shows a peak of copy number at 1278 kb, in the region nearly opposite oriC, and the lowest copy number at 3883 kb, only about 40 kb from oriC (Fig. 8B; note that all comparisons of copy number in this Discussion are based on the LOESS regression values). This result implies that much of the DNA replication in this strain is opposite in direction to that of wild-type strains. The oriC-competent rnhA−tus− mutant shows the flattest genomic profile of any of the tested strains, with only a 1.29-fold difference between the most and least represented sequences (Fig. 8A). This is not surprising, since the rnhA−dnaA−tus− triple mutant shows nearly the reverse pattern from wild-type but the rnhA−tus− double mutant should have an active oriC.
The NGS analysis presented here also contributes to our understanding of the replication-transcription conflict. The seven ribosomal RNA operons (rrn) of E. coli are co-oriented with the direction of DNA replication to minimize head-on collisions between the fork and active transcription complexes in rrn operons (for reviews, see Brewer, 1988; Rudolph et al., 2007; Merrikh et al., 2012). French (1992) first showed that replication rate is reduced during a head-on collision, and studies with a plasmid-based in vivo system showed fork blockage upon encounter with an oppositely oriented transcription complex (Mirkin and Mirkin, 2005). In addition, Pomerantz and O'Donnell (2010) showed that E. coli replisomes have great difficulty replicating through oppositely oriented transcribing RNA polymerases in vitro. The ability of the bacterial replisome to traverse protein-bound DNA in vivo and in vitro was shown to be dependent on the helicases Rep and UvrD (Guy et al., 2009). Furthermore, Boubakri et al. (2010) showed that the helicases DinG, Rep and UvrD play overlapping roles in allowing replication fork progression through oppositely oriented, transcribing rrn operons in E. coli. Finally, replication fork progression in Bacillus subtilis was slowed by about 30% in wild-type cells when a segment of DNA was replicated in the unnatural direction, and this slowing was shown to be dependent on oppositely oriented transcription (Wang et al., 2007; also see Srivatsan et al., 2010).
We found the most obvious effect of rrn operons on replication profiles in the rnhA− strain when oriC function was inhibited by chloramphenicol (Fig. 7B; Fig. S4). The profiles showed some abrupt but fairly subtle discontinuities right at rrnA and rrnE, suggesting direct blockage of the replisome by the transcribing RNA polymerase. In each of these cases, copy number decreased in the direction expected for oppositely oriented replication forks (since forks are likely travelling in both directions in the rnhA− strains). However, there was a more dramatic decrease in copy number surrounding the rrn operons, apparently in both directions. For example, gradual declines are evident for about 100 kb before and 50 kb after rrnD (considering a rightward-moving fork in Fig. 7B). Additional experiments are needed to test whether these gradual declines are caused by rrn operon transcription – for example, by a mechanism involving transcription-driven supercoiling.
Where does replication initiate in the chromosomes of rnhA− E. coli?
One goal of this study was to localize the origin sequences responsible for constitutive stable DNA replication (cSDR) in E. coli, the so-called oriK sites. We first focused attention on the terminus region of the chromosome for reasons outlined in the Introduction. 2D gel analyses of the rnhA− mutant strain showed elevated levels of blocked replication forks at both TerB and TerC (Fig. 2). We were particularly struck by the level of blocked forks at TerB, since the TerC site, 75 kb upstream, would be expected to block most incoming clockwise forks that originated outside this region (see map in Fig. 1). This suggested that the TerB-TerC region itself might contain one or more oriK sites. Results with strains containing one or two additional co-oriented Ter sites inserted within this region were consistent with a possible oriK site in the 8.8 kb ydgH-ynfL interval, although this inference depends on the assumption that all inserted artificial Ter sites are about equally efficient. We attempted to confirm the existence of an oriK site in this region by flanking this region on both sides with oppositely facing Ter sites to create a replication ‘bubble trap’. While each of the two new Ter sites successfully blocked forks as shown by the accumulation of Y-form DNA in 2D gels, we did not detect any bubble molecules in restriction fragments that contained the two Ter sites and intervening DNA (Maduike, 2012). This result argues against an oriK in this small interval, unless some unknown problem prevents formation or detection of the expected bubbles.
Our global analysis using NGS and microarrays changed our perspective on oriK sites in the terminus region. The prominent peak in the TerA-TerC replication trap region of rnhA− strains (Figs 5, 6, 7A, and Figs S1, S2 and S4) was essentially abolished in the rnhA−tus− double mutant and very much broadened (with local peak positions outside of TerA-TerC) in the rnhA−dnaA−tus− triple mutant (Fig. 8). These results argue that the peak within TerA-TerC does not primarily result from DNA molecules that have undergone bidirectional replication from an oriK within the TerA-TerC region, but rather an accumulation of some molecules with clockwise forks blocked at TerC and other molecules with counter-clockwise forks blocked at TerA. These results likewise argue against the prior identification of two oriK sites in the terminus region (de Massy et al., 1984a) and the putative oriX site in this same region (de Massy et al., 1984b); both studies relied on relative copy number measurements but with many fewer data points, and both sets of measurements would have been similarly impacted by Tus-mediated fork blockage. We also obtained evidence that the cryptic phage replication origin oriJ does not function as an oriK element in rnhA− cells (Fig. 5; also see Maduike, 2012, for negative results with an oriJ bubble trap experiment).
A surprising finding from this study is that the overall copy number of the three rnhA− strains peaked about 350 kb clockwise from oriC (Table 1, positions near 4300 kb). This result raises the possibility that an oriK site, stronger than oriC, is localized in this region. A different interpretation is that this region contains multiple elements that can block forks travelling in either direction, with a resulting accumulation of clockwise forks that originated from oriC, and counter-clockwise forks that originated from oriK sites distal to this region. If the replicated regions in the two sets of molecules overlap, a peak could be generated (much like the peak in the terminus region in Tus-proficient rnhA− cells). This region contains multiple rrn operons, which can impede replication from the counter-clockwise direction (see above). Furthermore, Gan et al. (2011) recently showed that an R-loop that is co-directional with a replication fork in a ColE1-based plasmid can stall replication. Thus, in the rnhA− mutant, fork blockage in either direction (in different chromosomes) could be sufficient to explain the unusual peak.
Table 1. Some detectable peaks in rnhA− cells (LOESS curve maxima)
All numbers in genome coordinates (kb). Bold entries are particularly strong peaks. This list is not exhaustive, but highlights peaks seen in multiple profiles. The oriC peaks, when present, are not included in this table but are mentioned directly in the text.
Profiles listed: rnhA, + CAM; rnhA dnaA tus; rnhA rzpR::Ter, + CAM; rnhA, + HU; rnhA dnaA, + HU; rnhA tus, + HU.
What have we learned about the location of oriK sites around the chromosome? Their positions should be most evident in DNA from cells in which the powerful oriC is not functional – i.e. rnhA−dnaA− double mutants and rnhA− mutants treated with chloramphenicol. First, the discontinuity at the TerE site suggested the presence of oriK site(s) between TerE and TerD (1081–1279 kb; see above). Second, the discontinuity at TerG suggested the presence of oriK site(s) between TerB and TerG (1682–2375 kb; see above). Third, the rnhA−dnaA− double mutant showed a peak in the terminus region bordered by TerA and TerB, while the peak in the rnhA− single mutant was bordered by TerA and TerC (compare Figs 6B and 5B). The discontinuity at TerC in the single mutant argues that TerC is efficient at collecting clockwise forks from oriC, which suggests that the prominent discontinuity at TerB in the double mutant results from oriK site(s) in the TerC-TerB region (1607–1682 kb). 2D gel approaches also provided evidence, albeit equivocal, for oriK site(s) in this interval (Figs 2 and 3). Fourth, the overall peak in the NGS genome profile of the rnhA−dnaA−tus− cells was broadly in the region of about 700–1600 kb (Fig. 8B), suggesting that oriK site(s) exist in this region. Fifth, the profiles of the rnhA−tus− and rnhA−dnaA−tus− cells, which have no fork blockage at Ter sites, show several discrete bumps (maxima in the LOESS curves) that we hypothesize represent oriK sites, the most obvious near position 1760 kb (Fig. 8). Sixth, modest bumps are also seen in the profiles of rnhA−dnaA− cells and rnhA− cells (with or without the rzpR::Ter insertion) treated with chloramphenicol, including several that show up in all three of these profiles (positions near 850 kb, 1840 kb and 3230 kb; TerA-TerC region peak not included since it is likely created by Tus-mediated fork blockage). 
Table 1 presents a summary of locations that repeatedly appeared as LOESS curve maxima in the various profiles; these are good candidate regions for oriK sites at diverse locations around the E. coli chromosome.
Following up on the recent microarray study from Kuzminov's group (Kuong and Kuzminov, 2012), we confirmed that HU treatment is a powerful tool to use in the analysis of genome replication. We found that the average distance travelled by replication forks from oriC in wild-type cells during a 4 h HU treatment was only about 250 kb, and that approximately 8.6-fold more DNA accumulated at oriC than at distant regions (Fig. 9A). The rnhA− cells treated with HU showed a prominent peak at oriC, and their overall NGS profile was consistent with the existence of multiple oriK sites throughout the chromosome (Fig. 9B). Finally, perhaps the most informative profile was from the rnhA−dnaA− cells treated with HU (Fig. 9C). As with the profiles from these cells without HU treatment, the overall peak of the NGS profile was in the terminus region (1483 kb). This is located in the replication fork trap area (TerA-TerC), and thus fork blockage at Ter sites likely contributes. The profile shows five other maxima (see Table 1) that are consistent with oriK site activity, two of which are near maxima that were consistently detected in rnhA−dnaA− cells and rnhA− cells treated with chloramphenicol (maxima at 1907.5 kb and 3233.7 kb). The LOESS curve in the HU-treated rnhA−dnaA− cells was very bumpy, suggesting additional weak oriK sites that do not form maxima and likely also fork blockage at multiple sites such as Ter and rrn operons.
Very recently, Leela et al. (2013) used bisulphite reactivity in an attempt to map R-loops in the chromosomes of E. coli MG1655 and a nusG derivative. They extracted chromosomal DNA from the cells and treated it with bisulphite, which deaminates C residues in single-stranded regions, converting C to T (or G to A on the opposite strand after amplification). Surprisingly, they found that approximately 7% of sequence reads in the wild-type showed C to T or G to A conversions from bisulphite treatment, indicative of single-stranded character, and provided supporting evidence that many of these conversions were dependent on R-loop formation. Such a high level of R-loop formation in wild-type E. coli seems incongruous with the normal physiology of wild-type cells compared with rnhA− mutants (see below), and this discrepancy remains to be explained. If confirmed, these candidate R-loop locations provide another clue to the locations of oriK sites (although it is not clear that all R-loops would be functional as oriK sites).
Pathology resulting from removal of RNase HI
Mutations that inactivate E. coli RNase HI cause serious pathology, including an SOS-constitutive phenotype that presumably reflects spontaneous DNA damage, and synthetic lethality with recG mutations (see Introduction). These deleterious effects presumably result from aberrant R-loop formation, but the exact pathways involved in pathology are still uncertain. Recent studies have highlighted the deleterious effects of aberrant R-loop formation, including replication fork blockage, mutagenesis, DNA breakage, and DNA rearrangements (Boubakri et al., 2010; Gan et al., 2011; Helmrich et al., 2011; Stirling et al., 2012; Wimberly et al., 2013) (for a review, see Aguilera and Garcia-Muse, 2012).
Our analysis of chromosomal replication in the absence of RNase HI illuminates some aspects of the RNase HI-deficient physiology. First, the data directly show that rnhA− cells have an altered (and much flatter) profile of gene copy number around the chromosome compared with the wild-type (compare Fig. 4C with Fig. 5A; Fig. S1). Highly expressed genes are concentrated in the early replicating region of the E. coli chromosome, and their higher copy numbers in rapidly growing cells augment their expression (Chandler and Pritchard, 1975; Sousa et al., 1997; Couturier and Rocha, 2006). The rnhA− cells may suffer physiological consequences due to alterations in expression caused by aberrant gene ratios. Second, 2D gel analysis and both microarray and NGS profiles showed that rnhA− cells have a much higher steady-state level of replication forks blocked at Ter sites. Fork blockage at Ter sites can lead to both homologous and illegitimate recombination, both presumably resulting from DNA breaks (Bierne et al., 1991; 1997; Horiuchi et al., 1994). A defined pathway of break formation at Tus-blocked forks involves the arrival of a second replication fork from the same direction (Sharma and Hill, 1995; Bidnenko et al., 2002). Third, our NGS analysis showed evidence for fork stalling or blockage at rrn operons and in their general vicinity (see above). Aguilera and Garcia-Muse (2012) discuss in detail the possible pathways of DNA breakage and genomic instability resulting from fork blockage during transcription of ribosomal RNA genes. Finally, the existence of oriK sites throughout the chromosome implies that oppositely oriented forks will meet at many locations, presumably in a fashion that is uncoordinated with the cell division cycle. These aberrant replication termination events may contribute to the pathology of rnhA− cells (also see Wimberly et al., 2013 for analysis of R-loop consequences in non-growing E. coli).
With these considerations, it is perhaps surprising that RNase HI-deficient cells, and particularly derivatives that cannot utilize oriC, are even viable.
While this manuscript was under review, another group published a study on the aberrant replication that occurs in recG− E. coli cells using genetic and NGS copy number approaches (Rudolph et al., 2013). RecG-deficient cells also undergo cSDR, although the mechanism and locations of cSDR could differ between recG− and rnhA− cells, and RecG inactivation is not sufficient to suppress the absence of oriC function (Kogoma, 1997). In general, the genome profiles of various recG− mutant cells were reminiscent of the rnhA− profiles reported here, with a dramatic peak in the terminus region and a reversed pattern (terminus to oriC) when oriC function was abolished (Rudolph et al., 2013). However, Rudolph et al. (2013) concluded that much of the replication in recG− cells occurs by an aberrant pathway in which colliding replication forks trigger new rounds of DNA synthesis, a pathway prevented by the normal function of RecG helicase in wild-type cells. Since all of the RNase HI-deficient cells in our study have functional RecG, it seems unlikely that replication fork collisions are triggering replication and contributing to the genome profiles reported here. Further studies of the unusual mode(s) of replication in RNase HI- and RecG-deficient cells are clearly warranted.
All strains used in this study were derivatives of E. coli MG1655 (F− λ− ilvG− rfb-50 rph-1). Strains AQ12251 (rnhA339::cat) and AQ12257 (rnhA339::cat dnaA850::Tn10) were kindly provided by Dr Steven Sandler (University of Massachusetts, Amherst); these strains also contain a deletion of the Eut/CPZ55 prophage. The six strains containing artificial Ter insertions (Fig. 3A) were all derivatives of rnhA− strain AQ12251.
Plasmids and primers
Plasmid pKD13 Plus was made by using the QuikChange II Site-Directed Mutagenesis Kit to insert the 23 bp TerA sequence (5′-AATTAGTATGTTGTAACTAAAGT-3′) at position 1311 on plasmid pKD13 (Baba et al., 2006). PCR primers used were 5′-GGAACTTCGAACTAATTAGTATGTTGTAACTAAAGTGCAGGTCGACGGATCCCCGG-3′ and 5′-CCGGGGATCCGTCGACCTGCACTTTAGTTACAACATACTAATTAGTTCGAAGTTCC-3′.
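As a quick consistency check on the mutagenesis primers listed above, the following sketch confirms that the two primers are exact reverse complements of each other and that the 23 bp TerA sequence sits between two pKD13-derived flanks. The variable and function names are illustrative only; the sequences are copied from the text.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


TER_A = "AATTAGTATGTTGTAACTAAAGT"
FORWARD = "GGAACTTCGAACTAATTAGTATGTTGTAACTAAAGTGCAGGTCGACGGATCCCCGG"
REVERSE = "CCGGGGATCCGTCGACCTGCACTTTAGTTACAACATACTAATTAGTTCGAAGTTCC"

# QuikChange-style primer pairs anneal to opposite strands of the same site
primers_are_complementary = reverse_complement(FORWARD) == REVERSE

# Splitting on the insert recovers the two plasmid-derived flanks
left_flank, right_flank = FORWARD.split(TER_A)
```

The same check is useful before ordering any complementary primer pair, since a single transcription error in one primer breaks the equality.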
To prepare the tetracycline-resistant Mini-λ plasmid for the recombineering experiments, DH10B E. coli cells containing the plasmid were grown up to exponential phase in 125 ml of LB at 30°C, and then incubated at 42°C to induce the excision of the plasmid from the chromosome (see Court et al., 2003). Plasmid DNA was then extracted from the cells using the Plasmid Midi kit from Qiagen.
Appropriate primers targeting the gene of interest on the chromosome were first used to generate a PCR product containing the artificial TerA site plus the kanamycin resistance gene from the plasmid pKD13 Plus. Primers typically consisted of a 50 nt 5′ end corresponding to a DNA sequence flanking the gene of interest in the chromosome (see Supporting information for Baba et al., 2006), followed by a 20 nt sequence (either 5′-TGTAGGCTGGAGCTGCTTCG-3′ for ‘UP’ or 5′-ATTCCGGGGATCCGTCGACC-3′ for ‘DN’) flanking the segment of the pKD13 Plus plasmid containing the kanamycin resistance gene and the artificial TerA site.
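The primer layout described above (a 50 nt chromosomal homology arm followed by a 20 nt pKD13 Plus priming sequence) can be sketched as below. The homology arm shown is a made-up placeholder, not a sequence used in this study, and the helper name is hypothetical; real arms would come from the sequence flanking the gene of interest.

```python
UP_PRIMING = "TGTAGGCTGGAGCTGCTTCG"  # 'UP' priming sequence from the text
DN_PRIMING = "ATTCCGGGGATCCGTCGACC"  # 'DN' priming sequence from the text


def build_recombineering_primer(homology_arm, priming_site):
    """Join a 50 nt homology arm (5' end) to a 20 nt priming site (3' end)."""
    arm = homology_arm.upper()
    if len(arm) != 50:
        raise ValueError("homology arm must be exactly 50 nt")
    if set(arm) - set("ACGT"):
        raise ValueError("homology arm contains non-ACGT characters")
    return arm + priming_site


# Placeholder arm for illustration only (NOT a real chromosomal sequence)
placeholder_arm = "ACGT" * 12 + "AC"
example_primer = build_recombineering_primer(placeholder_arm, UP_PRIMING)
```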
Cells containing the Mini-λ plasmid were grown to an OD599 of 0.5 with shaking at 32°C. The culture was then split into two, and one half was incubated at 42°C for 15 min to induce the Mini-λ plasmid, while the other was incubated at 32°C to serve as a negative control. The two sets were chilled on ice following their incubations and were used to prepare electrocompetent cells using ice-cold water or 10% glycerol. The recombineering PCR product generated above was then electroporated into both sets of electrocompetent cells, and kanamycin-resistant transformants were selected on LB agar plates containing kanamycin. Mock electroporations were also included for both sets of cells. A successful recombineering experiment produced kanamycin-resistant colonies from the induced cells but not from the uninduced or mock controls. Successfully recombineered strains were then screened by PCR, using primers flanking the original gene in the chromosome.
Preparation of genomic DNA embedded in agarose plugs
Escherichia coli cells were grown in LB at 37°C to an OD599 of 0.4. For each agarose plug to be made, 4 ml of cells were harvested by centrifugation at 6000 g for 10 min (4°C). Using the Bio-Rad CHEF Bacterial Genomic DNA Plug Kit (#170–3592), harvested cells were resuspended in Cell Suspension Buffer (10 mM Tris pH 7.2, 20 mM NaCl, 50 mM EDTA), equilibrated at 50°C, and mixed with a 2% low melting point agarose solution (also equilibrated at 50°C). The cell/agarose mixture was immediately poured into a plug mold and allowed to solidify at 4°C for 15 min. Afterwards, each plug was retrieved and incubated in a 1 mg ml−1 lysozyme solution for 2 h at 37°C, followed by an overnight incubation in a 1 mg ml−1 Proteinase K solution. Plugs were then washed four times in Wash Buffer (20 mM Tris pH 8.0, 50 mM EDTA) and used for two-dimensional gel electrophoresis. Extra plugs were stored stably at 4°C for up to 3 months.
Digestion of agarose-embedded genomic DNA
Agarose plugs containing genomic DNA were immersed in 1 ml TE buffer (10 mM Tris pH 8.0, 1 mM EDTA) and chilled on ice for 15–30 min, inverting occasionally. The TE buffer was then replaced with 300 μl of the appropriate 1× restriction endonuclease buffer, and chilled on ice again for 15–30 min. The restriction endonuclease buffer was then replaced with fresh buffer containing 100 units of the restriction enzyme, and the plugs were incubated overnight at 37°C.
Two-dimensional agarose gel electrophoresis
Following the overnight digestion, plugs were chilled on ice and the buffer was carefully removed. Plugs were then equilibrated in 1 ml of 0.5× TBE running buffer (44.5 mM Tris-HCl, 44.5 mM borate, 1 mM disodium EDTA) on ice for 15–30 min. Plugs were carefully loaded into the wells of a 12 × 14 cm 0.4% agarose gel and covered with molten 0.5% low melting point agarose. The wells were allowed to solidify for 15 min at 4°C, and the gel was run in 0.5× TBE buffer at 15 V for 24 h. Following this first-dimension run, the second dimension gel was prepared as outlined by Friedman and Brewer (1995). Briefly, a slice from each lane was carefully cut out from the first-dimension gel at appropriate positions containing the targeted restriction fragments and placed at a 90° counter-clockwise orientation in a 20 × 25 cm gel box. Molten 1% agarose containing ethidium bromide (0.3 μg ml−1) was then poured over the gel slices and allowed to solidify. This resulting 2D gel was run at 150 V for 15 h in 0.5× TBE buffer containing ethidium bromide (0.3 μg ml−1). In order to visualize replication intermediates, a Southern blot was performed on the 2D gel using radioactive DNA probes specific to the region(s) of interest. Probes were generated by PCR and were radiolabelled using the Random Primed DNA Labeling kit from Roche. All blots were visualized using the Storm 820 Phosphorimager (Molecular Dynamics) and quantified with ImageQuant software (Molecular Dynamics).
Deep sequencing analysis
DNA used for deep sequencing analysis was prepared by harvesting 4 ml of cells grown in LB to an OD599 of 0.5 and extracting genomic DNA using the Qiagen Genomic-tip 100/G kit and Genomic DNA Buffer Set. To prepare chloramphenicol-treated DNA samples, cells were grown to an OD599 of 0.5 and incubated with chloramphenicol (150 μg ml−1) for 90 min before harvesting. Hydroxyurea-treated samples were prepared by growing cells at 30°C in M9 CAA medium to an OD599 of 0.2, then adding hydroxyurea to a concentration of 80 mM and incubating for 4 h. Note that rnhA−dnaA− double mutants, used in a subset of the genome profile experiments, are unable to grow on rich media (LB) plates (Torrey et al., 1984). While our double mutant also failed to grow on LB plates, it consistently grew in either minimal media or LB broth when inoculated from colonies on minimal plates. For consistency with all other experiments, we therefore used rnhA−dnaA− double mutants grown in LB broth (inoculated from colonies on minimal media agar plates).
The genomic DNA was sequenced via Illumina HiSeq™ (50 bp or 100 bp single reads) at the Sequencing Facility of the Duke Institute for Genome Sciences & Policy (IGSP). The raw sequence read data (fastq files) have been deposited in the National Center for Biotechnology Information Sequence Read Archive (Accession Number SRP026465). The sequencing reads (approximately 13 to 25 million per sample; 17 million average) were imported into Geneious Pro (Biomatters) and assembled to the reference chromosome MG1655 (GenBank Accession Number NC_000913.2). The assembly process was set to medium sensitivity on Geneious, with the following parameters: 10–15% gaps allowed per read; word length of 18; index word length of 13; words repeated more than 12 times ignored; 20% maximum mismatches per read; and maximum ambiguity of 4. BAM files of the resulting assembly data were exported to JMP Genomics (SAS), where read counts were generated for 100 bp bins across the length of the chromosome by adding together the read counts from each of the 100 base pair positions in the bin (note that every 50-base oligonucleotide read therefore gets counted 50 times, once for each base pair position). The deep sequencing data are located in Datasets S1–S3.
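The per-base binning described above can be illustrated with a short sketch. It is not the JMP Genomics workflow itself: it assumes ungapped reads given as 0-based leftmost alignment positions, ignores reads that wrap around the chromosome end, and uses a hypothetical function name.

```python
def per_base_bin_counts(read_starts, read_length, genome_length, bin_size=100):
    """Sum per-base read coverage within each bin.

    Each read adds one count per base it covers, so a 50-base read lying
    entirely inside one bin contributes 50 to that bin.
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for start in read_starts:
        pos = start
        end = min(start + read_length, genome_length)
        while pos < end:
            b = pos // bin_size
            segment_end = min((b + 1) * bin_size, end)
            counts[b] += segment_end - pos  # bases of this read inside bin b
            pos = segment_end
    return counts


# Three 50-base reads on a toy 300 bp genome: the second read straddles
# the boundary between bins 0 and 1 and is split 25/25 between them.
toy_counts = per_base_bin_counts([0, 75, 150], 50, 300)
```

Counting per base rather than per read means a read spanning a bin boundary is shared between the two bins in proportion to its overlap, instead of being assigned wholly to one of them.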
By default, the Geneious software maps sequencing reads from repeated regions of the chromosome randomly during assembly, and this would produce inaccurate read counts for these regions. Bins containing repeated regions were therefore removed from the raw read count data for the samples prior to analysis. First, using the assembly data for our reference strain, chloramphenicol-treated MG1655 (MG+CAM), we identified 719 bins (out of 46 397) in the chromosome that contained repeated regions, along with 223 additional bins whose read counts were more than 4 standard deviations below the mean of the read counts in the set. Second, additional bins containing annotated RIP and REP elements, rrn genes, and mobile elements in MG1655 greater than 200 bp in length were removed. Third, as most of the strains used contained an rnhA mutation, and some also contained a deletion of the Eut/CPZ55 prophage (strains AQ12251, AQ12257, and their derivatives; see Figs 5C, 6, 8, 9B and C), the bins for the rnhA gene and the prophage were also included for removal. Collectively, these identified bins (1281 in total) were deleted from the raw read count data for each sample strain analysed, and are listed in Dataset S5.
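The low-count filter described above can be sketched as follows. This is an illustration under stated assumptions: the mean and standard deviation are taken over all bins of the reference set, the sample standard deviation is used, and the function names are hypothetical. Bins flagged for other reasons (repeats, rrn genes, the rnhA and prophage deletions) would simply be added to the same exclusion set.

```python
import statistics


def low_outlier_bins(reference_counts, n_sd=4.0):
    """Indices of bins whose count is more than n_sd SDs below the mean."""
    mean = statistics.fmean(reference_counts)
    sd = statistics.stdev(reference_counts)  # sample SD (an assumption)
    cutoff = mean - n_sd * sd
    return {i for i, c in enumerate(reference_counts) if c < cutoff}


def drop_bins(counts, excluded):
    """Keep (bin_index, count) pairs for bins not in the exclusion set."""
    return [(i, c) for i, c in enumerate(counts) if i not in excluded]


# Toy reference set: 99 well-covered bins and one empty bin
toy_reference = [1000] * 99 + [0]
flagged = low_outlier_bins(toy_reference)
kept = drop_bins(toy_reference, flagged)
```

Keeping the original bin index alongside each retained count preserves genomic position after bins are removed, which matters for plotting and smoothing downstream.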
Following the removal of the repeat bins above, the raw read counts for each sample set were divided by the values from the corresponding bins in the MG+CAM reference strain dataset. This ratio was then multiplied by the ratio of total reads in the MG+CAM reference set over total aligned reads in the relevant sample, to correct for the somewhat different numbers of aligned reads in the various samples. The Log2 values of the corrected sample/reference ratios were then calculated and plotted against genomic position to generate the replication profile charts. A LOESS utility (Peltier Tech; http://peltiertech.com/WordPress/loess-utility-for-excel) was used to generate a smoothed curve on the corrected ratios. LOESS calculates a moving weighted regression across the data in Microsoft Excel, and we calculated the Log2 values of the LOESS regression values for plotting in the figures. The number of points for the moving regression (Npts) in each case was set to 2000 (corresponding to 200 kb). To compensate for the circular nature of the E. coli chromosome, the first and last 1000 data points in each dataset were copied to the end and beginning of the dataset, respectively, prior to running the LOESS calculations. The LOESS regression was then performed on the data as described above, and the calculated values for the copied data points were removed before plotting the LOESS values against genomic position, laid over the replication profile charts.
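The normalization and circular smoothing steps above can be sketched in a few functions. This is a simplified stand-in, not the Peltier Tech utility: it uses bin-count sums as a proxy for total aligned reads, picks the regression window by index rather than by distance, shifts the copied coordinates by one genome length (an assumption the text does not specify), and is far too slow in pure Python for 46 000 bins. All function names are hypothetical.

```python
def corrected_ratios(sample, reference):
    """Sample/reference ratio per bin, scaled by relative sequencing depth."""
    scale = sum(reference) / sum(sample)
    return [s / r * scale for s, r in zip(sample, reference)]


def local_linear(x, y, i, npts):
    """Tricube-weighted linear fit evaluated at x[i] over an npts window."""
    lo = max(0, min(i - npts // 2, len(x) - npts))
    idx = range(lo, lo + npts)
    dmax = max(abs(x[j] - x[i]) for j in idx) or 1.0
    w = [(1 - (abs(x[j] - x[i]) / dmax) ** 3) ** 3 for j in idx]
    sw = sum(w)
    mx = sum(wj * x[j] for wj, j in zip(w, idx)) / sw
    my = sum(wj * y[j] for wj, j in zip(w, idx)) / sw
    sxx = sum(wj * (x[j] - mx) ** 2 for wj, j in zip(w, idx))
    sxy = sum(wj * (x[j] - mx) * (y[j] - my) for wj, j in zip(w, idx))
    slope = sxy / sxx if sxx else 0.0
    return my + slope * (x[i] - mx)


def smooth_circular(x, y, genome_length, npts=2000, pad=1000):
    """Pad both ends with wrapped copies, smooth, then drop the padding."""
    pad = min(pad, len(x))
    xp = ([v - genome_length for v in x[-pad:]] + list(x)
          + [v + genome_length for v in x[:pad]])
    yp = list(y[-pad:]) + list(y) + list(y[:pad])
    return [local_linear(xp, yp, i, npts) for i in range(pad, pad + len(x))]


# Toy example: a flat profile should remain flat after smoothing
toy_x = [i * 100 for i in range(50)]
toy_smoothed = smooth_circular(toy_x, [2.0] * 50, 5000, npts=10, pad=5)
toy_ratios = corrected_ratios([20, 40], [10, 10])
```

Following the text, the Log2 transform would be applied to the smoothed values afterwards (for example with `math.log2`), which requires all retained bins to have non-zero counts.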
This research was supported in part by NIH grants to KNK (GM72089) and JDW (GM084003). We are very grateful to David MacAlpine and Jason Belsky, who provided invaluable advice on the NGS analyses. The authors declare that they have no conflicts of interest with regard to this study.