Functional regions of the N-terminal domain of the antiterminator RfaH

RfaH is a bacterial elongation factor that increases expression of distal genes in several long, horizontally acquired operons. RfaH is recruited to the transcription complex during RNA chain elongation through specific interactions with a DNA element called ops. Following recruitment, RfaH remains bound to RNA polymerase (RNAP) and acts as an antiterminator by reducing RNAP pausing and termination at some factor-independent and Rho-dependent signals. RfaH consists of two domains connected by a flexible linker. The N-terminal RfaH domain (RfaHN) recognizes the ops element, binds to the RNAP and reduces pausing and termination in vitro. Functional analysis of single substitutions in this domain reported here suggests that three separate RfaHN regions mediate these functions. We propose that a polar patch on one side of RfaHN interacts with the non-template DNA strand during recruitment, whereas a hydrophobic surface on the opposite side of RfaHN remains bound to the β′ subunit clamp helices domain throughout transcription of the entire operon. The third region is apparently dispensable for RfaH binding to the transcription complex but is required for the antitermination modification of RNAP.

were included in transcription reactions at 500 nM. Aliquots were withdrawn at times ranging from 10 to 1200 s and analyzed on an 8% denaturing gel.
B. The fraction of RNA at opsP (U43+G44+C45) as a function of time, quantified from the gel in (A). Wild-type GreB, but not the cleavage-deficient variant, completely eliminates the slowly escaping fraction of RNAP observed in the presence of RfaH.
C. The anti-pausing activity of RfaH at hisP, expressed as k_hisP/Eff_hisP (see the next section), is not affected by GreB, suggesting that RfaH is properly recruited to and retained in the TEC in the presence of GreB.
D. The fractions of RNAP retained at the U43, G44 and C45 positions as functions of time, quantified from the gel in (A). GreB dramatically reduces the RNAP fractions at C45 and G44 but increases the fraction at U43 compared to those observed in the presence of RfaH alone. Considering that GreB effects at pause sites are conventionally attributed to acceleration of RNA cleavage in backtracked TECs (Marr & Roberts, 2000), this pattern suggests that ~15% of RNAP is backtracked at positions C45 and G44 with the U43 nucleotide positioned in the active site.

Kinetic analysis of pausing at the opsP and hisP sites
To model RNAP pausing at opsP, we treated three successive RNA positions, U43 (the conserved essential pause, opsP1), G44 and C45 (the nonessential subpause, opsP2), as a single pause site. At the 5-s timepoint, 50-80% of RNAP was observed at such a combined opsP, and when allowed to vary, the fitted occupancy of opsP at zero time often approached 100%.
Preliminary analysis demonstrated that RfaH and its variants affected the distribution of RNAP among the U43, G44 and C45 positions and the RNAP escape rate, but not the occupancy of opsP extrapolated to zero time. Accordingly, for uniform analysis of all datasets we constrained the latter parameter to 100%.
The fraction of RNAP at the opsP site as a function of time t was described by equation 1a or 1b. Equation 1b assumes that the opsP site is populated with 100% efficiency and generates two populations of RNAP, 1-F_slow and F_slow, which escape with the first-order rate constants k_fast and k_slow, respectively. The model implies that opsP is populated instantaneously, which is a commonly used simplification. Considering that opsP is located just 8 bp downstream of the halted complex, the expected time of RNAP arrival at this site is about 0.5 s at 15 nt/s. The influx of RNAP into opsP after 5 s is in most cases very small and does not need to be taken into account, permitting simple mathematical modeling of the process and precise determination of the fraction of slowly escaping RNAP and the escape rate constants. The F_slow and k_slow parameters were also employed for accurate modeling of the arrival of opsP-released RNAP at the hisP site (see below).
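The biexponential escape model described for equation 1b can be sketched as follows. The equation itself is not reproduced in the text above, so the functional form below is reconstructed from the stated assumptions (100% occupancy at zero time, two populations escaping with first-order kinetics) and should be read as an assumption. The function names and the example parameter values are hypothetical and are not fitted values from this work.

```python
import math

def ops_fraction(t, f_slow, k_fast, k_slow):
    """Fraction of RNAP still at the combined opsP site at time t (s).

    Assumed form: occupancy is 1 at t = 0; a fraction (1 - f_slow)
    escapes with rate constant k_fast and a fraction f_slow with k_slow.
    """
    fast = (1.0 - f_slow) * math.exp(-k_fast * t)
    slow = f_slow * math.exp(-k_slow * t)
    return fast + slow

def transit_time(distance_nt, rate_nt_per_s):
    """Time (s) needed to cover a distance at a constant elongation rate."""
    return distance_nt / rate_nt_per_s

# Arrival at opsP: 8 nt at 15 nt/s, the figures quoted in the text.
print(f"arrival at opsP: {transit_time(8, 15):.2f} s")

# Hypothetical parameters, for illustration only.
for t in (5, 10, 60, 300, 1200):
    print(t, round(ops_fraction(t, f_slow=0.2, k_fast=0.1, k_slow=0.005), 3))
```

Setting f_slow to zero reduces the expression to a single exponential, which is presumably the case covered by equation 1a.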
The fraction of RNAP arriving at the hisP site by time t (designated Arr_hisP) was described by differential equation 2a or 2b. Both models assume that a non-zero minimal time, represented by an offset parameter, is required for RNAP to arrive at the hisP site, located 108 bp downstream of the halted complex.

Equation 2a was used in conjunction with equation 1a for the datasets in which RNAP escape from opsP and, accordingly, arrival at the hisP site followed a simple monoexponential function.
Equation 2b was employed when escape from opsP followed the biexponential function (equation 1b). According to equation 2b, RNAP populations that escaped from opsP with the first-order rate constants k_fast and k_slow arrived at hisP with rate constants k_arr and k_slow, respectively. Thus, the fast-escaping RNAP was slowed down by the superimposed effects of multiple low-efficiency pauses upstream of the hisP site (k_fast > k_arr), whereas for the slowly escaping fraction the release from the opsP site was the sole rate-limiting step, and the arrival constant was the same as the opsP escape constant k_slow.
Diffusion of the RNAP front inevitably occurs, but given the relatively low sampling resolution of our assay it does not need to be taken into account to adequately describe RNAP arrival at the hisP site. Importantly, the k_slow and k_arr parameters should not be confused with mean rates of nucleotide addition; rather, they represent the rate-limiting events that ultimately modulate the influx of RNAP at hisP.
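A minimal sketch of the arrival model follows. Equations 2a and 2b are not reproduced in the text above, so the code assumes the simplest form consistent with the description: no RNAP arrives before the offset, after which the fast and slow populations each arrive with first-order kinetics. The closed-form expression is the solution of that assumed differential equation; the function name and all numerical values are hypothetical.

```python
import math

def arrived_fraction(t, offset, k_arr, f_slow=0.0, k_slow=0.0):
    """Cumulative fraction of RNAP that has reached hisP by time t (s).

    Assumed form: nothing arrives before `offset`; afterwards the fast
    population (1 - f_slow) arrives with rate constant k_arr and the
    slow population (f_slow) with k_slow. With f_slow = 0 this is the
    monoexponential case.
    """
    tau = t - offset
    if tau <= 0:
        return 0.0
    fast = (1.0 - f_slow) * (1.0 - math.exp(-k_arr * tau))
    slow = f_slow * (1.0 - math.exp(-k_slow * tau))
    return fast + slow

# Hypothetical parameters; the 7 s offset is chosen near 108 nt / 15 nt/s.
for t in (5, 10, 30, 60, 300, 1200):
    mono = arrived_fraction(t, offset=7.0, k_arr=0.1)
    biexp = arrived_fraction(t, offset=7.0, k_arr=0.1, f_slow=0.2, k_slow=0.005)
    print(t, round(mono, 3), round(biexp, 3))
```

In this form the slow population delays the approach to complete arrival without changing the early part of the curve much, which matches the statement that release from opsP is the sole rate-limiting step for that fraction.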
The fraction of RNAP at the hisP site as a function of time t was described by differential equation 3. In this model, RNAP populates the hisP site with efficiency Eff_hisP and escapes with the first-order rate constant k_hisP. The model implies that the RNAP populations that were released from opsP rapidly (1-F_slow) and slowly (F_slow) differ only in the rate of arrival at hisP, which may be an oversimplification. However, our data do not allow an independent inference of the properties of the slow fraction because (i) the slow fraction's escape rate from opsP was typically an order of magnitude lower than the hisP escape rate, making opsP the sole rate-limiting step, and (ii) the slow fraction usually accounted for less than a quarter of RNAP. In other words, slowly released RNAP never populated hisP to any measurable extent, making determination of its pausing propensity impossible. In addition, the influx and efflux rate constants (k_arr and k_hisP, respectively) were very similar at the hisP site, resulting in an apparent pausing efficiency (0.1-0.3) that is considerably lower than the fitted efficiency (0.5-0.8).
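The gap between apparent and fitted efficiency can be illustrated numerically. Equation 3 is not reproduced in the text above, so the sketch below assumes that the hisP occupancy grows at Eff_hisP times the arrival rate and decays with k_hisP. It integrates this assumed equation with a fixed-step fourth-order Runge-Kutta scheme, used here only for simplicity in place of the Bulirsch-Stoer method; all names and parameter values are hypothetical.

```python
import math

def arrival_rate(tau, k_arr, f_slow, k_slow):
    """d(Arr)/dt at time tau after the offset (assumed two-population form)."""
    if tau < 0:
        return 0.0
    return ((1.0 - f_slow) * k_arr * math.exp(-k_arr * tau)
            + f_slow * k_slow * math.exp(-k_slow * tau))

def his_occupancy(t_end, eff, k_his, k_arr, offset=0.0,
                  f_slow=0.0, k_slow=0.0, dt=0.01):
    """Integrate d(hisP)/dt = eff * d(Arr)/dt - k_his * hisP with RK4.

    Returns two lists: time points (s) and the fraction of RNAP at hisP.
    """
    def deriv(t, y):
        return eff * arrival_rate(t - offset, k_arr, f_slow, k_slow) - k_his * y

    t, y = 0.0, 0.0
    times, occ = [t], [y]
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(t, y)
        k2 = deriv(t + dt / 2, y + dt * k1 / 2)
        k3 = deriv(t + dt / 2, y + dt * k2 / 2)
        k4 = deriv(t + dt, y + dt * k3)
        y += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
        times.append(t)
        occ.append(y)
    return times, occ

# Hypothetical case with equal influx and efflux constants (k_arr = k_his).
times, occ = his_occupancy(t_end=100.0, eff=0.6, k_his=0.1, k_arr=0.1)
print("fitted efficiency: 0.6, peak occupancy:", round(max(occ), 3))
```

When k_arr equals k_hisP, the assumed equation has the analytical solution Eff_hisP·k·τ·exp(-k·τ), which peaks at Eff_hisP/e, so an efficiency of 0.6 yields a peak occupancy of about 0.22. This falls within the apparent efficiency range of 0.1-0.3 quoted above.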
Importantly, under these conditions, the efficiency and pause escape rate parameters cannot be fully resolved because an increase (or decrease) in efficiency can be compensated by an increase (or decrease) in the escape rate constant. On the other hand, the k_hisP/Eff_hisP ratio could be determined with high accuracy and was very reproducible in repeated experiments. Accordingly, we analyzed the data with a modified model that included the k_hisP/Eff_hisP parameter instead of k_hisP and used the k_hisP/Eff_hisP values for comparative evaluation of the AP activity of RfaH variants, as described in the Results section. For each dataset, the fractions of RNAP (1) at the opsP site, (2) arrived at hisP and (3) at the hisP site were simultaneously fit to equations 1, 2 and 3, respectively, using the numerical integration capabilities of Scientist 2.01 software (Micromath; Bulirsch-Stoer method [Bulirsch & Stoer, 1991]).

were predicted to induce a greater decrease in stability (up to 6.54 kcal/mol for the W4A variant); however, the structure of the mutants is predicted to be largely intact, whereas the domain interface is destroyed after RfaH recruitment to the TEC, and the same region is thought to instead bind to the β′ CH domain of RNAP (Belogurov et al., 2007). Destabilization of the closed conformation might actually increase RfaH activity, but a detailed analysis of the effects of these substitutions would require a high-resolution experimental model of RfaH-TEC interactions. The predicted effects on stability did not correlate with a given mutant's antipausing activity, although variants with increased stability tended to have increased or near-wild-type levels of activity. The overall structure is also not predicted to change as a result of these substitutions: the total-molecule RMS (root mean square) values deviated by less than 2 Å from the wild-type RfaH and did not correlate with activity (Fig. S3).
The predicted lowest-energy structures of the mutants fit well within the WT structural ensemble (Fig. S2), further confirming that they belong to the same structural ensemble accessible in solution. Flexibility changes were assessed by alignment of the sample structural ensembles for each mutant with the starting PDB structure and were found to be insignificant in the closed conformation of RfaH (data not shown). Altogether, our modeling indicates that the impact of each single amino acid substitution on RfaH structure, flexibility and stability (G) is rather moderate and is unlikely to account for the majority of the noted defects in the mutants' activity.

The tube radius for the wild-type ground state was set to 0.5, for "mutants" to 0.15 and for the higher-energy WT structures to 0.05. The cartoon is colored according to the activity of RfaH variants in Fig. 6. Variants that possess near-wild-type activity (F51A, H20A, K37A) are colored yellow; those displaying significant defects (R16A, T66A, W4F) are colored red; the wild-type structural ensemble is shown in black for visibility. The coordinates for the alignment were obtained through CONCOORD-PBSA modeling; the alignment and image were generated using PyMOL (DeLano, 2002).

Tube cartoon representations (the tube radius for WT was set to 0.5, for "mutants" to 0.15). The cartoon is colored according to RfaH activity in Fig. 6: the wild-type, K37A, E19A, H20A, F51A and H94A are shown in yellow; the mildly defective R43A and T72A in orange; the very defective R16A, H65A, T66A, T67A, F56L, Y54A, Y8A, Y54F and W4F in red. The coordinates for the alignment were obtained through CONCOORD-PBSA modeling; the alignment and image were generated using PyMOL.

where θ is raw ellipticity, [P] is protein concentration (µM) and n is the number of amino acid residues.