Building better enzymes: Molecular basis of improved non‐natural nucleobase incorporation by an evolved DNA polymerase

Abstract Obtaining semisynthetic microorganisms that exploit the information density of “hachimoji” DNA requires access to engineered DNA polymerases. A KlenTaq variant has been reported that incorporates the “hachimoji” P:Z nucleobase pair with a similar efficiency to that seen for Watson–Crick nucleobase incorporation by the wild type (WT) KlenTaq DNA polymerase. The variant polymerase differs from WT KlenTaq by only four amino acid substitutions, none of which are located within the active site. We now report molecular dynamics (MD) simulations on a series of binary complexes aimed at elucidating the contributions of the four amino acid substitutions to altered catalytic activity. These simulations suggest that WT KlenTaq is insufficiently flexible to be able to bind AEGIS DNA correctly, leading to the loss of key protein/DNA interactions needed to position the binary complex for efficient incorporation of the “hachimoji” Z nucleobase. In addition, we test literature hypotheses about the functional roles of each amino acid substitution and provide a molecular description of how individual residue changes contribute to the improved activity of the KlenTaq variant. We demonstrate that MD simulations have a clear role to play in systematically screening DNA polymerase variants capable of incorporating different types of nonnatural nucleobases thereby limiting the number that need to be characterized by experiment. It is now possible to build DNA molecules containing nonnatural nucleobase pairs in addition to A:T and G:C. Exploiting this development in synthetic biology requires engineered DNA polymerases that can replicate nonnatural nucleobase pairs. Computational studies on a DNA polymerase variant reveal how amino acid substitutions outside of the active site yield an enzyme that replicates nonnatural nucleobase pairs with high efficiency. This work will facilitate efforts to obtain bacteria possessing an expanded genetic alphabet.

The variant polymerase differs from WT KlenTaq by only four amino acid substitutions, none of which are located within the active site. We now report molecular dynamics (MD) simulations on a series of binary complexes aimed at elucidating the contributions of the four amino acid substitutions to altered catalytic activity. These simulations suggest that WT KlenTaq is insufficiently flexible to be able to bind AEGIS DNA correctly, leading to the loss of key protein/DNA interactions needed to position the binary complex for efficient incorporation of the "hachimoji" Z nucleobase. In addition, we test literature hypotheses about the functional roles of each amino acid substitution and provide a molecular description of how individual residue changes contribute to the improved activity of the KlenTaq variant. We demonstrate that MD simulations have a clear role to play in systematically screening DNA polymerase variants capable of incorporating different types of nonnatural nucleobases thereby limiting the number that need to be characterized by experiment. It is now possible to build DNA molecules containing nonnatural nucleobase pairs in addition to A:T and G:C. Exploiting this development in synthetic biology requires engineered DNA polymerases that can replicate nonnatural nucleobase pairs. Computational studies on a DNA polymerase variant reveal how amino acid substitutions outside of the active site yield an enzyme that replicates nonnatural nucleobase pairs with high efficiency. This work will facilitate efforts to obtain bacteria possessing an expanded genetic alphabet.

K E Y W O R D S
DNA replication, enzyme engineering, expanded genetic alphabets, molecular dynamics, polymerase

| INTRODUCTION
The development of "semisynthetic" microorganisms possessing artificially expanded genetic information systems (AEGIS) will permit access to cells with novel phenotypes and biotechnological applications. [1][2][3][4] Nonnatural nucleobase pairs that meet the size and/or hydrogen bonding complementarity rules of Watson-Crick base pairing have been described over the past two decades ( Figure S1A), [5][6][7][8] including the complementary 2-amino-8-(1-beta-D-2 0 -deoxyribofuranosyl)imidazo [1,2-a]-1,3,5-triazin-[8H]-4-one (trivially known as P) and 6-amino-3-(2 0 -deoxyribofuranosyl)-5-nitro-1H-pyridin-2-one (trivially known as Z) nucleobase pair that is present in "hachimoji" DNA ( Figure 1). 1,5,[9][10][11][12] As is true of naturally occurring (Watson-Crick) DNA, AEGIS DNA duplexes containing P:Z pairs interconvert easily between A-and B-helical forms. 13,14 In addition, B-form DNA tolerates the inclusion of multiple consecutive P:Z nucleobase pairs with minimal structural impact on the double helix when compared to duplexes containing only A:T or G:C base pairs. 9 Identifying DNA polymerases capable of catalyzing the incorporation of P:Z nucleobase pairs with efficiencies comparable to those that replicate Watson-Crick DNA is a necessary prerequisite to realizing the promise of expanded genetic alphabets. 15,16 A variety of library generation and selection strategies have been developed to reengineer the fidelity of DNA polymerases, [17][18][19] including the large (Klenow) fragment of Thermus aquaticus DNA polymerase I, which lacks the N-terminal 5 0 -3 0 exonuclease domain (KlenTaq). 20 Thus, a compartmentalized selfreplication strategy 21,22 was used to obtain a KlenTaq F I G U R E 1 (a) Illustration of the X-ray crystal structure of the "evolved" variant KlenTaq polymerase in its binary complex (PDB: 5W6Q), 24 showing the location of the P:Z nucleobase pair in the polymerase domain. Helices and strands are colored purple and yellow, respectively. (b) Chemical structure of the P:Z nucleobase pair. R indicates the location of the 2'-deoxyribose substituent and hydrogen bonds are indicated by dashed lines. (c) Close-up of the polymerase domain showing the palm, fingers and thumb domains, the helical regions αG-αK, αP, and αQ, and the side chains of the mutated residues, Val444, Ala527, Glu551, and Val832, in the variant KlenTaq polymerase variant capable of incorporating dZTP opposite a P nucleobase in the template strand with a greatly improved efficiency relative to the wild type (WT) precursor ( Figure 1). 23 The evolved KlenTaq variant contains four amino acid replacements (M444 V, P527A, D551E, and E832V; Figure 1), all of which are distal to the active site. As a result, none of these residues interact directly with either the primer/template P:Z in the active site or with incoming nucleotide triphosphate (dZTP) in high resolution crystal structures of the pre-and postincorporation complexes for this variant polymerase (PDB:5W6K and PDB:5W6Q, respectively). 24 In considering, how this evolved KlenTaq variant might differ from WT KlenTaq, Singh et al. analyzed differences in relative domain motions in the unnatural complexes as compared to the natural complexes. 24 There were no significant differences between the relative domain positions in the crystal structures of pre-incorporation complexes for the Watson-Crick (WT KlenTaq within incoming dNTP) and AEGIS (KlenTaq variant with incoming dZTP paired to template P) systems. In contrast, there were significant differences in the post-incorporation Watson-Crick (WT KlenTaq with G:C bound in the active site) versus AEGIS (evolved KlenTaq with P:Z bound in the active site) complexes. In forming the ternary preincorporation complex, there is a large motion of the fingers domain as it closes down on the incoming dNTP when compared to the binary post-incorporation complex. 24 Thus, the average rotation angle calculated from a comparison of the pre-and post-incorporation AEGIS complexes was 63.6 while that for the Watson-Crick complexes was decreased by 4-5 despite the fact that the Watson-Crick and AEGIS pre-incorporation complexes ultimately reached a similar structural state. Modeling the AEGIS template/primer bound to WT KlenTaq in a postincorporation complex resulted in two significant "clashes," suggesting that the structure of the AEGIS template-primer differed significantly from that of its Watson-Crick counterpart. Collectively, the crystallographic evidence supported the hypothesis that the four amino acid substitutions in the KlenTaq variant led to increased flexibility in the enzyme and that this property was required for efficient incorporation of dZTP opposite template P. 24 In addition, this prior study showed that the post-incorporation complex was the most affected by the residue substitutions. We now report a series of molecular dynamics (MD) simulations on eight binary complexes that not only test the idea that amino acid substitutions in the KlenTaq variant give rise to increased flexibility but also reveal how altered dynamical motions might contribute to the increased efficiency of P:Z nucleobase incorporation by the KlenTaq variant. These calculations set the scene for obtaining novel KlenTaq polymerase variants that can incorporate multiple types of nonnatural nucleobase pairs using MD simulations as part of rational, structure-based strategies.  Figures S2-S4). The similarity of motions in the "thumb," "palm," and "fingers" domains for both polymerases is evident from principal component analysis (PCA). 25 Thus, projecting the simulation snapshots along the largest components (PC1 and PC2) shows that WT KlenTaq and the KlenTaq variant explore the same phase space ( Figure S4).

| WT KlenTaq polymerase binds differently to Watson-Crick and AEGIS DNA template/primer duplexes
We tested the hypothesis that the four amino acid substitutions in the KlenTaq variant exert their effects by modifying dynamical motions in the binary complex 21 using four MD simulations: WT KlenTaq bound to Watson-Crick DNA (template: 5'-AAAGGGCGCCGTGGTC-3 0 / primer: 5'-GACCACGGCGCC-3 0 ) and P:Z-containing DNA (template: 5'-AAAGPGCGCCGTGGTC-3 0 /primer: 5'-GACCACGGCGCZ-3 0 ) duplexes, and the KlenTaq variant bound to the same Watson-Crick and AEGIS DNA duplexes. Template and primer strands were positioned in these models based on the X-ray crystal structure of the variant KlenTaq/AEGIS DNA binary complex. 24 All four models were stable throughout the MD simulations based on root mean square deviation (RMSD) values ( Figure S2). Most of the mobile residues were located in the fingers and thumb domains. The thumb domains in both polymerases when in the binary complexes, however, were less flexible when compared the cognate, uncomplexed enzymes ( Figure S3).
The dynamical motions of both the protein and the DNA in the four trajectories were analyzed to understand how the altered residues (M444 V, P527A, D551E, and E832V) in the KlenTaq variant might impact incorporation of both Watson-Crick and AEGIS nucleobases. The trajectories of the binary complexes in which WT KlenTaq was bound to either Watson-Crick or AEGIS DNA gave two main principal components, which captured 39.6% of the coordinate variance ( Figure 2a). The largest of these, PC1, is associated with a motion in which the thumb and fingers domains move away from each other in the binary complex. The second major component, PC2, describes local motions in the thumb domain. Projecting structures sampled in the two MD simulations along PC1 and PC2 (Figure 2a) shows that the two binary complexes separate along PC1, although they can sample common regions of phase space.
Domain motions (DynDom) 26 computed from the MD trajectories highlight these differences in the dynamics of WT polymerase when complexed with either Watson-Crick or AEGIS DNA duplexes ( Figure 3). In the WT KlenTaq/Watson-Crick DNA binary complex, the inactive exonuclease domain (residues 290-423) moves away from the polymerase domain (424-832) due to bending of residues in the palm (424-449, 553-614, and 774-832) ( Figure 3a, Table S1). This motion is consistent with the need for the polymerase to change its conformation in order to bind the DNA duplex. Additional evidence is provided from the dynamic cross-correlation map (DCCM) 27 of this binary complex, which shows the motions of the fingers and thumb domains to be anticorrelated (red box in Figure 4 [left/top]). An anticorrelated motion of residues in the αJ/K helix of the palm domain ( Figure 1) and fingers domain (blue box in Figure 4 [left/top]), which is not seen in the MD trajectory of the uncomplexed WT polymerase ( Figure S4c), also supports the DynDom analysis.
The motional properties of the binary complex in which WT KlenTaq binds AEGIS DNA differ in a number of important aspects from that containing Watson-Crick DNA. For example, the domain motions are more complicated even though the exonuclease and polymerase domains still move apart (Figure 3b). Not only do the fingers and thumb move away from each other but the tip of the thumb about the AEGIS DNA becomes  (Table S1) due to bending of residues located in the palm, thereby also causing the tip of the fingers domain to move away from the thumb (Figure 3b). In agreement with these observations, the DCCM for the WT KlenTaq/AEGIS DNA binary complex shows anti-correlated motions of (a) the fingers and thumb (red box in Figure 4 [left/bottom]) and (b) residues in the αJ/K helix of the palm domain with the fingers (blue box in Figure 4 [left/bottom]). Neither of these motions is seen for the uncomplexed WT polymerase ( Figure S4c). The presence of the Z:P nucleobase pair also leads to anti-correlated residue motions in the αG helix and the fingers domain (green box in Figure 4 [left/ bottom]).

| The evolved KlenTaq polymerase accommodates both Watson-Crick and AEGIS DNA template/primer duplexes albeit through distinct dynamical motions
While it is important for the evolved polymerase to recognize the unnatural AEGIS substrate, it must also recognize Watson-Crick nucleobases to be useful for the DynDom analysis suggests that the fingers domain of the variant polymerase exhibits different motions when binding Watson-Crick rather than AEGIS DNA. In addition, the tip of the fingers domain is tilted toward the base with a 14.6 rotation and a 0.5 Å translation in the KlenTaq variant/Watson-Crick DNA binary complex (Table S1), and the domain motions differ from those present in the KlenTaq variant/AEGIS DNA complex (Figure 3c,d). These findings are consistent with inferences based on comparing the X-ray crystal structures of the KlenTaq variant/AEGIS DNA and the WT KlenTaq/Watson-Crick DNA binary complexes. 24 The rotation of the thumb domain about residues 468-534 in the MD trajectory of the KlenTaq variant/ AEGIS DNA binary complex is larger than that seen in the X-ray structure (Table S1), perhaps as a consequence of replacing Pro-527 by alanine. The exonuclease domain also moves away from the polymerase domain in the KlenTaq variant/AEGIS DNA binary complex, which is not seen when the variant polymerase binds to Watson-Crick DNA (Figure 3c,d). Finally, an anti-correlated motion between the αJ/K helix in the palm and the fingers is present when the KlenTaq variant binds AEGIS DNA [blue box in Figure 4 [right/bottom]) that is absent in the free polymerase. In contrast, this anti-correlated motion is decreased in the KlenTaq variant/Watson-Crick DNA binary complex. We conclude that the KlenTaq variant exhibits altered dynamical motions in binding the Watson-Crick DNA duplex, as compared to AEGIS DNA, even though it still readily accommodates this substrate.

| Z:P-containing DNA in the binary complex is sub-optimally recognized by the WT KlenTaq polymerase
Interactions of the terminal nucleobase pair (C112:G205) of Watson-Crick DNA with the WT polymerase seen in the MD-derived trajectory are consistent with experimental findings. [28][29][30] Three intermolecular interactions between the C112:G205 nucleobase pair and protein residues Arg-573, Arg-746, and Asp-785 are highly populated in the simulation, two of which (Arg-746/G205 and Asp-785/C112) involve the sugar-phosphate backbone ( Figure S5a). The third interaction is a hydrogen bond between the Arg-573 side chain and the newly added nucleobase C112 (in the Nth position at the 3 0 -end of the primer) ( Figure S5a, Table S4). In addition, the side chain of Arg-587, which is located in the αJ/K helical region of the palm domain, forms three highly populated hydrogen bonds to C111, located in the [N-1]th position, in the MD trajectory (Orientation 1 of Figure 5a). This finding is consistent with prior proposals that Arg-587 correctly positions the template strand through interactions with the [N-1]th nucleotide thereby allowing the incoming dNTP to bind appropriately for reaction with the 3'-OH in the ternary complex prior to primer extension. 29 The interaction of Arg-587 with the [N-1]th nucleotide in the binary complex is also correlated with the presence of a salt bridge between Arg-596 and Glu-832 in the simulation trajectory of the WT KlenTaq/Watson-Crick DNA binary complex (Figures 5a and S6a). Two "coupled" arrangements of the Arg-587 and Arg-596 side chains located at the ends of the αJ/K helical region are also seen during the MD trajectory ( Figure S6a). In the first arrangement (Orientation 1), which is present in a majority of sampled structures, Arg-587 interacts with the [N-1]th nucleotide (C111) and Arg-596 is positioned to interact with the Glu-826 and Glu-832 side chains ( Figure 5). In the second arrangement (Orientation 2) Arg-587 interacts with the Nth nucleotide (C112) and Arg-596 reorients to interact with the αH helix in the thumb domain ( Figure 5).  (Table S5). For example, only two interactions with the Nth nucleotide (Z112) are observed, in which Arg-573 and Asp-785 continue to form populated interactions to the nucleobase and sugar-phosphate backbone of Z112 ( Figure S5b). Moreover, the AEGIS nucleobase P205, located in the template strand, becomes hydrogen bonded to the Gln-754 side chain, a protein/DNA interaction that is absent in the trajectory for the WT KlenTaq/Watson-Crick DNA binary complex ( Figure S5a). Importantly, because of its functional role in primer extension, the Arg-587 side chain interacts with Z112 rather adopting its optimal location in which it can interact with C111, the [N-1]th nucleotide ( Figure 5, Table S5). Arg-596 is also oriented so as to interact with Asp-452 in the αH helix at the base of the thumb domain presumably as a consequence of the altered conformational preference of the Arg-587 side chain (Figure 5b). Finally, the salt bridge between Arg-596 and Glu-832 is absent in the MD trajectory computed for the WT KlenTaq/AEGIS DNA binary complex. variant/Watson-Crick DNA binary complex. Arg-573 is also seen to interact with C207. When AEGIS DNA is bound by the KlenTaq variant, however, the Gln-754/P206 interaction is maintained but Arg-573 binds to Z112 (Table S6, S7).

|
Importantly, and unlike what is observed for the WT polymerase, Arg-587 preferentially binds to the [N-1]th nucleotide in the primer (C111 or Z112) in both binary complexes involving the KlenTaq variant ( Figure S7a, Table S6, S7). This finding suggests that the salt bridge between the side chains of Arg-596 and Glu-832 in the KlenTaq variant is not necessarily required to maintain the Arg-587/[N-1]th nucleobase interaction ( Figure S7b).
Although the DNA duplexes maintain their helical structure and bending angle in all MD simulations, the distribution of the slide and twist angles 31 in the CC dinucleotide step in the Watson-Crick primer (slide: −1.3 and −1.9 Å, twist: 25 ) differ from those of the cognate (CZ) step in the AEGIS primer (slide: −1.9 Å, twist: 30 ) (Figure 6a,b). This finding again supports the idea that WT KlenTaq binds DNA differently when the Z:P nucleobase pair replaces C:G whereas both Watson-Crick and AEGIS DNA are bound in a similar manner by the KlenTaq [variant Figure 6c,d]. The molecular origin of these differences in the slide and twist angle distributions are difficult to assign to specific protein/DNA interactions or individual amino acid substitutions in the Klentaq variant, although they may be associated with the altered electrostatic and dispersion properties of the Z:P nucleobase pair. 13 As far as we can tell from our MD trajectories of the four binary complexes, the 3'-OH in either of the terminal C or Z nucleotides adopts a similar location and participates in the same hydrogen bonding interactions ( Figure S8). We cannot rule out the possibility, however, that longer MD simulations might exhibit altered positioning of the 3'-OH in the Watson-Crick and/or AEGIS binary complexes.  Figure S11) and the resulting trajectories were compared to that obtained for the WT KlenTaq/AEGIS DNA binary complex. In particular, we wanted to ascertain which, if any, of the residue substitutions corrected mispositioning of arginine side chains about the P:Z nucleobase pair in the AEGIS DNA duplex. No change was observed for the polymerase/Z112 interactions in the simulations of the P527A and E832V KlenTaq variants compared to those discussed above for the WT polymerase/AEGIS DNA complex ( Figure S12). This was not the case, however, in the MD simulation of the M444 V variant/AEGIS DNA binary complex in which only the Arg-573 side chain interacted with Z112 (Figure 8a). Altered polymerase/ Z112 interactions were also observed for the D551E variant/AEGIS DNA binary complex in which only the Arg-587 side chain can interact with the nonnatural nucleotide, albeit via an electrostatic interaction with the phosphate group in the oligonucleotide backbone ( Figure 8a). As importantly, the motions of the M444 V KlenTaq variant do permit the side chain of Arg-587 to hydrogen bond to C111, the [N-1]th nucleotide (70-73% occurrence). In addition, the DCCMs for the M444 V and D551E KlenTaq variants (Figure 8b) reveal an anticorrelated motion between the palm and thumb domains, which is absent in the P527A and E832V KlenTaq variants ( Figure S13b). Turning the effects of individual residue substitutions on the structural properties of the template/primer duplex about the nonnatural P:Z nucleobase pair in the AEGIS DNA, we find that the CZ dinucleotide slide and twist distributions observed for the M444 V, P527A, and E832V KlenTaq variants are essentially identical to those seen for the WT polymerase. On the other hand, substituting Asp-551 by Glu results in the CZ dinucleotide slide and twist distributions becoming more like that observed for the CC dinucleotide when Watson-Crick DNA is bound by the WT polymerase (Figures 6a,b and 8c). In addition, we note that the interaction of Asp-785 with Z112 is absent in both MD trajectories computed for the M444 V/AEGIS DNA and D551E/AEGIS DNA complexes, suggesting that the addition of dZTP to the primer will be slowed in both of these KlenTaq single substitution variants.

| CONCLUSIONS
These MD simulations systematically explore the role of dynamical motions within the WT and evolved KlenTaq polymerases involved in positioning of Watson-Crick or AEGIS DNA template/primer substrates. In the absence of substrate, the amino acid substitutions within the evolved polymerase have no effect on the overall dynamical properties of the enzyme. In the presence of either Watson-Crick or AEGIS template/primer, however, the dynamical motions of the evolved KlenTaq are distinct from those of the WT polymerase. Specifically, WT KlenTaq is unable to maintain sets of correlated motions and specific interactions of amino acid residues required to appropriately position P:Z in the active site. The four amino acid substitutions in the KlenTaq variant resolve this problem by altering the flexibility of key segments of the enzyme with the consequence that the enzyme can bind to Watson-Crick and AEGIS DNA in an equivalent fashion.
Our MD simulations also confirm the proposal 24 that the KlenTaq variant has increased flexibility compared with the WT polymerase, which facilitates positioning of the Arg-587 side chain to interact with the [N-1]th nucleotide in the growing strand of both Watson-Crick and AEGIS DNA duplexes. Replacing Met-444 in the αG helix and Asp-551 at the base of the thumb domain by valine and glutamate, respectively, allows the fingers, palm and thumb domains in the KlenTaq variant to bind AEGIS DNA in an optimal conformation. All three mutations are synergistic. Changing Met-444 to valine increases the flexibility of the αG helix thereby ensuring an optimal interaction of Arg-587 with both Watson-Crick and AEGIS DNA duplexes. Replacing Asp-551 by glutamate rigidifies the base of the thumb and also increases the flexibility of the αG helix. Our calculations also confirm that replacing Pro-527 by alanine permits the tip of the thumb to be more flexible, 24 allowing the variant to bind both Watson-Crick and AEGIS DNA and facilitating the interaction of Asp-785 with the newly added nucleotide (C112 or Z112) in the binary complex. In complexes of WT KlenTaq, salt bridge formation between Glu-832 and Arg-596 is correlated with positioning of the Arg-587 side chain. This is no longer possible for the KlenTaq variant because Glu-832 is substituted by valine. No obvious dynamical changes were observed, however, for the E832V KlenTaq variant.
Both MD simulation and X-ray crystallography indicate that increasing domain flexibility will be required for polymerases capable of replicating DNA containing nonnatural nucleobases with high efficiency. This appears to be the case even for DNA duplexes that look very similar to those containing only Watson-Crick nucleobase pairs. In identifying correlated residue/residue and residue/ nucleotide interaction networks, hinge regions, and domain movements required for productive positioning of both natural and unnatural template/primer substrates in the active site of the KlenTaq variant, our calculations lay a foundation for screening altered KlenTaq polymerases with potential use in not only replicating hachimoji DNA but also the shape complementary nucleobases developed by Romesberg 2 and Hirao. 7 Further, our comparative analyses of MD simulations for all possible combinations of WT and evolved polymerases complexed to Watson-Crick and AEGIS DNA provides a general, structure-based strategy for obtaining polymerases capable of replicating expanded genetic alphabets.

| Parameterization of the nonnatural nucleobases
N9-methylated forms of the Z and P nucleobases were geometry optimized at the HF/6-31G* level of theory using the NWChem software package 32 prior to obtaining RESP charges 33 subject to the constraint of zero net charge. This procedure is consistent with the GAFF2 parameterization of small molecules. 34 Partial charges on the sugar-phosphate atoms in the corresponding nucleotide fragments were taken from the PARMBSC1 force field used to model nucleic acids, 35 with the desired overall charge being obtained by adjusting the initial partial charges on C1' and H1' (Table S2). 36 Force field parameters for the bonding interactions were obtained from PARMBSC1 and GAFF2. The resulting set of parameters for the two nonnatural nucleobases are provided elsewhere (Data S1).

| Building the initial models of WT KlenTaq, the evolved KlenTaq variant and the binary complexes
An initial model of the evolved KlenTaq polymerase variant bound to template (5'-AAAGPGCGCCGTGGTC-3 0 ) and primer (5'-GACCACGGCGCZ-3 0 ) DNA ( Figure S1b) was prepared from the corresponding X-ray crystal structure (PDB: 5W6Q). 24 Unobserved residues (291-293, 642-665, and 832) in the original X-ray crystal structure of the evolved KlenTaq variant were added based on their positions in other structures of WT KlenTaq (PDB:4KTQ 30 and PDB:1BGX 37 ). Energy minimization was performed to remove steric clashes in the initial model using the ff14SB 38 and PARMBSC1 35 force field parameters for the protein and DNA components, respectively, and our "in-house" parameters for the Z:P nucleobase pair. These calculations together with subsequent equilibration and MD simulations employed the PMEMD module implemented in the Amber2016 software suite. 39,40 The resulting model of the KlenTaq variant/AEGIS DNA binary complex was then used to construct models of the apo-forms of WT KlenTaq and the evolved KlenTaq variant, and the KlenTaq variant/ Watson-Crick, the WT KlenTaq/Watson-Crick and the WT KlenTaq/AEGIS DNA binary complexes. Similarly, the last frame of WT KlenTaq/AEGIS DNA binary complex was used to construct the initial models of the binary complexes in which AEGIS DNA was bound to the four singlepoint KlenTaq variants. Each of these models was placed in a box of explicit TIP3P 41 water molecules containing K + and Cl − ions to yield an ionic strength of 10 mM (Table S3). These counterions were chosen in order to mimic the conditions under which the KlenTaq variant was obtained by directed evolution. 23 Each system was energy minimized to remove bad contacts, heated, and then equilibrated for 100 ns to allow the quasi-immobile ions sufficient time to equilibrate prior to the production phase. 42 Three independent MD simulations of 110 ns were then performed on each of the six systems in the NPT ensemble (T = 323 K and p = 100,000 Pa). Models of the binary complexes of the M444 V, P527A, D551E, and E832V single-point KlenTaq variants bound to AEGIS DNA were built from the last frame of the WT KlenTaq/Z:P-containing DNA complex. These model structures were then solvated and equilibrated following the procedures outlined above, and MD trajectories (110 ns) determined using identical simulation conditions to those outlined above. Periodic boundary conditions were used in all MD simulations, with an 8 Å cutoff being used for non-bonded interactions, and particle-mesh Ewald methods were used to describe long-range electrostatics. [43][44][45] The temperature and pressure of each system was maintained using Langevin constraints, 46 and the SHAKE algorithm was used to constrain bonds involving hydrogen atoms, 47 which allowed the use of 2.0 fs time steps.

| Analysis of the MD trajectories
Structures were sampled at 100 ps intervals in each of the 23 MD trajectories to give a combined 3,300 "snapshots" for each of the six WT and variant Klentaq systems, and 1,100 "snapshots" for the four single variants. These trajectories that were analyzed using the DynDom, 26 3DNA, 31 Bio3D, 27 CPPTRAJ, 48 and VMD 49 software packages. Thus, DCCMs were computed based on the motions of the C α carbons of the protein using algorithms implemented in the Bio3D library in "R". 27 These maps reveal the extent to which the motions of two C α carbons are correlated during the MD trajectory, with the correlation coefficient for the residue pair varying over a range of −1 to +1. When the C α carbons have completely correlated motions (identical phase space, period and direction), the correlation coefficient has a value of +1. For "anti-correlated" motions, in which the correlated motions take place in opposite directions, the value of the correlation coefficient is −1. For completely uncorrelated motions, the correlation coefficient has a value of zero. The Bio3D library was also used to perform PCA for all snapshots obtained in the combined MD trajectories for each of the six systems.
Reference structures needed for the DynDom analysis were obtained by averaging the coordinates of the last snapshot in each of the three independent MD simulations for WT Klentaq, the Klentaq variant and four binary complexes (WT KlenTaq/Watson-Crick DNA; WT KlenTaq/AEGIS DNA; KlenTaq variant/Watson-Crick DNA; KlenTaq variant/AEGIS DNA). The domain motions of each reference structure compared to the X-ray crystal structure of WT KlenTaq in a binary complex (PDB: 3SZ2) 30 were then obtained using the DynDom package 26 by aligning the C α carbons of the two proteins. Helical parameters for the DNA molecules in each of the four complexes throughout the MD trajectories were calculated using the 3DNA 31 and do_x3DNA programs. 50 Hydrogen bonds were identified in the simulations assuming that the heavy-atom donor (D)/ acceptor (A) distance cutoff was 3.0 Å and that the D-H-A angle was in the range of 135-180 . Populated hydrogen bonds were defined to be present in at least 10% of the sampled structures in each MD trajectory except for those between the Arg-587 side chain and the nitro group of the Z nucleobase ORCID Nigel G. J. Richards https://orcid.org/0000-0002-0375-0881