Methods for enzyme library creation: Which one will you choose?

Enzyme engineering allows researchers to explore sequence diversity in search of new properties. The scientific literature is populated with methods to create enzyme libraries for engineering purposes; however, choosing a suitable method for the creation of mutant libraries can be daunting, particularly for novices. Here, we address both novices and experts: how can one enter the arena of enzyme library design, and what guidelines can advanced users apply to select the strategies best suited to their purpose? Section I is dedicated to novices and presents an overview of established, standard methods for library creation, as well as available commercial solutions. Experts will find an up-to-date toolbox to freshen up their repertoire (Section I) and learn of the newest methods that are likely to become a mainstay (Section II). We focus primarily on in vitro methods and present the advantages of each. Our ultimate aim is to offer a selection of the methods and strategies that we believe to be most useful to the enzyme engineer, whether a first-timer or a seasoned user.


INTRODUCTION
The number of combinations of mutations that can theoretically be made in a protein of 100 amino acids exceeds 10¹³⁰ (20¹⁰⁰). If we consider that the largest screened libraries contain approximately 10¹⁵ members, [1] it is clear that only an infinitesimal part of the vast theoretical sequence space is accessible through experiments, and this will remain so even as automation increases screening capacity. For this reason, researchers have focused on improving library quality: the "smarter" the library, the greater the odds of identifying the improved variants you are looking for. […] Random mutagenesis is most useful when the structure or the functional regions of the protein we seek to alter are unknown. [8,12] However, random mutagenesis requires a powerful high-throughput screening method to isolate improved variants, because beneficial mutations occur two to three orders of magnitude less frequently than mutations that are deleterious to function. [9,16] Furthermore, randomized libraries severely limit the introduction of consecutive mutations within a single round; even with the most modern approaches, a single round of random mutagenesis seldom suffices to provide significant improvement. [11,17,18] Recombination methods involve fragmenting mutated copies of the gene of interest and reassembling them to give rise to new combinations of mutations. [8] Arguably the most popular application of rational recombination is SCHEMA, which involves the guided recombination of fragments of homologous genes from the same family, giving rise to chimeric proteins. [19] SCHEMA is of particular interest because it is characterized by somewhat smaller theoretical library sizes (∼10³-10⁴ variants) than other recombination methods [20-23] while generating diverse libraries with a higher proportion of functional variants than randomization methods. [8,24]
While these options are powerful and widely adopted, our review emphasizes focused approaches. These involve targeting specific residues or regions for mutation to one or many possible amino acids, often in a combinatorial manner, or the introduction of insertions and deletions into the gene sequence. [3,18,25-28] Targeted mutagenesis is limited by the requirement for structural or biophysical knowledge of the enzyme. To address this limitation, it is common to perform exploratory rounds of randomization that inform subsequent rounds of targeted mutagenesis: Nobel laureate Frances Arnold once stated that bringing together rational design and directed evolution is the way forward for generating quality libraries containing a high number of sequences with the desired properties. [4,27,29,30] A targeted strategy might thus include randomization of certain parts of a gene, but not of the whole sequence. [10,12,17,18,31] Targeted mutagenesis can produce smaller theoretical libraries (∼10-10⁴ variants) that may hold a high proportion of functional variants, which in turn translates into a lower screening effort. [11,17] This has been shown to be especially useful for improving specific enzyme properties such as activity, selectivity and thermal resistance. [8] Although of limited scope for sequence diversification, targeted approaches benefit from their simplicity, which includes limiting the number of variants to be screened to find an improved enzyme.
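The size argument above is simple arithmetic. A minimal Python sketch of it, under stated assumptions: all 100 positions are varied over the 20 canonical amino acids, the largest screen is taken as ∼10¹⁵ members, and the "targeted library" is a hypothetical one in which three sites are randomized with NNK codons (32 codons per site).

```python
import math

def log10_space(n_positions: int, choices: int) -> float:
    """log10 of the number of sequences when n_positions each take `choices` values."""
    return n_positions * math.log10(choices)

# All 100 positions varied over the 20 canonical amino acids: 20**100 sequences.
full = log10_space(100, 20)
largest_screen = 15.0  # ~10^15 members, the largest screened libraries cited above

print(f"theoretical space: ~10^{full:.1f} sequences")
print(f"screenable fraction: ~10^{largest_screen - full:.1f}")

# Hypothetical targeted library: 3 sites randomized with NNK (32 codons each).
print(f"3 NNK sites: {32 ** 3} codon combinations, {20 ** 3} protein variants")
```

Even a three-site combinatorial library stays within the ∼10-10⁴ protein-variant range quoted for targeted libraries, although the number of codon combinations (and hence clones to screen) is about four times larger.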
The most basic approach for generating targeted mutations is site-directed mutagenesis (SDM), introduced by Nobel laureate Michael Smith to create point mutations. [32] Its derivatives include site-saturation mutagenesis (SSM, discussed in Section I), which introduces degenerate codons at a specific position, [9,33-35] and combinatorial saturation mutagenesis (CSM, Sections I and II), which mutates more than one site at a time.
Single-site saturation libraries: Saturation libraries offer high quality and small size while giving access to greater sequence diversity than point mutations. However, building saturation libraries one position at a time severely limits the exploration of sequence diversity; in particular, it curtails the discovery of positive epistatic effects, in which specific combinations of mutations (whether proximal or distal) give a greater improvement than expected from combining the effects of the individual point mutants. [36,37] Such effects may arise from direct interactions between residues or from more complex interactions involving conformational dynamics or stability. [37] Epistasis is difficult to foresee, which limits predictability in the exploration of sequence space. [38,39]
Combinatorial saturation libraries: To explore epistasis in saturation libraries, combinatorial approaches can be useful. In this case, more than one site is mutated at once and the resulting combinations of mutations are assessed. Examples of strategies of this kind are ISM (Iterative Saturation Mutagenesis) and its derivatives.
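Epistasis, as defined above, is the difference between the observed effect of a combination of mutations and the effect expected from the individual mutations. The sketch below assumes one common convention, which the text does not specify: effects are log2 fold-changes in activity relative to wild type, so the expectation for a double mutant is the sum of the single-mutant effects. All activity values are hypothetical.

```python
import math

def log2_effect(activity: float, wt_activity: float) -> float:
    """Effect of a variant as a log2 fold-change in activity relative to wild type."""
    return math.log2(activity / wt_activity)

def epistasis(wt: float, a: float, b: float, ab: float) -> float:
    """Observed double-mutant effect minus the additive expectation (log2 units).
    Positive values indicate positive epistasis, negative values negative epistasis."""
    expected = log2_effect(a, wt) + log2_effect(b, wt)
    return log2_effect(ab, wt) - expected

# Hypothetical activities (arbitrary units): wild type, two single mutants, double mutant.
eps = epistasis(wt=1.0, a=2.0, b=1.5, ab=12.0)
# The singles predict a 3-fold gain (2 x 1.5); the double mutant shows 12-fold, i.e. 4x more.
print(f"epistasis: {eps:+.2f} log2 units")
```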
In ISM, sites of interest (as many as necessary) are chosen, each generally containing one to three amino acids (or more). These sites are then randomized individually using saturation mutagenesis. The best hits from the first randomization are used as templates for further rounds of saturation mutagenesis at the other sites. [5,9,28,39,40] ISM allows fast convergence of beneficial mutations, mainly because the choice of positions to mutate is informed by prior structural or biochemical knowledge. [41] The positions to mutate are chosen according to the final goal: B-FIT for a gain in thermostability [41] and CASTing for the improvement of catalytic properties. [28,35] A novel iterative methodology (FRISM), inspired by CAST/ISM, includes a rational prediction step that restricts the number of mutants to be screened. FRISM has proven useful for engineering stereoselectivity in enzyme variants. [5,42,43]
Below, we review a broad selection of methods and strategies to produce the targeted library that best suits your needs. Section I presents an overview of established approaches for standard library creation and popular commercial solutions to speed up the creation of targeted libraries. For those wanting to further explore protein sequence space in a bid to modify enzyme activity more deeply, Section II presents recently reported, innovative approaches to generate more complex, mainly targeted libraries.
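The ISM procedure described earlier in this introduction can be sketched as a simple loop: saturate one site, keep the best hit, and use it as the template for the next site. This is a toy illustration under stated assumptions: `toy_fitness`, its hypothetical optimum sequence and the epistatic bonus stand in for an experimental screen; every amino acid is sampled exactly once per site (no codon bias or oversampling); positions are 0-based; and the pathway (order of sites) is fixed by the user.

```python
import itertools

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def saturate(template: str, site: tuple[int, ...]) -> list[str]:
    """All variants of `template` with every position in `site` set to each of the 20 amino acids."""
    library = []
    for combo in itertools.product(AMINO_ACIDS, repeat=len(site)):
        variant = list(template)
        for pos, aa in zip(site, combo):
            variant[pos] = aa
        library.append("".join(variant))
    return library

def ism(template: str, pathway: list[tuple[int, ...]], fitness) -> tuple[str, list]:
    """Iterative saturation mutagenesis along a fixed pathway of sites."""
    scores = [fitness(template)]
    for site in pathway:
        best = max(saturate(template, site), key=fitness)  # "screen" the site library
        if fitness(best) > fitness(template):
            template = best                                 # best hit becomes the next template
        scores.append(fitness(template))
    return template, scores

OPTIMUM = "WKAYGF"  # hypothetical target sequence for the toy landscape

def toy_fitness(seq: str) -> int:
    score = sum(a == b for a, b in zip(seq, OPTIMUM))
    if seq[1] == "K" and seq[3] == "Y":
        score += 2  # hypothetical positive epistasis between positions 1 and 3
    return score

best, trajectory = ism("AAAAAA", [(0, 1), (3,), (5,)], toy_fitness)
print(best, trajectory)
```

Position 4 never reaches the toy optimum because it is not part of the chosen pathway; in practice, it is the informed choice of sites (e.g., by CASTing or B-FIT) that makes ISM efficient.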

SECTION I: ESTABLISHED METHODS FOR STANDARD SITE-DIRECTED AND SATURATION MUTAGENESIS FOR TARGETED LIBRARY CREATION
Site-directed mutagenesis (SDM) allows an amino acid in the target protein to be substituted using mutagenic primers. This is easily achieved with commercially available kits such as QuikChange by Agilent (discussed in Section I-B 7). A direct derivative of SDM is site-saturation mutagenesis, commonly referred to as SSM when a single site is mutated, or CSM (Combinatorial Saturation Mutagenesis) when there is more than one randomization site. [8,9,33,35,36] In SSM and CSM, degenerate mutagenic primers introduce the nineteen other amino acids (or a defined subset of them) at each site.
The main advantage of saturation mutagenesis is that it allows non-conservative codon substitutions that are unlikely to arise by random mutagenesis, giving access to non-natural evolutionary pathways. [44] A direct improvement of saturation mutagenesis is the use of primers with reduced codon degeneracy, tuned to the needs of the user. [36] For instance, one can restrict codon redundancy or the presence of stop codons. [8,33,45] The twenty naturally occurring amino acids are all encoded by NNN degeneracy, which codes for 64 possible codons (including 3 stop codons), where N is A, T, G or C. NNK degeneracy (K = T or G; see also the IUPAC nomenclature for the standard degeneracy letters [46]) limits the number of codons to 32 while still encoding all twenty amino acids and only one stop codon. The library size is reduced, which facilitates screening. Note, however, that when NNK is used in a combinatorial manner, redundancy and library size increase considerably (e.g., for one position mutated with NNK, ∼34% of hits are redundant, whereas for four positions the library size reaches 10⁶ with a redundancy of ∼81%), and the library includes a substantial fraction of unwanted stop codons.
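The codon counts and redundancy figures above can be reproduced with a few lines of Python. This sketch expands IUPAC degenerate codons, translates them with the standard genetic code, and computes combinatorial redundancy as 1 - (21/32)ⁿ, assuming all 32 NNK codons are equally represented and counting the stop codon as a 21st outcome; under that assumption it matches the ∼34% (one site) and ∼81% (four sites) quoted. The coverage estimate uses the common equiprobable-sampling formula T = -V ln(1 - F), an assumption not discussed in the text.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AA_TABLE)}

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}

def expand(degenerate: str) -> list[str]:
    """All concrete codons encoded by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in itertools.product(*(IUPAC[b] for b in degenerate))]

def summarize(degenerate: str) -> tuple[int, int, int]:
    """(number of codons, number of amino acids, number of stop codons)."""
    aas = [GENETIC_CODE[c] for c in expand(degenerate)]
    return len(aas), len(set(aas) - {"*"}), aas.count("*")

def redundancy(n_sites: int, n_codons: int = 32, n_outcomes: int = 21) -> float:
    """Fraction of codon combinations that duplicate an outcome already encoded
    (NNK: 32 codons -> 20 amino acids + 1 stop = 21 outcomes per site)."""
    return 1 - (n_outcomes / n_codons) ** n_sites

def clones_for_coverage(n_combinations: int, coverage: float = 0.95) -> int:
    """Clones to screen so each codon combination is sampled with probability ~coverage
    (equiprobable library, Poisson approximation)."""
    return math.ceil(-n_combinations * math.log(1 - coverage))

for codon in ("NNN", "NNK"):
    print(codon, summarize(codon))  # (codons, amino acids, stops)
for n in (1, 4):
    print(f"{n} NNK site(s): {32 ** n} combinations, {redundancy(n):.0%} redundant, "
          f"~{clones_for_coverage(32 ** n)} clones for 95% coverage")
```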
If the user is instead willing to sacrifice some of the possible amino acids, [47] NNT and NNG each code for 16 codons, while NDT (D = A, G or T) provides 12 codons for an even smaller library. [36] More recently, Tang et al. [45] devised SILM (Small Intelligent Library Method) [8] as an alternative to NNK that encodes all twenty amino acids with reduced degeneracy. In SILM, four primers (carrying NDT, VMA, ATG and TGG at the site to be mutated, where V = A, C or G and M = A or C) [45,48] are mixed at a specified ratio. Another method, called the 22c-trick, reduces degeneracy by combining three primers (NDT, VHG and TGG, where H = A, C or T) that together contain 22 codons coding for the 20 amino acids. [49] For even tighter control over codon degeneracy, several tools are available, such as DC-ANALYZER, [8,45] MDC-Analyzer, [50] SwiftLib [51] and DeCoDe. [52]
Many methods have been developed to introduce mutations into the target gene at specific positions. Here we describe a selection of the most popular methods employed in enzyme engineering, in order of increasing complexity. We also mention methods that are less commonly used but that could provide advantages for users with more expertise in molecular biology or with specific requirements. Most of the methods described in this section can be adapted to perform site-directed mutagenesis (SDM), site-saturation mutagenesis (SSM) or combinatorial site-saturation mutagenesis (CSM). It is important to remember that the efficiency of mutagenesis often depends on the availability of easy-to-use cloning and assembly methods, the most popular of which we describe in Section I-B.
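The reduced codon sets and primer mixtures discussed earlier in this section (NNT, NNG, NDT, SILM and the 22c-trick) can be checked the same way. In this sketch, each primer mixture is treated as the union of the codons its degenerate positions encode, assuming equal representation of every codon; the printed ratio is simply proportional to each primer's codon count, an illustrative assumption that may differ from the published mixing ratios.

```python
import itertools

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AA_TABLE)}

# IUPAC degeneracy letters used in the text
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT",
         "D": "AGT", "V": "ACG", "M": "AC", "H": "ACT"}

def expand(degenerate: str) -> list[str]:
    """All concrete codons encoded by one degenerate codon."""
    return ["".join(c) for c in itertools.product(*(IUPAC[b] for b in degenerate))]

def mixture_summary(primers: list[str]) -> tuple[int, int, int]:
    """(codons, distinct amino acids, stop codons) for a mixture of degenerate primers."""
    aas = [GENETIC_CODE[c] for p in primers for c in expand(p)]
    return len(aas), len(set(aas) - {"*"}), aas.count("*")

SCHEMES = {
    "NNT": ["NNT"],
    "NNG": ["NNG"],
    "NDT": ["NDT"],
    "SILM": ["NDT", "VMA", "ATG", "TGG"],
    "22c-trick": ["NDT", "VHG", "TGG"],
}

for name, primers in SCHEMES.items():
    codons, aas, stops = mixture_summary(primers)
    ratio = ":".join(str(len(expand(p))) for p in primers)  # codon-count-proportional (assumption)
    print(f"{name:10s} {codons:2d} codons, {aas:2d} amino acids, {stops} stop; ratio {ratio}")
```

In this idealized count, SILM encodes each amino acid exactly once (20 codons), whereas the 22c-trick covers the 20 amino acids with only two redundant codons (one extra for leucine and one for valine).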

Whole-plasmid site-directed mutagenesis for simplicity and speed
The whole-plasmid approach to site-directed mutagenesis is the most popular method for substituting one or a few consecutive amino acids in a target enzyme (Table 1). First commercialized by Stratagene in 1996 as the QuikChange kit (see Section I-B 7), the standard protocol relies on the design of overlapping primers containing the desired mutagenic codon(s) (Figure 1). [53] These are used to amplify the whole plasmid with a high-fidelity DNA polymerase. The resulting PCR product is mutated, nicked double-stranded DNA. Treatment with DpnI eliminates the methylated, non-mutated wild-type template DNA; after transformation into a bacterial host, the nicks in the mutated plasmid are repaired by the host's machinery. Although straightforward, this approach suffers from poor transformation efficiency because the DNA is nicked; furthermore, wild-type template that escapes incomplete DpnI digestion transforms more efficiently than the mutated product, which can increase the cost of identifying mutated variants.
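The primer logic of the whole-plasmid method can be illustrated with a small helper. This sketch is hypothetical: `quikchange_primers` and `qc_tm` are illustrative names, the 15-nt flanks on each side of the codon are an assumed design choice, and the melting-temperature estimate is the approximation commonly quoted from the QuikChange manual (Tm = 81.5 + 0.41(%GC) - 675/N - %mismatch), which should be checked against the current manual before use.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def quikchange_primers(template: str, codon_start: int, new_codon: str, flank: int = 15):
    """Complementary mutagenic primer pair centred on one codon (hypothetical helper)."""
    if codon_start < flank or codon_start + 3 + flank > len(template):
        raise ValueError("not enough flanking sequence around the codon")
    mutated = template[:codon_start] + new_codon + template[codon_start + 3:]
    forward = mutated[codon_start - flank: codon_start + 3 + flank]
    reverse = revcomp(forward)  # fully overlapping, complementary primer
    old_codon = template[codon_start:codon_start + 3]
    n_mismatch = sum(a != b for a, b in zip(old_codon, new_codon))
    return forward, reverse, n_mismatch

def qc_tm(primer: str, n_mismatch: int) -> float:
    """Tm estimate: 81.5 + 0.41(%GC) - 675/N - %mismatch (N = primer length)."""
    n = len(primer)
    gc_percent = 100 * sum(base in "GC" for base in primer) / n
    return 81.5 + 0.41 * gc_percent - 675 / n - 100 * n_mismatch / n

# Hypothetical template; replace the codon starting at index 30 with TGG.
template = "ATGAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGG"
fwd, rev, mm = quikchange_primers(template, 30, "TGG")
print(fwd, rev, mm, f"Tm ~ {qc_tm(fwd, mm):.1f} C")
```

Because the two primers are exact complements, they can also anneal to each other, which is the primer-dimer problem noted in the next paragraph.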
These limitations make the standard method unsuited to the generation of larger libraries (>10⁴ transformants, as also reported in the QuikChange Site-Directed Mutagenesis manuals). Other drawbacks include annealing of the complementary primers to each other instead of to the target plasmid (primer dimers) and poor PCR amplification. Nonetheless, the whole-plasmid approach remains one of the fastest and simplest ways to introduce mutations, making it attractive to novice and experienced users alike. Numerous improvements have been made to extend its benefits, concerning primer design, [59-61] the incorporation of DNA insertions or deletions, [62] transformation efficiency and the introduction of mutations at multiple sites (multi-site-directed mutagenesis). [62,63] A recent, noteworthy improvement for difficult targets (large plasmids or templates with stable secondary structures) has also been proposed by Li et al. [64]

Whole-plasmid site-directed mutagenesis with template/product modification for improved mutational efficiency
Whole-plasmid site-directed mutagenesis methods that use template/product modification (Figure 1) are not popular among novices, as their protocols are lengthier than those of other methods. However, they have proven useful for reducing wild-type template contamination, offering a mutational efficiency of 50-100%. [55,56] The Kunkel method and its derivatives [65] require the generation of a whole-plasmid, single-stranded template DNA containing uracil (dUTP incorporated instead of dTTP). This often involves using a phage to infect E. coli cells deficient in dUTPase (dut-) and uracil N-glycosylase (ung-). The lack of these enzymatic activities allows the cells to propagate the phasmid DNA containing the target gene with uracil residues. Synthesis of the complementary strand from a mutagenic primer uses dTTP, so the new, mutated strand contains no uracil. Upon transformation into wild-type (ung+) bacteria, the original uracil-containing strand is cleaved, eliminating the wild-type template.
Although the basic method does not involve PCR, a markedly improved protocol relies on a high-fidelity polymerase that does not stall at uracil, Taq ligase, and in vitro treatment with uracil DNA glycosylase and exonuclease III, which degrade the uracil-containing template and undesired DNA products. As a result, the transformed product is double-stranded DNA, which gives higher transformation efficiency. [55] This protocol also makes the Kunkel method more accessible by describing how to generate uracil-containing templates without phages. [55] Other studies aiming to improve the mutational efficiency of Kunkel-based methods have focused on optimizing DNA amplification conditions such as primer design, extension time and annealing temperature, [66] or on the use of phi29 polymerase. [67]

Table 1 Selected mutagenesis methods
Category | Method | Mutagenesis efficiency (a) | … | Recommended protocol
Whole-plasmid site-directed mutagenesis | Less than 100% | ∼20 | High-fidelity polymerase; DpnI | No | Laible and Boonrod [54]
Whole-plasmid site-directed mutagenesis with template/product modification | Pfunkel (c) | 70-100% | ∼20 | T4 polynucleotide kinase; high-fidelity polymerase that reads uracil-containing DNA | No | Firnberg and Ostermeier [55]
… [58] | High-fidelity polymerase; DpnI
(a) Mutagenesis efficiency is as calculated in each of the recommended protocols.
(b) Estimated time includes transformation and overnight culture.
(c) dut-/ung-: E. coli strain required to generate uracil-containing template.
(d) mutS: E. coli strain required to avoid repair of the mutated template.
Oligonucleotide-directed mutagenesis by elimination of a unique restriction site (USE) [56,68] also relies on whole-plasmid amplification of DNA. It employs a mutagenic primer together with a primer that eliminates a unique restriction site in the mutated plasmid. Subsequent digestion with the corresponding restriction enzyme eliminates the wild-type template. The product is transformed into a mutS E. coli strain deficient in mismatch repair. The resulting colonies are pooled, and the plasmid DNA is isolated, digested again with the same restriction enzyme to ensure complete elimination of the wild-type template, and transformed into a standard E. coli strain. In conjunction with Kunkel-based methods, USE can serve to increase mutational efficiency. [69]

Site-directed mutagenesis by overlap extension for large plasmids or difficult constructs
First described in 1989 as a PCR-based method to introduce mutations, insertions and deletions, [70] site-directed mutagenesis by overlap extension (SOE) offers high product yield and traceable product formation as its main advantages. It is mostly used for large plasmids, for difficult-to-amplify genes, or when whole-plasmid amplification consistently fails for other reasons (e.g., formation of primer dimers). In SOE, overlapping primers containing the mutation(s) are designed, similarly to the commercial QuikChange method (see Section I-B 7). These are used separately in two PCR reactions to introduce the mutation(s), each paired with a primer complementary to either the 5′ or the 3′ terminus of the gene (Figure 1). A final PCR reaction combining the two mutated fragments and extending them to full length with the terminal primers then recreates the whole, mutated gene, with efficiencies of up to 98%. [70] An improved version allows the method to be carried out in a single tube: a megaprimer carrying the mutation of interest is generated using one mutagenic primer and a non-mutagenic one that anneals at either of the gene's termini, and the megaprimer is then used to amplify the whole gene in a second PCR step. [71] A variation of SOE that removes impurities such as template DNA or excess primers more efficiently is PAGE-mediated Overlap Extension PCR (POEP). [72] This modification increases mutational efficiency, especially for multiple-site insertions, for which the efficiency of SOE is near 75% versus 100% for POEP. [72] Improvements for creating insertions and deletions >30 bp have also been described, [73] as well as multi-site-directed mutagenesis using mutagenic fragments with homologous regions that act as megaprimers. [74-76] The main limitation of SOE is the general requirement for subsequent subcloning of the mutated DNA fragment into the desired vector.
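The three PCR reactions of SOE can be mimicked in silico to check a design before ordering primers. In this hypothetical sketch, the two first-round products are string slices of the mutated gene that share an overlap centred on the mutated codon (10 nt of flank on each side, an assumed value), and the final overlap-extension step joins them only if that overlap is identical.

```python
def mutate(gene: str, codon_start: int, new_codon: str) -> str:
    return gene[:codon_start] + new_codon + gene[codon_start + 3:]

def soe_fragments(gene: str, codon_start: int, new_codon: str, flank: int = 10):
    """First-round PCR products: A (5' terminal primer + mutagenic reverse primer) and
    B (mutagenic forward primer + 3' terminal primer), overlapping around the mutation."""
    mutated = mutate(gene, codon_start, new_codon)
    frag_a = mutated[: codon_start + 3 + flank]
    frag_b = mutated[codon_start - flank:]
    overlap = 2 * flank + 3
    return frag_a, frag_b, overlap

def overlap_extension(frag_a: str, frag_b: str, overlap: int) -> str:
    """Final PCR: fragments anneal through their shared overlap and are extended to full length."""
    if frag_a[-overlap:] != frag_b[:overlap]:
        raise ValueError("fragments do not share the expected overlap")
    return frag_a + frag_b[overlap:]

gene = "ATGAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTTAATGGG"  # hypothetical
a, b, ov = soe_fragments(gene, 30, "TGG")
full = overlap_extension(a, b, ov)
print(len(a), len(b), ov, full == mutate(gene, 30, "TGG"))
```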
Homologous recombination in vitro and in vivo can alleviate this limitation and will be discussed in Section I-B 6 and 7.

Inverse PCR for saturation mutagenesis and the introduction of deletions and insertions
Developed in 1989, [77] this approach is particularly advantageous for introducing deletions and insertions, and for saturation mutagenesis. [78] Inverse PCR relies on the amplification of a target sequence using back-to-back primers that amplify outward through the regions flanking the targeted sequence (Figure 1). Amplification generates a linear product that is subsequently ligated to regenerate the whole plasmid containing the target gene. Ligation requires a phosphate group at the 5′ terminus; this can be provided with T4 polynucleotide kinase or by using phosphorylated primers. As with whole-plasmid amplification, digestion of the methylated template DNA with DpnI prior to transformation reduces the wild-type background. An improvement of this method consists of generating complementary overhangs for a ligation-independent approach (SLIM). [58] A recent improvement (URMAC) is described in Section II-B 2.
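Back-to-back primer design for inverse PCR can be sketched on a circular plasmid represented as a string. In this hypothetical example, `start:end` is the stretch to be replaced (or deleted, if `insert` is empty), the forward primer carries the new sequence as a 5′ tail, 20-nt annealing regions are an assumed default, and ligation is modelled simply by checking that the linear product closes into the intended circular sequence.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def back_to_back_primers(plasmid: str, start: int, end: int, insert: str = "", anneal: int = 20):
    """Forward primer starts at `end` (with `insert` as a 5' tail); reverse primer ends at `start`."""
    if start < anneal or end + anneal > len(plasmid):
        raise ValueError("annealing region would cross the sequence origin")
    forward = insert + plasmid[end:end + anneal]
    reverse = revcomp(plasmid[start - anneal:start])
    return forward, reverse

def linear_product(plasmid: str, start: int, end: int, insert: str = "") -> str:
    """Outward amplification around the circle: new sequence, then everything except start:end."""
    return insert + plasmid[end:] + plasmid[:start]

def same_circle(a: str, b: str) -> bool:
    """True if two sequences describe the same circular molecule (any rotation)."""
    return len(a) == len(b) and b in a + a

rng = random.Random(1)
plasmid = "".join(rng.choice("ACGT") for _ in range(300))  # hypothetical 300-bp plasmid

fwd, rev = back_to_back_primers(plasmid, 60, 63, insert="NNK")  # saturate one codon
product = linear_product(plasmid, 60, 63, insert="NNK")
print(fwd, rev, same_circle(product, plasmid[:60] + "NNK" + plasmid[63:]))
```

The 5′ ends of both primers meet at the ligation junction, which is why they (or the PCR product) must be phosphorylated before ligation.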

B. FLEXIBLE AND EASY-TO-USE ASSEMBLY/CLONING STRATEGIES INSTRUMENTAL TO MUTAGENESIS
The development of straightforward cloning and assembly methodologies in conjunction with site-saturation mutagenesis methods has accelerated the creation of "smarter" methods for library generation.
Subcloning of fragments into vectors constitutes a major bottleneck in library generation. This is reflected in the popularity of the QuikChange kit (Section I-B 7), which requires no restriction/ligation step. Ligation-independent methods have provided noteworthy solutions to this problem, accelerating library creation and allowing sequence space to be explored more efficiently. The major advantage of ligation-independent methods lies in the increased flexibility they provide for library generation.
Mutations introduced in different parts of a gene can subsequently be reassembled to explore interactions between selected mutations or within a defined region of the protein (e.g., a tunnel or the C-terminus). [26] A few ex vivo and in vivo methods are also mentioned briefly.

Restriction-free cloning for versatility
Restriction-Free (RF) cloning can be used to accelerate single or multiple site-directed mutagenesis, insertions and/or deletions. [79] It consists of amplifying the vector by high-fidelity PCR using an insert (megaprimer) that contains regions homologous to the vector at its 5′ and 3′ termini (Figure 2). [79,80] This method is versatile as any […] product yield and low efficiency for large insertions. To circumvent these limitations, Exponential Megapriming PCR (EMP) adds forward or reverse primers to the megaprimer reaction to amplify the whole plasmid. [82] A recent improvement of RF cloning, Modified Restriction-Free cloning (MRF), [83] allows the insertion of fragments of up to 20 kb into the cloning vector.

Multiple overlap extension PCR for fragment assembly
Multiple Overlap Extension PCR (MOE-PCR) can be used to assemble up to eight DNA fragments in a single PCR reaction without the need for additional enzymes (T4 DNA ligase, exonuclease, etc.). [74] In MOE-PCR, fragments containing the desired mutations and 50 bp of homology to the adjacent fragment are produced in independent PCR reactions. These are subsequently mixed in a single PCR reaction and used as templates for the assembly of the whole gene or plasmid in one step (Figure 2). The annealing temperatures of the overlapping termini must be designed carefully to ensure specificity and efficiency. A previous version of MOE-PCR, called multi-fragment site-directed mutagenic overlap extension PCR, applies the same principle of annealing homologous fragment termini followed by amplification: by assembling fragments two by two, it can efficiently assemble 13 fragments. [76]
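The fragment layout for MOE-PCR can be planned in silico. This hypothetical sketch splits a sequence into eight fragments that share 50-bp overlaps with their neighbours (the overlap length given above), checks that no two junctions are identical (a simple proxy for mis-assembly risk; a real design would also balance annealing temperatures), and reassembles the fragments by chaining them through their overlaps.

```python
import random

def split_with_overlaps(seq: str, n_fragments: int, overlap: int = 50) -> list[str]:
    """Fragment i spans [cut_i, cut_{i+1} + overlap), so neighbours share `overlap` bp."""
    step = len(seq) // n_fragments
    if step <= overlap:
        raise ValueError("fragments would be shorter than the overlap")
    cuts = [i * step for i in range(n_fragments)] + [len(seq)]
    return [seq[cuts[i]: min(cuts[i + 1] + overlap, len(seq))] for i in range(n_fragments)]

def overlaps_unique(fragments: list[str], overlap: int = 50) -> bool:
    """True if every junction sequence occurs only once."""
    junctions = [f[:overlap] for f in fragments[1:]]
    return len(set(junctions)) == len(junctions)

def assemble(fragments: list[str], overlap: int = 50) -> str:
    """Chain fragments through their shared termini, as in a single-tube MOE-PCR."""
    product = fragments[0]
    for frag in fragments[1:]:
        if product[-overlap:] != frag[:overlap]:
            raise ValueError("adjacent fragments do not overlap")
        product += frag[overlap:]
    return product

rng = random.Random(7)
gene = "".join(rng.choice("ACGT") for _ in range(1200))  # hypothetical 1.2-kb target
fragments = split_with_overlaps(gene, 8)
print(len(fragments), overlaps_unique(fragments), assemble(fragments) == gene)
```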

Combinatorial codon mutagenesis based on SOE and megaprimer PCR for tunable mutational frequency
The concept of Combinatorial Codon Mutagenesis (CCM) [84] is based on SOE-PCR and other recent work. [87-89]

Ligation-independent cloning (LIC) and derivatives (SLIC and SLiCE) to join inserts and vectors without the need for a ligase
Ligation-Independent Cloning (LIC) allows the assembly of insert(s) into a vector without ligase, because the overhangs created are fully complementary. LIC takes advantage of the 3′→5′ exonuclease activity of T4 DNA polymerase to create single-stranded overhangs in the homologous regions between the desired mutated insert(s) and the vector. [90] This exonuclease activity predominates in the absence of dNTPs; when a single specific dNTP is supplied, it stops at the first occurrence of the corresponding nucleotide, where the polymerase activity replaces each removed base.
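The "stop at the first matching base" behaviour of LIC can be modelled with a few string operations. In this idealized sketch (which ignores kinetics and partial digestion), T4 DNA polymerase chews back each 3′ end until the 3′-terminal base matches the single dNTP supplied (dGTP here, chosen arbitrarily); the bases removed from one strand leave the 5′ end of the other strand single-stranded. The hypothetical insert has flanks that lack G on the strand being chewed back, so each end receives a 9-nt tail.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def chew_back(strand: str, dntp: str) -> str:
    """Remove 3'-terminal bases (right end, strand written 5'->3') until the 3' base equals `dntp`."""
    stop = strand.rfind(dntp)
    return strand[: stop + 1]  # empty string if the base never occurs (strand fully degraded)

def lic_tails(top: str, dntp: str = "G") -> tuple[str, str]:
    """5' single-stranded tails left on each end of a duplex after T4 DNA polymerase treatment."""
    bottom = revcomp(top)
    top_kept = chew_back(top, dntp)
    bottom_kept = chew_back(bottom, dntp)
    left_tail = top[: len(top) - len(bottom_kept)]      # top-strand 5' end, exposed on the left
    right_tail = bottom[: len(bottom) - len(top_kept)]  # bottom-strand 5' end, exposed on the right
    return left_tail, right_tail

# Hypothetical insert: G-free flanks (on the chewed strand) around a core containing G on both strands.
insert = "ATTATTAAT" + "CGCG" + "TTAATTAAT"
print(lic_tails(insert, "G"))
```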
Sequence and Ligation-Independent Cloning (SLIC) [85] (Figure 2), a modification of LIC, allows the recombination of up to 10 fragments. In SLIC, imperfect overhangs are created with T4 DNA polymerase and RecA protein is used to promote homologous recombination in vitro.
A further modification, SLiCE (Seamless Ligation Cloning Extract), uses bacterial cell lysates instead of RecA protein for ex vivo recombination. [91] Its application to enzyme engineering [92] is described in Section I-B 6 (in vivo and ex vivo homology-based recombination).
The main disadvantage of creating overhangs with T4 DNA polymerase is that the resulting ssDNA overhangs may form strong secondary structures. A recent improvement of this method for recombining short fragments aims to reduce this problem by optimizing the incubation temperature and time with T4 DNA polymerase. [93] A recent variation of SLIC uses T5 exonuclease instead of T4 DNA polymerase, with assembly taking place in E. coli after transformation. [94]

OmniChange for simultaneous mutation in distal parts of the gene
OmniChange can be used to mutagenize up to five codons simultaneously in up to five fragments. In a first step, fragments are created by PCR amplification using forward primers and reverse mutagenic primers, or vice versa. Digestion with DpnI prevents contamination with wild-type template DNA. The mutated fragments are then cleaved by iodine treatment (at phosphorothioate nucleotides introduced through the primers) to generate cohesive ends for subsequent reassembly (Figure 2). Nicks are repaired in vivo after transformation into E. coli BL21-Gold (DE3) lacIQ1, although common laboratory strains can also be used.

In vivo and ex vivo homology-based recombination
Several methods described above generate nicked products (e.g., whole-plasmid amplification and OmniChange) that are repaired in vivo by common laboratory strains of E. coli. We deem it useful to briefly enumerate selected examples of easy-to-implement in vivo and ex vivo (cell lysate) homology-based recombination methods (Table 2). These methods use either E. coli or S. cerevisiae, the most common organisms for the heterologous expression and directed evolution of prokaryotic and eukaryotic enzymes. We selected two that take advantage of the recA-independent recombination pathway of E. coli present in common cloning strains such as DH5α, XL10-Gold or Stbl3: SLiP (ex vivo) [91,92] and IVA cloning (in vivo). [95] REPLACR mutagenesis [96] instead uses bacteria expressing Red/ET with recA recombineering activity, whereas IVOE [97] uses S. cerevisiae. These approaches can also be applied by more experienced users to avoid the cost of commercial kits.

Commercial and proprietary methods to facilitate mutagenesis
Several kits are available for mutagenesis and assembly. They can be useful to novices for an easy start or as a standard quick alternative for experienced users. We present a selection of the most widely used kits in Table 3.
Until recently, gene synthesis was not broadly applied to generate libraries of very high quality (i.e., with minimal bias). Today, the cost and quality of gene synthesis have improved, making this strategy more accessible. For instance, Twist Bioscience has recently developed a proprietary method for synthesizing combinatorial libraries that the company describes as nearing perfection, making commercial gene libraries even more accessible. Their synthesis capability was successfully validated by Li and colleagues, who demonstrated how a high-quality combinatorial saturation mutagenesis library can be obtained through high-fidelity solid-phase chemical gene synthesis on silicon chips coupled with gene assembly. [98] This approach showed higher mutational efficiency, fewer incomplete sequences and less wild-type contamination than library generation through traditional saturation mutagenesis methods. [98] The same laboratory recently presented a second example [99,100] of how commercially available methods (in this instance from Lab-Genius) can aid in the generation of smart libraries. Their method relies on previously developed techniques, in particular DNA assembly and USER (Uracil-Specific Excision Reagent) cloning, and allows more accurate synthesis of long double-stranded polynucleotides. [100,101] While the use of synthetic libraries is no substitute for optimal library design, the steadily decreasing cost of gene synthesis makes it a viable complement to smart library design strategies. Other companies offering library generation services include ATUM, Codexis, Creative Biogene, Creative Biostructure and ThermoFisher.

Table 2 Selected in vivo and ex vivo homology-based recombination methods
Method | Short description | Organism
SLiP (SLiCE-mediated PCR-based site-directed mutagenesis) [91,92] | Complementary to SOE: the target gene is amplified in two separate PCR reactions, generating two gene fragments containing the desired mutation(s) and short (15-20 bp) regions overlapping each other and the receiving vector. The linearized vector and the mutated fragments are assembled ex vivo using SLiCE extracts. | E. coli (cell extract)
IVA cloning (in vivo assembly) [95] | Fragments containing short (15-20 bp) 5′ and 3′ overlaps are directly transformed for in vivo assembly. Used for assembly, mutagenesis, insertions and deletions. | E. coli
IVOE (in vivo overlap extension) [97] | Complementary to SOE: the target gene is amplified in two separate PCR reactions, generating two gene fragments containing the desired mutation(s) and 40-66 bp 5′ and 3′ overlaps with each other and the linearized receiving vector. These are subsequently transformed for in vivo assembly, regenerating the full-length gene and subcloning it into the vector. | S. cerevisiae
REPLACR mutagenesis (Recombineering of Ends of linearised PLAsmids after PCR) [96] | The vector is linearized by inverse PCR with primers designed to generate short (∼17 bp) overlaps at the 5′ and 3′ ends. The linear PCR product is directly transformed for assembly in vivo. The method can be used for site-directed mutagenesis, insertions and deletions. | Bacteria expressing Red/ET with recA recombineering activity

SECTION II: OUR SELECTION OF THE MOST CREATIVE OUT-OF-THE-BOX STRATEGIES FOR LIBRARY CREATION
Below, we review a selection of recent (2016 to 2021), innovative methods that either rethink and improve established methods or propose completely new solutions for "smarter" library generation (see also Table 4). The novelty might come in the form of a simpler way to target specific segments of the gene for mutagenesis or of a strategy to explore sequence space from a different perspective. We are confident that, if you are dealing with a complex library-generation problem, you will find a solution here or be inspired by one of these methods.

A. ASSEMBLY-BASED STRATEGIES
As mentioned in Section I-B, DNA assembly methods can generally be used as complementary tools to generate targeted libraries. Some of the methods highlighted in this section belong to this category.

Golden Gate-based approaches: treating enzymes as "parts"
We recently developed a strategy that provides flexibility in blending mutated and non-mutated, user-specified segments of a protein of interest. [26,102] This approach is based on Golden Gate assembly, which uses Type IIS restriction enzymes that cut outside their recognition site, allowing seamless assembly of fragments. [110] Among its strengths (see details in Table 4), this method is quick and low-cost, and it allows the introduction of highly mutated gene segments (or "parts") into the wild-type background. It also lends itself to automation and allows the exploration of epistatic effects.
The main benefit of this approach is that the mutagenic treatment can be tailored to separate regions within a gene. In our case, a large, putative substrate-binding tunnel was well suited to random mutagenesis, whereas individual residues in other regions were targets for saturation mutagenesis. [102] We addressed this problem by applying the concept of "parts", often used in synthetic biology to assemble gene pathways. The gene was ordered in three parts, each designed to include 5′ and 3′ ends for seamless and easy (ideally one-pot) reconstitution of the whole gene. [110] Each part was subjected to the appropriate mutagenesis strategy, and the mutated and non-mutated parts were reassembled at will to generate a series of gene libraries with some or all parts mutated. Although we demonstrated the strategy with a gene segmented into three parts, we believe it could be applied to at least 25 parts. [111] Because the user selects the appropriate mutagenesis technology for each part, there is no restriction on which options and residues are chosen.
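Splitting a gene into Golden Gate "parts" mainly requires choosing compatible 4-nt fusion overhangs and avoiding internal recognition sites. The sketch below is a hypothetical planning aid, not the published workflow: it assumes BsaI as the Type IIS enzyme (recognition site GGTCTC, leaving 4-nt overhangs), flags duplicate, palindromic or mutually complementary overhangs, and assembles digested parts in silico so that mutated and wild-type versions of a part can be swapped. All sequences are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BSAI_SITES = ("GGTCTC", "GAGACC")  # BsaI site on either strand (assumed enzyme)

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def overhang_problems(overhangs: list[str]) -> list[str]:
    """Flag overhang sets that could cause mis-ligation."""
    problems = []
    if len(set(overhangs)) != len(overhangs):
        problems.append("duplicate overhang")
    for oh in overhangs:
        if oh == revcomp(oh):
            problems.append(f"{oh} is palindromic")
    for i, a in enumerate(overhangs):
        for b in overhangs[i + 1:]:
            if a == revcomp(b):
                problems.append(f"{a} pairs with {b}")
    return problems

def assemble_parts(parts: list[tuple[str, str, str]]) -> str:
    """Join digested parts given as (left overhang, body, right overhang), in order."""
    product = parts[0][0]
    for i, (left, body, right) in enumerate(parts):
        if i > 0 and left != parts[i - 1][2]:
            raise ValueError(f"part {i} does not match the previous overhang")
        if any(site in body for site in BSAI_SITES):
            raise ValueError(f"part {i} contains an internal BsaI site")
        product += body + right
    return product

overhangs = ["CCAT", "AGCA", "TTCG", "GCTT"]  # hypothetical fusion sites
print(overhang_problems(overhangs), overhang_problems(["GATC", "AGCA", "TGCT"]))

wild_type = [("CCAT", "GGAAACTG", "AGCA"), ("AGCA", "TTTCCC", "TTCG"), ("TTCG", "AAAGGG", "GCTT")]
mutant_part2 = ("AGCA", "TTTGCC", "TTCG")  # hypothetical mutated version of part 2
print(assemble_parts(wild_type))
print(assemble_parts([wild_type[0], mutant_part2, wild_type[2]]))
```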
Pullmann et al. followed up with a similar approach to generate multisite saturation mutagenesis libraries, additionally offering a software package to facilitate primer design (https://msbi.ipb-halle.de/GoldenMutagenesisWeb/) and a colorimetric control screen for restriction efficiency. [103]
Table 3 Selected commercial mutagenesis and assembly kits

Simultaneous introduction of multiple mutations in distal parts of the protein: Darwin Assembly, RECODE
Darwin Assembly, a method developed by Cozens and Pinheiro, is noteworthy as a relatively simple, fast, and low-cost solution for introducing multiple mutations at distal sites of a protein. [104] Among its strengths (Table 4), it allows flexibility in the evolutionary pathway that the researcher uncovers: non-contiguous mutations are introduced simultaneously rather than sequentially, so the evolutionary pathway does not depend on the very first mutation introduced, as it does in other methods. The method lends itself to automation, for instance by using the Antha software of Synthace. [112] Darwin Assembly enables the parallel introduction of more than 10 distal mutations at once and the generation of high-quality libraries.

The RECODE (Rapidly Efficient Combinatorial Oligonucleotides for Directed Evolution) method [89] can similarly be used to introduce single mutations at distal positions, insertions, deletions, or multiple cassettes. It is similar in nature to Darwin Assembly in that several mutagenic primers are annealed at once to the template DNA, albeit under different conditions. As in Darwin Assembly, the authors of RECODE also use ad hoc "anchor" primers. These are designed and standardized to allow facile reassembly of the library into the vector, but, unlike in Darwin Assembly, they are not biotinylated.
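Because Darwin Assembly and RECODE anneal all mutagenic oligonucleotides in the same reaction, each variant can carry any subset of the targeted mutations. The sketch below illustrates this with a deliberately simple binomial model, assuming (hypothetically) that every oligo is incorporated independently with the same probability p; neither method's actual incorporation efficiencies are represented here.

```python
"""Toy model of simultaneous oligo incorporation (illustrative sketch)."""
from math import comb


def mutation_count_distribution(k: int, p: float) -> list[float]:
    """P(variant carries m of the k targeted mutations), for m = 0..k.

    ASSUMPTION: every mutagenic oligo is incorporated independently with the
    same probability p (binomial model). Real efficiencies vary per oligo.
    """
    return [comb(k, m) * p**m * (1 - p) ** (k - m) for m in range(k + 1)]


k, p = 10, 0.5  # hypothetical: 10 distal sites, 50% incorporation per oligo
dist = mutation_count_distribution(k, p)
mean = sum(m * q for m, q in enumerate(dist))
print(f"no mutation (parental): {dist[0]:.4f}")
print(f"all {k} mutations:       {dist[k]:.4f}")
print(f"mean mutations/variant: {mean:.1f}")  # 5.0, i.e. k * p
```

Under this model, lowering p raises the fraction of parental (unmutated) variants as (1 - p)^k, which is one way to see why purification away from the template matters.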
The RECODE method is straightforward: the primers are annealed to the denatured template DNA, the gaps between them are filled by a high-fidelity DNA polymerase and the nicks are sealed by a DNA ligase, and double-stranded DNA is finally regenerated. The ultimate goal is to obtain combinatorial mutations at multiple sites using one- or two-step PCRs. The resulting linear DNA can then be cloned into a vector (the authors used Gibson assembly). [105] A drawback of RECODE compared with Darwin Assembly might be higher contamination with wild-type template, since biotin-based purification is not used.
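The annealing, gap-filling and ligation sequence can be mimicked at the string level. In this Python sketch, primers are hypothetical substitution primers given as (start position, sequence) on a toy open reading frame; the polymerase is assumed to copy the template faithfully between primers and the ligase to seal every nick, so this illustrates the logic of the assembly rather than the RECODE reaction conditions.

```python
"""String-level sketch of multi-primer annealing, gap filling and ligation."""


def assemble_on_template(template: str, primers: list[tuple[int, str]]) -> str:
    """Return the product of annealing substitution primers to `template`.

    Simplifying assumptions (not taken from the RECODE protocol):
    - each primer is (0-based start, sequence) in template coordinates and
      has the same length as the region it anneals to (no InDels);
    - the polymerase fills every gap by faithfully copying the template;
    - the ligase seals every nick, giving one continuous strand.
    The product is written in the template's orientation for readability.
    """
    pieces, cursor = [], 0
    for start, seq in sorted(primers):
        if start < cursor:
            raise ValueError(f"primer at {start} overlaps the previous primer")
        if start + len(seq) > len(template):
            raise ValueError(f"primer at {start} runs past the template end")
        pieces.append(template[cursor:start])  # gap filled by the polymerase
        pieces.append(seq)                     # primer carries the mutation(s)
        cursor = start + len(seq)
    pieces.append(template[cursor:])           # fill-in to the template end
    return "".join(pieces)


template = "ATGAAACCCGGGTTTTAA"           # hypothetical ORF: M K P G F stop
primers = [(2, "GGCACC"), (10, "GGTGGT")]  # AAA->GCA (K2A), TTT->TGG (F5W)
print(assemble_on_template(template, primers))  # ATGGCACCCGGGTGGTAA
```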

URMAC for very large targets and GC-rich targets
URMAC (UnRestricted Mutagenesis and Cloning) [106] offers several advantages: it is well suited for large plasmids, having been validated on targets of up to 17 kb, and it can be used for deletions, insertions and substitutions. URMAC allows the introduction of simultaneous mutations and is applicable to GC-rich templates, which are often troublesome. These advantages all result from not having to copy the whole plasmid, as most commonly used methods do, which also means working with smaller constructs. The primers used are phosphorylated, which allows them to participate in the ligation steps. The main shortcoming of this method, compared for instance with Golden Gate-derived methods (Section II A-1), is the requirement for unique restriction sites, which can become a limitation for plasmids larger than 30 kb. In such cases, the authors suggest considering further recombination methodologies to transfer the mutated constructs back into the original plasmid. [106]

Nicking mutagenesis for robustness and reliability in the generation of saturation mutagenesis libraries
"Nicking mutagenesis" [107] is a robust and reliable one-pot method that can be used to generate single- or multi-site saturation mutagenesis libraries. It can be used on any plasmid, provided that it contains a BbvCI restriction site.
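Both URMAC (unique restriction sites) and nicking mutagenesis (a BbvCI site) place requirements on the plasmid sequence, and checking them is a simple string search. The sketch below scans both strands of a circular plasmid for exact recognition sequences (EcoRI GAATTC, BamHI GGATCC, BbvCI CCTCAGC) and reports which are unique. The 34-bp "plasmid" is a hypothetical toy; methylation sensitivity and degenerate recognition sequences are not handled.

```python
"""Find restriction sites on both strands of a circular plasmid (sketch)."""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def site_positions(plasmid: str, site: str) -> set[int]:
    """0-based top-strand start positions where `site` occurs on either strand.

    The plasmid is treated as circular, so sites spanning the origin are found.
    A bottom-strand hit is reported at the start of its reverse complement on
    the top strand. Exact (non-degenerate) recognition sequences only.
    """
    plasmid, site = plasmid.upper(), site.upper()
    wrapped = plasmid + plasmid[: len(site) - 1]  # wrap around the origin
    hits = set()
    for motif in {site, reverse_complement(site)}:  # one motif if palindromic
        for i in range(len(plasmid)):
            if wrapped[i : i + len(site)] == motif:
                hits.add(i)
    return hits


SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "BbvCI": "CCTCAGC"}
toy_plasmid = "CCTCAGCAAAGAATTCTTTGGATCCAAAGAATTC"  # hypothetical, 34 bp

for enzyme, site in SITES.items():
    hits = sorted(site_positions(toy_plasmid, site))
    status = "unique" if len(hits) == 1 else f"{len(hits)} sites"
    print(f"{enzyme:<6} {status:<8} {hits}")
```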
First, a plasmid containing the gene of interest is nicked on one strand with the nicking endonuclease Nt.BbvCI, and that strand is then digested with exonuclease III. A mutagenic strand is then synthesized to replace the digested strand; this step uses low concentrations of mutagenic primers and thermal cycling, followed by a reaction with Phusion polymerase. The new, nicked strand is sealed with Taq DNA ligase, yielding a heteroduplex DNA. The non-mutated strand is then nicked with a second nicking endonuclease (Nb.BbvCI) and digested with exonuclease III, after which the complementary strand is synthesized using a secondary primer, and DpnI treatment eliminates any wild-type plasmid.
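The order of strand operations is the part of this protocol most easily lost in prose. The following sketch tracks the two plasmid strands (labelled A and B here, a hypothetical naming) through the steps described above, and refuses to synthesize a strand unless the opposite strand is present as a template. It is a bookkeeping model only; the final DpnI digestion of any remaining parental plasmid is not represented.

```python
"""Strand-level walk-through of the nicking mutagenesis steps (sketch)."""
from typing import Optional

# (step label, strand acted on, action, resulting strand content)
# "degrade"    = nick at the BbvCI site + exonuclease III digestion of that strand;
# "synthesize" = primer-driven synthesis (polymerase + ligase) of that strand,
#                copying the opposite strand as template.
PROTOCOL = [
    ("nick strand A + ExoIII", "A", "degrade", None),
    ("mutagenic primers, extension, ligation", "A", "synthesize", "mut"),
    ("nick strand B + ExoIII", "B", "degrade", None),
    ("secondary primer, extension, ligation", "B", "synthesize", "mut"),
]


def run(protocol) -> list[tuple[str, Optional[str], Optional[str]]]:
    """Apply each step and record (label, strand A, strand B)."""
    strands: dict[str, Optional[str]] = {"A": "wt", "B": "wt"}
    trace = [("parental plasmid", strands["A"], strands["B"])]
    for label, strand, action, content in protocol:
        other = "B" if strand == "A" else "A"
        if action == "degrade":
            strands[strand] = None
        elif action == "synthesize":
            # Synthesis needs the opposite strand as template and an empty slot.
            if strands[other] is None or strands[strand] is not None:
                raise RuntimeError(f"cannot synthesize strand {strand}: {label}")
            strands[strand] = content
        else:
            raise ValueError(f"unknown action {action!r}")
        trace.append((label, strands["A"], strands["B"]))
    return trace


for label, a, b in run(PROTOCOL):
    print(f"{label:<40} A={a!s:<5} B={b}")
```

The third row of the trace (A mutated, B wild-type) is the heteroduplex intermediate; only after the second nicking and digestion does the mutation end up on both strands.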
The authors generated single-site saturation mutagenesis libraries covering 71-codon stretches, using a mixture of 71 oligonucleotide sets, each carrying NNN degeneracy at one of the 71 codons. The libraries were analyzed by deep sequencing.
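The NNN scheme used here encodes all 64 codons, three of which are stop codons. Many saturation mutagenesis studies instead use reduced schemes such as NNK or NNS (K = G/T, S = C/G), which are not part of the cited work but make a useful comparison. The sketch below counts codons, amino acids and stop codons for each scheme using the standard genetic code.

```python
"""Count codons, amino acids and stops encoded by a degenerate codon (sketch)."""
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide codes used by common saturation-mutagenesis schemes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "S": "CG"}


def degenerate_codon_stats(code: str) -> tuple[int, int, int]:
    """Return (codons, distinct amino acids, stop codons) for e.g. 'NNK'."""
    codons = ["".join(c) for c in product(*(IUPAC[x] for x in code))]
    residues = [CODON_TABLE[c] for c in codons]
    n_stop = residues.count("*")
    n_aa = len(set(residues) - {"*"})
    return len(codons), n_aa, n_stop


for scheme in ("NNN", "NNK", "NNS"):
    print(scheme, degenerate_codon_stats(scheme))
```

With the standard code, NNN gives 64 codons, 20 amino acids and 3 stops, whereas NNK and NNS each give 32 codons, 20 amino acids and a single stop (TAG), halving the number of codon variants to sample per position.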
The method was also validated on larger constructs, and protocols were designed for both single- and multi-site mutagenesis. [107] The report is remarkable for the methods proposed and for its rigor: the method was corroborated by an external research group to evaluate its robustness and reproducibility. The cost and time required to apply the method were estimated at $455 and 1 day, respectively.
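Library size grows exponentially with the number of simultaneously randomized sites, which determines how many transformants must be screened or sequenced. A common estimate, assuming every variant is equally likely, is that T = -V ln(1 - F) transformants are needed to observe an expected fraction F of V variants; for F = 0.95 this is roughly 3V. The sketch below applies this formula to codon-level library sizes; real libraries are rarely uniform, which raises the number required.

```python
"""Transformants to screen for a target library coverage (sketch)."""
import math


def transformants_needed(n_variants: int, coverage: float = 0.95) -> int:
    """Screening effort T giving an expected fraction `coverage` of n_variants,
    ASSUMING all variants are equally likely: T = -V * ln(1 - F)."""
    if not 0 < coverage < 1:
        raise ValueError("coverage must be strictly between 0 and 1")
    return math.ceil(-n_variants * math.log(1 - coverage))


# Codon-level library sizes for k simultaneously randomized sites.
for scheme, codons_per_site in (("NNN", 64), ("NNK", 32)):
    for k in (1, 2, 3):
        v = codons_per_site ** k
        print(f"{scheme} x {k} site(s): {v:>6} codon variants, "
              f"~{transformants_needed(v):>7} transformants for 95% coverage")
```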

CRISPR-Cas9-based methods (ICM): for higher specificity and lower presence of wild-type construct or incorrect sequences
Whereas the CRISPR-Cas9 system is now widely used for editing genomes in vivo, little has been reported on its use in vitro. To the best of our knowledge, the paper by She et al. [108] is the first to report an in vitro CRISPR-Cas9-based mutagenesis method (ICM). The authors directly compared this method with traditional PCR-based saturation mutagenesis approaches, demonstrating its superiority in terms of the number of codons identified and the presence of wild-type constructs or incorrect sequences. [108] Further advantages relative to PCR-based methods include flexibility, as it can be applied to almost any scaffold, and a greatly lowered error rate (Table 4).
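The flexibility noted for ICM comes from the fact that SpCas9 needs only an NGG protospacer-adjacent motif (PAM) next to its target. The sketch below finds NGG motifs on both strands of a sequence and estimates their average spacing. The random 10-kb sequence is a stand-in for a real gene; for a uniformly random sequence one expects an NGG on each strand with probability 1/16 per position, that is, roughly one PAM every 8 bp.

```python
"""Locate SpCas9 PAMs (5'-NGG-3') on both strands of a sequence (sketch)."""
import random
import re


def pam_sites(seq: str) -> list[tuple[int, str]]:
    """Return sorted (top-strand position, strand) pairs for every PAM.

    '+': NGG on the top strand, position of the N.
    '-': CCN on the top strand (NGG on the bottom strand), position of the first C.
    Zero-width lookaheads let overlapping PAMs (e.g. in GGGG) all be found.
    Linear sequence only; ambiguous bases are ignored.
    """
    seq = seq.upper()
    top = [(m.start(), "+") for m in re.finditer(r"(?=[ACGT]GG)", seq)]
    bottom = [(m.start(), "-") for m in re.finditer(r"(?=CC[ACGT])", seq)]
    return sorted(top + bottom)


def mean_spacing(positions: list[int]) -> float:
    """Mean distance between consecutive sorted positions (NaN if fewer than 2)."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps) if gaps else float("nan")


random.seed(1)  # random stand-in for a real gene; substitute your own sequence
demo = "".join(random.choice("ACGT") for _ in range(10_000))
sites = pam_sites(demo)
print(len(sites), "PAMs; mean spacing",
      round(mean_spacing([p for p, _ in sites]), 1), "bp")
```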

TRIAD and InDel-assembly: for a smoother introduction of insertions and deletions
While Nature often uses DNA insertions and deletions as an evolutionary tool, their use in the laboratory for enzyme evolution is not yet widespread. Although methods for the random combinatorial incorporation of InDels (insertions or deletions) are available, they are limited with respect to the quality and diversity of the generated library. When these modifications are instead made at targeted sites, [62,78,58,95,96,106] the exploration of sequence space is limited.
TABLE 4 Strengths and limitations of recent methods (Section II) to build complex libraries

Golden Gate-based approach 1 (2017) [102]
Strengths: (i) Upon receipt of the synthetic DNA constructs, the one-pot version allows library generation in less than a day using readily available commercial reagents; (ii) combinatorial; (iii) can be carried out one-pot for simple assemblies; (iv) allows the introduction of highly mutated portions of the gene into a wild-type background; (v) automatable; (vi) allows the exploration of epistatic effects.
Limitations: (i) Conceptually complex for the beginner; the design of the genetic parts can be optimized with the supplier when ordering genes or parts to assemble.

Golden Gate-based approach 2 (2019) [103]
Strengths: In addition to the points made above (Golden Gate-based approach 1), an online tool is available to facilitate primer design.
Limitations: Same as above (Golden Gate-based approach 1).

Darwin Assembly (2018) [104]
Strengths: (i) From plasmid to DNA library within a day using readily available commercial reagents; (ii) can be carried out one-pot; (iii) can target multiple distal positions in a gene; (iv) reduced wild-type contamination; (v) automatable; (vi) allows the exploration of epistatic effects.
Limitations: (i) Conceptually complex for the beginner; (ii) in the first version of the method, biotinylated primers need to be used.

RECODE (2016) [89]
Strengths: (i) Can be performed in under a day; (ii) can be used to introduce single mutations, insertions, deletions, and multiple cassettes; (iii) involves a simple one-step (or two-step) PCR; (iv) reduced nonspecific mutations and incorrect ligations thanks to the use of a high-fidelity DNA polymerase and ligase.
Limitations: (i) Uses phosphorylated primers; (ii) needs an extra assembly step, for instance Gibson assembly [105].

URMAC (2017) [106]
Strengths: (i) Requires neither sub-cloning nor copying the entire plasmid, unlike other common methods; (ii) can be used for deletions, insertions and substitutions; (iii) well suited for large plasmids (up to 17 kb); (iv) allows the introduction of simultaneous mutations; (v) applicable to templates with high GC content.
Limitations: Requires the presence of unique restriction sites, which can be hard to find in constructs of > 30 kb.

Nicking mutagenesis (2016) [107]
Strengths: (i) Can be performed in less than a day; (ii) can be applied to any plasmid provided that it contains a BbvCI restriction site; (iii) simple, as it can be carried out one-pot; (iv) reliable and robust.
Limitations: (i) Complexity of designs increases when multiple BbvCI nicking sites are present.

CRISPR-Cas9-based method (ICM) (2018) [108]
Strengths: (i) Can be performed in under a day; (ii) flexible, as the cleavage site does not depend on a particular restriction site (the SpCas9 PAM site occurs on average every 8 to 12 bp); (iii) Cas9 is specific, with no mismatches; (iv) high quality of the generated libraries, as the method is PCR-independent and so avoids introducing biases; (v) efficient for plasmids up to 9 kb.
Limitations: Requires obtaining or expressing the Cas9 protein.

Transposon-based InDels method (TRIAD) (2020) [6]
Strengths: (i) Taps into solutions that might not be accessible using libraries of point mutants, through the efficient random incorporation of InDels; (ii) offers the possibility to create libraries focused on a specific region of a protein; (iii) compared with other insertion and deletion methods, it solves the problem of frame-shifting; (iv) does not require highly specialized equipment.
Limitations: (i) Conceptually complex for the beginner, although simpler than existing methods to introduce InDels; (ii) high-throughput screening may be necessary.

InDel-Assembly [109]
Strengths: (i) As for TRIAD, taps into solutions that might not be accessible using libraries of point mutants, because it allows the random incorporation of InDels in libraries that explore composition and sequence length variation; (ii) offers a cost-effective solution for high-quality DNA assembly; (iii) compared with other insertion and deletion methods, it solves the problem of frame-shifting; (iv) does not require highly specialized equipment.
Limitations: (i) Conceptually complex for the beginner; (ii) requires paramagnetic streptavidin-coated beads and biotinylated primers.

Even though the insertion of randomized InDels cannot strictly be classified as a focused mutation approach, we believe that it deserves a place in this review because of its uniqueness as an exploratory tool for protein evolution. [113] The problems most commonly encountered in the development of InDel methods are frame-shifting, [6,114-120] which causes a high frequency of non-functional variants, [6] and the need for highly specialized DNA synthesis equipment. [121] The use of engineered transposons for sampling InDels has also been explored, [121-124] and methods exist that tackle the frame-shift problem, but they are currently limited to generating InDels of fixed length and sequence, limited to generating deletions, or technically difficult. [6,122-125] Among the newest methods, two are of relevance in that they (i) solve the frame-shifting problem, (ii) allow the introduction of randomized InDels, and (iii) do not require highly specialized equipment.
The TRIAD method (Transposition-based Random Insertion And Deletion mutagenesis) [6] overcomes these drawbacks. TRIAD creates libraries characterized by the random insertion of short, in-frame InDels and can be used to introduce single InDels (up to three nucleotide triplets, or 9 bp, long), generating either between-codon or cross-codon mutations without frame-shifting.
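As we read the TRIAD description, an in-frame deletion starting at a codon boundary removes whole codons (between-codon), whereas one starting inside a codon fuses the two flanking codons into a new one (cross-codon). The sketch below enumerates deletions of 3, 6 or 9 bp in a hypothetical toy open reading frame and classifies them this way; it models only the sequence outcome, not the transposon chemistry, and omits insertions, which would additionally require choosing the inserted bases.

```python
"""Enumerate in-frame deletions and classify them as between- or cross-codon (sketch)."""
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate the complete codons of an uppercase coding sequence."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def in_frame_deletions(cds: str, max_triplets: int = 3):
    """Yield (start, length, kind, protein) for deletions of 3, 6, ... bp.

    kind is 'between-codon' when the deletion starts at a codon boundary
    (whole codons removed) and 'cross-codon' otherwise (flanking codons fused).
    Because every length is a multiple of 3, the reading frame is preserved.
    """
    for length in range(3, 3 * max_triplets + 1, 3):
        for start in range(len(cds) - length + 1):
            kind = "between-codon" if start % 3 == 0 else "cross-codon"
            mutant = cds[:start] + cds[start + length:]
            yield start, length, kind, translate(mutant)


cds = "ATGAAACCCGGGTTTTAA"  # hypothetical toy ORF
print(translate(cds))  # MKPGF*
for start, length, kind, protein in in_frame_deletions(cds, max_triplets=1):
    if start in (3, 4):
        print(start, length, kind, protein)
# 3 3 between-codon MPGF*   (codon AAA removed)
# 4 3 cross-codon MTGF*     (AAA and CCC fused into ACC)
```

A deletion whose length is not a multiple of 3 would instead shift every downstream codon, which is the frame-shifting problem that TRIAD avoids.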
The InDel-assembly [109] is a novel DNA assembly platform that introduces InDels of varying length and composition through the use of standardized dsDNA building blocks. InDel-assembly works through cycles of restriction and ligation of the building blocks on paramagnetic beads. In contrast to TRIAD, this method was applied to create a targeted library: for instance, it has been used to engineer protein loops or linkers in a β-lactamase. The authors also describe an ad hoc strategy to analyze the generated libraries. [109]

CONCLUSIONS
Identifying a suitable method for the creation of a library for enzyme engineering is not as straightforward as it may seem, in particular for novices. As the field of enzyme engineering expands, new approaches addressing where to mutate or how to mutate genetic sequences are increasingly sought. The newest and most promising methods for "smart" library creation reviewed in Section II illustrate the creativity with which these challenges are addressed. The "smarter" the method used for generating the library of variants, the more efficient the exploration of theoretical protein sequence space, and the lower the screening throughput needed to uncover improved variants.
Even the most innovative of advances will never enable exploring all of sequence space. Nonetheless, the advent of methods that facilitate generation of libraries with improved quality will make us better at navigating sequence diversity.
Looking to the future necessarily brings computer-aided strategies into play, to realize the goal of "generating small(est) and high(est) quality mutant libraries". [126] The main barrier to using predictive or computational methods is likely the reluctance of chemists and biologists when it comes to "getting acquainted with computational techniques". [126] Happily, many computational tools for enzyme engineering no longer require the user to be a computational expert. [127] In the not-too-distant future, machine learning will provide a "third approach" alongside rational design and directed evolution, to be used in complementary and ever-more-powerful ways. [128] These approaches may help to better target the mutational hot spots in a protein; together with the selection of a smart method for library construction, they therefore have the potential to further reduce the number of variants that need to be screened before an improved enzyme can be identified.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were generated or analyzed in this review.