Design and Construction of a Designed Ankyrin Repeat Protein (DARPin) Display Library

Protein display systems are powerful techniques used to identify protein molecules that bind with high affinity to target proteins of interest. The initial challenge in implementing a display system is the construction of a high‐diversity naïve library. Here, we describe the methods to generate a designed ankyrin repeat protein (DARPin) display library using degenerate oligonucleotides. Specifically described is the construction of a single DARPin repeat module by overlap extension PCR, concatenation of the module by restriction enzyme digestion and ligation, and incorporation of the concatenated modules into a full‐length DARPin sequence in a bacterial cloning or display vector containing the hydrophilic N‐ and C‐terminal capping domains. Protocols for PCR amplification of DARPin sequences to estimate diversity of naïve and enriched libraries via next‐generation sequencing are included, as is a simple Linux‐based program for analysis of naïve and enriched sequences. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC.


INTRODUCTION
Isolation of antibodies and similar binding molecules, e.g., nanobodies, from immunized animals is a lengthy and intensive process (Harmansa & Affolter, 2018). Protein display techniques were developed as an alternative to simplify the process and to allow the identification of novel designed protein binders that are not naturally occurring [e.g., single-chain variable fragments (scFv), synthetic protein binders]. In vitro (mRNA and ribosome display) and in vivo (phage, bacteria, and yeast) systems are capable of displaying a broad range of molecules, from short peptides (Rice & Daugherty, 2008; Wu et al., 2016) to small globular proteins (Chao et al., 2006; McMahon et al., 2018). Cell-based systems, using bacteria, phage, or yeast, secrete and anchor the protein binder to the cell or phage surface, where it can interact with exogenous target proteins; cells displaying target-specific binders are then selected and enriched using a variety of methods, including biopanning and magnetic and/or fluorescence-based cell sorting (Chao et al., 2006; see Current Protocols article: Kenrick et al., 2007). Iterative rounds of selection of increasing stringency are used to isolate protein binders with high specificity and affinity for the target protein (Chao et al., 2006). Binders, in whatever form, have a wide range of applications, including as molecular tools used in basic research as well as therapeutic molecules for the treatment of disease.
Modern recombinant molecular biology techniques have simplified the creation of highly diverse libraries that reproduce naturally occurring binder repertoires (see Current Protocols article: Schladetsch & Wiemer, 2021) or that expand on protein folds used for cellular protein-protein interactions. An example of the latter is ankyrin repeat proteins, which are naturally occurring proteins composed of repeating 33-amino-acid modules, each consisting of a β-turn followed by two anti-parallel α-helices (Fig. 1). The average ankyrin repeat protein has 4 to 6 modules, although proteins with up to 33 modules exist (Kumar & Balbach, 2021). The repeats generate a concave binding surface, with 6 to 8 amino acid residues of a module contributing to substrate binding (Kumar & Balbach, 2021). Binz et al. recognized the utility of the ankyrin repeat protein scaffold and developed a designed ankyrin repeat protein (DARPin) system, which has two or three repeat modules sandwiched between hydrophilic N- and C-terminal caps. Using ribosome display, they confirmed that DARPins that bind target proteins with high affinity could be selected from their synthetic DARPin library (Binz et al., 2004; Binz
We have designed a strategy that uses basic molecular biology procedures to generate a highly diverse naïve DARPin library (≥10⁸ unique sequences) using oligonucleotides incorporating degenerate nucleotide bases. The procedure is straightforward and inexpensive and incorporates a specific subset of amino acids into variable positions while avoiding stop codons, chemically reactive amino acids, and overrepresentation of any particular amino acid. Oligonucleotide design and considerations for amino acid diversity at variable positions are discussed. Basic Protocol 1 describes the use of overlap extension PCR (OE-PCR) and oligonucleotide pools to generate a short DNA fragment comprising a single DARPin repeat. Basic Protocol 2 details the concatenation of these repeat fragments by digestion with type IIs restriction enzymes followed by ligation. Basic Protocol 3 describes the PCR amplification of the concatenated DARPin repeats and their ligation into cloning or display vectors containing the sequence-invariant N- and C-terminal caps to create a diverse library of DNA sequences encoding a complete DARPin molecule. Basic Protocol 4 describes sample preparation of PCR amplicons from naïve and enriched libraries for next-generation sequencing (NGS). Basic Protocol 5 details a method to estimate library size and diversity using the NGS data generated by Basic Protocol 4.

STRATEGIC PLANNING
DARPin display was developed using in vitro display technologies (Binz et al., 2004), although expression of DARPins on the surface of bacteria (Zadravec et al., 2015), yeast (Mohan et al., 2019), and phage (Steiner et al., 2008) has been described. Initial planning should include testing whether the system of choice can efficiently display DARPins and/or whether modifications to the system are required for efficient display, e.g., modification of tether length or anchoring strategy. We have used a DARPin that binds maltose-binding protein (MBP) to test DARPin surface exposure, as MBP is highly soluble, easily expressed and purified, and amenable to labeling; this particular anti-MBP DARPin has not been published, but the MBP-DARPin crystal structure has been deposited into the Protein Data Bank (PDB ID 5FIN). As the ligation of the variable region of the library into the display vector is a resource- and time-consuming endeavor, verifying that the system of choice can display DARPins is critical.
The second major consideration is deciding on the identity of the amino acids at each variable position and choosing the degenerate codons to encode the desired residues. In the described protocols, we have used a pool of ∼80 oligonucleotides (see Supporting Information, Table 1) and seven different degenerate codons (Fig. 2, Table 1) to generate a highly diverse library with a relatively balanced distribution of amino acids at each position (Fig. 1, Table 1). Less specific degenerate codons (e.g., NNB and NNK, each of which encodes all 20 amino acids and one stop codon) could be substituted to reduce the number of required oligonucleotides, at the expense of including a stop codon and incorporating potentially undesirable amino acids (e.g., cysteine, proline). Conversely, using a larger number of degenerate codons, each encoding fewer amino acids, can provide a more desirable distribution of a specific subset of amino acids at the expense of requiring many more oligonucleotide primers.
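The trade-off between codon degeneracy, stop codons, and amino acid coverage can be checked by enumerating the codons that a degenerate codon specifies. The following Python sketch is an illustration only and is not part of the protocol; the function names (expand_codon, summarize) are hypothetical, and the standard genetic code and IUPAC ambiguity codes are assumed.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes (DNA alphabet)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code with codons ordered T, C, A, G at each position; "*" = stop
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expand_codon(degenerate):
    """Return every unambiguous codon specified by a degenerate codon."""
    choices = (IUPAC[base] for base in degenerate.upper())
    return ["".join(codon) for codon in product(*choices)]


def summarize(degenerate):
    """Count how many codons encode each amino acid (or stop, '*')."""
    return Counter(CODON_TABLE[codon] for codon in expand_codon(degenerate))


if __name__ == "__main__":
    for codon in ("NNK", "NNB"):
        counts = summarize(codon)
        n_amino_acids = len([aa for aa in counts if aa != "*"])
        print(codon, sum(counts.values()), "codons,",
              n_amino_acids, "amino acids,", counts["*"], "stop codon(s)")
```

Inspecting the per-residue counts (e.g., summarize("NNK")) also shows the codon-usage skew: arginine, serine, and leucine are each specified by three NNK codons, whereas methionine and tryptophan are specified by one.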

NOTE:
Experiments involving PCR require extremely careful technique to prevent contamination.

of 23
Current Protocols

GENERATION OF A SINGLE DARPin REPEAT BY OVERLAP EXTENSION PCR
A pool of overlapping oligonucleotides containing degenerate codons (Table 1) is used to generate a 99-bp DNA segment that encodes a 33-amino-acid DARPin repeat module. The 99-bp DARPin sequence is flanked by type IIs restriction enzyme sites, allowing the repeat to be concatenated by restriction enzyme digestion and ligation. A single repeat or concatenated repeats can be PCR-amplified using oligonucleotides (see Supporting Information, Table 1) that are complementary to the 5′ and 3′ flanking sequences, which contain the type IIs restriction enzyme sites. In addition, M13 forward and reverse primer sequences (Table 2) are appended to the "external" forward and reverse primers.
Here, three oligonucleotide pools, corresponding to the 5′, middle, and 3′ ends of the DARPin repeat, in conjunction with the external 5′ and 3′ primers, are used to assemble a fully synthetic double-stranded nucleic acid fragment that encodes a single DARPin internal repeat module, referred to as 1XIR, with eight variable positions, each specifying, via degenerate codons (Table 1), 2 to 12 amino acids (Fig. 1). The repeat is flanked by restriction enzyme recognition sites and unique sequences allowing sequencing and/or PCR amplification of the DNA fragment (Fig. 2). The protocol as described uses ∼80 oligonucleotides (see Supporting Information, Table 1), although the choice of degenerate codons can dramatically increase or decrease the number of required oligonucleotides. If executed properly, only a limited number of PCR reactions with the oligonucleotide pools is required to generate the DARPin repeat, and thus, picomole-scale oligonucleotide synthesis in 96-well plates should provide ample starting material. We suggest testing multiple PCR reaction conditions to determine optimal parameters to generate the initial DARPin repeat. Specifically, we test multiple high-fidelity DNA polymerases, two or more annealing temperatures, and three oligonucleotide template concentrations.

of 23
Morselli et al.

Reaction setup
1. Maintain reagents (e.g., enzymes) on ice or in a cold block.
2. Add all PCR components (see Table 3), except the enzyme, to a sterile PCR tube according to the manufacturer's recommendations and mix well by inverting the tube several times. Centrifuge briefly to bring the liquid to the bottom of the tube, add the enzyme, and repeat the mixing and centrifugation steps.
3. Create three primer pools in separate sterile low-retention microcentrifuge tubes using degenerate oligonucleotides:
a. For Pool 1, containing equimolar amounts of the 64 oligonucleotides encompassing the 5′ end of the DARPin repeat, add equivalent volumes (e.g., 5 μl) of each oligonucleotide, supplied at 5 μM concentration, to a tube.
Oligonucleotide sequences are listed in Table 1 in the Supporting Information.
b. For Pool 2, corresponding to the middle of the DARPin repeat, add equivalent volumes of the 16 oligonucleotides comprising Pool 2 to a tube to create a 5 μM stock.
c. For Pool 3, in this DARPin design, add a single oligonucleotide at 5 μM concentration to a tube.
4. Mix the three primer pools at a 1:1:1 ratio to generate an oligonucleotide master pool containing all the oligonucleotides required to generate a single DARPin repeat. The external primers incorporate M13 sequencing primer sequences so that the product can be sequenced if necessary.
Typically, we prepare a 50-μl reaction volume for each polymerase and oligonucleotide pool concentration and split this volume in half to test two annealing temperatures in a gradient-capable thermal cycler. An example PCR setup using KOD HotStart DNA polymerase, along with cycling parameters, is provided in Table 3.
6. Evaluate the PCR reaction by electrophoresing a 5-μl sample of the PCR product on a 2% agarose gel, staining with nucleic acid gel stain, and imaging with a transilluminator.
An example of a test PCR reaction product is shown in Figure 3.
7. Scale up the PCR using conditions that generate a correctly sized PCR fragment. Purify the PCR product using a spin column kit, or if additional bands are present, purify the product by gel extraction using a gel extraction kit following the manufacturer's protocol. In either case, maximize yield by using two elutions per column: that is, after the first elution in elution buffer, add an additional volume of elution buffer to the column and incubate at room temperature for several minutes prior to centrifuging.
For gel extraction, we prefer the Promega Wizard® SV Gel and PCR Clean-Up System, as it uses less binding buffer to dissolve the gel slice, thus minimizing processing time.
Warming the elution buffer (10 mM Tris•HCl, pH 8, or sterile nuclease-free water) to 50° to 60°C may improve the yield.
8. Quantify the purified DNA fragment using a spectrophotometer or, preferably, a fluorescence-based quantification method. Verify the fragment quality using a 2% agarose gel and then store the DNA fragments frozen if not proceeding directly to restriction enzyme digestion (see Basic Protocol 2).
The output from this protocol is a DNA fragment comprising a single DARPin repeat flanked by type IIs restriction enzyme sites.
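The pooling scheme in steps 3 and 4 has a consequence that is easy to overlook: because each pool totals 5 μM regardless of how many oligonucleotides it contains, the concentration of an individual oligonucleotide in the master pool depends on the size of its pool. The short Python sketch below works through this arithmetic; the function name is hypothetical, and equal-volume mixing at every stage is assumed.

```python
STOCK_UM = 5.0  # concentration of each supplied oligonucleotide (µM), per step 3
POOL_SIZES = {"Pool 1": 64, "Pool 2": 16, "Pool 3": 1}  # oligos per pool, per step 3


def per_oligo_concentration(pool_sizes, stock_um=STOCK_UM):
    """Concentration (µM) of each individual oligo in the 1:1:1 master pool.

    Assumes equal volumes of every oligo within a pool and equal volumes of
    every pool in the master pool.
    """
    n_pools = len(pool_sizes)
    # Within a pool, each oligo is diluted by the number of oligos in that pool;
    # the 1:1:1 mix then dilutes every pool by the number of pools.
    return {name: stock_um / size / n_pools for name, size in pool_sizes.items()}


if __name__ == "__main__":
    print("total oligonucleotides:", sum(POOL_SIZES.values()))
    for name, conc in per_oligo_concentration(POOL_SIZES).items():
        print(f"{name}: {conc:.4f} µM per oligonucleotide")
```

Under these assumptions, the three pools are equimolar in total, but the single Pool 3 oligonucleotide is 64-fold more concentrated than any individual Pool 1 oligonucleotide.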

CONCATENATION OF DARPin REPEATS
This protocol describes the concatenation of DARPin repeat modules to generate DNA fragments containing two or more consecutive in-frame repeat modules that can be inserted between the hydrophilic N- and C-terminal caps to generate a DNA fragment that encodes a complete DARPin molecule. The purified DNA fragment generated by OE-PCR (Basic Protocol 1) is the starting material for this protocol. The starting material can be amplified using the M13.DARPin.For./M13.DARPin.Rev. primers to generate additional material, which may be required because purification of the ligation products from agarose gels is typically inefficient. Two different type IIs enzymes (BbsI and BsaI) will be used to generate compatible 5′ and 3′ overhangs, and ligation of the two differentially digested fragments will generate a DNA fragment containing two consecutive DARPin modules (Figs. 2 and 4). The two-repeat ("2XIR") module is then PCR-amplified to generate additional material that can be digested with a single type IIs enzyme (e.g., BsaI) and then ligated to a single "1XIR" module that has been digested with the compatible type IIs enzyme (e.g., BbsI) to generate a "3XIR" module containing three consecutive repeats (Fig. 5). Although most DARPin libraries employ two or three internal DARPin repeat modules, the digestion and ligation of DNA fragments could be repeated to generate any number of internal repeat modules. Large amounts of DNA fragments (∼10 μg) should be subjected to restriction enzyme digestion, as significant amounts of material are lost during the gel extraction and purification steps.

Reaction setup
1. Maintain reagents (e.g., enzymes) on ice or in a cold block.
2. Add reagents, except the enzyme, to a sterile PCR tube according to the manufacturer's recommendations and mix well by inverting the tube several times. Centrifuge briefly to bring the liquid to the bottom of the tube, add the enzyme, and repeat the mixing and centrifugation steps.

Restriction enzyme digestion
3. Set up two reactions with restriction enzyme digestion components (see Table 4), one with BbsI and the other with BsaI, to generate DNA fragments with compatible 5′ and 3′ ends. Using a thermal cycler, incubate the reactions for 1 to 6 hr at 37°C and then heat-inactivate the enzymes by incubation at 80°C for 20 min.
4. Electrophorese the entire reaction volume on a 2% agarose gel, excise the fragments with a razor blade, and subsequently column-purify using the Promega Wizard® SV Gel and PCR Clean-Up System.To maximize yield, use two elutions of 50 μl per column.
5. Quantify the purified DNA fragments using a spectrophotometer and verify that the restriction enzyme digest and subsequent purification step generated a single band using a 2% agarose gel.

Ligation of the DARPin repeats
6. Mix the BsaI- and BbsI-digested fragments at a 1:1 molar ratio and ligate using T4 DNA Ligase following the manufacturer's protocol. Ligate at 16°C overnight using a thermal cycler or temperature-controlled water bath.
Because ligation is incomplete (we typically see 75% of the material fully ligated) and material is lost during gel extraction, we typically ligate 2 to 5 μg DNA in 50- to 100-μl volumes using T4 DNA Ligase.
7. Electrophorese the entire ligation mixture on a 2% agarose gel and purify the ∼200-bp band corresponding to a successfully ligated 2× DARPin repeat (2XIR) by gel extraction.
The yield at this step is typically in the 25% to 50% range, which is sufficient quantity and diversity given that 10 ng of a 200-bp fragment is >4 × 10¹⁰ molecules.
9. Gel-extract the PCR-amplified 2XIR fragment, digest with a single type IIs enzyme (either BsaI or BbsI), and ligate to the purified single DARPin repeat that has been digested with the enzyme that generates a compatible overhang, as described in steps 6 to 7, to generate a 3XIR repeat.
10. As for the 2XIR repeat, purify the 3XIR fragment by gel extraction and PCR-amplify to generate sufficient material for ligation into a plasmid vector. Quantify the purified DNA fragment using a spectrophotometer, verify the fragment quality using a 2% agarose gel (Fig. 5), and store the fragments frozen at −20°C if not proceeding directly to Basic Protocol 3.
If the PCR product is of sufficient purity, it can be column-purified, although gel extraction may be necessary.
The output from this protocol is a DNA fragment containing two or more concatenated DARPin repeat modules with expected sizes of ∼250 bp (2XIR) and ∼350 bp (3XIR).
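The molecule count quoted for the gel-purified 2XIR fragment (10 ng of a 200-bp fragment is >4 × 10¹⁰ molecules) can be reproduced with a short calculation. The Python sketch below assumes an average mass of 650 g/mol per base pair of double-stranded DNA, which is a common approximation and not a value given in this protocol; the function names are hypothetical.

```python
AVOGADRO = 6.022e23      # molecules per mole
BP_MASS_G_PER_MOL = 650  # assumed average mass of one dsDNA base pair (g/mol)


def dsdna_molecules(mass_ng, length_bp):
    """Approximate number of dsDNA molecules in a given mass of a fragment."""
    moles = (mass_ng * 1e-9) / (length_bp * BP_MASS_G_PER_MOL)
    return moles * AVOGADRO


def equimolar_mass_ng(mass_ng, length_bp, partner_length_bp):
    """Mass of a partner fragment containing the same number of molecules.

    Useful for setting up 1:1 molar ligations of fragments of unequal length.
    """
    return mass_ng * partner_length_bp / length_bp


if __name__ == "__main__":
    print(f"{dsdna_molecules(10, 200):.2e} molecules in 10 ng of a 200-bp fragment")
```

With these assumptions, the estimate comfortably exceeds the ≥10⁸ unique sequences targeted for the naïve library, so mass recovery rather than molecule number is the practical constraint at this step.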

LIGATION OF INTERNAL REPEATS INTO CLONING/DISPLAY VECTOR CONTAINING N- AND C-TERMINAL CAPPING REPEATS
This protocol describes the large-scale ligation of the DARPin variable regions (Basic Protocol 2) into a vector containing the N- and C-terminal capping repeats. The vector will vary according to the chosen experimental system (bacteria, phage, yeast), although the strategy is essentially the same for the in vivo selection systems: a cassette containing the N- and C-terminal caps and a "stuffer" sequence containing the ccdB toxin gene, for negative selection of the parental template, is cloned in-frame with the 5′ and 3′ elements required for display (e.g., signal sequences, tethers, epitope tags for cell surface detection). A variety of display systems have been described for bacteria (Rice & Daugherty, 2008; Wendel et al., 2016), yeast (McMahon et al., 2018; Mohan et al., 2019), and phage (Pardon et al., 2014). As these systems have typically been used for display of nanobody or scFv fragments, optimization of the elements flanking the DARPin library may be necessary. Alternatively, the DARPin library can be assembled in a cloning vector for use in in vitro systems or for transfer into other systems that have issues with transformation efficiency. The sequence of the cassette that we use is available in the Supporting Information. Note that use of the ccdB negative selection element requires that plasmids containing the cassette be propagated in specific Escherichia coli strains (e.g., DB3.1). Alternatively, the cassette can be generated with a random DNA sequence inserted as the stuffer fragment.
Large-scale DNA ligation and subsequent electroporation are critical steps in generating large libraries and require optimization of ligation reactions and electroporation parameters. We have followed detailed ligation and electroporation protocols for the generation of bacterial peptide display (Getz et al., 2012) and scFv phage display (see Current Protocols article: Schladetsch & Wiemer, 2021) libraries to optimize parameters for our library, and we refer the reader to these excellent references.

Additional Materials (also see Basic Protocols 1 and 2)
PCR reagents (see
2. Add PCR reagents (see Table 5), except the enzyme, to a sterile PCR tube according to the manufacturer's recommendations and mix well by inverting the tube several times. Centrifuge briefly to bring the liquid to the bottom of the tube, add the enzyme, and repeat the mixing and centrifugation steps.

Vector preparation
3. Prepare the display vector in large quantities using a plasmid miniprep kit or by PCR amplification using the parameters in the footnote to Table 5, primers BsaI.Vec.For. and BsaI.Vec.Rev. (see Table 2), the cloning/display vector as a template, and KOD HotStart DNA Polymerase. If the vector is PCR-amplified, treat the PCR product with DpnI restriction enzyme and column-purify. Electrophorese the purified vector on a 1% agarose gel to confirm that only a single band is present (Fig. 5) and verify the sequence of the purified PCR fragment by whole-plasmid sequencing before proceeding to the next step.

Restriction enzyme digestion and analysis
4. Restriction-digest the 3XIR DARPin library (purified, concatenated 3XIR DARPin repeats generated in Basic Protocol 2) with BsaI and BbsI (see Table 6); digest the vector (see step 3) with BsaI alone, as the cassette is designed to generate compatible ends using a single enzyme (see Fig. 2). Simultaneously dephosphorylate the vector with rSAP to reduce background. Incubate the reactions in a thermal
Test ligations and the large-scale ligation to generate the final library require ∼7.5 μg digested and purified plasmid and ∼1.5 μg digested and purified DARPin insert, so an excess of material is digested, anticipating losses during purification.
5. Column-purify the 3XIR repeats. Gel-extract the miniprepped and digested vector to eliminate partially digested vector and the stuffer sequence.
6. Quantify the purified DNA fragments using a spectrophotometer and verify the fragment quality using a 1% agarose gel for the large DNA fragments and 2% agarose gel for small fragments (Fig. 5).
The DNA fragments are stored frozen at −20°C if not proceeding directly to the ligation (steps 7 to 9).
Test and large-scale ligations
7. After performing test ligations to determine optimal ligation ratios and minimize background (these have been described elsewhere and are not covered here; Getz et al., 2012; see Current Protocols article: Schladetsch & Wiemer, 2021), set up large-scale ligations as in Table 7 (which describes a 3:1 insert-to-vector ratio). Ligate at 16°C overnight using a thermal cycler or temperature-controlled water bath.
8. Column-purify the ligated library with the Zymo DNA Clean & Concentrator-25, using two elutions of 30 μl sterile nuclease-free water to maximize DNA yield. Further desalt the samples by drop dialysis for 2 hr using membrane filters floated on sterile water in a petri dish according to the manufacturer's instructions. Pipet the samples off the filters and then wash the filters with 25 μl water, add this water to the dialyzed sample, and quantify the DNA yield/recovery with a spectrophotometer.
9. Perform large-scale electroporation, estimation of electroporation efficiency, and preparation of freezer stocks and library maxipreps as previously described in Morselli et al.
The output from this protocol is a DARPin-encoding DNA library that can be displayed by the organism of interest.
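When planning test and large-scale ligations, the insert-to-vector ratio has to be converted into masses of DNA. The Python sketch below shows the conversion under stated assumptions: the 3:1 ratio is treated as a molar ratio, the 5000-bp vector length and 350-bp insert length in the example are hypothetical placeholders (the digested fragment lengths depend on the vector and cassette used), and the function name is not taken from any published tool.

```python
def insert_mass_ug(vector_mass_ug, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert (µg) giving the requested insert:vector molar ratio.

    For double-stranded DNA, moles scale with mass / length, so
    insert_mass = ratio * vector_mass * (insert_bp / vector_bp).
    """
    return molar_ratio * vector_mass_ug * insert_bp / vector_bp


if __name__ == "__main__":
    # Hypothetical example: 1 µg of a 5000-bp digested vector, 350-bp insert
    mass = insert_mass_ug(vector_mass_ug=1.0, vector_bp=5000, insert_bp=350)
    print(f"insert required: {mass:.2f} µg")
```

Substituting the measured lengths of the digested vector and insert gives the masses needed for each reaction, and the optimal ratio itself should still be determined empirically by test ligation.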

ESTIMATION OF LIBRARY SIZE AND DIVERSITY BY NEXT-GENERATION SEQUENCING (NGS)
An estimate of the library size can be calculated from the number of colony-forming units obtained after electroporation of the library into bacterial cells (Getz et al., 2012) (Basic Protocol 3). However, these measurements provide no information on library diversity, the correctness of the final assembled library, or the presence or absence of non-productive DNA species (e.g., cloning artifacts, frameshifts). The following protocol uses PCR amplification of the DARPin repeats from the plasmid maxiprep in Basic Protocol 3 to generate an NGS template. To avoid introduction of bias and other artifacts (see Current Protocols article: Podnar et al., 2014), the number of PCR cycles is limited to five, although an identical reaction of 25 to 30 PCR cycles is run in parallel to ensure that a DNA band of the correct size, as visualized by agarose gel electrophoresis, is generated by the PCR reaction.
The protocol will generate high-quality DNA fragments suitable for NGS analysis that will provide enough data to assess the library diversity.We suggest two potential instruments for NGS analysis, MiSeq and NovaSeq (both Illumina instruments), which are capable of the read lengths necessary for the full coverage of DARPin variable regions (∼300 bp); in most instances, researchers will access these instruments through core facilities or commercial entities, so instrument choice may differ.

Additional Materials (also see Basic Protocol 1)
PCR reagents (see Table 8)
SPRISelect beads (Beckman-Coulter, B23317)
80% (v/v) ethanol (from 200-proof stock; make fresh)
10 mM Tris•HCl, pH 8 (Thermo Scientific, J22638.AP)
High-Sensitivity D1000 Assay (D1000 High Sensitivity ScreenTape, Agilent
Each primer has a unique four-base sequence at the 5′ end, and the complementary regions that the primers bind are staggered; the 5′ heterogeneity and shift in binding region minimize the frequency of the same nucleotide in the same position, especially in the constant regions at the 5′ and 3′ ends of the amplicons, thus avoiding poor performance of the NGS instruments.
The volume of the 30-cycle reaction is reduced to minimize reagent usage.
2. Assuming bands of the correct size, approximately 340 to 350 bp, are obtained (as determined by analysis of the 30-cycle PCR reactions), purify the 5-cycle reactions from low-molecular-weight DNA fragments (<200 bp).
4. Transfer the tube to a magnetic stand for PCR tubes and incubate for 5 to 10 min or until the solution is clear.
The beads should be on the side of the tube in contact with the magnet.

5. While the tube is still on the magnet, carefully remove the supernatant (containing fragments <200 bp), add 200 μl freshly prepared 80% ethanol solution, and incubate for 30 s.
6. Keep the tube on the magnetic stand and discard the ethanol-containing supernatant.
7. Repeat steps 5 and 6 an additional time for a total of two washes with 80% ethanol.
Completely remove any traces of ethanol and allow the samples to air-dry for 5 min or until the bead pellet is matte (not shiny).
8. Remove the tube from the magnetic stand and resuspend the bead pellet with 55 μl of 10 mM Tris•HCl, pH 8. Incubate for 5 min at room temperature.
9. Transfer the tube to the magnetic stand and incubate for ≥1 min or until the solution is clear.
The beads should be on the side of the tube in contact with the magnet.
10. Measure the size distribution and concentration of the purified, PCR-amplified DARPin libraries using 2 μl for the High-Sensitivity D1000 Assay on an Agilent TapeStation.
11. Transfer 50 μl into a new tube for NGS library preparation following the manufacturer's instructions for the NEBNext Ultra II DNA Library Prep Kit for Illumina using NEBNext® Multiplex Oligos for Illumina during the ligation step.
12. Use the starting amount of DNA measured in step 10 to calculate the approximate number of PCR cycles for the final amplification step (according to the manufacturer's recommendations).
13. Purify the final library with SPRISelect beads similarly to steps 3 to 9, except that the ethanol-washed beads are resuspended in 20 μl of 10 mM Tris•HCl, pH 8.
14. Subject the final libraries to quality control using the High-Sensitivity D1000 Assay according to the manufacturer's instructions and quantify with the Qubit 1× dsDNA HS Assay Kit or equivalent and a Qubit fluorometer or real-time PCR instrument. Then, dilute final libraries according to Illumina's recommendations and sequence on a NovaSeq 6000 SP flowcell as 2 × 250 bases after spiking in 1% to 5% PhiX sequencing control in order to increase the complexity of the sequenced libraries.

Sample preparation for enriched library analysis
15. For analysis of enriched libraries after multiple selection rounds, use the same PCR reaction parameters as for the naïve library but PCR-amplify the DARPin-encoding region from individual colonies or a cell suspension by colony PCR (Bonnet et al., 2013; see Current Protocols article: Woodman et al., 2016) using the DARP.Amplicon.F/DARP.Amplicon.R primer pair (Table 2) to generate a ∼365-bp PCR product.
16. Visualize, gel-extract, and quantify the PCR amplicon as in Basic Protocol 1.
17. Perform NGS to analyze enriched populations.
Highly enriched populations contain a minimal number of sequences compared to naïve libraries and do not require large amounts of sequencing data.A variety of core facilities and commercial vendors offer low-cost options that provide enough sequence data, ≥50,000 unique sequences, to sufficiently analyze enriched populations.


NGS ANALYSIS OF NAÏVE AND ENRICHED LIBRARIES
Analysis of paired-end read data from NGS of DARPin PCR amplicons is complicated by the high sequence identity of internal DARPin repeats (75 of the 99 bp of a repeat are conserved, as they represent invariant amino acids), and programs that create merged sequences often discard sequences or incorrectly merge sequences (e.g., a DARPin sequence that contains three repeats is incorrectly assembled into a merged sequence containing two internal repeats with the sequence of one repeat discarded). Experimentally, this problem can be solved by using a single-read sequencing technology or by altering the library design such that internal repeats are generated with distinct primer pools that use different synonymous codons for invariant positions. Fortunately, NGmerge (Gaspar, 2018), an open-source program, is capable of merging repetitive DARPin sequences.
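The mis-merging problem can be made concrete with simple length arithmetic. The Python sketch below is a simplified illustration that ignores sequence content and read quality: it lists the merged lengths that are geometrically possible when a pair of reads is slid out of register by whole 99-bp repeats. The 350-bp amplicon length and 250-base read length are taken from Basic Protocol 4; the function name is hypothetical.

```python
REPEAT_BP = 99  # length of one internal DARPin repeat


def candidate_merges(read_len, amplicon_len, repeat_bp=REPEAT_BP):
    """Return (merged_length, overlap) pairs for the true register and for
    registers shifted by whole repeats.

    Only merges at least as long as a single read are listed; shorter
    ("dovetailed") merges are ignored in this simplified model.
    """
    merges = []
    merged_len = amplicon_len
    while merged_len >= read_len:
        merges.append((merged_len, 2 * read_len - merged_len))
        merged_len -= repeat_bp
    return merges


if __name__ == "__main__":
    for merged_len, overlap in candidate_merges(read_len=250, amplicon_len=350):
        print(f"merged length {merged_len} bp, read overlap {overlap} bp")
```

In this model, a three-repeat amplicon sequenced as 2 × 250 bases has a correct overlap of 150 bp, but a register shifted by one repeat yields a longer, 249-bp overlap and a merged sequence that is one repeat short. Because most positions in a repeat are invariant, a merging program that favors long overlaps can find the shifted register plausible.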
Bioinformatic analysis of NGS sequence data requires a significant amount of expertise for complicated datasets. In contrast, basic DARPin library analysis to determine the number of unique sequences in naïve libraries (i.e., library diversity) and the abundance of specific sequences in enriched libraries (i.e., enriched DNA sequences of the target-specific binders) is relatively simple. Data quality can be assessed with free tools (e.g., FastQC), the paired-end reads assembled into a single read with NGmerge, and merged DNA sequences translated and sorted using basic php scripts. We have written a simple program (MAMETS) that uses demultiplexed files as input and that merges paired-end reads, removes the adapter sequences used by sequencing instruments, reverse-complements reverse sequences, translates DNA sequences into one-letter amino acid code, sorts sequences by abundance, and writes out basic statistics on sequence abundance and characteristics. Parameters are easily modifiable for analysis of other library types (e.g., nanobodies).
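To illustrate how simple the core of such an analysis is, the following Python sketch reverse-complements, translates, and ranks merged DNA sequences by abundance. It is not the MAMETS code (which is written in php) and does not reproduce its parameters or output; the function names are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code with codons ordered T, C, A, G at each position; "*" = stop
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame=1):
    """Translate DNA in forward frame 1, 2, or 3; unknown codons become 'X'."""
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def rank_by_abundance(dna_seqs, frame=1):
    """Translate each sequence and return (protein, count), most abundant first."""
    counts = Counter(translate(seq.upper(), frame) for seq in dna_seqs)
    return counts.most_common()


if __name__ == "__main__":
    reads = ["ATGGATCTG", "ATGGACCTG", "ATGAAACTG"]
    for protein, count in rank_by_abundance(reads):
        print(protein, count)
```

Counting at the amino acid level, as here, collapses synonymous DNA sequences into one entry; counting the DNA sequences directly is the appropriate choice when estimating nucleotide-level diversity of a naïve library.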

Materials
Demultiplexed NGS dataset (provided by sequencing service; see Basic Protocol 4)
Computer with Linux and php interpreter installed
Procedure
1. Download demultiplexed NGS dataset from sequencing service/facility and perform simple quality-control checks using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Specifically, verify that the sequence length and %GC content match the anticipated values.
In addition, the "per base sequence quality" plot can indicate issues with library preparation or with the sequencing instrumentation.Examples of good and bad sequencing results are provided in the FastQC documentation.
2. Download MAMETS (https://srv.mbi.ucla.edu/MAMETS) and extract/unzip into an accessible directory. Review the instructions for running the program in the README.txt file.
3. Edit the parameter file (labeled parameters) to indicate the location of the NGS sequence files (inputR1 and inputR2).
4. Modify the parameter file to include the 5′ and 3′ invariant nucleotide sequences that flank the variable library region for sequences in both the forward and reverse directions.
The current parameter file contains the invariant nucleotide sequences flanking the DARPin library created by this set of protocols.
5. Set the NGmerge mismatch parameter (probability) for file merging. The default in the parameter file is 0.2. Increasing the mismatch parameter typically yields more successfully merged files but with a potentially higher error rate (Gaspar, 2018).
6. Edit the parameter file to specify the minimum sequence length of the merged files for subsequent analysis.
The default length in the parameter file is for DARPin libraries created by this set of protocols, which are expected to contain a minimum of two internal DARPin repeats (hence, a minimum size of 224 bp, corresponding to two 99-bp repeats and the invariant flanking sequences).
7. Specify the reading frame in the parameter file for translation of merged nucleotide sequences into one-letter amino acid sequences.
The default reading frame is +1, as the invariant flanking regions specified in the parameter file are in-frame. Sequences in the reverse orientation are reverse-complemented before translation so that all the resulting amino acid sequences are in the forward reading frame.
8. Modify the skipN parameter in the parameter file to specify the threshold for writing out translated amino acid sequences.
Enriched cell populations have a small number of sequences overrepresented due to the selection procedure, with background noise consisting of many unique sequences present in small amounts; in this instance, a threshold of 500 to 1000 is a reasonable starting point. The threshold can be modified based on the histogram of sequence counts displayed in the results (Fig. 6). For naïve libraries, high diversity of nucleotide sequences is expected, with no particular sequence overrepresented; the skipN parameter could be set at 100 in this instance to observe particular sequences that may be overrepresented due to bias introduced during library construction.
The program writes out abundant amino acid sequences, histograms of sequence counts and lengths, and statistics on total number of sequences, number of unique sequences, and number of sequences containing stop codons (terms). Intermediate files generated during sequence merging and analysis can be removed or written out. An example of MAMETS output for an enriched DARPin library is shown in Figure 6.
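The translation, counting, and thresholding steps of this procedure can be illustrated with a minimal Python sketch. This is not the MAMETS code (which the protocol describes as php-based); the function names `reverse_complement`, `translate`, and `summarize` and the argument `skip_n` are hypothetical and chosen only for illustration. The sketch assumes that reads are already merged, trimmed to the in-frame region, and oriented forward, that the threshold keeps sequences occurring more than `skip_n` times, and that the stop-codon statistic counts reads rather than unique sequences.

```python
from collections import Counter

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame=1):
    """Translate from reading frame 1, 2, or 3; 'X' marks unrecognized codons."""
    codons = (dna[i:i + 3] for i in range(frame - 1, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def summarize(merged_reads, frame=1, skip_n=1000):
    """Count translated sequences and keep those seen more than skip_n times.

    Assumption: 'with_stop' counts reads (not unique sequences) whose
    translation contains a stop codon.
    """
    proteins = [translate(read.upper(), frame) for read in merged_reads]
    counts = Counter(proteins)
    return {
        "total": len(proteins),
        "unique": len(counts),
        "with_stop": sum(n for seq, n in counts.items() if "*" in seq),
        "abundant": [(seq, n) for seq, n in counts.most_common() if n > skip_n],
    }


# Toy input: three identical reads, one read with a stop codon, one variant.
toy_reads = ["ATGGCTAAA"] * 3 + ["ATGGCTTAA", "ATGGGTAAA"]
print(summarize(toy_reads, frame=1, skip_n=2))
```

With real data, a reverse-orientation read would first be passed through `reverse_complement` so that every sequence is translated in the forward frame, mirroring the behavior described in step 7.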

COMMENTARY

Background Information
Library generation is a time-consuming endeavor that requires considerable planning in order to generate a highly diverse library capable of producing high-affinity binders. The first consideration is which DARPin scaffold to use, as there are a number of variations (Schilling et al., 2014; Seeger et al., 2013) on the original scaffold (Binz et al., 2004); the approach described herein uses a second-generation scaffold (Seeger et al., 2013). The second consideration is the choice of amino acids at each variable position and the degenerate codons to be used to incorporate the desired amino acids. Many protein binder libraries exclude cysteine, due to its chemical reactivity, and proline, due to conformational restraints. Excellent discussions on the choice of amino acids to include at variable positions in DARPin and nanobody libraries have been published (Chen et al., 2021; McMahon et al., 2018; Seeger et al., 2013; Valdés-Tresanco et al., 2022). Using degenerate codons to encode amino acids at variable positions has the potential to bias the amino acid composition at each position toward amino acids encoded by four or more codons (e.g., arginine, serine, leucine) at the expense of those amino acids encoded by two or fewer codons; the use of a different construction method, e.g., gene synthesis using nucleoside phosphoramidites, allows fine control of amino acid distribution but is significantly more expensive. Various programs (Jacobs et al., 2015) and web-based services (http://algo.tcnj.edu/decodoncalc/) are available to assess and evaluate the possible codon choices. The simplest choice is to use NNK or NNB degenerate codons, which encode all 20 amino acids, but at the expense of skewed amino acid frequencies and the inclusion of a stop codon (TAG). A recent synthetic nanobody library was generated using NNB codons but included an in vitro translation technique to eliminate transcripts that included stop codons (Chen et al., 2021); this procedure could be adapted to the creation of a DARPin library. Our approach uses a number of different degenerate codons at each variable position to produce a balanced distribution of amino acids with desired chemical properties and no stop codons, and it requires ∼90 oligonucleotide primers.
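The codon bias discussed above can be checked by enumerating the codons that a degenerate codon expands to. The following Python sketch is an illustration only and is not one of the cited programs or web services; the function name `amino_acid_counts` is hypothetical. It uses the standard IUPAC nucleotide ambiguity codes and the standard genetic code, and it assumes equimolar incorporation of each base at a degenerate position.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def amino_acid_counts(degenerate_codon):
    """Count how many concrete codons encode each amino acid ('*' = stop)."""
    pools = [IUPAC[base] for base in degenerate_codon.upper()]
    return Counter(CODON_TABLE["".join(codon)] for codon in product(*pools))


for scheme in ("NNK", "NNB"):
    counts = amino_acid_counts(scheme)
    n_codons = sum(counts.values())
    n_amino_acids = len(set(counts) - {"*"})
    print(scheme, n_codons, "codons,", n_amino_acids, "amino acids,",
          counts["*"], "stop codon(s)")
    # Amino acids sorted from most to least represented show the skew.
    print("  ", counts.most_common())
```

Under the equimolar assumption, dividing each count by the total number of codons gives the expected frequency of each amino acid at a position, which makes it easy to compare candidate degenerate codons before ordering oligonucleotides.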
Finally, the desired library diversity and thus size must be considered. The general consensus is that a library size of >10⁸ unique sequences is sufficient to identify high-affinity binders to the target of interest. High diversity at the molecular level is relatively easy to achieve (1 ng of a 99-bp DNA fragment encoding a DARPin repeat is 1 × 10¹⁰ molecules) and is part of the allure of cell-free systems (e.g., ribosome and mRNA display), which can easily achieve library sizes in excess of 10¹² unique sequences (He & Taussig, 2002). In vivo systems are, however, limited by transformation efficiency of the organism of choice, which typically restricts library sizes to 10⁸ to 10¹⁰.
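The molecule count quoted above can be reproduced with a short calculation. The Python sketch below is illustrative; the function name `dsdna_molecules` is hypothetical, and the value of 650 g/mol per base pair is a commonly used approximation for double-stranded DNA rather than a figure taken from this protocol.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
AVG_BP_MASS = 650.0       # g/mol per base pair; approximation (assumption)


def dsdna_molecules(mass_ng, length_bp):
    """Estimate the number of double-stranded DNA molecules in a given mass."""
    molar_mass = length_bp * AVG_BP_MASS   # g/mol for the whole fragment
    moles = (mass_ng * 1e-9) / molar_mass  # convert ng to g, then to mol
    return moles * AVOGADRO


# 1 ng of a 99-bp fragment encoding a single DARPin repeat
print(f"{dsdna_molecules(1, 99):.2e} molecules")
```

Under these assumptions the estimate is on the order of 10¹⁰ molecules, consistent with the value given in the text. The same function can be used to check whether the mass of insert carried into a ligation comfortably exceeds the intended library size.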

Critical Parameters
Beyond the initial design of the library, the most critical factors in the described experiments are minimizing bias in the library and generating sufficient quantities of input DNA for each protocol. Bias in the library can be introduced by PCR amplification during library generation and in amplicon preparation for NGS analysis through a number of mechanisms (Kebschull & Zador, 2015; Krehenwinkel et al., 2017). Use of high-fidelity DNA polymerases and minimization of the number of PCR amplification cycles, for all protocols, can significantly reduce the introduction of bias into the naïve library or PCR amplicons prepared for NGS. Large amounts of input DNA (in excess of 10 μg) are required for most of the library construction protocols, so PCR reactions should be scaled accordingly, taking into consideration that significant amounts of material are lost during manipulation (purification, restriction enzyme digestion, and ligation) of the DNA fragments.

Troubleshooting
Common problems in executing the protocols are listed in Table 9.

Understanding Results
This series of protocols explains a relatively straightforward method to generate a diverse DARPin library and a simple program (MAMETS) to analyze library diversity. Enriched libraries obtained using the display technique of choice are analyzed using the same program to determine which binder sequences have been enriched. The MAMETS program can also be applied to other library types (e.g., nanobody or scFv) by modifying sequence-specific parameters in the parameters file.

Library construction
The library is constructed using standard procedures (PCR, restriction enzyme digestion, ligation) common in labs that routinely use molecular biology techniques. Basic Protocols 1 and 2 generate about 100- to 300-bp DNA fragments that encode single or concatenated DARPin repeat modules, and Basic Protocol 3 incorporates the library into an appropriate display vector. The results from these protocols will be DNA fragments or plasmids that are visualized by agarose gel electrophoresis (Figs. 3 and 5) to ensure that the DNA fragments are the correct size. The presence of M13 primer sequences on the 5′ and 3′ ends of the DARPin repeats and vector-specific sequencing primers, for display vectors, allows Sanger sequencing of the DARPin repeats as they are amplified, concatenated, and ligated into the display vector to estimate whether library construction is proceeding correctly. Sanger sequencing results can verify whether the fragments are in-frame and if concatenated repeats are correctly assembled (note that invariant bases will be correctly assigned, whereas variable bases will be assigned two or more nucleotide bases depending on the degenerate base).

Library analysis by next-generation sequencing
Sample preparation of the naïve library for NGS analysis requires similar molecular biology techniques (PCR, ligation) as for library construction but also requires bead-based DNA fragment purification and sensitive DNA quantification methods and instrumentation. Biochemistry labs that lack the necessary expertise and instrumentation may elect to provide the PCR amplicons generated in the first two steps of Basic Protocol 4 to a sequencing facility for sample preparation and sequencing. The result from Basic Protocol 4 is raw sequencing data; inexpensive amplicon sequencing of enriched libraries will generate 50K to 200K reads, whereas the thorough analysis of naïve libraries using a NovaSeq 6000 SP flowcell, as detailed in this protocol, will produce in excess of 1.3 × 10⁹ reads.
Basic Protocol 5 uses the raw sequencing data generated by Basic Protocol 4 as input, with the result being an easy-to-interpret collection of statistics (Fig. 6). Enriched libraries will have a relatively small number of sequences that are very abundant and that represent likely binders or false positives that bind to a component of the display system. The results for analysis of the naïve library will, optimally, show no enrichment of any particular sequence and a very large number of unique sequences, indicating high diversity.
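The contrast between naïve and enriched libraries can be expressed with two simple ratios: the fraction of reads that are unique and the fraction of reads taken up by the most abundant sequence. The Python sketch below is an illustration only; the function name `diversity_metrics` is hypothetical, these ratios are not described as MAMETS outputs, and no cutoff for calling a library "enriched" is implied.

```python
from collections import Counter


def diversity_metrics(sequences):
    """Return the unique fraction and the share of the most abundant sequence."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequences supplied")
    top_sequence, top_count = counts.most_common(1)[0]
    return {
        "unique_fraction": len(counts) / total,
        "top_sequence": top_sequence,
        "top_fraction": top_count / total,
    }


# Toy data: a naive-like sample (every read distinct) and an enriched-like
# sample (one sequence dominates). Sequence names are placeholders.
naive_like = ["seqA", "seqB", "seqC", "seqD"]
enriched_like = ["binder1"] * 8 + ["seqY", "seqZ"]
print(diversity_metrics(naive_like))
print(diversity_metrics(enriched_like))
```

For the enriched example in Figure 6, 5134 unique sequences among 22,642 merged sequences corresponds to a unique fraction of about 23%, whereas a well-constructed naïve library is expected to give a unique fraction close to 1 at comparable sequencing depth.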

Time Considerations
The choice and design of the library of interest should be carefully considered, as a significant amount of time will be required to generate the final library. Basic Protocol 1 can be accomplished in 1 week if multiple parameters are evaluated early (e.g., polymerase and PCR reaction conditions). Basic Protocols 2 and 3 may take an experienced researcher several months to accomplish, as restriction enzyme digestions, ligations, and subsequent amplification of DNA products can take considerable time due to the inefficiency of the gel extraction and purification steps. Subsequently, the large-scale ligation requires large amounts of products to be prepared and small-scale test ligations and transformations to be carried out and optimized prior to scaling up the reactions. Preparation of sufficient amounts of electrocompetent cells in-house can also be time consuming; this step could be expedited by purchasing commercially prepared cells (which may be prohibitively expensive for very large libraries). Preparation of amplicons for NGS analysis (Basic Protocol 4) can be completed in 1 week, although the actual sequencing may take 1 to 2 weeks depending on the sequencing facility's work schedule. The analysis of the NGS data (Basic Protocol 5) can be completed in 1 day.

Figure 1
Figure 1 Structure of a DARPin (derived from PDB ID 5MA6). (A) Cartoon representation of the crystal structure. The N- and C-capping repeats are in red and purple, respectively, and the three internal repeats are colored in shades of gray. The variable residues in internal repeat 2 are shown in stick representation and colored according to (C). (B) Internal repeat 2 from panel A rotated 90° to show a side view of the repeat structure. (C) The amino acid sequence, in one-letter code, of the internal DARPin repeat generated using the protocols in this article.

Figure 2
Figure 2 Experimental schematic from overlap extension PCR (OE-PCR) to final assembled library.Degenerate codons are indicated by circles in the oligonucleotide pool cartoon in the upper left.Restriction enzyme sites and M13F/R priming sequences are also indicated.

Figure 3
Figure 3 Generation of DARPin internal repeat module by OE-PCR. DNA ladder is loaded in the first lane, and the size of selected bands is indicated. Analysis shows that bands of the anticipated size, ∼120 bp, are present for four template concentrations (1, 0.4, 0.1, and 0.025 μM) and two annealing temperatures (65°C and 60°C).

Figure 4
Figure 4 Graphical representation of the procedures required for naïve library generation and analysis of naïve and enriched libraries using next-generation sequencing.

Figure 5
Figure 5 DNA agarose gel images for PCR-amplified 3XIR and display vector. DNA ladder is loaded in the first lane, and the size of selected bands is indicated. Left, PCR amplification of the 3XIR ligation product to generate additional material for subsequent restriction enzyme digestion and ligation into the cloning/display vector. 3XIR species is present at the anticipated size of ∼360 bp; bands present at ∼250 bp show that some 2XIR is present and gel extraction is necessary. Right, PCR amplification of the display vector.

Figure 6
Figure 6 Example output of the MAMETS program, with statistics from an enriched population of cells selected to bind to a protein of interest. The information displayed is amino acid sequences occurring more than 1000 times, a histogram of sequence counts, the total number of merged sequences (22,642), the number of unique sequences in the file (5134), and the size distribution of unique sequences in the file.

Table 1
Degenerate Codons Used to Encode Variable Amino Acids at the Positions Pictured in

Table 2
Non-pool Oligonucleotides Used in Library Generation and Analysis

Table 3
PCR Reaction Setup for OE-PCR in Basic Protocol 1 a

Table 4
Restriction Enzyme Digestion of DARPin Repeats in Basic Protocol 2

PCR reaction parameters for amplification of the purified repeats are typically similar to those used to generate the initial repeat by OE-PCR, but some fine-tuning of the annealing or template concentration may be required. The number of PCR cycles is limited to 10 cycles to minimize PCR-induced bias in the library due to preferential PCR amplification of specific sequences.

Table 5
PCR Reaction Setup for Vector PCR in Basic Protocol 3 a The template concentration is 10 ng/μl.The reaction volume is divided into five portions of 100 μl each for PCR cycling with an annealing temperature of 62.5°C, extension time of 4 min, and 30 PCR cycles.
Reagents and equipment for whole-plasmid sequencing and for large-scale electroporation, estimation of electroporation efficiency, and preparation of freezer stocks and library maxipreps (Getz et al., 2012; see Current Protocols article: Schladetsch & Wiemer, 2021)

Reaction setup
1. Maintain reagents (e.g., enzymes) on ice or in a cold block.

Table 6
Restriction Enzyme Digestion of 3XIR DARPin Repeats and Display/Cloning Vector in Basic Protocol 3

Table 7
Large-Scale Ligation in Basic Protocol 3

Table 8
Reaction Guidelines for Basic Protocol 4 Perform five cycles of amplification for library generation and 25 to 30 cycles for agarose gel electrophoresis. a

Table 9
Troubleshooting Guide for DNA Library Construction