CRISPR/Cas9‐mediated knockout of a prolyl‐4‐hydroxylase subfamily in Nicotiana benthamiana using DsRed2 for plant selection

The properties of host plants used for molecular farming can be modified by CRISPR/Cas9 genome editing to improve the quality and yield of recombinant proteins. However, it is often necessary to target multiple genes simultaneously, particularly in host plants with large and complex genomes. This is the case for Nicotiana benthamiana, an allotetraploid relative of tobacco frequently used for transient protein expression. A multiplex genome editing system incorporating the DsRed2 fluorescent marker for the identification and selection of transgenic plants was established. As proof of principle, NbP4H4, which encodes a prolyl‐4‐hydroxylase involved in protein O‐linked glycosylation, was targeted. Using preselected gRNAs with efficiencies confirmed by transient expression, transgenic plant lines with knockout mutations in all four NbP4H4 genes were obtained. Leaf fluorescence was then used to screen for the absence of the SpCas9 transgene in T1 plants, and transgene‐free lines with homozygous or biallelic mutations were identified. The analysis of plant‐produced recombinant IgA1 as a reporter protein revealed changes in the number of peptides containing hydroxyproline residues and pentoses in the knockout plants. The selection of efficient gRNAs combined with the DsRed2 marker reduces the effort needed to generate N. benthamiana mutants and simplifies the screening process to obtain transgene‐free progeny.

deletion of a few nucleotides at the target site. [2] If the DSB is introduced near the beginning of the coding sequence, the resulting indel often causes a frameshift mutation that leads to premature termination during protein synthesis, which is a simple way to generate knockout mutants. More sophisticated outcomes can be achieved by modifying the system components. For example, providing a repair template favors the homology-dependent repair pathway and can thus be used to generate precise knockout and knock-in events. [3][4][5] Furthermore, the expression of multiple gRNAs targeting different genes and/or a single gRNA targeting a sequence conserved in multiple genes can facilitate multiplex genome editing. [6] The CRISPR/Cas9 system has been applied in all major food/feed crops [7] and has been used primarily to improve traits such as biotic and abiotic stress tolerance, yield, and nutritional quality. [8,9] However, it can also be used to enhance the industrial performance of plants, including the optimization of recombinant protein manufacturing by molecular farming. Examples include the improvement of protein levels and structural properties by interfering with the expression of host genes encoding endogenous proteases [10] and glycosyltransferases, [11][12][13] as well as targeting the RNA silencing machinery. [14] Many such examples have been discussed in recent reviews. [15] One of the most widely used hosts for molecular farming is Nicotiana benthamiana, a close relative of common tobacco. This species has been used extensively as a model for the functional analysis of plant genes and interactions with pathogens because it has a short life cycle and carries a mutation that makes it hypersusceptible to many viruses. [16]
This prompted its development as an expression host for viral vectors, and more recently it has emerged as the key industrial platform for transient expression, particularly for the production of antibodies [12,17,18] and vaccine candidates. [19] For example, N. benthamiana was used to manufacture ZMapp, a cocktail of three IgG antibodies for the treatment of Ebola virus infections, [20] as well as clinical vaccine candidates against hepatitis B virus, [21] influenza virus, [22] HIV [23] and, most recently, SARS-CoV-2. [24] The optimization of N. benthamiana and other tobacco host species is challenging because of their large and complex allotetraploid genomes, in which multiple copies of each target gene are usually present and must be mutated simultaneously. [25] Many N. benthamiana products have been manufactured in a transgenic line (ΔXT/FT) in which the N-linked glycosylation pathway is suppressed by RNA interference (RNAi), but silencing is incomplete, leading to the production of some plant-type complex N-linked glycans. [11] More recently, multiplex genome editing has been used for the same purpose. Several xylosyltransferase (XylT) and fucosyltransferase (FucT) genes were knocked out in N. benthamiana using TALENs, [26] and the complete set of XylT and FucT genes was subsequently knocked out using the CRISPR/Cas9 system, yielding products with no detectable plant-type complex N-linked glycans. [27] Similar work has been reported in tobacco BY-2 cells. [28,29] In the context of O-linked glycosylation, the knockout of a single prolyl-4-hydroxylase (P4H) gene via homologous recombination in the moss Physcomitrella patens showed promise for the production of recombinant human proteins devoid of undesirable plant-specific hydroxyproline residues. [30] Most studies in higher plants, however, have focused on the synthesis of human mucin-type O-linked glycans rather than the suppression of plant-type O-linked glycans. [31][32][33]
We therefore targeted the N. benthamiana P4H4 gene family, which encodes some of the key metabolic enzymes responsible for the first step of plant-type O-linked glycosylation. [34] Gene-edited plant lines are usually produced by the in vitro regeneration of modified cells under selection, but this is a laborious and time-consuming process that involves the handling and genotyping of large numbers of plants. Typically, the CRISPR/Cas9 cassette remains in the genome, and further work is required to segregate the transgene from the mutations and thus generate transgene-free lines. [35,36] The workload increases further when multiple mutations are introduced, and ideally it would be possible to introduce all mutations in the T0 generation and remove the transgene cassette in the T1 generation. Fluorescent proteins can be used for the noninvasive confirmation of transgene removal, and this approach has been applied to CRISPR/Cas9 editing in many species. [35,37,38] Although green fluorescent protein (GFP) is widely used as a marker, its spectral properties overlap with those of several plant pigments. A preferable alternative is DsRed2, a mutant of the DsRed protein from the coral Discosoma sp., which is easily distinguished from plant cell autofluorescence. [39] When the DsRed gene is linked to a transgene encoding the recombinant protein of interest, it can be used not only as a transgene marker but also, in many cases, as a semiquantitative indicator of the expression of the linked transgene. [40,41] Here we extend CRISPR/Cas9 genome editing in N. benthamiana by adding a DsRed2 reporter to simplify transgene tracking in the mutated population at any vegetative stage across the T0 and T1 generations.
The vector, which produces multiple gRNAs released from a concatemer by Csy4 ribonuclease, [42] was optimized by adding a scaffold attachment region (SAR) and a plastid-targeted DsRed2 sequence to maximize the local concentration of DsRed2 and thus reduce the limit of detection. [41] As proof of concept, we targeted P4H group 4 (NbP4H4) genes, which are strongly expressed in N. benthamiana leaves. [43] Our system achieved strong DsRed2 fluorescence in N. benthamiana leaf tissues during transient expression and stable transformation experiments. This not only simplified the selection of T0 transformants likely to contain the desired mutations but also allowed the early identification of homozygous and biallelic quadruple knockout T1 seedlings lacking the SpCas9 transgene. Mass spectrometry analyses of a reporter protein harboring multiple proline and O-glycosylation sites showed some changes in the hydroxyproline content when produced in edited plants, thus confirming the effectiveness of the editing tool in a functional context.

Plant material and growth conditions
N. benthamiana seeds were germinated in bulk in a small greenhouse box containing soil, which was placed in a growth chamber under long-day (16-h photoperiod) conditions. We transferred 1-week-old seedlings into separate pots for maturation. Plants used for transient expression were kept at room temperature after agroinfiltration and were maintained in the greenhouse with a long-day photoperiod. Transgenic plants were generated by the Agrobacterium-mediated transformation of cotyledons. Sterile seedlings and tissue cultures were grown in an incubator at 25 °C with a long-day photoperiod. After rooting, T0 primary transformants were transferred to soil and grown in a growth chamber under long-day conditions at 24 °C. All subsequent generations were propagated directly in the growth chamber starting from seeds as described above.

Phylogenetic analysis
We targeted the N. benthamiana P4H4 genes by locating all putative P4H sequences in the N. benthamiana genome databases using BLAST, starting with the protein and DNA sequences of the known homolog Nbv6.1trP32386. [43] We screened the N.

Design of gRNAs
We used CCTop software (https://cctop.cos.uni-heidelberg.de:8043/) to identify putative target sites, [45] with Tobacco "Nicotiana benthamiana Niben101" set as the reference genome. The maximum target length was set to 20 nt and the core (seed) length to 12 nt; for the identification of off-targets, up to four mismatches were allowed in total and up to two in the core region. We selected gRNAs that had no predicted off-targets within other genes and that simultaneously targeted the N-terminal region of the coding sequence of at least two NbP4H4 homologs (Supplementary Table S1).
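The off-target criteria above can be illustrated with a minimal mismatch filter. This is a sketch, not CCTop's implementation: it assumes the 20-nt protospacer is written 5′→3′ with the PAM on its 3′ side, so the 12-nt core is taken as the last 12 positions; the function names and example sequences are hypothetical.

```python
# Sketch of the mismatch limits used for the off-target search (not CCTop code).
# Assumption: protospacer written 5'->3', PAM at the 3' end, so the 12-nt
# core (seed) is the PAM-proximal 3' end of the 20-nt target.
TARGET_LEN = 20
CORE_LEN = 12
MAX_TOTAL_MISMATCHES = 4
MAX_CORE_MISMATCHES = 2


def mismatch_positions(guide: str, site: str) -> list[int]:
    """Return 0-based positions where guide and candidate site differ."""
    if len(guide) != TARGET_LEN or len(site) != TARGET_LEN:
        raise ValueError("guide and site must both be 20 nt long")
    return [i for i, (g, s) in enumerate(zip(guide.upper(), site.upper())) if g != s]


def within_search_limits(guide: str, site: str) -> bool:
    """True if a candidate site would be reported under the mismatch limits."""
    mismatches = mismatch_positions(guide, site)
    core_start = TARGET_LEN - CORE_LEN  # positions 8..19 form the core
    core_mismatches = sum(1 for i in mismatches if i >= core_start)
    return (len(mismatches) <= MAX_TOTAL_MISMATCHES
            and core_mismatches <= MAX_CORE_MISMATCHES)


guide = "GATTACAGATTACAGATTAC"  # hypothetical protospacer
print(within_search_limits(guide, "CTTTACAGATTACAGATTAC"))  # 2 distal mismatches
print(within_search_limits(guide, "GATTACAGATTACAGATGGG"))  # 3 core mismatches
```

A gRNA would then be kept only if none of the sites passing this filter lie within other genes, which is the selection rule stated in the text.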

Construct design and cloning
We developed two CRISPR/Cas9 binary vectors (pBV114 and pBV113) containing the DsRed2 gene. The vectors were identical except that the former contained a SAR between the SpCas9 and DsRed2 transcriptional units (Figure 1). The vectors were generated by combining conventional restriction and ligation techniques with Golden Gate assembly, using primers synthesized by Sigma-Aldrich (Germany) and cloning reagents from Thermo Fisher Scientific (Austria).
The concatenated gRNA cassettes were assembled as previously described. [42] Briefly, we added SapI and AarI restriction sites to vector pJET1.2 (Thermo Fisher Scientific) to produce the modified vector pUV1. This allowed the Golden Gate assembly of gRNA cassettes at the SapI site, followed by transfer to the destination vector pBVM5/pBVM5.2 using the AarI site. Vectors pYLCV1 and pOGS1, containing the Cestrum yellow leaf curling virus (CmYLCV) promoter [46] and an optimized sgRNA scaffold, [47] were used as templates for the amplification of the corresponding cassette elements. Partially overlapping protospacer halves together with Esp3I restriction sites were incorporated into the primers such that unique complementary ends were formed by restriction, ensuring ordered ligation of the elements and the reconstruction of the protospacers in the resulting cassette. Likewise, SapI sites flanking the outermost elements of the cassette allowed its incorporation into pUV1 for amplification in Escherichia coli.
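The ordered ligation described above depends on the 4-nt overhangs left by Esp3I being unique and mutually incompatible. A common sanity check is sketched below; the overhang sequences are hypothetical, since the actual junctions are not listed in the text, and the rules (no duplicates, no palindromes, no reverse-complement pairs) are general Golden Gate design heuristics rather than the authors' documented procedure.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (ACGT only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overhang_problems(overhangs: list[str]) -> list[str]:
    """List design problems in a set of 4-nt Golden Gate overhangs."""
    problems = []
    ohs = [o.upper() for o in overhangs]
    for o in ohs:
        if len(o) != 4 or set(o) - set("ACGT"):
            problems.append(f"{o}: not a 4-nt ACGT overhang")
        elif o == revcomp(o):
            problems.append(f"{o}: palindromic (can ligate to itself)")
    for a, b in combinations(ohs, 2):
        if a == b:
            problems.append(f"{a}: used at two junctions")
        elif a == revcomp(b):
            problems.append(f"{a}/{b}: reverse complements (can mis-ligate)")
    return problems


# Hypothetical junction overhangs for a three-fragment assembly.
print(overhang_problems(["AATG", "GCTT", "CGCT"]))  # no problems expected
print(overhang_problems(["AATG", "CATT", "GATC"]))
```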
After the intermediate plasmids containing the assembled gRNA cassette had been sequenced to confirm the absence of mutations, the gRNA module controlled by the CmYLCV promoter was transferred to binary vectors pBVM5 and pBVM5.2 to create the final vectors pBV113 and pBV114, respectively. The binary vector pBVM5 was constructed using parts originating from vectors pDIRECT_21C, [42] pTRAkt_HC, [48] B357p9ioR-35sCasWT (DNA Cloning Service, Germany) and pDsRed2 (Clontech/Takara, USA). The DsRed2 gene was fused to a plastid transit peptide sequence from the barley starch synthase I gene [41] and equipped with a C-terminal 3xFLAG tag sequence. Vector pBVM5.2 is a derivative of pBVM5 with the SAR removed.

Transient expression in N. benthamiana
The binary vectors were introduced into chemically competent Agrobacterium tumefaciens strain GV3101(pMP90) using the freeze-thaw method. Overnight cultures containing pBV113 or pBV114 were prepared from a single colony in YEB medium supplemented with 25 mg L⁻¹ gentamicin and 100 mg L⁻¹ spectinomycin. The next day, the cultures were diluted to OD600 = 0.4 and infiltrated with a syringe into the youngest fully expanded leaves of 5-week-old wild-type plants. [11] The tissues were collected 5 days post infiltration (dpi), snap-frozen in liquid nitrogen, and stored at -80 °C for further analysis. Seeds from Cas9-positive T0 plants were germinated directly in soil and the seedlings were screened to identify transgene-free T1 plants.
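Diluting the overnight cultures to OD600 = 0.4 is a simple C1V1 = C2V2 calculation. The sketch below assumes the measured OD600 lies in the spectrophotometer's linear range; the diluent is not specified in the text, and the function name is hypothetical.

```python
def culture_volume_ml(od_overnight: float, final_volume_ml: float,
                      target_od: float = 0.4) -> float:
    """Volume of overnight culture needed to reach target_od in final_volume_ml.

    Uses C1 * V1 = C2 * V2, assuming OD600 is proportional to cell density.
    """
    if od_overnight <= target_od:
        raise ValueError("overnight culture is not denser than the target OD")
    return target_od * final_volume_ml / od_overnight


# Hypothetical example: overnight OD600 of 2.0, 10 mL of suspension required.
v = culture_volume_ml(2.0, 10.0)
print(f"{v:.2f} mL culture + {10.0 - v:.2f} mL diluent")
```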

Mutant screening
Genomic DNA was isolated from agroinfiltrated N. benthamiana leaves using the NucleoSpin Plant II kit (Macherey-Nagel, Germany). To assess gRNA efficiency, each targeted exon was amplified using DreamTaq polymerase (FisherSci, Austria) and gene-specific primers based on the N. benthamiana draft genome sequence (Supplementary Table S2).
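The gRNA efficiencies are reported as two numbers defined in the results: the mutation frequency (all indels significant at p < 0.001 after decomposition) and the knockout (KO) score (only indels causing a frameshift). The sketch below computes both from a list of indel sizes and percentages such as a TIDE-style decomposition produces. It does not perform the decomposition itself, and the input values are hypothetical.

```python
def editing_summary(indels, alpha=0.001):
    """Summarize an indel spectrum into (mutation frequency, KO score).

    indels: iterable of (size_bp, percent, p_value); negative sizes are
    deletions, positive sizes insertions, and 0 is the wild-type sequence.
    """
    significant = [(size, pct) for size, pct, p in indels
                   if size != 0 and p < alpha]
    mutation_frequency = sum(pct for _, pct in significant)
    # Frameshift = indel length not divisible by 3 (Python's % handles negatives).
    ko_score = sum(pct for size, pct in significant if size % 3 != 0)
    return mutation_frequency, ko_score


# Hypothetical decomposition result for one target site.
spectrum = [(0, 8.0, 1e-10), (-1, 60.0, 1e-12), (1, 20.0, 1e-9),
            (-3, 10.0, 1e-5), (-7, 0.5, 0.2)]
print(editing_summary(spectrum))  # the -7 call is not significant and is ignored
```

The in-frame 3-bp deletion counts toward the mutation frequency but not the KO score, which is why the two values can differ for the same site.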

Fluorescence analysis
A white light source with a green excitation filter was used to visualize macroscopic DsRed2 fluorescence, which was observed through a red filter as previously described. [40] Infiltrated leaf tissues were observed with a Leica DM5500B fluorescence microscope equipped with a DFC 300 FX camera and DsRed filter set, and with a Leica SP5 confocal microscope. Samples were mounted in tap water on a glass slide.
DsRed2 was excited at 561 nm and fluorescence emission was monitored at 573-642 nm. Images were captured using Leica Application Suite v4.10.0.

Production and peptide analysis of recombinant IgA 1
Three individual biological replicates for the edited plants and two for the wild-type plants were produced by infiltrating the construct into the Table 1B) was performed as described previously. [12] Pooled protein fractions from each sample were then further concentrated using Amicon centrifugal filters with a molecular weight cut-off (MWCO) of 10,000 Da (Merck, Austria), and 20 μg of each sample was submitted in solution for glycopeptide analysis. Purified IgA1 samples were subjected to reduction, S-alkylation (iodoacetamide), and digestion with sequencing-grade trypsin (Promega, Walldorf, Germany) and GluC (Promega). Glycopeptides were then processed as described previously. [12] In short, the digested samples were loaded on a C18 column (ACQUITY PRM HSST3 Manual glycopeptide searches were made using MassHunter 10.0 (Agilent). For the quantification of the different glycoforms, the peak areas of the extracted ion chromatograms (EICs) of the first four isotopic peaks were summed using the quantification software Skyline. Values were averaged over three biological replicates for the mutant (n = 3) and two biological replicates for the wild type (n = 2), and the standard deviation was calculated.
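The quantification step can be sketched as follows: sum the EIC areas of the first four isotopic peaks per glycoform, express each glycoform relative to all glycoforms of the peptide in that replicate, then average across replicates. The normalization to the total of all glycoforms and the use of the sample standard deviation are assumptions, as is the data layout; in the study the peak areas were extracted in Skyline.

```python
from statistics import mean, stdev


def glycoform_area(isotope_areas, n_isotopes=4):
    """Sum the EIC peak areas of the first n isotopic peaks (M, M+1, ...)."""
    return sum(isotope_areas[:n_isotopes])


def relative_abundances(replicate):
    """replicate: dict glycoform -> list of isotope peak areas.

    Returns percent of the summed area of all glycoforms (assumed normalization).
    """
    totals = {g: glycoform_area(a) for g, a in replicate.items()}
    grand_total = sum(totals.values())
    return {g: 100.0 * t / grand_total for g, t in totals.items()}


def summarize(replicates):
    """Mean and sample standard deviation per glycoform across replicates (n >= 2)."""
    per_rep = [relative_abundances(r) for r in replicates]
    return {g: (mean([r[g] for r in per_rep]), stdev([r[g] for r in per_rep]))
            for g in per_rep[0]}


# Hypothetical peak areas for two wild-type replicates (n = 2).
rep1 = {"Hyp0": [50, 30, 15, 5, 1], "Hyp1": [200, 100, 80, 20]}
rep2 = {"Hyp0": [60, 25, 10, 5], "Hyp1": [100, 60, 30, 10]}
print(summarize([rep1, rep2]))
```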

Design of gRNAs targeting N. benthamiana P4H4 genes
We selected NbP4H4 as our target because transcriptome analysis has shown a high abundance of the corresponding transcripts in N. benthamiana leaves (Gene Expression Atlas v6, https://benthgenome.qut.edu.au/).
We used the published NbP4H4 sequence data [43] to screen all available N. benthamiana sequence resources with BLAST. N. benthamiana has a 3.1-Gbp allotetraploid genome spanning 19 chromosomes, and two independently assembled draft genome sequences [25,50] are publicly available via the SOL Genomics Network (https://solgenomics.net/) and the Queensland University of Technology website (https://benthgenome.qut.edu.au/). Furthermore, both PacBio and hybrid assemblies of the LAB strain genome [51] have been made available under "reserved analyses" status via Apollo (https://apollo.nbenth.

Note: Amplicons spanning these target sites were sequenced and the mutation frequencies were evaluated using TIDE. Mutation frequency is a percentage that refers to all indels (p < 0.001) after decomposition, whereas the knockout (KO) score is a percentage that refers only to indels leading to a frameshift. (B) Mutations in T1 plants lacking the Cas9 transgene (confirmed both by PCR and the lack of DsRed2 fluorescence). The lengths and types of the indels are indicated (deletion -, insertion +, wild-type wt). The zygosity of the mutation was determined according to the presence or absence of wild-type sequences and the number of overlapping chromatogram traces. Heterozygous (HET) = one allele is wild-type, the other is mutated. Biallelic (BIAL) = both alleles are mutated, but the mutations are distinct. Homozygous (HOM) = both alleles carry the same mutation.
(a) n/a = not available because the sequence chromatogram comprised too many overlapping traces to be properly analyzed for indels (after including three technical replicates).
(b) n/a = not applicable because the prevalence of an inversion is not quantifiable in this setting (TIDE can only detect indels ≤ 50 bp).
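The zygosity categories in the note above amount to a simple decision rule, applied separately to each NbP4H4 locus. The sketch below encodes it, using the note's allele notation ("wt" for wild-type, signed indel sizes such as "-1" or "+1" for mutations); the function name and the "WT" label for two wild-type alleles are illustrative additions.

```python
def zygosity(allele_a: str, allele_b: str) -> str:
    """Classify a locus from its two allele calls ("wt", "-1", "+1", ...)."""
    a_wt, b_wt = allele_a == "wt", allele_b == "wt"
    if a_wt and b_wt:
        return "WT"    # no mutation detected (label added for completeness)
    if a_wt or b_wt:
        return "HET"   # one wild-type allele, one mutated allele
    # Both alleles mutated: identical mutations vs. distinct mutations.
    return "HOM" if allele_a == allele_b else "BIAL"


for calls in [("wt", "-1"), ("-1", "+1"), ("-4", "-4")]:
    print(calls, zygosity(*calls))
```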
combine the Csy4 processing system for polycistronic gRNA from Pseudomonas aeruginosa [52] with the DsRed2 gene for efficient transgene selection in T0 and counterselection in T1.

Determination of gRNA efficiencies and confirmation of DsRed2 fluorescence
We assessed the relative efficiencies of the eight gRNAs by tran-  Figure 3A) and was clearly distinguished from the autofluorescence of chlorophyll and other pigments in wild-type plants (Figure 3B). Epifluorescence and confocal microscopy showed, as expected, that DsRed2 was localized in the plastids of agroinfiltrated plants (Figure 3C) but was absent from wild-type plants (Figure 3D). Confocal microscopy confirmed that the autofluorescence detected at the excitation wavelength of chlorophyll was not detected at the lower wavelength used to excite DsRed2 (Figure 3D).

Generation, selection, and analysis of P4H4 quadruple knockouts
We combined gRNAs G3 and G7 and placed them in tandem in binary vector pBV113 for stable transformation experiments, thus targeting six potential mutation sites (two each in NbP4H4_1 and  Figure S1B). PCR with exon-specific primers for each NbP4H4 gene followed by Sanger sequencing revealed genome editing events at all six target loci (Table 1A). Most showed an editing efficiency close to or above 90% in at least four target sites, indicating a high probability of heritability. [53] These plants were allowed to self-pollinate and produce seeds.

Identification of transgene-free T1 plants by DsRed2 screening
We germinated 20–50 seeds per line and visualized macroscopic DsRed2 fluorescence to distinguish between plants expressing the transgene and potential negative segregants (Figures 3E,F). Individual plants lacking a DsRed2 signal were tested by PCR to confirm the absence of the SpCas9 cassette, and the mutant genotype was verified by Sanger sequencing (Supplementary Figures S2A,B). The zygosity of the mutations was determined by checking for the presence or absence of wild-type sequences and the number of overlapping chromatogram traces. Accordingly, we readily identified a transgene-free plant among the progeny of NbpBV113_4, which contained biallelic or homozygous KO mutations in all four NbP4H4 genes (Table 1B). No visual phenotype correlating with the quadruple knockout was detected. However, a phenotype segregating with the SpCas9 cassette was observed in line NbpBV113_3, where the fluorescent plants were smaller than wild-type plants of the same age (Figure 3E).
This phenotype was probably caused by a genomic integration effect.
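Why 20–50 seeds per line are ample for finding negative segregants can be shown with a worked probability. The sketch assumes a single-locus T-DNA insertion that is hemizygous in the T0 plant, so that after self-pollination 1/4 of T1 seedlings lack the transgene and show no DsRed2 fluorescence; the text does not report insertion copy numbers, so this is an idealized case, and lines with multiple insertions would yield fewer nulls.

```python
import math


def p_at_least_one_null(n_seedlings: int, null_fraction: float = 0.25) -> float:
    """Probability that at least one of n seedlings lacks the transgene."""
    return 1.0 - (1.0 - null_fraction) ** n_seedlings


def seedlings_needed(confidence: float = 0.95, null_fraction: float = 0.25) -> int:
    """Smallest n giving at least one null seedling with the given probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - null_fraction))


for n in (20, 50):
    print(n, round(p_at_least_one_null(n), 4), "expected nulls:", n * 0.25)
print("seedlings for 95% confidence:", seedlings_needed())
```

Under these assumptions even 20 seedlings give a probability above 0.99 of at least one DsRed2-negative plant, and about 11 seedlings suffice for 95% confidence.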

Prolyl 4-hydroxylation and O-glycosylation status of recombinant IgA1
It is well known that the members of the P4H enzyme family are required to initiate the O-glycosylation of arabinogalactan-proteins, extensins, and recombinant glycoproteins produced in plants. [34] To assess the functional consequences of knocking out the P4H4 subset of these enzymes, we used quadruple knockout plants for the recombinant production of IgA1 and subsequently analyzed its proline-rich hinge region. In LC-ESI-MS spectra of the IgA1 peptide HYTNPSQD-VTVPCPVPSTPPTPSPSTPPTPSPSCCHPR (Figure 4A), the conversion of up to five proline residues to hydroxyproline was detected, as well as the presence of additional pentoses, indicating arabinosylation. Relative quantification of glycoforms was performed to monitor changes due to genome editing. Because many of the glycosylated peptides were present only in low quantities, their amounts were summed for each number of hydroxyproline residues and compared with those of peptides containing hydroxyprolines but no pentoses (Figure 4B). The relative amounts of peptides containing one to four hydroxyprolines were reduced in the reporter protein produced in the NbP4H4 knockout mutant compared with the wild type, whereas peptides containing five hydroxyproline residues were present in similar amounts in both groups. Surprisingly, however, the relative amount of the unmodified hinge region was slightly reduced and the overall amounts of glycopeptides with pentoses were increased in the mutant, possibly indicating a rebalancing effect due to the remaining P4H activities. Overall, these results indicate that members of the P4H4 subfamily contribute to hydroxyproline formation in a recombinant protein, resulting in a shift in the glycosylation pattern, but further studies are needed to elucidate the exact mechanisms.
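The pooling used for Figure 4B can be sketched as grouping glycoform abundances by the number of hydroxyproline residues and splitting each group into peptides with and without pentoses. The input layout (one tuple per glycoform with its Hyp count, pentose count and relative abundance) and the values are hypothetical.

```python
from collections import defaultdict


def group_by_hydroxyproline(glycoforms):
    """Pool relative abundances per number of Hyp residues.

    glycoforms: iterable of (n_hyp, n_pentose, relative_abundance_percent).
    Returns {n_hyp: {"no_pentose": x, "with_pentose": y}} sorted by n_hyp.
    """
    grouped = defaultdict(lambda: {"no_pentose": 0.0, "with_pentose": 0.0})
    for n_hyp, n_pentose, abundance in glycoforms:
        key = "with_pentose" if n_pentose > 0 else "no_pentose"
        grouped[n_hyp][key] += abundance
    return dict(sorted(grouped.items()))


# Hypothetical glycoform abundances for the hinge-region peptide.
data = [(0, 0, 10.0), (1, 0, 20.0), (1, 1, 5.0), (1, 2, 2.5),
        (2, 0, 30.0), (2, 3, 1.5)]
for n_hyp, pools in group_by_hydroxyproline(data).items():
    print(n_hyp, pools)
```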

DISCUSSION
The functional analysis of plant genes by mutation is hampered by the presence of paralogs or homeologs with redundant or overlapping roles, particularly in polyploid species such as N. benthamiana.
The same issue arises when attempting to modify plants to remove undesirable properties, which is necessary during the domestication of crops, but also for the development of plants as a production platform for valuable molecules including recombinant proteins. We therefore established a rapid and straightforward procedure for multiplex genome editing in N. benthamiana followed by the selection of transgene-free mutant lines.
As proof of concept, we targeted a subgroup of NbP4H genes encoding prolyl-hydroxylases responsible for the first committed step of plant-type O-linked glycosylation. [34] There are major differences between O-linked glycosylation in plants and mammals, which makes it advantageous to eliminate the endogenous pathway in plants. Blocking the first step (the conversion of proline to hydroxyproline in the sequence -Ser-X-X-Pro-) is a promising strategy to achieve this. However, the complete abolition of NbP4H activity is challenging because the allotetraploid N. benthamiana genome contains a large number of P4H genes falling into at least four homology groups (NbP4H1, NbP4H4, NbP4H9, and NbP4H10). [43] NbP4H4, which shows the highest transcriptional activity in leaves, appears to comprise four homologous candidate genes. It can be difficult to correctly identify and differentiate homologs in N. benthamiana due to sequencing errors, incorrect annotations, and the diverse genotypes and accessions used in different experiments. [54] We therefore cannot completely rule out the possibility that even more NbP4H4 genes remain to be discovered in this species. Indeed, two of the NbP4H4 genes targeted in this study had been incorrectly annotated in one genome assembly, probably due to the presence of long introns that were misinterpreted as intergenic regions.
Despite the extensive sequence divergence between the homeologous NbP4H4 loci, we found one target site (G3) that was conserved in all four genes. However, a single gRNA targeting multiple loci can be insufficient for the complete abolition of gene function because the mutation efficiency often varies between sites. We therefore utilized a multiplex gRNA system in which concatenated transcripts of up to eight gRNAs are processed by Csy4 ribonuclease. [42] Testing was car- Previously reported editing vectors with integrated visual markers showed fluorescence mostly at the seed stage [35,55] or required special equipment and transformation methods to enrich for the desired mutants. [56] Others have used fluorescent proteins in cotransformation experiments with double T-DNA binary vectors without an editing cassette [57] or with multiple A. tumefaciens strains, requiring confocal microscopy to evaluate the results. [37] In contrast, our system reports the presence of adjacent transgenes, as already shown for antibody-producing transgenic maize plants. [40] All six T0 plants displaying strong and evenly distributed fluorescence in the leaves also carried targeted mutations in the four NbP4H4 genes, and two of the plants showed editing efficiencies close to or above 90% at four of the target sites. It is tempting to speculate that selection for strong DsRed fluorescence may also favor the high expression of the Cas9 and gRNA cassettes located on the same T-DNA, increasing the mutation efficiency, in agreement with the finding that Cas9 availability is a limiting factor in multiplex genome editing. [38] However, this hypothesis would need to be confirmed with a larger sample of plants.
The ultimate aim of genome editing in plants is the generation of offspring that carry the desired homozygous or biallelic knockout mutations in all target genes but lack the SpCas9 transgene because this avoids the possibility of subsequent off-target mutations and also allows the direct comparison of mutant and wild-type plants without potential pleiotropic effects caused by transgene expression or integration. Transgene elimination is also important for plant breeding because it ensures regulatory compliance and genetic stability.
Finally, transgene elimination is required for the stacking of traits by the sequential combination of mutations. Traditional methods for the negative selection of transgenic plants include the use of markers that confer sensitivity to particular herbicides or antibiotics [58] and the use of fluorescent markers that allow visual screening. [59] Our vector system used DsRed2 to identify negative segregants in the T1 population derived from the two T0 plants with the highest editing efficiency.
We analyzed the T1 seedlings devoid of fluorescence and confirmed both the absence of the SpCas9 transgene and the heritability of the mutations. Despite analyzing only three individual siblings per line, we were able to identify quadruple knockout plants with homozygous or biallelic mutations in all the target genes.
The loss of P4H activity in Arabidopsis thaliana was reported to cause root-hair phenotypes and problems with cell wall assembly. [60] We did not observe any obvious phenotypes associated with the NbP4H4 quadruple knockout, perhaps because we selected only one particular subfamily of P4H enzymes. On a functional level, targeting this subset of enzymes in our study led to distinct changes in recombinant IgA1 peptides containing hydroxyproline residues and pentoses. Some of these changes, like slightly decreased amounts of the unmodified hinge region or overall increased amounts of glycopeptides with pentoses, were unexpected for a P4H knockout but may be explained by isoform-specific substrate preferences of the remaining P4H enzymes and other unknown factors. Indeed, a total of four homology groups of putative P4H genes have been described in N. benthamiana, each one comprising many paralogous and/or homoeologous copies, and the functional characterization of one representative gene from each group revealed very similar enzymatic activities but slightly different substrate preferences. [43] This highlights the difficulties faced when editing combinations of such genes to suppress the plant-specific oxidation of prolyl residues in recombinant proteins. However, improvements in multiplexing strategies and simple screening procedures, such as those demonstrated here, will facilitate the combined knockout of further members of this enzyme family to develop plant hosts in which the O-linked glycosylation pathway is disabled specifically in the tissues used for recombinant protein production.

CONCLUSION
The identification of mutant genotypes or phenotypes following mul-

ACKNOWLEDGMENTS
The authors acknowledge funding from the Austrian Science Fund FWF (project W1224). The MS equipment was kindly provided by the EQ-BOKU VIBT GmbH and the BOKU Core Facility Mass Spectrometry.

CONFLICT OF INTEREST
The authors declare no financial or commercial conflicts of interest.

DATA AVAILABILITY STATEMENT
The data that supports the findings of this study are available in the supplementary material of this article.