The rapidly advancing Class 2 CRISPR‐Cas technologies: A customizable toolbox for molecular manipulations

Abstract The CRISPR‐Cas technologies derived from bacterial and archaeal adaptive immune systems have emerged as a series of groundbreaking nucleic acid‐guided gene editing tools, ultimately standing out among several engineered nucleases because of their high efficiency, sequence‐specific targeting, ease of programming and versatility. Facilitated by the advancement across multiple disciplines such as bioinformatics, structural biology and high‐throughput sequencing, the discoveries and engineering of various innovative CRISPR‐Cas systems are rapidly expanding the CRISPR toolbox. This is revolutionizing not only genome editing but also various other types of nucleic acid‐guided manipulations such as transcriptional control and genomic imaging. Meanwhile, the adaptation of various CRISPR strategies in multiple settings has realized numerous previously non‐existing applications, ranging from the introduction of sophisticated approaches in basic research to impactful agricultural and therapeutic applications. Here, we summarize the recent advances of CRISPR technologies and strategies, as well as their impactful applications.


| INTRODUCTION
The clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) systems are naturally occurring adaptive immune systems in many bacteria and archaea, combating invading viruses and plasmids. 1 They act in three steps to provide protection from foreign invaders: (a) adaptation, during which a series of Cas proteins mediate the acquisition of invading nucleic acid fragments (spacers) into the CRISPR array loci of host cells; (b) biogenesis, which includes constitutive transcription from the CRISPR array followed by maturation of CRISPR RNA (crRNA) and continuous expression of Cas protein(s); and (c) targeting (also called interference), in which a crRNA guides the effector complex containing Cas nuclease(s) to cleave homologous sequence(s) and thereby destroy the invading nucleic acids. 2 The mechanism underlying the targeting step was found to be of great value for mediating RNA-guided DNA cleavage and has been exploited for programmable genome targeting. Earlier genome editing tools, such as zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs), rely on protein-DNA interactions for targeting and can only be re-targeted to different sequences by engineering their DNA-binding domains. In comparison, the CRISPR-Cas systems are much easier to reprogram reliably. 3 The specificity of targeting is conferred by RNA-based guidance through base-pairing, and the guide sequences can be easily adjusted to target a new sequence with high certainty. 3 Therefore, this breakthrough has realized the long-desired site-specific genome editing with high efficiency, high accuracy and ease of reprogramming.
According to the number of proteins involved in the targeting step, CRISPR-Cas systems are generally classified into two classes named Class 1 and Class 2. 4 In Class 1 systems, multiple protein units form an effector complex together with the crRNA to recognize and cleave a target sequence, whereas a single protein complexed with crRNA does the job in a Class 2 system. 4 To date, six types of CRISPR-Cas systems have been discovered: three of them (type I, type III and type IV) are identified as Class 1 systems, while the other three (type II, type V and type VI) are classified into the Class 2 category. 4 Due to the straightforward composition of their effectors, Class 2 systems have been intensively studied, engineered and applied for genome editing; among them, the type II (Cas9) systems are the most thoroughly characterized and utilized. 5,6 In 2013, the Cas9 system was first applied in mammalian cells for site-specific genome editing, and this success has greatly encouraged investigations of other CRISPR-Cas systems for potentially better editing efficiency and novel applications. 5,6 Not only have additional type II systems been discovered from new species, but new members of type V and type VI systems are also being identified and characterized (Table 1 and Figure 1). 7 Along with the delineation of detailed structures and mechanisms of action of various Class 2 CRISPR-Cas systems, 4,7,8 optimizations and new applications have been conceived and accomplished. In this article, we review representative members of Class 2 CRISPR-Cas systems that have been discovered so far, and discuss their wide-ranging applications in gene editing, novel tools developed therefrom, as well as some prospective advances of the CRISPR-Cas technology.

| TYPE II: THE CRISPR-CAS9 SYSTEMS
The implementation of CRISPR-Cas9 systems for genome engineering builds on research efforts from different fields spanning more than 30 years. 9 CRISPR sequences were first described as direct repeats interspersed with spacers in 1987, 10 after which their natural functions and working mechanisms were unveiled step by step, setting the stage for the first piece of experimental evidence for CRISPR-Cas-mediated adaptive immunity in 2007. 11 In 2013, marked by the first application of codon-optimized Cas9 derived from Streptococcus pyogenes, type II systems became the first CRISPR-Cas systems applied as programmable genome editing tools targeting mammalian cells, 5,6,12 prompting extensive investigations into the phylogeny, functions and structural details of known and newly identified CRISPR-Cas systems.
The type II CRISPR-Cas systems operate through an effector module consisting of a single Cas9 protein, a crRNA and a trans-activating crRNA (tracrRNA). 6 Each crRNA carries a guide sequence (20 nt) derived from a spacer at its 5′ end, which is capable of base-pairing with the homologous sequence found in the invader (target) DNA, and a repeat-derived sequence of 19-22 nt in length at its 3′ end, which hybridizes with a tracrRNA via a complementary sequence. 13 The crRNAs are encoded by the CRISPR array in the host genome, which is first transcribed into a long pre-crRNA composed of multiple crRNA sequences. The pre-crRNA then pairs with multiple tracrRNAs, each base-pairing with one of the repeat sequences, and is processed by RNase III into separate crRNA-tracrRNA structures. Each crRNA, with its 3′ repeat sequence paired with a tracrRNA, eventually forms a complex with a Cas9 protein and acts as a guide to aid the identification of any homologous sequences (target sites) found in the invader DNA. 13,14 These molecular principles have been exploited for targeting desired sequences by modifying the crRNA coding genes, thereby yielding a programmable genome editing tool.
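The repeat-spacer organization of the CRISPR array described above can be illustrated with a small Python sketch. It splits a toy array at a known repeat to recover the spacers that would give rise to individual guide sequences. The repeat, spacers and flanking sequences below are made up for illustration, and the function name `extract_spacers` is hypothetical; real arrays may carry imperfect repeats that exact string matching would miss.

```python
def extract_spacers(array_seq: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive copies of `repeat`."""
    parts = array_seq.upper().split(repeat.upper())
    # parts[0] and parts[-1] are the flanks outside the first and last repeat;
    # everything in between is bounded by two repeats, i.e. is a spacer.
    return [p for p in parts[1:-1] if p]


# Toy data (assumption): a 12-nt repeat and two 10-nt spacers.
REPEAT = "GTTGTAGCTCCC"
SPACERS = ["ACGTACGTAC", "TTGACCGGTA"]
toy_array = "AAAA" + REPEAT + SPACERS[0] + REPEAT + SPACERS[1] + REPEAT + "CC"

for i, spacer in enumerate(extract_spacers(toy_array, REPEAT), start=1):
    print(f"spacer {i}: {spacer}")
```

Exact matching keeps the sketch short; a tolerant matcher would be needed for arrays with degenerate repeats.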
Furthermore, scientists have successfully engineered a single RNA chimera that mimics the structure of the tracrRNA:crRNA complex. 15 This engineered single guide RNA (sgRNA) can fully replace the tracrRNA:crRNA duplex to direct the Cas9 complex for sequence-specific DNA cleavage, further reducing the complexity of the technology and providing the CRISPR-Cas9 system most commonly used for genome editing nowadays. 15 The base-pairing process between a Cas9-crRNA complex and its target sequence (also named protospacer) requires an additional short sequence located 3′ downstream of the target sequence on the non-targeted strand, which is known as the protospacer-adjacent motif (PAM). 16 When the Cas9 protein in a complex recognizes a PAM sequence through its PAM-interacting (PI) domain (Figure 1), it triggers DNA melting in the adjacent region as well as subsequent base-pairing between the crRNA (or sgRNA) and the target sequence. 17 The proper interaction between the Cas9 protein and the PAM sequence is often essential, and it significantly affects the efficiency and specificity of subsequent targeting and cleavage. 17 The various Cas9 systems discovered from different species recognize distinct PAM sequences, which are effector-specific and often G-rich. 18 For practical applications in genome editing, the PAM requirement enhances specificity while restricting the selection of target sites in a given genome. Moreover, it is often found that several nucleotides (the "seed" sequence) in the guide sequence are first exposed to the solvent environment, which facilitates the subsequent full base-pairing process. 19 Altogether, the PAM sequence and the "seed" sequence are critical for the specificity of Cas9-catalysed DNA cleavage. 18
Structural studies of representative Cas9 effectors reveal that most of them have a bi-lobed architecture consisting of a recognition (REC) lobe and a nuclease (NUC) lobe (Figure 1A), which together form a positively charged groove to accommodate the negatively charged sgRNA:target DNA heteroduplex. 18,20,21 The NUC lobe consists of three domains: the PI domain interacting with the PAM, as well as the RuvC and HNH domains that cleave the non-targeted strand and the target strand, respectively. 18 After being guided to a target sequence by a specified sgRNA, both the RuvC and HNH domains in Cas9 catalyse DNA cleavage to introduce a double-stranded break (DSB) with blunt ends. 18

| SpCas9
The Streptococcus pyogenes Cas9 (SpCas9) system mentioned above is the first and most widely applied CRISPR-Cas system harnessed for genome editing. 5

| SaCas9
The Staphylococcus aureus Cas9 (SaCas9) system is another widely studied CRISPR-Cas9 system. 20 SaCas9 is 1053 aa in size (about 3.2 kb), which is much smaller than SpCas9, thus enabling the Cas9 and sgRNA coding sequences to be carried together in a single AAV vector. 20 A crystallographic study has shown that SaCas9 has a bi-lobed structure similar to that of SpCas9, although the two proteins share only 17% sequence identity (Figure 1A). 20 SaCas9 recognizes a distinct PAM sequence, 5′ NNGRRT (Table 1). 20 It is worth noting that while the requirement for a longer PAM could largely reduce the off-target probability, it also reduces the number of potential targetable sites. Engineered variants of SaCas9 have been generated to recognize different PAM sequences such as 5′ NNNRRT, which provides opportunities to broaden the targeting range of CRISPR-SaCas9. 25
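The trade-off between PAM length and targeting range can be sketched in Python. The example below scans one strand of a DNA sequence for 20-nt protospacers immediately followed at the 3′ side by a PAM written in IUPAC codes (N = any base, R = A or G), and compares 5′ NNGRRT with the relaxed 5′ NNNRRT. The helper names (`pam_to_regex`, `find_targets`) and the toy sequences are assumptions for illustration; a real design tool would also scan the reverse complement and score guides.

```python
import re

# Subset of IUPAC nucleotide codes needed for the PAMs discussed here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]"}


def pam_to_regex(pam: str) -> str:
    """Translate an IUPAC-coded PAM such as 'NNGRRT' into a regex fragment."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_targets(seq: str, pam: str, guide_len: int = 20):
    """Return (start, protospacer, pam) for each site on the given strand.

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    pattern = re.compile(
        rf"(?=([ACGT]{{{guide_len}}})({pam_to_regex(pam)}))"
    )
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(seq.upper())]


# Toy sequences (assumption): identical protospacer, different 3' flank.
site_a = "ACGTACGTACGTACGTACGT" + "TTGAAT"  # matches NNGRRT and NNNRRT
site_b = "ACGTACGTACGTACGTACGT" + "TTCAAT"  # matches NNNRRT only

for pam in ("NNGRRT", "NNNRRT"):
    print(pam, len(find_targets(site_a, pam)), len(find_targets(site_b, pam)))
```

In this toy case the relaxed PAM recovers a site that the original PAM excludes, which is the sense in which engineered variants broaden the targeting range.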

| Other type II CRISPR-Cas systems
As the earliest CRISPR-Cas systems identified and applied for genome editing, the type II family keeps providing new choices of Cas effectors.

Other than the above-mentioned SpCas9 and SaCas9, representatives of type II CRISPR-Cas systems also include Campylobacter jejuni Cas9 (CjCas9) (Figure 1A). CjCas9 (984 aa, about 3.0 kb) is the smallest Cas9 identified so far. 26 Its PAM sequences are reminiscent of the long PAM sequence for SaCas9, but vary among different reports. 21,26 The compact size of CjCas9 has enabled the packaging of its coding sequence, together with an sgRNA cassette and a marker gene, in an all-in-one AAV vector for genome editing. 26
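The size figures quoted for the compact Cas9 effectors follow from simple arithmetic (three nucleotides per amino acid), as the Python sketch below shows. The SaCas9 and CjCas9 lengths are taken from the text; the SpCas9 length (1368 aa) and the AAV packaging limit of roughly 4.7 kb are commonly cited values that do not come from this article and should be treated as assumptions, as should the function names.

```python
AAV_CAPACITY_BP = 4700  # assumption: commonly cited approximate packaging limit


def coding_length_bp(n_aa: int, include_stop: bool = True) -> int:
    """Length of the coding sequence in bp for a protein of n_aa residues."""
    return 3 * (n_aa + (1 if include_stop else 0))


def room_left_bp(n_aa: int, capacity: int = AAV_CAPACITY_BP) -> int:
    """Space remaining for promoter, sgRNA cassette, polyA signal and so on."""
    return capacity - coding_length_bp(n_aa)


# Protein lengths in amino acids; the SpCas9 value is not stated in the text.
effectors = {"SpCas9": 1368, "SaCas9": 1053, "CjCas9": 984}

for name, n_aa in effectors.items():
    cds = coding_length_bp(n_aa)
    print(f"{name}: CDS {cds} bp (~{cds / 1000:.1f} kb), "
          f"room left {room_left_bp(n_aa)} bp")
```

Under these assumptions the smaller effectors leave on the order of 1.5 kb or more for regulatory elements and a guide cassette, which is consistent with the all-in-one vector designs mentioned above.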

| TYPE V: THE CRISPR-CAS12 SYSTEMS
Identification and characterization of Class 2 CRISPR-Cas systems other than the type II systems have helped expand the CRISPR-Cas arsenal for nucleic acid editing. 31 Among them, Cpf1 (later renamed Cas12a) was the earliest to be characterized, 32 and subsequently, C2c1 (Cas12b) and other type V (Cas12) systems were identified (Table 1 and Figure 1B). 33-37 Although type V CRISPR-Cas systems show great diversity, they share some common characteristics that distinguish them from the Cas9 systems. Firstly, the Cas12 nucleases possess one RuvC nuclease domain but no HNH domain, and they recognize a T-rich PAM located 5′ upstream of the target region on the non-targeted strand, unlike Cas9 systems, which rely on a G-rich PAM at the 3′ side of target sequences (Table 1 and Figure 1B). 32 Secondly, Cas12 generates staggered DSBs distal to the PAM sequence, unlike Cas9, which generates blunt ends at a site proximal to the PAM. 32 The staggered DSBs created by Cas12 may support a unique targeting strategy for gene knock-in via the non-homologous end-joining (NHEJ) mechanism. 32 Despite these common properties, the Cas12 systems present vast structural and functional diversity. In the following paragraphs, the characterized members of the type V CRISPR-Cas family are introduced.
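The two contrasts drawn above (PAM side and end type) can be captured in a small data structure. The Python sketch below is illustrative only: the class and function names are hypothetical, and the cut offsets (3 nt from the PAM on both strands for a Cas9-like enzyme; 18 and 23 nt from the PAM for a Cas12a-like enzyme) are values commonly reported in the literature rather than figures given in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffectorGeometry:
    name: str
    pam_side: str       # "5prime" or "3prime" of the protospacer (non-target strand)
    cut_nontarget: int  # cut on the non-target strand, nt from the PAM-proximal end
    cut_target: int     # cut on the target strand, same convention

    def end_type(self) -> str:
        """Blunt if both strands are cut at the same offset, else staggered."""
        overhang = abs(self.cut_target - self.cut_nontarget)
        return "blunt" if overhang == 0 else f"staggered ({overhang}-nt overhang)"


def split_site(site: str, pam_len: int, pam_side: str) -> tuple[str, str]:
    """Split a site written 5'->3' on the non-target strand into (PAM, protospacer)."""
    if pam_side == "5prime":
        return site[:pam_len], site[pam_len:]
    if pam_side == "3prime":
        return site[-pam_len:], site[:-pam_len]
    raise ValueError("pam_side must be '5prime' or '3prime'")


# Offsets are assumptions taken from commonly reported values.
CAS9_LIKE = EffectorGeometry("Cas9-like", "3prime", 3, 3)
CAS12A_LIKE = EffectorGeometry("Cas12a-like", "5prime", 18, 23)

for eff in (CAS9_LIKE, CAS12A_LIKE):
    print(f"{eff.name}: PAM on {eff.pam_side} side, {eff.end_type()} ends")
```

Keeping the geometry in data rather than in branching code makes it straightforward to add further effectors once their PAM side and cut positions are known.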

| Cas12a (Cpf1)
Cas12a was the first functionally characterized type V CRISPR-Cas system. 32 The sizes of Cas12a family proteins vary from about 1200 aa (3.6 kb) to about 1500 aa (4.5 kb) (Table 1 and Figure 1). 32 Cas12a proteins adopt a bi-lobed architecture consisting of a REC lobe and a NUC lobe, which is reminiscent of Cas9, 38 Figure 1). 41 Although the exact mechanism is not fully understood, studies have suggested that while cis-cleavage by Cas12a requires recognition of a PAM and a specific target sequence, non-target DNA degradation through trans-cleavage occurs in a PAM- and sequence-independent manner. 42,43 It is noteworthy that the trans-cleavage activity of Cas12a has been re-purposed for highly sensitive detection of specific nucleic acid sequence(s). 41

| Cas12b (C2c1)
The Cas12b proteins are another group of type V CRISPR-Cas effectors that have DNA-cleaving activity (Table 1 and Figure 1B). 34 Alicyclobacillus acidoterrestris Cas12b (AacCas12b), an example of  Figure 1B). 34 Unlike Cas12a, however, Cas12b requires both crRNA and tracrRNA to form an effector complex for targeted DNA cleavage, which yields staggered DSBs with seven-nucleotide overhangs. 33,44 Interestingly, structural analysis showed that Cas12b employs a distinct strategy for target DNA recognition and cleavage when compared to Cas12a and Cas9. AacCas12b, for example, binds to the sgRNA as a binary complex and then to target DNA as a ternary complex, which permits the capture and cleavage of the target and non-target DNA strands independently. 14,44 It is worth noting that the Cas12b system is highly sensitive to any single-base mismatch within the 20 nt spacer region, which suggests high specificity of CRISPR-Cas12b systems. 33 However, it should also be noted that some Cas12b effectors, such as AacCas12b, possess ssDNA-cleaving activity similar to that of Cas12a. 41
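Sensitivity to single-base mismatches within the spacer region can be made concrete by comparing a guide with a candidate site position by position. The Python sketch below reports the 1-based mismatch positions and applies a simple tolerance threshold. The function names, the toy sequences and the threshold are assumptions for illustration; this is a sequence comparison, not a model of the enzyme's actual mismatch tolerance.

```python
def mismatch_positions(guide: str, site: str) -> list[int]:
    """Return 1-based positions at which guide and site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return [i for i, (g, s) in enumerate(zip(guide.upper(), site.upper()), start=1)
            if g != s]


def is_candidate(guide: str, site: str, max_mismatches: int = 0) -> bool:
    """True if the site differs from the guide at no more than max_mismatches positions."""
    return len(mismatch_positions(guide, site)) <= max_mismatches


# Toy 20-nt spacer and a site differing at position 10 (assumption).
guide = "ACGTACGTACGTACGTACGT"
site = "ACGTACGTAGGTACGTACGT"

print(mismatch_positions(guide, site))             # positions that differ
print(is_candidate(guide, site))                   # strict: no mismatch allowed
print(is_candidate(guide, site, max_mismatches=1)) # relaxed: one mismatch allowed
```

With `max_mismatches=0` the function mirrors an effector that rejects any single-base mismatch; raising the threshold mimics a more permissive one.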

| Other type V CRISPR-Cas systems identified from metagenomic data
Recently, several other type V CRISPR-Cas members besides Cas12a and Cas12b have also been identified from metagenomic data through bioinformatic pipelines (Table 1 and Figure 1B). 35

| TYPE VI: THE CRISPR-CAS13 SYSTEMS
Up to now, reported members of the type VI CRISPR-Cas family include Cas13a, Cas13b, Cas13c and Cas13d (Table 1 and Figure 1C). Distinct from Cas9 and Cas12, the Cas13 proteins cleave single-stranded RNA (ssRNA) rather than DNA. 45 The subtypes of the Cas13 systems have their unique features while sharing some common characteristics. There is no DNA-cleaving catalytic domain in Cas13 proteins; instead, researchers identified two conserved higher eukaryotes and prokaryotes nucleotide-binding (HEPN) domains, each containing an RNA cleavage site (Figure 1C). 45 Members of the CRISPR-Cas13 family work as dual-component systems, in which a crRNA forms a complex with the Cas13 protein without involving any tracrRNA. 46 The flanking sequence of the protospacer, termed the "protospacer-flanking site" (PFS) and comparable to the PAM for Cas9 and Cas12, is essential for the RNA-targeting process (Table 1). 45 Another distinctive feature of the Cas13 systems is the collateral cleavage activity towards non-targeted, non-specific RNAs in the reaction environment. Upon binding to the targeted RNA, the catalytic pocket formed by the two HEPN domains is activated and can indiscriminately cleave exposed RNA in the solution, including endogenous RNAs of housekeeping genes. 46 This promiscuous RNase activity may protect bacteria from virus spread via infection-triggered cell death and dormancy induction. 45 Several subtypes of CRISPR-Cas13 systems have been introduced as potential tools for RNA editing. Among them, the structures and activities of Cas13a, Cas13b and Cas13d have been studied (Table 1 and Figure 1C). 45,47,48

| Cas13b
The Cas13b system is another member of type VI CRISPR-Cas13 systems characterized after Cas13a (Table 1 and Figure 1C). 47

| Cas13d
Cas13d is a new RNA-targeting effector whose crystal structure has recently been defined (Table 1 and Figure 1C). 52 Similar to other Cas13 effectors, it has two HEPN domains responsible for pre-crRNA processing (Figure 1C). 53 Meanwhile, Cas13d possesses appealing characteristics for RNA editing. Its small size (less than 1000 aa, about 3 kb) favours delivery to designated organ systems via AAV, and the lack of a PFS requirement makes it less restrictive in selecting target sequences for RNA editing. 48

| HIGH-EFFICIENCY GENOME EDITING AND NEW APPLICATIONS ENABLED BY CATALYTICALLY ACTIVE CAS9 AND CAS12
The unprecedented high efficiency, site-specific targeting and

| Introducing insertions or deletions through targeted DNA cleavage followed by NHEJ repair
Cas9 and Cas12 cleave dsDNA and generate DSBs with blunt and staggered ends, respectively. The subsequent repair via the highly active but error-prone NHEJ mechanism can lead to the efficient introduction of small insertions/deletions (indels) at the target sites, which can disrupt the translational reading frame of a coding sequence or the binding sites of trans-acting factors in promoters or enhancers. The first evidence for successful genome editing mediated by CRISPR-Cas9 in mammalian cells was the detection of indels specific to the pre-selected target sites after Cas9 cleavage. 5,6 Soon this strategy was applied in mouse zygotes and achieved the one-step generation of a Tet1/Tet2 double knockout mouse line. 54 Recently, the NHEJ repair mechanism has also been exploited to capture large foreign DNA at Cas9-induced DSBs, thus establishing a new homology-independent knock-in approach. 55 With the aid of a promoterless reporter system targeting the ubiquitously expressed GAPDH, He et al carried out a side-by-side comparison between homology-directed repair (HDR)- and NHEJ-mediated knock-in. They found that the NHEJ pathway mediated homology-independent knock-in at previously unattainable high efficiencies, which was superior to commonly used HDR methods in all human cell lines examined. 56 Despite the fact that the HDR mechanism mediates precise knock-in of desired sequences, recent studies showed that unwanted insertions of donor templates and a high frequency of indels at target sites were introduced through NHEJ repair, which might not be avoidable. 61,62,68 Hence, further investigations into the functional impact of these unwanted modifications would be of interest.
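The reading-frame argument for NHEJ-induced indels can be illustrated with a few lines of Python. An indel whose length is not a multiple of three shifts every downstream codon, whereas a three-nucleotide deletion removes one codon and leaves the rest of the frame intact. The toy coding sequence and the function names are assumptions for illustration.

```python
def is_frameshift(indel_len: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_len % 3 != 0


def apply_deletion(cds: str, start: int, length: int) -> str:
    """Delete `length` nucleotides beginning at 0-based index `start`."""
    return cds[:start] + cds[start + length:]


def codons(cds: str) -> list[str]:
    """Split a coding sequence into complete codons, dropping any remainder."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


# Toy coding sequence (assumption): ATG GCC AAG TTT GGC TAA
cds = "ATGGCCAAGTTTGGCTAA"

print(codons(cds))                        # original frame, ends with stop TAA
print(codons(apply_deletion(cds, 4, 1)))  # 1-nt deletion: downstream codons change
print(codons(apply_deletion(cds, 3, 3)))  # 3-nt deletion: one codon lost, frame kept
```

In the single-nucleotide case the original stop codon is no longer read in frame, which is the typical route by which small indels knock out a gene.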

| Advanced technologies for transgenesis
The programmable targeted genome editing brought about by the CRISPR-Cas9 systems has enabled genetic modifications that were previously impossible, such as those in lower vertebrates and large mammals. 55,69 CRISPR-based technology has transformed the field of transgenesis, and it is now possible to generate genetically modified animal models in almost any species. In line with this development, various delivery methods have been established. For small organisms, the Cas9, sgRNA and donor templates can be easily delivered in the form of plasmids, mRNAs or in vitro assembled ribonucleoprotein complexes (RNPs) through direct injection into the gonads of, for example, C. elegans, 70 or into the zygotes and pre-blastoderm embryos of mice 61 and Drosophila. 71 Compared to the conventional transgenic technologies that involve a series of labour-intensive procedures and often take more than a year to produce genetically modified mice, 72

| Genome editing for disease therapy
Other than introducing genetic modifications that can be passed on across generations, CRISPR-Cas systems have also enabled the previously impossible somatic genome editing owing to the high editing efficiency permitted. By coupling with the AAV system, which is a clinically potent and safe vector for in vivo gene delivery, 77 104 to labelling molecules, 105 which will be discussed in this part.

| Base editing by fusing Cas9 with different deaminases
Cytidine deaminases such as APOBEC1 and AID catalyse the conversion of cytidine to uridine (C → U), which is subsequently read as thymidine. When fused with a dCas9 or nickase Cas9 (D10A), they become a novel tool to mediate C•G to T•A substitutions within targeted sequences, which is now known as the base editor (BE). 103 As no natural enzymes are known to deaminate adenine in DNA, an engineered RNA adenosine deaminase is fused to mutant Cas9 to realize specific A•T to G•C substitution in DNA. 106 Both types of BEs can catalyse base substitution in the genome without DNA cleavage, and can be applied under either in vitro or in vivo conditions. Since the first demonstration by Komor et al, 103 base editing systems, primarily the third-generation BE (BE3), have been applied in a wide range of cell types, including various cell types from human and mouse. 103,107 Successful base editing has also been achieved in living animals such as mice 108  found to be promising for correction of more than 2,000 potential pathogenic point mutations. 107,112 The BE systems have also been applied for genome-wide knockout or mutagenesis screenings. The CRISPR-STOP 113  Nevertheless, further investigations are necessary to explore the full potential as well as to overcome the limitations of these newly emerged base editing tools.
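The outcome of base editing at a target site can be sketched as a substitution restricted to an editing window within the protospacer. In the Python example below, the window (positions 4 to 8, counted from the PAM-distal end) is an assumption based on commonly reported values for BE3 and is not stated in this article; the function name and toy protospacer are likewise hypothetical. The sketch converts every eligible base in the window, whereas real editing efficiency varies by position and sequence context.

```python
def base_edit(protospacer: str, src: str, dst: str,
              window: tuple[int, int] = (4, 8)) -> str:
    """Replace every `src` base inside the 1-based inclusive window with `dst`."""
    lo, hi = window
    return "".join(
        dst if base == src and lo <= pos <= hi else base
        for pos, base in enumerate(protospacer.upper(), start=1)
    )


# Toy 20-nt protospacer (assumption), written 5'->3' on the edited strand.
protospacer = "GACCTCAGTACGATCGGATC"

cbe_product = base_edit(protospacer, "C", "T")  # cytosine base editor: C -> T
abe_product = base_edit(protospacer, "A", "G")  # adenine base editor: A -> G

print(protospacer)
print(cbe_product)
print(abe_product)
```

Note that both cytosines inside the window are converted in this toy case, which illustrates why bystander edits need to be considered when a window contains more than one editable base.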

| Targeted transcriptional and epigenetic regulation via Cas9-guided activators or repressors
Besides gene editing, the CRISPR-Cas9 systems have also been em-

| CRISPR-enabled high-resolution genomic imaging in live cells
Off-target editing at sequences similar to selected target sites has been observed with some commonly used Cas proteins (eg SpCas9 and SaCas9), which raises a critical concern for CRISPR-Cas technologies and is especially undesirable for therapeutic applications. 22 Aiming at a thorough and unbiased detection of off-target events, multiple experimental methods have been established to capture the genomic landscape of CRISPR-Cas9 cleavages through high-throughput sequencing technology, including GUIDE-seq, CIRCLE-seq and SITE-Seq. 132-134 While these technologies showed high sensitivity in detecting off-target events, some limitations were also noticed, including restrictions to cell models and high rates of false positives introduced by complicated experimental procedures. 135 At the same time, strategies have been adopted to enhance the specificity of CRISPR-Cas tools through protein engineering, sgRNA modifications and selection of suitable delivery methods. 136 For example, the enhanced specificity SpCas9 (eSpCas9) 137 and high-fidelity variant number 1 of SpCas9 (SpCas9-HF1) 138  light-sensitive molecules with dCas9. 131 In summary, the CRISPR-Cas technologies and their associated applications are rapidly evolving, enabling numerous novel applications whose future development may go beyond what we can currently foresee.

CONFLICT OF INTEREST
The authors declare no competing financial interests.

AUTHOR CONTRIBUTIONS
J.W. drew the figure and the table; J.W., C.Z. and B.F. wrote the paper. All authors have read and approved the final manuscript.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.