The applications of CRISPR/Cas system in molecular detection

Abstract The Streptococcus pyogenes CRISPR/Cas system has found widespread application as a gene‐editing and regulatory tool: simultaneous delivery of the Cas9 protein and guide RNAs into the cell makes the recognition of specific DNA sequences possible. A recent study showing that Cas9 can also bind to and cleave RNA in an RNA‐programmable manner suggests that this system could serve as a universal nucleic‐acid recognition tool. To increase the signal intensity of the CRISPR/Cas system, a signal amplification technique has to be exploited appropriately; this requirement remains a challenge for the detection of DNA or RNA. Furthermore, the CRISPR/Cas system may be used to detect point mutations or single‐nucleotide variants because of the specificity with which it recognizes the target sequence. These features make the technique capable of detecting pathogens during infection via analysis of their DNA or RNA. Here we summarize applications of the CRISPR/Cas system to the recognition and detection of DNA and RNA molecules as well as to signal amplification. We also describe its potential ability to detect mutations and single‐nucleotide variants. Finally, we sum up its applications to testing for pathogens and the potential barriers to its implementation.

in the sgRNA are customizable, 2 which makes the design of sgRNA easy in practice, greatly expanding its applications in biological research. In this review, we provide a brief summary of the applications of the CRISPR/Cas system to molecular detection including detection of specific sequences of DNA or RNA, single nucleotide variants and pathogens.

SPECIFIC SEQUENCES OF DNA OR RNA
DNA fluorescence in situ hybridization (FISH) has been widely used to visualize specific gene sequences for research and diagnostic purposes. Nonetheless, it requires heat treatment and formamide to denature double-stranded DNA (dsDNA) and enable probe hybridization; these treatments may affect the integrity of the biological structure and the organization of the genome. Accordingly, Baohui Chen et al 7 employed an enhanced green fluorescent protein (EGFP)-tagged dCas9 protein and a structurally optimized sgRNA to image repetitive elements and non-repetitive genomic sequences. They modified the sgRNA design to enhance its stability and to promote its assembly with the dCas9 protein. They introduced an A-U base pair flip to remove a putative Pol-III terminator in the sgRNA stem loop, which avoids premature termination of U6 Pol-III transcription, and extended the dCas9-binding hairpin structure to improve sgRNA-dCas9 binding.
Both modifications increased the targeting efficacy and decreased background and nucleolar signals. They combined the two modifications and found that this approach not only improves the efficacy of imaging but also increases the efficiency of gene regulation. They designed 73 sgRNAs targeting both DNA strands across a 5 kb non-repetitive region of the MUC4 gene and concluded that 26 to 36 sgRNAs are sufficient to detect a non-repetitive genomic locus by means of CRISPR. They also labelled different regions of the same gene, or different genes, with multiple sgRNAs and reported that the number of imaged spots increased when two genes were targeted but not when two regions of the same gene were targeted, suggesting that CRISPR can label multiple genomic elements at the same time. On the basis of this observation, Wulan Deng et al 8 devised a way to directly detect a sequence of interest in a genome without denaturation of dsDNA by applying the dCas9/sgRNA complex to FISH in fixed cells, taking advantage of this complex's strong and stable affinity for its target DNA. They coexpressed a fluorescent protein with the dCas9 protein and used an sgRNA carrying the target sequence to direct the complex, so that specific DNA sequences could be detected and visualized. Both the major satellite and telomeres, which contain hundreds to thousands of repeats, and some repetitive sequences with tens to hundreds of copies available for sgRNA targeting were imaged successfully in fixed cells. Nevertheless, for non-repetitive sequences, the labelling efficacy of the target gene was low and the background signal was quite strong, suggesting that the targeting efficacy of the sgRNA should be optimized for non-repetitive sequences. This is because the fluorescent signal of a few dCas9-sgRNA complexes in the target region was insufficient for detection. According to the research of Baohui Chen et al, 7 at least 26 unique sgRNAs are needed to target the same region to obtain a detectable signal.
On the other hand, this approach has been difficult to put into practice in biological applications because of the challenges in delivering dozens of sgRNAs into the cell and the large number of off-target sites associated with so many sgRNAs. Thus, Peiwu Qin et al 9 devised a robust fluorescent-signal amplification system based on a re-engineered sgRNA that contains up to 16 MS2 motifs, which bind to bacteriophage MS2 coat protein (MCP), so that the target can be labelled via a fluorescent tag on MCP (Figure 1A). The researchers improved the traditional method of targeting non-repetitive regions by employing four unique sgRNAs rather than 26 to 36 sgRNAs, because the amplification provided by the large number of fluorescently tagged MCP molecules reduces the number of dCas9/sgRNA complexes required for reliable detection. They also determined that a further reduction in the number of sgRNAs may be possible, although the potential loss of dCas9 targeting specificity owing to the modification of the sgRNA should be taken into account.
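The trade-off between the number of sgRNAs and the number of tags per sgRNA can be made concrete with a little arithmetic. The sketch below is only an upper-bound count under simplifying assumptions that are not taken from the cited studies: every binding site is occupied, one fluorescent tag is counted per dCas9 in the direct-labelling case, and one fluorescent tag is counted per MS2 motif in the amplified case. The function name `max_tags_at_locus` is hypothetical.

```python
def max_tags_at_locus(n_sgrnas: int, tags_per_complex: int) -> int:
    """Upper bound on fluorescent tags at one locus, assuming full occupancy."""
    return n_sgrnas * tags_per_complex


# Direct labelling: one EGFP-tagged dCas9 per sgRNA, 26 to 36 sgRNAs per locus.
direct_low = max_tags_at_locus(26, 1)
direct_high = max_tags_at_locus(36, 1)

# MS2-based amplification: 4 sgRNAs, each carrying up to 16 MS2 motifs.
# Assumption: one fluorescent MCP tag is counted per motif.
amplified = max_tags_at_locus(4, 16)

print(direct_low, direct_high, amplified)  # 26 36 64
```

Under these assumptions, four amplified sgRNAs can carry more tags than 36 directly labelled complexes, which is consistent with the reduction in sgRNA number described above.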
Apart from signal amplification and single-colour labelling, multicolour CRISPR labelling of chromosomal loci has been developed: Hanhui Ma et al 10 used three orthogonal CRISPR/dCas9 components tagged with different fluorescent proteins to detect multiple loci in the genome, so that this method can differentiate various chromosomal loci and determine the distance between them at the same time (Figure 1A). The limitation of this approach is that, apart from S. pyogenes Cas9, which requires only an "NGG" PAM sequence, many other bacterial Cas9 proteins recognize more complicated PAM sequences, which makes the choice of target sequence more restrictive. Consequently, to circumvent these issues, Yi Fu et al 11 developed a new method that labels a target site not with fluorescently tagged dCas9 but with a newly engineered sgRNA containing the RNA stem loop motifs MS2 and PP7, which bind to the bacteriophage coat proteins MCP (MS2 coat protein) and PCP (PP7 coat protein). Furthermore, the RNA-binding viral proteins MCP and PCP can be labelled with different colours via fluorescent proteins so that two distinct targets can be visualized simultaneously.
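The PAM requirement mentioned above determines which sites in a sequence can be targeted at all. As a minimal sketch, the following code scans the forward strand of a DNA string for candidate sites consisting of a protospacer followed by an "NGG" PAM. The function name `find_pam_sites`, the 20-nt protospacer length default, and the test sequence are illustrative assumptions; the reverse strand and off-target scoring are deliberately left out. A more complicated PAM could be supplied through the `pam_regex` parameter.

```python
import re


def find_pam_sites(seq: str, spacer_len: int = 20, pam_regex: str = "[ACGT]GG"):
    """Return (start, protospacer, pam) tuples found on the forward strand.

    A zero-width lookahead is used so that overlapping candidates are reported.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (spacer_len, pam_regex))
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical sequence: 20 nt of ACGT repeats followed by the PAM "AGG".
example = "acgt" * 5 + "agg"
for start, protospacer, pam in find_pam_sites(example):
    print(start, protospacer, pam)
```

A longer or more degenerate PAM pattern matches fewer positions in a given sequence, which is one way to see why such PAMs constrain target choice.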
As in Yi Fu's study, Siyuan Wang 12 conducted analogous research and found that extending the tetraloop and stem loop 2 of the sgRNA with MS2 or PP7 aptamers can increase the signal-to-background ratio of chromatin imaging. In the meantime, Hanhui Ma et al 13 proposed an approach similar to but more advanced than the above method. They selected two of three hairpins (MS2, PP7, and boxB) to fuse to stem loops or to the 3′ end of the sgRNA, and this approach yielded six combinations. Therefore, the final sgRNAs were capable of recruiting six different pairs of fluorescent proteins fused to RNA hairpin-binding proteins, which can recognize the corresponding RNA elements. In the end, six colours were presented through different combinations. Thus, the simultaneous imaging of six chromosomal loci was feasible with this approach.
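The count of six combinations can be reproduced by enumeration. The sketch below assumes that "two of three hairpins" allows the same hairpin to be used twice on one sgRNA, since this is the reading under which three hairpins give six unordered pairs; that interpretation is an assumption made for illustration.

```python
from itertools import combinations_with_replacement

hairpins = ["MS2", "PP7", "boxB"]

# Unordered pairs of hairpins, with repetition allowed (assumption, see above).
pairs = list(combinations_with_replacement(hairpins, 2))

for pair in pairs:
    print("+".join(pair))

print(len(pairs))  # 6
```

Each pair recruits its own combination of hairpin-binding proteins, so each pair can in principle be read out as a distinct colour.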
In view of the development of DNA detection technologies, from repetitive to non-repetitive sequences, the challenge was to enhance the signal released from the locus of the target sequence labelled with the re-engineered sgRNA/dCas9-viral RNA-binding protein-fluorescent protein complex. Comparison of the detection of repetitive and non-repetitive sequences revealed that the former yields an obvious signal amplification effect because its abundant copies can bind many more sgRNA molecules, which produces a stronger signal. The latter cannot produce such a strong signal in this way. Therefore, the researchers devised two other methods to amplify the signal: one involves transduction of large numbers of sgRNAs targeting different regions of the same gene, and the other involves attaching more fluorescent proteins to a single sgRNA. The limitations of the first method are the following: (a) as the number of sgRNAs increases, off-target effects become more serious and unavoidable; and (b) it is technically difficult to deliver so many sgRNAs into a cell. The second method appears suitable for detection of a single non-repetitive sequence, but it still required four sgRNAs, and lattice light sheet microscopy was needed to obtain a reliable signal. From single-locus to simultaneous multilocus detection, various improvements have been made to add fluorescent colours, though the method was still restricted to six colours. Accordingly, there is still much room for improvement.
One study indicates that the HNH nuclease domain of Cas9 is homologous to other HNH domains that cleave RNA substrates. 14 The Doudna lab 15 combined this result with their prior discovery 16 that single-stranded DNA (ssDNA) targets can be activated for cleavage by a separate PAMmer (an oligonucleotide that hybridizes to the target and functions as the PAM).
They wondered whether Cas9 can cleave ssRNA targets when the PAMmer is present on the complementary side. Thus, they tested a series of DNA and RNA substrates in in vitro cleavage experiments and concluded that deoxyribonucleotide-containing PAMmers can specifically activate Cas9 to cleave ssRNA, whereas ribonucleotide-based PAMmers cannot. They also found that even without the PAMmer, dCas9 can bind to the target ssRNA, but the binding affinity is much weaker than with a PAMmer, and that high-affinity binding may not need correct base pairing between the guide RNA and the ssRNA target, especially if the complementary PAMmer is present. Moreover, extension of the 5′ end of a PAMmer can improve the binding specificity between the guide sequence and the PAMmer-ssRNA target, although binding affinity and cleavage efficacy may undergo concomitant losses. Most importantly, their lab demonstrated that Cas9 can be specifically directed to bind or cut RNA targets while avoiding the corresponding DNA sequences via custom design of the PAMmer sequence. In addition, they applied this approach to isolate endogenous GAPDH transcripts from a cell lysate under physiological conditions. At first, they obtained only two GAPDH-specific RNA fragments. After complete elimination of RNase H-mediated RNA cleavage, however, they successfully isolated intact GAPDH mRNA and observed that in the absence of a PAMmer, GAPDH mRNA can still be isolated, though with lower efficacy. These data indicate that the Cas9/gRNA complex binds to GAPDH mRNA through direct RNA-RNA hybridization and that this approach can help to purify endogenous untagged RNA transcripts.
Research on CRISPR/Cas9 targeting of RNA began to gain popularity after the Doudna lab's study. David M. Shechner et al 17 developed a method for locus-specific targeting of long non-coding RNAs in vivo, called CRISPR-Display. This method uses a nuclease-deficient S. pyogenes Cas9 mutant (dCas9) to deploy a large RNA cargo to targeted DNA loci by incorporating the cargo into the sgRNA. In other words, the sgRNA becomes a longer sgRNA that contains a sequence of interest such as artificial aptamers, pools of random sequences, or natural long non-coding RNAs. The sgRNA binds to genomic loci, so that the localization of the RNA sequence of interest can finally be determined. A constraint of this method concerns the length of the inserted RNA: inserted domains of up to about 4.8 kb have been shown to work. An approach that differs from inserting the RNA sequence into the sgRNA to display RNA domains soon followed. The Doudna lab demonstrated that nuclease-inactive CRISPR/Cas9 can bind RNA in a nucleic-acid-programmed manner and allows for endogenous RNA tracking in live cells 18 ; this approach was more convenient than the design of a new sgRNA. The researchers called this nucleus-localized RNA-targeting Cas9 "RCas9" and confirmed that RCas9 can track RNA to oxidative-stress-induced aggregates of RNA and/or proteins that are thought to be involved in neurodegeneration. 19 These findings suggest that RNA tracking and targeting may offer a convenient way to explore the basic pathogenesis and pathological processes of diseases. Because all cells of an individual contain almost the same DNA, the functional distinctions between cell types are closely related to the portions of the genome that are transcriptionally active. As a result, expression of RNA is linked to many diseases. For example, the expression of certain small non-coding RNAs known as microRNAs is increasingly recognized as a characteristic feature of oncogenic transformation.
Tumour microRNA signatures can act as biomarkers that show the type of cancer and associated clinical outcomes. 20,21

MUTATIONS AND SINGLE-NUCLEOTIDE VARIANTS (SNVs)
It is well known that CRISPR/Cas9 has high efficacy of site-specific gene targeting, but its potential off-target effects have raised major concerns regarding its application in many respects. What's more, in The authors also utilized a human colorectal cancer cell line (HCT116) that carries a gain-of-function 3 bp deletion to test whether this assay can analyse naturally occurring variations. The results showed that PCR products amplified from the cells harbouring only wild-type alleles were cleaved by the wild-type-specific RGEN and were not cleaved at all by the mutation-specific RGEN.
Meanwhile, PCR products from heterozygous cells were cleaved partially by both the wild-type-specific and the mutant-specific RGENs; this result shows that the assay can be used to analyse mutations. In contrast, the results on HEK293 cells harbouring a 32 bp deletion (del32) indicated that the del32-specific RGEN cleaves the PCR products from wild-type cells as effectively as those from HEK293 cells, which points to the inaccuracy of this cleavage. Later, they found that this RGEN has an off-target site with a single-base mismatch downstream of the on-target site, which means that this method is not specific enough to distinguish sequences with SNVs owing to off-target effects.
The authors decreased RGEN activity by using a single-base-mismatched guide RNA instead of a perfectly matched one, so that the RGEN can discriminate between a wild-type sequence and a mutant sequence that differ by a single base. Incidentally, they had shown in another paper that the RGEN needs two mismatched bases, not one, to discriminate on-target sites from off-target sites. 25 They therefore deliberately created one more base mismatch to distinguish the wild-type sequence from a mutant sequence, and succeeded. Moreover, they could also detect point mutations in the BRAF and NRAS genes using RGENs that recognize the "NAG" PAM sequence.
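The logic of the deliberately mismatched guide can be illustrated with a toy model. The sketch below assumes a simple threshold rule in which a site is cleaved when it has at most one mismatch with the guide and is spared when it has two or more; real RGEN activity also depends on mismatch position and sequence context, which this model ignores. The sequences, the substitution positions, and the function names are hypothetical, and the guide is written as the DNA-equivalent of the protospacer for simplicity.

```python
def substitute(seq: str, pos: int, base: str) -> str:
    """Return seq with the base at pos replaced; the new base must differ."""
    if seq[pos] == base:
        raise ValueError("substitution must change the base")
    return seq[:pos] + base + seq[pos + 1:]


def count_mismatches(guide: str, target: str) -> int:
    """Count positions at which guide and target differ (equal lengths)."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have equal length")
    return sum(g != t for g, t in zip(guide, target))


def is_cleaved(guide: str, target: str, max_mismatches: int = 1) -> bool:
    """Toy rule: cleave when the mismatch count is within tolerance."""
    return count_mismatches(guide, target) <= max_mismatches


wild_type = "GACTTGCAGTCAGGTACCTA"          # hypothetical 20-nt target
mutant = substitute(wild_type, 10, "T")     # hypothetical point mutation

perfect_guide = wild_type
mismatched_guide = substitute(wild_type, 15, "G")  # one deliberate mismatch

# A perfectly matched guide tolerates the single mutant mismatch.
print(is_cleaved(perfect_guide, wild_type), is_cleaved(perfect_guide, mutant))
# The deliberately mismatched guide separates the two alleles.
print(is_cleaved(mismatched_guide, wild_type), is_cleaved(mismatched_guide, mutant))
```

In this model the perfectly matched guide cleaves both alleles, whereas the mismatched guide has one mismatch with the wild-type sequence and two with the mutant, so only the wild-type sequence is cleaved.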
It is not easy to control the recognition and cleavage of the RNA-guided Cas9 nuclease because its specificity depends on the length of the sgRNA 26 and on the sequence, number, position and distribution of mismatches. 27   (formerly C2c2), which is responsible for the processing and maturation of crRNA and can degrade non-targeted RNA after the RNA-guided cleavage of targeted RNA. In addition, they found that distinct active sites within the Cas13a protein catalyse pre-crRNA processing and RNA-directed RNA cleavage. They also determined that the two distinct catalytic activities of Cas13a can be harnessed together for RNA detection: activated Cas13a is able to cleave thousands of non-targeted RNAs after cleavage of the target RNA, which enables potent signal amplification.
Feng Zhang's group also published several papers on a nucleic acid detection method called SHERLOCK, which achieves attomolar sensitivity and single-nucleotide mismatch specificity by means of Cas13a in tandem with recombinase polymerase amplification (RPA); RPA can be coupled with T7 transcription to convert amplified DNA to RNA for subsequent detection. 38-40 The sensitivity of the method combined with RPA is higher than that obtained with any other isothermal amplification method. Cas13a is an RNA-guided RNase that cleaves regions complementary to the crRNA, thereby providing a platform for specific RNA recognition and cleavage. In addition, after cleaving the target RNA, Cas13a goes on to cleave nearby non-targeted RNAs. To take advantage of this function, the authors added a non-targeted reporter RNA to the reaction, which releases a signal when it is cut by the Cas13a-mediated collateral RNA cleavage.
They found that the cleavage products of Cas13a can act as activators of Csm6, which is a CRISPR type-III effector nuclease. 41,42 The main features of the three methods are summarized in Table 1.
As for the time and cost, the two methods that involve the CRISPR/Cas9 system take ~3 hours, whereas the dCas9/sgRNA complex combined with FISH is less expensive. Regarding sensitivity, 10 colony-forming units (cfu)/mL is required for detecting MRSA, which

TABLE 1 Brief summary of applications in detecting pathogens of three main methods (first column heading: Main detection system)

44 which also complicates the design of sgRNA. Secondly, the "off-target" effects, which may cause false negative or false positive results, need to be considered. The frequency of off-target sgRNA binding varies a lot, ranging from very few off-target sites to a great many. 25,27,45,46 However, target sequences can be selected with online software to help reduce the probability of off-target binding, 47 and more specific variants of the Cas9 protein 48,49 or CRISPR systems of other types 50,51 may help to address this issue.
As the aforementioned off-target effects are reduced in the future, the accuracy of CRISPR/Cas technology in molecular detection will also improve. Thirdly, RNA is very fragile because of ubiquitous RNases, so the detection of nucleic acids of interest and of mutations is prone to interference. Therefore, it is important to make sure that the longer sgRNA designed for signal amplification in CRISPR/Cas9-based nucleic-acid detection is not truncated or degraded by RNases, which may cause a false negative result. As for the CRISPR/Cas13a detection system, the SHERLOCK platform is well regarded for its extremely high sensitivity. Nevertheless, SHERLOCK relies on an exponential pre-amplification that saturates quickly after the reaction starts, which makes accurate quantification in real time quite difficult. 39 More work is needed to find a proper way to quantify the detection and to achieve a wider linear range. Collectively, further studies that address these limitations of CRISPR/Cas technology will pave the way for in vitro molecular detection in human diseases, including different types of cancer.
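Why a fast-saturating exponential pre-amplification hampers quantification can be seen in a toy model. The sketch below is not a model of RPA kinetics; it simply assumes that the number of copies doubles at each step until it reaches a fixed ceiling. The function name `amplify`, the doubling rule, and the capacity value are all illustrative assumptions.

```python
def amplify(copies: int, doublings: int, capacity: int = 10**9) -> int:
    """Double the copy number at each step, capped at a fixed capacity."""
    for _ in range(doublings):
        copies = min(copies * 2, capacity)
    return copies


# Early in the reaction, the outputs still reflect the 1000-fold input difference.
print(amplify(1, 10), amplify(1000, 10))  # 1024 1024000

# After saturation, both inputs give the same endpoint signal.
print(amplify(1, 40), amplify(1000, 40))  # 1000000000 1000000000
```

In this model the input amount can only be inferred while the reaction is still below the ceiling, which illustrates why a narrow window before saturation limits the linear range of quantification.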

CONCLUSIONS
Since the discovery of the CRISPR/Cas system, it has been widely used for genomic editing to treat some mutation-induced diseases.
Nevertheless, there is still considerable controversy regarding the applications of CRISPR/Cas9 in medical treatments because of the risk of "off-target" effects. From the research of Yanfang Fu et al, 52 we know that off-target sites that differ from the intended target site by up to five nucleotides may be mutagenized by CRISPR/Cas9 at even higher frequencies than the intended on-target sites. This thought-provoking result means that the application of this technique in medicine requires caution, because the human genome contains countless potential off-target sites that have four or five mismatches with the expected target sequence. Besides, there are some technical challenges, such as the need to improve editing efficacy and to select suitable delivery methods. Thus, we aim to apply this technique to disease diagnosis, because many diseases are caused by genetic changes. We want to combine this technology with other approaches to detect DNA or RNA, mutations, and SNVs; the combined method can facilitate the diagnosis of infections with some pathogens and thereby diagnose disease at the molecular level with greater precision and reliability and without safety concerns.