Rapid and accurate species identification for ecological studies and monitoring using CRISPR‐based SHERLOCK

Abstract One of the most fundamental aspects of ecological research and monitoring is accurate species identification, but cryptic speciation and observer error can confound phenotype‐based identification. The CRISPR‐Cas toolkit has facilitated remarkable advances in many scientific disciplines, but the fields of ecology and conservation biology have yet to fully embrace this powerful technology. The recently developed CRISPR‐Cas13a platform SHERLOCK (Specific High‐sensitivity Enzymatic Reporter unLOCKing) enables highly accurate taxonomic identification and has all the characteristics needed to transition to ecological and environmental disciplines. Here we conducted a series of “proof of principle” experiments to characterize SHERLOCK’s ability to accurately, sensitively and rapidly distinguish three fish species of management interest co‐occurring in the San Francisco Estuary that are easily misidentified in the field. We improved SHERLOCK’s ease of field deployment by combining the previously demonstrated rapid isothermal amplification and CRISPR genetic identification with a minimally invasive and extraction‐free DNA collection protocol, as well as the option of instrument‐free lateral flow detection. This approach opens the door for redefining how, where and by whom genetic identifications occur in the future.


BAERWALD ET AL.
in remote field locations or in countries with developing scientific infrastructure. A field-deployable genetic-based approach would allow biologists to quickly identify species in the field. Rapid species identification ensures accurate data collection in ecological studies and can be critically important for time-sensitive species management and compliance with laws protecting threatened species. For example, human activities (e.g., logging, fishing, sample collection) that may jeopardize a protected species must be rapidly altered or stopped once the number of individuals permitted to be "taken" by this activity under U.S. Endangered Species Act regulations has been met. Field-deployable identification will allow scientists working in remote locations and developing nations to conduct their own genetic species identification in situ. Customs agents and wildlife forensics specialists could also benefit from rapid species identification at border crossings or crime scenes, respectively. CRISPR (clustered regularly interspaced short palindromic repeats)-based genetic methods could be ideal for species identification, due to their diagnostic specificity, sensitivity and speed (Knott & Doudna, 2018). The recently developed CRISPR-based SHERLOCK nucleic acid detection platform has shown promise in the fields of diagnostic healthcare (Gootenberg et al., 2017; Myhrvold et al., 2018) and agriculture. SHERLOCK combines isothermal amplification with the functional capability of Cas13a to indiscriminately cleave RNA (including reporter RNA) only after it detects a specific target sequence. Many of the qualities that make it attractive for field deployment in healthcare and agriculture (e.g., rapid detection, single temperature reaction condition, high sensitivity, low cost) make it well suited to transition to an ecological context.
In this study, as a proof of principle, we engineered SHERLOCK DNA assays that do not require DNA extraction or specialized equipment to genetically distinguish three morphologically similar fish species (Figure 1a) with range overlap in California's San Francisco Estuary (SFE) (Figure S1). Specifically, we sought to reliably distinguish the US threatened and California endangered delta smelt (DSM; Hypomesus transpacificus) (CDFW, 2019; USFWS, 1993), the California threatened longfin smelt (LFS; Spirinchus thaleichthys) (CDFW, 2019) and the non-native wakasagi (WAG; Hypomesus nipponensis). All three are members of the family Osmeridae and are particularly difficult to distinguish morphologically at younger life stages. For example, a recent study found substantial morphology-based field misidentifications between juvenile delta smelt and wakasagi (Benjamin et al., 2018). Field-deployable genetic assays for these species would enable real-time decision-making when evaluating protected species "take." Real-time knowledge of take would benefit the year-round ecological monitoring programmes occurring throughout the SFE as well as state and federal water export facilities, which are required to substantially scale back exports when take limits for protected species are exceeded. More generally, a SHERLOCK-enabled field taxonomic identification method could be broadly utilized by nonmolecular biologists working in the fields of ecology, conservation biology and environmental monitoring for any target species.

| Production of LwCas13a, crRNAs and gBlock fragments
LwCas13a protein was synthesized and purified by GenScript.
Guide CRISPR RNAs (crRNAs) were synthesized by Integrated DNA Technologies (IDT) as ultramer RNA and rehydrated following Dharmacon's synthetic guide RNA resuspension protocol (Dharmacon). Gene block (gBlock) fragments were synthesized by IDT for use in sensitivity and qPCR reactions.

| RPA primer and crRNA design
We used published mitochondrial cytochrome b (cyt-b) sequences for DSM, LFS and WAG (Baerwald, Schumer, Schreier, & May, 2011; Brandl et al., 2015) to identify diagnostic polymorphisms between the species for recombinase polymerase amplification (RPA) primer design (Figure 1b; Figure S2, Tables S1-S3). Sequences were downloaded from NCBI and then aligned in MEGA7 (Kumar, Stecher, & Tamura, 2016). RPA primers for delta smelt were taken directly from Baerwald et al. (2011) while new primers were designed for both longfin smelt and wakasagi (Table S1) using Primer3web version 4.1.0 (Koressaar & Remm, 2007; Untergasser et al., 2012). Forward primers contained the T7 promoter sequence (TAATACGACTCACTATAGGG) at the 5′ end along with four or five additional bases to increase binding affinity.

FIGURE 1 Accurate, sensitive and rapid species-specific diagnostics using DNA from tissue with the SHERLOCK platform. DSM = delta smelt, LFS = longfin smelt, WAG = wakasagi. (a) Phenotypic comparison of three osmerid fish species (juvenile life stage) targeted for SHERLOCK differentiation (Photo credit: Rene C. Reyes). ST = listed as threatened by state of California; SE = listed as endangered by state of California; FT = listed as threatened by USA. (b) SHERLOCK schematic. DNA is extracted from a small amount of tissue. In a single reaction, the DNA is converted to amplified target RNA that binds to target-specific crRNA. Activated Cas13a then collaterally cleaves an RNA reporter, causing fluorescence. (c) Osmerid specificity based on fluorescence after 1 hr. DNA from each target species was tested against all three species-specific crRNAs. N = 40 biological replicates for target species and N = 20 biological replicates for nontarget species for each crRNA. Boxplots display the median and interquartile range for each DNA and crRNA assay combination. (d) Species-specific identification with each column representing a crRNA and each row representing a common fish species found in the San Francisco Estuary. Fluorescence values are the background-subtracted average from two biological replicates per species followed by normalization for each smelt species assay. (e) Limit of detection for each SHERLOCK assay using species-specific crRNA and serial dilutions of species DNA derived from a synthetic template. Fluorescence was measured after 1 hr and bars represent means (± 1 SD) from three technical replicates. (f) SHERLOCK time-course. Fluorescence of species-specific crRNA combined with 20 ng DNA from each target species. Fluorescence was measured every 5 min over a 110-min time-course. Three biological replicates were averaged per species (± 1 SD). (g) Comparison of delta smelt SHERLOCK and qPCR time-course. SHERLOCK conditions and results are same as in (f). The qPCR also used 20 ng DNA as template and amplified the same cyt-b region as SHERLOCK by using a TaqMan assay. Fluorescence was measured every 5 min over a 1-hr time-course. Three biological replicates were averaged (± 1 SD).

The crRNAs were designed following the guidelines in Gootenberg et al. (2017). Each crRNA was 67 nucleotides in length with a 28-nucleotide spacer sequence and contained the T7 binding sequence (Table S2). None of our protospacer flanking sites (PFS) for the smelt species contained G, which reduces LwCas13a cleavage robustness. We introduced a mismatch in position 5 of the spacer to increase the specificity of LwCas13a (Gootenberg et al., 2017). The Multiple Primer Analyzer tool (ThermoFisher Scientific) was used to screen RPA primers and crRNAs, and any that formed self-dimers or cross-primer dimers were not taken through to production.
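The two sequence manipulations described above (adding the T7 promoter to a forward primer, and placing a single mismatch at position 5 of a 28-nucleotide spacer) can be illustrated with a minimal sketch. The function names, the example primer and the example spacer are hypothetical and are not taken from Tables S1 or S2. The sketch assumes that position 5 is counted from the 5′ end of the spacer and that the mismatch is made by swapping in the complementary base; the text does not specify either choice, and the four or five additional bases mentioned above are not modelled.

```python
T7_PROMOTER = "TAATACGACTCACTATAGGG"  # T7 promoter sequence as given in the text

# Assumption: the mismatch is created by substituting the complementary RNA base.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def add_t7_promoter(primer: str, promoter: str = T7_PROMOTER) -> str:
    """Prepend the T7 promoter to the 5' end of a forward primer (DNA)."""
    return promoter + primer.upper()


def introduce_mismatch(spacer: str, position: int = 5) -> str:
    """Return an RNA spacer with one base changed at a 1-based position.

    Assumption: positions are counted from the 5' end of the spacer.
    """
    spacer = spacer.upper()
    if not 1 <= position <= len(spacer):
        raise ValueError("position lies outside the spacer")
    index = position - 1
    swapped = RNA_COMPLEMENT[spacer[index]]
    return spacer[:index] + swapped + spacer[index + 1:]


# Hypothetical inputs, for illustration only.
example_primer = "ACGTTGCA"
example_spacer = "AUGGCAUCCGUAGCUAGGCUAACGUUCA"  # 28 nt

print(add_t7_promoter(example_primer))
print(introduce_mismatch(example_spacer))
```

In practice, the substituted base would be chosen against the alignment of the three species rather than by a fixed rule.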

| DNA extraction from tissue
DNA was extracted from a 2 × 2-mm caudal fin tissue piece using the DNeasy Blood and Tissue kit (Qiagen). We followed the manufacturer's protocol with a few modifications. Dissected fin was incubated at 56°C overnight (~16 hr) in Buffer ATL and Proteinase K solution.
Samples were eluted in 100 µl of DNase/RNase-free water. DNA concentration was measured using the Qubit dsDNA Broad Range Assay Kit read on the Qubit 2.0 (Life Technologies).

| SHERLOCK assay
Detection reactions were performed as described by Gootenberg et al. RPA amplification (TwistAmp Basic RPA kit, TwistDx) occurred in a total volume of 12 µl (excluding DNA input). Two microlitres of DNA input was added to each reaction. Reactions were set up in a laminar flow hood to reduce the chance of contamination.
Reactions were carried out in BioRad white shell qPCR plates and then incubated at 37°C for 1 hr and 45 min with fluorescent plate readings every 5 min for a total of 21 cycles. Fluorescence excitation and emission were measured using the FAM channel on the BioRad CFX96 Touch Real-Time PCR Detection System (BioRad). BioRad CFX Maestro Software was used to obtain relative fluorescent units for each sample across cycles. Duplicate negative control samples were included on each plate and their average was used to background-subtract all samples for each cycle.
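The per-cycle background subtraction described above can be sketched as follows. This is a minimal illustration that assumes the relative fluorescence units have been exported as one list of readings per well; the function name and the example values are hypothetical and do not reproduce the CFX Maestro export format.

```python
from statistics import mean


def background_subtract(samples, negative_controls):
    """Subtract the mean negative-control signal from every sample, cycle by cycle.

    samples: dict mapping a sample name to its list of RFU readings (one per cycle)
    negative_controls: list of RFU lists, one per negative-control well
    """
    n_cycles = len(negative_controls[0])
    # Average the negative controls separately for each read cycle.
    background = [
        mean(control[cycle] for control in negative_controls)
        for cycle in range(n_cycles)
    ]
    return {
        name: [value - bg for value, bg in zip(readings, background)]
        for name, readings in samples.items()
    }


# Hypothetical readings for three cycles.
controls = [[100, 110, 120], [120, 110, 100]]
plate = {"DSM_1": [150, 400, 900]}
print(background_subtract(plate, controls))
```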
An initial assay screen was completed using DNA extracted from the caudal fin of two target individuals and one of each of the nontarget species. For example, delta smelt reactions were tested with two delta smelt individuals and one each of longfin smelt and wakasagi.
For additional screening, we selected highly specific crRNA/RPA primer pair combinations with the greatest fluorescence intensity and most rapid amplification (one pair per species).

| Tissue specificity reactions
The original cyt-b TaqMan assays designed to distinguish delta smelt, longfin smelt and wakasagi were extremely specific and did not display any signs of cross-amplification (Baerwald et al., 2011; Brandl et al., 2015). We conducted additional screening of our best crRNA/RPA primer pairs (one per species) to see if they exhibited similar specificity. Background-subtracted fluorescence for 40 target fish and 20 each of the other nontarget smelt species were assessed for each of the three osmerid SHERLOCK reactions after 1 hr at 37°C.
Boxplots were used to visualize the median and interquartile range for each DNA and crRNA assay combination.
Additionally, we screened two individuals from each of the 24 nontarget fish species (Table S4) found throughout the same geographical range as all three smelts (Figure S1). These 48 samples, along with target smelt samples, were run with all three individual smelt assays to ensure specificity. Fluorescence values for the biological replicates were background-subtracted, averaged and then normalized based on the highest fluorescence values across all species after 1 hr at 37°C. These normalized values were graphically displayed by creating a heatmap using ggplot2 in R (R Core Team, 2019).
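The averaging and normalization step can be sketched as below. The sketch assumes that normalization is done separately for each smelt assay by dividing by that assay's highest mean value, which is one reading of the description above and of the Figure 1d caption. The function name, species labels and values are hypothetical.

```python
from statistics import mean


def normalize_by_assay(replicates):
    """Average replicates, then scale each assay so its highest mean equals 1.

    replicates: dict mapping assay -> dict mapping species -> list of
                background-subtracted RFU values (one per biological replicate)
    """
    normalized = {}
    for assay, by_species in replicates.items():
        means = {species: mean(values) for species, values in by_species.items()}
        top = max(means.values())
        normalized[assay] = {
            # Guard against an assay in which no sample produced a positive signal.
            species: (value / top if top > 0 else 0.0)
            for species, value in means.items()
        }
    return normalized


# Hypothetical values for a single assay.
data = {
    "DSM": {
        "delta smelt": [1000, 800],
        "wakasagi": [10, 8],
        "nontarget_A": [0, 0],
    }
}
print(normalize_by_assay(data))
```

The resulting values fall on a common scale, so the three assays can share one colour scale in a heatmap.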

| gBlock sensitivity reactions
gBlocks were synthesized by IDT for all three of our target species.
The gBlocks contained the cyt-b amplified region with 20 additional flanking bases on either end and the T7 promoter sequence on the 5′ end. Ten-fold (1:10) serial dilutions were prepared, starting at between 2.2 and 3 billion copies per reaction and ending at our smallest copy number of between 2.2 and 3 copies per reaction. SHERLOCK detection reactions were run with three technical replicates per dilution factor.
Fluorescence values for these technical replicates were background-subtracted and averaged (± 1 SD) after 1 hr at 37°C.
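As a quick check of the dilution series, going from roughly 3 billion copies to roughly 3 copies per reaction in 1:10 steps takes nine dilutions and gives ten concentrations. The following sketch makes that arithmetic explicit; the function name is hypothetical.

```python
import math


def tenfold_dilution_series(start_copies, n_dilutions):
    """Return copies per reaction for the stock and each successive 1:10 dilution."""
    return [start_copies / 10 ** step for step in range(n_dilutions + 1)]


series = tenfold_dilution_series(3e9, 9)
for copies in series:
    print(f"{copies:,.1f} copies per reaction")

# The same series expressed as the number of 1:10 steps between the two ends.
print(round(math.log10(3e9 / 3)))
```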
The delta smelt gBlock dilution series was additionally subject to qPCR for comparison of assay sensitivity. Again, three technical replicates were analysed. The qPCRs comprised the following: gBlock template, 0.9 µM forward and reverse primer, 0.06 µM TaqMan Probe (Applied Biosystems), and 1× TaqMan Universal PCR Master Mix (Applied Biosystems) for 40 cycles with a 10-min initial denaturation at 95°C, 15 s cycle denaturation at 95°C and 1 min annealing at 63°C. Images were recorded after each cycle. Reactions were carried out in the same BioRad CFX96 Touch Real-Time PCR Detection System used in SHERLOCK detection reactions. BioRad CFX Maestro Software was used to obtain relative fluorescence units for each sample. Duplicate negative control samples were included on each plate and their average was used to background-subtract all samples for each cycle.
Fluorescence values for these technical replicates were background-subtracted and averaged (± 1 SD) after 1 hr at 37°C.
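For orientation when comparing run times, the programmed hold times of the qPCR profile above can be summed as in this minimal sketch. It counts only the stated hold steps and ignores temperature ramping and plate-read time, so it is a lower bound and not a measured run time; the function name is hypothetical.

```python
def qpcr_hold_time_min(initial_denat_s, cycles, denat_s, anneal_s):
    """Sum the programmed hold times of a two-step qPCR profile, in minutes.

    Ramping between temperatures and plate reads are not included.
    """
    total_seconds = initial_denat_s + cycles * (denat_s + anneal_s)
    return total_seconds / 60


# Values from the profile above: 10 min at 95°C, then 40 x (15 s at 95°C + 1 min at 63°C).
print(qpcr_hold_time_min(initial_denat_s=600, cycles=40, denat_s=15, anneal_s=60))
```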

| Time course reactions for speed comparison
We assessed the relative magnitude of background-subtracted fluorescence signal over time for each of the three osmerid SHERLOCK assays. Species-specific crRNA was combined with 10 ng DNA extracted from tissue for each target species, with three technical replicates per species. Fluorescence was measured every 5 min over a 110-min time-course and then background-subtracted. The technical replicates were averaged per species (± 1 SD).
We also compared the delta smelt SHERLOCK assay results mentioned in the previous paragraph with the speed of a TaqMan qPCR assay, which amplifies the same target region of cyt-b. The starting DNA template (10 ng) was from the same sample as used for the SHERLOCK reaction, with three technical replicates. The qPCR conditions were the same as described above for the gBlock qPCRs. Fluorescence was measured every 5 min over a 50-min time-course and then background-subtracted. The technical replicates were averaged (± 1 SD).

| Mucus swabbing
Wild smelt were swabbed to determine if results were similar to those obtained from swabbing hatchery fish. Because wild delta smelt are rare and protected, wild wakasagi were caught and swabbed in the SFE (Cache Slough and Liberty Island, Lower Sacramento Ship Channel, and Suisun Marsh) as a surrogate species. All wakasagi swabs were swirled in 300 µl PBS buffer directly after collection and frozen until the SHERLOCK detection reaction was prepared in the laboratory. Two "no template" negative controls and two positive tissue DNA controls were run on each plate.
Reactions were incubated at 37°C for 1 hr unless otherwise indicated. Additionally, qPCR was performed on all biological replicates of extraction-free mucus swabs in 300 µl 1× PBS for delta smelt. The qPCRs and analysis methods were the same as described above for the gBlock qPCR.
[...] and a negative control were tested. Additionally, a lateral flow time series experiment using nonextracted delta smelt mucus stored in 300 µl of 1× PBS was used to determine the speed of positive detection.

| Specificity and sensitivity using tissue and synthetic oligos
Using DNA extracted from tissue, we assayed 40 samples of each target osmerid species along with 20 samples of each nontarget osmerid species using species-specific crRNAs. All individuals amplified for their species-specific assay and no individuals cross-amplified for either of the other two nontarget species assays (Figure 1c).
Additionally, we confirmed that 24 other fish species commonly found in the SFE did not produce false positive SHERLOCK results for any of the assays (Figure 1d; Table S4), further validating 100% species specificity. We tested the sensitivity of our three osmerid SHERLOCK assays using synthetic gBlock oligonucleotide DNA fragments (Table S5). The DSM and LFS assays could reliably detect their respective DNA targets down to ~300 copies per reaction, whereas the WAG assay sensitivity was slightly lower (~2,000 copies per reaction) (Figure 1e). These limits of detection should be effective for reliably detecting the mitochondrially encoded cyt-b target even when DNA concentrations are low, as mitochondrial DNA copy number varies across species and tissue types but typically ranges from hundreds to thousands of copies per cell in eukaryotes (Cole, 2016).
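The reasoning in the last sentence can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that every mitochondrial genome in a sampled cell is recovered and reaches the reaction, which will not hold in practice, and uses 100 and 1,000 copies per cell purely as illustrative ends of the "hundreds to thousands" range. The function name is hypothetical.

```python
import math


def cells_needed(limit_of_detection_copies, mtdna_copies_per_cell):
    """Smallest whole number of cells whose mtDNA reaches the limit of detection.

    Assumption: perfect recovery of every mitochondrial genome in each cell.
    """
    return math.ceil(limit_of_detection_copies / mtdna_copies_per_cell)


for assay, lod in [("DSM/LFS", 300), ("WAG", 2000)]:
    for per_cell in (100, 1000):
        print(assay, per_cell, cells_needed(lod, per_cell))
```

Under these assumptions, template from a few cells to a few tens of cells would reach the stated limits of detection.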

| Speed and sensitivity comparisons with qPCR
We next determined how rapidly positive detections could be made after initiating the SHERLOCK reaction by reading fluorescence every 5 min for a total of 1 hr (Figure 1f). For all three assays, the minimum positive detection time was less than 20 min for 20 ng of input DNA, and fluorescence reached a maximum at ~30 min, remaining stable for the remainder of the time-course (Figure 1f). When directly comparing SHERLOCK and qPCR assays for DSM (with both sets of primers and probes targeting the same cyt-b region), SHERLOCK detections were ~2.5 times more rapid when using the same fluorescence reader and averaged an absolute signal intensity that was 15 times higher, although this may be influenced by the amount of reporter in each assay (Figure 1g; Figure S3a). However, the DSM qPCR assay was more sensitive than SHERLOCK as tested on synthetic gBlock oligo DNAs (Figure S3b). The qPCR limit of detection was ~3 copies per reaction in comparison to ~300 copies per reaction with SHERLOCK. This decreased sensitivity is probably due to conducting a SHERLOCK one-pot reaction (combining RPA and Cas13a detection in a single reaction for increased speed and convenience) versus conducting a two-step SHERLOCK protocol, which is more sensitive and typically capable of single molecule detection (Kellner, Koob, Gootenberg, Abudayyeh, & Zhang, 2019). Serial dilutions of the delta smelt synthetic oligo showed that minimum positive detection time was 20 min or less when input DNA was ~3,000 copies per reaction or greater (Figure S4).
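One simple way to read a minimum positive detection time from a time-course is to take the first reading at which the background-subtracted mean crosses a threshold. The text does not state the criterion used to call a detection positive, so the threshold in this sketch is an assumption, and the function name and fluorescence values are hypothetical.

```python
def first_detection_time(times_min, mean_rfu, threshold):
    """Return the first time point whose mean RFU meets the threshold, or None.

    Assumption: a detection is 'positive' once the background-subtracted mean
    fluorescence reaches a user-chosen threshold.
    """
    for time_point, value in zip(times_min, mean_rfu):
        if value >= threshold:
            return time_point
    return None


# Hypothetical time-course read every 5 min.
times = [5, 10, 15, 20, 25, 30]
signal = [3, 8, 40, 900, 2500, 4000]
print(first_detection_time(times, signal, threshold=500))
```

A threshold tied to the negative controls (for example, a multiple of their standard deviation) would be a common alternative, but that choice is likewise not specified in the text.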

| Optimization of minimally invasive sampling
Once the specificity, sensitivity and speed of SHERLOCK results were characterized for traditionally extracted tissue, we focused on developing a method for accessing the target species' DNA with minimal invasiveness and requiring little to no additional upstream procedures prior to commencing the SHERLOCK reaction.
Fish mucus, which is abundant and covers all epithelial surfaces, can be swabbed with a brush to obtain DNA samples, and this method has been successfully used for genotyping and high-throughput sequencing (Taslima, Davie, McAndrew, & Penman, 2016; Taslima, Taggart, Wehner, McAndrew, & Penman, 2017). More generally, mucus swabbing is used for genetic analysis of many other diverse organisms including humans (Clarke et al., 2014), amphibians (Pidancier, Miquel, & Miaud, 2003) and molluscs (Henley, Grobler, & Neves, 2006). We first tested both DNA extraction- and nonextraction-based methods for performing SHERLOCK on mucus swabs from delta smelt (Figure 2a). We observed that even without DNA extraction, by simply swirling the swabs in tubes containing PBS, we could detect robust SHERLOCK fluorescence, comparable in magnitude to DNA extracted from fin tissue (Figure 2b). We proceeded to test this noninvasive approach in all three osmerid SHERLOCK assays, and found that the mucus swabbing in PBS method performed well across all assays and displayed a high degree of species-specificity, considerably reducing processing time and making it ideal for field applications (Figure 2c). Furthermore, similar to DNA extracted from tissue, SHERLOCK fluorescence could be detected 15-20 min after reaction commencement (Figure 2d,e) and is approximately twice as rapid as qPCR (Figure 2e), providing additional speed for field deployability and time-sensitive applications.

FIGURE 2 Characterization of SHERLOCK assays using noninvasive mucus swabs. (a) Schematic of a rapid and noninvasive species detection method using mucus swabs (delta smelt shown here as an example) placed directly in PBS followed by three one-pot SHERLOCK reactions (one for each smelt assay). Only a reaction containing crRNA specific to delta smelt will fluoresce. (b) Evaluation of different methods for noninvasive species detection compared to DNA extracted from tissue. Mucus swabs were used both with and without DNA extraction and with varying volumes of PBS. Delta smelt caudal fin tissue or mucus swabs were used and SHERLOCK fluorescence was measured after 1 hr. Median and interquartile range are shown for each boxplot. N = 6 for positive control (DNA extracted from tissue) and N = 10 for all other methods. Trad. = traditional; centr. = centrifugation (see Methods). (c) Species specificity for each osmerid assay demonstrated by SHERLOCK fluorescence after 1 hr. For each target species, mucus swabs placed directly in 300 µl PBS were tested against all three species-specific crRNAs. Boxplots display median and interquartile range for each DNA and crRNA assay combination. DSM = delta smelt, LFS = longfin smelt, WAG = wakasagi. For target species, N = 10 (DSM), N = 7 (LFS) and N = 39 (WAG); N ranged from three to 10 for each nontarget species. (d) Rapid detection of SHERLOCK fluorescence for mucus swabs placed directly in 300 µl PBS from each target species. Fluorescence was measured every 5 min over a 110-min time-course. Average fluorescence values were plotted with error bars = 1 SD. DSM: N = 10; LFS: N = 7; WAG: N = 39. (e) Time comparison of SHERLOCK and qPCR assays using delta smelt mucus swabs and targeting the same locus. SHERLOCK conditions and data are as in (d). Fluorescence was measured every 5 min over a 60-min time-course. Average values for N = 10 are plotted for each assay with 1 SD error bars.

| Enabling genetic assay field deployment
As the SHERLOCK fluorescence assay performed well, with high specificity and rapidity, using a minimally invasive swabbing technique without the need for DNA extractions, we moved forward with refinements that could further aid field deployment. We tested a visual, equipment-free SHERLOCK readout method using lateral flow strips and a dual-labelled RNA reporter (Gootenberg et al., 2017) (Figure 3a). In the SHERLOCK lateral flow assay, DNA extracted from tissue, mucus swabs in PBS and synthetic template all showed positive visual bands for each species-specific assay tested 1 hr after the reaction start (Figure 3b). The positive bands for wakasagi were noticeably lighter, probably due to the reduced signal strength of that assay, as seen for detection of both tissue and mucus fluorescence (Figures 1f and 2d). Based on the successful visual read-outs from the genetic identification at 1 hr, we conducted a time series to determine the minimum time needed to visually detect a positive band for delta smelt mucus with the naked eye. Using technical replicates, we reliably saw bands at 40 min, with occasional detection at 30 min (see Figure 3c for representative strips at each time point), approximately twice the time needed to conduct genetic identification using a fluorescence reader. The 15-20-min loss in speed, and probably some loss in sensitivity, will need to be considered against the ease of equipment-free detection.
Additionally, results will vary depending on each crRNA assay, because some have stronger signals than others, and each mucus swab, because they can pick up varying amounts of DNA from the target individual.

FIGURE 3 Lateral flow detection of extracted DNA from tissue as well as nonextracted mucus swabs. (a) Schematic of SHERLOCK instrument-free detection using a FAM- and biotin-labelled RNA oligonucleotide reporter and commercial lateral flow strips. Uncleaved reporter accumulates as anti-FAM antibody/gold nanoparticle conjugates at the control (Streptavidin) line. If target DNA is present, the reporter is cleaved by Cas13a, resulting in conjugate binding at the antibody capture line. (b) For each species-specific assay, on-target synthetic gBlock, tissue DNA and mucus swabs (in 300 µl PBS) were detected with lateral flow. Off-target tissue DNA (see Methods for species used) and no-template reactions were included as negative controls. (c) Time-course of delta smelt mucus swab detection over 60 min. (d) Quantification of band intensities from (c). Neg., no-template control.

| DISCUSSION
The adaptation of the SHERLOCK method described in this study has all the necessary attributes for full field deployment. The DNA extraction-free direct use of mucus swabs is a significant advancement that allows all subsequent steps to be performed outside the laboratory. Because the isothermal RPA amplification requires only a constant low temperature (Piepenburg, Williams, Stemple, & Armes, 2006) and the entire reaction can occur in a single tube with lyophilized reagents (Gootenberg et al., 2017), SHERLOCK can easily be performed in the field. The reaction can occur in the palm of the hand (and subsequently be put onto a lateral flow strip) or in a small portable device with temperature control and fluorescence detection (e.g., ESEQuant TS2, Qiagen). The speed of detection (<20 min when using a fluorescence device), along with the option of instrument-free detection, are also critical components for field usage.
In comparison, hand-held qPCR instruments such as Biomeme's Franklin unit take 30-60 min to detect target DNA (biomeme.com) and as a qPCR platform may be less sensitive than SHERLOCK. It is anticipated that the entire protocol, from obtaining a sample to genetic identification, could be completed in less than 1 hr, which enables near real-time species diagnostics. Altogether, our results provide an important proof-of-concept that SHERLOCK can be reliably used in a variety of ecological and environmental monitoring settings to obtain accurate, sensitive and rapid genetic results.
Future studies may expand its use to other organisms and finer-scale taxonomic differentiation, such as discriminating between subspecies. CRISPR-based methods can also be used for detecting specific organisms in environmental DNA samples (Williams et al., 2019). As a whole, SHERLOCK and other CRISPR methods such as DETECTR (Chen et al., 2018) and FLASH (Quan et al., 2019) have the potential for widespread application in ecology due to their sensitivity, accuracy and speed. By embracing CRISPR methods, ecology and conservation biology will be able to bring rapid, genetic-based taxonomic identification to the most remote field settings. Furthermore, the ease of use of SHERLOCK and similar assays will expand the power of CRISPR beyond the realm of geneticists and move it into the hands of field biologists, unlocking the potential of this transformative technology to redefine how, where and by whom genetic identification occurs in the future.