Design and optimization of a 16S microbial qPCR multiplex for the presumptive identification of feces, saliva, vaginal and menstrual secretions

Abstract Molecular methods for body fluid identification have been extensively researched in the forensic community over the last decade, mostly focusing on RNA‐based methods. Microbial DNA analysis has long been used for forensic applications, such as postmortem interval estimations, but only recently has it been applied to body fluid identification. High‐throughput sequencing of the 16S ribosomal RNA gene by previous research groups revealed that microbial signatures and abundances vary across human body fluids at the genus and/or species taxonomic level. Since quantitative PCR is still the current technique used in forensic DNA analysis, the purpose of this study was to design a qPCR multiplex targeting the 16S gene of Bacteroides uniformis, Streptococcus salivarius, and Lactobacillus crispatus that can distinguish between feces, saliva, and vaginal/menstrual secretions, respectively. Primers and probes were designed at the species level because these bacteria are highly abundant within their respective fluid. The validated 16S triplex was evaluated in DNA extracts from thirty donors of each body fluid. A classification regression tree model resulted in 96.5% classification accuracy of the population data, which demonstrates the ability of this 16S triplex to presumptively identify these fluids with high confidence at the quantification step of the forensic workflow using minimal input volume of DNA extracted from evidentiary samples.


| INTRODUCTION
Body fluid identification (BFID) is the first step of the forensic DNA analysis workflow and can play a crucial role in corroborating the accounts of suspects, victims, and/or witnesses. It can be useful for investigative leads and/or crime scene reconstruction. Equally important, it allows a DNA analyst to determine the best location to swab or cut to obtain a DNA profile from an evidentiary sample [1,2]. Most methods currently used in forensic serology rely on enzymatic tests that produce a color change, which is interpreted and recorded by an analyst. Although these serological methods have been used for decades, each has well-documented flaws [3-6]. Therefore, the forensic community has extensively researched molecular-based BFID methods, such as messenger RNA or microRNA analysis, DNA methylation, and microbial DNA analysis [7-11].
Numerous studies focus on microbiome-based BFID, but most rely on 16S rRNA gene sequencing, as this is a gold standard in microbiome analysis [12,13]. The 16S rRNA gene is specific to prokaryotes and is highly conserved within the same genus and species but has several hypervariable regions, all of which allow for taxonomical microbial classification [12]. An important consideration for forensic applications is the high cost and complex workflow of high-throughput sequencing (HTS) methods. Currently, HTS is not standard in forensic DNA analysis due to cost, sample preparation time, hands-on training requirements, and complicated back-end bioinformatic analyses. To better align with the current DNA analysis workflow, real-time (q)PCR methods for bacterial BFID have been proposed; however, most of these amplify multiple bacterial species to classify a single body fluid [14-16]. This could be problematic when developing a qPCR multiplex that identifies multiple body fluids, because the number of optical filters in a qPCR instrument limits how many targets can be multiplexed in a single well. Such an approach would therefore require multiple reactions or wells, which would increase reagent cost and sample consumption compared with a single-well assay.
The most successful microbiome-based BFID studies focus on body fluids with high bacterial content, such as vaginal fluid, feces, and saliva [13,15-22]. One reported limitation is the inability to differentiate menstrual blood from vaginal secretions using microbial signatures, which could be useful information in a sexual assault case [21]. A major disadvantage is that some forensically relevant body fluids, particularly blood and semen, are difficult to characterize using microbial signatures because low bacterial cell counts often result in poor DNA yields, a problem when considering the often-compromised nature of forensic evidence [13]. Several reports address these concerns by incorporating other types of molecular markers, such as messenger RNA; however, multiplexing challenges still apply to these integrated assays [17,18,23].
The purpose of this research was to design a qPCR multiplex targeting the 16S gene of three microbial species that are highly abundant in the respective body fluids: Bacteroides uniformis for feces, Streptococcus salivarius for saliva, and Lactobacillus crispatus for female intimate samples (vaginal fluid/menstrual blood). The main objective was to provide proof of concept that a single-well microbial qPCR assay can presumptively identify more than one forensically relevant body fluid; therefore, differentiating between female intimate samples and identifying blood and semen were not primary goals of this study.

| Primer and probe design
Primers and probes were designed using default parameters in Beacon Designer 8 (Premier Biosoft, Palo Alto, CA). The primer sequences designed in SYBR Green mode were reused to design the probes in TaqMan mode, which allowed primer specificity to be tested before the hydrolysis probes were ordered and tested. Primers and probes were ordered from Integrated DNA Technologies (IDT, Coralville, IA). Dual-labeled hydrolysis probes were labeled with internal quenchers, HPLC-purified, and normalized to 100 μM in TE buffer. Sequence information for the qPCR primers, probes, and respective targets is listed in Table 1.
Once the spin basket and swab were discarded, DNA purification was performed according to protocol without modification. All samples and reagent blanks were eluted in 50 μl of ATE buffer (Qiagen), and total DNA was quantified using the NanoDrop™ 2000 (Thermo Fisher Scientific) before storage at −80°C (data not shown).

| qPCR
The designed multiplex is technically a qualitative-PCR assay since the goal is not to quantify any microbial DNA in the sample, which would require analyzing known quantity standards alongside the questioned samples. However, synthetic DNA standards were used to validate the multiplex and could be used as positive controls or for quantification, if desired; therefore, the MIQE guidelines for qPCR reports were followed throughout the project [24].
Initially, each microbial target was evaluated as a single-plex  Table S1. Each gBlock® was resuspended in TE buffer at 10 ng/μl and incubated at 50°C for 20 min, per manufacturer's instructions. Ten-fold serial dilutions were prepared at the concentration range of 5 pg/μl-0.05 fg/μl, aliquoted, and stored at −20°C. To validate the multiplex, the three gBlocks® sequences were pooled together at a concentration of 1 ng/μl before making ten-fold serial dilutions to the optimized concentration range (5 pg/μl-0.05 fg/μl).
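The arithmetic behind the standards can be sketched as follows. This is a minimal illustration, not the authors' preparation protocol: the function names and the 100 μl working volume are hypothetical, and only the concentration endpoints (1 ng/μl pool, 5 pg/μl to 0.05 fg/μl range) come from the text. It shows that a ten-fold series spanning this range contains six standards.

```python
def tenfold_series(start_fg_per_ul, n_points):
    """Concentrations (fg/ul) of a ten-fold serial dilution, highest first."""
    return [start_fg_per_ul / 10 ** i for i in range(n_points)]


def tenfold_step_volumes(final_volume_ul):
    """Volumes (ul) of previous standard and of diluent for one 1:10 step."""
    transfer = final_volume_ul / 10
    return transfer, final_volume_ul - transfer


# 5 pg/ul expressed in fg/ul; six points end at 0.05 fg/ul.
standards_fg_per_ul = tenfold_series(5000.0, 6)

# Fold dilution needed to bring the 1 ng/ul pool (1000 pg/ul) to 5 pg/ul.
pool_to_top_standard = 1000.0 / 5.0

# FINAL_VOLUME_UL is a hypothetical working volume, not taken from the paper.
FINAL_VOLUME_UL = 100.0
transfer_ul, diluent_ul = tenfold_step_volumes(FINAL_VOLUME_UL)

for conc in standards_fg_per_ul:
    print(f"{conc:g} fg/ul")
```

Under these assumptions, each step would combine one part of the previous standard with nine parts of diluent, and the pooled stock would need a 200-fold dilution to reach the highest standard.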

| SYBR green assay
Each primer set was first evaluated via a SYBR Green assay with melt curve analysis to ensure amplification of a single product. Standards and no-template controls were analyzed in triplicate using 6.25 μl of PerfeCTa SYBR Green SuperMix (2X) (VWR, Radnor, PA), 3.75 μl of nuclease-free water, 0.25 μl (10 μM) of forward and reverse primers (

| Hydrolysis probe-based assay
Once a single amplification product was confirmed in the SYBR Green assay, each target was individually evaluated in a probe-based assay before multiplexing. After multiplex validation using gBlock® standards, the assay was tested using body fluid samples.

| Linear range of classification
The goal of the linear range of classification study was to determine the lowest DNA concentration at which the qPCR assay accurately classifies a sample as the correct body fluid. Total

| Assay validation
In both SYBR Green and probe-based assays, there was no amplification detected in any of the negative controls, including extraction reagent blanks. Single peaks were observed in the SYBR Green melt curve analysis, which verified that there was only one PCR product for each primer pair (data not shown). The same primer sequences were then used for the probe-based assay, in which various primer/probe concentrations and 5′ reporter dyes were evaluated during optimization. The final 16S triplex primer and probe concentrations for all three microbial targets were 400 nM (forward and reverse) and 200 nM, respectively. All reported data are representative of the 16S triplex at these concentrations. The 5′ reporter dyes were chosen because their excitation and emission wavelengths are similar to those of dyes used in commercial STR multiplex kits; for example, the ATTO550 and SUN dyes (IDT) emit in the same filters as ABY and VIC (Thermo Fisher Scientific), respectively. These choices were intended to ensure that qPCR instruments would already be calibrated for the requisite probe emission spectra, which would ease implementation in forensic laboratories.
The slopes, amplification efficiencies, and R² values of the standard curves (Table 2) were all within the recommended ranges for qPCR assays [25]. The reported standard curve data in Table 2 are averaged across six experiments, demonstrating repeatability and reproducibility of the assay. The linear dynamic range was determined to be 5 pg/μl-0.05 fg/μl (Table S3 in Appendix S1). We acknowledge that the lower limit of this range is not the lowest limit of detection since each microbial target amplifies at approximately 28 cycles; however, lower concentrations were not tested to minimize reagent consumption. Additionally, we felt that six standards were sufficient to validate assay performance because the overall purpose is to obtain a raw Cq value rather than to quantify any microbial DNA from a sample. Although it is possible to quantify DNA of each microbial species using this assay, further research expanding the lower limit of detection would be required to quantify microbial DNA for forensic applications.
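For readers unfamiliar with how these standard curve metrics are derived, the sketch below fits Cq against log10 concentration by ordinary least squares and converts the slope to an amplification efficiency with the usual relation E = 10^(-1/slope) - 1. The Cq values are synthetic, generated from an assumed slope of -3.32 purely for illustration; they are not the data in Table 2, and `fit_standard_curve` is a hypothetical helper name.

```python
import math


def fit_standard_curve(conc, cq):
    """Least-squares fit of Cq vs log10(concentration).

    Returns (slope, intercept, r_squared, efficiency), where
    efficiency = 10 ** (-1 / slope) - 1 (1.0 means 100%).
    """
    x = [math.log10(c) for c in conc]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(cq) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, cq))
    ss_tot = sum((yi - mean_y) ** 2 for yi in cq)
    r_squared = 1.0 - ss_res / ss_tot
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, r_squared, efficiency


# Six ten-fold standards in fg/ul (5 pg/ul down to 0.05 fg/ul).
conc_fg_per_ul = [5000.0, 500.0, 50.0, 5.0, 0.5, 0.05]
# Synthetic Cq values (assumption): exactly 3.32 cycles per ten-fold step.
cq_values = [11.40, 14.72, 18.04, 21.36, 24.68, 28.00]

slope, intercept, r_squared, efficiency = fit_standard_curve(conc_fg_per_ul, cq_values)
print(f"slope={slope:.3f}  R^2={r_squared:.4f}  efficiency={efficiency * 100:.1f}%")
```

A slope near -3.32 corresponds to a doubling of product every cycle, that is, an efficiency close to 100%.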
An important consideration during assay validation was whether microbial DNA from forensic body fluid samples amplified equivalently in the single-plex and multiplex assays. To address this concern, multiple DNA extracts were tested for each target species in both single-plex and multiplex reactions, and the Cq values were verified to be within one cycle of each other (data not shown).
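The one-cycle concordance criterion can be expressed as a simple paired comparison. The sketch below is illustrative only: the function name is hypothetical and the Cq values are invented, since the underlying data are not shown.

```python
def within_one_cycle(singleplex_cq, multiplex_cq, tolerance=1.0):
    """Return (largest absolute Cq shift, True if every pair is within tolerance)."""
    if len(singleplex_cq) != len(multiplex_cq):
        raise ValueError("Cq lists must be paired sample-by-sample")
    shifts = [abs(s - m) for s, m in zip(singleplex_cq, multiplex_cq)]
    return max(shifts), all(shift <= tolerance for shift in shifts)


# Invented Cq values for three extracts of one target (assumption).
singleplex = [18.4, 22.1, 25.7]
multiplex = [18.9, 21.8, 26.3]

largest_shift, concordant = within_one_cycle(singleplex, multiplex)
print(f"largest shift = {largest_shift:.2f} cycles, concordant = {concordant}")
```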

| Evaluation of body fluid specificity
When evaluating raw data (

| Classification regression tree analysis
It should first be noted that although blood and semen samples were evaluated in this study, the goal was not to differentiate between them using this assay, as they have low bacterial DNA yields [13] and were classified as VF while two were classified as saliva (Table S4).
Differentiation between VF and MB in the individual CART model depended on an S. salivarius Cq cutoff of 18.86 (Figure S1); however, a similar S. salivarius cutoff (Cq = 19.2) was used to distinguish between saliva and VF, which likely explains the observed misclassifications between VF, MB, and saliva (Table S4).
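To illustrate how a CART model arrives at a Cq cutoff, the sketch below searches for the single threshold that minimizes the weighted Gini impurity of the two resulting groups, which is the split criterion commonly used in classification trees. It is a simplified, single-split illustration rather than the model fitted in this study; the Cq values and helper names are invented, and the software and settings used by the authors are not stated in this section.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_cutoff(values, labels):
    """Return (cutoff, weighted Gini) for the best single split on one feature.

    Candidate cutoffs are midpoints between consecutive distinct sorted values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue
        cut = (low + high) / 2
        left = [lab for val, lab in pairs if val < cut]
        right = [lab for val, lab in pairs if val >= cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


# Invented S. salivarius Cq values (assumption): lower Cq means more target DNA.
cq = [15.0, 16.5, 17.0, 18.0, 21.0, 22.5, 25.0, 28.0]
fluid = ["saliva"] * 4 + ["VF"] * 4

cutoff, impurity = best_cutoff(cq, fluid)
print(f"best cutoff: Cq = {cutoff}, weighted Gini = {impurity}")
```

When two splits in the same tree fall at nearly the same value of one marker, as with the 18.86 and 19.2 cutoffs above, samples whose Cq lies near that value can be routed to the wrong branch by small measurement differences.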
When female intimate sample data were combined, the tree plot (Figure 1) looked similar to that in Figure S1; however, the overall classification accuracy increased to 96.5% with only two samples (one saliva and one VF/MB) misclassified (Table 3). This demonstrated that combining female intimate samples in a dataset can increase classification accuracy. Furthermore, it supports the reported claim that it is difficult to differentiate vaginal fluid from menstrual blood using microbial signatures since both fluids originate from the same body cavity and can contain similar bacterial compositions at any point during the menstrual cycle [17,18,26]. This grouping is reasonably appropriate for casework scenarios; however, as the project is still developmental in nature, future work will be required to classify mixtures and to distinguish venous blood from menstrual secretions within a sample of mixed origin.
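Overall classification accuracy of this kind is the number of correctly classified samples divided by the total, which can be read off a confusion matrix. The sketch below uses an invented three-class matrix to show the calculation; the counts are not those of Table 3, and the function names are hypothetical.

```python
def accuracy(confusion):
    """Overall accuracy: sum of the diagonal divided by the sum of all cells."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_recall(confusion):
    """Fraction of each true class (row) that was classified correctly."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Rows are true classes, columns are predicted classes.
# Counts are invented for illustration (assumption), not taken from Table 3.
classes = ["feces", "saliva", "VF/MB"]
toy_confusion = [
    [10, 0, 0],
    [0, 9, 1],
    [0, 1, 9],
]

overall = accuracy(toy_confusion)
recall = per_class_recall(toy_confusion)
print(f"overall accuracy = {overall * 100:.1f}%")
for name, value in zip(classes, recall):
    print(f"{name}: {value * 100:.0f}% correctly classified")
```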
Importantly, 100% of fecal samples were correctly classified regardless of VF/MB grouping, and there were no misclassifications involving blood/semen samples. Saliva misclassifications were observed in both CART models, which could be due to higher-than-expected S. salivarius detection in other body fluids, thus negatively impacting its anticipated saliva specificity. Another possible explanation is that the S. salivarius primers and probe were designed at the species level, and differentiation among Streptococcus species in saliva has reportedly been more difficult using 16S compared with other genes [19,27]. however, one group did not examine feces [28], and different methodologies were used, such as loop-mediated isothermal amplification (LAMP), reverse transcription LAMP, or direct PCR combined with immunochromatographic strip [28-30]. Importantly, none of these studies amplified 16S, which supports the previous statement that the 16S rRNA gene may not be the best target for saliva identification, especially at the species taxonomic level. Since saliva is commonly present on crime scene evidence, incorporating a more specific saliva marker into the proposed qPCR multiplex may be useful for forensic casework implementation.

| Linear range of classification
The goal of this study was to determine at which ten-fold dilution of DNA extract the body fluid would classify correctly using the grouped female intimate CART model. Saliva samples could be accurately classified only when the undiluted DNA extract was used as input for the qPCR reaction (Table S5). The lowest DNA concentrations quantified via UV-spectrophotometry were observed in saliva compared with vaginal/menstrual secretions and feces (data not shown); therefore, it was expected that ten-fold dilutions of saliva extracts would not yield correct classification results. It should be noted that, unless otherwise stated, any dilution that was correctly classified was also correctly classified in the DNA extract; for example, vaginal fluid and feces were correctly classified in DNA extracts of all five donors but only in the first dilution (D1) for three donors (Table S5).
There were no fluids that classified correctly beyond the second dilution (D2), except for MB, which was still accurately classified in the second dilution for four out of five donors (
Importantly, high accuracy was achieved using forensically relevant dried samples of appropriate volumes that had been stored at room temperature for an extended period of time, which suggests that bacterial signatures remain stable enough to characterize dried body fluids via qPCR. The assay was developed for use on a qPCR instrument with five optical filters, and only four are used in the multiplex (three for microbial targets and one for the passive reference dye); therefore, it is possible to include one more primer set and remain a single-well assay. An internal positive control should be consid-

ACKNOWLEDGMENTS
The authors thank Dr. Paul Brooks for assistance with statistical modeling.