Development and validation of genetic markers for sex and cannabinoid chemotype in Cannabis sativa L.

Hemp (Cannabis sativa L.) is an emerging dioecious crop grown primarily for grain, fiber, and cannabinoids. There is good evidence for medicinal benefits of the most abundant cannabinoid in hemp, cannabidiol (CBD). For CBD production, female plants producing CBD but not tetrahydrocannabinol (THC) are desired. We developed and validated high‐throughput PACE (PCR Allele Competitive Extension) assays for C. sativa plant sex and cannabinoid chemotype. The sex assay was validated across a wide range of germplasm and resolved male plants from female and monoecious plants. The cannabinoid chemotype assay revealed segregation in hemp populations, and resolved plants producing predominantly THC, predominantly CBD, and roughly equal amounts of THC and CBD. Cultivar populations that were thought to be stabilized for CBD production were found to be segregating phenotypically and genotypically. Many plants predominantly producing CBD accumulated more than the current US legal limit of 0.3% THC by dry weight. These assays and data provide potentially useful tools for breeding and early selection of hemp.

as a Schedule 1 controlled substance in the United States, https://www.deadiversion.usdoj.gov/schedules/index.html. CBD is now formulated as an approved prescription medicine (DEA schedule 5) marketed as Epidiolex® by Greenwich Biosciences.
Plant sex and cannabinoid chemotype are essentially qualitative traits that are important for hemp producers and breeders. Maximal production of CBD occurs in unpollinated female hemp plants. To grow and develop legally compliant hemp cultivars (<0.3% THC by dry weight), genetic propensity for THC production must be known. However, sex and cannabinoid chemotype are difficult to phenotype in young plants. Until the onset of flowering, male and female plants are phenotypically indistinguishable, and immature plants produce relatively small quantities of cannabinoids. Cannabinoid chemotype of immature plants also may not reflect the cannabinoid profile of mature plants (de Meijer, Hammond, & Micheler, 2009; Pacifico, Miselli, Carboni, Moschella, & Mandolino, 2008). Molecular markers can address these challenges, since DNA from very young plants can be used in reliable genotype assays.
The genetic structures of CBDAS and THCAS have recently been elucidated (Grassa et al., 2018; Laverty et al., 2019). While the genes are highly similar, sharing 84% amino acid identity (Onofri, de Meijer, & Mandolino, 2015), they are not allelic or at equivalent loci. However, chromosomal scaffolds containing these genes are physically linked in repulsion and not highly homologous, leading to low recombination (Laverty et al., 2019). Consequently, cannabinoid chemotype inheritance can largely be modeled as monogenic, with plants producing predominantly THC (chemotype 1, B_T/B_T), about equal amounts of CBD and THC (chemotype 2, B_T/B_D), or predominantly CBD (chemotype 3, B_D/B_D; de Meijer et al., 2003; Small & Beckstead, 1973).
through the X:autosome ratio (Ainsworth, 2000; Vyskot & Hobza, 2004), although this mechanism is not fully understood (Divashuk et al., 2014; Ming, Bendahmane, & Renner, 2011). Despite having well-defined heteromorphic sex chromosomes (Sakamoto, Akiyama, Fukui, Kamada, & Satoh, 1998), environment can play a large role in sex determination in Cannabis sativa (Schaffner, 1921). Factors such as altered hormones (Lubell & Brand, 2018; Ram & Jaiswal, 1970), daylength (Schaffner, 1921), and autosomal genes (Faux, Berhin, Dauguet, & Bertin, 2014) have been shown to influence sex expression. Through manipulation of hormones, it is possible to create all-female progeny using pollen produced by a female (XX) plant induced to produce male flowers and pollen (Lubell & Brand, 2018). From a commercial perspective, these so-called 'feminized' seeds are generally more expensive to produce than normal dioecious seeds due to the additional work required for their production and the market demand for all-female seed lots.
The absence of a Y chromosome does not appear to be sufficient to ensure a total lack of production of male flowers (Faux et al., 2014; Menzel, 1964; Razumova et al., 2016). Some Cannabis plants are monoecious, producing both male and female flowers on the same plant (Menzel, 1964). While monoecious plants are commonly referred to as hermaphrodites, the botanical definition of hermaphrodite requires staminate and carpellate parts on the same flower (Lebel-Hardenack & Grant, 1997), a phenomenon rarely seen in C. sativa.
PACE (PCR Allele Competitive Extension, 3CR Bioscience Ltd) is a high-throughput, fluorescence-based marker system that can interrogate SNPs, indels, or other polymorphic DNA features. Fluorescence-based marker systems such as PACE, KASP, TaqMan, and the recently developed RhAMP are estimated to be 45 times faster than gel-based systems (Rasheed et al., 2016; Toth, Pandurangan, Burt, Mitchell Fetch, & Kumar, 2018). PACE has the further advantage of lower cost while retaining simple codominance, unlike loop-mediated isothermal amplification (LAMP; Notomi et al., 2000) or direct sequencing (McKernan et al., 2015; Weiblen et al., 2015). PACE assays, unlike some other marker systems, require a fluorescent plate reader or qPCR system to score, limiting field-based testing. However, the ease and speed of PACE assays are well suited to breeding or advanced production systems. Here we used publicly available sequence information to develop reliable, high-throughput PACE assays that are highly predictive of sex and cannabinoid chemotype phenotypes in C. sativa.

| MATERIALS AND METHODS
For sex testing, dioecious seeds from CBD cultivars of C. sativa were started in a greenhouse in plugs of soilless mix in 2019 and cultivated under an 18L:6D light regime. DNA was extracted from leaves harvested from 2-week-old plants. Genotyped female plants were planted in outdoor field trials, while males were transplanted to two-gallon pots and kept in the greenhouse. Plant sex was noted at the onset of flowering.
For cannabinoid chemotype marker testing, cannabinoid data and tissue from the 2018 Cornell CBD Hemp Cultivar Trials were used. Hemp seeds were obtained from multiple sources (Table S3). Plants for CBD production were started in the greenhouse under an 18L:6D light regime, with males or monoecious plants removed based on phenotype. The trials were located on Cornell University farms in upstate New York: one at Bluegrass Lane Turf and Ornamental Research Farm (Ithaca) and the other at Cornell AgriTech Gates West Farm (Geneva). Late season rainfall led to saturated field conditions at the Geneva location during flowering.
The top 10 cm of mature female plants were harvested by hand at maturity and dried in a greenhouse. The inflorescence was then milled using a Magic Bullet food grinder (Homeland Housewares) and stored at 4°C until analysis. For each sample, 50 mg of dried, milled tissue was mixed with 1.5 ml ethanol by high-speed shaking at room temperature with a TissueLyser (Qiagen), and filtered through a SINGLE StEP PTFE Filter Vial (Thomson). The resultant liquid was directly subjected to HPLC analysis (Dionex UltiMate 3000; Thermo Fisher) with biphenyl-4-carboxylic acid (BPCA) as an internal standard, using a Phenomenex Kinetex 2.6 µm Polar 100 Å column (150 × 4.6 mm) heated at 35°C. Samples were injected and eluted at 1.2 ml/min over a 6 min gradient, from 65% acetonitrile, 0.1% formic acid, to 80% acetonitrile, 0.1% formic acid, followed by a 4 min isocratic step. Absorbance was measured at 214 nm. The following standards were used as calibrants: THCA, Δ9-THC, CBDA, CBD, cannabichromenic acid (CBCA), cannabichromene (CBC), CBGA, CBG, cannabinol (CBN), and Δ8-THC (Sigma Aldrich).
DNA was isolated using a high-throughput modified CTAB method utilizing PALL DNA binding plates (Doyle & Doyle, 1987). PACE reactions were run according to the manufacturer's (3CR Bioscience Ltd) instructions with five extra final cycles on a Bio-Rad C1000 Touch thermocycler. A Bio-Rad CFX96 qPCR machine was used as a fluorescent plate reader, and the data were analyzed using Bio-Rad CFX Maestro software. The primers (Table 1) were designed so that samples from plants with a Y chromosome and THC-dominant plants produce HEX fluorescence.
Multiple linear regression analysis was performed in RStudio version 1.1.463 running R version 3.5.1 (R Core Team, 2018). Cannabinoid chemotype allele score was numerically coded as [−1,0,1] while all other variables were coded as factors. Total potential CBD and THC were calculated by summing the concentration of the decarboxylated form with the concentration of CBDA or THCA multiplied by 0.877.
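The total potential calculation and the numeric allele coding described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R code: the function name, the dictionary keys, the sample values, and the assignment of −1 to B_T/B_T (rather than to B_D/B_D) are all assumptions made for the example.

```python
# Conversion factor from the text: acid form (CBDA or THCA) to its
# decarboxylated form (CBD or THC).
DECARB_FACTOR = 0.877


def total_potential(neutral_pct, acid_pct, factor=DECARB_FACTOR):
    """Total potential cannabinoid (% dry weight) = neutral + factor * acid."""
    return neutral_pct + factor * acid_pct


# Numeric coding of the chemotype allele score as [-1, 0, 1].
# Which homozygote receives -1 is an assumption; the text does not say.
GENOTYPE_CODE = {"B_T/B_T": -1, "B_T/B_D": 0, "B_D/B_D": 1}

# Hypothetical HPLC values (% dry weight), for illustration only.
sample = {"CBD": 0.50, "CBDA": 8.00, "THC": 0.05, "THCA": 0.35,
          "genotype": "B_D/B_D"}

total_cbd = total_potential(sample["CBD"], sample["CBDA"])
total_thc = total_potential(sample["THC"], sample["THCA"])
allele_score = GENOTYPE_CODE[sample["genotype"]]

print(f"total potential CBD: {total_cbd:.3f}%")
print(f"total potential THC: {total_thc:.3f}%")
print(f"allele score: {allele_score}")
```

In this hypothetical sample the Δ9-THC concentration alone (0.05%) is below 0.3%, while the total potential THC is not, which is the distinction the Results draw between the two compliance criteria.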

| Sex assay development
A novel high-throughput assay for the Y chromosome was developed based on the previously identified male-specific MADC6 sequence (Genbank AF364955.1). To develop the assay, the MADC6 sequence was compared to the FINOLA (Genbank GCA_003417725.1) and Purple Kush (Genbank GCA_000230575.1) genomes (Van Bakel et al., 2011) using BLAT (BLAST-Like Alignment Tool) on the C. sativa Genome Browser Gateway (UCSC Genome Browser, University of California). A PACE assay, named CSP-1, was designed based on a SNP between the sequences (Figure 2).

| Sex assay validation
The CSP-1 assay was used to test a total of 2,170 plants of 14 cultivars. In all but one population the genetic male:female ratio fit the expected 1:1 model (Chi-square p > .05, Table S1). The individuals genetically scored as females were planted in field trials and the individuals genetically scored as males were discarded or moved to greenhouse conditions. Approximately 98% of the screened genetic females were phenotypically female. Approximately 1% of the screened genetic females were monoecious, including individuals from three cultivars (Table S1). Two screened plants were phenotypically male and, when retested, were shown to have been originally miscalled. About 270 plants genetically scored as male from four hemp cultivars were allowed to flower in greenhouse conditions, and all were phenotypically male (Table 2). Monoecious plants (20 plants each of the cultivars 'Anka', 'Hlesia', and 'USO-31') were also examined with this assay, and all monoecious plants were scored as female.
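The 1:1 goodness-of-fit test applied to each population can be illustrated with a short Python sketch. It is an assumption-laden example rather than the analysis actually run: the function name and the counts are hypothetical, and the p-value uses the closed form for a chi-square distribution with one degree of freedom, which avoids any third-party dependency.

```python
import math


def chi_square_1to1(n_male, n_female):
    """Chi-square goodness-of-fit test against a 1:1 male:female ratio.

    Returns (statistic, p_value). With two classes there is one degree of
    freedom, for which the upper-tail probability is erfc(sqrt(x / 2)).
    """
    total = n_male + n_female
    expected = total / 2
    stat = ((n_male - expected) ** 2 + (n_female - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical genotype counts for one cultivar population.
stat, p = chi_square_1to1(52, 48)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
if p > 0.05:
    print("consistent with 1:1 segregation")
else:
    print("deviates from 1:1 segregation")
```

A population with a strongly skewed count, such as 70 males to 30 females, would give a statistic of 16 and be flagged as deviating from the 1:1 model.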

| Cannabinoid chemotype assay development
A PACE assay to predict cannabinoid chemotype was generated through comparison of marijuana-type CBDAS (B_T) and hemp-type CBDAS (B_D), which were previously found to correspond to high-THC and high-CBD chemotypes, respectively (Figure 3; Weiblen et al., 2015). While THCAS and CBDAS are not the same gene, their close linkage in repulsion suggests that they are inherited monogenically as a cannabinoid chemotype locus (Laverty et al., 2019). This assay was named CCP-1.

| Cannabinoid chemotype assay validation
Two hundred and seventeen plants from 14 hemp cultivars grown for CBD in two locations were tested with the cannabinoid chemotype (CCP-1) assay and phenotyped for cannabinoids using HPLC. Of these, two were homozygous for the marijuana-type allele (B_T/B_T), 65 were heterozygous (B_T/B_D), and 150 were homozygous for the hemp-type allele (B_D/B_D). Most cultivar populations were segregating for this allele, which was consistent with the phenotypic data (Figure 4). The genotypic data corresponded to three apparent chemotypes, in terms of total potential CBD and THC (Figure 5a, ANOVA p < 1e-4). This indicates that the CCP-1 assay identifies previously established B_T and B_D alleles.
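As a worked example of how genotype counts translate into allele frequencies, the pooled counts reported above (2, 65, and 150 plants) can be converted as follows. The function and variable names are illustrative, and pooling all 14 cultivars into one frequency is done here only for demonstration; per-cultivar frequencies are the quantity reported in Table S3.

```python
def allele_frequencies(n_tt, n_td, n_dd):
    """Return (freq_B_T, freq_B_D) from counts of the three genotypes.

    Each plant carries two alleles at the chemotype locus, so a homozygote
    contributes two copies of its allele and a heterozygote one of each.
    """
    total_alleles = 2 * (n_tt + n_td + n_dd)
    freq_bt = (2 * n_tt + n_td) / total_alleles
    freq_bd = (2 * n_dd + n_td) / total_alleles
    return freq_bt, freq_bd


# Genotype counts from the validation set: B_T/B_T, B_T/B_D, B_D/B_D.
freq_bt, freq_bd = allele_frequencies(2, 65, 150)
print(f"B_T frequency: {freq_bt:.3f}")
print(f"B_D frequency: {freq_bd:.3f}")
```

With these counts, 69 of the 434 sampled alleles are B_T, so roughly one allele in six across the pooled validation plants is the THC-associated allele.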

FIGURE 4 Distribution of cannabinoid chemotype alleles across cultivar populations. Additional allele frequency data can be found in Table S3.
Mean Δ9-THC and total potential THC differed across genotypic groups (ANOVA p < 1e-4). Within the genotypic groups there was a strong correlation between total potential CBD and total potential THC concentrations (Figure 5a; B_T/B_D: r = .72, p < 1e-4; B_D/B_D: r = .86, p < 1e-4).
The Δ9-THC concentration for B_D/B_D samples was consistently <0.3% (dry weight), while 35% of the B_T/B_D samples had a Δ9-THC concentration <0.3% (Figure 5b). Only 39% of the B_D/B_D samples had a total potential THC concentration <0.3% (Figure 5c). The mean ratio of total potential CBD:THC was 0.02, 1.6, and 20.3 for B_T/B_T, B_T/B_D, and B_D/B_D lines, respectively (Figure 5d; Table S2).
A total of 1,420 plants from 47 cultivars were tested with the CCP-1 assay (Table S3). These cultivars were from multiple sources and grown for CBD, grain, or grain/fiber. The THC-associated B_T allele frequency varied by cultivar, from 0% in some clones grown for CBD, up to 98% in a Chinese grain/fiber cultivar (Table S3).

FIGURE 5 Genotype to phenotype relationships. (a) Total potential tetrahydrocannabinol (THC) and cannabidiol (CBD) concentration (% dry mass) in individual plants for which B locus genotype was also determined. The red line indicates 0.3% total potential THC. (b) Δ9-THC concentration by genotype. The red line indicates 0.3% dry weight Δ9-THC. (c) Total potential THC concentration by genotype. The red line indicates 0.3% dry weight total potential THC. (d) Total potential CBD:THC concentration ratio. All means differ (ANOVA p < 1e-4). Tabular data can be found in Table S2.

TABLE 3 Linear regression R² values of various models predicting cannabinoid data

| Other factors affecting cannabinoid production
Genotypic group, cultivar, and trial were used to create models explaining the potential CBD:THC concentration ratio as well as the concentrations of Δ9-THC, CBD, potential THC, potential CBD, and total potential cannabinoids (Table 3). Total potential cannabinoids included CBD, THC, CBC, CBG, and their corresponding acids. Genotypic group explained the most variance in the CBD:THC ratio, as well as Δ9-THC and potential THC levels, but not total potential cannabinoids. Cultivar was an important factor in total potential cannabinoid abundance, as well as the concentration of CBD and Δ9-THC. The cultivar explained ~3% of the variation in the potential CBD:THC ratio when the genotypic group was taken into consideration, and the trial was a poor predictor of all measured variables.

| CSP-1 sex assay
The CSP-1 assay was found to be a reliable predictor of plant sex. There was a 50:50 segregation ratio in nearly all tested dioecious populations including CBD types, grain types, and grain/fiber types. As expected, monoecious plants were scored as female (Divashuk et al., 2014). Given these data, it is likely that the CSP-1 assay distinguishes a nonrecombining part of the Y chromosome. This is somewhat surprising, given that the original MADC6 marker assay was not a completely accurate predictor of plant sex, with 2/75 reported recombinants (Törjék et al., 2002). It is possible that this was due to PCR failure, monoecious plants with a quantitatively male phenotype, or that the CSP-1 assay in fact examines a different DNA sequence than the original MADC6 assay. Recent C. sativa whole-genome sequencing (Laverty et al., 2019) showed six unassembled scaffolds in the male genome with >99% identity to the MADC6 sequence in C. sativa, possibly contributing to the empirical success of this assay. As MADC6 shows some sequence relationship to retrotransposons, it is possible that the sequence was subject to copy number increase in the recent past (Sakamoto, Ohmido, Fukui, Kamada, & Satoh, 2000; Törjék et al., 2002). It is well known that in the development of sex-determining regions of plants, an absence of recombination between male- and female-specific sequences can lead to an expansion of retrotransposon copy number repeats, which are not lost through a Muller's ratchet-type mechanism (Sakamoto et al., 2000; Vyskot & Hobza, 2004).

| CCP-1 cannabinoid chemotype assay
The CCP-1 cannabinoid chemotype marker assay detected three genotypic groups that corresponded to three phenotypic groups, reflecting previously described chemotypes. Since the CCP-1 assay examines CBDAS only and CBDAS and THCAS are not allelic, it is possible that a plant with a recombination between CBDAS and THCAS would not be scored correctly. However, we did not detect this in any of our samples, and the tight linkage in repulsion between CBDAS and THCAS is well established (Grassa et al., 2018; Weiblen et al., 2015). The mean Δ9-THC and total potential THC concentrations as percent dry matter were significantly different in each chemotype. If Δ9-THC concentration alone as assayed by HPLC was used as the criterion for legal compliance at the level of 0.3%, then all B_D/B_D samples and 35% of the B_T/B_D samples would be below the threshold. It is possible that past breeding material chosen for low THC was in fact heterozygous, leading to segregation in released cultivars (Table S3). If total potential THC is used as the legal criterion, then 61% of the B_D/B_D samples would be above the legal threshold of 0.3%, and therefore be noncompliant. The close correlation between potential CBD and THC concentrations in the B_D/B_D class suggests that it might be difficult to develop a cultivar that accumulates high CBD concentrations while maintaining low total potential THC. The average potential CBD:THC ratio was about 20:1, which suggests that accumulation of greater than 6% CBD will result in a rise of the total potential THC above 0.3%. A clear target for breeders developing high CBD hemp cultivars is to raise the ratio of total potential CBD:THC.
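The arithmetic behind the 6% figure follows directly from the ratio and the legal limit, and can be sketched as below. This assumes, as a simplification, that the CBD:THC ratio stays fixed as cannabinoid content rises; the function names are illustrative and not taken from the study.

```python
def max_cbd_at_limit(cbd_to_thc_ratio, thc_limit_pct=0.3):
    """Highest total potential CBD (%) that keeps total potential THC at or
    below the limit, assuming a fixed CBD:THC ratio."""
    return cbd_to_thc_ratio * thc_limit_pct


def expected_thc(cbd_pct, cbd_to_thc_ratio):
    """Total potential THC (%) expected at a given CBD level and fixed ratio."""
    return cbd_pct / cbd_to_thc_ratio


# Rounded ratio of about 20:1 for the B_D/B_D class.
cbd_ceiling = max_cbd_at_limit(20.0)
print(f"CBD ceiling at a 20:1 ratio: {cbd_ceiling:.1f}%")

# A plant at 10% total potential CBD under the same ratio.
thc_at_10 = expected_thc(10.0, 20.0)
print(f"Expected total potential THC at 10% CBD: {thc_at_10:.2f}%")
```

Under the same assumption, raising the ratio is the only way to raise the CBD ceiling: a hypothetical 40:1 ratio would allow about 12% CBD before total potential THC reached 0.3%.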
There are several lines of evidence to suggest that the concomitant increase in THC concentration with that of CBD in the B_D/B_D group is due to promiscuous activity of the active CBDAS. Despite attempts, no demonstrably active transcribed THCAS has been isolated from a confirmed chemotype 3 plant (Kojoma et al., 2006; Laverty et al., 2019; Onofri et al., 2015). Other research found a C. sativa plant with a B_D/B_D genotype and a catalytically inactive CBDAS that accumulates CBGA and essentially no THCA, although mutations in CBDAS and a putative THCAS would also explain this observation (Onofri et al., 2015). A purported active THCAS from CBD-dominant fiber-type hemp was later shown to be a cannabichromenic acid synthase (CBCAS) (Kojoma et al., 2006; Laverty et al., 2019). Lastly, in vitro expression of wild-type CBDAS leads to production of CBDA:THCA in a ratio very close to 20:1 at optimal pH (Zirpel, Kayser, & Stehle, 2018). Future breeding efforts should be informed by this promiscuous activity.
Some studies have found sequence and copy number variation in CBDAS and THCAS and correlated them to differences in cannabinoid production (McKernan et al., 2015; Weiblen et al., 2015). These differences were not assayed here, but could conceivably have contributed to some of the variation within groups.

| Other factors affecting cannabinoid production
Neither trial nor cultivar per se explained variations in the CBD:THC ratio. Trial did not appear to have much of an effect on any measured parameters, despite flooding stress in one trial location. The explanatory power of models 3 and 7, which include cultivar but not marker coding, is likely due to differences in allele frequency in each cultivar population. While cultivar per se poorly explained CBD:THC ratio, cultivar was the best predictor of total potential cannabinoid concentration. It has previously been demonstrated that the factors affecting cannabinoid chemotype are not linked to total cannabinoid content (Grassa et al., 2018).
In testing 14 different cultivars, with 217 plants total across two locations, we found that none of the B_D/B_D samples had a Δ9-THC concentration >0.3% dry weight. However, we did find that most cultivar populations were segregating for the B_T allele. It is possible that differences in cannabinoid production ascribed to changes in environment may in fact be due to sampling of individual plants with B_T alleles. Additional studies of the influence of environment on cannabinoid production coupled with individual plant genotyping may lead to a better understanding of the regulation of cannabinoid production.