Molecular sexing assays in 114 mammalian species: In silico sequence reanalysis and a unified graphical visualization of diagnostic tests

Abstract
Molecular‐based methods for identifying sex in mammals have a wide range of applications, from embryo manipulation to ecological studies. Various sex‐specific or homologous genes can be used for this purpose, and PCR amplification is a common method. Over the years, the number of reported tests and the range of tested species have increased greatly. The aim of the present analysis was to retrieve PCR‐based sexing assays for a range of mammalian species, gather the gene sequences from either the articles or online databases, and visualize the molecular design in a uniform manner. For nucleotide alignment and diagnostic test visualization, the following genomic databases and tools were used: NCBI, Ensembl, Nucleotide BLAST, ClustalW2, and NEBcutter V2.0. In the 45 gathered articles, 59 different diagnostic tests based on eight different PCR‐based methods were developed for 114 mammalian species. The genes most commonly used for the analysis were ZFX, ZFY, AMELX, and AMELY. The tests were most commonly based on sex‐specific insertions and deletions (SSIndels) and sex‐specific sequence polymorphisms (SSSP). This review provides an overview of PCR‐based sexing methods developed for mammals. This information will facilitate more efficient development of novel molecular sexing assays and reuse of previously developed tests. With the rapid increase in the quantity and quality of available genetic information, many novel tests are expected to be developed and previously developed tests to be improved.


| INTRODUCTION
Molecular-based sexing techniques can be used to reliably determine sex in mammals with limited sexual dimorphism. However, even in species with clearly sexually dimorphic traits, molecular sexing serves various purposes, such as embryo sex identification, behavioral and ecological studies, and conservation genetics.
For molecular-based sexing, sex-specific DNA markers are often used, such as the presence of the testis-determining factor gene (SRY) in mammals. In our previous study, we reviewed various molecular-based sexing methods and proposed a unified terminology for sex-specific sequence variants (SSSV) (Hrovatin & Kunej, 2018). These variants can be further divided into three main groups: (a) length polymorphisms, (b) sequence differences, and (c) number (dose) of sex chromosomes. Length differences can arise either from a chromosome-specific number of repeats or from indels specific to either sex chromosome, termed sex-specific indels (SSIndel). Sequence differences encompass Y-chromosome-specific fragments or genes, allele-specific sequences (nonhomologous parts of homologous genes), and single nucleotide variations in homologous genes on the sex chromosomes, termed sex-specific sequence polymorphisms (SSSP).
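The difference between the two SSSV types found on homologous genes can be illustrated on a pairwise alignment of the X- and Y-linked copies of a gene. The sketch below assumes the two sequences are already aligned (for instance, exported from ClustalW2) with `-` as the gap character; it reports each run of gap columns as a candidate SSIndel and each mismatched column as a candidate SSSP. The function name and the toy sequences are hypothetical, and candidates found this way would still need to be checked against population variation data.

```python
def classify_differences(x_aln: str, y_aln: str) -> dict:
    """Group differences between aligned X- and Y-linked gene copies.

    Returns candidate SSIndels as half-open (start, end) runs of gap columns
    and candidate SSSPs as single mismatched columns (0-based alignment indices).
    """
    if len(x_aln) != len(y_aln):
        raise ValueError("aligned sequences must have the same length")
    ssindels = []  # list of (start, end) tuples
    sssps = []     # list of column indices
    for i, (x, y) in enumerate(zip(x_aln.upper(), y_aln.upper())):
        if x == y:
            continue  # identical column (including shared gaps)
        if x == "-" or y == "-":
            # Extend the previous run if it ended at this column, else start a new one.
            if ssindels and ssindels[-1][1] == i:
                ssindels[-1] = (ssindels[-1][0], i + 1)
            else:
                ssindels.append((i, i + 1))
        else:
            sssps.append(i)
    return {"ssindel": ssindels, "sssp": sssps}


# Toy alignment (hypothetical sequences): one substitution and a 3-bp gap.
print(classify_differences("ACGTTAGCAT", "ACCTT---AT"))
# {'ssindel': [(5, 8)], 'sssp': [2]}
```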
We also established minimal requirements for reporting molecular sexing assays, including unified terminology (Hrovatin & Kunej, 2018): species scientific name, species ID, gene name, sequence and ID, sex-specific variant, method, coordinates of relevant regions on the nucleotide sequence, characteristics defining the amplicon system, description of detected amplicons and controls, and reference PMID or WoS ID. However, there is still little overview of existing PCR-based molecular sexing assays, even though PCR-based methods remain the most commonly used. The field lacks a review of existing sexing methods developed for different species; as a consequence, multiple tests have been developed for the same species.
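One way to keep these minimal reporting items together is a simple record type that can also list which items a publication leaves out. The sketch below is a hypothetical structure, not a published schema; the field names are illustrative, and the example entry uses only the species, SSSV, and method of the sheep test shown in Figure 1b.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SexingAssayReport:
    """Hypothetical record holding the minimal reporting items for one assay."""
    species_scientific_name: Optional[str] = None
    species_id: Optional[str] = None            # e.g. a taxonomy ID
    gene_name: Optional[str] = None
    sequence: Optional[str] = None
    sequence_id: Optional[str] = None           # NCBI accession or Ensembl ID
    sex_specific_variant: Optional[str] = None  # e.g. "SSIndel", "SSSP"
    method: Optional[str] = None                # e.g. "PCR", "PCR-RFLP"
    region_coordinates: Optional[str] = None
    amplicon_system: Optional[str] = None
    amplicons_and_controls: Optional[str] = None
    reference_id: Optional[str] = None          # PMID or WoS ID

    def missing_items(self) -> list:
        """Names of reporting items that were left empty."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, "")]


report = SexingAssayReport(species_scientific_name="Ovis aries",
                           sex_specific_variant="SSSP", method="PCR-RFLP")
print(report.missing_items())
```

Here the call lists the eight items still to be filled in, which mirrors the kind of supplementation that incompletely reported articles require.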

While many new tests continue to be published, previously developed tests have not yet been reexamined against recent updates of genomic browsers. Additionally, in many of the examined articles, the methods were not adequately described and the information needed to be supplemented. Finally, the main elements for developing a PCR-based molecular sexing test need to be summarized to make the development of future tests more efficient.
The aim of the present analysis was therefore to: (a) gather reported PCR-based sexing assays for a range of mammalian species and compile a table of the relevant information extracted from the publications, (b) supplement the extracted data with missing genomic information, (c) reexamine the molecular design in silico using data from the latest genomic browsers, (d) unify the graphical visualizations of the sexing tests, and (e) summarize the main elements for designing and reporting a PCR-based sexing test. The molecular sexing tests were visualized in the following steps. Nucleotide BLAST was used for the majority of alignments, and ClustalW was used in cases of large gaps in the sequences. Genetic polymorphisms were extracted from the Ensembl browser and marked on the sequence. For tests that use a restriction enzyme, the enzyme recognition sites in the sequences were retrieved using the NEBcutter V2.0 tool (http://nc2.neb.com/NEBcutter2/). Ensembl genomic browser release 90 was used to retrieve information on genetic variations (Zerbino et al., 2018). For PCR assays based on nonhomologous genes, chromosome ideograms and gene locations were extracted from the Ensembl browser.
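NEBcutter locates recognition sites interactively; the core idea behind predicting a PCR-RFLP pattern can be sketched locally for a single enzyme. The example below assumes a palindromic recognition site written in plain A/C/G/T (EcoRI, GAATTC, cut after the G, used purely as an example), a linear amplicon, and complete digestion; non-palindromic sites would also require a reverse-strand search. It is not the NEBcutter algorithm, and the function name and sequences are hypothetical.

```python
import re


def rflp_fragments(amplicon: str, site: str, cut_offset: int) -> list:
    """Fragment lengths (bp) after fully digesting a linear amplicon.

    site must be a plain A/C/G/T palindromic recognition sequence; cut_offset
    is the top-strand cut position within the site (e.g. 1 for G^AATTC).
    """
    seq = amplicon.upper()
    # A zero-width lookahead also finds overlapping occurrences of the site.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


# Hypothetical 14-bp amplicons differing by one SSSP that destroys an EcoRI site.
x_copy = "AAAAGAATTCAAAA"
y_copy = "AAAAGATTTCAAAA"
print(rflp_fragments(x_copy, "GAATTC", 1))  # [5, 9]
print(rflp_fragments(y_copy, "GAATTC", 1))  # [14]
```

Under these assumptions a female (XX) would show only the 5- and 9-bp fragments, while a male (XY) would additionally show the uncut 14-bp Y-linked product.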

| MATERIALS AND METHODS
For references with incomplete information on nucleotide sequences, we visualized the method with a simple sketch of the sequence, primers, and the SSSV. For each method, the expected results are presented as a visualization of band lengths (bp) on an agarose gel.
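The expected gel pattern follows directly from which chromosome each product comes from. The sketch below assumes a standard mammalian XX/XY system and takes hypothetical product lengths per chromosome; it is an illustration of the reasoning, not the procedure used to draw the figures. Products of identical length collapse into a single band, which is why the sizes of compared products must differ.

```python
def expected_bands(products: dict) -> dict:
    """Expected distinct band sizes (bp) per sex, largest first as on a gel.

    products maps "X", "Y" or "autosome" to the amplicon lengths each
    chromosome yields. Females (XX) lack a Y chromosome; males (XY) carry both.
    Products of identical length collapse into one band.
    """
    shared = set(products.get("X", [])) | set(products.get("autosome", []))
    male = shared | set(products.get("Y", []))
    return {"female": sorted(shared, reverse=True), "male": sorted(male, reverse=True)}


# Hypothetical SSIndel test: X-linked product 220 bp, Y-linked product 180 bp.
print(expected_bands({"X": [220], "Y": [180]}))
# {'female': [220], 'male': [220, 180]}
# Hypothetical Y-specific marker with an autosomal amplification control.
print(expected_bands({"Y": [150], "autosome": [300]}))
# {'female': [300], 'male': [300, 150]}
```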

| RESULTS
The present analysis consisted of the following five main steps: (a) obtaining articles on molecular sexing of mammals and extracting the available data, (b) complementing the missing genomic data and presenting it in tabular format, (c) obtaining SNP locations from the Ensembl browser, (d) visualizing the assays in a unified manner, and (e) summarizing the main elements and guidelines for designing a new PCR-based test for molecular sexing.

| Literature search and data extraction
The 45 retrieved articles were published between 1990 and 2018. A total of 114 different species were sexed in these articles. Because several assays were tested on multiple species, the total number of tests was 161.
The articles were heterogeneous in the information they provided. Most did not report species IDs, gene accession numbers, or sample sizes, and some also lacked electropherograms or any product sizes in base pairs.

| Complementing the missing data and tabular presentation
The data extracted from the articles are presented in tabular format (Table 1). For each test, the following information is presented: common name and scientific name of the species, taxonomy ID, gene name, SSSV, sample size, and method. Additional details are included in the Supporting Information Appendix S1: gene name, primer name, nucleotide sequences of the forward and reverse primer, and annealing temperatures for PCR.
In total, 25 articles reported the sequences used for the assay development accompanied by NCBI accession numbers or Ensembl ID. For 21 articles, the sequences were not provided. Available sequences were obtained from genomics databases for 12 of the articles not containing NCBI accession numbers or Ensembl ID. Ten articles employed nonhomologous genes for their test, so sequence alignments were not necessary for visualization.

| Visualizations of reanalyzed molecular sexing tests
Visualizations of 65 tests for 114 species are presented in Supporting Information Appendix S2, and two examples are shown in Figure 1a,b. Each visualization includes the following elements: article citation, species common name, species scientific name, primers used, sequence alignment (or a chromosome or gene representation where an alignment was not applicable), SSSV on the sequence, restriction enzyme recognition and cleavage sites (where appropriate), expected PCR products for both sexes, and NCBI accession numbers or Ensembl ID.

| Main elements required for development of a new molecular sexing test
In this section, we summarize the minimal information, drawn from the reviewed articles, needed to design a PCR-based sexing test.
Generally, it is useful to obtain reliable genetic information on the species in question, its genes, and SSSVs. Ideally, the products should be amplified in one step, produce unambiguous results, and provide an internal amplification control (Villesen & Fredsted, 2006). The goal is to choose a method compatible with the available laboratory equipment and the intended use. After obtaining the nucleotide sequence, the appropriate SSSV, method, and primer specificity are chosen based on the type and quality of the samples to be used in the research. Three basic elements should be considered while designing the test.

| Primer design
Primers can either be designed to amplify genes of multiple species or be specific to one species. The approach is chosen according to the purpose and resources of the study. Degenerate primers are useful for multiple species, while species-specific primers are usually preferred for samples collected in the field, which might be contaminated with foreign DNA. For example, Sastre et al. (2009) developed a test for wolf fecal samples and tested it on several species of animals likely to be preyed upon by wolves, and Okuyama et al. (2014) designed a raccoon-specific test, which would also prevent species misidentification of samples collected in the wild.

FIGURE 1 (a) A visual representation of the design of a sex determination test using a PCR method for the domestic dog, containing an SSIndel. (b) A visual representation of sex determination using a PCR-RFLP method for sheep, containing an SSSP.
Degenerate primers intended for a larger number of species usually target genes conserved across species (Aasen & Medrano, 1990). Primers can be derived from a consensus sequence (Bidon et al., 2013; Fredsted & Villesen, 2004; Morin et al., 2005).
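Deriving a degenerate primer from a consensus sequence can be sketched with the standard IUPAC nucleotide ambiguity codes. The example assumes the input sequences are already aligned over the primer site and contain no gaps; the function names and the three toy sequences are hypothetical. Reporting the fold degeneracy alongside the primer is useful because a degenerate primer is a mixture, and each exact-match variant makes up only a fraction of it.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they denote.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
# Fold degeneracy of each code (number of bases it stands for).
FOLD = {code: len(bases) for bases, code in IUPAC.items()}


def degenerate_consensus(aligned: list) -> str:
    """Collapse gap-free aligned sequences into one IUPAC-coded consensus."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need one or more aligned sequences of equal length")
    consensus = []
    for column in zip(*(s.upper() for s in aligned)):
        bases = frozenset(column)
        if not bases <= frozenset("ACGT"):
            raise ValueError(f"gap or ambiguous base in column {column}")
        consensus.append(IUPAC[bases])
    return "".join(consensus)


def degeneracy(primer: str) -> int:
    """Number of distinct exact sequences in a degenerate primer."""
    total = 1
    for code in primer.upper():
        total *= FOLD[code]
    return total


# Hypothetical primer-site alignment from three species.
primer = degenerate_consensus(["ACGTAC", "ACATAC", "ATGTAC"])
print(primer, degeneracy(primer))  # AYRTAC 4
```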

| Product size
Defining the optimal product size and size difference between the products is necessary for sexing and amplification success.
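Product sizes and their difference can be checked in silico before any laboratory work. The sketch below assumes exact primer matches on the top strand of each template and a user-chosen minimum size difference (`min_diff_bp`, a hypothetical parameter whose appropriate value depends on the gel type and concentration); it is a simplification of dedicated in silico PCR tools, and all names and sequences are illustrative.

```python
def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Product length (bp) between exact primer matches on the top strand, else None."""
    t = template.upper()
    start = t.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the top strand as its reverse complement.
    end = t.find(reverse_complement(reverse), start + len(forward))
    if end == -1:
        return None
    return end + len(reverse) - start


def sizes_resolvable(a_bp: int, b_bp: int, min_diff_bp: int) -> bool:
    """True if two products differ by at least min_diff_bp (user-chosen threshold)."""
    return abs(a_bp - b_bp) >= min_diff_bp


# Hypothetical X- and Y-linked templates differing by a 20-bp SSIndel.
x_template = "TT" + "ACGTAC" + "G" * 30 + "CATGCA" + "TT"
y_template = "TT" + "ACGTAC" + "G" * 10 + "CATGCA" + "TT"
x_bp = amplicon_length(x_template, "ACGTAC", "TGCATG")
y_bp = amplicon_length(y_template, "ACGTAC", "TGCATG")
print(x_bp, y_bp, sizes_resolvable(x_bp, y_bp, min_diff_bp=15))  # 42 22 True
```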

| Internal amplification controls
Internal PCR amplification controls confirm successful amplifications and thus increase the reliability of the test. Often, X-specific or autosomal products are utilized. They are necessary because absence of a male-specific signal can be the result of an unsuccessful PCR reaction.
Usually, the Y-specific product is the diagnostic component and the X-specific (or autosomal) product is the amplification control.
The amplification control is present in all samples and indicates a successful PCR reaction, while the presence or absence of the diagnostic (Y-specific) product determines the sex. Bidon et al. (2013) even amplified two independent Y-specific genes (in addition to the amplification control) to decrease the chance that a diagnostic Y-chromosome signal would be missing because of a failed amplification.
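The interpretation rule described above can be written as a small decision function. This is a hypothetical sketch: it assumes that a sample without a visible control product is never called (a conservative choice, even if a Y-specific band is present) and that, when several independent Y-specific markers are used, any one of them is sufficient for a male call.

```python
from enum import Enum


class SexCall(Enum):
    MALE = "male"
    FEMALE = "female"
    INCONCLUSIVE = "inconclusive"


def call_sex(control_band: bool, y_bands: list) -> SexCall:
    """Interpret one sample from a Y-specific assay with an internal control.

    control_band: whether the amplification-control product is visible.
    y_bands: one flag per independent Y-specific marker (one or more).
    """
    if not control_band:
        # Conservative assumption: without the control the reaction is not trusted.
        return SexCall.INCONCLUSIVE
    # Any Y-specific product indicates a male; a second marker guards against
    # a single failed Y amplification being read as a female.
    return SexCall.MALE if any(y_bands) else SexCall.FEMALE


print(call_sex(True, [True, False]).value)   # male
print(call_sex(True, [False, False]).value)  # female
print(call_sex(False, [False]).value)        # inconclusive
```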
Tests that amplify homologous X- and Y-specific genes with the same primer pair already include the internal control.
Nevertheless, an additional primer pair for a Y-specific gene (mostly SRY) can still be included when developing a method, in order to corroborate the results (Lindsay & Belant, 2007;Malik et al., 2013;Morin et al., 2005).

| DISCUSSION
The present study contains a collection of information on a range of PCR-based sexing tests, making information on existing assays, such as primers, genes, SSSVs, and the expected results of specific tests, easier to access. Missing information from the articles, such as official gene names and accession numbers for the sequences used for sexing, is also supplemented. The unified visualizations present sequence alignments of the PCR sexing assays and their expected results. To our knowledge, this study is the first to review and reanalyze the existing sexing assays. Future studies should explore whether recently discovered sequence variants affect previously developed sexing assays. The three main elements of designing a PCR-based sexing assay presented in this study will help in the development of new tests where necessary.
While applying bioinformatics methods for in silico development of new genetic sexing assays can help produce reliable tests in the future, the importance of confirmation with larger sample sizes should not be overlooked, because the genes of interest may vary within the population. However, the increasing availability of annotated genomic data (especially data containing information on possible SSSPs and SSIndels) can also help develop more reliable assays while decreasing the need for large sample sizes, especially when samples are not readily available. To facilitate review of existing and upcoming sexing assays, a searchable database should be developed.
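A searchable catalogue of assays could start from a single table and a name search. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the schema is hypothetical, and the only rows are the species, SSSV, and method of the two tests shown in Figure 1, so it illustrates the idea rather than proposing a design for the database.

```python
import sqlite3


def build_catalogue(rows):
    """Create an in-memory table of assays (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE assay (species_common TEXT, species_scientific TEXT, "
        "sssv TEXT, method TEXT)"
    )
    con.executemany("INSERT INTO assay VALUES (?, ?, ?, ?)", rows)
    return con


def find_assays(con, term: str):
    """Search common or scientific names; SQLite LIKE is case-insensitive for ASCII."""
    pattern = f"%{term}%"
    return con.execute(
        "SELECT species_common, sssv, method FROM assay "
        "WHERE species_common LIKE ? OR species_scientific LIKE ? "
        "ORDER BY species_common",
        (pattern, pattern),
    ).fetchall()


# Example rows taken only from the two tests shown in Figure 1.
con = build_catalogue([
    ("domestic dog", "Canis lupus familiaris", "SSIndel", "PCR"),
    ("sheep", "Ovis aries", "SSSP", "PCR-RFLP"),
])
print(find_assays(con, "ovis"))  # [('sheep', 'SSSP', 'PCR-RFLP')]
```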

ACKNOWLEDGMENTS
This work was supported by the Slovenian Research Agency (ARRS) through the Research program P4-0220.

CONFLICT OF INTEREST
None declared.

AUTHOR CONTRIBUTIONS
Data curation, data synthesis, visualization, and writing: R.S.; study design, writing, and coordination of the study: T.K.

DATA ACCESSIBILITY
Sequences used for alignments were downloaded from NCBI and Ensembl; their accession numbers are provided in Supporting Information Appendix S1.