Evolution of genes involved in the unusual genitals of the bear macaque, Macaca arctoides

Abstract Genital divergence is thought to contribute to reproductive barriers by establishing a “lock‐and‐key" mechanism for reproductive compatibility. One such example, Macaca arctoides, the bear macaque, has compensatory changes in both male and female genital morphology as compared to close relatives. M. arctoides also has a complex evolutionary history, having extensive introgression between the fascicularis and sinica macaque species groups. Here, phylogenetic relationships were analyzed via whole‐genome sequences from five species, including M. arctoides, and two species each from the putative parental species groups. This analysis revealed ~3x more genomic regions supported placement in the sinica species group as compared to the fascicularis species group. Additionally, introgression analysis of the M. arctoides genome revealed it is a mosaic of recent polymorphisms shared with both species groups. To examine the evolution of their unique genital morphology further, the prevalence of candidate genes involved in genital morphology was compared against genome‐wide outliers in various population genetic metrics of diversity, divergence, introgression, and selection, while accounting for background variation in recombination rate. This analysis identified 67 outlier genes, including several genes that influence baculum morphology in mice, which were of interest since the bear macaque has the longest primate baculum. The mean of four of the seven population genetic metrics was statistically different in the candidate genes as compared to the rest of the genome, suggesting that genes involved in genital morphology have increased divergence and decreased diversity beyond expectations. These results highlight specific genes that may have played a role in shaping the unique genital morphology in the bear macaque.


| INTRODUC TI ON
In species with internal fertilization, genital morphology may have complex and rapid evolution. This has been demonstrated in males (Eberhard, 1993;Klaczko et al., 2015;Langerhans et al., 2016) and females (Greenway et al., 2019;Simmons & Fitzpatrick, 2019;Sloan & Simmons, 2019). There have been multiple proposed explanations for this including (1) species isolation via a "lock-and-key" mechanism, (2) sexual conflict, and (3) cryptic female choice (Eberhard, 1985;Sloan & Simmons, 2019). The first explanation suggests that male and female genitalia uniquely coevolve to prevent heterospecific mating. It has historically been dismissed as a general explanation for genital evolution but is acknowledged to play a role in particular cases (Eberhard, 1985(Eberhard, , 2010. One reason for this dismissal is a presumed lack of variability in female genitalia between species, which suggests the "lock" (female genitalia) is rarely a unique match for the "key" (male genitalia) (Eberhard, 1985;Sloan & Simmons, 2019). However, studies in beetles (Sota & Kubota, 1998), flies (Kamimura & Mitsumoto, 2012), and damselflies (Barnard et al., 2017) have demonstrated reproductive incompatibility between closely related species caused by the mechanical mismatch in heterospecific crosses, suggestive of a "lockand-key" mechanism (Sloan & Simmons, 2019). Sexual conflict may involve either conflict between males trying to obtain mates or conflict between males and females in levels of investment in offspring (Eberhard, 1985). Cryptic female choice happens when male reproductive output is affected by the female after copulation (Eberhard, 1985(Eberhard, , 1996. These processes both involve competition that may also result in species-specific genitalia. For example, a prediction of cryptic female choice is that female genitalia evolve more rapidly than male genitalia. This can result in compensatory coevolution of male genitalia that causes reproductive isolation between species (Greenway et al., 2019;Langerhans et al., 2016;Sloan & Simmons, 2019). In support of this, it has been found that male and female genitalia coevolve in internal fertilizing poecilid fishes (Greenway et al., 2019) and in dung beetles female genitalia diverge more quickly than male genitalia (Simmons & Fitzpatrick, 2019). Additionally, it has been found that females in populations of dung flies exhibiting sexual conflict have greater preference for conspecific males than females in monogamous populations (Martin & Hosken, 2003).
The bear macaque, Macaca arctoides, is unique among macaques (and primates generally) in its genital morphology, with distinctive genital characteristics in both sexes (Figure 1). In males, it has the longest baculum of any primate (4-6 cm; Fooden, 1990), even when corrected for body weight (Dixson, 1987;Fooden, 1990). In addition to its elongation, the urethral opening is located on the ventral side of the penis, altering the overall morphology relative to other macaque species (Fooden, 1990). In females, M. arctoides female genitalia appears uniquely coadapted to their male counterparts in both length and morphology (Fooden, 1967). Given these compensatory changes, it has been hypothesized that genital evolution in this species followed a "lock-and-key" mechanism, which is likely to be rare among primates (Dixson, 1998).
Beyond genital morphology, the mating behavior of the bear macaque is also atypical among macaques. In most other macaques, males and females mount multiply, but in contrast -and perhaps due to the altered male genital morphology -bear macaques engage in a single-mount ejaculatory mating strategy with a prolonged sitting where the males deposit an ejaculatory plug in females (Fooden, F I G U R E 1 Diverged genital morphology in male and female bear macaques. M. arctoides has divergent male and female genital morphology (top), relative to M. mulatta (bottom) representing mechanical isolation between these species. M. mulatta morphology is representative of other species in macaques. Male external genitalia for each species are shown on the left (from figure 3 of Dixson, 1987) and female reproductive tracts are shown on the right (from figure 4 of Fooden, 1967). Reproduced with permission. Divergent traits for male external genitalia include size, shape, and color. Divergent traits for female reproductive tracts include depth of the uterine cavity, width of the vagina, and opening size. Both males and females differ in the urethra location. Scales on the left are in 1 cm divisions. Note: the baculum, which is an internal male genital structure, is divergent in size and shape (not pictured) 1990). However, most studies on macaques on mating behavior have used M. arctoides as the sole representative of the sinica species group (Dixson, 1987;Dixson & Anderson, 2004). When considering Macaca radiata, also a member of the sinica species group, studies have found they too favor a single mount mating style similar to M. arctoides, suggesting this is not a unique feature of this species, but instead representative of the sinica species group (Shively et al., 1982). Similar to M. arctoides, M. radiata also exhibits a postejaculatory sit. Since both species have larger bacula than species in the fascicularis group, the hypothesis that increased baculum length aids in prolonged intromission is supported.
Additional distinguishing characteristics of the bear macaque that are not related to genital morphology include whitish pelage in newborns, and a bald forehead and cheeks (the latter evident in Figure 2a) (Fooden, 1990). Further, the evolutionary history of this species has been noted as extremely complex with early molecular work revealing phylogenetic incongruence between mitochondrial and Y-chromosomal genealogies for M. arctoides (Tosi et al., 2000(Tosi et al., , 2003 Jiang et al., 2016;Li et al., 2018;Li, Han, et al., 2009).
Similarly, both morphological and Y-chromosomal trees support placement in the sinica group Jiang et al., 2016;Tosi et al., 2000) (see Table S1 in Fan et al., 2018 for review). This phylogenetic incongruence was hypothesized to be consistent with ancient hybridization between the ancestor of the fascicularis species group and a sinica species group member (an ancestor of either M. assamensis or M. thibetana) contributing to the putative hybrid speciation of M. arctoides (Tosi et al., 2003). In contrast to allopolyploid speciation, homoploid hybrid speciation (HHS) conserves chromosome number (Mallet, 2007;Mavárez & Linares, 2008;Schumer et al., 2014). This hypothesized hybrid origin is diagrammed in Figure S1A, with a range of possible split times indicated by the thick purple line.
A recent analysis of macaque genomes supports introgression between M. arctoides and members of the fascicularis and sinica species groups , both of whom it overlaps geographically ( Figure 2b). These species groups are estimated to have split ~3 million years ago. Recent phylogenetic studies indicate introgression may have occurred after the split of M. mulatta and M. fascicularis Roos et al., 2019), which is estimated at 1.68 mya Jiang et al., 2016;Stevison & Kohn, 2009). This timing overlaps glacial periods of the Pleistocene which is associated with a reduced expanse of Southeast Asian forests into multiple refugia. Ecological isolation is a common feature of other scenarios of HHS (Gompert et al., 2006). In the case of M. arctoides, isolation is hypothesized to have contributed to the extensive interspecific hybridization, forcing the interbreeding of species that otherwise might have had alternative options for mating. Available phylogenetic evidence suggests that this interbreeding subsided ~1.5 mya (Tosi et al., 2003), with limited fossil evidence of hybridization beyond the late Pleistocene (Fooden, 1990). The evidence for historical hybridization suggests that the derived genital morphology may have been influenced by or evolved despite extensive genetic exchange with other macaque species (Tosi et al., 2000(Tosi et al., , 2003. Here, we investigated the potential role of hybridization in shaping the unusual genital morphology of M. arctoides. We hypothesized that if the unique genital morphology contributes to reproductive isolation, then genes involved in genital morphology would show extreme population genetic signatures for M. arctoides. To test this hypothesis, we compiled a list of 2284 candidate genes associated with genital morphology in other organisms, with a specific focus on mammals. We used an existing recombination rate map for M. mulatta to identify outliers in several population genetic metrics, such as introgression, nucleotide diversity, and divergence. Phylogenomic and population genetic approaches were used on several extant macaque species (M. arctoides; the sinica group species M. thibetana and M. assamensis; and the fascicularis group species M. fascicularis and M. mulatta). We found that 67 of the candidate genes were significant genome outliers in at least one of seven population genetic metrics. Further, the mean across 4/7 population genetic metrics was significantly different in the candidate genes as compared to all other genes in the genome. Interestingly, three dN/dS outliers were associated with putative baculum morphology in mice. Overall, our results suggest that genital evolution in the bear macaque involved a large number of genes associated with various phenotypic changes in both males and females.

| Samples and Genome Sequencing
One female each from M. arctoides (Malaya) and M. assamensis (A20) were sequenced. The M. arctoides sample was from the Malaka Zoo in Malaysia (previously described in Tosi et al., 2000). These two samples were multiplexed and sequenced on one lane of sequencing for initial quality assessment. They were then subsequently run on one additional lane of sequencing each for a final 1.5 lanes per sample. Sequencing was done at UCSF Medical Center Sequencing Facility on an Illumina HiSeq 2000 machine. Each run was assigned as different read groups to be treated as independent in the variant calling workflow. The raw read data have been deposited in the NCBI sequence read archive and are available via the accession PRJNA622565 (see Table 1).

| Publicly available sequences
Raw FASTQ files were downloaded from 8 public genome samples for this project (Table 1). Genome samples were selected to be representative of the taxa relevant for M. arctoides evolution. For the fascicularis group, many more public sequences were available than used here, however, we selected two sequencing projects and included all samples from both projects, including those with lower coverage (<10X, see Table 1). This sample size was chosen to allow for equal representation across the various species groups. Additional sample details can be found in the corresponding publications (Fan et al., 2014Yan et al., 2011). These were downloaded from NCBI SRA using sratoolkit version 2.8.1 with options to split data into read pairs and in the original format (ncbi/sra-tools, 2008). Because M. thibetana was sequenced on an older platform, we used seqtk (Li, 2019) to convert from Q64 to Q33. Additionally, SRA files from multiple lanes as indicated in the FASTQ header were split for independent alignment to the reference genome facilitating the definition of separate read groups to be treated as independent in the variant calling workflow.

| Alignment, genome analysis, and variant calling
Raw FASTQ files from all samples were aligned to the reference genome rheMac8 (NCBI Accession: PRJNA214746). The masked reference genome was downloaded from UCSC. The reference genome for rheMac8 includes 20 autosomes, a pair of sex chromosomes, a mitochondrial genome, and 284,705 unnamed scaffolds of varying length. The total length of all scaffolds is 3.2 Gb.
The total length of all named chromosomes is 2.8 Gb. GATK best practices were followed to obtain high-quality variant sites (see Supplementary Methods and Tables S1 and S2). Briefly, reads were aligned to the reference genome, duplicates were marked, indels were realigned, base quality scores were recalibrated, variants were called, and variant quality scores were recalibrated. Samples from each species were processed independently and merged into a final callset (see Data Availability). The baboon reference genome was added to the variant files via two-way genome alignment and custom scripts that are available via GitHub. These final filtered files for each chromosome with the baboon information are freely available and were used for all subsequent analyses. A more detailed workflow is described in both the supplement as well as the GitHub page for the project, which includes command prompts and custom scripts (see Data Availability).

F I G U R E 2
Summary of focal species, Macaca arctoides, its evolutionary relationships, and the data used. (a) M. arctoides has unique genital morphology and is a putative homoploid hybrid species. Image "Stumptail Monkeys" from (Waal, 2004) by Frans de Waal licensed under CC. (b) The geographic range of M. arctoides and other focal species in this study. M. arctoides has present-day geographic overlap with members of both species groups (redrawn from Fooden, 1980). Background image "Topographic map of East Asia" by Ksiom is licensed under CC BY-SA 3.0. This study included WGS from M. arctoides and representative members of each species group (Table 1). Baboon was used as an outgroup species. (c) The intersection of heterozygous sites between the five species. The output of vcf-compare from vcftools (Danecek et al., 2011) was input into the UpSetR package (Conway & Gehlenborg, 2019) to plot the intersection of sites. Values in the left plot match count of heterozygous sites per sample in Table 1

| Four-taxon test for introgression
To analyze patterns of introgression in sliding windows, a modified four-taxon test was used ( Figure S2). Rather than separately analyzing gene flow between M. arctoides and each parental species group, as has been done previously , we relied on the modified statistic f dM (Malinsky et al., 2015), which is suggested to perform better than other related statistics based on simulation results (Martin et al., 2015). Specifically, hybridization between the three samples of M. arctoides (P3) was tested with samples from the sinica species group (P1) and the fascicularis species group (P2), using the baboon reference genome as an outgroup (papAnu4; see Supplement for details). Our approach differed from the typical way these tests are setup where one may only be interested in shared ancestry between two pairs of taxa, (P1:P3 and P2:P3), and the P1 and P2 taxa are more closely related to each other than either are to P3 (see figure 2 in Kulathinal et al., 2009). However, this modification allowed for the quantification of shared ancestry from both putative parents together, with a single metric ranging from −1 to 1. We note that previous work had similar results when using the more traditional four-taxon test setup for these taxa . To further justify this modified approach, this analysis was repeated with individual taxa from each species group (Table S3). Genome-wide averages for both D and f dM were similar when a single representative of each species group was included (Table S3), suggesting that grouping the species did not qualitatively change the overall results (see Supplementary Methods).
Calculations of f dM were done using scripts available on GitHub (Martin, 2019) and using several differently sized genomic windows.

| Topology weighting
To analyze the weights of different phylogenetic topologies across the genome, the software Twisst was used to compare the weights of different species topologies throughout the genome (Martin & Belleghem, 2017;Martin et al., 2019). First, data used for the introgression analysis were pruned to make sure that each group had at least one member represented in the genotype data. Next, neighborjoining trees were constructed in sliding windows of 50 SNPs each across the genome using Phyml from scripts available on GitHub (Martin, 2019). This generated Newick formatted tree files that were used as input for Twisst. The resulting weight distributions among the three possible topologies were summed across the genome ( Figure 2d).
The topology weighted outputs were further explored by collapsing adjacent intervals with the same majority topology supported. Since there were three possible topologies, the majority topology was defined as any one topology that had at least twothirds of the total weight values. Any intervals without a majority topology were labeled as "unresolved." If adjacent intervals supported the same majority topology, they were collapsed. The data were subsequently split into the three major topologies. These were intersected with gene regions from the Ensembl annotations for rhe-Mac8 downloaded from UCSC.
Bootstrap confidence intervals for topo1, topo2, topo3, and unresolved proportions were computed by resampling 2792 1 Mbp intervals along the genome (149 when considering only chromosome X) for each of 10,000 replicates and taking the 2.5% and 97.5% quantiles. Note: A list of samples used in this study with a summary of genome information. Two samples were sequenced as part of this study (in bold) and the remainder were downloaded from NCBI.

| Divergence analysis
Based on the three possible topologies, nucleotide divergence was calculated as D XY using scripts available on GitHub (Martin, 2019).
First, stringency in defining the majority topology was increased to require 100% of the weights to be on only one of the three to-

| Candidate gene selection
The unique genital morphology of both males and females in the bear macaque ( Figure 1) led us to focus our study on candidate genes that have been identified in other studies to be associated with genital morphology. We conducted a literature search for terms associated with genital morphology such as "disorder of sex development," "DSD," "baubellum," and "baculum," the latter two terms included due to the extremely large baculum of the bear macaque, and the female equivalent structure. We filtered to remove genes that are solely associated with either fertility or adult hormonal disorders/ shifts as these conditions are unlikely to affect the development of genitalia. Included in our initial list of 141 genes (Table S4) were genes associated with gonad dysgenesis (Bamshad et al., 1997;Ekici et al., 2013;Lavi et al., 2020;Witchel, 2018), female masculinization (Wang et al., 2020), genital tubercle formation (Haraguchi et al., 2000), baculum morphology (Schultz, Ingels, et al., 2016), hypogonadism (Délot et al., 2017), gonad differentiation (Witchel, 2018), gonad development (Witchel, 2018), sex determination (Délot et al., 2017), and sex differentiation (Délot et al., 2017).
In addition to our literature search, we used the Human Phenotype Ontology (HPO) (Robinson et al., 2008)  Note: In addition to a manually curated list from the literature, several phenotype terms were mined from the Human Phenotype Ontology (HPO) and the Mammalian Phenotype (MP) Databases. Here, the number of genes associated with each term is listed along with the number of orthologs in rheMac8. Further, the number of genes used in our analysis where we calculated seven population genetics metrics are listed along with the final number of outliers per candidate gene set.
a set of 1554 genes. As expected, there were several genes that overlapped across each set, but also many genes unique to each query. In sum, our final candidate gene list comprised 2284 genes from the literature, as well as the HPO and MP databases that had orthologs in the rheMac8 reference genome (Table 2). Of these 2284 genes, only a subset was used in the calculations of each population genetic metric based on the constraints of these subsequent analyses. The major limiting factor was the data for recombination rate, which was used to identify genomic outliers (see next section). The total number of genes analyzed for each statistic is reported in Table S6 and summarized in Table 2. For most statistics, 1055 candidate genes were included in the population genetic analysis for outliers. This methodology was previously described in Bailey and Stevison (2021). Briefly, we used the three possible gene tree topologies to estimate a branch-specific dN/dS for the M. arctoides branch for every gene in the genome. Estimates where dS was 0 and dN/dS was greater than 10 were excluded as these genes had insufficient divergence to obtain valid estimates (David et al., 2020;Villanueva-Cañas et al., 2013). Then, the topology weights for each gene window from our Twisst analysis were used to weight the dN/dS estimates for each gene across the three gene trees.

| Outlier analysis
To further identify extreme values of each metric, we took into consideration that many of these statistics have been shown to be correlated with recombination rate, which can impede the detection of outliers, specifically enriching for outliers in regions of low recombination rate (Stevison & McGaugh, 2020;Wolf & Ellegren, 2017). This approach should also help to account for the potential bias of several of these statistics in regions of low diversity, which is highly correlated with the recombination rate. Therefore, we also  (Coop & Przeworski, 2007). Therefore, we instead used a 5% cutoff for extreme values of this statistic instead of the Cook's distance approach. This percentage produced fewer genome-wide outliers than the average of all other metrics, suggesting it as an appropriate cutoff (Table S6).
Genes identified as genome-wide outliers were then compared against our set of a priori candidate genes to identify outliers that were putatively involved in genital morphology. A chi-square test was conducted to determine if the number of outliers that were in our candidate gene set was more than expected due to random chance. A summary of the regression results and chi-square tests can be found in Table S6, and the resulting 67 outlier genes that overlapped our candidate gene set can be found in Table S7. BCFtools (version 1.11) (Li, Handsaker, et al., 2009) Table S7.

| Permutation analyses
Permutation tests were used to identify differences in various sum- the results are reported in Table S6.
In addition to this, we conducted permutation analyses on genes categorized as being involved in either male or female genital development or abnormalities according to the HPO and MP databases.
Specifically, the categories used were HP:0010461, HP:0010460, MP:0009198, and MP:0009208 (see Table S5). This was done to examine the possibility of male and female genital morphology evolving at different rates (see Introduction). These analyses parallel the above in examining the same summary statistics, though all tests were two-tailed as there was not a specific expectation for the direction of evolution.

| RE SULTS
Genomic analysis of five macaque species revealed a mosaic of evolutionary ancestry with respect to two major species groups.
These analyses used newly sequenced whole genome samples from  (Fooden, 1980). However, the two-way intersection between M. arctoides and M. thibetana had the fewest shared heterozygous sites, which is likely due to the low overall heterozygosity in the M.

| Extensive Mosaicism of the M. arctoides Genome
The four-taxon test ( Other window size results are in Figure  S3 and plots using individual M. arctoides samples are in Figure S4  This sliding window analysis was repeated with several window sizes ( Figure S3), and all had a similar genome-wide mean f dM . In a previous study, estimates of Patterson's D (Green et al., 2010) were similar to the f dM estimates here (see Table S8 in ).
Phylogenetic relationships among the species groups were evaluated via the software Twisst (Martin & Belleghem, 2017). This method quantifies phylogenetic relationships across the genome and returns weights in sliding windows (Figure 2d). The topology weights were used to define a "majority topology" as having twothirds or more of the sum of weights in any region. Windows that did not meet this criterion were re-classified as "unresolved." This analysis showed the proportion of the genome that supported "topo2" (fas,(arc,sin)), which groups M. arctoides with the sinica group, as

| Investigation of outlier genes associated with genital morphology
Our gene-centered approach led to the calculation of seven population genetic statistics across all genes in the rheMac8 annotation. In total, our analysis focused on 10,543 genes for which we were able to get reliable estimates for each metric, including recombination rate (Table S6). However, for the dN/dS calculations, this baseline number of genes was far fewer with only 1903 due to the limitation of our approach (see Methods). Of the 2284 genes that were included on our candidate gene list (Table 2 and S4) and also had orthologs in rheMac8, only 1055 were included in our analyses (Table 2 and S6). Of the genome-wide outliers across these metrics, 67 of these 1055 genes were statistically elevated as potential outliers for further investigation (Table S7).
A chi-square test was used to compare the proportion of outlier candidate genes (OCGs) against those analyzed to the expected number of outliers based on the remainder of the genes in the genome (Table S6). This analysis revealed that there were no more outliers than expected by chance, suggesting that the 67 outliers are not more than would be expected of any other set of candidate genes. However, a permutation test was used to compare the difference between means across the seven population genetic metrics found that for four metrics, the mean of the candidate genes was significantly different than expected based on the value across the remainder of the genes in the genome (Figure 4). These metrics showed that the nucleotide diversity, π, across the candidate genes was significantly lower than expected (Figure 4a). Additionally, both measures of F ST between M. arctoides and the two species groups revealed higher than expected nucleotide divergence (Figure 4c,d).
Similarly, D XY between M. arctoides and the fascicularis species group was much higher than expected based on 100 permutations of a similar number of genes that were not part of the candidate gene set ( Figure 4b).
An investigation into the 67 genes found that three genes were OCGs across four statistics (CEP19, GATA3, and PTPN23). Three more genes were OCGs across three statistics (AIRE, CSPP1, and SNAI2  (Table S7).
Despite the uncertainty of hits in the larger databases, seven of the OCGs were on the original literature survey for candidate genes (Table 2 and S4), which were more carefully selected based on their role in genital morphology. Of these, GATA3 was an outlier across multiple statistics, which was initially included due to its known association with urogenital anomalies (Délot et al., 2017), with over 20 years of research on this gene (Lemos & Thakker, 2020), including knockout studies (Grote et al., 2008). Another of these seven is HFE, which is associated with hereditary hemochromatosis (HH) and subsequent hypogonadism (Délot et al., 2017) due to decreased iron circulation (Bezwoda et al., 1977;McDermott & Walsh, 2005), with symptoms such as decreased libido and testicular atrophy in humans (Kelly et al., 1984). In our SNP effect analysis, GATA3 had 1 predicted moderate impact variant and 8 low impact variants (Table S7).
Additionally, two of the seven OCGs from the literature survey gene list were CYP proteins (CYP17A1 and CYP21A2). Both CYP17A1 and CYP21A2 are associated with congenital adrenal hyperplasia (CAH aka 17OHD), a disorder that leads to abnormal sexual development (Délot et al., 2017), with CYP21A2 mutations being the most common cause of CAH (Koppens et al., 1992).
CYP21A2 mutations in humans have been associated with simple virilization, in which female genitalia are masculinized during development (Koppens et al., 1992). Several cohort studies have found genotype-to-phenotype associations linking CYP21A2 mutations to a variety of CAH phenotypes (de Carvalho et al., 2016;Soardi et al., 2008;Torres et al., 2003). Additionally, protein structure analysis has been done to examine the impact of specific mutants on a form of CAH known as the salt-water wasting disease, which if left undiagnosed leads to low sodium levels and death (Haider et al., 2013). CYP17A1 is heavily involved in steroidogenesis, with

F I G U R E 4 Permutation test results.
For each panel, the red line is the difference between the mean of all genital genes and all other genes for a given summary statistic. Additionally, a histogram of the results from 100 permutations where the mean difference was recalculated using the same number of randomly selected genes as the candidate genital genes is shown. The various summary statistics plotted here are ( Table S6 M. arctoides π deletion mutations leading to hormone production inhibition and sex development abnormalities (Keskin et al., 2015;Papi et al., 2018;Turkkahraman et al., 2015;Xia et al., 2021;Zhang et al., 2015). Recent genome studies of 17OHD patients revealed 46,XY karyotypes in physically presenting females which co-occur with multiple mutations in the CYP17A1 gene (Xia et al., 2021;Zhang et al., 2015). CYP21A1 had two high, 21 moderate, and 32 low impact variants, whereas CYP17A1 had two moderate and eleven low impact variants (Table S7). Additionally, CYP21A1 had the highest proportion of silent and nonsense variants, as well as the 2 nd highest proportion of missense variants ( Figure 5).
Three more OCGs on the original literature candidate gene list that were outliers for dN/dS were identified in a study as major candidates for baculum morphology in mice (Schultz, Ingels, et al., 2016) (DACT1, PTPRC, and ASPM; Figure 5). This is particularly intriguing when one considers the unique baculum morphology of the os penis in the bear macaque relative to all other primates (see Discussion).
Because these were outliers for dN/dS, which is targeted to the  (Table S5).

| DISCUSS ION
Genital morphology is an important trait that may evolve rapidly.
In the bear macaque, the divergent genital morphology appears to be consistent with coevolution between males and females following a "lock-and-key" mechanism. We examined the intersection of genome-wide outliers in several population genetic metrics with genes associated with genital morphology in mammals to identify genes that might explain the unique genital evolution of the bear macaque.

| Genital morphology genes have higher divergence and lower diversity than expected
Although the number of outliers from our genital candidate genes was not more than expected by chance, the average divergence of these genes was higher than expected and the diversity was lower than expected ( Figure 4). None of the permutations overlapped the observed mean difference between the actual candidate gene set for four of the seven population genetic metrics. Specifically, the candidate genes had lower π which is consistent with recent positive selection on these genes that would have reduced local genetic variation. Similarly, F ST between M. arctoides and both species groups was higher than expected, which is also consistent with positive selection on this subset of genes as compared to the remainder of the genome. Additionally, D XY between M. arctoides and fascicularis species was elevated, but not D XY between M. arctoides and sinica species. These results suggest that the rate of evolution of these genes is higher than the rest of the genes in the genome, confirming the importance of genital morphology in the evolution of this species. A possible explanation for the difference in the two measures of nucleotide divergence is that F ST is a relative measure of divergence, whereas D XY is an absolute measure (Noor & Bennett, 2009). Here, the majority of the genome clusters with sinica species, which could explain why D XY between M. arctoides and the sinica species group does not have more outliers than expected.
Of 67 candidate genes identified as outliers, many are involved in a variety of phenotypes. Of course, because these are developmental phenotypes, it is not surprising that the functions of these genes would be highly pleiotropic. However, their specific role in contributing to the divergent genital morphology and subsequent reproductive isolation in this species is unclear. Of note, the number of genes in both phenotype ontology databases associated with male genital abnormalities is nearly 2x the number of genes associated with female genital abnormalities (Table 2). This is not likely due to underlying differences in the genetics of these traits, but instead more likely reflects a bias toward research into male reproductive morphology as compared to female reproductive abnormalities (Sloan & Simmons, 2019). Still, for the most part, the mean values across our various metrics were not significantly different between these sets of male vs. female genes (

| Divergent baculum morphology putatively associated with dN/dS outliers
Regardless of the inconclusiveness of the genes that were found based on phenotype ontology methods, 3/21 outliers for dN/dS were genes previously identified in mice as candidate genes for morphological variation in the baculum (DACT1, PTPRC, and ASPM). The baculum, or penis bone, is a novel male internal genital structure in placental mammals that evolved ~145 million years ago (Brindle & Opie, 2016), with the bear macaque, having the longest primate baculum. Since that time, it has been gained or lost multiple times, including a recent loss in Homo sapiens (Carosi & Scalici, 2017;. Despite its remarkable interspecific variation, the penis bone generally does not vary extensively within species (Ramm et al., 2009), and maybe the only physical trait that differs consistently between closely related species, making it a key morphological character for species identification (Carosi & Scalici, 2017).
However, the genetics of morphological variation in this trait is only beginning to be understood. In 2016, a morphometric analysis using micro-CT scans of bacula in mice identified three major quantitative trait loci, two that together explained 50.6% of the variation in baculum size, and one that explained 23.4% of the variation in baculum shape (Schultz, Ingels, et al., 2016). These regions were further narrowed by both differential gene expression in the early development of male mice, and potential involvement in bone or genital morphogenesis, revealing 16 major candidate loci with potential effects on baculum development and morphology (see table S3 in Schultz, Ingels, et al., 2016), that were included on our candidate gene list (Table S4).
The baculum in M. arctoides is more than double the average length of sinica species group taxa and nearly four times as long as fascicularis species group taxa (Dixson, 1987;Fooden, 1971). In addition to its elongation, the morphology of the M. arctoides baculum also has a unique arced crescent shape, with an absent distal region that is present in most other macaque species (See the comparison in figure 8 of Fooden et al., 1985;Fooden, 1990). Among macaques, these two features-a gentle dorsoventral curvature and the lack of a well-defined distal process-are shared only by members of the fascicularis species group. This latter point perhaps explains why 4/5 amino acid changes across these three genes that were shared between M. arctoides and another group, were shared between M. arctoides and the fascicularis species members. This is in stark contrast to the rest of the genome which has a much higher proportion of shared variation with sinica species group members.

| Future directions
This study has focused on genes with phenotypic associations with genital morphology, but an outlier analysis simply presents hypothe- Additionally, as the genetic basis of other genital traits, such as baubellum morphology becomes known, the candidate gene list can be extended. Similarly, as the genome annotation improves with subsequent reference genomes, more genes can be considered that have clear orthologs in macaques.
In addition to this study, there has been extensive interest in the evolution of this species. However, due to the limited number of publicly available whole-genome samples (now three with the present study), the conservation status of this species, and the ethical and logistical barriers to experimental research with primates, it is challenging to make functional insights into its evolution. Here, our results provide conflicting support both for and against an HHS scenario, proposing new alternatives that can be addressed in future studies on the evolution of this species. Additional WGS data from more individuals of this species would allow for a better characterization of selection/adaptation within this lineage. Additionally, more WGS samples would be useful in examining the regions of excess shared ancestry for more direct tests that rule out ILS so that hybridization can be better quantified and understood.

ACK N OWLED G M ENTS
This work was supported by research start-up funds from the Department of Biological Sciences at Auburn University (LSS).
Sequencing costs were supported in part by NIH grant R01 GM115433 (to JDW). This work was made possible in part by a grant of high-performance computing resources and technical support from the Alabama Supercomputer Authority. BJE was supported by a grant from the Natural Science and Engineering Research Council of Canada (RGPIN-2017-05770). We thank members of the Stevison lab for helpful discussions regarding the analysis and results, with particular thanks to Stephen Sefick. We thank Simon Martin both for having a public github repository of high quality, but also for responsiveness regarding inquiries in the use of his code. We thank several researchers in attendance at the 2016 Speciation GRC, Evolution 2017, and PEQG 2018 for helpful discussions regarding this work.
We thank Wendy Hood, Ken Halanych, and Geoff Hill for feedback on early drafts of the manuscript.

O PEN R E S E A RCH BA D G E S
This article has earned an Open Data Badge for making publicly available the digitally-shareable data necessary to reproduce the reported results. The data is available at https://doi.org/10.5281/ zenodo.5006971 and http://dx.doi.org/10.35099/ auror a-67.