The relationship between sexual dimorphism and androgen response element proliferation in primate genomes

In the males of many vertebrate species, sexual selection has led to the evolution of sexually dimorphic traits, which often are developmentally controlled by androgen signaling involving androgen response elements (AREs). Evolutionary changes in the number and genomic locations of AREs can modify patterns of receptor regulation and potentially alter gene expression. Here, we use recently sequenced primate genomes to evaluate the hypothesis that the strength of sexual selection is related to the genome‐wide number of AREs in a diversifying lineage. In humans, we find a higher incidence of AREs near male‐biased genes and androgen‐responsive genes when compared to randomly selected genes from the genome. In a set of primates, we find that gains or losses of AREs proximal to genes are correlated with changes in male expression levels and the degree of sex‐biased expression of those genes. In a larger set of primates, we find that increases in indicators of sexual selection are correlated with genome‐wide ARE counts. Our results suggest that the responsiveness of the genome to androgens in humans and their close relatives has been shaped by sexual selection that arises from competition among males for mating access to females.

Although it is well established that sexual selection can shape sex-biased phenotypes, there is growing interest in the impacts of sexual selection on the genomic architecture (Mank, 2017, Wilkinson et al., 2015, Kirkpatrick, 2017. As genomic information becomes more readily available, the genome-wide effects of sexual selection may become detectable without a priori knowledge of the strength of sexual selection or phenotypic characteristics. Because sexual selection often drives the evolution of sexually dimorphic traits, the extent of sexual dimorphism across species can be used as an indicator of the relative strength of sexual selection, especially in systems where the mating competition is well understood (Dines et al., 2014, Morris and Carrier, 2016. In primates, for example, body mass and canine tooth length are sexually dimorphic in strongly sexually selected species van Schaik, 1997, Thorén et al., 2006). Additionally, relative testis mass is known to correlate with increased multimale mating behavior and is an indi-cator of postcopulatory sexual selection, as it is associated with greater sperm counts for sperm competition (Anderson and Dixson, 2002, Kahrl et al., 2016. Sexual dimorphism is often accompanied by sex-biased gene expression resulting in correlations between expression-level differences and dimorphic features (Harrison et al., 2015, Pointer et al., 2013, Ellegren and Parsch, 2007. Sex-biased gene expression is considered to be one possible resolution to intralocus sexual conflict (Bonduriansky and Chenoweth, 2009, Stewart et al., 2010, Pennell and Morrow, 2013, the situation in which an allele at a locus has different fitness effects depending on the sex of its bearer (Bonduriansky and Chenoweth, 2009, Connallon and Clark, 2014, Lande, 1980. There is evidence that sex-biased regulation via hormones or other signaling pathways can lead to sexbiased gene expression (Mank, 2017, Oliva et al., 2020, Anderson et al., 2020. Recent studies have suggested that sex-biased hormones have pleiotropic effects that reduce genetic correlations between the sexes (Wittman et al., 2021), and many genes have differential effects on the phenotype depending on the bearing sex (van der Bijl and Mank, 2021). Hence, genomic regions associated with sex-biased signaling could contribute to the phenotypic expression of traits under sexual selection and thus respond to sexual selection.
The most obvious genomic targets of sex-biased hormone regulation are cis-regulatory elements, sequences of the DNA usually found in noncoding regions that regulate gene expression (Geserick et al., 2005, Tsai and O'Malley, 1994, Wittkopp and Kalay, 2012. Despite being noncoding regions, they are subject to selection because nucleotide substitutions can affect the strength with which a receptor binds to the genomic region (a value known as binding affinity) and how exclusively the genomic region is bound by a particular receptor over others (i.e., the specificity of the sequence) (Hahn, 2007, Smith et al., 2013, Wray, 2007. Furthermore, duplications of cis-regulatory elements may be important in creating new opportunities for receptor regulation (Jimenez-Delgado et al., 2009) or in altering the transcriptomic response in an additive fashion (Geserick et al., 2005, Cleutjens et al., 1996, Wilson et al., 2016. There is evidence that sex-biased gene expression is altered as physically proximal cis-regulatory elements are gained or lost (Matthews et al., 2021). In the human genome, there is a correlation between the number of estrogen-mediated genes and the number of putative estrogen binding sites across chromosomes (Anderson and Jones, 2019). In the Gulf pipefish, there is an excess of estrogen binding regions near putative sex-biased and estrogen-mediated genes involved in ornament expression (Anderson et al., 2020). It is possible that just as gene expression-level differences coincide with sexual dimorphism, signatures of sex-biased regulation via changes in genome-wide cis-regulatory elements will likewise coincide with sexual dimorphism.
To investigate the hypothesis that an increase in sexual selection should correspond with an increase in genomewide regulatory elements, we chose to look specifically in vertebrates because this group has well-documented links between sex-biased signals, sex-biased traits, and the associated cis-regulatory elements. Typically, sexual selection acts more strongly in males and leads to male-biased traits, such as ornamentation, weaponry, courtship, and sperm production, which are often regulated by androgens in vertebrates (Bartos et al., 2012, Bhattacharya et al., 2019, Ghosal and Sorensen, 2016, Lindsay et al., 2016. The cis-regulatory element associated with androgen receptors is the androgen response element (ARE), a 15 base-pair sequence composed of two palindromic (i.e., reverse complement) half-sites separated by a three base-pair spacer (e.g., AGAACAnnnTGTTCT-half-site, spacer, half-site) (Wilson et al., 2016, Roche et al., 1992 (Table 1). If (1) sexual selection drives the evolution of sexual dimorphism and enlarged testis size and (2) these features are, at least in part, controlled by sexbiased expression mediated by androgens, then we can predict that new androgen-regulated genetic regions should accumulate over evolutionary time to create new genomic regions with the potential for sex-specific gene expression. These new genomic regions can either occur to induce novel androgen-regulated expression in a gene or enhance an already existing androgenresponsive gene. In either case, the total number of genomewide AREs would increase. In the present study, we formally tested this prediction by first investigating if AREs are associated with greater sex-biased and androgen-responsive expression. Then we examined correlations between gain or loss of proximal AREs and expression patterns. Finally, we determined the correlation between the number of genome-wide AREs and indicators of both precopulatory (size dimorphism) and postcopulatory (relative testis size) sexual selection. A correlation between the ARE count and the indicators would suggest that sexual selection drives a genome-level change in responsiveness to sex-hormone signaling as adaptations to mating competition arise over evolutionary time.
The ARE nucleotide sequence (motif) with the highest binding affinity to the androgen receptor is called the canonical ARE (cARE) (Wilson et al., 2016, Roche et al., 1992. Substituting base pairs in the motif may still allow for androgen binding but will alter the affinity and specificity of the sequence. Although the cARE has high affinity for binding to androgen receptors, it has low specificity and will easily bind to other related receptors, especially the glucocorticoid receptor (Laudet et al., 1992, Nelson et al., 1999, Geserick et al., 2003. Paradoxically, higher specificity is achieved via more point mutations on one of the half-sites, so a loss in overall affinity accompanies higher specificity (Wilson et al., 2016, Nelson et al., 1999, Sahu et al., 2014. There are numerous variants of the ARE (Table 1) that have been empirically confirmed. Some of these variants are proposed to be general, whereas others are proposed to be specific to the androgen receptor. Another aim of our study was to see which validated ARE motifs more strongly correlate with indicators of sexual selection and if they share any patterns in nucleotide sequence.
To investigate the emergent patterns in genome-wide counts of various ARE motifs and sex-biased traits associated with sexual selection, we required a vertebrate taxon containing a multitude of species with well-sequenced genomes as well as data on characteristics associated with pre-and postcopulatory sexual selection. We identified simiiform primates as a viable taxonomic group. These primates have been extensively studied in terms of sexual dimorphism, testis size, and genomic sequencing. For sexually dimorphic traits, and therefore indicators of precopulatory sexual selection, we identified body mass dimorphism van Schaik, 1997, Mitani et al., 1996) and canine height dimorphism (Thorén et al., 2006, Plavcan andvan Schaik, 1992) as  viable characteristics for analysis. In addition, we identified relative testis mass as an indicator of postcopulatory sexual selection (Anderson and Dixson, 2002, Harcourt et al., 1995, Moller, 1988. We also selected 18 variations of the ARE motif, including six androgen-specific ARE motifs and 12 nonspecific ARE motifs. Of the 12 nonspecific motifs, we included the canonical ARE (cARE), as well as an alternate to the canonical ARE (aARE) that differs at the end base pairs only (Claessens et al., 2017). We also considered two versions of the generic steroid response element: the canonical SRE (cSRE) and an alternate SRE (aSRE), which bind generally to multiple corticosteroids (Roche et al., 1992). To investigate correlations between AREs and expression levels, we used sex-biased (Guo et al., 2018) and androgen responsive data from humans (Nelson et al., 2002, Romanuik et al., 2009) as well as expression data from five organs (Brawand et al., 2011).
Here, we explore two dimensions of AREs and sex bias. In the initial line of questioning, we test whether gene proximity to AREs is greater in male-biased and androgen responsive genes relative to all genes. In a small sample of hominid primates, we tested for correlations between changes in male expression levels and sex bias for select genes, and we examine the gain or loss of AREs proximal to those genes. In the second line of questioning, we tested multiple hypotheses regarding the relationship between the number of genome-wide ARE motifs, the different motif sequences, and the three traits related to sexual selection in simiiform primates. First, we test for a significant correlation between genome-wide counts of each ARE motif and each of the physical traits. Second, we employ multidimensional analysis to look for clusters of ARE motifs that group with respect to physical traits. After determining which, if any, ARE motifs show a positive relationship with physical traits, we then investigate possible patterns of sequence or androgen specificity among those motifs.

DATA COLLECTION
Thirty-one species of simiiform primate had published genomes at the time of this study. To evaluate the completeness of genomic sequencing, we used BUSCO (Simão et al., 2015) and selected primate genomes that contained more than 80% of the singlecopy orthologous gene set from the primate database as complete sequences. After this selection step, we were left with 26 primate species for our study. Using existing literature, we gathered data on body mass, canine height, and testis weight for these species (Thoren et al., 2006, Harcourt et al., 1995, Plavcan and Ruff, 2008 (Table S1). In some cases, data were available from a congener or were gathered prior to a taxonomic revision, splitting a formerly single species into multiple species. For Nomascus leucogenys, testis dimensions were taken (Hill and Kanagasuntheram, 1959) and used to estimate testis mass (Dahl et al., 1993). For three species (Rhinopithecus roxellana, Cercopithecus mona, and Piliocolobus tephrosceles), no comparable data were available for testis weight. Both body mass and canine height dimorphism were calculated using the residuals from a fitted log-log model of male and female values for each species (Abouheif and Fairbairn, 1997, Clutton-Brock, 1985, Rensch, 1950 such that larger residuals represent greater male-biased dimorphism. Similarly, relative testis mass was calculated using the residuals from a fitted log-log model of body and testis mass for each species (Moller, 1988) such that larger residuals represent greater relative testis mass.
To investigate genes and their association with AREs, sex bias, and androgen responsiveness, we used three datasets: one that reported a list of sex-biased genes in humans (Guo et al., 2018) (Table S2), and two that reported genes and expressionlevel changes after exposure to androgen in human LNCaP cell lines (Nelson et al., 2002, Romanuik et al., 2009 (Tables S3  and S4). For expression data, we used data found in five species (Homo sapiens, Pan troglodytes, Pan paniscus, Gorilla gorilla, and Macaca mulatta) across five organs (brain, cerebellum, heart, kidney, and testis) and numerous genes (Brawand et al., 2011). In many cases, the datasets contained just one replicate, but where multiple individuals were sampled within a species, we took the arithmetic mean of the RPKM (Reads per kilobase of transcript per million mapped reads). We used RPKM values of males as expression values and calculated sex bias by determining the log 2fold difference between male and female values within a species. Given the small sample sizes (mostly n = 1), we did not calculate significance of sex bias.
We selected ARE motifs for analysis by searching the literature for research that established the role of the element in a transcriptional response to androgen for a given gene (Table 1). This could have been done by excision of the motif in question or transfection of the motif to generate a response to an androgen. We limited our dataset to 18 motifs for ease of multidimensional analysis. All motifs were described using either mouse or human cell lines making them appropriate for study in our primate group. For each species, we searched for each motif in the genome using the exactmatch feature in the R library Biostrings (Pagès et al., 2019). This feature counts the number of times a set of nucleotides has the exact pattern of base pairs in the scanned sequence, in our case the genome, as the given motif. Because our chosen motifs often diverge by one to two base pairs, we did not allow any mismatches in the motif search. Scans were done only in one direction, which was sufficient to identify all sites for palindromic motifs. For nonpalindromic motifs, we ran a second scan with the reverse complement of the motif and summed the number of matches to represent total hits in the genome. We used the phytools (Revell, 2012) heatmap function to calculate a standardized score (Z-score) of both the genome-wide count data for each motif and residual values for all three indicator traits ( Fig. 1; raw data, Table S5). It is important to note that direct detection of a motif should not be equated with the identification of an active androgen receptor binding site (Wasserman and Sandelin, 2004), as many factors contribute to androgen receptor binding. Rather, our goal was to enumerate the total number of motifs in the genome, and these sites should be interpreted as putative regions for androgen receptor binding. In cases where we needed to match ARE location with a gene of interest, we used the General Feature Format (GFF) files associated with the genomes for the five species described in the expression data section.

RESPONSIVENESS, AND ARES IN HUMANS
We used a custom R script to extract gene coordinates from the human GFF file and then selected genes from this list that matched those that were male-biased from the list of sex-biased genes (n = 1043). We then ran the exactmatch feature to produce a list of coordinates from the investigated AREs. We used a previously published custom R script (Anderson et al., 2020, Anderson andJones, 2019) that matches AREs to genes based on a predetermined distance from the transcription start and end sites. In this case, we search 25-kb upstream of the transcription start site, 10-kb downstream of the transcription end site, and the entirety of the gene (both introns and exons). We chose this distance because, although cis-regulatory elements can be found far away from the transcription start site (Carroll et al., 2005), the majority are within 25 kb and can also be found in intragenic regions (Zheng et al., 2016, Welboren et al., 2009. To see if the malebiased gene set had more AREs than expected, we randomly sampled an equal number of genes (n = 1043) from all genes in the human genome 1000 times to generate two distributions. One distribution is the number of genes with at least one ARE within the predetermined distance (25 kb up, 10 kb down); the other distribution is the total number of proximal AREs in the entire set. We then determined where the male-biased gene set falls on these distributions. We repeated this process for the androgenresponsive gene list (n = 122). If AREs and genes with proximal AREs are overrepresented in the male-biased and androgenresponsive gene sets compared to the random distribution, then there is support for the importance of AREs in male-biased expression and androgen responsiveness, respectively.

EXPRESSION
We identified 8250 orthologous genes with complete data across the five primates used for expression analysis. As with humans (described above), we generated a list of proximal AREs and their corresponding genes using the same distances for each species (Table S6). We then filtered genes to keep those with a proximal ARE present in at least one species (leaving 1312 genes). For each organ, we filtered genes so that we kept any gene with an RPKM value in males greater than 10 and an absolute malefemale log 2 -fold difference greater than 2 in at least one of the five species. In testis, we only used the RPKM value cutoff as we did not have ovary data to compare. We chose these cutoffs a priori to investigate those genes with biological significance and to account for errors due to low sample size. To perform a correlation analysis, we removed any gene without any change in ARE count across species. We downloaded the ultrametric tree for our selected species from the 10k trees website (Arnold et al., 2010) to use as our phylogeny. We created three models using generalized least squares (GLS) with the phytools package (Revell, 2012) in R, where both RPKM value in males and sex bias were independently treated as a function of the number of proximal AREs for each gene. The models were (1) no phylogenetic effect on the model, (2) including a phylogenetic effect (cor-Pagel) where the phylogenetic effect's value (lambda) was freely estimated in the model, and (3) using a predetermined value of lambda, estimated using the phylosig function. We compared the Akaike Information Criterion (AIC) (Akaike, 1974) of the three models and chose the model with the lowest AIC value. The P-values from the set of RPKM or sex-bias analyses for each organ were adjusted for multiple tests using the False Discovery Rate (FDR). We further estimated the correlation values by estimating the covariance with lambda set to the value determined by the appropriate model (no phylogenetic effect means lambda = 0) and converted those covariances to correlations. We calculated R 2 from the chosen model.
To test for overrepresentation of any ARE motifs, we calculated a genome-wide baseline for each motif by counting the number of hits in the genome in each species and taking the average of all five species to represent the genome-wide group. For the proximal ARE group, we counted the number of each mo-tif present in the set of 1312 genes. Because AREs were mostly conserved across species within genes, we only counted a motif once per gene. With a count table for genome-wide and proximal AREs for each motif, we performed a chi-square analysis to test for significantly different distributions. We then investigated the residuals for each motif to determine the percent contribution of each motif to the chi-square statistic. If any motif had a large contribution to the differences between the two sets, we removed it and ran the chi-square test again to confirm its effect.

INDIVIDUAL CORRELATION ANALYSIS
We downloaded the ultrametric tree for all of our 26 selected species from the 10k trees website to use as our phylogeny. As before, we created the three models using generalized least squares where the genome-wide count of an ARE motif was treated as a function of each indicator of sexual selection. Using the chosen model, we looked at the significance of the effect of indicator on motif number both with and without Bonferroni correction for multiple tests of each indicator. As in the previous analysis, we estimated the correlation values and calculated R 2 from the chosen model.

MULTIDIMENSIONAL ANALYSIS
To assess larger patterns of trait correlations across all ARE motif counts, we used several statistical approaches. First, we investigated whether ARE motif counts were more likely to be positively correlated with indicators of sexual selection by using a binomial test for the number of positively correlated motif counts with a given indicator, where the expected number by chance was one half of the total number of motif counts (n Exp = 9). Second, we performed a principal components analysis and identified the principal component that had the largest number of loadings with the same directionality and explained a large amount of variance. We then took that principal component (PC ARE) and did the above-described GLS modeling to determine its correlation with indicators of sexual selection. Using the 23 species with data available for all three indicator traits, we used canonical correlation analysis (CCorA) (Harold, 1936) to maximize correlations between ARE motif counts and indicator traits. CCorA is symmetrical in that indicator traits and ARE motif count correlations are maximized in both directions. We were able to account for phylogeny in the CCorA using canonical correlation in the phytools package. We used both NbClust (Malika et al., 2014) and mclust (Scrucca et al., 2016) to determine optimal numbers of clusters. We then used kmeans clustering for the number of optimal clusters to determine groupings of ARE motifs in the CCorA.

MOTIF SEQUENCE PATTERN
We further attempted to determine if there was some uniformity to ARE motif sequences that either significantly positively correlated with indicator traits in the GLS analysis or showed positive loadings in the multidimensional analyses. These ARE motif counts are putatively informative about the strength of sexual selection. We generated a haplotype network for the motifs using pegas (Paradis, 2010) to visualize similarities and differences among motifs. We also compared the frequency of each base pair of the putatively informative motifs to those motifs that showed no correlations in our analyses described above. It has been suggested that the loss of thymine at position 3 in the motif confers higher specificity to the androgen receptor (Nelson et al., 1999, Claessens et al., 2017, Verrijdt et al., 2003. Seventeen of the 18 motifs chosen have been tested for androgen specificity with competitive assays (Nelson et al., 1999, Geserick et al., 2003, Denayer et al., 2010, Schauwaers et al., 2007 except for ccnd1, which has an adenine at the third position. Six of these motifs are androgen specific and lack thymine at the third position. We further investigated whether these androgenspecific motifs are more likely to correlate with indicator traits.

EFFECT OF ARES ON SEX BIAS AND EXPRESSION
Of the 1043 genes with male sex bias in humans, 101 have at least one proximal ARE. In the entire gene set, 111 proximal AREs were detected. These values are in the 80.0 and 87.1 percentiles, respectively, of 1000 random draws of an equal number of genes across the human genome (Fig. 2a,b). These numbers suggest some possible influence of AREs on sex bias but not a significant one. For the 122 androgen-responsive genes in human LNaCP cell lines, 18 have at least one proximal ARE, and in the entire gene set, a total of 25 proximal AREs are present. These values are in the 97.5 and 100.0 percentile, respectively, of 1000 random draws of an equal number of genes across the human genome (Fig. 2c,d). These numbers provide clear evidence that androgen-responsive genes have an excess of AREs.
For the comparative analysis across the five primate species, of the 1312 genes have at least one ARE in at least one species, and 946 of these genes passed our organ-based RPKM filter. Of these 946 genes, 431 show a significant correlation between the Table 2

. Genes with correlations between the number of proximal AREs and gene transcription measures for five primate species across five organs, as described in the text.
We filtered genes from the initial list of 8250 genes by the presence of a proximal ARE, a male RPKM value >10, and a male to female log 2 foldchange >2 (sex bias), for at least one species in each category. The number of genes passing filter is shown, along with the number of genes with significant correlations. Human gene symbols for significant genes are shown. Bold genes are significant in both male RPKM and sex bias for the same organ. Italicized genes are significant in two separate organs. The list of testis genes can be found in Table S7. number of proximal AREs and male expression levels or degree of sex-bias expression (or both) for at least one organ (Tables 2 and S7). Most of the significant correlations occur in testis because there was one less level of filtering (no sex-bias criterion); however, 41% of all genes tested in testis show a significant correlation between male expression values and the number of proximal AREs. The remaining organs tend to have more significant correlations with male expression values (23-33% of genes tested) than with sex bias (13-28%). Roughly 40% of genes with sex-bias correlations to proximal ARE numbers also have male expression correlations to proximal ARE numbers. R 2 values are typically high (i.e., >0.9) for significantly correlated genes, whereas correlation coefficients range from 0.1 to 0.99 (Table S7). These data support the supposition that altering the number of local AREs can affect male and sex-biased expression for a gene.
Tests for over-and underrepresentation of specific AREs reveal the motif mvdp is strongly underrepresented for both the gene set with at least one proximal ARE and the gene set that passed the filters. In both cases, the residuals of mvdp strongly influence the distribution (∼20%) and removal of mvdp from the lists results in a lack of significance in the chi-square tests (with mvdp: P = 0.0046, without: P = 0.0607). For the gene set with significant correlations between male expression values or sex bias and the number of proximal AREs, the motif p21 was strongly overrepresented, as the residuals greatly influence the distribution (47%) and removal of p21 from the list results in a lack of significance in the chi-square test (with p21: P = 0.0249, without: P = 0.5128). Interestingly, mvdp has the most genomewide hits of any tested motif (range: 988-1255) and p21 has the fewest (range 47-66).

MOTIF NUMBERS
Of the 18 studied ARE motifs, four motifs have at least one significant positive correlation between their genome-wide counts and an indicator trait (Table 1). For these significant positive correlations, canine size dimorphism correlates with three motif counts (hklk, ccnd1, hk2), relative testis mass correlates with two (hklk, psa3), and body mass dimorphisms correlates with just one (psa3). The hklk motif count has a good model fit and a high correlation with both testis mass and canine dimorphism, as do hk2 and ccnd1 motif counts with canine dimorphism. Looking at the correlations between the genome-wide motif counts and canine size, many motif counts have large correlation values (e.g., cARE, sc12, pem2) but lack statistical significance. The genomewide count of psa3 has a poor model fit and low correlation with body mass dimorphism and testis mass, whereas canine size has both high fit and correlation yet is not significant. Looking across indicator traits, motif counts with high correlational values in one trait tend to have high values in the other traits, even if these correlations are not significant in all or even one trait, especially when looking at canine size and testis mass. Two motif counts (cSRE and aARE) have a significant negative correlation with an indicator trait (canine dimorphism and body mass dimorphism, respectively), but both have poor model fit and correlation values close to zero. Three motif counts have negative correlations with all three indicator traits: cSRE, mvdp, and slp2. In general, for body mass dimorphism the models describe low correlation values and poor model fit with motif counts, whereas for testis mass and canine dimorphism the models tend to have higher correlation values and better model fit with motif counts.
Using the binomial test, where the null expectation of positively correlated motif counts is half the total sampled motifs (n Exp = 9), we find that more motif counts positively correlate with canine dimorphism (n Obs = 14, P = 0.030) and relative testis mass (n Obs = 15, P = 0.008) than expected by chance, but no excess of positive correlations exists for body mass (n Obs = 8, P = 0.815). Principle components analysis of the ARE motif counts finds the first principal component explains 52.7% of the variance and has all but three motif counts (cSRE, psa1, and slp2) with positive loadings; two of these three motif counts have negative correlations with all three traits. No other component has as many loadings with the same directionality. This component (PC ARE) positively correlates with both canine size dimorphism and relative testis mass with a good fit and high correlation but does not correlate with body size dimorphism (Table 1). Because the PC ARE represents a general increase in motif count (indicated by the large number of positive loadings), it supports the finding of more positive correlations across motif counts than expected by chance. The correlation value for PC ARE and canine dimorphism is higher than that for PC ARE and testis mass, an observation that agrees with the general trend of higher correlations across individual motif counts for canine dimorphism. These results support the hypothesis that as genome-wide ARE counts increase generally across motifs, there is a corresponding increase in some indicators of sexual selection.

MULTIVARIATE ANALYSIS
Phylogenic signal in the CCorA (Fig. 3) is estimated to be near zero (λ = 0.000066) and the first two canonical variates for each matrix (ARE count and indicator trait) have correlation values greater than 0.96. Pillai's trace F-statistic indicates that dimensions one through three are significant (P = 0.0001), but dimensions two through three (P = 0.1097) and three by itself (P = 0.5930) are not. These results indicate a significant correlation between the indicator trait dataset and the genome-wide ARE count dataset. Analyzing the CCorA without accounting for phylogeny finds no significant values despite lambda's small estimated value. All indicator traits are positive on the second component ( Fig. 3a) with relative testis size and canine dimorphism having strong loadings, whereas the first component is largely influenced by body mass. Clustering analysis suggests four groups, with motif counts clustered by quadrant and with c3 and p21 clustering together. Five motif counts (psa3, hklk, hk2, cARE, and pem2), three of which significantly correlate with an indicator trait, have positive loadings on both axes (Fig. 3b). Four motif counts (mvdp, psa1, slp2, and cSRE) generally have negative loadings on both axes. Species are divided such that Cercopithicoidae is separated from the Hominoidea and Platyrrhini along the second axis. In general, the first axis separates species by the amount of polygyny or precopulatory sexual selection and the second axis separates species by the amount of polygynandry or postcopulatory selection. The results of the multidimensional analysis are in agreement with the previous analyses, making the CCorA (Fig. 3) plot a good visual representation of the data. Motif counts with negative loadings on both axes have negative correlations with all three indicator traits except for psa1, which has low correlation values and poor model fits. Three of these motif counts also have a negative loading on the PC ARE. Furthermore, motif count positively correlates with testis mass and canine dimorphism and in the CCorA most motif counts are positive on the second axis that is strongly influenced by both of those indicator traits. All five motif counts with positive loadings on both axes also positively correlate with all three indicator traits. Notably, counts of mvdp have repeated negative loading across analyses and are significantly underrepresented in the genes with proximal AREs from the prior analysis of the five species. Both p21 and c3 have the lowest ranges of any motif (i.e., max value <70). Combined with the result of overrepresentation of p21 as a proximal ARE in the expression dataset, our observations suggest that these motifs may have an effect but their low rate of occurrence makes these effects hard to detect when compared to other motifs (typical mean number of occurrences in the genome is ∼500).

TRENDS IN MOTIF PATTERNS
Using a haplotype network to show similarities between motifs, we find that the cARE is at the center of the network from which the various motifs diverge (Fig. 4). The motifs with genome-wide counts that significantly correlate with indicator traits are not directly adjacent to the cARE and are generally dispersed from each other in the network except for psa3 and hk2. No single basepair position is conserved across the significant motifs (Table 1) except for base pairs that are conserved across nearly all motifs (positions -5, -4, -3, -2). Motifs that are more specific to the androgen receptor are generally closer together on the network and only one (ccnd1) of the six motifs is correlated with an indicator trait. Considering motifs that had positive (cARE and pem2) or negative (slp2, psa1, mvdp, cSRE) loadings on the multivariate analysis did not reveal any new noticeable pattern. All of these findings considered together suggest a lack of any consistent evolutionary pressure on specific motif patterns.

Discussion
Our results reveal two general trends in primates. First, AREs are implicated in male-biased, androgen-responsive genes and the gain or loss of proximal AREs over evolutionary time can affect expression values and sex bias of these genes. Second, increasing numbers of genome-wide AREs, along with an increase in sexual dimorphism in trait values, are indicative of sexual selection. Both trends can point to particular motifs that may have a significant effect on expression patterns or have genome-wide counts significantly increase along with indicator traits. No pattern of motif nucleotide sequence, including those with high androgen receptor specificity, predictably influences correlations between traits and genome-wide counts. When considered as a whole, the majority of AREs and the linear combination of AREs do increase in number along with sexual selection and gene expression. We conclude that individual motif sequences may not be as important to sexual selection as the overall trend in the number of motifs, which could allow for stronger sex-biased regulation. We investigated whether genome-wide ARE motif counts are reliable indicators of sexual selection by examining motif counts as a function of other indicators of sexual selection. The two traits we examined for precopulatory sexual selection are body mass dimorphism and canine height dimorphism, as both have been implicated in sexual selection in primates (Plavcan and van Schaik, 1997, Mitani et al., 1996, Plavcan and van Schaik, 1992, Leutenegger and Kelly, 1977. Body mass dimorphism and canine height dimorphism are strongly correlated (r = 0.641, P = 0.001, R 2 = 0.395, λ = 0.830), yet canine height has more significantly correlated motifs and more positively correlated motifs. Looking at the multidimensional analyses, body mass dimorphism has little influence on the distribution of ARE motif counts. Instead, canine height dimorphism has a large influence on the distribution of ARE motif counts. If both body size dimorphism and canine size dimorphism are indicators of sexual selection and are correlated, why does genome-wide ARE motif count have a strong signal in one (canine) but no discernable signal in the other (body size)? It is important to note that these traits are approximations for sexual selection and carry assumptions that may not truly reflect the direct influence of the trait on reproductive success (Kappeler and Van Schaik, 2004, Henshaw et al., 2016, Jones, 2009. Some evidence suggests that body size dimorphism might be more susceptible to factors beyond sexual selection and canine size is a better indicator of sexual selection (Thorén et al., 2006, Leutenegger andKelly, 1977), although other results call this conclusion into question (Plavcan et al., 1995, Matsuda et al., 2020. Our results cannot settle which trait is a better indicator of sexual selection, but they do show a strong correlation with canine size dimorphism. If canine height is a better predictor of sexual selection than body mass, then the results found here support the hypothesis that greater precopulatory sexual selection will lead to an increase in genome-wide AREs. To test whether genome-wide ARE motif count is a good indicator of postcopulatory sexual selection, we used relative testis size as our sexually selected trait. Testis size had fewer significant motifs than canine size dimorphism and a weaker effect in the multidimensional analyses but slightly more positive correlations with ARE motifs overall. In ours and other datasets, relative testis mass is not correlated with either body mass (P = 0.253) or canine height (P = 0.981) sexual dimorphism, suggesting independent effects of pre-and postcopulatory selection (Lupold et al., 2019). Interestingly, the hklk and psa3 motifs, which are affected by canine height and body mass, respectively, are also significantly affected by testis mass. Further, the multidimensional analysis suggests that the direction of testis effects tends to be the same for canine size dimorphism. These results suggest that testis mass does correlate with genome-wide motif number and that precopulatory and postcopulatory sexual selections affect the genome similarly with respect to the evolution of AREs.
A lack of a clear pattern in nucleotide substitutions or androgen specificity in significantly correlated motifs suggests that the precise composition of individual motifs may not be as important as the number of motifs present. Nonetheless, the canonical ARE motif represents the baseline from which the significant motifs are derived and follows the general pattern of increasing in frequency with the evolution of sexually dimorphic traits (Table 1; Fig. 2). Selection and drift likely play roles in further departures from the canonical ARE motif, with selection acting to increase or decrease specificity when necessary and genetic drift primarily affecting neutral or weakly selected substitutions. The motifs that show significant patterns in primates may be a result of conserved genomic regions that are under stronger selection and have had duplications or transpositions in their evolutionary history, thus increasing the number of local motifs. It is likely that a similar study in another taxonomic group will not find the same motifs as significant but would find a similar pattern of an increasing number of motifs correlating with stronger sexual selection.
We suggest a vital role for cis-regulatory elements, in particular AREs, in sexual selection as they can help resolve intralocus conflict by using sex-biased signals to create sex-biased gene expression that can lead to sexual dimorphism (Wittman et al., 2021, Cox et al., 2017. Importantly, there are multiple mechanisms for sex-biased expression (Mank, 2017, Ellegren and Parsch, 2007, Wright et al., 2018 and increasing genome-wide counts alone cannot explain the entirety of a genomic response to sexual selection. When androgen regulation does occur, responsive genes can be both cis-and trans-regulated. When considering the diverse mechanisms for sex bias and trans-regulation of androgens, it is no surprise that male-biased genes in humans did not have a significant overrepresentation of proximal AREs. Male-biased genes did skew to the top of the distribution, indi-cating some effect of AREs. Given the widespread genomic differences in sex-biased trait expression (van der Bijl and Mank, 2021), it is reasonable to propose that some portion of these differences are controlled by AREs in vertebrate systems, as androgens are known to regulate sex-biased phenotypes. Indeed, for genes with a known androgenic response, there is an incredibly apparent overrepresentation of proximal AREs. Not every gene has a proximal ARE, just 15% have one, which is expected given upstream trans effects, the possibility of long-range regulation, and undetected AREs. More AREs were indeed present than observed throughout all of our analyses, but our choice to focus on exact matching and a subset of AREs prevents their detection. We expect that our choice represents a subset of AREs and the larger patterns will hold with more AREs, even though individual genes analyzed may change in proximal ARE count. Our results show that AREs do play some role in sex-biased expression and when the genes in question are androgen regulated, they play an incredibly important role.
In our analysis of gain and loss of AREs on gene expression across species, we find a large share of genes that could have correlations with proximal AREs do indeed exhibit such correlations. By filtering genes by male expression levels and sex bias, we are focusing on genes that are likely targets of androgen regulation, which we further ensure by including only samples with at least one species with a proximal ARE. By comparing male RPKM values and sex bias, we can see that the presence of an ARE does not automatically guarantee a sex-biased response. Although caution should be taken due to the small sampling of both species and expression levels, that about one in three genes chosen had a significant correlation between expression or sex bias with proximal AREs is a striking result.
The findings of overrepresentation of proximal AREs in androgen-responsive genes and the effects of gain or loss of proximal AREs on gene expression and sex bias demonstrate the influential roles AREs have in sex-biased traits. When considering genome-wide counts, we view an increase in AREs as both creating new locations for cis-regulation and enhancing responses in already existing androgen-responsive regions. Not every discovered ARE is necessarily active; nucleotides flanking the half-sites and inside the spacer may influence binding (Nelson et al., 1999), and the presence of the FOXA1 promoter may influence the expression of androgen-induced genes (Jin et al., 2014, Kregel et al., 2020, Sahu et al., 2011. Further, a gene with androgen regulation may not necessarily be sex biased or under sexual selection. Although these factors individually may affect androgen binding, the goal of our study was to investigate global patterns of the ARE motif in response to sexual selection. A pattern is apparent based on these traits and expression changes, particularly relative testis size and canine sexual dimorphism, which are indicators of strong sexual selection. ARE plays a crucial role in the development of male-biased traits when sexual selection acts strongly on males. Our results show that there is a corresponding increase in genome-wide AREs with increasing sexual selection in primates. As more fully sequenced genomes become available, the robustness of these findings can be further explored in primates and across multiple vertebrate systems. Although some motifs show multiple significant patterns, there is no indication that androgen receptor specificity or nucleotide sequence (as opposed to the number of AREs) is the main target of selection, suggesting that different clades might favor alternate motifs depending on their evolutionary histories. Ultimately, an analysis of additional genomes across a wider array of organisms will better elucidate the evolutionary response of genome-wide ARE motif number to an increase in sexual selection intensity. Given these findings, there is a clear need to further investigate the role that AREs and other noncoding DNA elements play in evolutionary responses to natural and sexual selection.