Multivariate stabilizing sexual selection and the evolution of male and female genital morphology in the red flour beetle*

Abstract Male genitals are highly divergent in animals with internal fertilization. Most studies attempting to explain this diversity have focused on testing the major hypotheses of genital evolution (the lock‐and‐key, pleiotropy, and sexual selection hypotheses), and quantifying the form of selection targeting male genitals has played an important role in this endeavor. However, we currently know far less about selection targeting female genitals or how male and female genitals interact during mating. Here, we use formal selection analysis to show that genital size and shape is subject to strong multivariate stabilizing sexual selection in both sexes of the red flour beetle, Tribolium castaneum. Moreover, we show significant sexual selection on the covariance between the sexes for specific aspects of genital shape suggesting that male and female genitalia also interact to determine the successful transfer of a spermatophore during mating. Our work therefore highlights the important role that both male and female genital morphologies play in determining mating success and that these effects can occur independently, as well as through their interaction. Moreover, it cautions against the overly simplistic view that the sexual selection targeting genital morphology will always be directional in form and restricted primarily to males.

several decades and has been the topic of investigation in almost all animal taxa (e.g., insects: House and Simmons 2003;House et al. 2013;spiders: Foellmer 2008;Kuntner et al. 2016;reptiles: King et al. 2009;fish: Langerhans et al. 2005;Booksmythe et al. 2016;birds: Brennan et al. 2010birds: Brennan et al. , 2017mammals: Stockley 2002;Ramm 2007). More recently, there have been a growing number of studies showing that female genitals may often be as variable as male genitals and may also evolve as rapidly (e.g., Simmons 2014; Ah- King et al. 2014). Despite this, there is a strong under-representation of studies on female genitalia (Ah- King et al. 2014), as well as a general lack of studies examining how complex interactions between the sexes can shape genital coevolution (Ah- King et al. 2014;Brennan and Prum 2015). A more detailed understanding of genital evolution therefore requires a greater focus on female genitals and how they interact with male genitals during mating (Ah- King et al. 2014;Brennan and Prum 2015).
Historical explanations for the evolution of genitals have largely focused on three main hypotheses: the lock-and-key, the pleiotropy, and the sexual selection hypotheses (Hosken and Stockley 2004). Originally, the lock-and-key hypothesis proposes that genital divergence has evolved to prevent hybridization by ensuring that only males of the correct species were able to provide the right "key" for the female "lock" (Dufour 1844). More recently, however, Arnqvist (1997) proposed that a similar process could also operate within a species, with the poor alignment of male and female genitals reducing male mating success. The pleiotropy hypothesis proposes that the divergence in genital morphology is due to the pleiotropic effects of selection on other nongenital traits (Mayr 1963). Thus, genital divergence is considered a neutral process with genitals only evolving because they are genetically correlated with other nongenital traits that are the target of selection (Mayr 1963). Finally, the sexual selection hypothesis proposes that a number of different processes, most notably cryptic female choice for males with genitals that are better able to stimulate them during mating or through sperm competition and/or sexual conflict, all have the potential to drive genital divergence (Hosken and Stockley 2004). Although there is a degree of empirical support for both the lock-and-key (e.g., Wojcieszek andSimmons 2012, 2013;Anderson and Langerhans 2015) and pleiotropy (e.g., Arnqvist and Thornhill 1998;House and Simmons 2005) hypotheses, most support is currently for the sexual selection hypothesis (e.g. Hosken and Stockley 2004;Ah-King et al. 2014;Simmons 2014). This is not altogether surprising given the much larger number of studies testing this hypothesis (Ah- King et al. 2014) and highlights the need for more empirical studies focused on the lock-and-key and pleiotropy hypotheses.
One important criterion that has been used to discriminate between these alternate hypotheses is the form of selection targeting male genitals (Arnqvist 1997;Hosken and Stockley 2004). According to the original lock-and-key hypothesis, it is predicted that a pattern of stabilizing natural selection prevents hybridization by ensuring that male genitals are species specific and provide the correct "key" to fit the female "lock" (Dufour 1844). Moreover, within a species, the reduction in mating success that occurs when male genitals are poorly aligned with the average genital structure of females in the population is predicted to result in a pattern of stabilizing sexual selection targeting male genitals (Arnqvist 1997). Thus, although both scenarios predict that male genital morphology will be under stabilizing selection, the mode of selection is fundamentally different (i.e., natural vs. sexual selection). According to the sexual selection hypothesis, variation in male genital morphology is related to fertilization success, with males having extreme genitals being the most successful due to their stimulatory, competitive, and/or coercive ability (Arnqvist 1997). Sexual selection is therefore predicted to impose strong linear (or directional) selection on male genitals (Arnqvist 1997;Hosken and Stockley 2004). In contrast, according to the pleiotropy hypothesis, male genital morphology does not correlate with fitness and therefore should not experience any direct selection (Arnqvist 1997). However, if phenotypically correlated with other traits under selection, male genitals can experience indirect selection and this can take any form (Arnqvist 1997).
The ability to empirically quantify the strength and form of selection acting on male genitals has been greatly enhanced by the use of multivariate selection analysis (Lande and Arnold 1983) and insects have played a key role in this endeavor. The majority of studies on insects have documented linear selection on male genitals (e.g., damselflies: Cordoba-Aguilar 1999, 2009water strider: Arnqvist and Danielsson 1999;Danielsson and Askenmo 1999;Bertin and Fairbairn 2005;praying mantis: Holwell et al. 2010;oriental beetle: Wenninger and Averill 2006;earwig: van Lieshout 2011;van Lieshout and Elgar 2011), although stabilizing selection has been shown to target some aspects of male genital morphology as well (seed bug : Tadler 1999;Dougherty and Shuker 2016;dung beetle: Simmons et al. 2009;millipede: Wojcieszek and Simmons 2012;broad horned beetle: House et al. 2016;water strider: Bertin and Fairbairn 2005). Although this finding appears to support the sexual selection hypothesis, it is important to note that it is easier to statistically detect linear than stabilizing selection (Lande and Arnold 1983), and that there are also many empirical examples showing that sexual selection is not always linear in form (e.g., Chenoweth and Blows 2005;Brooks et al. 2005;Bentsen et al. 2006;Gerhardt and Brooks 2009;Steiger et al. 2013). Unfortunately, we currently lack similar formal estimates of selection for female genital morphology. In general, more comprehensive studies are needed (ideally exceeding several hundred individuals; Hersch and Phillips 2004) to characterize the pattern of linear and nonlinear selection acting on genital morphology, especially for females.
The red flour beetle, Tribolium castaneum (Coleoptera: Tenebrionidae), is a model species in the study of sexual selection (reviewed in Fedina and Lewis 2008). This species is highly polygamous and will mate every few minutes, yet as high as 55% of mating attempts do not produce viable offspring (Lewis and Iannini 1995;Bloch Qazi et al. 1996;Pai et al. 2005;Fedina and Lewis 2008). Although pericopulatory processes are likely to explain this outcome (Tyler and Tregenza 2012), the role that male and female genital morphology (or their interaction) play in the successful transfer of a spermatophore remains unknown. Here, we use multivariate selection analysis (Lande and Arnold 1983) to characterize the strength and form of direct linear and nonlinear sexual selection acting independently on male and female genital size and shape in the red flour beetle, T. castaneum. Having shown that the fitness surfaces for both sexes contain a peak, we then formally test whether this represents stabilizing selection. It has been argued that a fitness surface containing a peak should only be referred to as stabilizing selection if the peak resides within the phenotypic space sampled (Mitchell-Olds and Shaw 1987). To test this, we estimated the location of the global maxima and the 95% confidence region (CR) for each sex and mapped these alongside the fitness surfaces displaying the individual measures of genital size and shape. Finally, we estimated the sign and strength of correlational selection that targets the covariance between male and female genital size and shape. The standardized selection gradients from this analysis therefore measure the importance of the interaction between male and female genital morphology for the successful transfer of a spermatophore during mating.

STOCK POPULATIONS AND REARING PROTOCOL
A total of six stock populations (ß200 beetles per population) of the widely used Georgia 1 (GA1) "wild-type" strain of T. castaneum were originally derived from the Beeman Lab (US Grain Marketing Production Research Centre). These populations were cultured in ad libitum standard medium (95% white flour and 5% bakers' yeast) and maintained at 30°C, 60% humidity and on a 16:8 h light:dark cycle for three generations prior to the start of our experiment. Each generation, adult beetles were mixed at random between populations to ensure gene flow and the maintenance of genetic variation.

EXPERIMENTAL PROCEDURE
Male and female beetles used in this experiment were taken at random from the stock populations. To ensure virginity, pupae were collected from the stock populations over a two-week period. A set of nested sieves were used to separate the pupae from the adults and medium. The pupae were then removed from the sieve with soft grip tweezers and their sex was determined under a microscope. Each pupa was then placed into an individual cell of a unisex, square plastic transparent box (10 cm 2 ; 25 cells per box, 2 cm 2 per cell), with each cell half-filled with medium. The boxes were checked daily and the eclosion date for emerging adults was recorded to ensure that only virgin adults aged 7-21 days were used in the mating trials (Attia and Tregenza 2004).

MATING TRIALS
Mating trials were conducted at 22 ± 1°C in a mating arena that consisted of 2 × 2 cm cells in a 25 cell box that was lined with paper to provide traction (Tyler and Tregenza 2012). In every trial, a female was first placed into one of the mating arena cells followed by a male. The time of male introduction and the start and end of mating was recorded. Males typically make multiple mounting attempts; however, we define mating as a mounting that lasted longer than 30 s, as shorter mating attempts are unlikely to result in the transfer of a spermatophore (Tyler and Tregenza 2012). Following a mating longer than 30 s, the male was removed and frozen (n = 535). To verify whether a mating attempt had been successful or unsuccessful, each female was placed in a 60-mL breeding pot (sized 67 mm × 34 mm) that contained 30 mL of standard media to oviposit under the standard incubation conditions. After seven days, each female was removed and frozen (n = 535). Forty days later, each pot was checked for the presence or absence of offspring (now newly eclosed beetles) to verify whether mating was successful or failed. Mating pairs were classified as successfully mated if mating resulted in offspring and received a fitness score of one (n = 216). Mating pairs were classified as unsuccessfully mated if no offspring were produced and received a fitness score of zero (n = 282).

DISSECTIONS AND GEOMETRIC MORPHOMETRICS
The male genitalia were removed from the abdomen and mounted on a microscope slide in a drop of Hoyer's solution. The female genitalia were squeezed out of the body by gently pressing the abdomen and mounted in a drop of Hoyer's solution, while still attached to the body. The genitalia are delicate and prone to damage during dissection. As we are interested in estimating the selection independently targeting male and female genitals, as well as the covariance between these structures during mating, we required intact genitals from both individuals in the interacting mating pair. When damage to the male or female genitals occurred, we removed the mating pair from the dataset leaving a final sample size of 498. All genitalia were placed in a consistent, longitudinal orientation and digital images were taken using a Leica DFC295 digital microscope camera that was mounted on a dissecting Leica M125 microscope ( Figure S1 and S2). Due to the complexity of the male and female genitalia, geometric morphometric (GM) analysis was used to quantify the variation in the size and shape of the outline of the male aedeagus and female vagina and supporting structures. A description of the programs used to digitize the male and female genitalia and analyze the GM data is described in Figure S1 and S2. Although our shape analysis for males and females returned a total of 19 and 36 RW scores, respectively, only the first four were used as they each accounted for over 75% of the shape variation (Gutierrez et al. 2011). Our measurements of genital morphology were highly repeatable in both sexes (Table S1).

Characterizing linear and nonlinear sexual selection on male and female genital size and shape
We used standard multivariate selection analysis (Lande and Arnold 1983) to evaluate the strength and form of linear and nonlinear selection acting on male and female genital size and shape. An absolute fitness score was assigned to each male and female in our experiment, with one being assigned to the male and female in a pair that successfully obtained a mating and zero being assigned to the male and female in unsuccessful pairs. Following Lande and Arnold (1983), this absolute fitness score was transformed to relative fitness by dividing by the mean absolute fitness of the population.
To estimate the standardized linear selection gradients (β), a first-order linear multiple regression model was fitted using centroid size (CS) and the first four RW scores describing the variation in male and female genital shape as the predictor variables, and relative fitness as the response variable (Lande and Arnold 1983). We then used a second-order quadratic multiple regression model that included all linear, quadratic, and cross-product terms to estimate the matrix of nonlinear selection gradients (γ) that describes the curvature of the fitness surface. Quadratic regression coefficients are known to be underestimated by a factor of 0.5 using standard multiple regression analysis, so we doubled the quadratic selection gradients derived from this model (Stinchcombe et al. 2008). In addition to these conventional models for each sex, we also fit a second multiple regression model to account for the interaction between male and female genitals during mating. In this model, we first estimated β using a first-order linear multiple regression model that included CS and the four RW scores for each sex as the predictor variables, and relative fitness as the response variable. We then estimated γ using a second-order quadratic multiple regression model that included all linear, quadratic, and cross-product terms for each sex, as well as cross-product terms between each sex, as the predictor variables and relative fitness as the response variable. It was not our intention to interpret the selection gradients from this analysis but simply to demonstrate that including the genital morphology of both sexes and their interaction in the same model did not alter the sign or strength of the resulting selection gradients compared to the conventional models.
As relative fitness does not conform to a normal distribution, we used a resampling procedure to assess the significance of our standardized selection gradients (Mitchell-Olds and Shaw 1987). We randomly shuffled relative fitness scores across male and female pairs in our dataset to obtain a null distribution for each selection gradient where there is no relationship between our measures of genital size and shape and relative fitness. We used a Monte Carlo simulation to determine the proportion (p) of times (out of 10,000 iterations) that each gradient pseudo-estimate was equal to or less than the original estimated gradient, and this was used to calculate a two-tailed probability value (as 2p if P < 0.5 or as 2(1p) if P > 0.5) for each selection gradient in the model (Manly 1997). We conducted separate randomization tests for the linear multiple regression model and the full quadratic model (including linear, quadratic, and correlational terms) following the procedure outlined above.
As the strength of nonlinear selection gradients can be underestimated by interpreting the size and significance of individual γ coefficients , we explored the extent of nonlinear selection acting on male and female genital size and shape by conducting a canonical analysis of the γ matrix to locate major eigenvectors of the fitness surface in each sex (Phillips and Arnold 1989). For each sex, we used the permutation procedure outlined in Reynolds et al. (2010) to determine the strength and significance of nonlinear selection operating along the eigenvectors of γ. This procedure, however, does not estimate the strength of linear selection operating along the eigenvectors of γ and we therefore used the "double regression" method of Bisgaard and Ankenman (1996) to estimate this form of selection acting along each eigenvector. The strength of linear selection along each eigenvector (m i ) is given by theta (θ i ), whereas the strength of nonlinear selection is given by their eigenvalue (λ i ).
We used thin-plate splines (Green and Silverman 1994) to visualize the major eigenvectors of the fitness surface extracted from the canonical rotation of the γ for males and females. We used the "Tps" function in the FIELDS package of R (version 2.13.0, www.r-project.org) to fit the thin-plate splines, and visualized splines as a contour map using the value of smoothing parameter (λ) that minimized the generalized cross-validation score (Green and Silverman 1994).
Estimating the location of the global maxima on the fitness surface and its 95% CR According to Lande and Arnold (1983), a phenotypic trait subject to stabilizing selection is characterized by a negative standardized quadratic selection gradient (i.e., γ < 0). The problem with this definition is that a negative value of γ does not guarantee that a peak in fitness will exist within the range of existing phenotypes sampled (Mitchell-Olds and Shaw 1987). This scenario could arise, for example, if there is a monotonic increase in fitness across the range of phenotypes examined, meaning the location of the fitness peak must be based purely on extrapolation (Mitchell-Olds and Shaw 1987). Mitchell-Olds and Shaw (1987) therefore proposed that stabilizing selection should be limited to cases where extreme phenotypes have lower fitness and maximum fitness occurs at some intermediate point in the phenotypic distribution. Although Mitchell-Olds and Shaw (1987) provide a number of statistical tests to support this definition of stabilizing selection, a simpler approach is to visually compare the location of the fitness peak to the distribution of the phenotypic data. To this end, we estimated the location of the global maxima (fitness peak) and the 95% CR for the fitness surface of each sex and mapped these alongside the same fitness surfaces displaying the individual measures of genital size and shape.
Existing methods for finding the CR associated with the location of the maxima of a regression function rely on the assumption that the data are normally distributed (Peterson et al. 2002), which is clearly not the case for our measure of fitness. Here, we use nonparametric bootstrapping method that we have previously developed (del Castillo et al. 2016) that is not based on any distributional assumptions and can find CRs on the location of the maxima of either quadratic polynomial models or of more flexible thin-plate spline models. This approach is provided by the "OptRegionTps" function in the OPTIMAREGION package of R (del Castillo et al. 2016).
In brief, a quadratic polynomial model was fit to the data using ordinary least squares regression implemented in the "lm" function of R, yielding a fitted response surfaceŷ(x) and residuals r i = y(x) −ŷ(x). We then applied bootstrapping to the residuals to create bootstrapped realizations y * (x) =ŷ(x) + r * for each data point in our dataset (x). For each simulated set of y * (x), we fit a quadratic polynomial and found parameter estimates γ * . Following Yeh and Singh (1997), we repeated this procedure 1000 times and computed Tukey's data depth for each generated γ * vector, keeping the 100(1 − α) % deepest (where in our case a = 0.95). This provides an approximate nonparametric bootstrap 95% CR for the quadratic polynomial coefficients (γ). The responses y * (x) that corresponded to the parameter vectors γ * lying inside of their CR were then maximized numerically using the NLOPTR package of R (Johnson 2014;Ypma 2014) with respect to the regressors (x 1 , x 2 . . . xn) yielding the bootstrapped response global maxima (x * ). The nonparametric bootstrapped CR for the location of the global maximum of the fitness function is computed as the convex hull of all the bootstrapped maxima (x * ) that were found. We use the centroid (average) of all the maxima found as our point estimate of the global maxima of the fitness surface.
Characterizing sexual selection on the interaction between male and female genital morphology Because we measured the genital size and shape of both males and females in each interacting pair, as well as the outcome of this interaction, we were able to estimate the sign and strength of the correlational selection operating on the covariance between these traits across the sexes. We estimated these between-sex correlational gradients by fitting a multiple regression model that included the standardized linear, quadratic and cross-product terms for each sex, and the standardized cross-product terms between the sexes as the predictor variables and relative fitness as the response variable. As these gradients essentially measure how genital size and shape interact between the sexes to determine fitness, we refer to the resulting between-sex covariance matrix from this analysis as the interaction matrix and consider the terms in this matrix as being analogous to conventional Lande and Arnold (1983) selection gradients for subsequent interpretation. We used the resampling and thin-plate spline procedures outlined above to test the statistical significance and to visualize the standardized correlational selection gradients, respectively.

RESULTS
GM analyses yielded a measure of CS and four RW scores for each sex that together explained 78.40% and 85.38% of the total variation in male and female genital shape, respectively. In males, RW1 explained 39.25% of the total variance in genital shape with negative values corresponding to a short, wide aedeagus and positive values to a long, narrow aedeagus (Fig. 1A). RW2 explained a further 15.04% of this total variance with negative values of RW2 corresponding to an anteriorly shortened tip of the aedeagus and positive values to an anteriorly lengthened tip (Fig. 1C). RW3 explained 12.79% of the total variance in male genital shape with negative values corresponding to an anticlockwise twist of the anterior tip of the aedeagus and positive values to a clockwise twist of the anterior tip of the aedeagus (Fig. 1E). RW4 explained the remaining 11.32% of the total variance in male genital shape with negative values corresponding to a compression of the leftside, posterior of the aedeagus and positive values to a similar compression but on the right-side of the aedeagus (Fig. 1G).
In females, RW1 explained 59.26% of the total variance in genital shape with negative values corresponding to a wide, short vaginal aperture and positive values to a narrow, elongated vaginal aperture (Fig. 1B). RW2 explained a further 14.26% of the total variation in female genital shape with negative values corresponding to narrower, longer supportive structures of the vagina and positive values to broader, curved supportive structures (Fig. 1D). RW3 explained 7.82% of the total variance in female genital shape with negative values corresponding to an extreme posterior elongation of the supportive structures of the vagina and positive values to an extreme posterior broadening of these supportive structures (Fig. 1F). RW4 explained the remaining 4.04% of the total variance in female genital shape with negative values corresponding to a shorter, wider vaginal aperture and posterior broadening of the supportive structures and positive values to a narrower, tapering vaginal aperture and posterior elongation of the supportive structures (Fig. 1H).
Standardized linear, quadratic, and correlational selection gradients for genital size and shape in males and females are presented in Table 1, panels A and B, respectively. In males, there was significant linear selection favoring increased values of RW1 (long, narrow aedeagus), RW3 (clockwise twist of the  There was also significant negative quadratic selection on CS, RW1 (length and width of the vaginal aperture), and RW4 and negative correlational selection on CS and RW1. Importantly, a full model that included the genital morphology of both sexes and their interaction produced quantitatively similar gradients (Table S2): that is, there were strong positive correlations among the linear (r = 0.99, n = 10, P = 0.0001), quadratic (r = 0.99, n = 10, P = 0.0001), and correlational (correlational: r = 0.92, n = 20, P = 0.0001) selection gradients from our conventional models and this full model. We conducted a canonical rotation of the γ matrices presented in Table 1 to locate the major dimensions of nonlinear sexual selection for male and female genital size and shape. The resulting M matrices of eigenvectors and their associated eigenvalues are presented in Table 2, panels A and B, respectively. In males, three of the five eigenvectors (m 3 -m 5 ) had negative eigenvalues, whereas the remaining two eigenvectors (m 1 and m 2 ) had positive eigenvalues (Table 2, panel A). However, there is only significant nonlinear selection operating on m 4 and m 5 , demonstrating that the fitness surface is best described as a multivariate peak in shape ( Fig. 2A). There was also negative linear selection operating on m 2 that largely favors an increase in RW1 and RW4 (Table 2, panel A). In females, four of the five eigenvectors (m 2m 5 ) had negative eigenvalues, whereas the remaining eigenvector (m 1 ) had a positive eigenvalue (Table 2, panel B). As shown for males, significant nonlinear selection was only detected on m 4 and m 5 , indicating that the fitness surface for females is also best described as a multivariate peak in shape (Fig. 2B). There was also significant positive linear selection on m 4 that largely favors a reduction in RW4, and negative linear selection on m 2 that largely favors a reduction in RW3 (Table 2, Figures 2A and 2B, respectively. We can therefore be confident in formally defining the observed pattern of nonlinear selection in the sexes as multivariate stabilizing sexual selection (Mitchell-Olds and Shaw 1987). Table 3 provides the interaction matrix of standardized correlational selection gradients for genital size and shape across the sexes. There was significant negative correlational selection on RW1 in males and RW4 in females (Table 3) and inspection of the thin-plate spline (Fig. 3A) showed that fitness was highest at negative values of RW1 in males and positive values of RW4 in females (Fig. 3A). Consequently, the fitness of an interacting male and female beetle is highest when males have a short, wide aedeagus (Fig. 1A) and females have a narrower, tapering vaginal aperture and posterior elongation of the associated supportive structures (Fig. 1H). There was also significant negative correlational selection on RW3 in males and RW4 in females ( Table 3) and inspection of the thin-plate splines (Fig. 3B) showed that fitness was highest at negative values of RW3 in males and     positive values of RW4 in females. As a result, the fitness of an interacting pair is highest when males have an anticlockwise twist to the anterior tip of the aedeagus (Fig. 1E) and females have a narrower, tapering vaginal aperture and posterior elongation of the supportive structures (Fig. 1H).

Discussion
Explaining why the genital morphology of males is so highly divergent in species with internal fertilization has intrigued evolutionary biologists for decades (Hosken and Stockley 2004;Simmons 2014). Here, we show that male and female genital size and shape play an important role in the successful transfer of a spermatophore during mating in the red flour beetle (T. castaneum). This imposes strong sexual selection on male and female genital morphology that is multivariate stabilizing in form, being characterized by a well-defined peak in fitness at intermediate values of genital size and shape in both sexes. We also found that sexual selection targeted the covariance between the sexes for specific aspects of genital shape, indicating that the interaction between male and female genitals also plays an important role in the successful transfer of a spermatophore during mating. Collectively, our work highlights the important yet often-ignored role that female genital morphology plays in determining mating success (Ah- King et al. 2014;Simmons 2014) and shows that these effects can occur independently, as well as through their interaction with male genital morphology. Moreover, our work cautions against the overly simplistic view that the sexual selection targeting genital morphology will always be directional in form (Arnqvist 1997).
Over two decades ago, Arnqvist (1997) provided a set of criteria to distinguish among the three major hypotheses of genital evolution for use in single species studies, with the form of selection targeting male genital morphology being an important criterion in this checklist (Arnqvist 1997). Our finding that the effects of male genital size and shape on mating success in T. castaneum generate a pattern of multivariate stabilizing sexual selection on this trait is in general agreement with Arnqvist's (1997) within-species view of the lock-and-key hypothesis, and demonstrates a clear and important role for sexual selection in male genital evolution in this species. In insects, linear selection on male genital morphology (e.g., Cordoba-Aguilar 1999, 2009Arnqvist and Danielsson 1999, Danielsson and Askenmo 1999, Bertin and Fairbairn 2005Wenninger and Averill 2006;Holwell et al. 2010;van Lieshout 2011;van Lieshout and Elgar 2011) appears more common than stabilizing selection (e.g., Tadler 1999;Simmons et al. 2009;Wojcieszek and Simmons 2012;House et al. 2016;Dougherty and Shuker 2016). This pattern also appears true more generally for selection on male sexual traits, although it should be noted that most experimental designs have more power to detect linear than nonlinear forms of selection (Higgins et al. 2009;Hunt et al. 2010). Interestingly, stabilizing sexual selection does appear common in a number of signaler-receiver systems. For example, female mate choice exerts multivariate stabilizing selection on temporal and spectral components of the male advertisement call in grasshoppers (Butlin et al. 1985), field crickets (Brooks et al. 2005), and anurans (Gerhardt 1991;Polakow et al. 1995;Gerhardt and Brooks 2009). In these systems, the transmission of an acoustic signal is constrained by the biophysics of signal generation and emission, and how the sensory organs of the receiver perceive and process the signal (Endler and Basolo 1998). As the sensory system of the female can only detect call components within a narrow range, this produces a pattern of multivariate stabilizing selection acting on male call structure (Endler and Basolo 1998). Although the most parsimonious explanation for the pattern of multivariate stabilizing sexual selection we observe on male genital morphology in T. castaneum is the "mechanical fit" of genitals, we cannot rule out the possibility that a similar "sensory-based" lock-and-key process does not exist. Indeed, Eberhard et al. (1998) has argued that male genitals are perceived by the female using tactile channels, with an intermediate male genital morphology being favored because they best stimulate the average female in the population (the "one-size-fits-all" hypothesis). Distinguishing between mechanical and sensory lock-and-key processes, however, has proven difficult (Eberhard et al. 1998) and clearly more work is needed before this can be achieved in T. castaneum.
Although the evolution of male genital morphology has been the subject of intense research, female genitals have been relatively understudied (Ah- King et al. 2014). This bias in genital research likely stems from the long-held view that males play the dominant role in sex, and that female genitals are largely invariant (Ah- King et al. 2014). This bias is likely to be further compounded by the fact that variation in male genital morphology is typically large across populations and species (Wojcieszek and Simmons 2012;Soto et al. 2013;Hosken et al. 2019) making them far easier to study. Our work, however, directly challenges these views by showing that female genital morphology in T. castaneum is far from invariant, and that the existing variation in this trait is also subject to strong multivariate stabilizing sexual selection. In fact, the nonlinear selection gradients describing the pattern of multivariate stabilizing sexual selection acting on female genital morphology were as strong as those reported for male genital morphology, further highlighting the equally important role that females play in determining the outcome of mating in this species. Although the major models of genital evolution do not provide any clear predictions regarding the strength and form of selection acting on female genitals, we believe that the pattern of sexual selection we document for female T. castaneum adds further support to an important role for a "lock-and-key" process in the evolution of genital morphology in this species. The operation of the "lock-and-key" process centers on the alignment of the male and female genitals during mating, with the optimal male genital morphology being the one that, on average, most closely aligns with the average female genital morphology in the population. As any deviation from this optimal morphology decreases the fit with the female genitals and reduces subsequent mating success, stabilizing selection is predicted to target male genital morphology (Arnqvist 1997). However, because the successful outcome of mating depends on both sexes, the poor alignment of genitals and the resulting reduction in mating success can also be caused by the female genitals deviating from the "average" in the population, which should also generate stabilizing selection on female genital morphology. Furthermore, as the "lock-and-key" process is based on the interaction between male and female genitals during mating, sexual selection should also target the covariance between these traits across the sexes-a finding that is also supported by our study. Collectively, our work highlights the value of formally estimating sexual selection on genital morphology in both sexes, and supports recent claims that our understanding of genital evolution will continue to be hampered until the persisting male bias in genital research is addressed (Ah- King et al. 2014).
Although the strength and statistical significance of our estimated selection gradients indicate that stabilizing selection is the predominant form of selection targeting male and female genital morphology in T. castaneum, it is important to note that significant linear selection was also detected in both sexes for several aspects of genital shape (RW1, RW3, and RW4 in males and RW3 and RW4 in females) and also along the major eigenvectors of selection (m 2 in males and m 2 and m 4 in females). Exactly how these aspects of genital shape provide an advantage to males and females during the transfer of a spermatophore is currently unclear, although it is possible that it helps better position the male genitals or the ability of the female genitals to accommodate these structures during the transfer of a spermatophore (e.g., Werner and Simmons 2008). It is important to highlight, however, that because we only examined a single episode of selection (i.e., mating success), this is likely to provide an incomplete measure of all the linear selection targeting male and female genital morphology, and consequently the total sexual selection acting on these traits. This is particularly true given that a number of postcopulatory processes in insects, including sperm competition (e.g. Danielsson and Askenmo 1999;Wenninger and Averill 2006) and cryptic female choice (e.g., Cordoba-Aguilar 1999), are known to impose significant linear selection on male genital morphology, although the contribution of female genitals to these episodes of selection remains largely unknown (Ah- King et al. 2014). Consequently, the selection gradients we provide should only be interpreted within the context of mating success and not considered as estimates of total selection acting on male and female genital morphology.
Our work shows that male and female genital morphology not only has important independent effects on the successful transfer of a spermatophore during mating in T. castaneum, but also that the interaction between these traits influences mating success. That is, sexual selection targets male and female genital morphology directly, as well as indirectly via the covariance between these traits. To our knowledge, this is the first time such an approach has been used to study genital evolution, despite offering a novel means to assess how important the interaction of male and female genital morphology is to mating success. In T. castaneum, we found significant negative correlational selection gradients for two aspects of genital shape: RW4 in females with RW1 and RW3 in males. Biologically, this means that a mating pair will have a higher success in transferring a spermatophore when a male with either a short, wide aedeagus (RW1) or an anticlockwise twist to the anterior tip of the aedeagus (RW3) mates with a female having a narrower, tapering vaginal aperture and posterior elongation of the supportive structures (RW4). Explaining these relationships at this stage would be purely speculative, as isolating a mechanism requires functional studies (Ah- King et al. 2014;Simmons 2014;Brennan and Prum 2015). Unfortunately, only a handful of such studies exist for insects (Ronn et al. 2007;Werner and Simmons 2008;Polak and Rashed 2010;Kahn et al. 2010;Hotzy et al. 2012). For example, Werner and Simmons (2008) used histology to show that three of the genital sclerites in male dung beetles (Onthophagus taurus) form a functionally integrated unit that generates the tubular-shaped spermatophore and delivers its opening to the female's spermathecal duct, whereas a fourth serves as a holdfast device during mating. It is possible that a similar mechanical process enables a short, wide aedeagus in T. castaneum or one with an anticlockwise twist to the anterior tip to more efficiently deliver a spermatophore or better anchor the male when mating to a female with a narrower, tapering vaginal aperture with posterior elongation of the supportive structures. However, detailed functional studies are clearly needed to confirm this.
The pattern of sexual selection we document for male and female genital morphology in T. castaneum is likely to provide a number of important insights for genital evolution. First, strong linear and stabilizing selection is predicted to deplete the additive genetic variance of phenotypic traits (Lande 1979). This is especially true for male and female genital morphology in T. castaneum, where our estimated quadratic selection gradients were almost twice the median strength (γ = |0.16|) reported across studies based on mating success (Kingsolver et al. 2001). Indeed, where stabilizing selection has been demonstrated, it appears equally as strong, at least for insects (γ = -0.28, Tadler 1999; γ = -0.19 and -0.34, House et al. 2016;γ = -0.38, Dougherty and Shuker 2016). Yet, genital morphology is known to evolve rapidly (e.g., Simmons et al. 2009;House et al. 2013;Hopwood et al. 2016) and contain as much additive genetic variance as nongenital traits (e.g., Arnqvist and Thornhill 1998;House and Simmons 2005;Higgins et al. 2009;Kamimura and Iwase 2010). This suggests that mechanism(s) must exist to preserve the additive genetic variance of genital morphology, as has been argued more generally for sexual traits subject to strong selection (Kirkpatrick and Ryan 1991). Second, in a population subject to persistent stabilizing selection in the absence of frequency-dependent selection, theory predicts that the population mean phenotype should evolve to match the peak in fitness (Simpson 1953;Lande 1979). Indeed, Estes and Arnold (2007) showed that a quantitative genetic model where the fitness optimum was able to evolve to the optimum within an adaptive zone with stable boundaries performed significantly better than five other competing models to explain the evolution of phenotypic means in an extensive database (Gingerich 2001). If the presence of stabilizing selection and the evolutionary response of the population mean is more common than generally appreciated for genitals, as might be expected under the lock-and-key hypothesis, it is relatively easy to envisage intra-and interspecific differences in genital morphology evolving as a result of variation in the location of the adaptive optima (Hansen 1997). This would certainly help explain the adaptive radiation in genital morphology frequently observed across natural (e.g., Wojcieszek and Simmons 2012;Heinen-Kay and Langerhans 2013;Oneal and Knowles 2013) and experimental (e.g. Simmons et al. 2009;House et al. 2013;Hopwood et al. 2016) populations, as well as the extreme diversification in genitals observed across closely related species (e.g., Kuntner et al. 2016). Finally, theory predicts that correlational selection will generate a genetic correlation between the two traits by creating linkage disequilibrium (Lande 1984) or by favoring pleiotropic mutations (Lande 1980), although these mechanisms are not strictly required if correlational selection is strong and persistent (Sinervo and Svensson 2002). Although we do not know of any formal models examining whether correlational selection can also generate intersexual genetic correlations, there is empirical evidence showing that artificial selection on the covariance between male and female traits can dramatically alter the strength of the intersexual genetic correlation between these traits (Delph et al. 2011;Stewart and Rice 2018). It is therefore possible that the correlational selection we document on the covariance between aspects of genital shape in male and female T. castaneum will also promote these trait being genetically correlated across the sexes. Ultimately, this will facilitate the coevolution of male and female genital shape in T. castaneum (Lande 1980), and may explain why the sexual co-evolution of genitals has been so widely documented in both experimental evolution and comparative studies (reviewed in Brennan and Prum 2015).
In conclusion, our study shows that both male and female genital morphology is subject to strong multivariate stabilizing sexual selection in T. castaneum, but that sexual selection also targets the covariance between the sexes for aspects of genital shape, indicating that how the genitals interact during mating is also important to the successful transfer of a spermatophore in this species. Both findings provide empirical support for the within species "lock-and-key" hypothesis of genital evolution (Arnqvist 1997), although we cannot determine at this stage whether this process is driven by a mechanical or sensory-based interaction (or both) during mating. The pattern of sexual selection we document for male and female genital morphology in T. castaneum is likely to have a number of important implications for genital evolution, including explaining the adaptive radiation of genitals across populations and the diversification of closely related species, as well as the co-evolution of male and female genital morphology.

Supporting Information
Additional supporting information may be found online in the Supporting Information section at the end of the article. Figure S1. To facilitate GM measurement of the male aedeagus, a total of 5 type 2 landmarks that are discrete and could be consistently identified were located (( ) -1, 6, 10, 14 and 19) and represent points at the endpoint of the aedaegus (points 1, 10 and 19) and at the minimum curvature of the bulge at the tip of the aedaegus (points 6 and 14) (Zelditch et al., 2014). Figure S2. To facilitate GM measurement of the female vagina and supporting structures, 18 type 2 landmarks that could be consistently identified ( ( ) -1, 5, 6, 7, 8, 13, 14, 16, 18, 19, 21, 23, 24, 29, 30, 31, 32 and 36) and another 18 sliding semi-landmarks were placed along the female genital outline using the programs described above ( Figure S1). Table S1. Repeatability estimates and 95% confidence intervals (CIs) for centroid size (CS) and the first four relative warp (RW) scores describing genital shape in male and female T. castaneum. Table S2. The vector of standardized linear selection gradients (β) and the matrix of quadratic and correlational selection gradients (γ) for successful mating in male and female Tribolium castaneum.