Estimation of the strength of mate preference from mated pairs observed in the wild

Abstract A number of key processes in evolution are driven by individuals preferring mates with particular phenotypes. However, despite long‐standing interest, it is difficult to quantify the strength of mate preference from phenotypic observations in nature in a way that connects directly to key parameters in theoretical models. To bridge the gap between mathematical models and empirical data, we develop a novel maximum likelihood‐based method to estimate the strength and form of mate preference, where preference depends on traits expressed in both males and females. Using simulated data, we demonstrate that our method accurately infers model parameters, including the strength of mate preference and the optimal offset match between trait values in mated pairs when model assumptions are satisfied. Applying our method to two previous studies of assortative mating in marine gastropods and the European common frog, we support previous findings, but also give additional insight into the role of mate preference in each system. Our method can be generalized to a variety of plant and animal taxa that exhibit mating preferences to facilitate the testing of evolutionary hypotheses and link empirical data to theoretical models of assortative mating, sexual selection, and speciation.

under frequency-dependent selection (Chamberlain et al. 2009), and mate preferences evolve adaptively to produce the most fit offspring (see Jiang et al. 2013). Similarly, mate preference for a phenotypically or genetically similar mate may evolve due to outbreeding depression (Epinat and Lenormand 2009) and may contribute to population structure in small populations that may be of conservation interest (e.g., Langin et al. 2015). Henceforth, we consider the specific case of the trait matching rule and define mate preference as the preference for a mate conditioned upon the individual chooser's own phenotype, which implies that phenotypic traits can be measured in both sexes.
Mate preference is fundamental to biological questions linking ecology, phenotypic evolution, and the genetic basis for mating traits, yet our capacity to measure mate preference and study its consequences in natural systems is highly limited. Much of what we know and the predictions we are able to make about evolutionary processes involving mate preference stem from theoretical models, laboratory studies, and simulations, with less emphasis on field studies (Turelli et al. 2001;Kirkpatrick and Ravigné 2002;Gavrilets 2004;Carvajal-Rodriguez and Rolán-Alvarez 2014). This is because in most, if not all cases, it is impossible to directly measure mate preference in the wild. Experimentally, mate preferences can be studied using mate-choice trials, but such trials are accompanied by assumptions and confounding effects not found in nature and are not feasible in many study organisms (Johnson and Marzluff 1990;Dougherty 2020). Simulation studies connect a mathematical framework to the sampling process and allow us to study the effects of specific parameters on evolutionary outcomes given a set of assumptions. However, mathematical models go notably untested with field data, leaving a lacuna between theory and the natural world. One tractable approach in studies of natural populations is to infer mate preference via a Pearson correlation coefficient calculated between quantitative traits measured within known mated pairs (Jiang et al. 2013). This is valuable because it has allowed us to observe presence or absence of assortative mating across a variety of taxa (de Cara et al. 2008;Jiang et al. 2013). A correlation can be used as evidence that assortative mating may be occurring in a population, however, it cannot accurately measure the strength of mate preference directly, nor can correlations help distinguish between mate preference rules. Furthermore, correlation coefficients do not appear as parameters in analytical or simulation-based models of sexual selection or divergence with gene flow (de Cara et al. 2008). To provide a link between theory and data, the strength of mate preference must be directly estimated as a parameter value.
Here we develop a likelihood-based method adapted from the mathematical framework given in Kirkpatrick and Nuismer (2004) to infer the strength of mate preference from observa-tional data collected in the wild. Our method requires observations of phenotypes of males and females in mated pairs, as well as phenotypes of randomly sampled individuals (although not strictly required in all cases, see the "Likelihood Function" section) of reproductive capacity in a population. We know of one other family of methods to estimate mate preference from similar data (Rolán-Alvarez et al. 2015;Fernández-Meirama et al. 2017;Estévez et al. 2018). These approaches are based on fitting a strength parameter to indices of assortative mating for each pair that are scaled to approximate probabilities. We take a more direct statistical approach by estimating all parameters via maximum likelihood, which also provides for the calculation of standard errors or profile likelihood functions to gauge the (im)precision in the estimation, as well as formal inferences such as statistical tests and confidence intervals. This approach is also easily extended to generalizations of the model. We also include a parameter that allows for offset matching, where the phenotype of the most preferred partner differs from the chooser's own phenotype by a fixed value instead of strict trait matching only, as is common in most mating functions (e.g., Lande and Arnold 1983;Arnqvist et al. 1996;Dieckmann and Doebeli 1999;Thibert-Plante and Hendry 2009). In the next sections, we (i) present the mathematical framework from which we build a likelihood-based method to estimate parameter values. Using the mathematical framework in (i), we (ii) demonstrate why using a correlation to measure the strength of mate preference is insufficient and can generate spurious results. Then, we (iii) evaluate the performance of the maximum likelihood estimates with simulated data, and (iv) analyze and interpret data from two previously published studies on assortative mating using our methodology, which we have made publicly available in an package called matepref (Clancey and Johnson 2021) for R (R Core Team 2021).

MATE PREFERENCE MODEL
Our model is designed to estimate the strength of mate preference when individuals prefer a phenotypically similar mate (i.e., trait matching) or a mate with a larger or smaller trait value relative to themselves (i.e., offset matching) on a continuous scale. We typically rely on having a single quantitative trait that can be measured in both sexes. In some cases it may also be appropriate to have two quantitative traits, one that is measured in males and one that is measured in females, where each trait is important in the context of mating. Specifically, our model is a statistical application of a theoretical model proposed by Kirkpatrick and Nuismer (2004), which we adapt for the purpose of estimating parameters pertaining to mate preference from observations of trait values in mated pairs of organisms in the wild.
We assume individuals can move freely in space and come into contact with potential mates at random. In dioecious species, male and female traits, or in hermaphroditic species, traits of donors and recipients, hereafter X and Y , are assumed to be independently sampled from a homogeneous population. To give a general formulation of the model, we can decompose the joint distribution of traits in (X, Y ) mated pairs of organisms into the unspecified marginal density functions f X (x) and f Y (y), a mating function describing the probability of mating between individuals with X and Y trait values, and a normalizing constant. Assuming observations within X , Y , and (X, Y ) mated pairs are independent, the joint distribution of X and Y trait values given that the organisms are in a mated pair can be written using Bayes' theorem as where S denotes an indicator variable for a successfully mated pair and the denominator, P(S = 1), is the marginal probability of mating. The mating function we propose builds upon a Gaussian mating function from Lande (1983) and Kirkpatrick and Nuismer (2004), but we include additional parameters such that where α ≥ 0 is a parameter describing the strength of mate preference as a function of the difference between the trait values, δ is the value of x − y that maximizes the mating probability, and γ is a parameter that equals the probability of mating when x − y = δ (i.e., the probability of mating with the perfect matching or offset matching of trait values). Now we assume the distributions f X (x) and f Y (y) to be normal density functions, or approximately normal, such that X iid ∼ N (μ x , σ x ) and Y iid ∼ N (μ y , σ y ). These distributions, combined with the mating function in equation (2), give us the fully specified conditional distribution of mated pairs as Note this distribution is not dependent on the parameter γ. The denominator of equation (3) is equal to the marginal probability of mating and for computational purposes can be written in closed-form. The double integral here is equal to the expectation where t = −α(σ 2 x + σ 2 y ) and Z = (X − Y − δ)/ σ 2 x + σ 2 y . Because Z is a normal random variable with mean (μ x − μ y − δ)/ σ 2 x + σ 2 y and unit variance, Z 2 has a noncentral χ 2 distribution with one degree of freedom and noncentrality parame- x + σ 2 y ), and thus equation (4) is the moment-generating function of a noncentral χ 2 distribution. This can be written as (see Johnson et al. 1995) giving a fully specified statistical model to formulate the likelihood function in the following section.

LIKELIHOOD FUNCTION
We now consider the problem of the estimation of the parameters of equation (3) with empirical data. We envision a scenario where field biologists are able to randomly sample individuals of reproductive capacity to measure a quantitative trait of interest and observe mated pairs in their study population. The data then consist of paired observations of trait values within mated pairs, (x i , y i ), and additional unpaired observations of trait values of females (x i ) and males (y i ) that are not observed as part of a mated pair. Assuming mutual independence across paired and unpaired observations, from equations (3) and (5) we have the likelihood function where θ = (α, δ, μ x , μ y , σ x , σ y ) , S p is the set of indices of ordered (x, y) mated pairs, and S x and S y are the sets of indices of observations x and y, respectively, that are not observed as members of a mated pair. The maximum likelihood estimate of θ can be obtained numerically by the maximization of equation (6) with respect to θ. Either or both S x and S y can be empty, in which case the corresponding terms are omitted from the likelihood function. It should be noted that μ x , μ y , and δ are not identified if S x and/or S y are empty, so offset matching cannot be distinguished from differences between the mean trait values in mated pairs. To unambiguously estimate δ it is necessary to have samples of mated pairs as well as samples of males and females that are not observed as members of mated pairs. However, if δ can be assumed to be zero, as is sometimes the case, then it can be fixed rather than estimated, and then the model is identified even with using only a sample of mated pairs.

INTERPRETING THE MATING FUNCTION
To interpret an estimated value of α as the strength of mate preference, it is helpful to understand the mating function. The mating function in equation (2) describes the probability of mating as a function of trait differences and the parameters α, δ, and γ. It is very unlikely that the probability of mating is equal to one even in a perfectly matched pair. Therefore, we use the parameter γ as the maximum probability of mating when x − y = δ. Figure 1 depicts how the mating function depends on the trait differences x − y and the parameters α, δ, and γ. The parameter γ can be viewed as the composite effect of all factors independent of x and y on the probability of mating such as other independent traits conferring a mating advantage, availability of resources, or territories that may affect the probability of mating, or other factors that are system specific. To facilitate the interpretation of α, we can use the mating function to express an interval around δ in terms of x − y where the probability of mating is within p% of the maximum probability of mating of γ. For any p% we can define the values of x − y such that P(S = 1|x, y) ≥ (1 − p/100)γ.
The interval δ ± √ − log(1 − p/100)/α defines trait differences x − y leading to a mating probability that is within p% of the maximum probability of γ. Because α is the "strength" of mate preference, larger values of α will reduce the interval width, while the value of δ defines the center of the interval. The value of p is arbitrary, but can be specified as a biologically meaningful value in terms of the impact of the trait difference on the reduction in the mating probability.

Limitations of the Correlation Coefficient as a Metric of Mate Preference
The Pearson correlation coefficient is a common metric used by empiricists to infer the existence and strength of preferencedriven assortative mating in a particular population (Jiang et al. 2013) where mate preference is assumed to be the biological process generating a correlation between traits within mated pairs. Even though mate preference does in fact generate a correlation, a sample correlation coefficient cannot be used to estimate the parameter α, which is the strength of mate preference. This is because the correlation of X and Y in mated pairs is a function of both α and the variances of X and Y . To observe the mathematical relationship between a correlation (ρ) and α as a function of a common phenotypic standard deviation, σ x = σ y = σ, we numerically computed the correlation between X and Y from the conditional distribution given in equation (3) as a function of α and several fixed values of σ (Fig. 2). Whenever mate preference generates a nonzero correlation between traits within mated pairs, we can see that ρ increases monotonically with respect to α, but the relationship is not linear and varies as a function of σ.
More specifically, as α increases, ρ increases faster with larger values of σ until leveling off. As the variance of X − Y increases, more of the distribution of X − Y will fall outside a region of high probability of mating as defined using Figure 1. Overall, for the same given value of α, different values of ρ can be generated depending on the variability of traits in the population, and therefore, a correlation coefficient is not a direct estimate of α and will be misleading if used to infer the strength of mate preference.

Evaluation of Performance with Simulated Data
To validate our method, we evaluate the performance of the maximum likelihood estimates with simulated data. Males and females composing a breeding population are generated as random samples from normal distributions (except in simulations designed to purposefully violate the normality assumption) with constant means (female: μ x = 10; male: μ y = 12) and standard deviations (σ = 2 for both sexes). Mated pairs are simulated with equation (2) under different fixed values of α and δ, while holding γ = 1 constant. Sample sizes n p (number of mated pairs) and n s (total number of single individuals of both sexes, where the number of males is equal to the number of females) vary to demonstrate their influence on the estimates or are fixed at n p = 100 and n s = 100. We simulate 1000 replicate datasets for each value of α and δ. Parameter estimates are obtained by maximizing the likelihood function using the R function optim() with the L-BFGS-B algorthim (R Core Team 2021). Figure 3 shows the distance between estimated and true values of α and δ for set values of each parameter under different sample sizes of mated pairs and single individuals. Note that we expect the error around the estimates,α, to be proportional to the mean ofα (as seen in Fig. 3A), because α has a lower bound very near zero. This is not the case withδ, because δ can be any real number. At larger values of α and smaller sample sizes, there is a tendency for estimated values (α) to be larger than true values. The bias and error aroundα, not surprisingly, decreases with larger samples sizes, particularly when the number of mated pairs is increased. The estimates of δ have little bias, although sample sizes of mated pairs also appear to influence the error surroundinĝ δ.
We find that including δ in the model is very important to estimate α without bias whenever the true value of δ is nonzero. If the true value of δ is zero and then not estimated in the model, an accurate estimate ofα can still be obtained. If the true value of δ is not zero and then not included in the model,α becomes less accurate as α and δ increase (Fig. 4). This is because when δ is nonzero, males and females within mated pairs differ from each other, and this difference corresponds to an actual preference. If δ is incorrectly designated as zero, the phenotypic difference between mates then incorrectly estimates a low preference.
Last, we test the model's performance when we violate two key assumptions: (i) normality and (ii) population homogeneity. Very often quantitative traits are normally distributed or approximately normal, but situations could arise when phenotypic distributions are nonnormal. A probable and problematic example would be a skewed distribution. In Figure 5, we violate the normality assumption by drawing trait values from a gamma distribution with three different shape parameters to simulate skewed distributions of varying degrees. Particularly when the skew is minimal, the parameter estimates for α and δ are robust to this normality violation. We also investigated other examples of nonnormal distributions, specifically heavy-tailed distributions. These are even less problematic than skewed trait distributions, and therefore the results of these simulations are not presented here.
Estimates of the two parameters α and δ can also be biased if mating occurs primarily within unidentified subpopulations, thus violating the assumption that observations are drawn from a homogeneous population. The effect of population structure increases as the subpopulation means get further apart or the variance of each subpopulation distribution decreases. We evaluate the effect of hidden population structure with divergent trait means on the estimates of α and δ with constant sample sizes and a constant population standard deviation (values given in Fig. 6). We simulate three subpopulations with discrete structure (i.e., mating takes place only within each subpopulation), where the value of dictates the distance between mean trait values in each subpopulation. For each value of (Fig. 6), population means for pairs and single individuals are μ − , μ, and μ + , respectively. Subsequently, all pairs and single males and females are combined into one dataset to mimic "hidden" population structure. Then, α is estimated as if the observer were unaware of the structure. As expected, the estimate of the strength of mate preference increases as phenotypic divergence among subpopulations  increases (Fig. 6A), but the estimates are surprisingly robust, especially at the lower values of . The estimate of the offset match is similarly affected by hidden population structure, as the value ofδ underestimates the true parameter with increasing values of ( Fig. 6B). At lower values of , this bias is minimal.

Application to Real Data
Many empirical studies demonstrate the existence of assortative mating by measuring phenotypes and observing mated pairs in the wild. In this section, we apply our likelihood-based approach to real data from two published studies of size-assortative mating in three species of marine gastropods from the genus Echinolittorina (Ng et al. 2019) and two populations of the European common frog (Rana temporaria) (Dittrich et al. 2018). Datasets from both studies are publicly available on Dryad (see Dittrich et al. 2018;Ng et al. 2019). Each study measured body size (shell length in Echinolittorina and snout-vent length (SVL) in R. temporaria, both in millimeters), a trait easily observable in males and females in a sample of mated pairs and unmated individuals in each population. Periwinkles, or marine gastropod molluscs in the family Littorinidae, occur worldwide in the rocky intertidal and have been the subject of numerous evolutionary studies including many studies of sexual selection and mate choice (Ng et al. 2019;Perini et al. 2020). In the genus Echinolittorina, courtship is initiated by a male following a female's mucus trail; if this female is deemed an acceptable mate, courtship ends with the male mounting and copulating (Ng et al. 2013). Males are suspected to exhibit sizedependent mate preference because larger females likely have higher fecundity, but physically copulating with a very large female may not be possible for a smaller male or the risk of sperm competition may be very high and jeopardize paternity if multiple males target the largest females (see Ng et al. 2019).
We   Table 1. We used likelihood ratio tests to determine if α > 0 and δ = 0, which is mate preference with offset matching, significantly differs from α = δ = 0, the null expectation of random mating. Results of the likelihood ratio tests are E. malaccana: χ 2 2 = 20.56, p = 0.000034, E. radiata: χ 2 2 = 22.10, p = 0.000016, and E. vidua: χ 2 2 = 31.53, p = 0.00000014 and support the hypothesis that these three species exhibit mate preference with offset matching, albeit to different degrees. To visualize the impact of the estimated values of α and δ on the joint distribution of (X, Y ) mated pairs, we show contour plots depicting the joint distribution of male and female phenotypes under the null expectation of random mating compared to the estimated distribution under our model with maximum-likelihood estimates ofα andδ (Fig. 7). We also calculate the interval δ ± √ − log(1 − p/100)/α ("Interpreting the Mating Function" section) that defines the range of trait differences (shell length measured in millimeters) where the mating probability is within p = 10% of the maximum probability of γ using the estimates for α and δ for each species, E. malaccana: (−0.097, 0.92), E. radiata: (0.58, 2.20), and E. vidua: (0.14, 1.00). Similar to the situation in marine gastropods, Dittrich et al. (2018) hypothesize the pattern of assortative mating in R. temporiana to be driven by larger males preferring larger and more fecund females. However, under closer investigation this pattern is much more complex and dynamic. Larger individuals of both sexes arrive at breeding sites earlier than smaller individuals, and larger males can out-compete smaller males for preferred mates. In the face of competition from larger males, the alternative tactic of smaller males is to quickly and indiscriminately mate with any available female, even a small one. Thus, the complex mechanisms leading to size-assortative mating in R. temporiana can generate different outcomes across time and space (Dittrich et al. 2018). Dittrich et al. (2018) studied two populations of R. temporaria from southern and central Germany. The first population, Fabrikschleichach (FS), consisted of a network of 140 ponds where R. temporaria typically used between 35 and 40 ponds for reproduction. In contrast, the second population at the locality Kleiwiesen (KW) also contained a network of ponds, but according to the authors, almost the entire population bred within one pond, creating strong male-male competition for larger female mates. FS population males tended to be smaller than their female mates, whereas the KW population showed the opposite trend, likely due to differences in male densities, in the KW population.
In light of the dynamics of the R. temporaria mating system, the authors assessed whether mate preference was adaptive and varied between populations due to mate availability during migration to breeding sites and competition for mates. A summary of the parameter and interval estimates for SVL measurements (millimeters) in each population can be seen in Table 2. Here, after reanalyzing the published dataset in our model, we found no statistical difference in strength of mate preference across the two populations (a 95% Wald confidence interval for the difference in α FS − α KW is 0.0022 ± 0.0024), but similar to the authors' hypothesis, the direction of the offset changed under different conditions (a 95% Wald confidence interval for the difference in δ FS − δ KW is 20.21 ± 10.90) (see Appendix for the 95% Wald CI calculation and alternatives to analyzing data from multiple 8.70 ± 0.59 7.20 ± 1.04 σ y 6.65 ± 0.27 5.81 ± 0.39 populations). Note that the values ofα are small. To compensate for numerical instability occurring when α is close to zero, we converted all observations of SVL from millimeters to centimeters. This enlarged the value ofα, and likewise decreased the value ofδ, ensuring mathematical stability during optimization. These values were converted back to millimeters after the estimates were obtained. Like the analyses in Echinolittorina, results of likelihood ratio tests comparing the model of α and δ held constant at zero to the model allowing these parameters to vary are FS: χ 2 2 = 115.22, p ≈ 0 and KW: χ 2 2 = 21.12, p = 0.000026. The strength of mate preference and the difference in the direction of the phenotypic offset between male and female mates can be visualized in Figure 8 for the FS and KW populations. Again, we also calculate the interval δ ± √ − log(1 − p/100)/α ("Interpreting the Mating Function" section) that defines the range of trait differences (SVL (mm)) where the mating probability is within p = 10% of the maximum probability of γ using the estimates for α and δ for each population, FS: (1.38, 10.76) and KW: (−20.40, −7.9).

Discussion
A hurdle to testing evolutionary theory surrounding mate preference is the lack of methods allowing theoretical models to interface with observational data (Gavrilets 2014;. To fill this gap, we have developed a maximum likelihood based approach to estimate parameters in a Gaussian mating function from empirical data. Extensive testing of our model using simulated data demonstrates accurate estimation of the key parameter, the strength of mate preference, α. In addition, we can estimate the optimal offset match in mated pairs, the parameter δ, and the population means and variances in males and females from maximizing a single likelihood function. We discovered that estimating the offset parameter δ is highly necessary to obtain a nonbiased estimate of α, and this parameter also provides valuable information about mate choice mechanisms operating in many mating systems. Best estimation results for both parameters, α and δ, are achieved with larger sample sizes in a wellmixed population. We also addressed the issue of hidden population structure. Not surprisingly, hidden population structure can skew results, and therefore spatial scale should be carefully considered when using this method (see Rolán-Alvarez et al. 2015;Estévez et al. 2018). We anticipate empiricists will be aware of strong population structure and be able to use corrected phenotypes (e.g., Langin et al. 2015) in most circumstances, or else we urge consideration of the possibility of hidden structure. If discrete subpopulations can be identified and are believed to share a common α and δ, the likelihood function in equation (A.2) can be used to accommodate structured populations (see Appendix). Even so, parameter estimates appear very robust to undetected weak population structure. As long as the population is mixing so that local distributions of potential mates are similar to the overall population, and this will occur especially as trait variances increase, parameter estimates will have minimal bias and can still be applied to a variety of biological systems. Applying our method to two published datasets on sizeassortative mating (Dittrich et al. 2018;Ng et al. 2019), we support previous findings and gain additional insight into each mating system. In species of marine gastropods, sexual size dimorphism, with females being the larger sex, is common but not ubiquitous (Ng et al. 2019). However, the mechanism by which males choose mates-preferring females slightly larger than themselves-appears widespread even in species with little to no dimorphism Ng et al. 2019). Ng et al. (2019) investigated the relationship between directional sexual selection on female size and sexual size dimorphism in marine gastropods by studying the mechanisms of male mate choice and patterns of sexual dimorphism in seven different species of marine gastropods. They uncovered a negative correlation between sexual selection intensity on female body size and sexual size dimorphism, and speculate this relationship is caused by the existence of a "similarity-like" mate choice mechanism (Fernández-Meirama et al. 2017), where males preferentially mate with females of the same size plus a specific value (Ng et al. 2019). Our likelihood-based method directly addresses both components of this mate choice mechanism by estimating the parameters α and δ. In our reanalysis of data on three species of Echinolittorina included in the Ng et al. (2019) study, we also found strong size-assortative mating and males preferring larger females respective to their own size, confirming the existence of a "similarity-like" mate choice mechanism. With parameter estimates in hand, our model could be used to further investigate the relationship between mate preferences generating sexual selection on female size and sexual size dimorphism in these species with simulation.
In the European common frog, mate preferences may be highly labile so that individuals choose mates adaptively. Dittrich et al. (2018) believe multiple mechanisms, such as conditions during migration to breeding ponds and the amount of male-male competition over mates, all generate the observed patterns of assortative mating and make interpretation difficult. These authors found that the correlation between traits in mated pairs fluctuates across populations. Our reanalysis shows that the strength of mate preference is not statistically different across populations, but the direction of the offset match differs, where males preferred females smaller than themselves in one population, but in the other population, males preferred females larger than themselves. This information is not apparent from the correlations, but is obtained by estimating the parameters separately in our model. Thus, it appears that preference for a particular body size is consistently operating in this species, but local conditions may dictate the optimally sized partner with respect to oneself.
Although our model follows many previous studies that have collectively developed a rigorous theoretical framework to understand mate preference and its evolutionary consequences, we rely on key assumptions that can limit application to all situations. First, we assume phenotypic traits are normally distributed in a homogeneous population. In reality, not all phenotypic trait measurements are independent (e.g., spatial structure would violate this assumption) and normally distributed. Parameter estimates are robust to weak violations of these assumptions as long as the trait is continuous. For example, a skewed distribution such as flowering date (Schmitt 1983;Blionis et al. 2001) can be analyzed with our method, as can populations with weak structure. On the other hand, the model cannot handle categorical phenotypic measurements, which are often important in mating, such as song type in red crossbills (Loxia curvirostra) (Snowberg and Benkman 2007), discrete color polymorphisms in strawberry poison frogs (Oophaga pumilio) (Yang et al. 2016), alternative male tactics in Trinidadian guppies (Poecilia reticulata) (Reynolds et al. 1993), or polymorhic structures in diving beetles (Graphoderus zonatus) (Iversen et al. 2019).
Next, we assume that the probability of mating decays in a Gaussian fashion as a function of differences in continuous trait values that can be measured in males and females (eq. 2). This assumption limits the model to matching rules of mate choice and cannot easily be extended to preference/trait rules, unless preference for a particular trait in the opposite sex can be measured as an independent psychological trait on a continuous scale. The preference function we use is unimodal and symmetric, and therefore cannot be ascribed to, for example, a situation where females in a population prefer the largest males. Even in cases where mate choice is made via a matching rule, other types of mating functions that are not unimodal or symmetric, may be better suited to specific systems (Lande 1981;Carvajal-Rodriguez and Rolán-Alvarez 2014;Neelon et al. 2019). Models comparable to ours could potentially be built on alternative preference functions, but it may not be possible to write the likelihood function (eq. 6) in closed form as we were able to do here, and numerical approximations would need be used in these cases. Nonetheless, our method also provides the basis for a statistical framework for testing hypotheses about mate preference functions in future studies.
Last, estimation of the strength of mate preference is based on measuring one focal trait. When the data are collected, we assume we have correctly identified and measured this key trait. Mate preference in reality could be very strong but we underestimate it by measuring the wrong trait. Alternatively, we may conclude mate preference is based on the focal trait, but we have merely measured a correlated or indicator trait (Candolin 2003). Measuring an indicator trait may not be a problem if the goal is to understand a process that relies on assortative mating, but this will be misleading if the goal is to unlock key traits involved in mate choice to understand a mating system. Overall, investigators using this method should carefully consider the assumptions in relation to their research objectives, and prior evidence for mate preference based on candidate traits.
Even in the face of limitations, our method has broad implications for the study of mate preference in wild animal and plant populations. Many theoretical models have been constructed to explain phenotypic evolution and divergence that were motivated by observations of mating patterns in nature (Lande 1981). With this foundation firmly in place, it is time for empirical assessment of the role of mate preference in evolutionary dynamics as predicted by models of divergence with gene flow (Dieckmann and Doebeli 1999;Räsänen and Hendry 2008), sexual selection (Servedio and Bürger 2014;Servedio 2016), and the evolution of sexual dimporphism (Bolnick and Doebeli 2003). Our methodology is unique in the use of a likelihood-based approach to estimate mate preference directly from observational data, thereby allowing an extensive range of systems in which we can study these processes. To help facilitate future studies in this area, we have made our method publicly available and easy to implement in a package called matepref (Clancey and Johnson 2021) for R (R Core Team 2021) that includes functions to estimate the model described in this article, and to simulate data for simulation studies. It is our hope that our method answers previous questions and raises new questions about the role of mate preference in evolutionary dynamics, and does so efficiently to provide a seamless connection between theory and data.