#### Analytical model

Our object here is to address the far more interesting question of why some males attain more reproductive success than others. In such an analysis, we start with the male features of possible relevance to reproductive success, and our first task is to model the relative male reproductive contributions, the λ_{k} values, as some function of the potentially interesting features. There are many ways one might model the impact of such features, but one of the more attractive methods is to use log-linear models.

In the usual log-linear model, the response variable (here ln λ_{k}) is measured directly, and is then regressed on a set of predictor (*z*) variables, each measured without error. Here, the response variables cannot be measured directly, but rather are estimated indirectly. The relationship is log-linear, but a conventional log-linear regression is inappropriate, because the response variable is not measured; it is itself modelled. In addition to their analytical features, log-linear models are also a means of linearizing multiplicative contributions of different male features to overall fitness, as reflected in the λ_{k} values. Thus, log-linear models have conceptual advantages as well. For the *k*th male, express the collection of potentially interesting features in vector form *G*_{k} = {*g*_{jk}}, and then define

Because this is a closed population, with all the potential male parents accounted for, the λ_{k} values must sum to unity, and each lies in the interval [0, 1]. To ensure this, it is customary to use one of the λ_{k} values, say the last (λ_{K}), as a reference for all of the others. We define the feature differences between the *k*th and *K*th males as

and then rewrite eqn 3 in the form

The final results are invariant with respect to the choice of referencing male. We can model different combinations of the J candidate features, in an attempt to extract those of major importance, anticipating a parsimonious model, with J < (K – 1).

Given a maximum likelihood solution for the vector of J regression coefficients, *B*^{*}, we can back-translate to estimate the corresponding male reproductive contributions indirectly,

As a reminder, these are ‘modelled’ λ_{k} values, rather than progeny counts. We can also evaluate proper submodels (some of the β_{j} values = 0) by using nested log-linear stepwise regression procedures, which lets us explore the potentially relevant male features in some detail. The idea is to find the smallest set of predictive features that will describe the situation adequately.
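For orientation, the relations described above can be written in the usual multinomial-logit form. This is a sketch under that assumption: the intercept β_{0} and the summation index *m* are introduced here for illustration, and the notation need not match eqns 3 to 6 exactly.

```latex
\begin{align*}
\ln \lambda_k &= \beta_0 + \sum_{j=1}^{J} \beta_j g_{jk}, \\
z_{jk} &= g_{jk} - g_{jK}, \\
\ln\!\left(\frac{\lambda_k}{\lambda_K}\right) &= \sum_{j=1}^{J} \beta_j z_{jk}, \\
\lambda_k &= \frac{\exp\!\left(\sum_{j=1}^{J} \beta^{*}_j z_{jk}\right)}
                  {\sum_{m=1}^{K} \exp\!\left(\sum_{j=1}^{J} \beta^{*}_j z_{jm}\right)}.
\end{align*}
```

In this form the reference male has *z*_{jK} = 0 for every feature, so his term in the denominator is exp(0) = 1, and the λ_{k} values sum to unity by construction.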

Using the translation of β into λ as in eqn 6, and applying those translations to eqn 2, we maximize log-L with respect to the choice of β_{j}. We begin by computing log-L with all of the β values set to zero, which corresponds to the null hypothesis that none of the features of interest is predictive of relative male reproductive performance. Holding all β values except the first at zero, we compute the λ values implied by alternative values of β_{1}, and use them to compute the likelihood in eqn 2. We replace the initial estimate of β_{1} = 0 with that value, β^{*}_{1} ≠ 0, for which log-L is maximum. Using this value of β^{*}_{1} and β_{3} = β_{4} = ⋯ = β_{J} = 0, we find the value of β_{2} that maximizes log-L and call it β^{*}_{2}. We continue in this fashion until all of the β values have been adjusted, and then cycle through them again. The process ends when further changes in the β values lead to no further improvement in log-L. In our experience, the algorithm is simple, reasonably fast for small J, and well behaved.
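The cyclic, one-coefficient-at-a-time search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the likelihood of eqn 2 has the mixture form Σ_i ln Σ_k λ_k X_ik, where `X[i][k]` is a hypothetical matrix holding the probability of offspring *i*'s genotype given male *k* as father, and that `Z[k][j]` holds the feature differences *z*_{jk}. The golden-section line search and the search bound of ±10 are choices made here for illustration.

```python
import math

def lambdas(beta, Z):
    """Back-translate beta into lambda_k, proportional to exp(sum_j beta_j * z_jk)."""
    scores = [sum(b * z for b, z in zip(beta, row)) for row in Z]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def log_likelihood(beta, Z, X):
    """Assumed mixture form: sum over progeny of ln(sum_k lambda_k * X[i][k])."""
    lam = lambdas(beta, Z)
    return sum(math.log(sum(l * x for l, x in zip(lam, row))) for row in X)

def golden_max(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximum of f on [lo, hi] (assumes one peak)."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = f(d)
    return (a + b) / 2

def fit(Z, X, bound=10.0, tol=1e-8, max_cycles=200):
    """Cycle through the coefficients, maximizing log-L over one at a time."""
    J = len(Z[0])
    beta = [0.0] * J                    # start from the null hypothesis
    best = log_likelihood(beta, Z, X)
    for _ in range(max_cycles):
        previous = best
        for j in range(J):
            def profile(value, j=j):
                trial = beta[:j] + [value] + beta[j + 1:]
                return log_likelihood(trial, Z, X)
            candidate = golden_max(profile, -bound, bound)
            score = profile(candidate)
            if score > best:            # keep only genuine improvements
                beta[j], best = candidate, score
        if best - previous < tol:       # a full cycle gave no further gain
            break
    return beta, best
```

Calling `fit(Z, X)` returns the coefficient estimates and the maximized log-L; `lambdas(beta, Z)` then gives the modelled male contributions. Because each update is accepted only if it raises log-L, the returned value can never fall below the value at β = 0.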

In the usual log-linear treatment, we extract likelihood ratio test criteria that are asymptotically χ^{2} distributed, but the approximations are not particularly close in our application (see below). This may be a general problem in analysis of paternity where the asymptotic conditions for χ^{2} approximation are not met. Thus, in these circumstances, we recommend the use of nonparametric (permutational) procedures for hypothesis testing. For permutational testing, it is important to be clear about the hypothesis of interest. We are not testing here whether male reproductive contributions are homogeneous; we take the fact that they are quite heterogeneous as well established (Meagher, 1986; Smouse & Meagher, 1994). Rather, we are testing the null hypothesis that, whatever the pattern of reproductive heterogeneity, there is no relationship between it and the profiles of candidate morphological features. Our hope, of course, is that we can reject that null hypothesis, thereby demonstrating the predictive relevance of the features of interest. On the null premise that the feature profiles (*Z*_{k}) are irrelevant to male reproductive success, it should not matter at all which profile is attached to which male. We thus performed a nonparametric analysis by permuting feature profiles among males, and repeating the whole analysis 1000 times. The distribution of the likelihood scores obtained under randomization was then compared with the result obtained with the male feature profiles attached to the proper males.
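The permutation scheme just described can be sketched generically. This is an illustration under stated assumptions rather than the procedure as actually coded: `statistic` is a hypothetical callable that takes the list of feature profiles (in male order) and returns the test criterion, for example the maximized log-L after refitting the model. The "+1" correction in the *P*-value is a common convention adopted here, not something stated in the text.

```python
import random

def permutation_p_value(profiles, statistic, n_perm=1000, seed=0):
    """Shuffle feature profiles among males and compare the statistic.

    profiles  : list of per-male feature profiles, in male order
    statistic : callable mapping a list of profiles to a number
                (larger means stronger evidence against the null)
    Returns the observed statistic and a one-sided permutation P-value.
    """
    rng = random.Random(seed)          # seeded for reproducibility
    observed = statistic(profiles)     # profiles attached to the proper males
    shuffled = list(profiles)          # copy, so the input is left untouched
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # reassign profiles to males at random
        if statistic(shuffled) >= observed:
            exceed += 1
    # Count the observed arrangement as one of the possible permutations.
    return observed, (exceed + 1) / (n_perm + 1)
```

Only the profiles are shuffled; the progeny data stay attached to their males, so the pattern of reproductive heterogeneity is preserved while any link to morphology is broken.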

#### Male reproductive effort in *Chamaelirium luteum*

The relationship between reproductive effort and reproductive success is central to many theories concerning sex-specific reproductive performance; a good example is Bateman’s Principle (outlined above), which predicts different relationships between effort and success for males and females. It has generally been presumed that an increase in resources allocated to attractive structures should result in an increase in reproductive success, and indeed several authors have reported results that demonstrate this to be the case (Schoen & Stewart, 1986; Broyles & Wyatt, 1990; Devlin *et al*., 1992).

In the context of our *C. luteum* example, a reasonable expectation might be that the resources a male puts into reproduction will influence his reproductive success. Meagher (1991) defined reproductive effort as the physiological resources committed to reproductive structures, and quantified the amounts of male reproductive effort for *C. luteum* by measuring rosette leaf number, stalk leaf number, stalk length and raceme length, all of which are correlated with pollen output and floral display size. We defined four feature variables for the *k*th male, relative to the values of the K = 273rd male:

Histograms of these variables are presented in Fig. 1. Each variable is unimodally distributed, with sufficient variation among males to make the analysis worthwhile. The coefficients of variation are 0.35, 0.42, 0.32 and 0.30, respectively. That is to say, we know from earlier analyses (Meagher, 1986; Smouse & Meagher, 1994) that we have reproductive heterogeneity among males; we also have substantial morphological variation among them; the question here is whether the two are correlated.

Meagher (1991) showed, using a subset of the present data consisting of 102 males with unique genotypes, that the correlations between male features and reproductive success were not significant. We have analysed the model in eqn 6, for each of the four features individually, as well as for the full quartet (Table 1). Our results, using the full battery of 273 males, tell the same tale as did Meagher’s (1991) analysis. While male success is demonstrably uneven (Meagher, 1986; Smouse & Meagher, 1994), in this data set we found no convincing evidence of a correlation between variation in male reproductive success and variation in morphological feature profile.

Table 1. Likelihood-ratio test criteria (Λ) for the log-linear model and submodels analysed by text eqn 6, describing the impact of male morphology on male reproductive success, using morphological features; *P*-values for test criteria were evaluated via permutational procedures.

Permutational shuffling of male feature profiles shows the same result in graphic fashion (Fig. 2). Since these permutations were based on random rearrangement of the data, the random distribution of each estimated β-coefficient is centred on zero, though not symmetrically so in all cases. The lack of symmetry around zero suggests that the log-likelihood ratio is a better test statistic for evaluating permutation results than the β-values themselves. Also, as anticipated above, the asymptotic approximation to a χ^{2} distribution does not appear to be a reliable basis for significance tests, based on the observed log-likelihood ratios generated by this analysis (Fig. 2).

The lack of fit of our log-likelihood ratios to a χ^{2} distribution is a striking outcome of our analysis. When a test statistic is asymptotically χ^{2} distributed, the asymptote in question refers to the sample size or information content of the underlying data. In our case, we are using an apparently quite large data set, with 2294 progeny and 273 male parents. However, in this analysis, the information content of the data is derived from other factors as well, such as the number of genetic loci and the number and frequencies of alleles within loci (Meagher & Thompson, 1986). Thus, the axis along which the ‘asymptote’ should be evaluated in the case of a paternity analysis is complex, and likely to be multidimensional. In most genealogical inference applications, a permutation test or some other resampling scheme should be used to evaluate statistical significance. The log-likelihood ratio, while apparently not closely approximated by a χ^{2} distribution, is a useful test statistic for permutation-based significance testing. Therefore, we recommend that hypothesis testing for future analyses such as this should be based on the permutation test used here.