An adjusted coefficient of determination (R2) for generalized linear mixed models in one go

The coefficient of determination (R2) is a common measure of goodness of fit for linear models. Various proposals have been made for extending this measure to generalized linear and mixed models. When the model has random effects or correlated residual effects, the observed responses are correlated. This paper proposes a new coefficient of determination for this setting that accounts for any such correlation. A key advantage of the proposed method is that it only requires the fit of the model under consideration, with no need to also fit a null model. Also, the approach entails a bias correction in the estimator assessing the variance explained by fixed effects. Three examples are used to illustrate the new measure. A simulation shows that the proposed estimator of the new coefficient of determination has only minimal bias.


PIEPHO
assessed by comparing a measure of the total variance between both models on the linear-predictor scale. The measure was the average semivariance (ASV), which depends only on the variance-covariance structure of the random effects, including the residual error structure, and which provides a natural metric of total variance that also accounts for the covariance among observations. The approach is completely general, allowing for any variance-covariance structure, including heteroscedasticity and spatial or temporal correlation, and it was shown to be equivalent to the adjusted $R^2$ proposed for fixed-effects LM.

A downside of the approach in Piepho (2019) is that two model fits are required, one for the candidate model and one for the null model. This paper proposes an alternative that requires only the candidate model to be fitted but still makes use of the ASV metric for total variance. It will be shown that in an LM the approach also coincides with the adjusted $R^2$. The LM case will be taken as a point of departure to introduce the key idea and to show that it leads to the adjusted $R^2$ in this simple case. Next, the extension to LMM will be discussed. Finally, the extension of the new approach to GLMM will be considered, where total variance is assessed on the linear-predictor scale. It will be shown how standard output from the GLIMMIX procedure of SAS can be employed to implement the approach with ease. Several examples are used for illustration. Simulations are presented that show good performance of the proposed measure.

The main idea
Let the data vector be given by $y = (y_1, y_2, \ldots, y_n)^\top$ with $E(y) = \mu = (\mu_1, \mu_2, \ldots, \mu_n)^\top$ and $\operatorname{var}(y) = V = \{v_{ij}\}$. The key idea put forward in Piepho (2019) is to consider the semivariance (Webster & Oliver, 2007), which for two observations $y_i$ and $y_j$ ($i \neq j$) is defined as
$$\gamma(y_i, y_j) = \tfrac{1}{2}\operatorname{var}(y_i - y_j) = \tfrac{1}{2}(v_{ii} + v_{jj}) - v_{ij}.$$
The semivariance depends not only on the variances, $v_{ii}$ and $v_{jj}$, but also on the covariance, $v_{ij}$, thus providing a natural metric of variance that also accounts for covariance among observations. A natural measure of total variance, borrowed here from geostatistics (Webster & Oliver, 2007, pp. 61-62), is obtained by averaging $\gamma(y_i, y_j)$ across all pairs, yielding the ASV
$$\bar\gamma(V) = \frac{1}{n(n-1)} \sum_{i \neq j} \gamma(y_i, y_j) = (n-1)^{-1}\operatorname{tr}(P_n V),$$
where $P_n = I_n - n^{-1} 1_n 1_n^\top$, with $I_n$ the $n$-dimensional identity matrix and $1_n$ an $n$-vector of ones. This average can be regarded as a discrete version of the double integral given in Webster and Oliver (2007), which integrates the semivariance over a defined spatial area, here applied to general variance-covariance structures outside of geostatistics.
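The trace form of the ASV is straightforward to compute. The following numpy sketch (the function name `average_semivariance` and the example covariance matrices are my own, chosen for illustration) confirms that for uncorrelated, homoscedastic data the ASV reduces to the common variance, while positive covariance lowers it:

```python
import numpy as np

def average_semivariance(V):
    """ASV of a covariance matrix V: tr(P_n V) / (n - 1),
    where P_n = I_n - (1/n) 1_n 1_n' is the centering matrix."""
    n = V.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    return np.trace(P @ V) / (n - 1)

# For uncorrelated, homoscedastic observations, V = sigma^2 I, the ASV
# equals sigma^2; positive covariance among observations lowers it.
V_iid = 2.5 * np.eye(4)
rho = 0.6
V_cor = 2.5 * ((1 - rho) * np.eye(4) + rho * np.ones((4, 4)))
print(average_semivariance(V_iid))  # 2.5
print(average_semivariance(V_cor))  # equals 2.5 * (1 - rho)
```

For a compound-symmetric matrix with intraclass correlation $\rho$, the ASV equals $\sigma^2(1-\rho)$, matching the pairwise-average definition exactly.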
This measure of variance captures only the random part of the model ($V$) but not any variation in the fixed effects ($\mu$). The key idea of the present paper is to assess total variance in a way that captures both components. To this end, we may consider the quantity $\tfrac{1}{2}(y_i - y_j)^2$, the usual estimator of the semivariance if stationarity can be assumed (Webster & Oliver, 2007, p. 65), that is, a constant mean across all observations. In the presence of fixed effects, stationarity is violated, and an extension is needed, as will now be detailed. The quantity provides a natural way to also assess the variability in the mean across observations in the data. To do so, we here use the expected value of $\tfrac{1}{2}(y_i - y_j)^2$, which may be denoted as the expected semisquared difference and is defined as
$$\phi(y_i, y_j) = E\left[\tfrac{1}{2}(y_i - y_j)^2\right] = \gamma(y_i, y_j) + \delta(\mu_i, \mu_j),$$
where
$$\delta(\mu_i, \mu_j) = \tfrac{1}{2}(\mu_i - \mu_j)^2$$
can be denoted as the semisquared bias. It is seen that the expected semisquared difference $\phi(y_i, y_j)$ is a natural extension of the semivariance $\gamma(y_i, y_j)$, involving the addition of the semisquared bias $\delta(\mu_i, \mu_j)$. This extension provides a natural partition of the variance explained by a given model, with $\gamma(y_i, y_j)$ assessing the contribution of the random-effects component (via $V$) and $\delta(\mu_i, \mu_j)$ assessing the contribution of the fixed-effects component (via $\mu$). Again averaging across all pairs, one obtains the average semisquared bias (ASSB)
$$\bar\delta(\mu) = \frac{1}{n(n-1)} \sum_{i \neq j} \delta(\mu_i, \mu_j) = (n-1)^{-1} \mu^\top P_n \mu$$
and the average expected semisquared difference (AESSD)
$$\bar\phi(\mu, V) = \bar\gamma(V) + \bar\delta(\mu).$$
Thus, $\bar\delta(\mu)$ is the contribution of $\mu$ and $\bar\gamma(V)$ is the contribution of $V$ to the average expected semisquared difference $\bar\phi(\mu, V)$, our measure of total variance.
Interestingly, the average expected semisquared difference, $\bar\phi(\mu, V)$, can be related to the sample variance
$$s^2 = (n-1)^{-1} \sum_{i=1}^{n} (y_i - \bar y)^2 = (n-1)^{-1} y^\top P_n y.$$
It follows from results on quadratic forms (Searle et al., 1992, p. 466, Theorem S1), by which $E(y^\top A y) = \operatorname{tr}(A V) + \mu^\top A \mu$, that
$$E(s^2) = (n-1)^{-1}\left[\operatorname{tr}(P_n V) + \mu^\top P_n \mu\right] = \bar\phi(\mu, V).$$
This result is quite remarkable because total variance in LM is usually assessed in terms of $s^2$ or, equivalently, the total sum of squares $(n-1)s^2$ (Draper & Smith, 1998). If total variance is partitioned in terms of $s^2$ or total sums of squares, this leads to the common coefficient of determination $R^2$ for LM. By contrast, if total variance is partitioned based on $E(s^2) = \bar\phi(\mu, V)$, and the components of the partition are unbiasedly estimated, this leads to the adjusted $R^2$ for LM, as will be shown in the next section.
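The link between the AESSD and the sample variance rests on writing $s^2$ as a quadratic form in the centering matrix $P_n$. This identity is easy to confirm numerically; a minimal sketch with arbitrary data:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=10)
n = len(y)
P = np.eye(n) - np.ones((n, n)) / n  # centering matrix P_n

s2_quadratic = y @ P @ y / (n - 1)   # s^2 written as y' P_n y / (n - 1)
s2_sample = np.var(y, ddof=1)        # usual sample variance
print(np.isclose(s2_quadratic, s2_sample))  # True
```

Taking expectations of this quadratic form then yields the stated result via Theorem S1.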

2.2 The LM case

The candidate model can be written as
$$y = X\beta + e, \quad (9)$$
where $e \sim N(0, R = \sigma^2_e I_n)$. The error variance can be unbiasedly estimated by
$$\hat\sigma^2_e = (n-p)^{-1} y^\top Q y, \quad (10)$$
where $p = \operatorname{rank}(X)$ and $Q = I_n - X(X^\top X)^- X^\top$. The null model, that is, the intercept-only model, is given by
$$y = 1_n \mu + e_0, \quad (11)$$
where $\mu$ is the intercept parameter, $e_0 \sim N(0, R_0 = \sigma^2_0 I_n)$, and $\hat\sigma^2_0 = s^2 = (n-1)^{-1} y^\top P_n y$. In contrast to Piepho (2019), we will not actually need to fit this null model here, but the null model is still a useful point of reference for LM, as will be seen at the end of this section.
The key idea of the proposed approach is to assess the average expected semisquared difference, $\bar\phi$, based on parameters of the candidate model. For the LM in (9), $\bar\phi$ is
$$\bar\phi(X\beta, \sigma^2_e I_n) = \bar\delta(X\beta) + \bar\gamma(\sigma^2_e I_n). \quad (12)$$
Equation (12) provides a natural partition of $\bar\phi$ into a component explained by the fixed effects $\beta$, $\bar\delta(X\beta)$, and a component explained by the random residuals $e$, $\bar\gamma(\sigma^2_e I_n)$. The proposed coefficient of determination therefore is
$$\Omega_\beta = \frac{\bar\delta(X\beta)}{\bar\phi(X\beta, \sigma^2_e I_n)}. \quad (13)$$
We find that $\bar\gamma(\sigma^2_e I_n) = \sigma^2_e$, which is estimated by
$$\hat{\bar\gamma}_{\mathrm{LM}} = \hat\sigma^2_e, \quad (14)$$
where the subscript LM indicates that we are operating under the LM in Equation (9) and $\hat\sigma^2_e$ is the estimator given in Equation (10). A natural starting point for estimating $\bar\delta(X\beta)$ is the naïve estimator
$$\bar\delta(X\hat\beta) = (n-1)^{-1} \hat\beta^\top X^\top P_n X \hat\beta, \quad (15)$$
from which an unbiased estimator is obtained as
$$\hat{\bar\delta}(X\beta) = (n-1)^{-1}\left[\hat\beta^\top X^\top P_n X \hat\beta - \operatorname{tr}\{P_n X \operatorname{var}(\hat\beta) X^\top\}\right]. \quad (16)$$
For LM, we have $\operatorname{var}(\hat\beta) = (X^\top X)^- \sigma^2_e$, which is unbiasedly estimated using (10). The second term on the right-hand side of (16) is a bias-correction term applied to the naïve estimator, given by the first term. It can be shown (see the Appendix) that
$$\hat{\bar\delta}(X\beta) + \hat\sigma^2_e = s^2, \quad (17)$$
that is, the estimator of the residual variance under the null model. From this, it follows that
$$\hat\Omega_\beta = \frac{\hat{\bar\delta}(X\beta)}{\hat{\bar\delta}(X\beta) + \hat\sigma^2_e} = 1 - \frac{\hat\sigma^2_e}{s^2}, \quad (18)$$
which is identical with the adjusted $R^2$ for LM (Draper & Smith, 1998; Piepho, 2019).
Obviously, these considerations based on the average expected semisquared difference have not led to a new measure of fit for LM. However, to the best of my knowledge, the derivation of this known measure is novel, and it will be useful subsequently in developing a novel coefficient of determination for LMM and GLMM. The right-hand side of Equation (18) suggests that $\Omega_\beta$ could be estimated by fitting two models, that is, the candidate model (9) and the null model (11), and this was the basis of the approach proposed in Piepho (2019). The derivation in this section shows, however, that this can be circumvented: it is only necessary to fit the candidate model, using the parameter estimates needed in (18) after the first equal sign. This result is readily extended to LMM and GLMM, as will be shown next, and this is the main contribution of the present paper.
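The equivalence with the adjusted $R^2$ can be checked numerically from a single candidate-model fit. The sketch below fits an ordinary least-squares regression (all variable names and data are illustrative, not from the paper) and recovers the adjusted $R^2$ from the bias-corrected partition:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 0.5, -0.3])
y = X @ beta + rng.normal(scale=0.8, size=n)

# OLS fit of the candidate model only (no null-model fit needed)
XtX_inv = np.linalg.inv(X.T @ X)
bhat = XtX_inv @ X.T @ y
resid = y - X @ bhat
sigma2 = resid @ resid / (n - p)            # unbiased residual variance
P = np.eye(n) - np.ones((n, n)) / n         # centering matrix P_n

# Bias-corrected estimate of the fixed-effects contribution (ASSB)
delta_naive = bhat @ X.T @ P @ X @ bhat / (n - 1)
bias = sigma2 * np.trace(P @ X @ XtX_inv @ X.T) / (n - 1)
delta_hat = delta_naive - bias

omega = delta_hat / (delta_hat + sigma2)    # proposed coefficient
adj_r2 = 1 - sigma2 / np.var(y, ddof=1)     # classic adjusted R^2
print(np.isclose(omega, adj_r2))  # True
```

The check also confirms the identity $\hat{\bar\delta}(X\beta) + \hat\sigma^2_e = s^2$ stated above.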

Extension to LMM
An LMM can be written as
$$y = X\beta + Zu + e \quad (19)$$
with $\operatorname{var}(u) = G$, $\operatorname{var}(e) = R$, and $\operatorname{cov}(u, e) = 0$, such that
$$V = Z G Z^\top + R. \quad (20)$$
The fixed effects are estimated by
$$\hat\beta = (X^\top V^{-1} X)^- X^\top V^{-1} y. \quad (21)$$
As done for LM in Equation (12), the average expected semisquared difference, $\bar\phi(\mu, V)$, can be partitioned using the general approach outlined in Section 2.1 as
$$\bar\phi(X\beta, V) = \bar\delta(X\beta) + \bar\gamma(V), \quad (22)$$
and hence the coefficient of determination may be defined as
$$\Omega_\beta = \frac{\bar\delta(X\beta)}{\bar\phi(X\beta, V)}. \quad (23)$$
The unbiased estimator of $\bar\delta(X\beta)$ is
$$\hat{\bar\delta}(X\beta) = (n-1)^{-1}\left[\hat\beta^\top X^\top P_n X \hat\beta - \operatorname{tr}\{P_n X \operatorname{var}(\hat\beta) X^\top\}\right], \quad (24)$$
where $\operatorname{var}(\hat\beta) = (X^\top V^{-1} X)^-$. So far, it was assumed for simplicity that $V$ is known, but in practice it needs to be replaced by $\hat V$, its residual maximum likelihood (REML) estimator. If this is done, (24) will no longer be unbiased (Kenward & Roger, 1997), but given that REML provides a consistent estimator of $V$ (Searle et al., 1992), (24) will be a consistent estimator of $\bar\delta(X\beta)$. Further, we may consistently estimate the variance explained jointly by the random effects $u$ and the residual $e$ using
$$\hat{\bar\gamma}(V) = (n-1)^{-1} \operatorname{tr}(P_n \hat V). \quad (25)$$
Inserting (24) and (25) in (23), a consistent estimator of the coefficient of determination for LMM is given by
$$\hat\Omega_\beta = \frac{\hat{\bar\delta}(X\beta)}{\hat{\bar\delta}(X\beta) + \hat{\bar\gamma}(V)}. \quad (26)$$
We may also define a coefficient of determination for the variance explained by the random effects as
$$\Omega_u = \frac{\bar\gamma(Z G Z^\top)}{\bar\phi(X\beta, V)}, \quad (27)$$
which may be motivated by the partition
$$\bar\gamma(V) = \bar\gamma(Z G Z^\top) + \bar\gamma(R). \quad (28)$$
The estimator of $\bar\gamma(Z G Z^\top)$ is simply $\bar\gamma(Z \hat G Z^\top)$. The variance explained by both fixed and random effects can be defined as
$$\Omega_{\beta,u} = \frac{\bar\delta(X\beta) + \bar\gamma(Z G Z^\top)}{\bar\phi(X\beta, V)}. \quad (29)$$
It is also noted that $\bar\gamma(Z G Z^\top)$ can be further partitioned according to the component random effects of the LMM. For example, assume that
$$Zu = Z_1 u_1 + Z_2 u_2 \quad (30)$$
with $\operatorname{var}(u_1) = G_1$, $\operatorname{var}(u_2) = G_2$, and $\operatorname{cov}(u_1, u_2) = 0$. Thus,
$$\bar\gamma(Z G Z^\top) = \bar\gamma(Z_1 G_1 Z_1^\top) + \bar\gamma(Z_2 G_2 Z_2^\top). \quad (31)$$
Therefore, we can also estimate the contribution of each random effect to the total variance and express this as a proportion of $\bar\phi(X\beta, V)$ to obtain a partial coefficient of determination for each random effect. This partitioning of the contributions of different random effects $u_1$ and $u_2$ is unique but rests on the assumption of their independence. This fact is analogous to the partitioning of contributions of fixed effects in LM to the total sum of squares, which is unique only in the case of orthogonality (Searle, 1987).
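The pieces of the LMM estimator can be assembled in a few lines once $V$ is available. The sketch below treats $V$ as known for illustration (in practice it would be the REML estimate); the design, variance components, and all names are my own choices. It also verifies that the ASV is additive over independent random components, which underlies the partition of the variance explained by random effects:

```python
import numpy as np

def asv(V):
    """Average semivariance: tr(P_n V) / (n - 1)."""
    n = V.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n
    return np.trace(P @ V) / (n - 1)

rng = np.random.default_rng(3)
a, m = 5, 4                                # 5 groups of 4 observations
n = a * m
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.kron(np.eye(a), np.ones((m, 1)))    # random group intercepts
G = 1.5 * np.eye(a)
R = 0.5 * np.eye(n)
V = Z @ G @ Z.T + R

# ASV is additive over independent random components
print(np.isclose(asv(V), asv(Z @ G @ Z.T) + asv(R)))  # True

# GLS fixed-effects estimate and the bias-corrected fixed-effects part
y = X @ np.array([1.0, 0.7]) + rng.multivariate_normal(np.zeros(n), V)
Vi = np.linalg.inv(V)
var_b = np.linalg.inv(X.T @ Vi @ X)        # var(beta-hat)
bhat = var_b @ X.T @ Vi @ y
P = np.eye(n) - np.ones((n, n)) / n
delta_hat = (bhat @ X.T @ P @ X @ bhat
             - np.trace(P @ X @ var_b @ X.T)) / (n - 1)
omega = delta_hat / (delta_hat + asv(V))   # coefficient of determination
```

With $V$ replaced by its REML estimate, `omega` corresponds to the consistent estimator described above.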

Extension to GLMM
A GLMM has a linear predictor
$$\eta = X\beta + Zu + e. \quad (32)$$
The residual random effect $e$ associated with the $n$ units in the linear predictor is optional and may be added to account for overdispersion (Piepho, 2019). Its variance-covariance structure is denoted as $\operatorname{var}(e) = R_e$. In principle, $e$ could be integrated into $u$, and this may be convenient when implementing the approach in a GLMM package (see the Supporting Information), but here we state it separately because it is usually considered a component of the residual (unexplained) variance. When computing a coefficient of determination for random effects, $\Omega_u$, it will therefore be useful not to include these effects. It should be mentioned here that for GLMM our approach entails one important simplification compared to the approach outlined in Piepho (2019): Here, we do not strictly need to add the further residual $e$ in the linear predictor as in Piepho (2019), where this was necessary to absorb the "unexplained variance" on the linear-predictor scale when fitting the null model. Our approach here does not require fitting a null model, and hence it is not necessary to generally fit the residual $e$. However, such a residual may be considered in its own right as a way to model overdispersion.
The observed data have a conditional expectation
$$E(y \mid u, e) = g^{-1}(\eta),$$
where $g(.)$ is the link function. The variance takes the general form
$$\operatorname{var}(y \mid u, e) = R_\mu = V_\mu^{1/2} \Lambda V_\mu^{1/2},$$
where $V_\mu$ is a diagonal matrix with evaluations of the variance function $\operatorname{var}(y_i \mid u, e)$ on the diagonal and $\Lambda$ is a correlation matrix, or a covariance matrix if overdispersion needs to be modeled (Wolfinger & O'Connell, 1993; Stroup, 2013). The variance function depends on the assumed conditional distribution for the response $y$, which may be either in the exponential family (which includes the binomial, Poisson, and Gamma distributions) or a quasi-distribution that only requires specifying the variance function.
A challenge with GLMMs is that the random model terms occur both on the linear-predictor scale (via the random effects $u$) and on the observed scale (via the conditional distribution of $y$ for a given value of the linear predictor $\eta$). In defining a coefficient of determination, a choice needs to be made as to the scale on which variance is to be assessed, that is, the observed scale or the linear-predictor scale (Dempster & Lerner, 1950). In either case, the variance occurring on the one scale needs to be projected onto the other scale in order to have a common scale on which to define the coefficient of determination. There are several approaches for doing this, as reviewed in Piepho (2019) and Zhang (2022). Projection onto the observed scale may seem most natural because it usually is most tangible to practitioners. Projection onto the linear-predictor scale has the advantage that effects are additive on that scale, thus facilitating the partitioning of the contribution of different effects, as illustrated above for LM and LMM. Hence, the linear-predictor scale can be considered the more natural scale on which to assess explained variance. This projection will be employed here, using the approach in Piepho (2019), which will now be briefly reviewed. To achieve the projection, the linear predictor of the GLMM is extended by an auxiliary residual term $h$ as
$$\eta_h = X\beta + Zu + e + h.$$
The role of the added residual $h$ is to take up the projection of the residual from the original scale (Nakagawa & Schielzeth, 2013), as will be detailed below. The variance-covariance structure $\operatorname{var}(h) = R_h$ is chosen in such a way that the corresponding conditional variance of the response, given $\eta = X\beta + Zu + e$, is well approximated, or exactly represented where possible. Subsequently, to compute the coefficient of determination, we proceed exactly as for LMM, except that we use
$$V_h = Z G Z^\top + R_e + R_h \quad (37)$$
in place of $V$ to assess total variance. For all distributions except the binary, and also for quasi-models with overdispersion (Wolfinger & O'Connell, 1993), we use the Taylor-series expansion approach of Foulley et al. (1987) to project the residual variance from the original scale onto $h$ on the linear-predictor scale. It is shown in Piepho (2019) that (after some rearrangement) the first-order Taylor-series approximation yields
$$R_h = D_\mu^{-1} R_\mu D_\mu^{-1}, \quad (38)$$
where $D_\mu = \operatorname{diag}[\partial g^{-1}(\eta) / \partial \eta]$. This also follows directly from a Taylor-series expansion of the model for the response around the expected value of the linear predictor, as is used in pseudo-likelihood methods for estimating GLMM based on linearization (Wolfinger & O'Connell, 1993).
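For the Poisson distribution with log link, this first-order projection has a simple closed form: the derivative of the inverse link equals the conditional mean $\lambda$, and the variance function is also $\lambda$, so the projected residual variance is $\lambda / \lambda^2 = \lambda^{-1}$. A quick numerical sketch (not the paper's code):

```python
import numpy as np

eta = np.array([0.0, 0.5, 1.5])        # linear predictor values
lam = np.exp(eta)                      # inverse log link: conditional mean
var_obs = lam                          # Poisson variance function
dmu_deta = lam                         # derivative of inverse link d mu / d eta
var_h = var_obs / dmu_deta**2          # projected residual variance on link scale
print(np.allclose(var_h, 1 / lam))     # True
```

This is the $\lambda_i^{-1}$ form referred to in the following paragraph.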
There are different methods to estimate GLMMs. The pseudo-likelihood method just mentioned is particularly suited to implementing this approach. This method estimates the fixed effects iteratively using (21), replacing $V$ with $V_h$ in (37). A further convenient feature of the pseudo-likelihood approach is that upon convergence it yields an estimate of $V_h$ as a side product. All of this means that with pseudo-likelihood the approach for LMM can be applied directly using (24) and (25). There are different variations of this approach, depending on whether the linearized model is estimated by REML or ML and whether the Taylor-series expansion for the linearization is around the fixed effects $\beta$ only or around the fixed and random effects in (32) jointly (Stroup, 2013; Stroup et al., 2018). We here use REML and expansion around the linear predictor evaluated at the fixed and random effects (this is denoted as RSPL in the SAS procedure GLIMMIX). Other estimation methods, such as Gaussian quadrature (Pinheiro & Bates, 1995) or the Laplace method (Wolfinger, 1993), do not yield $V_h$ directly but require some extra computation. With these estimation methods, we generally have $\Lambda = I_n$, meaning that the matrix $R_h$ is diagonal, that is, $R_h = D_\mu^{-1} V_\mu D_\mu^{-1}$, which facilitates its computation where needed (Table 1), especially when the GLMM package allows extracting an estimate of $D_\mu$ (see the Supporting Information for an implementation with the SAS procedure GLIMMIX).
It is mentioned here that for a Poisson distribution, Foulley et al. (1987) average the conditional mean of the $i$th observation, $\lambda_i$, corresponding to the residual variance on the observed scale, over units before entering this as the residual variance on the linear-predictor scale (Nakagawa et al. [2017] do the same), whereas the approach proposed here averages $\lambda_i^{-1}$ over units, corresponding to the projected residual variance on the linear-predictor scale. The latter seems more natural because the averaging takes place on the scale where the coefficient of determination is assessed. This also corrects an error in Piepho (2019), where the diagonal elements of $R_h$ were erroneously computed as $\lambda_i$ instead of $\lambda_i^{-1}$.
For the binary distribution, the important limiting case of a binomial distribution with $n_i = 1$, exact results for $R_h$ are available and should in fact be preferred, because the Taylor-series approximation breaks down in this case (Breslow & Lin, 1995). In binary models, the inverse link function may be viewed as the cumulative distribution function (c.d.f.) of the residual error $h$ on the linear-predictor (or liability) scale, and the observed binary outcome is based on whether or not the linear predictor $\eta$ exceeds a threshold, given by the general intercept. Specifically, with logit, probit, or complementary log-log link, the inverse link functions correspond to the c.d.f. of the standard logistic, normal, and extreme-value distributions, which have variances $\operatorname{var}(h_i) = \pi^2/3$, $\operatorname{var}(h_i) = 1$, and $\operatorname{var}(h_i) = \pi^2/6$, respectively (Dempster & Lerner, 1950; Keen & Engel, 1997; Nakagawa et al., 2017). Hence, assuming these variances for $h$ on the linear-predictor scale provides an exact projection. The model is readily extended to ordered categorical data (McKelvey & Zavoina, 1975; McCullagh, 1980; Gilmour et al., 1987; Keen & Engel, 1997).
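The stated latent-scale variances can be verified numerically by integrating the corresponding densities on a fine grid; a sketch (plain grid integration, no library distributions assumed):

```python
import numpy as np

x = np.linspace(-40.0, 40.0, 2_000_001)
dx = x[1] - x[0]

# Logit link: latent error follows the standard logistic distribution,
# whose variance should equal pi^2 / 3 (the probit case is trivially 1).
f_logis = np.exp(-x) / (1 + np.exp(-x)) ** 2
var_logis = np.sum(x**2 * f_logis) * dx

# Complementary log-log link: standard extreme-value (Gumbel) distribution,
# whose variance should equal pi^2 / 6 (its mean is nonzero).
f_gum = np.exp(-(x + np.exp(-x)))
mean_gum = np.sum(x * f_gum) * dx
var_gum = np.sum(x**2 * f_gum) * dx - mean_gum**2

print(round(var_logis, 4), round(var_gum, 4))  # 3.2899 1.6449
```

These match $\pi^2/3 \approx 3.2899$ and $\pi^2/6 \approx 1.6449$, confirming the exact projections used for binary data.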

Implementation
All that is required to compute the coefficient of determination $\hat\Omega_\beta$ for LMM are the design matrix $X$ and estimates of $\beta$, $V$, and $\operatorname{var}(\hat\beta)$. These are plugged into (24) and (25), from which $\hat\Omega_\beta$ in (26) is computed. To estimate $\Omega_u$ using (27), we further require the design matrix $Z$ and an estimate of $G$. For GLMM, we can use the same approach as with LMM, replacing $V$ with $V_h$. When pseudo-likelihood is used, the last iteration for the linearized model yields $V_h$ directly. When Gaussian quadrature or the Laplace method is used, a minimal postprocessing step is needed to obtain $V_h$ in (37) from $Z$, the estimate of $G$, and a specification for $R_h$, either based on exact results for binary data or based on (38) in the case of a first-order Taylor-series approximation for the variance projection onto the linear-predictor scale. The latter requires an estimate of $\Lambda$ when this has parameters, and an estimate of the linear predictor (32) for the computation of $V_\mu$ and $D_\mu$. In SAS, all of these computations may be conveniently implemented by two runs of the GLIMMIX procedure, as detailed in the Supporting Information.

Example 1
For illustration, we use the hypothetical beetle dataset in Nakagawa and Schielzeth (2013), which has three traits: body length (normal distribution, identity link), egg count (Poisson, log link), and color morph (binary, logit link) (see Table 2).

Example 2
Beitler and Landis (1985) report data from a multi-clinic trial comparing two treatments, where the clinics are regarded as a random sample from a larger target population. Thus, the linear predictor has fixed treatment main effects and random clinic main effects and clinic × treatment interaction effects. The response is the binomial count $y_i$ of the number of patients responding favorably out of the total number of patients $n_i$ allocated to a treatment in a given clinic. We fitted GLMMs with logit, probit, and complementary log-log link. Results using Gaussian quadrature are shown in Table 3.

TA B L E 3 Analysis of the clinic data (Beitler & Landis, 1985) using a fitted binomial model with three link functions (logit, probit, and complementary log-log link). Note: The patient-level coefficient of determination is based on the binary inflation (see main text) of the fitted model. The clinic × treatment-level coefficient of determination is based on the binomial distribution using the specification for the links as described in Table 1. All models fitted by adaptive Gaussian quadrature using the GLIMMIX procedure of SAS (method = quad).
As we have a binary distribution at the level of individual patients, we may consider the binary response $y_{ij}$ of the $j$th patient in the $i$th clinic × treatment group, linked to the binomial response for the grouped data by $y_i = \sum_j y_{ij}$. The binary response can be modeled using the same linear predictor as for the grouped data, adding an auxiliary term $h_{ij}$ with $\operatorname{var}(h_{ij}) = \pi^2/3$, $\operatorname{var}(h_{ij}) = 1$, and $\operatorname{var}(h_{ij}) = \pi^2/6$ for logit, probit, and complementary log-log links, respectively. That variance pertains to the observed binary response $y_{ij}$ of individual patients, whereas we fit the model at the clinic × treatment level assuming a binomial distribution for the counts $y_i$. Thus, to obtain a patient-level coefficient of determination, the rows of the relevant vector and matrices ($X$, $Z$, $\eta$, $V_h$, $R_h$) need to be expanded from the binomial model for grouped data $(y_i, n_i)$ with 16 clinic × treatment combinations to represent the binary patient-level response $y_{ij}$. This operation will henceforth be referred to as the binary inflation. The resulting patient-level coefficient of determination may be contrasted with a clinic × treatment-level coefficient of determination, where the distributional model is the binomial with count $y_i$ and sample size $n_i$ and the first-order Taylor-series approximation is used with the specification for the link function given in Table 1. That analysis yields a much higher coefficient of determination (Table 3) because of the higher level of aggregation. The patient-level analysis is relevant when considering the value of the treatment for an individual patient, whereas the clinic × treatment-level analysis is relevant when the focus is on the comparison of treatments for entire clinics. We analyzed the same data using pseudo-likelihood (Table 4). The results are very similar to those obtained by Gaussian quadrature (Table 3). The three link functions yield rather similar fits based on both the information criteria and the coefficients of determination. Note that when using pseudo-likelihood a true likelihood is not available, but the coefficients of determination can be computed nonetheless.
TA B L E 4 Analysis of the clinic data (Beitler & Landis, 1985) using a fitted binomial model with three link functions (logit, probit, and complementary log-log link). Note: The patient-level coefficient of determination is based on the binary inflation (see main text) of the fitted model. The clinic × treatment-level coefficient of determination is based on the binomial distribution using the specification for the links as described in Table 1. All models fitted by restricted pseudo-likelihood using the GLIMMIX procedure of SAS (method = RSPL).
a Group = Clinic × treatment combination.

Example 3
Gilmour et al. (1987) use a dataset on deformities in the feet of 2513 lambs scored in three ordered categories, denoted as K1, K2, and K3. The lambs represent 34 sires. The linear predictor comprises a random effect for sires and fixed effects for four contrasts denoted as YR, B1, B2, and B3. The dataset is also used in the documentation of the GLIMMIX procedure of SAS (version 9.4; Example 49.11). To set the stage, we first analyze these data using a binomial model with a probit link. For this purpose, the second and third ordered categories (K2 and K3) are merged. The response then is a count $y_i$ for each sire $i = 1, \ldots, 34$ of the number of lambs falling into category K1, and the corresponding sample size $n_i$ is the total number of lambs representing the sire. We also consider the analysis with K1 and K2 merged to yield the count $y_i$. As we are using a probit link, we model the residual variance on the linear-predictor scale as $\operatorname{var}(h_{ij}) = 1$ for the $j$th lamb representing the $i$th sire. We use binary inflation (see Example 2) to obtain the lamb-level coefficient of determination for the 2513 lambs from the binomial model for grouped data $(y_i, n_i)$ for the 34 sires (Table 5). This may be contrasted with a sire-level coefficient of determination, where the distributional model is the binomial with count $y_i$ and sample size $n_i$ and the first-order Taylor-series approximation is used with the specification for the probit link given in Table 1. As in Example 2, that analysis yields a much higher coefficient of determination (Table 5) because of the higher level of aggregation. For comparison, Table 5 also shows the results when K1 and K2 are merged and compared versus K3. It is seen that the results do not agree perfectly but show a similar tendency. The sire-level analysis is relevant when the focus is on the evaluation of the sires, as is typically the case in breeding programs. The lamb-level analysis is relevant when the focus is on the status of the individual lambs.
Next, we use the same dataset for a multinomial proportional odds model with a cumulative probit link (see Example 49.11 of the GLIMMIX documentation). As there are three ordered categories, we need to fit two thresholds (intercepts), denoted here as $\tau_1$ and $\tau_2$. The linear predictor for the threshold model is $\eta_{ik} = \tau_k - (x_i^\top \beta + z_i^\top u)$ for $k = 1, 2$. We do not fit a residual effect $e$. The probability for the $i$th observation to fall into the $k$th category is $\pi_{ik} = \Phi(\eta_{ik}) - \Phi(\eta_{i,k-1})$, with $\tau_0 = -\infty$ and $\tau_3 = \infty$ and $\Phi(.)$ the cumulative distribution function of the standard normal (Gilmour et al., 1987). Again, for the lamb-level coefficient of determination, we can represent the error on the latent scale by $\operatorname{var}(h_{ij}) = 1$, because of the probit link. Results in Table 5 for this model are quite similar to the binomial model with K2 and K3 merged. For the sire-level analysis, there is some ambiguity. One could choose one category, or a cumulative category, to define a binomial variance on the original scale, which could be projected onto the linear-predictor scale. However, because there are more than two categories, the choice is not unique. Table 5 shows that the results differ depending on whether K1 versus K2 and K3, or K1 and K2 versus K3, are taken as the two categories.

TA B L E 5 Analysis of the lamb data (Gilmour et al., 1987) using a fitted binomial model for two categories and a multinomial model with three categories (K1, K2, and K3). Note: The lamb-level coefficient of determination is based on the binary inflation (see the main text) of the fitted model using $\operatorname{var}(h_{ij}) = 1$. The sire-level coefficient of determination is based on the binomial distribution using the specification for the probit link in Table 1; in the case of the multinomial model, we use the categories K1, K2, and K3 to derive a binomial model (variance function), which is then projected onto the linear-predictor scale. All models fitted by adaptive Gaussian quadrature using the GLIMMIX procedure of SAS. a In the case of the multinomial model, we use the pairs of categories K1 vs. K2 and K3, or K1 and K2 vs. K3, to derive a binomial model (variance function), which is then projected onto the linear-predictor scale. b Level at which the coefficient of determination is computed.
To illustrate the method further, we fitted the threshold model for all possible subsets of the covariates YR, B1, B2, and B3 and computed the lamb-level coefficient of determination (Table 6). The ranking of models is not identical to that obtained with information criteria, but there is broad agreement as to the best models and the most important covariates (B1 and B2). Note that there is one instance of a negative coefficient of determination, showing that this property of the adjusted $R^2$ for LM carries over to our extension for LMM and GLMM. The reason for the possibility of a negative value is the bias correction in (24). Incidentally, McKelvey and Zavoina (1975) proposed a coefficient of determination for a threshold model for ordinal dependent variables with probit link, which is similar to ours but does not use the bias correction as per Equation (24) and does not account for random effects.

Single covariate
TA B L E 6 Lamb-level coefficients of determination (%) for the proportional odds model using all possible subsets of the covariates YR, B1, B2, and B3. Note: All models fitted by adaptive Gaussian quadrature using the GLIMMIX procedure of SAS.

To assess the performance of the proposed method, we used simulation scenarios for random-coefficient regression as described in Xu (2003). The LMM used for simulation was
$$y_{ij} = \beta_0 + b_{0i} + (\beta_1 + b_{1i}) x_{1ij} + e_{ij} \quad (41)$$
($i = 1, \ldots, q$; $j = 1, \ldots, n_i$) with $b_{0i} \sim N(0, \sigma^2_{b_0})$, $b_{1i} \sim N(0, \sigma^2_{b_1})$, and $e_{ij} \sim N(0, \sigma^2_e)$. Without loss of generality, we set $\beta_0 = 0$. The covariate values $x_{1ij}$ were simulated once from a standard normal distribution, and this one set of values was used in all 1000 simulation runs for a scenario. This constitutes a slight modification of Xu (2003), who modeled the covariate as a random variable, whereas here we condition on the values of the covariates. Using only one set of values allowed obtaining a single true value for the parameters $\Omega_\beta$, $\Omega_u$, and $\Omega_{\beta,u}$ holding in all simulation runs. For the REML-based estimators of these parameters, we computed the mean and standard deviation. In computing the bias correction in (24), one can use the plug-in estimator of $\operatorname{var}(\hat\beta)$. For LMM, and for GLMM when using pseudo-likelihood, we can also use the Kackar-Harville method, as implemented in the approach of Kenward and Roger (1997, 2009), which is available via the ddfm = KR option of the mixed-model procedures in SAS. This option was used in these simulations. Results in Table 7 show that the estimators have very small bias in all scenarios.
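Because the covariate values are held fixed across runs, the true values of the coefficients of determination are determined by the design and the variance components. The sketch below shows how such a true value can be computed for one hypothetical scenario (the dimensions and variance components are my own choices for illustration, not those used by Xu, 2003):

```python
import numpy as np

rng = np.random.default_rng(11)
q, m = 20, 5                       # 20 subjects, 5 observations each
n = q * m
x1 = rng.normal(size=n)            # covariate, held fixed across runs
beta0, beta1 = 0.0, 1.0
s_b0, s_b1, s_e = 1.0, 0.5, 1.0    # hypothetical variance components

X = np.column_stack([np.ones(n), x1])
Z0 = np.kron(np.eye(q), np.ones((m, 1)))         # random intercepts
Z1 = Z0 * x1[:, None]                            # random slopes
V = s_b0 * Z0 @ Z0.T + s_b1 * Z1 @ Z1.T + s_e * np.eye(n)

P = np.eye(n) - np.ones((n, n)) / n
mu = X @ np.array([beta0, beta1])
delta = mu @ P @ mu / (n - 1)                    # ASSB: fixed-effects part
gamma = np.trace(P @ V) / (n - 1)                # ASV: random part
omega_beta = delta / (delta + gamma)             # true Omega for this design
```

The simulated estimates can then be compared against `omega_beta` to assess bias, as done in Table 7.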

Two covariates
In this simulation, model (41) was extended by adding a second covariate $x_2$, also generated from a standard normal distribution:
$$y_{ij} = \beta_0 + b_{0i} + (\beta_1 + b_{1i}) x_{1ij} + (\beta_2 + b_{2i}) x_{2ij} + e_{ij},$$
where $b_{2i} \sim N(0, \sigma^2_{b_2})$. In all scenarios, we set $\beta_0 = 0$ and $\beta_2 = 2$. The results in Table 8 again show a very small bias, this time consistently downward.

5.1 The ASV as a natural measure of variance with correlated data

In geostatistics, the presence of covariance among observations is so central that the concept of semivariance is commonplace. The semivariance provides a very natural metric to account for correlation among observations. In geostatistics, it is common to assess the dispersion variance of a random field by integrating the semivariance across the area under consideration, even though there seems to be no consistent terminology in the geostatistical literature for this integral. Isaaks and Srivastava (1989, p. 481) use the term average variogram, whereas Webster and Oliver (2007, p. 60) use the term dispersion variance for the same quantity. In this paper, we have extended that concept to obtain a general measure of total variance, the ASV, for assessing the coefficient of determination. It appears that outside of geostatistics the concept has been used only rarely. I believe it is a concept that deserves attention whenever observations in a dataset display covariance and a measure of total variance is needed that can account for any covariance in the data. In this paper, we have given several examples where the data structure called for modeling covariance via random effects that represent clustering (Examples 1 and 3) or via a serial correlation model for the residual error to accommodate a repeated-measures structure (Example 2). Whenever the data structure calls for an LMM or GLMM to be fitted, there is automatically an implication that the data are correlated in some way. I strongly believe that this fact needs to be taken into account when defining measures of $R^2$ for LMM and GLMM, and this paper has shown how this can be done using the ASV.

Comparison of the ASV with the average marginal variance
An alternative to the ASV metric that ignores the covariance between observations is given by the average marginal variance (AMV),
$$\bar\sigma^2_{\mathrm{AMV}}(V) = n^{-1} \sum_{i=1}^{n} v_{ii} = n^{-1} \operatorname{tr}(V).$$
This alternative metric was considered by Piepho (2019). Differences in the estimated variance explained between AMV and ASV were not very large in the examples considered, one of which was Example 1, also considered in the present paper. However, the extent of the difference depends on the magnitude of the covariance and the design. To gain some insight, it is instructive to consider the balanced one-way layout and the random-effects model with fixed intercept and random effects for groups and for observations within groups. It can be shown (Piepho, 2019, Section 4) that for this design and model, $\bar\sigma^2_{\mathrm{AMV}}(V) = \sigma^2_g + \sigma^2_e$, where $\sigma^2_g$ is the variance of the random group effects and $\sigma^2_e$ is the residual variance. By comparison, we find
$$\bar\gamma(V) = \frac{m(a-1)}{am - 1}\, \sigma^2_g + \sigma^2_e,$$
where $a$ is the number of groups and $m$ is the number of observations per group, so that $n = am$. The correlation of observations in the same group, known as the intraclass correlation, is given by $\rho = \sigma^2_g / (\sigma^2_g + \sigma^2_e)$. The ASV, $\bar\gamma(V)$, approaches its minimum when $a = 2$, $m \to \infty$, and $\rho \to 1$. In this limiting case, $\bar\gamma(V) = \tfrac{1}{2}\bar\sigma^2_{\mathrm{AMV}}(V)$, showing that ignoring the covariance can have a substantial effect in extreme cases.

Comparison with R 2 by Nakagawa and Schielzeth
The coefficient of determination proposed here bears some resemblance to that proposed by Nakagawa and Schielzeth (2013; Equations 26 and 27), but there are two important differences. First, in Nakagawa and Schielzeth (2013), the contribution of random effects to the total variance is effectively assessed by the AMV as discussed in Section 5.2, and so their measure does not account for the covariances among observations. Second, while Nakagawa and Schielzeth (2013) do not make a clear distinction between a parameter and its estimator and also are not explicit about the estimators to use for variance components, it appears they propose to estimate the contribution of fixed effects (denoted as $\sigma_f^2$ in their paper) by $n^{-1}\hat{\beta}^\top X^\top X \hat{\beta}$. When using this estimator, their coefficient of determination can be written as
$$R^2 = \frac{\hat{\beta}^\top X^\top X \hat{\beta}}{\hat{\beta}^\top X^\top X \hat{\beta} + \operatorname{tr}(\hat{V})}. \tag{42}$$
It follows from a standard result on the expected value of quadratic forms (Searle et al., 1992, p. 466, Theorem S1; also see Equations 15 and 24) that $\hat{\beta}^\top X^\top X \hat{\beta}$ is an upwardly biased estimator of $\beta^\top X^\top X \beta$. It is therefore expected that this coefficient of determination overstates the variance explained by fixed effects. It may be noted that for LM, the numerator in (42) is the model sum of squares. This is also the numerator in the usual $R^2$ for LM, which may be written as
$$R^2 = \frac{\hat{\beta}^\top X^\top X \hat{\beta}}{\hat{\beta}^\top X^\top X \hat{\beta} + (n-p)\hat{\sigma}_e^2}, \tag{43}$$
assuming that the usual estimator of $\sigma_e^2$ given in (10) is used. However, for LM the denominator of (42) is not quite the same as in the usual $R^2$ in (43), because $\operatorname{tr}(\hat{V}) = n\hat{\sigma}_e^2$ is not the same as the residual sum of squares, given by $(n-p)\hat{\sigma}_e^2$.
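In the LM, the upward bias has a simple closed form: $E(\hat{\beta}^\top X^\top X \hat{\beta}) = \beta^\top X^\top X \beta + p\sigma_e^2$, where $p$ is the number of fixed-effects parameters. The following Monte Carlo sketch (all design settings are hypothetical, not taken from the paper's simulations) recovers this bias, here $p\sigma_e^2 = 3 \times 4 = 12$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma2 = 50, 3, 4.0
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = np.array([1.0, 2.0, -1.0])
A = X.T @ X
target = beta @ A @ beta                  # the parameter beta'X'X beta

P = np.linalg.pinv(X)                     # maps y to the OLS estimate
vals = []
for _ in range(20000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    bhat = P @ y
    vals.append(bhat @ A @ bhat)

# Average estimate minus the true quadratic form: close to p * sigma2 = 12
print(np.mean(vals) - target)
```

This is why a bias correction in the estimator of the variance explained by fixed effects, as entailed in the approach of this paper, matters.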

Connection of ASV with heritability
It is well known that there is a close connection between the coefficient of determination and heritability, which assesses the fraction of the phenotypic variance explained by genotypic or genetic effects. A key question in the definition of heritability, especially in the context of crop species, is what constitutes "the phenotype." The answer to this question depends on the objectives of the analysis and the experimental design. Arguably, in most applications of plant breeding the relevant measure of "the phenotype" is a genotype mean across replicate observations per trial and across trials (Schmidt et al., 2019). Usually, the genotype mean is obtained by empirical best linear unbiased estimation (BLUE) based on a suitable LMM. A further LMM with random genetic or genotypic effects may then be fitted to these genotype means, taking the variance-covariance matrix of the BLUEs of the genotype means into account as the residual variance structure. The coefficient of determination of this analysis can be regarded as a measure of heritability that takes into account any heterogeneity of variance, and any covariance among genotype means resulting from the experimental design and genetic relatedness among the genotypes, as illustrated in Feldmann et al. (2022).

ACKNOWLEDGMENTS
This work was supported by the German Research Foundation (grant PI 377/20-2).

CONFLICT OF INTEREST STATEMENT
The author declares that he has no conflicts of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of the motivational example of this study are available in the Supporting Information.

OPEN RESEARCH BADGES
This article has earned an Open Data badge for making publicly available the digitally shareable data necessary to reproduce the reported results. The data are available in the Supporting Information section. This article has also earned a "Reproducible Research" badge for making publicly available the code necessary to reproduce the reported results. The results reported in this article could be fully reproduced.

TABLE 1a Values of diagonal elements for a few examples (m = sample size of the binomial distribution); φ(·) is the probability density function of the standard normal distribution.

TABLE 2 Coefficients of determination (%) of the beetle data in Nakagawa and Schielzeth (2013).
Note: a Based on a comparison of full and null models; results copied from Piepho (2019) except where indicated. b Results copied from this paper. R²(m) and R²(c) are the marginal and conditional coefficients of determination proposed by the authors; these can be directly compared with the marginal and conditional versions of Ω, respectively. c Residual effects not included here. d Corrected compared to

TABLE 3 Analysis of the clinic data (Beitler & Landis, 1985) using a fitted binomial model with three link functions (logit, probit, and complementary log-log): estimates, standard errors, number of parameters, and information criteria.
The data are from a clinical trial on two treatments (control vs. intervention) conducted at eight clinics (also see Stroup et al., 2018, p. 394). The overall dataset comprises 273 patients. The clinics are regarded as random.

Table header: Binomial model and multinomial model (proportional odds); K1 versus K2 and K3, and K1 and K2 versus K3; estimates and standard errors.
TABLE 7 A Monte Carlo study investigating the mean (and standard deviation) of the three Ω measures under regression model (41) with one covariate.
Note: Results are based on 1000 simulation runs. For details, see the main text.