Introduction
Many biological datasets have multiple strata owing to the hierarchical nature of the biological world: for example, cells within individuals, individuals within populations, populations within species and species within communities. Therefore, we need statistical methods that explicitly model the hierarchical structure of real data. Linear mixed-effects models (LMMs; also referred to as multilevel/hierarchical models) and their extension, generalized linear mixed-effects models (GLMMs), form a class of models that incorporate multilevel hierarchies in data. Indeed, LMMs and GLMMs are becoming part of the standard methodological tool kit in the biological sciences (Bolker et al. 2009), as well as in the social and medical sciences (Gelman & Hill 2007; Congdon 2010; Snijders & Bosker 2011). Given the widespread use of GLMMs, a statistic that summarizes the goodness-of-fit of a mixed-effects model to the data would be of great value, yet there is currently no such summary statistic that is widely accepted for mixed-effects models.
Many scientists have traditionally used the coefficient of determination, R^{2} (ranging from 0 to 1), as a summary statistic to quantify the goodness-of-fit of fixed-effects models such as multiple linear regression, anova, ancova and generalized linear models (GLMs). The concept of R^{2} as ‘variance explained’ is intuitive. Because R^{2} is unitless, it is extremely useful as a summary index for statistical models: one can objectively evaluate the fit of a model and, under some circumstances, compare R^{2} values across studies in a manner similar to standardized effect size statistics (e.g. for models with the same response and a similar set of predictors; in other words, R^{2} can be utilized for meta-analysis; Nakagawa & Cuthill 2007).
In Table 1, we briefly summarize 12 properties of R^{2} (based on Kvålseth 1985 and Cameron & Windmeijer 1996; compilation adopted from Orelien & Edwards 2008) that will provide the reader with a good sense of what a ‘traditional’ R^{2} statistic should be and also provide a benchmark for generalizing R^{2} to mixedeffects models. Generalizing R^{2} from linear models (LMs) to LMMs and GLMMs turns out to be a difficult task. A number of ways of obtaining R^{2} for mixed models have been proposed (e.g. Snijders & Bosker 1994; Xu 2003; Liu, Zheng & Shen 2008; Orelien & Edwards 2008). These proposed methods, however, share some theoretical problems or practical difficulties (discussed in detail below), and consequently, no consensus for a definition of R^{2} for mixedeffects models has emerged in the statistical literature. Therefore, it is not surprising that R^{2} is rarely reported as a model summary statistic when mixed models are used.
Table 1. Twelve properties of ‘traditional’ R^{2} for regression models; adopted from Orelien & Edwards (2008)

Property  References

R^{2} must represent goodness-of-fit and have an intuitive interpretation  Kvålseth (1985) 
R^{2} must be unit free; that is, dimensionless  Kvålseth (1985) 
R^{2} should range from 0 to 1 where 1 represents a perfect fit  Kvålseth (1985) 
R^{2} should be general enough to apply to any type of statistical model  Kvålseth (1985) 
R^{2} values should not be affected by different model fitting techniques  Kvålseth (1985) 
R^{2} values from different models fitted to the same data should be directly comparable  Kvålseth (1985) 
Relative R^{2} values should be comparable to other accepted goodness-of-fit measures  Kvålseth (1985) 
All residuals (positive and negative) should be weighted equally by R^{2}  Kvålseth (1985) 
R^{2} values should always increase as more predictors are added (without degrees-of-freedom correction)  Cameron & Windmeijer (1996) 
R^{2} values based on residual sum of squares and those based on explained sum of squares should match  Cameron & Windmeijer (1996) 
R^{2} values and statistical significance of slope parameters should show correspondence  Cameron & Windmeijer (1996) 
R^{2} should be interpretable in terms of the information content of the data  Cameron & Windmeijer (1996) 
In the absence of R^{2}, information criteria are often used and reported as comparison tools for mixed models. Information criteria are based on the likelihood of the data given a fitted model (the ‘likelihood’), penalized by the number of parameters estimated by the model. Commonly used information criteria include the Akaike information criterion (AIC; Akaike 1973), the Bayesian information criterion (BIC; Schwarz 1978) and the more recently proposed deviance information criterion (DIC; Spiegelhalter et al. 2002; reviewed in Claeskens & Hjort 2009; Grueber et al. 2011; Hamaker et al. 2011). Information criteria are used to select the ‘best’ or ‘better’ models, and they are indeed useful for selecting the most parsimonious models from a candidate model set (Burnham & Anderson 2002). There are, however, at least three important limitations to the use of information criteria in relation to R^{2}: (i) while information criteria provide an estimate of the relative fit of alternative models, they tell us nothing about the absolute model fit (cf. evidence ratio; Burnham & Anderson 2002); (ii) information criteria do not provide any information on the variance explained by a model (Orelien & Edwards 2008); and (iii) information criteria are not comparable across datasets under any circumstances, because they are highly dataset specific (in other words, they are not standardized effect statistics that can be used for meta-analysis; Nakagawa & Cuthill 2007).
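The relative-only nature of information criteria is easy to see from their definitions. The following sketch (in Python, with made-up log-likelihood values purely for illustration) computes AIC and BIC for two hypothetical models fitted to the same data:

```python
import math

def aic(log_lik: float, k: int) -> float:
    # Akaike information criterion: -2 log-likelihood plus 2 per estimated parameter
    return -2.0 * log_lik + 2.0 * k

def bic(log_lik: float, k: int, n: int) -> float:
    # Bayesian information criterion: the penalty grows with log(sample size)
    return -2.0 * log_lik + k * math.log(n)

# Two hypothetical models fitted to the same dataset of n = 100 observations;
# model B adds one parameter for a small gain in log-likelihood.
aic_a = aic(-250.0, 3)  # 506.0
aic_b = aic(-249.5, 4)  # 507.0
delta = aic_b - aic_a   # 1.0: model A is (slightly) preferred
# Neither 506 nor 507 measures absolute fit or variance explained,
# and the raw values cannot be compared across different datasets.
```

The difference ranks the candidate models, but the raw values carry no information about how much variance either model explains.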
In this paper, we start by providing the most common definitions of R^{2} in LMs and GLMs. We then review previously proposed definitions of R^{2} measures for mixed-effects models and discuss the problems and difficulties associated with these measures. Finally, we explain a general and simple method for calculating variance explained by LMMs and GLMMs and illustrate its use with simulated ecological datasets.
Definitions of R^{2}
where ‘var’ indicates the variance of the quantity in the following parentheses. Equation (eqn 6) can also be expressed in terms of the ratio between the residual variance of the model of interest and the residual variance of the null model (also referred to as the empty model or the intercept model):

R^{2} = 1 − σ^{2}_{ε}/σ^{2}_{ε0}  (eqn 7)

where σ^{2}_{ε} is the residual variance of the model of interest and σ^{2}_{ε0} is the residual variance of the null model. A closely related, deviance-based formulation compares the −2 log-likelihoods of the two models:

R^{2}_{D} = 1 − (−2 ln L_{M})/(−2 ln L_{0})  (eqn 8)

where L_{M} and L_{0} are the likelihoods of the model of interest and of the null model, respectively. We have deliberately left −2 in the denominator and numerator so that R^{2}_{D} (‘D’ signifies ‘deviance’) can be compared with Equation (eqn 3). For a LM (Equation (eqn 1)), the −2 log-likelihood statistic (sometimes referred to as the deviance) is equal to the residual sum of squares based on OLS of this model (Menard 2000; see a series of formulae for non-Gaussian responses in Table 1 of Cameron & Windmeijer 1997). There are several other likelihood-based definitions of R^{2} (reviewed in Cameron & Windmeijer 1997; Menard 2000), but we do not review them here, as they are less relevant to our approach below. We will instead discuss the generalization of R^{2} to LMMs and GLMMs, and the problems that arise in this process, in the next section.
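The residual-variance and deviance-based formulations can be compared numerically. The following Python sketch (simulated data with illustrative parameter values of our choosing) fits a simple LM by OLS and computes both quantities:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)  # true residual SD = 1

# Fit the model of interest by OLS; the null model is the intercept only.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
res_model = y - X @ beta
res_null = y - y.mean()

# Residual-variance form: one minus the ratio of residual variances.
r2_var = 1.0 - res_model.var() / res_null.var()

# Deviance form: for a Gaussian LM the -2 log-likelihood (profiled over
# the residual variance) is a monotone function of the residual sum of squares.
def neg2_loglik(res):
    sigma2 = np.mean(res ** 2)  # ML estimate of the residual variance
    return n * (np.log(2 * np.pi * sigma2) + 1.0)

r2_d = 1.0 - neg2_loglik(res_model) / neg2_loglik(res_null)
# Both lie between 0 and 1 and increase together as the model improves,
# but they are not numerically identical.
```

With these settings the residual-variance R^{2} lands near its true value of about 0·69, while the deviance ratio gives a smaller number; the two definitions agree on direction but not on magnitude.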
Common problems when generalizing R^{2}
The first obstacle, fitting models with REML, applies only to LMMs, and it can be resolved by using ML estimates instead of REML. It is well known, however, that estimates of variance components will be biased when models are fitted by ML (e.g. Pinheiro & Bates 2000).
With respect to the second obstacle, the choice of null model, both options appear to be permitted and accepted in the literature (e.g. Xu 2003; Orelien & Edwards 2008). Including random factors in the intercept model, however, can certainly change the likelihood of the null model that is used as a reference, and thus it changes R^{2} values. This relates to an important distinction: for mixed-effects models, R^{2} can be categorized loosely into two types, marginal R^{2} and conditional R^{2} (Vonesh, Chinchilli & Pu 1996). Marginal R^{2} is concerned with variance explained by fixed factors, whereas conditional R^{2} is concerned with variance explained by both fixed and random factors. So far, we have concentrated only on the former, marginal R^{2}, but we expand on the distinction between the two types in the next section.
Although we do not review all proposed definitions of R^{2} for mixed-effects models here (see Menard 2000; Xu 2003; Orelien & Edwards 2008; Roberts et al. 2011), it appears that all alternative definitions suffer from one or more of the aforementioned problems, and their implementation may not be straightforward. In the next section, we introduce a definition of R^{2} that is simple, common to both LMMs and GLMMs, and probably less prone to the aforementioned problems than previously proposed definitions.
General and simple R^{2} for GLMMs
We now generalize to GLMMs. As mentioned already, for non-Gaussian responses it is difficult to define the residual variance, σ^{2}_{ε}. It is, however, possible to define the residual variance on the latent (or link) scale, although this definition is specific to the error distribution and the link function used in the analysis. In GLMMs, σ^{2}_{ε} can be expressed as three components: (i) multiplicative dispersion (ω), (ii) additive dispersion (σ^{2}_{e}) and (iii) distribution-specific variance (σ^{2}_{d}) (detailed in Nakagawa & Schielzeth 2010). GLMMs can be implemented in two distinct ways, with either multiplicative or additive dispersion; dispersion is fitted to account for variance that exceeds or falls short of the distribution-specific variance (e.g. from binomial or Poisson distributions). In this paper, we consider only the additive-dispersion implementation of GLMMs, although the formulae presented below can easily be modified for use with GLMMs that apply multiplicative dispersion. For more details, and for a review of intraclass correlation (also known as repeatability) and heritability, both of which are closely connected to R^{2}, see Nakagawa & Schielzeth (2010). When additive dispersion is used, σ^{2}_{ε} is equal to the sum of the additive dispersion component σ^{2}_{e} and the distribution-specific variance σ^{2}_{d}, and thus R^{2} for GLMMs can be defined as:
R^{2}_{GLMM(m)} = σ^{2}_{f}/(σ^{2}_{f} + σ^{2}_{α} + σ^{2}_{e} + σ^{2}_{d})  (eqn 28)
where R^{2}_{GLMM(m)} denotes variance explained on the latent (or link) scale rather than on the original scale. This can easily be generalized to multiple levels:
R^{2}_{GLMM(m)} = σ^{2}_{f}/(σ^{2}_{f} + Σ_{l=1}^{u} σ^{2}_{l} + σ^{2}_{e} + σ^{2}_{d})  (eqn 29)
where u is the number of random factors in the GLMM (or LMM) and σ^{2}_{l} is the variance component of the lth random factor. Equation (eqn 29) can be modified to express conditional R^{2} (i.e. the variance explained by fixed and random factors):
R^{2}_{GLMM(c)} = (σ^{2}_{f} + Σ_{l=1}^{u} σ^{2}_{l})/(σ^{2}_{f} + Σ_{l=1}^{u} σ^{2}_{l} + σ^{2}_{e} + σ^{2}_{d})  (eqn 30)
As one can see in Equation (eqn 30), conditional R^{2} (R^{2}_{GLMM(c)}), despite its somewhat confusing name, can be interpreted as the variance explained by the entire model. Both marginal and conditional R^{2}_{GLMM} convey unique and interesting information, and we recommend that both be presented in publications.
In the case of a Gaussian response and an identity link (as used in LMMs), the link-scale variance and the original-scale variance are the same, and the distribution-specific variance is zero. Thus, (σ^{2}_{e} + σ^{2}_{d}) reduces to the residual variance σ^{2}_{ε} in Equations (eqn 29) and (eqn 30). For other GLMMs, the link-scale variance will differ from the original-scale variance. We present R^{2} calculated on the link scale because of its generality: Equations (eqn 29) and (eqn 30) can be applied to different families of GLMMs, given knowledge of the distribution-specific variance and a model that fits additive overdispersion (e.g. MCMCglmm; Hadfield 2010). Importantly, when the denominators of Equations (eqn 29) and (eqn 30) include σ^{2}_{d} (i.e. for GLMMs), neither type of R^{2}_{GLMM} can ever reach 1, in contrast to traditional R^{2} (see also Table 1). Table 2 summarizes the specifications for binary/proportion data and count data, which are equivalent to Equations (eqn 22), (eqn 23), (eqn 24), (eqn 25). The formulations presented in Table 2 for binomial GLMMs were first presented by Snijders & Bosker (1999), who also show that this approach can be extended to multinomial GLMMs, where the response is categorical with more than two levels (Snijders & Bosker 1999; see also Dean, Nakagawa & Pizzari 2011). To our knowledge, however, equivalent formulae for Poisson GLMMs (i.e. count data) have not been described previously (for derivation, see Appendix 1).
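Equations (eqn 29) and (eqn 30) are simple enough to compute directly from estimated variance components. The following Python sketch (the function name and the numerical variance components are ours, for illustration only) implements both formulas:

```python
import math

def r2_glmm(var_fixed, var_random, var_additive, var_dist):
    """Marginal and conditional R2 in the spirit of eqns 29-30.

    var_fixed    : variance explained by the fixed effects on the link scale
    var_random   : variance components, one per random factor
    var_additive : additive (over)dispersion variance
    var_dist     : distribution-specific variance (0 for Gaussian,
                   pi**2 / 3 for a logit link, 1 for a probit link)
    """
    total = var_fixed + sum(var_random) + var_additive + var_dist
    marginal = var_fixed / total
    conditional = (var_fixed + sum(var_random)) / total
    return marginal, conditional

# Gaussian case: var_dist = 0 and var_additive is the usual residual
# variance, so the expressions reduce to the LMM versions.
m, c = r2_glmm(2.0, [1.0, 0.5], 1.5, 0.0)  # m = 0.4, c = 0.7

# Binomial GLMM with a logit link: the denominator always contains
# pi**2 / 3, so neither statistic can ever reach exactly 1.
m_b, c_b = r2_glmm(1.0, [0.5], 1.0, math.pi ** 2 / 3)
```

The conditional value is always at least as large as the marginal one, since its numerator adds the (non-negative) random-factor variances to the same denominator.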
Table 2. Examples of generalized linear mixed models (GLMMs) with binomial and Poisson errors (two random factors) and corresponding marginal and conditional R^{2}

  Binary and proportion data  Count data 
Link function  Logit link  Probit link  Log link  Square-root link 
Distribution-specific variance (σ^{2}_{d})  π^{2}/3  1  ln(1/exp(β_{0}) + 1)  0·25 
Model specification  Y_{ijk} ~ Binomial(m_{ijk}, p_{ijk}), link(p_{ijk}) = β_{0} + Σ_{h}β_{h}x_{hijk} + γ_{k} + α_{jk} + e_{ijk}  Y_{ijk} ~ Poisson(μ_{ijk}), link(μ_{ijk}) = β_{0} + Σ_{h}β_{h}x_{hijk} + γ_{k} + α_{jk} + e_{ijk}; in both cases γ_{k} ~ N(0, σ^{2}_{γ}), α_{jk} ~ N(0, σ^{2}_{α}) and e_{ijk} ~ N(0, σ^{2}_{e}) 
Description  Y_{ijk} is the number of ‘successes’ in m_{ijk} trials by the jth individual in the kth group at the ith occasion (for binary data, m_{ijk} is 1); p_{ijk} is the underlying (latent) probability of success for the jth individual in the kth group at the ith occasion (for binary data, σ^{2}_{e} is 0).  Y_{ijk} is the observed count for the jth individual in the kth group at the ith occasion; μ_{ijk} is the underlying (latent) mean for the jth individual in the kth group at the ith occasion. 
Marginal R^{2}  R^{2}_{GLMM(m)} = σ^{2}_{f}/(σ^{2}_{f} + σ^{2}_{γ} + σ^{2}_{α} + σ^{2}_{e} + σ^{2}_{d}), with σ^{2}_{d} as given above for each link function 
Conditional R^{2}  R^{2}_{GLMM(c)} = (σ^{2}_{f} + σ^{2}_{γ} + σ^{2}_{α})/(σ^{2}_{f} + σ^{2}_{γ} + σ^{2}_{α} + σ^{2}_{e} + σ^{2}_{d}) 
As a technical note, we mention that for binary data the additive overdispersion is usually fixed to 1 for computational reasons, because additive dispersion is not identifiable (see Goldstein, Browne & Rasbash 2002). Furthermore, some of the R^{2} formulae include the intercept β_{0} (as in the case of Poisson models for count data). In such cases, R^{2} values will be more easily interpreted when fixed effects are centred or otherwise have meaningful zero values (see Schielzeth 2010; see also Appendix 1). We further note that for Poisson models with a square-root link and a mean of Y_{ijk} < 5, the given formula is likely to be inaccurate, because the variance of square-root-transformed count data then substantially exceeds 0·25 (Table 2; Nakagawa & Schielzeth 2010).
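The caveat about the square-root link can be checked directly: for Y ~ Poisson(λ), the variance of √Y can be computed exactly by summing over the probability mass function. The following Python sketch (function name ours) shows that the variance is close to 0·25 for large means but clearly exceeds it for small ones:

```python
import math

def var_sqrt_poisson(lam, kmax=2000):
    # Exact variance of sqrt(Y) for Y ~ Poisson(lam), by summing the pmf.
    p = math.exp(-lam)           # P(Y = 0)
    mean_sqrt = 0.0
    for k in range(kmax):
        mean_sqrt += p * math.sqrt(k)
        p *= lam / (k + 1)       # P(Y = k + 1) from P(Y = k)
    # E[(sqrt Y)^2] = E[Y] = lam, so the variance is lam - E[sqrt Y]^2.
    return lam - mean_sqrt ** 2

# For a large mean the variance is near the asymptotic value 0.25, but
# for small means it is clearly larger, which is why the square-root-link
# formula degrades when the mean count is below about 5.
v_large = var_sqrt_poisson(50.0)  # close to 0.25
v_small = var_sqrt_poisson(1.0)   # roughly 0.40
```

The variance decreases monotonically towards 0·25 as the mean grows, so using 0·25 as the distribution-specific variance is only safe for reasonably large mean counts.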
Related issues
For each variance component, the proportion change in variance (PCV) between the null and the full model can be calculated as:

C_{γ} = 1 − σ^{2}_{γ}/σ^{2}_{γ0},  C_{α} = 1 − σ^{2}_{α}/σ^{2}_{α0},  C_{ɛ} = 1 − σ^{2}_{ɛ}/σ^{2}_{ɛ0}

where C_{γ}, C_{α} and C_{ɛ} are the PCV at the level of groups, individuals and units (observations), respectively, and σ^{2}_{γ0}, σ^{2}_{α0} and σ^{2}_{ɛ0} are the corresponding variance components from the intercept model (i.e. Equation (eqn 22); the PCV for additive dispersion, C_{e}, can also be calculated by replacing σ^{2}_{ɛ} with σ^{2}_{e}). Proportion change in variance is in fact one of the earliest proposed R^{2} measures for LMMs (Raudenbush & Bryk 1986; Bryk & Raudenbush 1992), although it can take negative values (Snijders & Bosker 1994). We think, however, that presenting PCV along with R^{2}_{GLMM} will prove very useful, because PCV monitors changes specific to each variance component, that is, how the inclusion of additional predictor(s) has reduced (or increased) the variance component at each level. For example, if C_{γ} = 0·12, C_{α} = −0·05 and C_{ɛ} = 0·23, the negative estimate shows that the variance at the individual level has increased (i.e. σ^{2}_{α} > σ^{2}_{α0}). Additionally, we refer the reader to Hössjer (2008), who describes an alternative approach for quantifying variance explained at different levels using variance components from a single model.
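The PCV of a single variance component is one minus the ratio of the full-model component to the null-model component. A minimal Python sketch (the variance components are hypothetical, chosen to mirror the C values quoted in the text):

```python
def pcv(var_null, var_full):
    # Proportion change in variance for one component: positive when adding
    # predictors shrinks the component, negative when the component grows.
    return 1.0 - var_full / var_null

# Hypothetical null/full components reproducing the example C values:
c_gamma = pcv(1.00, 0.88)   # 0.12: group-level variance reduced by 12%
c_alpha = pcv(1.00, 1.05)   # -0.05: individual-level variance increased
c_eps = pcv(1.00, 0.77)     # 0.23: unit-level variance reduced by 23%
```

A negative value therefore flags exactly the situation described above, where a variance component in the full model exceeds its null-model counterpart.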
Worked examples
We will illustrate the calculation of R^{2}_{GLMM} along with PCV using simulated datasets. Consider a hypothetical species of beetle with the following life cycle: larvae hatch and grow in the soil until they pupate, and adult beetles then feed and mate on plants. The species is a generalist and so is widely distributed. We are interested in the effect of extra nutrients during the larval stage on subsequent morphology and reproductive success. Larvae are sampled from 12 different populations (‘Population’; see Fig. 1). Within each population, larvae are collected at two different microhabitats (‘Habitat’): dry and wet areas, as determined by soil moisture. Larvae are exposed to two different dietary treatments (‘Treatment’): nutrient-rich and control. The species is sexually dimorphic and can easily be sexed at the pupal stage (‘Sex’). Male beetles have two colour morphs: one dark and the other reddish brown (‘Morph’, labelled A and B in Fig. 1), and the morphs are supposedly subject to sexual selection. Sexed pupae are housed in standard containers until they mature (‘Container’). Each container holds eight same-sex animals from a single population, but with a mix of individuals from the two habitats (N_{[container]} = 120; N_{[animal]} = 960). Three traits are measured after maturation: (i) body length of adult beetles (Gaussian distribution), (ii) frequencies of the two distinct male colour morphs (binomial or Bernoulli distribution) and (iii) the number of eggs laid by each female (Poisson distribution) after random mating (Fig. 1).
Data for this hypothetical example were created in R 2.15.0 (R Development Core Team 2012). We used the function lmer in the R package lme4 (version 0.999375-42; Bates, Maechler & Bolker 2011) for fitting LMMs and GLMMs. We modelled three response variables (see also Table 3): (i) body length with Gaussian errors (‘Size models’), (ii) the two male morphs with binomial errors (logit-link function; ‘Morph models’) and (iii) female egg numbers with Poisson errors (log-link function; ‘Fecundity models’). For each dataset, we fitted the null (intercept-only/empty) model and the ‘full’ model. All models contained ‘Population’ and ‘Container’ as random factors, and we included an additive dispersion term (see Table 2) in the Fecundity models. The full models all included ‘Treatment’ and ‘Habitat’ as fixed factors; ‘Sex’ was added as a fixed factor to the Size models. The two kinds of R^{2}_{GLMM} and the PCV for the three variance components were calculated as explained above. The results of modelling the three datasets are summarized in Table 3; all datasets and an R script are provided as online supplements (Data S1–S4).
Table 3. Hypothetical mixed-effects modelling of the effects of nutrient manipulations on body length (mm) (Size models), male morphology (Morph models) and female egg numbers (Fecundity models); N_{[population]} = 12, N_{[container]} = 120 and N_{[animal]} = 960

Model name  Size models: Gaussian mixed models  Morph models: binary mixed models (logit link)  Fecundity models: Poisson mixed models (log link) 

Null Model  Full Model  Null Model  Full Model  Null Model  Full Model 


Fixed effects  b [95% CI]  b [95% CI]  b [95% CI]  b [95% CI]  b [95% CI]  b [95% CI] 
Intercept  14·08 [13·41, 14·76]  15·22 [14·53, 15·91]  −0·38 [−0·96, 0·21]  −1·25 [−1·96, −0·54]  1·54 [1·22, 1·86]  1·23 [0·91, 1·56] 
Treatment (experiment)  –  0·31 [0·18, 0·45]  –  1·01 [0·60, 1·43]  –  0·51 [0·41, 0·26] 
Habitat (wet)  –  0·09 [−0·05, 0·23]  –  0·68 [0·27, 1·09]  –  0·10 [0·001, 0·20] 
Sex (male)  –  −2·66 [−2·89, −2·45]  –  –  –  – 
Random effects  VC  VC  VC  VC  VC  VC 
Population  1·181  1·379  0·946  1·110  0·303  0·304 
Container  2·206  0·235  < 0·0001  0·006  0·012  0·023 
Residuals (additive dispersion)  1·224  1·197  –  –  0·171  0·100 
Fixed factors  –  1·809  –  0·371  –  0·067 
PCV_{[Population]}  –  −16·77%  –  −17·34%  –  −0·54% 
PCV_{[Container]}  –  89·37%  –  <−100%  –  −84·32% 
PCV_{[Residuals]}  –  2·21%  –  –  –  41·54% 
R^{2}_{GLMM(m)}  –  39·16%  –  7·77%  –  9·76% 
R^{2}_{GLMM(c)}  –  74·09%  –  31·13%  –  57·23% 
AIC  3275  3063  602·4  573·1  902·7  811·9 
BIC  3295  3097  614·9  594·0  920·4  836·9 
In all three model sets, some variance components in the full models were larger than the corresponding variance components in the null models (e.g. the ‘Population’ variance in the Size and Morph models). In the Morph models, the sum of all random-effect variance components in the full model was greater than the total variance in the null model (cf. Snijders & Bosker 1994; see above). These patterns result in negative PCV values (see Table 3), whereas R^{2}_{GLMM} values never become negative. In the Morph and Fecundity models, R^{2}_{GLMM(m)} values are relatively minor (8–10%) compared with R^{2}_{GLMM(c)} values. In the Size models, by contrast, R^{2}_{GLMM(m)} was nearly 40%. This was due to the very large effect of ‘Sex’ in the body size model; in this model, the ‘Treatment’ and ‘Habitat’ effects together accounted for only c. 1% of the variance (not shown in Table 3). The variance among containers in the null Size model was conflated with the variance caused by differences between the sexes, as ‘Sex’ and ‘Container’ are confounded by the experimental design (a single sex in each container; Fig. 1). Part of the variation assigned to ‘Container’ in the null model was therefore explained by the fixed effect ‘Sex’ in the full model. Finally, it is important to note that the ‘Treatment’ and ‘Habitat’ effects were statistically significant in most cases (five out of six). Much of the variability in the data, however, resided in the random effects, the residuals (additive dispersion) and the distribution-specific variance. Note that the differences between corresponding R^{2}_{GLMM(m)} and R^{2}_{GLMM(c)} values reflect how much variability lies in the random effects. Importantly, comparing the different variance components, including that of the fixed factors, within as well as between models could, we believe, help researchers gain extra insights into their datasets (Merlo et al. 2005a,b). We also note that in some cases calculating a variance component for each fixed factor may prove useful.
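As a quick arithmetic check on the Size-model entries in Table 3, the marginal and conditional R^{2}_{GLMM} can be recovered directly from the reported variance components (Python; the values are copied from the table):

```python
# Variance components of the full Size model as reported in Table 3.
var_fixed = 1.809       # fixed factors ('Treatment', 'Habitat', 'Sex')
var_population = 1.379  # 'Population' random factor
var_container = 0.235   # 'Container' random factor
var_residual = 1.197    # residuals; Gaussian model, so no sigma2_d term

total = var_fixed + var_population + var_container + var_residual
r2_marginal = 100 * var_fixed / total
r2_conditional = 100 * (var_fixed + var_population + var_container) / total
# round(r2_marginal, 2) -> 39.16 and round(r2_conditional, 2) -> 74.09,
# matching the percentages reported in the table.
```

The same calculation applied to the Morph and Fecundity columns (adding the appropriate distribution-specific variance to the denominator, per Table 2) reproduces the remaining R^{2}_{GLMM} entries.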