Confidence, prediction, and tolerance in linear mixed models

The literature about Prediction Interval (PI) and Tolerance Interval (TI) in linear mixed models is usually developed for specific designs, which is a main limitation to their use. This paper proposes to reformulate the two‐sided PI to be generalizable under a wide variety of designs (one random factor, nested and crossed designs for multiple random factors, and balanced or unbalanced designs). This new methodology is based on the Hessian matrix, namely, the inverse of (observed) Fisher Information matrix, and is built with a cell mean model. The degrees of freedom for the total variance are calculated with the generalized Satterthwaite method and compared to the Kenward‐Roger's degrees of freedom for fixed effects. Construction of two‐sided TIs are also detailed with one random factor, and two nested and two crossed random variables. An extensive simulation study is carried out to compare the widths and coverage probabilities of Confidence Intervals (CI), PIs, and TIs to their nominal levels. It shows excellent coverage whatever the design and the sample size are. Finally, these CIs, PIs, and TIs are applied to two real data sets: one from orthopedic surgery study (intralesional resection risk) and the other from assay validation study during vaccine development.

and contains the effect of bias and variability on individual release values. During analytical method qualification, it is not only important to investigate the systematic bias of the analytical method but also the ranges within which individual release values will deviate from their true amount. In an assay qualification study during vaccine development, the trueness is evaluated using the CIs for the mean method bias at each concentration of the sample. On the other hand, Prediction Intervals (PIs) are very useful to assess the range of values within which a future observation may lie, eg, where a measurement of concentration is expected to lie in a new batch. Note that a PI can also be named as Tolerance Interval (TI) type I or -expectation TI. Furthermore, a confidence level can be added to a PI, to obtain a TI type II (or beta-gamma content). TIs are therefore a good extension to PIs especially for small sample sizes, as the uncertainty of estimating the quantiles with a limited number of observations is taken into account. These intervals are applied in many contexts, especially in pharmaceutical industries, 1,2 and more generally in industrial statistics (see the book from Kenett et al 1 ). TIs can also be used for process control in both the univariate and multivariate cases, as already proposed by Fuchs and Kenett 3 in the eighties. In method comparison studies (also called bridging, eg, for analytical method transfer between laboratory or reagent bridging of critical reagents), the prediction (or tolerance) intervals are used to assess the range of the differences between two clinical or analytical measurement methods. 4,5 It has been recently shown that PIs and TIs are better than agreement intervals to compare two clinical measurement methods. 4 In transfer of an analytical procedure, individual future values in a receiving laboratory should be "similar" to those of the reference laboratory. In assay or analytical method qualification, the accuracy is related to the PI when observed concentrations are compared to their theoretical known values. The PI or TI for future measures should, ideally, lie completely inside an equivalence margin, eg, the future measurements will not differ more than 20% from their nominal level.
The concepts of CI, PI, and TI will be here revisited under the framework of linear mixed models, including fixed effects of any type. The advantages, the coverage probabilities, and convergence of these intervals will be discussed. The literature usually focuses on one-way random models (see overview by Krishnamoorthy and Mathew 6 ), random effects models, [6][7][8] or Monte Carlo simulation 9 to calculate the TIs. These formulas are no longer appropriate in case of unbalanced or more complex designs, and this is a main limitation to their use. Furthermore, most of the papers deal with one-sided intervals (see the introductory overview by Sharma and Mathew 10 ), while two-sided intervals will be the main focus of this paper. On the other hand, Sharma and Mathew 10 proposed a mathematical approach based on small-sample asymptotic properties to calculate one-sided or two-sided TIs for balanced or unbalanced designs. This methodology will be here compared to the modified large sample (MLS) approach applied on the expected mean squares (EMS) for balanced or unbalanced designs.
We propose a novel analytical two-sided PI formula that is generalizable under a wide variety of designs (one random factor, multiple random factors with nested or crossed structure, and balanced or unbalanced designs). The generalized Satterthwaite's method, based on the Hessian matrix, calculates the degrees of freedom for the variance of the prediction. The sum of the variances for fixed effects and variance components is directly used as the variance for the prediction. This approach is compared to the classical Satterthwaite approximation on the mean squares for the degrees of freedom and the use of the effective sample size to predict future observations. Additionally, an algebraic proof for the equivalence of these two methods in case of one-way random effect model is given in the Appendix. An intense simulation study with a wide variety of designs will be made to compare the different intervals and to assess their coverage probabilities. Two real data sets will be used to illustrate and compare the different statistical intervals: one from orthopedic surgery to evaluate the risk of intralesional resection bone tumor (a mixed model with fixed and random variables) and the other from an assay validation study (a one-way random effect model).
The proposed PI and TI can be useful for a large number of applications in medical research under the linear mixed model framework. The use of PIs for inferring on a future observation deserves more attention from researchers since it quantifies the total uncertainty of measurements taking into account the systematic bias (by using common CI) as well as the precision. The former has been widely accepted and used, while the latter should be equally addressed by researcher for its use when making interpretation of research results. Moreover, the rarely used TI by in-cooperating the level of confidence would bring additional information to medical research area when a higher level of assurance is needed.

CONFIDENCE, PREDICTION, AND TOLERANCE FOR A UNIVARIATE MODEL
In this section, we briefly review the classical CI for a mean, the PI for a single future observation and the TI for a proportion of future observations. This will be useful to the understanding on mixed models, to be discussed later. Given that the random variable X is normally distributed, it follows X ∼ N( , 2 ), where is the mean and 2 the variance, both assumed unknown and estimated, respectively, byX and S 2 (the unbiased estimator of the variance). The classical 100(1 − )% CI for the mean is then given byX where t 1− /2,n−1 is the 100(1 − ∕2)% quantile of the t-distribution with n − 1 degrees of freedom.
To predict a future value, X n+1 , the uncertainty is composed of two parts: the variability of the estimated mean and the variability of the random variable itself, ie,X ∼ N( , 2 ∕n) and X n+1 ∼ N( , 2 ). It follows that X n+1 −X ∼ N(0, 2 (1+1∕n)), where the variance 2 has been factorized. Then, the classical 100(1 − )% PI for a future observation is given bȳ The PI for a single future observation, X n+1 , is also called -expectation TI, because a PI will contain, on average, 100(1 − )% (eg, 95%) of the future data. The TI type II, also named beta-gamma content, is an interval in which at least 100(1 − )% of the entire population (future data) will lie with a 100(1 − )% confidence level. The exact formula is complex and requires to solve an integral, but a good approximate formula can be obtained by replacing the t quantile by a z quantile (the standardized normal distribution) and the standard deviation by the upper bound of its 100(1 − )% CI. [11][12][13] This leads to the following formula: where z 1− /2 is the 100(1 − ∕2)% quantile of the standardized normal distribution and 2 ,n−1 is the 100( )% quantile of the 2 distribution. A detailed comparison between PI and TI is given in the literature with theoretical aspects and simulation results. 4 The left panel of Figure 1 summarizes the relationships between these different intervals and their convergence. One sample is simulated for sample size n = 5 to 100, and the different intervals are calculated by formulas (1), (2), and (3). The lower (upper) bounds are connected for each interval and smoothed curves are added for better visualization. Confidence intervals narrow with increasing sample sizes and collapse to the point estimate (here, the mean ). The PI and TI move closer to the population distribution quantiles, ie, q 0.025 and q 0.975 , for 95% PI or 95% TI (whatever its confidence level).

CONFIDENCE, PREDICTION, AND TOLERANCE IN LINEAR MIXED MODELS
There is no fundamental difference between univariate and mixed models in the context of PI and TI other than the estimation of the parameters and their covariance matrix. The mean of the univariate response will be replaced by the estimate of a given linear combination of fixed effects and the variance by the sum of the variance components in the linear mixed model. The main difficulty lies in estimating the correct degrees of freedom for PI. This section provides briefly the main formulas in estimating a general linear mixed model by the restricted/residual maximum likelihood (REML) method with a variance components covariance structure. More details and other methods can be found in the literature. 14,15

The linear mixed model and its covariance matrices for the estimated parameters
The linear mixed model is given by the following formula: where Y is the response variable (the known data vector) of length n, is a vector of p unknown parameters corresponding to the fixed effects with known design matrix X, is a vector of g unknown parameters corresponding to the random effects with known design matrix Z, and is the unknown random error vector. We introduce the notation to define a vector containing both fixed and random effects: = ( , ). The vectors and are assumed to be Gaussian distributed with expectations 0 and their variances are given respectively by G and R as follows: The variance of Y is therefore given by V as follows: The vector of unknown parameters in V is defined by = ( 1 , … , q ). It includes the residual variance ( q = 2 ) and is of length q. 16 The number of unknown parameters in depends on the number of random effects in the model and the residual variance. Different approaches to estimate the parameters in G and R are provided in the literature, 16 but this paper will focus on the REML technique. 17 IfV andĜ are the estimates of V and G obtained by REML (and̂the estimate of ), then and can be estimated, respectively, bŷand̂as follows: where − denotes the generalized inverse of a matrix. Note that the variance estimates can be bounded (to zero) to avoid a negative variance component. The approximate variance-covariance matrix of̂= (̂,̂) is then given byĈ 18 as follows: where the variance-covariance matrix of the fixed effects is given by the well-known formula Var(̂) =Ĉ 11 = (X ′V −1 X) − , the covariance between the fixed and random effects is given byĈ 21 = −ĜZ ′V −1 XĈ 11 , and the variance-covariance matrix of the random effects is given by Var(̂) =Ĉ 22 = (Z ′R −1 Z +Ĝ −1 ) −1 −Ĉ 21 X ′V −1 ZĜ. Note that̂and̂are, respectively, the empirical best linear unbiased estimator and empirical best linear unbiased predictor (EBLUP) as G and R are replaced bŷ G andR. Other approximations can be found in the literature, especially useful for unbalanced models, in small sample sizes, or by using the "sandwich" estimator. [19][20][21][22][23][24] The covariance matrix of maximum likelihood estimates can be estimated by using the Fisher information matrix. 25 The mathematical details are given by Wolfinger et al. 16 The covariance matrix of̂, here, denoted bŷ, can be estimated by using the (observed) Fisher information matrix (I Obs ), which is the inverse of Hessian matrix:̂= Finally,̂and̂are asymptotically independent: 26,27(p239), 28(p33) Cov(̂,̂) ≈ 0.
Different approximations are available in the literature to calculate the CIs for the fixed effects, but Kenward-Roger's method for the degrees of freedom, k, has been considered. 29 In SAS, the options OUTPM and OUTP provide the predicted values with the confidence levels for, respectively, a given level of fixed effects or a given level of fixed effects conditioned on random effects:

Prediction interval in a linear mixed model
Like the univariate model, the uncertainty to predict a future value in a mixed model, Y n+1 , is composed of two parts: the variability of the fixed effects and the total variability (but these two parts cannot be, here, factorized): Prediction Variance = Fixed effects variance + Total variance.
For a given and estimable linear combination of fixed effects (l estimated by l̂), the mean,Ȳ l and the future value, Y n+1 , follow respectively these normal distributions:Ȳ where 2 T is the "total" variance 30 : and estimated bŷ2 Formulas (15) and (16) are illustrated on the right panel of Figure 1. As explained in the previous section, a 100(1 − )% PI will converge to the true corresponding quantiles 100( ∕2)% and 100(1 − ∕2)% of distribution (16). These quantiles are here given by By combining (15) and (16), it follows that A 100(1 − )% PI is then given byŶ [Correction added on 20 November 2019 after first online publication: Equation (21) has been corrected.] whereŶ l ,n+1 = l̂. The key parameter in formula (21) is the degrees of freedom r. Satterthwaite approximation can be used, but this approach assumes that the variance components are independent. 31 This assumption may be violated for complex designs, unbalanced designs, or missing data, and also for balanced design, if one is not careful to use the mean-squares (mean squares are independent) rather than the variance components. The methodology proposed here is therefore based on the generalized Satterthwaite method where the degrees of freedom of a variance can be directly calculated from the variance itself and its standard error (SE) (this equation is more general than the traditional Satterthwaite approximation, as it can even be applied to a linear combination of any distribution, as long as the resulting distribution can reasonably be assumed as chi-squared): The degrees of freedom for the PI in formula (21) is therefore given by where Var(̂2 T ) can be replaced byVar(̂2 T ) as follows: wherêi is the (i, j) element of̂(the observed Fisher information matrix). As a particular case, the PI formula, in the context of a one-way random balanced design, is usually written as 8 : wherêis the estimated intercept (the only fixed effect of the model), r ′ is the degree of freedom of the total variance (computed from the mean squares), and N e is the "effective sample size." One can notice, after some algebraic manipulations, that our formula (21) is more general and actually includes formula (25). Algebraic proof is given in Appendix A. Note that each bound of the PI converges to the true quantiles of the normal distribution N ( l , 2 T ) (see (19) and Figure 1 (Right)), when the sample size increases (up to infinity) as t 1− /2,r → z 1− /2 ,Ĉ 11 → 0, and̂2 T → 2 T . To calculate a PI conditionally on a linear combination of fixed and random effects (l ), the previous formulas can be adapted wherê2 T is replaced bŷ2 (the residual variance is the only one to account for variability) andĈ 11 byĈ. This should be done cautiously as the EBLUP are not appropriate for such use, and one should rather consider the random effects as fixed effects, in this purpose, throughout the whole modeling process.

Tolerance interval in a linear mixed model
As explained in Section 2, a PI can also be named -expectation TI as it contains a given proportion of the future values, but only on average. TIs (type II) are therefore a better approach, 4 especially for small sample sizes, as a confidence level is added. Sharma and Mathew proposed to calculate two-sided TIs from small-sample asymptotics (SSA) approach. 10 This requires to reparametrize the mixed model and to rewrite the log-likelihood function. 10(p262) This methodology is therefore not available from any mixed models output from standard statistical software. Furthermore, the convergence is not achieved when the estimated covariance matrix is not positive definite and the authors propose "to add a small number to the diagonal elements" to tackle this issue. 10(p263) In this section, we propose an approximate solution where the total variance is replaced by the upper bound of its CI (by analogy to the univariate case). Graybill and Wang 32 proposed a mathematical method to calculate a CI for a linear combination of variances, called MLS method, which has been later extended for variance components unrestricted in sign. 33 The TI proposed in this paper is then similar to the PI where the total variance is replaced by the upper bound of its CI obtained by the MLS method on the EMSs (the EMSs are independent in balanced designs, while the variance components may be correlated). The total variance is rewritten as a linear combination of EMSs of each variance components:̂2 T = ∑ q =1 k EMS . A 100(1 − )% TI with 100(1 − )% confidence level is then given byŶ where EMS j is the EMS associated to the jth variance component, and H = r ∕ 2 ,r − 1, where r j is the classical ANOVA degrees of freedom associated to the corresponding j th EMS. Some examples are detailed in the next sections. For other designs or unbalanced cases, the total variance can still be expressed as a linear combination of EMSs, although the MLS method assumes the terms to be independent (simulations and real data example for unbalanced designs are given in the next sections). This TI (26) converges to the PI (21) and to the true quantiles (19) when the sample size (related to the degrees of freedom r j ) increases asĈ 11 → 0,̂2 T → 2 T , and r ∕ 2 ,r → 1. Note that the MLS method was proposed by Hoffman and Kringle in random effects models, 7 while the proposed formula (26) is an extension that includes fixed effects of any type. Furthermore, the variance of fixed effects is not taken into account in the MLS approach (26) because calculating the EMSs corresponding to the fixed effects is challenging, especially for unbalanced or complex designs, and the total variance is usually much greater than the variance of fixed effects (asĈ 11 → 0).

Tolerance interval with one random variable
In a mixed model with one random variable and for a given linear combination of fixed effects, we have Y l ,ij = l + i + ij , where A is the number of levels of , n is the number of replicates, i ∼ N(0, 2 ), and i ∼ N(0, 2 ). The EMS of the residual and the random variable are given, respectively, by EMSE =̂2 and EMSA =̂2 + n̂2. It follows that the total variance is given bŷ2 where

Tolerance interval with two nested random variables
In a mixed model with two nested random variables and for a given linear combination of fixed effects, we have Y l ,ijk = l + i + j(i) + j(i)k , where A is the number of levels of , B is the number of levels of , n is the number of replicates, i ∼ N(0, 2 ), (i) ∼ N(0, 2 ), and (i)k ∼ N(0, 2 ). The EMS of the residual and the random variables are given, respectively, by EMSE =̂2, EMSB =̂2 + n̂2 and EMSA =̂2 + n̂2 + nB̂2. It follows that the total variance is given bŷ2 T = (1−1∕n)EMSE+EMSA∕(nB)+EMSB(1∕n−1∕(nB)). The 100(1− )% TI with 100(1− )% confidence level is then given bŷ where

Parameterization of fixed and random effects in the linear mixed model
For ease of computation for factor level means and their degrees of freedom, we formulate fixed factor(s) as cell means where no intercept is used and each level of fixed effect combination will be estimated as a parameter in the model directly.
The degrees of freedom for the cell means can be approximated using Kenward-Roger or Satterthwaite method as usual.
For random factor(s) in the model, it is recommended to reflect the actual design of the experiment, where no model simplification is used. Omitting some random effects or combining multiple random factors into one can result in the underestimation of variance components, which is not desirable when predicting a future observation due to the smaller total variance obtained from the model. Moreover, note that building a model to 'explain' or 'understand' the data is not necessarily the same than building a model to 'predict' future values (see the work of Shmueli 34 ). The goal is not here to reduce the model or to understand the interaction but to predict future values at a given linear combination of fixed effects.

Statistical software
In SAS, the option ASYCOV from the Proc Mixed procedure provides the covariance matrix of the variance components, ie, the (observed) Fisher information matrix. The options OUTPM and OUTP in the MODEL statement provide the estimated average values for a given level of fixed effects (see (12)) and a given level of fixed effects conditional on random effects (see (13)), respectively. There is no option to calculate PIs for a future observation. The EMSs can be requested from the Proc Mixed statement (ie, TYPE3). In JMP, the "Fit Model" platform allows to run a mixed model and the PIs can be obtained from the "individual CI" option in the "Save Columns" menu. There is no formula and no discussion in the help regarding the PI and its degrees of freedom. 35 One can actually notice that the degrees of freedom for fixed effects (ie, Kenward-Roger) are used for the PI. This is not appropriate as explained in Section 3.2 with PIs too wide (see next section). In R, the R package varComp 36 allows to provide the covariance matrix of the variance components from the function vcov (but varComp is no longer maintained in the current version of R). There is no solution to calculate PIs or TIs, either, other than using a bootstrap approach (ie, from lme4::bootMer). R and SAS code are available in the supplementary material to calculate the CI, PI, and TI presented in this paper and applied in Section 5.

SIMULATION RESULTS
In order to compare the widths and coverage probabilities of the CIs and our PIs and TIs, 10 4 data sets were simulated (per scenario) under different scenarios as summarized in Table 1. Three different models are simulated with the intercept as fixed effect, and one random variable, two nested random variables, or two crossed random variables. These models are simulated with a low or high residual variability. Eight different statistical intervals are calculated for each simulated data set: the 95% CI (degrees of freedom from Kenward-Roger's method), the 95% PI (degrees of freedom from Kenward-Roger's method, and from the total variance as proposed in this paper) and the 95%TI with confidence levels from 50% to 90% (step 10%). The following sections will summarize the results on graphs with the coverage probabilities, while the comparison of the widths of these intervals is given online as supplementary materials (CI, PI, and the largest TI with 90% confidence level). The coverage probabilities are calculated as follows: frequency of simulated CIs that include the true intercept (or the true linear combination of fixed effect). The prediction level of the PIs is assessed by the average of the proportions covered by the simulated PIs (see formula (16)), same for the prediction level of the TIs. The confidence level of the TIs is the frequency of simulated TIs that cover a proportion higher than its prediction level (if the true distribution is an  N(0, 1), and if a 95% TI with 80% confidence is calculated as [-2, 2], then it covers a proportion equal to 95.45%, and this is higher than 95% as it should be in 80% of cases).

One random variable
The model used for the simulations is Y ij = + i + ij , where i, the number of levels of , is equal to 3, 5, or 10 and j, the number of replicates, is equal to 2, 3, 5, 7 or 10, which leads to 15 different sample sizes. Furthermore, the intercept = 25, the random effects i ∼ N(0, 2 ), and the residual i ∼ N(0, 2 ) where the variance components values are given in Table 1. The coverage probabilities of the CIs and PIs are summarized in Figure 2 and the TIs in Figure 3.
The coverage probabilities of the CIs of factor level means obtained by Kenward-Roger's method are very good, except for a low number of levels of the random variable when the residual variability is high and the number of replicates is high (see Figure 2). The coverage probabilities of the PIs, with degrees of freedom calculated form the total variance, are close to the nominal level except that when the residual variability is low, and there is a low number of levels in the random variable. Note that three levels for a random variable is really a minimum, and the coverage probabilities are higher than 93% from 5 levels. The PIs calculated with degrees of freedom from Kenward-Roger's method are conservative as their coverage probabilities are always higher than the nominal level. Most of the prediction levels of the TIs are higher than the nominal level (see Figure 3 (Left)), as expected, because these intervals include at least 95% of the future data. The higher the confidence level, the wider the TIs, and on Figure 3 (Right), we can notice that the confidence levels are close to their nominal levels, except when the residual variability is high with a low number of levels for the random variable. The average widths of TIs (with 90% confidence level) are larger than those of PIs, which are larger than those of CIs (see supplementary materials), as expected, whatever the design (sample size). Furthermore, none of the simulated PIs were narrower than the simulated CIs for each simulated data set. While this comment may seem surprising as this is not expected, the comment is noteworthy as in later simulations for specific parameter combinations this seemingly paradoxical behavior can be observed and will be further discussed.

Two nested random variables
The model used for simulation is Y ijk = + i + j(i) + j(i)k , where i the number of levels of is equal to 2, 3, or 5; j the number of levels of within is equal to 2, 3, or 5; and k the number of replicates is equal to 2, 3, or 5, which leads to 27 different sample sizes. Furthermore, the intercept = 25, the random effects i ∼ N(0, 2 ), (i) ∼ N(0, 2 ), and the residual (i)k ∼ N(0, 2 ) where the variance components values are given in Table 1. The coverage probabilities of the CIs and PIs are summarized in Figure 4, the TIs in Figure 5, while their average widths are summarized in the supplementary materials. The coverage probabilities of the CIs obtained by Kenward-Roger's method are very good, except for a low number of levels of the random variable (Figure 4 (Left)). The coverage probabilities of the PIs, with degrees of freedom calculated form the total variance, are excellent except when the residual variability is low with a low number of levels of the random variable (as it is for the one random variable model). The PIs calculated with degrees of freedom from Kenward-Roger's method are conservative as their coverage probabilities are always higher than the nominal level. Most of the prediction levels of the TIs are higher than the nominal level (see Figure 5) for a high residual variability as expected. In theory, a TI with 50% confidence is similar to a PI (the average is nearly equal to the median), but in practice, a 60% confidence level is usually required. 4 This is also observed here in Figure 5 where the TIs with 60% confidence level are very close to the 95% prediction level. On Figure 5, the confidence levels are close to their nominal levels except for very low number of levels from the random variable (ie, two levels).

Two crossed random variables
The model used for simulation is Y ijk = + i + j + ij + ijk , where i the number of levels of is equal to 2, 3, or 5, j the number of levels of is equal to 2, 3, or 5, and k the number of replicates is equal to 2, 3, or 5 ,which leads to 27 different sample sizes. Furthermore, the intercept = 25, the random effects i ∼ N(0, 2 ), ∼ N(0, 2 ), i ∼ N(0, 2 ), and the residual i k ∼ N(0, 2 ) where the variance components values are given in Table 1. The coverage probabilities of the CIs and PIs are summarized in Figures 6, the TIs in Figure 7, while their average widths are summarized in the supplementary materials.
Most of the coverage probabilities of the CIs obtained by Kenward-Roger's method are slightly higher than the nominal level, Figure 6 (Left). The coverage probabilities of the PIs, with degrees of freedom calculated form the total variance, are excellent except when the residual variability is low with a low number of levels of the random variable (see Figure 6 (Right)). The PIs calculated with degrees of freedom from Kenward-Roger's method are conservative as their coverage probabilities are always higher than the nominal level. Most of the prediction levels of the TIs are higher than the nominal level (see Figure 7 (Left)) as expected. The TIs with 60% confidence level are very close to the 95% prediction level, as expected. On Figure 7 (Right), the coverage probabilities are higher than their nominal levels when the residual variability is high. The coverage probabilities move closer to the nominal level when the residual variability is low and when the random variables have at least three levels.

Unbalanced two crossed random variables
The simulation scenarios from Sharma and Mathew 10 (see Section 5.2) are used to compare the performance of our methodology (based on the MLS approach) to the SSA approach on an unbalanced two crossed random variables design. The model is similar to the one detailed in the previous section where, here, and have, respectively, four and five levels. The design is largely unbalanced with the number of replicates from 1 to 9 and given by ((1, 2, 1, 9) ′ , (2, 9, 8, 1) ′ , (1, 1, 1, 3) ′ , (7, 1, 6, 2) ′ , (1, 3, 1, 1) ′ ). Twelve scenarios are simulated by setting the variance components as follows: 2 = 10, 10, 10, 10, 5, 5, 5, 5, 1, 1, 1, 1; 2 = 0.1, 0.1, 0.5, 0.5, 0.1, 0.1, 0.5, 0.5, 0.1, 0.1, 0.5, 0.5 and 2 = 0.1, 0.5, 0.1, 0.5, 0.1, 0.5, 0.1, 0.5, 0.1, 0.5, 0.1, 0.5. The intercept is equal to 0, and the residual variance is 1 for all scenarios. Formulas given in Section 3.3 can still be used as an approximate solution, by adapting the EMS. The EMS of the residual, the random variables and their interaction, are given, respectively, by EMSE =̂2, EMSA =̂2 + 1.568̂2 + 7.84̂2, EMSB = 2 + 1.6552̂2 + 6.6206̂2, and EMSAB =̂2 + 2.0754̂2 . It follows that the total variance is given bŷ2 T = 0.464EMSE + 0.128EMSA + 0.151EMSB + 0.265EMSAB. Only the 90% TIs with 95% confidence level were simulated by Sharma and Mathew using the small-sample asymptotic methodology. 10 We reproduce here their simulations together with the 95% CIs (degrees of freedom by Kenward-Roger's method) and the 95% PIs with degrees of freedom from the fixed effects (Kenward-Roger) or from our methodology by using the total variance. The 90% TI with 95% confidence level is here given bŷ± The coverage probabilities of the CIs obtained by Kenward-Roger's method are excellent except an overestimation for high residual variability (CIs slightly too wide for the 4 last scenarios on Figure 8 -left). The coverage probabilities of the PIs with degrees of freedom from Kenward-Roger's method are all higher than the nominal level (the PIs are too wide). The coverage probabilities of the PIs with degrees of freedom from our methodology based on the total variance are better and closer to the nominal level (Figure 8 -left). The coverage probabilities of the TIs calculated with our MLS approach are better than the SSA approach for the first half of designs, and similar for the second half of designs where the coverage probabilities are higher than the nominal level. As explained in the previous sections, the TIs may be conservative when the residual variability is high.

A paradox with confidence interval larger than prediction interval
In the supplementary material, the average widths of TIs are larger than those of PIs, which are larger than those of CIs, as expected, except for the designs with a low number of levels for a random variable (ie, only two levels). In these cases, the 95% CIs may be larger than 95% PIs, which leads to a paradox as a PI should be larger than a CI. The variance of prediction is always larger than the variance of the fixed effects (see (14)). The degrees of freedom (calculated from the total variance) is also larger (or equal) for a prediction compared to a mean (fixed effects); thus, the quantile of the t-distribution is smaller for a prediction. A paradox arises when the increase of the variance for the prediction does not compensate the decrease of the t-distribution quantile. This paradox appeared in our simulations with nested design only for the design with two levels nested within the parent variable, with a frequency approximately between 40% and 70% (depending on the number of levels for the parent variable and the residual variability) (see the supplementary online material for plots of frequency). It occurred in crossed design with only two levels for the first or second random variable, with a frequency approximately between 25% and 75% (depending on the number of levels of the other random variable and the residual variability).

Other designs
The simulations presented in this paper are based on random models with intercept as fixed effect. The results and findings remain valid with additional fixed effects. In case of missing data (under the assumption of missing completely at random or missing at random), the approach proposed remain valid as our calculation is based on the Hessian matrix in the linear mixed model. Note that with missing data, the design is unbalanced, and the variance components may be (highly) correlated. Our approach that the degrees of freedom are calculated directly by using the Generalized Satterthwaite's method takes into consideration the covariance of random effects estimated from the Hessian matrix.

Orthopedic surgery study, intralesional resection risk -mixed model
The data set from Francq and Cartiaux is used as an illustration to compare the different statistical intervals (see the literature for descriptive plots and more statistical details, 30 and for more details on the study design and medical context. [37][38][39]. The data and R or SAS code are provided in the Supplementary online material. This study is composed of 23 surgeons performing a tumor cutting at four different slices (fixed variable with four levels) with a free hand or navigated technology (fixed variable with two levels). The response variable is the Error of Safe Margin (ESM) that should ideally be between -5 and 5. Values lower than -10 mean that the tumor is not removed completely (intralesional resection). The goal of this study was to assess and compare the risk of intralesional resection in bone tumor surgery between the classical surgery (free hand procedure) and a new surgery (navigated technology). The data are assumed to be Gaussian. To simplify the calculation of the predictions as explained in Section 3.4, the model only includes cell means (combinatory factor levels of slice and technology) for the fixed effects, and surgeon as a random variable. where the degrees of freedom for surgeon variance is 22 and the residual one is 66 as (23 − 1)(4 − 1) = 66 (which is slightly different from Equation (28)), H Sur = 22∕ 2 0.2,22 − 1 and H res = 66∕ 2 0.2,66 − 1. The different statistical intervals are displayed in Figure 9 (Left) for free hand and navigated technology at the four slices. Table 2 provides the estimates of cell means and all the estimated values of the three intervals. As expected, the widths of 95% TIs (with 80% confidence) are larger than those of the 95% PIs, which are larger than those of the 95% Cis. If we focus on slice 1 with free hand technology, we predict ESM to be 0.11; the average ESM will be between -1.98 and 2.21 (with 95% confidence), whereas 95% of individual ESM values are expected to lie between -10.16 and 10.39 on average, or between -10.89 and 11.1 with 80% confidence (at least 95% of individual ESM values will be between these two values in 80% of cases). Surgeons can fail to remove the tumor with free hand technology at slice 1, 3, and 4 as the PIs or TIs overlap the threshold -10.

Assay validation study -one-way random model
During development of a vaccine, different analytical methods for determining the antigen concentration or the (relative) potency in the produced vaccine batches need to be developed. In assay development, a linear mixed model across all samples is applied to estimate the variance components to evaluate the precision of the method (repeatability and intermediate precision). Trueness is the systematic deviation (bias) between the reference value and the measured concentration. Accuracy is the closeness between the reference value and an individual test value, which takes into account the systematic error (trueness) and random error (precision). Trueness is therefore assessed by using CIs at each measured concentration while accuracy is evaluated by using PIs. The aim of a validation is to show that an analytical method is suitable for its intended use, and consequently to prove the reliability of the measurements. The dataset from Hoffman and Kringle 7 is used to illustrate the different statistical intervals on a one-way random model. Concentrations are measured for six runs and three replicates (balanced). The REML method is used, together with Kenward-Roger's method for degrees of freedom of intercept. The run and residual variances are 0.000681 and 0.001253, their variances are, respectively, 5.123 × 10 7 and 2.617 × 10 7 (obtained from the observed Fisher information matrix), and their covariance is estimated to −8.72 × 10 8 . The total variances is then 0.000681 + 0.001253 = 0.001934, and its variance 5.123 × 10 7 + 2.617 × 10 7 + 2(−8.72 × 10 8 ) = 5.995 × 10 7 . This leads to 2(0.001934 2 ∕(5.995 × 10 07 )) = 12.484 degrees of freedom (calculated from the estimated total variance and its variance) for the PI.  Figure 9 (Right).

Assay validation study -unbalanced one-way random model
The same data set 7 is used to illustrate the different statistical intervals on an unbalanced one-way random model, where the first replicates of run 1, run 2, and run 6 are taken out (three missing values). The run and residual variances are then 0.000575 and 0.001594; their variances are, respectively, 8.605 × 10 7 and 6.151 × 10 7 (obtained from the observed Fisher Information matrix), and their covariance is estimated to −3.22 × 10 7 . The total variances is then 0.000575 + 0.001594 = 0.002169 and its variance 8.605×10 7 +6.151×10 7 +2(−3.22×10 7 ) = 8.316×010 7 . This leads to 2(0.002169 2 ∕(8.316×10 7 )) = 11.31448 degrees of freedom for the PI. Formulas (27) and (28) have to be adapted to the unbalanced design by modifying the degrees of freedom and the EMS. The EMS of the residual and the random variable are given here, respectively, by EMSE =̂2 and EMSA =̂2 + 2.48̂2. It follows that the total variance is given bŷ2 T = EMSA∕2.  Figure 9), the variances are therefore greater and the degrees of freedom smaller.

Discussion
The performance of the predictive analysis (ie, PIs or TIs) can be assessed from a cross-validation technique or by using a test data set. In both cases, observations that do not contribute to the computation of the statistical intervals are considered as 'new' observations and can then be compared to the statistical intervals. Empirical coverage can be obtained, ie, the frequency of 'new' observations included in the PIs, and the predictive or confidence level can be adjusted if needed (see also the bootstrap calibration proposed by Hoffman for one-sided TIs 40 or the double calibration bootstrap in mixed models proposed by Francq and Cartiaux 30 ). Note that these techniques are efficient for large data set while small or moderate sample sizes are usually encountered in non-clinical or preclinical study (ie, Section 5.2).
Multivariate tolerance regions can be used when multiple attributes are assessed. 3 In the orthopedic surgery study (Section 5.1), the primary outcome is the error on safe margin but other responses variables (ie, location, parallelism, etc) [37][38][39] were measured, and multivariate tolerance regions can be useful to assess these different outcomes simultaneously (ie, the joint TIs where the different outcomes of a future surgery will lie). In assay validation study (Section 5.2) or when measuring vaccines quality, the main response variable is usually the potency but other attributes may also be measured (ie, viral vector aggregation, free viral proteins, polysorbate content, sucrose content, etc) and multivariate tolerance regions can be very helpful to compare new batches to standard or reference samples.

PRACTICAL RECOMMENDATIONS AND CONCLUSIONS
In this paper, we recommend the use of formula (21) and formula (26) to obtain a two-sided PI for a future observation without and with an additional confidence level in linear mixed models. The approach of calculating the PI has been shown to be generalizable for more complex designs with its connection to classical Satterthwaite method for the balanced design with one random factor. The MLS approach for TIs provides better accuracy than the small-sample asymptotic method for low residual variability (similar accuracy otherwise), and is mostly easier to apply in practice. Our simulation study shows good coverage probabilities for the proposed PIs and TIs. Study designs with a small number of levels for a random factor (less than three) are not recommended to be used given that the coverage probability of PI/TI can be lower/higher than their nominal level.
When obtaining the PIs/TIs for a future observation at a given level of fixed effect(s), we recommend building a cell mean model without the intercept, which has the advantage of obtaining degrees of freedom and SE of fixed effect(s) directly from the model. Our solution to the PI and TI can be constructed by using the output of the mixed model from standard software (eg, SAS proc mixed and R lme4 or varComp package).
The proposed analytical solution of PIs and TIs in this paper can be useful and important for medical research and a large number of applications under the linear mixed model framework. It is straightforward to use as compared to other resampling-based methods and Bayesian method which can be computational intensive and relying on the input parameters (eg, prior distribution of parameters). For further research, we shall compare the results of our proposed method to that of Bayesian TIs using simulation study as well as real case studies.

ACKNOWLEDGEMENT
The authors gratefully thank the Chemistry, Manufacturing and Controls (CMC) Statistical Sciences team of Glaxo-SmithKline (GSK) and the reviewers for their helpful comments during the preparation of this work.

SUPPLEMENTAL MATERIAL
Additional supporting information (plots with the widths of the statistical intervals from Section 4, plots for frequency of paradox described in Section 4.5, the data set used in Section 5.1, and R and SAS code) can be found in the online version of this article at the publishers web site.

CONFLICTS OF INTERESTS
This study was sponsored by GlaxoSmithKline Biologicals SA. All authors are employees of the GSK group of companies.

AUTHOR CONTRIBUTION
The literature search was performed by Bernard G. Francq (BGF) and Dan Lin (DL), the code to run the simulations by BGF and DL, the applications chosen by BGF and DL, the plots by BGF, and the algebraic proof by BGF. All authors were involved in drafting the manuscript or critically revising it for important intellectual content. All authors approved the manuscript before it was submitted by the corresponding author.

DATA AVAILABILITY STATEMENT
The data set from the orthopedic surgery study (intralesional resection risk) can be found in the online supplementary material.
On the other hand, the degrees of freedom, r, for our PI in formula (21) is given by ) .
The covariance matrix between the two variance components is given by 27(p90) : It follows that ) . Therefore, The degrees of freedom (A1) and (A2) are then equal: r = r ′ .