Issues with the expected information matrix of linear mixed models provided by popular statistical packages under missingness at random dropout

Likelihood‐based methods ignoring missingness at random (MAR) produce consistent estimates provided that the whole likelihood model is correct. However, the expected information matrix (EIM) depends on the missingness mechanism. It has been shown that calculating the EIM by considering the missing data pattern as fixed (naive EIM) is incorrect under MAR, but the observed information matrix (OIM) is valid under any MAR missingness mechanism. In longitudinal studies, linear mixed models (LMMs) are routinely applied, often without any reference to missingness. However, most popular statistical packages currently provide precision measures for the fixed effects by inverting only the corresponding submatrix of the OIM (naive OIM), which is effectively equivalent to the naive EIM. In this paper, we analytically derive the correct form of the EIM of LMMs under MAR dropout to compare its differences with the naive EIM, which clarifies why the naive EIM fails under MAR. The asymptotic coverage rate of the naive EIM is numerically calculated for two parameters (population slope and slope difference between two groups) under various dropout mechanisms. The naive EIM can severely underestimate the true variance, especially when the degree of MAR dropout is high. Similar trends emerge under misspecified covariance structure, where, even the full OIM may lead to incorrect inferences and sandwich/bootstrap estimators are generally required. Results from simulation studies and application to real data led to similar conclusions. In LMMs, the full OIM should be preferred to the naive EIM/OIM, though if misspecified covariance structure is suspected, robust estimators should be used.

Likelihood-based methods ignoring missingness at random (MAR) produce consistent estimates provided that the whole likelihood model is correct.
However, the expected information matrix (EIM) depends on the missingness mechanism. It has been shown that calculating the EIM by considering the missing data pattern as fixed (naive EIM) is incorrect under MAR, but the observed information matrix (OIM) is valid under any MAR missingness mechanism. In longitudinal studies, linear mixed models (LMMs) are routinely applied, often without any reference to missingness. However, most popular statistical packages currently provide precision measures for the fixed effects by inverting only the corresponding submatrix of the OIM (naive OIM), which is effectively equivalent to the naive EIM. In this paper, we analytically derive the correct form of the EIM of LMMs under MAR dropout to compare its differences with the naive EIM, which clarifies why the naive EIM fails under MAR. The asymptotic coverage rate of the naive EIM is numerically calculated for two parameters (population slope and slope difference between two groups) under various dropout mechanisms. The naive EIM can severely underestimate the true variance, especially when the degree of MAR dropout is high. Similar trends emerge under misspecified covariance structure, where, even the full OIM may lead to incorrect inferences and sandwich/bootstrap estimators are generally required. Results from simulation studies and application to real data led to similar conclusions. In LMMs, the full OIM should be preferred to the naive EIM/OIM, though if misspecified covariance structure is suspected, robust estimators should be used.

K E Y W O R D S
expected information matrix, linear mixed models, longitudinal data, MAR dropout, observed information matrix, robust variance It is known 2 that a correctly specified LMM (mean evolution and covariance structure both included) yields unbiased estimates when data are missing at random (MAR), whereas if data are missing completely at random (MCAR), correct specification of the covariance structure is not required. Even though point estimates are valid under MAR, care must be taken when using precision measures as the dropout mechanism is inevitably involved in the expectations required to derive the expected information matrix (EIM). 3,4 Thus, to validly derive the EIM (correct EIM), one must correctly specify the missingness mechanism, which is difficult to verify in practice. To avoid such difficulties, there has been a consensus toward using the observed information matrix (OIM) in real applications, [3][4][5] as it is always a large-sample approximation of the correct EIM (with expectations in EIM taken over the true data generating mechanism), without requiring explicit modeling of the missingness mechanism.
In most statistical packages for LMMs though, precision measures for the fixed effects are obtained by inverting only the fixed-effect block of the OIM, without providing the full (including both fixed effects and variance components) OIM, which is equivalent to setting the off-diagonal blocks of the OIM equal to zero. This is based on the argument that the expectations of the off-diagonal blocks of the OIM are equal to zero, which, however, holds only under MCAR or fully observed data. This leads to a naively derived OIM, referred to as the naive OIM, borrowing the terminology by Kenward and Molenberghs, 4 who defined the naive EIM as the one derived by taking expectations ignoring the missing data mechanism, which is effectively equivalent to naive OIM. This issue has been briefly covered in some popular books devoted to LMMs, 6 whereas in other standard books for longitudinal data analysis, it is ignored or only partly covered. 5,7-10 Based on a limited simulation study using random intercept and slope LMMs, Pantazis et al 11 illustrated that the SE produced by the lme 12 function in R 13 led to incorrect coverage probabilities. Recently, Maruo et al 14 evaluated the effects of adopting the naive OIM on the estimate of the treatment effect based on multivariate Normal models assuming compound symmetry and unstructured covariance specification using simulation studies. They showed that the bias in the naive SE was nonnegligible, tending to increase at higher MAR dropout rates. However, a more solid exposition of the problem with the naive EIM/OIM in LMMs may be required to change the standard practice of popular statistical packages. In any case, we suppose that a statistician applying an LMM to MAR data in the light of the standard argument by Rubin 2 (ie, that correctly specified likelihood-based methods provide unbiased estimates under MAR) may be still unaware of the aforementioned issues.
Another problem often ignored in the literature is that when the covariance structure is misspecified, the true variance of the estimators is no longer based on the inverse of the correct EIM or the full OIM. 15 In such cases, robust (sandwich or bootstrap) estimators are required. Chavance and Escolano 16 proposed considering model misspecification as probable if the ratio of the model-based to sandwich SE is out of the [3/4; 4/3] interval. As shown by Maruo et al, 14 robustness is lost if the naive OIM is used in the sandwich formula under MAR data. Additionally, it has been shown that the fixed-effect estimators can be considerably biased under covariance structure misspecification. 17 The motivating example for this work comes from the epidemiology of human immunodeficiency virus (HIV) infection during the HIV natural history, that is, the disease evolution from seroconversion (soon after infection) up to antiretroviral treatment (ART) initiation or clinical AIDS onset. In HIV infection, the absolute number of CD4 cells, a marker of disease progression, decreases over time in the absence of ART. When modeling the CD4 count evolution during the HIV natural history, CD4 measurements collected after ART initiation or AIDS onset are by definition excluded, leading to a dropout-like missingness mechanism. Treatment guidelines up to 2015 suggested ART initiation when CD4 counts drop below a certain level, thus, missing CD4 data are likely to be MAR, as the dropout probabilities depend on the observed CD4 counts. In light of this argument, LMMs using a random intercept and slope structure or more advanced covariance structure approaches can be applied. 17,18 In this paper, we analytically show that using the naive EIM (and subsequently the naive OIM) of a correctly specified LMM can lead to seriously underestimated SEs for the fixed-effect estimates when dropout is MAR. In Section 2, we briefly examine the most frequently used statistical packages in terms of the way SEs are obtained. In Section 3, we analytically derive, under any assumed MAR dropout model, the correct form of the EIM without considering further covariates apart from time. The effects of using the naive EIM on the coverage rate of the population slope estimate are presented assuming various scenarios for the parameters of a specific dropout mechanism. In Section 4, the same approach is used to highlight the consequences of using the naive EIM on the SE of the estimate of the slope difference between two groups. Using a very similar setting, in Section 5, we also examine the performance of both the naive and correct EIM when the covariance structure is misspecified. The performance of the considered precision estimators is evaluated in a simulation study in Section 6. In Section 7, the examined methods are fitted to data from the "Concerted action on seroconversion to AIDS and death in Europe" (CASCADE) study. 19 Section 8 presents concluding remarks together with a discussion of limitations and possible extensions.

REVIEW OF SOME POPULAR STATISTICAL PACKAGES FOR FITTING LINEAR MIXED MODELS
There are plenty of available software packages to fit LMMs; here we just review some of the most popular approaches. The lme 12 (version 3.1.160) function in R fits an LMM in the formulation described in Laird and Ware 1 through direct maximization of the (restricted) likelihood function, allowing for great flexibility in the specification of the covariance structure. Lme provides SEs for the fixed effects using the naive OIM. lmer (version 1.1.31) 20 is a similar package in R that uses a penalized least squares approach that leads to ML (or REML) estimates; SEs are also based on the naive OIM. To calculate the full OIM of an LMM fitted by the lmer function, the merDeriv 21 (version 0.2.4) package can be used. The command mixed in STATA 22 also uses the naive OIM to produce SEs for the fixed effects. In SAS, in the PROC MIXED procedure, the SEs for the fixed effects are based on inverting the -block of the information matrix. However, Maruo et al 14 described how SEs based on the OIM can be obtained through the NLMIXED procedure. The vcovCR function from the library clubSandwich 23 (version 0.5.8) in R provides cluster-robust variance-covariance matrices, however, it uses the naive OIM in the sandwich formula, hence robustness may be lost. 14 The most alarming point, though, is that none of the above-mentioned reference manuals discusses potential problems with the naive OIM when data are MAR.

DERIVING THE CORRECT EIM OF LINEAR MIXED MODELS UNDER MAR DROPOUT
Suppose that Y ⊤ = (Y 1 , Y 2 , … , Y Q ) denote the marker measurements intended to be collected at prespecified times t 1 , t 2 , … t Q on a randomly chosen individual i, with subscript i suppressed for simplicity. Missingness is assumed to arise from individuals' drop-out. Therefore, the missingness indicators R = (R 1 , … , R Q ) ⊤ , with R j = 1 when Y j is observed and R j = 0 otherwise, can be replaced by the number of the observed marker measurements, i.e. M = ∑ Q j=1 R j . Given M = j, the observed and missing marker data are denoted by respectively, with M = Q implying that the data are fully observed, that is, Y (Q) = Y. We assume that dropout is MAR, which implies that the dropout probabilities are assumed to depend only on the observed marker data for each dropout pattern, that is, P(M = j|Y (j) ; c ), j = 1, … , Q − 1 and P(M = Q|Y (Q−1) ; c ) = 1 − ∑ Q−1 j=1 P(M = j|Y (j) ; c ), where c denotes the parameters of the dropout model.
Assume that the true marker evolution over time is characterized by an LMM, Y = X 0 + Zb + , where X and Z denote the design matrices for the fixed and random effects, respectively, ∼ N(0, 2 0 I (Q) ) the within-individual errors, and b ∼ N(0, 2 0 D 0 ) the random effects. Subscript "0" denotes the true values of the parameters, with the full parameter vector denoted by ⊤ 0 = ( ⊤ y0 , ⊤ c0 ), where ⊤ y0 = { ⊤ 0 , 2 0 , vech(D 0 ) ⊤ } denotes the true parameter vector of the marker model and vech the "vector-half" operation stacking the columns of the lower triangular part of a symmetric matrix. Thus, the true marginal mean and covariance matrix of the full marker data, Y, are equal to E 0 (Y) = X 0 and Var 0 (Y) = 2 0 V 0 = 2 0 (I (Q) + ZD 0 Z ⊤ ), respectively. We assume that the fitted model is correctly specified, implying that the assumed first two moments of Y are equal to E(Y) = X and Var(Y) = 2 V, respectively. Based on standard asymptotic theory arguments, for example Reference 24, under specific regularity conditions, the estimator of the fitted model,̂y, converges in probability to the value of y solving the system of equations E 0 , where X (j) and V (j) are the appropriate submatrices of X and V for the jth drop-out pattern, respectively. Since f (Y (j) ; y ) is a quadratic form in Y (j) , the system of equations to be solved is equivalent to where r (j) = 0(j) − X (j) , a 0j = P(M = j; 0 ); 0(j) = E 0 (Y (j) |M = j; 0 ) and 0(j) = Var 0 (Y (j) |M = j; 0 ). Since the dropout mechanism is MAR and the fitted model is correctly specified, the solution of Equation (1) is y0 , which implies that the estimator̂y is consistent for y0 . 2 Moreover, the central limit theorem guarantees that the limiting distribution of̂y becomes approximately Normal, that is, more formally, } denotes the EIM for a sample of 1 individual, N the number of individuals, and D − − → the convergence in distribution. Thus, the approximate large-sample covariance matrix of the estimator̂y based on a sample of size N is equal to (̂y) −1 ∕N. By interchanging integration and differentiation, it can be easily verified that As shown in Appendix A of the Data S1, the correct EIM at the true parameter values, ( y0 ), is equal to where r 0(j) = 0(j) − X (j) 0 , vec is the "vec" operator transforming a matrix into a column vector by vertically stacking the columns of the matrix, and G q denotes the duplication matrix, 25 that is, vec(D) = G q vech(D). The information between 0 and 2 0 is equal to 0, whereas , which may not be equal to zero. On the contrary, if the expectations in ( y ) are calculated ignoring the missing data mechanism, ( y ) can be shown to be block diagonal. Thus, the naive EIM for the fixed effects is defined as Note that the information currently used by statistical packages (naive OIM), is equal to where M i denotes the number of observed measurements for individual i and X i,(M i ) and V i,(M i ) denote the observed submatrices of X and V for individual i, respectively. It then follows that the naive OIM, multiplied by N −1 , converges in probability to the naive EIM, that is, 1 , which shows that the naive EIM and the naive OIM coincide asymptotically. In addition, since an obvious estimator of the naive EIM is 1 , the naive EIM and the naive OIM are effectively equivalent in practice.
One straightforward conclusion is that when the drop-out mechanism is MCAR (r 0(j) = 0(j) − X (j) 0 = 0 as Y and M are independent under MCAR), ( y0 ) takes a block diagonal form, implying that the naive EIM/OIM are valid under MCAR. However, under MAR, ( y0 ) is not block diagonal in general, thus, precision measures for the fixed effects should not be based only on the block of the EIM/OIM.

Necessary and sufficient conditions for ( y0 ) being block diagonal under Q = 2 and Q = 3 marker measurements
Under Q = 2 marker measurements and assuming a linear population-averaged evolution with a random-intercept structure, we proved that , that is, when the expectations of the observed marker measurements conditional on the dropout pattern are equal to the corresponding marginal expected values in the population. Thus, the naive EIM will be valid only under E 0 (Y (j) |M = j; 0 ) = X (j) 0 . Although this condition is not equivalent to MCAR (as MCAR requires a stronger condition, that is, Y and M being independent), we expect that in most applications where E 0 (Y (j) |M = j; 0 ) = X (j) 0 holds, the dropout mechanism will be MCAR.
However, if Q > 2, this relationship does not hold. In Appendix C in Data S1, for Q = 3 marker measurements, we first derived the necessary and sufficient conditions for ( y0 ) to be block diagonal, which are However, as shown in Appendix C in Data S1, one can construct a MAR dropout mechanism such that ( y0 ) is block diagonal (ie, the previous equations hold) even though 0(2) ≠ X (2) 0 . It should be noted, though, that we do not expect such a dropout mechanism in real applications as it was rather nonintuitive.

Numerical comparison of the naive and correct precision measures for the population slope estimate under various MAR drop-out mechanisms
As a specific application, we numerically evaluated the correct and naive EIMs of a correctly specified LMM, using various scenarios regarding the drop-out mechanism. Specifically, we assumed a random intercept and slope model, 1 ) denote the population intercept and slope, whereas (b 0 , b 1 ) denotes the random effects, assumed to follow the N(0, D) distribution. The true parameter values, obtained by fitting the model to the CASCADE study data, were the following: ( 00 , 01 ) = (23.42, −1.31), Var 0 (b 0 ) = 24.82, Var 0 (b 1 ) = 1.66, and Cov 0 (b 0 , b 1 ) = −2.03. The maximum study duration was assumed to be 5 years, with the marker assessed every 4 months.
Moreover, we assume a specific MAR logistic drop-out mechanism of the form respectively. The parameter c 1 is related to the chance of drop-out when the marker is at a certain level Y ⋆ , whereas c 2 quantifies the change in the log-odds of drop-out associated with one unit increase in the observed current marker's values. Therefore, c 2 = 0 corresponds to an MCAR drop-out mechanism, whereas c 2 ≠ 0 implies MAR.
To quantify the effects of using the naive EIM of the fixed effects in relation to the drop-out parameters, we compared the SEs based on the naive EIM with the SEs based on the whole ( y0 ) for various values of the drop-out parameters, c 1 and c 2 . ( y0 ), though, requires obtaining the first two moments of Y (j) |M = j; 0 and the marginal probabilities of dropout, a 0j ; to calculate these terms, we relied on quasi-Monte Carlo integration coupled with importance sampling 26,27 using 5 ⋅ 10 5 draws. The parameter c 1 was set to correspond to the hazard of drop-out at Y ⋆ = 500 CD4 cells/ L (baseline). We used four values of c 1 corresponding to 1%, 10%, 20%, and 40% baseline hazard of dropout and 16 points for c 2 (−0.30, −0.28, … , 0), leading to 64 scenarios. Over the four baseline hazard of dropout scenarios, the median (IQR, min-max) expected numbers of the observed measurements were 14.3 (13.6-14.7, 12.8-14.9), 7.6 (7.3-7.9, 7.1-8.1), 5.0 (4.9-5.1, 4.8-5.2), and 2.9 (2.6-3.2, 2.5-3.4), respectively. We focused on the slope parameter as the effects on the variance estimates of the intercept were negligible for all dropout scenarios. By basic asymptotic arguments, the asymptotic coverage probabilities of 100(1 − )% confidence intervals obtained by using the naive EIM are equal , where se N (̂1) denotes the asymptotic SE obtained by the naive method, that , Φ denotes the cumulative distribution function of the N(0, 1) distribution, and The results regarding the asymptotic coverage probability of the naive method are presented in Figure 1 (panel A 1 ) and the %-underestimation of the SE −100 × When c 2 = 0, which corresponds to an MCAR dropout mechanism, for all baseline hazard scenarios the naive EIM was correct, a finding that was expected. For a given c 2 , the asymptotic coverage rate was closest to the nominal level and the underestimation of the SE was lowest when the baseline dropout was small (1%) and tended to increase at higher dropout levels, which is consistent with the results reported in Maruo et al. 14 Moreover, for a given baseline dropout scenario, the asymptotic coverage probability tended to become lower with increasing absolute values of c 2 (ie, indicating heavier drop-out in the sense of higher dependence on the observed marker values), going down to 77.1% when baseline dropout was high (40%) and c 2 = −0.30. Similar trends were observed for the underestimation of the SE.
We also considered the effects of the naive EIM when using natural splines in the design matrix of the random effects (one internal knot at 1.37 years), which is a much more flexible covariance structure. Assuming a single slope for the population-averaged evolution, the asymptotic coverage rate and the underestimation of the true SE for the estimated slope are provided in Figure S1 (panels A 1 and A 2 ). Generally, the results remained similar to those reported above for the random intercept and slope model, but the magnitude of underestimation of the SE was higher in some cases.

DERIVING THE CORRECT EIM OF LINEAR MIXED MODELS INCLUDING A BINARY COVARIATE UNDER MAR DROPOUT
In this section, we extend the previous setting by introducing a binary covariate, G, with the probability of being in group g denoted by p g = P(G = g), g = 0, 1 (p 0 + p 1 = 1). We assume that the binary covariate influences the population-averaged

F I G U R E 1 (A)
A random intercept and slope model without further covariates, being correctly specified, is assumed to be fitted to MAR/MCAR data due to drop-out (panels A 1 and A 2 ), which corresponds to the setting of Section 3. The asymptotic coverage probability (A 1 ) and underestimation of the true SE (A 2 ) of a marker's rate of change (slope) estimate using the naive information matrix of the fixed effects are presented. The parameter c 2 measures the change in the log-odds of drop-out associated with one unit decrease in the current marker value. Four scenarios regarding the drop-out rate at 500 CD4/ L, that is, approximately the CD4 counts at seroconversion (baseline), were presented: 1%, 10%, 20%, and 40%. (B) A random intercept and slope model including a binary covariate affecting marker's population rate of change, being correctly specified, is assumed to be fitted to MAR/MCAR data due to drop-out (panels B 1 and B 2 ), which corresponds to the setting of Section 4. The asymptotic coverage probability (B 1 ) and underestimation of the true SE (B 2 ) of the slope difference estimate between the two groups using the naive information matrix of the fixed effects are presented. The parameter c 2,1 measures the change in the log-odds of drop-out associated with one unit decrease in the current marker value for group 1, whereas the corresponding value for group 0 is c 2,0 = −0.14. Four scenarios regarding the common drop-out rate at 500 CD4/ L, that is, approximately the CD4 counts at seroconversion (baseline), were presented: 1%, 10%, 20%, and 40%. marker evolution, but the covariance structure is the same in the two groups. Thus, the model for the observed data under the jth dropout pattern in the gth group is Y (j),g = X (j),g G + Z (j) b g + g , where X (j),g and Z (j) denote the corresponding design matrices of the fixed and random effects, respectively, b g ∼ N(0, 2 G D G ) the random effects, g ∼ N(0, 2 G I (Q) ) the corresponding within-subject residuals, and G the fixed-effects parameter, which is of main interest. Similarly to the previous section, we assume that the model is correctly specified, with the MAR/MCAR dropout mechanism being group-specific parameters. Let ⊤ yG0 = { ⊤ G0 , 2 G0 , vech(D G0 ) ⊤ } be the true parameter vector of the marker model and ⊤ G0 = ( ⊤ yG0 , ⊤ cG0 ) the true parameter vector of the whole data generating mechanism, with 0(j),g = E 0 (Y (j),g |M = j, G = g; G0 ), 0(j),g = Var 0 (Y (j),g |M = j, G = g; G0 ) being the true expected value and covariance matrix of Y (j),g |M = j, G = g, respectively, and a 0j,g = P(M = j|G = g; G0 ) denoting the true drop-out probabilities for group g, g = 0, 1. The estimator of the fitted,ŷ G , converges to the value of yG solving where r (j),g = 0(j),g − X (j),g G and 2 G V (j),G = 2 G (I (j) + Z (j) D G Z ⊤ (j) ) denotes the covariance matrix of the model for the jth pattern of dropout; recall that the random-effects structure is assumed to be the same in the two groups.
Similarly to Section 3, as the model is correctly specified and dropout is MAR, Equation (4) is solved at yG = yG0 , thusŷ G is consistent for yG0 . Moreover, It also immediately follows that ( yG0 ) becomes block diagonal when dropout is MCAR.

Numerical comparison of the naive and correct precision measures for the population slope difference estimate under various MAR drop-out mechanisms
We assumed an LMM with group-specific population slopes but with common baseline value, as it would be the case in a clinical trial comparing the marker rate of change between two treatment groups. Thus, the fixed-effects design matrix is for both groups, where t (j) = (t 1 , … , t j ). For the fixed effects, we assumed that ⊤ G0 = (23.42, −1.31, 0), i.e. the true slope difference between the two groups was equal to zero. The timing of marker measurements and the parameters of the covariance structure were the same as those in Section 3.2 We assumed the dropout mechanism described in Section 3.2, with a common baseline hazard rate but group-specific slopes, We considered various dropout parameters (c 2,0 = −0.14 and c 2,1 ranging from −0.30 to 0.30) along with four scenarios regarding the common baseline hazard for dropout (1%, 10%, 20%, and 40%). The asymptotic coverage probability and the underestimation of the naive SE associated with the slope difference were calculated. Since the true slope difference is equal to zero, the type-I error is equal to 1-coverage probability.
The results for the precision measures of the slope difference are presented in panels B 1 and B 2 of Figure 1. For all baseline dropout scenarios, the coverage probability was equal to the nominal levels when c 2,0 = c 2,1 . This is expected as c 2,0 = c 2,1 means that the dropout mechanism is the same in the two groups, which implies that the group-specific SEs would be underestimated at the same level in the two groups. In general, for all dropout scenarios, when c 2,1 < 0, there was no remarkable underestimation of the SE of the slope difference. For c 2,1 > 0, though, the coverage probability tended to decrease with increasing c 2,1 > 0, with the maximum underestimation of the SE observed at the high baseline drop-out scenario (33.2%) along with a 80.9% corresponding coverage probability (equivalently 19.1% type-I error). This implies that using the naive EIM is more of a concern when the dropout mechanisms applied to the two groups are in the opposite direction; for example, with higher marker values more likely to be missing in the first group and lower marker values more likely to be missing in the second one.
We recalculated the above results assuming natural splines (one internal knot at 1.37 years) in the design matrix of the random effects. The full results are provided in in Figure S1 (panels B 1 and B 2 ). Similar trends in the underestimation of the SE when using the naive EIM were found; however, the magnitude of the underestimation tended to be slightly higher.

DERIVING THE CORRECT EIM UNDER COVARIANCE STRUCTURE MISSPECIFICATION
In practice, researchers can hardly be sure whether the correct likelihood model is used, which means that the assumed covariance structure may be misspecified. This is more problematic under MAR missing data as there is a risk for biased fixed-effect estimates. 2,17 Nevertheless, it has been shown 15 that estimators from misspecified likelihood models still converge to a Normal distribution, but located at the limit in probability of the estimator. Putting these together, under the setting of Section 3, assuming misspecified covariance structure, the limit in probability of the fitted model, say } is the EIM. Recall that as the expectations in ( y ) are taken over the true data generating mechanism, inferences based on ( y ) or the full OIM are asymptotically equivalent. The true covariance matrix of √ N(̂y − y⋆ ) is though no longer equal to ( y⋆ ) −1 in general, which implies that even the full OIM does not necessarily lead to valid SEs in this case. However, we identified a special case where the SE from both the naive and the correct EIM are consistent even though the fitted covariance structure is misspecified. Specifically, assuming that X = Z (or the columns of X is a subset of the columns of Z), we first proved that  . If it is also assumed that dropout is MCAR and the true distribution of the marker is multivariate Normal, we proved that the SE for the fixed effects based on the naive EIM and the correct EIM are consistent. The proof is outlined in Appendix D in Data S1.

Numerical comparison of precision measures for the population slope under various MAR drop-out mechanisms when the covariance structure is misspecified
We numerically investigated the consequences of misspecified covariance structure by comparing the variance-covariance matrix of the fixed effects based on the naive EIM and on the correct EIM, ( y⋆ ), with the true variance-covariance ( y⋆ ) −1 ( y⋆ )( y⋆ ) −1 , using a very similar data generating setting to that used in Section 3.2. The only difference, though, is that the true random-effects structure, apart from a random intercept and slope structure, included a random square-root time trend, with the true variance-covariance matrix of the random effects being vech(D 0 ) = (4.59, 5.19, −24.20, 6.59, −16.62, 53.07) ⊤ . The remaining true parameter values were identical to the corresponding ones in Section 3.2. The fitted LMM included a random intercept and slope structure only; thus, the covariance structure was misspecified whereas the fixed-effects structure was correctly specified. Since dropout is MAR, we derived the asymptotic bias in the fixed-effects estimates by solving Equation (1). When parameter estimates are biased, the coverage probability of confidence intervals converges to 0 as N → ∞; we focused instead on the approximate coverage probabilities for a sample size of N = 500 individuals. By basic probabilistic arguments, this is given , where 10 and 1⋆ denote the true marker slope and the corresponding asymptotic slope estimate by the misspecified model, respectively. Also, se N (̂1) denotes the SE for the estimated slope by using the naive EIM (ie, using 2 ) −1 ∕500) and se R (̂1) the corresponding true one based on the robust variance formula ( y⋆ ) −1 ( y⋆ )( y⋆ ) −1 ∕500. The results are presented in Figure S2. In panel A 1 , the coverage probability for the estimated slope using the naive EIM is shown. It is evident that the coverage probability is exactly equal to 95% when c 2 = 0, which corresponds to MCAR dropout. Thus, our theoretical findings are confirmed (recall that X = Z in the misspecified model). Similarly to our findings in Section 3, the coverage probability reduces with decreasing c 2 (minimum coverage 38.7%) and the smallest effect is observed when the baseline dropout is low. There was a similar pattern in the underestimation of the true SE, with the maximum percentage of underestimation being 40.1% (panels A 2 and A 3 of Figure S2), though the magnitude of underestimation tended to be higher than that presented in panel A 2 of Figure 1. Underestimation was much smaller when the correct EIM was used (panel A 3 of Figure S2). As shown in panel A 4 of Figure S2, the estimated slope could be seriously biased, with the degree of bias tending to increase at higher MAR dropout probabilities. It needs to be emphasized that the reduced coverage probability displayed in panel A 1 of Figure S2 is mainly due to biased slope estimates, thus, it can be only partly attributed to underestimation of the true SE. In fact, when using the correct EIM and the robust variance formula, the coverage rate improved partially: with the minimum coverage probabilities being 51.6% and 60.8%, respectively.

Numerical comparison of precision measures for the population slope difference under various MAR drop-out mechanisms when the covariance structure is misspecified
We repeated the numerical example of Section 5.1 but incorporating a binary covariate potentially affecting marker's slopes but not the baseline values. The true covariance structure included a random intercept and slope along with a random square-root time trend, assumed to be the same in the two groups. As in Section 4.1, the true slope difference was assumed to be equal to zero. The fitted model had correctly specified population evolution but misspecified covariance structure including only a random intercept and slope. We numerically evaluated the performance of the naive and correct EIMs for the slope difference estimate. Similarly to Section 4.1, the dropout mechanisms applying to the two groups were assumed to be different but with a common baseline hazard rate, that is, P(M = j|M ≥ j, Y (j),g , G = g; c 1 , c 2,g ) = exp{c 1 +c 2,g (Y j,g −Y ⋆ )} 1+exp{c 1 +c 2,g (Y j,g −Y ⋆ )} . It was assumed that c 2,0 = −0.14, whereas c 2,1 was assumed to range from −0.30 to 0.30.
The results are presented in Figure S3. Similarly to the findings in Section 4.1, the SEs based on the naive EIM and the correct EIM were valid when c 2,1 = c 2,0 , which implies that the dropout mechanism was identical in the two groups. The naive EIM could could lead to severe underestimation of the true SE of the slope difference (up to 32.3%), the magnitude of which was more pronounced when the dropout mechanisms in the two groups were of opposite direction and the common baseline dropout probabilities were high. What is more, under this setting (high dropout along with opposite dropout dropout in the two groups), the estimated slope was also remarkably biased, thus leading to totally erroneous inferences. Using the correct EIM led to considerably underestimated precision measures (recall that correct inference is obtained by the sandwich formula) as well, but at a much lower level compared to that of the naive EIM.

SIMULATIONS
We evaluated the performance of the examined precision metrics through a simulation study including N = 500 individuals in each dataset and using 2000 replications. Three scenarios regarding the true data generating mechanism were examined. In the first one, data were generated according to a random intercept and slope model, with the true parameter values provided in Section 3.2. A MAR logistic dropout mechanism was applied using (c 1 , c 2 ) = (−2.31, −0.20).
The median (IQR) number of the observed measurements (averaged over replications) was 6.7 (3.0,12.4). The fitted model was correctly specified. In the second scenario, a binary covariate, G, was introduced using the data generating mechanism described in Section 4.1. The dropout mechanisms of the two groups, though, were assumed to be different:  1,12.4), respectively. The fitted model was also correctly specified. The third scenario was identical to the second one, but the true covariance structure, apart from a random intercept and slope covariance structure, included an additional random square-root time trend, which was ignored in the fitted model. The variance-component parameters were the same as those in Section 5.1. The median (IQR) numbers for group 1 (G = 1) and group 0 (G = 0) were equal to 6.5 (2.3,16.0) and 6.5 (3.0,12.1), respectively. For each simulated dataset, we calculated the SEs based on the naive OIM, the full OIM (including both the fixed effects and the variance components, which is asymptotically equivalent to the correct EIM as derived in Sections 3 and 4), the correct EIM, and the robust (sandwich) formula applied to the OIM. For each scenario, we recorded the average SEs and the empirical coverage probabilities for the four methods. Parameter estimates and the naive OIM were obtained by the lme 12 (version 3.1.160) function in R (version 4.2.2). An R function created by the authors was used to calculate the OIM and the robust variance formula applied to the OIM (available as Data S1). The formula for the correct EIM was evaluated at the fitted parameter values, with expectations taken over the true data generating mechanism. Illustration of the default SE estimators of other popular packages using one simulated dataset is provided in Appendix E in Data S1. The results are presented in Table 1. Precision estimates for the intercept were adequate in all scenarios. As expected, since dropout was MAR and the fitted model was correctly specified in the first two scenarios, fixed-effect parameter estimates were approximately unbiased. For the first scenario, confirming the asymptotic results from Section 3, the SE of the slope estimate obtained by the naive OIM was biased downwards (9.6%) compared to the empirical SE. As a result, the empirical coverage probability was lower (92.5%) than the nominal level. The full OIM, the correct EIM, and the robust variance formula all yielded valid results; since each dataset included 500 individuals and there were about six observed marker measurements (out of the 16 intended to be collected), the asymptotic arguments behind the robust variance formula worked well. In the second scenario, the naive estimator of the SE was also biased downwards for both estimates, the population slope in group 0 and the slope difference, by 7.0% and 10.8%, respectively. The corresponding empirical coverage probabilities were equal to 93.4% and 91.5%, respectively, which are comparable to the findings presented in panels B 1 and B 2 of Figure 1. The SEs obtained from the full OIM, the correct EIM, and the robust variance method were close to the empirical SD of the corresponding estimates. For the third scenario, where the covariance structure was misspecified, the degree of underestimation of the SE by the naive method was higher, 14.1% and 19.6% for the population slope estimate in group 0 (G = 0) and the slope difference estimate, respectively. When the full OIM was adopted, there was still some degree of underestimation, 9.0% and 9.3%, respectively; very similar results were obtained for the correct EIM. Since the covariance structure was misspecified and the dropout mechanisms applied to the two groups were MAR (and of opposite direction as well), the fixed-effect estimates, except for the intercept's, were seriously biased. Therefore, TA B L E 1 Simulation study results comparing the performance of precision estimates including N = 500 individuals in each dataset with 2000 replications. the poor empirical probabilities of the naive method regarding the population slope estimate in group 0 (G = 0) and the slope difference estimate, are attributed to both (i) bias in the corresponding estimates and (ii) underestimated SE. The empirical coverage probabilities improved by using the full OIM or the correct EIM and even further by using the robust estimator of the SE (Table 1).

APPLICATION
The CASCADE study is an international collaboration of HIV cohorts that includes patients with well-estimated dates of seroconversion. To explore the consequences of using the naive OIM to obtain SEs in real data, we used CD4 data restricting to Europeans (ie, born in Europe), infected through sex between men after January 1, 2000. As development of clinical AIDS or ART initiation are known to affect CD4 trends, all CD4 data collected after the first of the previous two events were excluded from the analyses, leading to monotone missingness (drop-out). Dropout could also occur due to death, but treatment initiation was the dominant reason for dropout as the proportion of patients developing AIDS or dying before ART initiation was negligible. Thus, in line with many researchers, we assumed that dropout can be considered MAR since ART initiation, which, at least up to the end of 2015, was based on the values of the observed CD4 counts. This implies that a correctly specified LMM should yield unbiased estimates. In total, 9144 met our inclusion criteria, of whom 1086 (11.9%) were >45 years old at seroconversion. 6444 (70.5%) experienced a dropout event, with the dropout rate being higher in those aged >45 (78.5%) compared to those ≤45 (69.4%). As suggested by many researchers, the square root transformation led to an almost linear within-individual decline over time since seroconversion as well as to an approximate Normal distribution. We applied a random intercept and slope model on the square-root scale allowing for different trajectories by baseline age (>45 and ≤45 years) through the lme 12 function. SEs for the fixed effects were obtained based on (i) the naive OIM (as returned by the vcov function in R), (ii) the full OIM, and (iii) the robust variance formula applied to the OIM. The results are presented in Table 2. Firstly, higher age was associated with lower initial CD4 counts and a faster rate of CD4 decline. SEs for the baseline CD4 values were very similar across the three methods. For the CD4 slope of those aged ≤45 years, though, the naive OIM yielded a 9.8% lower SE compared to that from the full OIM. To investigate whether this result is in line with our analytical findings, we fitted the logistic MAR dropout model of Section 3 to square-root CD4 TA B L E 2 Results based a linear mixed model applied to natural history CD4 data from the CASCADE study. counts from the CASCADE data. The estimated average hazard for dropout at 500 CD4 cells/ L was around 10% and c 2 about −0.20 for the two age groups. Thus, taking into account these estimates, our finding of a 9.8% underestimated SE is in line with our analytical findings in Figure 1 (panel A 2 ). The SE for the slope difference between the two age groups was quite similar by both methods (Table 2); this can be explained by the fact that the dropout probabilities for the two groups increased at the same rate as CD4 counts decrease. The sandwich estimator resulted in slightly higher SEs compared to those from the OIM, especially for the rate of decline in those ≤45 years old (relative difference 6.9%). However, the magnitude of difference was not so large to clearly indicate model misspecification. 16

DISCUSSION
In this article, we analytically derived the correct form of the EIM of LMMs under any specific MAR dropout model to juxtapose and compare it with the naive EIM, aiming to highlight why the naive EIM can be biased under MAR data. The case of iid data (no other covariates apart from time) and a two-group case were considered, assuming that the measurement times are fixed. We showed that the SE for a single slope or a slope difference between two groups produced by the naive EIM (effectively equivalent to the naive OIM and routinely provided by statistical packages) can severely underestimate the true variability, leading to poor coverage rates especially when the degree of MAR dropout is high. The underestimation is due to the fact that the naive EIM only uses the fixed-effect submatrix of the OIM, setting the off-diagonal blocks to zero; but the off-diagonal blocks of the EIM can be far from zero under MAR dropout, implying that the fixed-effect estimates are no longer asymptotically uncorrelated with the variance-component estimates, as it is the case under MCAR or complete data. Regarding the slope difference parameter, the naive EIM could be even more biased when the magnitude or the direction of dropout differs substantially between the two groups. We also investigated the effect of misspecifying the covariance structure. In this case, as shown by White, 15 a sandwich formula to adjust for model misspecification is required. We showed that SEs based on the naive EIM or the correct EIM can underestimate the true SE under MAR dropout, although the correct EIM to a lesser degree, with the bias of the correct EIM being attributed to model misspecification. To also examine the performance of the examined precision measures in finite samples, we carried out a simulation study, with the results confirming our analytically derived findings. Thus, our results clearly show that the full OIM, which is asymptotically equivalent to the correct EIM but without requiring correct specification of the dropout mechanism, should be preferred to the naive EIM in all cases. However, when misspecification of the covariance structure is expected, the sandwich estimator should also be used. Note though that there are concerns about the sandwich estimator performance in small samples. 28 An interesting alternative to the sandwich estimator can be a cluster bootstrap-based variance estimator, which, however, was not explored in the paper. Similar findings were observed in the application to CD4 data from the CASCADE study. Focusing on the CD4 decline of those aged ≤45 years, the SE produced by the naive OIM was 9.8% lower than the corresponding one by the full OIM. For the slope difference by age group, though, the naive OIM and the full OIM yielded essentially the same results. This can be explained by fact that the dropout mechanisms applied to the two groups were quite similar. The robust variance formula resulted in slightly higher SEs for the slope parameters (6.9%), which is not so large to certainly suggest model misspecification. 16 However, according to previous analyses, 17,18 a more elaborated covariance structure may be required.
Although potential problems with using the naive EIM in simple settings have long been recognized and illustrated by Kenward and Molenberghs,4 it is surprising that popular statistical packages for LMMs still use a naive SE by default. Maruo et al 14 used simulation studies to investigate the performance of the naive OIM in multivariate Normal models assuming covariance structures that are less frequently encountered in real applications (compound symmetry and unstructured covariance) compared to random-effect-based covariance structures. Their results were in line with our analytical findings; the degree of bias in the naive OIM was linked to the magnitude of MAR dropout.
The results reported in this manuscript concern LMMs with certain frequently used covariance structures (random intercept and slope and splines in the design matrix of the random effects) under MAR dropout. However, although not explored in this paper, we suppose that similar results may hold for LMMs with different covariance-structures or other types of MAR missingness (eg, nonmonotone missingness), especially given that a more general exposition of the problem has been provided by Kenward and Molenberghs et al. 4 Presenting analytical findings specifically for LMMs, though, might have a greater impact on changing the standard practice of statistical packages regarding the estimation of the SEs of the model parameters. For example, if one used a simulation study, it would be difficult to tell whether an observed 93% coverage rate truly reflects bias or it is due to random chance.
In this paper, we do not propose using the correct EIM in real data applications; the correct form of the EIM was only required to juxtapose its differences with the naive EIM. The fact that the correct EIM requires (i) correct specification of the missingness mechanism and (ii) can be complex for non-monotone missingness limits its usefulness given that the OIM is easy to calculate and does not require specification of the dropout mechanism. However, our analytical and simulation-based results concerned the case where sample size is large. The performance of the examined precision measures under low sample sizes and more complex designs (eg, crossed or nested random effects) needs further consideration. Moreover, especially for complex designs, the full OIM is more difficult to code than the naive OIM. Apart from covariance-structure misspecification, misspecification of the mean model or the error distribution is common in real applications, which, however, was not explored in the current study. We also acknowledge that the performance of the naïve and correct EIM was compared under a known true model, which is not feasible in real-data analyses. Nevertheless, results from the real-data application were in line with our analytical and simulation study findings.
To conclude, we have shown that the SEs currently produced by most statistical packages for fitting LMMs may be, strictly speaking, incorrect if dropout is MAR which could have serious consequences, for example, in clinical trials evaluating treatment effectiveness based on a marker rate of change (slope) over time. The bias is expected to be particularly large when dropout depends on the observed marker values and the proportion of dropouts is high or differs substantially between covariate groups. We totally propose that statistical packages on LMMs should provide the full OIM by default. To obtain the OIM in R, the merDeriv 21 package or the R function available as Data S1 in this paper could be used. Sandwich estimators or a cluster bootstrap-based variance estimator should be also considered, especially when model misspecification is probable.