Negative variance components and intercept‐slope correlations greater than one in magnitude: How do such “non‐regular” random intercept and slope models arise, and what should be done when they do?

Statistical models with random intercepts and slopes (RIAS models) are commonly used to analyze longitudinal data. Fitting such models sometimes results in negative estimates of variance components or estimates on parameter space boundaries. This can be an unlucky chance occurrence, but can also occur because certain marginal distributions are mathematically identical to those from RIAS models with negative intercept and/or slope variance components and/or intercept‐slope correlations greater than one in magnitude. We term such parameters “pseudo‐variances” and “pseudo‐correlations,” and the models “non‐regular.” We use eigenvalue theory to explore how and when such non‐regular RIAS models arise, showing: (i) A small number of measurements, short follow‐up, and large residual variance increase the parameter space for which data (with a positive semidefinite marginal variance‐covariance matrix) are compatible with non‐regular RIAS models. (ii) Non‐regular RIAS models can arise from model misspecification, when non‐linearity in fixed effects is ignored or when random effects are omitted. (iii) A non‐regular RIAS model can sometimes be interpreted as a regular linear mixed model with one or more additional random effects, which may not be identifiable from the data. (iv) Particular parameterizations of non‐regular RIAS models have no generality for all possible numbers of measurements over time. Because of this lack of generality, we conclude that non‐regular RIAS models can only be regarded as plausible data‐generating mechanisms in some situations. Nevertheless, fitting a non‐regular RIAS model can be acceptable, allowing unbiased inference on fixed effects where commonly recommended alternatives such as dropping the random slope result in bias.


INTRODUCTION
Statistical models with random intercepts (RI models) and with random intercepts and slopes (RIAS models) are widely used for analyzing clustered data, including longitudinal repeated-measures data.They offer a flexible and parsimonious way of accounting for the correlated nature of repeated measurements within an individual, can easily accommodate severely unbalanced data, and produce valid estimates when there are missing outcome data, provided the data are missing at random conditional on the observed data.However, when such models are fitted to data, convergence problems are sometimes encountered, for example, references 1,2.Depending on the statistical package and command options chosen, models sometimes do not converge, or convergence is to models with negative estimated intercept and/or slope variance components, or to models where variance component estimates and/or the estimated correlation between random intercepts and slopes is on a parameter space boundary.Discussion of negative variance components in the statistical literature goes back at least to a paper by Nelder published in 1954. 3Nelder outlines how such negative values can arise in the analysis of randomized block and split-plot experiments by analysis of variance (ANOVA) and proposes a modified Normal model in which "negative components of variance can appear and be given a satisfactory interpretation."5][6] Implicit or explicit in much of the literature on negative variance components is the assumption that, because a variance must be non-negative by definition, variance components also cannot have underlying values that are negative.We do not subscribe to this view.In this paper, we use the term "pseudo-variance" to refer to a variance component that takes a negative value which precludes its interpretation as a variance.Some investigations of non-positive definite covariance matrices in mixed effects models have treated the phenomenon as one kind of model non-convergence. 7The issue is often regarded as an artifact of estimation.In some cases the cause may indeed be the unlucky play of chance: when the number of individuals (or data clusters) from whom random effects are to be estimated is small and a variance component has a small positive underlying value, sampling variability may result in a negative estimate (or zero if estimates are constrained to be non-negative).
Such "unlucky" negative estimates of variances that have underlying non-negative values are not the focus of this paper.Rather, our interest is in certain marginal distributions that are mathematically identical to those from RI or RIAS models with negative intercept and/or slope pseudo-variances and/or pseudo-variances that imply intercept-slope correlations greater than one in magnitude (which we term a "pseudo-correlation").We term these "non-regular" RI and RIAS models and investigate how and when such models arise, specifically in the RIAS scenario.Previous discussion of such non-regular models-which has sometimes used the terms "singular fit" and "boundary fit" in this context-has focused predominantly on the simple RI model (and repeated-measures ANOVA).In the context of a regular RI model, the random intercept variance can be considered to represent both a variance component and a covariance, with the intra-cluster correlation coefficient (ICC) representing the proportion of the variance of the outcome that is shared by measures from the same cluster.If individuals from the same cluster tend to be less similar to one another than individuals from different clusters, as can happen if there is competition (rather than sharing) between individuals from the same cluster, then the data will imply a RI model in which the random intercept pseudo-variance (and the within-cluster correlation) are negative.][10][11][12] Drawing on Nelder's argument that negative variance components may be legitimate when they give rise to a valid "complete distribution" such that "the variance matrix is positive semi-definite," 3 Verbeke and Molenberghs distinguish between the hierarchical interpretation of a mixed effects model, in which the variance components are interpreted as variances and therefore must be non-negative, and the marginal interpretation based on the implied marginal model that is obtained by integrating over the random effects.In their view, negative pseudo-variances are entirely acceptable in contexts where only the marginal interpretation is of interest and the marginal variances are non-negative.Furthermore, they argue that non-regular models can be given a valid hierarchical interpretation.Because random effects are latent, many different hierarchical models can imply the same marginal model, and the data cannot help us choose between them.The authors apply this fact to the RI model and propose modifications to the standard formulation of the model that can account for negative intercept pseudo-variances. 8,10,13,14ecause longitudinal studies are prominent in medical research, the RIAS model is an important and frequently used model in this field.Just as there are marginal distributions with negative ICC that correspond to non-regular RI models, so there exist marginal distributions that are mathematically equivalent to non-regular RIAS models.Molenberghs and Verbeke have shown that the possibility of formulating hierarchical models that induce marginal models with structures such as negative compound-symmetry correlation extends theoretically to linear mixed models in general. 14However, to our knowledge, there has been little detailed consideration in the literature of when and how non-regular RIAS models arise in practice, a question that has important implications for longitudinal study design.In the context of the quadratic polynomial growth model (RIAS model with linear and quadratic time terms), McNeish and Bauer note that having a small number of individuals and/or a small number of repeated measures increases the probability of non-convergence, which they define as including non-regular fits; their focus is on methods to avoid non-regular models, and not on the specific situations in which they arise. 7he non-regular RI model with negative intercept pseudo-variance corresponds to a situation that can realistically occur in practice.For example, consider an experiment where n rats are placed in each of a number of cages, with a single bowl of food per cage.The rats will compete for the food, and as a result their weights over time will exhibit a negative intercept pseudo-variance and a negative ICC.In this setting there is a bound on the ICC 14 : although it can be negative, it must be greater than −1∕(n − 1).As we increase the number of rats per cage, this bound approaches zero, implying that ICCs that are possible with a small number of rats per cage become impossible with a larger number of rats per cage.For example, with two rats in each cage, a pairwise correlation of −0.9 between the two outcome values is possible, but with three rats the three pairwise correlations cannot all be −0.9.This is not conceptually problematic, because when we change the experimental design by adding an extra rat to each cage, we expect behavior to change as each rat now has two competitors.Therefore, we would expect the change from two rats per cage to three rats per cage to alter the true parameter values.Note that, in discussions of the non-regular RI model, there has been little consideration of the fact that clustered data can be sampled in two different ways.Studies may sample whole clusters of size n; however, it is also possible first to sample clusters and then to randomly sample a subset of individuals (say, of size m) from each cluster.In such studies it is the actual cluster size (n) that gives the bound on the ICC.
When the RIAS model is used for modeling longitudinal data, there are (as with the non-regular RI model) bounds on the values that the parameters can take.In this setting n is not the size of a cluster of individuals, but the number of time points at which an individual contributes measurements.This introduces a conceptual complexity, at least in some settings.Suppose we have a non-regular RIAS model that describes an outcome variable that is measured annually for 3 years from baseline.If we take additional intermediate measurements at 6, 18, and 30 months, we would typically not expect the measurements taken at 0, 12, 24, and 36 months to change, assuming that the underlying process under study is unchanged by the additional measurements.While the additional measurements would be expected to improve precision of model parameter estimates, they would often not be expected to affect the underlying parameters.The switch from having annual measurements to having measurements every 6 months in a non-regular RIAS model is therefore not akin to switching from having four rats per cage to seven rats per cage in a non-regular RI model; rather it is akin to having a large number of rats (at least seven) in each cage, from which we now sample seven rather than four.This conceptual concern over generalizations to data collected according to schedules that differ from those used in the current study needs consideration when deciding whether a non-regular RIAS model represents a plausible data-generating mechanism.Nonetheless a non-regular RIAS model can, in many situations, be regarded as a pragmatic description of the data at hand.
In this paper, we consider when and how non-regular RIAS models arise, how to recognize in practice when data imply such a model, and whether these models correspond to a plausible data-generating mechanism.To gain insight into the phenomenon, we restrict our exploration to scenarios where data are balanced, that is, measurements are taken at the same time points across individuals.In Section 2 we introduce the RIAS model and notation.In Section 3 we simulate data to explore whether non-regularity in a RIAS model corresponds to specific characteristics of individual trajectory plots, then introduce a dataset where fitting a RIAS model gives a negative random slope pseudo-variance and explore how different statistical software packages handle this situation.In Section 4 we elucidate the eigenvalues and eigenvectors relating to the variance-covariance matrices that govern variability in the RIAS model, in order to propose the concept of "defining eigenvalues" and explore factors that determine the parameter space compatible with non-regular RIAS models.We also use the defining eigenvalues to investigate two kinds of model misspecification that can cause non-regular RIAS models to arise.We summarize and discuss our findings in Section 5.

THE RIAS MODEL: NOTATION AND CONDITIONS FOR NON-REGULARITY
We introduce the non-regular RIAS model in the context of balanced and complete data, assuming that we have longitudinal data at n time points, centered such that the mean time is zero.Consider the analysis of such data with a RIAS model with one random intercept and one random slope (for time), which we term the "simple RIAS model": and where y ij is the outcome for the i th person at the j th visit, for i = 1, … , N and j = 1, … , n; t j is the centered time of the j th visit;  0 is the fixed intercept;  1 is a fixed effect for time; b 0i is a random intercept for the i th person; b 1i is a random slope for time for the i th person; and e ij is the residual error for the i th person at the j th visit.The random effects are bivariate normally distributed around a mean of zero, with a random intercept variance of  2 b0 , a random slope variance of  2 b1 , and an intercept-slope covariance of  b01 .The residual error is independently normally distributed with mean zero and residual variance  2 e .A mixed-effects model such as the one in Equation ( 1), for balanced and complete data, can be written in general matrix form as a hierarchical model: where, in the simple RIAS setting, Y i is the n-by-1 vector of the outcomes for the ith person, X is an n-by-2 design matrix consisting of a column of 1's and a column of measurement times (t j ) that link  (a vector of fixed effects for the intercept and time) to Y i , Z = X and links the vector of two random effects b i to Y i , b i ∼N(0, G), where G is the 2-by-2 matrix from Equation (1), and R n =  2 e I n where I n is a n-by-n identity matrix.The model can be extended to arbitrary fixed effects structures, as long as it includes a fixed intercept and a fixed slope in time, and to unbalanced data.
The model also has a marginal formulation: For regular RIAS models b i follows a bivariate normal distribution and Zb i follows a multivariate normal distribution, so both G and ZGZ T will be positive semidefinite (PSD).Non-regular RIAS models arise when  is PSD, but G (and therefore also ZGZ T ) is not.
For G to be PSD we require that  2 b0 ≥ 0,  2 b1 ≥ 0, and  2 b0  2 b1 ≥  2 b01 .When one or more of those conditions do not hold, then G is not PSD and  2 b0 and  2 b1 cannot be interpreted as variances.Further,  b01 cannot be interpreted as a covariance and  01 = cannot be interpreted as a correlation coefficient.By analogy with "pseudo-variance," we use the terms "pseudo-covariance" and "pseudo-correlation" to denote these parameters when G is not PSD.Note that we use the modulus of the product of the random intercept and slope variances in defining the pseudo-correlation to ensure that this always takes real, rather than imaginary, values.Equation (1) implies that: From the covariance formula, we see that, in contrast to the RI model, a negative intercept pseudo-variance in a simple RIAS model does not necessarily imply a negative covariance-or correlation-between outcomes for an individual at different time points (and nor does a positive intercept variance necessarily imply a positive covariance or correlation), since the marginal covariance depends also on the time values of the points in question and on  2 b1 and  b01 .Furthermore, for  2 b1 to be positive in the simple RIAS model, the marginal variances must be a quadratic function of time with a positive coefficient for the quadratic term; conversely,  2 b1 will be negative when the quadratic term coefficient is negative.Situations with a negative quadratic term coefficient seem intuitively plausible.For example, patients with a degenerative disease may initially be similar on a measure of disease severity, then may become more variable due to differing individual rates of deterioration, before finally becoming more similar again due to a floor/ceiling effect.If data were collected for a limited time period during the patients' deterioration or at a limited number of time points, the sigmoid shape of each patient's overall trajectory may not be apparent; such data could correspond to a simple RIAS model with negative  2 b1 .The intercept-slope pseudo-correlation,  01 = . As we will see in the next section, a boundary estimate of  01 as 1 or −1 is often a key indicator in results from statistical software that data imply a non-regular model.Note that in extensions of the RIAS model when G has >2 dimensions (ie, the model has >2 random effects), non-negative variance components and correlations ≤1 are not sufficient conditions to ensure G is PSD; we do not consider such more complex scenarios further.

DATA SIMULATIONS AND EXAMPLE
This section addresses the question of how it can be recognized in practice that a particular dataset implies a non-regular RIAS model.We explore examples of data implying non-regular RIAS models, first using simulated data to explore the relationships between the various manifestations of non-regularity in RIAS models (ie, negative  2 b0 , negative  2 b1 , and  01 outside [−1, 1]) and characteristics of individual trajectory plots, and then using an actual dataset to examine how major statistical software packages handle non-regularity.

Simulated examples
We simulated a number of longitudinal datasets, each comprising outcome values at three time points for 500 individuals, from the marginal multivariate Normal distribution in Equation ( 3) with an arbitrary constant mean value over time ( 0 = 80,  1 = 0).The simulations were performed in SAS using PROC IML; simulation was from the marginal variance/covariance matrix , calculated from the matrix equation  = R n + ZGZ T .The values of  2 b0 ,  2 b1 ,  b01 , and  2 e in matrices G and R n were chosen to represent a range of both regular and non-regular RIAS models.Trajectory plots of the resulting data were created with PROC SGPLOT.These show that, even when no mean effects over time are present, there are generally no obvious visible indications when data imply a non-regular RIAS model.
If  2 b0 is decreased without changing the value of the other three parameters, then the marginal variance is decreased at every time point.This does not result in any distinctive feature that can help identify compatibility with a non-regular RIAS model in a single dataset; indeed, a trajectory plot for data implying a small negative  2 b0 may be virtually indistinguishable from one for data implying a small positive  2 b0 (Figure 1).A negative  2 b1 corresponds to a marginal variance that increases and then decreases over time.Even in a large dataset, this is likely to be visibly apparent from trajectory plots only when the magnitude of the negative  2 b1 is reasonably large (Figure 2).
The implied  01 is outside [−1, 1] when  b01 is large in magnitude relative to  2 b0 and  2 b1 .A  b01 of large magnitude corresponds to a pronounced divergence (if positive) or convergence (if negative) of the individual trajectories.In combination with a large  2 b0 relative to  2 e (corresponding to a small amount of within-individual fluctuation over time), this could give the trajectories an extreme fan-like appearance; however, in more commonly observed situations with smaller random effects variances and pseudo-covariance and larger residual variance, an implied pseudo-correlation outside [−1, 1] is not easily discernible from the trajectory plot (Figure 3).

A non-regular RIAS model: revisiting the rat data
Negative random effects pseudo-variances occurring specifically in the RIAS model are rarely considered in the literature.An exception is Molenberghs and Verbeke's brief discussion of the issue as it arose in analysis of the "rat data" presented in their 2000 book. 13,15The data are from an experiment investigating the effect of inhibiting testosterone production on craniofacial growth in male Wistar rats randomized to three groups (control and low-and high-dose treatment).Using a transformation of the time scale to account for the non-linear relationship between growth and the rats' age, Verbeke and Molenberghs used PROC MIXED in SAS to fit a RIAS model with a separate fixed slope effect in each treatment group.

F I G U R E 2
Trajectory plots showing simulated outcome data at three time points (−1, 0, 1) for 500 individuals with intercept pseudo-variance 10; varying negative slope pseudo-variance; intercept-slope pseudo-covariance 2; and residual variance 20.
When the "nobound" option was used to remove the constraint on the random intercept and slope variance components to be non-negative, SAS estimated  2 b1 to be negative.We re-fitted Verbeke and Molenberghs' RIAS model with PROC MIXED in SAS, with and without the "nobound" option; we also fitted a RI model to the data with PROC MIXED.We then fitted the same RIAS model using the "mixed" command in Stata and the "lmer" function in the lme4 package in R. Code for all models is provided in Section 6 of the Technical Appendix.Model convergence was achieved with all five methods (Table 1).The models fitted in Stata and R had the same parameter estimates.Both the Stata "mixed" command and "lmer" use a G matrix parameterized by a Cholesky decomposition, which ensures that it is PSD. 16,17

TA B L E 1
Model estimates for the rat data 13,15  Note: The RIAS model fitted here is more general than that defined in equation ( 1), allowing different slopes in each group and with the number and timing of visits allowed to differ between rats.The "time" variable time ij for each rat i for i = 1, … , 50 at measurement point j for j = 1, … n i was derived from the rat's age as . Abbreviations: NC, not calculated; REML, restricted maximum likelihood; RI, random intercept; RIAS, random intercept and slope; SE, standard error.a The same model, using a Cholesky decomposition to estimate the G matrix, can be fitted with SAS PROC MIXED, using the option "type = fa0(2)" in the RANDOM statement.A factor analytic parameterization can also be implemented in SAS PROC MIXED, using "type = fa(1)" (for 1 factor).Both approaches produced the same results as those from Stata and R lme4 for the rat data.b Standard errors for random effects are given as reported in Stata.
The log-likelihood for the model with SAS PROC MIXED and "nobound" was slightly higher than those for the other models, and the standard errors for the fixed effect coefficient estimates were smaller.The fixed effects estimates from the RIAS model fitted with SAS PROC MIXED without the "nobound" option and from the models fitted in Stata and R were the same to two decimal places, and the same as those from the RI model to one decimal place.However, the reported random effects estimates for the RIAS models differed: whereas SAS PROC MIXED simply constrains  2 b1 at zero and does not calculate  01 (reporting it as 0), the Cholesky decomposition used by Stata and R lme4::lmer allows  01 to go to its boundary value 1, such that convergence is achieved with a small positive value of  2 b1 .While the Cholesky decomposition method ensures that the estimated G matrix is PSD, this is not necessarily the case for the default method used by SAS PROC MIXED.When SAS PROC MIXED was used with the "nobound" option, an estimated value for  01 was reported, based on the modulus of the negative estimate of  2 b1 .The user is alerted to the fact that data imply a non-regular RIAS model in SAS and R, but not in Stata.With SAS PROC MIXED, the log included the note "Estimated G matrix is not positive definite."With lme4 in R, the warning "boundary (singular) fit: see ?isSingular" was issued.The "isSingular" function returned "TRUE"; according to the lme4 package documentation, this function "evaluates whether a fitted mixed model is (almost/near) singular, that is, the parameters are on the boundary of the feasible parameter space: variances of one or more linear combinations of effects are (close to) zero."In Stata, by contrast, no warning was given, and the problem is barely apparent from the output produced: a small positive estimated  2 b1 looks plausible, the default output reports  b01 but not  01 , and it is only by using the "stddev" option that the user may be alerted to the problem by the reported correlation of 1 (95% CI: −1, 1).

EIGENVALUES AND EIGENVECTORS FOR THE SIMPLE RIAS MODEL
In this section we use eigenvalue theory to characterize how and when non-regular RIAS models arise.In Section 4.1 we show that non-regular RIAS models can be characterized using what we term "defining eigenvalues."In Section 4.2 we describe how the number and timing of visits impact on the defining eigenvalues.These defining eigenvalues then allow boundaries on the parameter space that is compatible with non-regular RIAS models to be derived (Section 4.3) and illustrated graphically (Section 4.4).In Section 4.5 we consider two causes of non-regularity.

"Defining eigenvalues" for the simple RIAS model
We show (see Section 1 in the Technical Appendix) that ZGZ T has only two non-zero eigenvalues, these being the eigenvalues of the 2-by-2 matrix GZ T Z, with the remaining n − 2 eigenvalues all being zero.We denote these eigenvalues by  k , such that (without loss of generality)  1 ≥  2 and  k = 0 for k = 3, … , n.We term  1 and  2 the "defining eigenvalues of ZGZ T ."Now, since R n =  2 e I n , the eigenvectors of  and those of ZGZ T are identical, with each eigenvalue of  being equal to the corresponding eigenvalue of ZGZ T plus  2 e .Denoting these eigenvalues of  by  k , we have and We term  1 and  2 the defining eigenvalues of .Both these defining eigenvalues and  2 e must be non-negative for  to be PSD.For a non-regular model, the smaller of the defining eigenvalues of ZGZ T ( 2 ) must be negative, but not more negative than minus the residual variance ) . The other defining eigenvalue ( 1 ) can be negative or positive.Now note that since time has been centered, Z T Z is a diagonal matrix with positive elements n and q where q = ∑ n j=1 t 2 j ; when the time points are integer-spaced as well as centered, q = n(n−1)(n+1)

12
. Therefore From ( 4) and ( 5) it follows that  1 and  2 are the eigenvalues of GZ T Z + R 2 where R 2 =  2 e I 2 .We term this key matrix the "PSD defining matrix" and denote it by D.
Provided that D is PSD (and that  2 e is non-negative), then  will be PSD, even if G is not.As well as their role in determining conditions for  to be PSD, even when G is not, the defining eigenvalues of  relate to the standard errors of β obtained when fitting the RIAS model to data.We show in Section 2 of the Technical Appendix that the standard errors of β depend only on the defining eigenvalues of  ( 1 and  2 ), and not on the remaining eigenvalues.This result has implications that we return to in Section 4.5.1.

Impact of changing the number and timing of follow-up visits
The form of the PSD defining matrix D has implications concerning the number and timing of measurements.We show in Section 3 of the Technical Appendix that if we increase n and q while keeping G constant, and one or both of the defining eigenvalues of ZGZ T is initially negative, then adding follow-up visits will increase the magnitude of that eigenvalue, ultimately until it is greater than  2 e in magnitude.Indeed, when ZGZ T has at least one negative eigenvalue, it is not possible continually to increase n and q while G remains unchanged: ultimately this must result in  no longer remaining PSD.
It should be noted that N, the number of individuals contributing measurements, has no effect on the defining eigenvalues of ZGZ T , and hence plays no role in the question of whether data are compatible with a non-regular RIAS model.If N is small, parameter estimates implying a non-regular model may be more likely to occur than when N is large, but this is because the precision of estimation is low with a small number of observations.

Defining boundaries for non-regular simple RIAS models
In order to characterize situations when a non-PSD G matrix is compatible with a PSD  matrix, we will define two boundaries demarcating parameter space within which each matrix is PSD, that is, a G PSD boundary and a  PSD boundary.Because software may use the modulus of the product of the estimates of  2 b0 and  2 b1 to estimate  01 when either  2 b0 or  2 b1 is negative and the other is positive, we also define a boundary within which the pseudo-correlation between the random intercepts and slopes based on the modulus of the product of We will use the term "G pseudo-correlation boundary" to denote a boundary of parameter space in which  01 is within [−1, 1], even though the corresponding G matrix is not PSD.This may occur when both  2 b0 and  2 b1 are negative and  01 is within [−1, 1], or when either  2 b0 or  2 b1 is negative, and the other positive, given that  01 is defined using the modulus of the product of  2 bo and  2 b1 .For  to be PSD, all of its eigenvalues must be non-negative.So,  2 e must be non-negative, as must both of the eigenvalues of D. From (8) we therefore require and The  PSD boundary is therefore defined by values of  2 b0 ,  2 b1 ,  2 b01 , and  2 e that solve the following equation: From the direction of the inequalities in (10) to (13), it follows that re-arranging ( 14) as an expression for  2 b0 in terms of  2 b1 ,  2 b01 , and  2 e (provided that the values of these parameters also satisfy (10)  ) . ( Comparing this formula with the corresponding formula for G to be PSD (Equation 9) shows that the difference between the G and  PSD bounds on  2 b0 decreases as n and q increase, and as  2 e decreases, but always remains greater than zero as long as  2 e is greater than zero.Graphical depictions of the relationships between the various random effects parameters and the G and  PSD bounds are presented in the next section.These will further illuminate the effects of changes to the parameter values on the gap between the two bounds, which represents the parameter space corresponding to non-regular RIAS models.

Plots illustrating defining boundaries
In this section, a series of plots illustrates the relationships defined in the preceding sections.The values of the parameters kept fixed in each plot have been chosen to illustrate specific aspects of these relationships, and these values change from one plot to the next.In each plot axis scales have been chosen to highlight the points illustrated.Figure 4 illustrates the relationship between the G PSD and pseudo-correlation boundaries (blue solid and dotted lines, respectively) and the  PSD boundary (pink line) for data comprising three measurement time points.The intercept (pseudo-)variance is displayed against the slope (pseudo-)variance component for fixed values of  b01 and  2 e , with shading used to characterize G and .The white area represents parameter space corresponding to both a PSD  matrix and a PSD G matrix, that is, a regular RIAS model.Conversely, the black area represents parameter space that cannot correspond to actual data because the implied  matrix is not PSD.The other seven shaded areas represent parameter space corresponding to the various types of non-regular RIAS models.Such models may imply negative values for  2 b0 (blue),  2 b1 (yellow), or both (green); the implied value of  01 may additionally be outside [−1, 1] (gray and the darker shades of blue, yellow, and green).All eight combinations of the three relevant conditions ( 2 b0 positive or negative,  2 b1 positive or negative, and  01 within or outside [−1, 1]) appear as distinct areas).
In Figures 5-8, the pink, green, purple, cyan, and orange lines in each plot represent the  PSD boundaries for differing values of n and q.The area of parameter space for which data are compatible with a RIAS model implying negative  2 b0 decreases with increasing n (Figure 5), and the area for which data are compatible with a RIAS model implying negative  2 b1 decreases with increasing q (Figure 6), when the other random effects parameters are kept constant.The comparatively large gap between the pink and green lines shows that, when n and/or q are small, a small increase (eg, increasing the number of time points from three to four) can make a considerable difference to the area of parameter space for which data are compatible with a non-regular RIAS model.The comparatively small gap between the cyan and orange lines shows that increasing n and/or q has a much smaller impact when these are already large.The area of parameter space for which data are compatible with a RIAS model implying  01 outside [−1, 1] decreases as both n and q increase (Figure 7).As the magnitude of  b01 increases (its sign has no effect, as is clear from Equation ( 14)), the area of parameter space for which observable data are compatible with a RIAS model implying negative  2 b0 and/or  2 b1 decreases for given values of n and q, and holding the residual variance constant (Figure 7).When a RIAS model implies  01 outside [−1, 1], the area for which this is accompanied by negative estimates of  2 b0 and/or  2 b1 is greater when the magnitude of  b01 is small than when it is large (Figure 7).The area of parameter space for which data are compatible with a non-regular RIAS model increases with increasing  2 e for given values of n and q (Figure 8).

Some causes of non-regularity
In the previous sections, we have characterized non-regular RIAS models.Here we consider two ways in which misspecification of a regular mixed effects model can give rise to data that are compatible with a non-regular RIAS model.

Ignoring non-linearity in fixed effects
Curvature in the relationship between the outcome and time that is not accommodated by additional fixed effects in the model can be a cause of non-regularity.We show in Section 4 of the Technical Appendix that introducing unaccounted-for curvature into a RIAS model leaves the defining eigenvalues of  unchanged, while the non-defining eigenvalue (ie, the residual variance represented by the diagonal elements of R n ) will increase in expectation.If this increase in the non-defining eigenvalue of  is such that it remains smaller than the other two (defining) eigenvalues, then fitting the RIAS model will give parameter estimates that correspond to a regular RIAS model.However, if the non-defining eigenvalue becomes larger than either of the other two then parameter estimates that correspond to a non-regular RIAS model can result.Conceptually this result can be understood by appreciating that typically the (pseudo-) covariance between slopes and intercepts is likely to remain largely unchanged by the addition of curvature that is not accounted for in the model.Sigma PSD boundary (q = 1) Sigma PSD boundary (q = 5) Sigma PSD boundary (q = 10) Sigma PSD boundary (q = 50) Sigma PSD boundary (q = 100)

F I G U R E 6
Plot illustrating the effect of changing the sum of squared times (q) while keeping the number of time points constant (n = 3), with residual variance fixed at 10 and intercept-slope (pseudo-)covariance at ±2. Abbreviation: PSD, positive semidefinite.
Residual variances will increase, and intercept and slope (pseudo-)variances decrease, with the potential for (pseudo-) correlations between slope and intercept to increase in magnitude up to and beyond 1.
A further, somewhat surprising, consequence of unaccounted for non-linearity in fixed effects is that standard errors of the fixed effects are unchanged (assuming the software used allows non-regular models).This is because, as explained in Section 4.1 and in Section 2 of the Technical Appendix, the standard errors of β depend only on the defining eigenvalues of  and these are unchanged by the addition of the non-linearity.

Omitted random effects
The omission of additional random effects can also be a cause of non-regularity.Provided that there are measurements at sufficient follow-up times, then the introduction of additional random effects, such as random quadratic terms, can be investigated analytically.However, this is not possible with a limited number of follow-up times.For example, with data at three time points, a model that includes a random quadratic term is over-parameterized and cannot be fitted.In such situations we can conceptualize a model with a random quadratic term as being more appropriate, but cannot fit such a model.If a random effect (such as a random quadratic term) is omitted from the specification of a regular linear mixed model, then it is possible that the only simple RIAS model that is compatible with the data is non-regular.Indeed, in some situations a non-regular RIAS model can be interpreted as a regular linear mixed model with one or more additional random effects.For example, we show in Section 5 of the Technical Appendix that for three evenly spaced time points, data that are compatible with the simple RIAS model are also compatible with a whole set of parameterizations of a random quadratic model.Further, at least one such parameterization will be regular.So, we can think of the non-regular RIAS model for three-point data as being generated by a random quadratic model with parameters that cannot be uniquely estimated from the data.

DISCUSSION
In this paper, we have explored characteristics of non-regular RIAS models.Balanced data can imply a non-regular RIAS model when the eigenvalues of the marginal variance-covariance matrix are all positive, but one or both of the eigenvalues of the random effects variance-covariance matrix are negative.All but two of the eigenvalues of the marginal variance-covariance matrix for the RIAS model are equal to the residual variance.We term the other two eigenvalues the "defining eigenvalues" of the marginal variance-covariance matrix.Non-regular RIAS models arise when one or both of the defining eigenvalues are positive but smaller than the residual variance.We have shown that these two defining eigenvalues are also the eigenvalues of a 2-by-2 "defining eigenvalue matrix" which has elements that depend on the random effects (pseudo-)variances and (pseudo-)covariance of the RIAS model and on the number of measurement points and the times at which they occur.Using this defining eigenvalue matrix, we have shown that the parameter space compatible with non-regular RIAS models becomes smaller as the number of time points (n) and the sum of squared centered time values (q) increase.Hence, for any given non-regular model parameters (intercept and slope pseudo-variances, their pseudo-covariance, and residual variance), increasing n and/or q will ultimately lead to the marginal variance-covariance matrix becoming negative definite.In other words, particular parameterizations of non-regular RIAS models have no generality for all choices of n and q.Nevertheless, if we allow the parameters of the random effects variance-covariance matrix to change, then however large n and q become (while remaining finite), there are viable marginal covariance structures that correspond to non-regular RIAS models, as long as the residual variance remains above zero.
To address whether non-regular RIAS models can be considered plausible data-generating processes, a comparison with non-regular RI models is illuminating.Whether or not non-regular RI models can be considered as plausible data-generating processes is a subject of some controversy. 11,14,18Clearly interpretation of a negative variance in isolation is problematic.However, the marginal variance-covariance structure implied by the non-regular RI model can be completely plausible in situations where there is competition between members of a cluster, specifically where there is a negative correlation between every pair of observations from the same cluster and all observations have the same variance.In this setting the negative intercept pseudo-variance (when considered in combination with the residual variance) in the RI model has a useful interpretation that reflects the negative correlation between pairs of measurements, giving credence to the interpretability of the non-regular RI model.However, such an interpretative stance must be coupled with a recognition that in any real-life setting the possible values a negative intercept pseudo-variance can take differ by total cluster size: specifically, as the cluster size is increased, the most negative intercept pseudo-variance that can be observed will be reduced in magnitude, because the pairwise degree of competition among the original n individuals cannot be maintained with an increased number of individuals in each cluster and the correlation between the measurements on the n individuals will change.
Even though the RI model is a special case of the more general RIAS model, it is in our view harder to find examples in which non-regular RIAS models may be plausible data-generating processes for longitudinal data.Let us return to the example of a non-regular RIAS model that describes an outcome variable that is measured either annually or every 6 months for 3 years from baseline.If we change n and q by adding the six-monthly measurements, then the bounds on the random effect pseudo-variances and pseudo-covariances will change.Consequently, it may be the case that we have a non-regular model for data at 0, 12, 24, and 36 months that cannot possibly be correct for data with additional measurements at 6, 18, and 30 months.It is then a conceptual stretch to consider this to be a credible model for the underlying data-generation process.We can also imagine taking measurements more frequently than every 6 months: indeed we often imagine that we are modeling a continuous process with measurements at least theoretically possible at an infinite number of time points, even within a fixed follow-up duration.Ideally, in situations in which n and q do not represent characteristics of the underlying data-generating process, we would like to have models that describe the data irrespective of the number and timing of the measurements.
Of course, in practice the number of possible measurement times will not be infinite.In practical settings there will be a finite time span and a finite number of points at which measurements can be made, either by design or by biological constraint.Analysts may find it sufficient to use a data-generating model that is appropriate for a finite number of visits that is unlikely to be exceeded in practice, as opposed to a model that could apply to continuously collected data that will never be observed, in which case the concerns described in the previous paragraph can be disregarded.The non-regular RIAS model can also be an appropriate data-generating model for processes with discrete time points and a fixed follow-up time, for example, in an experiment investigating the effect of a treatment for insomnia on the number of hours of sleep per night over a given period such as a week.Here concerns over what would happen to parameters if additional measurements were taken do not arise.There are also situations where measurements at intermediate time points might be expected to alter measurements at future time points because the measurements affect the underlying process under study.For example, if the outcome measure is a score on a cognitive test, then carrying out the test at 6 months could conceivably alter the test score at 12 months through familiarity.In such settings there is less concern if model parameters for data at four time points are not applicable to data at seven time points, since n and q could be considered to be an inherent part of the data generation process.Nonetheless, whether a particular non-regular model could also apply to data sets with different measurement schedules should always be considered before deciding that it represents a realistic data-generating model, rather than just being a pragmatic description of the data at hand.
0][21] It has been noted in the literature that a sufficient sample size at both levels (ie, individuals and measurements for the RIAS model used with longitudinal data, or clusters and individuals for the RI model used with clustered data) is needed for robust and unbiased inference on fixed effects. 7,224][25] To our knowledge it has not, however, been fully appreciated that non-regular RIAS models resulting from misestimation of the random effects variances are most likely to be encountered when both the number of individuals studied and the number of measurements and/or duration of follow-up are small, because the parameter space corresponding to non-regular RIAS models increases as n and/or q decrease.
Furthermore, it has generally not been appreciated that data (observed at a finite number of visits, ie, for finite n and q) may have a variance-covariance matrix that exactly corresponds to a non-regular RIAS model.In this case, increasing the number of individuals studied will not result in the RIAS model parameter estimates becoming regular.However, increasing the number of measurement points and/or duration of follow-up may allow a model that more correctly describes the data generation process to be identified.
We have shown two kinds of misspecification of a RIAS model to be causes of non-regularity.First, when the true trajectory of the outcome over time is not linear, but this non-linearity is not accounted for by the fixed effects in the model, data that in truth imply a regular model (when the fixed effect structure is correctly specified) can have a marginal variance-covariance structure that corresponds to that of a non-regular RIAS model.
Second, non-regular RIAS models can arise when the data-generating mechanism implies a model with a more complex random effects structure than that of the model fitted.We have shown that, where there are just three time points, data implying a non-regular simple RIAS model are always compatible with at least one regular model with an additional random quadratic term; however, this more complex model is not identifiable from the data.The idea that a non-regular RIAS model can reflect a more complex model with unidentifiable parameters has parallels with Molenberghs and Verbeke's interpretation of the non-regular RI model as a model with a correlation between random intercepts and residuals, with the parameters of this model not uniquely identified by the data. 14ne "solution" often proposed for handling non-regularity is to simplify the random effects structure by removing the random slope effect or setting it to zero, as is done by default (without the "nobound" option) in SAS PROC MIXED. 16,19,25While this approach may help achieve model convergence in some situations, omitting random effects that have non-zero variance results in bias in inferences concerning the fixed effects, 4,7,22,26,27 and this applies even when a variance component is negative. 11,28,29In analyzing the rat data (see Table 1), the estimates from the model fitted with SAS PROC MIXED using the "nobound" option might be regarded as reflecting the parameter values implied by the data more closely than those from the other models fitted.Contrary to the widely assumed "common sense" that singular fits "should be prevented," 24 we therefore do not believe that removing random effects with negative variance estimates is generally preferable to a convergent non-regular model which will allow unbiased fixed effects inference.
A second solution that has been suggested in the context of non-regular RI models is to use a covariance structure modeling approach, where the marginal covariance matrix is modeled directly, rather than through random effects. 18hile the compound symmetry marginal covariance matrix implied by the RI model has only two parameters regardless of the size of the clusters to be analyzed, the marginal covariance matrix for longitudinal data has n(n + 1)/2 parameters.Even when there are only three time points, the number of parameters to be estimated from the data may therefore preclude this approach as an alternative to a RIAS model.
A third solution frequently applied, because it is the basis for default methods implemented in software including R lme4 and Stata, is to use a parameterization that ensures the estimated random effects covariance matrix is PSD.While these software packages use a Cholesky decomposition, arguments have also been made for the superiority of a factor analytic approach when fitting models with more than two random effects. 7Such approaches may be helpful if convergence cannot otherwise be achieved, but arguably they represent a distortion of the parameter values implied by the data in non-regular scenarios, and result in biased fixed effect standard errors, just as omitting the random slope does (Table 1).In situations where there is no unaccounted-for non-linearity in the fixed effects in the model and the data do not support identification of a more complex random effects structure, provided interest is in the marginal interpretation of the model, we propose that using a non-regular RIAS model is an acceptable approach.
Missing data, in particular informative dropout, may also give rise to non-regular RIAS models.If individuals with unusually steep slopes drop out due to a missing-not-at-random (MNAR) mechanism, the marginal variance in the outcome will be reduced at later time points, potentially resulting in negative quadratic curvature of the marginal variances over time.In the rat experiment, increasing numbers of rats were killed over time by the anesthesia needed to obtain the growth measurements. 13Although the researchers who collected the rat data reportedly believed that there was a constant probability for a rat to drop out, given that it survived up to that point, 30,31 if the dropout were in fact dependent on unobserved outcomes then this could be a factor in the negative slope pseudo-variance observed for this dataset.We have not investigated non-regular RIAS models in situations where data are unbalanced (which can be understood as an extreme kind of missing data scenario), but similar principles likely apply.
These insights lead us to suggest the following steps for dealing with non-regular RIAS models in practice: (1) At the study design stage, investigators should be aware that later analysis of the data with a RIAS model may result in estimates that correspond to a non-regular model if measurements are collected at only three time points, over a short follow-up period, and/or in only a small number of individuals.(2) During study conduct, minimizing dropout (especially any that may be due to a MNAR mechanism) can help avoid a non-regular RIAS model.(3) Ensure any non-linearity in the relationship between the mean outcome and time is accounted for by the fixed effects in the model.(4) Try adding polynomial random effects to the model, where these are identifiable from the data.(5) Be wary of over-simplifying non-regular RIAS models as this can lead to invalid inferences regarding fixed effects.In some situations, the non-regular RIAS model can be thought of as being compatible with a more complex, regular, but (sometimes) non-identifiable model.(6) Use software options (eg, "nobound" in SAS PROC MIXED or the "stddev" option in the Stata "mixed" command) that will make visible when data are compatible with a non-regular model.
boundary (3 time points) Sigma PSD boundary (4 time points) Sigma PSD boundary (5 time points) Sigma PSD boundary (10 time points) Sigma PSD boundary (25 time points)F I G U R E 5 Plot illustrating the effect of changing the number of time points (n) while keeping the sum of squared times constant (q = 10), with residual variance fixed at 10 and intercept-slope (pseudo-)covariance at ±2. Abbreviation: PSD, positive semidefinite.
from RIAS and RI models fitted in SAS, Stata, and R.
to (13)) will give a lower  PSD bound for  2 b0 .It can also be re-arranged to give formulae for lower  PSD bounds for  2 e and  2 b1 in terms of the other parameters and for an upper  PSD bound for  2 b01 in terms of the other parameters.The bounds for  2 b0 and  2 b1 (but not  2 e ) can be negative, and the implied upper bound for  01 can be greater than 1.