Linear Increments with Non‐monotone Missing Data and Measurement Error

Abstract. Linear increments (LI) are used to analyse repeated outcome data with missing values. Previously, two LI methods have been proposed, one allowing non-monotone missingness but not independent measurement error and one allowing independent measurement error but only monotone missingness. In both, it was suggested that the expected increment could depend on current outcome. We show that LI can allow non-monotone missingness and either independent measurement error of unknown variance or dependence of expected increment on current outcome but not both. A popular alternative to LI is a multivariate normal model ignoring the missingness pattern. This gives consistent estimation when data are normally distributed and missing at random (MAR). We clarify the relation between MAR and the assumptions of LI and show that for continuous outcomes multivariate normal estimators are also consistent under (non-MAR and non-normal) assumptions not much stronger than those of LI. Moreover, when missingness is non-monotone, they are typically more efficient.


Proof of Theorem 1
When β_t^ls is fixed to equal β_t, the least-squares estimator (α_t^ls, γ_t^ls) of the remaining parameters (α_t, γ_t), computed using only those individuals with R_t = 1, is given by the standard least-squares expression. Therefore, assuming that equations (2) and (3) hold, that the measurement-error process is independent of the other processes, and that the measurement errors have mean zero, we can evaluate the expectation of this estimator and hence obtain the condition required for consistency of (α_t^ls, γ_t^ls). Similarly to equation (29), it can be shown that a corresponding identity holds when equation (4) holds.

Proof of Theorem 2
In this proof we omit the superscript 'ls' from α_t^ls and γ_t^ls.
In their Section 3.3, A&G describe their imputation method. Adapting their formulae for Y_t^est and ∆Y_t^est so that they apply to the outcomes observed with error rather than to the underlying outcomes, we obtain the corresponding imputation formulae. We use induction to prove that E(Y_t^est − Y_t | X, Y_1) = 0 for all t = 1, . . . , T.

Proof of Theorem 3
Line (38) follows from the assumption of dDTIC and the autoregressive assumption of equation (1), as we now show. From equations (40) and (42), we have that equation (43) holds for k < s ≤ t. It then follows from equations (1) and (9) that equation (44) holds. So, using equations (39), (43) and (44), we obtain line (38).
The following example shows that independent return does not imply strong independent return. However, it does not show that this matters for inference.
Note that dDTIC does hold in this example.
The MLE of θ obtained by fitting the MVN model to the observed data and ignoring the missingness mechanism is the value of θ at which the derivative with respect to θ of the log-likelihood function ∑_{i=1}^N L_i(θ) equals zero, where L_i(θ) is individual i's log-likelihood contribution and ∑_{r_k} denotes the sum over all possible k-vectors whose elements are zero or one.
Even if (Y_1, X, ε_2, . . . , ε_T) is not normally distributed, equation (46) still holds, provided that ε_t has mean zero and its variance does not depend on F_{t−1}. This is because, treated as a function of y_{it} (i = 1, . . . , N), ∑_{i=1}^N h_t(y_{it} | r_s, G_{is}(r_s); θ) depends only on E(Y_t | G_s(r_s)) (which is a linear combination of G_s(r_s)) and on Var(Y_t | G_s(r_s)), and hence depends on the distribution of (X^⊤, Y_1)^⊤ and ε_2, . . . , ε_t only through their means and variances. Now consider any r_{t−1} with final element equal to one. Using equation (13), equation (47) reduces to an expression which, by equation (46), equals zero at the true value of θ. Similarly, using equation (14), it follows that the corresponding expression also has expectation zero at the true value of θ.
Therefore equation (45) has expectation zero at the true value of θ. So, under standard regularity assumptions, the MLE of θ from the unstructured MVN model is consistent (Stefanski and Boos, 2000).
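To make the structure of this argument concrete, the following sketch (not taken from the paper; the function name observed_data_loglik and its arguments are ours and purely illustrative) computes the observed-data log-likelihood obtained when an MVN model is fitted to the outcomes and covariate while the missingness mechanism is ignored: each individual contributes the log density of whichever components were observed. The MLE discussed above is the value of θ = (µ, Σ) at which the derivative of this function equals zero.

```python
# A minimal numerical sketch (not the paper's code) of the observed-data
# log-likelihood when an MVN model is fitted while ignoring the missingness
# mechanism: each record contributes the log density of its observed components.
import numpy as np
from scipy.stats import multivariate_normal

def observed_data_loglik(data, mu, Sigma):
    """data: (N, p) array with np.nan for missing entries;
    mu: (p,) mean vector; Sigma: (p, p) covariance matrix."""
    total = 0.0
    for row in data:
        obs = ~np.isnan(row)          # indicator of which components were observed
        if not obs.any():
            continue                  # a record with nothing observed contributes nothing
        total += multivariate_normal.logpdf(
            row[obs], mean=mu[obs], cov=Sigma[np.ix_(obs, obs)]
        )
    return total
```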
The proof that autoregressive MVN yields consistent estimators when independent return holds is analogous. The parameters δ_{tj} are removed from θ, since they are constrained to equal zero. The proof for unstructured MVN continues to apply to autoregressive MVN once h_t(Y_t | r_s, G_s(r_s); θ) has been replaced by h_t(Y_t | r_s, Y_s; θ) whenever the final element of r_s equals one, and equations (13) and (14) have been replaced by versions of those equations with (X, Y_{t−1}) and (X, Y_k) in place of G_{t−1} and G_k, as Theorem 5 allows.

Proof of Theorem 5
By equation (1) and dDTIC, we obtain an identity which implies that equation (13) holds.
Next we prove that equation (14) holds.
The chain of equalities in lines (48)–(50) implies that equation (14) holds. Lines (48) and (50) follow from strong independent return. Line (49) follows from the same argument used above to prove equation (13).
The proofs are analogous when independent return, rather than strong independent return, holds.

Proof of Theorem 6
The complete-data least-squares estimators of β_t and γ_t solve a pair of simultaneous normal equations; solving these simultaneous equations yields closed-form expressions for them. The least-squares estimator of α_t then follows. If β_t is constrained to equal I, then equations (54) and (55) still hold.
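As a concrete illustration of these complete-data estimators, here is a minimal sketch for the case m = 1, assuming a working autoregressive model of the form Y_t = α_t + β_t Y_{t−1} + γ_t X + error; the function name and arguments are ours. Constraining β_t to equal I corresponds, in this scalar case, to regressing the increment Y_t − Y_{t−1} on an intercept and X.

```python
# A minimal sketch (m = 1, illustrative names) of the complete-data least-squares
# estimators, assuming a working model Y_t = alpha_t + beta_t*Y_{t-1} + gamma_t*X + error.
import numpy as np

def ls_estimates(y_prev, y_curr, x, constrain_beta_to_one=False):
    """Return (alpha_hat, beta_hat, gamma_hat) from complete-data least squares."""
    n = len(y_curr)
    if constrain_beta_to_one:
        # With beta_t fixed at 1 (the scalar analogue of beta_t = I),
        # regress the increment Y_t - Y_{t-1} on an intercept and X.
        design = np.column_stack([np.ones(n), x])
        coef, *_ = np.linalg.lstsq(design, y_curr - y_prev, rcond=None)
        alpha_hat, gamma_hat = coef
        beta_hat = 1.0
    else:
        design = np.column_stack([np.ones(n), y_prev, x])
        coef, *_ = np.linalg.lstsq(design, y_curr, rcond=None)
        alpha_hat, beta_hat, gamma_hat = coef
    return alpha_hat, beta_hat, gamma_hat
```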

Proof of Theorem 7
Suppose that data are monotone missing and that the δ_{tj}'s are constrained to equal zero; here 1_{t−1} denotes a (t − 1)-vector of ones. The maximum likelihood estimate of θ can then be obtained by fitting the models defined by equations (84) and (85) with the δ_{tj}'s omitted and estimating µ_1, µ_{T+1}, Σ_{1,1}, Σ_{1,T+1} and Σ_{T+1,T+1} by the corresponding sample means, variances and covariances. Fitting by maximum likelihood the models given by equations (84) and (85) with the δ_{tj}'s omitted is equivalent to fitting them by least squares, which is the method proposed by A&G.
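The moment estimators in this construction are simply sample moments of the baseline outcome and the covariate; the following sketch (m = 1, illustrative names, ours) computes them. The sequential models with the δ_{tj}'s omitted can be fitted by least squares exactly as in the sketch following the proof of Theorem 6.

```python
# A minimal sketch (m = 1, illustrative names) of the moment estimators described above.
import numpy as np

def baseline_moments(y1, x):
    """y1: (N,) baseline outcomes (fully observed here); x: (N,) covariate."""
    return {
        "mu_1": np.mean(y1),              # estimate of mu_1
        "mu_X": np.mean(x),               # estimate of mu_{T+1}
        "Sigma_11": np.var(y1, ddof=1),   # estimate of Sigma_{1,1}
        "Sigma_1X": np.cov(y1, x)[0, 1],  # estimate of Sigma_{1,T+1}
        "Sigma_XX": np.var(x, ddof=1),    # estimate of Sigma_{T+1,T+1}
    }
```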
When data are monotone missing, the imputed value of Y_t obtained using autoregressive MVN (aMVN) imputation can be expressed recursively in terms of the imputed value at time t − 1. Therefore, aMVN imputation is equivalent to the iterative imputation procedure that is LI imputation.

Proof of Theorem 8
We begin by proving b). So, assume that mortal-cohort independent return and independent death hold. Equations (20) and (26) then imply a chain of equalities in which line (60) follows because of independent death, line (62) follows because of mortal-cohort independent return, line (63) follows by induction, and line (64) follows by equation (57). Hence, from equation (64), equation (65) follows, and it then follows from equations (56) and (65) that independent return holds in the supplemented process. Line (69) follows from mortal-cohort independent return and equation (66).
The proof of c) is analogous to that of b). The changes are as follows: replace X by G_{k−1} in equations (56), (58)–(63) and (66)–(70), and replace equations (57) and (65) by the corresponding modified versions. Finally, we prove a).

Proof of Theorem 9
In order also to be able to discuss the use of MVN imputation for mortal-cohort inference (below), we prove the following more general version of Theorem 9.
Theorem 10. If equation (20), mortal-cohort dDTIC, mortal-cohort independent return and independent death hold, then for k < s < t the two displayed identities hold. When mortal-cohort strong independent return and strong independent death hold, (X, Y_k) on the left-hand side of these identities can be replaced by (X, G_k).

Proof
First, consider a).
Line (77) follows by mortal-cohort independent return and independent death.
Appendix S3: Proof of equation (19)

Before giving a formal proof, we provide some intuition as to why this constraint arises. The unstructured MVN model can be reparameterised as in equations (83)–(85). If it is assumed that equation (2) holds, then each δ_{tj} must equal zero. Since there are (T − 1)(T − 2)/2 matrices δ_{tj}, each of which has m^2 elements, constraining δ_{tj} = 0 reduces the number of free parameters by (T − 1)(T − 2)m^2/2. Returning to equation (19), we note that there are T(T − 1)/2 matrices Σ_{st} (s < t) and T − 1 matrices β_t, and each of these matrices has m^2 elements, so the constraint of equation (19) reduces the number of free parameters by the same number: (T − 1)(T − 2)m^2/2.
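The counting argument can be checked mechanically; the following snippet verifies that the two reductions in the number of free parameters coincide over a range of values of T and m.

```python
# A small check of the counting argument above: constraining the (T-1)(T-2)/2
# matrices delta_tj to zero removes the same number of free parameters as
# replacing the T(T-1)/2 matrices Sigma_st (s < t) by the T-1 matrices beta_t.
for T in range(2, 8):
    for m in (1, 2, 3):
        removed_delta = (T - 1) * (T - 2) // 2 * m ** 2
        removed_sigma = (T * (T - 1) // 2 - (T - 1)) * m ** 2
        assert removed_delta == removed_sigma == (T - 1) * (T - 2) * m ** 2 // 2
print("counts agree")
```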

The relation between the parameters of the original and reparameterised models is as follows. For t ≥ 2, µ_t, Σ_{t,T+1} and Σ_{t,t} are given by equations (86)–(88), and, for 1 ≤ s < t ≤ T, Σ_{s,t} is given by equation (19). Conversely, for t ≥ 2, β_t, γ_t and α_t are given by equations (15)–(17) without the hats. We now provide a formal proof of equation (19).
For 1 ≤ s < t ≤ T, we have from equation (1), using equations (54) and (55), the expression for Σ_{s,t} given in equation (90). [Throughout this proof, terms of the form ∑_{j=s+1}^{t−1} should be interpreted as being equal to zero if s = t − 1.] Note that equation (90) still holds when β_t is constrained to equal I.

Appendix S4: Random-walk MVN (rMVN) methods
Here we describe the LI-rMVN imputation and rMVN imputation methods introduced in Section 4.3.
The relation between the parameters of the original and reparameterised models is given by equations (86)–(88) and by equations (16) and (17). This model can be fitted by maximum likelihood to the outcomes Y_t + e_t observed with error, thus treating them as though they were the underlying outcomes Y_t, and ignoring the missingness mechanism (see Section S5 for the fitting algorithm). We call this the 'random-walk MVN (rMVN)' method.
Like Theorem 4, Theorem 11 does not require that the data actually be normally distributed. Equations (86) and (87) can then be used to obtain consistent estimates of µ_t and Σ_{t,T+1}. Note that the maximum likelihood estimates of Σ_{t,t} obtained using equation (88) are not consistent unless there is no measurement error. For example, the maximum likelihood estimator of Σ_{1,1} converges to Var(Y_1) + Var(e_1) as N → ∞, rather than to Σ_{1,1} = Var(Y_1). This is not a problem for LI imputation using the random-walk MVN estimates of α_t and γ_t ('LI-rMVN imputation'). It is also not a problem when imputation is carried out using equation (18) with the random-walk MVN estimates of µ and Σ ('rMVN imputation'), because the complete-data maximum likelihood estimator of the parameters of a linear regression of Y on t and/or X is not a function of Σ̂_{t,t} (see Appendix S6 for details).
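The inconsistency of Σ̂_{1,1} under measurement error is easy to see numerically. The following quick Monte Carlo sketch (ours, with arbitrary illustrative parameter values) shows the sample variance of the error-prone outcome Y_1 + e_1 estimating Var(Y_1) + Var(e_1) rather than Σ_{1,1}, while the mean and the covariance with X are unaffected.

```python
# A quick Monte Carlo illustration (illustrative parameter values) of the point
# above: with independent measurement error e_1, the sample variance of the
# error-prone outcome Y_1 + e_1 estimates Var(Y_1) + Var(e_1), not Sigma_{1,1},
# while the mean and the covariance with X are unaffected.
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
x = rng.normal(0.0, 1.0, N)
y1 = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, N)     # Var(Y_1) = 0.25 + 1 = 1.25
e1 = rng.normal(0.0, 0.5, N)                     # Var(e_1) = 0.25
y1_obs = y1 + e1

print(np.var(y1_obs, ddof=1))        # close to 1.25 + 0.25 = 1.50, not 1.25
print(np.mean(y1_obs))               # close to E(Y_1) = 1.0
print(np.cov(y1_obs, x)[0, 1])       # close to Cov(Y_1, X) = 0.5
```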

Proof of Theorem 11
To avoid confusion in this proof, we shall denote the true values of α_t and γ_t in equation (2) by α_{0t} and γ_{0t}, and the true values of µ_t = E(Y_t) and Σ_{st} = Cov(Y_s, Y_t) by µ_{0t} and Σ_{0st}. The model being fitted is given by equations (91) and (92). Note that we are not assuming in this proof that the model given by equations (91) and (92) describes the true relation between the random variables.
Equation (95) uses the assumptions that {e_t : t = 1, . . . , T} is independent of all other processes, that e_t is independent of e_s for all t ≠ s, and that E(e_t) = 0 for all t. Equation (96) follows from equation (14) with G_k replaced by (X, Y_k).

Appendix S5: EM algorithm for MVN methods
As explained in Section 4 of our paper, the standard (unstructured) MVN method does not respect the constraints on the variance given by equation (19). Schafer (1997) describes an EM algorithm for fitting the MVN model to incomplete data, and the norm package in R can be used to carry out this algorithm. In the M-step expressions, c_{s,t} (1 ≤ s, t ≤ T) denotes the sample covariance of Y_s and Y_t, c_{T+1,t} denotes the sample covariance of X and Y_t, and c_{T+1,T+1} denotes the sample variance of X.
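For concreteness, the following sketch (ours, with illustrative names; it is not the norm package's internal code) shows the standard E-step computation for an MVN model with missing components: the conditional mean and covariance of the missing part of a record given its observed part, from which the expected complete-data sums and cross-products, and hence the c_{s,t}, are accumulated.

```python
# A minimal sketch (illustrative names) of the standard E-step computation for an
# MVN model with missing components: the conditional mean and covariance of the
# missing part of a record given its observed part.
import numpy as np

def e_step_moments(row, mu, Sigma):
    """row: (p,) with np.nan for missing; returns E(Z_mis | Z_obs) and Var(Z_mis | Z_obs)."""
    mis = np.isnan(row)
    obs = ~mis
    if not obs.any():                     # nothing observed: use the marginal moments
        return mu.copy(), Sigma.copy()
    S_oo = Sigma[np.ix_(obs, obs)]
    S_mo = Sigma[np.ix_(mis, obs)]
    S_mm = Sigma[np.ix_(mis, mis)]
    w = np.linalg.solve(S_oo, row[obs] - mu[obs])
    cond_mean = mu[mis] + S_mo @ w
    cond_var = S_mm - S_mo @ np.linalg.solve(S_oo, S_mo.T)
    return cond_mean, cond_var
```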
For the random-walk MVN model, the constraints on the variance given by equation (19) with β = I need to be imposed. Again, the norm package can be used to carry out the E step of the EM algorithm, but the M step needs to be modified.
The M step can be carried out by fitting the model given by equations (83)–(85) with δ_{tj} = 0 and β_t = I and then calculating the estimates of µ and Σ using equations (86)–(88). However, we did not actually do this. Instead, we used a Newton-Raphson algorithm to maximise the observed-data likelihood directly.
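The following generic sketch (ours, not the implementation used in the paper) shows what maximising the observed-data likelihood directly might look like. For brevity it uses a quasi-Newton (BFGS) optimiser in place of Newton-Raphson and parameterises Σ through an unconstrained Cholesky factor; imposing the random-walk constraints would instead require building µ and Σ from the model parameters via equations (86)–(88).

```python
# A generic sketch (not the paper's implementation) of maximising the observed-data
# log-likelihood directly with a numerical optimiser; Sigma is parameterised through
# an unconstrained Cholesky factor rather than the paper's random-walk constraints.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

def fit_mvn_direct(data):
    """data: (N, p) array with np.nan for missing entries."""
    p = data.shape[1]
    tril = np.tril_indices(p)

    def unpack(theta):
        mu = theta[:p]
        L = np.zeros((p, p))
        L[tril] = theta[p:]
        return mu, L @ L.T + 1e-8 * np.eye(p)     # keep Sigma positive definite

    def neg_loglik(theta):
        mu, Sigma = unpack(theta)
        total = 0.0
        for row in data:                           # observed-data log-likelihood,
            obs = ~np.isnan(row)                   # as in the earlier sketch
            if obs.any():
                total += multivariate_normal.logpdf(
                    row[obs], mean=mu[obs], cov=Sigma[np.ix_(obs, obs)]
                )
        return -total

    theta0 = np.concatenate([np.nanmean(data, axis=0), np.eye(p)[tril]])
    fit = minimize(neg_loglik, theta0, method="BFGS")
    return unpack(fit.x)
```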
Appendix S6: MVN imputation as a method for estimating parameters of a linear regression model

For simplicity, assume that Y is univariate (i.e. m = 1). However, in the following, Y could easily be replaced by one of the m univariate elements of a vector Y.
First, we shall show that the complete-data maximum likelihood estimates of the linear regression model are functions of the complete-data statistics X̄, Ȳ_t, c_{T+1,t} and c_{T+1,T+1}. Second, we shall show that the values of X̄, Ȳ_t, c_{T+1,t} and c_{T+1,T+1} (t = 1, . . . , T) in the imputed dataset are equal to, respectively, µ̂_{T+1}, µ̂_t, Σ̂_{T+1,t} and Σ̂_{T+1,T+1}. This implies that performing MVN imputation and then fitting the linear regression model of equation (98) to the imputed data gives the same estimates of ψ_0, ψ_1 and ψ_2 as applying the aforementioned functions to µ̂_{T+1}, µ̂_t, Σ̂_{T+1,t} and Σ̂_{T+1,T+1} directly. The complete-data maximum likelihood estimates are given by equations (99)–(101); therefore ψ̂ is a function of X̄, Ȳ_t, c_{T+1,t} and c_{T+1,T+1}.
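The following small illustration (ours; it uses a simple regression of Y_t on X alone as an assumed simplification of the model in equation (98)) confirms that the complete-data least-squares estimates computed from the raw data coincide with those computed from the sample moments X̄, Ȳ_t, c_{T+1,t} and c_{T+1,T+1} alone.

```python
# A small illustration (simple regression of Y_t on X; an assumed simplification of
# the model in equation (98)) that the complete-data least-squares estimates are
# functions of the sample moments only.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y_t = 2.0 + 3.0 * x + rng.normal(size=500)

# directly from the raw data
design = np.column_stack([np.ones_like(x), x])
psi_raw, *_ = np.linalg.lstsq(design, y_t, rcond=None)

# from the sample moments alone
x_bar, y_bar = x.mean(), y_t.mean()
c_xy = np.cov(x, y_t)[0, 1]          # c_{T+1,t}
c_xx = np.var(x, ddof=1)             # c_{T+1,T+1}
psi_1 = c_xy / c_xx
psi_0 = y_bar - psi_1 * x_bar

print(np.allclose(psi_raw, [psi_0, psi_1]))   # True
```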
At convergence of the EM algorithm for fitting the MVN model (whether unstructured, autoregressive or random-walk), the expected values of X̄, Ȳ_t, c_{T+1,t} and c_{T+1,T+1} given the observed data are equal to µ̂_{T+1}, µ̂_t, Σ̂_{T+1,t} and Σ̂_{T+1,T+1} (see Section S5). Since X is fully observed, X̄ and c_{T+1,T+1} are observed, and µ̂_{T+1} and Σ̂_{T+1,T+1} are equal to them. The values of Ȳ_t and c_{T+1,t} are not observed, but because X is fully observed their expected values given the observed data can be calculated by application of equation (18). Since this is precisely what is done in MVN imputation, the values of Ȳ_t and c_{T+1,t} calculated from the imputed data will equal µ̂_t and Σ̂_{T+1,t}. Thus, whether one applies equations (99)–(101) to the imputed data or substitutes µ̂_{T+1}, µ̂_t, Σ̂_{T+1,t} and Σ̂_{T+1,T+1} for X̄, Ȳ_t, c_{T+1,t} and c_{T+1,T+1} in equations (99)–(101), one gets the same value of ψ̂.
Conditioning on (U, V, {R_{0,t} = 0}) means the same as conditioning on (U, V), and so equation (102) implies the required result.
If ε_{it} is not normally distributed, formula (18) will not correspond, in general, to the conditional expectation of Y_t given G_T when R_l = 1 for some l > t. Nevertheless, multiple imputation using the unstructured MVN model has often been found to work well in practice when data are MAR, even when they are not normally distributed (Schafer, 1997; Schafer and Graham, 2002; Lee and Carlin, 2010; Demirtas et al., 2008), and so there is cause to think that uMVN imputation may also work well in practice.
Appendix S8: Further results from Simulation Studies 1 and 2, and Simulation Study 3

Tables S1 and S2 show the results of fitting the linear regression model in Simulation Studies 1 and 2, respectively, of Section 6 of our paper.
For Simulation Study 1 we also modified the return mechanism so that the independent return assumption was violated. In particular, logit{P(R_{0,t} = 1 | R_{0,t−1} = 0, R_{0,t−2}, F_T)} = φ_t + X + (Y_{t−1} + Y_{t−2})/2, where φ_t is chosen to make P(R_{0,t} = 1 | R_{0,t−1} = 0) = 0.5. The results are shown in Tables S3 and S4.

In Simulation Study 3, data were generated from the same model as in Simulation Study 1 except that β_t = 1.2 was replaced with β_t = 1, and independent measurement error e_{it} was added to the underlying outcomes. The errors e_{it} were generated from the same bimodal distribution as ε_{it}. The ω_t and φ_t values were again chosen so that P(R_{0,t} = 0 | R_{0,t−1} = 1) = 0.5 and P(R_{0,t} = 1 | R_{0,t−1} = 0) = 0.5. For each of 1000 simulated datasets we applied the same methods as in Section 6.1. We additionally applied these methods constraining β_t = 1. Tables S5 and S6 show the means and empirical SEs of the estimators of µ_t and (ψ_0, ψ_1, ψ_2, ψ_3), respectively. As expected, the methods that constrain β_t = 1 are approximately unbiased and the methods that do not impose this constraint are biased. LI-LS imputation is more efficient than estimating the compensator, and LI-rMVN imputation is yet more efficient. There is little gain from using rMVN imputation compared to using LI-rMVN imputation.
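For readers wishing to reproduce this kind of return mechanism, the following minimal sketch (ours; the function name and arguments are illustrative) generates the return indicator R_{0,t} from the stated logit model. The calibration of φ_t so that P(R_{0,t} = 1 | R_{0,t−1} = 0) = 0.5, and the separate dropout mechanism governed by ω_t, are not shown.

```python
# A minimal sketch (illustrative only) of generating the modified return indicator
# R_{0,t} from the stated mechanism
#   logit{P(R_{0,t} = 1 | R_{0,t-1} = 0, F_T)} = phi_t + X + (Y_{t-1} + Y_{t-2}) / 2.
# phi_t is taken as given; its calibration and the dropout branch (omega_t) are omitted.
import numpy as np

def simulate_return(r_prev, x, y_lag1, y_lag2, phi_t, rng):
    """r_prev: (N,) 0/1 observation indicator at t-1; returns the indicator at t."""
    lin_pred = phi_t + x + (y_lag1 + y_lag2) / 2.0
    prob_return = 1.0 / (1.0 + np.exp(-lin_pred))   # inverse logit
    r_t = np.where(r_prev == 0,
                   rng.binomial(1, prob_return),    # those absent at t-1 may return
                   r_prev)                          # dropout branch not modelled here
    return r_t
```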

Appendix S9: Software for LI methods
The LI-LS imputation method can be applied using the FLIM package in R (Hoff, 2014).

Table S3: Means and empirical SEs of estimated µ_t in Simulation Study 1 when the return mechanism is modified to violate the independent return assumption.

Table S6: Means and empirical SEs of estimated ψ_0, ψ_1, ψ_2 and ψ_3 in Simulation Study 3. LI-LS imputation is applied both with β_t estimated and with β_t constrained to equal 1.