#### The multilevel item response model

We first introduce the MLIRT model. Let *y*_{ijk} (binary, ordinal, or continuous) be the observed outcome *k* (*k* = 1, … ,*K*) from individual *i* (*i* = 1, … ,*N*) at visit *j* (*j* = 1, … ,*J*_{i}, where *j* = 1 is baseline and *J*_{i} is the number of visits of individual *i*). Let *y*_{ij} = (*y*_{ij1}, … ,*y*_{ijK}) ′ be the vector of observations for individual *i* at visit *j* and let *y*_{i} = (*y* ′ _{i1}, … ,*y* ′ _{iJ_{i}}) ′ be the outcome vector across visits. Throughout, all outcomes are coded so that larger observed values correspond to worse clinical conditions. Let *θ*_{ij} be a univariate latent variable measuring the disease severity of individual *i* at visit *j*, with a higher value denoting more severe status. In the first-level measurement model, we model the binary outcomes, the cumulative probabilities of the ordinal outcomes, and the continuous outcomes as functions of *θ*_{ij} and outcome-specific parameters.

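$$\operatorname{logit}\{p(y_{ijk} = 1 \mid \theta_{ij})\} = a_{k} + b_{k}\theta_{ij} \qquad (1)$$

$$\operatorname{logit}\{p(y_{ijk} \leqslant l \mid \theta_{ij})\} = a_{kl} - b_{k}\theta_{ij}, \quad l = 1, \ldots, n_{k} - 1 \qquad (2)$$

$$y_{ijk} = a_{k} + b_{k}\theta_{ij} + \varepsilon_{ijk} \qquad (3)$$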

where *ε*_{ijk} ∼ *N*(0,*σ*_{k}^{2}) is the random error of continuous outcome *k*, *a*_{k} is the outcome-specific ‘difficulty’ parameter, and *b*_{k} (always positive) is the outcome-specific ‘discriminating’ parameter, that is, the degree to which outcome *k* discriminates between individuals with different latent disease severity *θ*_{ij}. In model (2), the *k*th ordinal outcome has *n*_{k} categories and *n*_{k} − 1 thresholds with the order constraint *a*_{k1} < … < *a*_{kl} < … < *a*_{k,n_{k} − 1}. The probability that individual *i* is in category *l* on outcome *k* at visit *j* is *p*(*Y* _{ijk} = *l* | *θ*_{ij}) = *p*(*Y* _{ijk} *⩽ l* | *θ*_{ij}) − *p*(*Y* _{ijk} *⩽ l* − 1 | *θ*_{ij}).
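The differencing of cumulative probabilities can be illustrated with a minimal sketch. It assumes the cumulative logit takes the common form logit *p*(*Y* ⩽ *l* | *θ*) = *a*_{l} − *bθ*; the function names `expit` and `ordinal_category_probs` are hypothetical and used only for illustration.

```python
import math


def expit(x):
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def ordinal_category_probs(theta, thresholds, b):
    """Return [P(Y = 1), ..., P(Y = n_k)] for one ordinal outcome.

    Assumes logit P(Y <= l | theta) = a_l - b * theta (an assumed sign
    convention), with strictly increasing thresholds and b > 0.
    """
    if b <= 0:
        raise ValueError("discrimination parameter b must be positive")
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(Y <= l) for l = 1..n_k; the last one is 1.
    cumulative = [expit(a - b * theta) for a in thresholds] + [1.0]
    probs = []
    previous = 0.0  # P(Y <= 0) = 0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Usage: a four-category outcome evaluated at two severity levels.
mild = ordinal_category_probs(theta=0.0, thresholds=[-1.0, 0.0, 1.0], b=1.0)
severe = ordinal_category_probs(theta=2.0, thresholds=[-1.0, 0.0, 1.0], b=1.0)
```

Under this sign convention, a larger *θ* moves probability mass toward the higher (worse) categories, which matches the coding of the outcomes described above.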

In the second level structural multilevel model, the latent disease severity *θ*_{ij} is regressed on covariates of interest, visit time, and random effects.

$$\theta_{ij} = X_{i0}\beta_{0} + u_{i0} + (X_{i1}\beta_{1} + u_{i1})\,t_{ij} \qquad (4)$$

where *X*_{i0} and *X*_{i1} are vectors of covariates associated with baseline disease severity and disease progression rate, respectively (*X*_{i0} may or may not be the same as *X*_{i1}), *β*_{0} and *β*_{1} are the corresponding coefficient vectors, *t*_{ij} is the visit time with *t*_{i1} = 0 at baseline, and the random intercept *u*_{i0} and random slope *u*_{i1} determine the subject-specific baseline disease severity and disease progression rate, respectively. It is assumed that *u*_{i} = (*u*_{i0},*u*_{i1}) ′ follows *N*_{2}(0,*Σ*), with covariance matrix $\Sigma = \begin{pmatrix} 1 & \rho\sigma_{u} \\ \rho\sigma_{u} & \sigma_{u}^{2} \end{pmatrix}$, where the variance of *u*_{i0} is set to 1 for identifiability, *σ*_{u}^{2} is the variance of *u*_{i1}, and *ρ* is the correlation coefficient. The random effects vector *u*_{i} takes into account all three sources of correlation illustrated in Section 1. For example, if *θ*_{ij} = *u*_{i0} + (*β*_{10} + *β*_{11}*x*_{i} + *u*_{i1})*t*_{ij}, where *x*_{i} is the treatment indicator (1 if treatment and 0 otherwise), then a significantly negative coefficient *β*_{11} indicates that the treatment slows disease progression. The combined level 1 and level 2 models are MLIRT with subject-specific covariance (referred to as subject-specific MLIRT models) [24, 25, 23, 26, 27, 44, 45].
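As an illustration of the structural model, the sketch below draws a random intercept and slope with Var(*u*_{i0}) = 1, Var(*u*_{i1}) = *σ*_{u}^{2}, and correlation *ρ*, and then evaluates the treatment example *θ*_{ij} = *u*_{i0} + (*β*_{10} + *β*_{11}*x*_{i} + *u*_{i1})*t*_{ij}. The function names and all numerical values are assumptions chosen for illustration, not estimates from any data set.

```python
import math
import random


def draw_random_effects(rho, sigma_u, rng):
    """Draw (u0, u1) from a bivariate normal with Var(u0) = 1,
    Var(u1) = sigma_u**2 and Corr(u0, u1) = rho, built from two
    independent standard normal draws."""
    z0 = rng.gauss(0.0, 1.0)
    z1 = rng.gauss(0.0, 1.0)
    u0 = z0
    u1 = sigma_u * (rho * z0 + math.sqrt(1.0 - rho * rho) * z1)
    return u0, u1


def latent_severity(times, x_treat, beta10, beta11, u0, u1):
    """theta_ij = u0 + (beta10 + beta11 * x + u1) * t_ij for each visit time."""
    slope = beta10 + beta11 * x_treat + u1
    return [u0 + slope * t for t in times]


# Usage with illustrative (assumed) values: the same subject-level random
# effects under control and under treatment.
rng = random.Random(2024)
u0, u1 = draw_random_effects(rho=0.3, sigma_u=0.5, rng=rng)
visits = [0.0, 0.5, 1.0, 1.5]
theta_control = latent_severity(visits, 0, beta10=0.4, beta11=-0.2, u0=u0, u1=u1)
theta_treated = latent_severity(visits, 1, beta10=0.4, beta11=-0.2, u0=u0, u1=u1)
```

With a negative *β*_{11}, the treated trajectory starts at the same baseline value and rises more slowly than the control trajectory for the same random effects.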

Let *t*_{i} denote the time to the terminal event for individual *i*, *δ*_{i} denote the censoring indicator for *t*_{i} (1 if the terminal event is observed and 0 otherwise), and *X*_{i} denote the vector of possible risk factors with the first element being 1. Vector *X*_{i} can share some or all of the covariates in *X*_{i0} and *X*_{i1}. The regular AFT model can be expressed as log(*t*_{i}) = *X*_{i}*γ* + *σ*_{ε}*ε*_{i}, where *γ* is the vector of unknown coefficients, *ε*_{i} is an independent random error, and *σ*_{ε} is the scale parameter. The correlation between the time to a terminal event and the longitudinal outcomes can be accounted for by sharing the random effects *u*_{i0} and *u*_{i1} in the AFT model as

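$$\log(t_{i}) = X_{i}\gamma + \eta_{0}u_{i0} + \eta_{1}u_{i1} + \sigma_{\varepsilon}\varepsilon_{i} \qquad (5)$$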

When *η*_{0} and *η*_{1} are not equal to zero, the correlation between the survival time and longitudinal outcomes is incorporated, and the random effects have different effects on *θ*_{ij} and *t*_{i}.
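The role of *η*_{0} and *η*_{1} can be seen in a short sketch. It assumes the shared random effects enter the log event time additively, log(*t*_{i}) = *X*_{i}*γ* + *η*_{0}*u*_{i0} + *η*_{1}*u*_{i1} + *σ*_{ε}*ε*_{i}; the function name `aft_event_time` and the numerical values are hypothetical.

```python
import math


def aft_event_time(x, gamma, u0, u1, eta0, eta1, sigma_eps, eps):
    """Event time under an assumed shared-random-effects AFT model:
    log t = x'gamma + eta0 * u0 + eta1 * u1 + sigma_eps * eps."""
    linear_predictor = sum(xj * gj for xj, gj in zip(x, gamma))
    log_t = linear_predictor + eta0 * u0 + eta1 * u1 + sigma_eps * eps
    return math.exp(log_t)


# Usage with assumed values: x has a leading 1 for the intercept.
x = [1.0, 0.0]
gamma = [1.5, 0.3]

# Joint model (eta0, eta1 nonzero): a more severe subject (larger u0, u1)
# has a shorter event time when eta0 and eta1 are negative.
t_mild = aft_event_time(x, gamma, -1.0, -0.2, -0.5, -0.8, 0.6, eps=0.0)
t_severe = aft_event_time(x, gamma, 1.0, 0.2, -0.5, -0.8, 0.6, eps=0.0)

# Reduced model (eta0 = eta1 = 0): the event time ignores the random effects.
t_reduced_a = aft_event_time(x, gamma, -1.0, -0.2, 0.0, 0.0, 0.6, eps=0.0)
t_reduced_b = aft_event_time(x, gamma, 1.0, 0.2, 0.0, 0.0, 0.6, eps=0.0)
```

Setting both coefficients to zero removes the link between the latent severity trajectory and the event time, which is the independence case.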

Let *f*_{0}( · ), *S*_{0}( · ), and *h*_{0}( · ) denote the density, survival, and hazard functions of the random error *ε*_{i} in model (5), respectively. Let *f*( · ), *S*( · ), and *h*( · ) denote the density, survival, and hazard functions of the event time *t*_{i}, respectively. Then we have the following relationships: *f*(*t*_{i}) = *f*_{0}(*ε*_{i}) ∕ (*σ*_{ε}*t*_{i}), *S*(*t*_{i}) = *S*_{0}(*ε*_{i}), and *h*(*t*_{i}) = *h*_{0}(*ε*_{i}) ∕ (*σ*_{ε}*t*_{i}), where *ε*_{i} = [log(*t*_{i}) − *X*_{i}*γ* − *η*_{0}*u*_{i0} − *η*_{1}*u*_{i1}] ∕ *σ*_{ε}. Common density functions and the corresponding survival functions for *ε*_{i} are summarized in Table 1. When *ε*_{i} follows the normal, logistic, and extreme value distributions, the event time *t*_{i} follows the log-normal, log-logistic, and Weibull distributions, respectively. For example, if *ε*_{i} follows the normal distribution with *S*_{0}(*ε*_{i}) = 1 − Φ(*ε*_{i}), where Φ( · ) denotes the cumulative standard normal distribution function, then *S*(*t*_{i}) = 1 − Φ(*ε*_{i}). Solving this equation for *t*_{i} gives *t*_{i} = exp{*X*_{i}*γ* + *η*_{0}*u*_{i0} + *η*_{1}*u*_{i1} + *σ*_{ε}Φ^{ − 1}(1 − *S*(*t*_{i}))}, where Φ^{ − 1} is the inverse function of Φ( · ). Under the local independence assumption (i.e., conditional on the random effects vector *u*_{i}, all components in *y*_{ij} and *t*_{i} are independent), the full likelihood for individual *i* is

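$$L(y_{i}, t_{i}, \delta_{i}, u_{i}) = \left[\prod_{j=1}^{J_{i}} \prod_{k=1}^{K} p(y_{ijk} \mid u_{i})\right] h(t_{i} \mid u_{i})^{\delta_{i}}\, S(t_{i} \mid u_{i})\, p(u_{i}) \qquad (6)$$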

Table 1. Common distributions of *ε*_{i} in the parametric AFT models. *ϕ*( · ) and Φ( · ) denote the probability density function and cumulative distribution function of the standard normal distribution.

| Distribution | *f*_{0}( · ) | *S*_{0}( · ) |
|---|---|---|
| Normal | *ϕ*(*ε*_{i}) | 1 − Φ(*ε*_{i}) |
| Logistic | exp(*ε*_{i}) ∕ [1 + exp(*ε*_{i})]^{2} | 1 ∕ (1 + exp(*ε*_{i})) |
| Extreme value | exp(*ε*_{i} − exp(*ε*_{i})) | exp( − exp(*ε*_{i})) |
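The entries of Table 1 translate directly into the event-time part of the likelihood. The sketch below is a minimal illustration, not the authors' implementation: it uses the standard change of variables for log *t* = *μ* + *σ*_{ε}*ε*, namely *f*(*t*) = *f*_{0}(*ε*) ∕ (*σ*_{ε}*t*) and *S*(*t*) = *S*_{0}(*ε*), where *μ* stands for the whole linear predictor (an assumption of this sketch). The names `ERROR_DISTS` and `event_loglik` are hypothetical.

```python
import math


def _phi(e):
    """Standard normal density."""
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)


def _Phi(e):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(e / math.sqrt(2.0)))


# (f0, S0) pairs for the error distributions listed in Table 1.
ERROR_DISTS = {
    "normal": (_phi, lambda e: 1.0 - _Phi(e)),
    "logistic": (
        lambda e: math.exp(e) / (1.0 + math.exp(e)) ** 2,
        lambda e: 1.0 / (1.0 + math.exp(e)),
    ),
    "extreme_value": (
        lambda e: math.exp(e - math.exp(e)),
        lambda e: math.exp(-math.exp(e)),
    ),
}


def event_loglik(t, delta, mu, sigma_eps, dist):
    """Log-likelihood contribution of one event time.

    Observed event (delta = 1): log f(t) = log f0(eps) - log(sigma_eps * t).
    Censored time  (delta = 0): log S(t) = log S0(eps).
    Here eps = (log t - mu) / sigma_eps and mu is the linear predictor.
    """
    if delta not in (0, 1):
        raise ValueError("delta must be 0 or 1")
    f0, s0 = ERROR_DISTS[dist]
    eps = (math.log(t) - mu) / sigma_eps
    if delta == 1:
        return math.log(f0(eps)) - math.log(sigma_eps * t)
    return math.log(s0(eps))


# Usage: with extreme value errors, mu = 0 and sigma_eps = 1, the event
# time is exponential with rate 1, so log f(t) = log S(t) = -t.
ll_event = event_loglik(2.0, 1, mu=0.0, sigma_eps=1.0, dist="extreme_value")
ll_censored = event_loglik(2.0, 0, mu=0.0, sigma_eps=1.0, dist="extreme_value")
```

The exponential special case is a convenient sanity check because both contributions reduce to −*t*.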

For notational convenience, we let the observed data be *y* = {*y*_{ijk}} ∪ {*t*_{i}} ∪ {*δ*_{i}}, the difficulty parameter vector be *a* = (*a* ′ _{1}, … ,*a* ′ _{k}, … ,*a* ′ _{K}) ′ , with *a*_{k} = (*a*_{k1}, … ,*a*_{k,n_{k} − 1}) ′ for the ordinal outcomes, the discrimination vector be *b* = (*b*_{1}, … ,*b*_{K}) ′ , the regression coefficient vector be *β* = (*β* ′ _{0},*β* ′ _{1}) ′ , and the parameter vector be *Ψ* = (*a* ′ ,*b* ′ ,*β* ′ ,*ρ*,*σ*_{u},*σ*_{k},*γ*,*σ*_{ε},*η*_{0},*η*_{1}) ′ . We refer to the proposed joint modeling framework assuming the log-normal, log-logistic, and Weibull distributions for survival time as models JM_{LN}, JM_{LL}, and JM_{W}, respectively. In addition, we consider reduced models assuming independence between the survival time and longitudinal outcomes (i.e., *η*_{0} = *η*_{1} = 0). We refer to the reduced models assuming the log-normal, log-logistic, and Weibull distributions for survival time as models RM_{LN}, RM_{LL}, and RM_{W}, respectively.

#### Bayesian inference

To infer the unknown parameter vector *Ψ*, we use Bayesian inference on the basis of MCMC posterior simulations. We use vague priors on all elements of the parameter vector *Ψ*. Specifically, the prior distributions of all elements of *β* and *γ*, as well as *η*_{0} and *η*_{1}, are *N*(0,100). We use the prior distribution *b*_{k} ∼ Gamma(0.01,0.01), *k* = 1, … ,*K*, to ensure positivity. The prior distribution for the difficulty parameter *a*_{k} of the continuous outcomes is *a*_{k} ∼ *N*(0,2000) because some continuous measurements are quite large. To obtain the prior distributions for the threshold parameters of ordinal outcome *k*, we let *a*_{k1} ∼ *N*(0,100) and *a*_{kl} = *a*_{k,l − 1} + *δ*_{l} for *l* = 2, … ,*n*_{k} − 1, with *δ*_{l} ∼ *N*(0,100)*I*(0, ∞), that is, a normal distribution restricted to positive values (left truncated at 0), so that the order constraint on the thresholds is satisfied. We use the prior distributions *ρ* ∼ Uniform[ − 1,1] and *σ*_{k},*σ*_{u},*σ*_{ε} ∼ Gamma(0.01,0.01).
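The threshold prior can be illustrated with a short sketch that builds ordered thresholds from a first threshold plus positive increments. It assumes that the second argument of *N*(0,100) is a variance (standard deviation 10), which the text does not state; the function names are hypothetical.

```python
import random


def draw_positive_normal(sd, rng):
    """Draw from N(0, sd**2) restricted to (0, inf). For a zero-mean
    normal this is a half-normal, i.e. the absolute value of a draw."""
    return abs(rng.gauss(0.0, sd))


def draw_thresholds(n_categories, sd, rng):
    """Draw the n_categories - 1 ordered thresholds of one ordinal outcome:
    the first threshold is N(0, sd**2) and each later threshold adds a
    positive increment to the previous one."""
    thresholds = [rng.gauss(0.0, sd)]
    for _ in range(n_categories - 2):
        thresholds.append(thresholds[-1] + draw_positive_normal(sd, rng))
    return thresholds


# Usage: one prior draw for a five-category outcome (sd = 10 is an
# assumption corresponding to a variance of 100).
rng = random.Random(7)
prior_draw = draw_thresholds(n_categories=5, sd=10.0, rng=rng)
```

Because every increment is positive, each draw satisfies the order constraint by construction, so no draws need to be rejected.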

Model fitting is performed in OpenBUGS (version 3.2.2) by specifying the likelihood function and the prior distributions of all unknown parameters. We use the history plots available in OpenBUGS and view the absence of an apparent trend in the plots as evidence of convergence. In addition, we use the Gelman–Rubin diagnostic to ensure that the potential scale reduction factors of all parameters are smaller than 1.1 [46].
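For readers unfamiliar with the diagnostic, the sketch below computes the basic potential scale reduction factor for one scalar parameter from several chains of equal length. It follows the textbook between-chain and within-chain variance formula and is not necessarily the variant that OpenBUGS reports; the function name `gelman_rubin` is hypothetical.

```python
import math
import random


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one scalar parameter.

    chains: list of m >= 2 lists, each holding n >= 2 posterior draws.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [sum(c) / n for c in chains]
    grand_mean = sum(chain_means) / m
    # Between-chain variance B and mean within-chain variance W.
    between = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in chain_means)
    within = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1)
        for c, mu in zip(chains, chain_means)
    ) / m
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Usage with simulated draws: three chains targeting the same distribution,
# and two chains centered at different values.
rng = random.Random(1)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 5.0)]
r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

Values close to 1 suggest that the chains agree with one another, whereas chains centered at different values give a factor well above the 1.1 cutoff.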