Models and inference for on–off data via clipped Ornstein–Uhlenbeck processes

We introduce a model for recurrent event data subject to left-, right-, and intermittent censoring. The observations consist of binary sequences (along with covariates) for each individual under study. These sequences are modeled as generated by latent Ornstein-Uhlenbeck processes being above or below certain thresholds. Features of the latent process and the thresholds are taken as functions of covariates, allowing the researcher to distinguish factors that have an effect on the frailty of the observational unit from those that have an effect on its variability. Inference is achieved by a quasi-likelihood approach, for which consistency and asymptotic normality are established. An advantage of our model is that particularities regarding the censoring need not be actively taken into account, and that it is well suited to situations where the individuals under study are irregularly and asynchronously observed. The motivation for our model came from a dataset pertaining to the incidence of diarrhoea among Brazilian children growing up under rather harsh conditions. We analyze these data with our model and contrast the results with an intensity-based counting process analysis of the same data.


INTRODUCTION
Many phenomena, in the biosciences, epidemiology, engineering, and the social sciences, are recorded over time as being in one of two states, "on or off." An epidemiologist has follow-up data on the health history of a number of individuals; an engineer on the stress level of a given component over time; an economist has data on the employment status of a group of individuals surveyed at various points in time. Such data are often binary (sick/healthy, overheated/temperate, employed/unemployed), and the data gathering is not performed continuously in time, but irregularly, with differing lengths between the observation times. Observational schemes of this kind abound in many fields of science, and have in common that they generate sequences of zeros and ones where the state of the observational unit between the observation times is unknown (i.e., censored). The statistical modeling and analysis of this kind of data can be quite challenging, because one needs to attend to the dependence in time as well as account for rather complicated patterns of censoring. An appealing feature of the model we introduce in this paper is that both the temporal dependence and all types of censoring are accommodated in a straightforward and neat manner, requiring minimal statistical ingenuity on the part of the applied researcher doing the analysis. The data that inspired the present study have previously been analyzed in Borgan, Fiaccone, Henderson, and Barreto (2007), and are shown in Figure 1. The plot shows the health status of a sample of Brazilian children living in the metropolitan area of Salvador, observed at discrete and irregular times over a period of 455 days. In the figure the children are plotted according to their identification number (the y-axis), with the color of the dots indicating whether the child suffered from diarrhoea (black dots) or was in good health (grey dots) at the time of observation.
The blank areas are times at which no observation was made; thus we see that the data are left-, right-, and intermittently censored. That the observations took place at discrete and irregular times means that the data consist of a time stamp and an indicator of the state of the child (sick/healthy) at that time, along with covariates. (In Figure 1 we have only included every 10th child in the sample of 925, because the resolution becomes problematic when the entire dataset is included.) The pattern depicted among the 93 children in the plot is characteristic of the entire sample.
Approaches to data such as those in Figure 1 include intensity-based counting-process methods, renewal processes, and models for the time between events (Andersen, Borgan, Gill, & Keiding, 1993; Borgan et al., 2007). A comprehensive overview of recurrent event methods is given by Cook and Lawless (2007). Borgan et al. (2007) developed a recurrent event version of Aalen's additive hazard model, and fitted such models to the Brazilian data.
In this paper we introduce a class of models where the sequences of zeros and ones are generated by latent and independent Ornstein-Uhlenbeck processes being below or above certain thresholds. Differences between the observational units are accounted for by letting features of the Ornstein-Uhlenbeck processes as well as the thresholds be governed by covariates. Each covariate might enter into one or both of the regression structures of the model, thus allowing for inference on the effect of each covariate on the temporal correlation of the underlying process, its effect on the level of the process, or its effect on both these features. In the setting of the Brazilian data, this means that our model enables us to say something about how different factors affect the fluctuation of a child's health, and on how these same or other factors affect the frailty of a child.
These two features of the model are rather different, and subject matter knowledge on a case-to-case basis is required to decide which covariates should enter where. Given that such knowledge is available, the fact that our model allows the covariates to affect the data generating process in two different ways is appealing, as it potentially permits a more detailed understanding of the phenomenon under study. As already mentioned, another appealing property of our method, and one which sets it apart from the above-mentioned approaches, is that by explicitly modeling the mechanism generating the zeros and ones, all three types of censoring are taken care of without further effort.

FIGURE 1  Observation pattern and actual observations for the diarrhoea data (for every 10th child in the sample of 925), plotted against time in days. The grey dots indicate that the child was healthy at the observation time, and the black dots indicate that the child was sick. The white areas are time points at which no observations were made.
The paper proceeds as follows. In Section 2 we introduce the latent processes ticking in the background as well as the observational scheme, based on which we derive our model. The complexity of the likelihood of the model presented in Section 2 makes it computationally infeasible to maximize; in Section 2.2 we therefore present what we call the quasi-likelihood method of estimation. This and related estimation methods are variably called quasi-, pairwise-, and composite-likelihood (Cox & Reid, 2004; Hjort & Omre, 1994; Hjort & Varin, 2008; Nott & Rydén, 1999; Varin, 2008; Varin, Reid, & Firth, 2011; Varin & Vidoni, 2005). A lucid review of quasi-/composite-likelihood methods is Varin et al. (2011). In Sections 3.2-3.3 we prove consistency of the maximum quasi-likelihood estimator, and derive its limiting distribution. Section 3.4 contains a study of the maximum quasi-likelihood estimator when the chosen parametric model is misspecified, a model selection criterion is derived, and a goodness-of-fit measure based on the ratio of two nested quasi-likelihoods is introduced. In Section 3.5 we conduct a small simulation study to assess the asymptotic results in a finite-sample setting. Finally, in Section 4 we analyze the Brazilian data using clipped Ornstein-Uhlenbeck process models, and contrast the results with those obtained by the linear hazard model of Borgan et al. (2007). Most of the proofs are deferred to the Appendix.

A clipped Ornstein-Uhlenbeck process
The data consist of n children observed at various points in time over a finite interval [0, T]. Associated with each of the i = 1, …, n children there is a latent stochastic process {X_i(t) : 0 ≤ t ≤ T} governing the health condition of the child. The child is sick if the process is above a certain threshold, and in good health otherwise, and it is only this zero-one version of the latent process that is actually observed. Moreover, the sample is not continuously monitored, and the health status of a child is only ascertained at certain points in time. These time points might be fixed and different for each of the children, they need not be equidistant, or they might be generated according to a stochastic process independent of the underlying latent processes. Let T_i = {t_{i,0}, …, t_{i,k_i}} be the set of observation times of the ith child. The zero-one sequences available for analysis are given by Y_{i,j} = I{X_i(t_{i,j}) ≥ c_i(t_{i,j})}, where c_i(t) is the possibly time-varying child-specific threshold above which the child is sick.
In this paper we take X_1(t), …, X_n(t) to be independent Ornstein-Uhlenbeck processes. More precisely, the health process of the ith child is a mean zero Gaussian process with covariance function Cov(X_i(t), X_i(s)) = exp(−a_i|t − s|), for a nonnegative parameter a_i. The children may differ among each other and over time with regard to how prone they are to falling ill (frailty), and they may also differ in how fast their health is changing. Our model captures differences in frailty among the children and over time through the thresholds c_1(t), …, c_n(t). Differences in oscillation of the child-specific health processes are accounted for by the parameters a_1, …, a_n. If covariates are available, say x_i = (1, x_{i,1}, …, x_{i,r_1})^t and z_i(t) = (1, z_{i,1}(t), …, z_{i,r_2}(t))^t, both a_i and c_i(t) can be modeled as functions of these. We propose the model

a_i = exp(x_i^t α),   c_i(t) = z_i(t)^t β,   (1)

where the vectors x_i and z_i(t) can be identical, overlapping or disjoint, and the vector z_i(t) might contain time-varying elements. In particular, in this model the probability of the ith child being ill at time t is 1 − Φ(c_i(t)), with Φ(x) the standard normal distribution function. Notice that there is no loss of generality in letting the marginal variance of the X_i(t) processes be 1 and in not explicitly introducing a drift parameter. In effect, the introduction of such parameters would lead to an overparametrization of the model: A drift parameter would be indistinguishable from the time-varying thresholds, while the introduction of a variance parameter would just result in a scaling of the coefficients. Finally, let D_i = (Y_i, T_i, {z_i(t)}_{t∈T_i}, x_i) be the observed data of the ith child, and let D = (Y, T, {z(t)}_{t∈T}, x) denote a generic such observation vector. The true likelihood contribution of the ith child is the multivariate normal probability

L_i(α, β) = Pr{Y_{i,0} = y_{i,0}, …, Y_{i,k_i} = y_{i,k_i}},   (2)

which is, for moderate k_i and n, computationally burdensome to compute.
Moreover, contrary to an Ornstein-Uhlenbeck process itself, a clipped Ornstein-Uhlenbeck process is no longer Markov (Slud, 1989), so the likelihood contribution in (2) cannot be factorized. In other words, likelihood inference based on ∏_{i=1}^n L_i(α, β) is, excluding trivial cases such as k_i ≤ 4, infeasible. To deal with this issue we propose what we call the quasi-likelihood approach for inference on the parameters governing the underlying Ornstein-Uhlenbeck processes, as well as on the parameters determining the varying frailties of the individuals under study. The quasi-likelihood is the topic of the next section.
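Before turning to the quasi-likelihood, the data-generating mechanism just described can be made concrete with a minimal simulation sketch. The sketch below is in Python (the paper's own computations were done in R), and the parameter values, observation times, and threshold function are illustrative assumptions of ours, not values from the paper; the simulation exploits the exact Markov transition of a stationary Ornstein-Uhlenbeck process with covariance exp(−a|t − s|).

```python
import numpy as np

rng = np.random.default_rng(1)

def clipped_ou(times, a, thresholds, rng):
    """Simulate a stationary Ornstein-Uhlenbeck process with marginal
    variance 1 and Cov(X(t), X(s)) = exp(-a|t - s|) at the given sorted
    time points, and return the clipped sequence Y_j = I{X(t_j) >= c(t_j)}."""
    x = np.empty(len(times))
    x[0] = rng.standard_normal()  # stationary start: X(t_0) ~ N(0, 1)
    for j in range(1, len(times)):
        rho = np.exp(-a * (times[j] - times[j - 1]))  # exact OU transition
        x[j] = rho * x[j - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()
    return (x >= thresholds).astype(int)

# Hypothetical child: irregular observation times, a_i = exp(x_i^t alpha),
# and a time-varying threshold c_i(t) playing the role of z_i(t)^t beta.
times = np.sort(rng.uniform(0.0, 1.0, size=20))
a_i = np.exp(0.4)
c_i = 0.2 + 0.4 * np.sin(2 * np.pi * times)
y = clipped_ou(times, a_i, c_i, rng)
```

Because the latent process is Markov, sampling it only at the observation times with the exact transition above is equivalent to simulating a fine grid and discarding points; it is the clipped sequence y, not the latent process, that loses the Markov property.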

The quasi-likelihood approach
Define the (r_1 + r_2 + 2) × 1 vector θ = (α^t, β^t)^t, and the probabilities

p_uv(θ, i, j) = Pr_θ{(Y_{i,j−1}, Y_{i,j}) = (u, v)},

for (u, v) in the set {(0, 0), (0, 1), (1, 0), (1, 1)}. Throughout the article we write "∑_{(u,v)}" for sums over this set, and follow the same convention for "∏_{(u,v)}," "min_{(u,v)}," and so on. The bivariate normal probabilities p_uv can easily be computed with the formulas presented in the Appendix. To not overburden the notation, we may drop some of the arguments from p_uv(θ, i, j) when it is clear from the context what probability we are referring to.
The quasi-likelihood approach consists of approximating the true likelihood in (2) with the pairwise construction

Q_n(θ) = ∑_{i=1}^n q(θ, D_i).

We define the functions q and q_j via

q(θ, D_i) = ∑_{j=1}^{k_i} log q_j(θ, D_i),

where the jth quasi-likelihood contribution of the ith child, that is q_j(θ, D_i), is given by

q_j(θ, D_i) = ∏_{(u,v)} p_uv(θ, i, j)^{I{(Y_{i,j−1}, Y_{i,j}) = (u,v)}}.

The quasi-maximum likelihood estimator θ̂_n is the value of θ maximizing Q_n(θ). Notice that q_j(θ, D_i) is the probability mass function of a multinomial experiment with four outcomes, hence it is a proper likelihood contribution for such an experiment, and the Bartlett identity

E_{θ_0} u_j(θ_0, D_i) = 0,   Var_{θ_0} u_j(θ_0, D_i) = −E_{θ_0} ∂u_j(θ_0, D_i)/∂θ^t,   (4)

holds, where θ_0 denotes the true parameter value. In a similar manner, the quasi-score function U_n(θ) = ∂Q_n(θ)/∂θ and its contributions u(θ, D_i) = ∂q(θ, D_i)/∂θ and u_j(θ, D_i) = ∂ log q_j(θ, D_i)/∂θ are defined. The jth quasi-likelihood score contribution of the ith child is expressible in terms of random variables involving partial derivatives of the p_uv, where all three derivatives are bounded as long as the correlations ρ(θ, i, j) are bounded away from 1. Note also that the derivatives appearing in B_{i,j} and C_{i,j} are not the same function evaluated at different time points, but derivatives with respect to different arguments of p_uv.
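The construction above can be sketched numerically as follows. This is a Python illustration of ours, with scipy's bivariate normal distribution standing in for the closed-form expressions of the Appendix; the function and variable names are assumptions, not the paper's.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def pair_probs(c_prev, c_cur, rho):
    """Cell probabilities p_uv = Pr{(Y_{j-1}, Y_j) = (u, v)} for a clipped
    stationary OU pair with correlation rho = exp(-a(t_j - t_{j-1})).
    Y = 0 corresponds to the latent process lying below its threshold."""
    cov = [[1.0, rho], [rho, 1.0]]
    p00 = multivariate_normal.cdf([c_prev, c_cur], mean=[0.0, 0.0], cov=cov)
    p01 = norm.cdf(c_prev) - p00   # X_{j-1} < c_prev, X_j >= c_cur
    p10 = norm.cdf(c_cur) - p00    # X_{j-1} >= c_prev, X_j < c_cur
    p11 = 1.0 - p00 - p01 - p10
    return np.array([[p00, p01], [p10, p11]])

def quasi_loglik_one_child(y, times, a, c_vals):
    """q(theta, D_i) = sum_j log q_j: the pairwise quasi-log-likelihood
    contribution of one child, summing over adjacent observation pairs."""
    total = 0.0
    for j in range(1, len(y)):
        rho = np.exp(-a * (times[j] - times[j - 1]))
        p = pair_probs(c_vals[j - 1], c_vals[j], rho)
        total += np.log(p[y[j - 1], y[j]])
    return total
```

Summing such contributions over children, with a_i and c_i(t) expressed through (1), and handing the sum to a generic optimizer gives the quasi-maximum likelihood estimator.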
Remark 1. The pairwise quasi-likelihood we consider in this paper can of course be extended to involve triplets of observations, that is (Y_{i,j−2}, Y_{i,j−1}, Y_{i,j}), quadruples, and so on. Such strategies were pursued in Hjort and Varin (2008) for the Markov chain case. Quasi-likelihood constructions of this type are likely to yield more efficient estimators, at the cost, however, of a larger computational burden. Another pairwise construction not considered in this paper is to let the quasi-likelihood involve all pairs of child-specific observations, that is (Y_{i,l}, Y_{i,j}) for all l ≠ j, and not only the adjacent ones.

Assumptions and notation
For a vector x, ||x|| is the Euclidean norm, while for a function z, ||z||_∞ denotes its supremum norm. For the covariance function on the observation grids we write ρ(θ, i, j) = exp(−a_i(t_{i,j} − t_{i,j−1})). Throughout the paper we assume that the following hold: (i) the limits defining the matrices H(θ) and C(θ) introduced below exist; (ii) the gaps between successive observation times are strictly positive, so that the processes X_1(t), …, X_n(t) are nondegenerate on their observation grids; (iii) the relevant information matrices become positive definite with probability one for all j; and (iv) the parameter space Θ ⊂ R^{r_1+r_2+2} is compact.
We note that assumption (ii) entails that none of the processes X_1(t), …, X_n(t) is degenerate; in particular, the correlation function ρ(θ_0, i, j), evaluated at the true parameter value, is strictly smaller than one with probability one for all i and j. This implies that none of the probabilities p_00, p_01, p_10, p_11 approaches zero, ensuring that the random variables entering the score contributions are bounded when evaluated at θ_0. In particular, for any integrable real-valued function g(D), the average n^{−1} ∑_{i=1}^n g(D_i) converges to E g(D) with probability one by the law of large numbers. The compactness assumption on the parameter space is used in the consistency proof below.

Consistency
Define the Kullback-Leibler divergence KL(θ) of the quasi-likelihood by

KL(θ) = E_{θ_0}{q(θ_0, D) − q(θ, D)}.

Since Q_n consists of proper multinomial likelihood elements, standard techniques (e.g., Ferguson, 1996, chapter 17) can be applied to prove that KL(θ) is nonnegative and equals zero if and only if θ = θ_0. This is the content of Lemma 1 in the Appendix. We have the following result.
Theorem 1. The value θ̂_n maximizing Q_n(θ) is consistent for θ_0.
Proof. The function q(θ, D) is continuous in θ for all D, and log{q(θ, D)/q(θ_0, D)} is bounded above by the integrable function −∑_{j=1}^k log q_j(θ_0, D). By compactness of Θ, it follows from Theorem 16(b) in Ferguson (1996) and Lemma 1 that the function −KL(θ) attains its maximum at the unique point θ_0. Theorem 5.7 in van der Vaart (1998) then gives the result. ▪

Asymptotic normality
Due to the Bartlett identity in (4), the variance of the quasi-score function given the covariates can be computed explicitly. Under the assumptions imposed above, this variance is finite, and assumption (i) ensures the existence of matrices H(θ) and C(θ) such that

n^{−1} Var_θ U_n(θ) → H(θ) + 2C(θ)

in probability. When evaluated at the true parameter value, we write H = H(θ_0) and C = C(θ_0).
Theorem 2. The sequence n^{−1/2} U_n(θ_0) converges to a mean zero normal distribution with covariance matrix H + 2C.
Proof. By assumptions (i) and (ii), n^{−1} Var_{θ_0} U_n(θ_0) converges to H + 2C. From Lemma 2 we have that ||u_j(θ_0, D_i)|| is bounded by something proportional to ||x_i|| + 2||z_i||_∞ (see Lemma 2 for details). Hence, ||u(θ_0, D_i)|| ≤ k_i max_j ||u_j(θ_0, D_i)|| for all i, where the right-hand side is finite with probability one. The u(θ_0, D_1), …, u(θ_0, D_n) are then i.i.d. and bounded random variables, and the central limit theorem yields the result. ▪

In order to get the limiting distribution of the estimator θ̂_n we need to prove that

n^{1/2}(θ̂_n − θ_0) = H^{−1} n^{−1/2} U_n(θ_0) + o_P(1).   (9)

By Theorem 5.23 in van der Vaart (1998, p. 53) the equality in (9) holds if the quasi-likelihood functions log q(θ, D) are Lipschitz in a neighbourhood of the true θ_0 and H is invertible. That this is indeed the case is proved in Theorem 4. To summarize, we have that at the model, the estimator θ̂_n is consistent for the true value θ_0, and that n^{1/2}(θ̂_n − θ_0) converges to a mean zero normal distribution with covariance matrix H^{−1}(H + 2C)H^{−1}. In the next section we consider the state of affairs when the model on which the quasi-likelihood is based is misspecified, and derive a model selection criterion and a goodness-of-fit test.
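The sandwich H^{−1}(H + 2C)H^{−1} can be estimated empirically from per-pair score vectors and Hessians evaluated at θ̂_n. The following Python sketch is our own construction under stated assumptions: Ĥ is taken as minus the average Hessian (by the Bartlett identity this also estimates the variance of the pair scores at the model), and Ĉ as the symmetrized average cross-product of adjacent pair scores; the inputs below are hypothetical.

```python
import numpy as np

def sandwich_cov(scores, hessians):
    """Estimate H^{-1}(H + 2C)H^{-1}/n from per-pair score vectors and
    Hessians. scores[i][j] is u_j(theta_hat, D_i) (a length-d vector),
    hessians[i][j] the corresponding d x d Hessian of log q_j."""
    n = len(scores)
    d = scores[0][0].shape[0]
    H = np.zeros((d, d))
    A = np.zeros((d, d))
    for u_i, h_i in zip(scores, hessians):
        for h in h_i:
            H -= h                             # H-hat = -n^{-1} sum of Hessians
        for j in range(1, len(u_i)):
            A += np.outer(u_i[j - 1], u_i[j])  # adjacent-pair cross-products
    H /= n
    C = (A + A.T) / (2.0 * n)                  # symmetrized estimate of C
    Hinv = np.linalg.inv(H)
    return Hinv @ (H + 2.0 * C) @ Hinv / n     # approximate Var(theta_hat)

# Hypothetical inputs: 50 individuals, 6 adjacent pairs each, dimension 3.
rng = np.random.default_rng(0)
scores = [[rng.standard_normal(3) for _ in range(6)] for _ in range(50)]
hessians = [[-np.eye(3) for _ in range(6)] for _ in range(50)]
V_hat = sandwich_cov(scores, hessians)
```

The 2C term arises because consecutive pairs (Y_{i,j−1}, Y_{i,j}) and (Y_{i,j}, Y_{i,j+1}) share an observation and hence have correlated scores, while scores across children are independent.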

Model selection
Suppose that the observed sequences of zeros and ones are generated by Ornstein-Uhlenbeck processes with parameters a_0(x_i), for some continuous, nonnegative and bounded function a_0(x); and that the thresholds are c_0(z_i(t)), i = 1, …, n, with c_0(z(t)) a bounded function. Otherwise, both functions are unknown. In this situation the parametric models of the form given in (1) can be viewed as parametric approximations to the true model, which perhaps lies out of reach of the parametric models employed for estimation. The quasi-likelihood estimator θ̂_n then aims at the least-false parameter value θ_lf minimizing the distance

d(θ) = E log{q_0(x, z)/q(θ, x, z)},

where q_0(x, z) is the hypothetical quasi-likelihood based on the true underlying distribution, and the expectation is taken with respect to the true distribution.
Theorem 3. The estimator θ̂_n is consistent for the least-false parameter value θ_lf; and n^{1/2}(θ̂_n − θ_lf) converges to a mean zero normal distribution with covariance matrix K^{−1}(H + 2C)K^{−1}.

Proof. Follows, with minor modifications, from the results given in Sections 3.2 and 3.3. ▪

If the parametric model we are employing happens to be the correct one, then K = H, since the Bartlett identity is then back in force.
The model presented in display (1) contains two regression structures, one on the covariance of the Ornstein-Uhlenbeck processes and one on the thresholds. Since each covariate available for analysis might enter into one, the other, or both of these, we are left having to choose between 4^{r_1+r_2} different models. Without subject matter knowledge, this quickly becomes an insurmountable task, as exploring all models would quickly exhaust the computing power at one's disposal. Given that there are only a handful of models that are deemed plausible, however, the model selection criterion and the test statistic that we now introduce can assist in choosing the best among these.
(1) The quasi-likelihood information criterion (qlic) is a special case of the composite likelihood information criterion introduced in Varin and Vidoni (2005). It is given by

qlic = 2Q_n(θ̂_n) − 2 tr{K̂_n^{−1}(Ĥ_n + 2Ĉ_n)},

where Ĥ_n, K̂_n, Ĉ_n are consistent estimators of H, K, C. Among the candidate models, the model with the highest value of the qlic is to be preferred. The derivation of the qlic follows the same principles as the derivation of the Akaike information criterion in the standard likelihood case (e.g., Claeskens & Hjort, 2008, chapter 2.3), and it can thus be viewed as an aic-type criterion for the quasi-likelihood method of inference. Consistent estimators of the matrices H, K, and C are obtained by evaluating their empirical counterparts at θ̂_n.
(2) The quasi-likelihood ratio test. The adequacy of different models whose parameters have been estimated by the quasi-likelihood method can also be assessed by the likelihood ratio inspired statistic

M_n = 2{Q_n,wide(θ̂_1, θ̂_2) − Q_n,narrow(θ̃_1, 0)}.   (12)

Here we have partitioned the (r_1 + r_2 + 2)-dimensional parameter vector θ = (α, β) as (θ_1, θ_2), with θ_1 and θ_2 of dimension p and q, respectively. We are interested in testing whether the subset θ_2 of parameters is equal to zero. Here, Q_n,wide(θ̂_1, θ̂_2) and Q_n,narrow(θ̃_1, 0) are the quasi-likelihoods of the model including the full parameter vector (θ_1, θ_2), and of a narrow model where θ_2 = 0; both evaluated at their respective maximizers (θ̂_1, θ̂_2) and (θ̃_1, 0). For a given choice of (θ_1, θ_2), corresponding to a wide and a narrow model, write

H = ( H_00  H_01 ; H_10  H_11 )

for the (p + q) × (p + q) matrix in which H_00, H_01 = H_10^t, and H_11 are the probability limits of the corresponding blocks of minus the normalized Hessian of Q_n, all evaluated at (θ_1, θ_2) = (θ_1, 0). Provided the conditions of Theorem 2 are satisfied, an expansion of the type (9) holds for estimation in the wide model, as well as for estimation in the narrow model. Coupling this with Taylor expansions of the two quasi-likelihoods around their respective maximizers, we find that, under the null hypothesis θ_2 = 0, M_n converges in distribution to a limit M, expressed in terms of a pair (U, V) that is mean zero multinormal with covariance matrix H + 2C, with C the probability limit of the corresponding covariance blocks. Contrary to the standard likelihood case, the limiting random variable M will not be a chi-square (see, e.g., chapter 22 of Ferguson, 1996, for Wilks' theorem), but it can relatively easily be simulated, via multinormal realizations of (U, V) from the N_{p+q}(0, H + 2C) distribution, inserting the consistent estimators for H and C, similar to those introduced above.
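The Monte Carlo step for approximating the null distribution of M can be sketched as follows. This is our own Python illustration: m_of is a stand-in for the explicit functional mapping (U, V) to M derived in the paper (here, purely for illustration, a simple quadratic form), and Sigma_hat plays the role of the estimated H + 2C.

```python
import numpy as np

def simulate_null_dist(Sigma_hat, m_of, B=10000, seed=0):
    """Draw B realizations W ~ N(0, Sigma_hat) and map each through m_of
    to approximate the null distribution of the limit variable M."""
    rng = np.random.default_rng(seed)
    W = rng.multivariate_normal(np.zeros(Sigma_hat.shape[0]), Sigma_hat, size=B)
    return np.array([m_of(w) for w in W])

# Illustration: a 2 x 2 stand-in for H-hat + 2C-hat and a quadratic form.
Sigma_hat = np.array([[1.0, 0.3], [0.3, 1.0]])
m_vals = simulate_null_dist(Sigma_hat, lambda w: float(w @ w), B=2000)
crit_95 = np.quantile(m_vals, 0.95)   # reject when M_n exceeds this value
```

Comparing the observed M_n with the simulated 95% quantile gives an approximate level-0.05 test of the narrow model.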

A simulation study
In this section we assess the asymptotic results in a finite-sample setting. To do so, we simulated data of the form presented in Section 2 for n = 1,000 individuals. Specifically, exploiting the Markovian property of the Ornstein-Uhlenbeck processes, we simulated these over a fine partition {jΔ : j = 0, 1, …, 1/Δ} of the unit interval according to

X_i(jΔ) = exp(−a_iΔ)X_i((j − 1)Δ) + {1 − exp(−2a_iΔ)}^{1/2}ε_{i,j},   a_i = exp(α_0 + α_1 x_i),

with thresholds c_i(jΔ) = β_0 + β_1 z_{i,1} + β_2 z_{i,2}(jΔ), where z_{i,2}(jΔ) = z_{i,2} sin(2π jΔ), j = 0, 1, …, 1/Δ, the ε_{i,j} are independent standard normals, and x_i, z_{i,1}, z_{i,2}, i = 1, …, n, are taken as independent standard normals. The time points, say J_i ⊂ {t_j : j = 0, 1, …, 1/Δ}, at which observations were made, were determined by independent Bernoulli sampling with success probability 1/60, which means that the number of observations per individual is about 17. Finally, the observed zero-one processes were generated by taking Y_{i,j} = I{X_i(t_j) ≥ c_i(t_j)} for t_j ∈ J_i. To these data, we fit three models: one small, one correctly specified, and one big, with the misspecification in both the small and the big model taking place in the threshold function. In the small model the time-varying covariate is excluded, that is, c_small,i = β_0 + β_1 z_{i,1}; while the big model contains an extra covariate z_{i,3}, also taken as standard normal, independent of the three other covariates; thus c_big,i(t) = β_0 + β_1 z_{i,1} + β_2 z_{i,2}(t) + β_3 z_{i,3}.
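The design above can be sketched in Python as follows (the authors worked in R). The true parameter values below are illustrative assumptions, since the values used in the study are not restated here, and n is reduced to keep the sketch fast.

```python
import numpy as np

rng = np.random.default_rng(7)
n, delta = 25, 0.001                     # the study uses n = 1,000
grid = np.arange(0.0, 1.0 + delta, delta)

alpha0, alpha1 = 0.5, 0.3                # illustrative: a_i = exp(alpha0 + alpha1 x_i)
beta = np.array([0.2, 0.5, -0.4])        # illustrative threshold coefficients

data = []
for i in range(n):
    x, z1, z2 = rng.standard_normal(3)   # independent standard normal covariates
    a = np.exp(alpha0 + alpha1 * x)
    rho = np.exp(-a * delta)             # exact OU transition on the fine grid
    X = np.empty(len(grid))
    X[0] = rng.standard_normal()
    for j in range(1, len(grid)):
        X[j] = rho * X[j - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()
    c = beta[0] + beta[1] * z1 + beta[2] * z2 * np.sin(2 * np.pi * grid)
    keep = rng.random(len(grid)) < 1 / 60  # Bernoulli sampling of observation times
    data.append((grid[keep], (X[keep] >= c[keep]).astype(int)))
```

Each element of data is a (times, y) pair of roughly 17 irregularly spaced observations, to which the small, correct, and big threshold models can then be fit by maximizing the quasi-likelihood.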
The simulations and the estimation were conducted in the R programming language (R Core Team, 2013). Implementing the quasi log-likelihood functions only requires computing bivariate normal probabilities of the type Pr{X_i(t_{j−1}) ≤ c_i(t_{j−1}), X_i(t_j) ≤ c_i(t_j)}, which can be done by numerical integration using the formula displayed in (A2). The remaining three probabilities are then taken care of by the identities in (A3). Note, however, that these probabilities may differ for each individual and for each time point. The quasi log-likelihoods were then maximized using the nlm() function in R. With the amount of data used for the simulations, that is, about n × (n^{−1} ∑_{i=1}^n k_i) ≈ 1,000 × 17 individual data points, the optimization took about 10 min on a standard portable computer.
For each simulation we computed the qlic score for each of the three models. In 86 of 100 simulations, the qlic selected the true model over the big model, and it always chose the true model over the small model. When studying the quantile-quantile plots in Figure 3, it should be kept in mind that the number of observations per individual is small (i.e., low sampling frequency), and that the underlying truth, involving the time-varying covariate z_{i,2}(t) = z_{i,2} sin(2π t), is quite complicated. The estimates and the qlic do, however, appear to be behaving as expected.
Remark 2. Sampling frequency. In Section 3.3, consistency and limiting normality were derived under the n → ∞ regime. One could, however, consider other types of asymptotic regimes, either of the infill variety, where n is held constant and the k_i tend to infinity, or where a ratio of max_{i≤n} k_i and n tends to some constant. Even in the n → ∞ setup of this paper, a pertinent question is whether the sampling frequency (the size of the k_i) affects the precision of the quasi-likelihood estimates. As a first stab at this question, we estimated the relative efficiency of the estimates of the true model estimated above, compared with the same model but with higher sampling frequency. Recall that in the former, we sampled the 1,001 equidistant time points partitioning the unit interval with probability 1/60. In a new set of simulations, we sampled them with probability 1/40. This means that the expected number of observations per child increases from about 17 to about 25. Looking at the average over 100 simulations of estimated ratios of the type Var(low freq. estimate)/Var(high freq. estimate), for the estimates of α_0, α_1, β_0, β_1, and β_2 in the correctly specified model, we got 1.164, 1.340, 1.047, 1.007, and 1.057, respectively. These numbers indicate that the variance decreases as the sampling frequency increases; notably, the rather modest increase in sampling frequency appears to have a large effect on the precision with which we are able to estimate the parameters entering the covariance function, that is, α_0 and α_1. These results merit further investigation.

The data and previous modeling strategies
The data analyzed in this section have previously been studied in Borgan et al. (2007) using a version of Aalen's linear hazard regression model. A more elaborate discussion of the data is found in that paper. For comparative purposes, we briefly present the model of Borgan et al. (2007) and fit one such model to the data; thereafter, we fit three different latent Ornstein-Uhlenbeck process models. The adequacy of the Ornstein-Uhlenbeck process models compared with the linear hazard models may be evaluated using the focused information criterion introduced in Jullum and Hjort (2017), and extended to regression models in Cunen, Walløe, and Hjort (2020) and Claeskens, Cunen, and Hjort (2019). Since such comparisons must be rather elaborate and would lead us too far afield, we do not pursue such a study of different model classes here.

As part of a sanitation program in the metropolitan area of Salvador, Brazil, the Institute of Public Health at the Federal University of Bahia conducted several studies and data gathering efforts. One of these consisted of surveying the extent to which infants in the Salvador area suffered from episodes of diarrhoea. Data collectors were assigned to households and conducted home visits over a period of 455 days from October 2000 to January 2002. One child aged under 3 years at entry was monitored from each household. A major challenge with these data is the different types of missingness, clearly visible as the white rectangles in Figure 1. Some 16% of the children entered late into the study, and about 21% of the children dropped out of the study before the completion date. In addition, there is intermittent missingness, whereby observation was interrupted but later resumed. According to Borgan et al. (2007), this type of missingness was mainly due to data collectors not being available.
Cases where the child was not available for a home visit are more problematic, as they can invalidate our assumption of the observation times being independent of the underlying processes. Such breaches of the independence assumption occur if the children could not be visited, or were not at home, because of their health condition. The data do not contain information about the reasons for which censoring occurs, and in our analysis we have assumed that censoring is independent of the underlying health processes. See Remark 5 for further discussion of this point. The data collectors were assigned contiguous identification numbers, which explains the white rectangles visible in Figure 1. The periods during which there are no observations are due to periods of vacation and a strike among the data collectors.

Borgan et al. (2007) studied a class of counting process models for recurrent events data in discrete time with linear hazard rates, and used martingale methods to derive the limiting distribution of the estimators of the cumulative coefficients B_k(t) = ∑_{s=1}^t β_k(s), k = 0, …, p. Due to the three forms of censoring they had to introduce a 'missingness process' indicating whether a child was observed, not observed, or had dropped out of the study (Borgan et al., 2007). The value of this process at a given time must be assumed known prior to this time (i.e., it must be predictable), and it is assumed conditionally independent of the counting process, given the past. The latter assumption is similar to our assumption of the observation times being independent of the latent Ornstein-Uhlenbeck processes, while the former is immaterial for our model. In our model, the reason for which a value is missing is irrelevant, as long as it is independent of the state of X_i(t). Table 1 displays estimated cumulative regression coefficients for a model of the form (14), along with standard errors (SEs).
The estimators have approximately normal distributions, so Wald ratio tests may be read off from the table, pointing to those cumulative coefficients which are significantly present.

Fitting clipped Ornstein-Uhlenbeck process models
We fitted three clipped Ornstein-Uhlenbeck process models to the Brazilian data. The results are shown in Table 2. This table contains the parameter estimates and the estimated standard errors, along with the Wald statistics; the latter are the parameter estimates divided by their approximate standard errors, that is, the statistic testing the null hypothesis of no effect against its two-sided alternative. For illustrative purposes we also include the qlic score of the three models, along with the M_n statistics of (12), comparing a big model to a medium model, and a medium model to a small model. As the three models in Table 2 are arrived at in a rather ad hoc fashion, the two model selection criteria should not be taken too seriously. In the big model we see that six of the estimated coefficients are not significant at the 0.05 level; removing these we obtain the medium model, with a somewhat superior qlic score. The small model, which only includes the two intercepts and the log of time since the start of the study, appears to be too parsimonious, as its qlic score is inferior to those of the two bigger models. The M_n statistic of (12) comparing the big and the medium model does not lead to rejection of the null hypothesis of the medium model being true; the M_n statistic pitting the medium against the small model does reject the hypothesis of the small model being true.

TABLE 2  Parameter estimates and approximate SEs for three Ornstein-Uhlenbeck process models, along with the ratio of the two (i.e., testing the null hypothesis of no effect against its two-sided alternative). The upper panel contains estimates of the α entering the covariance function, while the lower panel contains estimates of the β of the threshold c(t). The last line is the M_n of (12).

The linear hazard model and the clipped Ornstein-Uhlenbeck process models are very different models, so one should be careful in comparing the two. It is, however, interesting to note that some of the covariates seem to have an effect in the linear hazard model of Table 1, but not in the big model of Table 2, while the converse is only true for one of the covariates. Two possible reasons for this could be the efficiency loss due to the quasi-likelihood estimation, or the fact that four of the insignificant coefficients in the big clipped Ornstein-Uhlenbeck model enter the covariance function of the Ornstein-Uhlenbeck process and thereby play a different role in this model compared with the linear hazard model, where they work directly on the hazard.
This brief discussion highlights the importance of meticulously thinking through which covariates should enter what regression part of the clipped Ornstein-Uhlenbeck process model, a task that, admittedly, requires a certain intuition for the phenomenon under study.
According to the qlic, the medium model provides a better fit to the data than the big and the small models. The negative estimate of the coefficient on the binary age variable (> 24 months) indicates that children older than 2 years tend to have less oscillating health than those below 2 years. The effects of consuming water from a contaminated water storage and living in the proximity of standing water seem to work in the same direction, that is, by attenuating the oscillation of the underlying health processes, but it is likely that they do so for rather different reasons. Older children are less prone to falling sick and have longer streaks of good health, while children exposed to polluted water are likely to stay ill for longer periods of time when they fall ill.
This is the ratio for boys whose mothers are below 25 years old and who live in rain-affected accommodation, whether in the proximity of open sewerage or not. The upward sloping curve indicates that the sanitation program had a larger effect in the areas that were not plagued by open sewerage at the start of the program. The estimate displayed in the plot was obtained by plugging in the quasi-likelihood parameter estimates, and then using the delta method to obtain a pointwise confidence band.
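As a generic illustration of the delta-method step just described (this is not the authors' code; the function g, the parameter values, and the numerical-gradient scheme are illustrative assumptions), a pointwise confidence interval for a smooth function of the quasi-likelihood estimates can be sketched as follows:

```python
import numpy as np

def delta_method_ci(theta_hat, cov, g, eps=1e-6):
    """Pointwise 95% confidence interval for g(theta) via the delta method.

    theta_hat : estimated parameter vector
    cov       : estimated covariance matrix of theta_hat
    g         : smooth scalar function of the parameter vector
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    # Central-difference numerical gradient of g at theta_hat
    grad = np.zeros_like(theta_hat)
    for j in range(len(theta_hat)):
        step = np.zeros_like(theta_hat)
        step[j] = eps
        grad[j] = (g(theta_hat + step) - g(theta_hat - step)) / (2 * eps)
    se = np.sqrt(grad @ cov @ grad)   # standard error of g(theta_hat)
    z = 1.959963984540054             # 0.975 quantile of the standard normal
    est = g(theta_hat)
    return est, est - z * se, est + z * se
```

Evaluating this at each time point of interest yields a pointwise band of the kind displayed in the plot.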

CONCLUDING REMARKS
In this section some possible extensions of the clipped Ornstein-Uhlenbeck process model are briefly introduced, along with some complementary remarks.
Remark 3. Dependency/Contagion. Since diarrhoea is a highly contagious disease, a shortcoming of the model introduced in this paper is that possible dependencies between the children are not taken into account. An extension of our model that accounts for dependency between the children is obtained by taking the zero-one processes Y_i(t) = I{X_i(t) ≥ c_i(t)} as above, but with X_i(t) a spatial Gaussian process {X_i(t) : i = 1, …, n, 0 ≤ t ≤ T}. In the case of the Brazilian data, such a model demands information about the geographical proximity of the children to each other. Given that such information is available, one could take Cov(X_i(t), X_j(s)) = exp(−a_1|t − s| − a_2 d(i, j)), for nonnegative constants a_1 and a_2 and some distance d. The child-specific covariates would then enter the threshold functions. One extension of the quasi-likelihood appropriate for this model is to consider pairs of observations horizontally and vertically, so to speak, thus letting the extended quasi-likelihood consist of probabilities of the type Pr{(Y_{i,j−1}, Y_{i,j}) = (u, v)}, for j = 1, …, k_i and i = 1, …, n, as above, but also of probabilities of the type Pr{(Y_{i−1,j}, Y_{i,j}) = (u, v)}, for i = 1, …, n_j, where n_j is the number of children with a jth observation. Other constructions along the lines of Hjort and Omre (1994) and Nott and Rydén (1999) could also be worked with. Other applications where this type of extension of our model is of interest include random effects type models, where the zero-one sequences are clustered in subgroups of various sizes, and associated with each individual in a group there is an unobservable threshold, c_{i,k} say, centered around a group-specific threshold c_k; or in genomics, where the assessment of genomic co-occurrence, which comes down to assessing the similarity of genome-wide binary vectors, is an active field of research. See the recent PhD thesis of Rand (2019), and in particular Salvatore et al.
(2019), where it is argued that genome-wide binary vectors ought to be regarded as generated by correlated Gaussian processes, clipped at various thresholds.
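For concreteness, the spatial extension of Remark 3 can be simulated directly; since the covariance exp(−a_1|t − s| − a_2 d(i, j)) is separable in time and space, the joint covariance matrix is a Kronecker product. The following is a minimal sketch (function name, shared observation grid, and all numerical values are illustrative assumptions):

```python
import numpy as np

def simulate_clipped_spatial(times, dist, a1=1.0, a2=0.5, c=0.0, seed=1):
    """Simulate zero-one data Y_i(t) = I{X_i(t) >= c}, where the X_i form a
    space-time Gaussian field with
    Cov(X_i(t), X_j(s)) = exp(-a1*|t - s| - a2*d(i, j)).

    times : 1-d array of observation times (here shared by all individuals)
    dist  : (n, n) matrix of pairwise distances d(i, j)
    """
    rng = np.random.default_rng(seed)
    n, k = dist.shape[0], len(times)
    # Separable covariance: spatial part (n x n) Kronecker temporal part (k x k)
    cov_t = np.exp(-a1 * np.abs(np.subtract.outer(times, times)))
    cov_s = np.exp(-a2 * dist)
    cov = np.kron(cov_s, cov_t)
    # Draw the stacked vector (individual-major ordering) and reshape to (n, k)
    x = rng.multivariate_normal(np.zeros(n * k), cov).reshape(n, k)
    return (x >= c).astype(int)
```

With a constant threshold c this ignores the covariate-dependent thresholds of the remark, but it shows how the spatial dependence enters the binary sequences.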
Remark 4. Focused inference for the quasi-likelihood. The qlic introduced in (11) assesses overall goodness-of-fit issues of the model. In many situations the research question concerns a clearly specified statistical quantity, and the statistical model works as a vehicle in providing inference about this specified quantity. The focused information criterion (fic) (Claeskens et al., 2019; Claeskens & Hjort, 2003, 2008; Cunen et al., 2020; Jullum & Hjort, 2017) takes this into account and aims at selecting the optimal model in terms of mean squared error for a prespecified statistical quantity. fic-theory can be developed for the quasi-likelihood worked with in this paper, as well as for the general composite likelihood case (Varin et al., 2011), thus yielding the possibility of focused model selection in cases where the full likelihood is computationally infeasible.
Remark 5. Endogenous observation times. As noted in Section 4.1, the assumption of the observation times being independent of the underlying process X_i(t) is in many applications untenable. An interesting future research project is the development of methods of inference, for example the quasi-likelihood approach, for clipped Ornstein-Uhlenbeck processes when the observation times are endogenous. As an example, consider the following modification of the model studied in this paper: For the ith child, the process X_i(t) defined in Section 2.1 can be characterized as the solution to the stochastic differential equation dX_i(t) = −a_i X_i(t) dt + (2a_i)^{1/2} dB_i(t), where B_i(t) is a standard Brownian motion. Let B′_i(t) be a Brownian motion correlated with B_i(t), and suppose that the observation times are generated by a point process whose intensity λ_i(t) is the solution to dλ_i(t) = κ_i(θ_i − λ_i(t)) dt + σ_i λ_i(t)^{1/2} dB′_i(t), for positive parameters κ_i, θ_i, and σ_i satisfying the Feller condition.
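A minimal Euler-Maruyama sketch of this endogenous-observation construction follows; the correlation parameter rho, the step size, and the particular parameter values are illustrative assumptions, and the square-root intensity equation is discretized with the usual positivity truncation:

```python
import numpy as np

def simulate_endogenous(T=10.0, dt=1e-3, a=1.0, kappa=2.0, theta=1.0,
                        sigma=0.5, rho=0.7, seed=2):
    """Simulate the latent OU process dX = -a*X dt + sqrt(2a) dB jointly with
    a square-root intensity dL = kappa*(theta - L) dt + sigma*sqrt(L) dB',
    where Corr(dB, dB') = rho, and draw observation times by thinning."""
    assert 2 * kappa * theta >= sigma**2  # Feller condition keeps L positive
    rng = np.random.default_rng(seed)
    steps = int(T / dt)
    x, lam = 0.0, theta
    obs_times = []
    for m in range(steps):
        z1, z2 = rng.standard_normal(2)
        db = np.sqrt(dt) * z1                                      # dB
        dbp = np.sqrt(dt) * (rho * z1 + np.sqrt(1 - rho**2) * z2)  # dB'
        x += -a * x * dt + np.sqrt(2 * a) * db
        lam += kappa * (theta - lam) * dt + sigma * np.sqrt(max(lam, 0.0)) * dbp
        lam = max(lam, 0.0)
        # An event falls in [t, t + dt) with probability approximately lam * dt
        if rng.random() < lam * dt:
            obs_times.append((m + 1) * dt)
    return np.array(obs_times)
```

Because the same Brownian shock z1 drives both equations, the resulting observation times are informative about the latent health process, which is precisely what makes inference nonstandard here.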
Remark 6. Multistate data and several thresholds. Consider multistate data where the states are, at least, on the ordinal level of measurement. Data on the stages of a disease might be of this type. Pursuing the idea of this paper, one could consider models where the indicator function Y_i(t) takes on more than two values, and where the different states correspond to the level of an underlying continuous process. For example, a three-state model takes Y_i(t) = I{c_{i,1}(t) < X_i(t) ≤ c_{i,2}(t)} + 2 I{c_{i,2}(t) < X_i(t)}, where c_{i,1}(t) < c_{i,2}(t) are possibly time-varying thresholds, and X_i(t) is an Ornstein-Uhlenbeck process of the type introduced in Section 2.1.
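The three-state indicator can be coded directly; in this sketch the thresholds are constants rather than the time-varying functions of the remark, and the values are illustrative:

```python
import numpy as np

def three_state(x, c1=-0.5, c2=0.5):
    """Map a latent path x to ordinal states 0, 1, 2 via thresholds c1 < c2:
    Y = I{c1 < x <= c2} + 2*I{x > c2}."""
    x = np.asarray(x, dtype=float)
    return ((x > c1) & (x <= c2)).astype(int) + 2 * (x > c2).astype(int)
```

For instance, three_state([-1.0, 0.0, 1.0]) returns the states 0, 1, 2, the latent level below, between, and above the two thresholds.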
Remark 7. Crossings data. Suppose that what we observe are the times at which the underlying process crosses the thresholds in either direction. Such data would for example arise if each of the children in the Brazil data were continuously monitored. The true likelihood would in this situation consist of probabilities involving the crossing times t_{i,1}, …, t_{i,k_i}. This is a rather different object from the likelihood given in (2), and the quasi-likelihood of (3) would in this situation incur a larger efficiency loss than when the true likelihood is that of (2). Since likelihoods consisting of probabilities of this type are computationally burdensome, quasi-likelihood techniques ought to be developed for this kind of data. Note that the model we sketch here is closely related to first hitting time models in survival analysis; see Caroni (2017) for a book-length treatment of such models.
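On a discrete simulation grid, crossing times of a threshold can be located by sign changes of X_i(t) − c_i(t); the following minimal sketch (grid, path, and threshold are illustrative) records the first grid time after each sign change:

```python
import numpy as np

def crossing_times(t, x, c=0.0):
    """Return the grid times at which the path x crosses the threshold c
    in either direction (first time point after each sign change).

    Touches where x equals c exactly are not counted as crossings."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    s = np.sign(x - c)
    # A crossing occurs between indices m and m+1 when the signs differ
    idx = np.nonzero(s[:-1] * s[1:] < 0)[0]
    return t[idx + 1]
```

For example, on the grid t = 0, 1, 2, 3 with path values −1, 1, 2, −1 and threshold 0, the recorded crossing times are 1 and 3.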

ACKNOWLEDGMENTS
We thank Robin Henderson for kindly providing us with the Brazil data. We are also grateful to the editor and two anonymous referees for constructive comments which have contributed to a stronger paper. The work of E.A. Stoltenberg was supported by the PharmaTox Strategic Research Initiative, Faculty of Mathematics and Natural Sciences, University of Oslo. The work of N.L. Hjort has been supported by the Norwegian Research Foundation, via the project FocuStat (Focus Driven Statistical Inference With Complex Data, led by Hjort).