State estimation for aoristic models

Aoristic data can be described by a marked point process in time in which the points cannot be observed directly but are known to lie in observable intervals, the marks. We consider Bayesian state estimation for the latent points when the marks are modelled in terms of an alternating renewal process in equilibrium and the prior is a Markov point process. We derive the posterior distribution, estimate its parameters and present some examples that illustrate the influence of the prior distribution.


1 Introduction
Inference for point processes on the real line has been dominated by a dynamic approach based on the stochastic intensity or hazard function (Brémaud, 1972; Karr, 1991; Last & Brandt, 1995). Such an approach is quite natural, is amenable to likelihood-based inference and allows the utilisation of powerful tools from martingale theory. However, it breaks down completely when censoring breaks the orderly progression of time. In such cases, state estimation techniques are needed that are able to fill in the gaps (Brix & Diggle, 2001; Van Lieshout, 2016).
In this paper, we concentrate on aoristic data (Ratcliffe & McCullagh, 1998), in which the points may not be observed directly but upper and lower bounds exist. Such data are commonplace in criminology. Suppose, for example, that a working person leaves his place of residence early in the morning and returns late in the afternoon to discover that the residence has been burgled. Then the exact time of the break-in cannot be determined, but it must have happened during the absence of the resident. In rare cases, a burglar may also be caught in the act, in which case the time of break-in coincides with the bounds. The actual break-in times tend to be estimated by ad hoc, naive approaches, e.g. the mid-point of the reported interval (Helms, 2008) or the length-weighted empirical probability mass function of the interval lengths. An obvious disadvantage of such methods is that dependencies between offence times, such as the near-repeat effect (Bernasco, 2009), are ignored.
The focus of this research is to develop a Bayesian inference framework for aoristic data that is able to infer missing information and takes into account expert knowledge and interaction. Specifically, in Section 2 we formalise aoristic censoring as a marked point process in time in which the points cannot be observed directly but are known to lie in observable intervals, the marks. Upon employing a Markov point process prior (Van Lieshout, 2000), the posterior distribution of the point locations is derived in Section 3. In Section 4 we turn to Monte Carlo based inference. The paper is concluded by simulated examples that demonstrate the influence of the prior (Section 5) and some reflections on future research.

Figure 1: View of two alternating renewal processes. In (1), a point t of the point process X falls in a Y-phase. In criminological terms, this may correspond to a person being away from home and being burgled; the exact time of burglary is unknown. Hence the whole interval is recorded. In (2), a point t of the point process X falls in a Z-phase. This corresponds to a person being home and being burgled. In this case, the exact time of burglary is known, thus the exact point is recorded. Renewal times are denoted by S_i, S_{i+1}, with S_0 = 0.
2 Marked point process formulation

2.1 Alternating renewal processes for censoring
In this paper, we consider a censoring mechanism based on an alternating renewal process. Let C_1, C_2, ... be a sequence of random 2-vectors such that the C_i = (Y_i, Z_i), i ∈ N, are independent and identically distributed (Asmussen, 2003; Ross, 1996), and assume that C_i admits a joint probability density function. Write T_i = Y_i + Z_i for the length of the i-th cycle and let S_n = T_1 + ... + T_n, with S_0 = 0, be the time of the n-th renewal. Note that no renewal occurs at the end of a Y-phase. Furthermore, assume that 0 < ET_1 < ∞. Then, by the strong law of large numbers, the counting process

N(t) = sup{n ∈ N_0 : S_n ≤ t},  t ≥ 0,  (2.1)

is well-defined and the supremum is attained with probability one. Furthermore, the renewal function M(t) = E N(t) is finite and absolutely continuous with respect to Lebesgue measure (Ross, 1996, Chapter 3).
An alternating renewal process can be used for censoring in the following way. Let X be a temporal point process on R_+ (Brémaud, 1972; Daley & Vere-Jones, 2003, 2008; Last & Brandt, 1995) and let each point t ∈ X be associated with an alternating renewal process, independently of the other points of X. Now, if t happens to fall in some Z-phase, t is observed perfectly, whereas t is observed aoristically if it falls in a Y-phase. The censoring mechanism is illustrated in Figure 1.
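To make the censoring mechanism concrete, the following sketch simulates it for a single latent point. The function names and the exponential phase lengths in the usage line are our own illustrative choices, not part of the model specification.

```python
import random

def censor(t, sample_y, sample_z, rng):
    """Censor a latent time t >= 0 with an independent alternating
    renewal process that starts with a Y-phase at time 0.

    Returns the mark (a, l) parametrising the recorded interval
    [t + a, t + a + l]; l == 0 signals that t fell in a Z-phase
    and is observed exactly."""
    s = 0.0                       # current renewal time S_i
    while True:
        y = sample_y(rng)         # Y-phase (e.g. away from home)
        z = sample_z(rng)         # Z-phase (e.g. at home)
        if t < s + y:             # t falls in the Y-phase [s, s + y)
            return (s - t, y)     # aoristic: whole phase recorded
        if t < s + y + z:         # t falls in the Z-phase
            return (0.0, 0.0)     # atom: exact time recorded
        s += y + z                # move on to the next renewal

rng = random.Random(1)
mark = censor(3.0, lambda r: r.expovariate(1.0), lambda r: r.expovariate(2.0), rng)
```

By construction the recorded interval always contains the latent point, i.e. a ≤ 0 ≤ a + l.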

2.2 Age and excess distribution
Aoristic data generated by the censoring mechanism described in Section 2.1 can be expressed in terms of the age and excess (also referred to as residual lifetime) with respect to the Y-process. Recall that, for an alternating renewal process, the age with respect to the Y-process is defined as

A(t) = (t − S_{N(t)}) 1{S_{N(t)} ≤ t < S_{N(t)} + Y_{N(t)+1}},

and the excess with respect to the Y-process as

B(t) = (S_{N(t)} + Y_{N(t)+1} − t) 1{S_{N(t)} ≤ t < S_{N(t)} + Y_{N(t)+1}}.

In words, A(t) and B(t) are the time since the start, respectively until the end, of the current Y-phase when t falls in a Y-phase, and both are zero when t falls in a Z-phase. Let us parametrise an interval on the real line by its left-most point and length. In other words, a pair (a, l) ∈ R × R_+ corresponds to the closed interval [a, a + l]. Then, the recorded interval of a latent point t ≥ 0 can be written as

I(t) = (−A(t), A(t) + B(t)),  (2.2)

that is, the interval [t − A(t), t + B(t)]. First, let us consider the joint distribution of age and excess with respect to the Y-process.
Proposition 2.1. Let N be an alternating renewal process as in (2.1). Assume that T_1 is absolutely continuous with respect to Lebesgue measure and that 0 < ET_1 < ∞. Then, for t ≥ 0, the joint distribution of (A(t), B(t)) has an atom at (0, 0) of size

c(t) = P(Y_1 ≤ t < T_1) + ∫_0^t P(Y_1 ≤ t − s < T_1) dM(s)

and, for 0 ≤ u ≤ t, v ≥ 0,

P(A(t) ≤ u; B(t) ≤ v) = c(t) + 1{u = t} (F_Y(t + v) − F_Y(t)) + ∫_{t−u}^t (F_Y(t − s + v) − F_Y(t − s)) dM(s).

Here F_Y denotes the cumulative distribution function of Y_1 and M is the renewal function.
The claim follows by an application of Fubini's theorem for the last term, the observation that the contribution F_Y(t + v) − F_Y(t) of the first Y-phase appears if u = t and is zero otherwise, and the definition of the renewal function M. The long-run behaviour as time goes to infinity can be obtained by appealing to the key renewal theorem (for example found in Ross (1996, Theorem 3.4.2)). Specialising to the parameter vector I(t), the following theorem holds.
Theorem 2.2. Let N be an alternating renewal process as in (2.1). Assume that Y_1 and T_1 are absolutely continuous with respect to Lebesgue measure with probability density functions f_Y and f, and that 0 < ET_1 < ∞. Then (−A(t), A(t) + B(t)) tends in distribution to ν, the mixture of an atom at (0, 0) and an absolutely continuous component that has probability density function f_Y(l)/EY_1 on {(a, l) ∈ R × R_+ : a ≤ 0 ≤ a + l}. The mixture weights are, respectively, EZ_1/ET_1 and EY_1/ET_1.
Proof. First, let us consider the limit behaviour of the joint cumulative distribution function of A(t) and B(t) as t → ∞. With the notation of Proposition 2.1, by Theorem 3.4.4 in Ross (1996), c(t) converges to EZ_1/ET_1. Also, for t > u, the second term in the joint cumulative distribution function of A(t) and B(t) is zero. For the last term, note that for v ≥ 0 the function h_v : R_+ → R defined by h_v(s) = 1 − F_Y(v + s) is non-negative, monotonically non-increasing and integrable. Hence, writing F_Y(t − s + v) − F_Y(t − s) = h_0(t − s) − h_v(t − s), the key renewal theorem implies that

∫_{t−u}^t (F_Y(t − s + v) − F_Y(t − s)) dM(s) → (1/ET_1) ∫_0^u (F_Y(w + v) − F_Y(w)) dw

as t → ∞. We conclude that Q(u, v) = lim_{t→∞} P(A(t) ≤ u; B(t) ≤ v) exists and equals

Q(u, v) = EZ_1/ET_1 + (1/ET_1) ∫_0^u (F_Y(w + v) − F_Y(w)) dw.

Note that Q is the cumulative distribution function of the mixture of an atom at (0, 0) and an absolutely continuous component with probability density function f_Y(u + v)/EY_1 on (R_+)^2. By Helly's continuity theorem, (A(t), B(t)) converges in distribution.
Turning to the parametrisation I(t) = (−A(t), A(t) + B(t)), its limit distribution inherits an atom at (0, 0) from Q of size EZ_1/ET_1. By the change of variables bijection h : (R_+)^2 → {(a, l) : a ≤ 0 ≤ a + l} given by h(u, v) = (−u, u + v), whose Jacobian has absolute value one, the absolutely continuous component transforms into the density f_Y(l)/EY_1 on {(a, l) ∈ R × R_+ : a ≤ 0 ≤ a + l}, as claimed.
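Theorem 2.2 is easy to check empirically. The sketch below simulates the mark of a point at a large time t under exponential phases, an illustrative choice of ours: the atom frequency should approach EZ_1/ET_1 = 1/3, the lengths should follow the length-weighted law l f_Y(l)/EY_1 (here a Gamma(2, 1) law with mean 2), and, given the length, −a/l should be uniform on [0, 1].

```python
import random

rng = random.Random(7)
t, marks = 200.0, []
for _ in range(4000):
    s = 0.0
    while True:
        y, z = rng.expovariate(1.0), rng.expovariate(2.0)
        if t < s + y:
            marks.append((s - t, y))   # aoristic mark I(t) = (-A(t), A(t)+B(t))
            break
        if t < s + y + z:
            marks.append((0.0, 0.0))   # atom at (0, 0)
            break
        s += y + z

atom_freq = sum(1 for a, l in marks if l == 0.0) / len(marks)
lengths = [l for a, l in marks if l > 0.0]
mean_len = sum(lengths) / len(lengths)
mean_frac = sum(-a / l for a, l in marks if l > 0.0) / len(lengths)
```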

2.3 Complete model formulation
We are now ready to formulate a model. Let X be an open subset of the positive half-line R_+. The state space of X, denoted by N_X, consists of finite sets {t_1, ..., t_n} ⊂ X, n ∈ N_0, which we equip with the Borel σ-algebra of the weak topology (Daley & Vere-Jones, 2003, 2008). We will assume that the distribution of X is specified in terms of a probability density function p_X with respect to the distribution of a unit rate Poisson process on X (Van Lieshout, 2000).
Upon labelling the points of X independently with a mark according to the mixture distribution of Theorem 2.2, denoted by ν, the complete model W is obtained. Its realisations are sets {(t_1, I_1), ..., (t_n, I_n)} ⊂ X × (R × R_+). For parametrisation I_j = (a_j, l_j), the pair (t_j, I_j) defines an interval [t_j + a_j, t_j + a_j + l_j]. The ensemble of all realisations is denoted by N_{X×(R×R_+)} and equipped with the Borel σ-algebra of the weak topology. Note that W has probability density function p_X with respect to the distribution of a Poisson process on X × (R × R_+) with intensity measure ℓ × ν, where ℓ is Lebesgue measure.
Due to the censoring, one does not observe the complete model W but rather the set

U = {(t + a, l) : (t, (a, l)) ∈ W}

of parametrised intervals. Our aim is to reconstruct X or W from U. In order to do so, the posterior distribution of X or W given U is needed. This will be the topic of the next section.

3 The Bayesian framework
In a Bayesian framework, the posterior distribution updates prior beliefs in the light of the data gathered (Gamerman & Lopes, 2006). Heuristically,

p_{X|U}(x | u) ∝ p_{U|X}(u | x) p_X(x)  (3.1)

through the use of Bayes' theorem. The term p_{U|X}(u | x) describes the likelihood that the points of x generate the intervals in u. In the literature, this term is referred to as a forward term, forward density or forward model (Van Lieshout, 1995; Van Lieshout & Baddeley, 2002). The term p_X(x) captures prior beliefs about the geometry of x. In our context, since the forward model is a mixture of discrete and absolutely continuous components, some care is required in handling (3.1).
Theorem 3.1. Let W be a point process on the open set X ⊂ R with probability density function p_X with respect to the distribution of a unit rate Poisson process on X, marked independently with the mark distribution ν defined in Theorem 2.2. Write X for the ground process of locations in X and consider the forward model (2.2). Let u be a realisation of U that consists of an atomic part {(a_1, 0), ..., (a_m, 0)}, m ∈ N_0, and a non-atomic part {(a_{m+1}, l_{m+1}), ..., (a_n, l_n)}, n ≥ m. Then the posterior distribution of X given U = u satisfies, for A in the Borel σ-algebra of the weak topology on N_X,

P(X ∈ A | U = u) = c(u)^{−1} ∫_{[a_{m+1}, a_{m+1}+l_{m+1}]∩X} ⋯ ∫_{[a_n, a_n+l_n]∩X} 1{{a_1, ..., a_m, x_1, ..., x_{n−m}} ∈ A} p_X({a_1, ..., a_m, x_1, ..., x_{n−m}}) dx_{n−m} ⋯ dx_1,

where c(u) is the normalisation constant.

Proof. We must show that, for each A in the Borel σ-algebra of N_X with respect to the weak topology and each F in the Borel σ-algebra of the weak topology on N_{R×R_+}, the identity

E[1{U ∈ F} P(X ∈ A | U)] = P(X ∈ A, U ∈ F)  (3.2)

holds. Let q_x(a, l) describe a probability density function for the parametrisations of intervals generated by x ∈ X, noting that it is jointly measurable as a function on X × (R × R_+). Then, denoting the cardinality of a set by |·| and Lebesgue measure by ℓ, the right-hand side of (3.2) can be expanded over the number of atoms and integrated over locations and marks; we refer to the resulting expression as (3.3). For the left-hand side of (3.2), expanding as before and plugging in the claimed expression for the conditional expectation, one obtains the same quantity. Note that, in order to cancel terms, the order of integration must be changed. As evidently the first term in q_x(a, l), that is f_Y(l)/EY_1, does not depend on x, by Fubini's theorem, after cancelling and rearranging terms, and noting that the term in between brackets cancels out against the normalisation constant c(u), one obtains that the posterior density of X given U = u is

proportional to p_X({a_1, ..., a_m, x_1, ..., x_{n−m}}), provided that x_i ∈ [a_{m+i}, a_{m+i} + l_{m+i}] for i = 1, ..., n − m (up to a permutation of the x_i), and zero otherwise;  (3.4)

the permutation sum arises because the points x_1, ..., x_{n−m} may be attributed to the non-degenerate intervals in any order.

As a special case, let us consider an inhomogeneous Poisson process prior with integrable intensity function λ : X → R_+. Then, under the posterior distribution, X consists of n independent points, one in each interval of u; for intervals with l_i > 0, the corresponding point has probability density function λ(x) / ∫_{[a_i, a_i+l_i]∩X} λ(s) ds on [a_i, a_i + l_i] ∩ X.
To see this, recall that for a Poisson process (Van Lieshout, 2000)

p_X({a_1, ..., a_m, x_1, ..., x_{n−m}}) = e^{|X| − ∫_X λ(s) ds} ∏_{i=1}^m λ(a_i) ∏_{j=1}^{n−m} λ(x_j),

which factorises over terms associated with each interval. Hence (3.4) is proportional to ∏_{j=1}^{n−m} λ(x_j) on the product of the non-degenerate intervals.
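Because the Poisson posterior factorises, it can be sampled exactly, one interval at a time. A minimal sketch, with names of our own choosing and assuming the intensity is bounded above by a known constant so that rejection sampling applies:

```python
import random

def sample_poisson_posterior(intervals, lam, lam_max, rng):
    """Draw one realisation of the latent points under an inhomogeneous
    Poisson prior with intensity lam, bounded above by lam_max: one
    independent point per non-degenerate interval (a, l), with density
    proportional to lam on [a, a + l], drawn by rejection sampling."""
    points = []
    for a, l in intervals:
        while True:
            x = rng.uniform(a, a + l)
            if rng.random() * lam_max <= lam(x):
                points.append(x)
                break
    return points

rng = random.Random(2)
pts = sample_poisson_posterior([(0.2, 0.5), (0.6, 0.3)], lambda x: 1.0 + x, 2.0, rng)
```

For a constant intensity, every proposal is accepted and the points are simply uniform on their intervals, in line with the discussion above.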

4 Statistical inference
In this section we will consider statistical inference for aoristically censored data. Our main aim is to reconstruct the latent point process X from the observed parametrised intervals U, whose points may or may not be censored. In tandem, the censoring probability as well as the parameters η of the distribution of the non-degenerate intervals must be estimated. Parameters of the prior distribution may either be treated as fixed or be subject to estimation.

4.1 Forward model parameters
Suppose that we observe a realisation u = {(a_1, 0), ..., (a_m, 0), (a_{m+1}, l_{m+1}), ..., (a_n, l_n)} of U, where a_i ∈ R, l_i > 0 and n ≥ 1. Our first aim is to estimate the parameters η of the mark distribution ν (cf. Theorem 2.2). The parameter vector η comprises the parameters ζ of the probability density function f_Y as well as any other parameters θ involved in the joint distribution of the random vector C_1 = (Y_1, Z_1) that defines the alternating renewal process (cf. Section 2.1).
The likelihood function can be obtained from the proof of Theorem 3.1 by taking A equal to N_X in equation (3.3). On a logarithmic scale, upon ignoring terms that do not depend on η,

log l(η; u) = m log p + (n − m) log(1 − p) + Σ_{i=m+1}^n log( f_Y(l_i; ζ) / EY_1 ),  (4.1)

where p = EZ_1/ET_1 is the atom probability. Equation (4.1) simplifies greatly if we assume that the mixture weight p does not depend on ζ.
The atom probability p may be estimated by m/n, the fraction of atoms in the sample u. For ζ, we need the following result.
Proposition 4.1. Let ν be as in Theorem 2.2. Then the distribution of the lengths of the non-degenerate intervals is given by the length-weighted marginal distribution f(l) = l f_Y(l)/EY_1, and the left-most points are, conditionally on L = l, uniformly distributed on [−l, 0].
Proof. Let f(l) be the marginal probability density function of the length L. Evaluating,

f(l) = ∫_{−l}^0 (f_Y(l)/EY_1) da = l f_Y(l)/EY_1.
Let f_{A|L=l}(a) be the conditional probability density function for the left-most point A of an interval given its length L. Using the definition of conditional density,

f_{A|L=l}(a) = (f_Y(l)/EY_1) / (l f_Y(l)/EY_1) = 1/l,  a ∈ [−l, 0],

which is the uniform density on [−l, 0].

When the censoring probability does not depend on ζ, the latter may be estimated by treating the non-degenerate intervals as an independent sample from f(l) and applying the maximum likelihood method. For example, if f_Y is the probability density function of a Gamma(k, λ) distribution with shape parameter k > 0 and rate parameter λ > 0, then f(l) is the probability density function of a Gamma distribution with parameters k + 1 and λ.
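The estimation scheme just described can be sketched as follows. We estimate p by m/n and fit the Gamma parameters from the length-weighted sample; as a simple stand-in for full maximum likelihood we match the first two moments of the Gamma(k + 1, λ) length law, and all function names are our own.

```python
import random

def estimate_forward(us):
    """Given observed marks us = [(a, l), ...], estimate the atom
    probability p and, assuming f_Y is a Gamma(k, lam) density, the
    parameters (k, lam).  Non-degenerate lengths follow the
    length-weighted density, a Gamma(k + 1, lam) law, fitted here by
    the method of moments."""
    lengths = [l for _, l in us if l > 0.0]
    p_hat = 1.0 - len(lengths) / len(us)
    mean = sum(lengths) / len(lengths)
    var = sum((l - mean) ** 2 for l in lengths) / (len(lengths) - 1)
    alpha_hat = mean * mean / var          # shape k + 1 of the lengths
    return p_hat, alpha_hat - 1.0, mean / var   # (p, k, lam)

# synthetic data in the spirit of Section 5: p = 0.2, k = 2.5, lam = 0.07
rng = random.Random(11)
us = []
for _ in range(6000):
    if rng.random() < 0.2:
        us.append((0.0, 0.0))                       # atom
    else:
        l = rng.gammavariate(3.5, 1.0 / 0.07)       # length ~ Gamma(k+1, lam)
        us.append((-rng.random() * l, l))           # left-most point uniform
p_hat, k_hat, lam_hat = estimate_forward(us)
```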

4.2 State estimation
Since the posterior distribution of X or W given U (cf. Theorem 3.1) is intractable because of the normalisation constant c(u), we will use Markov chain Monte Carlo methods (Brooks, Gelman, Jones & Meng, 2011; Møller & Waagepetersen, 2003) for simulation. These methods construct a Markov chain in such a way that the stationary distribution of the chain is exactly the posterior distribution. Of these methods, a Metropolis-Hastings algorithm with a fixed number of points will be used. Since the transition probabilities depend only on likelihood ratios, the benefit is that one can sample from unnormalised densities.
Let us return to the framework of Theorem 3.1. Note that sampling from the posterior distribution of X given U is cumbersome due to the presence of the permutation sum term in (3.4). Therefore, our approach is to sample from the posterior distribution of the complete model W and project onto its ground process of locations. Doing so avoids attributing points to intervals and therefore avoids the intractable sum. Moreover, as we saw in Section 2.3, W has probability density function p_X with respect to a unit rate Poisson process on X × (R × R_+) with intensity measure ℓ × ν. Upon observing U = u for u = {(a_1, 0), ..., (a_m, 0), (a_{m+1}, l_{m+1}), ..., (a_n, l_n)}, by (3.4) this means that we must sample from a probability density function π on X^{n−m} that is proportional to p_X({a_1, ..., a_m, x_1, ..., x_{n−m}}). The ordering of the points inherent in working on X^{n−m} represents the unique correspondence between points in X and intervals in U in the complete model. We will use the notation x̄ to indicate that we look at vectors rather than sets x. In the special case that n = m, all points are observed perfectly and there is no need for any simulation.
We will use the Metropolis-Hastings algorithm (Brooks, Gelman, Jones & Meng, 2011) when n > m, i.e. when there are latent points. The state space is given by

E(u) = ∏_{i=1}^{n−m} ([a_{m+i}, a_{m+i} + l_{m+i}] ∩ X).

From now on, we shall assume that the state space is non-degenerate in the sense that

ℓ([a_{m+i}, a_{m+i} + l_{m+i}] ∩ X) > 0 for all i = 1, ..., n − m.  (4.2)

Now, the Metropolis-Hastings algorithm is defined as follows. Let q : E(u) × E(u) → R_+ be a Markov kernel. Iteratively, if the current state is x̄ ∈ E(u), propose a new state ȳ ∈ E(u) according to the probability density function q(x̄, ·) and accept the proposal to move to ȳ with probability

α(x̄, ȳ) = 1 if p_X({a_1, ..., a_m, y_1, ..., y_{n−m}}) q(ȳ, x̄) ≥ p_X({a_1, ..., a_m, x_1, ..., x_{n−m}}) q(x̄, ȳ), and
α(x̄, ȳ) = p_X({a_1, ..., a_m, y_1, ..., y_{n−m}}) q(ȳ, x̄) / ( p_X({a_1, ..., a_m, x_1, ..., x_{n−m}}) q(x̄, ȳ) ) otherwise.  (4.3)

When the proposal is rejected, stay in the current state x̄. The choice of q depends on p_X. In our simulations in Section 5, we will use the following algorithm, which is valid when the prior density function p_X is strictly positive.
Algorithm 4.2. Suppose that p_X > 0 and n > m. Iteratively, if the current state is x̄ ∈ E(u),

• pick an interval [a_{m+i}, a_{m+i} + l_{m+i}], i = 1, ..., n − m, uniformly at random from the non-degenerate ones;
• generate a uniformly distributed point y_i on X ∩ [a_{m+i}, a_{m+i} + l_{m+i}] and propose to update x_i to y_i;
• accept the proposal with probability

min{1, p_X({a_1, ..., a_m, x_1, ..., x_{i−1}, y_i, x_{i+1}, ..., x_{n−m}}) / p_X({a_1, ..., a_m, x_1, ..., x_{n−m}})},

and otherwise stay in the current state.
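A minimal sketch of a single step of Algorithm 4.2, with names of our own choosing. For simplicity we assume the intervals are contained in X, so the intersection step is vacuous; since the uniform proposal density within the chosen interval cancels in the Metropolis-Hastings ratio, only the prior density ratio remains.

```python
import random

def mh_step(x, atoms, intervals, p_X, rng):
    """One step of Algorithm 4.2: pick a non-degenerate interval
    uniformly, propose a uniform point in it and accept with
    probability min{1, p_X(new) / p_X(old)}.  Here p_X is the
    (unnormalised) prior density of a configuration, assumed
    strictly positive; atoms are the perfectly observed points."""
    i = rng.randrange(len(intervals))
    a, l = intervals[i]
    y = rng.uniform(a, a + l)
    proposal = x[:i] + [y] + x[i + 1:]
    if rng.random() <= p_X(atoms + proposal) / p_X(atoms + x):
        return proposal
    return x
```

With a constant prior density (a unit rate Poisson prior), every proposal is accepted and the chain produces independent uniform points, matching the Poisson special case of Section 3.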
A few remarks are in order. First, note that since X is open, the intersection of X with a closed interval that contains a point of X is also non-degenerate when l_i > 0. Secondly, when p_X may take the value zero, the proposal mechanism in Algorithm 4.2 might result in a new state that does not belong to E(u), even when x̄ does. Moreover, changing only one component at a time might lead to non-irreducible Markov chains. For example, if u contains the parametrisations of the intervals [0, 1] and [0.1, 1], and p_X(x) = 0 for realisations x that contain components separated by a distance less than 0.55, then states such as x̄ = (0.3, 0.9) and ȳ = (0.9, 0.3) cannot be reached from one another.
Let the target distribution be π, given by (3.4) and interpreted as a probability density on E(u). In the next propositions, basic properties of the algorithm are considered. The proofs are modifications to our context of the classic Metropolis-Hastings proofs found in, for example, Mengersen & Tweedie (1996), Roberts & Smith (1994) or Møller & Waagepetersen (2003, Chapter 7).
We will write Y_i, i ∈ N_0, for the successive states of the chain and denote by P(x̄, F) the probability that the next state lies in F ⊆ E(u) given that the current state is x̄.

Proposition 4.3. Consider the set-up of Theorem 3.1 with n > m and assume that condition (4.2) is met. Then the Metropolis-Hastings algorithm defined by the Markov kernel q on E(u) and the acceptance probabilities (4.3) is reversible with respect to π.
Proof. Take x̄, ȳ in E(u) and assume that π(ȳ) q(ȳ, x̄) > π(x̄) q(x̄, ȳ) ≥ 0. Then α(x̄, ȳ) = 1 while α(ȳ, x̄) = π(x̄) q(x̄, ȳ) / (π(ȳ) q(ȳ, x̄)), the normalisation constant c(u) of π cancelling in the ratio. Hence

π(x̄) q(x̄, ȳ) α(x̄, ȳ) = π(x̄) q(x̄, ȳ) = π(ȳ) q(ȳ, x̄) α(ȳ, x̄).

We conclude that the chain is in detailed balance and therefore reversible with respect to π.
Recall that the Markov chain is called π-irreducible (Meyn & Tweedie, 2009) if, for every x̄ ∈ E(u) and every F ⊆ E(u) with π(F) > 0, there exists some natural number τ such that P^τ(x̄, F) > 0.

4.3 Prior model parameters
In Section 4.2, we discussed Monte Carlo methods to sample from the posterior distribution of W or X given U = u. This distribution is defined in terms of the prior probability density function p_X. Typically, p_X is given in unnormalised form and depends on a parameter vector θ, that is, p_X(x; θ) = c(θ) h_X(x; θ) for an explicit function h_X : N_X → R_+. When θ is treated as an unknown, since the likelihood function for θ contains the latent marked point process W, we call on techniques from missing data analysis.
The likelihood function l(θ) is obtained from the proof of Theorem 3.1 by taking A equal to N_X in equation (3.3). Disregarding terms that do not depend on θ, one obtains l(θ; u) = c(θ) c(θ|u)^{−1}, where c(θ)^{−1} = ∫ h_X(x; θ) dμ(x), the integral being with respect to the distribution μ of the unit rate Poisson process on X, and c(θ|u)^{−1} is given by

c(θ|u)^{−1} = ∫_{E(u)} h_X({a_1, ..., a_m, x_1, ..., x_{n−m}}; θ) dx_1 ⋯ dx_{n−m}.

One observes that c(θ|u) is equal to the normalisation constant on E(u) of the non-atomic part of the posterior distribution of X given U = u.
To handle the two normalisation constants, it is necessary to look at the log relative likelihood L(θ) of U with respect to some fixed, user-selected reference parameter θ_0,

L(θ) = log l(θ; u) − log l(θ_0; u) = log( c(θ)/c(θ_0) ) − log( c(θ|u)/c(θ_0|u) ).

Then, as in (Gelfand & Carlin, 1993; Geyer, 1999),

c(θ_0)/c(θ) = E_{θ_0}[ h_X(X; θ)/h_X(X; θ_0) ],  c(θ_0|u)/c(θ|u) = E_{θ_0}[ h_X(X; θ)/h_X(X; θ_0) | U = u ].

Being expressible in terms of expectations under the reference parameter, the log likelihood ratio can be approximated by Markov chain Monte Carlo methods. Note that two samples are required: one from the posterior distribution of X and one from the prior. For the latter, provided p_X is locally stable, classic Metropolis-Hastings methods based on births and deaths apply (Geyer, 1999; Møller & Waagepetersen, 2003). If the conditional intensity is monotone, exact simulation can be carried out (Kendall & Møller, 2000; Van Lieshout & Baddeley, 2002).
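The importance-sampling identity underlying this approach is easy to check on a toy example of our own devising: with h(x; θ) = exp(−θx²), samples under θ_0 are Gaussian and the ratio of normalising constants is known in closed form.

```python
import math
import random

# c(theta0)/c(theta) = E_{theta0}[ h(X; theta) / h(X; theta0) ] with
# h(x; theta) = exp(-theta * x^2): samples under theta0 are
# N(0, 1/(2*theta0)) and the exact ratio equals sqrt(theta0/theta).
rng = random.Random(3)
theta0, theta = 1.0, 2.0
sd0 = math.sqrt(1.0 / (2.0 * theta0))
est = sum(math.exp(-(theta - theta0) * rng.gauss(0.0, sd0) ** 2)
          for _ in range(20000)) / 20000
# est approximates sqrt(1/2)
```

In the point process setting, the Gaussian draws are replaced by Markov chain Monte Carlo samples from the prior (respectively, from the posterior on E(u)), but the estimator has the same ratio-of-densities form.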

5 Examples
In this section, we present a few examples to illustrate how the choice of prior affects state estimation. Calculations were carried out using the C++ marked point process library MPPLIB, developed by Steenbeek et al. For p_X, we choose the area-interaction point process (Widom & Rowlinson, 1970; Baddeley & Van Lieshout, 1995), a model that favours clustered, regular or random realisations depending on its parameter values. Specifically, this model has probability density function

p_X(x) = α β^{n(x)} γ^{−ℓ(U_r(x))}

with respect to a unit rate Poisson process on X. Here n(x) denotes the number of points of x and U_r(x) = ∪_{i=1}^{n(x)} B(x_i, r), where B(x_i, r) is the closed interval [x_i − r, x_i + r]. When γ < 1, realisations tend to be regular; for γ > 1, clustered. When γ = 1, one has a Poisson process with intensity β. The scalar α is a normalisation constant. Realisations can be obtained by Kendall's dominated coupling from the past (CFTP) algorithm (Kendall, 1998), developed initially from the perfect simulation methods of Propp and Wilson for coupled Markov chains (Propp & Wilson, 1996).
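On the line, ℓ(U_r(x)) is the length of a union of intervals, which can be computed with a simple sweep over the sorted points. The sketch below, with helper names of our own, evaluates the unnormalised area-interaction density.

```python
def union_length(points, r):
    """Lebesgue measure of U_r(x), the union of the intervals
    [x_i - r, x_i + r]."""
    total, right = 0.0, float("-inf")
    for x in sorted(points):
        lo, hi = x - r, x + r
        if lo > right:
            total += hi - lo       # disjoint from what was swept so far
        elif hi > right:
            total += hi - right    # partial overlap
        right = max(right, hi)
    return total

def area_interaction(points, beta, gamma, r):
    """Unnormalised area-interaction density beta^n(x) * gamma^(-l(U_r(x)))."""
    return beta ** len(points) * gamma ** (-union_length(points, r))
```

For γ > 1, a clustered pair of points has a smaller union and hence a higher density than a well-separated pair, which is the attraction mechanism exploited in the examples below.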

5.1 Toy example
Consider the data u = {(0.45, 0.4), (0.51, 0), (0.58, 0)}, which consist of two atoms and a single non-degenerate interval. By the discussion at the end of Section 3, for a Poisson prior (γ = 1), the posterior distribution of the location X_3 in X = (0, 1) that generated the non-degenerate interval is uniform. To see the effect of informative priors, Figure 2 plots the posterior distribution of X_3 when the prior is an area-interaction model with η = 2r log γ = 1.2 and r = 0.1. Note that mass is shifted to the left side of the interval due to the presence of the atoms. For η = −1.2 and r = 0.1, the atoms repel X_3, resulting in mass being shifted to the right side of the interval (cf. Figure 3). To carry out the state estimation, we ran Algorithm 4.2 with a burn-in of 10,000 steps and calculated the histograms based on the subsequent 100,000 steps.
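Since only one point is latent in this example, the unnormalised posterior density of X_3 can be evaluated directly, without any Monte Carlo. The sketch below assumes the unnormalised area-interaction form β^{n(x)} γ^{−ℓ(U_r(x))} (β-dependent factors are constant here and omitted); the helper names are our own.

```python
import math

def union_length(points, r):
    """Length of the union of the intervals [x_i - r, x_i + r]."""
    total, right = 0.0, float("-inf")
    for x in sorted(points):
        lo, hi = x - r, x + r
        total += (hi - lo) if lo > right else max(0.0, hi - right)
        right = max(right, hi)
    return total

atoms, r = [0.51, 0.58], 0.1
gamma = math.exp(1.2 / (2 * r))        # eta = 2 r log(gamma) = 1.2

def post(x):
    """Unnormalised posterior density of X_3 on [0.45, 0.85]."""
    return gamma ** (-union_length(atoms + [x], r))
```

Comparing post(0.50) with post(0.80) confirms the attraction: with γ > 1, placing X_3 near the atoms shrinks the union U_r and increases the density, so mass shifts to the left end of the interval.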

5.2 Area-interaction gamma model
The left-most panel in Figure 4 shows a simulation in X = (0, 1) from U in a model where X is an area-interaction process with parameters β = 12, η = 2r log γ = 0 and r = 0.05, marked by a mixture distribution ν in which the atom probability is p = 0.2 and f_Y is the probability density function of a Gamma distribution with shape parameter k = 2.5 and rate parameter λ = 0.07. The points shown as black dots are the points of X in the simulated pattern; the red points constitute a realisation from the posterior distribution of X given U obtained by running Algorithm 4.2 for 10,000 steps. The points seem to settle in a random manner within the intervals. A simulation using a prior favouring clustering can be found in black in Figure 5. The parameter settings were as before, except that η = 1.2. In red, a realisation from the posterior distribution is shown, obtained after 10,000 steps of the Metropolis-Hastings algorithm. Figure 5 shows the effect of the complex underlying geometry when choosing proposal points within parametrised intervals: the algorithm tends to move proposed times to regions where multiple intervals intersect, leading to clustering within these regions. Figure 6 shows, in black, a simulation from U for a regular area-interaction prior with η = 2r log γ = −1.2. A realisation from the posterior distribution of X is shown in red. The structure of the prior point process is maintained in the posterior, with points being spread out from one another.

6 Conclusion
In this work, a Bayesian inference framework for aoristic data was introduced in which an alternating renewal process is used to interval-censor temporal data, converting them into a marked point process model. A latent point, which cannot be observed directly, was paired with an interval within which the point surely lies. State estimation was then applied to estimate the location of this point. Theory was developed for the distribution of the marks under this renewal framework, and the posterior distribution was deduced; the fact that the forward model is a mixture of discrete and absolutely continuous components makes this step nontrivial. A state estimation procedure was outlined in the form of a Metropolis-Hastings algorithm for a fixed number of points, after which ergodicity properties were verified. Using an area-interaction prior, this procedure was applied to sample from the posterior distribution. The effects of the prior are clearly present when sampling from the complete model.
Throughout, we assumed that all intervals corresponding to points of X were observed. Returning to the criminological context, sampling bias may arise since the data may contain only intervals whose right-most point lies in a given interval. Additionally, a random labelling regime was assumed. It might be more realistic to use location-dependent independent marking, for example based on a semi-Markov process rather than an alternating renewal process. Furthermore, spatial aspects were completely ignored. These generalisations will form the topic of our future research.

Figure 2: Locations of the two atoms and the interval, together with a histogram of sampled point locations. The interval start and end points as well as the atom locations are marked on the histogram x-axis.
Proof of Proposition 2.1. Write F_n for the cumulative distribution function of S_n, n ∈ N. By partitioning over the number of renewals up to time t, and upon noting that N(t) = n if and only if S_n ≤ t and S_n + Y_{n+1} + Z_{n+1} > t, one obtains the stated expressions for the atom size and the joint cumulative distribution function.