Cox processes driven by transformed Gaussian processes on linear networks

There is a lack of point process models on linear networks. For an arbitrary linear network, we use isotropic covariance functions with respect to the geodesic metric or the resistance metric to construct new models for isotropic Gaussian processes and hence new models for various Cox processes with isotropic pair correlation functions. In particular, we introduce three model classes given by log Gaussian, interrupted, and permanental Cox processes on linear networks, and consider for the first time statistical procedures and applications for parametric families of such models. Moreover, we construct new simulation algorithms for Gaussian processes on linear networks and discuss whether the geodesic metric or the resistance metric should be used for the kind of Cox processes studied in this paper.


| INTRODUCTION
In Sir David R. Cox's highly influential paper 'Some Statistical Methods Connected with Series of Events' (Cox, 1955), he invented doubly stochastic Poisson processes, a generalization of Poisson processes in which the intensity function Λ varying over space or time is itself a stochastic process. These Cox models nowadays play an important role when analysing point patterns in Euclidean spaces or on spheres (see Møller and Waagepetersen, 2004; Baddeley et al., 2015; Lawrence et al., 2016, and the references therein). In particular, Cox processes driven by a transformed Gaussian process (GP) Y, or by independent copies Y_1, ..., Y_h of Y, play a major role: a log Gaussian Cox process (LGCP) has Λ(u) = exp(Y(u)) (Møller et al., 1998; Cuevas-Pacheco and Møller, 2018); LGCPs constitute the most widely used subclass of Cox processes. Further, an interrupted Cox process (ICP) is obtained by an independent thinning of a Poisson process, where the selection probability of a point u is exp(−∑_{i=1}^h Y_i(u)²) (Stoyan, 1979; Lavancier and Møller, 2016). Moreover, a permanental Cox point process (PCPP) is obtained if Λ(u) = ∑_{i=1}^h Y_i(u)² (Macchi, 1975; McCullagh and Møller, 2006).

F I G U R E 1 Left: the locations of street crimes in a part of Chicago. Right: the locations of spines on a dendrite tree (dataset number five in Christensen and Møller, 2020). The circle marks the root of the tree, the black lines are a main branch, and the grey lines are side branches.
In recent years there has been an increasing interest in analysing point patterns on a linear network L, that is, L is a connected set in ℝ^k (the real coordinate space of dimension k) given by a finite union of bounded, closed line segments which can only overlap at their endpoints; see Ang et al. (2012), Baddeley et al. (2015), the references therein, as well as further references given later in the present paper. Figure 1 shows two examples of point patterns observed on linear networks, which we will use for illustrative purposes throughout the paper. The first dataset was first analysed in Ang et al. (2012) and consists of 116 locations of street crimes reported in the period 25 April to 8 May 2002 in an area close to the University of Chicago. The second dataset is one of the six point pattern datasets analysed in Christensen and Møller (2020) and consists of 566 locations of spines on a dendrite tree protruding from a neuron (see also Rasmussen and Christensen, 2021).
The contribution of the present paper is the following. We use isotropic covariance function models with respect to the geodesic metric d_G or the resistance metric d_R on L as developed in Anderes et al. (2020), as well as new models developed in the present paper, in order to construct models for isotropic GPs and hence new models for LGCPs, ICPs, and PCPPs on linear networks with isotropic pair correlation functions (more details follow in the next paragraph). We also construct new simulation algorithms for GPs and consider for the first time statistical procedures and applications for parametric families of LGCPs, ICPs, and PCPPs on linear networks (however, our approach to

| Setting
This section specifies the setting of a linear network.
Denote by ℝ^k the k-dimensional Euclidean space, k ∈ {1, 2, ...}. For a linear network L = ∪_{i=1}^m L_i, we assume m < ∞, each L_i ⊂ ℝ^k is a closed line segment of length l_i ∈ (0, ∞), L_i ∩ L_j is either empty or an endpoint of both L_i and L_j whenever i ≠ j, and L is a path-connected set. We equip L with a 'natural' metric d (as given below) and arc length measure ν, where ν(A) denotes the arc length of a Borel set A ⊆ L. We let |L| = ν(L) = ∑_{i=1}^m l_i denote the length of the linear network. Various remarks are in order.
For disconnected linear networks, definitions and results may be applied separately to each connected component of the network, considering independent Gaussian processes and independent point processes on the different components.
The definition of a linear network may be extended to the more abstract case of a graph with Euclidean edges (Anderes et al., 2020) but we avoid this generalization for ease of presentation and since statistical methods have so far only been developed for the case (c).
For each line segment L_i, there are two possible arc length parametrisations. We assume one is chosen and given by u_i(t) = (1 − t/l_i) a_i + (t/l_i) b_i, t ∈ [0, l_i], where a_i and b_i are the endpoints of L_i. The definitions and results in this paper will not depend on this choice, including when calculating arc length measure restricted to L_i: for Borel sets A ⊆ L_i, ν(A) = ∫_0^{l_i} 1(u_i(t) ∈ A) dt, where 1(·) denotes the indicator function. Furthermore, let V denote the set of endpoints of L_1, ..., L_m, and consider the graph with vertex set V and edge set E given by L_1, ..., L_m. Thus, two distinct vertices u, v ∈ V form an edge if and only if {u, v} = {a_j, b_j} for some j ∈ {1, ..., m}, in which case we write u ∼ v.
We have two cases of natural metrics in mind, namely when d is the geodesic metric d_G or the resistance metric d_R. For u, v ∈ L, d_G(u, v) = min ν(p_uv), where the minimum is over all paths p_uv ⊆ L connecting u and v. Section 2.2 below provides the more technical definition of d_R.
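For instance, d_G between the vertices of a network can be computed with a standard shortest-path routine. The sketch below (a toy illustration assuming SciPy; the four-vertex network is hypothetical and not from the paper or its coxln package) encodes each segment by its endpoints and length l_i:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

# Hypothetical toy network: vertices 0-1-2 form a path; vertex 3 hangs off vertex 1.
n_vertices = 4
edges = [(0, 1, 1.0), (1, 2, 2.0), (1, 3, 0.5)]  # (a_i, b_i, l_i)

# Weighted adjacency matrix; zero entries are treated as absent edges.
W = np.zeros((n_vertices, n_vertices))
for a, b, l in edges:
    W[a, b] = W[b, a] = l

# The geodesic metric d_G between vertices is the length of the shortest
# path through the network (Dijkstra's algorithm).
d_G = shortest_path(W, method='D', directed=False)

print(d_G[0, 2])  # 3.0, via vertex 1
print(d_G[2, 3])  # 2.5
```

Between points in the interior of segments, d_G is obtained by adding the arc lengths to the nearest endpoints of the shortest connecting path.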
Indeed, there are other interesting metrics, including the least-cost metric (Rakshit et al., 2017), but to the best of our knowledge parametric models for isotropic covariance functions c(u, v) = c_0(d(u, v)) have so far only been developed when d = d_G, d = d_R, or d is the usual Euclidean distance. However, the Euclidean distance is usually not a natural metric on a linear network.

| The resistance metric
This section defines the resistance metric d_R for a graph with Euclidean edges (Anderes et al., 2020) in the special case of a linear network L = ∪_{i=1}^m L_i as given in case (c) in Section 2.1. The section also summarises some properties of d_R and compares it with d_G.
Consider the graph G = (V, E) and its relation ∼ as defined above, and denote by d_V the classic (effective) resistance metric on V (Klein and Randić, 1993). Since d_R is an extension of d_V to L, we start by recalling the definition of d_V, using the following notation. Let u_0 ∈ V be an arbitrarily chosen vertex called the origin. For any u, v ∈ V, define the so-called conductance function by con(u, v) = 1/l_j if {u, v} = {a_j, b_j} for some j, and con(u, v) = 0 otherwise, and define a matrix Δ with rows and columns indexed by V so that its entry (u, v) is given by

Δ(u, v) = 1(u = v) (c(u) + 1(u = u_0)) − con(u, v),

where c(u) = ∑_{w ∈ V: w ∼ u} con(u, w) is the sum of the conductances associated to the edges incident to vertex u. In fact Δ is symmetric and strictly positive definite, and it is similar to the 'Laplacian matrix' obtained when viewing G as an electrical network over the nodes with resistors given by the length of each line segment (see e.g. Kigami, 2003; Jorgensen and Pearse, 2010), except that Δ has an additional 1 added at entry (u_0, u_0) (this makes Δ invertible). Now, let B_0 be a zero mean Gaussian vector indexed by V and having covariance matrix Σ = Δ^{−1}. Then the resistance metric on V is the variogram

d_V(u, v) = Var(B_0(u) − B_0(v)) = Σ(u, u) + Σ(v, v) − 2Σ(u, v).

Extend B_0 by linear interpolation to a zero mean Gaussian process (GP) Z_0 on L, so that Z_0(u_i(t)) = (1 − t/l_i) B_0(a_i) + (t/l_i) B_0(b_i) for t ∈ [0, l_i]. For i = 1, ..., m, define a zero mean Brownian bridge B_i on L_i so that B_i(a_i) = B_i(b_i) = 0, and define Z = Z_0 + ∑_{i=1}^m B_i, where B_0, B_1, ..., B_m are independent and B_i vanishes outside L_i. Finally, the resistance metric on L is defined by

d_R(u, v) = Var(Z(u) − Z(v)).

For the following theorem, which follows from Anderes et al. (2020, Propositions 2-4), we use the following terminology. A closed line segment in ℝ^k with endpoints a and b is denoted [a, b]. If all vertices in G are of order two, we say that L is a loop (since L is then isomorphic to a circle). If there is no loop, we say that L is a tree.
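To make the construction concrete, the sketch below (a toy example assuming NumPy; the three-vertex network and the choice of origin are hypothetical, much smaller than the paper's Chicago network) builds Δ from the conductances 1/l_i, adds 1 at entry (u_0, u_0), and evaluates the variogram d_V. On a tree the resistance metric coincides with the geodesic metric, so series resistances simply add:

```python
import numpy as np

# Toy tree network: vertices 0-1-2 on a path, edge lengths 1 and 2.
edges = [(0, 1, 1.0), (1, 2, 2.0)]
n = 3
u0 = 0  # arbitrarily chosen origin

# Delta is the graph Laplacian with conductances 1/l_i, plus an extra 1
# at entry (u0, u0) to make it invertible.
Delta = np.zeros((n, n))
for a, b, l in edges:
    con = 1.0 / l
    Delta[a, b] -= con
    Delta[b, a] -= con
    Delta[a, a] += con
    Delta[b, b] += con
Delta[u0, u0] += 1.0

Sigma = np.linalg.inv(Delta)  # covariance matrix of the Gaussian vector B_0

def d_V(u, v):
    # Resistance metric on V as the variogram of B_0.
    return Sigma[u, u] + Sigma[v, v] - 2.0 * Sigma[u, v]

# On a tree the resistance metric equals the geodesic metric:
print(d_V(0, 1))  # 1.0
print(d_V(0, 2))  # 3.0
```

On a network with parallel paths (a loop), the same computation would instead give d_V strictly smaller than the geodesic distance, in accordance with Kirchhoff's laws.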
In particular, d_R(u, v) ≤ d_G(u, v), with equality if and only if there is only one path connecting u and v.

While d_R is not as intuitive as d_G, it reflects the topology of L. Without loss of generality assume that u, v ∈ V, since if e.g. u ∈ L_j \ V, we may split L_j into the two line segments with endpoints {a_j, u} and {u, b_j}, and then consider a new graph with vertex set V ∪ {u} and the correspondingly modified edge set, cf. (B) in Theorem 1. Viewing the graph (V, E) as an electrical network with a resistor l_i at edge L_i, i = 1, ..., m, we have that d_R(u, v) is the effective resistance between u and v as obtained by Kirchhoff's laws. These laws are in accordance with (C). For example, for the Chicago street network in Figure 1, max d_R ≈ 675 feet is much smaller than max d_G ≈ 2031 feet, and it is also smaller than the side length of a square surrounding the network, which is a little less than 1000 feet. Finally, the result in (D) quantifies, for a circular network, how much smaller d_R will be than d_G. Of course, it could be debated whether using the resistance distance for the street network and in (D) are the right ways of quantifying connectedness, but at least we are not aware of any metric on L other than d_R which reflects the topology and is useful for constructing valid isotropic pair correlation functions (as considered in Section 3 and later on). Moreover, the following proposition and the remarks below show that d_R(u, v) is nicely behaved.
Proposition 1 For any u ∈ L_j and v ∈ L_i, let

Then A_i ≤ 0, with equality if and only if L_i is the only path connecting a_i and b_i, and d_R(u, v) satisfies the following.
so d_R(u, v), considered as a function of t, is linear (the case A_i = 0) or quadratic (the case A_i < 0) on each of the intervals [0, s] and [s, l_i], continuous on [0, l_i], and differentiable on [0, l_i] \ {s}. Here A_i ≤ 0, with equality if and only if L_i is the only path connecting a_i and b_i. From (1) and (2) we obtain (3) and (4) by a straightforward calculation, and thereby we easily see that d_R(u, v) as a function of t behaves as stated in (A) and (B). Finally, since d_R(u, v) is a concave function of t ∈ [0, l_i] in the case i ≠ j, the inequality (5) follows.
It follows from (3) and (4) that once Σ = Δ^{−1} has been calculated, d_R(u, v) can be quickly calculated for any u, v ∈ L. For example, the Chicago street network in Figure 1 has 338 vertices, and using standard methods, inversion of the 338 × 338 matrix Δ took less than 0.1 seconds. The inequality (5) becomes useful when searching for point pairs u, v ∈ L with d_R(u, v) ≤ r and v ∈ L_i, since we then only need to consider the cases where d_R(u, a_i) ≤ r or d_R(u, b_i) ≤ r.

| GAUSSIAN PROCESSES AND ISOTROPIC COVARIANCE FUNCTIONS
Let Y = {Y(u) | u ∈ L} be a Gaussian process (GP) where each Y(u) is a real-valued random variable. The distribution of Y is specified by the mean function µ(u) = E Y(u) and the covariance function c(u, v) = Cov(Y(u), Y(v)). The necessary and sufficient condition for a well-defined GP in terms of two such functions µ and c is just that c is symmetric and positive definite.

| Isotropic covariance functions
We are in particular interested in isotropic covariance functions c(u, v) = c_0(d(u, v)) for all u, v ∈ L, where with some abuse of terminology we also call c_0 a covariance function. So c_0 is required to be positive definite, that is, ∑_{j,ℓ=1}^n a_j a_ℓ c_0(d(u_j, u_ℓ)) ≥ 0 for all a_1, ..., a_n ∈ ℝ, all pairwise distinct u_1, ..., u_n ∈ S, and all n = 1, 2, .... Henceforth, we assume that the variance σ² = c_0(0) is strictly positive, and pay attention to the correlation function r_0(t) = c_0(t)/σ². Many of the commonly used isotropic correlation functions, including those in Table 1, are valid with respect to the resistance metric but not always with respect to the geodesic metric; the reason for this is discussed in this section. In Table 1, for comparison we consider isotropic correlation functions defined on other metric spaces (S, d), where d = ∥·∥ is the usual Euclidean metric if S = ℝ^k, and where d is the geodesic distance if S = 𝕊^k, the k-dimensional unit sphere. Typically (including in all of our examples), r_0 will be a completely monotone function. Recall that a function f: [0, ∞) → ℝ is completely monotone if it is non-negative and continuous on [0, ∞) and, for j = 0, 1, ... and all u > 0, the j-th derivative f^(j)(u) exists and satisfies (−1)^j f^(j)(u) ≥ 0.
TA B L E 1 Four parametric models for an isotropic correlation function r_0(t). Here, Γ is the gamma function, K_ν is the modified Bessel function of the second kind, φ is a scale parameter, τ is a shape parameter, and α is a smoothness parameter. The correlation functions are well-defined at all scales φ > 0, but the range of shape and smoothness parameters depends on the model and the space S. For S ∈ {ℝ^k, 𝕊^k}, the correlation functions are well-defined for every dimension k = 1, 2, .... For S = L, conditions on L may be needed if distance is measured not by the resistance but by the geodesic metric, see Section 3.1.

By Bernstein's theorem, f is completely monotone if and only if it is the Laplace transform of a non-negative finite measure on [0, ∞), meaning that for every t ≥ 0,

f(t) = f(0) ∫_0^∞ exp(−ts) dF(s),   (6)
where F is a cumulative distribution function with F(s) = 0 for s < 0. We refer to F as the Bernstein CDF corresponding to f. Thus, any non-negative valued distribution with a known Laplace transform can be used to produce a completely monotone function. This fact is used in the following example.

Example 1
The following functions f_1, f_2, f_3 are completely monotone functions with f_1(0) = f_2(0) = f_3(0) = 1, and they have corresponding Bernstein CDFs F_1, F_2, F_3 defined as follows. For τ > 0, φ > 0, and t ≥ 0,

f_1(t) = (1 + t/φ)^{−τ},  with F_1 the CDF of Γ(τ, φ),   (7)

f_2(t) = 2(φt)^{τ/2} K_τ(2√(φt)) / Γ(τ),  with F_2 the CDF of Γ^{−1}(τ, φ),   (8)

where Γ(τ, φ) denotes the gamma distribution with shape parameter τ and rate parameter φ, and where Γ^{−1}(τ, φ) denotes the inverse gamma distribution with shape parameter τ and scale parameter φ. Moreover, for ψ > 0, χ > 0, λ ∈ ℝ, and t ≥ 0,

f_3(t) = (ψ/(ψ + 2t))^{λ/2} K_λ(√(χ(ψ + 2t))) / K_λ(√(χψ)),   (9)

and F_3 is the CDF for a generalized inverse Gaussian distribution with probability density function

p(s) = (ψ/χ)^{λ/2} s^{λ−1} exp(−(ψs + χ/s)/2) / (2 K_λ(√(ψχ))),  s > 0.

In the next theorem, which summarises Theorems 1 and 2 in Anderes et al. (2020), we need the following definition. We say that L is a 1-sum of L¹ = L_1 ∪ ... ∪ L_j and L² = L_{j+1} ∪ ... ∪ L_m if L¹ and L² are (connected) linear networks where 1 ≤ j < m, L¹ ∩ L² = {u_0} consists of a single point u_0, and d(u, v) = d(u, u_0) + d(u_0, v) whenever u ∈ L¹ and v ∈ L². This property is possible if d = d_G or d = d_R, but unless L is a straight line segment it is impossible if d is given by the usual Euclidean distance. Using induction, for n = 3, 4, ... we say that L is a 1-sum of linear networks L¹, ..., Lⁿ if L is a 1-sum of L¹ ∪ ... ∪ L^{n−1} and Lⁿ, and L¹ ∪ ... ∪ L^{n−1} is a 1-sum of L¹, ..., L^{n−1}.
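As a quick numerical sanity check of the pair (f_1, F_1) above (a sketch assuming NumPy; the parameter values are arbitrary), the empirical Laplace transform of a Γ(τ, φ) sample should reproduce f_1(t) = (1 + t/φ)^{−τ}:

```python
import numpy as np

rng = np.random.default_rng(0)
tau, phi = 2.0, 1.5  # shape and rate (illustrative values)

# f_1(t) = E[exp(-t S)] with S ~ Gamma(shape=tau, rate=phi) is the
# completely monotone function (1 + t/phi)^(-tau).
S = rng.gamma(shape=tau, scale=1.0 / phi, size=200_000)
for t in [0.0, 0.5, 2.0]:
    mc = np.exp(-t * S).mean()         # Laplace transform by Monte Carlo
    exact = (1.0 + t / phi) ** (-tau)  # closed form f_1(t)
    print(f"t={t}: MC {mc:.4f} vs exact {exact:.4f}")
```

The same check applies to f_2 and f_3 with inverse gamma and generalized inverse Gaussian samples, respectively.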
Theorem 2 Let f: [0, ∞) → ℝ be a completely monotone and non-constant function. Then f(d_R(u, v)) is strictly positive definite over (u, v) ∈ L × L. Moreover, if L is a 1-sum of trees and loops, then f(d_G(u, v)) is strictly positive definite over (u, v) ∈ L × L. However, if there are three distinct paths between two points on L, then there exists a constant φ > 0 so that f(d_G(u, v)/φ) is not positive definite over (u, v) ∈ L × L.

In Table 1, for each model, r_0 is completely monotone for the stated ranges of the parameters (cf. the comments to Theorem 1 in Anderes et al., 2020). So, for each example of r_0 in Table 1 and for r_0 given by f_1, f_2, or f_3 in (7)-(9), r_0(d_R(u, v)) is a valid correlation function on any linear network L, whilst r_0(d_G(u, v)) is a valid correlation function if L is a 1-sum of trees and loops. In Table 1, the ranges of the parameters are the same for the two cases S = 𝕊^k and S = L (in agreement with the fact that 𝕊^1 is isomorphic to L if L is a loop). Finally, (7) is the special case of the generalized Cauchy function in Table 1 when α = 1, whilst (8) and (9) are not covered by Table 1.
For example, consider the Chicago street network in the left panel of Figure 1. Here L is not a 1-sum of trees and loops, and therefore we cannot use the geodesic metric for the covariance function models related to the Chicago street network; see also the counterexamples in Anderes et al. (2020, Section 5). On the other hand, the dendrite data shown in the right panel of Figure 1 are observed on a tree, so here we can use the geodesic/resistance metric (by Theorem 2(C) the two metrics are equal in this case).

| Simulation of GPs on linear networks
This section discusses how to simulate a GP Y = {Y(u) | u ∈ L} using three different algorithms, which are all available in our package coxln. We assume without loss of generality that the mean function of Y is zero.
The following is a straightforward algorithm applicable to any metric d and any linear network L.
Algorithm 1 Select a finite subset D ⊂ L and perform the following steps.
• Simulate Y restricted to D, e.g. by using eigenvalue decomposition of the corresponding covariance matrix Σ_D.
• For u ∈ L \ D, approximate Y(u) by the average of those Y(v) where v ∈ D is closest to u with respect to d_G.
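The two steps above can be sketched as follows (a minimal illustration assuming NumPy, with the grid restricted to a single segment; the exponential covariance, grid size, and nearest-point lookup are illustrative simplifications, not the coxln implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical grid D on a single segment of length 5, with exponential
# covariance c(u, v) = sigma^2 * exp(-s * d(u, v)); on one segment the
# metric reduces to |u - v|.
sigma, s = 1.0, 0.8
D = np.linspace(0.0, 5.0, 200)
Sigma_D = sigma**2 * np.exp(-s * np.abs(D[:, None] - D[None, :]))

# Step 1: simulate Y on D via eigenvalue decomposition of Sigma_D.
eigval, eigvec = np.linalg.eigh(Sigma_D)
eigval = np.clip(eigval, 0.0, None)  # guard against tiny negative round-off
Y = eigvec @ (np.sqrt(eigval) * rng.standard_normal(len(D)))

# Step 2: off-grid points are approximated by the value at the nearest grid point.
def Y_at(u):
    return Y[np.argmin(np.abs(D - u))]

print(Y.shape)  # (200,)
```

For a whole network, the matrix Σ_D is built from d_G or d_R between all pairs of grid points, which is exactly why its dimension (and the cost of the decomposition) grows quickly.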
Specifically, we have chosen a grid D = V ∪ D_1 ∪ ... ∪ D_m, where each D_i is a fine discretization of L_i as described after the proof of Theorem 4 below. The disadvantage of Algorithm 1 is of course that the dimension of Σ_D can be large, and hence eigenvalue decomposition (as well as other methods) can be slow. Algorithm 2 below is much faster, but it requires L to be a tree and c(u, v) = σ² exp(−s d(u, v)) to be an exponential covariance function with parameters σ > 0 and s > 0 (since L is a tree, d = d_G = d_R; the exponential correlation function appears as two special cases in Table 1 with scale parameter φ = 1/s, namely the powered exponential model with α = 1 and the Matérn model with α = 1/2). But we first need to establish a Markov property given in the following theorem, where we denote the shortest path between u, v ∈ L by p_uv.
Theorem 3 Suppose that Y is a GP on a tree L with exponential covariance function c(u, v) = σ² exp(−s d(u, v)), where σ > 0, s > 0, and d = d_G = d_R. For n = 1, 2, ... and all pairwise distinct points u, v, w_1, ..., w_n ∈ L such that w_i ∈ p_uv for at least one i, we have that Y(u) and Y(v) are conditionally independent given Y(w_1), ..., Y(w_n).
Proof Consider first the case n = 1 with w = w_1 ∈ p_uv. Inverting the covariance matrix of (Y(u), Y(v), Y(w)), we get that the corresponding precision matrix has 0 at entries (1, 2) and (2, 1), thus implying that Y(u) and Y(v) are conditionally independent given Y(w). The case n = 2 and the cases with n ≥ 3 are verified in a similar way.
We use the following notation in Algorithm 2. Pick an arbitrary origin u_0 ∈ V and set G_0(u_0) = {u_0}. For j = 1, 2, ..., let G_j(u_0) denote the j-th generation of vertices as seen from u_0, that is, the vertices v ∈ V for which the unique path from u_0 to v contains exactly j line segments. For u ∼ v, let (u, v] denote the half-open line segment with endpoints u and v, so that u is excluded and v is included.
Algorithm 2 Suppose that L is a tree.Let s > 0 and σ > 0 be parameters, and pick an arbitrary vertex u 0 ∈ V .Construct random variables Y (w ) for all w ∈ L by the following steps.
(I) Generate Y(u_0) from the zero mean normal distribution with variance σ².

(II) For j = 1, 2, ..., conditioned on all the Y(w) so far generated, generate independent GPs Y_{(u,v]} for all u ∈ G_{j−1}(u_0) and all v ∈ G_j(u_0) with u ∼ v, where Y_{(u,v]} depends only on Y(u) and for every w, w_1, w_2 ∈ (u, v] we have

E[Y(w) | Y(u)] = exp(−s d(u, w)) Y(u),   (10)

Cov(Y(w_1), Y(w_2) | Y(u)) = σ² [exp(−s d(w_1, w_2)) − exp(−s (d(u, w_1) + d(u, w_2)))].   (11)

Theorem 4 Let the situation be as in Algorithm 2. Then Y is a zero mean GP on L with exponential covariance function c(u, v) = σ² exp(−s d(u, v)).

Proof By considering the GPs associated to the successive generations from u_0, we prove by induction that the iterative construction in Algorithm 2 makes Y a GP as stated. Clearly the distribution of Y(u_0) is correctly generated, cf. (I). This is induction step j = 0.
For induction step j ≥ 1, we condition on all the Y(w) generated so far, i.e., either w is equal to u_0 or Y(w) is a member of a GP associated to two neighbouring vertices, one being a child of the i-th generation to u_0 and the other being a child of the (i − 1)-th generation to u_0, where 1 ≤ i ≤ j, cf. (I) and (II) in Algorithm 2 (here we interpret u_0 as the child of the 0-th generation to u_0). By the induction hypothesis, these Y(w) have been correctly generated. If we take two points w_1 and w_2 contained in different line segments between points in G_{j−1}(u_0) and G_j(u_0), then by Theorem 3, Y(w_1) and Y(w_2) are (conditionally) independent, which is in accordance with our construction. So it suffices to consider the (conditional) distribution of (Y(w_1), Y(w_2)) when w_1, w_2 ∈ (u, v] with u ∈ G_{j−1}(u_0), v ∈ G_j(u_0), and u ∼ v. This (conditional) distribution depends only on Y(u), cf. Theorem 3, and it is straightforwardly seen to be a bivariate normal distribution with mean and covariance matrix given by (10) and (11), respectively.
In practice, when using Algorithm 2 for simulation, a discretization of each line segment is needed. For some integer n_j > 0 (which may depend on l_j) and i ∈ {0, 1, ..., n_j}, define s_i = i l_j / n_j. Then, for each u ∈ L_j, if u is the midpoint between u_j(s_{i−1}) and u_j(s_i), we approximate Y(u) by the average (Y(u_j(s_{i−1})) + Y(u_j(s_i)))/2, and otherwise, if u is closest to u_j(s_i), we approximate Y(u) by Y(u_j(s_i)). For example, if u is closest to u_j(s_i), the approximation error is bounded in probability by

P(|Y(u) − Y(u_j(s_i))| > t) ≤ 2 [1 − Φ(t / (σ √(2(1 − exp(−s l_j / (2 n_j))))))]   (12)

for t ≥ 0, where Φ is the distribution function of the standard normal distribution. (A similar bound applies in the case of Algorithm 1, where in (12) we replace the exponential correlation function by the correlation function of Y.) Further, to generate the n_j − 1 normal variables Y(u_j(s_i)), i = 1, ..., n_j − 1, we start by generating the variable Y(u_j(s_1)) in accordance with (10) and (11) with w = w_1 = w_2 = u_j(s_1). Then we can add u_j(s_1) to the vertex set, whereby we split L_j into the two line segments given by this new vertex and the endpoints of L_j. Hence, if n_j > 1, we can repeat the procedure when generating Y(u_j(s_2)), and so on until all the n_j − 1 normal variables have been generated.
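On a single segment, the conditional mean (10) and conditional variance (11) with w_1 = w_2 reduce to an autoregressive update, so the sequential step of Algorithm 2 can be sketched as follows (a minimal illustration assuming NumPy; segment length, grid size, and parameters are arbitrary, and a full implementation would branch at each vertex, continuing one independent copy of this recursion per outgoing segment):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, s = 1.0, 0.5

# Sequential simulation of an exponential GP along one line segment of
# length l_j, discretized into n_j steps. By the Markov property, given
# Y at the previous grid point, the next value is normal with mean
# exp(-s*dt) * previous (cf. (10)) and variance sigma^2 * (1 - exp(-2*s*dt))
# (cf. (11) with w_1 = w_2).
l_j, n_j = 10.0, 1000
dt = l_j / n_j
a = np.exp(-s * dt)

Y = np.empty(n_j + 1)
Y[0] = sigma * rng.standard_normal()  # step (I): the value at the first vertex
for i in range(1, n_j + 1):
    Y[i] = a * Y[i - 1] + sigma * np.sqrt(1.0 - a**2) * rng.standard_normal()
```

Each update costs O(1), so the whole tree is simulated in time linear in the number of grid points, in contrast to the cubic cost of the decomposition in Algorithm 1.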
Theorems 3 and 4 do not hold if L is not a tree. Indeed, for a GP on the circle 𝕊^1 (which is equivalent to a loop of length 2π) with exponential covariance function, considering four arbitrary distinct points on 𝕊^1, it can be shown that the GP is not Markov by calculating the precision matrix of the GP at these four points. On the other hand, Pitt (1971) verified that the GP on 𝕊^1 with covariance function c is Markov, but considering two arbitrary points on a tree together with a point on the path connecting them, it can be shown that the GP on the tree with covariance function c is not Markov. Consequently, we cannot have a covariance function depending only on the geodesic distance which, for an arbitrary linear network, makes a GP Markov. Moreover, in general, if we want to simulate a GP with an exponential covariance function on a linear network which is not a tree, we cannot rely on Markov properties. Instead we have to use the straightforward but slower Algorithm 1.
For covariance functions other than the exponential, we may use the following theorem, which follows immediately from (6) and the central limit theorem.
Theorem 5 Suppose d is a metric on L so that (u, v) ↦ exp(−s d(u, v)) for (u, v) ∈ L² is a well-defined correlation function for all s > 0, and let Y be a zero mean GP on L with covariance function

c(u, v) = σ² ∫_0^∞ exp(−s d(u, v)) dF(s),   (13)

where σ > 0 and F is a CDF with F(s) = 0 for s < 0. For an integer n > 0 and i = 1, ..., n, suppose we generate first S_i from F and second Y_i as a zero mean GP on L with covariance function σ² exp(−S_i d(u, v)), where (S_1, Y_1), ..., (S_n, Y_n) are independent, and return Y(u) = (Y_1(u) + ... + Y_n(u))/√n for u ∈ L (Algorithm 3). Conditioned on S_1, ..., S_n, the output in Algorithm 3 is an exact simulation of a zero mean GP with covariance function

c_n(u, v) = (σ²/n) ∑_{i=1}^n exp(−S_i d(u, v)),   (14)

and hence an approximate simulation of a zero mean GP with covariance function (13) when n is large.

The running times of Algorithms 1-3 for simulating an approximate GP on a tree are compared in the following example, and the choice of n in Algorithm 3 needed to obtain that (14) is a good approximation of the covariance function (13) is considered in the example thereafter.
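Algorithm 3 can be sketched on a single segment (assuming NumPy; the grid and parameter values are illustrative, and Cholesky factorization stands in for whichever exact simulation method is used for the exponential GPs). With F the Γ(τ, φ) CDF from Example 1, the target covariance is σ²(1 + d/φ)^{−τ}, and the realized covariance (14) should be close to it for moderately large n:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, tau, phi, n = 1.0, 2.0, 1.0, 200

# Grid on a single segment; target covariance sigma^2 * (1 + d/phi)^(-tau),
# whose Bernstein CDF F is that of Gamma(shape=tau, rate=phi).
D = np.linspace(0.0, 5.0, 100)
dist = np.abs(D[:, None] - D[None, :])
S = rng.gamma(shape=tau, scale=1.0 / phi, size=n)  # S_1, ..., S_n from F

# Average n independent exponential GPs, scaled by 1/sqrt(n).
Y = np.zeros(len(D))
for S_i in S:
    C = sigma**2 * np.exp(-S_i * dist)
    Y += np.linalg.cholesky(C + 1e-9 * np.eye(len(D))) @ rng.standard_normal(len(D))
Y /= np.sqrt(n)

# The realized covariance (sigma^2/n) * sum_i exp(-S_i * d) approximates
# the target covariance; compare along the first row of the distance matrix.
approx = (sigma**2 / n) * np.exp(-S[:, None] * dist[0][None, :]).sum(axis=0)
target = sigma**2 * (1.0 + dist[0] / phi) ** (-tau)
print(np.max(np.abs(approx - target)))
```

The maximum discrepancy printed at the end shrinks at the Monte Carlo rate 1/√n as n grows.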
Example 2 Let L be the dendrite tree in Figure 1, and consider the correlation function given by (7) with (τ, φ) = (1, 1) and with (τ, φ) = (5, 5). For these choices of parameters and for n = 20, 50, 200, Figure 2 shows the correlation functions along with 100 approximations given by (14). For n = 20 the approximations show a lot of variability around the true correlation functions, while for n = 200 the approximations are much closer to the true correlation functions. It is also evident from the figure that a higher n is required to obtain a good approximation for (τ, φ) = (1, 1) than for (τ, φ) = (5, 5). This is expected since the variance of Γ(τ, φ) is larger for (τ, φ) = (1, 1) than for (τ, φ) = (5, 5).

If L is not a tree, Algorithms 2 and 3 cannot be used, so we use the straightforward Algorithm 1. The figure shows that the correlation is smallest when r is fixed and rather similar when using the Γ or Γ^{−1} distribution, although these distributions are rather distinct (e.g. the variance is finite for Γ(1, 100) but infinite for Γ^{−1}(2, 0.01)). Accordingly, in plot 2 we see less smoothness than in plots 3 and 4, which show a similar degree of smoothness.

| POINT PROCESSES AND SOME OF THEIR CHARACTERISTICS
This section reviews point processes, moment properties, and inference for point process models on linear networks, and it provides the background material needed for Sections 5-6. Readers who are familiar with the general theory of point processes defined on a metric space may skim many parts of Sections 4.1, 4.2, and 4.4. Section 4.3 contains new results for the K-function on a linear network.

| Point process setting
We restrict attention to point processes whose realisations can be viewed as finite subsets x of L. Let B be the class of Borel sets A ⊆ L, and let the state space be N = {x ⊆ L | n(x) < ∞}, where n(x) denotes the cardinality of x, equipped with a suitable σ-algebra F. Then by a point process is meant a random variable X with values in N (in the terminology of point process theory, X is a simple locally finite point process, see e.g. Daley and Vere-Jones, 2003). For every A ∈ B, the count N(A) = n(X ∩ A) is then a random variable. We say that X is a Poisson process with intensity function ρ: L → [0, ∞) if N(L) is Poisson distributed with mean ∫_L ρ dν, and conditioned on N(L), the points in X are independent and each point has a density proportional to ρ with respect to ν.
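The definition translates directly into a simulation recipe for the homogeneous case (a sketch assuming NumPy; the toy network and the intensity ρ are hypothetical): draw N(L) ~ Poisson(ρ|L|), then pick segments with probability proportional to their lengths and place points uniformly along them.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical network: segments given by endpoint coordinates in R^2.
segments = [((0.0, 0.0), (1.0, 0.0)), ((1.0, 0.0), (1.0, 2.0)), ((1.0, 0.0), (2.0, 1.0))]
lengths = np.array([np.hypot(b[0] - a[0], b[1] - a[1]) for a, b in segments])
total = lengths.sum()  # |L|

# Homogeneous Poisson process on L with intensity rho: the number of points
# is Poisson(rho * |L|); given the count, points fall independently and
# uniformly with respect to arc length.
rho = 5.0
n_pts = rng.poisson(rho * total)
seg_idx = rng.choice(len(segments), size=n_pts, p=lengths / total)
t = rng.uniform(0.0, 1.0, size=n_pts)  # fraction along the chosen segment
pts = np.array([
    (1.0 - ti) * np.asarray(segments[i][0]) + ti * np.asarray(segments[i][1])
    for i, ti in zip(seg_idx, t)
]).reshape(n_pts, 2)
print(n_pts, pts.shape)
```

For an inhomogeneous intensity ρ(u), the same scheme applies with independent thinning or with the position density proportional to ρ along each segment.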
Middle row: when r_0 has Bernstein CDF given by a gamma distribution (left) or an inverse gamma distribution (right), where in both cases the mean of s is 0.01. Bottom row: for plots 2-4, the corresponding densities for F and correlation functions, where the curves in solid, dashed, and dotted correspond to plots 2-4, respectively.

For every u ∈ L, we let X_u be the point process which follows the reduced Palm distribution of X at u, that is,

E ∑_{u ∈ X} h(X \ {u}, u) = ∫_L E[h(X_u, u)] ρ(u) dν(u)
for any non-negative measurable function h defined on N × L (equipped with the product σ-algebra of F and B).
Intuitively, X_u follows the distribution of X \ {u} conditioned on u ∈ X (see e.g. Møller and Waagepetersen, 2004, Appendix C). If ρ(u) = 0, X_u may follow an arbitrary distribution. If X is a Poisson process with intensity function ρ, then X and X_u are identically distributed whenever ρ(u) > 0.
Let X_1 denote the Poisson process with intensity 1. Suppose the distribution of X is absolutely continuous with respect to the distribution of X_1, and let f be the density of (the distribution of) X with respect to (the distribution of) X_1.

| Moment properties
The point process X has n-th order intensity function ρ(u_1, ..., u_n) (with respect to the n-fold product measure of ν) if this is a non-negative Borel function so that

E[N(A_1) ⋯ N(A_n)] = ∫_{A_1} ⋯ ∫_{A_n} ρ(u_1, ..., u_n) dν(u_1) ⋯ dν(u_n)

for every pairwise disjoint sets A_1, ..., A_n ∈ B (ρ(u_1, ..., u_n) is also called the n-th order product density of the n-th order reduced moment measure). Thus ρ(u_1, ..., u_n) is an integrable function, which is almost everywhere unique on L^n (with respect to the n-fold product measure of ν). In the following, for simplicity, null sets are ignored, and so is the resulting non-uniqueness of ρ(u_1, ..., u_n). Moreover, when we write ρ(u_1, ..., u_n) it is implicitly assumed that the n-th order intensity function exists.
In particular, ρ(u) is the usual intensity function, and the point process is said to be (first-order) homogeneous if ρ(u) is constant. Instead of the second order intensity function ρ(u, v), one usually considers the pair correlation function (pcf) given by

g(u, v) = ρ(u, v) / (ρ(u) ρ(v)),   (16)

setting a/0 = 0 for a ≥ 0, and one usually assumes that g is isotropic, that is,

g(u, v) = g_0(d(u, v)).   (17)

For a Poisson process, ρ(u, v) = ρ(u) ρ(v), so g = 1. One often interprets g_0(t) < 1 as repulsion/inhibition and g_0(t) > 1 as attraction/clustering for point pairs at distance t apart, but one should be careful with this interpretation as t grows.
The specific models in this paper are typically attractive (g > 1) and satisfy the following stronger property: for n = 2, 3, ..., i = 1, ..., n − 1, and any pairwise distinct u_1, ..., u_n ∈ L,

ρ(u_1, ..., u_n) ≥ ρ(u_1, ..., u_i) ρ(u_{i+1}, ..., u_n),   (18)

or equivalently, for any pairwise disjoint sets A_1, A_2, ... ∈ B,

Cov[N(A_1) ⋯ N(A_i), N(A_{i+1}) ⋯ N(A_n)] ≥ 0.   (19)

So (19) means that the counts N(A_1), N(A_2), ... are positively correlated at all orders, and for short we shall say that X is positively correlated at all orders.
Non-parametric estimation of ρ and g is discussed in Rakshit et al. (2017), Rakshit et al. (2019), and Baddeley et al. (2021). For non-parametric estimation of the pcf, kernel methods are usually used under the assumption (17) of isotropy (but we do not need to assume that X is (first-order) homogeneous).

| K -function
Since kernel methods are sensitive to the choice of bandwidth, popular alternatives are given by estimators of the K -function defined in (20) below.On the other hand, for the specific parametric models in this paper, we have simple expressions for g 0 but not for K , and since the K -function is an accumulated version of g 0 , it may be harder to interpret.
Suppose g(u, v) = g_0(δ(u, v)), where δ is a metric on L (since we consider derivatives below, it is convenient to switch from the previous notation d to δ). Then X is said to be δ-correlated, cf. Rakshit et al. (2017); it is also said to be second-order reweighted pseudostationary (Ang et al., 2012). Following Rakshit et al. (2017) and Ang et al. (2012), define the K-function by

K(t) = ∫_0^t g_0(s) ds.   (20)

Note that K depends only on g_0, but g depends on both g_0 and δ. If X is a Poisson process, then K(t) = t.
Non-parametric estimation of K was carefully studied in Rakshit et al. (2017) under the following technical assumption on the metric. Suppose δ is regular, meaning that for every u ∈ L, δ(u, v) is a continuous function of v ∈ L, and there is a finite set N ⊂ L such that for i = 1, ..., m and all v ∈ L_i \ N, the Jacobian J_δ(u, v) = (∂/∂t) δ(u, u_i(t)) exists and is non-zero, where t = ∥v − a_i∥. Both d_G and d_R are regular, where J_{d_G} = 1, and a useful expression for the calculation of J_{d_R} is given in the following corollary, which follows immediately from Proposition 1.
Corollary 1 For all u ∈ L_j and v ∈ L_i with u ≠ v, using the notation of Proposition 1, we have

Some final remarks are in order.
For u ∈ L and 0 ≤ t ≤ R, define

This is a weight which accounts for the geometry of the linear network when shifting from arc length measure on L to Lebesgue measure on the positive half-line (Rakshit et al., 2017, Propositions 1 and 2). For δ = d_G, we have a simple explicit expression, and for δ = d_R, once the matrix Σ from Section 2.2 has been calculated, w_{d_R} is quickly calculated from (22).
It follows from (16), (20), and Rakshit et al. (2017, Equation (8)) that for any A ∈ B of positive arc length measure,

Non-parametric estimators of K are based on omitting the expectation symbol in (23), possibly after elaborating on the right hand side of (23) in order to realize how correction factors can be included to adjust for edge effects, cf. Rakshit et al. (2017). We may use (23) as a more general definition of K without assuming the existence of the pcf, but requiring that the right hand side of (23) does not depend on the choice of A, cf. Baddeley et al. (2000).
In terms of Palm probabilities, for ν-almost all u ∈ L with ρ(u) > 0,

In general, the weight makes it hard to interpret this expression for K. For point processes with points in ℝ^k or 𝕊^k, there are much simpler expressions for K-functions in terms of Palm probabilities, see e.g. Baddeley et al. (2000) and Møller and Rubak (2016). This is because a translation is a natural transitive group action on ℝ^k and a rotation is a natural transitive group action on 𝕊^k; however, there is no natural transitive group action on a linear network.

| Estimation and model checking
For parametric families of Cox point process models the most common estimation methods are based on the intensity, pair correlation, or K-functions using either minimum contrast estimation, composite likelihood, or Palm likelihoods, see Møller and Waagepetersen (2007, 2017) and the references therein. For the analyses in Section 6, we use minimum contrast estimation for fitting the Cox process models in Section 5.2 to various datasets in Section 6 using the pcf (we discuss this choice of estimation method in Section 7.3): if g_0 depends on a parameter, we estimate this parameter by minimizing the integral D, where ĝ_0 is a non-parametric estimate of g_0 (see e.g. (31) in Rakshit et al., 2017) and 0 ≤ a_1 < a_2 and p, q > 0 are user-specified values (see Section 6). The models in Section 5.2 all have nice expressions for the pcf but not for the K-function. Moreover, the models used for the data analyses in Section 6 include a parameter for the intensity function, which g_0 does not depend on; this parameter is estimated by a simple moment method, and in the simplest case homogeneity is assumed and the intensity is estimated by ρ̂ = n(x)/|L|.
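As a concrete illustration, the minimum contrast step can be sketched as follows: discretize the contrast integral and minimize over the parameters. This is a minimal sketch with a synthetic "non-parametric estimate", an assumed LGCP-type pcf g_0(t) = exp(σ² e^{−st}), and a crude grid search in place of a numerical optimizer; all numerical values, and the choices of a_1, a_2, p, q, are illustrative.

```python
import numpy as np

# Minimal sketch of minimum contrast estimation based on the pcf.
# The pcf g0(t) = exp(sigma2 * exp(-s*t)) is the LGCP form with an
# exponential covariance; the "non-parametric estimate" ghat is synthetic.
def g0_lgcp(t, sigma2, s):
    return np.exp(sigma2 * np.exp(-s * t))

def contrast(ghat, t, theta, a1=20.0, a2=100.0, p=2.0, q=0.25):
    # Discretised version of D = int_{a1}^{a2} |ghat^q - g0^q|^p dt
    mask = (t >= a1) & (t <= a2)
    dt = t[1] - t[0]
    diff = np.abs(ghat[mask] ** q - g0_lgcp(t[mask], *theta) ** q) ** p
    return diff.sum() * dt

t = np.linspace(0.0, 120.0, 500)
rng = np.random.default_rng(1)
# Synthetic estimate: the true pcf perturbed by small multiplicative noise
ghat = g0_lgcp(t, 0.8, 0.05) * np.exp(rng.normal(0.0, 0.01, t.size))

# Crude grid search over theta = (sigma2, s); a real analysis would use
# a numerical optimizer instead
grid = [(s2, s) for s2 in np.linspace(0.2, 1.5, 27)
        for s in np.linspace(0.01, 0.12, 23)]
est = min(grid, key=lambda th: contrast(ghat, t, th))
print(est)  # close to the truth (0.8, 0.05)
```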
For model checking, other functional summary statistics are needed when ρ, g, or K and their corresponding non-parametric estimators have been used for estimation. Cronie et al. (2020) suggested analogues of the so-called F, G, J-functions (introduced in Van Lieshout, 2011, when considering point processes with points in ℝ^k) which account for the geometry of the linear network. Cronie et al. (2020) showed that their F, G, J-functions make good sense under a certain condition called intensity reweighted moment pseudostationarity (IRMPS): X is IRMPS if inf ρ > 0 and δ is a regular metric on L such that for n = 2, 3, . . ., any pairwise distinct u_1, . . ., u_n ∈ L, and any u ∈ L, the correlation function g(u_1, . . ., u_n) can be expressed through some function g_0, cf. (25). This condition is satisfied if X is either a Poisson process or a log Gaussian Cox process (LGCP) with a stationary pair correlation function (which is usually not a natural assumption, cf. Section 5.2.1). We show in Section 5.2.1 that IRMPS is in general not satisfied for the LGCP. For the simple sequential inhibition (SSI) point process in Cronie et al. (2020) it is hard to evaluate g(u_1, . . ., u_n) and hence to check the assumption of IRMPS.
Alternatively, Christensen and Møller (2020) introduced three purely empirical summary functions obtained by modifying the empirical F , G, J -functions for (inhomogeneous) point patterns on a Euclidean space to linear networks.
Briefly, the modification consists of replacing the Euclidean space with the linear network, introducing the shortest path distance instead of the Euclidean distance, and adapting the notion of an eroded set to linear networks.
Christensen and Møller (2020) demonstrated the usefulness of their empirical F , G, J -functions for model checking although underlying theoretical functions are missing.We also use these functions for the data analyses in Section 6.
Specifically, we will consider a concatenation of the three functions and validate a fitted model using a 95% global envelope (i.e., a 95% confidence region for the concatenated function).

| COX PROCESSES DRIVEN BY TRANSFORMED GAUSSIAN PROCESSES
This section introduces Cox processes on linear networks and in particular various model classes obtained by a transformed Gaussian process (interpreted as a random intensity function). Since such Cox process models are introduced here for the first time they deserve some attention, although readers who are familiar with similar models defined on ℝ^k or 𝕊^k may skim many parts of Sections 5.1-5.2. Moreover, Section 5.3 introduces an index of cluster strength.

| Cox processes on linear networks
Let Λ = {Λ(u) | u ∈ L} be a non-negative stochastic process and for any A ∈ B define the random measure ξ(A) = ∫_A Λ(u) dν(u). Throughout this section we assume that almost surely ξ(L) is finite, and we let X be a Cox process driven by Λ: that is, X is a point process on L which conditioned on Λ is almost surely a Poisson process with intensity function Λ (with respect to ν). Usually in applications Λ is unobserved, and so if only one point pattern dataset is observed the Cox process X is indistinguishable from the inhomogeneous Poisson process X | Λ (for a discussion of which of the two models is most appropriate, see Møller and Waagepetersen, 2004, Section 5.1).
Henceforth, assume EΛ(u) is an integrable function with respect to ν. Then the Cox process X is well-defined, since almost surely ξ(L) < ∞. By conditioning on Λ it follows from (16) and (18) that ρ(u_1, . . ., u_n) = E[Λ(u_1) ⋯ Λ(u_n)] (26). Let W ⊆ L be a bounded Borel set with ν(W) > 0; we think of W as an observation window. Then X_W is a Cox process driven by Λ restricted to W, and X_W has a density with respect to the distribution of X_1 ∩ W, where X_1 is the unit rate Poisson process, cf. Section 4.1. In particular, if W = L and u ∈ L with ρ(u) > 0, then X^u has a density with respect to X_1, cf. (15). In general the densities in (27) and (28) are intractable because the expected values are difficult to evaluate. Instead we exploit (26) for calculating the intensity and pair correlation functions and for making inference as demonstrated in the following sections.
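To make the two-step construction concrete, here is a minimal simulation sketch on a single line segment, standing in for one segment of a network: draw a realisation of Λ on a fine grid (here a log-Gaussian random intensity, purely as an example), then, conditionally on Λ, draw a Poisson number of points and place them by the discretised inverse-CDF of Λ. All parameter values are illustrative.

```python
import numpy as np

# Sketch: simulate a Cox process on one segment [0, T], conditioning on
# a realisation of Lambda drawn on a fine grid.
rng = np.random.default_rng(7)
T, n_grid = 100.0, 800
t = np.linspace(0.0, T, n_grid)
dt = t[1] - t[0]

# Example random intensity: Lambda = rho * exp(Y) with Y a zero-mean-
# corrected GP with exponential covariance (illustrative parameters)
sigma2, s, rho = 1.0, 0.1, 0.05
C = sigma2 * np.exp(-s * np.abs(t[:, None] - t[None, :]))
Y = rng.multivariate_normal(np.full(n_grid, -sigma2 / 2.0), C)
Lam = rho * np.exp(Y)

# Given Lambda: Poisson number of points, placed by the inverse-CDF
total = Lam.sum() * dt
n = rng.poisson(total)
cdf = np.cumsum(Lam) * dt / total
idx = np.minimum(np.searchsorted(cdf, rng.uniform(size=n)), n_grid - 1)
points = np.sort(t[idx])
```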
As pointed out in Møller and Waagepetersen (2007) it is useful to write Λ as Λ(u) = ρ(u)Λ_0(u), where Λ_0 = {Λ_0(u) | u ∈ S} is a non-negative 'residual' stochastic process with EΛ_0(u) = 1 whenever ρ(u) > 0. Then ρ(u_1, . . ., u_n) = ρ(u_1) ⋯ ρ(u_n) E[Λ_0(u_1) ⋯ Λ_0(u_n)]. Typically, only ρ(u) is allowed to depend on covariate information, whilst Λ_0 is considered to account for unobserved covariates or other effects which have not been successfully fitted by a Poisson process with intensity function ρ, see e.g. Møller and Waagepetersen (2007) and Diggle (2014). If (31) holds for n = 2, 3, . . ., i = 2, . . ., n, and all pairwise distinct u_1, . . ., u_n ∈ S, then X is positively correlated at all orders, cf. (19).
The condition (31) will often be satisfied in the following.

| Models
Consider a GP Y = {Y (u ) | u ∈ L } with mean function µ and covariance function c, and let Y 1 , . . .,Y h be independent copies of Y .In the remainder of this paper we study the following models.
• X is a log Gaussian Cox process (LGCP) if Λ_0(u) = exp(Y(u)) and μ(u) = −c(u, u)/2 for all u ∈ L. The latter condition is required since we want EΛ_0(u) = 1.
• X is an interrupted Cox process (ICP) if μ = 0 and Λ_0(u) = Π(u)/E[Π(u)] for all u ∈ L, where we define Π(u) = exp(−Σ_{i=1}^h Y_i(u)²) (this definition differs slightly from the one used in Lavancier and Møller (2016), which includes a factor 1/2 inside the exponential function).
• X is a permanental Cox point process (PCPP) if μ = 0, c(u, u) = 1 for all u ∈ L, and Λ_0(u) = (1/h) Σ_{i=1}^h Y_i(u)², so that EΛ_0(u) = c(u, u) = 1.
In all cases, the distribution of X is completely specified by (ρ, c) and, in the case of an ICP or PCPP, the value of h.
Note that the intensity function ρ can be any non-negative locally integrable function with respect to ν.
Similarly defined LGCP, ICP, and PCPP models are well-studied for point processes with points in ℝ^k or 𝕊^k, and most of their properties immediately extend to linear networks, as summarized in the following.
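The requirement EΛ_0(u) = 1 in each construction can be checked by Monte Carlo at a single location u. The sketch below uses the LGCP mean μ = −σ²/2, the ICP residual normalised by E[Π(u)] (the closed form (1 + 2σ²)^{−h/2}, valid under the convention without the factor 1/2, is a standard Gaussian integral and is our own derivation), and for the PCPP the scaling Λ_0(u) = (1/h)ΣY_i(u)² with c(u, u) = 1; all parameter values are illustrative.

```python
import numpy as np

# Monte Carlo check that E[Lambda_0(u)] = 1 for the three residual
# processes at a fixed location u.
rng = np.random.default_rng(3)
sigma2, h, n = 0.5, 2, 400_000

# LGCP: Lambda_0 = exp(Y) with mu = -sigma2/2 so that E Lambda_0 = 1
Y = rng.normal(-sigma2 / 2.0, np.sqrt(sigma2), n)
lam0_lgcp = np.exp(Y)

# ICP: Pi = exp(-sum_i Y_i^2) with mu = 0, normalised by
# E Pi = (1 + 2*sigma2)^(-h/2) (standard Gaussian integral)
Yi = rng.normal(0.0, np.sqrt(sigma2), (h, n))
pi = np.exp(-(Yi ** 2).sum(axis=0))
lam0_icp = pi / (1.0 + 2.0 * sigma2) ** (-h / 2.0)

# PCPP: Lambda_0 = (1/h) * sum_i Y_i^2 with c(u,u) = 1
Zi = rng.normal(0.0, 1.0, (h, n))
lam0_pcpp = (Zi ** 2).mean(axis=0)

print(lam0_lgcp.mean(), lam0_icp.mean(), lam0_pcpp.mean())  # all near 1
```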

| Properties of log Gaussian Cox processes
Let X be a LGCP, cf. (32). As in Møller et al. (1998) and Coeurjolly et al. (2017), we obtain the following results. For any integer n ≥ 2 and any pairwise distinct u_1, . . ., u_n ∈ L, ρ(u_1, . . ., u_n) = ρ(u_1) ⋯ ρ(u_n) Π_{1≤i<j≤n} g(u_i, u_j). In particular, the LGCP is determined by ρ and g, i.e., by its first and second order moment properties. Note that g is isotropic if and only if c is isotropic, and then ρ(u_1, . . ., u_n)/(ρ(u_1) ⋯ ρ(u_n)) depends only on the inter-point distances d(u_i, u_j), 1 ≤ i < j ≤ n. Moreover, in most specific models, including those in Table 1, c ≥ 0 or, equivalently, X is positively correlated at all orders, cf. (19) and (35).
For u ∈ L with ρ(u) > 0, the reduced Palm process X^u is a LGCP whose intensity function is ρ(·)g(u, ·). This follows from (27) and (28) along similar lines as in Coeurjolly et al. (2017). Consequently, if c is isotropic, the K-functions of X and X^u agree.
Let us return to the concept of IRMPS as defined by (25). Cronie et al. (2020) noticed that IRMPS is satisfied for the LGCP if inf ρ > 0 and condition (37) holds for all u_1, u_2, u ∈ L and some function c_1 (in our notation; see Cronie et al., 2020, Equation (29)). This statement is true due to (35); however, in our opinion (37) is a very strong condition, since we are not aware of any good examples where it is satisfied unless L is isometric to a closed interval and δ is the usual (Euclidean/geodesic/resistance) distance.
Incidentally, in Cronie et al. (2020, Lemma 2) the metric δ is assumed to be origin independent. They did not define the meaning of 'origin independent', but we have been informed (by personal communication) that they had in mind that (37) should be satisfied, and that when they let δ = d_R be the resistance metric (in the text after Lemma 2 in Cronie et al., 2020), they admit that they misunderstood the meaning of 'origin independent' as used in Anderes et al. (2020, Proposition 2). Moreover, the proof of Cronie et al. (2020, Lemma 2) is incorrect: they claim that δ(u′, u_1) = δ(u″, u_1) for any u′, u″, u_1 ∈ L, which is obviously not correct if δ = d_R, and which seems wrong in general for other choices of metric.

| Properties of interrupted Cox processes
Let X be an ICP, cf. (33). Then X conditioned on Π is obtained by an independent thinning of a Poisson process Z on L with intensity function ρ_Z(u) = ρ(u)(1 + c(u, u))^{h/2}, where the selection probabilities are given by Π. In the terminology of Stoyan (1979), X is an interrupted point process.
Assuming for ease of presentation that c(u, v) = σ² r_0(d(u, v)) is isotropic with r_0(0) = 1, we obtain in a similar way as in Lavancier and Møller (2016) that the mean selection probability p_ms is constant and that the pcf g_0 has a closed-form expression. Third and higher-order moment results are less simple to express, but it can be proven that X is positively correlated at all orders. As σ increases from 0 to infinity, p_ms decreases from 1 to 0, whilst g_0(t) increases from 1 to a limiting value, which shows a trade-off between the degree of thinning and the degree of clustering.
To understand how g_0(t) depends on h it is natural to fix the value of p_ms ∈ (0, 1). Then g_0(t) is a strictly decreasing function of h whenever r_0(t) ≠ 0. Consequently, taking h = 1 is natural if we wish to model as much clustering as possible while keeping the degree of thinning fixed.
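The trade-off can be checked numerically: fix the mean selection probability p_ms and compare the pair correlation at one distance for h = 1 and h = 2. Under the convention Π(u) = exp(−ΣY_i(u)²) without the factor 1/2, the Gaussian integral E exp(−Y²) = (1 + 2σ²)^{−1/2} for Y ∼ N(0, σ²) gives σ² = (p_ms^{−2/h} − 1)/2; this closed form is our own derivation, and the values of p_ms and r_0(t) are illustrative.

```python
import numpy as np

# Monte Carlo check of the thinning/clustering trade-off for the ICP:
# at a fixed mean selection probability p_ms, h = 1 yields a larger pcf
# value (more clustering) than h = 2.
rng = np.random.default_rng(11)
p_ms, r0, n = 0.5, 0.6, 500_000   # target E Pi, correlation r0(t), MC size

def g0_mc(h, sigma2):
    # (Y(u), Y(v)) bivariate normal with correlation r0, h indep. copies
    cov = sigma2 * np.array([[1.0, r0], [r0, 1.0]])
    Y = rng.multivariate_normal([0.0, 0.0], cov, size=(n, h))
    pi_u = np.exp(-(Y[:, :, 0] ** 2).sum(axis=1))
    pi_v = np.exp(-(Y[:, :, 1] ** 2).sum(axis=1))
    return (pi_u * pi_v).mean() / (pi_u.mean() * pi_v.mean())

g = {}
for h in (1, 2):
    sigma2 = (p_ms ** (-2.0 / h) - 1.0) / 2.0   # makes E Pi(u) = p_ms
    g[h] = g0_mc(h, sigma2)
print(g)  # g[1] > g[2] > 1
```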

| Properties of permanental Cox point processes
Let X be a PCPP, cf. (34). Permanental Cox point processes with points in ℝ^k were studied in Macchi (1975) and McCullagh and Møller (2006); using the parametrization in the latter paper, X is a PCPP with parameters α = h/2 and c. To stress that c is a correlation function, we shall write c = r. For n = 1, 2, . . . and u_1, . . ., u_n ∈ L, define the α-weighted permanent by per_α(r; u_1, . . ., u_n) = Σ_π α^{#π} Π_{i=1}^n r(u_i, u_{π_i}), where the sum is over all permutations π = (π_1, . . ., π_n) of (1, . . ., n) and #π is the number of cycles. The usual permanent corresponds to α = 1 (Minc, 1978), in which case X is also called a Boson process (Macchi, 1975). The n-th order intensity functions can be expressed in terms of α-weighted permanents, from which it can be verified that X is positively correlated at all orders. It also follows that the degree of clustering is a decreasing function of α; in particular, g(u, v) = 1 + r(u, v)²/α. Thus g ≤ 1 + 1/α ≤ 3, which reflects the limitation of modelling clustering by a PCPP. Valiant (1979) showed that exact computation of permanents of general matrices is a #P (sharp P) complete problem, so no deterministic polynomial time algorithm is available. For most statistical purposes, approximate computation of permanent ratios is sufficient, and analytic approximations are available for large α. However, as α → ∞, X tends to a Poisson process and the process becomes less and less interesting for the purpose of modelling clustering.
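A brute-force sketch of the α-weighted permanent, feasible only for small n since it enumerates all n! permutations. The two-point check reproduces g = 1 + r²/α (a known formula for permanental processes, e.g. in McCullagh and Møller, 2006), consistent with the bound g ≤ 1 + 1/α stated above; the kernel values are illustrative.

```python
from itertools import permutations

# per_alpha(C) = sum over permutations pi of alpha^{#cycles(pi)} *
# prod_i C[i][pi(i)], computed by brute-force enumeration.
def cycle_count(pi):
    seen, cycles = set(), 0
    for start in range(len(pi)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = pi[j]
    return cycles

def per_alpha(C, alpha):
    n = len(C)
    total = 0.0
    for pi in permutations(range(n)):
        prod = 1.0
        for i in range(n):
            prod *= C[i][pi[i]]
        total += alpha ** cycle_count(pi) * prod
    return total

# Two-point check: with correlation kernel C = [[1, r], [r, 1]],
# g = per_alpha(C) / per_alpha([[1]])^2 = 1 + r^2/alpha
alpha, r = 1.5, 0.4
g = per_alpha([[1.0, r], [r, 1.0]], alpha) / per_alpha([[1.0]], alpha) ** 2
print(g)  # 1 + 0.16/1.5
```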
The PCPP can be extended to the case where α ≥ 0 and c is not necessarily a covariance function (in which case we lose the connection to Gaussian and Cox processes), see McCullagh and Møller (2006) and Shirai and Takahashi (2003). Indeed the process also extends to the case where α is a negative integer, whereby a (weighted) determinantal point process (DPP) is obtained. We return to DPPs in Section 7.1.

| An index of cluster strength
Consider again a LGCP, ICP, or PCPP X with an isotropic covariance function c(u, v) = σ² r_0(d(u, v)), where r_0(0) = 1, h = 1 if X is an ICP, and σ = 1 if X is a PCPP. To quantify how far X is from a Poisson process, we follow Baddeley et al. (2022) in defining the index ϕ. If X has constant intensity ρ, the total variation distance between the distribution of X and that of a Poisson process with intensity ρ is at most ρν(S)√ϕ. This follows since Baddeley et al. (2022, Lemma 9) immediately extends to our setting of linear networks.

Baddeley et al. (2022) used ϕ to describe the degree of clustering for another class of Cox processes, namely
Neyman-Scott point processes (on ℝ^k and with the cluster size following a Poisson distribution): the degree of clustering increases as ϕ increases. Although our Cox processes are not cluster point processes, ϕ has a similar interpretation since realizations of the processes may look more or less clustered. More precisely, it follows from (36), (39), and (40) that ϕ + 1 is the maximal value of the pcf. Thus, if X is a LGCP or an ICP, ϕ is a strictly increasing function of σ > 0, where X approaches a Poisson process as σ → 0, while realizations of X become more and more clustered as σ grows.
If instead X is a PCPP, the degree of clustering decreases as h ∈ {1, 2, ...} increases, where X approaches a Poisson process as h → ∞.
The index ϕ for 'cluster strength' is used in Sections 6.3 and 7.3.

| APPLICATION OF STATISTICAL INFERENCE PROCEDURES
For the analyses of the real and simulated datasets in this section, we have used the functions mincontrast and linearpcf from spatstat for calculating the minimum contrast estimates and the non-parametric estimate of the pcf (the latter is only available in the case of the geodesic metric), respectively, as well as a range of other minor functions from spatstat for handling point processes on linear networks. Furthermore, we have used the function global_envelope_test from the GET package for global envelope tests. The rest of the code (such as the estimation of the pcf in the case of the resistance metric, or the estimation of the F, G, and J-functions) we have implemented ourselves, and it is available in our package coxln.

| Analysis of Chicago crime dataset

For the analysis of the street crimes we fitted all three model classes (LGCP, ICP, and PCPP) given in Section 5. For the Gaussian process underlying the Cox process models, we used for simplicity a constant mean and an isotropic exponential covariance function with metric d = d_R (since the network is not a 1-sum of trees and cycles, the geodesic metric would not give well-defined models). Thus we have (up to) four unknown parameters in each model: ρ, the intensity of the point process; σ², the variance of the Gaussian process (not a parameter in the PCPP model); s, the scaling of the exponential covariance function; and h, the number of Gaussian processes used in the model (not a parameter in the LGCP model).
To estimate the parameters of all three models, we used the unbiased estimate ρ̂ = n(x)/|L| = 0.00372 for the intensity and estimated the remaining parameters by the minimum contrast method based on the pair correlation function, cf. (24). We estimated g_0 non-parametrically on the interval r ∈ [0, 100], but since the shape of the non-parametric estimate mostly resembled the shapes of the theoretical pair correlation functions for the three models on the interval r ∈ [20, 100], in (24) we let a_1 = 20 and a_2 = 100, while p and q were given by the default values in spatstat.
The estimated parameter values are shown in Table 3. Figure 4 shows the non-parametric estimate together with the estimated pair correlation functions for each of the three models, where the estimated pair correlation functions are very similar for the LGCP and the ICP, while the pair correlation function for the PCPP deviates from the other two at low distances.
The left column of Figure 5 shows a simulation from each of the three fitted models.None of the simulations show strong deviation from the data, although the ICP does show a tendency to have small densely packed clusters of points that are not present in the data.Moreover, the right column of Figure 5 shows the 95% global envelopes based on a concatenation of the empirical F , G, J -functions (based on 999 simulations) discussed at the end of Section 4.4.
The LGCP provides the best fit with a p-value of 0.155 for the global envelope test. The ICP provides a rather bad fit. These findings may be due to possible problems with the estimation procedure, which we explore in a simulation study in Section 6.3, or to unidentifiability of the parameters of the Bernstein CDF.
To try a model which does not become almost identical to the model using the exponential covariance function, we considered a covariance function with an inverse gamma Bernstein CDF with τ = 2 fixed (corresponding to an inverse gamma distribution with infinite variance). Table 4 shows the minimum contrast estimate and Figure 6 shows the corresponding estimated covariance function. This covariance function has a larger variance parameter σ² and a heavier tail than the estimated exponential covariance function. The 95% global envelope in Figure 7 (right panel) and the p-value of 0.243 for the global envelope test reveal that the covariance function with inverse gamma Bernstein CDF and fixed parameter τ = 2 provides a fit similar in quality to that of the exponential covariance function. This observation is further discussed in Section 7.3.

| Simulation study
The analysis of the dendrite dataset revealed the problem that in practice we obtain an exponential covariance function when we fit the covariance functions given in Example 1. It should be noted that all these covariance functions have the exponential covariance function as a limiting case, which makes it possible to get arbitrarily close to an exponential covariance function in the estimation procedure; this was not the case when the τ parameter was fixed in the analysis of the dendrite data.
To explore whether this is a general problem or specific to the dendrite dataset, we generated simulations using covariance functions different from the exponential case to see if the estimation procedure still returned exponential covariance functions. Specifically, we took the network used in the dendrite dataset and generated 1000 simulations of an ICP with a homogeneous intensity function and a covariance function with an inverse gamma Bernstein CDF. We did this for various combinations of parameters, with σ² ∈ {10^{-1}, 10^0, 10^1, 10^2, 10^3, 10^4} and τ ∈ {1.1, 1.5, 2, 5}, while the remaining parameters were given by (ρ, h, φ) = (1, 1, 0.02). The mean number of points is E N(L) = ρ|L| = 736 for all the chosen parameter settings. For each simulation we fitted two models using minimum contrast estimation (with the same values of a_1, a_2, q, p as in Section 6.2): an ICP with an inverse gamma Bernstein CDF with h and τ fixed at their true values, and an ICP with an exponential covariance function with h = 1 fixed. For each simulation, we calculated the non-parametric estimate of the pcf and used D in (24) to calculate its distance to each of the two pcfs for the estimated models; using an obvious notation, these distances are denoted D_Exp and D_IG. For each combination of parameters σ² and τ, Table 5 shows the percentage of simulations where D_IG > D_Exp, i.e., the cases where the non-parametric estimate of the pcf resembles an exponential covariance function more than a covariance function with inverse gamma Bernstein CDF with τ equal to the value used in the simulations. To quantify how far the ICP is from a Poisson process, the values of ϕ given by (42) and p_ms given by (38) are also shown in Table 5. The row corresponding to σ² = 10^{-1} contains values close to 50%, which makes sense since g_0(0) − 1 = 0.00416 and p_ms = 0.912 reveal that the simulated processes are almost Poisson. In the rest of the table the non-parametric estimates are typically closest to an exponential covariance function, showing that there is indeed a tendency to obtain an estimated model corresponding to an exponential covariance function (although the values seem to decrease in the last row).
We also made similar simulation studies for LGCPs and PCPPs. These showed values similar to those in Table 5 for the LGCP, while the values were significantly smaller for the PCPP (typically in the range 30%-40% when ϕ = 2, the largest possible value in this model).
The results of this section are further discussed in Section 7.3.

| CONCLUDING REMARKS
Our results and considerations on point processes on linear networks in the previous sections give rise to various conclusions and open research problems, which we discuss in Sections 7.1-7.4 below.

| New point process models
Baddeley et al. (2017) mentioned the lack of repulsive models on linear networks. An interesting case is a determinantal point process (DPP), which to the best of our knowledge has not yet been investigated in connection with linear networks. For any given covariance function c on L, a well-defined DPP will be specified by a density with respect to the unit rate Poisson process on L. A DPP is a model for repulsiveness (e.g. g ≤ 1) and it possesses a number of appealing properties. See Lavancier et al. (2015) (and the references therein), who considered DPPs with points in ℝ^k; many of their results are easily modified to DPPs on linear networks.
Several facts are interesting: If c is isotropic then the pcf of the DPP is isotropic, and c could, for example, be specified by one of the isotropic covariance functions given in Table 1 and Example 1. The normalizing constant omitted in (43) can be expressed in terms of the eigenvalues of a spectral decomposition of c. This spectral decomposition is also needed when specifying the n-th order intensity functions and an efficient simulation algorithm for the DPP.
However, it is an open problem how to determine the eigenvalues and eigenfunctions of the spectral representation, in particular if c is required to be isotropic. Until this problem has been solved, we need to work with the unnormalized density (the right hand side of (43)) and to use MCMC methods for approximating the normalizing constant (Geyer, 1999; Møller and Waagepetersen, 2004).
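The repulsiveness property g ≤ 1 mentioned above can be made concrete through the determinantal form of the product densities, ρ_n(u_1, . . ., u_n) = det[c(u_i, u_j)], which is the standard DPP property (cf. Lavancier et al., 2015): for a kernel with correlation r the pcf is g = 1 − r² ≤ 1. The exponential kernel and the distances below are illustrative.

```python
import numpy as np

# Sketch: pcf of a DPP from the determinantal product densities,
# g(u, v) = det([[c(u,u), c(u,v)], [c(v,u), c(v,v)]]) / (c(u,u) c(v,v)),
# evaluated for an exponential kernel at a few illustrative distances.
def g_dpp(c_uu, c_vv, c_uv):
    C = np.array([[c_uu, c_uv], [c_uv, c_vv]])
    return np.linalg.det(C) / (c_uu * c_vv)

sigma2, s = 1.0, 0.1
gs = [g_dpp(sigma2, sigma2, sigma2 * np.exp(-s * d))
      for d in (0.0, 5.0, 20.0, 100.0)]
print(gs)  # increases from 0 toward 1; always g <= 1
```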
The recent paper Bolin et al. (2022) on a new class of Whittle-Matérn GPs on compact metric graphs is interesting for several reasons including the following.The model class is flexible and contains differentiable GPs; the precision matrix at the vertices can be quickly calculated; and Markov properties of the GP can be used for computationally efficient inference.Thus the Whittle-Matérn GPs may serve as an interesting alternative to the GPs considered in the present paper.Although the Whittle-Matérn covariance function is not isotropic, and hence in connection to items (I)-(VI) in Section 1 may appear to be less useful, this could very well be an advantage as pointed out after item (VI).

| The choice of metric for isotropic covariance and pair correlation functions
Isotropic covariance functions are available for larger classes of linear networks with respect to the resistance metric than with respect to the geodesic metric, cf. Theorem 2. Therefore, we are able to obtain LGCPs, ICPs, and PCPPs for larger classes of linear networks with respect to the resistance metric than the geodesic metric. Admittedly, this comparison assumes that one uses the same covariance function in the two different metrics, which is rather specific. Could we obtain the same flexibility with a different covariance function in the geodesic metric? How would we prove that it is actually a valid covariance function?
Various other comments in the previous sections highlighted the interest of d R compared to d G .In addition we notice the following.
Suppose L is not a tree and that both c_0(d_G(u, v)) and c_0(d_R(u, v)) are well-defined isotropic covariance functions. (Recall that d_G ≥ d_R with equality if and only if L is a tree, cf. (C) in Theorem 1.) Suppose also that c_0 > 0 and c_0 is a decreasing function, so that c_0(d_G(u, v)) ≤ c_0(d_R(u, v)). This is typical for the covariance function of the GP underlying a LGCP, ICP, or PCPP (an exception is a LGCP if c_0 can be negative, but usually c_0 ≥ 0). Consider the two LGCPs, ICPs, or PCPPs obtained by using the same type of transformed isotropic GP (or GPs) with the same mean and c_0-function but using the different metrics d = d_G and d = d_R. Using an obvious notation, let us denote these two Cox processes by X_G and X_R and their corresponding pair correlation functions by g_G(u, v) = g_0(d_G(u, v)) and g_R(u, v) = g_0(d_R(u, v)). It follows from (36), (39), and (40) that g_G(u, v) ≤ g_R(u, v), indicating that X_R is able to model a higher degree of clustering than X_G.
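To illustrate the relation d_R ≤ d_G numerically, the sketch below computes both metrics between the vertices of a toy cyclic network (a unit square): the resistance metric from the Moore-Penrose pseudoinverse of the weighted graph Laplacian (edge conductance = inverse length), and the geodesic metric by Floyd-Warshall. The network and edge lengths are illustrative.

```python
import numpy as np

# Toy cyclic network: a square with unit edge lengths
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
n = 4

# Resistance metric: d_R(i,j) = Lp[i,i] + Lp[j,j] - 2*Lp[i,j] with Lp
# the pseudoinverse of the weighted Laplacian
Lap = np.zeros((n, n))
for i, j, length in edges:
    w = 1.0 / length
    Lap[i, i] += w; Lap[j, j] += w
    Lap[i, j] -= w; Lap[j, i] -= w
Lp = np.linalg.pinv(Lap)
d_R = np.array([[Lp[i, i] + Lp[j, j] - 2.0 * Lp[i, j] for j in range(n)]
                for i in range(n)])

# Geodesic metric by Floyd-Warshall
d_G = np.full((n, n), np.inf); np.fill_diagonal(d_G, 0.0)
for i, j, length in edges:
    d_G[i, j] = d_G[j, i] = length
for k in range(n):
    d_G = np.minimum(d_G, d_G[:, [k]] + d_G[[k], :])

print(d_G[0, 2], d_R[0, 2])  # 2.0 vs 1.0: two length-2 paths in parallel
```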
On the other hand, for a DPP we always have g_0 ≤ 1, and typically g_0 < 1 with g_0 an increasing function. So for two such DPPs with the same g_0-function but given by the different metrics d_G and d_R, respectively, the DPP using d_R is able to model a higher degree of repulsiveness than the DPP using d_G.

| Estimation
For parameter estimation in this paper we used minimum contrast estimation based on the pcf, which worked well when we used an exponential covariance function, but this approach was not able to identify the parameters in the covariance functions given by the Bernstein CDFs in Example 1 unless we fixed a subparameter, cf. Sections 6.2 and 6.3. We are not sure whether this is a problem of unidentifiability or is caused by the choice of estimation procedure, for the following reasons.
On the one hand, Christensen and Møller (2020) concluded that for the dendrite data minimum contrast estimation based on the pcf performs better than if it is based on the K-function, and that the alternative method based on composite likelihood estimation is less reliable than minimum contrast estimation for the exponential covariance function, suggesting that this approach will not improve estimation for more complex covariance functions either. This is also in line with the comments in Baddeley et al. (2022) on estimation procedures. In particular, Baddeley et al. (2022, Section 12) concluded that Neyman-Scott point process models are poorly identified under weak clustering, irrespective of the model parametrization, and that this is not due to faults in currently available fitting methods (they considered the Euclidean state space case, but we do not expect the choice of space to matter for this conclusion).
On the other hand, if the problem is unidentifiability, one idea for estimating the parameters of the Bernstein CDFs could be to use shrinkage estimators involving a penalty on cluster scale, as was done in Baddeley et al. (2022) using the parameter ϕ in (41). However, we also remade the simulation studies behind Table 5 in the case where we fixed the parameter σ² at its true value, which did not improve the estimation of the parameters in the Bernstein CDF, suggesting that this approach might not work.
A more careful study of whether the problem is the choice of estimation procedure or unidentifiability, extending the methods of Baddeley et al. (2022) to our setup, is left for future research.More ambitiously than minimum contrast, composite likelihood, and other moment based estimation procedures (Møller and Waagepetersen, 2017), future work may also investigate likelihood-based inference for parametric point process models on linear networks.
This could include the adaptation of missing data MCMC methods for maximum likelihood (Geyer, 1999;Møller and Waagepetersen, 2004) and Bayesian approaches (Rue et al., 2009;Guttorp and Thorarinsdottir, 2012) to linear networks.

| Miscellaneous
In (12) we provide a bound in probability on the approximation error of Algorithms 1 and 2 due to the discretization.
This may be extended to a bound on the maximal approximation error when considering variables (Y(u_1), . . ., Y(u_n)) with u_1, . . ., u_n on the same line segment of L (using an approach similar to that of Equation (4.3) in Wood and Chan (1992), which is based on an inequality for multivariate normal probabilities on rectangles, see Chapter 2 in Tong (1982)).
Furthermore, for Algorithm 3, it could be interesting to establish convergence rates.
The special results in Section 5.2.1 for LGCPs on linear networks (as well as for LGCPs defined on other state spaces like ℝ^k and 𝕊^k) could possibly be exploited for developing techniques of model fitting or checking, where we have the following results in mind. Assume the covariance function of the underlying GP is isotropic. The K-functions based on X and X^u agree, so if X is observed within a bounded window W ⊆ L, the empirical estimators K̂ and K̂^u for u ∈ X_W are expected to be close. Moreover, the special structure in (35) implies that ρ(u_1, u_2, u_3)/(ρ(u_1)ρ(u_2)ρ(u_3)) = g_0(d(u_1, u_2)) g_0(d(u_1, u_3)) g_0(d(u_2, u_3)).
This structure was exploited in Møller et al. (1998) for LGCPs with points in ℝ^k to construct a third-order moment characteristic which is useful for model checking, but it remains to consider the case of a LGCP on a linear network.
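The third-order identity can be verified numerically from the lognormal moment formula E exp(ΣY_i) = exp(Σμ_i + ½Σ_{i,j} c_ij), using g(u, v) = exp(c(u, v)) for the LGCP; the distances and parameters below are illustrative.

```python
import numpy as np

# Numerical check of the third-order LGCP identity
# rho(u1,u2,u3)/(rho(u1)rho(u2)rho(u3)) = g(u1,u2) g(u1,u3) g(u2,u3),
# with g(u,v) = exp(c(u,v)) and mu_i = -c(u_i,u_i)/2.
sigma2, s = 0.7, 0.05
d12, d13, d23 = 10.0, 25.0, 30.0

c = lambda d: sigma2 * np.exp(-s * d)   # exponential covariance
C = np.array([[sigma2, c(d12), c(d13)],
              [c(d12), sigma2, c(d23)],
              [c(d13), c(d23), sigma2]])
mu = -np.diag(C) / 2.0

# E[Lambda0(u1) Lambda0(u2) Lambda0(u3)] by the lognormal moment formula
lhs = np.exp(mu.sum() + 0.5 * C.sum())
rhs = np.exp(c(d12)) * np.exp(c(d13)) * np.exp(c(d23))
print(lhs, rhs)  # equal
```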
We have given various references for non-parametric estimation of the intensity, pair correlation, and K-functions on linear networks, cf. Sections 4.2 and 4.3. In the inhomogeneous case these estimators depend 'locally' on the intensity function at the individual observed points.

Theorem 1
We have the following properties of d_G and d_R. (A) The definition (2) of d_R does not depend on the choice of origin u_0 ∈ V. (B) Both d_G and d_R are metrics on L, and their definitions are invariant to splitting a line segment L_i into two line segments.
Algorithm 1 instead, provided of course that Y is specified by a valid covariance function; that is, Algorithm 1 may work for d = d_R but not for d = d_G, as illustrated in the following example. Example 4 Figure 3 shows simulations of zero mean GPs defined on the Chicago street network in Figure 1 with various isotropic covariance functions c(u, v) = σ² r_0(d(u, v)), where σ = 1 and, in order that c is valid, we take d = d_R, cf. Theorem 2. In the first two plots (the top row), r_0(t) = exp(−st) is an isotropic exponential correlation function with s = 0.1 or s = 0.01, and the next two plots (the middle row) relate to Theorem 5 with the Bernstein CDF given by a Γ(1, 100)-distribution or a Γ⁻¹(2, 0.01)-distribution, cf. Example 1. The densities for these Bernstein CDFs are shown in the left panel of the bottom row, and the last plot shows the corresponding correlation functions for t ≤ 200 feet. The top row shows the scaling effect of the parameter s for the exponential correlation function. Plots 2-4 are comparable, since s has mean 0.01 in all three cases. Since max d_R ≈ 675 feet, the last plot indicates a rather strong correlation in the GPs in plots 2-4.
F I G U R E 3 Simulation of zero mean GPs on the Chicago street network with an isotropic exponential covariance function c(u, v) = r_0(d_R(u, v)). Top row: r_0(t) = exp(−st) with parameter s = 0.1 (left) or s = 0.01 (right).
Apart from these examples, Cronie et al. (2020) did not verify any other cases of models where IRMPS is satisfied (in Section 5.2.1 we correct some mistakes in Cronie et al., 2020). At least Cronie et al. (2020) demonstrated the practical usefulness of their empirical estimator of the J-function for a Poisson process, a simple sequential inhibition (SSI) point process, and a LGCP. The p-value of the global envelope test (based on the extreme rank length) is obtained as described in Myllymäki et al. (2017), Mrkvička et al. (2020), and Myllymäki and Mrkvička (2019). For the dendrite spine locations datasets analysed in Christensen and Møller (2020) it is concluded that the results based on such a global envelope test are consistent with the results obtained if instead the summary functions from Cronie et al. (2020) are used.
Ang et al. (2012) used the empirical K-function together with simulations to show that the Chicago crime dataset (shown in the left panel of Figure 1 and available in spatstat) is more clustered than a homogeneous Poisson process, while an inhomogeneous Poisson process with log-quadratic intensity fitted by maximum likelihood provided a better fit. They remarked that the latter model was only provisional and shown for demonstration.
F I G U R E 4 Pair correlation functions for the Chicago crime dataset: Non-parametric estimate (solid curve) and curves estimated by minimum contrast for a LGCP (dashed curve), an ICP (dotted curve), and a PCPP (dashed-dotted curve) model.

F I G U R E 5 Left column: simulations of point patterns under the estimated LGCP (upper row), ICP (middle row), and PCPP (lower row) models for the Chicago crime dataset. The width of the lines shows the simulation of the underlying transformed Gaussian processes (the random intensity functions), while the points show the simulated point pattern. Right column: 95% global envelopes for the fitted LGCP (upper row), ICP (middle row), and PCPP (lower row) models for the Chicago crime dataset. The p-values of the global envelope test are shown above each plot.

F I G U R E 6 Pair correlation functions for the dendrite dataset: non-parametric estimate (solid curve), and curves estimated by minimum contrast for the ICP with an exponential covariance function (dotted curve) and with a covariance function given by an inverse gamma Bernstein density with fixed τ = 2 (dotted-dashed curve).
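Simulating an ICP, as used for the fitted models above, amounts to independent thinning of a dominating Poisson process with retention probability exp(−Σᵢ Yᵢ(u)²), cf. the introduction. A minimal sketch on a single segment; the function name, the stand-in field Y, and the parameter values are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

def simulate_icp_on_segment(length, rho, Y_list, rng=None):
    """Interrupted Cox process on [0, length]: generate a Poisson process
    with intensity rho, then keep each point u independently with
    probability exp(-sum_i Y_i(u)^2)."""
    rng = np.random.default_rng(rng)
    n = rng.poisson(rho * length)
    u = rng.uniform(0.0, length, size=n)             # dominating Poisson points
    p_keep = np.exp(-sum(Y(u)**2 for Y in Y_list))   # thinning probabilities
    return u[rng.uniform(size=n) < p_keep]

# Illustration with a fixed (non-random) stand-in for the Gaussian field:
Y = lambda u: np.sin(u / 50.0)
pts = simulate_icp_on_segment(500.0, rho=0.2, Y_list=[Y], rng=1)
```

On a full network one would apply the same thinning to a Poisson process generated segment by segment, with Yᵢ simulated as GPs on the network.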

F I G U R E 7 95% global envelopes for the fitted ICP models with an exponential covariance function (left) and with a covariance function given by an inverse gamma Bernstein CDF with τ = 2 (right) for the dendrite dataset. The p-values of the global envelope test are shown above each plot.
c_0(d_G(u, v)) and c_0(d_R(u, v)) are well-defined isotropic covariance functions. (Recall that d_G ≥ d_R, with equality if and only if L is a tree, cf. (C) in Theorem 1.) Suppose also that c_0 ≥ 0 and that c_0 is a decreasing function, so that c_0(d_G(u, v)) ≤ c_0(d_R(u, v)). This is typical for the covariance function of the GP underlying a LGCP, ICP, or PCPP (an exception is a LGCP where c_0 can be negative, but usually c_0 ≥ 0). Consider the two LGCPs, ICPs, or PCPPs obtained by using the same type of transformed isotropic GP (or GPs) with the same mean and the same c_0-function, but using the different metrics d = d_G and d = d_R. Using an obvious notation, denote these two Cox processes by X_G and X_R and their corresponding pair correlation functions by g_G(u, v) = g_0(d_G(u, v)) and g_R(u, v) = g_0(d_R(u, v)). It follows that g_G(u, v) ≤ g_R(u, v), so X_R exhibits at least as much clustering as X_G.

… intensity function at the individual observed points. Recently, when considering point processes on Euclidean spaces, Shaw et al. (2021) demonstrated the advantages of introducing new global estimators over the existing local estimators. It remains to consider the case of point processes on other spaces, including linear networks.
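The inequality d_G ≥ d_R can be checked concretely: the resistance metric is the effective resistance of the network viewed as an electrical circuit, computable from the Moore–Penrose pseudoinverse of the graph Laplacian. A minimal sketch on a vertex set (the function name is an assumption); for a triangle with unit-length edges the geodesic distance between adjacent vertices is 1, while the resistance distance is 2/3 (two parallel paths of lengths 1 and 2 in series-parallel).

```python
import numpy as np

def resistance_metric(W):
    """Effective-resistance distances from a symmetric conductance
    (1/edge-length) matrix W, via the Laplacian pseudoinverse:
    d_R(i, j) = L+_ii + L+_jj - 2 L+_ij."""
    L = np.diag(W.sum(axis=1)) - W      # graph Laplacian
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2 * Lp

# Triangle with unit-length edges (conductance 1 on each edge):
W = np.ones((3, 3)) - np.eye(3)
DR = resistance_metric(W)
```

On a tree, removing any edge disconnects the graph, so every pair of points is joined by a unique path and the two metrics coincide, consistent with (C) in Theorem 1.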
Running times of Algorithms 1–3 used for simulating a GP on two different grids on a tree.

Theorem 5 allows simulation of any GP with a covariance function of the form (13), provided a simulation algorithm for F is available. For the case of a tree, Theorem 5 combined with Algorithm 2 gives the following simulation algorithm. Let the situation be as in Theorem 5 and suppose that L is a tree and d = d_R = d_G. Choose an integer n > 0 and independently for i = 1, …, n make the following steps.

(I) Generate S_i from F.
(II) Using Algorithm 2, generate Y_i as a zero mean GP on L with covariance function σ² exp(−S_i d(u, v)).

Output √n Ȳ_n as an approximate simulation of a zero mean GP with covariance function given by (13).
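The averaging scheme above can be sketched as follows, with the inner simulation replaced by a generic Cholesky-based draw on a fixed grid of points (in place of Algorithm 2, which is specific to trees); the function name and the choice of F as a Γ(1, 100)-type law, parametrised so that S has mean 0.01, are assumptions for illustration.

```python
import numpy as np

def simulate_bernstein_gp(D, sample_F, sigma2=1.0, n=200, rng=None):
    """Average n independent GPs with covariance sigma2*exp(-S_i d(u,v)),
    S_i ~ F, and rescale by sqrt(n); this approximates a zero mean GP with
    covariance sigma2 * int exp(-s d(u,v)) dF(s), i.e. the form (13)."""
    rng = np.random.default_rng(rng)
    m = len(D)
    total = np.zeros(m)
    for _ in range(n):
        S = sample_F(rng)                                  # S_i ~ F (Bernstein CDF)
        C = sigma2 * np.exp(-S * D) + 1e-10 * np.eye(m)    # jitter for stability
        total += np.linalg.cholesky(C) @ rng.standard_normal(m)
    return total / np.sqrt(n)                              # = sqrt(n) * Ybar_n

# Points on a segment; F a Gamma law with shape 1 and mean 0.01 (assumed
# to correspond to the paper's Gamma(1, 100) via a rate parametrisation):
x = np.linspace(0.0, 100.0, 5)
D = np.abs(x[:, None] - x[None, :])
y = simulate_bernstein_gp(D, lambda r: r.gamma(1.0, 0.01), rng=3)
```

The output is Gaussian conditionally on S₁, …, S_n and only approximately Gaussian unconditionally; the approximation improves as n grows.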
Parameter estimates for the LGCP, ICP, and PCPP models for the Chicago crime dataset.A dash indicates that the parameter is not present in the model.
The percentage of simulations of an ICP with D_IG > D_Exp for various combinations of σ² and τ values, where the corresponding values of ϕ and p_ms are shown.