Multivariate Geometric Anisotropic Cox Processes

This paper introduces a new modelling framework for multivariate anisotropic Cox processes. Building on recent innovations in multivariate spatial statistics, we propose a new family of multivariate anisotropic random fields and construct a family of anisotropic point processes from it. We give conditions that make the models valid, and we provide additional understanding of valid point process dependence. We also propose a likelihood-based inference mechanism for this type of process. Finally we illustrate the utility of the proposed modelling framework by analysing spatial ecological observations of plants and trees in the Barro Colorado Island study.


Introduction
In this paper, we introduce a new class of multivariate and heterogeneous point process models. In doing so, we address two challenging problems in point process analysis: we propose valid and nontrivial models for multi-type point processes, an open problem in the literature, and we produce multivariate spatial models that can flexibly accommodate anisotropy in both the marginal and joint dependence structures.
We choose to build our models using log-Gaussian Cox processes (Diggle & Milne, 1983;Møller et al., 1998) as a foundation. Thus, the observed point pattern is modelled in terms of a random intensity, generated by a random field. Recent interest in random field modelling has greatly enhanced our ability to specify flexible models for multivariate patterns. We shall build on recent progress made in this area by, for example, Gneiting et al. (2010), Apanasovich et al. (2012) and Genton & Kleiber (2015), by allowing for anisotropy in the second-order dependence structure of the latent random field model.
Datasets that require anisotropic models have been common in the point process literature over the last 20 years, for example the locations of chapels in Welsh valleys (Mugglestone & Renshaw, 1996;Rajala et al., 2016Rajala et al., , 2018b, the epicentral locations of earthquakes in California over a 20 year period (Veen & Schoenberg, 2006) and clustered locations of shrubs in dryland ecosystems (Haase, 2001). The Welsh chapels and Californian earthquakes both form elliptical clusters, indicating an anisotropic second-order interaction between points in the same pattern. Meanwhile, the dryland shrub data display a directional preference in the interaction of points of different type: Haase (2001) found one species to grow more often than would be expected to the east of a second species. In the point process literature, it is common to accommodate heterogeneities in the observed point pattern by using a spatially homogeneous random field model to specify an intensity process conditional upon some known covariates (see, e.g. Waagepetersen, 2008;Waagepetersen & Guan, 2009;Diggle et al., 2013). This approach is limited in its applicability, however, when faced with heterogeneous point pattern data with no covariate measurements, or indeed when the source of heterogeneity is unknown.
Our chosen approach to accommodating anisotropy is based upon a particular form of anisotropy known as geometric anisotropy (Goff & Jordan, 1988): whereas the spatial covariance functions that drive isotropic processes have circular contours of equivariance, those that drive geometric anisotropic processes have elliptical contours of equivariance.
This approach was also considered in the univariate case by Møller & Toftaker (2014). A great advantage of this approach is that it can be used in conjunction with well-known isotropic covariance functions; our models will use Matérn-based covariance structures, which will allow the user to directly specify both the range of dependence in, and the smoothness of, the resulting random field. We will also discuss and address identifiability concerns for this class of parametric models.
Once the random field has been specified, the point process is conditionally generated as a Poisson process with intensity determined by the random field. This has the advantage of automatically generating a valid set of anisotropic point processes, where the marginal and cross-pair correlation functions have an analytic form, which we provide. We explore the restrictions that are naturally placed on all cross-pair correlation functions, where we utilise recent results for isotropic multivariate random fields due to Apanasovich et al. (2012) and Gneiting et al. (2010). By representing our multivariate process in both the spatial and spectral domains, we will also demonstrate that allowing for distinct geometric anisotropies in each marginal process places further restrictions on valid forms of the cross-dependence structures. This is an important result that yields unique insights into the possible variation of joint co-dependence in multivariate geometric anisotropic random fields, and by extension Cox processes.
Once we have understood the constraints on possible model forms, we develop new inference methods. We detail a two-stage estimation procedure in which we first estimate the anisotropy parameters, and then use these estimates to transform the data to be isotropic; this 'isotropised' point pattern is then used to estimate the covariance parameters for the underlying random field model. For this second stage, Møller & Toftaker (2014) advocated the use of minimum contrast, a method of moments approach to estimation for point process models. We appeal to the likelihood principle, and develop a maximum likelihood-based approach to inference that builds on the work of Tanaka et al. (2008). Straightforward maximum likelihood estimation of the model parameters is infeasible, due to the intractability of the LGCP likelihood, however Tanaka et al. (2008) showed that the intractability of the point process likelihood can be circumvented by considering the so-called Fry process (Fry, 1979). This is a secondary point pattern formed by the difference vectors of all point pairs in the original point pattern, and it can be treated as an inhomogeneous Poisson point process, with an associated tractable likelihood. Tanaka et al. (2008) showed that the Fry process likelihood can be used to perform inference for univariate, isotropic point process models. In a novel extension of this work, we use the Fry process likelihood to perform inference for anisotropic, multivariate point processes.
Finally, we apply our newly-developed methodology to real data from a tropical rainforest stand on Barro Colorado Island, Panama (Condit, 1998;Hubbell et al., 1999Hubbell et al., , 2010. Recent work by Waagepetersen et al. (2016) and Rajala et al. (2018a) has highlighted the importance of developing realistic multivariate point process models to aid the understanding of complex species interactions within this rainforest. The need to develop anisotropic methodology in particular is characterised in Figures 1c and 1e, which show the estimated intensity of Guatteria dumetorum and Miconia hondurensis. Their strongly anisotropic features are clear, and we also show two simulated fields from the presented multivariate model class, exhibiting similar features.
Thus, to summarize, this paper provides a number of new and important insights for multivariate spatial processes, describing the complex relationships possible when allowing for distinct geometric anisotropies in each univariate component. Our understanding gives sufficient, but not necessary, conditions to yield valid multivariate random field models and, by extension, valid multivariate Cox processes.

Log-Gaussian Cox processes
Consider the multivariate point process X = {X p ∈ R d , p = 1, . . . , P }, where the index p is used to denote a univariate component of the multivariate process, and suppose that we wish to use such a process to model a multi-type point pattern. We will denote the observed point pattern X ∩ W = {x p,i ∈ W, i = 1, . . . , n p ; p = 1, . . . , P }, where n p ∈ N is the total number of points of type p observed in the observation window W ⊂ R d . Henceforth, we will also use x p to denote an arbitrary observed point of type p. For many applications of interest, d = 2, however much of the multivariate framework established here can be applied to point processes defined on a space of any dimension.
We define X to be a multivariate log-Gaussian Cox process (LGCP; Møller et al., 1998): each univariate sub-process X p is an inhomogeneous Poisson process with intensity specified by where S(x) = {S p (x), p = 1, . . . , P } is a multivariate Gaussian random field (GRF). We will assume S p , and therefore X p , to be stationary for all p = 1, . . . , P , and we denote the constant mean of S p (x) by µ p . The intensity process Λ p (x) will therefore also have a constant mean, which we denote λ p , and which will take the following form: where σ pp denotes the variance of S p (x).
Key to the definition of a multivariate LGCP is the conditional independence of its components: given its intensity process Λ p (x), the univariate LGCP X p is independent of {X q , q = 1, . . . , P, q = p}. As a result of this property, the second-order behaviour of the point process X may be entirely, and conveniently, described through the covariance structure of the multivariate GRF S(x). We do so by specifying the matrix of covariance The second-order behaviour of the multivariate point process X can be directly measured through the level of clustering or separation present in the resulting point pattern.
The cross-pair correlation function g pq (r) is defined as the expected number of points from process q that lie at a distance r from the typical point in process p, and is the standard tool for measuring aggregation and segregation, both within and between processes. For a log-Gaussian Cox process, g pq (r) can be straightforwardly expressed in terms of the covariance structure for the underlying multivariate GRF: From this relationship, it is clear to see that g pq (h) = 1 is equivalent to C pq (h) = 0, which indicates independence between processes p and q at the scale h ∈ R d . Thus, for a bivariate Poisson process {X p , X q }, i.e. under an assumption of complete spatial randomness, we would expect g pq (h) = 1, whereas significant departures from this would indicate aggregation (g pq (h) > 1) or segregation (g pq (h) < 1) of points from processes p and q at separation h ∈ R d .
The dependence structure for the multivariate GRF S(x) can equivalently be described in the frequency (spectral) domain. The (cross-)spectral density function f pq (ω) forms a Fourier transform pair with the (cross-)covariance function: By considering the spectral-domain behaviour of our multivariate GRF S(x), we will demonstrate the difficulties inherent in multivariate modelling of geometric anisotropic spatial dependence, and we will consider the complex coherence at frequency ω ∈ R d , γ pq (ω): (3)

Geometric Anisotropic LGCPs
We describe here the approach to modelling geometric anisotropy in univariate LGCPs, as introduced by Møller & Toftaker (2014). In brief, the required dependence structure is specified through the application of an isotropic covariance structure to a geometrically manipulated version of the space on which the process lives. Since a LGCP is fully defined by the first and second order characteristics of the underlying Gaussian random field, Møller & Toftaker (2014) showed that one can therefore construct a geometric anisotropic LGCP through using standard geometric manipulations to modify the space on which the latent univariate GRF is defined. In Section 3.1, we will show how this can be flexibly extended to the multivariate case by specifying individual components of a population of P GRFs through P potentially distinct geometric manipulations of R 2 .
Given an isotropic covariance function C 0 ( h ), h ∈ R d , we can define a geometric anisotropic version as (Christakos, 1992) where for θ ∈ [0, π) and ζ ∈ (0, 1], and where R θ is the rotation matrix. Under this parameterisation, Σ is defined such that the ellipse E = {h ∈ R 2 : h T Σ −1 h = 1} has a semi-major axis of unit length at angle θ, relative to the abscissa axis of the original coordinate system, and a semi-minor axis of length ζ at angle θ + π/2. Accordingly, we can describe the covariance function defined in (4) as 'elliptic', and we have that the LGCP driven by a GRF with elliptic covariance structure will also display elliptic, or geometric anisotropic, second-order behaviour, described by the pair correlation function and spectral density function as: is the isotropic spectral density that forms a Fourier transform pair with C 0 ( h ), and g 0 ( h ) is the corresponding isotropic pair correlation function.
Our specification of geometric anisotropy differs slightly from that of Møller & Toftaker (2014), who include an additional scale parameter in their definition of the deformation matrix Σ; this is used to scale the axes in the resulting elliptical covariance structure. In practice, however, the majority of parametric covariance functions of interest incorporate a scale parameter that directly controls the correlation length, and so including a separate scale parameter in (5) creates nonidentifiability issues when performing parameter inference. We avoid this issue by assuming all scale information to be controlled by the parametric form of C 0 ( h ).
Since we are considering processes that display anisotropy, it will be useful for their analysis to be able to express their second-order properties in polar coordinates. We there-fore define the anisotropic pair correlation function, replacing the vector h ∈ R d with its length r and angle φ: 3 Defining the Model 3.1 Accommodating multivariate geometric anisotropy For a population of P LGCPs, we specify the multivariate dependence through the covariance structure of the P -dimensional GRF that drives the P conditionally independent intensity processes. We extend the definition of geometric anisotropy in (4) and we define the following family of geometric anisotropic auto-and cross-covariance functions: for some corresponding family of isotropic covariance functions {C 0,pq ( h ); p, q = 1, . . . , P }, and for a collection of deformation matrices {Σ pq ; p, q = 1, . . . , P }, where Σ pq is defined in terms of the parameter pair (θ pq , ζ pq ) according to (5).
This framework will allow for the possibility of distinct geometric anisotropies in each of the marginal processes. Such processes can be used to model, for example, bivariate point patterns in which each component displays elliptical clustering at different orientations, or with differing degrees of ellipticity. Care must be taken in specifying the parameters for the cross-covariance functions C pq , however, in order to ensure a valid multivariate model. In the spatial domain, we require the matrix of covariance functions (C pq (h)) P p,q=1 to be nonnegative definite for all h ∈ R d ; the equivalent requirement in the spectral domain is that the matrix of spectral densities (f pq (ω)) P p,q=1 is nonnegative definite for all ω ∈ R d . If we consider the bivariate dependence structure for two processes X p and X q , p, q = 1, . . . , P , then we can see that the restriction in the spectral domain is equivalent to requiring the magnitude squared coherence |γ(ω)| 2 to be bounded above by 1, where the complex coherence is defined as in (3). This restriction can alternatively, and unsurprisingly, be written at every frequency as and this gives an upper bound on the magnitude of the cross-spectrum. This upper bound is displayed in Figure 2 for a bivariate process with distinct marginal geometric anisotropies.
By considering the behaviour of (7) over the two-dimensional Fourier domain, we can now make some general comments about the level of dependence between components in a bivariate geometric anisotropic LGCP; this discussion also applies to pairwise dependences in multivariate LGCPs of higher dimension. For the remainder of this subsection, the only assumption that we make is that each autospectrum and cross-spectrum in the bivariate process is decreasing for increasing frequencies ω. In particular, the following discussion is valid for any family of spectral densities that satisfies this assumption.
The inequality in (7) implies that, for any two processes, between-process dependence can only be non-negligible at those frequencies that contribute significantly to the marginal dependence in both processes. For two processes with distinct marginal geometric anisotropies, this restriction impacts the high-frequency behaviour of the bivariate process more than the low-frequency behaviour This can be seen by considering the spectra displayed in Figure 2: when constructing the upper bound for the cross-spectrum according to (7), the high-frequency contributions of each of the autospectra are killed by the negligible power at the same frequency in the other autospectrum; the contrasting behaviour of the marginal processes at high frequencies kills any high-frequency dependence between the processes. As a result, for any two processes that display contrasting anisotropic behaviour, significant between-process dependence will be more evident at low frequencies, or large spatial scales.
Due to our modelling assumption of geometric anisotropy in the cross-dependence structure, the cross-spectrum will have elliptical contours of equal power density. From Figure   2 we can also see that the elliptical geometries of the autospectra can dictate a nontrivial geometric structure for the upper bound of the cross spectrum. For any given pair of marginal spectra, and thus a given upper bound to the corresponding cross-spectrum, the ellipticity of the true cross-spectrum will therefore impact its permissible coverage of the frequency space, as its elliptical structure must fit within the upper bound's nontrivial geometry. Indeed, we can see from Figure 2 that, in order for our elliptical cross-spectrum to extend further into the higher-frequency regions of the Fourier space, the ellipticity of the cross-spectrum should be more pronounced; if we were to assume a more isotropic crossdependence structure, then the non-negligible cross-spectrum would be more restricted to the low-frequency region around the origin. Since the overall power in the cross-process dependence is obtained by integrating the cross-spectrum over the entire Fourier domain, this gives us a link between the power and the degree of anisotropy in the cross-process dependence. We will formalise this relationship towards the end of the next section, in the context of a Matérn specification for our multivariate dependence structure.

A multivariate Matérn correlation structure
The Matérn family of correlation functions (Stein, 1999;Guttorp & Gneiting, 2006) provides a flexible route to modelling multi-scale dependence in stationary spatial processes.
For univariate random fields, one can use a single three-parameter covariance function to replicate dependence structures that act over any positive scale, whilst additionally controlling the smoothness of any realisations. The flexibility of this model has made it the tool of choice for modelling univariate processes in the spatial statistics literature, and there has naturally been a great deal of interest in extending its use to the multivariate setting. Gneiting et al. (2010) and Apanasovich et al. (2012) have recently addressed this interest, proposing the use of a Matérn function to describe all auto-and cross-covariances for a multivariate isotropic stationary random field. This work has been further extended by Kleiber & Nychka (2012), who accommodate nonstationarity by allowing the Matérn parameters to vary with respect to location; this allows for the possibility of local anisotropic behaviour, as well as local variances and smoothnesses. We will incorporate elements from these approaches in our modelling framework, though we retain an assumption of stationarity. Our aim is to ascertain whether observed second-order point process characteristics can be modelled independently of first-order covariates; this goal would be best served under assumptions of stationarity in the underlying GRF.  Apanasovich et al. (2012), who relax the restrictions on the multivariate model and provide sufficient conditions for its validity for any dimension P ≥ 1. We present a class of Matérn auto-and cross-covariance functions that can accommodate multivariate geometric anisotropy, and we adapt the work of Apanasovich et al. (2012) in order to provide sufficient conditions for its validity, for P ≥ 1.
Following Gneiting et al. (2010), we define the isotropic multivariate Matérn covariance function to be where K ν (·) is the modified Bessel function of the second kind (Abramowitz & Stegun, 1965, pp.374-379). Here, σ pq ∈ R (σ pp > 0) is the zero-lag covariance between field components S p and S q , and α pq > 0 and ν pq > 0 are scale and smoothness parameters, respectively.
The latter two parameters control the rate of decay of covariance between the same two processes with respect to distance. As a scale parameter, α pq determines the 'practical range' of the covariance function, i.e. the separation distance at which S p and S q may be considered approximately independent. The smoothness parameter ν pq determines the shape of the covariance function, and in particular the speed with which it decays close to the origin. For the marginal processes, ν pp clearly then controls the smoothness of the realisations; indeed, the marginal process X p will be m times mean-square differentiable if and only if ν pp > m.
Throughout the literature, the Matérn covariance function has been defined using a variety of parametric forms, with the three parameters interacting in a different manner in each specification. The predominant material difference between the parameterisations is the formulation of the term used to scale the absolute distance h . The inverse of this distance-scaling factor is also known as the correlation length and this is proportional to the practical range; as can be seen from (8), in the current parameterisation, the correlation length is equal to α pq /2 √ ν pq . For alternative parameterisations of the model where the correlation length is independent of ν pq , it is often found that the effects of α pq and ν pq on the practical range and shape of C 0,pq ( h ; α pq , ν pq , σ pq ) cannot be well separated. The parameterisation of the Matérn function given in (8), attributable to Handcock & Wallis (1994) in the univariate scenario, is chosen to allow maximal separation of the roles of α pq and ν pq in determining the second-order behaviour of S(x) and, ultimately, the resulting point process X.
As α pq increases for fixed ν pq , so too will the practical range of C 0,pq ( h ; α pq , ν pq , σ pq ).
This will increase the maximum distance at which one can expect to find cross-process aggregation and segregation of points in X p and X q . Note that the corresponding effect for the marginal scale parameters is that an increase (decrease) in α pp will result in an increase (resp. decrease) in the width of the observed clusters in X p . Recall that the smoothness parameter controls the shape of the covariance function; as ν pq increases, C 0,pq ( h ; α pq , ν pq , σ pq ) becomes smoother around h = 0. As a result of our parameterisation, as ν pq increases for fixed α pq , (8) will increase for small values of h and decrease for large values of h ; the distribution of variance shifts from high scales to low scales.
Thus, where the scale parameter α pq determines the width of areas in which processes X p and X q will have similar intensities, ν pq will determine how similar these intensity processes are within these regions.
Having established the Matérn form of the auto-and cross-covariances for a multivariate isotropic GRF, we now generalise to allow for anisotropic multivariate covariance structures.
Recall from Section 3.1 that we obtain our geometric anisotropic (cross-)covariance function by applying the deformation matrix Σ pq : which is defined for any h ∈ R d .
Recall that we require the matrix (C pq (h; α pq , ν pq , σ pq , Σ pq )) P p,q=1 to be nonnegative definite for all h ∈ R d , in order for (9) to define a valid multivariate covariance model.
Satisfaction of this requirement can be guaranteed by placing the following conditions on the cross-covariance parameters {α pq , ν pq , σ pq , θ pq , ζ pq , p = q}.
Condition 3.2. The matrix with elements −4ν pq /α 2 pq , p, q = 1, . . . , P , is conditionally nonnegative definite. This is a weaker assumption than that of nonnegative definiteness, and it may be satisfied by a matrix containing only negative elements.

Condition 3.3. The matrix with elements
is nonnegative definite.
The proof of Proposition 3.1 is given in the Appendix, and follows a similar argument to the proof of Theorem 1 of Apanasovich et al. (2012).
Remark 3.1. If Condition 3.4 holds, then the P × P matrix with (p, q)-element |Σ pq | −1/2 = ζ −1 pq , will be nonnegative definite; in particular, we can deduce ζ 2 pq ≥ ζ pp ζ qq , for all p, q = 1, . . . , P. the Matérn parameters in order to guarantee a valid multivariate dependence structure in an isotropic setting. In the simpler isotropic framework, the three conditions specified by Apanasovich et al. (2012) are sufficient to guarantee nonnegative definiteness of the resulting spectral density, and also to guarantee that all absolute zero-lag cross-correlations are bounded above by one. In the more general geometric anisotropic setting, we require a more extensive specification. Conditions 3.1-3.4, above, are sufficient to guarantee nonnegative definiteness of the geometric anisotropic spectral density, and are also sufficient for the absolute colocated cross-correlations to be bounded above by 1.
These conditions constitute a set of implicit relationships that, between them, specify a valid multivariate geometric anisotropic LGCP. We will now provide explicit restrictions on the cross-dependence parameters in terms of the marginal dependence parameters. This will allow users to sequentially construct a valid multivariate model by first specifying the marginal covariances, and then conditionally specifying the cross-covariance structures.
This sequential approach to model construction will also be reflected in our model-fitting procedures in Section 4.
Trivial rearrangement of Condition 3.1 yields an explicit expression for ν pq in terms of the corresponding marginal values. In Remarks 3.2-3.5, below, we provide similar constructions for the cross-covariance parameters α pq , σ pq , θ pq and ζ pq , such that Conditions 3.1-3.4 may be satisfied. The proofs for Remarks 3.2-3.5 are given in the Appendix.
for constants V p , V q ≥ 0 and for some A σ,pq ∈ [−1, 1] that form a valid correlation matrix.
Remark 3.4. Condition 3.4 is satisfied by the deformation matrices {Σ pq ; p, q = 1, . . . , P } if their diagonal elements [Σ pq ] ii , can be written Remark 3.5. For small P , we can follow the lead of Apanasovich et al. (2012) and use equicorrelated matrices A (i) in this scenario, for the sake of identifiability, we redefine ∆ (i) Conditions 3.1-3.4, along with Remarks 3.2-3.4, indicate a sequential approach to specifying a valid multivariate geometric anisotropic Matérn covariance structure in practice.
As mentioned previously, Condition 3.1 and Remarks 3.2-3.4 suggest that one must specify the parameters for the marginal covariance function before conditionally specifying the parameters for each cross-covariance function. These statements also indicate that, within each individual component of the joint model, i.e. for fixed p, q, there is a particular order in which the five parameters (θ pq , ζ pq , α pq , ν pq , σ pq ) should necessarily be specified. From Remark 3.3, we can see that, for each (p, q) pairing, the specification of the Matérn power parameter σ pq is dependent upon the corresponding ratio of anisotropy ζ pq , as well as the other Matérn parameters, α pq and ν pq , and Remark 3.4 indicates that the anisotropy parameters (θ pq , ζ pq ) should be jointly specified. In addition, Condition 3.1 and Remark 3.2 indicate that the smoothness parameter ν pq should be specified before the scale parameter α pq . We conclude that, for each marginal or bivariate component of the joint covariance model, the anisotropy parameters should be specified before the Matérn parameters, with the Matérn smoothness, scale and power parameters being specified third, fourth and fifth, respectively.
We conclude this section by considering the limitations placed on the zero-lag crosscorrelation coefficients ρ pq := σ pq / √ σ pp σ qq . By rearranging Condition 3.3, we can write: with where B(·, ·) is the Beta function (Abramowitz & Stegun, 1965). The first inequality in (10) is directly implied by Condition 3.3. The second inequality in (10) can be shown componentwise: By Remark 3.1, Condition 3.4 ensures that τ (4) pq ≤ 1, and as noted by Apanasovich et al. (2012) in the isotropic framework, Conditions 3.1 and 3.2 are sufficient to guarantee that τ (i) pq ≤ 1, i = 1, 2, 3. In the isotropic framework, τ (4) pq = 1, and we are left with the limitations noted by Apanasovich et al. (2012): the zero-lag cross-correlation will be bounded above by 1 when the corresponding univariate isotropic processes share identical Matérn parameters. When the marginal parameter specifications differ, this upper bound will decrease as the smoothness and inverse correlation length of the cross-covariance structure depart from the arithmetic mean of the corresponding marginal quantities.
In our more general anisotropic framework, we can see from τ (4) pq in (10) that the upper bound on the colocated cross-correlations will also be affected by the relationship between the cross-covariance ratio of anisotropy ζ pq and the ratios of anisotropy in the corresponding marginal covariance structures. If we assume Condition 3.4 to hold, then by Remark 3.1, ζ pq will be restricted to the closed interval [ζ pq will reduce to 1, and the upper bound of the colocated cross-correlation ρ pq will behave as in the isotropic framework, i.e. as described above. Increasing ζ pq away from this geometric mean, however, will decrease τ (4) pq , which will in turn shrink the upper bound on ρ 2 pq , given in (10). In other words, as the ellipticity of the cross-covariance function becomes less pronounced, the maximum possible degree of zero-lag correlation between the two components of the field will decrease. This formalizes the relationship between the power and the anisotropy of the cross-process dependence, discussed at the end of Section 3.1.

Parameter Estimation Procedure
In order to fit our parametric model to an observed multitype point pattern, we must estimate both the marginal and joint anisotropy parameters {θ pq , ζ pq ; p, q = 1, . . . , P }, as well as the parameters that specify the mean and Matérn covariance structure of the underlying Gaussian random field, {µ p , α pq , ν pq , σ pq ; p, q = 1, . . . , P }. At a high level, we follow the approach of Møller & Toftaker (2014), who fit a univariate version of our model by first estimating the angle and ratio of anisotropy in the observed data, before using these estimates to back-transform the data into an isotropic framework. The resulting 'isotropised' point pattern is then used to estimate the mean parameters and the Matérn parameters. Our approach to each component of this two-stage model-fitting procedure will differ from the methods of Møller & Toftaker (2014), however. We use an approach to estimating anisotropy that is less sensitive to user-specified tuning parameters, which we adapt from the work of Rajala et al. (2016), and we use a more automatable approach to estimating the mean and Matérn parameters, which we develop from the work of Tanaka et al. (2008).
In developing our parameter estimation methodology, we are faced with the question of whether to put measures into place to guarantee that the fitted model satisfies Conditions 3.1-3.4, therefore ensuring validity of the multivariate dependence structure. This is the approach taken by Apanasovich et al. (2012) for fitting multivariate isotropic Matérn GRFs; they fit the marginal dependence structures then use the estimated marginal parameters to restrict the parameter subspace for the Matérn cross-covariances. Since Conditions 3.1-3.4 are sufficient, and not necessary, the resulting restriction on the joint dependence structure could be overstated, potentially resulting in inconsistent estimators for the Matérn cross-covariance parameters. Under the assumption that the smoothness is known, however, the power and scale parameters for a univariate Matérn covariance function cannot be consistently estimated under infill asymptotics (Zhang, 2004); consistency can only be achieved by increasing the observation window W . As noted by Apanasovich et al. (2012), constraining σ 2 pq and α 2 pq (p = q) conditional on their corresponding marginal values therefore provides no additional penalty in terms of estimator consistency when assuming a fixed observation window. Furthermore, the numerical tests of Apanasovich et al. (2012) show that reasonable accuracy can indeed be obtained when using this constrained approach to parameter estimation; this approach therefore warrants examination in the current framework. In order to avoid compromising the consistency of the anisotropy estimators, we do not use our conditions from Section 3 to restrict the parameter pair (θ pq , ζ pq ), p = q.

Estimating the Anisotropy Parameters
We focus first on quantifying the anisotropy present in both the marginal and joint dependence structures in a multi-type point pattern. Møller & Toftaker (2014) estimate the angle of anisotropy in a univariate geometric anisotropic point pattern by finding the angle φ at which the r-integrated difference between the anisotropic pair correlation function g a (r, φ) and its phase-shifted self g a (r, φ + π/2), is maximised. This is achieved by estimating g a (r, φ) over a discrete lattice of polar coordinates (r, φ), and numerically approximating the required integral in r. Accuracy of the resulting estimator is therefore sensitive to the resolution of the polar lattice, as well as the choice of two bandwidth parameters used in estimating the anisotropic pair correlation function; for details of these bandwidth parameters, see Møller & Toftaker (2014). Finally, use of this estimation method is also dependent on the assumption that the isotropic pair correlation function is strictly decreasing. Whilst this assumption holds true for our assumed Matérn model, it can be violated by real data.
The approach we detail below is more widely applicable, as it does not depend on such an assumption, and it is also less sensitive to subjective choices of bandwidth parameters.
We adopt and adapt the method introduced by Rajala et al. (2016) for estimating the angle of anisotropy: we adopt this method for characterising anisotropy in the marginal covariance structures, and we adapt it for estimating the angle of anisotropy in the crosscovariance structures. For the sake of generality, we describe the procedure for estimating θ pq , p = q. We start by constructing the point pattern formed by the difference vectors {x p,i − x q,j ; i = 1, . . . , n p , j = 1, . . . , n q }; this is the (bivariate) Fry process (Fry, 1979), and when p = q, this will be rotationally symmetric of order 2, about the origin. The Fry process is useful here as its first-order properties will reflect the second-order properties of the original point pattern. We can therefore estimate any second-order anisotropy in the original bivariate point pattern by estimating the anisotropy in the intensity of the bivariate Fry process.
Dividing the polar plane into a selected number, n F , of distinct sectors, and for l ∈ L ⊂ N, we collect the lth nearest Fry point in each sector into a set, G l , of n F points, such that each G l sketches out a noisy contour around the origin, and such that the intensity of the Fry process is reflected in the proximity of the G l s to one another. For point patterns that display segregation, the anisotropy in the joint second-order dependence structure will be shared by the contours of the intensity field for the Fry process; for aggregated point patterns, the angle of anisotropy will be phase-shifted by π/2 in the Fry process. In order to quantify the anisotropy in the original point pattern then, we can treat the G l s as sampled versions of the Fry intensity's contours, and assuming Gaussian measurement error we can infer the corresponding true contours using adjusted ordinary least squares, and subsequently derive the angle of anisotropy in the original point pattern. For full technical details of this method, we direct the reader to Rajala et al. (2016).
For each marginal process, as described by Møller & Toftaker (2014), we can transform the observed point pattern X p ∩ W and the corresponding observation window W by assuming fixed values for θ ∈ [0, 2π) and ζ ∈ [0, 1]: If the chosen values of θ and ζ are equal to the values that describe the anisotropy of X p , then the transformed point process X p,θ,ζ will be isotropic and the corresponding anisotropic pair correlation function g a pp,θ,ζ (r, φ) will be constant with respect to its second argument. This motivates our chosen method for estimating the marginal anisotropy ratios ζ pp , which we also adopt from the work of Rajala et al. (2016).
Following Rajala et al. (2016), we define the following directional discrepancy statistic: where is the sector-K-function, an anisotropic variant of Ripley's K-function, evaluated on the isotropised point pattern X p,θ,ζ . To estimate the marginal ratio of anisotropy ζ pp , we backtransform our observed point pattern using the estimated angle of anisotropyθ pp and a sequence of candidate ratios {ζ pp,k := kζ max /(1 + n ζ ), k = 1, . . . , n ζ }, for some user-defined upper bound ζ max . We then chooseζ pp = ζ pp,k ∈ (0, ζ max ) to be the candidate value that minimises the estimateV pp,θpp (ζ pp,k ). Note that, although we defined ζ pp ∈ (0, 1) in Section 2.2, the sampling variance of the estimated sector-K-function can result in an estimated ratioζ pp > 1.
Møller & Toftaker (2014) use a similar approach, in effect minimising the directional discrepancy statistic (11), but using the anisotropic pair correlation function in place of the sector-K-function. Indeed, it is possible to use any directional second-order statistic in the integrand of (11). We choose to use K a pp,θ,ζ for two reasons. Firstly, the analysis of Redenbach et al. (2009) suggests that the sector-K-function is better-suited to characterising anisotropy than nearest-neighbour statistics; the authors conclude that, for detecting anisotropy in point patterns, statistical tests based on the sector-K-function have greater power, in general, than those based on nearest-neighbour orientation statistics. Secondly, estimation of the sector-K-function requires the choice of only one tuning parameter, an angular bandwidth, whereas the use of the anisotropic pair correlation function would require the specification of both angular and radial bandwidths.
Our chosen approach to estimating ζ pp can be extended to the multivariate scenario, where we are interested in the geometric anisotropic cross-dependence exhibited by a given pair of Cox processes X p and X q . By manipulating the space R d on which both processes live, we also manipulate the cross-covariance function C pq (h) that specifies the dependence between the random fields that drive X p and X q . For each pair of processes, we once again define a discrete set of candidate multivariate anisotropy ratios {ζ pq,k ∈ (0, ζ max ), k = 1, . . . , n ζ }, and we chooseζ pq = ζ pq,k for which the estimated value of V pq,θpq(ζ pq,k ) is minimised, where V pq,θpq(ζ pq,k ) is defined through transforming both X p ∩ W and X q ∩ W , along with their common observation window W .
The above approach to estimating the anisotropy parameters requires the selection of a number of control parameters: the number of sectors n F , into which we partition the Fry process; the number n ζ of candidate ratios of anisotropy, as well as their upper bound ζ max ; and the limits of integration, b 1 and b 2 in (11), which we use to calculateV pq,θpq (ζ pq,k ) when estimating ζ. As a rule of thumb, and for reasons outlined below, Rajala et al.
(2016) suggest choosing n F ≈ λ|W |/6, where λ|W | is the expected number of points in the original point process. We adopt this guideline for choosing n F when estimating the anisotropy in the marginal processes, and we derive a similar rule of thumb for n F when estimating the between-process anisotropy, by following the same arguments as Rajala et al. (2016). For the bivariate Poisson process with intensity vector (λ p , λ q ) in a circular spatial window W , the expected number of bivariate Fry points per sector is approximately λ p λ q |W | 2 /3n F . Each point in the bivariate process can be expected to contribute if there are at least (λ p + λ q )|W | points per sector, and so we have a bivariate direction count rule of n F ≈ λ p λ q |W |/3(λ p + λ q ). Selection of both n ζ and ζ max is straightforward: ζ max should be chosen such that (0, ζ max ) covers the majority of the sampling distribution of ζ pq , and selection of n ζ involves a trade-off between accuracy in the resulting estimates and computational expense of the estimation procedure. In Section 5, where we implement our model fitting procedure for both simulated data and tropical rainforest data, we use ζ max = 2 and n ζ = 199 for estimating all marginal and joint ratios of anisotropy.
Choice of the limits of integration, b 1 and b 2 in (11), is a more subjective task, and should be determined by the range of scales over which dependence (either within, or between processes) is sought to be characterised; these need not be the same for all marginal and cross-dependence relationships being estimated. In Section 5, we detail our choices of these limits of integration.

Estimating the Matérn Parameters
Once we have estimated our anisotropy parameters, we can isotropise the point pattern and its observation window, and use this transformed data to estimate the remaining parameters. In order to ensure that the Matérn parameters satisfy Conditions 3.1-3.3, we define ν pq , α pq and σ pq according to the specifications in Condition 3.1, Remark 3.2 and Remark 3.3, respectively. Techniques for modelling the correlation matrices A ν , A α and A σ are discussed by Apanasovich & Genton (2010), and the reader is directed there for further details. When P is small, however, we can simplify our task by assuming the off-diagonal elements of A α , A ν , A σ to be constant (Apanasovich et al., 2012).
In order to estimate both the mean and Matérn parameters, we maximise the Palm loglikelihood, first proposed by Tanaka et al. (2008). For estimating the marginal parameters, we use the version of the Palm log-likelihood given by Dvořàk & Prokešovà (2012), where the inner region correction is proposed to deal with edge effects: where r ij = x p,i − x p,j , K p (r; α pp , ν pp , σ pp ) is Ripley's univariate K-function, which we approximate by numerically integrating the corresponding pair correlation function, and |X p,θ,ζ ∩ W \ R| denotes the number of points in the isotropised pattern X p,θ,ζ that lie further than a distance R from the boundary of W θ,ζ . R is a user-defined tuning parameter that can be objectively set based on the data; this is discussed further in Section 5. As is common in the point pattern literature, we use = in the summation notation to indicate summation over pairs of distinct points.
The Palm log-likelihood (13) can be analytically maximised with respect to λ p , yielding the maximum Palm-likelihood estimate (MPLE)λ p , and we obtain MPLEs for the remaining marginal Matérn parameters by numerically maximising (λ p , α pp , ν pp , σ pp ). The MPLE for µ p can be subsequently calculated according to (2).
We further develop the Palm log-likelihood approach, in order to estimate the parameters for the cross-covariance structure; our bivariate Palm log-likelihood follows a similar construction to the marginal version. First, we obtain the symmetric bivariate Fry process for components X p and X q , using the inner region correction to deal with edge effects. We then treat this Fry process as an inhomogeneous Poisson process, with intensity equal to a bivariate version of the Palm intensity (Daley & Vere-Jones, 2008;Prokešovà & Jensen, 2013), which we define heuristically as follows: for x at distance r from the origin o, the occurrence rate of process q at x ∈ {R 2 : x = r}, assuming there to be a point of process p at the origin, is where dx is the Lebesgue measure for the infinitesimal set at x. Following this definition, we can relate the bivariate Palm intensity to the (isotropic) cross-pair correlation function for the original process: λ 0,pq (r) = λ q g 0,pq (r), and this allows us to obtain the following bivariate Palm log-likelihood, which can be maximised to obtain estimates for α pq , ν pq , σ pq , p = q: where r ij = x p,i −x q,j , K pq (r; α pq , ν pq , σ pq ) is Ripley's bivariate K-function, and |X p ∩W \ R| denotes the number of observed points in process p that lie further than a distance R from the boundary of the window R. By substituting our previous estimates of λ p and λ q into (14), we obtain an expression in terms of the Matérn cross-covariance parameters only.
We numerically maximise this expression in (α pq , ν pq , σ pq ) over the constrained parameter space described by Condition 3.1, Remark 3.2 and Remark 3.3, and dependent on the corresponding estimated marginal Matérn parameters. As described in Section 4.1, the use of constrained optimisation should not affect the consistency of the cross-covariance parameter estimators, however they may display some bias due to the truncation of their supports.

Proof of concept simulations
We demonstrate the validity of the model fitting procedure described in Section 4, through For each model, and for p, q = 1, 2, we executed our parameter estimation procedure as described in Section 4.1. For both the marginal and cross-dependence relationships, we estimate θ pq using Fry processes consisting of only those point pairs separated by r ∈ (0, 0.25). Similarly, when estimating ζ pq , we numerically approximate the integral V pq,θpq (ζ) as defined in (11), using the limits of integration b 1 = 0, b 2 = 0.25. Approximation of V pq,θpq (ζ) involves estimating the sector-K-function over a discrete, high-resolution set of distances r, using an angular bandwidth parameter which we choose to be h φ = π/8 following Rajala et al. (2018b, §4.3.1), and details of the chosen sector-K-function estimator are given in the Appendix. In estimating the anisotropy parameters, our choice of interval for r is deliberately large relative to the true scale of dependence in all of our models, as we intend to show that reasonable results can be obtained without prior knowledge of the true scale of dependence in the data.
When estimating the Matérn parameters, despite using a favourable form of the Matérn parameterisation as discussed in Section 3.2, there proved to be insufficient separation of the effects of ν pq and α pq in practice for both parameters to be allowed to vary freely during estimation. In order to avoid this issue, a common strategy (e.g. Diggle et al., 2013) is to restrictν pq to three candidate values, representing three sufficiently distinct levels of smoothness in the resulting random fields: we seekν pq ∈ {0.05, 0.5, 5.0}, p, q = 1, 2. For the case p = q, this candidate vector was further restricted, to ensure thatν 12 satisfied Condition 3.1. The remaining Matérn parameters were allowed to vary on continuous bounded intervals:α pq ∈ (0, α U B pq ) andσ pq ∈ (0, σ U B pq ). In the marginal cases, α U B pp = 10 and σ U B pp = 50, p = 1, 2, were chosen such that these constituted generous intervals around the corresponding true values. For estimating the cross-covariance parameters, α U B 12 and σ U B 12 were chosen to ensure compliance with Conditions 3.2 and 3.3.
Our implementation was carried out in Matlab, where we used the default interior-point algorithm to carry out constrained maximisation of the Palm-log likelihood with respect to (α pq , σ pq ), for each candidate value of ν pq . Since this algorithm requires the user to initialise the parameters being sought, we did so using a computationally inexpensive version of the widely-used minimum contrast method, minimising the difference between the estimated (isotropic) pair correlation function and its closed-form expression across a coarse grid of parameter pairs (α pq , σ pq ).
We also detail our choice of the MPLE tuning parameter R. Following the guidance of Prokešovà & Jensen (2013), we chose R to be approximately equal to the range of interaction in the relevant dataset. The practical range of dependence is defined in the geostatistics literature to be the distance at which the spatial auto-or cross-correlation decays to 0.05. We calculated the practical range for each of our models, motivating our choice of R = 0.1 for models 1 and 2, and R = 0.25 for models 3 and 4. In a small proportion of runs, the MPLE procedure returned seemingly degenerate estimates of either α pq orσ pq , p = 1, 2, with one or the other being returned equal to their upper bound. This was found to occur when the majority of the points in the corresponding dataset lay in the boundary region created using the above values of R. In this scenario, the number of points contributing to the Palm log-likelihoods (14)- (13) Table 1: Monte Carlo estimates and standard errors for the anisotropy (top) and Matérn (bottom) parameters in four distinct models. All estimates and errors are given to 2dp, apart from those for the scale parameters; these are presented to 3dp, due to the magnitude of the errors. Finally we note that, across all models, the estimation of µ 1 and µ 2 is poor. We found that this can be improved by reducing the MPLE tuning parameter R, but with a loss of accuracy in the resulting covariance parameters. In practice, we can of course avoid this trade-off by instead using the classical estimator for the intensity,λ p = n p /|W |, and combining this withσ pp to obtain a more accurate estimate for µ p .
Overall, these results indicate reasonable success for our model fitting procedure, and warrant its use in exploring the model's effectiveness in characterising real data.

Application to ecological data
In order to demonstrate the utility of our multivariate geometric anisotropic framework, we fit our multivariate Matérn geometric anisotropic LGCP to a bivariate point pattern from a 50ha plot in the BCI forest stand in Panama. Our point pattern of interest comprises two tree species, Cecropia obtusifolia and Spondias radlkoferi. To ease comparison with the studies in the previous section, we rescale the coordinates to the half-unit window [0, 1] × [0, 0.5]; this rescaled bivariate point pattern is displayed in Figure 4. C. obtusifolia and S. radlkoferi were chosen as a preliminary study of the data revealed empirical evidence of between-process anisotropy at a range of r = 50m. This is demonstrated in Figure   5, which we describe below. This preliminary evidence also motivates the scales over which we seek to characterise anisotropy in the data: for both the marginal and crossdependence relationships, we estimate θ pq using Fry processes consisting of only those point pairs separated by r ∈ (0, 0.05), and we estimate ζ pq using b 1 = 0 and b 2 = 0.05 as the limits of integration inV pq,θpq (ζ).
As in Section 5.1, we must also choose a value of the MPLE tuning parameter R. Once again, we do so by consulting the marginal and cross-pair correlation functions for the isotropised data, once the marginal and between-process anisotropy parameters have been estimated. We found that the corresponding implied practical range, both within-species and between-species, was in the interval [0.1, 0.2]. We therefore executed the MPLE portion of our model fitting procedure for each of R = 0.1, 0.15, 0.2.
For the proof-of-concept studies in Section 5.1, we were able to avoid constraining the anisotropy parameters during the estimation procedure, as we knew that their true values satisfied the relevant model validity conditions of Section 3.2. When fitting the model to observed data, however, we have no such assurance. Instead of introducing any new constraints on the anisotropy parameters here, we acknowledge this uncertainty by checking each fitted model against Conditions 1-4; all of the fitted models we present here were found to satisfy these validity conditions. In Section 5.1, we also found that estimating the Matérn parameters via constrained optimisation can result in underestimation of the overall power in the between-process covariance. This occurs when the estimated value of σ 12 is equal to the upper bound specified by the marginal dependence structures. By calculating this upper bound explicitly, and comparing withσ 12 , we can therefore ascertain whether each fitted model accurately represents the between-species dependence structure; this is important, as it describes the interspecific interaction between individual trees in our dataset.
To begin with, we applied our model-fitting procedure as in Section 5.1. This resulted in the model specified by the first three rows of parameter estimates in Table 2. Sinceσ 12 = σ U B 12 in each of these specifications, we conclude that none of these fitted models accurately represent the interspecific interaction in our dataset. Motivated by the observation that distinct values of the marginal smoothness parameters lead to prohibitively small values of σ U B 12 , we next proceeded to fix the smoothness parameters, ν 11 = ν 22 = ν 12 = 0.5, such that we sought to fit a geometric anisotropic bivariate Exponential covariance structure to our data. Crucially, all of the discussion from Sections 3 and 4 is valid for fixed values of ν pq , p, q = 1, 2. The resulting parameter estimates for this model are given in rows 4-6 of Table   2. In order to demonstrate the utility of our multivariate anisotropic framework, we also fit an isotropic version of the multivariate Exponential LGCP to the same data, for the purpose of comparison. In practice, we achieve this by fixing ζ pq = 1 and θ pq = 0 for p, q = 1, 2, and implementing the MPLE portion of the model fitting procedure as described above, using R = 0.1, 0.15, 0.2. The resulting three sets of estimated scale and power parameters for this model are given in the bottom three rows of Table 2. As is shown in this table, the interspecific interaction is well-represented in only one of each of the anisotropic and isotropic fitted Exponential models. We henceforth restrict our attention to these two fitted models.
In order to assess different aspects of each model's performance, we use global envelope  were developed by Myllymäki et al. (2017) to address multiple testing concerns with regards to the popular use of Monte Carlo envelope tests (Loosemore & Ford, 2006;Baddeley et al., 2014). The envelopes provided by the GETs describe a proper statistical test: if the observed test statistic lies outside the simulated envelope at any instance, then the null hypothesis that the observed data belong to the fitted model may be rejected. In order to construct the envelopes, we use one of the following two approaches, both of which are described in detail by Myllymäki et al. (2017). For symmetric second-order statistics, we use the scaled studentized maximum absolute difference (MAD) to construct the critical bounds. For asymmetric second-order statistics, it is more appropriate to construct the envelopes using the scaled directional quantile MAD. In both cases, we configure our tests such that they have a global type I error probability of 0.1, using M = 499.
To assess each model's description of bivariate anisotropy in the data, we estimate the the sector-K-function (12) at a distance r = 0.05, and at the angles φ = kπ/60, k = 0, . . . , 60. This assessment is summarised in Figure 5, which gives the estimated marginal and between-process sector-K-functions for the observed BCI data, along with their corresponding directional quantile MAD envelopes. To assess the fitted model's ability to replicate aggregation in the observed multi-type point pattern, we use the 'p-to-q' nearest-neighbour distance distribution function G pq (r) (Van Lieshout & Baddeley, 1999).
This describes the empirical distribution of the absolute distance between the typical point of type p and its nearest point of type q and is, in general, asymmetric in p, q. We calculate G pq (r) at a range of distances, using a bivariate version of the border-corrected estimators detailed by Baddeley et al. (2015, §8.11.3). The resulting estimates for the observed BCI data are provided in Figure 6, along with their corresponding studentized MAD envelopes.
From Figure 5, we can see that the two chosen species in the BCI forest stand exhibit anisotropic interspecific interaction at a range of 50m: for φ ∈ [7π/60, 9π/60] ∪ [12π/60, 18π/60], the estimated sector-K-function K a 12 (0.05, φ) lies outside of the envelope generated by a multivariate isotropic LGCP. The p-value for the global envelope test that was carried out for this statistic was 0.058, indicating departure from the isotropic model when using a global type I error probability of 0.1. From the bottom row of panels, we can see that our multivariate geometric anisotropic LGCP can comfortably replicate this observed heterogeneity.
Finally, the bottom row of panels in Figure 6 demonstrates that our fitted multivariate anisotropic LGCP also accurately captures the clustering behaviour exhibited by the observed bivariate data. The top panels in this Figure suggest that, despite not being able to account for the anisotropy in the data, the multivariate isotropic LGCP has also captured the clustering behaviour evident in this particular example.

Discussion
Using the model-fitting methodology described in Section 4, we have shown that by incorporating geometric anisotropy into the between-process dependence, as well as the marginal dependence, we can construct a LGCP that more accurately replicates any rotationally heterogeneous interaction between points in a multi-type point pattern. We have focussed here on a covariate-free approach, motivated in part by the desire to allow the description of anisotropic between-process dependence in data for which there are no explanatory spatial variables. Nevertheless, the models presented here are flexible enough to use (potentially incomplete) covariate information where it is available. Indeed, an interesting first extension of this work would be to incorporate covariates into the first-order description of the GRF underlying our LGCPs; for instance, the expected value of the GRF could be specified through a linear regression model, and inference with respect to the regression parameters may be achievable through the use of estimating functions (e.g. Waagepetersen, 2008;Waagepetersen & Guan, 2009). Such an approach would allow the user to exploit any knowledge of spatial covariates whilst being confident that any residual heterogeneity in the data would be accounted for by the increased flexibility of the multivariate geometric anisotropic second-order dependence structure.

Acknowledgements
The

A Appendix
A.1 Proof of Proposition 3.1 In Proposition 3.1, we state that Conditions 1-4 are sufficient for the geometric anisotropic Matérn function in (9) to specify a valid multivariate covariance model, and we sketch the proof here. This proof is similar to that of Theorem 1 of Apanasovich et al. (2012), with additional consideration required to account for geometric anisotropy. As such, our proof depends on the following lemmas, due to Apanasovich et al. (2012), proofs for which can be found in that paper.
of Proposition 3.1. We operate in the spectral domain: by Cramér's generalisation of Bochner's Theorem (Cramér, 1945), the covariance matrix (C pq (h)) P p,q=1 is nonnegative definite if and only if the corresponding matrix of spectral densities (f pq (ω)) P p,q=1 is also nonnegative definite. We therefore consider the form of the multivariate spectral density function, corresponding to (9): where each anisotropic deformation matrix Σ pq is defined according to (5) in terms of θ pq and ζ pq . We can decompose this spectrum as follows, in the process defining four terms numbered I to IV: where A ν,pq = 1 − {ν pq − (ν pp + ν qq ) /2} /∆ ν is the (p, q)-element of a valid nonnegative correlation matrix; nonnegative definiteness of the spectral matrix follows from nonnegative definiteness of the matrices formed from these constituent terms.
Condition 3.2 is sufficient to guarantee nonnegative definiteness of the matrices with elements given by either the first or third terms in (15); this can be seen for the former by applying Lemma 1 and for the latter by applying Lemma 2.
Conditions 3.1, 3.2 and 3.4 are sufficient to guarantee nonnegative definiteness of the matrix with elements given by the second term of (15). To see this, we first rewrite the second term in (15) as where we note that the infinite expansion of the logarithm is valid when and this is satisfied at all times, since 4ν pq /α 2 pq > 0. Now, consider the matrices B and C with elements b pq ≥ 0, c pq ≥ 0, p, q = 1, . . . , P and suppose that both −B and −C are conditionally nonnegative definite. By applying Lemma 2 (with δ = 0, r = 1), we have that the matrix with elements 1/b pq is nonnegative definite, and therefore by the Schur product theorem, we have that the matrix with elements −c pq /b pq is conditionally nonnegative definite. Now, using the matrices in Conditions 2 and 4 in place of the matrices −C and −B, respectively, we can state that the matrix with elements − 4ν pq /α 2 pq Σ 1/2 pq 2 is conditionally nonnegative definite. By applying Lemma 2 once more (this time with δ = 1), we therefore have that the matrix with elements 4ν pq /α 2 pq Σ 1/2 pq 2 + 1 −r is nonnegative definite for all r > 0. It is now clear that, since A ν is nonnegative definite and ∆ ν ≥ 0 (both by Condition 1), each exponential argument within the product above specifies a nonnegative definite matrix. Repeated further use of the Schur product theorem therefore allows us to conclude that the matrix with elements given by the second term in (15) is indeed nonnegative definite.
Finally, Condition 3.3 states the nonnegative definiteness of the matrix with entries specified by the fourth term of (15), and so we may conclude the stated result.
A.2 Proofs of Remarks 3.2-3.4 In Remarks 3.2-3.4, we provide definitions of the correlation length, smoothness parameter and spatial deformation matrix for the geometric anisotropic Matérn cross-covariance function C pq (h |α pq , ν pq , σ pq , Σ pq ), in terms of the corresponding marginal quantities. In this subsection, we prove that these definitions satisfy Conditions 3.2-3.4, respectively.
Recall that a matrix A ∈ C P ×C P is conditionally nonnegative definite if, for all x ∈ C P such that P p=1 with ∆ α ≥ 0 and 0 ≤ A α,pq ≤ 1 that form a valid correlation matrix. Consider x ∈ C P such that P p=1 x p = 0. Using (16), Hence, the matrix with (p, q)-element − 4νpq α 2 pq is conditionally nonnegative definite.
Proof of Remark 3.3. Through straightforward manipulation of the expression in Remark 3.3, we see that the matrix in Condition 3.3 is equal to the matrix with (p, q)-element given by V p V q A σ,pq , where V p , V q and A σ are defined in Remark 3.3. Since A σ is a (nonnegative definite) correlation matrix, and since V p , V q ≥ 0, this is also nonnegative definite.
Proof of Remark 3.4. We also give motivation for the chosen construction of Σ pq . We wish to have Σ pq such that the P × P matrix with (p, q)-element −ω T Σ pq ω is conditionally nonnegative definite. Now, for a P × P matrix with (p, q)-element −C pq to be nonnegative definite, a necessary condition is for It therefore follows that for Condition 3.4 to hold, we need Since this must hold for all ω ∈ R 2 , we can consider the particular case for {ω ∈ R 2 : ω 2 = 0}, Σ is a P × P correlation matrix and each ∆ (i) Σ is a nonnegative constant. Now, consider x ∈ C P such that P p=1 x p = 0. We wish to show that p,q x p ω T Σ pq ω x * q = ω T p,q x p Σ pq x * q ω ≤ 0 ∀ω ∈ R 2 .
By expanding the above quadratic in ω, and then substituting our chosen construction for the diagonal elements, we can simplify to obtain ω T p,q Σ ω 2 2 + 2ω 1 ω 2 p,q where, for i = 1, 2, B Σ is a correlation matrix. In order for this quadratic term to maintain the same sign for all ω ∈ R 2 , we must be able to factorise it further, i.e. we must be able to write ω T p,q x p Σ pq x * q ω = k 1 (ω 1 ± k 2 ω 2 ) 2 .
for some k 1 , k 2 ∈ R. By expanding and equating terms, it is straightforward to show that this form can be obtained: we can write Note that this specifies a relationship between the diagonal and off-diagonal elements of the set of matrices {Σ pq , p, q = 1, . . . , P }, which must be satisfied in order for the P × P matrix with (p, q)-element −ω T Σ pq ω to be conditionally nonnegative definite.
Note that, since each Σ pq is a deformation matrix with form given by (5), its diagonal and off-diagonal elements must be consistent with the same choice of (θ pq , ζ pq ). This places a fundamental restriction on the form of each Σ pq , which will, in general, not agree with the constraint in (17). We can circumvent this apparent incompatibility of restrictions on the set of deformation matrices {Σ pq , p, q = 1, . . . , P } by writing the off-diagonal elements in the form where b, c ∈ R P are constant P -length vectors, ∆ Σ is a nonnegative constant, and A Σ ∈ R P ×P is a P × P real matrix that satisfies P p,q=1 x p A (2) Σ .
By specifying the off-diagonal elements of Σ pq in this way, we have that our conditional nonnegative definiteness restriction (17) reduces to a restriction on A Σ , which is unaffected by the need for Σ pq to maintain the form of a valid deformation matrix, specified by (5); since there are no further restrictions on the form of A Σ , such a matrix will certainly exist. Therefore, if the diagonal elements of the deformation matrix Σ pq are specified as in Remark 3.4, the resulting off-diagonal elements (which are immediately specified via (5)) will always satisfy a valid decomposition (18), guaranteeing satisfaction of the relationship (17) will be satisfied. This allows us to conclude that, if the diagonal elements of the deformation matrix Σ pq are specified as in Remark 3.4, the P × P matrix with (p, q)element −ω T Σ pq ω will be conditionally nonnegative definite.

A.3 Estimators of second-order summary statistics
We present details of two estimators of second-order summary statistics that are used in our parameter estimation procedure. The first estimator we consider is for the isotropic cross-pair correlation function g 0,pq (r), used in initialising the Matérn power and scale parameters:ĝ 0,pq (r) = = xp∈Xp∩W xq∈Xq∩W κ hr ( x p − x q − r) where κ hr is a radial kernel function with bandwidth h r ,λ p is an estimator for the constant expected intensity of component X p , defined in (2), and |W ∩ W u | is an edge correction factor, defined as the area of overlap between the observation window W and its translation by u ∈ R 2 ; without such a correction, due to the finite observation region, the estimator would underestimate the number of point pairs that lie within distance r of each other.
The use of this edge correction also rendersĝ 0,pq (r) =ĝ 0,qp (r) in general. In (19), and in the remainder of the paper, the notation Σ = indicates summation over all point pairs formed of distinct points; for bivariate definitions such as (19), this is clearly only relevant for the case where p = q. For component p of our multivariate LGCP, we choose to estimate the expected intensity parameterλ p using the classical global intensity estimator,λ p = n p /|W |.
The choice of kernel function κ hr is discussed by Illian et al. (2008) and common choices include the Epanechnikov kernel and the box kernel; we make use of the latter as it can be shown to minimise the variance of (19): The second estimator that we detail here corresponds to the anisotropic sector-Kfunction K a pq (r, φ):K a pq (r, φ) =K a pq (r, φ + π) = = xp∈Xp∩W xq∈Xq∩W where H(x 1 − x 2 , (r, φ)) = I( x 1 − x 2 ≤ r)κ h φ (ψ(x 1 , x 2 ) − φ), with I(·) the indicator function, κ h φ an angular kernel function with bandwidth h φ , and ψ(x 1 , x 2 ) the angle between the directed line from x 1 to x 2 and the abscissa-axis. In our implementation, we will use a box kernel for κ h φ , defined analogously to the radial kernel function κ hr above.

A.4 Additional Figures
In Section 5 of the article, we provide proof-of-concept results for our model-fitting procedure. There, we have given numerical summaries of the estimated parameter distributions for four distinct model specifications, along with an illustration, in Figure 3, corresponding to one of these models.
Here, we provide illustration of the estimated parameter distributions for the three remaining model specifications in our proof-of-concept tests. Figures 7, 8 and 9 correspond to Models 2, 3 and 4, respectively, and the true parameter values used to generate each dataset can be found in Table 1.