Second‐order preserving point process permutations

While random permutations of point processes are useful for generating counterfactuals in bivariate interaction tests, such permutations require that the underlying intensity be separable. In many real-world datasets where clustering or inhibition is present, this assumption does not hold. Here, we introduce a simple combinatorial optimization algorithm that generates second-order preserving (SOP) point process permutations, that is, permutations of the event times such that the L function of the permuted process matches the L function of the data. We apply the algorithm to synthetic data generated by a self-exciting Hawkes process and a self-avoiding point process, along with data on earthquakes and arsons from Los Angeles and data on law enforcement drug seizures and overdoses from Indianapolis. In all cases, we are able to generate a diverse sample of permuted point processes in which the distribution of the L functions closely matches that of the data. We then show how SOP point process permutations can be used in two applications: (1) bivariate Knox tests and (2) data augmentation to improve deep learning-based space-time forecasts.


| INTRODUCTION
In this paper, we are motivated by the problem of testing for interactions between two spatial-temporal point processes. For example, Klauber (1971) introduced a two-sample randomization test for interactions between underground nuclear tests and subsequent earthquakes. Similar methods have been applied to test for interactions between firearm arrests and shootings (Wyant et al., 2012), homicide and other violent or property crimes (Flaxman et al., 2013) and law enforcement drug seizures and overdose (Mohler et al., 2021). In a more general multivariate setting, the interest is in detecting and modelling cross-interactions among any two marginal processes, as shown in Jalilian et al. (2015) and Waagepetersen et al. (2016). Indeed, the literature on modelling multivariate spatial point patterns is mainly restricted to the bivariate case (see Diggle & Milne, 1983; Harkness & Isham, 1983; Grabarnik & Särkkä, 2009; Picard et al., 2009). In this context, interactions have been modelled using Gibbs point processes and Cox processes.
In a two-sample randomization test, the event times of a point process are permuted while the spatial coordinates of the process are fixed. This "permuted point process" provides a counterfactual that can be compared with the second point process for which an interaction is being assessed. However, the permuted point process may not have the same statistics as the original data. For example, in Figure 1 we plot a realization of a self-exciting Hawkes process along with a permutation of the Hawkes process. Whereas the Hawkes process exhibits significant space-time clustering, the permuted process only exhibits spatial clustering. Thus, two-sample randomization tests typically require the assumption that the point process is stationary Poisson or that the intensity of the point process is separable (Bhopal et al., 1992; Diggle, 2013; Schmertmann et al., 2010).
Our goal here is to generate permutations of a point process that preserve second-order statistics of the process. In Section 2, we introduce a simple combinatorial optimization algorithm that, starting from a random permutation, iteratively swaps two random event times to improve the agreement of the second-order statistics (as measured by the L function) between the permuted point process and the data. We then apply the method in Section 3 to synthetic data generated by a self-exciting Hawkes process and a self-avoiding point process. We also use the method to assess interactions between (a) Los Angeles earthquakes and arson and (b) Indianapolis law enforcement drug seizures and overdose. Finally, we use second-order preserving point process permutations to augment training data and improve a CNN-LSTM-based short-term forecast.
We discuss some open questions and directions for future research in Section 4.

| METHODOLOGY
Let {(x_i, t_i)}_{i=1}^N denote the spatial coordinates and event times of a point process, and let t̃_i be a random permutation of the event times. Two common second-order statistics of a spatial point process are Ripley's K function (Ripley, 1976) and the related L function. We consider space-time extensions of the K and L functions, where we first define z_i to be a three-dimensional vector consisting of the two spatial coordinates and one time coordinate of (x_i, t_i) (each coordinate is rescaled by min-max scaling to lie in [0, 1]). The K function is then estimated by

K(r) = (1/N²) Σ_{i ≠ j} 1{‖z_i − z_j‖ < r},

and the L function is given by L(r) = [K(r)]^{1/2}. The K and L functions measure the prevalence of events within a space-time radius r of each event from the process and can be compared with the K and L functions of a Poisson process to determine clustering or inhibition.
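As a concrete illustration, the space-time K and L estimates above can be sketched in a few lines of NumPy. The plain 1/N² pair-counting normalization and the absence of edge correction are simplifying assumptions of this sketch; the convention L(r) = [K(r)]^{1/2} follows the text.

```python
import numpy as np

def l_function(points, radii):
    """Estimate the space-time K function by pair counting and return
    L(r) = [K(r)]^(1/2).  `points` is an (N, 3) array of (x, y, t)
    coordinates, assumed already min-max scaled to [0, 1]."""
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # exclude self-pairs
    k = np.array([(dists < r).sum() / n ** 2 for r in radii])
    return np.sqrt(k)

rng = np.random.default_rng(0)
pts = rng.random((200, 3))           # roughly Poisson sample on the unit cube
radii = np.linspace(0.0, 0.3, 100)   # 100 evenly spaced radii, as in the text
L = l_function(pts, radii)
```

Since pair counts can only grow with the radius, the resulting L function is nondecreasing in r.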
Here we propose an algorithm to generate permutations such that the L function of the permuted point process better matches the L function, L(r), of the data. We first note that we do not want the L function of each permutation to exactly match that of the data. If one were to simulate multiple realizations of a point process, there would be variation in each sample that would lead to a distribution of L functions. Thus, our goal is to generate multiple permuted point processes such that the distribution of their L functions closely matches the distribution of the L function of the data.
The algorithm, which is summarized in Algorithm 1, proceeds in two stages. In the first stage, M independent random permutations {z̃_k}_{k=1}^M of the data are generated and the L function L_k(r) is computed for each permutation. Next, the mean over the M L functions is calculated,

L̄(r) = (1/M) Σ_{k=1}^{M} L_k(r),

along with the error of each permutation,

e_k(r) = L_k(r) − L̄(r).

FIGURE 1 Simulated Hawkes process in black (first spatial coordinate versus time) along with the same Hawkes process with times randomly permuted in red.
In the second stage, we generate M second-order preserving (SOP) permutations. For each k, we initialize the permutation with the random permutation z̃_k from stage one. We then iteratively swap two random times to generate a proposal permutation q̃_k. The L function, L_prop(r), of this proposal permutation is then computed, along with the proposal error,

E_prop = ∫ (L_prop(r) − L(r) − e_k(r))² dr.        (4)

Here we want the L function of the permutation to be close to the L function of the data plus the L function error of random permutation k, so that the variation of the SOP permutations around the data L function will be similar to the variation of the random permutation L functions around their mean. We accept proposal permutations if they reduce the error, and we terminate the iteration when the error is below a tolerance parameter γ. We note that the integral in Equation (4) can be approximated using numerical integration, for example, using the trapezoidal rule (we calculate the L function at 100 equally spaced points and approximate the integral using quadrature at those same points).
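The swap-and-accept search of the second stage can be sketched as follows. The helper `l_est`, the simple Riemann-sum quadrature, and a scalar `offset` standing in for the stage-one error e_k(r) are illustrative assumptions of this sketch, not the authors' exact implementation.

```python
import numpy as np

def l_est(points, radii):
    # Simple pair-counting L estimate (no edge correction; illustrative).
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)
    return np.sqrt([(d < r).sum() / len(points) ** 2 for r in radii])

def sop_permutation(points, radii, data_l, offset, gamma, max_iter, rng):
    """Start from a random permutation of the event times, propose swapping
    two random times, and accept the swap only if it reduces the squared-
    error integral between the permuted L function and the target
    (data L function plus the stage-one error `offset`)."""
    perm = points.copy()
    rng.shuffle(perm[:, 2])                      # random permutation of times
    dr = radii[1] - radii[0]
    err = lambda p: ((l_est(p, radii) - (data_l + offset)) ** 2).sum() * dr
    e = err(perm)
    for _ in range(max_iter):
        if e < gamma:                            # tolerance reached
            break
        i, j = rng.integers(len(perm), size=2)
        prop = perm.copy()
        prop[[i, j], 2] = prop[[j, i], 2]        # swap two event times
        pe = err(prop)
        if pe < e:                               # keep improving swaps only
            perm, e = prop, pe
    return perm, e

rng = np.random.default_rng(2)
data = rng.random((80, 3))
radii = np.linspace(0.0, 0.3, 50)
perm, final_err = sop_permutation(data, radii, l_est(data, radii),
                                  offset=0.0, gamma=1e-4, max_iter=500, rng=rng)
```

By construction the output is a true permutation: the multiset of event times and all spatial coordinates are unchanged.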

| Bivariate Knox test
The motivating application of our second-order preserving permutations is a two-sample Knox test (Klauber, 1971) for interactions between two point processes. In particular, given a time cutoff τ and spatial distance cutoff δ, the Knox statistic (Knox & Bartlett, 1964) κ(τ, δ) is given by

κ(τ, δ) = Σ_{i,j} 1{‖x_i^a − x_j^b‖ < δ} 1{|t_i^a − t_j^b| < τ},

where the two point processes are (x_i^a, t_i^a) and (x_j^b, t_j^b). The Knox statistic counts the number of events of type b within a radius δ and time window τ of events of type a.
To determine excess clustering or inhibition, the Knox statistic can be compared with a null distribution in which the two processes are independent. The null distribution of the Knox statistic is computed through multiple realizations of

κ̃(τ, δ) = Σ_{i,j} 1{‖x_i^a − x_j^b‖ < δ} 1{|t_i^a − t̃_j^b| < τ},

where t̃_j^b are random permutations of the event times of process b. In the next section, we will explore the benefits of using SOP permutations rather than random permutations to generate t̃_j^b.
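The Knox statistic and its permutation null follow directly from the definitions above; in this sketch the symmetric time window |t_i^a − t_j^b| < τ and the add-one p-value estimate are our assumptions.

```python
import numpy as np

def knox(xa, ta, xb, tb, delta, tau):
    """Knox statistic: number of (i, j) pairs with a type-b event within
    spatial distance delta and (absolute) time lag tau of a type-a event."""
    d_space = np.sqrt(((xa[:, None, :] - xb[None, :, :]) ** 2).sum(-1))
    d_time = np.abs(ta[:, None] - tb[None, :])
    return int(((d_space < delta) & (d_time < tau)).sum())

rng = np.random.default_rng(1)
xa, ta = rng.random((50, 2)), rng.random(50)     # type-a events
xb, tb = rng.random((60, 2)), rng.random(60)     # type-b events

k_obs = knox(xa, ta, xb, tb, delta=0.1, tau=0.1)
# Null distribution from random permutations of the type-b times; SOP
# permutations could be passed in exactly the same way.
null = np.array([knox(xa, ta, xb, rng.permutation(tb), 0.1, 0.1)
                 for _ in range(200)])
p_value = (1 + (null >= k_obs).sum()) / (1 + len(null))  # one-sided, add-one
```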

| EXPERIMENTS
In this section, we illustrate the effectiveness of Algorithm 1 at generating second-order preserving permutations using synthetic and real data.
We then compare the results of bivariate Knox tests that utilize SOP permutations versus random permutations. Finally, we show how second-order preserving permutations can be used to improve a CNN-LSTM-based space-time forecast through data augmentation.

| Synthetic data experiment with a Hawkes process
In the first experiment, we generate data from a self-exciting Hawkes point process with intensity

λ(x, t) = μ + θ Σ_{i : t_i < t} f(t − t_i) g(x − x_i).

We simulate the process on the unit cube in space-time with background rate μ = 40, reproduction number θ = 0.75, exponential kernel f in time with parameter ω = 100 (mean 0.01) and Gaussian kernel g in space (2D) with standard deviation σ = 0.01. We estimate the L function on r ∈ [0, 0.3] at 100 discrete (evenly spaced) points and use a tolerance of γ = 0.01 for Algorithm 1. The tolerance was selected by visually inspecting the L functions corresponding to different choices of γ (see Figure 2).
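A standard way to simulate such a process is through its branching (cluster) representation; the sketch below uses the parameters given in the text, while discarding offspring that fall outside the unit window is an assumption of this sketch (the paper does not specify its boundary handling).

```python
import numpy as np

def simulate_hawkes(mu=40, theta=0.75, omega=100.0, sigma=0.01, rng=None):
    """Branching simulation of the self-exciting model: Poisson(mu)
    background events uniform on the unit cube, each event spawning
    Poisson(theta) offspring with exponential time delays (rate omega,
    mean 1/omega) and Gaussian(sigma) spatial displacements."""
    rng = rng or np.random.default_rng()
    n0 = rng.poisson(mu)
    background = np.column_stack([rng.random((n0, 2)), rng.random(n0)])
    events, generation = [background], background
    while len(generation):
        offspring = []
        for x, y, t in generation:
            for _ in range(rng.poisson(theta)):
                tt = t + rng.exponential(1.0 / omega)
                xx, yy = rng.normal([x, y], sigma)
                if tt < 1 and 0 <= xx <= 1 and 0 <= yy <= 1:
                    offspring.append([xx, yy, tt])
        generation = np.array(offspring).reshape(-1, 3)
        events.append(generation)
    pts = np.concatenate(events)
    return pts[np.argsort(pts[:, 2])]            # sort events by time

pts = simulate_hawkes(rng=np.random.default_rng(3))
```

Because θ < 1 the branching process is subcritical and the generation loop terminates almost surely.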
FIGURE 2 L function of data versus L functions of second-order preserving (SOP) permutations for different choices of γ.

In Figure 3, we plot a realization of the Hawkes process (first spatial coordinate versus time) along with an example random permutation. We then plot three example SOP (L function preserving) permutations and the corresponding L function. We see that Algorithm 1 is able to produce diverse point patterns whose second-order statistics match those of the original data.
Next, we apply a bivariate Knox test to two independent Hawkes process realizations generated with the above parameters. For the Knox test, we use parameters δ = 0.1 and τ = 0.1 for the Knox statistic κ(τ, δ). In Figure 4, we plot the null distribution of the Knox statistic under both random permutations and SOP permutations for M = 200 permutations. We note that the random bivariate Knox test rejects the null hypothesis that the two point patterns are independent, whereas the p-value for the SOP-based Knox test is p = 0.1. On the bottom of Figure 4, we also plot the L functions for the 200 permutations. Here we see that the random permutations are much less clustered than the data, whereas the SOP permutations match the L function of the data (with similar variation to the random permutations, by design).
3.2 | Synthetic data experiment with a regular self-avoiding point process

In the next experiment, we generate data from a self-avoiding "regular" point process. We iteratively simulate N points in the unit cube using rejection sampling. The first point is generated at random; each subsequent point is generated at random subject to being at least a distance c (space-time Euclidean distance) from all previous points (otherwise the point is rejected and a new candidate is drawn). We let N = 100, c = 0.2, estimate the L function on r ∈ [0, 0.3] at 100 discrete (evenly spaced) points and use a tolerance of γ = 0.01 for Algorithm 1.
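The rejection scheme just described is straightforward to sketch; the `max_tries` safeguard is our addition, since for large N and c the acceptance region can shrink considerably.

```python
import numpy as np

def simulate_self_avoiding(n, c, rng, max_tries=200000):
    """Rejection sampling of the regular process: candidate points uniform
    on the unit cube are accepted only if at least a space-time Euclidean
    distance c from every previously accepted point."""
    pts = [rng.random(3)]                       # first point is unconditional
    tries = 0
    while len(pts) < n and tries < max_tries:
        tries += 1
        cand = rng.random(3)
        if all(np.linalg.norm(cand - p) >= c for p in pts):
            pts.append(cand)                    # candidate clears all points
    return np.array(pts)

pts = simulate_self_avoiding(n=50, c=0.2, rng=np.random.default_rng(7))
```

Every pair of accepted points is separated by at least c, which is what produces the "regular" (inhibited) second-order structure.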
In Figure 5, we plot a realization of the regular process (first spatial coordinate versus time) along with an example random permutation. We then plot three example SOP (L function preserving) permutations and the corresponding L function. We see that Algorithm 1 is able to produce diverse regular point patterns whose second-order statistics match those of the original data.
Next, we apply a bivariate Knox test to two independent regular process realizations generated with the above parameters. For the Knox test, we use parameters δ = 0.1 and τ = 0.1 for the Knox statistic κ(τ, δ). In Figure 6, we plot the null distribution of the Knox statistic under both random permutations and SOP permutations for M = 200 permutations. We note that in this case, both the random and SOP bivariate Knox tests fail to reject the null hypothesis that the two point patterns are independent (which is the correct result). However, in the bottom of Figure 6, we see that the random permutations are much more clustered than the data, whereas the SOP permutations better match the L function of the data.

3.3 | Association between earthquakes and arson in Los Angeles

Next, we apply a bivariate Knox test to earthquake and arson data from Los Angeles during 2020-2021. The earthquake magnitudes range from 1.5 (query cutoff) to 4.28. The arson events are based on Los Angeles crime reports from the same time period. Given the small magnitude of the earthquakes, we do not expect there to be an association between these two types of events. We plot the distribution of events in space and time in Figure 7.
For the Knox test, we use parameters δ = 1000 m and τ = 30 days for the Knox statistic κ(τ, δ). In Figure 8, we see that the random permutations are less clustered than the data, whereas the L functions of the SOP permutations better match the L function of the data. In Figure 8, we also plot the null distribution of the Knox statistic under both random permutations and SOP permutations for M = 400 permutations. Whereas the random permutation test would reject the null hypothesis of independence at the p = 0.01 (two-sided) level, the SOP-based Knox test fails to reject the null hypothesis at the p = 0.05 level.
If we look again at Figure 7, we see that earthquakes are clustered along fault lines, which are in different areas from housing (where arsons are clustered). In time, earthquake aftershocks are highly clustered in short windows (e.g., a few hours). Thus, it appears that arsons "avoid" earthquakes, hence the rejection of the null hypothesis by the standard two-sample Knox test. However, the standard test assumes that the processes themselves are not clustered, and clearly earthquakes are. The SOP-based Knox test accounts for randomly occurring avoidance between two independent clustered processes, and the test produces a more conservative p-value.

| Association between drug seizures and overdose in Indianapolis
In our last experiment, we assess interactions between law enforcement drug seizure events and overdoses where naloxone was administered by Indianapolis Emergency Medical Services. Data consist of latitude, longitude, date and time of the incident and come from Indianapolis, Indiana, from 1 July 2018 to 31 December 2018. A space-time positive association between law enforcement drug seizures and overdose was observed in Mohler et al. (2021). One hypothesis is that when law enforcement officers make an arrest for drug dealing and seize drugs, users may need to seek out alternative sources and may be at higher risk of overdose in the near future. For the Knox test, we use parameters δ = 250 m and τ = 21 days for the Knox statistic κ(τ, δ) (similar to the values used in Mohler et al. (2021)). In Figure 9, we see similar agreement between the L functions of random permutations, SOP permutations and the L function of the data.
One possible explanation is that the intensity of law enforcement seizures is separable, and thus random permutations generate similar second-order statistics to SOP permutations. We note that the L functions are tightly clustered around the mean, due to the larger number of events in this dataset compared with the previous three examples. In Figure 9, we plot the null distribution of the Knox statistic under both random permutations and SOP permutations for M = 400 permutations. Both tests reject the null independence hypothesis, as none of the M = 400 permutations yield Knox statistics as extreme as the data.
3.5 | Data augmentation for CNN-LSTM 1-day ahead point process forecast

A number of neural network-based models for space-time point processes have been introduced recently (Chen et al., 2020; Jalilian & Mateu, 2023; Mateu & Jalilian, 2022; Wang et al., 2017). These models can improve forecast accuracy over simpler parametric models but typically require larger datasets due to the increased number of parameters. Here we show how SOP permutations can be used to improve model performance on held-out data through augmentation of training data.
Next, we discretize space into 25 × 25 grid cells and time into 1-"day" intervals from 0 to 730. We then use a sliding window to create 14 × 25 × 25 features consisting of a binary indicator in each space-time cell for whether at least one event occurred (y = 1) or no events occurred (y = 0). We use this feature as input to a CNN-LSTM (Shi et al., 2015) to predict whether an event will occur in each grid cell in the following day. First, we train the model with no data augmentation. In Figure 10, we plot the AUC of the model on held-out data over 20 epochs. Recall that AUC stands for the area under the ROC curve and provides an aggregate measure of performance across all possible classification thresholds.
Next, we use SOP permutations to increase the dataset size by a factor of 10. We again use 20 epochs, so in this case on the 0th and 10th epochs the actual data are used to train the model, whereas on the other epochs SOP-permuted data are used. We see in Figure 10 that SOP-augmented training yields an improved AUC on the held-out data.
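The discretization and sliding-window feature construction above can be sketched in NumPy; such tensors (built from the data or from its SOP permutations) would then be fed to the CNN-LSTM. The function name `make_features` and the assumption that event coordinates are already scaled (space in [0, 1], time in [0, 730]) are ours.

```python
import numpy as np

def make_features(points, n_grid=25, n_days=730, window=14):
    """Discretize space into n_grid x n_grid cells and time into 1-"day"
    bins, mark each cell 1 if at least one event occurred, then slide a
    length-`window` window to pair input tensors X with next-day
    occupancy targets y."""
    grid = np.zeros((n_days, n_grid, n_grid), dtype=np.float32)
    ix = np.clip((points[:, 0] * n_grid).astype(int), 0, n_grid - 1)
    iy = np.clip((points[:, 1] * n_grid).astype(int), 0, n_grid - 1)
    it = np.clip(points[:, 2].astype(int), 0, n_days - 1)
    grid[it, ix, iy] = 1.0                      # binary occupancy indicator
    X = np.stack([grid[d:d + window] for d in range(n_days - window)])
    y = grid[window:]                           # day following each window
    return X, y

rng = np.random.default_rng(5)
events = np.column_stack([rng.random((3000, 2)), rng.random(3000) * 730])
X, y = make_features(events)
```

To augment training, the same function would simply be applied to each SOP-permuted copy of the event set.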

| DISCUSSION
In this paper, we introduced an algorithm for generating point process permutations that preserve second-order statistics of the process. We showed that SOP permutations may be useful for bivariate Knox tests when the underlying process intensities are not separable. In two of our experiments, standard bivariate Knox tests suffered from false discovery and rejected the independence null hypothesis, whereas the more conservative SOP-based Knox test did not. In the other two cases, we found that random and SOP-based Knox tests produced similar results.
Another potential application for SOP point process permutations is data augmentation in deep learning models. One challenge in point process research is that a dataset typically consists of a single realization. The algorithm we have introduced here could be used to simulate other realizations, and our results indicate that such realizations may be able to improve the accuracy of deep learning-based space-time models.
There are several limitations of the present article. First, we have no guarantee that Algorithm 1 will converge for a given tolerance. In our experiments, the algorithm was able to find permutations (other than the identity) that satisfied the error objective, but for certain point processes a solution may not exist. Second, we have not analysed the theoretical properties of SOP-based bivariate Knox tests. In our four examples, we observed improved type 1 error without loss of power, but more research is needed to better understand the statistical properties of SOP permutations and associated interaction tests. Future research should also explore data augmentation applications of SOP permutations, as well as more sophisticated combinatorial optimization techniques to speed up convergence.

FIGURE 3 Top left: realization of the Hawkes process (first spatial coordinate versus time); see text for the parameters used. Top right: example random permutation. Lower left: three example second-order preserving (SOP) permutations of the Hawkes process. Lower right: L function of data versus L functions of SOP and random permutations.

FIGURE 4 Two independent Hawkes processes experiment. Top: null distribution of the Knox statistic under both random permutations (left) and second-order preserving (SOP) permutations (right) for M = 200 permutations. Data Knox statistic represented by vertical dashed line. Bottom: L functions for the 200 random (right) and SOP (left) permutations.

FIGURE 5 Top left: realization of the regular process (first spatial coordinate versus time); see text for the parameters. Top right: example random permutation. Lower left: three example second-order preserving (SOP) permutations of the regular process. Lower right: L function of data versus L functions of SOP and random permutations.

FIGURE 6 Two independent regular point processes experiment. Top: null distribution of the Knox statistic under both random permutations (left) and second-order preserving (SOP) permutations (right) for M = 200 permutations. Data Knox statistic represented by vertical dashed line. Bottom: L functions for the 200 random (right) and SOP (left) permutations.

FIGURE 7 Top: distribution of earthquake and arson events in Los Angeles during 2020 and 2021. Bottom: distribution of drug seizure and overdose events in Indianapolis during the second half of 2018.

FIGURE 8 Arson versus earthquakes. Top: null distribution of the Knox statistic under both random permutations (left) and second-order preserving (SOP) permutations (right) for M = 400 permutations. Data Knox statistic represented by vertical dashed line. Bottom: L functions for the 400 random (right) and SOP (left) permutations.
FIGURE 9 Law enforcement drug seizures versus overdose. Top: null distribution of the Knox statistic under both random permutations (left) and second-order preserving (SOP) permutations (right) for M = 400 permutations. Data Knox statistic represented by vertical dashed line. Bottom: L functions for the 400 random (right) and SOP (left) permutations.

The CNN-LSTM consists of three ConvLSTM2D layers (implemented in Keras) followed by batch normalization. The last layer is a Conv3D layer with a sigmoid activation. The model is trained using Adam optimization with a binary cross-entropy loss. The data are split into the first 486 days used for training (after waiting 14 days to create the first set of features) and the last 216 days used for testing.
At present, our recommendation would be to use the SOP-based Knox test in tandem with the standard random Knox test. If the results fail to agree, then one should question whether the assumptions of the standard Knox test hold.