Fast and accurate recurrent event analysis for genome‐wide association studies

Many diseases recur after recovery, for example, recurrences in cancer and infections. However, research is often focused on analysing only time‐to‐first recurrence, thereby ignoring any subsequent recurrences that may occur after the first. Statistical models for the analysis of recurrent events are available, of which the extended Cox proportional hazards frailty model is the current state‐of‐the‐art. However, this model is too statistically complex for computationally efficient application in high‐dimensional data sets, including genome‐wide association studies (GWAS). Here, we develop an application for fast and accurate recurrent event analysis in GWAS, called SPARE (SaddlePoint Approximation for Recurrent Event analysis). In SPARE, every DNA variant is tested for association with recurrence risk using a modified score statistic. A saddlepoint approximation is implemented to achieve statistical accuracy. SPARE controls the Type I error, and its statistical power is similar to existing recurrent event models, yet SPARE is significantly faster. An application of SPARE in a recurrent event GWAS on bladder cancer for 6.2 million DNA variants in 1,443 individuals required less than 15 min, whereas existing recurrent event methods would require several weeks.


| INTRODUCTION
Since the first genome-wide association study (GWAS) was carried out in 2005 (Haines et al., 2005), many statistical methods have been developed to enable computationally efficient analyses on a genome-wide scale. Currently, analysis of GWAS data using logistic and linear regression models require little computational time. However, most methods for GWAS on survival traits, which were only released much more recently, are generally slower.
In 2017, the R package gwasurvivr (Rizvi et al., 2019) became available for GWAS time-to-event analysis using a Cox proportional hazards model. Gwasurvivr applies a fast Cox regression by means of a low convergence threshold, but it is still not scalable to large biobanks such as the UK Biobank. In 2020, Bi et al. (2020) developed SPACox, which is based on a score test and enables a fast and accurate time-to-event analysis for large biobanks, including the analysis of rare variants using a saddlepoint approximation (SPA). Recently, Coxmeg (He & Kulminski, 2020) and GATE (Dey et al., 2022) became available, which enable large-scale mixed-effects Cox regression using a score test to account for population stratification and family structure. Whereas GATE uses a variance ratio approach for estimating variance in the score test and implements a SPA, Coxmeg uses a genetic relationship matrix for estimating score statistic variances.
All existing methods for survival analysis in GWAS focus on time-to-event outcomes, and no methods have been developed for analysis of recurrent event outcomes. Many diseases have a recurring nature, for example recurrent cancers, infections, depressions or sport injuries (Kaster et al., 2021;Smedinga et al., 2017;Ullah et al., 2014). In a recurrent event analysis, all recurrences are included in the statistical analysis, instead of only the first recurrence. This improves the power of detecting associations between the determinant and risk of recurrence compared to time-to-event analysis (Zhong & Cook, 2021). Statistical recurrent event models are available, most of which are extensions of the Cox model; examples of well-known recurrent event models are the Andersen and Gill (AG) model (Andersen & Gill, 1982) and the Prentice, Williams, and Peterson (PWP) model (Prentice et al., 1981), which are available in standard survival software. Recurrent event models such as the AG model or the PWP model can include frailties as random effects to account for correlated recurrence times within subjects. However, frailty models are significantly slower than above-mentioned Cox-type models without a frailty term, and faster alternatives are needed to enable a recurrent event GWAS in a workable time frame.
In this study, we propose our novel method SaddlePoint Approximation for Recurrent Event analysis (SPARE), which uses a test statistic similar to the score statistic to assess the associations between a DNA variant and recurrence risk. Because the test statistic is based on martingale residuals, we only require fitting the null model once, which saves run time. Our method also implements a saddlepoint approximation to achieve statistical accuracy for low-frequency variants. The implementation of SPARE is schematically displayed in Figure 1. Through an extensive simulation study, we demonstrate that SPARE controls the Type I error and that the statistical power of SPARE is comparable to state-of-the-art recurrent event models over a wide range of realistic simulation scenarios. Finally, the applicability of SPARE is demonstrated on real bladder cancer recurrence data.

| MATERIALS AND METHODS
In this section, we first discuss recurrent event models that are typically used to analyse recurrent event data. Second, we define our statistic and describe how they are used in the saddlepoint approximation in SPARE to test F I G U R E 1 Implementation of SPARE in a GWAS. Given a recurrent event model that can be chosen by the user, a null model is fitted to compute martingale residuals and the saddlepoint approximation of martingale residuals. Subequently, for every genetic variant, a test statistic is computed for association between the variant and recurrence risk. In case of a low p value, a saddlepoint approximation is applied. SPARE, SaddlePoint Approximation for Recurrent Event analysis. associations between genetic variants and recurrent events. Finally, the set-up of our simulation study and real data application is discussed, which will demonstrate the performance of SPARE. For simplicity, we mainly focus on SPARE as an alternative for the AG frailty model, since this is the most used and most wellknown recurrent event model.

| Recurrent event analysis
Since recurrent event models include multiple recurrences, there are additional considerations in modeling recurrence risk as compared to the time-to-event Cox model. Kelly and Lim identified different model components of Cox-based recurrent event models (Kelly & Lim, 2000), namely: (1) time scale; (2) risk set; and (3) correction for within-subject correlation. In addition, Kelly and Lim described stratification of the baseline hazard as a model component, however, this component is fixed by choice of risk set. A brief explanation of the model options is included in Table 1. More information on these model components can be found in the paper by Kelly and Lim (2000). Additional recommendations on modeling recurrent events for epidemiological studies can be found in other papers (Amorim & Cai, 2015;Kaster et al., 2021;Yadav et al., 2018).
Notably, SPARE is a suitable alternative for any recurrent event model that can be constructed based on the characteristics in Table 1. Any desired type of recurrent event model based on these characteristics can be selected as null model, which is used to compute the martingale residuals and test statistics.

| The andersen and gill model
We assume that the data consist of n independent samples ,2 , i denote the observed event times of individual i. Index n i denotes the total number of observed events of individual i, the final event time is T A B L E 1 Overview and description of recurrent event model components and the options for the model components, adopted from Kelly and Lim (2000). In the AG frailty model, the recurrence risk of individual i for experiencing the kth recurrence is controlled by λ t X g λ t e ( ; , ) = ( ) .

Model
Here, λ t ( ) 0 is the baseline hazard function for all recurrences. Parameters β and γ in Equation (1) specify the association parameters corresponding to respectively the covariates and genetic variants, and random effect u i denotes the individual-specific frailty for experiencing a recurrence. The frailty term accounts for correlated recurrence times within individuals, and ignoring this correlation may lead to spurious associations.
In this setup, the partial likelihood of the AG model is given by in which r k n = 1( = ) , which equals 1 if the kth event was a recurrence, or 0 in case of a censoring event.
Next to frailty models, variance-corrected recurrent event models are also frequently used in epidemiology. These models do not include frailties, but use a robust variance "sandwich"-estimator for the effect estimates to test for association in the recurrent event models. The robust variance adjusts the variance of the association parameters for the correlated recurrence times within individuals. We will consider both the AG model based on frailty and the AG model based on robust variance in our analyses.
The aforementioned definitions of event times and hazard functions are based on calendar time, as the AG is a calendar time model. However, these definitions can be transformed to correspond to a gap time analysis. In the gap time framework, the time since previous recurrence is analyzed, therefore the time of the previous recurrence is subtracted from the current time in the baseline hazard: λ t X g λ t t e ( ; , ) = ( − ) Elapsed time models have the same formulation of the hazard function as the calendar time model, however, subjects are at risk of all subsequent recurrences, which leads to different estimates in baseline hazard and effect estimates (Kelly & Lim, 2000). It is also possible to use recurrence-specific baseline hazards such as used in the PWP model (Prentice et al., 1981) by stratifying the baseline hazards per recurrence, which are used in case of (semi-) restricted risk sets described in Table 1.
In our definitions, we omit situations in which there is intermittent interval censoring, or situations in which the final event time is equal to the time of censoring.

| Martingale residuals and test statistics
The test statistics in SPARE are based on martingale residuals, which are computed once from a recurrent event null model. For example, for the AG frailty model in Equation (1), the null model is given by The null model should be adapted to the desired model characteristics given in Table 1, so that the martingale residuals match the desired recurrent event analysis. All desired covariates (e.g., age, sex, principal components) should be included in the null model, so that these effects are also taken into account in SPARE. The cumulative hazard function of an individual i is given by and represents the total accumulated risk of experiencing recurrences that an individual has been exposed to up to time t. The total cumulative hazard of individual i is obtained by evaluating the total hazard that the individual has been exposed to, that is, until censoring: This can be interpreted as the total 'expected' number of recurrences for individual i during its exposure.
Martingale residuals are defined as the difference between observed number of events and the cumulative hazard of individual i: Martingale residuals have mean zero, and represent the discrepancy between the observed number of recurrences and expected number of recurrences during the time that an individual was at risk of recurrences.
In SPARE, estimates for λ t ( ) 0 , β and u i in the null model are obtained using the coxme function in R, which is based on a penalized partial likelihood (Therneau et al., 2003). These estimates are subsequently used to compute the cumulative hazard function estimate t X Λ ( , ) i i in Equation (4), the total cumulative hazards estimates Λ i and martingale residuals estimates M i in Equation (5). Note that all estimates have been computed from a null model, which is the same for all genetic variants and only needs to be computed once.
For any genetic variant, the test statistic is used in SPARE to test the association between the genetic variant g and recurrence risk. The empirical variance of S is estimated by Details on the derivation of the variance of S are given in the Supporting Information Methods. Note that the genetic variant is centered to have mean zero in SPARE before computing the test statistic and empirical variance.
As a first step in SPARE, the genetic variant g is tested for association with risk of recurrence using S, which is assumed to be normally distributed with mean zero and variance given in Equation (7) under the null hypothesis.
The test statistic in Equation (6) is similar to a score statistic, which is given by . The test statistic is the same statistic as would have been obtained as estimator for β G in the simple linear regression on martingale residuals: Therefore, a unit increase in the dosage/allele count of g i corresponds to S more recurrences than expected, by definition of the martingale residuals.

| Saddlepoint approximation
The true distribution of S may have a heavier tail compared to the normal distribution, which is usually assumed to be an acceptable approximation under the null hypothesis for generating the primary p value. As a result, hypothesis testing on S can lead to inflated Type I errors. In data with a low number of recurrences, the distribution of M i is skewed (Therneau et al., 1990), which may result in a heavy tail of the true distribution of S. Low-frequency genetic variants g i also cause heavy tails in the distribution of S, which may give rise to inflation of Type I errors. For a better estimation of the true distribution of S under the null hypothesis and therefore more accurate p values, we implemented a saddlepoint approximation, which was also successfully used in SPACox (Bi et al., 2020) and GATE (Dey et al., 2022).
be the moment-generating function of the martingale residuals, that is, with expectation taken over the martingale residuals.
The moment-generating function P M in Equation (9) can empirically be estimated for any value of t: The empirical cumulant-generating function is given by K t P t ( ) = log(ˆ( )).

M M
The cumulant-generating function of S can be computed from cumulant-generating function of the martingale residuals: The saddlepoint approximation of the probability density function (PDF) of S is then given by the cumulative distribution function (CDF) of S is approximated by where Φ denotes the CDF of the standard normal distribution. Instead of a normal distribution for S, the CDF in Equation (15) is used to compute p values in the saddlepoint approximation. Note that Bi et al. adopted a similar scheme as in Equations (9)-(15) in their saddlepoint approximation as used in their method SPACox, however, the authors used as test statistic (Bi et al., 2020). The CGF given in Equation (11) of S′ is then given by and the saddlepoint approximation is changed accordingly. Also note that the martingale residuals of SPACox are based on time to first event only, whereas SPARE includes all recurrences in computation of the martingale residuals.
In our application of the saddlepoint approximation, we used a similar approach as Bi et al. (2020). We computed the cumulant-generating function K t ( ) . The Cauchy distribution is symmetric and ensures a heavy tail with enough points away from zero, while still having most points near its expectation. Linear interpolation was used to estimate for other values of t in the saddlepoint approximation.
Since we only require SPA-corrected p values in the tails of the distribution of S, we do not need to apply the SPA-procedure for genetic variants with test statistics close to the mean, that is, genetic variants with a high p value. Therefore we used a p value threshold of p = 0.01 for the first step in which S was assumed to follow a normal distribution with mean zero and variance given by Equation (7). Only for variants with p value below 0.01, a SPA is applied to obtain a corrected p value. The implementation of SPA increases computational time, since the CDF in Equation (15) needs to be computed for every variant that exceeds the p value threshold. To reduce additional run time of SPARE, it is possible to use more stringent p value thresholds for SPA, for example, p = 0.001 or p = 0.0001.

| Simulations
We compared the Type I error, power, and run time of SPARE and the Andersen and Gill model over a range of simulation scenarios. In simulation scenarios for the Type I error and power, the sample size was fixed at N = 10, 000. For subject i, the time of censoring was generated using a mixture distribution: to mimic a dropout of 50% during the follow-up time.
Recurrences were generated from a non-stratified calendar time model, based on a Weibull distribution: where λ and ν represent the scale-and shape parameters of the Weibull distribution. The hazard function is the same for all recurrences, however individual i is only at risk for the kth recurrence if exactly k − 1 recurrence have preceded (calendar time). In all simulations we set ν = 1.5 and adjusted λ so that the expected number of recurrences matched the simulation scenario. The three-dimensional covariate X is independently normally distributed with mean 0 and variance 1 in each entry, and represents a nongenetic continuous covariate such as age or principal components. The association parameter β = (0.1, 0.2, 0.4) is included to represent a small, modest, and large effect of the covariates on recurrence risk, respectively. Genetic variants g i are independently simulated from a binomial distribution with parameters n = 2 and p equal to the minor allele frequency (MAF), according to the Hardy-Weinberg equilibrium. The frailties u i are normally distributed with mean zero and varying variance to mimic unmeasured effects on recurrence risk, which causes within-subject correlation.

| Model evaluation
We compared the performance of SPARE with the AG model, both with frailty and robust variance. The baseline hazards of the AG (frailty) model and the AG (robust variance) model are respectively given by Martingale residuals were computed based on their null models: We evaluated the statistical performance of SPARE with saddlepoint approximation and without saddlepoint approximation, and both based on frailty and robust variance, leading to four SPARE models: (i) SPARE (frailty), (ii) SPARE (robust variance). (iii) SPARE-noSPA (frailty), and (iv) SPARE-noSPA (robust variance). We also evaluated the performance of two AG models: (i) AG (frailty) and (ii) AG (robust variance). The Cox model was also included in power calculations, to compare the performance of recurrent event models with a time-toevent model in the recurrent event setting. A cut-off p value of 0.01 was used for implementation of the saddlepoint approximation in SPARE, all analyses were based on the additive genetic model. For evaluation of the Type I error, we simulated 1000 recurrent event data sets from a null model including 10 6 independent and unassociated genetic variants per data set, resulting in 10 9 independent tests under the null hypothesis. The Type I error was evaluated at significance levels α = 10 −5 and α = 5 × 10 −8 to study the course of the Type I error of SPARE in the tails of the distribution of the test statistic. The most stringent significance threshold α = 5 × 10 −8 was suggested as genome-wide significance threshold by Risch and Merikangas (Risch & Merikangas, 1996), which is commonly used in GWAS.
To evaluate power, we simulated 1000 recurrent event data sets including one genetic variant, which was associated with hazard ratio (HR) γ according to the model in Equation (18). The power was computed as the fraction of times that the p value of the statistical test exceeded the significance threshold of α = 5 × 10 −8 .
The run time of the different methods was computed for sample sizes N = 1, 1.5, 2.5, 5, 10, 20, and 50k and 10 6 genetic variants. The MAF of these variants was randomly sampled from genotype data of a real GWAS data set (MAF ≥0.05). For AG models with sample size 1-5k, the run time was estimated based on the analysis of 1500 SNPs, for AG models with sample size 10, 20, and 50k, the run time was estimated based on 300 SNPs because of the computational burden. This is justified, since the run time can be linearly extrapolated as it scales linearly with the number of SNPs. The CPU time for 10 6 SNPs was then estimated by multiplying CPU time with 10 6 /(number of analysed SNPs). Three continuous covariates were included in the analysis to mimic a real GWAS.
For application of the AG model in analysis, we used the "REGWAS" R package which was developed by PolyKnomics B.V., commissioned by the Radboud university medical center. This package enables GWASs on recurrent event data but is limited to the Andersen and Gill model. The REGWAS R package is available on https://github.com/PolyKnomics/REGWAS.

| Application to bladder cancer data
Data from the Nijmegen Bladder Cancer Study (NBCS) were used to illustrate the performance of SPARE. The NBCS includes 1443 patients with non-muscle invasive bladder cancer (NMIBC) who experienced a total of 1864 recurrences. All patients were genotyped using OmniExpress-12 and -24 chips and imputed to higher SNP density using 1000 Genomes phase 1 integrated version 3 and Genome of the Netherlands 4 as reference panels. More information on the NBCS and genotyping procedure can be found in Galesloot et al. (2022).
We selected the AG frailty model (Andersen & Gill, 1982) as null model in SPARE. SNPs with MAF <0.05 were excluded because of unadequate power to detect realistic effect estimates in relation to sample size. In total, 6,157,816 SNPs were included in the analysis. Age, sex, and the first 10 principal components were included as covariates in the null model. The additive genetic model was used for testing associations between genetic variants and recurrence risk in SPARE. We also compared the outcome p values of SPARE with the p values obtained from the AG frailty model.

| Type I error
The results of the simulations for assessing the Type I error of SPARE are displayed in Table 2. The AG model was not tested on Type I error, since this is a well-studied model with established asymptotic properties (Andersen & Gill, 1982). Therefore, the statistical inference was assumed to be accurate. Also note that the computation of Type I error for these models would require 1.5 × 10 8 CPU hours, on estimate. SPARE (frailty) controls the Type I error in all simulation scenarios, although a little conservative in the scenario with frailty variance 0.3 and recurrence rate 0.3 for MAF = 0.001, 0.3. The Type I errors for SPARE (robust variance) are slightly conservative when there is a relatively high within-subject correlation (frailty variance = 1). SPARE-noSPA (frailty) and SPARE-noSPA (robust variance) control the Type I error for MAF 0.3, but have inflated Type I errors for all simulation scenarios with MAFs 0.05, 0.01, and 0.001.

| Power
The empirical power of SPARE and the AG model are displayed in Figure 2. We excluded SPARE-noSPA from power analysis, since we observed that SPARE-noSPA has inflated Type I errors for MAF 0.05 and 0.01, which makes power comparisons unfair.
In all simulation scenarios, the statistical power of SPARE (frailty) and AG (frailty) model is similar, except for a slightly lower power of SPARE (frailty) for the scenario with frailty variance = 0.3, recurrence rate = 0.3 and MAF = 0.01 or 0.001. Methods based on robust variance have a decreased power for frailty variance = 1, that is, a high within-subject correlation, except for scenario with MAF = 0.001 in which AG (robust variance) has most power.
Notably, all methods based on recurrent event analysis have a significant gain in power as compared to the time-to-event Cox model. The power gain increases with increasing recurrence rate and decreasing within-subject correlation, that is, a decreasing frailty variance in simulations.

| Run time
For every method, the run times in CPU hours, are displayed in Figure 3. The run time includes time to read genotype files. SPARE (frailty) was 863 to 374, 356 times faster than the AG (frailty) model and 148 to 71, 668 times faster than the AG (robust variance) model, depending on the sample size and the event rate. SPARE scales better with sample size compared to the AG model as the complexity of Cox-type models is quadratic in sample size (Kamphorst et al., 2022), whereas the run time of SPARE scales linearly with sample size. The run time of the AG models is longer for recurrence rate 1.5 versus 0.3, whereas the run time of SPARE is similar for these recurrence rates since SPARE only requires fitting a null model once.
The run times of SPARE (frailty) and SPARE (robust variance) are nearly the same, because they both require fitting a null model only once and apply a SPA in their analysis. There is no difference in the run time of the computation of test statistics and application of the saddlepoint approximation, and the differences in the run time of the null model are neglicible. Implementation of the saddlepoint approximation in SPARE reduced the speed by 1.2 − 1.5-fold, however, this loss of speed can be partly avoided by lowering the p value threshold for use of the saddlepoint approximation (now set at p = 0.01) at the cost of lower accuracy.
The run time calculations were based on SNPs with MAF ≥0.05, for which the SPA is of less importance because the Type I error is well controlled. Nevertheless, we expect a similar run time comparison between SPARE and SPARE-noSPA if we would have tested SNPs with MAF <0.05, because the SPA was applied for p < 0.01. The fraction of SNPs with p < 0.01 is similar for common SNPs and low-frequency SNPs, because the distribution of test statistics of common-or low-frequency SNPs is well approximated by a normal distribution for test statistics with p > 0.01.
3.4 | Application to bladder cancer data SPARE (frailty) identified a genome-wide significant association between rs11649688:C>T and recurrence risk in NMIBC (p = 4.9 × 10 −8 ), which was not previously reported in genetic association studies on recurrence risk in NMIBC. rs11649688 is located upstream gene RBFOX1, in the 16p13.3 locus. Loss of the RBFOX1 gene was previously associated with risk of recurrence in colon cancer (Mampaey et al., 2015).
A Manhattan plot of the GWAS is given in Figure 4; the genomic inflation factor is λ = 1.0092. The total analysis, including the loading of genotype data (6,157,816 SNPs, N = 1443), took 14 min and 10 s, parallelized over 32 cores on AMD EPYC 7H12 processors.
Next, we analyzed the NBCS data set with the AG frailty model, which required 8 h and 20 min parallelized T A B L E 2 Empirical Type I errors for significance thresholds α = 10 −5 and α = 5 × 10 −8 in four different models in SPARE, stratified by the simulation parameters frailty variance, MAF, and recurrence rate.  F I G U R E 2 Power to detect association at α = 5 × 10 −8 , stratified by the simulation parameters frailty variance, MAF and recurrence rate. The x-axis displays the true association parameter γ of a SNP, the y-axis displays the empirical power to detect association at α = 5 × 10 −8 . Sample size N = 10, 000 was used.
over 12 nodes with 128 cores on AMD Rome 7H12 processors, which is equivalent to 16.7 days on 32 cores. This is 1694 times slower than SPARE, and demonstrates that SPARE offers a significant gain in computational time.
The outcome p values of the GWAS analysis of the NBCS data set based on SPARE and AG frailty are plotted against each other in Figure 5. The association that was detected by SPARE between rs11649688 and recurrence risk in NMIBC was not statistically significant in the AG model (log(HR) = −0.29, SE = 0.056, p = 1.8 × 10 −7 ). We observed that p values in SPARE are generally lower than in the AG frailty model. The QQ-plots in Figure 5 suggests that SPARE controls the Type I error in the GWAS on NMIBC recurrence, while the AG model shows a slight deflation of p values.
Finally, we assessed the association between rs11649688 and risk of recurrence in NMIBC in a timeto-event analysis using SPACox (Bi et al., 2020). We observed a strong association between rs11649688 and risk of first recurrence (p = 2.4 × 10 −6 ), but this association was not genome-wide statistically significant, in contrast to the result of the recurrent event analysis.

| SPARE and recurrent event analysis
So far, we only compared the performance of SPARE with the AG model. However, SPARE is a suitable alternative for any recurrent event model based on the characteristics described in Table 1.  (Kelly & Lim, 2000;Prentice et al., 1981). We simulated recurrent event data and genotype data as in Section 2.5, but used the PWP-CT, PWP-GT, and GT-UR models as null model for computing martingale residuals. SPARE (frailty) based on these null models also controls the Type I error. The Type I errors for SPARE (robust variance) are conservative for the GT-UR model, and SPARE-noSPA has inflated Type I errors for MAF 0.001, 0.01, and 0.05, similarly as in Table 2. Also, the statistical power of SPARE based on PWP-CT, PWP-GT, and GT-UR frailty null models are similar to the full PWP-CT, PWP-GT, and GT-UR frailty models, respectively (see Supporting Information Figures).

| DISCUSSION
In this paper, we presented SPARE as a fast and accurate method for recurrent event analysis in GWAS. SPARE is significantly faster than existing recurrent event models, yet achieves a similar statistical power, while adequately controlling the Type I error. Because of the freedom to select the model components for computing martingale residuals, SPARE is a suitable alternative for any Coxtype recurrent event model. SPARE is the first statistical method to enable a recurrent event analysis for GWAS in a workable time frame.
Currently, there is a need for genetic analysis of recurrent events as these data will be increasingly available in the future, for example, via linkage to electronic health records. This holds both for moderate sample sizes, such as cohorts of cancer patients, or biobank-scale sample sizes, such as the UK biobank. While many phenotypes have a recurring nature, they are still often analyzed using time-to-event methods, which may drastically reduce statistical power as we observed in our power calculations. SPARE will enable future studies to more efficiently identify associations between genetic variants and risk of recurrent events.
In SPARE, a type of score statistic is used to test associations between genetic variants and risk of recurrence. Recently developed methods for time-to-event analysis in GWAS are also based on a score test in association analysis (Bi et al., 2020;Dey et al., 2022;He & Kulminski, 2020), since it is much faster than alternative tests in survival analysis, such as the Wald test. SPARE can be viewed as a modified score test, in which a scaled test statistic and a simplified variance of the score statistic is used. The test statistic S used in SPARE can be interpreted as the regression coefficient in a simple linear regression on martingale residuals with the genetic variant as independent variable: a unit increase of the dosage/allele count corresponds to S additional recurrences on average.
Recurrent event analysis is different from time-toevent analysis, as multiple recurrences that an individual might experience are included in the analysis instead of only the first event. Because of this, recurrent event analysis faces additional modeling considerations, such as stratification of baseline hazard, time scale, and correction for correlated event times (Kelly & Lim, 2000). An important strength of SPARE is that any desired specification of recurrent event model options can be incorporated. Notably, in our current work, we considered a simplified scenario for recurrent event analysis in which no intermittent censoring or time-varying variables were present, and in which the time of censoring was later than the time of the final recurrence. The null model used in SPARE can, however, also account for these model options, for example, it may be desirable to include the number of prior recurrences as time-varying variable into the model.
A possible shortcoming of SPARE is that an association between a genetic variant and a recurrent event outcome can be accounted for in the frailty term of the null model, as part of the unexplained recurrence risk. This way, the frailty term already captures part of the genetic effect, and the martingale residuals may thus already partly be adjusted for this genetic effect. This leads to a reduced association between the genetic variant and martingale residuals in SPARE, with the risk that a genetic variant will not reach the significance threshold and will be missed. This problem can be avoided by fitting a full recurrent event frailty model such as the AG frailty model for variants that are nearly below the signficance threshold. Finally, SPARE might not be suitable for the detection of interactions between genetic variants and a covariate, since the null model already should contain the covariate effect. This might be avoided by including the covariate in SPARE together with the genetic variant and the interaction term, but we did not verify this in simulations.
In conclusion, SPARE enables a fast and accurate recurrent event analysis in GWAS. We recommend using SPARE based on a frailty null model, as this adequately controls the type I error while maintaining statistical power at the cost of only a slight reduction in run time as compared to robust variance. We recommend using p value threshold of p = 0.01 for implementation of SPA, as this yields accurate results while maintaining a fast run time.
We implemented SPARE in the R package "SPARE," which is available on https://github.com/JasperHof/ SPARE. The software enables recurrent event analysis for. bed and. bgen format, and allows the user to modify relevant parameters used in analysis such as p value threshold for SPA and type of null model. SPARE relies on existing R packages for fitting the null model, which are sufficiently fast for large biobank scale data. A guide on how to install and use SPARE is available on https:// github.com/JasperHof/SPARE.

ACKNOWLEDGMENTS
This work was sponsored by NWO-Domain Science for the use of supercomputer facilities. This article is funded by Radboud Institute for Health Sciences.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in GitHub at https://github.com/ JasperHof/SPARE.