##### Lan–Demets group sequential approach using error spending

The first method we consider is a general group sequential method, developed by Lan and Demets[10] and used mainly in randomized clinical trials, that takes an error spending approach. An error spending approach uses the concept of cumulative alpha, or type I error, *α*(*t*), defined as the cumulative amount of type I error spent at analysis *t* and all previous analyses, 1,…,*t*-1. We assume that 0 < *α*(1) ≤ ∙∙∙ ≤ *α*(*T*) = *α*, where *α* is the overall type I error to be spent across the evaluation period. The function *α*(*t*) can be any monotonically increasing function that preserves family-wise error, but there are several common choices, including the Pocock[11] boundary function *α*(*t*) = log(1 + (exp(1)-1)*t*/*T*)*α*, the O'Brien-Fleming[12] boundary function *α*(*t*) = 2(1 - Φ(*z*_{α/2}/(*t*/*T*)^{1/2})), where Φ is the standard normal distribution function and *z*_{α/2} is its upper *α*/2 quantile, and the general power boundary function *α*(*t*) = (*t*/*T*)^{p}*α* for *p* > 0. The most commonly used boundary function for safety evaluations has been a flat, Pocock-like boundary on a standardized test statistic scale. This boundary spends *α* approximately evenly across analyses, given that the test statistic is asymptotically normally distributed. Relative to the amount of statistical information, or sample size, observed up to time *t*, it therefore spends more *α* at earlier analyses than an O'Brien-Fleming boundary, which is commonly used in efficacy studies. This flat boundary has been described as Pocock-like, although a Pocock boundary is not completely flat when testing is frequent (quarterly or more often). For further discussion of boundary shapes and the statistical trade-offs between them in practice for postmarket surveillance, see Nelson *et al*.[13]
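As a rough illustration of how these cumulative spending functions behave, the following sketch evaluates each one at equally spaced analyses. The function names are hypothetical, and the O'Brien-Fleming-type function uses the one-sided Lan–Demets form 2(1 - Φ(*z*_{α/2}/(*t*/*T*)^{1/2})), which is an assumption about the intended formula.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def pocock_spend(t, T, alpha):
    """Pocock-type cumulative spending: alpha * log(1 + (e - 1) * t / T)."""
    return alpha * math.log(1.0 + (math.e - 1.0) * t / T)


def obrien_fleming_spend(t, T, alpha):
    """O'Brien-Fleming-type cumulative spending (assumed one-sided form)."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 quantile
    return 2.0 * (1.0 - _STD_NORMAL.cdf(z / math.sqrt(t / T)))


def power_spend(t, T, alpha, p=1.0):
    """General power cumulative spending: alpha * (t / T) ** p, with p > 0."""
    return alpha * (t / T) ** p


def increments(spend, T, alpha):
    """Type I error newly spent at each analysis 1..T for a spending function."""
    cumulative = [spend(t, T, alpha) for t in range(1, T + 1)]
    previous = [0.0] + cumulative[:-1]
    return [now - before for now, before in zip(cumulative, previous)]
```

Calling `increments(pocock_spend, 4, 0.05)` and `increments(obrien_fleming_spend, 4, 0.05)` side by side shows the contrast described above: the Pocock-type function spends a sizeable share of *α* at the first analysis, whereas the O'Brien-Fleming-type function spends almost none until late.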

Given the error spending boundary function, Lan and Demets developed an asymptotic conditional sequential monitoring boundary for any asymptotically normal test statistic based on independent increments of data.[10] This boundary can be computed and compared with almost any standardized test statistic, including one that controls for confounding. For example, when interest is in an adjusted RR, or log RR, it can be estimated using Poisson regression, and a standardized test statistic, *Zval(t)*, can be calculated. The value of *Zval(t)* can then be compared with the asymptotic conditional monitoring boundary developed by Lan and Demets,[10] resulting in a decision to stop if *Zval(t)* exceeds the monitoring boundary or to continue collecting additional data. This is an appealing approach because the boundary is very simple to calculate and relies on a well-defined asymptotic distribution. However, in practice, with rare events and frequent testing (a small amount of new information between analyses), the asymptotic properties of the boundary can fail to hold. This is similar to the scenario where an exact test may be preferred to an asymptotically normal test when the sample size is small. The following methods have sought to address the shortcomings of this approach to allow for more precise statistical performance in a wider variety of settings.
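To make the link between a spending function and a monitoring boundary concrete, the sketch below approximates a one-sided boundary by Monte Carlo rather than by the numerical integration normally used for the Lan–Demets boundary. It assumes equal information increments between analyses and a standard normal test statistic with independent increments under the null; the function names are hypothetical.

```python
import math
import random


def monte_carlo_boundary(cum_alpha, n_sim=20000, seed=1):
    """Approximate one-sided critical values for Z at each analysis.

    cum_alpha: cumulative type I error alpha(1) <= ... <= alpha(T).
    Assumes equal information increments, so Z(t) = S(t) / sqrt(t), where
    S(t) is a running sum of independent N(0, 1) increments.
    """
    rng = random.Random(seed)
    sums = [0.0] * n_sim
    alive = list(range(n_sim))  # simulated paths that have not yet crossed
    boundary = []
    spent = 0.0
    for look, target in enumerate(cum_alpha, start=1):
        for i in alive:
            sums[i] += rng.gauss(0.0, 1.0)
        # Number of surviving paths allowed to cross at this analysis so that
        # cumulative crossings match alpha(t).
        k = round((target - spent) * n_sim)
        if k <= 0 or k > len(alive):
            boundary.append(math.inf)  # nothing to spend at this analysis
            continue
        ranked = sorted(alive, key=lambda i: sums[i], reverse=True)
        boundary.append(sums[ranked[k - 1]] / math.sqrt(look))
        alive = ranked[k:]  # paths at or above the boundary stop here
        spent += k / n_sim
    return boundary


def should_stop(z_value, look, boundary):
    """Stop if the observed standardized statistic reaches the boundary."""
    return z_value >= boundary[look - 1]
```

For example, `monte_carlo_boundary([0.0125, 0.025, 0.0375, 0.05])` returns four critical values, and `should_stop(zval, look, boundary)` applies the stopping rule at a given analysis. With rare events the normal approximation behind this simulation is exactly what may break down, so the sketch illustrates the mechanics only.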

##### Group sequential likelihood ratio test

The group sequential likelihood ratio test (LRT) approach is a method that has been used in the Vaccine Safety Datalink project to monitor vaccine safety for a single time vaccine exposure.[3, 6, 7, 14] The approach uses exposure matching with a fixed matching ratio (1:M) to control for confounding and then computes an LRT statistic. The most commonly used method is the binomial maxSPRT,[14] which assumes continuous monitoring (i.e., after each matched set of exposed and unexposed individuals comes into the dataset, the test statistic is compared with the monitoring boundary).

Specifically, for the maxSPRT method, one creates matched exposure strata, *s* (*s* = 1,…,*S*), such that each exposed individual, with *D*_{s1} = 1, is matched to one or more unexposed individuals (*D*_{s2} = 0,…,*D*_{s(M+1)} = 0) who have the same values of the categorical confounders. Then, the log LRT statistic at each analysis, *t*, is the following:

$$\mathrm{LLR}(t) = Y_{D=1}(t)\,\log\!\left[\frac{Y_{D=1}(t)/Y(t)}{1/(M+1)}\right] + Y_{D=0}(t)\,\log\!\left[\frac{Y_{D=0}(t)/Y(t)}{M/(M+1)}\right],$$

where *Y*_{D = 1}(*t*) and *Y*_{D = 0}(*t*) are the number of events observed among those exposed and unexposed to the MPI up to time *t*, respectively, and *Y*(*t*) = *Y*_{D = 1}(*t*) + *Y*_{D = 0}(*t*) is the total number of events up to time *t*. Note that *S(t)* is the number of strata up to time *t*, which also is the number of exposed participants because we are assuming a fixed matching ratio of 1:M. This particular LRT, which conditions on the total number of events, *Y*(*t*), is designed for the rare event case in which only one event is expected to be observed per exposure stratum. One can think of this LRT as comparing the observed proportion of exposed (and unexposed) events out of the total number of events to the expected proportion under the null, which is just 1/(M + 1) for the exposed participants and M/(M + 1) for the unexposed participants.
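A minimal sketch of this conditional statistic follows. It compares the observed exposed and unexposed proportions of events with the null proportions 1/(M + 1) and M/(M + 1). Setting the statistic to zero when the exposed proportion does not exceed its null value is a one-sided convention assumed here, and the function name is hypothetical.

```python
import math


def binomial_llr(y_exposed, y_unexposed, M):
    """Log likelihood ratio conditional on the total number of events.

    y_exposed, y_unexposed: cumulative event counts up to the current analysis.
    M: fixed number of unexposed matches per exposed individual.
    """
    y_total = y_exposed + y_unexposed
    if y_total == 0:
        return 0.0
    p_null = 1.0 / (M + 1)        # expected exposed share of events under H0
    p_hat = y_exposed / y_total   # observed exposed share of events
    if p_hat <= p_null:
        return 0.0                # assumed one-sided convention: no excess risk
    llr = y_exposed * math.log(p_hat / p_null)
    if y_unexposed > 0:           # treat 0 * log(0) as 0
        llr += y_unexposed * math.log((1.0 - p_hat) / (1.0 - p_null))
    return llr
```

With 1:1 matching, eight exposed and two unexposed events give `binomial_llr(8, 2, 1)`, which is 8 log(1.6) + 2 log(0.4), or about 1.93.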

However, when events are not extremely rare, or when the probability within a stratum of more than one event occurring is not small, the assumptions of this LRT are violated, and a more general two-sample binomial likelihood ratio test statistic should be used:

$$\mathrm{LLR}(t) = \sum_{d \in \{0,1\}} \left\{ Y_{D=d}(t)\,\log\!\left[\frac{Y_{D=d}(t)/N_{D=d}(t)}{Y(t)/N(t)}\right] + \left[N_{D=d}(t) - Y_{D=d}(t)\right]\log\!\left[\frac{1 - Y_{D=d}(t)/N_{D=d}(t)}{1 - Y(t)/N(t)}\right] \right\},$$

where *N*_{D = 1}(*t*) and *N*_{D = 0}(*t*) are the number of people exposed and unexposed to the medical product up to time *t*, respectively, and *N*(*t*) = *N*_{D = 1}(*t*) + *N*_{D = 0}(*t*) is the total sample size up to time *t*. Note that this general LRT incorporates the total sample size, unlike the binomial maxSPRT LRT, which is conditional on the total number of events. For rare events, the performance of each LRT is similar. Further evaluation needs to be conducted to establish the scenarios in which each LRT has better statistical properties.
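The general statistic can be sketched as the difference between the binomial log-likelihood evaluated at the two group-specific event proportions and at the pooled proportion. This is the textbook two-sample binomial form and is assumed here to match the statistic intended in the text; the function name is hypothetical, and no one-sided truncation is applied.

```python
import math


def two_sample_binomial_llr(y_exposed, n_exposed, y_unexposed, n_unexposed):
    """Two-sample binomial log likelihood ratio (requires both n > 0)."""

    def loglik(y, n, p):
        # Binomial log-likelihood kernel, treating 0 * log(0) as 0.
        value = 0.0
        if y > 0:
            value += y * math.log(p)
        if n - y > 0:
            value += (n - y) * math.log(1.0 - p)
        return value

    pooled = (y_exposed + y_unexposed) / (n_exposed + n_unexposed)
    alternative = (loglik(y_exposed, n_exposed, y_exposed / n_exposed)
                   + loglik(y_unexposed, n_unexposed, y_unexposed / n_unexposed))
    null = (loglik(y_exposed, n_exposed, pooled)
            + loglik(y_unexposed, n_unexposed, pooled))
    return alternative - null
```

Unlike the conditional version, this function needs the group sizes as well as the event counts, which reflects the point above that the general LRT incorporates the total sample size.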

For the binomial maxSPRT, a Pocock-like boundary has been proposed, c(*t*) = *a*, which is a flat boundary on the log LRT statistic. One common way to solve for the constant, *a*, uses an iterative simulation approach similar to the following:

- Step 1: Simulate data assuming H_{o} and the observed event rate while controlling for confounding (i.e., using a permutation approach: fix *Y*_{s1},…,*Y*_{sM} (*s =* 1,…,S), and permute *D*_{s1},…,*D*_{sM} to create *D*_{s1*},…,*D*_{sM*} so that you hold the exposure strata relationships and thus control for confounding).
- Step 2: Calculate LLR(*t*) on the simulated dataset.
- Step 3: If LLR(*t*) ≥ *a* then *Signal*_{k} = 1 and stop loop; otherwise, continue to next *t* + 1.
- Step 4: If *t* = T, then *Signal*_{k} = 0.

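This process is repeated a large number, *Nsim*, of times, and the estimated *α* level for the boundary is calculated as the proportion of simulated datasets that signaled, $\hat{\alpha} = \sum_{k=1}^{\mathit{Nsim}} \mathit{Signal}_k / \mathit{Nsim}$. One solves for *a* by repeating the simulation and changing *a* until $\hat{\alpha} = \alpha$.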
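The iterative search for the constant *a* can be sketched as follows. This is a simplified illustration, not the permutation scheme of Step 1: it simulates under the null by assigning each event to the exposed group with probability 1/(M + 1), which is the conditional null behind the binomial statistic. The function names, the fixed number of events per analysis, and the bisection search are all assumptions made for the example.

```python
import math
import random


def llr(y1, y0, M):
    """Conditional binomial log likelihood ratio, one-sided (assumed)."""
    y = y1 + y0
    p_null = 1.0 / (M + 1)
    if y == 0 or y1 / y <= p_null:
        return 0.0
    value = y1 * math.log((y1 / y) / p_null)
    if y0 > 0:
        value += y0 * math.log((y0 / y) / (1.0 - p_null))
    return value


def simulate_max_llr(events_per_look, M, n_sim=2000, seed=1):
    """Maximum LLR over all analyses for each dataset simulated under H0."""
    rng = random.Random(seed)
    p_null = 1.0 / (M + 1)
    maxima = []
    for _ in range(n_sim):
        y1 = y0 = 0
        best = 0.0
        for n_events in events_per_look:
            for _ in range(n_events):
                if rng.random() < p_null:
                    y1 += 1
                else:
                    y0 += 1
            best = max(best, llr(y1, y0, M))
        maxima.append(best)
    return maxima


def estimate_alpha(a, maxima):
    """Share of simulated datasets that signal, i.e. LLR(t) >= a at some t."""
    return sum(m >= a for m in maxima) / len(maxima)


def solve_a(maxima, alpha, tol=1e-4):
    """Bisection for a flat boundary whose estimated alpha is at most alpha."""
    low, high = 0.0, max(maxima) + 1.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if estimate_alpha(mid, maxima) > alpha:
            low = mid
        else:
            high = mid
    return high
```

A dataset signals at the first analysis where LLR(*t*) ≥ *a* exactly when its maximum LLR over analyses is at least *a*, so storing the maxima lets the same simulated datasets be reused for every candidate *a*. Because the statistic is discrete, the estimated level returned by `estimate_alpha(solve_a(maxima, 0.05), maxima)` is typically slightly below the target rather than equal to it.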

This approach is a special case of the general unifying boundary approach developed by Kittelson *et al*.[15] To allow for the more general approach, define *c(t) = au(t)* where *u(t)* is a function dependent upon the proportion of statistical information (e.g., sample size) up to time *t* and is of the form *u(t)* = (*N(T)/N(t)*)^{1-2Δ} where Δ > 0 is a fixed parameter depending upon the design (e.g., *u(t)* = 1 is Pocock, and *u(t)* = (*N(T)/N(t)*)^{0.5} is O'Brien and Fleming). The same approach is used to solve iteratively for *a*, but the boundary *c(t)* will now be shaped differently depending upon *u(t)*. We have named this more flexible version of the binomial maxSPRT the group sequential LRT (GS LRT). This additional flexibility allows the method to be applied more generally, for example, within the Mini-Sentinel pilot, where data are not available as often (potentially quarterly). Furthermore, the shape of the boundary can be changed to reflect the desired trade-offs appropriate to the specific safety question of interest. Because the original binomial maxSPRT used a unifying boundary type approach, we have presented it as such here, but as has been shown by others,[16] the error spending approach and unifying approach are complementary, and therefore, we could have chosen an error spending approach.
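The boundary shape *c(t) = au(t)* is simple to tabulate once the information at each analysis is known. The sketch below takes the form of *u(t)* exactly as written in the text; the function name and the example sample sizes are hypothetical.

```python
def unifying_boundary(a, information, delta):
    """Boundary c(t) = a * (N(T) / N(t)) ** (1 - 2 * delta) at each analysis.

    a: constant solved for by simulation.
    information: cumulative information N(1), ..., N(T), e.g. sample sizes.
    delta: shape parameter; delta = 0.5 gives a flat, Pocock-like boundary.
    """
    total = information[-1]
    return [a * (total / n) ** (1.0 - 2.0 * delta) for n in information]
```

For instance, `unifying_boundary(3.0, [100, 200, 400], 0.5)` is flat at 3.0, whereas `unifying_boundary(3.0, [100, 200, 400], 0.25)` starts at 6.0 and falls to 3.0 at the final analysis, making early signals harder to trigger.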

A potential limitation of the GS LRT method is the fixed matching ratio. In practice, if there is a need to implement a strict matching criterion, because of the need for strong confounding control, then it can be difficult to find *M* unexposed matches for each exposed participant especially in the scenario of frequent monitoring. Frequent monitoring typically implies that an exposed participant should be matched to *M* unexposed participants within the current analysis time frame. This can lead to loss of matched strata including strata with events. When strata are lost, the results are then only generalizable to the subpopulation of the exposed population for which a matching control was found. Often, the matching criterion is then loosened, leading to less confounding control but a larger matched cohort.

##### Conditional sequential sampling procedure

The conditional sequential sampling procedure (CSSP)[17] was specifically developed to handle chronically used exposures, such as drugs that are taken over a period. However, the approach also is able to accommodate a single time exposure such as a vaccine. This method handles confounding using stratification and assumes that the data are aggregated.

Specifically, using categorical confounders, one stratifies the entire population under evaluation (unlike GS LRT, which uses a matched sample). Then, at each analysis, *t*, within each confounder stratum, *k* (*k* = 1,…,*K*), one calculates the exposure time, *E*_{D, k}(*t*), and number of events, *Y*_{D, k}(*t*), among all participants in stratum *k* on medical product *D* (*D* = 0 (unexposed) or *D* = 1 (exposed)) since the previous analysis, *t*-1. Under H_{o}, no relationship between exposure to the MPI and the outcome conditional on strata, the conditional distribution of *Y*_{D = 1, k}(*t*)|*Y*_{D = 1, k}(*t*) + *Y*_{D = 0, k}(*t*) is

$$Y_{D=1,k}(t) \mid Y_{D=1,k}(t) + Y_{D=0,k}(t) \sim \mathrm{Binomial}\!\left(Y_{D=1,k}(t) + Y_{D=0,k}(t),\ \frac{E_{D=1,k}(t)}{E_{D=1,k}(t) + E_{D=0,k}(t)}\right),$$

which is based on the proportion of exposure time observed for those exposed compared with the total exposure time including exposed and unexposed. Using this stratum-specific conditional distribution, one can simulate the distribution of *Y*_{D = 1, k}(*t*), the number of outcomes among those on the MPI within each stratum under H_{o}, given *Y*_{D = 1, k}(*t*) + *Y*_{D = 0, k}(*t*).
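A single simulated draw of the exposed event count for one analysis might look like the sketch below. It assumes the conditional distribution is binomial, with the number of trials equal to the total events in the stratum and success probability equal to the exposed share of exposure time. The tuple layout and function name are hypothetical.

```python
import random


def simulate_exposed_events(strata, rng):
    """Simulate the total exposed event count for one analysis under H0.

    strata: list of (y_exposed, y_unexposed, e_exposed, e_unexposed) tuples,
            one per confounder stratum, for data accrued since the last look.
    rng: a random.Random instance.
    """
    total = 0
    for y_exposed, y_unexposed, e_exposed, e_unexposed in strata:
        n_events = y_exposed + y_unexposed
        exposure = e_exposed + e_unexposed
        if n_events == 0 or exposure == 0:
            continue  # stratum carries no information at this analysis
        p_exposed = e_exposed / exposure  # exposed share of exposure time
        # Binomial draw: each event falls in the exposed group w.p. p_exposed.
        total += sum(rng.random() < p_exposed for _ in range(n_events))
    return total
```

Only the total number of events and the exposure times enter the draw, not the observed split between exposed and unexposed, which is what conditioning on the stratum totals means in practice.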

The test statistic of interest is then the total number of adverse events observed among those exposed up to time *t* across all strata, $Y_{D=1}(t) = \sum_{j=1}^{t}\sum_{k=1}^{K} Y_{D=1,k}(j)$. The CSSP approach uses an error spending approach in combination with the conditional stratum-specific distributions to create the sequential monitoring boundary. Specifically, it uses the following iterative simulation approach:

- Step 1: Create a single realization of the dataset of exposed counts under H_{o} for analysis *t*, *t* = 1,…,*T*, as follows:
  - For all confounder strata *k*, simulate the number of exposed events from the stratum-specific conditional distribution if the stratum has at least one observed event at analysis *t*; else set the simulated count to 0.
  - Calculate the sum of the simulated counts across strata (total number of simulated exposed events at analysis *t*).
- Step 2: Repeat Step 1 for a large number of realizations, *Nsim*, to create a distribution of the total number of exposed events at each analysis.
- Step 3: Order the simulated totals from smallest to largest, and if the observed total exceeds the cutoff determined by the cumulative error *α*(*t*), then signal at analysis *t*; else continue.
- Step 4: Set the simulated event counts that would have signaled at this analysis to an extreme value, such as 1000, so that these realizations will be indicated as having passed the boundary. This allows for a cumulative error spending calculation that incorporates stopping. Otherwise, keep the simulated counts from Step 1 and repeat from Step 1 at the next analysis, *t* + 1.

Using this simulation approach explicitly incorporates the sequential monitoring stopping rules. Any form of the cumulative error spending function, *α*(*t*), can be assumed as discussed in the section on the Lan–Demets Group Sequential approach using error spending.
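The simulation loop can be sketched end to end as below. This is an illustrative reading of the steps, not the published implementation: it assumes a binomial conditional distribution within strata, uses a large sentinel in place of the extreme value, and defines the cutoff at each analysis as the smallest simulated total for which the share of realizations at or above it does not exceed *α*(*t*). All names and the tuple layout are hypothetical.

```python
import math
import random

EXTREME = 10 ** 9  # sentinel marking realizations that have already signaled


def draw_exposed(strata, rng):
    """Binomial draw of exposed events per stratum, summed over strata."""
    total = 0
    for y1, y0, e1, e0 in strata:
        if y1 + y0 > 0 and e1 + e0 > 0:
            p = e1 / (e1 + e0)
            total += sum(rng.random() < p for _ in range(y1 + y0))
    return total


def critical_value(values, allowed):
    """Smallest cutoff with at most `allowed` values at or above it."""
    ranked = sorted(values, reverse=True)
    if allowed <= 0:
        return math.inf
    if allowed >= len(ranked):
        return ranked[-1]
    threshold = ranked[allowed]  # this value must stay below the cutoff
    above = [v for v in ranked[:allowed] if v > threshold]
    return min(above) if above else math.inf


def cssp_monitor(analyses, cum_alpha, n_sim=2000, seed=1):
    """Return the first analysis (1-based) that signals, or None.

    analyses: per analysis, a list of (y_exposed, y_unexposed, e_exposed,
              e_unexposed) tuples for data accrued since the previous look.
    cum_alpha: cumulative type I error alpha(1), ..., alpha(T).
    """
    rng = random.Random(seed)
    simulated = [0] * n_sim  # cumulative simulated exposed events
    observed = 0             # cumulative observed exposed events
    for t, strata in enumerate(analyses):
        observed += sum(s[0] for s in strata)
        for r in range(n_sim):
            if simulated[r] != EXTREME:
                simulated[r] += draw_exposed(strata, rng)
        allowed = math.floor(cum_alpha[t] * n_sim)
        cutoff = critical_value(simulated, allowed)
        if observed >= cutoff:
            return t + 1
        # Flag realizations that would have signaled so they keep counting
        # toward the cumulative error spent at later analyses.
        simulated = [EXTREME if v >= cutoff else v for v in simulated]
    return None
```

Because the simulated totals are discrete, the error actually spent at an analysis can fall short of *α*(*t*); in that case the unspent portion is carried forward automatically, since the cutoff at the next analysis is again based on the cumulative target.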

This CSSP approach is especially good when evaluating rare events, but it has limitations when there are too many strata and/or short intervals between analyses. The approach breaks down because the only informative strata are those that meet the following two criteria: (i) at least one participant, but not all participants, experiences an event; and (ii) the stratum contains both an exposed and an unexposed participant. Furthermore, each analysis is treated as having separate strata because information from one analysis to the next is treated as independent. Therefore, the true number of independent strata is *K* × *T* (number of confounder strata times total number of analyses) across all analyses. So as both *K* and *T* increase, very few strata will be informative. As a result, the test statistic is less stable, which can both influence power and potentially inflate or deflate the type I error. Having a small number of informative strata also leads to results being generalizable to the informative strata population only and not to the overall population. Caution should be taken in the interpretation of the results in this high dimensional strata situation. Furthermore, this approach assumes a constant relationship between exposure duration and the probability of an event, which may not be valid. Overall, it has nice properties for the rare event case and will be applicable to postmarket surveillance in settings where testing is not performed very frequently and not too many confounder strata are required.

##### Group sequential estimating equation approach

The final approach we will present is an approach that controls for confounding through regression (unweighted or weighted). It can be applied to either the single exposure time or chronic exposure time settings. It has the flexibility to incorporate different exposure duration relationships, but we will focus on a constant relationship (i.e., given exposure duration, one assumes a constant rate of disease based just on exposure time). The approach uses a generalized estimating equation (GEE) framework and a score test statistic. Specifically, assume that the mean regression model under the null hypothesis, H_{o}, of no relationship between the MPI and the event is *g*(*E*(*Y*_{i}(*t*))) = *β*_{0} + *β*_{Z}*Z*_{i} + *f*_{θ}(*E*_{i}(*t*)), where *g*(.) is the mean link function; for example, the logit for a logistic model or the logarithm for a Poisson model. The exposure link function, *f*_{θ}(.), would typically be ignored for a single time exposure or specified as the logarithmic function if using a Poisson model. However, to allow for flexibility, this has been kept general.

Given the mean model, the generalized score statistic,[18] *Sc(t)*, can be calculated, with the additional specification for the family from which the data have arisen; for example, a binomial family for logistic regression and a Poisson family for a log regression model. However, a nice property of GEE when using the generalized score statistic is that it only assumes that the mean model is correctly specified.[19]

To calculate the sequential monitoring boundary, it has been proposed to use the following permutation data distribution:

- Step 1: At each analysis *t*, simulate data by fixing (*Y*_{N(t-1)+1}, *Z*_{N(t-1)+1}),…,(*Y*_{N(t)}, *Z*_{N(t)}) and permuting *D*_{N(t-1)+1},…,*D*_{N(t)} to create *D*^{*}_{N(t-1)+1},…,*D*^{*}_{N(t)}, and calculate the score statistic on the permuted data.
- Step 2: Repeat Step 1 for a large number of realizations, *Nsim*, to create a distribution of score statistics, under H_{o}, at each analysis *t*.
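The permutation steps can be illustrated with a deliberately simplified score. The sketch below uses the unstandardized score contribution for exposure, the sum over individuals of *D*_{i}(*Y*_{i} - *μ*_{i}), where *μ*_{i} is the fitted mean from the null regression model and is supplied as an input. A full generalized score statistic would also standardize by a variance estimate, which is omitted here. The data layout and function names are hypothetical.

```python
import random


def raw_score(y, mu, d):
    """Unstandardized score for exposure: sum of d_i * (y_i - mu_i)."""
    return sum(di * (yi - mi) for yi, mi, di in zip(y, mu, d))


def permuted_scores(looks, n_sim=1000, seed=1):
    """Null distribution of the cumulative raw score at each analysis.

    looks: per analysis, a tuple (y, mu, d) of equal-length lists for the
           individuals newly accrued since the previous analysis, where y is
           the outcome, mu the null-model fitted mean, and d the exposure.
    Returns one list of n_sim cumulative scores per analysis.
    """
    rng = random.Random(seed)
    distribution = [[0.0] * n_sim for _ in looks]
    for r in range(n_sim):
        running = 0.0
        for t, (y, mu, d) in enumerate(looks):
            d_star = list(d)
            rng.shuffle(d_star)  # permute exposure within this analysis only
            running += raw_score(y, mu, d_star)
            distribution[t][r] = running
    return distribution
```

Outcomes and fitted means stay fixed while exposure labels are shuffled only among individuals who entered between two consecutive analyses, mirroring Step 1. The resulting per-analysis distributions can then be fed into either boundary formulation.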

The boundary can be defined following the unifying boundary formulation as outlined for the GS LRT method or an error spending approach as outlined for the GS LD method, except with this permuted dataset and score test statistic. Note that we are not directly estimating the effect of *D*_{i} because a score statistic is calculated under H_{o}. This allows the test statistic to have better statistical properties, such as power, when the interest is in alternative hypotheses that are closer to the null (e.g., better power relative to other methods for detecting RR = 1.5 versus RR = 3.0).[20]

The potential advantages of this approach compared with the other three approaches are that it may provide more flexible confounder control than GS LRT or CSSP, and it does not rely as heavily on the asymptotic assumptions needed for the Lan–Demets error spending approach. However, a limitation of this approach, and of any regression approach, is that it requires the first analysis to have enough events and observations to estimate the parameters of the mean regression model. This can be difficult in the extremely rare event case, where the GS LRT or CSSP approaches may be preferable. As outlined by Nelson *et al*.,[13] it may be advantageous in safety surveillance to delay the first test of the data until an adequate amount of information has accrued, in which case this method may be applicable in most commonly encountered situations. Furthermore, it requires more computational time than the well-defined asymptotically normal Lan–Demets error spending approach, so in the non-rare event case, the latter approach may be preferable for simplicity. Overall, all four approaches are applicable to the postmarket surveillance setting, and a brief summary of assumptions, limitations, and advantages is outlined in Table 1.

Table 1. Overview of the four statistical methods for sequential monitoring, including potential advantages and limitations

| Method | Exposure setting | Confounding control | Test statistic | Sequential boundary formulation | Potential advantages | Potential limitations |
|---|---|---|---|---|---|---|
| GS LD | Single time or chronic exposure | All: Matching, stratification, regression | Any standardized test statistic | Error spending boundary derived using a normal approximation | Easy to apply, flexible confounding control | In very rare event setting, or frequent testing, the normal approximation assumptions may not hold |
| GS LRT | Single time exposure | Matching with fixed matching ratio | LRT | Unifying boundary derived using permutation; potential to extend to error spending boundary | Matching provides an appealing interpretation | Information loss because of restricted sample; potential loss of exposed if matching criteria too strict or insufficient confounding control if criteria too loose |
| CSSP | Single time or chronic exposure | Stratification | Number of events for those on MPI | Error spending boundary derived by conditioning on number of events within strata | Works well for rare adverse events | May not maintain type I error when strata are small or if testing is frequent |
| GS EE | Single time or chronic exposure | Regression | Score statistic | Unifying boundary or error spending boundary derived using permutation | Flexible confounding control with few assumptions | Requires sufficient outcome data at first look to estimate the initial regression parameters |