Recruitment prediction for multicenter clinical trials based on a hierarchical Poisson–gamma model: Asymptotic analysis and improved intervals

We analyze predictions of future recruitment to a multicenter clinical trial based on a maximum‐likelihood fitting of a commonly used hierarchical Poisson–gamma model for recruitments at individual centers. We consider the asymptotic accuracy of quantile predictions in the limit as the number of recruitment centers grows large and find that, in an important sense, the accuracy of the quantiles does not improve as the number of centers increases. When predicting the number of further recruits in an additional time period, the accuracy degrades as the ratio of the additional time to the census time increases, whereas when predicting the amount of additional time to recruit a further $n^+_\bullet$ patients, the accuracy degrades as the ratio of $n^+_\bullet$ to the number recruited up to the census period increases. Our analysis suggests an improved quantile predictor. Simulation studies verify that the predicted pattern holds for typical recruitment scenarios in clinical trials and verify the much improved coverage properties of prediction intervals obtained from our quantile predictor. In the process of extending the applicability of our methodology, we show that in terms of the accuracy of all integer moments it is always better to approximate the sum of independent gamma random variables by a single gamma random variable matched on the first two moments than by the moment‐matched Gaussian available from the central limit theorem.


INTRODUCTION
A trial that fails to recruit its intended number of patients on schedule suffers from reduced statistical power. In such a scenario, there is an increased risk of Type II error, thus potentially preventing or delaying an effective treatment from being approved (Treweek et al., 2013).
Recruitment of a patient to a clinical trial can be thought of as a three-stage process. First, some recruitment centers are initiated; more centers can be initiated as the trial progresses. Second, a potential recruit is enrolled at a given center. Third, after a lag, the potential recruit is screened for suitability, and if suitable that patient is randomized onto a particular treatment. Methods for predicting future recruitment usually model the probability of screening success separately, so we focus on the second stage of the process.
Future recruitment is often predicted using deterministic methods, based on the number already recruited up to that time, or historical data (Carter et al., 2005). Such an approach is inadequate due to the stochastic nature of the recruitment process, and a number of stochastic models have been proposed. Senn (1997) considers a Poisson-based model for a multicenter clinical trial where recruitment follows a Poisson process with a fixed study-wide rate, $\lambda \ge 0$. The time to recruit a given number of patients then follows a gamma distribution. The underlying assumption that recruitment follows a Poisson process is well accepted in the literature, with many articles exploring an inhomogeneous model with a time-dependent rate (Carter, 2004; Carter et al., 2005; Tang et al., 2012; Lan et al., 2019). The basic Poisson model outlined above fails to incorporate variation in recruitment rate across centers, as well as the uncertainty in the rate estimate. Anisimov and Fedorov (2007) propose a random effects model in which recruitment follows a homogeneous Poisson process within each center, with the center-specific rates viewed as a sample from a gamma distribution. The time to recruit a given number of patients then follows a Pearson Type VI distribution, while the number recruited in a given time is negative binomial. This model accounts for staggered center initiation times and provides a method for predicting recruitment for new centers entering the trial. Citations of Anisimov and Fedorov (2007) on Google Scholar show that it has also been used by major pharmaceutical companies and in statistical software to plan drug production and distribution across centers during clinical trials. Further details of the model will be given in Section 2.
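As a concrete illustration of the hierarchical structure just described, the following minimal Python sketch simulates one realization of the Anisimov–Fedorov model; the function names and parameter values are our own, chosen purely for illustration.

```python
import math
import random

def poisson_sample(rng, mean):
    # Knuth's method; adequate for the small per-center means used here.
    L = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate_recruitment(C, alpha, beta, tau, seed=1):
    """One realization of the hierarchical Poisson-gamma (AF) model:
    center rates lambda_i ~ Gam(alpha, beta) (rate parametrization),
    then counts n_i | lambda_i ~ Poisson(lambda_i * tau)."""
    rng = random.Random(seed)
    rates = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(C)]
    counts = [poisson_sample(rng, lam * tau) for lam in rates]
    return rates, counts

# Illustrative values: 150 centers, all open for 200 days.
rates, counts = simulate_recruitment(C=150, alpha=2.0, beta=150.0, tau=200.0)
n_total = sum(counts)  # expected value is C * (alpha/beta) * tau = 400
```

The per-center counts are overdispersed relative to a Poisson with a common rate, which is exactly the feature the gamma level of the hierarchy is designed to capture.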
The Anisimov and Fedorov model (henceforth AF) has been developed and extended in numerous directions. For example, Bakhshi et al. (2013) suggest an extra level of hierarchy to incorporate variation from trial to trial in the gamma distribution parameters, with an aim to forecast recruitment for trials yet to begin. Mijoule et al. (2012) propose a Pareto mixture distribution for the center rates in place of the gamma. Further, Lan et al. (2019) and Urbas et al. (2020) both incorporate time-varying rates into the AF model, while also incorporating parameter uncertainty using the Bayesian paradigm.
We investigate future predictions based on a maximum-likelihood fit of the AF model to multicenter recruitment data, where a total of $n_\bullet$ patients has been recruited over $C$ centers by a census time, $\tau$. We then consider two prediction objectives, where prediction intervals are required for either (1) the total number $n^+_\bullet$ recruited over some additional time $\tau^+$ or (2) the total time $\tau^+$ to obtain $n^+_\bullet$ additional recruits. In this section, for brevity, we focus on Objective 1; similar methods and results are obtained for Objective 2. Within the AF model, the predicted number of recruits, $\tilde n^+_\bullet$, has a negative binomial distribution, which depends on the observed data via the maximum likelihood estimates (MLEs) of the model parameters; in contrast, the true number recruited, $n^+_\bullet \sim \text{Poisson}(\lambda_\bullet \tau^+)$, where $\lambda_\bullet$ is the sum of the recruitment rates of the individual centers. Let $\hat q_\theta$ be the $\theta$th quantile of $\tilde n^+_\bullet$; that is, the predicted quantile. We first investigate $\theta^\dagger := \mathbb{P}(n^+_\bullet \le \hat q_\theta)$ in the limit as $C \to \infty$, and empirically for finite $C$, and show that the key determinant of the behavior is the ratio $\tau^+/\tau$. The desirable result of $\theta^\dagger = \theta$ is only recovered in the limit as $\tau^+/\tau \to 0$, whereas in more typical scenarios $\theta^\dagger$ can be very different from $\theta$. The underlying reason for this is that the uncertainty in the MLEs is not being accounted for. Our asymptotic approximation to $\theta^\dagger$ feeds into a new methodology, which allows us to produce tractable prediction intervals that have a coverage very close to that intended, and with a fraction of the computational cost of any bootstrap-based scheme.
Our theory, and hence our adjusted interval, is derived under the assumption that all centers opened at the same time; however, sometimes this is not the case. For example, given a predicted shortfall, perhaps based on our theory, it may be decided to open a new set of centers as well as keeping the existing centers going. Alternatively, or in addition, the existing centers may have been opened at different times. Guided by our theory, we provide an intuitive, tractable methodology for creating a prediction interval in such cases and demonstrate its accuracy in practice via extensive simulation studies.
Section 2 describes the AF model in detail, and Section 3.1 provides the asymptotic analysis in the case where all centers opened at the same time and details the methodology for creating prediction intervals with almost perfect coverage. Section 3.2 describes an empirical extension to this methodology for situations where the centers opened at different times. Our results and methods are verified via a detailed simulation study in Section 4, and then applied to a clinical-trial recruitment data set in Section 4.3. We conclude in Section 5 with a discussion. First, however, we define the notations that will be used throughout.

TABLE 1: Common notations used in this article. Objectives 1 and 2 are abbreviated to O1 and O2, respectively. Entries include $n^+_\bullet$, the realization of $\tilde n^+_\bullet$ (O1) or the specified total number of additional recruits required (O2); $\hat q_\theta$, the estimated $\theta$th quantile for $\tilde n^+_\bullet$; and $\hat r_\theta$, the estimated $\theta$th quantile for $\tilde\tau^+$.

Notations
Let $C$ be the number of centers, and for $i = 1, \ldots, C$, let $\tau_i$ and $n_i$ represent the time for which center $i$ was open before the census time and the number recruited in center $i$ during the time $\tau_i$, respectively. The shorthand $n$ refers to the vector $(n_1, \ldots, n_C)$, with $n_\bullet = \sum_{i=1}^C n_i$, and when all centers are open for the same time we denote that time by $\tau$. For Objective 1, let $\tau^+$ be the additional time ahead at which predictions will be made, and let $n^+_i$ be the number recruited in center $i$ in that time, with $n^+_\bullet = \sum_{i=1}^C n^+_i$.
For Objective 2, let $n^+_\bullet$ be the additional number of recruits sought and let $\tau^+$ be the additional time taken to recruit this number. Table 1 summarizes these notations, and others that will be introduced later.
The negative binomial distribution of the number of successes until there are $r$ failures, when the probability of success is $p$, is denoted $\text{NegBin}(r, p)$. We use the notation $\xrightarrow{p}$ and $\Rightarrow$ to indicate convergence in probability and in distribution, respectively, and $\Phi$ to indicate the cumulative distribution function of an $N(0, 1)$ random variable.

Model, data, and likelihood
The model assumes that the recruitment rate at center $i$, for $i = 1, \ldots, C$, is $\lambda_i$, where each $\lambda_i$ is drawn independently from

$$\lambda_i \sim \text{Gam}(\alpha, \beta). \qquad (1)$$

Data for center $i$ are the recruitment times $t_{i,1}, \ldots, t_{i,n_i}$. The likelihood for center $i$, obtained by integrating over $\lambda_i$, is

$$L_i(\alpha, \beta) = \int_0^\infty \lambda^{n_i} e^{-\lambda \tau_i}\, \frac{\beta^\alpha}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} e^{-\beta\lambda}\, d\lambda = \frac{\Gamma(\alpha + n_i)}{\Gamma(\alpha)} \frac{\beta^\alpha}{(\beta + \tau_i)^{\alpha + n_i}}.$$

Hence, up to an additive constant, the log-likelihood given the data $t_{i,j}$, $j = 1, \ldots, n_i$, $i = 1, \ldots, C$, is

$$\ell(\alpha, \beta) = \sum_{i=1}^C \left\{\log\Gamma(\alpha + n_i) - \log\Gamma(\alpha) + \alpha\log\beta\right\} - \sum_{i=1}^C (\alpha + n_i)\log(\beta + \tau_i). \qquad (2)$$

Thus, $n = (n_1, \ldots, n_C)$ is a sufficient statistic. In the special case where $\tau_1 = \cdots = \tau_C = \tau$, the second term in (2) reduces to $-(C\alpha + n_\bullet)\log(\beta + \tau)$ and, as we shall see in Lemma 1, $\hat\alpha/\hat\beta$ depends on $n$ only through $n_\bullet$.

Prediction
Since $n_i \,|\, \lambda_i \sim \text{Poisson}(\lambda_i \tau_i)$, given a prior of $\text{Gam}(\hat\alpha, \hat\beta)$ for $\lambda_i$ and an observation of $n_i$, the posterior distribution for $\lambda_i$ is $\text{Gam}(\hat\alpha + n_i, \hat\beta + \tau_i)$. The resulting predictive distribution for the number of further recruits is not tractable in general, but in the special case where $\tau_1 = \cdots = \tau_C = \tau$ the posterior for $\lambda_\bullet$ is $\text{Gam}(C\hat\alpha + n_\bullet, \hat\beta + \tau)$, and the predicted number of further recruits in an additional time $\tau^+$ is

$$\tilde n^+_\bullet \sim \text{NegBin}\!\left(C\hat\alpha + n_\bullet,\ \frac{\tau^+}{\hat\beta + \tau + \tau^+}\right), \qquad (3)$$

which has moments of

$$\mathbb{E}[\tilde n^+_\bullet] = \frac{(C\hat\alpha + n_\bullet)\tau^+}{\hat\beta + \tau}, \qquad \text{Var}[\tilde n^+_\bullet] = \frac{(C\hat\alpha + n_\bullet)\tau^+(\hat\beta + \tau + \tau^+)}{(\hat\beta + \tau)^2}. \qquad (4)$$

Alternatively, if the number of additional recruits is fixed at $n^+_\bullet$ then $\tau^+ \,|\, \lambda_\bullet \sim \text{Gam}(n^+_\bullet, \lambda_\bullet)$, so in the case where $\tau_1 = \cdots = \tau_C = \tau$, the predicted further time $\tilde\tau^+$ to recruit these has a Pearson Type VI distribution (e.g., Johnson et al., 1994) with a density of

$$f(t) = \frac{1}{B(n^+_\bullet, C\hat\alpha + n_\bullet)} \frac{(\hat\beta + \tau)^{C\hat\alpha + n_\bullet}\, t^{n^+_\bullet - 1}}{(t + \hat\beta + \tau)^{C\hat\alpha + n_\bullet + n^+_\bullet}}. \qquad (5)$$

Thus, $\tilde\tau^+$ has moments of

$$\mathbb{E}[\tilde\tau^+] = \frac{n^+_\bullet(\hat\beta + \tau)}{C\hat\alpha + n_\bullet - 1}, \qquad \text{Var}[\tilde\tau^+] = \frac{n^+_\bullet(n^+_\bullet + C\hat\alpha + n_\bullet - 1)(\hat\beta + \tau)^2}{(C\hat\alpha + n_\bullet - 1)^2(C\hat\alpha + n_\bullet - 2)}. \qquad (6)$$
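The negative binomial predictive in (3) and its first two moments in (4) can be checked numerically by summing the probability mass function directly. The sketch below uses illustrative parameter values only, not data from any real trial.

```python
import math

# Plug-in predictive for further recruits in additional time tau_plus:
# lambda_tot | data ~ Gam(C*a_hat + n_tot, b_hat + tau)   (shape, rate),
# n_plus | lambda_tot ~ Poisson(lambda_tot * tau_plus),
# so marginally n_plus is negative binomial.
C, a_hat, b_hat = 150, 2.0, 150.0
n_tot, tau, tau_plus = 400, 200.0, 200.0

r = C * a_hat + n_tot            # shape of the posterior for lambda_tot
s = b_hat + tau                  # rate of the posterior
p = tau_plus / (s + tau_plus)    # negative-binomial 'success' probability

def log_pmf(k):
    return (math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(1.0 - p) + k * math.log(p))

# Sum far enough into the tail to capture essentially all of the mass.
ks = range(1200)
pmf = [math.exp(log_pmf(k)) for k in ks]
total = sum(pmf)
mean = sum(k * w for k, w in zip(ks, pmf))
var = sum((k - mean) ** 2 * w for k, w in zip(ks, pmf))

mean_formula = r * tau_plus / s
var_formula = r * tau_plus * (s + tau_plus) / s ** 2
```

The brute-force mean and variance agree with the closed forms in (4), which is a useful sanity check before the quantiles of this distribution are used for prediction.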

ASYMPTOTIC ANALYSIS AND METHODOLOGY
We consider the properties of the quantile estimates under repeated sampling, so that $n$ is a random variable and $\hat\alpha$ and $\hat\beta$ are, therefore, random. We examine the probability under the true data-generating mechanism that the quantity of interest, $n^+_\bullet$ or $\tau^+$, will be less than its predicted quantile. This leads to a tractable formula for an alternative probability, $\theta^*(\theta)$, such that $\mathbb{P}(n^+_\bullet \le \hat q_{\theta^*}) \approx \theta$ or $\mathbb{P}(\tau^+ \le \hat r_{\theta^*}) \approx \theta$, and hence to prediction intervals with close to the intended coverage. In Section 3.1, we consider the scenario where all centers have been open for the same time; an intuitive extension for the more general scenario is given in Section 3.2.

All centers opened simultaneously
When all centers have been open for the same time, $\tau$, the total $n_\bullet \sim \text{Poisson}(\lambda_\bullet \tau)$ (given the rates) is the key (random) summary of the data, rather than its value in a specific realization; thus $\hat\alpha$ and $\hat\beta$ are random. Importantly, in this case $\hat\alpha/\hat\beta$ depends on $n$ only through $n_\bullet$.

Lemma 1. When $\tau_1 = \cdots = \tau_C = \tau$, the MLE for the likelihood in (2) satisfies $\hat\alpha/\hat\beta = n_\bullet/(C\tau)$.
Proof. Set $\gamma = \alpha/\beta$; by the invariance principle it is sufficient to show that $\hat\gamma = n_\bullet/(C\tau)$. Substituting $\beta = \alpha/\gamma$ and ignoring terms involving only $\alpha$, (2) becomes

$$\ell = n_\bullet \log\gamma - (C\alpha + n_\bullet)\log(\alpha + \gamma\tau).$$

Thus,

$$\frac{\partial\ell}{\partial\gamma} = \frac{n_\bullet}{\gamma} - \frac{(C\alpha + n_\bullet)\tau}{\alpha + \gamma\tau},$$

which is zero (and a maximum for $\gamma$) when $\gamma = n_\bullet/(C\tau)$, as required. □

We now state our main result.
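Lemma 1 can be checked numerically. The sketch below implements the log-likelihood (2) and locates the MLE by a crude grid search on made-up overdispersed counts; the data and the grid ranges are purely illustrative.

```python
import math

def log_lik(alpha, beta, counts, taus):
    # Log-likelihood (2), up to an additive constant.
    C = len(counts)
    ll = C * (alpha * math.log(beta) - math.lgamma(alpha))
    for n_i, t_i in zip(counts, taus):
        ll += math.lgamma(alpha + n_i) - (alpha + n_i) * math.log(beta + t_i)
    return ll

# Illustrative overdispersed counts; a common opening time tau.
counts = [0, 1, 7, 2, 0, 9, 3, 1, 6, 1]
tau = 200.0
C, n_tot = len(counts), sum(counts)
taus = [tau] * C

# Crude grid search for the MLE; Lemma 1 predicts
# alpha_hat / beta_hat = n_tot / (C * tau).
grid = ((log_lik(a, b, counts, taus), a, b)
        for a in [0.25 * j for j in range(2, 17)]       # alpha in [0.5, 4.0]
        for b in [0.5 * j for j in range(40, 600)])     # beta  in [20, 300)
_, a_hat, b_hat = max(grid, key=lambda t: t[0])
```

In fact, for this model the ratio property holds along the whole conditional-maximum curve in $\beta$ for each fixed $\alpha$, so even a coarse grid in $\alpha$ recovers it accurately.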

Theorem 1. Consider an infinite sequence of recruitment scenarios indexed by the number of recruitment centers, $C = 1, 2, \ldots$. In scenario $C$, after each center has been opened for a fixed common time $\tau$, $(\hat\alpha^{(C)}, \hat\beta^{(C)})$ is estimated from data $n^{(C)}$ by maximizing (2). It is used in (3) to estimate the $\theta$th quantile, $q_\theta^{(C)}$, of the total number, $n^{+(C)}_\bullet$, of recruits in an additional, fixed time $\tau^+$; it is also used in (5) to estimate the $\theta$th quantile, $r_\theta^{(C)}$, of the time, $\tau^{+(C)}$, until a further $n^{+(C)}_\bullet$ recruits have been obtained. Denoting the quantile estimates by $\hat q_\theta^{(C)}$ and $\hat r_\theta^{(C)}$, respectively, the following results hold with $Z \sim N(0, 1)$:

1. $\hat q_\theta^{(C)}/q_\theta^{(C)} - 1 = O_p(1/\sqrt{C})$, and $\mathbb{P}\big(n^{+(C)}_\bullet \le \hat q_\theta^{(C)}\big) \Rightarrow \Phi\Big(Z\sqrt{\tau^+/\tau} + \Phi^{-1}(\theta)\sqrt{(\beta + \tau + \tau^+)/(\beta + \tau)}\Big)$.

2. If $n^{+(C)}_\bullet/n^{(C)}_\bullet$ converges in probability as $C \to \infty$, the analogous results hold for $\hat r_\theta^{(C)}$, with the ratio $\tau^+/\tau$ replaced by the limit of $n^{+(C)}_\bullet/n^{(C)}_\bullet$.

Theorem 1 is proved in Section S1 of the Supporting Information. We discuss the consequences for $n^+_\bullet$ in detail; those for $\tau^+$ are analogous.
Theorem 1 confirms the intuition that the width of any prediction interval estimated using $(\hat\alpha, \hat\beta)$ is greater than that which would be obtained were the total intensity, $\lambda_\bullet$, known precisely; however, it also shows that the ratio approaches 1 as the census time increases. More importantly, for the median, Theorem 1 gives $\mathbb{P}(n^+_\bullet \le \hat q_{0.5}) \approx \Phi(Z\sqrt{\tau^+/\tau})$, so that when $\tau^+ \approx \tau$ this probability is approximately uniformly distributed on [0,1]. By contrast, when $\tau^+ \ll \tau$ the probability concentrates at $\approx 0.5$, as is desirable, and when $\tau^+ \gg \tau$ the probability concentrates around 0 and 1, each with a mass of 0.5, which is not desirable. The theoretical densities for $\mathbb{P}(n^+_\bullet \le \hat q_{0.5})$ as a function of $\tau$ (with $\tau^+ = 400 - \tau$) are given in Figure 1. For more general quantiles, with $\theta$ fixed, as $\tau^+/\tau \to 0$ the probability approaches a point mass at $\theta$, as desired, but as $\tau^+/\tau \to \infty$ the same concentration around 0 and 1 occurs. Despite this decidedly unintuitive behavior of the quantile probabilities, Theorem 1 also shows that the relative error in the quantile estimate decays in proportion to $1/\sqrt{C}$, as expected. The resolution of this apparent contradiction lies in the fact that while the quantiles for $n^+_\bullet$ and $\tilde n^+_\bullet$ themselves are $O(C)$, both the discrepancy between them and the widths of the distributions are $O(\sqrt{C})$. The discrepancy between the quantiles also decreases to 0 as $\tau^+/\tau \downarrow 0$, so depending on this ratio, the two distributions can closely overlap or almost entirely diverge ($\tau^+ \gg \tau$).
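The theoretical curves just described follow from a standard change of variables: if $U = \Phi(Z\sqrt{\tau^+/\tau})$ with $Z \sim N(0,1)$ and $c = \sqrt{\tau^+/\tau}$, then $U$ has density $\phi(\Phi^{-1}(u)/c)/\{c\,\phi(\Phi^{-1}(u))\}$ on (0, 1). A short sketch of this density (ours, for illustration):

```python
import math
from statistics import NormalDist

N = NormalDist()

def density_median_prob(u, tau, tau_plus):
    # Limiting density of P(n_plus <= q_hat_0.5) = Phi(Z * sqrt(tau_plus/tau))
    # at u in (0, 1), by a change of variables from Z ~ N(0, 1).
    c = math.sqrt(tau_plus / tau)
    x = N.inv_cdf(u)
    return N.pdf(x / c) / (c * N.pdf(x))

# When tau_plus = tau the density is flat (uniform on (0, 1)) ...
flat = density_median_prob(0.3, 200.0, 200.0)

# ... and it integrates to 1 (midpoint rule on a concentrated case).
grid = [(k + 0.5) / 1000 for k in range(1000)]
total = sum(density_median_prob(u, 300.0, 100.0) for u in grid) / 1000
```

When $\tau^+ > \tau$ the density diverges at both endpoints, which is the bimodal concentration around 0 and 1 described above.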
Thus, even though the point estimate of a quantile may be accurate relative to the size of the quantile ($O(\sqrt{C})$ compared with $O(C)$), unless $\tau^+ \ll \tau$, prediction intervals will not, in general, provide the intuitive and desirable coverage properties: $\mathbb{P}(\hat q_{0.05} \le n^+_\bullet \le \hat q_{0.95}) \approx 0.9$, for example. However, the (asymptotically) correct coverage can be recovered by adjusting the interval, based on Theorem 1, as we now describe.
Theorem 1 suggests that to obtain a predictive value with a true (asymptotic in $C$) probability $\theta$ of not being exceeded, we must target a value $\theta^*$ such that

$$\theta = \mathbb{E}\left[\Phi\left(Z\sqrt{\tau^+/\tau} + \Phi^{-1}(\theta^*)\sqrt{(\beta + \tau + \tau^+)/(\beta + \tau)}\right)\right].$$

Writing $a$ for $\Phi^{-1}(\theta^*)\sqrt{(\beta + \tau + \tau^+)/(\beta + \tau)}$ and letting $Z' \sim N(0, 1)$ be independent of $Z$, the right-hand side may be rewritten as

$$\mathbb{P}\left(Z' \le Z\sqrt{\tau^+/\tau} + a\right) = \Phi\left(\frac{a}{\sqrt{1 + \tau^+/\tau}}\right).$$

Rearranging gives

$$\theta^* = \Phi\left(\Phi^{-1}(\theta)\sqrt{\frac{(\tau + \tau^+)(\beta + \tau)}{\tau(\beta + \tau + \tau^+)}}\right). \qquad (7)$$

In practice we do not know $\beta$, and necessarily substitute $\hat\beta$ for this value. The estimator $\hat\beta$ is consistent for $\beta$, and so we might expect this approximation to be reasonable. Section 4.2 provides empirical verification that adjustments based on this approximation lead to substantial improvements in coverage.
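Equation (7) is cheap to evaluate. A minimal sketch, using only the standard library and illustrative argument values:

```python
import math
from statistics import NormalDist

def theta_star(theta, beta_hat, tau, tau_plus):
    # Adjusted nominal level from (7): feed theta_star, rather than theta,
    # into the plug-in predictive quantile.
    N = NormalDist()
    scale = math.sqrt((tau + tau_plus) * (beta_hat + tau)
                      / (tau * (beta_hat + tau + tau_plus)))
    return N.cdf(N.inv_cdf(theta) * scale)

# With tau = tau_plus = 200 and beta_hat = 150, a nominal 0.95 quantile
# is replaced by a more extreme level.
adj = theta_star(0.95, 150.0, 200.0, 200.0)
```

Note that `theta_star(0.5, ...)` is always 0.5, and that the adjustment vanishes as $\tau^+/\tau \to 0$ or as $\hat\beta \to 0$.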

Different center opening times
We now consider the scenario where $\tau_1 = \cdots = \tau_C$ does not hold. In this case the posterior for $\lambda_\bullet$ is intractable and, hence, so are the distributions for $\tilde n^+_\bullet$ and $\tilde\tau^+$. Furthermore, Lemma 1 does not hold. Although the distribution of $\lambda_\bullet$ is intractable, its moments are not: the posterior for $\lambda_\bullet = \sum_{i=1}^C \lambda_i$ is a sum of independent $\text{Gam}(\hat\alpha + n_i, \hat\beta + \tau_i)$ components, with mean $\mu = \sum_{i=1}^C (\hat\alpha + n_i)/(\hat\beta + \tau_i)$ and variance $\sigma^2 = \sum_{i=1}^C (\hat\alpha + n_i)/(\hat\beta + \tau_i)^2$. Let $S$ be a sum of independent gamma random variables, let $G$ be the gamma random variable matched to $S$ on the first two moments, and let $N$ be the corresponding moment-matched Gaussian. Denoting the $j$th cumulants of $S$, $G$, and $N$ by $\kappa_j^S$, $\kappa_j^G$, and $\kappa_j^N$, respectively, $0 < \kappa_1^S = \kappa_1^G = \kappa_1^N = \mu$ and $0 < \kappa_2^S = \kappa_2^G = \kappa_2^N = \sigma^2$ by design, and the following holds.

Theorem 2. For all $j \ge 3$: $0 = \kappa_j^N < \kappa_j^G \le \kappa_j^S$.

Theorem 2 is proved in Section S2 of the Supporting Information. Since the moment generating function of a random variable is $M(t) = \exp\{K(t)\}$, where $K(t)$ is the cumulant generating function, the coefficient of $t^j$ in $M(t)$ is a linear combination of products of the cumulants $\kappa_1, \ldots, \kappa_j$ where all coefficients are positive. This immediately leads to the following.

Corollary 1. For every integer $j \ge 1$, $\mathbb{E}[N^j] \le \mathbb{E}[G^j] \le \mathbb{E}[S^j]$; that is, every integer moment of $G$ is at least as close to the corresponding moment of $S$ as is that of $N$.

Theorem 2 and Corollary 1 show that a moment-matched gamma approximation to $S$ is, in a sense, strictly better than the moment-matched Gaussian approximation available through the central limit theorem. We, therefore, make the approximation (see also Lemma 2.2 in Anisimov, 2011) that

$$\lambda_\bullet \approx \lambda^*_\bullet, \quad \text{where } \lambda^*_\bullet \sim \text{Gam}(C\hat\alpha + n^*_\bullet,\ \hat\beta + \tau^*)$$

and $n^*_\bullet$ and $\tau^*$ are chosen so that the first two moments of $\lambda^*_\bullet$ match those of $\lambda_\bullet$. Figure S1 in the Supporting Information, and the accompanying text, demonstrate the accuracy of this approximation for two scenarios relevant to trial recruitment that we will describe in Section 4.2.
The posterior distribution for $\lambda^*_\bullet$ is exactly that which would arise given the $\text{Gam}(\hat\alpha, \hat\beta)$ prior if each center had been open for the same time $\tau^*$ and a total of $n^*_\bullet$ patients had been recruited. Thus, if the MLEs from this "data", $\hat\alpha^*$ and $\hat\beta^*$, were to satisfy $\hat\alpha^* = \hat\alpha$ and $\hat\beta^* = \hat\beta$ then the theory from Section 3.1 would follow through exactly. In reality, whatever the partitioning of $n^*_\bullet$ across centers, the data would typically lead to slightly different MLEs, $\hat\alpha^* \ne \hat\alpha$ and $\hat\beta^* \ne \hat\beta$; nevertheless, in the proof of Theorem 1 the most important aspect of the MLEs is their ratio. From Lemma 1, $\hat\alpha^*/\hat\beta^* = n^*_\bullet/(C\tau^*)$, and empirical comparisons of $n^*_\bullet/(C\tau^*)$ against $\hat\alpha/\hat\beta$ (see the Supporting Information) showed a relative error of less than 0.1%.
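The moment matching of Section 3.2 reduces to a few lines. The sketch below (with made-up data) also checks the Theorem 2 ordering of third cumulants for this particular example.

```python
# Posterior for lambda_tot = sum_i lambda_i when centers opened at
# different times: lambda_i | n_i ~ Gam(a_hat + n_i, b_hat + tau_i).
# Match a single gamma on the first two moments; data are illustrative.
a_hat, b_hat = 2.0, 150.0
counts = [4, 0, 2, 7, 1, 3, 5, 2]
taus = [200.0, 150.0, 180.0, 200.0, 60.0, 120.0, 200.0, 90.0]
C = len(counts)

mu = sum((a_hat + n) / (b_hat + t) for n, t in zip(counts, taus))
var = sum((a_hat + n) / (b_hat + t) ** 2 for n, t in zip(counts, taus))

shape, rate = mu ** 2 / var, mu / var      # Gam(shape, rate) match
# Equivalent "pseudo-data": a common opening time tau_star and total n_star.
tau_star = rate - b_hat                    # from b_hat + tau_star = rate
n_star = shape - C * a_hat                 # from C*a_hat + n_star = shape

# Theorem 2 with j = 3: the matched gamma's third cumulant lies between
# the Gaussian's (which is zero) and the true sum's.
k3_sum = sum(2 * (a_hat + n) / (b_hat + t) ** 3 for n, t in zip(counts, taus))
k3_gamma = 2 * shape / rate ** 3
```

By construction the matched gamma reproduces the mean and variance exactly; the cumulant check illustrates why it also tracks the higher moments better than the Gaussian.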
The methodology for constructing prediction intervals for either $\tilde n^+_\bullet$ or $\tilde\tau^+$ then proceeds as in Section 3.1, using $\hat\alpha$ and $\hat\beta$ under the assumption that $\lambda_\bullet \equiv \lambda^*_\bullet$.

EMPIRICAL VERIFICATION OF THEORY AND METHODOLOGY
Simulations were carried out to test the asymptotic theory and methods proposed in this paper for finite numbers of centers, $C$. A large number (20,000 unless otherwise stated) of realizations of the rates $\lambda_1, \ldots, \lambda_C$, and hence of the sample $(n_1, \ldots, n_C)$, were simulated for a given set of parameter values. For each realization, the parameters $\alpha$ and $\beta$ were estimated using maximum likelihood and the quantile of interest, $q_\theta$ or $r_\theta$, was estimated. Either $\mathbb{P}(n^+_\bullet \le \hat q_\theta)$ or $\mathbb{P}(\tau^+ \le \hat r_\theta)$ was then calculated exactly using the known (simulated) $\lambda_1, \ldots, \lambda_C$. The results outlined below primarily focus on predicting $n^+_\bullet$. Unless specified otherwise, the following parameter values were used: $\alpha = 2$, $\beta = 150$, $C = 150$, $\tau = 200$. The latter two are the default values used when considering varying census times and center numbers, respectively.
When predicting $n^+_\bullet$, the total trial length was set to $T = \tau + \tau^+ = 400$ since, with the default parameter values, $\mathbb{E}[n_\bullet + n^+_\bullet] = C(\alpha/\beta)T = 800$. When conducting simulations with a varying number of centers, we set $\beta = C$ to maintain a fixed expected number of recruits per unit time. In the Supporting Information, we explore an alternative scenario where $\beta = 150$ is fixed as $C$ varies.
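One replicate of this design can be sketched as follows. For brevity the sketch estimates the parameters by moment matching combined with the ratio from Lemma 1, rather than the full maximum-likelihood fit used in the paper, and it takes the predictive median from the mean of the negative binomial; both are simplifications, and all values are illustrative.

```python
import math
import random

def poisson_sample(rng, mean):
    # Knuth's method; fine for the small per-center means here.
    L = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def poisson_cdf(k, mean):
    # P(X <= k) for X ~ Poisson(mean), via log-space terms.
    return sum(math.exp(j * math.log(mean) - mean - math.lgamma(j + 1))
               for j in range(k + 1))

rng = random.Random(7)
alpha, beta, C, tau, tau_plus = 2.0, 150.0, 150, 200.0, 200.0

# 1. Simulate one data set from the hierarchical model.
lams = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(C)]
counts = [poisson_sample(rng, lam * tau) for lam in lams]
n_tot = sum(counts)

# 2. Estimate (alpha, beta) by moment matching: per-center mean m and
#    variance v give beta_hat = tau / (v/m - 1); alpha_hat then follows
#    from Lemma 1's ratio n_tot / (C * tau).
m = n_tot / C
v = sum((n - m) ** 2 for n in counts) / (C - 1)
beta_hat = tau / max(v / m - 1.0, 1e-6)
alpha_hat = beta_hat * n_tot / (C * tau)

# 3. Plug-in predictive median for n_plus, approximated by the mean of
#    the negative-binomial predictive.
r, s = C * alpha_hat + n_tot, beta_hat + tau
q_med = round(r * tau_plus / s)

# 4. Exact non-exceedance probability under the known, simulated rates.
lam_tot = sum(lams)
p_cover = poisson_cdf(q_med, lam_tot * tau_plus)
```

Repeating steps 1 to 4 many times and plotting the resulting probabilities gives empirical curves of the kind summarized in this section.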

Verification of Theorem 1

Figure 2 shows the empirical distribution of $\mathbb{P}(n^+_\bullet \le \hat q_{0.5})$ over repeated simulations, and hence repeated estimates $\hat q_{0.5}$, for the median, $\theta = 0.5$. The left panel varies the census time $\tau \in \mathbb{T}$, while the right panel fixes $\tau$ (and hence $\tau^+ = T - \tau$) and varies the number of centers, $C \in \mathbb{C}$. The shape of the density function for $\mathbb{P}(n^+_\bullet \le \hat q_{0.5})$ depends on the ratio $\tau^+/\tau$ and shows very little variation with $C$, just as described in Section 3.1, matching almost perfectly the relevant theoretical curves in Figure 1. In particular, when $\tau = \tau^+$, as in all cases in the right panel, the distribution is very close to uniform, empirically verifying the, perhaps unintuitive, result that increasing the number of centers in the trial, and thus the sample size upon which the MLEs are based, does not affect the accuracy of the quantile estimates. The theory predicts that the lines in the right panel should be horizontal; however, there is a slight positive gradient. This is because the theory is based on a continuous approximation, whereas $n^+_\bullet$ is a discrete random variable. The density function for $\mathbb{P}(n^+_\bullet < \hat q_{0.5})$ (not shown) exhibits a slight negative gradient, supporting this explanation. Figure 3 repeats Figure 1 and the left panel of Figure 2 but for the $\theta = 0.25$ quantile. Again, the empirical results match the theory almost perfectly. As with $\theta = 0.5$, the estimate improves with increasing census time but, as predicted in Section 3.1, when $\tau \ll \tau^+$ the mass is now not evenly distributed between the regions close to 0 and close to 1.
When predicting quantiles for $\tau^+$, Theorem 1 suggests that the accuracy of the quantile is primarily dependent on the ratio $n^+_\bullet/n_\bullet$. Thus, with a fixed $n^+_\bullet$ and $\tau$, and with $\beta = C$, there is essentially no change in the prediction accuracy as $C$ varies; Figure 4 captures the close agreement between the empirical results and the theory in this case.

Adjusted prediction intervals
We now study empirically the effectiveness of using quantiles based on $\theta^*(\theta)$ to derive prediction intervals, compared with intervals based directly on $\theta$. At each simulation, a naive, unadjusted 90% interval was estimated by calculating $\hat q_\theta$ for $\theta = 0.05$ and $\theta = 0.95$. An adjusted 90% interval was also derived by using $\theta^*(\theta)$ from (7) in place of $\theta$, both for $\theta = 0.05$ and $\theta = 0.95$. The performance of the intervals was assessed for each method by calculating the mean, over 2000 simulations, of the true prediction-interval coverage. The mean width of the prediction intervals was also recorded. We first consider the case where all centers opened simultaneously, then the case of different center opening times. Table 2 shows the results for each $\tau \in \mathbb{T}$ and $\tau^+ = 400 - \tau$. The unadjusted method gives satisfactory results only when $\tau^+ \ll \tau$, as is to be expected given Theorem 1. For all other scenarios, the quantiles are inaccurately estimated and the coverage can be far less than intended, as low as 63.7% for a census time early in the trial. Further diagnostics showed approximately equal contributions to the undercoverage from $\hat q_{0.05}$ being too high and $\hat q_{0.95}$ being too low. In contrast, by applying (7), the coverage is consistently improved and corrected to almost exactly the desired 90%. The improved coverage does come at the cost of an increased interval width, but the increase seems proportionate.

All centers opened simultaneously
When $\hat\beta = 0$, (7) gives $\theta^* = \theta$: no correction is needed. We, therefore, also examined the effect of our adjustment when data are simulated using a much lower true parameter value, $\beta = 50$. In this case, the lowest coverage was 77.8%, observed when $(\tau, \tau^+) = (50, 350)$, improving to 90.2% after our adjustment, while when $\tau = \tau^+ = 200$ the coverage improved from 84.1% to 90.0%; the full tabulation is provided in the Supporting Information.
Similar improvements to those in Table 2, but for the 95% prediction interval, are also provided in the Supporting Information, confirming that the $\theta^*$ adjustment performs equally well when adjusting quantiles that are further into the tails of the distribution. A further table in the Supporting Information demonstrates an even more striking improvement than in Table 2, found when creating a 90% prediction interval but with $C = 20$; for example, when $(\tau, \tau^+) = (50, 350)$ the coverage improved from 59.2% to 89.7%.

Different center opening times
We consider two different opening-time scenarios: (1) the center opening times are drawn uniformly and independently from the interval $[0, \tau]$ and (2) half of the centers are opened at time 0 and half are opened at a single later time. The former mimics the gradual coming online of new centers, while the latter scenario could occur when an initial interim analysis suggests that many new centers must be opened to achieve the required sample size.
The investigation into quantile adjustment to obtain a 90% prediction interval (Table 2) was repeated for opening-time scenarios (1) and (2), and the results are provided in Tables 3 and 4, respectively. The prediction intervals for these cases were constructed according to the methodology of Section 3.2. Additional diagnostics for the moment matching were also recorded: the mean (over repeated samples) of $\tau^*$, the ratio of this to the mean (over repeated samples) of the mean (over centers) of the $\tau_i$, and the ratio of the mean of $n^*_\bullet$ to the mean of $n_\bullet$. In both cases, the intervals obtained by combining the methodology proposed in Section 3.2 with (7) produce coverages very close to 90%, whatever the census time. By contrast, the unadjusted intervals suffered from coverages as low as 48% when $\tau = 50$. Typically the values of $\tau^*$ and $n^*_\bullet$ are lower than $\tau$ and $n_\bullet$ (although their ratio is almost unchanged; see Section 3.2), reflecting the increased uncertainty in parameter values because some centers have not been open for the full time interval. The especially poor coverage of the unadjusted intervals arises because it is now the ratio $\tau^+/\tau^*$ that determines the extent of the undercoverage.
Equivalent tables for $\tau^+$ for opening-time scenarios (1) and (2), presented in the Supporting Information, show similarly dramatic improvements.

Application to clinical trial recruitment data
Finally, we applied our methodology to recruitment data from an oncology clinical trial. The recruitment centers opened at different times, thus the methodology of Section 3.2 applies. For anonymization reasons, all times in the data set were jittered by up to a week, and in the plot described below both time and cumulative recruitment have been rescaled to lie in the interval [0,1].

TABLE 3: The mean (over repeated sampling) of the true coverage probability and width of an intended 90% prediction interval for $n^+_\bullet$ using the unadjusted and adjusted methods, for opening-time scenario (1).

We examined the 41 centers that had opened by time 0.125 and calculated 90% prediction intervals for the total recruitment from these centers for the remainder of the recruitment period. With a single data set it is impossible to obtain true coverage probabilities; however, we can compare the predicted intervals with the true number recruited. Figure 5 shows the prediction intervals as dashed lines in red (unadjusted intervals) and dotted lines in blue (adjusted intervals), with the actual recruitment numbers shown as a solid black line (this figure appears in color in the electronic version of this article, and color refers to that version). The recruitment goes outside of the unadjusted interval just before time 0.6 yet remains entirely within the adjusted prediction interval.
A diagnostic likelihood-ratio test (see Urbas et al., 2020) with a null hypothesis that the Poisson process is time-homogeneous (as assumed by the model) produced p-values of 0.39 (data up to the census time) and 0.50 (all data). Diagnostic Q-Q plots (see the Supporting Information) suggest that the assumption of a gamma distribution in (1) is reasonable.

DISCUSSION
We must start by pointing out that the model described in Section 2 is just that: a model. The hierarchical nature allows the borrowing of information from centers that have been open for some time and enables sensible predictions for newly opened centers, the Poisson process is a reasonable first approximation for the recruitment process at an individual center, and the gamma hierarchical distribution is chosen for tractability. The model does not account for the myriad logistical issues that might occur during a trial, affecting recruitment, and even were this not the case, data do not arise from the model. However, the model has gained traction in the industry and has been developed further by a number of authors (see the introduction of this article).
Theorem 1 first provides insight into when prediction intervals obtained by simply plugging in the parameter point estimates might be adequate; for example, when the future time horizon is small compared with the time for which the trial has been running. However, often the future time horizon is at least as long as the current length of the trial, and in this situation the coverage of plug-in intervals is poor. The methodology resulting from Theorem 1 essentially takes parameter uncertainty into account to create prediction intervals with almost exactly the intended coverage.

FIGURE 5: Recruitment (black, solid) for an oncology clinical trial. The estimated 90% prediction intervals are shown by red dashed lines (unadjusted intervals) and blue dotted lines (adjusted intervals), and the "+" symbols indicate center opening times. This figure appears in color in the electronic version of this article, and color refers to that version.
Alternatives that allow parameter uncertainty to inform prediction intervals include Bayesian methods (e.g., Zhang and Long, 2010; Urbas et al., 2020), which are typically computationally expensive, or the bootstrap, which is usually even more expensive. Our method has the same cost as the standard plug-in, frequentist approach.
The diagnostics detailed at the end of Section 4.3 suggest that the model of Anisimov and Fedorov (2007) is suitable for the oncology data set that we examine, but this might not always be the case. The simulation study in Section S5 of the Supporting Information suggests robustness to moderate departures from the hierarchical gamma distribution and robustness of the improvements to the prediction intervals obtained through our method. However, as demonstrated, for example, in Urbas et al. (2020), if the intensity curve for each center is strongly time-dependent, predictions based on the assumption of a homogeneous Poisson process can be wildly inaccurate, and a time-inhomogeneous Poisson process might be more appropriate (e.g., Lan et al., 2019; Urbas et al., 2020). If the exact form of the time-inhomogeneity is known then the standard time transformation used for the Poisson process ($t \mapsto \int_0^t \lambda_i(s)\, ds$, where $\lambda_i(s)$ is the intensity at true time $s$ for center $i$), with one transformed time scale per center, permits the application of our correction to predicting the number of new recruits in a given additional (true) time. However, the time-dependency typically contains unknown parameters, and our correction as it stands cannot account for the uncertainty in these. Future research could look into extending our method to allow for this.
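For instance, with a hypothetical exponentially decaying intensity (our choice, purely for illustration), the transformed time has a closed form against which a numerical version can be checked:

```python
import math

def intensity(s, a=0.05, c=0.01):
    # Hypothetical decaying recruitment intensity for one center.
    return a * math.exp(-c * s)

def transformed_time(t, n=10_000):
    # t -> integral_0^t intensity(s) ds, by the trapezoidal rule; on this
    # scale the center's process is a unit-rate Poisson process.
    h = t / n
    total = 0.5 * (intensity(0.0) + intensity(t))
    total += sum(intensity(i * h) for i in range(1, n))
    return total * h

t = 200.0
closed_form = 0.05 * (1.0 - math.exp(-0.01 * t)) / 0.01
```

In practice the intensity would be estimated rather than known, which is precisely the source of the residual uncertainty discussed above.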
Theorem 1, upon which our prediction adjustment is based, describes the limit as the number of centers $C \to \infty$. Our simulations suggest that the approximation based on the limit result works well even when $C$ is as low as 20; however, it is unlikely to hold for very low center numbers. Furthermore, experience has shown that for very low center numbers it is possible for the likelihood to increase monotonically as $\alpha \to \infty$ and $\beta \to \infty$ with $\alpha/\beta$ fixed (this can occur when the counts for individual centers are underdispersed). Relevant historical data might then be brought in to make parameter estimation more robust; however, $C$ would still be low and the intended coverage might not be achieved.
This article has considered scenarios where centers can open at different times up until the census time, additional centers may be opened at the census time (perhaps driven by the results of the analysis), and we wish to predict the total recruitment for these centers into the future. A more general opening-time scenario would also allow for centers coming online at different times after the census time. This could be incorporated into predictions of recruitment over the remainder of the recruitment period via a more general definition of $\lambda_\bullet$, which would become a weighted sum of the individual intensities, with a center's weight being the fraction of the future time for which it would be open.
How to deal with the converse problem in this scenario, predicting the time to recruit a certain number of patients, is an open problem. Prediction using the model of Anisimov and Fedorov (2007) relies on the true center opening times, which are rarely known in advance. There is often a plan and a backup plan, however, and it is straightforward (see the appendix of Urbas et al., 2020) to combine the Anisimov and Fedorov (2007) model with a standard survival model for the opening time of each center conditional on the planned opening time and, potentially, other covariates. Alternatively, Lan et al. (2019) model center opening times as realizations from an inhomogeneous Poisson process. With either of these approaches, once the model has been fitted using the data up to the census time, it is straightforward to repeatedly simulate sets of future opening times. One would then obtain a mixture of negative binomial distributions for the number of additional recruits over an additional time $\tau^+$. The mixture could be approximated by a single negative binomial distribution and our method applied directly to that. This would mainly be an extension of the model of Anisimov and Fedorov (2007), and would certainly be interesting to explore; the aim of this paper, however, is to analyze the existing method of Anisimov and Fedorov (2007) and, in addition to new insights on performance, provide improved prediction intervals.

A C K N O W L E D G M E N T S
The first author acknowledges support from award: NIHR-MS-2016-03-01 Lancaster University.

D ATA AVA I L A B I L I T Y S TAT E M E N T
The results in this paper are mainly from simulation studies; the one clinical-trial data set that we use is not shared due to privacy/ethical restrictions.

S U P P O R T I N G I N F O R M A T I O N
Tables and Figures referenced in Sections 3, 4, and 5 are available with this paper at the Biometrics website on Wiley Online Library. In addition, the computing code used in Sections 3 and 4 is also available in the Supporting Information.