Minimum sample size for external validation of a clinical prediction model with a continuous outcome

Clinical prediction models provide individualized outcome predictions to inform patient counseling and clinical decision making. External validation is the process of examining a prediction model's performance in data independent of that used for model development. Current external validation studies often suffer from small sample sizes, and consequently imprecise estimates of a model's predictive performance. To address this, we propose how to determine the minimum sample size needed for external validation of a clinical prediction model with a continuous outcome. We propose four criteria that target precise estimates of (i) $R^2$ (the proportion of variance explained), (ii) calibration-in-the-large (agreement between predicted and observed outcome values on average), (iii) calibration slope (agreement between predicted and observed values across the range of predicted values), and (iv) the variance of observed outcome values. Closed-form sample size solutions are derived for each criterion, which require the user to specify anticipated values of the model's performance (in particular $R^2$) and the outcome variance in the external validation dataset. A sensible starting point is to base values on those for the model development study, as obtained from the publication or study authors. The largest sample size required to meet all four criteria is the recommended minimum sample size needed in the external validation dataset. The calculations can also be applied to estimate expected precision when an existing dataset with a fixed sample size is available, to help gauge if it is adequate. We illustrate the proposed methods using a case study predicting fat-free mass in children.


INTRODUCTION
Clinical prediction models provide individualized outcome predictions to inform patient counseling and clinical decision making, such as treatment and monitoring strategies. [1][2][3] Depending on the context, they may also be referred to as clinical prediction tools, diagnostic or prognostic models, risk scores, and prognostic indices, among other names. They are typically developed using a regression framework, which provides an equation to predict the outcome conditional on the values of multiple predictors (variables, covariates). In this article, we focus on prediction of continuous outcomes (such as birth weight, depression score, blood pressure or fat mass), for which the model equation is typically a linear regression. Such models can be used to predict an individual's expected outcome value, conditional on the individual's predictor values. The outcome may relate to something current (eg, fat mass level at present) or in the future (eg, pain score at 1 month after a back injury).
Recently we proposed how to calculate the minimum sample size needed to develop a prediction model with a continuous outcome. 4,5 Once a model has been developed, it is important to evaluate its predictive performance in new data, independent of that used to develop the model. This process is known as external validation, and is usually crucial regardless of how a model was developed. In particular, external validation indicates how the model performs in new data that are representative of the target population in which the model will be used in practice. [6][7][8][9][10][11][12][13] However, despite being widely encouraged and having its importance clearly demonstrated, [13][14][15][16][17][18][19] external validation of published prediction models is rare in practice, with researchers predominantly focusing on the development of new models. 19 Even when external validation is performed, the sample size is often too small to provide reliable conclusions about a model's predictive performance and key measures are often neglected; in particular, calibration of predicted and observed outcome values is rarely examined. 16 In this article, we propose criteria to determine the minimum sample size needed for external validation of a clinical prediction model with a continuous outcome. We suggest the minimum sample size needs to be large enough to precisely estimate three key measures of predictive performance: calibration slope (agreement between predicted and observed values across the range of predicted values), calibration-in-the-large (CITL, agreement between predicted and observed outcome values on average), and $R^2$ (the proportion of variance explained). Section 2 introduces these performance measures, while in Section 3, we derive three closed-form solutions for the sample size required to estimate each of them precisely.
As these solutions depend on the variance of observed outcome values, we also present a fourth criterion that aims to ensure this variance is estimated precisely. Hence, our sample size calculation comprises checking four criteria, and we suggest the largest sample size calculated from the four approaches is used as the minimum required for the external validation. Section 4 illustrates our proposal with an applied example, Section 5 considers the situation where the sample size is fixed, and Section 6 concludes with discussion.

KEY MEASURES OF PREDICTIVE PERFORMANCE FOR A CLINICAL PREDICTION MODEL WITH A CONTINUOUS OUTCOME
Assume that we wish to externally validate an existing prediction model for a continuous outcome, and have obtained a suitable external validation dataset containing a sample of individuals from the target population of interest. We now describe how to quantify the prediction model's performance in this dataset.
First, the researcher needs to calculate the existing model's predicted (expected) outcome value ($Y_{PREDi}$) for each individual ($i$). As the outcome is continuous, the existing prediction model equation will usually be in the form of a linear regression and so contain an intercept ($\alpha$) and predictor effects ($\beta_1$, $\beta_2$, $\beta_3$, etc) corresponding to predictor variables ($X_{1i}$, $X_{2i}$, $X_{3i}$, etc). For example, with three predictors a simple prediction model equation is:

$$Y_{PREDi} = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} \tag{1}$$

However, in practice the right hand side of the model equation (also known as the model's linear predictor) may be far more complex, for instance with more than three predictors and potential interactions and non-linear terms (eg, defined by splines or polynomials). A real example is given in Box 1.

BOX 1 Hudda et al prediction model for the natural logarithm of fat-free mass in children 20

$$\ln(\text{fat-free mass}) = 2.8055 + (0.3073 \times \text{height}^2) - (10.0155 \times \text{weight}^{-1}) + (0.004571 \times \text{weight}) + (0.01408 \times \text{BA}) - (0.06509 \times \text{SA}) - (0.02624 \times \text{AO}) - (0.01745 \times \text{other}) - (0.9180 \times \ln(\text{age})) + (0.6488 \times \text{age}^{0.5}) + (0.04723 \times \text{male})$$

• Predictor variables of black (BA), south Asian (SA), other Asian (AO), or other (other) ethnic origins are all binary, with value of 1 if individual has the particular origin and 0 otherwise
• Height is measured in meters, weight in kilograms, age in years, and fat-free mass in kilograms

Clearly, the external validation dataset must contain values for all the predictors ($X_{1i}$, $X_{2i}$, $X_{3i}$, …) included in the prediction model equation, so that $Y_{PREDi}$ can be calculated by applying the model's equation to each individual. The dataset must also contain the observed outcome value ($Y_i$) for each individual, so that the prediction model's predictive performance can then be quantified by comparing the $Y_{PREDi}$ values to the $Y_i$ values.
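As a minimal sketch of how $Y_{PREDi}$ might be computed from the Box 1 equation, the Python function below applies the published coefficients to one individual. The function name, argument names, and the ethnicity keys (including `"reference"` for an individual with all four ethnicity indicators equal to 0) are illustrative assumptions, not part of the original model publication.

```python
import math

# Coefficients for the binary ethnicity indicators in Box 1.
# "reference" (all indicators 0) is a hypothetical label used only here.
ETHNICITY_COEF = {
    "reference": 0.0,
    "black": 0.01408,         # BA
    "south_asian": -0.06509,  # SA
    "other_asian": -0.02624,  # AO
    "other": -0.01745,        # other
}

def predict_ln_ffm(height_m, weight_kg, age_years, male, ethnicity="reference"):
    """Predicted ln(fat-free mass in kg) from the Box 1 equation."""
    return (
        2.8055
        + 0.3073 * height_m ** 2
        - 10.0155 / weight_kg          # 10.0155 x weight^-1
        + 0.004571 * weight_kg
        + ETHNICITY_COEF[ethnicity]
        - 0.9180 * math.log(age_years)
        + 0.6488 * math.sqrt(age_years)  # age^0.5
        + 0.04723 * (1 if male else 0)
    )

# Hypothetical individual: 10-year-old boy, 1.40 m, 35 kg, reference ethnicity.
ln_ffm = predict_ln_ffm(1.40, 35.0, 10.0, male=True)
ffm_kg = math.exp(ln_ffm)  # back-transform to kilograms
```

Because the model predicts on the log scale, the back-transformed value is only needed when a prediction in kilograms is wanted; the performance measures in this article are evaluated on the scale of the modelled outcome.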
We now introduce three key statistics to quantify a model's predictive performance upon external validation, which focus on overall model fit and calibration.

R-squared
$R^2$ is a well-known measure of overall model fit and quantifies the proportion of outcome variation explained. Let $\mathrm{var}(Y_i)$ denote the variance of $Y_i$ values in the external validation population, and $\mathrm{var}(Y_i - Y_{PREDi})$ denote the variance of ($Y_i - Y_{PREDi}$) values (ie, the prediction errors in the external validation population). Then the true proportion of outcome variation explained by the predicted values from the prediction model, $R^2_{val}$, is:

$$R^2_{val} = 1 - \frac{\mathrm{var}(Y_i - Y_{PREDi})}{\mathrm{var}(Y_i)} \tag{2}$$

Values of $R^2_{val}$ closer to 1 indicate better fit of the $Y_{PREDi}$ from the prediction model.

Calibration slope and calibration-in-the-large
Calibration measures the agreement between predicted ($Y_{PREDi}$) and observed ($Y_i$) outcome values in the external validation dataset. 21 It is best shown graphically on a calibration plot, with $Y_{PREDi}$ on the horizontal axis plotted against $Y_i$ on the vertical axis, with every individual providing a single data point. A LOESS smoothed calibration curve should also be fitted through the points and presented on the plot. 2,11,22 Ideally, the predicted outcome values are not systematically under- or over-estimated across the entire range of predicted values. That is, the points are scattered randomly around the 45° line of perfect agreement (corresponding to a slope of 1), with little variation around the line (ie, $R^2_{val}$ is large), and with close agreement between predicted and observed values across the entire horizontal axis range.
To formally quantify calibration performance in an external validation dataset, a calibration model can be fitted of the form,

$$Y_i = \alpha_{cal} + \lambda_{cal} Y_{PREDi} + e_i, \quad e_i \sim N(0, \sigma^2_{cal}) \tag{3}$$

where "cal" is used to emphasize that parameters are from the calibration model. This model can be fitted using standard estimation methods for a linear regression, such as restricted maximum likelihood estimation. The parameter $\lambda_{cal}$ represents the calibration slope, which measures agreement between predicted and observed outcomes across the whole range of predicted values. 2,3 As mentioned, the ideal $\lambda_{cal}$ value is 1. A $\lambda_{cal}$ < 1 indicates that some predictions are too extreme (eg, predictions above the mean are too high, and/or predictions below the mean are too low) and a slope > 1 indicates that the range of predictions is too narrow. A calibration slope < 1 is often observed in external validation studies, as clinical prediction models are often developed in small datasets without adjustment for overfitting, which leads to extreme predictions (miscalibration) in new individuals external to those used for model development. [23][24][25][26] The term $\sigma^2_{cal}$ measures the residual variance in the calibration model. Note that the true calibration slope in the external validation population can also be expressed as, 27

$$\lambda_{cal} = \sqrt{\frac{R^2_{cal}\,\mathrm{var}(Y_i)}{\mathrm{var}(Y_{PREDi})}} \tag{4}$$

where $R^2_{cal}$ is the proportion of variance of $Y_i$ values explained when the calibration model (3) is fitted to the external validation population.
Systematic over- or under-prediction is still possible even when the calibration slope is 1, and thus it should always be considered alongside calibration plots and CITL. The latter measures the agreement between mean predicted ($\bar{Y}_{PRED}$) and mean observed ($\bar{Y}$) outcome values, which can be estimated in the external validation dataset using:

$$\widehat{CITL}_{val} = \bar{Y} - \bar{Y}_{PRED} \tag{5}$$

Estimating $CITL_{val}$ by applying Equation (5) in an external validation dataset is equivalent to estimating $\alpha_{cal}$ by fitting model (3) with the constraint that $\lambda_{cal}$ equals 1 (see Section 3.2).
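The three performance measures described in this section can be sketched in a few lines of Python. This is an illustrative calculation only: it uses ordinary least squares for the calibration model (rather than restricted maximum likelihood), and the function name and returned dictionary keys are assumptions made for the example.

```python
from statistics import mean, variance

def performance_measures(y, y_pred):
    """Estimate R^2_val, CITL, and the calibration intercept and slope.

    y      : observed outcome values in the validation dataset
    y_pred : predicted values from the existing model (same order as y)
    """
    errors = [yi - pi for yi, pi in zip(y, y_pred)]

    # R^2_val = 1 - var(prediction errors) / var(observed outcomes)
    r2_val = 1.0 - variance(errors) / variance(y)

    # CITL = mean observed - mean predicted
    citl = mean(y) - mean(y_pred)

    # Calibration model: regress observed on predicted (least squares)
    mean_y, mean_p = mean(y), mean(y_pred)
    sxy = sum((p - mean_p) * (yi - mean_y) for yi, p in zip(y, y_pred))
    sxx = sum((p - mean_p) ** 2 for p in y_pred)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_p

    return {"r2_val": r2_val, "citl": citl,
            "cal_intercept": intercept, "cal_slope": slope}

# Toy data in which predictions are systematically half the observed values.
result = performance_measures([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
```

In the toy data the range of predictions is too narrow, so the estimated calibration slope exceeds 1 and CITL is positive, matching the interpretation given in the text.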
In this section, we propose four criteria for researchers to use as a basis for determining the minimum sample size required for an external validation study. The first three criteria aim to ensure the sample size is large enough to estimate $R^2_{val}$, $CITL_{val}$, and $\lambda_{cal}$ precisely (ie, with a small margin of error). Closed-form solutions are derived for this purpose. As these expressions depend on the estimates of (residual) variances, a fourth criterion aims to precisely estimate these also.

Criterion (i): Precise estimate of $R^2_{val}$
Our first criterion targets a precise estimate for $R^2_{val}$ from the external validation dataset, such that the confidence interval for $R^2_{val}$ will be narrow. There are many suggestions for deriving confidence intervals for $R^2$. 28 Here, we focus on the approach suggested by Wishart, 29 which uses the following approximate standard error (SE) of $\hat{R}^2_{val}$:

$$SE_{\hat{R}^2_{val}} = \sqrt{\frac{4R^2_{val}(1 - R^2_{val})^2}{n}} \tag{6}$$

Tan suggests this approximation works well when the sample size ($n$) is reasonably large (say >50), 28 which is likely to be the situation when externally validating a clinical prediction model (see criterion (iv)). Rearranging Equation (6) gives a closed-form sample size calculation of:

$$n = \frac{4R^2_{val}(1 - R^2_{val})^2}{SE^2_{\hat{R}^2_{val}}} \tag{7}$$

Equation (7) can now be used to calculate the sample size ($n$) required to meet criterion (i), by specifying a desired value for $SE_{\hat{R}^2_{val}}$ and by setting $R^2_{val}$ at the anticipated true value for the external validation population. For example, consider an existing prediction model with an adjusted $R^2$ of 0.5 in the development dataset, with this adjusted (rather than apparent) $R^2$ giving an unbiased estimate of expected performance in new data. Then, if we assume $R^2_{val}$ is also 0.5 in the external validation population, we can target a 95% confidence interval of width about 0.1 (ie, a margin of error of 0.05) by setting $SE_{\hat{R}^2_{val}}$ to 0.0255, as $1.96 \times 0.0255 \approx 0.05$. We can now apply Equation (7) to give,

$$n = \frac{4 \times 0.5 \times (1 - 0.5)^2}{0.0255^2} = 768.9$$

and so 769 participants are required to meet criterion (i). To achieve the same margin of error, 905 participants are required when assuming $R^2_{val}$ is 0.3, and 197 participants are required when assuming $R^2_{val}$ is 0.8. These values are reasonably close to those using more exact (but not closed-form) approaches to confidence interval derivation, such as that based on the scaled non-central F approximation proposed by Lee. 30 The ss.aipe.R2 function within Kelley's MBESS package for the R software identifies the sample size required to ensure Lee's confidence interval for $R^2_{val}$ is sufficiently narrow, [31][32][33] and so is an alternative to using Equation (7).

FIGURE 1 Sample size (number of participants, $n$) needed in an external validation dataset to target a confidence interval for $R^2_{val}$ of a particular width (either 0.05, 0.1, or 0.2) for different assumed $R^2_{val}$ values between 0.1 and 0.9. Sample size calculated using Equation (7)

Figure 1 shows how the required sample size changes for $R^2_{val}$ values between 0.1 and 0.9 based on Equation (7) and assuming $SE_{\hat{R}^2_{val}}$ is 0.0255 to target a confidence interval width of 0.1. The required sample size will be lower when allowing for wider target confidence intervals, and higher when aiming for narrower target confidence intervals (Figure 1). However, we suggest $SE_{\hat{R}^2_{val}} \le 0.0255$ is a sensible compromise, as it targets a precise estimate (margin of error of 0.05 or less compared to the true value) and still gives a required sample size that will be realistic to obtain in practice.
Note that upon external validation the true $R^2_{val}$ may be lower or higher than the adjusted $R^2$ reported for model development. Therefore, although the adjusted $R^2$ from the development study is a useful starting point, we also recommend calculating the sample size required when assuming slightly different values for the true $R^2_{val}$. For example, researchers might apply Equation (7) assuming $R^2_{val}$ values ± 0.1 of the adjusted $R^2$ reported from the development study, and note the largest sample size across this range.
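The criterion (i) calculation, including the suggested ± 0.1 sensitivity check, can be sketched in Python as follows. The function names and the default target SE of 0.0255 are choices made for this illustration; the formula is the closed-form expression $n = 4R^2_{val}(1 - R^2_{val})^2 / SE^2$ implied by the worked values in the text.

```python
import math

def n_for_r2(r2_val, se_target=0.0255):
    """Sample size for criterion (i), rounded up to a whole participant."""
    if not 0.0 < r2_val < 1.0:
        raise ValueError("r2_val must lie strictly between 0 and 1")
    return math.ceil(4.0 * r2_val * (1.0 - r2_val) ** 2 / se_target ** 2)

def n_for_r2_range(r2_dev, delta=0.1, se_target=0.0255):
    """Largest sample size over r2_dev - delta, r2_dev, r2_dev + delta."""
    candidates = [r2_dev - delta, r2_dev, r2_dev + delta]
    valid = [r2 for r2 in candidates if 0.0 < r2 < 1.0]
    return max(n_for_r2(r2, se_target) for r2 in valid)

# Values quoted in the text for a target SE of 0.0255
n_05 = n_for_r2(0.5)   # 769
n_03 = n_for_r2(0.3)   # 905
n_08 = n_for_r2(0.8)   # 197
```

Note that the required sample size is not monotone in $R^2_{val}$ (the numerator peaks at $R^2_{val} = 1/3$), which is why checking a range of values rather than a single assumed value is worthwhile.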

Criterion (ii): Precise estimate of CITL
Our second criterion targets a precise estimate of $CITL_{val}$ from the external validation dataset. We estimate $CITL_{val}$ by using $\bar{Y} - \bar{Y}_{PRED}$ (from Equation (5)), which is equivalent to estimating the intercept when fitting (in the external validation dataset) model (3) with the predicted values as an offset term:

$$Y_i = \alpha_{CITL} + Y_{PREDi} + e_i, \quad e_i \sim N(0, \sigma^2_{CITL}) \tag{8}$$

Therefore the SE of $\widehat{CITL}$ is:

$$SE_{\widehat{CITL}} = \sqrt{\frac{\sigma^2_{CITL}}{n}} = \sqrt{\frac{\mathrm{var}(Y_i)(1 - R^2_{CITL})}{n}} \tag{9}$$

We can rearrange Equation (9) to obtain an expression for the required sample size:

$$n = \frac{\mathrm{var}(Y_i)(1 - R^2_{CITL})}{SE^2_{\widehat{CITL}}} \tag{10}$$

Hence, the sample size required to meet criterion (ii) can be derived using Equation (10), for which the researcher must pre-specify $R^2_{CITL}$ (the anticipated proportion of variance explained by the predictions in the external validation population), along with $\mathrm{var}(Y_i)$ (the anticipated variance of $Y_i$ in the target population), and the desired $SE_{\widehat{CITL}}$.
A sensible starting point is to assume CITL is zero, as then $R^2_{CITL} = R^2_{val}$ (the anticipated proportion of variance explained by the predictions upon validation), with $R^2_{val}$ in turn assumed to be the same as the adjusted $R^2$ reported from the development study. If CITL is not zero then $R^2_{CITL}$ will not equal $R^2_{val}$. Hence, it is also sensible to consider a range of values for $R^2_{CITL}$ when applying Equation (10), such as ± 0.1 of the adjusted $R^2$ reported from the development study, and to note the largest sample size across this range.
The value that defines a precise $SE_{\widehat{CITL}}$ is context specific, as it depends on the scale of the outcome values. For example, for systolic blood pressure an SE of about 2.5 mmHg may suffice, but for BMI a smaller SE may be required as the scale is much narrower.
For instance, consider external validation of a prediction model for systolic blood pressure with a reported adjusted $R^2$ of 0.5 in the development study, and suppose that the variance of the observed $Y_i$ values is anticipated to be 400 in the target population for the validation study. Let us target an $SE_{\widehat{CITL}}$ of 2.55, as this gives a 95% confidence interval for $CITL_{val}$ with a narrow width of about 10 mmHg, assuming a 95% confidence interval for $CITL_{val}$ can be derived approximately by $\widehat{CITL} \pm (1.96 \times SE_{\widehat{CITL}})$. Assuming $R^2_{CITL} = R^2_{val} = 0.5$, then applying Equation (10) gives,

$$n = \frac{400 \times (1 - 0.5)}{2.55^2} = 30.8$$

and thus at least 31 participants are required to achieve criterion (ii). More cautiously assuming that $R^2_{CITL} = 0.4$, the required sample size is

$$n = \frac{400 \times (1 - 0.4)}{2.55^2} = 36.9$$

and thus 37 participants are required. It is likely that the sample size to precisely estimate CITL is smaller than that required to precisely estimate the measures outlined in criteria (i), (iii), and (iv).
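The systolic blood pressure example can be reproduced with a short Python sketch of the criterion (ii) calculation, $n = \mathrm{var}(Y_i)(1 - R^2_{CITL}) / SE^2$. The function and variable names are illustrative assumptions.

```python
import math

def n_for_citl(var_y, r2_citl, se_target):
    """Sample size for criterion (ii), rounded up to a whole participant.

    var_y     : anticipated variance of observed outcomes
    r2_citl   : anticipated proportion of variance explained
    se_target : desired standard error of the CITL estimate
    """
    return math.ceil(var_y * (1.0 - r2_citl) / se_target ** 2)

# Systolic blood pressure example: var(Y) = 400, target SE = 2.55 mmHg
ci_width = 2 * 1.96 * 2.55                  # approximate 95% CI width in mmHg
n_base = n_for_citl(400.0, 0.5, 2.55)       # 31
n_cautious = n_for_citl(400.0, 0.4, 2.55)   # 37
```

Because the target SE is expressed on the outcome's own scale, the same function applies to any continuous outcome once a clinically meaningful SE has been chosen.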

Criterion (iii): Precise estimate of calibration slope
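The third criterion targets a precise estimate of $\lambda_{cal}$, which represents the calibration slope obtained from fitting calibration model (3) in the external validation dataset. As $\hat{\lambda}_{cal}$ is the slope from a simple linear regression model, the SE of $\hat{\lambda}_{cal}$ can be estimated by, 34

$$SE_{\hat{\lambda}_{cal}} = \sqrt{\frac{\sigma^2_{cal}}{(n - 1)\,\mathrm{var}(Y_{PREDi})}} \tag{11}$$

Writing the residual variance as $\sigma^2_{cal} = \mathrm{var}(Y_i)(1 - R^2_{cal})$ and, from Equation (4), $\mathrm{var}(Y_{PREDi}) = R^2_{cal}\,\mathrm{var}(Y_i)/\lambda^2_{cal}$, this becomes

$$SE_{\hat{\lambda}_{cal}} = \sqrt{\frac{\lambda^2_{cal}(1 - R^2_{cal})}{(n - 1)R^2_{cal}}} \tag{12}$$

which rearranges to give the required sample size:

$$n = \frac{\lambda^2_{cal}(1 - R^2_{cal})}{SE^2_{\hat{\lambda}_{cal}} R^2_{cal}} + 1 \tag{13}$$

Equation (13) requires the researcher to specify the desired $SE_{\hat{\lambda}_{cal}}$ and anticipated values of $\lambda_{cal}$ and $R^2_{cal}$.

In terms of choosing $SE_{\hat{\lambda}_{cal}}$, a value ≤0.051 is recommended, to target a 95% confidence interval for $\lambda_{cal}$ that has a narrow width ≤ 0.2 (eg, if the calibration slope was 1, the confidence interval would be 0.9 to 1.1 assuming confidence intervals derived by $\hat{\lambda}_{cal} \pm 1.96\,SE_{\hat{\lambda}_{cal}}$; note that replacing 1.96 by critical values of the t-distribution is unnecessary, as the sample size will not be small).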
In terms of choosing $\lambda_{cal}$, a simple starting point is to assume good calibration, such that $\lambda_{cal} = 1$ and $\alpha_{cal} = 0$ in model (3). Then, $R^2_{cal} = R^2_{val}$ from criterion (i), and so $R^2_{cal}$ might be assumed to be the same as the adjusted $R^2$ estimated in the model development study. For example, for external validation of a prediction model that had an estimated adjusted $R^2$ of 0.5 in the development dataset, a simple starting point is to anticipate the same value for $R^2_{val}$. Then, assuming the model's predictions will be well calibrated in the external validation dataset (ie, on average, fitting model (3) would give $\hat{\alpha}_{cal}$ of 0 and $\hat{\lambda}_{cal}$ of 1), using Equation (13) gives,

$$n = \frac{1^2 \times (1 - 0.5)}{0.051^2 \times 0.5} + 1 = 385.5$$

and thus 386 participants are required to target a confidence interval width of 0.2 for the calibration slope, under the assumptions of good calibration. The sample size should also be large enough to precisely estimate some miscalibration. Often when a prediction model is externally validated the calibration slope is less than 1, due to overfitting during model development that was unaccounted for in the final prediction model equation (ie, penalization or shrinkage estimation methods were not used). In such situations $R^2_{cal}$ can still be assumed to be the same as the adjusted $R^2$ presented for model development, as this value specifically adjusts for optimism due to overfitting. When applying Equation (13) for fixed $R^2_{cal}$ and $SE_{\hat{\lambda}_{cal}}$ values, lowering the assumed $\lambda_{cal}$ below 1 will produce lower sample sizes than when assuming the prediction model is well calibrated. Hence, assuming $\lambda_{cal}$ is 1 is more conservative for the sample size calculation. Further sensitivity analyses could be undertaken if desired. For example, we could change both $\lambda_{cal}$ and $R^2_{cal}$ values. However this is complex, as Equation (4) reveals that the value of $\lambda_{cal}$ depends on $R^2_{cal}$ (and also $\mathrm{var}(Y_i)$ and $\mathrm{var}(Y_{PREDi})$). Therefore, changing the assumed value of $\lambda_{cal}$ has implications for what the assumed value of $R^2_{cal}$ should be. This may be too intricate for the sample size calculation. Similarly, although situations of under-prediction (where $\lambda_{cal}$ is >1) may lead to larger required sample sizes, this may not be practical to consider as over-prediction situations are more common. Thus, we generally suggest applying Equation (13) assuming good calibration ($\lambda_{cal} = 1$) and setting $R^2_{cal}$ equal to the adjusted $R^2$ estimated for model development.
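A brief Python sketch of the criterion (iii) calculation is given below, using the closed form $n = \lambda^2_{cal}(1 - R^2_{cal}) / (SE^2 R^2_{cal}) + 1$, which reproduces the 386 participants quoted in the text. The function name, argument names, and defaults (target SE of 0.051, slope of 1) are assumptions made for the illustration.

```python
import math

def n_for_slope(r2_cal, se_target=0.051, slope=1.0):
    """Sample size for criterion (iii), rounded up to a whole participant.

    r2_cal    : anticipated R^2 of the calibration model
    se_target : desired standard error of the calibration slope
    slope     : anticipated true calibration slope (1 = good calibration)
    """
    if not 0.0 < r2_cal < 1.0:
        raise ValueError("r2_cal must lie strictly between 0 and 1")
    n = slope ** 2 * (1.0 - r2_cal) / (se_target ** 2 * r2_cal) + 1.0
    return math.ceil(n)

n_good = n_for_slope(0.5)                # 386: well calibrated, R^2 = 0.5
n_overfit = n_for_slope(0.5, slope=0.8)  # smaller, as the slope is below 1
```

The second call illustrates the point made in the text: holding $R^2_{cal}$ and the target SE fixed, assuming a slope below 1 lowers the required sample size, so a slope of 1 is the more conservative assumption.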

Criterion (iv): Precise estimates of residual variances
Our final criterion targets precise estimates of $\hat{\sigma}^2_{CITL}$ and $\hat{\sigma}^2_{cal}$. This is essential because, although these residual variances are not direct measures of predictive performance themselves, their estimated values are used in calculating parameter estimates and, crucially, $SE_{\widehat{CITL}_{val}}$ and $SE_{\hat{\lambda}_{cal}}$. For $\hat{\sigma}^2_{CITL}$, we can equivalently consider the sample size needed to precisely estimate a residual variance in a linear regression model with only an intercept (see model (8)). In such situations, Harrell suggests calculating the sample size to ensure the lower and upper bounds of a 95% confidence interval for the residual variance have a small multiplicative margin of error (MMOE) around the true value, 2 using

$$MMOE = \max\left(\sqrt{\frac{\chi^2_{1-\alpha/2,\,n-1}}{n-1}},\ \sqrt{\frac{n-1}{\chi^2_{\alpha/2,\,n-1}}}\right) \tag{14}$$

where $\chi^2_{1-\alpha/2,\,n-1}$ and $\chi^2_{\alpha/2,\,n-1}$ are the critical values of a $\chi^2$ distribution with $n - 1$ degrees of freedom for which there is, respectively, a probability of $1 - \alpha/2$ and $\alpha/2$ of being less than the critical value. The second term within the bracket of Equation (14) will typically give the largest MMOE.
We recommend a margin of error of within 10% of the true value (1.0 ≤ MMOE ≤ 1.1), for which Equation (14) reveals that a sample size of at least 234 participants is needed to ensure an MMOE ≤1.1 for $\hat{\sigma}^2_{CITL}$.
For precise estimation of $\hat{\sigma}^2_{cal}$, we need to adjust the sample size required for a slope parameter being estimated (see model (3)). As outlined by Riley et al, 4 the solution is simply 234 + 1, and thus 235 participants are required to ensure an MMOE of ≤1.1 for $\hat{\sigma}^2_{cal}$. Hence, in summary, at least 235 participants are needed to meet criterion (iv), and thus 235 is the minimum sample size required for any external validation of a prediction model for a continuous outcome, regardless of context and before consideration of criteria (i), (ii), or (iii).
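The MMOE calculation can be explored with the standard-library Python sketch below. To avoid external dependencies it substitutes the Wilson-Hilferty approximation for exact chi-square quantiles; this is a swapped-in approximation, not the method used in the article, and an exact quantile function (for example `scipy.stats.chi2.ppf`) could be used instead. All function names are illustrative.

```python
import math
from statistics import NormalDist

def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile (assumption:
    used here in place of an exact quantile; accurate for large df)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3

def mmoe(n, alpha=0.05):
    """Multiplicative margin of error with n - 1 degrees of freedom."""
    df = n - 1
    upper = math.sqrt(chi2_quantile_wh(1.0 - alpha / 2.0, df) / df)
    lower = math.sqrt(df / chi2_quantile_wh(alpha / 2.0, df))
    return max(upper, lower)

def min_n_for_mmoe(target=1.1, alpha=0.05, start=10):
    """Smallest n (searching upward from start) with MMOE <= target."""
    n = start
    while mmoe(n, alpha) > target:
        n += 1
    return n

n_intercept_only = min_n_for_mmoe()   # compare with 234 in the text
n_with_slope = n_intercept_only + 1   # one extra participant for the slope
```

With this approximation the search should land at, or within a participant or two of, the 234 reported in the text; small discrepancies would reflect the approximation rather than the criterion itself.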

Summary of the criteria
Our sample size criteria aim to ensure the external validation dataset will precisely estimate $R^2_{val}$, CITL, calibration slope, and residual variances. The approach requires a separate sample size calculation for each criterion, and the largest sample size calculated provides the minimum needed for the external validation study. A step-by-step guide to our proposal is provided in Figure 2.
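Taking the four criteria together, the overall calculation can be sketched as a single Python function that returns each criterion's sample size and their maximum. The function name and defaults are assumptions for illustration; criterion (iv) is entered as the fixed value of 235 stated in the text rather than recomputed, and the example inputs combine the illustrative values used earlier ($R^2 = 0.5$, outcome variance 400, target CITL SE of 2.55).

```python
import math

def minimum_sample_size(r2, var_y, se_citl, se_r2=0.0255,
                        se_slope=0.051, slope=1.0, n_residual=235):
    """Per-criterion sample sizes and the overall minimum (their maximum).

    Assumes R^2_val = R^2_CITL = R^2_cal = r2, as in the simple starting
    point described in the text.
    """
    sizes = {
        "i_r2": math.ceil(4.0 * r2 * (1.0 - r2) ** 2 / se_r2 ** 2),
        "ii_citl": math.ceil(var_y * (1.0 - r2) / se_citl ** 2),
        "iii_slope": math.ceil(slope ** 2 * (1.0 - r2)
                               / (se_slope ** 2 * r2) + 1.0),
        "iv_residual_variance": n_residual,  # fixed value from the text
    }
    sizes["minimum"] = max(sizes.values())
    return sizes

example = minimum_sample_size(r2=0.5, var_y=400.0, se_citl=2.55)
```

In this combined illustration the overall minimum is driven by criterion (i), whereas for a model with a much higher anticipated $R^2$ criterion (iv) tends to dominate.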

APPLIED EXAMPLE
We now illustrate our sample size proposal using an applied example. Hudda et al developed a prediction model for the natural logarithm of fat-free mass in children and adolescents aged 4 to 15 years, including 10 predictor parameters based on height, weight, age, sex, and ethnicity (see Box 1 for model equation). 20 The model is required to provide an estimate of an individual's current fat mass (weight − predicted fat-free mass). The apparent calibration of the model in the development dataset is shown in Figure 3A. In the development dataset, the estimated adjusted $R^2$ was 0.95. An initial external validation was undertaken in 176 children aged 11-12 years from the UK Avon Longitudinal Study of Parents and Children (ALSPAC) study, 35,36 where the model had an estimated $R^2_{val}$ of 0.90 (Figure 3B). However, as acknowledged by Hudda et al, further external validation is warranted in a broader age range, for which a sample size calculation can be undertaken using our proposal. We assume that the validation population is similar to the development population, and work through the calculations for criteria (i) to (iv).

FIGURE 2 Summary of the steps involved in our sample size calculation for external validation of a clinical prediction model for a continuous outcome

STEP 1: Calculate the sample size needed to precisely estimate $R^2_{val}$ (criterion (i))
This requires us to apply Equation (7). Based on assuming an $R^2_{val} = 0.90$, as in the published external validation of the model, and an $SE_{\hat{R}^2_{val}}$ of 0.0255 to target a confidence interval width of 0.1, a sample size of 56 children is required, as:

$$n = \frac{4 \times 0.9 \times (1 - 0.9)^2}{0.0255^2} = 55.4$$

It is sensible to also consider that the model may perform worse upon external validation, say with a 0.1 reduction in $R^2_{val}$ to 0.80. Then, the required sample size to meet criterion (i) is 197 children. These sample size values are also identified within Figure 1.

Therefore, based on the published information, $\mathrm{var}(Y_i) \approx 0.286^2 = 0.082$.
Interestingly, when we contacted the original study authors directly for this information, they calculated a similar value of $\mathrm{var}(Y_i) = 0.089$. We will use this value from the study authors going forward.
We must also specify the expected value for $R^2_{CITL}$. We begin by assuming $R^2_{CITL} = R^2_{val}$ and that this is 0.90, as in Hudda's initial external validation of the model.
The precision required to estimate CITL needs to be placed in context of the mean outcome value in the population. Hudda et al reported a median baseline fat-free mass of 24.8 kg. If we assume that the mean value is similar, then on the log scale of the outcome we have $\bar{Y} \approx \ln(24.8) = 3.21$. Considering the original untransformed scale, an accuracy of approximately ±1 kg around $\bar{Y}$ seems reasonably precise.

To ensure a 10% margin of error in residual variance estimates from the calibration models, at least 235 participants are required (see Section 3.4).

STEP 5: Calculate the final sample size
Assuming we aim to validate the model of Hudda et al in a population similar to the development data, steps 1 to 4 have provided four sample sizes to ensure criteria (i) to (iv) are met. These are summarized in Table 1. Based on the largest of these sample sizes, the final minimum sample size required to meet all criteria is 235 participants. This is driven by criterion (iv), to target sufficient precision around $\hat{\sigma}^2_{CITL}$ and $\hat{\sigma}^2_{cal}$.

WHAT IF SAMPLE SIZE FOR EXTERNAL VALIDATION IS FIXED?
Sometimes there are no resources for prospective recruitment of participants to a new study for external validation of a prediction model. Then, researchers might seek an existing (already collected) dataset from the target population of interest. However, the sample size of an existing dataset is fixed, and so the researcher (and other stakeholders such as funders and collaborators) needs to know if it is large enough for reliable external validation. In this situation, our calculations in steps 1 to 4 can be re-expressed to calculate the expected $SE_{\hat{R}^2_{val}}$, $SE_{\widehat{CITL}}$, $SE_{\hat{\lambda}_{cal}}$, and MMOE conditional on the known sample size and assumed values of $R^2_{val}$, $\mathrm{var}(Y_i)$, $R^2_{CITL}$, $R^2_{cal}$, and $\lambda_{cal}$ as before. For example, in the initial external validation of Hudda et al, an existing dataset, from the ALSPAC study, of 176 children was used. Based on the calculation shown in Table 1, this sample size is likely to give very precise estimates of $R^2_{val}$, CITL, and $\lambda_{cal}$ when assuming $R^2_{val} = R^2_{CITL} = R^2_{cal}$ is 0.9. However, the sample size is lower than the 235 recommended for precise estimation of $\hat{\sigma}^2_{CITL}$ and $\hat{\sigma}^2_{cal}$, and so the MMOE for these estimates is expected to be >10%. Nevertheless, when applying Equation (14) assuming 176 participants, the MMOE is 1.12, and thus the error is expected to be 12%, only just over the 10% recommendation. Hence, this existing dataset appears to have a reasonable sample size for external validation, which would have been useful for Hudda et al to know at the time.
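The re-expressed calculations for a fixed sample size can be sketched in Python as below, here applied to the 176 children with an assumed $R^2$ of 0.9 and $\mathrm{var}(Y_i) = 0.089$ from the applied example. The function names are illustrative, the SE expressions are the closed forms solved for SE rather than $n$, and the chi-square quantile uses the Wilson-Hilferty approximation (an assumption of this sketch, standing in for an exact quantile function).

```python
import math
from statistics import NormalDist

def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3

def expected_precision(n, r2, var_y, slope=1.0, alpha=0.05):
    """Expected SEs and MMOE for a fixed sample size n.

    Assumes R^2_val = R^2_CITL = R^2_cal = r2.
    """
    df = n - 1
    return {
        "se_r2": math.sqrt(4.0 * r2 * (1.0 - r2) ** 2 / n),
        "se_citl": math.sqrt(var_y * (1.0 - r2) / n),
        "se_slope": math.sqrt(slope ** 2 * (1.0 - r2) / ((n - 1) * r2)),
        "mmoe": max(
            math.sqrt(chi2_quantile_wh(1.0 - alpha / 2.0, df) / df),
            math.sqrt(df / chi2_quantile_wh(alpha / 2.0, df)),
        ),
    }

alspac = expected_precision(n=176, r2=0.9, var_y=0.089)
```

Under these assumptions the expected SEs for $R^2_{val}$ and the calibration slope fall below the suggested thresholds of 0.0255 and 0.051, while the MMOE is slightly above 1.1, in line with the conclusion drawn in the text.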

DISCUSSION
We have proposed closed-form sample size calculations for studies externally validating a prediction model for a continuous outcome. These aim to ensure the sample size is large enough to precisely estimate key measures of predictive performance ($R^2$, CITL, and calibration slope) and the residual variances in calibration models. This led to four criteria, and the largest sample size required across the four criteria is the recommended minimum sample size needed in the external validation dataset. Our work builds on minimum sample size calculations for model development. 4,38 As with any sample size calculation, assumptions are required to implement our proposed approach. In particular, researchers must specify the model's anticipated $R^2_{val}$, $\mathrm{var}(Y_i)$, and $\lambda_{cal}$ in the validation dataset. As discussed, a simple starting point is to assume these will be the same as those reported for the original model development study, especially if the target population (for validation) is similar to that in the model development study. Then the researcher might consider sample sizes based on slight adjustments; in particular, assuming the model may perform slightly worse than in the development dataset. Our example illustrated this for a prediction model of fat-free mass in children, where we assumed an $R^2_{val}$ of 0.8 rather than the 0.95 reported in the original model development study or the 0.90 estimated in the initial external validation. Lower values may be even more important to consider in situations where the development dataset was small (such that reported performance statistics were estimated with large uncertainty); where the developed prediction model did not adjust for overfitting using, for example, penalization and shrinkage techniques (such that reported performance statistics are likely to be optimistic); and where the intention is to validate the model in a different population or setting from that used in the development study.
Larger sample sizes may be needed if missing data are expected, and if a model's predictive performance in key subgroups (eg, males, females) is of interest.
Section 5 discussed how to use our calculations when an existing dataset (of a fixed sample size) is already available, in order to gauge the expected precision of estimates conditional on the sample size available. Ideally the dataset will be large enough to ensure precise estimates, as then more robust conclusions about predictive performance will be possible. However, we recognize that even when datasets are small, obtaining estimates of predictive performance is still useful; in particular, these could ultimately be combined in a meta-analysis. 39 It is important that datasets for external validation are high quality and applicable to the target population, setting, and timing of implementing the prediction model in practice. Adequate sample size does not overcome issues in quality and applicability. [39][40][41] We chose to focus on $R^2$, CITL, and calibration slope as these are key performance measures; ensuring precise estimation of residual variances is also important, as they are used to calculate the aforementioned predictive performance measures and also mean-squared error. We anticipate that the largest sample size will usually be driven by criterion (i), (iii), or (iv). Further work might consider precise estimation of calibration curves, 11,22,42 and extension to non-continuous outcomes is needed, building on work of others. 11,17,43 Closed-form sample size solutions are transparent and quick to implement, but more difficult to derive for binary and time-to-event outcomes. Jinks et al do suggest closed-form sample size calculations for precisely estimating the D statistic for time-to-event prediction models. 44 Also, we only focused on statistical measures of predictive performance, and not on clinical utility or impact of using the model to inform healthcare decisions (eg, initiation of treatment).
Finally, sometimes the sample size for an external validation dataset must also be large enough for model updating, for example, when the researcher aims to recalibrate one or a few of the model parameters to the target population of interest. Then, the required sample size needs to meet the criteria described in this article (for external validation), and also those criteria proposed for model development (as model updating is akin to model development 5 ). The exact sample size needed for model updating depends on how the model is to be updated (eg, which parameters, and indeed how many parameters, are to be revised) and whether additional predictors are to be included. Riley

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analysed in this study.

ETHICS STATEMENTS
Ethical approval for the ALSPAC study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. The views expressed are those of the authors and not necessarily those of the BHF, Cancer Research UK, the NHS, the NIHR, the Department of Health or the EU.