Novel predictions of invasive breast cancer risk in mammography screening cohorts

Mammography screening programs are aimed at reducing mortality due to breast cancer by detecting tumors at an early stage. There is currently interest in moving away from the age‐based screening programs, and toward personalized screening based on individual risk factors. To accomplish this, risk prediction models for breast cancer are needed to determine who should be screened, and when. We develop a novel approach using a (random effects) continuous growth model, which we apply to a large population‐based, Swedish screening cohort. Unlike existing breast cancer prediction models, this approach explicitly incorporates each woman's individual screening visits in the prediction. It jointly models invasive breast cancer tumor onset, tumor growth rate, symptomatic detection rate, and screening sensitivity. In addition to predicting the overall risk of invasive breast cancer, this model can make separate predictions regarding specific tumor sizes, and the mode of detection (eg, detected at screening, or through symptoms between screenings). It can also predict how these risks change depending on whether or not a woman will attend her next screening. In our study, we predict, given a future diagnosis, that the probability of having a tumor less than (as opposed to greater than) 10‐mm diameter, at detection, will be, on average, 2.6 times higher if a woman in the cohort attends their next screening. This indicates that the model can be used to evaluate the short‐term benefit of screening attendance, at an individual level.

and harms of screening, and an increased knowledge about, for example, genetic risk factors of breast cancer, proposals for individualized, risk-based screening [8][9][10][11] have attracted attention and some risk-based screening trials are even being carried out. 12,13 It is therefore useful to explicitly incorporate screening in models of risk prediction.
In order to calculate absolute risk, existing models rely on combining relative risk estimates with external data on breast cancer incidence rates and competing risks. 4 In this study, we present an alternative approach to risk prediction for screening cohorts which builds upon a natural history model of breast cancer. The approach introduced in this study jointly models the latent processes of breast cancer onset, tumor growth, symptomatic detection, and screening sensitivity, It incorporates each woman's individual screening history, and can be adapted according to projected future screening attendance. It also does not rely on external data on incidence rates, essentially because it models onset rather than diagnosis of cancer.
The remainder of the study is structured as follows. In the next section we describe the conventional approach adopted by existing breast cancer risk prediction models. We then describe the model we use, and how it can be trained on a cohort for a particular length of follow-up. Afterward, we describe how the trained model can be used to predict cancer incidence under a projected future screening attendance. Then, we present an analysis of data from a large mammography screening cohort, in which we compare our approach to the conventional approach. We conclude the study by discussing different aspects of our approach and possible extensions.

CONVENTIONAL APPROACHES TO RISK PREDICTION USING EXTERNAL INCIDENCE DATA
Although existing breast cancer risk prediction models differ, in particular with respect to which risk factors they use, most are based on the same, general procedure. This procedure can be summarized in terms of the following three steps: 1. Acquire relative risks for the risk factors you want to base your prediction on. These can be sourced from previous studies, or estimated by, for example, fitting a proportional hazards model to data from the study population. 2. Infer a baseline hazard function of age, by combining the relative risks with (external) population hazard rates of breast cancer incidence and competing risks. 3. Use the baseline hazard function and the individual relative risks to generate individualized risk predictions. This approach was first described for breast cancer 14 with a model that is known as the Gail model. That model focused, as we do, on invasive breast cancer. Mathematically, the final step amounts to calculating the probability that an individual, who is disease free at age b, will be diagnosed with the disease of interest (breast cancer) in a subsequent interval (b, b + z], which can be written as where D is age at detection. h 1 (t)r(t)dt is the instantaneous probability (at age t) of being diagnosed with breast cancer. The hazard of breast cancer diagnosis, h 1 (t), is estimated by multiplying age-specific breast cancer incidence rates by one minus the population attributable risk (which is equal to the relative risk model (t); see Bruzzi et al 15 ). The exponential term represents surviving without breast cancer from age b to age t. h 2 (t) is the age-specific hazard of dying from causes other than breast cancer, which is assumed to be the same for all individuals, and S 2 (t) = exp{−∫ t 0 h 2 (a)da} is the probability of surviving from competing risks up to the age t. S 2 (t)∕S 2 (b) is the conditional probability of surviving from other causes to age t, having survived from them up until age b.
Two risk prediction models that are based on the above approach are the Breast Cancer Risk Assessment Tool (BCRAT), 1,16 and the Breast Cancer Surveillance consortium (BCSC) model. 2,17 These models have incorporated a wide array of risk factors, such as family history, age at menarche, parity, BMI, mammographic density, polygenic risk score, and previous breast biopsies.
The International Breast Cancer Intervention Study (IBIS) uses a model commonly referred to as the Tyrer-Cuzick model. 3 While also following the above outline, IBIS takes a novel approach to incorporating family history into the risk prediction. It uses the detailed family histories of ovarian and breast cancer to infer a risk phenotype determined by the presence/absence of the high penetrance BRCA1/2 genes, 18 and a hypothetical low penetrance gene.
Software implementations of the approach outlined above have been developed (iCARE 19 ) and recently the performance of a new model based on relative risks from published literature has been compared to the BCRAT and IBIS models. 4 An example of a risk prediction model which deviates from the above general approach is the Rosner and Colditz model. 20,21 This is based on a nonlinear Poisson regression model, where the logarithm of the incidence rate depends on time-varying reproductive factors such as menarche, parity, and menopause. This type of model has also been used together with other risk factors, such as family history, polygenic risk score, BMI, alcohol intake, and mammographic density. 22 We describe each submodel in order, and show how each can be modified to include covariates. See Strandberg and Humphreys 24 for more details on fundamentals of the model.

Tumor onset
The first event in the natural history of breast cancer-from the continuous growth point of view-is the onset of the tumor. We use the Moolgavkar-Venson-Knudson (MVK) clonal expansion model. [25][26][27] Based on the Knudson two-hit hypothesis, 28  Cancer onset is then defined as the time T when the first malignant cell is formed. An issue with the MVK model is that its four parameters are not jointly identifiable from time-to-event data (eg, incidence data). However, Heidenreich et al 29 found a parameterization based on three parameters: =̃∕̃. With this parameterization, the hazard function for onset at age T = t is To study risk factors for breast cancer onset specifically, we turn the onset model into a proportional hazards model by letting the parameter depend on a set of k covariates {x t1 , x t2 , … x tk } according to the function Since is itself proportional to the first event ratẽ, we can interpret the covariates as increasing or decreasing the first event rate.
From the hazard function, we can derive the survival function to be with the corresponding probability density function

Tumor growth
After breast cancer onset, the cancer cells will proliferate, and the tumor will grow. We assume tumors are spherical, and grow according to an exponential function. The tumor volume at time z after onset is with an inverse growth rate parameter r > 0 and starting volume v 0 ≈ 0.065 mm 3 (which corresponds to a starting diameter equal to 0.5 mm). We assume that the tumor growth is largely deterministic after this point and also that tumors are detectable with nonzero probability. The exponential model has been widely used to model breast cancer tumor growth. [30][31][32][33] To account for the great variation in growth between individual tumors, we model the inverse tumor growth rate as a random effect R, where each tumor's inverse growth rate is a random draw r from R. The inverse growth rate R = r is related to the tumor volume doubling time through DoublingTime = ln(2) ⋅ r.
To incorporate covariates x r = x r1 , … , x rk we use the mean value parameterization of the gamma distribution when modeling the conditional inverse growth rate R|x r . The (conditional) probability density function is written as where ln( (x r )) = 0 + 1 x r1 + 2 x r2 + · · · + k x rk .
and E [R|X R = x r ] = (x r ), and Var (R|X R = x r ) = (x r ) 2 . The submodel for R| x r is a generalized linear model from the gamma family.

Symptomatic detection
Given enough time, a tumor will progress to a point where it will begin to display symptoms. We model the time U ′ from onset to symptomatic detection by using a continuous hazard function proportional to the current (latent) tumor volume, that is, a hazard function h U ′ given by at time from onset z for a parameter > 0. Thus, on average, fast growing tumors surface sooner than slow growing tumors.
To study factors relating to the rate of symptomatic detection, we can let the rate coefficient depend on a set of covariates x u = {x u1 , x u2 , … , x uk } through the function ln( ) = 0 + 1 x u1 + 2 x u2 + · · · + k x uk .

Screening sensitivity
Between the times of invasive breast cancer onset and symptomatic detection, there is an opportunity to detect the tumor early through mammography screening. We choose to model the screening test sensitivity (STS) at the time of screening , as a logistic function dependent on the current (latent) tumor diameter d( ). .
The linear predictor of the logistic function can be extended to include more covariates related to the screening sensitivity.
If we add the set of covariates x s = {x s1 , x s2 , … , x sk }, the STS model is Let Ω ∞ ∶= { 1 , 2 , … , n } be the set of all n screening ages in a woman's lifetime-past and planned for the future. Let Ω x = { k ∈ Ω ∞ | k ≤ x} ⊆ Ω ∞ be the woman's screening set up to, and including age x. It follows that for any two ages Let S be the age that a woman with breast cancer is screen-detected. The support for S is Ω ∞ ∪ ∞. If a woman is screen-detected at age S = k , k ∈ Ω ∞ , her screening history will consist of k − 1 negative screens (at ages Ω k−1 ), followed by a positive screen at age k . The woman's observed screening set is Ω s . The probability of the results from screening up to and including time k is

3.5
Training the risk prediction model Given a cohort of women who are healthy at start of follow-up (in the sense that they have not been diagnosed with breast cancer-either invasive or in situ), the above submodels can be assembled into a joint likelihood function of age at detection, mode of detection, and tumor size for each individual. Models can then be trained using maximum likelihood estimation. It can be shown 24 that a breast cancer case, with diagnosis of a tumor of volume v, at age x, has an individual likelihood contribution , if screen-detected and " … " refers to any additional covariates in the screening sensitivity submodel.
We have no information on the size of any possibly latent tumor for the censored women. If we want to determine their likelihood of not having a detected breast cancer by the censoring time (x), we must therefore compound over an additional variable-the inverse growth rate. Let U denote age of symptomatic detection (that would occur in absence of screening) and S denote the age of screen detection, and let D ∶= min(U, S) be the age at detection, by whichever mode occurs first. The likelihood contribution of a censored participant can be shown 24 to be The above likelihood contributions need to be adjusted for random left truncation before the likelihood can be properly evaluated. If a woman enters the study at age b without having been diagnosed with breast cancer, her likelihood contribution is conditional on this fact, that is, on D > b. The correct adjustment is given by: 24 where L * is the appropriate unadjusted likelihood contribution from the previous section (the tumor size v is only relevant for cases). A similar adjustment will also be necessary for making predictions of the future risk (next section).
After the adjustment, we can add up each woman's individual log-likelihood contribution: This log-likelihood can then be jointly optimized with respect to all parameters in all four submodels. In a previous study we have used simulations 24 to show that parameters are identifiable based on the observed outcome data.

SHORT-TERM RISK PREDICTION
Based on the above natural history/continuous growth model we can calculate probabilities of a number of outcomes of interest. We can of course predict diagnosis, as the conventional approach to risk prediction also does. Because we explicitly model tumor onset and growth and incorporate screening information, we can also make predictions of other events, such as tumor onset, mode-specific detection and even tumor size distributions. We note that, in the risk predictions below, competing risks in the form of death from causes other than breast cancer are not taken into account in these risk predictions. In our approach, we also do not consider in situ cancer as a competing risk, but instead censor individuals if they are diagnosed with in situ cancer during follow-up (Section 5.1).

Risk of onset
We begin by showing how to calculate a woman's short-term risk of onset of an invasive breast cancer. This is the conditional distribution of the age at onset T, given no detection up until the current time b, that is, the distribution of T|{D > b}. Note that no diagnosis before age b does not necessarily mean no onset before age b. Therefore, P(T ≤ b|D > b) ≠ 0, and should be interpreted as the probability of having a latent, undetected, tumor at age b. This could be used to predict the immediate risk of cancer, conditional on past screening results. The distinction between immediate and future risk has been emphasized in lung cancer screening, 34 for example.
We first obtain the CDF: .
Prediction of risk of onset is novel in the context of breast cancer risk prediction. Existing prediction models (ie, based on the conventional approach described above) focus on diagnosis. We now show how diagnosis can be predicted based on our model.

Short-term risk of breast cancer detection, either mode
Assuming that a woman currently of age b will attend the screenings planned for her in the interval (b, b + z], her z-year overall risk of breast cancer, detected by either mode, is Note that the screening set Ω b contains all of the negative screens up to the age at prediction b, and Ω b+z includes, in addition to the negative screens in (0, b], the screenings planned in the prediction interval (b, b + z], which are treated as negative when calculating L cen (b + z; Ω b+z ).

Short-term risk of diagnosis, specific mode
The overall predicted risk of breast cancer, as calculated above, can be separated into two mode-specific risks-the risk of being screen-detected at any of the planned screenings in (b, b + z] without being symptomatically detected beforehand; and the risk of being symptomatically detected in the intervals between each screen without being screen-detected beforehand.
The probability P(S = k , U > S) can be viewed simply as the censored likelihood contribution L cens ( k ; Ω k ), only with the result of the last screening at k changed from a negative into a positive screen. The easiest way to express the risk of symptomatic detection in an interval (b, b + z] which includes one or more planned screenings is to simply subtract the screening-risk above from the overall risk, that is,

Size-specific risk prediction (with and without screening attendance)
Consider a woman who is free from a breast cancer diagnosis up to age b, who has a screening history Ω b and is scheduled to attend her next screening at age ′ . It can be interesting to predict the effect that attending the scheduled screening F I G U R E 2 An illustration of two consecutive L sym surfaces: one between tumor size 0.5 and 15 mm and ages 61 to 62, before a screening at age 62; the other between tumor sizes 0.5 and 15 mm and age 62 to 63, following a screening at age 62.
will have on the BC outcome in the interval In particular, we address how effectively the screening at ′ will help to detect a tumor before it reaches a size d max . To do this, we need to make two counterfactual predictions-one where the woman attends the scheduled screening, and one where she does not. The most straight-forward approach for calculating these predictions is to integrate the individual likelihood contribution functions L scr and L sym over the relevant age and size intervals. The key is to understand that L scr is continuous over tumor size, and that within each interval between screenings L sym is continuous over both age and tumor size.
First, we change the tumor size variable in L * from the volume to the diameter. The Jacobian for this change of variable is d 2 ∕2. We introduce the following short-hand for the integrand we will use: In Figure 2 we provide an example of M sym in the age interval (b, x] = (61, 63] and tumor size interval d ∈ (0.5 mm, 15 mm], where a screening occurs at age ′ = 62 (the woman is assumed to have attended biennial screening since age 40, and to have received only negative results). The function is divided into two surfaces-one for before age 62, and one for after age 62. We see that each surface is continuous over age and tumor size, but that discontinuity between the two surfaces occurs at the time of screening.
In our considered scenario where we want to integrate over the area , we need to separate the probability into the following three components (the likelihood surfaces representing components 1 and 3 being represented in Figure 2): The probability of detecting a tumor with diameter d ≤ d max by age b + z, given no detected cancer before age b, is This can be intuitively extended to include multiple rounds of screening. For each additional screening round, the probability at the screening is added similarly to (2), and the interval between the added screening and the previous screening must be added similarly to (1) and (3).
If the woman does not attend the screening at ′ , then Ω b+z = Ω b and the predicted probability becomes an integral over a single age interval, namely

RISK PREDICTION IN A SWEDISH MAMMOGRAPHY SCREENING COHORT
We demonstrate our approach to risk prediction using the Karolinska Mammography Project for Risk Prediction of Breast Cancer (KARMA). 35 KARMA is an ongoing Swedish mammography screening cohort, in which 70 877 women attending the Swedish screening program were enrolled during the period between January 1, 2011 and March 31, 2013. The women were aged 40 to 74 at entry, and start of follow-up coincided with attending a regular mammography screening. In Sweden, women are invited to mammography screening every 18 to 24 months (as long as they are of screening age).
We divided the available follow-up in order to create a training set and a validation set. For training the model, that is, estimating the parameters in our onset, growth and detection submodels, we used follow-up data until August 31, 2017. We note that this is the same end of follow-up date as we used in a previous study based on KARMA, 36 before we obtained a more recent data update. The validation set for our current study used data from September 1, 2017 until October 31, 2019. A major reason for this temporal validation strategy is that, at the time, screening data was only available up until August 2017, while the diagnosis data was available until October 2019. Therefore, training could only be (reliably) done on the first time interval, while prediction could still be done on the later time interval.

5.1
Training the risk prediction model In our training data we only included women without a breast cancer diagnosis prior to study entry (left truncation). We also excluded women without a recorded screening history, or without a measurement of mammographic percent density (PD). PD is the proportion of radio-dense tissue in the breast and plays an important role in the risk and detection of breast cancer. 37,38 Women diagnosed with in situ breast cancer during follow-up were censored at time of diagnosis. This strategy has also been used in published risk prediction studies based on the conventional approach. 2 Among the invasive breast cancer incident cases, we excluded five cases without records of either date of diagnosis, tumor size, or mode of detection. This left us with 65 536 women, of which 1032 were diagnosed with invasive breast cancer during the follow-up to August 31, 2017. Subjects were censored if they died or were diagnosed with in situ breast cancer before the end of follow-up. A summary of the data used for training is presented in Table 1. Missing data was handled with mean-value (single) imputation. The highest missingness in the predictors was in second degree family history (37% missing). For all other predictors, missingness ranged from 5% to 8%. Since mammographic density is an important risk factor for breast cancer, 37-39 we wanted to include it in our onset submodel. Mammographic density is known to decrease with age, particularly during menopause. [40][41][42][43] However, we only had access to density measurements at baseline. Our solution to this was to assign each woman to an age-based quartile of PD (AQPD), and to assume that-while the PD will change with age, the assigned quartile will not. In other words, we assume that a woman belonging to the top quartile among 60-year olds today at age 60, also belonged to the top quartile among 30-year olds at age 30, and so on. With this approach we obtained a time-constant categorical variable with four levels (1-4), which we used as a covariate in our onset model. TA B L E 1 A summary of the variables used in the risk prediction model, for the training data and the validation data. The other covariates we included in our prediction model were: first and second degree family history of breast cancer, body mass index (BMI), age at menarche, history of benign breast disease, current use of hormone replacement therapy (HRT), parity status, and age at first childbirth (if parous) for the onset model; BMI for the inverse tumor growth rate model; total breast area (TBA) for the symptomatic detection model; and PD for the screening sensitivity model.
The variables were selected due to their previous associations with breast cancer risk, and for their occurrence in other risk prediction models. The onset submodel was the default submodel to include them in. A previous study using a similar natural history model 44 found associations between BMI and tumor growth rates, and between breast area and symptomatic detection rates. For this reason, those variables were chosen as example predictors to include in those respective submodels.
The continuous variables (BMI, TBA, PD, age at menarche, and age at first childbirth) were scaled by subtracting the mean and dividing by the SD. (This was done to more easily compare the effect sizes between the variables. The estimated coefficients represent the effect of increasing each variable by one SD.) Age at first birth was included as an interaction TA B L E 2 Parameter estimates for the risk prediction model fitted to the training data, including coefficients for a variety of known breast cancer risk factors in the four submodels. term with parity status, which is zero for all nulliparous women. Therefore, the coefficient for parity represents having the first childbirth at the average age for the population, vs having no children. We used the above variables, along with the individual screening histories, ages at study entry, ages at diagnosis or end of follow-up, and tumor sizes, to fit our prediction model to the training data. Parameter estimates are displayed in Table 2.
Based on this fitted model, we are able to produce a number of predictions which cover the time frame of the validation data. These predictions can then be compared to the observed outcomes of the validation data.

Risk prediction in the validation data and comparisons with the conventional approach
In our validation data, we started with individuals that were diagnosis-free as of August 31, 2017. We then excluded women aged above 74, since they would not be invited to additional screenings during the follow-up from September 1, 2017 to October 31, 2019. We also excluded women who had not attended a screening within the last 2 years of the training follow-up. We could not reliably assume that these women would attend screening in the validation follow-up. For our predictions, we then assumed for the remaining 47 575 women-based on whether they had previously been invited at 18-or 24-month intervals-that they would continue to be invited 18 or 24 months after their last screening, and that they would attend each screening. Women who died or were diagnosed with in-situ breast cancer during the 26-month follow-up were considered as noncases when evaluating the predictions against observed outcomes. A summary of the validation data can be found in Table 1.
To obtain a reference for the quality of our predictions, we compared them to predictions generated by the more conventional risk prediction approach. The procedure we used is similar to that used by Eriksson et al, 45 only with different covariates. We did as follows: 1. With the training data we performed a nested case-control study (four controls to one case) with the exact same covariates listed in Table 2 (only as risk factors in a proportional hazard model). 2. We retrieved external data on age-specific breast cancer hazard rates from the Association of the Nordic Cancer Registries (NORDCAN), and competing risk mortality rates from Statistics Sweden. 3. We combine the estimated odds ratios from (1) with the rates from (2), and used the R package iCARE 19 to predict the individual risks in the validation data.
In Figure 3 we summarize and compare the predictive performances of our natural history approach and the conventional approach, where we are predicting the overall risk of diagnosis within 26 months. On the left side, we sorted the women by their respective predicted risk, and produced a receiver operating characteristic (ROC) curve for each model. The area under curve (AUC) was very slightly higher for the natural history model (64.3% vs 63.2%). At a targeted specificity of 80%, the positive predictive value (PPV) was 1.18% for the natural history approach and 1.00% for the conventional approach, while the respective negative predictive values (NPV) were 99.54% and 99.50%. At a targeted sensitivity of 0.80, the PPV were 0.77% and 0.73%, and the NPV were 99.68% and 99.64%.
On the right side of Figure 3 we compare the distributions of predicted risk (of being diagnosed within the 26 months of follow-up) for the two approaches/models. For each, we stratify the women by their outcome in the validation follow-up. This lets us also compare the predicted risks of the cases and the censored individuals for the two models. We observed an incidence of 0.0060 in the validation follow-up. With the natural history model, we overall underestimate the risk as 0.0056 (underestimated by 7%), while we overestimate the risk as 0.0082 with the conventional model (overestimated by 37%). The average predicted risks for the cases were 0.0075 and 0.0104, respectively, for the two models. These average risks for the cases were 33% and 27% higher than the respective average risks for the non-cases. This indicates that the natural history model obtains a slightly better discrimination, and better calibration in-the-large. Calibration plots can be found in Figure A1. The estimated calibration slopes were 0.88 for the natural history model, and 1.14 for the conventional model.

Mode-specific predictions
To showcase that our prediction model is capable of making more specific predictions than other models, we predict the proportion of screen-detected vs symptomatic cancers in the validation data. We also predict the proportion for different subsets of the validation data, based on PD, BMI, and TBA. The women were partitioned based on if they belonged above or below the medians of each of these covariates. This created eight subsets for which we separately predict the number of screen-detected and symptomatic cases, using the formulas for mode-specific risk of diagnosis. These two predicted probabilities were summed up over all women in the subset to get the expected number of cases for each mode and subset, and observed-over-expected (O/E) ratios were calculated (with 95% CIs estimated using log(O∕E) ± 1.96 ⋅ O −0.5 ).
The results from this analysis are presented in Table 3. The calibration was better for screen-detected cases than for symptomatic cases, particularly among cases with PD above the median. The small sample sizes for each made it difficult to discern patterns, other than a tendency for symptomatic cases to under-predict the number of cases among those with high PD, and over-predict for those with low PD.
The   the tumor growth rate, where faster growing tumors are less likely to be caught by screening before they surface symptomatically; and rate of symptomatic detection, where a higher rate increases the probability of finding a tumor before or between screenings. (Risk factors for the age at onset should not affect the proportion as long as the screening intervals are independent of the onset risk-factors. They should instead increase the incidence of both screen-detected and symptomatic cancers equally.)

Predicting tumor size
Next, we predicted the tumor size distribution in the validation follow-up. In presenting the results, we separate the predictions by mode, that is, we obtain one distribution for screen-detected cases, and another for symptomatic cancers.
Results are displayed in Figure 4. On the left, the histograms show the observed tumor diameters for the two modes, and the lines show the corresponding averages of the individual predicted size distributions, weighted by each individual's predicted risk for the specific mode of detection. On the right side of Figure 4, we show a sample of 10 individual women's predicted size distributions. Note that these curves do not integrate to one-they represent the predicted risk of being detected and being detected at a certain size. We can see that-while the screen-detected predictions are satisfactory-the symptomatic predictions overestimate the tumor sizes. For symptomatic cases, the mean of the predicted tumor size distribution was 24.2 mm, while the observed was 20.1 mm. For screen detected cases the corresponding values were 14.7 and 14.8 mm. attending the next screening (red), or not attending (gray). One figure for each category of age-based quartile of percent density.

Predicting the effect of screening attendance on detected tumor size
Finally, we predicted the probability of detecting a tumor at diameter d < 10 mm for each woman in the validation data based on whether she hypothetically (i) attends and (ii) does not attend, her next planned screening. In order to account for the increased overall risk in the limited follow-up due to attending screening, we divided each predicted size-constrained risk of tumor diameter d < 10 mm by the predicted overall risk, thus obtaining two conditional probabilities (of a small tumor), for each woman. We then stratified the predicted probabilities based on AQPD values. Histograms of the two sets of conditional probabilities are presented in Figure 5. Overall, the average predicted probability of a small (< 10 mm) tumor was 0.22 if attending the screening, and 0.09 if not attending. The average ratio of these probabilities was 2.59. Stratifying by AQPD; the average (conditional) probabilities for the lowest quartile (AQPD = 1) were 0.26 if attending, and 0.08 if not attending (average ratio 3.62). Comparatively, women with AQPD = 4 had an average (conditional) probability of 0.17 if attending, and 0.10 if not attending (average ratio 1.73).
Additionally, we repeated the predictions with an extended follow-up of 10 years (but still with only one planned screening). After 10 years, we might assume that all tumors detected by the screening in the attending scenario will have had time to be symptomatically detected in the nonattending scenario. These predictions, which might be considered to be more relevant, are presented in Figure 6.

DISCUSSION
Conventional approaches to breast cancer risk prediction are flexible in their ability to incorporate risk factors and context-specific incidence rates. However, they do not explicitly incorporate individual screening histories. A study by Kerlikowske et al 46 accounted for the presence of two different screening programs in their data by stratification. In this study, we have introduced a novel approach to risk prediction in screening cohorts based on natural history modeling and have shown that it can produce equal (or marginally better) risk predictions of incidence, while also incorporating individual screening histories. Our approach essentially treats symptomatic detection and screen detection as competing events.
As mentioned in the introduction, there is interest in moving toward risk-based screening for breast cancer, [8][9][10][11] and several trials have begun. Two examples are the WISDOM study in the United States, 12 which uses the BCSC model for its risk prediction, and BC-Predict in the United Kingdom, 13 which uses a Tyrer-Cuzick model. Based on short-term predictions from these models, the interval of the next mammography screening could be modified, and possibly supplemented with other methods of screening (eg, magnetic resonance imaging).
The natural history approach as presented in this study is not only able to perform absolute risk predictions, but can also potentially predict the effect that a change to screening will have-and on an individual level. In this study, we predicted (among the women in our cohort) the probability of detecting a tumor before it reaches 10 mm in diameter-both if they were to attend their next planned mammography, or not. This is a useful comparison since tumor size is, for example, indicative of metastatic spread. With the same method, the natural history could predict the same probability if, for example, the screening was moved to be sooner, or an additional screening was added. Of course, the effect that increased screening of high-risk women (or decreased screening of low-risk women) may have on the clinical outcome of the cancer can, however, not be estimated by these risk prediction models without also predicting breast cancer survival and accounting for competing risks. Therefore, we should be cautious with interpreting the results, since some of the predicted screen-detected cases are overdiagnosed. 47 One approach to study this is to use these models in a broader simulation, as we have previously outlined. 48 The modeling approach used in this new risk prediction relies heavily on its parametric assumptions. They are necessary to piece together the latent natural history based only on data from diagnosis. The exponential growth assumption has been shown in vivo to hold reasonably well for breast cancer compared to alternatives. 49,50 Also, the tumor volume doubling times estimated in a previous study 36 are similar to those estimated in vivo. 50,51 The MVK model is based on the Knudson two-hit hypothesis, 28 and thus has a cancer biology connection. While the hypothesis is outdated, we are primarily after a flexible a flexible function that can resemble a time-shifted version of a breast cancer incidence rate curve. It is also desirable for having closed expressions for its hazard, survival, and density functions. Because of these strong parametric assumptions, we felt it was important to make comparisons with the conventional approach to risk prediction. Overall, for risk prediction, given our results, the assumptions that we make appear not to be overly restrictive.
It is worth noting however that the calibration of the natural history model is not optimal. This is most noticeable in Table 3 for some of the categories of symptomatic cases, and for the symptomatic tumor sizes in Figure 4. This indicates that the feature differences between screen-detected and symptomatic cancers can be improved. For the covariates used in the submodels (Table 3), alternative relationships/covariate definitions may improve the results. Mammographic density in particular plays an important role in both risk and screening sensitivity and more complex modeling with respect to this covariate might be useful. Alternatives to the logistic function used for the screening sensitivity could also be explored. 52 In this study we have treated each risk factor as a time-constant variable. With some additional work, our model could be made to handle piece-wise constant risk factors in the onset submodel. Data on the specific time-points where these, for example, HRT use or menopause status, changes occur, would of course be needed. While this information was not available in our study data, it is feasible that in other situations it could be, for example, in the context of clinical use, where updated information can potentially be collected at each screening. Seeing as mammographic density is a particularly important factor both for breast cancer risk and for screening sensitivity, longitudinal modeling of individual PD could be worthwhile when performing risk prediction, provided that multiple measures of PD are available for each woman. Mammographic density change with age could, for example, be jointly modeled using a Gompertz function. 43 Competing risks (mortality) are not currently included in our natural history approach, whereas they are catered for in conventional risk prediction methods. Unfortunately, it is not entirely straightforward to incorporate them in our approach. In the Appendix, we have included a derivation of an upper and lower bound to the risk prediction in the presence of competing events, which is numerically manageable to evaluate. To otherwise get the exact risks, a triple integral would need to be solved numerically. We will continue to explore competing risks for our model in the future, but considering, in our application, the age of the women being under 74 and the short-term prediction, we do not think that the impact of its inclusion would be very significant here. We repeated the iCARE analysis without competing risks (by setting the competing risk rates to zero) and saw no noticeable difference in AUC values (a difference of 0.0003) or in mean predicted risks (a difference of 0.00005). As expected when omitting competing risks, the predicted risk is overestimated. We do not see how the prediction differences would be any more (or less) different for the natural history approach. While this might not be a serious limitation for such short-term predictions at the current Swedish screening ages, we can expect the difference to be greater when considering, for example, 5-year, 10-year, or lifetime risk-or in settings where the screening age is higher or the average life expectancy is lower.
As pointed out earlier, we focused on developing a risk prediction model for invasive breast cancer, as most other approaches have done, and censored women after an in situ diagnosis. An alternative approach would be to treat in situ as a competing risk, which is a strategy that has been used by Tice et al. 17 Both are however pragmatic solutions. Ideally one would like to model the invasiveness process, but this would substantially complicate the method.
In this study, we limited the risk factors under consideration to common epidemiological variables already used by the conventional breast cancer risk prediction models. Other risk factors can be incorporated, for example polygenic risk scores (PRSs) for breast cancer. Mavaddat et al 53 found that the association between a PRS based on 333 single nucleotide polymorphisms (SNPs) and breast cancer risk persisted even when adjusting for family history of breast cancer, with estimated odds-ratios of 1.56 and 1.71, respectively, for women with and without family history. The inclusion of a PRS in our risk prediction model therefore has great potential to improve the predictions. For comparison, Zhang et al 23 included a PRS based on 67 SNPs in both the Gail model and the Rosner-Colditz model. They found that the AUC for predicted 5-year risk increased from 0.62 to 0.65 in the Gail model, and from 0.65 to 0.68 in the Rosner-Colditz model when adding the PRS to the other risk factors. If our approach can be extended to include a wider range of risk factors and more comprehensively deal with the complexities of competing risks and in-situ cancers, it would be more convincing as a tool for clinical users. After more careful consideration of risk factors for inclusion, it should undergo thorough evaluation for development as a prediction model. 54,55 There is room and opportunity for advanced statistical models based on the natural history of breast cancer to provide better understanding of breast cancer and to identify women at high risk. By adequately modeling the underlying processes leading up to detection, and by explicitly modeling breast cancer screening, one can make more specific predictions about various features the breast cancer diagnosis. These features are important to compare between proposed screening changes. In this study, we have presented one .
Similarly, we can obtain the PDF: A.2 Derivation of the short-term overall risk prediction (Section 4.2) )) drdt.

A.4 Bounds for prediction in the presence of competing risks
In the presence of a risk of a competing event U (with survival function G U (u) and density probability density function f U (u)), the risk of breast cancer detection is expressed as where the enumerator is The integral ∫ b+z b P(D > u)f U (u)du is a triple integral which has to be evaluated numerically. Alternatively, much simpler calculations can be done to produce upper and lower bounds for this integral-and by extension for the risk. By using the fact that P(D > u) consists of two terms, the first of which is G T (u), an upper bound can be derived as which is a relatively easy single integral (given a closed form for G U (u)). Similarly, a lower bound can be expressed as These lead to the lower and upper bounds for the risk of breast cancer detection in the presence of competing risks

F I G U R E A1
Calibration plots comparing the natural history approach to the conventional approach. The curves were fitted using logistic regression of the observed outcomes on a logit transformation of the model-predicted risks. The histograms show the distributions of the predicted risks. Closer adherence to the diagonal line indicates better calibration.