Heterogeneity in risk of prostate cancer: A Swedish population‐based cohort study of competing risks and Type 2 diabetes mellitus

Most previous studies of prostate cancer have not taken into account that men in the studied populations are also at risk of competing events, and that these men may differ in their susceptibility to prostate cancer. The aim of our study was to investigate heterogeneity in risk of prostate cancer, using a recently developed latent class regression method for competing risks. We further aimed to elucidate the association between Type 2 diabetes mellitus (T2DM) and prostate cancer risk, and to compare the results with those of conventional methods for survival analysis. We analysed the risk of prostate cancer in 126,482 men from the comparison cohort of the Prostate Cancer Data base Sweden (PCBaSe) 3.0. During a mean follow‐up of 6 years, 6,036 men were diagnosed with prostate cancer and 22,393 men died. We detected heterogeneity in risk of prostate cancer, with two distinct latent classes in the study population. The smaller class included 9% of the study population; men in this class had a higher risk of prostate cancer, and the risk was more strongly associated with class membership than with any of the covariates included in the study. Moreover, we found no association between T2DM and risk of prostate cancer after removal of the effect of informative censoring due to competing risks. The recently developed latent class method for competing risks could provide new insights in precision medicine, with the aim of classifying individuals by their susceptibility to a particular disease, reaction to a risk factor, or response to treatment.


Introduction
Underlying assumptions in most common methods for survival analysis are that censoring is non-informative and that the association between risk factors and the event of interest is homogeneous in the population studied. 1 However, these assumptions are often violated or neglected. First, informative censoring may occur when the studied risk factor (e.g., smoking, obesity, metabolic syndrome) is associated with death and a shorter life expectancy. [2][3][4] Methods based on competing risk analysis can provide reliable real-life estimates of risk in these study settings, but the results are often difficult to interpret in terms of etiological hypotheses, 5,6 especially if the risk factor is associated with both time to death and the risk of the disease under study. Second, risk factors or other studied covariates (e.g., treatments, drugs, and dietary exposures) may have different associations with risk within subclasses of the population studied, 7,8 and previous reports of heterogeneity in cancer risk suggested that the majority of cases arise from a small susceptible part of the population. 9 Thus, there may be latent classes within a population with a known or unknown geno- or phenotype that makes their members more susceptible to disease, at higher risk in association with a specific covariate, or more or less likely to respond to a specific treatment, in line with the theories behind precision medicine. 10

Researchers investigating metabolic aberrations, metabolic diseases, or drugs for metabolic diseases in relation to risk of prostate cancer face a complex task. Observational studies suggest that men with Type 2 diabetes mellitus (T2DM), [11][12][13][14][15][16][17] and men on anti-diabetic drugs, [18][19][20] have a decreased risk of developing prostate cancer. Moreover, other studies indicate that men with metabolic aberrations have a higher risk of aggressive or fatal prostate cancer. [21][22][23][24] Because prostate cancer generally occurs at an older age, competing risk in terms of death is an issue, 25,26 especially since metabolic disease is associated with a shorter life expectancy. One previous study concluded that individuals with diabetes at a given age have a smaller lifetime risk of cancer than individuals without diabetes at the same age, attributable to the higher mortality rates among individuals with diabetes. 27 Moreover, its authors noted that differences in cancer occurrence between individuals with and without diabetes were quantitatively smaller than the differences in mortality between the two groups. 27 In line with these results, we have previously investigated prostate cancer risk in relation to other competing causes of death and reported that the decreased risk of prostate cancer among men with metabolic aberrations is of much smaller magnitude than the increased risk of death among those men, as compared with men with normal metabolic levels. 28 We have also used other methods for competing risks to investigate similar research questions with respect to prostate cancer, 29,30 but to the best of our knowledge there is no established method that yields straightforward interpretations when analysing etiological associations in these study settings.
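The point that higher mortality alone can lower lifetime cancer risk can be illustrated with a minimal Python sketch. It assumes constant (exponential) cause-specific hazards for cancer and for death, with hypothetical annual rates that are not taken from the cited studies. Under that assumption the probability of a cancer diagnosis before death within t years is λc/(λc + λd) · (1 − exp(−(λc + λd)t)), so raising only the death rate lowers the cumulative cancer risk even though the cancer hazard itself is unchanged.

```python
import math


def cumulative_incidence(cancer_rate, death_rate, years):
    """Probability of a cancer diagnosis before death within `years`,
    assuming constant cause-specific hazards (per year) for both events."""
    total = cancer_rate + death_rate
    return cancer_rate / total * (1.0 - math.exp(-total * years))


# Hypothetical rates: identical cancer hazard, different mortality
cancer_rate = 0.01
for label, death_rate in (("lower mortality", 0.02), ("higher mortality", 0.04)):
    risk = cumulative_incidence(cancer_rate, death_rate, 20)
    print(f"{label}: 20-year cancer risk = {risk:.3f}")
```

With these hypothetical rates the 20-year cancer risk falls from about 0.150 to about 0.126, purely because more men die before a diagnosis can occur.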
Motivated by these limitations of conventional survival analysis, some of the present authors recently developed a method based on latent class analysis to handle competing risks and cohort heterogeneity. 31 The aim of the current study was to apply this method to investigate heterogeneity in risk of prostate cancer, classified into risk categories defined by tumor characteristics at diagnosis. We further aimed to elucidate the etiological association between T2DM and risk of prostate cancer, and to compare the results with those derived from conventional methods for survival analysis.

Participants
We applied a method based on latent class analysis for competing risks, 31 to a prospective cohort study of the comparison cohort of Prostate Cancer Data base Sweden (PCBaSe) 3.0. For each man with prostate cancer in the National Prostate Cancer Register of Sweden, five prostate cancer-free men were randomly selected from the background population into the comparison cohort, matched on birth year and county of residence. 32 Through the Swedish 10-digit personal identity number, the men were linked to a number of national health care registers and demographic databases. The study was approved by the Research Ethics Board at Umeå University, Sweden.
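The 1:5 matched selection of the comparison cohort can be sketched as stratified random sampling. This is a minimal illustration, not the PCBaSe procedure: the record layout (dicts with hypothetical keys "id", "birth_year" and "county"), the assumption that the background pool holds only cancer-free men, and the rule that each man is used at most once are all assumptions made here.

```python
import random
from collections import defaultdict


def select_comparison_cohort(cases, background, n_per_case=5, seed=1):
    """Draw up to `n_per_case` cancer-free men per case, matched on birth
    year and county of residence (hypothetical record layout, see above).
    Each background man is used at most once, which is an assumption."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for man in background:
        strata[(man["birth_year"], man["county"])].append(man["id"])

    matches = {}
    for case in cases:
        stratum = strata[(case["birth_year"], case["county"])]
        picks = rng.sample(stratum, min(n_per_case, len(stratum)))
        for man_id in picks:
            stratum.remove(man_id)  # sample without replacement across cases
        matches[case["id"]] = picks
    return matches


cases = [{"id": "case1", "birth_year": 1940, "county": "AC"}]
background = [{"id": f"m{i}", "birth_year": 1940, "county": "AC"} for i in range(8)]
print(select_comparison_cohort(cases, background))
```

Sampling within (birth year, county) strata is what makes the comparison men comparable to the cases on those two factors; a real implementation would also need rules for strata with fewer than five eligible men.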
For the current study, we selected men who entered the comparison cohort between January 1, 2007, and December 31, 2009, and were between 55 and 84 years old. The men were followed from entry date until the date of prostate cancer diagnosis, date of death, or December 31, 2014, whichever occurred first. Time in follow-up was used as the timescale in all analyses.
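The inclusion and follow-up rules above can be written as a small sketch. The function names are hypothetical; treating the age limits as inclusive, resolving same-day ties in favour of prostate cancer and then death, and converting days to years with 365.25 days per year are assumptions not stated in the text.

```python
from datetime import date

STUDY_END = date(2014, 12, 31)


def age_at(birth, on):
    """Age in completed years on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def is_eligible(entry, birth):
    """Entry between 2007-01-01 and 2009-12-31, aged 55-84 (inclusive, assumed)."""
    in_window = date(2007, 1, 1) <= entry <= date(2009, 12, 31)
    return in_window and 55 <= age_at(birth, entry) <= 84


def follow_up(entry, pc_diagnosis=None, death=None):
    """Exit at the first of prostate cancer diagnosis, death, or study end.

    Returns (exit_date, years_of_follow_up, status). min() keeps the first
    of equal dates, so ties favour prostate cancer, then death (assumed).
    """
    candidates = []
    if pc_diagnosis is not None:
        candidates.append((pc_diagnosis, "prostate cancer"))
    if death is not None:
        candidates.append((death, "death"))
    candidates.append((STUDY_END, "censored"))
    exit_date, status = min(candidates, key=lambda c: c[0])
    return exit_date, (exit_date - entry).days / 365.25, status


print(follow_up(date(2008, 3, 1), death=date(2012, 6, 15)))
```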

Covariates in study
We retrieved data from the National Patient Register on discharge diagnoses from hospital admissions up to ten years prior to the date of inclusion in the comparison cohort. These data were used to calculate the Charlson Comorbidity Index (CCI), categorised into no comorbidity (0), mild (1), moderate (2), and severe (3 or more) comorbidity as described previously. 33,34 Data on educational level were retrieved from the Longitudinal Integration Database for Health Insurance and Labor Market Studies at Statistics Sweden, and were categorized into three groups based on duration of education: ≤9 years, 10-12 years, and ≥13 years, corresponding to elementary school, secondary school, and higher education. 35 T2DM status was defined by antidiabetic drug prescriptions in the Prescribed Drug Register, classified according to the Anatomical Therapeutic Chemical (ATC) Classification System. We retrieved data on prescriptions of metformin (A10BA or A10BD), insulin (A10A), and sulphonylurea (A10BB), and classified T2DM status as use of no antidiabetic drugs, metformin, or insulin/sulphonylurea. This categorization was based on the national guidelines for diabetes care and the pattern of use of anti-diabetic drugs in Sweden. 36 We used age at start of study, T2DM status, education level, and CCI as covariates in all analyses. T2DM status and CCI were recorded at start of study. Covariates were transformed to z-scores (mean = 0, standard deviation = 1) prior to analysis.

What's new? Men with type 2 diabetes have a decreased risk of developing prostate cancer, but metabolic disease is associated with a shorter life expectancy. Here the authors applied a new method to handle competing risks and cohort heterogeneity and found no association between type 2 diabetes mellitus and prostate cancer risk after isolating the effect of competing risks. This method may support precision medicine by correctly classifying individuals regarding susceptibility to a particular cancer or response to treatment.
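The covariate coding described in the "Covariates in study" paragraph can be sketched as follows. The ATC prefixes and the CCI cut-points come from the text, and the ordering (insulin/sulphonylurea outranking metformin) follows the footnote to the baseline table. The function names are hypothetical, and the use of the population rather than the sample standard deviation for the z-scores is an assumption.

```python
import statistics


def t2dm_status(atc_codes):
    """Ordered T2DM status from dispensed ATC codes.

    0 = no antidiabetic drug, 1 = metformin (A10BA or A10BD),
    2 = insulin (A10A) or sulphonylurea (A10BB). Group 2 outranks group 1,
    so men in group 2 may also use metformin.
    """
    codes = [code.strip().upper() for code in atc_codes]
    if any(code.startswith(("A10A", "A10BB")) for code in codes):
        return 2
    if any(code.startswith(("A10BA", "A10BD")) for code in codes):
        return 1
    return 0


def cci_category(score):
    """0 = none, 1 = mild, 2 = moderate, 3 = severe (CCI score 3 or more)."""
    return min(score, 3)


def z_scores(values):
    """Standardise to mean 0 and SD 1 (population SD; an assumption)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


statuses = [t2dm_status(codes) for codes in ([], ["A10BA02"], ["A10BA02", "A10AE04"])]
print(statuses, z_scores(statuses))
```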

Endpoints assessed
Data on prostate cancer diagnosis were obtained from the National Prostate Cancer Register of Sweden, which includes information on date of diagnosis, age at diagnosis, tumour stage and differentiation, serum level of prostate-specific antigen (PSA) at diagnosis, and primary treatment. 31 Prostate cancer risk categories were defined at diagnosis according to a modification of the National Comprehensive Cancer Network Guideline: 37 low-risk: T1-2, Gleason score 2-6 and PSA < 10 ng/ml; intermediate-risk: T1-2, Gleason score 7 and/or PSA 10-20 ng/ml; high-risk: T3 and/or Gleason score 8-10 and/or PSA 20-50 ng/ml; metastatic disease: T4 and/or N1 and/or PSA 50-100 ng/ml (regional metastases), or M1 and/or PSA > 100 ng/ml (distant metastases). 32 We grouped these risk categories into favorable-risk prostate cancer (low- and intermediate-risk) and aggressive prostate cancer (high-risk and metastatic disease). 38 Date and cause of death were obtained from the Cause of Death Register; deaths from cardiovascular diseases were defined by codes I00.0-I99.9 of the International Classification of Diseases, 10th revision (ICD-10). We used four endpoints in the analyses: favorable-risk prostate cancer, aggressive prostate cancer, death from cardiovascular diseases, and death from other causes.
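The risk categories and the four endpoints above can be sketched as a classifier that checks the most severe category first, so that any single criterion (for example PSA > 100 ng/ml) is enough to move a man into that category. The function names and input encoding (T stage as an integer 1-4, PSA in ng/ml) are hypothetical, and the handling of the shared PSA boundaries (10, 20, 50, 100 ng/ml) is an assumption because the text does not say which category each boundary value belongs to.

```python
def risk_category(t_stage, gleason, psa, n1=False, m1=False):
    """Modified NCCN risk category at diagnosis, most severe checked first.

    t_stage: clinical T stage as an integer 1-4; psa in ng/ml. Placement of
    the boundary PSA values is an assumption.
    """
    if m1 or psa > 100:
        return "distant metastases"
    if t_stage == 4 or n1 or psa >= 50:
        return "regional metastases"
    if t_stage == 3 or gleason >= 8 or psa >= 20:
        return "high"
    if gleason == 7 or psa >= 10:
        return "intermediate"
    return "low"  # T1-2, Gleason 2-6, PSA < 10


def endpoint(pc_category=None, death_icd10=None):
    """Map an observed event to one of the four endpoints (None = censored)."""
    if pc_category is not None:
        if pc_category in ("low", "intermediate"):
            return "favorable-risk prostate cancer"
        return "aggressive prostate cancer"
    if death_icd10 is not None:
        # ICD-10 codes I00-I99 cover diseases of the circulatory system
        if death_icd10.strip().upper().startswith("I"):
            return "cardiovascular death"
        return "death from other causes"
    return None


print(endpoint(risk_category(2, 7, 12.0)))
```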

Survival analysis
We investigated heterogeneity in risk of prostate cancer, and the association between T2DM status and risk of prostate cancer separated from the effect of informative censoring due to competing endpoints, using the latent class model for competing risks described previously 31 and outlined briefly below. In contrast to conventional survival models, the model does not assume homogeneous associations between covariates and endpoints, and it takes into account the simultaneous risks of all four endpoints. It assumes that the study population may consist of one or several latent classes, and that risk differences between classes are induced by latent heterogeneity, that is, heterogeneity not captured by the covariates, which can be quantified by relative frailty and/or by variability in covariate associations or base hazard rates. The proportional hazards assumption is made only for each endpoint within each latent class, and is not required to hold for the full study population collectively. If no latent classes are found in the study population, the method automatically reduces to a model with settings similar to the Cox proportional hazards model. Associations of covariates and latent class membership with the risk of all endpoints were quantified as hazard ratios (HRs) from the latent class model for competing risks; the HRs for latent class membership are defined by relative frailties. The calculations were performed with the software package Advanced Latent Class Prediction and Competing Risk Analysis version 0.2 (ALPACA; A.C.C. Coolen, M. Rowley, M. Inoue, London, UK). The results from the latent class model for competing risks were compared with HRs from Cox and Fine and Gray regression models. 39 In the Cox models, we estimated the risk of each endpoint separately and censored all other endpoints, whereas in the Fine and Gray regression models all other endpoints were handled as competing risks. These models were fitted in STATA MP/2 version 14 (StataCorp LP, College Station, Texas).
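Why proportional hazards within each latent class need not imply proportional hazards in the whole population can be shown with a small numerical sketch. It is not the ALPACA model: it assumes a single endpoint with no competing risks, constant class-specific base rates, and the same hypothetical exposure HR of 1.5 in both classes. Because exposed men in the high-risk class experience the event faster, that class is depleted sooner among the exposed, and the population-level HR drifts away from 1.5 over time.

```python
import math


def population_hr(t, weights, base_rates, class_hrs):
    """Population-level hazard ratio (exposed vs unexposed) at time t.

    Each latent class has a constant base rate and its own proportional-
    hazards effect of a binary exposure; only one endpoint is modelled
    (no competing risks), which is a simplification.
    """
    def hazard(exposed):
        num = den = 0.0
        for w, rate, hr in zip(weights, base_rates, class_hrs):
            h = rate * hr if exposed else rate
            surv = math.exp(-h * t)  # event-free fraction of this class at t
            num += w * h * surv
            den += w * surv
        return num / den

    return hazard(True) / hazard(False)


# Hypothetical two-class population with the same HR of 1.5 in both classes
for t in (0, 5, 10):
    print(t, round(population_hr(t, [0.91, 0.09], [0.005, 0.08], [1.5, 1.5]), 3))
```

With a single class the function returns 1.5 at every time point, which mirrors the statement that the method reduces to a Cox-like model when no latent classes are present.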

Latent class model for competing risk
As described in detail previously, 31 latent class analysis in ALPACA is performed in two stages. During the parameter estimation stage, model estimates are obtained for many candidate models, covering a range of numbers of latent classes, base hazard rate complexities, and degrees of permitted heterogeneity. Because of its stochastic nature, this process is repeated until consistent estimates are obtained. At the model selection stage, the relative probabilities of all candidate models are computed, and the model with the greatest probability of describing the study data (the optimal model) is selected. Bayesian model selection balances the need for model complexity against the evidence available in the data to support such complexity. The base hazard rate is parametrised with a spline construction, in which the inferred number of spline points increases with the complexity of the base hazard rate. In our study, we considered latent class models with one to four latent classes and up to ten spline time points.
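The model selection step can be illustrated by the normalisation that turns per-model evidence into relative probabilities. This is a minimal sketch, assuming a uniform prior over candidate models and that a log marginal likelihood (log evidence) is already available for each candidate; how ALPACA computes or approximates the evidence is not described here, and the log-evidence values below are hypothetical.

```python
import math


def model_probabilities(log_evidence):
    """Normalise log evidences into relative model probabilities, assuming
    a uniform prior over candidate models. Subtracting the maximum before
    exponentiating (the log-sum-exp trick) avoids numerical underflow."""
    top = max(log_evidence.values())
    weights = {model: math.exp(value - top) for model, value in log_evidence.items()}
    total = sum(weights.values())
    return {model: w / total for model, w in weights.items()}


# Hypothetical log evidences keyed by (latent classes, spline time points)
log_evidence = {(1, 4): -51230.0, (2, 4): -51180.5, (2, 6): -51182.1, (3, 4): -51185.0}
probs = model_probabilities(log_evidence)
optimal = max(probs, key=probs.get)
print(optimal, round(probs[optimal], 3))
```

More complex models are not favoured automatically: the evidence already penalises parameters that the data cannot support, which is the balance between complexity and evidence described above.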
Based on the optimal model, we analysed the base hazard rates, covariate HRs, latent class membership HRs (relative frailty), and survival functions, both under the influence of competing risks (crude) and isolated from the effect of censoring due to competing risks (marginal), separately for each latent class. We quantified the association of the covariates with the endpoints by covariate HRs, and the association attributable to class membership by class membership HRs, that is, the relative frailty due to factors not accounted for in our study. Characteristics of study participants were analysed in relation to the latent class to which each man was most probably assigned. To assess the effect of censoring due to competing risks, we graphically compared the crude and marginal survival functions.
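The crude and marginal survival functions can be contrasted in the simplest setting: one latent class, a fixed covariate profile, and constant cause-specific hazards. This is a sketch under those assumptions, using one common formalisation (crude = 1 minus the cumulative incidence with all endpoints active; marginal = survival with the competing endpoints switched off), not the ALPACA computation. The rates are hypothetical.

```python
import math


def crude_and_marginal(rates, endpoint, t):
    """Probability of being free of `endpoint` at time t, with all competing
    endpoints active (crude) and with them removed (marginal).

    `rates` maps each endpoint to a constant cause-specific hazard for one
    latent class and a fixed covariate profile (an assumption of this sketch).
    """
    total = sum(rates.values())
    rate = rates[endpoint]
    cumulative_incidence = rate / total * (1.0 - math.exp(-total * t))
    crude = 1.0 - cumulative_incidence
    marginal = math.exp(-rate * t)
    return crude, marginal


# Hypothetical annual rates for one class (not estimates from the study)
rates = {"favorable": 0.05, "aggressive": 0.03, "cvd_death": 0.04, "other_death": 0.04}
crude, marginal = crude_and_marginal(rates, "favorable", 6.0)
print(f"crude {crude:.3f} vs marginal {marginal:.3f}")
```

Under these assumptions the crude value is never below the marginal one, so a crude curve understates the risk of the endpoint relative to the marginal curve; the gap widens as the competing hazards grow.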

Results
Mean age at start of study was 70 years (SD = 7 years). During a mean follow-up of 6 years (SD = 2 years), 3,397 men were diagnosed with favorable-risk prostate cancer, 2,639 men with aggressive prostate cancer, 9,165 died from cardiovascular diseases, and 13,228 died from other causes without a previous diagnosis of prostate cancer. Baseline characteristics of the full study population are shown in Table 1.
According to the latent class model for competing risks, the study population most probably consists of two distinct latent classes. Class 1 included 115,623 (91%) men in the study, and Class 2 included 10,859 (9%) men. Men in Class 1 were typically younger and had fewer comorbidities than men in Class 2 (Supporting Information Table 1). At the end of follow-up, 85% of the men in Class 1 were alive and free of prostate cancer, 6% had died from cardiovascular diseases, and 9% from other causes, whereas in Class 2 every man had either died (23% from cardiovascular diseases and 24% from other causes) or been diagnosed with prostate cancer (31% with favorable-risk and 22% with aggressive prostate cancer; Table 2).
Men in Class 2 were frailer than men in Class 1 with respect to the risk of all four endpoints. The relative frailty between the classes was largest for prostate cancer, with class membership HRs of 16.4 (95% confidence interval: 7.1, 38.3) for favorable-risk prostate cancer and 8.1 (95% confidence interval: 4.5, 14.7) for aggressive prostate cancer. All covariate HRs for favorable-risk and aggressive prostate cancer were weaker than the corresponding class membership HRs (Table 3).
We found no association between T2DM status and risk of favorable-risk or aggressive prostate cancer in either class, but T2DM status was associated with death from cardiovascular diseases and from other causes in Class 1, and inversely associated with death from other causes in Class 2. Visual comparison of the crude and marginal survival functions by T2DM status showed that informative censoring due to competing risks had opposite effects on the survival curves in the two classes (Figure 1). The scales differ between the plots, and the differences between the crude and marginal survival curves were far larger in Class 2. In that class, the crude survival curve (under the influence of competing risks) underestimated the risk compared with the marginal survival curve (isolated from the effect of competing risks), whereas in Class 1 we found a smaller, opposite effect.
In the Cox regression analysis, we found that age, T2DM status, and the CCI were inversely associated, and education level positively associated, with favorable-risk prostate cancer (Supporting Information Table 2a). Moreover, age was positively associated with aggressive prostate cancer […] (Supporting Information Table 2b).

Table 1 (excerpt)
Insulin/sulphonylurea³: 10,030 (8%)
¹ Educational level categorised as low (≤9 years of school), intermediate (10-12 years), and high (≥13 years), corresponding to mandatory school, high school, and college or university.
² Educational level missing for 2,006 men (2%); these men were included in the group with low educational level.
³ Ordered variable, such that men in the insulin/sulphonylurea group could also use metformin.

Discussion
By applying the recently developed latent class method for competing risks, 31 we found heterogeneity in risk of prostate cancer, with two distinct latent classes in the study population. The risk of prostate cancer, both favorable-risk and aggressive, was more strongly associated with latent class membership than with any of the included covariates (age, T2DM status, education level, and the CCI). The association with class membership reflects factors not accounted for in the study, that is, class-specific relative frailty. Moreover, we found no association between T2DM status and prostate cancer risk after isolating the effect of censoring due to competing risks, in contrast to the findings of a conventional survival analysis.
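One way the false protectivity discussed below can arise is sketched here with hypothetical numbers. It assumes two latent classes with constant hazards, where the small frail class has higher rates of both cancer and death, and an exposure (standing in for T2DM) that only multiplies the death rate and has no effect on cancer risk. The frail class is then depleted faster among exposed men, so the population-level cancer hazard ratio falls below 1 over time. This illustrates the mechanism only; it is not the latent class model or its estimates.

```python
import math


def population_cancer_hazard(t, classes, death_multiplier):
    """Cause-specific cancer hazard among men still event-free at time t.

    `classes` is a list of (weight, cancer_rate, death_rate) with constant
    class-specific hazards; `death_multiplier` scales every death rate.
    """
    num = den = 0.0
    for weight, cancer_rate, death_rate in classes:
        event_free = math.exp(-(cancer_rate + death_multiplier * death_rate) * t)
        num += weight * cancer_rate * event_free
        den += weight * event_free
    return num / den


def apparent_hr(t, classes, death_multiplier=2.0):
    """Exposed vs unexposed cancer hazard ratio at time t when the exposure
    only multiplies the death rate and has no effect on cancer risk."""
    return (population_cancer_hazard(t, classes, death_multiplier)
            / population_cancer_hazard(t, classes, 1.0))


# Hypothetical classes: the small frail class has higher cancer AND death rates
classes = [(0.91, 0.002, 0.02), (0.09, 0.05, 0.08)]
for t in (0, 2, 4, 6):
    print(f"t={t}: apparent HR = {apparent_hr(t, classes):.3f}")
```

If both classes are given the same death rate, the apparent HR stays at 1 throughout, which is consistent with the interpretation that the inverse association comes from the interplay of heterogeneity and competing mortality rather than from any effect on cancer risk.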
The main strength of the new method is the possibility to systematically map cohort heterogeneity and the effect of censoring due to competing risks, without violating underlying statistical assumptions. The only assumption made is that any competing risk censoring effects are a consequence of such heterogeneity. Moreover, the method is designed to yield results with straightforward etiological interpretations, in contrast to subdistribution HRs from competing risk regression. 5,6 With this method it is possible to detect frail subclasses in the study population with class-specific risk factors for the event of interest, and to detect scenarios where competing risks may mask or bias etiological associations, such as false aetiology or false protectivity. 9 A limitation is that information in the covariates can only give a retrospective indication of class membership. Another limitation is the crude definition of T2DM status based on data on anti-diabetic drug use, without accounting for duration. In the current study, men with T2DM treated with diet only are not included, and a minority of those classified as having T2DM might have Type 1 diabetes mellitus; however, this definition of T2DM has been used before in a similar setting. 40

Cancer Epidemiology
Häggström et al.

The results from our study are consistent with studies of frailty for cancers at other sites, where researchers have suggested that a high proportion of cases may arise in a small susceptible part of the population. 9,[41][42][43] One previous study of prostate cancer used the recently developed method and, in line with our results, also found two distinct latent classes; however, the frailer class comprised 16% of that cohort, compared with 9% in our study. 31,44 The difference in size of the frailer class might be attributed to differences between the overall cohorts, as the study by Grundmark et al. included a younger cohort with longer follow-up, which may imply a higher risk of developing prostate cancer. 44 We speculate that men in the frail class may have a family history of prostate cancer, 45,46 or an unknown risk factor.

Our conventional analysis of T2DM status and prostate cancer risk showed an inverse association, in line with previous reports. [18][19][20] The conventional analysis treats all events other than prostate cancer as censored, and neglects the associations of T2DM status with competing risks. To the best of our knowledge, no previous study has been able to assess the etiological association between T2DM and risk of prostate cancer isolated from the effect of censoring due to competing risks. With the new method we identified the effect of censoring due to competing risks and assessed the etiological association isolated from this effect. This resulted in a null association between T2DM status and risk of prostate cancer. We thus note that the conventional analysis showed what has previously been denoted as false protectivity. 9,31 Other reports have discussed problems with etiological associations after accounting for the effect of censoring due to competing risks, using the terms selection bias of survivors or competing risk bias. [2][3][4] Similar to our data, one of these studies investigated the consistent inverse association between smoking and risk of malignant melanoma, and after simulating and removing the effect of censoring due to competing risks, found a null etiological association. 2

In conclusion, methods in conventional survival analysis are often limited by assumptions that are rarely satisfied in real-life situations. In the current study we applied a recently developed method created for heterogeneous populations to calculate etiological risk estimates for prostate cancer undisturbed by competing risks. In accordance with studies of other cancer sites, we found that only a small part of the studied population had a higher risk of prostate cancer. In line with the concept of precision medicine, this method can detect classes of individuals that differ in their susceptibility to a particular disease, their reaction to a certain risk factor, or their response to a specific treatment. These individuals can be further investigated to find clues for preventive or therapeutic interventions. 10