Developing and validating a nomogram for penile cancer survival: A comprehensive study based on SEER and Chinese data

Abstract Objective: The primary aim of this study was to develop a nomogram for predicting survival outcomes in patients with penile cancer, using data from the Surveillance, Epidemiology, and End Results (SEER) database and a Chinese institution. Methods: We identified a cohort of 5744 patients diagnosed with penile cancer between 2004 and 2019 in the SEER database. In addition, 103 patients with penile cancer treated at Sun Yat-sen Memorial Hospital of Sun Yat-sen University during the same period were included. Based on the results of regression analysis, a nomogram was constructed and validated internally and externally. Its predictive performance was evaluated in the internal and external datasets using the concordance index (c-index), area under the curve (AUC), decision curve analysis, and calibration curves. Finally, its predictive performance was compared with that of a TNM staging model. Results: A total of 3154 patients with penile cancer were randomly divided into a training group and an internal validation group at a ratio of 2:1. Nine independent risk factors were identified: age, race, marital status, tumor grade, histology, T, N, and M stage, and surgical approach. Based on these factors, a nomogram was constructed to predict overall survival (OS). The nomogram showed relatively good consistency, predictive accuracy, and clinical relevance, with c-indices above 0.73 in the training, internal validation, and external validation cohorts. These metrics were substantially better than those of the TNM staging system. Conclusion: Penile cancer is often overlooked in research and has lacked detailed investigation and guidelines. This study is the first to validate a prognostic model for penile cancer using extensive data from the SEER database, supplemented by data from our own institution. Our nomogram gives surgeons a tool that predicts the prognosis of penile cancer more accurately than TNM staging, thereby supporting clinical decision-making.

KEYWORDS
external verification, nomogram, overall survival, penile cancer, prognosis, SEER, TNM

| INTRODUCTION
Penile carcinoma is relatively uncommon among genitourinary malignancies, although it was once more prevalent in certain regions. Globally, approximately 26,000 new cases are estimated to occur, accounting for approximately 1% of new cancer diagnoses in men worldwide [1]. After surgery, patients with penile cancer often face complex challenges, including concerns about survival, physical and psychological health needs, and adjustments to their lifestyle and relationships. In Western industrialized countries in Europe and in the United States, the incidence ranges from 0.1 to 1.0 per 100,000 individuals, whereas in Brazil it can be as high as 2.8 to 6.8 per 100,000 [2,3]. Squamous cell carcinoma accounts for more than 95% of all penile cancer cases [4,5]. Treatment options for penile cancer include local excision with organ preservation, partial penectomy, radical penectomy, and inguinal lymph node dissection. However, there is a dearth of large-scale randomized controlled trials or comparative observational studies assessing the efficacy of treatments for primary penile cancer.
Prognostic factors in penile carcinoma include pathological characteristics, clinical stage, lymph node metastasis, and molecular factors. Pathological features such as tumor type, grade, depth of invasion, perineural invasion, and lymphatic vessel invasion are pivotal prognostic indicators. Numerous studies have shown that pathological grade predicts metastasis, disease progression, and overall prognosis. Nevertheless, the prognostic implications of other factors, such as primary tumor site, vascular invasion, treatment modality, and socioeconomic variables like social relationships and family income, remain relatively underexplored. Zini et al. [6] developed a straightforward model for predicting the necessity of surgery in primary penile squamous cell carcinoma, incorporating SEER stage and tumor grade (TG) as key variables. Furthermore, Thuret et al. [7] reported that adding TG to the AJCC stage improves the accuracy of predicting cancer-specific mortality.
The prognosis of penile cancer is influenced by multiple factors. Prognostic models derived from the SEER database have shown good predictive performance for both overall survival (OS) and cancer-specific survival (CSS) in patients with penile cancer [8]. The main goal of this study was therefore to develop a validated, reliable nomogram to predict survival outcomes in patients with penile cancer, using data from the SEER database and a Chinese institution.

| Patients and data collection
Using SEER*Stat software (version 8.4.1), we identified 5744 patients diagnosed with penile cancer in the SEER database between 2004 and 2019. Additionally, we retrospectively analyzed 103 patients with penile cancer treated at Sun Yat-sen Memorial Hospital (2004-2019), who served as an external validation cohort. The study protocol was approved by the Ethics Committee of Sun Yat-sen Memorial Hospital, Sun Yat-sen University, and the requirement for informed consent was waived. The inclusion criteria were as follows: (1) age 0 to 100 years; (2) complete follow-up information; and (3) a single primary tumor, excluding cases with other tumors that could potentially influence prognosis.

| Development and validation of the nomogram
Eligible patients were randomly allocated to a training cohort and validation group 1 at a ratio of 7:3. In addition, 103 patients from Sun Yat-sen Memorial Hospital formed validation group 2 for external validation. Variables identified as significant in univariate and multivariate Cox regression models were used to construct nomograms predicting OS in penile cancer, which were evaluated in all three groups (the training cohort, the internal validation group, and the external validation group). Discrimination was assessed by receiver operating characteristic (ROC) analysis, using the area under the ROC curve (AUC), and by the concordance index (c-index). Calibration was assessed with calibration curves. Finally, the nomogram was compared with a prediction model built from TNM stage.
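To make the c-index concrete, the following Python function is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration under stated assumptions, not the R code used in the study: a higher score means higher predicted risk, pairs tied in follow-up time are skipped, and tied predictions count as half-concordant. R implementations may handle ties differently.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's c-index for right-censored survival data.

    times: follow-up times; events: 1 if death was observed, 0 if censored;
    risk_scores: higher value = higher predicted risk (e.g., nomogram points).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the shorter time ends in an observed death;
        # pairs tied in time are skipped in this sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death was assigned the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    return concordant / comparable if comparable else float("nan")


# Toy data (hypothetical): a perfectly ordered risk score gives c = 1.0.
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.3, 0.1]))
```

A c-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values of about 0.74 should be read.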

| Definitions
Marital status was categorized as married or other/unknown. Income status was defined as an annual income of <$55,000 or >$55,000. Rural-urban residence was classified as follows: nonmetropolitan (nonmetropolitan counties not adjacent to a metropolitan area, together with unknown/missing/no match); adjacent to metropolitan (nonmetropolitan counties adjacent to a metropolitan area); and metropolitan, which was further divided into three groups: counties in metropolitan areas with a population exceeding 1 million, counties in metropolitan areas with a population of 250,000 to 1 million, and counties in metropolitan areas with fewer than 250,000 residents. The training cohort was analyzed with a Cox regression model considering 14 candidate variables: age, ethnicity, marital status, income, rural or urban residence, primary site, grade, histology, TNM stage, surgery, non-primary site surgery, radiation, chemotherapy, and lymphovascular (L/V) invasion.
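The categorical coding described above can be expressed as a small lookup. The Python sketch below is a hypothetical illustration: the function names and input labels (such as `county_type="metro"`) are invented for this example and do not correspond to SEER variable codes, and an income of exactly $55,000, which the definition above does not assign, is placed in the upper group by assumption.

```python
from typing import Optional


def code_marital(status: str) -> str:
    # Two-level coding: married vs. everything else (including unknown).
    return "Married" if status.strip().lower() == "married" else "Other/unknown"


def code_income(annual_income: float) -> str:
    # ASSUMPTION: exactly $55,000 is placed in the upper group.
    return "<$55,000" if annual_income < 55_000 else ">=$55,000"


def code_rural_urban(county_type: Optional[str],
                     metro_population: Optional[int] = None) -> str:
    """Hypothetical inputs: county_type is "metro", "nonmetro_adjacent",
    "nonmetro_not_adjacent", or None for unknown/missing/no match."""
    if county_type == "metro":
        if metro_population is None:
            raise ValueError("metropolitan counties need a population size")
        if metro_population > 1_000_000:
            return "Metro >1 million"
        if metro_population >= 250_000:
            return "Metro 250,000-1 million"
        return "Metro <250,000"
    if county_type == "nonmetro_adjacent":
        return "Adjacent to metropolitan"
    # Nonadjacent nonmetropolitan counties and unknown/missing/no match are pooled.
    return "Nonmetropolitan"


print(code_rural_urban("metro", 400_000), code_rural_urban(None))
```

Pooling unknown residence with nonadjacent nonmetropolitan counties mirrors the definition in the text; a different analysis might keep "unknown" as its own level.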

| Statistical analysis
Statistical differences are described in the figure legends. Statistical analyses were performed with IBM SPSS Statistics 26.0 and R software (version 4.0.3; http://www.R-project.org). The mean is a measure of central tendency that summarizes the central value of a dataset, and the standard deviation (SD) measures the variation or dispersion of a set of values. The 95% confidence interval (95% CI) gives the range within which the true value of a population parameter (such as a mean or proportion) is expected to lie, with 95% confidence. The hazard ratio (HR) is used in survival analysis to compare the risk of an event at any given time between two groups. Nomograms were generated and validated in R using the rms, survival, ggplot2, timeROC, and rmda packages. A p-value < 0.05 was considered statistically significant.
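As a worked illustration of how an HR and its 95% CI follow from a Cox regression coefficient, the sketch below applies the standard Wald-type formulas HR = exp(beta) and 95% CI = exp(beta ± 1.96 × SE), with a two-sided p-value from z = beta / SE. The coefficient and standard error in the example are made-up numbers, not estimates from Table 2.

```python
import math
from statistics import NormalDist


def hazard_ratio_ci(beta: float, se: float, level: float = 0.95):
    """Wald-type HR, confidence interval, and two-sided p-value from a Cox coefficient."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    hr = math.exp(beta)
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return hr, (lower, upper), p_value


# Illustrative (made-up) coefficient: beta = 0.5, SE = 0.1
hr, (lo, hi), p = hazard_ratio_ci(0.5, 0.1)
print(f"HR = {hr:.2f}, 95% CI {lo:.2f}-{hi:.2f}, p = {p:.2g}")
```

Because the interval is symmetric on the log scale, it is asymmetric around the HR itself, which is why reported CIs for HRs typically extend further above the point estimate than below it.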

| General characteristics
A total of 5744 patients diagnosed between 2004 and 2019 were retrieved from the SEER database. Of these, 2246 were excluded because they did not meet the inclusion criteria (Data S1). Consequently, 3498 patients were enrolled in this study and randomly divided into two cohorts: the training group (n = 2450) and validation group 1 (n = 1048). The baseline characteristics of the study population are presented in Table 1. Most patients were Caucasian; African American patients and those of other ethnicities accounted for 9.1% and 8.2%, respectively. Most patients had grade II tumors, with 24.0%, 17.0%, and 0.7% having grade I, III, and IV tumors, respectively. Squamous cell carcinoma was the predominant histology. By T stage, 1772 patients were classified as T1, 738 as T2, 543 as T3, and 103 as T4. Over 91.7% of the patients underwent radiotherapy, while 87.1% received either partial or total surgery. Chemotherapy was administered to approximately 12.4% of the patients. The Kaplan-Meier survival curve for penile cancer patients is shown in Figure 1.
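Figure 1 is a Kaplan-Meier curve. For readers unfamiliar with the estimator, a minimal product-limit implementation in Python is sketched below. It is an illustration on toy data, not the code used to draw Figure 1, and it follows the usual convention that patients censored at a death time are still counted as at risk at that time.

```python
def kaplan_meier(times, events):
    """Product-limit estimate; returns [(t, S(t))] at each distinct death time.

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather everyone whose follow-up ends at time t (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # everyone ending at t leaves the risk set afterward
    return curve


# Toy data (hypothetical): one patient censored at t = 2 and one at t = 4.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```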

| Risk prediction nomogram development
Both univariate and multivariate Cox regression analyses were conducted, with the findings detailed in Table 2.
A stepwise regression procedure based on the likelihood ratio (LR) test was used to identify the final independent risk factors in the multivariate analysis. Age, ethnicity, marital status, grade, histology, TNM stage, and surgery were identified as independent prognostic factors. A prediction model was then developed from these factors in the training group. This model allows projection of 1-, 3-, and 5-year OS rates for individual patients, as shown in the nomogram (Figure 2).
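A Cox-based nomogram is a graphical form of the model's linear predictor: each variable contributes points proportional to its coefficient, and the total points map to survival through S(t | x) = S0(t)^exp(lp). The Python sketch below illustrates that mapping with hypothetical coefficients, predictor ranges, and baseline survival values (not the estimates behind Table 2 or Figure 2), assuming every coefficient is positive and every predictor has a known minimum and maximum.

```python
import math

# HYPOTHETICAL log hazard ratios per unit of each predictor (not the study's estimates).
COEFS = {"age_decades": 0.35, "grade_high": 0.60, "node_positive": 0.90, "distant_mets": 1.40}
# HYPOTHETICAL predictor ranges (min, max) used to scale the points axis.
RANGES = {"age_decades": (0, 10), "grade_high": (0, 1), "node_positive": (0, 1), "distant_mets": (0, 1)}
# HYPOTHETICAL baseline survival S0(t) for a patient at the minimum of every predictor.
BASELINE_SURVIVAL = {1: 0.95, 3: 0.85, 5: 0.78}


def linear_predictor(patient: dict) -> float:
    # Contributions are measured from each predictor's minimum, matching the points scale.
    return sum(COEFS[k] * (patient[k] - RANGES[k][0]) for k in COEFS)


def total_points(patient: dict, max_points: float = 100.0) -> float:
    # The predictor with the widest effect range (coef * (max - min)) spans 0..max_points,
    # so total points are a linear rescaling of the linear predictor.
    widest = max(COEFS[k] * (RANGES[k][1] - RANGES[k][0]) for k in COEFS)
    return max_points * linear_predictor(patient) / widest


def predicted_survival(patient: dict, years: int) -> float:
    # Cox proportional hazards: S(t | x) = S0(t) ** exp(lp)
    return BASELINE_SURVIVAL[years] ** math.exp(linear_predictor(patient))


patient = {"age_decades": 6.5, "grade_high": 1, "node_positive": 1, "distant_mets": 0}
print(round(total_points(patient), 1))
for t in (1, 3, 5):
    print(t, round(predicted_survival(patient, t), 3))
```

Reading a printed nomogram performs the same two steps by hand: sum the points, then look up the survival probability aligned with that total on the 1-, 3-, or 5-year axis.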

| Predictive accuracy of nomogram
The accuracy of the nomogram in predicting 1-, 3-, and 5-year OS was evaluated using the AUC. In the training set, the AUC values were 0.806 at 1 year, 0.781 at 3 years, and 0.782 at 5 years, with a c-index of 0.7424. In validation group 1, the AUC values were 0.792 at 1 year, 0.777 at 3 years, and 0.795 at 5 years, with a c-index of 0.7385. These results indicate good discriminatory capacity (Figure 3A,B). The calibration curves for the training and validation cohorts generally followed the ideal diagonal line: calibration at 1 and 3 years was satisfactory, whereas calibration at 5 years was less favorable in the SEER data, and calibration in the second validation dataset was inadequate. Data S2A-F display the nomogram-predicted probabilities of 1-, 3-, and 5-year OS. Decision curve analysis (DCA; Data S3A-C) showed a favorable net benefit for penile cancer patients when our model was used, supporting its clinical applicability. The nomogram was further validated externally in a cohort of 103 patients from Sun Yat-sen Memorial Hospital (SYSMH). In this group, the AUC values at 1, 3, and 5 years were 0.865, 0.856, and …, respectively (Figure 3C), and the c-index was 0.785, indicating high discriminative accuracy. The calibration curves for validation group 2 are shown in Data S2G-I. The DCA also showed a net benefit for validation group 2 (Data S3G-I), suggesting that the nomogram has considerable potential to aid clinical decision-making.
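Decision curves plot net benefit against the threshold probability p_t, where net benefit = TP/n - (FP/n) × p_t / (1 - p_t). The Python sketch below computes it for a binary outcome at a fixed horizon (death by time t, with the model's risk taken as 1 minus predicted OS), alongside the "treat all" reference; "treat none" has a net benefit of 0 by definition. It assumes complete follow-up to the horizon. Survival-adapted DCA methods handle censoring, which this simplified sketch does not, and the data are hypothetical.

```python
def net_benefit(risks, died_by_horizon, threshold):
    """Net benefit of 'intervene if predicted risk >= threshold' at one threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(risks)
    flagged = [y for r, y in zip(risks, died_by_horizon) if r >= threshold]
    tp = sum(1 for y in flagged if y == 1)
    fp = len(flagged) - tp
    # Weight of a false positive relative to a true positive at this threshold.
    odds = threshold / (1 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(died_by_horizon, threshold):
    # Reference strategy: intervene in every patient regardless of predicted risk.
    n = len(died_by_horizon)
    prevalence = sum(died_by_horizon) / n
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy example: risk = 1 - predicted OS at the horizon (hypothetical values).
risks = [0.9, 0.8, 0.3, 0.1]
died = [1, 0, 1, 0]
for pt in (0.2, 0.5):
    print(pt, net_benefit(risks, died, pt), net_benefit_treat_all(died, pt))
```

A model is useful at a given threshold when its curve lies above both the "treat all" line and zero.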

| TNM-based predictive nomogram construction
We also constructed and validated a prediction model based on TNM staging and compared it with our nomogram for predicting survival in penile cancer (Figure 4A). The AUC values of the TNM stage model at 1, 3, and 5 years were 0.712, 0.682, and 0.668, respectively, substantially lower than those of the nomogram we constructed (Figure 4B). Calibration plots and DCA likewise showed that the nomogram had better calibration and greater clinical net benefit than the TNM stage model (Figure 4C-H). These findings suggest that the nomogram may offer greater clinical utility than TNM staging.
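The 1-, 3-, and 5-year AUCs compared above are time-dependent (cumulative/dynamic) AUCs. At horizon t, patients who died by t are cases and patients still under follow-up after t are controls. The simplified Python sketch below compares two risk scores on toy data by dropping patients censored before t. The timeROC package instead reweights patients by inverse probability of censoring, so its estimates will differ from this unweighted version. The scores shown are hypothetical.

```python
def cumulative_dynamic_auc(times, events, scores, horizon):
    """Unweighted cumulative/dynamic AUC at `horizon` (no censoring correction).

    Cases: died at or before the horizon. Controls: followed beyond the horizon.
    Patients censored at or before the horizon are dropped.
    """
    cases = [s for t, e, s in zip(times, events, scores) if t <= horizon and e == 1]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        return float("nan")
    wins = 0.0
    for c in cases:
        for k in controls:
            # A case ranked above a control is a win; tied scores count as half.
            wins += 1.0 if c > k else 0.5 if c == k else 0.0
    return wins / (len(cases) * len(controls))


times = [1, 2, 3, 6, 7]
events = [1, 1, 0, 1, 0]
nomogram_risk = [0.9, 0.8, 0.5, 0.4, 0.2]  # hypothetical nomogram risks
tnm_group = [2, 1, 1, 1, 0]                # hypothetical ordinal TNM stage codes
print(cumulative_dynamic_auc(times, events, nomogram_risk, 3),
      cumulative_dynamic_auc(times, events, tnm_group, 3))
```

In this toy example the coarse stage variable produces a tied case-control pair, which counts only as half a win. Ties of this kind are one reason a staging system with few levels can discriminate less finely than a continuous nomogram score.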

| DISCUSSION
The prevalence of penile cancer is relatively low [9,10], but it varies substantially by region owing to factors such as geographical location, religious practices, socioeconomic status, and general health conditions [11]. Ethnic and religious groups that traditionally practice circumcision at an early age, particularly Jewish and Muslim communities, tend to have lower incidences of penile cancer [12]. Treatment of the primary lesion includes penile-sparing approaches as well as radical penectomy with perineal urethrostomy or penile reconstruction. The choice of treatment should be based on tumor size, histological stage and grade, and the patient's own wishes; the overall principle is to remove the tumor completely while preserving as much of the penis as possible. However, predicting prognosis remains a significant challenge for clinicians, primarily because of the rarity of the disease and the absence of reliable prognostic tools [13]. This scarcity has led to a lack of focused attention and guidelines for this patient population. Our study addresses this gap using large-scale data, including data from our own center. The clinical stage of penile carcinoma is a critical determinant of prognosis. Post-treatment survival rates vary considerably by stage: 95.8% for stage I, 77.8% for stage II, 47.8% for stage III, and 0% for stage IV. Patients diagnosed with advanced penile cancer have a 2-year survival rate of only 21% [3]. Lymph node metastasis is a key prognostic indicator that significantly affects survival and requires comprehensive assessment. While current treatments have raised survival rates to about 80% in patients with early-stage disease, 5-year OS falls below 40% in patients with inguinal lymph node metastases [14]. Recent research underscores the relevance of inguinal lymph node density (LND) for risk stratification, particularly in patients with positive lymph nodes who undergo inguinal lymph node dissection [15].
The TNM staging system is widely used to evaluate cancer prognosis, but it is not tailored to any particular patient group. In this study, the constructed nomogram outperformed a TNM stage-based predictive model in assessing the prognosis of patients with penile cancer, and it may therefore be a useful aid for clinicians [16]. We analyzed 3498 patients from the SEER database to characterize penile carcinoma and to develop prognostic prediction models. Cox regression analysis identified nine independent risk factors for OS: age, race, marital status, grade, histology, T stage, N stage, M stage, and surgery. We then developed nomograms for 1-, 3-, and 5-year OS and validated them for fit, generalizability, and effectiveness. The models showed comparatively good consistency, discriminatory capability, and clinical utility, and therefore have substantial potential to guide clinical decisions for patients with penile carcinoma.
Yang et al. [17] developed a competing-risk prediction model using SEER data from 2091 patients with penile cancer. They found significant associations between survival and AJCC stages II and III, tumor diameter exceeding 5 cm, and N1-3 and M1 disease. Several other prognostic factors for penile cancer have been identified [4,18,19]. In our nomogram, TNM stage carries a high risk score, underscoring its importance. Models based on SEER data have shown improved accuracy in predicting OS in penile cancer. In recent studies on marital status [20,21], men who were unmarried or divorced had an increased risk of invasive penile squamous cell carcinoma, which may indicate a link between marital status and advanced-stage disease. Our research supports this finding by identifying marital status as an independent prognostic factor for OS in penile cancer.
In our study, the median patient age was 65 years, consistent with previous reports: 69 years by Zini et al. [6] and 61 years by Zheng et al. [22] Consistent with earlier findings, our results suggest that older patients often have worse prognoses, likely because of coexisting health conditions. Tumor grade also emerged as a notable independent prognostic factor in our Cox regression analysis, with higher-grade tumors associated with poorer outcomes. The predominance of squamous cell carcinoma in our cohort may account for the generally unfavorable prognosis of penile carcinoma. Additionally, T, N, and M stages were identified as independent risk factors for OS. Our results reinforce the view that surgical treatment can improve patient outcomes, with local recurrences typically responding well to surgery and having minimal impact on survival [5,14,22]. Chen et al. [13] did not consider average income, which might have influenced their findings; however, our analysis found no significant effect of income or rural-urban residence on OS. We attribute the robustness and reliability of our results to our larger sample size.
Nevertheless, our research has certain limitations. The retrospective nature of our data collection constrained the breadth of the study. Although circumcision may offer some protection against penile cancer, only limited information is available on this topic; hence, the generalizability of our nomogram to all patients with penile cancer remains uncertain. Furthermore, we were unable to include potential prognostic factors such as smoking history, sexual behavior, and HPV infection [23-25], which could have influenced our findings. Additionally, like previous studies, we did not collect data prospectively. Validating the efficacy of our predictive model in large-scale, prospective, randomized controlled studies is therefore an essential next step.

| CONCLUSION
In summary, using data from our institution and the SEER database, we developed and externally validated a nomogram that accurately predicts the prognosis of patients with penile cancer. Compared with the predictive model based on TNM staging, it showed better consistency, a higher c-index, and greater clinical value. This nomogram can help clinicians individualize the diagnosis and treatment of patients with penile cancer.

FIGURE 1 Kaplan-Meier survival curve for penile cancer patients.

FIGURE 2 Nomogram predicting 1-, 3-, and 5-year overall survival in penile cancer.

FIGURE 3 Receiver operating characteristic (ROC) curve analysis comparing predictive accuracy in the training cohort (A), validation group 1 (B), and validation group 2 (C).

FIGURE 4 Comparative analysis and clinical utility of the nomogram. (A) Nomogram for predicting 1-, 3-, and 5-year overall survival (OS) based on TNM stage. (B) Comparison of ROC curves between the nomogram and TNM stage for predicting OS. (C-E) Calibration plots evaluating the nomogram's performance for 1-, 3-, and 5-year OS predictions. (F-H) Decision curve analysis (DCA) comparing the clinical utility of the nomogram model and the TNM stage model for 1-, 3-, and 5-year OS, with threshold probability ranges of 0.201 (0.185, 0.229), 0.386 (0.374, 0.408), and 0.442 (0.413, 0.461), respectively.
TABLE 1 Baseline characteristics of the study population.
Abbreviations: L/V invasion, lymphovascular invasion; SD, standard deviation; SEER, Surveillance, Epidemiology, and End Results.