An estrogen receptor (ER)‐related signature in predicting prognosis of ER‐positive breast cancer following endocrine treatment

Abstract Quite a few estrogen receptor (ER)‐positive breast cancer patients receiving endocrine therapy are at risk of disease recurrence and death. ER‐related genes are involved in the progression and chemoresistance of breast cancer. In this study, we identified an ER‐related gene signature that can predict the prognosis of ER‐positive breast cancer patients receiving endocrine therapy. We collected RNA expression profiles from the Gene Expression Omnibus database. An ER‐related signature was developed to separate patients into high‐risk and low‐risk groups. Patients in the low‐risk group had significantly better survival than those in the high‐risk group. ROC analysis indicated that this signature exhibited good predictive performance for 1‐, 3‐ and 5‐year disease‐relapse events. Moreover, multivariate Cox regression analysis demonstrated that the ER‐related signature was an independent risk factor after adjusting for several clinical factors. The prognostic value of this signature was confirmed in the validation sets. In addition, a nomogram was built, and calibration plots indicated that it performed well. In conclusion, combined with ER status, our results demonstrated that the ER‐related prognostic signature is a promising method for predicting the prognosis of ER‐positive breast cancer patients receiving endocrine therapy.


| INTRODUCTION
Breast cancer is a heterogeneous disease with multiple molecular features. It is a major global health burden and the leading cause of cancer death among females. The incidence of breast cancer has been increasing for several years, resulting from a […] and progesterone receptor (PR). ER-related genes are highly expressed in luminal A tumours, while expression levels of HER2 and some proliferation-related genes are low. Compared with luminal A tumours, luminal B tumours have lower expression levels of ER-related genes, higher expression of proliferation-related genes and variable expression of HER2 genes. Patients with luminal A breast cancer are often considered to have the best prognosis, followed by patients with luminal B breast cancer. 3 Expression of ER is associated with a favourable prognosis and can predict the efficacy of endocrine therapies, including aromatase inhibitors and tamoxifen.
Previous studies demonstrated that adjuvant tamoxifen treatment reduced breast cancer mortality in ER-positive patients. Although most ER-positive breast cancer patients show a good prognosis after receiving antiestrogen therapy, some of them develop acquired resistance after 5 years of therapy and suffer from distant metastasis or even death. 4 High-throughput platforms for genomic analysis provide promising tools in medical oncology with great clinical applications.
Multigene prognostic signatures can provide further prognostic information, and several molecular prognostic profiles have been validated and are in clinical use: the Oncotype DX, the Amsterdam 70-gene signature and the PAM50 are the three most commonly used. The Oncotype DX calculates a recurrence score and divides breast tumours into low-, intermediate- and high-risk groups to estimate the likelihood of recurrence in tamoxifen-treated patients with ER-positive breast cancer. 5,6 The Amsterdam 70-gene signature can accurately group patients into low- or high-risk categories to predict distant metastases and death, and it is approved for application in both ER-positive and ER-negative tumours. 7 The PAM50 is a 50-gene test that improves the classification of breast cancer patients into prognostic groups. 8 These signatures assist in determining therapeutic strategies and predicting the prognosis of patients with breast cancer.
Expression of ER-related genes may help predict responses to chemotherapy and may identify patients who will either benefit from or be resistant to chemotherapy. 9 In this study, we constructed an ER-related gene signature and developed a nomogram to predict the relapse-free survival (RFS) of ER-positive breast cancer patients receiving endocrine therapy.
Our findings suggested that this ER-related gene signature could be used as an effective prognostic predictor for ER-positive breast cancer patients receiving endocrine therapy.

| Data processing
Three datasets (GSE6532, GSE4922 and GSE9195) containing gene expression profiling data of ER-positive breast cancer patients receiving adjuvant hormonal therapy alone, together with their corresponding clinical data, were downloaded from the GEO database. Only ER-positive patients with complete clinical information were included in our analysis. Three chip platforms, Affymetrix Human Genome U133A (GPL96), Affymetrix Human Genome U133B (GPL97) and Affymetrix Human Genome U133 Plus 2.0 (GPL570), were used to obtain gene expression data. Raw microarray cell intensity files were obtained, background-adjusted and normalized using Robust Multichip Average. The RNA expression data were scaled to a mean of 0 and a standard deviation of 1. The data from the same chip platform were then merged, and the ComBat method was used to remove potential internal and external batch effects. We reannotated the probe sets of the Affymetrix Human Genome U133A, Affymetrix Human Genome U133B and Affymetrix Human Genome U133 Plus 2.0 platforms by mapping all probes to the Gencode annotation (Version 29) using SeqMap. We selected the probes that mapped uniquely to the genome with no mismatch. We obtained 256 ER-related genes from the Molecular Signatures Database v6.2.
Only the 62 genes that mapped to all three platforms were used for further analysis.
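The scaling and gene-overlap steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which was run in R): the function names `zscore` and `shared_genes`, the toy expression values and the short gene lists are all hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def zscore(values):
    """Scale one gene's expression values to mean 0 and standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("constant expression values cannot be scaled")
    return [(v - m) / s for v in values]


def shared_genes(er_genes, *platform_genes):
    """Return the ER-related genes that are present on every platform, sorted by name."""
    common = set(er_genes)
    for genes in platform_genes:
        common &= set(genes)
    return sorted(common)


# Hypothetical toy data, for illustration only.
scaled = zscore([5.0, 7.0, 9.0])
common = shared_genes(
    ["ESR1", "GATA3", "CCNE1"],   # candidate ER-related genes
    ["ESR1", "CCNE1", "TFF1"],    # genes mapped on platform 1
    ["CCNE1", "ESR1"],            # genes mapped on platform 2
    ["ESR1", "CCNE1", "GATA3"],   # genes mapped on platform 3
)
```

With real data, the same intersection logic would reduce the 256 candidate genes to the subset measurable on all three arrays.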

| Construction of the ER-related prognostic signature
The dataset based on GPL96 was used as the training set, and the other two sets, based on GPL97 and GPL570, were used as validation sets. GSE12093 and GSE17705, both based on platform GPL96 and containing survival information, were also downloaded and combined for validation. Univariate Cox regression analysis was first performed to identify prognostic genes; P < 0.05 was considered significant.
LASSO-penalized Cox regression was used to further narrow the set of genes for prediction of RFS. The LASSO Cox regression model was fitted using the 'glmnet' package. LASSO shrinks all regression coefficients towards 0 and sets the coefficients of many irrelevant features exactly to 0, depending on the regularization weight λ. The optimal λ was selected according to the minimum cross-validation error in 10-fold cross-validation. Finally, a multivariate Cox regression analysis was conducted to assess the contribution of each gene as an independent prognostic factor for patient survival. A stepwise method was employed to select the best model. A risk score was built, with the coefficients weighted by the penalized Cox model in the training set. The optimal cut-off of the risk score was obtained using the 'survminer' package in R. All patients were classified into either the high-risk or the low-risk group based on the optimal cut-off of the risk score.
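As a rough illustration of how such a risk score works, the sketch below computes a weighted sum of gene expression values and assigns a risk group from a cut-off. The coefficients and expression values are hypothetical and are not the fitted values from this study; the function names are illustrative; and treating a score strictly above the cut-off as high risk is an assumption.

```python
def risk_score(expression, coefficients):
    """Weighted sum of expression values, one Cox coefficient per signature gene."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def risk_group(score, cutoff):
    """Assign 'high' when the score exceeds the cut-off (assumed convention), else 'low'."""
    return "high" if score > cutoff else "low"


# Hypothetical coefficients and scaled expression values, for illustration only.
coefficients = {"CCNE1": 0.5, "CITED2": -0.25}
patient_a = {"CCNE1": 1.0, "CITED2": 0.5}
patient_b = {"CCNE1": -1.0, "CITED2": 1.0}

cutoff = 0.1  # example cut-off value
score_a = risk_score(patient_a, coefficients)  # 0.5 * 1.0 + (-0.25) * 0.5 = 0.375
score_b = risk_score(patient_b, coefficients)  # 0.5 * -1.0 + (-0.25) * 1.0 = -0.75
group_a = risk_group(score_a, cutoff)
group_b = risk_group(score_b, cutoff)
```

A positive coefficient means that higher expression raises the score, and a negative coefficient means that higher expression lowers it.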

| Construction of the nomogram
A nomogram was constructed using the 'rms' R package, and calibration plots were generated to assess the prognostic accuracy of the nomogram. The predicted and observed outcomes of the nomogram were presented in the calibration curve, in which the 45° line represents the best prediction.

| Statistical analysis
To investigate the prognostic accuracy of the ER-related classifier, we performed time-dependent receiver operating characteristic (ROC) analysis using the 'survivalROC' R package. Relapse-free survival was analysed using the Kaplan-Meier method, and the log-rank test was used to assess the statistical significance of differences between groups. Cox regression models were used for multivariable survival analysis. Hazard ratios (HRs) with their respective 95% confidence intervals were obtained. P < 0.05 was considered statistically significant, and all tests were two-sided. All statistical tests were performed with R software (version 3.5.0).
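For readers unfamiliar with the Kaplan-Meier method, the following minimal sketch shows the product-limit calculation on a hypothetical four-patient cohort. It is an illustration only, not the R code used in the study, and the function name and data are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct relapse time.

    times  -- follow-up time of each patient
    events -- 1 if relapse was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = 0
        removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            relapses += data[i][1]
            removed += 1
            i += 1
        if relapses > 0:
            # Product-limit step: multiply by the fraction of at-risk patients without relapse.
            survival *= 1 - relapses / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Hypothetical cohort: relapses at years 1, 3 and 4; one patient censored at year 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

The censored patient contributes to the number at risk until year 2 and then leaves the risk set without lowering the curve, which is how the estimator accounts for incomplete follow-up.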

| Patient characteristics
As shown in Figure 1, a flow chart was developed to describe the analysis procedure of our study. We collected breast cancer expression datasets and their corresponding clinical data from the GEO database.
Cases from GSE6532, GSE4922 and GSE9195 were assigned to three sets: training set (GPL96), validation set I (GPL97) and validation set II (GPL570). Clinical information included age, tumour size, grade and lymph node status. As GSE12093 and GSE17705 lacked information on age and tumour size, these two datasets, containing 434 ER-positive patients, were combined for validation (validation set III).
The clinicopathologic characteristics of patients in the training set are shown in Table 1. The median follow-up in the training set was 9.5 years (low-risk group) and 5.2 years (high-risk group); in validation set I, median follow-up was 8.9 years (low-risk group) and 4.9 years (high-risk group); in validation set II, median follow-up was 10.5 years (low-risk group) and 7.2 years (high-risk group); in validation set III, median follow-up was 8.7 years (low-risk group) and 6.9 years (high-risk group). Ninety-three (50%, training set), 92 (54.1%, validation set I), 25 (39.1%, validation set II) and 74 (28.7%, validation set III) patients in the high-risk group developed relapse during the follow-up period.

| Identification of an ER-related signature
We first performed univariate Cox regression analysis to identify prognostic genes in the training set. The patients were stratified into a high-expression group and a low-expression group according to the optimal cut-off of each gene. In total, 28 ER-related genes significantly associated with RFS were considered prognostic genes and selected for further analysis. LASSO-penalized Cox analysis with 10-fold cross-validation was then performed to narrow the set of genes for prediction of RFS, and 13 ER-related genes were identified. Subsequently, we conducted a stepwise multivariate Cox regression analysis, and 10 ER-related genes were finally screened out as prognostic genes to build a

| Analysis of the ER-related signature in the training set
In the training set, we performed time-dependent ROC curve analysis to assess the prognostic accuracy of the ER-related signature. The areas under the ROC curve (AUC) reached 0.656, 0.736 and 0.735 at 1, 3 and 5 years of recurrence-free survival (Figure 3A). The risk score for each patient was calculated, and we classified all breast cancer patients in the training set into a high-risk group and a low-risk group using the optimum cut-off score (0.1) generated by the 'survminer' package in R via maximally selected rank statistics. We found that patients in the low-risk group had significantly better RFS than those in the high-risk group (Figure 3B). Multivariate Cox proportional hazards regression analysis demonstrated that the ER-related signature was an independent risk factor when adjusting for the classical clinicopathologic factors (Table 2). When we stratified the patients by clinicopathological risk factors, the ER-related signature remained a statistically significant prognostic model in which patients in the high-risk group had a poorer prognosis (Figure 4). The same results were found in the entire validation set (Figure S1).
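The idea behind an AUC at a fixed time horizon can be sketched as follows. This is a deliberately simplified estimator and is not the method implemented in 'survivalROC': patients who relapse by the horizon are treated as cases, patients followed beyond the horizon are treated as controls, and patients censored before the horizon are simply excluded. The function name and all data are hypothetical.

```python
def auc_at_time(scores, times, events, horizon):
    """Naive AUC at a fixed horizon: P(case score > control score), ties counted as 0.5."""
    # Cases: relapse observed at or before the horizon.
    cases = [s for s, t, e in zip(scores, times, events) if e == 1 and t <= horizon]
    # Controls: still relapse-free and under follow-up after the horizon.
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical risk scores, follow-up times (years) and relapse indicators.
scores = [0.9, 0.2, 0.3, 0.5, 0.6]
times = [1, 2, 6, 7, 2]
events = [1, 1, 0, 1, 0]

auc = auc_at_time(scores, times, events, horizon=3)
```

In this toy cohort, the patient censored at year 2 is dropped, and the patient who relapses at year 7 still counts as a control at the 3-year horizon. Estimators that weight censored patients rather than dropping them avoid the bias this simplification introduces.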

| Validation of the ER-related signature in validation sets
To validate the predictive power of the ER-related signature for breast cancer patients, we tested the signature in three validation sets. According to the signature identified above, patients in the low-risk group had significantly better RFS. In validation set I, AUCs […]. In the three validation sets, Kaplan-Meier analysis and the log-rank test demonstrated that patients in the low-risk group had significantly better RFS (Figure 5). Multivariate Cox proportional hazards regression analysis of validation sets I and II also demonstrated that the signature was an independent risk factor (Table 2).

| Nomogram development
To predict the recurrence probability of breast cancer patients using a quantitative method, we constructed a nomogram that integrated both the ER-related signature and the conventional clinicopathological factors (Figure 6) to predict 3- and 5-year DFS probability.
Calibration plots indicated that the nomograms had good accuracy compared with an ideal model in both the training set and the validation sets (Figure 6B-G).

[…] higher tumour grades and poor prognosis in many tumours. 10-13 In breast cancer, CCNE1 is the immediate downstream effector of estrogen-related receptor α. 14 Overexpression of cyclin E contributes to antiestrogen resistance. 15 Moreover, up-regulation of CCNE1 can abrogate tamoxifen-mediated growth arrest via modification of the RB/E2F pathway. 14 CITED2 has been reported to play a role in tumourigenesis, including that of the colon, lung and skin. 16-18 In a murine mammary cancer model, CITED2 was identified as a potential facilitator of breast cancer bone metastasis. 19

In this study, we developed a prognostic signature based on 10 ER-related genes and constructed a novel nomogram to predict the RFS.
These findings might lead to the development of an inexpensive molecular test suitable for routine clinical use. Although the nomogram demonstrated accurate survival prediction, several limitations should not be ignored. The sample size of our study was limited, and large-scale cohort studies are needed to investigate the prognostic value of this ER-related signature. As only patients with complete information were included in the present study, there might be a selection bias in the primary cohort. Several predictors, such as radiotherapy and Ki-67 index, were not analysed. Further in vivo and in vitro studies are required to confirm the exact molecular mechanisms of these prognostic genes.
In conclusion, combined with ER status, our results demonstrated that the ER-related prognostic signature is a novel and important method for predicting the prognosis of breast cancer patients. Thus, it may be a useful predictive tool with good prospects for clinical application in ER-positive breast cancer patients receiving endocrine therapy.

ACKNOWLEDGEMENTS
None.

CONFLICT OF INTEREST
The authors declare that they have no conflict of interest.