Comparison of breast cancer risk factors among molecular subtypes: A case‐only study

Abstract Epidemiological studies have clearly defined the risk factors for breast cancer. However, it is unknown whether the distribution of these factors differs among breast cancer subtypes. We conducted a hospital-based case-only study of 8067 breast cancer patients based on the Tianjin Cohort of Breast Cancer Cases. The major breast cancer subtypes, including luminal A, luminal B, human epidermal growth factor receptor 2 (HER2)-enriched, and basal-like, were defined by estrogen receptor, progesterone receptor, HER2, and Ki-67 status. Variables including demographic characteristics, reproductive factors, lifestyle habits, imaging examination, and clinicopathologic data were collected for patients. The chi-square test and one-way analysis of variance were used to compare the distributions of variables among the four breast cancer subtypes. Multivariate logistic regression was used to estimate odds ratios and associated 95% confidence intervals, with luminal A patients serving as the reference group. Overall, more commonality than heterogeneity in the distributions of factors was found among the four molecular subtypes of breast cancer. The proportion of overweight and obesity was lower in the HER2-enriched subtype. Women with age at menarche ≤13 years were more likely to be found in the basal-like subtype. Postmenopausal women were more frequent in the HER2-enriched and basal-like subtypes. Women with benign breast disease and higher breast density were more common in the HER2-enriched subtype. Risk factor scoring showed that total risk scores were similar among the four subtypes. HER2-enriched and basal-like subtypes were more frequently diagnosed with large tumors. Calcification was more likely to be found in the luminal B and HER2-enriched subtypes and less likely in the basal-like subtype. Most breast cancer risk factors were similarly distributed among the four major breast cancer subtypes; commonality is predominant.


| INTRODUCTION
Breast cancer is the most common malignant tumor and the leading cause of cancer death among women, with an estimated 1.7 million new cases and 521 900 deaths worldwide each year according to GLOBOCAN 2012. 1 Although China is a relatively low-incidence country for breast cancer, new cases have been growing at a rate of 3%-4% per year in recent years, with an incidence of 27.0/100 000 in 2012. 2,3 Breast cancer is a highly heterogeneous disease. Based on the expression of specific genes, intrinsic subtyping has classified breast cancer into four major subtypes: luminal A, luminal B, human epidermal growth factor receptor 2 (HER2)-enriched, and basal-like breast cancer. Each breast cancer subtype carries distinct clinicopathologic characteristics and prognoses, which may suggest heterogeneous etiologies. 4 The occurrence of breast cancer mainly reflects an interaction of genetic and environmental factors, and traditional epidemiological studies have clearly defined the risk factors for breast cancer. Recent studies showed that established risk factors might have different effects on different intrinsic subtypes, although the results were inconsistent. One study showed that increasing body mass index (BMI) significantly reduced the risk of luminal A tumors among premenopausal women, and that increasing age at menarche was associated with a lower risk of the basal-like subtype. 5 Another study showed that age at first birth was associated with luminal A tumors, and that duration of lactation was inversely associated with the risk of basal-like tumors. 6 In a case-control study, parity had a protective effect on all subtypes except the basal-like subtype, breastfeeding was associated with the risk of luminal A, luminal B, and basal-like subtypes, and increasing age at menarche had a protective effect on luminal A and B subtypes. 7 More studies are still needed to clarify the disparity of breast cancer risk factors among subtypes, especially studies from Asian populations.
We conducted this study to evaluate the associations between common risk factors and breast cancer subtypes in a Chinese breast cancer cohort, and to summarize the commonality and heterogeneity of breast cancer epidemiological risk factors among breast cancer subtypes.

| Study population
This study was a hospital-based case-only study based on the Tianjin Cohort of Breast Cancer Cases (TBCCC).
TBCCC is an open prospective cohort study that was launched in 2004 and aims to support studies on breast cancer survival, treatment evaluation, disease progression, molecular subtypes, quality of life, and precision medicine among Chinese female breast cancer patients. [8][9][10][11][12][13] A total of 12 128 newly diagnosed breast cancer patients had been enrolled in TBCCC by March 2014, and an estimated 2000-3000 breast cancer patients per year will continue to be added to the open cohort. Newly diagnosed breast cancer patients were defined as patients who were first diagnosed with breast cancer by pathological examination within 6 months after admission to Tianjin Medical University Cancer Institute and Hospital (TJMUCH). All patients were followed up annually by telephone to collect information on recurrence, metastasis, mortality, and further examination and treatment after progression. The hospital information system at TJMUCH was used to confirm self-reported information on recurrence, metastasis, and further examination and treatment after admission. Established death registry data in the local region were used to confirm self-reported information on mortality. If patients could not be contacted by telephone, both established cancer registries and death registries were used to ascertain the prognosis of enrolled patients.
All patients in TBCCC must be Chinese residents. All patients were confirmed by pathological examination, and patients without a clear pathological examination were excluded. Moreover, patients who did not provide written consent and blood samples, or who refused the baseline survey and further follow-up, were excluded. In this study, male breast cancer patients and patients whose molecular subtype was unavailable and could not be imputed from relevant test results (see the section "Imputation of breast cancer molecular subtypes") were also excluded. Finally, a total of 8067 breast cancer patients with complete data were included in this study.

| Data collection
Data on demographic characteristics (age, race, marriage, education, occupation, income, insurance, etc.), reproductive factors (age at menarche, menopausal status, age at menopause, pregnancy, live birth, breastfeeding, abortion, etc.), lifestyle habits (smoking, alcohol drinking, diet, physical activity, etc.), and body size (height and weight) were collected by trained physicians through face-to-face questionnaire interviews. Imaging examination data were recorded on the case report form by experienced imaging physicians with at least 5 years of experience in breast imaging diagnosis. Pathology data were collected from the pathology report form recorded by pathologists with at least 3 years of experience in pathological diagnosis.

| Imputation of breast cancer molecular subtypes
The molecular subtype for a portion of patients was obtained from the pathology report form recorded by pathologists, and the rest was obtained by imputation. A random forest algorithm was used to construct a subtype classifier using the caret R package. 7 Molecular subtypes were predicted based on age, ER, PR, HER2, and Ki-67. The random forest algorithm is an ensemble of multiple decision tree models. Each tree is grown from a bootstrap sample of the training dataset, and each node is split using the best among a randomly selected subset of explanatory variables (features). The random forest algorithm thus injects randomness into the training of the trees and combines the output of multiple random trees into the final classifier. We split the cases with known molecular subtype into two groups: the training set (n = 837, known) and the testing set (n = 209, known). Using age, ER, PR, HER2, and Ki-67 as predictors, the random forest model was fitted and tuned on the training set. We then applied the constructed model to the testing set to evaluate the performance of the imputation, which reached an accuracy of more than 99%. Finally, the constructed model was used to impute the cases with unknown molecular subtypes (n = 7032). After imputation of breast cancer molecular subtypes, there were 4881 luminal A, 1296 luminal B, 1327 HER2-enriched, and 563 basal-like breast cancer cases in this study.
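The train, validate, and impute workflow described above can be illustrated with a minimal pure-Python sketch of a random forest (bootstrap samples, a random feature subset at each node, majority vote). This is not the caret model used in the study: the synthetic patients, the labelling rule in toy_label, the number of trees, and the 320/80 split are all assumptions made only so that the example is self-contained.

```python
import random
from collections import Counter

def toy_label(row):
    # HYPOTHETICAL labelling rule, used only to generate synthetic data;
    # it is not the subtype definition used in the study.
    age, er, pr, her2, ki67 = row
    if er or pr:
        return "luminal A" if (not her2 and ki67 < 14) else "luminal B"
    return "HER2-enriched" if her2 else "basal-like"

def make_patient(rng):
    # Synthetic features in the order: age, ER, PR, HER2, Ki-67 (%).
    return (rng.randint(30, 80), int(rng.random() < 0.7), int(rng.random() < 0.6),
            int(rng.random() < 0.25), rng.choice([5, 10, 20, 30, 50]))

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    # Return the (feature, threshold) pair with the lowest weighted Gini impurity.
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows})[:-1]:  # values <= t go left
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return None if best is None else best[1:]

def grow_tree(rows, labels, rng, mtry):
    if len(set(labels)) == 1:
        return labels[0]  # pure node becomes a leaf
    p = len(rows[0])
    # Try a random subset of features first; fall back to all features
    # if none of the sampled ones varies within this node.
    split = best_split(rows, labels, rng.sample(range(p), mtry)) \
        or best_split(rows, labels, range(p))
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            grow_tree([rows[i] for i in left], [labels[i] for i in left], rng, mtry),
            grow_tree([rows[i] for i in right], [labels[i] for i in right], rng, mtry))

def predict_tree(node, row):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are labels
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees=20, mtry=3, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], rng, mtry))
    return forest

def predict_forest(forest, row):
    # Majority vote across trees.
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

data_rng = random.Random(42)
rows = [make_patient(data_rng) for _ in range(400)]
labels = [toy_label(r) for r in rows]
train_rows, test_rows = rows[:320], rows[320:]        # roughly 80/20 split
train_labels, test_labels = labels[:320], labels[320:]

forest = fit_forest(train_rows, train_labels)
predicted = [predict_forest(forest, r) for r in test_rows]
accuracy = sum(p == y for p, y in zip(predicted, test_labels)) / len(test_labels)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

In this sketch, cases with an unknown subtype would be passed one at a time to predict_forest, which mirrors the final imputation step. The accuracy printed here describes only the synthetic data and says nothing about the performance reported for the study's classifier.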

| Statistical analysis
Continuous data and categorical data were expressed as mean ± SD and n (%), respectively. The chi-square test and one-way analysis of variance were used to compare the distributions of demographic characteristics and risk factors among the four subtypes of breast cancer. Multivariate logistic regression was used to estimate odds ratios and associated 95% confidence intervals, with luminal A patients serving as the reference group because luminal A was the most commonly diagnosed breast cancer subtype. The chi-square test was used to analyze the association of breast cancer subtypes with tumor markers, hormone levels, and clinical features. All statistical tests were two-sided, and P < 0.05 was considered statistically significant. All analyses were performed using SPSS 23.0 software (SPSS Inc., Chicago, IL, USA).
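To make the two main quantities concrete, the sketch below computes a Pearson chi-square statistic for a contingency table and an unadjusted odds ratio with a Wald 95% confidence interval for a single binary factor, comparing one subtype against the luminal A reference group. It is a simplification: the study reports adjusted odds ratios from multivariate logistic regression in SPSS, whereas this example uses one 2×2 table, and the counts are hypothetical.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, rt in zip(table, row_totals):
        for observed, ct in zip(row, col_totals):
            expected = rt * ct / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a, b: factor present / absent in the comparison subtype
    c, d: factor present / absent in the reference subtype (luminal A)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# HYPOTHETICAL counts, not taken from the study.
comparison = [60, 40]    # factor present, factor absent
reference = [100, 100]   # luminal A: factor present, factor absent

stat, dof = chi_square([comparison, reference])
odds_ratio, lower, upper = odds_ratio_ci(*comparison, *reference)
print(f"chi-square = {stat:.3f} (df = {dof})")
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

A confidence interval that includes 1 corresponds to no statistically significant association at the two-sided 0.05 level for that single factor.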

| RESULTS
Of the 8067 breast cancer patients, 60.5% (n = 4881) were classified as luminal A, 16.1% (n = 1296) as luminal B, 16.4% (n = 1327) as HER2-enriched, and 7.0% (n = 563) as basal-like breast cancer. Demographic characteristics of patients by molecular subtype are summarized in Table 1. Compared with patients with other subtypes, patients with the luminal B subtype were more likely to be younger at diagnosis (P < 0.001). There were statistically significant differences in marriage status, average monthly income per person, current occupation, and age at first marriage among the four subtypes of breast cancer (P < 0.05). However, no statistically significant difference was found in education among the four groups (P = 0.771).
Furthermore, we constructed a risk scoring system comprising 11 variables to summarize breast cancer risk for each patient. Each variable was divided into two categories, with 0 representing lower risk for breast cancer and 1 representing higher risk (Figure 1). We summed the 11 risk factors to calculate a total risk score for each patient. The average scores for the four subtypes were 4.6, 4.7, 4.6, and 4.6, respectively. The box plot showed that total risk scores for the four molecular subtypes were similar (Figure 1).
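The scoring scheme amounts to summing eleven 0/1 indicators per patient and averaging the totals within each subtype. The sketch below shows that arithmetic under stated assumptions: the 11 variables are not named in this section, so the indicator vectors are positional placeholders, and the three patients are hypothetical.

```python
from collections import defaultdict

N_FACTORS = 11  # the text specifies 11 binary variables; their names are not listed here

def total_risk_score(indicators):
    """Sum eleven 0/1 indicators (0 = lower risk, 1 = higher risk)."""
    if len(indicators) != N_FACTORS:
        raise ValueError(f"expected {N_FACTORS} indicators, got {len(indicators)}")
    if any(value not in (0, 1) for value in indicators):
        raise ValueError("each indicator must be 0 or 1")
    return sum(indicators)

def mean_score_by_subtype(patients):
    """Average the total risk score within each molecular subtype."""
    scores = defaultdict(list)
    for subtype, indicators in patients:
        scores[subtype].append(total_risk_score(indicators))
    return {subtype: sum(v) / len(v) for subtype, v in scores.items()}

# HYPOTHETICAL patients: (subtype, eleven placeholder indicators).
patients = [
    ("luminal A", [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0]),
    ("luminal A", [0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0]),
    ("basal-like", [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1]),
]

means = mean_score_by_subtype(patients)
for subtype, mean in means.items():
    print(f"{subtype}: mean total risk score = {mean:.1f}")
```

Because every factor carries equal weight, the total ranges from 0 to 11 and similar subtype means indicate a similar overall burden of the scored factors, not identical distributions of each individual factor.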
Of the 8067 cases, CA153, CA125, and carcinoembryonic antigen (CEA) were measured in 4803, 2047, and 3270 breast cancer patients, respectively. There was no statistically significant difference among subtypes in the proportion of patients with CA153 (P = 0.55), CA125 (P = 0.25), or CEA (P = 0.37) beyond the reference range (Figure 2). CA125, CA153, and CEA were measured simultaneously in 1867 breast cancer patients, and no significant difference was found for the three markers combined among the four subtypes (χ² = 3.653, P = 0.30). Prolactin (PRL) and testosterone were measured in 5000 and 5786 breast cancer patients, with 15.7% and 10.7% of patients, respectively, having values above or below the reference range. The proportions of abnormal PRL and testosterone differed significantly among breast cancer subtypes (P < 0.01), whereas the other four hormones (follicle-stimulating hormone, estradiol [E2], progesterone, and luteinizing hormone) did not differ significantly among the four subtypes (P > 0.05) (Figure 2).
The tumor characteristics of patients by molecular subtype are shown in Table 3. The proportion of calcification was highest in the luminal B (73.3%) and lowest in the basal-like (53.6%) subtype. Luminal A cases were more likely to have American Joint Committee on Cancer (AJCC) early stage (65.9%) and tumor size ≤2 cm (50.7%). Compared with luminal A cases, calcification was associated with increased odds of the luminal B and HER2-enriched subtypes and with lower odds of the basal-like subtype. HER2-enriched and basal-like breast cancers more often presented with large tumors. There was no significant association between molecular subtype and lymph node metastasis (P = 0.114).

TABLE 2 (Continued), footnotes: HER2, human epidermal growth factor receptor 2; BMI, body mass index; HRT, hormone replacement therapy. a Odds ratios were adjusted for age, marriage, education, average monthly income, current occupations, age at first marriage, and all above risk factors for breast cancer. b Statistically significant at P < 0.05.

| DISCUSSION
In this hospital-based case-only study, more commonality than heterogeneity in the distributions of factors was found among the four molecular subtypes of breast cancer. The differences among the four molecular subtypes were mainly manifested in clinical characteristics such as calcification, stage, tumor size, and mammary gland-related hormone levels. Overall, risk factor scoring indicated that total risk scores for the four molecular subtypes were similar. Many studies have evaluated the association between breast cancer risk factors and breast cancer subtypes. A case-control study in East Asian women found that overweight, late menopause, and lack of breastfeeding appeared to increase the risk of both luminal and ER-/PR- tumors. 15 In a cross-sectional study of 7020 patients, Brouckaert et al. found that BMI was linearly related to the probabilities of luminal B and HER2-like breast cancer subtypes. 16 Phipps et al. found that breast density was similarly positively associated with the risk of all subtypes, and that BMI was positively associated with the risks of ER-positive and triple-negative breast cancer. 17 Au et al. suggested a correlation between the occurrence of luminal-like breast cancer subtypes and low parity and short or no duration of breastfeeding. 18 In a nested case-control study, number of pregnancies was inversely associated with the relative risk of luminal-like breast cancers, and hormone therapy use was strongly associated with the risk of luminal-like breast cancer. 19 In a study of reproductive factors and risk of triple-negative breast cancer (TNBC), breastfeeding decreased the risk of TNBC. 20 These studies are consistent with our findings that a few factors were differently associated with certain subtypes, but that in general there is no substantial difference.
Researchers may emphasize specific risk factors for specific subtypes, yet these factors often cannot be replicated well in other studies. The disparity may be caused simply by chance, but the underlying biological role of certain factors should not be overlooked and needs further research. In this study, women with menarche age ≤13 years were more likely to be found in the basal-like subtype. In a meta-analysis of 12 populations, Yang found that women with menarche age <12 years had 1.16 times the risk of ER-positive tumors. 21 Ma et al. showed that late menarche age can reduce the risk of all subtypes of breast cancer. 22 Our study found that postmenopausal women were more frequent in the HER2-enriched and basal-like subtypes. A population-based case-case study of 2710 women found that age at menopause was positively associated with the odds of triple-negative breast cancer. 23 In a case-control study in Southeast Asia, late age at menopause was associated with an increased risk of luminal and basal-like tumors. 15 In our study, women with benign breast disease and higher breast density were more common in the HER2-enriched subtype, whereas Holm et al. did not find a significant association between benign breast disease and breast cancer subtypes. 7 The association between breast density and breast cancer subtypes is still uncertain. 17,24,25 In our study, reproductive factors such as number of pregnancies, number of live births, and breastfeeding did not differ among the four subtypes. Current findings on reproductive factors are inconsistent. 20,22,[26][27][28] These findings require confirmation in other studies, and further research is needed to establish the association between these factors and breast cancer subtypes.
The differences among breast cancer molecular subtypes are mainly manifested in tumor characteristics. In this study, we found that serum CA153, CEA, and CA125 were not statistically different among the four groups. Similar to our results, Moazzey et al. reported that CA153 and CEA were not significantly different among subgroups. 29 We found that hormones such as PRL and testosterone differed significantly among subtypes of breast cancer. Similar to our findings, Hachim et al. found that PRLR expression was highest in the luminal A subtype, 30 and Guo et al. found that testosterone increased the risk of ER+ breast cancer. 31 Furthermore, Cen et al. found that calcification is associated with luminal A and HER2-enriched subtypes, 32 consistent with our results. In our study, there was a significant difference in tumor stage and tumor size. A retrospective study of Chinese women 33 found that tumor size, lymph node metastasis, and AJCC tumor stage differed among molecular subtypes. HER2-enriched breast cancer had more lymph node metastasis and a higher AJCC tumor stage.
Although our study benefits from a large sample size and comprehensively collected data on a large number of breast cancer risk factors and clinical factors, as well as imaging examination data, several limitations must be acknowledged. First, our study was not designed as a case-control study, which made it difficult to quantify the exact risk for the development of breast cancer subtypes. However, other studies have also evaluated differences among breast cancer subtypes through case-case designs like ours. [34][35][36] Second, subtype information from the pathology report form recorded by pathologists was available for only a small proportion of patients, and the subtype was predicted for the rest. Regrettably, we could not replicate the results in publicly available databases such as The Cancer Genome Atlas (TCGA) because of incomplete information on the necessary parameters. However, the use of this subtype classifier may have improved accuracy compared with a previously used immunohistochemistry (IHC)-based method. Further validation is warranted.

| CONCLUSION
In conclusion, most breast cancer risk factors and tumor markers are similar across breast cancer subtypes, except for a few factors in certain subtypes, and the differences are not substantial. The differences among breast cancer molecular subtypes are mainly manifested in tumor characteristics such as calcification, stage, and tumor size, and in mammary gland-related hormone levels. The molecular classification of breast cancer is of great significance in guiding clinical work.