Machine learning‐based model constructed from ultrasound radiomics and clinical features for predicting HER2 status in breast cancer patients with indeterminate (2+) immunohistochemical results

Abstract Background We aimed to predict human epidermal growth factor receptor 2 (HER2) 2+ status in patients with breast cancer by constructing and validating machine learning models utilizing ultrasound (US) radiomics and clinical features. Methods We analyzed 203 breast cancer cases immunohistochemically determined as HER2 2+ and used fluorescence in situ hybridization (FISH) as the confirmation method. From each case, the study analyzed 840 extracted radiomics features and 11 clinicopathologic features. Cases were randomly split into training (n = 141) and validation sets (n = 62) at a 7:3 ratio. Univariate logistic regression analysis was first performed on the 11 clinicopathologic characteristics. The least absolute shrinkage and selection operator (LASSO) and decision tree (DT) techniques were employed for post‐feature selection. Finally, 19 radiomics features were utilized in logistic regression (LR) and Naive Bayesian (NB) classifiers. Model performance was gauged using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity. Results Our models exhibited notable diagnostic efficacy in differentiating HER2‐positive from negative breast cancer cases. In the validation sets, the LR model outperformed the NB model with an AUC of 0.860 and accuracy of 83.8% compared to NB's AUC of 0.684 and accuracy of 79.0%. The LR model demonstrated higher sensitivity (92.3% vs. 46.2%) while the NB model had a better specificity (91.8% vs. 63.3%) in the validation set. Conclusions Machine learning models grounded on radiomics efficiently predicted IHC HER2 2+ status in breast cancer patients, suggesting potential enhancements in clinical decision‐making for treatment and management.


| INTRODUCTION
Breast cancer is a heterogeneous tumor that can be classified into four major molecular subtypes: luminal-like (luminal A and luminal B), HER2-positive, and basal-like. 1 The prognosis varies significantly among these molecular subtypes, prompting extensive research into the molecular diagnosis of breast cancer. 2HER2 overexpression, which arises from the amplification of the ERBB2/neu proto-oncogene located on the centromere of chromosome 17 (CEN17), is found in approximately 15%-20% of breast cancers. 3HER2-positive breast cancers are notably aggressive, prone to early recurrence, and linked to a poor prognosis. 4,5Despite this, treatments such as trastuzumab (Herceptin) effectively target HER2 protein, benefiting affected patients. 6HER2 status is primarily assessed by immunohistochemistry (IHC).According to the ASCO/ CAP guidelines, IHC results are categorized as negative (0 or 1+), equivocal (2+), or positive (3+).For the equivocal IHC HER2 (2+) results, a follow-up with in situ hybridization (ISH), mainly fluorescence in situ hybridization (FISH), is required. 7HER2 expression, however, has shown heterogeneity within tumors in almost 40% of cases. 8,9Manual HER2 status determination is cumbersome and prone to error, with diagnostic variability among pathologists leading to misdiagnosis in a significant number of cases. 10,113][14] Furthermore, changes in HER2 expression are observed in 20%-40% of patients undergoing neoadjuvant chemotherapy. 15,16The heterogeneity of HER2 within tumors has been identified as an independent factor linked with inadequate response to neoadjuvant chemotherapy in HER2-positive patients. 17onsequently, enhancing the accuracy and reproducibility of HER2 assessment is crucial.In current clinical practice, real-time assessment of tumor HER2 expression through multiple preoperative CNBs presents challenges due to the intra-tumor heterogeneity of HER2.This makes achieving consistent clinical detection and tailoring individualized treatments difficult. 18While ISH testing offers a potential solution, its high cost, time-intensive nature, potential delay in treatment, and limited common use make it less viable.Essentially, there's a crucial need for an accurate, convenient, and non-invasive method to predict HER2 expression status, facilitating tailored breast cancer treatment to enhance patient prognosis.
Clinical diagnosis of breast tumors relies on imaging examinations, including mammography, magnetic resonance imaging (MRI), and ultrasonography (US).While mammography is essential for breast cancer screening in developed countries, its efficacy diminishes for dense breast tissue. 19,20MRI boasts high sensitivity but may be inaccessible due to its cost.In contrast, US offers real-time, multi-angle evaluations, is non-invasive, and radiation-free, making it ideal for diagnosing breast diseases, especially in dense glands. 21,22A form of artificial intelligence (AI) called machine learning (ML), which has recently gained popularity in medicine, uses features from sample data to identify patterns.This has paved the way for computer-aided diagnosis (CAD) systems in breast cancer. 23Currently, numerous ML models are employed to construct CAD systems for breast cancer, such as Naive Bayesian (NB), support vector machine (SVM), and logistic regression (LR).These techniques facilitate data categorization and prediction based on specific features extracted from medical images. 246][27] Notably, imaging has been identified as a means to capture tumor biology at genetic and cellular levels. 28ome studies have discovered correlations between imaging features and HER2-positive breast cancer subtypes.For instance, one study found a correlation between extracted peritumor MRI imaging radiomics features and HER2 expression in breast cancer. 29Numerous more investigations revealed a link between radiomics features taken from breast mammography and MRI images and HER2-positive subtypes of breast cancer. 30,31n light of the aforementioned findings, our study aimed to investigate the relationship between clinicopathologic factors, US radiomics, and HER2 status.Furthermore, we sought to construct an optimal machine learning model to assist in diagnosing patients with indeterminate IHC results.

| Patients and cohorts
Between January 1, 2017, and December 31, 2017, clinical and image data were retrospectively retrieved from the Picture Archiving and Communication System (PACS) at Zhejiang Cancer Hospital.At the hospital, all of the enrolled patients underwent breast surgery.This retrospective study did not require informed consent, guaranteeing that the data were analyzed anonymously.The final results did not include any personal identification.The study was approved by the Zhejiang Cancer Hospital's Institutional Ethics Committee (IRB-2022-185).

| Pathology assessment
IHC staining scores defined by the College of American Pathologists and the American Society of Clinical Oncology (CAP/ASCO) were used to assess HER2 protein levels in breast cancer tissue samples, 32 scores 0/1+ denoted "HER2-negative," 2+ as "HER2 indeterminate," and 3+ as "HER2-positive."FISH determined the final HER2 expression, with a HER2/CEN-17 ratio <2 indicating non-amplification and ≥2 indicating amplification.
Lesions were enrolled using the following criteria.13 HER2-positive).We collected clinical data that included age, menopausal status, T stage (T1 for tumor size ≤20 mm; T2 for 20 mm < tumor size ≤50 mm; T3 for tumor size >50 mm), laterality and quadrant of the breast lesion, family tumor history, levels of estrogen receptor (ER), progesterone receptor (PR), Ki-67 expression, molecular subtype, and axillary lymph node (ALN) status.All of the patients had either invasive ductal carcinoma (IDC) or ductal carcinoma in situ (DCIS).

| US images acquisition and segmentation
US evaluations utilized various instruments: LOGIQ E9, Toshiba Aplio 400, Philips iU22, and Siemens S 2000.For the conventional US examinations, a high-frequency probe model was adopted.Preoperative 2D US images were compiled, and the 3D slicer (version 5.0.3)platform 33 was employed for analysis.The image parameters were consistent across patients, with a gain set at approximately 50%, an image depth ranging from 3 to 7 cm, and focusing parallel to the lesion.The imaging process ensured clear visualization of the largest section of the lesion, and all ultrasound images were stored in digital imaging and communications in medicine (DICOM) format.The region of interest (ROI) for each tumor was identified by manually outlining the tumor on the transverse view of the 2D grayscale US image in DICOM format, as illustrated in Figure 1.A junior radiologist initially marked the contours, followed by verification by a senior radiologist.Discrepancies were resolved through consensus, with both parties blinded to patient details.

| Features selection and models construction
The model's workflow is depicted in Figure 2. We first screened the training set using univariate logistic regression analysis on 11 clinicopathologic features: age (both as a continuous variable and with a cut-off at 50 years), menopausal status, family history, T stage, laterality and quadrant of the breast lesion, expression of ER, PR, and Ki-67, and ALN metastasis status.The clinicopathologic features identified during screening were subsequently subjected to data downscaling and feature screening, in conjunction with ultrasonographic radiomics features.Z-score and mean normalization methods were employed to standardize the respective characteristics of each patient.Feature selection employed two strategies for reproducibility.The least absolute shrinkage and selection operator (LASSO) was employed to determine the strongest predictive features for HER-2 status from the set of training data (Figure 3), and the Decision Tree (DT) calculated feature importance.The LASSO is a regularization algorithm.In contrast to ridge regression, the LASSO algorithm incorporates the L1 criterion as a penalty function into the cost function of standard linear regression.It iteratively updates the values of the weight coefficients until an optimal solution is reached.This process involves compressing the weight coefficients of unimportant features to zero, facilitating feature selection. 35The appropriate LASSO α parameters were chosen utilizing 10-fold cross-validation on the training set.From 841 features, 86 with non-zero coefficients were screened.After employing the LASSO method, DT is utilized to further screen features.The DT construction process forms a high-resolution feature subset by ranking features with information gain, adding top-ranked features as nodes, halting tree growth to prevent overfitting, implementing pruning for size reduction and overfitting mitigation, using a default 25% confidence threshold, determining leaf node characteristics by majority rule, handling misclassifications in small leaf node samples, and generating if-then rules from the final decision tree.This ensures capturing relevant features while avoiding overfitting. 36,37Ultimately, we selected 19 US radiomics features for ML model construction, as depicted in Figure 4.The most robust and significant selected features were put into logistic regression (LR) and Naive Bayesian (NB) classifiers, validation for the prediction of HER2 status.The performance of the ML models was evaluated using the receiver operating characteristic (ROC) curve, with attention to the area under the curve (AUC) for both training and validation sets.Additionally, accuracy (ACC), sensitivity (SEN), and F I G U R E 3 LASSO coefficient solution path of the 86 features.86 features with non-zero coefficients were selected from 841 features.

| Patient characteristics
Two hundred three patients with breast cancer were examined for HER2 2+ status by IHC.Subsequent clarification using FISH categorized the patients into HER2-negative (n = 161) and HER2-positive groups (n = 42).The comparative clinicopathological features of the two groups are detailed in Table 1.Notably, the HER2-positive group exhibited a significantly higher Ki67 expression index compared to the HER2-negative group (p = 0.027).Moreover, molecular subtypes varied significantly between the two groups (p < 0.001).The fundamental clinicopathological characteristics of the patients, split between the training and validation sets, are demonstrated in Table 2.No notable differences were observed between the two sets concerning age, menopausal status, family tumor history, HER2, ER, PR, Ki-67 expression levels, T stage, lateralization and quadrant of breast lesions, and molecular subtypes (all p > 0.05).However, the training set had a higher incidence of axillary invasion compared to the validation set, with this difference being statistically significant (p = 0.031).

| Feature dimensionality reduction and screening
For each patient, 840 quantitative US radiomics features were retrieved.Out of the 840 radiomics features, calculations were performed for 162 first-order features, 216 GLCM parameters, 126 GLDM parameters, 144 GLRLM parameters, 144 GLSZM parameters, 45 NGTDM parameters, and 3 shape-based parameters.Univariate logistic regression indicated a significant difference in PR between the HER2-positive and negative groups within the training set (p = 0.017) as shown in Table 3.Following this, we combined 840 radiomics features with the clinical factor PR, resulting in 841 features, for further analysis.LASSO screening identified 86 features with non-zero coefficients.Following this, we applied DT analysis to determine the relative importance of each feature, primarily leveraging the information gain rate.This analysis culminated in the selection of 19 crucial radiomics features.Figure 4 depicts these 19 key features vital for HER2 status prediction.Notably, ML approaches enabled the successful development of HER2 prediction models using the training set, and their efficacy was subsequently validated by the validation set.

| Machine learning model performance for prediction of HER2 status
Table 4 presents the predictive performance of both the LR and NB models.Figure 5 illustrates the results of the confusion matrix, while Figure 6 displays the ROC analysis outcomes.Figure 7  value of the LR model was significantly higher than that NB model (DeLong test, p = 0.031) in the validation set.These results underscore the LR model's robustness and its efficacy in accurately classifying and predicting outcomes based on the given data.In contrast, while the NB model exhibited a lower performance in terms of AUC, accuracy, and sensitivity, it was noteworthy for its impressive specificity, which stood at 91.8% in the validation set.This indicates that the NB model possesses a high capability to correctly identify negative cases, making it a valuable tool in contexts where the cost of false positives is high, and specificity is of paramount importance.

| DISCUSSION
Patients with HER2-positive breast cancer commonly respond favorably to neoadjuvant chemotherapy with the HER2-targeted agent trastuzumab, leading to improved disease-free survival and overall survival. 38,39Thus, HER2 status remains pivotal for determining breast cancer prognosis and tailoring optimal treatment strategies.
Radiomics involves extracting and analyzing quantitative imaging features from modalities like US and CT.These features, often subtle to human eyes, can be quantified to correlate with the lesion's intrinsic qualities.This correlation enables the imaging phenotype to align with genetic information, aiding in disease diagnosis, individualized treatment strategies, and prognosis prediction. 28,40,41ISH was employed in this work to ascertain HER2 2+ status and to assess the efficacy of the variables in classifying HER2 2+ status.We examined the concordance between our ML predictions based on radiomics features and the FISH assay results.A study involving 1951 cases revealed that HER2-positive patients with breast cancer tended to have a higher Ki-67 proliferation index, 42 which aligns with our findings.In our cohort, 88.1% of HER2positive patients had a high Ki-67 proliferation index (37/42; p = 0.027).A meta-analysis of 12 papers discerned a relationship between breast US features and HER2 expression, identifying three US features with significant associations to HER2 overexpression: microcalcifications on US imaging, indications of malignancy on US, and irregular breast mass shapes on US. 43 Although some correlation emerged, the results were variable and somewhat limited, underscoring the insufficiency of US features alone in comprehensively characterizing lesions, leading us to further investigate radiomics.
Previous studies have documented the diagnostic efficacy of various ML algorithms in the context of breast cancer.LR is advantageous for breast cancer diagnosis with medical imaging radiomics due to its interpretability, efficiency in binary classification, probabilistic outputs, adaptability to linear and non-linear relationships, low risk of overfitting, efficiency with small datasets, suitability for feature selection, and easy implementation. 44,45Jing Zhuo et al. 18 reported that, in the application of mammography (MG) radiomics, a LR model demonstrated superior performance (AUC = 0.787) compared to the SVM model in predicting the HER2 status of patients with breast cancer.NB is well-suited for handling high-dimensional data. 24Matthias Benndorf et al. 46 developed and validated a NB model for breast mass lesions using standardized descriptive terminology (BI-RADS lexicon) to mitigate variability in practice (AUC = 0.935).Woojae Kim et al. 47 reported recruiting a total of 679 patients and constructing a nomogram based on Naive Bayes (NB) modeling to predict recurrence within 5 years after breast cancer surgery, achieving an AUC of 0.81.We deployed two ML models, LR and NB models, based on US radiomics to forecast HER2 status in patients preliminarily diagnosed as HER2 2+ via IHC.Given that IHC can discern HER2-negative (0/1+) from HER2-positive (3+) during the initial stages, accurately establishing the eventual status of HER2 2+ is becoming increasingly important in clinical choice-making.When comparing the predictive performance of both models, the LR model clearly outshined.For the validation set, its accuracy and AUC stood at 83.8% and 0.860 (95% CI: 0.727-0.994)respectively.While the LR model boasted a higher sensitivity of 92.3% in the validation set, the NB model impressively registered a specificity of 91.8%.According to a study conducted by Zejun Jiang et al., AUCs for texture analysis-based ML models of breast dynamic contrast enhancement-magnetic resonance (DCE-MR) images for predicting HER2 2+ patients with breast cancer ranged from 0.808 to 0.865. 48However, the sample size of that study was very small, with only 73 cases and 279 texture features and there was no validation set to validate the results.While our models exhibited promise, it's crucial to emphasize their current role as potential adjuncts rather than replacements for FISH testing in differentiating HER2 IHC 2+ breast cancer patients.It may help to classify FISH cases or collaborate with oncologists in formulating initiate treatment plans.Our research advocates for the potential utility of ML classifiers derived from radiomics features in ultrasound images as a non-invasive methodology to ascertain HER-2 status.However, this study is not devoid of limitations.(1) Firstly, this study was a single-center study and lacked external validation sets.The predictive sample size for this study was relatively small.Multi-center studies are needed to bolster heterogeneity and model adaptability.(2) Secondly, this study was retrospective, accompanied by sample imbalances and inevitable selection bias, which may impact the results of variable analysis.Future large-scale and prospective research should be undertaken to mitigate this error.(3) Thirdly, manual ROI delineations may not comprehensively represent lesions, and it is time-consuming and laborintensive.Applying ML models alone cannot achieve the automatic identification and segmentation of lesions.Therefore, the implementation of AI-automated operating systems will be pursued in the future.(4) We collected only 2D grayscale US images for radiomics analysis; multi-modal US features, such as US elastography, were not included.This suggests a potential shift toward three-dimensional imaging modalities, like automated breast ultrasound (ABUS), and apply multimodal US features in future endeavors.
The understanding and classification of HER2 are continually evolving.In recent times, the category of "HER2 low" breast cancers has been introduced.These primarily encompass tumors designated as IHC 1+ or 2+ but are ISH-negative. 49 therapies.In this light, ML models leveraging US images offer promising avenues.They can discern predictive information for IHC HER2 2+ cases, providing valuable insights into the potential HER2 outcome.Furthermore, the ML models we constructed can be employed repeatedly, both pre-and post-treatment, offering guidance for HER2-targeted therapies.This approach paves the way for tailored treatments and precision medicine in breast cancer care.
In conclusion, our machine learning models, particularly LR, show promise for deciphering indeterminate IHC HER2 (2+) results in breast cancer, offering valuable insights for targeted therapies.While not a replacement for FISH tests, these models enhance decision-making for individual patient needs.Validation with extensive external datasets is crucial before clinical integration.This research sets the stage for future explorations, including multi-modal deep learning in upcoming studies.
Inclusion criteria encompassed: (1) breast cancer confirmation by postoperative pathology; (2) postoperative specimen IHC results indicating HER2 2+; (3) definitive verification of HER2 status by FISH with clear results; (4) breast ultrasonography conducted within 2 weeks before surgery; (5) clear US images with identifiable lesions.Exclusion criteria were: (1) incomplete clinical data or subpar US quality; (2) lack of postoperative IHC or FISH HER2 status verification, cases with heterogeneous FISH results, or IHC score of 2+ without subsequent FISH testing; (3) preoperative neoadjuvant chemotherapy or radiotherapy; (4) multifocal lesions.Out of 203 female patients that fit the criteria, 161 were HER2-negative, and 42 were HER2-positive, with a median age of 53 (range 26-85) years.The dataset was split into a 7:3 ratio, designating 141 patients for training (112 HER2-negative; 29 HER2-positive) and 62 for validation (49 HER2-negative; grayscale ultrasound image of breast lesion.(B) Manually outline the region of interest using software 3D slicer.

F I G U R E 2
Flowchart of the overall process of constructing the machine learning models for predicting HER2 2+ status.| 5 of 14 YAN et al.

F I G U R E 5
Results of Confusion matrix.(A) Prediction results of the logistic regression (LR) model on the training set; (B) prediction results of the LR model on the validation set; (C) prediction results of the Naive Bayesian (NB) model on the training set; (D) prediction results of the NB model on the validation set.
Clinically, interpreting HER2 can be challenging when FISH tests indicate HER2 heterogeneity.Accurate classification becomes paramount, especially since tumors with a lower range of HER2 expression might respond favorably to new targeted F I G U R E 6 ROC curves of the machine learning model for the prediction of HER2 2+ status.(A) Training set of the LR model; (B) validation set of the LR model; (C) training set of the NB model; (D) validation set of the NB model.
provides the AUC values of the models for both training and validation sets.In comparing the performance metrics of the LR model and the NB model, Note:The bold values mean p value <0.05, representing significant differences.Abbreviations: ALN, axillary lymph node; CI, confidence interval; ER, estrogen receptor; HER2, human epidermal growth factor receptor-2; OR, odds ratio; PR, progesterone receptor.