Single‐cell protein profiling defines cell populations associated with triple‐negative breast cancer aggressiveness

Triple‐negative breast cancer (TNBC) is an aggressive and complex subtype of breast cancer that lacks targeted therapy. TNBC manifests characteristic, extensive intratumoral heterogeneity that promotes disease progression and influences drug response. Single‐cell techniques in combination with next‐generation computation provide an unprecedented opportunity to identify molecular events with therapeutic potential. Here, we describe the generation of a comprehensive mass cytometry panel for multiparametric detection of 23 phenotypic markers and 13 signaling molecules. This single‐cell proteomic approach allowed us to explore the landscape of TNBC heterogeneity, with particular emphasis on the tumor microenvironment. We prospectively profiled freshly resected tumors from 26 TNBC patients. These tumors contained phenotypically distinct subpopulations of cancer and stromal cells that were associated with the patient's clinical status at the time of surgery. We further classified the epithelial‐mesenchymal plasticity of tumor cells, and molecularly defined phenotypically diverse populations of tumor‐associated stroma. Furthermore, in a retrospective tissue‐microarray TNBC cohort, we showed that the level of CD97 at the time of surgery has prognostic potential.

Triple-negative breast cancer (TNBC) is an aggressive and complex subtype of breast cancer that lacks targeted therapy. TNBC manifests characteristic, extensive intratumoral heterogeneity that promotes disease progression and influences drug response. Single-cell techniques in combination with next-generation computation provide an unprecedented opportunity to identify molecular events with therapeutic potential. Here, we describe the generation of a comprehensive mass cytometry panel for multiparametric detection of 23 phenotypic markers and 13 signaling molecules. This single-cell proteomic approach allowed us to explore the landscape of TNBC heterogeneity, with particular emphasis on the tumor microenvironment. We prospectively profiled freshly resected tumors from 26 TNBC patients. These tumors contained phenotypically distinct subpopulations of cancer and stromal cells that were associated with the patient's clinical status at the time of surgery. We further classified the epithelial-mesenchymal plasticity of tumor cells, and molecularly defined phenotypically diverse populations of tumor-associated stroma. Furthermore, in a retrospective tissue-microarray TNBC cohort, we showed that the level of CD97 at the time of surgery has prognostic potential.

Introduction
Triple-negative breast cancer (TNBC) is a profoundly heterogeneous subtype of breast cancer with characteristic aggressive behavior and poor outcome [1,2]. Because TNBC lacks the expression of estrogen (ER), progesterone (PR), and human epidermal growth factor 2 (HER2) receptors, cytotoxic chemotherapy remains the treatment of choice for early-stage and advanced TNBC. However, multiple clinical trials with targeted approaches and immunotherapies are ongoing [3].
Intra-and intertumoral heterogeneity are major obstacles in the effective clinical management of patients. Both were in part molecularly dissected by a number of studies that identified multiple TNBC subtypes, with each exhibiting unique biological features [4]. The intratumoral heterogeneity is driven by genetic, epigenetic, and phenotypic inputs within the cancer cells and extrinsic factors from the tumor microenvironment (TME). Such heterogeneity can then influence tumor progression and therapeutic response [5]. Phenotypic plasticity of TNBC is often attributed to epithelial-mesenchymal transition (EMT), a mechanism that generates a subpopulation of highly motile cells, often with increased ability to seed new tumors and capacity to self-renew and differentiate [6,7]. EMT and its reverse program mesenchymal-epithelial transition (MET) are key advantageous developmental programs hijacked by tumor cells to support their dissemination from the primary site. While EMT and acquisition of mesenchymal phenotype (M) is critical for cancer cell spread, subsequent reversion back to the epithelial state (E) is a prerequisite for successful metastatic outgrowth at distant sites [5,8,9].
Selected tumor cell populations can dynamically switch between EMT and MET and exist in a spectrum of hybrid E/M states, partially bearing features of both phenotypes. Cells in these hybrid states are presumed to have the highest tumorigenic and metastatic potential [10,11]. Both EMT/MET programs are influenced by a number of factors and signaling molecules, including the stereotypical EMT inducer TGF-b and SMAD signaling pathways, NF-jB, JAK-STAT proteins, PI3K/AKT/mTOR or Wnt [5,12]. These stimuli are usually governed by TME, comprised of different populations of infiltrating immune cells, stromal cells such as fibroblasts, pericytes, endothelium, and other cell types together with extracellular matrix [13]. Various elements of TME play both pro-and anti-tumor roles, and TME represents a promising therapeutic target [13,14]. An ample effort is currently being made toward the detailed characterization of tumor-TME interactions at single-cell and spatial levels. Several recent single-cell transcriptomic reports further confirmed that TNBC tumors consist of distinct subpopulations of tumor cells [15,16], cancerassociated fibroblasts [17], and immune cells [18] with clinically relevant transcriptomic signatures. The observed gene expression changes are not always reflected at the level of protein expression. Mass cytometry overcomes this limitation by employing metal-conjugated antibodies, enabling the quantification of dozens of proteins and phospho-epitopes in individual cells simultaneously [19].
In this study, we applied such mass cytometric profiling to a cohort of prospective TNBC patients. We comprehensively mapped phenotypic TNBC diversity and signaling status at the protein level. Our data revealed the presence of distinct cancer and stromal phenotypes within TNBC tumors, and their association with patient clinical status. Validation of selected markers in expanded retrospective cohorts with routine histology techniques further corroborated their stratifying potential.

Cell lines
Breast cancer cell line MDA-MB-231 (RRID:CVCL_ 0062) was obtained from the American Type Culture Collection (ATCC, Manassas, Virginia, USA) and used as an internal control. Cells were cultured in RPMI 1640 medium (Thermo Fisher Scientific, Waltham, Massachusetts, USA, TFS) supplemented with 10% fetal bovine serum (TFS) and 100 UÁmL À1 penicillin/streptomycin (Sigma-Aldrich, Merck-Millipore, Darmstadt, Germany). Cells were maintained at 37°C and 5% CO 2 , routinely tested for mycoplasma contamination with PCR and authenticated using AmpFLSTR Identifiler Plus PCR Amplification Kit (TFS) to verify their origin. Peripheral blood mononuclear cells (PBMCs) were obtained from a healthy donor with written informed consent in accordance with the Declaration of Helsinki. PBMCs were isolated as a buffy coat layer on Ficol-Paque gradient. Isolated PBMCs and the MDA-MB-231 cell line were incubated with CD24 and ROR1 antibodies for 30 min at room temperature (Table 1). After washing once with Maxpar Cell Staining Buffer (CSB) (Fluidigm, South San Francisco, CA, USA), cells were stained with 1 lM cisplatin for subsequent dead cell exclusion

Mass cytometry barcoding
To achieve parallel sample analysis and thus minimize the batch effect, we used metal tagged antibody-based barcoding approach. 0.1-1 9 10 6 cells from each dissociated tumor sample, as well as MDA-MB-231 cells and PBMCs (internal controls) were barcoded using a barcoding scheme consisting of combinations of HLA-I and CD298 antibodies conjugated to platinum ( 195 Pt and 196 Pt) and cadmium ( 106 Cd, 110 Cd, 111 Cd, 112 Cd, 113 Cd, 114 Cd, and 116 Cd) isotopes (see Table 1). Antibodies were purchased from Fluidigm or conjugated in-house using Maxpar antibody conjugation kits (Fluidigm). Thawed individual tumor samples, cell line, and PBMCs were incubated with antibodies for 30 min at room temperature, washed with CSB buffer, and pooled together for subsequent antibody staining.

Antibodies and antibody labeling
All antibodies, clones, metal tags, and providers are listed in Table 1. Metal-labeled antibodies were prepared using the Maxpar antibody conjugation kits (Fluidigm), according to the manufacturer's instructions or purchased pre-conjugated. Each antibody was titrated and validated into the working panel to achieve an optimal signal-to-noise ratio.

Mass cytometry antibody staining and detection
After barcoding, the pooled samples were stained with a mastermix of surface antibodies for 30 min at room temperature; except for CD24 and ROR1 that were stained before fixation and freezing at À80°C. Samples were then washed twice with CSB buffer. Next, the cells were permeabilized with 80% methanol for 30 min on ice and washed with CSB buffer. For intracellular staining, the cells were incubated with a mastermix of intracellular antibodies for 30

Data preprocessing and analysis
Data were normalized in FLUIDIGM software based on EQ beads and compensated for channel crosstalk, as previously described [20]. These compensated .fcs files were processed with FLOWJO software (v10.8.0, BD). Cells were gated for singlets, de-barcoded, and cisplatin-positive dead cells were excluded from the analysis. The following major cell types of interest were identified by manual gating using selected marker expression pattern: PanCK + /CD45 À cancer cells, CD45 + PanCK À immune cells, CD90 + /PanCK À /CD45 À stromal cells. Representative gating strategy is shown in Fig. S1A. The total amount of collected cells from measured samples ranged from 850 to 174 000. The resulting, clean dataset contains in total 73 532 cancer cells, 560 218 immune cells, and 156 928 stromal cells.
For the purposes of unsupervised data analysis, we utilized recently published Risk Assessment Population IDentification (RAPID) algorithm [21], enabling identification of phenotypically distinct populations and determining whether they stratify patient survival. Twenty-six manually pre-gated .fcs files were exported, and loaded into RSTUDIO (4.1.1 version) [22]. Here, each file was downsampled to the same cell number and combined into new data frame. Because some samples contained lower number of cells they were excluded for further unsupervised analysis of epithelial cells (21 samples analyzed in total) and stromal fraction (24 samples analyzed in total). This preprocessed mass cytometric data frame, together with .csv file containing annotated clinical parameters of tumor specimens, was subjected to a modified RAPID algorithm. Dimensionality reduction was calculated based on 36 markers using the R T-SNE package [23]. The following parameters were used: iteration = 10 000, perplexity 200, theta 0.9, and eta 200. The optimal number of clusters was computed based on the t-SNE map using the modified script, followed by FlowSOM clustering [24].
Because our dataset lacked information about the patient outcome, on which the RAPID algorithm is based, we introduced a Ki-67 + LNR index that consisted of clinical assessment of cancer spread to lymph nodes and Ki-67 positivity, as determined by diagnostic histology analysis for each patient. This clinically relevant Ki-67 + LNR index was then implemented for modified RAPID analysis instead of patient survival.
To calculate Ki-67 + LNR index for each patient we used two variables, LNR ("lymph node ratio" = number of positive lymph nodes over number of excised lymph nodes) and Ki-67 that was normalized to maximum value from all tumors. For each analyzed cell in the dataset, we stored also information from what tumor/patient the cell originated and the cluster number in which the cell was identified by FlowSOM. We took normalized Ki-67 and LNR values of single tumors and assigned them to each analyzed cell in dataset as new variables. Then we selected cells of separate FlowSOM clusters and calculated the means of normalized Ki-67 and LNR variables. Finally, a sum of these two means results in Ki-67 + LNR index that acquires values from 0 to 2.
Mesenchymal-epithelial (MET) score was calculated for each cancer cell. Mean intensity (MI) of selected proteins measured by mass cytometry was normalized to their corresponding maximal values obtained in the dataset. The normalized values of the final range from 0 to 1 were then used to calculate the MET score. The mass cytometry MET score was then calculated as a cumulative sum of normalized MIs from epithelial markers [EpCAM + CD49f + CD9] minus the cumulative sum of MIs from mesenchymal markers [Vimentin + aSMA + CD44] for each cancer cell in the sample. The final MET score for each patient is calculated as the mean MET score from all cancer cells in the sample and acquires values from À3 (fully mesenchymal-like phenotype) to +3 (fully epitheliallike phenotype). MET score calculation was done in R.
Correlation analysis was performed in GRAPHPAD PRISM (version 9.0.2 GraphPad Software, San Diego, CA, USA) using non-parametric Spearman's correlation. Heatmaps were generated in the CLUSTVIS tool [25].

Tissue microarray (TMA) construction and immunohistochemistry
The first set of archival tissue samples from TNBC patients was constructed at the MMCI, and contained patient tissue collected between 2005 and 2009. The second set was also constructed at the MMCI, collected between 2012 and 2021. TNBC status was determined according to ASCO standardsthreshold for the hormone receptor for IHC staining of tissue sample was 1% [26]. TMA construction and immunostaining of archival formalin-fixed, paraffin-embedded tumor samples with appropriate antibodies was done according to standard techniques established at MMCI (Table 1). Protein expression was assessed semiquantitatively by an expert breast cancer pathologist, using the histoscore (H-score) method. In H-score, the percentage of positive cells (0-100%) is multiplied by staining intensity (0-3), resulting in a final histoscore that ranges between 0 and 300. Immunohistochemistry MET score was calculated as a cumulative H-score of [EpCAM + CD49f] proteins minus a cumulative Hscore of [Vimentin + aSMA] proteins. For Burstein classification of tumor tissues we used five surrogate markers -AR, CK5/6, CD8, E-cadherin, Vimentinin co-junction with quantification of tumor-infiltrating lymphocytes according to the relevant studies [27][28][29]. We then classified these samples according to Burstein's classification as luminal-androgen receptor (LAR), mesenchymal (MES), basal-like immunosuppressed (BLIS), and basal-like immunoactivated (BLIA) subtypes. TNBC subtypes for this cohort were added to Table S1 and used for further analysis.

Statistical analysis
Statistical analyses and visualizations were performed in R (4.1.1 version, R Core Team, 2021) and GRAPHPAD PRISM (version 9.0.2, GraphPad Software). Kaplan-Maier analysis was performed by the survminer R package. Illustrations were created with BioRender.

Broad view of TNBC cellular landscape through large-scale single-cell proteomics
While single-cell transcriptomics vastly expanded our understanding of tumor ecosystems, this approach does not yet allow plausible profiling of cell landscapes in larger patient cohorts or greater cell numbers. To challenge this limit, we designed a prospective, singlecell-based, large-scale mass cytometric analysis of 26 treatment-na€ ıve triple-negative breast cancer cases. Twenty-four tumors were classified as high grade 3, and two were intermediate grade 2 tumors. Six patients presented with involved lymph nodes that have been resected, and none of them had signs of metastatic disease at the time of surgery (Table S1). Freshly resected tumor samples were immediately dissociated into single cell suspensions and mass-tag barcoded, along with a cancer cell line and PBMCs that served as internal controls. Samples were then sequentially stained with a panel of surface and intracellular antibodies (Fig. 1A).
The antibody panel was designed to identify tumor cell subpopulations that are known to contribute to breast cancer progression (e.g. CD24, CD44 as markers of CSCs), surface molecules that associate with EMT/MET plasticity (e.g. EpCAM and Vimentin) and a set of novel surface antigens reflecting breast cancer cell plasticity (e.g. CD29, CD97, CD49c, ITGB5), published previously [30]. We also included markers that would allow for subtyping of stromal cells (e.g. CD90, Vimentin, and aSMA) and a spectrum of immune cell types (e.g. CD45, CD3, CD14, CD19). Additionally, to inspect the activation of tumor-relevant signaling pathways at the level of protein, we included signaling hallmarks from TGF-b/SMAD, NF-jB, JAK-STAT, PI3K/AKT/mTOR, and Wnt signaling pathways. The activation of these signaling effectors is often involved in cancer cell plasticity and tumorigenesis (Fig. 1B).
This workflow resulted in 894 942 high-quality singlecell proteomic profiles from 26 samples. To allow for a comprehensive view of TNBC "cytome", we generated two-dimensional maps from our data using the automated dimensionality reduction algorithm t-distributed stochastic neighbor embedding (tSNE). Most cells were of hematopoietic origin (CD45 + ), followed by stromal (CD90 + ) and cancer compartments (PanCK + ; Fig. 1C). Information about other cell types or cell subsets is not available due to the limited number of markers that the current stage of technology allows. The presence of major cell types was additionally confirmed by manual gating (Fig. S1A).

3.2.
Ki-67-positivity and lymph node involvement at the time of surgery as a proxy for patient outcome We expected that the population of cancer cells in our TNBC cases will be globally heterogeneous, with some subpopulations more and some less abundant in aggressive tumors. To approach this question in an unbiased manner and to correlate cancer subpopulations with clinical parameters, we took advantage of a recently published automated and unsupervised machine learning algorithm RAPID [21]. This concept allows the identification of phenotypically distinct populations and determines whether they stratify patient survival. Due to the prospective design of this study that required freshly resected primary tumors, the long-term clinical outcome data, including metastatic spread and survival, was not available at the time of analysis. We therefore tested and robustly validated 'Ki-67 positivity and lymph node ratio' (LNR) at the time of the primary tumor surgery as a surrogate predictive and prognostic marker in TNBC, with minor modifications [31][32][33] (Fig. S1B). This approach enabled us to stratify patients solely according to their Ki-67% and lymph node involvement, as both parameters are routinely collected during diagnosis and widely available. As a proof-of-concept study, we correlated the introduced Ki-67 + LNR index with survival in two independent retrospective TNBC tissue microarray cohorts that we later used for histological validations (Fig. S1C,D). With good agreement between the predictive power of Ki-67 + LNR index and survival in retrospective cohorts, we applied this index to our prospective single-cell proteomic study.

Cancer subpopulations associated with clinical parameters
Simultaneous profiling of identically processed and preserved suspensions with mass cytometry eliminates the need for batch correction. We therefore applied modified algorithm employing FlowSOM clustering directly on cancer cells from our patients. FlowSOM provides an overview of how all markers are behaving on all cells, and detects subsets that might be otherwise missed. Using self-organizing maps for data analysis, FlowSOM serves as both clustering and visualization tool [21,24]. The identified FlowSOM clusters are then directly associated with Ki-67 + LNR index. Using this data analysis approach, we observed eight phenotypically distinct populations of PanCK + cells ( Fig. 2A, Fig. S2A,B). Majority of analyzed tumors contained cells in all clusters (Fig. S2B, Table S2). These populations had distinct expression patterns of EMT/MET marker Vimentin, a number of cell adhesion molecules and integrins (CD29, CD49f, CD97, CD44, CD90), and signaling molecules (pNF-jB and HLA-DR; Fig. S2C,D). These TNBC subpopulations also differed in their Ki-67 + LNR indices. While Cluster 8 had the highest calculated Ki-67 + LNR index (blue; 0.88), Cluster 4 had the lowest (pink; 0.72) (Fig. 2B). Molecularly, cancer cells present in Cluster 8 displayed elevated expression of CD97, pNF-jB, and HLA-DR, and low CD90 and Vimentin levels (Fig. 2C). In contrast, Cluster 4 contained cells expressing low CD97, pNF-jB and HLA-DR levels, and high levels of CD49f, CD44, CD90, and Vimentin. The high expression of CD90 and Vimentin are frequently associated with epithelial-mesenchymal transition (EMT). EMT and its reverse mesenchymalepithelial (MET) program are well-known contributors to plasticity and heterogeneity of cancers, including TNBC. We therefore decided to predict the EMT-MET phenotype of cancer cells in our patient cohort.
To assign cancer cells their EMT-MET status in an unbiased manner, we calculated the mesenchymal-toepithelial (MET) score for each cell and all patients based on selected epithelial and mesenchymal molecules. We selected Vimentin, aSMA, and CD44 as bona fide mesenchymal markers, and EpCAM, CD49f, CD9 as the representative of epithelial phenotype (Fig. 2D, [30]). The score ranged from À3 value representing fully mesenchymal phenotype, to +3 for fully epithelial phenotype, with hybrid phenotypes scoring around 0. We assessed that the cancer cells in 23 of 26 samples are highly EpCAM-positive, with more than 50% cancer cells expressing this epithelial marker (average from all samples = 80.4%; 95% CI = 0.003) and exhibiting more epithelial phenotype based on their positive MET score. The MET score did not surpass 0.5 value, indicating that cells are, based on our classifier, in an intermediate or hybrid EMT/MET state, coexpressing both epithelial and mesenchymal proteins (Fig. 2D). MET score negatively correlated with Ki- 67 + LNR index (Fig. 2E), suggesting that cancer cells favor hybrid EMT/MET phenotype endowing them with high plasticity and increased fitness, features that are required for the tumor development and progression.
Quite unexpectedly, we also identified Cluster 4 to be enriched for small population of both EpCAM high and EpCAM low cells with CD90/CD44/Vimentin/Ki-67 high molecular profile, resembling proliferative cancer stem-like (CSC) subpopulation (Fig. 2F). Such basallike CD44/CD90 + cells have been previously identified as tumorigenic and able to interact with monocytes and macrophages [34,35].

Unprecedented heterogeneity of TNBC stromal compartment
In parallel, using similar unsupervised analysis in the CD90 + /PanCK À /CD45 À stromal fraction, we molecularly dissected 10 stromal subsets of comparable abundance, present in 24 tumor specimens (Fig. S3A,B). Mirroring the situation in cancer compartment, these stromal subpopulations displayed diverse phenotypes with the most significant differences in the expression of CD90, aSMA, and CD29a set of well-known cancerassociated fibroblast (CAF) markers, surface integrin CD49c, adhesion G protein-coupled receptor family member CD97, tetraspanin CD9, canonical TGF-b signaling hallmark pSmad2, and proliferation surrogate Ki-67 (Fig. S3C,D). Interestingly, these stromal clusters (Fig. 3A) showed similar Ki-67 + LNR index with only minor difference in values, with the highest being Cluster 4 (blue, 0.84), and the lowest detected in Cluster 9 (orange, 0.72; Fig. 3B). This observation suggested that the identified stromal subsets do not stratify patients based on Ki-67 + LNR index.
Compared to Cluster 9, cells in Cluster 4 (the highest Ki-67 + LNR index) expressed high levels of aSMA, CD90, CD29, CD49c, and Ki-67, also indicating an increased proliferative activity in this subpopulation (Fig. 3C). Because adhesion molecules and integrins, including CD49c, might play a functional role in tumor progression, we inspected whether these molecules are present also on stromal cells. We detected populations of CD49c-, CD49f-, and CD97positive cells present across clusters at different proportions (Fig. 3D). All three surface molecules were concomitantly co-expressed, along with CD90 and CD44, with small "activated population" of cells expressing pSmad2, active b-catenin, and phosphorylated Akt (Fig. 3E). This suggests that these stromal subsets co-expressing signaling molecules might actively contribute to disease progression; however, precise functional experiments are needed. Although the stromal fraction did not differ in Ki-67 + LNR index, we observed heterogeneity in stromal compartment and identified interesting subpopulations for further studies.

Validation of complex TNBC phenotypes with routine, low-dimension immunohistochemistry
We next addressed the critical limitation of our study the lack of long-term clinical follow-up in our freshly resected patient cohortwith an alternative strategy. We investigated the association between our mass cytometry protein hits with patient outcome in retrospective TNBC cohorts that included over 200 samples. We chose immunohistochemistry as a classical and routine procedure that does not require state-ofthe-art instrumentation and is widely used at clinical pathology departments for diagnostic purposes.
First, we in-house assembled TNBC tissue microarray that included 108 primary tumors and if available, also matched lymph node metastases. For cross-validation purposes, this cohort included archival tissue from 24 patients analyzed with mass cytometry. This sample set was then clinically annotated for lymph node involvement and, if available, also for survival.
Based on the unbiased outcomes from mass cytometry analysis, we selected proteins which expression significantly separated the Ki-67 + LNR-high and -low clusters, in both cancer and stromal compartments. These included CD97, HLA-DR, pNF-jB, and aSMA. Additionally, we included known markers of EMT/ MET that were included in the cytometric panel, specifically EpCAM, CD49f, and Vimentin (Fig. 4A).
Percentage of positivity and staining intensity of selected markers in stromal and tumor cells for individual patients was determined using the semiquantitative H-score method by a certified breast cancer pathologist in a blinded fashion. Although we were not able to reach statistical significance, this histopathological analysis revealed different expression patterns of EpCAM, CD49f, Vimentin, CD97, HLA-DR, and pNF-jB in lymph node metastases compared to matched primary tumors (Fig. 4B). These expression patterns varied from patient to patient and showed both increasing and decreasing trends in metastases relative to primary tumors, corroborating profound heterogeneity among the individual patients.
To approach this assessment in a similar way as done for the mass cytometry data, we classified the EMT/MET status of this retrospective histology cohort using histology MET score (hMET score). The hMET score was calculated from H-score (0-300) of epithelial markers EpCAM and CD49f, and mesenchymal markers Vimentin and aSMA. Such hMET score extracted specifically from tumors that were analyzed with mass cytometry was in an agreement with the actual mass cytometry MET score (Fig. 4C). hMET score could be then, in simplified practice, used as a surrogate for mass cytometry MET score.
Out of all analyzed antigens, only the expression of EpCAM was significantly increased in tumors that disseminated into lymph nodes; the levels of EpCAM separated patients that had lymph nodes involved at the time of primary tumor resection from those that had not lymph node metastases present (Fig. S4A). Levels of HLA-DR and pNF-jB in the tumor fraction did not stratify patient survival (Fig. S4B). However, increased cancer cell expression of CD97 was significantly associated with worse overall survival, in concordance with our mass cytometry data (Fig. 4D). Although the correlation of stromal Vimentin with survival did not show significance, we observed an interesting trend between elevated stromal aSMA expression and worse overall survival (Fig. S4C).

Discussion
The recent revolution in single-cell and spatial omics technologies revealed an unexpected level of TNBC heterogeneity. Despite this, the contribution of different tumor or stromal cell subsets to disease progression and therapy resistance remains unclear.
To understand how selected subpopulations and cell states shape tumor biology and influence the clinical outcome of TNBC patients, we established a single-cell proteomics pipeline, allowing complex analysis of TNBC heterogeneity with mass cytometry. This approach allowed us to characterize TNBC "cytome" and associated diverse cancer and microenvironmental phenotypes with clinical state of 26 patients at the time of tumor surgery. While most single-cell-based TNBC studies focus on immune cells [36] or solely rely on transcriptomic data [15,17,18,37], we used proteins as the ultimate readout.
Due to the prospective nature of our study, clinical outcome related to distant metastasis or patient survival was not available at the time of analysis. We therefore modified an alternative approach and introduced the Ki-67 + LNR index that reflects Ki-67% positivity and lymph node involvement at the time of surgery. Both of these parameters are relevant, standalone predictive and prognostic factors in TNBC, and their higher levels are associated with worse prognosis and more aggressive clinical features [31][32][33]. Nonetheless, there are some concerns about the reproducibility of the Ki-67 assessment and its limited clinical utility [38]. Recent Ki-67 consensus meeting established that Ki-67 IHC does have clinical validity for the determination of prognosis in patients with early-stage breast cancer and proposed several recommendations that can lead to precise analytical validity of Ki-67 IHC determination namely careful preanalytical handling and standardized visual scoring [39]. These methodical criteria were met in our study that involved early-stage TNBC patients (see Section 2).
Data analysis uncovered eight clinically distinct subsets of cancer cells, each with a specific protein signature. The cluster that significantly separated cell populations with high and low Ki-67 + LNR index was enriched in HLA-DR, pNF-jB, and CD97. CD97 also significantly stratified patients based on their overall survival in our histology studies. CD97 is a member of G-protein-coupled receptor family with involvement in adhesion, migration, and cancer progression [40,41]. CD97-positive cancer cells associated with higher Ki-67 + LNR index in patients and concomitantly expressed HLA-DRan MHC class II molecule, normally expressed by the antigen-presenting cells, but also frequently identified in TNBC samples [42,43], and pNF-jB, a transcription factor implicated in TNBC proliferation and invasiveness with drug targeting potential [44,45]. Our findings suggest that the identified population highly expressing CD97/HLA-DR/pNF-jB proteins might be functionally involved in TNBC progression, and clinically targeted in the future.
In addition, we showed that the cancer cells in TNBC tumors reside in a spectrum of hybrid EMT/ MET states based on the introduced MET score, consisting of key epithelial and mesenchymal markers. Such hybrid EMT/MET phenotype embodies a major survival advantage: it maintains a highly plastic and tumorigenic state that can be dynamically polarized toward EMT or MET, thereby promoting cancer cell spread or homing and metastatic outgrowth at distant sites [5,8,11,46,47]. In our study, the hybrid EMT/ MET profile of cancer cells was further confirmed by an independent immunohistochemistry using the hMET score. This scoring in the matched samples was associated with high Ki-67 + LNR index and suggested that the hybrid phenotype may indeed drive tumor cell proliferation and metastatic lymph node colonization.
The stromal compartment of TNBC displayed a profound heterogeneity in cell surface and intracellular marker expression. We identified subsets of stromal cells co-expressing high levels of aSMA, CD90, and CD29 that associated with high Ki-67 + LNR index. This subpopulation resembled the profile of previously described myofibroblast-like (aSMA-highCD90high FAP high ) subsets of cancer-associated fibroblasts (myCAFs) [17,37,48]. These myCAFs are functionally capable to initiate EMT of cancer cells, hence support their spread to distant organs [49]. The TNBC stroma was also enriched for cell subsets positive for surface integrins CD49f and CD49c, and adhesion G-protein-coupled receptor member CD97. While the role of CAF-expressed integrins in breast cancer tumors has been previously suggested [50], their functional implications in the TNBC context remain unclear.
Our study comes with several specific limitations that are commonly associated with techniques requiring tissue dissociation and more generally with mass cytometry-based approach. Firstly, although extensively optimized, the tissue dissociation protocol used in our laboratory may lead to underrepresentation of some cell types, including epithelial cells that are relatively fragile and can be destroyed during the dissociation process [51]. Secondly, this dissociation protocol can potentially alter the cell surface composition. Thirdly, pre-selected antibodies in this panel can bias the phenotyping attempts. For example, we considered the hereby identified stromal cells as cancerassociated fibroblasts, but we cannot exclude the possibility that CD90 + fraction contained also mesenchymal stromal cells or perivascular-like cells. The part of the tumor tissue that was provided to us by the pathologists may not fully represent the composition of the entire tumor. Additionally, because of the prospective nature of our study, we lack information about clinical outcome (e.g. distant metastasis, overall survival) due to the short follow-up period.

Conclusions
In this study, we established a workflow for complex investigation of TNBC at a single-cell resolution and described phenotypically distinct cell subsets of tumor and stromal cells.
We introduced a novel clinically relevant Ki-67 + LNR index that can be associated with described heterogeneous cell subpopulations, thus identify cells that might contribute to disease development. Moreover, we identified cancer cell CD97 level as a predictor of worse clinical outcome. Taken together, our findings shed a new light on the heterogeneity of treatment-na€ ıve TNBC, providing valuable resource for future research in both basic and translational settings.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.    Table S1. Clinical and histopathological parameters of TNBC patients. Table S2. Percentage of cells present in cancer clusters across samples.