Defining the relationship between cellular and extracellular vesicle (EV) content in breast cancer via an integrative multi‐omic analysis

Much recent research has been dedicated to exploring the utility of extracellular vesicles (EVs) as circulating disease biomarkers. Underpinning this work is the assumption that the molecular cargo of EVs directly reflects the originating cell. Few attempts have been made, however, to empirically validate this at the ‐omic level. To this end, we performed an integrative multi‐omic analysis of a panel of breast cancer cell lines and their corresponding EVs. Whole transcriptome analysis confirmed that the cellular transcriptome remained stable when cultured cells were transitioned to low serum or serum‐free medium for EV collection. Transcriptomic profiling of the isolated EVs indicated a positive correlation between transcript levels in cells and EVs, including for disease‐associated transcripts. Analysis of the EV proteome verified that HER2 protein is present in EVs; however, neither the estrogen receptor (ER) nor the progesterone receptor (PR) protein was detected, regardless of cellular expression. Using multivariate analysis, we derived an EV protein signature to infer cellular patterns of ER and HER2 expression, even though the ER protein itself could not be directly detected. Integrative analyses affirmed that the EV proteome and transcriptome captured key phenotypic hallmarks of the originating cells, supporting the potential of EVs for non‐invasive monitoring of breast cancers.


INTRODUCTION
'Liquid biopsies' of tumour markers contained in blood or other bodily fluids have been suggested as a means of capturing tumour heterogeneity and monitoring disease evolution [1]. These represent a minimally invasive approach that is amenable to repeated analyses over time.
Further, the use of circulating biomarkers may overcome some of the sampling biases inherent in traditional biopsy approaches [2]. Extracellular vesicles (EVs) are one class of circulating biomarker that has attracted substantial attention as a component of liquid biopsies [3]. EVs are membranous vesicles that contain nucleic acid and protein derived from the cell of origin [4]. Accordingly, EVs derived from tumour cells have previously been shown to contain disease-related markers [5]. Circulating tumour-derived EVs therefore represent a readily accessible biomarker source with demonstrated potential.
Breast cancer is the most commonly diagnosed cancer in women worldwide.
Whilst incidence rates have remained largely stable over the past few decades, the development of targeted treatments has led to an overall decline in mortality over this period [6]. Administration of these therapies is largely informed by the expression profile of three receptors in the primary breast tumour: estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) [7,8]. The expression profile of these biomarkers indicates which downstream pathways may be implicated in the disease process, and provides both prognostic and predictive information [9]. As such, quantification of ER, PR and HER2 expression has become routine practice in breast cancer.
It is well established, however, that tumour features are both temporally and spatially heterogeneous. Accordingly, ER, PR and HER2 expression have been reported to change in response to treatment [10], upon relapse [11] and between primary and metastatic lesions [12]. Such changes are likely accompanied by a suite of alterations to downstream disease pathways, with important implications for prognosis and response to therapy [13]. There is therefore an unmet need for methods to monitor tumour heterogeneity and evolution during treatment.
Given their presence in the circulation, tumour-derived EVs have been proposed as a means to address this need and monitor tumour evolution. The clinical utility of EVs as circulating biomarkers, however, is contingent upon establishing clear correlations between tumour and EV features. It has long been posited that specific cargo selection occurs during EV biogenesis, resulting in the selective enrichment or depletion of certain species relative to the cell [14][15][16]. EV cargo therefore may not directly reflect the molecular features of the originating cell. This implies a need to characterise cellular and EV content and to identify informative patterns before EVs can be used as biomarkers. In the breast cancer context, proteomic analysis of EVs from HER2+ and triple negative breast cancer (TNBC) cells has suggested that these EVs capture the activity of key signalling pathways in the originating cells [17]. To our knowledge, however, an integrative analysis of cells and EVs across all breast cancer subtypes has not yet been undertaken.
To this end, we employed an in vitro model to explore the proteomic and transcriptomic landscape of EVs derived from breast cancer cells. Specifically, this study aimed to investigate whether different cellular patterns of ER, PR and HER2 expression were recapitulated in the corresponding EVs, and to interrogate the relationship between cellular and EV content. To evaluate the effect of serum deprivation for EV production on breast cancer cells in vitro, a panel of eight breast cancer cell lines was cultured in three conditions: medium containing ≥10% FBS (complete medium), defined medium with 1% EV-depleted serum (low serum), and complete medium followed by 24 h in serum-free medium (serum starvation). Whole transcriptome analysis revealed that responses to serum deprivation varied according to cellular phenotype. Importantly, this analysis confirmed that key disease-associated features, including ER, PR and HER2 expression, remained stable across all culture conditions. Having validated our in vitro model, we collected EVs from all cell lines and subjected them to transcriptomic and proteomic analysis. This analysis suggested that the composition of the EV proteome and transcriptome differed substantially between EVs from ER+, HER2+ and triple negative breast cancer (TNBC) cells. Integration of the EV proteome and cellular transcriptome data affirmed that these differences reflected dysregulated cellular signalling pathways in each cell type. This study therefore provides empirical confirmation that EV cargo can be used to infer clinically relevant features of the originating cells in the breast cancer context.

Significance Statement
This work aimed to validate several assumptions underlying the potential use of EVs as circulating biomarkers in cancer, using breast cancer as a disease model. In vitro production of EVs using cell line models is common in EV biomarker discovery and functional studies.
Our analysis verified that the transcriptome of breast cancer cells in culture remains stable in low serum medium and during short-term serum starvation, validating these widely used approaches to EV production. Further, our findings empirically confirmed that disease hallmarks, including dysregulated signalling pathways, are recapitulated in the RNA and protein content of the corresponding EV populations. This affirms the assumption that tumour-derived EVs capture key molecular features of the cell of origin, and therefore that they may provide a means to monitor tumour evolution longitudinally. This work is an important contribution to our understanding of the relationship between cellular and EV content, providing empirical proof-of-concept for the use of EVs as circulating biomarkers in cancer.

Study design

Cell culture-Complete medium
Cells were maintained in 175 cm² flasks at 37 °C …

Cell culture-Low serum medium
Composition of the low serum medium for each cell line is listed in Table S1. All low serum medium was supplemented with 1X GlutaMAX and Phenol Red to a total concentration of 16 g/L. FBS was depleted of bovine extracellular vesicles by overnight ultracentrifugation. Briefly, the serum was diluted 1:3 in basal medium and centrifuged at 100,000 × g for 18 h at 4 °C …

Cell culture-Serum starvation
Cells were maintained in a 175 cm² flask (n = 3 per cell line) in standard growth media at 37 °C and 5% CO₂. Once cells had reached 70%–80% confluence, they were washed three times with sterile PBS, and serum-free basal medium was added to the flasks. The specific medium used for each cell line is listed in Table S1. After 24 h, cells were collected and counted as described for the low serum culture condition.

EV production and conditioned medium collection
Cells were transitioned to low serum medium as described above and passaged once in this medium before EV collection. For collection, cells were seeded into a 175 cm² flask (n = 3 per cell line), and 15 mL of conditioned medium was collected and replaced at 72, 96 and 120 h after seeding. Seeding densities were chosen such that cells were no more than 70%–80% confluent at the time of the final collection. Collected medium was immediately centrifuged at 800 × g for 5 min to remove cellular debris. Conditioned medium from the three collection points was pooled for subsequent EV isolation to minimize the effect of any time-dependent culture artefacts (total 45 mL per sample). Conditioned medium was stored at −20 °C for a maximum of 6 weeks pending EV isolation.
At the time of final collection, cells were detached from the flasks, washed and counted as described for the Low Serum culture condition above.

EV isolation
Details of EV isolation and characterisation have been submitted to the EV TRACK database, with ID EV210263.
Collected frozen culture medium was thawed in warm water. The medium was clarified by centrifugation at 10,000 × g for 30 min at room temperature (rotor = Eppendorf F34-6-38, k factor ∼1100). This step was designed to enrich for vesicles <300 nm, based on calculations described in ref. [18]. Clarified medium was concentrated from 45 mL to <0.5 mL using Amicon Ultra 50 kDa centrifugal filters (Merck Pty Ltd, Australia). These were centrifuged (rotor = Beckman SX4250) at 4 °C and 3900 × g for 20–30 min at a time, with the filter unit refilled with medium between runs until the entire volume had been concentrated. Concentrated medium was brought to 500 µL with PBS and separated in PBS on 10 mL in-house chromatography columns packed with Sepharose 4B (Sigma-Aldrich, Australia).
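The link between the stated k factor, the 30 min spin and the <300 nm target can be approximated with the standard clearing-factor relation t = k/s (t in hours, s in Svedbergs) combined with Stokes' law for a sphere. The sketch below is a minimal illustration under assumed values, not a reproduction of the calculation in ref. [18]: the EV density (1.05–1.10 g/mL), medium density (1.00 g/mL) and viscosity (1.0 mPa·s) are assumptions not reported here, and the k factor of ∼1100 is assumed to apply at the speed actually used.

```python
import math

SVEDBERG_S = 1e-13  # 1 Svedberg (S) = 1e-13 seconds


def min_sedimentation_coefficient(k_factor: float, run_time_h: float) -> float:
    """Smallest sedimentation coefficient (in S) pelleted within run_time_h,
    from the clearing-factor relation t = k / s (t in hours, s in Svedbergs)."""
    return k_factor / run_time_h


def stokes_diameter_nm(s_svedberg: float,
                       particle_density: float = 1.10,  # g/mL, ASSUMED EV density
                       medium_density: float = 1.00,    # g/mL, ASSUMED medium density
                       viscosity_pa_s: float = 1.0e-3,  # Pa*s, ASSUMED (close to water)
                       ) -> float:
    """Diameter (nm) of a sphere with sedimentation coefficient s,
    from Stokes' law: s = 2 r^2 (rho_p - rho_m) / (9 eta)."""
    s_seconds = s_svedberg * SVEDBERG_S
    delta_rho = (particle_density - medium_density) * 1000.0  # g/mL -> kg/m^3
    radius_m = math.sqrt(9.0 * viscosity_pa_s * s_seconds / (2.0 * delta_rho))
    return 2.0 * radius_m * 1e9  # metres -> nanometres


if __name__ == "__main__":
    # k factor ~1100 and a 30 min (0.5 h) spin, as stated in the text.
    s_min = min_sedimentation_coefficient(k_factor=1100, run_time_h=0.5)
    for rho_p in (1.10, 1.05):  # ASSUMED range of EV densities
        d = stokes_diameter_nm(s_min, particle_density=rho_p)
        print(f"rho_p = {rho_p:.2f} g/mL: s_min = {s_min:.0f} S, cut-off ~ {d:.0f} nm")
```

Under these assumptions the estimated cut-off falls at roughly 200 to 280 nm, which shows how sensitive the result is to the assumed particle density; particles smaller than the cut-off remain largely in the supernatant.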
To verify that no artefacts (e.g., FBS-derived EVs) were present in the cell culture medium or introduced during subsequent processing, tissue culture flasks containing low serum medium only (n = 3) were incubated at 37 °C in 5% CO₂ for 24 h. This medium was collected and processed as described above. These control samples contained negligible amounts of particles and protein, indicating that no contaminating artefacts were present.

EV quantitation
EVs were quantified using a NanoSight NS300 instrument with NTA software (v3.

RNA sequencing
Total RNA was isolated from cell pellets and isolated EVs using the Norgen RNA/Protein Purification Plus Kit (48200, Norgen Biotek Corp, Canada) according to the manufacturer's instructions. Isolated RNA was quantified using the Qubit RNA Broad Range assay kit for cells (Thermo Fisher Scientific, Australia) or the Qubit RNA High Sensitivity assay kit for EVs (Thermo Fisher, Australia) with a Qubit 4 Fluorometer (Thermo Fisher, Australia). Each sample was treated with 10 units of RNase-free DNase I (Qiagen, Australia) on EconoSpin Micro Volume DNA/RNA spin columns (Epoch Life Science, USA).
For cell samples, cDNA was generated using the SuperScript VILO cDNA Synthesis kit (Thermo Fisher, Australia). Library preparation was performed using the Ion AmpliSeq Transcriptome Human Gene Expression Kit (Thermo Fisher, Australia). Templating was performed with the Ion PI HiQ OT2 200 Kit (Thermo Fisher, Australia) on an Ion OneTouch 2 Instrument (Thermo Fisher, Australia). Sequencing was performed on the Ion Proton System (Thermo Fisher, Australia) using the Ion PI Chip Kit v3 (Thermo Fisher, Australia) and the Ion PI HiQ Sequencing 200 Kit (Thermo Fisher, Australia). Reads were mapped to the hg19 AmpliSeq Transcriptome ERCC v1 reference using Torrent Suite software v5.0.5 (Thermo Fisher, Australia) to produce a matrix of read counts.
For EV samples, ribosomal RNA was depleted using the QIAseq Fast Select rRNA Removal kit (Qiagen, Australia). Following rRNA depletion and fragmentation, cDNA was synthesised using the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (New England BioLabs, Australia). End repair, adaptor ligation and amplification were performed using the xGen Prism DNA Library Prep Kit (Integrated DNA Technologies, Australia), with samples barcoded and amplified for 15 PCR cycles using xGen UDI Primer Pairs (Integrated DNA Technologies, Australia). Size selection purification was performed at several points throughout the protocol, as directed by the manufacturer's instructions, using 2.5% w/v Speed Beads magnetic carboxylate-modified particles (Merck, Australia) in an in-house buffer of 50% PEG in 2.5 M NaCl, 0.05% Tween-20 and 10 mM citric acid, pH 6. Following the final purification, the library was amplified for a further 9 PCR cycles using universal P5/P7 primers (Integrated DNA Technologies, Australia). The size profile of the pooled, amplified library was verified using the Agilent High Sensitivity DNA kit and reagents (Agilent Technologies, Australia) on a 2100 Bioanalyzer Instrument.

Mass spectrometry (nanoLC-MS/MS)
Proteins were extracted from isolated EVs using 1% sodium deoxycholate buffer. Five micrograms of EV protein was reduced in 10 mM … MaxQuant software (version 1.6.5.0) [23]. Up to two missed trypsin cleavages were allowed, with a maximum of five modifications per peptide. Oxidation of methionine and acetylation of the N-terminus were set as variable modifications, and carbamidomethylation was set as a fixed modification. The false discovery rate (FDR) was set to 1%, and peptide identifications were matched between runs. All other search settings were the defaults for an Orbitrap instrument. Data was exported for analysis in the R software environment.

Statistical analysis
Data analysis was performed using Microsoft Excel for Microsoft … [24] and pheatmap (1.0.12) [25] packages in the R environment. Gene set variation analysis (GSVA) was performed using the GSVA (1.38.0) package [27], with gene sets derived from the Gene Ontology (GO) project [28,29]. Linear modelling with empirical Bayes moderation, implemented through the limma (3.44.1) package [30], was applied to the GSVA scores to assess differential expression of gene sets (FDR < 0.05).
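The FDR thresholds used throughout are presumably applied to Benjamini–Hochberg (BH) adjusted p-values, since BH is the default adjustment in both limma's topTable and DESeq2's results; the adjustment method is not stated explicitly above, so this is an assumption. As an illustration only, the BH step-up procedure can be sketched in plain Python; this is not the authors' code.

```python
from typing import List, Sequence


def bh_adjust(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(p_values: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """True where the BH-adjusted p-value is below the FDR threshold."""
    return [q < fdr for q in bh_adjust(p_values)]


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values
    print(bh_adjust(p))
    print(significant(p))
```

Note that a raw p-value of 0.03 or 0.04 does not pass FDR < 0.05 in this toy example once the four tests are accounted for.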

Model fitting and differential expression analysis of EV RNA sequencing data were performed using the DESeq2 package (1.30.0) [26]. The DESeq command was run with the following parameters: … Data processing and statistical analysis of proteomics data were based on the approach described in ref. [32]. Protein quantification was based on label-free quantification (LFQ) intensities computed by MaxQuant as described in ref. [33]. Briefly, proteins flagged as potential contaminants were removed, and only those with a score ≥5 were retained. Proteins were further filtered by missing values and retained only if identified in ≥80% of samples from at least one subtype group (ER+, HER2+ or TNBC). Data was then quantile normalised. Remaining missing values were imputed by one of two methods: (i) for proteins missing in <25% of all samples, values were imputed using a localised least squares imputation method [34]; (ii) for proteins missing in >25% of all samples, values were imputed from a normal distribution centred at the minimum intensity. Model fitting and differential expression analysis of proteomics data were then performed using the limma package (3.44.1). Linear modelling with empirical Bayes moderation was used to test for differential protein expression (FDR < 0.05). Gene set enrichment analysis (GSEA) was performed using the fgsea (1.14.0) package [35]. Sparse partial least squares discriminant analysis (sPLS-DA) modelling [36] was performed using the mixOmics (6.14.0) package [37]. The model was tuned using 3-fold cross-validation repeated 50 times.
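The group-wise missing-value filter, quantile normalisation and minimum-centred imputation described above can be sketched with NumPy. This is a simplified illustration, not the pipeline of ref. [32]: the group labels are hypothetical, the localised least squares imputation of ref. [34] is not implemented, the quantile normalisation function assumes a complete matrix (no missing values), and both the use of per-sample minima and the distribution width (0.3 × the standard deviation of each sample's observed intensities) are assumptions, as neither is specified above.

```python
import numpy as np


def keep_by_group_presence(intensities, groups, min_frac=0.8):
    """Keep proteins detected in >= min_frac of samples in at least one group.

    intensities: proteins x samples array, NaN = not detected.
    groups: one label per sample, e.g. "ER+", "HER2+", "TNBC".
    """
    groups = np.asarray(groups)
    detected = ~np.isnan(intensities)
    keep = np.zeros(intensities.shape[0], dtype=bool)
    for g in np.unique(groups):
        keep |= detected[:, groups == g].mean(axis=1) >= min_frac
    return intensities[keep], keep


def quantile_normalise(x):
    """Give every column the same distribution (the mean of the sorted columns).
    Assumes a complete matrix; ties are broken arbitrarily."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    reference = np.sort(x, axis=0).mean(axis=1)
    return reference[ranks]


def impute_min_centred(x, width_sd=0.3, seed=0):
    """Fill NaNs in each sample (column) with draws from a normal distribution
    centred at that sample's minimum observed intensity.
    ASSUMPTION: per-sample minima and a width of width_sd x the sample's SD."""
    rng = np.random.default_rng(seed)
    out = np.array(x, dtype=float)
    for j in range(out.shape[1]):
        col = out[:, j]  # a view, so assignments update `out`
        missing = np.isnan(col)
        if missing.any():
            observed = col[~missing]
            col[missing] = rng.normal(observed.min(), width_sd * observed.std(),
                                      missing.sum())
    return out
```

In the workflow described above, filtering precedes normalisation and imputation; the functions are kept separate here so that each step can be inspected on its own.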

Serum deprivation induces different transcriptomic responses based on cellular phenotype
Cell culture for the purpose of EV production requires adaptation of conventional techniques to minimize interference from serum-derived EVs. One method to achieve this is to culture cells in complete medium until sub-confluent, then incubate them in serum-free medium for a short period for EV collection. … nanoparticle tracking analysis (NTA, data not shown). The formulation of this medium is described in detail in the Materials and Methods section. Cell viability was assessed by Trypan blue exclusion at the point of collection, and average viabilities per cell line across all three conditions were ≥95% (data not shown). RNA was isolated from cell lysates, and whole transcriptome profiling was performed using a targeted sequencing approach (AmpliSeq Transcriptome, Life Technologies).
To first validate the ER, PR and HER2 expression profiles of each cell line, we compared the normalised read counts for the corresponding ESR1, PGR and ERBB2 transcripts in all three culture conditions across the panel (Figure 1A). All presumed ER+ cell lines (BT474, BT483, … with no PGR expression across all culture conditions. This is concordant with previous observations of some luminal epithelial-like features in this cell line [40]. A binary summary of the ER, PR and HER2 status of each cell line is given in Table 1. Previous transcriptomic analyses of breast cancers, such as the landmark study by Perou et al. [41], have suggested that cellular patterns of ER, PR and HER2 expression are associated with distinct transcriptomic landscapes. To explore these within our dataset, data was first transformed by variance stabilising normalisation (VSN).
We performed differential expression analysis to identify specific genes altered by serum deprivation within each group of cell lines, with a total of 16,517 genes included in the analysis. For the ER+ (HER2±) group, a total of 98 genes were significantly differentially expressed … We next sought to specifically examine the expression profile of disease-relevant gene sets during serum deprivation. To this end, we performed GSVA [27] for biological process gene ontology (GO) terms. Linear modelling with empirical Bayes moderation, implemented through the limma package, was used to assess differences in GSVA scores between conditions (full results in Supplementary File 1). For illustrative purposes, a subset of GO terms related to epithelial-mesenchymal transition (EMT), extracellular vesicle biogenesis, ER signalling and ERBB2 signalling was selected for graphical representation (Figure 1D). GSVA scores for these terms were generally not significantly different between culture conditions, with some exceptions. The low serum medium was associated with an increased GSVA score for the extracellular vesicle biogenesis gene set in the HER2+ cell line, but not in the ER+ or TNBC groups. Likewise, serum starvation was associated with an increased GSVA score for extracellular vesicle biogenesis in ER+ cell lines only.
The overall findings of the differential expression analysis and GSVA suggested that the cellular transcriptome was only modestly altered under serum deprivation. This was supported by the PCA plot, which showed distinct clustering of samples by cell line regardless of culture condition (Figure 1B). To validate this quantitatively, we calculated the Pearson correlation coefficient for the mean (n = 3) normalised gene count values across all combinations of cell line and culture condition (Figure 1E). Each sample was very highly correlated with other samples from the same cell line regardless of culture condition, with correlation coefficients ≥0.95 for all intra-cell line comparisons (Figure 1E). Scatterplots of individual gene expression values for the T47D cell line across culture conditions are included as an illustrative example (Figure 1F). Samples were also more highly correlated within ER/HER2 expression subtypes than between subtypes, irrespective of culture condition. For the remainder of this study, low serum culture was chosen as the preferred method, because the retention of cell viability meant that conditioned medium could be collected and replaced several times over a defined period. Further, the use of a uniform medium across all cell lines mitigated the potential impact of different basal medium formulations on cell characteristics, as previously described by other investigators [43].
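The correlation analysis described above amounts to averaging the replicates for each cell line and culture condition, then computing pairwise Pearson coefficients between the averaged profiles. A minimal NumPy sketch, assuming a genes × samples matrix of already-normalised values and a hypothetical "cell line|condition" label for each sample, is:

```python
import numpy as np


def mean_profiles(values, labels):
    """Average replicate columns that share a label.

    values: genes x samples array of normalised expression values.
    labels: one label per sample, e.g. "T47D|low_serum" (hypothetical format).
    Returns the unique labels (first-seen order) and a genes x labels array.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    unique = list(dict.fromkeys(labels.tolist()))
    means = np.column_stack([values[:, labels == u].mean(axis=1) for u in unique])
    return unique, means


def correlation_matrix(means):
    """Pearson correlation between every pair of mean profiles (columns)."""
    return np.corrcoef(means, rowvar=False)
```

Intra-cell line entries of the resulting matrix can then be compared against the ≥0.95 level reported above.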

The transcriptomic landscape of breast cancer EVs
EVs were isolated from the conditioned medium of all cell lines by an optimised size exclusion chromatography (SEC) method, as previously described [19]. Specifically, the method was designed to enrich for small EVs <300 nm. Cells were counted at the time of collection (Figure S3a), and all were assessed to be ≥95% viable by Trypan blue exclusion (Figure S3b). Total EV yields per million cells were within the same order of magnitude across all cell lines, as measured by both particle number (Figure 2A) and protein amount (Figure S3c). It was generally noted, however, that the more mesenchymal-like TNBC cell lines (HS578T and MDAMB231) had lower overall yields than the more epithelial-like ER+ lines. The ratio of protein to EVs was calculated as a proxy measure of sample purity, and this too was consistent across all samples (Figure S3d). EV size distributions were measured by NTA. Representative distributions are shown in Figure 2B (one per ER/HER2 expression subtype, n = 1), and individual distributions for each cell line are included in Figure S3c. The size distributions were comparable across samples, with most EVs measuring 50–150 nm by NTA.
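Normalising yields to cell number and expressing purity as a protein-to-particle ratio are simple ratios; the sketch below only makes the units explicit. The input numbers are arbitrary placeholders, not data from this study, and the particles-per-microgram form (the inverse of the protein-to-particle ratio) is included only because it is a commonly reported alternative.

```python
def per_million_cells(amount: float, cell_count: float) -> float:
    """Scale an EV yield (particles, or ug protein) to 1e6 producing cells."""
    return amount / (cell_count / 1e6)


def protein_to_particle_ratio(protein_ug: float, particles: float) -> float:
    """ug protein per particle; lower values are usually interpreted as
    fewer co-isolated non-vesicular proteins."""
    return protein_ug / particles


def particles_per_ug(particles: float, protein_ug: float) -> float:
    """Inverse form: particles per ug of protein."""
    return particles / protein_ug


if __name__ == "__main__":
    # Placeholder inputs for illustration only; not values from this study.
    particles, protein_ug, cells = 3.0e9, 4.0, 1.5e7
    print(per_million_cells(particles, cells))  # particles per 1e6 cells
    print(protein_to_particle_ratio(protein_ug, particles))
    print(particles_per_ug(particles, protein_ug))
```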
We then explored the transcriptomic landscape of the isolated breast cancer EVs. To obtain sufficient material for total RNA sequencing, the three biological replicates of EVs isolated from each cell line were pooled prior to RNA extraction. To evaluate biological and technical variability, two additional samples, each containing material from a single biological replicate of one of two cell lines (BT483 and BT474), were prepared and sequenced concurrently. These samples are denoted BT483(a) and BT474(a), respectively. Total read counts and the percentages of reads mapping to exonic and intronic features are given in Table S2. Notably, although the samples were subjected to DNase digestion and rRNA depletion, a substantial proportion of reads mapped to intronic regions. This phenomenon has previously been documented in RNA-seq experiments [44]. In the current case, we hypothesise that it reflects co-isolated DNA present within EVs or the presence of immature mRNA transcripts. For further analyses, only reads mapping to exonic features were retained, including known protein-coding genes and non-coding genes with known transcription (e.g., long non-coding RNA, processed and unprocessed pseudogenes).
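The Methods state that EV RNA sequencing data were modelled with DESeq2, which corrects for differences in sequencing depth between libraries using median-of-ratios size factors. The NumPy sketch below re-implements only that normalisation step, for illustration; it omits DESeq2's dispersion estimation and testing entirely and is not the package's code.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """DESeq-style size factors for a genes x samples matrix of raw counts.

    Genes with a zero count in any sample have an undefined log geometric
    mean and are excluded, as in DESeq2's default ('ratio') estimator.
    """
    counts = np.asarray(counts, dtype=float)
    usable = (counts > 0).all(axis=1)
    if not usable.any():
        raise ValueError("every gene has a zero in at least one sample")
    log_counts = np.log(counts[usable])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)  # pseudo-reference sample
    # Each sample's factor is the median ratio to the pseudo-reference.
    return np.exp(np.median(log_counts - log_geo_means, axis=0))


def normalise_counts(counts):
    """Divide each sample's (column's) counts by its size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / median_of_ratios_size_factors(counts)
```

Using the median of per-gene ratios, rather than total read count, keeps a few highly abundant transcripts from dominating the scaling, which matters for EV libraries with uneven composition.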
The ESR1 (ER), PGR (PR) and ERBB2 (HER2) RNA expression profiles of the EVs mirrored that of the originating cells (Figure 3A).EVs derived from the ER+ cell lines (BT483, MCF7, T47D, BT474) contained elevated levels of ESR1 and PGR transcripts relative to the

The proteomic landscape of breast cancer EVs
To explore the landscape of the breast cancer EV proteome, shotgun proteomics analysis was performed on the isolated EVs. … between subtypes. Amongst these were multiple general EV markers, including CD63, CD9, CD81 and PDCD6IP (ALIX) (Figure S4a). Notably, neither the ER nor the PR protein was detected in any sample at any level by mass spectrometry. This was validated by Western blotting of cell lysates and corresponding EVs from a subset of cell lines. The ER and PR proteins were detected in MCF7 (ER+/PR±/HER2−) and BT474 (ER+/PR+/HER2+) lysates but not in the corresponding EVs (Figure 2C).
Western blotting also confirmed the presence of the EV-associated proteins CD63 and ADP-ribosylation factor 1 (ARF1), and the depletion of serum albumin in the EV isolates (Figure 2C). HER2 protein was detected at varying levels in EVs from all cell lines analysed, both by mass spectrometry (Figure S4a) and by Western blot (Figure 2D).
For quantitative analysis of the EV proteomic data, quantile normalisation was applied to spectral intensity values, and missing values were imputed using a localised least squares imputation method [34]. … Next, the composition of EVs derived from HER2-overexpressing cell lines (BT474, SKBR3) was compared with that of EVs from non-overexpressing lines (MDAMB231, HCC1143, HS578T, MCF7, T47D, BT483). In total, 270 proteins were identified as differentially expressed between HER2+ and HER2− EVs (Figure 4C(ii)). Of these, 202 proteins had an absolute logFC >1.5, comprising 181 up-regulated and 21 down-regulated proteins. HER2 peptides were detected in almost all individual EV samples, with higher spectral intensities in EVs derived from HER2+ than from HER2− lines (Figure S4a). The identified peptides mapped across the entire length of the HER2 protein, including the extracellular and cytoplasmic regions (data not shown). Other proteins with large positive fold changes between HER2+ and HER2− EVs included dehydrogenase/reductase SDR family member 2 (DHRS2), growth factor receptor-bound protein 7 (GRB7) and serine/threonine-protein kinase mTOR (MTOR).
Model parameters, including the number of components and the number of features per component, were determined by 3-fold cross-validation repeated 50 times. The balanced error rate (BER) for predictions (based on maximum distance) was lowest for three components, with 14, 19 and 18 features per component, respectively (51 features in total).
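Because the ER/HER2 subgroups contain different numbers of cell lines (and hence samples), the BER averages the per-class error rates so that a small class counts as much as a large one. A minimal Python sketch of the metric is below, with hypothetical labels; mixOmics computes BER internally during tuning, and this is not its implementation.

```python
from collections import Counter


def balanced_error_rate(true_labels, predicted_labels):
    """BER: the unweighted mean of per-class misclassification rates."""
    totals = Counter(true_labels)
    errors = Counter(t for t, p in zip(true_labels, predicted_labels) if t != p)
    return sum(errors[c] / totals[c] for c in totals) / len(totals)


def overall_error_rate(true_labels, predicted_labels):
    """Plain error rate, for comparison (dominated by the larger classes)."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


if __name__ == "__main__":
    # Hypothetical labels: one larger and one smaller class.
    truth = ["ER+"] * 4 + ["TNBC"] * 2
    pred = ["ER+", "ER+", "ER+", "TNBC", "ER+", "TNBC"]
    print(balanced_error_rate(truth, pred), overall_error_rate(truth, pred))
```

Here one error in each class gives a BER of 0.375, whereas the plain error rate (2/6) understates the poorer performance on the smaller class.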
Unsupervised clustering based on expression of these 51 selected proteins segregated the samples into four main clusters, which corresponded to the ER/HER2 expression subgroups (Figure 4D).Finally, gene ontology (GO) over-representation analysis for cellular component terms was performed on the proteins included in the model.
Significantly over-represented terms included 'membrane part', 'intracellular vesicle' and 'cytoplasmic vesicle'. A list of all proteins included in the signature and their average normalised expression in each ER/HER2 expression subgroup is given in Table S3.

Integrative analysis of transcriptomic and proteomic data across cells and EVs
The final stage of this work was to explore the global relationship between the cellular transcriptome and the EV transcriptome and proteome, respectively. This analysis focused on the 2033 genes that were commonly detected at the transcript level across cells and EVs, and at … Given the weak correlation between EV protein and cellular transcript abundance at the single-gene level, we next investigated this relationship at a broader phenotypic level. Gene sets were curated by identifying genes significantly up- and down-regulated in the ER+ versus ER−, HER2+ versus HER2− and TNBC versus non-TNBC contrasts at the cellular level, applying an FDR cut-off of 1% and an absolute log fold-change threshold of 1.5. Gene set enrichment analyses (GSEA) against these six gene sets were then performed for the same contrasts in the EV proteomics data (Figure 5C). The analysis suggested that all cellular gene sets were also significantly enriched in the same direction at the EV protein level for each respective contrast. Specifically, 100 of the 1213 genes significantly up-regulated in ER+ relative to ER− cells were detected at the protein level, with 16 of these proteins also significantly up-regulated in ER+ EVs (Figure 5C(i)). For the same contrast, 196 of the 1234 down-regulated genes were detected at the EV protein level, with 24 proteins significantly down-regulated in ER+ EVs. Similarly, 62/376 genes significantly up-regulated and 161/1582 genes significantly down-regulated in HER2+ relative to HER2− cells were detected at the EV protein level; 21/62 and 19/161 of these proteins were significantly up- and down-regulated in HER2+ EVs, respectively. For the TNBC versus non-TNBC contrast, 238/924 significantly up-regulated genes and 95/2081 significantly down-regulated genes were detected at the EV protein level; 19/95 proteins were significantly up-regulated, and 38/238 significantly down-regulated, in TNBC EVs. Across all contrasts, there were no cases in which transcripts and proteins were
significantly enriched in opposite directions between cells and EVs. Gene sets and GSEA results are available in Supplementary File 3.
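The gene set curation rule and the enrichment statistic underlying GSEA can be sketched as follows. This is an illustration only: it assumes strict inequalities for the FDR (<0.01) and absolute log fold-change (>1.5) cut-offs, computes only the weighted running-sum enrichment score (weight exponent 1, matching fgsea's default gseaParam), and omits the permutation-based p-values and normalised enrichment scores that fgsea reports.

```python
import numpy as np


def curate_gene_sets(genes, log_fc, fdr, fdr_cut=0.01, lfc_cut=1.5):
    """Split genes into up/down sets using FDR < fdr_cut and |logFC| > lfc_cut.
    ASSUMPTION: strict inequalities for both cut-offs."""
    up = {g for g, l, q in zip(genes, log_fc, fdr) if q < fdr_cut and l > lfc_cut}
    down = {g for g, l, q in zip(genes, log_fc, fdr) if q < fdr_cut and l < -lfc_cut}
    return up, down


def enrichment_score(genes, stats, gene_set, weight=1.0):
    """Weighted running-sum GSEA enrichment score for one gene set.

    Genes are ranked by decreasing statistic; set members step the running sum
    up in proportion to |stat|**weight, non-members step it down uniformly.
    The score is the signed maximum deviation from zero. No p-value is computed.
    """
    stats = np.asarray(stats, dtype=float)
    order = np.argsort(-stats, kind="stable")
    in_set = np.array([genes[i] in gene_set for i in order])
    if not in_set.any() or in_set.all():
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit = np.where(in_set, np.abs(stats[order]) ** weight, 0.0)
    hit /= hit.sum()
    miss = (~in_set).astype(float)
    miss /= miss.sum()
    running = np.cumsum(hit - miss)
    return float(running[np.argmax(np.abs(running))])
```

A set concentrated at the top of the ranking gives a positive score and one concentrated at the bottom a negative score, which is how the same-direction enrichment between cellular and EV data is read.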

DISCUSSION
The … mediator subunit 1 (MED1). These genes are genomically co-located with ERBB2 (17q12–q21) and, as such, are commonly co-amplified and over-expressed [50]. Analogously, transcripts over-represented in TNBC EVs included vimentin (VIM) and zinc finger E-box binding homeobox 1 (ZEB1). Both VIM and ZEB1 have been noted to be up-regulated in basal-like breast cancers, which constitute approximately 80% of TNBCs [51]. Together, these findings verified that the EV transcriptome reflected that of the originating cell, both globally and at the level of individual transcripts.
The EV proteome also recapitulated key cellular features, albeit less directly. It has previously been reported that transcript and protein abundances within the cell have a weak-to-moderate positive correlation that is largely dictated by translation efficiency [52]. The EV proteome is further modulated by cargo selection into EVs, the mechanisms of which are only partially understood [53]. These processes could explain why the correlation between EV protein abundance and cellular transcript abundance is only weakly positive. This phenomenon was exemplified by the inability to detect ER and PR protein in any of the EV samples by mass spectrometry or Western blotting, even where the EVs originated from cells with very high ER and/or PR expression. Given the predominantly nuclear localisation of ER and PR protein within the cell [54,55], it is plausible that these proteins would be present at low abundance or absent in small EVs generated either in endosomes or at the plasma membrane. Despite this, ER+ EVs still contained several proteomic hallmarks of ER pathway hyperactivity.
The current findings are consistent with previous work in this space suggesting that breast cancer EVs broadly reflect the phenotype of the originating cell. The presence of HER2 protein on breast cancer-derived EVs has previously been reported, with the relative load of HER2-bearing EVs in the circulation found to be strongly correlated with HER2 expression in the primary tumour [64]. Other work in a murine model of breast cancer has suggested that EVs capture key proteomic hallmarks of EMT [65], consistent with our finding that EVs derived from mesenchymal-like TNBC cells are enriched in EMT markers. Similarly, EVs from HER2+ and TNBC breast cancer cells were reported to contain unique protein signatures directly reflecting aberrant pathway activity in their cells of origin [17]. The current study extends these previous works, demonstrating that this also applies to EVs from ER+ cells at both the protein and transcript levels. Available statistics suggest that ≥70% of invasive breast cancers are ER+, making ER a key therapeutic target in this disease [66,67].
Despite this, to our knowledge no previous study has sought to evaluate whether cellular ER pathway activity is recapitulated in EVs. The current findings provide proof-of-concept that circulating EVs could plausibly be used to monitor ER pathway activity, amongst other molecular features, for example to assess response to anti-estrogen therapies.
Another key facet of this study was to evaluate culture conditions for in vitro production of EVs in a panel of breast cancer cell lines.
While it is well established that modifying cell culture to reduce or remove exogenous serum-derived EVs is necessary for EV production in vitro [68], the impact of these modified conditions on the cells, and by extension the collected EVs, had not been investigated. The current study affirmed that culture in a defined low-serum medium (with EV-depleted serum) and short-term serum withdrawal were both viable methods for EV production. Notably, both methods appeared to induce distinct gene expression changes in ER+, HER2+ and TNBC cells. This …
Whilst this study represents an important foundational step towards developing EV-based diagnostics for breast cancer, there are several limitations to this work. First and foremost, the study was performed using an in vitro model system. As one of the aims of the study was to define the relationship between the cell of origin and the corresponding EVs, this necessitated the use of homogeneous cell populations for EV production. Analysis of EVs derived from human or animal samples is considerably more complex. Circulating bodily fluids, including plasma and serum, contain a mixture of EVs from various cellular sources as well as highly abundant non-EV constituents (e.g., serum albumin, lipoprotein particles, immunoglobulins). In the cancer context, it is estimated that tumour-derived EVs make up <1% of the total circulating EV population, even for a large, well-vascularised tumour [69]. This presents a substantial methodological challenge for the detection of EV-associated biomarkers, as disease-related signal is likely to be masked by biological noise. As such, non-targeted screening approaches for biomarker discovery, including shotgun proteomics and RNA sequencing, have limited utility when applied directly to complex biological samples. Our approach was therefore to perform these initial discovery experiments with less complex in vitro models. This enabled identification of putative EV protein and RNA markers, which can then
inform the design of targeted assays (e.g., multiple reaction monitoring targeted proteomics, targeted RNA sequencing, quantitative reverse transcription PCR) to increase the sensitivity of detection.
Future work in this space will involve development and optimisation of such assays for validation of candidate markers in relevant human clinical samples.

CONCLUDING REMARKS
Transcriptomic data can be accessed via the Mendeley data repository using the following details: DOI: 10.17632/wk3yz9kc4x.2. Link to view record: https://data.mendeley.com/datasets/wk3yz9kc4x/draft?a=9409af08-7ed9-4392-b272-7ea898adbe1b. Raw and processed RNA sequencing data can be accessed via the Gene Expression Omnibus (GEO) under record number GSE188385. Details of EV isolation and characterisation have been submitted to the EV-TRACK database with ID EV210263.

ORCID
Rebecca E. Lane https://orcid.org/0000-0002-7650-389X
Michelle M. Hill https://orcid.org/0000-0003-1134-0951

A panel of eight breast cancer cell lines with varying patterns of ER, PR and HER2 expression was selected for use in this study. To evaluate the effect of serum deprivation on the cellular transcriptome in the context of in vitro EV production, the cell lines were cultured in complete serum-supplemented medium (n = 3 per cell line), a defined low-serum medium (n = 3) and complete medium followed by 24 h serum starvation (n = 3). Cells were harvested and the transcriptome profiled. To study the relationship between cell phenotype (i.e., ER, PR and HER2 expression) and EV content in each cell line, cells were cultured in low-serum medium (n = 3 per cell line) and the conditioned medium was harvested for EV isolation. The EV transcriptome was profiled by RNA sequencing (single replicate per cell line) and the proteome was characterised by LC-MS/MS (n = 3 per cell line). Subsequent analyses identified transcripts and proteins differentially represented in EVs derived from ER+ versus ER−, HER2+ versus HER2− and TNBC versus non-TNBC cells. Integrative analyses were then performed to explore the relationship between the cellular transcriptome and the EV transcriptome and proteome in the breast cancer context.

TCEP and alkylated in 40 mM 2-chloroacetamide, then digested overnight at 37 °C with Promega Sequencing Grade Modified Trypsin (0.2 µg per 5 µg protein). Digested peptides were desalted using 100 µL OMIX C-18 tips (Agilent Technologies, Australia) as per the manufacturer's instructions. Peptides were dried in a SpeedVac centrifugal concentrator and reconstituted in 0.05% trifluoroacetic acid (Sigma-Aldrich, Australia). Liquid chromatography mass spectrometry was performed on an EASY-nLC 1000 system (Thermo Fisher, Australia) coupled to a Q Exactive Plus instrument (Thermo Fisher, Australia). One microgram of digested peptides was injected onto an Acclaim PepMap RSLC C-18 column (2 µm particle size, 75 µm diameter × 50 cm, Thermo Fisher, Australia). Peptides were separated over a 90 min gradient from 0% to 85% Buffer B (80% v/v acetonitrile) at 250 nL/min. Full scan MS was acquired at 70,000 resolution from 350 to 1400 m/z. Data-dependent MS2 was performed on the top 20 precursor ions at 17,500 resolution from 200 to 2000 m/z. Normalised collision energy was 29 eV, and dynamic exclusion was set at 30 s.
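The digestion step scales trypsin to the protein input at 0.2 µg enzyme per 5 µg protein, i.e. a 1:25 (w/w) enzyme-to-substrate ratio. As a minimal sketch, assuming the ratio is applied linearly to any input mass, the trypsin needed per sample can be computed as below; the function names are hypothetical and not part of the original protocol.

```python
# Ratio stated in the protocol: 0.2 ug trypsin per 5 ug protein (1:25 w/w).
TRYPSIN_UG = 0.2
PROTEIN_UG = 5.0


def trypsin_for_sample(protein_ug: float) -> float:
    """Return the trypsin mass (ug) for a given protein mass (ug).

    Hypothetical helper: assumes the 0.2 ug : 5 ug ratio scales linearly.
    """
    if protein_ug <= 0:
        raise ValueError("protein mass must be positive")
    return protein_ug * TRYPSIN_UG / PROTEIN_UG


def enzyme_to_substrate_ratio() -> float:
    """Protein-to-trypsin mass ratio implied by the protocol (about 25)."""
    return PROTEIN_UG / TRYPSIN_UG
```

Under this assumption, 20 µg of EV protein would call for 0.8 µg of trypsin.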
Acquired spectra were searched against the Swiss-Prot human proteome database (released 8th May 2019 with 20,421 entries) using 365 (Version 2108, Microsoft Corporation, USA) and the R software environment (Version 4.0.3, R Foundation for Statistical Computing, Austria). Unless otherwise stated, n = biological replicates, defined as material derived from replicate cultures of the same cell line. Centre and dispersion measures for graphical representation were mean and standard deviation, respectively. These details are defined in the associated figure legends. Graphics were produced with the ggplot2 (3.3.2) AmpliSeq targeted sequencing data was performed using the DESeq2 package (1.30.0) [26]. The DESeq command was run with the following parameters: test = 'Wald', fitType = 'parametric'. Testing to identify differentially expressed genes between culture conditions was performed with alpha = 0.05, with significance defined as an adjusted p-value < 0.05. Testing to identify differentially expressed genes between cell subtypes (i.e., ER+ vs. ER−, HER2+ vs. HER2− and TNBC vs. non-TNBC) was performed for cells cultured in low-serum medium only, with significance defined as an adjusted p-value < 0.1 and an absolute log fold change threshold of 1.5. Count normalization was performed by variance stabilizing transformation (VST) with the following parameters: blind = FALSE, fitType = 'parametric'. VST-normalized data was used for visualization and exploratory analyses.
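Significance in these analyses is defined on adjusted p-values. DESeq2's `results()` adjusts p-values with the Benjamini-Hochberg (BH) procedure by default; the sketch below re-implements the BH step-up adjustment in plain Python to show how raw p-values become the adjusted values compared against 0.05 or 0.1. It is an illustration only: it does not reproduce DESeq2's Wald test or its independent filtering, which can set some adjusted p-values to NA.

```python
from typing import List


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg step-up adjustment (false discovery rate).

    Returns adjusted p-values in the same order as the input.
    """
    m = len(pvalues)
    # Indices sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        candidate = pvalues[i] * m / rank
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues: List[float], alpha: float) -> List[bool]:
    """Flag tests whose BH-adjusted p-value is below alpha."""
    return [q < alpha for q in bh_adjust(pvalues)]
```

For instance, raw p-values 0.01, 0.04, 0.03 and 0.5 adjust to 0.04, about 0.0533, about 0.0533 and 0.5, so only the first passes alpha = 0.05, while the first three pass the 0.1 cut-off used for the subtype comparisons.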
test = 'Wald', betaPrior = TRUE, fitType = 'local'. Gene-wise tests for significance were performed with alpha = 0.05, with significance defined as an adjusted p-value < 0.05. Count normalization was performed by variance stabilizing transformation (VST) with the following parameters: blind = FALSE, fitType = 'local'. VST-normalized data was used for visualization and exploratory analyses. Summary statistics of read counts and read mapping were generated by MultiQC v1.8 [31].
MCF7 and T47D) had elevated expression of both the ESR1 and PGR transcripts relative to ER− lines in all three culture conditions. Similarly, the presumed HER2+ lines (BT474 and SKBR3) expressed amplified levels of the ERBB2 transcript. BT483 cells also exhibited some amplification of this transcript, in line with previous observations by other investigators [39]. The triple-negative (TNBC) lines MDAMB231 and HS578T had minimal expression of all three markers, though notably some expression of ESR1 appeared to be induced in MDAMB231 cells during serum starvation. Despite classification as TNBC cells, HCC1143 showed an intermediate expression level of ESR1 transcript,

(adjusted p-value <0.05) between low serum and complete culture medium, with 72 up-regulated and 26 down-regulated (Figure S1a(i)). Under the same conditions, 915 genes (486 up, 429 down) were differentially expressed in HER2+ cells (Figure S1a(ii)) and 243 genes (109 up, 134 down) in TNBC cells (Figure S1a(iii)). Interestingly, there was minimal overlap in the differentially expressed genes between the three cell line groups. Specifically, only a single gene (KRT5) was commonly up-regulated in all three cell groups, and no genes were commonly down-regulated (Figure 1C(i)). This indicated that responses to serum deprivation were distinct between cell phenotypes. In the serum starvation condition, 1251 genes (609 up and 642 down) were significantly differentially expressed in ER+ cells (Figure S1b(i)), 230 (110 up, 120 down) in HER2+ cells (Figure S1b(ii)) and 894 (535 up, 359 down) in TNBC cells (Figure S1b(iii)). There was also minimal overlap in differentially expressed genes between the cell phenotypes during serum starvation: 26 genes were commonly up-regulated and 16 genes were commonly down-regulated (Figure 1C(ii)). Notably, however, several genes, including ribosomal RNA processing 7 homolog A (RRP7A), probable E3 ubiquitin-protein ligase DTX2 (DTX2), transcription factor MafG (MAFG) and nuclear transcription factor Y subunit gamma (NFYC), were among the top up-regulated genes (ordered by adjusted p-value) across all three cell subtypes. Our analyses also indicated that the two serum deprivation conditions elicited distinct responses among matched cells. For example, only 21 significantly differentially expressed genes (15 up- and 6 down-regulated) were common between the low serum and serum starvation conditions for ER+ cells (Figure S2a). Similar patterns were observed for the HER2+ and TNBC cells (Figures S2b and S2c). Lists of significantly up- and down-regulated genes across cell phenotypes for the low serum and serum starvation conditions are included in Supplementary File 1.
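The overlap comparisons summarised in Figure 1C reduce to set operations on the per-group lists of significantly up- or down-regulated genes. A minimal sketch follows; the gene lists are hypothetical (only KRT5 is taken from the text, and GENE_A to GENE_D are placeholders).

```python
from typing import Dict, Set


def common_genes(groups: Dict[str, Set[str]]) -> Set[str]:
    """Genes significant in every group (the centre of a Venn diagram)."""
    if not groups:
        return set()
    return set.intersection(*groups.values())


def unique_genes(groups: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Genes significant in exactly one group, keyed by group name."""
    result: Dict[str, Set[str]] = {}
    for name, genes in groups.items():
        # Union of every other group's genes.
        others = set().union(*(g for n, g in groups.items() if n != name))
        result[name] = genes - others
    return result


# Hypothetical up-regulated gene sets per cell group.
up = {
    "ER+": {"KRT5", "GENE_A", "GENE_B"},
    "HER2+": {"KRT5", "GENE_C"},
    "TNBC": {"KRT5", "GENE_A", "GENE_D"},
}
```

Here `common_genes(up)` returns only KRT5, mirroring the single shared up-regulated gene reported for the low-serum condition; genes shared by two groups (GENE_A) drop out of both the common and the unique sets.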

FIGURE 2
Characterisation of EVs isolated from eight breast cancer cell lines. (A) Total recovered EVs for each cell line. Each point represents one biological replicate (n = 3 except MDAMB231 and HS578T, where n = 2). Mean (thick bar) ± SD overlaid in grey for each cell line. (B) Frequency (%) by EV size (nm) for four representative samples (one per expression subtype) as measured by nanoparticle tracking analysis (NTA). (C) Western blot detection of ER and PR expression in a subset of cell lysates and corresponding EVs. β-actin is included as a loading control for cell lysates. EV markers CD63 and ADP-ribosylation factor 1 (ARF1), and the non-EV contaminant serum albumin, are included as controls for EVs. (D) Western blot detection of HER2 in a subset of cell lysates and corresponding EVs. β-actin is included as a loading control for cell lysates; the EV marker heat shock protein 70 (HSP70) is included as a control for EVs. Colours indicate ER and HER2 expression in cell lines: ER−/HER2− (red), ER−/HER2+ (purple), ER+/HER2− (blue) and ER+/HER2+ (navy).
FIGURE 3
Exploratory analysis of RNA sequencing data from breast cancer cell-derived EVs. (A) Log2 normalised expression of ESR1, PGR and ERBB2 transcripts in each sample. (B) Principal component analysis (PCA) plot of RNA expression data. Each bar represents a single sample; colours represent cellular ER and HER2 expression. (C) Volcano plots (−log10 p-value against log2 fold change) of differential transcript expression between EVs derived from (i) ER+ vs. ER−, (ii) HER2+ vs. HER2− and (iii) triple-negative (TNBC) vs. non-TNBC cell lines. Each point represents a single transcript. Grey represents no significant expression difference; colour (turquoise, purple or red, respectively) represents a significant expression difference (adjusted p-value <0.05). A subset of differentially expressed genes of interest is labelled with gene symbol, along with the ESR1 (ER), PGR (PR) and ERBB2 (HER2) transcripts. (D) Heatmap showing unsupervised hierarchical clustering of samples based on normalised expression of transcripts in the PAM50/ProSigna subtyping signature described in ref. [45]. The distance measure for clustering of both rows and columns was Pearson's correlation.
ER− lines. Likewise, EVs derived from the HER2+ lines (SKBR3 and BT474) contained elevated levels of ERBB2 transcript. PCA of the VSN-transformed transcript count data suggested that different cellular expression patterns of ER and HER2 were associated with distinct transcriptomic features in the corresponding EVs (Figure 3B). This analysis also showed that the two sets of replicate samples (BT474/BT474(a) and BT483/BT483(a)) were closely correlated, suggesting that biological and technical variability within cell line replicates was limited. We next performed differential expression analysis to identify overrepresented transcripts in ER+, HER2+ and TNBC cell-derived EVs, respectively. Of a total of ∼14,000 transcripts included in the analysis, 662 were significantly up-regulated (FDR <0.05) and 846 were down-regulated in ER+ versus ER− EVs (Figure 3C(i)). As expected, both ESR1 and PGR were among the up-regulated transcripts. Other overrepresented transcripts included trefoil factors 1 and 3 (TFF1/TFF3), growth regulating estrogen receptor binding 1 (GREB1) and SAM pointed domain containing ETS transcription factor (SPDEF). For HER2+ EVs, a total of 254 transcripts were up-regulated and 326 down-regulated relative to HER2− EVs. As expected, the ERBB2 transcript was among those with the highest positive fold change. Other transcripts up-regulated in HER2+ EVs included StAR-related lipid transfer domain containing 3 (STARD3), post-GPI attachment to proteins phospholipase 3 (PGAP3) and mediator complex subunit 1 (MED1). For TNBC EVs, 1181 transcripts were up-regulated and 819 down-regulated relative to non-TNBC EVs. Among the top up-regulated transcripts were moesin (MSN), vimentin (VIM), zinc finger E-box binding homeobox 1 (ZEB1) and glutathione S-transferase Pi 1 (GSTP1). Finally, to evaluate whether the cellular ER and HER2 expression subtype could be inferred from the EV transcriptome, we performed unsupervised hierarchical clustering of samples based on expression of genes in the PAM50/ProSigna gene list [45]. Quantification of the expression of this 50-gene signature in primary breast tumour tissue is used to stratify tumours into five subtypes: luminal A and luminal B (generally ER+), HER2-enriched (generally HER2+), basal (generally TNBC) or normal-like. Forty-eight of the 50 genes were detected in the EV RNA samples, and VSN-normalised expression values for these were used to perform sample clustering. Samples fell into two major clusters, containing TNBC and non-TNBC samples, respectively. The non-TNBC cluster appeared to further separate into two sub-groups containing HER2+ and HER2− samples, respectively.
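The clustering step groups samples by the similarity of their PAM50 expression profiles, with Pearson's correlation as the distance measure. A minimal pure-Python sketch follows, assuming distance = 1 - r and average linkage (the linkage method is not stated in this section); the sample names and expression vectors are hypothetical toy values, not study data.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster(samples: Dict[str, List[float]], k: int) -> List[List[str]]:
    """Average-linkage agglomerative clustering of samples.

    Distance between two samples is 1 - Pearson's r (an assumption; the
    text states only that Pearson's correlation was the distance measure).
    Clusters are merged until k remain.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    names = list(samples)
    dist = {
        (a, b): 1.0 - pearson(samples[a], samples[b])
        for a in names
        for b in names
        if a != b
    }
    clusters: List[List[str]] = [[name] for name in names]
    while len(clusters) > k:
        best: Tuple[int, int] = (0, 1)
        best_d = math.inf
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean pairwise distance between members.
                pairs = [dist[(a, b)] for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)
                if d < best_d:
                    best_d, best = d, (i, j)
        i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)
```

On real data, each vector would hold the normalised expression of the 48 detected PAM50 genes for one EV sample; with toy vectors whose profiles are mirror images, `cluster(samples, 2)` separates the two groups cleanly.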
Between 1500 and 2200 proteins were identified per sample. Sparse proteins were removed by applying a filter that retained only proteins identified in at least 80% of the samples from at least one subtype group (i.e., ER+/HER2−, ER+/HER2+, ER−/HER2+ or ER−/HER2−). A Venn diagram showing the overlap of identified proteins across the four subtypes is shown as Figure 4A. As expected, most identified proteins (2164 of 2228) were detected at some level across all breast cancer EV
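The sparsity filter keeps a protein if it was identified in at least 80% of the samples in at least one of the four ER/HER2 subtype groups. A minimal sketch is shown below, assuming that a missing identification is encoded as `None` (or an absent key) and using hypothetical sample IDs. Note how the threshold interacts with group size: with three samples per group, 80% effectively requires all three; with six, five of six.

```python
from typing import Dict, List, Mapping, Optional


def passes_filter(
    intensities: Mapping[str, Optional[float]],
    groups: Mapping[str, List[str]],
    min_fraction: float = 0.8,
) -> bool:
    """Keep a protein if it is identified in >= min_fraction of the samples
    of at least one subtype group.

    intensities: sample ID -> intensity; None or a missing key means
    "not identified" (an assumed encoding).
    groups: subtype label -> list of sample IDs (hypothetical IDs).
    """
    for sample_ids in groups.values():
        if not sample_ids:
            continue
        observed = sum(1 for s in sample_ids if intensities.get(s) is not None)
        if observed / len(sample_ids) >= min_fraction:
            return True
    return False


def filter_proteins(
    table: Mapping[str, Mapping[str, Optional[float]]],
    groups: Mapping[str, List[str]],
    min_fraction: float = 0.8,
) -> Dict[str, Mapping[str, Optional[float]]]:
    """Return only the proteins that pass the group-wise presence filter."""
    return {
        protein: values
        for protein, values in table.items()
        if passes_filter(values, groups, min_fraction)
    }
```

Filtering per group, rather than across all samples, retains proteins that are consistently present in one subtype but absent from the others, which are exactly the candidates of interest for subtype-specific EV markers.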

of the data pre- and post-normalisation and imputation are included as Figure S5. The normalised intensity values with imputed missing values are included in Supplementary File 2. These were used to infer protein expression in each sample. PCA of the normalised data is shown as Figure 4B. The samples appeared to broadly cluster into the four subtype groups. Interestingly, in the previous analysis of the cellular RNA sequencing data, the ER+/HER2− and ER+/HER2+ cell lines had clustered together. In this analysis of the EV proteome, however, these samples appeared more distinct. The clustering patterns suggested differential representation of proteins in the breast cancer EV proteome based on patterns of cellular ER and HER2 expression. As the ER protein could not be directly detected in the EVs, the subsequent analyses sought to identify whether the EV proteome indirectly recapitulated the ER and HER2 expression profile of the originating cell. The output of all differential expression analyses is included in Supplementary File 2. First, the proteomic features of EVs derived from ER+ cells (T47D, MCF7, BT474, BT483) were compared to those from ER− cells (MDAMB231, HS578T, HCC1143, SKBR3). In total, 113 proteins were identified as differentially expressed between ER+ and ER− EVs. Of these, 98 had an absolute log fold change greater than 1 (Figure 4C): 67 were up-regulated and 31 down-regulated in ER+ EVs. Among the proteins with large positive fold changes in ER+ EVs were Na+/H+ exchange regulatory cofactor NHE-RF1 (SLC9A3R1), matrix Gla protein (MGP) and ephrin type-B receptor 4 (EPHB4). Additionally, several proteins involved in the biogenesis of endosomally derived EVs (exosomes), including components of the endosomal sorting complex required for transport (ESCRT), were also up-regulated in ER+ EVs. These included vacuolar protein sorting-associated proteins 25 (VPS25), 28 (VPS28) and 37C (VPS37C), multivesicular body subunits 12A (MVB12A) and 12B (MVB12B), and tumour susceptibility gene 101 protein (TSG101).
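The ER+ versus ER− protein comparison reports 98 proteins passing an absolute log fold change cut-off of 1, split into 67 up-regulated and 31 down-regulated (67 + 31 = 98). The sketch below applies such thresholds and splits hits by direction. It assumes log2 fold changes, a post hoc filter on adjusted p-values, and a placeholder adjusted p-value cut-off of 0.05, since the cut-off used for the protein comparison is not given in this section; protein IDs and values in the example are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class DEResult(NamedTuple):
    protein: str
    log_fc: float  # assumed log2 fold change, ER+ relative to ER-
    adj_p: float   # multiple-testing adjusted p-value


def split_by_direction(
    results: List[DEResult],
    max_adj_p: float = 0.05,  # placeholder; protein cut-off not stated here
    min_abs_log_fc: float = 1.0,
) -> Tuple[List[str], List[str]]:
    """Return (up, down) lists of proteins passing both thresholds."""
    up: List[str] = []
    down: List[str] = []
    for r in results:
        if r.adj_p < max_adj_p and abs(r.log_fc) > min_abs_log_fc:
            (up if r.log_fc > 0 else down).append(r.protein)
    return up, down


# Hypothetical results for illustration only.
res = [
    DEResult("PROT_1", 2.3, 0.001),
    DEResult("PROT_2", -1.4, 0.01),
    DEResult("PROT_3", 0.6, 0.001),
    DEResult("PROT_4", 3.0, 0.2),
]
```

Calling the same function with `min_abs_log_fc=1.5` mirrors the transcript-level fold change threshold described in the methods; whether the original pipeline applied its thresholds post hoc or within the statistical test itself is not stated here.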

FIGURE 4
Figure 5A. As expected, there was a significant (p < 0.05) positive linear relationship between cellular and EV RNA count values, with a Pear-
The current study represents an important step towards realising the translational potential of EVs in breast cancer. The work has empirically validated several assumptions underpinning EV diagnostics in general, primarily that the molecular content of EVs recapitulates that of the originating cell. The current analyses have confirmed that this holds in the breast cancer context, albeit with some specific exceptions, most notably the ER and PR proteins. The work has further provided proof of concept that characterisation of a subset of EV-associated transcripts and/or proteins is sufficient to infer ER and HER2 signalling pathway activity in the originating cell. This suggests the potential to monitor the molecular evolution of breast tumours via circulating EVs. Additionally, this study has empirically evaluated two commonly used methods for in vitro EV production, confirming that the molecular characteristics of cultured cells remain largely stable when transitioned to serum-deprived conditions. The core finding of this study was that key molecular features of ER+, HER2+ and TNBC cells were recapitulated in the corresponding EVs. On the transcript level, EVs appeared to directly capture the content of the originating cell, with EV transcript abundance showing a strong linear correlation with cellular expression. Further, specific transcripts were differentially represented across EVs in a way that directly reflected transcriptomic features of the originating cells. For example, EVs derived from ER+ cells were enriched in transcripts of known estrogen-responsive genes, including trefoil factors 1 and 3 (TFF1/TFF3) [48] and growth regulating estrogen receptor binding 1 (GREB1) [49], compared to EVs from ER− lines. Transcripts overrepresented in HER2+ EVs included StAR-related lipid transfer domain containing 3 (STARD3), post-GPI attachment to proteins phospholipase 3 (PGAP3), proteasome 26S subunit, non-ATPase 3 (PSMD3) and
This could imply a need to validate EV production conditions for each combination of cell line, culture condition and phenotype of interest in future studies. On the whole-transcriptome level, however, cellular RNA samples derived from the same cell line were highly correlated (Pearson's correlation ≥0.95), regardless of the culture condition. This suggested that the global transcriptomic changes induced by either form of serum deprivation were minimal, with cells retaining their key phenotypic characteristics. To our knowledge, this represents the first empirical validation of culture methods that are widely employed in EV research.
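Both the cell-versus-EV comparison and the replicate comparison (Pearson's correlation ≥0.95) rest on correlating expression values across genes. As a minimal sketch, assuming a log2(count + 1) transform as a stand-in for the VST/VSN normalisation actually used, and restricting to genes detected in both samples, the correlation could be computed as follows; gene IDs and counts in any example are hypothetical.

```python
import math
from typing import Dict, List


def log2p1(counts: List[float]) -> List[float]:
    """log2(count + 1): a simple stand-in (assumption) for VST/VSN."""
    return [math.log2(c + 1.0) for c in counts]


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson's correlation coefficient of two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_correlation(cell: Dict[str, float], ev: Dict[str, float]) -> float:
    """Correlate log-transformed cell and EV counts over shared genes."""
    shared = sorted(set(cell) & set(ev))
    return pearson_r(
        log2p1([cell[g] for g in shared]), log2p1([ev[g] for g in shared])
    )
```

The log transform keeps a handful of very highly expressed genes from dominating the coefficient; with raw counts, Pearson's r would largely reflect agreement on those few outliers.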
In conclusion, this work has provided new insight into the proteomic and transcriptomic landscape of breast cancer EVs and is amongst the first attempts to empirically evaluate the relationship between cellular and EV content. The results have established that breast cancer EVs contain disease-relevant protein markers and recapitulate patterns of ER and HER2 pathway activation, albeit indirectly. They have also established that there is a positive linear correlation between cellular transcript abundance and EV transcript abundance. Accompanying this work has been an evaluation of the effect of culture adaptations for EV production on the cellular transcriptome. This has empirically confirmed, for the first time, that breast cancer cells retain disease-relevant features during serum deprivation and EV production. In total, this work represents an important foundational step towards realising the potential of circulating EVs as a component of liquid biopsies in breast cancer.