Additional Supporting Information may be found in the online version of this article.

STEM_791_sm_supplFig1.tif (13104K): Supporting Information Figure S1. MCF12A and MCF10-2A basal cell lines contain two phenotypically distinct subpopulations. EpCAMpos/CD49fhigh (EpCAM+) and Fibros (EpCAMneg/CD49fmedium-to-low) populations were isolated by FACS sorting and analyzed by immunofluorescence staining. EpCAM+ cells have an epithelial morphology and express epithelial markers (green), whereas Fibros exhibit a mesenchymal-like phenotype and express mesenchymal markers (red). Scale bar, 100 μm. Abbreviations: CKs, cytokeratins; E-cad, E-cadherin; FN, fibronectin; N-cad, N-cadherin; Vim, vimentin.
STEM_791_sm_supplFig2.tif (1405K): Supporting Information Figure S2. Non-tumorigenic basal cell lines contain two distinct subpopulations that can be isolated by differential trypsinization. Parental Myo1089 and MCF12A cells were grown to confluence and treated with trypsin for 2 min (1st Tryps). The cells that remained attached were then detached by a further 10 min incubation in trypsin (2nd Tryps). The process was repeated for three consecutive passages. Flow cytometry analysis (upper panels) and bright field images (lower panels) demonstrate that Fibros are enriched in the 1st Tryps fraction, whereas EpCAM+ cells are enriched in the 2nd Tryps fraction.
STEM_791_sm_supplFig3.tif (7813K): Supporting Information Figure S3. Expression of myoepithelial markers detected by immunofluorescence in sorted EpCAM+ and Fibros subpopulations from the MCF12A and Myo cell lines. Basal cytokeratins were stained with the 34βE12 antibody, which recognizes CK5, CK10, and CK14. Human breast fibroblasts (HBFs) were used as a positive control for αSMA. Scale bar, 20 μm. Abbreviations: αSMA, alpha smooth muscle actin; CKs, cytokeratins.
STEM_791_sm_supplFig4.tif (2189K): Supporting Information Figure S4. Evaluation of stem cell/progenitor properties in MCF12A cell subpopulations. A: EpCAM+ and Fibros subpopulations isolated by FACS sorting were cultured for 6 passages. Flow cytometry analysis demonstrates that EpCAM+ cells, but not Fibros, can regenerate the parental phenotype. B: Analysis of CD44/CD24 markers in isolated EpCAM+ and Fibros subpopulations. C: Quantification of ALDH1 activity by Aldefluor assay. Cells were triple labeled with Aldefluor and antibodies against EpCAM and CD49f. Left panels: Aldefluor staining of parental MCF12A cells in the presence or absence of the DEAB inhibitor. Right panels: EpCAM+ and Fibros subpopulations were gated and the proportion of Aldefluor-positive cells within each population was quantified. Data represent the mean number of Aldefluor-positive cells +SEM from 5 independent experiments. Differences were statistically significant (t-test, p < 0.05). Abbreviations: DEAB, diethylaminobenzaldehyde; SSC-A, side scatter area.
STEM_791_sm_supplFig5.tif (4568K): Supporting Information Figure S5. The mesenchymal-like subpopulation (Fibros) within non-tumorigenic basal cell lines originates from EpCAM+ cells via EMT. A: Scheme of the repopulation experiment. Left panel: EpCAMneg/CD49fmed-low cells (Fibros) and EpCAMhigh/CD49fhigh cells (EpCAM+) from parental Myo1089 (and MCF12A, not shown) cells were FACS sorted. Central panel, top: post-sort purity check demonstrating 100% purity of sorted EpCAM+ cells. Central panel, bottom: EpCAM+ cells stained with EpCAM antibody 24 h after sorting. Right panel: sorted EpCAM+ cells were grown for two passages (14 days) to allow the generation of de novo Fibros (Myo-Fibros-EMT). B: Sorted Myo-Fibros and Myo-Fibros-EMT cells were grown for four days and stained for E-cadherin (E-cad), vimentin (Vim) and fibronectin (FN). RNA was extracted from confluent cells. C: Generation of Fibros-EMT cells from sorted EpCAM+ cells was monitored by immunofluorescence staining with E-cadherin (E-cad), vimentin (Vim) and fibronectin (FN) antibodies at the indicated time points. D: Fibros and Fibros-EMT cell populations from the Myo1089 and MCF12A cell lines were analyzed by qPCR using a panel of differentiation markers. Data represent log10 mean expression values +SEM from two independent experiments. Expression values were normalized to the corresponding sorted EpCAM+ cell sample (2 days post sort). Differences between Fibros and Fibros-EMT were not statistically significant (t-test, significance threshold p < 0.05).
STEM_791_sm_supplFig6.tif (2000K): Supporting Information Figure S6. The CD44high/CD24low profile does not identify a subpopulation with increased regeneration ability within Fibros. CD44high/CD24pos and CD44high/CD24neg cells from MCF12A-Fibros (A) and Myo-Fibros (B) were isolated by FACS sorting and cultured for 3 passages. Flow cytometry demonstrates that each of these two populations regenerates the other, while both retain the EpCAMneg/CD49fmed-to-low mesenchymal-like profile.
STEM_791_sm_supplFig7.tif (1750K): Supporting Information Figure S7. The EpCAM+ subpopulation within the MCF12A, Myo1089 and MCF10-2A cell lines exhibits high levels of ALDH1 activity. EpCAM+ and Fibros subpopulations were isolated by FACS sorting, and Aldefluor activity was measured after 1 week in culture. Numbers represent the average proportion of Aldefluor-positive cells +SEM from 3 independent experiments. Differences were statistically significant (t-test, p < 0.05). Abbreviations: DEAB, diethylaminobenzaldehyde; SSC-A, side scatter area.
STEM_791_sm_supplFig8.tif (1183K): Supporting Information Figure S8. EpCAMneg/CD49fhigh cells do not represent a separate subpopulation within Myo-Fibros. EpCAMneg/CD49fhigh and EpCAMneg/CD49flow populations within Myo-Fibros were isolated by FACS sorting and tested for functional and phenotypic differences. A: The sorted subpopulations show indistinguishable phenotypic features. B: qPCR analysis of diverse differentiation markers shows no evident differences between the two subpopulations. C: Sorted EpCAMneg/CD49fhigh and EpCAMneg/CD49flow cells regenerate each other after 2 passages, indicating that they belong to the same Fibros subpopulation.
STEM_791_sm_supplFig9.tif (2351K): Supporting Information Figure S9. The EpCAMlow/CD49fhigh cells within the MCF12A cell line are not a distinct subpopulation with increased stem cell features. Instead, they contain a mixture of EpCAM+ cells and Fibros, possibly because FACS sorting does not completely separate these two populations. A: EpCAMlow/CD49fhigh cells were isolated by FACS sorting and cultured for 24 h. Representative bright field and confocal images (EpCAM, green; fibronectin (FN), red) show the presence of both EpCAM+ cells and Fibros. B: MCF12A cells were labeled with EpCAM, CD49f, CD24 and CD44 antibodies, and the EpCAMlow/CD49fhigh subpopulation was gated and analyzed for its CD44/CD24 profile. This subpopulation showed the same variety of CD44/CD24 profiles as unsorted parental cells (left panel). C: Sorted EpCAM+, Fibros and EpCAMlow/CD49fhigh subpopulations were analyzed for the expression of a variety of differentiation markers by qPCR. For all genes analyzed, EpCAMlow/CD49fhigh cells showed expression levels intermediate between those of EpCAM+ cells and Fibros. D: Analysis of ALDH1 activity by Aldefluor assay. Parental MCF12A cells were triple labeled with Aldefluor and EpCAM and CD49f antibodies. EpCAMpos/CD49fhigh (EpCAM+), EpCAMlow/CD49fhigh and EpCAMneg/CD49fmed-to-low (Fibros) cells were gated as shown in the left panel, and the proportion of Aldefluor-positive cells within each population was quantified. Numbers represent the average proportion of Aldefluor-positive cells +SEM from 5 independent experiments. Differences were statistically significant (t-test, p < 0.05). E: Quantification of mammosphere (< 100 μm) formation by sorted EpCAM+, Fibros and EpCAMlow/CD49fhigh subpopulations. Bars represent the mean number of mammospheres +SEM. Data shown are from 2 independent experiments.
STEM_791_sm_supplFig10.tif (2003K): Supporting Information Figure S10. Gene expression profiles of the EpCAM+ and Fibros subpopulations within non-tumorigenic basal cell lines resemble those of basal-A and basal-B/mesenchymal breast cancer cell lines, respectively. A: Expression of the genes differentially expressed between basal-A (n=315 genes) and basal-B/claudin-low (n=397) cells reported by Neve et al. [11], and between basal (n=119) and mesenchymal (n=49) breast cancer cells reported by Charafe-Jauffret et al. [25], was analyzed in the EpCAM+ and Fibros subpopulations of the Myo and MCF12A cell lines. The T47D luminal cell line and human breast fibroblasts (HBFs) were used as controls. Genes upregulated in each breast cancer cell type in the indicated signatures are represented as box-and-whisker plots, and p values were calculated as previously reported [10, 24]. B: Unsupervised hierarchical clustering of 49 breast cancer cell lines profiled by Neve et al. [11] using the 512 genes differentially expressed <1.5 fold between EpCAM+ and Fibros (Supporting Information Table S3). Cell lines were classified as luminal, basal-A or basal-B according to Neve et al. [11]. Claudin-low cell lines [10] were included within basal-B. The MCF10A and MCF12A (non-tumorigenic) and HCC1500 (uncertain identity) cell lines were excluded from the analysis. Data were log-transformed, genes and arrays were mean centered, and clustering was performed using centered correlation as the similarity metric with average linkage.
STEM_791_sm_supplFig11.tif (3033K): Supporting Information Figure S11. Analysis of the EpCAM/CD49f marker profile in breast cancer cell lines. Upper left panel: representation of the four cell subpopulations identified in the normal human mammary gland (adapted from Lim et al. [5]). Two basal-A breast cancer cell lines (BT20, HCC1954), two claudin-low breast cancer cell lines (SUM159, MDA-MB-157), two luminal breast cancer cell lines (T47D, MCF7), and the Myo-EpCAM+ and Myo-Fibros subpopulations were subjected to flow cytometry analysis with EpCAM and CD49f antibodies. Note the similarity in FACS profile between the basal-A cancer cell lines and the Myo-EpCAM+ subpopulation, and between Myo-Fibros and the claudin-low breast cancer cell lines.
STEM_791_sm_supplTable1.xls (2829K): Supporting Information Table S1. Genes differentially expressed between EpCAM+ and Fibros subpopulations identified by significance analysis of microarrays (SAM) using a false discovery rate (FDR) <1%. SAM analyses were performed in the Myo and MCF12A cell lines, either individually or combining both cell lines. Gene ontology analysis was performed with FDR <5% using the bioinformatics and database package ROCK (
STEM_791_sm_supplTable2.xls (20K): Supporting Information Table S2. Analysis of cell populations within Myo1089 and MCF12A basal cell lines using the nearest centroid correlation method.
STEM_791_sm_supplTable3.xls (126K): Supporting Information Table S3. Genes differentially expressed < 1.5 fold (512-gene list) between EpCAM+ and Fibros subpopulations in MCF12A and Myo cell lines.
STEM_791_sm_supplMethods.pdf (109K): Supporting Information
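The clustering recipe summarized in the Figure S10B legend (log transformation, mean centering of genes and arrays, centered-correlation similarity, average linkage) can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, log2 is assumed because the legend does not give the log base, and only arrays (columns) are clustered; passing the transposed matrix would cluster genes instead.

```python
import numpy as np


def preprocess(expr):
    """Log-transform, then mean-center genes (rows) and arrays (columns).

    expr: genes x arrays matrix of positive expression values.
    ASSUMPTION: log2; the Figure S10B legend does not state the base.
    """
    x = np.log2(expr)
    x = x - x.mean(axis=1, keepdims=True)  # center each gene across arrays
    x = x - x.mean(axis=0, keepdims=True)  # center each array across genes
    return x


def centered_correlation_distance(x):
    """Distance = 1 - centered (Pearson) correlation between all column pairs."""
    return 1.0 - np.corrcoef(x, rowvar=False)


def average_linkage(dist):
    """Naive agglomerative clustering with average linkage (UPGMA).

    Returns merges as (members_a, members_b, distance) in merge order.
    """
    clusters = [[i] for i in range(dist.shape[0])]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges
```

Calling `average_linkage(centered_correlation_distance(preprocess(expr)))` on a genes-by-arrays matrix returns the merge order from which a dendrogram can be drawn. For real data sets an established implementation is preferable, for example SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` and `metric="correlation"` applied to the transposed, preprocessed matrix.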

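Table S2 names the nearest centroid correlation method without describing it. In one common form of this classifier, the training profiles of each reference class are averaged into a centroid, and a query sample is assigned to the class whose centroid it correlates with most strongly (Pearson correlation). The Python sketch below assumes that form and applies no gene centering, which a real analysis may add; the function names, class labels and values are hypothetical and are not taken from the article.

```python
import numpy as np


def class_centroids(train, labels):
    """Average the training profiles (genes x samples) of each class.

    labels: one class label per column of train (hypothetical labels).
    """
    labels = np.asarray(labels)
    return {str(c): train[:, labels == c].mean(axis=1) for c in np.unique(labels)}


def nearest_centroid(sample, centroids):
    """Assign sample to the centroid with the highest Pearson correlation.

    Returns (best_class, {class: correlation}).
    """
    corrs = {
        c: float(np.corrcoef(sample, centroid)[0, 1])
        for c, centroid in centroids.items()
    }
    best = max(corrs, key=corrs.get)
    return best, corrs
```

Because correlation ignores overall expression level and scale, this assignment depends only on the shape of each profile across genes, which is one reason correlation-based centroids are often preferred over Euclidean distance for comparing samples profiled on different platforms.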
Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.