Transcriptional profiling of single tumour cells from pleural effusions reveals heterogeneity of epithelial to mesenchymal transition and extra‐cellular matrix marker expression

Dear Editor, Malignant pleural effusions (MPE) in advanced non-small-cell lung cancer (NSCLC) offer a rich source of tumour-derived material for liquid biopsy.1 However, molecular monitoring of NSCLC is largely dependent on tumour biopsies. Previous NSCLC MPE studies either did not transcriptionally evaluate the tumour cell compartment of MPEs2 or relied on a positive selection of epithelial (EPCAM-expressing) cells.3,4 This strategy excludes cells transitioning to an invasive, mesenchymal phenotype through epithelial to mesenchymal transition (EMT).5–7 Here, we molecularly characterize single EPCAM-negative and -positive MPE tumour cells (TCs) to investigate the potential of an MPE liquid biopsy.

Our study included 11 MPEs from nine NSCLC patients (Table 1 and Supporting Information). In total, 1468 single TCs and 131 pools of 10–15 white blood cells (WBCs) were identified by flow cytometry [median of 146 TCs per patient (range 48–230)] (Figure 1A).8 Among 584 TCs passing quality control (QC), 483 completed staining for EPCAM, revealing that 67% (322 of 483) were EPCAM-negative (Figure 1B). The proportion of EPCAM-positive TCs varied considerably from patient to patient (median 24%; range 0%–80%). Importantly, UPENN-1 had no detected EPCAM-positive TCs. This suggests that EPCAM-based TC isolation may under-represent the number and phenotypic diversity of TCs. t-distributed stochastic neighbour embedding analysis revealed that TCs clustered away from WBCs (Figure 1C). Index sorting linked the transcriptional profile of each cell to its protein expression, demonstrating that cells in the WBC cluster were EPCAM-negative but CD45-positive (Figure 1C). We confirmed high expression of tumour-specific genes KRT7 and KRT8 and epithelial gene EPCAM among cells in the TC but not the WBC cluster (Figure S1).
We performed differential gene expression analysis to identify TC-specific genes. A total of 185 genes were significantly differentially expressed in MPE TCs versus WBCs (adjusted p-value [p-adj] < 0.05 and log2 fold-change [log2FC] > 1.5; Figure S2A and Table S1). Genes significantly upregulated in TCs include NSCLC tumour markers NAPSA, SFTPB, CEACAM6, C3, KRT7, KRT18, and KRT1 (Figure S2B). Gene Ontology (GO) analysis revealed enrichment for gene signatures including extracellular matrix structural constituent (Figure S2C and Tables S2-4). Expression of tumour markers and lack of expression of immune markers suggest a lung tumour origin of the MPE TCs.
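The two-threshold selection described above can be illustrated with a minimal sketch. It assumes the differential expression results are already available as (gene, log2FC, p-adj) records and that the fold-change cutoff is applied to the absolute value, which the text does not state explicitly. The names `DEResult` and `significant_genes`, and the toy values, are hypothetical and are not taken from the study.

```python
from typing import Iterable, List, NamedTuple, Tuple


class DEResult(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log2fc: float
    padj: float


def significant_genes(
    results: Iterable[DEResult],
    padj_cutoff: float = 0.05,
    log2fc_cutoff: float = 1.5,
) -> Tuple[List[str], List[str]]:
    """Split genes passing both cutoffs into up- and down-regulated lists.

    Assumption: the fold-change cutoff applies to |log2FC|.
    """
    up, down = [], []
    for r in results:
        if r.padj < padj_cutoff and abs(r.log2fc) > log2fc_cutoff:
            (up if r.log2fc > 0 else down).append(r.gene)
    return up, down


# Toy values for illustration only; not data from the study.
toy = [
    DEResult("GENE_A", 2.3, 0.001),   # passes, upregulated
    DEResult("GENE_B", -2.0, 0.01),   # passes, downregulated
    DEResult("GENE_C", 3.0, 0.2),     # fails the p-adj cutoff
    DEResult("GENE_D", 0.8, 0.0001),  # fails the fold-change cutoff
]
up_genes, down_genes = significant_genes(toy)
```

Requiring both thresholds keeps genes that are statistically significant and also show a sizeable expression difference.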
We sought to identify differentially expressed genes between EPCAM-positive and EPCAM-negative TCs. Sixty-one genes were significantly differentially regulated in EPCAM-positive TCs versus EPCAM-negative TCs (p-adj < 0.05 and log2 fold-change [log2FC] > 1.5; Figure 2A and Table S5). Epithelial cell transcripts MUC1, KRT7, CEACAM6 and NAPSA were significantly enriched in EPCAM-positive TCs versus EPCAM-negative TCs (Figure 2A) and expressed in the majority (62%-75%) of EPCAM-positive TCs (Figure 2C). Importantly, KRT7, CEACAM6 and NAPSA are expressed in only 11%-30% of EPCAM-negative TCs, implying that routine pathological analysis of NSCLC samples with these markers may inadvertently overlook a large number of NSCLC cells undergoing EMT. Extracellular matrix (ECM) genes COL1A1, COL1A2, COL3A1 and SPARC were significantly enriched in EPCAM-negative TCs (expressed in 52%-65%; Figure 2A,C), whereas only 3%-28% of EPCAM-positive TCs expressed these ECM genes. GO analysis of genes

FIGURE 1 Isolation and characterization of pleural effusion tumour cells (TCs) and WBCs by single-cell RNA sequencing. (A) Representative scatter plots demonstrating the flow cytometric gating strategy for the detection of TCs in the pleural effusion sample from patient UPENN-9. 1468 [...]

[...] (Figure 2B).
FIGURE 3 Single-cell analysis of EMT in MPE TCs. (A) Box plots of EMT scores for single MPE TCs, calculated for each patient. The EMT score was calculated as the sum of the log2 Z scores of six established mesenchymal genes (AGER, FN1, MMP2, SNAI2, VIM, ZEB2) minus the sum of the log2 Z scores of six established epithelial genes (CDH1, CDH3, CLDN4, EPCAM, MAL2 and ST14). (B) Percentage of EPCAM-positive and EPCAM-negative TCs for each patient. The total number of TCs for each patient is shown below the patient number. (C) Linear regression was performed between EMT score and EPCAM protein expression for each MPE TC. A statistically significant negative correlation was observed between the two variables. (D) Violin plot of the EMT score for EPCAM-negative and EPCAM-positive TCs. Dashed lines represent quartiles and the solid line denotes the median score. A paired t-test was used to assess significance (p-value < 0.0001).

We assessed the expression of a curated list of additional ECM, EMT and tumour-specific genes to investigate single-cell heterogeneity among TCs (Figure 2D). The majority of TCs expressed KRT8, KRT18, KRT19 and the mesenchymal gene VIM, with considerable heterogeneity in the expression of other epithelial and ECM genes. Next, we constructed Z scores to assess the relationship between the expression of epithelial, keratin and ECM genes. Epithelial (sum of the log2 Z scores of 11 epithelial genes), ECM (sum of the log2 Z scores of seven ECM genes) and keratin (sum of the log2 Z scores of three keratin genes) Z scores were calculated (genes listed in the figure legend). Scatter plot analysis verified that the expression of ECM and epithelial genes is largely mutually exclusive (Figure 2E). In contrast, EPCAM-negative TCs with a high ECM Z score have a wide range of keratin expression (Figure 2F).
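The signature Z scores and the EMT score described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: "log2 Z score" is interpreted here as the per-gene Z score of log2-normalized expression computed across cells, and the population standard deviation is used. The function names and the input layout (a dictionary mapping gene to per-cell log2 expression) are hypothetical; the gene lists are those given in the Figure 3 legend.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence

# Gene lists as given in the Figure 3 legend.
MESENCHYMAL = ("AGER", "FN1", "MMP2", "SNAI2", "VIM", "ZEB2")
EPITHELIAL = ("CDH1", "CDH3", "CLDN4", "EPCAM", "MAL2", "ST14")


def z_scores(values: Sequence[float]) -> List[float]:
    """Z-score one gene across cells; a constant gene contributes zeros."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def signature_score(expr: Dict[str, Sequence[float]],
                    genes: Sequence[str]) -> List[float]:
    """Per-cell sum of Z scores over a gene set (assumed log2 input)."""
    z = [z_scores(expr[g]) for g in genes]
    n_cells = len(z[0])
    return [sum(gene_z[i] for gene_z in z) for i in range(n_cells)]


def emt_scores(expr: Dict[str, Sequence[float]]) -> List[float]:
    """Mesenchymal signature minus epithelial signature, per cell."""
    mes = signature_score(expr, MESENCHYMAL)
    epi = signature_score(expr, EPITHELIAL)
    return [m - e for m, e in zip(mes, epi)]


# Toy input: cell 0 is mesenchymal-like, cell 1 is epithelial-like.
toy = {g: [2.0, 0.0] for g in MESENCHYMAL}
toy.update({g: [0.0, 2.0] for g in EPITHELIAL})
scores = emt_scores(toy)  # positive for cell 0, negative for cell 1
```

The same `signature_score` helper would also cover the epithelial, ECM and keratin Z scores, given the corresponding gene lists. Under this convention, a positive score indicates a more mesenchymal cell and a negative score a more epithelial one.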
Single-cell heterogeneity within each patient sample was assessed by intracluster correlation coefficients (ICC score) using a curated gene set (Table S9). Lower ICC scores reflect higher heterogeneity. Eight of nine samples had high heterogeneity (ICC score range 0.012-0.261) and one sample (UPENN-7) had low heterogeneity (ICC score 0.663) (Table S10). Thus, considerable single-cell heterogeneity exists within patients.
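To illustrate how an ICC can quantify within-sample heterogeneity, the sketch below computes a one-way random-effects ICC. The estimator used in the study is not specified in the text, so this choice is an assumption, as is the layout in which each row is a gene from the curated set and each column is a cell from one patient sample. The function name `icc_oneway` is hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def icc_oneway(matrix: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC(1,1).

    Assumed layout: rows are genes (groups), columns are cells.
    Values near 1 mean cells agree on the expression pattern (low
    heterogeneity); values near or below 0 mean high heterogeneity.
    """
    n = len(matrix)
    if n < 2:
        raise ValueError("need at least two rows (genes)")
    k = len(matrix[0])
    if k < 2 or any(len(row) != k for row in matrix):
        raise ValueError("need a rectangular matrix with >= 2 columns")

    grand = mean([v for row in matrix for v in row])
    row_means: List[float] = [mean(row) for row in matrix]

    # Between-gene and within-gene mean squares from one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum(
        (v - m) ** 2 for row, m in zip(matrix, row_means) for v in row
    ) / (n * (k - 1))

    denom = msb + (k - 1) * msw
    return (msb - msw) / denom if denom else 0.0


# Toy matrices for illustration only; not data from the study.
homogeneous = [[1.0, 1.0, 1.0], [5.0, 5.0, 5.0], [9.0, 9.0, 9.0]]
heterogeneous = [[1.0, 5.0, 9.0], [5.0, 9.0, 1.0], [9.0, 1.0, 5.0]]
icc_hom = icc_oneway(homogeneous)
icc_het = icc_oneway(heterogeneous)
```

In the first toy matrix every cell shows the same profile, so the ICC is at its maximum. In the second, cells disagree on every gene and the ICC falls below zero, consistent with lower ICC scores reflecting higher heterogeneity.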
FIGURE 2 (legend fragment) An epithelial Z score was calculated as the sum of the log2 Z scores of 11 epithelial genes (CEACAM6, NAPSA, CDH1, CDH3, CLDN4, CLDN3, CLDN7, EPCAM, ST14, MAL2 and MUC1), an ECM Z score was calculated as the sum of the log2 Z scores of seven ECM genes (SPARC, DCN, MMP2, MMP3, COL1A1, COL1A2 and COL3A1) and a keratin Z score was calculated as the sum of the log2 Z scores of three keratin genes (KRT18, KRT19 and KRT8). The scale bar of the heatmap refers to log2 normalized UMI counts.

Previously, we demonstrated that an EMT score calculated from RNA sequencing of bulk NSCLC tissue was significantly lower (more epithelial) in patients who respond to immunotherapy versus non-responders.9 We sought to demonstrate the feasibility of measuring an EMT score from MPEs. The median single-cell EMT score ranged from 4.61 for UPENN-1 to -1.43 for UPENN-5A, with considerable intra-patient heterogeneity between the minimum and maximum single-cell EMT scores (Figure 3A). All patients with a high EMT score ([...] 7, 4, 2 and 9) had a high proportion of EPCAM-negative TCs (range 76%-100%). In contrast, all patients with a low EMT score had a low proportion of EPCAM-negative TCs (range 26%-46%; Figure 3B). A similar inverse relationship between EMT score and EPCAM protein expression was detected at the single-cell level in MPE TCs (correlation -0.322, p-value 3.96e-13; Figure 3C,D and Figure S3). A paired t-test revealed a significant difference between the EMT scores of EPCAM-positive and EPCAM-negative TCs (p-value < 0.0001; Figure 3D).
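The single-cell relationship between EMT score and EPCAM protein expression can be illustrated with a correlation coefficient. The sketch below computes Pearson's r, which is an assumption, since the text reports a correlation from a linear regression without naming the coefficient. The function name `pearson_r` and the toy values are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy values: EPCAM signal falls as the EMT score rises.
emt_score = [1.0, 2.0, 3.0, 4.0]
epcam_signal = [8.0, 6.0, 4.0, 2.0]
r = pearson_r(emt_score, epcam_signal)  # negative for an inverse relationship
```

A negative coefficient corresponds to the inverse relationship reported above. The toy data are perfectly linear, so they give a far stronger correlation than would be expected from heterogeneous single-cell measurements.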

CONCLUSION
Thus, through single-cell transcriptional analysis, we show that the majority of MPE TCs did not express EPCAM and likely escaped detection in previous studies. The unbiased analysis of TCs allowed the identification of transcriptional differences between EPCAM-positive and EPCAM-negative TCs and uncovered significant intra-patient heterogeneity in gene expression and EMT score. We establish the feasibility of an MPE liquid biopsy assay with potential future diagnostic value in NSCLC patients.

ACKNOWLEDGEMENTS
Cell sorting was performed in the Abramson Cancer Center Flow Cytometry and Cell Sorting Shared Resource Laboratory and partially supported by BD Biosciences. This work was also supported, in part, by the National Cancer Institute at the National Institutes of Health (R01 CA207643 and CA234225), V Foundation (T2017-009), and the LUNGevity Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.