Transcriptomic profiling reveals three molecular phenotypes of adenocarcinoma at the gastroesophageal junction

Cancers occurring at the gastroesophageal junction (GEJ) are classified as predominantly esophageal or gastric, which is often difficult to decipher. We hypothesized that the transcriptomic profile might reveal molecular subgroups which could help to define the tumor origin and behavior beyond anatomical location. The gene expression profiles of 107 treatment‐naïve, intestinal type, gastroesophageal adenocarcinomas were assessed by the Illumina‐HTv4.0 beadchip. Differential gene expression (limma), unsupervised subgroup assignment (mclust) and pathway analysis (gage) were undertaken in R statistical computing and results were related to demographic and clinical parameters. Unsupervised assignment of the gene expression profiles revealed three distinct molecular subgroups, which were not associated with anatomical location, tumor stage or grade (p > 0.05). Group 1 was enriched for pathways involved in cell turnover, Group 2 was enriched for metabolic processes and Group 3 for immune‐response pathways. Patients in group 1 showed the worst overall survival (p = 0.019). Key genes for the three subtypes were confirmed by immunohistochemistry. The newly defined intrinsic subtypes were analyzed in four independent datasets of gastric and esophageal adenocarcinomas with transcriptomic data available (RNAseq data: OCCAMS cohort, n = 158; gene expression arrays: Belfast, n = 63; Singapore, n = 191; Asian Cancer Research Group, n = 300). The subgroups were represented in the independent cohorts and pooled analysis confirmed the prognostic effect of the new subtypes. In conclusion, adenocarcinomas at the GEJ comprise three distinct molecular phenotypes which do not reflect anatomical location but rather inform our understanding of the key pathways expressed.


Introduction
Incidence of tumors at the gastroesophageal junction (GEJ) has increased rapidly over the past 50 years. 1 Current clinical classification systems for these tumors are primarily based on the location of the main tumor mass and do not consider tumor biology. 2 These systems have been developed to facilitate the decision making for the optimal surgical approach, which was historically the mainstay of treatment. With newly emerging systemic treatment options and multimodal therapy concepts being more dominant in curative treatment approaches, understanding of the biological processes that define different tumor subtypes is becoming increasingly important.
According to current knowledge, cancers in the distal part of the GEJ (Siewert Type 3) are more likely to arise from the proximal stomach. [3][4][5][6][7] Proximal GEJ tumors (Siewert Type 1), on the other hand, are most likely of esophageal origin. 5 It remains unclear whether tumors originating directly from the GEJ (Siewert Type 2) comprise a mixed group of esophageal and gastric cancers or constitute a separate entity with distinct biological behavior. A recent study from The Cancer Genome Atlas (TCGA) consortium compared the genomic, epigenetic and transcript profiles of approximately 550 esophageal and gastric cancers. 8 Interestingly, the authors concluded that esophageal, junctional and gastric adenocarcinomas are generally of a similar nature, with the majority of junctional cancers belonging to the chromosomal instability (CIN) subtype that has been described in their previous cohort of gastric cancers. 9 CIN tumors were mainly intestinal-type cancers according to the Laurén classification, as is expected for junctional cancers. 3 Previous studies comparing junctional cancers to "true gastric" adenocarcinomas have often included diffuse-type gastric tumors in the analyses, introducing a bias due to the different cancer biology and the distinct genomic profile of these tumors compared to intestinal-type cancers. 9 Previous molecular classifications were mainly based on genomic features, which do not necessarily reflect the active gene transcription landscape.
The primary aim of our study was to define adenocarcinomas at the GEJ according to their transcriptomic profile. Cases were very carefully selected to ensure that we had precise information on the location of the tumor in relation to the GEJ coupled with other clinical annotation. Since the Siewert classification is the current gold standard for clinical stratification of these tumors, we ensured that we had this information on each case in order to compare it with the molecular subtypes obtained. 10 We also performed a pathway analysis of key expressed genes from each subgroup to further define the biological features of these subgroups and performed immunohistochemistry for selected genes to check expression at the protein level. The findings were confirmed in transcriptomic data from four independent datasets with clinical outcome data.

Study cohorts
All tissue samples were chemotherapy and radiotherapy-naïve and prospectively collected either (i) as part of the Oesophageal Cancer Clinical and Molecular Stratification (OCCAMS) study consortium, coordinated by the University of Cambridge, United Kingdom, (ii) at the local tissue bank at Addenbrooke's Hospital, Cambridge University Hospitals (local ethics reference 10/H0305/1), or (iii) at the University of Magdeburg, Germany, Department of Gastroenterology, Hepatology & Infectious Diseases (local ethics references 132/01 and 34/08), before being retrospectively assessed for inclusion in our study. All patients gave written informed consent to tissue archiving and further analyses. The study was conducted in accordance with the Declaration of Helsinki. Tissue samples were obtained either during diagnostic endoscopy or surgical resection of the tumor. Diffuse-type cancers and tumors with mixed pathology were excluded for the reasons explained in the Introduction.
A total of 84 patients with intestinal type adenocarcinoma at the GEJ as defined by Siewert and Stein in 1998 (35 GEJ1: main tumor mass 1-5 cm proximal to the junction, 31 GEJ2: 1 cm proximal to 2 cm distal to the junction, 18 GEJ3: 2-5 cm distal to the junction 10 ) were included in two batches. For comparison, 23 nonjunctional gastric cancers (8 antrum, 15 gastric body) were included, as well as 11 mucosal biopsies from four noncancer controls (4 duodenum, 3 gastric body, 4 gastric cardia; local ethics reference LREC 01/149). Samples with histological evidence of squamous contamination as indicated by clear enrichment of genes associated with squamous differentiation were removed (n = 23) from the core analysis, leaving n = 61 GEJ cancers. Refer to Supporting Information Figure S1 for further details on the cohort selection process.
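The Siewert location bands quoted above can be written as a small lookup, which makes the boundaries explicit. This is only an illustrative sketch: the signed-offset convention (negative for proximal, positive for distal) and the handling of the shared boundary values are assumptions, and `siewert_type` is a hypothetical helper name, not part of the study's methods.

```python
from typing import Optional

def siewert_type(offset_cm: float) -> Optional[str]:
    """Map the position of the main tumor mass to a Siewert type.

    offset_cm is the distance from the junction in cm; negative values are
    proximal (esophageal side), positive values distal (gastric side).
    This sign convention and the tie-breaking at -1 and 2 cm are assumptions.
    """
    if -5.0 <= offset_cm < -1.0:
        return "GEJ1"  # 1-5 cm proximal to the junction
    if -1.0 <= offset_cm <= 2.0:
        return "GEJ2"  # 1 cm proximal to 2 cm distal
    if 2.0 < offset_cm <= 5.0:
        return "GEJ3"  # 2-5 cm distal to the junction
    return None        # outside the junctional region

# Hypothetical example offsets, not patient data
labels = [siewert_type(x) for x in (-3.0, 0.5, 4.0, 7.0)]
print(labels)
```

A tumor centred more than 5 cm from the junction returns `None` here, which corresponds to the nonjunctional cases used for comparison.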
Four independent cohorts were used for validation purposes. The OCCAMS RNASeq cohort comprised 158 esophageal and GEJ adenocarcinomas. The "BELFAST" cohort included transcriptomic data from an additional 63 esophageal adenocarcinomas based on a modified Affymetrix expression array. The "SINGAPORE" cohort comprised 191 gastric adenocarcinomas, 11 and the "ACRG" (Asian Cancer Research Group) cohort comprised 300 true gastric cancers of Asian origin for comparison 12 (see further details below).
What's new? Adenocarcinomas that arise at the junction between the esophagus and the stomach are currently classified based on location. Here, the authors looked at patterns of gene expression of these cancers. They found that gastro-esophageal junction adenocarcinomas can be sorted into three biological subtypes, independent of location, based on gene expression. Group 1 cancers have boosted stomach-specific genes that combat the effects of acid reflux. Group 2 tumors express genes characteristic of the intestinal tract, and the genes active in Group 3 relate to inflammation. These differences in biological pathway expression could be used to improve treatment.

RNA and DNA extraction
Snap-frozen tissue samples and matched blood, as a germline reference, were utilized. One section of the sample was stained with hematoxylin and eosin (H&E) and sent for cellularity review (≥70% tumor cellularity required for cancer samples) by at least two expert pathologists. Careful macrodissection and microdissection were performed to maintain this cellularity threshold. RNA/DNA extraction was performed using the AllPrep kit (Qiagen, Hilden, Germany) and the QIAamp DNA Blood Maxi kit (Qiagen, Hilden, Germany). RNA with an RNA integrity number (RIN) >7.0 was used for cDNA preparation (this applied to material extracted from both biopsies and surgical resection specimens). Gene expression analysis was carried out on the Illumina HT12 version 4.0 beadchip.

Whole-genome sequencing analysis
For 41 GEJ cases, whole-genome sequencing (WGS) data was generated with 50× coverage for the cancer samples and 30× for germline reference samples as part of the International Cancer Genome Consortium (ICGC). Somatic mutations and indels were called using Strelka 1.0.13. 13 Copy numbers were called using ASCAT-NGS v2.1 14 with the read counts at germline heterozygous positions as input for ASCAT being obtained using GATK 3.2-2. Mutational signatures were identified using the methodology described by Alexandrov et al. 15 To assess the alterations in DNA damage-related pathways in our cohort, we performed an analysis similar to the one described by Pearl et al. 16 Refer to the Supporting Information Methods for further details on the genomic analysis.

RNA sequencing
For the OCCAMS validation cohort of 158 samples, transcriptome data was generated by RNA sequencing. Libraries were prepared using the Illumina TruSeq Stranded Total RNA Library Prep Kit and 75 bp paired-end sequencing was performed using the HiSeq 4000 System. RNA-seq data were aligned to the GRCh37_g1k reference genome using TopHat2. Aligned primary reads were then counted and normalized for gene length and sequencing depth. Log transformation of the expression data was performed as an additional normalization step before final analysis. Downstream analysis (see below) highlighted four outlier samples with an extreme distribution of the gene expression pattern; these were removed, leaving 154 samples for further validation.
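To make the normalization step concrete, the sketch below scales a read count by gene length and sequencing depth and then log-transforms it. The text does not name the exact unit or pseudocount, so the FPKM-style formula, the log2(x + 1) transform, and the gene names and numbers are all assumptions chosen for illustration.

```python
import math

def fpkm(count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Fragments per kilobase of transcript per million mapped reads.

    Assumed normalization unit; the source only states that counts were
    normalized for gene length and sequencing depth.
    """
    return count * 1e9 / (gene_length_bp * total_mapped_reads)

def log2_expr(value: float, pseudocount: float = 1.0) -> float:
    """Log-transform with a pseudocount so that zero counts stay defined."""
    return math.log2(value + pseudocount)

# Hypothetical genes with equal raw counts but different lengths
counts = {"GENE_A": 500, "GENE_B": 500}
lengths_bp = {"GENE_A": 1000, "GENE_B": 4000}
total_reads = 20_000_000

norm = {g: fpkm(counts[g], lengths_bp[g], total_reads) for g in counts}
logged = {g: log2_expr(v) for g, v in norm.items()}
print(norm, logged)
```

With equal raw counts, the shorter gene ends up with the higher normalized value, which is the point of the length correction.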
For all markers, a semiquantitative analysis of the cytoplasmic staining was performed according to the modified immunoreactivity score by Remmele and Stegner, multiplying the intensity of the cytoplasmic staining (0: absent-3: strong signal) by the proportion of stained tumor cells (0: none-10: 100%). 17
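As a worked illustration of the scoring rule just described, the score is the product of an intensity grade (0 to 3) and a proportion grade (0 to 10), giving a range of 0 to 30. The function name and the input validation are assumptions added for the example.

```python
def immunoreactivity_score(intensity: int, proportion: int) -> int:
    """Modified immunoreactivity score: intensity (0-3) x proportion (0-10).

    Hypothetical helper; range checks are an added safeguard, not part of
    the published scoring description.
    """
    if not 0 <= intensity <= 3:
        raise ValueError("intensity must be between 0 and 3")
    if not 0 <= proportion <= 10:
        raise ValueError("proportion must be between 0 and 10")
    return intensity * proportion

# Strong staining in all tumor cells gives the maximum score
max_score = immunoreactivity_score(3, 10)
# Moderate staining in about half of the tumor cells
mid_score = immunoreactivity_score(2, 5)
print(max_score, mid_score)
```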

Transcriptomic data analysis
All transcriptome data analyses were performed in the R statistical computing environment using Bioconductor 18 packages. All differential gene expression analyses were performed using limma in R. 19 p Values for limma-based differential gene expression analyses were adjusted for multiple comparisons and represent false discovery rates (FDRs) for the respective tests. All unbiased group assignment was performed using mclust in R. 20 Refer to Supporting Information Methods for further details.
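The FDR adjustment can be illustrated with the Benjamini-Hochberg procedure. The text does not name the adjustment method, so Benjamini-Hochberg is an assumption here, and the p values below are invented for the example. The analysis itself was run in R; this Python sketch only shows the arithmetic.

```python
from typing import List

def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p values for five genes
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(raw)
print(adj)
```

Note that the third and fourth genes receive the same adjusted value, because the procedure never lets a smaller raw p value end up with a larger adjusted one.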
The primary data sets for our study can be accessed as GSE96669. Publicly available datasets were used for validation: "BELFAST" dataset (E-MTAB-4666), the "SINGAPORE" cohort (GSE15459 11 ) and the data of the Asian Cancer Research Group ("ACRG"; GSE66229 12 ). Further datasets included were from colorectal (GSE38832), breast (GSE58812) and lung cancer samples (GSE31210).

Comparison of the transcript profile of GEJ adenocarcinoma
Sixty-one junctional adenocarcinomas across all three Siewert types (GEJ1: 26, GEJ2: 22, GEJ3: 13) were included in the core analysis. There was no significant difference in clinical parameters between the Siewert types, apart from an expected higher proportion of Barrett's esophagus in patients with GEJ1 cancers. Patients underwent standard clinical treatment pathways according to their stage (Fig. 1a) and there was no significant difference in median survival between GEJ1 (22.2 m), GEJ2 (25.9 m) and GEJ3 (29.9 m) tumors (p = 0.251; Fig. 1b).
Differential gene expression analysis between tumors of different Siewert types using limma 19 revealed that REC8 (REC8 Meiotic Recombination Protein) was the only gene with differential expression when comparing between GEJ1 and GEJ3 tumors (FDR: p = 0.004), and SESN1 (Sestrin-1) between GEJ2 and GEJ3 tumors (FDR: p = 0.024). There were no differentially expressed genes below the threshold of p = 0.01 when GEJ1 tumors were compared to GEJ2 cancers (Figs. 1c and 1d; Supporting Information Table S1). Furthermore, there were no differentially expressed genes between junctional and nonjunctional cancers (Supporting Information  Table S1). When the first two principal components of the transcript profile of these 84 gastroesophageal cancers were displayed, a random distribution was observed according to the anatomical location (Fig. 2a).
Next, we applied an unbiased approach to identify molecularly intrinsic cancer subtypes. Using the mclust algorithm, 20 an optimal solution of three distinct subgroups for the core cohort of 61 GEJ cancers emerged (Supporting Information Fig. 2a). Patients were thus assigned by mclust to three subgroups and a group-by-group differential gene expression analysis was performed to identify genes defining each subtype (Supporting Information Table S2). Of these, 82 genes with a p-score <0.0001 (Supporting Information Methods) were considered as candidates for discrimination between the new subtypes. Since location had no impact on the analysis of differentially expressed genes between tumors of different Siewert types and junctional vs. nonjunctional cancers, we also performed a combined analysis with the 23 nonjunctional gastric tumors which resulted in a similar three group distribution (Fig. 2a, Supporting Information Fig. S2b). Of the genes mentioned above, 67 genes (82%) were also represented in this parallel analysis which were then selected for further validation (Fig. 2b, Supporting Information Methods).
Since GEJ cancers can express a range of intestinal cell types, we also compared the gene expression profile of the identified subtypes with samples from gastric and duodenal mucosa of patients without cancer. Compared to these noncancer mucosal controls, upregulation of cancer-specific genes was confirmed but no further genes were highlighted (Supporting Information Table S3).
Thirty patients for whom high-quality surgical resection specimens were available were selected for immunohistochemistry to investigate whether the new subtypes could also be confirmed at the protein level. Markers were selected according to the first and second principal component of the gene expression data analysis (Fig. 3).
The immunostaining scores for all markers were as expected for each subgroup (Supporting Information Table S4). For CTSE (p = 0.047) and membranous CLDN18 (p = 0.048), the absolute scores were significantly different between the three subgroups with the highest scores for Group 1. SULF1 (a marker for stromal activation) was more intensely stained in patients of Group 1 and Group 2 (p = 0.004). Presence of IDO1 positive immune cells was highest in Group 3 tumors (90%; p = 0.217) and was associated with IP10 expression in the tumor (p = 0.017).
Figure 1. […] presence of Barrett's esophagus (p < 0.001) and proportion of patients on a curative treatment pathway (p = 0.139) for GEJ Type 1, Type 2 and Type 3 cancers, respectively. There was no statistically significant difference in censored overall survival between cancers of different Siewert type as shown in (b). The boxplots in (c) show the relative expression of genes REC8 and SESN1, which were the only differentially expressed genes in pairwise differential gene expression comparison of GEJ cancers. Panel (d) shows the respective volcano plots for the differential gene expression analyses.

Pathway analysis supports different biological background of the three subtypes
In order to better understand the biological pathways underpinning the new group assignment, gene-set enrichment analysis was performed using gage in R. 21 Based on KEGG terms, the top essential pathways enriched in Group 1 were "Ribosome," "Fatty Acid Metabolism," "Oxidative Phosphorylation" and pathways involved in nucleic acid turnover (both DNA and RNA). Group 2 was characterized by "Steroid Hormone Biosynthesis," "Peroxisome," "Primary Bile Acid Biosynthesis" and terms related to metabolic processes. Essential KEGG pathways enriched in Group 3 were "Antigen Processing and Presentation," "Chemokine Signaling Pathways" and "Natural Killer Cell-Mediated Cytotoxicity," among other immune-response related terms (Table 1; Supporting Information Table S4). These results were in line with a parallel analysis based on gene ontology terms (Supporting Information Table S5).
A complementary Ingenuity® Pathway Analysis (IPA®, QIAGEN Redwood City, www.qiagen.com/ingenuity) showed broadly similar results (Supporting Information Table S6). The expression profile of Group 1 was associated with canonical pathways involved in the degradation of organic substances, with the top regulatory networks being related to fatty acid metabolism. Group 2 showed enrichment for genes involved in retinoic acid receptor activation, bile acid biosynthesis and endothelin signaling. Group 3 was characterized by canonical pathways involved in immune response and cell-cell interaction.
Figure 3. Immunohistochemistry profile of the three subtypes of gastroesophageal adenocarcinoma. The immunohistochemical staining for markers that were ranked highest in the principal component analysis is shown for the respective groups. One representative case for each group is displayed. For some of the markers, distinction was more obvious (e.g., CTSE more strongly expressed in Group 1, and CDH17 more strongly expressed in Group 2), whereas for some markers differences were subtler (e.g., nuclear staining of CDX1 in Group 2 or cytoplasmic staining of IP10 in Group 3). For MUC5AC, cytoplasmic staining and extracellular mucin are assessed; for CTSE and IP10, cytoplasmic staining is typical; for CLDN18 and CDH17, membranous staining; and for CDX1, nuclear staining.

Association of the three subtypes to clinical and genomic parameters
Next, we assessed whether there was any association between the new subtypes and clinical parameters. Of 107 cancers, 28 (26.2%) were assigned to Group 1, 39 (36.4%) to Group 2 and 40 (37.4%) to Group 3. Overall, there was no relevant difference between the groups with regards to clinical or demographic factors (Fig. 4a; Supporting Information Table S7). When only patients with cancer at the GEJ were analyzed, there was a strong association of the presence of Barrett's esophagus with the new subgroups (Group 1: 93.3%, Group 2: 60.7%, Group 3: 40.9%; p = 0.004). Kaplan-Meier analysis revealed a difference in median overall survival between the three subtypes (Group 1: 25.9 m vs. Group 2: 45.2 m vs. Group 3: 83.5 m; p = 0.019; Fig. 4b), although the statistical significance was weaker than for the other known clinical parameters: stage of disease (p < 0.001), T-stage (p < 0.001), nodal involvement (p < 0.001) and presence of distant metastases (p < 0.001).
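For readers unfamiliar with how a median overall survival is read off a Kaplan-Meier curve, the following minimal sketch computes the product-limit estimate and returns the first time point at which estimated survival drops to 50% or below. The follow-up times and event flags are invented for illustration and are not study data; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, survival) at each distinct event time.

    events[i] is 1 for an observed death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all observations that share the same time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored cases both leave the risk set
    return curve

def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which estimated survival is at or below 50%."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached

# Hypothetical follow-up in months; 0 marks a censored patient
months = [5, 8, 12, 15, 20, 25, 30]
died = [1, 1, 0, 1, 1, 1, 0]
curve = kaplan_meier(months, died)
median = median_survival(curve)
print(curve, median)
```

Censored patients reduce the number at risk without lowering the curve, which is why the estimate differs from a simple median of the observed times.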
Cox regression analysis showed that the new tumor subtype was an independent prognostic factor for overall survival with a hazard ratio of 1.506 (95% confidence interval: 1.021-2.222; p = 0.039), along with nodal involvement and distant metastases. There was no difference in the proportion of patients who underwent a curative or a palliative treatment pathway between the groups (Fig. 4c, Supporting Information Table S7).
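The relationship between the reported hazard ratio, its confidence interval and the p value can be checked with a short calculation. The sketch assumes the interval is a Wald interval that is symmetric on the log scale and that the p value comes from a two-sided Wald test; the text does not state which test was used, so this is a plausibility check rather than a reconstruction of the analysis.

```python
import math

# Values reported in the text
hazard_ratio = 1.506
ci_lower, ci_upper = 1.021, 2.222

# Assumption: 95% Wald interval, symmetric around log(HR)
z_95 = 1.959964
log_hr = math.log(hazard_ratio)
standard_error = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_95)

# Two-sided p value from the standard normal distribution
z_score = log_hr / standard_error
p_value = math.erfc(abs(z_score) / math.sqrt(2))

print(round(standard_error, 3), round(z_score, 2), round(p_value, 3))
```

Under these assumptions the recovered p value is close to the reported 0.039, so the three figures are mutually consistent.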
For 41 cases, WGS data were available to compare the genomic properties of the new subtypes. Bearing in mind the heterogeneous nature of genomic alterations in this cancer and the relatively small sample size with WGS available, 22 there was no demonstrable difference between the three subtypes with regards to the overall mutational burden and the profile of copy number aberrations and amplifications or deletions (Supporting Information Fig. 3a). There was enrichment across all groups for mutational signatures 1, 2, 3 and 17 as defined by Alexandrov et al. 15 (Supporting Information Fig. 3b), which was as expected for gastroesophageal adenocarcinomas. 22 Group 3 showed a slightly higher prevalence for alterations in genes involved in DNA damage repair (DDR) pathways (checkpoint factors, chromatin remodeling, Fanconi anemia, telomere maintenance, translesion synthesis; Supporting Information Fig. 3c). In keeping with this, this subgroup also showed a higher proportion of "DDR impaired" positive tumors according to the classification recently published by our group 22 although it did not reach statistical significance due to the relatively small numbers with WGS data available (Supporting Information Figs. 4a and 4b).

Application of new subtype classification in independent cohorts
It is crucial to determine whether these findings are reproducible in other datasets and across other platforms. Four further datasets were available for analysis. These were not necessarily focused on junctional tumors but demonstrate the broad applicability of these molecular subgroups to esophageal and gastric adenocarcinomas independent of their anatomical location. While the OCCAMS dataset was generated based on RNA-sequencing, the BELFAST, SINGAPORE and ACRG datasets were generated on Affymetrix platforms. The 67-gene panel was applied to all four validation cohorts (Supporting Information Methods) for subtype assignment.
The 154 samples of the OCCAMS cohort recapitulated a three-group solution as expected (Group 1: n = 51, Group 2: n = 77, Group 3: n = 26), which was also the case for the 63 esophageal adenocarcinomas of the BELFAST cohort (Group 1: n = 26, Group 2: n = 15, Group 3: n = 22). The 191 gastric adenocarcinomas from the SINGAPORE cohort 11 (tumors of unclear histological subtype were excluded), could also be classified into the three groups (Group 1: n = 78, Group 2: n = 66, Group 3: n = 47); and subtype assignment was also consistent for the 300 gastric cancers of the Asian Cancer Research Group (ACRG) 12 (Group 1: n = 85, Group 2: n = 108, Group 3: n = 107; Fig. 5a). For the latter two Asian validation cohorts, our classification showed statistical overlap (p < 0.001) with the subtypes that have been previously proposed by Lei et al. 11 and Cristescu et al., 12 but the distribution of the subtypes within the cohorts suggested a distinct classification (Fig. 5b). Ethnic origin did not influence the results since there was no difference in the subtype distribution between Western (OCCAMS, BELFAST and primary study cohort) and Asian (SINGAPORE, ACRG) patients (p = 0.967). This was also the case when cohorts with predominantly esophageal cancers were compared to gastric tumor cohorts (p = 0.351).
A pooled analysis of all 815 cases across all five cohorts (including our primary study cohort) showed significantly different median overall survival, with Group 1 showing the worst and Group 3 the best prognosis (p = 0.001). As in our primary cohort, grade of differentiation (p < 0.001), UICC stage (p < 0.001), nodal involvement (p < 0.001) and distant metastases (p < 0.001) were also influencing factors. Cox regression analysis including stage, grading and the new subtypes as factors confirmed both stage of disease (p < 0.001) and the new subtypes (p = 0.002) as independent prognostic factors, whereas grading was not confirmed (p = 0.169). In the individual validation cohorts, a moderate statistical difference in outcome could be seen in the BELFAST (p = 0.038) and the SINGAPORE (p = 0.007) cohorts, but not in the ACRG (p = 0.075) and OCCAMS (p = 0.796) datasets (Fig. 5d).
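The pooled case number can be cross-checked against the per-cohort subtype counts reported in the text. The short tally below uses only those reported counts; the pooled percentages it prints are derived here for orientation and are not figures stated by the authors.

```python
# Subtype counts (Group 1, Group 2, Group 3) as reported in the text
cohorts = {
    "primary": (28, 39, 40),
    "OCCAMS": (51, 77, 26),
    "BELFAST": (26, 15, 22),
    "SINGAPORE": (78, 66, 47),
    "ACRG": (85, 108, 107),
}

# Size of each cohort and pooled count per subtype
cohort_sizes = {name: sum(groups) for name, groups in cohorts.items()}
pooled = tuple(sum(groups[i] for groups in cohorts.values()) for i in range(3))
total = sum(pooled)

# Derived pooled share of each subtype, in percent
shares = [round(100 * n / total, 1) for n in pooled]

print(cohort_sizes, pooled, total, shares)
```

The cohort sizes reproduce 107, 154, 63, 191 and 300, and the total matches the 815 cases of the pooled analysis.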
To check whether the findings were cancer-type specific, we applied our gene panel to datasets from other tumor entities (colorectal, lung and breast). Interestingly, these also clustered into three groups, suggesting that there may be some modules common across multiple cancer types, but the groups did not show differences in survival (Supporting Information Fig. 5).

Discussion
These data confirm that the biological properties of adenocarcinomas at the GEJ are independent of the anatomical location of the main tumor mass. Adenocarcinomas at the GEJ and nonjunctional gastric cancers of the intestinal type can be stratified into three biologically distinct subtypes based on their gene expression profile.
The pathway analysis gives some insight into the biological basis for each tumor subtype. Group 1 shows features which appear to be in keeping with mucosal damage by reflux components including enrichment of stomach-specific genes, particularly CLDN18 which is upregulated under reflux conditions to increase mucosal resistance to acid 23 and MUC5AC which is upregulated in response to bile exposure. 24 In addition, the metabolic processes enhanced in this group indicate a possible interaction with visceral adipocytes. Adipose tissue can constitute a proinflammatory microenvironment in obese patients, leading to stromal activation which is associated with more aggressive tumor behavior and poor prognosis. [25][26][27][28][29] Negative regulators of adipogenesis like BMP and activin membrane-bound inhibitor (BAMBI) or transglutaminase 2 (TGM2) showed the lowest expression in Group 1 (Supporting Information Fig. S6). 30,31 Group 2 is characterized by metabolic pathways which are usually active in the intestinal and hepatobiliary tract. Expression of the intestinal transcription factor CDX2 can also be induced by exposure to bile acids, mediated by the farnesoid X receptor. 32 The intestinal properties of Group 2 are further supported by expression of Achaete-scute family bHLH transcription factor 2 (ASCL2), an intestinal stemness marker (Supporting Information Fig. S6).
Group 3 is linked to inflammatory response regulation showing a threefold to fivefold higher expression of CD8A (T-cell marker CD8) and GZMB (granzyme B, marker of cytotoxic activity) compared to the other groups (Supporting Information Fig. S6). Gastric cancers with a high ratio of tumor-infiltrating lymphocytes show a better prognosis and are associated with impairment in mismatch repair pathways. 33 DDR impairment can also be associated with chronic infection with H. pylori, 34 and is a feature of a subtype of esophageal adenocarcinomas with a higher mutational and neo-antigen burden. 22 While the small subcohort for which WGS data were analyzed showed a trend toward a higher proportion of "DDR impaired" tumors 22 in Group 3, this association was not confirmed in the OCCAMS validation cohort. In this cohort, Group 2 tumors showed a higher proportion of the "DDR impaired" genome signature type. It is of note that there is some overlap between the dominant genes for Group 2 and Group 3 (Fig. 2). It requires further elucidation in larger cohorts to determine whether our transcriptome-based classification is linked to genome-based subtypes.
We also assessed the association of our new subgroups to MSI status using data from the OCCAMS cohort for which WGS data was available. MSI status was classified as microsatellite stable (MSS) or MSI-low/high (MSI-L/H) as described before. 22 While 91.4% of patients were classified as MSS, 8.6% were MSI-L/H and there was no association of MSI status to the new subgroups (p = 0.361). The low prevalence of MSI positive cases is in keeping with previous reports for this disease. 9 However, we also compared MLH1 status that was provided for the ACRG cohort with the new subtype classification. Of 300 cases, 23.1% were MLH1-negative, indicating MSI-H status. This was more often seen in Group 3 (32.9%) when compared to Group 1 (18.7%) and Group 2 (14.8%; p = 0.007). Although there is some overlap, MSI status affects only about a third of patients in Group 3 and is therefore unlikely to be a dominant discriminating factor for our classification.
Interestingly, there is a strong association between the new subgroups and presence of Barrett's esophagus. When only patients with junctional cancers were analyzed, there was a dominance of Barrett's positive cases in the subgroups with stromal enhancement and worse prognosis. These data need to be interpreted with care due to the limited numbers in our study and the incomplete data regarding prevalence of Barrett's esophagus. The significantly higher prevalence of Barrett's esophagus is in line with the results of the pathway analysis, which suggest a relevant role for bile exposure as well as visceral adipocytes (as seen in obesity), both of which are also risk factors for Barrett's metaplasia and its progression.
Figure 5. Comparison of subtype distribution and survival in independent cohorts. Panel (a) shows the distribution of each subtype in our primary cohort and across the four validation cohorts (please see main text for further details). We also compared the group stratification as originally published for the SINGAPORE and ACRG cohorts (b). On the left we show the distribution of the originally published subtypes within our new groups for each cohort, on the right the distribution of our newly defined subtypes within each subtype that has previously been published by Lei […]
Our study was not designed to develop a prognostic predictor panel. Explorative analysis of the available clinical data showed a modest prognostic effect, which we interpret as proof-of-principle data supporting the biological relevance of our subtypes rather than as evidence of robust prognostic value when compared to other studies. 35,36 It is encouraging that our classification is also supported by the results from further independent datasets, given that these comprised RNA-Seq data or were generated on Affymetrix-based platforms, whereas we used Illumina. Two of these cohorts comprised mainly cancers from Asian populations, resulting in a different genetic background and different exposure to risk factors when compared to the Western patients of our primary cohort. 11,12 Although there seems to be some overlap between our new subgroups and the previously published classifications, study objectives, methods and design differed from our approach.
Interestingly, Kim and colleagues published data on a cohort of 64 patients with EAC, also demonstrating three subgroups when applying nonsupervised clustering on array-based transcriptome data. 37 They also demonstrated an association of their subgroups with prognosis. The gene list that served as the foundation for the subgroup assignment is not disclosed, so comparison to our groups is limited. Furthermore, the target genes used for subgroup validation were selected based on Cox regression analysis and prognostic relevance, whereas we aimed at selection based on biological dominance in the principal component analysis. Further prospective validation is required to determine whether our markers, or those described previously by others, are useful for clinical application and, if so, in which setting (e.g., as a prognostic marker, for treatment assignment or for individual preneoplastic risk assignment). We acknowledge that the results regarding different prognostic outcome for each group were not consistent across all individual validation datasets, but, most importantly, the three molecular subtypes were confirmed for all four validation cohorts, independent of the ethnic origin of the respective cohorts and the platform used for expression analysis.
The staining results in our cohort further support the transcriptome analysis. Some of the immunohistochemical markers have also been previously tested in malignant and premalignant stages of colorectal and gastroesophageal cancers. 38,39 The combination of CDH17 and CLDN18, for example, has been confirmed as being predictive for nodal involvement and poor prognosis in gastric adenocarcinomas. 40 CLDN18 is a dominant marker in our poor prognosis Group 1 and CDH17 is characteristic for Group 2, which shows intermediate outcome in our primary cohort but poor prognosis in some of the validation cohorts. Some of our target genes have also been reported to be relevant for subtypes of pancreatic and right-sided colorectal cancer, suggesting that similar mechanisms such as exposure to small bowel content (including bile and pancreatic enzymes) might be involved in carcinogenesis of gastroenteropancreatic tumors. 38,41 Similarly, dysregulation of specific transcription factors in Barrett's esophagus has been reported to be comparable to gene signals seen in normal colonic mucosa. 42 Of note is the high expression of SULF1 in the two groups with poorer prognosis, indicating again the relevance of stromal activation as a poor prognostic factor. Saadi et al. demonstrated previously that there is a stage-dependent stromal signature in Barrett's metaplasia, dysplasia and EAC that is associated with prognosis. 43 While the aim of the previous study was the selection of a gene panel with optimal prognostic properties, in the present study we aimed to understand the biological background of the newly identified subtypes. This paves the way for further work to determine clinical significance.
In summary, our data show that the transcriptomic profiles of GEJ tumors reflect distinct molecular subgroups of intestinal type gastro-esophageal adenocarcinomas indicative of cell biological function which is independent of anatomical location. Further understanding the biology of these subtypes will help to refine efforts for individualized targeted treatment as well as strategies for early detection and prevention.
normalization processes of the expression data: Menon S and Eldridge MD. Whole genome sequencing data analysis: Secrier M, Bower L and Eldridge MD. RNA sequencing dataset curation and processing: Devonshire G. Quality control of the clinical data: Bornschein J, Cheah C, and Turkington R. Sample acquisition contribution: Selgrad M and Venerito M. Funding for the study was obtained by Fitzgerald RC, who takes responsibility for the data integrity.