Transcription start site‐level expression of thyroid transcription factor 1 isoforms in lung adenocarcinoma and its clinicopathological significance

Abstract
There are multiple transcription start sites (TSSs) in agreement with multiple transcript variants encoding different isoforms of NKX2‐1/TTF‐1 (thyroid transcription factor 1); however, the clinicopathological significance of each transcript isoform of NKX2‐1/TTF‐1 in lung adenocarcinoma (LAD) is unknown. Herein, TSS‐level expression of NKX2‐1/TTF‐1 isoforms was evaluated in 71 LADs using bioinformatic analysis of cap analysis of gene expression (CAGE)‐sequencing data, which provides genome‐wide expression levels of the 5′‐untranslated regions and the TSSs of different isoforms. Results of CAGE were further validated in 664 LADs using in situ hybridisation. Fourteen of 17 TSSs in NKX2‐1/TTF‐1 (80% of known TSSs in FANTOM5, an atlas of mammalian promoters) were identified in LADs, including TSSs 1–13 and 15; four isoforms of NKX2‐1/TTF‐1 transcripts (NKX2‐1_001, NKX2‐1_002, NKX2‐1_004, and NKX2‐1_005) were expressed in LADs, although NKX2‐1_005 did not contain a homeodomain. Among those, six TSSs regulated NKX2‐1_004 and NKX2‐1_005, both of which contain exon 1. LADs with low expression of isoforms from TSS region 11 regulating exon 1 were significantly associated with poor prognosis in the CAGE data set. In the validation set, 62 tumours (9.3%) showed no expression of NKX2‐1/TTF‐1 exon 1; such tumours were significantly associated with older age, EGFR wild‐type tumours, and poor prognosis. In contrast, 94 tumours, including 22 of 30 pulmonary invasive mucinous adenocarcinomas (IMAs), exhibited exon 1 expression without immunohistochemical TTF‐1 protein expression. Furthermore, compared with EGFR‐mutated LADs, IMAs commonly exhibited higher expression of exon 1 relative to that of exon 4/5, which contains the homeodomain.
These transcriptome and clinicopathological results reveal that LADs use at least 80% of NKX2‐1 TSSs and that lack of expression of NKX2‐1/TTF‐1 transcript isoforms containing exon 1 (NKX2‐1_004 and NKX2‐1_005) defines a distinct subset of LADs characterised by aggressive behaviour in older patients. Moreover, usage of alternative TSS regions regulating NKX2‐1_005 may occur in subsets of LADs.


Introduction
Thyroid transcription factor 1 (TTF-1), also known as NKX2-1, is a member of the highly conserved homeodomain-containing transcription factor family, understood; however, different epithelial regions and compartments in the lung are known to be maintained by distinct resident stem cell populations [4]. Notably, a series of peripheral lung cells defined as the terminal respiratory unit (TRU), in which gas exchange occurs, is under the control of NKX2-1/TTF-1 [5]. Furthermore, approximately 70% of lung adenocarcinomas (LADs) express TTF-1 independent of disease stage and retain features of the TRU to a certain extent [6], strongly suggesting that NKX2-1/TTF-1 is a potential lineage survival oncogene in lung cancer [7]. Currently, TTF-1 is widely used as the most specific marker for LAD diagnosis in routine pathological examination [8], which leads to the detection of actionable alterations, such as EGFR, KRAS, and BRAF mutations; gene fusions involving anaplastic lymphoma kinase (ALK), rearranged during transfection (RET), and proto-oncogene tyrosine-protein kinase ROS (ROS1); and tyrosine-protein kinase Met (MET) exon 14 skipping [9,10]. Despite the discovery of these oncogene mutations, at least 12% of patients with LAD do not possess any of these genetic alterations [10,11], suggesting that other molecular changes likely contribute to lung cancer development.
Epigenomic features do not affect the DNA sequence, but may affect the transcriptional output of genes in a cell-type specific manner by altering the activity of regulatory elements, including promoters, which are located proximal to the transcription start site (TSS) of genes [12]. Alternative splicing is the process by which a single gene may produce many different transcripts that may show a wide range of activities, and is responsible for much of the diversity of the human proteome [12]. TSS determination of NKX2-1/TTF-1 transcription shows multiple TSSs, in agreement with multiple transcript variants encoding different isoforms [5]. In humans, two complementary DNAs (cDNAs) were initially identified that translate into the 42-kDa 'major isoform' and the 44-kDa 'minor protein isoform'. These isoforms were differentially expressed during mouse fetal lung development, with the onset of accumulation of the longer transcript occurring at a later stage than that of the shorter transcript [13,14]. These two transcripts differ in their capacity to activate the promoter of surfactant protein-C, a pulmonary differentiation-specific gene, indicating functional differences [13]. Furthermore, NKX2-1/TTF-1 shows different functions depending on cell conditions, being considered a double-edged sword gene with lineage-dependent tumour cell survival and tumour suppression activities depending on the context [7]. This suggests that each isoform has differential functions in lung carcinogenesis. However, the clinicopathological significance of each promoter and the concordant isoform in LAD remains largely unknown.
In recent studies, an atlas of human cellular states based on regulatory element activities across the genome, such as promoters [15] and enhancers [16], has been built by monitoring transcription initiation activities with cap analysis of gene expression (CAGE) [17]. The method determines 5'-end sequences of messenger RNA (mRNA) using next-generation sequencing, where cDNAs are synthesised from extracted RNA, and cDNAs corresponding to 5'-ends of RNA are selected using the cap-trapper method [18] and sequenced. Obtained reads are aligned with genome sequences and their 5'-ends indicate frequencies of TSSs at single-base resolution [19]. Herein, the ability of this technology to elucidate the role of each TSS and transcript isoform of NKX2-1/TTF-1 in LAD was examined, with special emphasis on its prognostic impact. The clinicopathological significance of NKX2-1/TTF-1 exon 1 expression in a large cohort of Japanese patients was further evaluated using RNAscope, a novel in situ hybridisation assay. We used the NKX2-1 probe that was designed to target exon 1 in NKX2-1_004 (ENST00000518149.5_4). This study expands the understanding of the role of NKX2-1/TTF-1 in LAD.

Study population
The archives of the Department of Human Pathology, Juntendo University School of Medicine, were screened for all patients who had undergone a complete resection of primary LAD from February 2010 to July 2016. Clinicopathological data were obtained, including age, gender, smoking status, tumour size, lymphovascular invasion, lymph node and distant metastases, resection type, adjuvant therapy, and mutation status of EGFR and KRAS. The archives contained data for 1,124 patients with LAD. Of the 1,124 LAD samples of the cohort, 71 cases were assigned to the discovery set used to perform CAGE assay [20], while full-length RNA sequencing (RNA-seq) was also performed in seven cases [9]. Among the remaining 1,053 cases, adenocarcinoma in situ, minimally invasive adenocarcinoma, and lepidic adenocarcinoma were excluded to clarify the prognostic impact of NKX2-1 exon 1 expression. Invasive LADs with intermediate- to high-grade clinical aggressiveness including acinar, papillary, solid, micropapillary, or other invasive adenocarcinomas of a special type [21] were assigned to the validation set. Follow-up was conducted for all patients via regular physical and blood examination, with mandatory X-ray, computed tomography, or magnetic resonance imaging. Informed consent was obtained from all involved patients. The study design was ethically approved by the institutional review board of Juntendo University (Approval No. 2020096).

Bioinformatics analysis of the CAGE data set
CAGE data were obtained from a previous study [20]. In brief, the CAGE reads were aligned to the reference genome (hg19) with a high mapping quality of ≥20. The aligned CAGE reads were counted in each region of the FANTOM5 robust peaks [15], a reference set of TSS regions, as raw signals for promoter activities. Expression levels of individual TSSs were quantified as counts per million (CPM). Inactive TSS regions, with CPM ≤ 1 in more than 77% of samples, were filtered out [22]. Associations among the TSS regions were assessed by Spearman's rank correlation. The distances between the samples in the NKX2-1 TSS regions were calculated as Euclidean distances for CPM, and the average linkage clustering was performed using R (version 3.6.3, https://www.r-project.org/). Based on expression levels, survival analyses of individual TSS regions were performed using the survival package in R (https://cran.r-project.org/web/packages/survival/).
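
The quantification steps described above (CPM scaling, removal of inactive TSS regions, and Spearman's rank correlation) can be sketched as follows. This is a minimal illustration only: the study used R, whereas this sketch uses plain Python, and the function names, the toy read counts, and the choice to take the library size as the sum of the counts supplied are all assumptions made for the example rather than details taken from the original pipeline.

```python
from statistics import mean

def counts_per_million(counts):
    """Scale the raw CAGE read counts of one sample to counts per million (CPM).

    Assumption: the library size is the sum of the counts passed in.
    """
    total = sum(counts.values())
    return {tss: 1e6 * n / total for tss, n in counts.items()}

def active_tss(cpm_by_sample, cpm_cutoff=1.0, max_inactive_fraction=0.77):
    """Drop TSS regions with CPM <= cutoff in more than 77% of samples."""
    samples = list(cpm_by_sample.values())
    kept = []
    for tss in samples[0]:
        inactive = sum(1 for s in samples if s[tss] <= cpm_cutoff)
        if inactive / len(samples) <= max_inactive_fraction:
            kept.append(tss)
    return kept

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical toy counts for five samples (not data from the study).
raw = {
    "s1": {"tss_a": 50, "tss_b": 0, "other": 999950},
    "s2": {"tss_a": 20, "tss_b": 0, "other": 999980},
    "s3": {"tss_a": 5, "tss_b": 3, "other": 999992},
    "s4": {"tss_a": 8, "tss_b": 1, "other": 999991},
    "s5": {"tss_a": 2, "tss_b": 0, "other": 999998},
}
cpm = {name: counts_per_million(c) for name, c in raw.items()}
kept = active_tss(cpm)
```

In this toy input, `tss_b` has CPM ≤ 1 in four of five samples (80%), so it is treated as inactive, whereas `tss_a` is retained.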

Histological and immunohistochemical analyses
All tissues were formalin-fixed (10% formalin) and paraffin-embedded (FFPE) after routine processing. Haematoxylin and eosin (H&E)-stained slides and Elastica van Gieson-stained slides were available for all patient samples. All tumours measuring 3 cm or less in diameter were submitted in their entirety, and larger tumours were sampled extensively. Pathological diagnoses were based on the 2015 World Health Organization classification [23]. For immunohistochemical analyses of TTF-1 (clone 8G7G3/1; DAKO, Glostrup, Denmark), tumours were assembled into tissue microarrays (TMAs), using 1.5-2.0 mm cores sampled from one or two different representative areas of each FFPE tissue block (Pathology Institute Corp., Toyama, Japan), as previously described [24]. TTF-1 was considered positive if 1% or more of tumour cells were stained.

RNAscope assay and image analysis
In situ detection of the NKX2-1 transcript was performed with an RNAscope assay using the RNAscope Duplex Reagent Kit (#322430; Advanced Cell Diagnostics Inc., Newark, CA, USA), according to the manufacturer's instructions. The NKX2-1 probe was designed to target exon 1 in NKX2-1_004 (ENST00000518149.5_4) (Advanced Cell Diagnostics Inc.) (see supplementary material, Table S1). For the RNAscope assay, TMA slides from FFPE tissue blocks were used. RNAscope and immunohistochemistry for TTF-1 were performed on serial sections. The 664 cases of the validation cohort and 33 cases of the discovery cohort were submitted to the RNAscope assay. To ensure result interpretability, a positive control probe (#313901, RNAscope Positive Control Probe-Hs-PPIB; Advanced Cell Diagnostics Inc.) and a negative control probe (#310043, RNAscope Negative Control Probe-Hs-DapB; Advanced Cell Diagnostics Inc.) were used. After staining, TMA slides were scanned using the Nuance Multispectral Imaging System (version 3.0.2; Perkin Elmer Inc., Waltham, MA, USA). inForm Advanced Image Analysis Software (version 2.4.0; Perkin Elmer Inc.) was used for quantitative image analysis. Four random areas (0.09048 mm² each) in each sample were analysed at ×400 total magnification. The data are expressed as optical density (average signal levels per area). An optical density of ≥66.56 was considered positive for expression based on the optical density distribution data in the 664 samples examined in this study (see supplementary material, Figure S1).
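
The positivity call described above can be illustrated with a short sketch. It assumes that the per-sample value is the mean of the optical densities of the four analysed areas, which the text implies but does not state explicitly; the function name and the example values are hypothetical.

```python
from statistics import mean

POSITIVE_CUTOFF = 66.56  # optical density cutoff reported in the text

def exon1_status(area_densities, cutoff=POSITIVE_CUTOFF):
    """Return (sample optical density, is_positive) for one tumour sample.

    Assumption: the sample value is the mean over the analysed areas.
    """
    if not area_densities:
        raise ValueError("at least one analysed area is required")
    sample_density = mean(area_densities)
    return sample_density, sample_density >= cutoff

# Hypothetical optical densities for four random areas of one sample.
density, positive = exon1_status([70.0, 60.0, 80.0, 70.0])
```

Keeping the cutoff as a named parameter makes it straightforward to re-derive the call if a different threshold were chosen from the optical density distribution.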

Quantitative polymerase chain reaction
Quantitative polymerase chain reaction (qPCR) was performed on 20 cases of invasive mucinous adenocarcinoma (IMA) of the lung whose genetic alterations were previously described [25], as well as three EGFR-mutated LAD cases that were immunohistochemically positive for TTF-1, consisting of two cases of papillary adenocarcinoma and one case of acinar adenocarcinoma. RNA was extracted from FFPE tissue using the RNeasy FFPE Kit (Qiagen, Hilden, Germany). qPCR was performed using inventoried Taqman assays (Applied Biosystems, Carlsbad, CA, USA) corresponding to exon 1 in the NKX2-1_004 isoform (forward: 5'-GCCATTTACGCCACCACTTTAA-3'; reverse: 5'-GCAGCTCAGCCATGCAAA-3'; probe: AAGATATTTGGTTATTCCCG), TTF-1 exon 4/5 (Assay ID: Hs00968940_m1; Thermo Fisher Scientific, Waltham, MA, USA), myosin-binding protein H (MYBPH) (Assay ID: Hs00192226_m1; Thermo Fisher Scientific), and glyceraldehyde 3-phosphate dehydrogenase (GAPDH) (Assay ID: Hs02786624_g1; Thermo Fisher Scientific). All PCRs were performed with a TaqMan Fast Advanced Master Mix (Applied Biosystems) on an Applied Biosystems Step One Plus Real-Time PCR System in accordance with the standard protocols. The amount of each target gene relative to the GAPDH housekeeping gene was determined using the comparative threshold cycle method.

Cell lines and cell culture
EGFR-mutated cell lines (H3255, 11-18, H4006, HCC827, PC9, and H1650) and KRAS-mutated cell lines (HCC44, H23, H2030, and A549) were purchased from the American Type Culture Collection (Manassas, VA, USA). All cell lines were cultured in RPMI-1640 supplemented with 10% FBS and 1% penicillin-streptomycin in an atmosphere of 5% CO2 at 37 °C. All cell lines were routinely tested for Mycoplasma and were found to be negative.
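
For the qPCR quantification relative to GAPDH mentioned above, the comparative threshold cycle calculation is commonly written as 2^-ΔCt. The sketch below illustrates that arithmetic under the usual assumption of roughly 100% amplification efficiency; the function names and the Ct values are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target, ct_reference):
    """Amount of target relative to the reference gene, as 2^-(delta Ct).

    Assumption: both assays amplify with approximately 100% efficiency.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** -delta_ct

def exon1_to_exon45_ratio(ct_exon1, ct_exon45, ct_gapdh):
    """Ratio of GAPDH-normalised exon 1 signal to exon 4/5 signal."""
    return (relative_expression(ct_exon1, ct_gapdh)
            / relative_expression(ct_exon45, ct_gapdh))

# Hypothetical Ct values for one sample.
ratio = exon1_to_exon45_ratio(ct_exon1=24.0, ct_exon45=27.0, ct_gapdh=20.0)
```

Because the reference Ct cancels in the ratio, the exon 1 to exon 4/5 comparison depends only on the difference between the two target Ct values within the same sample.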

Western blot assays
Protein samples were separated by SDS-PAGE and subsequently blotted onto a polyvinylidene fluoride membrane. An iBind Western Device (Life Technologies Corporation, Carlsbad, CA, USA) was used for the antigen-antibody reaction. The membrane was incubated with antibodies against TTF-1 (#sc-53136; Santa Cruz, Dallas, TX, USA) and GAPDH (#sc-32233; Santa Cruz). Bound antibodies were detected with horseradish peroxidase-conjugated secondary mouse antibody (GE Healthcare Biosciences, Little Chalfont, UK), and images were taken using the Amersham Imager 680 (GE Healthcare Biosciences).

Statistical analysis
Categorical variables were analysed using a Fisher's exact or chi-square test. To determine prognosis, Kaplan-Meier survival analysis was performed. The date of surgical resection was set as the starting point and the

Clinicopathological characteristics in the study cohort
Clinicopathological characteristics of the 71 and 664 patients with LAD examined in the discovery and validation data sets, respectively, are shown in Table 1. In the discovery data set (71 cases), the median age was 66.7 years; 29 patients (40.8%) were female and 37 (52.9%) were never or light smokers (smoking index ≤400). In the validation data set (664 cases), the median age was 67.8 years; 314 patients (47.2%) were female and 367 (55.4%) were never or light smokers. Overall, comparison of the discovery and validation data sets revealed no significant differences in clinicopathological features, such as age, gender, smoking status, and the pathological stage at presentation. As expected, histological subtypes were significantly different between the two groups as the validation data set consisted of invasive LADs, including

Detection of NKX2-1/TTF-1 exon 1 in FFPE LAD tumour tissues
The clinicopathological impact of expression of NKX2-1/TTF-1 exon 1 was further examined. Expression of NKX2-1/TTF-1 exon 1 was evaluated using RNAscope and a designed probe targeting NKX2-1/TTF-1 exon 1 in FFPE tissues. First, the association between NKX2-1/TTF-1 exon 1 expression detected by CAGE and that detected by RNAscope was examined in tumours of the discovery data set that exhibited high- to intermediate-grade histology (acinar, papillary, and solid adenocarcinoma); the two measurements were correlated (p = 0.0020) (see supplementary material, Figure S2). Next, RNAscope was performed in the validation data set (Table 2).

Clinical outcomes
The median follow-up period after surgery for all patients in the validation data set was 53.2 months. Overall survival (OS) rate was significantly associated with pathological stage (log-rank test, p < 0.0001; Breslow-Wilcoxon test, p < 0.0001) and pathological grade (log-rank test, p = 0.0008; Breslow-Wilcoxon test, p < 0.0001) (Figure 5A,B). Patients with TTF-1-positive tumours had significantly favourable OS (log-rank test, p < 0.0001; Breslow-Wilcoxon test, p < 0.0001) (Figure 5C), which is consistent with previous reports [27]. In addition, patients whose tumours exhibited no expression of NKX2-1/TTF-1 exon 1 had significantly shorter median OS (log-rank test, p = 0.0306; Breslow-Wilcoxon test, p = 0.0032); however, these differences were less pronounced than those for TTF-1 protein expression (Figure 5D). Furthermore, among tumours with TTF-1 protein expression, median OS was shorter in patients whose tumours exhibited no expression of NKX2-1/TTF-1 exon 1 than in those with expression of NKX2-1/TTF-1 exon 1, although these differences were not significant; similar results were obtained among TTF-1-negative LADs, suggesting that a tumour suppressive role of NKX2-1/TTF-1 transcript isoforms containing exon 1 may be independent of TTF-1 protein expression (Figure 5E,F). In subgroup analyses of EGFR- and KRAS-mutated cases, there was no significant difference in survival between tumours with and without expression of NKX2-1/TTF-1 exon 1, although the latter had shorter OS (41 months in EGFR-mutated cases and 36 months in KRAS-mutated cases) (see supplementary material, Figure S3).
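
The Kaplan-Meier estimates and log-rank comparisons reported above can be illustrated with a small self-contained sketch. The study itself used the survival package in R; the plain Python functions below, their names, and the toy follow-up data are assumptions made purely for illustration, and the Breslow-Wilcoxon test is not covered.

```python
from math import erfc, sqrt

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times  -- follow-up (e.g. months from surgical resection)
    events -- 1 if the patient died at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p value), 1 d.f."""
    pooled = zip(list(times_a) + list(times_b), list(events_a) + list(events_b))
    death_times = sorted({t for t, e in pooled if e})
    observed_a = expected_a = variance = 0.0
    for t in death_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))

# Hypothetical follow-up data (months, event flag); not data from the study.
curve = kaplan_meier([5, 10, 10, 20], [1, 1, 0, 1])
chi2, p_value = logrank_test([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
```

Censored patients contribute to the number at risk until their last follow-up but never count as deaths, which is what distinguishes these estimates from a simple proportion of survivors.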

Discussion
To the best of our knowledge, this is the first study to assess the association between NKX2-1/TTF-1 isoforms and various clinicopathological parameters in LAD. These results of TSS-level expression of NKX2-1/TTF-1 revealed that LAD cells use at least 80% of NKX2-1 TSSs, suggesting that each TSS and transcript isoform could play a distinct role in lung tumourigenesis, which contributes to the degree of heterogeneity of tumours. Furthermore, it was demonstrated that LAD with low expression of isoforms from TSS region 11, as well as from TSSs 2, 4, and 10, which regulate NKX2-1_004 and NKX2-1_005 containing exon 1, exhibited poor prognosis. Among NKX2-1/TTF-1 isoforms, NKX2-1_001 and NKX2-1_002 have been the focus of intense research activities. 'The proximal major promoter' that regulates NKX2-1_001 contains a TATA-like element and binding site for Forkhead box A1 (FOXA1) (HNF3α), FOXA2 (HNF3β), and GATA-binding protein 6 (GATA6), all of which are known to be crucially involved in lung development, whereas 'the minor distal promoter' that regulates NKX2-1_002 is modulated by the transcription factors SP1 and SP3 [28]. However, the role of NKX2-1_004 and NKX2-1_005 containing exon 1 in lung development, homeostasis, and tumourigenesis remains largely unclear. Interestingly, NKX2-1/TTF-1 exon 1 is conserved across multiple species, including non-primate species, except fish, in contrast to other exons in NKX2-1/TTF-1 that are highly to completely conserved, regardless of species, on the University of California Santa Cruz Genome Browser [29], suggesting that NKX2-1_004 and NKX2-1_005, especially NKX2-1_004 containing a homeodomain, may play a pivotal role in the development and regulation of homeostasis of peripheral lung epithelial cells.
In the present study, LADs with no expression of NKX2-1/TTF-1 exon 1 were associated with poor survival outcomes. As NKX2-1_005 lacks a homeodomain that binds DNA in a sequence-specific manner and transcriptionally activates target genes, reduced expression of NKX2-1_004 rather than NKX2-1_005 may induce aggressiveness in LAD, suggesting a tumour suppressive role of NKX2-1_004. Likewise, previous reports [27] along with these data show that reduced expression of TTF-1 is significantly associated with unfavourable prognosis in patients with LAD, indicating a tumour suppressive function of NKX2-1/TTF-1 in lung tumourigenesis. However, loss-of-function and gain-of-function studies in human lung carcinoma and transformed cells support a role of NKX2-1 as an oncogene [7,30-33]. Furthermore, haploinsufficiency or conditional knockout of Nkx2-1/Ttf-1 in a transgenic mouse model leads to enhanced development of Kras-mutated lung tumours, in contrast to suppression of Egfr-mutated lung tumours [26]. Notably, MYBPH, a direct transcriptional target of NKX2-1/TTF-1, reduces cell motility and metastasis in KRAS-mutated cell lines [34]. In addition, NKX2-1/TTF-1-regulated microRNA-532-5p has a tumour suppressive role by targeting KRAS in LADs [35]. These data suggest that NKX2-1/TTF-1 has both oncogenic and tumour suppressive roles in lung tumourigenesis, which could depend on mitogenic driver mutations. Although the specific function of NKX2-1_004 in both LAD and normal lung tissues remains unclear at present, recent comprehensive epigenome and transcriptome analyses using Tracing Enhancer Networks using Epigenetic Traits (TENET) reveal that NKX2-1 is the top transcriptional regulator inactivated in LAD, and is linked to over a hundred inactivated enhancers [36]. Further studies by other approaches will be needed to elucidate transcriptional regulation of NKX2-1 in lung cancer development and the role of each splice variant in different genetic backgrounds.
Nevertheless, it was demonstrated that no expression of NKX2-1/TTF-1 exon 1 was frequently detected in EGFR and KRAS wild-type tumours, suggesting that a tumour suppressive role of NKX2-1_004 might be independent of such oncogenic alterations.
IMA is a unique histological variant of LAD, which commonly lacks TTF-1 expression and expresses hepatocyte nuclear factors (HNFs), including HNF4α [25]. An inactivating mutation or epigenetic silencing of NKX2-1/TTF-1 downregulates its protein expression. Although NKX2-1/TTF-1 inactivating mutations are rare, they have recently been found in TTF-1-negative LADs, especially in IMA (33-43%) [37,38]. Furthermore, NKX2-1/TTF-1 is hypermethylated in the remaining TTF-1-negative cases; however, neither inactivating mutations nor hypermethylation is detected in some TTF-1-negative LADs [37], suggesting that other mechanisms of epigenetic silencing, such as microRNA and histone modification, may be involved in the downregulation of NKX2-1/TTF-1. In the present study, A549 cells exhibited expression of NKX2-1/TTF-1 exon 1, despite lack of expression of NKX2-1/TTF-1 exon 4/5 and subsequent TTF-1 protein expression in vitro, which is consistent with a previous study showing that neither NKX2-1_001 nor NKX2-1_002 transcripts are detected in A549 cells [14]. Moreover, most IMAs exhibited the same expression pattern. Thus, it is possible that a subset of LADs, including IMA, use an alternative TSS and subsequently express NKX2-1_005 lacking a homeodomain. It is still unclear whether NKX2-1_005 translates into protein. However, accumulated evidence shows that long non-coding RNAs play a pivotal role in gene regulation [39]. RNAs insufficiently spliced from NKX2-1_005 are retained in the nucleus and might be linked with their specific subcellular localisations and functions in IMA, and may distinguish the biological behaviour of IMA from that of other conventional LADs. Alternatively, the protein translated from NKX2-1_005 might act in a dominant-negative manner, affecting interactions of other isoforms with cofactors, and thus affecting TTF-1 binding to its cognate sites.
However, this study has some limitations, including undetermined transcriptome profiles of NKX2-1 in lymph nodes or in distant metastases, as NKX2-1/TTF-1 shows different functions depending on cell conditions [7]. Additional studies are required to clarify the clinicopathological impact of NKX2-1/TTF-1 exon 1 expression in LAD using samples from metastatic sites or recurrent disease.
In summary, these transcriptome and clinicopathological analyses reveal that LADs harbour at least 14 TSSs of NKX2-1/TTF-1, and that decreased expression of NKX2-1/TTF-1 transcript isoforms with exon 1, such as NKX2-1_004, is associated with poor prognosis in patients with LAD, most of whom had an EGFR/KRAS wild-type genotype. In situ hybridisation for Epstein-Barr virus-encoded RNA is already used routinely in the pathology laboratory. Likewise, detection of specific exons or exon junctions by in situ hybridisation may be useful to further classify LADs. While these results are valuable as an indicator of a patient's prognosis, further investigation of cancer-specific splice variants, such as NKX2-1_005 in IMA, may identify novel potential targets for LAD.

Author contributions statement
KSa, TH, SKi, MK, TY and TS provided pathological information. KT and KSu provided patients' clinical information. KSa and SS carried out RNAscope assay and image analysis. MH, KT and SKo carried out bioinformatics analysis of the CAGE data set. KSa, TH, YS, KK, MI and TS conceived experiments and analysed data. All authors were involved in writing the paper and had final approval of the submitted and published versions. TH takes full responsibility for the work as a whole, including the study design, access to data, and the decision to submit and publish the manuscript.

Table S1. RNAscope probe design
Table S2. Expression level of NKX2-1/TTF-1 promoters in 71 LADs

K Sano et al