Pulmonary adenocarcinoma: A renewed entity in 2011



    Corresponding author
    1. Departments of Thoracic/Head and Neck Medical Oncology
      Humam Kadara, Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA, Email: hkadara@mdanderson.org


    1. Departments of Thoracic/Head and Neck Medical Oncology
    2. Pathology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA

  • The Authors: Humam Kadara, PhD, is an instructor in the Department of Thoracic/Head and Neck Medical Oncology, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center. His research interests focus on lung cancer genomics, pathogenesis and prevention. Mohamed Kabbout, PhD, is a postdoctoral fellow in the same department whose research focuses on lung cancer pathogenesis mediated by the mutant Kirsten rat sarcoma oncogene. Ignacio I. Wistuba, MD, is a Jeri and Lori Eisenberg Professor of Pathology in the Departments of Thoracic/Head and Neck Medical Oncology and Pathology and director of the Thoracic Molecular Pathology Laboratory at the University of Texas MD Anderson Cancer Center. His research interests focus on understanding the molecular pathology of lung cancer to guide or develop therapeutic and prevention strategies.




Lung cancer, of which non-small-cell lung cancer comprises the majority, is the leading cause of cancer-related deaths in the United States and worldwide. Lung adenocarcinomas are a major subtype of non-small-cell lung cancers, are increasing in incidence globally in both males and females and in smokers and non-smokers, and are the cause of almost 50% of deaths attributable to lung cancer. Lung adenocarcinoma is a tumour with complex biology that we have recently started to understand with the advent of various histological, transcriptomic, genomic and proteomic technologies. However, the histological and molecular pathogenesis of this malignancy is still largely unknown. This review will describe advances in the molecular pathology of lung adenocarcinoma with emphasis on genomics and DNA alterations of this disease. Moreover, the review will discuss recognized lung adenocarcinoma preneoplastic lesions and current concepts of the early pathogenesis and progression of the disease. We will also portray the field cancerization phenomenon and lineage-specific oncogene expression pattern in lung cancer and how both re-emerging concepts can be exploited to increase our understanding of lung adenocarcinoma pathogenesis for subsequent development of biomarkers for early detection of adenocarcinomas and possibly personalized prevention.


Lung cancer is the leading cause of cancer deaths in the United States and worldwide in both developing and developed regions.1 The high mortality of this disease is in part due to the late diagnosis of the majority of lung cancers after regional or distant spread of the malignancy2 and when only palliative treatment options are available.3 Given that various epithelial tumours develop in a multi-stage stepwise fashion, it is plausible to assume that early diagnosis of lung cancer or intraepithelial lesions coupled with effective prevention strategies will improve survival of patients and reduce the significant health burden and mortality associated with this disease.3 Despite recent encouraging findings from the National Lung Screening Trial (NLST),4 early detection of lung cancer is challenging due to the lack of biomarkers for early diagnosis of the disease and to the presence of multiple neoplastic molecular pathways that mediate lung carcinogenesis. A better understanding of the molecular origins of lung cancer is expected to pave the way for unmet effective and personalized strategies for lung cancer prevention and treatment.

The two major forms of lung cancer are non-small-cell lung cancer (NSCLC), which accounts for approximately 85% of all diagnosed lung cancers, and small-cell lung cancer (SCLC), which constitutes about 15% of lung neoplasms.2 NSCLC comprises three major histological subtypes: squamous-cell carcinomas (SCC), lung adenocarcinomas and large-cell lung carcinomas.2,5 Several major differences exist between adenocarcinomas and SCC, the two major subtypes of NSCLC. Compared with SCC and SCLC that arise from the major bronchi and are centrally located, pulmonary adenocarcinomas arise from small bronchi, bronchioles or alveolar epithelial cells, and are typically peripherally located as reviewed elsewhere.2,5–7 Clinically, SCC and lung adenocarcinoma respond differently to chemotherapeutic agents, exemplified by the use of pemetrexed for treatment of the latter subtype and not for SCC.8,9 Moreover, although smoking is the major causative factor in lung cancer pathogenesis, significant differences in smoking patterns are observed between the two major NSCLC histological subtypes.
Whereas SCC pathogenesis is strongly linked to smoking, lung adenocarcinoma is the more common histological subtype in never-smoker patients.10–13 Accumulating evidence suggests that lung adenocarcinoma arising in never-smokers is a disease with different pathological and epidemiological features compared with adenocarcinomas causally linked to cigarette smoking.13 Specifically, never-smoker lung adenocarcinoma is more commonly diagnosed in females compared with males14 and is more frequently found in eastern and southern parts of the Asian continent,15 and displays better prognosis and survival compared with ever-smoker patients.2,12,13 At the molecular level, and to date, two major pathways are thought to mediate lung adenocarcinoma development: an epidermal growth factor receptor (EGFR)-dependent pathway in never-smokers and a Kirsten rat sarcoma oncogene (KRAS)-dependent signalling module in smokers16–23 (discussed further later). Further understanding of lung adenocarcinoma pathogenesis would be needed to unravel other pathways that play important roles in development of this major subtype of lung cancer.

Lung adenocarcinomas have a wide spectrum of clinical, molecular and histological features.24 The 2004 World Health Organization (WHO) classification of lung tumours included four growth patterns for its adenocarcinoma classification: bronchioloalveolar (BAC; also known as lepidic), acinar, papillary and solid.24 Most invasive lung adenocarcinomas are heterogeneous in nature and include more than one of these histological patterns.24,25 The existing lung adenocarcinoma histological heterogeneity and the varying clinicopathological features (e.g. patient outcome) of the aforementioned histological patterns highlight the importance of incorporating histological pattern information into clinical management of this complex disease. More recently, the European Respiratory Society (ERS), the International Association for the Study of Lung Cancer (IASLC) and the American Thoracic Society (ATS) sponsored a new classification of lung adenocarcinoma.26 The new classification study presented several modifications to the WHO 2004 criteria for diagnosis of resected adenocarcinoma specimens. Mainly, the consortium study suggested that the term BAC should be discontinued.26 Instead, it is agreed that adenocarcinoma in situ (AIS) and minimally invasive adenocarcinoma (MIA) are to be used for small adenocarcinomas with either pure lepidic growth or predominant lepidic growth with less than 5 mm invasion, respectively.26 Moreover, the new classification dropped the use of mixed subtype, and instead, adenocarcinomas are classified according to their predominant subtype.26

This review will describe advances in the molecular pathology of lung adenocarcinoma with emphasis on genomics and DNA alterations of this disease. Moreover, the review will describe recognized lung adenocarcinoma preneoplastic lesions and current concepts of the early pathogenesis and progression of the disease. We will also portray the field cancerization phenomenon and lineage-specific oncogene expression pattern in lung cancer and how both re-emerging concepts can be exploited to increase our understanding of lung adenocarcinoma pathogenesis for subsequent development of biomarkers for early detection of adenocarcinomas and possibly personalized prevention.


Molecular pathology of lung adenocarcinoma

Lung adenocarcinomas exhibit unique genomic aberrations compared with lung SCC, indicating that the molecular pathology of both NSCLC subtypes encompasses different molecular pathways of development and progression.2 Earlier studies have shown that lung SCC exhibit higher frequencies of deletions at chromosomal regions 17p13 (TP53), 13q14 (RB), 9p21 (CDKN2A), 8p21–23 and 3p compared with lung adenocarcinomas.27–29 Moreover, many of the aforementioned molecular abnormalities (e.g. allelic losses at 9p21 and 13q14) occur in the sequential multi-step progression of SCC but not of adenocarcinomas.6,27 In contrast, mutations in the KRAS, EGFR and HER2/NEU oncogenes occur almost exclusively in adenocarcinomas.2,20,22,23,30,31 Amplification of the embryonic stem cell (ESC) factor sex determining region Y-box 2 (SOX2) is exclusive to SCC,32,33 and increased gene dosage and protein expression of thyroid transcription factor-1/NK2 homeobox 1 (TITF-1/NKX2-1) is prevalent in lung adenocarcinoma, indicating that both transcription factors most likely function as lineage-specific genes in lung cancer.34–36 This section will highlight molecular abnormalities, with special emphasis on genomics and DNA alterations, of lung adenocarcinoma that render this malignancy a unique entity.

Lung adenocarcinoma genomics

KRAS, a low molecular weight guanosine triphosphatase (GTPase) and the major upstream activator of the RAF-MEK-ERK pathway, is considered to be the most frequently mutated oncogene in lung adenocarcinomas.19,21,37 As mentioned before, mutations in this oncogene are more common in adenocarcinomas arising in ever-smoker (former and current) lung cancer patients.13,16,17,19,20,22,23,30 Most KRAS mutations replace glycine 12 with other amino acids, such as valine (G12V), aspartic acid (G12D) and glutamic acid (G12E), or replace glycine 13. These mutations are activating: they reduce the GTPase activity of the protein, with subsequent potent activation of mitogenic and proliferative signalling through the RAF-MEK-ERK cascade.19,37–39 Thus, it is plausible to assume that therapeutic strategies targeting KRAS would be very beneficial in adenocarcinomas with activating mutations in this oncogene. However, in contrast to tumours with mutations in other oncogenes, there are currently no available treatment options for KRAS-mutant lung adenocarcinomas,40 as strategies targeting KRAS farnesylation, MEK activation and BRAF have either failed or yielded no responses.41–43

In contrast to KRAS, mutations in EGFR are strongly linked to lung adenocarcinomas arising in never-smokers and are suggested to molecularly drive the disease in this patient subpopulation.13,14,17,18,22,23,30,37 It is important to note that EGFR mutations are more common in East Asian patients and in female gender.2,13,22 Small in-frame deletions in exon 19 and missense mutations in exon 21 (L858R and L861Q) are the most common mutations detected in EGFR44 and were shown by several ground-breaking studies to underlie sensitivity of lung adenocarcinoma patients to EGFR-targeting small tyrosine-kinase inhibitors (e.g. erlotinib and gefitinib).18,45,46 These studies were the first to prove the feasibility of personalized medicine approaches for the management of lung adenocarcinoma and represent the landmark for the application of genomic medicine in this disease.

The discovery of fusions involving anaplastic lymphoma kinase (ALK) with the upstream partner echinoderm microtubule-associated protein-like 4 (EML4) by Soda et al.47 further opened new avenues for genomic-driven personalized treatment strategies for lung adenocarcinoma.48 Both EML4 and ALK are located in chromosome 2p, and fusion of both involves small inversions within this region.47 EML4-ALK fusion results in constitutive activation of the ALK kinase, rendering cells and adenocarcinoma tumours expressing this oncogenic fusion protein sensitive to ALK inhibitors.47–49 Like EGFR mutations, EML4-ALK fusion genes are prevalent in lung adenocarcinomas, younger patients and, in particular, in lifetime never-smoker patients or light smokers.49,50 Importantly, EML4-ALK fusion genes are mutually exclusive with EGFR and KRAS mutations, indicating that such molecular defects function as drivers of pathogenesis. This is clinically important, as it increases the potential of personalized treatment options that target driver oncogenes in this malignancy.50

Other mutually exclusive and, thus, potential oncogenic drivers have been identified in lung adenocarcinomas. Mutations in HER2/NEU were found by Stephens et al. to occur in lung adenocarcinomas.31 Compared with mutations of the EGFR oncogene, HER2/NEU mutations are less frequent30 and have not been successfully exploited in the clinic for lung adenocarcinoma treatment.51 Similar to HER2/NEU, mutations in BRAF also occur at low frequency in lung adenocarcinoma and are mutually exclusive with EGFR and KRAS mutations, as well as with EML4-ALK fusions.50 There are as yet no successful target-specific treatment strategies for lung adenocarcinoma with BRAF mutations. It is important to note that mutations in HER2/NEU and BRAF have not been found in lung SCC.50

The TP53 tumour suppressor is the most frequently mutated gene in lung adenocarcinoma (65–70%). Various abnormalities in TP53 were identified in lung adenocarcinoma almost two decades ago52,53 and more recently in the tumour-sequencing project37 and occur in similar pathways to those mediated by the oncogenic driver mutations mentioned earlier.20,22,37 Mutations in the CDKN2A tumour suppressor have also been described in lung adenocarcinoma.54,55 However, methylation56 or focal DNA deletion36,55 rather than mutation of this tumour suppressor seems to be more frequent and occurs earlier in lung cancer pathogenesis.5 With the advent of various technologies including single nucleotide polymorphism (SNP) arrays, mass spectrometry mutational analysis and more recently second-generation sequencing, and the undertaking of large-scale studies such as the tumour-sequencing project,37 our knowledge of the mutational spectrum of lung adenocarcinoma has substantially increased. New mutated oncogenes and tumour suppressor genes have been identified in lung adenocarcinoma; these, along with previously characterized mutated genes, are outlined in Table 1 and have been reviewed in detail elsewhere.50,51 It is important to note that many, if not most, of these mutations are not mutually exclusive of other driver mutations and events such as EGFR mutations and EML4-ALK fusions. For example, PIK3CA mutations were always found together with EGFR mutations in never-smoker lung adenocarcinomas.57 It is also worthwhile to mention that some of the outlined mutations have been detected in both lung adenocarcinomas and SCC (e.g. PIK3CA and MET) or only in the former subtype of NSCLC (e.g. MEK1, HER2/NEU and BRAF).50 For most of the mutations that occur in lung adenocarcinomas and were recently identified by exon-directed sequencing in the tumour-sequencing project, it is unknown whether they also occur in lung SCC.
The discovery of new oncogene and tumour suppressor mutations in lung adenocarcinoma occurring in mutually exclusive and inclusive cell signalling pathways expands the range of possible target-specific and even combinatorial personalized therapeutic strategies for this disease.

Table 1. Mutations in lung adenocarcinoma
Mutation rate (%)
ALK (fusion): 5–15
Tumour-suppressor genes

Copy-number alterations

Gene dosage variations occur in many pathological conditions. For example, in cancer, deletions and copy-number increases modulate the expression of tumour-suppressor genes and oncogenes, therefore contributing to tumourigenesis. Characterization of these DNA copy-number changes is vital for both the basic understanding of cancer and its diagnosis. Copy-number alterations are routinely assessed in laboratories by fluorescent in situ hybridization (FISH) techniques as well as genomic polymerase chain reaction (PCR), including quantitative PCR approaches. However, these approaches are labour-intensive and would hamper the discovery and complete understanding of the genome of tumours in large-scale studies. High-throughput and genome-wide analysis of DNA copy-number alterations was made possible by comparative genomic hybridization (CGH) approaches, which utilize differentially labelled test and reference genomic DNA that are co-hybridized to normal metaphase chromosomes.58 CGH, however, exhibits limited mapping resolution even when compared with lower throughput higher resolution techniques, such as FISH.59 Subsequently, high-resolution genome-wide analysis was successfully performed using cDNA microarray-based CGH and SNP arrays coupled with statistical methods to assess both the amplitude and the frequency of copy-number changes at each position in the genome.59
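The idea of weighing both the amplitude and the frequency of a copy-number change can be illustrated with a toy calculation. The sketch below is a simplified illustration only, not the statistical method used in any study cited here: the function names, the log2-ratio threshold of 0.3, the assumed diploid reference of two copies and the copy-number values are all hypothetical.

```python
from math import log2

def log2_ratio(tumour_copies, reference_copies=2.0):
    """Log2 ratio of tumour to reference copy number (diploid reference assumed)."""
    return log2(tumour_copies / reference_copies)

def gain_score(copy_numbers, threshold=0.3):
    """Toy score: fraction of samples with a gain times the mean amplitude of those gains.

    The threshold of 0.3 is an arbitrary illustrative cut-off, not a published value.
    """
    ratios = [log2_ratio(c) for c in copy_numbers]
    gained = [r for r in ratios if r > threshold]
    if not gained:
        return 0.0
    frequency = len(gained) / len(ratios)
    mean_amplitude = sum(gained) / len(gained)
    return frequency * mean_amplitude

# Hypothetical copy numbers for three loci across five tumours.
cohort = {
    "locus_A": [2, 2, 8, 2, 2],  # rare, high-amplitude gain
    "locus_B": [3, 3, 3, 3, 2],  # frequent, low-amplitude gain
    "locus_C": [2, 2, 2, 2, 2],  # no change
}

# Rank loci from highest to lowest combined score.
ranked = sorted(cohort, key=lambda name: gain_score(cohort[name]), reverse=True)
print(ranked)
```

In this made-up cohort, a frequent low-level gain can outrank a rare high-level amplification, which is why methods that consider only one of the two dimensions may prioritize different regions.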

Genome-wide analyses of human lung adenocarcinoma tumours in several major studies have increased our understanding of the molecular pathogenesis of this major malignancy.36,60,61 Earlier chromosomal CGH studies have revealed in NSCLC recurrent gains at 1q31, 3q25–27, 5p13–14 and 8q23–24, and deletions at 3p21, 8p22, 9p21–22, 13q22 and 17p12–13.62–66 Moreover, these early studies already highlighted genomic differences and similarities between lung adenocarcinomas and SCC, the most prominent of which were gains in 3q, mainly in lung SCC.62–66 For example, Petersen et al. found genomic aberrations that distinguish lung adenocarcinomas from SCC: gain of 1q23 and deletion at 9q22 were significantly associated with adenocarcinomas, whereas loss of chromosomal band 2q36–37 and gain of 3q were strongly associated with SCC.66 Bjorkqvist et al. demonstrated that 94% (15/16) of lung SCC analysed had a gain in 3q, whereas only 24% (4/17) of the adenocarcinoma samples exhibited a gain in 3q, and high-level amplifications in 3q were only detected in SCC.63 In addition, Luk et al. demonstrated that gains at 1q22–32.2, 15q, 20q and losses at 6q, 13q and 18q were more prevalent in lung adenocarcinomas, whereas SCC, as shown in earlier studies, exhibited gains/amplifications at 3q.64 Moreover, Pei et al. showed that besides prevalent gain of 3q in lung SCC, gain of 20p13 and loss of 4q also were significantly higher in SCC, whereas gain of 6p was more common in adenocarcinomas.65 Massion et al. utilized higher resolution analysis by array CGH to study copy-number alterations of known loci and found that the most distinct genomic aberration between both NSCLC subtypes was gain of 3q22–26 and loss of 3p in lung SCC.67 Moreover, and in the same study, the PIK3CA oncogene was found to be a member of the chromosome-3q amplicon with higher copy number and expression in SCC but not in adenocarcinomas.67
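The counts reported by Bjorkqvist et al. (3q gain in 15 of 16 SCC and 4 of 17 adenocarcinomas) can be checked with a short calculation. The sketch below recomputes the two percentages and, as an illustration, applies a one-sided Fisher's exact test to the resulting 2×2 table. The choice of test is our assumption for illustration and is not necessarily the analysis performed in the cited study; the function name is hypothetical.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing a top-left count
    of at least `a` when the row and column totals are held fixed.
    """
    row1 = a + b          # e.g. total SCC
    col1 = a + c          # e.g. total tumours with the gain
    n = a + b + c + d     # all tumours
    upper = min(row1, col1)
    total = comb(n, row1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / total

# Counts quoted in the text: 3q gain in 15/16 SCC and 4/17 adenocarcinomas.
scc_gain, scc_total = 15, 16
adeno_gain, adeno_total = 4, 17

scc_pct = round(100 * scc_gain / scc_total)
adeno_pct = round(100 * adeno_gain / adeno_total)

p_value = fisher_one_sided(
    scc_gain, scc_total - scc_gain,
    adeno_gain, adeno_total - adeno_gain,
)
print(scc_pct, adeno_pct, p_value)
```

The rounded percentages reproduce the 94% and 24% quoted above, and the small p-value is consistent with 3q gain being strongly enriched in SCC in this series, although with only 33 tumours the estimate is necessarily imprecise.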

More recent studies utilized more advanced technologies to query the genome of lung adenocarcinoma and associate specific gene modulations with chromosomal and loci gain or losses. In the study by Tonon et al., high-resolution cDNA microarray-based CGH was utilized to study the genomic profiles of 18 lung adenocarcinomas and 26 SCC, as well as 14 NSCLC cell lines.61 The study identified 93 focal copy-number alterations that mainly comprised previously uncharacterized recurrent high-amplitude amplifications and homozygous deletions.61 Besides confirming previous findings by chromosomal CGH and highlighting known gains (e.g. 1q31 and 3q25–27) and known deletions (e.g. 3p, 8p22 and 13q22), the study by Tonon et al. was able to map specific genes to focal copy-number alterations including CDKN2A and RB1 tumour-suppressor genes and EGFR, and KRAS oncogenes.61 However, when comparing both adenocarcinomas and lung SCC, the study found that the only notable genomic difference between both NSCLC histological subtypes was the well-characterized gain of 3q26–29 in SCC that included TP63, well known for its role in squamous differentiation, and concluded that similar oncogene and tumour-suppressor gene aberrations drive lung adenocarcinoma and SCC pathogenesis.61 The study by Tonon et al. was a major step in understanding the genomic profiles of NSCLC tumours despite the small number of lung adenocarcinomas and SCC analysed.

Later on, Weir et al. studied the genomic profiles of a large collection of primary lung adenocarcinomas (n = 371) by high-density SNP arrays using 238 000 probe sets.36 The report by Weir et al. was a milestone in understanding the lung adenocarcinoma genome, as it unravelled previously uncharacterized amplified genes and loci that otherwise may have not been identified using a small number of primary tumours. The study identified 39 large-scale chromosomal arm gains or losses, 26 of which were significantly recurrent across many lung adenocarcinomas. Importantly, the large-scale study by Weir et al. identified 31 recurrent focal events that included 24 amplifications and 7 homozygous deletions. Using dense SNP arrays coupled with statistical methods (genomic identification of significant targets in cancer), the group was able to associate specific genes with the focal events and rank the significance of events based on both the amplitude and frequency of copy-number change,36 similar to what was performed by Tonon et al.61 to identify minimal common regions of copy-number alterations. The most significant focal regions of amplification included known oncogenes such as MDM2 (12q15), MYC (8q24), EGFR (7p11), CDK4 (12q14), KRAS (12p12), CCNE1 (19q12), ERBB2 (17q12), CCND1 (11q13) and TERT (5p15).36 It is important to note that three of these oncogenes, EGFR, KRAS and ERBB2, are mutated in lung adenocarcinoma, as discussed before, suggesting that amplification and mutation of these oncogenes may cooperate systematically in lung adenocarcinoma pathogenesis. The most significant focal regions of deletions also included known tumour-suppressor genes such as CDKN2A (9p21) and PTEN (10q23).36 Although 5p was previously shown to be gained in lung adenocarcinomas, the identity of genes involved in this gain was unknown prior to the study by Weir et al.
The application of more advanced technologies to characterize the genomic profile of lung adenocarcinomas enabled the group to highlight previously unknown associations between canonical cancer-associated genes and known loci copy-number alterations, as well as to identify potentially new oncogenes. For example, 10 genes, including TERT, were found in the study to be included in the 5p15 region.36 Furthermore, the study highlighted previously uncharacterized amplification of TITF-1/NKX2-1 (14q13.3) in lung adenocarcinomas and demonstrated the oncogenic role of this lineage-specific transcriptional factor in lung cancer cells, evidenced by the effect of RNA interference-mediated knockdown of its expression on anchorage-independent growth of lung adenocarcinoma cell lines with amplification of this gene.36 The amplification/copy-number gain of TITF-1 in lung adenocarcinoma was later confirmed in different studies including that by Kwei et al. using array CGH.34 However, it is important to mention that although TITF-1 amplification is generally prevalent in lung adenocarcinoma and is thought to function as a lineage-specific oncogene in this subtype of NSCLC, it has also been shown by FISH analysis in lung SCC, which will be discussed later in this review.

As mentioned before, numerous earlier studies have demonstrated that gain of 3q is a genomic feature of lung SCC. Similar to the aforementioned study by Weir et al., Bass and colleagues utilized high-density Affymetrix SNP arrays to analyse 40 oesophageal and 47 lung SCC, which confirmed that gain of 3q26 was the main focal amplification event in lung and oesophageal SCC.32 Importantly, the same study revealed the presence of the ESC factor SOX2 in this region, which was later confirmed to be amplified in SCC and not in lung adenocarcinomas. SOX2 promoted the survival of SCC cell lines harbouring amplification of this transcriptional factor, further suggesting that lung adenocarcinomas are genetically different from SCC.

More recently, high-resolution array CGH was performed on never-smoker lung adenocarcinomas (n = 60) with known mutation status of EGFR.68 This study identified 14 new minimal common regions of gain or loss and confirmed previously known copy-number alterations such as those involving TERT (5p), TITF-1 (14q13), EGFR (7p) and CDKN2B (9p). Notably, the study revealed new genomic aberrations, namely the 16p11.2 region harbouring the FUS oncogene that functions in transcriptional splicing and DNA repair.68 Gain of 16p11.2 was evident in greater than 20% of the never-smoker lung adenocarcinomas analysed and mRNA levels of FUS correlated with copy gain of 16p, as they were higher in tumours with gain of this region compared with tumours that did not exhibit 16p copy gain. Importantly, the study by Job et al. revealed genomic copy-number alterations that were highly associated with presence of EGFR mutation, an oncogenic driver of never-smoker lung adenocarcinoma pathogenesis.68 Gains of 7p were significantly associated with presence of EGFR mutations and included the EGFR gene, suggesting, as mentioned before, that copy-number alterations and mutations cooperate at the genomic level in lung adenocarcinoma pathogenesis. However, it cannot be ignored that EGFR copy gain or amplification may favour the detection of EGFR mutations in a heterogeneous tumour due to the mutant allele-specific imbalance phenomenon.69 In a more recent study by Yuan et al., also using array CGH technology, gains in 7p, including the EGFR gene, were common in EGFR mutant lung adenocarcinomas and predicted overall and recurrence-free survival in this disease population.70 More importantly, in contrast to EGFR mutations, presence of genes (including EGFR) within the 7p gain predicted poorer response to tyrosine kinase inhibitors targeting EGFR.70

Gene expression profiling

Numerous studies have utilized microarray technology to analyse the global transcriptome of NSCLC for diagnosis (discussed later), molecular classification, response to therapy and prognosis. For the purpose of this review, we will discuss several key studies that investigated expression profiles of lung adenocarcinomas to further understand the molecular biology of this prevalent lung malignancy. Bhattacharjee and colleagues utilized arrays to study adenocarcinomas of lung origin (n = 127), SCC (n = 21), carcinoids (n = 20), SCLC (n = 6) and 17 normal samples.71 The study found that differential expression profiles segregated the samples into different clusters based on histology, evidenced by the two-dimensional cluster analysis. Genes associated with squamous differentiation such as keratin and TP63 were overexpressed in SCC, and neuroendocrine markers were enriched in the SCLC cluster. Importantly, the study by Bhattacharjee et al. also analysed the adenocarcinomas alone by hierarchical clustering and demonstrated that the adenocarcinomas were heterogeneous in molecular make-up, being separated into various clusters with distinct clinical outcomes.71 Similarly, Garber et al. utilized cDNA microarrays to study expression profiles of 41 adenocarcinomas, 16 SCC, 5 large-cell carcinomas and 5 SCLC, as well as 5 normal lung samples.72 Again, lung adenocarcinomas were most heterogeneous and were divided into different clusters that were associated with clinicopathological variables such as tumour grade.72 Later, Hayes et al. found that adenocarcinoma subtypes identified by the Bhattacharjee and Garber studies were reproducible in additional microarray datasets.73 Several other studies have also demonstrated, using microarray expression profiling technology, the heterogeneity of lung adenocarcinomas and their distinction from other lung cancer subtypes. 
As reviewed by Yatabe, global gene expression profiling was able to subdivide lung adenocarcinomas into various clusters that correlated with EGFR mutation status, prognosis, expression of lung peripheral airway markers such as surfactant proteins (SP) and CC10, as well as enrichment of the BAC subtype.44

Lung adenocarcinoma preneoplasia

From biological and histopathological perspectives, NSCLC is a complex malignancy that develops through multiple preneoplastic pathways. Lung adenocarcinoma, a major subtype of NSCLC, has been increasing in incidence globally in both smokers and non-smokers13 with a concurrent decrease in SCC frequency. It has been postulated that the increasing incidence of lung adenocarcinomas compared with SCC is in part due to the change in the type of cigarettes used (lower nicotine and tar) and smoking habits and behaviour.11 Anatomical differences in the location of diagnosed lung adenocarcinomas and SCC strongly suggest that both NSCLC subtypes develop through different histopathological and molecular pathways and have different cells of origin; however, the specific respiratory epithelial cell type from which each lung cancer type develops has not been established with certainty.5 Lung SCC is typically centrally located in the lung and is thought to arise from the major bronchi. In contrast, lung adenocarcinomas that are usually peripherally located are believed to arise from small bronchi, bronchioles or alveoli of the distant airways of the lung. The sequence of histopathological changes in bronchial epithelia that precede the development of lung SCC has been characterized.6,27 However, the sequential preneoplastic changes, as well as the corresponding molecular abnormalities, leading to the development of lung adenocarcinomas are poorly documented.

Histopathological development of lung adenocarcinoma

Clara cells and the type II pneumocytes are believed to be the progenitor cells of the peripheral airways, and peripherally arising adenocarcinomas often express markers of these cell types.44,74 Atypical adenomatous hyperplasias (AAH) are considered to be a precursor lesion for peripheral lung adenocarcinomas.5,7 However, AAH is the only sequence of morphological change identified so far for the development of invasive lung adenocarcinomas, and there is consensus that the pathogenesis of many adenocarcinomas is largely unknown. The postulated progression of AAH to adenocarcinomas in situ, which is characterized by the growth of neoplastic cells along pre-existing alveolar structures without invasion, is supported by molecular studies.75 Distinction between highly atypical AAH and what was known as BAC is sometimes difficult. Therefore, and as mentioned before, the ERS, IASLC and ATS sponsored a new classification of lung adenocarcinoma that presented several modifications to the WHO 2004 criteria for diagnosis of resected adenocarcinoma specimens. The term BAC was suggested to be discontinued and replaced with AIS and MIA, used for small adenocarcinomas with either pure lepidic growth or predominant lepidic growth with less than 5 mm invasion, respectively. Importantly, the clinical features of both adenocarcinoma progression types are unique, as patients with AIS or MIA have a 100% 5-year survival rate after surgical resection.26

The differentiation phenotype derived from immunohistochemical and ultrastructural features indicates that AAH originate from the progenitor cells of the peripheral airways.26,44,74 Surfactant apoprotein and Clara cell-specific 10-kDa protein are expressed in almost all AAH. In addition, an increasing body of evidence suggests that AAH is the precursor of at least a subset of adenocarcinomas. For example, AAH is most frequently detected in lungs of patients bearing lung cancers (9–20%), especially adenocarcinomas (as many as 40%), compared with lung SCC (11%).76 It is important to note that AAH is detected more frequently in East Asian patients relative to Western patients. In such studies, it has been suggested that AAH is involved in the linear progression of cells of the ‘terminal respiratory unit’ (TRU) to AIS and subsequently invasive adenocarcinomas7,44,74 due to the expression of common genes between the TRU and the AAH, which is discussed later. Such studies have postulated that most, if not all, peripheral lung adenocarcinomas progress from alveoli through AAH as a preneoplastic lesion. As will be discussed further later, we have noted similar molecular abnormalities (e.g. EGFR mutations) between adenocarcinomas arising in never-smokers and small bronchioles within the localized and adjacent fields of the adenocarcinomas, suggesting that lung adenocarcinomas may arise from bronchiolar epithelium and small bronchi, and not only from alveoli.77,78 In a recent review by Yatabe et al., a nonlinear progression schema for lung adenocarcinomas was suggested.7 In this nonlinear schema, Yatabe et al. postulated that lung adenocarcinomas of the TRU subtype, as named by the authors, develop through AAH. On the other hand, and according to the same nonlinear progression hypothesis, some lung adenocarcinomas arise through unknown preneoplastic precursors from other cells besides the TRU, which we believe may as well be the bronchiolar epithelium.77,78

Molecular pathogenesis of lung adenocarcinomas

Several molecular changes frequently present in lung adenocarcinomas are also present in AAH lesions, and they are further evidence that AAH may represent true preneoplastic lesions.79 The most important finding is the presence of KRAS (codon 12) mutations in as many as 39% of AAH, which are also a relatively frequent alteration in lung adenocarcinomas.6,80 Other molecular aberrations that were identified in AAH are overexpression of cyclin D1 (∼70%), survivin (48%) and HER2/neu (7%) proteins.5 Moreover, and as mentioned in the review by Wistuba and Gazdar, some AAH lesions were found to exhibit loss of heterozygosity (LOH) in chromosomes 3p (18%), 9p (p16INK4a, 13%), 9q (53%), 17q and 17p (TP53, 6%).5 It is noteworthy that most if not all of the aforementioned changes identified in AAH lesions are also frequently detected in lung adenocarcinomas. Later, AAH lesions were shown to exhibit LOH of tuberous sclerosis complex (TSC)-associated regions, activation of telomerase, loss of LKB1, overexpression of DICER, a key effector protein for small interfering RNA and miRNA function, and DNA methylation of CDKN2A and PTPRN2.6,81,82 It is important to note that several studies have attempted to globally comprehend differential gene expression patterns and copy-number alterations between low-grade lesions (e.g. precursor lesions) or in situ adenocarcinomas and invasive tumours and found that amplification of the EGFR oncogene was the predominant differential molecular feature between the two different adenocarcinoma grade classes and occurred after mutations in the gene.7 Importantly, as will be discussed in the next section of this review, EGFR mutations also preceded changes in copy number of the gene when studying histologically normal bronchiolar epithelia.78

KRAS and EGFR mutations in lung adenocarcinoma pathogenesis

Although there is only one sequence of morphological change characterized so far for the development of invasive lung adenocarcinomas, namely AAH, a large body of evidence suggests that at least two molecular pathways are involved, the KRAS and EGFR pathways in smoker and never-smoker adenocarcinoma subpopulations, respectively.2,14,16,17,20–23,30 Mutations in EGFR, in particular in-frame deletions of exon 19 and the L858R and L861Q point mutations of exon 21, are strongly associated with never-smoking status, female gender and East Asian ethnicity, and predict a favourable response to EGFR tyrosine kinase inhibitors.2,12,13,17,22,23 On the other hand, mutations in KRAS, the most frequently mutated oncogene in lung adenocarcinoma based on recent findings of the tumour-sequencing project, are strongly associated with development of adenocarcinomas linked to tobacco consumption.2,16,17,20,21,23

It has been suggested that the vast majority of AAH precursor lesions and adenocarcinomas in situ are associated with the TRU adenocarcinoma subtype, which was found to express high levels of TITF-1 and SP, leading to the conclusion that such adenocarcinomas are of the same lineage as terminal airway epithelial cells. In addition, it has been postulated that EGFR mutations are predominant in or specific to peripheral lung adenocarcinomas of the TRU subtype, which were suggested to arise from AAH lesions,44,74,83 as 90 of 97 EGFR mutant adenocarcinomas were positive for TITF-1, and 91 of the 97 tumours were of the TRU subtype.83 The hypothesis that EGFR mutations are associated with or specific to the TRU subtype of lung adenocarcinomas is also in part due to the observation that the frequency of EGFR and KRAS mutations among AAH lesions, adenocarcinomas in situ and invasive adenocarcinomas is significantly different.7,44 It was determined that whereas KRAS mutations decreased along adenocarcinoma progression, from 33% in AAH to 8% in adenocarcinomas, EGFR mutations were evenly distributed, suggesting that KRAS-mutated AAH lesions rarely progress to adenocarcinomas. It is also important to mention, as reviewed by Yatabe, that several studies performed gene expression profiling of lung adenocarcinomas and other histological subtypes of lung cancer and found that lung adenocarcinomas were heterogeneous and divided into different clusters.44 Clusters with expression of CC10 and features of an alveolar signature such as TITF-1 exhibited significantly better survival compared with adenocarcinomas in other clusters and comprised a higher frequency of EGFR mutations.

Mutations in the tyrosine-kinase domain of EGFR were shown to be involved in the early pathogenesis of lung cancer, being identified in histologically normal epithelium of small bronchi and bronchioles adjacent to EGFR mutant adenocarcinomas77 (discussed further in the next section of the review). EGFR mutations were detected in normal-appearing peripheral respiratory epithelium in 43% of adenocarcinoma patients with mutant tumours, but not in patients without mutation in the tumour.77 These findings may signify different cell types comprising the examined epithelia, which could represent sites of the cells of origin for EGFR mutant adenocarcinomas of the lung. Although the cell type having those mutations is unknown, our group has hypothesized that stem or progenitor cells of the bronchial and bronchiolar epithelium bear such mutations. It is also noteworthy that EGFR mutations were identified in only 3 of 40 AAH lesions examined83,84 and were shown to be absent22 or relatively infrequent in what was previously known as BAC of the lung.84 These earlier observations support the argument that abnormalities of EGFR are not only relevant to the pathogenesis of alveolar-type lung neoplasia but may also drive peripheral lung adenocarcinoma from bronchiolar epithelial cells that are distinct from terminal respiratory and alveolar cells.5,44 The different findings of EGFR mutation rates in AAH lesions may also reflect the ethnicity (Asian vs Western) of the patients from which the lesions were isolated, as well as the standard practice of detection of small lesions such as AAH.

Field cancerization

Although the majority of lung cancer patients are current or former smokers (approximately 85%), a relatively small fraction of smokers (approximately 15%) develop primary lung tumours. Patients with early stage NSCLC, relative to other early stage malignancies, frequently exhibit recurrence or second primary tumour development after definitive treatment by surgery and removal of the original lung primary tumour. There is a large body of evidence that heavy smokers and patients who have survived an upper aerodigestive cancer comprise a high-risk population that may be targeted for early detection and chemoprevention efforts.6 Although the risk of developing lung cancer decreases after smoking cessation, the risk never returns to baseline. Preneoplastic changes, namely dysplastic histological abnormalities, have been utilized as surrogate endpoints for chemopreventive studies. However, it was suggested that this ‘shooting-in-the-dark’ approach may explain the reasons behind the general failures of clinical chemoprevention studies.3 It is also important to note that we are unable to predict which lifetime never-smokers or definitively treated never-smoker early-stage lung cancer patients will develop lung tumours or relapse. Therefore, novel approaches to identify the best population to be targeted for early detection and chemoprevention should be devised, and risk factors for lung cancer development or relapse need to be better defined. For these important purposes, a better understanding of the biology and molecular origins of lung cancer, for example, lung adenocarcinoma, is warranted. 
In this section of the review, we will describe the field cancerization phenomenon, which herein refers to changes occurring due to direct and indirect effects of smoking (field of injury) or independently of smoking, in patients with and without cancer, with emphasis on aberrant molecular markers in histologically normal epithelia that can be used to increase our understanding of lung cancer pathogenesis.

Smoking damaged epithelium and the lung field cancerization phenomenon

Earlier work by Danely Slaughter et al. in patients with oral cancer and oral premalignant lesions has suggested that histologically normal-appearing tissue adjacent to neoplastic and preneoplastic lesions displays molecular abnormalities, some of which are in common with those in the tumours.85 In 1961, a seminal report by Auerbach et al. suggested that cigarette smoke induces extensive histological changes in the bronchial epithelia in the lungs of smokers and that premalignant lesions are widespread and multifocal throughout the respiratory epithelium, suggestive of a field effect.86 This phenomenon, coined ‘field of cancerization’, was later shown to be evident in various epithelial cell malignancies including lung cancer. Some degree of inflammation and inflammatory-related damage is almost invariably present in the central and peripheral airways of smokers and may precede the development of lung cancer.87 Thus, the field of cancerization may be explained by both the direct effect of tobacco carcinogens and the initiation of an inflammatory response. In this context, different theories for the origin of the field of cancerization or smoking-related field of injury have been put forward; they will not be discussed here, as they have been extensively reviewed elsewhere by Steiling et al.88

Several studies focusing on the respiratory epithelium of lung cancer patients and smokers have demonstrated that multiple altered foci of bronchial epithelium are present throughout the airway.27,28,89 A detailed analysis of histologically normal epithelium, and premalignant and malignant epithelia from lung SCC patients indicated that multiple, sequentially occurring allele-specific chromosomal deletions (LOH) begin in dispersed clonally independent foci very early in the multi-stage pathogenesis of this smoking-related lung malignancy.27,28 Notably, 31% of histologically normal epithelium and 42% of mildly abnormal (hyperplasia/metaplasia) specimens had clones of cells with allelic loss at one or more regions examined. Moreover, these molecular aberrations were also found in carcinomas in situ and SCC, and at a more advanced level.27 Molecular changes involving LOH of chromosomal regions 3p (DUTT1 and FHIT genes) and 9p (CDKN2A), genomic instability (increased microsatellite repeats) and p16 methylation have all been shown to commence in histologically normal or slightly abnormal tissue in SCC patients and in the sequence of pathogenesis of the disease.5 As mentioned before, KRAS is the most frequently mutated oncogene in lung adenocarcinomas.37 Almost 15 years ago, Nelson et al. demonstrated that KRAS mutations are found in histologically normal lung tissue adjacent to lung tumours.90 As will be discussed later, mutations in EGFR were also found in histologically normal epithelium adjacent to tumours.77,78 Similar epigenetic and gene methylation patterns between tumours and adjacent histologically normal epithelia were described. An important study by Belinsky et al. reported aberrant promoter methylation of p16, which was described to be commonly methylated in lung tumours,91 in at least one bronchial epithelial site from 44% of lung cancer patients examined.92 Moreover, p16 and death-associated protein kinase (DAPK) promoter methylation was observed frequently in bronchial epithelium from current and former smoker but not from never-smoker lung cancer patients and persisted after smoking cessation. Notably, 94% of lung tumours exhibited a concordant pattern of p16 methylation with that in at least one bronchial epithelial site.92

The aforementioned molecular abnormalities were detected in histologically normal epithelia adjacent to archival surgically resected tumours from primary lung cancer patients. LOH and microsatellite alterations in multiple foci were also detected in distal histologically normal bronchial epithelia of smokers without cancer.93,94 Moreover, and importantly, these molecular abnormalities were detected in bronchial epithelia of cancer-free former smokers and appeared to have persisted for many years after smoking cessation. In addition, LOH was detected in DNA obtained from bronchial brushings of normal and abnormal lungs from patients undergoing diagnostic bronchoscopy and was detected in cells from the ipsilateral and contralateral lung.95 Mutations in TP53 were also described to occur in bronchial epithelia of cancer-free smokers in a widely dispersed manner.96 Similar evidence also exists for promoter methylation and epigenetic changes in smoking-damaged lung epithelium of cancer-free patients. Methylation of various genes, including retinoic acid receptor beta 2 (RAR-β2), H-cadherin, APC, p16 and RASSF1A, was described in bronchial epithelial cells of heavy smokers.97 Moreover, methylation of p16, GSTP1 and DAPK was reported to be evident in bronchial brushings of one third of the cancer-free smokers examined.98 In the same study by Belinsky et al. mentioned before, methylation of p16 was detected in epithelia of cancer-free smokers.92 A more detailed list of aberrant gene promoter methylation in lung cancer patients and cancer-free smokers is summarized and explained in the review by Heller et al.99

Gene expression profiling of the lung field cancerization

High-throughput microarray profiling was used by several groups to study the transcriptome of lung airways. Hackett et al. utilized microarrays to study the expression of 44 antioxidant-related genes using bronchial brushings from cancer-free current smokers and never-smokers, and found significant upregulation of 16 of the antioxidant genes in the airways of smokers compared with non-smokers.100 Later, Spira et al. described global alterations in gene expression between normal-appearing bronchial epithelium of healthy cancer-free smokers and that of non-smokers.101 In addition, the reports by Spira et al. and Beane et al. described irreversible changes in expression in airways of former smokers after years of smoking cessation, which were thought to underlie the increased risk former smokers display, compared with never-smokers, for developing lung cancer long after they have discontinued smoking.101,102 Schembri et al. also reported alterations in the expression of miRNA between large airways of current and never-smokers.103 Notably, an 80-gene signature was derived from the transcriptome of large airway epithelial cells that can distinguish smokers without overt cancer from smokers with lung cancer and exhibited statistically significant utility characteristics of a lung cancer biomarker, despite originating from normal bronchial epithelia.104 Moreover, the 80-gene signature, using publicly available microarray datasets, was able to distinguish lung tumours from corresponding normal lung tissues.104 More recently, Gustafson et al. derived a phosphatidylinositol 3-kinase (PI3K) pathway activation signature using recombinant adenoviruses to express the p110α subunit of PI3K in primary human epithelial cells.105 The same study then demonstrated that the PI3K pathway activation signature was elevated in cytologically normal bronchial airways of smokers with lung cancer and with dysplastic lesions.105 Of substantial clinical importance, the study found that the signature was decreased in the airways of high-risk smokers whose dysplastic lesions regressed following treatment with the PI3K inhibitor myoinositol.105

Microarray and gene expression profiling methodologies were also used to demonstrate the wide anatomical spread of the lung field cancerization to epithelial regions that can be non-invasively sampled when devising approaches for early detection of lung cancer. Sridhar et al. highlighted common gene expression alterations in bronchial, nasal and buccal epithelia of smokers, in particular of various detoxification genes that perpetuate the field of cancerization due to tobacco consumption.106 In addition, Zhang et al. identified 119 genes whose expression was affected by smoking similarly in both bronchial and nasal epithelium, including genes related to detoxification, oxidative stress and wound healing,107 and the study by Boyle et al. highlighted significant similarities in expression changes between smokers and never-smokers in oral and bronchial epithelia.108

Lung adenocarcinoma field cancerization

To better understand the pathogenesis of EGFR mutant lung adenocarcinomas, Tang and colleagues investigated the presence of EGFR mutations in normal bronchial and bronchiolar epithelium adjacent to EGFR mutant tumours. As mentioned before, EGFR mutations were detected in histologically normal peripheral epithelia in 43% of lung adenocarcinoma patients with mutations but in none of the patients lacking mutations in the oncogene.77 Moreover, the same study found that EGFR mutations were more frequent in normal epithelium within the tumour (43%) than in adjacent sites (24%), suggesting a localized field-effect phenomenon for this abnormality in the respiratory epithelium of the lung.77 In addition, a higher frequency of mutations was detected in cells obtained from small bronchi (35%) compared with bronchioles (18%).77 More recently, EGFR protein overexpression, similar to mutation of the gene, was also shown to exhibit a localized field effect, as it was more frequent in normal bronchial epithelial sites within tumours than in sites adjacent to and distant from tumours.78 Interestingly, EGFR copy-number alteration was not evident in normal bronchial epithelia, which is in accordance with findings that EGFR copy-number gain is a relatively late event in the pathogenesis of adenocarcinomas.7,78

Field cancerization compartmentalization

The low frequency of molecular abnormalities detected in the centrally located bronchial respiratory epithelium in patients with peripheral lung adenocarcinomas, compared with specimens from patients with SCC and SCLC,89 suggests the presence of two compartments in the lung with different degrees of smoking-related genetic damage. Thus, smokers who develop SCC have more smoking-related genetic damage in the respiratory epithelium of the central airway, whereas patients who develop adenocarcinoma have damage mainly in the peripheral airways (small bronchus, bronchioles and alveoli). While some molecular changes (e.g. inflammation and signalling pathways activation) have been detected throughout the lung airway and include both compartments (central and peripheral airway), other aberrations have been more frequently altered in either central (e.g. LOH, genetic instability evidenced by microsatellite repeats) or peripheral (e.g. EGFR mutations) airways.

Lineage-specific genes in lung cancer

The transformation of normal cells into tumourigenic counterparts is mediated by a complex array of intracellular signals, as well as genetic and epigenetic regulation. It has been suggested that lineage-specific genes, which play important roles in normal developmental processes such as organogenesis or tissue homeostasis and continue to be expressed or become amplified during an acquired pathological condition, are crucial for maintenance of the disease state.32,109 Interestingly, lineage genes can discriminate different subtypes of the same cancer that arise from dissimilar cells/progenitors, for example, adenocarcinomas versus squamous tumours, and might offer new insights into crucial and therapeutically pliable tumour dependencies.109 Various studies have highlighted the potential ‘addiction’ of tumour cells to aberrant and growth-promoting cell signalling mediated by lineage-specific oncogenes, for example, presence of the BCR-ABL fusion oncoprotein in chronic myelogenous leukemia,110 mutations in the KIT oncogene in gastrointestinal stromal tumours,111 amplification of the microphthalmia-associated transcriptional factor (MITF) in melanoma112 and, more recently, amplification of PAX8 in ovarian cancer.113 Two lineage-specific oncogenes have been characterized in NSCLC. Recently, TITF-1 amplification and protein expression were shown to be prevalent in lung adenocarcinomas and to elicit growth-promoting signals in this malignancy.34–36 The master ESC transcriptional factor SOX2 was shown to be a member of the 3q locus (3q26.3) that is specifically amplified in lung and oesophageal squamous carcinomas.32 These findings demonstrate that TITF-1 and SOX2 function as lineage-specific oncogenes in lung adenocarcinomas and SCC, respectively, and that targeting pathways downstream of those two master regulators may leverage new therapeutic strategies independently for each NSCLC subtype.


TITF-1 is a homeodomain-containing transactivating factor predominantly expressed in the terminal lung bronchioles and lung periphery in the developing and adult mouse.114,115 In addition, TITF-1 is crucial for branching morphogenesis during normal lung development114–116 and transactivates the expression of the SP, such as SP-A, -B and -C, which are in turn typically expressed in the Clara cells and are important for the differentiation of alveolar type II pneumocyte cells in the peripheral lung.117

Several studies have demonstrated increased copy number and amplification of the 14q13.3 locus that harbours the TITF-1 gene as well as paired box transcriptional factor family member 9 (PAX9) and NKX2.8.34,36 It is postulated that TITF-1 functions as a lineage-specific oncogene in lung adenocarcinoma as knockdown of TITF-1 expression, in cells with amplification of the gene, by RNA interference results in lung adenocarcinoma cell-growth inhibition and apoptosis demonstrating a lineage-specific dependency of lung adenocarcinomas on TITF-1.34–36 Kendall et al. demonstrated that co-amplified TITF-1, PAX9 and NKX2.8 exhibit oncogenic cooperation and cell prosurvival and proliferative properties.118 Overexpression of both TITF-1 and NKX2.8 simultaneously in BEAS-2B immortalized human bronchial epithelial cells elicited the highest increase in cell colony growth compared with single-gene transfected cells.118 Moreover, pathway gene signatures that overlap downstream of both TITF-1 and NKX2.8 defined lung adenocarcinoma patients with most dismal prognosis compared with signatures downstream of either transcriptional factor alone.119 However, recently in KRAS(LSL-G12D/+);p53(flox/flox) mice, TITF-1 was shown to suppress tumourigenesis and limit metastatic potential in vivo.120

Our group and others have demonstrated that TITF-1 copy-number gain or amplification is associated with poor prognosis in NSCLC.121,122 In contrast to the expected pro-survival properties of a cell-lineage oncogene and the association of TITF-1 copy-number gain and amplification with poor survival, TITF-1 protein expression by immunohistochemistry was shown to be a marker of favourable prognosis in NSCLC122–125 including early stage (stage I) lung adenocarcinoma.126 It is worthwhile to mention that TITF-1 protein expression and TITF-1 gene copy number were found to be associated with mutations in the KRAS and EGFR oncogenes, respectively.122 As mutations in EGFR and KRAS occur almost mutually exclusively in lung adenocarcinomas2 and were suggested to function in different lineages of lung adenocarcinomas,109 it is possible that TITF-1 expression is controlled differently within different subsets of adenocarcinomas. It is also important to note that TITF-1 copy-number gain was also demonstrated in lung SCC.34,122,125 It is plausible that TITF-1 copy-number gain in SCC may only be a surrogate marker of another molecular defect in a gene nearby or within the 14q13.3 amplicon, for example, NKX2.8 or PAX9. The significance of the infrequent copy-number increase of TITF-1 in lung SCC remains elusive.


SOX2 was suggested to play key developmental roles in the formation of the lung, trachea and oesophagus based on its expression pattern in these tissues and organs.127 Interestingly, SOX2 was shown to be important for the morphogenesis of the trachea and oesophagus, and the differentiation of the oesophageal epithelium.128 Moreover, the timing of SOX2 expression in the foregut is tightly regulated, as it is only expressed in the main airways and non-branching bronchioles in the developing and adult mouse lung.127,129,130 Heterozygote and homozygote transgenic mice with mutant SOX2 have substantial defects in lung branching and morphogenesis during development.129,130 Moreover, SOX2 plays key roles in the maintenance of developing and adult tracheal cells evidenced by shorter and injured trachea in mice with knockout of both alleles of the transcriptional factor.130 The numerous functions SOX2 elicits in the differentiation of the conducting airways among other roles are reviewed in more detail by Whitsett and colleagues.131 It is important to note that SOX2 forms a core transcriptional factor complex with OCT4 or OCT1 and TirNaNog/NANOG that binds to enhancer sequences of various genes to regulate the inner cell mass or embryoblast within the blastocyst cavity in embryos.132 Moreover, Boyer and colleagues demonstrated that SOX2 along with OCT4 and NANOG form a core regulatory transcriptional circuitry, signified by a SOX2/OCT4/NANOG expression signature, consisting of autoregulatory and feedforward loops for the pluripotency and self-renewal of ESC.133

As mentioned earlier in the review, various studies have demonstrated that amplification of chromosomal region 3q (3q26.3) is almost specific to lung SCC.61,63,64,66,67 The studies by Bass et al. and Hussenet and colleagues revealed that SOX2 is amplified in this chromosomal region in lung SCC and squamous oesophageal cancers and promotes survival of SCC with amplification of this gene.32,134 Subsequently, increased SOX2 mRNA levels in lung SCC relative to adenocarcinomas were further evidenced by effective separation of both NSCLC subtypes by the previously characterized OCT4/SOX2/NANOG ESC expression signature,133 following analysis of publicly available NSCLC microarray datasets.33 In addition, SOX2 immunohistochemical protein expression was completely absent in lung adenocarcinoma pathogenesis, highly expressed in SCC development and significantly elevated in lung SCC relative to adenocarcinomas.33 Interestingly, Maier et al. later demonstrated that SOX2 amplification was found in squamous carcinomas originating from other tissues and organs, such as those of the cervix, skin and penis.135 It is noteworthy that SOX2 immunohistochemical protein expression in lung SCC and adenocarcinomas was also examined by other groups in association with clinicopathological features including patient outcome. Interestingly, Wilbertz et al. reported the association of SOX2 expression with favourable prognosis in lung SCC.136 On the other hand, Sholl and colleagues demonstrated that SOX2 immunohistochemical expression was an indicator of poor prognosis in lung adenocarcinomas.137 Despite the equivocal associations of SOX2 with lung cancer prognosis, various studies have highlighted tumour-promoting roles for this lineage-specific oncogene in lung cancer.32,138,139

McCaughan and colleagues specifically analysed 3q copy-number alteration in bronchial dysplasia of varying grades and severity and demonstrated that SOX2 amplification was present in high-grade bronchial dysplasias but not in low-grade lesions and, importantly, was associated with clinical progression of high-grade preinvasive squamous lesions.140 It is important to mention that Yuan et al. had found relatively high SOX2 immunohistochemical protein expression in normal bronchial epithelia and alveolar bronchiolarization structures.33 Congruent with the study by Yuan et al., the results by McCaughan and colleagues demonstrated the implication of SOX2 in the early pathogenesis of lung SCC.33,140 Given the high SOX2 protein expression in histologically normal bronchial epithelia, amplification of SOX2 in high-grade dysplasia may exacerbate signalling downstream of this transcriptional factor in the course of SCC development. It is unknown whether SOX2 may be amplified in normal bronchial epithelia, in particular, those adjacent to lung SCC with increased dosage of the gene. The findings outlined earlier demonstrate that SOX2 is another cell-lineage oncogene with dissimilar functions between SCC and lung adenocarcinomas.


Lung adenocarcinoma genomics

Studies addressing genomic profiles, including copy-number alterations and mutational spectrums, have substantially increased our understanding of the molecular make-up and biology of lung adenocarcinomas, demonstrating that, genomically, this subtype of NSCLC is different from SCC. However, the heterogeneity within lung adenocarcinomas is still poorly understood. For example, it is unknown whether genomic copy-number alterations found in never-smoker adenocarcinomas are unique to this subtype or whether they are also found in smoker tumours. A large-scale side-by-side genomic analysis of never-smoker and smoker lung adenocarcinomas would shed light on copy-number alterations unique to each subtype of lung adenocarcinoma. Moreover, it is not clear whether certain copy-number alterations can be clinically exploited for targeted therapy of lung adenocarcinoma. An important step in this direction was the demonstration by Yuan et al. that lung adenocarcinomas with mutant EGFR and amplification of specific genes within the 7p region predict poor response to EGFR targeting tyrosine-kinase inhibitors.70 It is tempting to speculate that an orthogonal study, encompassing both copy-number alterations and mutational spectrum and detecting focal amplification of oncogenes and loss of tumour-suppressor genes, would highlight potential targets of therapy in EGFR, KRAS and ALK wild-type lung adenocarcinomas, for which there is an unmet need for therapeutic strategies.

Next-generation sequencing

Next-generation sequencing (NGS) technology, through whole-genome, whole-exome and whole-transcriptome approaches, holds great promise for providing invaluable insights into lung adenocarcinoma biology, diagnosis, prevention and therapy.141 NGS enables the sequencing of expressed genes, exons and complete genomes, providing data on levels of expression (with a substantially larger dynamic range compared with array technology), sequence alterations, single nucleotide variations and structural genomic aberrations.141 A handful of studies have successfully applied NGS approaches to sequence one or two human lung tumour samples or cell lines, demonstrating the feasibility of systematic, genome-wide characterization of rearrangements and alterations in complex human cancer genomes.141–144 NGS analysis of a significant number of lung adenocarcinomas and/or NSCLC with characterized mutational status of known oncogenes (e.g. EGFR and KRAS) undoubtedly represents an important next step in furthering our comprehension of lung cancer biology. However, the application of NGS technology in clinical decision-making and personalized medicine remains challenging.

Field cancerization and lung adenocarcinoma pathogenesis

Applying the same advanced high-throughput methodologies currently used in studying established tumours for the genetic analysis of lung adenocarcinoma preneoplasia and intraepithelial lesions, as well as histologically normal adjacent regions, is expected to expand our understanding of the biology of this prevalent disease. An important step in this direction was a recent study by Beane et al. in which RNA of bronchial airway epithelial cell brushings from healthy never-smokers and smokers with and without lung cancer was analysed by RNA sequencing.145 The study highlighted transcripts whose expression was either not interrogated by or was not found to be significantly altered when using microarrays demonstrating that NGS, like in established lung tumours, has the potential to provide new insights into the biology of the airway field cancerization associated with smoking and lung cancer.145

Earlier findings demonstrated that centrally located lung SCC and peripherally located lung adenocarcinomas elicit and perpetuate differential effects on the airway epithelia. We believe that these effects overlap with those of the host response to tobacco exposure (reviewed by Steiling et al.88) but may be unique in several aspects. Changes in expression in the lung field of injury have been shown to be similar in the large and small airways, and it is unknown whether they are associated with the development of a particular subtype of NSCLC. Addressing this question may be highly pertinent because the two NSCLC subtypes display different genomic features, as previously discussed, and are therefore clinically managed with substantially different treatment strategies, let alone the differences among the various subtypes of lung adenocarcinomas. Moreover, a compartmental approach to studying the field cancerization (Fig. 1) will shed light on events in the early pathogenesis of lung adenocarcinomas versus SCC and unravel biomarkers that can be lineage specific and can guide personalized chemoprevention strategies suited to each NSCLC subtype, which may reduce the relatively high frequency of relapse in early-stage patients.

Figure 1.

Molecular analysis of the lung field cancerization. It is unknown whether changes in expression in the lung field cancerization are associated with the development of a particular subtype of non-small cell lung cancer (NSCLC), that is, adenocarcinomas compared with squamous-cell carcinomas (SCC). Analysing local and distant field cancerization independently for lung adenocarcinomas and SCC may shed light on events common or unique to the molecular pathogenesis of the two major subtypes of NSCLC. Such a ‘compartmental’ approach in studying the field cancerization may unravel biomarkers that can guide personalized prevention strategies suitable for each different NSCLC subtype.


Despite numerous efforts to increase our understanding of the biology of lung adenocarcinomas, this subtype of NSCLC, which is increasing in incidence compared with SCC, accounts for approximately half of lung cancer deaths each year, which in turn comprise the largest share of cancer-related deaths in the United States and worldwide. Compared with advances in targeted and personalized therapy of lung adenocarcinomas, little progress has been made in the tailored prevention of this fatal malignancy, which has substantially dampened enthusiasm for prevention efforts. This may change with the recent encouraging and significant findings of the NLST. Various molecular markers and expression classifiers previously described in the lung airways and in less invasive sites of the field cancerization, such as nasal samples, sputum and exhaled breath condensates, can aid in selecting high-risk individuals best suited for interventions such as CT screening. A comprehensive analysis of early molecular events in lung adenocarcinoma pathogenesis will undoubtedly unravel biomarkers that can, in the future, aid prevention through personalized strategies and help it deliver on its longstanding promise to oppose this disease.


Funded in part by a Lung Cancer Research Foundation grant (HK) and DoD W81XWH-10-1-1007 (IIW).