Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri
Corresponding author: Cory T. Bernadt, MD, PhD, Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 S. Euclid Ave, St. Louis, MO 63110; Fax: (314) 747-2663; email@example.com
We thank Haley Abel, PhD, and David Spencer, MD, PhD, for their assistance with data analysis. We also thank Eric Tycksen and the Washington University Genomics and Pathology Services (WU-GPS) for their assistance with sample preparation and the Genome Technology Access Center in the Department of Genetics at Washington University School of Medicine for help with DNA sequencing.
Molecular testing of cancer is increasingly critical to medicine. Next-generation sequencing (NGS) provides comprehensive, unbiased, and inexpensive mutation analysis of multiple genes with a single test. However, to the authors' knowledge, the usefulness of NGS in fine-needle aspiration (FNA) specimens, which may be the only specimens available, is unknown. Non-small cell lung cancer (NSCLC) is an ideal model in which to evaluate cytopathologic applications of NGS because FNA is used for diagnosis and staging and specific molecular therapeutic targets in NSCLC are known. Herein, the performance and quality of targeted NGS in FNA specimens from a small series of lung adenocarcinomas is evaluated.
Sequence data were generated from FNA specimens and paired formalin-fixed paraffin-embedded (FFPE) tissues from 5 patients with lung adenocarcinoma. DNA was isolated from FNA aspirate smears and cores of FFPE tissue. Multiplex sequencing of 27 cancer-related genes was performed after hybrid capture enrichment. Read-quality metrics and single-nucleotide variant calls were compared.
The overall concordance of total reads across specimens was > 99% and the average concordance of single-nucleotide variants was 99.5%. The total reads generated, as well as the percentages of mapped, on-target, and unique reads were statistically indistinguishable (P > 0.05) between FFPE and FNA preparations. There also was no difference in the depth of sequencing coverage, including exon-level coverage in known lung cancer mutation hotspots.
Non-small cell lung cancer (NSCLC) accounts for approximately 85% of all lung cancers, is frequently diagnosed in the advanced clinical stages, and is associated with a poor prognosis. An increased understanding of the molecular pathogenesis of NSCLC has produced promising treatment options using targeted therapy.[2, 3]
Clinical studies with the tyrosine kinase inhibitors gefitinib[4, 5] and erlotinib[6, 7] have produced significant improvements in tumor response and survival in patients who have NSCLC with mutated epidermal growth factor receptor (EGFR). Similarly, clinical trials using the tyrosine kinase inhibitor crizotinib have shown improved response rates and progression-free survival in patients with advanced NSCLC who harbor anaplastic lymphoma kinase (ALK) rearrangements.[8, 9] Several additional biomarkers with potential prognostic significance in NSCLC, including ROS1,[10, 11] MET, ERBB2, and RET, have also been described. The importance of identifying specific molecular aberrations in lung cancer underscores the shift toward personalized medicine to optimize therapeutic regimens.
Next-generation sequencing (NGS), also referred to as massively parallel sequencing, has recently emerged as a cost-effective method for identifying clinically actionable genetic mutations across many genes (ie, in parallel). NGS has been used effectively to detect somatic mutations in hematopoietic and solid tumors as well as constitutional mutations in genes associated with inherited cancer predisposition syndromes.[15-19] In addition, studies have shown that NGS is capable of detecting the full range of mutation types, including substitutions, insertions/deletions, translocations, and copy number changes.[18, 20-25] Coupled with targeted DNA enrichment methods, NGS provides comprehensive, unbiased, and inexpensive mutation analysis of multiple target genes with a single test.
Although we have previously demonstrated that formalin-fixed paraffin-embedded (FFPE) and fresh tissue are equivalent substrates for NGS, to the best of our knowledge the usefulness of NGS in fine-needle aspiration (FNA) smears is unknown because of differences in cellularity and fixation procedures specific to these sample types. To determine whether DNA harvested directly from the aspirate smear slides of FNAs is suitable for NGS, we conducted an in-depth comparison of NGS sequence data obtained from FNA smears and FFPE tissue. Targeted, solution-phase capture enrichment and Illumina paired-end sequencing (Illumina Inc, San Diego, Calif) were used to sequence 27 genes from DNA extracted from FNA smears (percutaneous computed tomography [CT]-guided or endobronchial ultrasound-guided) and paired FFPE tissue resections from patients with primary lung adenocarcinoma. For the FNA smears, both Wright-Giemsa/Diff-Quik (DQ)–stained smears and Papanicolaou (Pap)-stained smears were sequenced to explore the effects these processing variables may have on NGS data. For each set of sequence data, the current analysis focused on all aspects of the analytical process, including quality measures of the DNA prepared from the specimens, raw sequence outputs, mapping results, coverage depths, and single-nucleotide variant (SNV) calls.
MATERIALS AND METHODS
The Human Subjects Committee at Washington University School of Medicine approved this study, and informed consent of subjects was waived.
Five cases of primary lung adenocarcinoma diagnosed by either CT-guided or endobronchial ultrasound-guided FNA with subsequent surgical resection were identified. Cases were selected from archived specimens submitted to the Lauren V. Ackerman Laboratory of Surgical Pathology for routine processing and diagnosis. FNA smears were either air-dried and methanol-fixed or immediately fixed in ethanol. Air-dried smears were stained by a modified Wright-Giemsa/DQ method using Diff-Quik Solutions I and II (Siemens Healthcare Diagnostics Inc, Deerfield, Ill) as follows: methanol (1 minute), DQ Solution I (1 minute), and DQ Solution II (2 minutes). Alcohol-fixed smears were stained by hand using a progressive Pap method as follows: 0.1% acid alcohol (1 minute); water (10 dips); hematoxylin (30 seconds); water (10 dips); water (10 dips); water (bluing step: 30 seconds); Orange G-6 counterstain (20 seconds); 95% ethanol (10 dips twice); eosin azure (EA)-50 (7 minutes), 100% ethanol (15 dips, 3 times); 100% ethanol (3 minutes); and xylene (10 dips, 3 times) followed by coverslipping. One DQ-stained smear and 1 Pap-stained smear were selected from each case. Cases were randomly selected from those that contained abundant viable tumor cells with little to no contaminating benign cells (see Supplementary Figs. 1 and 2, available online).
Slide coverslips were detached in xylene, and the slides were rehydrated by successive ethanol washes (95%, 70%, 50%, and 30%) followed by soaking in phosphate-buffered saline for 2 minutes. The slides were allowed to air-dry. Using a new, flat, single-edged razor blade, the entire contents of the slide were scraped into 200 μL of phosphate-buffered saline, and were kept frozen at −80°C until DNA extraction could be performed. DNA extraction was performed using the QIAamp DNA Blood Mini Kit (Qiagen, Germantown, Md) according to the manufacturer's protocol.
For FFPE tissue specimens, DNA was extracted from 3 cores of the paraffin block measuring 1 mm each. The samples were first deparaffinized by incubation with CitriSolv (Thermo Fisher Scientific, Waltham, Mass) followed by an ethanol wash. DNA extraction was then performed using the DNeasy Blood and Tissue kit (Qiagen) according to the manufacturer's protocol.
DNA quality and quantity assessment was performed for each sample using a Qubit Fluorometer (Life Technologies, Grand Island, NY). Genomic DNA fragmentation/degradation was assessed by electrophoresis using a 0.8% agarose gel.
All samples were capture-enriched using the WU-CaMP27 panel (Washington University Genomics and Pathology Services, St. Louis, Mo) that includes 27 genes commonly mutated in cancer (Table 1). Up to 1 μg of extracted DNA was fragmented to 200 base pair (bp) to 250 bp using a Covaris E210 instrument (Covaris Inc, Woburn, Mass). For specimens that yielded < 1 μg of DNA, the entire yield of extracted DNA was used (range, 170 nanograms [ng]-1470 ng for FNA specimens). Fragmentation was verified on an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, Calif), and the fragmented DNA was purified with Agencourt AmpureXP beads (Beckman Coulter Inc, Danvers, Mass), end-repaired, polyadenylated with Klenow DNA polymerase, and ligated to universal Illumina sequencing adapters. Library fragments were then bead purified and analyzed for adequate ligation on an Agilent 2100 Bioanalyzer. Limited-cycle polymerase chain reaction (PCR) with sample-specific, index-tagged primers was then performed to enrich for ligation products with the appropriate configuration (ie, ligation of 1 of each of the adapters on either end). The number of PCR cycles was based on the DNA extraction yield (100 ng-249 ng: 10 cycles; 250 ng-749 ng: 9 cycles; and 750 ng-1000 ng: 7 cycles). Whole-genome libraries were enriched for exons plus 200 bp of flanking intronic sequence and 1 kilobase pair flanking the first and last exon of the 27 genes targeted by the WU-CaMP27 gene panel using a custom Agilent SureSelect biotinylated cRNA probe set. SureSelect reagents were prepared according to the manufacturer's instructions, and 500 ng of each indexed library was hybridized at 65°C for 24 hours. Captured library fragments were washed and purified from unbound material using MyOne T1 streptavidin beads (Life Technologies Corporation) and then resuspended and purified by bead purification before a final limited-cycle PCR amplification.
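The yield-dependent cycle numbers described above can be written as a small lookup. The following is a minimal sketch, not the laboratory's actual procedure: the function name is hypothetical, the tiers are treated as continuous ranges, and yields above 1000 ng are assumed to be capped at the 1 μg maximum input stated in the text.

```python
def pcr_cycles_for_yield(yield_ng: float) -> int:
    """Return the limited-cycle PCR cycle number for a DNA yield in nanograms.

    Tiers follow the text: 100-249 ng -> 10, 250-749 ng -> 9, 750-1000 ng -> 7.
    Assumption: yields above 1000 ng are capped, because at most 1 ug was used.
    """
    input_ng = min(yield_ng, 1000.0)  # assumed cap at the 1 ug maximum input
    if input_ng < 100:
        # No cycle number is given in the text for inputs below 100 ng.
        raise ValueError(f"{yield_ng} ng is below the 100 ng minimum")
    if input_ng < 250:
        return 10
    if input_ng < 750:
        return 9
    return 7


if __name__ == "__main__":
    for ng in (170, 500, 1470):
        print(ng, "ng ->", pcr_cycles_for_yield(ng), "cycles")
```

Lower-yield samples receive more cycles so that each library reaches a comparable mass before hybridization.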
Verification of library size and quantity was performed by electrophoresis using an Agilent Bioanalyzer. Enriched libraries from all samples in this experiment were pooled and sequenced in multiplex on a single lane of an Illumina HiSeq 2000 instrument using version 3 chemistry following established protocols for paired-end 101-bp reads.
Table 1. WU-CaMP27 Gene List (columns: Total Coverage Area, bp; Exonic Coverage Area, bp)
Abbreviations: bp, base pair; WU-CaMP27, Washington University Genomics and Pathology Services panel containing 27 genes that are commonly mutated in cancer.
Base calls and quality scores were produced by the included Illumina analytical software (Casava version 1.7). The resulting FASTQ files were aligned to National Center for Biotechnology Information (NCBI) build 37.2 of the human reference genome (hg19) using Novoalign (Novocraft, Selangor, Malaysia) with default paired-end parameters. Mapped reads were marked for duplicates (paired reads with the same start positions for each read) with Picard tools (picard.sourceforge.net) before subsequent analysis. Quality metrics including gene coverage were calculated using BEDTools and SAMtools.[27, 28] Variant identification was performed using the UnifiedGenotyper in the Genome Analysis Toolkit (GATK; version 2.1, Broad Institute) and reported as variant calling format files with parameters described in the “Best Practices Methods” section of the GATK package, including removal of duplicate reads, quality score recalibration, and indel realignment. Discordant SNV calls were confirmed manually using the Broad Institute's freely available Integrative Genomics Viewer software (broadinstitute.org/igv/). All plots and statistical analyses were performed using the freely available R statistics package (R, version 2.15.1; R Project for Statistical Computing, Vienna, Austria; r-project.org/) and Microsoft Excel (Microsoft Corporation, Redmond, Wash).
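The duplicate definition used here (paired reads with the same start positions for each read) can be illustrated with a short sketch. This is a simplified illustration and not the Picard implementation: the `ReadPair` type and `mark_duplicates` function are hypothetical names, strand orientation is ignored, and the first pair seen at a position is kept, whereas a production tool would typically retain the pair with the best base qualities.

```python
from typing import List, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical minimal record for a mapped read pair."""
    name: str
    chrom: str
    start1: int  # mapped start of the forward read
    start2: int  # mapped start of the reverse read


def mark_duplicates(pairs: List[ReadPair]) -> List[bool]:
    """Flag a pair as duplicate if an earlier pair has the same two starts."""
    seen = set()
    flags = []
    for pair in pairs:
        key = (pair.chrom, pair.start1, pair.start2)
        flags.append(key in seen)  # True only for the second and later copies
        seen.add(key)
    return flags


if __name__ == "__main__":
    example = [
        ReadPair("r1", "chr7", 1000, 1150),
        ReadPair("r2", "chr7", 1000, 1150),  # same starts as r1
        ReadPair("r3", "chr7", 1000, 1160),  # different reverse-read start
    ]
    print(mark_duplicates(example))
```

Pairs that are not flagged correspond to the unique reads reported in the quality metrics.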
Five cases of primary adenocarcinoma of the lung for which FNA and tissue resection specimens were available were included in the analyses. All slides were reviewed by a board-certified anatomic pathologist (C.T.B.), and FNA smears with abundant tumor cellularity and little to no contaminating benign cells were preferentially included. DNA was extracted from DQ-stained FNA smears, Pap-stained FNA smears, and FFPE tissue resections. Because DQ staining is used routinely for rapid on-site assessment, our analyses focused predominantly on DQ-stained smears. All samples were capture-enriched using a panel of 27 genes commonly mutated in cancer, including the EGFR and KRAS genes (Table 1).
Assessment of DNA Quality and Insert Size
Table 2 shows the amount of DNA extracted from each specimen. As expected, the amount of DNA extracted from FFPE tissues (3 cores measuring 1 mm each) was higher than the amount of DNA extracted from the single FNA smear slides. In general, the DQ-stained smear slides appeared to yield more DNA than the Pap-stained FNA smear slides; however, there was no statistically significant difference noted (P = .22, Student t test). One of the Pap-stained smears (case 2) failed to produce a measurable amount of DNA and was excluded from further analyses. DNA extracted from the Pap-stained smears demonstrated lower average molecular weight than DNA extracted from the DQ-stained smears (data not shown).
Table 2. DNA (in μg) Extracted From Specimen Preparations by Case (column: DNA [Qubit Yield], μg)
Abbreviations: DQ, Diff-Quik; FFPE, formalin-fixed paraffin-embedded; NA, not available; Pap, Papanicolaou; SEM, standard error of the mean.
Because fixation, especially with formalin, is known to chemically modify and cross-link DNA, the distribution of library insert sizes was compared between DQ-stained FNA smears and FFPE tissue. Insert sizes for all indexed sequencing libraries were determined by measuring the distances between properly mapped forward and reverse read pairs. Although the distributions of library insert sizes appeared similar (Fig. 1A), closer examination of the mean insert sizes revealed that DQ-stained FNA smears generated significantly shorter sequencing library inserts than FFPE tissues (186 bp vs 206 bp, respectively; P = .0151, Student t test) (Fig. 1B). In addition, insert sizes for Pap-stained FNA smears were shorter than those for both DQ-stained smears and FFPE tissues (data not shown).
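As a minimal sketch of how an insert size can be derived from a properly mapped read pair, the distance can be taken from the leftmost base of the forward read to the rightmost base of the reverse read. The function names and the 0-based, half-open coordinate convention are assumptions made for illustration; the coordinates in the usage lines are invented and are not study data.

```python
from statistics import mean
from typing import Iterable, Tuple


def insert_size(forward_start: int, reverse_end: int) -> int:
    """Insert size for one properly mapped pair (0-based, half-open coordinates)."""
    if reverse_end <= forward_start:
        raise ValueError("reverse read must end downstream of the forward start")
    return reverse_end - forward_start


def mean_insert_size(pairs: Iterable[Tuple[int, int]]) -> float:
    """Mean insert size over (forward_start, reverse_end) coordinate pairs."""
    return mean(insert_size(start, end) for start, end in pairs)


if __name__ == "__main__":
    # Invented coordinates for illustration only.
    print(insert_size(1000, 1186))
    print(mean_insert_size([(0, 180), (100, 292)]))
```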
Comparison of Sequence Data Quality Across Specimen Types
To determine whether DNA harvested from FNA smear slides produces genetic sequence information comparable to FFPE tissue, specific quality metrics, including raw sequence outputs, mapping results, and coverage depths, were compared between DQ-stained smears and FFPE tissues for each case. The number of total reads (raw sequence outputs) generated from multiplex sequencing of DQ-stained smears was similar to that generated from the corresponding FFPE tissue (Fig. 2A) (Table 3), and the 2 groups were statistically indistinguishable (P = .672, Student t test) (Fig. 2B). Likewise, the percentage of sequence reads that mapped to the human reference genome (mapped reads) was statistically indistinguishable between DQ-stained FNA smears (98.52%) and FFPE tissues (99.04%) (P = .721) (Fig. 3) (Table 3). Of the uniquely human sequences, an average of 31.31% of reads from DQ-stained smears and 31.13% from FFPE tissues mapped to the targeted panel of 27 cancer-related genes (on-target reads) (Fig. 3); these percentages were statistically indistinguishable (P = .55). Finally, the percentage of unique reads, defined as read pairs with unique start coordinates, was statistically indistinguishable (P = .993) between DQ-stained FNA smears (mean, 7.14%) and FFPE tissues (mean, 7.28%). The number of total reads, mapped reads, and on-target reads was similar between DQ-stained smears and Pap-stained smears (data not shown).
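The read-level percentages compared above can be computed from 4 counts per library. The sketch below is illustrative only: the `ReadCounts` type is hypothetical, and the denominators (mapped reads over total reads, on-target reads over mapped reads, unique reads over total reads) are assumptions, because the text does not state them explicitly. The counts in the usage lines are invented.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ReadCounts:
    """Hypothetical per-library read counts."""
    total: int      # raw reads produced by the sequencer
    mapped: int     # reads aligned to the human reference genome
    on_target: int  # mapped reads that fall within the capture region
    unique: int     # reads remaining after duplicate marking


def read_metrics(counts: ReadCounts) -> Dict[str, float]:
    """Return mapped, on-target, and unique percentages (assumed denominators)."""
    if counts.total <= 0 or counts.mapped <= 0:
        raise ValueError("total and mapped read counts must be positive")
    return {
        "mapped_pct": 100 * counts.mapped / counts.total,
        "on_target_pct": 100 * counts.on_target / counts.mapped,
        "unique_pct": 100 * counts.unique / counts.total,
    }


if __name__ == "__main__":
    # Invented counts for illustration only.
    print(read_metrics(ReadCounts(total=1000, mapped=990, on_target=297, unique=70)))
```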
Table 3. Raw Sequence Outputs for Total Reads and Percentages of Mapped Reads, On-Target Reads, and Unique Reads Across Specimen Types by Case
To determine whether these subtle differences in sequence outputs and read metrics between specimen types influenced the ability to obtain adequate sequence information for specific targeted genes of interest, the depths of sequencing coverage for all 27 genes on the capture panel were examined. A Student t test for paired data demonstrated no significant difference (P = .993) in overall gene-level coverage between DNA derived from DQ-stained FNA smears (mean, 1906.4) and FFPE tissues (mean, 1908.4) (Fig. 4A). In addition, there was no difference in overall coverage when known lung cancer “hotspots” such as EGFR exons 19/20 and KRAS exon 2 were examined (Figs. 4B and 4C).
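For readers unfamiliar with the paired form of the Student t test used for gene-level coverage, the statistic is the mean of the per-gene differences divided by its standard error. The sketch below is a generic illustration with a hypothetical function name and invented numbers; the study's analyses were performed in R, where `t.test(x, y, paired = TRUE)` would also return the P value from a t distribution with n − 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for paired observations a[i] and b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Invented per-gene coverage values for illustration only.
    fna = [10.0, 12.0, 14.0]
    ffpe = [9.0, 10.0, 11.0]
    print(paired_t_statistic(fna, ffpe))
```

Pairing by gene removes the large gene-to-gene variation in capture efficiency, so the test asks only whether specimen type shifts coverage within the same gene.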
Comparison of SNVs Across Specimen Types
Pairing cytologic smears with FFPE tissues from the same patients allowed us to directly compare SNV calls across specimen types. The overall sequence concordance (reference and nonreference alleles) across specimen types was high (> 99.9%). On average, 79.6 SNVs (range, 69 SNVs-94 SNVs) were identified per case trio within the total capture region, 79.2 of which (99.5%) were common across all specimen types (Fig. 5). After manual review of the read alignments using the Broad Institute's Integrative Genomics Viewer (broadinstitute.org/igv/), only 2 discordant SNVs were identified. These occurred within case 2, in which SNVs were identified within the tissue specimen but not the DQ-stained FNA slide (see Supplemental Table 1, available online), resulting in an average of 0.5% discordant SNVs across case trios (Fig. 5). In all other cases, 100% of SNVs were concordant across specimen types. Further evaluation of these 2 discordant SNVs using the University of California at Santa Cruz (UCSC) Genome Browser demonstrated that one SNV represented a mutation in PTEN, the deletion of which has been described previously in lung cancer. The other had not been previously characterized. These 2 discrepant SNVs likely represent somatic mutations that are heterogeneously distributed throughout the tumor tissue.
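The concordance figures can be reproduced from the reported averages: 79.2 shared SNVs out of 79.6 per case is 79.2/79.6 ≈ 99.5%, leaving approximately 0.5% discordant. A minimal sketch of the per-case comparison follows, assuming each SNV is keyed by chromosome, position, reference allele, and alternate allele. The function name and the toy calls in the usage lines are hypothetical and are not study data.

```python
from typing import Dict, Iterable, Set, Tuple

Snv = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def snv_concordance(
    calls_by_specimen: Dict[str, Iterable[Snv]],
) -> Tuple[Set[Snv], Set[Snv], float]:
    """Return (shared SNVs, discordant SNVs, percent concordant) for one case."""
    call_sets = [set(calls) for calls in calls_by_specimen.values()]
    if not call_sets:
        raise ValueError("at least one specimen is required")
    union = set().union(*call_sets)          # SNVs seen in any specimen
    shared = set.intersection(*call_sets)    # SNVs seen in every specimen
    if not union:
        raise ValueError("no SNVs were called in any specimen")
    return shared, union - shared, 100 * len(shared) / len(union)


if __name__ == "__main__":
    # Toy calls for illustration only.
    a, b, c, d = ("chr7", 100, "A", "G"), ("chr12", 200, "C", "T"), \
                 ("chr10", 300, "G", "A"), ("chr17", 400, "T", "C")
    case = {"FFPE": [a, b, c, d], "DQ": [a, b, d], "Pap": [a, b, c, d]}
    shared, discordant, pct = snv_concordance(case)
    print(len(shared), sorted(discordant), pct)
```

Using the union of calls as the denominator means that an SNV missing from any single specimen type counts as discordant for the whole case.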
The advent of targeted therapeutics has ushered in the era of personalized medicine. This not only impacts the management of patients, but also alters the role of the pathologist in evaluating malignancies. The morphologic classification, although relevant, is no longer sufficient, and often needs to be supplemented with molecular information. This paradigm shift is particularly evident in the treatment of patients with NSCLC. Recently, the College of American Pathologists, in collaboration with the International Association for the Study of Lung Cancer and Association for Molecular Pathology, recommended molecular testing of EGFR and ALK in all patients with adenocarcinoma of the lung at the time of initial diagnosis and/or subsequent disease recurrence. Several additional biomarkers with potential prognostic significance in patients with NSCLC, including ROS1,[10, 11] MET, ERBB2, and RET, have also been described; however, these are not currently generally examined as part of routine clinical management. With a growing understanding of the molecular events involved in malignancy and the mechanisms of pharmacotherapy, the number of target genes will inevitably increase.
NGS is emerging as a powerful tool in molecular diagnostics because it offers a great improvement over current molecular methods such as PCR and Sanger sequencing. NGS platforms are becoming increasingly cost-efficient and provide a greater breadth of genetic information compared with their traditional single-gene sequencing counterparts.[34, 35] Whereas the more traditional approaches can only identify a limited spectrum of mutations at a single genomic locus, NGS can identify the full range of mutation types (eg, insertions, deletions, and rearrangements) across many genes with a single test. The ability to do more with less is particularly appealing in cytopathology, in which diagnostic material is often limited. The use of cytologic material for molecular testing becomes critically important for patients with advanced stages of lung cancer who are not candidates for surgical resection. Because only 25% to 30% of patients with NSCLC undergo surgical resection, a large percentage of patients may have FNA material as the only available specimen for diagnosis and ancillary molecular studies.[37, 38]
NGS has been used successfully on FFPE tissue specimens to identify clinically relevant mutations in carcinomas.[26, 39, 40] It is important to note that the quality of sequence data obtained with the use of FFPE has been shown to be similar to the quality that is obtained with fresh frozen tissue. NGS has also been shown to be effective when tissue is scarce and only small quantities of DNA are available for analysis. Although one prior study demonstrated that NGS could be successfully performed on FNAs of pulmonary and pancreatic tumors using cell blocks generated from FNA specimens, the correlation of FNA results with those from the gold standard excision specimen was not demonstrated. In any event, NGS analysis has not to the best of our knowledge been previously applied to cells obtained from direct cytologic smears of FNAs.
Consequently, we evaluated the application of NGS directly to cytologic smears (either DQ-stained smears or Pap-stained smears) from FNA specimens. We demonstrated that DNA isolated from FNA smears yields comprehensive and accurate sequence information that is statistically indistinguishable from that obtained from FFPE tissue. We observed no significant differences in the total number of reads, the percentage of reads mapping to the target region, or the coverage of target regions in our gene set. There was > 99.9% agreement in base calls across the entire target region, and 99.5% of SNVs identified in individual case sets were present in all 3 specimen types. Only rare discrepancies (2 discordant SNVs) were identified between FNA and FFPE tissue specimens, which likely represent variation from tumor heterogeneity and tumor sampling and not artifacts induced by specimen preparation or sequencing errors.
Much of the current analysis was focused on comparing the DQ-stained smears with FFPE because DQ staining is used for on-site immediate assessment during FNA procedures and allows for the determination of adequacy not only for diagnosis, but also for ancillary studies, including molecular testing. Because not all laboratories or all cases will have available DQ-stained material, we also explored differences between DQ-stained FNA smears and Pap-stained FNA smears. The amount of DNA isolated from the Pap-stained slides was generally lower than but not significantly different from the amount of DNA isolated from the DQ-stained slides. One of the Pap-stained slides (case 2) did not meet the required minimum amount of DNA (100 ng) and was therefore excluded from the current study. In addition, the DNA extracted from the DQ-stained slides demonstrated significantly higher molecular weights than the DNA extracted from Pap-stained slides from the same case. It is interesting to note that this same phenomenon was observed by Killian et al when comparing DQ-stained smears with Pap-stained smears using high-resolution comparative genomic hybridization assays. Despite these quantitative and qualitative differences between DNA extracted from the 2 stains, there were no significant differences noted in the sequence information obtained. One reason that the differences in lengths evident in the extracted DNA did not adversely impact the results of NGS using Pap-stained slides is that unlike conventional Sanger sequencing, NGS reads are inherently short (101 bp in the current study), and therefore little sequence information is lost.
The results of the current study have demonstrated the usefulness of NGS on FNA smears to provide extensive, high-quality molecular characterization of tumors and support the integration of NGS technologies into the standard cytopathology workflow.
This work was funded by the Washington University Department of Pathology and Immunology. The Genome Technology Access Center in the Department of Genetics at Washington University School of Medicine is partially supported by National Cancer Institute (NCI) Cancer Center Support Grant P30 CA91842 to the Siteman Cancer Center and by Washington University Institute of Clinical and Translational Sciences (ICTS)/National Institutes of Health (NIH) Clinical and Translational Science Award (CTSA) grant UL1RR024992 from the National Center for Research Resources (NCRR), a component of the NIH and the NIH Roadmap for Medical Research.