Characterization of the genomic features and expressed fusion genes in micropapillary carcinomas of the breast

Micropapillary carcinoma (MPC) is a rare histological special type of breast cancer, characterized by an aggressive clinical behaviour and a pattern of copy number aberrations (CNAs) distinct from that of grade- and oestrogen receptor (ER)-matched invasive carcinomas of no special type (IC-NSTs). The aims of this study were to determine whether MPCs are underpinned by a recurrent fusion gene(s) or mutations in 273 genes recurrently mutated in breast cancer. Sixteen MPCs were subjected to microarray-based comparative genomic hybridization (aCGH) analysis and Sequenom OncoCarta mutation analysis. Eight and five MPCs were subjected to targeted capture and RNA sequencing, respectively. aCGH analysis confirmed our previous observations about the repertoire of CNAs of MPCs. Sequencing analysis revealed a spectrum of mutations similar to those of luminal B IC-NSTs, and recurrent mutations affecting mitogen-activated protein kinase family genes and NBPF10. RNA-sequencing analysis identified 17 high-confidence fusion genes, eight of which were validated and two of which were in-frame. No recurrent fusions were identified in an independent series of MPCs and IC-NSTs. Forced expression of in-frame fusion genes (SLC2A1–FAF1 and BCAS4–AURKA) resulted in increased viability of breast cancer cells. In addition, genomic disruption of CDK12 caused by out-of-frame rearrangements was found in one MPC and in 13% of HER2-positive breast cancers, identified through a re-analysis of publicly available massively parallel sequencing data. In vitro analyses revealed that CDK12 gene disruption results in sensitivity to PARP inhibition, and forced expression of wild-type CDK12 in a CDK12-null cell line model resulted in relative resistance to PARP inhibition. Our findings demonstrate that MPCs are neither defined by highly recurrent mutations in the 273 genes tested, nor underpinned by a recurrent fusion gene. 
Although seemingly private genetic events, some of the fusion transcripts found in MPCs may play a role in maintenance of a malignant phenotype and potentially offer therapeutic opportunities.


Tumor samples
Two cohorts of micropapillary carcinomas (MPCs) were analyzed. The first cohort comprised 16 consecutive formalin-fixed paraffin-embedded (FFPE) MPCs, 11 pure and 5 mixed, which were retrieved from the authors' institutions (Table 1). The second, validation cohort comprised 14 additional consecutive FFPE MPCs, retrieved from Molinette Hospital, Turin, Italy. Frozen samples were available from five of the 16 cases in the first cohort of MPCs.
As a comparator for the results of the Sequenom mutation profiling, a cohort of 16 consecutive IC-NSTs, matched to the first cohort of 16 MPCs according to ER and HER2 status and histological grade, was retrieved from a series of breast cancers previously analyzed by aCGH [1]. In addition, 14 IC-NSTs matched according to grade and ER and HER2 status to tumors from the second cohort of 14 MPCs, and 48 grade 3 IC-NSTs, were retrieved from Hospital La Paz, Madrid, Spain [1] (Supplementary Table S1).

Power calculation
For power calculations, we assumed that if MPCs were driven by a recurrent fusion gene in a way akin to secretory carcinomas (which harbor the ETV6-NTRK3 fusion gene in >95% of cases [2][3][4]) or adenoid cystic carcinomas of the breast (which harbor the MYB-NFIB fusion gene in >90% of cases [5]), a 'pathognomonic' driver event would be present in ≥70% of cases (a conservative estimate). Therefore, based on a binomial distribution, by sequencing 5 samples we would have been able to identify a recurrent event (i.e. one present in two or more cases) with 97% statistical power. Furthermore, a 'pathognomonic' driver event present in 50% of cases would be detectable with approximately 80% power.
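The power figures above can be reproduced with a short calculation. The following is a minimal sketch, assuming that "power" here means the probability of observing the event in at least two of the five sequenced cases under a binomial model; the function name `power_recurrent` is illustrative and not part of the original analysis.

```python
from math import comb


def power_recurrent(n_samples: int, prevalence: float, min_cases: int = 2) -> float:
    """Probability that an event with the given prevalence is seen in at
    least `min_cases` of `n_samples` cases, i.e. P(X >= min_cases) for
    X ~ Binomial(n_samples, prevalence)."""
    # Sum the probabilities of seeing the event in fewer than min_cases cases.
    p_fewer = sum(
        comb(n_samples, k) * prevalence**k * (1 - prevalence) ** (n_samples - k)
        for k in range(min_cases)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Prevalences taken from the text: 70% and 50%, with 5 sequenced samples.
    for prevalence in (0.7, 0.5):
        power = power_recurrent(5, prevalence)
        print(f"prevalence {prevalence:.0%}: power {power:.2f}")
```

Under these assumptions the calculation gives about 0.97 for a 70% prevalence and about 0.81 for a 50% prevalence, in line with the values quoted above.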

Immunohistochemical analysis
Representative sections of each case were cut at 3 μm and mounted on silane-coated slides. Immunohistochemistry was performed as previously described [6,7], using antibodies raised against estrogen receptor (ER), progesterone receptor (PR), HER2, and epithelial membrane antigen (EMA). Antibody clones, dilution, antigen retrieval methods, scoring systems, and cut-offs [8,9] are summarized in Supplementary Table 2. Positive and negative controls (omission of the primary antibody and IgG-matched serum) were included for each immunohistochemical run. The scoring was performed by at least two pathologists (CM, AS and/or JSR-F).

Microdissection, DNA extraction, RNA extraction
Representative 8 μm-thick sections of the MPCs and IC-NSTs were subjected to microdissection with a sterile needle under a stereomicroscope (Olympus SZ61, Tokyo, Japan) to ensure a percentage of tumor cells greater than 90%, as previously described [7,10]. DNA was extracted from the 16 MPCs and 16 grade-, ER-, and HER2-matched IC-NSTs using the DNeasy Blood and Tissue Kit (Qiagen, Crawley, UK). In addition, for 8 MPCs, matched adjacent normal breast tissue was microdissected and subjected to DNA extraction. Double-stranded DNA concentration and DNA quality were determined using a Qubit Fluorometer (Invitrogen, Paisley, UK) and a multiplex PCR, respectively, as previously described [6,10]. RNA was extracted from the microdissected MPC and IC-NST FFPE samples using the RNeasy kit (Qiagen). Frozen samples from five MPCs and 48 grade 3 IC-NSTs were cut, microdissected, and subjected to RNA extraction with Trizol (Invitrogen), according to the manufacturer's protocol. RNA quantity and quality were assessed using the Agilent 2100 Bioanalyzer with RNA Nano LabChip Kits (Agilent Biosystems, Stockport, UK).

Microarray comparative genomic hybridization (aCGH)
aCGH data were pre-processed and analyzed using the Base.R script in R version 2.14.0, as previously described [1,11]. Genomic DNA from each sample was hybridized against a pool of normal female DNA derived from peripheral blood. Raw log2 ratios of intensity between samples and pooled female genomic DNA were read without background subtraction and normalized in the LIMMA package in R using print-tip loess normalization.
Outliers were removed based upon their deviation from neighboring genomic probes, using an estimation of the genome-wide median absolute deviation of all probes. Log2 ratios were rescaled using the genome-wide median absolute deviation in each sample and then smoothed using circular binary segmentation (cbs) in the DNAcopy package [12][13][14][15]. After filtering polymorphic BACs and BACs mapping to chromosome Y, a final dataset of 31,157 clones with unambiguous mapping information according to build hg19 of the human genome (http://www.ensembl.org) remained. A categorical analysis was applied to the BACs after classifying them as representing amplification (>0.45), gain (>0.08 and ≤0.45), loss (<−0.08) or no change, according to their cbs-smoothed log2 ratio values [7,14]. Threshold values were determined and validated as previously described [13,14].
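The categorical classification of clones can be illustrated as follows. This is a minimal sketch that applies the thresholds stated above to cbs-smoothed log2 ratios; it is not the Base.R implementation, the function name `classify_clone` and the example values are hypothetical, and it assumes that values between −0.08 and 0.08 inclusive are treated as no change.

```python
# Thresholds on cbs-smoothed log2 ratios, as stated in the text.
AMPLIFICATION_THRESHOLD = 0.45
GAIN_THRESHOLD = 0.08
LOSS_THRESHOLD = -0.08


def classify_clone(smoothed_log2_ratio: float) -> str:
    """Assign a copy number category to one BAC clone."""
    if smoothed_log2_ratio > AMPLIFICATION_THRESHOLD:
        return "amplification"
    if smoothed_log2_ratio > GAIN_THRESHOLD:
        # > 0.08 and <= 0.45
        return "gain"
    if smoothed_log2_ratio < LOSS_THRESHOLD:
        return "loss"
    # Assumption: everything from -0.08 to 0.08 inclusive is "no change".
    return "no change"


# Hypothetical smoothed values, chosen to exercise each boundary.
example_ratios = [0.62, 0.45, 0.10, 0.08, 0.0, -0.08, -0.30]
example_calls = [classify_clone(value) for value in example_ratios]

if __name__ == "__main__":
    for value, call in zip(example_ratios, example_calls):
        print(f"{value:+.2f} -> {call}")
```

Checking the thresholds from highest to lowest keeps the categories mutually exclusive, so a value above 0.45 is reported as amplification rather than as gain.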

Mutation screening and validation
Sixteen MPCs and 16 grade-, ER- and HER2-matched IC-NSTs were subjected to mutation screening using the OncoCarta Panel v1.0 (Sequenom, San Diego, CA, USA), which detects 238 mutations in 19 common cancer-related genes, as previously described [12,15]. The prevalence of mutant alleles was estimated by calculating the ratio of the area of the raw spectra of the mutant allele to that of its wild-type allele. Mutations were validated using Sanger sequencing as previously described [12,15]. Primer sequences are available at http://rock.icr.ac.uk/collaborations/Mackay/Micropapillary. Sequences were visualized using 4Peaks (http://4peaks.en.softonic.com/). For the eight MPCs from which both primary tumor and normal breast tissue could be microdissected, DNA was extracted as previously described [7,10]. Tumor and germline DNA were subjected to targeted capture massively parallel sequencing using a platform containing baits targeting all exons of 273 genes that are either recurrently mutated in breast cancer or involved in DNA repair pathways (Supplementary Table S3). Custom oligonucleotides (Nimblegen SeqCap) were designed for hybridization capture of all protein-coding exons of these 273 genes (Supplementary Table S3). Barcoded sequence libraries were prepared (New England Biolabs, Kapa Biosystems) using 50 ng DNA and pooled at equimolar concentrations into a single exon capture reaction as previously described [16,17].
Sequencing was performed in a single lane of an Illumina HiSeq2000 (San Diego, CA), and reads were aligned to the reference human genome hg19 using the Burrows-Wheeler Aligner (BWA) [6,18]. Somatic single nucleotide variants were identified using a combination of MuTect [19], MutationSeq [20], HaplotypeCaller [21] and VarScan2 [22]; only single nucleotide variants identified by at least two of the four callers were considered valid. This conservative approach was employed to minimize the false positive results obtained with high-depth targeted massively parallel sequencing of DNA extracted from formalin-fixed, paraffin-embedded tissues. For small insertions and deletions (indels), HaplotypeCaller and VarScan2 were employed. All candidate mutations were reviewed manually using the Integrative Genomics Viewer [23]. Mutations with an allelic frequency of <1% and/or supported by <5 reads were disregarded. Regions with loss of heterozygosity were identified using exomeCNV [24].
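The consensus rule and read-level filters described above can be sketched as follows. This is an illustrative outline only, not the pipeline used in the study: the `Variant` type, the function names and the example coordinates are hypothetical, parsing of each caller's output is omitted, and "supported by <5 reads" is assumed to refer to reads carrying the mutant allele.

```python
from collections import Counter
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal key identifying a single nucleotide variant."""
    chrom: str
    pos: int
    ref: str
    alt: str


def consensus_snvs(calls_by_caller: dict[str, set[Variant]],
                   min_callers: int = 2) -> set[Variant]:
    """Keep variants reported by at least `min_callers` of the callers."""
    support = Counter(
        variant for calls in calls_by_caller.values() for variant in calls
    )
    return {variant for variant, n in support.items() if n >= min_callers}


def passes_read_filters(alt_reads: int, total_reads: int,
                        min_allele_fraction: float = 0.01,
                        min_alt_reads: int = 5) -> bool:
    """Disregard mutations with allelic frequency <1% or <5 supporting reads."""
    if total_reads <= 0:
        return False
    return (alt_reads >= min_alt_reads
            and alt_reads / total_reads >= min_allele_fraction)


# Made-up coordinates purely for illustration.
v1 = Variant("chr1", 100, "A", "G")
v2 = Variant("chr2", 200, "C", "T")
v3 = Variant("chr3", 300, "G", "A")
example_calls = {
    "MuTect": {v1, v2},
    "MutationSeq": {v1},
    "HaplotypeCaller": {v3},
    "VarScan2": {v1, v2},
}
example_consensus = consensus_snvs(example_calls)  # v1 and v2; v3 has one caller

if __name__ == "__main__":
    print(sorted(example_consensus))
    print(passes_read_filters(alt_reads=6, total_reads=400))  # 1.5% with 6 reads
    print(passes_read_filters(alt_reads=4, total_reads=50))   # too few reads
```

Keying each variant on chromosome, position, reference and alternate allele means two callers count as agreeing only when they report the same substitution at the same site.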

Paired-end massively parallel RNA sequencing
Briefly, messenger RNA was selected using oligo-dT magnetic beads from 3 μg of total tumor RNA. RNA was fragmented at 94°C for 5 minutes in fragmentation buffer (Illumina) and converted to single-stranded cDNA using SuperScript II reverse transcriptase, followed by second-strand cDNA synthesis using Escherichia coli DNA polymerase I. Double-stranded cDNA was end-repaired using T4 DNA polymerase and T4 polynucleotide kinase, monoadenylated using a Klenow DNA polymerase I (3'-5' exonuclease activity), and ligated to adapters using T4 DNA ligase. The adapter-ligated cDNA library was then fractionated on a 2% agarose gel, and a smear corresponding to 200 nucleotides was excised, purified, and PCR amplified using high-fidelity Phusion DNA polymerase (Finnzymes, Vantaa, Finland). The library was quantified using the Agilent 1000 kit on the Agilent 2100 Bioanalyzer following the manufacturer's instructions. A total of 3 pM was then loaded onto each lane of a flowcell, and sequencing was performed with 2 × 54 bp cycles on the Genome Analyzer II.