Archival formalin-fixed and paraffin-embedded clinical samples represent a very diverse source of material for proteomic investigation of diseases, often with follow-up patient information. Here, we describe an analytical workflow for analysis of laser-capture microdissected formalin-fixed and paraffin-embedded samples that allows studying proteomes to a depth of 10 000 proteins per sample.
The workflow involves lysis of tissue in SDS-containing buffer, detergent removal, and consecutive digestion of the proteins with two enzymes by the multienzyme digestion filter-aided sample preparation method. Resulting peptides are fractionated by pipette-tip based strong anion exchange into six fractions and analyzed by LC-MS/MS on a bench top quadrupole Orbitrap mass spectrometer.
Analysis of the data using the MaxQuant software resulted in the identification of 9502 ± 28 protein groups per 110 nL sample of microdissected cells from human colonic adenoma. This depth of proteome analysis enables systemic insights into the organization of the adenoma cells and an estimation of the abundances of known biomarkers. It also allows the identification of proteins expressed from tumor suppressors, oncogenes, and other key players in the development and progression of colorectal cancer.
Conclusion and clinical relevance
Our proteomic platform can be used for quantitative comparisons between samples representing different stages of diseases and thus can be applied to the discovery of biomarkers or drug targets.
Adenomatous polyps arising from glandular epithelium or mucosal surface of colon and rectum are frequently precursor states to colorectal cancer (CRC). Stratification of the adenomas results in the formation of invasive carcinoma. Spreading of the cancer cells to lymph nodes and formation of distant metastases are the final stages of CRC. Molecular genetics analyses conducted over the past decades have revealed numerous mutations underlying the pathogenesis of CRC. However, only a few well-studied oncogenes and tumor-suppressor genes, such as p53, APC, and KRAS, appear to be mutated in the majority of CRCs, whereas the role of a much larger collection of genes that are mutated less frequently still has to be defined. Gene mutations have diverse effects on the abundance and structure of proteins and can affect basic cellular processes such as growth and mobility. The understanding of these mechanisms could be facilitated by proteomic technologies, potentially allowing deep insights into alterations at the proteome level [2-4].
Over the past decade, many studies have focused on measuring the proteome composition in normal colonic mucosa, adenomas, and cancer cells to identify protein changes associated with this disease. Due to technological limitations, they typically identified only several hundred proteins per study, providing a relatively short list of changes occurring in the sequence normal mucosa-adenoma-cancer. Several recent proteomic studies reported a larger number of proteins, around 2000–3000 [6-8], but this still covers only the most abundant 20% of the entire proteome.
Protein extraction and generation of peptides by digestion with proteinases are essential steps in proteomic sample preparation. Due to the incompatibility of mass spectrometers with solutes containing detergents, many of the published protocols for analysis of clinical material avoid their use, substituting them with other chaotropic reagents, mainly urea. Unfortunately, in comparison to detergents, the lytic strength of urea is limited. This can lead to incomplete lysis, mostly affecting membrane proteins, which remain insoluble and in consequence may not be digested and analyzed. Both 2D electrophoresis and “in solution-digestion” approaches are affected by this technical limitation. As an alternative approach, many studies utilized the classical “in-gel digestion” technique, which uses 1D SDS-electrophoresis. The major advantage of this method is its compatibility with SDS, which can be used for sample solubilization and is easily removable from the gel before protein digestion. A disadvantage is the low yield that can accompany in-gel digestion and peptide extraction. Although this is not a significant drawback for analysis of samples that are abundantly accessible, such as cultured cells, the low yields can become a serious problem when only minute amounts of lysates are available. Recently, we have described a technique that combines attractive features of in-solution digestion and in-gel digestion. In this method, termed filter-aided sample preparation (FASP), an SDS-solubilized sample is deposited on a filter membrane where the SDS is exchanged with urea. Subsequent digestion leads to release of clean peptides through the filter, while debris remains behind on the filter [11, 12]. A multienzyme digestion version, termed multienzyme digestion filter-aided sample preparation (MED-FASP), is especially economical with the use of input material and provides complementary peptides.
Generally, two types of samples are available for investigation of clinical tissue material: fresh/frozen and formalin-fixed and paraffin-embedded (FFPE) ones. Fresh/frozen tissue would appear more suitable for proteomic work than chemically cross-linked and paraffin-embedded material. However, independent studies have demonstrated the equivalence of frozen material and FFPE material after rehydration and heat-induced reversal of fixation [13-15]. Obvious advantages of FFPE material are the convenience of its handling and storage and, in particular, its presence in archives that often cover decades. Importantly, patient histories for the years after sample acquisition are often available. For these reasons, FFPE material has been tested for its suitability for cancer research by many research groups [15-21].
In this article, we present a streamlined workflow with the potential to analyze the proteome at the 10 000 protein scale from microdissected FFPE tissue. We demonstrate this with the analysis of four samples of colonic adenomas.
2 Materials and methods
2.1 Archival FFPE clinical samples
Archival FFPE samples of colonic adenomas were obtained from the Department of Pathology of the Wrocław Medical University. The analysis of the samples followed an informed consent approved by the local ethics committee.
2.2 Tissue microdissection and lysis
Colonic adenoma tissue was dissected with the Laser Pressure Catapulting PALM Instrument (Zeiss, Göttingen, Germany) and was lysed in a buffer consisting of 0.1 M Tris-HCl, pH 8.0, 0.1 M DTT, 0.5% (w/v) polyethylene glycol 20 000, and 4% SDS at 99°C as described previously [10, 14].
2.3 HeLa and HCT116 cells
HeLa and HCT116 cells were grown in DMEM supplemented with 10% FBS and 1% streptomycin. The cells were harvested at 70% confluence and were lysed in 0.1 M Tris-HCl, pH 8.0, 0.1 M DTT, 2% SDS at 99°C. After cooling to room temperature, the lysates were sonicated to reduce the viscosity of the samples.
2.4 Protein digestion and peptide fractionation
Detergent was removed from the lysates, and the proteins were digested consecutively with LysC and trypsin and fractionated by strong anion exchange (SAX) according to the MED-FASP protocol. Briefly, protein lysates were processed in spin ultrafiltration units with a nominal molecular weight cut-off of 30 000 (Cat No. MRCF0R030, Millipore). To the units containing protein concentrates, 200 μL of 8 M urea in 0.1 M Tris/HCl, pH 8.5 (UA), was added and the samples were centrifuged at 14 000 × g at 20°C for 15 min. This step was repeated once. Then 50 μL of 0.05 M iodoacetamide in UA was added to the filters and the samples were incubated in darkness for 20 min. Filters were washed twice with 100 μL of UA followed by two washes with 100 μL of 0.05 M Tris/HCl pH 8.5. Proteins were digested in 40 μL of 0.05 M Tris/HCl pH 8.5 using endoproteinase LysC at an enzyme to protein ratio of 1:50 at 30°C for 18 h. The released peptides were collected by centrifugation at 14 000 × g for 10 min followed by two washes with 0.05 M Tris/HCl pH 8.5. The material remaining on the filter was then digested with trypsin under the same conditions.
2.5 Protein content determination
The protein content was determined with a Cary Eclipse Fluorescence Spectrometer (Varian, Palo Alto, USA) as described previously. Tryptophan fluorescence in tissue lysates was assayed in 8 M urea in 10 mM Tris-HCl pH 8.5. The fluorescence of peptides resulting from FASP digests was analyzed in 0.2 mL of 0.05 M Tris/HCl pH 8.5 in 5 × 5 mm quartz cells. Fluorescence was measured at 295 nm for excitation and 350 nm for emission. Tryptophan was used as a standard. The slits were set to 5 nm and 20 nm for excitation and emission, respectively. Protein content was calculated assuming that eukaryotic proteins contain on average 1.3% tryptophan.
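The conversion from a fluorescence reading to a protein amount can be sketched as follows. This is a minimal illustration only: it assumes that fluorescence is linear in tryptophan mass with a zero intercept and that the 1.3% tryptophan content is a weight fraction. The function names and the example readings are hypothetical and are not taken from the original protocol.

```python
def tryptophan_mass_ug(sample_signal, standard_signal, standard_mass_ug):
    """Tryptophan mass in the cuvette, assuming fluorescence is linear in
    tryptophan mass and passes through the origin (an assumption)."""
    return sample_signal / standard_signal * standard_mass_ug


def protein_mass_ug(trp_mass_ug, trp_fraction=0.013):
    """Protein mass, assuming proteins contain on average 1.3% tryptophan
    by weight (weight fraction is an assumption of this sketch)."""
    return trp_mass_ug / trp_fraction


# Hypothetical readings: a 0.1 ug tryptophan standard gives 400 units,
# the sample gives 312 units under the same instrument settings.
trp = tryptophan_mass_ug(312.0, 400.0, 0.1)
protein = protein_mass_ug(trp)
print(f"{trp:.3f} ug tryptophan -> {protein:.1f} ug protein")
```

In practice a multi-point standard curve would be preferable to the single-point calibration used here for brevity.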
2.6 Peptide fractionation
Peptides were fractionated according to the previously described pipette tip SAX protocol, with minor modifications. Briefly, peptides were loaded into tip-columns made by stacking six layers of a 3M Empore Anion Exchange disk (1214-5012, Varian) into a 200 μL micropipette tip. For column equilibration and elution of fractions, we used Britton & Robinson universal buffer (BRUB) composed of 20 mM acetic acid, 20 mM phosphoric acid, and 20 mM boric acid titrated with NaOH to the desired pH. Peptides eluted after LysC digestion were loaded into this “pipette-tip column” at pH 11 and three fractions were eluted at pH 6, 4, and 2. Tryptic peptides were loaded at pH 5 and eluted from the ion exchanger at pH 2. The flow-through and the eluted fractions were analyzed.
2.7 LC MS/MS
Peptides were separated on a reverse-phase column (20 cm × 75 μm inner diameter) packed in-house with 1.8 μm C18 particles (Dr. Maisch, Germany) using a 4 h ACN gradient in 0.1% formic acid at a flow rate of 250 nL/min. The column was operated at a constant temperature of 35°C regulated by an in-house designed oven with a Peltier element. The LC was coupled to a Q Exactive mass spectrometer (Thermo Fisher Scientific, Germany) via the nanoelectrospray source (Proxeon Biosystems, now Thermo Fisher Scientific). The Q Exactive was operated in the data-dependent mode with survey scans acquired at a resolution of 50 000 at m/z 400 (transient time 256 ms). Up to ten of the most abundant isotope patterns with charge ≥2 from the survey scan were selected with an isolation window of 1.6 Th and fragmented by higher energy collisional dissociation (HCD) with a normalized collision energy of 25. The maximum ion injection times for the survey scan and the MS/MS scans were 20 and 60 ms, respectively, and the ion target value for both scan modes was set to 10^6. In this mode of operation, MS/MS scans are acquired with maximum speed because filling of the HCD trap is almost completely in parallel with acquisition of the transient for the preceding scan. Furthermore, since virtually all scans “time out” at 60 ms due to the high target value, the maximum ion signal in the fragmentation spectra is obtained.
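The data-dependent selection rule described above (up to ten of the most intense isotope patterns with charge ≥2, isolated with a 1.6 Th window) can be illustrated with a simplified sketch. It is not the instrument's control logic: dynamic exclusion and other acquisition settings are ignored, the window is assumed to be centred on the precursor m/z, and the class name, function names, and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IsotopePattern:
    mz: float
    charge: int
    intensity: float


def select_precursors(patterns, top_n=10, min_charge=2):
    """Return up to top_n patterns with charge >= min_charge,
    ordered from most to least intense."""
    eligible = [p for p in patterns if p.charge >= min_charge]
    return sorted(eligible, key=lambda p: p.intensity, reverse=True)[:top_n]


def isolation_window(mz, width_th=1.6):
    """Lower and upper m/z bounds of a window centred on mz
    (centring is an assumption of this sketch)."""
    half = width_th / 2
    return mz - half, mz + half


# Hypothetical survey scan with four detected isotope patterns.
survey = [
    IsotopePattern(445.12, 1, 9e6),   # singly charged: not eligible
    IsotopePattern(523.77, 2, 4e6),
    IsotopePattern(611.30, 3, 7e6),
    IsotopePattern(702.85, 2, 1e6),
]
selected = select_precursors(survey, top_n=2)
for p in selected:
    low, high = isolation_window(p.mz)
    print(f"m/z {p.mz:.2f} (z={p.charge}): isolate {low:.2f}-{high:.2f}")
```

With `top_n=2`, the singly charged pattern is skipped even though it is the most intense, and the two most intense multiply charged patterns are chosen.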
2.8 Data analysis
The MS data were analyzed using the software environment MaxQuant version 22.214.171.124. The proteins were identified by searching MS and MS/MS data of peptides against a decoy version of the International Protein Index human database (v. 3.68). Carbamidomethylation of cysteines was set as a fixed modification. The minimum peptide length was specified to be seven amino acids. The initial maximal mass tolerance in MS mode was set to 7 ppm, whereas fragment mass tolerance was set to 20 ppm for fragmentation data. The maximum false peptide discovery rate was specified as 0.01. Label-free quantification was carried out in MaxQuant as previously described. The total protein content was defined as the sum of peptide intensities integrated over the elution profile of each peptide. The proportional amount of individual proteins was calculated as the ratio of their LFQ-intensity (Label Free Quantitation MS intensity in MaxQuant) to the sum of all LFQ-intensities (total protein) in the measured sample. Compositional diversity of the groups of proteins matching Gene Ontology categories was defined as: Diversity = [number of proteins with annotation i/total number of proteins] × [100%/total protein (with annotation i)%].
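The two derived quantities defined above, the proportional amount of a protein and the compositional diversity of an annotation category, can be computed as in the following sketch. It follows the formula as written in the text; the function names, protein labels, and intensity values are hypothetical and serve only to illustrate the arithmetic.

```python
def protein_fractions(lfq):
    """Fraction of total protein for each protein: its LFQ intensity
    divided by the sum of all LFQ intensities in the sample."""
    total = sum(lfq.values())
    return {protein: value / total for protein, value in lfq.items()}


def compositional_diversity(lfq, annotated):
    """Diversity of a category, following the formula in the text:
    (number of annotated proteins / total number of proteins)
    x (100% / percentage of total protein carried by the category)."""
    fractions = protein_fractions(lfq)
    members = [p for p in lfq if p in annotated]
    count_share = len(members) / len(lfq)
    mass_percent = 100.0 * sum(fractions[p] for p in members)
    return count_share * (100.0 / mass_percent)


# Hypothetical LFQ intensities for five proteins (arbitrary units).
lfq = {"A": 500.0, "B": 300.0, "C": 150.0, "D": 40.0, "E": 10.0}
abundant = {"A", "B"}   # two proteins carrying most of the protein mass
rare = {"D", "E"}       # two proteins carrying little of the protein mass

print(compositional_diversity(lfq, abundant))
print(compositional_diversity(lfq, rare))
```

A category made up of many low-abundance proteins scores above 1, whereas a category dominated by a few abundant proteins scores below 1.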
Today's MS-based proteomics enables quantitative analysis of proteins to a depth comparable with global gene expression profiling technologies. Proteomics can provide detailed descriptions of proteomes of cells and tissues, and thereby appears as a powerful tool to study human disorders. Here, we demonstrated that clinical formalin-fixed and paraffin-embedded archival human tissue can be analyzed by the proteomic technology in a similar fashion and depth as cultured cells. Therefore, the described workflow allows systemic insights into etiology and development of diseases.
3 Results and discussion
Over the last few years, many protocols for analysis of FFPE samples have been published. Most frequently, the lysates of FFPE tissue were treated with trypsin and analyzed directly by LC-MS/MS. In some cases, the mass spectrometric analysis was carried out following fractionation of the sample at the protein level by SDS-PAGE and at the peptide level by reverse-phase chromatography in the LC-MS/MS step. Recently, we demonstrated that SAX-based prefractionation of tryptic digests from FFPE lysates followed by LC-MS/MS on a Velos Orbitrap spectrometer allowed identification of 3000–5000 proteins from a single sample. In this study, we present a refined sample preparation strategy and combine it with mass spectrometric analysis on a quadrupole Orbitrap instrument (Q Exactive). The key steps of the sample preparation part are microdissection of the desired cell types, reversal of the fixation and sample lysis in SDS, removal of the detergent, and consecutive digestion of the sample with LysC and trypsin in a proteomic reactor-format (MED-FASP). This is followed by fractionation of the peptides on pipette tip-SAX-columns into a total of six fractions, consisting of four fractions for LysC and two fractions for tryptic peptides (Fig. 1). In the second step, peptides are chromatographed on a C18 column coupled to the bench-top mass spectrometer. The resulting spectra are analyzed in a completely automated manner by the MaxQuant software.
3.1 Peptide amount required for an in-depth analysis
The depth of the proteomic analysis from clinical samples depends on many factors. Among them, the sensitivity and speed of the mass spectrometer are crucial, as emerged from our recent work on the analysis of minute amounts of sample. We found that a certain minimum amount of sample was needed to generate data that covered more than the most abundant components of the proteome. To investigate the question of the required sample quantity on the new mass spectrometric platform, we first analyzed different amounts of peptides ranging from 30 ng to 3 μg by LC-MS/MS using 1D separation only. We found that 0.3–1 μg of peptides were required to identify 4000–5000 proteins (Fig. 2A). For loads below this amount, we observed a rapid drop in the number of proteins. Injection of 3 μg of peptide resulted in only a slight increase of protein numbers. The 1 μg peptide amount corresponds to about 20 000 HeLa cells.
3.2 Peptide prefractionation prior to LC-MS/MS
Multidimensional fractionation at the peptide level for the large-scale exploration of the proteome was advocated more than a decade ago [30, 31] and this concept has been further developed and refined by many laboratories. Whereas fractionation of peptides over a C18 reverse-phase column is the standard second dimension separation in proteomics, separation in the first dimension is achieved by different methods. Frequently, as mentioned above, it is carried out at the protein level by SDS-gel electrophoresis (GeLC MS). This technology allowed identification of about 1000 proteins from 10 000 laser captured microdissected tumor cells using a short-range gel. Alternatively, proteins are first digested and the peptides separated in the first dimension by ion exchange chromatography, most frequently using off-line SCX separation.
In our protocol, we instead use SAX as the first dimension of peptide fractionation. We have further developed the concept of Ishihama and co-workers to fractionate peptides in a pipette-tip format and applied it in several studies. Recently, we combined the SAX fractionation with consecutive protein digestion with two enzymes, among which the LysC and trypsin combination was particularly effective. The LysC peptides are separated into four fractions whereas the “postdigested” tryptic peptides are collected in two fractions. In the experiments described here, LC-MS/MS analysis of HeLa and HCT116 cells allowed identification of 13 000–27 000 peptides per fraction (Fig. 2C and D). This strategy resulted in separation of 70–72% of the peptides into a single fraction. Such efficiency is close to that achieved by IEF separation [11, 34]. Pipette-tip SAX fractionation is a useful tool for separation of samples containing only a few micrograms of peptide. This is in contrast to the popular OFFGEL Fractionator, which in our hands usually requires samples of 50 μg of peptide or more. In addition, after IEF fractionation, peptides are mixed with ampholytes, which share biophysical properties with peptides and are therefore difficult to remove before analysis.
The combination of the consecutive digestion (MED-FASP) of the sample and fractionation of the peptides in the pipette-tip by SAX into six fractions led to the identification of 8500 HeLa proteins per sample (Fig. 1A) (Supporting Information Table S1). This result was achieved using only 6 μg of peptide, an amount corresponding to about 100 000 HeLa cells. Similar analysis of the colon cancer cell line HCT116 allowed identification of 8906 proteins (Supporting Information Table S1).
3.3 Label-free protein quantitation
One of the unique features of large-scale proteome analysis is the possibility of estimating absolute protein abundances and comparing these values for individual proteins between samples. Recent in-depth proteomic projects estimated protein copy numbers per cell by extrapolating from added standards [35, 36]. Here, we determine the amount of each protein in the proteome on the basis of individual MS LFQ-intensities compared to the total MS LFQ-signal of the measured proteome. First, we tested the linearity of our proteomic platform by measuring a standard protein mixture. Analysis of the UPSII standard (Sigma Aldrich, St. Louis, MO) containing 48 different proteins at concentrations spanning five orders of magnitude resulted in the identification of 33 proteins and showed that the observed intensities were proportional to the protein amount of the standard (Fig. 3A). Importantly, this experiment also demonstrated that the platform allows identification and quantitation of proteins occurring at concentrations spanning at least five orders of magnitude.
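One common way to check whether measured intensities are proportional to the amount of a spiked standard is a least-squares fit on log-transformed values, where a slope close to 1 indicates proportionality. The text does not state how proportionality was assessed, so the sketch below is an assumption about a reasonable procedure rather than the method used; the function name and all data values are hypothetical.

```python
import math


def loglog_fit(amounts, intensities):
    """Least-squares line through (log10 amount, log10 intensity).
    A slope close to 1 indicates intensity proportional to amount."""
    xs = [math.log10(a) for a in amounts]
    ys = [math.log10(i) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical standard amounts (fmol) and measured intensities.
amounts = [0.5, 5.0, 50.0, 500.0, 5000.0]
intensities = [2.1e4, 1.9e5, 2.0e6, 2.2e7, 1.9e8]
slope, intercept = loglog_fit(amounts, intensities)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Fitting in log space gives each order of magnitude equal weight, which matters when the standard spans several decades of concentration.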
3.4 Analysis of adenoma samples
We next applied our workflow to microdissected adenomas. Extraction of four samples followed by MED-FASP digestion resulted in a peptide yield of 56 ± 6.1 ng per 1 nL of FFPE dissected tissue (Table 1). Since we used 6 μg of peptide per sample, the analyzed FFPE material was about 100 nL. Neglecting partial tissue deformation and protein losses during the fixation and embedding processes, this volume corresponds to about 100 μg of fresh tissue. Extraction of fresh tissues with SDS-containing buffers typically yields about 100 mg/mL protein. Thus, the peptide yield obtained by the MED-FASP procedure was above 50%, a value consistent with previous reports [11, 22]. SAX fractionation of the peptides allowed separation of 71% of the peptides into a single fraction, with 6000–18 000 identified peptides per fraction (Fig. 2E).
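The yield estimate in the paragraph above follows from a short calculation, reproduced here as a sketch. The three input values are taken from the text; the tissue density of about 1 g/mL used to convert volume to mass is an assumption of this illustration, and the variable names are hypothetical.

```python
peptide_yield_ng_per_nl = 56.0        # measured peptide yield (from the text)
peptide_used_ug = 6.0                 # peptide analyzed per sample (from the text)
protein_in_tissue_mg_per_ml = 100.0   # typical SDS extraction yield (from the text)

# Tissue volume needed to supply the analyzed peptide amount.
volume_nl = peptide_used_ug * 1000.0 / peptide_yield_ng_per_nl

# 1 mg/mL equals 1 ng/nL, so the two concentrations compare directly.
yield_fraction = peptide_yield_ng_per_nl / protein_in_tissue_mg_per_ml

# Assumption: tissue density of about 1 g/mL, so 1 nL weighs about 1 ug.
fresh_tissue_ug = volume_nl * 1.0

print(f"{volume_nl:.0f} nL, {fresh_tissue_ug:.0f} ug tissue, "
      f"yield {yield_fraction:.0%}")
```

The result, roughly 107 nL of tissue and a yield of 56%, matches the rounded figures of about 100 nL and above 50% given in the text.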
Table 1. Summary of the analysis of four samples of colonic adenoma
Sample volume was calculated by multiplying the microdissected area by the slice thickness.
In the mass spectrometric analysis, 39.4 ± 4.9% of MS/MS scans led to unambiguous identification of a peptide sequence. This was only about 5 percentage points lower than observed for cultured HCT116 cells (44.7 ± 7.4%), and this difference may be due to a somewhat higher nonpeptide contamination in the preparation from clinical samples. Combining the analyses of the six fractions identified on average 55 144 unique peptides per tissue sample (Table 1). This resulted in an average of 8481 protein groups in each sample (Fig. 2B). Using the “matching between runs” algorithm in MaxQuant, in which peptide identifications are transferred between runs at a false discovery rate of less than 1%, this number increased to 9500 (Supporting Information Table S2).
3.5 Features of the proteome of the adenoma cells
The abundances of the identified proteins span six orders of magnitude as judged by their summed LFQ peptide intensities. Nevertheless, the levels of 98% of the proteins were within a 10 000-fold expression range (Fig. 3B), an observation similar to that made in a recent in-depth investigation into the HeLa cell proteome. Intriguingly, the abundance of histone H4 appeared to be only two orders of magnitude higher than that of the oncogene KRAS and four orders higher than that of the key tumor suppressor in CRC, adenomatous polyposis coli protein (APC) (Fig. 3B). Another well-known protein and common marker of colon cancer, the carcinoembryonic antigen, appears to be 1000 times less abundant than histone H4 and therefore still in the middle of the abundance range.
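The two dynamic-range figures quoted above (total span in orders of magnitude, and the share of proteins within a 10 000-fold range) can be derived from a list of summed intensities as in the sketch below. The text does not specify how the 10 000-fold window was positioned, so this sketch assumes the window that captures the largest number of proteins; the function names and the intensity values are hypothetical.

```python
import math


def orders_of_magnitude(intensities):
    """Total span of the values in powers of ten."""
    return math.log10(max(intensities) / min(intensities))


def max_fraction_within_fold(intensities, fold=10_000):
    """Largest fraction of values that fits in any window whose upper
    bound is `fold` times its lower bound (sliding-window search)."""
    values = sorted(intensities)
    best = 0
    start = 0
    for end, upper in enumerate(values):
        # Advance the lower edge until the window is at most `fold` wide.
        while upper > values[start] * fold:
            start += 1
        best = max(best, end - start + 1)
    return best / len(values)


# Hypothetical summed LFQ intensities spanning six orders of magnitude.
intensities = [1e4, 5e5, 2e6, 3e6, 8e6, 2e7, 6e7, 4e8, 9e8, 1e10]
print(orders_of_magnitude(intensities))
print(max_fraction_within_fold(intensities))
```

In this toy example the full span is six orders of magnitude, yet eight of the ten values fall within a single 10 000-fold window, mirroring the pattern described for the adenoma proteome.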
More than 90% of the identified proteins have a Gene Ontology annotation. Bioinformatic data analysis revealed that “nucleus” was the largest cell organelle category, comprising more than 3000 proteins with a cumulative protein contribution of more than 30% to the total cell protein amount (Fig. 4A). In contrast, the proteins in the “integral to plasma membrane” category constitute only 2–3% of the total protein amount and represent nearly 300 proteins. Our dataset contains close to 2000 proteins annotated by GO as “integral membrane”; however, this category represents 10% of the total protein mass. Among the different organelles, this group of proteins has the highest compositional diversity (Fig. 4B), showing that this organelle contains a large number of low-abundance proteins. In contrast, the mitochondrion, an abundant organelle composed of only several hundred canonical components, has the lowest diversity (see Section 'Materials and methods').
The identified proteins include 567 belonging to the transcription factor activity GO category (Fig. 4C). Note that this is about half of the transcription factors in the human genome. Furthermore, we found 592 proteins with a kinase activity annotation. This group contains 383 protein kinases, a number close to the approximately 500 expected in the human genome. We also mapped 494 transporters and 84 channel proteins. The latter group shows the highest diversity (Fig. 4C), whereas histones and ribosomal proteins are on average more abundant and thus these groups have low compositional diversities. Interestingly, the transporter proteins have an average diversity.
Finally, our dataset covers a large number of proteins whose genes have been recognized as key players in CRC, such as oncogenes and tumor suppressor genes (Table 2). We identified the tumor suppressor APC protein by two peptides, but we did not identify any product of the second most important tumor suppressor TP53. This may be due to very low abundance of this protein in adenoma samples.
Table 2. Identification of proteins coded by oncogenes and tumor suppressor genes showing somatic mutations in CRC
For protein names, see Supporting Information Table S2.
1.07 × 10^−2
2.23 × 10^−3
6.53 × 10^−4
2.02 × 10^−3
4.83 × 10^−3
1.82 × 10^−4
2.40 × 10^−4
3.99 × 10^−3
1.74 × 10^−4
2.01 × 10^−3
1.98 × 10^−4
3.53 × 10^−4
3.28 × 10^−3
2.87 × 10^−3
7.06 × 10^−4
3.57 × 10^−4
4.25 × 10^−4
1.26 × 10^−2
4 Concluding remarks
Depending on the nature of the data and analysis criteria, transcripts of 8000–16 000 protein-coding genes expressed from a single cell type can be detected [41-43]. Recent large-scale proteomic and transcriptomic analyses revealed 10 255 proteins and 11 936 transcripts of protein-coding genes in HeLa cells and 10 008 proteins in an osteosarcoma U2OS cell line. In this work, we have shown that similarly large numbers of proteins can now be identified using minute amounts of microdissected FFPE human tissue samples. To our knowledge, such depth of proteomic analysis of a tissue sample has not been achieved before. We hope that our workflow for proteomic analysis of FFPE tissues will be useful for exploring a large variety of FFPE preserved material and that this will facilitate the mapping of human proteomes as well as the discovery of novel biomarkers and drug targets.
We thank Piotr Ziółkowski (Wroclaw Medical University) for providing clinical samples, Korbinian Mayr for the support in the mass spectrometric analysis, and Katharina Zettl for technical assistance. This work was supported by the Max-Planck Society for the Advancement of Science, by PROSPECTS, a grant by the European Commission's 7th Framework Program (HEALTH-F4–2008-201648/PROSPECTS), the Munich Center for Integrated Protein Science (CIPSM), and the Polish National Center of Science (DEC-2011/01/N/NZ5/04253).