These authors contributed equally to this work.
Total variance should drive data handling strategies in third generation proteomic studies
Article first published online: 27 OCT 2013
© 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Volume 13, Issue 22, pages 3251–3255, November 2013
How to Cite
Herrmann, A. G., Searcy, J. L., Le Bihan, T., McCulloch, J. and Deighton, R. F. (2013), Total variance should drive data handling strategies in third generation proteomic studies. Proteomics, 13: 3251–3255. doi: 10.1002/pmic.201300056
- Issue published online: 20 NOV 2013
- Accepted manuscript online: 8 OCT 2013 04:17AM EST
- Manuscript Accepted: 21 AUG 2013
- Manuscript Revised: 2 AUG 2013
- Manuscript Received: 6 FEB 2013
- Centre for Cognitive Aging and Cognitive Epidemiology
- SynthSys
- BBSRC and EPSRC. Grant Number: BB/D019621/1
- Melville Trust
- Differential protein expression;
- Protein marker discovery;
- Proteome analysis
Quantitative proteomics is entering its “third generation,” where intricate experimental designs aim to increase the spatial and temporal resolution of protein changes. This paper re-analyses multiple internally consistent proteomic datasets generated from whole cell homogenates and fractionated brain tissue samples providing a unique opportunity to explore the different factors influencing experimental outcomes. The results clearly indicate that improvements in data handling are required to compensate for the increased mean CV associated with complex study design and intricate upstream tissue processing. Furthermore, applying arbitrary inclusion thresholds such as fold change in protein abundance between groups can lead to unnecessary exclusion of important and biologically relevant data.
GRP78, glucose regulated protein 78
GRP94, glucose regulated protein 94
Proteomics is entering its “third generation,” where MS is increasingly being used, not only to quantify total protein levels, but also to investigate how proteins within specific cell types and subcellular organelles respond both spatially and temporally to a host of experimental stimuli. As proteomic studies embark on more intricate designs, it is essential to re-evaluate whether the currently used data-handling strategies remain appropriate. Fundamental weaknesses and arbitrary design decisions still permeate proteomic research, despite efforts to improve the rigor of data handling [2-7]. This article compares primary datasets generated contemporaneously in our laboratory using peak intensity based LC-MS to provide a novel perspective on the suitability of various inclusion criteria and data-handling strategies in analyzing third generation proteomic data.
Many quantitative LC-MS proteomic studies use an initial inclusion criterion that proteins should be identified with two or more peptides. Though seemingly arbitrary, this inclusion criterion is important for two reasons. First, removal of proteins identified with only one peptide increases the reliability of LC-MS protein identification and helps avoid false detections. A single peptide feature may be found in several proteins or protein isoforms; therefore, a truly definitive identification is less likely. Second, this cut-off of two peptides for identification purposes significantly reduces the overall variance within the dataset, defined as the mean of the coefficients of variation (CV) for all proteins in the dataset. This reduction in variance considerably increases the power to detect subtle protein changes (Fig. 1). There is clearly a trade-off between reducing variance and the number of proteins remaining for analysis. Extending the inclusion criterion to identification of proteins with three or more peptides further reduces variance, but also drastically reduces the number of proteins, by nearly half of those originally identified by one peptide.
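The definition of total variance used above (the mean of the per-protein CVs) and the effect of a peptide-count threshold can be sketched as follows. This is a minimal illustration and not the pipeline used to produce Fig. 1: the function names, the data layout, and the toy abundances are all hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV (%) of one protein across replicates: sample SD / mean * 100."""
    return stdev(values) / mean(values) * 100.0


def total_variance(proteins, min_peptides=2):
    """Mean CV (%) over proteins identified with >= min_peptides peptides.

    `proteins` maps a protein name to (peptide_count, replicate_abundances).
    Returns (mean CV, number of proteins retained).
    """
    kept = [abund for n_pep, abund in proteins.values() if n_pep >= min_peptides]
    if not kept:
        raise ValueError("no proteins pass the peptide threshold")
    return mean(coefficient_of_variation(a) for a in kept), len(kept)


# Hypothetical toy data (not taken from the study).
toy = {
    "protA": (1, [100.0, 160.0, 70.0]),   # single-peptide hit, noisy
    "protB": (2, [200.0, 220.0, 210.0]),
    "protC": (5, [50.0, 52.0, 51.0]),
}

for threshold in (1, 2, 3):
    cv, n = total_variance(toy, min_peptides=threshold)
    print(f">= {threshold} peptide(s): {n} proteins, mean CV = {cv:.1f}%")
```

In this toy set, raising the threshold lowers the mean CV while shrinking the protein list, which is the trade-off described above.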
Subcellular proteomics will be a dominant theme in third generation proteomic research, yet sample fractionation can greatly impact variance within protein datasets. Sample processing techniques including the enrichment of microvessels, mitochondria or white matter can be used upstream of proteomic analysis to provide a more in-depth proteomic profile of how individual cell types and subcellular compartments are responding to experimental stimuli. However, increasing technical complexity upstream of protein detection increases the total variance of the final dataset, as demonstrated by analysis of our own proteomic data generated using a range of enrichment techniques (Fig. 2). White matter enrichment via micropunches of the corpus callosum and microvessel enrichment using density gradient centrifugation, two intricate upstream tissue handling techniques, induce a 7 and 15% increase in total variance in control tissue, respectively, compared to whole brain homogenates. We hypothesise that this increase in variance might be linked to varying degrees of protein degradation occurring when samples are handled at room temperature for extended periods of time. Upstream tissue processing enriches samples with targeted proteins, improving the spatial resolution of detected protein changes. However, the associated increases in variance make detection of subtle protein changes more difficult.
The magnitude of the change in protein abundance (fold change) is a popular but arbitrary inclusion criterion often used to dissect proteomic data. Analysis of our in vitro human cell line data shows that employing an arbitrary fold change value as a data dissection tool can exclude important proteins from the final analysis. This in vitro study investigated the effects of a global metabolic challenge on mitochondrial function and cellular proteomics. A total of 958 proteins were identified with two or more peptides (n = 6/group). A stringent a priori inclusion criterion of p < 0.01 was set for a protein change to be deemed significant, resulting in a final protein list of 193 significantly altered proteins (Fig. 3A). However, as well as a p-value threshold, many investigators also utilize a fold change cut-off to rapidly identify the most “important” protein changes. Datasets with a low overall variance allow for the detection of subtle protein changes; however, employing an arbitrary fold change inclusion criterion such as the popular “minimum 1.5 fold change” on these low variance datasets excludes the subtle yet significant protein changes. The fold change cut-off drastically reduces the number of proteins included in the final analysis and increases the risk of creating false negatives (Fig. 3A).
A similar analysis of the impact of arbitrary fold change cut-offs was carried out on the more variable microvessel extraction data (Fig. 3B). Due to the increased variability of these data (as shown in Fig. 2C), employing a stringent alpha value of p < 0.01 markedly reduces the number of proteins in the final list for analysis, from 653 identified with two or more peptides to only 12. In this more variable system, imposing a 1.5 fold change cut-off has no further effect on protein number, because a large fold change is already required to overcome the variance for inclusion at the set alpha level. It is therefore concluded that inclusion of a fold change data cut-off is either dangerous in the creation of false negatives (in studies with low overall variance) or irrelevant (in studies with high overall variance).
Alternatively, power calculations can be used to determine the magnitude of change required to detect a significant difference between two populations given the technical and biological variance. Used a priori, power calculations are beneficial in study design, guiding decisions regarding the number of replicates needed to obtain a set level of power. However, the nature of a priori power calculations means these calculations are based on an estimate of overall biological and technical variance. Our analysis reveals that the CV is highly dependent upon the type of tissue being analyzed and the degree of upstream tissue processing involved (Fig. 2). Using a CV that is not specific to the dataset to decide detectable fold change can be problematic, and could lead to an over- or underestimation of proteins found to be differentially expressed. To ensure maximum accuracy in a priori power calculations, extensive and specific pilot data should be obtained.
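To illustrate how strongly the detectable fold change depends on the CV fed into an a priori power calculation, the sketch below uses the standard normal approximation for a two-sided, two-sample comparison. It is an illustration under stated assumptions, not the calculation used in this article: the function name is hypothetical, the 80% power level and the CV values are assumed for the example, and only n = 6 per group and alpha = 0.01 are taken from the study design described above.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_fold_change(cv_percent, n_per_group, alpha=0.01, power=0.8):
    """Smallest fold increase detectable in a two-sided, two-sample test.

    Normal approximation:
        delta = (z_{1-alpha/2} + z_{power}) * sd * sqrt(2 / n)
    where sd is expressed relative to the control mean (CV / 100), so the
    result is 1 + delta. For small n a t-based calculation would give a
    somewhat larger value.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    relative_delta = z * (cv_percent / 100.0) * sqrt(2.0 / n_per_group)
    return 1.0 + relative_delta


# Assumed CVs for illustration only; n = 6 per group as in the in vitro study.
for cv in (10, 25, 40):
    fc = min_detectable_fold_change(cv, n_per_group=6)
    print(f"CV = {cv}%: minimum detectable fold change ~ {fc:.2f}")
```

Because the detectable change scales linearly with the CV in this approximation, borrowing a CV from a different tissue or processing method shifts the result proportionally, which is why dataset-specific pilot data matter.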
The question of whether inclusion of fold change cut-offs in addition to a p-value cut-off adds biological value to proteomic data remains. To assess this, we identified two key proteins involved in the endoplasmic reticulum stress response: glucose regulated protein 78 (GRP78) and glucose regulated protein 94 (GRP94). In our in vitro study, experimental intervention with the metabolic challenge of oxygen-glucose deprivation (OGD) saw significant upregulation of GRP78 and GRP94 (p < 0.01). However, GRP78 underwent a fold change of 1.53, whereas GRP94 only had a fold change of 1.48 (Fig. 3C and D). The popular fold change cut-off of 2 would exclude both of these proteins from the analysis, and only GRP78 would be included if a fold change of 1.5 was used. The interplay between these two proteins is integral to the endoplasmic reticulum stress response; however, one or both of these proteins would be lost from the final dataset if an arbitrary fold change inclusion criterion was employed. Temporal evolution of protein level change is another important factor to be considered in understanding third generation proteomics. Data from the in vitro study demonstrate that following 6 h of OGD, small increases in protein levels of GRP78 and GRP94 predict larger increases following 18 h OGD (Fig. 3C and D). These results suggest that protein fold change should not be used as a threshold for inclusion, but rather as an indicator of evolving events occurring within the cell. A protein exhibiting a small fold change at an early time point can be indicative of increasing abundance that might be detected as significant at a later time.
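The GRP78/GRP94 example can be made concrete with a short filter. The fold changes (1.53 and 1.48) and the p < 0.01 threshold are those reported above; the exact p-values are not given in the text, so the value 0.005 below is a placeholder that merely satisfies p < 0.01, and the function name is hypothetical.

```python
def passes_inclusion(fold_change, p_value, alpha=0.01, min_fold_change=None):
    """Return True if a protein survives the inclusion criteria.

    A protein must have p < alpha. If min_fold_change is given, its change
    must also reach that magnitude in either direction (up or down).
    """
    if p_value >= alpha:
        return False
    if min_fold_change is None:
        return True
    magnitude = max(fold_change, 1.0 / fold_change)
    return magnitude >= min_fold_change


# Fold changes from the text; p-values are placeholders consistent with p < 0.01.
proteins = {
    "GRP78": {"fold_change": 1.53, "p_value": 0.005},
    "GRP94": {"fold_change": 1.48, "p_value": 0.005},
}

for cutoff in (None, 1.5, 2.0):
    kept = [
        name
        for name, stats in proteins.items()
        if passes_inclusion(stats["fold_change"], stats["p_value"],
                            min_fold_change=cutoff)
    ]
    print(f"fold change cut-off {cutoff}: {kept}")
```

With no cut-off both proteins are retained, a cut-off of 1.5 keeps only GRP78, and a cut-off of 2 removes both, even though each change is statistically significant.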
The ability to detect a fold change at a particular level of significance is intrinsically linked to the variance of the data, and this variance is dependent on tissue source and processing techniques (Fig. 2). It is therefore misguided to include fold change in the initial stages of data dissection. A protein reaching the threshold set by a stringent p-value (which in its nature incorporates the variance and the magnitude of change) should be sufficient for the initial inclusion criterion, resulting in a much reduced but relevant list of protein changes (Fig. 3).
The concept of excluding proteins based on fold change not only increases the likelihood of making type II errors, but is also fundamentally flawed given that the biological relevance of a change in protein abundance is likely to be protein specific. For example, proteins in the Bcl-2 family are important evolutionarily conserved regulators of apoptosis. However, even within this family, certain proteins are more influential than others: PUMA (p53 upregulated modulator of apoptosis) being one of the most potent. Subtle changes in this protein are likely to have important cellular effects; however, they may be ruled out if stringent fold change cut-offs are employed when analyzing data. The importance of subtle protein changes needs to be recognized in the analysis of large proteomic datasets to avoid the loss of valuable data through the use of inappropriate fold change cut-offs.
The issue of multiple hypothesis testing, where investigating changes in many separate proteins can lead to significant results purely by chance, is an important and widely reviewed issue that is not formally dealt with in this article [4, 14-16]. However, consideration should be given to the fact that overly stringent corrections for multiple comparisons can limit the ability to glean biologically meaningful conclusions from data. Typical methods, such as the Bonferroni correction, are too stringent when studying changes in hundreds of gene or protein abundances in microarray and proteomic experiments. A less stringent method for dealing with multiple comparisons is to employ the false discovery rate, described by Benjamini and Hochberg, based on the frequency distribution of the statistically generated p-values. It must be noted that a level of arbitrariness remains when implementing a false discovery rate. The rate of incorrectly rejecting the null hypotheses is chosen by the individual, depending on the perceived acceptability of false-positives remaining in the final dataset.
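The difference in stringency between the two corrections can be seen in a small sketch of the Benjamini and Hochberg step-up procedure alongside a Bonferroni threshold. The p-values are invented for illustration, and the rate q = 0.05 is an arbitrary choice of exactly the kind discussed above; the function names are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is <= alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Sort the p-values, find the largest rank k (1-based) such that
    p_(k) <= (k / m) * q, and reject the k hypotheses with the smallest
    p-values. Returns a list of booleans in the original order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Invented p-values for ten hypothetical proteins.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

print("Bonferroni rejections:", sum(bonferroni(p_vals)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(p_vals)))
```

For these ten values Bonferroni retains one protein and the false discovery rate procedure retains two; the gap typically widens as the number of tested proteins grows.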
As proteomic technology advances, it is important to remember where the true power of proteomics lies: as a hypothesis generator and a tool for generating candidates of potential biomarkers and drug targets of disease. The utility of proteomics is greatest when a maximum number of proteins are identified and included for further analysis. Data processing techniques such as an initial inclusion of a protein identification threshold of two or more peptides give the researcher confidence in the protein identification. Statistical significance should then be considered as a sufficient threshold in detecting important protein changes. Pushing proteins to clear too many hurdles on their way to the final dataset increases the likelihood of omitting biologically interesting and relevant data.
AGH is supported by the MRC. This research is supported by Age UK as part of the Disconnected Mind programme, performed under the aegis of the Centre for Cognitive Aging and Cognitive Epidemiology. TLB is funded by SynthSys, a Centre for Integrative Systems Biology funded by BBSRC and EPSRC; reference BB/D019621/1. RD is funded by the Melville Trust.
The authors have declared no conflicts of interest.
- 15 Multiple hypothesis testing: a methodological overview. Methods Mol. Biol. 2013, 972, 37–55.
- 17 Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Methodol. 1995, 57, 289–300.