Tissue-based absolute quantification using large-scale TMT and LFQ experiments

Relative and absolute intensity-based protein quantification across cell lines, tissue atlases and tumour datasets is increasingly available in public datasets. These atlases enable researchers to explore fundamental biological questions, such as protein existence, expression location, quantity and correlation with RNA expression. Most studies provide MS1 feature-based label-free quantitative (LFQ) datasets; however, growing numbers of isobaric tandem mass tags (TMT) datasets remain unexplored. Here, we compare traditional intensity-based absolute quantification (iBAQ) proteome abundance ranking to an analogous method using reporter ion proteome abundance ranking with data from an experiment where LFQ and TMT were measured on the same samples. This new TMT method substitutes reporter ion intensities for MS1 feature intensities in the iBAQ framework. Additionally, we compared LFQ-iBAQ values to TMT-iBAQ values from two independent large-scale tissue atlas datasets (one LFQ and one TMT) using robust bottom-up proteomic identification, normalisation and quantitation workflows.

Proteomics is a powerful tool for understanding the underlying biology of cells and tissues. Large-scale cell lines, tumour datasets or tissue atlases enable researchers to ask fundamental questions about the proteome, such as protein existence, expression location and correlation with RNA expression [1][2][3]. The number of publicly available datasets continues to expand every year [4], facilitating their reuse [5,6] and integration into protein expression resources [7,8].
Label-free intensity-based absolute quantification (iBAQ) is a robust and common method to estimate the expression of proteins without the need for a standard reference sample [9,10]. This method measures relative protein abundances within a sample, which can be converted to approximate absolute scales, such as copy number, when certain assumptions are met. iBAQ protein expression has so far only been explored for label-free data-dependent acquisition (DDA) [9] and data-independent acquisition (DIA) methods using MS1 [10]. Relative iBAQ (riBAQ) is obtained by normalising each protein's iBAQ value to the sum of all iBAQ values [10,11].
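As a minimal sketch of the iBAQ and riBAQ definitions, the Python below divides the summed peptide intensities of each protein by its number of theoretically observable peptides and then normalises the result within a sample. The function names, the toy sequences and intensities, and the digestion rule (fully tryptic, no missed cleavages, peptides of 6 to 30 residues) are illustrative assumptions and are not taken from any published implementation.

```python
import re


def count_tryptic_peptides(sequence, min_len=6, max_len=30):
    """Count fully tryptic peptides with no missed cleavages.

    Assumed rule: cleave after K or R unless the next residue is P.
    The length window is an illustrative assumption.
    """
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    return sum(1 for p in peptides if min_len <= len(p) <= max_len)


def compute_ibaq(intensities_by_protein, sequences):
    """iBAQ = sum of peptide intensities / number of theoretical peptides."""
    ibaq = {}
    for protein, intensities in intensities_by_protein.items():
        n_theoretical = count_tryptic_peptides(sequences[protein])
        if n_theoretical > 0:  # skip proteins with no observable peptide
            ibaq[protein] = sum(intensities) / n_theoretical
    return ibaq


def compute_ribaq(ibaq):
    """riBAQ = iBAQ divided by the sum of all iBAQ values in the sample."""
    total = sum(ibaq.values())
    return {protein: value / total for protein, value in ibaq.items()}


# Toy data for one sample (hypothetical sequences and intensities).
sequences = {"P1": "MKTAYIAKQRQISFVK", "P2": "AAAAAKPAAAAAR"}
intensities = {"P1": [100.0, 300.0], "P2": [600.0]}

ibaq_values = compute_ibaq(intensities, sequences)
ribaq_values = compute_ribaq(ibaq_values)
print(ibaq_values)   # {'P1': 200.0, 'P2': 600.0}
print(ribaq_values)  # {'P1': 0.25, 'P2': 0.75}
```

Because riBAQ values sum to one within a sample, they can be compared across samples or acquisition methods whose raw intensity scales differ.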
MS2 methods [12,13], such as spectral counting, can serve as a proxy for absolute quantification in bottom-up proteomics experiments. Spectral-counting algorithms offer some advantages because they can be applied directly to the data commonly collected for identification purposes, including multiplexed tandem mass tags (TMT) experiments. In 2011, Colaert et al. [13] explored three MS2-based quantitative methods: Exponentially modified Protein Abundance Index (EmPAI) [14], Normalised Spectral Abundance Factor (NSAF) [15] and normalised Spectral Index (SIn) [16]. Their findings indicated that the NSAF method outperformed both EmPAI and SIn in terms of accuracy and precision [13]. However, spectral counting-based quantification has limitations because it does not use chromatographic peak attributes such as height or area, potentially limiting its accuracy and dynamic range [17,18]. Ahrné et al. [19] took a distinct intensity-based strategy to calculate iBAQ values in TMT datasets, treating them as label-free datasets. This involved distributing the MS1 intensities of all TMT-labelled features among the individual samples based on the relative reporter ion intensities. However, this approach is more complex, as the datasets need to be analysed as label-free experiments and precursor ion intensities must be extracted. Furthermore, this approach has not been applied to a large-scale dataset or benchmarked across different datasets.
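The proportional split described above, in which the MS1 intensity of a TMT-labelled feature is distributed among samples according to their relative reporter ion intensities, can be sketched as follows. This is a simplified illustration of the idea only, not the published implementation; the function name, channel labels and intensities are hypothetical.

```python
def split_ms1_by_reporters(ms1_intensity, reporter_intensities):
    """Distribute one MS1 feature intensity across TMT channels.

    Each channel receives a share proportional to its reporter ion
    intensity. Returns zeros if no reporter signal was recorded.
    """
    total = sum(reporter_intensities.values())
    if total <= 0:
        return {channel: 0.0 for channel in reporter_intensities}
    return {
        channel: ms1_intensity * value / total
        for channel, value in reporter_intensities.items()
    }


# Hypothetical feature: MS1 intensity 1000 and three reporter channels.
shares = split_ms1_by_reporters(
    1000.0, {"126": 10.0, "127N": 30.0, "128N": 60.0}
)
print(shares)  # {'126': 100.0, '127N': 300.0, '128N': 600.0}
```

The per-channel shares sum back to the original MS1 intensity, which is why this route requires precursor intensities to be extracted in addition to the reporter ions.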
Here, we explored an alternative approach to perform absolute protein expression analysis on TMT datasets using the reporter ion intensities directly. To assess the accuracy of this method, we employed a gold-standard mix-proteome dataset (PXD007683) [20] analysed with both LFQ and TMT methods. We then calculated iBAQ values based on MS1 feature intensities (LFQ) or reporter ion intensities (TMT) and compared the correlation for all quantified proteins. Additionally, we applied robust normalisation and quantitation workflows to analyse two large-scale tissue datasets from Jiang et al. (TMT, PXD016999) [1] and Wang et al. (LFQ, PXD010154) [2]. iBAQ values were estimated using the MS1 intensities for label-free experiments and the reporter ion intensities for TMT datasets.
Feature intensity tables for all analysed datasets were generated using the quantms (https://quantms.readthedocs.io/) workflow, which enables the analysis of DDA, DIA label-free and TMT datasets [21,22]. quantms is a novel workflow that allows cloud and HPC data analysis to be performed in a distributed manner [23] and has already been benchmarked against popular tools such as MaxQuant and Proteome Discoverer [24]. Each generated feature was the combination of a peptide sequence, modifications, charge state, sample, fraction, and technical or biological replicate. Feature intensities were normalised using quantile normalisation, and the highest intensity for each feature was selected across replicates (Supplementary Note 1). Feature intensities were then added together across replicates of the same sample. Finally, feature intensities were summarised by their median at the peptide sequence level. iBAQ was computed by dividing the sum of peptide intensities by the number of theoretically observable peptides of the protein. Each iBAQ value was normalised to the sum of all iBAQ values for the same sample (riBAQ) [11,25]. All analysis steps are included in a Python package (https://github.com/bigbio/ibaqpy, Supplementary Note 2).
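Two of the processing steps above, quantile normalisation across samples and the median summary of features at the peptide sequence level, can be illustrated with a small standard-library sketch. It assumes a complete intensity matrix with no missing values and breaks ties by sort order; the function names and toy data are hypothetical, and the code does not reproduce the ibaqpy implementation.

```python
from statistics import mean, median


def quantile_normalise(columns):
    """Quantile-normalise samples so they share one intensity distribution.

    columns: dict mapping sample name -> list of intensities, all of the
    same length and with no missing values (a simplifying assumption).
    """
    names = list(columns)
    n = len(columns[names[0]])
    sorted_cols = [sorted(columns[name]) for name in names]
    # Mean intensity at each rank across all samples.
    rank_means = [mean(col[i] for col in sorted_cols) for i in range(n)]
    normalised = {}
    for name in names:
        order = sorted(range(n), key=lambda i: columns[name][i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # replace value by its rank mean
        normalised[name] = out
    return normalised


def peptide_level_median(feature_intensities):
    """Summarise features of one sample to peptide level by the median.

    feature_intensities: dict mapping (peptide_sequence, charge) -> intensity.
    """
    grouped = {}
    for (sequence, _charge), intensity in feature_intensities.items():
        grouped.setdefault(sequence, []).append(intensity)
    return {seq: median(values) for seq, values in grouped.items()}


normalised = quantile_normalise({"s1": [5.0, 2.0, 3.0], "s2": [4.0, 1.0, 6.0]})
print(normalised)  # {'s1': [5.5, 1.5, 3.5], 's2': [3.5, 1.5, 5.5]}

peptide_medians = peptide_level_median(
    {("PEPTIDEK", 2): 10.0, ("PEPTIDEK", 3): 30.0, ("SAMPLER", 2): 7.0}
)
print(peptide_medians)  # {'PEPTIDEK': 20.0, 'SAMPLER': 7.0}
```

After normalisation every sample contains the same set of values (the rank means), so differences in overall intensity scale between runs or channels no longer dominate the comparison.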
We tested the TMT-iBAQ approach using a mix-proteome dataset comprising both human and yeast samples at multiple concentrations [20]. The primary objective of the dataset and the original study was to evaluate the capability of TMT and LFQ approaches to accurately quantify fold changes of 3-, 2- and 1.5-fold across the entire dataset.
All parameters for the reanalysis were annotated using the SDRF file format [26] (Supplementary Note 3). In the present study, we did not explore the differential expression across samples (as originally designed by O'Connell et al. [20]) but compared the expression of the human proteins when using TMT-iBAQ or LFQ-iBAQ.
In the PXD007683 dataset, we quantified a total of 94,804 peptides and 8401 proteins. There were 33,321 peptides and 6273 proteins commonly identified using TMT and LFQ approaches, while 18,524 peptides from 392 proteins and 42,959 peptides from 1736 proteins were quantified using only LFQ or TMT approaches, respectively. The peptide intensity between both approaches is statistically significantly correlated for all samples (R > 0.4, Lin's concordance correlation coefficient [CCC] > 0.02; Supplementary Note 4) and the protein intensity between both approaches shows a correlation coefficient higher than 0.8 (R > 0.86, Lin's CCC > 0.1; Supplementary Note 5). The log-scale iBAQ values for both TMT and LFQ approaches of the PXD007683 dataset were compared, as shown in Figure 1A,B. First, we evaluated the reproducibility of the two methods across all 11 sample replicates for both approaches (Figure 1A). Samples analysed with the label-free method showed a higher coefficient of variation (average CV = 14%), while TMT samples had an average CV = 10%. The iBAQ values displayed a similar distribution across the 11 samples, with a higher median intensity observed for TMT experiments than LFQ in all samples (Figure 1A). The iBAQ Pearson correlation and Lin's CCC between the TMT and LFQ approaches are high (R > 0.84, Lin's CCC > 0.74). These results demonstrate that the iBAQ values obtained from both LFQ and TMT approaches in this benchmark dataset are highly consistent and reliable. This result is in line with the long-standing use of MS2 data (fragment ion intensities) for quantification in proteomics, for example in MRM and DIA experiments, and with the good correlations reported between precursors and their reporter ions in DDA experiments [27].
While previous authors [17,20,28] have found that LFQ and TMT methods offer similar performance in terms of accuracy when analysing the same sample, comparisons of these methods for proteome characterisation between different studies of similar tissues remain unexplored. We tested this in the reanalysis of two large-scale human tissue datasets from Jiang et al. (TMT, PXD016999) [1] and Wang et al. (LFQ, PXD010154) [2] (Supplementary Note 3).
Both datasets were analysed using the same database (UniProt human Swiss-Prot 092022), the quantms workflow and the corresponding dataset parameters (Supplementary Note 3). For PXD010154, a total of 340,306 peptides and 14,602 proteins were quantified, while the numbers of quantified peptides and proteins for PXD016999 were 173,678 and 10,351, respectively. Figure 2A shows the distribution of iBAQ values for all tissues shared between both datasets (adrenal gland, liver, lung, ovary, pancreas, prostate, spleen, stomach and testis); the median intensity is higher for TMT experiments than for LFQ in all tissues except prostate. Figure 2B shows the iBAQ correlation between both experiments for the shared tissues: all tissues show a correlation coefficient higher than 0.80 and a Lin's CCC higher than 0.7. The iBAQ values obtained by LFQ and TMT for these nine tissues therefore had a strong correlation and high consistency. Previously, Betancourt et al. [29] integrated TMT results with LFQ using the three most abundant peptides for each protein quantified (TOP3), but the reproducibility and the correlation between both technologies were never explored. Using the transformed normalised intensities as suggested by Jiang et al. [1], instead of the iBAQ values from reporter ion intensities (as proposed here), could negatively affect the correlation between relative proteome abundances obtained with LFQ or TMT.
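The agreement statistics reported in this comparison (Pearson's R, Lin's CCC and the coefficient of variation) can be computed as in the following sketch. Lin's CCC is written in its usual form, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with population (1/n) moments. The function names and toy values are hypothetical, and the use of the sample standard deviation for the CV is an assumption, since the scale and estimator used for the reported CVs are not stated here.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation)."""
    return 100 * stdev(values) / mean(values)


# Toy log-scale abundances: y is x shifted by a constant offset of 2.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 4.0, 5.0, 6.0]
r = pearson_r(x, y)
ccc = lins_ccc(x, y)
cv = cv_percent([9.0, 10.0, 11.0])
print(round(r, 3), round(ccc, 3), round(cv, 1))  # 1.0 0.385 10.0
```

The toy example shows why both statistics are informative: a constant offset between two methods leaves Pearson's R at 1 but lowers Lin's CCC, because the CCC also penalises departures from the identity line.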
In summary, iBAQ, as previously reported, is a robust and common method for estimating the relative/absolute expression of proteins.
This study explored and extended the capabilities of the LFQ-iBAQ approach to perform proteome-wide quantification in TMT datasets using direct reporter ion intensities. The results showed that the iBAQ correlation between the TMT and LFQ approaches in different datasets is high, indicating the potential of the direct reporter ion intensity method for relative protein abundance analyses in TMT datasets. This new approach can enable the future integration of public TMT and LFQ proteomics datasets using intensity-based methods instead of less accurate spectral counting, which could improve the accuracy and reproducibility of proteomics meta-analyses.

FIGURE 1 (A) Boxplot of log-transformed riBAQ values for the 11 samples of dataset PXD007683, for both TMT and LFQ approaches. (B) Correlation between riBAQ values for all quantified proteins between the TMT and LFQ approaches, for dataset PXD007683. iBAQ, intensity-based absolute quantification; LFQ, label-free quantitative; TMT, tandem mass tags.

FIGURE 2 (A) Boxplot of log-transformed riBAQ values for all tissues shared between datasets PXD016999 and PXD010154. (B) Correlation between riBAQ values for all quantified proteins between the PXD016999 and PXD010154 datasets. iBAQ, intensity-based absolute quantification.