Data‐independent acquisition‐based SWATH‐MS for quantitative proteomics: a tutorial

Abstract Many research questions in fields such as personalized medicine, drug screens or systems biology depend on obtaining consistent and quantitatively accurate proteomics data from many samples. SWATH‐MS is a specific variant of data‐independent acquisition (DIA) methods and is emerging as a technology that combines deep proteome coverage capabilities with quantitative consistency and accuracy. In a SWATH‐MS measurement, all ionized peptides of a given sample that fall within a specified mass range are fragmented in a systematic and unbiased fashion using rather large precursor isolation windows. To analyse SWATH‐MS data, a strategy based on peptide‐centric scoring has been established, which typically requires prior knowledge about the chromatographic and mass spectrometric behaviour of peptides of interest in the form of spectral libraries and peptide query parameters. This tutorial provides guidelines on how to set up and plan a SWATH‐MS experiment, how to perform the mass spectrometric measurement and how to analyse SWATH‐MS data using peptide‐centric scoring. Furthermore, concepts on how to improve SWATH‐MS data acquisition, potential trade‐offs of parameter settings and alternative data analysis strategies are discussed.


History and terminology of DIA and SWATH-MS
The fundamental idea of data-independent acquisition (DIA) is to record fragment ion mass spectra (MS2 spectra) regardless of whether a peptide precursor ion is detected or not. This concept was first published in a proof-of-principle experiment by Masselon et al. in 2000 (Masselon et al, 2000). Shortly after, several groups further developed this idea and demonstrated parallel fragmentation of all peptides (Bateman et al, 2002; Purvine et al, 2003) as well as the use of sequentially windowed precursor isolation methods (Venable et al, 2004). Since then, different varieties of DIA have been explored and various acquisition schemes and analysis workflows have been implemented (Carvalho et al, 2010; Egertson et al, 2013; Geiger et al, 2010; Gillet et al, 2012; Panchaud et al, 2011; Panchaud et al, 2009; Plumb et al, 2006; Purvine et al, 2003; Silva et al, 2005; Venable et al, 2004; Weisbrod et al, 2012) (Figure S1). The methods differ in the type of mass spectrometer used, in the specific data acquisition parameters and in the data analysis strategies employed. For recent reviews that compare different DIA methods and discuss their specific advantages and disadvantages, see (Chapman et al, 2014; Law & Lim, 2013; Sajic et al, 2015).
The term "DIA" was originally coined by Venable et al. in 2004 (Venable et al, 2004) to contrast with data-dependent acquisition (DDA), where peptide ions are specifically selected for fragmentation based on their detection in precursor ion scans. Initially, data generated by DIA were analysed using database search tools similar to those used for DDA data, either by searching the acquired mixed MS2 spectra directly (Panchaud et al, 2009; Venable et al, 2004) or by searching pseudo MS2 spectra that were reconstituted based on co-elution of precursor ions and their potential fragment ions (Bern et al, 2010; Geromanos et al, 2009).
A new era in the field started in 2012, when Gillet et al. proposed a novel query strategy based on prior knowledge as an alternative DIA data analysis workflow (Gillet et al, 2012). In a method analogous to targeted proteomics by selected/multiple reaction monitoring (S/MRM), Gillet et al. used a consecutive acquisition scheme that supports the repeated measurement of fragment ions of the same peptide over the chromatographic elution time and exploited prior information from spectral libraries, in which mass spectrometric and chromatographic parameters from previously observed peptides are used to query the data. The authors termed their method SWATH-MS in analogy to the field of geophysics, where "swaths" refers to image strips of the Earth's surface acquired by a satellite or aircraft and where the acquired "swaths" can collectively be reconstructed in silico into a complete map of the Earth. In SWATH-MS, a "swath" refers to a strip of the peptide mass range selected for fragmentation by the mass spectrometer, corresponding to a specific width of the precursor isolation window.
Due to important advancements in data analysis and technical improvements in acquisition speed, mass accuracy and resolution, scientific interest and research activity in the field of DIA have increased dramatically over the last few years (Doerr, 2015). Numerous variants of DIA methods have been developed (Figure S1) and several automated DIA data analysis tools have been implemented. This allows proteomic laboratories to perform routine high-quality DIA experiments and to apply DIA methods successfully in various research areas. In the wealth of DIA acquisition schemes and data analysis strategies, several different names have been coined and are now in use. The naming situation is further complicated by trademarking, whereby certain names are associated with specific mass spectrometer vendors.
Throughout the tutorial we use the generic term "DIA" when referring to the breadth of all data-independent acquisition strategies, and the term "SWATH-MS" when referring to the specific acquisition and data analysis method described by Gillet et al. (Gillet et al, 2012), which is the focus of this tutorial.
Figure S1: Representation of the historical development of DIA methods with their corresponding precursor isolation window widths. Since the introduction of DIA in the early 2000s, different implementations of DIA have emerged. They differ in the instrument type used, the data analysis strategy and specific acquisition parameters such as the precursor isolation window widths. Due to technical improvements in instrument acquisition speed and resolution, in addition to novel data analysis approaches, research activity in the field has increased dramatically over the last few years. Several novel DIA approaches with "medium-sized" precursor isolation window widths (in the range between 4 and 100 m/z) have been published recently. References: Shotgun-CID (Purvine et al, 2003), Original DIA (Venable et al, 2004), MSE (Silva et al, 2006),

The basic principle of SWATH-MS
SWATH-MS measurements can be performed on fast scanning (usually at least 10 Hz), high-resolution (usually at least 15000 at full width half maximum on MS2 level) and accurate mass (usually 50 ppm or better) hybrid mass spectrometers, typically employing a quadrupole (Q) as first mass analyser and a time-of-flight (TOF) or Orbitrap as second mass analyser (Figure 1A). Technical improvements in quadrupole design have been realized in recent years that enable efficient and even transmission of precursors across the entire width of a SWATH precursor isolation window, resulting in a "squared" precursor isolation window (Figure 4B). This important feature ensures that the intensity of an acquired signal, and thus the quantity determined for the specific analyte, is independent of the position of the precursor in the isolation window.
During SWATH-MS data acquisition, series of MS2 scans are recorded repeatedly during the entire chromatographic gradient. Optionally, one or several MS1 scans per cycle can be included (Figure 1A). Importantly, all MS2 scans are independent of the content of the optional MS1 scans. In every MS2 scan a wide precursor window is isolated. This precursor isolation window is adjusted stepwise in defined increments until a predefined total precursor ion mass range is covered (Figure 1A). To ensure a sufficiently fast recording of MS1 and MS2 scans for chromatogram extraction and peptide quantification, an optimal balance between the number of MS2 scans, the accumulation time per scan, the precursor isolation window width, and the peptide mass range must be found. For example, the SWATH-MS acquisition scheme originally published by Gillet et al. used 32 precursor isolation windows of 25 m/z each to cover a range from 400 to 1200 m/z with an accumulation time per scan of 100 ms (Figure 1B). One important consequence arising from such wide precursor isolation windows is that co-eluting peptide precursors falling into the same precursor isolation window will be co-fragmented (Figure 1C and 1D). Depending on sample complexity, this can lead to 10s or even 100s of co-fragmented peptides, which results in highly mixed and convoluted MS2 spectra. Such multiplexed MS2 spectra can no longer be successfully analysed using the established database search tools developed for the analysis of MS2 spectra generated by DDA. Instead, SWATH-MS data require advanced data processing strategies. Of course, convoluted fragment ion spectra from co-fragmentation of more than one precursor are also observed in DDA-based and targeted proteomics, where isolation window widths range from 0.5 to 3 m/z, but to a significantly smaller extent (Michalski et al, 2011; Wang et al, 2014).
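The trade-off between window number, window width and accumulation time can be made concrete with a little arithmetic. The following minimal sketch assumes an idealised scheme with contiguous, non-overlapping windows and no instrument overhead between scans; the function names and the 30 s chromatographic peak width are illustrative assumptions, not values taken from the original publication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsolationWindow:
    lower_mz: float
    upper_mz: float


def build_windows(start_mz, end_mz, width_mz):
    """Tile [start_mz, end_mz] with contiguous, fixed-width isolation windows."""
    n_windows = round((end_mz - start_mz) / width_mz)
    return [
        IsolationWindow(start_mz + i * width_mz, start_mz + (i + 1) * width_mz)
        for i in range(n_windows)
    ]


def cycle_time_s(n_windows, ms2_accumulation_ms, ms1_accumulation_ms=0.0):
    """Duration of one pass over all windows, ignoring instrument overhead."""
    return (n_windows * ms2_accumulation_ms + ms1_accumulation_ms) / 1000.0


def points_per_peak(peak_width_s, cycle_s):
    """Approximate number of data points sampled across one elution peak."""
    return peak_width_s / cycle_s


# Scheme described in the text: 25 m/z windows covering 400 to 1200 m/z, 100 ms each.
windows = build_windows(400.0, 1200.0, 25.0)
cycle = cycle_time_s(len(windows), ms2_accumulation_ms=100.0)
print(len(windows), cycle)  # 32 windows, 3.2 s per cycle

# Assumed peak width of 30 s (hypothetical value for illustration only).
print(points_per_peak(30.0, cycle))
```

Halving the window width in this sketch doubles the number of windows and, at constant accumulation time, doubles the cycle time, which is the balance the paragraph above refers to.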
One of several possible DIA analysis strategies developed for SWATH-MS data is "peptide-centric scoring" (Ting et al, 2015). In this strategy, the acquired dataset is queried for the presence of a specific peptide using a set of peptide query parameters that relate to that peptide. This is most typically achieved by using previously generated knowledge in the form of spectral libraries. This strategy, also referred to as "targeted data extraction", was described in the initial SWATH-MS publication (Gillet et al, 2012) and has remained the primary approach for DIA data analysis in applied projects to date.
In summary, in SWATH-MS all ionized peptides that fall within the isolated precursor mass range are fragmented and recorded in a systematic and unbiased fashion. Qualitative and quantitative information in SWATH-MS data needs to be extracted from highly multiplexed MS2 spectra, which requires advanced processing strategies, such as peptide-centric scoring. Altogether, SWATH-MS is a good compromise between the proteome-coverage capabilities of DDA-based proteomics and the quantitative consistency of targeted proteomics.
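The extraction step behind peptide-centric querying can be illustrated with a toy example. The sketch below is not the algorithm of any particular SWATH-MS tool; it only shows the idea of taking the fragment m/z values of one queried peptide, selecting the MS2 scans whose isolation window contains its precursor, and building one chromatogram per fragment. The scan layout (dictionaries with `rt`, `window` and `peaks`), the function names and the 50 ppm tolerance are assumptions made for illustration.

```python
def within_ppm(observed_mz, target_mz, tol_ppm):
    """True if observed_mz lies within tol_ppm of target_mz."""
    return abs(observed_mz - target_mz) <= target_mz * tol_ppm * 1e-6


def extract_fragment_xics(ms2_scans, precursor_mz, fragment_mzs, tol_ppm=50.0):
    """Return {fragment m/z: [(retention time, summed intensity), ...]}.

    ms2_scans is a list of dicts with keys 'rt', 'window' (lower, upper)
    and 'peaks' (list of (mz, intensity)); this is a made-up toy format.
    """
    xics = {mz: [] for mz in fragment_mzs}
    for scan in ms2_scans:
        lower, upper = scan["window"]
        if not (lower <= precursor_mz < upper):
            continue  # the queried precursor was not isolated in this scan
        for target in fragment_mzs:
            intensity = sum(
                i for mz, i in scan["peaks"] if within_ppm(mz, target, tol_ppm)
            )
            xics[target].append((scan["rt"], intensity))
    return xics


# Three toy MS2 scans; the second one belongs to a different isolation window.
scans = [
    {"rt": 10.0, "window": (500.0, 525.0), "peaks": [(300.000, 50.0), (450.001, 20.0)]},
    {"rt": 10.0, "window": (525.0, 550.0), "peaks": [(300.000, 999.0)]},
    {"rt": 13.2, "window": (500.0, 525.0), "peaks": [(300.001, 80.0), (450.000, 35.0)]},
]
xics = extract_fragment_xics(scans, precursor_mz=510.3, fragment_mzs=[300.0, 450.0])
print(xics)
```

A real analysis tool would additionally score the co-elution, peak shape and relative intensities of these chromatograms against the spectral library and assign a statistical confidence, which this sketch leaves out.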

Advantages of SWATH-MS
The principles of SWATH-MS outlined above result in specific performance benefits and limitations, which are summarized and compared in Table 1. The favourable performance characteristics of SWATH-MS and other DIA-based methods have contributed to their recent popularization (Doerr, 2015) and are discussed in the following paragraphs.

Ease of data acquisition
An important advantage of SWATH-MS is the simplicity of data acquisition. A SWATH-MS measurement requires definition of the precursor isolation scheme, which includes parameters such as precursor isolation window widths, m/z range to cover, and accumulation time for the MS2 scans. Once this acquisition scheme has been optimized for a certain sample type on a given LC-MS/MS system, all samples of that particular type can be analysed with the same method. The acquisition scheme has also proven quite robust across sample types. This ease in data acquisition is similar to DDA acquisition, and an advantage over targeted data acquisition where peptide precursor ions (PRM) or peptide-fragment ion pairs (SRM) need to be defined upfront for every peptide of interest and where data acquisition often needs to be scheduled according to retention time (Picotti & Aebersold, 2012).

Breadth of peptide detection and multiplexing
A second advantage of SWATH-MS is the high multiplexing rate achievable for peptide identification and quantification. Similar to DDA-based methods, but in contrast to targeted proteomics, it is possible to query and detect 10,000s of peptides in a single SWATH-MS injection. This has been demonstrated, for example, for human cell lines such as HEK293, where routinely 30,000-40,000 peptides were identified from which 4000-5000 proteins were inferred at a 1% protein FDR cut-off in a 2-hour run. Therefore, SWATH-MS provides a dramatic increase in the number of peptides that are quantified in a single run when compared with targeted proteomics by SRM or PRM. The higher number of quantified peptides leads to a higher number of inferred proteins and to an increased coverage of their primary amino acid sequence. This can improve protein inference, can lead to more precise and accurate protein quantification and may allow a more extensive mapping of posttranslational modifications or protein variants. Furthermore, a high multiplexing rate is beneficial for global data normalisation. With SRM and PRM, global normalisation between samples is often difficult, because the underlying assumption that the major fraction of the detected proteome does not change its concentration between samples is often not true (especially if the target peptides were selected based on a prior hypothesis of being involved in the studied biological process). SWATH-MS, just like DDA, allows parallel monitoring of a set of peptides that is potentially unbiased with regard to regulation, which is necessary to perform global normalisation.
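Global normalisation as discussed above rests on the assumption that most detected peptides do not change between samples. One simple and widely used way to apply that assumption is median scaling, sketched below. This is an illustrative choice and not necessarily the normalisation used in the studies cited here; the function name, run names and intensities are hypothetical.

```python
from statistics import median


def median_normalise(intensities_by_run):
    """Scale every run so that its median peptide intensity equals the
    median of all run medians.

    intensities_by_run: {run name: {peptide: intensity}}
    """
    run_medians = {
        run: median(values.values()) for run, values in intensities_by_run.items()
    }
    reference = median(run_medians.values())
    return {
        run: {pep: x * reference / run_medians[run] for pep, x in values.items()}
        for run, values in intensities_by_run.items()
    }


# Toy data: sample_B was loaded at half the amount of sample_A.
runs = {
    "sample_A": {"PEPTIDEA": 100.0, "PEPTIDEB": 200.0, "PEPTIDEC": 400.0},
    "sample_B": {"PEPTIDEA": 50.0, "PEPTIDEB": 100.0, "PEPTIDEC": 200.0},
}
normalised = median_normalise(runs)
print(normalised)
```

With only a handful of hypothesis-driven targets, as in a typical SRM or PRM assay, the run median may itself reflect regulated peptides, which is why the broad and unbiased peptide set of SWATH-MS or DDA makes this kind of scaling more defensible.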

Reproducibility and consistency
When proteomic studies take on a scale of hundreds of samples, reproducibility of peptide detection and quantification becomes a major issue. In DDA-based proteomics, consistency in peptide sequencing is difficult to achieve due to the heuristic precursor selection carried out by the mass spectrometer in real time. Therefore, when a peptide is not identified in a particular DDA measurement, it is not justified to conclude that this peptide is not present at a detectable level in the sample ("true negative peptide quantity"). Instead, it might be that this peptide was not selected for fragmentation due to the presence of many co-eluting peptides ("false negative peptide quantity") (Michalski et al, 2011). An important improvement, which helped to decrease the number of such false negatives, was the development of analysis tools that transfer peptide identifications between samples and thereby improve the completeness of the quantitative data matrix (Cox et al, 2014; Mueller et al, 2007; Prakash et al, 2006). In contrast, in SRM, PRM and SWATH-MS, MS2 data acquisition is performed in a systematic fashion and a peptide-centric analysis strategy is used. In combination, these strategies make it possible to conclude with higher confidence that a targeted peptide is correctly "not detected" and therefore indeed not present in the sample at a concentration above the lower limit of detection. Nonetheless, when automated data analysis tools are used to query thousands of proteins, or tens of thousands of peptides, "false negative peptide quantities" will also occur to some extent in SWATH-MS data. However, just as for DDA, software tools have been established for SWATH-MS that align runs and transfer identifications between them.
Due to the increased sensitivity of fragment ion detection compared to precursor ion detection, as well as due to the higher information content of MS2 data (co-elution of many fragment ion chromatograms compared to just one precursor ion chromatogram), this transfer of identifications can be more robust and sensitive for SWATH-MS than for DDA. Finally, for SRM (Abbatiello et al, 2015; Addona et al, 2009) as well as SWATH-MS, excellent intra- and inter-laboratory reproducibility has been reported, with coefficients of variation (CV) below 20%, even in large-scale studies including several international laboratories. Consequently, such high levels of reproducibility and consistency lead to high-quality quantitative data matrices that are suited to address a wide variety of biological questions (Rost et al, 2015).
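The coefficient of variation referred to above is the standard deviation of replicate measurements divided by their mean. The following minimal sketch computes it per peptide and reports the fraction of peptides below a cut-off; the 20% default mirrors the figure quoted in the text, while the function names and replicate intensities are invented for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    return 100.0 * stdev(values) / mean(values)


def fraction_below_cv(replicates_by_peptide, cv_cutoff=20.0):
    """Return per-peptide CVs and the fraction of peptides below the cut-off."""
    cvs = {
        pep: coefficient_of_variation(values)
        for pep, values in replicates_by_peptide.items()
    }
    passing = [pep for pep, cv in cvs.items() if cv < cv_cutoff]
    return cvs, len(passing) / len(cvs)


# Hypothetical intensities of two peptides measured in three replicate runs.
replicates = {
    "PEPTIDEA": [100.0, 110.0, 90.0],
    "PEPTIDEB": [100.0, 200.0, 50.0],
}
cvs, fraction = fraction_below_cv(replicates)
print(cvs, fraction)
```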

Retrospective querying
SWATH-MS data is comprehensive in the chromatographic and mass dimensions in both the MS1 precursor and MS2 fragment ion spaces. This is in contrast to DDA data, where precursor ions are continuously recorded on MS1 level but fragment ions are not, so that the MS2 data is inherently incomplete. SWATH-MS data are well suited to be re-analysed at a later time point, when new proteins, peptides or PTMs that were not part of the first biological hypothesis, or for which peptide query parameters were initially not available, should be included. This is unlike SRM or PRM analyses, where new measurements are required for novel target proteins or peptides.

Analysis of modified peptides
An additional promise of SWATH-MS is its usefulness for identifying and quantifying modified peptides, for localising the amino acid position of a modification within a peptide sequence and, more generally, for searching for previously unexpected analytes. These important tasks are supported in SWATH-MS by the high-resolution and accurate mass MS2 spectra that are recorded repeatedly over the whole elution profile of a peptide. Therefore, maximally informative extracted ion chromatograms (XICs) can be obtained from the data at the precursor (MS1) and fragment ion (MS2) level without concerns of stochastic sampling or dynamic exclusion that might hinder identification of isobaric species, as is the case in DDA. Furthermore, for the purpose of PTM-site localisation, fragment ions that are diagnostic for the position of the modified amino acid can be extracted and used to discriminate between possible sites of modification, because full MS2 spectra are available (Keller et al, 2016). Finally, SWATH-MS in combination with open modification search tools offers unique possibilities to discover unanticipated modifications through the peptide-centric query strategy, which does not suffer from the combinatorial explosion of the search space (Keller et al, 2016; Wang et al, 2015). However, intrinsic challenges when analysing modified peptides apply to SWATH-MS in the same way as to any other bottom-up proteomic approach: the possibly low modification stoichiometry and the high dynamic quantification range this requires, as modified peptides tend to be present at much lower concentrations than other peptides in the sample.
In summary, the above-mentioned performance characteristics make SWATH-MS ideally suited for projects that require accurate quantitative information for a large fraction of peptides in a given sample and that include large sample cohorts to be analysed in a maximally reproducible and consistent way.
Typical projects that require these properties include systems biology studies (Bensimon et al, 2012), genetic association studies (Okada et al, 2016; Williams et al, 2016), clinical screens (Liu et al, 2014; Sajic et al, 2015), drug/perturbation screens (Litichevskiy et al, 2017) or exploratory basic research (Collins et al, 2013; Lambert et al, 2013; Parker et al, 2015; Schubert et al, 2015b; Selevsek et al, 2015). SWATH-MS is also well suited for fast analyses, where proteome coverage of 50% of the MS-detectable proteome in complex mammalian samples can be achieved in single-shot analyses (Bruderer et al, 2017).

Limitations and challenges of SWATH-MS

Prior knowledge and the availability of a spectral library
In the most routinely and successfully applied SWATH-MS workflows, peptide-centric scoring requires prior knowledge about the chromatographic and mass spectrometric behaviour of the queried peptides in the form of a high-quality spectral library (Schubert et al, 2015a). Usually, spectral libraries are generated from discovery proteomics data obtained by DDA, which means that only peptides that were previously detected are contained in the spectral library. In recent years, alternative data analysis approaches for SWATH-MS data that do not require spectral libraries have been developed; examples are DIA-Umpire (Tsou et al, 2015), FT-ARM (Weisbrod et al, 2012), and PECAN (Ting et al, 2017).

Ease of data analysis
Currently, the main challenge in SWATH-MS experiments is the peptide-centric data analysis. The highly multiplexed MS2 spectra originating from 10s to 100s of co-fragmented peptide precursor ions require a sophisticated analysis pipeline to detect and quantify peptides and assign a measure of statistical confidence. For DDA-based proteomics, a wealth of analysis pipelines has been developed over the past ~20 years (Cox & Mann, 2008; Deutsch et al, 2015; Keller et al, 2005; Reinert & Kohlbacher, 2010). Mature software tools have also been readily available for targeted proteomics for several years (Colangelo et al, 2013; Teleman et al, 2012). In comparison, automated peptide-centric SWATH-MS analysis is at an earlier stage of development, but several analysis tools are readily available, some of them freely as open source software. A comparison of five widely used tools shows that they largely agree in their detection and quantification results.

Selectivity, sensitivity and dynamic quantification range
A major challenge of SWATH-MS, as of any other mass spectrometric approach, is the limited dynamic quantification range afforded by the instruments. Several groups have reported that in complex proteomic samples quantification by fragment ion signals (MS2) is more sensitive than by the corresponding precursor signals (MS1) due to a better signal-to-noise ratio, selectivity, and intra-scan dynamic range (Egertson et al, 2013; Gillet et al, 2012; Venable et al, 2004). In complex samples there is a greater likelihood that an MS1 signal for a given peptide will suffer interference from the signal of another analyte with the same or a very similar m/z value. Signals at the MS2 level, on the other hand, are less prone to interferences due to the additional filtering step of precursor isolation and the fact that several fragment ion signals can be combined for quantification.
In addition to selectivity considerations, a gain in sensitivity can be expected in trapping instruments, because the signal for a given peptide in MS1 scans may be limited by automatic gain control effects driven by other, more abundant peptides from across the mass range, whereas in SWATH-MS this effect is significantly reduced and limited to peptides co-isolated in the same SWATH precursor isolation window. In its current implementation on a Q-TOF instrument, SWATH-MS covers a peptide concentration range of 4.0 to 4.5 orders of magnitude in a single injection of a complex proteomic sample, such as a whole-cell lysate of a human cell line. Reported lower limits of quantification in this complex sample are in the mid-attomole to low femtomole range (on-column). This corresponds to a 3- to 10-fold lower sensitivity than in state-of-the-art SRM or PRM measurements (Gillet et al, 2012; Liu et al, 2013; Schmidlin et al, 2016).
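The relationship between a lower limit of quantification and a dynamic range expressed in orders of magnitude is a base-10 logarithm of the ratio between the highest and lowest quantifiable amounts. The short sketch below illustrates this; the 500 amol lower limit is an assumed value chosen from within the "mid-attomole" range mentioned above, not a reported measurement, and the function names are hypothetical.

```python
import math


def orders_of_magnitude(lowest_amount, highest_amount):
    """Dynamic range as log10 of the ratio between highest and lowest amount."""
    return math.log10(highest_amount / lowest_amount)


def highest_quantifiable(lowest_amount, orders):
    """Upper end of the range implied by a lower limit and a dynamic range."""
    return lowest_amount * 10.0 ** orders


# Assumed lower limit of quantification of 500 amol on-column (illustrative).
lloq_amol = 500.0
upper_amol = highest_quantifiable(lloq_amol, 4.0)
print(upper_amol)
print(orders_of_magnitude(lloq_amol, upper_amol))

# A 3- to 10-fold more sensitive method would start at a 3- to 10-fold lower amount.
print(lloq_amol / 10.0, lloq_amol / 3.0)
```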
In conclusion, a current drawback of SWATH-MS compared to SRM or PRM is that peptide quantification with SWATH-MS is still less sensitive. Hence, for projects that involve quantification of especially low-abundance peptides with maximal accuracy, targeted data acquisition remains the better option, especially if only a few peptides are of interest. Further, SWATH-MS measurements require some upfront effort on spectral library and peptide query parameter generation and optimization.