Rapid and in‐depth proteomic profiling of small extracellular vesicles for ultralow samples

The integration of robust single‐pot, solid‐phase‐enhanced sample preparation with powerful liquid chromatography‐tandem mass spectrometry (LC‐MS/MS) is routinely used to define the extracellular vesicle (EV) proteome landscape and underlying biology. However, EV proteome studies are often limited by sample availability, requiring upscaling of cell cultures or larger volumes of biofluids to generate sufficient material. Here, we have refined data‐independent acquisition (DIA)‐based MS analysis of the EV proteome by optimizing both protein enzymatic digestion and chromatography gradient length (ranging from 15 to 44 min). Our short 15 min gradient can reproducibly quantify 1168 (from as little as 500 pg of EV peptides) to 3882 protein groups (from 50 ng peptides), including robust quantification of 22 core EV marker proteins. Compared to data‐dependent acquisition, DIA achieved significantly greater EV proteome coverage and quantification of low‐abundance protein species. Moreover, we have optimized magnetic bead‐based sample preparation tailored to low quantities of EVs (0.5 to 1 µg protein) to obtain sufficient peptides for MS quantification of 1908–2340 protein groups. We demonstrate the power and robustness of our pipeline in obtaining sufficient EV proteome granularity across different cell sources to ascertain known EV biology. This underscores the capacity of our optimised workflow to capture a precise and comprehensive proteome of EVs, especially from ultra‐low sample quantities (sub‐nanogram), an important challenge in the field where obtaining in‐depth proteome information is essential.

One major challenge in proteomics studies of EVs is sample availability. Although EVs are often described as abundantly secreted entities, their protein concentrations in the cell secretome (compared to soluble secreted factors) or in biofluids such as plasma (compared to soluble proteins like albumin) are relatively low. Combined with the lack of protein amplification mechanisms, this has traditionally necessitated upscaling of cell cultures or larger volumes of biofluids to obtain sufficient EV quantities for mass spectrometry (MS)-based proteome profiling. The focus of this paper is on integrating rapid on-bead enzymatic digestion of low quantities of small EVs (sEVs) with short liquid chromatography (LC) gradients for data-independent acquisition (DIA)-based MS analysis to provide an in-depth proteome [32]. We show that this ultra-low proteome workflow is a robust, standardized platform that can support comprehensive analysis of the sEV proteome from 0.5-1 µg initial sample amounts and sub-nanogram (500 pg) load amounts. These findings further demonstrate the power and robustness of our pipeline in obtaining sufficient EV proteome granularity across different cell sources to ascertain known EV biology.

Significance Statement
The form and function of extracellular vesicles (EVs) are defined by their proteome. This knowledge is essential to describe and understand EVs, encompassing their marker proteins, capacity as a signalling platform, and utility as diagnostic tools and therapeutic targets. However, EVs are low-abundance entities in the secretome and biological sam-

2.1 Cell culture and large-scale purification of sEVs
SW480 and SW620 cell lines were cultured in a CELLine AD-1000 bioreactor device as described [33], with 20 mL of culture media harvested for each biological replicate (n = 6). SW480 and SW620 culture media were sequentially centrifuged at 500 × g for 5 min (to remove floating cells), 2000 × g for 10 min (to remove apoptotic body debris), and 10,000 × g for 30 min at 4 °C. The 10K supernatant was centrifuged at 100,000 × g (1 h, 4 °C), with the pellet resuspended in 500 µL PBS and subjected to OptiPrep buoyant density gradient centrifugation as described [12,16]. Purified sEVs were harvested and biophysically characterized as per our previous studies [34,35]. Protein quantification of samples was performed as described [33].
Initially, proteomic sample preparation parameters were evaluated using limited amounts of cell lysate. Cell-derived samples (SW620) were subjected to varying conditions of reduction/alkylation and enzymatic digestion time (Table S1). The final concentrations of DTT (10 mM) and IAA (20 mM) remained constant, with a 30-min reduction and a 15-min alkylation step. Bead amounts remained constant throughout, at an excess of 200 µg per sample. The enzyme amount per volume remained constant in each sample (0.16 µg enzyme/100 µL digestion buffer), with the enzyme-to-protein ratio adjusted based on the amount of protein in each sample group (1:3 to 1:25). Samples were digested for 0.5, 1, or 4 h at 37 °C, and peptides were collected and processed as described, with direct proteomic analysis (Table S1). We report an average peptide recovery of 48% (of starting protein quantity) across all preparations, with the 4 h digestion resulting in the highest peptide recovery for all initial protein amounts (Table S1).
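The relationship between a fixed enzyme concentration and the resulting enzyme-to-protein ratio can be sketched as follows. This is a minimal illustration only: the 100 µL digestion volume, the example protein inputs, and the function names are assumptions made for the example and are not taken from Table S1.

```python
def enzyme_to_protein_ratio(protein_ug, enzyme_ug_per_100ul=0.16, digest_volume_ul=100.0):
    """Return X for an enzyme:protein ratio of 1:X (w/w).

    digest_volume_ul is an assumed value; the text only states the
    enzyme concentration (0.16 ug per 100 uL digestion buffer).
    """
    enzyme_ug = enzyme_ug_per_100ul * digest_volume_ul / 100.0
    return protein_ug / enzyme_ug


def peptide_recovery_percent(peptide_ug, protein_ug):
    """Peptide recovery expressed as a percentage of starting protein."""
    return 100.0 * peptide_ug / protein_ug


if __name__ == "__main__":
    # Hypothetical protein inputs chosen to span the stated 1:3 to 1:25 range.
    for protein_ug in (0.48, 1.0, 4.0):
        ratio = enzyme_to_protein_ratio(protein_ug)
        print(f"{protein_ug} ug protein -> enzyme:protein = 1:{ratio:.1f}")
```

Under these assumptions, holding the enzyme amount fixed means that smaller protein inputs are digested at proportionally higher enzyme-to-protein ratios.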
Sample preparation of sEV samples (0.5 µg and 1 µg) was then performed. Samples were analysed in triplicate and digested for 4 h at 37 °C with enzyme-to-protein ratios varying from 1:3 to 1:6. Peptides were collected and processed as described, with direct proteomic analysis.
MS acquisition was performed in either data-dependent (DDA) or data-independent (DIA) mode. For DDA, full-scan MS spectra were acquired from 300-1400 m/z (60,000 resolution, 3 × 10⁶ automatic gain control (AGC) target, 122 ms injection time), followed by data-dependent MS/MS acquisition (top 30) with CID. MS2 was set to 15,000 resolution, 1e5 AGC target, 27 ms maximum injection time (IT), 28.5% normalized collision energy, and 1.3 m/z quadrupole isolation width. Precursor ions with unassigned charge states, or with charge states of 1 or 6-8, were rejected, and peptide match was set to preferred. Sequenced ions were dynamically excluded for 30 s (Table S2).
For DIA, full-scan MS spectra were acquired over 350 to 1100 m/z at 60,000 resolution, using an automatic gain control (AGC) target of 3 × 10⁶, a maximum injection time of 50 ms, and 1 microscan. MS2 was set to 15,000 resolution and 1e6 AGC target, with the first fixed mass set to 120 m/z. The default charge state was set to 2 and spectra were recorded in centroid mode. Total scan windows were optimized for each gradient method (15, 20, 30, and 44 min) (Table S2), and staggered isolation windows from 350 to 1100 m/z were applied with 28% normalized collision energy. For each gradient length method, DIA workflows (scan windows, isolation window schemes, and mass ranges) were optimised using FreeStyle (v1.8 SP2) and Skyline (v21.2.0.369) as described [32] (Table S2).
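To illustrate how a staggered isolation scheme tiles the 350 to 1100 m/z range, the following Python sketch lays out fixed-width windows and a second cycle offset by half a window width. The window count of 30 is a hypothetical value chosen for round numbers and is not a setting from Table S2. The function names are illustrative only, and instrument method editors may treat the range edges differently.

```python
def dia_windows(mz_start, mz_end, n_windows):
    """Split [mz_start, mz_end] into n_windows contiguous, equal-width windows."""
    width = (mz_end - mz_start) / n_windows
    return [(mz_start + i * width, mz_start + (i + 1) * width)
            for i in range(n_windows)]


def staggered_cycle(windows):
    """Return the same windows shifted up by half a window width.

    Alternating the base and shifted cycles gives overlapping coverage,
    which is the idea behind staggered (overlapping) DIA windows.
    """
    half_width = (windows[0][1] - windows[0][0]) / 2.0
    return [(lo + half_width, hi + half_width) for lo, hi in windows]


if __name__ == "__main__":
    base = dia_windows(350.0, 1100.0, 30)  # 30 windows is an assumed count
    shifted = staggered_cycle(base)
    print("window width (m/z):", base[0][1] - base[0][0])
    print("first base window:", base[0])
    print("first shifted window:", shifted[0])
```

In this simplified layout, each precursor m/z falls into one window of the base cycle and one of the shifted cycle, so the effective isolation width after demultiplexing is half the acquired width.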

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the MassIVE partner repository and are available via MassIVE with identifier MSV000092668.

Proteomic data processing and analysis
Protein sequences of Homo sapiens (UP000005640, 81,837 entries; Mar 2023) were obtained from UniProt. Peptide identification and quantification for DDA data were performed using MaxQuant (v1.6.14) with its built-in search engine Andromeda as described [16].
DIA data were processed using DIA-NN (v1.8). Spectral libraries were predicted using the deep learning algorithm employed in DIA-NN with Trypsin/P, allowing up to 1 missed cleavage. The precursor charge range was set to 1-4, and the precursor m/z range was set to 300-1800 for peptides of 7-30 amino acids, with N-terminal methionine excision enabled, cysteine carbamidomethylation set as a fixed modification, and the maximum number of variable modifications set to 0. The mass spectra were analysed using default settings with a false discovery rate (FDR) of 1% for precursor identifications and match between runs (MBR) enabled for replicates. Contaminants and reverse identifications were removed from the resulting output files, which were further analysed using Perseus (v2.0.7.0). Perseus was applied for downstream data processing and analysis. A data quality cut-off was applied, requiring a minimum of 50% protein group quantification in at least one group.
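The data quality cut-off described above (at least 50% valid values in at least one group) can be expressed as a short filter. The sketch below is a plain-Python illustration of that rule under our own data layout (a dictionary of group name to replicate intensities, with `None` or NaN marking a missing value). It is not the Perseus implementation, and the function name is hypothetical.

```python
import math


def passes_valid_value_filter(intensities_by_group, min_fraction=0.5):
    """Keep a protein group if any sample group has >= min_fraction valid values.

    intensities_by_group maps a group name to a list of replicate
    intensities; None or NaN marks a missing value (assumed layout).
    """
    for values in intensities_by_group.values():
        if not values:
            continue
        n_valid = sum(1 for v in values if v is not None and not math.isnan(v))
        if n_valid / len(values) >= min_fraction:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical protein group quantified in 2 of 4 replicates in one group.
    example = {
        "SW480": [21.3, None, None, None],
        "SW620": [20.8, 21.1, None, None],
    }
    print(passes_valid_value_filter(example))
```

A protein group is retained as soon as one group meets the threshold, so proteins specific to a single cell source are not discarded merely because they are absent from the other.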

Nanoparticle tracking and imaging analysis
sEVs from SW480 and SW620 were captured with antibodies against CD63, CD81, and CD9, immobilized, labelled with ONI EV Profiler Kit anti-CD9-CF 488 (yellow), anti-CD63-CF 568 (cyan) and anti-CD81-CF 647 (magenta) and imaged using dSTORM on the ONI Nanoimager as per manufacturer's instructions, with an exposure time of 30 ms for 3000 captured frames.

RESULTS
In this study, we questioned whether the proteome of small EVs (sEVs) could be defined in an optimized, high-sensitivity workflow from sub-nanogram starting quantity, combined with optimized and rapid proteome analysis. Importantly, we performed a comprehensive analysis using different modes of mass spectrometry acquisition and short chromatography gradient lengths, and applied this pipeline to investigate the detection and quantification of the sEV proteome.
Within this study we used sEVs that were isolated from SW480 and SW620 cells using differential centrifugation coupled to density gradient separation as previously described [41,42] and characterized (for buoyant density at 1.07-1.11 g/mL, size, morphology) as described [16] to meet the experimental requirements set out by the International Society for Extracellular Vesicles (ISEV) guidelines [43]. Orbitrap-based (QE HF-X) MS has been utilized in ultrafast proteomics because of its fast scan rate in high-sensitivity MS [32]. This study aimed to establish a high-sensitivity method based on ultra-low sample quantity [44] combined with short chromatography [45] and label-free MS by optimized DIA.
We first assessed two MS-based acquisition modes (DDA/DIA) for their ability to interrogate the sEV proteome (Figure 1). Next, to find the minimal amount of peptides required to obtain a comprehensive proteome, we optimised the sEV peptide load as a function of chromatography gradient length (Figure 2). Then, to ascertain the EV amount required to obtain this minimum amount of peptides, we optimised sEV protein enzymatic digestion (variables including Lys-C/trypsin amount and digest time) (Figure 3).

Comparative analysis of DDA and DIA for sEV proteome
In our three previous studies, we have reported the extensive proteomes (∼2000 proteins) of small EVs (sEVs) from SW480 cells using MS-based analysis in data-dependent acquisition (DDA) mode [16,41], which collectively serve as a reference proteome for what can currently be ascertained in the EV field.
As an alternative to DDA, DIA is gaining rapid traction for its ability to obtain broad protein coverage, high reproducibility, sensitivity, and accuracy [32,44-46]. Therefore, we subjected sEVs to a standard proteomic sample preparation pipeline (starting from 10 µg protein, with an 18 h enzymatic digest) and performed MS-based analyses in either DDA or DIA acquisition mode (Figure 1A). The acquisition mode was assessed using a 44 min LC gradient, where we maintained the configuration using a 75 µm I.D. × 25 cm column in the direct injection workflow (Figure 1B). A liquid junction on the column inlet and a zero-dead-volume post-column connection to the emitter ensured stable ionization and minimal post-column dispersion to maintain chromatographic performance.
Because the DIA workflow can interrogate a greater proportion of the sEV proteome, we next aimed to optimise this workflow for low-input EV proteomics.

Optimization of LC gradient length for DIA analysis of sub-nanogram sEV peptides
Next, we optimised the DIA workflow using shorter LC gradients and nanogram (ng) peptide amounts. For optimisation, we employed cell lysates from LIM1863 cells, monitoring protein identifications and precursor identifications for 1 ng to 100 ng (peptide) loads across LC gradients of 15-44 min (Figure S3). We evaluated different window sizes in DIA across these different LC gradient lengths (Table S2). While keeping scan speed, resolution, and AGC target constant, the window size was optimized for each LC gradient length to reflect spectral complexity and ion intensity and so enable optimal protein identification; window size for 15 min (21 loop size), 20 min (15), 30 min (28), and 44 min (38) (Table S2). A linear increase in protein group and precursor identifications from 1 ng to 100 ng LIM1863 cell digest was observed (Figure S3). Injections with larger sample amounts (100 ng) resulted in an increase in protein identifications at longer gradient lengths (5171, 44 min; 4340, 20 min) due to more intense peaks and, consequently, more ions for sampling. For ultra-low sample loads (1 and 3 ng) we did not observe a significant improvement in protein group or precursor identification across 15-30 min gradients. Importantly, we observe low variation in all LC gradients and load quantities (Figure S3), confirming reproducible nLC-MS performance for protein quantification.

FIGURE 3 Optimising proteome sample preparation pipeline for ultra-low quantities of sEVs. (A) We next applied a rapid ultrasensitive proteome sample preparation workflow to the analysis of ultralow starting quantities of sEVs. Here, sEVs (0.5 µg and 1 µg) from SW480 and SW620 cells (n = 4) were processed using solid-phase paramagnetic bead technology for sequential reduction, alkylation, and rapid enzymatic digestion. sEVs were analyzed on the optimized 15 min gradient combined with DIA proteome profiling. (B) Total intensity profile for each condition (0.5 µg and 1 µg protein quantity) and cell source. (C) Coefficient of variance (median; boxes represent the interquartile range, and whiskers extend to the 5%-95% percentiles) at a protein level, and (D) protein groups and (E) peptide identifications across each sample amount and cell line (mean). (F) EV marker proteins identified in each group. (G) Bar graph showing the missingness (%) for proteins not detected relative to each group. Within each group we report non-detected proteins in either one, two, or three out of the four biological replicate samples. (H) Heatmap expression analysis showing the missingness (%) for proteins not detected relative to each group (white). Within each group we report missing/non-detected proteins which, based on their intensity distribution, are of low abundance.

We applied this sub-nanogram LC-MS workflow, using shorter LC gradient lengths and DIA mode, to sEVs as an approach to describe the sEV proteome (Figure 2A-D, Figure S4). For a 50 ng load, we identify more than 3730 proteins for all LC gradient lengths assessed (15-44 min), with 4599 proteins identified for the 44 min gradient workflow (Table S10). For shorter LC gradients (15 and 20 min) we did not observe a significant difference in protein groups or peptides identified. This short gradient approach quantified 3882 proteins from a 50 ng load (Figure 2C). For 50 ng sEV digest, we observe a highly reproducible quantitative proteome across different LC gradients (median protein coefficient of variation, CV, of ∼10%) (Figure 2E).
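For readers less familiar with this reproducibility metric, the median protein-level CV can be computed as sketched below. This is an illustrative calculation only: it assumes CVs are taken on non-log-transformed intensities across replicates using the sample standard deviation, and the protein names and values are invented for the example.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation (%) = sample standard deviation / mean * 100."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def median_cv(protein_intensities):
    """Median of per-protein CVs; proteins with fewer than 2 values are skipped."""
    cvs = [cv_percent(v) for v in protein_intensities.values() if len(v) >= 2]
    return statistics.median(cvs)


if __name__ == "__main__":
    # Hypothetical replicate intensities (arbitrary units), not study data.
    example = {
        "protein_1": [90.0, 100.0, 110.0],
        "protein_2": [10.0, 10.0, 10.0],
        "protein_3": [80.0, 100.0, 120.0],
    }
    print(f"median CV: {median_cv(example):.1f}%")
```

Reporting the median rather than the mean keeps the summary robust to a small number of highly variable, typically low-abundance, proteins.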
Further, for 500 pg and 1 ng sEV analyses, we reproducibly quantify 19 and 21 EV core marker proteins, respectively (Figure 2F).
We further analyzed non-detected proteins (missingness) within each group, in either one, two, or three out of the four biological replicate samples (Figure 2G). We observe greater proteome variance when reduced peptide loads are sampled, most likely attributable to greater missingness of low-abundance proteins in each group (Figure 2E/G). We highlight that these missing proteins are predominantly of low abundance, based on the proteome coverage and abundance intensity distribution for the 500 pg and 1 ng sEV proteomes (Figure 2H, Table S12).
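The missingness breakdown described above can be tallied as in the following sketch. It assumes a simple layout in which each protein maps to its four replicate values with `None` marking a non-detection, and it expresses each category as a percentage of all proteins listed for the group. The denominator, the function name, and the example values are assumptions for illustration.

```python
from collections import Counter


def missingness_profile(protein_replicates):
    """Percentage of proteins missing in exactly 1, 2, or 3 replicates.

    protein_replicates maps a protein name to a list of replicate
    intensities, with None marking a non-detected value (assumed layout).
    """
    total = len(protein_replicates)
    counts = Counter(
        sum(v is None for v in reps) for reps in protein_replicates.values()
    )
    return {k: 100.0 * counts.get(k, 0) / total for k in (1, 2, 3)}


if __name__ == "__main__":
    # Hypothetical group of four proteins across four replicates.
    group = {
        "protein_1": [5.1, 5.0, 5.2, 5.1],
        "protein_2": [4.0, None, 4.1, 4.2],
        "protein_3": [3.0, None, None, 3.1],
        "protein_4": [2.0, None, None, None],
    }
    print(missingness_profile(group))
```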
Where 50 ng sample loads are available, this analysis workflow (15 min nLC gradient, DIA mode) provides a comprehensive (> 3800 proteins) and reproducible sEV proteome. When the sEV peptide amount is limited, this short gradient workflow can still be applied to sub-nanogram peptide quantities, although with reduced identifications and proteome coverage.

Optimization of proteome sample preparation for ultra-low quantities of sEVs
We next optimized enzymatic digestion (Lys-C/trypsin amount, digest time) for low amounts of sEVs (Figure 3, Figure S5). From our previous experience, SP3 sample preparation of sEV proteins (10 µg) typically results in peptide recovery of 30%-40%.
We also anticipate that this recovery will drop when lower amounts of sEV proteins are processed. With the aim of obtaining 50 ng peptides for MS analysis, and taking into account peptides for quantitation, dead volume for injection, and sample handling, we therefore reasoned that 0.5 µg to 1 µg sEV protein is the minimal amount required for single-pot, solid-phase-enhanced sample preparation (SP3) [49] processing. We interrogated this in sEVs from two cell lines (SW480 and SW620 cells). We have also extensively published proteomes of SW620 cell-derived sEVs (10 µg) [42,50], thus providing a reference proteome against which we can assess the effectiveness of our optimized workflow in obtaining a meaningful proteome.
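The reasoning behind the 0.5-1 µg starting amount can be made concrete with a short calculation. The sketch below applies the 30%-40% recovery figure as an optimistic bound, since recovery is expected to drop at lower inputs. The reserve set aside for quantitation and dead volume is a hypothetical parameter, and the function names are illustrative only.

```python
def expected_peptide_ng(protein_ug, recovery_fraction):
    """Expected peptide yield in ng from a protein input in ug."""
    return protein_ug * 1000.0 * recovery_fraction


def injections_available(peptide_ng, load_ng=50.0, reserve_ng=0.0):
    """Number of full injections of load_ng after setting aside reserve_ng.

    reserve_ng stands in for peptide used for quantitation, dead volume,
    and handling losses; its value is an assumption.
    """
    usable = max(peptide_ng - reserve_ng, 0.0)
    return int(usable // load_ng)


if __name__ == "__main__":
    for protein_ug in (0.5, 1.0):
        for recovery in (0.30, 0.40):
            yield_ng = expected_peptide_ng(protein_ug, recovery)
            print(f"{protein_ug} ug at {recovery:.0%} recovery "
                  f"-> about {yield_ng:.0f} ng peptides")
```

Under these assumptions, 0.5 µg of sEV protein yields on the order of 150-200 ng of peptides, which leaves only a small margin above a single 50 ng injection once quantitation and handling losses are accounted for.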
In addition, for each group we analyzed the missingness (non-detected proteins) in the proteomes (Figure 3G, H). We analyzed non-detected proteins within each group in either one, two, or three out of the four biological replicate samples. As expected, 0.5 µg sEV samples showed data missingness of 23.8% on average. For 1 µg sEV samples, the proportion of missing values was reduced to 18.3% (Table S16).
As expected, proteins that were not detected were typically low-abundance proteins (Figure 3H). Hence, our pipeline can obtain sufficient depth of the sEV proteome, with quantitative performance in the identification and quantification of EV marker proteins from ultra-low sEV amounts.

Ultra-low proteome analyses reveal small extracellular vesicle biology in the context of donor cell
We next interrogated whether proteome datasets obtained with our workflow retain sufficient granularity to cover the EV biology we have previously reported for SW480 and SW620 cells [52] (Figure 4). These cells were derived from the same patient, who initially presented with a primary colorectal cancer (from which SW480 cells were established) and later developed metastasis in the sentinel lymph node (from which SW620 cells were established); hence they serve as an excellent in vitro tool to study the progression of the disease.

DISCUSSION
In this study, we have tailored current sample preparation and nLC-MS/MS workflows for in-depth proteomic characterization of ultralow quantities of EVs. By optimising parameters associated with efficient protein digestion (4 h enzymatic digest), chromatography (15 min gradient) for 50 ng peptide injection, and MS data acquisition (DIA), our workflow can reproducibly quantify more than 3880 proteins, including coverage of low-abundance and EV core marker proteins. This represents a significant step towards describing a comprehensive proteome landscape for these low-abundance but essential signalling entities.
In-depth, reproducible, and robust quantification of proteins is critically important for the field of EVs, including EV-based proteomic studies. However, challenges in obtaining sufficient material following isolation, purification, and downstream analyses have increased the amount of source material required for proteome studies. We improve upon our existing pipeline for EV proteomic analysis [12,16,20,21,32,53], where we typically required 10 µg sEVs and longer MS run times.
Because a comprehensive workflow includes more than just sample preparation, our pipeline incorporates significantly reduced run times (while retaining sensitivity) owing to shorter gradients, which makes it amenable to large-scale EV studies. These refinements resulted in quantification of sEVs with high sensitivity and quantification accuracy.
In terms of MS acquisition modes, our findings complement reports of DIA performance in the quantitative analysis of scarce sample amounts that require high-sensitivity workflows (e.g., the single-cell context [44,54,55]) using different chromatography settings and DIA methodology.
Finally, this ultralow proteomic sample workflow is straightforward and can be easily implemented in quantitative proteomics studies applied to subtypes of EVs (where quantities are even lower than bulk EVs) as well as to EVs from different biological sources (i.e., cell-, tissue-, and biofluid-derived). Our pipeline could also be easily integrated into robotics platforms where needed, particularly for large-scale cohort studies. Future work will feature the development of an MS1-peptide identification library matching strategy to retrieve peptide precursors for unassigned quantitative frames, to improve proteome coverage, quantitative precision, and missing data levels, thus further enhancing data completeness and the proteome depth of low-abundance proteins (incl. cytokines [56]) and other EV-centric markers [27]. This could further be applied to the identification and analysis of post-translationally modified proteins/peptides important in the context of EV biology, such as glycosylation, acetylation, phosphorylation, and sumoylation [57]. This sub-nanogram approach applied to sEVs provides important considerations when limited sample analysis is required for investigating sEV composition and heterogeneity in a biological context.
ples, with disease-specific EV proteins often present in low abundance, challenging their detection and quantification due to the inherent challenges in dynamic range using mass spectrometry-based strategies. Combined with lack of protein amplification mechanisms, their proteomic studies require upscaling cell cultures or larger volumes of biofluids. Here, we outline high-sensitivity sample preparation and finely tuned LC gradients for DIA to obtain precise and comprehensive proteome of EVs from ultra-low sample amounts (sub-nanogram).

FIGURE 1 Comparative analysis of data dependent/independent acquisition for sEV proteomics. (A) Proteomic sample preparation workflow using solid-phase paramagnetic bead technology (e.g., initial EV protein quantity of 10 µg, conventional 18 h Lys-C (1:100) / trypsin (50:1) enzymatic digest). (B) Total intensity profile relative to retention time for DDA and DIA workflow (200 ng load, SW480 sEV TIC profile). (C) Comparative analysis of DDA and DIA approaches applied to EV proteome (n = 3) (from SW480 cells) (two-tailed, ****p < 0.0001); 200 ng peptide load using 44 min gradient method. EV proteomes for each acquisition were analyzed based on protein groups (DDA: 1763, DIA: 3988), peptide identifications (DDA: 14723, DIA: 35777), the percentage of missingness for each group, and coefficient of variance (median) at a protein level. Identification of EV marker proteins is provided for each group. Protein groups for each acquisition mode are shown, where valid values in at least two out of three biological replicates for each group are provided. (D) Relative abundance analysis of EV proteome (protein rank) from DIA (normalized log10 intensity value) is shown. (E) Protein rank abundance of DDA sEV proteome compared to DIA mode. Histogram represents proteins identified in DDA, relative to protein rank using DIA; all missing values indicate unique proteins identified in DIA (n = 3, identifications in two or more replicates, compared to DDA; unique is not detected in DDA in any replicate).

FIGURE 2 Optimization of nano chromatography gradient length for optimised DIA analysis of sub-nanogram sEV peptides. (A) EV proteome analysis was evaluated across different preset gradient conditions (15, 20, 30, and 44 min gradient length) for a 50 ng load of lysate digest (SW480 source, n = 4). Further, for the 15 min gradient workflow, we evaluated direct injection of limited EV peptide analyses (0.5-50 ng). Analyses include (B) Pearson correlation analysis, (C) protein groups, and (D) peptide identifications across each condition (mean). (E) Coefficient of variance (median; boxes represent the interquartile range, and whiskers extend to the 5%-95% percentiles) at a protein level. (F) EV marker proteins identified in each group. (G) Bar graph showing the missingness (%) for proteins not detected relative to each group. Within each group we report non-detected proteins in either one, two, or three out of the four biological replicate samples. (H) Log10 relative protein abundance (rank) of sEV proteome across the different gradient lengths and ultralow peptide inputs. Histogram represents proteins identified relative to intensity/abundance (log10); orange highlight indicates optimised 15 min gradient length combined with DIA analysis and 50 ng sEV peptide load.

FIGURE 4 Ultra-low proteome analyses reveal granularity in small extracellular vesicle biology from different donor cells. Using our developed EV proteome analysis pipeline, we performed ultrasensitive proteome sample preparation workflow from sub-microgram starting quantities of sEVs from different donor cells. (A) Based on initial starting quantity of sEV (0.5 µg and 1 µg), Pearson correlation matrix reveals distinct correlation in sEVs (normalised, 50 ng peptide amount injected for each sample) from SW480 and SW620 cells. (B) Comparative analysis of sEV proteome using EV core marker proteins and MISEV2018 recommended EV proteins. (C) Fluorescent nanoparticle tracking images of SW480 (left) and SW620 (right) sEVs positive for CD63 (cyan), CD81 (magenta), and CD9 (yellow) (labelled). (D) Scatter plot of relative protein abundance reported in our pipeline between SW620 sEVs versus SW480 sEVs. (E) Heatmap of differentially abundant proteins (p < 0.05, lfc > 1.2). (F) Bar plot of relative protein abundance (centered) of proteins known to regulate sEV function. (G) Gene Ontologies (Biological Processes) enriched in differentially abundant proteins (p < 0.05, lfc > 1.2).