Peptidomic analysis of mycobacterial secreted proteins enables species identification

Pulmonary disease arising from slow‐growing mycobacterial infections has emerged as an increasingly prevalent clinical concern over the past two to three decades. Proteins belonging to the family of ESAT‐6 secretion (Esx) systems play critical roles in the virulence of most pathogenic mycobacterial species and are associated with drug resistance. However, no clinical applications can detect and discriminate the expression of species‐specific variants of these proteins in clinical samples, such as early growth cultures, for rapid diagnosis of specific mycobacterial infections, which may require distinct interventions. Conventional immunoassay approaches are not suitable for this purpose due to the significant degree of conservation of Esx proteins among species. Herein we describe the development of a novel immunoprecipitation‐coupled mass spectrometry assay that can distinguish Esx proteins that are expressed by slow‐growing mycobacterial species commonly detected in clinical isolates. This approach uses custom antibodies raised against single semi‐conserved peptide regions in M. tuberculosis (Mtb) EsxB and EsxN to capture corresponding peptides from protein orthologs of mycobacteria associated with human respiratory infections, including Mtb, M. avium, M. intracellulare, M. kansasii, M. gordonae, and M. marinum, to detect these species in standard clinical cultures at the first sign of mycobacterial growth and thereby allow rapid disease diagnosis.


INTRODUCTION
M. tuberculosis (Mtb) infections can progress to tuberculosis (TB), which was the leading cause of death from respiratory infection prior to coronavirus disease 2019 and which, similar to severe acute respiratory syndrome coronavirus 2 infections, is primarily spread by respiratory droplet exposure through close contact with infected individuals. However, the genus Mycobacterium contains more than 190 species, [1] several of which can also cause human infections. Such non-tuberculous mycobacteria (NTM) infections, unlike Mtb infections, are usually contracted by exposure to environmental sources. NTMs that can function as human pathogens may cause a variety of disease manifestations, including pulmonary disease (e.g., M. abscessus, M. kansasii, and M. avium complex [MAC] species) that resembles TB, while others may primarily cause skin and other soft tissue infections (e.g., M. marinum). [2] Notably, the incidence of NTM-related pulmonary disease has substantially increased in recent decades, with estimates now ranging from 15.5 to 26.7 new cases per 100,000 adults in populations >50 years of age in developed countries. NTM infections are less common than TB in the developing world but may be misdiagnosed as a significant fraction of suspected (4%-30%) or multidrug-resistant TB cases (18%-27%) in countries with high endemic TB rates. NTM infection rates can exhibit different geographic distributions, but certain slow-growing NTM (e.g., MAC, M. gordonae, M. xenopi, and M. kansasii) and rapidly growing NTM (M. abscessus and M. fortuitum) tend to be the most prevalent NTM isolates worldwide, [3] while M. gordonae-positive isolates are frequently the result of environmental contamination. [4] In the United States, MAC is the most common cause of pulmonary NTM infection, followed by M. kansasii, while M. abscessus is responsible for most pulmonary NTM infections caused by rapidly growing NTM species. [5,6]
Effective treatment of Mtb and NTM infections requires accurate identification of the responsible mycobacterial species or species complexes (e.g., M. abscessus, M. kansasii, or the Mtb and MAC complexes) since different species require different interventions. GeneXpert MTB/RIF, which amplifies genome regions that are specific to members of the Mtb complex, has improved the speed of TB diagnosis, but nucleic acid amplification assays are not yet available for NTM species. Culture remains the gold standard for clinical confirmation of mycobacterial infection, species identification, and drug susceptibility testing, [7][8][9] and various automated culture-reading systems are employed for the rapid evaluation of Mtb and NTM growth in mycobacterial growth indicator tube (MGIT) liquid cultures. NTM species identification is typically performed by polymerase chain reaction (PCR) sequencing [10] or the detection of PCR amplification of species-specific target regions [11][12][13][14] in laboratory-developed tests, or by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) analysis of positive MGIT cultures, or solid-phase subcultures of these samples. [15,16] However, several of these assays have limitations that may reduce their clinical utility, including complexity, the potential for contamination, and/or the need for late-stage or secondary culture material. New laboratory tests that employ nucleic acid amplification approaches [17] or MALDI-TOF analyses [8,18] have been employed to analyze polymicrobial clinical samples for species identification and antimicrobial susceptibility testing. Liquid chromatography-MS (LC-MS) analysis of virulence factors detectable in early-stage MGIT cultures may offer another alternative for species-specific identification of mycobacteria.
Mycobacteria utilize five homologous secretion systems (ESX-1 to ESX-5) that have resulted from gene duplication of ancestral gene loci to export many of their secreted proteins, and these loci have undergone further duplication events. [19] The Mtb genome contains 23 ESX genes among these loci that can be classified into specific gene subfamilies. [19,20] The ESX-1 subfamily contains only EsxA (ESAT-6) and EsxB (CFP-10), while the ESX-5 subfamily contains EsxM and EsxN genes located within the ESX-5 locus as well as EsxM and EsxN paralogs at four distant loci that exhibit gene arrangements similar to those in the ESX-5 locus. Mtb requires ESX-5 for virulence, and the secretion of ESX-5 substrates, including EsxN paralogs, is tightly controlled to avoid Mtb elimination by host immune responses, [21] and it has been proposed that the ESX-5 secretion system can selectively export a subset of the ESX-5 secretome. [22] Further, a Thr2Ala variant of the ESX-5 substrate EsxW has been shown to be under positive selection and may play a role in substrate selection for ESX-5 secretion, and multi-drug resistant TB. [23,24] Due to their critical importance in mycobacterial virulence, Esx-1 and Esx-5 substrates appear to represent good candidates for biomarkers of species-specific mycobacterial infection. LC coupled tandem mass spectrometry (LC-MS/MS) can be employed to unambiguously identify peptides from conserved protein homologs (e.g. EsxN paralogs/homologs) expressed by distinct mycobacteria at a single amino acid resolution to accurately diagnose specific infections. [25] In the current study, we thus first identified semi-conserved peptide regions in ESX-1 and ESX-5 substrates from culture filtrate protein (CFP) samples of mycobacteria frequently detected in clinical isolates. 
We then analyzed their ability to serve as species-specific biomarkers in an immunoprecipitation-based LC-MS (IP-MS) assay, where captured peptides were identified by parallel reaction monitoring (PRM), and their amino acid differences were used to identify their species origin. Diagnostic peptides analyzed in this approach identified target species with high accuracy (87.5%-100%) in early positive clinical MGIT cultures.

BLAST analysis of Mtb CFP10 and EsxN protein sequence
Reviewed UniProt KB database entries for full-length Mtb CFP-10 (P9WNK5) and EsxN (P9WNJ3) protein sequences were searched against the UniProt KB bacterial proteome database using the UniProt Basic Local Alignment Search Tool (BLAST) function to identify homologous proteins, using default search parameters (E-Threshold of 10, auto matrix selection, no filtering for low-complexity regions, and permitting gaps in the sequence alignments). To evaluate the number of gene duplication events in the 15 most common mycobacterial species that cause human disease, [26] esxB (CFP-10) and esxN gene sequences corresponding to protein hits for these species were evaluated against their corresponding genomes using the NCBI BlastN sequence alignment program with the setting "optimize for somewhat similar sequences". For species that had sequence hits from unassembled whole-genome sequencing data, the GenBank protein ID corresponding to a hit was used to retrieve the sequence of the target peptide region protein. This information was gathered for CFP-10 from M. ulcerans (WP_012392185.1), M. scrofulaceum (ORB70459.1), and M. mucogenicum (WP_061000007.1). To confirm gene duplication events detected in several species, literature reviews were conducted to identify articles that examined gene duplications for the affected species: M. xenopi, [27] M. kansasii, M. szulgai, and M. marinum, [28] M. mucogenicum, [29] M. chelonae, and M. fortuitum. [30]

Protein filtration and digestion
Stored NTM CFP samples were passed through a 50 kDa filter unit (Amicon, Thermo) by 14,000 × g centrifugation at 4˚C to reduce albumin present in the starting culture media, and analyzed for total protein concentration by BCA assay (Thermo Pierce). For NTM samples, 100 µg of this filtrate was reduced with 10 mM dithiothreitol (Sigma, USA) at 37˚C for one h, alkylated with 50 mM iodoacetamide (Sigma) at 25˚C for one h, and then digested with sequencing grade trypsin (Promega, USA) at a protein: enzyme ratio of 50:1 for 16 h at 37˚C. Mtb CFP obtained from BEI Resources protein additives was pre-concentrated, [31] but otherwise processed in the same manner as all other CFP samples.

LC-MS/MS analysis of CFP tryptic digests
Tryptic peptide mixtures were separated with a linear gradient of 2%-37% buffer B (100% ACN and 0.1% formic acid) at a flow rate of 300 nl/min on an EASY-Spray C18 LC column (15 cm × 75 µm internal diameter, 3 µm particle size, Thermo Scientific). An UltiMate 3000 nanoLC system (Thermo Scientific) was online coupled to an Orbitrap Velos Pro (NTM samples) or Orbitrap Fusion Lumos instrument (Mtb sample) (Thermo Scientific). NTM and Mtb CFP samples were analyzed using different MS instruments and conditions to account for differences in complexity arising from the greater protein contribution (5 mg/ml) by the media of the NTM versus Mtb culture media, as indicated above. MS data were acquired in a data-dependent strategy selecting the fragmentation events based on the precursor abundance in the survey scan (mass-to-charge [m/z] 275-1850 for NTM samples and m/z 375-1700 for the Mtb sample). The resolution of the survey scan was 120,000 at m/z 400.
For NTM samples, low-resolution MS/MS spectra were acquired in rapid collision-induced dissociation (CID) scan mode for the 15 most intense peaks from the survey scan. The normalized collision energy of 35 was used for fragmentation, with a dynamic exclusion of 40 s using an isolation window for MS/MS fragmentation set to 2 m/z. Resulting CID MS/MS spectra were searched against mycobacterial proteomes using Proteome Discoverer (version 2.4, Thermo Scientific) and the following parameters: tryptic digestion with no more than two missed cleavage sites; precursor and product mass tolerances of 10 ppm and 0.6 Da respectively; cysteine carbamidomethylation as a stable modification; and methionine oxidation as a dynamic modification. UniProt Proteome IDs for the databases of each mycobacteria isolate analyzed in this search, and their total number of protein sequence entries were: M. kansasii strain ATCC12478 (UP000017786; 5,843 entries), M. abscessus strain ATCC19977 (UP000007137; 4,940 entries), M. intracellulare strain ATCC 13950 (UP000008004; 5,131 entries), M. avium subsp. avium 2285 (R) (UP000019908; 6,021 entries) and Mtb strain H37Rv (UP000001584; 3,993 entries).
For the Mtb CFP sample, data-dependent MS/MS analysis was performed using a top-speed approach (cycle time of 3 s), normalized collision energy of 30 for Higher-energy C-trap dissociation (HCD), with dynamic exclusion set to 60 s, and the isolation window for MS/MS fragmentation set to 1.6 m/z. Mtb HCD MS/MS spectra were searched against the TB proteome in UniProt using Proteome Discoverer (version 2.4; Thermo Scientific) and the following parameters: tryptic digestion allowing no more than two missed cleavage sites; precursor and product mass tolerances of 10 ppm and 0.02 Da respectively; cysteine carbamidomethylation as a stable modification; and methionine oxidation as a dynamic modification. After the database search, Percolator was used for peptide-spectrum match (PSM) validation with default settings. A decoy database was generated by reversing each protein sequence, and a concatenated database created from the non-decoy and decoy databases was searched to evaluate the false discovery rate (FDR), calculated as the number of decoy hits divided by the number of target hits. Percolator analyses using a 1% FDR were sequentially employed to retain high-confidence matches at the PSM, peptide, and protein levels.
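The target-decoy FDR estimate described above can be illustrated with a minimal Python sketch. The function names and the simple score-ranking procedure are assumptions made for illustration only; Percolator derives its q-values from a semi-supervised model and is not reproduced here.

```python
def make_decoy(sequence: str) -> str:
    """Build a decoy entry by reversing a protein sequence."""
    return sequence[::-1]


def false_discovery_rate(n_decoy_hits: int, n_target_hits: int) -> float:
    """FDR estimated as decoy hits divided by target hits."""
    if n_target_hits <= 0:
        raise ValueError("at least one target hit is required")
    return n_decoy_hits / n_target_hits


def score_cutoff_for_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose accepted set has FDR <= max_fdr.

    psms is a list of (score, is_decoy) tuples (hypothetical input format).
    Returns None if no cutoff satisfies the requested FDR.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    n_target = n_decoy = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        if n_target > 0 and false_discovery_rate(n_decoy, n_target) <= max_fdr:
            cutoff = score
    return cutoff
```

In this sketch, a cutoff is accepted whenever the cumulative decoy-to-target ratio at that score does not exceed the requested FDR, so the returned value is the most permissive score that still satisfies the 1% criterion.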

MGIT samples
This research project was reviewed and approved by the Institutional Review Board (IRB) at Tulane University prior to study initiation. Mycobacterial cultures were obtained from patients enrolled in IRB-approved protocols at the National Institutes of Health Clinical Center. Respiratory specimens were decontaminated using the N-acetyl-L-cysteine/sodium hydroxide method, concentrated by 15 min centrifugation at 3000 × g, and pellets were suspended in 0.8 ml of sterile phosphate-buffered saline (PBS). Microscopic examination for the presence of acid-fast bacilli (AFB) was performed by auramine-rhodamine staining (Becton Dickinson, Sparks, MD). All patient samples were inoculated into Bactec 960 MGIT tubes containing the manufacturer's growth supplement (PANTA) and onto Middlebrook 7H11 agar (Remel, Lenexa, KS). Growth-positive MGIT samples were stained with auramine-rhodamine to detect AFB and/or contaminants and sub-cultured on solid media for species identification, and 3 ml was filter-sterilized by passage through a 0.22 µm filter and stored at -80 °C for subsequent LC-MS/MS analysis. Mycobacteria isolated from positive MGIT cultures were identified by secA1 gene sequencing directly from the culture [10] and/or by MALDI-TOF MS after subculture on solid media. [32] Multiplex PCR was employed to differentiate Mtb from M. bovis. [33]

EsxN and CFP-10 peptide IP
Clinical MGIT sample aliquots (500 µl) were supplemented with 10 µl of 1 M Tris solution (adjusted to pH 8) and rotation-mixed at 37 °C for 16 h with 5 µg of sequencing grade trypsin (1:600 enzyme/protein mass for a mean 6 mg protein/ml value). After digestion, samples were supplemented with 5 µl of 10% trifluoroacetic acid solution (adjusted to pH 7) and spiked with 40 µl of an internal standard (IS) peptide mixture (25 nM each) to permit accurate identification of the isolated target peptides. Custom rabbit polyclonal antibodies (GenScript, Nanjing, China) raised against Mtb EsxN-p2 (AQAASLEAEHQAIVR) and Mtb CFP-10-p2 (TQIDQVESTAGSLQGQWR) were suspended in PBS buffer (pH 7.4) separately, and 40 µg of antibody was bound to 3 mg of protein G Dynabeads (Thermo Fisher Scientific) in 400 µl of binding buffer (PBS containing 0.2% [v/v] Tween-20; PBST), after which the beads were washed twice with 200 µl of PBST binding buffer and suspended in 50 µl of binding buffer. EsxN and CFP-10 peptide antibody beads (150 µg each) were simultaneously mixed with trypsin-digested CFP samples for 1 h at 25 °C, then washed three times with 50 µl of binding buffer and two times with 50 µl of LC grade water (Fisher Scientific). Beads were then incubated for 30 min at 25 °C with 100 µl of a 1% (v/v) formic acid solution, then precipitated by magnetic isolation, after which supernatants containing eluted peptides were loaded onto StageTips (9) generated by packing 200 µl pipette tips with four layers of Empore C8 solid phase extraction disk (3M; Cat. # 2214-C8). StageTips were washed by 25 °C centrifugation at 1000 × g for 3 min with 50 µl of 0.1% (v/v) trifluoroacetic acid (TFA) acetonitrile solution and 50 µl of 0.1% (v/v) TFA prior to sample loading. Each peptide sample was split and loaded onto two StageTips, and captured peptides were eluted with 50 µl of 0.1% (v/v) TFA acetonitrile solution. Paired eluents were combined, dried by vacuum concentration, re-dissolved with 8 µl of sampling buffer containing 0.1% (v/v) formic acid and 2% (v/v) acetonitrile, and centrifuged at 25 °C and 21,000 × g for 10 min.
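The enzyme-to-protein ratios quoted for the two digestion protocols follow from simple arithmetic, sketched below in Python. The function names are hypothetical, and the 6 mg/ml value is the mean MGIT protein concentration stated above.

```python
def protein_to_enzyme_ratio(sample_volume_ul: float,
                            protein_mg_per_ml: float,
                            trypsin_ug: float) -> float:
    """Return the protein:enzyme mass ratio for a digest.

    Protein mass (µg) = volume (ml) x concentration (mg/ml) x 1000.
    """
    protein_ug = (sample_volume_ul / 1000.0) * protein_mg_per_ml * 1000.0
    return protein_ug / trypsin_ug


def trypsin_required_ug(protein_ug: float, protein_per_enzyme: float) -> float:
    """Return the trypsin mass needed for a given protein:enzyme ratio."""
    return protein_ug / protein_per_enzyme


# MGIT digest: 500 µl at a mean 6 mg/ml with 5 µg trypsin (1:600 enzyme/protein)
mgit_ratio = protein_to_enzyme_ratio(500, 6, 5)

# CFP digest: 100 µg protein at a 50:1 protein:enzyme ratio
cfp_trypsin = trypsin_required_ug(100, 50)
```

Under these inputs, the MGIT digest contains 3 mg of protein per 5 µg of trypsin, which reproduces the stated 1:600 ratio, and the CFP digest requires 2 µg of trypsin.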

Development and analytical validation of targeted MS measurements
In this study, Tier 2 level PRM was employed to identify and quantify peptides generated by tryptic digestion of mycobacterial secreted proteins. To select the precursor ions of EsxN and CFP-10 peptide targets for PRM analysis, we retrieved eight protein sequences in April 2020 from the UniProt database for Mtb EsxN (P9WNJ3), Mtb EsxO (P9WNI7), M. kansasii ESAT-6-like protein (U5WP40), M. avium ESAT-6-like protein (X8B9F3), M. intracellulare ESAT-6-like protein (H8IMX9), Mtb CFP-10 (P9WNK5), M. kansasii CFP-10 (B5TV82) and M. marinum CFP-10 (B5TV81) and imported into Skyline and predicted the precursor m/z of theoretical tryptic peptides detected with no missed cleavages, omitting sites prior to proline residues, and allowing peptide lengths 8-25 amino acids and cysteine carbamidomethylation. These proteins were selected from the identified protein lists of our CFP peptide discovery experiments using five mycobacteria species and the available UniProt database proteomes of M. marinum. Monitored PRM analysis precursor ions are listed in Figure 1D in a charge 2 m/z state. PRM analysis for target peptides was performed using a QExactive HF-X Orbitrap mass spectrometer coupled with an UltiMate 3000 ultra-high-pressure LC system (Thermo Scientific). Peptides were loaded onto an Acclaim PepMap100 C18 trap column (300 µm ID × 5 mm, 5 µm, Thermo Fisher Scientific; Cat. # 160454) and then separated on a PepMap RSLC C18 analytical column (75 µm ID × 15 cm, 3 µm, Thermo Fisher Scientific, Cat. # 164568). Peptides were eluted with a 300 nl/min gradient generated by mixing buffer A (0.1% formic acid in water) and buffer B (0.1% formic acid in acetonitrile) as follows: a 5 min wash with 5% buffer B, a 17 min 5%-38% buffer B gradient, a 2 min 38%-95% buffer B gradient, a 0.1 min 95-5% buffer gradient, and a 1 min 5% buffer B wash. The theoretical m/z of each peptide's charge 2 precursor ion was predicted by Skyline and input into the inclusion list. 
The full mass scan range was 500-1200 m/z. The analysis used a 200 ms maximum injection time (IT) and 3e6 automatic gain control (AGC) for the full mass scan, 100 ms maximum IT and 2e5 AGC for the tandem mass scan, a 0.7 m/z isolation window, a loop count setting of 12, a 1.8 kV spray voltage, and a 275 °C capillary temperature.
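The precursor selection rules described above (no missed cleavages, no cleavage before proline, peptide lengths of 8-25 residues, charge 2 precursors) can be sketched in Python as follows. This is an illustrative re-implementation, not Skyline itself; the toy protein sequence is hypothetical and merely embeds the two antibody target peptides, and the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276
CARBAMIDOMETHYL = 57.02146  # fixed modification on cysteine


def tryptic_peptides(protein, min_len=8, max_len=25):
    """Fully tryptic peptides: cleave after K/R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        at_end = i == len(protein) - 1
        cleave = residue in "KR" and not at_end and protein[i + 1] != "P"
        if cleave or at_end:
            peptides.append(protein[start:i + 1])
            start = i + 1
    return [p for p in peptides if min_len <= len(p) <= max_len]


def precursor_mz(peptide, charge=2):
    """Theoretical monoisotopic m/z with carbamidomethylated cysteine."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    mass += CARBAMIDOMETHYL * peptide.count("C")
    return (mass + charge * PROTON) / charge


# Hypothetical toy sequence embedding the EsxN-p2 and CFP-10-p2 targets
toy_protein = "GSMKAQAASLEAEHQAIVRDKPGLTQIDQVESTAGSLQGQWRAA"
candidates = tryptic_peptides(toy_protein)
esxn_p2_mz = precursor_mz("AQAASLEAEHQAIVR")
cfp10_p2_mz = precursor_mz("TQIDQVESTAGSLQGQWR")
```

The computed charge 2 values for the Mtb EsxN-p2 and CFP-10-p2 sequences (approximately 797.42 and 1002.49) agree with the monitored m/z values reported for these targets. In the toy sequence, the lysine preceding proline is not cleaved, so the second candidate retains its DKPGL prefix.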
For method validation using MALDI-TOF MS as the assay readout, captured peptides were eluted from capture beads by a 30 min room temperature incubation with 5 µl of 0.1% TFA, after which 1.5 µl of eluent and 1.5 µl of α-Cyano-4-hydroxycinnamic acid (Sigma) were spotted onto a multiwell sample plate, dried, and analyzed by a Microflex LRF MALDI-TOF MS system, measuring two replicate spots for each sample. An m/z range of 800-2500 was used for detection, and a signal-to-noise ratio (S/N) threshold of 3 was set to define a mass peak. A mass target was considered to be observed in a sample only when its m/z peak was detected with an S/N greater than 3 in both spots in duplicate IP-MS assays.
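The MALDI-TOF detection rule above can be written as a short Python check. The function name and the nested-list input format are hypothetical choices made for this sketch.

```python
def target_observed(sn_by_assay, threshold=3.0, spots_per_assay=2):
    """Return True only if every spot in every replicate assay exceeds S/N.

    sn_by_assay is a list with one entry per IP-MS assay, each entry being
    the list of S/N values measured for the target m/z on its replicate spots.
    """
    if not sn_by_assay:
        return False
    for spots in sn_by_assay:
        if len(spots) != spots_per_assay:
            return False  # a missing spot means the rule cannot be satisfied
        if any(sn <= threshold for sn in spots):
            return False
    return True
```

For example, `target_observed([[5.1, 4.2], [3.8, 6.0]])` evaluates to True, whereas a single spot at or below the threshold in either assay causes the target to be reported as not observed.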
Crude IS peptides (HPLC purity >75%) labeled with stable isotope arginine (13C6, 15N4) were purchased from New England Peptide (Gardner, MA). Mass errors for these peptides were all within 0.1% of their theoretical molecular masses when analyzed by an AmaZon SL ion trap mass spectrometer (Bruker, MA). All these peptides were dissolved in 2% acetonitrile/0.1% trifluoroacetic acid to a final concentration of 50 µg/ml each to generate a multiplex IS peptide mixture, and a 1 µl aliquot was analyzed by LC-PRM-MS to obtain the retention times and MS/MS spectra for each of these peptides. A spectral library was built using the synthetic IS peptide for each targeted peptide, and these MS/MS spectra were searched against the eight target protein sequences and 236 common protein contaminants with Peaks Studio software (version 10.5). Identified peptide spectra were exported as mzXML files and imported into Skyline to build a spectral library. The dotp, rdotp, and peak area ratio values (endogenous/IS peptide) for target peptides detected in clinical MGIT samples were calculated in Skyline. EsxN-p2 fragment ions y1-y13 and b2-b3 and CFP-10-p2 fragment ions y1-y16 and b2-b3 were used to extract the ion chromatograms of each sample to achieve full-length coverage and discriminate potential isomers. Endogenous (light) peptide fragment ions were required to co-elute in the same retention window as IS peptide (heavy) fragment ions with a <20 ppm mass error tolerance and to yield a dotp similarity score ≥0.9 upon Skyline analysis.
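The acceptance criteria above can be illustrated with a minimal Python sketch. Here dotp is approximated as a plain normalized dot product (cosine similarity) between library and observed fragment intensities, which is an assumption; Skyline's own dotp computation may differ in detail. The function names are hypothetical, and the isotope mass differences are standard values.

```python
import math

# Mass added by 13C6 15N4 arginine relative to unlabeled arginine (Da)
HEAVY_ARG_SHIFT = 6 * 1.0033548 + 4 * 0.9970349


def heavy_mz(light_mz, charge=2):
    """m/z of the heavy IS precursor for a peptide with one labeled arginine."""
    return light_mz + HEAVY_ARG_SHIFT / charge


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def dot_product_similarity(library, observed):
    """Cosine similarity between two equally ordered fragment intensity lists."""
    norm_lib = math.sqrt(sum(x * x for x in library))
    norm_obs = math.sqrt(sum(x * x for x in observed))
    if norm_lib == 0 or norm_obs == 0:
        return 0.0
    return sum(a * b for a, b in zip(library, observed)) / (norm_lib * norm_obs)


def accept_peptide(library, observed, observed_mz, theoretical_mz,
                   min_dotp=0.9, max_ppm=20.0):
    """Apply the dotp >= 0.9 and < 20 ppm mass error criteria."""
    similar = dot_product_similarity(library, observed) >= min_dotp
    accurate = abs(ppm_error(observed_mz, theoretical_mz)) < max_ppm
    return similar and accurate
```

With one labeled arginine, the heavy IS precursor of a charge 2 peptide is shifted by approximately 5.004 m/z relative to its endogenous counterpart, so the light and heavy forms are isolated separately while sharing retention time.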
Identification of endogenous peptides was based on retention time and dotp, to avoid the potential loss of rdotp   Figure 1A, Dataset file S1), where the reduced protein content of the starting Mtb culture media likely contributed to the greater number of Mtb proteins detected in this analysis. Sequest HT protein identification scores for the CFP sample with the fewest identifications detected five ESAT-6-like proteins and one PE domain-containing protein associated with the ESX-1 and ESX-5 secretion systems ( Figure 1B, Dataset file S1). Substrates of the ESX-1 and ESX-5 secretion systems were also identified in Mtb CFP digests; substrates of ESX-5 were identified in CFP digests from M. avium and M. intracellulare, which lack ESX-1 secretion, and neither ESX-1 nor ESX-5 substrate proteins were detected in the CFP digests of M. abscessus, which does not express either pathway. We thus hypothesized that sequence variants in CFP-10 and EsxN ortholog/paralog peptides might be able to distinguish these species in early growth mycobacterial cultures of clinical samples.

Identification of conserved regions of CFPs as diagnostic peptides
Analysis of trypsin-digested Mtb CFP samples detected four and two peptides that were, respectively, shared among EsxM and its paralogs and EsxN and one or more of its paralogs (Table S1), while a single detected peptide distinguished EsxN and two of its paralogs (EsxO and EsxL).
A semi-conserved peptide region derived from EsxN or one of its paralogs was detected in digests of CFP from the four species that expressed the ESX-5 locus, and distinguished cultures of Mtb and M. kansasii from those of M. avium and M. intracellulare, which were not distinguishable from each other (Figure 1C). Similarly, a CFP-10-derived peptide detected in Mtb and M. kansasii CFP digests distinguished these samples. Thus, identification of species-specific EsxN paralog and CFP-10 peptides in early phase mycobacterial cultures of clinical specimens has the potential to differentiate Mtb, M. kansasii, and MAC infections, but not infections caused by M. avium versus M. intracellulare.
We next searched the reference genomes of 16 mycobacteria species that commonly cause human disease [26] to detect potential ESX-1 and ESX-5 loci based on their characteristic gene structures, [34] which were then analyzed to identify unique peptide sequences (Table S2) corresponding to the candidate CFP-10 and EsxN target regions identified in Figure 1C,D. For the most common mycobacterial isolates, single CFP-10 genes were detected in Mtb and M. kansasii; MAC species lacked CFP-10 orthologs, and M. gordonae and M. marinum strain isolates encoded two and two to three CFP-10 genes, respectively. [35] Similarly, among these common isolates, Mtb encoded EsxN and its four paralogs, while M. kansasii, MAC, M. gordonae, and M. marinum, and their subspecies and strains, respectively encoded three, two to five, three, and two to four EsxN orthologs/paralogs. Protein sequences of CFP-10 and EsxN orthologs and/or paralogs detected in Mtb, M. avium, M. intracellulare, M. gordonae, and M. marinum were aligned to identify peptide targets suitable for species identification. Bioinformatic analysis of Mtb CFP-10 identified five tryptic peptides, two of which (CFP-10-p2 and -p4) distinguished CFP-10 expressed by Mtb and M. kansasii (Figure S1A). CFP-10-p2 was detected with the highest signal upon LC-MS/MS analysis of a trypsin digest of recombinant Mtb CFP-10 protein and was selected for further analysis (Figure S1C), while CFP-10-p2 ortholog sequences distinguished all four of the target species and one of the three M. gordonae strains.
Bioinformatic analysis identified five tryptic peptides that were conserved in EsxN orthologs/paralogs, which were identified for Mtb, M. avium, M. intracellulare, M. gordonae, and M. kansasii. Among these peptides, semi-conserved EsxN-p1, p2, and p5 sequences could distinguish EsxN orthologs/paralogs produced by these five species, with discriminatory power that varied across species, groups of species, and strains (Figure S1B). EsxN-p2 produced the highest LC-MS/MS signal intensity of these three peptides upon analysis of a trypsin-digested recombinant Mtb EsxN protein sample and was selected for further evaluation (Figure S1C). The M. avium EsxN-p2 peptide sequence was identical among all retrieved MAC subspecies and strains (Table S3), while the Mtb EsxN-p2 sequence matched protein homologs detected in M. kansasii and M. marinum, and two different EsxN-p2 sequences were detected for different strains of M. gordonae (Figure 1D).

Detection efficiency for peptide variants following antibody capture
EsxN-p2 and CFP-10-p2 sequences selected as mass targets for MS-based mycobacterial species identification differed by one or more amino acids, although two different pairs of EsxN-p2 peptides had equivalent mass and were distinguished from each other by their different retention times and product ions ( Figure 1D). Two EsxN-p2, one EsxO-p2, and three CFP-10-p2 sequences were synthesized with heavy isotope labels and subjected to MS/MS analysis to build a spectral library that could be used to identify target peptides specific for Mtb, M. avium, M. intracellulare, M. kansasii, and M. marinum in trypsin-digested MGIT cultures.
To assess the performance of peptide variant IP, we calculated the recovery rate as the total peptide fragment ion peak area after IP expressed as a percentage of that before IP. Consistent with the above experimental results, 32.6 ± 7.5% and 32.0 ± 9.3% recovery rates were observed for Mtb and MAC EsxN-p2, respectively, whereas they decreased to 7.3 ± 1.1% for  Figure S2), which was caused by antibody affinity variation. The CFP-10-p2 peptide variants, however, showed similar recovery rates among the three variants (41.2%-47.8%, Figure S3), supporting the feasibility of using our IP-MS assay to assess quantitative changes of each peptide variant in an unbiased way.
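The recovery rate calculation can be sketched in Python as follows. The sketch assumes that recovery is the post-IP summed fragment ion peak area divided by the pre-IP area, reported as mean ± standard deviation across replicates; the function names and the replicate input format are hypothetical.

```python
from statistics import mean, stdev


def recovery_percent(area_after_ip: float, area_before_ip: float) -> float:
    """Recovery as post-IP peak area relative to pre-IP peak area (%)."""
    if area_before_ip <= 0:
        raise ValueError("pre-IP peak area must be positive")
    return 100.0 * area_after_ip / area_before_ip


def summarize_recovery(replicates):
    """Mean and sample SD of recovery over (after, before) area pairs."""
    values = [recovery_percent(after, before) for after, before in replicates]
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread
```

Comparing these summaries across peptide variants indicates whether the capture antibody recovers each variant with similar efficiency, which is the property needed for unbiased relative quantification.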

Method validation in MGIT samples
To evaluate the feasibility of the proposed IP-MS assay, MS detection criteria for EsxN-p2, EsxO-p2, and CFP-10-p2 target peptides were applied to analyze MGIT cultures from a cohort of 74 patients diagnosed with mycobacterial infections, including 52 patients diagnosed with MAC, Mtb complex, M. kansasii, or M. gordonae infections expected to express detectable variants of these target peptides.
Mtb and M. avium EsxN-p2 and Mtb EsxO-p2 target peptides and their heavy-isotope IS peptides were confidently identified in trypsin-digested MGIT cultures of patients infected with each of these species (Figure 2), with each pair of target and IS peptides exhibiting a distinct retention time.
Similar results were obtained when MGIT cultures from patients with active Mtb or M. kansasii infections were analyzed for their corresponding CFP-10-p2 target and IS peptide signals (Figure 3). This LC-MS/MS analysis also detected an extracted ion chromatogram signature similar to that of the M. marinum CFP-10-p2 sequence, but which yielded LC retention times and dotp values that differed from those characteristic of this peptide. Notably, the M. marinum and Mtb CFP-10-p2 peptide monoisotopic peaks differ by 14 Da (Figure S4), so that the M+4 isotopic peak of the Mtb CFP-10-p2 IS peptide (the interference peak described in Figure S6) and M. marinum CFP-10-p2 ([M+H]+ 2018.0) are captured by the same m/z screen. However, the stochastic distribution of the M+4 isotopic mass among Mtb CFP-10-p2 IS peptide fragments should reduce the detection of transition ions corresponding to shorter y-ion peptides and decrease its dotp score for M. marinum CFP-10-p2, as observed in this analysis, and these peptide ions and their fragment ions should be detected at different LC retention times.
MS peaks corresponding to the EsxN-p2 sequence shared by Mtb, MAC, and M. kansasii were detected only in MGIT samples obtained from patients infected with one or more of these pathogens (Figure 4). This shared EsxN-p2 signal was detected in all Mtb cases (100%; 22/22) and most M. kansasii cases (92.3%; 12/13). All positive samples exhibited robust dotp and peak area values (≥0.98 and >2 × 10^9, respectively), and the peak area of this shared EsxN-p2 peptide tended to be markedly higher in M. kansasii cases (median 3.62 × 10^9 vs. 4.93 × 10^8). Shared EsxN-p2 peptide signal was also sporadically detected in MGIT samples from patients diagnosed with MAC infections, likely indicating cases infected with particular subspecies or strains of M. avium, M. intracellulare, or M. chimaera that encode the shared EsxN-p2 peptide target sequence (Table S3).
M. intracellulare and M. chimaera are difficult to distinguish using established methods, [36,37] and four MGIT cultures were identified as M. intracellulare/M. chimaera positive after MALDI-MS Biotyper analysis. IP-MS results also could not reliably distinguish M. avium, M. intracellulare, and M. chimaera MGIT culture samples, since strains or subspecies of these mycobacteria share one or more EsxN-p2 sequences (Table S4), and thus MGIT cultures with positive IP-MS MAC EsxN-p2 signal must be identified as resulting from MAC infections rather than specific MAC species, subspecies, or strains.
Peaks matching the MAC-specific EsxN-p2 peptide sequence were detected in all MGIT cultures from MAC cases (100%; 14/14), but not in cultures from individuals with any other type of mycobacterial infection. Peaks corresponding to the Mtb EsxO-p2 sequence were observed only in MGIT cultures from Mtb cases. However, unlike the Mtb EsxN-p2 signal that was detected in all these cases, only 13 of these 22 samples (59%) had detectable Mtb-specific EsxO-p2 signal, which was only ∼7% of the matching EsxN-p2 peak signal. None of the MGIT cultures from the negative control group (cases diagnosed with M. massiliense, or M. abscessus or M. fortuitum group infections), or from individuals diagnosed with M. gordonae infections, revealed dotp or peak area values similar to those observed in patients with active Mtb, M. kansasii, or MAC infections (Table S4).
Signal consistent with M. kansasii CFP-10-p2 was also detected in three MGIT samples found to contain M. gordonae by the MALDI-MS Biotyper assay, due to the equivalent mass of the CFP-10-p2 peptides produced by M. kansasii and two M. gordonae strains (a and d), despite a two amino acid sequence difference (Figure 6A,B). However, the CFP-10-p2 peptides detected in the three M. gordonae cultures revealed a 0.3-min difference in LC retention time, poor similarity (dotp < 0.9), and five unique fragment ions (y8-y12) when compared to the M. kansasii CFP-10-p2 peptide (Figure 6C,D) and were thus not considered to represent positive M. kansasii CFP-10-p2 signal in this analysis. No signal matching the predicted CFP-10-p2 peak of M. marinum, which primarily causes skin infections, [38] was detected in any MGIT sample, consistent with the lack of M. marinum cases in this cohort.
Surprisingly, m/z targets with dotp values and retention times corresponding to Mtb CFP-10-p2 were sporadically detected in MGIT samples containing mycobacteria that do not express CFP-10 or that express a CFP-10 ortholog that cannot produce the target peptide. Peak areas in these samples were not greater than the maximum values detected in samples with dotp values < 0.9, except when analyzed immediately after an Mtb sample, suggesting that the signal detected in these cases was due to an interfering peptide yielding a weak positive signal or to sample carryover.
To adjust for potential non-specific background or LC contamination events, results from the 22 MGIT cultures of cases infected with rapidly growing NTM species (i.e., M. abscessus, M. fortuitum, and M. massiliense) that do not express any of the target peptides were used to specify the criteria for true positive peaks. Briefly, the 99th percentile of the extracted ion chromatogram peak area at the characteristic elution time for each target peptide in these 22 samples was selected as the threshold for a true positive signal. Log10 transformed values for these cut-off thresholds were 6.24 for Mtb/MAC/M. kansasii EsxN-p2 (m/z 797.42), 5.45 for Mtb EsxO-p2 (m/z 803.44), 6.34 for MAC EsxN-p2 (m/z 804.43), 7.33 for Mtb CFP-10-p2 (m/z 1002.5), 6.26 for M. kansasii CFP-10 (m/z 1016.5), and 6.92 for M. marinum CFP-10 (m/z 1009.5), while a dotp cut-off threshold of 0.9 was empirically set to reduce detection of non-specific background peaks. These two criteria disqualified 92% (22/25) of the candidate peaks detected in non-Mtb MGIT culture samples.
Thus, integrated filtering criteria based on target peptide LC retention time, dotp, and peak area should permit accurate identification of all target peptides in clinical samples, even in the presence of high background signal from interfering peptide ions or peptide contaminants.
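The combined filter described above can be illustrated with a minimal Python sketch. The log10 peak-area cut-offs and the dotp cut-off of 0.9 are taken from the text. The retention-time tolerance (`RT_TOLERANCE_MIN`), the `CandidatePeak` record, the choice of linear interpolation for the percentile, and the use of `>=` for the dotp comparison are assumptions introduced only for illustration, not the settings used in this study.

```python
import math
from dataclasses import dataclass

# Log10 peak-area cut-offs quoted in the text, keyed by target peptide.
LOG10_AREA_CUTOFF = {
    "Mtb/MAC/M. kansasii EsxN-p2": 6.24,  # m/z 797.42
    "Mtb EsxO-p2": 5.45,                  # m/z 803.44
    "MAC EsxN-p2": 6.34,                  # m/z 804.43
    "Mtb CFP-10-p2": 7.33,                # m/z 1002.5
    "M. kansasii CFP-10": 6.26,           # m/z 1016.5
    "M. marinum CFP-10": 6.92,            # m/z 1009.5
}
DOTP_CUTOFF = 0.9        # from the text
RT_TOLERANCE_MIN = 0.2   # ASSUMPTION: hypothetical retention-time window


@dataclass
class CandidatePeak:
    """Hypothetical record for one extracted ion chromatogram peak."""
    target: str
    peak_area: float
    dotp: float
    rt_min: float
    expected_rt_min: float


def percentile(values, q):
    """Linearly interpolated q-th percentile (0-100) of a list of values.

    Shows how a cut-off could be derived from negative-control peak areas.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    rank = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def is_true_positive(peak):
    """Apply the peak-area, dotp, and retention-time criteria together."""
    if peak.peak_area <= 0:
        return False
    area_ok = math.log10(peak.peak_area) > LOG10_AREA_CUTOFF[peak.target]
    dotp_ok = peak.dotp >= DOTP_CUTOFF
    rt_ok = abs(peak.rt_min - peak.expected_rt_min) <= RT_TOLERANCE_MIN
    return area_ok and dotp_ok and rt_ok
```

In this sketch a peak must pass all three criteria, so a carryover peak with a high dotp but a sub-threshold area is rejected, as is an isobaric interfering peptide with sufficient area but a low dotp.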

Distinguishing MTBC species by peptide variant combination
Identifying M. tuberculosis complex (MTBC) species makes it possible to distinguish strictly human from zoonotic TB and to trace the source of exposure in epidemiological studies. [39-41] It is difficult to use MALDI-TOF MS assays to differentiate MTBC species, including M. bovis, the M. bovis BCG strain, and M. africanum, which often cause human disease. [39,42] We evaluated the feasibility of using our IP-MS assay to identify peptide variant markers that distinguish MTBC species, using MGIT digests from laboratory strains. An inherent lack of CFP-10-p2 is the signature of the M. bovis BCG strain, due to the deletion from its genome of the RD1 region that contains the CFP-10 coding gene esxB. [43] As a result, BCG strains were readily distinguished by this signature. Bioinformatic analysis predicted two EsxN paralog peptide variants among the MTBC species analyzed (Table S2). Our results showed that M. africanum can be unambiguously differentiated from the other MTBC species by one of the two variants (Figure 7). M. bovis shared most peptide variants with Mtb, except for EsxO-p2. In our first validation cohort of 74 samples, we identified EsxO in 59% (13/22) of MTBC-infected samples. The lack of EsxO signal in the remaining 41% of samples may be explained by either selective expression of EsxO or actual infection with M. bovis. Sequencing to determine the species of origin in these samples is needed to distinguish between these explanations.
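The way peptide variant combinations map to species calls can be summarized as a small rule set. The Python sketch below is an illustrative reading of the results reported here, not the decision procedure used in the study. The function name `call_species` and the target labels are hypothetical, and M. africanum is omitted because this section does not name the EsxN variant that distinguishes it.

```python
def call_species(detected):
    """Return candidate species calls for a set of detected target peptides.

    `detected` holds labels of peptides that passed the positivity criteria.
    Several calls may be returned for a mixed culture.
    """
    calls = []
    if "MAC EsxN-p2" in detected:
        # MAC species, subspecies, and strains are not resolved further.
        calls.append("MAC")
    if "M. kansasii CFP-10-p2" in detected:
        calls.append("M. kansasii")
    if "Mtb CFP-10-p2" in detected:
        if "Mtb EsxO-p2" in detected:
            calls.append("Mtb")
        else:
            # EsxO-p2 may be absent because of selective expression in Mtb
            # or because the isolate is M. bovis.
            calls.append("Mtb or M. bovis")
    elif "shared EsxN-p2" in detected and not calls:
        # Without any CFP-10-p2 signal, the shared EsxN-p2 variant alone
        # cannot separate M. kansasii from an MTBC member such as BCG.
        calls.append("ambiguous: M. kansasii or MTBC (e.g., M. bovis BCG)")
    return calls


print(call_species({"shared EsxN-p2", "Mtb CFP-10-p2", "Mtb EsxO-p2"}))
print(call_species({"shared EsxN-p2", "MAC EsxN-p2", "M. kansasii CFP-10-p2"}))
```

Returning a list rather than a single label lets the same rules report a mixed culture, such as one that grows both MAC and M. kansasii.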

High-throughput MALDI-TOF MS as assay readout
To translate the proposed IP-MS assay to a clinical microbiology laboratory, we tested the feasibility of using a MALDI-TOF MS instrument, which is commonly available in such laboratories, as the assay readout. To avoid the ambiguity of isomeric peptide variants such as those from M. gordonae and M. kansasii, we selected 21 MGIT samples, including eight Mtb, three MAC, two M. kansasii, one M. bovis BCG, two M. abscessus, and five unrelated species as controls, and analyzed them by this approach. This IP-MS assay had an overall diagnostic sensitivity of 92.9% (13/14) with 100% specificity (7/7) (Table S5). The diagnostic sensitivity was reduced by the inclusion of a rare culture of M. bovis BCG that arose after vaccination: M. bovis BCG does not express EsxB, and in the absence of EsxB expression it is not possible to distinguish an infection arising from M. kansasii from one arising from an Mtb complex species using the shared EsxN-p2 variant.
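The reported sensitivity and specificity follow directly from the sample counts given above, as the short Python check below shows. It assumes, as the text implies, that the single missed positive is the M. bovis BCG culture and that all seven control cultures were negative. The variable names are illustrative.

```python
# Sample counts as listed in the text (21 MGIT cultures in total).
target_positive = {"Mtb": 8, "MAC": 3, "M. kansasii": 2, "M. bovis BCG": 1}
controls = {"M. abscessus": 2, "unrelated species": 5}

n_positive = sum(target_positive.values())  # 14 cultures with target species
n_negative = sum(controls.values())         # 7 control cultures

# ASSUMPTION: the only false negative is the BCG culture, which lacks EsxB.
true_positives = n_positive - target_positive["M. bovis BCG"]
true_negatives = n_negative

sensitivity = true_positives / n_positive
specificity = true_negatives / n_negative

print(f"sensitivity = {sensitivity:.1%} ({true_positives}/{n_positive})")
print(f"specificity = {specificity:.1%} ({true_negatives}/{n_negative})")
```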

4-h turnaround time setup for rapid diagnosis
Inspired by our previous work on developing a rapid diagnostic test for Ebola infection, [44] we tested a 4-h sample preparation workflow in the multiplexed IP-MS assay of EsxN and CFP-10 peptide variants. We selected four representative MGIT samples infected with Mtb, M. kansasii, or MAC, and shortened the trypsin digestion time to 1 h (Figures S5-S10). All samples showed a relatively low abundance of peptide targets in the initial 16-h digestion method, as shown in Figure 4. The results indicate that a 4-h workflow (0.1-h pH adjustment, 1-h digestion, 1-h enrichment, 1-h washing/elution, and 0.5-h LC-MS/MS analysis) allows mycobacterial species identification.
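As a quick arithmetic check, the step durations listed above sum to 3.6 h, which fits within the stated 4-h turnaround. The Python sketch below works in whole minutes to avoid floating-point rounding. The step labels are taken from the text, and the time remaining is simply the difference from 4 h, not a reported handling time.

```python
# Step durations from the text, converted to minutes (0.1 h = 6 min).
workflow_min = {
    "pH adjustment": 6,
    "trypsin digestion": 60,
    "enrichment": 60,
    "washing/elution": 60,
    "LC-MS/MS analysis": 30,
}

total_min = sum(workflow_min.values())
budget_min = 4 * 60

print(f"total = {total_min} min ({total_min / 60:.1f} h)")
print(f"remaining within 4 h = {budget_min - total_min} min")
```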

DISCUSSION
Secreted proteins, particularly factors associated with virulence, represent promising biomarkers for the rapid, species-specific diagnosis of infections from early-positive mycobacterial cultures inoculated with clinical samples. Candidate biomarker peptides analyzed in this study are derived from the ESX-1 and ESX-5 substrates CFP-10 and EsxN, which play essential roles in mycobacterial growth and avoiding immune suppression and are detected at high abundance in CFP samples, [22] and thus have potential utility as diagnostic biomarkers, as supported by our results. Several NTM can produce infections that resemble TB but respond to different drug regimens, and these may represent a significant percentage of suspected TB cases (4%-30%) [45,46] and chronic/multi-drug resistant TB cases (18%-27%). [47-49] Our IP-MS assay approach uses immunoprecipitation to capture and enrich a pair of semi-conserved peptides from CFP-10 and EsxN protein orthologs expressed by a number of slow-growing  [50,51] while M. gordonae-positive cultures frequently represent MGIT culture contamination and this can thus affect the decision to treat a patient. [4] Further, this IP-MS assay was able to identify both species in two MGIT cultures that grew both MAC and M. kansasii. Such cultures can potentially confound species identifications in sequencing-, probe-, or MS-based methods if they produce overlapping target signals or signal fingerprints that obscure the identity of the species present.
Both immunoprecipitation and LC-MS/MS are essential for the performance of our described IP-MS assay. Specific antibodies that can bind the CFP-10-p2 and/or EsxN-p2 sequence variants encoded by our target mycobacterial species are required to selectively enrich these peptides. This enrichment allows sensitive and specific detection against the high non-specific background produced by the protein constituents of the MGIT culture media and by other mycobacteria-derived proteins, particularly during the early growth phase, when mycobacteria-derived proteins are present at very low concentration relative to the protein content contributed by the MGIT culture media. Specific antibodies produced for this proof-of-principle study were raised against full-length Mtb CFP-10-p2 and EsxN-p2 peptide sequences and yielded variable detection efficiency in an IP-MS assay when employed to detect equimolar amounts of species-specific synthetic peptide variants spiked into MGIT culture media. This variation appeared to arise from differences in antibody affinity rather than LC-MS/MS detection efficiency among these variants, since LC-MS/MS analysis of an equimolar mixture of all CFP-10-p2 and EsxN-p2 IS peptides in PBS did not exhibit significant peak area differences (data not shown). The reduced affinity of the anti-CFP-10-p2 antibodies for the M. kansasii peptide did not decrease its detection rate in early positive MGIT samples, although it might increase the amount of time required to obtain positive biomarker results when attempting to detect positive cultures and identify mycobacterial species prior to MGIT culture positivity. Affinity differences for species-specific CFP-10-p2 and EsxN-p2 could, however, be attenuated by producing monoclonal antibodies to eight to eleven amino acid regions of CFP-10-p2 and EsxN-p2 sequence that contain single amino acid variations and selecting for clones that exhibit similar affinity for all target peptides.
LC-MS chromatographic and spectral data both provide information that can distinguish species-specific isomers of target peptides (e.g., M. gordonae and M. kansasii CFP-10-p2) by the characteristic retention times of their peptide ions and co-eluting fragment ions. Species-specific target peptides are identified using pre-established criteria for the retention times of their individual peak ions and their resulting unique fragment ions, the similarity of these spectra to MS/MS reference libraries for the target peptides, and peak area thresholds that differentiate signal from background. Spectral libraries are also employed in commercial MALDI-TOF-MS-based diagnostic assays, but strain-specific differences and growth conditions can alter mycobacterial protein expression to influence spectral library data and subsequent species identifications that rely on the correspondence with species-specific protein expression signatures. Such mycobacterial phenotype differences do not affect IP-MS results, however, since this assay relies on the positive detection of distinct target peptide sequences from two actively secreted virulence factors to make species-specific identifications.
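Spectral similarity to a reference library is commonly scored as a normalized dot product between observed and library fragment-ion intensities. The Python sketch below shows one such score (cosine similarity). This section does not define how dotp is calculated, so the formula, the function name `dotp`, and the example intensities are assumptions for illustration. Analysis software may report a related but differently scaled value.

```python
import math


def dotp(observed, library):
    """Cosine similarity between two fragment-ion intensity maps.

    Both arguments map fragment ion labels (e.g., "y8") to intensities.
    Ions missing from one spectrum are treated as zero intensity.
    """
    ions = sorted(set(observed) | set(library))
    a = [observed.get(ion, 0.0) for ion in ions]
    b = [library.get(ion, 0.0) for ion in ions]
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return numerator / denominator if denominator else 0.0


# Hypothetical intensities, chosen only to illustrate the calculation.
library = {"y8": 100.0, "y9": 80.0, "y10": 60.0}
same_pattern = {"y8": 200.0, "y9": 160.0, "y10": 120.0}  # scaled copy
different_pattern = {"y11": 90.0, "y12": 50.0}            # no shared ions

print(dotp(same_pattern, library))
print(dotp(different_pattern, library))
```

Because the score depends on relative rather than absolute intensities, a low-abundance peptide with the expected fragmentation pattern still scores near 1, while an isobaric peptide that yields different fragment ions scores low.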
This study indicates that the IP-MS assay can rapidly and specifically identify mycobacteria commonly detected in mycobacterial isolates from clinical specimens (M. tuberculosis, MAC, and M. kansasii), as well as one species frequently associated with MGIT contamination (M. gordonae), at the first sign of MGIT growth. Small proof-of-concept studies demonstrated that our target peptides were also efficiently detected when trypsin digestions were reduced to 1 h and when MALDI-TOF MS was used as the assay readout, which could further reduce assay performance time and increase throughput. However, further studies should evaluate whether this approach could be employed for target detection prior to MGIT positivity. Further studies should also investigate the potential of this approach to identify mycobacterial infections using plasma or serum as the analysis sample, since we have reported that a similar approach that analyzes a different CFP-10 peptide can detect all manifestations of active TB disease, including culture-negative pulmonary and extrapulmonary cases, in a diverse set of affected patient populations. [52,53]

ACKNOWLEDGEMENTS
The following reagents were obtained through BEI Resources, NIAID, NIH: Mycobacterium tuberculosis, Strain H37Rv, and CDC1551, Culture Filtrate Proteins, NR-14825, and NR-14826. The work was primarily supported by research funding provided by NIH [R01HD090927, R01AI122932, R01AI113725, and R21AI126361-01], the Department of Defense [W81XWH1910926], an Arizona Biomedical Research Commission (ABRC) young investigator award, and the Weatherhead Presidential Endowment Fund.

CONFLICT OF INTEREST
T.H. and Christopher J. Lyon report other interests from NanoPin Technologies, Inc., outside the submitted work. In addition, T.H. and Qingbo Shu have a patent ("Compositions and methods of determining a level of infection in a subject") licensed to NanoPin Technologies, Inc. The other authors declare no competing interests.