Strategies to Avoid Artifacts in Mass Spectrometry‐Based Epitranscriptome Analyses

Abstract In this report, we perform structure validation of recently reported RNA phosphorothioate (PT) modifications, a new set of epitranscriptome marks found in bacteria and eukaryotes including humans. By comparing synthetic PT‐containing diribonucleotides with native species in RNA hydrolysates by high‐resolution mass spectrometry (MS), metabolic stable isotope labeling, and PT‐specific iodine‐desulfurization, we disprove the existence of PTs in RNA from E. coli, S. cerevisiae, human cell lines, and mouse brain. Furthermore, we discuss how an MS artifact led to the initial misidentification of 2′‐O‐methylated diribonucleotides as RNA phosphorothioates. To aid structure validation of new nucleic acid modifications, we present a detailed guideline for MS analysis of RNA hydrolysates, emphasizing how the chosen RNA hydrolysis protocol can be a decisive factor in discovering and quantifying RNA modifications in biological samples.


Introduction
All forms of RNA are initially transcribed with four canonical building blocks, with the transcripts then being enzymatically decorated with any of more than 170 chemical modifications that define the epitranscriptome. [1] The most recently proposed addition to the epitranscriptome family involves the first known modification of the phosphate backbone, with substitution of a non-bridging phosphate oxygen with sulfur as a phosphorothioate (PT), in both prokaryotes and eukaryotes. [2] While PT modifications are new for RNA, they have previously been observed in bacterial DNA. [3] The sulfur in the PT renders the nucleic acid vulnerable to oxidation, resulting in strand breaks. [4] This instability led to the initial observation of a sulfur-containing DNA modification that caused strand breaks during electrophoresis, [5] before the modification was characterized as a PT by mass spectrometry (MS). [3] Furthermore, this property has now been exploited to determine the location of PTs in bacterial genomes at single-nucleotide resolution through iodine-induced cleavage and sequencing-based mapping of the breaks. [6] PTs are introduced into DNA by a specialized enzyme complex, DndABCDE, [7] where DndA acts as a cysteine desulfurase. In E. coli, DndA can be replaced by the desulfurase IscS, [8] which is involved in various bacterial RNA thiolation processes. [9] Half of all PT-containing bacteria have an additional set of restriction enzymes, DndFGHI, as part of a classical restriction-modification system. [10] In other bacteria, PTs are involved in an epigenetic interplay with the 6-methyladenosine introduced by the DNA methyltransferase Dam. [11] The genomic insertion of PT is beneficial to microorganisms and thus a wide distribution of PT in the human microbiome is not surprising. [12]
The discovery of most RNA modifications, including PT in RNA, has been facilitated by sensitive MS analysis. Although this approach is straightforward and new modifications are reported on a regular basis, [13] the sensitivity of modern mass spectrometers and the need for multiple types of mass spectrometry for rigorous structural definition are potential pitfalls. Jora et al. showed that low-abundance artifacts introduced by enzymatic RNA hydrolysis can be misinterpreted as novel RNA modifications. [14] In the original RNA hydrolysis protocol by Crain and colleagues, [15] the hydrolysis is performed in two steps, first at pH 5 using nuclease P1 (NP1) and phosphodiesterase 1 (PDE1), followed by dephosphorylation by alkaline phosphatase at pH 8. A one-pot alternative using Benzonase instead of NP1 at pH 8 has been reported and is now widely used. [16] At pH 8, the labile RNA modification cyclic N6-threonylcarbamoyladenosine (ct⁶A) undergoes epimerization and various artifacts arise. [17] Furthermore, not all enzymes used for RNA hydrolysis are capable of cleaving modified nucleotides. For example, nucleases S1 and P1 are not able to cleave m⁷G from the mRNA 5′-m⁷GpppN cap, which has been exploited for cap analysis in transcripts. [18] However, other nucleases, such as PDE1, can cleave the cap structure as well as RNA phosphodiester bonds to release m⁷G for analysis, which He and co-workers exploited to differentiate m⁷G in caps from m⁷G in the body of mRNA. [19]
Given the large and growing variety of RNA modifications, [1] there is growing pressure on researchers to correctly distinguish isobaric and structurally similar modifications as well as to rigorously identify new structures. Here we provide a guide for the discovery and structural validation of new nucleic acid modification candidates. We applied this approach to the recently described RNA phosphorothioate modification [2] and found that the correct identity of the modification in the nuclease-resistant diribonucleotide species is 2′-O-methylated ribose. Our systematic comparison of RNA hydrolysis protocols highlights the central role of the hydrolysis step and of structural validation by high-resolution MS and other methods in RNA modification discovery experiments as well as in absolute quantification of modified nucleosides.

Results and Discussion
Mass spectrometric analysis of DNA phosphorothioation depends on the hydrolytic stability of the PT towards several nucleases, including nuclease P1 (NP1). Nuclease treatment releases PT-linked dinucleotides from the DNA and is exploited to quantify and characterize the dinucleotide context by LC-MS. [3,4] For synthetic PT-containing RNA, we observe the same stability towards NP1 (Figure S1) and thus LC-MS analysis of PT is possible by NP1 hydrolysis followed by detection of the PT-linked diribonucleotide. [2] Under the assumption that PTs might occur within any combination of canonical ribonucleosides, 16 possible PT diribonucleotide structures must be considered during method development. In addition, thiolation of the phosphate backbone introduces a stereocenter and thus Rp and Sp isomers of each dinucleotide must be established. With the goal of developing a fast and reliable method for absolute quantification of native PTs in RNA, all 32 possible Rp and Sp PT dinucleotides were prepared as reported [2] and their HPLC retention times and MS characteristics assessed by LC-MS/MS (Figure S2). We then analyzed total RNA from E. coli K12 and B7A strains, human embryonic kidney cells (HEK 293), and mouse brain tissue for the presence of PT-containing diribonucleotides. Total RNA was first fractionated by size-exclusion chromatography following established protocols [20] to yield tRNA, 16S/18S small ribosomal RNA, and 23S/28S large ribosomal RNA. Each fraction was then hydrolyzed with NP1 and analyzed using the developed LC-MS method. We observed signals similar to the synthetic PT precursor and product ions of GpsG and CpsC in all digests, with variable abundance depending on the identity of the respective RNA fraction (Figure 1). While the signal for native GpsG and the synthetic Rp-GpsG overlapped, we noticed several signals for CpsC in the various species, with only one of these overlapping with the synthetic Rp-CpsC standard.
The observation with CpsC merited a more detailed analysis of the native PT dinucleotide signals by orthogonal UHPLC-MS/MS analysis using high-resolution mass spectrometry (HRMS) and metabolic isotope labeling. UHPLC-HRMS analysis, with an adapted solvent gradient, of the RNA isolated from E. coli B7A and S. cerevisiae revealed a discrepancy in the retention times of the dinucleotides relative to the synthetic standards for CpsC and GpsG PT dinucleotides (Figure 2). The putative CpsC dinucleotide eluted 30 s more slowly than the synthetic Rp PT standard, while the putative GpsG dinucleotide eluted 60 s more slowly than the GpsG PT standard. These results suggested that the dinucleotides isolated from E. coli and S. cerevisiae were not PT-containing dinucleotides.
To ascertain the identities of the observed dinucleotides present in RNA from these organisms, total RNA was extracted from both E. coli B7A and S. cerevisiae and digested to a mixture of ribonucleosides and diribonucleotides suspected to contain PT. The putative PT-containing diribonucleotides were isolated by preparative HPLC for HRMS analysis. High-resolution mass spectra were obtained by orbitrap mass spectrometry for both the synthetic PT diribonucleotide standards and the diribonucleotides isolated from biological samples, which revealed a 2 Da discrepancy (Figure 2). Unexpectedly, the exact mass found for the native GpsG signal is 1.96017 Da lighter than the exact mass observed for synthetic GpsG (Figures 2B and S3A). In addition, synthetic GpsG showed the natural ³⁴S signal (4% at M+2), while the native GpsG did not. Metabolic stable isotope labeling of all carbon, nitrogen or sulfur atoms was performed in E. coli K12 using minimal medium M9 containing a single source for ³⁴S, ¹³C and ¹⁵N, respectively, and the RNA was purified and analyzed as described earlier. [4,13b,21] Human cells were stable isotope labeled by feeding ¹⁵N₅-adenine and/or ¹⁵N₂¹³C₅-labeled uridine as recently reported. [22] MS analysis of ³⁴S-labeled E. coli RNA showed an absence of sulfur in the native analyte, which is a strong indication against a PT diribonucleotide structure. However, after growing the cells with L-methionine-[²H₃]-methyl, a signal at m/z 646 indicated the presence of a methyl group in the diribonucleotide (Figure 2B). [13b] Complete ¹⁵N and ¹³C labeling in E. coli [13b] also does not provide evidence for the putative GpsG structure. Similar results were obtained in stable isotope labeled HEK cells (Figure S3A), where we observed a methylation mark after L-methionine-[²H₃]-methyl feeding (m/z 646).
The mass increase from m/z 643 to 651 (+8 Da) indicates the presence of two ¹⁵N₄-labeled guanine bases, which confirms its nature as a canonical phosphate-linked GG dinucleotide in HEK cells. The presence of two guanine bases is additionally supported by analysis of the ¹⁵N-RNA extracted from E. coli, which is 10 Da heavier than the unlabeled species because the exocyclic amino groups are also ¹⁵N-labeled in this case. From these data, we conclude that there is no evidence for GpsG in E. coli, S. cerevisiae, human cells or mouse brain tissue. In Figure 2C, we focused on the multiple signals for CpsC obtained through targeted LC-MS analysis of native RNA. HRMS of synthetic CpsC and native putative CpsC showed a mass discrepancy of 0.97811 Da (Figures 2D and S3B). Again, stable isotope labeling provided evidence of a methyl group instead of a sulfur in the analyte. Furthermore, the mass difference between unlabeled and ¹⁵N-labeled signals indicates the presence of only five nitrogen atoms, whereas CpsC has six. We analyzed the corresponding peak from isotope labeled HEK RNA and confirmed the presence of a methyl group and two pyrimidine ribonucleosides (due to the mass increase of +14, Figure S3B). In human and mouse RNA, five signals were found in targeted CpsC MS analysis using the chromatographic system from Figure 1 (Figure S3). The first signal co-elutes with synthetic Rp-CpsC, but HRMS analysis of this signal revealed an m/z of 564 and thus the same ≈1 u mass discrepancy as seen in E. coli and in peak 4 of HEK cells in Figure S3. Similarly, the signal vanishes in the presence of L-methionine-[²H₃]-methyl and a 3 Da heavier signal at m/z 567 appears, which suggests the presence of a methyl group. The MS spectra from all other peaks from targeted CpsC analysis did not show the expected m/z of 565 for unlabeled RNA, while a signal for methylation can be found in all of them (Figure S3C).
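The labeling logic above amounts to counting atoms: full ¹⁵N labeling adds 1 Da per nitrogen, ³⁴S labeling adds 2 Da per sulfur, and [²H₃]-methyl methionine adds 3 Da per methionine-derived methyl group. The following minimal Python sketch tabulates these nominal shifts for the PT candidates and for the 2′-O-methylated dinucleotides that this study arrives at. The atom counts are assumptions derived from standard nucleoside compositions, and the names `CANDIDATES` and `label_shifts` are illustrative only.

```python
# Atom counts for the neutral dinucleotides. These compositions are
# assumptions derived from standard nucleoside formulas.
CANDIDATES = {
    "GpsG": {"N": 10, "S": 1, "methyl": 0},
    "GmG":  {"N": 10, "S": 0, "methyl": 1},
    "CpsC": {"N": 6,  "S": 1, "methyl": 0},
    "CmU":  {"N": 5,  "S": 0, "methyl": 1},
}


def label_shifts(counts):
    """Nominal mass shifts (Da) expected for three labeling experiments."""
    return {
        "15N": counts["N"],           # full 15N labeling: +1 per nitrogen
        "34S": 2 * counts["S"],       # 34S labeling: +2 per sulfur
        "CD3": 3 * counts["methyl"],  # [2H3]-methyl methionine: +3 per methyl
    }


for name, counts in CANDIDATES.items():
    print(name, label_shifts(counts))
```

Under these assumptions, only the methylated candidates predict the +3 Da shift (m/z 643 to 646 and m/z 564 to 567) together with the absence of a ³⁴S shift, and only CmU predicts five rather than six nitrogen atoms.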
The results obtained with multiple mass spectrometric approaches convincingly demonstrate that there are no PT-containing diribonucleotides in RNA from four model organisms, with the most likely identity of the modified species being 2′-O-methylated dinucleotides. For the sake of rigor, we tested the presence of PTs in RNA by exploiting the sensitivity of PTs towards iodine oxidation (Figure 3A). This has been used for PT-specific cleavage and subsequent mapping of PT sites in microbial DNA by next-generation sequencing. [6a] To establish iodine-induced cleavage of RNA PTs, we synthesized a 30-mer RNA oligoribonucleotide with a site-specific GpsG PT and established the presence of the GpsG by NP1 digestion and UPLC-MS/MS analysis (Figure 3B). The oligo was then treated with iodine and the reaction mixture analyzed by HPLC. As shown in Figure 3C, iodine treatment resulted in the formation of two shorter fragments of 10 nt and 20 nt, which is consistent with cleavage at the GpsG site by iodine. As we showed with DNA, [3] iodine-induced strand breaks accounted for only ≈20% of the oligo degradation, with ≈80% of the oligo converted to a faster eluting 30-mer oligo that co-eluted with synthetic 30-mer lacking PT (Figure 3C). This is consistent with iodine-induced desulfurization of PT to phosphate. [3] To establish loss of GpsG in the iodine-oxidized RNA, we analyzed the digestion mixture by UPLC-MS/MS, which confirmed the loss of the GpsG PT-containing diribonucleotide (Figure 3D). This approach was then applied to total RNA from E. coli B7A, which possesses Dnd genes for PT insertion in DNA, an E. coli B7A mutant lacking the Dnd genes (ΔdndBCDE), S. cerevisiae BY4741, and human A549 cells. Following iodine oxidation, the NP1-hydrolyzed total RNA was analyzed by UPLC-MS/MS. As shown in Figures 3E-H, the presumed MS signal of the putative GpsG diribonucleotide was stable to iodine treatment.
Furthermore, we did not observe iodine-induced RNA cleavage when total RNA was analyzed on a Bioanalyzer, again suggesting the absence of PTs in RNA (Figure S4). In summary, our orthogonal approaches show no evidence for PTs in RNA in E. coli, S. cerevisiae, mice or humans.
The results of these studies cast doubt on the identity of the RNA-derived molecules as PT-linked dinucleotides, which initiates a process of predicting and proving the true structure. Here we refer to the workflow depicted in Scheme 1, which starts with a prediction of the structure. This can lead immediately to a metabolic isotope labeling study or, if a biosynthetic pathway can be predicted, a knockout or knockdown study to assess the modification level. [13,23] In any event, the structure must be synthesized and compared to the native compound for behavior in LC-MS and, if enough biological analyte exists, NMR studies. LC retention time represents a first dimension of identification and ideally more than one stationary/mobile phase pair is used to confirm co-elution of synthetic and native compound. As a second dimension, a full mass spectrum of fragmentation on a high-resolution instrument is required to establish exact molecular weight, MS/MS fragmentation patterns, and isotope envelopes. The chemical structure is confirmed if the native and synthetic versions behave identically.
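The two dimensions of identification described above, retention time and exact mass, can be expressed as a simple acceptance test. The sketch below is illustrative only: the tolerances (`rt_tol_s`, `ppm_tol`) and the retention times in the usage example are hypothetical placeholders, and the two m/z values are theoretical [M+H]⁺ values calculated here for GpsG and a 2′-O-methylated GG dinucleotide rather than measurements from this study.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def matches_standard(native, standard, rt_tol_s=6.0, ppm_tol=5.0):
    """Return True only if retention time AND exact mass both agree.

    native and standard are dicts with keys 'rt_s' (seconds) and 'mz'.
    The default tolerances are hypothetical and instrument dependent.
    """
    same_rt = abs(native["rt_s"] - standard["rt_s"]) <= rt_tol_s
    same_mass = abs(ppm_error(native["mz"], standard["mz"])) <= ppm_tol
    return same_rt and same_mass


# Hypothetical example: a native signal eluting 60 s after the synthetic
# PT standard and about 2 Da lighter fails on both criteria.
synthetic_pt = {"rt_s": 300.0, "mz": 645.1235}
native_peak = {"rt_s": 360.0, "mz": 643.1620}
print(matches_standard(native_peak, synthetic_pt))
```

A real workflow would repeat this comparison on more than one stationary/mobile phase pair and extend it to MS/MS fragments and isotope envelopes, as the text requires.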
However, one must also consider the possibility that the observed molecule is an artifact caused by adventitious enzymatic or chemical reactions during cell lysis, RNA purification, RNA processing, or even ionization in the mass spectrometer. Such artifacts are best excluded by analysis of stable isotope labeled nucleic acids. For example, aminations, which occur during some RNA hydrolysis protocols, are identified in ¹⁵N-labeled RNA by the absence of one ¹⁵N. [14] Furthermore, stable isotope labeled nucleic acids are ideal for co-injection with the synthetic standard. Only compounds that pass this final step of structure validation should be taken into biological testing, including experiments on the compound's biosynthesis, location, distribution or quantity.
This strategy was applied here with a structural prediction that starts by considering that the MS analysis detected a signal at m/z 643 that was 2 Da lower in mass than the predicted GpsG, which should have had a signal at m/z 645. Considering that the predicted structure has 21 C atoms, the natural abundance of ¹³C (1.1%) would produce an M+1 signal (m/z 644) that is 23% of parent molecular ion (M) intensity and an M+2 signal (m/z 645) that is 2% of M. The high sensitivity of triple-quadrupole instruments can lead to a mistaken identification of M+1 or M+2 signals as M. The most immediately practical candidate dinucleotide structures that could account for this 2 Da difference are 2′-O-methylated dinucleotides, with 2′-O-methylated ribonucleosides occurring abundantly in most forms of RNA.
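The isotope estimate above follows from a binomial distribution of ¹³C over the 21 carbon positions. The short Python sketch below reproduces it under the simplifying assumption that only carbon contributes; the function name `c13_envelope` and the default abundance of 0.011 are choices made here for illustration.

```python
from math import comb


def c13_envelope(n_carbons, p13c=0.011, max_k=2):
    """Relative intensities of M, M+1, ... M+max_k (M scaled to 1.0).

    Only 13C is considered; contributions from 15N, 18O, 2H and other
    heavy isotopes are ignored in this sketch.
    """
    p12 = 1.0 - p13c
    mono = p12 ** n_carbons  # probability that all carbons are 12C
    return [
        comb(n_carbons, k) * p13c**k * p12 ** (n_carbons - k) / mono
        for k in range(max_k + 1)
    ]


m0, m1, m2 = c13_envelope(21)
print(f"M+1 = {m1:.1%} of M, M+2 = {m2:.1%} of M")
```

With these inputs the carbon-only estimate gives an M+1 of roughly 23% and an M+2 of a few percent of M, in line with the figures quoted above. Heavy isotopes of oxygen and nitrogen would raise both values somewhat, so the result should be read as a lower bound.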
We tested this prediction in a series of studies that followed the checklist in Scheme 1.
We started by testing the 2′-O-methyl dinucleotide hypothesis as a sample preparation artifact: could 2′-O-methyl dinucleotides arise from incomplete hydrolysis of RNA? Indeed, more extensive hydrolysis of native RNA with NP1 for longer than 30 minutes decreases the dinucleotide signal. In contrast, dinucleotide signals from synthetic PT RNA are stable even after 3 hours of NP1 hydrolysis (Figure 4A). This led us to compare common enzymatic RNA hydrolysis protocols for the completeness of the reaction. Here we used native RNA from HEK cells digested with either (1) Benzonase + phosphodiesterase I (PDE1) + calf intestine alkaline phosphatase (CIP) (protocol 1), [16] (2) NP1 + CIP [15] (protocol 2; as in Figures 1 and 3), or (3) a commercial RNA hydrolysis kit (NEB, Nucleoside Digestion Mix), followed by quantification of the released nucleosides by isotope-dilution LC-MS/MS. As shown in Figure 4B, all three approaches release a similar amount of canonical ribonucleosides. Similarly, some modifications, such as 1-methyladenosine (m¹A), 5-methyluridine (m⁵U) and pseudouridine (Ψ), are released in similar abundances (Figure 4C). However, other modifications, especially the 2′-O-methylated ribonucleosides Cm, Um and Gm, were detected at lower concentrations using protocol 2 and the kit (Figure 4D), the latter also failing to release other modified nucleosides such as 5-methylcytidine (m⁵C) (Figure S5). To understand why protocol 1 was superior to protocol 2, we repeated the experiment using Benzonase + CIP and NP1 + CIP in the presence and absence of PDE1. As shown in the first graph of Figure 4E, Benzonase alone does not fully hydrolyze RNA to the monoribonucleotide level for dephosphorylation by CIP. In contrast, NP1 produces a more extensive RNA hydrolysis in the absence of PDE1, but complete release of Cm, Um and Gm is only possible with the addition of PDE1.
These results show that 2′-O-methylribonucleosides are recalcitrant to release from RNA during hydrolysis, which raises the question of the identity of the PT mimics as 2′-O-methylated dinucleotides that arise due to incomplete RNA hydrolysis. We next defined the structure of the PT mimics. To confirm the predicted structure, we synthesized the 2′-O-methylated dinucleotides CmC, CmU, UmC and GmG (example given in Figure 5A), which we used to start the workflow in Scheme 1 by first confirming the HPLC retention time of synthetic and native PT mimics (Figure 5B). The synthetic CmC, CmU, UmC and GmG were then co-injected with fully hydrolyzed (NP1 + PDE1 + CIP) ¹³C-labeled E. coli RNA for LC-MS/MS analysis, which revealed co-elution of GmG (m/z 643) with a molecule with m/z 664 (Figure 5C) and of CmU (m/z 564) with a molecule with m/z 583 (Figures 5D and S6). The mass differences between native and isotope labeled molecules are consistent with the number of carbons in CmC, CmU, UmC and GmG fully labeled with ¹³C. High-resolution fragmentation spectra of synthetic and putative native CmU in Figure 5E show a molecular ion (m/z 564.133) and fragments that differ by <1 ppm (Table S1). Similar results were obtained by co-injection with stable isotope labeled RNA from HEK cells (Figure S6).
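The carbon-count argument can be checked by inverting the mass shift observed on full ¹³C labeling. In the hedged Python sketch below, the carbon counts in `CARBON_COUNTS` are assumptions derived from standard nucleoside compositions, and `carbons_from_shift` is a name introduced here for illustration; the m/z pairs are the nominal values quoted in the text.

```python
C13_MINUS_C12 = 1.003355  # Da, mass difference between 13C and 12C

# Assumed carbon counts of the candidate dinucleotides.
CARBON_COUNTS = {"GmG": 21, "GpsG": 20, "CmU": 19, "CpsC": 18}


def carbons_from_shift(mz_unlabeled, mz_labeled, charge=1):
    """Estimate the number of carbons from the shift on full 13C labeling."""
    return round((mz_labeled - mz_unlabeled) * charge / C13_MINUS_C12)


# Co-eluting unlabeled/labeled pairs reported for the synthetic standards
# and the 13C-labeled E. coli RNA.
observed = {"GmG": (643, 664), "CmU": (564, 583)}

for name, (light, heavy) in observed.items():
    n = carbons_from_shift(light, heavy)
    print(name, n, n == CARBON_COUNTS[name])
```

Under these assumptions the observed shifts correspond to 21 and 19 carbons, which matches the methylated dinucleotides and is one carbon more than the corresponding PT structures would contain.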
To further confirm the identity of the 2′-O-methylated dinucleotides, we compared MS/MS signal intensities associated with mass transitions corresponding to PT and 2′-O-methylated dinucleotides. The CmU and GmG dinucleotides isolated from E. coli, S. cerevisiae, and HeLa cells showed a ≈25-fold increase in abundance when detected with the mass transitions for the 2′-O-methylated dinucleotides rather than the PTs (Figure S7). This observation suggests that the signals detected with the PT transitions likely represent low-abundance isotopomers of the dinucleotides with a mass that agrees with the mass of the respective PT dinucleotide. These results prove that 2′-O-methylated dinucleotides account for the signals described by Wu et al. [2]
Figure 4 (caption fragment): … [16] 2, NP1/CIP; [15] or 3, a commercial RNA hydrolysis kit (NEB, Nucleoside Digestion Mix). (E) Abundance of ribose-methylated nucleosides from HEK total RNA digested in the absence (−) and presence (+) of phosphodiesterase 1 (PDE1) using either: 1, Benzonase + CIP [16] or 2, NP1 + CIP. [15] All data represent mean ± SD for 3 experimental replicates.
2′-O-Methylation is an abundant modification in both ribosomal RNA (rRNA) and tRNA. [1] Given their resistance to hydrolysis and our focus on 4 of 16 possible dinucleotide contexts, we wondered about the diversity of 2′-O-methylated dinucleotides in different organisms and different types of RNA. Literature precedent provided guidance on established dinucleotide contexts in tRNA and large and small rRNAs, as indicated by circles in Figure 6A. Optimal hydrolysis conditions resulted in detection of only 3 of the 16 possible dinucleotide contexts in tRNA from E. coli (GmG, CmA and CmU; Figure 6A,B), which contrasts with 14 detected 2′-O-methylated dinucleotides in tRNA from HEK cells, including 7 previously unreported dinucleotide sequence contexts (Figure 6A,C). For E. coli 16S and 23S rRNA, we detected the reported GmG and CmC dinucleotides as well as unreported GmU and CmU contexts. We extended these studies to mouse brain RNAs, for which there is little information about 2′-O-methylated dinucleotides. As shown in Figure 6A, mouse tRNA and 18S and 28S rRNAs possess every possible dinucleotide sequence context, including the AmA not detectable in human tRNA. These results point to the power of rigorous LC-MS to discover new modifications and their sequence contexts. However, there are also serious limitations for interpreting the biological meaning of the LC-MS observations. For example, while UmU has been observed in published studies, [1] we were not able to detect it in any type of RNA from any organism tested (Figure 6A). Was our inability to detect UmU and other published 2′-O-methylated dinucleotide contexts due to limited sensitivity of our instruments for rare dinucleotide motifs, or to inefficient release during hydrolysis? We are confident in the rigorous identification and quantification of those modifications that we are able to detect, but those that we cannot detect cannot be ruled out, and we must use orthogonal methods such as RiboMethSeq and other techniques that exploit the biochemical properties of 2′-O-methylation modifications in RNA. [24]

Conclusion
The search for new post-transcriptional RNA modifications is an important aspect of modern epitranscriptome research and mass spectrometry is the instrument of choice for the challenge. Following an established pathway for defining and validating molecular structures (Scheme 1), we discovered that the putative PT-containing dinucleotides observed in RNA from diverse organisms [2] were actually 2′-O-methylated dinucleotides. This is not the first instance of confusion about RNA modifications. [14,17] The major sources of confusion that likely led to the misidentification appear to be incomplete hydrolysis of RNA and reliance on low-resolution mass spectrometry. With regard to hydrolysis, we found that a minimal combination of PDE1 with either Benzonase or NP1 is required, with prolonged incubation with high nuclease concentrations providing what appears to be optimal hydrolysis of RNA to the mononucleotide level. Given published studies [18c,19] and our observations, even with these precautions, there may be modifications that are substantially resistant to release by nuclease hydrolysis, that are released in low abundance, or that are poorly detected by mass spectrometry. These limitations demand caution in the interpretation of mass spectrometric studies of the epitranscriptome: the absence of signal does not mean the absence of the analyte.
With regard to isotopomer confusion, the M+2 signal for the abundant 2′-O-methylated dinucleotides is relatively strong and could easily be mistaken for the molecular ion M of another molecule. As illustrated in Figure 5E, HRMS of CmU shows an isotope envelope of M of 564, M+1 of 565, and M+2 of 566, with the integer difference in m/z value validating the expected ion charge of +1. This is a very common problem that we have experienced in discovering 7-deazaguanine modifications in DNA, with initial prediction of 2′-deoxy-5-carboxy-7-deazaguanosine associated with m/z 311 proving to be the M+1 isotopomer of 2′-deoxy-7-amido-7-deazaguanosine, with the error caught during rigorous structural validation studies. [23] Here, the case of GmG illustrates what we suspect is the problem for misidentification of RNA PTs. The isotope envelope for the abundant GmG is composed of M of 643, M+1 of 644, and M+2 of 645, with the putative GpsG having M of 645. The cautionary conclusion is that rigorous identification of molecular structure by mass spectrometry requires systematic exploration of all adjacent signals by full mass spectra or even HRMS to define the correct precursor molecular ion. Even now, we cannot rule out the presence of PT modifications in some type of RNA in some organism. Since we discovered PTs as natural products in DNA, [3] we hope that the search continues for PTs in RNA.
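The overlap at nominal m/z 645 can be made concrete by comparing calculated exact masses. The following Python sketch is a rough illustration under stated assumptions: the elemental formulas for GmG and GpsG are derived from standard nucleoside compositions, the monoisotopic masses are common literature values, the M+2 peak of GmG is modeled as the ¹³C₂ isotopologue only, and resolving power is taken simply as m/Δm.

```python
# Monoisotopic atomic masses in Da (standard literature values).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
        "O": 15.9949146, "P": 30.9737620, "S": 31.9720707}
PROTON = 1.00727647   # Da, mass of a proton
C13_SHIFT = 1.0033548  # Da, 13C minus 12C

# Assumed neutral formulas of the two dinucleotides.
GMG = {"C": 21, "H": 27, "N": 10, "O": 12, "P": 1}
GPSG = {"C": 20, "H": 25, "N": 10, "O": 11, "P": 1, "S": 1}


def mh_plus(formula):
    """Monoisotopic m/z of the singly protonated ion [M+H]+."""
    return sum(MONO[el] * n for el, n in formula.items()) + PROTON


gmg_m = mh_plus(GMG)                 # GmG molecular ion, nominal 643
gmg_m2 = gmg_m + 2 * C13_SHIFT       # GmG 13C2 isotopologue, nominal 645
gpsg_m = mh_plus(GPSG)               # GpsG molecular ion, nominal 645

delta = gmg_m2 - gpsg_m              # separation of the two 645 signals
resolving_power = gpsg_m / delta     # m / delta-m needed to separate them

print(f"GmG M = {gmg_m:.4f}, GmG M+2 = {gmg_m2:.4f}, GpsG M = {gpsg_m:.4f}")
print(f"delta = {delta:.4f} Da, required m/dm of about {resolving_power:.0f}")
```

Under these assumptions the two species at nominal m/z 645 differ by only a few hundredths of a dalton, so a unit-resolution triple-quadrupole instrument cannot distinguish them, whereas an HRMS instrument with a resolving power in the tens of thousands can. The calculated offset between the two molecular ions is about 1.96 Da, in line with the 1.96017 Da discrepancy reported for the native and synthetic GpsG signals.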