MALDImID: Spatialomics R package and Shiny app for more specific identification of MALDI imaging proteolytic peaks using LC‐MS/MS‐based proteomic biomarker discovery data

Matrix-assisted laser desorption/ionization (MALDI) imaging of proteolytic peptides from formalin-fixed paraffin-embedded (FFPE) tissue sections could be integrated in the portfolio of molecular pathologists for protein localization and tissue classification. However, protein identification can be very tedious using MALDI-time-of-flight (TOF) and post-source decay (PSD)-based fragmentation. Here, we implemented an R package and Shiny app to exploit liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based proteomic biomarker discovery data for more specific identification of peaks observed in bottom-up MALDI imaging data. The package is made available under the GPL-3 license. The Shiny app can be used directly at the following address: https://biosciences.shinyapps.io/Maldimid.

that is, cancer versus normal tissues, cancer versus dysplasia, dysplasia versus normal, low-grade versus high-grade dysplasia, etc. Biomarker discovery analyses provide the relative abundance, and therefore the relative specificity, of the compounds between compared tissues. A similar approach is followed in bottom-up proteomic analyses by MALDI imaging, with the direct visualization of the differential abundance, and therefore specificity, of proteolytic peptides in ROIs and relevant counterparts. Using biomarker discovery data obtained from LC-MS/MS thus appears to be ideal to identify more specific proteolytic peaks from MALDI imaging data.
This concept was first conceived and explained in detail a few years ago [2]. It was illustrated with LC-MS/MS and MALDI imaging data from a study of high-grade squamous intraepithelial lesions (HSILs) of the uterine cervix [2]. Thereafter, the concept was exploited for the development of software integrating (i) LC-MS/MS data processing, (ii) MALDI imaging data processing, and (iii) data correlation [4].
Although fully integrative software solutions present some advantages, small modular applications have the benefit of easier integration and workflow/pipeline customization, which are necessary for data correlation. This avoids data processing duplication when standalone biomarker discovery studies have already been performed, using either LC-MS/MS or MALDI imaging.
For our former proof-of-concept description, an R script was developed to permit data correlation [2]. Since a user-friendly interface was still lacking, an R package and Shiny app are proposed here, called "MALDImID," standing for "MALDI imaging protein IDentification."

Concept overview
The process for data curation was formerly described [2]. A standalone LC-MS/MS-based biomarker discovery study can be performed using the established combination of MaxQuant (MQ) [5] for peptide and protein identification/normalization/quantification and Perseus [6] for statistical analyses. The biomarker discovery analysis used for illustration in this article was published previously [7] (Figure 1) and consisted of a comparison between HSILs (Figure 1A,D) and their normal tissue counterparts, that is, ectocervix (Figure 1A,B) and endocervix (Figure 1A,C).
Briefly, the different ROIs, each containing a comparable number of cells (approximately 3500), were collected from FFPE tissue sections from five different patients by laser microdissection (LMD). Thereafter, the collected tissue pieces were processed using a protocol for microproteomics based on hybrid in-solution/on-surface tryptic digestion [8,9]. The data were processed using MQ for peptide/protein identification and label-free quantification (LFQ) (Figure 1).
Analyses of MALDI imaging data can also be performed separately, as standalone studies, in order to find peaks of interest from (i) [...]
MALDImID is thereafter only used when protein identification from MALDI imaging data is necessary (Figure 1.3). From the MQ processing step, the "peptides.txt" file (Figure 1.2) is retrieved and loaded in MALDImID (Figure 2A). MQ version 1.5.2.8 was used. When an MQ analysis is intended for later use with MALDImID, it is important to name the samples as "condition numberofreplicate," where "condition" corresponds to the tissue type (Figure 2D).
Table 1 lists, for each data set, the process step in which it is used (input/output), the MS method, the file format, the program, and the type of data.
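The role of this naming convention can be illustrated with a minimal R sketch. It assumes that the per-sample columns of the MQ peptide table are headed "Intensity" followed by the sample name, and that each sample name ends with its replicate number. The column names and the regular expressions below are illustrative assumptions and are not taken from the MALDImID source code.

```r
# Column names as they might appear in an MQ peptide table header
# (illustrative values, not taken from the study's files).
cols <- c("Sequence", "Mass", "Gene names", "Intensity",
          "Intensity HSIL1", "Intensity HSIL2",
          "Intensity ectoC1", "Intensity endoC1")

# Keep per-sample intensity columns only: "Intensity " followed by a label
# that ends in digits. The bare "Intensity" summary column is skipped.
sample_cols <- grep("^Intensity .*[0-9]+$", cols, value = TRUE)
samples <- sub("^Intensity ", "", sample_cols)

# Split each sample name into its condition and its trailing replicate number.
design <- data.frame(
  sample    = samples,
  condition = sub("[0-9]+$", "", samples),
  replicate = as.integer(regmatches(samples, regexpr("[0-9]+$", samples))),
  stringsAsFactors = FALSE
)

print(design)
print(table(design$condition))  # number of replicates found per condition
```

With such a pattern, a condition name that itself ends in a digit would be split incorrectly, which is one practical reason to keep condition labels purely alphabetical.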

Analytical workflow
FIGURE 1 [...] Dotted lines indicate the proteomic investigations that were previously and independently performed before MALDImID. Standalone matrix-assisted laser desorption ionization (MALDI) imaging investigations can be performed for biomarker discovery, leading to lists of m/z peaks of interest that are difficult to identify (upper right inset 1.) [2]. Standalone liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based biomarker discovery investigations can be performed where the MaxQuant analysis provides a "peptides.txt" file. Statistical analyses, for example, using Perseus, provide a list of significantly more or less abundant proteins in a group of interest compared to other analyzed groups [7] (lower left inset 2.). These two data sets are loaded in MALDImID for the creation of the list of region of interest (ROI)-specific identifications (IDs) and associated information related to the peptides (lower right inset 3.).
MALDImID correlates data from "peptides.txt" files obtained from MQ analyses and extracted lists of markers obtained from Perseus analyses.
Below is the list of tasks performed by the R script contained in the MALDImID package:
• Task 1: open the peptides.txt file.
• [...]ing run in positive mode are used. It is also important to note that monoisotopic peaks from the MALDI data have to be selected.
• Thereafter, a margin of tolerance of ±0.2 Da is applied for the search.
• Task 4: filter out the masses that are not in the range defined in task 3.
• Task 5: identify the gene names of interest in the peptides.txt file.
• Task 6: filter out the peptides that do not belong to proteins (gene names) of interest.
• Task 7: find the conditions that were set for the MQ analysis and the number of replicates. In the MQ analysis used for this illustration, the conditions compared and entered in MQ were "HSILx," "ectoCx," and "endoCx," with x denoting the replicate number.
In the peptides.txt file, peptide intensities are reported for each sample ("conditionx"). The conditions are detected from the text following the word "Intensity," and the number of replicates from the "x" following the condition name (Supplementary data 1).
• Task 8: reduce the peptides.txt table to the most relevant data: mass, protein identifications (IDs) (gene names), peptide ID score, unique peptides.
• Task 9: adjust column names, that is, attribute the origin of the data (LC-MS or MALDI), specify units.
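The mass and gene-name filtering steps listed above can be sketched in a few lines of R. This is a simplified illustration under stated assumptions and not the code of the package: the table, the column names, the peak list, and the function match_peaks are hypothetical, and the conversion of an observed m/z into a neutral peptide mass by subtracting one proton mass (singly protonated ions in positive mode) is our assumption for the example. Only the ±0.2 Da tolerance is taken from the text.

```r
# Toy stand-in for a peptide table (all values are made up).
peptides <- data.frame(
  Sequence   = c("PEPTIDEA", "PEPTIDEB", "PEPTIDEC", "PEPTIDED"),
  Mass       = c(1000.50, 1200.65, 1500.80, 1200.70),  # neutral monoisotopic mass, Da
  Gene.names = c("GENE1", "GENE2;GENE5", "GENE3", "GENE4"),
  Score      = c(95.1, 120.4, 60.2, 88.0),
  stringsAsFactors = FALSE
)
maldi_mz <- c(1201.70, 1800.90)            # observed monoisotopic m/z (hypothetical)
genes_of_interest <- c("GENE2", "GENE3")   # marker list, e.g. from statistics (hypothetical)

# Hypothetical helper: for each MALDI peak, keep peptides whose mass lies
# within +/- tol of the peak and whose gene names include a gene of interest.
match_peaks <- function(peptides, maldi_mz, genes, tol = 0.2, shift = 1.007276) {
  # Gene name entries may hold several names separated by ";".
  has_gene <- vapply(strsplit(peptides$Gene.names, ";", fixed = TRUE),
                     function(g) any(g %in% genes), logical(1))
  hits <- lapply(maldi_mz, function(mz) {
    neutral <- mz - shift                  # assumed [M+H]+ to neutral mass conversion
    keep <- abs(peptides$Mass - neutral) <= tol & has_gene
    if (!any(keep)) return(NULL)
    cbind(MALDI_mz = mz, peptides[keep, , drop = FALSE])
  })
  do.call(rbind, hits)                     # NULL when no peak has a match
}

res <- match_peaks(peptides, maldi_mz, genes_of_interest)
print(res)
```

In this toy example, two peptides fall within the mass window of the first peak, but only one of them belongs to a gene of interest, which shows how the marker list narrows an otherwise ambiguous mass match. Setting shift to 0 compares the values without any adduct correction.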

Installation requirements and use
The Shiny app is available on the shinyapps.io servers for users who are not familiar with the RStudio IDE and can be used directly at the following link: https://biosciences.shinyapps.io/Maldimid.
The installation requires R (≥3.4) and RStudio (≥1.0.143) and follows the usual R package installation procedure.
Local use requires the installation of R and the following packages: "shinythemes," "shinyjs," "shinyBS," "stringr," "openxlsx," "rlang," and "stringi" using the command install.packages("packagename"), where "packagename" should be replaced by the name of the package. After downloading the package, the working directory is set to its location, given as a Unix-style path, with setwd("unixpathdirectory") and the package [...]
TABLE 2 Description of the different tasks performed by the R script and illustration of the process in the different upstream and downstream files.
After installation, the package can then be loaded and run with the subsequent actions: library(MALDImID) and runMALDImID() to start the web application.
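The setup described above can be gathered in a short script. The sketch below only uses the package names and the two calls given in the text; the helper missing_packages is a hypothetical convenience function, and the interactive() guards are our addition so that sourcing the script in a batch job neither triggers downloads nor blocks on the web application. The installation of MALDImID itself is not covered here.

```r
# Dependencies named in the text.
deps <- c("shinythemes", "shinyjs", "shinyBS", "stringr",
          "openxlsx", "rlang", "stringi")

# Hypothetical helper: which of the required packages are not yet installed?
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

to_install <- missing_packages(deps)
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}

# Load the package and start the app only once MALDImID has been installed.
if (interactive() && requireNamespace("MALDImID", quietly = TRUE)) {
  library(MALDImID)
  runMALDImID()
}
```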
For integration with other workflows or pipelines that do not require human interaction through a GUI, the package also exports the function "GetResult," which requires the same input as the Shiny app user interface and gives the option to automatically save the results to a provided location.
The R package can also be found at the following address: https://bitbucket.org/biosciences/maldimid.
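Since the argument names of "GetResult" are not listed here, a cautious first step before scripting it is to inspect the function from within R. The sketch below assumes only that the package is installed and exports a function under that name; it does not presume any particular signature.

```r
# Check whether the package is available before touching its namespace.
available <- requireNamespace("MALDImID", quietly = TRUE)

if (available) {
  # Show the formal arguments of the exported function.
  print(args(MALDImID::GetResult))
  # Open its help page, if one is provided with the package.
  print(help("GetResult", package = "MALDImID"))
} else {
  message("MALDImID is not installed; install it before calling GetResult.")
}
```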
The current version is the very first release and may evolve with time and the evolution of data table formats.

RESULTS
The Shiny app has a minimalist interface where only the "peptides.

DISCUSSION
Compared with integrative tools such as a recently developed software [4], the advantages of the Shiny app are (i) its ease of use, (ii) its user-friendly minimalistic interface, (iii) its availability as a package, and (iv) its independence from upstream biomarker discovery workflows (LC-MS/MS using MQ processing and MALDI imaging). This last point allows for integration with other tools/steps, making it possible to rapidly interrogate preexisting data without data processing duplication. This also allows for more flexibility with the type of data used, since two-group or multiple-group comparisons of LC-MS/MS-based proteomic data can be used for correlation with MALDI imaging data, depending on the available data.
In the present article, this concept was illustrated using data from an LMD-based microproteomic workflow for LC-MS/MS-based biomarker discovery. Multiple workflows exist for proteomic investigations in tissue sections that preserve the histological context [9]. A major alternative to LMD-based workflows consists of direct on-surface digestion of tissue ROIs followed by liquid extraction surface analysis (LESA) for tryptic peptide recovery before LC-MS/MS analysis [10]. Both methods allow for the identification of thousands of proteins using up-to-date LC-MS/MS setups. One advantage of LMD is the fine control of the shape of the tissue pieces to collect. This can be particularly useful for the specific analysis of ROIs embedded in larger structures, as formerly experienced with lymphatic vessels [11]. Using the same sample preparation for MALDI-MS and LC-MS/MS would largely contribute to a more consistent data correlation [3]. [...], although it may not be a general rule, as with the putative identification of one peptide as KRT17 [2].
In addition to the mass of peptides, additional values that can be correlated between MALDI imaging and LC-MS/MS data would be highly valuable for identification. Recent developments in ion mobility (IM), and in particular trapped ion mobility spectrometry (TIMS), make it possible to add another dimension of molecular separation besides LC [12]. This new feature allows for a significant increase in sequencing speed without compromising sensitivity in LC-MS/MS-based proteomics [12]. When used for on-surface analyses, IM can partially compensate for the lack of molecular separation before ionization [13][14][15].

CONCLUSION
MALDImID is a user-friendly package and application that permits rapid interrogation of LC-MS/MS-based biomarker discovery data for the identification of MALDI imaging peaks, and allows easy automation and integration with other tools for further data storage and processing.

AUTHOR CONTRIBUTIONS
Cristiano Oliveira and Rémi Longuespée designed the workflow for the [...]
Open access funding enabled and organized by Projekt DEAL.