OPTIMAL: An OPTimized Imaging Mass cytometry AnaLysis framework for benchmarking segmentation and data exploration

Abstract Analysis of imaging mass cytometry (IMC) data and other low‐resolution multiplexed tissue imaging technologies is often confounded by poor single‐cell segmentation and suboptimal approaches for data visualization and exploration. This can lead to inaccurate identification of cell phenotypes, states, or spatial relationships compared to reference data from single‐cell suspension technologies. To this end we have developed the “OPTimized Imaging Mass cytometry AnaLysis (OPTIMAL)” framework to benchmark any approaches for cell segmentation, parameter transformation, batch effect correction, data visualization/clustering, and spatial neighborhood analysis. Using a panel of 27 metal‐tagged antibodies recognizing well‐characterized phenotypic and functional markers to stain the same Formalin‐Fixed Paraffin Embedded (FFPE) human tonsil sample tissue microarray over 12 temporally distinct batches we tested several cell segmentation models, a range of different arcsinh cofactor parameter transformation values, 5 different dimensionality reduction algorithms, and 2 clustering methods. Finally, we assessed the optimal approach for performing neighborhood analysis. We found that single‐cell segmentation was improved by the use of an Ilastik‐derived probability map but that issues with poor segmentation were only really evident after clustering and cell type/state identification and not always evident when using “classical” bivariate data display techniques. The optimal arcsinh cofactor for parameter transformation was 1 as it maximized the statistical separation between negative and positive signal distributions and a simple Z‐score normalization step after arcsinh transformation eliminated batch effects. Of the five different dimensionality reduction approaches tested, PacMap gave the best data structure with FLOWSOM clustering out‐performing phenograph in terms of cell type identification. We also found that neighborhood analysis was influenced by the method used for finding neighboring cells with a “disc” pixel expansion outperforming a “bounding box” approach combined with the need for filtering objects based on size and image‐edge location. Importantly, OPTIMAL can be used to assess and integrate with any existing approach to IMC data analysis and, as it creates .FCS files from the segmentation output and allows for single‐cell exploration to be conducted using a wide variety of accessible software and algorithms familiar to conventional flow cytometrists.

for finding neighboring cells with a "disc" pixel expansion outperforming a "bounding box" approach combined with the need for filtering objects based on size and imageedge location.Importantly, OPTIMAL can be used to assess and integrate with any existing approach to IMC data analysis and, as it creates .FCS files from the segmentation output and allows for single-cell exploration to be conducted using a wide variety of accessible software and algorithms familiar to conventional flow cytometrists.
image analysis, image cytometry, imaging mass cytometry, tissue segmentation

| INTRODUCTION
Single-cell suspension technologies have now advanced to the point where we can measure thousands of parameters on millions of individual cells at truly "multi-omic" scale.However, the digestion and destruction of tissues to liberate single cells can affect the native cellular states as well as obliterating all spatial context.As such, "space" very much remains the "final frontier" with multiplexed single-cell tissue imaging traditionally lagging behind suspension technologies due to previously insurmountable technical issues around how many signals can be measured on the same sample/slide/section.Over the past few years, these issues have been overcome by the use of cyclical approaches to staining and imaging with fluorescent probes [1,2], or by moving away from fluorescence detection entirely with technologies such as "multiplexed ion beam imaging" [3] and imaging mass cytometry (IMC).IMC uses a powerful 1-μm laser to raster scan the metal-conjugated antibody-stained slide liberating small pieces of tissue for analysis by "cytometry by time of flight" (CyTOF) technology [4].IMC has several advantages over cyclical fluorescence detection such as no auto-fluorescence and no increase in measurement time with an increasing number of signals.It does, however, lack the same image resolution as optical systems (fixed at 10Â magnification) due to the 1 μm beam size of the ablating laser.While this is still sufficient to detect individual cells for phenotyping and spatial analysis the low image resolution can present challenges with subsequent data analysis.Unlike suspension technologies, IMC, along with all tissue imaging approaches with single-cell resolution, usually requires an additional preprocessing step whereby single cells or objects are identified using an image analysis technique called "segmentation." Segmentation algorithms are generally based on assessing variance at the pixel level and then using commonalities and differences to group individual pixels together as "super pixels" or "single-cell objects" via machine learning approaches [5].It is then possible to derive singleobject/single-cell features based on metal intensity (antibody/DNA intercalator), morphometrics (area, circularity, etc.) as well as the x and y centroid co-ordinates for every cell within each image.These features can then be used to explore the data using classical single-cell analysis approaches such as dimensionality reduction and clustering [6,7].There is generally a need to validate any cell types/states identified within the tissue against reference data derived from tissue digestion and suspension technologies, with usual caveats concerning the effects on cells/markers caused by enzymatic and/or mechanical disaggregation.As such, poor single-cell segmentation can have dramatic and confounding effects on accurate cell type/state identification, akin to measuring doublets/aggregates of debris by conventional flow cytometry or scRNAseq.There are a number of published "end to end" pipelines for IMC data analysis [8][9][10][11][12] that utilize open-source software for segmentation such as Ilastik [13] and CellProfiler [14,15], as well as StarDist [16] and IMC-specific approaches that utilize deep learning [17].There have also been attempts to use matched fluorescent images of the nuclei using DAPI co-staining to improve segmentation accuracy [18] as well as removing image noise [19,20].
Nonetheless, it has been shown that, due to the nature of tissue imaging, simple approaches to single-cell segmentation are often highly effective [21].Once single cells have been identified and exploratory features created and assigned, analysis follows an analogous route to suspension technologies with various corrections being applied.This includes a form of isotopic signal spillover correction [22], as well as parameter transformation and batch effect normalization prior to the use of dimensionality reduction techniques to visualize high parameter data and clustering to identify resident cell types/states.There are a number of existing approaches for visualizing and analyzing IMC data such as HistoCat [9] and ImaCytE [23], both provide the ability to perform spatial neighborhood analyses on the cell types and states identified via clustering approaches.However, they lack the flexibility to be able to optimize key steps and parameters of the pipeline in an easy and accessible manner.Here, we present a novel framework we call "OPTIMAL" (OPTimized Imaging Mass cytometry AnaLysis), which provides metrics and benchmarks for each major step of IMC data analysis including segmentation, parameter correction, normalization, and batch effect removal, as well as dimensionality reduction, clustering, and spatial analysis.This is not a new analysis pipeline per se, rather an exploration and optimization of existing approaches that allows for democratized analysis of cellular phenotypes from multiplexed tissue imaging technologies such as IMC; especially, as we convert all data to .FCS file format allowing it to be explored in an easy to use, accessible software.To test OPTIMAL we stained, acquired, and analyzed tissue microarrays (TMAs) from the same human tonsil sample over 12 temporally distinct batches using a panel of 27 metaltagged antibodies and IMC.We then investigated several different cell segmentation approaches based on the previously described Bodenmiller method [10], open-source software (Ilastik and CellProfiler), as well as deep learning (CellPose) [24] using cell type cluster "fidelity" as our measure of success using the human tonsil "ground truth" populations known to be identified by our 27-marker panel.Prior to clustering, however, we used OPTIMAL to identify the optimum arcsinh transformation cofactor to maximize signal resolution and to identify the use of a subsequent Z-score normalization factor as the best method of batch effect removal.We also identified the most effective dimensionality reduction and visualization method for IMC data to be PacMap, and FLOWSOM to be the best performing clustering algorithm for finding the expected cell types and states.Finally, we developed an approach to optimize spatial neighborhood analysis that used a more accurate method of finding neighboring cells than existing approaches and benchmarked this against well-defined cell types and structures in human tonsil.The OPTIMAL framework can be applied to any existing and future IMC data analysis as it provides a set of methods and metrics to empirically assess each stage in any pipeline, moreover, by producing .FCS files from our segmentation output we make exploration of the single-cell data highly democratized and not reliant on further expert coding skills.

| Tonsil tissue preparation and antigen retrieval
Formalin-fixed paraffin-embedded (FFPE) 2-mm human tonsil tissue cores were obtained from the Novopath Tissue Biobank (Royal Victoria Infirmary, Newcastle upon Tyne) and embedded into a threecore TMA.TMA blocks were constructed manually using medical biopsy punches (PFM Medical, UK).Cores were selected using hematoxylin and eosin-stained slides to guide suitable areas in the donor blocks.Cores were placed in a paraffin embedding mold, heated to 65 C and embedded in molten wax before cooling to set.Eightmicrometer serial sections were cut using HM 325 Rotary Microtome (Fisher Scientific, USA) and mounted onto SuperFrost Plus™ Adhesion slides (Epredia, CAT#10149870).

| Antibody panel design, conjugation, and validation by immuno-fluorescence
A 27-plex antibody panel was designed to identify the immune, signaling, and stromal components in the surrounding microenvironment.
All antibodies used in this study were first validated for performance using the chosen single-antigen retrieval methods outlined previously (Tris EDTA pH "Heat-Induced Epitope Retrieval") for IMC using simple two-color immunofluorescence (IF).All relevant antibody details are shown in Table S1 including the choice of metal tag based on the relative staining intensity of each marker by IF using the rules of "best practice" for CyTOF panel design [25].Unless stated otherwise, following verification of staining pattern and performance quality, approved antibodies were subject to lanthanide metal conjugation using a Maxpar X8 metal conjugation kit following manufacturer's protocol (Standard Biotools, CAT#201300).Antibodies conjugated to platinum isotopes 194 Pt and 198 Pt were conjugated as described in Mei et al. [26].Conjugated antibodies were validated by firstly checking the recovery of antibody postconjugation.Second, we checked for successful metal conjugation by binding the antibody to iridiumlabeled antibody capture beads AbC™ Total Antibody Compensation Beads (Thermo Fisher, USA, CAT#A10513) and acquiring on a Helios system (Standard Bio-tools, USA) in suspension sample-delivery mode.
Finally, we checked that the antibody had refolded and retained the ability to recognize antigen by using the postconjugation antibody in either a two-layer IF with a fluorescently labeled secondary antibody recognizing the primary antibody species or directly by IMC using the Hyperion imaging module (Standard Bio-Tool) connected to the Helios.A gallery of IMC-derived gray scale images for each stain (Ab and DNA) is shown in Figure S1B.Test tissue sections were then stained with the 27 marker antibody cocktails as outlined in Table S1.

| Hyperion (IMC) set up, quality control, and sample acquisition
Prior to each sample acquisition, the Hyperion Tissue Imager was calibrated and rigorously quality controlled (QCe'd) to achieve reproducible sensitivity based on the detection of 175Lutetium.Briefly, a stable plasma was allowed to develop prior to ablation of a single multielement-coated "tuning slide" (Standard Biotools).During this ablation, performance was standardized to an acceptable range by optimizing system parameters using the manufacturer's "auto-tune" application or by manual optimization of XY settings whilst monitoring 175Lutetium dual counts.After system tuning, tonsil sections were loaded onto the Hyperion system in order to create Epi-fluorescence panorama images of the entire tissue surface to guide region of interest (ROI) selection.Two ROIs of approx.500 μm 2 encompassing lymphoid follicles and surrounding structural cells were selected for ablation per batch run.Small regions of tonsil tissue were first targeted to ensure complete ablation of tissue during the laser shot with ablation energies adjusted to achieve this where required.Finally, ablations were performed at 200 Hz laser frequency to create a resultant MCD file containing all data from ROIs.Correction of "spillover" between isotopes was performed as per the protocol described at Spill over correction j Analysis workflow for IMC data (https:// bodenmillergroup.github.io/IMCDataAnalysis/spillover-correction.html) without deviation [22].

| Image QC and export
MCD files from the Hyperion system were opened using MCD™ Viewer v1.0.560.6 software (standard bio-tools) in order to perform a qualitative, visual QC of the staining intensity and pattern with the initial IF images as a benchmark.Pixel display values (max/min and gamma) were set to optimize the display of the 16-bit pixel range from the Hyperion detector (0-65,535) to the 8-bit display (0-255).
Multi-pseudo-colored, overlaid images were built for figures with a scale bar included and the option to export as an 8-bit TIFF with "burn in" was used.The digital magnification was also set to "1Â" so that each signal was carefully balanced for display purposes to aid qualitative visual interpretation.All images were exported as 16-bit single multilevel TIFFs using the "export" function from the "file" menu.For ease of use, all open collection channels from the experimental acquisition template (in this case, 60, including several "Blank" channels for QC purposes) from all ROIs were left ticked and any image/channel removal was dealt with later in the analysis.This avoided having to repeatedly deselect image channels for each ROI in the MCD file.These multilevel 16-bit TIFF images were then input in to our pipeline as shown in Figure 1A.

| Cell segmentation, feature extraction, parameter correction/normalization, and FCS file creation
Cell segmentation was based on the previously described method of Zantonelli et al. [10], which uses a combination of random forest pixel classification using Ilastik (Version 1.3.2 or later) [13] and helps to inform single-cell segmentation and feature extraction using CellProfiler (Version 4 or later) [15].Ilastik models were created to distinguish nuclear versus non-nuclear pixels based on partial labelling of multiplexed images of tonsil tissue or "Vero" monolayer cell culture.An additional run using unprocessed input Iridium 193 (DNA channel 51) was also trialed for comparison to Ilastik processing.Output nuclear probability maps were input into CellProfiler, enabling instance segmentation of cell nuclei, which were subsequently used as seeds for cell segmentation.Cell boundaries were determined using a seeded watershed algorithm either to EPCAM (channel 29) signal, or a maximum intensity projection of multiple membrane markers (see supplemental notes, Section S2.5.3).Following cell segmentation, individual cells were measured for mean intensity in each of the labeled channels.Intensity measurements were compensated for spillover according to a previously described approach [22].Arcsinh transformation was trialed using values from 0.1 to 120 using the Fisher Discrimination Ratio (Rd) to determine the optimum value for positive versus negative signal distribution (see supplemental notes).Following optimization, arcsinh transformation was applied to all experimental datasets with a value of 1.In addition, a second set of metal intensity parameters were derived whereby an additional subsequent Z-score normalization step was applied to the previously arcsinh cofactor (c.f.) 1 transformed values.This additional Z-score normalization was used to remove batch effect as well as to normalize marker intensities relative to one another for subsequent optimized heat map display.At this stage, any additional metadata was included in the files such as batch number in order to be an accessible and plot-able parameter for subsequent analysis.Final matrix data were converted to .FCS file format within the MATLAB pipeline for preparing for clustering.More details on our method and tests can be found in the supplemental method section (S2.5 and also visual guide at the end of the supplemental notes).

| Visualization, clustering, and exploration of single-cell IMC data
For this study, we used the commercially available "FCSExpress" software for all single-cell data analysis (Version 7.14.0020or later, Denovo software by Dotmatics, USA).More extensive information can be found in supplemental notes.Briefly, the FCS files created from the segmentation pipeline shown in Figure 1A for all 24 tonsil images across 12 batches were loaded as a single merged file.We then created a set of batch gates using a simple density plot of batch number (x-axis) versus iridium signal (Z-normalized parameter version) and selected contrasting colors for each and used the "pipelines" function within the "tools" menu to create UMAP parameters derived from the arcsinh c.f. 1 transformed and Z-score normalized antibody channels.In addition, we created UMAP parameters from the arcsinhonly versions in order to verify for the presence of batch effect and subsequent correction by Z-score normalization (see Figure 1B).Next, we used the same fully transformed and corrected parameters for FLOWSOM clustering using the default settings (see supplemental notes S2.6.5) with a merging of the 100 SOMs to 30 consensus clusters (cSOMs) based on hierarchical clustering and created a set of uniquely colored cSOM gates using the "plate heat map" and "well gates" function.We also created a PacMap dimensionality reduction plot using the same parameters using the Python interface within the FCS Express pipeline module (see supplemental notes S2.6.4).At this stage, after validation of results, we exported the data as both a single merged and a set of individual "Data stream" (.DNS) files.These contained the new clustering and visualization parameters (SOMs and PacMap x/y co-ordinates) as well as all the original .FCS file metadata but in a smaller, compressed, and easier to work with file format.
Next, we loaded the merged .DNS file in to a new incidence of FCS Express and conducted a much more extensive analysis of the data by (re)-constructing all the necessary meta-data and SOM gates as well as a heat map of transformed and normalized antibody-derived signals (rows) versus the 30 cSOMs (columns).The median values were normalized by column (cluster) to aid interpretation of the heat map on a per cluster per marker basis.Using the information on the panel in Table S1, we assigned broad cluster identities to these SOMs.We then used simple x/y centroid plots as well as further a priori legacy knowledge to manually merge any highly similar clusters with basic spatial verification.Finally, we exported the clustered data in two formats.First, as a .CSV file only containing the minimum information needed for neighborhood analysis, namely the ROI/Sample/Image ID, the cell ID within the ROI and the final cluster assignment for each cell.The second export file contained the ROI/Sample/Image name, the total cell count in the ROI and the percentage and/or cell number in each of the final cSOMs.The latter step can easily be performed using the individual .DNS files for each ROI/Sample/Image and using the FCS Express "batch export" function (see supplemental notes Section S2.6).

| Neighborhood analysis
Neighborhood analysis was performed with slight adaptations to the method outlined by HISTOCAT [9] and ImaCytE [23].Cell identities were determined by cluster analysis and saved, along with all other cell data, into a single large .CSV file.A separate excel file was used to store the cell type information as a biologically relevant name.Cell masks, stored from the cell segmentation stage, were input and cell identity transposed onto this data.Each cell was assessed for the number of unique cell identities within a pixel-defined threshold distance from the cell edge.The original HISTOCAT code was used, in addition to a modification using a "disc" element to determine the F I G U R E 1 Summary workflow diagram for the OPTimized Imaging Mass cytometry AnaLysis (OPTIMAL) analysis pipeline.Briefly, input multilevel TIFF images created from MCD file export were segmented using a combination of the nuclear channel via an Ilastik random forest pixel classification to generate a probability map ( p-map).Then single-or multiple-membrane channel images were used in conjunction with the p-map to create single-cell objects via CellProfiler and to create features sets based on intensity, morphometry, and spatial location.Additional metadata were also incorporated at this stage and included batch number.Metal intensity values were corrected for bleed through and two sets of subsequent parameters were created (i) arcsinh c.f. 1 transformed and (ii) a subsequent Z-score normalized set.This data matrix was converted in to .FCS file format (collectively MATLAB code I) and analyzed using FCS Express to explore things like batch effect normalization, dimensionality reduction (DimRedux) for visualization and clustering via FLOWSOM and heat map creation/interpretation. Cluster annotation was performed using a combination of hierarchical consensus merging and expert a priori knowledge combined with a basic spatial validation using x/y centroid plots.Once all cells in all images were assigned a cluster membership, a master .CSV file was created with the minimal necessary metadata to perform a neighborhood interaction analysis using MATLAB code II with the results visualized using MATLAB code III.
[Color figure can be viewed at wileyonlinelibrary.com] nearest-cell neighbors to each start point cell to investigate.We also tested the use of and automated edge cell removal as well as cells of extreme areas (<20 and >200 μm 2 ) to account for any possible segmentation errors.The cell identities for analysis were then mapped at random onto the cell masks, according to the number of each cell type identified by clustering for each image.This was repeated to create 100 iterations of randomly organized cell types on the underlying tissue.The interaction between cell types (i.e., the neighbor breakdown by cell type) was compared between these iterations and the original data, to determine if a difference can be identified between the original data and the randomly organized iterations.If differences are detected in the original data compared to a 90% threshold of the random iterations, then a significant difference is listed for that cell type for that image.These positive, neutral, and negative interactions were then collated to create the overall proportion heatmap for the condition (i.e.pathology, region, etc.) ranging from 1 (100% of images showed positive interaction) to À1 (100% of images showed negative interaction).A cluster "occupancy" cut-off percentage value of 0.01 was used for all analyses; however, this was unimportant as all final consensus clusters were present in all 24 ROIs.

| Ilastik-derived nuclear probability maps and a single "pan" membrane signal provides the optimal segmentation of single cells in human tonsil
We began by generating an IMC data set using TMA tissue from serially sectioned FFPE human tonsil tissue that had been stained with a panel of 27 metal tagged antibodies targeting well-characterized cell types/states as detailed in Table S1 over 12 temporally distinct batches (staining and acquisition).In each case, ROI selection was designed to capture as much structure as possible, including lymphoid follicles, germinal centers, and epithelium in order to provide high likelihood of positive staining for all 27 markers in all ROIs selected (see Figure S1A,B).Tonsil tissue was also selected for its dense cellularity in order to present a genuine challenge to segmentation but with clear a priori knowledge concerning what cell types our panel should find and where.It should be noted that while EPCAM is not biologically expressed across all cells in the Tonsil, due to some degree of non-specific staining, it was judged to show the most uniform and comprehensive ability to identify cell membranes across the entire tonsil tissue, far superior to using combination of tonsil-specific markers (see Figure S1C).Figure 2A shows a representative ROI from batch 3 with sequential composite images to mark distinct cell types such as T cells (CD3+), B cells (CD79a+), and macrophages (CD68+).
Ki67 was included to help denote follicles/germinal centers by virtue of a proliferative signature.The selection of these markers was deliberate as CD3, CD79a, and CD68 should all be mutually exclusive and not co-expressed by any single cell.Moreover, the spatial location/ segregation of several populations that our panel was designed to identify should follow a well-established pattern.As such this provided us a qualitative way to assess the potential signal overlap in each ROI that would likely be due to the dense cellular nature of the human tonsil combined with the lack of Z plane information afforded by IMC (see Figure S2 for CD3/CD79a/DNA composite images for all 24 ROIs).We next sought to test our different segmentation approaches on these images to determine which was optimal.To do this, we constructed probability maps ( p-maps) using the Random Forest pixel classifier within Ilastik using only the DNA (Iridium 193) channel image from either our tonsil TMA tissue or from an embedded "irrelevant" suspension cell line (Vero cells) using two pixel classes: "nuclear" and "background".Figure 2B (upper panel) shows the Vero and Tonsil-derived probability maps ( p-maps).The final step of our segmentation approach was to use the nuclear objects derived from the p-maps as "seeds" to anchor a marker-controlled watershed approach to expand out and delineate the boundaries for each single cell.In this case we compared the use of a single-membrane signal (EPCAM) for both models versus using the sum of several membrane signals (tonsil p-map model only) and the cell segmentation boundaries for the same representative ROI are shown in Figure 2B (lower panel).As an example of suboptimal segmentation we also used an approach that was not based on an Ilastik machine learning model, but instead directly attempted to segment objects within CellProfiler using the DNA (Iridium 193) channel.The segmentation outputs for all 24 tonsil ROIs derived from the Tonsil Ilastik-EPCAM membrane approach are shown in Figure S3 and the same for the Nucleus-only model in Figure S4.To provide some quantifiable metric to assess each approach we plotted the intensity of CD3 versus CD79a and looked for double-positive (DP) "nonsense cells" and included the total number of objects identified (Figure 2C).For the representative ROI shown in Figure 2B, we noted that the Vero cell p-map identified fewer objects than the Tonsil-derived p-map (5039 vs. 5839) with the nucleus-only approach identifying far fewer (3551).Moreover, the frequency of CD3/CD79a DP cells was also similar regardless of the p-map model (tonsil vs. Vero) or the approach used to delineate cell boundaries (EPCAM alone versus a multimarker signal approach) with $21% of events within the gates.There was, however, an increase in the frequency of DP cells in the nucleus-only plot ($29%) but it was surprisingly modest considering the gross under segmentation using this method.Bivariate plots of CD3 versus CD79a on all segmented objects are shown for the Tonsil Ilastik-EPCAM membrane approach in Figure S5 and for the Nucleus-only approach in Figure S6 for comparison and again show that there was minimal impact on the percentage of DP cells as a result of clearly suboptimal segmentation.To provide some basic spatial context we also plotted the x and y centroid values for each segmented object colored by membership of each gate (B cells, T cells, and DP cells, see figure 2D).These data showed very little differences in the arrangement of CD3+, CD79a+, and DP cells between segmentation approaches but did highlight the fact that without the use of a p-map approach, the cells were grossly under segmented.Collectively, these data suggested that an effective, yet straightforward approach to cell segmentation is the use of random forest pixel classifier trained on the same or similar sample/tissue type with a single widely expressed membrane marker to delineate cell boundaries.We hypothesized that more profound differences between segmentation methods may be revealed by clustering, however, before moving to this stage, we needed to optimize other elements of the data set.
3.2 | IMC data structure and batch effect removal benefits from optimal parameter transformation, Zscore normalization and optimal dimensionality reduction approaches Having determined that the optimal cell segmentation approach we tested used the Tonsil Ilastik p-map combined with watershed detection of the EPCAM membrane boundary, we next wanted to determine the optimal parameters for transformation, batch effect normalization and multiparameter data visualization.We began by evaluating the most suitable cofactor for arcsinh transformation of the metal signal parameters.Single-cell data variance increases with parameter value meaning that distances at higher (positive) values are less significant than distance from lower (negative) values.This is not suitable for dimensional reduction or clustering algorithms as most assume distances are of equal importance/weight.It is therefore essential to use special scaling formulas to stabilize variance.One of the most effective and widely utilized approaches is the hyperbolic arcsine (arcsinh) transformation [27].It is widely used in fluorescencebased flow cytometry and suspension-based mass cytometry [28].
The choice of cofactor has a profound influence on the post-transformation data structure and values of between 100 and 150 have been recommended and are widely used for fluorescencebased detection whereas a lower value of 5 is routinely used for mass cytometry in suspension.To our knowledge, however, there have been minimal attempts to empirically prove why these values have been used in either technology [29] or, importantly, any attempts to determine what co factor is optimal for IMC data.Values of 5 have been used to simply mirror suspension based mass cytometry [9] or values of 5-15 have been proposed [30].To this end, we performed a titration of arcsinh c.f. values spanning a range from 150 to 0.1 and used the "fisher discrimination ratio" (Rd), also known as the "linear discriminate analysis" (LDA) [31] to determine the statistical separation between a gated low and high signal distribution (Figure 3A) and then created UMAP plots from each c.f. values parameter set.By plotting the arcsinh c.f. versus the Rd value we were able to empirically determine that a value of "1" was optimal for achieving the maximal resolution of IMC-derived metal signal parameters (see Figure 3B).This was the same for all 28 metal isotope parameters in the panel.
We next sought to address the issues of batch effect normalization.While every attempt was made to eliminate and control batch effect by using the same donor tonsil tissue across all batches, the same lot of conjugated antibodies, the same person carrying out the staining protocols and a well maintained/QCe'd Hyperion instrument, the nature of working with FFPE tissues often generates significant variation.We began by firstly assessing whether there were batch effects in our data set that could influence the data structure and thus any biological interpretation.By plotting all 109,535 cells derived from the Tonsil-EPCAM segmentation model as a UMAP, we could see that our 28 arcsinh c.f. 1 transformed metal signal parameters (27 antibodies plus iridium) gave us very well structured data.
However, when we introduced coloration to these events based on batch membership, we could see that the majority of the data structure came from batch effects rather than true underlying biology (see Figure 3C upper panels).To attempt to correct for batch effect we tried a number of approaches such as "Batchelor" [32], "Harmony" [33], and "Seurat" [34]; however, we found that they were not easily compatible with our data files.As such, we found that Z-score normalization of the arcsinh c.f. 1 transformed metal signal parameters to be most effective (Z-score normalization is available in the FCS Express pipelines feature).When we created a UMAP plot using the Z-score normalized version for all 28 of the metal signal parameters, while the global data structure did collapse somewhat, coloration of each event by batch membership revealed an almost complete removal of batch effect (Figure 3C lower panels).We also tested the method of "0-1 scaling" as described by Ashhurst et al. [30] but this did not eliminate the batch effect in our data (see Figure S7).
Having established the optimal transformation c.f. and normalized for batch effects, we next wanted to determine if UMAP was indeed the optimal algorithm for presenting the underlying structure of IMC data.To this end, we assessed five different dimensionality reduction methods in all cases uses the recommended hyper-parameter settings (see Figure 3D upper panel); UMAP (as described previously), fltSNE [35], tSNE [6], PacMap [36], and Tri-Map [37].Typically, tSNE is widely to visualize IMC data, however, it is often the case that it projects very little data structure.There is some argument that UMAP performs better for data with parameters that are poorly resolved and does a better job of projecting both local and global data structures [38].Our data support this concept as tSNE representation of our IMC data lacked any discernible structure and moreover, densitybased overlay of fiducial phenotyping markers such as CD3 (Figure 3D, middle panel) and CD68 (Figure 3D, lower panel) showed very little focus of events expressing these markers in defined areas of the map.The fltSNE algorithm performed as poorly as tSNE with triMAP giving by far the most suboptimal results.Interestingly though, PacMap performed very well and gave better data structure than UMAP in our hands, with very clear islands with mapping of the fiducial markers to defined areas.As such we decided to use PacMap to visualize our IMC data using the arcsinh c.f. 1 and subsequent Z-score normalized metal parameter feature set.

| Suboptimal segmentation has a detrimental impact on the ability to confidently identify all expected tonsil-resident phenotypes using clustering approaches
Although the segmentation approach did not seem to create overtly inferior or superior single-cell-level data outputs as judged by our simplistic CD3 and CD79a DP "nonsense" cell frequency analysis (Figure 2B, S5, and S6), we wanted to assess whether clustering and cell type identification would be more affected.To this end we used the FLOWSOM clustering algorithm [39] to partition the single cells into initially 100 SOMs (clusters) based on "similarity" over the 27 antibody-derived metal signal parameters.We used the arcsinh c.
f.1 and Z-score normalized versions as previously reasoned.Figure 4A shows the 100 SOMs for the output of the 109,535 single-cell objects generated by the "Tonsil Ilastik-EPCAM membrane" segmentation approach in the form of a radial spanning tree with the mean expression of the fiducial markers CD79a and CD3 used as the radial statistic for each of the three plots respectively.In Figure 4B we present the same visualizations but this time using the 84,268 single-cell objects derived from the "Nucleus-only" segmentation approach.A qualitative comparison between the two sets of data suggests that the suboptimal segmentation output (Nucleus-only model) leads to more SOMs (clusters) that seem to have higher expression of CD3 and CD79a but also less radial spanning structure compared to the "Tonsil Ilastik-EPCAM membrane" model SOMS.To further investigate these potential differences we compressed the 100 SOMs to 30 cSOMs using the standard hierarchal approach [39].We then sought to annotate the clusters based on the heat map outputs and marker expression pattern on a per-SOM, per-marker basis using heat maps.Figure 4C shows that the majority of "Tonsil Ilastik-EPCAM membrane" model consensus SOMs could be assigned a biological identity (27 out of 30) using expert a priori knowledge whereas for the 30 cSOMs derived from the "Nucleus-only" segmentation model we were only able to confidently assign identities to 23 (Figure 4D).Interestingly, we also noted a reduction in T-cell and macrophage cluster heterogeneity in the data derived from the "Nucleus-only" segmentation model with no evidence of naïve CD8 T cells or mature macrophages as well as an over-clustering of B cells.Overall these data collectively suggested that suboptimal segmentation did have a negative impact on phenotypic identification based on clustering approaches where all "n dimensions" are considered.While FLOW-SOM has been widely used for suspension cytometry data analysis, we are not aware of a study using it for IMC data analysis.IMC data clustering tends to be done using the Phenograph algorithm [9,23,40,41].As such we wanted to also test this approach on our "Tonsil Ilastik-EPCAM membrane" data set.Again we used the optimal arcsinh c.f. 1 transformed, Z-score normalized metal signal parameter feature set and selected a "k nearest neighbor" value of 17 to generate a similar number of Louvain communities (clusters) to our FLOWSOM consensus approach (30 clusters).Figure 4E shows the Phenograph output as a heat map with attempts to assign cell identities to the clusters.In this case we could only confidentially annotate 18 out of the 30 clusters and as a result several populations were totally absent compared to the equivalent FLOWSOM approach (Figure 4C).It was also of note that Phenograph found several non-classified (NC) clusters that were of very low frequency also suggesting a suboptimal performance compared to FLOWSOM as our panel was not designed to find any rare cell types in tonsil.

| FLOWSOM clustering combined with expert cluster merging is able to identify cell types/states with high spatial accuracy
Having established the optimal clustering approach for correctly transformed and batch normalized IMC data we wanted to further refine our clusters in terms of biological meaning.Several of the annotated cSOMs from the FLOWSOM approach were still phenotypically identical to one another and thus were unlikely to represent truly unique cell types or states.We also wanted to combine out final annotated cSOMS with the use of PacMap dimensionality reduction.
To this end, we manually merged any of the 30 cSOMs from the heat map shown in Figure 4C based on highly similar marker expression patterns.This left us with 21 unique clusters, all of which could be biologically annotated with a high degree of confidence apart from one cluster of cells with high CD56 expression present as a majority in a single image (see Figure S2). Figure 5A shows the heat map of the 21 manually merged cSOMS with biological annotations and the relative frequencies of each.The clusters were also mapped back on to the same PacMap plot constructed from all 109,535 cells as shown in Figure 2B.The follicular B and T cells formed a distinct structure as did the non-follicular immune cells and the macrophages/structural cells (endothelium and epithelium).The real power of IMC and other high parameter tissue imaging approaches is that spatial context of all cell phenotypes/clusters can be mapped back in to the tissue space.
We chose human tonsil, the antibody panel, and the specific ROIs precisely as they should identify well-known cell types that also possess well-defined spatial co-ordinates with respect to anatomical structures but also in relation to one another.To validate our final 21 manually merged cSOMs, we used the fact that each of the 109,535 cell objects identified by Tonsil-EPCAM segmentation within the 24 ROIs retained their x and y centroid co-ordinates as part of the FCS file creation (see Section 2 and Supplemental Methods).This meant we could simply plot the X and Y centroid features for any ROI as bivariate dot F I G U R E 3 Optimization of data scaling co-factors, batch effect correction and dimensionality reduction for imaging mass cytometry (IMC) data analysis.(A) The impact of arcsinh cofactor (c.f.) values on parameter/channel resolution.Left panel shows histograms of CD3 expression intensity derived from segmented single cells within the human tonsil tissue with decreasing arcsinh c.f. values down the rows (100, 1, and 0.1).The right panels show the same analysis for CD79a expression intensity levels."Negative" and "positive" gates are set on each plot to derive the population statistics (median and rSD) required to calculate the "Fisher ratio" (Rd) resolution metric (see Section 2).(B) The graph shows the relationship between Rd (y-axis) as a function of arcsinh c.f. (x-axis) for CD3 and CD79a.(C) Batch effect in data can be eliminated by correct normalization approaches.UMAP plots of 27 antibody-based parameters of arcsinh c.f. 1 transformed data only (upper panels) and arcsinh c. plot and color by the selected cSOMs.Figure 5B shows the spatial mapping of six annotated cSOMs from the heat map in 5A for two representative ROIs.Reassuringly, the spatial locations of each cluster followed the expected biological patterns with the follicular T and B cells mapping to follicular structures and the endothelial cells mapping to the inner walls of the vessels/tonsillar crypt.These observations were further verified by the original staining patterns in IMC images (Figure 5C) with cells in the follicles Ki67 positive as they are undergoing aggressive proliferation.As a final level of validation, we mapped all 21 clusters using unique colors on to the cell object maps derived from the segmentation pipeline (Figure 5D).These were also in agreement with the expected spatial patterns of locations.The colored cluster maps for all 24 ROIs are shown in Figure S8.Overall, the combination of manual merging of FLOWSOM-derived cSOMs, Pac-Map visualization, and validation by spatial mapping confirmed our analysis approach to be optimal and accurate with respect to our panel and tonsil tissue.

| The choice of pixel expansion approach combined with removal of edge cells has a negative impact on neighborhood mapping
Having established that our analysis approach could reliably identify cell types and states in tonsil tissue with high accuracy both phenotypically and spatially, we wanted to use these data to benchmark our neighborhood analysis.Our method was based on the previously described approach from HistoCat [9] and is based on a defined pixel outgrowth that creates a bounding box in which significance of interaction or avoidance is tested using a permutations-based approach with a significance cut off (typically 100 permutations and a 10% cut off).There is also a threshold parameter that can filter out clusters that only appear in a certain percentage of the images/ROIs (see Figure S9A).A threshold of 0.1% means that clusters have to be present in over 10% of all ROIs to be considered in the neighborhood analysis.This was not a function relevant to our data set, however, as all final 21 cSOMs were present in all 24 ROIs (see Figure S9B).We would also caution against using this feature as it could lead to removal of a key, biologically defining cluster from one sample group in a large batch analysis.A further important consideration is the removal of any partial and fragmented cells around the edge of the image as well as size-based filter for removing under and over-segmented objects.We set a gate on our data that would ignore cells/objects less than 20 μm 2 (5 μm 2 diameter) and more than 200 μm 2 (16 μm 2 diameter) as shown in Figure S9C.We could then compare the outcomes and results of using this filtering method with edge cell removal versus analyzing all objects.First, we wanted to compare the results of conducting a spatial analysis on a defined, well characterized ROI (ROI 23 in this example) using the "bounding box" approach versus a "disc"-based method of pixel outgrowth (see Figure 6A).Our first metric of assessment was the median nearest neighbor number (median NN) versus the pixel outgrowth value (see Figure 6B).As expected for all approaches, increasing the pixel distances led to an increase in the median NN with the largest values coming from the original bounding box approach.However, based on the physical geometry of the cells in the tonsil tissue we reasoned that a median NN value between 6 and 8 would be indicative of an optimal area for "true" neighborhood analysis.The usual recommended pixel outgrowth for this approach is 5 [9,23]; however, the data in Figure 6B showed that the original bounding box method gave a median NN of $11 cells at this distance.This suggests that the bounding box created was sampling an area greater than the area occupied by the immediate nearest neighboring cells.Both the "disc"-based approaches at 5 pixels gave NN values of 8 regardless of any object filtration suggesting that it was more accurately finding "true" immediate neighbor cells compared to the original "bounding box" approach.Finally, we wanted to generate heat maps for each of the 12 combinations and focus on the accuracy of the interactions and avoidances at the clustering level.We first considered the general "qualitative" appearance of the heat maps with regard to how much red (significant interaction), blue (significant avoidance) and white (indifference) we observed.Figure 6C shows that in qualitative terms, only the disc method with object filtration (first column) produced heat maps over the entire pixel range tested [3][4][5][6][7][8][9][10][11][12][13][14][15] that were not dominated by red (significant interactions), rather the majority of cluster relationships were "indifferent" or random (white).Based on the "ground truth" image of ROI 23 in Figure 6A, the majority of clusters were not interacting with one another suggesting that the disc with object filtering method was more accurate in reflecting the actual spatial arrangement of cells in the tissue.Finally, we focused on the four cell types (clusters) highlighted in Figure 6A, namely follicular B cells, memory CD4 T cells, memory CD8 T cells and B cells and looked at the heat maps for each up to a 10 pixel expansion.The only condition where we noted significant avoidance (blue) between follicular B cells as the central cluster (row) with respect to memory CD4/CD8 T cells and epithelium as well as an acceptable median NN value was the disc approach with filtering and a 5-pixel outgrowth.Overall these data support the idea that this was the optimal approach for finding "true" neighbors.The analysis of IMC data has historically been challenging with limited attempts made to develop accurate, scalable, and accessible solutions.Moreover, approaches tend not to be very accessible and require expert knowledge of programming languages such as R, Python or MATLAB [12,42].This has resulted in significant frustration in the community, contributing to inaccurate single-cell data and unconvincing biological conclusions.There are several issues with existing analysis pipelines, namely the approaches used to segment single cells accurately from low-resolution, often "noisy" data.They often lack any real appraisal of how successful the segmentation is in terms of how well they find known/expected cell types/states in a given tissue.
To this end, the "OPTIMAL" framework for analyzing IMC-derived multiplexed image data provides several recommendations for testing, optimizing and benchmarking key steps in any pipeline.First, we show using the currently most established approach for IMC data segmentation [10] that object identification is improved by the use of an Ilastik-generated probability map constructed using only nuclear and background signal but that the p-map does not necessarily need to be derived from the same cell or tissue type as an embedded cell line (Vero cells) performed comparably.Of note, we only utilized two classes of pixel classification for Ilastik learning (nucleus and background) and not membrane as this reduced the computational burden and showed no benefit over using three pixel-class Ilastick learning to define cell boundaries (Figure S10).When we also looked at the frequency of so called "nonsense" CD3/CD79a DP cells we found a minimal increase as a result of under segmentation in the absence of a p-map input.While this seemed surprising, it likely reflected the fact that T cells and B cells are often in quite distinct anatomical locations F I G U R E 6 Optimization of spatial "neighborhood" analysis of human tonsil tissue reveals the importance of edge cell removal and the method of pixel expansion.(A) A cluster map for ROI 23 (see Figure S8) only showing 5 of the final consensus clusters (see legend) with clear and expected spatial relationships.The teal dotted line denotes the optional removal of edge cells prior to neighborhood analysis.The area of tissue within the solid teal square has been magnified to show the two methods of pixel expansions for finding neighboring cells to the central cell (X); a "bounding box" (BB) approach (i) or a disc approach (ii).(B) A graph showing the relationship between the selected pixel expansion/distance value (x-axis) and the median number of nearest neighbors (NN) for each of the three conditions tested (see legend).(C) Interaction heat maps for the different input options shown in B (columns) versus pixel expansion distance (rows, 3, 5, and 10 pixels).The rows for each heat map denote the central cell cluster (marked as X in (A)) and the columns denote the potential neighboring cell types (clusters).The color of each square in the grid relates to the nature of the spatial relationship with red denoting a significant interaction, blue a significant avoidance and white indifference as per legend (i).The clusters in x and y are denoted by color as per legend (ii).The violet dashed boxes highlight the interactions and avoidances of the follicular B-cell population with the memory CD4 and memory CD8 T-cell populations.All maps were created using significance cut off of 10% and 100 permutations.[Color figure can be viewed at wileyonlinelibrary.com] in human tonsil [43].Perhaps unsurprisingly, however, when we took the data to the clustering stage, we did see dramatic effects on the fidelity and identity of the cell types and states derived from poorly segmented images.As such, we would recommend always assessing any segmentation approaches using a clustering-based metric against a known "ground truth."We saw no measurable benefit to using an approach for cell boundary detection that used the sum of multiple membrane signals over a single-membrane marker such as EPCAM; even considering that EPCAM was labeling membranes in a nonspecific fashion.Moreover, IMC is not an optical imaging technology and has a relatively low resolution (1 μm per pixel, equivalent to around 10Â magnification on an optical imaging system).The ablation laser that also "drills" in to a defined depth of tissue in order to liberate sufficient material to achieve suitable metal ion detection and signal resolution [1].It is therefore highly likely that information from cells located in different z planes are mixed together.As such, one could argue that segmentation will always be flawed to an extent.
Certainly, there are numerous approaches that do not attempt to segment cells but instead work with the pixel level data in the image [44].
To this end, several groups are working on 3D imaging of cleared tissues [45] or by modifying the IMC approach to achieve Z-stack information [46]; however, at present, these techniques either lack the parameter space or throughput.We also tested the deep learningbased segmentation approach CellPose [17,24] as well as an approach that only segmented nuclei and saw no measurable benefits (Figure S10).Nonetheless, we propose that OPTIMAL provides a framework for benchmarking any segmentation approach.
Post-segmentation but prior to any further single-cell analysis using dimensionality reduction and clustering techniques it is essential to apply various transformations and corrections to the data in order to remove noise, background, maximize signal resolution, and remove any batch effects.Any form of semi-quantitative tissue imaging is by nature composed of quite poorly resolved signals due to the fact that we are never measuring a whole cell, unlike flow cytometry or suspension mass cytometry.Moreover, IMC is around fivefold less sensitive than fluorescence-based detection (our unpublished observation).As such, it is imperative to ensure that the resolution of signal is optimized in order to provide the very best overall data structure prior to going in to both dimensionality reduction (DimRedux) and clustering.
As described previously [22], we applied spillover correction to all of the mean pixel values for all metal ion channels.This has been shown to improve data interpretation.We also removed any "hot" pixels by capping at the top and bottom 5% for analogous reasons.However, probably the most important step is the use of data transformations such as the hyperbolic arcsine (arcsinh).Without applying such a transformation, comparatively high parameter values with greater variance will have a lower weighting in any subsequent dimensionality reduction or clustering compared to lower values with greatly reduced variance.Using a very simple approach based on the Fisher discriminatory ratio (also known as the LDA) we used our OPTIMAL approach to determine the best arcsinh cofactor value for IMC data to be "1," not between 5 and 15 as reported by others [30].We show that values greater than or less than 1 do not project the IMC data structure in an optimal fashion.While previous attempts have been made to try and develop frameworks for optimizing parameter transformations for fluorescence-based flow cytometry data [47], to our knowledge OPTIMAL is the first for IMC data.
After parameter transformation, the next important step of data preprocessing is to look for, and if necessary, correct for batch effects.
While we purposefully attempted to minimize and where possible eliminate all sources of batch effect by the same person staining TMAs from the same FFPE human tonsil section with the same panel of 27 metal tagged antibodies on 12 separate occasions and acquiring data on a well maintained and consistently QCe'd Hyperion IMC system, the nature of FFPE tissue analysis remains highly variable.As such it was no surprise that a UMAP-based analysis of our 27 arcsinh c.f.-transformed antibody-metal parameters revealed measurable batch effect in the data structure and presented us with the perfect opportunity to develop an OPTIMAL solution for correction.As with parameter transformation, there has also been a lack of exploration as to what the best approach is for batch effect normalization, and while several approaches exist for cytometry and single-cell data often these are not tested using actual dedicated, empirically generated batch controls.We did test a number of these approaches on our data set, including the "0-1" scale compression proposed by Ashhurst et al. [30] but found that a simple Z-score normalization after arcsinh transformation was sufficient to remove all measurable batch effect from our data without eliminating biological relevance.
Having formulated the OPTIMAL approach for empirically determining the necessary transformations and corrections to achieve the very best resolution from our IMC data we next assessed the suitability of five different dimensionality reduction algorithms to determine which provided the best representation of our data structure.As Dim-Redux approaches are used to present multidimensional single-cell data in a way that can aid interpretation it is essential that the right approach is used.The use of widely available software such as FCS Express of FlowJo that requires little to no knowledge of coding means that using our OPTIMAL approach, researchers can easily explore what DimRedux method is best for their data.To our knowledge, existing analysis methods such as HistoCat and ImaCyte do not offer the same flexibility of choice and ability to also alter the hyper parameters for these algorithms (iteration, seed, neighbors, perplexity etc.).In this case, we found that PacMap performed the best with UMAP a close second.PacMap generated more discrete structures as well as showing improved mapping of fiducial markers back on to these whereas other approaches lacked any discernible structures and fiducial markers were more diffuse in mapping.Of note, PacMap also clearly identified the follicular structures in the tonsil driven by Ki67, CD57, CD3, and CD79a expression and has been proposed to handle weakly resolved signals better than other DimRedux approaches [36], making it highly suitable for IMC data.
The use of clustering approaches to identify cell types and states based on marker expression levels/patterns is well established in single-cell analysis.There are several different approaches and one of the best performing is the FLOWSOM algorithm [7].To date, few if any IMC analysis approaches have reported the use of FLOWSOM for cell type and state identification, but rather have used Phenograph as part of HistoCat or IMaCYte [9,23].Our data were conclusive in that FLOWSOM, in conjugation with the Tonsil-EPCAM segmentation model, arcsinh c.f. 1 parameter transformation and subsequent Zscore normalization could identify the majority of expected cell types and states within the human tonsil.Phenograph performed poorly, missing several expected cell types as well as generating a large number of low-frequency, unidentifiable clusters.While it may be possible to try and optimize the Phenograph algorithm to improve the outputs, in all cases we deliberately used the default hyper-parameters for both clustering algorithms we tested to mainly reflect that we want the OPTIMAL framework to be accessible to non-specialists in data analysis.
Finally, having arrived at a set of cell clusters that we could annotate with phenotypic, functional, and spatial confidence, we wanted to assess whether we could benchmark and optimize the commonly used neighborhood analysis method used in HistoCat [9].We purposely ensured that we selected quite varying areas within the tonsil tissue sections over the 12 batches, to introduce variance at the spatial level (but not at the cell type or marker level).We selected example ROIs where there was clear spatial definition of different cell types so that we could always compare any interaction-based neighborhood analysis with what we could observe to be true in the image.
We also restricted our analysis to only a few cell types such as follicular B cells and memory CD4 and CD8 T cells.We tested a number of different tune-able parameters for detecting immediate neighboring cells from a central cell phenotype and used two metrics to determine which was best; the median NN and the significance of either interaction or avoidance as a heatmap.The median NN values as a function of pixel outgrowth was very interesting as it showed the original script's "bounding box" approach to be including cells that were not true neighbors.We found that a better approach was to use a radial "disc"-based pixel outgrowth and that filtering of edge cells as well as small and large cells from the images was essential to generate the expected interactions and avoidances.Moreover, this was optimal at 5 pixels, as recommended by HistoCat and ImaCytE by default but only when using the "disc"-based outgrowth method.While this approach was simple and informative, we did notice that the structural heterogeneity we purposefully collected in our data set meant that if we created interaction heatmaps from the average of all 24 ROIs, the data were almost un-interpretable (data not shown).As such analysis of structurally heterogeneous tissue may benefit from other spatial analysis methods that the one we tested.
In conclusion, we show using the OPTIMAL approach that methods for segmenting single cells in IMC data can be assessed using well characterized tissues and antibody panels followed by cluster analysis to verify that the expected cell types/states are identified.
However, prior to any clustering analysis, IMC data structure can be optimized by transforming all metal parameters with an arscinh c.f. of 1, and this can be empirically tested using the Rd approach, and also corrected for batch effect using a an additional Z-score normalization.
We also found that PacMap was the best dimensionality reduction approach for visualizing IMC data and that FLOWSOM was the best performing closeting algorithm.Finally, we show that the OPTIMAL approach for conducting neighborhood analysis of the resident cell types/states is to use a "disc"-based radial pixel outgrowth rather than a "bounding box approach".We have further validated and utilized the OPTIMAL approach to analyze several other tissues using similar and distinct panels of antibodies to the ones used in this study.

F
I G U R E 2 Assessment of segmentation approaches for accurate single-cell identification in complex tissues using a 28 parameter (27 antibody) imaging mass cytometry (IMC) panel on human FFPE tonsil.(A) Multi-parameter pseudo-colored images from a representative human tonsil ROI with well-separated B-and T-cell areas.The first image column (left to right) shows DNA staining with iridium (red pseudo color), the next column images include CD79a as an overlay (green) with iridium (red), the next shows CD3 (blue) overlaid with iridium (red).The next set of images combine the CD3 (blue), CD79a (green), and iridium (red) as a triple overlay.The final image panel shows the addition of two further parameters, CD68 (teal) and ki67 (yellow).(B) Segmentation maps for the four different segmentation models tested in this study showing the same region of interest (ROI) as in (A).The upper panels show (where used) the probability map outputs from the indicated Ilastik model (derived from either the same tonsil data set or from Vero cells).The lower panels show the segmentation boundaries generated using CellProfiler alone (far left image) or from the indicated Ilastik p-map.Various approaches to delineate the cell boundary are indicated and include using a single-membrane signal (EPCAM) or a combination of multiple markers (multisignal).(C) Bivariate single-cell-level intensity plots for each of the four segmentation approaches shown in (A) and (B) with CD3 intensity displayed on the x-axis and CD79a intensity displayed on the y-axis.In both cases the arcsinh c.f. 1 transformed, Z-score normalized values have been used.Gates have been set to quantify the percentage of cells that express CD3 or CD79a alone as well as biologically impossible double-positive (DP) cells that may indicate a failure in accurate segmentation.The total number of single-cell events is also shown on each plot.(D) x/y cell centroid maps of the same tonsil ROI in (A) to (C) colored by the gated population shown in (C) for each of the four individual segmentation approaches colored as indicated in the legend.[Color figure can be viewed at wileyonlinelibrary.com] f. 1 transformed, Z-score normalized data (lower panels) showing all 109,535 single cells.From left to right, the first UMAP plots are colored by cell density.The middle UMAP plots are standard black and white dot plots and the third UMAP plots are colored by batch (see key).(D) The choice of dimensional reduction algorithm impacts on the representation and interpretation of underlying IMC data structures.The indicated dimensionality reduction algorithms were run on the same 27 arcsinh c.f. 1 Z-score normalized parameters as in (B).The upper row shows general cell density, the middle row is density weighted by CD79a expression and the lower panel by CD3 expression.[Color figure can be viewed at wileyonlinelibrary.com]F I G U R E 4 FLOWSOM clustering performs better on imaging mass cytometry (IMC) data than Phenograph but is affected by suboptimal segmentation.(A) Radial spanning trees of the original 100 SOMs generated by the FLOWSOM clustering algorithm from single-cell outputs generated by the "Tonsil EPCAM" segmentation model with the mean expression of the "fiducial" markers CD79a and CD3 used as the radial statistics as indicated.(B) The same plots as in A but for the output of FLOWSOM clustering on the single-cell data from the "Nucleus-only" segmentation model.(C) Heat map of the 30 consensus cluster (SOMs) derived from the original 100 SOMs for the "EPCAM" model.The frequency of each cluster is indicated by the bar chart below each column (cluster).Specific as well as broad cluster annotations are provided for T cells, B cells, macrophages (Macs), endothelial cells (Endo), and epithelial cells (Epi).Where a cluster could not be confidently identified, they were labeled "NC" (not classified).The heat map is showing the median of the arcsinh c.f. 1 transformed, Z-score normalized 27 antibody marker signals as indicated on the y-axis (rows) and have been further normalized by column value (by cluster).(D) An analogous heat map as shown in (C) but for the 30 consensus cluster (SOMs) derived from the original 100 SOMs for the "nucleus-only" segmentation model.(E) A heat map as shown in (C) but for the 30 Louvain communities (clusters) derived from analyzing the single-cell outputs generated by the "Tonsil-EPCAM" segmentation model using the Phenograph clustering algorithm with a "K-nearest neighbor" value of 17 (see Section 2).[Color figure can be viewed at wileyonlinelibrary.com]

F I G U R E 5
Cell type/state verification using the original imaging mass cytometry (IMC) images and spatial mapping visualization tools.(A) A heat map (left panel) of the 21 final manually merged populations created from the 30 consensus clusters for the "EPCAM" segmentation model shown in Figure 3C.Cluster frequencies are shown by the colored bar charts below each cluster column and the map intensity has been derived from the arcsinh c.f. 1 transformed, Z-score normalized markers as indicated in each row with further normalization down each column (by cluster).A PacMap dimensionality reduction plot (right panel) of the 109,535 segmented single cells from all 24 tonsil ROIs across the 12 staining batches as shown in Figure 2C but now colored by the final 21 FLOWSOM clusters as per legend.(B) x/y centroid maps for two representative tonsil ROIs with six different unique cell (see legend) clusters displayed.(C) Pseudo-colored IMC images of the same representative tonsil ROIs as in (B) showing six fiducial stains as indicated in the legend (nuclear plus five antibodies) that support classification of the cell types in (A) and (B).(D) Cluster maps of the same two representative tonsil ROIs as shown in (B) and (C) with all 21 final consensus clusters shown (see legend).[Color figure can be viewed at wileyonlinelibrary.com] These include COVID-19 postmortem lung tissue (manuscript submitted), gut tissue from various inflammatory conditions (manuscript in preparation) and inflamed synovial tissue from rheumatoid arthritis patients (manuscript in preparation).Furthermore, the OPTIMAL framework has been used to analyze data from other multiplexed tissue imaging technologies that are fluorescence-based such as the Miltenyi MACSima with a high degree of success.As stated previously, we do not describe OPTIMAL as a new pipeline per se for analyzing IMC and other multiplexed imaging technology data sets, but we do argue that it offers a framework for assessing, optimizing and benchmarking existing and future approaches.AUTHOR CONTRIBUTIONS Andrew Filby: Conceptualization; investigation; funding acquisition; writing-original draft; methodology; supervision; project administration.Bethany Hunter: Conceptualization; investigation; methodology; writing-original draft; writing-review and editing; formal analysis.Ioana Nicorescu: Writing-original draft.Emma Foster: Methodology; writing-review and editing.David McDonald: Conceptualization; investigation; writing-original draft; methodology; formal analysis; supervision.Gillian Hulme: Methodology.Andrew Fuller: Methodology.Amanda Thomson: Methodology; writing-original draft.Thibaut Goldsborough: Methodology.Catharien M. U. Hilkens: Writingreview and editing.Joaquim Majo: Data curation; writing-review and editing.Luke Milross: Writing-review and editing; data curation.Andrew Fisher: Writing-review and editing; supervision.Peter Bankhead: Methodology; writing-review and editing.John Wills: Methodology; software; formal analysis; writing-review and editing.Paul Rees: Writing-review and editing; software; formal analysis; methodology.George Merces: Methodology; software; formal analysis; data curation.ACKNOWLEDGMENTS This work was funded by UK Research and Innovations / NIHR UK Coronavirus Immunology Consortium (UK-CIC; MR/V028448) and the European Union's Horizon 2020 research and innovation programme under grant agreement No 860003, and the JGW Patterson Foundation.This work was also supported by the United Kingdom Research and Innovation (grant EP/S02431X/1), UKRI Centre for Doctoral Training in Biomedical AI at the University of Edinburgh, School of Informatics.For the purpose of open access, the author has applied a creative commons attribution (CC BY) licence to any author accepted manuscript version arising We would like to thank Jennifer Doyle and Saskia Bos for critical review of the methods and fellow members of the Newcastle University Flow Cytometry Core Facility and Bio-Imaging Unit for advice and support as well as colleagues from Novopath for creating the TMAs used in this study.Finally we