Grayscale representation of infrared microscopy images by extended multiplicative signal correction for registration with histological images

Fourier‐transform infrared (FTIR) microspectroscopy is rounding the corner to become a label‐free routine method for cancer diagnosis. In order to build infrared‐spectral based classifiers, infrared images need to be registered with Hematoxylin and Eosin (H&E) stained histological images. While FTIR images have a deep spectral domain with thousands of channels carrying chemical and scatter information, the H&E images have only three color channels for each pixel and carry mainly morphological information. Therefore, image representations of infrared images are needed that match the morphological information in H&E images. In this paper, we propose a novel approach for representation of FTIR images based on extended multiplicative signal correction, highlighting morphological features that were shown to correlate well with morphological information in H&E images. Based on the obtained representations, we developed a strategy for global‐to‐local image registration for FTIR images and H&E stained histological images of parallel tissue sections.


| INTRODUCTION
Infrared imaging is used today by a wide vibrational spectroscopic community for medical diagnosis, in principle for all types of cancer, and is rounding the corner to becoming part of clinical routine analysis. Infrared imaging techniques allow biochemical investigations of intact cell materials and tissues [1][2][3][4][5][6], and the establishment of classifiers for the discrimination between healthy and diseased tissue via chemical spectral fingerprints. In order to build classifiers based on the chemical information inherent in infrared images, for example, for cancer diagnosis, it is necessary to establish a ground truth. Today, the ground truth is given by a pathologist's histopathological analysis of, for example, H&E stained images of microscopic sections. The same or parallel (that is, adjacent) tissue sections are used for infrared imaging in addition to light microscope imaging [4]. In both cases, a registration of the infrared hyperspectral image cube with the microscopy image is necessary if one wants to establish the same ground truth for infrared images.
Image registration implies the spatial alignment of two images. In the image registration process, one image is usually referred to as the fixed or target image, while the other is referred to as the moving or source image [7]. Every position of the fixed image corresponds in theory to some position on the moving image. A transformation between the fixed and the moving image can therefore be established. This transformation may imply operations such as locally or globally rotating and stretching the moving image. The flexibility of the transformation depends on the image registration process. The aim of image registration is to estimate the optimal transformation by optimizing a chosen matching criterion, which measures the degree of alignment of the fixed and the moving image.
While the matching of the infrared and H&E images is sought in the (2D) image domain, the number of variables of infrared and H&E images in every pixel differs. Every pixel of the infrared image represents an infrared spectrum with several hundred absorbance values. For the H&E image, every pixel represents a three-channel vector of R-red, G-green and B-blue intensity values. The process of matching images obtained with different imaging systems is called multimodal registration. For the multimodal registration of infrared hyperspectral images and light microscopy images, special care has to be taken with regard to mutual information in the images. The most important source of information in infrared hyperspectral images is chemical information. The chemical information is caused by absorption of infrared radiation by chemical bonds in the biological material that forms the tissue, and is not necessarily related to the morphological information obtained in light microscopy images. However, it is well-known that the morphology of tissue samples introduces strong scatter distortions to infrared images [2,[8][9][10][11]. Thus, the information from scatter distortions could be used for obtaining a grayscale representation of infrared images.
In the literature, two strategies for image registration of infrared images with H&E images have been suggested [12,13] in the context of which various approaches for image representation have been proposed. In Reference [12], two approaches are considered to obtain segmentation masks: (a) presegmentation based on k-means clustering of both H&E and infrared images and (b) binary segmentation by separating background and tissue pixels. An optimization of the mutual information metric was employed for both methods. In Reference [13], H&E images were converted to grayscale by obtaining the luminosity component from the RGB color space using a linear formula, while two different approaches were tested for obtaining grayscale representations of infrared images: (a) SD of spectral values, and (b) computation of spectral features such as the integral of the amide I region or the height and the width of certain preselected peaks. The approaches differ considerably in the type of information they use. However, all image representations for FTIR images suggested for image registration involve mainly chemical information. At the same time, models for estimating and separating chemical and scatter features in FTIR images are readily available [2,14]. Scatter distortions have been considered as a major obstacle in the interpretation and analysis of infrared hyperspectral microscopy images and approaches for the estimation of scatter parameters have been developed [2,[8][9][10][11]. While in most cases the aim of the scatter correction is to remove the scatter variations in order to achieve interpretable pure absorbance spectra containing only chemical absorption features, it is well known that scattering can also be considered as a source of information. It has been shown that scattering is directly related to physical properties of the sample, such as morphology and refractive index [2,[8][9][10][11]15]. 
Since physical sample properties also determine to a high degree the information in microscopy images obtained in the visible range, we wanted to evaluate if physical parameters of FTIR images can be used for multimodal image registration. A frequently used and stable approach for estimating physical parameters in FTIR spectra is extended multiplicative signal correction (EMSC) [14]. EMSC is a model-based approach that allows retrieving pure absorbance spectra, while in parallel parameters that are related to scattering and physical properties of the sample are calculated [2,16]. It has been demonstrated that physical information obtained by EMSC modeling can be used to improve classifiers when combined with chemical information from FTIR spectra [17]. EMSC parameters have been further used for infrared hyperspectral image segmentation and for establishing spectral quality test based on physical information in infrared spectra [16].
The aim of the current paper was to develop a robust approach based on FTIR image representation by EMSC scatter parameters. Using this image representation, we established an intensity-based registration framework with a multiresolution scheme for global and local transformations and mutual information as the similarity measure. Special attention was paid to the impact of different optimization methods on the quality of the registration, as well as the reproducibility of experiments and H&E-to-grayscale transformations. Existing image registration techniques for FTIR and H&E are compared to the one proposed in this paper.

| Data set
The data set used in this study was provided by the biospectroscopy group (Center for Protein Diagnostics [Prodi], Department of Biophysics, Ruhr University Bochum, Germany). The samples were randomly chosen from previous studies performed at the Department of Biophysics [4,6,18]. In total, 13 pairs of FTIR and corresponding H&E microscopic images of different types of tissues (10 lung, 2 colon and 1 bladder) were used for the study. All samples were collected during surgery and fixed as described in the previous publications. All thin tissue slices were deposited on LowE slides (Kevley, Chesterland, Ohio). For more information, please refer to [4,6,18].

| Histological staining
The same tissue samples (#1-8 and #11) were stained with Hematoxylin and Eosin (H&E) after FTIR spectroscopic measurements. The use of the same samples, as opposed to using adjacent sections, leads to a more precise overlay between the spectral image and the classical stained image. For the other four tissue sections (#9-10 and #12-13) an adjacent section was used for staining. For staining, the tissue samples were washed with Milli-Q water, stained for 50 seconds with Harris Hematoxylin (VWR, Germany), washed with water, counterstained with eosin (Merck, Germany), dehydrated with increasing gradients of alcohol, and mounted with Euparal (ROTH, Germany). The stained sections were imaged automatically with an Olympus BX43 microscope with a 10× objective.
The resolution of all H&E images presented here was reduced due to the size limitations in the data format and the microscope operation software used in the study. Therefore, the pixel size does not correspond to the expected size for a 10× objective. An overview of the images used in this paper is given in Table 1.

| FTIR imaging
FTIR imaging was done in transflection (reflection-absorption) on LowE slides, as described previously in detail in [4,6,18]. The measurements were performed using two Agilent (Santa Clara, California) Cary 620 infrared microscopes equipped with a 128 × 128 pixel liquid nitrogen-cooled mercury cadmium telluride (MCT) focal plane array detector. Spectra were collected between 3700 and 950 cm−1 at a spectral resolution of 4 cm−1. One hundred and twenty-eight scans were co-added for sample and background spectra. The mapped pixel resolution is approximately 5.5 μm, so the tissue sampling area is nearly 715 × 715 μm for each FPA field. The instrument and the microscope chamber of the instrument were continuously purged with dry air to avoid spectral contributions of atmospheric water. Furthermore, a liquid nitrogen cooling supply was installed (Norhof, Maarssen, Netherlands) at both systems enabling constant measurements 24 hours a day, 7 days a week. The Fourier transform was done using Mertz phase correction and Blackman-Harris 4-term apodization. The measurements were taken in the mosaic mode of the Agilent software. Individual mosaic tiles, each measuring 128 × 128 pixels, were stitched automatically after measurement. Each raw spectral vector consisted of 1428 data points (resolution 4 cm−1, zero-filling 2, upper limit 5266 cm−1). The stitching for both instrument data sets was performed in Matlab (MathWorks, Natick, Massachusetts).

| Proposed image registration framework
The aim of this study was to develop a robust approach for obtaining grayscale representations of FTIR images that can be used for image registration, based on EMSC scatter parameters. In this study, a five-step approach for FTIR and H&E image registration is proposed. Figure 1 outlines the main steps and ideas of our proposed image registration framework.
In the first step, grayscale representations are established from both FTIR and H&E images. For the image registration, it is crucial to establish grayscale images from FTIR images with similar local contrast as the grayscale images of H&E images by using the EMSC multiplicative b parameter from Equation (5). The H&E image is converted to grayscale by computing the luminance component from Equation (1).
In the second step, manual preregistration is employed. Four visually identified and selected landmark points in both images are used for the estimation of an affine transformation T 1 [19]. In this step, only a coarse transformation is determined, which is used as an initial transformation for the optimizer-based registration method in the following step. The initial manual registration step is preferred since the spatial resolution, dimensions and morphology of the images differ strongly. This makes a fully automatic and robust initialization challenging and non-trivial.
In the third step, the parameters of the other affine transformation T 2 are estimated by maximizing the mutual information of the fixed image I f and the transformed moving image I m ∘ T 2 ∘ T 1 , using gradient descent or conjugate gradient descent optimization. The multiresolution approach is established to speed up the optimization process, and to make it more robust [20]. In this step, the images are aligned only linearly.
In the fourth step, the parameters of the cubic B-splines transformation T 3 are estimated by maximizing the mutual information of I f and I m ∘ T 3 ∘ T 2 ∘ T 1 images using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization method with the multiresolution approach. Free form deformation is established for T 3 such that the moving image is deformed by the influence of the control points. In this step, local deformations of tissues are estimated, which are often present in, but not limited to, parallel sections.
In the last step, the composed transformation T 3 ∘ T 2 ∘ T 1 is applied to the indexes of the fixed image to obtain a warped moving image and align it to the fixed one. The measure of the registration quality is the target registration error (TRE), which measures the geometric accuracy between the landmarks in the moving image and the warped landmarks of the fixed image in the moving image domain.
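For the affine parts of the pipeline, the composition T 2 ∘ T 1 and its application to fixed-image coordinates can be sketched in a few lines of NumPy. The transform values below are toy numbers, not the paper's estimates, and the B-spline component T 3 is omitted:

```python
import numpy as np

def affine(theta_deg=0.0, scale=(1.0, 1.0), shear=0.0, t=(0.0, 0.0)):
    """Build a 3x3 homogeneous 2D affine matrix (translation, rotation,
    shear, nonuniform scaling), composed as T @ R @ Sh @ S."""
    th = np.deg2rad(theta_deg)
    R = np.array([[np.cos(th), -np.sin(th), 0.0],
                  [np.sin(th),  np.cos(th), 0.0],
                  [0.0, 0.0, 1.0]])
    S = np.diag([scale[0], scale[1], 1.0])
    Sh = np.array([[1.0, shear, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    T = np.array([[1.0, 0.0, t[0]], [0.0, 1.0, t[1]], [0.0, 0.0, 1.0]])
    return T @ R @ Sh @ S

# T1: coarse manual pre-registration; T2: optimizer refinement (toy values)
T1 = affine(theta_deg=90.0, t=(10.0, 0.0))
T2 = affine(scale=(1.1, 1.1))
T21 = T2 @ T1            # composed transformation T2 o T1

# warp a fixed-image pixel coordinate into the moving-image domain
x = np.array([2.0, 3.0, 1.0])   # homogeneous coordinate (x, y, 1)
y = T21 @ x
```

Homogeneous coordinates make the composition a plain matrix product, which is why the composed transformation can be applied to all fixed-image indexes in a single step.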

| H&E stained images
The H&E staining protocol and light-field imaging are routine in histopathology. Hematoxylin stains nucleic acids with a deep blue-purple hue, while eosin stains proteins nonspecifically with a pink color [21]. The histopathologically stained image obtained from the light microscope has a higher spatial resolution compared to the FTIR image, while the chemical information is contained in only three channels-RGB. Two approaches for obtaining grayscale representations from microscopic three-channel images were explored: (1) obtaining the luminance component using the linear Equation (1), and (2) estimating a total concentration of H&E dyes using different stain deconvolution methods [22,23].

First approach for image representation of H&E images
The H&E stained image is converted from RGB to grayscale by computing the luminance Y as a linear combination of the color channels according to Equation (1) [24,25], where R, G and B represent the red, green and blue channels of the original H&E image. The luminance Y represents human brightness perception and is connected to the amount of absorbed light due to H&E stains. We denote the resulting grayscale image as the luminance image.
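A minimal sketch of the luminance conversion; the Rec. 601 weights used here are an assumption made for illustration, as the exact coefficients of Equation (1) are not restated in this section:

```python
import numpy as np

# Rec. 601 luma weights -- an assumption; the coefficients of the paper's
# Equation (1) may differ in the original implementation.
W = np.array([0.299, 0.587, 0.114])

def rgb_to_luminance(rgb):
    """Convert an (H, W, 3) RGB image to a single-channel luminance image Y
    as a per-pixel weighted sum of the R, G and B channels."""
    return rgb @ W

img = np.zeros((2, 2, 3))
img[0, 0] = [1.0, 1.0, 1.0]    # white pixel: the weights sum to 1, so Y = 1
Y = rgb_to_luminance(img)
```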
Second approach for image representation of H&E images
An alternative approach is based on color deconvolution methods, which are used in histopathology to obtain stain concentration maps from the RGB image. For color deconvolution, the original histological image I is first converted to an optical density image OD according to Beer-Lambert's law, OD = −log10(I/I0), where I0 is the intensity of the blank (white) background (Equation (2)). As was shown in Reference [26], the OD of each channel is linear in the concentration of the absorbing material, and the relation OD flat = C S holds (Equation (3)), where OD flat is a N × 3 matrix and N is the total number of pixels in the image, C is a N × 2 matrix of H&E stain concentrations, and S is the 2 × 3 stain matrix. Each row of S represents a specific stain (hematoxylin or eosin) and every column represents the optical density detected by the red, green and blue channel for each stain [26]. Two methods were employed to estimate the stain matrix S: (a) the method of Macenko et al [22], which is based on a singular value decomposition, and (b) the method of Vahadane et al [23], in which sparse nonnegative matrix factorization (SNMF) is used. Once the matrix S is calculated, the stain concentration matrix C is estimated by Equation (3). Concentration maps are obtained by reshaping the matrix C to the image domain, where every pixel is a vector of H&E concentrations. By summing up the H&E density maps, a total concentration map is obtained. The resulting image is converted back to intensity values using Equation (2), and thus a grayscale representation of the histological image is obtained, which we denote as the total concentration image.
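The deconvolution pipeline can be sketched as follows. The stain matrix below is a commonly quoted default used purely as an illustrative stand-in; in the paper, S is estimated per image with the Macenko or Vahadane method:

```python
import numpy as np

I0 = 255.0   # background (white) intensity

# Illustrative stain matrix S (rows: hematoxylin, eosin; columns: R, G, B
# optical densities). These are stand-in values, not estimates from the data.
S = np.array([[0.65, 0.70, 0.29],
              [0.07, 0.99, 0.11]])

def to_od(rgb):
    """Beer-Lambert's law: OD = -log10(I / I0) per channel."""
    return -np.log10(np.clip(rgb, 1.0, None) / I0)

def total_concentration(rgb):
    """Solve OD_flat = C @ S for the stain concentrations C by least squares
    (via the pseudo-inverse) and sum the H&E maps into one grayscale image."""
    od = to_od(rgb).reshape(-1, 3)           # N x 3 optical densities
    C = od @ np.linalg.pinv(S)               # N x 2 stain concentrations
    return C.sum(axis=1).reshape(rgb.shape[:2])

# toy 1x1 "image" that is pure hematoxylin with unit concentration
rgb = (I0 * 10.0 ** (-S[0])).reshape(1, 1, 3)
tc = total_concentration(rgb)
```

The pseudo-inverse recovers C exactly whenever the optical densities lie in the row space of S, which is what makes the round trip through Equation (2) well defined.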
As was shown in Reference [22], for reasons of stability of the deconvolution algorithms, background pixels of the image should be removed. For this, we first employed a brightness standardization procedure to make the background fully white: (a) the image is converted to CIELAB color space [24,25], (b) lightness values larger than the 95th percentile are set to 255 (fully white), (c) the image in CIELAB is converted back to RGB. The pixels with low absorbance (lightness higher than 0.8 in CIELAB color space) were thereafter truncated by thresholding.
We denote the images obtained by applying the Macenko et al deconvolution as total C (Macenko), and those obtained by applying the Vahadane et al deconvolution as total C (Vahadane).

| FTIR images
The hyperspectral images used in this paper consist of a high number of pixels with a full spectrum in each pixel. Since infrared microscopy imaging data are strongly affected by Mie scatter distortions, the first step of the data processing is usually the Mie scatter correction to remove Mie scattering effects from the spectra. This correction had been performed previously on this data set [4,6]. State-of-the-art Mie scatter corrections for infrared microspectroscopy are based on iterative algorithms developed during recent years [8][9][10][11]27]. The state-of-the-art algorithm employs a model based on EMSC, and uses an iterative procedure to restore the pure absorbance spectra. In the modeling process, parameters for model components are estimated [9][10][11]27]. However, since the process is iterative and model components are adapted individually for each iteration in the iterative process, the Mie scatter parameters are not directly comparable from spectrum to spectrum. In order to obtain parameters for each image pixel that refer to the same components, we use in this paper a basic EMSC approach on the raw spectra. It is important to note that the only aim of the basic EMSC approach here is to estimate model parameters for image registration and not to preprocess the image. Any further use of the infrared spectra, for example, for building a classifier as done in [4,6], may involve basic EMSC or a Mie correction of the spectra, which could be done after image registration.
In the present paper, we used the basic form of EMSC according to Reference [14]. The EMSC model in its basic form describes a measured apparent absorbance spectrum Z(ν̃) at wavenumber ν̃ with a polynomial of degree two, Z(ν̃) = a + b Z_ref(ν̃) + d ν̃ + e ν̃² + ε(ν̃) (Equation (5)), where the parameter a is the baseline shift, b is the optical path length, and d and e account for linear and quadratic wavenumber-dependent variations. The model spectrum Z_ref(ν̃) represents the reference spectrum, for example, a mean spectrum, and ε(ν̃) represents the unmodelled residual [2,11,14,28]. The parameters are estimated by ordinary least squares. Scatter contributions can be added to the model either based on empirical considerations, such as the linear and quadratic contributions in Equation (5), or based on physical models [11]. The model parameter estimation is very stable, since it is based on all pixel observations and only four parameters are estimated. After applying the EMSC model to every spectrum of the FTIR image, parameter maps are obtained. Once the parameters are estimated, corrected and nearly scatter-free spectra can be obtained according to Z_corr(ν̃) = (Z(ν̃) − a − d ν̃ − e ν̃²)/b. According to Beer-Lambert's law, the measured absorbance spectrum in the infrared is directly proportional to the effective optical path length b. Therefore, the parameter b is an estimate of the optical density in the infrared spectral range. Both H&E representations, that is, the luminance and the total concentration image, are optical densities. Therefore, the EMSC b parameter and the H&E representations are equivalent. Since the parameter b is directly related to the optical path length, it has a strong relation to the morphology of the sample [2]. Therefore, we use the parameter b that is estimated for each spectrum in the hyperspectral map as a grayscale representation. Another important point is that the parameter b is a very stable estimate to obtain from infrared spectra, since it refers to the model reference spectrum. This is due to the fact that the signature of a biological sample (which the reference spectrum is) is very characteristic in the infrared and has, compared to noise or instrumental interferents, a very distinct and characteristic form.
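A minimal sketch of the basic EMSC fit on a synthetic spectrum, assuming ordinary least squares on the design matrix [1, Z_ref, ν̃, ν̃²] as described above (the band shape and parameter values are toy choices):

```python
import numpy as np

def emsc_fit(Z, Z_ref, nu):
    """Fit Z(nu) = a + b*Z_ref(nu) + d*nu + e*nu^2 (Equation (5)) by ordinary
    least squares; return the parameters and the corrected spectrum."""
    M = np.column_stack([np.ones_like(nu), Z_ref, nu, nu ** 2])
    (a, b, d, e), *_ = np.linalg.lstsq(M, Z, rcond=None)
    Z_corr = (Z - a - d * nu - e * nu ** 2) / b   # scatter-corrected spectrum
    return (a, b, d, e), Z_corr

# synthetic example: a "measured" spectrum built from a known reference
nu = np.linspace(1000.0, 3700.0, 500)
Z_ref = np.exp(-((nu - 1650.0) / 50.0) ** 2)      # toy amide-I-like band
Z = 0.1 + 1.5 * Z_ref + 1e-4 * nu                 # a=0.1, b=1.5, d=1e-4, e=0
(a, b, d, e), Z_corr = emsc_fit(Z, Z_ref, nu)
```

In an image, the fit is repeated for every pixel spectrum against the same reference, and the recovered b values form the grayscale parameter map.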
The EMSC model of Equation (5) requires a reference spectrum. A commonly used approach is to use the average spectrum of a set of raw spectra as the reference spectrum. In the paper at hand, we consider three different types of reference spectra. (a) We use the average spectrum of a complete infrared image; we refer to this global average spectrum as Z̄(ν̃). (b) We first remove noisy spectra from empty or nearly empty regions of the section by applying a quality test according to [4,6,18], then calculate the average spectrum and use it as the reference spectrum; we denote this reference spectrum by Z̄_qt(ν̃). (c) We use the Matrigel spectrum from [11] as a reference spectrum; we label the Matrigel spectrum by Z_M(ν̃). Using three different reference spectra as model spectra for EMSC resulted in three different EMSC parameter maps and three grayscale representations. In addition to the representations obtained by EMSC, we consider the grayscale representation computed as the SD of the original spectra, according to the approach from Reference [13].

| Image preprocessing
A variety of image preprocessing and enhancement methods can be applied to FTIR and H&E stained grayscale images prior to registration. Among them are histogram matching [29], histogram equalization techniques [30], brightness standardization, gamma and logarithmic correction. In this study, the H&E stained grayscale representation, which we denoted by contrasted image, was obtained using brightness standardization and extracting the luminance image. We tested histogram equalization techniques for both grayscale images, but no improvement of the registration quality was observed. Thus, no preprocessing was employed on the FTIR grayscale representations.

| Image registration background
Image registration is the process of spatially aligning and overlaying two or more images. In this paper, we consider multi-modal registration of two grayscale images: the fixed image I f and the moving image I m . Image registration methods can be classified into two main groups: intensity-based (or area-based) methods and feature-based methods [31]. We consider both types of methods. In our intensity-based framework, the similarity metric is computed on the physical coordinates of the fixed image. Therefore, it is preferable to select the image with the smallest number of pixels as the fixed image in order to speed up the calculations. Since the FTIR grayscale representation had a lower number of pixels, we selected it as the fixed image I f and the H&E grayscale image as the moving image I m , which saved computation time.

| Intensity-based registration framework
Intensity-based registration is usually formulated as an optimization problem: an iterative search for the spatial mapping T that optimizes (minimizes or maximizes) the chosen matching criterion M(I f , I m ∘ T). The transformation model T, which maps points of the fixed image I f (x) to points of the moving image I m (x), is defined on x ∈ Ω with parameter vector θ = (θ 1 , …, θ n ). The matching criterion, that is, the (dis)similarity criterion M, characterizes how well the transformed moving image is aligned with the fixed one. In case of maximization of the similarity criterion, the task of image registration is to find the parameters θ* such that θ* = argmax θ M(I f , I m ∘ T).
Interpolation is needed to compute the similarity criterion, where image values I m (T(x, θ)) are evaluated in nongrid positions. In our study, linear interpolation was used.
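Linear (in 2D, bilinear) interpolation at a non-grid position can be sketched as follows; the interior-point assumption (no boundary handling) keeps the example minimal:

```python
import numpy as np

def bilinear(img, x, y):
    """Evaluate img at the non-grid position (x, y), interior points only:
    a weighted average of the four surrounding grid values."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * img[y0, x0]
            + dx * (1 - dy) * img[y0, x0 + 1]
            + (1 - dx) * dy * img[y0 + 1, x0]
            + dx * dy * img[y0 + 1, x0 + 1])

img = np.array([[0.0, 1.0],
                [2.0, 3.0]])
v = bilinear(img, 0.5, 0.5)   # centre of the 2x2 grid: mean of all four values
```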

| Matching criterion
Although FTIR and H&E grayscale images represent the morphology of the same tissue, there are obviously differences in sample preparation, image acquisition and transformation. It was shown in Reference [32], that metrics based on the calculation of mutual information are well suited and widely used for such multi-modal registration (see Supporting Information for more details).
There are several ways to implement the mutual information metric [32]. The approach of Mattes et al [33] was used in this study, with 50 histogram bins.
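A simplified histogram-based mutual information, shown here as an illustrative stand-in for the Mattes et al implementation (which additionally uses B-spline Parzen windowing):

```python
import numpy as np

def mutual_information(a, b, bins=50):
    """Joint-histogram mutual information between two equally sized
    grayscale images: MI = sum p(a,b) * log(p(a,b) / (p(a) p(b)))."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)     # marginal of image a
    p_b = p_ab.sum(axis=0, keepdims=True)     # marginal of image b
    nz = p_ab > 0                             # avoid log(0)
    return float((p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])).sum())

rng = np.random.default_rng(0)
img = rng.random((64, 64))
noise = rng.random((64, 64))
mi_self = mutual_information(img, img)      # an image is maximally
mi_rand = mutual_information(img, noise)    # informative about itself
```

Because MI measures statistical dependence rather than intensity agreement, it tolerates the very different gray-level distributions of FTIR and H&E representations.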

| Transformation model
The transformation defines a mapping from every point of the fixed image to a point on the moving image. In this paper, a composition of three transformation models T 3 ∘ T 2 ∘ T 1 was employed, where T 1 and T 2 are global affine transformations, and T 3 is a local or deformable transformation model using cubic B-splines.
The affine transformation has six parameters and the maximum number of degrees of freedom among 2D linear transformations: T(x) = Ax + t, where A is a 2 × 2 matrix and t a 2D translation vector. This model allows rotations, translations, nonuniform scaling and shearing, which maps a square into a parallelogram. The T 1 transformation was roughly estimated using manual landmarks, while T 2 refines the resulting composed transformation T 2 ∘ T 1 using an optimization procedure.
Due to artifacts of the sample preparation and differences between sample sections, a local transformation model was established. Considerable research effort has been directed at deformable or non-rigid image registration (see [7] for a review of deformable medical image registration). The deformable transformation allows local regions to be registered independently: T(x, y) = (x, y) + u(x, y), where u(x, y) is the displacement field. The free-form deformation model T 3 , which is one of the most common types of deformable transformation models, was parameterized by cubic B-splines [34] and estimated by maximization of the MI metric M(I f , I m ∘ T 3 ∘ T 2 ∘ T 1 ). In the current study, we compared two free-form deformation models. One is based on a 10 × 10 grid of B-spline control points, which we denote as the simple transformation. For the other, which we denote as the advanced transformation, a multiresolution transformation model complexity scheme is used in addition (see Section 2.5.5).

| Optimization method
The role of the optimization method is to estimate the parameter vector θ of a chosen transformation T by minimizing/maximizing the metric M. The most common class of optimization methods is the class of continuous methods. The general form of a continuous optimization step is θ n+1 = θ n + α n g n , where θ denotes the vector of parameters of the transformation, n is the iteration number, α n the learning rate and g n the direction of the parameter search. In general, the search direction is computed with respect to the metric (or matching criterion) M and a regularization term R, and can thus be written as g n (M(θ n ) + R(θ n )). The regularization term helps to avoid unrealistic deformations, but was not used in the current study. In this paper, the affine transformation T 1 was estimated using a closed-form solution [19,35], the affine transformation T 2 using gradient descent (GD) and conjugate gradient (CG) methods, while the cubic B-spline transformation T 3 was estimated with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. More specifically, a limited-memory BFGS (L-BFGS) was used, which belongs to the quasi-Newton family of methods [34]. Gradient descent methods optimize the parameters by following the negative gradient of the objective function: g n = −∇ θ M(θ n ). Conjugate gradient methods utilize the knowledge of the previous gradient, so that the new search direction is conjugate to the previous one: g n = −∇ θ M(θ n ) + β n g n−1 , where β n is a weighting factor calculated using the Polak-Ribiere formula [7]. Learning rate and parameter scaling estimation at every iteration was used for GD and CG [36]. The learning rate estimation helped convergence, whereas the estimation of parameter scaling resolves the difference in magnitude of the translation, scaling and rotation parameters. Quasi-Newton methods estimate the inverse Hessian matrix H −1 (θ), and the search direction is defined as g n = −H −1 (θ n )∇ θ M(θ n ).
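The GD and Polak-Ribiere CG updates can be illustrated on a toy quadratic objective, where the minimizer is known in closed form (here a fixed learning rate replaces the per-iteration estimation used in the paper, and the exact line search is a property of quadratics, not of the registration metric):

```python
import numpy as np

def grad(theta, A, b):
    """Gradient of the quadratic objective f(theta) = 0.5 th'A th - b'th."""
    return A @ theta - b

A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 2.0])
theta_star = np.linalg.solve(A, b)       # known minimizer, for reference

# gradient descent: theta_{n+1} = theta_n + alpha_n * g_n, g_n = -grad
theta = np.zeros(2)
for _ in range(200):
    theta = theta + 0.2 * (-grad(theta, A, b))

# conjugate gradient with the Polak-Ribiere weighting factor beta_n
th = np.zeros(2)
g_prev = -grad(th, A, b)                 # negative gradient (residual)
d = g_prev.copy()                        # initial search direction
for _ in range(10):
    alpha = (g_prev @ g_prev) / (d @ (A @ d))   # exact line search (quadratic)
    th = th + alpha * d
    g = -grad(th, A, b)
    if g @ g < 1e-20:                    # converged
        break
    beta = max(0.0, g @ (g - g_prev) / (g_prev @ g_prev))  # Polak-Ribiere
    d = g + beta * d                     # conjugate search direction
    g_prev = g
```

On a quadratic, CG reaches the minimizer in at most as many iterations as there are parameters, which is why it typically needs far fewer metric evaluations than plain gradient descent.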
The exhaustive optimizer (EO) was tested as the initialization step, but due to lack of robustness, manual annotations were chosen as an initialization step for the optimization.

| Sampling strategy
During the optimization, the matching criterion M along with its gradient ∂M/∂θ is computed on a selected set of samples x ∈ Ω f . Several sampling strategies are available in the literature. Among them are random sampling, quasi-random sampling [37] and sampling based on striking image features like edges [38]. Since in our data set the number of pixels in the fixed image is usually lower than in the moving one, we used all pixels of the fixed image, sampled on a random grid, which improved stability. For the sake of reproducibility, the sampling random seed was fixed across all the experiments.

| Multiresolution strategies
Multiresolution or hierarchical strategies are very widely used in image registration [20,31,32,38]. We established image pyramids with three levels, utilizing two multiresolution strategies. The first strategy applies a Gaussian pyramid (σ = 2, 1 pixels) with downsampling (factor = 4, 2) to the image data. We used the second strategy for computing the advanced transformation model. This strategy aims at a gradual increase of model complexity, where the grid spacing is halved at every resolution level starting from a 10 × 10 grid.
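One level of the Gaussian pyramid (smooth, then subsample) can be sketched as follows; the 3σ kernel radius is a common truncation choice, assumed here for brevity:

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Normalized 1D Gaussian kernel, truncated at ~3 sigma by default."""
    radius = radius or int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth_and_downsample(img, sigma, factor):
    """One pyramid level: separable Gaussian smoothing, then subsampling."""
    k = gaussian_kernel(sigma)
    sm = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    sm = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, sm)
    return sm[::factor, ::factor]

img = np.random.default_rng(0).random((64, 64))
level1 = smooth_and_downsample(img, sigma=2, factor=4)   # coarsest level
level2 = smooth_and_downsample(img, sigma=1, factor=2)   # intermediate level
# level3 would be the full-resolution image itself
```

Smoothing before subsampling suppresses aliasing, so the coarse levels present the optimizer with a smoother, easier objective.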

| Evaluation metric
First of all, the registration results were evaluated visually. However, a quantitative measure of the quality of the results is required to compare different methods. An efficient way to evaluate multimodal image registration of microscopic images quantitatively is to use a target registration error denoted by TRE. For this, we manually selected a set of landmarks {x i } on a fixed (FTIR) image, and a corresponding set of landmarks {y i } on a moving (H&E stained) image for every pair. Landmarks {x i } were warped using the computed transformation T. The TRE is calculated as the Euclidean distance between landmark y i and warped landmark T(x i ): TRE = d(T(x i ), y i ). In order to compare the TRE across images with different pixel resolutions, we calculated absolute TRE which we denote by aTRE = TRE × r m , where r m is a pixel resolution of a moving image. Furthermore, a median of aTRE, which we call maTRE, is calculated for all aTRE in the image in order to avoid penalizing inaccurately selected landmarks.
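The aTRE and maTRE computations described above can be sketched directly (the landmark coordinates and pixel resolution below are toy values):

```python
import numpy as np

def maTRE(warped_fixed_landmarks, moving_landmarks, r_m):
    """Median absolute target registration error: the median over landmarks
    of the Euclidean distance between T(x_i) and y_i, scaled by the
    moving-image pixel resolution r_m."""
    d = np.linalg.norm(warped_fixed_landmarks - moving_landmarks, axis=1)
    return float(np.median(d * r_m))

Tx = np.array([[10.0, 10.0], [20.0, 22.0], [33.0, 30.0]])   # warped T(x_i)
y  = np.array([[10.0, 10.0], [20.0, 20.0], [30.0, 34.0]])   # moving landmarks
err = maTRE(Tx, y, r_m=1.3)   # distances {0, 2, 5} px -> median 2 px * 1.3
```

Taking the median rather than the mean keeps a single imprecisely placed landmark from dominating the score.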
We selected from 9 to 20 landmark points in every image pair. According to the literature [31], it is preferable to have more landmark points with a higher localization error, rather than only a few that are detected more precisely. We should mention that the landmarks used for evaluation and those used for the initial transformation are completely independent. The landmarks for estimating the initial transformation were selected only roughly. The landmarks used for the evaluation metric were selected as precisely as possible, by first inspecting the whole image and then zooming into the area of interest while comparing the local intensities of corresponding landmarks in both images.

| Implementation details
We adapted the code from [39] for obtaining the total concentration image, which was used as one of the grayscale representations of the H&E image. In-house Python routines were developed for the FTIR-to-grayscale and H&E-to-grayscale conversions and for establishing the image registration. We used Fiji [40] and its macros from [41] for the landmark selection. SimpleITK [42], which is part of the Insight Segmentation and Registration Toolkit (ITK) [36], was used as the image registration framework. The parameters of the image registration methods are listed in Table S1. The source code used in this study is publicly available at https://github.com/BioSpecNorway/IRmHiRegistration.

| RESULTS AND DISCUSSION
Thirteen pairs of FTIR and H&E stained images of bladder, lung and colon tissues were used to evaluate the proposed registration framework. H&E stained images were converted to grayscale representations by obtaining (a) the luminance of the RGB image, which we refer to as the luminance image, (b) the luminance of the brightness-standardized RGB image, denoted as contrasted luminance, (c) the total concentration image using the deconvolution method of Macenko et al. [22], denoted as total C(Macenko), and (d) the total concentration image using the deconvolution method of Vahadane et al. [23], denoted as total C(Vahadane); the latter two apply color deconvolution as described in Section 2.3. The grayscale representations of the H&E stained images are shown in Figure 2. The luminance transformation, as well as the contrasted luminance transformation, are shown in Figure 2B,C. They are straightforward, fast to compute and represent sample morphology well. Grayscale representations obtained from the H&E stained images by color deconvolution are shown in Figure 2D,E. Color deconvolution is the procedure in which stain vectors and concentration maps are calculated. The resulting concentration maps are then summed to obtain the grayscale representation. Visually, images transformed in this way appear even more contrasted. We measured the contrast of the images using the root-mean-square (RMS) contrast [43]. The average RMS contrast is 39.58 ± 9.5, 46 ± 10.65, 61.83 ± 13.25 and 61.19 ± 13.22 for the luminance, contrasted luminance, total C(Macenko) and total C(Vahadane) images, respectively. It has been shown that color deconvolution methods have powerful stain normalization capabilities and remove undesirable color variation [22,23]. However, color deconvolution requires more computational time and memory, as well as reference images. As reported in [25], there are many other color-to-grayscale transformations, such as Intensity, Luminance, Value, Luster and Decolorize.
According to [25], these color-to-grayscale algorithms are very different in their performances for image recognition tasks using a Naive Bayes Nearest Neighbor framework with different image descriptors. However, we found that different grayscale representations did not influence the quality of the intensity-based registration procedure we used in our study, as shown further below.
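For concreteness, the simplest of these transformations, the luminance image, together with the RMS contrast measure used above, can be sketched as follows (an illustrative NumPy sketch; the Rec. 709 luma weights are our own assumption, as the exact weighting used is not specified here):

```python
import numpy as np

# Rec. 709 luma weights -- a plausible choice, assumed for illustration
LUMA = np.array([0.2126, 0.7152, 0.0722])

def luminance(rgb):
    """Grayscale luminance image from an (H, W, 3) RGB H&E image."""
    return rgb.astype(float) @ LUMA

def rms_contrast(gray):
    """Root-mean-square contrast: the SD of the pixel intensities [43]."""
    return float(np.std(gray))
```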
FTIR images were transformed to grayscale using the SD according to [13] and using the methods we propose in this paper, which involve parameters calculated from an EMSC model. Among the EMSC parameters, the most informative is the parameter b of Equation (5), since it is directly related to the optical thickness of the material. It has been shown previously that the parameter b displays distinct morphological information in infrared images, which could even be related to chemical changes in the images when morphology and chemistry were correlated [2,17]. Since the EMSC parameter b allows a stable estimation and is strongly related to the morphology, we suggest using the EMSC model parameter b to calculate grayscale images.
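A minimal sketch of such an EMSC fit is given below. It assumes the four-parameter model A(ν) ≈ a + b·Z_ref(ν) + d·ν + e·ν² (cf. Equation (5)); the least-squares formulation and the wavenumber scaling are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

def emsc_b(spectra, wavenumbers, reference):
    """Estimate the EMSC parameter b for each spectrum by least squares.

    Assumed model (cf. Equation (5)):
        A(nu) ~ a + b * Z_ref(nu) + d * nu + e * nu**2
    spectra: (N, K) absorbance spectra, wavenumbers: (K,), reference: (K,).
    """
    # Scale wavenumbers to [-1, 1] for numerical stability of the fit
    nu = (wavenumbers - wavenumbers.mean()) / (np.ptp(wavenumbers) / 2)
    design = np.column_stack([np.ones_like(nu), reference, nu, nu**2])
    coeffs, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)
    return coeffs[1]  # one multiplicative parameter b per spectrum
```

Plotting the b values pixel by pixel then yields the grayscale image.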
In order to test the effect of the reference spectrum on the EMSC model, we used three different reference spectra, resulting in three different EMSC models: (a) the EMSC model labeled EMSC(Z_M(ν)) was calculated using the Matrigel spectrum [44], which is a spectrum obtained from Reference [11] and represents a nearly scatter-free spectrum of tissue components, (b) the average spectrum of the full infrared hyperspectral image (model EMSC(Z(ν))), and (c) the average spectrum of the spectra that passed a quality test (model EMSC(Z_qt(ν))). The b parameters estimated by the corresponding three models are plotted as grayscale images in Figure 3A,E,F for the EMSC models with reference spectra (a), (b) and (c), respectively. The EMSC parameters a, d and e for the EMSC model using the Matrigel spectrum as a reference are shown in Figure 3B,C,D, respectively. It can be seen that all three EMSC models using the three different reference spectra result in grayscale images with a high contrast. The parameters a, d and e in Figure 3B,C,D depict slightly different morphological features with a lower contrast with respect to morphology. The parameters a, d and e show a tiling effect, as the edges of the focal plane array detector are clearly visible. Since the absorbance spectra are obtained by dividing the measured sample intensities by the measured background intensities followed by calculating the logarithm, the tiling effect cannot be due to a trivial intensity variation. Therefore, it must be related to a physical effect where radiation at the edges of the focal plane array detector is lost in the sample due to scattering or other optical effects. The EMSC algorithm seems to capture this effect very well through the parameters a, d and e.

FIGURE 3 A-D, The estimated b, a, d and e parameters of the EMSC model with the Matrigel spectrum as reference. A, E, F, G, The FTIR grayscale representations used in the registration. E, The estimated b parameter of the EMSC model using a reference spectrum calculated as the pixel-wise mean of the whole hyperspectral image. F, The estimated b parameter of the EMSC model using a reference spectrum calculated as the mean of the quality-assessed foreground spectra. G, The SD of every spectrum of the raw FTIR image. H, Binary image of the quality test (white: foreground or tissue region; black: background)

It is important to note that the parameter b, referring to the optical path length, is not affected by the tiling effect. The SD image, which calculates the grayscale representation using the SD of each spectrum, is shown in Figure 3G. It can be seen that the SD image is strongly affected by the tiling effect and therefore shows a lower contrast compared to the EMSC parameter b image representation. Figure 3H shows the image segmentation obtained by performing a quality test of the images according to [4,6,18]. The segmented image was used to calculate the reference spectrum Z_qt(ν).

After establishing grayscale images as representations of the H&E and FTIR images, we now turn to the image registration procedure. In Figure 4, the registration results are shown for image pair #11. In Figure 4A, the grayscale FTIR EMSC(Z_M(ν)) b image (cyan) superposed with the warped grayscale total C(Vahadane) image is shown after the advanced local transformation. A zoomed region is shown after the manual landmark transformation in Figure 4B, after the affine transformation estimated by the conjugate gradient method in Figure 4C, and after the advanced local transformation estimated by the L-BFGS method in Figure 4E, with the corresponding displacement field in Figure 4D. The first essential step in our proposed registration framework is the initial alignment of the images. To the best of our knowledge, current light and FTIR microscopes do not supply meta information for acquired images, such as the image origin, spacing and direction cosines available in CT or MRI modalities.
Accurate estimation of the pixel spacing, that is, the pixel resolution, could greatly simplify the initial alignment, since it reduces the search space of the parameters. We tested several automatic methods based on the matching of image feature descriptors, including SIFT, SURF and ORB [45][46][47], and the method proposed in Reference [13]. In addition, we employed an exhaustive optimizer with prior image-moments centering to search through the specified range of parameters for a similarity transformation. However, none of these automatic methods proved robust in estimating the initial affine transformation T1 for every image pair in our data set. The main reason for this is that H&E and FTIR images differ in basic image characteristics such as spatial resolution, initial orientation and dimensions, as well as in the regions of interest. Feature-based methods are generally not robust because the images commonly have different local features; such methods therefore suffer from false matching of descriptors. The intensity-based exhaustive method can easily get stuck in a local minimum and is, in addition, computationally expensive. Moreover, despite the fact that these methods are automatic, they still require hyperparameter tuning for every image pair, which is usually done either manually or via a time-consuming grid search. Therefore, we decided to manually select four corresponding landmarks in both images, which yielded a reasonable estimation of the scaling, shifting and rotation parameters of the T1 transformation.
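Given the four manually selected landmark pairs, T1 can be estimated by ordinary least squares (an illustrative NumPy sketch; the actual implementation may estimate a constrained similarity transform rather than a full affine one):

```python
import numpy as np

def affine_from_landmarks(moving_pts, fixed_pts):
    """Estimate an affine transform mapping moving-image landmark
    coordinates to fixed-image coordinates by least squares.

    Returns the 2 x 2 matrix A and translation t such that, per landmark,
    fixed ~ A @ moving + t. Requires at least 3 non-collinear pairs.
    """
    src = np.asarray(moving_pts, dtype=float)
    dst = np.asarray(fixed_pts, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])  # rows: [x, y, 1]
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return params[:2].T, params[2]
```

With exactly four pairs, the least-squares fit also averages out small landmark placement errors.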
As described in the materials and methods section, the initial transformation T1 was refined by the affine transformation T2, which was obtained by maximizing the mutual information of the two grayscale images. We compared the gradient descent and conjugate gradient optimization methods with the parameters described in Section 2.5. The conjugate gradient method showed faster convergence and slightly better registration results than gradient descent (see Figure 5).
An important step in the proposed registration scheme is the deformable, or nonrigid, registration. We observed local differences in sample morphology even in cases where the same tissue section was used for both the FTIR spectroscopic and the optical microscopic measurements (see Figure 4); such local differences are still visible after the affine registration shown in Figure 4C. To compensate for local differences, deformable registration models are applicable [7,13,32,48]. As described in Section 2.5, a deformable transformation T3 parameterized by cubic B-splines was estimated using the L-BFGS method based on mutual information. The transformation T2 ∘ T1 was used as the initial transformation for T3, where T2 was computed using the conjugate gradient method. We calculated a simple transformation with a 10 × 10 grid of B-spline control points and an advanced transformation in which the grid spacing is halved at every resolution level. In our study, the simple transformation model provides a good balance between registration accuracy and computational time. However, in cases of large differences in sample morphology (eg, due to local stretching and in the case of adjacent tissue sections), an advanced transformation with a larger number of parameters is required. To be able to handle such large differences, high-capacity registration models can be optimized with a regularization term, which penalizes unrealistic transformations [7]. In Figure 4E, we see that the FTIR and H&E images are perfectly aligned using the advanced local transformation T3 ∘ T2 ∘ T1. Additionally, the computed displacement field (see Figure 4D) confirms our observation that there are local differences in the sample morphology of the two measurements.
Furthermore, we compared the runtime of the registration methods considered in the study (see Figure S2). Affine transformations computed using gradient descent and conjugate gradient took, on average, 64 ± 57 seconds and 189 ± 121 seconds, respectively. Computing the simple and advanced cubic B-spline transformations took 133 ± 146 seconds and 2711 ± 3056 seconds, respectively. In total, registration can take around 5 minutes with the simple transformation and around 1 hour with the advanced transformation.
As was shown by Rohlfing [49], surrogate measures such as cross-correlation, which are commonly used in estimating the accuracy of nonrigid image registration algorithms, do not provide valid evidence for accurate registration and should thus not be reported or accepted as such. In our study, the registration quality was measured by the target registration error (TRE), which was computed using, on average, 12 manually selected corresponding landmarks in both images. However, we found that it is often problematic to find corresponding landmarks in the images of adjacent tissue sections because of the difference in morphology (see Figure S3).

FIGURE 5 Box plots of the measured median absolute target registration errors (maTRE) for two affine registration methods (using the gradient descent and conjugate gradient optimization methods) and for two cubic B-spline transformations using the L-BFGS optimization method (simple transformation denoted by (1), advanced transformation denoted by (2))

Some landmarks were very precisely marked, while a few others comprised a notable marking error. In order to take inaccurately selected landmarks into account when measuring the quality of registration, we calculated the median of the TREs as described in Section 2.6. Furthermore, maTREs were obtained, which allowed comparing several registration and grayscale representation algorithms across images with different pixel resolutions. In our study, we thoroughly explored 64 image registration experiments per image pair, corresponding to all combinations of four FTIR-to-grayscale algorithms, four H&E-to-grayscale algorithms and four image registration algorithms. In Figure 5, we present only the most interesting combinations of grayscale representation algorithms; all combinations are presented in Figure S1. In these figures, box-and-whisker plots show the medians of the maTRE values.
From this, we see that the conjugate gradient method appears to be more precise than gradient descent in estimating the global affine transformation. Using local transformations, specifically with a higher number of degrees of freedom as in the advanced transformation, greatly reduced the maTREs. The registration of the same tissue sections (#1-8 and #11) shows high accuracy. However, the registration of adjacent tissue sections (#9-10 and #12-13) proved more challenging. For image pair #9, it was difficult to find enough corresponding landmarks in the two images to obtain an unbiased measure of the image registration error, owing to the large difference in morphology; we therefore excluded it from the quantitative comparison. It is worth mentioning that for image pair #10 the registration error was significantly reduced by the local registration methods (see Figure 5). We therefore recommend using the same tissue sections for FTIR imaging and H&E staining when images are to be registered. Adjacent tissue sections may naturally show locally strong differences in morphology, which considerably reduces the registration quality.
Furthermore, from these results we observed that the different grayscale conversion algorithms do not influence the quality of the proposed intensity-based image registration scheme. Similarly, in our experiments, the results of feature-based image registration for estimating the initial transformation were in most cases similar to those presented in Reference [13]. We think that our proposed algorithm for FTIR-to-grayscale conversion based on the EMSC b parameter should nevertheless be preferred, since it does not produce the tiling effect.

| CONCLUSION
In this work, we proposed a multimodal registration scheme for infrared spectroscopic and optical microscopic images. We compared different methods for grayscale representation of FTIR and H&E stained images and for image registration, including methods that can deal with local deformation of the images. We studied various algorithms for H&E-to-grayscale transformation. Since we did not find qualitative or quantitative differences in the quality of the intensity-based image registration when comparing several grayscale representations, we suggest using the simplest H&E-to-grayscale transformation, that is, the luminance image.
We tested feature-based image registration methods (including the one suggested in Reference [13]) for all possible combinations of the grayscale representations at hand. Due to the nature of several samples, it was not possible to match the descriptors of the detected salient image features [45][46][47], which was also reported in [12]. For those image pairs where feature-based methods are not applicable, we suggest estimating the initial affine transformation by matching manually selected landmarks in the two images. This initial transformation was then refined by an intensity-based registration framework, which includes two steps: (a) estimating a global affine transformation, and (b) estimating a local deformable transformation. The resulting transformation is a composition of the three calculated transforms. We used transformation composition and an online interpolator, which allowed minimizing the error in computed pixel intensities. The results of the developed registration scheme showed that we can achieve high-quality alignment of multimodal images of similar morphology. The obtained results suggest that with the proposed registration scheme, clinically labeled regions of the H&E stained image can be automatically transferred to the FTIR image. In addition, through alignment, a tissue sample can be thoroughly analyzed using two complementary sources of information. To our knowledge, there is only one openly accessible algorithm [12] for image registration of FTIR images with H&E images, and it focuses on the automation of image registration. We specifically address the quality of the image registration and present a framework for the registration of FTIR and H&E stained images. We have further published the source code of the framework presented in this paper as an open-access implementation at https://github.com/BioSpecNorway/IRmHiRegistration.
Since morphological information is needed for image registration, we based our grayscale representation of FTIR images on a parameter that represents the effective optical path length. This is in line with the calculation of optical density images for the H&E images. Moreover, the proposed approach for FTIR-to-grayscale representation can be applied to other imaging techniques, such as Raman [50] and NIR [14] imaging, where the EMSC method can be used in the same manner. We showed visually that grayscale representations obtained from the parameter related to the effective optical path length show a high contrast and are in agreement with the morphological features contained in the H&E images. We showed further that the FTIR-to-grayscale method based on the SD calculated for each spectrum, which was used in Reference [13], shows a tiling effect in several images. This tiling effect was nicely separated out by the EMSC parameters and does not affect the multiplicative b parameter of the EMSC model. To our knowledge, the tiling effect has not yet been explored in the literature. However, we saw evidence that, among the EMSC parameters, the parameter b is strongly related to the sample morphology and is free from the tiling effect. Thus, we suggest using this parameter to build the grayscale representation of the FTIR image.